Speech Recognition for Translators
Tiago Neto, who works with Kevin Lossner on speech recognition systems and on integrating them more effectively into translation tools and the translation workflow, will be visiting Athens on Friday, 17 July, for a very interesting technology-oriented presentation in cooperation with ΠΕΕΜΠΙΠ (PEEMPIP). He will share with us secrets for boosting productivity and streamlining the translation workflow that truly make our lives easier! Until then, you can read the article below for a first introduction to our guest.
by Tiago Neto
For the past few months I have been using voice recognition software for translating, following an invitation from Kevin Lossner to try Mac OS/X Yosemite’s built-in dictation tool. My first attempt yielded pathetic results.
However, as the saying goes in Portugal, the “hardware is always right” and the fault was obviously my own, as a student under Professor David Hardisty, named Joana Bernardo, had achieved remarkable success with this.
So, I stuck with it for a little while longer and actually got to improve both my dictation and software features to the point where it has become my main ancillary tool for translation when paired up with my CAT tool of choice. In fact, the productivity boost it provided was only matched by my adoption of CAT tools shortly after I began translating.
At about the time I had this figured out, I wrote a post for Kevin Lossner’s Translation Tribulations blog, detailing how to spec a virtual machine for using the OS/X dictation feature under a Windows environment as the guest operating system. This combination of OS/X and virtual machine software allows you to use whatever CAT tool you want (or none at all) and the built-in speech recognition.
OS/X’s built-in speech recognition has one specific advantage over all other solutions that have been explored so far: by downloading the improved audio dictionaries, you make it possible to dictate without sending any data to a remote server, thus preserving confidentiality – an important feature if you work with materials requiring such handling.
However, this also poses some limitations, namely the apparent inability to add vocabulary, some minor capitalization issues that arise whenever one stops dictating midsentence, and the lack of application-specific commands outside the built-in selection that comes with OS/X for the applications bundled with said operating system.
Well, these can all be easily circumvented. Vocabulary can be added both manually and automatically by using a very simple procedure that doesn’t even require you to speak the word you want to add – this makes automated feeding of vocabulary a real possibility. As for the capitalization problems and the application-specific commands to be used in, say, your CAT tool, the solution is exactly the same. For the capitalization, you merely create a verbal command that changes the capitalization status of highlighted text; such commands exist in word processors and in CAT tools. Application-specific commands are handled in exactly the same manner, by simply chaining a verbal command to a mapped keystroke.
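To make the capitalization fix concrete, here is a minimal Python sketch of what a "change case" command typically does to the highlighted text. It is only an illustration: the function names are hypothetical, and the cycling order (lower case, first letter capitalized, upper case) is an assumption rather than the behaviour of any specific word processor or CAT tool.

```python
def capitalize_first_letter(text: str) -> str:
    """Upper-case the first alphabetic character, leaving the rest untouched."""
    for i, ch in enumerate(text):
        if ch.isalpha():
            return text[:i] + ch.upper() + text[i + 1:]
    return text


def cycle_case(selection: str) -> str:
    """Cycle the selection: lower case -> First letter capitalized -> UPPER CASE -> lower case.

    Hypothetical helper; the order of the cycle is an assumption.
    """
    if not any(ch.isalpha() for ch in selection):
        return selection  # nothing to change (digits, punctuation, empty text)
    if selection.isupper():
        return selection.lower()
    if selection.islower():
        return capitalize_first_letter(selection)
    return selection.upper()
```

A word that was wrongly capitalized after a pause in dictation, such as "The", would go to "THE" and then to "the" with two invocations of the same verbal command.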
For the mobile professionals…
People on the move, particularly those that do not require a high level of confidentiality on the documents they translate, have a whole new set of possibilities. This is because the most advanced speech recognition technology currently available is based on mobile platforms. While the world-famous Dragon NaturallySpeaking product line by Nuance is often mentioned, it supports merely 7 languages. Nuance’s mobile API supports roughly 40, a number similar to that of the OS/X built-in speech recognition feature.
The major difference between the OS/X implementation of the feature and the mobile-based solutions lies in the flow of data: unlike the computer-based solution offered by the improved dictation audio dictionaries, iOS built-in dictation and Nuance’s API call upon remote servers through an Internet connection in order to process the audio recorded by the device into text, according to a previously chosen language. This is where confidentiality issues potentially arise, and it will certainly become a hot topic for discussion and improvement over the coming months.
So, when you’re working based on a mobile platform you have 2 options:
Option A – do as much as you can on your mobile platform and then pass it on to your CAT tool
Option B – use your mobile platform as an interface for your primary machine which is running your CAT tool, word processor, whatever.
Option A basically requires you to use the export features in CAT tools in order to open the document and dictate into it on your mobile device. There are some subtleties to this, but it can allow for a rather good level of productivity, and would definitely unchain you from your desk.
Option B is far more powerful. You can be somewhat unchained from your desk, as you no longer need to sit in front of the keyboard. In fact, depending on your choice of software, you may not even need to be in the same country as your computer. However, from the current selection of available software, the applications that present the best results also require you to manipulate the cursor to position the output of the speech recognition in the correct place.
Option A – working with CAT tools and bilingual file formats
This option works with any CAT tool that allows you to export the bilingual file in a commonly recognized format, such as .docx files in Trados Studio or the bilingual .rtf files in memoQ.
The main disadvantage of these procedures lies with the way these software packages use tags (Trados), or in the way certain mobile platforms have deprecated the use of a universal file format (Apple dropped the support for RTF files in iOS devices). Of course, these can all be circumvented with a minimum of fuss.
Option B – using software to map speech recognition output as keystrokes
This is pretty much like going to the zoo – you’ll find plenty of similar animals of varying proficiency at their game, and you’ll certainly confuse yourself with some of them. I will discuss two that work and one that has pretty good potential once some minor difficulties are overcome.
MyEcho is an application that allows you to use your iOS built-in recognition feature to dictate at your leisure. It then captures the text output returning from the server and inserts it on your PC at your cursor’s current position. It does that by running an iOS application (currently costing €1.99) and a free Windows-based program.
These are paired up by a very simple procedure involving QR codes, and you can pair multiple machines to a single mobile device. It will also work inside a virtual machine. As you dictate, the text will appear in the Windows-based program, and it will be automatically inserted at the cursor’s position.
MyEcho suffers from the same confidentiality limitations as previously described, and uses Apple’s speech recognition servers for the audio processing.
Virtual keyboards – you can find several virtual keyboard applications in the iOS App Store that will allow you to use your iOS device as a trackpad or keyboard for your Mac or PC. Just like MyEcho, they require a helper program to interface with the computer, with the advantage that the pairing is done over the local network – that is, only the audio data is sent to a remote server, in exchange for the text output.
The basics are fairly simple – you activate the virtual keyboard on your iOS device and entirely forego the use of the virtual keys for the very conveniently placed microphone icon that will allow you to bring up the dictation feature.
Of the several applications I’ve tried so far, Remote Keyboard+ has some of the most user‑friendly installation and startup procedures imaginable. You simply run a helper program on your computer, and it will broadcast a signal over your Wi-Fi network, which is then picked up by the application on the mobile device.
Remote Control & Dictation – The PC-based lifesaver!
One final combination of contenders for the PC crowd (for any platform, really) is the combination of a remote control tool and Swype, with these two applications being run on a mobile device.
The example I’m using in the video below includes Teamviewer Remote Control for Android (free) and Swype (€0.75) to do just that, running off my mobile phone.
In fact, the whole process can be executed straight from a simple mobile phone, because the only interaction you’ll have with it is the pressing of the microphone button to start dictating your text.
Teamviewer Remote Control allows you to control a machine identified by a unique number, once you have entered that number and a randomly generated password. Once the computer and mobile device are paired up, you can set the phone down in front of you and focus on the computer.
Teamviewer Remote Control does not prevent your use of the computer being controlled remotely, so you simply use your machine as you would. Whenever you wish to dictate, simply place the cursor where you wish the dictated text to appear, press the Dictate button on the mobile device and start dictating.
The mobile device will then receive the text output of the speech recognition and map it out as keystrokes, which are subsequently sent to the computer via Teamviewer, thus inserting the dictated text straight onto your screen, at the cursor position.
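The "text mapped out as keystrokes" step can be pictured with a short Python sketch. This is not how Teamviewer or Swype implement it; the event format, the `SPECIAL_KEYS` table and the function name are all hypothetical, chosen only to show the idea of turning recognized text into a sequence of key presses that a remote machine could replay at the cursor position.

```python
from typing import List, Tuple

# A key event is a tuple of key names pressed together,
# e.g. ("shift", "h") for a capital H or ("space",) for a blank.
KeyEvent = Tuple[str, ...]

# Hypothetical names for characters that are not typed as themselves.
SPECIAL_KEYS = {" ": "space", "\n": "enter", "\t": "tab"}


def text_to_keystrokes(text: str) -> List[KeyEvent]:
    """Convert recognized text into an ordered list of key events."""
    events: List[KeyEvent] = []
    for ch in text:
        if ch in SPECIAL_KEYS:
            events.append((SPECIAL_KEYS[ch],))
        elif ch.isupper():
            # Upper-case letters are sent as Shift plus the lower-case key.
            events.append(("shift", ch.lower()))
        else:
            events.append((ch,))
    return events
```

Because the computer only ever receives ordinary key presses, the dictated text lands wherever the cursor happens to be, which is why cursor placement matters so much in this setup.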
Afterthoughts
As you might have noticed, with the exception of dictation using the built-in features of OS/X, all of these solutions require two separate applications: one that provides high-quality speech recognition, and a second one that acts as an interface between the speech recognition platform and the machine running the software where the speech recognition output is to be used.
There is a current, unmet need for a single application, or at least a unified approach, that removes the need for platform hopping in order to get things done.
Current, highly desirable solutions include:
- A way to use Nuance’s technology, available through their API or Swype, natively on a PC platform – in essence, a hugely extended, online version of Dragon Dictation.
- A direct, mobile-based solution, not requiring a third party application to serve as an interface to the host machine for the CAT tool, word processor, etc.
- The ability to have these very same features with a degree of data confidentiality that allows their approval by clients for whom that is an essential requisite.
- An improved interface that allows for simpler control of the dictation vocabulary and settings in OS/X.
The first two items basically depend solely on developers creating such applications, as the tools and resources already exist and are readily available.
Item number three will doubtless rank very high among future changes to this technology. Speech recognition is already being used in this age of the “Internet of things”, and just like machine translation and its confidentiality issues, it will soon be “regulated”, or at least adapted in a way that makes it acceptable from a confidentiality standpoint.
Item number four basically depends on improving the interface on OS/X.
Text to speech, or the other side of the coin
Both Dragon Dictation and OS/X or iOS allow for the conversion of text to speech (TTS), i.e. to have written text get read aloud in a given language.
Now, as translators, we’ve all had to deal with the nice and subtle ways our brain plays tricks on us when we review our handiwork. Simple mistakes are often overlooked, simply because the eye/brain combination has developed into a highly advanced pattern recognition mechanism that makes up for missing individual elements. When I review my own work, I print it out and read it again, and again, and again with a red marker in my hand. And I always find new things on every review.
TTS comes into play as an ancillary tool for this process. You learn not to trust your eyes as they run over the lines on a computer screen, but what about your ears? Even better, what about your eyes AND your ears, simultaneously?
A current and very important part of my review processes includes this exact tool and combination of senses. As soon as I run a spelling and grammar check on my translations, the next stage is reading the document on the computer screen WHILE it is being read aloud by the machine.
Some coordination is required, but the audio aspect increases concentration on the review, and any abnormalities, such as a missing comma, a misplaced letter that causes a spelling mistake to go unnoticed because it results in a valid word, or an overly long sentence, quickly become apparent when the text is being read to you.
And the advantage of having it read to you while you read it yourself then becomes obvious: when you find something that requires correction, you are already at the correct position in the text to implement any necessary changes!
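For readers who like to script things, the read-along review can be approximated with a few lines of Python around the `say` command that ships with OS/X. This is a rough sketch, not the exact setup described above: the helper names are hypothetical, the sentence splitting is deliberately naive, and the voice name you pass in is an assumption that must match a voice installed on your own machine.

```python
import re
import subprocess
from typing import List, Optional


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_say_command(sentence: str, voice: Optional[str] = None) -> List[str]:
    """Build the argument list for the OS/X `say` command-line tool."""
    cmd = ["say"]
    if voice:
        cmd += ["-v", voice]  # -v selects the voice; the name must be installed
    cmd.append(sentence)
    return cmd


def read_aloud(text: str, voice: Optional[str] = None) -> None:
    """Print each sentence, then have the machine read it aloud (OS/X only)."""
    for sentence in split_sentences(text):
        print(sentence)  # shows where you are in the text while listening
        subprocess.run(build_say_command(sentence, voice), check=True)
```

Calling `read_aloud(my_translation)` on a Mac prints each sentence just before it is spoken, so your eyes and ears stay on the same spot in the text.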
This article was originally posted here.
Tiago Neto holds a degree in Veterinary Medicine from UTAD, Portugal. He worked for 10 years as an official veterinarian, with responsibilities in disease control and eradication, public health and food safety. He has also worked as a field veterinarian, working mostly with large animals, particularly horses. Working as a freelance translator for 5 years now, he specializes in medical and pharmaceutical translation, namely in oncology and immunology, the very same fields where he is now pursuing his PhD at the University of Porto.
As a technophile translator and being slightly OCD-like as far as working efficiency goes, he enjoys looking into new, more efficient workflows and tools.
When not translating, Tiago is usually found producing noise of greatly varying loudness on motorcycles or guitars.
You can visit Tiago’s website here and you can also find him on Facebook and LinkedIn.