May 23, 2019


Google’s Translatotron converts one spoken language to another, no text involved


Every day we creep a little closer to Douglas Adams’ famous and prescient babel fish. A new research project from Google takes spoken sentences in one language and outputs spoken words in another — but unlike most translation techniques, it uses no intermediate text, working solely with the audio. This makes it quick, but more importantly lets it more easily reflect the cadence and tone of the speaker’s voice.

Translatotron, as the project is called, is the culmination of several years of related work, though it’s still very much an experiment. Google’s researchers, and others, have been looking into the possibility of direct speech-to-speech translation for years, but only recently have those efforts borne fruit worth harvesting.

Translating speech is usually done by breaking down the problem into smaller sequential ones: turning the source speech into text (speech-to-text, or STT), turning text in one language into text in another (machine translation), and then turning the resulting text back into speech (text-to-speech, or TTS). This works quite well, really, but it isn’t perfect; each step has types of errors it is prone to, and these can compound one another.
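For the curious, that three-step cascade can be sketched in a few lines. To be clear, every stage below is a toy stand-in (nothing like Google's actual models), but the structure shows why an error early in the chain ends up baked into the final audio:

```python
# Toy sketch of the cascaded STT -> MT -> TTS pipeline.
# All three stages are illustrative stand-ins, not real models.

def speech_to_text(audio: str) -> str:
    # Stand-in STT: pretend the "audio" is already a transcript.
    return audio.lower()

def machine_translate(text: str) -> str:
    # Stand-in MT: word-by-word glossary lookup (Spanish -> English).
    glossary = {"hola": "hello", "mundo": "world"}
    return " ".join(glossary.get(word, word) for word in text.split())

def text_to_speech(text: str) -> str:
    # Stand-in TTS: tag the text as synthesized audio.
    return f"<audio:{text}>"

def cascaded_translate(audio: str) -> str:
    # Each stage consumes the previous stage's output, so a mistake in
    # any one of them propagates all the way through -- the compounding
    # problem the article describes.
    return text_to_speech(machine_translate(speech_to_text(audio)))

print(cascaded_translate("Hola mundo"))  # <audio:hello world>
```

A direct speech-to-speech system like Translatotron collapses those three boxes into one, which is what removes the intermediate text entirely.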

Furthermore, it’s not really how multilingual people translate in their own heads, as testimony about their own thought processes suggests. How exactly it works is impossible to say with certainty, but few would say that they break down the text and visualize it changing to a new language, then read the new text. Human cognition is frequently a guide for how to advance machine learning algorithms.

Spectrograms of source and translated speech. The translation, let us admit, is not the best. But it sounds better!

To that end researchers began looking into converting spectrograms, detailed frequency breakdowns of audio, of speech in one language directly to spectrograms in another. This is a very different process from the three-step one, and has its own weaknesses, but it also has advantages.
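If "spectrogram" sounds abstract: it is just the signal chopped into short frames, with each frame decomposed into how much energy it has at each frequency. Here's a deliberately naive, pure-Python version (real systems use optimized FFTs and mel-scaled filterbanks, and the frame sizes here are toy values):

```python
import cmath
import math

def spectrogram(signal, frame_size=8, hop=4):
    """Naive magnitude spectrogram: slice the signal into overlapping
    frames and take the DFT magnitude of each frame. This is a toy
    version of the representation that direct speech-to-speech models
    consume and emit."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2 + 1):  # non-negative frequency bins
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# A pure tone at 2 cycles per frame shows up as energy concentrated
# in frequency bin 2 of every frame.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # 2
```

Mapping one such grid of numbers directly to another, frame by frame, is the essence of the spectrogram-to-spectrogram approach.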

One is that, while complex, it is essentially a single-step process rather than multi-step, which means, assuming you have enough processing power, Translatotron could work quicker. But more importantly for many, the process makes it easy to retain the character of the source voice, so the translation doesn’t come out robotically, but with the tone and cadence of the original sentence.

Naturally this has a huge impact on expression, and someone who relies on translation or voice synthesis regularly will appreciate that not only what they say comes through, but how they say it. It’s hard to overstate how important this is for regular users of synthetic speech.

The accuracy of the translation, the researchers admit, is not as good as the traditional systems, which have had more time to hone their accuracy. But many of the resulting translations are (at least partially) quite good, and being able to include expression is too great an advantage to pass up. In the end, the team modestly describes their work as a starting point demonstrating the feasibility of the approach, though it’s easy to see that it is also a major step forward in an important domain.

The paper describing the new technique was published on arXiv, and you can browse samples of speech, from source to traditional translation to Translatotron, at this page. Just be aware that these are not all selected for the quality of their translation, but serve more as examples of how the system retains expression while getting the gist of the meaning.


This little translator gadget could be a traveling reporter’s best friend


If you’re lucky enough to travel abroad, you know it’s getting easier and easier to use our phones and other gadgets to translate for us. So why not do so in a way that makes sense to you? This little gadget seeking funds on Kickstarter looks right up my alley, offering quick transcription and recording — plus music playback, like an iPod Shuffle with superpowers.

The ONE Mini is really not that complex of a device — a couple microphones and a wireless board in tasteful packaging — but that combination allows for a lot of useful stuff to happen both offline and with its companion app.

You activate the device, and it starts recording and both translating and transcribing the audio via a cloud service as it goes (or later, if you choose). That right there is already super useful for a reporter like me — although you can always put your phone on the table during an interview, this is more discreet and of course a short-turnaround translation is useful as well.

Recordings are kept on the phone (no on-board memory, alas) and there’s an option for a cloud service, but that probably won’t be necessary considering the compact size of these audio files. If you’re paranoid about security this probably isn’t your jam, but for everyday stuff it should be just fine.

If you want to translate a conversation with someone whose language you don’t speak, you pick two of the 12 built-in languages in the app and then either pass the gadget back and forth or let it sit between you while you talk. The transcript will show on the phone and the ONE Mini can bleat out the translation in its little robotic voice.

Right now translation only works online, but I asked, and offline support is in the plans for certain language pairs that have reliable two-way edge models, probably Mandarin-English and Korean-Japanese.

It has a headphone jack, too, which lets it act as a wireless playback device for the recordings or for your music, or to take calls using the nice onboard mics. It’s lightweight and has a little clip, so it’s probably better than connecting directly to your phone in many cases.

There’s also a 24/7 interpreter line that charges two bucks a minute that I probably wouldn’t use. I think I would feel weird about it. But in an emergency it could be pretty helpful to have a panic button that sends you directly to a person who speaks both the languages you’ve selected.

I have to say, normally I wouldn’t highlight a random crowdfunded gadget, but I happen to have met the creator of this one, Wells Tu, at one of our events and trust him and his team to actually deliver. The previous product he worked on was a pair of translating wireless earbuds that worked surprisingly well, so this isn’t their first time shipping a product in this category — that makes a lot of difference for a hardware startup. You can see it in action here:

He pointed out in an email to me that obviously wireless headphones are hot right now, but the translation functions aren’t good and battery life is short. This adds a lot of utility in a small package.

Right now you can score a ONE Mini for $79, which seems reasonable to me. They’ve already passed their goal and are planning on shipping in June, so it shouldn’t be a long wait.


Lilt is building a machine translation business with humans at the core


The ability to quickly and automatically translate anything you see using a web service is a powerful one, yet few expect much from it other than a tolerable version of a foreign article, menu or street sign. Shouldn’t this amazing tool be put to better use? It can be, and a company called Lilt is quietly doing so — but crucially, it isn’t even trying to leave the human element behind.

By combining the expertise of human translators with the speed and versatility of automated ones, you get the best of both worlds — and potentially a major business opportunity.

The problem with machine translation, when you really get down to it, is that it’s bad. Sure, it won’t mistake “tomato” for “potato,” but it can’t be trusted to do anything beyond accurately translate the literal meaning of a series of words. In many cases that’s all you need — for instance, on a menu — but for a huge amount of content it simply isn’t good enough.

This is much more than a convenience problem; for many, language provides serious professional and personal barriers.

“Information on a huge number of topics is only available in English,” said Lilt co-founder and CEO Spence Green; he encountered this while doing graduate work in the Middle East, simultaneously learning Arabic and the limitations placed on those who didn’t speak English.

Much of this information is not amenable to machine translation, he explained. Imagine if you were expected to operate heavy machinery using instructions run through Google Translate, or perform work in a country where immigration law is not available in your language.

“Books, legal information, voting materials… when quality is required, you need a human in the loop,” he said.

Working on translation projects there and later at Google, where he interned in 2011, Green found himself concerned with how machine translation could improve access to information without degrading it — as most of the systems do.

His realization, which he pursued with co-founder John DeNero, was that machine learning systems worked well not simply as a tool for translation, but as a tool for translators. Working in concert with a translation system makes them faster and better at their work, lightening the cognitive load.

The basic idea of Lilt’s tool is that the system provides translations for the next sentence or paragraph, as a reference for structure, tense, idiom and so on that the translator can consult and, at least potentially, work faster and better. Lilt claims a 5x increase in words per hour translated, and says the results are as good or better than a strictly human translation.

“We published papers — we knew the technology worked. We’d worked with translators and had done some large-scale experiments,” Green said, but the question was how to proceed.

Talk to a big company and get them interested? “We went through this process of realizing that the big companies are really focused on the consumer applications — not anywhere there’s a quality threshold, which is really the entire translation industry,” Green said.

Stay in academic research, get a grant and open-source it? “The money kind of dried up,” Green explained: money was lavishly allocated after 9/11 with the idea of improving intelligence and communication, but a decade later the sense of urgency had departed, and with it much of the grant cash.

Start a company? “We knew the technology was inevitable,” he said. “The question was who would bring it to market.” So they decided it would be them.

Interestingly, a major change in language translation took place around the time they were really getting to work on it. Statistical and early neural systems gave way to attention-based ones; these have a natural sort of affinity to efficiently and effectively parsing things like sentences, where each word exists not like a pixel in an image, but is dependent on the words nearby it in a structured way. They basically had to reinvent their core translation system, but it was ultimately for the better.

“These systems have much better fluency — they’re just a better model of language. Second, they learn much faster; you need fewer updates to adapt to a domain,” Green said. That is to say, the system can quickly accommodate the jargon and special rules found in, say, technical writing or real estate law.

Of course, you can’t just sprint into the midst of the translation business, which spans publishing, real-time stuff, technical documents and a dozen other verticals, and say “here, use AI!”

“There’s enormous structural resistance in the industry to automating in any real way,” Green said. There was no way a major publishing house was going to change the way it worked.

“We tried several business models before we found one that works. There really hasn’t been a company that has decided ‘Okay, this human-in-the-loop method is the fundamental way to solve this problem, let’s just build a company around that.’ So we’re vertically integrated, we work with big enterprises and governments, and we just own the entire translation workflow for them.”

A faster method that doesn’t adversely affect translation quality is basically an efficiency multiplier — catnip for organizations that have a lot of content needing accurate translation and want to get the most for their money.

Think about it like this: if you’re a company that puts out products in 20 countries that speak as many languages, translation of packaging, advertising, documentation and so on is a task that’s essentially never done. The faster and cheaper you can get it done, the better, and if you have a single company that can handle it all, that’s just a cherry on top.

“We work with Zendesk, Snap, Sprinklr… we just take over the whole localization workflow for them. That helps with international go-to-market,” said Green. If a company’s translation budget and process before using Lilt limited it to targeting 5 or 6 new markets in a given period, that could double or triple for the same price and staff, depending on efficiency gains.

Right now they’re working on acquiring customers, naturally. “In Q4 last year we built our first sales team,” Green admitted. But initial work with governments especially has been heartening, since they have “more idiosyncratic language needs” and a large volume of text. The 29 languages Lilt supports right now will be 43 by the end of the year. A proofreading feature is in the works to improve the efficiency of editors as well as translators.

They’re also working hard on connecting with academics and building the translation community around Lilt. Academics are both a crucial source of translators and language experts and a major market. A huge majority of scientific literature is only published in English because it would be onerous to translate this highly technical text for others.

Green’s pet peeve seems to be that brilliant researchers are being put to work on boring consumer stuff: “Tech companies are kind of sucking up all the talent and putting them on Assistant or Alexa or something.” It’s a common refrain in frontier tech like AI and robotics.

Finally, Green said, “it’s my great hope that we can close this circle and get into book translation as we go on. It’s less lucrative work but it’s the third part of the vision. If we’re able to, it’s a choice where we’ll feel like we’ve done something meaningful.”

Although it may start out as support documents for apps and random government contracts, the types of content and markets amenable to Lilt’s type of human-in-the-loop process seem likely to only increase. And a future where AI and people work in cooperation is certainly more reassuring than one where humans are replaced. With translation at least, the human touch is nowhere near ready to be excluded.


The WT2 in-ear translator arrives in January, with real-time feedback coming soon


Timekettle was eager to show us the progress it’s made on the WT2 since it first showed us its wearable translation device at TechCrunch Shenzhen, this time last year. Unlike their 3D-printed state at last year’s event, the crowdfunded earpieces are now ready to ship.

They’ve already started going out to early backers and will begin shipping in January to those who pre-order now. The hardware is quite solid. The setup looks a bit like an oversized AirPods case that snaps together magnetically. The idea is to pull it apart and hand one side to the person you want to talk to.

You choose the language via the app and each of you puts one in your ear. The two translators are indistinguishable, but for a small line (the “eyebrow”) above the light-up word-bubble logo used to identify the second unit.

It’s a clever take on wearable translators like the lukewarmly received Google Pixel Buds. The idea is to create a translation product that allows wearers to actively engage one another through eye contact and body language — which remain important signals even when you don’t share a language.

It’s an interesting point of friction, however. In plenty of situations, it’s probably a bridge too far to ask a stranger to jam your earpiece in their ear. For, say, business situations, on the other hand, it could ultimately prove a useful tool.

For the former, the company’s got other methods to interact with the product, including app-based communication. There’s also a mode more akin to a walkie-talkie, in which the speaker taps the logo to talk. This bit was designed to help avoid picking up ambient noise.

Overall, I was pretty impressed with the experience. The translation isn’t perfect, as evidenced by the above transcript from my conversation with the company’s CEO. But given the ambient noise, a somewhat spotty cellular connection and the fact that my conversation partner insisted on walking around, the WT2 performed admirably.

At present, the translations are somewhat delayed. The earpiece waits for you to finish speaking for a few seconds and then offers the translation in the other ear (as well as yours, to help you learn the language, apparently). The company told me that it plans to offer closer to real-time translation around launch.


Google Docs gets an AI grammar checker


You probably don’t want to make grammar errors in your emails (or blog posts), but every now and then, they do slip in. Your standard spell-checking tool won’t catch them unless you use an extension like Grammarly. Well, Grammarly is getting some competition today in the form of a new machine learning-based grammar checker from Google that’s soon going live in Google Docs.

These new grammar suggestions in Docs, which are now available through Google’s Early Adopter Program, are powered by what is essentially a machine translation algorithm that can recognize errors and suggest corrections as you type. Google says it can catch anything from wrongly used articles (“an” instead of “a”) to more complicated issues like incorrectly used subordinate clauses.

“We’ve adopted a highly effective approach to grammar correction that is machine translation-based,” Google’s VP for G Suite product management David Thacker said in a press briefing ahead of the announcement. “For example, in language translation, you take a language like French and translate it into English. Our approach to grammar is similar. We take improper English and use our technology to correct or translate it into proper English. What’s nice about this is that language translation is a technology that we have a long history of doing well.”
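To make that framing concrete, here's a deliberately tiny illustration. Google's system is a learned neural model; the version below fakes the same idea with a hand-built phrase table, which is closer to old phrase-based translation, but it shows what "translating improper English into proper English" means in practice. Everything here (the corpus, the function names) is hypothetical:

```python
# Toy illustration of "grammar correction as translation": the model's
# training data is pairs of (improper, proper) English, just like a
# translation system's parallel corpus of (French, English) pairs.
# A real system learns a neural seq2seq model; we fake it with a
# phrase table built from the pairs.

parallel_corpus = [
    ("an user", "a user"),
    ("she don't", "she doesn't"),
    ("more better", "better"),
]

# "Training": build the phrase table from the parallel data.
phrase_table = dict(parallel_corpus)

def correct(sentence: str) -> str:
    # "Decoding": rewrite each known improper phrase into its
    # proper-English translation.
    for bad, good in phrase_table.items():
        sentence = sentence.replace(bad, good)
    return sentence

print(correct("she don't need an user manual"))
# she doesn't need a user manual
```

The appeal of the learned version over a table like this is exactly what Thacker describes: it generalizes to errors it never saw verbatim, the way a translation model handles sentences it was never trained on.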

Because we haven’t seen this new tool in person, it’s impossible to know how well it will do in the real world, of course. It’s not clear to me whether Google’s service will find issues with punctuation or odd word choices, something that tools like Grammarly can check for.

It’s interesting that Google is opting for this translation-based approach, though, which once again shows the company’s bets on artificial intelligence and how it plans to bring these techniques to virtually all of its products over time.

It’d be nice if Google made this new grammar checker available as an API for other developers too, though it doesn’t seem to have any plans to do so for the time being.

