Does Google Have A Secret “Translate” Service? (searchengineland.com)
146 points by Houshalter on May 3, 2014 | hide | past | favorite | 52 comments


The Google Play app translation he mentions isn't done automatically. Look at the announcement: http://android-developers.blogspot.com/2013/11/app-translati...

"purchase professional app translations through the Google Play Developer Console" "select a professional translation vendor"

That said, translation vendors tend to just translate automatically first and then have a human clean it up.


I think he was referring to the descriptions in the Play (and Chrome) stores, which are now automatically translated if translations aren't provided by the developer: https://support.google.com/googleplay/android-developer/answ...


Those translations are rather bad, at least for languages I use. It appears to use the same algorithm as the free translate service.

Developers have the option to buy professional translation inside the Play Store, and some probably do it for mainstream languages. I'm not sure if you can tell whether a developer has used it (except by judging the quality of the translation).


From the link I posted, "There will be a note above the translation explaining that the translation has been done automatically, and an option to return to the default language." - so presumably if that's not present it was professionally translated.


Looks like I missed that one. Thanks.


The link only talks about the Play store. Do you have proof that they also do it for the Chrome Web Store?


The article talks about the Chrome webstore. I was replying to a comment about the Play store so I used a link for that, but I mentioned Chrome also because the article of this discussion is about auto-translated Chrome store descriptions.

Under the subheading "Can Google Translate Be Used To Create Auto-Translated, Non-Spam Content?" they discuss a chrome webstore app [0] and compare the Google translate version to the auto-translated version. Indeed the proof is in the article too, you can change your language in the settings (picture in the article) and see the descriptions translated.

[0] https://chrome.google.com/webstore/detail/parking-panic/gopi...


Consider also that Google has context when they translate for their stores, but they can only infer context from the content of translations on Google Translate.


I had the same thought: it's clear when translating a game description that the name of the game is a noun in whichever gender is appropriate for proper names of software in the target language.

Has anyone done any more widespread analysis of "auto"-translated Play Store descriptions versus those available through Google's public translate service? It would be interesting to see how much is contextual and how much suggests a different set of rules, training data, or algorithm.


I think this is the right answer.

Of course, you can't feed context into the public tool.

It's probably the same one, tuned for the specific issues (not translating the name, "fixed" genders and terms/translations for specific words, context clues, etc.).

If I'm not mistaken Google uses translations as well when matching content on your site to serve AdSense content.


This would be a reasonable guess. Statistical machine translation models can be made a lot more accurate than the general models (i.e., Google Translate on the web) if you can pick a single specific domain, and use a focused training text corpus and terminology; and this improvement can be achieved without any improvements to the underlying algorithms.
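To make the domain-adaptation point concrete, here is a toy sketch (all phrase tables and probabilities are invented for illustration, not Google's actual data): the same source phrase gets a different best translation depending on whether the phrase table was estimated from a broad web corpus or only from app-store descriptions.

```python
# Toy illustration of domain adaptation in phrase-based SMT:
# the same source phrase ranks translations differently depending
# on which corpus the phrase table was estimated from.
# All entries and probabilities below are invented.

general_table = {
    # P(target | source) estimated from a broad web corpus
    "play": [("spielen", 0.6), ("Theaterstück", 0.3), ("abspielen", 0.1)],
}

app_store_table = {
    # Same phrase, estimated only from app-store descriptions,
    # where "play" almost always refers to media playback.
    "play": [("abspielen", 0.8), ("spielen", 0.2)],
}

def best_translation(phrase, table):
    """Pick the highest-probability target phrase, or pass through."""
    candidates = table.get(phrase, [])
    if not candidates:
        return phrase
    return max(candidates, key=lambda pair: pair[1])[0]

print(best_translation("play", general_table))    # spielen
print(best_translation("play", app_store_table))  # abspielen
```

Note that nothing about the underlying max-probability lookup changed between the two tables; only the training corpus did, which is exactly the point the comment makes.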


Exactly: Google Translate is a statistical translator.


I always assumed Google has two kinds of translation: the online one we know from the web, which is fast though simple, and an offline one that is much more realistic but slow. Other companies have achieved very good offline translations (often by employing professional translators who translated sentences between languages, and building databases for automated translation), and I see no reason why Google wouldn't be able to create/license a similar one.


Maybe both services run in different data centers and the statistical model is trained on a different corpus.

Google Translate: http://en.wikipedia.org/wiki/Google_Translate

Microsoft Bing Translator works very similarly: http://en.wikipedia.org/wiki/Microsoft_Translator


All machine translation is based on professional sentence-by-sentence translations.


Except Google Translate, which is just a statistical analysis of a huge corpus.


Yes, of texts available in multiple languages. Which were originally translated by humans.


This is wildly inaccurate.


This is only wildly inaccurate if you include rule-based translation, which is based on hand-crafted models.

All statistical machine translation, of which Google Translate is an example, is based on professional translations. As far as I know, all published methods rely on aligning them sentence by sentence.


The article says Yiddish is "spoken mainly by the Jewish ultra-orthodox community that refrains from using computers." Well that's half true: (ultra-)orthodox Jews are a major part of the Yiddish speaking world, but the idea that they do not use computers is ridiculous. I wonder where this idea comes from...you can even Google it to learn that some rabbis advocate for their members to censor their internet access, which implies they do have computers. They even post on YouTube.


I believe that's kind of a fallback position: now that they've lost the battle against computers, second-best is to try to figure out what to do with them.

At least as of a few years ago there was an active campaign to discourage the use of computers and especially the internet in haredi communities: http://www.independent.co.uk/news/world/middle-east/rabbis-r.... According to that article, the Vizhnitzer Hasidim even threatened to expel any children whose homes had an internet connection at all from attending the community's schools. And there were some high profile stunts where rabbis smashed smartphones and similar.

But now it does seem that they're moving towards being okay with computers as long as they don't include unrestricted internet access: http://forward.com/articles/184099/kosher-smart-phone-arrive...


It's also commonly believed that the Amish, Mennonites, and related sects don't use computers. It's true for some parts of those groups, but not the whole groups (source: I'm a Mennonite.)


Not only do Orthodox Jews have computers, they apparently have enough of them to sell me my last 5 laptops and desktops. (source: B&H Photo)


True, the American form is a bit different. The version of Haredi Judaism in New York is somewhat more moderate, in the sense that it engages with the modern world (even if, perhaps, only out of necessity). Unlike in Israel, where many Haredim reject secular work and devote themselves exclusively to Torah study, Haredim in NYC do do things like run camera stores.

There are also a good number of Orthodox Jews in the U.S. who aren't Haredi at all, mostly under the "Modern Orthodox" label, a position that embraces Orthodox Judaism theologically but doesn't follow the Haredi cultural rejection of modern society, e.g. modern clothing, the science & philosophy of the Enlightenment, etc. One key part of Haredi culture is not only Orthodox Judaism, but a wholesale rejection of the Jewish Enlightenment and modernity, which they hope to insulate their community from by building a parallel society separate from what they see as modern degenerate society. That tends to lead to suspicion of new technologies, especially new forms of communication that might short out the insulator.


The only web site that has open hours. I find it amazing that half the time I look for something there, the web store is closed for some Jewish holiday - of which there are many.


Not the only one. If you access them from a German IP address, most game download services will only let you buy games that are rated 18+ in Germany between 10pm and 6am CET/CEST or some such time range. (I found this out because I have a VPN tunnel that terminates in Germany)


I don't see the connection.


Also, Yiddish is basically German written with the Hebrew character set. The explosion of Yiddish translations is probably simply because the Translate team at Google just got round to implementing Yiddish translation, which was relatively simple for them since they only needed to combine German translation (already mature code) with the Hebrew character parsing/writing code.


mmm ultra orthodox, gotta love them

http://youtu.be/Bx2LgeGFW4A?t=33m44s


simple explanation: machine learning.

the public google translate tool is trained on data from the internet (and google books), and its intended use is for translating stuff that you found online (on the open web)

google play store translations are trained on data from ... tada ... the google play store (and probably editorial translations); their intended use is translating descriptions on the play store.

using the play store translation logic on other texts will probably lead to inferior results.

not working for google, just my 2 speculative cents


[deleted]


high quality cross-translations only take you so far; reading "crap" on the internet is a much bigger data source. and let's be honest, google is damn good at reading crap on the internet. http://bit.ly/1hnKrbA


Looking at that piece of Latin->English translation, the word "United States" caught my attention. How did it get there?

I tried to narrow it down. This is the minimal phrase that gives "United States":

https://translate.google.com/#la/en/Duis%20gravida%20orci%20....

Obviously it's an error, but what specifically makes you go from:

Duis gravida orci in quam tempor.

to:

Designers in the film which, unlike in the United States.


That's what statistical translation algorithms do - form associations between source and target text. How did "United States" get there? Because it was trained on a Latin/English phrase that contained "United States" in the English and probably something completely different in the Latin version, and the algorithm decided that would be the best match.
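A toy sketch of how such spurious associations arise from sentence-aligned training (the sentence pairs below are invented, not real training data): simple co-occurrence counting over a noisy parallel corpus will happily record an association between Latin filler words and "United States" if a single misaligned pair contains both.

```python
# Toy sketch of how spurious phrase associations arise from
# sentence-aligned training data. If a noisy web crawl aligns a
# Latin sentence with an unrelated English sentence mentioning
# "United States", co-occurrence counting records that pairing.
# All sentence pairs below are invented for illustration.

from collections import Counter
from itertools import product

# Pretend parallel corpus: (source words, target words).
# The second pair represents a misalignment from a noisy crawl.
corpus = [
    (["duis", "gravida"], ["designers", "film"]),
    (["duis", "gravida", "orci"], ["unlike", "united", "states"]),
]

cooccur = Counter()
for src_words, tgt_words in corpus:
    for s, t in product(src_words, tgt_words):
        cooccur[(s, t)] += 1

# "duis" now co-occurs with "united" just as often as with any
# legitimate translation, so a model trained on these counts can
# emit "United States" for Latin filler text.
print(cooccur[("duis", "united")])  # 1
```

Real alignment models (IBM Model 1 and successors) are more sophisticated than raw counting, but they are built on the same co-occurrence signal, so a single bad sentence pair can still leave a trace.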


Curiously, it translates to "It's a child in a candy" on my google translate session.


The presence or absence of a period is enough for Google Translate to create entirely different translations!

My link above doesn't have a period at the end. HackerNews cuts off the period from the end of the URL it seems.

So clicking on this link,

https://translate.google.com/#la/en/Duis%20gravida%20orci%20...

gives It's a child in a candy, but if you then manually add a period to the end of the sentence, you'll get, Designers in the film which, unlike in the United States.


omg, that's hilarious!


I wonder if the translation bots could make an app, and repeatedly change its description and translate it.


I remember learning in my machine learning class that translation is a solved problem. However, the reason it appears to be bad is that it is computationally expensive. There are different levels of translation, each of which is increasingly more difficult to compute: first word-for-word translation, then syntactic translation, then semantic translation, and finally interlingual semantic translation. As you can imagine, word-for-word is the lowest level: easy to compute, but of poor overall quality. I took that class a while ago, so forgive me if I'm not 100% accurate, but that is generally the issue: the higher-level translation you want, the more expensive the computation.


I don't think translation is even a solved problem for a human fluent in both languages. The ideal is to write what the original author would have written, if they were writing fluently in the target language. Sometimes that means they would have written something that doesn't have the same literal meaning. If a pun doesn't work in the target language, a fluent writer would have used a different pun, but how do you know which?

Even if you want to preserve literal meaning, an example that comes to mind, but which I can't quickly source: IIRC the original Metroid manual, in Japanese, referred to Samus with a gender-neutral pronoun. In English, you basically can't do that without calling attention to it. 'They' and 'he/she' are unusual enough to read awkwardly; 'it' and 'he' would be incorrect; and 'she' would give away the twist. Whatever you choose, you're losing something, and it's a judgment call as to what.


> The ideal is to write what the original author would have written, if they were writing fluently in the target language. [...] If a pun doesn't work in the target language, a fluent writer would have used a different pun, but how do you know which?

Umberto Eco, in Mouse Or Rat?[1], describes two different approaches to translation, (I'm paraphrasing here) source vs. target. Your example is a good one, but it actually only touches on grammatical differences. There are cultural differences too.

For example, if I'm translating a cricket (the sport)-themed story from Australian English into Japanese, do I retain the cricket references, and have the meaning be unintelligible to Japanese readers with no knowledge of cricket? This is what Eco refers to as source translation, which is closer to a literal translation.

Or I could change the cricket references to baseball, which is more appropriate in a Japanese context, and Eco refers to as target translation. The purpose here is to convey the feeling, more than remaining true to the literal meaning of the original text.

[1]: http://books.google.com.au/books/about/Mouse_Or_Rat.html?id=...


I believe that, technically, in English "he" is the correct gender-neutral pronoun. It just feels wrong, so people commonly try to find other options.


It's been surprisingly fluid over the history of English. One interesting bit to read about is the history of "singular they", i.e. the use of "they" in sentences such as "if a person wishes to gain access, they must enter the code". This was fairly common in pre-19th-century English, then fell out of favor in the 19th and early 20th centuries as grammarians considered it incorrect ("they" was deemed solely a plural pronoun), and now it's making a comeback as a gender-neutral singular pronoun. The construction "he or she" also has usage going back centuries, as a different approach to that. And as you note, the use of "he" as a stand-in pronoun is also traditional, but falling a bit out of favor lately. Overall I don't think there is one correct answer for how English deals with that situation; it varies across writers and eras.


As _delirium said, the singular "they" has been making a comeback and will likely earn its place in academic writing in a generation.


Until we get human level AI that's going to be impossible. But decent translation of "normal" writing is certainly possible and "good enough".


While it's true that more accurate translation is more computationally expensive, and we do have good knowledge of the structures and algorithms for machine translation, it's far from being a solved problem, mostly because it requires a relevant multilingual corpus for training (see [1]). That's also why translation performance varies greatly depending on which kind of prose you are translating (yes, there are attempts to translate poetry as well; see [2]).

There is a huge amount of potential information on the web but it's rarely well curated and the next big problem is how to extract it.

[1] See http://www.independent.co.uk/life-style/gadgets-and-tech/fea...

[2] http://googleresearch.blogspot.com/2010/10/poetic-machine-tr...


The lack of high-quality bilingual corpora is hardly the primary obstacle to considering automatic translation a solved problem. Even for language pairs for which plenty of "bitext" exists, current approaches to machine translation still face major challenges with ambiguities, idioms, finely nuanced imagery, and other semantic subtleties, all of which pose hard but not insurmountable challenges to an expert human translator.


Machine translation is nowhere near solved.

Machine translation is still one of the most funded areas in NLP, and the quality is still incredibly bad for many language pairs. DARPA has been running one machine translation grant program or another for more than a decade.

It is true that some of the best systems take a long time to translate sentences (on the order of 1 cpu-minute or more). But their quality is still not anywhere near human level for most languages.


Translation is one of the hardest ML/NLP problem and it is far from being solved.

AFAIK, current state-of-the-art translation systems do not work with a layered approach as you described (morphology → syntax → semantics → interlingua). First, they do not use an interlingua. They use statistical models built from example translations (the translation model) and from target-language text (the language model). They have recently started to include syntactic and semantic clues, but mostly as extra information added to the models and to the alignment-search process.

So computational complexity is not the real issue when it comes to translation quality. For most (if not all) languages, systems simply cannot achieve a human translator's level yet. For languages that lack a large amount of example translation text, machine-generated translations are still terrible.
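The translation-model/language-model combination described above is the classic noisy-channel formulation: pick the target sentence e maximizing P(e|f) ∝ P(f|e) · P(e). A minimal sketch, with all candidate sentences and probabilities invented for illustration:

```python
# Minimal sketch of noisy-channel scoring in statistical MT:
# choose the target sentence e maximizing P(f | e) * P(e),
# i.e. translation-model fit times language-model fluency.
# Candidates and probabilities below are invented.

candidates = {
    # candidate English output: (translation-model score, LM score)
    "the house is small": (0.30, 0.20),
    "the home is little": (0.35, 0.05),  # good TM fit, unnatural English
}

def noisy_channel_score(tm_prob, lm_prob):
    """Combine translation-model and language-model probabilities."""
    return tm_prob * lm_prob

best = max(candidates, key=lambda e: noisy_channel_score(*candidates[e]))
print(best)  # the house is small
```

The language model is what pulls the output toward fluent target-language text even when a more literal candidate scores higher under the translation model alone; real decoders search over vastly larger candidate spaces and use log-linear combinations of many features, not just these two.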


Most of your details are correct, but your first sentence is wrong. Translation is nowhere near solved.


> translation is a solved problem

If that's the case, can you link me to the source code? I have a couple of extra computers I can run it on.


I think it depends what you mean by "solved" and your perspective on AI in general. If you think that human brains operate on the same rules as everything else in the universe, then it is theoretically "solved" in that we know the path to get there but just haven't been able to practically do it (yet).


Duolingo's translation bot seems to do a much better translation job than Google's "public" tool, at least for Spanish.



