Those translations are rather bad, at least for languages I use. It appears to use the same algorithm as the free translate service.
Developers have the option to buy professional translation inside the Play Store, and some probably do for mainstream languages. I'm not sure if you can tell whether a developer has used it (except by judging the quality of the translation).
From the link I posted, "There will be a note above the translation explaining that the translation has been done automatically, and an option to return to the default language." - so presumably if that's not present it was professionally translated.
The article talks about the Chrome webstore. I was replying to a comment about the Play store so I used a link for that, but I mentioned Chrome also because the article of this discussion is about auto-translated Chrome store descriptions.
Under the subheading "Can Google Translate Be Used To Create Auto-Translated, Non-Spam Content?" they discuss a Chrome Web Store app [0] and compare the Google Translate version to the auto-translated version. Indeed, the proof is in the article too: you can change your language in the settings (there's a picture in the article) and see the descriptions translated.
Consider also that Google has context when they translate for their stores, but they can only infer context from the content of translations on Google Translate.
I had the same thought: it's clear when translating a game description that the name of the game is a noun in whichever gender is appropriate for proper names of software in the target language.
Has anyone done any more widespread analysis of "auto"-translated Play Store descriptions versus those available through Google's public translate service? It would be interesting to see how much is contextual and how much suggests a different set of rules, training data, or algorithm.
Of course, you can't feed context into the public tool.
It's probably the same one, tuned for the specific issues (not translating the name, "fixed" genders and terms/translations for specific words, context clues, etc)
If I'm not mistaken Google uses translations as well when matching content on your site to serve AdSense content.
This would be a reasonable guess. Statistical machine translation models can be made a lot more accurate than the general models (i.e., Google Translate on the web) if you can pick a single specific domain, and use a focused training text corpus and terminology; and this improvement can be achieved without any improvements to the underlying algorithms.
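As a toy illustration of that domain-focusing idea (hypothetical code, not anything Google has published; the tables and names are invented): a small domain-specific phrase table consulted before a general one lets domain terminology win without changing the lookup algorithm itself.

```python
# Hypothetical sketch: consult a domain phrase table before a general
# one, so domain terminology wins without changing the algorithm.
GENERAL = {"play": "jouer", "store": "magasin", "free": "libre"}
APP_STORE_DOMAIN = {"play store": "Play Store", "free": "gratuit"}

def translate(phrase):
    words = phrase.lower().split()
    out, i = [], 0
    while i < len(words):
        for span in (2, 1):  # try two-word phrases first
            chunk = " ".join(words[i:i + span])
            if chunk in APP_STORE_DOMAIN:
                out.append(APP_STORE_DOMAIN[chunk])
                i += span
                break
            if span == 1:  # fall back to the general table
                out.append(GENERAL.get(chunk, chunk))
                i += 1
    return " ".join(out)

print(translate("free Play Store"))  # gratuit Play Store
```

Here "free" becomes "gratuit" (no cost) rather than "libre" (freedom), and the product name is kept untranslated, purely because the domain table was checked first.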
I always assumed Google has two kinds of translation: the online kind we know from the web, which is fast but simple, and an offline kind that is much more realistic but slow. Other companies have achieved very good offline translations (often by employing professional translators who translated sentences between languages, building databases for automated translation), and I see no reason why Google wouldn't be able to create or license a similar one.
This is wildly inaccurate, unless you include rule-based translation, which is built on hand-crafted models.
All statistical machine translation, of which Google Translate is an example, is based on professional translations. As far as I know, all published methods rely on aligning them sentence by sentence.
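To give a feel for what sentence-by-sentence alignment involves, here is a drastically simplified, length-based sketch in the spirit of Gale-Church alignment. Real aligners use a proper statistical cost model and handle one-to-two and two-to-one merges; everything here (the cost function, the skip penalty, the example sentences) is illustrative.

```python
# Toy length-based sentence aligner: translated sentences tend to have
# proportional lengths, so pair sentences by minimizing a length-ratio
# cost with dynamic programming, allowing sentences to be skipped.
def align(src, tgt):
    def cost(s, t):  # how far the length ratio is from 1.0
        return abs(len(s) - len(t)) / max(len(s), len(t))

    SKIP = 1.0  # penalty for leaving a sentence unaligned
    n, m = len(src), len(tgt)
    INF = float("inf")
    # dp[i][j] = (best cost, backpointer) for aligning src[:i] to tgt[:j]
    dp = [[(INF, None)] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0.0, None)
    for i in range(n + 1):
        for j in range(m + 1):
            base = dp[i][j][0]
            if base == INF:
                continue
            if i < n and j < m and base + cost(src[i], tgt[j]) < dp[i + 1][j + 1][0]:
                dp[i + 1][j + 1] = (base + cost(src[i], tgt[j]), (i, j, "pair"))
            if i < n and base + SKIP < dp[i + 1][j][0]:
                dp[i + 1][j] = (base + SKIP, (i, j, "skip"))
            if j < m and base + SKIP < dp[i][j + 1][0]:
                dp[i][j + 1] = (base + SKIP, (i, j, "skip"))
    pairs, state = [], (n, m)  # walk back to recover the 1-1 pairs
    while dp[state[0]][state[1]][1] is not None:
        i, j, op = dp[state[0]][state[1]][1]
        if op == "pair":
            pairs.append((src[i], tgt[j]))
        state = (i, j)
    return pairs[::-1]

en = ["The cat sat on the mat.", "Note.", "It rained all day."]
de = ["Die Katze sass auf der Matte.", "Es regnete den ganzen Tag."]
print(align(en, de))  # "Note." has no counterpart and is skipped
```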
The article says Yiddish is "spoken mainly by the Jewish ultra-orthodox community that refrains from using computers." Well, that's half true: (ultra-)orthodox Jews are a major part of the Yiddish-speaking world, but the idea that they do not use computers is ridiculous. I wonder where this idea comes from... You can even Google it to learn that some rabbis advocate that their members censor their internet access, which implies they do have computers. They even post on YouTube.
I believe that's kind of a fallback position: now that they've lost the battle against computers, the second-best option is to try to figure out what to do with them.
At least as of a few years ago there was an active campaign to discourage the use of computers and especially the internet in haredi communities: http://www.independent.co.uk/news/world/middle-east/rabbis-r.... According to that article, the Vizhnitzer Hasidim even threatened to expel any children whose homes had an internet connection at all from attending the community's schools. And there were some high profile stunts where rabbis smashed smartphones and similar.
It's also commonly believed that the Amish, Mennonites, and related sects don't use computers. It's true for some parts of those groups, but not the whole groups (source: I'm a Mennonite.)
True, the American form is a bit different. The version of Haredi Judaism in New York is somewhat more moderate, in the sense that it engages with the modern world (even if, perhaps, only out of necessity). Unlike in Israel, where many Haredim reject secular work and devote themselves exclusively to Torah study, Haredim in NYC do in fact do things like run camera stores.
There are also a good number of Orthodox Jews in the U.S. who aren't Haredi at all, mostly under the "Modern Orthodox" label, a position that embraces Orthodox Judaism theologically but doesn't follow the Haredi cultural rejection of modern society, e.g. modern clothing, the science & philosophy of the Enlightenment, etc. One key part of Haredi culture is not only Orthodox Judaism, but a wholesale rejection of the Jewish Enlightenment and modernity, which they hope to insulate their community from by building a parallel society separate from what they see as modern degenerate society. That tends to lead to suspicion of new technologies, especially new forms of communication that might short out the insulator.
The only web site that has open hours. I find it amazing that half the time I look for something there, the web store is closed for some Jewish holiday, of which there are many.
Not the only one. If you access them from a German IP address, most game download services will only let you buy games that are rated 18+ in Germany between 10pm and 6am CET/CEST or some such time range. (I found this out because I have a VPN tunnel that terminates in Germany)
Also, Yiddish is basically German but written with the Hebrew character set. The explosion of Yiddish translations is probably simply because the Translate team at Google just got round to implementing Yiddish translation, which was relatively simple for them as they only needed to combine German translation (already mature code) with the Hebrew character parsing/writing code.
The public Google Translate tool is trained on data from the internet (and Google Books), and its intended use is translating stuff that you find online (on the open web).
Google Play Store translations are trained on data from ... tada ... the Google Play Store (and probably editorial translations); their intended use is translating descriptions on the Play Store.
Using the Play Store translation logic on other texts will probably lead to inferior results.
Not working for Google, just my 2 speculative cents.
High-quality cross translations only take you so far; reading "crap" on the internet is a much bigger data source. And let's be honest, Google is damn good at reading crap on the internet. http://bit.ly/1hnKrbA
That's what statistical translation algorithms do - form associations between source and target text. How did "United States" get there? Because it was trained on a Latin/English phrase that contained "United States" in the English and probably something completely different in the Latin version, and the algorithm decided that would be the best match.
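A toy version of that association-forming (the sentence pairs are made up, and this is nothing like the scale or sophistication of a real system): count which target words co-occur with each source word across training pairs, then take the most frequent one. With noisy training data, an unrelated but frequent word wins.

```python
from collections import Counter, defaultdict

# Invented "parallel" data: the target side repeatedly mentions
# "united states" even though the source side never does.
pairs = [
    ("lorem ipsum", "united states report"),
    ("lorem dolor", "united states filing"),
    ("ipsum dolor", "annual filing"),
]

# Count target-word co-occurrences for each source word.
cooc = defaultdict(Counter)
for src, tgt in pairs:
    for s in src.split():
        for t in tgt.split():
            cooc[s][t] += 1

def best_association(word):
    # The most frequently co-occurring target word "wins".
    return cooc[word].most_common(1)[0][0]

print(best_association("lorem"))  # "united" - an association, not a meaning
```

Real systems use much better statistics than raw co-occurrence counts, but the failure mode is the same: the model can only report what the training pairs happened to contain.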
gives "It's a child in a candy", but if you then manually add a period to the end of the sentence, you'll get "Designers in the film which, unlike in the United States."
I remember learning in my machine learning class that translation is a solved problem; the reason it appears to be bad is that it is computationally expensive. There are different levels of translation, each of which is increasingly more difficult to compute. The lowest level is word-for-word translation, which is easy to compute but has poor overall quality. After word-for-word translation come syntax translation, then semantic translation, and finally interlingual semantic translation. I took that class a while ago, so forgive me if I'm not 100% accurate, but that is generally the issue: the higher-level the translation you want, the more expensive the computation.
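A minimal sketch of the lowest level, word-for-word translation (the tiny English-to-German lexicon is invented), which shows why its quality is poor: each word is looked up independently, so word order, agreement, and idiom are all lost.

```python
# Word-for-word "translation": independent lookups, no grammar.
LEXICON = {"the": "der", "dog": "Hund", "bites": "beisst", "man": "Mann"}

def word_for_word(sentence):
    # Look up each word in isolation; keep unknown words as-is.
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

print(word_for_word("the dog bites the man"))
# -> "der Hund beisst der Mann" - grammatical case ("den Mann") is lost
```

Even this trivial two-language example already produces ungrammatical German; the higher levels exist precisely to repair what independent lookups destroy.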
I don't think translation is even a solved problem for a human fluent in both languages. The ideal is to write what the original author would have written, if they were writing fluently in the target language. Sometimes that means they would have written something that doesn't have the same literal meaning. If a pun doesn't work in the target language, a fluent writer would have used a different pun, but how do you know which?
Even if you want to preserve literal meaning, an example that comes to mind, but which I can't quickly source: IIRC the original Metroid manual, in Japanese, referred to Samus with a gender-neutral pronoun. In English, you basically can't do that without calling attention to it. 'They' and 'he/she' are unusual enough to read awkwardly; 'it' and 'he' would be incorrect; and 'she' would give away the twist. Whatever you choose, you're losing something, and it's a judgment call as to what.
> The ideal is to write what the original author would have written, if they were writing fluently in the target language. [...] If a pun doesn't work in the target language, a fluent writer would have used a different pun, but how do you know which?
Umberto Eco, in Mouse or Rat?[1], describes two different approaches to translation (I'm paraphrasing here): source vs. target. Your example is a good one, but it actually only touches on grammatical differences. There are cultural differences too.
For example, if I'm translating a cricket-themed story (cricket the sport) from Australian English into Japanese, do I retain the cricket references and have the meaning be unintelligible to Japanese readers with no knowledge of cricket? This is what Eco refers to as source translation, which is closer to a literal translation.
Or I could change the cricket references to baseball, which is more appropriate in a Japanese context; Eco refers to this as target translation. The purpose here is to convey the feeling, more than to remain true to the literal meaning of the original text.
It's been surprisingly fluid over the history of English. One interesting bit to read about is the history of "singular they", i.e. the use of "they" in sentences such as "if a person wishes to gain access, they must enter the code". This was fairly common in pre-19th-century English, then fell out of favor in the 19th and early 20th centuries as grammarians considered it incorrect ("they" was deemed solely a plural pronoun), and now it's making a comeback as a gender-neutral singular pronoun. The construction "he or she" also has usage going back centuries, as a different approach to that. And as you note, the use of "he" as a stand-in pronoun is also traditional, but falling a bit out of favor lately. Overall I don't think there is one correct answer for how English deals with that situation; it varies across writers and eras.
While it's true that more accurate translation is more computationally expensive, and we do have good knowledge of the structures and algorithms for machine translation, it's far from being a solved problem, mostly because it requires a relevant multilingual corpus for training (see [1]).
That's also why translation performance varies greatly depending on which kind of prose you are translating (yes, there are attempts to translate poetry as well, see [2]).
There is a huge amount of potential information on the web but it's rarely well curated and the next big problem is how to extract it.
The lack of high-quality bilingual corpora is hardly the primary obstacle to considering automatic translation a solved problem. Even for language pairs for which plenty of "bitext" exists, current approaches to machine translation still face major challenges with regard to ambiguities, idioms, finely nuanced imagery, and other semantic subtleties, all of which pose hard but not insurmountable challenges to an expert human translator.
Machine translation is still one of the most funded areas in NLP, and the quality is still incredibly bad for many language pairs. DARPA has been running one machine translation grant program or another for more than a decade.
It is true that some of the best systems take a long time to translate sentences (on the order of 1 cpu-minute or more). But their quality is still not anywhere near human level for most languages.
Translation is one of the hardest ML/NLP problems, and it is far from being solved.
AFAIK, current state-of-the-art translation systems do not work with a layered approach as you described (morphology -> syntax -> semantics -> interlingua?). First, they do not use an interlingua. They use statistical models built from example translations (the translation model) and from the target language (the language model). They have recently started to include syntactic and semantic clues, but mostly by adding extra information to the models and the alignment-search process.
So computational complexity is not the real issue when it comes to translation quality. For most (if not all) languages, systems simply cannot reach the level of human translators yet. For languages that lack a large amount of example translation text, machine-generated translations are still terrible.
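The translation-model plus language-model combination described above can be sketched as a toy noisy-channel decoder: score each candidate translation t by the translation model P(source | t) times the language model P(t) and keep the argmax. All probabilities and phrases here are invented for illustration.

```python
# Toy noisy-channel decoder: argmax over candidates of
# P(source | target) * P(target).
translation_model = {  # P(source word | target phrase), invented
    ("bank", "riverbank"): 0.3,
    ("bank", "financial institution"): 0.7,
}
language_model = {  # P(target phrase) in context "sat by the ...", invented
    "riverbank": 0.6,
    "financial institution": 0.05,
}

def decode(source):
    candidates = [t for (s, t) in translation_model if s == source]
    return max(candidates,
               key=lambda t: translation_model[(source, t)] * language_model[t])

print(decode("bank"))  # riverbank: 0.3 * 0.6 beats 0.7 * 0.05
```

Note that the language model, not any semantic analysis, is what picks the contextually plausible reading, which is exactly why such systems struggle when context clues fall outside what the models have seen.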
I think it depends what you mean by "solved" and your perspective on AI in general. If you think that human brains operate on the same rules as everything else in the universe, then it is theoretically "solved" in that we know the path to get there but just haven't been able to practically do it (yet).
"purchase professional app translations through the Google Play Developer Console" "select a professional translation vendor"
That said, translation vendors tend to just translate automatically first and then have a human clean it up.