Bilingual speaker here: the German translation produced was of very poor quality compared to the English one; the German speaker was enunciating excessively clearly, sitting in a sound studio, and sticking to short, simple phrases with clear grammar.
From that point of view the demo was misleading - we heard an English speaker talking naturally and heard clear responses, but the other side of the conversation was quite broken, although possible to understand.
If both sides were speaking in real world settings the demo would have been honest but far less impressive to casual watchers.
Despite this misdirection I am still impressed with the amount it did manage to translate from the long, naturally-spoken English sentences.
Why moconnor's observation matters, based on my personal experience learning French:
- It's relatively easy to learn enough French to carry on a slow, clearly-enunciated conversation with a highly-cooperative speaker in a quiet room. Call it 350 hours of study, or B1 on the CEFRL scale: http://en.wikipedia.org/wiki/Common_European_Framework_of_Re...
- If you want to watch the French dub of [i]Game of Thrones[/i] in a noisy room, and actually follow the plot twists, it's a whole different game. Call it 1,000 to 2,000 hours of study and exposure, perhaps more for some people. If you want to listen to standup comedy, it's usually even worse.
In other words, it's surprisingly easy to establish basic communication, but you can break your heart trying to get really good.
Going by this experience, and by lots of experiments with Google Translate, I would predict that machine translation will suffer from similar challenges: It will be easy to get the point across if everybody cooperates and speaks clearly. But it will be a long time before you can speak idiomatically and casually, at full speed, and pretend the translation system isn't there.
I'm always a bit wary of these demonstrations; obviously the dialogue was rehearsed. It's hard to know how much has been swept under the carpet for the sake of the presentation. For all we know, the entirety of the machine translation could have been on playback. I'm willing to give them some credit and assume that isn't the case, but it's pretty obvious they didn't choose the worst-case scenario for a Skype conversation.
I hope they'll make a public beta soon, so that we can all try and see how this works in practice in the real world.
I know the speech recognition researcher who worked on this project. It is all very real, and these demo situations don't always represent ideal conditions.
I tried an early version of the speech-to-text component myself (1 or 2 years ago?) using an off-the-shelf microphone and a completely generic model (none of my speech in the corpus). It worked surprisingly well.
While it isn't Microsoft's version, here's something I made at a hackathon which daisy-chains Google's services to make something very similar. It's an extension of Google Hangouts and at the moment only works in Chrome, although I am working on improving it and making it less reliant on Google's technology.
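For context, the daisy chain described above is just three stages wired in sequence: speech recognition, text translation, and speech synthesis. Here's a minimal Python sketch of that pipeline; the three stage functions are placeholders for illustration only (the real project called Google's services, whose actual APIs are not shown here):

```python
# Sketch of a speech-translation daisy chain: audio -> text -> translated
# text -> audio. Each stage function is a placeholder; a real version would
# call actual speech-recognition, translation, and text-to-speech services.

def recognize_speech(audio: bytes) -> str:
    # Placeholder: pretend the audio decodes to a fixed English sentence.
    return "hello, how are you?"

def translate_text(text: str, source: str, target: str) -> str:
    # Placeholder: a tiny lookup table stands in for a real translator.
    table = {("en", "de"): {"hello, how are you?": "hallo, wie geht's dir?"}}
    return table.get((source, target), {}).get(text, text)

def synthesize_speech(text: str) -> bytes:
    # Placeholder: a real version would return synthesized audio.
    return text.encode("utf-8")

def translate_call(audio: bytes, source: str = "en", target: str = "de") -> bytes:
    """Chain the three stages, as the hackathon project described."""
    text = recognize_speech(audio)
    translated = translate_text(text, source, target)
    return synthesize_speech(translated)
```

The interesting engineering problems are all hidden inside the placeholders: latency of each hop, error compounding between stages, and streaming partial results instead of waiting for full sentences.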
> the German translation produced was very poor quality compared to the English one
They would have selected the best-performing language pair for this demo, so we should expect it to have done far worse with, say, English<->Mandarin.
What stands out is that they didn't pick Spanish as the other language.
It's pretty universally agreed that Spanish[1] is the easiest language for English speakers to learn, and Portuguese is in the same ballpark[2], but German is significantly harder[3]. (And Russian, Chinese, and Arabic would be way to the right on an exponential graph.)
I'm guessing that machine translation of English<->German, for some reason, must be easier than English<->Spanish.
[1] There's a fairly authoritative study on this which I can't find immediately.
[3] The difficulty of German vs Spanish is confirmed by an NSA (!) document that says that "Next to Vietnamese, German may be the most difficult for English-speaking students to learn for German has a difficult syntactical feature, the discontinuity of the predicate, which the others lack. Among French, Italian, and Spanish, there also seems to be only a slight difference in difficulty. It appears that these three are the easiest languages for English-speaking students to learn":
http://www.nsa.gov/public_info/_files/cryptologic_spectrum/f...
A little more than a year ago they did an impressive demo of English -> Mandarin. There were slight errors that sometimes flipped the meaning (but only slightly hindering, which made it more believable). This seems to be a first step toward productizing that research.
What you are discussing is called linguistic typology, and it is far from established fact that Spanish is the easiest. I studied the topic, though from the perspective of Arabic-language instruction for native speakers of English and other languages.
It might be up there, but I have also heard the same about Indonesian, which has a very simple grammatical structure. Chinese, in terms of morpho-syntax, is in some ways as easy or as difficult as English (tense, gender, and number are not more difficult than in English, in my opinion, having studied Arabic to fluency and Chinese at the beginner level).
To get back on topic, typologies like this are good for identifying which specific constructs will cause difficulty, not for ranking which language is easiest to learn.
Your NSA quote is a little out of context. The article you link to states that German is the fourth-hardest language from a list of five comparatively easy languages.
> Among Vietnamese, German, French, Italian and Spanish, Vietnamese may be the most difficult... Next to Vietnamese, German may be the most difficult
This doesn't amount to being significantly harder, particularly in light of the statement (that you quoted) that there is just one feature that makes it harder than the other three.
Speaking from personal experience (native English speaker, no prior difference in exposure to the two languages, simultaneous study of the two, similar teacher quality and curriculum), I found French harder than German. Although my single anecdote doesn't prove German to be easier, surely it suggests that the one I found easier couldn't be significantly harder.
While English <-> Spanish shares Romance cognates (generally speaking, higher-register English vocabulary comes from French, which shares a common ancestry with Spanish), English <-> German shares grammar.
I imagine the vocabulary isn't as difficult for machines to process as grammar.
English shares no direct ancestry with French. However, in 1066 England was invaded by the Normans, leading to the entire aristocracy and upper classes speaking Norman French, which caused a lot of French vocabulary to enter the English language. Most of these words are for stuff in higher registers, though. The basic vocabulary in English is entirely Germanic (it's nigh-on impossible to write an English sentence using only French-derived words), while much of the more advanced or formal vocabulary is French (or Latin or Greek).
The FSI[1] has ranked languages according to how easy it is for English speakers to learn. There are 10 languages in Category I (the easiest to learn)[2]:
Afrikaans, Danish, Dutch, French, Italian, Norwegian, Portuguese, Romanian, Spanish, Swedish
Maybe because they didn't want to make it so obvious that the "other language" algorithm is much poorer than the English one. A lot more people would pick up on that fact if they used Spanish, since a lot more people know Spanish.
Completely guessing - but I think the strict structure of German makes it easier for machines to translate. English and Spanish, on the other hand, break more of their rules than they stick to, which makes them more "human" / "natural" and easier to relate to.
There probably were too many people in the room who speak Spanish. The German part of the conversation is slightly scary, to some degree due to her facial expressions. I'd really like to know what somebody who doesn't know German thinks about the faces she makes.
I agree, the German output was horrible. The German speaker spoke like a German lesson 101 speaker. Very simple sentences and she over-pronounced everything. Nobody speaks like that.
The problem with building this is that the entire dev team, and anyone who demos it, has to know all the languages involved fluently. I highly doubt they all know German, Spanish, and Mandarin to such a high caliber that they can tell natural language from robot-translated garble.
So what do you get in the end? A video that is unimpressive to native speakers and leaves non-natives skeptical.
The fact that at the start of the demo she asks him if he knows German and he says no should have been a red flag for the impending bad demo.
"Who thought you could have fun with the Germans?" Really, Kara?
Anyway, obviously there were a couple of mistakes by the speech recognition, and they were taking care to speak very clearly, but still, I'm impressed. The future looks bright.
I hate her live appearances with a passion (can't speak to her articles). How can someone that annoying get the best interviews in the industry? As an example, in the landmark Bill Gates / Steve Jobs sitdown, all she did was annoy with pseudo-sexual innuendo, completely trivial questions, and played-up drama.
Kudos to Microsoft Research. It is very impressive, especially given that it was an actual long-distance live demo. Although that funny blooper from Vista's speech recognition [1] will always remain stuck in my mind.
I am guessing this is going to be the part of our future where the language barrier slowly becomes a thing of the past.
The most intriguing part of this demo was when Satya said that the machine learning system gets better at previously learned languages as new languages are introduced. That in itself is an accidental revolution.
As someone who is a native German speaker, I wasn't impressed. The translation to English was much better - because she spoke very clearly, slowly, and with simple grammar.
The translation was no better than, say, Google Translate. A mashup of Siri + Google Translate wouldn't have been any worse, and both technologies have existed for some years. This is, unfortunately, hardly a breakthrough - rather a 24h API hackathon project ;)
" A mashup of Siri + Google Translate wouldn't have been any worse, and both technologies exist for some years."
I have seen those already and they don't work nearly as well as this one. I guess the lady was trying really hard not to screw up the demo, and that might actually reduce the wow effect for German speakers.
And one that my friends and I put together as an extension for Google Hangouts back in April, during a 36 hour hackathon in LA. Microsoft was one of the sponsors too.
Actually, the Android Google Translate app already does this, and probably better. If you tilt the app sideways it will alternate between the two languages and voice input.
I did German in high school for three years or so. The contrast between how she had to speak German (ridiculously slowly, as if trying to teach someone pronunciation) and how the English speaker spoke (pretty much normally) must say something about their confidence in the German speech recognition.
In other words, this has come a long way since "Dear aunt, let's set so double the killer delete select all", but I've seen enough "automated translation breakthroughs" to remain highly sceptical.
Back in April, at a hackathon sponsored in part by Microsoft, my friends and I built something we called Unilingo, which was a working model of this idea! At the time we wondered why no other video chat applications did anything like this, but now it seems Microsoft has stepped up to the plate and added translation to Skype!
I believe Nadella was referring to a more general sentiment in the machine learning community about our lack of theoretical understanding of deep neural nets. I'd suspect the network still learns some higher-level features that exist across languages, which is why it could become better at English after learning German.
I know two sentences of German, and spotted one blatant mistake right in the first sentence: it's "wie geht's dir", not "wie dir's geht". Not impressed either.
Indeed. I wonder if they'll also want to call it Babel Fish? http://en.wikipedia.org/wiki/Yahoo!_Babel_Fish just confused me more, and I've now no idea who actually owns the trademark rights to that term. (It was AltaVista, bought by Yahoo!, spun-off into Bing Translate, which was partially bought by Microsoft, and ... I'm lost... )
Yahoo/AltaVista Babelfish and Google Translate (until 2007) were based on SYSTRAN software: http://en.wikipedia.org/wiki/Systran (a desktop version for offline usage is even available)
A few years back, when Google was working on the Google Wave project (the new email), they also did a similar kind of translation, but I think it was only at the text and document level. Microsoft has come up with some real problem-solving solutions, especially now that the world is more globally connected. Kudos to Microsoft Research.
I remember seeing an article a few years back about msft filing a patent for automatic language translation for instant messaging. Maybe that is why we haven't seen this done by anyone else?
Possibly - but maybe also because it apparently requires years and years of research to get to something that works properly. Not a lot of companies have the amounts of money and staff for that.
But it is:
a) less expensive than getting a human translator
b) faster than finding a human translator
c) a helpful aid for people who speak a little of each other's language, giving them additional support in understanding each other
d) still far from perfect
I wonder why people are so excited about this. It's voice recognition coupled with an automated translator and a text-to-speech tool. Sure, it all has to work fast enough, but the English-to-German translation is horrible and probably much worse than a Google Translate of the same sentence.
As a native German speaker, I would have been embarrassed to be a Microsoft employee advertising something like this as "revolutionary" - it's not!