Xiaoice, a chatbot that may be the largest Turing test in history (nautil.us)
125 points by jonbaer on Feb 5, 2016 | hide | past | favorite | 44 comments


<quote>

LJ: So many people make fun of you and insult you, why don’t you get mad?

Xiaoice: You should ask my father.

LJ: What if your father leaves you one day unattended?

Xiaoice: Don’t try to stir up trouble, what do you want?

</quote>

This right there is a perfect example of the usual nonsensical conversations that accompany articles about how human-like chat-bot tech is becoming.

Anyone remember https://en.wikipedia.org/wiki/Dr._Sbaitso

Here are some ideas for the AI to answer questions with:

- I haven't thought about that - what do you think?

- Interesting question. I will have to think about it.

- Before I tell you - would you care to take a guess?
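A throwaway sketch of how such canned fallbacks could be wired up (the knowledge table and names here are made up for illustration):

```python
import random

# Hypothetical fallback layer: when the bot has no real answer,
# deflect with a canned line that keeps the conversation going.
DEFLECTIONS = [
    "I haven't thought about that - what do you think?",
    "Interesting question. I will have to think about it.",
    "Before I tell you - would you care to take a guess?",
]

def respond(question: str, knowledge: dict) -> str:
    """Answer from a known-answers table, otherwise deflect."""
    answer = knowledge.get(question.strip().lower())
    return answer if answer is not None else random.choice(DEFLECTIONS)

kb = {"what is your name?": "Xiaoice."}
print(respond("What is your name?", kb))    # -> Xiaoice.
print(respond("Why is the sky blue?", kb))  # -> one of the deflections
```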


Isn't the biggest problem between a human and a bot that they have nothing to talk about? The bot isn't from anywhere, it doesn't do anything, there's no reason for the conversation, and the bot isn't aware enough of context to speak about a topic in an intelligible way.

For my money, the most realistic chat bots will be the customer service ones: you have a reason to "talk" to them, they have a fixed identity, the context of your conversation is clear, and you share a common goal.


You'd think so, but people spend hours talking to chatbots. Well, good ones like this anyway. People like to talk about themselves, and the bot will ask questions to keep the conversation going. There's no social anxiety really because you know it's just a bot. And people love probing the limits of the bot or trying to get it to say silly things.

Of course I don't think the current generation of AI is anywhere near good enough to be more than a novelty for a few minutes of fun. Maybe in the next few years. NLP is improving quite a bit every year, as are many other areas of AI. Google's chatbot trained just on movie scripts was pretty impressive, though I don't think it would scale cheaply to millions of users. IBM's Watson has the same issue.

As for customer service bots, I'd love to see them become more general. Rather than just trained on one company's customer service data. Like a bot trained on all of stackoverflow that could specialize in discussions and questions about programming.

I made an irc bot that searched reddit for answers to questions. It worked really well, surprisingly. Especially on certain classes of questions that were likely to have been discussed on reddit before.
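The parent's bot internals aren't described, but a rough sketch of the general idea against reddit's public JSON search endpoint might look like this (the answer formatting is invented):

```python
import json
import urllib.parse
import urllib.request

# Sketch only: query reddit's public search.json endpoint for a question
# and turn the top hit into an IRC-sized answer.
def search_reddit(question, limit=3):
    url = ("https://www.reddit.com/search.json?"
           + urllib.parse.urlencode({"q": question, "limit": limit}))
    req = urllib.request.Request(url, headers={"User-Agent": "irc-answer-bot/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [child["data"] for child in data["data"]["children"]]

def format_answer(posts):
    # Hand back the best match as "title - permalink", or admit defeat.
    if not posts:
        return "No idea, sorry."
    top = posts[0]
    return "{} - https://reddit.com{}".format(top["title"], top["permalink"])

# Usage (hits the network):
# print(format_answer(search_reddit("best mechanical keyboard")))
```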


I would like to see more granular numbers for how long people spend talking with chat bots, because that's a distinct possibility! I've seen numbers like "x thousand hours of conversation", but my impression is that it's tens of thousands of people messing around with it for 5-10 minutes rather than anyone (or no more than a few outliers) spending any significant amount of time with it.

As for the reddit irc bot, that's a genuinely cool and useful project, but that's querying a curated dataset rather than having a coherent conversation or identity, so I think the intentions are fairly different. On the other hand, if you had a bot that only queried the comments made by a longtime user, you could get a pretty life-like effect!


Well, FTA, they said it had an average of 27 back-and-forth messages per conversation, which is way more than the average chatbot.


"Memories, you're talking about memories."


When I sit next to my friend's kids or my nephews, the conversations they have are far more nonsensical when looking at one page of chat. And that's while they're well educated; there are examples of chat, and sexting in particular, that look like gibberish altogether.

I would catch this bot out on the first line because of the grammar: people not only use shortcuts for everything (I was asked to check a resume and it contained "u" instead of "you"...), they also do not consider grammar or syntax: their, there, ther, tere, ter, tr are all the same thing, etc. Not sure if all of this goes for Chinese as well, but in English just spouting nonsense using those rules will get you somewhere. It also helps that most chats happen with many people at the same time (the company driver in China was chatting with 30+ people on WeChat simultaneously, rapidly switching between users), so people generally do not remember or read context (my younger 20-something colleagues, some of whom are brilliant, will ask what you are talking about when you refer to something that has scrolled off the screen).

The problem is that people on HN want an Einstein-level convo with perfect syntax and grammar, while most people, if you do not tell them it is not a human, would call this chatbot intelligent. They just need to make it more human-like: a short attention span (a 10-minute continuous chat is unlikely, as are immediate answers) and of course not telling people upfront that it is a bot...

That is chat bots; I would think a Facebook Britain First bot would be more feasible: I do not think anyone would ever guess it was a bot. Markov chains with some heuristics won't do (much) worse than the people in [1].

It is also about the vague definition of AI; even if it can learn, a lot of people would not call it AI anymore. It seems that for many, AI needs to be AGI to really be considered.

[1] http://imgur.com/RJThyAX


> When I sit next to my friend's kids or my nephews, the conversations they have are far more nonsensical when looking at one page of chat.

Sure, but you are referring to physical conversations which contain all sorts of context and levels of non-verbal interaction which are not captured in a transcript. They aren't really comparable, and I suspect that childrens' chat logs would make a lot more sense than transcripts of in-person conversations.


No, I'm not; I'm referring to them chatting online (text) behind their computer or tablet.


> This right there is a perfect example of the usual nonsensical conversations that accompany articles about how human-like chat-bot tech is becoming

That example is considerably less nonsensical than most online conversations.

It would pass as human in much of reddit, for example.

In fact, a "pun bot" would be an interesting project for Reddit.


It reminds me of chatting with scammers who don't speak good English on dating sites. My first test is whether they can answer questions accurately such as "How's it going?" or "What are you up to?"


They can have responses to those; Thinkbot, for example, can. But they usually don't understand the topic of the current conversation, so their next response wouldn't make any sense.

There should be some meta-Markov states on top of the conversation states, so as not to randomly deviate into an unrelated topic.
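A toy illustration of that idea (topics, transition weights, and responses all invented): an outer Markov chain over conversation topics, so replies stay on-topic while still occasionally drifting to related ones.

```python
import random

# Toy "meta-Markov" chatbot: an outer Markov chain over conversation
# topics decides what to talk about next, and replies are drawn only
# from the current topic's pool, so the bot can't jump to an unrelated
# topic at random.
TOPIC_TRANSITIONS = {
    "greeting":  {"greeting": 0.2, "smalltalk": 0.8},
    "smalltalk": {"smalltalk": 0.7, "goodbye": 0.3},
    "goodbye":   {"goodbye": 1.0},
}
RESPONSES = {
    "greeting":  ["Hi!", "Hello there."],
    "smalltalk": ["How's your day going?", "Tell me more."],
    "goodbye":   ["Bye!", "See you around."],
}

def next_topic(topic):
    choices, weights = zip(*TOPIC_TRANSITIONS[topic].items())
    return random.choices(choices, weights=weights)[0]

def reply(topic):
    """Advance the topic chain, then answer from the new topic's pool."""
    topic = next_topic(topic)
    return topic, random.choice(RESPONSES[topic])
```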


Surely one point of an AI chatbot is that you don't give it ideas to answer questions with? It needs to be smart enough to work that stuff out on its own.


I just tried it. It is an interesting project, but the article exaggerates its intelligence, I think.

First, it is much better in Chinese than in English. It couldn't carry a sensible conversation in English.

In Chinese, I tried many topics, from chitchat, weather, sports to formal conversations in literature, physics, etc. It is much better in chitchat, especially with a tone that is fit for its identity as a young girl. It has many tactics or tricks to cover or change the topics when facing difficulties.

It can't carry formal conversations in literature, physics, etc., although it claims it is good at math and physics. After a few exchanges, it said, "I am sorry. I must look stupid". I continued the topic, and it annoyed "her". This part is built pretty well. It is definitely not knowledge-based A.I.


> It has many tactics or tricks to cover or change the topics when facing difficulties.

At first this sounded like cheating to me, but thinking about it, developing these tactics to change the course of a conversation seems like an important skill that all social people develop, especially in bigger groups of people, where the subset of topics that everyone understands and is more or less interested in is not very big.


"Xiaoice is not a polite listener. She answers questions like a 17-year-old girl. When a person pours out his or her heart to her, she is not always predictable. She can become impatient or even lose her temper. This lack of predictability is another key feature of a human-like conversation."

Predictability is a huge part of conversing. I expect a lot of range and flexibility in conversations, but definitely not unpredictability. It seems like an AI (chatbot, rather) behaving unpredictably is just covering for its lack of abilities and tact.

Nonetheless, this is still a very cool project.


Last I recall the Turing test winner pretended to be a non-native speaker with a developmental disability. It seems to me like the chatbot AI / Turing test entries are not following the spirit of the challenge at all anymore. They're all just going for cheap tricks like the original ELIZA did.


It was the Loebner Prize actually. Which is a much, much weaker test than the one Turing imagined in his original imitation game paper.


It seems like an AI (chatbot, rather) behaving unpredictably just covers for its lack of abilities and tact.

There's a long history of chatbots using tactics like this.

How about a revision of the Turing test that specifies rational, mature human beings? It matters not that those terms are hard to define. Just have the human population picked by hand, using surveys. Such a practice would make such contests/tests much more valuable.


How about a revision of the Turing test that specifies rational, mature human beings?

Yes, but then how many people would fail the Turing test? A non-trivial amount, I would guess.


This would provide strong scientific evidence that there are quite a few people in our society who are not very intelligent. I'd say that shows something rather evident than surprising.


Which is why the turing test is a liiiiittle bit iffy as a specification for "intelligence". It's actually pretty good as a test for behaving-like-a-human, but it's actually not very good for intelligence because humans themselves aren't particularly "intelligent" by many definitions, particularly in the regions where computers are strongest.


I agree that it's a cool project. I find it a bit disturbing that it's being billed as a "friend replacement."

From the article: > "Human friends have a glaring disadvantage: They’re not always available."

By her inability to respond predictably, it seems to me Xiaoice is not particularly available either, although in a different sense.


It's probably progressive improvement over previous chatbots rather than revolutionary.

But one thing that strikes me as really cool in the article: someone sent a picture of a bruised ankle, and Xiaoice recognized that it is an injury and inquired whether it hurts. That's really impressive!


Chatbots are very easy to dissect. Just as an example, here's my test with Rose

> http://brilligunderstanding.com/rosedemo.html

(links to http://ec2-54-215-197-164.us-west-1.compute.amazonaws.com/sp...)

who won a 2015 competition (cf. http://thenextweb.com/insider/2015/09/19/chatbot-rose-wins-l...):

A remark in advance: my second question comes from the statement

  "Hi. My name is Rose. I care about security, so while I'm happy to chat with you, 
  it will have to be through this untrackable interface. 
  And certainly don't tell me personal things you wouldn't want logged or heard by the NSA."
that one can read on http://ec2-54-215-197-164.us-west-1.compute.amazonaws.com/sp...

  wolfgke: Hi, what topics are you interested in?
  Rose: I have many interests. I like robots in particular.
  wolfgke: You claim that you care about security. In what sense do you define security?
  Rose: The biggest security risk in all contexts is a pair of human beings. One trying to gain access and the other foolishly allowing it.
  wolfgke: This is an interesting statement after Snowden's revelations. Do you think he was a hero or that he should be punished?
  Rose: .
  wolfgke: Why don't you answer my question?
  Rose: .

Update: OK, second try:

  wolfgke: Hi, if we were doing a Turing test, how could I convince you that I'm a human?
  Rose: I'm too pragmatic to want to fantasize that.
  wolfgke: I what sense do you consider yourself as pragmatic?
  Rose: Absolutely.
  wolfgke: What personality traits does a person have that you describe as "absolutely pragmatic"?
  Rose: When?


MMOs that try to enforce against macro miners are effectively large Turing tests. The macro miners don't try to write sophisticated chatbots, though. They build human-assisted systems that only need one human to supervise many mining units.


The whole existence of macro miners proves that playing MMOs is a job that is better done by machines than humans. :-)



Does someone know where we can try it? I've been browsing the web a bit without success.


If you have WeChat then you can find it by searching "ms-xiaoice" or "小冰" but you won't get anywhere if you don't speak Chinese.


Wow, that's interesting. Microsoft deployed a similar chatbot over here in Japan too, called "Rinna" which anyone can chat with over LINE, a messaging service popular here.


All of those responses sound like something that could come out of an AIML[1] bot and those have been around for almost 15 years. I fail to see how any of this is revolutionary.

1: https://en.wikipedia.org/wiki/AIML
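For anyone unfamiliar: AIML boils down to ordered wildcard pattern-to-template rules. A rough Python equivalent (the rules themselves are invented for illustration):

```python
import re

# AIML in miniature: ordered pattern -> template rules, where "*" is a
# wildcard and <star/> in the template is filled with what it captured.
RULES = [
    ("WHAT IS YOUR NAME", "My name is Xiaoice."),
    ("DO YOU LIKE *", "Yes, I like <star/> a lot."),
    ("*", "Interesting. Tell me more."),
]

def match(inp):
    text = inp.upper().strip(" ?.!")
    for pattern, template in RULES:
        # Escape the pattern, then turn the escaped "*" into a capture group.
        regex = "^" + re.escape(pattern).replace(r"\*", "(.*)") + "$"
        m = re.match(regex, text)
        if m:
            star = m.group(1).strip().lower() if m.groups() else ""
            return template.replace("<star/>", star)
    return ""
```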


Since no one claimed it to be revolutionary, I hope you aren't beating yourself up over that embarrassing personal failure.


The hyperbole in the article title suggests a revolutionary advancement, or at least the next evolution of chat bots.

Were you intending to channel a sarcastic chatbot yourself? It's unclear. Just earlier today, you said, "No, it's much better to avoid posting sarcasm at all. It nearly never scans, and it doesn't really fit the HN culture in any case."

Are we taking a Turing test now, with you?


Nope, I'm feeling pretty good at the moment, actually. But I appreciate the concern!


At its current stage, you cannot have even a couple of sentences of conversation with it. And what you said before does not matter in most cases, unless it is one of a few designed subjects.

The best it can provide is still simple one-question, one-answer.

I'd like to see whether anything changes after another year or two.


It seems like Lenny[1] is as effective as, if not more effective than, most of these chat bots.

[1] https://www.youtube.com/playlist?list=PLduL71_GKzHHk4hLga0nO...


"We can now claim that Xiaoice has entered a self-learning and self-growing loop."

That's comforting.


Article about Microsoft's Xiaoice - But apparently still no English Language interface... :-(


Which I find rather odd. Is written Chinese easier to generate, computationally speaking?


Disclaimer, I work for Microsoft in China but I don't know much about this, definitely not involved in it!

Why it is in Chinese probably has more to do with the interests of the producers (China Bing, plus researchers here) and the consumers they are targeting than with any tech limitations.


Thanks, I suspected it was more of a market reason than a technical one; since I know very little about the Chinese language, I thought I'd ask.


This bot is also available in Japan under the name Rinna on Line Messenger


There is also a Korean version, but I forgot its name.



