Nice research!
I'm curious though about your choice of winners.....
Looking at your benchmark:
User: Hi, I'm Dan. It's nice to meet you.
One issue here is that this input will very likely be matched directly, triggering a fixed response from most bots (with the name variable thrown in).
On that basis, I'd say real 'cleverness' would come from responding with different words (as in your Gold example) or, ideally, from going further within the context.
Gold Response: Hello Dan, I'm BoBo. Glad to meet you too.
I like the fact that your Gold response has 'glad' instead of 'nice', I like the inclusion of 'too', and I like that the bot understands the two parts of the input.
What about if the bot said this, recognising that this is a common input:
Diamond Response: Bobo here, and I'm glad to meet you too. Where are you from, Dan?
The difference (other than an arbitrary change to the initial response) is that the bot shows it has understood the context: 'getting to know you'.
All I want to say here is that when the input is common and therefore predictable (I'm sure most bots will have, here in Verbotese,
Hi * I'm [youname]. How are you * as a whole input), it is much harder to show intelligence.
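To make the point concrete, here is a rough sketch of how such a whole-input rule behaves. It is not Verbot's actual syntax or engine; it is a Python regex stand-in, and the names GREETING_RULE, REPLY_TEMPLATE and canned_reply are made up for illustration. The reply text is borrowed from the Gold example.

```python
import re

# Hypothetical stand-in for a wildcard rule like "Hi * I'm [youname] *".
# This is NOT real Verbot syntax, just a regex that behaves similarly.
GREETING_RULE = re.compile(r"^hi\b.*?\bi'm (?P<name>\w+)\b.*$", re.IGNORECASE)

# Fixed reply with the name variable thrown in (text taken from the Gold example).
REPLY_TEMPLATE = "Hello {name}, I'm BoBo. Glad to meet you too."


def canned_reply(user_input):
    """Return the fixed reply if the whole input matches the rule, else None."""
    match = GREETING_RULE.match(user_input.strip())
    if match is None:
        return None
    return REPLY_TEMPLATE.format(name=match.group("name"))


if __name__ == "__main__":
    print(canned_reply("Hi, I'm Dan. It's nice to meet you."))
```

Any bot with a rule like this produces a Gold-style answer without understanding anything, which is why a common input tells you so little about the bot.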
My real-life answer would probably be "And you." ... which would make me less intelligent than ALICE (although that may be true!).
If I had to pick winners based on that input alone, mine would be Bildgesmythe or Jeeny, because I think the bots show more 'getting' of the context (that it is just an elaborate 'hello').