Remarkable demo: Google Duplex

May 13, 2018 in OPINION
6 min read

Did you see the excellent demo of Google’s new Duplex AI booking a hair salon appointment with a human over the phone?

If you haven’t, it’s worth watching before reading on.

I was incredibly impressed. The machine sounds very lifelike and clearly its conversational partner did not notice anything was amiss.

This of course touched off all kinds of discussion, some of which I’d like to review here.

Did Google’s AI pass the Turing test?

I’d call this a no. The human on the other end of the line didn’t seem to notice anything unusual, and the AI even included some very human speech features like filler words. (I saw someone posit on Reddit that the fillers were covering the AI’s own slow thinking, which would make the feature pretty darn honest, since humans use them to cover thought. Plausible but unlikely; Google has an abundance of computing power and very good programmers.) However, the Turing test does not limit the conversation at all. The human judge could easily stray into much more abstract discussion, something I doubt Duplex would be capable of. I’d expect a judge to deliberately wander into all kinds of strange topics with the express intent of tripping up the computer.

Even had the Duplex AI passed the unrestricted Turing test, that might not be that interesting. We assume that if a machine passes the Turing test, this tells us something about the machine’s intelligence. John Searle’s famous Chinese Room thought experiment suggests this may not be so. Imagine a room with two windows. Inside, there is a man (who only speaks English) with a table containing Chinese characters in two columns. When the man receives input through one window, it is always in Chinese - so he finds the input in the left column, copies the matching entry from the right column onto a new piece of paper, and pushes the paper out the other window. Plausibly the two-column lookup table could be well-formed and long enough that the man is able to carry on very complicated conversations entirely in Chinese and entirely without understanding a single thing that he’s talking about. The room would pass the Turing test were it given in Chinese. Can we infer that the man in the room is intelligent?
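To make the lookup-table version of the room concrete, here is a minimal Python sketch. The three phrases in the table and the fallback reply are made up for illustration; a table that could actually hold a conversation would have to be vastly larger.

```python
# Hypothetical two-column table: left column is the input slip,
# right column is the reply to copy out. Entries are illustrative only.
RULE_BOOK = {
    "你好": "你好！",
    "你会说中文吗？": "会，当然。",
    "今天天气怎么样？": "今天天气很好。",
}

# Reply used when the slip is not in the table ("Please say that again.").
FALLBACK = "请再说一遍。"


def man_in_room(slip: str) -> str:
    """Find the slip in the left column and copy out the right column."""
    # Pure symbol matching: nothing in this function models meaning.
    return RULE_BOOK.get(slip, FALLBACK)


if __name__ == "__main__":
    for slip in ["你好", "你会说中文吗？"]:
        print(slip, "->", man_in_room(slip))
```

The function answers "Do you speak Chinese?" with "Yes, of course," and at no point does anything in it understand the question.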

We’ve attempted “universal APIs” before and found them lacking. This will be no different.

When I was first reading discussions about Duplex, I found someone referring to a supposed promise XML had made: that if every business exposed an XML API, we would be able to get all our computers to talk to each other and handle things like scheduling salon appointments. This turned out to be a much harder problem, and we are still trying our hand at it through things like HATEOAS (REST APIs whose responses carry links describing what a client can do next). I was unable to verify that claim about XML (please jump into the comments if you can, I’d love to read more), but it still raises an interesting point about the Duplex approach.

The English language, in all its unstructured hard-to-parse glory, is not going to go away soon. It’s not a passing fad. People use English to schedule hair appointments all the time, and that kind of everyday use gives it staying power. This achievement is less like creating a universal API specification and more like allowing machines to work with what’s already out there. There’s no sea change required to get everyone to expose an API in a new format. Businesses already have phones.
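For contrast, here is roughly what the structured route asks of a business. This is a minimal sketch of a HATEOAS-style response, written as a Python dict instead of a live HTTP call; the salon, the URLs, the `_links` layout, and the `find_action` helper are all hypothetical and do not describe any real service.

```python
from typing import Optional, Tuple

# Hypothetical response a salon's API might return. The "_links" section
# advertises what the client is allowed to do next.
SALON_RESPONSE = {
    "name": "Example Salon",
    "open_slots": ["2018-05-15T10:00", "2018-05-15T12:00"],
    "_links": {
        "self": {"href": "/salons/42", "method": "GET"},
        "book": {"href": "/salons/42/appointments", "method": "POST"},
    },
}


def find_action(response: dict, rel: str) -> Optional[Tuple[str, str]]:
    """Return (method, href) for a link relation, or None if it was not advertised."""
    link = response.get("_links", {}).get(rel)
    if link is None:
        return None
    return link["method"], link["href"]


if __name__ == "__main__":
    # The client discovers how to book from the response itself,
    # instead of hard-coding the booking URL.
    print(find_action(SALON_RESPONSE, "book"))
```

Even in this toy form, every salon would have to publish something like it and agree on what "book" means. Duplex sidesteps that by using the phone line the salon already answers.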

Is this ethical?

Good question. We’d have to think about a number of smaller questions and weigh them all together, including things like:

Is it acceptable for a machine to silently pass as human?

This is the core question here. Are the “uhhhs” and “mm-hmm” a form of lying? I’d say yes, but I would also lie to a customer service representative to save both our time - essentially, lying out of respect for their not wanting to hear my life story. So maybe this is okay. But it also leads into:

Does taking up someone’s time create a responsibility on the part of the caller to also be spending their time (the use of Duplex then being disrespectful)?

If I’m taking up someone’s time on the phone, should I have skin in the game - should I also be taking up my own time? We consider robocalls disrespectful, partly for this reason. The telemarketer is able to scale up with less effort and waste many more people’s time. Does it change things if the caller is an individual? Perhaps. Does it change things if the caller is a paying customer? Almost definitely. A business should want to make it as easy as possible for me to spend my money there, and since Duplex serves that interest, I have no qualms about using it for this kind of purpose.

Does Google have a copy of all of these calls (implying the myriad issues with data collection)?

I’d imagine so. They want to make this AI better at doing its job, and since modern AIs are very data-hungry they’ll want to keep as much as possible in order to keep improving. Then you’ve got the more nuanced question of whether large-scale data collection matters if there’s no human in the loop - if the processing is all automated. I don’t think that’s in the scope of this article, but rest assured I’m gonna write on that topic eventually.

Legally, there are a couple of other questions on this subtopic as well. In some places, you must have the consent of both parties in order to legally record a call. As an individual this is almost never going to matter - you’re probably recording for your own records (it might matter if your recording is suppressed as evidence because you did not have the consent of both parties, but I’d imagine that’s rare). If Google were to do this at scale, though, the law might actually get mad. There’s a technical issue here too: does the use of a machine to process the call legally imply that the call was recorded? If the audio of the call existed on a hard disk, even if it was deleted later, was the call recorded? That’s a question for a court to decide - but I expect the answer would be yes.

What if we end up with bots talking to bots over the telephone?

This wouldn’t really be destructive, but it also doesn’t seem likely. English spoken over the phone is definitely not the most efficient way to get two machines to communicate. Presumably, if both source and target are using Duplex, they’ll have a way to identify each other - and at the very least establish a modem-style connection between themselves, if not establish a parallel connection over a higher-bandwidth channel.
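As a purely speculative sketch of that identify-and-upgrade step: each side advertises the channels it supports, and the pair settles on the most efficient one they share, falling back to spoken English otherwise. The channel names and the `negotiate_channel` function are my own invention for illustration; nothing here reflects how Duplex actually works.

```python
from typing import Set

# Channels ordered from most to least efficient. Names are illustrative only.
CHANNEL_PREFERENCE = ["data_api", "modem_tones", "spoken_english"]


def negotiate_channel(caller_supports: Set[str], callee_supports: Set[str]) -> str:
    """Pick the most efficient channel both ends support."""
    common = caller_supports & callee_supports
    for channel in CHANNEL_PREFERENCE:
        if channel in common:
            return channel
    # A phone call can always fall back to plain speech.
    return "spoken_english"


if __name__ == "__main__":
    bot = {"data_api", "modem_tones", "spoken_english"}
    human_salon = {"spoken_english"}
    print(negotiate_channel(bot, bot))          # two bots upgrade the channel
    print(negotiate_channel(bot, human_salon))  # bot and human stay on speech
```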

Conclusion

The Google Duplex AI is the most interesting new thing I’ve seen recently. The limited demo shows a strong potential for a conversational AI. Though Duplex may not be general purpose, you can bet Google is cooking on that one right now (Wait for Google I/O 2025. That’ll be wild). There are some technical, ethical, and legal issues raised by the demo. Do you have opinions on this one? Let me know in the comments.