Chatbots and voice assistants are becoming an ever larger part of our daily lives. One novel development is Duplex, introduced by Google in 2018 (Leviathan & Matias, 2018). Duplex is able to conduct a conversation independently in order to make a reservation or an appointment by telephone. The technology works so well that the other person on the phone assumes they are talking to another human being. This sounds very similar to the Turing Test, which was introduced by Alan Turing 70 years ago (Turing, 1950) and has since become a standard for testing a machine’s ‘intelligence’. The irony, however, is that by passing the Turing Test, Duplex has raised new ethical concerns.

In the following, I will discuss the effects of deception on the well-being of a person. First, I will explain how Google Duplex works, focusing on its anthropomorphic design. Next, I will take a Kantian and then a utilitarian view in order to analyze whether deception is ethically justifiable, presenting different ethical guidelines along the way. Afterwards, other technological advances that take speech synthesis one step further will be briefly examined. Finally, I conclude that Duplex undermines trust and authenticity, but that transparency is not always desirable either, and that more sophisticated ethical guidelines and recommendations are needed to adequately address these new ethical issues.

Google Duplex

In 2018 Google introduced its new AI voice system called Duplex (Leviathan & Matias, 2018). Duplex, however, is not a ‘normal’ voice assistant like Apple’s Siri (2010), Google’s Google Now (2012), now known as the Google Assistant, Microsoft’s Cortana (2015) or Amazon’s Alexa (2015) (Jadhav & Thorat, 2020, p. 534). Instead, it goes one step further by being able to initiate a phone call and speak on behalf of the user.

In order to use it, the user interacts with the Google Assistant and gives instructions; for example, the user may want to make an appointment with the hairdresser for a specific day and time. Once the Google Assistant has received all the relevant information, it processes the instructions and passes them on to Duplex. Duplex then calls the hairdresser and independently talks to the person on the other end. Once the task is completed, the Google Assistant informs the user whether or not the appointment could be arranged.

As of now, Duplex works only in very limited use cases, such as making restaurant reservations, booking appointments or checking opening hours (Leviathan & Matias, 2018). This makes Duplex a task-oriented chatbot (Hussain et al., 2019, p. 952), which nevertheless acts completely autonomously.

Design for anthropomorphism

Google demonstrated this technology at its Google I/O conference by playing recordings of phone conversations in which the person called could not tell whether they were talking to a machine. This is because Google deliberately designed the system to sound like a human in order to “make the conversation experience comfortable” (Leviathan & Matias, 2018). The technology is based on natural language processing and different text-to-speech engines that are able to understand the conversation and respond accordingly. As a result, Duplex can insert natural interruptions, so-called speech disfluencies, such as ‘hmm’ and ‘aah’, and add synthetic waits to match human-like latency, similar to how humans do not always respond immediately. In addition, Duplex is not only able to adjust its sentiment or tone to the context, but it can also mimic regional accents and make use of less formal words like ‘gotcha’ instead of ‘got you’ (O’Leary, 2020, pp. 49–50).

At this point one might think of the Turing Test (Turing, 1950), which functions as a kind of standard for artificial intelligence by testing whether a human can tell that he or she is interacting with a machine. Seen in this light, such anthropomorphic design is not a new phenomenon. Even ELIZA, the first chatbot or conversational agent, developed by Joseph Weizenbaum (Hussain et al., 2019, p. 947), already “maintains the illusion of understanding” (Weizenbaum, 1966, p. 43).

Google Duplex, then, has certainly passed the Turing Test. Paradoxically, this may be a big step forward for developers and researchers, but for critics and society at large it raises several ethical questions. In the following, I will focus primarily on the well-being of the person being called, as this ‘comfortable experience’ seems to be of great importance to Google, and I will take both a deontological or Kantian and a utilitarian point of view into account. It should be mentioned that the following discussion is rather forward-looking, as Google Duplex is an emerging technology that is limited in its use cases.

Ethical Considerations

Deception undermining well-being

The core of the technology seems to be deceiving, or persuading, the other person into believing that they are communicating with another human being. In other words, the system lies to disguise its nature. According to Kant’s categorical imperative, one should act only on maxims that one could will to become universal law. From this point of view, lying, and thus the technology, is morally wrong. Even if this claim can be contested in some situations from within the same camp (Carson, 2010, p. 67), it serves here as a starting point in its simplest form.

If everyone lied about their identity, there would be no trust left in society. Existing laws reflect this: impersonating another person through identity theft or document fraud is prohibited. This security threat also applies to fraudulent use of Duplex. Intruders could use the human-like machine voice to speak on behalf of others and thus undermine the authenticity of the conversation. This is troubling because the voice is a very personal human characteristic. It conveys not only the content but also information about the speaker, such as gender and approximate age, and above all the emotional state through which we relate to that person. In online communication, such as in online forums, people exercise some discretion in disclosing sensitive information and remain somewhat suspicious, because one never knows who one is talking to. This not-knowing is further reinforced by the emergence of text-based bots and the problems that have arisen with them, such as the widespread dissemination of fake news. As a result, the voice seems to be the last authentic way to identify someone unknown or to build trust in communication; deceiving someone at this level also touches on the value of respect for the person.

In order to address this lack of trust, California introduced the ‘Bot Bill’, which mandates the disclosure of bot usage (Hertzberg, 2018). Similarly, in 2011 the UK’s Engineering and Physical Sciences Research Council and the Arts and Humanities Research Council stated in a draft code of ethics that “it should always be possible to tell a robot from a human” (Kernaghan, 2014, p. 500). The IEEE (2017, p. 180) likewise recommends that “its [the AI’s] artifactual (authored, designed, and built deliberately) nature should always be made as transparent as possible”. All of these policies were available either before or almost at the same time as Google Duplex was introduced. It is therefore questionable why Google did not comply with these guidelines or at least make similar efforts.

A further concern about Duplex is a rise in automated and unwanted calls, as was the case with so-called robocalls (Lieto et al., 2019, p. 2577). Current technology is able to detect and prevent such robocalls, though not completely, and with today’s human-like machine voices it is becoming increasingly difficult to do so. Although we might feel comfortable communicating with bots to a certain extent, as the availability of several different voice assistants shows, we are not used to having a bot initiate and conduct the conversation, as is the case with Duplex. In other words, we do not expect a call from a machine; rather, we ‘call’ the machine. This uncanny or alien behaviour of machines further decreases the well-being of the person called.

A year later, several people who had been called by Duplex reported exactly this eerie feeling (Garun, 2019). This is because, in response to criticism, Google had adjusted Duplex to disclose its ‘nature’ at the start of a phone call (Statt, 2018). But as already mentioned, this raises questions, such as whether we would accept a call from a bot and, if so, whether we would trust the call (O’Leary, 2020, p. 52). In terms of well-being, this concerns information privacy, i.e. the recording of the conversation and the further processing of the data, but also accountability in case the reservation is missed or the conversation gets out of hand (Rivas et al., 2018, pp. 159–160). In view of this, it could be argued that it is justified to deceive the person called in order to maximize overall well-being, even if it limits that person’s autonomy, i.e. informed decision-making.

Deception promoting well-being

From a utilitarian standpoint, one is concerned with maximizing overall utility. For example, it would be acceptable to lie to Gestapo officers in World War II in order to save the lives of Jews. In the case of Duplex, the reason for deceiving the person called, as indicated by Google, seems to be to promote precisely that person’s well-being. The previously mentioned ethical guideline by the IEEE (2017, p. 175) not only recommends the disclosure of a bot’s nature, but also states the following:

  1. In general, deception may be acceptable in an affective agent when it is used for the benefit of the person being deceived, not for the agent itself.
  2. For deception to be used under any circumstance, a logical and reasonable justification must be provided by the designer, and this rationale should be certified by an external authority, such as a licensing body or regulatory agency.

In the 1970s, Masahiro Mori described the feeling people have when interacting with robots, now known as the uncanny valley (Mori et al., 2012). According to the uncanny valley hypothesis, we accept a certain degree of anthropomorphism in artificial objects, but only as long as we can clearly distinguish them from humans. As the object becomes more human-like, our perception of it grows stranger, because we attribute certain characteristics only to humans and not to machines (O’Leary, 2020, p. 47). The uncanny valley is crossed when we can no longer tell the difference, because we simply presume to be communicating with a human, as is the case with Google Duplex.

Ciechanowski et al. (2019) show that people feel more comfortable communicating by text with a less sophisticated chatbot than with one that speaks through a virtual avatar. Also, Rivas et al. (2018, p. 160) found that people trust text-based chatbots slightly more and are less worried by them than by “talkbots”. Consequently, by overcoming the uncanny valley, Google Duplex contributes to well-being by avoiding this eerie feeling.

Another aspect that contributes to well-being is the efficiency of the conversation. Since Google Duplex is aimed at making reservations, the person called does not want to stay on the phone too long and discuss irrelevant topics; they want to complete the given task in a reasonable amount of time and get on with their work. However, there seems to be a “transparency-efficiency tradeoff”, as described by Ishowo-Oloko et al. (2019). Their findings suggest that in certain settings humans cooperate with bots more effectively, but only as long as they do not actually know they are interacting with a machine. Once the true nature of the bot is revealed, this gain in efficiency is negated. Therefore, “transparency could hurt performance” (Ishowo-Oloko et al., 2019, p. 519) when it comes to achieving tasks.

Certainly, performance is not the only value that should be taken into account when evaluating a conversation. Depending on the context, we might prefer a time-inefficient but enriching conversation. But again, given the intended use of Duplex, the “thickness” of the relationship with its conversation partner differs from that of bots intended for companionship or friendship (Danaher, 2020, p. 126). In this sense, Duplex does not deceive the person called into believing they have a close relationship. Instead, the “anthropomorphic cues [are used] to encourage social acceptance and integration” (Danaher, 2020, p. 126).

Discussion and Outlook

Although different guidelines are available, they appear to be somewhat contradictory, as is the case with Ethically Aligned Design (IEEE, 2017): it proposes the disclosure of a bot’s nature and at the same time allows for deception. Moreover, the word ‘bot’ is too general in that it can be understood differently depending on context and by different people. Given the definitions of ‘bot’ and ‘online’ in the Bot Bill SB-1001 (Hertzberg, 2018), one might wonder whether an automatic email responder counts as a bot. In this regard, further work is needed to clearly address the new ethical implications in a growing landscape of technical possibilities.

This becomes clear when one considers not only the advances in speech synthesis in isolation, but also their convergence with other technologies. As a more artistic project, developer Matt Reed, for example, built the Zoombot so as not to have to attend every Zoom meeting himself (Reed, 2020). Other participants in the call can see that they are not talking to a real person. But given current computing power and increasing possibilities, such as 3D rendering and video manipulation, it does not seem too far-fetched to develop a real-looking bot that is no longer distinguishable from a human in appearance or voice. Further considerations arise with so-called open-domain (Adiwardana et al., 2020) or non-task-oriented bots (Hussain et al., 2019), which, unlike the task-oriented Google Duplex, can converse freely without a specific task. Finally, tools like Adobe Voco, “the Photoshop for voices” (Bendel, 2019, p. 83), further threaten the trustworthiness of digitally mediated voices. In the context of news and media, such convincingly real manipulation of media content is commonly known as deep fakes. These examples are not intended as a deterrent, but to raise awareness of the new dimensions of deception made possible by advancing technologies, and to underline the need for new ethical guidelines.

Conclusion

At first sight, Google Duplex seems to be a convenient technology that can make phone calls on behalf of the user. But because it has passed the Turing Test due to its anthropomorphic design, meaning that the other person cannot tell whether he or she is talking to a machine, new ethical considerations arise.

On the one hand, from a deontological or Kantian perspective, Duplex is morally wrong as it deceives the person at the other end of the phone. By doing so, it undermines trust and authenticity, which in turn decreases the well-being of the person called. As a consequence, different ethical guidelines recommend or enforce the disclosure of bot usage to ensure honesty. On the other hand, from a utilitarian point of view, Duplex promotes well-being by sounding so natural that it overcomes the eerie feeling one might have when interacting with a machine, also known as the uncanny valley. In addition, the performance of human–machine interaction seems to be higher when the human is not aware of the machine’s nature and instead believes they are interacting with another human being.

Given Google Duplex’s limited scope of executable tasks, namely making reservations, it can be argued that deception through anthropomorphism is acceptable. Some even claim that anthropomorphic design is necessary in order to make communication with bots socially acceptable (Danaher, 2020, p. 119). With increasing acceptance, however, the range of applications grows, and with it further ethical consequences; it is therefore very likely that deception becomes unacceptable once the context or use case changes. In general, when it comes to transparency, there is agreement that an algorithm’s decision-making should be transparent, but not on whether a bot should reveal its nature (Ishowo-Oloko et al., 2019, p. 519).

A society in which the nature of one’s counterpart must be questioned again and again seems rather disturbing. This may remind us of Descartes, who did not trust his senses since they could deceive him. In the end, then, we must ask ourselves to what extent we accept bots as ‘fellow citizens’ and thus what kind of society we want to live in. As this brief and anticipatory discussion of Google Duplex shows, we need to develop more sophisticated guidelines to create ‘ethical bots’, or perhaps even ban their use in certain situations, especially when speech synthesis is combined with other technologies. Other values such as privacy, autonomy and responsibility should also be given greater consideration, as they too contribute to well-being.

References