Have you ever heard the Gutenberg Press and the Amazon Echo mentioned in the same breath before? In this episode, Bret Kinsella explains how they have more in common than you’d think. The invention of the printing press signalled the end of a purely oral tradition and now, centuries later, voice technology is bringing it back – and also tearing us away from our screens.
As the founder, CEO and research director of Voicebot.ai, Bret’s considered a global authority and the go-to spokesperson on voice assistants and AI. He’s also the host of the Voicebot Podcast and editor of the Voice Insider newsletter. If you’re a brand, Bret wants to know whether you’re in voice – and if not, why not? He also provides a new perspective on privacy issues and explains how voice assistants might radically change the workplace of the near future.
Paul Sephton: Bret, you’re a front runner in the voice industry and delivering keynotes and leading research for some of the biggest tech companies in the world, and yet when you started Voicebot, it must have been a slightly different landscape. What was it like when you first got into voice and how did you jump into it ahead of the curve?
Bret Kinsella: I started working in voice in 2013 with a client and we were working on voice interactive advertising on mobile. And in 2013, if you wanted to do that on mobile, you had to create your own voice assistant, essentially. You had to create your own automated speech recognition and matching and NLU in the backend. And in 2014 Amazon Alexa was launched in the fall and they had approached this company that I was working with and they were wondering if they would support this new Alexa ecosystem. And at first there were other things going on, so we weren’t focused on it. But about a year later, Amazon came back and I started doing some research for this client of mine in terms of what was going on in the industry, and I immediately recognized something that I had not noticed before, in that voice was becoming a platform shift.
So we had Alexa obviously coming up through the Amazon Echo. We already had Siri on the phone, but Siri had sort of stagnated. And then we had Google then starting to commit and planning to commit later that year in 2016 to its own platform, its own smart speaker platform. I immediately recognized something that was similar to what I’d seen in the 1990s around the rise of the web, when I was working in business to business e-commerce. So I wound up writing a couple articles because I know people in the industry for Advertising Week and Huffington Post, and they were just popular. People asked me where I got my information. This was 2016, it was very difficult to find market information. I said, “Okay, I can share all this information with these people who are asking me for it. I’ll just throw it up on a website, put some links up.” I thought that would probably be the end of it. Maybe I’d blog a little bit and watch the rise of the industry.
However, what happened the week I put up the website, which was September 15th, 2016, is Amazon Echo launched in Europe. The next week, Samsung bought Viv Labs for $200 million. That’s subsequently become the Bixby 2.0 product. And two weeks later, Google actually announced Google Home, and shortly after that, the acquisition of api.ai. So it really picked up. So in essence, this whole story, this origin story was trying to help some people find some information and then the market grew and then realizing that there was really a much broader need for information in the industry. That’s where we started writing articles and analysis and eventually we moved into research and we published five research reports last year. We’ll do probably between 9 and 12 this year. We’ve launched a podcast and these other things because what we found was that people just needed this information. They kept coming to us and asking for it because we had context about what was going on and what it meant.
Paul Sephton: We’re now in what you’re calling phase two of voice. So talk to me a little bit around what phase one was and the biggest changes you’ve noticed between when you first got into this topic and what you’re seeing in the industry now.
Bret Kinsella: Well, when I think about phase two, I’m thinking more around the natural development of technology adoption life cycles. But what I will say is that in 2016, 2017, even in the beginning of 2018 a lot of people saw voice as a smart speaker phenomenon, and they saw these new devices coming into the home. And it was a change. It was very different than the way we’d use voice in the past because of the far-field microphone. It was not a personal device. It had natural language understanding, so it was more than just voice navigation. It could understand what you’re doing, could deliver the right type of reaction, whether it be information or control of smart home. So that was really important. People saw that and obviously most people in technology do focus around devices because they see that as something that it’s manifested itself. What I think is starting to become clear to people now is that voice actually is device independent.
A lot of people think of it, in phase one in particular, and some people still have this view, that it’s about the smart speaker, and that’s been an important manifestation of voice. However, beyond being a catalyst, voice is a new interface across all surfaces, so it’s device independent, which makes it different from the other platform shifts that we’ve seen. It also has different capabilities behind it, and not to go into too much depth on that, but the other platforms, the web and mobile had programmatic back ends, so the developer would have to conceive of everything that it would allow a user to do and then it would give some sort of backend payload or response based on that request. Voice is different because anything can be asked so we have this AI capability which determines what people have asked for, but it’s not necessarily a programmatic response. They don’t have to anticipate everything that someone might ask for. The system itself can determine the best way to answer, and the answer today might be different than the answer tomorrow or next week and it might be different for you or for me, and these are all very different types of scenarios than we’ve seen in the past.
Paul Sephton: So one thing’s for sure, we’re going to see absolutely explosive growth in the coming years, and it makes me think about a concept which I think I first read about on Voicebot called the Gutenberg parenthesis. Can you tell me a little bit more about what this major, major change means for us.
Bret Kinsella: In 1440 Johannes Gutenberg created the modern printing press. Well, not modern now, but at the time the first manifestation of movable type and where you could have scale in terms of production and what we did is before that point we were focused on an oral tradition and oral communication was the primary way to disseminate information. Gutenberg came in with text. All of a sudden we had this other option and it was much more scalable because I didn’t have to be with you in order to spread my message. Very famous people like Martin Luther took advantage of that and the world has changed, and so much of our knowledge and communication is tied up in text. Now that has morphed over the last several years into symbols and images much more. That’s got everything from emojis to Instagram and the way we communicate as brands.
What we’re seeing with voice today is they talk about this idea of moving back to orality or the second orality, so the oral communication, and this idea of conversation again. So the important thing for brands, and this is where I spend a lot of my time working with big global brands, because they reach out to me all the time and ask what the meaning is, and I say, “Well, first of all, you have a very tactical response to this. We know that consumers are engaging in large numbers on Google Assistant, with Amazon Alexa, with Apple Siri. So the question is, do you have a presence there? Are you able to engage with them when they have a question about your brand or your product category, for example, and they ask one of these assistants? Are you an answer or is someone else answering for you about your brand?” And largely that’s the case. We’ve done a lot of research on, for example, voice assistant SEO. If you ask a question about 200 different brands that we analyzed, most often these questions are being answered by Wikipedia or maybe some other resources on the web, like Yelp. So can brands engage in that conversation? Can they control the messaging about themselves when people ask specifically about their brand?
And in large part today, they’re not. And the opportunity for them is to say, “Okay, yes, we must do this.” So I think of it as a competitive necessity for them to have presence on the platforms and spend time in the places where people are. Now over time, I believe that will evolve as well, and that they will not only be working on these platforms, but like Bank of America for example, they’ll have their own voice assistants out there in the market, or Mercedes-Benz with the MBUX. BMW is doing the same type of thing, and they want to have their own voice assistant as the primary interface for the things that are important to them so they can provide a better user experience primarily and then it will also communicate with these other consumer voice assistants. So they’ll be able to use those consumer voice assistants as a client of their own AI voice assistant conversational backend, and they’ll also be able to use it as a funneling tool to take people to their other properties, whether it be their other voice assistant, their mobile app, their website, but where they can create the rounded experience, the interactive experience they want to have with their consumers.
Paul Sephton: This big change has certainly stirred the branding side of things because just this year we’ve seen the likes of HSBC or MasterCard investing millions of dollars within their marketing strategies to develop what is called sonic branding. How do you think brands can utilize voice to better reach their audiences and what’s the best way for a brand right now to future proof themselves for this change which is coming in voice?
Bret Kinsella: Well, sonic branding is not new. It’s been around for a long time. Many people are familiar with the Intel chimes or United Airlines for a long time had Gershwin’s Rhapsody in Blue, and it’s always been beneficial, but it’s usually been part of their activity around advertising. It’d be in their television commercials or radio commercials and it was a way to create a signature that associated immediately with the brand and usually with these favorable associations. What’s happening now, since audio is becoming, once again, a bigger part of our life and conversation is becoming a bigger part of our life, audio branding is increasing in importance. Our audio brand, our sonic brand needs to be as integrated with everything we do as visual brand identity.
So just like we have a logo, for example, or we have a color palette for branding, we need to have a sonic fingerprint, an audio fingerprint, which might be a jingle or it might be just a sound or something like that. We might want to have a sound bed, which would be sort of… Think of a sound bed as the music track that plays under the voiceover or something like that. So there’s so many different things that come into play. But what I would say is going forward, brands are going to look at sonic branding as being as important as visual branding because the conversational interaction with consumers is only going to increase over time. And having that audio signature is going to be as important as having a visual imprint.
Paul Sephton: Now, I recall a story from May 2018 when Alexa made headlines for all of the wrong reasons. A husband and wife were having a private conversation at home, and the next thing they know they’re getting a phone call from one of the husband’s colleagues to say that the entire conversation has been emailed to him and it turned out that Alexa was incorrectly receiving a bunch of voice prompts from the background conversation and ended up erroneously sending this email without being prompted to. It raises questions of privacy for all of us as we use our voice assistants more and more. Do you think it’s a valid concern or is it quite a secure technology for us to be having in our pockets 24/7?
Bret Kinsella: If there’s a conversation around privacy and voice assistants, I think it’s important that we continue to put that in the public square and discuss what the implications are. The one thing I will say is that my view is that consumers continually trade off and prefer convenience over privacy, and we see that just with the mobile phones. What we have with mobile phones, what we have with CCTV tracking us as we walk around in public, there’s a sense of convenience in being able to have my navigation or being able to instantly tag the location of my Instagram posts. These are reasons why I give up some privacy in terms of where I’m going and what I’m doing, and then from a public standpoint, we see cameras in places where this is largely for security and safety. There’s concerns around facial recognition and tracking people not just by the phone but by their face and these other types of things.
So voice has come up as just one element in this broader tapestry of privacy concerns, or lack of anonymity is probably a better way to say it, that we’ve become accustomed to in the last century. When we talk about always listening, sure, it’s a concern, and I think that we need to understand what is happening with the transcription of what’s being said to us, by us, or anywhere in our vicinity. People need to think about not just the things that might be embarrassing, but how information might be taken out of context, and there we have to have very high accountability among the different platforms, the people that we give our access to, around what they could ultimately do with that.
Paul Sephton: So right now I might have a few different gateways to access my voice assistant. Let’s say I wake up and I use a smart speaker to access Alexa or Google or Siri or whichever voice assistant I’m on. Then I shift over to a smart phone at some point, maybe later on I’m using a true wireless headset and I’m using that as my audio gateway and then I hop into my car and on the commute I access things through the car. If we get to a point where our voice assistants are completely independent of devices and able to recognize us and our preferences absolutely anywhere, how will the patterns of usage on these devices and what devices we use to access our voice assistants start to change? Because I think one of the most exciting things about this is that we could end up relying less and less on screens, but still manage to have an entirely personalized experience through our voice assistants, regardless of where we go without needing to constantly look down.
Bret Kinsella: A few different things in there. So first thing is, I would say that when we think about devices, you have to understand that it’s not just that people prefer to do things on different devices, it’s that those devices often represent different contexts and different needs at any given time. So there are certain types of things that I’m doing in the car that I’m not doing when I’m in my living room or when I’m exercising. So I have different types of needs. I have different types of requests that I might have. They always talk about this idea of go, talk, listen. Or sometimes they say fun or… They have different types of things. But if you’re in the car, you either need to navigate, you need to communicate with someone or you want to be entertained. The first thing I would tell people is when you think about voice and you think about devices, think about the context that device is normally used in and that’ll determine where the use cases are going to be most common.
So that’s one thing. The other thing is you talk about this idea of taking your devices with you and I think wearables are really going to be the big story over the next several years, and voice is really a gateway for wearables to be more functional than they have been in the past. So right now wearables have been mostly collection or output devices as opposed to input devices, and that’s definitely changing. So if I think about headsets, if I think about watches, I think about fitness trackers, those types of things. They were collecting maybe some data, but they weren’t designed necessarily originally to be input devices. So what we’re seeing now is that voice allows them to easily become an input device with adding a microphone and you put a voice assistant behind it and all of a sudden you can do these really robust things. Where I see this going is I do see people wearing these things more and less focused on the screen. And maybe some of the screens that we deal with won’t be the screens we carry with us, but the screens that are around us and then we’ll be able to interact with them. But the command and control is going to be with the voice in the ears predominantly with the touch being a supplement.
Paul Sephton: Now there’s a game changer. If we go from the point of giving our voice assistants commands, prompting them for facts or to perform simple tasks for us on our phones, to the point of being prompted by our voice assistants, having them come to life and offer to augment our lives, to a point, from an audio perspective. We talk about movies like Her at Jabra and the science fiction future of what voice could do for us. But just this week in the MIT Technology Review, there was an article entitled Inside Amazon’s Plan for Alexa to Run Your Entire Life. And we’re talking about voice prompts from Alexa, like, “I bought your movie tickets, would you like a dinner reservation near the theater?” Or perhaps, “Your flight’s been delayed by 40 minutes. Can I change the arrival time of your Uber?” How will the voice landscape change and when will we start to get this level of richness from our voice assistants, and how much of it do you think is quite a long way away and right now remaining in the realm of science fiction?
Bret Kinsella: First of all, there’s a difference between what I think of as dialogue and what I think of as relationship. So I have a relationship with my smartphone. My relationship, it is a tool that allows me to do a lot of things, right? It’s not a human relationship in any way. There’s definitely data out there that shows that people anthropomorphize the voice assistants and give them human qualities and they feel like they have a relationship with it. But I will tell them they do not have a relationship with a machine, at least in any type of way that it would be a human relationship. Now, that doesn’t mean that we’re not going to dialogue and we’re not going to have back and forth and that’s only going to become more sophisticated. And one of the things that I like to talk to people about is voice assistants as response oriented tools or as tools with agency, and agency is just this idea that the voice assistants can do something on our behalf, that we grant them certain latitude to do things for us and we have this idea of an assistant versus an advisor.
An assistant in general is going to be something where you’re going to say, “Please give me this,” and it’s going to give it back to you. This is the response oriented. Then we have this idea of an advisor where it might suggest things for you. “Okay, Paul, I’ve noticed that you like to buy a new pair of brown shoes every fall. And I did notice that your favorite brand is on sale. Would you be interested in that?” Now it doesn’t have agency yet. It only has the agency to suggest something to you. It doesn’t have agency to actually execute that transaction on your behalf. So we’re definitely going to move into that in terms of personalization. I think we’re a ways away from it, but there’s people that are definitely working on it.
The next stage of that is agency, where the assistants actually do things for you on your behalf, and you don’t even know it. It just shows up at your door. It doesn’t ask you about the shoes, they just show up, and then you say, “Oh, look at these shoes, I guess I bought them, or my assistant bought them for me,” and you try them on and you love them or you don’t like them and you send them back. But it’s going to start doing these things for you. It’s going to be out in the world, and it’s going to then scale our ability to do more things. You know, this idea of having an assistant, it’s not that different than people had human assistants that would do these things for them. They’d have personal shoppers, they would have… It’s just more scalable because we can’t all have a human personal assistant to be our shopper or to manage our schedule or to arrange our next travel or vacation. We can have this digital assistant to do that and at a price point, frankly, that is going to be accessible to everybody. So in some ways it’s really democratizing this idea of having assistants in our lives.
Paul Sephton: So we deal with both hardware and software, and it’s a conversation I often have in the corridors at Jabra, where we’re saying on the one hand for voice assistants to work properly, you need to have a good audio gateway, you need to have good microphones, and good voice isolation so that the tech and the software can understand what it’s being asked. On the other hand, the software has a lot more learning to do before it can properly have these rich voice experiences. Where do you stack up the software and hardware sides of the debate in terms of how quickly we will accelerate in this field?
Bret Kinsella: On the hardware side, the most important advances right now are being able to take the processing power that we now are relying largely on cloud services, so scalable server infrastructure and things like that, and be able to do that on device. This is important from this idea of always available. There’s some sense that particularly in urban, Western cities that you always have persistent broadband connectivity, which is largely true. So then the question is do you really need this on device? And the idea is yes you do if you want it persistent, always available. But there’s also certain types of transactions, certain types of things that you might want to do with your voice assistant that you don’t necessarily want to travel over public networks for whatever reason, and this edge AI is going to be important.
I think that’s the thing that we’re going to see the most significant advances from on a hardware side, and that involves both more efficient computing, it involves more computing power at the chipset level, it involves lower power consumption. However, I would say that the biggest advances that are going to be required in terms of leading to this future of agency is really going to be on the software side, and that’s going to be able to do better personalization. We’re starting to see some of this. We can do this today in terms of… The capabilities are all there, but nobody’s really doing it at scale.
Paul Sephton: We know that voice assistants will further influence us in our lives and we’ll use them more and more to help us through our days. What about the workplace, Bret because I could see this potentially having major benefits, whether it’s Microsoft’s Cortana or another voice assistant, having a major benefit for us from a productivity perspective if suddenly everyone overnight at work gets a really diligent personal assistant, will this end up saving businesses a lot of money?
Bret Kinsella: Yeah, there’s no question about it. I mean assistants in general… And they’re already saving money from these, right, most obviously in contact centers, and we see this with the studies that some people are pretty happy, consumers are pretty happy to talk to assistants if they’re good. I think most of us have this… what we call baggage from having dealt with IVR systems which were not designed… we call them on the telephone and they’re not designed to help us. They’re designed to put us into a queue to help the company more efficiently manage us by taking our time. Some of these conversational assistants, right now we have a webinar coming up with a company called Gridspace, are really sophisticated and they can answer your question faster than a human. Touch and type interfaces are not going away. Let’s not think that they are. There are times when conversation is going to be better. Just like there’s times when touch is better than typing is. So there’s times when a personal face to face meeting is better. There’s times when a Slack message is better, so it’s one more tool, but I will tell you, I think people underestimate the gateway that voice provides to access more things from technology that we didn’t even know were there.
Paul Sephton: The only potential red flag that I would see here is that we are mostly working in open offices these days. I think a lot of people are sitting in open offices and that is a world of distraction and interruption. If we’re talking to voice assistants all day, in addition to the calls we’re taking, the colleagues we’re talking to and that sort of thing, will that not make the open office a really, really tricky place to be productive in? Or how do you think workplace designs will change and adapt in reaction to the adoption of voice assistants ubiquitously across the board at work?
Bret Kinsella: The purpose behind the open office is not to be able to see everybody. It’s to be able to interact with each other. So I’ve yet to walk into an open office where there weren’t conversations going on. There are conversations going on between humans today. The idea that you might have conversations with an assistant doesn’t necessarily change it that much. I will say that I believe that headphones are going to become even more common because there’s voice input and then whatever the output is coming back, it probably will be a better way for the individual plus all their peers. But I don’t really see this as being as big of a hurdle as most people think it is because I don’t know where these offices are that are open offices that are silent. I don’t think they exist. I think people think they exist, but they don’t exist. But it wouldn’t surprise me that office design changes. We’ve seen that. We went to open office. We didn’t have open offices 20 years ago. They were very uncommon. We went from closed offices with open for certain types of staff, personnel, to cubicles, to open office. Maybe we go back to something that we’ve had in the past? I don’t know.
Paul Sephton: Bret, thank you so much for joining us on the show today. Fantastic to hear from you and get these insights into where voice will be affecting us both in our personal and professional lives. I, for one, cannot wait to see where the technology has the most impact, but I think the one thing that’s clear from this conversation is that there’s no doubt about how much impact it will have and that it will come to affect all of us for the better.
Bret Kinsella: Paul, thank you so much and I will tell you that I am a longtime user of Jabra Sport Pulses. In fact, I’ve bought many over the years. I love them because I like to run in the mountains and I like to listen to podcasts while I’m running. So that’s the one that I’ve used there. It was great when you reached out because I’ve been a longtime user of Jabra.