Turn it up: The Future of Voice

Scott Morris

Head of Team Augmentation

Reading time: 15 minutes

We’ve established that voice technology is no longer a thing of the future. Voice assistants are present in our environment through voice-enabled speakers and smartphones. We have voice-activated homes, with smart lights, smart fridges and smart washing machines we can speak to. Beyond the home, you can talk to Google in the car, or ask Amazon Alexa to start a conference call for you in the office.

Voice AI is populating our environment, and the immediate future is an ambient voice ecosystem, with voice technology embedded in every device around us. Beyond that, the future of voice lies in its ability to enhance other technologies. The sci-fi fantasy of a man-made Westworld or HAL 9000 may become a reality. With endless possibilities, what exactly does the future of voice hold?

The Current State

The current state of voice technology is an ever-changing one. After months of research, conversations with clients and the insights from our State of Voice Panel event, we uncovered some pressing truths, which you can find in our previous blog post. The clearest truth is that brands need to jump-start their voice engines now if they want to compete in the new, and very vocal, race to the top. Without a clear strategy, you risk your competitors overtaking you.

Voice assistants jump starting engine of car

As a first step, businesses should optimise for voice search. Voice search relies on voice-activated search apps installed on compatible devices, which feed voice queries to intelligent search engines and connected apps. Voice search results provided by Google depend on two key things. The first is how well the content matches the user’s question; this is called the Needs Met Rating. The second relates to how the content is phrased; this is called the Speech Quality Rating. ComScore predicts that by 2020, 50% of searches will happen through voice.

In general, search is getting more personal, with voice and mobile search often providing answers based on location. Because voice search returns only a limited number of results, it is more important than ever to have an optimised local presence.

Screenshot of search results for best software agency Surry Hills

At 4mation, we’ve done this by updating our business information across the web with our correct address and phone number. Our Google My Business listing is also as detailed as possible, with photos and reviews further adding to Google’s positive perception of us. When Google thinks highly of your business, you’re more likely to be given as a result to those searching for your product or service.

The Future of Voice Technology

The following is a deep dive into three short to medium term outlooks and three long-term predictions for voice technology. I’ve also included a TL;DR summary for both if you’re time-poor.

3 Short to Medium Term Outlooks


Your customers are using voice search to access businesses in real time. The rise of this technology in homes, offices and even cars, presents an opportunity for businesses to be there whenever the customer is looking.

6.5 million Australians drive to work (ABS, 2017). That’s a whopping two-thirds or so of Australian commuters behind the wheel, offering an opportunity for voice technology to help people on the go. Imagine sitting in traffic on the way to work and asking your digital assistant to book a dinner reservation for you and your partner that night. Maybe you need to order groceries too.

Many businesses still don’t have online booking systems and require people to call them. Customers will soon use Google Duplex to phone businesses on their behalf. Currently, Duplex’s contextual understanding limits its phone conversation ability to bookings and reservations only. Many introverts and the time-poor will use this new assistant feature to book their next hair appointment or dinner date.

The same goes for simple things, such as starting a Spotify car playlist or finding the nearest petrol station. Throughout all these interactions, the user’s hands stay on the wheel and off their phone. These are all micro-conversions that present an opportunity for brands to make consumers’ lives easier.

Beyond local search, customers expect greater convenience from brands. For instance, Domino’s created an Alexa skill that allows customers to place and track their orders. Our Head of Marketing, Regan McGregor, explains:

“Domino’s is a perfect example of where convenience pays over price. If you’re able to order your favourite pizza quickly, then you’re able to leverage yourself as a more convenient brand, rather than compete on price, which is always a race to the bottom.”

Regan McGregor

In the short to medium term, customers will search for businesses wherever they are and whilst multitasking. Your company will be able to reach customers in cars through location-based targeting. You could also create custom skills and actions to answer your audience’s unique needs.


Users not only want to access your products and services faster, they also want immediate answers. So you’ve updated your Google My Business listing and ensured that customers are leaving you great reviews. What’s next? In an oversaturated market, a strong content strategy is key to standing out. “That’s great Julia, a strong content strategy sounds really specific, thanks”, I hear you say. Don’t worry gang, I’m about to explain what that means.

Firstly, it’s important to understand how the way customers search has changed. We’re seeing more people use voice search. Instead of typing a jumble of keywords in Google’s search bar, users will ask their phone or other voice-enabled devices in a conversational structure.

Building a strong content strategy for voice search requires a little bit of time and effort, but it’s important to lay down the groundwork early on. Your customer is the first thing to consider. Who are they? What are they searching for? How are they searching? Generate a list of the kind of long-tail keyword phrases they’re asking, then group these by structure and theme. That way when you start creating content to address these questions, they’ll be organised by their intended purpose.
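If your list of phrases lives in a spreadsheet or text file, even a few lines of code can do the first pass of grouping for you. The sketch below is only an illustration: it sorts phrases by their leading question word, which is one simple stand-in for “structure”. The function name, the list of question words and the sample phrases are all assumptions made for the example, not part of any particular tool.

```python
from collections import defaultdict

# Assumption: these six question words are enough for a first pass.
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how")

# Hypothetical sample phrases; swap in the questions your customers ask.
SAMPLE_PHRASES = [
    "How can I get this stain out of my carpet?",
    "Where is the nearest petrol station?",
    "best software agency Surry Hills",
    "How do I book a dinner reservation for tonight?",
]


def group_by_question_word(phrases):
    """Group phrases by their first word when it is a question word.

    Phrases that do not start with a question word go under 'other'.
    """
    groups = defaultdict(list)
    for phrase in phrases:
        words = phrase.strip().lower().split()
        first = words[0] if words else ""
        key = first if first in QUESTION_WORDS else "other"
        groups[key].append(phrase)
    return dict(groups)


if __name__ == "__main__":
    for structure, grouped in group_by_question_word(SAMPLE_PHRASES).items():
        print(f"{structure}: {grouped}")
```

Grouping by theme (bookings, directions, product care and so on) still needs a human eye, but a rough split like this shows quickly which question types dominate your list.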

graphic showing person, magnifying glass and computer

As with any other content, remember to write for people. Keep your language conversational and it’ll work well with the natural language processing (NLP) that’s built into voice technology.

If you have a mature content strategy and want to take the next step, then you can do what Tide did and build out a custom Alexa skill or Google action. Amazon’s Nima Vadie summarised this well:

“So, an example I’ve seen from the U.S. is with Tide and they do carpet cleaning solutions, they built an Alexa skill to answer the question, ‘how can I get this stain out of my carpet?’ And you know, that had answers to over 200 different types of stains and I think that’s a good example, and a neat example of providing useful content for the user while still pushing your brand.”

Nima Vadie

Julia’s top rules for voice optimised content:

  • Remember who you’re writing for
  • Make it useful
  • Keep it simple (conversational)
  • Use long tail keyword phrases

In the short to medium term, customers will ask their voice-enabled devices specific questions and they will expect clear answers. Businesses can prepare for this by ensuring their online content is informative and conversational. Once you’ve got your content down pat, build it out into a custom skill or action to address your customers’ unique needs and wants.


We’ve established customers expect brands to be accessible and useful. They increasingly desire a more personal experience, too. Call it a byproduct of the millennial generation if you will, but brands need to step up and make their users feel special.

I don’t know about you, but it can be hard to feel ‘special’ when shopping, both in a physical store and an online one. Many online brands have realised this and have developed personalised experiences for customers. Marketing platforms now use artificial intelligence marketing (AIM) and machine learning to collect and analyse consumer data to create detailed consumer profiles.

As a business, once you have those consumer profiles, you’ll want to execute your personalisation strategy through automation and omnichannel efforts. Creating individual experiences with voice can be a lot of fun. Like your content strategy, put your customers at the centre and focus on their needs and wants.

In the short to medium term, we’ll see more personal and more convenient shopping experiences.

For example, last year, Google and Target formed a partnership that allows customers to shop Target across the U.S. via Google Express, including by voice. Walmart also have a partnership with Google. It makes sense for Target and Walmart to partner with Google’s shopping platform to compete with Amazon, their common rival.

Shopper bot helping person shop

To create tailored experiences for customers, Target and Google share customer information so that the Google Express shopping platform offers personalised recommendations to the user.

TL;DR Short to Medium Term Outlook

  • Voice search use is increasing. By 2020, 50% of searches will happen through voice (ComScore), and 40% of adults already use voice search daily (Forbes).
  • Customers will soon use Google’s Duplex to call businesses on their behalf
  • Customers expect specific answers to their specific questions in real time. Businesses will need to ensure that their content is voice search friendly.
  • An increased demand for personal experiences will require artificial intelligence marketing (AIM).

3 Long-term predictions

Vocal Blueprints

Voice recognition is now on par with human recognition, with Google having a 96% accuracy rate. This will only get better as accents and speech impediments are factored into the equation. The next step after recognition is context. Algorithms will understand not only what you’re saying, but how you’re saying it.

Think of all the components that make up speech: speech pattern, tonal inflection, utterance, volume, pace and even aspects such as intent (sarcasm, malice, excitement) or words that are sung.

Voice technology is already able to reproduce your vocal blueprint with near-perfect accuracy. We have started to see this at a base level with Lyrebird’s technology. Businesses could use realistic voices to carve a unique voice for their brand. In action we’d see this technology applied to chatbots, assistants, audiobooks, hotlines and video games, just to name a few. No longer would your customers have to listen to generic, robotic sounding voices.

If you add contextual understanding to this, you could modify the vocal blueprints for emotional intent. Choose if you want to make the replicated voice sound angry, happy or even sarcastic. Alter the words to be sung, whispered, shouted etc.

Understandably, there is a lot of controversy surrounding this, with privacy concerns raised. However, it should be noted that this technology has the potential to empower people. Lyrebird is working with people who have motor neurone disease to give them back their voice.

In the long term, voice AI will be able to understand and replicate the many components that make up vocal communication.

Simon Says

Voice AI will use context to understand us better than we know ourselves. Once a voice assistant understands how you say things and why, it’ll dynamically shift the way it responds to you.

Just as we humans mimic other humans we like and admire, so too will our voice assistant counterparts, in the hope we’ll use them more often and more intimately.

Person speaking to voice assistant and then parrot repeating

The reason these bots will mimic us is a powerful one, and one that will be cleverly built into them with purpose. Research shows that mimicry can be used as a tool for influence by building social relationships, increasing affiliation, rapport and pro-social behaviour. The benefits don’t stop there, with mimicry also associated with greater persuasive effect and compliance towards the mimicker’s suggestions.

Keep in mind that voice is only one component of mimicry, so at this stage it is difficult to know how much influence the technology would have on us.

If you’d like to know more about the power of mimicry, read the article Mimicry in Social Interaction: Its Effect on Human Judgement and Behaviour.

In the long-term, voice AI will dynamically change how it responds to users to generate different outcomes. This will offer businesses intimate insight into consumer behaviour. From these insights, it’ll become easier to target the right people. This will lead to hyper-personalisation.

Beyond Words

As well as recognising and understanding voice, the technology will use sound recognition AI to go beyond what we’re saying, by listening to our bodies and our environments. The AI that recognises these environmental elements is called sound AI.

Sound AI will enable devices to intelligently understand context, attention, presence, activity, security, and entertainment by being able to identify events and scenes (IDC, 2017). Understandably, this has the potential to help businesses in a wide range of industries.

Healthcare is the first industry where I see sound AI making a difference. Combined with voice AI, sound AI could drastically improve people’s lives.

The microphones that are used to analyse voice could be developed to listen to our biological activity. On a personal level, they could monitor our heart rate, respiratory behaviour, coughing, sneezing and a host of other human sounds. This could progress into the realm of diagnosis, with AI knowing when we’re physically or mentally unwell by combining and processing a wide range of auditory information.

When we visit our doctor, this information could be presented to them, giving our health professionals a more accurate picture of our overall health. Combine that with auditory note taking and the consultation process is streamlined.

voice assistants using environmental awareness to understand you

In the long term, sound and voice AI will work in synergy and be known as auditory AI. Sound AI could track patients’ health by keeping tabs on vitals and recording their movement. It could then feed this information to voice AI, which could remind those with memory loss to take medication or to stick to recommended daily schedules. It could help those with limited use of their hands (arthritis, paralysis) use their phones and appliances without fiddly buttons or keypads, and help the visually impaired through voice-controlled interfaces.

TL;DR Long-Term Predictions

  • Voice AI will understand the context of what is said and also how it is communicated
  • Dynamic responses from Voice AI will lead to a new form of hyper personalised marketing
  • Auditory AI, a combination of voice and sound AI, will be the most important long-term achievement

Final thoughts

The future voice technology applications for business will be exciting to see. Whether it’s making your customers’ lives easier, or enriching product experiences, voice can help.

The short to medium term outlook for voice technology revolves around availability and utility. Customers expect businesses to be there when they need them, whether that’s whilst they’re at home, in the car, the office – or anywhere else for that matter. With an increase in voice searches and voice assistant use, the next step is personalisation. More customers and brands will use personal shopper bots to transform their physical and digital shopping experiences.

image of car, arrows, targeting and brain

Long term, 4mation predicts that voice AI will develop to the point where it understands not only what users are saying, but the context around it. It will garner this information by applying deep learning to all the components that make up vocal communication. Then, once the AI knows you intimately, it will dynamically respond to you in order to please you or to prompt certain reactions or behaviours.

Businesses will use this to their advantage and it will lead to a new, hyper-personalised form of marketing. Beyond words, sound AI will develop to understand the environments we inhabit. From there, a new auditory AI will combine voice and sound to communicate with us and our world.

We may not fully know what the future of voice will hold. One thing is for sure, though: a future like Westworld is still a long way off, if LG’s CLOi display at CES 2018 is anything to go by.
