Voice technology is changing fast in every industry, and it is common to feel lost or overwhelmed. A wave of tech start-ups has filled the internet with pitches for instant-success voices that are supposedly indistinguishable from human ones… But are they really?
With this article, we aim to educate and inform you by analysing these trending choices so you and your brand can leave competitors in the dust.
A synthetic voice is an artificially produced version of a human voice. Speech synthesis is just another form of information output where a computer reads words to you out loud in a real or simulated voice, played through the device’s speaker.
Examples: voice-guided navigation, yes/no control commands on automated messages, short messages read aloud for the visually impaired
Text-to-Speech (TTS) is a synthetic voice technology that converts digital text into spoken, human-sounding audio.
Example: Google Text-to-Speech
AI stands for Artificial Intelligence. AI voices are a type of synthetic voice, but they work differently. They use "deep learning", a branch of machine learning, to convert text into audible, human-sounding speech. The same family of technology can also convert speech into text and identify a person by their voice.
Example: voice assistants like Siri and Alexa, Amazon Transcribe
An intelligent virtual assistant (IVA) or intelligent personal assistant (IPA) is a software agent that can perform tasks or services for an individual based on commands or questions.
Examples: Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana
The Ultimate Checklist:
Pros and Cons
Pros:
- Voice-over is produced faster – even on the spot
- Lower starting cost
- Higher control during the creation and editing process
Cons:
- Voice exclusivity is expensive – think twice if what you are offered is a voice everyone can have
- Lack of emotion
- Flow, pronunciation, and accent need double-checking
- Risk of sounding monotonous
- Frequent ambiguities (e.g., homographs such as "lead" or "read")
- Missed contextual clues
- Small variety in sparsely spoken languages or local accents
- Lack of spontaneity
- Difficulty with acronyms, digit sequences, and abbreviations
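To make the last point concrete, here is a minimal Python sketch of how a script could be cleaned up before it is handed to a synthetic voice. It is an illustration, not how any particular product works: the lookup table, the function names, and the rules (abbreviations expanded, all-caps acronyms spelled letter by letter, numbers read digit by digit) are assumptions chosen for the example.

```python
import re

# Hypothetical lookup table, for illustration only; a real project would
# maintain its own list per language and brand vocabulary.
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(match: re.Match) -> str:
    """Read a digit sequence one digit at a time: '404' -> 'four zero four'."""
    return " ".join(DIGIT_WORDS[int(d)] for d in match.group())


def spell_acronym(match: re.Match) -> str:
    """Separate the letters of an all-caps token: 'FAQ' -> 'F A Q'."""
    return " ".join(match.group())


def normalise_for_tts(text: str) -> str:
    """Apply the three naive rules in order and return the cleaned script."""
    # 1. Expand known abbreviations (simple, case-sensitive replacement).
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # 2. Spell out tokens made of two or more capital letters.
    text = re.sub(r"\b[A-Z]{2,}\b", spell_acronym, text)
    # 3. Read every run of digits digit by digit.
    text = re.sub(r"[0-9]+", spell_digits, text)
    return text


print(normalise_for_tts("Dr. Lee, see the FAQ or call 404."))
```

The rules are deliberately naive. They cannot tell whether "404" should be read as digits or as "four hundred and four", and they would spell out acronyms that are normally pronounced as words, which is exactly why a human check of flow and pronunciation stays on the checklist.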