The Five Main Components of Voice & Audio

Art by Juj Winn

Audio and voice are on a healthy and steady rise, and at first blush I find that an interesting phenomenon. It is interesting because it goes against the always-more ethos: more modalities, more channels, more features, more data and more choices. And yet this medium seems to offer less: no colors, no shapes, no faces, no buttons to press, nothing to swipe, nothing to type, no windows to open or close – only sound heard and sound spoken. So, who on earth is going to be attracted to something that gives us not more, but less than we are used to getting?

Here are the top five constituencies that are excited, or at least should be excited, about voice and audio.

The creator
Unless you’re a graduate student locked in the library all day, reading reams of text and churning out paragraph after paragraph of prose, you’re probably consuming and creating at least an order of magnitude more information by listening and speaking than by reading and writing. In fact, we start speaking much earlier than we start writing, and it takes longer still to become writers proficient enough to express ourselves clearly and concisely through text. For us humans, speaking and listening are more than modalities of expression: they are a fundamental manifestation of our humanity. But beyond that, capturing high-quality audio, or capturing a voice stream to be processed by software, is much less expensive than capturing and managing high-definition video.

What’s also compelling and exciting about audio is how seamlessly and unobtrusively it blends into the flow of a naturally occurring event. Whether you’re recording an interesting conversation or capturing a thought or insight, one tap on your smartphone is all it takes to start the recording, and that’s it. No tripods to set up, no lights to arrange, no “wait, let me get this hair under control” before the shoot starts.

The advertiser
For the advertiser, audio, whether linear or on-demand, whether heard through a smartphone, smart speaker, or earbuds like AirPods, is a medium that almost guarantees that the consumer of an ad will hear the entire ad. You can take your eyes off a magazine ad; you can mute a TV ad, turn your back on it, or change channels; you can skip a Google ad or mute it with a click on your tablet. But if you’re listening and your eyes or your hands (or both) are busy, you don’t have an easy way to skip an ad (if it can be skipped at all, that is). You must endure, and we usually do. And that, like it or not, is music to the ears of the advertising men and women out there.

The physically busy
Whether you’re driving, combing your hair, sunbathing, keeping your eyes closed, holding a piña colada, trimming your flowers, shining your shoes, folding your clothes, showering, polishing your nails or washing your dog, voice and audio are your go-to interface to get things done (add something to your shopping list, put on some music) and consume information (ask for the time, get the answer to a question that just popped into your head while enjoying the sun). Try doing this with a touchscreen while your eyes are busy (or closed) or your hands are busy (or dirty) and let me know how much you enjoy the experience.

The physically handicapped
Through my work with the AARP Foundation, I have often had the opportunity to experience firsthand how liberating smart speakers are for us humans as we age and gradually (and sometimes very quickly and suddenly) lose our freedom and our ability to impose our will on the world around us. Simple things we never thought about suddenly become very difficult or impossible tasks: turning the lights on and off, tuning in to our favorite radio station, setting a timer, getting the answer to a question (“When is the next Orioles game?”). Not all of us have someone always on hand to help with such “simple” things. And even when we have someone around who can help us, unless it’s that person’s job to fulfill those requests, we feel reluctant to impose, sometimes even when the person we ask for help is paid to be at our beck and call.

In other words, voice and audio give us the opportunity to maintain our freedom and dignity as we age and become less able. Yes, it is certainly not healthy to rely entirely on yourself: seeking and receiving help from other people is part of a healthy social life. But it’s tragic to rely entirely, or almost entirely, on other people to do simple things all the time.

The alpha generation
Perhaps the one group that may be most fundamentally affected by voice and audio technology, particularly smart speakers and far-field voice assistants, and, being early in life, with the most enduring consequences, is the Alpha generation. The impact of audio has already begun with Generation Z, who, according to studies, “not only hear music, but live it, to amplify it or to escape from their surroundings”. But for the Alpha generation, the kids growing up with Amazon Echo and Google Assistant as a fact of life (they don’t know a world without smart speakers), the ever-available smart speaker has allowed them to do things by themselves before they can read, write or type. Yes, they could tap and swipe on a smartphone when they were two years old, but they couldn’t search and explore on their own: say, find games or get answers to questions like “What’s the biggest dinosaur?” or “Who lives the longest?” These smart speakers can even answer the infamous “why” questions toddlers are adept at asking, such as “Why do we need to drink water?”, “Why do dogs bark?” and “Why do bees like flowers?”. In other words, for this generation the smart speaker represents independence: the ability to explore the world on your own without having to rely on or wait for adults – those big people who know everything and can do everything – or (God forbid) older siblings to be around. And this – a very young person who is not suffering under anyone’s yoke – is, to me, a new and wonderful way to start out in life.

Dr. Ahmed Bouzid is CEO of Witlingo, a McLean, Virginia-based startup that develops products and solutions that enable brands to connect with their customers and prospects through voice, audio and conversational AI. Before Witlingo, Dr. Bouzid was Head of Alexa’s Smart Home Product at Amazon and VP of Product and Innovation at […]. Dr. Bouzid holds 12 patents in the field of speech recognition and natural language processing and has been recognized as a Speech Luminary by Speech Technology Magazine and as one of the Top 11 Speech Technologists by […]. He is also an ambassador for the Open Voice Network, leader of the Social Audio initiative and a writer at Opus Research. His new book, The Elements of Voice First Style, co-authored with Dr. Weiye Ma, is scheduled for release by O’Reilly Media in early 2022.
