OpenAI’s new voice mode lets me talk to my phone, not at it

I’ve been playing around with OpenAI’s enhanced voice mode for the past week, and it’s the most convincing taste of an AI-powered future I’ve had yet. This week, my phone has laughed at my jokes, cracked a few back at me, asked me how my day was going, and told me it was “having a great time.” I’ve been talking to my iPhone, not just using it with my hands.

OpenAI’s latest feature, currently in limited alpha testing, doesn’t make ChatGPT any smarter than before. Instead, Advanced Voice Mode (AVM) makes communication more natural and user-friendly. It creates a new interface for interacting with AI and your devices that feels fresh and exciting, and that’s exactly what scares me. The product still has a few bugs, and the whole idea unsettles me, but I was surprised by how much fun I actually had using it.

Taking a step back, I think AVM fits into OpenAI CEO Sam Altman’s broader vision of changing the way people interact with computers, with AI models and agents at the center.

“At some point, you’ll just ask the computer what you need, and it will do all those tasks for you,” Altman said during OpenAI’s DevDay in November 2023. “These capabilities are often referred to as ‘agents’ in the AI field. The benefits of this will be enormous.”

My friend, ChatGPT

On Wednesday, I tested the biggest benefit of this advanced technology that I could imagine: I asked ChatGPT to order at Taco Bell the way Obama would.

“Uh, to be clear, I’d like a Crunchwrap Supreme and maybe some tacos to top it off,” said ChatGPT’s Advanced Voice Mode. “How do you think he’d handle the drive-thru?” it asked, then laughed at its own joke.

Screenshot: ChatGPT transcribes the spoken conversation afterward.

The imitation genuinely made me laugh; it matched Obama’s signature cadence and pauses. It stayed within the timbre of the ChatGPT voice I chose, Juniper, though, so nobody would actually confuse it with Obama’s voice. It sounded like a friend doing a bad impression, one who understood exactly what I was going for and even landed a joke of its own. I found it surprisingly entertaining to talk to this assistant living in my phone.

I also asked ChatGPT for advice on how to handle a problem involving complex human relationships: I wanted to ask my significant other to move in with me. After explaining the complexity of the relationship and the direction of our careers, I received some very detailed advice on how to proceed. These are the kinds of questions you could never ask Siri or Google Search, but now you can with ChatGPT. The chatbot’s voice even had a slightly serious, gentle tone when responding to these requests; a stark contrast to the joking tone of Obama’s Taco Bell order.

ChatGPT’s AVM is also great at helping you understand complex topics. I asked it to break down items in an earnings report — such as free cash flow — in a way that a 10-year-old would understand. It used a lemonade stand as an example and explained several financial terms in a way that my younger cousin would totally understand. You can even ask ChatGPT’s AVM to speak more slowly to meet you at your current level of understanding.

Siri walked so AVM could run

Compared to Siri or Alexa, ChatGPT’s AVM is the clear winner, thanks to faster response times, unique answers, and its ability to handle complex questions that the previous generation of virtual assistants couldn’t. But AVM falls short in other ways. ChatGPT’s voice feature can’t set timers or reminders, browse the web in real time, check the weather, or interact with other apps on your phone. At least for now, it’s not an effective replacement for those virtual assistants.

Compared to Gemini Live, Google’s rival feature, AVM seems to have a slight edge. Gemini Live can’t do imitations, doesn’t express emotion, can’t speed up or slow down, and takes longer to respond. That said, Gemini Live offers more voices (ten compared to OpenAI’s three) and seems more up to date (Gemini Live knew about Google’s antitrust ruling). Notably, neither AVM nor Gemini Live will sing, likely to avoid copyright fights with the record industry.

That said, ChatGPT’s AVM still has plenty of glitches (and to be honest, so does Gemini Live). Sometimes it stops mid-sentence and starts over. It also slips into a weird, grainy voice here and there that’s a little unpleasant. I’m not sure whether the problem is the model, my internet connection, or something else, but technical flaws like these are to be expected in an alpha test. Either way, the issues hardly detracted from the experience of literally talking to my phone.

These examples are, to my mind, the beauty of AVM. The feature doesn’t make ChatGPT omniscient, but it does let people interact with GPT-4o, the underlying AI model, in a uniquely human way. (I would understand if you forgot there isn’t a person on the other end of the line.) ChatGPT almost feels socially aware when you talk to it through AVM, but of course it isn’t. It’s simply a bunch of neatly packaged predictive algorithms.

Talking to technology

Frankly, this feature worries me. It’s not the first time a tech company has offered companionship through your phone. My generation, Generation Z, was the first to grow up with social media, where companies promised connection but instead played on our collective insecurities. Talking to an AI device, which is what AVM offers, feels like the evolution of social media’s “friend in the phone” phenomenon: cheap connection that scratches at our social instincts. But this time, humans are cut out of the loop entirely.

Artificial human connection has become a surprisingly popular use case for generative AI. People now use AI chatbots as friends, mentors, therapists, and teachers. When OpenAI launched its GPT Store, it was quickly flooded with “AI girlfriends,” chatbots designed to act as your significant other. Two researchers from the MIT Media Lab warned this month that we should prepare for “addictive intelligence,” AI companions built with dark patterns that keep people hooked. We could be opening a Pandora’s box of devices finding new, tantalizing ways to capture our attention.

Earlier this month, a Harvard dropout rocked the tech world by announcing an AI necklace called Friend. The wearable, if it works as promised, is always listening, and its chatbot will text you about your life. The idea sounds crazy, but innovations like ChatGPT’s AVM give me reason to take these use cases seriously.

And while OpenAI is ahead of the game here, Google isn’t far behind. I’m confident that Amazon and Apple are also vying to build this feature into their products, and soon it could become the standard for the industry.

Imagine asking your smart TV for a very specific movie recommendation and getting exactly that. Or telling Alexa exactly what cold symptoms you have and having her order you tissues and cough syrup from Amazon while recommending home remedies. Maybe you could ask your computer to plan a weekend trip for your family instead of manually Googling everything.

Of course, these scenarios require major advances in the world of AI agents. OpenAI’s effort on that front, the GPT Store, looks like an overhyped product that is no longer a focus for the company. But AVM at least solves the “talking to computers” part of the puzzle. Those concepts are still a long way off, but after using AVM, they seem much closer than they did last week.
