Battlefield devs’ multiplayer shooter The Finals is full of AI voices - and oof, you can tell

Embark Studios says that text-to-speech tech “gets us far enough in terms of quality and allows us to be extremely reactive to new ideas”

A character wearing a mask and tactical jacket poses with both arms up on a yellow wall in The Finals
Image credit: Embark Studios

The Finals, the cash-grabbing, destructible-scenery multiplayer FPS from former Battlefield devs, is currently in the middle of an open beta that allows anyone to go and give its shoot-y, smash-y, cash-y gameplay a look. You’ll also be able to give it a listen - and you might notice something a little off about the announcers for its game show-within-a-game when you do. That’s right: they’re AI-generated voices, not human actors.

Embark Studios audio designer Andreas Almström confirmed the decision to use AI text-to-speech technology on a podcast episode about the making of The Finals back in July, which was recently spotted by Gianni Matragrano. Matragrano is himself an actor for video games, having appeared in Genshin Impact, Evil West, Trepang2 and more.

“So here’s the kicker: what did the voiceovers?” Almström replied when asked by the podcast host about who provided the voiceovers in The Finals. “The thing is, we used AI with a few exceptions.”

Almström explained that “all the contestant voices, like the barks, and both of our commentators are AI text-to-speech”, with “things we call vocalisations” - including the breathy noises and grunts made by player characters when running, vaulting and jumping - provided by Embark’s own devs. Not for lack of trying, mind: “We can’t really get the AI to perform those kinds of tasks, yet,” Almström said.

Almström claimed that the decision to use AI-generated voices came down to the technology’s ability to produce voices that sound close enough to human, in far less time than it takes to get human actors in the booth.

“The reason that we go this route is that AI text-to-speech is finally extremely powerful,” Almström said. “It gets us far enough in terms of quality and allows us to be extremely reactive to new ideas and keeping things really, really fresh.”

The aim of getting “far enough” can be heard fairly clearly in a clip of The Finals’ announcers shared by Matragrano, with notably strange stressing of certain words and a bizarre flow to sentences - listen to “the team that tucks away enough money first triumphs” from 0:08 in the clip below, where the lack of a pause means “first triumphs” almost blends into a single word. Unreal Tournament, this ain’t.

“If it sounds a bit off, it still blends kinda well with the fantasy of the virtual game show aesthetically,” Almström added in the podcast, seemingly heading off criticism of the uncanny AI voice performances. Personally, I’m not convinced - it doesn’t sound like a futuristic virtual announcer, just a text-to-speech programme that doesn’t know how to approach simple words in the way a regular human would, let alone a professional actor.

Matragrano questioned Almström’s claim that it takes “months” to record voiceover for a new game mode created by a designer - something Almström said took a “matter of hours” with AI - retorting that human actors are accustomed to recording higher-quality sessions within a day or two.

Of course, that’s without pointing out the obvious ethical concerns surrounding the process of AI generation and its use in place of paid actors - something that has been a key point in the recent SAG-AFTRA strikes, as the union voiced concern around digital replicas of actors being used without informed consent or suitable payment.

“You can literally get pro-grade VO for less than a grand total, bang out a couple recording sessions and bam you have all the audio you need,” Matragrano said. “We actually make it very easy. And then it'll just sound good and not be something even players who don't really care about AI ethics keep complaining about.”

In another recent case of AI being used in a video game, CD Projekt Red used the tech to recreate the voice of late Polish actor Miłogost Reczek - with the permission of the actor’s family - for the reappearance of Cyberpunk 2077’s ripperdoc Viktor Vektor in this year’s expansion Phantom Liberty. Elsewhere, God of War Ragnarok used similar tech to de-age dialogue recorded by Atreus actor Sunny Suljic to match his earlier prepubescent tone.

Recreating the voice of a deceased actor - even with their family’s permission - and transforming dialogue originally performed by a human still feels notably different to replacing human performers entirely in your game, but it’s a situation that Almström at least expects to become more and more commonplace.

“We’re really coming into, like, a new dawn when it comes to video game voices,” the audio designer said.

For better or worse, and regardless of what you think of The Finals’ AI voices, that is undoubtedly true.

About the Author
Matt Jarvis

Contributor

After starting his career writing about music, films and video games for various places, Matt spent many years as a technology, PC and video game journalist before writing about tabletop games as the editor of Tabletop Gaming magazine. He joined Dicebreaker as Editor-In-Chief in 2019, and has been trying to convince the rest of the team to play Diplomacy since.
