By Hyunsu Yim
SEOUL (Reuters) – In a dimly lit recording studio in Seoul, producers at the K-pop music label that brought the world hit boy group BTS are using artificial intelligence to meld a South Korean singer’s voice with those of native speakers in five other languages.
The technology enabled HYBE, South Korea’s largest music label, to release a track by singer MIDNATT in May in six languages: Korean, English, Spanish, Chinese, Japanese and Vietnamese.
Some K-pop singers have released songs in English and Japanese in addition to their native Korean, but applying the new technology for a simultaneous six-language release is a global first, according to HYBE, and could pave the way for it to be used by more popular acts.
“We would first listen to the reaction, the voice of the fans, then decide what our next steps should be,” said Chung Wooyong, the head of HYBE’s interactive media arm, in an interview at the company’s studio.
Lee Hyun, 40, known as MIDNATT, who speaks only limited English and Chinese in addition to Korean, recorded the song “Masquerade” in each language.
Native speakers read out the lyrics, and the two recordings were later seamlessly combined with the help of HYBE’s in-house AI music technology, Chung said.
The song is the latest sign of the growing influence of AI in the music industry at a time when the Grammy Awards have introduced new rules for the technology’s use and AI-generated mash-ups of songs are flooding social media.
“We divided a piece of sound into different components – pronunciation, timbre, pitch and volume,” Chung said. “We looked at pronunciation which is associated with tongue movement and used our imagination to see what kind of outcome we could make using our technology.”
In a before-and-after comparison shown to Reuters, for example, an elongated vowel sound was added to the word “twisted” in the English lyrics to make it sound more natural, with no detectable change to the singer’s voice.
Deep learning based on the Neural Analysis and Synthesis (NANSY) framework, developed by Supertone, makes the song sound more natural than non-AI software would, said Supertone chief operating officer Choi Hee-doo.
HYBE announced the 45 billion won ($36 million) acquisition of Supertone in January. HYBE said it planned to make some of the AI technology used in MIDNATT’s song accessible to creators and the public, but did not specify whether it would charge fees.
‘IMMERSIVE EXPERIENCE’
MIDNATT said using AI had allowed him a “wider spectrum of artistic expressions.”
“I feel that the language barrier has been lifted and it’s much easier for global fans to have an immersive experience with my music,” he said in a statement.
While the technology is not new, it is an innovative way to use AI in music, said Valerio Velardo, director of The Sound of AI, a Spain-based consulting service for AI music and audio.
Not only professional musicians but also a wider population will benefit from AI music technology in the long term, Velardo said.
“It’s going to lower the barrier of music creation. It’s a little bit like Instagram for pictures but in the case of music.”
For now, HYBE’s pronunciation correction technology takes “weeks or months” to do its job, but once the process speeds up, it could serve a wider range of purposes, such as interpreting in video conferences, said Choi Jin-woo, the producer of MIDNATT’s “Masquerade,” who goes by the name Hitchhiker.
(Reporting by Hyunsu Yim; Additional reporting by Daewoung Kim and Hyun Young Yi; Editing by Josh Smith and Jamie Freed)