
Insights Gained During the Development of an AI-Powered Opera




The AI opera "Chasing Waterfalls," staged at Semperoper Dresden, Germany, in September 2022, is a groundbreaking production that merges art and technology. The opera uses waterfalls as a symbol for surreal, immersive landscapes and captures the poetic side of the exploration driven by rapid AI development.

The Singing-Voice-Synthesis (SVS) system used for this opera is based on HifiSinger and DiffSinger. The project team, tasked with synthesizing a convincing opera voice for the AI, enlisted the services of T-Systems MMS.

The creation of the SVS system involved training AI models on royalty-free compositions recorded by a professional opera singer. The final training for the transformer acoustic model took 20 hours, while the diffusion decoder required 30 hours. Pretraining the vocoder took 120 hours, and fine-tuning it took another 10 hours.
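To put the reported durations in perspective, the figures above can be tallied in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the four stages ran one after another, which the text does not state, and it counts only the final training runs, not earlier experiments. The names `TRAINING_HOURS`, `total_hours`, and `share_by_stage` are made up for this illustration.

```python
# Final-training durations in hours, as reported in the text.
TRAINING_HOURS = {
    "transformer acoustic model": 20,
    "diffusion decoder": 30,
    "vocoder pretraining": 120,
    "vocoder fine-tuning": 10,
}


def total_hours(stages):
    """Sum the wall-clock hours over all stages (assumes sequential runs)."""
    return sum(stages.values())


def share_by_stage(stages):
    """Return each stage's fraction of the total training time."""
    total = total_hours(stages)
    return {name: hours / total for name, hours in stages.items()}


if __name__ == "__main__":
    total = total_hours(TRAINING_HOURS)
    print(f"total: {total} h ({total / 24:.1f} days)")
    for name, share in share_by_stage(TRAINING_HOURS).items():
        print(f"{name}: {share:.0%}")
```

Under that assumption the vocoder (pretraining plus fine-tuning) accounts for the bulk of the final training time.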

Controlling what the model synthesizes proved to be a challenge. The artists wanted some degree of control, but a sophisticated control mechanism was beyond the scope of the project. Instead, Global Style Tokens (GSTs) were employed. They delivered reasonable results and met the requirement of being able to change something about the output, although the level of control was lower than desired.
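The general idea behind Global Style Tokens can be sketched as follows. This is a simplified, pure-Python illustration of the published GST concept (a bank of learned embeddings combined by attention weights), not the code used in the production. The token values, the dimensions, and the function names `attend` and `style_embedding` are assumptions made for this example, and the single-head dot-product attention stands in for the multi-head attention of the original design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Training-time path: derive token weights from a reference embedding
    via scaled dot-product attention over the token bank."""
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    return softmax(scores)


def style_embedding(weights, tokens):
    """Combine the token bank into one style vector (weighted sum)."""
    if len(weights) != len(tokens):
        raise ValueError("need exactly one weight per style token")
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


# Hypothetical bank of three 4-dimensional tokens; in a real model these are learned.
TOKENS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Inference-time control: skip the reference audio and set the weights by hand.
manual_style = style_embedding([0.0, 0.25, 0.75], TOKENS)

# Training-time behaviour: weights come from a (here made-up) reference embedding.
learned_weights = attend([0.2, 0.1, 0.9, 0.4], TOKENS)
```

The coarse nature of the control is visible here: the only handle is a small vector of token weights, and what each token changes in the voice is learned rather than specified by the user.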

The dataset for the project consisted of 56 pieces, totaling 3 hours and 32 minutes of audio. Because the system had to synthesize snippets of at least 16 seconds during inference, local attention was used in the decoder. Using local attention in the encoder as well brought a further improvement in subjective quality.
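Local attention restricts each position to a fixed neighborhood instead of the whole sequence, so the computation per position does not depend on how long the snippet is. A minimal sketch using a banded mask is shown below. The window size and the functions `local_attention_mask`, `masked_softmax`, and `local_attention_weights` are illustrative assumptions; the text does not specify the window or the exact formulation used in the project.

```python
import math


def local_attention_mask(seq_len, window):
    """mask[i][j] is True when position i may attend to position j,
    i.e. when the two positions are at most `window` steps apart."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, allowed):
    """Softmax over the allowed positions only; masked positions get weight 0."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    m = max(kept)  # the diagonal is always allowed, so `kept` is never empty
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention_weights(scores, window):
    """Turn a square matrix of raw attention scores into local attention weights."""
    mask = local_attention_mask(len(scores), window)
    return [masked_softmax(row, allowed) for row, allowed in zip(scores, mask)]


# Usage: uniform scores over 5 positions, each attending to its direct neighbors.
uniform_scores = [[0.0] * 5 for _ in range(5)]
weights = local_attention_weights(uniform_scores, window=1)
```

With uniform scores, an interior position spreads its weight evenly over itself and its two neighbors, while positions outside the window receive zero weight regardless of sequence length.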

Copyright law for music is unclear on this point, and it is uncertain who would hold the copyright to the model outputs and to the model itself. This remains an open legal question.

The opera itself was composed for six human singers and one AI voice, which perform together with a human orchestra and electronic sound scenes. In one scene of the opera, the AI character is supposed to compose for itself.

The work was carried out on a machine equipped with two A100 GPUs, 1 TB of RAM, and 128 CPU cores. The project ran from November 2021 until August 2022, with the premiere in September 2022.

In conclusion, the AI opera "Chasing Waterfalls" is a testament to the potential of AI in the realm of music and art. It serves as a stepping stone towards a future where AI and art intertwine to create captivating experiences.

Technology played a significant role in the creation of the AI opera "Chasing Waterfalls," as it incorporated various AI systems to synthesize the opera voice and compose one scene of the opera. The entertainment industry may further explore the use of technology in music, as the AI-generated opera demonstrated its potential to create captivating experiences.
