Sound & Audio Industry Trends: AI, Machine Learning and DNN
AI is now being described as a disruptive technology for the media and its affiliated industries, yet in many ways the sound and audio industry was an early adopter of what we would describe today as artificial intelligence (AI).
Of course, sound and audio engineers like to be in full control of the sounds they work with. Whether it is making minor adjustments to level and EQ according to what they hear and their experience, or starting a whole project from scratch, it always comes down to the degree to which they can manipulate sound themselves instead of trusting what an oscilloscope might be telling them.
But there will always be duties and processes that are far more time-consuming, and even dull, so anything that can make life easier by taking over these repetitive activities deserves to be brought to the table.
Flying Faders became a pivotal element in music recording almost 40 years ago, allowing EQ settings and fader positions to be stored in a computer connected to the mixing console and recalled later when required. Less flashy, perhaps, but the work of the ubiquitous compressor, which allows engineers to control peak levels and limit the dynamic range of vocal or instrument lines, is another early example of what would later be defined as AI.
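The store-and-recall principle behind console automation can be sketched in a few lines. This is a toy illustration of the concept, not how Flying Faders actually worked; the class and method names are invented for the example:

```python
import bisect

class FaderAutomation:
    """Store timestamped fader levels and recall them later: the basic
    idea behind console automation. (Illustrative sketch only.)"""

    def __init__(self):
        self.times = []   # seconds, kept sorted
        self.levels = []  # dB values written by the engineer

    def write(self, t, level_db):
        # Record a fader move at time t, keeping the timeline sorted.
        i = bisect.bisect_left(self.times, t)
        self.times.insert(i, t)
        self.levels.insert(i, level_db)

    def recall(self, t):
        # Return the most recently written level at or before time t
        # (assumes at least one level has been written).
        i = bisect.bisect_right(self.times, t) - 1
        return self.levels[max(i, 0)]
```

Write a level at 0 s and another at 8 s, and a recall at 5 s returns the earlier value, which is loosely the behavior the original hardware provided during playback.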
Controlling dynamic range with a compressor could, to some extent, fit a definition of artificial intelligence because it is data-driven: it removes the need for an individual, or rather, the need for a human brain. Anything that is automated within this and many other industries can be referred to as AI. While the term has substantial scientific foundations, it has also become a one-size-fits-all phrase to describe things outside its original realm.
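To see how mechanical, and therefore how automatable, this kind of gain control is, here is a minimal sketch of a static compressor curve in Python. The function and parameter names are illustrative, and real compressors add attack and release smoothing on top of this:

```python
import numpy as np

def compress(signal, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static (sample-by-sample) dynamic range compression sketch.

    threshold_db: level above which gain reduction kicks in
    ratio: how strongly levels above the threshold are reduced
    """
    eps = 1e-10
    level_db = 20.0 * np.log10(np.abs(signal) + eps)      # instantaneous level
    over_db = np.maximum(level_db - threshold_db, 0.0)    # amount above threshold
    gain_db = -over_db * (1.0 - 1.0 / ratio) + makeup_db  # static gain curve
    return signal * 10.0 ** (gain_db / 20.0)
```

Everything above the threshold is scaled down by the ratio; once those two numbers are set, no human judgment is involved.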
To complicate things a bit more, there has also been some confusion with other terms, such as machine learning and digital neural networking. All three are often used interchangeably, but they should not be. Artificial intelligence as a term has been around for several decades and, within computer science, is the big overarching category.
Some engineers, though, especially those who argue that AI will not replace engineers in the near future, say they prefer the term machine learning because it connotes a method of data analysis that can examine extracted information without working to a prescribed model.
Machine learning in turn differs from digital neural networking, because the latter is based on large amounts of training data derived from specific situations, such as recognizing particular types of vocal lines, accents, or sounds such as traffic and other environmental noise.
Digital neural networking is, in other words, one way of implementing machine learning, albeit a more narrowly focused one. Some audio and sound post-production studios first employed aspects of machine learning back in 2012, when the DNS 8 Live multi-channel dialogue noise suppressor was launched; among other features, it was able to remove background noise from speech. It was followed by the DNS 2, a portable unit specially designed for location recording and capable of dealing with noise such as rain, traffic, and wind.
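The processing inside commercial suppressors such as the DNS units is proprietary, but the underlying idea of estimating what the noise looks like in the spectrum and attenuating it can be sketched with simple spectral gating. This is an assumption-laden stand-in for the real algorithms, with an explicit noise profile taking the place of what a trained model would learn from data:

```python
import numpy as np

def spectral_gate(noisy, noise_profile, n_fft=1024, hop=256, floor=0.1):
    """Suppress stationary background noise with a per-bin spectral gate.

    noise_profile: a noise-only excerpt used to estimate the noise
    spectrum, standing in for what a DNN-based suppressor would learn
    from training data. Assumes len(noisy) >= n_fft; window
    normalization is omitted for brevity.
    """
    def stft(x):
        frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
        return np.fft.rfft(frames * np.hanning(n_fft), axis=1)

    noise_mag = np.abs(stft(noise_profile)).mean(axis=0)  # average noise spectrum
    spec = stft(noisy)
    mag = np.abs(spec)
    # Attenuate bins that do not rise clearly above the noise estimate.
    gain = np.clip((mag - 2.0 * noise_mag) / (mag + 1e-10), floor, 1.0)
    spec *= gain
    # Overlap-add resynthesis back to a time-domain signal.
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(len(noisy))
    for i, frame in enumerate(frames):
        start = i * hop
        out[start:start + n_fft] += frame * np.hanning(n_fft)
    return out
```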
Machine learning was also used to recognize pieces of clean speech and the types of noise that could affect its quality. Other tools that rely on this method are able to identify the vocals and specific instruments in a song and then carry out individual gain control of a particular element. It is also possible to completely remove or completely isolate the vocal lines, a popular feature in other tools used across the audio and sound industry.
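Under the hood, many of these separation tools estimate a time-frequency mask: a trained model predicts how much of each spectrogram bin belongs to the target source. Assuming such per-source magnitude estimates are already available (they are hypothetical inputs here, normally produced by the model), applying the mask to isolate or remove a vocal is straightforward:

```python
import numpy as np

def isolate_source(mixture_spec, est_target_mag, est_other_mag):
    """Apply a soft (ratio) mask to pull one source out of a mixture.

    mixture_spec: complex STFT of the full mix
    est_target_mag / est_other_mag: magnitude estimates for the target
    (e.g. vocals) and everything else, as a separation model would predict.
    """
    mask = est_target_mag / (est_target_mag + est_other_mag + 1e-10)
    isolated = mixture_spec * mask         # keep the target (e.g. vocals)
    removed = mixture_spec * (1.0 - mask)  # the complementary "karaoke" mix
    return isolated, removed
```

The same mask, scaled rather than applied outright, is what makes per-element gain control possible.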
A major benefit of machine learning and AI in audio is the ability to speed up tasks and perform processes that humans cannot. It would be extremely hard to go deep into a track and remove something piece by piece, but thanks to these tools and methods we are now able to remove artifacts from speech for ADR, for example.
Research into machine learning and general artificial intelligence for the audio and sound industry will certainly continue. Among the essential areas where these technologies will be applied are the ongoing development of smart speaker interfaces and AI-based intelligent signal processing focused particularly on source and channel coding.
Perhaps in one or two decades, developers will look back at the work taking place now and see it as people just opening a tiny window. And although machine learning, AI, and related technologies will become more efficient over time, there is also the possibility that, within a few years, development will follow a completely different course from the one we are on now.