Please note that Tell’s comments are not edited for brevity, as this is heady stuff and I didn’t want to do anything to muddle these complicated concepts.
Tavaglione: Is Gullfoss intended for most/all tasks, or particular applications? Would you say it’s advisable for individual tracks, sub-groups or primarily whole mixes?
Tell: We have not really finished exploring the usefulness of GF. It’s definitely useful on individual instruments, mix busses and the master stereo sum, but we’ve also cleaned up dialog and greatly enhanced stereo live recordings, especially those done with very simple equipment. You know how frustrating it is if you put your 2-track recorder on the table in your acoustic jam session and the recording sounds nothing like it was when you were there? GF fixes that amazingly well and restores that feeling of being right in the middle of it, so we certainly won’t limit the use cases to a few intended ones. As a scientist, I love to experiment and I can only recommend trying the same.
Tavaglione: How did you come up with this concept?
Tell: I was doing a lot of live sound for a band I was involved with. I usually received compliments for the sound, but I was never quite entirely happy myself. At some point, I realized that the problem was nothing I could solve with an EQ—or, in fact, any tool available at that time. I didn’t quite know what the solution could be until another question randomly came up in a discussion: “Why do waterfalls sound so pleasant?” You’re probably tempted to answer with “Because they sound like pink noise!” but that’s not really an answer. Why does pink noise sound pleasant then?
At this time, I was already deeply involved in researching auditory perception and we had been working on this particular model for many years, so it was obvious to try to use the model to see what pink noise does to it. We found that pink noise could even be improved upon, but also, more importantly, that the reason why it sounded good was very deep and could be generalized to a much more powerful concept that would also give an answer to the other question about what can be done with live sound to improve it beyond simple equalization.
The actual insight is difficult to formulate in simple terms, but it is essentially about information. What we found was that the amount of information reaching the brain, as modeled by our perception algorithm, could be maximized by dynamically equalizing the signal depending on its content. Maximizing this information translates to a more pleasing sound with more detail, clarity, spatial precision and presence. And it is exactly what GF does.
Tavaglione: How did you manage to model the psycho-acoustic perception and traits that we desire in our audio?
Tell: The model we use is based on an idea of mine that I came up with more than 15 years ago. In order to understand where I came from, it’s also important to understand that we are not really using psychoacoustics.
“Psychoacoustics” is the name of the discipline that describes acoustical perception by means of empirical methods. The methods usually involve listening tests, and produce tables listing the description of the perception of test subjects to certain sound stimulation. Such experiments are extremely difficult to interpret fundamentally and hard to evaluate because of the subjectivity of perception. Also, because hearing is very deeply nonlinear, these specific listening conditions and stimuli are not easily extrapolated to a more general auditory scenery. In other words, working with psychoacoustic methods is a lot of frustrating guesswork and bad approximation.
I was very aware of that back then and did not find the signal processing methods related to perception and, more generally, time-frequency processing very satisfactory. The academic physics research I was doing was all about quantum theory, but apparently auditory perception was always in the back of my head. The mathematical methods I had used and developed eventually inspired me to take a different approach to time-frequency analysis. This led me to a formulation of time and frequency that was deeply about geometry and information, and, later, to a purely theoretical construct that describes perception as a process that, under the pressure of evolution, optimizes certain informational properties. The resulting computational perception model is therefore based on first principles with very few free parameters that depend on the actual physical realization of the human auditory system and can be estimated easily.
This is also why GF is all about information. We don’t apply any measures of aesthetics based on what music has been successful earlier. Instead, we try to please your brain in a very fundamental way and leave aesthetic decisions to the user.
Tavaglione: Is Gullfoss considered to be “artificial intelligence” or simply objective intelligence?
Tell: This is a difficult question. The modern understanding of artificial intelligence is very closely linked to machine learning, which does not find any application in our technology. As a theoretical physicist, I much prefer to understand every aspect of a model and work from first principles.
Training a black-box neural network with examples of what it should be doing can be impressively successful and recent progress in deep learning certainly gave a few stunning examples of that. However, it also has a number of problems. Machine learning methods often reproduce very well what they have learned, but then fail spectacularly at extrapolation. Another related problem is that of “overfitting” or favoring certain aspects over others. There is very little you can do about it other than training longer with more examples, but the selection of these examples already creates a bias for the result. So, in this context I would probably say no, GF is not artificial intelligence; it is insight and careful design.
Tavaglione: How can Gullfoss make so many adjustments so quickly without loss of phase cohesion or creating artifacts?
Tell: That was, in fact, a big problem in the development of GF. We’re clearly not the first to offer highly dynamic equalization changes, but I believe we do it better. A very obvious approach would be to use a linear phase response that can be easily created using known methods and then applied using time-varying fast convolution or FFT filtering, as it is often called. Such an approach would not really work for us. Linear phase introduces pre-ringing that is a lot more audible than post-ringing, because of temporal auditory masking, and the usual frequency response design methods are far too imprecise for our needs.
Alternatives include a graphic equalizer with a large enough number of bands that are controlled in real time. That also wouldn’t have worked for us for several reasons; lack of proper control over the magnitude frequency response is probably the most important one.
So we have developed an entirely new method that is based on a different approach. The basic idea is not to alter the signal in the first place, but to alter the perceived sound in the auditory model and then work backwards from there. This gives us unique filters that have just the right trade-off between pre- and post-ringing so that the alterations happen in a way that is compatible with perception. We can control those filters with extreme precision and high agility and still preserve a natural sound.
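For readers who want to see the pre-ringing Tell mentions, here is a minimal Python sketch. It is an illustration only and has nothing to do with Soundtheory's actual filters, which are not described in this interview. It assumes a textbook windowed-sinc low-pass as the linear-phase example and a simple one-pole recursive filter as the causal comparison; the function names, tap count and cutoff are hypothetical choices made for the demonstration.

```python
import numpy as np

def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase FIR low-pass (assumed textbook design).

    cutoff is a fraction of the sample rate, between 0 and 0.5.
    """
    # Centre the time axis so the impulse response is symmetric.
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # normalise to unity gain at DC

def pre_peak_energy_fraction(h):
    """Fraction of impulse-response energy that arrives before the main peak."""
    peak = int(np.argmax(np.abs(h)))
    total = float(np.sum(h ** 2))
    return float(np.sum(h[:peak] ** 2)) / total

# Linear-phase filter: symmetric response, so it rings before the peak too.
linear_phase = windowed_sinc_lowpass(101, 0.05)

# Causal one-pole low-pass: its impulse response peaks at the first sample.
one_pole = 0.2 * 0.8 ** np.arange(101)

lin_frac = pre_peak_energy_fraction(linear_phase)
iir_frac = pre_peak_energy_fraction(one_pole)

print(f"linear-phase energy before peak: {lin_frac:.3f}")
print(f"one-pole energy before peak:     {iir_frac:.3f}")
```

The symmetric linear-phase response places part of its energy ahead of the main peak, which is heard as ringing before a transient, while the causal filter places none there. How much of each kind of ringing is acceptable is the trade-off Tell describes.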
Tavaglione: Is this the end, or the beginning? That is, does Soundtheory hope to apply such intelligence to other audio processing functions? What might we expect from you in the future?
Tell: Oh, we’re definitely not stopping here! Our auditory perception model has many applications that we need to explore. And apart from that perception model, we also have a few more algorithms that may find use in a future product. Where exactly we are going has not been decided yet, but in the short term, we will invest more work into GF to make it even better; what comes after that remains to be seen.