Artificial Intelligence Solved This Audio Illusion. Can You?

The Cocktail Party Effect is an auditory phenomenon that, really, humans shouldn't have to solve: normally we can automatically separate different sounds and voices through our selective attention. But it's hard, right? Well, artificial intelligence found it even harder: a machine struggled to tell the difference between audio tracks.

AIRED: December 05, 2017 | 0:04:36

And Freeze. Now, did you hear what I was saying?

Clearly enough that you could, say, write it down in the comments?

If so, you just experienced a phenomenon called

The Cocktail Party Effect.

You can hear me while there's people talking right next to us or if there's a jazz band

across the room.

This is because of selective attention - our ability to focus on one particular thing while

tuning out our surroundings.

And it's the same effect that allows us to separate the vocals from the background

music in a song.

This comes so naturally to us, but machines find these tasks extremely hard.

To a machine, a voice singing is just another track in a song that isn't easily discernible

from the piano track or the violin track or the harmonica track.

So how do you train a machine to separate voices at a party or vocals from a song like

people can?

Well, the answer lies in algorithms and lots of data.

Recently, researchers developed an algorithm that can identify the vocals in multiple songs.

And this is thanks to breakthroughs in machine learning - a method used in artificial intelligence

to allow machines to learn by analysing data.

To do so, researchers used a deep neural network - these networks are software inspired by

how our brain works.

They can learn using a method called deep learning, a kind of machine learning technique

that works through a series of layers.

An input layer, an output layer and middle hidden layers.

These hidden layers are where the magic happens.
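
As a rough illustration of that layered structure, here is a minimal sketch of data flowing from an input layer through two hidden layers to an output layer. Everything in it is an assumption made for illustration: the weights and biases are made-up numbers, the sigmoid activation is one common choice among many, and the names `layer`, `forward` and `network` are hypothetical. It is not the researchers' network.

```python
import math

def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each row of `weights` belongs to one neuron: it takes a weighted
    # sum of all inputs, adds its bias, and applies the activation.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    # Pass the data through every layer in order.
    activation = inputs
    for weights, biases in layers:
        activation = layer(activation, weights, biases)
    return activation

# Hypothetical, hand-picked weights (a trained network would learn these).
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer 1: 3 inputs -> 2 neurons
    ([[1.0, -1.0], [0.4, 0.6]], [0.0, -0.1]),            # hidden layer 2: 2 -> 2 neurons
    ([[0.7, -0.3]], [0.05]),                             # output layer: 2 -> 1 neuron
]

output = forward([1.0, 0.5, -1.0], network)
print(output)  # a list holding one number between 0 and 1
```

The input layer is simply the list of numbers passed to `forward`; the hidden layers are the entries in the middle of `network`.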

And to train an artificial neural network, you have to feed it a ton of data - just

like us, the more they know, the better they can learn.

So researchers trained their neural network by giving it 50 songs.

They let the neural network try to separate the vocals and the non-vocal components (the

other instruments), and compare its results with the correct answer - which is the particular

song already separated into the different components.

Every time the neural network gets closer to the correct result, it's rewarded.

So it improves with each run.
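
The compare-and-improve loop described above can be sketched on toy data. This is a heavily simplified stand-in, not the researchers' method: instead of a deep neural network it learns one "mask" value per frequency bin, and it improves by gradient descent on the squared error against the correct answer, which is an assumption about how the "reward" is implemented. The toy songs, the four frequency bins and the names `make_song` and `train_mask` are all hypothetical.

```python
import random

rng = random.Random(0)
N_BINS = 4  # hypothetical number of frequency bins per "song"

def make_song():
    # Toy assumption: vocals live only in bins 0-1, instruments in bins 2-3.
    vocals = [rng.uniform(0.5, 1.0), rng.uniform(0.5, 1.0), 0.0, 0.0]
    instruments = [0.0, 0.0, rng.uniform(0.5, 1.0), rng.uniform(0.5, 1.0)]
    mixture = [v + i for v, i in zip(vocals, instruments)]
    return mixture, vocals  # what the model hears, and the correct answer

def train_mask(examples, steps=200, lr=0.1):
    mask = [0.5] * N_BINS  # start by guessing "half vocals" everywhere
    history = []
    for _ in range(steps):
        total_error = 0.0
        grads = [0.0] * N_BINS
        for mixture, vocals in examples:
            for k in range(N_BINS):
                # Compare the estimate with the correct answer.
                err = mask[k] * mixture[k] - vocals[k]
                total_error += err * err
                grads[k] += 2 * err * mixture[k]
        # Nudge each mask value in the direction that reduces the error.
        for k in range(N_BINS):
            mask[k] -= lr * grads[k] / len(examples)
        history.append(total_error / len(examples))
    return mask, history

songs = [make_song() for _ in range(50)]  # 50 toy training examples
mask, history = train_mask(songs)
print([round(m, 3) for m in mask])  # close to [1, 1, 0, 0]
print(history[0], history[-1])      # the error shrinks over the runs
```

Each pass through the loop is one "run": the estimate is compared with the already-separated vocals, and the mask is adjusted so the next estimate lands closer.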

It was then tested with 13 new songs, and it correctly separated the vocals from the

background music in each one.
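
Testing on songs the system has never seen is what shows it learned something general rather than memorizing its training set. A minimal sketch of that idea, under stated assumptions: the data are synthetic, the "model" is a per-bin least-squares mask rather than a deep neural network, and `VOCAL_SHARE`, `make_song`, `fit_mask` and `mean_squared_error` are hypothetical names. Only the 50/13 split mirrors the numbers in the transcript.

```python
import random

rng = random.Random(0)

# Hypothetical fraction of each frequency bin's energy that comes from vocals.
VOCAL_SHARE = [0.9, 0.7, 0.2, 0.1]

def make_song():
    mixture = [rng.uniform(0.5, 1.0) for _ in VOCAL_SHARE]
    # Correct answer: the vocal part of each bin, plus a little noise.
    vocals = [s * x + rng.gauss(0.0, 0.02)
              for s, x in zip(VOCAL_SHARE, mixture)]
    return mixture, vocals

def fit_mask(songs):
    # Least-squares estimate of one mask value per bin.
    n_bins = len(songs[0][0])
    return [sum(m[k] * v[k] for m, v in songs) /
            sum(m[k] ** 2 for m, v in songs)
            for k in range(n_bins)]

def mean_squared_error(mask, songs):
    # Summed squared error over bins, averaged over songs.
    total = sum((mask[k] * m[k] - v[k]) ** 2
                for m, v in songs for k in range(len(mask)))
    return total / len(songs)

train_songs = [make_song() for _ in range(50)]  # used for fitting
test_songs = [make_song() for _ in range(13)]   # never seen during fitting

mask = fit_mask(train_songs)
test_error = mean_squared_error(mask, test_songs)
untrained_error = mean_squared_error([0.5] * len(mask), test_songs)
print(test_error, untrained_error)
```

If the fitted mask only worked on the 50 training songs, `test_error` would be no better than `untrained_error`; a much smaller value suggests the pattern carried over to new songs.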

It taught itself to tell the vocals apart from the other instruments.

What separates deep learning from previous types of machine learning is this layered

structure, which is modelled specifically after the cortex, the wrinkly outer layer

of the brain.

It's the part responsible for higher-order brain function like sensory perception, cognition,

spatial reasoning and language.

Basically it's the part that makes you...

different from a lizard.

It's made up of 6 layers, and different aspects of processing happen at each level.

For example, when you see an apple, the first layer might identify the color red, the second

layer detects the round edges, and so on until finally the last layer puts it all together

and says hey, that's an apple!

Deep learning software tries to imitate this hierarchical structure of neurons in the cortex.

The first few layers of a deep neural network learn to identify simple patterns, like single

units of sound.

The next layers learn to recognize more complicated patterns, like words.

Eventually, the result is that extremely complicated patterns like the entire vocals of a song

can be recognized and distinguished from the other instruments.

This layered process is at the heart of deep learning's success.

Starting with simple ideas and making them become more and more like a generalized

concept seems to capture something fundamental about intelligence.

Humans used to have a clear advantage in pattern recognition, but in 2015 a deep neural network

beat a human at image recognition for the first time.

This means we're able to make better and more sophisticated machines that can master

tasks we thought were unique to humans.

Machines are helping doctors make better diagnoses and robots are learning to cook by watching

YouTube videos.

And when a robot can learn to cook by watching YouTube videos - that makes you question

what it really means to be human.

