NOVA

S48 E101 | FULL EPISODE

Search Engine Breakdown

Why does a widely used internet search engine deliver results that can be blatantly racist and sexist? Two leading information researchers share their discoveries of hidden biases in the search technology we rely on every day.

AIRED: April 14, 2021 | 0:20:37
TRANSCRIPT

REPORTER: As misinformation and so-called fake news continue

to be rapidly distributed on the internet,

our reality has become increasingly shaped

by false information.

Many people don't know the difference between

something real and something created to deceive them.

SAFIYA NOBLE: I spent about 15 years in advertising and marketing,

and while I was there, Google arrived on the scene.

I understood the transformative effect that this search engine

was having in helping us curate all kinds of information.

But I was surprised, having just left advertising,

that everybody was thinking about Google

as this new public trusted resource,

because I thought of it as an advertising platform.

Most people who use search engines believe

that search engine results are fair and unbiased.

The public, and especially kids and young people,

use search engines to tell them the facts about the world.

One weekend, my nieces were coming over to hang out,

and I was thinking, "Oh, let me pull my laptop out

"and see if I can find some cool things

for us to do this weekend."

I just thought to type in "Black girls,"

and the whole first page

of search results was almost exclusively pornography

or hyper-sexualized content.

In 2012, I started to see some of the results changing.

Google had started to suppress the pornography

around Black girls.

Unfortunately, still today, we see pornography

and a kind of hyper-sexualized content as the primary way

in which Latina and Asian girls are represented.

"What makes Asian girls so attractive," "Asian fetish,"

"hot ladies from Asians," "see who we rank number one in 2020,"

"tender Asian girls," "meet world beauties."

This is the study that was done by the Markup

that replicated my study from ten years ago.

They found that Black girls, Latina girls, and Asian girls,

those phrases were so profoundly linked with adult content.

Zero for white girls, zero for white boys.

There are so many racial stereotypes

and gender stereotypes that show up in search results.

What about actual girls and children

who go and look for themselves in these spaces?

It's very disheartening.

When women become sex objects in a space like this,

it's really profound, because the public generally relates

to search engines as kind of fact checkers.

Before we were so heavily reliant upon a database,

we used something like a card catalogue.

We didn't rank content; it was alphabetical.

It also might be by subject.

It's a summary of the organization system

we call the Dewey Decimal System.

NOBLE: Now when we're in a subject, we know there is

a lot related to that one item

that we might be looking for.

We might go look for a book in the stacks, for example,

and find that there's hundreds of books around that one

that tell us something about that book,

and we might serendipitously find all kinds of other bits

of information that are amazing.

But we can see a little bit more about the logics of that.

We don't understand the logics of how certain things

make it to the first page in a search.

Google has a very complicated

and nuanced algorithm for search.

Over 200 different factors go into

how they decide what we see.

Of course, they're indexing about half

of all of the information that is on the web,

and even that is trillions of pages.

AD ANNOUNCER: Billions of times a day,

Google software locates all the potentially relevant results

on the web, removes all the spam, and ranks them

based on hundreds of factors, like keywords, links, location,

and freshness-- all in, oh, 0.81 seconds.
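For illustration, here is a minimal Python sketch of ranking by weighted factors in the spirit of what the announcer describes; the factor names, weights, and pages are hypothetical, not Google's actual signals.

```python
# Hypothetical illustration of ranking by weighted factors.
# Factor names, weights, and pages are invented for this sketch;
# Google's real system combines hundreds of proprietary signals.

pages = [
    {"url": "a.example", "keyword_match": 0.9, "links": 0.4, "freshness": 0.7},
    {"url": "b.example", "keyword_match": 0.6, "links": 0.9, "freshness": 0.2},
    {"url": "c.example", "keyword_match": 0.8, "links": 0.7, "freshness": 0.9},
]

WEIGHTS = {"keyword_match": 0.5, "links": 0.3, "freshness": 0.2}

def score(page):
    # Combine each factor, scaled by its weight, into a single number.
    return sum(WEIGHTS[f] * page[f] for f in WEIGHTS)

# The highest combined score lands at position one.
for rank, page in enumerate(sorted(pages, key=score, reverse=True), start=1):
    print(rank, page["url"], round(score(page), 3))
```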

NOBLE: The whole premise of a search engine is to categorize

and classify information.

A lot of the content that comes back to us on the internet,

it's in a cultural context of ranking.

We know very early what it means to be number one,

so ranking logic signals to us

that the classification is accurate,

from one being the best

to whatever is on page 48 of search,

which nobody ever looks at.

(keyboard clacking)

Part of what it's doing is picking up signals

from things that we've clicked on in the past,

that a lot of other people have clicked on,

things that are popular.

So an algorithm is, in essence, a decision tree.

If these conditions are present,

then this decision should be made.

And the decision tree gets automated

so that it becomes like a sorting mechanism.
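For illustration, a minimal Python sketch of that if-this-then-that logic automated into a sorting mechanism; the conditions, thresholds, and outcomes are hypothetical.

```python
# Hypothetical illustration of an algorithm as an automated decision tree:
# "if these conditions are present, then this decision should be made."
# Conditions, thresholds, and outcomes are invented for this sketch.

def sort_item(item):
    if item["clicks"] > 1000:       # condition: is it popular?
        if item["recent"]:          # nested condition: is it fresh?
            return "promote to top results"
        return "keep on first page"
    return "demote to later pages"  # default decision

items = [
    {"name": "popular fresh page", "clicks": 5000, "recent": True},
    {"name": "popular stale page", "clicks": 5000, "recent": False},
    {"name": "obscure page", "clicks": 12, "recent": True},
]

# The tree runs automatically over every item: a sorting mechanism.
for item in items:
    print(item["name"], "->", sort_item(item))
```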

Google's very reliable for certain types of information.

If you're using it in this kind of phone book fashion,

it's fairly reliable.

But when you start asking a search engine

more complex questions, or you start looking for knowledge,

the evidence isn't there that it's capable of doing that.

It's this combination of hyperlinking,

it's a combination of advertising and capital,

and also what people click on

that really drives what we find on the web.

This is where we start falling into trickier situations,

because those who have the most money are really able

to optimize their content better than anyone else.

There have been great studies about the disparate impact

of what a profile online says about who you are.

LATANYA SWEENEY: I was the first

African American woman to get

a PhD in computer science at M.I.T.

So, I visit Harvard.

I'm being interviewed there by a reporter,

and he wants to see a particular paper that I had written.

So, I go over to my computer,

I type in my name into Google's search bar,

and up pops this ad implying I had an arrest record.

He says, "Ah, forget that article.

Tell me about the time you were arrested."

I said, "Well, I have never been arrested."

And he says, "Then why does your computer say

you've been arrested?"

So I click on the ad, I go to the company to show him

not only did I not have an arrest record,

but nobody with the name "Latanya Sweeney"

had an arrest record.

And he says, "Yeah, but why did it say that?"

If you type in the name "Latanya"

in the Google image search,

you can see a lot of Black faces staring back.

Whereas if I type "Tanya,"

I see a lot of white faces staring back.

So we get the idea that there are some first names

given more often to Black babies than white babies.

So, I then took a month

and I researched almost 150,000 ad deliveries

around the country, and I found that if your name was given

more often to white babies than Black babies,

the ad would be neutral.

And if your first name was given more often

to Black babies than white babies,

you were 80% likely to get an ad

implying you had an arrest record,

even if no one with your name

had any arrest record in their database.

NOBLE: One specific way that algorithms discriminate

is that they just are too crude.

The idea of if x, then y,

if you have this type of name,

it means you're automatically associated with criminality.

That blunt, crude kind of association,

that is the staple logic of how algorithms work.

The types of bias that we find on the internet are often blunt.

We are being profiled into similar groups of people

who do the kinds of things that we might be doing,

and we're clustered and sold as a cluster to advertisers.
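For illustration, a minimal Python sketch of clustering users by shared behavior, with hypothetical users and signals; real ad platforms use far richer profiles.

```python
# Hypothetical illustration of profiling: users with similar behavior
# are grouped, and the group, not the individual, is sold to advertisers.
# The users and interest signals are invented for this sketch.

from collections import defaultdict

users = [
    {"id": "u1", "signals": ("travel", "photography")},
    {"id": "u2", "signals": ("travel", "photography")},
    {"id": "u3", "signals": ("gaming", "tech")},
]

# Bucket users who do the same kinds of things together.
clusters = defaultdict(list)
for user in users:
    clusters[user["signals"]].append(user["id"])

# Each bucket is what an advertiser can buy access to.
for signals, members in clusters.items():
    print("cluster", signals, "->", members)
```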

And so there's certainly a commercial bias.

But we also have the bias of the people

who design the technologies.

To think that technologies will be neutral or never have bias

is really an improper framing.

Of course there will always be a point of view

in our technologies.

The question is, is the point of view in service of oppression?

Is it sexist? Is it racist?

SWEENEY: Here I was, a passionate believer in the future

of equitable technology, and if the people,

when they were hiring me at Harvard, had typed my name

into the Google search bar and paid attention to this ad,

it would have put me at a disadvantage.

And not just me, but a whole group of Black people

would be placed at a disadvantage.

How could these biases of society be invading

the technology that I really had grown to love?

And now civil rights was up for grabs

by what technology design allowed or didn't allow.

Google's ad delivery system is really quite amazing.

You click on a web page, and that web page has a slot

where an ad is going to be delivered.

And in that fraction of a second,

while the page is being delivered,

Google runs a fast digital auction.

And in that digital auction,

they decide which of the competing ads is going to be the ad

they're going to place right there.

At first, the Google algorithm

will choose one of them randomly,

but if somebody clicks on one, then that one becomes

weighted more heavily, to be delivered more often.
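For illustration, a minimal Python sketch of that click-weighting feedback loop; the ad names, starting weights, and update rule are hypothetical, not Google's actual auction.

```python
# Hypothetical illustration of click-weighted ad delivery: competing ads
# start with equal weight, and each click makes an ad more likely to win
# future deliveries. Names, weights, and the update rule are invented.

import random

# Two competing ads for the same slot; both start with equal weight.
weights = {"neutral ad": 1.0, "arrest-record ad": 1.0}

def pick_ad():
    # Weighted random draw: at first this is effectively a coin flip.
    ads, w = zip(*weights.items())
    return random.choices(ads, weights=w, k=1)[0]

def record_click(ad):
    # A click bumps the ad's weight, so it wins future auctions more often.
    weights[ad] += 1.0

# If people click the stereotyped ad more often, the system amplifies
# that preference: society's bias gets baked into delivery.
for _ in range(100):
    shown = pick_ad()
    if shown == "arrest-record ad":
        record_click(shown)

print(weights)  # the clicked ad now dominates future draws
```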

So, one way the discrimination in online ads could have happened

is that society was biased in which ads people clicked most often,

and this would have represented the bias of society itself.

Our technology and our data sharing are so powerful

that they are kind of like the new policy maker.

We don't have oversight over these designs, and yet,

how the technology is designed dictates the rules we live by.

And this meant that we were moving from a democracy

to a new kind of technocracy.

I became the chief technology officer

at the Federal Trade Commission.

They're sort of the de facto police department

of the internet.

One of the experiments that I had done while I was at the FTC

showed that everyone's online experience is not the same.

What we lose with our hyper-reliance upon

search technologies and social media is that

the criteria for surfacing what's most important

can be deeply manipulated.

One of the hardest case studies to write in my book

was about Dylann Roof.

He went online and he was trying to make sense

of the trial of George Zimmerman.

And the first thing that I guess I can say,

I would say woke me up,

you know, would be the Trayvon Martin case.

REPORTER: Trayvon Martin, an unarmed Black teenager,

was shot down by a white neighborhood watchman

who claimed self-defense.

Eventually I decided to, you know, look his name up,

just type him into Google,

you know what I'm saying?

For some reason, it made me type in the words

"Black on white crime."

NOBLE: We know from Dylann Roof's own words that the first site

that he comes to is the Council of Conservative Citizens.

The CCC is an organization

that the Southern Poverty Law Center calls vehemently racist.

And that's, that was it, ever since then.

NOBLE: Let's say he had been my student.

I could've just immediately said, "Did you know that

that phrase is kind of a racist red herring?"

The FBI statistics show us that the majority of white people

are actually killed by other white people.

But instead, he goes to the internet and he finds the CCC,

and he goes down a rabbit hole of white supremacist websites.

Did you read a lot? Did you read books,

or watch videos, or watch movies or YouTube,

or anything like that

specifically about that subject matter?

No, it was pretty much just reading articles.

Reading articles? Yeah.

NOBLE: And we know that shortly thereafter,

he goes into a church, murders nine African Americans,

and says his intent is to start a race war.

This is not an atypical possibility.

When you don't get a counterpoint to the query,

you don't get Black studies scholarship,

or FBI statistics, or anything that would reframe

the very question that you're asking.

This is an extreme case of acting upon

white power radicalization, but this is not unlike

things that are happening right now every day in search engines,

on Facebook, on Twitter, on Gab.

People are being targeted and radicalized

in very dangerous ways.

This is what is at stake when people are so susceptible

to disinformation, hate speech, hate propaganda in our society.

SWEENEY: Racism itself can't be solved by technology.

The question is, to what extent can we make sure technology

doesn't perpetuate it,

doesn't allow harms to be done because of it?

We need a diverse and inclusive community in the design stage,

in the marketing and business stage,

in the regulatory and journalism stages, as well.

NOBLE: I am really interested in solutions.

It's easy to talk about the problems,

and it's painful, also, to talk about the problems.

But that pain and that struggle should lead us

to thinking about alternatives.

Those are the kind of things that I like to talk to

other information professionals and researchers

and librarians about.

As a person who has a name

that doesn't sound like Jennifer, right?

Or Sarah, or something.

That paper made the difference for me,

because I was just this grad student,

and you were this esteemed Harvard professor,

and you were having these experiences, too.

When I think about the foundations

of something like ethical A.I.,

I go back to you and that early paper.

I think what I feel most hopeful about

is that there's this new cottage industry called ethical A.I.,

and I know that our work is profoundly tied to that.

But on another level, I feel like

these predictive technologies are so much more ubiquitous

than they were ten years ago.

You know, what I find really painful is that

as we move forward, it's harder to track.

One thing that becomes clear is,

we could use a heck of a lot more transparency.

As a computer scientist, my vision

is, I want society to enjoy the benefits

of all these new technologies without these problems.

Technology doesn't have to be made this way.

NOBLE: That's right, that's right.

I see so many more women and girls of color

interested in these conversations,

and one of the things that I also see is that we see different things

because we ask different questions

based on our lived experiences.

Just the fact that the questions are being raised

means that the space is less hostile,

means there's an opportunity for your voice.

And the other thing

that's really important about this work:

it means a new way of thinking

about computer science.

It's in this conversation with you that I see a future.

I'm hopeful because it's not one isolated paper,

but in fact, it's a movement

toward asking the right questions, exposing

the right unforeseen consequences,

and pushing this forward towards a solution.

NOBLE: Some questions cannot be answered instantly.

Some issues we're dealing with in society,

we need time and we need discussion.

How can we look for new logics and new metaphors

and new ways to get a bigger picture?

Maybe we can see when we do that query

that it's nothing but propaganda,

and we can even see the sources of the disinformation farms,

maybe we can see the financial backers.

There's a lot of ways that we can reimagine

our information landscape.

So, I do feel like there is some hope.
