Search Engine Breakdown
Why does a widely used internet search engine deliver results that can be blatantly racist and sexist? Two leading information researchers investigate their discoveries of hidden biases in the search technology we rely on every day.
REPORTER: As misinformation and so-called fake news continue
to be rapidly distributed on the internet,
our reality has become increasingly shaped
by false information.
Many people don't know the difference between
something real and something created to deceive them.
SAFIYA NOBLE: I spent about 15 years in advertising and marketing,
and while I was there, Google arrived on the scene.
I understood the transformative effect that this search engine
was having in helping us curate
through all kinds of information.
But I was surprised, having just left advertising,
that everybody was thinking about Google
as this new public trusted resource,
because I thought of it as an advertising platform.
Most people who use search engines believe
that search engine results are fair and unbiased.
The public, and especially kids and young people,
use search engines to tell them the facts about the world.
One weekend, my nieces were coming over to hang out,
and I was thinking, "Oh, let me pull my laptop out
"and see if I can find some cool things
for us to do this weekend."
I just thought to type in "Black girls,"
and the whole first page
of search results was almost exclusively pornography
or hyper-sexualized content.
In 2012, I started to see some of the results changing.
Google had started to suppress the pornography
around Black girls.
Unfortunately, still today, we see pornography
and a kind of hyper-sexualized content as the primary way
in which Latina and Asian girls are represented.
"What makes Asian girls so attractive," "Asian fetish,"
"hot ladies from Asians," "see who we rank number one in 2020,"
"tender Asian girls," "meet world beauties."
This is the study that was done by The Markup
that replicated my study from ten years ago.
They found that the phrases "Black girls," "Latina girls,"
and "Asian girls" were so profoundly linked
with a kind of adult content.
Zero for white girls, zero for white boys.
There are so many racial stereotypes
and gender stereotypes that show up in search results.
What about actual girls and children
who go and look for themselves in these spaces?
It's very disheartening.
When women become sex objects in a space like this,
it's really profound, because the public generally relates
to search engines as kind of fact checkers.
Before we were so heavily reliant upon a database,
we used something like a card catalogue.
We didn't rank content; it was alphabetical.
It also might be by subject.
It's a summary of the organization system
we call the Dewey Decimal System.
NOBLE: Now when we're in a subject, we know there is
a lot in relationship to that one item
that we might be looking for.
We might go look for a book in the stacks, for example,
and find that there's hundreds of books around that one
that tell us something about that book,
and we might serendipitously find all kinds of other bits
of information that are amazing.
But we can see a little bit more about the logics of that.
We don't understand the logics of how certain things
make it to the first page in a search.
Google has a very complicated
and nuanced algorithm for search.
Over 200 different factors go into
how they decide what we see.
Of course, they're indexing about half
of all of the information that is on the web,
and even that is trillions of pages.
AD ANNOUNCER: Billions of times a day,
Google software locates all the potentially relevant results
on the web, removes all the spam, and ranks them
based on hundreds of factors, like keywords, links, location,
and freshness-- all in, oh, 0.81 seconds.
NOBLE: The whole premise of a search engine is to categorize
and classify information.
A lot of the content that comes back to us on the internet,
it's in a cultural context of ranking.
We know very early what it means to be number one,
so ranking logic signals to us
that the classification is accurate,
from one being the best
to whatever is on page 48 of search,
which nobody ever looks at.
Part of what it's doing is picking up signals
from things that we've clicked on in the past,
that a lot of other people have clicked on,
things that are popular.
So an algorithm is, in essence, a decision tree.
If these conditions are present,
then this decision should be made.
And the decision tree gets automated
so that it becomes like a sorting mechanism.
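Noble's description of an algorithm as an automated decision tree can be sketched in a few lines. The signals, thresholds, and bucket names below are invented for illustration only; real search ranking weighs hundreds of factors, not the three shown here:

```python
# A minimal sketch of "if these conditions are present, then this
# decision should be made," automated into a sorting mechanism.
# All signals and thresholds are hypothetical.

def sort_result(page):
    """Route a page into a bucket based on simple if/then rules."""
    if page["click_rate"] > 0.10:    # popular with past users
        return "promote"
    if page["advertiser_paid"]:      # commercial signal
        return "promote"
    if page["link_count"] < 5:       # few inbound hyperlinks
        return "demote"
    return "neutral"

pages = [
    {"url": "a.example", "click_rate": 0.15, "advertiser_paid": False, "link_count": 40},
    {"url": "b.example", "click_rate": 0.02, "advertiser_paid": True,  "link_count": 3},
    {"url": "c.example", "click_rate": 0.01, "advertiser_paid": False, "link_count": 2},
]

for p in pages:
    print(p["url"], "->", sort_result(p))
# a.example -> promote
# b.example -> promote
# c.example -> demote
```

The bluntness Noble goes on to describe is visible even in this toy: whatever trips a rule gets the same treatment, with no room for context.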
Google's very reliable for certain types of information.
If you're using it in this kind of phone book fashion,
it's fairly reliable.
But when you start asking a search engine
more complex questions, or you start looking for knowledge,
the evidence isn't there that it's capable of doing that.
It's this combination of hyperlinking,
it's a combination of advertising and capital,
and also what people click on
that really drives what we find on the web.
This is where we start falling into trickier situations,
because those who have the most money are really able
to optimize their content better than anyone else.
There have been great studies about the disparate impact
of what a profile online says about who you are.
LATANYA SWEENEY: I was the first
African American woman to get
a PhD in computer science at M.I.T.
So, I visit Harvard.
I'm being interviewed there by a reporter,
and he wants to see a particular paper that I had done before.
So, I go over to my computer,
I type in my name into Google's search bar,
and up pops this ad implying I had an arrest record.
He says, "Ah, forget that article.
Tell me about the time you were arrested."
I said, "Well, I have never been arrested."
And he says, "Then why does your computer say
you've been arrested?"
So I click on the ad, I go to the company to show him
not only did I not have an arrest record,
but nobody with a "Latanya Sweeney" name
had an arrest record.
And he says, "Yeah, but why did it say that?"
If you type in the name "Latanya"
in the Google image search,
you can see a lot of Black faces staring back.
Whereas if I type "Tanya,"
I see a lot of white faces staring back.
So we get the idea that there are some first names
given more often to Black babies than white babies.
So, I then took a month
and I researched almost 150,000 ad deliveries
around the country, and I found that if your name was given
more often to white babies than Black babies,
the ad would be neutral.
And if your first name was given more often
to Black babies than white babies,
you were 80% likely to get an ad
implying you had an arrest record,
even if no one with your name
had any arrest record in their database.
NOBLE: One specific way that algorithms discriminate
is that they just are too crude.
The idea of if x, then y,
if you have this type of name,
it means you're automatically associated with criminality.
That blunt, crude kind of association,
that is the staple logic of how algorithms work.
The types of bias that we find on the internet are often blunt.
We are being profiled into similar groups of people
who do the kinds of things that we might be doing,
and we're clustered and sold as a cluster to advertisers.
And so there's certainly a commercial bias.
But we also have the bias of the people
who design the technologies.
To think that technologies will be neutral or never have bias
is really an improper framing.
Of course there will always be a point of view
in our technologies.
The question is, is the point of view in service of oppression?
Is it sexist? Is it racist?
SWEENEY: Here I was, a passionate believer in the future
of equitable technology, and if the people,
when they were hiring me at Harvard, had typed my name
into the Google search bar and paid attention to this ad,
it put me at a disadvantage.
And not just me, but a whole group of Black people
would be placed at a disadvantage.
How could these biases of society be invading
the technology that I really had grown to love?
And now civil rights was up for grabs
by what technology design allowed or didn't allow.
Google's ad delivery system is really quite amazing.
You click on a web page, and that web page has a slot
that an ad is going to be delivered.
And in that fraction of a second,
while the page is being delivered,
Google runs a fast digital auction.
And in that digital auction,
they decide which of competing ads are going to be the ad
they're going to place right there.
At first, the Google algorithm
will choose one of them randomly,
but if somebody clicks on one, then that one becomes
weighted more often to be delivered.
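The click-weighting Sweeney describes can be modeled in a few lines: competing ads start with equal chances, and each click raises an ad's weight, so the ads people click most get delivered most. The ad names and numbers are hypothetical, not drawn from any real ad system:

```python
# Toy model of click-weighted ad selection: at first ads are chosen
# at random; clicks shift the weights, so society's click bias is
# amplified into delivery bias. Everything here is illustrative.
import random

class AdSlot:
    def __init__(self, ads):
        # every competing ad starts with the same weight
        self.weights = {ad: 1.0 for ad in ads}

    def serve(self):
        """Pick an ad with probability proportional to its weight."""
        ads = list(self.weights)
        return random.choices(ads, weights=[self.weights[a] for a in ads])[0]

    def record_click(self, ad):
        # each click makes this ad more likely to be served next time
        self.weights[ad] += 1.0

slot = AdSlot(["neutral_ad", "arrest_record_ad"])
for _ in range(50):
    slot.record_click("arrest_record_ad")
print(slot.weights)  # the clicked ad now dominates the auction
```

Nothing in the mechanism refers to race, which is the point: if users click the "arrest record" ad more often for some names, the system faithfully reproduces and amplifies that bias.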
So, one way the discrimination in online ads could happen
would've been that society would have been biased
in which ads they clicked most often,
and that this would've represented the bias of society itself.
Our technology and our data sharing are so powerful
that they are kind of like the new policy maker.
We don't have oversight over these designs, and yet
how the technology is designed dictates the rules we live by.
And this meant that we were moving from a democracy
to a new kind of technocracy.
I became the chief technology officer
at the Federal Trade Commission.
They're sort of the de facto police department
of the internet.
One of the experiments that I had done while I was at the FTC
showed that everyone's online experience is not the same.
What we lose with our hyper-reliance upon
search technologies and social media is,
the criteria for surfacing what's most important
can be deeply, highly manipulated.
One of the hardest case studies to write in my book
was about Dylann Roof.
He went online and he was trying to make sense
of the trial of George Zimmerman.
And the first thing that I guess I can say,
I would say woke me up,
you know, would be the Trayvon Martin case.
REPORTER: Trayvon Martin, an unarmed Black teenager,
was shot down by a white neighborhood watchman
who claimed self-defense.
Eventually I decided to, you know, look his name up,
just type him into Google,
you know what I'm saying?
For some reason, it made me type in the words
"Black on white crime."
NOBLE: We know from Dylann Roof's own words that the first site
that he comes to is the Council of Conservative Citizens.
The CCC is an organization
that the Southern Poverty Law Center calls vehemently racist.
And that's, that was it, ever since then.
NOBLE: Let's say he had been my student.
I could've just immediately said, "Did you know that
that phrase is kind of a racist red herring?"
The FBI statistics show us that the majority of white people
are actually killed by other white people.
But instead, he goes to the internet and he finds the CCC,
and he goes down a rabbit hole of white supremacist websites.
Did you read a lot? Did you read books,
or watch videos, or watch movies or YouTube,
or anything like that
specifically about that subject matter?
No, it was pretty much just reading articles.
Reading articles?
Yeah.
NOBLE: And we know that shortly thereafter,
he goes into a church, murders nine African Americans,
and says his intent is to start a race war.
This is not an atypical possibility.
When you don't get a counterpoint to the query,
you don't get Black studies scholarship,
or FBI statistics, or anything that would reframe
the very question that you're asking.
This is an extreme case of acting upon
white power radicalization, but this is not unlike
things that are happening right now every day in search engines,
on Facebook, on Twitter, in Gab.
People are being targeted and radicalized
in very dangerous ways.
This is what is at stake when people are so susceptible
to disinformation, hate speech, hate propaganda in our society.
SWEENEY: Racism itself can't be solved by technology.
The question is, to what extent can we make sure technology
doesn't perpetuate it,
doesn't allow harms to be made because of it?
We need a diverse and inclusive community in the design stage,
in the marketing and business stage,
in the regulatory and journalism stages, as well.
NOBLE: I am really interested in solutions.
It's easy to talk about the problems,
and it's painful, also, to talk about the problems.
But that pain and that struggle should lead us
to thinking about alternatives.
Those are the kind of things that I like to talk to
other information professionals and researchers
and librarians about.
As a person who has a name
that doesn't sound like Jennifer, right?
Or Sarah, or something.
That paper made the difference for me,
because I was just this grad student,
and you were this esteemed Harvard professor,
and you were having these experiences, too.
When I think about the foundations
of something like ethical A.I.,
I go back to you and that early paper.
I think what I feel most hopeful about
is that there's this new cottage industry called ethical A.I.,
and I know that our work is profoundly tied to that.
But on another level, I feel like
these predictive technologies are so much more ubiquitous
than they were ten years ago.
You know, what I find really painful is that
as we move forward, it's harder to track.
One thing that becomes clear is,
we could use a heck of a lot more transparency.
As a computer scientist, my vision
is, I want society to enjoy the benefits
of all these new technologies without these problems.
Technology doesn't have to be made this way.
NOBLE: That's right, that's right.
I see so many more women and girls of color
interested in these conversations,
and one of the things that I also see is how we see things
because we ask different questions
based on our lived experiences.
Just the fact that the questions are being raised
means that the space is less hostile,
means there's an opportunity for your voice.
And the other thing
that's really important about this work,
it means that it's a new kind of way of thinking
about computer science.
It's in this conversation with you that I see a future.
I'm hopeful because it's not one isolated paper,
but in fact, it's a movement
toward asking the right questions, exposing
the right unforeseen consequences,
and pushing this forward towards a solution.
NOBLE: Some questions cannot be answered instantly.
Some issues we're dealing with in society,
we need time and we need discussion.
How can we look for new logics and new metaphors
and new ways to get a bigger picture?
Maybe we can see when we do that query
that that's just nothing but propaganda,
and we can even see the sources of the disinformation farms,
maybe we can see the financial backers.
There's a lot of ways that we can reimagine
our information landscape.
So, I do feel like there is some hope.