AI’s Drawbacks: Environmental Damage, Bad Benchmarks, Outsourcing Thinking

Channel: Alex Kantrowitz

Published at: 2025-05-14

YouTube video id: WreiudYaenc

Source: https://www.youtube.com/watch?v=WreiudYaenc

Two of AI's most vociferous critics
join us for a discussion of the
technology's weaknesses and abilities,
and a debate on the finer points of
their arguments. We'll talk about it all
after this. Welcome to Big Technology
Podcast, a show for cool-headed, nuanced
conversation about the tech world and
beyond. We're joined today by the
authors of The AI Con. Professor Emily
M. Bender is here. She's a professor of
linguistics at the University of
Washington. Emily, welcome. I'm glad to
be here. Thank you for having us on your
show. My pleasure. And we're also joined
by Alex Hanna, the director of research
at the Distributed AI Research
Institute. Alex, welcome. Thanks for
having us, Alex. Always good to have
another Alex on the show. So, look, we
try to get the full story on AI here.
And so, today we're going to bring in, I
think, two of the most vocal critics on
the technology. They're going to state
their case, and you at home can decide
whether you agree or not, but it's great
to have you both here. So, let's start
with the premise of the book. What is
the AI con? Emily, do you want to begin?
Sure. So, the AI con is actually a
nesting-doll situation of cons. Right
down at the bottom, you've got the fact
that large language models especially
are a technology that is a parlor
trick. It plays on our ability to make
sense of language and makes it very easy
to believe there's a thinking entity
inside. This parlor trick is
enhanced by various UI decisions.
There's absolutely no reason that a
chatbot should be using I, me, pronouns
because there's no I inside of it, but
they're set up to do that. So, you've
got that sort of base level con. But
then on top of that, you've got lots of
people selling technology built on chat
bots to, you know, be a legal assistant,
to be a diagnostic system in a medical
situation, to be a personalized tutor,
and to displace workers, but also um put
a band-aid over large holes in our
social safety net and social services.
So, it's cons from the bottom to
the top. Okay. I definitely have things
that I disagree with you in places on
and we will definitely get into that in
the second half especially about the
usefulness of these bots and whether
they should be using "I" and "me" pronouns,
and the whole consciousness debate. We're
going to get into that. I don't think any
of us think that these things are
conscious. I
just think we have a disagreement on how
much the industry has played that up.
But let's start with what we agree on.
And I think that from the very
beginning, Emily, you were the lead
author on the very famous paper
calling large language models
stochastic parrots. And at the very
beginning of that paper, there is
concern about the environmental issues
that large
language models might bring about. So
on this show we talk all the time about
the size of the data centers, size of
the models and of course there is an
associated energy cost that must be
paid to use these things. And so I'm
curious, Emily, or you, Alex. You worked
at Google, so you probably have a good
sense of this. Can you both quantify how
much energy is being used to run these
models?
So part of the problem is that even if
you're working at Google, even if you
are directly working on this, there
aren't very public estimates of how
much cost there is. I mean, the costs
vary quite widely, and the only cost I
think that we know was an estimate made
by folks at Hugging Face who worked
on the BLOOM model, because they were
able to actually have some kind of
insight into the energy consumption of
these models. So part of the problem is
the transparency of companies on this.
You know, as a response at Google after
the Stochastic Parrots paper was
published, one of the complaints from
people like Jeff Dean, the SVP of
research at Google, and David Patterson,
who was the lead author of Google's
rebuttal to it, was: well, you
didn't factor in XYZ, you didn't factor
in the renewables that we only talk
about at this one data center in
Iowa, you didn't factor in
off-peak training. And so that's part of
the problem. I mean, we could try to put
numbers on it, but there's so much
guardedness about what's actually
happening here that we can't quantify it. We
don't know, when it comes to model
training. I mean, we might know something
like the number of parameters in a new
model or in an open-weights model like
Llama, but we don't know how many fits
and starts there were with stopping
training and restarting or
experimenting. So, you
know, we could speculate, but we know
it's a lot because there are real
effects in the world right now. What are
those effects?
So, you see communities losing
access to water sources. You see
electrical grids
becoming less stable. And this is
starting to be, I think, very well
documented. There are a lot of
journalists on the beat doing a lot of
good work. And I also want to shout out
the work of Dr. Sasha Luccioni, who's been
looking at this from an academic
perspective and one of the points that
she brings in is that it's not just the
training of the models but of course
also the use and especially if you're
looking at the use of chatbots in
search: instead of getting back a set of
links, which may well have been cached,
you're getting back an AI Overview, which
happens non-consensually when you try
Google searches these days. Each
of those tokens has to be calculated
individually, so it's coming out one
word at a time, and that is far more
expensive. I think her number is
somewhere between 30 and 60 times more
expensive just in terms of the compute,
which then scales up for electricity,
carbon, and water, than an old-fashioned
search. I would also say that, speaking
about existing effects, there's also
a lot of reporting coming out of Memphis
right now, especially around the
methane generators that xAI has been
using to power a particular
supercomputer there called Colossus,
specifically around emissions
affecting Southwest Memphis,
traditionally a Black and impoverished
community. There's also reporting, well,
actually research, from UC
Irvine, looking at backup
generators and emissions from the diesel
units that are connected to
the grid. Just because the SLAs
on data centers are, you know,
incredibly high, you effectively need
some kind of backup to kick in at some
point, and that's going to contribute to
air pollution.
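Dr. Luccioni's "30 to 60 times" figure from a moment earlier can be made concrete with back-of-envelope arithmetic. In this sketch, the 0.3 Wh per classic cached search is Google's old published estimate; the multipliers come from the conversation; the daily query volume is a purely hypothetical assumption for scale.

```python
# Back-of-envelope sketch of the "30 to 60 times more expensive" comparison.
# SEARCH_WH is Google's old published ~0.3 Wh-per-search estimate; the
# multipliers come from the discussion above; the daily query volume is a
# purely hypothetical assumption used only to illustrate scale.

SEARCH_WH = 0.3            # energy per classic cached search, in watt-hours
QUERIES_PER_DAY = 1e9      # hypothetical daily query volume

for multiplier in (30, 60):
    llm_wh = SEARCH_WH * multiplier                      # per-query cost with generation
    extra_kwh = (llm_wh - SEARCH_WH) * QUERIES_PER_DAY / 1000
    print(f"{multiplier}x: {llm_wh:.1f} Wh/query, "
          f"~{extra_kwh:,.0f} extra kWh/day at the assumed volume")
```

Even under these toy assumptions, the gap compounds quickly at web scale, which is the point Emily makes about per-token generation versus returning cached links.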
And which communities have been affected
by the loss of water due to AI data
centers? So, I think the best reported
one is The Dalles in Oregon. I mean, I
think that's the one that is best known,
and that is kind of pre-generative-AI,
focused on the development of
Google's hyperscaling, and it wasn't
until The Oregonian sued the city that
we knew that half of the water
consumption in the city was going to
Google's data center. That was
before generative AI. That was before
generative AI. I mean, we have to
imagine the problems have probably been
exacerbated by now. But do we know
that? I mean, you both wrote the book on
this.
So, we certainly point to
environmental impacts as a really
important factor. It is not the main
focus of the book. I would refer people
to the reporting of people like Paris
Marx over at Tech Won't Save Us, who did
a wonderful series called Data
Vampires; I think there were
stories in Spain and in Chile. And
yeah, so we are
looking at the overall con, and the
environmental impacts come in because
it is something we should always be
thinking about, and also because it is
very hidden. When you access these
technologies, you're probably sitting,
you know, looking at them through your
mobile device or through your computer,
and the compute and its environmental
footprint and the noise and everything
else is hidden from you in the
immateriality of
the cloud. I would also say, on
the reporting on Memphis, I want to give
a shout out to the reporting in Prism
by Ray Levy Uyeda, and I don't know if
I'm pronouncing their surname correctly,
but they have an extensive amount about
the water consumption of this, saying
that this would take about, I think, a
million gallons of water a day to cool
computers. They're saying
that they'd need to build a graywater
facility to do it. These
facilities don't exist yet, so they'd
have to be built. But this thing
is already being constructed and is
using water. So I don't
think it's a far cry to say that what was
happening in the hyperscaling era,
pre- the generative AI era, is still
happening. I mean, the unfortunate fact
about it is that a lot of these
community groups are fighting this on a
very local level, and a lot of these
things are getting underreported
just because of that. But from what we
know from the fights in The Dalles and
in Loudoun County and in parts of rural
Texas, we'd be surprised if
similar kinds of battles weren't being
fought. Right, I agree about the
underreporting, and that's why we're leading
with it here when we're going to go
through a list of some of the things
that might be wrong with with generative
AI. I think it is an issue. I think
Emily, you basically hit on it,
right? Where you're producing all these
tokens when you're going to generate
an AI Overview, which I checked, and
you cannot opt out of it. You're
correct. Well, you can if you add
"-ai" to the query, okay, but you
have to do that each time. You can't,
like, put a setting somewhere. That's
interesting. I didn't know about that.
Okay. So, you can opt out with
"-ai". But these things do take more
computing than traditional Google
search. I guess the argument from these
companies would be that they're just
going to make their models more
efficient. I mean, we see the increasing
amounts of efficiency over time and you
know, there might be a big upfront
energy cost to train, but inference
might end up being not that energy
intensive. What would you say to that? I
would say that we've got Brad Smith
at Microsoft giving up on the plan to
become net zero on all the carbon since
the beginning of Microsoft, and he said
this ridiculous thing about how we had a
moonshot to get there, and it turns out
with generative AI the moon is five
times further away, which is just an
absurd abuse of that metaphor. And you
see Google similarly
backing off of their environmental
goals, and so if there really were all
these efficiencies to be had, I think
they wouldn't be backing off.
And I want to also add, I think
this argument about
the large amount of carbon use in
training on the front end and then
it tapering off with inference, I mean,
this is an argument that came straight
from Google. This was again in
the same paper by David Patterson. I
think the title, and I'm not
going to get it exactly right, was, you
know, that the carbon cost of training
language models will plateau
and then decrease. And effectively the
argument was that you have this large
investment that can be offset with
renewables and then it's going to
decrease. But you have to
also consider, given the
economics surrounding it, that it's not
one company training these, right? If
it's multiple different companies
training these and multiple different
companies providing inference, then as
long as there's some kind of
incentive to keep on putting this in
products, they're going to
proliferate. So, if it was just Google,
sure, maybe there might be a case in
which there was some kind of planning
and some kind of way to
measure and focus on that and then
actually taper down. But you have
Google, Anthropic, xAI, of course
OpenAI, Microsoft, Amazon, everyone
trying to get a piece, doing both
training and inference. So I think,
again, you know, it's hard
to put numbers on it, but what we see
is just the massive investment in
this, and that gives a good signal to
say that the carbon costs have to be
incredibly high.
Right? Look, I think it's
important for us again to lead here.
It's clear that there are some real
environmental impacts, and we have
Jensen Huang, the CEO of Nvidia, saying
inference with reasoning models is going
to take 100 times more compute than
traditional LLM inference. And every
top executive from these
firms that I've asked, well, is
inference going to take more compute,
it's not exactly as much as Jensen is
saying, but there is a spectrum. So
these things are going to be more
energy intensive. For everybody
listening out there,
I do think, you know, this is important
context to take in: when we talk
about AI, there's an environmental
cost out there. It's not fully clear
what that is, although there is one, and
I agree with the authors here that more
transparency makes a lot of sense. Now,
let's talk about another issue that you
bring up in the book, which is benchmark
gaming. It's been a hot topic in our Big
Technology Discord over the past couple
of weeks: we see these research labs
keep telling us that they have reached a
new benchmark or beat a certain level on
a new test, and we're all trying to
figure out what that means because it
does seem like a lot of them are
training to the test and you have some
point of criticism in the book about the
gaming of benchmarks and what that's
meant to tell us. So just lay it out for
us, what's going on with benchmarks,
and tell us about the gaming, Emily. So,
yeah, when you say the gaming of
benchmarks, that makes it sound like the
benchmarks are reasonable and they're
being misused, but I think actually most
of the benchmarks that are out there are
not reasonable. They lack what's called
construct validity. And construct
validity is this two-part test: the
thing that we are trying to measure is a
real thing, and this measurement
correlates with it interestingly. But
nobody actually establishes that what
these things are meant to measure is a
real thing, let alone that second part.
And so they are useful sales figures,
right, to say, hey, we now have
state of the art, SOTA, on whatever. But
it is not interestingly related to
what it's named as measuring, let alone
what the systems are actually meant to
be for. Yeah. And I would just add that
there's a lot of work here. Prior
to the book, Emily and I
spent a lot of time writing on benchmark
datasets, and so, you know,
I'm personally obsessed with the
ImageNet dataset. I'm thinking of
another book on the ImageNet dataset,
just on what it entails. But, you
know, there are a lot of different
problems in the benchmarks, right? So
construct validity is probably first and
foremost, and then you get something
like Med-PaLM 1 and 2 being measured on
the US medical licensing exam. That's
not really a test that determines
whether one is suited to be a medical
practitioner. There's so much more
involved with being a medical
practitioner
above and beyond taking the US
medical licensing exam. You can't take
the bar and say you're ready to be a
lawyer, right? I mean, there's so much
more that has to do with relationships
and training and other types of
professionalization.
There's a huge literature in the
sociology of occupations on what
professionalization looks like, what
it entails, what kinds of social
skills are involved, what that means,
and how to be adept at being in the
discipline. But then with the
different
benchmarks, there are so many different
problems just in terms of the way that
companies are doing science themselves.
They're releasing these benchmarks, and
often these are benchmarks that they
themselves have created and released. So
it may be the fact that they are quote-
unquote teaching to the exam, but they
also have no kind of external
validity in terms of what they're trying
to do. So OpenAI is saying, we had a
model that did so well we had to create
a new benchmark for it. Well, who's
validating that, right? Even in the
old benchmarking culture, you had
external benchmarks, and multiple groups
would go to them and say, ah, we've
done better on this benchmark. Now
OpenAI is saying, we have our own
benchmarks, because we did so well. Not
that the old system was any better, but
with this new system, where's the
independent validation that it
can do the thing it's
purported to do? What do you think
about the ARC-AGI test?
Yeah.
Well, I mean, we spent some time
focusing on the ARC-AGI test, right? The
ARC-AGI test is at least
ostensibly independent. I mean,
this is the one from François Chollet,
the French researcher? Yeah. By the way, for
everybody who's listening, it basically,
let me see if I get this right, it
asks the models to generalize
their ability to understand patterns
by putting shapes together. I think
that's the best way to explain it. Yeah.
So, it's a bunch of visual puzzles,
I think all on 2D
grids, and in order to make this
something that a large language model
can handle, those 2D colorful things are
turned into just sequences of letters.
And the idea is that you have, I think,
a sort of few-shot learning
setup where you have a few exemplars and
then an input, and the question is, can
you find an output like that? And
when we want to talk about how the names
of the benchmarks are already
misleading: the fact that it's called
ARC-AGI, right, that suggests that it's
testing for AGI. It's not; it's one
specific thing. And I think Chollet's
point is that this is a very
different kind of task than what people
are usually using language models for.
And so the gesture is towards
generalization: that if you can do
this even though you weren't trained for
it, then that's evidence of something.
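The grid-to-text encoding Emily describes can be sketched in a few lines. The letter mapping, prompt layout, and toy "mirror" task below are illustrative assumptions, not the exact format used by any lab.

```python
# Hedged sketch of serializing ARC-style puzzles for a text-only model:
# each puzzle is a small 2D grid of colored cells, flattened into letters.
# The color-to-letter mapping and the prompt layout are hypothetical.

COLORS = "abcdefghij"  # one letter per color index 0-9 (assumed mapping)

def grid_to_text(grid):
    """Serialize a 2D grid of color indices into newline-joined letter rows."""
    return "\n".join("".join(COLORS[cell] for cell in row) for row in grid)

def few_shot_prompt(examples, test_input):
    """Assemble a few-shot prompt from (input, output) grid pairs."""
    parts = []
    for inp, out in examples:
        parts.append("Input:\n" + grid_to_text(inp))
        parts.append("Output:\n" + grid_to_text(out))
    parts.append("Input:\n" + grid_to_text(test_input))
    parts.append("Output:")  # the model is asked to continue from here
    return "\n\n".join(parts)

# Toy task whose hidden rule is "mirror the grid left to right".
example = ([[0, 1], [2, 0]], [[1, 0], [0, 2]])
print(few_shot_prompt([example], [[3, 0], [0, 4]]))
```

The point of the setup is that the model only ever sees character sequences; whether finding the hidden rule from a few exemplars demonstrates "generalization" is exactly what's contested in the conversation.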
But if you look at the OpenAI
paper-shaped object about this, they
used a bunch of them as training data in
order to tune the system to be able
to do the thing. So, all right, fine,
supervised machine learning kind of
works, right? And then the next test,
ARC-AGI-2, came out with a
whole bunch of new problems, and
instantly all the models started doing
poorly on those. So let me just
ask this. Is there a measure that would
allow the two of you to assess whether
these AI models are useful or have you
just written off their ability to be
useful completely? So useful for what?
I mean, you tell me. Well, that's
sort of my point: I think
it's perfectly fine to use machine
learning to do specific tasks, and then
you set up a measurement that has to do
with the task in context. I'm a
computational linguist, so things like
automatic transcription are very much in
my area. If I were going to evaluate an
automatic transcription system, I would
say, okay, who am I using it for? What
kind of speech varieties? I'm going to
collect some data, people speaking, uh
have someone transcribe it for me, a
person, and then evaluate how well the
various models work on doing that
transcription. And if they work well
enough and it is to within the
tolerances of the use case for me then
great, that's good. Do you believe in
the ability to be general?
So, the ability to be general, and here
I'm thinking of the work of Dr.
Timnit Gebru, is not an engineering
practice. That's an unscoped system.
What Dr. Gebru says is that the first
step in engineering is your
specification. What is it that you're
building? If what you're building is
general, you're off on the wrong path.
That's not something that you can
test for, and it is not well-scoped
technology. Yeah. I mean, this notion of
generality has always had some
specificity in AI too. We
mentioned in the book this idea of, and
this is a word I struggle with, so
I'm just going to say fruit
flies, right?
Right, the Drosophila, the
fruit-fly model of genetics: this idea
that you have one very common, very
specific model species,
right? And in
the past, what that's become in AI is
the game of chess; it's been game
playing, right? And these are very
specific tasks, and those don't
generalize to something called general
intelligence, as if something like that
actually exists. I mean, one of the
problems in AI research is that the
notion of intelligence is very
poorly defined, and the notion of
generality is very poorly defined, or is
scoped to whatever the benchmark or
the task is that is being
attempted. So this notion of generality
is very
poorly understood, and it is deployed in
a way that makes it sound
like there is some notion of
general intelligence. And there's, you
know, one
of the great papers that
we point to in the footnotes of
the book, this paper by Nathan
Ensmenger,
talking about how chess
became the Drosophila of
AI research in the prior AI
hype cycle in the '60s and '70s. And it
just happened to be that you had a lot
of guys who liked chess, and they wanted
to compete with the Soviets, who had
chess dominance, right? And so those tasks
become the tasks: these are the things
we like. And we're actually seeing some
of that again. These
are tasks that we think are suitable,
tasks that are scoped in a way
that we think covers the most
worthwhile problems, but they're not
tasks chosen by asking what exists in
the world that is going to be helpful
and useful and scoped to specific
execution. Right, this notion of an
everything system is wildly unscoped.
But okay, so it is unscoped. But I think
everybody listening or watching right
now would probably say, well, just in my
basic use of ChatGPT, it can tell me
about history. It can write a poem. It
can create a game. Okay, I see Emily
reacting already. It can search the
web and give me plans. It can do all
these different things in these
different disciplines. So I
think for people listening there would
be a sense that there is an ability to
go into various different disciplines
and perform, and whether you say it's a
magic trick or not, it's clearly
something it can do. And so what I guess
I'm trying to get at is, is there a
way to measure that, or do you think
that that is in itself a wrong assertion?
So yes, I think it's a wrong assertion.
What ChatGPT can do is mimic
human language use across many different
domains. And so it can produce the
form of a poem. It can produce the form
of a travel itinerary. It can produce
the form of a Wikipedia page on the
history of some event. It is an
extremely bad idea to use it if you
actually have an information need.
Setting aside the environmental impacts
of using ChatGPT, and setting aside the
terrible labor practices behind it and
the awful exploitation of data workers
who have to look at the terrible outputs
so that the consumer sees fewer of them,
and by terrible outputs I mean violence
and racism and all kinds of
psychological... We covered that on the
show. Yes. We've
had one of the people who've been
rating this content on the show. Folks
who are interested, I'll link it in
the show notes; Richard was here to talk
about what that experience was like. But
sorry, go ahead. So, setting aside all
of that: if you have an information
need, so something you genuinely
don't know, then taking the output of
the synthetic text-extruding machine
doesn't set you up to actually learn
more, on a few levels, right? Because
you don't already know, you can't
necessarily quickly check, except maybe
by doing an additional search without
ChatGPT, at which point, why not just do
that search? But also, it is poor
information practice to assume that the
world is set up so that if I have a
question, there is a machine that can
give me the answer. When I'm doing
information access, what I'm doing
instead is understanding the sources
that that kind of information comes
from, how they're situated with respect
to each other, how they land in the
world. And so this is some work I've
done with Chirag Shah on information
behavior and why chatbots, even if they
were extremely accurate, would actually
be a bad way to do these practices. So just
to, you know, get back to your point:
yes, this system is set up to output
plausible-looking text on a wide variety
of topics, and therein lies the danger,
because it seems like we are almost
there to the robo-doctor, the
robo-lawyer, the robo-tutor. And in
fact, not only is that not true, not
only is it environmentally ruinous,
etc., but that is not a good world to
live in. And thinking about... Can I
just, I just want to hit on this point. I
disagree with you on this one.
I do think that some of the points
you're making are well-founded. We don't
want these things to be lawyers right
away. But let me at least point you to
one use that I've had recently, and
you can tell me where I'm
going wrong, if you think I am. I mean,
I'm in Paris now, a little work, a
little vacation at the same time. And
what I've done is I've taken two
documents from friends who have
been here often. They put
together documents that they send to
friends when they go here. I've uploaded
those into ChatGPT, and then I have
ChatGPT search the web and give me
ideas of what to do. I tell it where I
am, I tell it where I'm going, and it
searches through, for instance,
all the museums, the art galleries, the
festivals, the concerts, and it brings
it into one place. And that's been
extremely useful to me, to find new
cultural events, concerts. There's
even a bread festival going on here that
I had no idea about, and now I'm
going to go because it's found it for
me. When it
comes to this stuff, there's a
link, so you can go out and double-check
the work. But as far as finding
information on the web, the fact
that it's able to go and comb the
internet for these events and then take
into account some of the context
that I've given it with these documents,
I think is very impressive. And that's
just one use case. So I'm not asking it
to be a lawyer. I'm kind of asking it to
be what you said, an itinerary planner.
What's wrong with that?
So, first of all, you had these
lovely documents from your friends, and
I guess what you're saying is missing is
whatever the current events are. They've
given you general
things to look for, but they
haven't looked into what's going on
right now. What's wrong with that?
You know, on several levels. What
would we do in a prior age, even
pre-internet, right? The local
newspapers would list current events:
here's what's going on. If you landed in
a city, you would go find the local,
probably indie, newspaper and look up
the events page. And that system was
based on a series of relationships
within the community, between the people
putting on festivals and the newspaper
writers. And it helped support the local
news and information ecosystem, which
was a good thing. But on top of that,
if something wasn't listed, you could
think about: why is this not listed?
What's the relationship that's missing?
Your ChatGPT output is going to
give you some nonsense, and you're
right, this is a use case where you can
verify whether it is real or not. It is
also likely going to miss some things,
and the things that are not surfaced for
you are not surfaced because of the
complex set of biases that got rolled
into the system, plus whatever the roll
of the die was this time. And
anytime someone says, "Well, I need
ChatGPT for this," usually one of two
things is going on. Either
there's another way of doing it that
gives you more opportunities to be
in community with people, to make
connections, or there is some serious
unmet need, which doesn't sound like
it's the case here. And if we pull
the frame back a little bit, we can ask,
why is it that someone felt like the
only option was a synthetic
text-extruding machine? And here I think
you've fallen into the former of
these: what are you missing out
on by doing it this way? What are the
connections you could be making to the
people around you? You know, if
you're staying in an Airbnb, maybe the
Airbnb host; if you're in a hotel, the
concierge, to get answers to
these questions when you're looking to
the machine instead. I would also just
want to add, you know, that this is a
pretty low-stakes
scenario, right? You can go out, you can
verify these things. You can go to
existing resources, you know, event
calendars that people also spend a lot
of time curating online. I mean, there's
a lot of stuff that's already curated
online. And it's not like this
didn't exist in prior instances of
technology. You know, one of the
people that we cite in the book and talk
a lot about is Dr. Safiya Noble and her
work on Google and the way that
Google results, you know, present very
violent content with regard to
racial minorities. One of the parts of
the book that I like to reference,
and that a lot of people don't reference
initially, is the
part where she talks
about Yelp,
specifically what it
surfaces for a Black
hairdresser, and the way that Yelp
effectively was shutting this
person out of business, because
there was a specific need that she met
for Black residents of the city that
Noble was studying: braiding hair and
doing other Black hairstyles, right? And
so this is kind of a function
of all information retrieval
systems, right? You're thinking about
what they're including and what
they're excluding. So this is not
very consequential here, but in any
kind of area of, say, summarization
or any kind of retrieval, you do
need to have some kind of expertise
where you can verify the output and
ensure that what's getting in there is
not missing something huge. And it's
going to basically take this set of
information access resources, or
systems, in this case crawling the web,
and, knowing that that's going to miss
something, it's going to
exacerbate that, because then you cannot
situate those sources in context. Okay,
let me just give my counterargument, and
then we can move on from this. My
counterargument would be a couple of
things. First of all, I don't speak
French, so the local newspaper would
kind of be lost on me. I am staying
at a resident's
place; we swapped apartments, so she's
in my New York apartment and I'm here.
So maybe she and I could have gone
over that newspaper together. That's
fair. But the newspaper, speaking
of things that leave stuff out, the
newspaper leaves stuff out all the time.
It exercises editorial judgment.
So it is swapping bot editorial judgment
for newspaper editorial judgment, but
the bot can be in some ways more
comprehensive, because it's searching
the entire web. And I'll just say one
last thing about this: I never felt
the need to use it. I
didn't say, I need to use it to figure
out what's going on. Like, again, I had
these documents. What's useful about it
is, speaking of making connections with
the local community, if I'm able to,
here's the word, be efficient in my
research by using it, I can spend
much more time out in the community
versus searching the web or reading the
newspaper. So, what's your thought on
those arguments?
Um, sorry, I was getting distracted by Alex's cat walking around. Yes, listeners, Alex's cat is here. Alex, what's your cat's name? This is Clara. And I'd lift her up, but I have a shoulder injury. She's knocking the mic around, so I'm just trying to keep her off the mic. Yeah. Thank
you. So, you know, the efficiency argument: this is the efficiency argument in the context of leisure activities, as opposed to the context of work. You mentioned along the way that it's searching the whole web for you. You don't know that, actually. That's right. Right. Yeah. And also, the whole web includes a lot of stuff that you don't actually want. Lots and lots and lots of the web is just garbage SEO stuff. And maybe you're seeing more of that in your ChatGPT output than you would with a search engine, which, as Alex mentioned, also has issues. And then finally, I'm going to take... SEO garbage is made for the search engine, though. It is, but the search engines, in order to stay in business, also have to be fighting back against the SEO garbage. It's a constant battle. Probably for the chatbots as well.
Yeah. So, you mentioned newspaper editorial judgment versus bot editorial judgment, and I'm going to take issue there, because a bot is not the kind of thing that can have judgment, nor is it the kind of thing that can have accountability for exercising judgment. And so I think that, yes, as Alex is saying, this is low stakes, but if you're using it as a motivation for these things being useful in the world, then you have to deal with the fact that useful in the world is going to entail many more higher-stakes things. And then we really have to worry
about accountability. I would also want to say, too: there's a lot of this argument from quote-unquote capabilities, and I don't really know what that term means. That's another poorly defined term, I think, especially when it comes to AGI. But this argument of, well, I find it useful, I don't find terribly convincing, right? It's sort of like: okay, you have found it useful either in a situation in which there is a way to verify sources that you know about and have some kind of ground truth on, or you found it useful across a variety of these different situations. But if I'm asking a chatbot about an area that I know quite a lot about, say sociology or the social movements literature, I then have the knowledge to verify that, just from my social skill in that area, a term I'm borrowing from the sociologist Neil Fligstein, and from my knowledge of how to navigate those areas and my professionalization as a sociologist. But then, once it gets into those areas in which verifiability just escapes me, which is most areas, because we're not professionals in most areas, although a lot of us want to be jacks and jills of all trades, then we lose that ability. We don't have the social skill or depth of knowledge to verify it in the same way. And so I'm really not convinced by those "well, these are useful for me in these pretty low-stakes contexts" arguments, because that slippage then means that we're going to miss some pretty big things in some really dire contexts. Okay. Well, let's turn it up a
notch when we come back, because we're going to talk about AI at work and AI in the medical context. And maybe we can even touch a little bit on doomerism, which you write about in the book. And there's plenty else on the agenda. So, we'll be back right after this. And we're back here on Big Technology Podcast with Professor Emily M. Bender and Alex Hannah. They are the authors of The AI Con: How to Fight Big Tech's Hype and Create the Future We Want. Here it is. So, let's go to
usefulness, and we'll start with generative AI in the medical context, because why don't we go straight for the example that we'll probably have the biggest disagreement on. And I'm not saying that I think generative AI should play the role of a doctor. In fact, when I wrote my list of things I agree with you both on: I don't think that AI should be a therapist, at least not yet. And we know now that AI's number one use, according to a recent study, is companionship and therapy. The therapy side really scares me, and I think the companionship isn't the best thing in the world either. But in medicine, I do find that there is some use for it. Medicine is a field overrun by paperwork and insurance requirements that I think have ruined the health care system, because they keep doctors effectively tied to their computers writing notes, as opposed to seeing patients or living their lives. And Alex, before the break, you mentioned that one of the areas where this stuff is useful is when it operates in your area of expertise, because you're able to verify it. So we're going to go with one use that I find to be pretty good, one that to me doesn't make generative AI feel like a con: when a doctor is seeing a patient, they can take a transcription of the conversation they have with the patient, have AI synthesize and summarize what they talked about, put it into the electronic medical records system they have, and then verify it, so they don't have to spend the time writing those summaries up and can actually go spend some more time with patients. So what's the problem
with that? There are so many problems with that. And the first thing I want to say is that you named the underlying problem when you talked about insurance requiring so much paperwork. So this is one of those situations where there's a real problem here. It's not that doctors shouldn't be writing clinical notes; that is actually part of the care. But there is a lot of additional paperwork required because of the way insurance systems, and especially the one in the United States, are set up, and so we could work on solving that problem. And this is a case where the turn toward large language models, so-called generative AI, as an approach is showing us the existence of an issue, but that doesn't mean it is a good solution. So many problems. One is that writing the clinical note is actually part of the process of care. It is the doctor reflecting on what came out of that conversation with the patient, thinking it through, writing it down, planning the next treatment. That is not something that I want doctors to get out of the habit of doing as part of the care. Now, they might feel like they don't have time for it; that's also a systemic issue. Secondly, these things are set up as ambient listeners, which is a huge privacy issue. As soon as you've collected that data, it becomes this sort of radioactive pile of danger. Thirdly, you've got the fact that automatic transcription systems, which are the first step in this, do not work equally well for different language varieties. So think about somebody who's speaking a second language. Think about somebody who's got dysarthria, say an older person whose speech isn't very clear. Think about a doctor who is an immigrant to the community that they're working in, who's got extra work to do now because their words are not well transcribed, and so the clinical notes thing doesn't work well for them. But the system is set up with the expectation that they can see more patients, because the AI, in quotes, is taking care of all of this for them. And there's a beautiful essay that came out recently, I think in STAT News, and I was looking for the name of the author, didn't find it real quick, really reflecting on how important it is to her that the doctor do that part of the care, of actually pulling out from the conversation: this is what matters. And it's not just simple summarization. It is actually part of the medical work to go from the back and forth had with the patient, plus all of the doctor's expertise, to what goes into that note. Yeah. So I want to add
into that note. Yeah. So I want to add
on Emily has said so much of what I
wanted to get at which I think is but I
have I think three or four separate
points in addition to that. So first off
is the technical point. So there's so
tools that are that are purported to be
summarization. There's uh some great
reporting by by Garren Spur and Hilda
Shelman and and and the AP from last
October that was looking at Whisper
specifically. So that's OpenAI's ASR
system, automated speech recognition
system that said that medical
transcription had basic basically was
making up a lot of And then we
knew that this they had quote unquote
hallucinations. Again, that's not a term
that we use in the book. We we we we say
that it's I say it's making up, but
that is maybe even granting too much
anthropomorphizing of the system for me.
Um and and so but there is a lot of
these things some from that quoting from
that text. Some of that invented text
includes racial commentary, violent
rhetoric and even imagined medical
treatments. So that's a one major
problem. The second problem is that
medical transcription has been this area
which has been an area in which medicine
has been forcing kind of this
casualization of work for years right
and so medical note takingaking that
exists in hospitals now many of much of
that is done remotely so it's gone and
take this taken this work that has been
seen as kind of like this busy work or
this this thing that like I don't want
to write up my medical notes to be this
type of work that needs to be forced on
someone else. So, prior to this kind of
ASR element of it is is we've had these
Oh, thanks for linking that, Emily, and
I'll link the um I'll link the AP
article that I'm that I'm looking at,
too. Part of that work has actually been offshored as part of this movement of outsourcing, so a lot of it is done remotely as part of this casualization. And I want to point out the gendered notion of this: this is very much work that has been done by women, and that reflects the way so much of quote-unquote AI technology wants to take the work that has traditionally been the domain of women and say, well, we can automate that, or we can casualize that in different ways. And that's important, because it treats this work as not actually part of quote-unquote the work; it is seen as work that ought to be casualized and offshored. And I appreciate the essay that Emily shared, because that essay is saying: no, this is actually part of the element of doctoring. And then I want to couch all of this stuff in the political economy of the medical industry, thinking about what it means to rush toward more and more remote medicine, and more and more doctors seeing more patients. These efficiency gains from doctors aren't necessarily going to make their jobs easier. They're going to put more pressure on them. Now that you're in a position where you don't have to take medical notes, you're going to be running from appointment to appointment to appointment. My sister is a nurse practitioner, and she's basically seeing this in her job right now at her clinic: now we have these systems, so I have to see more patients. It's not that I'm going to go be on the beach somewhere; it means I'm going to have nine or ten 15-minute appointments a day, and I'm not going to have enough proper time to spend with patients. And I would say the coda to all of this is that if AI boosters could really offshore all of doctoring to chatbots, they would. This is one case in which Bill Gates has said, you know, in ten years we're not going to have teachers and doctors. What a nightmare scenario, to have non-teachers and non-doctors. And Greg Corrado really gives it away, and we cite him in the book, where he says of Med-PaLM 2: this thing is really efficient, we're going to increase tenfold our medical ability, but I wouldn't want this to be part of my family's medical journey.
Okay. But again, here you're picking out some of the most extreme statements. It's Bill Gates, and Bill Gates can make extreme statements. I don't think he's the guy. And I think that doesn't reflect the broad consensus here, and definitely not the question that I asked, which, again, was about using this to take some of the time that the doctors are spending on paperwork and give that back to either the doctors, or to have them be able to see more patients. So we very much
addressed that point. First of all, I want to name the author of that essay. Her name is Aliyah Barakat, and it's a beautiful essay. She's a mathematician and also a patient with a chronic condition. Wonderful essay. But yeah, you said give that time back to the doctors or have them see more patients, right? It is not going to be going back to the doctors. That's not how our healthcare system works. And it's also therefore going to decrease the quality of patient care. It is lose-lose, except for the hospitals, maybe, getting more money, and certainly the tech companies that are selling this to the hospitals.
Okay. I'm also curious, in terms of thinking about the more nuanced position: who are the references here that you're thinking of, Alex? What's the consensus on this? Because we see the egregious elements of this, and I'm wondering what the medical consensus is. What's an example? Just to pose it, now I'm interviewing you, but who's someone that you think is doing this very well? Well,
I mean, someone doing this well? Like, again, I don't think that this stuff is well developed yet. But I've definitely seen enough doctors just buried in paperwork. And we started this whole segment talking about how this is, I guess, an insurance-driven thing. And so it's interesting: I guess you both don't like the way that the insurance companies are guiding the system, but also think that it's good practice to have doctors write those notes themselves? Hold on. There's two
use cases for doctors' notes, right? There is actually documenting, for the patient and for the rest of the care team, what has happened in this session, and that, I think, is a super important part of the work of doctoring. I believe you that there's a lot of additional paperwork that has to do with getting the insurance companies to pay back, and no, I don't like that system at all. The insurance companies are not providing any value. They are just vampires on our healthcare system in the US. Okay, I think we can agree on that front. I
mean, anyway, I do think that as this stuff gets better... I understand that a patient wants this to happen. But do I think a doctor would be giving them worse care if they allowed the AI to summarize the notes, or to pick out the more important parts, if this stuff was working well? Not necessarily. So that's a big "if." What does it mean when this stuff is "getting better" and "working well"? Do you mean, like, the absence of it making things up?
Definitely. I mean, but we all agree that the doctor will have to verify and check this information after. Well, I guess the problem there is: then why are we having the doctor double-check that to begin with, right? In a setting where the doctor has 15 minutes to see every patient, and there is a quote-unquote AI scribe... and I don't want to call it an AI scribe; there's an automatic speech recognition tool doing automatic speech recognition on these things. In what space, or with what time, does the doctor have to verify those? Well, the time that they would be spending writing those notes in the first place. Is verification an easier task than transcription? I guess that's my question. I would proffer no. I mean,
just from my experience using these systems. And I'm not a doctor, thank god, although I've thought about it; not that kind of doctor, to the chagrin of my parents. In my experience, I've used these tools for interviews specifically, qualitative interviews with data workers, and I have spent time with these tools and have just had such an awful time with them. And this isn't medical terminology; it's terminology around doing data work, or talking about training AI systems. And they just do such a terrible job. At one point, I threw it all out and said, okay, I'm just sending this to somebody to actually transcribe, because this is not helpful for me, and starting with the machine transcript was taking me more time than doing it from scratch. And I've transcribed, you know, I'm not primarily a qualitative interviewer, but I've spent time transcribing dozens of interviews in my research career, and I found it just very difficult. So, I guess the question is: is that verification taking the time that could just be used for the doctoring and working with patients?
And, holding everything about the insurance industry stable: that notion of thinking about how the patient presents, how the patient is describing what's going on, that is often the work of doing it. And the medical training I do have is that I was at one point a licensed EMT, and writing up PCRs, patient care reports... no one wants to write up the PCRs. At the same time, you're spending time taking note of how a patient is presenting. The patient is, you know, arrhythmic (just bringing it back to the Alexes), the patient is cyanotic around their lips. These are things that a health care professional would be paying attention to and making notes on, maybe because they're writing it up later. So, I'm thinking about this process of writing and what it does to our own practice of viewing and aiding and administering medical care.
Okay. I mean, we'll agree to disagree on this front. But again, I think we are all on the same page that insurance companies requiring additional writing just because they hope you never get to the claim, so you don't file it: that's probably bad. And we don't think that there should be AI doctors, at least not yet. That's what I say; I think you two would probably say never. So, all right. I want to end on this, or maybe we can do two more topics. I
guess here's my question for you. A lot of the discussion of AI's usefulness in jobs in the book discusses these tools being imposed top-down. But what if they come bottom-up? What if a worker can find use for them and actually make their job easier by getting good at using something like ChatGPT or Claude? Or, again, we kind of talked through the medical use case: if a doctor does find that this is useful for them, are you opposed to that? So, yes, and I think that
actually Cadbury, of all people, put it best. There's this hilarious commercial that was made for the Indian market, sort of showing how the supposed efficiencies that you're getting out of this just ramp up the speed of things and don't leave you time to really get into the work that you're doing and be there. I think the most credible use cases I've heard for these things are, first of all, as coding assistants. That's sort of a machine translation problem between natural language and some programming language. And there I really worry about technical debt, where you have output code that was not written by a person, that's not well documented, and that becomes someone else's problem to debug down the line. But also writing emails: people hate writing emails, and people hate reading emails. So you get these scenarios where somebody writes bullet points, uses ChatGPT to turn them into an email, and the person on the other end might use ChatGPT to summarize it. And it's like, okay, so what are we doing here? Again, taking a step back and saying: what are the systems that are requiring all of this writing that everyone finds a nuisance to write and to read, and can we rethink those systems? And I also just have to say that whenever I'm on the receiving end of synthetic text, I am hugely offended, and that's one of the things that we put in the book. I definitely got one of those emails yesterday, and I was like, you used ChatGPT for this; I know you did. Yeah. If you couldn't be bothered to write it, why should I bother to read it? Right. Yeah, it's a good point. And I
mean, it is very interesting, thinking about cases in which workers are using this kind of organically. First off, I've heard very little of that personally, especially for professional work. I mean, I think there may be plenty of workers that are finding use for this, but I would say the analog that I find, where it's not top-down, is in education, and to that degree I think that's kind of a failure in thinking about what education is, right? I mean, in that case, you want the students to be using this to get through their classes? Yeah. Right. Exactly. Or teachers putting stuff together? Well, I'm thinking about the students, right? And I'm using that as sort of an analog, and then thinking about what the conditions are that are forcing students to use this, right? If there are cases in which this seems to be sort of useful, okay: what does that say about the job? What does it say about how the work is oriented? In that case, maybe there need to be different efficiencies, or different thinking about how the job operates, right? I then worry that these things become mandated in work environments: you're saying, well, people are using this, so everybody's using this, and then where does that leave the people who are resistors, or who think, I know this can't do a good job, so where does that put me? And I think we've already seen this used as a justification in places where employers have been reducing positions by the scores, because there's a notion that these tools can do these jobs suitably, to a certain degree of proficiency, which is just not the case. That has me worried, down the line, about these areas that Emily's mentioned: the technical debt area, the "how do we know" question. There's an overestimation of the capabilities of these tools in that case.
Okay. I know we're at time, or close to time. Can I ask you one question about doomers before we get out of here? Sure. Let's end by dogging on the doomers. Okay. So, I saw that there was a chapter about doomers here, and I was excited to read it, because my position has largely been that those who are worried that large language models are going to turn us into paper clips are either marketing what they're selling, or just very into, I don't know, the smell of their own body odor. I mean, I guess it's not a terrible thing to be worried about, but there's so much more, and it seems so unlikely that this is going to hurt us. So, I definitely wanted to get your take on why you're down on doomerism. And let me just give my one caveat here. There's a line in your book that says that AI safety is just doomerism and it's only about these long-term problems. But I've definitely heard some of the AI safety folks, like Dan Hendrycks from the Center for AI Safety, talking about really important near-term issues, like whether this technology could help virologists with bad intent. So I wouldn't malign the entire AI safety field, but on the doomer stuff, I hear your point. All right. So attack that and then we'll get out of here. So, I
and then we'll get out of here. So, I
just want to put in a shout out for um a
new book by Adam Becker called More
Everything Forever. Um which really goes
deep into the connections between the
sort of dumerous thought and these more
palatable looking sides of what's called
effective altruism. Um and also in that
context there's a wonderful paper by um
Tamit Gabru and Emil Torres on what they
call the Tescrial bundle of ideologies.
Yes. You know, I think that if your um
if your concern about the systems is not
rooted in real people and real
communities and things that are actually
happening like even this like oh but bad
actors could use it to you know more
quickly designed you know uh viruses and
stuff like that that's still speculative
right so anytime we are taking the focus
away it's it's like has that happened
right this is this is still people
writing science fiction fanfiction for
themselves And you know it's not it's
based on these jumped up ideas about
what the technology can do and taking
the focus away from the actual harms
that are happening now including the
environmental stuff we started with
right. But you don't want a virus, right? You want to get ahead of that, like we had with social media. There were some issues with social media, but there was not a focus on some of the potential long-term issues, and that only came up later on, at least in the... You don't agree. There are problems with social media, for sure. And some of those problems were documented and explained early on, and people were not paying attention. But they were real problems that were being documented as they were happening, as opposed to imaginaries about, well, someone's going to use this and Dr. Evil up a bad virus.
Okay.
Yeah. Go ahead, Alex. For the sake of time, I think that's fine. I don't have much to add. All right. Well, look, the book is called The AI Con: How to Fight Big Tech's Hype and Create the Future We Want. The authors are Emily M. Bender and Alex Hannah. Emily and Alex, I've been reading your work for a long time, and it's great to have a chance to speak with you. Like I said at the top, for those who are listening or watching: you may not agree with everything, either everything I said or everything our guests said, but hey, at least now you know these arguments, for and against, and we trust you to make up your own opinion and do further research. And we've definitely had plenty of good stuff to keep digging into shouted out over the course of this conversation. So, Emily and Alex, great to see you. Thank you so much for joining the show. Thank you for this conversation, and enjoy Paris. Thanks, Alex. Have a great time. Thank you both. Thank you, everybody, for listening. We'll see you on Friday for our news recap. Until then, we'll see you next time on Big Technology Podcast.