AI’s Drawbacks: Environmental Damage, Bad Benchmarks, Outsourcing Thinking
Channel: Alex Kantrowitz
Published at: 2025-05-14
YouTube video id: WreiudYaenc
Source: https://www.youtube.com/watch?v=WreiudYaenc
Two of AI's most vociferous critics join us for a discussion of the technology's weaknesses and abilities, and a debate on the finer points of their arguments. We'll talk about it all after this. Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. We're joined today by the authors of The AI Con. Professor Emily M. Bender is here. She's a professor of linguistics at the University of Washington. Emily, welcome. I'm glad to be here. Thank you for having us on your show. My pleasure. And we're also joined by Alex Hanna, the director of research at the Distributed AI Research Institute. Alex, welcome. Thanks for having us, Alex. Always good to have another Alex on the show. So, look, we try to get the full story on AI here, and today we're going to bring in two of the most vocal critics of the technology. They're going to state their case, and you at home can decide whether you agree or not, but it's great to have you both here. So, let's start with the premise of the book. What is the AI con? Emily, do you want to begin? Sure. The AI con is actually a nesting-doll situation of cons. Right down at the bottom, you've got the fact that large language models especially are a technology that is a parlor trick. It plays on our ability to make sense of language and makes it very easy to believe there's a thinking entity inside of there. This parlor trick is enhanced by various UI decisions. There's absolutely no reason that a chatbot should be using "I" and "me" pronouns, because there's no "I" inside of it, but they're set up to do that. So you've got that base-level con. But then on top of that, you've got lots of people selling technology built on chatbots to be a legal assistant, to be a diagnostic system in a medical situation, to be a personalized tutor, to displace workers, and also to put a band-aid over large holes in our social safety net and social services.
So it's cons from the bottom to the top. Okay. I definitely have things that I disagree with you on in places, and we will get into that in the second half, especially about the usefulness of these bots, whether they should be using "I" and "me" pronouns, and the whole consciousness debate. I don't think any of us think these things are conscious; I just think we have a disagreement on how much the industry has played that up. But let's start with what we agree on. Emily, from the very beginning, you were the lead author on the very famous paper calling large language models stochastic parrots. And at the very beginning of that paper, there is concern about the environmental issues that large language models might bring about. On this show we talk all the time about the size of the data centers and the size of the models, and of course there is an associated energy cost that must be paid to use these things. So I'm curious, Emily, or you, Alex: you worked at Google, so you probably have a good sense of this. Can you both quantify how much energy is being used to run these models? So part of the problem is that even if you're working at Google, even if you are directly working on this, there are not very public estimates of how much the cost is. The costs vary quite widely, and the only cost I think we know was an estimate made by folks at Hugging Face who worked on the BLOOM model, because they were able to actually have some insight into the energy consumption of that model. So part of the problem is the transparency of companies on this.
As a response at Google, after the Stochastic Parrots paper was published, one of the complaints from people like Jeff Dean, the SVP of research at Google, and David Patterson, who was the lead author of Google's kind of rebuttal to it, was: well, you didn't factor in X, Y, Z; you didn't factor in the renewables that we only talk about at this one data center in Iowa; you didn't factor in off-peak training. So that's part of the problem. We could try to put numbers on it, but there's so much guardedness about what's actually happening here that we can't quantify it. We don't know when it comes to model training. We might have something like the number of parameters in a new model, or in an open-weights model like Llama, but we don't know how many fits and starts there were with stopping training and restarting or experimenting. So we could speculate, but we know it's a lot, because there are real effects in the world right now. What are those effects? So, you see communities losing access to water sources. You see electrical grids becoming less stable. And this is starting to be, I think, very well documented; there are a lot of journalists on the beat doing a lot of good work. I also want to shout out the work of Dr. Sasha Luccioni, who's been looking at this from an academic perspective. One of the points she brings in is that it's not just the training of the models but of course also the use, and especially the use of chatbots in search. Instead of getting back a set of links, which may well have been cached, you're getting back an AI Overview, which happens non-consensually when you try Google searches these days. Each of those tokens has to be calculated individually, so it's coming out one word at a time, and that is far more expensive.
I think her number is somewhere between 30 and 60 times more expensive than an old-fashioned search, just in terms of the compute, which then scales up for electricity, carbon, and water. I would also say, speaking of existing effects, there's a lot of reporting coming out of Memphis right now, especially around the methane generators that xAI has been using to power a particular supercomputer there called Colossus, specifically around emissions affecting Southwest Memphis, traditionally a Black and impoverished community. There's also reporting on, well, actually research from UC Irvine, looking at backup generators and emissions from the diesel units connected to the grid. Because the SLAs on data centers are incredibly high, you effectively need some kind of backup to kick in at some point, and that's going to contribute to air pollution. And which communities have been affected by the loss of water due to AI data centers? I think the best-reported case is The Dalles in Oregon. That's the one that is best known, and it's kind of pre-generative-AI; it concerns the development of Google's hyperscaling, and it wasn't until The Oregonian sued the city that we knew that half of the water consumption in the city was going to Google's data center. That was before generative AI. That was before generative AI. We have to imagine the problem has probably been exacerbated by now. But do we know that? I mean, you both wrote the book on this. So, we certainly point to environmental impacts as a really important factor. It is not the main focus of the book.
I would refer people to the reporting of people like Paris Marx over at Tech Won't Save Us, who did a wonderful series called Data Vampires; I think there were stories in Spain and in Chile. So we are looking at the overall con, and the environmental impacts come in because they're something we should always be thinking about, and also because they're very hidden. When you access these technologies, you're probably looking at them through your mobile device or your computer, and the compute and its environmental footprint and the noise and everything else is hidden from you in the immateriality of the cloud. I would also say, on the reporting on Memphis, I want to give a shout-out to the reporting in Prism by Ray Levy Uyeda (I don't know if I'm pronouncing their surname correctly), who has written extensively about the water consumption involved: the facility would take about a million gallons of water a day to cool computers. They're saying they need to build a graywater facility to do it, and these facilities don't exist yet, so they'd have to be built. But the thing is already being constructed and is already using water. So I don't think it's a far cry to say that what was happening in the hyperscaling era, pre-generative-AI, is happening now.
The unfortunate fact is that a lot of these community groups are fighting this on a very local level, and a lot of these things are getting underreported. But from what we know from the fights in The Dalles and in Loudoun County and in parts of rural Texas, we'd be surprised if similar kinds of battles weren't being fought. I agree about the underreporting, and that's why we're leading with it here as we go through a list of some of the things that might be wrong with generative AI. I think it is an issue. Emily, you basically hit on it: you're producing all these tokens when you generate an AI Overview, which I checked, and you're correct that you cannot opt out of it. Well, you can if you add "-ai" to the query, but you have to do that each time; you can't put a setting somewhere. That's interesting. I didn't know about that. Okay, so you can opt out with "-ai". But these things do take more computing than traditional Google search. I guess the argument from these companies would be that they're just going to make their models more efficient. We see increasing amounts of efficiency over time, and there might be a big upfront energy cost to train, but inference might end up being not that energy intensive. What would you say to that? I would say that we've got Brad Smith at Microsoft giving up on the plans to become net-zero carbon, and he said this ridiculous thing about how they had a moonshot to get there and it turns out with generative AI the moon is five times further away, which is just an absurd abuse of that metaphor. And you see Google similarly backing off of their environmental goals. If there really were all these efficiencies to be had, I think they wouldn't be doing that backing off.
And I want to add: this argument about a large amount of carbon use in training on the front end, which then tapers off with inference, came straight from Google. This was again in the same paper by David Patterson. I'm not going to get the title exactly right, but it was something like "The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink," and effectively the argument was that you have this large investment that can be offset with renewables, and then it's going to decrease. But you have to consider the economics surrounding it: it's not one company training these. Multiple different companies are training these, and multiple different companies are providing inference, and as long as there's some incentive to keep putting this in products, they're going to proliferate. If it were just Google, sure, maybe there might be a case in which there was some kind of planning, some way to measure and focus on that, and then it actually tapering down. But you have Google, Anthropic, xAI, of course OpenAI, Microsoft, Amazon, everyone trying to get a piece, doing both training and inference. So again, it's hard to put numbers on it, but what we see is just the massive investment, and that gives a good signal that the carbon costs have to be incredibly high. Right. Look, I think it's important for us again to lead here.
It's clear that there are some real environmental impacts. We have Jensen Huang, the CEO of Nvidia, saying reasoning inference is going to take 100 times more compute than traditional LLM inference, and every top executive from these firms that I've asked whether inference is going to take more compute says yes; it's not exactly as much as Jensen is saying, but there is a spectrum. So these things are going to be more energy intensive. For everybody listening out there, I do think this is important context to take in: when we talk about AI, there's an environmental cost out there. It's not fully clear what that cost is, although there is one, and I agree with the authors here that more transparency makes a lot of sense. Now, let's talk about another issue that you bring up in the book, which is benchmark gaming. It's been a hot topic in our Big Technology Discord over the past couple of weeks: we see these research labs keep telling us that they have reached a new benchmark or beat a certain level on a new test, and we're all trying to figure out what that means, because it does seem like a lot of them are training to the test. You have some points of criticism in the book about the gaming of benchmarks and what that's meant to tell us. So just lay it out for us: what's going on with benchmarks? Tell us about the gaming, Emily. So, when you say "the gaming of benchmarks," that makes it sound like the benchmarks are reasonable and they're being misused, but I think actually most of the benchmarks that are out there are not reasonable. They lack what's called construct validity. Construct validity is a two-part test: the thing that we are trying to measure is a real thing, and this measurement correlates with it interestingly. But nobody actually establishes that what these benchmarks are meant to measure is a real thing, let alone that second part.
And so they are useful sales figures, right, to say: hey, we now have state of the art (SOTA) on whatever. But the benchmark is not interestingly related to what it's named as measuring, let alone what the systems are actually meant to be for. Yeah. And I would just add that there's a lot of work here; prior to the book, Emily and I spent a lot of time writing on benchmark datasets. I'm personally obsessed with the ImageNet dataset; I'm thinking of writing another book just on the ImageNet dataset and what it entails. There are a lot of different problems with the benchmarks. Construct validity is probably first and foremost. When you get something like Med-PaLM 1 and 2 being measured on the US Medical Licensing Exam, that's not really a test that determines whether one is fit to be a medical practitioner. There's so much more involved with being a medical practitioner above and beyond passing the US Medical Licensing Exam. You can't take the bar and say you're ready to be a lawyer, right? There's so much more that has to do with relationships and training and other types of professionalization. There's a huge literature in sociology, in the sociology of occupations, on what professionalization looks like, what it entails, what kinds of social skills are involved, and how one becomes adept in the discipline. But then there are so many other problems with the benchmarks, just in terms of the way companies are doing science themselves. They're releasing these benchmarks, and often these are benchmarks that they themselves have created and released. So it may be the case that they are, quote-unquote, teaching to the exam, but the benchmarks also have no kind of external validity in terms of what they're trying to do.
So OpenAI is saying: we had a model that did so well we had to create a new benchmark for it. Well, who's validating that? Even in the old benchmarking culture, you had external benchmarks, and multiple people would go to them and say, ah, we've done better on this benchmark. Now OpenAI is saying: we have our own benchmarks, because our model did so well. Not that the old system was any better, but with this new system, where's the independent validation that the model can do the thing it's purported to do? What do you think about the ARC-AGI test? Yeah, well, we spent some time focusing on the ARC-AGI test. It is at least ostensibly independent. Is this the one by the French researcher, François Chollet? Yeah. By the way, for everybody who's listening, it basically asks, let me see if I get this right, it asks the models to generalize their ability to understand patterns and put shapes together. I think that's the best way to explain it. Yeah. So it's a bunch of visual puzzles, I think all in 2D grids, and in order to make this something a large language model can handle, those 2D colorful things are turned into just sequences of letters. And the idea is that you have, I think, a sort of few-shot learning setup, where you have a few exemplars and then an input, and the question is: can you find an output like that? When we want to talk about how the names of the benchmarks are already misleading: the fact that it's called ARC-AGI suggests that it's testing for AGI. It's not; it's one specific thing. And I think Chollet's point is that this is a very different kind of task from what people are usually using language models for, and so the gesture is towards generalization: if you can do this even though you weren't trained for it, then that's evidence of something.
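The setup Emily describes, 2D grids flattened into text with a few exemplar pairs followed by an unanswered test input, can be sketched roughly like this. The grids, the "mirror each row" transformation, and the prompt format below are invented for illustration; they are not real ARC-AGI data or the benchmark's actual serialization.

```python
# A minimal sketch of how an ARC-AGI-style few-shot task could be presented
# to a language model as text: serialize each 2D color grid, show a few
# input/output exemplars, then leave the test output for the model to fill in.

def serialize(grid):
    """Flatten a 2D grid of color codes into one text line per row."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

def build_prompt(exemplars, test_input):
    """Few-shot prompt: each exemplar pair, then the unanswered test input."""
    parts = []
    for i, (inp, out) in enumerate(exemplars, 1):
        parts.append(f"Example {i} input:\n{serialize(inp)}")
        parts.append(f"Example {i} output:\n{serialize(out)}")
    parts.append(f"Test input:\n{serialize(test_input)}")
    parts.append("Test output:")
    return "\n\n".join(parts)

# Toy task (invented here, not a real ARC puzzle): mirror each row.
exemplars = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[4, 5], [6, 0]], [[5, 4], [0, 6]]),
]
print(build_prompt(exemplars, [[7, 8], [9, 1]]))
```

The point of the serialization is that the model never sees a picture, only a token sequence, which is part of why Chollet's task sits so far from ordinary language-model use.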
But if you look at the OpenAI paper-shaped object about this, they used a bunch of the problems as training data in order to tune the system to be able to do the thing. So, all right, fine: supervised machine learning kind of works, right? And then the next test, ARC-AGI-2, came out with a whole bunch of new problems, and instantly all the models started doing poorly on those. So let me just ask this. Is there a measure that would allow the two of you to assess whether these AI models are useful, or have you written off their ability to be useful completely? Useful for what? I mean, you tell me. Well, that's sort of my point: I think it's perfectly fine to use machine learning to do specific tasks, and then you set up a measurement that has to do with the task in context. I'm a computational linguist, so things like automatic transcription are very much in my area. If I were going to evaluate an automatic transcription system, I would say: okay, who am I using it for? What kinds of speech varieties? I'm going to collect some data of people speaking, have a person transcribe it for me, and then evaluate how well the various models do on that transcription. And if they work well enough, within the tolerances of the use case for me, then great, that's good. Do you believe in the ability to be general? So, the ability to be general, and here I'm thinking of the work of Dr. Timnit Gebru, is not an engineering practice. That's an unscoped system. What Dr. Gebru says is that the first step in engineering is your specifications. What is it that you're building? If what you're building is "general," you're off on the wrong path. That's not something you can test for, and it is not well-scoped technology. Yeah. I mean, this notion of generality has always had some specificity in AI too.
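The transcription evaluation Emily describes, comparing model output against a human reference and checking it falls within the tolerances of the use case, is typically done with word error rate (WER). A rough sketch, using invented example transcripts and an arbitrary tolerance threshold:

```python
# Word error rate: (substitutions + insertions + deletions) / reference length,
# computed via classic dynamic-programming edit distance over word sequences.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Invented human reference vs. model output; two word-level errors out of nine.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
wer = word_error_rate(reference, hypothesis)
TOLERANCE = 0.25  # an assumed acceptable error rate for some use case
print(f"WER: {wer:.2f}", "acceptable" if wer <= TOLERANCE else "reject")
```

This is the "measurement scoped to the task in context" pattern: the metric, the test data, and the tolerance are all chosen for a specific use, rather than claiming anything general.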
We mention in the book this idea of, and this is a word I struggle with, I've tried so many times, so I'm just going to say fruit flies: the Drosophila, the fruit-fly model of genomics, this idea that you have a very specific model species that everyone studies in common. In the past, what that became in AI is the game of chess; it's been game playing. These are very specific tasks, and they don't generalize to something called general intelligence, as if such a thing actually exists. One of the problems in AI research is that the notion of intelligence is very poorly defined, and the notion of generality is very poorly defined, or is scoped to whatever the actual benchmark or task is that's being attempted. So this notion of generality is very poorly understood, and it is deployed in a way that makes it sound like there is such a thing as general intelligence. A great paper that we point to in the footnotes of the book is by Nathan Ensmenger, about how chess became the Drosophila of AI research in the prior AI hype cycle in the '60s and '70s. And it just happened that you had a lot of guys who liked chess, and they wanted to compete with the Soviets, who had chess dominance, right?
And so those tasks become the tasks we happen to like, and we're actually seeing some of that again: these are tasks we think are suitable, tasks scoped in a way we think represents the most worthwhile problems. But they're not tasks chosen by thinking about what exists in the world that is going to be helpful and useful and scoped to specific execution. This notion of an everything-system is wildly unscoped. But okay, so it is unscoped. I think everybody listening or watching right now would probably say: just in my basic use of ChatGPT, it can tell me about history. It can write a poem. It can create a game. Okay, I see Emily reacting already. It can search the web and give me plans. It can do all these different things in different disciplines. So for people listening, there would be a sense that there is an ability to go into various disciplines and perform, and whether you call it a magic trick or not, it clearly can. So what I'm trying to get at is: is there a way to measure that, or do you think that is in itself a wrong assertion? So yes, I think it's a wrong assertion. What ChatGPT can do is mimic human language use across many different domains. It can produce the form of a poem. It can produce the form of a travel itinerary. It can produce the form of a Wikipedia page on the history of some event. It is an extremely bad idea to use it if you actually have an information need. That's setting aside the environmental impacts of using ChatGPT, and setting aside the terrible labor practices behind it and the awful exploitation of data workers who have to look at the terrible outputs so that the consumer sees fewer of them, and by terrible outputs I mean violence and racism and all kinds of psychological... We covered that on the show. Yes.
We've had one of the people who've been rating this content on the show. Folks who are interested, I'll link it in the show notes; Richard was here to talk about what that experience was like. But sorry, go ahead. So, setting aside all of that: if you have an information need, something you genuinely don't know, then taking the output of the synthetic-text-extruding machine doesn't set you up to actually learn more, on a few levels. Because you don't already know, you can't necessarily quickly check, except maybe by doing an additional search without ChatGPT, at which point, why not just do that search? But also, it is poor information practice to assume that the world is set up so that if I have a question, there is a machine that can give me the answer. When I'm doing information access, what I'm doing instead is understanding the sources that kind of information comes from, how they're situated with respect to each other, how they land in the world. This is some work I've done with Chirag Shah on information behavior and why chatbots, even if they were extremely accurate, would actually be a bad way to do these practices. So, back to your point: yes, this system is set up to output plausible-looking text on a wide variety of topics, and therein lies the danger, because it seems like we are almost there to the robo-doctor, the robo-lawyer, the robo-tutor. In fact, not only is that not true, not only is it environmentally ruinous, and so on, but that is not a good world to live in. Can I just hit on this point? I disagree with you on this one, though I do think some of the points you're making are well-founded. We don't want these things to be lawyers right away. But let me at least point you to one use I've had recently, and you can tell me where I'm going wrong if you think I am. I'm in Paris now.
A little work, a little vacation at the same time. What I've done is I've taken two documents from friends who have been here often; they put together documents that they send to friends who visit. I've uploaded those into ChatGPT, and then I have ChatGPT search the web and give me ideas of what to do. I tell it where I am, I tell it where I'm going, and it searches through, for instance, all the museums, the art galleries, the festivals, the concerts, and it brings it all into one place. That's been extremely useful to me in finding new cultural events and concerts. There's even a bread festival going on here that I had no idea about, and now I'm going to go, because it found it for me. And when it comes to this stuff, there's a link, so you can go out and double-check the work. But as far as finding information on the web, the fact that it's able to comb the internet for these events and take into account some of the context I've given it with these documents, I think is very impressive. And that's just one use case. So I'm not asking it to be a lawyer. I'm kind of asking it to be what you said, an itinerary planner. What's wrong with that? So, first of all, you had these lovely documents from your friends, and I guess what you're saying is missing is the current events. They've given you general things to look for, but they haven't looked into what's going on right now. What's wrong with that? On several levels. What would we do in a prior age, even pre-internet? The local newspapers would list current events: here's what's going on. If you landed in a city, you would go find the local, probably indie, newspaper and look up the events page.
And that system was based on a series of relationships within the community, between the people putting on festivals and the newspaper writers. And it helped support the local news and information ecosystem, which was a good thing. On top of that, if something wasn't listed, you could think about why it's not listed: what's the relationship that's missing? Your ChatGPT output is going to give you some nonsense, and you're right, this is a use case where you can verify whether it's real or not. It is also likely going to miss some things, and the things that are not surfaced for you are not surfaced because of the complex set of biases that got rolled into the system, plus whatever the roll of the dice was this time. Anytime someone says, "Well, I need ChatGPT for this," usually one of two things is going on. Either there's another way of doing it that gives you more opportunities to be in community with people and make connections, or there is some serious unmet need, which doesn't sound like it's the case here. And if we pull the frame back a little bit, we can ask: why is it that someone felt the only option was a synthetic-text-extruding machine? Here I think you've fallen into the former of these: what are you missing out on by doing it this way? What are the connections you could be making to the people around you? If you're staying in an Airbnb, maybe the Airbnb host; if you're in a hotel, the concierge, to get answers to these questions, when you're looking to the machine instead. I would also add that this is a pretty low-stakes scenario, right? You can go out, you can verify these things. You can go to existing resources, event calendars that people spend a lot of time curating online. There's a lot of stuff that's already curated online.
And it's not like this didn't exist in prior instances of technology. One of the people we cite in the book and talk a lot about is Dr. Safiya Noble and her work on Google, and the way Google results present very violent content with regard to racial minorities. One of the parts of her book that I like to reference, and that a lot of people don't initially, is the part where she talks about Yelp, specifically with regard to a Black hairdresser, and the way Yelp was effectively shutting this person out of business, because she served a specific need for the Black residents of the city Noble was studying: braiding hair and doing other Black hairstyles. So this is a function of all information retrieval systems: you have to think about what they're including and what they're excluding. It's not very consequential here, but in any area of, say, summarization, or any kind of retrieval, you do need some kind of expertise with which you can verify the output and ensure that what's getting in there is not missing something huge. The system is going to take this set of information access resources, in this case crawling the web, which we know is going to miss something, and then exacerbate that, because you cannot situate those sources in context. Okay, let me just give my counterargument and then we can move on from this. My counterargument would be a couple of things. First of all, I don't speak French, so the local newspaper would kind of be lost on me. I am staying at a resident's place; we swapped apartments. So she's in my New York apartment, and I'm here.
So maybe she and I could have gone over that newspaper together. That's fair. But the newspaper — speaking of things that leave stuff out — the newspaper leaves stuff out all the time. It exercises editorial judgment. So it's trading newspaper editorial judgment for bot editorial judgment, but the bot can be in some ways more comprehensive, because it's searching the entire web. And I'll just say one last thing about this. I didn't feel the need to use it. I didn't say, I need to use this to figure out what's going on — again, I had these documents. What's useful about it, speaking of making connections with the local community, is that if I'm able to — here's the word — be efficient in my research by using it, I can spend much more time out in the community versus searching the web or reading the newspaper. So what's your thought on those arguments? Sorry, I was getting distracted by Alex's cat walking around. Yes, listeners, Alex's cat is here. Alex, what's your cat's name? This is Clara. And I'd lift her up, but I have a shoulder injury. She's knocking the mic around, so I'm just trying to keep her off the mic. Yeah, thank you. So, the efficiency argument. This is an efficiency argument in the context of leisure activities, as opposed to in the context of work. You mentioned along the way that it is searching the whole web for you. You don't actually know that. That's right. And also the whole web includes a lot of stuff that you don't actually want. Lots and lots of the web is just garbage SEO stuff. And maybe you're seeing more of that in your ChatGPT output than you would with a search engine, which, as Alex mentioned, also has issues. And then finally — let me jump in there: SEO garbage is made for the search engine.
It is, but the search engines, in order to stay in business, also have to be fighting back against the SEO garbage. It's a constant battle. Probably for the chatbots as well. Yeah. So, you mentioned newspaper editorial judgment versus bot editorial judgment, and I'm going to take issue there, because a bot is not the kind of thing that can have judgment, nor is it the kind of thing that can have accountability for exercising judgment. And so, yes, as Alex is saying, this is low stakes, but if you're using it as a motivation for these things being useful in the world, then you have to deal with the fact that "useful in the world" is going to entail many more higher-stakes things. And then we really have to worry about accountability. I would also want to say, there's a lot of this argument from, quote unquote, capabilities, and I don't really know what that term means. That's another poorly defined term, I think, especially when it comes to AGI. But this argument from "well, I find it useful," I don't find terribly convincing, right? It's sort of like: okay, you have found it useful either in a situation in which there is a way to verify against sources that you know about and have some kind of ground truth about, or you found it useful across a variety of these different situations. But if I'm asking a chatbot about an area that I know quite a lot about, say sociology or the social movements literature, I then have that knowledge to verify it just from my social skill in that area (a term I'm borrowing from the sociologist Neil Fligstein), and my knowledge of how to navigate those areas, and my professionalization as a sociologist. Okay.
But then once it gets into those areas in which verifiability just escapes me — which is most areas, because we're not professionals in most areas, and although a lot of us want to be jacks and jills of all trades — then we lose that ability. We don't have the social skill or depth of knowledge to verify it in the same way. And so I'm really not convinced by "well, these are useful for me in these pretty low-stakes contexts," because that slippage then means that we're going to miss some pretty big things in some really dire contexts. Okay. Well, let's turn it up a notch when we come back, because we're going to talk about AI at work and AI in the medical context. And maybe we can even touch a little bit on doomerism, which you write about in the book. And there's plenty else on the agenda. So we'll be back right after this. And we're back here on Big Technology Podcast with Professor Emily M. Bender and Alex Hanna. They are the authors of The AI Con: How to Fight Big Tech's Hype and Create the Future We Want. Here it is. So let's go to usefulness, and we'll start with generative AI in the medical context, because why don't we just go straight for the example that we'll probably have the biggest disagreement on. And I'm not saying that I think generative AI should play the role of a doctor. In fact, when I wrote my list of things I agree with you both on, I included that I don't think AI should be a therapist, at least not yet. And we know now that, according to a recent study, the number one use of AI is companionship and therapy. And the therapy side really scares me, and I think the companionship isn't the best thing in the world either. But in medicine, I do find that there is some use for it.
Medicine is a field overrun by paperwork and insurance requirements that I think have ruined the health care system, because they keep doctors effectively tied to their computers writing notes, as opposed to seeing patients or living their lives. And Alex, before the break, you mentioned that one of the areas where this stuff is useful is when it operates in your area of expertise, because you're able to verify it. So we're going to go with one use that I find to be pretty good, and that to me doesn't make generative AI feel like a con: when a doctor is seeing a patient, they can take a transcription of the conversation they have with the patient, have AI synthesize and summarize what they talked about, put it into their electronic medical records system, and then verify it — so they don't have to spend the time writing those summaries up, and can actually go spend more time with patients. So what's the problem with that? There are so many problems with that. And the first thing I want to say is that you named the underlying problem when you talked about insurance requiring so much paperwork. So this is one of those situations where there's a real problem here. It's not that doctors shouldn't be writing clinical notes.
That is actually part of the care. But there is a lot of additional paperwork that is required because of the way insurance systems — especially the one in the United States — are set up, and so we could work on solving that problem. This is a case where the turn towards large language models, so-called generative AI, as an approach to this is showing us the existence of an issue, but that doesn't mean it is a good solution. So many problems. One is that writing the clinical note is actually part of the process of care: it is the doctor reflecting on what came out of that conversation with the patient, thinking it through, writing it down, making plans for the next treatment. That is not something I want doctors to get out of the habit of doing as part of the care. Now, they might feel like they don't have time for it. That's also a systemic issue. Secondly, these things are set up as ambient listeners, which is a huge privacy issue. As soon as you've collected that data, it becomes this radioactive pile of danger. Thirdly, you've got the fact that automatic transcription systems, which are the first step in this, do not work equally well for different language varieties. Think about somebody who's speaking a second language. Think about somebody who's got dysarthria, say an older person whose speech isn't very clear. Think about a doctor who is an immigrant to the community they're working in, who's got extra work to do now because their words are not well transcribed, and so the clinical notes thing doesn't work well for them. But the system is set up with the expectation that they can see more patients, because the "AI," in quotes, is taking care of all of this for them. And there's a beautiful essay that came out recently, I think in STAT News, and I was looking for the name of the author.
I didn't find it real quick. It really reflects on how important it is to her that the doctor do that part of the care: actually pulling out from the conversation what matters. And it's not just simple summarization. It is actually part of the medical work to go from the back and forth with the patient, plus all of the doctor's expertise, to what goes into that note. Yeah. So I want to add on. Emily has said so much of what I wanted to get at, but I have three or four separate points in addition to that. First off is the technical point. There are tools that are purported to do summarization, and there's some great reporting by Garance Burke and Hilke Schellmann and the AP from last October that looked at Whisper specifically — that's OpenAI's ASR system, automated speech recognition system — which found that in medical transcription it was basically making up a lot of text. We knew these things had, quote unquote, hallucinations. Again, that's not a term we use in the book. I say it's "making things up," though even that is maybe granting too much anthropomorphizing of the system for me. Quoting from that reporting: some of the invented text includes racial commentary, violent rhetoric, and even imagined medical treatments. So that's one major problem. The second problem is that medical transcription has been an area in which medicine has been forcing a casualization of work for years. Much of the medical note-taking that exists in hospitals now is done remotely. Work that has been seen as busy work — this "I don't want to write up my medical notes" thing — has become the type of work that gets forced on someone else.
So, prior to this ASR element of it — oh, thanks for linking that, Emily, and I'll link the AP article I'm looking at, too — part of that work has actually been offshored, as part of this movement of outsourcing. A lot of it is done remotely as part of this casualization. And I want to point out the gendered notion of this. This has been very much women's work, and that reflects the way so much of quote unquote AI technology wants to take the work that has traditionally been the domain of women and say: well, we can automate that, or we can casualize that in different ways. And that's important, because it treats this work as not actually part of, quote unquote, the work. It is seen as work that ought to be casualized and offshored. And I appreciate the essay that Emily shared, because that essay says: no, this is actually part of the element of doctoring. And then I also want to couch all of this in the political economy of the medical industry. Think about what it means to rush toward more and more remote medicine, more and more doctors seeing more patients. These efficiency gains from doctors aren't going to make their jobs easier; they're going to put more pressure on them. Now that you're in a position where you don't have to take medical notes, you're going to be running from appointment to appointment to appointment. And my sister is a nurse — a nurse practitioner — and she's basically seeing this in her job right now at her clinic. She says: now we have these things where I have to see more patients, and it's not like I'm going to go be on a beach somewhere.
It means that I'm going to have nine or ten 15-minute appointments a day. I'm not going to have enough proper time to spend with patients. So the coda to all of this is that if AI boosters could really offshore all of doctoring to chatbots, they would. This is a case in which Bill Gates has said, you know, in 10 years we're not going to have teachers and doctors. What a nightmare scenario, to have no teachers and no doctors. And Greg Corrado really gives it away — we cite him in the book — where he says of Med-PaLM 2: this thing is really efficient, we're going to increase our medical ability tenfold, but I wouldn't want it to be part of my family's medical journey. Okay, but again, here you're picking out some of the most extreme statements. I started my question saying it's Bill Gates, and Bill Gates can make extreme statements. I don't think he's the guy. And I think that doesn't reflect the broad consensus here, and definitely not the question that I asked, which again was about using this to take some of the time doctors are spending on paperwork and give it back to the doctors, or have them be able to see more patients. So we very much addressed that point. First of all, I want to name the author of that essay. Her name is Aliyah Barakat, and it's a beautiful essay. She's a mathematician and also a patient with a chronic condition. Wonderful essay. But yeah, you said give that time back to the doctors or have them see more patients, right? It is not going to go back to the doctors. That's not how our healthcare system works. And it's also therefore going to decrease the quality of patient care.
It is lose-lose, except for the hospitals maybe getting more money, and certainly the tech companies that are selling this to the hospitals. Okay. I'm also curious, thinking about the more nuanced position: who is the reference here that you're thinking of, Alex? What's the consensus on this? We see the egregious elements of this, and I'm wondering what the medical consensus is. Who's someone that you think is doing this very well? Now I'm interviewing you. Well, someone doing this well — again, I don't think this stuff is well developed yet. But I've definitely seen enough doctors just buried in paperwork. And we started this whole segment talking about how this is, I guess, an insurance-driven thing. So it's interesting: I guess you both dislike the way the insurance companies are guiding the system, but also think it's good practice for doctors to write those notes themselves? Hold on. There are two use cases for doctors' notes, right? There is actually documenting, for the patient and for the rest of the care team, what happened in this session. And that, I think, is a super important part of the work of doctoring. I believe you that there's a lot of additional paperwork that has to do with getting the insurance companies to pay back. And no, I don't like that system at all. The insurance companies are not providing any value. They are just vampires on our healthcare system in the US. Okay, I think we can agree on that front. But I do think that, as this stuff gets better — and I understand a patient may want this to happen —
do I think a doctor would be giving them worse care if they allowed the AI to summarize the notes or pick out the more important parts, if this stuff was working well? Not necessarily. So that's a big "if." What does it mean when this stuff is "getting better" and "working well"? Do you mean the absence of making things up? Definitely. I mean, we all agree that the doctor will have to verify and check this information after. Well, I guess the problem there is: then why are we having the doctor double-check that to begin with, right? In a setting where the doctor has 15 minutes to see every patient, and there is a quote unquote AI scribe — I don't want to call it an AI scribe; there's an automatic speech recognition tool doing automatic speech recognition on these things — in what space, or with what time, does the doctor verify those? Well, the time that they would be spending writing those notes in the first place. Is verification an easier task than transcription? I guess that's my question. I would proffer no, just from my experience using these systems. And I'm not a doctor, thank God — although I've thought about it, to the chagrin of my parents; not that kind of doctor. In my experience, I've used these tools for interviews specifically — qualitative interviews with data workers — and have spent time with these tools and have just had such an awful time with this. And this isn't even medical terminology; it's terminology around doing data work or talking about training AI systems. And it just does such a terrible job.
And at one point, I threw it all out and said, "Okay, I'm just sending this to somebody to actually transcribe, because this is not helpful for me, and it's taking me more time starting from the machine transcript than doing it from scratch." And I'm not primarily a qualitative interviewer, but I've spent time transcribing dozens of interviews in my research career, and have found it just very difficult. So I guess the question is: is that verification taking the time that could just be used for the doctoring and working with patients? And, holding everything about the insurance industry stable, that notion of attending to how the patient presents, how the patient is describing things — that is often the work itself. And the medical training I do have is that I was at one point a licensed EMT. And writing up PCRs — patient care reports — no one wants to write up the PCRs. At the same time, you're spending time taking note of how a patient is presenting. The patient is arrhythmic; the patient is cyanotic around their lips. These are things a health care professional would be paying attention to and making notes on, maybe because they're writing it up later. So I'm thinking about this process of writing and what it does to our own practice of observing and aiding and administering medical care. Okay. I mean, we'll agree to disagree on this front. But again, I think we are all on the same page that insurance companies requiring additional writing, just because they hope you never get to the claim so you don't file it — that's probably bad.
And we don't think that there should be AI doctors, at least yet. That's what I say; I think you two probably say never. So, all right, I want to end on this — or maybe we can do two more topics. I guess here's my question for you. A lot of the discussion of AI's usefulness in jobs in the book concerns these tools being imposed top-down. But what if they come bottom-up? What if a worker can find use for them and actually make their job easier by getting good at using something like a ChatGPT or a Claude — or, again, like the medical use case we talked through, if a doctor does find that this is useful for them? Are you opposed to that? So, yes, and I think that actually Cadbury of all people put it best. There's this hilarious commercial made for the Indian market showing how the supposed efficiencies you're getting out of this just ramp up the speed of things and don't leave you time to really get into the work that you're doing and be present. I think the most credible use cases I've heard for these things are, first of all, as coding assistants. That's sort of a machine translation problem between natural language and some programming language. And there I really worry about technical debt, where you have output code that was not written by a person, that's not well documented, and that becomes someone else's problem to debug down the line. But also writing emails. People hate writing emails, and people hate reading emails. So you get these scenarios where somebody writes bullet points, uses ChatGPT to turn them into an email, and the person on the other end might use ChatGPT to summarize it. And it's like, okay, so what are we doing here?
And again, taking a step back: what are the systems that are requiring all of this writing that everyone finds a nuisance to write and to read? Can we rethink those systems? And I just have to say that whenever I'm on the receiving end of synthetic text, I am hugely offended, and one of the things that we put in the book is: if you couldn't be bothered to write it, why should I bother to read it? I definitely got one of those emails yesterday, and I was like, you used ChatGPT for this, I know you did. Yeah, it's a good line. And it is very interesting thinking about cases in which workers are using this organically. First off, I've heard very little of that personally, especially for professional work. I mean, I think there are plenty of workers finding use for this, but I would say the analog I find, where it's not top-down, is in education. And to that degree, I think that's a failure in thinking about what education is, right? I mean, in that case, you don't want the students to be using this just to get through their classes. Yeah, right, exactly. Or teachers using it to put stuff together. Well, but I'm thinking about the students, right? I'm using that as an analog and then thinking about: what are the conditions that are forcing students to use this? If there are cases in which this seems to be useful — okay, what are those cases, and what does that say about the job? What does it say about how the work is oriented? In that case, maybe there need to be different efficiencies, or different thinking about how the job operates.
I then worry that these things become mandated in work environments — you're saying, well, people are using this, so everybody's using this — and then where does that leave the people who are resistors, or who are thinking, "I know this can't do a good job, so where does that put me?" And I think we've already seen this used as justification in places where employers have been reducing positions by the score, because there's a notion that these tools can do these jobs suitably and to a certain degree of proficiency, which is just not the case. That has me worried down the line about the areas Emily mentioned — the technical debt area — and about the overestimation of the capabilities of these tools. Okay, I know we're at time or close to it. Can I ask you one question about doomers before we get out of here? Sure. Let's end by dogging on the doomers. Okay. So I saw that there was a chapter about doomers here, and I was excited to read it, because my position has largely been that those who are worried that large language models are going to turn us into paper clips are either marketing what they're selling, or just very into — I don't know — they like the smell of their own body odor. I mean, I guess it's not a terrible thing to be worried about, but there's so much more, and it seems so unlikely that this is going to hurt us. So I definitely wanted to get your take on why you're down on doomerism. And let me just give my one caveat here. There's a line in your book that says AI safety is just doomerism and it's only about these long-term problems. But I've definitely heard some of the AI safety folks, like Dan Hendrycks from the Center for AI Safety, talking about really important near-term issues, like whether this technology could help virologists with bad intent.
So I wouldn't malign the entire AI safety field, but on the doomer stuff, I hear your point. All right, so attack that and then we'll get out of here. So, I just want to put in a shout-out for a new book by Adam Becker called More Everything Forever, which really goes deep into the connections between doomerist thought and these more palatable-looking sides of what's called effective altruism. And also in that context, there's a wonderful paper by Timnit Gebru and Émile Torres on what they call the TESCREAL bundle of ideologies. Yes. You know, I think that if your concern about the systems is not rooted in real people and real communities and things that are actually happening — even this "oh, but bad actors could use it to more quickly design viruses" and stuff like that — that's still speculative, right? So anytime we're taking the focus away, it's like: has that happened? This is still people writing science fiction fan fiction for themselves. It's based on these jumped-up ideas about what the technology can do, and it takes the focus away from the actual harms that are happening now, including the environmental stuff we started with. Right, but you don't want a virus, right? You want to get ahead of that. Like with social media: there were some issues with social media, but there was not a focus on some of the potential long-term issues, and that only came up later on. You don't agree? There are problems with social media, for sure. And some of those problems were documented and explained early on, and people were not paying attention — but they were real problems being documented as they were happening, as opposed to imaginaries about, well, someone's going to use this to Dr. Evil up a bad virus. Okay. Yeah. Go ahead, Alex. For the sake of time, I think that's fine. I don't have much to add.
All right. Well, look, the book is called The AI Con: How to Fight Big Tech's Hype and Create the Future We Want. The authors are Emily M. Bender and Alex Hanna. Emily and Alex, I've been reading your work for a long time, and it's great to have a chance to speak with you. Like I said at the top, for those who are listening or watching: you may not agree with everything — everything I said or everything our guests said — but hey, at least now you know these arguments, for and against, and we trust you to make up your own opinion and do further research. And we've definitely had plenty of good stuff to keep digging into shouted out over the course of this conversation. So Emily and Alex, great to see you. Thank you so much for joining the show. Thank you for this conversation, and enjoy Paris. Thanks, Alex. Have a great time. Thank you both. Thank you, everybody, for listening. We'll see you on Friday for our news recap. Until then, we'll see you next time on Big Technology Podcast.