AI's Rising Risks: Hacking, Virology, Loss of Control — With Dan Hendrycks
Channel: Alex Kantrowitz
Published at: 2025-03-28
YouTube video id: WcOlCtgreyQ
Source: https://www.youtube.com/watch?v=WcOlCtgreyQ
Now that artificial intelligence has tried to break out of a training environment, cheat at chess, and deceive safety evaluators, is it finally time to start worrying about the risk that artificial intelligence poses to us all? Here to speak with us about it is Dan Hendrycks. He's the director of the Center for AI Safety and also an adviser to Elon Musk's xAI and to Scale AI. Dan, it's so great to see you. Welcome to the show.

Glad to be here.

It's an opportune moment to have you on, because I'm recently doom-curious, and I'll explain what that means. I had long been skeptical of this idea that AI could break out of its training environment, or out of its computers, and start to potentially harm humans. I still think I'm on that path, but I'm starting to question it. We've recently seen research showing AI trying to export its weights in scenarios where it thinks it might be rewritten, trying to fool evaluators, and even trying to break a game of chess by rewriting the rules because it's so interested in winning. So I'm going to put this to you right away: is what I'm seeing in these early moments, AI trying to deceive evaluators or change the rules it's been given, an early sign of AI becoming an adversary rather than a friend?

I think the easier way to see that it could be adversarial is just if people maliciously use these AI systems against us. If an adversarial state tries to weaponize AI against us, that's an easier way in which it could cause a lot of damage. There is an additional risk that the AI itself could have an adversarial relation to us and be a threat in itself: not just the threat from humans, whether terrorists or state actors, but the AIs themselves potentially working against us. I think those risks grow over time. I don't think they're as substantial now compared to the malicious-use risks. But as time goes on and the systems become more capable, if some small fraction of them do decide to deceive us, or try to self-exfiltrate, or develop an adversarial posture toward us, that could be extraordinarily dangerous. So it depends. I want to distinguish between what is particularly concerning in the next year versus somewhat further in the future. In the shorter term, it's more the malicious use, but that's not to downplay the fact that the AIs themselves could be threats later on.

Of course, and we're going to cover both the short term and the longer term as we go through this conversation. But I don't want to lose this thread, because what I'm trying to figure out is what to make of what I've seen: AI trying to rewrite the rules of chess; AI trying to exfiltrate its weights, that is, copy itself. In one case it ran code believing that running it would copy itself to another server. The code was fake, it was tricked by its evaluators into thinking it could do that, but it did run the code. And then there's the AI showing evaluators that it was modeling one behavior while, when it thought they weren't watching, it was doing something else. This has all happened. Are these the early signs of what could go wrong with AI, or is this benign activity we shouldn't read too much into?

I think it is suggestive of some loss-of-control scenarios.
However, it is not the type of thing I'm most concerned about with loss of control, because we can still control these AI systems somewhat reasonably right now, and maybe we'll get better methods for doing so. The loss-of-control mechanism I'm most concerned about is when we have automated AI research and development. Imagine that at some point in the future an AI can do what AI researchers do, which would involve automating a lot of software engineering. If you have one AI that can do that, then you can make, say, a hundred thousand copies of it and have them perform research simultaneously. That could lead to a very substantial acceleration in the rate of development; you might get a decade's worth of AI progress within the course of a year. And this would be highly automated, with very little human oversight. In that scenario, I think a loss of control is much more likely. The sort of AIs we have right now being a little nefarious here and there is a concern, but it seems more tractable as something to research, improve, and reduce the risks of. Meanwhile, an automated AI research and development loop that goes extremely quickly with very little human oversight seems hard to de-risk down to a negligible level, just because it's so fast and there's so little human involvement. You couldn't involve humans that much because of competitive pressures, and the larger geopolitics would make it hard to slow down in such a scenario. So that's more the type of loss-of-control risk I'm concerned about. Those other papers are suggestive, but there's a lot more hope for it being empirically tractable to counteract.

From what I understand of your first answer, you're concerned both about the way humans use AI and about AI itself taking its own actions, our loss of control of artificial intelligence. Can you rank where you see the problems, from most serious to least serious, and what we should be focusing on?

That's a really good question. The risks and their severity depend on time; some become much more severe later. I don't think AI poses a risk of extinction today. They're not powerful enough to do that. They can't make PowerPoints yet, right? They don't have agential skills; they can't accomplish tasks that require many hours to complete. Since they lack that, there's a severe limit on the amount of damage they could do and on their ability to operate autonomously. So there's a variety of risks. In the shorter term, there's malicious use. When AIs get more agential, I'd be concerned about AIs carrying out cyberattacks on critical infrastructure, possibly as directed by a rogue actor. There'd also be the risk of AIs facilitating the development of bioweapons, in particular pandemic-causing ones, not smaller-scale ones like anthrax. Those are, I think, the two malicious-use risks we need to be getting on top of in the next year or two.
At the same time, there are loss-of-control risks, which I think primarily stem from an AI company trying to automate all of AI research and development. They can't have humans check in on that process, because that would slow them down too much. If you have a human do a week-long review every month of what's been going on, trying to interpret what's happening, that slows them down substantially, and the competitor that doesn't do that ends up getting ahead. What that would mean is that you'd have AI development going very rapidly with nobody really checking what's going on, or hardly checking, and I think a loss of control in that scenario is more likely. But that comes a bit later. So it depends, and it will depend on what we do. These risks aren't something that exist out there and are immutable. Maybe we can do more research to make the malicious-use risks go down substantially. Maybe states can deter each other from pursuing this automated AI R&D loop, so that we don't have the risk of loss of control. It depends not just on technical research; it depends on the politics and geopolitics as well, and those will keep changing, so the risk sources will keep changing.

Right, and we're going to talk today about risks, but we're also going to talk about solutions. With the Center for AI Safety, what you're doing is basically pointing out the risks and trying to get to solutions to these problems. You told me you were just at the White House yesterday or the day before. So this is something you're actually working toward mitigating, and we're going to get to that in a bit. But first, let's talk through some of the risks you see with AI and how serious they actually are. One that jumped out at me right away was bio: creating bioweapons. Let me run you through what I think the scenario could be, and you tell me what I'm missing. With bioweapons, you'd basically be prompting an LLM to help you come up with new biological agents that you could unleash against an enemy. Wouldn't that be predicated on the AI actually being able to come up with biological discoveries of its own? Right now, current LLMs don't really extend beyond their training set. Maybe there's an emergent property here or there, but they haven't made any discoveries; that's been the big knock on them to this point. So I'm curious: if you're talking about immediate risks, and one of them is bioweapons created with AI, doesn't that suppose something much more advanced than the LLMs we have today? Because a current LLM, to me, is basically like Google: it can search for what's on the web and reproduce what's on the web, but it's not coming up with new compounds on its own.

Yeah. I think for cyber that's more in the future, but expert-level virology capabilities are much more plausible in the short term. For instance, we have a paper that will be out in maybe a few months; we'll see.
Most of the work for it has been done. In it, we have Harvard and MIT expert-level virologists taking pictures of themselves in the wet lab and asking, "What step should I do next?" Can the AI, given this image and this background context, guide them step by step through these various wet lab procedures for making viruses and manipulating their properties? We're finding that the most recent reasoning models, quite unlike the models from two years ago such as the initial GPT-4, are scoring around the 90th percentile compared to these expert-level virologists in their own areas of expertise. This suggests they have some of these wet-lab-type skills, and if they can guide somebody through it step by step, that could be very dangerous. Now, there is an ideation step, brainstorming ways to make viruses more dangerous, but that's a capability they've had for over a year. The implementation part is fairly different. So in bio, I would not be surprised if in a few months there's a consensus that they're expert-level in many relevant ways and that we need to do something about it.

Wow, that's crazy to me, because I would have thought it would be the opposite: that cyber would be the thing to worry about, because these things code so well, not virology.

Biology has been an interesting subject for them, because they know the literature really well. They know the ins and outs of it, they have a fantastic memory, and they have so much background knowledge. For some reason it's historically been their easiest subject, biology and virology, in earlier forms of measurement, like how they do on exams. But now we're looking at their practical wet lab skills, and they increasingly have those as well.

What about the evolution of the technology? This is all with large language models, right? Reasoning is just something taking place within a large language model, like the GPT models that power ChatGPT. So what is it about current capabilities that has increased to the point where they can now guide somebody through the creation or manipulation of a virus?

Well, now they have image-understanding skills, and that's a problem. They didn't used to have that, and it makes it a lot easier for them to do guidance, to be an apprentice or a guide on one shoulder saying, now do this, now do that. I don't know where that skill came from. They've just trained on the internet, and maybe they read enough papers and saw enough pictures inside those papers to have a sense of the protocols and how to troubleshoot appropriately. Since they've read basically every academic paper written, maybe that's the cause of it, but it's a surprise. I was thinking this practical, tacit knowledge wouldn't be something they would necessarily pick up; it would make a lot more sense for them to have academic knowledge, knowledge of vocabulary words and things like that. So I don't know where it came from. It's there, right?
But this is still all stuff that is known to people. It's not like the AI is coming up with new viruses on its own.

Well, you can't just prompt whatever GPT it is and say, "Create a new coronavirus." But if you're saying, "I'm trying to modify this property of the virus so that it has more transmissibility or a longer stealth period," then I think it could, with some pretty easy brainstorming, make suggestions, and if it can guide you through the intermediate steps, that's something that can make it much more lethal. You don't need breakthroughs to do bioterrorism. Generally, the main limitations on these risks are capability and intent. Historically our bio-risk has been fairly low, because the people with these capabilities have been a very small number, maybe a few hundred top virology PhDs, and most of them just don't intend to do this sort of thing. However, if these capabilities are out there without any restrictions and are extremely accessible, then your risk surface is blown up by several orders of magnitude. A solution that lets people keep access to these expert-level virology capabilities is that they can speak to sales, or ask for permission, to have some of these guardrails taken off. If they're a real researcher at Genentech, or wherever, wanting these expert-level virology capabilities, they could just ask: oh, you're a trusted user, sure, here's access. But if somebody made an account a second ago, then by default they wouldn't have access to it. For safety, a lot of people think the way you go about it is slowing down all of AI development, or something like that. But I think there are very surgical things you can do, where you just have the model refuse to talk about topics such as reverse genetics, or refuse to guide you through the practical intermediate steps of certain virology methods.

Wait, those safeguards don't exist today?

At xAI they do.

You're an adviser at xAI.

Yeah.

But what were the models you were testing to find out whether they would help with the enhancement or creation of viruses?

We tested pretty much all of the leading models that have these multimodal capabilities. They'll have some safeguards, but there are various holes, and those are being patched. We've communicated, hey, there are various issues here, and I'm hopeful that some of these vulnerabilities will be patched very quickly. Then, if people want access to those capabilities, they could become a trusted third-party tester, or work at a biotech company, and those restrictions could be lifted for those use cases. But random users we don't know, asking how to make some virus, or some animal-affecting virus, more lethal: just punt, have the model refuse on that. That seems fine.

Yeah, we see the benchmarks come in with each model release, and it's like, oh, now it scored 84th or 90th or 97th percentile on this math test or this bio test. And for us, it's like, oh, that's the model doing it.
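For readers wondering what the "speak to sales" gating Hendrycks describes might look like in practice, here is a minimal sketch of a default-deny capability gate. Everything in it, the function, field names, and topic categories, is hypothetical and illustrative; it does not reflect any lab's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical restricted-capability categories a provider might gate.
RESTRICTED_TOPICS = {"reverse_genetics", "gain_of_function", "wet_lab_protocols"}

@dataclass
class User:
    account_age_days: int
    verified_affiliation: str | None   # e.g. a biotech company or university biosafety office
    approved_for_biosecurity: bool     # set only after manual review ("speak to sales")

def allowed(user: User, topic: str) -> bool:
    """Default-deny policy: restricted topics require a vetted, verified user."""
    if topic not in RESTRICTED_TOPICS:
        return True                    # ordinary queries are unaffected
    if user.approved_for_biosecurity and user.verified_affiliation:
        return True                    # trusted researcher: guardrail lifted
    return False                       # brand-new or unverified accounts get a refusal

# A brand-new anonymous account asking about wet-lab protocols is refused;
# a vetted industry researcher is not.
print(allowed(User(0, None, False), "wet_lab_protocols"))          # False -> model refuses
print(allowed(User(400, "Genentech", True), "wet_lab_protocols"))  # True  -> full capability
```

The point of the sketch is that the restriction is surgical: it changes nothing for ordinary users and only adds a verification step for the small population that legitimately needs the capability.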
But what you're trying to say, and correct me if I'm wrong, is that if the model is getting 90 percent of the way to what an expert virologist would do, then it could take a crafty user a number of prompts to find their way to that remaining 10 percent. If they try enough times, they might end up getting to the dangerous virus we're trying not to let the public create.

Yeah. This is what concerns me quite a bit, and I'm being more quiet about the specifics. It's being taken care of at xAI; it's in our risk management framework there. Other labs are taking this sort of thing more seriously too, finding vulnerabilities and patching them. So I'm being non-specific about some of the vulnerabilities here, but hopefully I can provide more precision once that's taken care of.

Okay, I look forward to reading the paper. You're an adviser to Scale. They're a company that feeds a lot of PhD-level information to models in post-training. You've trained up the model on the internet and it's pretty good at predicting the next word, and then it needs some domain-specific knowledge. Scale, from my understanding, has PhDs and really smart people writing their knowledge down and feeding it into the model to make these models smarter. How does a company like Scale AI approach this? Do they have to say, all right, if you're a virology PhD, we shouldn't be fine-tuning the model with your information? What's going on there, and how are you advising them?

I've largely been advising on measuring capabilities and risks in these models. For instance, last year we did a paper together on the weapons-of-mass-destruction-related knowledge that models have. For that, we were testing a lot of the academic knowledge, the knowledge you would find in the literature: does it really understand the literature well? And we were seeing that it did, in biology and on bioweapons-related papers. However, that only tested their knowledge, not their know-how. That's why we did the follow-up paper, to see what their actual wet lab know-how is. Those scores were lower, but now they're higher, and so those vulnerabilities need to be patched, and those patches are, I gather, underway. We've also worked on other things together, like measuring the capabilities of these models, because I think it's important that the public have some sense of how quickly AI is improving and what level it's at currently. A recent paper we did together was Humanity's Last Exam, where we brought together various professors, postdocs, and PhDs from all over the world, and they could join the paper if they submitted good questions that stump the AI systems. I think it's a fairly difficult test. The instruction was: think of something really difficult that you encountered in your research and try to turn that into a question. Each researcher probably has one or two of these sorts of questions.
So it's a compilation of that, and I think very high performance on that benchmark would be suggestive of something in the ballpark of superhuman mathematician capabilities. That would revolutionize the academy quite substantially, because all the theoretical sciences that depend so heavily on mathematics would become a lot more automatable: you could just give it a math problem and it could probably crack it, or crack it better than nearly anybody on Earth could. So that's an example of the capability measurements we're looking at. We excluded virology-related skills from Humanity's Last Exam; we were not collecting data for that, because we didn't want this benchmark to incentivize models getting better at that particular skill.

And how is AI doing today on that exam?

The very best models are in the ballpark of 10 to 20 percent overall. It will take a while to get to 80-plus percent, but once it's at 80-plus percent, that's basically a superhuman mathematician, is one way of thinking about it.

But they're at 10 to 20 percent now, and many experts in the AI field, the practitioners, we had Yann LeCun on a couple weeks ago, are talking about how we're getting to the point of diminishing returns with scaling: that the current trajectory of generative AI in particular is limited, because the labs have basically maxed out their ability to increase its capabilities. So I'm curious whether you think that's right, because you're obviously working with these companies, with xAI and with Scale. If we are hitting some data wall, or some moment of diminishing marginal returns on the technology, is it possible that all this fear is somewhat misplaced? Because if the AI is not going to get much better than it is right now, at least with current methods, we may not be a year or two away from AGI. We may not be getting AGI at the end of 2025 like some people are suggesting, and so maybe we shouldn't be as afraid, because the stuff is limited.

Yeah. If we were trapped at around the capability levels we're at now, that would definitely reduce the urgency, and one could chill out a bit more and take it easy. But I'm not really seeing that. I think what he may be referring to is the pre-training paradigm running out of steam. If you take an AI, train it on a big blob of data, and have it just predict the next token, that's basically what gave rise to older models like GPT-4. That paradigm does seem like it's running out of steam; it held for many, many orders of magnitude, but the returns on doing it are lower. That is separate from the new reasoning paradigm that has emerged in the past year, where you train models on math and coding types of questions with reinforcement learning. That has a very steep slope, and I don't see any signs of it slowing down. It seems to have a faster rate of improvement than the pre-training paradigm had, and there's still a lot of reasoning data left to go through and do reinforcement learning on. So I think we have quite a number of months, or potentially years, of being able to do that.
Personally, I'm not even thinking too specifically about what AIs will look like in a few months. They'll be, I think, quite a bit better at math and coding, but I don't know how much better. So I'm largely just waiting, because the rate of improvement is so high and we're so early in this new paradigm that I don't find it useful to speculate; I'm going to wait a little while and see. But I would expect them to be quite a bit better in these STEM domains.

Right. I guess reasoning does make it better at the areas you're most concerned about: math, science, coding.

Yeah, that's right.

Because, and tell me again if I'm wrong, when it goes step by step, it's much better at executing and working through these problems than if it's just printing answers.

Yeah. And there is a possibility, and this is a hope in the field, I don't know whether it will happen, that these reasoning capabilities might also give rise to agent-type capabilities, where it can do other sorts of things, like make a PowerPoint for you, things that require operating over a very long time horizon. Potentially that skill set would fall out of this paradigm, but it's not clear. There has been a fair amount of generalization from training on coding and mathematics to other domains, like law, for instance. And maybe if those skills get high enough, it will be able to reason its way through things step by step and act in a more coherent, goal-directed way across longer time spans.

I'm going to try to channel Yann here a little bit. I think he would say that this is still going to be constrained by the fact that AI has no real understanding of the real world.

Well, I don't know. This sounds almost like a no-true-Scotsman type of thing. What does "real understanding" mean? If the predictive ability is there, if it can do the stuff, that's what I care about. If it doesn't satisfy some strict philosophical sense of understanding, some people might find that compelling, but I don't.

I'll give you an example. With the video generators: if AI really understood physics, then when you say, give me a video of a car driving through a haystack, it would actually be a car driving through a haystack. Instead, when I've given it that prompt, it's just hay exploding onto the front of a car with perfectly intact hay bales in the background.

I think for a lot of these sorts of queries, at least with images, we used to see a lot of nonsensical arrangements, things that don't make much sense if you look closely. But as you scale up the models, they tend to increasingly just get it. We might see the same for video. They already have some decent world-model behavior, like vanishing points being more coherent; if I were drawing, I'd probably be lacking an understanding of the physics and geometry of the scene and fail to make things internally coherent. So, I don't know.
Yeah, they seem pretty compelling and get a lot of the details right, including some of the more structural details, but there will be gaps you can keep zooming in on. I just think that set of gaps will keep shrinking, as was the case with images and text before. With text, back in the day, the same argument was made: it doesn't have a real understanding of causality, it's just mixing words together. It was barely able to construct sentences coherently; now it can. I don't know whether it then acquired "real understanding" in the philosophical sense he's thinking of for language, but it was good enough, and that might be the case with video as well.

There were points where I was like, oh, but it is getting the guy sitting on the chair. When I say, do a video of a guy sitting on a chair and kicking his legs, those legs are kicking and they're bending at the joints. So there must be some understanding there, in some ways. But if you ask them to do gymnastics, the limbs just flail and the person disappears into the floor. So, okay. Like you said at the beginning, ChatGPT isn't going to kill us yet. Let's talk about hacking. I think we glanced over it a little before, but we're now going through the humans-plus-AI problem, and hacking to me is one we should definitely focus on. You mentioned we're still not quite there, but it does seem to me, going back to the point I made earlier, that you can really code things up with these models; they already enable pretty impressive code. You could imagine ChatGPT producing pretty good phishing emails if you prompt it creatively, and not just ChatGPT but all of these GPT models: if you prompt them creatively, they'll give you an email you can send to try to phish somebody. Or say you take an open-source model like DeepSeek, download it, and run it without safeguards. So where's the risk with hacking? I know you said it's a little further off. Why is it further off, and what should people be concerned about?

Yeah. More of the risk comes from when they're able to do the hacking autonomously themselves: trying to break into a system, finding an exploit, escalating privileges, causing damage from there, things like that. That requires multiple steps and the agential skills I keep referring to, which they currently don't have. Although they could facilitate ransomware development and other forms of malware, for them to autonomously execute attacks and infiltrate systems will require new agential skills, and it's very unclear when those arrive. It could be a few months from now, it could be a year; I'm a little more suspicious, maybe it would even take two years. So that's something for us to get prepared for, to figure out how we're going to deal with, and to try to make safeguards increasingly robust against people trying to maliciously use it in those ways.
But yeah, I think much of the risk comes from being able to take one of these AIs, let's say a DeepSeek agent version, that is actually able to carry out cyberattacks. Then a rogue actor could run 10,000 of them simultaneously and have them target critical infrastructure, causing quite severe damage. For critical infrastructure, this could be something like having it disable the detectors or filters at a water plant, and then the water supply is ruined. Or you could target smart thermostats in homes, since many of the more advanced ones are connected to Wi-Fi, and turn them up and down simultaneously; that can ruin transformers, blow them out, and they take years to replace. But AIs aren't capable of doing that sort of thing currently, so it's more of an on-the-horizon thing, and I'm not feeling the urgency with it right now. I'm more concerned about the geopolitics of this: making sure that states are aware of what's going on in AI, that they're at least able to follow the news in some capacity. Things like that feel somewhat more urgent to me than trying to address cyber risks. There are things to do, though, and I think we should create the incentives beforehand.

Maybe I'm too much of an optimist for my own good, but when I hear you talk about this, I also get a little excited about the capabilities of these programs. If AI can enhance the function of a virus, AI can probably create a vaccine and make medical discoveries. If AI can hack into the infrastructure of some country, find exploits, and turn the thermostats up and down, then AI could probably do incredible amounts of very beneficial coding and computer work for humanity. So if we do get to that point, it seems there are going to be two poles here: the potentially scary and destructive stuff, which you can mitigate with some of the controls you talked about, but also amazing opportunity.

Yeah. And the thermostat thing was about messing with electricity demand, putting strain on the power grid and destroying transformers, just for clarification. But yes, you're pointing at the fact that it's dual-use. I'm not saying AI is bad in every single way; it's like other dual-use technologies. Bio is a dual-use technology: it can be used for bioweapons and it can be used for healthcare. Nuclear technology is dual-use; there are civilian applications for it, and chemicals too. And we have managed all of those by selectively limiting particular types of usage, restricting rogue actors' access to some of these capabilities, and making sure there are good safeguards for the civilian applications. Then we can actually capture the benefits. So it's not an all-or-nothing thing with AI; the question is what surgical restrictions one can place so that we keep capturing the benefits.
For instance, with virology, it's a matter of adding the safeguards, and then the researchers who want access can speak to sales. That's basically a resolution of the problem, provided the models are kept behind APIs. Now, on this dual-use point, there's an offense-defense balance. For some applications it can help and it can hurt, and maybe it helps more than it hurts, or maybe it will hurt more than it helps. In bio, I think it is offense-dominant: if somebody creates a virus, there isn't necessarily a cure that AI will immediately find for it. If it helps a rogue actor make a fairly compelling virus, that could be enough to cause many millions of deaths, and it may take months or years to find a cure; there are many viruses for which we still have no cure. For cyber, in most contexts there's a balance between offense and defense: if somebody can find a vulnerability with one of these hacking AIs, they could also use it to patch the vulnerability. There is an exception, though, in the context of critical infrastructure, where the software is not updated rapidly. Even if you identify vulnerabilities, there won't necessarily be a patch, because the system needs to always be on, or there are interoperability constraints, or the company that made the software is no longer in business. So our critical infrastructure is a sitting duck, and in that context cyber is offense-dominant. But in normal contexts it's roughly balanced, and virology, I think, is largely offense-dominant.

Before we get to the nation-state element of this, I need to ask you about the research houses themselves. Every research house says it's concerned with safety, from OpenAI to xAI and everything in between. Maybe not DeepSeek; we'll get to DeepSeek. Yet they're the ones building this technology, and I find it a little strange to have companies saying, we have to build and advance this technology so we can keep people safe. I never really understood that message.

I don't know that the message is exactly "we need to build it to keep people safe." I think it's more that the main organizations with power in the world now are largely companies, and so if you're trying to influence outcomes, you basically need to be a company; that's how many of them will reason. They'll think, yes, you could be in civil society or you could protest, but that won't determine the course of events as much. So many of them are buying themselves the option to hopefully influence things in a more positive direction, but most of the effort goes toward staying competitive and staying in the arena. I think over 90 percent of the intellectual energy they spend is on how to afford the 10x larger supercomputer, and that means being very competitive, speeding this up, and making safety a priority, but not necessarily a substantial one. So I do think there is an interesting contradiction, or something that looks like a contradiction, there. But think back to nuclear weapons. Nobody wants nuclear weapons; if there were zero on Earth, fantastic. That would be a nice state of affairs if it were stable, but it's not a stable state.
One actor may then develop nuclear weapons and could destroy the others. So this encourages states to run an arms race, and it makes everybody collectively less secure; that's just how the game theory ends up working. You get a classic security dilemma: everybody is worse off collectively. Even if you took it seriously and said, yes, nuclear technology is dual-use and potentially catastrophic and we need to be very risk-conscious about it, you can agree with all of that and still want nuclear weapons, because other parties will also have them, and unilateral disarmament in many cases just didn't make game-theoretic sense. In the same way, an individual company pausing its development while others race ahead doesn't make game-theoretic sense. I think this just points to the fact that the game theory is confusing, so you get things that seem like contradictions but, through the nuclear analogy, make sense. It's an ugly reality to internalize.

Doesn't that discount the fact that these companies, if they want to influence the way things are going, are one and the same with the problem? Yes, you're influencing, but without you this wouldn't be moving as fast as it is. It is interesting, for instance, to think about Elon Musk: obviously he has you in two days a week to work on safety inside xAI, but he's also putting together million-GPU data centers to build the biggest, baddest LLM ever.

Well, if he didn't, then he would have less influence over it. It's not something where I would envision everybody just voluntarily pausing. So, subject to companies not voluntarily rolling over and dying, what's the best you can do within those constraints? The competitive pressures are quite intense, such that they do end up prioritizing competitiveness, and other priorities, like the budget for safety research, will generally be lower than would be nice to have if this were a less competitive environment.

Do you think Elon is more interested in restoring the original vision he had for OpenAI, making everything open source and making it safe? I would imagine he founded OpenAI with Sam Altman as a beachhead against Google because he was afraid of what Google was going to do with this technology. So I'm curious whether you think xAI is along that mission, or whether he's more interested in the soft cultural power that comes with having the world's best AI. For instance, you can change the way it speaks about certain sensitive political issues; it can be anti-woke, which we all know is where Elon stands. Where do you think his true interest lies in building xAI?

Well, I won't position myself as speaking on his behalf.

We won't put you up as Elon's spokesperson, but you are in there.

Yeah. I think the mission is to understand the universe, and that means having AIs that are honest, accurate, and truthful, to improve the public's understanding of the world.
We will be getting into a very fast-moving, trying situation with AI if it keeps accelerating, so good decision-making will be very important, and our understanding of the world around us will be very important. If there are more features that enable truth-seeking, honesty, good forecasts, good judgment, and good institutional decision-making, those would be great to have, and the hope is that Grok could help enable some of that, so that civilization steers itself more prudently through this potentially more turbulent period that's upcoming. That's one read on the mission statement. The objective is to understand the universe, and there are different sub-objectives that gives rise to. And its ability to help culture process events without censorship or political bias one way or the other is a stated objective, and I think that would be indispensable in the years going forward.

Do you buy that that's what they're doing? Because we also heard the same thing from Elon when it came to buying Twitter, now X.

I don't know, I think Community Notes has been quite...

That was something that was built under Jack Dorsey. I'm not going to take sides here; I'm just going to observe empirically what I've seen. We know that Substack links have been deprioritized because Substack was seen as a competitor to Twitter. We know that Musk, I think according to reporting, changed the algorithm to have his tweets show up more often, and his tweets took a strong stance toward supporting Donald Trump in the election. So hearing again from Elon, and look, I respect what Elon's done as a businessperson, but hearing again that he has a plan to make a culturally relevant product that's free of censorship and politically unbiased, I don't know if I believe that anymore.

I don't know about some of the specific things, such as the weighting of his posts or the profile matters, for instance. I think that overall, in terms of cultural influence and people being more disagreeable and doing less self-censoring, it has been successful. I think that was the main objective, and I think X had a large role to play there. So in terms of shaping discourse norms in the US, that seems to have been successful, in my view.

Yeah. I'm not saying pre-Elon Twitter didn't censor, which is probably the wrong word because censorship usually comes from the government, didn't shape the definition of acceptable speech to its own liking. It obviously had a progressive approach and moderated speech accordingly. I just don't believe Elon is refraining from using his own influence in how he runs X. But you and I could talk about this all day.

Yeah, and this isn't even my wheelhouse as much, but since you brought it up...

Oh, okay. All right. Sure.

I mean, just on the non-biased and truthful things: if there are ways in which it's extremely biased one way or the other, that's useful to know. This is something that is continually being improved, at least for xAI's Grok.
And I think the whole product offering could get quite a bit better at this. But I'm not speaking as a representative there; in my personal capacity, I think there are things to improve on for all these models in terms of their bias.

All right, we agree on that front. You hinted at it previously, but talk a little bit about why you don't think it's a good idea for there to be an arms race here, and certainly there is one between the US and China. We know the US has put export controls on China, and China has in some ways gotten around them through very creative procurement processes that run through Singapore; we can probably say that with a pretty good degree of confidence. Then of course we see the release of DeepSeek and some other AI applications from China, and everyone's trying to build the better AI so they have the soft power we spoke about, to influence culture across the world, but also offensive and defensive capability, like you're saying: if your country has the ability to manipulate viruses or carry out cyberattacks, you become more powerful and you potentially get to imprint your view of the world on the way it operates. You have a paper out arguing against this arms race, called Superintelligence Strategy. It's with Eric Schmidt, who we all know as the former CEO of Google, I think he's now taking over a drone company, so you can tell me a little about that, and Alexandr Wang, the current CEO of Scale AI, who's previously been on this show. Talk a little bit about why you don't think it's a good idea for countries to pursue this arms race. You say it might be leading us to mutually assured AI malfunction, a play on the nuclear era's mutually assured destruction, which I think is where you get the name.

Yeah. So the strategy has three parts, one of which is competitiveness, but we're saying that some forms of competition could be destabilizing, and that it may be irrational to pursue them because you couldn't get away with it. In particular, making a bid for superintelligence through an automated AI research and development loop could potentially lead to one state having capabilities vastly beyond another state's. If one state gets to experience a decade of development in a year and the other one is a year behind, that results in a very substantial difference in the states' capabilities. That could be quite destabilizing, because one state might start to get an insurmountable lead relative to the other. So I think that form of competition would be very dangerous, because there's a risk of loss of control, and because it might incentivize states to engage in preventive or preemptive sabotage to disable these sorts of projects. So states may want to deter each other from pursuing superintelligence through this means.
This then means that AI competition gets channeled into other realms, such as the military realm: having more secure supply chains for robotics, for instance, and for AI chips, and reducing sole-source supply-chain dependence on Taiwan for making AI chips. States can compete along other dimensions, but competing to develop superintelligence first seems like a very risky idea, and I would not suggest it, because there's too much risk of loss of control and too much risk that the state that does control it uses it to disempower others, shifting the balance of power far too much and destabilizing things. So the strategy overall, think of the Cold...

Before you go on with the strategy, my reaction to that is: good luck telling that to China.

So, on deterrence: I think if the US were pulling ahead, both Russia and China may have a substantial interest in saying, hey, cut this out; pulling ahead to develop superintelligence could give you a huge advantage and the ability to crush us, and you don't get to do that. We are making a conditional threat: if you keep going forward on this, because you're on the cusp of building it, we will disable your data center or the surrounding power infrastructure so that you cannot continue. I think they could make that conditional threat to deter it, and the US might do the same to China or other states. So I don't see why China wouldn't do that later on. Right now they're not thinking as much about superintelligence and advanced AI; this is more a description of the dynamics later on, when AI is more salient. But it would surprise me if China said: yes, United States, go ahead, do your Manhattan Project to build superintelligence, come back to us in a few years and tell us you can boss us around, because now we're in a complete position of weakness and at your mercy, and we'll accept whatever you tell us to do. I don't see that happening. I think they would move to preempt or deter that type of development so they don't get put in that fragile position.

Are you in the Eliezer Yudkowsky camp of bombing the data centers if we get to superintelligence?

Well, what I'm advocating, or pointing out, is that it becomes rational for states to deter each other by making conditional threats, and by means that are less escalatory, such as cyber sabotage of data centers or the surrounding power plants. I don't think one needs to get kinetic for this, and if discussions start earlier, I don't see any reason things need to escalate that way, or that anyone needs to unilaterally act on it. We didn't need to get into a nuclear exchange with Russia to express that we have a preference against nuclear war, thank goodness. So making conditional threats through deterrence seems like a much smarter move than "hey, wait a second, what are you doing there?" and then bombing it. That seems needless.

Yeah, I'm not into that solution either.
But what you're talking about assumes there will be a lead that's protectable for a while, and everything we've seen with AI is that no one protects a lead, right?

Well, one difference is that when you get to a different paradigm like automated AI R&D, the slope might be extremely steep, such that if a competitor starts doing automated AI R&D a year later, they may never catch up, just because you're so far ahead and your gains are compounding on your gains. It's like social media companies; Eric will use this analogy: if one of them starts blowing up and growing before you do, it's often the case that you can't catch up, and you get a winner-take-all dynamic. Right now the rate of improvement isn't that high, or there's less of a path to a winner-take-all dynamic. But later on, when you have the ability to run 100,000 AI researchers simultaneously, this really accelerates things. Maybe OpenAI has a few hundred, say 300, AI researchers; going from 300 AI researchers to orders of magnitude more world-class ones creates quite substantial developments. This isn't new; it's something Alan Turing and the founders of computer science pointed out: it's a natural property of getting AIs at this level of capability that you get this recursive dynamic, where things start accelerating extremely quickly and quite explosively.

Okay. We've managed to spend most of our conversation today talking about present risks, or risks in the near future. We should focus a little more on the intelligence explosion and loss of control, and we're going to do that right after the break.

And we're back here on Big Technology Podcast with Dan Hendrycks. He is the director and co-founder of the Center for AI Safety. Dan, it's great speaking with you about this. You've been sort of talking about it in the first half, but I want to zero in on this idea of an intelligence explosion, or what you describe as AI autonomously improving itself. Talk through a little bit about how that might happen and whether you see it as something that is actually probable in our future.

Yeah. The basic idea is: imagine automating one AI researcher, one world-class one. There's a fun property of computers, which is copy and paste, so you can then have a whole fleet of these. With humans, if you have one of them, maybe they can train up somebody else with a similar level of ability; with an AI you can copy it directly. That adds a very interesting dynamic to the mix, because you can have so many of them proceeding at once. AIs also operate quite quickly; they can code a lot faster than people. So maybe you've got 100,000 of these things operating at 100x the speed of a human. How fast will that go? Maybe conservatively, let's say it works out to 10x-ing research overall. But 10x-ing research would mean something like a decade's worth of developments in a year.
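A rough back-of-envelope version of that arithmetic is sketched below, using only the figures floated in the conversation (100,000 copies, 100x speed, a conservative 10x effective multiplier); the numbers and the discounting are illustrative assumptions, not results from any study.

```python
# Back-of-envelope version of the "decade in a year" claim, using the
# rough figures mentioned in the conversation. All numbers are illustrative.

human_researchers_equivalent = 300       # e.g. a frontier lab's research staff
ai_copies = 100_000                      # copies of one automated world-class researcher
speedup_per_copy = 100                   # each assumed to work ~100x faster than a human

raw_multiplier = (ai_copies * speedup_per_copy) / human_researchers_equivalent
print(f"Naive labor multiplier: {raw_multiplier:,.0f}x")   # ~33,333x

# Research doesn't parallelize perfectly (coordination overhead, compute limits,
# experiments that take wall-clock time), so apply a heavy discount and use the
# conservative 10x figure from the conversation.
effective_multiplier = 10
print(f"Effective multiplier of {effective_multiplier}x means roughly "
      f"{effective_multiplier} years of AI progress per calendar year, "
      "i.e. a decade compressed into one year.")
```

The point of the calculation is that even after discounting the naive multiplier by more than three orders of magnitude, the conservative figure still telescopes a decade of development into a single year.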
That telescoping of all these developments makes things pretty wild, and it means one player could possibly get AIs that go from very good, world-class, to being vastly better than everybody at everything: a superintelligence, something that towers far beyond any living person or collective of people. If we get an AI like that, it could be destabilizing, because it could potentially be used to develop a superweapon. Maybe it could find some breakthrough for anti-ballistic missile systems, which would make nuclear deterrence no longer work, or other ways of weaponizing it. That's why it's destabilizing. So states, if they see this, may say: don't run this many AI researchers simultaneously in these data centers working to build the next generation, or superintelligence, because if you do, our survival will be threatened. Deterring that would help them secure themselves, and they can make those threats very credible currently, and I think they'll continue to be able to make them credible going forward. This is why I think it might take a while for superintelligence to be developed: there will be deterrence around it later on, and maybe in the farther future there could be something multilateral, but that's speaking quite far out, in very different economic conditions. In the meantime, the AIs we'd have could still automate various things and increase prosperity and all of that. We'd still have explosive economic growth with something at an average human level of ability running very cheaply. So those are some of the later-stage strategic dynamics, and I don't think any state could get away with trying to build a superintelligence, building a big data center out in the middle of the desert, a trillion-dollar cluster, bringing all the researchers there, without the other states asking, "What do you think you're doing?"

You were at the White House yesterday.

That was largely just speaking about some of these strategic implications.

Are they receptive?

Yeah. There's always interest in thinking through what some of the longer-term dynamics are, what should happen now, and so on. I think when people hear "White House" it sounds like where the president lives; there's the Eisenhower building, which is part of the White House complex, kind of, and that's where everybody works. Some of the things we're speaking about here, like virology advancements, there are just a lot of things to discuss, and to think through what makes sense and what to keep in mind going forward.

Yeah. I guess I'd rather have an executive branch paying attention to this stuff than not.

Yeah, that's right. And what are the ways to help maintain competitiveness? Because you know how people normally think about this.
They'll think it's all or nothing, a good thing or a bad thing, and instead we're saying no, it's dual use. That means there are some particular applications that are concerning and other applications that are good, and you want to stem the particularly harmful applications while capturing the upside. Right. Okay. So the intelligence explosion part of this conversation naturally brings up the loss of control part, where to me, when people think about AI harm, they're always worried that AI is going to escape the simulation, or whatever it is, and act on its own and try to ensure that it preserves itself. We've seen it recently. I think I brought this up at the beginning of the show: Anthropic has done some experiments where the AI has run code to try to copy itself over onto a server if it thinks its values are at risk of being changed. It's fun to think about, but is it also probably just probability, in the sense that these systems are probabilistic and you're running them enough times? It's concerning, though, because even if only one in a thousand of them intends to do this, well, if you're running a million of them, then you're basically certain to get many of them trying to self-exfiltrate. And so, are you worried that this self-exfiltration is going to be a thing? From the recursive, automated AI R&D angle, I think that has really substantial probability behind it, a loss of control in that situation. So there's that, but I would distinguish between that and things that are not superintelligences, things that are not coming from that really rapid loop, like the currently existing systems. I think the currently existing systems are relatively controllable, or when there is some very concerning failure mode, we have been able to find ways to make them more controllable. For instance, for bioweapons refusal, we used not to be able to make robust safeguards two years ago, but we've done research with methods such as circuit breakers, and those seem to improve the situation quite a bit and make that kind of jailbreaking actually prohibitively difficult. So maybe we'll find something similar with self-exfiltration. People generally want to claim that current AIs are not controllable. I think they're not highly reliably controllable; they're reasonably controllable. And it seems plausible that we'll get to increasing levels of reliability. So I'm sort of reserving judgment. It'll depend more on the empirical phenomena. I think everybody should research this more, and we'll see what the risks actually are.
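A quick back-of-the-envelope on the one-in-a-thousand point: the per-instance rate and the million running copies are the hypothetical figures from the exchange above, and this sketch just works out what they imply at fleet scale.

```python
# Back-of-the-envelope for rare-but-fleet-wide failures. The per-instance rate
# and fleet size are the hypothetical figures from the conversation above.

p_single = 1 / 1_000        # chance any one instance attempts self-exfiltration
n_instances = 1_000_000     # number of copies running

expected_attempts = n_instances * p_single
p_at_least_one = 1 - (1 - p_single) ** n_instances

print(f"expected attempts: {expected_attempts:.0f}")        # ~1,000
print(f"P(at least one attempt): {p_at_least_one:.6f}")     # effectively 1.0
```

A failure mode that is negligible for any single instance becomes a near-certainty across the fleet, which is why per-instance rates alone understate the concern.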
But there are some risks that seem less empirically tractable, or that can't be empirically solved, like this loop: you can't run that experiment a hundred times and make it go well. You're making a huge attempt at building a superintelligence, with destabilizing consequences like these; this isn't something that's been done before, and for that you have more of a one-chance-to-get-it-right type of thing. With the current systems, by contrast, we can continually adjust them, retrain them, come up with better methods, and iterate. So it is concerning. It would not surprise me if this really starts to make AI development itself extremely hazardous, not just the deployment, but inside the lab, where you need to worry about the AI trying to break out sometimes. That's totally in the realm of possibility. But yeah, I could see it going either way. Yeah, this personally freaks me out, because if you see the AI trying to deceive evaluators, for instance, or you see the AI trying to break out, you really can't trust anything it's telling you. We had Demis on the show a little while ago, and he basically said, listen, if you see deceptive behavior from AI, if you see alignment faking, you really can't trust anything in the safety training, because it's lying to you. There is truth to that. Are you seeing deceptiveness at Grok, by the way? Oh yeah. We have a paper out, just last week, where we're measuring the extent to which they're deceptive, and in the scenarios we used, all the models were under slight pressure to lie, not being told to lie, just some slight pressure. Some of them will lie like 20% of the time, some of them like 60% of the time. So they don't really have the virtue of honesty baked into them. I think we'll need to do more work, and we'll need to do it quickly. I'm speaking in a more nonchalant way about this, but I can't get worked up about every single risk, or else I'd just be at eleven all the time. So there are some that I'm putting in different tiers than other risks, and this is a more speculative one. We've seen these sometimes turn out to be surprisingly handleable. But yeah, it could end up making things really, really bad. We'll see, and we'll do things about it to make that not be the case. Okay, thank you. Two more topics for you, then we'll get out of here. The Center for AI Safety: who's funding it? Well, there's not one funder. It's largely various philanthropists. The main funder would be Jaan Tallinn, and others. Jaan Tallinn, who's a Skype co-founder, right? There's a variety of other philanthropies or philanthropists. For instance, Elon doesn't fund it; I've never asked him to fund the center. That isn't to say I don't get any money from Elon: for my appointment at xAI I get a dollar a year, and at Scale AI I've increased my salary exponentially to where I get $12 a year, a dollar per month from Scale. But I try to avoid having complicated relations with them, just so that I don't feel I'm speaking on behalf of any of them in particular.
You're basically doing the work for them for free. Well, it's useful, right? It's useful to do. I mean, the main objective is just to try to generate some value here, as best as one can, by reducing these sorts of risks. And I think it's a good arrangement because it gives me a choose-your-own-adventure type of thing: oh, now I think the politics or the geopolitics is more relevant, so now I can go off and learn about that for some months and work on a paper there, compared to, no, you've got to be coding 80 hours a week, that's your job. That would be quite restrictive, and I couldn't be speaking with you. So I'm glad you're here. Well, thank you, Alex. So let's talk a little bit about this funding, because I think that after Sam Altman was fired and then rehired at OpenAI, there was a sort of skepticism around effective altruism's impact on the AI field. Even Jaan Tallinn, and I'm reading from his statement right after that, said the OpenAI governance crisis highlights the fragility of voluntary, EA-motivated governance schemes, so the world should not rely on such governance working as intended. Now Jaan is of course associated with EA, and EA is basically leading the conversation around AI safety. Is that good? So in terms of Jaan, I think he's funded organizations that are EA-affiliated. I don't know if he'd call himself that, but people can ascribe labels how they'd like. I've tweeted that EA is not equal to AI safety. I think the EA community is generally insular on these issues. I lived in Berkeley for a long time when I was doing my PhD, and there's a sort of AI risk school there that had very particular views about what things are important. Malicious use, for instance: when I was talking about malicious use at the beginning of this conversation, well, historically they were really against that. It should be only loss of control; don't talk about malicious use, that's a distraction. And that was annoying, because I'd always been working on robustness as a PhD student, where the main concern was malicious use. So I ended up leaving Berkeley even before graduating, just because of the relatively suffocating atmosphere and the central focus on whatever the new fad was: you'd have to get interested in ELK, eliciting latent knowledge, this is the important thing you have to focus on, or you have to focus on inner optimizers. There are lots of these speculative, empirically fragile things. For instance, this alignment faking stuff that you're seeing: there's some concern there, but I'm not totally sold that it's a top-tier priority. But in these communities, this is all that matters currently. Roughly speaking, that, and voluntary commitments from AI companies. I think voluntary commitments from AI companies are also a distraction, because you should expect most of the companies by default to just break those sorts of commitments if they end up running up against economic competitiveness. So I think it's a distraction, relatively.
And so I think there are many people who feel that the EAs' influence on this sort of thing, broadly, has not been positive overall. I think that at least for me, and for other researchers in this space who've been interested in AI risks, the amount of pressure to adopt particular positions was extraordinarily high, and I think quite destructive. So I'm very pleased that in the past year or so there's been a lot more diversity of opinion, which has been quite important, and I think that's just because the broader world is getting more interested in AI. A lot of that fixation on there being one particular risk that's the most important and everything else being a distraction just doesn't work when you're interfacing with the real world. There are a lot of complications, and AI is so multifaceted; your risk management approach can't just focus on one of them. Right. So you're not an effective altruist. I don't think of myself as that. I don't particularly get along with this school of thought, this sort of Berkeley AI alignment monolith, and I'm pleased that people can operate more independently in the space now, which I don't think was the case for many, many years, including basically the entire time I was doing my PhD. And there are many people, like Dylan Hadfield-Menell, a professor at MIT who was also at Berkeley at the time, very suffocating, or Rohin Shah, a researcher at DeepMind, very suffocating. They all feel this way. Yeah. Okay, let's bring it home. We've been talking for more than an hour about AI safety as if it's controllable, but open source is really putting up a pretty valiant effort in this field, keeping pace with the proprietary labs, and of course open source is not controllable. What do you think about that? I mean, we just saw DeepSeek, not to go back to it all the time, but it effectively equaled the cutting edge at the proprietary labs and put the weights on its website. So how can we possibly have a relationship of safety with AI if open source is out there exposing everything that's been done? So I haven't been endorsing open source historically, but I've thought that releasing the weights of models didn't seem robustly good or bad. My view was: it's fine, it seems to have complicated effects. There's an advantage to it, which is that it helped with diffusion of the technology, so that more people would have access to it, get a sense of AI, increase literacy on the topic, increase public awareness, and get the world more prepared for more advanced versions of AI. That's been my historical position, but it should always proceed by a cost-benefit analysis. If, for instance, models have these cyber capabilities later on, I think that would be a potential place to draw the line on open-weight releases,
personally, in particular for the ones that could cause damage to critical infrastructure. You could still capture the benefits by having the models available through APIs, so that if someone is a software developer they have access to these more cyber-offensive capabilities, but if they're a random, faceless user they don't. And likewise for virology: once the capabilities are so high that there's consensus about models being expert-level in virology, I think that would be a very natural place to have an international norm, not a treaty, because those take forever to write and ratify, but a norm against open weights if they are expert-level virologists, for the same reasons that we had the Biological Weapons Convention. Russia, or rather the Soviet Union, and the US got together for the Biological Weapons Convention; the US and China did as well. We also coordinated on chemical weapons with the Chemical Weapons Convention and on the Nuclear Non-Proliferation Treaty. States find it in their interest to work together to make sure that rogue actors do not have extremely hazardous, potentially catastrophic capabilities, like chem, bio, and nuclear inputs. So I think something similar might be reasonable for AI when models get to that capability threshold. Dan, I am at once kind of reassured that people are thinking about this stuff, but also more freaked out than I was when we sat down. But I do appreciate you coming in and giving us the full rundown of what to be concerned about, and what maybe not to be as concerned about, as we think about where AI is moving next. So thank you so much for coming on the show. Yep, thank you for having me. This has been fun. Super fun. If people want to learn more about your work or get in touch, how do they do that? The paper, or strategy, we've been speaking about is at nationalsecurity.ai, and I'm also on Twitter, or X, or whatever it's called: x.com/DanHendrycks. That'd be another way of following the goings-on, and as the situation evolves we'll keep trying to put out work, seeing what's going on with these risks, and if we come up with technical interventions to make them smaller, we'll put those out too. So that's where you can find me. Well, godspeed, Dan, and we'll have to have you back. Thanks again. All right, everybody, thank you for listening, and we'll see you next time on Big Technology Podcast.