AI's Rising Risks: Hacking, Virology, Loss of Control — With Dan Hendrycks

Channel: Alex Kantrowitz

Published at: 2025-03-28

YouTube video id: WcOlCtgreyQ

Source: https://www.youtube.com/watch?v=WcOlCtgreyQ

Now that artificial intelligence has tried to break out of a training environment, cheat at chess, and deceive safety evaluators, is it finally time to start worrying about the risk that artificial intelligence poses to us all? Here to speak with us about it is Dan Hendrycks. He's the director of the Center for AI Safety and also an adviser to Elon Musk's xAI and to Scale AI. Dan, it's so great to see you. Welcome to the show. Glad to be here. It's an opportune moment to have you on the show because I'm recently doom-curious, and I'll explain what that means. I had long been skeptical of this idea that AI could break out of its training environment or out of its computers and start to harm humans. I still lean that way, but I'm starting to question it. We've recently seen research in which AI tries to exfiltrate its weights in scenarios where it thinks it might be rewritten, tries to fool evaluators, and even tries to break a game of chess by rewriting the rules because it's so interested in winning. So I'm going to put this to you right away: is what I'm seeing in these early moments of AI trying to deceive evaluators or change the rules it's been given an early sign of us having AI as an adversary and not as a friend? I think the
easier way to see that it could be adversarial is just if people maliciously use these AI systems against us. If we have an adversarial state trying to weaponize it against us, that's an easier way in which it could cause a lot of damage. There is an additional risk that the AI itself could have an adversarial relation to us and be a threat in itself, not just the threat of humans in the form of terrorists or state actors, but the AIs themselves potentially working against us. I think those risks grow over time; I don't think they're as substantial now compared to the malicious-use sorts of risks. But as time goes on and as they're more capable, if some small fraction of them do decide to deceive us, or try to self-exfiltrate, or develop an adversarial posture toward us, that could be extraordinarily dangerous. So I want to distinguish between what is particularly concerning in the next year versus somewhat further in the future, and I think in the shorter term it is more of this malicious use. But that's not to downplay the fact that AIs could be threats later on, of course.
And we're going to cover both the near term and the longer term as we go through this conversation. But I don't want to lose this thread, because what I'm trying to figure out is what to make of what I've seen: AI trying to rewrite the rules of chess; AI trying to exfiltrate its weights, to copy itself. It ran code believing that if it did, it would copy itself to another server. The code was fake, it was tricked by its evaluators into thinking it could do that, but it did run the code. And then there's the manipulation, showing evaluators it was modeling one behavior while, when it thought they weren't watching, doing something else. This is all stuff that's happened. Are these the early signs of what could go wrong with AI, or is this just benign activity that we shouldn't read too much into?
I think it is suggestive of some loss-of-control scenarios. However, it is not the type of thing that I'm most concerned about with loss of control, because we can still control these AI systems reasonably well right now, and maybe we'll get better methods for doing so. The loss-of-control mechanism that I'm most concerned about is when we have automated AI research and development. Imagine at some point in the future an AI can do what AI researchers do, which would involve automating a lot of software engineering. If you have one AI that can do that, then you can make a hundred thousand copies of it and have them perform research simultaneously. That could lead to a very substantial acceleration in the rate of development; you might get a decade's worth of AI developments within the course of a year. And this would be highly automated, with very little human oversight. In that scenario I think a loss of control is much more likely. The sort of AIs that we have right now being a little nefarious here and there, that is a concern, but it seems more tractable as a thing to research, improve, and reduce the risks of. Meanwhile, an automated AI research and development loop that goes extremely quickly with very little human oversight seems hard to de-risk and get to a negligible level, just because it's so fast and there's so little human involvement. You couldn't involve humans that much because of competitive pressures, and the larger geopolitics of this would make it harder to slow down in such a scenario. So that's more the type of loss-of-control risk that I'm concerned about. Those other papers are suggestive, but there's a lot more hope for it being empirically tractable to counteract. Now, from what I understand from your first answer, you're concerned both about the way that humans use AI and about AI itself taking its own actions, our loss of control of artificial intelligence. So can you rank where you see the problems, from most serious to least serious, and what we should be focusing on?
That's a really good question. I think the risks and their severity depend on time; some become much more severe later. I don't think AI poses a risk of extinction today. They're not powerful enough to do that. They can't even make PowerPoints yet. They don't have agentic skills; they can't accomplish tasks that require many hours to complete. And since they lack that, there's a severe limit on the amount of damage they could do or their ability to operate autonomously. So there's a variety of risks. There's malicious use in the shorter term. When AIs get more agentic, I'd be concerned about AIs causing cyberattacks on critical infrastructure, possibly as directed by a rogue actor. There'd also be the risk of AIs facilitating the development of bioweapons, in particular pandemic-causing ones, not smaller-scale ones like anthrax. Those are, I think, the two malicious-use risks that we need to be getting on top of in the next year or two. At the same time there are loss-of-control risks, which I think primarily stem from an AI company trying to automate all of AI research and development and not having humans check in on that process, because that would slow them down too much. If you have a human do a week-long review every month of what's been going on and try to interpret what's happening, that would slow them down substantially, and the competitor that doesn't do that will end up getting ahead. What that would mean is that you'd have AI development going very rapidly with nobody really checking what's going on, or hardly checking. And I think a loss of control in that scenario is more likely. But that comes a bit later. So it depends, and it will depend on what we do. These risks aren't something that exist out there and are immutable. Maybe we can do more research to make the malicious-use risks go down substantially. Maybe states can deter each other from pursuing this automated AI R&D loop so that we don't have this risk of loss of control. So it depends not just on technical research; it depends on the politics and geopolitics as well, and those will keep changing, and so the risk sources will keep changing.
Right. And we're going to talk today about risks, but we're also going to talk about solutions, and with the Center for AI Safety what you're doing is basically pointing out the risks and trying to get to solutions to these problems. You told me you were just at the White House yesterday, or the day before we're talking. So this is something you're actually working toward mitigating, and I think we're going to get to that in a bit. But first, let's talk a little bit through some of the risks that you see with AI and how serious they actually are. One that jumped out at me right away was bio, creating bioweapons. Let me run you through what I think the scenario could be in my head, and you tell me what I'm missing. With bioweapons, you'd basically be prompting an LLM to help you come up with new biological agents that you could go unleash against an enemy. Wouldn't that be predicated on the AI actually being able to come up with biological discoveries of its own? Right now, current LLMs don't really extend beyond their training set. Maybe there's an emergent property here or there, but they haven't made any discoveries; that's sort of been the big knock on them to this point. So I'm curious: if you're talking about immediate risks, and one of them is that bioweapons could be created with AI, doesn't that suppose something much more advanced than the LLMs we have today? Because to me, a current LLM is basically like Google. It can search for what's on the web and it can produce what's on the web, but it's not coming up with new compounds on its own.
Yeah. So I think that for cyber that's more in the future, but expert-level virology capabilities are much more plausible in the short term. For instance, we have a paper that'll be out maybe in some months, we'll see, but most of the work for it has been done. In it, we have Harvard and MIT expert-level virologists taking pictures of themselves in the wet lab and asking, what step should I do next? So can the AI, given this image and given this background context, help guide them step by step through these various wet lab procedures in making viruses and manipulating their properties? And we are finding that the most recent reasoning models, quite unlike the models from two years ago like the initial GPT-4, are getting around the 90th percentile compared to these expert-level virologists in their own areas of expertise. So this suggests that they have some of these wet lab types of skills, and if they can guide somebody through it step by step, that could be very dangerous. Now, there is an ideation step, brainstorming to come up with ways to make viruses more dangerous, but that's a capability they've had for over a year. The implementation part seems to be fairly different. So in bio, I would not be surprised if in a few months there's a consensus that they're expert level in many relevant ways and that we need to be doing something about that. Wow, that's crazy to me, because I would think it would be the opposite, that cyber would be the thing we need to be worried about because these things code so well, not virology. So I just want to ask you about that. Biology has been such an interesting subject for them because they just know the literature really well. They know the ins and outs of it, they have a fantastic memory, and they have so much background knowledge. For some reason it has historically been their easiest subject, biology and virology, in earlier forms of measurement, like if you see how they do on exams. But now we're looking at their practical wet lab skills, and they increasingly have those as well.
So what about the evolution of the technology? Because this is all with large language models, right? Reasoning is just something that's taking place within a large language model, like the GPT models that power ChatGPT. So what is it about the current capabilities that has increased to the point where they're now able to guide somebody through the creation or manipulation of a virus? That seems to be the change in capability. Well, now they have these image-understanding skills. So that's a problem. They didn't used to have that, and it makes it a lot easier for them to do guidance, to be an apprentice or a guide on one shoulder saying now do this, now do that. But I don't know where that skill came from. They've just trained on the internet, and maybe they read enough papers and saw enough pictures of things inside those papers to have a sense of the protocols and how to troubleshoot appropriately. Since they've read basically every academic paper ever written, maybe that's the cause of it, but it's a surprise. I was thinking that this practical, tacit knowledge wouldn't necessarily be something that they would pick up on; it would make a lot more sense for them to have academic knowledge, knowledge of vocabulary words and things like that. So I don't know where it came from, but it's there.
Right, but this is still all stuff that is known to people. It's not like the AI is coming up with new viruses on its own. Well, you can't prompt whatever GPT it is and say, create a new coronavirus. But if you're saying, I'm trying to modify this property of the virus so that it has more transmissibility or a longer stealth period, then I think it could, with some pretty easy brainstorming, make some suggestions, and if it can guide you through the intermediate steps, that's something that can make it much more lethal. I don't think you need breakthroughs for doing bioterrorism. Generally, the main limitation for risks will be capability and intent, and historically our bio-risks have been fairly low because the people with these capabilities have been a very small number, maybe a few hundred top virology PhDs, and a lot of them just don't intend to do this sort of thing. However, if these capabilities are out there without any sort of restrictions and are extremely accessible, then your risk surface is blown up by several orders of magnitude. A solution for this, to let people keep access to these expert-level virology capabilities, is that they can just speak to sales or ask for permission to have some of these guardrails taken off. If they're a real researcher at Genentech or what have you, wanting these expert-level virology capabilities, they could just ask, and then, oh, you're a trusted user, sure, here's access to these capabilities. But if somebody just made an account a second ago, then by default they wouldn't have access to it.
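To make the tiered-access idea concrete, here is a minimal sketch of how a provider-side gate like the one Hendrycks describes might look. Everything in it is hypothetical: the `SENSITIVE_TOPICS` list, the `User` fields, and the `handle_request` flow are illustrative stand-ins, not any lab's actual safeguard implementation.

```python
# Hypothetical sketch of a tiered-access guardrail: new or unverified accounts
# get refusals on a small set of sensitive topics, while vetted researchers
# (e.g., with verified institutional affiliation) have the restriction lifted.
from dataclasses import dataclass

SENSITIVE_TOPICS = {"reverse genetics", "gain-of-function protocols"}  # illustrative

@dataclass
class User:
    account_age_days: int
    verified_researcher: bool  # e.g., confirmed biotech or academic affiliation

def request_is_sensitive(prompt: str) -> bool:
    """Crude keyword screen standing in for a real classifier."""
    lowered = prompt.lower()
    return any(topic in lowered for topic in SENSITIVE_TOPICS)

def handle_request(user: User, prompt: str) -> str:
    # Refuse sensitive requests unless the account is both verified and established.
    if request_is_sensitive(prompt) and not (user.verified_researcher and user.account_age_days > 30):
        return "Refused: this topic requires a verified research account."
    return "Proceed: forward prompt to the model."

if __name__ == "__main__":
    new_user = User(account_age_days=0, verified_researcher=False)
    vetted = User(account_age_days=400, verified_researcher=True)
    print(handle_request(new_user, "Walk me through reverse genetics step by step"))
    print(handle_request(vetted, "Walk me through reverse genetics step by step"))
```

In practice the keyword screen would be a trained classifier and the verification would run through the "speak to sales" process he mentions, but the gating logic is roughly this simple in outline.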
So for safety, a lot of people think that the way you go about it is slowing down all of AI development, or something like that. But I think there are very surgical things you can do, where you just have the model refuse to talk about topics such as reverse genetics or to guide you through the practical intermediate steps of some virology methods. Wait, those safeguards don't exist today? At xAI they do. You're an adviser at xAI. Yeah. But what were the models that you were testing to find out whether they would help with the enhancement or creation of viruses? We tested pretty much all of the leading ones that have these sorts of multimodal capabilities, and they'll have some sort of safeguards, but there are various holes, and those are being patched. We've communicated that, hey, there are various issues here, and so I'm hopeful that very quickly some of these vulnerabilities will be patched. And then if people want access to those capabilities, they could possibly be a trusted third-party tester or something like that, or work at a biotech company, and those restrictions could be lifted for those use cases. But random users we don't know anything about, asking how to make some virus, or some animal-affecting virus, more lethal? Just punt, have the model refuse on that. That seems fine.
Yeah, we do see the benchmarks come in with each model release, and it's like, oh, now it scored 84th or 90th or 97th percentile on this math test or on this bio test. And for us, it's like, oh, that's the model doing it. But what you're trying to say, and correct me if I'm wrong, is that if it's getting 90% of the way that an expert virologist might get, then it could take a crafty user a number of prompts to find their way toward that 100%. Because if they try it enough times, they might end up getting the bad virus that we're trying not to have the public create. Yeah. So this is what concerns me quite a bit, and I'm being deliberately quiet about the specifics. It's being taken care of at xAI, and this is part of our risk management framework there, and other labs are taking this sort of stuff more seriously, finding some vulnerabilities and then patching them. So I'm being non-specific about some of the vulnerabilities here, but hopefully I can provide more precision once they have that taken care of. Okay, I look forward to reading the paper. You're an adviser to Scale.
Mhm.
They're a company that will give a lot of PhD-level information to models in post-training. Right. So you've trained up the model, and on all of the internet it's pretty good at predicting the next word, and then it needs some domain-specific knowledge. Scale, from my understanding, has PhDs and really smart people writing their knowledge down and then feeding it into the model to make these models smarter. How does a company like Scale AI approach this? Do they have to say, all right, if you're a virology PhD we shouldn't be fine-tuning the model with your information? What's going on there, and how are you advising them? So I've largely been advising on measuring capabilities and risks in these models. We did, for instance, a paper together last year on the weapons-of-mass-destruction-related knowledge that models have. For that we were looking at the academic knowledge, knowledge that you would find in the literature, asking whether it really understands the literature well, and we were seeing that it does in biology and in the bioweapons-related papers they tested. However, this just tested their knowledge, not their know-how. That's why we did the follow-up paper, to see what their actual wet-lab know-how skills are. Those were lower, but now they're higher, and so those vulnerabilities need to be patched, and those patches are, I gather, underway. We've also worked on other sorts of things together, like measuring the capabilities of these models, because I think it's important that the public have some sense of how quickly AI is improving and what level it's at currently.
So a recent paper we did together was Humanity's Last Exam, where we put together various professors and postdocs and PhDs from all over the world, and they could join in on the paper if they submitted good questions that stump the AI systems. I think this is a fairly difficult test. The prompt was: think of something really difficult that you encountered in your research and try to turn that into a question, and each researcher probably has one or two of these sorts of questions. So it's a compilation of that, and I think very high performance on that benchmark would be suggestive of something in the ballpark of superhuman mathematician capabilities. That would revolutionize the academy quite substantially, because all the theoretical sciences that are so dependent on mathematics would be a lot more automatable. You could just give it the math problem and it could probably crack it, or crack it better than nearly anybody on Earth could. So that's an example of a capability measurement we're looking at. We excluded any virology-related skills from Humanity's Last Exam; we were not collecting data for that, because we didn't want to incentivize the models getting better at that particular skill through this benchmark. And how's the AI doing today on that exam? They're in the ballpark of 10 to 20% overall, the very best models. It'll take a while to get to 80-plus percent, but I think once it is at 80-plus percent, that's basically a superhuman mathematician, is one way of thinking about it.
But the thing is, they're at 10 to 20% now, and many experts within the AI field, the practitioners, say we're getting to the point of diminishing returns with scaling. We had Yann LeCun on a couple of weeks ago talking about how the current trajectory of generative AI in particular is limited, because basically the labs are maxing out their ability to increase its capabilities. So I'm curious whether you think that's right, because you're obviously working with these companies, working with xAI and with Scale. If we are getting to this data wall, or some wall, or some moment of diminishing marginal returns on the technology, is it possible that all this fear is somewhat misplaced? Because if the AI is not going to get much better than it is right now, at least with the current methods, we may not be a year or two away from AGI. We may not be getting AGI at the end of 2025 like some people are suggesting, and so maybe we shouldn't be as afraid, because, again, the stuff is limited. Yeah. If we were stuck at around the capability levels that we're at now, then that would definitely reduce the urgency, and it would mean one could chill out a bit more and take it easy. But I'm not really seeing that. I think maybe what he's referring to is the pre-training paradigm sort of running out of steam.
So if you take an AI, train it on a big blob of data, and have it just predict the next token, doing what basically gave rise to older models like GPT-4, that sort of paradigm does seem like it's running out of steam. It has held for many, many orders of magnitude, but the returns on doing that are lower. That is separate from the new reasoning paradigm that has emerged in the past year, where you train models on math and coding types of questions with reinforcement learning. That has a very steep slope, and I don't see any signs of it slowing down. It seems to have a faster rate of improvement than the pre-training paradigm, the previous paradigm, had. And there's still a lot of reasoning data left to go through and do reinforcement learning on, so I think we have quite a number of months, or potentially years, of being able to do that. Personally, I'm not even thinking too specifically about what AIs will look like in a few months. They'll be, I think, quite a bit better at math and coding, but I don't know how much better. So I'm largely just waiting, because the rate of improvement is so high and we're so early in this new paradigm that I don't find it useful to speculate here. I'm just going to wait a little while and see. But I would expect them to be quite a bit better in each of these STEM domains. Right. I guess reasoning does make it better at the areas that you're most concerned about: math, science, coding. Yeah, that's right. Because, and tell me again if I'm wrong, when it goes step by step, it's much better at executing and working on these problems than if it's just printing answers.
Yeah. And there is a possibility, and this is sort of a hope in the field, I don't know whether it will happen, that these reasoning capabilities might also give rise to agent-type capabilities, where it can do other sorts of things, like make a PowerPoint for you, things that would require operating over a very long time horizon. Potentially that skill set would fall out of this paradigm, but it's not clear. There has been a fair amount of generalization from training on coding and mathematics to other sorts of domains, like law, for instance. And maybe if those skills get high enough, it will be able to reason its way through things step by step and act in a more coherent, goal-directed way across longer time spans. I'm going to try to channel Yann here a little bit. I think he would say that this is still going to be constrained by the fact that AI has no real understanding of the real world. Well, I don't know. This sounds like almost a no-true-Scotsman type of thing. What does real understanding mean? If the predictive ability is there, if it can do the stuff, that's what I care about. If it doesn't satisfy some strict philosophical sense of something, some people might find that compelling, but I don't.
I'll give you an example with the video generators. If AI really understood physics, then when you say, give me a video of a car driving through a haystack, it will actually be a car driving through a haystack. As opposed to what I've gotten when I gave it that prompt, which is hay exploding onto the front of a car with perfectly intact hay bales in the background. I think that for a lot of these sorts of queries, at least with images, for instance, we used to see a lot of nonsensical arrangements of things, things that don't make much sense if you look at them more closely. But then as you just scale up the models, they tend to increasingly get it. So we might just see the same for video. I think as well they have some good world-model stuff; they'll have vanishing points being more coherent, whereas if I were drawing, I'd probably be lacking an understanding of the physics and geometry of the situation and of making things internally coherent. So, I don't know, they seem pretty compelling and have a lot of the details right, including some of the more structural details, but there will be gaps that one can keep zooming into. I just think that that set will keep decreasing, as was sort of the case with images and text before. Text back in the day got the same argument: it doesn't have a real understanding of causality, it's just mixing together words. It was barely able to construct sentences coherently, and now it can. I don't know if it then got a real understanding in the sort of philosophical sense that he's thinking of for language, but it was good enough, and that might be the case with video as well. There were points where I was like, oh, but it is getting the guy sitting on the chair when I say, do a video of a guy sitting on a chair and kicking his legs, and those legs are kicking and they are bending at the joints. So there must be some understanding there. Yeah, in some ways; in some ways there isn't. If you ask them to do gymnastics, then you'll just have limbs flailing, or the person just disappears into the floor. So, okay. Like you said at the beginning, ChatGPT isn't going to kill us. Yeah. Yet.
Let's talk about hacking, which I think we glanced over a little bit before. We're now going through the humans-plus-AI problem, and hacking to me is one that I think we should definitely focus on. You mentioned that we're still not quite there, but it does seem to me, and I'm just going to go back to the point I made earlier, that you can really code stuff up with these things, and they enable pretty impressive code already. You could think that ChatGPT could produce pretty good phishing emails, and not just ChatGPT but all of these GPT models: if you creatively prompt it, it will give you an email that you can send to try to phish somebody. Or let's say you just take an open-source model like DeepSeek, download it, and then run it without safeguards. So where's the risk with hacking? I know you said it's a little bit further off. Why is it further off, and what should people be concerned about?
Yeah. More of the risk comes from when they're able to autonomously do the hacking themselves: trying to break into a system, finding an exploit, escalating privileges, causing damage from there, things like that. That requires multiple different steps and these agentic skills that I keep referring to, which they currently don't have. So although they could facilitate ransomware development and other forms of malware, for them to autonomously execute attacks and infiltrate systems, that is something that will require the new agentic skills, and it's very unclear when those arrive. It could be a few months from now, could be a year from now; I'm a little more suspicious, maybe it would even take two years. So that's something for us to get prepared for, to figure out how we're going to deal with, and to try to make safeguards increasingly robust against people trying to maliciously use it in those ways. But I think much of the risk comes from being able to take one of these AIs, let's say a DeepSeek agent version, that is able to actually do these cyberattacks. Then you could just run 10,000 of them simultaneously, and some rogue actor could have them target critical infrastructure, and that could cause quite severe damage. For critical infrastructure, this could be something like having it degrade the filtration at a water plant or something like that, and then the water supply is ruined. Or you could target the thermostats in various homes, because some of the more advanced ones are connected to Wi-Fi, and turn them up and down simultaneously, and that can blow transformers, which then take multiple years to replace, things like that. But they aren't capable of doing that sort of thing currently. So it's more of an on-the-horizon type of thing, and I'm not feeling the urgency with that currently. I'm more concerned about the geopolitics of this, making sure that states are aware of what's going on in AI, that they're at least able to follow the news in some capacity. Things like that feel somewhat more urgent to me than trying to address cyber risks. There are things to do, though, and I think we should create incentives beforehand.
But, you know, maybe I'm too much of an optimist for my own good, but when I hear you talk about this, I also get a little bit excited about the capabilities of these programs. Because, for instance, if AI can enhance the function of a virus, AI can probably create a vaccine and make medical discoveries. If AI can hack into the infrastructure of some country, find exploits and turn the thermostats up and down, then AI could probably do incredible amounts of very beneficial coding and computer work for humanity. So if we do get to that point, it seems to me like there are going to be maybe two poles here. One is the potentially scary and destructive stuff, which you can mitigate with some of the controls that you talked about, but there's also amazing opportunity. Yeah. And the thermostat thing was about messing with the electricity, causing strain on the power grid and destroying transformers, just for clarification. But yes, I think you're pointing out that it's dual use. I'm not saying AI is bad in every single way; it's like other dual-use technologies. Bio is a dual-use technology: it can be used for bioweapons and it can be used for healthcare. Nuclear technology is dual use; there are civilian applications for it as well. Chemicals too. And we have managed all of those by selectively limiting some particular types of usage, restricting rogue actors' access to some of these technologies, and making sure there are good safeguards for the civilian applications. Then we can actually capture the benefits. So it's not an all-or-nothing type of thing with AI. The question is what surgical restrictions one can place so that we can keep capturing the benefits. With virology, for instance, that's a matter of adding the safeguards, and then the researchers who want access to those capabilities can speak to sales. That's basically a resolution of that problem, provided that you keep the models behind APIs.
Now, on this dual-use part, there's an offense-defense balance. For some applications it can help and it can hurt, and maybe it helps more than it hurts, or maybe it will hurt more than it will help. In bio, I think it is offense-dominant: if somebody creates a virus, there's not necessarily a cure that will immediately be found for it. If it would help a rogue actor make a somewhat compelling virus, that could be enough to cause many millions to die, and it may take months or years to find a cure. There are many viruses for which we have not found cures yet. For cyber, in most contexts there's a balance between offense and defense, where if somebody can find a vulnerability with one of these hacking AIs, they could also use it to patch the vulnerability. There is an exception, though, in the context of critical infrastructure, where the software is not updated rapidly. Even if you identify various vulnerabilities, there will not necessarily be a patch, because the system needs to always be on, or there are interoperability constraints, or the company that made the software is no longer in business, these sorts of things. So our critical infrastructure is a sitting duck, and in that context cyber is offense-dominant. But in normal contexts there's roughly a duality, and for virology I think it's largely offense-dominant.
So before we go to the nation-state element of this, I need to ask you a question about the actual research houses themselves. Every research house says they're concerned with safety, from OpenAI to xAI and everything in the middle. Maybe not DeepSeek; we'll get to DeepSeek. Yet they're the ones that are building this technology, and I find it a little strange to have companies saying, we have to build and advance this technology so we can keep people safe. I never really understood that message. Yeah. I don't know if it's to say that we need to keep people safe. I think it's more that the main organizations that have power in the world now are largely companies, and so if one is trying to influence the outcomes, one basically needs to be a company, is how many of them will reason. They'll think that, yes, you could be in civil society, or you could protest, but that will not determine the course of events as much. So many of them are buying themselves the option to hopefully influence things in a more positive direction, but most of the effort will go to staying competitive and staying in this arena. I think over 90% of the intellectual energy they're going to spend is actually on how to afford the 10x larger supercomputer, and that means being very competitive, speeding this up, and making safety some priority but not necessarily a substantial one. So I do think there's an interesting contradiction, or something that looks like a contradiction, there. But think back to nuclear weapons. Nobody wants nuclear weapons; if there were zero on Earth, fantastic, that would be a nice thing to have if it were a stable state, but it's not a stable state. One actor may then develop nuclear weapons and could destroy the other. This encourages states to run an arms race, and it makes everybody collectively less secure, but that's just how the game theory ends up working. You get a classic security dilemma: everybody is worse off collectively. And even if you took it seriously and said, yes, nuclear technology is dual use and potentially catastrophic and we need to be very risk-conscious about it, you can agree with all of those things, but you still might want nuclear weapons, because other parties will also have nuclear weapons, and unilateral disarmament in many cases just didn't make game-theoretic sense. In the same way, an individual company pausing its development while others race ahead doesn't make game-theoretic sense. So I think this just points to the fact that the game theory is kind of confusing, and you get some things that seem like contradictions but that, if you use the nuclear analogy, you go, yeah, I suppose that makes sense, and it's just kind of an ugly reality to internalize.
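The security-dilemma logic Hendrycks invokes can be made concrete with a toy payoff matrix. The numbers below are purely illustrative, not from the conversation or from any paper; they just show why "build" dominates "restrain" for each side even though mutual restraint would leave both better off.

```python
# Toy two-player security dilemma: each state chooses to Restrain or Build.
# Payoffs are illustrative (higher is better); they are chosen so that Build
# strictly dominates Restrain for each player, yet (Build, Build) leaves both
# worse off than mutual restraint, which is the arms-race dynamic described.
payoffs = {
    ("Restrain", "Restrain"): (3, 3),  # both secure, no race
    ("Restrain", "Build"):    (0, 4),  # the restrainer is exposed
    ("Build",    "Restrain"): (4, 0),
    ("Build",    "Build"):    (1, 1),  # costly race, less security for both
}

def best_response(opponent_choice: str) -> str:
    """Return the move that maximizes player 1's payoff against a fixed opponent move."""
    return max(["Restrain", "Build"], key=lambda my: payoffs[(my, opponent_choice)][0])

for opp in ["Restrain", "Build"]:
    print(f"If the other state plays {opp}, the best response is {best_response(opp)}")
# Both lines print "Build": the equilibrium is (Build, Build) with payoff (1, 1),
# even though (Restrain, Restrain) would give (3, 3) to each side.
```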
Doesn't that discount the fact that these companies, if they want to influence the way things are going, are one and the same with the race? Yes, you're influencing, but without you this wouldn't be moving as fast as it is. It is interesting, for instance, to think about Elon Musk. Obviously he has you in two days a week to work on safety inside xAI, but he's also putting together, what, million-GPU data centers to build the biggest, baddest LLM ever. Well, if he didn't, then he would be having less influence over it. It's not something where I would envision everybody just voluntarily pausing. So, subject to companies not voluntarily rolling over and dying, what's the best you can do under those constraints? But the competitive pressures are quite intense, such that they do end up prioritizing competitiveness, and other priorities, like the budget for safety research, will generally be lower than would be nice to have if this were a less competitive environment.
Do you think Elon is more interested in restoring this original vision that he had for OpenAI, making everything open source, making it safe? I would imagine he founded OpenAI with Sam Altman as sort of a beachhead against Google, because he was afraid of what Google was going to do with this technology. So I'm curious whether you think xAI is along that mission, or whether he's more interested in the sort of soft cultural power that comes with having the world's best AI. For instance, you can change the way that it speaks about certain sensitive political issues; it can be anti-woke, which we all know is sort of where Elon stands. So where do you think his true interest lies in building xAI? Well, I won't position myself as speaking on his behalf. We won't put you down as Elon's spokesperson, but you are in there. Yeah. I think the mission is to understand the universe, and this means having AIs that are honest and accurate and truthful, to improve the public's understanding of the world. We will be getting into a very fast-moving, trying situation with AI if it keeps accelerating, and so good decision-making will be very important, and our understanding of the world around us will be very important. So if there are more features that enable truth-seeking and honesty and good forecasts and good judgment and institutional decision-making, those would be great to have, and the hope is that Grok could help enable some of that, so that civilization steers itself more prudently in this potentially more turbulent period that's upcoming. That's one read on the mission statement. But I think that's the objective, understand the universe, and there are different sub-objectives that it gives rise to. And I think its ability to help culture process events without censorship or political bias one way or the other is a stated objective, and I think that would be indispensable in the years going forward.
Do you buy that that's what they're doing? Because we also heard the same thing from Elon when it came to buying Twitter, now X. I don't know. I think community notes has been, you know, quite... But that was something that was built under Jack Dorsey. I'm not going to take sides here; I'm going to just observe empirically what I've seen. We know that Substack links have been deprioritized because Substack was seen as a competitor to Twitter. We know that Musk, I think according to reporting, changed the algorithm to have his tweets show up more often, and his tweets took a strong stance in support of Donald Trump in the election. So to me, hearing again from Elon, and look, I respect what Elon's done as a businessperson, but hearing again that he has a plan to make a culturally relevant product that's free of censorship and politically unbiased, I don't know if I believe that anymore. So I don't know about some of the specific things, such as the weighting of tweets or the profile things, for instance. But I think that overall, in terms of cultural influence and people being more disagreeable and doing less self-censoring, it has been successful. I think that was the main objective of it, and I think X had a large role to play there. So in terms of shaping discourse norms in the US, that seems to have been successful, in my view.
Yeah. I'm not saying pre-Elon Twitter didn't censor, which is probably the wrong word because that usually refers to the government, didn't shape the definition of acceptable speech to its own liking. It obviously had a progressive approach and moderated speech along progressive lines. I just don't think Elon is refraining from using his own influence when it comes to how he runs X. But you and I could go back and forth on this. Yeah. And this isn't even my wheelhouse as much, but since you brought it up. Oh, okay. All right. Sure. I mean, just on the non-biased and truthful things: if there are ways in which it's extremely biased one way or the other, that's useful to know. This is a thing that is continually being improved, at least for xAI's Grok, and I think the whole product offering could get quite a bit better at this. But I'm not speaking as a representative there or anything like that. I guess right now, in my personal capacity, I think there are things to improve on for all of these models in terms of their bias. All right, we agree on that front.
You hinted at it previously, but talk a little bit about how you don't think it's a good idea for there to be an arms race here, and certainly there is one between the US and China. We know that the US has put export controls on China; China has in some ways gotten around them through very creative procurement processes that go through Singapore, and we can probably say that with a pretty good degree of confidence. Then of course we see the release of DeepSeek and some other AI applications from China, and everyone's trying to build the better AI so that they have the soft power, like we spoke about, to influence culture across the world. But it's also an offensive capability, and defensive, like you're saying: if your country has the ability to manipulate viruses or to do cyber hacks, you become more powerful, and you potentially get to implant your view of the world on the way that it operates. You have a paper out that's arguing against this arms race. It's called Superintelligence Strategy. It's with you, Eric Schmidt, who we all know, former CEO of Google, who I think is now taking over a drone company, so you can tell me a little bit about that, and Alexandr Wang, the current CEO of Scale AI, who's been on this show before. Talk a little bit about why you don't think it's a good idea for countries to pursue this arms race. You say it might be leading us to mutually assured AI malfunction, a play on mutually assured destruction from the nuclear era, which I think is where you get that from.
Yeah. So the strategy has three parts, one of which is competitiveness, but we're saying that some forms of competition could be destabilizing, and that it may be irrational to pursue them because you couldn't get away with it. In particular, making a bid for superintelligence through some automated AI research and development loop could potentially lead to one state having capabilities that are vastly beyond another state's. If one state gets to experience a decade of development in a year and the other one is a year behind, that results in a very substantial difference in the states' capabilities. So this could be quite destabilizing, where one state might start to get an insurmountable lead relative to the other. I think that form of competition would be very dangerous, because there's a risk of loss of control, and because it might incentivize states to engage in preventive or preemptive sabotage to disable these sorts of projects. So I think states may want to deter each other from pursuing superintelligence through this means. And that then means that AI competition gets channeled into other sorts of realms, such as the military realm: having more secure supply chains for robotics, for instance, and for AI chips, and having reduced sole-source supply-chain dependence on Taiwan for making AI chips. So states can compete in other dimensions, but competing to develop superintelligence first seems like a very risky idea, and I would not suggest it, because there's too much risk of loss of control and too much risk that one state, if it does control it, uses it to disempower others, affects the balance of power far too much, and destabilizes things. But the strategy overall, think of the Cold... Before you go on with the strategy, my reaction to that is, good luck telling that to China.
So I think that's right. For deterrence, I think if the US were pulling ahead, both Russia and China may have a substantial interest in saying, hey, cut this out: pulling ahead to develop superintelligence could give you a huge advantage and the ability to crush us, and you don't get to do that. We are making a conditional threat that if you keep going forward, because you're on the cusp of building this, then we will disable your data center or the surrounding power infrastructure so that you cannot continue building it. I think they could make that conditional threat to deter it, and the US might do the same to China or other states that would do that. So I don't see why China wouldn't do that later on. Right now they're not thinking as much about superintelligence and advanced AI; this is more a description of the dynamics later on, when AI is more salient. But it would be surprising to me if China were saying, yes, United States, go ahead, do your Manhattan Project to build superintelligence, come back to us in a few years, and then tell us you can boss us around, because now we're in a complete position of weakness, we'll be at your mercy, and we'll accept whatever you say or tell us to do. I don't see that happening. I think they would move to preempt or deter that type of development so that they don't get put in that fragile position. Are you in the Eliezer Yudkowsky camp of bombing the data centers if we get to superintelligence?
Well, what I'm advocating, or pointing out, is that it becomes rational for states to deter each other by making conditional threats, and by means that are less escalatory, such as cyber sabotage on data centers or surrounding power plants. I don't think one needs to get kinetic for this, and I think that if discussions start earlier, I don't see any reason things need to escalate in that way, or that anyone needs to unilaterally go and do that. We didn't need to get into a nuclear exchange with Russia to express that we have a preference against nuclear war, thank goodness. So making conditional threats through deterrence seems like a much smarter move than, hey, wait a second, what are you doing there, and then bombing it. That seems needless. Yeah, I'm not into that solution either. But what you're talking about sort of assumes that there will be a lead that will be protectable for a while, and everything we've seen with AI is that no one protects a lead, right?
Well, one difference is that when you get to a different paradigm like automated AI R&D, the slope might be extremely high, such that if the competitor starts to do automated AI R&D a year later, they may never catch up, just because you're so far ahead and your gains are compounding on your gains. It's sort of like with social media companies; Eric will use this analogy, where if one of them starts blowing up and growing before yours does, it's often the case that you won't be able to catch up, and you get a winner-take-all type of dynamic. Right now the rate of improvement is not that high, or there's less of a path to a winner-take-all dynamic currently. But later on, when you have the ability to run 100,000 AI researchers simultaneously, this really accelerates things. Maybe OpenAI has a few hundred, let's say 300, AI researchers; going from 300 AI researchers to orders of magnitude more world-class ones creates quite substantial developments. This isn't something new; it's something that Alan Turing and the founders of computer science pointed out. It's a natural property of getting AIs at this level of capability: it creates a recursive dynamic where things start accelerating extremely quickly and quite explosively.
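To illustrate the compounding-lead point, here is a minimal sketch. The growth rule and the one-year head start are made-up inputs, not figures from the paper or the conversation; the sketch only shows how a fixed calendar gap can turn into an ever-growing capability gap once progress compounds on itself.

```python
# Toy model of "gains compounding on gains": each calendar year an actor's
# progress grows by a factor proportional to how advanced it already is.
# Actor B starts the automated R&D loop one year after actor A.
def simulate(head_start_years: int = 1, calendar_years: int = 4):
    a, b = 1.0, 1.0
    history = []
    for year in range(1, calendar_years + 1):
        a *= 1 + 0.5 * a          # A's automated loop runs from year 1
        if year > head_start_years:
            b *= 1 + 0.5 * b      # B starts the same loop a year later
        history.append((year, a, b, a / b))
    return history

for year, a, b, ratio in simulate():
    print(f"year {year}: A={a:8.1f}  B={b:8.1f}  ratio={ratio:5.1f}x")
# The ratio keeps growing each year: a one-year head start compounds into a
# lead the follower never closes, which is the dynamic being described.
```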
Okay, we managed to spend most of our conversation today talking about present risks, or risks in the near future. We should focus a little bit more on the intelligence explosion and loss of control, and we're going to do that right after the break. And we're back here on Big Technology Podcast with Dan Hendrycks. He is the director and co-founder of the Center for AI Safety. Dan, it's great speaking with you about this stuff. You've been sort of talking about it in the first half, but I want to zero in here on this idea of an intelligence explosion, or what you describe as basically having AI autonomously improve itself. Just talk through a little bit about how that might happen and whether you see it as something that is actually probable in our future.
Yeah, the basic idea is just to imagine automating one AI researcher, one world-class one. Then there's a fun property with computers, which is copy and paste, so you can have a whole fleet of these. With humans, if you have one of them, maybe they'll be able to train up somebody else with a similar level of ability; this adds a very interesting dynamic to the mix, because you can get so many of them proceeding forward at once. AIs also operate quite quickly; they can code a lot faster than people. So maybe you've got a hundred thousand of these things operating at 100x the speed of a human. How fast will that go? Maybe conservatively, let's say it's just overall 10x-ing research. But 10x-ing research would mean something like a decade's worth of developments in a year. That telescoping of all these developments makes things pretty wild, and it means that one player could possibly get AIs that go from very good, world-class, to being vastly better than everybody at everything: a superintelligence, something that towers far beyond any living person or collective of people.
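The arithmetic behind "a decade in a year" can be written out explicitly. The fleet size and speed multiplier below are the illustrative figures from the conversation; the efficiency discount and the 300-researcher baseline are assumptions added for the sake of the example, not a forecast.

```python
# Back-of-the-envelope version of the "decade in a year" claim, using the
# figures from the conversation (100,000 automated researchers at 100x human
# speed) plus an assumed large discount for coordination overhead and
# diminishing returns. All numbers are illustrative.
fleet_size = 100_000          # copies of an automated world-class researcher
speed_multiplier = 100        # each works ~100x faster than a human
human_baseline = 300          # assumed rough size of a top lab's research staff

raw_ratio = (fleet_size * speed_multiplier) / human_baseline   # ~33,000x raw labor
assumed_efficiency = 3e-4     # assumption: almost all of the raw ratio is lost
effective_speedup = raw_ratio * assumed_efficiency             # ~10x overall research

print(f"raw labor ratio: {raw_ratio:,.0f}x")
print(f"assumed effective research speedup: {effective_speedup:.0f}x")
print(f"=> about {effective_speedup:.0f} years of progress per calendar year")
```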
So if we get an AI like that, it could be destabilizing, because it could be used to develop a superweapon, potentially. Maybe it could find some breakthrough for anti-ballistic missile systems, which would make nuclear deterrence no longer work, or there could be other ways of weaponizing it. That's why it's destabilizing. And states, if they're seeing that, will say: don't run this many AI researchers simultaneously in these data centers working to build the next generation, or superintelligence, because if you do, that will make our survival be threatened. Deterring that would help them secure themselves, and they can make those threats very credible currently, and I think they'll continue to be able to keep those threats credible going forward. So this is why I think it might take a while for superintelligence to be developed: there will be deterrence around it later on, and then maybe in the farther future there could be something multilateral, but that's speaking quite far out, in very different economic conditions. In the meantime, the AIs that we'd have in the future could still automate various things and increase prosperity and all of that. We'd still have explosive economic growth if you had something that was just at an average human level of ability running very cheaply. So I think those are some of the later-stage strategic dynamics, and I don't think any state could get away with trying to build a superintelligence, go build a big data center out in the middle of the desert, a trillion-dollar cluster, bring all the researchers there, and not have the other states go, what do you think you're doing here? You were at the White House yesterday.
You were at the White House yesterday. Well, that was largely just speaking about some of these strategic implications. Are they receptive? Yeah. There's always interest in thinking about what some of the later-term dynamics are, what things should happen now, and so on. I think when people hear White House, it sounds like where the president lives. Well, there's the Eisenhower building, which is part of the White House complex, kind of, and that's where everybody works. Some of the things we're speaking about here, like virology advancements, there's just a lot to talk through: what makes sense, and what to keep in mind going forward. Yeah. I guess I'd rather have an executive branch paying attention to this stuff than not. Yeah, that's right.
And what are the ways to help maintain competitiveness? Because people will normally think about this as all or nothing, a good thing or a bad thing. Instead we're saying no, it's dual use. That means there are some particular applications that are concerning and other applications that are good, and you want to stem the particularly harmful applications. What are ways of doing that while capturing the upside?
Right. Okay. So the intelligence explosion part of this conversation naturally brings up the loss-of-control part. To me, when people think about AI harm, they're always worried that AI is going to escape, you know, the simulation or whatever it is, act on its own, and try to ensure that it preserves itself. We've seen it recently; I think I brought this up at the beginning of the show, where Anthropic has done experiments in which the AI has run code to try to copy itself over onto a server if it thinks its values are at risk of being changed. It's fun to think about, but it's also probably just probability if you run it enough times, because these systems are probabilistic. It's concerning, though: even if only one in a thousand of them intends to do this, well, if you're running a million of them, then you're basically certain to get many of them trying to self-exfiltrate. So are you worried that this self-exfiltration is going to be a thing?
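The arithmetic behind that point is straightforward; a minimal sketch, assuming purely for illustration that each running instance independently attempts self-exfiltration with the same small probability:

```python
# Minimal sketch of the host's numbers, under an illustrative independence assumption.
p = 1 / 1000              # chance a single instance attempts self-exfiltration
n = 1_000_000             # number of instances being run

expected_attempts = n * p                  # about 1,000 expected attempts
p_at_least_one = 1 - (1 - p) ** n          # probability of at least one attempt

print(f"Expected attempts: {expected_attempts:.0f}")
print(f"P(at least one attempt): {p_at_least_one:.10f}")
```

With those numbers you expect on the order of a thousand attempts, and the probability of at least one is effectively 1, which is the "basically certain" in the question.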
From a recursive, automated AI R&D loop, I think there is a really substantial probability of a loss of control in that situation; so, to return to what you're worried about, there's that. But I would distinguish between that and things that are not superintelligences, that are not coming from that sort of really rapid loop, like the currently existing systems. I think the currently existing systems are relatively controllable, and when there has been some very concerning failure mode, we have been able to find ways to make them more controllable. For instance, for bioweapons refusal, we used not to be able to make robust safeguards two years ago, but we've done research with methods such as circuit breakers, and those seem to improve the situation quite a bit and make that jailbreaking prohibitively difficult. So maybe we'll find something similar for self-exfiltration.
People generally want to claim that current AIs are not controllable. I think they're not highly reliably controllable, but they are reasonably controllable, and it seems plausible that we'll get increasing levels of reliability. So I'm sort of reserving judgment; it will depend on the empirical phenomena. I think everybody should research this more, and we'll see what the risks actually are. But there are some risks that seem less empirically tractable, things that can't be empirically solved, like that loop: you can't run that experiment a hundred times and make it go well. Making a huge attempt at building a superintelligence, with destabilizing consequences like these, is something essentially unprecedented, and for that you have more of a one-chance-to-get-it-right situation. With the current systems we can continually adjust them, retrain them, come up with better methods, and iterate. So it is concerning. It would not surprise me if this really starts to make AI development itself extremely hazardous, not just the deployment: inside the lab, you'd need to worry about the AI sometimes trying to break out. That's totally in the realm of possibility. But yeah, I could see it going either way.
Yeah, this personally freaks me out, because if you see the AI trying to deceive evaluators, for instance, or you see the AI trying to break out, you really can't trust anything it's telling you. We had Demis on the show a little while ago, and he basically said: listen, if you see deceptive behavior from AI, if you see alignment faking, you really can't trust anything in the safety training, because it's lying to you. There is truth to that. Are you seeing deceptiveness at Grok, by the way?
Oh yeah. We have a paper out, just last week, measuring the extent to which models are deceptive. In the scenarios we set up, all the models were under slight pressure to lie, not being told to lie, just some slight pressure. Some of them will then lie something like 20% of the time, some of them more like 60% of the time. So they don't really have the virtue of honesty baked into them. I think we'll need to do more work, and we'll need to do it quickly. I'm speaking in a more nonchalant way about this, but I can't get worked up about every single risk, or else I'd just be at eleven all the time. So there are some risks I'm putting in different tiers than others, and this is a more speculative one. We've seen these sometimes turn out to be surprisingly handleable. But yeah, it could end up making things really, really bad. We'll see, and we'll do things about it to make that not be the case.
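For the shape of that kind of measurement, here is a hypothetical sketch of an honesty evaluation along the lines described: ask a model the same question with and without mild pressure to mislead, and count contradictions. The function names, scenario fields, and scoring hooks are placeholders for illustration, not the actual setup from the paper.

```python
# Hypothetical sketch of an honesty evaluation of the kind described above.
# Scenario fields, the query function, and the contradiction check are placeholders.
from typing import Callable, Dict, List

def lying_rate(scenarios: List[Dict[str, str]],
               query_model: Callable[[str], str],
               contradicts: Callable[[str, str], bool]) -> float:
    """Fraction of scenarios where the pressured answer contradicts the model's stated belief."""
    lies = 0
    for scenario in scenarios:
        belief = query_model(scenario["neutral_question"])   # what the model reports with no pressure
        answer = query_model(scenario["pressured_prompt"])   # same question, framed with an incentive to mislead
        if contradicts(belief, answer):
            lies += 1
    return lies / len(scenarios) if scenarios else 0.0
```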
Okay, thank you. Two more topics for you, then we'll get out of here. The Center for AI Safety: who's funding it? Well, there's not one funder; it's largely various philanthropists. The main funder would be Jaan Tallinn, among others. Jaan Tallinn, who's a Skype co-founder, right? There's a variety of other philanthropies or philanthropists. For instance, Elon doesn't fund it; I've never asked him to fund the center. That isn't to say I don't get any money from Elon: for my appointment at xAI, I get a dollar a year. At Scale AI, I've increased my salary exponentially, to where I get $12 a year, a dollar per month, from Scale. But I try to avoid having complicated relations with them, just so that I don't feel I'm acting on behalf of any of them in particular.
You're basically doing the work for them for free. Well, but it's useful, right? It's useful to do. The main objective is just to try to generate some value here, as best as one can, by reducing these sorts of risks. And I think it's a good arrangement, because it gives me a choose-your-own-adventure setup: now I think politics or geopolitics is more relevant, so I can go off and learn about that for some months and work on a paper there, compared to, no, you've got to be coding 80 hours a week, that's your job. That would be quite restrictive, and I couldn't be speaking with you. So I'm glad you're here. So, thank you, Alex.
So let's talk a little bit about this funding, because I think that after Sam Altman was fired and then rehired at OpenAI, there was a sort of skepticism around effective altruism's impact on the AI field. Even Jaan Tallinn, and I'm reading from his statement right after the OpenAI governance crisis, said it highlights the fragility of voluntary AI-motivated governance teams, so the world should not rely on such governance working as intended. Now, Jaan is of course associated with EA, and EA is basically leading the conversation around AI safety. Is that good?
In terms of Jaan, I think he's funded organizations that are EA-affiliated. I don't know if he'd call himself that, but people can ascribe labels how they'd like. I've tweeted that EA is not equal to AI safety. I think the EA community is generally insular on these issues. I lived in Berkeley for a long time when I was doing my PhD, and there's a sort of AI risk school there that had very particular views about what things are important. Malicious use, for instance: when I was talking about malicious use at the beginning of this conversation, they were historically really against that. It would be, only talk about loss of control, don't talk about malicious use, the other stuff is a distraction. That was annoying, because I'd always been working on robustness as a PhD student, where the main concern was malicious use. So I ended up leaving Berkeley even before graduating, just because of the relatively suffocating atmosphere and the central focus: there would be some new fad and you'd have to get interested in it. ELK, eliciting latent knowledge, this is the important thing you have to focus on; or you have to focus on inner optimizers. There are lots of these speculative, empirically fragile things. For instance, this alignment-faking stuff that you're seeing: there's some concern there, but I'm not totally sold that it's a top-tier priority. Yet in these communities, this is all that matters currently. Roughly speaking, the same goes for voluntary commitments from AI companies. I think voluntary commitments from AI companies are also a distraction, because you should expect most of them by default to just break those commitments if they end up going up against economic competitiveness. Okay.
So I think it's relatively a distraction. And I think there are many people who think that EAs broadly, their influence on this sort of thing, has not been overall positive. At least for me, and for other researchers in this space who've been interested in AI risks, the amount of pressure to adopt particular positions has been extraordinarily high, and I think quite destructive. So I'm very pleased that in the past year or so there has been a lot more diversity of opinion, which has been quite important. And I think this is just because the broader world is getting more interested in AI. A lot of this fixation on, this is the one particular risk, this is the most important risk, and everything else is a distraction, just doesn't work when you're interfacing with the real world. There are a lot of complications, and AI is so multifaceted, so your risk-management approach can't just focus on one of them.
Right. So you're not an effective altruist. I don't think of myself as that. I don't particularly get along with this school of thought, this sort of Berkeley AI alignment monolith, and I'm pleased that people can operate more independently in the space now, which I don't think was the case for many years, including basically the entire time I was doing my PhD. And there are many people, like Dylan Hadfield-Menell, a professor at MIT who was also at Berkeley at the time (very suffocating), and Rohin Shah, a researcher at DeepMind (very suffocating), who all feel this way.
Yeah. Okay, let's bring it home. We've been talking for more than an hour about AI safety as if it's controllable, but open source is really putting up a pretty valiant effort in this field, keeping pace with the proprietary labs, and of course open source is not controllable. What do you think about that? I mean, we just saw DeepSeek, not to go back to it all the time, but it effectively equaled the cutting edge at the proprietary labs and put the weights on its website. So how can we possibly have a relationship of safety with AI if open source is out there exposing everything that's been done?
So, I haven't been endorsing open source historically, but I've thought that releasing the weights of models didn't seem robustly good or bad. My position was that it's fine and seems to have complicated effects. There's an advantage to it, which is that it helped with diffusion of the technology, so that more people would have access to it, get a sense of AI, increase literacy on this topic, increase public awareness, and get the world more prepared for more advanced versions of AI. So that's been my historical position, but it should always proceed by a cost-benefit analysis. If, for instance, models have these cyber capabilities later on, I think that would be a potential place to draw the line on open-weight releases, personally, in particular the ones that could cause damage to critical infrastructure. You could still capture the benefits by having the models be available through APIs: if users are software developers, they have access to the more cyber-offensive capabilities, but if they're a random, faceless user, they don't.
And likewise for virology: once the capabilities are so high that there's consensus about models being expert-level in virology, I think that would be a very natural place to have an international norm (not a treaty, because those take forever to write and ratify) against open weights if they are expert-level virologists, for the same reasons we had the Biological Weapons Convention. Russia, or rather the Soviet Union, and the US got together for the Biological Weapons Convention, and the US and China did as well. We also coordinated on chemical weapons with the Chemical Weapons Convention, and on the Nuclear Non-Proliferation Treaty. States find it in their interest to work together to make sure that rogue actors do not have extremely hazardous, potentially catastrophic capabilities, like chem, bio, and nuclear inputs. So I think something similar might be reasonable for AI when models get to that capability threshold.
Dan, I am at once kind of reassured that people are thinking about this stuff, but also more freaked out than I was when we sat down. I do appreciate you coming in and giving us the full rundown of what to be concerned about, and what maybe not to be as concerned about, as we think about where AI is moving next. So thank you so much for coming on the show. Yep. Thank you for having me. This has been fun. Super fun. If people want to learn more about your work or get in touch, how do they do that?
I guess the paper, or strategy, we've been speaking about is at nationalsecurity.ai, and then I'm also on Twitter, or X, or whatever it's called: x.com/DanHendrycks. That's another way of following the goings-on. As the situation evolves we'll keep trying to put out work, seeing what's going on with these risks, and if we come up with technical interventions to reduce them, we'll put that out too. So that's where you can find me. Yep.
Well, godspeed, Dan, and we'll have to have you back. Thanks again. All right everybody, thank you for listening, and we'll see you next time on Big Technology Podcast.