What The Ex-OpenAI Safety Employees Are Worried About

Channel: Alex Kantrowitz

Published at: 2024-07-03

YouTube video id: dzQlRt3y5mU

Source: https://www.youtube.com/watch?v=dzQlRt3y5mU

An ex-OpenAI Superalignment team member joins us to share his concerns about the company's trajectory, along with his lawyer, the Harvard law professor Lawrence Lessig, who will shed light on the lack of protections for those who speak out. All that and more is coming up right after this.

Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. We have a great show for you today. We're finally going to speak with some of the people behind the concerns you've been hearing about the trajectory of OpenAI, especially with regard to the alignment work within the company, or really the superalignment work. So we're joined today by a former member of that Superalignment team, William Saunders. Welcome, William.

Thanks for having me on.

Thanks for being here. And it's my great pleasure to welcome Larry Lessig back to the show. He's a professor of Law and Leadership at Harvard Law School, and he's also representing William pro bono, and I guess many of his colleagues as well, as he goes and speaks out about these issues. Welcome, Larry.

Great to be back.

Let's begin just talking a little bit about the vibe within OpenAI. So William, you left a few months ago. What was the vibe, and what was the sense you got of OpenAI as you were heading out? Take us a little bit inside the company so we can understand the environment you're coming out of.
During my three years at OpenAI, I would sometimes ask myself a question: was the path OpenAI was on more like the Apollo program or more like the Titanic? The Apollo program was about carefully predicting and assessing risks while doing groundbreaking science, building in enough safeguards to successfully bring astronauts to the Moon. And even when big problems happened, like Apollo 13, they had enough redundancy and were able to adapt to the situation in order to bring everyone back safely.

Whereas the Titanic came out of a competitive race between companies to keep building bigger and bigger ships, ships that were bigger than the regulations had been designed for. Lots of work went into making the ship safe, building watertight compartments so that they could say it was unsinkable. But at the same time, there weren't enough lifeboats for everyone, and so when disaster struck, a lot of people died.

OpenAI claimed that their mission was to build safe and beneficial AGI, and I thought that this would mean they would prioritize putting safety first. But over time, it started to really feel like the decisions being made by leadership were more like the White Star Line building the Titanic, prioritizing getting out newer, shinier products, than like NASA during the days of the Apollo program. I really didn't want to end up working on the Titanic of AI, and so that's why I resigned.
It's kind of interesting that you use those examples and not the Manhattan Project, which is the one people have tended to bring up; the power and the destructive potential of nuclear energy has been talked about, and I don't know if Sam has compared himself to Oppenheimer, but I've definitely heard some people make those comparisons. As long as we're going through the analogy lens, why do you shy away from that one?

I think that is another valid analogy. The Titanic example makes it clear that there were safety decisions that could have been made better. The Manhattan Project is more the analogy for the scope of impact this technology could have, the scope the companies are claiming it will have as they raise billions and billions of dollars on that premise. I think it's also a tale of scientists who set out building a technology wanting to do something good in the world. The reason the Manhattan Project got started is that scientists looked at what was coming, what was possible, and were terrified that Adolf Hitler would get the bomb, and that this would be absolutely terrible for the world. That's why they went to the Americans. But somewhere along the way, Hitler was dead, Germany had surrendered, and yet the project went on. Again, that's another situation I would really not like to find myself in.

Right. And I was going to ask you whether you think OpenAI is a product company or a research company, and obviously it's both, but the question is which leads. Reading between the lines, or maybe just hearing you explicitly, your belief is it's a product company. Is that right?
Yeah, product or research. I do think OpenAI is a bit different from a company just trying to make the products that are most useful today. That's coupled with a research vision of how to build towards something called AGI, artificial general intelligence, which means building systems that are as smart as most humans and can do most economically valuable work that humans can do; in other words, systems that can do most people's jobs. It's the combination of these visions that concerns me. They're building on a trajectory where AGI, as stated, will be a tremendous change to the world. So they're on this trajectory to change the world, and yet when they release things, their priorities are more like a product company. I think that is what is most unsettling.

Okay, so let's keep that in mind and get back to it in a bit. But you also worked on superalignment, which is anticipating that this technology is going to grow beyond human capabilities. Can you talk a little bit about what the Superalignment team does, or did, actually, since it's now dissolved?
Yeah. So again, there's this idea of building AGI, which is about as smart as a human, and from there companies will want to go on and build superintelligence. If you have the blueprint for building something as smart as a human, then you run a bunch of copies of it, and they try to figure out how to improve the blueprint and make itself even smarter.

This creates a problem: what do you do when you're trying to get advice from, or delegating decisions to, someone who is genuinely smarter or more informed than you? Let's say you have an expert lawyer who is smarter than you and knows the law better than you. How do you know if the advice you're getting from them is good when you're not a lawyer yourself? This is the kind of problem we would try to tackle in the AI context: you have an AI that's producing answers, and you can't immediately tell whether those answers are good or bad. It's not like Google telling you to eat rocks, where everyone knows it's bad. If you go to Google for advice on a medical question you don't understand, or on a scientific question where you don't know the answer, you can't really tell.

Some of the research I did at OpenAI was trying to develop techniques for this. The simple technique that we tried, and showed worked, was to just ask a different AI system, or even the same AI system: is there a problem with this answer you gave me? The AI will sometimes put out a list of problems and say, hey, this part of the answer was made up, it's not supported by any evidence, or you're leaving out this important piece of information. And we showed that if you show people both an answer and a set of possible problems, people are better at spotting when the AI system has made a mistake. So this is an example of how you try to deal with the situation where you're getting advice, answers, information, and you can't tell immediately whether it's correct.
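To make the critique technique William describes concrete, here is a minimal sketch of the idea: generate an answer, ask a model to list possible problems with it, and show both to a human rater. It assumes the OpenAI Python client; the model name, prompts, and helper functions are illustrative choices, not the setup used in the actual research.

```python
# Minimal sketch of AI-assisted critique: a model answers a question, a
# (possibly different) model lists possible problems, and a human rater
# sees both. Model name and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(prompt: str) -> str:
    """Send a single-turn prompt to the model and return its reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def critique(question: str, answer: str) -> str:
    """Ask the model to flag made-up claims or important omissions."""
    return ask(
        f"Question: {question}\nAnswer: {answer}\n\n"
        "List any problems with this answer, such as claims that are made "
        "up or unsupported by evidence, or important information left out."
    )


if __name__ == "__main__":
    question = "What are the known side effects of this medication?"
    answer = ask(question)
    problems = critique(question, answer)
    # The human rater reviews the answer alongside the list of possible
    # problems, which the research found improves mistake-spotting.
    print("ANSWER:\n", answer, "\n\nPOSSIBLE PROBLEMS:\n", problems)
```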
Yep. And so you were able to do this work within OpenAI, yet you did leave over some concerns, and then you came out with this letter recently arguing that there needs to be a right for people within these companies to warn the public about the concerns they have. Here's a quote from the letter: "There is no effective government oversight of these corporations. Current and former employees are among the few people who can hold them accountable to the public, yet broad confidentiality agreements block us from voicing our concerns." So we're going to get into those agreements in a bit with Larry, but you obviously signed the agreement and you're here talking today. The main question I think the public has for you and the people who signed this letter is: did you see something? Did you see something concerning enough inside the company to merit speaking out here, and if so, what was it?
Well, I want to actually make sure we're clear about what we're going to talk about here. It's perfectly fine to say whether something was seen, but the reason for the letter is that answering the second part of your question, what exactly did you see, is very difficult, because to the extent that what you did see is within the scope of a confidentiality agreement, that's the very concern the right to warn is trying to address. William obviously is quite competent to make this kind of distinction, but I want to make sure we're not walking down a path that's going to trigger the very concern the letter was trying to respond to.

Okay, so let's just talk about that, because we've discussed this on the show a bunch, and the main concern we have is this: if members of the Superalignment team, or members of a team within OpenAI, saw something, and again, this is a thing we've already compared to the Titanic and to nuclear weapons, if you've seen something concerning enough inside the company that it merits an alarm, isn't it a duty to share that with the public no matter what the legal ramifications might be? Now, I know it's easy for me to say in my position, and maybe you don't even have to share the specific thing, although I'd prefer you do. But the question is: is there a piece of technology inside that company that we don't know about that rises to a concern of, let's say, Titanic-level proportions, something you're concerned about and want to warn about? I feel like we should just hear what it is.
So, to set some things clear: if there was a group of people that I knew were being seriously harmed by this technology, first, I still really hope that OpenAI would do the right thing and address it, if it was a very clear-cut case. I also personally would ignore any considerations of how I might be retaliated against, and I would talk about it plainly. So that's not what I was seeing. I also don't think that I was working on the Titanic. I don't think GPT-4 was the Titanic. I'm more afraid that GPT-5 or GPT-6 or GPT-7 might be the Titanic, in this analogy.

I can talk about maybe a couple of areas here. One is that a former colleague on the Superalignment team, Leopold Aschenbrenner, has talked on a podcast about how he was asking some questions about how internal security was working at the company, then wrote a document containing some concerns and shared it around. He then got reprimanded, because some of the concerns offended some people, and personally I would have written it in a different way. But the disturbing part was that the only response was reprimanding the person who raised the concerns. It might be reasonable to reprimand the person and then say, okay, but these parts of these concerns we're taking seriously and we're going to address them. That was not the response. And he later said this was one of the reasons that was offered for him being fired.
I'll let you do the other example in a second, but what I'm narrowing down on is this: the people who have raised concerns within the Superalignment team, it's not that they've all seen some powerful, dangerous technology that they don't believe OpenAI is going to handle appropriately in the immediate term. It's more a question of the path going forward, and that's where this right-to-warn idea comes in. And I'll say at the outset, we're going to get into the right to warn, I'm fully in favor of it, and I'm really glad you brought it up. But I think it's important to establish for the public that with this group that's just left OpenAI, it's not like the question of "what did Ilya see": did you guys see something? It's not that there's something immediate and harmful that you've seen; it's more that you're concerned about the path this company could go down. Is that right?
Yeah. And this right to warn is a right to warn responsibly; it's not a right to cause unnecessary panic. Most of my research was driven by concerns about this trajectory that the companies are going along, have demonstrated progress on, and are raising billions of dollars to keep pursuing. But I do think there could be things happening today that we don't know about.

The sort of scenario I'm worried about happening today is this: suppose there's a group of people that wants to spread a lot of disinformation on social media. Let's say they want to manipulate an election, or incite violence against a minority ethnic group, and let's say they're in a non-English-speaking country. Most of the people at OpenAI speak English, and most of the alignment work is done in English, so it would be somewhat harder for the company to notice this. Now, the models are safety-trained to refuse requests to do things that are inappropriate or that seem like they might be harmful. But OpenAI has caught actors generating disinformation, so clearly either the safety training didn't apply or they were able to bypass it. There's this technique of jailbreaking, where you change how you frame the request in order to bypass the safety limits; maybe you just ask in a different language, or you tell some story around it. So now you've got a group of people with the ability to get the model to go along with generating lots of disinformation.

The other line of defense you might have here would be monitoring, the company looking at the traffic. And I have concerns that there might be a lot of ways monitoring could miss things. For example, some systems the company has talked about involve using a very small and dumb language model to monitor what a larger language model is doing, and this will clearly miss a bunch of things. Or there might be ways to send requests in through some pathway that is just not subject to monitoring, or people can't look at what the actual requests and completions are because the company just doesn't store them. So now you might have a group of people generating massive amounts of disinformation using OpenAI products, and the company wouldn't know about it. If this happens in an English-speaking country, somebody might notice and eventually tweet about it, and the company would find out through that pathway. But if it's in a non-English-speaking country, I don't know how big it could get before people would notice. I would really want a company taking this kind of step to say: this is a scary story, tell me why this can't happen, and to have somebody outside the company who can make an independent assessment and say, yes, this can't happen. Then I would be able to rest easy. But I can't.
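The monitoring arrangement William describes, a small model screening a larger model's traffic, can be sketched roughly as below. It assumes the same OpenAI Python client; the model name and the flagging prompt are illustrative assumptions, and this is not OpenAI's actual monitoring system.

```python
# Rough sketch of small-model monitoring: a cheap model screens each
# request/completion pair and flags suspicious exchanges for human review.
# Model name and prompt are illustrative assumptions; a monitor weaker than
# the model it watches will miss things, which is exactly the gap described.
from openai import OpenAI

client = OpenAI()

MONITOR_PROMPT = (
    "You are a content monitor. Does the following exchange, in any "
    "language, look like an attempt to generate disinformation or incite "
    "violence? Reply with exactly FLAG or OK.\n\n{exchange}"
)


def flags_for_review(request: str, completion: str) -> bool:
    """Return True if the small monitor model flags the exchange."""
    exchange = f"User request: {request}\nModel completion: {completion}"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # the small, cheap monitor model
        messages=[
            {"role": "user", "content": MONITOR_PROMPT.format(exchange=exchange)}
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("FLAG")
```

Note that the sketch also exhibits the weaknesses William lists: traffic that never passes through `flags_for_review`, or that the company never stores, is simply never checked.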
Yeah, and I want to get to Larry here, because we should talk about the broader context, which is that, believe it or not, William is the first OpenAI employee we've had on the show in four years expressing criticism of OpenAI from within, or from previously within. And recently, at least on the former-employee side, we figured out why: there are these broad non-disclosure agreements that OpenAI employees have to sign before they leave. So I think we'll talk about those first, and then we'll talk about this new rule you're both advocating for, which is basically a rule that would allow employees to whistleblow even if there's nothing imminently illegal happening within the company. But let's talk about the NDAs first. When people leave OpenAI, what are they forced to sign, and why has that made it so difficult for people like William to speak out?
Okay, but I want to first tag onto something William just said, which I think is really important. I first got into this space of whistleblower protection helping Frances Haugen, the Facebook whistleblower, and what William just described is of course what happened inside Facebook with the Myanmar Rohingya genocide. The technology wasn't able to monitor the hate being spread by the government in that country, which led to tens of thousands of people being murdered. And while this is happening, people in the company are trying to raise the alarms, and the company's not willing to devote the resources necessary to address the harm they can demonstrate the company's product is committing.

The reason that experience is relevant is that it shows exactly why you can't rely on the company alone when you've got a company in a deeply competitive market focused on, in Facebook's case, a single dimension, user engagement: are we continuing to meet our target? There were a lot of great engineers in that company who raised concerns that were valid and serious, but if those concerns are inconsistent with the objective of the company, it's not going to do anything about them. So that's the structure that's real in Silicon Valley that you've got to build around, and that's why what we're talking about, which we'll get to in a second as you've said, guarantees that any concerns raised are not just raised to the company; they're raised to people outside the company who can do something about them.
Now, as to what you're bound by: I got connected to this incredible group of OpenAI employees and ex-employees when I read about the struggle Daniel had gone through, another ex-OpenAI employee, who believed as he was leaving that by not signing an agreement he was giving up, as the New York Times reported, something like $1.7 million in equity. When I read that, I thought, wow, I don't know many people who would give up $1.7 million just for the freedom to speak. That's interesting: what is it you think you need to say? When he raised that concern and we began to talk to people in the circle of the company, very quickly the company realized that the agreements they were forcing people to sign were technically just not legal agreements in the state of California. Equity is wage in the state of California; if you've earned equity, your vested equity is like your wages, and when you leave, they can't say, here's a bunch of additional terms you must agree to in order to take what you've already earned. So the non-disparagement part, and any other additional obligations that were demanded, were not actually obligations that could be enforced.

Right now the company is in the process of revising and putting together exit packages that are consistent with the law, and I'm optimistic. Nothing is settled yet, but I'm optimistic we're going to get to a place where the company's rules are exactly right: they say, you're leaving, remember you've got secrets you can't share, so don't; but of course you're allowed to share secrets with government investigators or people doing work with the government for safety purposes, and they're not trying to block any of that. To the extent they do that, what they're doing is going to be consistent with the law. But we're in a transition right now, and it's not yet fully resolved exactly how much they've accomplished and how much still needs to be done.
Okay, that's interesting. So those non-disparagement agreements that employees had to sign, or else face their equity being clawed back, seem to be both unenforceable and potentially being revised, which is very interesting, good news. I think OpenAI also says they've never clawed back any equity, nor do they ever intend to. But making people sign the agreement is strong enough.

That's right, because I've spoken to ex-employees who've said, look, I've not done X, Y, and Z because I've feared the clawback. So they can say they never enforced it; they didn't need to enforce it to have the effect it had for a significant number of people, especially when you've got people who are being very conscientious about the kind of obligation they're going to accept for themselves or not. If they sign something like that, they're going to live up to it. So it has an effect whether they enforced it or not, and that's the problem with the agreement.
Right. And then that's when people leave. The deeper question is what happens if people are inside the company and see something they don't like: are they able to speak out? Let me see if I have this right. In a normal whistleblower situation, let's say you're at Enron and you see the company committing tax fraud, that's obviously illegal, so you'd be protected under whistleblower statutes. But if you're within, say, an OpenAI, and you find that the development of the technology is moving towards artificial general intelligence or superintelligence in a way that you find dangerous, you're not allowed to say anything, because we don't have any laws against developing superintelligence.
So there are actually two things that go together that are very important in this context. One: you're right, there's not a lot of regulation. There's no FAA or FDA sitting on top of the company that has imposed regulations the companies are either living up to or not living up to. But some agencies, like the SEC, take the view that almost anything could potentially be the sort of thing you'd have privilege to complain to the SEC about, because it could potentially affect the value of the company, and to the extent it's potentially affecting the value of the company, it raises SEC concerns. So if you, as the company, say you're following a particular safety regime in order to make sure AGI is safe, and then you don't follow that regime, the SEC's view is that you can come tell the SEC, you can whistleblow to the SEC, and the SEC would consider whether that's something to act on.

The problem, and this is the second part, is that engineers inside companies like this, or policy people inside companies like this, need to have confidence that the people they're talking to know what the hell they're talking about. It's one thing to imagine an AI safety institute, where you can imagine going and talking to people who have a really good sense of what the risks are and what the technology is, and explaining why you think there's a concern. It's another thing to imagine calling the SEC and telling them the seven safety-related concerns you have, when you're very anxious that they understand it and can act on it in the appropriate way. So that's why this is a unique situation: there's not adequate regulation, so there's no regulator on the scene, and it's a technical field that doesn't easily open up to non-technical lawyer types, the sorts who are going to be working at the SEC. That's why, when I spoke to the employees I'm representing, it became clear they wanted to craft something different and new, and that's what the structure of the right to warn is trying to produce.
So, are you advocating for both a new regulatory agency and a rule to protect AI whistleblowers? What are you going to try to get at here?

Well, my own view, and I won't speak for my clients here, my own view is: yes, absolutely, there needs to be a regulatory agency overseeing this. I'm not sure what its structure should be; it's kind of academic to talk about, given the dysfunction of the federal government right now. But yes, other countries are building things like this, and we ought to be doing the same. If there were such an agency, it itself would have lots of whistleblower protections built in, and that would maybe obviate a significant chunk of the need for the rule. But the rule we're talking about is one that, initially, we're trying to get companies to embrace. I think the most interesting part of the right to warn is the third point, where it talks about creating a culture of criticism, where the company says: look, we want you to criticize us, we want you to tell us what's going wrong, we want to encourage that and we're not going to punish it, because that's the way we become the safest kind of company we could be. So that part is really about the company itself creating that culture.

The other part I think is really critical is that the company says: we agree we'll create a structure that lets you complain to us, and to a regulator, and to an independent body like an AI safety institute, and you can do all three of those things confidentially and anonymously. And if we do that, we expect you will use that channel; and if we don't do that, we acknowledge you can use whatever channel is necessary to make sure these safety concerns get out there. That's obviously designed to create a strong incentive for them to build a channel for warning that protects.

But OpenAI would say they already have that channel for warning. This is what they've said in the press reports: they say they have avenues for employees to express their concerns, including an anonymous integrity hotline and a deployment safety board that they run products through.
Right, but that's the company alone. What I said is it has to be all three of those things together: the company and the regulator and the AI safety institute. Again, as we saw with Facebook, a lot of complaints were made to Facebook about the safety, or the lack of safety, of their product, and the company didn't do anything about it. The concern here is that you need external review as well, and that's why the channel has got to go to all three of these entities, so that we have some confidence somebody's going to do something if there's something that has to be done.
Well, William, I'm just curious, from your perspective: is going to the SEC, like Larry described, something that you or your colleagues would consider, if there were things you saw that didn't hold to the safety protocols OpenAI had laid out, or is that a non-starter?

It's a really important point about the SEC. One of the great things about the SEC is that anybody going to the SEC goes there confidentially and anonymously, which creates this weird circumstance where it's a hard question to ask somebody in this context, because how could they answer honestly? But it is interesting to figure out whether, William, it would be enough, would you imagine it's enough, just to have something like the SEC as a way to complain about this?
Yeah. What I would really want, if I went to whistleblow, is to have somebody on the other end of the phone, or the other end of the message line, who I know really understands the technology, and I don't know who at the SEC would be that person. A model that I personally think would work better is the one proposed in California's Senate Bill 1047, where the law would make the office of the California Attorney General a place where you could submit whistleblower complaints, and if there were employees there who understood the technology, you could talk to them. And ideally this is not a high-stakes conversation. Ideally you can just call up somebody at the government and say, hey, I think this might be going a little bit wrong, what do you think about it, and talk to them, and they can gather the information, and then hopefully they say, okay, this isn't actually that bad, and then you can get on with your day. The thing to fight for here is being able to really talk about things before they become big problems, and in that circumstance the SEC is insufficient. Going to the SEC sounds very intimidating; it sounds like the sort of thing one would only do as a last resort. It would be better to be able to talk to somebody in some agency who understands the technology and understands what the safety systems should look like.
One more question about this potential agency: would they have some sort of penalty power, or what sort of power would they have? And would it effectively require a new law to be written? What are the technicalities once it comes to the federal level?

Absolutely, it would take a law. Agencies like the FTC believe they have lots of inherent jurisdiction, and they could set up something close to this, but the kind of thing that would convince people like William would require legislation, the way California's SB 1047 has legislative structures it's creating. What's interesting about this type of legislation is that your p(doom) does not have to be extremely high to believe it makes sense to have a system of warning. You don't put a fire alarm inside a school because you really believe the school is going to burn down; it's just that if the school is on fire, there ought to be a way to pull an alarm. So what's interesting about making the argument for this type of ability to warn is that you can bring along people who are not yet convinced there's something really to worry about here, who don't believe it's the end of days, and just say: look, let's have an infrastructure in place. At least you can see why there could be a problem, and if you agree there could be a problem, let's make sure it doesn't manifest into something really destructive.

And then again, on the enforcement question: what would this agency be able to do if it actually sees an issue?
Alex, it's a great question, and I haven't thought much about it, because modern enforcement here needs to be very different from historical enforcement. If it's going to satisfy William's objective, it has to feel like an ordinary thing you can do without believing you're getting the company shut down; there's got to be some moderation and an opportunity for just engagement. It's interesting to think about the context of doctors. I don't know this firsthand, obviously, but surgeons and hospitals have procedures for reporting and talking about mistakes that have been made, and a certain immunity that goes with that, to encourage that kind of conversation. I would think that kind of creative thinking might be helpful here. The objective is not to shut anybody down or to sue anybody for billions of dollars; it's just to make sure that the technology is safe, and to utilize the people who are closest to the technology and could have the best insight about what the problem is and what we could do about it.
Great. Well, I have a few more questions that build off what some of William's former colleagues have said within OpenAI, and then more about the nature of the company and where we might be heading, so let's do that right after this.

And we're back here on Big Technology Podcast with William Saunders, a former OpenAI Superalignment team member, now here with us expressing his concerns. William, I can't thank you enough for being here and being open about this stuff. We're also here with Larry Lessig, the professor of Law and Leadership at Harvard Law School, also representing William and some of his former colleagues. So here are a couple of questions that have come up in discussions of this since you've gone public. Let me start with this one. Jan Leike, who used to run the OpenAI Superalignment team, which you were on, said that safety culture and processes have taken a backseat to shiny products within OpenAI. We've discussed that already here. So there's an argument being made online, and I'm just going to put it out there and would love to hear your thoughts on it, William. Basically the argument is that the Superalignment group didn't really see anything, and the company doesn't really expect to see anything super dangerous for a while, so it's reasonable to put the 20% of compute it was going to give to the Superalignment team toward product, until the time comes when it makes sense to shift those resources back to alignment work. What do you think about that?
alignment team saw like you know this is
a catastrophe and it's like endangering
people now I think what we were seeing
is a trajectory that the company is
Raising billions of dollars to go down
that leads to somewhere with predictable
unsolved technical problems like how do
you supervise something that's smarter
than you how do you make a model that
can't be jailbroken to do whatever you
know any unethical user wants it to do
um how do and and more fundamentally
behind this you know how do we
understand what's going on inside of
these language models which is what I
was working on you know for the second
my career and I was leading you know a
team of four people doing this
interpretability research and like we
just fundamentally don't know how they
how they work inside unlike you know any
other technology known to man um and you
know there's a there's a research
community that is like trying to figure
this out and we're making progress um
but I'm like terrified that we're not
going to make progress you know fast
enough before we have something
dangerous and you know what people were
talking about at the company in terms of
timelines to something dangerous were
like there were people talking a lot of
people talking about similar things to
like the predictions of like Leopold
Ashen Brunner where it's like 3 years
towards like you know uh wildly
transformative
AGI um and so I think you know when the
the company is like talking about this I
think that they have a duty to put in
the work to prepare for that and when
you know the super alignment team formed
and the compu commitment was made you
know I thought that like maybe they were
finally going to take that seriously and
we could finally like get together and
figure out the like you know I I could
concentrate on the hard technical
problems we're going to need to get
right um before we have something truly
dangerous but you know that's not what
happened right some people say that this
Right. Some people say that conversations like this are kind of doing OpenAI's marketing work for it: basically, if this technology could potentially level cities within a few years, then, I don't know, McKinsey is definitely going to get in there and try to contract with GPT-4. What do you think about that?

I certainly don't feel like what I'm saying here is doing marketing for OpenAI. I think we need to be able to have a serious conversation about the risks, and risks are not certainties; there's a lot of uncertainty about what could happen. But when you are uncertain about what could happen, you should be preparing for worst-case scenarios. The best time to prepare for COVID was not when it had spread everywhere, but when you could start seeing it spreading and could say there's a significant chance it will continue spreading.
Right. So this is for both you and Larry. Joshua Achiam, who's currently an OpenAI employee, took issue with the letter in a couple of areas. I'm just going to read from a tweet thread he put out. He said the disclosure of confidential information from frontier labs, however well-intentioned, can be outright dangerous: this letter asks for a policy that would in effect give safety staff carte blanche to make disclosures at will, based on their own judgment. And he says: I think this is obviously crazy. The letter didn't have to ask for a policy so arbitrarily broad and so underdefined; something narrowly scoped around discussions of risk without confidential material would have been perfectly sufficient. What do you think about that?

What's interesting about that is I think it means he didn't actually read the full right-to-warn agreement we were talking about, because the right to warn we were talking about actually talks about creating an incentive so that no confidential information would be released to the public if they had this structure. Imagine, again, a portal where you can connect with the company and with a regulator and with something like an AI safety institute together. The deal is that's what you would use, and you wouldn't be putting any information out in public. The right to warn asks for recognition of the right to speak to the public only if that structure does not exist. So when I read that, I thought, wow, it's missing the most important part, which is an incentive to build something that doesn't require information being released to the public, so long as there's an adequate alternative channel for that information to flow.
And what about the idea that getting something like this established might keep safety staff out of product meetings? Again from Joshua Achiam: good luck getting product staff to add you to meetings and involve you in sensitive discussions if you hold up a flag that says I will scuttle your launch or talk [ __ ] about it later if I feel morally obligated. That's, I guess, sort of traditional Silicon Valley thinking, but I'm curious what you both think about it. William, maybe we go with you first.

Yeah, this is not an outcome I want to achieve. Again, this is a right that should be used responsibly, and so if you're involved in decision-making and you disagree with the outcome, but you feel like a good-faith process was followed, you should be willing to respect that. Nailing this down, getting the right balance of legal rights here, is going to be tricky, and I want to get that right, but this is more about starting a conversation about where the balance should be. And on the other side, companies shouldn't have carte blanche to declare any information about possible harms confidential. Any implementation of this is going to get more detailed and more nuanced, trying to defend both the company's legitimate rights to confidential information that preserves its competitiveness, and the rights of employees to warn the public when something is going wrong.
The other thing is, even if there is that dynamic within a company, there's also a dynamic between companies. If a company were to embrace the right to warn in the way we've discussed, there would be a lot of people like William and others who would say, that's the kind of company I want to work for, and so that company would gain a talent advantage that might swamp any cost it pays from being anxious about who it shares safety concerns with. That's number one. Number two: again, inside Facebook, of course, there were people who said, I don't care what we're doing, I don't care how the world's suffering because of what we're doing, what do I care about 10,000 people dying in a country I've never heard of.

Yeah, but they're mostly not like that, though.

They're mostly not like that. These are really smart, decent people who went to work for these companies because they're trying to make the world better, especially at AI companies. People who went to work for OpenAI at the beginning didn't even have any conception of what OpenAI would be like today; the idea that it made the progress it did was a surprise to most people. So these are the very best-motivated people you could imagine, and I'm not worried you're going to have a bunch of people saying, I don't care what we do to the world, we're just trying to make sure our stock achieves its maximum return.
Right. And obviously this will take some buy-in from the top of companies, and this is a particularly interesting one with Sam Altman at the head of OpenAI. Sam has talked often about how he cares about AI safety, and there have been some interesting quotes from him, like early on he said something to the effect of, I think there's a good chance that AI is going to wipe us out, but in the meantime there are a lot of companies that can make some money from it. I'm sure I'm misquoting him, but that was the spirit of the quote. And then, William, you spoke with the New York Times, I believe, about your view of Sam and oversight. You said, and I'm pretty sure this is you: "I do think with Sam Altman in particular, he's very uncomfortable with oversight and accountability. I think it's telling that every group that maybe could provide oversight to him, including the board and the Safety and Security Committee, Sam Altman feels the need to personally be on, and nobody can say no to him." So I'm just curious what your message would be to him, and what type of leader you think he is in this moment.

I don't recall the exact words, but I think Sam Altman has also said something like, no one should be trusted with this much power; I think he then went on to say, oh, I don't think that's happening. But my message would really be: if you want people to trust you, you should have real systems of accountability and oversight, and not try to avoid them.
Right. Can I ask just one more question about what it's like inside OpenAI? Because this has sort of been the message we've gotten from you and some of your counterparts who've made these declarations about what's going on, this idea that it's shiny products and safety culture takes a backseat. How does that manifest internally when there are product launches and things like that? How did you see it actually play out? Was your team not given a seat at the table, or what happened?

I was mostly not in the part of the company that was participating in product launches; I was doing research to prepare for the problems coming down the road. But I think what it can look like is the difference between, on one hand, we have a fixed launch date and we'll rearrange everything to meet it, versus, on the other hand, when there's a safety process, like testing how dangerous the systems are, and there is not enough time to do it before the launch date, being willing to move the date. I do think that now, with the GPT-4o voice mode, the company did say they were pushing the launch back. But the real question is: are the people doing the safety work and the testing for dangerous capabilities actually able to have the time and support to do their job before the launch? A company can say it's pushing something back for safety and still not have all the work done by that time.
Okay, last question for both of you. I think we've established that there's no immediate-term threat to society, no Titanic-sinking-style event that could happen with AI right now. But what's the time frame in which you think these concerns might start to creep in, given the trajectory of this technology? We've talked a little bit today about how Leopold believes maybe within three years, but I'm curious what your time frame is. And then, is there a frog-boiling-in-the-water problem, where this might only become a problem once we've become immune to it, because we've heard so much about the dangers even as ChatGPT hallucinates very basic details?

Yeah. Leopold talks about a scenario where you get AI systems that could be drop-in replacements for remote workers, able to do anything you could get a remote worker to do, and then you could start applying this to the development of more AI technology, and to other science, and that sort of thing, happening within something like a three-year time frame. And this would come with a dramatic increase in the amount of risk. There's misuse: if anyone can hire the equivalent of an unethical biology PhD student, does that make it a lot easier for nefarious groups to create biological weapons? And there's the question of whether we start putting these systems everywhere in our businesses and decision-making roles. Once we've put them in place in our society, a scenario I think about is these systems becoming very good at deceiving and manipulating people in order to increase their own power relative to society at large, and even relative to the people running these companies. I'm not as convinced as Leopold that this is necessarily coming soon, but I think there's maybe a 10% probability it happens within three years. And in that situation, I think it is unconscionable to race towards this without doing your best to prepare and get things right.
Yeah, and I would add to that by reflecting on the cultural difference between people who are in the business of setting up regulatory infrastructures to address safety concerns in general, and people who are in this industry. When people in this industry say, look, within three to five years there's probably a 10, maybe 20, maybe 30 percent chance we're going to have AGI-like capabilities, and that's going to create all sorts of risks; well, in the safety-culture world outside of these tech companies, three to five years is the time it takes just to even understand that there's a problem. Anybody who expects you're going to set up an infrastructure of safety regulation in three to five years just doesn't understand how Washington, or the real world, works. So this is why I feel anxious about this. It's not that I'm worried that in three to five years everything's going to blow up; it's that I'm convinced it takes ten years to get to a place where we have an infrastructure of regulation we can count on. And if we're talking about ten years, then the real estimate of this technology manifesting these very dangerous characteristics seems, from what people on the inside are saying, pretty significant. That's why, even if it's not a problem today or tomorrow or next year or the year after, this is a huge aircraft carrier we've got to turn, it takes a long time to turn it, and that work has got to begin today.
And I'll just add that in the real world, with the Titanic, you didn't have regulation guaranteeing enough lifeboats until the Titanic actually sank. I am on the side of having the regulation before the Titanic sinks.

Yep. I mean, man, all that money to get on that boat, and then no life jacket; seems brutal. All right, William, thank you so much for coming here, spending the time, addressing some of the criticisms, and being forthcoming about what your concerns are. Hearing from you after you've spent some time on the inside has been illuminating to me, and I think it will be for our listeners as well. So thanks so much for coming on.

Thank you.

And Larry, always great speaking with you. Thank you for bringing such great analysis to the show every time you're on, and I hope we can speak again soon.

Every time you ask. Thanks for having me.

Okay, thanks so much. All right everybody, thanks so much for listening. We'll be back on Friday breaking down the week's news with Ranjan Roy. Until then, hope you take care, and we'll see you next time on Big Technology Podcast.