Skills at Scale — Nick Nisi and Zack Proser, WorkOS

Channel: aiDotEngineer

Published at: 2026-05-06

YouTube video id: pFsfax19yOM

Source: https://www.youtube.com/watch?v=pFsfax19yOM

This workshop is going to be skills at
scale. We're super excited to be here
with you today. There's going to be an
interactive component and we also want
you to feel free to interrupt us and ask
questions as we go.
>> Yep. We'll show that slide again with
the QR code and the
instructions to clone the repo. That
repo has the skills that
we're working on plus the slides that
we're presenting. So you'll have all of
that as reference material. I am Nick
Nisi. I am a developer experience
engineer at WorkOS.
>> I'm Zack Proser. I'm also a developer
experience engineer at WorkOS and we're
on the applied AI team.
>> And this is like working with agents.
That's just like what we do now.
Zack, when is the last time you wrote a
line of code by yourself?
>> I think I did a cd into a directory
recently. Otherwise, it's been like
probably six or eight months now.
>> Yeah,
>> maybe longer. Yeah,
>> same. Same. We've been early on from the
the Opus 35 days
>> from copying and pasting back and forth
through GUI to now.
>> Yeah. Uh, it's gotten a lot better and I
don't know, there's a a mythos out there
that it's going to get even better.
We are at WorkOS, and if you are
interested in securing MCP, for
example, or just setting up auth for
this new agentic startup that
you're working on, reach out to us.
There's a number of us here with these
shirts on and we'd be happy to talk.
We're also hiring. So
>> thanks to the AI installer that Nick
built as well, you no longer even have
to configure AuthKit yourself. It
can just do it for you. So pretty easy
to get started
>> for sure.
All right. So, as you know, when you're
working with these uh with these
systems, every single conversation that
you have starts completely from zero. Uh
you're always just like passing in new
information to it. You've got to
reiterate how you do things. And
Claude, for example, never
remembers that it ever talked to you. It
just continues on a conversation. And
so, we have to provide that information
fresh each time.
>> Yep. So, for example, let's say that
you're talking to it in disparate terminal
tabs. You're looking at different code
bases over the course of a week. Every
single time you start talking to it, you
need to reload all of that context first
and say, "This is what I care about.
This is how we do things here. This is
what we're particularly concerned with."
Right. And it ends up eating a ton of
time and slowing you down.
>> Yep. And of course there's things like
AGENTS.md or CLAUDE.md that you can
put information in about the repo or
about how you like to work, like in a
global directory, so that it can
remember that and understand it each
time. You're giving that instruction each
time, kind of like appending it, so
that it will remember: oh, in this project
we actually use Vitest and we use pnpm, so
you should use those each time. So
that's like the way to give it
some memory for it to understand how to
go, but it can still
decide not to follow things that you
have. I've definitely had cases where
I'm like, "Do this, this, and then this,"
and it skips the step in the middle, and
I say, "Why'd you skip that?" And it's
like, "Oh, yeah, you told me to do it. I
didn't feel like it." And that's
how you know it's a real
engineer.
Uh yeah. So, one of the nice things
about skills is that you can think of it
as like a discrete unit of work where
you can encode everything that's super
important to you, everything you don't
want it to miss, everything you don't
want to repeat yourself on. It's almost
like carrying, if you will, the DRY
pattern into the agentic era in a way,
and not repeating yourself. So, and
as we'll see, that becomes incredibly
powerful regardless of if you're a solo
developer working on your own startup
with 12 agents or if you're on a
traditional dev team with 12 team
members.
>> Yeah, it doesn't know what you know. So,
you have to be very specific and be
thorough with what you want it to know
because it's not always going to figure
it out. Sometimes it feels like magic
because it just does, but a lot of times
you have to put in the work to do that.
And that's what those memory
files are, like CLAUDE.md and other
memory files. Claude, for
example, even has its own built-in memory
where it kind of keeps track of things
that it thinks are pertinent to the
way you work or the project that you're
working on and it will save that off.
>> Yep. And you know, of course, this works
too in just a single project context
with some of those files that Nick
mentioned. But then again, the problem
is that you're still tied to that repo.
Uh you need your team members to
remember to pull updates to that
specific project skill if if they want
the context. There's not necessarily
built-in script execution. So, how do
you interleave a
deterministic result when you're having
a non-deterministic conversation with an
LLM? And eventually that starts to get
pretty gross.
>> So,
>> And there's downsides to this,
specifically these memory files, where
they're tied to the repos that you're
working on or you have to put them
globally so it affects everything. And
you can't do things like give it
more smarts, like "execute this script."
You can, but it's kind of not built in,
so to say. And
the things that you put in there
are not always transferable or
portable to other projects that you
might be working on. And so we need
a better way to do that.
And skills are that next step.
>> Yeah, indeed.
>> Uh, it's a way to make things more
portable. Uh, and you can use scripts to
inject real data. Uh, and you can make
them composable. uh so that they can be
very small and very focused on exactly
what you want them to do. That
way they have a
very small footprint in your
context window, but also you can
build them in such a way that they are
only going to be applicable when you
actually want to do whatever that skill
is set up to do. So you're not just
bloating the context with everything
from the start every time like you are
with a CLAUDE.md.
>> Yep.
And so if you haven't seen or heard of a
skill before, just to level set really
quick, it could be as simple as a a
single static markdown file uh without
any scripts at all. Um but let's look at
the difference at what might happen.
Let's imagine that we're roasting a
repo. We're onboarding a new team
member. We want to make sure that you
know they're kind of up to snuff on how
things work here. Without a skill, if
you're just talking to any generic agent
with no specific context injected,
you're going to get, okay, looks pretty
good. It's going to be generic advice.
It might find some low-hanging fruit. If
you instead provide as little as 30 lines of
markdown specific to your use case and
your conventions, your constraints,
you can start to get back
hyperspecific feedback: this is how
we handle routing in this
project; we
follow semantic commits or whatever; and
we've got README drift here and that's
unacceptable, right? So it can take
as little as 30 lines of markdown or
less. And so that's the one of the first
things that makes skills incredibly
powerful is that it's a very minimal
investment on your part and it could be
as simple as a small markdown file and
it becomes a composable unit of work
that you can share across codebases and
your team.
>> Yep. And you're you're codifying exactly
what you want it to do and you have
freedom to express yourself in the exact
way that you want it to do that. And
there's a number of different ways and
techniques that we'll talk about
throughout this workshop. Uh, but it's
much better having those skills and
knowing that they exist and and knowing
not just like that you can use them, but
also setting them up in a way that the
LLM can decide to use them when it makes
sense. Uh, and it's going to give you a
single repeatable way of doing that
thing in the way that you expect it. If
I just tell it some generic thing like
look at this and tell me how how good of
a repo it is, depending on the
model that you're running, maybe the
amount of thinking that you have turned
on, etc., it might give you more or
less information, but it's never going
to be the same thing each time. If you
want it in a very specific format, you
want this report in this exact way.
That's what a skill is. You're teaching
the the LLM how to do something in the
way that you expect it to be done and
then it will follow that much more
closely.
All right, let's take a look at uh how a
typical skill might break down. So
again, it could be SKILL.md, so as
simple as a markdown file. At the top,
you'll notice front matter. So, think
about anywhere else that you've used
a YAML-based system, almost like
headers, right, in other languages or
formats. You've got a name and a
description, and this description is
incredibly powerful and loaded. This is
what the LLM is going to use at runtime
to essentially do routing and determine
if this skill is relevant to the task
that you've assigned it. So that's
how the AI finds and routes to
your skill. And then you
can provide additional context
and even scripts. So again,
think of it as your
on-ramp for interleaving determinism with the
nondeterministic LLM conversation.
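To make the anatomy just described concrete, here is a minimal Python sketch that splits a skill file into its front matter and its markdown body. The skill name, description text, and constraints in the sample are invented for illustration and are not taken from the workshop repo, and the parser only handles flat `key: value` pairs, so treat it as a reading aid rather than how any agent actually loads skills.

```python
# Hypothetical SKILL.md content; the name, description, and body are
# invented for illustration, not taken from the workshop repo.
SKILL_MD = """\
---
name: repo-roast
description: Use when the user asks to roast, critique, or review a git repo and wants a fun, specific analysis.
---

# Repo Roast

- Never be vague.
- Every criticism cites a file and line.
"""


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (front matter fields, markdown body).

    Handles only flat `key: value` pairs, which is enough for name and
    description; a real YAML parser would be needed for anything richer.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # no closing delimiter, treat as plain markdown
    fields = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


fields, body = split_front_matter(SKILL_MD)
print(fields["name"])
print(fields["description"])
```

The point of separating the two parts is that the description is the piece used to decide whether the skill applies, while the body only matters once the skill has been chosen.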
>> Yep. So the most important things
here are exactly what Zack said, the
name and the description. It's a
misnomer that skills are only a single
markdown file. They of course can be,
but they're more like a folder with a
SKILL.md file in them. And then they
can have anything else in there as well.
And we'll kind of talk about that, but
they can have references to other
things that they might want to
know. They can have scripts that it
should run and they can have images.
They can have all sorts of different
things and then use that in different
ways. But the most important piece of it
from the start is the description,
which we'll talk about. Let's also
talk about constraints. One of the
things that's kind of not intuitive
is that it can be more powerful just to
provide a few constraints as opposed
to being overly prescriptive
in exactly how you want the task done.
So if you provide just three
constraints and say never be vague, or
when you cite code it always has to have
a specific line and a git commit
reference with it, then you'll get
better performance than if you end up
bloviating in the middle of a
markdown file so it's like a novel.
This is actually a common failure mode
when designing skills.
>> Yeah.
>> Yeah.
>> So, today we're going to uh put all of
this together into a skill that we're
going to build here in the the workshop
and it's just called repo roast. We
tried to think of like a a fun generic
skill that would be applicable to anyone
who is working in, you know, JavaScript
or different languages. But also,
if you have an idea of what a git
repo is, this is applicable to you. So
it kind of transcends all of that
and is generally something
that's useful for everyone, but also
kind of fun. We can kind of be more or
less serious with it uh as we're we're
putting that in. So it's going to allow
for a lot of creativity as we go. And
feel free to also uh kind of use this as
a as a place to inject the actual
constraints or the requirements that you
have at work that you're kind of uh
struggling with or testing. Um so we'll
kind of get the baseline together and
then you can start customizing from
there and we'll have some time to share
and discuss them later too.
>> Yep. So uh this is that slide from the
beginning. Uh if you haven't yet uh
please download this uh clone this repo
uh and work in there. We've got kind of
the basics of the skill. Um and what
we're going to do is just kind of get it
set up and you can make it your own.
We've got some general guidelines and
some tips to do. Uh but the fun is going
to be that we have a room full of people
and we can have a room full of different
ways of uh analyzing this. And we'll
also share that uh in that repo. There
is a share.sh uh that you can run and it
will just ask you for your name and then
it will uh put that into a KV store and
then I can pull it down uh quickly on my
machine and then run it against some
repos like on the screen so we can share
these uh at the end of the day. Yeah, it
would be kind of a fun way to experiment
with different uh approaches to the same
skill.
>> Yeah.
>> So, speaking of loading skills, we
should talk about how skills load. Um,
we we are generally kind of talking like
you'll hear Zach and I kind of always
just like when we're talking, we're
saying Claude because we tend to use
Claude. How many here use Claude as kind
of their daily
>> daily driver? Whoa. Okay.
>> That's pretty much, it's like 91% of the
market, and then everyone else. Like,
there's Cursor, and I have been dabbling
with Pi. Pi is amazing. But also,
Anthropic won't let me use my, it's
unclear, can I use my subscription with
it? I don't know. Maybe I'll find out
today or this week.
>> You can pay more for it. You can Yeah.
>> pay more in credits for it. That's fine.
>> For sure.
>> But when you're using these, the
main thing and the reason that
we're so excited about skills too is
that they're generally applicable to all
of the major models. So Codex supports
them, Claude supports them, Cursor
supports them, the desktop apps
like Claude Desktop support them. So
even if you're non-technical, you
can be working on skills and sharing
skills and using skills.
>> What was the skill that you did last
week with the recruiting team in
desktop?
>> Yeah. Uh I was working with our
recruiting team kind of uh they're at an
on-site and I was zoomed in with them
helping them build a skill that could
like take like candidate information and
format it in specific ways and
understand um you know what they're
looking for in different things and kind
of build reports automatically. Uh so
it's things that they could do pretty
simply, but because of
the beauty of Claude Desktop and all of
the connectors that it has, it could
just reach into Slack and pull in
information from there. It could reach into
Notion and grab that information and
then mix that in with the
recruiting software that they use and
put it all together into a single report
that then they can share to build from
there. So it wasn't like this is the
final report that we use for everything.
This is a building block that then they
can use to do different things in
different places.
>> And so it was really powerful for that.
>> And as soon as you gave them that skill
then everyone on the team is running it
in a uniform way. Yes. The power of it
too.
>> For sure. And so where do those skills
go? Well, the most basic place is,
if you have a repo, you can just
put a .claude directory and then have
a skills directory and then a folder
which is the skill name and then a
SKILL.md, all caps, in there just like
this, and that will be a skill that lives
with that repo. And so for anyone who is
using that repo, it'll just automatically
load that and understand how to use it.
You can also have that same .claude
directory in your home directory and put
those skills there. And then they're
generally applicable everywhere
that you would be using Claude. Same
thing: there's kind of more
standardization for everyone else on
agents. I wish there was
AGENTS.md instead of CLAUDE.md and
.agents instead of .claude, but maybe
we'll get there one day. So you can
put them in there, and if you've
ever used the npx skills tool
from Vercel, that is just kind of
symlinking them all into all of these
different directories. And so the
skills are generally applicable
everywhere. That's just an easy way to
load and install them, which is why it's
so popular.
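The layout described here can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: `scaffold_skill` and `link_skill` are hypothetical helper names, the `.agents/skills` destination is assumed purely for the example, and the symlink step only imitates the general idea of sharing one skill folder across tools, not what any particular installer actually does.

```python
import tempfile
from pathlib import Path


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/.claude/skills/<name>/SKILL.md and return its path."""
    skill_dir = root / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n# {name}\n",
        encoding="utf-8",
    )
    return skill_file


def link_skill(skill_dir: Path, other_skills_root: Path) -> Path:
    """Symlink an existing skill folder into another tool's skills directory."""
    other_skills_root.mkdir(parents=True, exist_ok=True)
    link = other_skills_root / skill_dir.name
    if not link.exists():
        link.symlink_to(skill_dir, target_is_directory=True)
    return link


# A temporary directory stands in for a repo root (or a home directory).
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    skill_file = scaffold_skill(repo, "repo-roast", "Use when the user asks to roast a repo.")
    # ".agents/skills" is an assumed location, used for illustration only.
    link = link_skill(skill_file.parent, repo / ".agents" / "skills")
    relative = skill_file.relative_to(repo).as_posix()
    linked_ok = (link / "SKILL.md").read_text(encoding="utf-8") == skill_file.read_text(encoding="utf-8")
    print(relative)
```

Passing a home directory as `root` instead of a repo root would give the global variant mentioned above, where the skill is available in every project.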
>> Yep.
>> But the the main dev loop with it is
like you edit the skill, save it, invoke
it, see what output it is, and then do
that process all over again. Uh, and
test it. If you're using Claude as well,
Claude ships with a fantastic uh skill
builder skill or skill creator skill.
>> And uh that is really good for
critiquing your skill, setting it up in
a way that Claude would expect it to be
uh and even evaluating it, which we'll
talk about.
>> Yep.
>> All right. So, we're going to start by
uh letting you go ahead and work and
build the foundation. So, you should
have that repo and uh we just want to
get started with it. So um the main
things that you want to do is you want
to set up uh a proper description for
it. Now remember this description is not
for humans. The description is really
more for the LLM so that it knows when
it should use the skill automatically.
And so you want to set that up in
some way, we recommend in some way
where it describes that
the user wants to roast this repo
and get an analysis, a fun analysis, of
it, or something. Be creative and
fun with it. But then you should just be
able to open up Claude and say "roast
this repo" or "roast my repo" and it goes and
does it.
>> Yep. And then remember that in general
it's recommended that instead of being
overly prescriptive in how to do
something, provide your constraints
instead. So say we're using this format
in this repo or we follow these coding
conventions or we never do X or Y and
then allow the LLM to make the right
determination at runtime.
>> Yep. Yeah, definitely like closing it
off like that. Don't uh prescribe what
it should do. kind of give it advice on
what it shouldn't do and let it be more
creative on things. But you can also
like change that as well and be more
assertive on things that you know you
want in a specific way.
So let's work on that. A couple of
tips that we want to
talk about in this first section.
This one, I think, might be
only applicable to Claude
right now. I did ask Pi if it could do
it and Pi just made an extension
that made it work. So that's
awesome. But if you use the bang and
then backticks around a
script call, Claude will do an
interpolation of that, just like how
JavaScript has the dollar sign, open
curly brace, and close curly brace.
Instead of having whatever was
in there, like this, where it's
saying stale to-dos and then it gives
you a command to run, it will just
replace this with a list of the stale
to-dos because it will actually execute
this grep command and then do all of
these pipes to all sorts of different
things that's totally not slurping up
keys or anything.
>> Yeah. Um,
>> you can imagine how this is really
powerful if, say, you're doing
your morning report, your
git status report, any of the
pieces that you want to be output in a
deterministic way. That's an ideal use
case for this kind of script
interpolation.
>> Yeah, this is really great because
you're not you're not saying go grab the
latest commits or the latest 10 commits
and give me some information on it.
You're saying here are the latest 10
commits in the exact format that I
expect you to understand them in. Go and
do something with that information. So
it's not guessing. It's not going to be
non-deterministic each time. It's going
to start from this deterministic base
and then go from there.
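As a rough picture of what that interpolation amounts to, the Python sketch below replaces each bang-plus-backticks span with the output of the command inside it before the text would reach the model. This only imitates the behaviour described in the talk; it is not how Claude implements it, and `expand_commands`, `run_shell`, and the canned commit lines are hypothetical names and data made up for the example.

```python
import re
import subprocess
from typing import Callable

# Matches !`command` and captures the command text between the backticks.
BANG_COMMAND = re.compile(r"!`([^`]+)`")


def run_shell(command: str) -> str:
    """Run a shell command and return its trimmed stdout."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


def expand_commands(text: str, run: Callable[[str], str] = run_shell) -> str:
    """Replace every !`command` in text with the output of running it."""
    return BANG_COMMAND.sub(lambda match: run(match.group(1)), text)


skill_body = "Recent commits:\n!`git log --oneline -10`\nSummarize the themes."

# A canned runner keeps the example deterministic; drop the `run` argument
# to execute real commands (only do that with skill files you trust).
canned = {"git log --oneline -10": "a1b2c3d Fix README drift\ne4f5a6b Add routing tests"}
expanded = expand_commands(skill_body, run=lambda cmd: canned[cmd])
print(expanded)
```

The design choice worth noticing is that the model never has to work out how to fetch the commits: it receives them already formatted, so every run starts from the same deterministic input.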
>> It's also very token saving. If you've
ever said go and figure out the 10
commits and you've run that more than
once or on three different terminal
tabs, you know, the first two might get
it perfectly right and the third is like
spinning and reading git docs and you
know before it finally gets there. So
this is a way to say once you've
formalized a piece of your workflow, you
can just codify it and say run this
exact script.
>> Yeah.
Yeah. Like we said, without scripts, the
AI is just speculating on what you mean
when you say go get the latest commits.
>> Um, yeah, and just remember that
descriptions are routing rules, right?
They're they're less for us and they're
more for the AI to determine when to use
it. So, a good example is you might have
a couple different image generation
skills and they're all kind of littered
in the projects and maybe in your global
skills. Maybe one is more applicable to
your personal blog and you say, "On my
personal blog, I always ship pixel art."
So, if we're writing on this domain,
this is the skill to use. Right? If
we're going to work, it has to be
formalized and we use a completely
separate image generation system or we
only fetch images from S3. That's where
you can kind of codify that in your
description.
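A toy illustration of "descriptions are routing rules": the Python sketch below ranks skills by how many keywords each description shares with the task. This is emphatically not how the model routes (the model reads the descriptions and decides for itself); keyword overlap is swapped in here only to show why a description that names its domain wins over a vague one. The skill names and descriptions are invented, loosely following the image-generation example above.

```python
import re

# Hypothetical skills and descriptions, loosely based on the examples in the talk.
SKILLS = {
    "blog-pixel-art": "Generate pixel art images for posts on my personal blog.",
    "work-images": "Fetch approved images from S3 for work documentation and marketing pages.",
    "repo-roast": "Roast a git repo and give a fun, specific analysis of its problems.",
}

STOP_WORDS = {"a", "an", "and", "for", "from", "my", "of", "on", "the", "to", "its"}


def keywords(text: str) -> set[str]:
    """Lowercase words in text, minus a few stop words."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def route(task: str, skills: dict[str, str]) -> list[tuple[str, int]]:
    """Rank skills by how many keywords their description shares with the task."""
    task_words = keywords(task)
    scores = [(name, len(task_words & keywords(desc))) for name, desc in skills.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranking = route("Make a pixel art header image for my personal blog post", SKILLS)
print(ranking)
```

If two descriptions scored the same for a task, that would be a hint that they need sharper, more distinct wording.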
And if you're not sure, by the way, you
can always ask Claude. That's the other
like kind of secret hack of this era
that everyone forgets is that a lot of
times the models are capable enough that
you can ask them, have I done this right
or when would this apply? Uh so you can
say as a test run, when would you load
this currently? If I only want it to run
in these conditions, is this the best
description for me or not?
>> Yep. And a great example of this is when
we were building this, I asked Claude, I
was like, "Hey, I know I can do this,
but do you actually support like skills
calling skills?" And it was like, "Oh,
let me go check." And it loaded like a
claude code analyzer skill to get that
information and then do that research
and come back and say maybe.
>> Yeah. It was like kind of, but you
probably don't want to do that. So,
>> uh, so your turn. We we're gonna take
some time to go do that. Uh to to let
you go do that. Um and when we do these
breaks, too, this is a great time if you
have any questions uh or have uh
discussion topics that you'd like us to
talk about,
>> uh we can do that. We're trying to like
fill the dead air of like you working on
these with um general topics. So if
there's something that you want us
>> to say that part out loud.
>> Oh, that's okay. Uh if if there's
something that you want us to talk
about, uh we can definitely do that.
Otherwise, uh we've got some discussion
topics that we thought we could talk
about. Uh but if you also if you have
any questions or um any of that we can
definitely
>> I'll run over to the bring you the mic
and feel free to shout out any
questions.
>> Um but yeah if otherwise then feel free
to uh just start on this
and if there's any questions let us
know.
>> Yeah question. You want to run them?
>> Sorry. Where's the question?
There you go. You talked a bit about
this in the beginning, but I always
wonder where to draw the line between
encoding instructions in, like,
rules, CLAUDE.md and so on, and creating
a skill for something. So I'm
curious, what's your
mental model for making that decision?
Like have you landed anywhere? Like do
you always start with the rule and then
you make a skill if you can make it
specific enough or do you always start
with a skill like how do you go about
it?
>> Yeah, great question.
>> Great question.
>> The
number one rule that you have to
remember is that the
CLAUDE.md or the AGENTS.md is going
to be loaded every time you kick
off Claude. That's going to fill your
context window. And if it's filling it
with a bunch of nonsense that isn't
actually applicable to what you're
specifically doing, then you probably
don't want it in there. I can show
an example of my, what is
it, .claude,
and then I think CLAUDE.md, if I can
spell.
This is my CLAUDE.md. It's extremely small.
It just tells it that I want things
to be a little bit more terse. Don't
bloviate. I just want to know exactly what
you're saying. Be extremely concise.
And then I also like I have this plugin
that I'm working on. It's a a skill
actually called ideation. And I in here
I put like some configuration for that.
So that all of the projects I I
basically want them all to put like the
ideation, the artifacts that it's
generating into uh my Obsidian vault. So
it puts it all in there so I can more
easily like find the connections between
things. Um but otherwise it's like
extremely tiny. And so that that's one
thing that goes into it. If it's only
relevant to the repo, like,
specifically, I'm tired of it
using npm when I wanted it to use pnpm, for
example, I'll put that in there, like a
single line that just says we use pnpm
here. And then anything else, if
it's more specific about
testing or anything like that, I kind of
leave that to skills so that it's only
going to be loaded when I'm actually
writing tests. To the second
part of your question as well, and
we'll talk about this a little bit
later, but the other thing that's really
fun to do is basically wait a week while
working on it and then go back and ask
Claude analyze my week's worth of work
and then what are the skills I should
split out of that based on this.
>> Yeah.
>> Um so again, ask the system to kind of
help you do that. Another question back
here. Yes, sir.
>> Okay, you can hear me right? Yeah.
>> Y
>> So stop me if
you're planning to talk about this
later. I was wondering about global
skills which we will share amongst
colleagues. So at the moment
we've got, I think, 60 engineers.
People are writing their own skills.
We're chatting on Slack. Oh, I've got
this great skill. It's really good. So
then obviously uh engineering managers
are like well we should be sharing
these. Where is where do we keep these?
Where do we keep them in a repo? Um
what's our artifacts library? And then
others have said, "No, we don't want
that because if I put my skill up and
then someone's like, "Oh, I'm gonna
change that." Then we're gonna have MR
requests and then we're gonna have to
review changes to skills.
>> So then we'll get someone else saying,
"Well, I kind of like a skill, but I'm
now going to push my version of that
skill with a very similar description to
the shared repo, which everyone's going
to get." And then suddenly we've got 10
front end skills.
>> Yeah.
>> And then the agent's like, which
of these do I actually use? Yeah. Yeah.
>> And we're wondering if you guys have got
to that stage of how to maintain and
then the next one is 3 months later a
new model comes out and these skills are
actually a little bit too verbose.
>> So who's evaluating the skills and
checking them and saying okay let's cut
these from the global because now you
get what I'm saying this is where we're
at with
>> y
>> how and so a lot of engineers are just
like
>> no no no skills everyone does it on
their own we are not sharing anything.
So, sorry that was a bit of a rant. You
get where I'm coming from.
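One way to catch the "10 front-end skills" situation before it reaches the shared repo is a simple check for near-duplicate descriptions. The Python sketch below uses the standard library's `difflib.SequenceMatcher` for this; the skill names, descriptions, and the 0.8 threshold are all assumptions made up for the example, and string similarity is only a crude stand-in for how an agent would actually perceive two descriptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical shared-repo skills; names and descriptions are invented.
SHARED_SKILLS = {
    "frontend-review": "Review front-end React components for accessibility and style.",
    "frontend-review-v2": "Review front-end React components for accessibility and styling.",
    "db-migrations": "Write and check SQL migrations for the billing database.",
}


def overlapping_descriptions(
    skills: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return skill pairs whose descriptions are at least `threshold` similar."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(skills.items(), 2):
        ratio = SequenceMatcher(None, desc_a.lower(), desc_b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((name_a, name_b, round(ratio, 2)))
    return flagged


flagged = overlapping_descriptions(SHARED_SKILLS)
for name_a, name_b, ratio in flagged:
    print(f"{name_a} and {name_b} look interchangeable ({ratio})")
```

A check like this could run in review on the shared skills repo, so that a near-copy of an existing skill prompts a conversation instead of silently landing next to the original.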
>> Yeah.
>> Fantastic question. I'll take the first
stab. Interested to hear what Nick says.
We have published, maybe you want
to pull it up, like GitHub, WorkOS
skills. That was one of the first places
that we started publishing generic
skills and that's been incredibly useful
because, for example, I was building
generic RAG pipelines and then we found
that agentic tool calling is higher
performance. So I can sideload those
skills that Nick put in there that are
specific to certain documents. To the
problem of individual engineers
saying, "I want a slightly modified
version of this," I would almost say
in that case, cool, you've got a
forked skill you keep locally.
>> And then to your question about
evaluating the skill, I think asking
Claude, with your current model, to look
at the skill using the skill builder: is
it ripe for truncation, or are there
additional extensions that we
need now? But I'll also share that
we are feeling that same pain as I'm
sure everyone else is. And I think the
management layer is just shifting to
that kind of
>> but even if you ask
>> sorry
>> no you're good.
>> Even if you then ask Claude, let's say a
week later, a month later, "hey, review our
skills," there's 30 skills to review and
it comes up with lots of suggestions.
You've then got to open a merge request for
possibly one human or two humans, or
maybe we can automatically say that
the person who wrote this skill
originally has to be one of the
reviewers. Have you got down
to that yet?
>> I don't think we've gotten to that
level with ours because ours started as
kind of formalizing documentation into
buckets that were then easily
sideloadable in different systems.
That does sound painful. I'm curious,
what do you think about that?
>> We haven't even got there. We're just
people have just like foreseen that this
is going to happen. So they're actually
blocking us using shared skills at work,
>> right?
>> Because they think this is going to be
the problem. So
>> like literally we're overthinking it
massively. We should just do it and try.
But still interested to hear what you
guys
>> Definitely. And I I also think it's
going to evolve rapidly too, right? As
we're seeing like there's still we
haven't quite hit the LLM training wall,
right? So there's going to be kind of
additional capabilities coming online
and and yeah, what does it look like in
six months? Could we pare the skills
down even further and get the same or
better performance?
>> Right. But yeah, I'll say that that's um
>> yeah, that sounds like a typical human
problem of my skill, your skill,
right? Yep. To
build on that, we have a number of
ways that we solve that. Like Zack said,
the skills repo, this is
our public skills that you can just
install with npx skills add,
and those are all available. But
then we also have some internal
skills that are more generally
applicable to engineers at WorkOS.
And so there's an auth
specialist, there's a DX specialist,
there's a ghostwriter, different ones
like that. And then I have my own
plugin marketplace as well where I put
a number of skills that are applicable
to me. And so I just load from all
of these in different ways. We also
have a big monorepo that
most of the engineers work in. And
a lot of skills just end up in there
if they're monorepo specific.
>> Uh, that's a much easier place. But
yeah, it's the same problem like you got
to get a review on it or it's got to be
it feels kind of dirty because you're
just like appending that to the work
that you're also doing. So, it's like an
an add-on, which doesn't feel super
great on the PR.
>> If I reverse engineer to some degree the
plug-in system, I think that's what
they're trying to address kind of
because you can also install like a
version of a plugin the same way you can
an npm package, right? So, maybe that's
kind of like the interface on top of the
repo. And then the tooling that I'm
seeing everyone keep building repeatedly
is like the tool that reads from a repo
and installs skills into various places,
like TUIs and stuff that make that kind of
nicer. But that might be a solution
to some degree where it's like, cool,
there is this master skill
of this, but I'm running this version
because I need this fork. And then
it's not as gross as it sounds
because there's an actual standardized
API with the plug-in interface, right?
>> And it's all versioned.
>> Yeah.
>> When you guys
>> No, no, that's these are great. When you
do you then have flags
if you were to go into the public or
your internal work
I don't want skills even if the agent
knows
you then flag, like, npx public, flag
just front end or just UX or just
product this just come to my mind I've
never thought have you done it is it is
it
>> I haven't, no. Um, I haven't used npx for
that. I just use, like I said,
we're mostly Claude, I use the
Claude marketplace, like the /plugin
marketplace add, and as long as your
Claude instance can uh access like an
internal git repo, it can just pull from
there. Uh and so that's what it does.
>> It will pull all the skills even the
ones that you don't need because you're
a front end. You don't want the backend
skills.
>> Oh yeah.
>> Yeah. Then that that almost sounds like
a packaging thing to me. But I I I think
that you're kind of like in good company
in a sense that it seems like, you know,
we've kind of got three marketplaces
that are super relevant
>> separate from or in addition to the
project specific like skills, right? And
then it just kind of becomes a matter of
taste of each individual engineer saying
like, "Oh, I'm going to run this version
of that skill." But then something like
the plug-in like interface is the way
that you have a uniform way to approach
it which you could actually write docs
against for onboarding and say plugin
add these three marketplaces when you
come on board and then if you're on a
front-end team like plugin install from
the front end marketplace or whatever
the case may be but that's like still at
the end of the day on the back end
that's like repo management and it's
right it's similar to how it works with
code.
>> Yeah.
>> Yeah. Great questions. Um yes sir.
I was gonna ask.
>> Yeah. Uh so actually two questions. The
first one is do you do any like formal
skill evaluations, like a skill
benchmark, so that as new models drop
you know which skills are still relevant?
>> Yes. Um in the the public skills
specifically on the the ones that I use
internally I am a little less formal
about it. Uh but the ones that we
actually ship uh we do ship uh in the
where is it? We have like a whole eval
framework uh that we wrote to make sure
that it it lives up to the standards
that we have and we're gonna we're going
to talk about this a little bit but like
it's mostly uh like doing several runs
where it will load Claude without the
skill and ask it to do a task and then
load it with the skill, and then it kind of
has like a rubric on confidence, or
like a grade that it gives it, and
it'll fail if that grade is lower with
the skill than it was without. Uh, it'll
also fail if, you know, it tries
to be I think 80% or higher, so
like 80% of the time, uh, or maybe it's
90%, uh, it's going to get this right with
the skill. And sometimes it gets it right
without the skill, so the skill is maybe
only adding one or two percent to it, but that's
something that we track and keep on top
of as new models drop.
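
A minimal sketch of the with/without comparison described above, in Python. This is not the WorkOS eval framework; `RunResult`, `skill_passes_eval`, and the 0.8 default are assumptions made for illustration, and the grades are presumed to come from whatever rubric grader you already have.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One graded run of a task (hypothetical shape)."""
    grade: float   # rubric grade in [0, 1]
    passed: bool   # did the run get the task right?


def skill_passes_eval(with_skill, without_skill, min_pass_rate=0.8):
    """Return (ok, reasons) for a with-skill vs. without-skill comparison."""
    reasons = []

    # Gate 1: the skill must not make the average grade worse.
    if mean(r.grade for r in with_skill) < mean(r.grade for r in without_skill):
        reasons.append("average grade is lower with the skill than without")

    # Gate 2: with the skill loaded, the task must succeed often enough.
    pass_rate = sum(r.passed for r in with_skill) / len(with_skill)
    if pass_rate < min_pass_rate:
        reasons.append("pass rate with the skill is below the threshold")

    return (not reasons, reasons)
```

Re-running the same comparison when a new model ships shows whether the skill still adds anything over the baseline.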
>> Yeah.
>> Okay. Yeah.
>> It's it's sort of fuzzy math but it's
almost like by having this this
established baseline you can at least
test that way.
>> Yeah, it makes sense. And then the
second question was u about um
sorry one second
right u skill pickup uh so if you get
lots of skills the models might ignore a
skill or decide I don't need a skill
I'll just I'll just do it what's your
kind of experience with this to a like
test it find it and then maybe improve
it
>> yeah um great question we that that is a
problem and the more skills that you get
like you can have conflicting skills Uh,
and so like which one is it going to
pick? Um, the solution to that like like
for the WorkOS one specifically like we
try and keep it like for these public
ones we try and keep it like very
generic like mention all of the you know
the the acronyms and things that we
would want we would expect to cover uh
from that. So that'll trigger it to load
uh and it usually does a pretty good
job. You can also like if you're in a
skill, uh, or sorry, in Claude, uh, you can
just do like /workos for example, like
the slash command, uh, if you know that
you want to do it. And so like a lot of
times we'll just like suggest, you know,
if that's what you want, I'll say like
just run /workos and it'll load it.
>> I'll call a skill by name if I want a
specific like image gen or something
I'll say or I'll say like use the
superpowers brainstorm skill in order to
determine a better plan.
>> Yeah.
>> Yeah.
>> Yep. But also if you got if it really
wasn't behaving that's why you use the
bang and then put a command.
>> Right.
>> Yeah.
>> Um I had a question on um how do you
decide when to create a sub agent versus
a skill? And can you reuse a skill in
a sub agent? And there's just sometimes
that um I'm going to create a skill and
then I'm going to like uh maybe I should
have written um a CLI, cuz uh why did I
even make a skill in the first place? And
I struggle between these three
things.
>> That's a great question. Uh on the, can
sub agents use skills? I
actually, I'm like blanking on that. So
I'm asking Claude, uh, and you can see
that it loaded the Claude Code guide
skill to go check that. Uh so this is a
great example of doing that and we'll
get the answer here in a moment. Uh but
that's a great question. Uh sub agents
is something that we don't cover a ton
in this workshop. Uh, but it is
something that's super valuable. And
the number one thing that I
think of when I think of when to
run a sub agent versus a skill is do I
want it to have its own standalone
context? Uh so that it can go do like a
bunch of work on on something and then
that's not eating the context window of
like the main task job that that we're
doing. Uh and then that way it can just
like do a check-in on that. And so um
for example I have this ideation plugin.
It's kind of like a a planner or a uh a
superpowers uh type thing that I like
doing. And as part of that, like I'm
really like focused in on feedback loops
to itself so that it doesn't have to
bother me all the time about, hey, does
you know, does this look correct? Or
like tell me, oh, it's done and it's
totally not done. Like I want it to
prove to me without me having to go look
at the code that it's it did the work
that I expected it to do. And oftentimes
that's feeding the information that I
would look for back into it and making
it just go in a loop over and over. I
hear there's a a Ralph Loop's uh
workshop after this, so you should check
that out. Um, but it uh
like in that case like when it's doing
those reviews, those can like muddy it
up and so like I kick off a sub agent to
go do those reviews and then it just
reports back like ah there I found these
problems and then it just has a list of
those problems and then it can feed back
to itself to do it again. So I'm not
eating that full context window every
time.
>> Yeah. Now I'm also further confused by agent
teams, which are different than sub
agents too, right?
Yeah, another question. Thank you.
>> And
another one another question here.
>> Yes. Um, I have a question about the the
overrides in a skill. So, for instance,
you you put a default and you say or
whatever the user decides, but I find
it's very random or at least I cannot
really reproduce that and sometimes the
overrides don't work.
>> Yes.
>> Do you have any idea or like I I just
want to find out what's going on?
Uh my my best suggestion for that is
just ask Claude like why did you pick
that over the other thing? Uh and how
can I improve that in some other way?
Like like you consistently or like you
consistently enough pick the wrong
choice or you don't respect my override?
Why is that? What can I do to improve
it? Um I wish I had a a more clever
answer, but usually it's just like I ask
the machine.
>> Just ask Claude. It's good enough.
Thank you.
>> Great question.
>> You called out superpowers as a
skills library that you referred to. Are
there other uh skills libraries beyond
you guys' that you commonly use?
>> Yes. Uh definitely. So superpowers is
one that I actually didn't use until
yesterday when Zach showed me it. Uh and
I I installed it. It has a number of
different skills in it that are are
pretty helpful. Um, these slides are
actually written in Slidev. Uh, and
you might notice, well, you won't, uh, I
don't know if I committed it. Um, let me
go to the full repo here, but in here
there's an agents directory and a skills
directory and a Slidev skill. And Claude might
have had a hand in writing these slides.
Uh, which is really cool. We'll kind of
talk about that. But like some of the
real superpowers I think are like when
you assign it to do non-coding things uh
because you really feel this magic. Uh
and we'll we'll kind of show a demo of
the Remotion skill. That one blew my
mind. Uh it's it created a video based
off of a prompt and
>> I I now use that as my so every Friday
when we have the all hands and it's like
what quick demo of what you got done
this week, right? It just reads my git
history for the week and then builds a
movie
about it, which everyone was tired of
on week three and they're not going to
they're not going to stop. They're just
going to get more like I'm going to
introduce characters and it's going to
get awful. But uh yeah, the Remotion is
incredible. It'll even pop up a like
Chrome-based web editor where you can go
and be like h trim and cut and like let
me add some fades, right? Um so that's
insane. And then my favorite one that I
probably got the most leverage of uh
since installing it was just a I I built
it with Claude just a simple Python
wrapper around Nano Banana. um the image
gen model from Google which continues to
improve. So I just say hey now it's on
v3 go update it and we'll show a little
something later but um essentially with
that so most images now I generate with
that it takes like sub seven seconds in
a single prompt but using that same
model is able to say uh take a single
string from the user that's a prompt say
like a child running through a field
first it makes that image, then it uses
their video API, Veo, hands that static
image to it and says animate this static
image in the most obvious way possible.
So, one user prompt of child running
through field, nothing exists, and then
30 seconds later, you have a video of it
running through. And I was able to use
that same method to do all of the
interstitial scenes that I needed in a
32-minute film. And I am not a video
person. Like, I mean, I like using like
DaVinci Resolve and editing stuff, but
I'm not an animator. And I was able to
get all of that done in like maybe an
hour. Um, so those are those are pretty
trippy, too. um you want to get really
really down there. Like I've got I have
like Claude reading my biometrics and
stuff and like pushing back on me and
telling me to like take it easy this
afternoon because you didn't get any
sleep. So um but there's there's not
like necessarily skill for that yet. I I
think the the ones that are really
powerful are when Claude, uh, the other
day blew my mind by saying, it was also
in superpowers, "This is easier for me to
show you the variants if I just mock
them up in a web browser. Would you
prefer that?" And I said yes please. And
then it showed me all those and I was
like, A, go. And then we just built from
there like saving countless tokens on
just text iteration.
>> And I'm using that nano banana skill
right now. But uh I just ran /plugin
and I'm looking at the marketplaces I
have installed. And some of the most
important ones to me I think are the
Claude Code plugins official one. Uh I
think that's where it has a nice um
skill reviewer skill or skill creator
skill. Uh which is really good. Uh
Obsidian is something that I use all the
time. Uh, and so having the obsidian
skills and it knowing just how to use
that. Uh, so it's based on what I want.
But then also one that, uh, is actually
very good. Oh, where did it go? Plugin.
Uh, is the, um, Codex skills
marketplace.
I don't know why it's not showing it
there. The OpenAI Codex ones. Um,
it's not scrolling down, but anyway,
that is like uh Claude does all this
work. Codex is pretty good at reviewing
it. So, this is a skill from OpenAI that
just like pipes that to Codex and says
review this and it goes and reviews it
and then delivers that back to Claude
and I have cut myself out of the copy
and paste game of Claude said this and
Codex said that and like going back and
forth in tmux splits. So, I'm I'm
super happy about that.
>> Yeah, I would say Vercel is pretty
skills forward. They've got a bunch of
CLIs and stuff that are are pretty
interesting. So they and they were like
kind of the first on using some of the
marketplace stuff. So check out their
like open source skills stuff too as
well.
Great question. Thank you.
Okay.
>> That's what Nano Banana just made.
>> Close. Close.
>> Yeah.
>> Awesome.
>> But the fun thing about that is that you
can ask it for any style. So you can say
like I I mostly do pixel art. Um so I'll
say like you know old school pixel art
and uh it's a lot of fun.
>> Yeah. All right. We are at time for this
uh piece of it. Did anyone uh build
a first pass on the repo roast
skill that they want to share?
>> Yeah.
>> Cool.
>> Awesome. If you want to run, um, share.sh
>> Oh, cool. Sweet.
>> This guy. All right. wins the workshop.
>> All right. Uh
>> are you Sharif?
>> You're Sharief. Okay. Are you uh
Okay, I'll run uh Well, I'll just run
all three of them real quick. Uh so I'll
run them on the uh skill. Oh, sorry.
This workshop.
Oh, what did they do?
>> It's pretty safe. Don't you want to run
it against like work OS or something or
the CLI?
>> Uh yeah. I just realized that it loaded
it locally into this one, not in a
global way.
>> Oh,
>> uh, I can do that.
>> Okay,
>> sorry.
I'm going to give it the WorkOS CLI
then I'll say uh repo roast zackb on the
uh CLI repo. We'll see what it does.
It's a new verb for defending the herb
bird.
>> Oh yeah, you can c you can uh customize
those. So I a lot of my uh spinner verbs
are Lord of the Rings or the office
themed. Uh so you'll see like defending
us and things like that.
I didn't I didn't think that would work,
but that's okay. It's running against
this the uh workshop repo.
>> Okay.
So, it's running all of the commands uh
that you gave. And while you're while
we're doing that, I will bring that up.
So this is Zach's skill. Nice good
description.
Analyze repository health by running git
and file system scripts to find stale
to-dos. Churn hotspots. Yeah, that's
good.
And then it tells it specifically how to
find stale to-dos. Awesome. Hotspot
files, largest files. Nice. Constraints.
Never be vague for evidence. Never
present a finding without a script
output or git data backing it. That's
probably why it's running still. Oh,
nice.
Um, yeah. Scope.
Okay, Zach B. Okay.
>> Nice.
>> Nice skill.
>> You uh didn't tell it to to just like be
mean to you, so that's
>> be super mean to Nick and Zach over on
stage.
>> Awesome.
>> Hi, Amy. She's pretty mean sometimes.
>> All right. Awesome. We will we'll run
more of these. We've got more uh more
things to get through. Uh and we'll
we'll do this again and we'll we'll test
another one. Y uh so moving on to the
next section. Uh we're going to make
that skill smarter.
So the first thing uh that you can do to
make your skills smart uh is by
providing more information to it. But
this gets into the problem of the
CLAUDE.md where you can be extremely
verbose in there and give it so much
information about your repo and you're
just bloating the context window because
it doesn't really matter. Well, you can
do the same thing in skills uh but you
can do it in a better way. And that is
specifically with progressive
disclosure. And I guess you could do
this in a CLAUDE.md as well. Uh but all
it is is just saying like hey if you're
thinking about doing testing, for
example, uh load this file that I
have on testing and read through it. Uh,
and you just give it like a path, like a
local path to testing.md or whatever.
And that way it's only going to load
that if it's actually doing like a
testing skill or testing task as part of
the skill run. If it's not, it'll skip
that. And so you can uh specifically
tell it like, oh, in this example,
uh, if this is a run where it's
doing scoring, load
the scoring rubric, uh, and read through
that. So we explain to you how to score
things properly. If we're not doing
scoring, you don't have to load that and
we don't have to fill up the context
window with all of that bloat.
>> Yeah, this also gets back to that
gentleman's question in the back too of
like you can imagine this pattern really
scaling out. So the way that it actually
did scale out even in our public WorkOS
skills repo you can go and check out. We
have multiple migration guides that we
publish for various folks. So like if
you're coming from Auth0, we'll happily
help you move off Auth0 to get to
WorkOS. And then there's n number of you
know competitors essentially that we've
got migration scripts for. And so in
this case you could say here's the
migration skill and the migration skill
is a pointer to the specific reference.
So you're not bloating your context
window. It's just loading the two
markdown files it needs.
>> Yep. And if you look at the WorkOS
skill, we literally call it skill
router uh in there and it just has like
a reference map. So if you're going to
install AuthKit into Next.js you should
probably load the WorkOS AuthKit
Next.js .md file uh from the
references. And so if you're not working
with Next.js we don't want
to load that and bloat that. We only load
it when you need it for all of those and
so this file is just filled with uh
routing to the actual pertinent
information that you need
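
A rough Python sketch of the routing idea. In a real skill the router is markdown that the model reads and follows; the file names and keywords in `REFERENCE_MAP` below are hypothetical, not the actual contents of the WorkOS skills repo.

```python
from pathlib import Path

# Hypothetical reference map: reference file -> keywords that make it relevant.
REFERENCE_MAP = {
    "references/authkit-nextjs.md": ("nextjs", "next.js"),
    "references/migration.md": ("migrate", "migration"),
    "references/testing.md": ("test",),
}


def references_for(task: str) -> list[str]:
    """Return only the reference files whose keywords appear in the task."""
    text = task.lower()
    return [
        path
        for path, keywords in REFERENCE_MAP.items()
        if any(keyword in text for keyword in keywords)
    ]


def load_context(task: str, root: Path) -> dict[str, str]:
    """Read just the matching references; everything else stays on disk."""
    return {
        path: (root / path).read_text()
        for path in references_for(task)
        if (root / path).exists()
    }
```

A task that mentions none of the keywords loads nothing, which is the whole point: the context window only pays for references the current run needs.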
okay um another way this is again to
some degree it's fuzzy math under the
hood right if you really get down to
like matrix multiplication but uh
nevertheless Another way to boost
performance here is to kind of enforce
confidence scoring. And one of the
reasons that Nick's uh ideation plug-in,
which is open source, that you can go
check out, works so well is that it has
like an internal counter of confidence
of how close am I to fully fleshing out
all of the variables that this task
requires before I can go and execute.
And it then forces like a iterative loop
with the user of continuously asking
additional questions until it gets to
the point where it's like, I'm 95% or
above confident. I've mapped most of
this problem space in my head and now we
can start work, and as
a result the output is
likely better. Um and so that same
concept applies here when you're, you
know, building skills. You can
kind of add in that that same
functionality and say uh for this
particular aspect of the codebase you
must always find this evidence and then
get to a point until you know the tests
are either this level of coverage or you
have this level of confidence on on A B
or C. Um, and that's another way to
essentially boost performance in the
skills.
>> Yeah. Uh, it's it's really important
like like like Zach was saying, it's
just like kind of pulling that number
out of nowhere. If if you say, "How
confident are you?" And it's like, "I'm
sure confident." Uh, well, why? And as
you give it like ask it to like show
more of its work as to why it's
confident, it might be like, "Oh, wait.
I'm not actually as confident as I
thought." Uh, and so that's the whole
thing is like trying to get it to think
more. uh in the terms of of like the the
ideation skill uh what it's doing is
it's using that to assess that it has
like a full understanding of what I'm
trying to say because like I have a
problem where I don't give it enough
information. I have the information I
know what I want. It's hard for me to
express it to the the machine in a way
that it expects. Uh, and so it's using
that confidence score to say, "Ah, I
don't have like a full rounded
understanding." And it loads like a
whole rubric on what it means to be
confident on something. Uh, but then if
it's not confident, it uses uh Claude's
built-in AskUserQuestion tool to ask
me a number of questions to pull that
information out of me rather than me
being like, "Ah, you're not confident.
Let me try and give you more insight."
It's like, "No, I'm not confident
because of these things." And then
here's how you can make me more
confident by answering these questions.
And a lot of times it'll just give me
multiple choice on like, you know, do
you want this? Is this what you mean?
This is the recommended approach I would
take, but if you want to go this other
way, we could do it that way. Uh, and so
like we have that dialogue going back
and forth with it, but it's all based
around how confident is it that it
understands what I want and understands
how to do what I want.
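
The loop described here can be sketched in Python. This is an illustration under assumptions, not the ideation plugin's implementation: the rubric is a hypothetical dict of item weights, `ask` stands in for whatever question tool the agent has, and 0.95 mirrors the "95% or above" figure mentioned earlier.

```python
def confidence(answers: dict, rubric: dict) -> float:
    """Weighted share of rubric items that already have a non-empty answer."""
    total = sum(rubric.values())
    known = sum(weight for item, weight in rubric.items() if answers.get(item))
    return known / total


def clarify(answers: dict, rubric: dict, ask, threshold: float = 0.95):
    """Ask about missing rubric items, heaviest first, until confident enough."""
    answers = dict(answers)  # do not mutate the caller's dict
    missing = sorted(
        (item for item in rubric if not answers.get(item)),
        key=lambda item: -rubric[item],
    )
    for item in missing:
        if confidence(answers, rubric) >= threshold:
            break
        answers[item] = ask(item)  # e.g. a multiple-choice question to the user
    return answers, confidence(answers, rubric)
```

Tying the score to named rubric items is what lets the agent say why it is not confident and which question would fix that, rather than pulling a number out of nowhere.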
>> Uh, yeah. And so then this gets back to
uh just kind of in practice the way this
works or at least how it has for me and
what we kind of recommend is you know
build an initial skill. Maybe you're
doing that yourself in Markdown. Maybe
you're using the skill builder in Claude
and saying I need a skill to do X.
You're doing it you're using that skill
for a couple of iterations maybe a
couple days maybe a week. Um you look at
what it produces and then you know keep
in mind that as you're having multiple
conversations with say Claude over the
course of a week all of those
conversations are even getting saved
locally to some degree in JSONL files.
And so you can um be honest with the
evaluation phase about is this actually
improving things? Is it not? Where does
the skill fall short? What are the edge
cases it's not currently capturing?
What's the annoying thing that I've now
discovered now that I've been running it 7
days that it's missing? And then you
kind of iterate and but again you're
still going faster because you come back
to a state that's already working and
you say these three edge cases are
driving me nuts and you also need to be
able to like review your own PRs in the
future, right? And so then once that
loop is is done, you have a skill that's
significantly more powerful and then you
can keep keep on running from there. But
it's kind of like they're sort of
evolving over time. Um so they're again
like I think of them as like
organizational units of where to put
kind of you know work intelligence and
then over time if you're if you're doing
it right they're getting better.
>> Yeah. when when skills first came out,
uh Zach and I were actually at an
on-site together in San Francisco and uh
like we woke up one morning and there's
like, "Introducing skills," and we're
like, this looks like every other
markdown file that they provided.
What's the difference? And um, like
later that day, we presented on on
skills like I don't know four hours
later. And the the one that I built to
present that was a claude skill claude
skill uh that would analyze the like it
wouldn't analyze your skill running
because nobody had skills like four
hours into them existing. But it would
analyze, oh, you just did this task with
Claude. Let's go through and pick out
what could have been what what we could
like encapsulate into a skill so that it
can do that in an easier way. And like
since then there's like meta-skills and
things like that that have come out
where it will analyze the performance of
actually how you're using Claude or how
you're using the skills in Claude and
then it can use that to feed back in,
just like Zach was saying, just by
looking at those JSONL files. There are
these logs of like the conversations
that you're having with Claude and uh
that can inform it on how to improve
things. So, for example, like in the
repo roast, uh if it's kind of being
wonky about how it's pulling in git
information, adding in like the bang
with uh like the specific git command
that you want it to run to get log
information, that's a way to improve it
so that it doesn't have to iterate over
that and say and you come back to it and
say, "No, that's not what I wanted. I
wanted it like this." like you you can
be more explicit with it and that can be
fleshed out by reviewing the performance
that you had the first time or the first
couple of times.
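
One way to make the git data explicit is to hand the skill a small script instead of letting the model improvise the command each run. A sketch in Python, assuming churn is measured as how often a path shows up in `git log --name-only`; the function names and the 90-day window are illustrative, not taken from the workshop repo.

```python
import subprocess
from collections import Counter


def churn_hotspots(git_log_output: str, top: int = 3):
    """Count how often each path appears in `git log --name-only --pretty=format:` output."""
    paths = [line.strip() for line in git_log_output.splitlines() if line.strip()]
    return Counter(paths).most_common(top)


def recent_git_log(repo: str, since: str = "90 days ago") -> str:
    """Run the exact git command we want, so every run sees the same kind of data."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `churn_hotspots(recent_git_log("."))` inside a repository returns the most frequently changed files, which gives the roast concrete evidence to cite.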
>> The the other intuition I'll share is
that um it's kind of like in my
experience recently it's the types of
nagging things that I find the most
cognitive like resistance to doing every
week that I actually need to turn into
skills. And so like a breakthrough
moment for me was realizing that like
context switching between Slack and
focusing on code and then going and
ticketing like new asks in Linear was so
disruptive to me that I just needed
Claude to do that. So now it just
monitors and when someone asks me for
something new in Slack, it goes and
looks in my Linear and then if there's
not a ticket for it already, it does
deduplication, adds a new ticket and then
I haven't left my flow, right? I'm
still able to focus. And so like that's
kind of the intuition I have now is that
um you can sort I think it's really
powerful and I think we're only at the
very beginning of it like analyzing your
own workflow over time feeding it more
information about how you actually work
>> and then letting LLMs you know do what
they're really good at and compress down
that that actual time.
>> Is there a skill out there that you
recommend for that? Was there
>> You mentioned
that there are skills out there, meta
skills, to review your kind of past
conversations and propose skills or
improvements? Is there one that you use
or
>> that one? There's not one for that. But
um and I didn't do this myself the last
time. This was like last week. But what
I should have done is say, "Hey Claude,
use skill builder yourself." Because
Claude's got that baked in, skill-builder,
I think it is. Use skill-builder to
um look back at my workflow and tell me
where it's the least efficient. Right.
And then um that's also pulling in
connectors because there's a Slack
connector and there's a Linear
connector. So that's where like the
markdown might be referencing you must
always use the Slack connector to pull
in this and I only care about these
channels and direct mentions of my name,
right? Um but yeah, I think it might
even be faster in some cases to just say
here's where I work. This is the tool
that we use to communicate. Make me a
skill that does that.
>> Which is also like kind of crazy. I
think this is the one that I was
thinking of uh specifically is Claude
meta skill uh that helps you configure
claude uh including like setting up
those skills. Uh I think this is the one
I've used but like Zach said I've also
just asked it to review its own
performance and kind of go from there.
Uh one really great thing is like I
built this pretty cool tool and I wanted
to write a blog post about it and uh it
it was all built with Claude. So I was
able to just go ask Claude, "Hey,
remember that time we did this fun thing
together? Let's reminisce about it." And we
just like talked about it and it like
led to these anecdotes that I added to
the blog post that I completely forgot
about, but Claude was very fond of
that moment between us.
>> That's not what's happening. You don't
understand that. Okay.
>> No, I don't.
>> Under the hood, that's not what's
happening.
>> Don't lie to me.
>> Do you use any skills for memory for
maintaining a a like memory state within
Claude?
>> Great question. Uh Claude has its own
memory built in. Uh and I there's that
autodream thing. I don't know if that's
real yet or if it's like a a thing
that's coming, but it will actually like
prune the memory. And so I've been like
focused on building around that. Uh but
I've been building it in Pi
specifically. Uh and so like I built
this um, I built like what it would
take to be a DX engineer at WorkOS as
like a full agent using Pi and it's
called case and uh it uses memory
internally like memory.md files and it
works across all of our open source
repos. Uh, so it knows like React and
React Router and Next and TanStack
Start and all of those. And so then it
has like general memory files and then
like framework specific memory files and
it goes in and prunes those and updates
by doing like as part of its flow doing
a retrospective at the end and analyzing
its own performance and then saying,
"Oh, I spun I spun in a circle a bit for
this. I could have like once I got to
there like I can just save that to
memory so I know like this is the
command I run next time to get the
information I need." and it just keeps
track of that. I haven't built in like
the full dreaming thing where it prunes
that yet, but um I I'm experimenting
with it.
>> Yeah. And also I I want to play with the
Obsidian connector more because I think
that would be super powerful. I I had a
habit in the past of using Obsidian and
just making a daily to-do with just the
date as the title. And then so I think
writing to and reading from those vaults
so that you could imagine saying look
back over the last week, last week it's
translated into what are those actual
dates. It fetches those files directly,
right? And then it can also write
consolidated memories. It's also
worthwhile playing with things like open
claw which I' I've done because that
memory system that it ships with was
surprisingly good, better than a lot of
like stock claw or openis stuff. And so
seeing how it does that with like daily
journal MDs and then the consolidated
memory which I think the dream stuff is
kind of pointing towards like
consolidating memory over time.
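
The "look back over the last week" step can be sketched in Python. This assumes daily notes are named by ISO date (for example `2026-05-05.md`) and that "last week" means the seven days before today; both are assumptions, and `consolidate` is a hypothetical helper rather than part of any Obsidian or Claude API.

```python
from datetime import date, timedelta
from pathlib import Path


def last_week_notes(today: date, vault: Path) -> list[Path]:
    """Paths of the daily notes (YYYY-MM-DD.md) for the 7 days before today."""
    days = [today - timedelta(days=n) for n in range(7, 0, -1)]
    return [vault / f"{day.isoformat()}.md" for day in days]


def consolidate(today: date, vault: Path) -> str:
    """Join the notes that exist into one text, oldest first, for a summary pass."""
    sections = []
    for path in last_week_notes(today, vault):
        if path.exists():
            sections.append(f"## {path.stem}\n{path.read_text()}")
    return "\n\n".join(sections)
```

The consolidated text is what you would hand back to the model to write a single memory entry for the week.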
>> Yep. Um, but a lot of times the crazy
thing about this is like the answer is
one turn request with skill builder is
the fastest way
>> pointing it to Yeah,
>> this is a good
>> Yeah, 100%.
>> Y
>> um so we're going to jump into the next
uh piece of of work on your side and
that's adding phases uh and confidence
scoring to it. So adding progressive
disclosure, uh adding a confidence
score, telling it like how confident are
you in in this or like like uh we we've
got some examples uh of that potentially
like uh you know what's a good example?
>> Um
how confident are you in this? You know,
you've installed AuthKit correctly.
>> Yeah, but I mean like for repo roast.
>> Oh, for repo roast.
>> Uh you you know, you gave me a bad score
on I don't know, git commits. Why is
that? Like,
>> okay,
>> have it dive down deeper than just
>> this is our pattern of how we use git
commits. We always have our messages
like this. We're following these
conventions. So then based on that,
what's your confidence that this is
>> correct to our repo?
>> So, for example, you might use uh
conventional commits at your work and if
you find commits that aren't like that
or you find a bunch of merge uh commits
in there,
>> for shame. Uh but yeah, like different
things like that you can
add uh specifics to and have that
as a progressively disclosed rubric
that it can follow for those things.
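
As one example of such a rubric made concrete, here is a Python sketch that scores commit subjects against the Conventional Commits format and lists merge commits separately. The type list, the `score_commits` name, and treating any subject that starts with "Merge " as a merge commit are assumptions for illustration.

```python
import re

# Common Conventional Commits types; adjust to your team's convention.
CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([^)]+\))?!?: .+"
)


def score_commits(subjects: list[str]) -> dict:
    """Return the share of conforming subjects plus the evidence behind it."""
    merges, nonconforming, conforming = [], [], 0
    for subject in subjects:
        if subject.startswith("Merge "):
            merges.append(subject)
        elif CONVENTIONAL.match(subject):
            conforming += 1
        else:
            nonconforming.append(subject)
    checked = conforming + len(nonconforming)
    ratio = conforming / checked if checked else 1.0
    return {"ratio": ratio, "merges": merges, "nonconforming": nonconforming}
```

Returning the offending subjects alongside the ratio gives the skill evidence to cite when asked why it scored the commits badly.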
>> A quick housekeeping thing in case uh
for any reason you you're behind or feel
behind, uh you can run setup.sh and then
checkpoint two to get to the same spot
that we're at now. Y
>> and then yeah, any other questions feel
free to shout out and I'll run you a
mic.
We'll spend about five minutes here uh
and then we'll move into the next
section just to make sure we have enough
time.
Do you want to talk about um any of
these topics? Zach,
I can talk about when confidence when um
yeah, confidence scoring saved us.
>> Yeah. What's that? that uh was when we
were working on the um when confidence
scoring saved us
the uh well that was kind of built into
the eval uh that we wrote like claude
ships with a whole eval framework now
that you can use uh and it'll like spin
up a guey for you like a it'll create an
HTML report and you can see like before
and after and all of this insight into
how your uh skills are running uh and
whether they're actually like improving
cloud or making Claude worse at the
task. Um, but before that existed uh I
was writing my own to do that and uh it
was all based on on that. And so like
let me let me bring up the um
ideation skill and I'll just say let's
see we'll go to the CLI
and I'll say
>> So for context, this is our WorkOS CLI
that we're building in the, yeah.
>> I'm on the main branch of that. I use
worktrees for that. What's a feature
that we want to add? I want to add a fun
/buddy command similar to how Claude
Code shipped that for April Fools' Day.
I use a tool called Wispr Flow to
go full WALL-E and not even type
anymore. I just press a button.
This is how I code now.
>> Do you prefer that over Claude's
voice mode?
>> Yeah, I do. I've been on Wispr Flow
for maybe a year now. And the thing I
like about it is that it can input
anywhere on Mac. So, you know, if
you're in some funky old website
in Chrome, it works there. It works in
Safari, works in any app that you've got.
As long as you can focus a cursor there,
you can insert text there. And it's also
fine-tuned towards technical terms.
So, you can say "at user
authentication.ts" and it'll come out
correctly. You can reference files,
etc. So, it's great. I imagine that
more and more of the tools are going to
get their own native voice, and
over time that's going to become a
dominant interface. But right now,
Wispr Flow is a pretty sweet
experience. Yeah.
>> Turn on fast mode so it'll go faster.
Yeah, it also does cool things:
you can say, "When I'm
dictating into Slack, be more
casual. When I'm dictating into an
email, be more formal," and it will kind
of
>> it's sort of context-aware in the
formatting that it'll put out. Like,
it knows you're in Gmail or it
knows that you're writing code,
you know, or requesting code.
>> So, this is an example of the ideation
skill. I gave it that simple
command and now it's saying, "I don't fully
understand what you mean. What kind
of fun are you looking for?" I'll say
a visual gag.
ASCII art gallery. Sure.
Hidden Easter egg? Yeah, we'll go listed
but subtle.
>> So, I gave it one sentence and
it's like, "Well, what do you mean by
that?" And it's pulling all of that
out of me.
>> But there's the value in thinking. It's
like, you know, the same way that a good
engineer in a whiteboarding session
would kind of draw the same stuff out of
you.
>> Yep. And so, right there, it did this
confidence score. It's based on
problem clarity, which has a 20; goal
definition, 18; success criteria, where it
doesn't really know what I'm asking for,
so that's the lowest one; scope
boundaries; and then consistency. So
those can add up to 100, and I got a
score of 90 out of 100. So
it's not going to just be like, "Okay, I
know what you want." It's going to ask me
some more things. Like, oh, we'll do
that and we'll just have minimal.
I'll say zero config. I just want it to
go fast.
And so now I'm at 96 out of 100. So it
understands what I want, and now it's
going to write a contract for me to
read. I read and review the contract, and
then it's going to build from there
these phases that I can execute, or these
specs that I can execute in phases,
and then go from there, so that I can
clear the context for each one and have
a fresh context going.
>> Yeah. The way I would say that is:
is the math airtight? No. Does it
matter? No. Because the value is in the
iterative loop of clarifying
your own thinking by
responding.
>> Yeah. Oops.
And so there's the contract that
it's loading.
And it tells you what the success
criteria mean, the scope boundaries, what's
in scope, what's specifically out of
scope, any future considerations, how we
plan to execute it. This is an easy one,
just a single phase. And so it's
going to just create that spec for me,
which it did here. And then I could run
this and go. And so it was all gated
on that confidence score.
Cool.
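As a rough illustration of the gate described above, here is a minimal Python sketch. It is not the ideation skill's actual logic. The five dimension names come from the demo, but the 0-20 range per dimension, the cutoff of 95, and every per-dimension number except the 20 and 18 mentioned on stage are assumptions made for the example.

```python
from dataclasses import dataclass

# Dimension names as read out in the demo.
DIMENSIONS = (
    "problem_clarity",
    "goal_definition",
    "success_criteria",
    "scope_boundaries",
    "consistency",
)
MAX_PER_DIMENSION = 20  # assumption: five dimensions x 20 points = 100
THRESHOLD = 95          # hypothetical cutoff; the demo only shows 90 -> ask, 96 -> go


@dataclass
class GateResult:
    total: int
    proceed: bool
    weakest: str  # the dimension to ask the next clarifying question about


def confidence_gate(scores: dict[str, int]) -> GateResult:
    """Sum the per-dimension scores and decide whether to keep asking."""
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_DIMENSION}")
    total = sum(scores[name] for name in DIMENSIONS)
    weakest = min(DIMENSIONS, key=lambda name: scores[name])
    return GateResult(total=total, proceed=total >= THRESHOLD, weakest=weakest)


# First pass: 90/100, so the skill keeps asking (numbers partly invented).
first_pass = confidence_gate({
    "problem_clarity": 20,
    "goal_definition": 18,
    "success_criteria": 15,
    "scope_boundaries": 19,
    "consistency": 18,
})

# After one more answer: 96/100, so it moves on to writing the contract.
second_pass = confidence_gate({
    "problem_clarity": 20,
    "goal_definition": 19,
    "success_criteria": 19,
    "scope_boundaries": 19,
    "consistency": 19,
})
```

As said in the demo, the exact numbers matter less than the loop: the weakest dimension tells the agent what to ask about next.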
>> All right. Um,
you want to jump into
>> Yeah, let's do it.
>> All right. We're going to skip
ahead into the next section, and we'll
have one more thing and we'll
do some sharing after that one. So,
kind of moving beyond the editor. We
thought about this and
we were like, does that title make sense,
"skills beyond the editor"? Because
we're not really in an editor. But
for us, we kind of are. Like,
I don't open any of them nearly as
much as I used to. So I've lost my
identity a little bit. But yeah,
these skills, they really do work in a
lot of different places. Another
thing that you can do is
level up your skills in a number of
different ways. So for this
repo roast, for example, you could have,
"Oh, I want to know who the bus
factor people are." So use git
shortlog to understand who's committing
the most, who's committing the most in
specific sections of the codebase,
and, you know, list out what the bus
factor is and how vulnerable we are
to that. Commit crimes: this would be
people who just have bad commit
messages. It's so easy. You just tell
Claude, "Commit it and go." Zombie
branches: you could, you know, list
out all of the branches that never went
anywhere or that are still hanging
around. Who is committing at 3:00
a.m.? Who's up the latest
working and making us all look bad? And
then this one is definitely something
that you should add, and
that's "Is my README
real?" Does it explain or describe real
things?
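The bus factor idea above could be backed by a small helper script that the skill calls. The sketch below is one possible approach, not the workshop repo's code. It assumes "bus factor" means the smallest number of authors who together account for more than half of the commits, which is a common approximation. The function names are hypothetical; `git shortlog -sn` (summary, sorted by commit count) is the real command.

```python
import subprocess


def shortlog(path: str = ".") -> str:
    """Run `git shortlog -sn HEAD -- <path>` and return its raw output."""
    result = subprocess.run(
        ["git", "shortlog", "-sn", "HEAD", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_shortlog(output: str) -> list[tuple[str, int]]:
    """Turn lines like '   120\\tAlice' into (author, commits), busiest first."""
    authors = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        count, name = line.split(None, 1)
        authors.append((name, int(count)))
    return sorted(authors, key=lambda pair: pair[1], reverse=True)


def bus_factor(authors: list[tuple[str, int]], share: float = 0.5) -> int:
    """Smallest number of top authors covering more than `share` of commits."""
    total = sum(count for _, count in authors)
    if total == 0:
        return 0
    covered = 0
    for rank, (_, count) in enumerate(authors, start=1):
        covered += count
        if covered / total > share:
            return rank
    return len(authors)


# Example with made-up shortlog output rather than a real repository.
sample = "   120\tAlice\n    30\tBob Smith\n    10\tCarol\n"
print(bus_factor(parse_shortlog(sample)))
```

Passing a subdirectory to `shortlog()` gives the per-section view mentioned in the talk, so the skill can report which parts of the codebase depend on a single person.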
Yeah, and so again, the reason that
this is so powerful is that it's no
longer specific to any foundation
model provider, right? You can define
these skills and then you can use them
locally in Claude Code, but you can share
them with your team, as we talked about,
with, you know, a git-based
plugin architecture. But now you can
also put them in Claude Desktop and web.
As we talked about with the recruiting
team, folks that identify as completely
non-technical are loading specific
skills and running them in their own
sessions
>> and sharing them
>> and sharing them, right? And then now
we're finding agent
harnesses becoming more and more
relevant, and so things like Pi,
which is what OpenClaw runs under the
hood, you can load them there as well.
So the value is really in
defining the discrete work block and
then figuring out exactly which tweaks
make it the most effective description
of getting that work done, and then, you
know, sharing it with your friends and
putting it on different boxes without
having to do much more than authoring
some markdown and possibly some scripts.
>> Yep. And skills: if you took a skill
folder, like you took repo roast with that
SKILL.md and any scripts or references
and all that, and you took that folder,
repo roast, and you zipped it, you'd get
a .zip file back, right? Rename that
from .zip to .skill, and now a non-technical
teammate can drag that into Claude
Desktop and use that skill. And that's
just how they're shipped. That's a
really easy way to share them. Not a
really easy way to version them.
There's still pain around, like,
how do you handle sensitive, you know,
credentials in that case? Like, you don't
do it that way. Please don't put them in
the zip file. But, you know, it's evolving,
so
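The zip-and-rename step can be scripted. The sketch below uses only the Python standard library and is an illustration, not an official packaging tool. The `package_skill` name and the `dist` output folder are hypothetical, and keeping the skill folder as the top-level entry in the archive is an assumption based on "you took that folder and you zipped it".

```python
import zipfile
from pathlib import Path


def package_skill(skill_dir: Path, out_dir: Path) -> Path:
    """Zip a skill folder and give the archive a .skill extension."""
    skill_dir = Path(skill_dir).resolve()
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md")

    # Collect files before creating the archive so it never includes itself.
    files = sorted(p for p in skill_dir.rglob("*") if p.is_file())

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{skill_dir.name}.skill"

    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to the parent, e.g. repo-roast/SKILL.md.
            zf.write(path, path.relative_to(skill_dir.parent).as_posix())
    return archive


if __name__ == "__main__":
    # Hypothetical layout: ./repo-roast/SKILL.md plus scripts and references.
    print(package_skill(Path("repo-roast"), Path("dist")))
```

Everything in the folder ends up in the archive, which is exactly why credentials should never live there.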
>> But you can also use those marketplaces.
The Claude marketplace works in
Claude Desktop as well, so that's an easy
way to share skills if they are
applicable to non-coding workflows.
>> For sure.
>> And so, we've talked
about this, but one of the things
that I really wanted to drive home is,
with the WorkOS CLI, it's
a generic CLI that you can use
to run WorkOS commands, but
its flagship feature is this
ability to just run install. So if you
have a project that doesn't have auth in it,
or you have other auth in there
that's not WorkOS, you can just run
workos install in there and it's going
to politely remove the other auth that
you might have in there and then add
it in based on what you are using. Like, if
you're using Next.js or TanStack Start or
whatever, it's going to figure that out
and load that in there for you. And the
CLI is using the Claude Agent SDK, which
is like a programmatic Claude
Code that I can ship
in the CLI. And the smarts of that,
all of the brains, are actually skills
that are in the WorkOS skills
directory. So, it knows all about that.
And the reason we did that is so that we
get, you know, two birds
with one stone. We build the
skill and we make it good, and then we
prove that it's good by having the
CLI run it. And the beauty of the CLI is
it's an easy command. You just do
npx workos install. And we're
proxying all of the commands to
Claude so that it hits our API token.
And so it's an easy way to just
say, here's a zero-friction way to get
set up with it. It'll even create
a WorkOS account for you and you can go
back later and claim it. So it's like 5
minutes and you're set up. And
all of that is entirely skills-driven.
>> Yeah. Another place we're seeing
high leverage with this is, imagine blog
writing. Lots of folks on the
team, as it's growing, want to write
blog posts in a uniform way, but they
don't know exactly how our CMS works, or
exactly the tone or format and the
conventions that we use. And that's the
type of thing that you used to put in a
Notion doc and then hope that you could
inject it in someone's Slack and
force them to read it before they write
something. It's just easier to define
that as a skill so that they can
interact with it and then get to 80% of
that artifact without having to consult
somebody else, essentially.
>> Code review, image generation. With
image generation 2 you can also put
additional parameters there to get
specific styles as well. CI pipelines.
And as I mentioned earlier in the talk,
once Nick had published the
public repo of WorkOS
skills, the RAG pipeline was able to
just start loading them all as agentic
tool calls, and performance on all those
queries just jumped over, you know,
flatly chunking all documents and
putting them in a vector database, for
example.
>> And you saw the giant lobster outside
when you came in, right? That
can all be skills-based as well. So
skills are just this uniform way that
transcends Claude Code or
Codex, and it's something that you
can load anywhere at any technical skill
level.
>> Yep.
>> So it's really easy. We talked about
evals: measuring this stuff
matters. With the skills,
specifically with the Next.js installer
skill, I actually found out through my
evals that I was making things worse,
because I was overly prescribing what to
do with Next.js, and Claude Code was just
inherently good at working with Next.js,
and I was making it worse by being
too dogmatic about what I wanted it to
do. It led to a 30% drop, I think,
in overall accuracy, based on these
numbers I made up. But I was able to
use that. And I kind of think of
evals in a lot of ways like my Apple
Watch. It tells me my heart rate
and, you know, how many calories I'm
burning throughout the day. Is it
accurate? No, of course not. But it
gives me a general baseline of, like,
"Ah, I am more active today than I was
yesterday." And I can kind of use that
to gauge where I go forward. Is it
accurate enough that I'd base my life
on it? No. But it's a general
vector that I can look at and see
whether I'm improving or making
things worse.
>> Yep.
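The before-and-after comparison described above can be reduced to a very small calculation. The sketch below is not the eval framework mentioned in the talk. It assumes each eval case yields a simple pass/fail, and the 5-point noise band and the sample data are invented for illustration.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases that passed (0.0 for an empty run)."""
    return sum(results) / len(results) if results else 0.0


def skill_delta(baseline: list[bool], with_skill: list[bool]) -> float:
    """Change in pass rate when the skill is loaded; negative means worse."""
    return pass_rate(with_skill) - pass_rate(baseline)


def verdict(delta: float, noise: float = 0.05) -> str:
    """Treat small changes as noise; the 5-point band is an assumption."""
    if delta > noise:
        return "helps"
    if delta < -noise:
        return "hurts"
    return "no clear change"


# Made-up runs: 8/10 passing without the skill, 5/10 with it.
baseline_run = [True] * 8 + [False] * 2
skill_run = [True] * 5 + [False] * 5
delta = skill_delta(baseline_run, skill_run)
print(f"{delta:+.0%} -> {verdict(delta)}")
```

In the spirit of the Apple Watch comparison, the useful signal is the direction of the change between runs, not the precise number.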
>> So, some skills in the wild. Zack,
you've made a couple of skills
that are specifically not
code related, right? But
they're pretty impressive.
>> Yeah. So this is one I was talking about
earlier, just to show what I am the
most excited about: taking what
seems like an incredibly complex
workflow and then just making it
available as a skill. So
this is an example where I have a Slack
avatar that I had generated for
me months ago, and I just handed it
to this animation skill and I said,
"Animate this in the most obvious way
possible." We'll see if that's actually
obvious.
So, taking a giant ball of energy and
grimacing at it, as one does. But the
point is that was a single text prompt
of, like, "Make this person look like
they're in Fallout holding a ball of
energy," and then animating it. This one,
same exact skill. So, same Markdown
file and two Python scripts.
You know, the prompt was "child
running through a field."
And there's also sound with this because
it's hitting the Veo API. So again,
at first, you know, Claude reads
the markdown skill, says, "Okay, I
understand what this is. It's a
sequence of two API calls I'm going to
make." The first API call is the user's
prompt to make the static image. The
second API call is the output of that,
the static image, and then a new prompt
that I write saying, "Animate this in the
most obvious way possible," hitting
the Veo API with that, and then
getting back an animation. But, you
know, again, that's like 30 seconds
of generation time. And so I used this
exact same workflow to do all of
the interstitial scenes in a film
recently.
And another example, I mentioned
this earlier, but the Remotion skill.
I'm terrible at video
editing. I don't know anything about it.
But when I was working on the
WorkOS CLI, I thought, oh, it'd be kind of
cool to make a fun video that I
could use on Twitter to demonstrate
it or talk about it. And so
somebody was mentioning Remotion, and I
just asked it to make this, and it put it
together pretty much like this. I
asked it to use our actual logo
rather than some made-up one. But it
even understood the output of
the CLI and put all of this together
into a demoable video that showcases
what it can do. And I didn't have to do
that at all. And I looked super
impressive without knowing anything
about video. Also, the skill, when
I said "do this," loaded up
localhost:3000 in my browser, which was a
full Remotion video editor. And so I
could see it playing on a loop in there,
and it was doing things, and I'm
like, "Oh, you didn't use our actual
logo. Go use that." And I just told
Claude to do that and it just updated
in real time. It was so
cool.
>> Yeah. So imagine hooking this into
your GitHub CI/CD flow, and then at the
end of a big project, or every time a
milestone gets merged, you auto-update,
you know, whatever document and then
even include a demo. It can start
to get pretty powerful if you
orchestrate skills that are well
defined.
>> Is this the skill?
>> This is exactly how that one
works under the hood. So you can imagine
the one that I showed you that had
the two YouTube videos. So if it's
called animated image, the first step is
going to be: generate a minimalist
static image, and then take that image
and animate it via Veo. And there are
just two scripts. There's one to
generate an image here and then there's
one to generate the video. But the
skill file itself is like 30 lines of
markdown.
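The two-step flow described here (user prompt to static image, then static image plus a fixed prompt to video) can be sketched as a small pipeline. This is not the skill's actual code. The two API calls are passed in as plain functions, so `generate_image` and `animate` are hypothetical stand-ins for the real image script and the Veo script, and the fake versions below exist only so the example runs.

```python
from typing import Callable

# The fixed second-stage prompt quoted in the talk.
ANIMATE_PROMPT = "Animate this in the most obvious way possible."


def animated_image(
    user_prompt: str,
    generate_image: Callable[[str], bytes],
    animate: Callable[[bytes, str], bytes],
) -> bytes:
    """Chain the two calls: the first call's output feeds the second."""
    image = generate_image(user_prompt)    # step 1: prompt -> static image
    return animate(image, ANIMATE_PROMPT)  # step 2: image + prompt -> video


# Stand-ins for the two scripts, so the flow can be exercised offline.
def fake_generate_image(prompt: str) -> bytes:
    return f"IMAGE[{prompt}]".encode()


def fake_animate(image: bytes, prompt: str) -> bytes:
    return b"VIDEO[" + image + b"|" + prompt.encode() + b"]"


video = animated_image(
    "child running through a field", fake_generate_image, fake_animate
)
```

Keeping the API calls in scripts and the sequencing in roughly 30 lines of markdown is what lets the same skill handle very different prompts.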
>> Yep. And that Nano Banana one
that I ran earlier, that was just
coming up with a creative enough
prompt: taking the idea that I had,
fleshing out the prompt, and then it
passed it to a TypeScript file that
called the Nano Banana API and got the
image back. So that skill is
basically a simple LLM wrapper
around a TypeScript
script that uses their API to go do
that.
>> So it's also just broadly
applicable to workflows. It's not just a
dev thing, right? You can imagine if
sales has a very specific way they have
to reach out to people, or there's always
a type of report that you're
generating for customers or prospects or
whatever. All of this is
excellent for use with skills.
>> So did anyone have a
repo roast skill that they want to share?
>> Yeah.
Okay.
>> Yeah. The Amy and Wolf from Raven Wolf
skill. Try that. You have to see the
results.
Which one is it?
>> Number two, or I uploaded another one.
Number six is the newest one. Use the
>> newest.
>> I'll do number six.
All right. So, while that's running Oh,
no. We ran the wrong one.
There we go.
While that's running, uh, let's go look
at it.
Oh, nice.
>> Okay.
>> Ruthless honesty. I love it.
>> Brutally honest
with a heart of gold.
>> Awesome. I love the context. Lots of
>> thick files. Very nice. Yeah. Excellent.
>> Constraints.
And here, the audience detection. You
told it to load audience.md. Here's
that progressive disclosure about that.
This also just helps to keep your
markdown files manageable.
>> This is a 10 out of 10 skill.
>> Yeah,
>> very nice.
>> This is awesome.
>> All right, let's see if it gave us
anything.
>> So, it's it's grading the the workshop
itself. Um,
>> six out of 10.
>> Brutal.
>> I feel that.
>> I thought we had
something going, but okay. Hopefully
you give us a little bit more of a
grade than that.
That's awesome.
>> Some critical
>> suite isn't on fire because it doesn't
exist. That's great.
>> 1,200 lines of monolith. Yeah. Yeah.
>> Git identity crisis: Zack is two people.
That's how it feels, too.
>> Hardcoded secret. That's okay. It's not
really a secret.
That's awesome.
>> Love it.
>> Super cool. All right, we've got three
minutes left. Are there any questions, or
does anyone else want to share a skill?
So, this is a skill that you can use,
but more importantly, it's techniques
that you can take and use to build your
skills and build them up in different
ways. There are a lot more advanced topics
that we can go into as well. We
mentioned subagents, for
example. Subagents are a great way to
extend those skills without bloating the
context, having them do one-off
things and then exit.
And to take this to the next
level, I really recommend having
Claude's own skill creator skill
installed, because you can just say,
"Hey, I have this skill. Is it any
good?" And it'll give you pointers. Or
you can say, "Run some evals on it," and
it'll run a full eval test suite on
it and tell you, yeah, it's good, or
no, it's bad. And you can go
from there.
>> And then, like Zack was saying,
reflect on the transcripts, reflect on
how you're actually using the skills,
and you can use that as insight to see
how to improve the skills and the
execution of those skills. For
example, if somebody kicks off a skill
and it's always asking questions about,
you know, a specific thing, maybe
that's something that you can provide
ahead of time. Or if you see it
going and doing 10 tool
calls, maybe you could condense it
down to one or two tool calls and
pre-provide that information so it
doesn't have to do that each time.
>> Yep. Plus one to the recommendation on the
internal skill creator. And the other
thing it kind of suggests
to you to do over time is to
think about the way you manage your
context, even stuff that you used to
think of as disposable, right? Mhm.
>> So in the pre-LLM era, we might have
dev'd real hard at the keyboard all week,
as I used to do, and then finally on
Saturday wiped it all away so I could
get time for myself. Now all of that
context is gold. The conversation,
especially what failed, especially what
didn't go well, especially what was
frustrating, because now all of that is
very rich context for a skill creator
or refiner to mine and then build you a
bespoke tool that's going to solve that
problem smoothly next week.
>> Yeah. So,
>> And you can also think of skills
like this: you could use that progressive
disclosure to disclose things to
different audiences. So, for example,
depending on who's running the
skill, you could say, "Run git config
user.email and figure out who
the user is." Or you could do things
like, "Oh, how many commits does this
user have in here?" They have 10,000
commits in here? Okay, we can really
roast them. But this other person
who has four commits, they're probably a
new hire. Maybe go a little gentler on
them. Don't scare them away from this
project that they just joined.
>> Me too.
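The audience-detection idea above can be sketched as a helper that a skill might call before deciding how hard to roast. This is an illustration under stated assumptions, not code from the workshop repo. The thresholds (under 10 commits is a newcomer, 1,000 or more is a veteran) and the tone labels are invented. `git config user.email` and `git rev-list --count` are real commands.

```python
import subprocess


def git_output(*args: str) -> str:
    """Run a git command in the current repo and return trimmed stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def commit_count_for_current_user() -> int:
    """Count commits on HEAD authored by the configured user.email."""
    email = git_output("config", "user.email")
    count = git_output(
        "rev-list", "--count", "--fixed-strings", f"--author={email}", "HEAD"
    )
    return int(count)


def roast_intensity(
    commit_count: int, newcomer_below: int = 10, veteran_at: int = 1000
) -> str:
    """Map commit history to a tone; the thresholds are hypothetical."""
    if commit_count < newcomer_below:
        return "gentle"    # probably a new hire, go easy
    if commit_count >= veteran_at:
        return "full"      # long-time contributor, roast away
    return "standard"


if __name__ == "__main__":
    print(roast_intensity(commit_count_for_current_user()))
```

The skill's markdown could then load a different reference file per tone, which is the progressive disclosure pattern applied to audiences.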
>> Question. Yes.
>> Very quick. Zack, could you take me
through again what you were saying about the
context switching? You'd somehow hooked
up Claude with Slack and Linear. So it
sounded almost like
>> it's constantly being able to read what
Slack's doing.
>> Absolutely. I have it
>> Is it called Cowork? We use Cursor, so I
don't know if we have the same.
>> Gotcha. Yeah, I'm using Claude Code now.
It's possible to do it in Claude Code and
Claude Desktop, but essentially I
just have the connector for Slack. So I
just had to do
OAuth with Slack and then it can read my
Slack messages. You can now run the loop
command, at least in Claude Code, to have
it do that every 15 minutes if you
want. And then you say in the prompt, "If
there is not already a correlative
Linear ticket, make a new one for me. If
there is one and there are additional asks
on this, you know, request, update Linear."
And then, by the way, you have a second
terminal tab that's looping against your
Linear state. Kathleen works at WorkOS.
Earmuffs, Kathleen. I'm really working
really hard. You have a second one
that's looping and looking at your
Linear tasks and then doing work for
you, essentially. But the main point
was just that, yeah, sorry, time. The
main point was just: automate those
loops. So that's our time. Thanks so
much, guys. Thanks for being an
awesome audience. Thanks for all the
great questions. Really appreciate it.
>> Thank you.