Design like Karpathy is watching — Zeke Sikelianos, Replicate
Channel: aiDotEngineer
Published at: 2025-07-19
YouTube video id: huQPkrwVWwc
Source: https://www.youtube.com/watch?v=huQPkrwVWwc
How many of you know who Andrej Karpathy is? Raise your hand. Okay, maybe half of you. Raise your hand if you are not Andrej Karpathy. Just trying to gauge audience participation here. Okay, so I got 80% there, or something like that. Got a lot of Andrejs in the room right now. Raise your hand if you work at Replicate. All right, so if you want to talk to any Replicate folks, there's your group right there.

For those who don't know who Andrej Karpathy is, I'll jump into that and explain. There's a GitHub repo that corresponds to these slides, so if you want to grab that (I'll put this slide up at the end too), you can track down any URLs or anything I mention in the talk. My name is Zeke. I'm @zeke on GitHub, @zeke on X as well, and I work for Replicate. Replicate is a cloud platform that lets you run AI models with an API. We have open-source models, like all the great Flux models from Black Forest Labs, but we also have proprietary models from Anthropic, OpenAI, Google, et cetera. And of course you can also run your own custom public and private models on Replicate.

So let's get to the point. Who is Andrej Karpathy? He's an AI researcher who has worked at all these big companies and organizations: Google, OpenAI, Tesla, OpenAI again, and now Eureka Labs, his new educational platform. But most importantly to me, he is a YouTube educator who gives some really amazing, highly accessible talks explaining how AI and machine learning work for general audiences. He coined the term "vibe coding" a few months ago, and of course that's taken the world by storm; we're all really interested in it now. And he subscribes to the idea that the hottest new programming language is English. Kind of a hot take.
He also wrote something called the Software 2.0 manifesto, which is now seven years old, kind of an eternity in machine learning time, basically predicting this world in which machine learning models would write code for us, and would be better at it than humans. And of course, here we are.

So today I want to talk about MenuGen. MenuGen is an app that Andrej created recently, I think at a hackathon, as a vibe-coding experiment. It's a web app where you take a photo of a restaurant menu that's all in text format, and it generates image representations of the contents of the menu for you. So if you don't know what the words mean, or English isn't your first language, or you just like to see tantalizing photos of food, it might be good for you. That was the idea behind it.

He was actually able to build this app, which he described as "an exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed real app." Many of you have probably experienced this: you're working on something locally, you have it running on your machine, cool, it really works, it's amazing, and then you try to deploy it to Vercel or Cloudflare or something like that, and that's where a lot of the pain begins. So we're going to talk about that.

Andrej wrote a blog post about the experience of creating MenuGen, saying, you know, I was able to make this thing, publish it, get it online, add payments for it, and it's a working, functioning app that people can pay for, and it was super fun to build. However, the post kind of rakes all these different companies over the coals because of the developer-experience challenges of working with them. For me it was cool because Replicate was mentioned among all these big hotshot companies like OpenAI and Vercel.
But we also all have work to do to improve our products and make them better. Here's a blurb about what he experienced when he started using the Replicate API: the LLM's knowledge of Replicate was outdated, the docs on Replicate were out of date, there were changes in the API, he experienced rate limiting, and it was harder than it should be to get started with a new, legitimate paid account. This is kind of embarrassing, but it's also an opportunity to fix our product, make it better, and really listen to the kind of voices that are loud and correct about the problems with our products.

So, what can Replicate do better? One thing is embracing llms.txt. llms.txt is a convention where you modify your website or your API or existing services to render text-based or markdown-based versions of your documentation, in a format that is friendly for language models to consume, more friendly than the HTML contents of a web page. As Andrej put it: tired, elaborate docs pages with fancy color palettes, branding, animations, transitions, and dark mode; wired, one single docs markdown file and a copy-to-clipboard button. It sounds simple, and maybe it's not the most glamorous thing, but it is actually the thing that your language models want to consume.

In response to this, we added a new feature on the Replicate website: when you're viewing any model page, you have a button to copy the contents of that page as markdown for a language model, or to send the page directly to Claude and have an interaction with the contents of the model page, to learn more about what the model can do. Similarly, we added support for linking to ChatGPT: you're on a model page, you jump into ChatGPT, and you start having a conversation about the model. It's a lot more interactive than just going to a web page, reading, and trying to find the most relevant content. Of course, we also just dump the markdown there, too.
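For reference, the llms.txt convention (from llmstxt.org) is just a markdown file at the root of your site: an H1 title, a one-line blockquote summary, and sections of links. A minimal sketch of what such a file could look like; these contents are illustrative, not Replicate's actual file:

```markdown
# Replicate

> Replicate is a cloud platform that lets you run AI models with an API.

## Docs

- [Getting started](https://replicate.com/docs): run your first model from the API
- [HTTP API reference](https://replicate.com/docs/reference/http): endpoints, request and response shapes
```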
So if you're using a tool like Cursor or Windsurf, grab this content, put it into your editor, and it knows how to run the model.

Next thing. This one isn't from the blog post; I'm grabbing some quotes from recent tweets from Andrej Karpathy: "LLMs don't like to click, they like to curl." Love it or hate it, curl is a tool that is here to stay. It's been around since, I don't know, the 90s maybe, it's installed on everyone's machine, and it's basically a standardized way to make API calls without any specialized tooling.

So let's look at a curl command. Maybe it looks ugly, right? There's a lot of syntax. It's not glamorous, but it covers everything that you, or a language model, need to know to make an API request. What is the HTTP method? What is the JSON payload? How do you send your credentials? What kind of response type do you want? Do you want to make a blocking request or an asynchronous request? What is the API endpoint? That's all covered in one little line of code. And this is exactly the kind of thing that LLMs want to consume. If you give this content to an LLM, it now knows how to make API requests to your service. So it's really powerful.

We have a tool called Cog, at cog.run, which is an open-source tool that you can use to package machine learning models in production-ready Docker containers. It creates a standardized API around your model, with standard inputs and outputs, using OpenAPI. So we took all of Cog's documentation and stuffed it into a single llms.txt file at cog.run. What you can do with that is drop it into your editor on an existing project. Let's say you've cloned some open-source Cog model and you're like, I don't even really know how this code works, but I want to change it.
You open up the model, you drop a reference to that llms.txt file, and your editor knows how to consume that content, bring it into context, and use it to write code. Pretty powerful stuff.

All right. So: the primary audience of your thing, your product, service, library, et cetera, is now an LLM, not a human. This might be a tough pill to swallow, but I think it's the world that we're in right now.

If you've been at this conference for a couple of days, you've probably heard everybody talking about MCP, right? It's such a big deal. But what even is it? How many of you actually feel like you really know what MCP is? Okay, I like the honesty here; there are like eight hands going up. So I'm going to explain this for you, hopefully.

OpenAPI is a thing where you write a JSON schema that defines the behavior of your HTTP API. It's basically just one giant JSON file that says: here are the paths, here are the endpoints, here are the query parameters, here's the payload for the body, here's how you run this thing, here's how you create a prediction, here's how you get your predictions, here's how you search, all that sort of stuff. One big file that describes the whole behavior of your API. We have that at Replicate, and when you go to our HTTP API page, all the content on that page is generated from the schema; we just have a template that renders it all out as a human-friendly representation of how to use our API. Here's an example where you can search for models.

So here's where the MCP part comes in. MCP is basically a way of taking an OpenAPI schema and stuffing it into a format where a language model knows what to do with it. We now have an MCP server for Replicate, which you can install very easily. You basically open up Claude, meaning Claude Desktop, for example, not the web app.
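For context, Claude Desktop reads MCP servers from a JSON config file in your settings (claude_desktop_config.json), and an entry looks something like this sketch. The command and server URL here are my assumptions, not necessarily Replicate's actual server; check Replicate's MCP docs for the real values:

```json
{
  "mcpServers": {
    "replicate": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.replicate.com/sse"],
      "env": { "REPLICATE_API_TOKEN": "r8_your_token_here" }
    }
  }
}
```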
Go into your developer settings, add that tiny bit of JSON, and all of a sudden Claude knows how to do everything that the Replicate API can do, and it has an API token. You didn't have to install any software; all you had to do was get a token from the Replicate website, and Claude takes care of installing the MCP server locally. And now you can have an interaction with Claude where it runs Replicate API requests for you.

There are a few factors here. You can use this for discovery: you don't know how to use a product yet and you want to know what it's capable of, or you want a language model to do searches for you, or you want to start scaffolding out the beginning of a project and you want your language model to help you with that. That's exactly what MCP is for. It's a way of connecting tools to your language model so that it can do all sorts of powerful things.

And I want to emphasize that at Replicate, all we really had to do to make this possible was invest in having an OpenAPI schema that was very well written and very well documented, and that covered everything our API is capable of doing. That made it straightforward to turn it into an MCP server that can connect with tools like Claude, GitHub Copilot, and Visual Studio Code. And I think OpenAI added MCP support to their Agents SDK earlier this week, too. So MCP is just going to be all over the map, and it's a way to really accommodate language models helping you do things.

So, this is sort of a note to self: the things that we got wrong for Andrej, and the things that we want to fix. Some of them we've already addressed, as I showed in this talk. Some of them we still need to get right. First, maybe kind of a no-brainer: accept payments.
Okay, so Andrej went on the website, he signed up for an API key, he entered his credit card info in Replicate, basically a legitimate user, and then he started hammering Replicate with API requests to generate images of French toast and whatever. For whatever reason, the way he was doing it, he was making a ton of API requests, and he triggered some kind of abuse mechanism on our website that said: well, this user has only existed for one hour and they've already sent us a thousand requests, something must be wrong, so we blocked him. And this isn't something you want to do, right? You want to let your power users come to your product and dive right in. They know what they're doing, they know what they want; don't get in their way. Luckily, our CEO saw Andrej's blog post and immediately contacted him and unblocked his account. But not everyone has the power of being able to write a blog post and have everybody in the world see it and know about it. So the lesson here for us is that Replicate should accept payments for credit. If I go on a website, I should be able to say, "Here's 500 bucks. Let me go nuts, do whatever I want, and don't ban me." We're working on that. We're going to fix it.

Next: document your stuff. Literally, when you ship features on your product, don't just merge the pull request and walk away. It's not done until it's documented, the world knows about it, and an LLM can consume the content and put it to use. So always document everything, especially now that LLMs are in charge. We're still in charge, but you know what I mean.

Okay, so: feed the machines. That's basically just a matter of producing content in forms that language models can understand and consume more easily than traditional HTML web pages.

Use boring technology. This means, if a technology has been around for a long time, like SQL, whose statements have been around since, I don't know, longer than some of us have been alive,
that means the language models know how to write it, because they've encountered so much of it. So when you're building products, keep in mind that your language models are going to have a better chance of writing and using your software if it's built on well-established technology that doesn't change a lot.

And lastly, practice good API hygiene. This means that when you're writing your HTTP service and designing what the JSON response should look like, keep in mind that it's probably going into the context window of a language model, and that window has limitations. So instead of dumping a JSON payload response that has everything about all the models under the sun, consider making it a smaller, slimmed-down, information-dense version of what an LLM wants to see.

That's all I got. Thank you. It looks like I've got two minutes if anybody has questions. Maybe no questions. Okay, I answered everything. Here we go.

Yeah, the question is: what are some recommendations for generating docs? The first thing to do is just start by generating your own OpenAPI schema: write schemas in YAML or JSON that describe the behavior of your API. There's a ton of tools out there. There's Docusaurus, there's Read the Docs, there's ReadMe (readme.com). There's a whole bunch of these services that know how to take an OpenAPI schema and turn it into not only documentation, but also SDKs, clients in different programming languages, all that stuff.

Next question: are we thinking about discovery as LLMs start to make purchasing decisions? I think the key to that is making sure that our API has really good search capabilities, and that a lot of the information users need to make informed decisions is actually available via API.
So for example, with Replicate models right now, pricing is something that you have to go to the web page to look at, either on the pricing page or on the individual model pages. If we exposed pricing as a JSON structure that a public user can consume via the API, then it becomes a lot easier to do something like jump into a session with Claude and say: look, I'm evaluating all the video models, Imagen or, you know, Veo and Kling and Minimax and all the other things that are out there. Show me a comparison of which models are the most expensive, which ones are the fastest, which ones can produce the highest-quality output, et cetera. And if the language model has access to the structured data to answer those questions, then it's going to be a lot easier to make those decisions.

All right, thanks, y'all.
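As a sketch of both of those closing points, API hygiene and structured pricing, here is what slimming a verbose model record down to an information-dense, comparison-ready shape could look like. Every field name and price below is made up for illustration; this is not Replicate's actual schema:

```python
# Hypothetical, verbose API records of the kind an endpoint might dump today.
# All field names and numbers here are invented for illustration.
RAW_MODELS = [
    {"name": "acme/video-fast", "description": "Fast video generation model",
     "price_per_second_usd": 0.05, "cover_image_url": "https://example.com/a.png",
     "github_url": "https://example.com/repo-a", "run_count": 120000},
    {"name": "acme/video-hq", "description": "High-quality video generation model",
     "price_per_second_usd": 0.20, "cover_image_url": "https://example.com/b.png",
     "github_url": "https://example.com/repo-b", "run_count": 45000},
]

def slim(record: dict) -> dict:
    """Keep only the dense fields an LLM needs to compare models."""
    keep = ("name", "description", "price_per_second_usd")
    return {k: record[k] for k in keep}

def cheapest(records: list[dict]) -> dict:
    """Once pricing is structured data, comparison becomes trivial."""
    return min((slim(r) for r in records), key=lambda r: r["price_per_second_usd"])

print(cheapest(RAW_MODELS)["name"])  # acme/video-fast
```

The slimmed records are what you would hand to a language model's context window, instead of the full payloads with URLs and other fields it doesn't need for the comparison.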