Design like Karpathy is watching — Zeke Sikelianos, Replicate
Channel: aiDotEngineer
Published at: 2025-07-19
YouTube video id: huQPkrwVWwc
Source: https://www.youtube.com/watch?v=huQPkrwVWwc
How many of you know who Andrej Karpathy is? Raise your hand. Okay, maybe half of you. Raise your hand if you are not Andrej Karpathy. Just trying to gauge audience participation here. Okay, so I got 80% there, or something like that. Got a lot of Andrejs in the room right now. Raise your hand if you work at Replicate. All right, so if you want to talk to any Replicate folks, there's your group right there.

For those who don't know who Andrej Karpathy is, I'll jump into that and explain. There's a GitHub repo that corresponds to these slides, so if you want to grab that (I'll put this slide up at the end too), you can track down any URLs or anything I mention in the talk. My name is Zeke. I'm @zeke on GitHub, @zeke on X as well, and I work for Replicate. Replicate is a cloud platform that lets you run AI models with an API. We have open-source models, like all the great Flux models from Black Forest Labs, but we also have proprietary models from Anthropic, OpenAI, Google, et cetera. And of course you can also run your own custom public and private models on Replicate.

So let's get to the point. Who is Andrej Karpathy? He's an AI researcher who has worked at all these big companies and organizations: Google, OpenAI, Tesla, OpenAI again, and now Eureka Labs, his new educational platform. But most importantly to me, he is a YouTube educator who gives some really amazing, highly accessible talks explaining how AI and machine learning work for general audiences. He coined the term "vibe coding" a few months ago, and of course that's taken the world by storm; we're all really interested in it now. And he subscribes to the idea that the hottest new programming language is English. Kind of a hot take.
He also wrote something called the Software 2.0 manifesto, which is now seven years old, kind of an eternity in machine learning time, basically predicting this world in which machine learning models would write code for us, and would be better at it than humans. And of course, here we are.

So today I want to talk about MenuGen. MenuGen is an app that Andrej created recently, I think at a hackathon, as a vibe-coding experiment. It's a web app where you take a photo of a restaurant menu that's all in text format, and it generates image representations of the contents of the menu for you. So if you don't know what the words mean, or English isn't your first language, or you just like to see tantalizing photos of food, it might be good for you. That was the idea behind it.

He was actually able to build this app, which he described as "an exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed real app." Many of you have probably experienced this: you're working on something locally, you have it running on your machine, cool, it really works, it's amazing, and then you try to deploy it to Vercel or Cloudflare or something like that, and that's where a lot of the pain begins. So we're going to talk about that.

Andrej wrote a blog post about the experience of creating MenuGen, saying, you know, I was able to make this thing, publish it, get it online, add payments for it, and it's a working, functioning app that people can pay for, and it was super fun to build. However, the post kind of rakes all these different companies over the coals because of the developer-experience challenges of working with them. For me it was cool because Replicate was mentioned among all these big hotshot companies like OpenAI and Vercel.
But we also all have work to do to improve our products and make them better. Here's a blurb about what he experienced when he started using the Replicate API: the LLM's knowledge of Replicate was outdated, the docs on Replicate were out of date, there were changes in the API, he experienced rate limiting, and it was harder than it should be to get started with a new, legitimate paid account. This is kind of embarrassing, but it's also an opportunity to fix our product, make it better, and really listen to the kind of voices that are loud and correct about the problems with our products.

So, what can Replicate do better? One thing is embracing llms.txt. llms.txt is a convention where you modify your website or your API or existing services to render text-based or markdown-based versions of your documentation, in a format that is friendly for language models to consume, more friendly than the HTML contents of a web page. As Andrej put it: tired, elaborate docs pages with fancy color palettes, branding, animations, transitions, and dark mode; wired, one single docs markdown file and a copy-to-clipboard button. It sounds simple, and maybe it's not the most glamorous thing, but it is actually the thing that your language models want to consume.

In response to this, we added a new feature on the Replicate website: when you're viewing any model page, you have a button to copy the contents of that page as markdown for a language model, or to send the page directly to Claude and have an interaction with the contents of the model page, to learn more about what the model can do. Similarly, we added support for linking to ChatGPT: you're on a model page, you jump into ChatGPT, and you start having a conversation about the model. It's a lot more interactive than just going to a web page, reading, and trying to find the most relevant content. Of course, we also just dump the markdown there, too.
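For reference, the llms.txt convention (from llmstxt.org) is just a markdown file at the root of your site: an H1 title, a one-line blockquote summary, and sections of links. A minimal sketch of what such a file could look like; these contents are illustrative, not Replicate's actual file:

```markdown
# Replicate

> Replicate is a cloud platform that lets you run AI models with an API.

## Docs

- [Getting started](https://replicate.com/docs): run your first model from the API
- [HTTP API reference](https://replicate.com/docs/reference/http): endpoints, request and response shapes
```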
So if you're using a tool like Cursor or Windsurf, grab this content, put it into your editor, and it knows how to run the model.

Next thing. This one isn't from the blog post; I'm grabbing some quotes from recent tweets from Andrej Karpathy: "LLMs don't like to click, they like to curl." Love it or hate it, curl is a tool that is here to stay. It's been around since, I don't know, the 90s maybe, it's installed on everyone's machine, and it's basically a standardized way to make API calls without any specialized tooling.

So let's look at a curl command. Maybe it looks ugly, right? There's a lot of syntax. It's not glamorous, but it covers everything that you, or a language model, need to know to make an API request. What is the HTTP method? What is the JSON payload? How do you send your credentials? What kind of response type do you want? Do you want to make a blocking request or an asynchronous request? What is the API endpoint? That's all covered in one little line of code. And this is exactly the kind of thing that LLMs want to consume. If you give this content to an LLM, it now knows how to make API requests to your service. So it's really powerful.

We have a tool called Cog, at cog.run, which is an open-source tool that you can use to package machine learning models in production-ready Docker containers. It creates a standardized API around your model, with standard inputs and outputs, using OpenAPI. So we took all of Cog's documentation and stuffed it into a single llms.txt file at cog.run. What you can do with that is drop it into your editor on an existing project. Let's say you've cloned some open-source Cog model and you're like, I don't even really know how this code works, but I want to change it.
You open up the model, you drop a reference to that llms.txt file, and your editor knows how to consume that content, bring it into context, and use it to write code. Pretty powerful stuff.

All right. So: the primary audience of your thing, your product, service, library, et cetera, is now an LLM, not a human. This might be a tough pill to swallow, but I think it's the world that we're in right now.

If you've been at this conference for a couple of days, you've probably heard everybody talking about MCP, right? It's such a big deal. But what even is it? How many of you actually feel like you really know what MCP is? Okay, I like the honesty here; there are like eight hands going up. So I'm going to explain this for you, hopefully.

OpenAPI is a thing where you write a JSON schema that defines the behavior of your HTTP API. It's basically just one giant JSON file that says: here are the paths, here are the endpoints, here are the query parameters, here's the payload for the body, here's how you run this thing, here's how you create a prediction, here's how you get your predictions, here's how you search, all that sort of stuff. One big file that describes the whole behavior of your API. We have that at Replicate, and when you go to our HTTP API page, all the content on that page is generated from the schema; we just have a template that renders it all out as a human-friendly representation of how to use our API. Here's an example where you can search for models.

So here's where the MCP part comes in. MCP is basically a way of taking an OpenAPI schema and stuffing it into a format where a language model knows what to do with it. We now have an MCP server for Replicate, which you can install very easily. You basically open up Claude, meaning Claude Desktop, for example, not the web app.
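For context, Claude Desktop reads MCP servers from a JSON config file in your settings (claude_desktop_config.json), and an entry looks something like this sketch. The command and server URL here are my assumptions, not necessarily Replicate's actual server; check Replicate's MCP docs for the real values:

```json
{
  "mcpServers": {
    "replicate": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.replicate.com/sse"],
      "env": { "REPLICATE_API_TOKEN": "r8_your_token_here" }
    }
  }
}
```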
Go into your developer settings, add that tiny bit of JSON, and all of a sudden Claude knows how to do everything that the Replicate API can do, and it has an API token. You didn't have to install any software; all you had to do was get a token from the Replicate website, and Claude takes care of installing the MCP server locally. And now you can have an interaction with Claude where it runs Replicate API requests for you.

There are a few factors here. You can use this for discovery: you don't know how to use a product yet and you want to know what it's capable of, or you want a language model to do searches for you, or you want to start scaffolding out the beginning of a project and you want your language model to help you with that. That's exactly what MCP is for. It's a way of connecting tools to your language model so that it can do all sorts of powerful things.

And I want to emphasize that at Replicate, all we really had to do to make this possible was invest in having an OpenAPI schema that was very well written and very well documented, and that covered everything our API is capable of doing. That made it straightforward to turn it into an MCP server that can connect with tools like Claude, GitHub Copilot, and Visual Studio Code. And I think OpenAI added MCP support to their Agents SDK earlier this week, too. So MCP is just going to be all over the map, and it's a way to really accommodate language models helping you do things.

So, this is sort of a note to self: the things that we got wrong for Andrej, and the things that we want to fix. Some of them we've already addressed, as I showed in this talk. Some of them we still need to get right. First, maybe kind of a no-brainer: accept payments.
Okay, so Andrej went on the website, he signed up for an API key, he entered his credit card info in Replicate, basically a legitimate user, and then he started hammering Replicate with API requests to generate images of French toast and whatever. For whatever reason, the way he was doing it, he was making a ton of API requests, and he triggered some kind of abuse mechanism on our website that said: well, this user has only existed for one hour and they've already sent us a thousand requests, something must be wrong, so we blocked him. And this isn't something you want to do, right? You want to let your power users come to your product and dive right in. They know what they're doing, they know what they want; don't get in their way. Luckily, our CEO saw Andrej's blog post and immediately contacted him and unblocked his account. But not everyone has the power of being able to write a blog post and have everybody in the world see it and know about it. So the lesson here for us is that Replicate should accept payments for credit. If I go on a website, I should be able to say, "Here's 500 bucks. Let me go nuts, do whatever I want, and don't ban me." We're working on that. We're going to fix it.

Next: document your stuff. Literally, when you ship features on your product, don't just merge the pull request and walk away. It's not done until it's documented, the world knows about it, and an LLM can consume the content and put it to use. So always document everything, especially now that LLMs are in charge. We're still in charge, but you know what I mean.

Okay, so: feed the machines. That's basically just a matter of producing content in forms that language models can understand and consume more easily than traditional HTML web pages.

Use boring technology. This means, if a technology has been around for a long time, like SQL, whose statements have been around since, I don't know, longer than some of us have been alive,
that means the language models know how to write it, because they've encountered so much of it. So when you're building products, keep in mind that your language models are going to have a better chance of writing and using your software if it's built on well-established technology that doesn't change a lot.

And lastly, practice good API hygiene. This means that when you're writing your HTTP service and designing what the JSON response should look like, keep in mind that it's probably going into the context window of a language model, and that window has limitations. So instead of dumping a JSON payload response that has everything about all the models under the sun, consider making it a smaller, slimmed-down, information-dense version of what an LLM wants to see.

That's all I got. Thank you. It looks like I've got two minutes if anybody has questions. Maybe no questions. Okay, I answered everything. Here we go.

Yeah, the question is: what are some recommendations for generating docs? The first thing to do is just start by generating your own OpenAPI schema: write schemas in YAML or JSON that describe the behavior of your API. There's a ton of tools out there. There's Docusaurus, there's Read the Docs, there's ReadMe (readme.com). There's a whole bunch of these services that know how to take an OpenAPI schema and turn it into not only documentation, but also SDKs, clients in different programming languages, all that stuff.

Next question: are we thinking about discovery as LLMs start to make purchasing decisions? I think the key to that is making sure that our API has really good search capabilities, and that a lot of the information users need to make informed decisions is actually available via API.
So for example, with Replicate models right now, pricing is something that you have to go to the web page to look at, either on the pricing page or on the individual model pages. If we exposed pricing as a JSON structure that a public user can consume via the API, then it becomes a lot easier to do something like jump into a session with Claude and say: look, I'm evaluating all the video models, Imagen or, you know, Veo and Kling and Minimax and all the other things that are out there. Show me a comparison of which models are the most expensive, which ones are the fastest, which ones can produce the highest-quality output, et cetera. And if the language model has access to the structured data to answer those questions, then it's going to be a lot easier to make those decisions.

All right, thanks, y'all.
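As a sketch of both of those closing points, API hygiene and structured pricing, here is what slimming a verbose model record down to an information-dense, comparison-ready shape could look like. Every field name and price below is made up for illustration; this is not Replicate's actual schema:

```python
# Hypothetical, verbose API records of the kind an endpoint might dump today.
# All field names and numbers here are invented for illustration.
RAW_MODELS = [
    {"name": "acme/video-fast", "description": "Fast video generation model",
     "price_per_second_usd": 0.05, "cover_image_url": "https://example.com/a.png",
     "github_url": "https://example.com/repo-a", "run_count": 120000},
    {"name": "acme/video-hq", "description": "High-quality video generation model",
     "price_per_second_usd": 0.20, "cover_image_url": "https://example.com/b.png",
     "github_url": "https://example.com/repo-b", "run_count": 45000},
]

def slim(record: dict) -> dict:
    """Keep only the dense fields an LLM needs to compare models."""
    keep = ("name", "description", "price_per_second_usd")
    return {k: record[k] for k in keep}

def cheapest(records: list[dict]) -> dict:
    """Once pricing is structured data, comparison becomes trivial."""
    return min((slim(r) for r in records), key=lambda r: r["price_per_second_usd"])

print(cheapest(RAW_MODELS)["name"])  # acme/video-fast
```

The slimmed records are what you would hand to a language model's context window, instead of the full payloads with URLs and other fields it doesn't need for the comparison.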