DSPy: The End of Prompt Engineering - Kevin Madura, AlixPartners
Channel: aiDotEngineer
Published at: 2026-01-08
YouTube video id: -cKUW6n8hBU
Source: https://www.youtube.com/watch?v=-cKUW6n8hBU
Thanks, everybody, for joining. I'm here to talk to you today about DSPy. Feel free to jump in with questions or anything throughout the talk. I don't plan on spending the full hour and a half or so; I know it's the last session of the day, so let's keep it casual. I'll start with a little bit of background. I don't want to go through too many slides, but I'm technically a consultant, so I have to do some slides. We will dive into the code for the latter half, and there's a GitHub repo that you can download to follow along and play around with on your own. So, how many people here have heard of DSPy? Almost everyone. That's awesome. How many people have actually used it day-to-day, in production or anything like that? Three. Okay, good. So hopefully we can convert some more of you today. At a high level, and this is straight from the website, DSPy is a declarative framework for building modular software. Most important for someone like me: I'm not necessarily an engineer writing code all day, every day. As I mentioned, I'm more of a technical consultant, so I run across a variety of different problems. It could be an investigation for a law firm. It could be helping a company understand how to improve their processes or how to deploy AI internally. Maybe we need to look through 10,000 contracts to identify a particular clause or paragraph. DSPy has been a really nice way for me personally, and for my team, to iterate really quickly on building these applications. Most importantly, we're building programs. It's not iterating on prompts and tweaking things back and forth; it's building a proper Python program, and DSPy is a really good way to do that. As I mentioned, there's a repo online if you want to download it now and get everything set up. I'll put it on the screen later on.
If you want to go there, download some of the code. It's been put together over the past couple of days, so it's not going to be perfect production-level code. It's mostly utilities and little things here and there to demonstrate the usefulness and the point of what we're talking about today, and we'll walk through all of these different use cases: a sentiment classifier, going through a PDF, some multimodal work, a very simple web research agent, detecting the boundaries of a PDF document, summarizing basically arbitrary-length text, and then an optimizer with GEPA. But before we do that, just to level-set: the biggest thing for me personally is that DSPy is a really nice way to decompose your logic into a program that treats LLMs as a first-class citizen. At the end of the day, you're fundamentally just calling a function that, under the hood, happens to be an LLM, and DSPy gives you a nice, intuitive, easy way to do that with some guarantees about the input and output types. Of course there are structured outputs, and of course there are other ways to do this, Pydantic and others. But DSPy has a set of primitives that, when you put them all together, let you build a cohesive, modular piece of software that you then happen to be able to optimize. We'll get into that in a minute. So, a few reasons why I'm such an advocate. It sits at a really nice level of abstraction. I would say it doesn't get in your way as much as LangChain does, and that's not a knock on LangChain; it's just a different paradigm in the way DSPy is structured. It lets you focus on the things that actually matter. You're not writing choices[0].message.content. You're not doing string parsing. You're not doing a bunch of plumbing under the hood.
You're just declaring your intent: how you want the program to operate and what you want your inputs and outputs to be. Because of this, it lets you create computer programs, as I mentioned, not just tweak strings and send them back and forth. You are building a program first; it just happens to also use LLMs. And the most important part of this: Omar Khattab, the original developer of DSPy, had a really good podcast with a16z that came out just two or three days ago, and he put it a really nice way. He said it's built with a systems mindset, and it's really about how you encode or express your intent of what you want to do, most importantly in a way that's transferable. The design of your system or program isn't going to move as quickly as the model capabilities under the hood, when we see new releases almost every single day, with different capabilities and better models. DSPy lets you structure things in a way that retains the control flow and the intent of your system, while allowing you to bounce from model to model to the extent that you want or need to. Convenience comes for free: there's no parsing JSON or things like that. Again, it sits at a nice level of abstraction where you can still understand what's going on under the hood. If you want to, you can go in and tweak things, but it lets you focus on just what you want to do while retaining the level of precision that I think most of us would like to have when building our programs. As mentioned, it's robust to model and paradigm shifts, so you can keep the logic of your program while keeping the LLMs infused, basically inline. Now, that being said, there are absolutely other great libraries out there.
Pydantic AI, LangChain, and many, many others let you do similar things; Agno is another one. This is just one perspective, and it may not be perfect for your use case. It took me a little while to grok how DSPy works, and you'll see why in a minute. So I would just recommend keeping an open mind and playing with it: run the code, tweak the code, do whatever you need to do, and see how it might work for you. Really, this talk is about ways that I've found it useful. It's not a dissertation on every nook and cranny of DSPy. It's more: I've run into these problems myself, I now naturally reach for DSPy to solve them, and this is why. The hope is that you can extrapolate some of this to your own use cases. We'll go through everything fairly quickly, but the core concepts of DSPy really come down to these six that you see on the screen. We'll go into each in more detail, but at a high level: signatures specify what you want your function call to do. This is where you specify your inputs and your outputs, both of which can be typed, and you defer the rest, basically the how, the implementation, to the LLM. We'll see how that all comes together in a minute. Modules are ways to logically structure your program. They're based on signatures, so a module can have one or more signatures embedded within it, in addition to additional logic, and the way modules are structured is modeled on PyTorch; you'll see how that comes to be in a minute. Tools: we're all familiar with tools, MCP and others, and fundamentally, as DSPy looks at them, tools are just Python functions.
So it's just a way for you to very easily expose Python functions to the LLM within the DSPy ecosystem, if you will. Adapters live between your signature and the LLM call itself. As we all know, prompts are ultimately just strings of text sent to the LLM. Signatures are a way to express your intent at a higher level, and adapters sit between those two. They are how your inputs and outputs get expanded from your initial signature into the format that ultimately becomes the prompt sent to the LLM. There's some debate, and some research, on whether certain models perform better with XML, for example, or BAML or JSON or others, and adapters give you a nice, easy abstraction to mix and match those at will. Optimizers are the most interesting and, for whatever reason, the most controversial part of DSPy. They're the first thing people think of when they hear of DSPy. We'll see a quote in a minute, but it's not optimizers first. Optimization is a nice added benefit, a capability DSPy offers on top of the ability to structure your program with signatures, modules, and everything else. Metrics are used in tandem with optimizers: they define how you measure success in your DSPy program, and the optimizers use them to determine whether they're finding the right path, if you will. Signatures, as I mentioned, are how you express your declarative intent. They can be super simple strings, which was the weirdest part for me initially but is now one of the most powerful parts, or they can be more complicated class-based objects. If you've used Pydantic, that's basically what it runs on under the hood. So this is an example of one of the class-based signatures. Again, it's basically just a Pydantic object.
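The slide itself isn't reproduced here, but the idea that a class-based signature's docstring, field names, types, and descriptions all end up as prompt text can be illustrated with the standard library. This is a hypothetical sketch: InputField, OutputField, SentimentSignature, and render_prompt are stand-ins built on dataclasses, not DSPy's implementation (in DSPy you would subclass dspy.Signature and use dspy.InputField and dspy.OutputField), and the rendered text format is an assumption, not what DSPy's adapters actually emit.

```python
from dataclasses import dataclass, field, fields


def InputField(desc: str = ""):
    # Hypothetical helper: tag a dataclass field as a model input with a description.
    return field(default=None, metadata={"role": "input", "desc": desc})


def OutputField(desc: str = ""):
    # Hypothetical helper: tag a dataclass field as a model output with a description.
    return field(default=None, metadata={"role": "output", "desc": desc})


@dataclass
class SentimentSignature:
    """Classify the sentiment of a short piece of text."""

    text: str = InputField(desc="The text to classify")
    sentiment: int = OutputField(desc="Lower is more negative, higher is more positive")


def render_prompt(signature_cls, **inputs) -> str:
    """Turn the docstring, field names, types, and descriptions into prompt text."""
    lines = [signature_cls.__doc__.strip(), ""]
    for f in fields(signature_cls):
        type_name = getattr(f.type, "__name__", str(f.type))
        header = f"{f.name} ({type_name}): {f.metadata['desc']}"
        if f.metadata["role"] == "input":
            lines.append(f"Input {header}")
            lines.append(f"  value: {inputs[f.name]}")
        else:
            lines.append(f"Output {header}")
    return "\n".join(lines)


print(render_prompt(SentimentSignature, text="This hotel stinks."))
```

Renaming text to something vague like x would change the prompt the model sees, which is why naming fields carefully matters.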
What's super interesting about this is that the names of the fields themselves act almost like mini prompts; they're part of the prompt itself, and you'll see how this comes to life in a minute. What's ultimately passed to the model from something like this will say: okay, your inputs are going to be a parameter called text, based on the name of that particular parameter in this class. These names are actually passed through, so it's very important to name your parameters in a way that is intuitive for the model to pick up. You can add some additional context in the description field here. So yes, most if not all of this is proper, typed Python code, but it also serves almost as a prompt that ultimately feeds into the model, and that translation happens through adapters. Just to highlight: the parts in darker bold text are effectively part of the prompt that gets sent in, and you'll see how DSPy works with all of this and formats it in a way that lets you just worry about what you want: constructing your signature instead of figuring out how best to word something in the prompt. Go ahead. >> I have a really good prompt. >> Sure. >> Then I don't want this thing. >> That's exactly right. >> Sure. >> So the question, for folks online, is: what if I already have a great prompt? I've done all this work, I'm an amazing prompt engineer, I don't want my job to go away, or whatever. So yes, you can absolutely start with a custom prompt or something that you've demonstrated works really well, and you're exactly right that it can go in the docstring itself.
There are some other methods for injecting system instructions or adding things at certain parts of the final prompt, or of course you can just inject it into the final string anyway; it's just a string that DSPy constructs. So this absolutely does not prevent you from adding in some super prompt that you already have, and to your point, it can serve as a nice starting point from which to build the rest of the system. Here's a shorthand version of the exact same thing, which, the first time I saw it, was baffling to me. But that's exactly how it works: you're again deferring the implementation, the logic, to DSPy and the model to figure out what you want to do. So if I want a super simple text sentiment classifier, this is basically all you need. You're just saying: I'm going to give you text as an input, and I want the sentiment as an integer as the output. Now, you probably want to specify some additional instructions, saying that a lower number means negative and a higher number means more positive sentiment, and so on. But it gives you an easy way to scaffold these things out without creating a whole prompt by hand. It's like: I just want to see how this works, and if it works, then I can add the additional instructions, create a module out of it, or whatever it might be. It's this shorthand that makes experimentation and iteration incredibly quick. So, modules: they're the base abstraction layer for DSPy programs. There are a bunch of built-in modules, which are a collection of prompting techniques, if you will, and you can always create your own module.
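In DSPy the shorthand on the slide is passed straight to a module, for example dspy.Predict("text -> sentiment: int"). To show how a string like that can carry the same information as a class-based signature, here is a small, hypothetical parser. It returns ordered input and output names with an optional type per field, defaulting to str. It illustrates the concept only: it is not DSPy's parser, and it supports just a few built-in types.

```python
# Hypothetical mapping of type names allowed in this toy shorthand.
_TYPES = {"str": str, "int": int, "float": float, "bool": bool}


def _parse_side(side: str) -> list[tuple[str, type]]:
    """Parse 'a, b: int' into [('a', str), ('b', int)]."""
    parsed = []
    for part in side.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, type_name = part.partition(":")
        parsed.append((name.strip(), _TYPES[type_name.strip() or "str"]))
    return parsed


def parse_shorthand(spec: str):
    """Split 'inputs -> outputs' into (input_fields, output_fields)."""
    left, arrow, right = spec.partition("->")
    if not arrow:
        raise ValueError(f"missing '->' in {spec!r}")
    return _parse_side(left), _parse_side(right)


inputs, outputs = parse_shorthand("text -> sentiment: int")
print(inputs, outputs)  # [('text', <class 'str'>)] [('sentiment', <class 'int'>)]
```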
So, to the question before: if you have something that you know works really well, sure, put it in a module. That becomes the base assumption, the base module that others can build on. All of DSPy is meant to be composable and optimizable, and when you deconstruct your business logic, or whatever you're trying to achieve, using these different primitives, it's all intended to fit and flow together. We'll get to optimizers in a minute, but in my team's experience, just being able to logically separate the different components of a program while basically inlining LLM calls has been incredibly powerful for us. And because we're in the DSPy paradigm, it's an added benefit that we also happen to be able to optimize it at the end of the day. It comes with a bunch of standard modules built in. I don't use some of these bottom ones as much, although they're super interesting. The base one at the top is just dspy.Predict. That's literally just a vanilla LM call. Chain of thought probably isn't as relevant anymore these days, because models have kind of ironed that out, but it's a good example of the types of prompting techniques that can be built into these modules. Basically, all it does is add some of the strings from the literature, like "let's think step by step," or whatever that might be. Same thing for ReAct and CodeAct. ReAct is basically how you expose tools to the model: it takes your signature, injects the Python functions you've given it as tools, and does some things under the hood. Basically, ReAct is how you do tool calling in DSPy. Program of thought is pretty cool. It forces the model to think in code and then returns the result.
It comes with a Python interpreter built in, but you can give it some type of custom harness if you want. I haven't played with that one too much, but it is super interesting: if you have a highly technical problem or workflow where you want the model to inject reasoning in code at certain parts of your pipeline, that's a really easy way to do it. Some of these other ones are basically just different methodologies for comparing outputs or running things in parallel. So here's what one looks like. Again, it's fairly simple; it's a Python class at the end of the day. You do some initialization up top; in this case, you're seeing the shorthand signature up there. Just to give you some context, this module is an excerpt from one of the Python files in the repo. It takes in a bunch of time entries and makes sure they adhere to certain standards: that things are capitalized properly, that there are periods at the end of sentences, or whatever it might be. That's from a real client use case, where they had hundreds of thousands of time entries and needed to make sure they all adhered to the same format. This was one way to do that very elegantly, at least in my opinion. Up top, you define the signature, add some additional instructions that were defined elsewhere, and say that, for this module, the change-tense call is going to be just a vanilla Predict call. Then, when you actually call the module, you enter the forward function, where you can intersperse the LLM call, which would be the first one, with some hard-coded business logic beneath it. Tools, as I mentioned before, are just vanilla Python functions; this is DSPy's tool interface.
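Since tools are just Python functions, the agent loop that dspy.ReAct runs can be approximated in a few lines. The sketch below is a hypothetical, offline stand-in and not DSPy's implementation. The "model" is a scripted policy function instead of an LLM, and get_weather and search_web are stubs. It does show the shape of the loop: the model picks a tool and its arguments, the tool runs, the observation goes back into the trajectory, and a max_iters cap stops it from spinning forever. In DSPy you would write something like dspy.ReAct("question -> answer", tools=[get_weather, search_web], max_iters=5).

```python
from typing import Callable


def get_weather(city: str) -> str:
    """Stub tool: a real version would call a weather API."""
    return f"Sunny, 22C in {city}"


def search_web(query: str) -> str:
    """Stub tool: a real version would call a search API."""
    return f"No results for {query!r}"


TOOLS: dict[str, Callable[..., str]] = {f.__name__: f for f in (get_weather, search_web)}


def react(question: str, policy, tools=TOOLS, max_iters: int = 5):
    """Alternate between 'model' decisions and tool calls until finish or max_iters."""
    trajectory = []
    for _ in range(max_iters):
        step = policy(question, trajectory)
        if step["action"] == "finish":
            return step["answer"], trajectory
        observation = tools[step["action"]](**step["args"])
        trajectory.append((step["action"], step["args"], observation))
    return None, trajectory  # gave up: the iteration cap was reached


def scripted_policy(question, trajectory):
    """Hypothetical stand-in for the LLM: call one tool, then answer with its output."""
    if not trajectory:
        return {"action": "get_weather", "args": {"city": "Tokyo"}}
    return {"action": "finish", "answer": trajectory[-1][2]}


answer, steps = react("What's the weather like in Tokyo?", scripted_policy)
print(answer)  # Sunny, 22C in Tokyo
```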
Under the hood, DSPy uses LiteLLM, so there needs to be some coupling between the two, but fundamentally, any tool that you would use elsewhere, you can also use in DSPy. This is probably obvious to most of you, but here's an example. You have two functions, get weather and search web, and you include them with a signature. In this case, the signature says: I'm going to give you a question; please give me an answer. I'm not even specifying the types; it's just going to infer what that means. I'm giving it the get weather and search web tools and saying: do your thing, but only go five rounds, so it doesn't spin off and do something crazy. The call here is literally just calling the ReAct agent I created above with the question "What's the weather like in Tokyo?" We'll see an example of this in the code session, but basically it gives the model the prompt and the tools and lets it do its thing. Now adapters. Before I cover them in more detail: they're basically prompt formatters, if you will. The description from the docs probably says it best: an adapter takes your signature, the inputs, and other attributes, and converts them into some type of message format that you have specified or that the adapter specifies. As an example, here's the JSON adapter taking a Pydantic object like the one we defined before; this is the actual prompt that's sent to the LLM. You can see the input fields: this would have been defined as a clinical note of type string, and patient info as a patient details object, which would have been defined elsewhere, and then this is the definition of the patient info. It's basically a JSON dump of that Pydantic object. Go ahead. >> So is the idea that there's a base adapter default that's good for most cases, and this is for when you want to tweak it to do something more specific? >> That's right. >> Yeah.
The question was whether there's a base adapter, and whether this is an example of where you'd want to do something specific. The answer is yes. There's a guy, Pashant, whose Twitter handle is at the end of this presentation, and he's been great. He did some testing comparing the JSON adapter with the BAML adapter. You can see intuitively, even for us humans, that the BAML formatting is a little more intuitive. It's probably more token-efficient too, if you compare the messy JSON here with the slightly better-formatted BAML here. It can actually improve performance by five to ten percent, depending on your use case. So it's a good example of how you can format things differently: the rest of the program wouldn't change at all. You just specify the BAML adapter, and it totally changes how the information is presented to the LLM under the hood. Multimodality: this is obviously more at the model level, but DSPy supports multiple modalities by default, including images, audio, and some others. It's the same type of thing: you just feed it in as part of your signature, and you get some very nice, clean output. This lets you work with them very easily and very quickly. For the eagle-eyed participants, you can see the first line up there is attachments. It's probably a lesser-known library. Another guy on Twitter who is awesome, Maxime I think it is, created this library, which is basically a catch-all for working with different types of files and converting them into a format that's super easy to use with LLMs. He's a big DSPy fan as well, so he made an adapter specific to this. That's all it takes to pull in images, PDFs, whatever it might be. You'll see some examples of that, and it has made my life super easy. Here's another example of the same sort of thing.
This is a PDF of a Form 4, a public SEC filing, from Nvidia. Up top, I'm just giving it the link and saying: attachments, do your thing. Pull it down, create images, whatever you're going to do; I don't need to worry about it. This is super simple RAG: I want to do RAG over this document, so I'm going to give you a question and the document, and I want the answer. You can see how simple that is: literally just feeding in the document and asking how many shares were sold. Interestingly, and I'm not sure if it's easy to see, there are actually two transactions here, so it's likely going to have to do some math under the hood. You can see the reasoning and the final answer here. Go ahead. >> On the RAG step, is it creating a vector store of some kind, or creating embeddings and searching over those? Is there a bunch going on in the background there, or what? >> This is poor man's RAG; I should have clarified. It's literally just pulling in the document images, and I think attachments will do some basic OCR under the hood, but it doesn't do anything other than that. All we're feeding in here, the actual document object, is literally just the text that's been OCR'd plus the images, and the model does the rest. All right, so optimizers. Let's see how we're doing on time. Okay. Optimizers are a super powerful, super interesting concept. There's been some research arguing that they're just as performant as, and in certain situations more performant than, fine-tuning for certain models. There's all this research about in-context learning and such. So if you want to go fine-tune and do all of that, nothing stops you.
But I would recommend at least trying this first, to see how far you can get without having to set up a bunch of infrastructure and go through all of that. See how the optimizers work. Fundamentally, DSPy gives you the primitives and the organization you need to measure and then quantitatively improve performance. I mentioned transferability before: it's arguably enabled by the optimizers. Say I have a classification task that works really well with GPT-4.1, but it's a little costly because I have to run it a million times a day. Can I try it with GPT-4.1 nano? Okay, maybe it's at 70%, whatever it might be. But I run the optimizer on 4.1 nano and get the performance back up to maybe 87%. Maybe that's okay for my use case, and I've now dropped my cost profile dramatically. It's the optimizer that enables that kind of model and use-case transferability, if you will. All it really does under the hood is iteratively tweak the prompt, that string under the hood. Because you've constructed your program out of modules, DSPy handles all of that for you. If you compose a program from multiple modules and optimize against all of it, DSPy will, by itself, optimize the various components to improve the end-to-end input-output performance. And we'll take it from the man himself, Omar: DSPy is not an optimizer, and he's said this multiple times. It's just a set of programming abstractions, a way to program; you just happen to be able to optimize it. So again, the value that my team and I have gotten is mostly from the programming abstractions.
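To make "iteratively tweak the prompt and keep what scores better" concrete, here is a deliberately tiny sketch. It evaluates each candidate instruction on a small labeled dev set with a metric and keeps the best one. The candidate list, the fake make_program (which pretends that a more explicit instruction helps a weak model), and the two examples are all hypothetical. Real DSPy optimizers such as MIPROv2 or GEPA propose new instructions and demonstrations with an LM and search far more cleverly. This brute-force loop only illustrates the measure-then-select idea.

```python
def make_program(instruction: str):
    """Hypothetical 'small model': only handles negatives when the instruction gives cues."""
    def program(text: str) -> str:
        if "keywords" in instruction and "stinks" in text:
            return "negative"
        return "positive"
    return program


def exact_match(example: dict, prediction: str) -> float:
    return float(prediction == example["label"])


def evaluate(program, devset, metric) -> float:
    """Average metric score of a program over a dev set."""
    return sum(metric(ex, program(ex["text"])) for ex in devset) / len(devset)


def optimize_instruction(candidates, devset, metric):
    """Score every candidate instruction and return the best (instruction, score)."""
    best, best_score = None, float("-inf")
    for instruction in candidates:
        score = evaluate(make_program(instruction), devset, metric)
        if score > best_score:
            best, best_score = instruction, score
    return best, best_score


devset = [
    {"text": "This hotel stinks.", "label": "negative"},
    {"text": "I'm feeling pretty happy.", "label": "positive"},
]
candidates = [
    "Classify the sentiment.",
    "Classify the sentiment; treat keywords like 'stinks' as negative cues.",
]
print(optimize_instruction(candidates, devset, exact_match))
```

The first instruction scores 0.5 on this dev set and the second scores 1.0, so the loop keeps the second.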
It's just an incredible added benefit that you can also optimize it afterwards, should you choose to. I was listening to Dwarkesh and Karpathy the other day while prepping for this talk, and this hit home perfectly. I was thinking about the optimizers, and someone smarter than me can please correct me, but I think this makes sense. He was talking about how using an LLM as a judge can be a bad thing, because the model being judged can find adversarial examples and degrade performance, basically creating a situation where the judge isn't scoring something properly. He says the model will find these little cracks, these little spurious things in the nooks and crannies of the giant model, and find a way to cheat it. Basically, LLM-as-a-judge can only go so far before the other model finds those adversarial examples. If you flip that on its head, this is the property that DSPy's optimizers take advantage of: finding the nooks and crannies in the model, whether it's a bigger or a smaller model, to improve performance against your dataset. That's what the optimizer is doing. A typical flow, and I won't spend too much time on this, is fairly logical: construct your program, which means decomposing your logic into modules; use your metrics to define the contours of how the program should work; and optimize all of that to get your final result.
Chris Potts gave another talk maybe two days ago where he made the point I was mentioning before: GEPA, which you probably saw in some of the talks the other day, is an optimizer that performs on par with or better than something like GRPO, a reinforcement-learning fine-tuning method. So, pretty impressive. I think it's an active area of research, and people a lot smarter than me, like Omar, Chris, and others, are leading the way on this. But the point is, I think prompt optimization is a pretty exciting place to be, and if nothing else, it's worth exploring. And then finally, metrics. Again, these are the building blocks that let you define what success looks like for the optimizer. You can have many of them, and we'll see examples. At a high level, your program works on inputs and outputs, and the optimizer uses the metrics to understand whether its last tweak to the prompt improved or degraded performance. The way you define your metrics provides that direct feedback for the optimizers to work on. Here's another super simple example, from the time entry example I mentioned before. Metrics can be fairly rigorous, like an equality check, or a little more subjective, like using an LLM as a judge to ask whether a generated string adheres to various criteria; that itself can be a metric. All of this is a very long-winded way of saying that, in my opinion, this is probably most if not all of what you need to construct arbitrarily complex workflows, data processing pipelines, business logic, whatever that might be, with different ways of working with LLMs.
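The time-entry example shows up twice in the talk: first as a module whose forward method mixes an LLM call with hard-coded rules, then as the source of a simple metric. Below is a hypothetical, stdlib-only sketch of both. TimeEntryFixer accepts any callable in place of the Predict call (here a fake that only rewrites one verb), then applies deterministic capitalization and punctuation rules. exact_match is a rigorous metric. style_score is a rule-based stand-in for the more subjective, LLM-as-a-judge kind of check. The names and rules are assumptions, not the repo's code. The trace parameter mirrors the usual DSPy metric shape of (example, prediction, trace=None).

```python
from typing import Callable, Optional


class TimeEntryFixer:
    """Toy module: one (fake) LLM call, then hard-coded business rules."""

    def __init__(self, change_tense: Callable[[str], str]):
        self.change_tense = change_tense  # stands in for a dspy.Predict call

    def __call__(self, entry: str) -> str:
        return self.forward(entry)

    def forward(self, entry: str) -> str:
        text = self.change_tense(entry).strip()
        if text:
            text = text[0].upper() + text[1:]  # rule: capitalize the first letter
            if not text.endswith("."):
                text += "."  # rule: end with a period
        return text


def exact_match(example: dict, prediction: str, trace: Optional[object] = None) -> float:
    """Rigorous metric: 1.0 only if the output equals the reference entry."""
    return float(prediction == example["expected"])


def style_score(example: dict, prediction: str, trace: Optional[object] = None) -> float:
    """Softer metric: fraction of style rules satisfied (stand-in for an LLM judge)."""
    checks = [prediction[:1].isupper(), prediction.endswith("."), "  " not in prediction]
    return sum(checks) / len(checks)


def fake_llm(entry: str) -> str:
    """Hypothetical LLM stand-in that only fixes one verb's tense."""
    return entry.replace("reviewing", "reviewed")


fixer = TimeEntryFixer(fake_llm)
out = fixer("reviewing contract drafts")
print(out, exact_match({"expected": "Reviewed contract drafts."}, out))
```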
If nothing else, DSPy gives you the primitives you need to build these modular, composable systems. If you're interested, here are some people to follow online; there are many, many more, and there's a Discord community as well. These people are usually on top of the latest and greatest, so I'd recommend giving them a follow. You don't need to follow me; I don't really do much, but the others on there are really good. Okay, so the fun part: we'll actually get into some code. If you haven't had a chance, now's your last chance to get the repo, but I'll just go through a few different examples of what we talked about. Okay. I'll set up Phoenix, from Arize, which is basically an observability platform. I just did this today, so I don't know if it's going to work, but we'll give it a shot. Basically, it gives you observability and tracing for all the calls happening under the hood. We'll give it another five seconds; I think it should do all of this automatically. So, let's see. All right, something's up. Okay, cool. I'm just going to run through the notebook, which is a collection of different use cases putting a lot of what we just saw into practice. Feel free to jump in with any questions. We'll start with this notebook; there are a couple of more proper Python programs we'll walk through afterwards, but the intent is a rapid-fire review of the different ways DSPy has been useful to me and others. So, load in the .env file. Usually I'll have some type of config object like this so I can easily use these later on, for example for model mixing.
If I have a super hairy problem, or some workload I know will need the power of a reasoning model like GPT-5, I'll define multiple LMs: one will be GPT-4.1, one will be GPT-5, maybe a 4.1 nano, Gemini 2.5 Flash, stuff like that. Then I can intersperse them depending on what I'm reasonably sure the workload will be, and you'll see how that comes into play with classification and others. I'll pull in a few others here. I'm using OpenRouter for this, so if you have an OpenRouter API key, I'd recommend plugging that in. Now I have three different LLMs I can work with: Claude, Gemini, and GPT-4.1 mini. I'll ask each of them who's best between Google, Anthropic, and OpenAI. All of them hedge a little: "subjective," "subjective," "undefined." All right, great; that's not very helpful. But because DSPy works on Pydantic, I can define the answer as a Literal, so I'm forcing it to give me only those three options, and then I can go through each of them. And you can see each of them, of course, chooses its own organization. The reason those came back so fast is that DSPy has caching built in under the hood: as long as nothing has changed in your signature definitions, basically if nothing has changed, it will just load from the cache, which is super useful for testing. I ran this before, which is why those came back so quickly. That's another super useful piece. Let's see. Okay, let me make sure we're up and running: if I change this to "hello" with a space, you can see we're making a live call. Okay, great, we're still up. So, a super simple sentiment classifier. Obviously this can be built into something arbitrarily complex. Let me make this a little bit bigger.
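Two ideas from this part of the demo can be mimicked with the standard library: constraining an answer to a Literal, and caching repeated calls. The sketch below is hypothetical. fake_lm, the model names, and the call counter are stand-ins. The answer is checked against the Literal's choices via typing.get_args, and functools.lru_cache plays the role of DSPy's built-in cache (which, unlike this toy, can persist to disk and keys on the full request). In DSPy itself you would annotate the output field as Literal["Google", "Anthropic", "OpenAI"] and let the adapter enforce it.

```python
from functools import lru_cache
from typing import Literal, get_args

Company = Literal["Google", "Anthropic", "OpenAI"]
CALL_COUNT = {"n": 0}  # counts "live" calls so the cache effect is visible


def fake_lm(model: str, question: str) -> str:
    """Hypothetical LM stand-in: each model 'picks' its own organization."""
    CALL_COUNT["n"] += 1
    return {"claude": "Anthropic", "gemini": "Google"}.get(model, "OpenAI")


@lru_cache(maxsize=None)
def ask(model: str, question: str) -> str:
    """Return a validated Company answer; identical (model, question) pairs hit the cache."""
    answer = fake_lm(model, question).strip()
    if answer not in get_args(Company):
        raise ValueError(f"{answer!r} is not one of {get_args(Company)}")
    return answer


q = "Who's best between Google, Anthropic, and OpenAI?"
for model in ("claude", "gemini", "gpt-4.1-mini"):
    print(model, ask(model, q))
ask("claude", q)  # same arguments again: served from the cache
print("live calls:", CALL_COUNT["n"])  # live calls: 3
```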
I'm giving it the text and the sentiment you saw before, and adding the extra specification: lower is more negative, higher is more positive. I define that as my signature, pass it into a super simple Predict object, and say, "This hotel stinks." Okay, that's probably pretty negative. Now if I flip that to "I'm feeling pretty happy" (whoops, good thing I'm not in a hotel right now), you can see it comes back as an eight. This might not seem that impressive, and it really isn't, but the important part is that it demonstrates the shorthand signature: I have the string input, the integer output, and I pass in custom instructions, which would go in the docstring if I used the class-based method. Another useful part of DSPy is that it comes with usage information built in. Because this call is cached, it's an empty object, but when I change the input you can see (I'm using Azure right now) that each call gets a nice breakdown. I think it comes from LiteLLM, but it lets you easily track token usage and so on for observability and optimization. Just nice little tidbits that are part of it here and there. Let me make this smaller. We saw this example in the slides: I'm going to pull that Form 4 from online and create the doc object using Attachments. You can see some of what it did under the hood: it used pdfplumber, created Markdown from the PDF, pulled out the images, and so on. I don't have to worry about any of that; Attachments makes it super easy. Here's what we're working with, in this case the Form 4. Then I'll do the poor man's RAG I mentioned before. Okay, great.
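The per-call usage breakdown is easy to roll up yourself, for example to compare spend across the mixed models from earlier. A minimal sketch, assuming each call's usage arrives as a dict keyed by model name with flat integer, LiteLLM-style `prompt_tokens`/`completion_tokens` counts; the function name and sample numbers are made up for illustration, not taken from the demo.

```python
from collections import defaultdict

def total_usage(calls: list[dict[str, dict[str, int]]]) -> dict[str, dict[str, int]]:
    """Sum token counts per model across many calls.

    Each element of `calls` maps a model name to its usage dict, e.g.
    {"azure/gpt-4.1": {"prompt_tokens": 120, "completion_tokens": 15}}.
    """
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for call in calls:
        for model, usage in call.items():
            for field_name, count in usage.items():
                totals[model][field_name] += count
    # Convert nested defaultdicts to plain dicts for clean printing/serialization.
    return {model: dict(fields) for model, fields in totals.items()}

calls = [
    {"azure/gpt-4.1": {"prompt_tokens": 120, "completion_tokens": 15}},
    {"azure/gpt-4.1": {"prompt_tokens": 80, "completion_tokens": 10}},
    {"gemini/gemini-2.5-flash": {"prompt_tokens": 300, "completion_tokens": 40}},
]
print(total_usage(calls))
```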
How many shares were sold in total? It goes through the whole chain of thought and brings back the response. That's all well and good, but in my mind the power of DSPy is that you can have arbitrarily complex data structures. That's fairly obvious because it uses Pydantic and everything else, but you can get a little creative with it. In this case I'll define a different document analyzer signature: I just give it the document and defer to the model on the structure of what it thinks is most important. I'm defining the output as a dictionary, so it will hopefully return a series of key-value pairs that describe the important information in the document in a structured way. This is probably cached, and I did it all in one line: I'm saying I want chain of thought using the document analyzer signature, and I pass in the input field, which is just the document I got before. You can see it pulled out a bunch of great information in a very structured way, and I didn't really have to think about it; I deferred all of that to the model and to DSPy. Of course, you can do the inverse: say I have a very specific business use case, with specific formatting or content I want out of the document. I define that as typical Pydantic classes. In this case I want to pull out the transactions, if there are multiple, along with important information like the filing date, and I define a document analyzer signature with that schema.
Again, a super simple input field: just the document itself, parsed by Attachments into text and images. Then I add the document schema output, typed with the schema class defined above. This is effectively what you would pass into structured outputs, just done the DSPy way, and it gives you the output in that specific format. You can see it pulled things out super nicely: filing date, form date, form type, the transactions themselves, and then the ultimate answer. It's nice because the result supports dot notation, so you can very quickly access the resulting objects. Looking at adapters, I'll use another tidbit from DSPy: inspect_history. For those who want to know what's going on under the hood, inspect_history gives you the raw dump of what was actually sent. You can see the system message that was constructed: the input field is the document, and the output fields are the reasoning and the schema. Then you see the actual document content that was extracted and put into the prompt with some metadata, all generated by Attachments. Then you get the response, which follows this specific format with markers for each field. It's a relatively arbitrary delimiter format for the field names, which DSPy then parses and passes back to you as the user. So I can do response.document_schema and get the actual result. To show what the BAML adapter looks like, we can make two different calls. This example is from my buddy Pashant online. We define a super simple Pydantic model: a patient address, and then patient details.
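To make the "typed schema plus dot notation" idea concrete without a model in the loop, here is a stdlib-only sketch that uses dataclasses in place of the Pydantic classes from the demo. The field names (`form_type`, `filing_date`, `transactions`, `shares`, `price`) and the sample payload are hypothetical, not the notebook's actual schema; in DSPy, parsing the model's output into your types is done for you.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Transaction:
    security: str
    shares: int
    price: float

@dataclass
class DocumentSchema:
    form_type: str
    filing_date: date
    transactions: list[Transaction] = field(default_factory=list)

    @property
    def total_shares(self) -> int:
        return sum(t.shares for t in self.transactions)

def parse_schema(raw: dict) -> DocumentSchema:
    # Convert loosely typed JSON-like data into typed objects,
    # which is roughly what a structured-output parser does for you.
    return DocumentSchema(
        form_type=raw["form_type"],
        filing_date=date.fromisoformat(raw["filing_date"]),
        transactions=[Transaction(**t) for t in raw.get("transactions", [])],
    )

raw = {
    "form_type": "4",
    "filing_date": "2024-01-15",
    "transactions": [
        {"security": "Common Stock", "shares": 1000, "price": 10.5},
        {"security": "Common Stock", "shares": 500, "price": 10.75},
    ],
}
doc = parse_schema(raw)
print(doc.filing_date, doc.transactions[0].shares, doc.total_shares)
```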
Patient details has the patient address object nested within it. Then we create a super simple DSPy signature: it takes a clinical note, which is a string, and the patient info is the output type. I'm going to run this two different ways: first with the smart LLM I mentioned before using the built-in adapter, so I don't specify anything there, and second using the BAML adapter, which is defined there. A few things are going on here. One is the use of Python context managers, the lines starting with "with", which let you break out of whatever the global LM has been set to and use a specific one just for that call. In this case I'm using the same LM, but if I change it to, say, the Anthropic LM, that should work. What that does is send that particular call to whichever LM you define for it. And something happened. I'm on a VPN, so let's kill that. Sorry, AlixPartners. Okay, great. We had two separate calls: one to the smart LLM, which I think is GPT-4.1, and the other to Anthropic. Everything else is exactly the same, the notes and so on, and we got the exact same output. That's great, but what I wanted to show here is the adapters themselves. I'm calling inspect_history with n=2, so I get the last two calls and we can see how the prompts differ. The first one uses the built-in JSON schema, this crazy long JSON string. LLMs are good enough to handle that, but probably not for super complicated schemas. The second one uses the BAML notation, which, as we saw in the slides, is a little easier to comprehend.
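The `with` block (in DSPy, `dspy.context(lm=...)`) temporarily overrides the globally configured LM for the calls inside it. Below is a plain-Python sketch of that scoping mechanism using `contextvars`; `use_lm`, `current_lm`, `extract_patient`, and the model strings are hypothetical, and DSPy's own implementation differs.

```python
import contextvars
from contextlib import contextmanager

# Global default, analogous to configuring one LM for the whole program.
_current_lm = contextvars.ContextVar("current_lm", default="openai/gpt-4.1")

def current_lm() -> str:
    return _current_lm.get()

@contextmanager
def use_lm(lm: str):
    """Temporarily override the LM for everything inside the `with` block."""
    token = _current_lm.set(lm)
    try:
        yield
    finally:
        # Restore the previous value even if the block raises.
        _current_lm.reset(token)

def extract_patient(note: str) -> str:
    # Stand-in for a module call; it just reports which LM it would use.
    return f"{current_lm()} handled: {note}"

print(extract_patient("note A"))
with use_lm("anthropic/claude-sonnet"):
    print(extract_patient("note A"))
print(extract_patient("note A"))
```

Using `contextvars` rather than a plain global means the override also stays isolated per task when calls run concurrently under asyncio.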
On super complicated use cases, that can actually make a measurable improvement. The multimodal example is the same sort of thing as before. I'll pull in the image itself and see what we're working with. Okay, great, we're looking at these various street signs. I'll ask it a super simple question: it's this time of day; can I park here now, and when should I leave? I'm again passing in the super simple shorthand for defining a signature, and I get out a boolean in this case and a string for when I can leave. Modules themselves are again fairly simple: you just wrap all of this in a class. Good question.
>> So does it always return reasoning by default?
>> Good question.
>> Can you repeat the question?
>> Yes. For those online, the question was: does it always return reasoning by default? When you call dspy.ChainOfThought as part of the module, reasoning is built in: it adds a reasoning field to your response automatically. You're not defining it; as you can see up here, it's not in the signature. But it adds it and exposes it to you, to the extent you want to retain it for any reason. If I changed this to Predict, you wouldn't get that; you'd literally just get the fields you defined. That's actually a good segue to modules. A module basically wraps all of that into reusable logic. We give it the signature here and assign self.predict. This is just a demonstration of how it's used as a class: I add a module identifier and a counter, but this can be any arbitrary business logic, control flow, database action, or whatever it might be.
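The module pattern is ordinary Python: state and sub-components are set up in `__init__`, and `forward` holds the logic that runs on each call (in DSPy, calling a module dispatches to its `forward`). A dependency-free sketch of the image-analyzer-with-counter idea; `FakePredict`, `Module`, `ImageAnalyzer`, and the field names are hypothetical stand-ins rather than DSPy classes, and the canned answer is not real model output.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    can_park: bool
    leave_by: str

class FakePredict:
    """Stand-in for a predictor: returns a canned answer instead of calling an LM."""
    def __call__(self, image: str, question: str) -> Prediction:
        return Prediction(can_park=True, leave_by="6:00 PM")

class Module:
    # Minimal imitation of the convention: calling the module runs forward().
    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

class ImageAnalyzer(Module):
    def __init__(self, identifier: str):
        self.identifier = identifier
        self.calls = 0               # arbitrary business state
        self.predict = FakePredict()

    def forward(self, image: str, question: str) -> Prediction:
        self.calls += 1              # any control flow or side effects can live here
        return self.predict(image=image, question=question)

analyzer = ImageAnalyzer("AIE123")
for _ in range(3):
    result = analyzer("street_signs.jpg", "Can I park here now?")
print(analyzer.identifier, analyzer.calls, result.can_park)
```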
When the image analyzer class is instantiated, this setup function runs, and when you actually invoke it, that's when the core logic runs. So I instantiate the analyzer with the identifier AIE123 and then call it. Great, it ran, and you can see the counter increment each time I make the call. A super simple example. We don't have a ton of time, but I'll show you some of the other modules and how they work. Tool calling is fairly straightforward. I define two functions, a Perplexity search and a function that gets URL content, and create a bio agent module. It sets Gemini 2.5 as this particular module's LM and creates an answer generator, which is a ReAct call, so it does tool calling whenever it's invoked. The forward function literally just calls that answer generator with the parameters provided to it. I also create an async version of that function. Here I ask it to identify instances where a particular person has been at their company for more than 10 years. It needs tool calls to get the most up-to-date information. It loops through the people, calling the bio agent, which uses the tools in the background, and decides whether each person's background matches my criteria. In this case, Satya is true and Brian should be false. What's interesting while that runs: similar to the reasoning you get back from chain of thought, you get a trajectory back from things like ReAct. You can see which tools it called, the arguments passed in, and the observations for each call, which is nice for debugging and other uses.
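To illustrate what a ReAct-style trajectory records (the tool, its arguments, and the observation that came back), here is a toy loop driven by a scripted plan instead of an LLM. The tools, the `plan` list, and the trajectory keys are assumptions for illustration; DSPy's `ReAct` builds its trajectory from real model decisions and uses its own layout.

```python
from typing import Callable

def search(query: str) -> str:
    # Hypothetical tool; a real one might call a search API.
    return f"results for {query!r}"

def get_url_content(url: str) -> str:
    # Hypothetical tool; a real one would fetch and clean the page.
    return f"content of {url}"

TOOLS: dict[str, Callable[..., str]] = {"search": search, "get_url_content": get_url_content}

def run_agent(plan: list[tuple[str, str, dict]]) -> list[dict]:
    """Execute (thought, tool_name, args) steps and record a trajectory."""
    trajectory = []
    for thought, tool_name, args in plan:
        observation = TOOLS[tool_name](**args)
        trajectory.append(
            {"thought": thought, "tool": tool_name, "args": args, "observation": observation}
        )
    return trajectory

plan = [
    ("Find the person's current role.", "search", {"query": "CEO tenure"}),
    ("Read the top result.", "get_url_content", {"url": "https://example.com/bio"}),
]
for step in run_agent(plan):
    print(step["tool"], step["args"], "->", step["observation"])
```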
I want to get to the other content, so I'll speed through the rest of this. This is an async version of the same thing, so you can run them in parallel. Same idea. I'll skip the GEPA example for a second. I can show you what the output looks like, but basically it creates a dataset, shows you what's in it, and creates a variety of signatures. In this case it builds a system that categorizes and classifies help messages from the dataset: my sink is broken, my light is out, whatever it is. It classifies whether the sentiment is positive, neutral, or negative, along with the urgency of the message. It categorizes the message and then packs all of those modules into a single support analyzer module. From there it defines a bunch of metrics based on the dataset itself. For example, how do we score urgency? That's a very simple one: it either matches or it doesn't. Others can be a little more subjective. Then you can run it, but it would take too long here, probably 20 minutes or so. What it does is evaluate the performance of the base program, apply those metrics, and iteratively come up with new prompts to improve it. I want to pause here for a second, because there are different types of metrics, and GEPA in particular uses feedback, in this case from a teacher model. It can work with a model at the same level, but especially when you're trying to use a smaller model, the metric can provide textual feedback.
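The exact-match urgency metric, plus the kind of textual feedback GEPA can use, sketched as plain Python. GEPA metrics in DSPy can return feedback text alongside a score; here the `(score, feedback)` tuple, the function names, the tiny dataset, and the always-"high" predictor are hypothetical simplifications rather than DSPy's metric signature.

```python
def urgency_metric(gold: str, predicted: str) -> tuple[float, str]:
    """Exact match: 1.0 if the labels agree, else 0.0, with feedback explaining the miss."""
    if predicted == gold:
        return 1.0, "Correct urgency."
    return 0.0, f"Predicted urgency {predicted!r}, but the expected label was {gold!r}."

def evaluate(examples: list[dict], predict) -> float:
    """Average metric score of `predict` over labeled examples."""
    scores = [urgency_metric(ex["urgency"], predict(ex["message"]))[0] for ex in examples]
    return sum(scores) / len(scores)

examples = [
    {"message": "My sink is broken and flooding the floor", "urgency": "high"},
    {"message": "A light is out in the hallway", "urgency": "low"},
]

def naive_predict(message: str) -> str:
    # Stand-in for the classifier; always answers "high".
    return "high"

print(evaluate(examples, naive_predict))  # 0.5
print(urgency_metric("low", "high")[1])
```

The score alone tells an optimizer that something failed; the feedback string tells it why, which is the extra signal a reflective optimizer can fold into its next prompt proposal.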
So it says: not only did you get this classification wrong, but here's some additional feedback, as you can see here, on why it was wrong or what the answer should have been. You can read the paper, but this basically lets it iteratively find a Pareto frontier of how it should tweak the prompt, optimizing based on that feedback. It tightens the iteration loop. You can see there's a bunch here, and then you can run it and see how it works. To give you a concrete example of how it all comes together, we took a bunch of the examples from before and we're going to do a bit of categorization. I have contracts, images, and other things that one DSPy program can comprehend and process. This is something we see fairly regularly: we run into a client situation where there's just a big dump of files and nobody really knows what's in it. They might want to find SEC filings and process them a certain way, find contracts and process those a certain way, and maybe there are some images in there they want processed a certain way. This is an example of how you would do that. Starting at the bottom, this is a regular Python file, and it uses DSPy to do all the things I just mentioned. We pull in the configuration and set the regular LM, the small one, and one we use for images. Gemini models, for example, might be better at image recognition than others, so I might want to use a particular model for a particular workload: if I detect an image, I route the request to Gemini; if I detect something else, I route it to GPT-4.1 or whatever it might be. Then I process a single file.
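The per-workload routing described here is ordinary control flow. A minimal sketch that picks a model per detected content type; the model identifiers and the extension-based detection are hypothetical placeholders, and the demo's own detection step may differ.

```python
from pathlib import Path

# Hypothetical model assignments per workload.
MODEL_FOR = {
    "image": "gemini/gemini-2.5-flash",
    "document": "openai/gpt-4.1",
}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def detect_kind(path: str) -> str:
    # Crude stand-in for a real content classifier.
    return "image" if Path(path).suffix.lower() in IMAGE_SUFFIXES else "document"

def route(path: str) -> str:
    """Return the model that should handle this file."""
    return MODEL_FOR[detect_kind(path)]

for f in ["data/parking.JPG", "data/contract.pdf"]:
    print(f, "->", route(f))
```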
It uses our handy Attachments library to put the file into a format we can use, and then classifies it. It's not super obvious here, but I get a file type back from this classify-file function call, and then I run different logic depending on the file type. If it's an SEC filing, I do certain things; if it's a particular type of SEC filing, I do something else. If it's a contract, maybe I summarize it. If it looks like city infrastructure, in this case the image we saw before, I might do more visual interpretation. If I dive into the classify-file function quickly, it runs the document classifier, which is basically a Predict on the images from the file that returns a document type. At the end of the day it's a fairly simple signature: we take the PDF file, take the first few images from it (a list of images) as the input field, and say: just tell me what type this is, choosing from these document types. Obviously this is a fairly simple use case: given the first three pages of a document, is it an SEC filing, a patent filing, a contract, or city infrastructure? Those are pretty different things, so the model really shouldn't have an issue with any of them, and we have a catch-all bucket for "other." Then, as I mentioned, depending on the file type you get back, you can process them differently. For the SEC filing, I use the small model to do the same Form 4 extraction we saw before, and then assert that it is what we think it is.
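A sketch of a closed label set with a catch-all, plus dispatch on the result. In the demo, the type annotation on the signature does the constraining; here `DocumentType`, `normalize`, `HANDLERS`, and `process` are hypothetical names showing the same idea in plain Python, including falling back to `OTHER` for any unexpected label.

```python
from enum import Enum

class DocumentType(str, Enum):
    SEC_FILING = "sec_filing"
    PATENT = "patent"
    CONTRACT = "contract"
    CITY_INFRASTRUCTURE = "city_infrastructure"
    OTHER = "other"

def normalize(raw_label: str) -> DocumentType:
    """Map a raw model label onto the allowed set; unknown labels become OTHER."""
    cleaned = raw_label.strip().lower().replace(" ", "_")
    try:
        return DocumentType(cleaned)
    except ValueError:
        return DocumentType.OTHER

HANDLERS = {
    DocumentType.SEC_FILING: lambda doc: f"extract Form 4 fields from {doc}",
    DocumentType.CONTRACT: lambda doc: f"summarize {doc}",
    DocumentType.CITY_INFRASTRUCTURE: lambda doc: f"interpret image {doc}",
}

def process(doc: str, raw_label: str) -> str:
    kind = normalize(raw_label)
    # Types without a dedicated handler (including OTHER) are skipped, not crashed on.
    handler = HANDLERS.get(kind, lambda d: f"skip {d} ({kind.value})")
    return handler(doc)

print(process("contract.pdf", "Contract"))
print(process("memo.pdf", "meeting notes"))
```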
For a contract (let's see, I have about 10 more minutes, so we'll stop after this file), we create this summarizer object. We go through as many pages as there are and do a kind of recursive summarization using a separate DSPy function, and we also detect the boundaries of the document. So I want both the summaries and the boundaries, and then we print those out. Let's see if I can run this. It should classify it as a contract.
>> So you're just relying on the model itself to realize that it's city infrastructure?
>> Yes. The question was whether I'm just relying on the model to determine that it's city infrastructure. Yes. This is a quick-and-dirty workshop example, and it works only because there's one picture of the street signs. If we look in the data folder, I have a contract, an irrelevant image, the Form 4 SEC filing, and the parking image. They're pretty different, so the model should have no problem sorting them into the categories I gave it. In a production use case, you would want much more stringent classification, maybe even multiple passes using different models. But given those options, in the many times I've run it, it's had no problem. In this case I gave it one of the contract documents, and it ran some additional summarization logic under the hood. Let me go to that quickly; you can find all of this in the code. It uses three separate signatures to decompose the contents of the contract and then summarize them. It iteratively works through each chunk of the document to create the summary you see here at the bottom.
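The chunk-then-combine summarization can be sketched independently of any model. This is a minimal map-reduce-style version, assuming a pluggable `summarize` callable (here a hypothetical truncating stub standing in for a DSPy module) and a fixed group size; the demo's three-signature decomposition is more involved.

```python
from typing import Callable

def chunk(items: list[str], size: int) -> list[list[str]]:
    return [items[i : i + size] for i in range(0, len(items), size)]

def recursive_summarize(pages: list[str], summarize: Callable[[str], str], size: int = 3) -> str:
    """Summarize groups of `size` texts, then summarize the summaries, until one remains."""
    if size < 2:
        raise ValueError("size must be at least 2 so each round shrinks the list")
    level = pages
    while len(level) > 1:
        level = [summarize("\n".join(group)) for group in chunk(level, size)]
    return level[0] if level else ""

calls = []

def stub_summarize(text: str) -> str:
    # Hypothetical stand-in for an LLM summarizer: keep the first 20 characters.
    calls.append(text)
    return text[:20]

pages = [f"Page {i}: terms and conditions..." for i in range(13)]
print(recursive_summarize(pages, stub_summarize))
print(len(calls))  # 13 pages -> 5 chunk summaries -> 2 -> 1, so 8 calls
```

Note that a single-page input is returned as-is without a summarizer call; wrap it in one call if you always want summarized output.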
For good measure, we also detect the boundaries of the document: out of the 13 pages, here's the main document, and then the exhibits or schedules that are part of it. Let me bring it up quickly to show you what we're working with; it's just some random thing I found online. It said the main document runs from page 0 to 6, so 0, 1, 2, 3, 4, 5, 6; that seems reasonable. Then we have the start of Schedule 1, which it says is the next two pages. That looks pretty good. Schedule 2 is just the one page, 9 to 9. That looks good. And then Schedule 3 runs through the end of the document, which looks pretty good too. The way we did that under the hood was to take the PDF, convert it to a list of images, and pass each image to a classifier. Then we give the list of those classifications to another DSPy signature that says: given these page classifications, give me the structure of the document as key-value pairs, the name of each section mapped to a tuple of two integers that marks its boundaries. That's what that part does. Going back to city infrastructure, I'll do this one quickly just because it's interesting in how it uses tool calls. While it's running... hold on, I should use the right one.
>> Good question: in the second part, when you generated the list of sections, like the main document from 0 to 6, did you have the original document as an input?
>> No. Let's go to that quickly; that should be the boundary detector. There's a blog post on this that I published probably in August or so that goes into a little more detail.
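In the demo, an LLM turns the per-page classifications into section boundaries. When the page labels are clean, the same output shape (section name mapped to an inclusive `(start, end)` page tuple) can be computed deterministically by grouping consecutive pages. A sketch under that assumption; `detect_boundaries` is a hypothetical helper, and the labels are made up to mirror the 13-page example described above.

```python
def detect_boundaries(page_labels: list[str]) -> dict[str, tuple[int, int]]:
    """Group consecutive pages with the same label into inclusive (start, end) ranges."""
    boundaries: dict[str, tuple[int, int]] = {}
    start = 0
    for i in range(1, len(page_labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(page_labels) or page_labels[i] != page_labels[start]:
            label = page_labels[start]
            if label in boundaries:
                raise ValueError(f"section {label!r} appears in two separate runs")
            boundaries[label] = (start, i - 1)
            start = i
    return boundaries

labels = (
    ["Main document"] * 7
    + ["Schedule 1"] * 2
    + ["Schedule 2"]
    + ["Schedule 3"] * 3
)
print(detect_boundaries(labels))
```

A deterministic pass like this breaks down when per-page labels are noisy (for example, a continuation page labeled as the start of a new section), which is one reason to let a model reconcile the labels and double-check against the page images.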
The code in that post is actually pretty crappy; it's better here. This is probably the main logic. For each image in the PDF, we call classify page and gather the results. It does that asynchronously and pulls back all the page classifications. Then I pass that output into a new signature that says: given a tuple of page and classification (I don't even define it here), give me this relatively complicated output, a dictionary mapping a string to a tuple of two integers. And I give it a set of instructions that just says to detect the boundaries. This is all very non-production code, obviously, but the point is that you can do these kinds of things super quickly. I'm not specifying much or giving it much context, and it worked pretty well; it's worked pretty well in most of my testing. There's obviously a ton of low-hanging fruit for improving and optimizing it. All this does is take that signature and those instructions and call ReAct, and the only tool I give it is the ability to self-reflect by calling get page images. So it says: I'm going to look at this boundary; let me get the page images for these three pages and make sure the boundary is correct. Then it uses that to construct the final answer. This is a perfect example of the tight iteration loop you can have, both while building it and in taking advantage of the model's introspective ability, if you will, to use function calls against the data itself, including the data it generated, to keep that loop going.
>> Question: so under the hood, the beauty of DSPy is that it enforces structured output on a model?
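The per-page fan-out is a standard `asyncio.gather` pattern. A sketch with a hypothetical async `classify_page` stub in place of the real DSPy call; the semaphore, an addition not mentioned in the talk, caps concurrency so a long PDF doesn't fire every request at once.

```python
import asyncio

async def classify_page(page_number: int, image: str) -> tuple[int, str]:
    # Hypothetical stand-in for an async LLM call.
    await asyncio.sleep(0.01)
    label = "Main document" if page_number < 7 else "Schedule"
    return page_number, label

async def classify_all(images: list[str], max_concurrency: int = 4) -> list[tuple[int, str]]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(i: int, img: str) -> tuple[int, str]:
        async with semaphore:  # at most `max_concurrency` calls in flight
            return await classify_page(i, img)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(bounded(i, img) for i, img in enumerate(images)))

images = [f"page_{i}.png" for i in range(13)]
results = asyncio.run(classify_all(images))
print(results[:3])
```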
>> I mean, yes, though I think that's probably reductive of its full potential; generally that's correct. You can use structured outputs directly, but then you have to do a bunch of work to coordinate feeding them into the rest of the program: maybe you want to call a model differently, use XML here, or use a different type of model, whatever it might be. So absolutely, I'm not saying this is the only way to build these applications, or that you shouldn't use Pydantic or structured outputs. You absolutely should. It's just that once you wrap your head around the primitives DSPy gives you, you can very quickly build these kinds of applications. These are prototypes right now, but if you want to take this to the next level, to production scale, you have all the pieces in front of you to do that. Any other questions? I've probably got about five minutes left. Go ahead.
>> Can you talk about your experience using optimization?
>> Yeah. Actually, I'll pull up one I ran right before this. This uses a different algorithm called MIPRO, but basically, for the optimizers, as long as you have well-structured data (for the machine learning folks in the room, which is probably everybody, the quality of your data is obviously very important), you don't necessarily need thousands and thousands of examples, just enough, maybe 10 to 100 inputs and outputs. And if you construct your metrics in a way that's relatively intuitive and accurately describes what you're trying to achieve, the improvement can be pretty significant. For the time entry corrector I mentioned before, you can see the output here. It's iterating through and measuring the output metrics for each of these.
All the way at the bottom, once it goes through all of its optimization, you can see the actual performance of the baseline versus the optimized program: in this case it went from 86 to 89. And interestingly (this one in particular is still in development), you can break it down by metric, so you can see where the program is performing better on certain metrics. That can be really telling as to whether you need to tweak your metric, decompose it, or improve other areas of your dataset or the structure of your program. It's a really nice way to understand what's going on under the hood. And if you don't care about some of these metrics and the optimizer isn't doing as well on them, maybe you can throw them out too. It's a very flexible system.
>> What's the output of the optimization? What do you get out of it, and how do you use that object, whatever it is?
>> The output of the optimizer is almost like a compiled object, if you will. DSPy lets you save and load programs, so the output of the optimizer is just a module that you can serialize and save somewhere, or call later as you would any other module.
>> And it's just manipulating the phrasing of the prompt? What does its solution space look like?
>> Yes. Under the hood, it's literally iterating on the prompt itself. Maybe it adds instructions: "I keep failing on this particular thing, like not capitalizing names correctly, so I need to add an instruction up front telling the model it must capitalize names properly."
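DSPy programs can be saved and loaded (`program.save(path)` and `program.load(path)`), which is what makes an optimized program something you can persist and ship. The plain-Python sketch below only illustrates the general idea of what such a state holds, optimized instructions plus few-shot demos per predictor, round-tripped through JSON. `PredictorState`, the predictor name, and the sample content are hypothetical and do not reproduce DSPy's file format.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class PredictorState:
    instructions: str
    demos: list[dict] = field(default_factory=list)

def save_state(states: dict[str, PredictorState], path: Path) -> None:
    path.write_text(json.dumps({name: asdict(s) for name, s in states.items()}, indent=2))

def load_state(path: Path) -> dict[str, PredictorState]:
    raw = json.loads(path.read_text())
    return {name: PredictorState(**s) for name, s in raw.items()}

optimized = {
    "corrector.predict": PredictorState(
        instructions="Correct the time entry. Capitalize client names properly.",
        demos=[{"entry": "met w/ acme re: contract", "corrected": "Met with Acme regarding contract."}],
    )
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "optimized_program.json"
    save_state(optimized, path)
    restored = load_state(path)
print(restored == optimized)  # True
```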
Chris, whom I mentioned before, has a really good way of putting this, and I'm going to butcher it, but the optimizer is basically finding latent requirements that you might not have specified up front. Based on the data, it's kind of a poor man's deep learning, I guess: it learns from the data what it's doing well and what it's not doing so well, and dynamically constructs a prompt that improves performance on your metrics.
>> And is that LLM-guided? Like, for the capitalization?
>> Yes. The question is whether it's all LLM-guided. Yes, particularly for GEPA, it's using an LLM to improve an LLM's performance: it dynamically constructs new prompts, which are fed into the system and measured, and then it iterates. It's using AI to build AI, if you will.
>> Thank you.
>> Question: why is the solution object not just the optimized prompt?
>> Why is the solution object not what?
>> Not just the optimized prompt.
>> Oh, it absolutely is; you can get it under the hood. The question was why you don't just get the optimized prompt. You absolutely can.
>> What else is there besides the prompt?
>> What else is there other than the prompt? The DSPy object itself, the module. We can probably look at one if we have time.
>> If I could see a dump of the optimized state, that would be interesting.
>> Sure, let me see if I can find one quickly. Fundamentally, at the end of the day, yes, you get an optimized prompt, a string you can dump somewhere if you want.
>> There are a lot of pieces to the signature, right? Like how you describe your fields in the docstring.
>> This is a perfect segue, and I'll conclude right after this.
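A toy version of the loop described here: take candidate instructions, score each against labeled examples with a metric, and keep the best. Real optimizers such as MIPRO and GEPA use an LLM to propose the candidates (GEPA also uses textual feedback); in this sketch the candidates are a fixed hypothetical list, and `fake_model` is a stand-in that only capitalizes names when the instruction asks it to.

```python
def fake_model(instruction: str, name: str) -> str:
    # Stand-in for an LLM: follows the capitalization rule only if the instruction mentions it.
    return name.title() if "capitalize" in instruction.lower() else name

def score(instruction: str, examples: list[tuple[str, str]]) -> float:
    """Fraction of examples where the output exactly matches the gold answer."""
    hits = sum(fake_model(instruction, inp) == gold for inp, gold in examples)
    return hits / len(examples)

examples = [
    ("ada lovelace", "Ada Lovelace"),
    ("alan turing", "Alan Turing"),
    ("Grace Hopper", "Grace Hopper"),
]
candidates = [
    "Extract the person's name.",
    "Extract the person's name. You must capitalize names properly.",
]
best = max(candidates, key=lambda c: score(c, examples))
print(best)
print(round(score(candidates[0], examples), 2), score(best, examples))
```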
I was playing around with this thing called DSPyHub that I created as a repository of optimized programs. If you're an expert in some area and you've optimized an LLM against a dataset, or have a great classifier for city infrastructure images or whatever, then, kind of like Hugging Face, someone can download something that's been pre-optimized. What I have here is the actual loaded program; it's the output of the optimization process. I can call it as I would anything else, and you can see the output here from the optimized program I downloaded from the hub. If we inspect the loaded program, you can see that under the hood it's a Predict object with a string signature of time and reasoning. Here's the optimized prompt, ultimately the output of the optimization process: this long string here, followed by the various specifications and definitions of the inputs and outputs.
>> Have you found specific uses for those? To his question, what can you do with that?
>> It's up to your use case. A document classifier might be a good example. If in my business I come across documents of a certain type, I might optimize a classifier against those and then reuse it on a different project. Say out of 100,000 documents I want to find only the pages that have an invoice on them. Now, 100%, you can use a typical ML classifier to do that, and that's great; this is just an example.
We could also, in theory, optimize a program to do that type of classification, or some kind of text generation, and then the optimized state lives in your data processing pipeline, where you can use it for other purposes or hand it to other teams, whatever it might be. It's up to your particular use case. Something like this hub, maybe it's not useful because everyone's use case is so hyper-specific; I don't really know. But you can do with it whatever you want. Last question.
>> Is using DSPy generally something where people replay data offline to optimize their prompts, or is there a way to do it in real time, given delays? What I mean by delayed is: ChatGPT gives you an answer and you can thumbs-up or thumbs-down it, and that thumbs-up comes 10 minutes later, 30 minutes later, a day later.
>> So is the question more about continuous learning, how you would do that here?
>> You can be the judge.
>> Well, how are you feeding delayed metrics back in to optimize it? Why would it need to be delayed?
>> Because usually the feedback comes from the user, so it's delayed.
>> Yeah, that's right. It would basically be added to the dataset, and you would keep optimizing off of the latest ground-truth dataset.
>> That's right. You collect the outputs, feed them back into your optimization, and the loop continues.
>> Yeah, but that's offline optimization, right?
>> Yes.
>> But I'm asking: can you do this online, with the metric feedback?
>> If you're a good enough engineer, you could probably do it. But I'm not recommending replacing ML models with optimized DSPy programs for particular use cases; maybe classification is a terrible example, I recognize that.
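One way to handle delayed thumbs-up/down signals, consistent with the answer given here (add them to the dataset and re-optimize offline): log each prediction under a request ID, join the feedback whenever it arrives, and trigger a re-optimization once enough new labeled examples have accumulated. `FeedbackStore`, the threshold, and the field names are hypothetical, and the optimization run itself is left as a callback.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class FeedbackStore:
    retrain_after: int
    on_retrain: Callable[[list[dict]], None]
    pending: dict[str, dict] = field(default_factory=dict)  # predictions awaiting feedback
    dataset: list[dict] = field(default_factory=list)       # labeled examples
    new_since_retrain: int = 0

    def log_prediction(self, request_id: str, inputs: dict, output: str) -> None:
        self.pending[request_id] = {"inputs": inputs, "output": output}

    def add_feedback(self, request_id: str, thumbs_up: bool) -> None:
        # Feedback may arrive minutes or days later; unknown IDs are ignored.
        record = self.pending.pop(request_id, None)
        if record is None:
            return
        self.dataset.append({**record, "label": thumbs_up})
        self.new_since_retrain += 1
        if self.new_since_retrain >= self.retrain_after:
            self.on_retrain(self.dataset)  # e.g. kick off an offline optimizer run
            self.new_since_retrain = 0

runs = []
store = FeedbackStore(retrain_after=2, on_retrain=lambda data: runs.append(len(data)))
store.log_prediction("r1", {"q": "Where is my order?"}, "It shipped yesterday.")
store.log_prediction("r2", {"q": "Cancel my plan"}, "Done.")
store.add_feedback("r2", thumbs_up=False)
store.add_feedback("r1", thumbs_up=True)
print(runs)  # [2]
```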
But in theory, yes, you could do something like that. For particular LLM tasks, and I'm sure we all have interesting ones, if you have something relatively well defined, with known inputs and outputs, it might be a candidate worth optimizing. If nothing else, you can transfer it to a smaller model and preserve the level of performance at a lower cost. That's really one of the biggest benefits I see. All right, last, last question. >> I've heard that DSPy can be kind of expensive because you're making all these LLM calls, so I was curious about your experience with that. And relatedly, do you have any experience with large context in your optimization dataset, and ways of shrinking it? >> Yeah. So the question was: can DSPy be expensive, and for large context, how have you seen that handled and how have you managed it? The expense is totally up to you. If you call a function a million times asynchronously, you're going to generate a lot of cost. Maybe DSPy makes it easier to call things, but it's not inherently expensive. To your point, it might add more content to the prompt. Sure, the signature is a string, but the actual text that's sent to the model is much longer than that. That's totally true. I wouldn't say that it's a large cost driver, though. Again, it's ultimately more of a programming paradigm, so you can write your own compressed adapter if you want, one that reduces the amount that's sent to the model. In terms of large context, it's kind of the same answer: if you're worried about that, maybe you have some additional logic, either in the program itself, in an adapter, or in part of the module, that keeps track of it. Maybe you do some context compression or something like that. There have been some really good talks about that over the past few days.
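To make the cost point concrete, a back-of-the-envelope estimate: total cost scales with the number of calls times the tokens per call, and the input tokens per call include whatever the adapter adds around the short signature (instructions, field definitions, demos). The function name, parameters, token counts, and per-million-token prices below are illustrative placeholders, not measurements of DSPy or any provider's actual pricing.

```python
def estimate_cost(
    calls: int,
    content_tokens: int,          # the actual input per call (e.g. the document or question)
    prompt_overhead_tokens: int,  # extra text the adapter wraps around it
    output_tokens: int,           # tokens generated per call
    input_price_per_m: float,     # placeholder price per million input tokens
    output_price_per_m: float,    # placeholder price per million output tokens
) -> float:
    """Rough dollar cost of `calls` LLM calls under the given assumptions."""
    total_input = calls * (content_tokens + prompt_overhead_tokens)
    total_output = calls * output_tokens
    return (total_input * input_price_per_m + total_output * output_price_per_m) / 1_000_000


# A million calls with 100 tokens of prompt overhead each...
print(estimate_cost(1_000_000, 400, 100, 100, 1.0, 4.0))  # 900.0
# ...versus a hypothetical "compressed" adapter that halves the overhead.
print(estimate_cost(1_000_000, 400, 50, 100, 1.0, 4.0))   # 850.0
```

Under these made-up numbers the call volume dominates, and trimming the prompt overhead helps but only at the margin, which matches the point that DSPy itself is not the main cost driver.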
I have a feeling that this will go away at some point, where either context windows get bigger or context management is abstracted away somehow. I don't really have an answer; that's more of an intuition. But again, DSPy gives you the tools, the primitives, to do that should you choose, and to track that state and that management over time. So I think that's it. We're going to get kicked out soon, so thanks so much for your time. Really appreciate it. [music]