DSPy: The End of Prompt Engineering - Kevin Madura, AlixPartners

Channel: aiDotEngineer
Published at: 2026-01-08
YouTube video id: -cKUW6n8hBU
Source: https://www.youtube.com/watch?v=-cKUW6n8hBU
Thanks everybody for joining. I'm
here to talk to you today about DSPy.
Um, and feel free to jump in with
questions or anything throughout the
talk. You know, I don't
plan on spending the full hour and a
half or so. I know it's the last session
of the day. So, um, keep it casual. Feel
free to jump in. I'll start with a
little bit of background. Don't want to
go through too many slides. I'm
technically a consultant, so I have to
do some slides, but we will dive into
the code for the the latter half. And
there's a GitHub repo that you can
download to to follow along and play
around with it on your own.
Um, so how many people here have heard
of DSPy?
Almost everyone. That's awesome. How
many people have actually used it kind
of day-to-day in production or anything
like that? Three. Okay, good. So
hopefully we can convert some more of
you today. Um, so high level DSPy, this
is straight from the website. Um, it's a
declarative framework for how you can
build modular software. And most
important for someone like myself, I'm
not necessarily an engineer that is
writing code all day, every day. As I
mentioned before, I'm more of a
technical consultant. So, I run across a
variety of different problems. Could be
um an investigation for a law firm. It
could be helping a company understand
how to improve their processes, how to
deploy AI internally. Maybe we need to
look through 10,000 contracts to
identify a particular clause or
paragraph. And so DSPy has been a really
nice way for me personally and my team
to iterate really really quickly on
building these applications.
Most importantly building programs. It's
not um it's not kind of iterating with
prompts and tweaking things back and
forth. It is building a proper Python
program, and DSPy is a really good way
for you to do that.
So I mentioned before there's a repo
online if you want to download it now
and kind of just get everything set up.
I'll put this on the screen later on.
Um, but if you want to go here, just
kind of download some of the code. It uh
it's been put together over the past
couple days. So, it's not going to be
perfect production level code. It's much
more of utilities and little things here
and there to just come and kind of
demonstrate the usefulness, demonstrate
the point of of what we're talking about
today in that and we'll walk through all
of these these different use cases. So
um sentiment classifier going through a
PDF some multimodal work uh a very very
simple web research agent detecting
boundaries of a PDF document you'll see
how to summarize basically arbitrary
length text and then go into an
optimizer with GEPA.
But before we do that, just again to kind of
level set: the biggest thing for me
personally is that DSPy is a really nice way to
decompose your logic into a program that
treats LLMs as a first class citizen. So
at the end of the day, you're
fundamentally just calling a function
that under the hood just happens to be
an LLM and DSPy gives you a really nice
intuitive easy way to do that with some
guarantees about the input and output
types. So of course there are structured
outputs, of course there are other ways
to do this, Pydantic
and others. Um, but DSPy has a set of
primitives that when you put it all
together allows you to build a cohesive
modular piece of software that you then
happen to be able to optimize. We'll get
into that uh in a minute.
So, just a few reasons of why I'm such
an advocate. It sits at this
really nice level of abstraction. So,
I would say it doesn't get in
your way as much as a LangChain. And
that's not a knock on LangChain. It's
just a different kind of paradigm in the
way that DSPy is structured. Um, and it
allows you to focus on things that
actually matter. So you're not writing
choices[0].message.content. You're
not doing string parsing.
You're not doing a bunch of stuff under
the hood. You're just declaring your
intent of how you want the program to
operate, what you want your inputs and
outputs to be.
Because of this, it allows you to create
computer programs. As I mentioned
before, not just tweaking strings and
sending them back and forth. You are
building a program first. It just
happens to also use LLMs. And really the
the most kind of important part of this
is that Omar Khattab, the
founder of this, or the original
developer of it, had this really good
podcast with a16z. I think it came out
just like two or three days ago. But
he put it a really nice way. He said
it's built with a systems mindset
and it's really about how you're
encoding or expressing your intent of
what you want to do most importantly in
a way that's transferable. So the
design of your system, I would imagine,
or your program isn't going to move
necessarily as quickly as maybe the
model capabilities are under the hood,
when we see new releases almost every
single day, different capabilities,
better models. And so DSPy allows you to
structure it in a way that retains the
control flow uh retains the intent of
your system, your program um while
allowing you to bounce from model to
model to the extent that you want to or
need to.
Convenience comes for free. There's no
parsing, JSON, things like that. It
again, it sits at a nice level of
abstraction where you can still
understand what's going on under the
hood. If you want to, you can go in and
tweak things, but it allows you to to
kind of focus on just what you want to
do while retaining the level of
precision that I think most of
us would like to have in kind of
building your programs. Um,
as mentioned, it's robust to kind of
model and paradigm shifts. So, you can
again keep the logic of your program
but keep those LLMs infused
basically in line. Now that being
said, you know, there are absolutely
other great libraries out there.
Pydantic AI, LangChain, there's many
many others that allow you to do similar
things. Agno is another one. Um this is
just one perspective and um it may not
be perfect for your use case. For me, it
took me a little bit to kind of grok how
DSPy works and you'll see why that is in
a minute. Um, so I would just recommend
kind of have an have an open mind, play
with it. Um, run the code, tweak the
code, do whatever you need to do. Um,
and just see how it might work, might
work for you. And really, this talk is
more about ways that I found it useful.
It's not a dissertation on the ins and
outs of every nook and cranny of DSPy.
It's more of, you know, I've run into
these problems myself now. I naturally
run to DSPy to solve them. And this is
kind of why. And the hope is that you
can extrapolate some of this to your own
use cases. So we'll go through
everything fairly quickly here, but
the core concepts of DSPy really come
down to arguably five, or these six that
you see on the screen here. So we'll go
into each of these in more detail, but
high level signatures
specify basically what you want your function
call to do. This is when you specify
your inputs, your outputs. Inputs and
outputs can both be typed. Um, and you
defer the rest of the basically the how
the implementation of it to the LLM. And
we'll see how we how that all kind of
comes together uh in a minute. Modules
themselves are ways to logically
structure your program. They're based
off of signatures. So, a module can have
one or more signatures embedded within
it in addition to uh additional logic.
And it's based off of PyTorch
in terms of the
methodology for how it's structured and
you'll you'll see how that uh comes to
be in a minute. Tools we're all familiar
with tools MCP and others and really
tools fundamentally as DSPy looks at
them are just Python functions. So it's
just a way for you to very easily expose
Python functions to the LLM within the
DSPy kind of ecosystem if you will. Um,
adapters
live in between your signature and the
LLM call itself. I mean, as we all know,
prompts are ultimately just strings of
text that are sent to the LLM.
Signatures are a way for you to express
your intent at a at a higher level. And
so, adapters are the things that sit in
between those two. So, it's how you
translate your inputs and outputs into a
format basically explodes out from your
initial signature into a format that is
ultimately the prompt that is sent to
the LLM. And so, you know, there's some
debate or some research on if certain
models perform better with XML as an
example or BAML or JSON or others. And
so adapters give you a nice easy
abstraction to basically mix and
match those at will as you want.
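
As a rough illustration of what an adapter does, the sketch below renders the same signature and inputs into two different prompt strings. It is a toy, not DSPy's implementation: `ToySignature`, `json_style_prompt`, and `xml_style_prompt` are hypothetical names invented here, and the exact prompt layouts are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class ToySignature:
    """Hypothetical stand-in for a signature: instructions plus named fields."""
    instructions: str
    inputs: dict   # field name -> type name
    outputs: dict  # field name -> type name


def json_style_prompt(sig, values):
    """Render the signature and input values as a JSON-flavoured prompt."""
    spec = {"inputs": sig.inputs, "outputs": sig.outputs}
    return (
        f"{sig.instructions}\n"
        f"Schema: {json.dumps(spec)}\n"
        f"Input: {json.dumps(values)}\n"
        "Respond with a JSON object containing the output fields."
    )


def xml_style_prompt(sig, values):
    """Render the same signature and values as an XML-flavoured prompt."""
    lines = [sig.instructions]
    for name, value in values.items():
        lines.append(f"<{name}>{value}</{name}>")
    wanted = ", ".join(f"<{name}>" for name in sig.outputs)
    lines.append(f"Respond using these tags: {wanted}")
    return "\n".join(lines)


sig = ToySignature(
    instructions="Classify the sentiment of the text.",
    inputs={"text": "str"},
    outputs={"sentiment": "int"},
)
values = {"text": "This hotel stinks."}

json_prompt = json_style_prompt(sig, values)
xml_prompt = xml_style_prompt(sig, values)
print(json_prompt)
print("---")
print(xml_prompt)
```

The signature and the calling code stay the same; only the rendering function changes, which is the property the talk attributes to adapters.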
Optimizers um are
the most interesting and for whatever
reason the most controversial part of
DSPy. That's kind of the first thing that
people think of, or at least when they
hear of DSPy they think optimizers. We'll
see a quote in a minute. It's not
optimizers first. It is just a nice
added benefit and a nice capability that
DSPy offers in addition to the ability
to structure your program with the
signatures and modules and everything
else. Um, and metrics are used in tandem
with optimizers; they basically
define how you measure success in
your DSPy program. So the
optimizers use the metrics to determine
if it's finding the right path if you
will.
So signature as I mentioned before it's
how you express your intent your
declarative intent can be super simple
strings and this is the weirdest part
for me initially but is one of the most
powerful parts uh of it now or it can be
more complicated class-based
objects. If you've used Pydantic, that's
basically what it runs on under the
hood.
So this is an example of one of the
class-based signatures. Again, it it's
basically just a Pydantic object.
What's super interesting about this is
that
the
the names of the fields themselves act
almost as like mini prompts. It's part
of the prompt itself. And you'll see how
this comes to life in a minute. But
what's ultimately passed to the model
from something like this is it will say
okay your inputs are going to be a
parameter called text and it's based off
of the name of that particular
parameter in this class. And so these
things are actually passed through. And
so it's it's very important uh to be
able to name your parameters in a way
that is intuitive for the model to be
able to pick it up. Um, and you can add
some additional context or what have you
in the description field here. So most
of this, if not all of this, yes, it is
proper, you know, typed Python code, but
it's also it also serves almost as a
prompt ultimately that feeds into the
model. Um, and that's basically
translated through the use of adapters.
Um, and so just to highlight here like
these, it's the ones that are a little
bit darker and bold, you know, those are
the things that are effectively part of
the prompt that's been sent in, and
you'll see kind of how DSPy works with
all this and formats it in a way that
again allow you to just worry about what
you want. Worry about constructing your
signature instead of figuring out how
best to word something in the prompt. Go
>> ahead.
I have a really good prompt.
>> Sure. Then I don't want this thing.
>> That's exactly right.
>> Sure.
>> So the the question for folks online is
what if I already have a great prompt?
I've done all this work. I'm an
amazing prompt engineer. I don't want my
job to go away or whatever. Um, yes. So,
you can absolutely start with a custom
prompt or something that you have
demonstrated works really well. And
you're exactly right, that can be
done in the docstring itself. There are
some other methods in order
for you to inject basically system
instructions or add additional things at
certain parts of the ultimate prompt and
or of course you can just inject it in
the in the final string anyway. I mean
it's just you know a string that is
constructed by DSPy. So um absolutely
this does not prevent you from adding in
some super prompt that you already have.
Absolutely. Um and to your point it is
it can serve as a nice starting point
from which to build the rest of the
system.
Here's a shorthand version of the same
exact thing, which the first time I
saw it was like baffling to
me. Um, but that's exactly how it
works is that you're basically again
kind of deferring the implementation or
the logic or what have you to DSPy and
the model to basically figure out what
you want to do. So in this case, if I
want a super super simple text uh
sentiment classifier, this is basically
all you need. You're just saying, okay,
I'm going to give you text as an input.
I want the sentiment as an integer as
the output. Now you probably want to
specify some additional instructions to
say okay your sentiment you know a lower
number means negative you know a higher
number is more positive sentiment etc.
But it just gives you a nice kind of
easy way to to kind of scaffold these
things out in a way that you don't have
to worry about like you know creating
this whole prompt by hand. It's like
okay I just want to see how this works
and then if it works then I can add the
additional instructions then I can
create a module out of it or you know
whatever it might be. It is
this shorthand that makes
experimentation and iteration incredibly
quick.
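
To make the shorthand concrete, here is a small sketch of how a string like `"text -> sentiment: int"` could be split into typed input and output fields. This is not DSPy's parser; `Field`, `parse_fields`, and `parse_signature` are hypothetical helpers, and the assumption that untyped fields default to `str` is mine. It also does not handle types that contain commas.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    type_name: str = "str"  # assumption: untyped fields are treated as strings


def parse_fields(part):
    """Turn 'a, b: int' into a list of Field objects."""
    fields = []
    for chunk in part.split(","):
        name, _, type_name = chunk.partition(":")
        fields.append(Field(name.strip(), type_name.strip() or "str"))
    return fields


def parse_signature(spec):
    """Split 'inputs -> outputs' into (input_fields, output_fields)."""
    left, arrow, right = spec.partition("->")
    if not arrow:
        raise ValueError("signature must contain '->'")
    return parse_fields(left), parse_fields(right)


inputs, outputs = parse_signature("text -> sentiment: int")
print(inputs)
print(outputs)
```

The point is only that the field names and types in the one-line string carry enough structure to build a prompt from, which is why naming the fields well matters.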
So modules: it's that base abstraction
layer for DSPy programs. There are a
bunch of modules that are built in and
these are a collection of kind of
prompting techniques if you will and you
can always create your own module. So to
the question before, if you have
something that you know works really
well, sure yeah, put it in the module.
That's now the kind of the base
assumption, the base module that others
can build off of. And all of DSPy is
meant to be composable, optimizable, and
when you deconstruct your business logic
or whatever you're trying to achieve by
using these different primitives, it all
it's intended to kind of fit together
and flow together. Um, and we'll get to
optimizers in a minute, but at least for
me and my team's experience, just being
able to logically separate the different
components of a program, but basically
inlining uh LLM calls has been
incredibly powerful for us. And it's
just an added benefit that at the end of
the day, because we're just kind of in
the DSPy paradigm, we happen to also be
able to optimize it.
Uh, so it comes with a bunch of
standard ones built in. I don't use
some of these bottom ones as much,
although they're super interesting.
Um, the base one at the top there is just
dspy.Predict.
That's literally just, you know, an LM
call. That's just a vanilla call.
Chain of thought probably isn't
as relevant anymore these days because
models have kind of ironed those out, but
um it is a good example of the types of
um kind of prompting techniques that can
be built into some of these modules um
and basically all this does is add
some of the strings from the
literature to say, okay, let's think step
by step, or whatever that might be. Same
thing for ReAct and CodeAct. ReAct is
basically the way that you expose the
tools to the model. So, it's wrapping
and doing some things under the hood
with um basically taking your signatures
and uh it's injecting the Python
functions that you've given it as tools
and basically ReAct is how you do tool
calling in DSPy.
Program of Thought is pretty
cool. It kind of forces the model to
think in code and then will return the
result. Um, and it
comes with a Python interpreter built
in, but you can give it some custom one,
some type of custom harness if you
wanted to. Um, I haven't played with
that one too too much, but it is super
interesting. If you have like a highly
technical problem or workflow or
something like that where you want the
model to inject reasoning in code at
certain parts of your pipeline, that's
that's kind of a really easy way to do
it. And then some of these other ones
are basically just different
methodologies for comparing outputs or
running things in parallel.
So here's what one looks like. Again,
it's it's fairly simple. It's, you know,
it is a Python class at the end of the
day. Um, and so you do some initial
initialization up top. In this case,
you're seeing the
shorthand signature up there. So,
this module, just to give you some
context, is an excerpt from one of
the Python files that's in the repo. It
is basically taking in a bunch of time
entries and making sure that they adhere
to certain standards, making sure that
things are capitalized properly or that
there are periods at the end of the
sentences or whatever it might be.
That's from a real client use case where
they had hundreds of thousands of time
entries and they needed to make sure
that they all adhere to the same format.
This was one way to to kind of do that
very elegantly, at least in my opinion.
Up top you can define the
signature. It's adding some
additional instructions that were
defined elsewhere and then saying for
this module the change tense call
is going to be just a vanilla predict
call. And then when you actually call
the module, you enter into the forward
function, where you can basically
intersperse the LLM call, which would be
the first one and then do some kind of
hard-coded business logic beneath it.
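
The shape described here, a model call followed by hard-coded rules inside `forward`, can be sketched without any LLM at all. The code below is an assumption-laden toy, not the file from the repo: `ToyModule`, `fake_change_tense`, and `TimeEntryCleaner` are hypothetical names, and the model call is replaced by a plain string substitution.

```python
class ToyModule:
    """Hypothetical base class: calling the module runs forward()."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


def fake_change_tense(entry):
    """Stand-in for the LLM-backed predictor; a real one would call a model."""
    return entry.replace("reviewing", "reviewed")


class TimeEntryCleaner(ToyModule):
    def __init__(self, change_tense=fake_change_tense):
        # In the talk this slot is a plain predict call built from a signature.
        self.change_tense = change_tense

    def forward(self, entry):
        # Step 1: the (stubbed) model call.
        text = self.change_tense(entry).strip()
        # Step 2: deterministic business rules applied afterwards.
        if text and not text[0].isupper():
            text = text[0].upper() + text[1:]
        if text and not text.endswith("."):
            text += "."
        return text


cleaner = TimeEntryCleaner()
result = cleaner("reviewing contract for indemnity clause")
print(result)
```

Keeping the model call in one named slot means it can later be swapped for a different predictor without touching the surrounding rules.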
Uh tools as I mentioned before these are
just vanilla kind of Python functions.
It's DSPy's tool interface. So under
the hood, DSPy uses LiteLLM. And so
there needs to be some kind of coupling
between the two, but fundamentally um
any type of tool that you
would use elsewhere, you can also use
in DSPy. And this is probably obvious to
most of you, but here's just an example.
You have two functions, get weather,
search web. You include that with a
signature. So in this case, I'm saying
the signature is I'm going to give you a
question. please give me an answer. I'm
not even specifying the types. It's just
going to infer what that means. Uh I'm
giving it the get weather and the search
web tools and I'm saying, okay, do your
thing, but only go five rounds just so
it doesn't spin off and do something
crazy. And then a call here is literally
just calling the ReAct agent that I
created above with the question, what's
the weather like in Tokyo? We'll see an
example of this in the code session, but
basically what this would do is give the
model the prompt, the tools, and let it
do its thing.
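
A minimal sketch of the loop behind this kind of agent is below: the model picks a tool, the tool's result is fed back, and the loop stops when the model finishes or the iteration budget runs out. Everything here is hypothetical and stubbed (`scripted_policy` replaces the model, and the tools return canned strings); it is meant to show the control flow only, not how DSPy implements ReAct.

```python
def get_weather(city):
    """Toy tool: returns canned weather instead of calling a real API."""
    return f"Sunny and 22C in {city}"


def search_web(query):
    """Toy tool: returns a canned search result."""
    return f"No results for {query!r}"


TOOLS = {"get_weather": get_weather, "search_web": search_web}


def scripted_policy(question, history):
    """Stand-in for the model: picks a tool first, then finishes."""
    if not history:
        return {"tool": "get_weather", "arg": "Tokyo"}
    return {"finish": history[-1][1]}


def run_agent(question, policy, tools, max_iters=5):
    """Alternate between choosing an action and observing its result."""
    history = []  # list of (tool_name, observation) pairs
    for _ in range(max_iters):
        step = policy(question, history)
        if "finish" in step:
            return step["finish"], history
        observation = tools[step["tool"]](step["arg"])
        history.append((step["tool"], observation))
    # Budget exhausted: return the last observation rather than looping forever.
    return (history[-1][1] if history else None), history


answer, trace = run_agent("What's the weather like in Tokyo?", scripted_policy, TOOLS)
print(answer)
```

The `max_iters` cap plays the role of the "only go five rounds" limit mentioned in the talk.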
So adapters, I covered this a
little bit before; they're basically prompt
formatters, if you will. So the
description from the docs probably says
it best. It's you know it takes your
signature the inputs other attributes
and it converts them into some type of
message format that you have specified
or that the adapter has specified and so
as an example the JSON adapter taking
say a Pydantic object that we defined
before this is the actual prompt that's
sent into the LLM and so you can see the
input fields so this would have been
defined as okay clinical note type
string, patient info as a patient details
object, which would have
been defined elsewhere, and then this is
the definition of the patient info. It's
basically a JSON dump of that Pydantic
object. Go ahead.
>> So this idea there's like a base adapter
default that's good for most cases and
this is if you want to tweak that to do
something more specific.
>> That's right.
>> Yeah. The question was if if there's a
base adapter and would this be an
example of where you want to do
something specific? Answer is yes. So um
there's a guy, Prashanth, um, I have his
Twitter at the end of this presentation,
but he's been great. He
did some testing comparing the JSON
adapter with the BAML adapter. Um and
you can see just intuitively even even
for us humans the way that this is
formatted is a little bit more
intuitive. It's probably more token
efficient too just considering like if
you look at the messy JSON that's here
versus the I guess slightly better
formatted BAML that's here. It can
actually improve performance by, you know,
five to 10 percent depending on your use
case. So it's a good example of how you
can format things differently. The the
rest of the program wouldn't have
changed at all. You just specify the
BAML adapter and it totally changes how
the information is presented under the
hood to the LLM.
Multimodality: I mean, this obviously is
more at the model level, but DSPy
supports multiple modalities by default.
So images, audio, some others. Um, and
the same type of thing, you kind of just
feed it in as part of your signature and
then you can get some very nice clean
output. This allows you to work with
them very, very, very easily, very
quickly. And for those, uh, eagle-eyed
participants, you can see the first
line up there is attachments. It's
probably a lesser-known library.
Another guy on Twitter who is awesome,
Maxim, I think it is, he created this
library that just is basically a
catch-all for working with different
types of files and converting them into
a format that's super easy to use with
LLMs. Um, he's a big DSPy fan as well. So
he made basically an adapter that's
specific to this. But that's all it
takes to pull in images, PDFs, whatever
it might be. You'll see some examples of
that and it just makes at least has made
my life super super easy.
Here's another example of the same sort
of thing. So this is a PDF of a form 4
form. So, you know, public SEC form from
Nvidia.
Um, up top I'm just giving it the link.
I'm saying, okay, attachments, do your
thing. Pull it down, create images,
whatever you're going to do. I don't
need to worry about it. I don't care
about it. This is super simple RAG, but
basically, okay, I want to do RAG over
this document. I'm going to give you a
question. I'm going to give you the
document and I want the answer. Um, and
you can see how simple that is.
Literally just feeding in the document.
How many shares were sold? Interestingly
here, I'm not sure if it's super easy to
see, but you actually have two
transactions
here. So, it's going to have to do some
math likely under the hood. And you can
see here the thinking and the the
ultimate answer. Go ahead.
>> Is it on the RAG step? Is it creating a
vector store of some kind or creating
embeddings and searching over those? Is
there a bunch going on in the background
there or what?
>> This is poor man's RAG. I should have
clarified. This is this is literally
just pulling in the document images and
I think attachments will do some basic
OCR under the hood. Um, but it doesn't
do anything other than that. That's it.
All we're feeding in here, the the
actual document object that's being fed
in, yeah, is literally just the text
that's been OCR'd and the images; the model
does the rest.
All right, so optimizers. Uh,
let's see how we're doing. Okay. Um
optimizers are super powerful, super
interesting concept. There's been some
research that argues, I think, that
it's just as performant, if not in
certain situations more performant,
than fine-tuning would be for certain
models for certain situations. there's
all this research about in context
learning and such. And so whether you
want to go fine-tune and do all of that,
nothing stops you. But I would recommend
at least trying this first to see how
far you can get without having to set up
a bunch of infrastructure and, you know,
go through all of that. See how the
optimizers work. Um, but fundamentally
what it allows you to do is DSPy gives
you the primitives that you need and the
organization you need to be able to
measure and then quantitatively improve
that performance. And I mentioned
transferability before. The
transferability is enabled arguably
through the use of optimizers because if
you can get okay I want to I have the
classification task works really well
with 41 but maybe it's a little bit
costly because I have to run it a
million times a day. Can I try it with
41 nano? Okay, maybe it's at 70%
whatever it might be. But I run the
optimizer on 41 nano and I can get the
performance back up to maybe 87%. maybe
that's okay for my use case, but I've
now just dropped my cross my cost
profile by multiple orders of magnitude.
And it's the optimizer that allows you
to do that type of model and kind of use
case transferability, if you will. But
really all it does at the end of the
day under the hood is
iteratively optimize or tweak that
prompt, that string under the hood. And
because you've constructed your program
using the different modules, DSPy kind
of handles all of that for you under the
hood. So if you compose a program with
multiple modules and you're optimizing
against all that, by itself DSPy
will optimize the various components in
order to improve the input and output
performance.
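
The simplest possible version of "tweak the prompt, measure, keep what works" is a best-of-N selection over candidate instructions, sketched below. This is deliberately not GEPA or any other DSPy optimizer (real optimizers propose new candidates from feedback rather than choosing from a fixed list). `fake_model`, `accuracy`, and `pick_best_instruction` are hypothetical, and the stubbed model's behaviour is invented purely so the example is deterministic.

```python
def fake_model(instruction, text):
    """Stand-in for a small model: only handles negatives when told about them."""
    if "negative" in instruction and "stinks" in text:
        return 1
    return 8


def accuracy(instruction, dataset):
    """Fraction of examples where the stubbed model matches the label."""
    hits = sum(fake_model(instruction, text) == label for text, label in dataset)
    return hits / len(dataset)


def pick_best_instruction(candidates, dataset):
    """Score every candidate prompt and keep the highest-scoring one."""
    scored = [(accuracy(c, dataset), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])


dataset = [("This hotel stinks.", 1), ("I'm feeling pretty happy.", 8)]
candidates = [
    "Rate the sentiment.",
    "Rate the sentiment; lower is more negative, higher is more positive.",
]
best_score, best_instruction = pick_best_instruction(candidates, dataset)
print(best_score, best_instruction)
```

Even in this toy form the two ingredients the talk names are visible: a program whose prompt text can vary, and a metric over a dataset that says which variant is better.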
And we'll we'll take it from the man
himself, Omar. You know, DSPy is not an
optimizer. I've said this multiple
times. it's it's just a set of
programming abstractions or a way to
program. You just happen to be able to
optimize it. Um so again, the value that
I've gotten and my team has gotten is
mostly because of the programming
abstractions. It's just this incredible
added benefit that you are also able to
to should you choose to to optimize it
afterwards.
And I was listening to Dwarkesh
and Karpathy the other day, and
I was like prepping for
this talk and this like hit home
perfectly. I was thinking about the
optimizers, and someone smarter than me
can, you know, please correct me,
but I think this makes sense because
he was basically talking about how using an LLM
as a judge can be a bad thing because
the model being judged can find
adversarial examples and degrade the
performance or basically um create a
situation where the judge is not uh not
scoring something properly. um because
he's saying that the model will find
these little cracks. It'll find these
little spurious things in the nooks and
crannies of the giant model and find a
way to cheat it. Basically saying that
LLM as a judge can only go so far until
the other model finds those
adversarial examples. If you kind of
invert that and flip that on its head,
it's this property that the optimizers
for DSPy are taking advantage of to
optimize to find the nooks and crannies
in the model, whether it's a bigger
model or smaller model, to improve
the performance against your data set.
So that's what the optimizer is doing is
finding finding these nooks and crannies
in the model to optimize and improve
that performance.
So a typical flow, I'm not going to
spend too much time on this, but fairly
logical: construct your program, which is
decomposing your logic into the modules.
You use your metrics to define basically
the contours of how the program works
and you optimize all that through to
get your final result.
So, another talk that this guy Chris
Potts just had maybe two days ago, um,
where he made the point, this is what I
was mentioning before, where GEPA,
which, uh, you probably saw some of
the talks the other day, um, where
the optimizers are on par or exceed the
performance of something like GRPO,
another kind of fine-tuning method. So,
pretty impressive. I think it's an
active area of research. people a lot
smarter than me like Omar and Chris and
others are are leading the way on this.
But uh point being I think
prompt optimization is a pretty exciting
place to be and if nothing else is worth
exploring.
And then finally metrics.
again these are kind of the building
blocks that allow you to define what
success looks like for the optimizer. So
this is what it's using and you can have
many of these and we'll see examples of
this where again at a high level your
program works on inputs it works on
outputs the optimizer is going to use
the metrics to understand okay my last
tweak in the prompt did it improve
performance, or did it degrade
performance and the way you define your
metrics uh provides that direct feedback
for the optimizers to work on.
Uh so here's another example, a super
simple one from that time entry example
I mentioned before. Um, so they can be
the metrics can either be like fairly
rigorous in terms of like does this
equal one or or you know some type of
equality check or a little bit more
subjective, where you're using an LLM as a judge to
say, okay, this generated
string, does it adhere to these, you know,
various criteria whatever it might be
but that itself can be a metric
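
Both kinds of metric can be written as ordinary functions. The sketch below shows a strict equality check and a softer rule-based score; the `(example, pred, trace=None)` argument order follows the convention DSPy metrics commonly use, but the field names (`sentiment`, `cleaned`) and the specific formatting rules are assumptions made for illustration. An LLM-as-judge metric would have the same shape, with the rule checks replaced by a model call.

```python
from types import SimpleNamespace


def exact_match(example, pred, trace=None):
    """Strict metric: 1.0 only if the predicted label equals the gold label."""
    return float(example.sentiment == pred.sentiment)


def format_score(example, pred, trace=None):
    """Softer metric: fraction of formatting rules the cleaned entry satisfies."""
    text = pred.cleaned
    checks = [
        text[:1].isupper(),   # starts with a capital letter
        text.endswith("."),   # ends with a period
        "  " not in text,     # no double spaces
    ]
    return sum(checks) / len(checks)


gold = SimpleNamespace(sentiment=1)
good = SimpleNamespace(sentiment=1, cleaned="Reviewed contract.")
bad = SimpleNamespace(sentiment=8, cleaned="reviewed  contract")

print(exact_match(gold, good), exact_match(gold, bad))
print(format_score(gold, good), format_score(gold, bad))
```

Returning a number rather than a pass/fail lets an optimizer tell a near miss from a complete failure.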
and so all of this is to say it's a very
long-winded way of saying in my opinion
this is probably most if not all of what
you need to construct arbitrarily
complex workflows, data processing
pipelines, business logic, whatever that
might be. Different ways to work with
LLMs. If nothing else, DSPy gives you
the primitives that you need in order to
build these modular composable systems.
So, if you're interested in some people
online, um
there's many many more. There's a
Discord community as well. Um, but
usually these people are on top of
the latest and greatest and so would
recommend giving them a follow. You
don't need to follow me. I don't really
do much, but uh the others on there are
are really pretty good.
Okay, so the fun part, we'll actually
get into some to some code. So, if you
haven't had a chance, now's your last
chance to get the repo.
Uh, but I'll just kind of go through a
few different examples here of what we
talked about. Maybe
Yeah. Okay.
Okay. So, I'll set up Phoenix, which is
from Arize, uh, which is basically
an observability platform. Uh, I
just did this today, so I don't know if
it's going to work or not, but we'll
we'll see. We'll give it a shot. Uh, but
basically what this allows you to do is
have a bunch of observability and
tracing for all the calls that are
happening under the hood. We'll see if
this works. We'll give it like another 5
seconds.
Um, but it should, I think,
automatically do all this stuff for me.
Yeah. So, let's see.
Yeah. All right. So, something's up.
Okay, cool. So,
I'll just I'm just going to run through
the notebook, which is a collection of
different use cases, basically putting
into practice a lot of what we just saw.
Feel free to jump in any questions,
anything like that. We'll start with
this notebook. There's a couple of other
uh more proper Python programs that
we'll walk through afterwards. Uh but
really the intent is a rapidfire review
of different ways that DSPy has been
useful to me and others. So
load in the .env file. Usually I'll have
some type of config object like this
where I can very easily use these later
on. So I'll do, call it, model
mixing. So if I have like a super hairy
problem or like some workload I know
will need the power of a reasoning model
like GPT-5 or something else like that,
I'll define multiple LMs. So like one
will be 4.1, one will be 5, maybe I'll
do a 4.1 nano, um, you know, Gemini 2.5
flash, stuff like that. And then I can
kind of intermingle or intersperse them
depending on what I think or what I'm
reasonably sure the workload will be.
and you'll see how that comes into play
in terms of classification and others.
Um, I'll pull in a few others here. I'm
I'm using OpenRouter for this. So, if
you have an OpenRouter API key, I would
recommend plugging that in. So, now
I have three different LLMs I can work
with. I have Claude, I have Gemini, I
have 4.1 mini. And then I'll ask
basically for each of them who's best
between Google Anthropic OpenAI. All of
them are hedging a little bit. They say
subjective, subjective, undefined. All
right, great. It's not very helpful. But
because DSPI works on Pyantic, I can
define the answer as a literal. So I'm
basically forcing it to only give me
those three options and then I can go
through each of those. And you can see
each of them, of course, chooses their
own organization. Um, the reason that
those came back so fast
is that DSPy has caching built in under
the hood. So as long as nothing has
changed in terms of your
signature definitions, or basically if
nothing has changed, and this is super useful
for testing, it will just load it from
the cache. I ran this before,
that's why those came back so quickly.
But that's another kind of super useful
piece here. Let's see.
Okay.
Make sure we're up and running. So, if I
change this to hello
with a space,
you can see we're making a live call.
Okay, great. We're still up. So, super
simple sentiment classifier.
Obviously, this can be built into
something arbitrarily complex. Make this
a little bit bigger. Um, but I'm
basically I'm giving it the text, the
sentiment that you saw before, and I'm
adding that additional specification to
say, okay, lower uh is more negative,
higher is more positive. I'm going to
define that as my signature. I'm going
to pass this into just a super simple
predict object.
And then I'm going to say, okay, well,
this hotel stinks. Okay, it's probably
pretty negative. Now, if I flip that to
I'm feeling pretty happy. Whoops.
Good thing I'm not in a hotel right now.
Um, you can see I'm feeling pretty happy.
Comes down to eight. And this might not
seem that impressive and you know it's
it's not really, but the
important part here is that it just
demonstrates the use of the shorthand
signature. So I have the string,
I have the integer, I pass in the custom
instructions, which would be in the
docstring if I used the class-based
method. The other interesting or
useful part is that DSPy comes with a
bunch of usage information built in. So
um because it's cached, it's going to be
an empty object.
But when I change it, you can see that
I'm using Azure right now, but for each
call, you get this nice breakdown, and I
think it's from LiteLLM, but it allows you
to very easily track your usage, token
usage, etc. for observability and
optimization and everything like that.
Just nice little tidbits uh that are
part of it here and there. Make this
smaller.
Uh we saw the example before in the
slides, but I'm going to pull in that
Form 4
from online. I'm going to create this
doc object using Attachments. You can
see some of the stuff it did under the
hood. So, it pulled in pdfplumber.
It created markdown from it, pulled out
the images, etc. Again, I don't have to
worry about all that. Attachments makes
that super easy. I'm going to show you
what we're working with here. In this case,
we have the Form 4. And then I'm
going to do that poor man's RAG that I
mentioned before. Okay, great. How many
shares were sold in total? It's
going to go through that whole chain of
thought and bring back the response.
That's all well and good, but the power
in my mind of DSPy is that you can have
these arbitrarily complex data
structures. That's fairly obvious
because it uses Pydantic and everything
else, but you can get a little creative
with it. So in this case, I'm going to
say, okay, a different type of document
analyzer signature. I'm just going to
give it the document and then I'm just
going to defer to the model on defining
the structure of what it thinks is most
important from the document. So in this
case, [clears throat] I'm defining a
dictionary object and so it will
hopefully return to me a series of key
value pairs that describe important
information in the document in a
structured way. And so you can see here
again this is probably cached uh but I
passed in I did it all in one line in
this case but I'm saying I want to do
chain of thought using the document
analyzer signature and I'm going to pass
in the input field which is just the
document here. I'm going to pass in the
document that I got before. And you can
see here it pulled out bunch of great
information in the super structured way.
And I didn't have to really think about
it. I just kind of deferred all this to
the model, to DSPy, for how to do this.
Now, of course, you can do the inverse
in saying, okay, I have a very specific
business use case. I have something
specific in terms of the formatting or
the content that I want to get out of
the document. I define that as just kind
of your typical Pydantic classes. So in this
case I want to pull out, if there are
multiple transactions, the schema itself,
important information like the filing
date.
I'm going to define the document analyzer
schema signature. Uh again super simple
input field which is just the document
itself which is parsed by attachments
gives me the text and the images and
then I'm passing in the document schema
parameter which has the document schema
type which is defined above and this is
the this is effectively what you would
pass into structured outputs, but just
doing it the DSPy way, where it's going to
give you basically the output in
that specific format. So you can see
pulled out things super nicely. Filing
date, form date, form type, transactions
themselves, and then the ultimate
answer. [clears throat] And it's nice
because it exposes it in a way that you
can use dot notation. So you can just
very quickly access the the resulting
objects.
So looking at adapters, I'll use
another little tidbit from DSPy, which
is inspect_history. So for those who
want to know what's going on under the
hood, inspect_history will give you the
raw dump of what's actually going on. So
you can see here the system message that
was uh constructed under the hood was
all of this. So you can see the input fields
are document, the output fields are reasoning
and the schema. It's going to pass these
in. And then you can see here the actual
document content that was extracted and
put into the text and into the prompt uh
with some metadata. This is all
generated by attachments. And then you
get the response which follows this
specific format. So you can see the
different fields that are here. And it's
this kind of relatively arbitrary
response format for
the field names, which is then parsed by
DSPy and passed back to you as the user.
So I can do response.document_schema
and get the actual result.
To show you what the BAML adapter looks
like, we can basically do two different
calls. So this is an example from uh my
buddy Pashant uh online again. So what
we do here is define pyantic model super
simple one. Patient address and then
patient details. Patient details has the
patient address object within it. And
then we're going to say we're going to
create a super simple DSPI signature to
say taking a clinical note which is a
string. The patient info is the output
type. And then note so I'm going to run
this two different ways. The first time
with the smart LLM that I mentioned
before and just use the the built-in
adapter. So I don't specify anything
there. And then the second one will be
using the BAML adapter which which is
defined there. Um so I guess a few
things going on here. One is the ability
to use Python's context managers, which are
the lines starting with "with", which
allow you to basically break out of what
the global LLM has been defined as
and use a specific one just for that
call. So you can see in this case I'm
using the same LM but if I want to
change this to like LM anthropic or
something
I think that should work. Um, but
basically what that's doing is just
offloading that call to the other
whatever LLM that you're defining
[clears throat] for that particular call
and something happened. And I'm on a
VPN, so let's kill that.
Sorry, AlixPartners.
Okay.
Okay, great. So, we had two separate
calls. One was to the smart LLM, which
is I think 4.1. The other one was to
Anthropic. Everything else is the
exact same. The note's the exact same, etc.
We got the same exact output. That's
great. But what I wanted to show here is
the adapters themselves. So in this
case, I'm doing inspect_history with n equals
2. So I'm going to get both of the last
two calls. And we're going to see how
the prompts are going to be different.
And so you can see here the first one,
this is the built-in JSON schema, this
crazy long JSON string. Yeah, LLMs are
good enough to to handle that, but um
you know, probably not for super
complicated ones. Um uh and then you see
here for the the second one, it uses the
BAML notation, which as we saw in the
slides, a little bit easier to
comprehend, and on super complicated
use cases it can actually give a measurable
improvement.
Multimodal example, same sort of thing
as before. I'll pull in the image
itself.
Let's just see what we're working with.
Okay, great. We're looking at these
various street signs.
And I'm just going to ask it super
simple question. It's this time of day.
Can I park here now? When when should I
leave? And you can see I'm just passing
in again the super simple um shorthand
for defining a signature, from which I
get out the boolean in this
case and a string of when I can leave.
Um
so modules themselves it's again fairly
simple. You just kind of wrap all this
in a class. Good question.
>> So does it return reasoning by default
always?
>> Oh good question. Yeah. So when you do
>> can you repeat the question?
>> Yes. So for those online the question
was does it always return reasoning by
default? When you call dspy.ChainOfThought,
it's part of the module where
it's built in. It's adding the reasoning
automatically into your response. So
you're not defining that. It's a great
question. It's not defined in the
signature as you can see up here. Uh but
it will add that in and expose that to
you um to the extent that you want to
retain it for any you know any reason.
But if I changed
this to Predict,
you wouldn't get that same response,
right? You just you literally just get
that part.
Um so that's actually a good segue to
the modules. So a module is basically
just wrapping all that into some type of
replicable logic. And so
we're giving it the signature
here. We're saying self.predict.
This case is just a
demonstration of how it's being used as
a class. So I'll just add this module
identifier and some sort of counter,
but this can be any type of arbitrary
business logic or control flow or any
database action or whatever it might be.
When this image analyzer class is called
this function would run um and then when
you actually invoke it this is when it's
actually going to run the the core
logic. And so you can see I'm
instantiating
the analyzer with AIE123
and then I'll call it.
Great. It called that and you can see
the counter incrementing each time I
actually make the call. So super simple
example. Um we don't have a ton of time
but I'll I'll show you some of the other
modules and how that kind of works out.
In terms of tool calling, it's fairly
straightforward. I'm going to define two
different functions, Perplexity search
and get URL content, creating a bio agent
module. So this is going to define
Gemini 2.5 as this particular module's
LLM. It's going to create an answer
generator object, which is a ReAct call.
So I'm going to basically do tool
calling whenever this is called and then
the forward function is literally just
calling that answer generator with the
parameters that are provided to it. And
then I'm creating an async version of
that function as well.
So I can do that here. I'm going to say
okay identify instances where a
particular person has been at their
company for more than 10 years. It needs
to do tool calling to do this to get the
most up-to-date information. And so what
this is doing and basically looping
through um and it's going to call that
bio agent which is using the tool calls
in the background and it will make a
determination as to whether their
background is applicable per my
criteria. In this case, Satya is true.
Brian should be false. But what's
interesting here, while that's going, is that
similar to
the reasoning object that you get back
for chain of thought, you can get a
trajectory back for things like ReAct.
So you can see what tools it's calling
the arguments that are passed in um and
the observations for each of those calls
which is nice for debugging and and
other obviously other uses.
Um I want to get to the other content so
I'm going to speed through the rest of
this. This is basically an async version
of the same thing. So you would run both
of them in parallel. Same idea.
Um I'm going to skip the GEPA example
here just for a second. Um I can show
you what the output looks like, but
basically what this is doing is creating
a data set.
It is showing you what's in the data
set. It's creating a variety of
signatures. In this case, it's going to
create a system that categorizes and
classifies different basically help
messages that are part of the data
set. So, my sink is broken or my light
is out or whatever it is. They want to
classify whether it's positive, neutral,
or negative and the uh the urgency of
the actual message. It's going to
categorize it and then it's going to
pack all this stuff, all those different
modules into a single support analyzer
module. And then from there, what it's
going to do is define a bunch of metrics
which is based off of the data set
itself. So it's going to say, okay, how
do we score the urgency? This is a a
very simple one where it's okay, it
either matches or it doesn't. Um, and
there's other ones where it can be a
little bit more subjective and then you
can run it. This going to take too long.
Probably takes 20 minutes or so. Um but
uh what it will do is basically evaluate
the performance of the base model and
then apply those metrics uh and
iteratively come up with new prompts to
uh to create that.
Now I want to pause here just for a
second because there's different types
of metrics and in particular for Jeepa
it uses feedback from the teacher model
in this case. So it can work with the
same level of model, but in particular
when you're trying to use say a smaller
model, um it can actually provide
textual feedback. So, it says not only
did you get this classification wrong,
but it's going to give you some
additional um information or feedback as
you can see here for why it got it wrong
or what the answer should have been,
which allows it, and you can read the
paper, but it basically allows it to
iteratively find that kind of Pareto
frontier of how it should tweak the
prompt to optimize it based off that
feedback. It basically just tightens
that iteration loop.
Um you can see there's a bunch here. Um
and then you can run it and see how it
works. [snorts] Um but kind of just to
give you a concrete example of how it
all comes together. So we took a bunch
of those examples from before. We're
basically going to do a bit of
categorization. So I have things like
contracts, I have images, I have
different things that one DSPy program
can comprehend and do some type of
processing with. So this is something
that we see fairly regularly in terms of
we might run into a client situation
where they have just a big dump of of
files. They don't really know what's in
it. They want to find something of uh
they want to maybe find SEC filings and
process them a certain way. they want to
find contracts and process those a
certain way. Maybe there's some images
in in there and they want to process
those a certain way. Uh [snorts] so this
is an example of how you would do that
where if I start at the bottom here,
this is a regular Python file. Um and it
uses DSPy to do all those things I just
mentioned. So we're pulling in the
configurations,
we're setting the regular LM, the small
one, and one we use for images. As an
example, Gemini models
might be better at image recognition
than others. So I might want to defer or
use a particular model for a particular
workload. So if I detect an image, I
will route the request to Gemini. If I
detect something else, I'll route it to
a 4.1 or whatever it might be.
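Editor's sketch of routing by workload, in plain Python with no LLM calls. The model names and the suffix-based image check are assumptions used only to illustrate the idea.

```python
from pathlib import Path

# Hypothetical model names, used only to illustrate the routing idea.
MODEL_FOR_WORKLOAD = {
    "image": "gemini-2.5-flash",
    "default": "gpt-4.1",
}

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def detect_workload(path: str) -> str:
    """Label a file as an image workload or a default (text) workload."""
    suffix = Path(path).suffix.lower()
    return "image" if suffix in IMAGE_SUFFIXES else "default"


def pick_model(path: str) -> str:
    """Return the name of the model that should handle this file."""
    return MODEL_FOR_WORKLOAD[detect_workload(path)]
```

In a DSPy program the returned name would select which LM object to pass to dspy.context for that call.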
So I'm going to process a single file.
And what it does is use our handy
attachments
um library to put it into a format that
we can use. And then I'm going to
classify it. And it's not super obvious
here, but I'm getting a file type from
this classify file uh function call. And
then I'm doing some different type of
logic depending on what type of file it
is. So if it's an SEC filing, I do
certain things. If it's a certain type
of SEC filing, I do something else. Uh,
if [snorts] it's a contract, maybe I'll
summarize it. If it's something that
looks like city infrastructure, in this
case, the image that we saw before, I
might do some more visual interpretation
of it. Um, so if I dive into classify
file super quick,
it's running the document classifier.
And all that is is basically doing a
predict on the image from the file and
making sure
it returns a type. Where is this?
It returns a type, which would be document
type, and so you can see here at the end
of the day it's a fairly simple
signature and so what we've done is
basically take the PDF file in this case
take all the images from it and take the
first image or first few images in this
case a list of images as the input field
and I'm saying okay just give me the
type what is this and I'm giving it an
option of these document types so
obviously this is a fairly simple
use case, but it's basically saying: given
these three images, the first three pages
of a document, is it an SEC filing, is it
a patent filing, is it a contract, city
infrastructure? Pretty different things,
so the model really shouldn't have an
issue with any of those and then we have
a catchall bucket for other and then as
I mentioned before um depending on the
file type that you get back you can
process them differently so I'm using
the small model to do the same type of
Form 4 extraction that we saw before
and then asserting basically in this
case that it is what we think it is. Um
a contract in this case we're saying uh
let's see I have like 10 more minutes so
we can go we'll we'll stop after this uh
up to this file but for the particular
contract we'll go we'll create this
summarizer object. So we'll go through
as many pages as there are. We'll do
some uh basically recursive
summarization of that using a separate
DSPy function and then we'll detect some
type of boundaries of that document too.
So we'll say I want the summaries and I
want the boundaries of the document. Um
and then we'll print those things out.
So let's just see if I can run this.
It's going to classify it should as a
[clears throat] contract.
>> So is you're just relying on the model
itself to realize that it's a city
infrastructure.
>> Yeah. The question was I'm I'm just
relying on the model to determine if
it's a city infrastructure. Yes. I mean
this is more just like a workshop quick
and dirty example. It's only because
there's one picture of the street signs.
Um, and if we look in the data folder, I
have a contract,
some image that's irrelevant, the Form
4 SEC filing, and then the parking
one, too. Um, they're pretty different. The
model should have no problem out of
those categories that I gave it to
categorize it properly. In some type of
production use case, you would want much
more stringent or maybe even multiple
passes of classification, maybe using
different models to do that. Um but
yeah, given those options, at least the
many times I've run it, had no problem.
So in this case, I gave it um one of
these contract documents and it ran some
additional summarization logic under the
hood. So, if I go to that super quick,
um you can find all this in the code,
but basically what it does is use three
separate signatures to basically
decompose the contents of the the um the
contract and then summarize them up. So,
it's basically just iteratively working
through each of the chunks of the
document to create a summary that you
see here at the bottom. And then just
for good measure, we're also detecting
basically the the boundaries of the
document to say, okay, here's out of the
13 pages, you have the main document and
then some of the exhibits or the
schedules that are a part of it. So, let
me just bring it up super quick
just to show you what we're working
with. This is just some random thing I
found online. And you can see so it said
the main document was from page 0 to 6,
and so we have 0, 1, 2, 3, 4, 5, 6.
Seems reasonable. Now we have the start
of schedule one.
Schedule one, it says it's the next two
pages. That looks pretty good. Schedule
two is just the one page, 9 to 9.
That looks good. and then schedule three
through to the end of the document.
And that looks pretty good, too. And so
the way we did that under the hood was
basically take the PDF, convert it to a
list of images and then for each of the
images pass those to classifier
um and then use that to
well let's just look at the code but
basically take the list of those
classifications
give that to another DSPy signature to
say, given these classifications of the
document, give me the structure and
basically give me a key-value pair of the name
of the section and two integers, a
tuple of integers that
determines the
boundaries essentially. Um so that's
what that part does.
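Editor's sketch of the output shape described above: a mapping from section name to a (start, end) page tuple. In the talk an LLM signature produces this structure; the deterministic grouping below is a swapped-in stand-in that only illustrates the data structure, and the labels are hypothetical.

```python
from itertools import groupby


def group_boundaries(labels: list[str]) -> dict[str, tuple[int, int]]:
    """Collapse per-page labels into inclusive (start, end) page ranges.

    Assumes each section is one contiguous run of pages; a label that
    reappears later would overwrite its earlier range.
    """
    boundaries: dict[str, tuple[int, int]] = {}
    page = 0
    for label, run in groupby(labels):
        length = len(list(run))
        boundaries[label] = (page, page + length - 1)
        page += length
    return boundaries


# A 13-page layout matching the ranges read out in the talk.
labels = ["main"] * 7 + ["schedule_1"] * 2 + ["schedule_2"] + ["schedule_3"] * 3
print(group_boundaries(labels))
```

A rule like this makes a cheap sanity check against the LLM-produced boundaries, since both should agree when the page labels are clean.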
Um [clears throat]
if we go back so city infrastructure,
I'll do this one super quick just
because it's pretty interesting on how
it uses tool calls. And while this is
running,
I should use the right one. Hold on.
>> [clears throat]
>> Yeah,
>> good question. The second part like when
you generated the list of like my
documents from 0 to six, did you have
like original document as an input or
no?
>> No. Uh so let let's just go to that uh
that was super quick. So
that should be boundary detector.
So, there's a blog post on this that I
published probably in August or so that
goes into a little bit more detail. The
code is actually pretty crappy in that
one. It's it's going to be better here.
Um, but basically what it does is
this is probably the main logic. So, for
each of the images in the PDF, we're
going to call classify page.
We're going to gather the results. So
it's doing all that asynchronously
pulling it all back saying okay all
these you know all the different page
classifications that there are and then
I pass the output of that into a new
signature that says, given a tuple of, I
don't even define it here, given a tuple of
page and classification,
give me this, I don't know, relatively
complicated output of a dictionary of
string to tuple of integer, integer, and I give
it this set of instructions to say just
detect the boundaries. Like this is all
very like non-production code obviously,
but the point is that you can do these
types of things super super quickly.
Like I'm not specifying much not giving
it much context and it worked like
pretty well. Like it it's worked pretty
well in most of my testing. Now
obviously there is a ton of low hanging
fruit in terms of ways to improve that,
optimize it, etc.
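Editor's sketch of the fan-out step: classify every page concurrently, then gather the (page, classification) tuples in order. The classifier is a keyword stub standing in for an async LLM call, so the labels and logic are hypothetical.

```python
import asyncio


async def classify_page(page_number: int, page_text: str) -> tuple[int, str]:
    """Stand-in for an async LLM call that labels a single page."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    label = "schedule" if "schedule" in page_text.lower() else "main"
    return page_number, label


async def classify_all(pages: list[str]) -> list[tuple[int, str]]:
    """Classify all pages concurrently; results keep the input order."""
    tasks = [classify_page(i, text) for i, text in enumerate(pages)]
    return list(await asyncio.gather(*tasks))


pages = ["Master agreement ...", "Terms ...", "Schedule 1: fees"]
results = asyncio.run(classify_all(pages))
print(results)
```

The resulting list of tuples is what would be passed to the second, boundary-detecting signature.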
Um, but all this is doing is taking that
signature, these instructions, and then
I call ReAct. And then all I give it is
the ability to basically
self-reflect and call get page
images. So, it says, okay, I'm going to
look at this boundary. Well, let me get
the page images for these three
pages and make sure basically that
the boundary is correct. And then it
uses that to construct the final answer.
And so it's really this is a perfect
example of like the tight iteration loop
that you can have both in um building it
but then the you can kind of take
advantage of the model's introspective
ability if you will to use function
calls against the data itself the data
it generated itself etc to kind of keep
that loop going. question.
>> So under the hood, the beauty of DSPy
then is that it enforces kind of
structured output on a model.
>> I mean yes, I think that's probably
reductive of of like its full potential,
but generally that's that's correct. I
mean yes, you can use structured
outputs, but you have to do a bunch of
crap basically to coordinate like
feeding all the feeding that into the
rest of the program. maybe you want to
call a model differently or use XML here
or use a different type of model or
whatever it might be
um to to do that. So absolutely yeah I'm
not saying this is the only way
obviously to kind of create these
applications or that you shouldn't use
Pydantic or shouldn't use structured
outputs. You absolutely should. Um, it's
just a way that once you kind of wrap
your head around the the primitives that
DSPy gives you, you can start to very
quickly build these types of
arguably uh I mean these are like
prototypes right now, but like if you
want to take this to the next level to
production scale, you have all the
pieces in front of you to be able to do
that.
>> Um,
any other questions? I probably got
about five minutes left. Go ahead. Can
you talk about your experience using
optimization
and just
>> Yeah. Yeah. So GEPA, and actually I'll
pull up, I ran one right before
this. This uses a different
algorithm called MIPRO, but basically
the optimizers as long as you have well
structured data. So for the machine
learning folks in the room, which is
probably everybody, obviously the
quality of your of your data is very
important,
um you don't need thousands and
thousands of examples necessarily, but
as long as you have enough, maybe 10 to
100 of inputs and outputs.
[clears throat] And if you're
constructing your metrics in a way that
is relatively intuitive and and that,
you know, accurately describes what
you're trying to achieve, the
improvement can be pretty significant.
Um, and so that time entry corrector
thing that I mentioned before, uh, you
can see the output of here. It's kind of
iterating through. It's measuring the
output metrics for each of these. And
then you can see all the way at the
bottom once it goes through all of its
optimization stuff.
You can see the actual performance
on
um, the basic versus the optimized
model. In this case, went from 86 to 89.
And then interestingly, this is still in
development, this one in particular, but
you can break it down by metric. So you
can see where the model's optimizing
better, performing better across certain
metrics. And this can be really telling
as to whether you need to tweak your
metric, maybe you need to decompose your
metric, maybe there's other areas within
your data set, or the the basically the
structure of your program that you can
improve. Um, but it's a really nice way
to understand what's going under the
under the hood. And if if you don't care
about some of these and the optimizer
isn't doing as well on them, maybe you
can maybe you can throw them out, too.
So, it's it's a very kind of flexible
system, flexible way of kind of doing
all that.
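Editor's sketch of a per-metric breakdown in plain Python. The sub-metric names and scores are made up for illustration and are not the results shown in the talk.

```python
from statistics import mean


def breakdown(rows: list[dict[str, float]]) -> dict[str, float]:
    """Average each named sub-metric across all evaluated examples."""
    if not rows:
        return {}
    return {name: mean(row[name] for row in rows) for name in rows[0]}


def compare(baseline: list[dict[str, float]], optimized: list[dict[str, float]]):
    """Per-metric change, computed as optimized minus baseline."""
    base, opt = breakdown(baseline), breakdown(optimized)
    return {name: opt[name] - base[name] for name in base}


# Hypothetical per-example scores for two sub-metrics.
baseline = [{"date": 1.0, "name": 0.0}, {"date": 1.0, "name": 1.0}]
optimized = [{"date": 1.0, "name": 1.0}, {"date": 1.0, "name": 1.0}]
print(compare(baseline, optimized))
```

A sub-metric that never moves is a hint that it either needs to be decomposed further or can be dropped.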
>> Yeah. What's the output of the
optimization? Like what do you get out
of it and then how do you use that
object, whatever it is?
>> Yeah. Yeah. So the output of the
optimizers is basically just another um
it's almost like a compiled object if
you will.
>> So DSPy allows you to save and load
programs as well. So the output of the
optimizer is basically just a module
that you can then serialize and save off
somewhere
>> or you can call it later uh as you would
any other module
>> and it's just manipulating the phrasing
of the prompt. So like what is it
actually like you know what's its
solution space look like?
>> Yeah. Yeah. under the hood, it's
literally just iterating on the actual
prompt itself. Maybe it's adding
additional instructions. It's saying,
"Well, I keep failing on this particular
thing, like not capitalizing the names
correctly. I need to add [clears throat]
in my upfront criteria in the prompt an
instruction to the model to say you must
capitalize names properly." And Chris uh
who I mentioned before has a really good
way of putting this and I'm going to
butcher it now, but like the optimizer
is basically finding latent requirements
that you might not have specified
initially up front, but based off of the
data, it's kind of like a poor man's
deep learning, I guess, but like it's
learning from the data. It's learning
what it's doing well, what what it's
doing not so well, and it's dynamically
constructing a prompt that improves the
performance based off of your metrics.
>> And is that like LLM guided, like is it
like about like capitalization?
>> Yeah. Yeah. Question being is it all LLM
guided? Yes. Particularly for GEPA it's
using an LLM to improve the LLM's performance.
So it's using the LLM to dynamically
construct new prompts which are then fed
into the system measured and then it
kind of iterates. So it's using AI to
build AI if you will.
>> Thank you.
>> Yeah
question. Why is the solution object not
just the optimized prompt?
>> Why is the solution object not what?
>> Not just the optimized prompt. Why are
you using
>> Oh, absolutely is. You can get it under
the hood. I mean, you can The question
was why don't you just get the optimized
prompt? You can absolutely. Um,
>> what else is there besides
>> the the So, what else is there other
than the prompt? The DSPy object itself.
So the module the way things um well we
can probably look at one if we have
time. Um
>> if I could see a dump of what gets you
know what is the optimized state that
would be interesting.
>> Yeah. Yeah sure. Let me see if I can
find one quick. Um but fundamentally at
the end of the day yes you get an
optimized prompt a string that you can
dump somewhere if you if you want to. Um
actually
um
>> there's a lot of pieces to the
signature, right? So it's like how you
describe your fields in the doc.
>> This is a perfect segue and I'll I'll
conclude right after this. I was
playing around with this thing called DSPyHub
that I created as a
repository of optimized programs. So
basically, if you're an expert in
whatever, you optimize an LLM against
this data set or have a great classifier
for city infrastructure images or
whatever, kind of like a Hugging Face, you
can download something that has been
pre-optimized.
And then what I have here, this is the
actual loaded program. This would be the
output of the optimization process, or it
is, and then I can call it as I would
anything else. And so you can see
here this is the output and I used the
optimized program that I downloaded from
from this hub. And if we inspect maybe
the loaded program,
you can see under the hood, it's a
predict object with a string signature
of time and reasoning. Here is the
optimized prompt. Ultimately,
this is the output of the optimization
process. this long string here.
Um, and then the various uh
specifications and definitions of the
inputs and outputs.
>> Have you found specific uses of those?
Like to his question like what is it?
What can you do with that?
>> It's up to your it's up to your use
case. So if I if I have a so a document
classifier might be a good example. If
in my business I come across whatever
documents of a certain type, I might
optimize a classifier against those and
then I can use that somewhere else on a
different project or something like
that. So out of 100,000 documents, I
want to find only the pages that have an
invoice on it as an example. Now sure
100% you can use a typical ML classifier
to do that. That's great. This is just
an example. We can also theoretically
train or optimize a model to do that
type of classification or some type of
generation of text or what have you
which then you have the optimized state
of which then lives in your data
processing pipeline you know and you can
use it for other types of purposes or
give it to other teams or whatever it
might be. So it's just up to your
particular use case. Something like
this hub, who knows, maybe it's not useful
because each individual's use case is so
hyper specific I don't really know but
um yeah you can do with it kind of
whatever you want last question yeah
>> Is using DSPy generally something
where people do replays just to optimize
their prompt, or is there a way to do it
in real time given delays? What I mean
by delayed is, okay, ChatGPT gives you
your answer and you can thumbs up or
thumbs down. That thumbs up comes 10
minutes later, 30 minutes later, a day
later, right?
>> So is the question more about like
continuous learning like how would you
do that here?
>> You can be the judge.
>> Well, how are you feeding back delayed
metrics to optimize it? Why would it
need to be delayed? Because usually the
feedback is from the user, right? Like
delayed.
>> Yeah. Well, then
>> yeah, that's right. It would basically
be added to the data set, and then you
would take the latest optimized program
and just keep optimizing off of that
>> ground truth data set.
>> That's right.
>> You will collect the outputs of your
optimization and feed them back, and the
loop continues.
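As a rough sketch of the loop described in this exchange: log each production call, attach the thumbs up or thumbs down whenever it arrives, and start another offline optimization run once enough approved examples have accumulated. Everything here is hypothetical and plain Python (`FeedbackStore`, `maybe_reoptimize`, the `min_new` threshold); none of it is a DSPy API, and the actual optimizer call is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoggedCall:
    """One production call; the label arrives later, if at all."""
    inputs: dict
    output: str
    thumbs_up: Optional[bool] = None  # None until the user responds


@dataclass
class FeedbackStore:
    """Hypothetical in-memory store joining delayed feedback to logged calls."""
    calls: dict = field(default_factory=dict)
    consumed: set = field(default_factory=set)

    def log(self, call_id: str, inputs: dict, output: str) -> None:
        self.calls[call_id] = LoggedCall(inputs, output)

    def record_feedback(self, call_id: str, thumbs_up: bool) -> None:
        # Feedback may arrive minutes or days after the call was logged.
        self.calls[call_id].thumbs_up = thumbs_up

    def pending(self) -> list:
        """Approved calls that no optimization run has used yet."""
        return [
            (cid, call) for cid, call in self.calls.items()
            if call.thumbs_up and cid not in self.consumed
        ]


def maybe_reoptimize(store: FeedbackStore, dataset: list, min_new: int = 2) -> bool:
    """Extend the dataset and report True once enough new labels exist."""
    fresh = store.pending()
    if len(fresh) < min_new:
        return False  # not enough new ground truth yet; keep waiting
    for cid, call in fresh:
        dataset.append({**call.inputs, "output": call.output})
        store.consumed.add(cid)
    return True  # caller would now run its optimizer on `dataset`
```

When `maybe_reoptimize` returns `True`, the caller would rerun its optimizer against the enlarged data set, starting from the latest optimized program. Only thumbs-up outputs are kept as ground truth in this sketch; how to use thumbs-down feedback is a separate design choice.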
>> Yeah. But that's why you're trying to
do offline optimization, right?
>> Yes.
>> But I'm asking, can you do this
online, with the metric feedback?
>> If you're a good enough engineer, you
could probably do it. But
>> I'm not recommending replacing ML
models with optimized DSPy programs for
particular use cases. Maybe
classification is a terrible example. I
recognize that. But for other use cases,
in theory, yes, you could do something
like that. Yes.
But for particular LLM tasks, and I'm
sure we all have interesting ones, if
you have something that is relatively
well defined, where you have known
inputs and outputs, it might be a
candidate for something worth
optimizing, if nothing else to transfer
it to a smaller model and preserve the
level of performance at a lower cost.
That's really one of the biggest
benefits I see.
All right, last question.
I've heard that DSPy can be kind of
expensive because you're doing all these
LLM calls.
>> So I was curious about your experience
with that and, relatedly, if you have
any experience with large context in
your optimization data set, ways of
shrinking it.
>> Yeah. So the question was, can DSPy be
expensive, and then for large context,
how have you seen that? How have you
managed that? The expensive part is
totally up to you. If you call a
function a million times asynchronously,
you're going to generate a lot of cost.
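To put rough numbers on that, here is a back-of-the-envelope sketch. The token counts and per-million-token prices are made-up placeholders, not real vendor pricing, and `estimate_cost` is a hypothetical helper rather than anything DSPy provides.

```python
def estimate_cost(calls, prompt_tokens, output_tokens,
                  usd_per_1m_input, usd_per_1m_output):
    """Estimate total spend in USD for a batch of identical LLM calls."""
    input_cost = calls * prompt_tokens * usd_per_1m_input / 1_000_000
    output_cost = calls * output_tokens * usd_per_1m_output / 1_000_000
    return input_cost + output_cost


# Hypothetical prices: a large model versus a small model that is 10x cheaper.
large = estimate_cost(1_000_000, 800, 100,
                      usd_per_1m_input=5.00, usd_per_1m_output=15.00)
small = estimate_cost(1_000_000, 800, 100,
                      usd_per_1m_input=0.50, usd_per_1m_output=1.50)

print(f"large model: ${large:,.2f}")
print(f"small model: ${small:,.2f}")
```

The call count multiplies everything, which is why the same arithmetic also shows what moving an optimized program to a cheaper model would save.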
I don't think DSPy is inherently
expensive. Maybe it makes it easier to
call things. It might, to your point,
add more content to the prompt. Sure,
the signature is a string, but the
actual text that's sent to the model is
much longer than that. That's totally
true. I wouldn't say that it's a large
cost driver. Again, ultimately it's more
of a programming paradigm. So you can
write your own compressed adapter if you
want, one that reduces the amount that's
sent to the model. In terms of large
context, it's kind of the same answer, I
think. If you're worried about that,
maybe you have some additional logic,
either in the program itself or in an
adapter or part of the module, that
keeps track of that. Maybe you do some
context compression or something like
that.
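One very simple form that extra logic could take is a budget check that drops the oldest context before the prompt is built. This is a sketch under stated assumptions: `rough_token_count` is a crude word count standing in for a real tokenizer, `fit_to_budget` is a hypothetical helper, and neither is part of DSPy or its adapter interface.

```python
def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def fit_to_budget(chunks: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent chunks whose combined size fits the budget."""
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first chunk that overflows.
    for chunk in reversed(chunks):
        size = rough_token_count(chunk)
        if used + size > max_tokens:
            break
        kept.append(chunk)
        used += size
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest chunks is only one policy. Summarizing them, or ranking chunks by relevance, would slot into the same place in a program.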
There were some really good talks about
that over the past few days. I have a
feeling that this will kind of go away
at some point, where either context
windows get bigger or context management
is abstracted away somehow. I don't
really have an answer. That's more of an
intuition. But DSPy, again, gives you
the tools, the primitives, to do that
should you choose, and to track that
state, track that management over time.
So, I think that's it. We're going to
get kicked out soon. So, thanks so much
for your time. Really appreciate it.
[music]