Vibe Engineering Effect Apps — Michael Arnaldi, Effectful

Channel: aiDotEngineer

Published at: 2026-05-07

YouTube video id: Wmp2Tku2PrI

Source: https://www.youtube.com/watch?v=Wmp2Tku2PrI

So, welcome everybody. Just setting up the context for this workshop: I had a lot of ideas of what to prepare, but in the end I thought, we are vibe engineering, so for this to be authentic it has to be from scratch. I actually prepared absolutely nothing, which means we can take any path that we want. Let's hope this is real enough. First of all, I'd like to know from the crowd: do you already know Effect? What's your level of familiarity with AI tooling, and so on? Luckily we're not too many, so I hope this can be as interactive as possible. Maybe let's just start. I know Chris.
>> Hello, hi.
>> Familiarity with AI and with Effect?
>> ... Okay, good.
>> Running V4 in production.
>> Running V4 in production. Against
advice, by the way.
>> Good.
>> Good.
>> ... by hand this year, doing everything. And the reason I'm particularly interested in Effect is that it encourages so much safety, so my agents cannot ... And the reason I wanted Effect was we had one API client ... and now I'm more interested in how you make Effect more discoverable by the agents.
>> Okay.
>> I saw the idea about having the ... I'm not convinced by that, so I'm curious.
Good.
>> Good. Well, how about you?
>> Yeah, but I've heard a lot about Effect.
>> Sounds good. Sounds good.
>> Good. So, a pretty heterogeneous crowd, all interested in some sort of how to use agents effectively with Effect, pun intended.
You pointed out a very good thing, which is cloning, that is, giving the agent access to the repository. In reality, this session should just be called "just clone the [ __ ] repo" and be done with it. I've also not been coding by hand since about late this summer, so it's been quite a while. I started programming when I was 12 years old, so it's quite an odd feeling to get to the point where you're no longer writing code by hand. And most of what I do is library-level coding: pretty low-level, usually fairly complex type machinery that used to require a very good understanding of the language, of how the user interacts with your software, and so on and so forth.
I'm not diminishing app-level development in any way; it's just that the way you treat a language if you have to build a library versus the way you treat it if you are building an app on top of it is usually very different. Now, sometimes in app land you have the same requirements as library land, especially when you need to generalize, abstract over some patterns, make them repeatable, remove the verbosity from the repetitions, and so on and so forth. So there's some crossover there, but I definitely thought that AI would be more useful in app land, and I didn't see much usage at the library level. And I was dead wrong, because I'm not writing code by hand. I have not written any line of code by hand for a while. And I've done that in TypeScript.
I've done that in Rust.
And the funny element is, given I mostly write libraries, I usually interact with codebases that have zero documentation and zero best practices available online. So I couldn't really use the usual "let's just add an MCP server to get access to the documentation," or hope that the models have been trained on the documentation enough to be directly useful.
And the reality is, with LLMs, people treat them like a human brain, but they are very different. We learn continuously. This is a learning experience: once we get out of this room, hopefully you're going to know a little bit more than when you came in, and your brain will keep going and will internalize more and more patterns over time. Then you go to sleep, and your brain cleans up a little bit of the mess of irrelevant information that you got during the day. There's this whole process of transforming experience, the world we experience every day, into long-term memory.
This does not happen with LLMs. There is a pre-training phase where LLMs are trained on all of the world's existing knowledge; usually they get trained on the whole internet. Then they get specialized in some tasks, and then there's the whole post-training phase where models are fine-tuned to act on specific things. For example, coding agents are generic models that have had passes of reinforcement learning to operate on codebases. The whole post-training phase of a large language model dedicated to coding is letting the model rip through codebases and having evaluations that tell the training phase how the model is performing. Is it doing well? Is it doing badly? Does the code compile after this change? Does it fail to compile? And so on and so forth. But once that is done, it's done. There's no more knowledge that comes into the model every day. So
if you interact with a model today and you tell it, "Hey, I want you to do this in a very specific way," tomorrow it's not going to remember. So how do you make it remember? That is the big question.
With models, you have to think of it like this: you're chatting with them, but the reality is you are basically appending messages onto a fixed-size array, which is called the context window. And the context window is limited. Now there are models with a 1 million token context window, and that's not necessarily a good idea, because
the context window of the model is what is pushed to the neural network, and the neural network is going to try to predict what's coming next. So if you push more information, there's a very good chance you're going to confuse the model, which is why a 1 million context window is not necessarily helpful, especially if you're doing multiple things in the same context.
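(As a rough mental model of that append-only context, nothing Effect-specific, just an illustration with made-up numbers:)

```ts
// Conceptual sketch only: a chat "session" is an append-only list of messages
// that has to fit inside a fixed token budget, the context window.
type Message = { role: "system" | "user" | "assistant"; content: string }

const CONTEXT_WINDOW_TOKENS = 200_000 // illustrative budget, not any real model's limit

// Every turn appends messages; nothing is ever "learned". Once the budget is
// exceeded, something has to be dropped, summarized, or a new session started.
function fitsInContext(history: Message[], estimateTokens: (m: Message) => number): boolean {
  const used = history.reduce((sum, m) => sum + estimateTokens(m), 0)
  return used <= CONTEXT_WINDOW_TOKENS
}
```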
That really means we have to architect around a dumb process. We have to architect around a machine whose knowledge is six months old at best and that is not going to remember everything, because even if you have one trillion parameters in the model, or even ten trillion, that's not enough to store all of human knowledge. So you're always going to get compressed knowledge, and in the best-case scenario you have some ability of generalization in the model, so that the model can say, "Hey, I know A and B, maybe I can do C because it's similar to A and B," and you have some form of emergent behavior and capability of reasoning on new problems. But models have become very good. I said it myself: I have not been writing code by hand for at a minimum six or eight months. So that means even if the machine is dumb, it's already at the point where we can leverage it to do good. But how do we do it? Now,
if the assumption is that the model has outdated knowledge, we need a way for the model to get new knowledge. And we said that the models we use for coding have gone through reinforcement learning to be able to understand your own codebase, make changes in it, and replicate patterns that exist in it. They haven't really been trained on reading human documentation. They haven't been trained on using an MCP server that they have never seen. They've been trained primarily to consume and produce code.
So eight months ago I was thinking: what if I just give the model access to code? That means if I want to use Effect, I'm going to add the Effect repository in my directory, just masquerading the Effect codebase as my own codebase. Maybe I can trick the model into thinking that it's just one big codebase, and it would explore it and progressively use it to build up the required knowledge and sort of clone the patterns. And there are various ways of doing that. One could argue the model already has access to library code by having it in node_modules, but coding agents have been trained to focus on your own code, not on the code that is in node_modules. So if you have it in node_modules, the model is not optimized for it; it's not going to look at it with the same frequency as it looks at your own code.
If you have it in a gitignored directory, the models have been trained not to look at files that are gitignored. For example, Cursor does not index stuff that is gitignored. So there are all of these sort of random restrictions that we figure out while developing.
And the only way I found the models to be good, regardless of the language, regardless of what you use, is if you just clone the [ __ ] repo, which is the point of this workshop. So this is a completely empty project. I have some ideas of where we could take this. My idea would be to set up a Bun repository, use Vitest for testing, build up some kind of HTTP server, ideally providing OpenAPI documentation for consumption, build a kind of type-safe client to interact with the backend, and hopefully, if we have enough time, I'm not sure, tap into the world of workflows and clustering for persistent operations in the backend. And really, I have nothing set up.
So how do I usually start? Well, I start nice with the model, but as soon as it derails, you're going to see I'm going to start to insult the model. It's fun because it cannot really answer back. If you don't like the answer, you can just shut it down. It's not like a human that gets offended. "I would like to set up a project using Bun. The project should also include a setup of Vitest and a TypeScript typecheck script."
I'm using GPT 5.4. When I started this journey I was using Sonnet 4. There are many differences between Sonnet 4 and GPT 5.4; namely, Sonnet 4 felt like a kid with a knife running through the house. That's an example that comes from Geoffrey Huntley, the author of Ralph loops. But even as a kid running through the house with a knife, it was still enough to do coding. Now we have models like Opus 4.5 and GPT 5.4 that are much, much better.
But a very interesting element to think about is that open-weights models are kind of lagging behind by three to six months compared to frontier models. Which means we already have models in the open that are smarter than Sonnet 4, which I already used for library-level development. How long would it take for those open-weight models to become good enough to be used in our daily operations? I don't know. It's just a thought I have more and more lately, especially because, well, Anthropic is putting arbitrary restrictions on how we use their models. So I don't really want to use Anthropic models. OpenAI is good for now. Who knows what they're going to do in a year or two. And I like open source, of course.
Okay. I don't have a git repo created. "Create an empty git repo." And by the way, if you have questions, if you want to interrupt me, this is supposed to be interactive. I'm here to entertain you for another hour and a half.
Initialize the repo. Okay, this is done. It's amazing that, using GPT 5.4 with opencode, it would create a CLAUDE.md by default. I think this is from Bun. Yeah, let's trash this.
>> It has absolutely nothing to do with this. Perfect marketing strategy. Plus they said it wasn't a real fool, and two days after they announced MAS as the new model.
Okay. "Create a src and test directory." Let's see what it created. Okay: types bun, moduleResolution bundler, noEmit. That's fine. strict, skipLibCheck. That's fine. noImplicitOverride. That's good. Yes. "Also, actually move the files into the proper directory." Moving the entry file. Good. Seems smart enough. Runs a basic smoke test. Okay.
Okay. So, that's a good starting point. We verified with bun run test and bun run typecheck. Good. We want to add effect@beta. We're going to use Effect v4. It's not yet released for production usage, except he uses it in production already. So if I have any problem, I'm going to ask you. It's fine.
>> Yes, it's effect-smol.
>> Smol because it used to be small and evolved to become bigger. Still very thin in bundle size. Okay, 1%, 14k tokens. There's plenty of context left.
"We want to add effect@beta, and we want to use the Effect test support to write the tests." I will. I will. That's next. That's next. And speaking of that, I want to try to use the tsgo version of the compiler. Now, I never used this, so I don't know how to use it.
"Let's set it up as the compiler, as the type check. Check the readme and set it up." Not sure if this is going to work or not.
>> Oh, the actual base compiler?
Yes. I don't know if Matia allows it.
>> Yeah, it does. The point is it does not use it, and we could just do an alias install, so install TypeScript as something else, but I'm not sure if he did it. Maybe let's follow the normal practice: let's install TypeScript Go instead of TypeScript. Will it be able to do this? Who knows? We will find out.
>> We will find out.
Is this the package? No, I don't think this is the package. I think they just stole my crypto wallet. Except I do not have one. So, check from here. Oh, the npm package named "typescript-go" is only a placeholder security package. So it uses the real preview compiler that provides the tsgo binary. Well, that was probably a good idea.
The script is tsgo --noEmit. Let's see: bun x tsgo. Okay. Type check. Type check. Okay. "Set up VS Code to use the tsgo LSP." It may work, maybe. Yes, that I need. There we go: the native preview extension. That's it. Okay, I did install that. I need to reload the window, most likely. Let's go. Okay, maybe it worked.
Then let's go here. And yeah, I should be able to do that. But also there is a nice error: "will not be loaded if files are specified." A command-line config to skip this error. What? I'm going to feed it to the agent in a minute. It's Bun that gives issues, probably. Maybe not. Yes, it is Bun. Then let me stop this. Select the tsconfig to configure this one. This other one is a package.json installing dev dependencies. Select all.
What is this? That's VS Code. That's fine. This needs a lot of work. Do we have the Effect plugin installed? Oh gosh. Where is this coming from? Who knows? Okay. Okay. Okay. One. Install. Okay, that's installed. Let's see if it catches anything. Import from effect... Nope, that's a dangling effect. That should be caught. I think I've done it.
You mean the prepare one?
>> Yeah, I did. Um, maybe I need to reload. Reload after that. Yes, that was easy. The Windows solution: just restart it.
Okay, so we have it. And now, we have some diagnostics whose severity is set to suggestion, warning, and so on and so forth. For AI we would like to turn everything into an error, so that the LLM cannot accept code that has any remote resemblance to an error. So: "This is a project where we will use AI a lot. We want all available diagnostics to be set to error."
I should switch from... What is the model doing? Did it update the tsconfig? It did not. Oh, it is updating the tsconfig. Okay. No, no, no. And that's another interesting point: effect.solutions. There is a website called effect.solutions, a really nice website. Kit Langton did it, and it's kind of a quick start to use Effect in an AI project. It does install the language service and strict policy defaults and so on and so forth. But then it uses a CLI to give the model access to the Effect repo, and the model needs to know how to use the CLI. So it's kind of a dog chasing its tail.
>> Yes. Yes, but...
>> There are some markdown files,
>> but it doesn't work as well. And if you actually read it, at some point it says you should actually just clone the repository.
Okay, this looks exactly like what I had in mind. So, we have all the diagnostics set to error, which is good. It's exactly what we want.
Reload window. Okay. I also want to set format on save to true, just because it's annoying otherwise. Okay. Very good point. "Commit current." Now I want to add effect-smol as a subtree. Okay, it's committed. Good. "Now create a .repos folder and add it as a git subtree, without history, squashed, in .repos/effect." Who knows if it's going to be able to do it.
Okay, here.
Why is it trying to
Okay.
Okay, we have it.
Let's just check git log. Yep, it did
audit.
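(For reference, the plain-git equivalent of what the agent just did is roughly `git subtree add --prefix=.repos/effect <repository-url> <branch> --squash`; the exact URL and branch are whatever Effect repository and ref you decide to vendor.)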
Okay. And now we are at the point where we can start to do our research. For example, we said we want to create an HTTP API. I would clean this up. Open a new session to avoid context pollution. "You have access to the Effect repository at .repos..." Actually, let's do something else first: we want to set up an AGENTS.md. "Set up an AGENTS.md listing the commands available, like bun run typecheck, and specify that you have access to the Effect repository at .repos/effect and you should use that to extract best practices, look at how things work, etc."
Now, the AGENTS.md: we're going to get an initial prototype, and as you work in the project you're going to evolve it. You're going to add more commands to it, and you're going to add rules when you spot that some bad patterns are created in code. One thing we have not set up yet is a linter. A linter is going to be an essential piece of the backpressure loop that helps the model drive in the right direction.
If you want a kind of fully working setup, I have a repository of mine that I use for fun, which is called accountability. In this repository you can find a lot of things, but for example I have an ESLint config with a lot of custom rules, and those are somewhat arbitrary. For example, I don't want the model to do an explicit type assertion on things; I want the model to use Schema to check for the shape. I have rules prohibiting the usage of `x as y`. I have rules prohibiting the usage of `any` or `unknown`. Basically, I'm trying to prevent the model from doing the dumb stuff that I realized it was doing in my code.
>> No.
>> Yeah. The same for unknown.
And the funny thing is, initially I banned `unknown` because I wanted the model to not do `as unknown as X`. It found out that `never` is a bottom type, so you can do `as never as X`. I was like, okay, then I'm going to ban `as`, and now it's doing better.
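(To make that concrete, here is a minimal sketch of such back-pressure rules using ESLint's built-in no-restricted-syntax rule in a flat config; the file globs and messages are invented for the example, and a real setup like the accountability repo layers custom rules on top of this:)

```ts
// eslint.config.ts (sketch): ban the "shortcut" syntax so the agent has to validate instead of cast
import tsParser from "@typescript-eslint/parser"

export default [
  {
    files: ["src/**/*.ts", "test/**/*.ts"],
    languageOptions: { parser: tsParser },
    rules: {
      "no-restricted-syntax": [
        "error",
        // `expr as T` (including `as unknown as T` and `as never as T`) is banned outright
        { selector: "TSAsExpression", message: "Do not cast with `as`; decode the value with Schema instead." },
        { selector: "TSAnyKeyword", message: "Do not use `any`." },
        { selector: "TSUnknownKeyword", message: "Do not use `unknown`; model the type explicitly." }
      ]
    }
  }
]
```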
Okay, let's see what it created. Okay, this is short. Use Bun. Okay. Available project commands. That's fine. Test watch: this is going to create issues, I already know, because the model is going to try to run this and get stuck. Same with dev servers. Effect reference repositories. Good. Look at them for specific guidance.
Okay, that's enough of a start. "Mention in the AGENTS.md that you should never, ever try to run commands in watch mode, for example the test watch command, or a dev server." Otherwise, it's going to try to run the dev server as the first thing and get stuck. Okay. What I like about OpenAI models is that they are way more concise compared to Anthropic models. The same task with Opus would have probably written 200 lines of AGENTS.md. But that's good. It's enough as a start.
So we are back to square zero. We said we want to create an HTTP API. I know nothing about Effect. So: "I would like to create an HTTP API that should have OpenAPI documentation and a type-safe client generated by default. Explore the Effect repo for patterns on how to do this. Save your research into patterns/http-api.md. Ask me any question you need." Again, I'm starting from the perspective that I have no idea how to do this in Effect.
>> No, I find plan mode to be... Like, the issue with plan mode is that the model has crippled access to tools. So it cannot easily do the same things that it does outside of plan mode. So I don't make heavy usage of it. I usually do what's called spec-driven development, in the sense that the first task I do with the model is to discuss how to create a spec for something. Then the spec is persisted as a markdown file, which is effectively my plan, and I then tell the model to implement that. Usually the second step I do in a Ralph loop, because you've seen I already restarted opencode a few times to clean up the context window. Doing this manually is boring, and you usually end up reusing the same context window for multiple things, which is going to deoptimize the model at some point, because the context window is limited: you're going to push a lot of information in, and the earlier information is going to confuse the model for the later information. So I use a very simple bash script that tells the model, "pick up a small task, implement the small task, and then exit," and I run that in a loop.
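(The loop itself can be almost embarrassingly small. Here is a sketch in TypeScript for Bun; the `opencode run` invocation and the plan file name are assumptions about your particular agent CLI and project layout:)

```ts
// ralph-loop.ts: run with `bun ralph-loop.ts`
// Each iteration starts a FRESH agent session, so every small task gets a clean context window.
const prompt =
  "Read plans/todo-api.md, pick ONE small unfinished task, implement it, " +
  "run `bun run typecheck` and `bun run test`, commit, then exit."

while (true) {
  // Assumption: the coding agent exposes a non-interactive `run` subcommand.
  const proc = Bun.spawnSync(["opencode", "run", prompt], {
    stdin: "inherit",
    stdout: "inherit",
    stderr: "inherit"
  })
  if (proc.exitCode !== 0) break // stop looping if the agent errors out
}
```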
It's funny how, with AI, many times less is more. You can have very complex architectures around context management and so on and so forth; at the end, the dumbest thing ever ends up working better. And we are doing research ourselves, and it looks like there are actually very good margins of improvement by reducing the number of tools that the model has access to. For example, we have been experimenting with a coding agent that has a single tool call, which is called execute, and it can execute arbitrary TypeScript code, including calling Bash through TypeScript. In that scenario, the model doesn't even have access to a patch tool. It cannot change files directly. It has to write a TypeScript file that changes the code, and then it ends up doing TypeScript transformers, AST-based transformations. It's fantastic how you reduce the things that the model can do and it does better. So let's see, it saved the research to the HTTP API patterns file. Good. "Main conclusion: for this repo the strongest default Effect pattern is to define a shared HttpApi."
You're absolutely right.
Derive OpenAPI from it.
Mount the docs.
Okay.
Use the OpenAPI generator only when you need a generated client artifact. We don't need that. "One question before I implement anything further: do you want the primary pattern here to be a shared HttpApi with HttpApiClient.make?" No, I am fine with a shared HttpApi. I don't need a committed client in the repo itself.
Let's see what it did here for this workshop repo. The best part: okay, this gives you relevant upstream files. Good. It looked at tests. Nice. Okay, this looks decent enough. We should probably tell it what we want to do. But this is just generic patterns that we're going to use as reference.
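(To give an idea of the shape such a pattern file describes, here is a rough sketch in the style of the v3 `@effect/platform` HttpApi modules; the v4 beta may expose these under different module paths, so treat it as an illustration of "one shared definition, docs and client derived from it" rather than exact code:)

```ts
import { HttpApi, HttpApiEndpoint, HttpApiGroup } from "@effect/platform"
import { Schema } from "effect"

// The shared contract: server handlers, OpenAPI docs, and the type-safe client
// are all derived from this single definition.
class Todo extends Schema.Class<Todo>("Todo")({
  id: Schema.Number,
  title: Schema.String,
  done: Schema.Boolean
}) {}

class TodosGroup extends HttpApiGroup.make("todos")
  .add(HttpApiEndpoint.get("list", "/todos").addSuccess(Schema.Array(Todo)))
  .add(
    HttpApiEndpoint.post("create", "/todos")
      .setPayload(Schema.Struct({ title: Schema.String, description: Schema.String }))
      .addSuccess(Todo)
  ) {}

class TodoApi extends HttpApi.make("todo-api").add(TodosGroup) {}
```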
So: "List the files in patterns in the AGENTS.md, so the agent has context of their existence." The model does not care about grammar. And I feel like many people raise the point that a model is not good at something if it doesn't do well by default. I don't think there's anything more wrong than that statement. The model is good when it can operate a large-scale codebase using patterns and it doesn't fail at scale. The zero-to-one problem is not really... it's a problem for the first 10 days or 10 hours, depending on what you're building.
And as programmers, if our job is not to write code, our job should be to set up repositories in ways that the models can act well on them. So what I'm doing now is most of what I do when I operate a coding agent at scale in a codebase, even if the codebase has no concept of AI. If I start in a project that is a brownfield codebase, existing for five to ten years, no context set up, the first thing I do is let the model explore the code, clone the main libraries that are used (if you're using a framework like TanStack, clone the code of TanStack Router; if you're using Svelte, clone the code of Svelte), ask the model to generate best-practice files, and so on and so forth. Once you have all of it, the model is going to be much more efficient.
So now that we have a little bit of context on HTTP APIs, we can start implementing one. I do want to check something quickly, because I'm using Bun and I'm using Vitest: does vitest run actually use Bun as the runtime, or does it use Node? Because if I recall, there was a flag that I had to pass to Vitest to let it use Bun, and I don't want our test setup to differ from our runtime. What is it doing? Uh, no. "Add to Vitest that it should ignore anything in .repos." It was running the Effect tests that it found. Yeah, there was no Vitest config whatsoever. Good.
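(The fix is a small Vitest config along these lines; the globs are assumptions about this particular layout:)

```ts
// vitest.config.ts: keep Vitest away from the vendored Effect sources in .repos/
import { defineConfig } from "vitest/config"

export default defineConfig({
  test: {
    include: ["test/**/*.test.ts"],
    exclude: ["**/node_modules/**", ".repos/**"]
  }
})
```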
"Add to the test something that uses a Bun API." I feel like I did it here. So I should have this one. Okay. Was it using Node? Probably. "You should expect it to be defined." Because now it did one of the classic mistakes: it had to make the test pass, so it changed the test to make it pass. Wow. Okay. Okay, it did it.
Let's now begin our HTTP API implementation. So: "We want to implement an HTTP API following the patterns at patterns/http-api.md. We want the API to expose a todo functionality where you can: one, create todos (title, description); two, update todos (change title, etc.); three, flag a todo as done or not; four, list todos." I should have done something else. "Discuss the plan with me and create a plans/todo-api.md." So here I'm telling the LLM to read the pattern file that we created before, where it gathered generic knowledge about the Effect ways of doing things. It still has access to the original codebase of Effect if it wants to. But now I'm creating a specific plan to implement the API that I would like to implement.
Drafting the plan. Okay. Todo shape: that's fine. Initial storage strategy: let's do something different for storage. "Use Effect SQL and a SQLite store. Explore the Effect repo for how to do that, and create patterns/sql.md."
I realize we need a persistence strategy, and I don't have one. I know that Effect has some SQL thing, and again, I'm using the same process where I first generate some patterns for it.
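(A minimal sketch of what those SQL patterns tend to look like, based on the v3 `@effect/sql` packages; module names and options in the v4 beta may differ:)

```ts
import { SqlClient } from "@effect/sql"
import { SqliteClient } from "@effect/sql-sqlite-bun"
import { Effect } from "effect"

// Layer that provides a SQLite-backed SqlClient; the filename is arbitrary for the example.
const SqlLive = SqliteClient.layer({ filename: "todos.db" })

// Any effect can ask for the SqlClient service and use the tagged-template query API.
const listOpenTitles = Effect.gen(function* () {
  const sql = yield* SqlClient.SqlClient
  const rows = yield* sql`SELECT title FROM todos WHERE done = 0`
  return rows
})

// Provide the layer at the edge of the program.
const runnable = listOpenTitles.pipe(Effect.provide(SqlLive))
```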
And this is also useful because you may want to use something from Effect, but you may not want to use everything from Effect. So if we were to push all the patterns into your repository by default, you would end up using everything from Effect even if you don't want to. This is kind of self-selecting, so you can pick and choose whatever you want to use. Especially in brownfield projects, this is very important, because you don't want to refactor everything you already have. For example, here I could have picked Drizzle to do the persistence just as well. Most likely we're going to develop some kind of CLI where you can prefetch some patterns that are already available and still pick and choose. And we also want to automate this kind of process of exploring something and creating patterns out of it, because the patterns that we ship as best practices might not exactly fit your needs, so you would still maybe update them as a second step.
>> The model I use may not be as good as the one that... For example, the PRs where you go and read the code, and some of them are just... and the code it generates... Library authors are providing not skills but software... But this kind of pattern would be like effect.solutions, but officially distributed by the package, collocated with it?
>> I feel like generally it's a good idea, but there are some caveats to that. For example, even the AGENTS.md standard is kind of not a standard, because the way you prompt Claude and the way you prompt GPT is different. For example, you've noticed I never wrote anything in uppercase. If I was using Claude, I would write a lot of stuff in uppercase. The reason is GPT gets scared if you scream at it; I don't even know what to call the model. If you scream at it, it's going to deoptimize and then be passive and agree with everything. That is not what you want. With Claude, if you scream at it, it's going to pay attention to that specific sentence.
So that comes up also in these shared patterns. I feel like the patterns should almost be generated with the model you use, versus being off the shelf. Now, we can do that for the top three frontier models. All the GPT family is very similar: 5.3, 5.4, 5.2, there are not so many differences. Opus, Sonnet, and Haiku are also very similar. So ideally we can have the CLI where it asks which model you use: okay, I'm going to optimize the context for this versus the context for that. And it's annoying, because you would obviously like to have a standard.
>> I would love to maintain it.
>> Yes. Yes, it's very painful to maintain this stuff.
Our approach is to make the code so good and self-explanatory, with examples and everything, that any model you use can generate those patterns, and then the CLI would generate them on the spot for the model you use. That's one approach. It may fail, and in six months we provide patterns for everything and just tell you, please use either one or two. Another very interesting argument is to fine-tune an open-source model to use Effect patterns by default. We thought of that.
>> Kind of.
Okay, let's see. Update... no. "If you want the next step for me to update..." Yes, do that. This is the annoying part of GPT models: they are going to constantly ask for input from you to continue. Opus would have just done it.
>> But sometimes done wrong, and you have to do it three times per session.
>> That's why I use GPT 5.4. Well, I'd like some sort of fusion of Anthropic models and OpenAI models, so that it doesn't ask me all the time. Because GPT, especially in complex tasks, usually takes its time, but at the end the output is good. With Opus, sometimes it likes to take these shortcuts. And the funny thing is, if you let one slip... like, if you let one `any` slip into your codebase and you have Opus, it's going to do `as any` all the time. It's like, oh, I can't do this, let me do that, for everything. I need this to compile? Let's remove the code.
>> Yes. That's why in this project and in accountability I was using Opus and I have a lint file of thousands of lines of code to prohibit any shortcut.
"I can start implementing this next." Yes, please. I feel like we've spent enough time; let's see what it does. See, it's correctly looking up in the Effect repo, in the AI docs, for ideas. This most likely is going to take a little bit, which is positive.
>> Kind of.
In some projects it was using Schema by default and I didn't need a lot of backpressure for it.
Sometimes, yes. One example is the rule in accountability: I have this custom ESLint rule to ban `sql<Type>`, because it would write an SQL query, write an interface, and just... this is the exact same thing as casting, and I had to ban this pattern fully. "Using type parameters with the sql template literal provides no runtime validation. Use SqlSchema.findOne." And you see that the rule ends up suggesting to use SqlSchema.
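(What the rule pushes the model toward looks roughly like this, again a v3-flavoured sketch; `Todo` and `TodoId` are stand-ins for whatever schemas the project defines:)

```ts
import { SqlClient, SqlSchema } from "@effect/sql"
import { Effect, Schema } from "effect"

const TodoId = Schema.Number
const Todo = Schema.Struct({ id: TodoId, title: Schema.String })

// Instead of `sql<SomeInterface>`-style casting, the rows are decoded with a schema,
// so a wrong column name or type fails loudly instead of being silently trusted.
const findTodoById = (id: number) =>
  Effect.gen(function* () {
    const sql = yield* SqlClient.SqlClient
    const find = SqlSchema.findOne({
      Request: TodoId,
      Result: Todo,
      execute: (todoId) => sql`SELECT id, title FROM todos WHERE id = ${todoId}`
    })
    return yield* find(id)
  })
```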
So I'm more or less just watching what the model produces, and if there's something I don't like, I end up writing lint rules to prohibit that specific pattern.
For example, in Schema, many times it would have a user ID as a string, and then it would have another ID as a string, and you would of course have no type safety whatsoever, and the code would try to pass one into the other. So I would force all identifiers to be branded types, and I would then prohibit the usage of type casting, because otherwise it would go "this requires a UserId, let me do `as UserId`," and it's like, yeah, it's pointless, you should validate the data. So I would ban the usage of `as` and force it to use constructors: instead of doing `100 as UserId`, do `UserId.make(100)`. Or prohibit the usage of constructors in places where you should do validation. For example, one case that I found was it would do the API layer as plain strings and then use constructors inside the handler to create the objects, defeating the purpose. Then I would write rules for the model to write validation directly in the schemas, so I was basically saying: if you use a constructor inside the handler, most likely you're wrong; you should improve the starting schema to provide the validation at the edge. It's kind of babysitting a junior developer with a knife running through the kitchen, instead of a kid running through the kitchen with a knife.
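(A sketch of the branded-identifier idea with Effect Schema; the names are invented for the example, and the exact constructor surface may vary slightly between versions:)

```ts
import { Schema } from "effect"

// Branded IDs: at runtime both are strings, but the type system keeps them apart.
const UserId = Schema.String.pipe(Schema.brand("UserId"))
type UserId = typeof UserId.Type

const TodoId = Schema.String.pipe(Schema.brand("TodoId"))
type TodoId = typeof TodoId.Type

declare function loadUser(id: UserId): void

// loadUser("123")            // type error: a plain string is not a UserId
// loadUser("123" as UserId)  // the `as` escape hatch is exactly what the lint rule bans
loadUser(UserId.make("123"))  // ok: goes through the constructor, which can also validate
```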
And this is still going. Both models are exceptional. Sometimes one model drives you nuts and you try the other. There's not much of a rule. Lately I tend to use OpenAI models more, because I don't really like to be restricted on the harness that I can use: the CLI itself. I want to use opencode, I want to use my own TypeScript files that interact with the AI SDK natively, and I'm prohibited from doing that by Anthropic. So up until a few months ago, when this was allowed, I would use mostly Opus; when they enforced their policies against opencode I switched to OpenAI models, and now I'm most of the time just using OpenAI models. There are some small edge cases; for example, when you do UI, Opus is much better than Codex. So there are some specific things where one is clearly better than the other, but for most of the tasks they are the same. I just had an experience, for example, where GPT thought for half a day on a bug that I had and went nowhere, and Opus one-shotted the solution. But I had the opposite experience too. So it's very hard to know which one is which.
Okay, let's see, what is this creating? Okay. It created an SQL client. The layer looks correct. It has migrations. It decided to inline the migrations. Okay, that's a valid choice. Okay, it correctly provided the SQL live layer to the migration layers. This feels duplicated. There is clear duplication between...
>> ... sometimes it refactors and it leaves old code in place, never exported, in the same patches.
>> Okay, good to know. In our experimentation, another thing we're doing is using semantic code search, because we've noticed that a lot of times the model reimplements the same features because it doesn't find them. And with semantic code search it finds them.
Well, okay. There's a duplication here. I'll probably tell it that there is a duplication at some point. I want to check the API. Exactly, you see it's using plain strings for identifiers. So one of the future things that we might want to do is to tell it to use branded types. Okay. TodoNotFound: it added a schema annotation to flag that TodoNotFound should be a 404.
This looks decent. I don't understand why it sometimes creates structs instead of classes; I personally prefer to use classes. So in the future I would either create a best practice to prefer classes or, depending on how strict I want to be, create a lint rule to prohibit the usage of Schema.Struct in specific files, and stuff like that. For now it's obviously fine. It doesn't need to.
Not sure. There might be, but it's not flagging anything here. And the LSP is on. "Lint all the files." Lint with what? We don't have a linter in place. Good point. We also do not have a formatter in place. Let's ignore that for now. Let's see. Okay.
A client with a base URL. That's good. We have the live handler. Server: index is just exporting everything. "The index.ts should probably run the server instead of exporting everything. Do that, with a condition checking that the file is main, so it doesn't run when you import the file."
What? What is doing here?
It created an arbitrary with HTTP
to run an effect.
Make test HTTP live.
Okay, it's one way.
Do the test actually pass? Come on. Run
test. I'd be surprised. Wow.
Uh, is there a start command?
add a start command to start the API
server and tell me where to find
the open API
docs.
Okay, it really likes this pattern.
As a future thing, I would probably just
tell it to use it layer instead of
using the width repository
and with
thing. But let's see if if at least it
works.
One run start.
Good
listing to those.
Let's check the open API. Good. There is
an open API.
created.
This looks decent as a first pass. It shows the schemas properly. Good. Okay. Then let me... It did create a database here. Let me maybe gitignore the DB file, todo.db. You're right. Yeah, I'm no longer able to write anything by hand. Yes.
Okay, let's actually clean up the tests a little bit. So, clean. You see, I'm fooling myself into wanting to use the same session over and over again. That's when Ralph loops are really useful. We created a lot of mess. "You created a lot of mess in tests. Clean up everything. This should be the cleanest code you've ever seen, not like the crappy Python code you've been trained on. Do not use patterns like that; simply use it.layer with the layer, and put utilities in their own folder." No offense to Python developers, of course.
>> Probably. Now I'm winging it. I'm going to see if it's able to do it. If it does, once it's done, I'm going to create a pattern from it. But yes, that would have been a good idea. Which is why automating the process is very important, because we are lazy. Like, now I was so lazy that I didn't want to create a pattern for it. Maybe it will use test layers. Maybe. Maybe.
Oh, the bad pattern. It basically created a function to provide a layer to an effect. It built the layer manually. It wrapped everything in Effect.scoped, which is going to close the layer once this is done. And my guess is that it did this because this pattern is actually used to test some Layer internals in the Effect codebase. But it's completely unnecessary here. And if you look at the file, even without knowing the details of Effect, it stinks.
Something's not right. Now it cleaned it up. So when you see something that doesn't look right, usually just ask the model: why did you do that, is there any alternative? In this case I knew that to provide a layer in a test we should just use it.layer, so I kind of skipped that; but in reality, if I hadn't noticed, I would have discussed with the model that I didn't like to see that repeated thing all over. And sometimes it's necessary. Sometimes you're wrong and the model is right. That's the way to do it. In this case it was completely unnecessary.
>> No, I think we have it.describe... we have it.describe. So you would do it.layer as the top-level thing, pass the layer, whatever layer it is, and then in the closure do it.describe. You could probably also add a describe as a shorthand. Models don't care about verbose code; why should we make it less verbose?
>> Does it do any cleanups?
>> Yes.
>> Yes. Yes, but you can do it per test still. Now, it does poison the other tests.
>> The other alternative is you provide it.layer at every test.
The reality is, whenever you're using a database... in this case it's SQLite, so the argument is kind of moot. But if I were to use Postgres in a project where you have hundreds of files, hundreds of tests, spinning up a Postgres instance per test is going to make your test runtime two days, maybe. So usually what I end up doing is making tests that do self-cleanup. For example, I run a test within a transaction and I roll back the transaction as soon as the test finishes, so that they are kind of atomic by the fact that they don't leak. That would be another pattern that we could tell the model to do. It would be a matter of creating the transaction and the rollback.
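(A sketch of that rollback trick with `@effect/sql`; the `Rollback` marker is invented for the example, and the point is only that `sql.withTransaction` plus a deliberate failure at the end undoes whatever the test wrote:)

```ts
import { SqlClient } from "@effect/sql"
import { Effect } from "effect"

// Sentinel error used to force a rollback at the end of a test body.
class Rollback {
  readonly _tag = "Rollback"
}

const testBody = Effect.gen(function* () {
  const sql = yield* SqlClient.SqlClient
  yield* sql`INSERT INTO todos (title) VALUES ('from a test')`
  // ...assertions against the database would go here...
})

// Run the body inside a transaction, then fail on purpose so the transaction is rolled
// back and the database is left untouched for the next test; the expected failure is swallowed.
const isolatedTest = Effect.gen(function* () {
  const sql = yield* SqlClient.SqlClient
  yield* sql
    .withTransaction(testBody.pipe(Effect.andThen(Effect.fail(new Rollback()))))
    .pipe(Effect.catchAll((e) => (e instanceof Rollback ? Effect.void : Effect.fail(e))))
})
```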
But there are alternatives.
>> ... the library?
>> No, we added the Effect codebase in a .repos folder. We created an AGENTS.md that references the Effect repo, and then for the features we wanted to use, we asked the model to create patterns by looking at the repo, investigating how things are done in it, as kind of general knowledge. In this case we did one for SQL and we did one for the API. Now, the good point is that in this session we have best practices about testing. So: "Let's create patterns/testing.md."
"It should include all the best practices of testing Effect-based code, including usage of it.layer, etc. Also update the AGENTS.md to reference all the patterns in ./patterns."
And the next thing that you would do to automate the flow is, for example, opencode allows you to create slash commands; Claude Code allows you to do the same. You make a /new-pattern command or whatever you want.
>> You can create skills and tag the skills.
Skills are very useful for these kinds of things. I'm kind of against skills in general, not for these things, for which they are ideal, but many people think that just by adding a skill you're going to make the model good at React, you're going to make the model good at Next.js. The reality is, if you put a skill for every single Next.js internal, you're going to pollute the context and not get anywhere. So skills have a very good use case, which is this kind of use case, and I guess they are more general than slash commands. I tend to do slash commands because I tend to use a single coding agent. But definitely, if you are for example in a team where everybody's free to use their own agent, maybe some people use Cursor, some people use opencode, some people use Claude Code, skills are a good baseline.
Let's see patterns/testing.md: use the Effect Vitest helpers for all Effect-based tests; use it.effect; use it.layer; avoid custom wrappers that call Layer.build.
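(In practice that rule means tests end up shaped roughly like this with `@effect/vitest`; `TodoRepo` and its test layer are stand-ins, and the v4 beta may expose these helpers slightly differently:)

```ts
import { it } from "@effect/vitest"
import { expect } from "vitest"
import { Context, Effect, Layer } from "effect"

// Stand-in service for the example.
class TodoRepo extends Context.Tag("TodoRepo")<
  TodoRepo,
  { readonly count: Effect.Effect<number> }
>() {}
const TodoRepoTest = Layer.succeed(TodoRepo, { count: Effect.succeed(0) })

// it.layer shares one layer across the suite instead of hand-rolled Layer.build wrappers.
it.layer(TodoRepoTest)("TodoRepo", (it) => {
  it.effect("starts empty", () =>
    Effect.gen(function* () {
      const repo = yield* TodoRepo
      const n = yield* repo.count
      expect(n).toBe(0)
    })
  )
})
```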
This is a very specific rule. Now, a friend of mine told me: whenever you read a rule book, a legal rule book, or you find those very specific rules, like when you enter a pub and there's a sign saying "no skateboarding on top of...", you ask yourself, why does this rule exist? Because somebody did that. Why does this rule exist? Because the model did that. Why this pattern? Okay, you see: relevant files. They're all linked.
>> Yes. And there's a friend of mine who's writing a linter plugin that checks for existing references. So when you change code, it runs in the CI and says, hey, this reference is broken.
>> Yes.
>> Yes.
>> The full ... slash whatever. Yeah. Yeah. How would you write a test for a pattern?
>> You mean actually write a file?
I feel like that could be a way. Sometimes the code that is inside the patterns is not really executable. I guess it has pros and cons. It's definitely an interesting idea. For example, maybe with an additional tag, like a ts-execute fence, to flag which of the patterns you actually want executed, or references for which files you want to be referenced, because sometimes it mentions files as examples. For example: "if you write this feature, use a file called ABC," and that's not a concrete reference. So you don't want your program to fail because it read that. That's more in the direction of evaluations. So, evals.
>> Yes, that's at scale. That's very good. I found doing it on a per-project basis ends up being more... What we are thinking of doing in the Effect repo is, for example, to have evals running once per day and generating reports. So anytime we do library changes, or we add more docs, or we add more examples, we see exactly whether the outputs are better or worse. Sometimes with evals it's very hard; even Anthropic a while ago wrote a blog post where the summary is: we don't really know when code is good or bad. Because, is more terse code better? Depends. Is more verbose code better? Depends. There are some properties where you can say this is definitely better than not, like code that type checks is better than code that doesn't. Probably true.
But when it comes to style, when it comes to, like, is this file structure better than another file structure when they both convey meaning, you kind of need a human at the end to say, "Yeah, I prefer this." And if you take 100 humans, you're going to have an 80/20 split. So we have the same problem now with defining Effect patterns, because we are running evals, and evals are kind of our opinion of what's good; it's not really an absolute truth, let's put it this way.
We have human-written best-practice code, we have generated code, and then we have an LLM that compares them and says: are these two different or not? Give us a score. And that's pretty much how you run the eval. Not a very nice way to run it, but we're trying to figure this out, because we are thinking of fine-tuning a model on top of Effect, and for the reinforcement learning part we are going to need good evals. So it's part of what we are researching right now.
There's no right or wrong answer. If there was, all the models would perform the same, because everybody would have the same evals, everybody would have the same thing. But now we have all the patterns for what we want. So I feel like we're at the point of saying: "Commit this." I'm going to create a repository and push it so that at least you have access to it.
Gosh, it's too big. New repository. Is it public? Please choose an owner. Sure. Add the remote origin and push. Pushing the final repository. So, hopefully... We haven't gotten to the point of doing clustering and workflows.
clustering and workflows.
Just sharing a few words about why you
would want those aspects in your code.
This is a very dumb to-do API. One thing
I wanted to add
would be authentication and
registration. For example, when you have
a registration, your process is usually
write something in the database and then
send an email
or send an email code and wait for
confirmation.
Anytime you do two unrelated operation,
there is no transaction between them, no
database transaction between them and
your server may fail at any random point
within your code. So it's very hard to
guarantee that the email has actually
been sent which is why many time in a
registration procedure you see the
sentence
if the email did not arrive in 30
minutes please retry
you retry for me why should I retry if I
haven't received the email that's the
that's a symptom of a of a badly
designed system that cannot guarantee
that two operations happened
To do that you have various ways. One way is to implement queues and so on and so forth. The other way is to use something like workflows. You have solutions like Temporal or Inngest; there are many workflow solutions. Effect has one, implemented on top of what is called Effect Cluster, where basically you run a cluster of Bun, Node, whatever instances, and the system itself guarantees that once a procedure starts, it's going to finish; even if the server crashes, it's going to move to a different location. How would I go about it? The same way as I did now: ask the model to explore the repository, extract the patterns around how to use Effect Cluster, how to use Effect workflows, and just go from there. It's very interesting. It's still in the unstable part of Effect, but it's going to be stable very soon. And we think that especially if you integrate AI in your app, it's going to be even more important, because with AI every process becomes more long-running: LLMs take minutes to answer. There are a lot of things that can go wrong in a minute. If the average response time is 10 milliseconds, the server is pretty much never going to fail in those 10 milliseconds. If those 10 milliseconds become a minute, you're pretty sure the server is going to fail in that minute at some point.
And usually, before, the companies that would use workflows were larger-scale companies, because at scale every edge case happens twice per day. With longer response times, even if you have 10 users, you're pretty much going to have disruption if your average process takes a minute, and you're going to have failures all over the place. Which is why, for example, Temporal became much more interesting in the past 12 months: everybody's now implementing AI in their own products. So they have chatbots, they have any kind of AI-driven process. And with Effect you get workflows, you get clustering, you have AI integrations, you have Discord and Slack integrations, and so on and so forth. So the system is really composable and the models are pretty decent with it. We have a working API.
I've been speaking for about an hour and a half, and I started with zero Effect knowledge in the project; it was an empty repository. And this is why I wanted to call this workshop "just clone the [ __ ] repo." That's pretty much it. If you have any questions or anything else, I'm happy to discuss with you at a later point. Let's get the next speaker set up. Thank you so much.