Vibe Engineering Effect Apps — Michael Arnaldi, Effectful
Channel: aiDotEngineer
Published at: 2026-05-07
YouTube video id: Wmp2Tku2PrI
Source: https://www.youtube.com/watch?v=Wmp2Tku2PrI
So, welcome everybody. Just to set the context for this workshop: I had a lot of ideas about what to prepare, but in the end I thought, we are vibe engineering, and for this to be authentic it has to be from scratch, so I prepared absolutely nothing. That means we can take any path we want; let's hope this is real enough. First of all I'd like to know from the crowd: do you already know Effect? What's your level of familiarity with AI tooling, and that kind of question? Luckily we're not too many, so I hope this can be as interactive as possible. Maybe let's just start. I know Chris.
>> Hello. Hi.
>> Familiarity with AI and with Effect?
>> Running v4 in production.
>> Running v4 in production. Against advice, by the way. Good.
>> I've gone all in this year, doing everything with agents, and the reason I'm particularly interested in Effect is that it encourages so much safety, so my agents can't go off the rails. We had one API client I translated, and now I'm more interested in how you make Effect more discoverable by the agents.
>> Okay.
>> I saw the idea about having the... I'm not convinced by that, so I'm curious.
>> Good. Well, how about you?
>> I haven't heard much about Effect.
>> Sounds good. Sounds good. Good.

So, a pretty heterogeneous crowd, all interested in some way in how to use agents effectively with Effect, pun intended. You pointed out a very good thing, which is cloning: giving the agent access to the repository. In reality, this session should just be called "just clone the [ __ ] repo" and be done with it. I also have not been coding by hand since about late this summer, so it's been quite a while. I started programming when I was 12 years old, so it's quite an odd feeling to reach the point where you're no longer writing code by hand. Most of what I do is library-level coding: pretty low-level, usually fairly complex type machinery, stuff that used to require a very good understanding of the language and of how users interact with your software. I'm not diminishing app-level development in any way; it's just that the way you treat a language when you're building a library versus building an app on top of it is usually very different. Sometimes in app land you have the same requirements as library land, especially when you need to generalize, abstract over some patterns, make them repeatable, remove the verbosity from the repetition, and so on, so there is some crossover. But I definitely thought AI would be more useful in app land, and I didn't see much usage at the library level. And I was dead wrong, because I'm not writing code by hand; I haven't written a line of code by hand in a while. I've done that in TypeScript, and I've done that in Rust. The funny part is that, since I mostly write libraries, I usually interact with codebases that have zero documentation and zero best practices available online. So I couldn't really use the usual approach of adding an MCP server to get access to the documentation, or hoping that the models were trained on the documentation enough to be directly useful. And the reality with LLMs is that people treat them like a human brain, but they are very different. We learn continuously; this is a learning experience.
Once we get out of this room, hopefully you're going to know a little more than when you came in, and your brain will keep going and internalize more and more patterns over time. Then you go to sleep, your brain cleans up a little of the mess of irrelevant information you got during the day, and there's this whole process of transforming experience, the world we experience every day, into long-term memory. This does not happen with LLMs.

LLMs get a pre-training phase where they are trained on all of the world's existing knowledge; usually they get trained on the whole internet. Then they get specialized on some tasks, and then there's the whole post-training phase where models are fine-tuned to act on specific things. For example, coding agents are generic models that have had passes of reinforcement learning to operate on codebases. The whole post-training phase of a large language model dedicated to coding is letting the model rip through codebases and having evaluations that tell the training process how the model is performing. Is it doing well? Is it doing badly? Does the code compile after this change? Does it fail to compile? And so on. But once that is done, it's done. No more knowledge comes into the model every day. So if you interact with a model today and you tell it, "Hey, I want you to do this in a very specific way," tomorrow it's not going to remember. How do you make it remember? That is the big question.

With models, you have to think of it as if you're chatting with them, but the reality is you are basically appending messages onto a fixed-size array, which is called the context window, and the context window is limited. Now there are models with a one-million-token context window, and that's not necessarily a good idea, because the context window is what gets pushed to the neural network, and the neural network is going to try to predict what comes next. If you push more information in, there's a very good chance you're going to confuse the model, which is why a one-million-token context window is not necessarily helpful, especially if you're doing multiple things in the same context.

That really means we have to architect around a dumb process. We have to architect around a machine that has knowledge from six months ago at best and is not going to remember everything, because even if the model has one trillion parameters, or even ten trillion, that's not enough to store all of human knowledge. You're always going to get compressed knowledge, and in the best case you get some ability to generalize, so the model can say, "Hey, I know A and B, maybe I can do C because it's similar to A and B"; you get some form of emergent behavior and some capability to reason about new problems. But models have become very good. As I said, I haven't written code by hand in at least six or eight months. So even if the machine is dumb, it's already at the point where we can leverage it to do good work. But how do we do it? If the assumption is that the model has outdated knowledge, we need a way for the model to get new knowledge.
And we said that the models we use for coding have gone through reinforcement learning to be able to understand your own codebase, make changes in it, and replicate patterns that already exist in it. They haven't really been trained on reading human documentation. They haven't been trained on using an MCP server they have never seen. They've been trained primarily to consume and produce code. So eight months ago I was thinking: what if I just give the model access to code? That means if I want to use Effect, I add the Effect repository to my directory, masquerading the Effect codebase as my own codebase, and maybe I can trick the model into thinking it's just one big codebase, so that it will explore it, progressively use it to build up the required knowledge, and sort of clone the patterns.

There are various ways of doing that. One could argue the model already has access to library code via node_modules, but coding agents have been trained to focus on your own code, not on the code in node_modules. So if the library is in node_modules, the model is de-optimized for it: it's not going to look at it with the same frequency as it looks at your own code. If you have it in a gitignored directory, models have been trained not to look at gitignored files; Cursor, for example, does not index stuff that is gitignored. So there are all of these sorts of random restrictions that we figure out while developing. And the only way I found the models to be good, regardless of the language, regardless of what you use, is if you just clone the [ __ ] repo, which is the point of this workshop.

So, this is a completely empty project. I have some ideas of where we could take it. My idea would be to set up a Bun repository, use Vitest for testing, build some kind of HTTP server, ideally providing OpenAPI documentation for consumption, build a type-safe client to interact with the backend, and hopefully, if we have enough time (I'm not sure), tap into the world of workflows and clustering for persistent operations in the backend. And really, I have nothing set up. So how do I usually start? Well, I start nice with the model, but as soon as it derails you're going to see me start to insult the model. It's fun, because it cannot really answer back; if you don't like the answer, you can just shut it down. It's not like a human that gets offended.

"I would like to set up a project using Bun. The project should also include setup of Vitest and a TypeScript check script." I'm using GPT 5.4. When I started this journey I was using Sonnet 4, and there are many differences between Sonnet 4 and GPT 5.4; namely, Sonnet 4 felt like a kid with a knife running through the house. That example comes from Geoffrey Huntley, the author of the Ralph loop. But even as a kid running through the house with a knife, it was still enough to do coding. Now we have models like Opus 4.5 and GPT 5.4 that are much, much better. A very interesting thing to think about is that open-weights models are lagging behind frontier models by roughly three to six months. Which means we already have open models that are smarter than Sonnet 4, which I was already using for library-level development. How long will it take for those open-weights models to become good enough for our daily operations? I don't know.
That open-weights question is just a thought I've been having more and more lately, especially because Anthropic is putting arbitrary restrictions on how we can use their models, so I don't really want to use Anthropic models. OpenAI is good for now; who knows what they're going to do in a year or two. And I like open source, of course.

Okay, I don't have a git repo created. "Create an empty git repo." And by the way, if you have questions, if you want to interrupt me, this is supposed to be interactive; I'm here to entertain you for another hour and a half. Initialize the repo. Okay, this is done. It's amusing that even using GPT 5.4 with opencode it would still create a CLAUDE.md by default; I think this comes from bun. Yeah, let's trash this.
>> It absolutely has nothing to do with this. Perfect marketing strategy; plus they said it wasn't real, and two days later they announced the new model.

Okay. It created a source and test directory. Let's see what it created. The tsconfig: bun bundler mode, noEmit, that's fine; strict, skipLibCheck, that's fine; noImplicitOverride, that's good. Yes, and actually move the files into the proper directory; moving the entry file. Good, seems smart enough. It runs a basic smoke test (something like the sketch at the end of this passage). Okay, that's a good starting point. We verified with bun run test and bun run typecheck. Good.

"We want to add effect@beta." We're going to use Effect v4. It's not yet released for production usage, except that Chris already uses it in production, so if I have any problem I'm going to ask you. It's fine.
>> Yes, it's effect-smol.
>> Smol because it used to be small and evolved to become bigger. Still very thin in bundle size.

Okay, 1 percent, 14k; plenty of context left. "We want to add effect@beta, and we want to use the Effect test support to write the tests." I will, I will, that's next. And speaking of that, I want to try the tsgo version of TypeScript. I've never used it, so I don't know how to set it up. "Set up tsgo as the compiler for the typecheck. Check the readme and set it up." Not sure if this is going to work or not.
>> Oh, the actual native compiler. Yes. I don't know if Mattia allowed it.
>> Yeah, it does. The point is it does not use it, and we could just do an alias install, installing TypeScript as something else, but I'm not sure if he did that. Maybe let's follow the normal practice: let's install TypeScript Go instead of TypeScript. Will it be able to do this? Who knows; we'll find out.
>> We will find out.

Is this the package? No, I don't think this is the package; I think they just stole my crypto wallet, except I don't have one. So, check from here. Oh: the npm package named typescript-go is only a placeholder security package, so it used the real preview compiler that provides the tsgo binary. Well, that was probably a good idea. Script: tsgo with noEmit. Let's see, bunx tsgo. Okay. Typecheck, typecheck. Okay. "Set up VS Code to use the tsgo LSP." Will that work? Maybe. Yes, that I need. There we go: Native Preview, that's it. Okay, I installed that; I most likely need to reload the window. Let's go. Okay, maybe it worked. Then let's go here, and yeah, I should be able to do that. But there's also a nice "will not be loaded if files are specified" error, some command-line or config thing to get past; I'm going to feed it to the agent in a minute. It's bun that gives issues, probably. Maybe not. Yes, it is bun. Then let me stop this and select the tsconfig to configure this one.
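For reference, the basic smoke test the agent generated at this step would look roughly like the following. This is a minimal sketch assuming Vitest and a test/ directory; the file name and assertion are illustrative, not the exact generated code:

```ts
// test/smoke.test.ts - minimal smoke test to prove the toolchain runs end to end.
import { describe, expect, it } from "vitest"

describe("smoke", () => {
  it("runs under the configured test runner", () => {
    expect(1 + 1).toBe(2)
  })
})
```

Running bun run test and bun run typecheck against something this small is mostly about validating the wiring (Vitest, tsgo, the package scripts) before any real code exists.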
This other tab is a package.json installing dev dependencies. Select all. What is this? That's VS Code, that's fine. This needs a lot of work. Do we have the Effect language-service plugin installed? Oh gosh, where is this coming from? Who knows. Okay, install. Okay, that's installed. Let's see if it catches anything: import from effect... nope, that's a dangling Effect; that should be flagged. I think I've done it.
>> You mean the prepare one?
>> Yeah, I did. Maybe I need to reload. Reload after that; yes, that was easy. The Windows solution: just restart it.

Okay, so we have it. Now, some diagnostics have severity "suggestion", "warning", and so on. For AI we would like to turn everything into an error, so that the LLM cannot accept code that has anything remotely resembling an error. "This is a project where we will use AI a lot. We want all available diagnostics to be set to error." I should switch from... what is the model doing? Did it update the tsconfig? It did not. Oh, it is updating the tsconfig. Okay. No, no, no.

And that's another interesting point: there is a website called effect.solutions. It's a really nice website, Kit Langton made it, and it's a kind of quick start for using Effect in an AI project; it installs the language service, strict policy defaults, and so on. But then it uses a CLI to give the model access to the Effect repo, and the model needs to know how to use the CLI, so it's kind of a dog chasing its tail.
>> Yes. Yes, but there are some markdown files.
>> But it doesn't work as well. And if you actually read it, at some point it says you should actually just clone the repository.

Okay, this looks exactly like what I had in mind: we have all the diagnostics set to error, which is exactly what we want. Reload window. Okay. I also want format-on-save set to true, just because it's annoying otherwise. Okay. Very good point: commit current, commit current.

Now I want to add effect-smol as a subtree. Okay, it's committed. Good. "Now create a repos folder and add effect-smol as a git subtree, without history, squashed, in repos/effect." Who knows if it will manage; at least it did (the command behind it is sketched at the end of this passage). Okay, here. Why is it trying to... okay, okay, we have it. Let's just check git log. Yep, it did add it. And now we're at the point where we can start to do our research.

For example, we said we want to create an HTTP API. I would clean this up and open a new session to avoid context pollution. "You have access to the effect repository at repos/..." Actually, let's do something else first: we want to set up an AGENTS.md. "Set up an AGENTS.md listing the commands available, like bun run typecheck, and specify that you have access to the Effect repository at repos/effect and that you should use it to extract best practices, look at how things work, etc." The AGENTS.md we get now is an initial prototype; as you work in the project you're going to evolve it, add more commands to it, and add rules when you spot bad patterns being created in the code. One thing we have not set up yet is a linter, and the linter is going to be an essential piece of the back-pressure loop that helps the model drive in the right direction. If you want a fully working setup, I have a repository of mine that I use for fun, called accountability.
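To make that subtree step concrete, the underlying command amounts to roughly the following. This is a sketch using Bun's shell; the repository URL and branch are assumptions, so point it at whichever Effect repository and branch you actually want to vendor:

```ts
// add-effect-subtree.ts - vendor the Effect source into repos/effect as a squashed
// subtree (no upstream history), so coding agents treat it as part of "your" codebase.
import { $ } from "bun"

// Assumed URL and branch; adjust to the repo you actually track (e.g. effect-smol for v4).
await $`git subtree add --prefix repos/effect https://github.com/Effect-TS/effect-smol.git main --squash`
```

Compared with node_modules or a gitignored clone, a committed subtree is plain, indexed, in-repo code, which is exactly what coding agents have been trained to pay attention to.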
In the accountability repo you can find a lot of things, but for example I have an ESLint config with a lot of custom rules, and those are somewhat arbitrary. For example, I don't want the model to do explicit type assertions; I want it to use Schema to check the shape. I have rules prohibiting the usage of "x as y". I have rules prohibiting the usage of any and of unknown. Basically, I'm trying to stop the model from doing the dumb stuff I realized it was doing in my code.
>> No.
>> Yeah. The same for unknown. And the funny thing is, initially I banned unknown because I wanted the model to stop doing "as unknown as X". It then discovered that never is a bottom type, so you can do "as never as X". I thought, okay, then I'm going to ban "as" entirely, and now it's doing better.

Okay, let's see what it created. Okay, this is short: use bun, okay; available project commands, that's fine. Test watch: this is going to create issues, I already know, because the model will try to run it and get stuck. Same with dev servers. Effect reference repositories: good, look at them for specific guidance. Okay, that's enough of a start. "Mention in the AGENTS.md that you should never, ever try to run commands in watch mode, and, for example, that you are not allowed to run a dev server." Otherwise it's going to try to run the dev server as the first thing and get stuck. Okay. What I like about OpenAI models is that they are way more concise than Anthropic models; the same task with Opus would probably have produced 200 lines of AGENTS.md. But that's good; it's enough as a start.

So we are back to square zero. We said we want to create an HTTP API. I know nothing about Effect. So: "I would like to create an HTTP API that should have OpenAPI documentation and a type-safe client generated by default. Explore the effect repo for patterns on how to do this. Save your research into patterns/http-api.md. Ask me any question you need." Again, I'm starting from the perspective that I have no idea how to do this in Effect.
>> No, the issue I find with plan mode is that the model has crippled access to tools, so it cannot easily do the same things it does outside of plan mode, so I don't make heavy use of it. I usually do what's called spec-driven development, in the sense that the first task I do with the model is to discuss how to create a spec for something; the spec is then persisted as a markdown file, which is effectively my plan, and I tell the model to implement it. That second step I usually do in a Ralph loop, because, as you've seen, I've already restarted opencode a few times to clean up the context window. Doing this manually is boring, and you usually end up reusing the same context window for multiple things, which de-optimizes the model at some point: the context window is limited, you push a lot of information in, and the earlier information confuses the model about the later information. So I use a very simple bash script that tells the model: pick up a small task, implement it, then exit, and I run that in a loop (a sketch of the loop follows below).

It's funny how, with AI, less is often more. You can have very complex architectures around context management and so on; at the end, the dumbest thing ever ends up working better. We are doing research on this ourselves, and it looks like there are actually very good margins of improvement from reducing the number of tools the model has access to.
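The loop itself is genuinely tiny. Here is a sketch of the idea in TypeScript using Bun's shell (the original is a bash script; "my-agent" is a placeholder for whatever non-interactive coding-agent command you actually run, and the prompt wording is illustrative):

```ts
// ralph.ts - run with `bun ralph.ts`. Each iteration starts a fresh agent session,
// so no single context window ever accumulates the whole project's history.
import { $ } from "bun"

const prompt =
  "Read the plan in plans/, pick ONE small unfinished task, implement it, " +
  "run `bun run typecheck` and `bun run test`, commit if everything is green, then exit."

while (true) {
  // .nothrow() keeps the loop alive when an iteration fails; the next run retries.
  await $`my-agent ${prompt}`.nothrow()
}
```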
On reducing tools, for example: we have been experimenting with a coding agent that has a single tool call, called execute, which can execute arbitrary TypeScript code, including calling Bash through TypeScript. In that scenario the model doesn't even have access to a patch tool; it cannot change files directly. It has to write a TypeScript file that changes the code, so it ends up writing TypeScript transformers, AST-based transformations. It's fascinating how you reduce the things the model can do and it does better.

So, let's see. "Save the research to patterns/http-api.md." Good. Main conclusion: for this repo, the strongest default Effect pattern is to define the shared HTTP API (you're absolutely right), derive OpenAPI from it, and mount the docs. Okay. The OpenAPI generator only when you need generated client artifacts; we don't need that. "One question before I implement anything further: do you want the primary pattern here to be a shared HTTP API with HttpApiClient.make?" No: "I am fine with a shared HTTP API. I don't need a committed client in the repo itself." Let's see what it did for this workshop repo. The best part: okay, it gives you the relevant upstream files, good; it looked at tests, nice. Okay, this looks decent enough. We should probably tell it what we want to do, but this is just generic patterns that we're going to use as reference. So: "List the files in patterns/ in the AGENTS.md", so the agent has context that they exist. The model does not care about grammar.

I feel like many people raise the point that a model is not good at something if it doesn't do well by default. I don't think there's anything more wrong than that statement. The model is good when it can operate a large-scale codebase using patterns and it doesn't fail at scale. The zero-to-one problem is not really the problem; it's a problem for the first ten days, or ten hours, depending on what you're building. And as programmers, if our job is no longer to write code, our job should be to set up repositories in ways that let models act well on them. What I'm doing now is most of what I do whenever I operate a coding agent at scale in a codebase, even if the codebase has no concept of AI. If I start in a brownfield project, an existing codebase five to ten years old with no context set up, the first thing I do is let the model explore the code, clone the main libraries that are used (if you're using a framework like TanStack, clone the code of TanStack Router; if you're using Svelte, clone the Svelte codebase), ask the model to generate best-practice files, and so on. Once you have all of that, the model is going to be much more efficient.

So now that we have a little context on HTTP APIs, we can start implementing one. I do want to check something quickly, because I'm using Bun and I'm using Vitest: does vitest run actually use Bun as the runtime, or does it use Node? Because if I recall, there was a flag I had to pass to Vitest to let it use Bun, and I don't want our test setup to differ from our runtime. What is it doing? No: "Add to the Vitest config that it should ignore anything in repos/." It was running the Effect tests that it found. Yeah, there was no Vitest config whatsoever. Good (see the config sketch below). "Add to the test something that uses a Bun API." I feel like I did this before, so I should have it somewhere. Okay, was it using Node? Probably. "You should expect it to be defined." Because now we hit one of the classic mistakes: it had to make the test pass, so it changed the test to make it pass. Wow. Okay.
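The two fixes from that exchange are small enough to show. A sketch with illustrative paths: first, a Vitest config that stops the runner from picking up the vendored Effect tests; second, a test that fails loudly if the suite isn't actually running on Bun:

```ts
// vitest.config.ts - keep test discovery inside test/ and away from repos/effect.
import { defineConfig } from "vitest/config"

export default defineConfig({
  test: {
    include: ["test/**/*.test.ts"],
    exclude: ["repos/**", "node_modules/**"],
  },
})
```

```ts
// test/runtime.test.ts - assert that a Bun-only global exists, so the suite fails
// if Vitest silently falls back to Node instead of the Bun runtime.
import { expect, it } from "vitest"

it("runs on the Bun runtime", () => {
  expect("Bun" in globalThis).toBe(true)
})
```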
Okay, it fixed the test properly this time. Let's now begin our HTTP API implementation. "We want to implement an HTTP API following the patterns at patterns/http-api.md. We want the API to expose todo functionality where you can: one, create todos (title, description); two, update todos (change the title, etc.); three, flag a todo as done or not; four, list todos." I should have done something else first: "Discuss the plan with me and create plans/todo-api.md." So here I'm telling the LLM to read the pattern file we created before, where it gathered generic knowledge about the Effect way of doing things. It still has access to the original Effect codebase if it wants, but now I'm creating a specific plan to implement the API I actually want. Drafting the plan. Okay, the todo shape, that's fine. Initial storage strategy: let's do something different for storage. "Use @effect/sql and a SQLite store. Explore the effect repo for how to do that, and create patterns/sql.md." I realize we need a persistence strategy and I don't have one; I know Effect has some SQL thing, and again I'm using the same process where I first generate some patterns for it.

This is also useful because you may want to use something from Effect without using everything from Effect. If we pushed all the patterns into your repository by default, you would end up using everything from Effect even if you don't want to. This way it's self-selecting: you pick and choose what you want to use. In brownfield projects especially this is very important, because you don't want to refactor everything you already have. For example, here I could just as well have picked Drizzle for the persistence. Most likely we're going to develop some kind of CLI where you can prefetch patterns that are already available, and it will still let you pick and choose. We also want to automate this process of exploring something and creating patterns from it, because the patterns we ship as best practices might not exactly fit your needs, so you would still update them as a second step.
>> The model I use may not be as good; you go and read the code of some of those PRs and the generated code is all over the place. So this kind of pattern, library authors providing it, not as skills exactly, but like effect.solutions, officially distributed with the package, colocated?
>> I feel like generally it's a good idea, but there are some caveats. For example, even the AGENTS.md "standard" is kind of not a standard, because the way you prompt Claude and the way you prompt GPT is different. You've noticed I never wrote anything in uppercase. If I were using Claude, I would write a lot of stuff in uppercase. The reason is that GPT gets scared if you scream at it (I don't even know how else to put it); if you scream at it, it de-optimizes, becomes passive, and agrees with everything, which is not what you want. With Claude, if you scream at it, it pays attention to that specific sentence. That carries over into these shared patterns too. I feel the patterns should almost be generated with the model you use rather than taken off the shelf. Now, we can do that for the top three frontier model families, and within a family the models are very similar:
5.3, 5.4, 5.2, there are not many differences, and Opus, Sonnet and Haiku are also very similar. So ideally we could have the CLI ask which model you use: okay, I'm going to optimize the context for this one versus that one. And it's annoying, because you would obviously like to have a standard.
>> I would love to maintain it.
>> Yes. Yes, it's very painful to maintain this stuff. Our approach is to make the code so good and self-explanatory, with examples and everything, that any model you use can generate those patterns, and then the CLI would generate them on the spot for the model you use. That's one approach. It may fail, and in six months we provide patterns for everything and just tell you: please use either this one or that one. Another very interesting option is to fine-tune an open-source model to use Effect patterns by default. We have thought of that, kind of.

Okay, let's see: "update the..." No: "If you want, the next step for me is to update..." Yes, do that. This is the annoying part of GPT models: they constantly ask for your input to continue. Opus would have just done it.
>> But sometimes done it wrong, and you have to redo it three times in your session.
>> That's why I use GPT 5.4. Well, I'd like some sort of fusion, a hybrid of Anthropic models and OpenAI models, so that it doesn't ask me all the time, because GPT, especially on complex tasks, takes its time, but at the end the output is good, while Opus sometimes likes to take shortcuts. And the funny thing is, if you let one any slip into your codebase and you're using Opus, it's going to do "as any" all the time. It's like: oh, I can't do this, let me do that, for everything; I need this to compile, let's remove the code.
>> Yes.
>> That's why in this kind of project, in accountability, when I was using Opus I had a lint config of thousands of lines to prohibit any shortcut.

"I can start implementing this next." Yes, please; I feel like we've spent enough time, let's see what it does. See, it's correctly looking in the effect repo, in the AI docs, for ideas. This will most likely take a little while, which is positive.
>> Kind of. In some projects it was using Schema by default and I didn't need a lot of back pressure for it. Sometimes yes: one example is a rule in accountability. I have a custom ESLint rule to ban sql<Type>, because the model would write an SQL query, write an interface, and just cast, which is exactly the same thing as casting, and I had to ban that pattern entirely: "using type parameters with the sql template literal provides no runtime validation; use SqlSchema.findOne". You see, the rule ends up suggesting SqlSchema instead. So I'm more or less just watching what the model produces, and if there's something I don't like, I write lint rules to prohibit that specific pattern. For example, in schemas it would often have a user ID as a string and then another ID as a string, so you have no type safety whatsoever and the code tries to pass one into the other. So I force all identifiers to be branded types, and I then prohibit type casting, because otherwise it goes "this requires a UserId, let me do as UserId", and that's pointless: you should validate the data. So I ban the usage of "as" and force it to use constructors.
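To make that kind of back-pressure concrete, here is a rough sketch of lint rules along those lines, using ESLint's built-in no-restricted-syntax rule with typescript-eslint. The selectors and messages are illustrative; the real accountability config is much larger and uses custom rules, for example for the sql<Type> case:

```ts
// eslint.config.ts (or .mjs) - ban the shortcuts the model likes to take.
import tseslint from "typescript-eslint"

export default tseslint.config(
  ...tseslint.configs.recommended,
  {
    files: ["src/**/*.ts", "test/**/*.ts"],
    rules: {
      "no-restricted-syntax": [
        "error",
        { selector: "TSAsExpression", message: "No `as` casts: validate with Schema or use a branded constructor." },
        { selector: "TSTypeAssertion", message: "No angle-bracket type assertions." },
        { selector: "TSAnyKeyword", message: "`any` is banned." },
        { selector: "TSUnknownKeyword", message: "`unknown` is banned: describe the shape with Schema instead." },
      ],
    },
  },
)
```

The point is not these specific rules; it's that every time the model finds a new escape hatch (as unknown as X, as never as X, sql with an interface), you close it, and the lint error becomes part of the loop the agent has to satisfy before it can finish.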
So instead of writing "100 as UserId", it has to write UserId.make(100). Or I prohibit the usage of constructors in places where you should do validation instead: for example, one case I found was that it would type the API layer as plain strings and then use constructors inside the handler to create the objects, defeating the purpose. Then I write rules telling the model to put the validation directly in the schemas, basically saying: if you use a constructor inside the handler, most likely you're wrong; you should improve the starting schema to provide the validation at the edge. It's kind of babysitting a junior developer: a knife running through the kitchen instead of a kid running through the kitchen with a knife.

And this is still going. Both models are exceptional. Sometimes one model drives you nuts and you try the other; there's not much of a rule. Lately I tend to use OpenAI models more, because I don't really like being restricted in the harness I can use: the CLI itself. I want to use opencode, I want to use my own TypeScript files that interact with the AI SDK natively, and I'm prohibited from doing that by Anthropic. So up until a few months ago, when this was allowed, I mostly used Opus; when they enforced their policies against opencode I switched to OpenAI models, and now I'm using OpenAI models most of the time. There are some small edge cases, for example when you do UI, where Opus is much better than Codex, so there are specific things where one is clearly better than the other, but for most tasks they are about the same. I just had an experience where GPT thought for half a day about a bug I had and went nowhere, and Opus one-shotted the solution, but I've had the opposite experience too. So it's very hard to know which is which.

Okay, let's see what this is creating. Okay: it created an SQL client; the layer looks correct; it has migrations; it decided to inline the migrations, okay, that's a valid choice; it correctly provided the SQL live layer to the migration layers. This feels duplicated; there is clear duplication between these two.
>> That also happens sometimes when it refactors: it leaves the old code in place, never exported.
>> Okay, good to know. In our own experimentation, another thing we're doing is using semantic code search, because we've noticed that a lot of the time the model reimplements the same feature because it doesn't find it, and with semantic code search it does find it.

Well, okay. There's a duplication here; I'll probably tell it at some point. I want to check the API. Exactly: you see it's using plain strings for identifiers. So one of the future things we might want to do is tell it to use branded types (see the sketch below). Okay. For TodoNotFound, it added a schema annotation to flag that TodoNotFound should be a 404. This looks decent. I don't understand why it sometimes creates Schema.Struct instead of classes; I personally prefer classes, so in the future I would either create a best practice to prefer classes or, depending on how strict I want to be, create a lint rule prohibiting Schema.Struct in specific files and things like that. For now it's obviously fine. It doesn't need to; not sure. There might be, but it's not flagging anything here. And the LSP is on.
>> Are all the files linted?
>> Linted with what? We don't have a linter in place.
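On the branded-identifier point, here is a minimal sketch of what that looks like with Effect's Schema. Names like TodoId are illustrative, and this is v3-style Schema usage; the effect-smol/v4 equivalents may differ in detail:

```ts
import { Schema } from "effect"

// A branded identifier: structurally a string, but not interchangeable with other string IDs.
const TodoId = Schema.String.pipe(Schema.brand("TodoId"))
type TodoId = typeof TodoId.Type

class Todo extends Schema.Class<Todo>("Todo")({
  id: TodoId,
  // Validation lives at the edge, in the schema, not inside handlers.
  title: Schema.NonEmptyString,
  description: Schema.String,
  done: Schema.Boolean,
}) {}

// With `as` banned by the lint rules, values are decoded or constructed explicitly:
const id = TodoId.make("todo-1") // runtime-checked constructor instead of `"todo-1" as TodoId`
const todo = new Todo({ id, title: "Buy milk", description: "", done: false })
```

With IDs branded this way, passing a UserId where a TodoId is expected becomes a type error, and the only ways to obtain one are decoding real data or calling the constructor.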
Good point about the linter; we also do not have a formatter in place. Let's ignore that for now. Let's see. Okay: a client with a base URL, that's good; we have the live handler; the server index is just exporting everything. "The index.ts should probably run the server instead of just exporting everything. Do that with a condition checking that the file is the main entry, so it doesn't run when you import the file." It also created some tests. What is it doing here? It created an arbitrary withHttp-style helper to run an effect, a makeTestHttpLive... okay, it's one way. Do the tests actually pass? Come on, run the tests. I'd be surprised. Wow.

Is there a start command? "Add a start command to start the API server and tell me where to find the OpenAPI docs." Okay, it really likes this wrapper pattern. As a future thing, I would probably just tell it to use it.layer instead of these with-something helpers, but let's see if it at least works. bun run start: good, it's listening. Let's check the OpenAPI. Good, there is an OpenAPI document being generated. This looks decent as a first pass; it chose the schemas properly. Good. Okay, it did create a database file here; let me maybe gitignore the DB, the todo database.
>> You're right.
>> Yeah, I'm no longer able to write anything by hand. Yes.

Okay, let's actually clean up the tests a little. You see, I'm fooling myself into wanting to use the same session over and over again; that's when Ralph loops are really useful. "We created a lot of mess (you created a lot of mess) in tests. Clean up everything. This should be the cleanest code you've ever seen, not like the crappy Python code you've been trained on. Do not use patterns like this wrapper; simply use it.layer with the layer, and put utilities in their own folder." No offense to Python developers, of course.
>> Probably. Now I'm winging it; I'm going to see if it's able to do it. If it does, once it's done, I'm going to create a pattern from it. But yes, that would have been a good idea, which is why automating the process is very important: we are lazy. Right now I was so lazy that I didn't want to create a pattern for it. Maybe I'll use test layers. Maybe, maybe. Oh, the bad pattern: it basically created a function to provide a layer to an effect. It built the layer manually and wrapped everything in Effect.scoped, which closes the layer once it's done. My guess is it did this because this pattern is actually used to test some Layer internals in the Effect codebase, but it's completely unnecessary here. And if you look at the file, even without knowing the details of Effect, it stinks; something's not right.

Now it has cleaned it up. So when you see something that doesn't look right, usually just ask the model: why did you do that? Is there any alternative? In this case I knew that to provide a layer in tests we should just use it.layer, so I kind of skipped that; but in reality, if I hadn't noticed, I would have discussed with the model that I didn't like seeing that repeated thing all over. And sometimes it's necessary; sometimes you're wrong and the model is right. That's the way to do it. In this case it was completely unnecessary.
>> No, I think you would do it.layer as the top-level thing: pass the layer, whatever layer it is, then in the closure do it.effect and describe; you could probably also add a describe shortcut. Models don't care about verbose code, so why should we make it less verbose?
>> Does it do any cleanups?
>> Yes. Yes, but you can still do it per test.
Now it does poison the other tests.
>> The other alternative is to provide it.layer at every test. The reality is, whenever you're using a database (in this case it's SQLite, so the argument is kind of moot), but if I were using Postgres in a project with hundreds of files and hundreds of tests, spinning up a Postgres instance per test would make your test run take two days, maybe. So what I usually end up doing is writing tests that do their own cleanup: for example, I run a test within a transaction and roll the transaction back as soon as the test finishes, so the tests are atomic in the sense that they don't leak. That would be another pattern we could tell the model to follow; it's a matter of creating the transaction and the rollback. But there are alternatives.
>> The library?
>> No: we added the Effect codebase in a repos folder, we created an AGENTS.md that references the Effect repo, and then, for the features we wanted to use, we asked the model to create patterns by looking at the repo and investigating how things are done there, as general knowledge. In this case we did one for SQL and one for the API. Now, the good point is that in this session we have best practices about testing. So: "Create patterns/testing.md. It should include all the best practices for testing Effect-based code, including usage of it.layer, etc." I'm also going to queue updating the AGENTS.md to reference all the patterns in patterns/. And the next thing you would do to automate the flow is, for example: opencode allows you to create slash commands, and Claude Code allows you to do the same, so you make a /new-pattern command or whatever you want. You can also create skills and tag the skills; skills are very useful for this kind of thing. I'm kind of against skills in general, not for these things (for these they are ideal), but many people think that just by adding a skill you're going to make the model good at React or good at Next.js. The reality is, if you put a skill for every single Next.js internal, you're going to pollute the context and not get anywhere. So skills have a very good use case, which is this kind of use case, and I guess they are more general than slash commands. I tend to do slash commands because I tend to use a single coding agent, but if you're in a team where everybody is free to use their own agent (some people use Cursor, some use opencode, some use Claude Code), skills are a good baseline.

Let's see patterns/testing.md: use @effect/vitest for all Effect-based tests, use it.effect, use it.layer, avoid custom wrappers that call Layer.build (that shape is sketched below). That last one is a very specific rule. A friend of mine told me: whenever you read a rule book, a legal rule book, and you find one of those oddly specific rules, like a sign in a pub saying no skateboarding on top of the bar, you ask yourself, why does this rule exist? Because somebody did that. Why does this rule exist? Because the model did that. Why this pattern? Okay, you see: relevant files, they're all linked.
>> Yes. And there's a friend of mine who's writing a linter plugin that checks that referenced files exist, so when you change code it runs in CI and says: hey, this reference is broken.
>> Yes.
>> Yes.
>> The slash-whatever thing, yeah. Yeah. How would you write a test for a pattern?
>> You mean actually write a file? I feel like that could be a way.
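For reference, the it.effect / it.layer shape that patterns/testing.md points at looks roughly like this. It's a sketch with v3-style imports and an illustrative TodosRepo service; the exact @effect/vitest signatures can differ between versions, and the effect-smol/v4 test helpers may live elsewhere:

```ts
// test/todos-repo.test.ts
import { expect, it } from "@effect/vitest"
import { Context, Effect, Layer } from "effect"

// A stand-in service so the sketch is self-contained.
class TodosRepo extends Context.Tag("TodosRepo")<
  TodosRepo,
  { readonly count: Effect.Effect<number> }
>() {}

const TodosRepoTest = Layer.succeed(TodosRepo, { count: Effect.succeed(0) })

// it.layer builds the layer once for the whole block; no hand-rolled Layer.build wrappers.
it.layer(TodosRepoTest)("TodosRepo", (it) => {
  it.effect("starts empty", () =>
    Effect.gen(function* () {
      const repo = yield* TodosRepo
      expect(yield* repo.count).toBe(0)
    })
  )
})
```

The trade-off discussed above still applies: a layer shared across a block is cheap, but anything stateful, like a database, then needs per-test cleanup, for example wrapping each test in a transaction that gets rolled back.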
Sometimes, though, the code that is inside the patterns is not really executable. I guess it has pros and cons; it's definitely an interesting idea. For example, maybe with an additional tag, like a "ts execute" marker, to flag which of the patterns you actually want executed, or a references tag for which files you want checked, because sometimes a pattern mentions files as examples: "if you write this feature, use a file called ABC", and that's not a concrete reference, so you don't want your check to fail because it read that. That's more in the direction of evaluations, evals.
>> Yes, that's at scale. That's very good. I found that doing it on a per-project basis ends up being too much.
>> What we are thinking of doing in the effect repo, for example, is to have evals running once per day and generating reports, so anytime we make library changes, add more docs, or add more examples, we see exactly whether the outputs get better or worse. With evals it's sometimes very hard: even Anthropic a while ago wrote a blog post whose summary is, essentially, that we don't really know when code is good or bad. Is more terse code better? It depends. Is more verbose code better? It depends. There are some properties where you can say this is definitely better than that, like code that typechecks is better than code that doesn't, probably true. But when it comes to style, when it comes to whether this file structure is better than another one when both convey the meaning, you kind of need a human at the end to say "yeah, I prefer this", and if you take a hundred humans, you're going to get an 80/20 split. So we have the same problem now with defining Effect patterns: we are running evals, and the evals are kind of our opinion of what's good; it's not really an absolute truth. Let's put it this way: we have human-written best-practice code, we have generated code, and then we have an LLM that compares them and says, are these two different or not, give us a score. That's pretty much how you run the eval. Not a very nice way to run it, but we're trying to figure this out, because we are thinking of fine-tuning a model on top of Effect, and for the reinforcement-learning part we're going to need good evals. So it's part of what we're researching right now. There's no right or wrong answer; if there were, all the models would perform the same, because everybody would have the same evals, everybody would have the same thing.

But now we have all the patterns for what we want, so I feel like we're at the point of saying "commit this". I'm going to create a repository and push it so that at least you have access to it. Gosh. New repository; is it public? Please choose an owner; sure. Add the remote origin and push: pushing the final repository. So, hopefully.

We haven't gotten to the point of doing clustering and workflows, so just a few words about why you would want those in your code. This is a very dumb todo API. One thing I wanted to add was authentication and registration. When you have a registration, the process is usually: write something in the database, then send an email, or send an email code and wait for confirmation. Anytime you do two unrelated operations, there is no transaction between them, no database transaction, and your server may fail at any random point within your code.
So it's very hard to guarantee that the email has actually been sent, which is why many times in a registration procedure you see the sentence "if the email did not arrive in 30 minutes, please retry". You retry for me; why should I retry if I haven't received the email? That's a symptom of a badly designed system that cannot guarantee that two operations both happened. To do that you have various options. One way is to implement queues and so on. The other way is to use something like workflows. You have solutions like Temporal or Inngest; there are many workflow solutions, and Effect has one, implemented on top of what is called Effect Cluster, where basically you run a cluster of Bun, Node, whatever instances, and the system itself guarantees that once a procedure starts it's going to finish: even if the server crashes, it moves to a different location. How would I go about it? The same way as I did today: ask the model to explore the repository, extract the patterns around how to use Effect Cluster and Effect workflows, and just go from there. It's very interesting. It's still in the unstable part of Effect, but it's going to be stable very soon.

And we think that if you integrate AI into your app this becomes even more important, because with AI every process becomes long-running: LLMs take minutes to answer, and there's a lot that can go wrong in a minute. If the average response time is 10 milliseconds, the server is pretty much never going to fail within those 10 milliseconds. If those 10 milliseconds become a minute, you can be pretty sure the server is going to fail somewhere inside that minute at some point. It used to be that the companies using workflows were larger-scale companies, because at scale every edge case happens twice per day; with longer response times, even if you have 10 users, you're pretty much going to have disruption if your average process takes a minute, and you're going to have failures all over the place. Which is why, for example, Temporal became much more interesting in the past 12 months: everybody is now implementing AI in their own products, so they have chatbots and all kinds of AI-driven processes. And with Effect you get workflows, you get clustering, you have AI integrations, you have Discord and Slack integrations, and so on; the system is really composable, and the models are pretty decent at it.

We have a working API. I've been speaking for about an hour and a half, and I started with zero Effect knowledge; it was an empty repository. This is why I wanted to call this workshop "just clone the [ __ ] repo". That's pretty much it. If you have any questions or anything else, I'm happy to discuss with you later. Let's get the next speaker set up. Thank you so much.