The Unbearable Lightness of Agent Optimization — Alberto Romero, Jointly
Channel: aiDotEngineer
Published at: 2025-11-24
YouTube video id: zfvEMNmVlNY
Source: https://www.youtube.com/watch?v=zfvEMNmVlNY
Right. Hello everyone. Today I will present meta adaptive context engineering, or meta AC for short, which is a new framework designed to optimize AI agents beyond single-dimension approaches. We will explore how orchestrating multiple adaptation strategies can overcome the limitations of existing context engineering methods. Now, a little introduction about myself. I'm Alberto Romero, co-founder and CEO at Jointly. For context, at Jointly we build domain-specialized agents for regulated industries, where policy adherence constraints are particularly strict. Most of our research work is in the area of self-optimizing agent architectures using systematic approaches. I have spent 20-plus years at the intersection of AI and data. Some of my recent experience includes being CTO and co-founder of Human AI, think ML-based risk prediction for mobility, which was acquired by Aon in 2023, and in my previous role I headed up Citibank's GenAI engineering team. Here's our agenda for today. We'll begin with the motivation and the problems that current systems face. Then we'll review the AC framework and its limitations. After surveying recent research insights, we'll introduce the meta AC approach. We'll discuss its architecture and strategy toolbox, show some results, and finish with future directions and challenges. Now, the agentic context engineering framework, or AC for short, for which you've got the paper link on the slide there, is a very popular framework; the paper came out a few months ago. It basically organizes adaptation into three roles. First, there's a generator that produces reasoning paths. Then there's a reflector that extracts lessons. And finally, there is a curator that synthesizes these lessons into incremental updates. AC uses incremental delta updates and a grow-and-refine mechanism to prevent context collapse and maintain relevance.
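To make the three-role loop concrete, here is a minimal Python sketch of a generator-reflector-curator cycle with incremental delta updates. All function names and logic are illustrative placeholders of my own, not the actual AC implementation; a real system would call an LLM for each role and use much richer context items.

```python
# Minimal sketch of the generator / reflector / curator loop described above.
# Everything here is a stand-in: real roles are LLM calls, and "lessons" are
# structured context entries rather than plain strings.

def generator(task: str, context: list[str]) -> str:
    """Produce a reasoning trace for the task (here, a dummy string)."""
    return f"trace for {task!r}"

def reflector(trace: str, succeeded: bool) -> list[str]:
    """Extract lessons from the trace plus execution feedback."""
    outcome = "succeeded" if succeeded else "failed"
    return [f"lesson: the approach in '{trace}' {outcome}"]

def curator(context: list[str], lessons: list[str]) -> list[str]:
    """Merge lessons as incremental deltas; dedupe to avoid context collapse."""
    return context + [les for les in lessons if les not in context]

context: list[str] = []
for task, ok in [("parse invoice", True), ("parse invoice", True), ("file report", False)]:
    trace = generator(task, context)
    context = curator(context, reflector(trace, ok))

print(len(context))  # the duplicate lesson from the repeated task is not re-added
```

The point of the sketch is the incremental delta update: the curator never rewrites the whole context, it only appends deduplicated lessons, which is what lets the context grow and refine without collapsing.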
Now, most importantly, it can improve without labeled data by learning directly from execution feedback. AC has been quite successful and has achieved substantial gains on some of the most popular agent benchmarks like AppWorld and FiNER, almost an 11% improvement compared to previous state-of-the-art approaches such as GEPA or Dynamic Cheatsheet (DC). It has also achieved an 8.6% gain on financial reasoning tasks. There are four fundamental limitations of AC that I'm going to discuss on the next slide, and those form the basis for meta AC. So, despite its strengths, AC has four critical failure modes. First, it is highly dependent on the reflector: when reflection fails, the context becomes noisy and even harmful. Secondly, there's feedback brittleness, which means that when ground-truth signals are weak or absent, AC may reinforce incorrect behaviors. Third, there's task complexity blindness, which leads it to treat simple and complex tasks the same, which can be a waste of resources and a missed opportunity for optimization. And finally, AC optimizes only the context dimension, so it ignores compute, memory, and parameter updates. Now, the 2024 and 2025 research landscape offers four key insights, in my view. First, verification mechanisms like self-evaluation, multi-model consensus, and execution checks are really important for the robustness of any solution. Secondly, adaptive compute allocation shows that small models can outperform much larger ones by selectively increasing inference steps. Third, structured memory architectures outperform linear context accumulation by organizing facts as graphs or multi-granular memories. And finally, test-time training bridges inference and learning and enables temporary parameter updates that yield large accuracy gains. These advances suggest that we need a hybrid, multi-dimensional system.
Now, meta AC addresses AC's limitations by adding a meta-controller that learns to orchestrate multiple adaptation strategies based on a task's complexity, uncertainty, verifiability, and resource constraints. Instead of applying the same procedure to every problem, meta AC profiles each task and allocates the right combination of strategies across the context, compute, verification, memory, and parameter dimensions. This adaptive, learned coordination is what enables it to outperform single-dimension methods. The meta AC framework consists of four layers. Getting into the architecture, the first layer is task profiling, which assesses complexity, uncertainty, verifiability, and resource budgets. Then there is a lightweight meta-controller that selects and allocates adaptation strategies accordingly. The next layer down is strategy execution, which carries out reflection, adaptive compute, hierarchical verification, structured memory retrieval, and selective test-time training. And finally, there's a feedback aggregation layer that collects the outcomes and updates the meta-controller's policy through meta-learning. This layered design allows the system to learn from its experience and continuously refine its decision making. In terms of task profiling, there are four key dimensions being assessed. The first is semantic complexity, which is basically an embedding-based similarity to known task distributions. The second is uncertainty quantification; think of it as a softmax-based scoring that reflects model confidence. The third is verifiability assessment: whether we can execute and validate the output. And the fourth is resource availability, where we take into consideration the context window, the compute budget, and even other constraints such as time.
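As a rough illustration of the profiling layer, here is a sketch that scores the four dimensions and packs them, together with hashed text features, into a fixed-size vector. The heuristics (word-count complexity, a constant uncertainty placeholder, a hash in place of a learned embedding) are my own stand-ins, not the talk's actual profiler.

```python
# Illustrative task-profiling layer: four profile scores in [0, 1] plus hashed
# text features, packed into one fixed-size vector for the meta-controller.
# All heuristics below are placeholder assumptions, not the real implementation.
import hashlib

EMBED_DIM = 32  # the talk mentions a 32-dimensional task embedding

def profile_task(task: str, verifiable: bool, budget: float) -> list[float]:
    """Return a 32-d vector: [complexity, uncertainty, verifiability, resources] + text features."""
    complexity = min(len(task.split()) / 50.0, 1.0)   # crude semantic-complexity proxy
    uncertainty = 0.5                                  # placeholder for softmax-based confidence
    verifiability = 1.0 if verifiable else 0.0         # can we execute/validate the output?
    resources = max(0.0, min(budget, 1.0))             # normalized compute/context budget
    # Hash the task text into the remaining dimensions (stand-in for an embedding model).
    digest = hashlib.sha256(task.encode()).digest()
    hashed = [b / 255.0 for b in digest[: EMBED_DIM - 4]]
    return [complexity, uncertainty, verifiability, resources] + hashed

vec = profile_task("summarize this quarterly filing and flag policy violations", True, 0.8)
print(len(vec))  # 32
```

In a real system the hash features would be replaced by an actual embedding model, but the shape of the interface, a small fixed-size vector per task, is what matters for the meta-controller downstream.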
The output of the task profiling layer is a 32-dimensional task embedding, which is what feeds as input into the meta-controller. In terms of the strategy toolbox, meta AC draws from six strategies. The first is minimal context, which uses concise prompts for simple tasks. Then we use AC reflection, which retains the generator-reflector-curator loop for incremental knowledge accumulation, as established by standard AC. We also use adaptive compute, which scales the number of reasoning steps or samples based on task difficulty; hierarchical verification, which combines self-evaluation, multi-model consensus, and execution checks; adaptive memory, which retrieves relevant information from structured, multi-granular memories; and finally selective test-time training, which applies temporary parameter updates such as LoRA adapters for high-stakes tasks. The meta-controller learns to combine these tools effectively over time. Now, the reward formula upon which strategy selection is learned accounts for the following components. The first is the correctness of an action or prediction, which is accuracy. Then we have the penalty associated with resources used or negative outcomes, so one minus cost. And then there is the trustworthiness of the model, its self-expressed certainty, so confidence calibration, basically, with the weighted importance of each term determined by the hyperparameters alpha, beta, and gamma. In terms of the meta-learning loop, we have four sources of feedback collection. First is task outcomes: the success, failure, or correctness of the task. Then we've got strategy performance: the individual contribution of each strategy to overall performance on the task. Then we also have efficiency metrics such as compute, latency, and memory. And finally we've got confidence calibration: whether the model's stated confidence matches how often its predictions are actually correct.
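The three-term reward just described can be written as a one-liner. The alpha/beta/gamma values below are arbitrary defaults for illustration; the talk only says they are tunable hyperparameters.

```python
# Sketch of the scalar reward described above:
#   reward = alpha * accuracy + beta * (1 - cost) + gamma * calibration
# The weight values are arbitrary illustrative defaults, not from the talk.

def reward(accuracy: float, cost: float, calibration: float,
           alpha: float = 0.6, beta: float = 0.2, gamma: float = 0.2) -> float:
    """Combine correctness, resource penalty, and confidence calibration."""
    return alpha * accuracy + beta * (1.0 - cost) + gamma * calibration

# A correct, cheap, well-calibrated outcome scores close to 1.
print(round(reward(accuracy=1.0, cost=0.1, calibration=0.9), 2))  # 0.96
```

Because cost enters as one minus cost, an expensive strategy that gets the right answer can still be out-scored by a cheaper one, which is exactly the pressure that pushes the meta-controller toward lighter strategies on easy tasks.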
Moving on to how we go about solving the limitations of AC. The first one was the weak reflector problem. AC's issue is that there is a 50 to 60% performance drop when reflector quality degrades. With meta AC we introduce three things. First, quality gates: a learned classifier that blocks harmful deltas. Secondly, multi-signal reflection, which is basically an ensemble of specialist models used when there is a high level of uncertainty. And third, adaptive strategy allocation: the meta-controller learns when reflection fails and then routes to verification or test-time compute instead. So we can expect to maintain 80%-plus performance even when the reflector degrades by around 30%. The second limitation we had was feedback quality brittleness. What we observe with AC is that there can be significant degradation without reliable ground-truth signals. With meta AC we introduce a hierarchical verification cascade, where we can expect a 50 to 60% reduction in errors from poor feedback, and that's through three tiers. The first tier is self-verification, which is just a fast filter: we accept if the confidence level is over a certain value. The second tier is multi-model consensus, where we leverage a diverse range of models, such as GPT-4, Claude, and DeepSeek, and do confidence-weighted voting. And tier three is execution-based verification, where we leverage code sandboxes, API validation, and schema compliance. The third limitation we had was task complexity mismatch, in a sense the fact that AC uses uniform processing even for simple tasks, which can be a waste of resources. So meta AC adapts strategy allocation dynamically rather than using the same heavy pipeline for everything.
The alphas are allocation weights for the six optimization strategies, and they represent how much computational budget is assigned to each strategy for a given task. Simple tasks require minimal processing and can save around 90% of compute compared to standard AC. Moderate tasks get a more balanced approach that includes AC plus verification. And complex tasks get heavy test-time compute, multiple attempts, and memory retrieval. To conclude with some results, and these are initial results: we have observed around an 8 to 11% improvement on agent benchmarks. We have also observed a six to eight point improvement on some domain-specific tasks, as well as a 30 to 40% reduction in compute costs through adaptive strategy allocation. Overall there is more robustness and more consistency, and we can generalize better; we can use the framework across a diverse range of domains. So the conclusion is that meta AC can orchestrate context, compute, verification, memory, and parameter adaptation, and produce a robust self-improvement framework for agents. Future work will implement and evaluate the full system across a more diverse range of domains, and we'll continue exploring meta-learning, which will also involve incorporating additional strategies. I also wanted to touch on additional applications of meta AC that I think are quite relevant. The first is multimodal AI systems: for example, deciding when to use vision versus language processing can again be a meta-adaptive strategy decision. Also, when you have compound AI systems that require different models for different stages, and the complexity is substantial, we can, in a meta-adaptive manner, select the most effective strategies to resolve a task end to end.
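The complexity-tiered allocation above can be sketched as a lookup from a complexity score to an alpha vector over the six strategies. The tier boundaries and the specific weights here are illustrative assumptions; in the framework they would be produced by the trained meta-controller, not hard-coded.

```python
# Sketch of complexity-conditioned strategy allocation. The alpha vector
# assigns a budget share to each of the six strategies; tiers and numbers
# are illustrative placeholders, not the trained meta-controller's policy.

STRATEGIES = ["minimal_context", "ac_reflection", "adaptive_compute",
              "verification", "adaptive_memory", "test_time_training"]

def allocate(complexity: float) -> dict[str, float]:
    """Map a complexity score in [0, 1] to per-strategy budget weights."""
    if complexity < 0.3:                       # simple: mostly minimal context
        alphas = [0.9, 0.05, 0.0, 0.05, 0.0, 0.0]
    elif complexity < 0.7:                     # moderate: AC reflection + verification
        alphas = [0.1, 0.4, 0.1, 0.3, 0.1, 0.0]
    else:                                      # complex: heavy compute, memory, TTT
        alphas = [0.0, 0.2, 0.3, 0.2, 0.2, 0.1]
    return dict(zip(STRATEGIES, alphas))

plan = allocate(0.85)
print(max(plan, key=plan.get))  # adaptive_compute dominates for a complex task
```

Note that each alpha vector sums to one, so the weights read directly as fractions of the task's budget; swapping the hard-coded tiers for a learned policy keeps the same interface.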
There is also human-AI collaboration, in other words determining when to have a human in the loop, and continual learning systems, where we are balancing exploration versus exploitation. So the core takeaway is that optimization requires a meta layer of intelligence, and that layer has to be trained; it requires a lot of trial and error before it can actually perform at the right level. In terms of future directions and challenges, several challenges still remain. The meta-controller's training may be unstable due to sparse rewards, and this can be mitigated through curriculum learning, robust advantage estimation, and entropy regularization. The computational overhead from profiling and from running multiple strategies needs to be reduced with efficient models; we can leverage things like lazy execution, batching, and caching. Also, the verification cascades can be brittle if all models make the same mistake, so we need diverse models with confidence weighting and human oversight, as well as active learning. Meta-learning loops require substantial data; synthetic task generation, off-policy learning, transfer from related domains, and sample-efficient algorithms can also help. Finally, addressing these challenges is going to be key to scaling meta AC and applying it across a wide range of domains. So that was all from me. Thank you very much for listening, and I appreciate you being here. Thank you.