You can't just one shot it — Mehedi Hassan, Granola

Channel: aiDotEngineer

Published at: 2026-05-10

YouTube video id: ON5LIT0M4do

Source: https://www.youtube.com/watch?v=ON5LIT0M4do

[music]
>> Cool. How's it going, guys? I'm Mehedi.
We're going to talk about some product engineering stuff that we've been doing at Granola. This is not going to go deep into our engineering internals. So, if you're hoping for me to go deep into LLMs and such, it's not going to happen. I'm warning you right now, so you know what's coming. Cool. So,
I'm a product engineer at Granola. I've been coding since jQuery was cool. I've seen React change front-end engineering, and now I'm experiencing LLMs change engineering and everything else, just like many of you. For those of you who don't know, Granola is an app for getting your work done. Essentially, we're a meeting notes app that sits on your dock, like it's doing right now. It has access to your system audio as well as your microphone audio, which means we have real-time transcription, and at the end of your meeting we can give you really awesome notes. So, I'm going to give you a quick demo. I was recording the previous talk right here, and you can see it picked up literally everything the presenter said. The cool thing about Granola is that you can also write your own notes on top of what the transcription is saying, so the final result is more aligned with what you'd actually write on a notepad. I'll go ahead and generate the notes here, and you'll see that it writes a really good summary. As you can see, I wrote down this 20% overlap thing, and it focused more on the output, right? So, this is Granola. We have best-in-class meeting notes no matter what role you're in, and it doesn't get in your way. That's been our product philosophy since day one.
We ship a lot of AI features in Granola, and our product is known for, again, not getting in your way. So, let's see what happens when you put a simple AI feature into prod. I'm going to give you an example with this chat feature that we have. It's a feature that already exists in Granola: you can ask questions about a meeting you just had, across a bunch of different meetings, or over shared context as well, and Granola will try to answer to the best of its ability. So, let's say I built a one-shot version of this chat system. It's very easy to do. And I put it into production in my fake Granola app.
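To make "very easy to do" concrete, here is a minimal sketch of a one-shot chat over a transcript using the Vercel AI SDK (which comes up again later in the talk). The model choice and the `askAboutMeeting` helper are illustrative assumptions, not Granola's actual implementation.

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical one-shot chat: stuff the whole transcript into the prompt
// and hope the model does the right thing. No tools, no tracing, no tuning.
export async function askAboutMeeting(transcript: string, question: string) {
  const { text } = await generateText({
    model: openai("gpt-4o"), // assumed model; any chat model works here
    system: "You answer questions about the user's meeting transcript.",
    prompt: `Transcript:\n${transcript}\n\nQuestion: ${question}`,
  });
  return text;
}
```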
And as soon as users get it, the feedback rolls in within a minute: it gave me a list of cities; web search is too slow; it's not writing follow-up emails the way I normally write my emails; I asked it to coach me about my meetings and it's telling me about my football coach. Obviously, these are very common problems that you're going to run into when you make a generic chatbot. So, how do we get around this, right? What we've seen is that molding the LLM to work for your specific use case can be super hard.
One of the examples is web search. Web search for most LLM providers looks like a line of code: you simply add the web search tool and you expect it to just work. That's what the labs want you to believe, but once you get into it, there are lots of other complications. For example, the token usage and token cost can bubble up quite a lot, especially for complex queries. It's going to blow up your context, and each chat could be costing you like 10 pence. Obviously, at scale, when you have millions of users, this is not really feasible.
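For reference, that "line of code" looks roughly like this with the AI SDK. The `webSearchPreview` helper is OpenAI's provider-defined tool in recent SDK versions; the exact helper name and model requirements vary by provider, so treat this as a sketch rather than the talk's actual setup.

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// The "just add web search" version: one provider-defined tool, and no
// control over which search backend runs, how many results come back,
// or how many context tokens they burn.
const { text } = await generateText({
  model: openai.responses("gpt-4o"), // OpenAI's web search runs on the Responses API
  prompt: "What did the previous speaker announce this morning?",
  tools: {
    web_search_preview: openai.tools.webSearchPreview(), // assumed helper name
  },
});
console.log(text);
```

Everything the talk complains about, the cost, the latency, and which search provider actually runs, is hidden behind that one tool entry.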
And, you know, the web search providers are also completely up to the labs. For example, during our development, we were using a model for a good amount of time, and then overnight they shipped an update and for some reason web search degraded. It was completely out of our control, and we had no idea what was going on; there was nothing to do apart from switching providers. But we want more control over that, because it affects our user experience. And there are literally billion-dollar companies who do web search. That kind of tells you it's much more than just adding a web search tool to your LLM pipeline.
The other thing that's super important for apps like Granola is the output. The summary that you saw was pretty good for what I would expect, but someone in sales might expect more of a deal focus. Someone in engineering might expect action items, blockers, or Linear tickets. HR might want something completely different. And the thing is, one prompt generally can't serve everyone. LLMs are stubborn, and we need to figure out how to get inside them and make them work the way we want them to work.
And, as you know, LLM behavior is largely seen as a black box, but we want to go very deep into the details and figure out exactly what's going on. So, what we did recently at Granola is we started building our own tracing tools. And obviously, thanks to LLMs, you can actually one-shot these things; this is where one-shotting is kind of nice. So, we built our tracing tools where we have complete visibility on the tool calls, straight from the beginning to the end: the individual tool calls, why the model is making those calls, the search tools, the reasoning tools, the cost. The most useful part is that we structured the data exactly how we want it, and the UI is built to serve our employees internally, not just engineers but also product, data, CX, everyone. So you don't have to go into CloudWatch and write very complex queries to figure out why something failed. That's been the key for us in figuring out this black box.
Previously, building this kind of tooling would have meant going to a SaaS provider; you simply wouldn't have had the time to build it yourself. But now you actually can spend the time building a tracing tool that serves exactly what you need. This is obviously a very basic example, and you could use OpenTelemetry or other providers instead. But we essentially just save things to a DB, wrap around the AI SDK, and then the front-end is kind of the most important part, because that's what people are going to use to figure out what breaks and what doesn't. Our founder literally goes into the details, following the agent loop front to back, to figure out exactly what went wrong. So, at the end of this, you can go from "this output feels off" to exactly what failed, and then when you iterate, you can improve on those things.
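As a rough sketch of the "save to a DB, wrap around the AI SDK" idea: the AI SDK's `onStepFinish` callback surfaces tool calls, tool results, and token usage per agent step, which you can persist for an internal UI. The `saveTrace` helper and its schema here are hypothetical stand-ins for whatever database you use.

```ts
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

// Hypothetical persistence helper: swap in your own DB client and schema.
async function saveTrace(record: {
  step: number;
  toolCalls: unknown;
  toolResults: unknown;
  usage: unknown;
}): Promise<void> {
  // e.g. INSERT INTO llm_traces (step, tool_calls, tool_results, usage) VALUES (...)
}

export function tracedChat(prompt: string) {
  let step = 0;
  return streamText({
    model: openai("gpt-4o"), // assumed model
    prompt,
    // Runs after every agent step: record what the model did, structured
    // exactly the way the internal tracing UI wants to display it.
    onStepFinish: async ({ toolCalls, toolResults, usage }) => {
      await saveTrace({ step: step++, toolCalls, toolResults, usage });
    },
  });
}
```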
But as I said earlier, this goes beyond just basic LLM stuff. LLM behavior is only part of the picture; how users interact with and experience your product is just as important. With LLMs, you can one-shot more things and have more variants, which we like, because we can experiment with different features; we can have one feature look very different for four different users. But the problem for us specifically at Granola was that we're a desktop app, which means you can only run one instance of the app at a time. There was a lot of friction when it came to testing new features and different variants, and actually testing them in parallel.
So, before, you'd have to run the Electron app locally, install the dependencies, and test things. If you wanted a coworker to test those changes, you'd have to get them to do all of that as well. We didn't have the same luxuries that web apps do.
So, essentially, what we did is we took our Electron app and turned its front-end into a web shell, which gets deployed online. Now, whenever we open a PR, our CI gives us a preview link and we can go test those things, which sped up our development time so much. And the cooler part is that, because LLMs can now self-verify their work, once we open a PR, Cursor goes and tests it and uploads a screenshot into our PR, which speeds up testing even more.
And again, you might think this is a lot of work, but it's actually quite simple. For those of you who are not familiar with Electron, there's a main process and a renderer process. The main process works with the system APIs, and the renderer process is basically your front-end. Essentially, we abstracted our IPC APIs, the ones that talk to the system APIs, to fall back to web standards when we're in a web environment. And similarly with the React APIs, like routers, sessions, and the query layer, we moved those to web standards. This just made the renderer agnostic of Electron, so we can simply run it as a web app.
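A minimal sketch of that fallback pattern, with made-up channel and helper names: the renderer talks to one bridge module, which uses Electron IPC when a preload script has exposed it and falls back to web standards otherwise.

```ts
// Hypothetical preload-exposed API; in Electron, a preload script would
// expose this via contextBridge.exposeInMainWorld("electronAPI", { ... }).
declare global {
  interface Window {
    electronAPI?: { invoke(channel: string, ...args: unknown[]): Promise<unknown> };
  }
}

const inElectron = typeof window !== "undefined" && !!window.electronAPI;

// One bridge module for the renderer: Electron IPC when available,
// web standards otherwise. "settings:get" / "settings:set" are made-up channels.
export const bridge = {
  async getSetting(key: string): Promise<string | null> {
    if (inElectron) {
      return (await window.electronAPI!.invoke("settings:get", key)) as string | null;
    }
    return window.localStorage.getItem(key); // web fallback
  },
  async setSetting(key: string, value: string): Promise<void> {
    if (inElectron) {
      await window.electronAPI!.invoke("settings:set", key, value);
      return;
    }
    window.localStorage.setItem(key, value); // web fallback
  },
};
```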
On top of the LLM improvements, this let us change and test one feature in multiple different variants, so whatever the end product is feels super good, because we know we've tried so many different variants, and we actually felt those products in practice rather than just seeing them in Figma.
So, essentially, this is a long talk to tell you that the answer isn't to one-shot better. It's about figuring out how to build a feedback loop that feels like playing a game of tennis with the LLM, so the end product feels more like magic than a black box, and instead of just hoping the feature you release works well with customers, you have conviction that what you're shipping is actually going to connect with users. Thank you. Any questions?
>> [applause]
>> What do you think about re-platforming from Electron to Tauri?
>> We've thought about moving to Tauri a couple of times. Um, I think the way Electron serves us right now has been super nice, and the APIs are changing quite a lot. We've tried Tauri before as well, and we didn't really see massive performance gains, which is what we care about the most. So, yeah, it's been discussed before; we've played around with it, but haven't shipped it. Cool. Thank you, guys.
>> [applause]
>> Yay!
>> [music]