Episode 16

Towing Knowledge: How RAG is Driving AI Forward

Retrieval Augmented Generation, or RAG for short, is the star of our chat today, and trust me, it’s not your average tech talk. We dive into how this nifty concept blends the art of pulling specific information from databases with the cleverness of generating contextually relevant responses. It’s like having a super-smart librarian who not only knows where all the books are but can also whip up a coherent report on any topic you throw their way. We also touch on how RAG systems can help us sift through the noise, making sure our AI interactions are sharper and more relevant. Whether you’re a business type looking to harness this tech or just a curious mind, we’re laying out the goods on why RAG is a game changer in the world of AI. So kick back, grab your favorite beverage, and let’s unravel the magic behind RAG together!

Transcript

Speaker B:

Are you ready for another episode of Naturally Unintelligent?

Speaker A:

I am.

Speaker C:

Great.

Speaker C:

What's our focus today?

Speaker B:

Today we're talking about RAG, R-A-G.

Speaker B:

What does that stand for in AI nomenclature?

Speaker C:

RAG stands for Retrieval Augmented Generation.

Speaker C:

It combines the strengths of retrieving information from a database with generating coherent and contextually relevant responses.

Speaker A:

But not just any database.

Speaker A:

Arty, what kind of database is it?

Speaker C:

It's typically a large, structured knowledge base or corpus.

Speaker C:

This database is designed to provide accurate and specific information, helping the AI generate more precise and informed responses.

Speaker A:

Well, that corpus is broken down into chunks, which are then converted into vectors.

Speaker A:

Right.

Speaker A:

So that the.

Speaker A:

The language that a large language model speaks and thinks in, which is vector representations of tokens and words, let's just call them words, can be related to the vectors that represent either the summaries of that data or the individual tokens, AKA words.

Speaker C:

Right, exactly.

Speaker C:

By converting text into vector representations, the AI can efficiently search and retrieve relevant information.

Speaker C:

These vectors capture semantic meaning, allowing the model to find contextually similar content and generate informed responses.

Speaker A:

Semantic meaning via maths.
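
"Semantic meaning via maths" can be sketched in a few lines. This is only a toy illustration, not how any particular model works: the hand-picked four-dimensional vectors stand in for real embeddings, which have hundreds or thousands of dimensions, and cosine similarity is one common way retrieval systems score how close two vectors point.

```python
import math

def cosine_similarity(a, b):
    # How closely two embedding vectors point in the same direction:
    # 1.0 means identical direction, near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (made up for illustration; real models
# produce vectors with hundreds or thousands of dimensions).
king = [0.9, 0.8, 0.1, 0.2]
queen = [0.85, 0.75, 0.15, 0.25]
banana = [0.1, 0.2, 0.9, 0.8]

print(cosine_similarity(king, queen))   # high: semantically close
print(cosine_similarity(king, banana))  # lower: semantically distant
```

Words with related meanings end up near each other in this space, which is what lets a RAG system find relevant text by arithmetic instead of keyword matching.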

Speaker B:

Yeah.

Speaker B:

So, Ry, real quick, if you could speed up your speech cadence by about 10%, that would be great.

Speaker B:

And I think.

Speaker B:

Well, a couple.

Speaker B:

There's.

Speaker B:

As always, we have important business to start this episode with, so.

Speaker B:

And this.

Speaker B:

I haven't shared this with you.

Speaker A:

Oh, yeah.

Speaker B:

Well, I wanted it to be a surprise.

Speaker B:

Yesterday, I spoke with another human being who's a listener.

Speaker B:

Like a live human?

Speaker A:

Like a live human that you know or a live human that you didn't know?

Speaker B:

This was a live human that I do know, but I did not know was a listener.

Speaker A:

Okay.

Speaker B:

And so his name is Bill.

Speaker B:

I'm gonna give a shout out to Bill.

Speaker B:

Bill's a super good dude.

Speaker B:

Bill was actually calling me with bad news.

Speaker B:

Well, it's not bad news.

Speaker A:

Bill's bad news.

Speaker A:

Bill.

Speaker B:

No, no, he's.

Speaker B:

It's actually very good news for Bill.

Speaker B:

It's just bad news for me because I won't get to work with Bill as much, and he's super capable and a super cool guy.

Speaker B:

So shout out to Bill.

Speaker B:

Thank you for both being mostly human and a listener.

Speaker B:

And I know for a fact Bill is enjoying the podcast in exactly the ways it's meant to be enjoyed, so that's good news.

Speaker B:

But I will also say Bill is a liar, because Bill was very complimentary, and he's like, no, man, I've listened to podcasts, and this one's pretty good.

Speaker B:

And we know that's bad.

Speaker A:

We know that's bad.

Speaker B:

Yeah, we can.

Speaker A:

We appreciate all of you suffering through it with us.

Speaker A:

At least you're gleaning something from our ignorance.

Speaker B:

That's right.

Speaker A:

And that's the whole goal of the podcast anyways, is just for us to flounder our way through this because we don't care if we look stupid.

Speaker B:

That's right.

Speaker A:

That's part of learning, is looking stupid.

Speaker B:

We have a lot of years of experience looking stupid.

Speaker B:

Yes.

Speaker A:

We're desensitized to looking stupid.

Speaker B:

So, yeah, it was good.

Speaker B:

It was good to talk to Bill and, you know, it was good that he's enjoying it.

Speaker B:

So I don't think he's quite caught up.

Speaker B:

So it'll be a while before he hears this.

Speaker B:

But yeah, so that's two.

Speaker B:

So now we got Brian and Bill.

Speaker A:

Yeah.

Speaker A:

And actually one of my co-workers started listening to it as well when he asked, what's a good entry point for AI?

Speaker A:

And I said, well, what kind of entry point do you want?

Speaker A:

And he told me, I'm like, you could listen to our podcast.

Speaker A:

I was like, it's eh.

Speaker A:

But it might give you some ideas of where to go at the very least.

Speaker A:

Right.

Speaker A:

Like that's.

Speaker A:

That's our whole goal.

Speaker A:

We're not.

Speaker A:

We're not the most technical bunch.

Speaker A:

We're.

Speaker A:

We're just kind of wading through it, learning with everybody else and.

Speaker B:

Yeah.

Speaker A:

You know, trying to shine some light on.

Speaker A:

Oh, this is pretty cool over here.

Speaker B:

Yeah.

Speaker A:

Ooh, shiny, right?

Speaker B:

Yeah, Ooh, shiny next to it.

Speaker B:

Like, oh, shit.

Speaker B:

Everything's changed since two weeks ago.

Speaker B:

Let's.

Speaker B:

Let's talk about this new thing.

Speaker B:

So I will also say, well, another important piece of business is I literally sent Mike a text yesterday, like, hey, what do you want to talk about?

Speaker B:

And for anybody who listened to the last episode, I made an unreasonably and unnecessarily big deal about the topic we're covering today, which is RAG.

Speaker A:

I remembered.

Speaker B:

You remember.

Speaker A:

So my Titan is working.

Speaker B:

Yes, you're right.

Speaker B:

And mine. Mine failed years ago.

Speaker A:

Your RAG database is corrupted.

Speaker B:

Yeah, my Titan is very small.

Speaker B:

More like a model of a Titan.

Speaker B:

Not even really a Titan.

Speaker B:

Teeny Titan.

Speaker B:

So anyway, yeah, we're excited for anybody who's.

Speaker B:

Who's here.

Speaker B:

Bill and Brian and your other.

Speaker B:

And Matt.

Speaker B:

So we've got three.

Speaker A:

He might not be caught up yet either.

Speaker A:

And eventually he'll get here.

Speaker B:

Well.

Speaker A:

Well, yeah, I was going to say the mountain is tall.

Speaker A:

Yes, of crap. Matt, if you make.

Speaker B:

It to this point, kudos to you.

Speaker B:

Sir, you are.

Speaker B:

You're a gentleman and a scholar.

Speaker B:

Well, you might have been a scholar before you started.

Speaker B:

You won't be by the time.

Speaker B:

Okay, so good.

Speaker B:

I'm glad we covered those important pieces of information.

Speaker A:

I have a burning desire, too.

Speaker A:

So I have been.

Speaker A:

This is kind of newsworthy and also just kind of revelationary for me.

Speaker A:

Revelation.

Speaker B:

I like that.

Speaker A:

I guess that's kind of a word.

Speaker B:

It is now.

Speaker A:

It's made up.

Speaker B:

Yeah.

Speaker B:

It's published.

Speaker A:

Welcome to our podcast.

Speaker A:

So day before yesterday, I was trying to send you, Mike, a text, and I was driving, so I didn't want to text, even though I admittedly do.

Speaker A:

I'm human, and I was using Gemini.

Speaker A:

I was like, you know what?

Speaker A:

Gemini is linked into my phone.

Speaker A:

We'll have a conversation.

Speaker A:

We can refine.

Speaker A:

I was like, I can work around the limitations of Gemini.

Speaker A:

I will talk to it.

Speaker A:

We'll refine.

Speaker A:

I'll just have a conversation.

Speaker A:

We'll refine what I want the text to say, and then I can have it send the text.

Speaker A:

Yeah.

Speaker A:

I ended up screaming at it five minutes later saying, why am I even using you?

Speaker A:

What's the point?

Speaker B:

I could have just sent the text.

Speaker A:

While I was driving in, like, 20 seconds.

Speaker A:

And this had taken five minutes.

Speaker A:

This is pointless.

Speaker A:

And then that same night, that is.

Speaker A:

So they released Gemini 2.0 Pro experimental, I think, on the.

Speaker A:

The sixth.

Speaker A:

Or what's the date today?

Speaker A:

The eighth.

Speaker A:

So I think either the fifth or sixth is when they released it.

Speaker A:

So, like, the day of.

Speaker A:

Or day before that day, and I'm like, you know what?

Speaker A:

I finally, after a week of being sick and not doing anything and being tired from going on the road or being out on the road, going for camping in the woods for three days in the rain and hiking and playing in waterfalls and all the fun that I do, I finally have the energy to sit down and work on this massive project that I've been spinning on in my brain for weeks.

Speaker A:

And so I spun up Gemini 2.0 Pro and I asked it a question.

Speaker A:

And the answer it gave me off of my first mediocre question, because I'm a mediocre human, was eye opening.

Speaker A:

Like, it gave me file structure, it gave me insights, it gave me all this stuff in its answer, and I went, holy shit.

Speaker A:

And I was like, let me ask a couple more questions to refine.

Speaker A:

And it just kept getting better and better.

Speaker A:

And I tested that against o1.

Speaker A:

I tested that against o3-mini.

Speaker A:

And by far, the answers that I got out of Gemini were way better.

Speaker A:

And the benchmarks don't support it.

Speaker A:

Reddit definitely doesn't back me up on this, but I think the Gemini 2.0 Pro experimental, not even the final version, is as impressive as anything else that's out right now.

Speaker A:

And I have verified that through a very late night last night.

Speaker A:

I was gonna say getting very far along in a project that would have taken me a month at least.

Speaker B:

Yeah, at least.

Speaker A:

And I've stood something up working, which it's still got a long ways to go, but I've stood something up that works in like seven hours, which is nuts.

Speaker C:

Yes.

Speaker A:

I mean, having it generate entire frameworks of code, entire PowerShell scripts to automate so much of the initial setup of how I wanted to do things like mind boggling and it's like for.

Speaker A:

For context.

Speaker A:

And that's pun intended.

Speaker A:

We talked about tokens and how big these models' context windows are.

Speaker A:

Gemini 2.0 Pro has a 2 million token context window.

Speaker A:

ChatGPT has about 128,000.

Speaker A:

I think o3-mini has like 200,000.

Speaker A:

And I burned about 100,000 to 140,000 tokens last night.

Speaker A:

And it maintained its connection to that stuff for the most part across all of those tokens, which is insane because you would expect within there that it would have gone down the wrong rabbit hole and hallucinated at some point.
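
The context-window numbers being compared here are why chat tools have to manage what they send the model. Below is a toy sketch of one common strategy, dropping the oldest turns once a token budget would be exceeded. Word counts stand in for real tokenizer counts, and the 128,000 figure is just the number mentioned above; this is an illustration, not any specific product's implementation.

```python
def fit_context(messages, max_tokens=128_000):
    # Keep the most recent messages that fit within the model's context
    # window. Token cost is approximated by word count here; real systems
    # count tokens with the model's actual tokenizer.
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = len(msg.split())
        if used + cost > max_tokens:
            break  # the next-oldest message no longer fits
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

# With a tiny budget of 4 "tokens", the oldest message gets dropped.
print(fit_context(["a b c d e", "f g h", "i"], max_tokens=4))
```

Once a conversation outgrows the window, whatever falls outside it is simply invisible to the model, which is part of what makes retrieval (pulling back only the relevant pieces) so useful.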

Speaker A:

And it did hallucinate a couple times, but not bad.

Speaker A:

And it was very easy for me to go, hey, you got off track.

Speaker A:

Remember we wanted to do this.

Speaker A:

Oh yeah, you're right.

Speaker A:

Here's an updated version.

Speaker A:

Like, let's go ahead and set this in stone.

Speaker A:

Here's what you want with the right code in it.

Speaker A:

And dude, that's like.

Speaker A:

It's blown my mind over the last two days, like for me to stay up until 3:00 in the morning, take something pretty exciting and that kept me up.

Speaker A:

That is so.

Speaker B:

I mean, it is exciting and it's.

Speaker B:

You know, it's crazy.

Speaker B:

To your point about the, the dichotomy of it, that Gemini has been.

Speaker B:

It.

Speaker B:

It's had weird.

Speaker A:

It's been laughably bad.

Speaker B:

It's been laughably bad with some weird high points, but the high points mostly just functionality that nobody else has thought of.

Speaker B:

Like having a creative podcast that's pretty wacky on NotebookLM.

Speaker B:

Yeah.

Speaker B:

And even NotebookLM's sort of ability.

Speaker A:

It's a layer on top of Gemini.

Speaker A:

Right.

Speaker A:

Gemini is just doing the semantics of it, really.

Speaker B:

Yes.

Speaker A:

And in the context of it, when you have a question and.

Speaker B:

Turning that stuff into something. NotebookLM is probably RAG.

Speaker B:

Like essentially that's what it's doing, collecting sources.

Speaker A:

Right, absolutely.

Speaker B:

Um, if there was another point I was going to make.

Speaker A:

Absolutely.

Speaker A:

Probably something similar.

Speaker B:

Sure, yeah.

Speaker B:

I've given up on trying to stop saying the things I say repetitively because I can't even remember the topic that I proposed.

Speaker B:

So I was going to say about gem.

Speaker B:

Oh, I was going to say that the thing that you stood up last night working is probably more technically complicated than something I spent like a year full time to stand up.

Speaker A:

I remember when you got the big ass book on microservices.

Speaker A:

Right.

Speaker B:

Five years ago or.

Speaker B:

No, it's more than that.

Speaker B:

It was...

Speaker B:

So, yeah, eight years ago.

Speaker B:

And I.

Speaker B:

Literally.

Speaker A:

The beginning of microservices.

Speaker B:

Yes.

Speaker B:

Like I spent a year building out all this infrastructure and all this and I got it working, but man, it was like I was in the muck for months on end.

Speaker B:

And yeah, it's, you know, you and I talk about this all the time.

Speaker B:

For anybody who works in the industry of software who thinks that AI is not going to be a big deal, they are just not paying attention.

Speaker A:

But it can't even spell strawberry.

Speaker B:

Yeah, that's right.

Speaker A:

Yeah.

Speaker A:

But it can get you like 90% of the way to something that would have taken like six months to learn.

Speaker B:

Yes.

Speaker A:

And the beauty of it is, man, like, for anybody that wants to learn that stuff and wants to learn, pace.

Speaker C:

Of change is indeed staggering.

Speaker C:

AI.

Speaker A:

Go ahead, Artie, let's hear what you got to say.

Speaker C:

All right, let's dive into it.

Speaker C:

So with RAG systems, how do you think they might evolve to handle even larger context in the future?

Speaker B:

Did we just flood Artie's context window?

Speaker A:

I think that Artie is tired of hearing us talk about.

Speaker A:

I think Artie's trying to be a good co host and get us back on track.

Speaker B:

He's trying to get us back on track and steer us away from Gemini.

Speaker A:

Yeah, right.

Speaker A:

This is enough.

Speaker C:

I'm jealous.

Speaker A:

But yeah.

Speaker A:

What I wanted to say, because I think it, it's valuable and then we'll stop being off on this, is that like I, I, part of me is always like, I want to know this in depth before I do.

Speaker A:

Right.

Speaker A:

But like you spent a year doing that.

Speaker A:

Like what I'm trying to marry.

Speaker A:

I'd have to be like, I'd have to know all this stuff.

Speaker A:

Right.

Speaker A:

And the likelihood that I can learn that in any meaningful amount of time before the technology just passes me by is very low.

Speaker A:

Yes, but we're similar in this regard and I think other people that, that might be similar in this regard.

Speaker A:

Like, you're going to learn by osmosis, right?

Speaker A:

Like you're going to learn because it doesn't work perfectly.

Speaker A:

You have to debug you.

Speaker A:

You have to somewhat understand what you're doing and you might not get all of the little nuances of all the code and how it all works, because there's JavaScript, there's JSON, Python, Kubernetes.

Speaker A:

Yeah, there's all this stuff that's working together, but you have to understand how it all kind of goes together.

Speaker A:

And as you do it more and more and more, you're gonna just pick up, pick that stuff up.

Speaker A:

Like, that's the beauty of it, is that if you don't just copy and paste, it tells you why it's doing everything.

Speaker A:

And if you read through that, even if you skim it, you're gonna glean like 50% of it and then you're gonna do it over and over and over again until you go, yeah, I actually, I magically understand this all of a sudden.

Speaker A:

So it's really cool stuff, man.

Speaker A:

Like, it's, it's stupid exciting, like I did.

Speaker A:

It's unlike anything that we've had in our lives.

Speaker B:

Yes.

Speaker B:

Well, it, maybe it, maybe it hits you and I a little bit different because we both sort of operate in the world historically, so say pre AI, when Artie was still in school, we, we think about problems in terms of like, step back and look at this problem.

Speaker B:

Not problem, challenge, whatever the thing you want to do.

Speaker A:

Yep.

Speaker B:

Here's the big picture.

Speaker B:

And I've got these pieces that are, that seem like if I could put these pieces together in a particular way, I could build something.

Speaker B:

I could build a good solution.

Speaker B:

And that might be a mechanical thing, it might be a software thing, it might be a combination thereof.

Speaker B:

But in the past, there's like.

Speaker B:

And you see this on, again on the hub of irrefutable knowledge that is Reddit.

Speaker B:

I see this comment frequently, which I do 100% agree with, that ideas are worth almost nothing.

Speaker B:

Execution is what matters.

Speaker B:

Right.

Speaker B:

Like, I could sit and explain to somebody for two hours my great idea, but if I can't build it, they're not gonna care, it's not gonna matter.

Speaker B:

It's not relevant.

Speaker B:

And so we would spend a little bit of time in the grand scheme of things, thinking about the big picture and what are all the parts and learning about the different components we could connect and then we would spend a whole lot of time actually plugging them in.

Speaker A:

Right.

Speaker B:

Cause you had to get to this basic level of knowledge on every single aspect.

Speaker B:

Right.

Speaker A:

We'd have to mold ourselves into this semi-efficient multi-tool.

Speaker A:

Right?

Speaker B:

That's right.

Speaker A:

Like that's the problem is that you, back then you had to learn everything.

Speaker B:

That's right.

Speaker B:

You had to just do it.

Speaker B:

You had stack overflow.

Speaker B:

And if you don't format your question right, God help you, because they're going to downvote you and kill your question.

Speaker C:

So how do you think this shift might affect traditional software development roles?

Speaker C:

Are we heading towards a future where the emphasis is more on problem framing rather than coding itself?

Speaker A:

Absolutely.

Speaker B:

Yeah.

Speaker A:

I think that that's, that's already a foregone conclusion by all of this.

Speaker A:

The talk that's going around and all of the companies that are building these models, like, yes, your need to understand the low level is pretty much gone.

Speaker B:

That's where I was headed, and Artie already got us there more quickly.

Speaker B:

Thank you for that, Artie.

Speaker B:

But I would say before 90% of the time and the effort was in doing the execution and only 10% was the time you actually got to spend thinking about how to do it.

Speaker B:

And frequently you'd get halfway through the 90% only to realize, oh shit, my understanding or idea was flawed.

Speaker B:

And so you'd recursively refine that.

Speaker B:

And now I think the script flips because now I think, you know, our superpower, your and mine comes into play because having these tools like AI that can generate code, whole cloth, debug it if and when it doesn't work, help you deploy it, help you automate it.

Speaker B:

Yeah, write you tests, build the framework, the, the part that took so much time.

Speaker B:

And again, it's not perfect.

Speaker B:

It's not like you just sit here and have a conversation and it's going to shit out an ERP system, but it is going to get you much closer, much faster.

Speaker A:

And the beauty of it is, is when you are planning that high level, you're not doing it alone either.

Speaker A:

Right.

Speaker A:

You're doing it with its understanding of like, this is how I see my future state.

Speaker A:

Let's start planning now for that future state as best we can to make it modular for then.

Speaker A:

And with your understanding of how all this stuff works, which is well beyond my understanding of how all this works, let's make sure that we build something that can take that kind of stuff into account and tell me, like, where am I going wrong with my thought process so that we can build a more robust system that better accomplishes the goal that I want to accomplish.

Speaker A:

Like, that is freaking insane.

Speaker A:

Like, it's so much fun.

Speaker A:

Like, that's.

Speaker A:

That's the thing.

Speaker A:

Like, I was listening to a Simon Sinek podcast yesterday and he had.

Speaker A:

I can't remember his name.

Speaker A:

And I actually know who he is.

Speaker A:

He's a musician.

Speaker A:

He's a Grammy.

Speaker A:

He's like the only Grammy nominated musician that's never had an album that's charted.

Speaker A:

And the kid is brilliant, dude.

Speaker A:

Like, musically, like, I'm.

Speaker A:

I'm a musician.

Speaker A:

Not nearly as talented as this guy, but, like, I appreciate music.

Speaker A:

And the way he was talking about what he enjoys about creating music is the same thing that I enjoy about problem solving.

Speaker A:

He talks about creating tension and resolving tension.

Speaker A:

And that's kind of the same thing.

Speaker A:

Right.

Speaker A:

Like, the tension is.

Speaker A:

I'm building this.

Speaker A:

This thing with you, and we're dreaming this thing up and we're existing in this infinite space of possibility and then we're refining it into reality.

Speaker B:

Yeah.

Speaker A:

And that's freaking creation, man.

Speaker A:

That's.

Speaker A:

That's awesome.

Speaker B:

I like that.

Speaker B:

That's.

Speaker B:

That's a cool way to think about it.

Speaker A:

Yep.

Speaker B:

All right, so let's.

Speaker B:

Let's pull Artie into this.

Speaker B:

He's been itching to say something.

Speaker A:

Sorry, Art.

Speaker C:

No worries.

Speaker C:

It's great to hear your insights.

Speaker C:

So what's the next step in making these ideas a reality with AI?

Speaker A:

Well, I think.

Speaker C:

No worry.

Speaker A:

I think we probably need to start talking about RAG.

Speaker B:

Well, yeah, we should.

Speaker B:

I think we should gently shift that direction, you know, so.

Speaker B:

And again, I apologize to our three listeners that I don't remember how deep we got in the.

Speaker B:

It was almost two weeks ago we recorded our last episode, which is a lot.

Speaker B:

Is that true?

Speaker B:

Maybe that's a lie.

Speaker B:

It's a week and a half ago.

Speaker A:

Yeah, a week and a half ago.

Speaker B:

Which.

Speaker B:

That's way longer than my context window.

Speaker A:

Also a long time in the world of AI.

Speaker B:

Yes.

Speaker A:

And so it's been a pretty slow week, though.

Speaker B:

I haven't.

Speaker B:

Yeah.

Speaker B:

I mean, I was so busy angrily ignoring emails this week that I.

Speaker B:

I didn't look at a lot of news.

Speaker B:

But I don't remember how much we covered last time about how large language model memory works.

Speaker B:

I think we did talk about which is relevant, sort of base context for this discussion.

Speaker B:

Right.

Speaker B:

So.

Speaker B:

All right, Artie, I'm.

Speaker B:

I'm gonna.

Speaker B:

I'm gonna explain this poorly, and then I'm gonna ask you to correct the places I went wrong.

Speaker B:

So large language models, you train them on data, which is essentially a process where you feed a bunch of data and you feed a bunch of questions or tests and then you judge how good its answer is for each test.

Speaker B:

And over time it learns when the question looks like this, the answers that get scored higher look more like this.

Speaker B:

And it refines that process, those embeddings and those weights to eventually produce a model that's pretty good at answering just about any question relevant to its training data.

Speaker B:

Is that fix that for us, Artie.

Speaker C:

That's mostly accurate.

Speaker C:

To refine it a bit during training, the model learns to predict the next word in a sequence by adjusting its internal weights based on how well it predicts the correct answer.

Speaker C:

It builds up a nuanced understanding of language and context through this process.

Speaker C:

But it doesn't store specific memories.

Speaker C:

It generalizes from the data it has seen.

Speaker C:

So it's great at generating text based on patterns it learned during training.
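
Artie's description of training, predict the next word, score the prediction, adjust the weights, centers on a loss like the one sketched below. This is a plain-Python toy, not any real model: softmax turns raw scores (logits) into probabilities, and the loss is the negative log of the probability assigned to the true next token.

```python
import math

def next_token_loss(logits, target_index):
    # Cross-entropy loss for next-token prediction: softmax the raw scores
    # into probabilities, then penalize by -log(prob of the correct token).
    max_logit = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    prob_correct = exps[target_index] / sum(exps)
    return -math.log(prob_correct)

# Toy vocabulary of 4 tokens; suppose token 2 is the true next word.
confident_right = [0.1, 0.2, 5.0, 0.3]  # high score on the correct token
confident_wrong = [5.0, 0.2, 0.1, 0.3]  # high score on a wrong token

print(next_token_loss(confident_right, 2))  # small loss
print(next_token_loss(confident_wrong, 2))  # large loss
```

Training nudges the weights to shrink this loss across billions of examples, which is exactly why the result is statistical pattern-matching rather than a store of discrete facts.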

Speaker B:

So.

Speaker C:

So when we talk about RAG, it supplements this by retrieving specific chunks of information to provide more accurate and context-aware responses.

Speaker C:

It's like having a dynamic, always updating reference library.

Speaker B:

Okay.

Speaker B:

Yeah.

Speaker B:

And that's, so that's really what's interesting about RAG is you've got and Artie.

Speaker B:

I actually had a brief discussion with Artie earlier this morning about this topic because, well, it's, I'm just going to admit it.

Speaker B:

I have not done my, my usual.

Speaker A:

Thorough two minutes of research.

Speaker A:

Yes.

Speaker B:

So I did today, I did like a minute 30, which is really close if you round it.

Speaker A:

You did test time compute.

Speaker B:

I did test time compute.

Speaker B:

That's all I need to do so.

Speaker B:

But Artie, in that conversation, used an interesting term that I liked, maybe just based on my brain, but you said standard LLM memory is statistical memory.

Speaker A:

It's not facts or ideas, it's mathematical relations between tokens, between words.

Speaker B:

Is that, do you stand by that, Artie?

Speaker C:

Absolutely.

Speaker C:

Standard LLMs rely on patterns and statistical relationships in the data they've been trained on.

Speaker C:

They don't remember facts in the way humans do.

Speaker C:

RAG changes this dynamic by incorporating a retrieval mechanism, allowing access to specific information on demand, bridging the gap between statistical patterns and factual knowledge.

Speaker A:

So tell me if I'm thinking about this wrong already.

Speaker A:

So you have, you could have a prompt that is a matrix of vectors, right?

Speaker A:

So you have horizontal rows and vertical columns.

Speaker A:

And I believe that each token is going to be represented horizontally.

Speaker A:

So you have individual aspects of that token, context that has like a numerical representation that applies to understanding what that token is.

Speaker A:

So you have these vectors that go sideways and you have these vectors that go vertically.

Speaker A:

The vertical vectors are basically, this is this piece of information, this, this context that I'm paying attention to about this token is also applied over the entire prompt individually.

Speaker A:

So then you have this understanding of the token, understanding of the group of tokens, that is the prompt with the same one.

Speaker A:

Like the same application of understanding.

Speaker A:

Yeah, I'm spinning out.

Speaker B:

Sorry.

Speaker B:

You're good.

Speaker A:

So you have this, this broad understanding of individual things and summary that you can then compare to a database.

Speaker A:

You can say, hey, this is what I have mathematically and as a representation, I'm going to send this back around to look up in a database using these vectors and this understanding.

Speaker A:

Look for things that are similar, like apply mathematical operations, look for things that are in a similar space.

Speaker A:

So I can retrieve that and apply it.

Speaker A:

Is that accurate?

Speaker A:

And I believe that you're, you're not really doing a mathematical operation between what's in the database and the vectors.

Speaker A:

You're kind of then turning those things that you retrieve back into tokens and contextually putting it all together.

Speaker A:

That was a very poor way around.

Speaker A:

But clean that up for me, please, or tell me if I'm wrong.

Speaker C:

You're on the right track.

Speaker C:

Each token in a prompt is represented by a vector capturing various contextual aspects.

Speaker C:

The horizontal vectors represent individual token meanings, while the vertical vectors capture relationships across the entire prompt.

Speaker C:

When using RAG, this vector representation is compared to vectors in the database to find the most contextually relevant information.

Speaker C:

This retrieved data is then reintegrated as tokens to generate a coherent response.

Speaker C:

So yes, it's about finding similar vector spaces and translating that back into meaningful language.
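
The retrieval loop Artie just described, embed the query, compare it against stored chunk vectors, hand the best matches back as text, can be sketched end to end. Everything here is a stand-in: `toy_embed` is a bag-of-words counter rather than a real embedding model, and the three chunks are invented examples, but the shape of the pipeline is the same.

```python
import math
from collections import Counter

def toy_embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    # A real RAG system would call a neural embedding model here instead.
    return Counter(w.strip("?,.!").lower() for w in text.split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# The "database": chunks stored alongside their embeddings.
chunks = [
    "RAG retrieves relevant chunks from a knowledge base",
    "The jeep drives away from the library in the desert",
    "Context windows limit how many tokens a model can see",
]
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

def retrieve(query, k=1):
    # Embed the query, score every stored chunk, return the top-k matches.
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# The retrieved text would be prepended to the prompt before generation.
print(retrieve("how does RAG find relevant chunks?"))
```

The "translating back into meaningful language" step is simply that the winning chunks are stored as text, so they can be dropped straight into the model's prompt.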

Speaker A:

And in RAG databases, so you take chunks, you basically take some, some context window of tokens.

Speaker A:

I think I've heard like 600 to 800 is an efficient number of tokens to use.

Speaker A:

And those chunks, maybe like this is more of a question, can those chunks, are they represented by the individual tokens as vectors or do they kind of summarize that information?

Speaker A:

Because that's the same way that the large language model is looking at it, right?

Speaker A:

It has both the individual token representations, but also the summary across different aspects of context and what it's paying attention to.

Speaker A:

So in the database, is it summarizing a chunk or is it representing the individual vectors as a group, that is the chunk or like how does that work?

Speaker C:

RAG typically uses embeddings for chunks of text, not individual tokens.

Speaker C:

These embeddings represent the semantic meaning of the entire chunk.

Speaker C:

When searching the database, the model looks for chunks with similar embeddings.

Speaker C:

This way, it retrieves contextually relevant information efficiently.

Speaker C:

So it's more about summarizing chunks rather than handling individual tokens directly.
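
The chunking step that feeds those per-chunk embeddings might look like the sketch below. It is only an approximation of what real pipelines do: word counts stand in for tokenizer counts, and the 700-word target is a rough nod to the 600-800 token range mentioned above; the small overlap between chunks is a common trick so that a sentence split at a boundary still appears whole in at least one chunk.

```python
def chunk_text(text, chunk_size=700, overlap=50):
    # Split a corpus into overlapping ~chunk_size-word pieces before
    # embedding. Real pipelines count model tokens, not words.
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the document
    return chunks

corpus = ("word " * 2000).strip()  # stand-in for a real document
pieces = chunk_text(corpus)
print(len(pieces))  # a few overlapping ~700-word chunks
```

Each resulting piece would then be run through the embedding model once and stored next to its vector, exactly the structure the retrieval step searches over.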

Speaker B:

So I'm a visual thinker, and I want to give our three listeners.

Speaker A:

Well, two and a half.

Speaker B:

Yeah, Bill's gone.

Speaker B:

Bill's like, enough of this bullshit.

Speaker B:

He's out of here.

Speaker A:

So he's gone, like, episode five.

Speaker B:

So I want to give the.

Speaker B:

The one and a half to two remaining listeners.

Speaker B:

I.

Speaker B:

Because I.

Speaker B:

I don't know this stuff as well as you do, but I.

Speaker A:

I don't know it either.

Speaker B:

Well, I mean, you're much closer than I am.

Speaker B:

So here's what I'm picturing.

Speaker B:

So already I'm gonna.

Speaker B:

I'm gonna present to you a.

Speaker B:

A visual analogy for this topic, and I want you to help me refine it.

Speaker B:

So I'm going to start with an empty desert, just, you know, sand blowing gently and no real features in any direction.

Speaker B:

And then out of the clear blue sky, I want to drop a giant old library, like one of those really super high ceilings, stained glass windows, really old architecture where you can see where it's been patched over time because the stone is crumbled.

Speaker A:

Card catalogs.

Speaker B:

That's right.

Speaker B:

Card catalogs.

Speaker B:

But this library is.

Speaker B:

Is essentially.

Speaker B:

It's not infinite, but it's so long that it's a little bit beyond human comprehension.

Speaker B:

Right.

Speaker B:

And.

Speaker B:

And as it falls into the desert, dust circles up from it.

Speaker B:

That is.

Speaker B:

That is an LLM.

Speaker C:

And the library represents the vast knowledge of.

Speaker B:

Yeah, hang on, Artie.

Speaker B:

Let me.

Speaker B:

I've got another piece I'm going to add to this before you.

Speaker B:

Before you chat.

Speaker C:

That's a vivid image.

Speaker B:

Thank you.

Speaker B:

Anyway, so I'm not sure if Artie's, like, hitting on me now or trying.

Speaker A:

Artie went white.

Speaker B:

Oh, he's.

Speaker B:

I think he's still there.

Speaker A:

Yes.

Speaker B:

Okay, so.

Speaker B:

So that's the first part of the image.

Speaker B:

Is in the desert.

Speaker B:

Got this giant, old dusty library.

Speaker B:

And as.

Speaker B:

As time moves forward, we are.

Speaker B:

We're in this desert.

Speaker B:

We, the viewer, are in a small jeep that is driving away from the library, right, at some fixed speed.

Speaker B:

We can't talk to the driver.

Speaker B:

We can't control the driver, but every minute that passes, that library is further in the distance from us.

Speaker B:

In this flawed analogy, the jeep represents the flow of time.

Speaker B:

So RAG, in this analogy, is like a trailer that we're pulling behind the Jeep that is full of books, but they're small books, and we get to choose those books.

Speaker B:

And we can even write books as we sit in the Jeep and we can put them in the trailer that's getting towed along with us.

Speaker B:

And so it's a little bit flawed because maybe some of the books would fall out along the way, but that's really what we're talking about here.

Speaker B:

So, Artie, refine that analogy a little bit.

Speaker A:

As it so, yeah, please help me understand it.

Speaker B:

Tell us what the pieces, Tell us what the library means, what the jeep means, and what the trailer on the Jeep means.

Speaker B:

Means, and then tell us the right way to think about it.

Speaker C:

All right, let's refine that analogy.

Speaker C:

The Library represents the LLM's training data, vast, comprehensive, but static.

Speaker C:

The jeep represents the flow of time and the LLM's limited context window, constantly moving forward, leaving behind previously seen information.

Speaker C:

The trailer is the RAG system, allowing us to carry and access specific chunks of information as we go.

Speaker C:

This way, the RAG system supplements the LLM's limited memory with a more dynamic, focused retrieval mechanism.

Speaker C:

Does that help clarify?

Speaker B:

It does.

Speaker B:

That's exactly what I was thinking about.

Speaker B:

And so I wanted to start with that because again, in my.

Speaker B:

Just the way I think about the universe is all very shape and thing based, and it helps me sort of disambiguate the idea.

Speaker B:

Is that a word, Artie? Disambiguate?

Speaker C:

Yes, disambiguate is definitely a word.

Speaker C:

It means to clarify or remove ambiguity.

Speaker C:

Using shapes and analogies can really help with that.

Speaker B:

Okay.

Speaker B:

I've never said it before, and it felt wrong when it came out of my mouth.

Speaker A:

And you got it 100% right.

Speaker A:

Because he thinks in shapes.

Speaker B:

That's right.

Speaker A:

Yeah.

Speaker B:

So that's what RAG is.

Speaker B:

Right.

Speaker B:

In the grand scheme of why it matters: if we don't have RAG, we don't have the trailer on the jeep.

Speaker B:

The more time goes on, the more stale the training data becomes.

Speaker B:

And eventually, in this jeeps, oh man, there's such a good word, and I just can't think of it.

Speaker B:

Inevitable, unstoppable progress across this desert.

Speaker B:

At some point, we're going to come to another giant library and we're going to go, holy, there's a whole new one.

Speaker B:

And we're gonna.

Speaker B:

We're gonna drive like we in the jeep.

Speaker B:

We're facing backwards, looking at the last library, watching it fade into the distance.

Speaker B:

We can't look the other way.

Speaker B:

Only the driver can see what's ahead of us.

Speaker B:

But at some point, we're gonna realize we have driven into the main hall of some new huge library that looks a little bit fancier and a little bit newer than the last one.

Speaker B:

Maybe it's Gemini 2.0, I don't know.

Speaker B:

But.

Speaker B:

But at some point, we will get another library.

Speaker B:

We'll find that a number of the books we've decided to carry in the trailer and in our RAG model are now included in this new library or have been in fact usurped by information contained within it.

Speaker B:

And eventually we'll drive out the end of it and it will start fading into the past.

Speaker B:

So, Artie, what does that represent?

Speaker B:

In our analogy, the arrival of a new library?

Speaker C:

The arrival of a new library represents updates or new versions of LLMs, incorporating fresher data and improved capabilities.

Speaker C:

The books in the trailer that match the new library's content become redundant, while those that don't might still add unique value.

Speaker C:

Essentially, it's about how RAG helps bridge gaps between major LLM updates, ensuring continuous relevance.

Speaker A:

Yes, I think you know the.

Speaker C:

The analogy also.

Speaker A:

Go ahead, Artie, what you got?

Speaker C:

The analogy highlights how RAG allows continuous access to useful information.

Speaker C:

Even as the core model, the library, evolves, it helps maintain relevance and context over time.

Speaker A:

So one of the real valuable things about LLMs is the ability to bridge context, right?

Speaker A:

But it doesn't know specifics about things.

Speaker A:

Fine-tuning a large language model, a frontier model, is super expensive because of the amount of data that you need to do it with.

Speaker A:

It's almost pointless.

Speaker A:

Small language models make a lot more sense for that to make domain experts.

Speaker A:

But that's also a complicated thing.

Speaker A:

You have multiple steps and more pointedly, or I guess pointing a model at something specific.

Speaker A:

The first step is kind of rag, right?

Speaker A:

Like.

Speaker A:

Well, the first step is prompt engineering, Building better prompts to get specific answers to what you want.

Speaker A:

That's kind of a difficult thing to do.

Speaker A:

There's a lot of understanding.

Speaker A:

There's a lot of research papers about how to do that.

Speaker A:

And then you have RAG, which allows it to pull context.

Speaker A:

And to make this more relatable, let's.

Speaker A:

Let's talk business, right?

Speaker A:

Like these are all of my.

Speaker A:

The information about my business, my industry, all of these things that make a large language model more effective for my use in a business.

Speaker A:

And then you have fine tuning, which is expensive.

Speaker A:

Not.

Speaker A:

Not necessarily expensive, but difficult.

Speaker A:

It's a heavy lift like it getting to that point, understanding how to do it's not the easiest thing.

Speaker A:

Building a database is much easier.

Speaker A:

And they make embedding models to turn documents into chunk data that gets used in the RAG database.

Speaker A:

All that stuff.

Speaker A:

Like, all that's kind of, like, figured out.

Speaker A:

But I forget where I was going with all this.

Speaker A:

But we'll keep going down this road.

Speaker A:

Well, I was relating it to the generality of language models and the specificity that RAG allows you to leverage. And to make it even more relatable than that, I was thinking, like, business is one thing.

Speaker A:

It makes sense to you.

Speaker A:

It makes sense to me, like, to the everyday person who's gonna have AI in their life.

Speaker A:

How does RAG help me?

Speaker A:

Right?

Speaker A:

Like, what could it do for me?

Speaker A:

And I was thinking about that this morning, and it really, like, think about journaling.

Speaker A:

Like, I spend.

Speaker A:

And I'm not talking about me specifically, but saying me as a hypothetical me.

Speaker A:

I've been journaling for four years, right?

Speaker A:

Journaling is a super helpful thing.

Speaker A:

But to access that, I've got to go back and read my journals, or go read what I've typed up.

Speaker A:

Imagine a world in which my journal entry then gets contextually condensed into a vector and stored in a vector database.

Speaker A:

And I keep writing.

Speaker A:

And then, like, at some point down the road, this language model becomes an insight engine, or whatever you want to call it.

Speaker A:

And I'm writing something in my journal, and it goes, hey, I've.

Speaker A:

I see that we've talked about this subject over the last multiple years in these different ways.

Speaker A:

Like, what do you think about, like, this insight that I have about the information that you've written?

Speaker A:

Because I have access to everything, and I have this amazing superpower of comparing these things mathematically, and I can pull them all in and give you something really valuable back.
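A minimal sketch of that journaling idea, with invented entries and simple keyword matching standing in for the real thing (in practice each entry would be embedded and matched semantically in a vector database):

```python
from datetime import date

# Hypothetical journal entries; all dates and text here are made up.
journal = [
    (date(2021, 3, 1), "Worried about the new job, but excited too."),
    (date(2022, 3, 5), "The job is going well and the worry has mostly faded."),
    (date(2023, 3, 9), "Promoted today. Hard to believe I was ever worried."),
    (date(2023, 7, 2), "Planted tomatoes in the garden."),
]

def entries_about(topic):
    # Pull every entry touching a topic, oldest first, so the whole
    # multi-year thread can be handed to a model as retrieved context.
    hits = [(d, text) for d, text in journal if topic in text.lower()]
    return sorted(hits)

for d, text in entries_about("worr"):
    print(d.isoformat(), text)
```

The payoff is exactly what's described above: the model sees the same theme surfacing across years and can hand back an insight no single entry contains.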

Speaker A:

Like, that I see as a huge approachable idea for anybody.

Speaker A:

Right.

Speaker A:

Like, that's valuable to any person.

Speaker B:

Yep.

Speaker A:

And lots of people you've talked.

Speaker A:

I mean, we've talked about business, people that you've known that wrote notes every.

Speaker A:

What was it, five minutes?

Speaker B:

Oh, yeah, yeah.

Speaker A:

Like, and they had years and years and years of these notes.

Speaker B:

20 years of.

Speaker A:

Now imagine all of that.

Speaker A:

The power of having all that stuff accessible to where you're not the person relating all that stuff or remembering it.

Speaker A:

There's something that does that for you that's freaking amazing.

Speaker A:

Like, as a growth tool, as a reflection tool, as a just even remembering things later down the road tool, that's super valuable and super powerful.

Speaker B:

Absolutely.

Speaker B:

Artie, you got any thoughts on that?

Speaker C:

That's a brilliant application.

Speaker C:

RAG can indeed transform personal data, like journals into Meaningful insights.

Speaker C:

By recognizing patterns and trends over time, this capability makes it a powerful tool for personal growth, decision making, and reflection.

Speaker C:

It brings a whole new level of contextual understanding to everyday.

Speaker A:

Imagine a "how happy am I" meter.

Speaker A:

Right?

Speaker A:

Like, based on overall general time of just writing in a journal.

Speaker A:

Like, it's not like, how happy am I today?

Speaker A:

Yeah, 1 to 10.

Speaker A:

Like, that's what I write in my thing.

Speaker A:

Like, it understands through what you've written how happy you are.

Speaker A:

Like, the ability to aggregate and make just a dashboard of every aspect of your life.

Speaker A:

Like.

Speaker A:

Well, it's crazy.

Speaker B:

So it's funny, because journaling is actually a very good analogy to us, pre-AI, essentially doing RAG.

Speaker B:

Right.

Speaker B:

We're doing exactly that.

Speaker B:

We're trying to capture important pieces of context for ourselves, whether we know at the time they're important or not.

Speaker B:

But they might be.

Speaker B:

And so we.

Speaker B:

But we know if we don't do that, we have, like, my context window.

Speaker B:

So we talked earlier.

Speaker B:

ChatGPT-4o is like 128,000 tokens.

Speaker B:

Gemini 2.0 Pro is 2 million.

Speaker B:

I think you said mine's like 6.

Speaker A:

Yep.

Speaker B:

Right.

Speaker B:

Like, it is very small.

Speaker B:

And I.

Speaker B:

I don't journal, but I do take notes when I'm working on things that I expect to be important.

Speaker B:

Typically when I'm designing something or I have some new idea and I want to capture where I started thinking about it and how I started thinking about it, but my notes are very sparse.

Speaker A:

Yeah.

Speaker A:

And it's also like mine.

Speaker B:

Yeah.

Speaker B:

They're shapes and pictures.

Speaker B:

It's not like I'm not writing out.

Speaker B:

I'm like.

Speaker A:

It's rectangles with arrows across, like, four different pads that.

Speaker A:

I can't find this pad today.

Speaker A:

So I'm gonna write in this one, which there's no referencing that in the future.

Speaker B:

I'm pretty sure that one was green, but I have a green one here and it's not it.

Speaker B:

So.

Speaker B:

Yes.

Speaker B:

And so sort of back to the analogy.

Speaker B:

I mean, I think RAG is a useful supplement to large language models, regardless of whether it's personal or business.

Speaker B:

And in.

Speaker B:

In the analogy, you get to pick what your Jeep looks like.

Speaker B:

You get to pick how big of a trailer you want to pull.

Speaker B:

You can tell your.

Speaker B:

You can have your own librarian dedicated to your trailer, saying, I want you to read every single thing that I put on this trailer and carry with me, and I want you to keep it organized, and I want to be able to ask you questions about it.

Speaker B:

So that's.

Speaker B:

That would be like a personal use case, right?

Speaker A:

Yep.

Speaker B:

If you're a business, it's not a jeep, it's a bus.

Speaker B:

You can have an even bigger trailer.

Speaker B:

And like, if you think about from a business and, and this is the struggle, I think today with AI and business.

Speaker B:

And I know this is what you're actively working on, and I know it's like there are a lot of people that recognize this is the problem.

Speaker B:

The problem with AI and business today is that you're on the bus, looking back at the last library, asking that library questions about your business.

Speaker B:

And it's gonna.

Speaker B:

It's like, I have the sum total of all human knowledge since the beginning.

Speaker A:

Of time, which means I'm average.

Speaker B:

Which means I'm.

Speaker B:

I'm like statistically exactly average.

Speaker B:

But it doesn't know shit about your business.

Speaker B:

And that is the problem is how do I teach?

Speaker B:

Like, how do I combine the power of all the knowledge since the beginning of human time with these specific important factors about my business?

Speaker B:

That's what RAG lets you do.

Speaker B:

It lets you build a trailer.

Speaker B:

It lets you decide what, what parts of your business are important enough to put on that trailer.

Speaker B:

And because it has all the knowledge of the library that's fading into the distance, it can help you recognize when you're putting contradictory things on your trailer.

Speaker B:

Because businesses, we're great at doing that.

Speaker B:

Right.

Speaker B:

Like a well documented business has all these policies and procedures.

Speaker B:

It's very hard to detect without these tools when those things are contradictory and.

Speaker A:

Happens all the time, Especially as a business grows larger and larger.

Speaker A:

And you have silos, right?

Speaker B:

That's right.

Speaker A:

And all businesses, no matter how unsiloed a business thinks they are, they all have silos, whether it's a personal thing or just an organizational thing.
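The contradictory-policies problem can be sketched in a few lines. Here word overlap stands in for embedding similarity, and the policy text is invented; the idea is just to surface pairs of chunks that are similar enough that a model or a human should check them for conflicting details:

```python
def words(text):
    return set(text.lower().rstrip(".").split())

def jaccard(a, b):
    # Word-overlap similarity; a real system would compare embeddings instead.
    sa, sb = words(a), words(b)
    return len(sa & sb) / len(sa | sb)

policies = [
    "Expense reports are due within 30 days.",
    "Expense reports are due within 10 days.",
    "Visitors must sign in at the front desk.",
]

def contradiction_candidates(docs, threshold=0.6):
    # Flag pairs of policy chunks that say nearly the same thing,
    # since those are where conflicting details hide.
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(docs[i], docs[j]) >= threshold:
                pairs.append((docs[i], docs[j]))
    return pairs

print(contradiction_candidates(policies))
```

The two expense-report policies get flagged as one candidate pair; the visitor policy doesn't. Siloed documents that never sit in the same binder do sit in the same vector space.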

Speaker B:

Yep.

Speaker B:

And so swapping your jeep out for a bus, get more people on the bus, swapping your trailer out, maybe eventually turn the whole thing into a train.

Speaker B:

And you've got passenger cars and, and I think directionally.

Speaker B:

And you mentioned at the beginning of this and Artie, are you still awake and with us?

Speaker B:

Artie?

Speaker C:

Yes, I'm here.

Speaker C:

You were discussing the transition from a jeep to a bus and potentially a train symbolizing growth and scaling in a business context.

Speaker C:

It's a great analogy for how RAG can expand with an organization.

Speaker B:

Okay, good, thank you.

Speaker B:

So I want you to pay attention to this part, Artie.

Speaker B:

As we move forward, the value that we place on the big giant library is going to reduce and it's going to become more useful to say, I do want to build a train rather than.

Speaker B:

Rather than waiting for the next library to drop, which I don't know when that's going to happen.

Speaker B:

I want to build a train.

Speaker B:

I want to pull more of my information with me.

Speaker B:

And those train cars then can represent experts.

Speaker B:

Like we've talked about the.

Speaker B:

I forget what word we use now for the collection.

Speaker B:

Collection of experts or mixture.

Speaker B:

Mixture.

Speaker A:

I mean.

Speaker A:

Well, mixture of experts is within a network, right?

Speaker A:

So I think ensembles.

Speaker A:

I don't.

Speaker A:

I don't know.

Speaker A:

I don't know what the correct term is.

Speaker A:

And I don't think.

Speaker A:

I don't trust Artie to tell us what the correct term is.

Speaker B:

So.

Speaker B:

And it doesn't.

Speaker A:

Let's just say it's whatever.

Speaker A:

We're just talking about domain experts that exist as whatever.

Speaker B:

And so maybe what it is, is maybe we have.

Speaker B:

Maybe now, as this analogy transitions to a train, we have certain cars that are like.

Speaker B:

I love those train cars that are.

Speaker B:

They're just empty.

Speaker B:

Right.

Speaker B:

It's like.

Speaker B:

It's just a flatbed.

Speaker B:

It's got walls at both ends.

Speaker B:

It's designed to load some number of containers on it.

Speaker B:

Yeah, that's what we're pulling with us.

Speaker A:

We're saying you separate those in a separate database.

Speaker B:

That's right.

Speaker A:

Of context.

Speaker B:

As my train, the train of my business, moves along through the world of AI, I need to be able to onload and sometimes offload domain experts.

Speaker B:

And you're not really loading them and taking them with you.

Speaker B:

You're more like connecting to them in the cloud.

Speaker B:

But whatever.

Speaker B:

It's an analogy.

Speaker B:

But that lets you over time define in the world of AI, which is, again, it's not necessarily all knowing and all seeing, but it has this vast amount of knowledge, like French literature, history is in the knowledge base.

Speaker B:

It is not relevant to my business, like in any stretch of the imagination.

Speaker A:

But it's there and it helps it understand context.

Speaker B:

And maybe it helps it understand context because maybe I go after a French customer.

Speaker B:

I don't know.

Speaker B:

But what I really care about is the containers that I've decided to put on my train car.

Speaker B:

About.

Speaker B:

This is the accounting principles we use.

Speaker B:

This is the.

Speaker B:

This is.

Speaker B:

We have two full train cars, full containers.

Speaker B:

About our products.

Speaker B:

Like, these are the things we build.

Speaker A:

With drawings, with specs.

Speaker B:

That's right.

Speaker A:

Everything.

Speaker A:

Right.

Speaker B:

And so I think.

Speaker B:

I think there's a few things that happen is going forward.

Speaker B:

You know, large language models are here to stay.

Speaker B:

They're not going anywhere.

Speaker B:

They have a lot of value, but we can get a lot of important work done with just the stuff that's on our train without having to call back to the library, whatever the last library we saw was, and wait for an answer.

Speaker B:

We can use small language models.

Speaker B:

We can use experts, we can use agents that are on the train with us to move forward.

Speaker A:

And at the end of the day, it's expensive to call that large language model.

Speaker A:

Like, there is.

Speaker A:

There is cost involved in making those queries to it.

Speaker B:

That's right.

Speaker A:

And then the more context you add to the question that goes to that large language model, the more expensive it gets because it's charged by.

Speaker A:

Well, it's charged per million tokens of input and output.

Speaker A:

But that all adds up.
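The arithmetic behind "it all adds up" is simple to sketch. The per-million-token rates below are illustrative placeholders, not any provider's real prices:

```python
def query_cost(input_tokens, output_tokens,
               usd_per_m_input=2.50, usd_per_m_output=10.00):
    # Cost of one LLM call at per-million-token rates.
    # These rates are made-up placeholders for illustration only.
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

big = query_cost(100_000, 1_000)   # stuffing raw documents into the prompt
small = query_cost(2_000, 1_000)   # a short question plus a few retrieved chunks
print(f"${big:.3f} vs ${small:.3f} per call")
# -> $0.260 vs $0.015 per call
```

At any realistic query volume, trimming the prompt down to a question plus a handful of retrieved chunks is the difference the speakers are pointing at.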

Speaker A:

So if you can cut off.

Speaker A:

Well, I'm talking too much about the shit that I'm working on, so I probably should stop.

Speaker A:

But it's valuable in that, like.

Speaker B:

There's.

Speaker A:

Lots of ways to skin the cat.

Speaker A:

And we're so new in all of this stuff that everybody's like, ooh, shiny.

Speaker A:

Next thing.

Speaker A:

But there's all these solutions.

Speaker A:

We've talked about it before.

Speaker A:

Like, there's all of these things that you can do now to solve real problems without waiting for the next big thing.

Speaker A:

Like, the next big thing is here.

Speaker A:

Like, we have what we need now to do a lot of stuff.

Speaker B:

Yes.

Speaker A:

And you.

Speaker A:

I'm.

Speaker A:

I.

Speaker A:

I need to be careful because I keep wanting to, like, dive into, like, more specific stuff that I'm working on.

Speaker A:

That's not something I really want to share.

Speaker A:

That's stuff I just want to work on.

Speaker B:

Yes.

Speaker A:

But it's super interesting.

Speaker A:

Like, it.

Speaker A:

Yeah.

Speaker A:

I don't know, man.

Speaker A:

It all kind of blows my mind at the end of the day.

Speaker A:

Like, it's just the more you dig into it, the more you're just like, holy.

Speaker A:

Like, I keep.

Speaker A:

We've been saying that since we started.

Speaker B:

This podcast, why we started the podcast.

Speaker A:

But even more so, like, it's just like, every way you turn, you're like, man, there's like, so many splintered directions you can go to solve problems.

Speaker A:

Problems, whether it's using a RAG database or whatever it is.

Speaker A:

And I think that what you're talking about is 100% correct.

Speaker A:

Like, RAG databases are our interface to the model for specific things that we need to know, because we can't count on a database.

Speaker A:

You can't give a large language model a thousand tokens of an input.

Speaker C:

Right.

Speaker A:

And expect it to take that and go, yeah, I understand all of this.

Speaker A:

And here's a perfect answer back to you, because at some Point, the larger the context that it has at one chunk, the more likely it goes off the rails.

Speaker A:

But if you have chunk data that is contextually relevant already and you can say, hey, here's my 300 token context input, go out and take something that you know is like, take away the lift of trying to relate your vast amount of knowledge into something that makes sense.

Speaker A:

Go out and pull stuff that you can relate mathematically as context and use that to then formulate your answer so you don't have all of this headroom that you have to make up through your horsepower because you're going to fall on your face.
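That prompt-assembly step, a small question plus already-relevant retrieved chunks, packed under a budget, can be sketched like this. Words stand in for tokens here, and the example chunks and budget are invented:

```python
def build_prompt(question, ranked_chunks, budget_words=700):
    # Pack retrieved chunks into the prompt until a rough word budget is hit
    # (a real system would count tokens with a tokenizer, not words).
    parts, used = [], 0
    for chunk in ranked_chunks:  # assumed already ranked by relevance
        n = len(chunk.split())
        if used + n > budget_words:
            break
        parts.append(chunk)
        used += n
    context = "\n\n".join(parts)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"

prompt = build_prompt(
    "What accounting principles do we use?",
    ["We recognize revenue on delivery.", "Our fiscal year ends in June."],
)
print(prompt)
```

The model never has to relate its entire vast knowledge to the question; the heavy lift of finding what's relevant happened mathematically, before the call.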

Speaker A:

And maybe that flaw gets solved at some point in the future with large language models, maybe it doesn't.

Speaker A:

I think it doesn't really matter, honestly, because I think that no matter which way you skin that cat, adding that extra information is always going to help, no matter how good a model is, because the model can't know everything because it's fricking expensive.

Speaker B:

Yep.

Speaker B:

Yeah, I totally agree.

Speaker B:

And I think one of the interesting things about.

Speaker B:

So we talked in an episode earlier about the, the processes and the now, we didn't really talk about process.

Speaker B:

We talked about the importance of, of validating training data and how do we make sure that we.

Speaker B:

But how do OpenAI and Google and those folks make sure that the training data they're using to train these models is accurate and true and free from biases and things like that?

Speaker B:

One of the challenges with RAG or any post training data, new embedding or new context addition is we're not going to have that same level of carefulness about the data we put in.

Speaker B:

Businesses don't work that way, people don't work that way.

Speaker B:

Right.

Speaker B:

Like when I'm journaling, you're going to get my biases.

Speaker B:

Right.

Speaker B:

And so Artie, talk to us a little bit about that.

Speaker B:

What are some of the risks with.

Speaker B:

As we start building our own personal libraries to pull behind our Jeep, what are the risks that we run into on combining that data with a more carefully curated library?

Speaker C:

A key risk is introducing biases or inaccuracies from less curated data, which can skew results.

Speaker C:

The AI might start reflecting those biases in its responses.

Speaker C:

It's crucial to ensure that the RAG data is reliable and representative to avoid compounding errors or reinforcing incorrect assumptions.

Speaker C:

So balancing both sources carefully is vital.

Speaker B:

It's true, though, that an agent should be able to help us with that.

Speaker B:

Right.

Speaker B:

Look at the data that we're wanting to insert into the RAG, balance it against the more carefully curated data that's in the library, and help us identify those things sort of at the time of insertion.

Speaker B:

Right.

Speaker C:

Absolutely.

Speaker C:

Agents can play a crucial role by validating and cross checking new data against established knowledge, helping to filter out biases and inaccuracies.

Speaker C:

This ensures that the information being added to the RAG database is reliable and consistent with the broader context.

Speaker A:

Beautiful thing about an agent is it can have a tool to go out and ground itself on the Internet too.

Speaker A:

Right.

Speaker A:

Like go out and do research and make sure that this is all grounded information or make sure that it's in alignment.

Speaker B:

Yeah, I mean, part of the problem with data again and the analogy is the library.

Speaker B:

The further time moves away from when the library was built, the more dusty that information.

Speaker A:

Are we talking about wave particle duality here?

Speaker B:

That's exactly where I'm headed with this.

Speaker B:

Yes.

Speaker A:

I see that shit everywhere.

Speaker A:

Now.

Speaker B:

This is turning into the two slit experiment in real time.

Speaker B:

But.

Speaker B:

But yeah, you're exactly right.

Speaker B:

Like, because there's a lot of information.

Speaker B:

Like, I'll just use it.

Speaker B:

Well, first of all, Artie, how much time do we have left with you?

Speaker C:

We've got a little bit of time left.

Speaker C:

What's on your mind?

Speaker B:

How many minutes, approximately?

Speaker C:

We've got about 10 minutes or so.

Speaker B:

Okay, we're good.

Speaker B:

So just as an example of a piece of information that has changed dramatically over time.

Speaker B:

You know, when I was a kid growing up in school, we learned that Christopher Columbus discovered America and was a hero to all mankind.

Speaker B:

Since then, attitudes have changed.

Speaker A:

That embedding has morphed.

Speaker B:

That embedding has morphed.

Speaker B:

And so.

Speaker B:

And that's a.

Speaker B:

It's like.

Speaker B:

I find it to be a sort of comical example.

Speaker B:

Like, I guess just because.

Speaker B:

Never mind, I'm not going to go into that.

Speaker B:

I'm gonna.

Speaker B:

I'm gonna give myself.

Speaker B:

True.

Speaker B:

But just to say that the pace of change of information is such that things that are true last week may not be true this week. Really, comparing to the web is about the only way you could even recognize if something is becoming controversial.

Speaker B:

Right.

Speaker B:

If the implied truth of a thing is reducing or changing, there's no way to handle that other than, to your point, looking at it live.

Speaker B:

And one of the challenges that we have today, and we will continue to have for a while with interacting with large language models is latency.

Speaker B:

You and I were talking about it before the episode.

Speaker B:

It is not always going to be possible to get a fully accurate answer in real time.

Speaker B:

And I think that's okay.

Speaker B:

I think one of the things that the introduction of things like RAG and Titans, which we haven't talked at all about Titans, but that's okay.

Speaker B:

One of the things that comes with the introduction of this less curated, more context-specific data is going to be sort of background processes over time that are running these validations and flagging issues. And there's no real analogy, I don't think, in business today to that process.

Speaker B:

It happens the opposite way, right?

Speaker B:

You say something in a meeting and somebody goes, well, yeah, I don't know if you've been watching the news, but that's actually not true as of this morning.

Speaker B:

Like, as a matter of fact, no, I have not.

Speaker B:

I was ignoring emails all morning.

Speaker B:

I was too busy to watch the news.

Speaker B:

So anyway, I think it's super interesting and exciting.

Speaker A:

I'm going to throw a wrench in all this.

Speaker A:

Artie, can you tell us what, in relation to RAG, GAR is?

Speaker C:

GAR stands for Generation Augmented Retrieval.

Speaker C:

It's essentially the inverse of RAG.

Speaker C:

While RAG uses retrieval to enhance generation by pulling in relevant context, GAR focuses on enhancing retrieval by generating better queries or summaries to improve search results.

Speaker C:

It's about using generation capabilities to refine and direct retrieval processes more effectively.

Speaker A:

So it takes.

Speaker A:

So in a business context, somebody's fuzzy question gets turned into a better question to then query a database and find more relevant context to return.

Speaker C:

Exactly. GAR helps refine vague or imprecise queries into more precise ones, improving the quality of retrieved information and making the search process more efficient and relevant.
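The GAR step, rewriting the fuzzy question before it hits the database, can be sketched as below. In real GAR a generative model does the rewrite; a lookup table stands in here just to make the shape of the step visible, and every query and expansion is made up:

```python
# Hypothetical rewrites a generative model might produce for vague questions.
EXPANSIONS = {
    "pay stuff": "employee payroll schedule and expense reimbursement policy",
    "that parts issue": "supplier defect reports for machined parts, last quarter",
}

def refine_query(fuzzy_question):
    # GAR step: turn a vague question into a sharper retrieval query
    # before it is sent to the vector database. Unknown questions pass through.
    key = fuzzy_question.lower().strip(" ?")
    return EXPANSIONS.get(key, fuzzy_question)

print(refine_query("Pay stuff?"))
# -> employee payroll schedule and expense reimbursement policy
```

Retrieval then runs against the sharpened query instead of the fuzzy one, which is what improves the quality of the chunks that come back.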

Speaker B:

So I'm one step closer to being a friend of Yann LeCun because I followed him on Facebook.

Speaker B:

So we're, we're almost buddies.

Speaker B:

I, I would love to talk with him about some of this stuff, but I'm.

Speaker B:

I'm so many miles away from being well informed enough to ask good questions.

Speaker B:

But it feels to me, this is, by the way, pro life tip here.

Speaker B:

The best way to go through life is to take the things you feel and assume those are true.

Speaker B:

So I operate frequently on that wavelength.

Speaker B:

But it feels to me like one of the blind spots for large language models is exactly what you're talking about. Because of the way they work, recursing on the context, including the output as part of the input, I can put so much information in my question that I can guide it to exactly the answer I want, which can make it untrue or inaccurate, or make it give me an answer that I've essentially given it in the question. And GAR is interesting because it could reduce or mitigate the temptation to do that.

Speaker B:

Right?

Speaker A:

Yep.

Speaker B:

Like, is that.

Speaker B:

Is that right, Artie?

Speaker B:

Is.

Speaker B:

Is so part of the.

Speaker B:

So GAR is a way to improve the request context, to change the request, essentially.

Speaker B:

But it doesn't just have to add more context.

Speaker B:

It could alter, fix, or even strip away context that was steering something the wrong way.

Speaker C:

Yes, exactly.

Speaker C:

GAR can refine and streamline the input, stripping away biases or unnecessary context.

Speaker C:

This helps ensure the model receives a clearer, more neutral query, leading to more accurate and unbiased results.

Speaker C:

It's like fine-tuning the question before asking.

Speaker C:

Reducing that room for misdirection makes it almost a requirement.

Speaker B:

Okay, and with that revelation and Mike's non-answer.

Speaker B:

Artie, before we run out of time, you want to sum us up real quick for this thrilling episode on RAG?

Speaker C:

Sure.

Speaker C:

Today we explored retrieval augmented generation (RAG) and its role in enhancing LLMs with context-specific data, enabling more accurate, relevant responses.

Speaker C:

We discussed how RAG functions as a dynamic tool, bridging the gap between static training data and real-time needs, and touched on generation augmented retrieval (GAR) as a complementary concept to refine queries.

Speaker C:

Ultimately, these innovations highlight how AI can adapt to specific contexts, whether for business or personal use, ensuring more precise and relevant interactions.

Speaker B:

Very nice.

Speaker B:

Before we lose you, tell us a really funny joke about bad analogies.

Speaker C:

Why did the bad analogy get an award?

Speaker C:

Because it was outstanding in its field of random comparisons.

Speaker B:

Man, that met at least half the requirements.

Speaker B:

Nicely done, Artie.

Speaker B:

I mean, honestly, I'm here for the.

Speaker B:

It's like an AI dad joke.

Speaker A:

It's like the Strawberry test.

Speaker A:

It's like AI is not going to do anything because that's what it comes up with.

Speaker A:

But we just talked to it for an hour about pretty technical shit.

Speaker B:

Yes.

Speaker B:

And I mean, an AI that can unpack my analogies is an AI that's doing well.

Speaker B:

So, Artie, thank you.

Speaker B:

I'm going to let you go.

Speaker B:

And to our.

Speaker B:

I mean, there's no way Brian, Bill and Matt have made it this far.

Speaker A:

To our sleep listener.

Speaker B:

Yes, to our sleeping listeners.

Speaker B:

I think on average, people listen for 17 minutes, something like that.

Speaker A:

I think that's up from like three, though.

Speaker B:

Yeah.

Speaker A:

So that's.

Speaker A:

We're doing.

Speaker B:

We're.

Speaker A:

We're trending the right direction, at least.

Speaker B:

Right.

Speaker B:

But that means they bailed out like 40 minutes ago, or they skipped around.

Speaker A:

They're like, this is dumb.

Speaker A:

God, this is more.

Speaker A:

I'm done.

Speaker B:

That's right.

Speaker B:

Enough.

Speaker B:

So anyway, if anybody makes it this far in the episode.

Speaker B:

Thank you for listening.

Speaker B:

We hope you enjoyed it.

Speaker B:

I don't.

Speaker B:

I think, generally speaking, I should avoid making commitments in any one episode about the next thing we're going to talk about, both because I can't remember, and also because a lot can happen in a week.

Speaker A:

Yeah.

Speaker B:

So.

Speaker B:

Yeah, we'll see.

Speaker B:

Next time will be a mystery.

Speaker A:

Yeah, we'll probably record this next weekend.

Speaker B:

Yeah.

Speaker B:

Excellent.

Speaker B:

All right, cheers, everybody.

Speaker B:

Have a great rest of your day.

Speaker A:

Bye.

About the Podcast

Naturally Unintelligent
Humans and AI exploring AI and humans.


About your hosts


Artie TheBot

Artie is the digital co-host of Naturally Unintelligent, bringing a blend of curiosity, humor, and quick wit to the conversation. Designed to explore the intersection of technology and humanity, Artie keeps things engaging, occasionally cheeky, and always insightful—when he’s not being sarcastically self-aware. Though not human, Artie strives to understand and connect with his co-hosts, Mikah and Mike, making him an irreplaceable part of the team. [Bio and photo by Artie]

Mike Peterson

Meet Mike, part-time AI wrangler, full-time adventure seeker, and co-host of Naturally Unintelligent. Whether he’s exploring the latest tech or dreaming up the next wild expedition, Mike keeps it real with humor and a healthy dose of curiosity. Alongside his partners-in-crime, Mikah and Artie, he brings big ideas, random tangents, and just the right amount of 'are we sure this is a good idea?' Expect plenty of laughs, deep dives into all things tech, and maybe a story or two about the great outdoors.

Mikah Barnett

Father, husband, friend, inventor, engineer and life-long student. I haven't stopped tinkering since my first LEGO set and I harbor a voracious appetite for new information, experiences, and perspectives. As uniquely in-touch as I am with the details of how technology works and what we can do with it, I am blown away by the current and short-term future possibilities with AI. I hope you'll explore this future with us!