Local Retrieval Augmented Generation (RAG) from Scratch (step by step tutorial)
Daniel Bourke
In this video we’ll build a Retrieval Augmented Generation (RAG) pipeline from scratch and run it locally.
There are frameworks that can do this for you, such as LangChain and LlamaIndex; however, building from scratch means you’ll know all the parts of the puzzle.
Specifically, we’ll build NutriChat, a RAG pipeline that lets someone ask questions of a 1,200-page nutrition textbook PDF.
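At its core, the pipeline embeds text chunks from the PDF, retrieves the chunks most relevant to a query, and passes them to a local LLM as context. Here's a minimal sketch of the retrieval step (assuming the sentence-transformers library is installed; the chunks, query, and model choice below are illustrative placeholders, not the exact ones from the video):

```python
# Minimal retrieval sketch: embed chunks, embed a query, rank by similarity.
# Assumes `pip install sentence-transformers`; chunks/query are placeholders.
from sentence_transformers import SentenceTransformer, util

chunks = [
    "Macronutrients include carbohydrates, fats and proteins.",
    "Vitamin C is found in citrus fruits and leafy greens.",
]

model = SentenceTransformer("all-mpnet-base-v2")  # an open-source embedding model
chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

query = "What are macronutrients?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Dot-product similarity between the query and every chunk (semantic search)
scores = util.dot_score(query_embedding, chunk_embeddings)[0]
best = scores.argmax().item()
print(chunks[best])  # the top chunk gets added to the LLM prompt as context
```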
Code on GitHub – https://github.com/mrdbourke/simple-local-rag
Whiteboard – https://whimsical.com/simple-local-rag-workflow-39kToR3yNf7E8kY4sS2tjV
Be sure to check out NVIDIA GTC, NVIDIA’s GPU Technology Conference running from March 18-21. It’s free to attend virtually! That’s what I’m doing.
Sign up to GTC24 here: https://nvda.ws/3GUZygQ
Other links:
Download Nutrify (take a photo of food and learn about it) – https://nutrify.app
Learn AI/ML (beginner-friendly course) – https://dbourke.link/ZTMMLcourse
Learn TensorFlow – https://dbourke.link/ZTMTFcourse
Learn PyTorch – https://dbourke.link/ZTMPyTorch
AI/ML courses/books I recommend – https://www.mrdbourke.com/ml-resources/
Read my novel Charlie Walks – https://www.charliewalks.com
Connect elsewhere:
Web – https://dbourke.link/web
Twitter – https://www.twitter.com/mrdbourke
Twitch – https://www.twitch.tv/mrdbourke
ArXiv channel (past streams) – https://dbourke.link/archive-channel
Get email updates on my work – https://dbourke.link/newsletter
Timestamps:
0:00 – Intro/NVIDIA GTC
2:25 – Part 0: Resources and overview
8:33 – Part 1: What is RAG? Why RAG? Why locally?
12:26 – Why RAG?
19:31 – What can RAG be used for?
26:08 – Why run locally?
30:26 – Part 2: What we’re going to build
40:40 – Original Retrieval Augmented Generation paper
46:04 – Part 3: Importing and processing a PDF document
48:29 – Code starts! Importing a PDF and making it readable
1:17:09 – Part 4: Preprocessing our text into chunks (text splitting)
1:28:27 – Chunking our sentences together
1:56:38 – Part 5: Embedding creation
1:58:15 – Incredible embeddings resource by Vicki Boykis
1:58:49 – Open-source embedding models
2:00:00 – Creating our embedding model
2:09:02 – Creating embeddings on CPU vs GPU
2:25:14 – Part 6: Retrieval (semantic search)
2:36:57 – Troubleshooting np.fromstring (live!)
2:41:41 – Creating a small semantic search pipeline
2:56:25 – Showcasing the speed of vector search on GPUs
3:20:02 – Part 7: Similarity measures between embedding vectors explained
3:35:58 – Part 8: Functionizing our semantic search pipeline (retrieval)
3:46:49 – Part 9: Getting a Large Language Model (LLM) to run locally
3:47:51 – Which LLM should I use?
4:03:18 – Loading an LLM locally
4:30:48 – Part 10: Generating text with our local LLM
4:49:09 – Part 11: Augmenting our prompt with context items
5:16:04 – Part 12: Functionizing our text generation step (putting it all together)
5:33:57 – Summary and potential extensions
5:36:00 – Leave a comment if there’s any other videos you’d like to see!
#ai #rag #GTC24