
This blog post details the creation of a Retrieval-Augmented Generation (RAG) chatbot that answers questions using uploaded documents. Uploaded documents are chunked and converted into embeddings, which are stored in PostgreSQL with the pgvector extension. When a user asks a question, the system embeds the query, retrieves the most relevant chunks via cosine similarity, and passes that context to an OpenAI model to generate a grounded answer. Key insights include tuning chunk size to retain context without adding noise, and designing the prompt so the model answers only from the provided context, which together yield precise, document-based answers.
Cary Li
Title: How I Built a RAG Chatbot with Vercel AI SDK
Content:
In this post, I'll walk through how I built a simple RAG (Retrieval-Augmented Generation) chatbot that can answer questions based on uploaded documents.
Instead of relying purely on the model's training data, a RAG system first retrieves relevant content from your own documents, then feeds that context into the model to generate an answer. This makes it much more accurate for domain-specific questions.
The stack:
- Vercel AI SDK for the chat interface and streaming
- OpenAI text-embedding-3-small for generating embeddings
- PostgreSQL with pgvector for vector storage and similarity search (a rough setup is sketched after this list)
- Next.js as the full-stack framework
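To give a sense of the storage side, here's a minimal sketch of the pgvector setup. The table and column names are illustrative, not a prescribed schema; the only real requirements are enabling the extension and sizing the embedding column to match the model (text-embedding-3-small produces 1536-dimensional vectors).

```ts
// Minimal pgvector setup sketch. Table and column names are illustrative.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function initSchema() {
  // Enable the pgvector extension once per database.
  await pool.query("CREATE EXTENSION IF NOT EXISTS vector");
  // One row per chunk; the vector dimension matches text-embedding-3-small.
  await pool.query(`
    CREATE TABLE IF NOT EXISTS document_chunks (
      id BIGSERIAL PRIMARY KEY,
      document_id TEXT NOT NULL,
      content TEXT NOT NULL,
      embedding vector(1536)
    )
  `);
}
```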
How it works:
1. Upload a document: the text gets chunked and converted into embeddings, then stored in the database.
2. The user asks a question: the question is also embedded and compared against stored chunks using cosine similarity (see the retrieval sketch after this list).
3. The top matching chunks are retrieved and passed as context to the LLM.
4. The model generates a response grounded in your actual document content.
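Here's roughly what steps 2 and 3 look like in code, using the AI SDK's embed helper and pgvector's cosine distance operator. The table name and the top-k of 4 come from the sketch above and are assumptions, not fixed requirements.

```ts
// Embed the question and fetch the most similar chunks by cosine distance.
import { embed } from "ai";
import { openai } from "@ai-sdk/openai";
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function retrieveContext(
  question: string,
  topK = 4
): Promise<string[]> {
  // Embed the question with the same model used at ingestion time.
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: question,
  });

  // pgvector's <=> operator is cosine distance: smaller means more similar.
  const vectorLiteral = `[${embedding.join(",")}]`;
  const { rows } = await pool.query(
    "SELECT content FROM document_chunks ORDER BY embedding <=> $1::vector LIMIT $2",
    [vectorLiteral, topK]
  );
  return rows.map((r) => r.content);
}
```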
Getting the chunk size right matters a lot: too small and you lose context; too large and retrieval becomes noisy. I settled on roughly 500 tokens per chunk with some overlap.
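A sliding-window chunker is enough to get started. The sketch below approximates the token budget with characters (roughly four characters per token); swap in a real tokenizer if you want exact counts.

```ts
// Sliding-window chunker: ~500 "tokens" per chunk, approximated as characters.
export function chunkText(
  text: string,
  chunkTokens = 500,
  overlapTokens = 50
): string[] {
  const charsPerToken = 4; // rough heuristic; a real tokenizer is more accurate
  const chunkSize = chunkTokens * charsPerToken;
  const overlap = overlapTokens * charsPerToken;

  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlap; // step back so consecutive chunks share some context
  }
  return chunks;
}
```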
Prompt design also plays a bigger role than I expected — telling the model to only answer based on the provided context, and to say "I don't know" when the answer isn't there, makes the output much more reliable.
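Concretely, those instructions live in the system prompt alongside the retrieved chunks. Here's a minimal sketch using the AI SDK's streamText; the exact wording and the chat model are placeholders you'd tune for your own setup.

```ts
// Generate an answer grounded in the retrieved chunks.
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

export function answerQuestion(question: string, contextChunks: string[]) {
  return streamText({
    model: openai("gpt-4o-mini"), // placeholder: any OpenAI chat model works
    system: [
      "Answer the question using ONLY the context below.",
      'If the answer is not in the context, reply "I don\'t know."',
      "",
      "Context:",
      contextChunks.join("\n---\n"),
    ].join("\n"),
    prompt: question,
  });
}
```

In a Next.js route handler you'd then return the result as a streaming response (depending on the SDK version, something like result.toDataStreamResponse()) so the chat UI can render tokens as they arrive.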
Title: From Cursor to Claude Code: My Vibe Coding Journey
Content:
A year ago I started using Cursor. Today I use Claude Code for almost everything. Here's what changed.
Cursor felt like a supercharged VS Code. Autocomplete that actually understood context, inline chat, codebase-aware suggestions. It was a clear upgrade from GitHub Copilot.
But I was still writing most of the code myself. AI was a tool I reached for when stuck, not something I collaborated with continuously.
At some point I stopped thinking of AI as an autocomplete tool and started treating it more like a pair programmer. I'd describe what I wanted, review what came back, give feedback, iterate.
The code quality surprised me. Not perfect, but solid enough that my job became more about direction and review than raw implementation.
Claude Code lives in the terminal and works across the whole project — not just the file I have open. I can ask it to find something, refactor across files, or explain why something is broken, and it actually has the full picture.
The bigger shift is that I now spend more time thinking about what to build and less time thinking about how to write it. That's a meaningful change.
I still review every change. I still need to understand what the code is doing. But the output per hour has gone up noticeably, and the types of problems I can tackle solo have expanded.
Vibe coding isn't about letting AI do everything. It's about raising your own ceiling.