Large Language Models (LLMs) are powerful, but they often lack real-time knowledge. Retrieval-Augmented Generation (RAG) bridges this gap by fetching relevant information from external sources before generating responses. In this workshop, we’ll explore how to build an efficient RAG pipeline in Node.js using RSS feeds as a data source. We’ll compare different vector databases (FAISS, pgvector, Elasticsearch), embedding methods, and testing strategies. We’ll also cover the crucial role of chunking—splitting and structuring data effectively for better retrieval performance.
Software Engineer, Netherlands
My primary interests are self-development and craftsmanship. I enjoy exploring technologies, coding open source and enterprise projects, and teaching, speaking, and writing about programming - JavaScript, Node.js, TypeScript, Go, Java, Docker, Kubernetes, JSON Schema, DevOps, Web Components, Algorithms 🎧 ⚽️ 💻 👋 ☕️ 🌊 🎾
Software Engineer, Netherlands
JavaScript developer with full-stack experience and a passion for frontend. He happily works at ING in the Fraud Prevention department, where he helps protect the finances of ING customers.
The use of Artificial Intelligence to produce human-consumable content in text, image, audio, and video formats.
Language Models are Machine Learning models trained on natural language resources that aim to predict the next word based on a given context.
LM-relevant tasks - summarization, Q&A, classification, and more.
Artificial Intelligence is powered by Large Language Models - models trained on vast amounts of sources and materials, with billions of parameters.
RAG is a method that combines information retrieval with language model generation.
Retrieval = Search
Generation = LLM
Use Cases:
// RAG in a nutshell: retrieve relevant documents, add them to the prompt, generate.
async function rag(q) {
  const results = await search(q) // retrieval: search over our data
  const prompt = makePrompt(q, results) // augmentation: put the results into the prompt
  const answer = await llm(prompt) // generation: ask the LLM
  return answer
}
LangChain is a Python and JavaScript framework that provides flexible abstractions and an AI-first toolkit for building with GenAI and integrating your applications with LLMs. It includes components for abstracting and chaining LLM prompts, configuring and using vector databases (for semantic search), loading and splitting documents (to analyze them and learn from them), parsing output, and more.
npm i langchain
import ollama from "ollama";

const response = await ollama.chat({
  model: "llama3.1", // any model pulled locally with `ollama pull`
  messages: [
    { role: "user", content: "What is retrieval-augmented generation?" },
  ],
});
console.log(response.message.content);
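The same request can also go through LangChain's chat-model wrapper, which is what the rest of the pipeline builds on; a minimal sketch, assuming the @langchain/ollama package and a locally pulled model:

// Sketch: asking the same question via LangChain's ChatOllama wrapper.
// Assumes Ollama is running locally and the model below has been pulled.
import { ChatOllama } from "@langchain/ollama";

const model = new ChatOllama({ model: "llama3.1" });
const aiMessage = await model.invoke("What is retrieval-augmented generation?");
console.log(aiMessage.content);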
We aim to understand what happened in the Node.js community over the past year. To achieve this, we fetch articles from Node.js-related RSS feeds, chunk and embed them into a vector store, and use an LLM to answer questions over the retrieved content.
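As a sketch of that first step, assuming rss-parser as the feed client and a placeholder feed URL, fetching the articles could look like this:

// Sketch: pull recent articles from an RSS feed (npm i rss-parser).
// The feed URL is a placeholder; point it at the feeds you want to analyze.
import Parser from "rss-parser";

const parser = new Parser();
const feed = await parser.parseURL("https://example.com/nodejs-news/feed.xml");

const articles = feed.items.map((item) => ({
  title: item.title,
  link: item.link,
  published: item.pubDate,
  text: item.contentSnippet ?? item.content ?? "",
}));
console.log(`Fetched ${articles.length} articles`);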
Chunking involves breaking down texts into smaller, manageable pieces called “chunks”. Each chunk becomes a unit of information that is vectorized and stored in a database, fundamentally shaping the efficiency and effectiveness of natural language processing tasks.
Chunk size - the number of characters we would like in our chunks: 50, 100, 100,000, etc.
Chunk overlap - the amount we would like our sequential chunks to overlap. This tries to avoid cutting a single piece of context into multiple pieces, at the cost of duplicating data across chunks.
Length-based (CharacterTextSplitter) - easy and simple, but it doesn't take the text's structure into account
const { CharacterTextSplitter } = require("@langchain/textsplitters");

// Length-based splitting: target chunks of 100 characters, with no overlap.
const textSplitter = new CharacterTextSplitter({
  chunkSize: 100,
  chunkOverlap: 0,
});

// `document` is the raw text of an article loaded earlier.
const texts = await textSplitter.splitText(document);
Text-structure based (RecursiveCharacterTextSplitter)
What split characters do you think are included in LangChain by default?
Semantic meaning based - Extract Propositions
Agentic Chunking
const { RecursiveCharacterTextSplitter } = require("@langchain/textsplitters");

// Recursively splits on a list of separators until chunks fit the size limit.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 100,
  chunkOverlap: 0,
});

const chunks = await splitter.splitText(text); // `text` is the document to split
An embedding is a vector representation of data in embedding space (projecting the high-dimensional space of initial data vectors into a lower-dimensional space).
Vectors are stored in a database, which compares them to find data that is similar in meaning (using the dot product or cosine similarity).
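To make that comparison concrete, here is a small plain-JavaScript sketch of cosine similarity between two embedding vectors:

// Sketch: cosine similarity between two vectors of equal length.
// Values close to 1 mean the underlying texts are close in meaning.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0, 1], [1, 1, 0])); // 0.5

In practice the vector database does this comparison for us; below, the chunks are stored in Chroma via LangChain.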
const { Chroma } = require("@langchain/community/vectorstores/chroma");

// `texts` are the chunks produced by the splitter; `embeddings` is an
// embeddings model instance (for example, an Ollama embeddings model).
const vectorStore = await Chroma.fromTexts(
  texts,
  { id: Array.from({ length: texts.length }, (_, i) => `chunk-${i}`) },
  embeddings,
  { collectionName: "rag_node_workshop_articles" },
);
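Once the chunks are stored, the store can be queried by meaning; a short sketch using LangChain's similaritySearch, where vectorStore is the store created above and the query is just an example:

// Sketch: retrieve the 4 chunks closest in meaning to the query.
const query = "What changed in the Node.js ecosystem this year?";
const docs = await vectorStore.similaritySearch(query, 4);
for (const doc of docs) {
  console.log(doc.pageContent);
}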
Rerankers analyze the initial search output and adjust the ranking to better match user preferences and application requirements
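A minimal sketch of the idea, with a hypothetical scoreRelevance function standing in for a real cross-encoder or LLM-based scorer:

// Sketch: re-score retrieved documents and keep the best ones.
// `scoreRelevance(query, text)` is an assumption here - in practice it could be
// a cross-encoder model or an LLM prompt that rates query/document relevance.
async function rerank(query, docs, topK = 3) {
  const scored = await Promise.all(
    docs.map(async (doc) => ({
      doc,
      score: await scoreRelevance(query, doc.pageContent),
    })),
  );
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, topK).map((entry) => entry.doc);
}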
Evaluate the retriever and the generator of a RAG pipeline separately.
Hyperparameters:
Does your reranker model rank the retrieved documents in the “correct” order?
Metrics:
Hyperparameters:
Can we use a smaller, faster, cheaper LLM?
Metrics:
Hit Rate (HR) or Recall at k: Measures the proportion of queries for which at least one relevant document is retrieved in the top k results.
Mean Reciprocal Rank (MRR): Evaluates the rank position of the first relevant document (see the sketch after this list).
Chunk attribution/utilization: whether a chunk actually contributed to the model's (RAG) response.
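A minimal sketch of the first two metrics for a single query, assuming relevance is an array of booleans marking, in ranked order, which retrieved chunks were actually relevant:

// Sketch: Hit Rate / Recall@k and Reciprocal Rank over boolean relevance flags.
function hitRate(relevance) {
  return relevance.some(Boolean) ? 1 : 0;
}

function reciprocalRank(relevance) {
  const firstHit = relevance.findIndex(Boolean);
  return firstHit === -1 ? 0 : 1 / (firstHit + 1);
}

// Averaged over many queries these become HR and MRR.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

console.log(mean([hitRate([false, true]), hitRate([false, false])])); // 0.5
console.log(mean([reciprocalRank([false, true]), reciprocalRank([true, false])])); // 0.75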
// Evaluating retrieval for one query: `getRelevantDocuments` wraps the vector-store
// search, `summary` is the query, and `chunk` is the chunk we expect to be retrieved.
const relevantDocs = await getRelevantDocuments(vectorStore, summary);
const relevance = relevantDocs.map((d) => d.pageContent === chunk);
console.log(hitRate(relevance));
Please share your feedback on the workshop. Thank you and happy coding!
If you like the workshop, you can become our patron, yay! 🙏