Multilingual RAG on a Podcast
Last Updated: June 19, 2026
Notebook by Stefano Fiorucci
This notebook shows how to create a multilingual Retrieval Augmented Generation application, starting from a podcast.
🧰 Stack:
- Haystack LLM framework
- OpenAI Whisper model for audio transcription
- Qdrant vector database
- multilingual embedding model: multilingual-e5-large
- multilingual LLM: Mistral Small
Installation
%%capture
! pip install -U mistral-haystack haystack-ai qdrant-haystack whisper-haystack "openai-whisper>=20231106" pytube sentence-transformers-haystack "huggingface_hub>=0.23.0"
Podcast transcription
- download the audio from YouTube using pytube
- transcribe it locally using Haystack’s LocalWhisperTranscriber with the whisper-small model. We could use bigger models, which take longer to transcribe. We could also call the paid OpenAI API, using RemoteWhisperTranscriber.
Since the transcription takes some time (about 10 minutes), I commented out the following code and will provide the transcription.
# # https://www.tutorialspoint.com/download-video-in-mp3-format-using-pytube
# from pytube import YouTube
# url = "https://www.youtube.com/watch?v=vrf4_XMSlE0"
# video = YouTube(url)
# stream = video.streams.filter(only_audio=True).first()
# stream.download(filename="podcast.mp3")
# from haystack_integrations.components.audio.whisper import LocalWhisperTranscriber
# from haystack.utils import ComponentDevice
# whisper = LocalWhisperTranscriber(model="small", device=ComponentDevice.from_str("cuda:0"))
# transcription = whisper.run(sources=["podcast.mp3"])
# # save the transcript under the same name used by the indexing pipeline below
# with open("podcast_transcript_whisper_small.txt", "w", encoding="utf-8") as fo:
#     fo.write(transcription["documents"][0].content)
Indexing pipeline
Create an Indexing pipeline that stores chunks of the transcript in the Qdrant vector database.
- TextFileToDocument converts the transcript into a Haystack Document.
- DocumentSplitter divides the original Document into smaller chunks.
- SentenceTransformersDocumentEmbedder computes embeddings (vector representations) of Documents using a multilingual model, to allow semantic retrieval.
- DocumentWriter stores the Documents in Qdrant.
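The splitter below uses split_by="word" with split_length=200, so the transcript is cut into fixed-size windows of words. Here is a simplified plain-Python sketch of that idea. It is not Haystack's DocumentSplitter code: it normalizes whitespace, which the real splitter may not do. The function name `split_by_words` is hypothetical. The `split_overlap` parameter is included only to show what an overlap would change, since the pipeline below does not set one.

```python
def split_by_words(text: str, split_length: int = 200, split_overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `split_length` whitespace-separated words.

    Consecutive chunks share `split_overlap` words. Simplified sketch: words are
    rejoined with single spaces, so the original whitespace is not preserved.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")
    words = text.split()
    step = split_length - split_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):  # last window reached the end
            break
    return chunks
```

With 450 words and no overlap, this yields chunks of 200, 200 and 50 words. An overlap repeats the tail of one chunk at the start of the next, so a sentence cut at a boundary is less likely to lose its context.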
# download the podcast transcript
# to create the transcript, you can uncomment and run the previous section
!wget "https://raw.githubusercontent.com/deepset-ai/haystack-cookbook/main/data/multilingual_rag_podcast/podcast_transcript_whisper_small.txt"
# let's take a look at the beginning of our 🇮🇹 transcript
!head --bytes 300 podcast_transcript_whisper_small.txt
--2026-06-16 11:46:28-- https://raw.githubusercontent.com/deepset-ai/haystack-cookbook/main/data/multilingual_rag_podcast/podcast_transcript_whisper_small.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response...
200 OK
Length: 61083 (60K) [text/plain]
Saving to: ‘podcast_transcript_whisper_small.txt’
2026-06-16 11:46:28 (8.54 MB/s) - ‘podcast_transcript_whisper_small.txt’ saved [61083/61083]
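Ciao e benvenuti nella puntata 183 del Pointer Podcast, torniamo oggi con degli ospiti, ma prima vi introduto Eugenio, ciao Eugenio. Ciao Luca. Come va? Tutto bene? Tutto bene, tutto bene. Oggi abbiamo due ospiti che arrivano dalla stessa azienda, che è una azienda che produce una libreria che pro
(English translation: "Hi and welcome to episode 183 of the Pointer Podcast. Today we are back with some guests, but first let me introduce Eugenio. Hi, Eugenio. Hi, Luca. How's it going? All good? All good, all good. Today we have two guests who come from the same company, a company that produces a library that pro…")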
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack import Pipeline
from haystack_integrations.components.embedders.sentence_transformers import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.preprocessors import DocumentSplitter
from haystack.utils import ComponentDevice
# initialize the Document store
document_store = QdrantDocumentStore(
":memory:",
embedding_dim=1024, # the embedding_dim should match that of the embedding model
)
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("text_file_converter", TextFileToDocument())
indexing_pipeline.add_component("splitter", DocumentSplitter(split_by="word", split_length=200))
indexing_pipeline.add_component(
"embedder",
SentenceTransformersDocumentEmbedder(
model="intfloat/multilingual-e5-large", # good multilingual model: https://huggingface.co/intfloat/multilingual-e5-large
device=ComponentDevice.from_str("cuda:0"), # load the model on GPU
prefix="passage: ", # as explained in the model card (https://huggingface.co/intfloat/multilingual-e5-large#faq), documents should be prefixed with "passage: " (trailing space included)
))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
# connect the components
indexing_pipeline.connect("text_file_converter", "splitter")
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7c44a22ce420>
🚅 Components
- text_file_converter: TextFileToDocument
- splitter: DocumentSplitter
- embedder: SentenceTransformersDocumentEmbedder
- writer: DocumentWriter
🛤️ Connections
- text_file_converter.documents -> splitter.documents (list[Document])
- splitter.documents -> embedder.documents (list[Document])
- embedder.documents -> writer.documents (list[Document])
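The prefix argument of the Sentence Transformers embedders is a plain string placed in front of every text before it is embedded. Below is a minimal sketch of the E5 convention. It assumes the spelling "query: " / "passage: " (with a trailing space) as written in the multilingual-e5-large model card. The helper `add_e5_prefix` is hypothetical and not part of Haystack.

```python
# Prefixes as written in the multilingual-e5-large model card (note the trailing space).
E5_PREFIXES = {"query": "query: ", "passage": "passage: "}


def add_e5_prefix(texts: list[str], kind: str) -> list[str]:
    """Return copies of `texts` with the E5 prefix for `kind` ("query" or "passage")."""
    if kind not in E5_PREFIXES:
        raise ValueError(f"kind must be 'query' or 'passage', got {kind!r}")
    prefix = E5_PREFIXES[kind]
    return [prefix + text for text in texts]
```

Chunks of the transcript get "passage" at indexing time, and user questions get "query" at query time. Both sides must use the same model. According to the model card, leaving the prefixes out degrades retrieval quality.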
# show the pipeline
# indexing_pipeline.show()
# Run the pipeline! 🚀
res = indexing_pipeline.run({"text_file_converter":{"sources":["podcast_transcript_whisper_small.txt"]}})
document_store.count_documents()
52
RAG pipeline
Finally, our RAG pipeline: from an Italian podcast 🇮🇹🎧 to answering questions in English 🇬🇧
- SentenceTransformersTextEmbedder transforms the query into a vector that captures its semantics, to allow vector retrieval.
- QdrantEmbeddingRetriever compares the query and Document embeddings and fetches the Documents most relevant to the query.
- ChatPromptBuilder prepares the prompt for the LLM: it renders a prompt template and fills in the variable values.
- MistralChatGenerator allows using Mistral LLMs. Read their Quickstart to get an API key.
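QdrantEmbeddingRetriever delegates the similarity search to Qdrant. To make the idea concrete, here is an exhaustive top-k search by cosine similarity in plain Python. The sketch rests on several assumptions:

- It uses cosine similarity. The QdrantDocumentStore above does not set a similarity explicitly, and cosine is assumed to be its default.
- It uses tiny hand-made 2-D vectors instead of the 1024-dimensional E5 embeddings.
- The `top_k` helper is hypothetical.

A vector database can use an index instead of this brute-force loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_embedding: list[float], doc_embeddings: list[list[float]], k: int = 10) -> list[tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents, best first."""
    if any(len(emb) != len(query_embedding) for emb in doc_embeddings):
        # Same constraint as embedding_dim: every vector must have one shared dimension.
        raise ValueError("query and document embeddings must have the same dimension")
    scored = [(i, cosine_similarity(query_embedding, emb)) for i, emb in enumerate(doc_embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For a query [1.0, 0.0] and documents [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [2.0, 0.0]], the two best matches are documents 3 and 1. Cosine similarity ignores vector length, so [2.0, 0.0] scores a perfect 1.0.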
from haystack_integrations.components.generators.mistral import MistralChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack.dataclasses import ChatMessage
from pprint import pprint
from getpass import getpass
import os
os.environ["MISTRAL_API_KEY"] = getpass("Enter your Mistral API key")
generator = MistralChatGenerator(model="mistral-small-latest")
generator.run([ChatMessage.from_user("Please explain in a fun way why vim is the ultimate IDE")])
{'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text='Ah, the eternal holy war: **Vim vs. The World**! 🎭 Let’s break it down in a way that’s as fun as it is accurate—because Vim isn’t just an editor; it’s a **lifestyle, a religion, and a superpower** all rolled into one.\n\n---\n\n### **1. Vim is the "Swiss Army Knife" of Editors 🔧**\nImagine your IDE is a **giant, bloated spaceship** (looking at you, VS Code with 500 extensions). Vim? It’s the **pocket knife** you carry everywhere—small, sharp, and *always* ready to slice through code.\n\n- **No bloat?** ✅ (It’s just a text editor, not a browser with a debugger attached.)\n- **No lag?** ✅ (It’s faster than your coffee gets cold.)\n- **Works everywhere?** ✅ (From a **raspberry Pi** to a **mainframe**, Vim is there.)\n\n---\n\n### **2. Modal Editing = The Matrix for Coders 🕶️**\nNormal mode, Insert mode, Visual mode—Vim is like **learning Neo’s kung fu** but for text.\n\n- **Normal mode** = **God Mode** (You fly through code like Neo dodges bullets.)\n- **Insert mode** = **Typing like a regular person** (But why would you?)\n- **Visual mode** = **Selecting text like a boss** (No mouse needed—ever.)\n\nOnce you get the hang of it, **regular editors feel like typing with mittens on.**\n\n---\n\n### **3. Plugins? Vim Has "Vim-Plug" (And It’s Awesome) 🧩**\nYes, Vim can be an IDE! With plugins like:\n- **fzf.vim** (Fuzzy file finder—faster than `Ctrl+P` in VS Code.)\n- **coc.nvim** (Intellisense, autocompletion—like VS Code but *inside Vim*.)\n- **vim-fugitive** (Git integration—because `git commit` in the terminal is *so* 2005.)\n- **NERDTree** (File explorer—because `ls` is for peasants.)\n\nIt’s like **LEGO for developers**—build your perfect IDE, one plugin at a time.\n\n---\n\n### **4. Keyboard > Mouse (Because Mice Are for Cats) ⌨️🐱**\n- **No reaching for the mouse** = **No wrist pain** = **No regrets.**\n- **HJKL navigation** = **Your fingers never leave the home row.**\n- **Macros** = **Record repetitive tasks like a robot overlord.**\n\nOnce you go keyboard-only, **you’ll never touch a mouse again** (unless you’re playing a game).\n\n---\n\n### **5. Vim is the Ultimate "Undo" Button for Your Workflow 🔄**\nMade a mistake? `u` undoes it.\nNeed to redo? `Ctrl+r` redoes it.\nWant to repeat a command? `.` repeats it.\n**Vim is the only editor where "undo" is a single keystroke.**\n\n---\n\n### **6. It’s Everywhere (Even in Your Nightmares) 🌍**\n- **Linux servers?** Vim is there.\n- **MacOS?** Vim is there.\n- **Windows?** (Yes, even there—though you might need to install it.)\n- **Your future job?** Probably requires Vim (because sysadmins love it).\n\n**Learning Vim is like learning Latin—it never dies.**\n\n---\n\n### **7. The Vim Community is Like a Secret Society 🕵️\u200d♂️**\n- **Vim golf** (Write code in the fewest keystrokes possible—like code golf but harder.)\n- **Vim memes** (Because nothing is funnier than `:wq` failing.)\n- **Vim vs. Emacs wars** (The original holy war—now it’s just Vim vs. *everything else*.)\n\nOnce you join, you’re part of an **elite club** of people who **don’t need a GUI to be productive.**\n\n---\n\n### **Final Verdict: Why Vim is the Ultimate IDE 🏆**\n✅ **Fast as lightning** (No waiting for Electron to load.)\n✅ **Customizable like a dream** (Make it *your* IDE.)\n✅ **Works on anything** (From a toaster to a supercomputer.)\n✅ **Teaches you discipline** (No hand-holding—just pure efficiency.)\n✅ **Future-proof** (Will still work in 2050.)\n\n**Other IDEs?** They’re like **training wheels**—Vim is the **motorcycle you ride into the sunset.**\n\n---\n\n### **So, Should You Learn Vim?**\n**Yes.** Even if you only use it for **quick edits**, it’ll make you a **better programmer.**\n\nAnd if someone says *"Vim is hard,"* just smile and say:\n*"So is life. Now go `:wq` and deal with it."* 😎\n\n---\n**Want to get started?**\nRun `vimtutor` in your terminal—it’s the **fastest way to become dangerous.** 🚀')], _name=None, _meta={'model': 'mistral-small-latest', 'index': 0, 'finish_reason': 'stop', 'usage': {'prompt_tokens': 27, 'total_tokens': 1160, 'completion_tokens': 1133, 'prompt_tokens_details': {'cached_tokens': 0}}})]}
# define a multilingual template
template = [ChatMessage.from_user("""
Using only the information contained in these documents in Italian, answer the question using English.
If the answer cannot be inferred from the documents, respond \"I don't know\".
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{question}}
Answer:
""")]
# define the query pipeline
query_pipeline = Pipeline()
query_pipeline.add_component(
"text_embedder",
SentenceTransformersTextEmbedder(
model="intfloat/multilingual-e5-large", # good multilingual model: https://huggingface.co/intfloat/multilingual-e5-large
device=ComponentDevice.from_str("cuda:0"), # load the model on GPU
prefix="query: ", # as explained in the model card (https://huggingface.co/intfloat/multilingual-e5-large#faq), queries should be prefixed with "query: " (trailing space included)
))
query_pipeline.add_component("retriever", QdrantEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", ChatPromptBuilder(template=template))
query_pipeline.add_component("generator", generator)
# connect the components
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "generator")
WARNING:haystack.components.builders.chat_prompt_builder:ChatPromptBuilder has 2 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.
<haystack.core.pipeline.pipeline.Pipeline object at 0x7c447fc5aff0>
🚅 Components
- text_embedder: SentenceTransformersTextEmbedder
- retriever: QdrantEmbeddingRetriever
- prompt_builder: ChatPromptBuilder
- generator: MistralChatGenerator
🛤️ Connections
- text_embedder.embedding -> retriever.query_embedding (list[float])
- retriever.documents -> prompt_builder.documents (list[Document])
- prompt_builder.prompt -> generator.messages (list[ChatMessage])
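ChatPromptBuilder renders the Jinja template with the retrieved documents and the question. The sketch below approximates that step in plain Python, without Jinja, so the whitespace differs. It also illustrates the warning above. With `required` left empty, a missing variable silently renders as empty in this sketch. Listing the variable as required turns the omission into an error instead. In Haystack, the warning points to the `required_variables` argument of ChatPromptBuilder, for example `required_variables=["question", "documents"]`. The `render_prompt` helper itself is hypothetical.

```python
# Instruction text copied from the template above; Jinja whitespace is not reproduced.
INSTRUCTIONS = (
    "Using only the information contained in these documents in Italian, "
    "answer the question using English.\n"
    'If the answer cannot be inferred from the documents, respond "I don\'t know".\n'
)


def render_prompt(variables: dict[str, object], required: tuple[str, ...] = ()) -> str:
    """Build the prompt string; raise if any variable listed in `required` is absent."""
    missing = [name for name in required if name not in variables]
    if missing:
        raise ValueError(f"missing required prompt variables: {missing}")
    # Optional variables that were not supplied fall back to empty values.
    documents = variables.get("documents", [])
    question = variables.get("question", "")
    docs_block = "\n".join(documents)  # one retrieved (Italian) chunk per line
    return f"{INSTRUCTIONS}Documents:\n{docs_block}\nQuestion: {question}\nAnswer:"
```

`render_prompt({"question": "What is Haystack?"})` returns a prompt with an empty Documents section, and the LLM would then answer without any context. The same call with `required=("documents", "question")` raises a ValueError.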
# show the pipeline
# query_pipeline.show()
# try the pipeline
question = "What is Pointer Podcast?"
results = query_pipeline.run(
{ "text_embedder": {"text": question},
"prompt_builder": {"question": question},
}
)
for d in results['generator']['replies']:
pprint(d.text)
('Pointer Podcast is an Italian podcast available on platforms like Apple '
'Podcasts, Google Podcasts, and Spotify. The podcast features discussions, '
'often with guests, about topics such as Large Language Models (LLMs) and '
'Natural Language Processing (NLP). The hosts, Luca and Eugenio, introduce '
'episodes and interview experts like Stefano Fiorucci and Sara Zanzotera, who '
'work with frameworks like iStack at DeepSet. The podcast encourages '
'listeners to leave reviews, share episodes on social media, and subscribe to '
'stay updated on future episodes.')
✨ Nice!
# let's create a simple wrapper to call the pipeline and show the answers
def ask_rag(question: str):
results = query_pipeline.run(
{
"text_embedder": {"text": question},
"prompt_builder": {"question": question},
}
)
for d in results["generator"]["replies"]:
pprint(d.text)
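ask_rag prints the replies. That is convenient in a notebook, but it makes the answers hard to reuse. A variant that returns the reply texts could look like the sketch below. The names `ask_rag_texts` and `RunnablePipeline` are hypothetical. The only assumption about the pipeline is the one ask_rag already relies on: run() takes the per-component input dict and returns the generator's replies, each with a .text attribute.

```python
from typing import Any, Protocol


class RunnablePipeline(Protocol):
    """Anything with a run() method shaped like the query pipeline's (hypothetical protocol)."""

    def run(self, data: dict[str, Any]) -> dict[str, Any]: ...


def ask_rag_texts(pipeline: RunnablePipeline, question: str) -> list[str]:
    """Run the RAG pipeline for one question and return the reply texts instead of printing them."""
    results = pipeline.run(
        {
            # The question feeds both the embedder (for retrieval) and the prompt builder.
            "text_embedder": {"text": question},
            "prompt_builder": {"question": question},
        }
    )
    return [reply.text for reply in results["generator"]["replies"]]
```

Once the questions list below is defined, `{q: ask_rag_texts(query_pipeline, q) for q in questions}` would collect one answer list per question. Each call hits the Mistral API.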
Try our multilingual RAG application!
import random
questions="""What are some interesting directions in Large Language Models?
What is Haystack?
What is Ollama?
How did Stefano end up working at deepset?
Will open source models achieve the quality of closed ones?
What are the main features of Haystack?
Summarize in a bulleted list the main stages of training a Large Language Model
What is Zephyr?
What is quantization of Large Language Models and why is it interesting?
Could you point out the names of the hosts and guests of the podcast?""".split("\n")
q = random.choice(questions)
print(q)
ask_rag(q)
What is Zephyr?
('Zephyr is a small, open-source language model developed by Hugging Face, as '
'mentioned in the first document. It was created by distilling many '
'characteristics of much larger models (such as GPT-4 or GPT-3.5 Turbo) into '
'a more compact model using automated techniques, with a focus on aligning '
'with human preferences but relying less on direct human supervision. Hugging '
'Face released both the technical report and the "recipe" for replicating '
'Zephyr, making it accessible even with a modest budget (e.g., a few thousand '
'dollars).')
