🆕 Haystack 2.30 is here! Pass a plain string to any ChatGenerator

Web QA with Mistral


Colab by Tuana Celik - ( LI & Twitter)

Quick guide to building Question Answering on the web with a Mistral AI model and Haystack. We use mistral-small-latest via the Mistral AI API and the MistralChatGenerator from the official Mistral integration for Haystack.

  1. Use the MistralChatGenerator to query the model on its own
  2. Add the generator to a full RAG Pipeline (on the web)

Screenshot 2023-12-13 at 17.46.33.png

Install dependencies

!uv pip install haystack-ai trafilatura sentence-transformers-haystack mistral-haystack

Prompt the Model - Standalone

We are using the Mistral AI API with mistral-small-latest.

import os
from getpass import getpass

os.environ["MISTRAL_API_KEY"] = getpass("Enter Mistral API key: ")
from haystack_integrations.components.generators.mistral import MistralChatGenerator

generator = MistralChatGenerator(model="mistral-small-latest")
from haystack.dataclasses import ChatMessage

messages = [
    ChatMessage.from_system("\\nYou are a helpful, respectful and honest assistant"),
    ChatMessage.from_user("What's Natural Language Processing?")
]

result = generator.run(messages)
print(result["replies"][0].text)
**Natural Language Processing (NLP)** is a field of **artificial intelligence (AI)** that focuses on the interaction between computers and human (natural) languages. The goal of NLP is to enable computers to **understand, interpret, generate, and respond to human language** in a way that is both meaningful and useful.

### **Key Aspects of NLP:**
1. **Understanding Human Language**
   - NLP helps computers analyze and derive meaning from text or speech, just like humans do.
   - Example: Sentiment analysis (determining if a review is positive or negative).

2. **Language Generation**
   - NLP enables computers to produce human-like text or speech.
   - Example: Chatbots, automated email responses, or AI-generated articles.

3. **Translation & Communication**
   - NLP powers machine translation (e.g., Google Translate) and cross-language communication.

4. **Information Extraction**
   - NLP can pull out key details from large texts (e.g., summarizing news articles or extracting named entities like people, dates, and places).

5. **Speech Recognition**
   - Converts spoken language into text (e.g., Siri, Alexa, or voice-to-text tools).

### **Common NLP Techniques & Tools:**
- **Tokenization** (breaking text into words/sentences)
- **Part-of-Speech (POS) Tagging** (identifying nouns, verbs, etc.)
- **Named Entity Recognition (NER)** (finding people, places, organizations)
- **Sentiment Analysis** (detecting emotions in text)
- **Machine Learning & Deep Learning Models** (e.g., Transformers, BERT, GPT)
- **Rule-Based Systems** (using linguistic rules for specific tasks)

### **Real-World Applications of NLP:**
✔ **Virtual Assistants** (Siri, Alexa, Google Assistant)
✔ **Spam Detection** (filtering emails)
✔ **Language Translation** (Google Translate, DeepL)
✔ **Chatbots & Customer Support** (automated responses)
✔ **Medical Text Analysis** (extracting insights from patient records)
✔ **Social Media Monitoring** (analyzing trends and sentiments)

### **Challenges in NLP:**
- **Ambiguity** (words with multiple meanings, e.g., "bank" as a financial institution vs. a riverbank).
- **Context Understanding** (sarcasm, idioms, cultural nuances).
- **Multilingual Support** (handling different languages and dialects).
- **Bias in Language Models** (ensuring fairness in AI-generated text).

NLP is a rapidly evolving field, with advancements in **deep learning (e.g., Transformers, LLMs like ChatGPT)** making it more powerful than ever. Would you like a deeper dive into any specific aspect? 😊

Use the Model in a full RAG pipeline (on the web)

Here, we will be using the same generator component as the above, in a full RAG pipeline. You can change this pipeline to use your own data source (such as a vector database, Notion, documentation) instead of the LinkContentFetcher we are using here.

from haystack.components.fetchers.link_content import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack_integrations.components.rankers.sentence_transformers import SentenceTransformersSimilarityRanker
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

fetcher = LinkContentFetcher()
converter = HTMLToDocument()
document_splitter = DocumentSplitter(split_by="word", split_length=50)
similarity_ranker = SentenceTransformersSimilarityRanker(top_k=3)

prompt_template = """
According to these documents:

{% for doc in documents %}
  {{ doc.content }}
{% endfor %}

Answer the given question: {{question}}
Answer:
"""

prompt_template = [ChatMessage.from_user(prompt_template)]
prompt_builder = ChatPromptBuilder(template=prompt_template, required_variables="*")

pipeline = Pipeline()
pipeline.add_component("fetcher", fetcher)
pipeline.add_component("converter", converter)
pipeline.add_component("splitter", document_splitter)
pipeline.add_component("ranker", similarity_ranker)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", generator)

pipeline.connect("fetcher.streams", "converter.sources")
pipeline.connect("converter.documents", "splitter.documents")
pipeline.connect("splitter.documents", "ranker.documents")
pipeline.connect("ranker.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm")
<haystack.core.pipeline.pipeline.Pipeline object at 0x354231160>
🚅 Components
  - fetcher: LinkContentFetcher
  - converter: HTMLToDocument
  - splitter: DocumentSplitter
  - ranker: SentenceTransformersSimilarityRanker
  - prompt_builder: ChatPromptBuilder
  - llm: MistralChatGenerator
🛤️ Connections
  - fetcher.streams -> converter.sources (list[ByteStream])
  - converter.documents -> splitter.documents (list[Document])
  - splitter.documents -> ranker.documents (list[Document])
  - ranker.documents -> prompt_builder.documents (list[Document])
  - prompt_builder.prompt -> llm.messages (list[ChatMessage])
question = "What do graphs have to do with Haystack?"
result = pipeline.run({"prompt_builder": {"question": question},
                   "ranker": {"query": question},
                   "fetcher": {"urls": ["https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent"]},
                    "llm":{}})

print(result['llm']['replies'][0].text)

Loading weights:   0%|          | 0/105 [00:00<?, ?it/s]


Loading weights: 100%|██████████| 105/105 [00:00<00:00, 9051.71it/s]

Graphs are fundamental to Haystack's architecture, particularly in how pipelines are structured and executed. Here's the connection:

1. **Pipeline Structure as Graphs**:
   - Haystack 1.x used **directed acyclic graphs (DAGs)**, where pipelines were linear or branching flows without cycles (like a water slide with no loops).
   - Haystack 2.0 expands this to **general (multi)graphs**, allowing:
     - **Branching**: Multiple paths from a single component.
     - **Joining**: Components merging outputs from multiple paths.
     - **Cyclic flows**: Loops where outputs can feed back into earlier components (e.g., for retries or infinite services).

2. **Why Graphs Matter**:
   - **Flexibility**: Graphs model complex workflows (e.g., retry logic, conditional branching) more naturally than linear DAGs.
   - **Explicitness**: Haystack 2.0 aims to make component interactions clearer by exposing the graph structure directly in the code.
   - **Advanced Use Cases**: Cycles enable pipelines to "loop back" (e.g., for iterative processing or continuous services).

3. **Implications**:
   - The shift from DAGs to general graphs reflects Haystack’s evolution toward more dynamic, resilient, and expressive pipelines.

In short, graphs are the backbone of Haystack’s pipeline design, enabling both simplicity (DAGs) and complexity (cycles, multi-path flows) as needed.