Maintained by deepset

Integration: OpenAI

Use OpenAI Models with Haystack

Authors
deepset


Overview

You can use OpenAI models in your Haystack pipelines through the OpenAI Generators and Embedders. For speech-to-text, check out the Whisper Integration for LocalWhisperTranscriber and RemoteWhisperTranscriber.

Installation

pip install haystack-ai

Usage

You can use OpenAI models in various ways:

Embedding Models

You can leverage embedding models from OpenAI through two components: OpenAITextEmbedder and OpenAIDocumentEmbedder.

To create semantic embeddings for documents, use OpenAIDocumentEmbedder in your indexing pipeline. To generate embeddings for queries, use OpenAITextEmbedder. Once you've picked the component that fits your use case, initialize it with the model name and your OpenAI API key.

Below is an example indexing pipeline with InMemoryDocumentStore, OpenAIDocumentEmbedder, and DocumentWriter:

from haystack import Document, Pipeline
from haystack.utils import Secret
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

indexing_pipeline = Pipeline()
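# The token is hard-coded for brevity; Secret.from_env_var("OPENAI_API_KEY") reads it from an environment variable instead
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder(api_key=Secret.from_token("YOUR_OPENAI_API_KEY"), model="text-embedding-ada-002"))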
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})
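The indexing pipeline stores one embedding per document. At query time, OpenAITextEmbedder turns the query into an embedding that is compared against those stored vectors, and the store above is configured with embedding_similarity_function="cosine". The minimal plain-Python sketch below illustrates that comparison. It assumes hand-made three-dimensional toy vectors in place of real OpenAI embeddings, and the helpers cosine_similarity and rank_documents are hypothetical, not Haystack APIs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding: list[float],
                   doc_embeddings: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (content, score) pairs sorted from most to least similar."""
    scored = [
        (content, cosine_similarity(query_embedding, embedding))
        for content, embedding in doc_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real ones would come from OpenAIDocumentEmbedder.
doc_embeddings = {
    "My name is Wolfgang and I live in Berlin": [0.9, 0.1, 0.0],
    "I saw a black horse running": [0.0, 0.2, 0.9],
    "Germany has many big cities": [0.8, 0.3, 0.1],
}

# Hypothetical toy embedding for a query about big cities in Germany;
# a real one would come from OpenAITextEmbedder.
query_embedding = [0.7, 0.35, 0.1]

ranking = rank_documents(query_embedding, doc_embeddings)
for content, score in ranking:
    print(f"{score:.3f}  {content}")
```

Real embeddings have many more dimensions, but the ranking works the same way: the document whose vector points in the direction most similar to the query vector scores highest. This is also why queries and documents should be embedded with the same model.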

Generative Models (LLMs)

You can leverage OpenAI models through two components: OpenAIGenerator and OpenAIChatGenerator.

To use OpenAI's GPT models for text generation, initialize an OpenAIGenerator with the model name and your OpenAI API key. You can then place the OpenAIGenerator instance after the PromptBuilder in a question answering pipeline.

Below is an example of a generative question answering (RAG) pipeline using PromptBuilder and OpenAIGenerator:

from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document

docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=Secret.from_token("YOUR_OPENAI_API_KEY")))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

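res = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})

# The generator's output is a dict with "replies" (a list of strings) and "meta"
print(res["llm"]["replies"][0])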
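The prompt_builder renders the Jinja template with the retrieved documents and the query, and the resulting string is the prompt OpenAIGenerator receives. As a rough plain-Python approximation (not how PromptBuilder is implemented; Jinja's whitespace handling may differ slightly), the hypothetical render_prompt helper below shows the shape of that prompt, assuming the retriever returned both documents from the example store.

```python
def render_prompt(query: str, documents: list[str]) -> str:
    """Approximate the RAG template: one context line per document, then the question."""
    # Mirrors the {% for document in documents %} loop in the template
    context = "\n".join(f"    {content}" for content in documents)
    return (
        "Given the following information, answer the question.\n\n"
        "Context:\n"
        f"{context}\n\n"
        f"Question: {query}\n"
    )


# Assumed retriever output: both documents from the example store, best match first.
retrieved = ["Paris is the capital of France", "Rome is the capital of Italy"]
prompt = render_prompt("What is the capital of France?", retrieved)
print(prompt)
```

Grounding the model this way means the answer ("Paris") comes from the retrieved context rather than only from the model's own knowledge.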

Transcriber Models

To use Whisper models from OpenAI for audio transcription, see the Whisper Integration. It provides LocalWhisperTranscriber (runs Whisper on your machine) and RemoteWhisperTranscriber (uses the OpenAI Whisper API).