Integration: OpenAI
Use OpenAI Models with Haystack
Overview
You can use OpenAI models in your Haystack pipelines through the Generator and Embedder components. For audio transcription, see the Whisper Integration, which provides LocalWhisperTranscriber and RemoteWhisperTranscriber.
Installation
pip install haystack-ai
Usage
You can use OpenAI models in various ways:
Embedding Models
You can leverage embedding models from OpenAI through two components: OpenAITextEmbedder and OpenAIDocumentEmbedder.
To create semantic embeddings for documents, use OpenAIDocumentEmbedder in your indexing pipeline. For generating embeddings for queries, use OpenAITextEmbedder. Once you’ve selected the suitable component for your specific use case, initialize the component with the model name and OpenAI API key.
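To see why both documents and queries need embeddings, it can help to look at what a retriever does with them: it compares the query vector against each stored document vector and ranks the documents by similarity. The following is only a conceptual sketch in plain Python. The three-dimensional vectors are made up for illustration (real OpenAI embeddings have far more dimensions), and `cosine_similarity` is a hypothetical helper, not Haystack's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for document embeddings (assumption, not real model output)
document_embeddings = {
    "My name is Wolfgang and I live in Berlin": [0.9, 0.1, 0.2],
    "I saw a black horse running": [0.1, 0.9, 0.1],
    "Germany has many big cities": [0.7, 0.0, 0.6],
}

# Made-up vector standing in for the embedding of a query such as "Who lives in Berlin?"
query_embedding = [0.8, 0.05, 0.3]

# Rank documents from most to least similar to the query
ranked = sorted(
    document_embeddings,
    key=lambda text: cosine_similarity(query_embedding, document_embeddings[text]),
    reverse=True,
)

for text in ranked:
    print(round(cosine_similarity(query_embedding, document_embeddings[text]), 3), text)
```

Because the comparison only makes sense when both vectors come from the same embedding space, the query and the documents should be embedded with the same model.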
Below is an example indexing pipeline with InMemoryDocumentStore, OpenAIDocumentEmbedder and DocumentWriter:
from haystack import Document, Pipeline
from haystack.utils import Secret
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder(api_key=Secret.from_token("YOUR_OPENAI_API_KEY"), model="text-embedding-ada-002"))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"embedder": {"documents": documents}})
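The query side, which uses OpenAITextEmbedder, can be sketched in a similar way. The example below is a minimal sketch under a few assumptions: the API key is read from the `OPENAI_API_KEY` environment variable, `InMemoryEmbeddingRetriever` is used as the retriever, and `build_query_pipeline` and the sample query are hypothetical names chosen for illustration. The Haystack imports sit inside the function and the guarded block so that the file loads even where `haystack-ai` is not installed.

```python
import os


def build_query_pipeline(document_store):
    """Build a pipeline that embeds a query and retrieves similar documents."""
    from haystack import Pipeline
    from haystack.components.embedders import OpenAITextEmbedder
    from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
    from haystack.utils import Secret

    query_pipeline = Pipeline()
    # Use the same embedding model that was used for the documents
    query_pipeline.add_component(
        "text_embedder",
        OpenAITextEmbedder(
            api_key=Secret.from_env_var("OPENAI_API_KEY"),
            model="text-embedding-ada-002",
        ),
    )
    query_pipeline.add_component(
        "retriever", InMemoryEmbeddingRetriever(document_store=document_store)
    )
    # Feed the query embedding into the retriever
    query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
    return query_pipeline


# Runs only when an API key is configured, because it calls the OpenAI API
if os.environ.get("OPENAI_API_KEY"):
    from haystack import Document
    from haystack.components.embedders import OpenAIDocumentEmbedder
    from haystack.document_stores.in_memory import InMemoryDocumentStore
    from haystack.utils import Secret

    document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
    docs = [
        Document(content="My name is Wolfgang and I live in Berlin"),
        Document(content="Germany has many big cities"),
    ]
    # Embed and store the documents so there is something to retrieve
    doc_embedder = OpenAIDocumentEmbedder(
        api_key=Secret.from_env_var("OPENAI_API_KEY"),
        model="text-embedding-ada-002",
    )
    document_store.write_documents(doc_embedder.run(documents=docs)["documents"])

    result = build_query_pipeline(document_store).run(
        {"text_embedder": {"text": "Who lives in Berlin?"}}
    )
    for doc in result["retriever"]["documents"]:
        print(doc.score, doc.content)
```

Reading the key from an environment variable keeps it out of source code. `Secret.from_token` works as well for quick experiments.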
Generative Models (LLMs)
You can leverage OpenAI's generative models through two components: OpenAIGenerator and OpenAIChatGenerator.
To use OpenAI's GPT models for text generation, initialize an OpenAIGenerator with the model name and OpenAI API key. You can then use the OpenAIGenerator instance in a question answering pipeline after the PromptBuilder.
Below is an example of a generative question answering pipeline using RAG with PromptBuilder and OpenAIGenerator:
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document
docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="Rome is the capital of Italy"), Document(content="Paris is the capital of France")])
query = "What is the capital of France?"
template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{ query }}
"""
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=Secret.from_token("YOUR_OPENAI_API_KEY")))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")
res = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})
print(res)
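OpenAIChatGenerator, the second component named above, takes a list of chat messages rather than a single prompt string. The following is a minimal sketch, assuming the API key is available in the `OPENAI_API_KEY` environment variable and leaving the model at the library default. `ask_chat_model` is a hypothetical helper name, and the system instruction is only an example.

```python
import os


def ask_chat_model(question):
    """Send one user message to an OpenAI chat model and return the first reply."""
    # Imported here so the file loads even where haystack-ai is not installed
    from haystack.components.generators.chat import OpenAIChatGenerator
    from haystack.dataclasses import ChatMessage
    from haystack.utils import Secret

    generator = OpenAIChatGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"))
    messages = [
        ChatMessage.from_system("Answer in one short sentence."),
        ChatMessage.from_user(question),
    ]
    result = generator.run(messages=messages)
    # "replies" holds a list of ChatMessage objects; return the first one
    return result["replies"][0]


# Runs only when an API key is configured, because it calls the OpenAI API
if os.environ.get("OPENAI_API_KEY"):
    print(ask_chat_model("What is the capital of France?"))
```

The chat variant is a natural fit when the conversation has several turns or needs a system instruction, while OpenAIGenerator is convenient after a PromptBuilder that produces one prompt string.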
Transcriber Models
To use Whisper models from OpenAI for audio transcription, see the Whisper Integration. It provides LocalWhisperTranscriber (runs Whisper on your machine) and RemoteWhisperTranscriber (uses the OpenAI Whisper API).
