Knowledge Bases now delivers fully managed RAG experience in Amazon Bedrock

Back in September, we introduced Knowledge Bases for Amazon Bedrock in preview. Starting today, Knowledge Bases for Amazon Bedrock is generally available.

With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for Retrieval Augmented Generation (RAG). Access to additional data helps the model generate more relevant, context-specific, and accurate responses without continuously retraining the FM. All information retrieved from knowledge bases comes with source attribution to improve transparency and minimize hallucinations. If you’re curious how this works, check out my previous post that includes a primer on RAG.

With today’s launch, Knowledge Bases gives you a fully managed RAG experience and the easiest way to get started with RAG in Amazon Bedrock. Knowledge Bases now manages the initial vector store setup, handles the embedding and querying, and provides source attribution and short-term memory needed for production RAG applications. If needed, you can also customize the RAG workflows to meet specific use case requirements or integrate RAG with other generative artificial intelligence (AI) tools and applications.

Fully managed RAG experience
Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow for you. You specify the location of your data, select an embedding model to convert the data into vector embeddings, and have Amazon Bedrock create a vector store in your account to store the vector data. When you select this option (available only in the console), Amazon Bedrock creates a vector index in Amazon OpenSearch Serverless in your account, removing the need to manage anything yourself.

Vector embeddings include the numeric representations of text data within your documents. Each embedding aims to capture the semantic or contextual meaning of the data. Amazon Bedrock takes care of creating, storing, managing, and updating your embeddings in the vector store, and it ensures your data is always in sync with your vector store.

Amazon Bedrock now also supports two new APIs for RAG that handle the embedding and querying and provide the source attribution and short-term memory needed for production RAG applications.

With the new RetrieveAndGenerate API, you can directly retrieve relevant information from your knowledge bases and have Amazon Bedrock generate a response from the results by specifying a FM in your API call. Let me show you how this works.

Use the RetrieveAndGenerate API
To give it a try, navigate to the Amazon Bedrock console, create and select a knowledge base, then select Test knowledge base. For this demo, I created a knowledge base that has access to a PDF of Generative AI on AWS. I choose Select Model to specify a FM.

Then, I ask, “What is Amazon Bedrock?”

Behind the scenes, Amazon Bedrock converts the queries into embeddings, queries the knowledge base, and then augments the FM prompt with the search results as context information and returns the FM-generated response to my question. For multi-turn conversations, Knowledge Bases manages the short-term memory of the conversation to provide more contextual results.

Here’s a quick demo of how to use the APIs with the AWS SDK for Python (Boto3).

def retrieveAndGenerate(input, kbId):
    return bedrock_agent_runtime.retrieve_and_generate(
        input={
            'text': input
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1'
                }
            }
        )

response = retrieveAndGenerate("What is Amazon Bedrock?", "AES9P3MT9T")["output"]["text"]

The output of the RetrieveAndGenerate API includes the generated response, the source attribution, and the retrieved text chunks. In my demo, the API response looks like this (with some of the output redacted for brevity):


{ ... 
    'output': {'text': 'Amazon Bedrock is a managed service from AWS that ...'}, 
    'citations': 
        [{'generatedResponsePart': 
             {'textResponsePart': 
                 {'text': 'Amazon Bedrock is ...', 'span': {'start': 0, 'end': 241}}
             }, 
	      'retrievedReferences': 
			[{'content':
                 {'text': 'All AWS-managed service API activity...'}, 
				 'location': {'type': 'S3', 's3Location': {'uri': 's3://data-generative-ai-on-aws/gaia.pdf'}}}, 
		     {'content': 
			      {'text': 'Changing a portion of the image using ...'}, 
				  'location': {'type': 'S3', 's3Location': {'uri': 's3://data-generative-ai-on-aws/gaia.pdf'}}}, ...]
        ...}]
}

The generated response looks like this:

Amazon Bedrock is a managed service that offers a serverless experience for generative AI through a simple API. It provides access to foundation models from Amazon and third parties for tasks like text generation, image generation, and building conversational agents. Data processed through Amazon Bedrock remains private and encrypted.

Customize RAG workflows
If you want to process the retrieved text chunks further, see the relevance scores of the retrievals, or develop your own orchestration for text generation, you can use the new Retrieve API. This API converts user queries into embeddings, searches the knowledge base, and returns the relevant results, giving you more control to build custom workflows on top of the semantic search results.

Use the Retrieve API
In the Amazon Bedrock console, I toggle the switch to disable Generate responses.

Then, I ask again, “What is Amazon Bedrock?” This time, the output shows me the retrieval results with links to the source documents where the text chunks came from.

Here’s how to use the Retrieve API with boto3.

import boto3

bedrock_agent_runtime = boto3.client(
    service_name = "bedrock-agent-runtime"
)

def retrieve(query, kbId, numberOfResults=5):
    return bedrock_agent_runtime.retrieve(
        retrievalQuery= {
            'text': query
        },
        knowledgeBaseId=kbId,
        retrievalConfiguration= {
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults
            }
        }
    )

response = retrieve("What is Amazon Bedrock?", "AES9P3MT9T")["retrievalResults"]

The output of the Retrieve API includes the retrieved text chunks, the location type and URI of the source data, and the scores of the retrievals. The score helps to determine chunks that match more closely with the query.

In my demo, the API response looks like this (with some of the output redacted for brevity):

[{'content': {'text': 'Changing a portion of the image using ...'},
  'location': {'type': 'S3',
   's3Location': {'uri': 's3://data-generative-ai-on-aws/gaia.pdf'}},
  'score': 0.7329834},
 {'content': {'text': 'back to the user in natural language. For ...'},
  'location': {'type': 'S3',
   's3Location': {'uri': 's3://data-generative-ai-on-aws/gaia.pdf'}},
  'score': 0.7331088},
...]

To further customize your RAG workflows, you can define a custom chunking strategy and select a custom vector store.

Custom chunking strategy – To enable effective retrieval from your data, a common practice is to first split the documents into manageable chunks. This enhances the model’s capacity to comprehend and process information more effectively, leading to improved relevant retrievals and generation of coherent responses. Knowledge Bases for Amazon Bedrock manages the chunking of your documents.

When you configure the data source for your knowledge base, you can now define a chunking strategy. Default chunking splits data into chunks of up to 200 tokens and is optimized for question-answer tasks. Use default chunking when you are not sure of the optimal chunk size for your data.

You also have the option to specify a custom chunk size and overlap with fixed-size chunking. Use fixed-size chunking if you know the optimal chunk size and overlap for your data (based on file attributes, accuracy testing, and so on). An overlap between chunks in the recommended range of 0–20 percent can help improve accuracy. Higher overlap can lead to decreased relevancy scores.

If you select to create one embedding per document, Knowledge Bases keeps each file as a single chunk. Use this option if you don’t want Amazon Bedrock to chunk your data, for example, if you want to chunk your data offline using an algorithm that is specific to your use case. Common use cases include code documentation.

Custom vector store – You can also select a custom vector store. The available vector database options include vector engine for Amazon OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud. To use a custom vector store, you must create a new, empty vector database from the list of supported options and provide the vector database index name as well as index field and metadata field mappings. This vector database will need to be for exclusive use with Amazon Bedrock.

Integrate RAG with other generative AI tools and applications
If you want to build an AI assistant that can perform multistep tasks and access company data sources to generate more relevant and context-aware responses, you can integrate Knowledge Bases with Agents for Amazon Bedrock. You can also use the Knowledge Bases retrieval plugin for LangChain to integrate RAG workflows into your generative AI applications.

Availability
Knowledge bases for Amazon Bedrock is available today in AWS Regions US East (N. Virginia) and US West (Oregon).

Learn more

— Antje

Related Posts