Part #2: Create Your Model Endpoint With Amazon SageMaker, AWS Lambda, and AWS API Gateway
Dominic Frei • 7 min read • Published Sep 18, 2024 • Updated Sep 18, 2024
Welcome to Part 2 of the Amazon SageMaker + Atlas Vector Search series. In Part 1, I showed you how to set up an architecture that uses both tools to create embeddings for your data and then use those embeddings to search through your data semantically. This article is part of a three-part series.
In this part of the series, we move from theory to practice: Part 2 shows you how to build the REST service described in that architecture.
The REST endpoint will serve as the encoder that creates embeddings (vectors) that will then be used in the next part of this series to search through your data semantically. The deployment of the model will be handled by Amazon SageMaker, AWS's all-in-one ML service. We will expose this endpoint using AWS Lambda and AWS API Gateway later on to make it available to the server app.
Amazon SageMaker is a cloud-based machine-learning platform that enables developers to build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.
Amazon SageMaker JumpStart helps you quickly and easily get started with machine learning. The solutions are fully customizable and support one-click deployment and fine-tuning of more than 150 popular open-source models for tasks such as natural language processing, object detection, and image classification.
It includes a number of popular solutions:
- Extract and analyze data: Automatically extract, process, and analyze documents for more accurate investigation and faster decision-making.
- Fraud detection: Automate detection of suspicious transactions faster and alert your customers to reduce potential financial loss.
- Churn prediction: Predict the likelihood of customer churn and improve retention by homing in on likely abandoners and taking remedial actions such as promotional offers.
- Personalized recommendations: Deliver customized, unique experiences to customers to improve customer satisfaction and grow your business rapidly.
Before we start, make sure you choose a region that supports RStudio (more on that later) and JumpStart. You can verify both on the Amazon SageMaker pricing page by checking whether your desired region appears in the On-Demand Pricing list.

On the main page of Amazon SageMaker, you'll find the option to Set up for a single user. This will set up a domain and a quick-start user. A QuickSetupDomain is basically just a default configuration so that you can get started deploying models and trying out SageMaker. You can customize it later to your needs.
The initial setup only has to be done once, but it might take several minutes. When finished, Amazon SageMaker will notify you that the new domain is ready.
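If you prefer to double-check from code rather than waiting on the console, here is a minimal sketch (not part of the original setup flow) that lists your domains and their status with boto3. It assumes your AWS credentials and region are already configured.

```python
import boto3

# List all SageMaker domains in the current region and print their status.
sagemaker_client = boto3.client("sagemaker")
for domain in sagemaker_client.list_domains()["Domains"]:
    print(domain["DomainName"], domain["Status"])
```

Once your new domain reports the status InService, you are ready to continue.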
Amazon SageMaker Domain supports Amazon SageMaker machine learning (ML) environments and contains the following:
- The domain itself, which holds the Amazon EC2 instances that models will be deployed onto. It also contains a list of authorized users and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.
- The UserProfile, which represents a single user within a domain that you will be working with.
- A shared space, which consists of a shared JupyterServer application and shared directory. All users within the domain have access to the same shared space.
- An App, which represents an application that supports the reading and execution experience of the user's notebooks, terminals, and consoles.
After the creation of the domain and the user, you can launch SageMaker Studio, which will be your platform for interacting with SageMaker, your models, and your deployments for this user.
Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning that lets you build, train, debug, deploy, and monitor your machine learning models.
Here, we’ll go ahead and start with a new JumpStart solution.
All you need to do to set up your JumpStart solution is choose a model. For this tutorial, we will be using the embedding model All MiniLM L6 v2 by Hugging Face. When you've chosen the model, click Deploy and SageMaker will get everything ready for you. You can adjust the endpoint to your needs, but for this tutorial, you can totally go with the defaults.
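If you would rather script the deployment than click through Studio, the SageMaker Python SDK offers a JumpStartModel class. The sketch below is an optional alternative to the steps above; the model_id shown is an assumption on my part, so confirm the exact ID in the JumpStart model catalog before using it.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy the JumpStart embedding model to a real-time endpoint.
# NOTE: the model_id below is assumed — verify it in the JumpStart catalog.
model = JumpStartModel(model_id="huggingface-textembedding-all-MiniLM-L6-v2")
predictor = model.deploy()

# The generated endpoint name is what you will need in the next steps.
print(predictor.endpoint_name)
```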
As soon as the model shows its status as In service, everything is ready to be used. Note that the endpoint name here is jumpstart-dft-hf-textembedding-all-20240117-062453. Note down your endpoint name; you will need it in the next step.
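Before wiring up Lambda, you can optionally sanity-check the endpoint directly with boto3. This is just a quick sketch that uses the same invoke_endpoint call the Lambda function below will use; replace <YOUR_ENDPOINT_NAME> with your endpoint name.

```python
import json
import boto3

sagemaker_runtime_client = boto3.client("sagemaker-runtime")

# Send a single piece of text to the endpoint and read back its embedding.
response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName="<YOUR_ENDPOINT_NAME>",
    Body=json.dumps({"text_inputs": "just a quick test"}),
    ContentType="application/json",
)
result = json.loads(response["Body"].read().decode())
print(len(result["embedding"][0]))  # should print 384 for this model
```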
Now that the model is set up and the endpoint is ready to be used, we can expose it for our server application. We won’t be exposing the SageMaker endpoint directly, though. Instead, we will be using AWS API Gateway and AWS Lambda.
Let’s start by creating the Lambda function that uses the endpoint to create embeddings.
AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is designed to enable developers to run code without provisioning or managing servers. It executes code in response to events and automatically manages the computing resources required by that code.
In the main AWS Console, go to AWS Lambda and click Create function. Choose Author from scratch, give your function a name (sageMakerLambda, for example), and choose the runtime. For this example, we’ll be running on Python. When everything is set correctly, create the function.
The following code snippet assumes that the Lambda function and the Amazon SageMaker endpoint are deployed in the same AWS account. All you have to do is replace <YOUR_ENDPOINT_NAME> with your actual endpoint name from the previous section. Note that the lambda_handler returns a status code and a body; it’s ready to be exposed as an endpoint using AWS API Gateway.

```python
import json
import boto3

sagemaker_runtime_client = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    try:
        # Extract the query parameter 'query' from the event
        query_param = event.get('queryStringParameters', {}).get('query', '')

        if query_param:
            embedding = get_embedding(query_param)
            return {
                'statusCode': 200,
                'body': json.dumps({'embedding': embedding})
            }
        else:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'No query parameter provided'})
            }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }

def get_embedding(synopsis):
    input_data = {"text_inputs": synopsis}
    response = sagemaker_runtime_client.invoke_endpoint(
        EndpointName="<YOUR_ENDPOINT_NAME>",
        Body=json.dumps(input_data),
        ContentType="application/json"
    )
    result = json.loads(response["Body"].read().decode())
    embedding = result["embedding"][0]
    return embedding
```
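Once you have deployed the function and attached the SageMaker permissions described next, you can test it without API Gateway by invoking it directly and simulating the proxy event it expects. A small sketch, assuming the function name sageMakerLambda used above:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

# Simulate the (simplified) API Gateway proxy event the handler reads.
event = {"queryStringParameters": {"query": "foo"}}

response = lambda_client.invoke(
    FunctionName="sageMakerLambda",
    Payload=json.dumps(event),
)
print(json.loads(response["Payload"].read()))
```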
Don’t forget to click Deploy!

One last thing we need to do before we can use this Lambda function is to make sure it actually has permission to execute the SageMaker endpoint. Head to the Configuration part of your Lambda function and then to Permissions. You can just click on the Role name link to get to the associated role in AWS Identity and Access Management (IAM).

In IAM, choose Add permissions. You can choose Attach policies to attach pre-created policies from the IAM policy list. For now, let’s use AmazonSageMakerFullAccess, but keep in mind to select only those permissions that you need for your specific application.
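If you want to follow that advice right away, the sketch below shows one way to scope the role down to just invoking your endpoint, using an inline IAM policy. The role name, region, account ID, and endpoint name are placeholders, so treat this as a starting point rather than a required part of the tutorial.

```python
import json
import boto3

iam_client = boto3.client("iam")

# Inline policy that only allows invoking one specific SageMaker endpoint.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "arn:aws:sagemaker:<REGION>:<ACCOUNT_ID>:endpoint/<YOUR_ENDPOINT_NAME>",
        }
    ],
}

iam_client.put_role_policy(
    RoleName="<YOUR_LAMBDA_ROLE_NAME>",
    PolicyName="InvokeSageMakerEndpointOnly",
    PolicyDocument=json.dumps(policy_document),
)
```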
Now, let’s head to AWS API Gateway, click Create API, and then Build on the REST API. Choose to create a new API and name it. In this example, we’re calling it sageMakerApi. That’s all you have to do for now. The API endpoint type can stay regional, assuming you created the Lambda function in the same region. Hit Create API.

First, we need to create a new resource. The resource path will be /. Pick a name like sageMakerResource.

Next, you'll get back to your API overview. This time, click Create method. We need a GET method that integrates with a Lambda function. Check Lambda proxy integration and choose the Lambda function that you created in the previous section. Then, create the method.

Finally, don’t forget to deploy the API. Choose a stage. This will influence the URL that we need to use (API Gateway will show you the full URL in a moment). Since we’re still testing, TEST might be a good choice. This is only a test for a tutorial, but before deploying to production, please also add security layers like API keys. When everything is ready, the Resources tab should look something like this.

When sending requests to API Gateway, we will receive the query as a URL query string parameter. The next step is to configure API Gateway to expect that parameter and tell it what to do with it.
Go to your Resources, click on GET again, and head to the Method request tab. Click Edit. In the URL query string parameters section, add a new query string by giving it a name. We chose query here. Set it to Required but not cached, and save it.
The new endpoint is created. At this point, we can grab the URL and test it via cURL to see if that part worked fine. You can find the full URL (including stage and endpoint) in the Stages tab by opening the stage and endpoint and clicking on GET. For this example, it’s https://4ug2td0e44.execute-api.ap-northeast-2.amazonaws.com/TEST/sageMakerResource. Your URL should look similar.

Using the Amazon Cloud Shell or any other terminal, try to execute a cURL request:
```shell
curl -X GET 'https://4ug2td0e44.execute-api.ap-northeast-2.amazonaws.com/TEST/sageMakerResource?query=foo'
```
If everything was set up correctly, you should get a result that looks like this (the array contains 384 entries in total):
1 {"embedding": [0.01623343490064144, -0.007662375457584858, 0.01860642433166504, 0.031969036906957626,................... -0.031003709882497787, 0.008777940645813942]}
Your embeddings REST service is ready. Congratulations! Now you can convert your data into a vector with 384 dimensions!
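As a quick illustration of how a server application might consume the new service from Python, here is a sketch using the requests package. The URL is a placeholder; substitute your own API ID, region, and stage from the Stages tab.

```python
import requests

# Replace with the invoke URL shown in your API Gateway Stages tab.
API_URL = "https://<YOUR_API_ID>.execute-api.<REGION>.amazonaws.com/TEST/sageMakerResource"

response = requests.get(API_URL, params={"query": "A movie about space travel"})
response.raise_for_status()

embedding = response.json()["embedding"]
print(len(embedding))  # 384-dimensional vector
```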
In the next and final part of the tutorial, we will use this endpoint to prepare vectors and execute a vector search using MongoDB Atlas.
✅ Already have an AWS account? Atlas supports paying for usage via the AWS Marketplace (AWS MP) without any upfront commitment — simply sign up for MongoDB Atlas via AWS Marketplace.