Part #2: Create Your Model Endpoint With Amazon SageMaker, AWS Lambda, and AWS API Gateway
Dominic Frei • 7 min read • Published Sep 18, 2024 • Updated Sep 18, 2024
Welcome to Part 2 of the Amazon SageMaker + Atlas Vector Search series. In Part 1, I showed you how to set up an architecture that uses both tools to create embeddings for your data and then use those embeddings to search through your data semantically. This article is part of a three-part series.
In this part of the series, we move from theory to practice: Part 2 shows you how to build the REST service described in that architecture.
The REST endpoint will serve as the encoder that creates embeddings (vectors) that will then be used in the next part of this series to search through your data semantically. The deployment of the model will be handled by Amazon SageMaker, AWS's all-in-one ML service. We will expose this endpoint using AWS Lambda and AWS API Gateway later on to make it available to the server app.
Amazon SageMaker is a cloud-based machine-learning platform that enables developers to build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.
Amazon SageMaker JumpStart helps you quickly and easily get started with machine learning. The solutions are fully customizable and support one-click deployment and fine-tuning of more than 150 popular open-source models for tasks such as natural language processing, object detection, and image classification.
It includes a number of popular solutions:
- Extract and analyze data: Automatically extract, process, and analyze documents for more accurate investigation and faster decision-making.
- Fraud detection: Automate detection of suspicious transactions faster and alert your customers to reduce potential financial loss.
- Churn prediction: Predict the likelihood of customer churn and improve retention by homing in on likely abandoners and taking remedial actions such as promotional offers.
- Personalized recommendations: Deliver customized, unique experiences to customers to improve customer satisfaction and grow your business rapidly.
Before we start, make sure you choose a region that supports RStudio (more on that later) and JumpStart. You can verify both on the Amazon SageMaker pricing page by checking whether your desired region appears in the On-Demand Pricing list.

On the main page of Amazon SageMaker, you'll find the option to Set up for a single user. This will set up a domain and a quick-start user. A QuickSetupDomain is basically just a default configuration so that you can get started deploying models and trying out SageMaker. You can customize it later to your needs.
The initial setup only has to be done once, but it might take several minutes. When finished, Amazon SageMaker will notify you that the new domain is ready.
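If you prefer to double-check from code rather than waiting on the console, here is a minimal sketch (not part of the original setup flow) that lists your domains and their status with boto3. It assumes your AWS credentials and region are already configured.

```python
import boto3

# List all SageMaker domains in the current region and print their status.
sagemaker_client = boto3.client("sagemaker")
for domain in sagemaker_client.list_domains()["Domains"]:
    print(domain["DomainName"], domain["Status"])
```

Once your new domain reports the status InService, you are ready to continue.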
Amazon SageMaker Domain supports Amazon SageMaker machine learning (ML) environments and contains the following:
- The domain itself, which holds the Amazon EC2 instances that models will be deployed onto. It also contains a list of authorized users and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.
- The UserProfile, which represents a single user within a domain that you will be working with.
- A shared space, which consists of a shared JupyterServer application and shared directory. All users within the domain have access to the same shared space.
- An App, which represents an application that supports the reading and execution experience of the user's notebooks, terminals, and consoles.
After the creation of the domain and the user, you can launch SageMaker Studio, which will be your platform for interacting with SageMaker, your models, and your deployments for this user.
Amazon SageMaker Studio is a web-based, integrated development environment (IDE) for machine learning that lets you build, train, debug, deploy, and monitor your machine learning models.
Here, we’ll go ahead and start with a new JumpStart solution.
All you need to do to set up your JumpStart solution is choose a model. For this tutorial, we will be using the embedding model All MiniLM L6 v2 by Hugging Face. When you've chosen the model, click Deploy and SageMaker will get everything ready for you. You can adjust the endpoint to your needs, but for this tutorial, you can totally go with the defaults.
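If you would rather script the deployment than click through Studio, the SageMaker Python SDK offers a JumpStartModel class. The sketch below is an optional alternative to the steps above; the model_id shown is an assumption on my part, so confirm the exact ID in the JumpStart model catalog before using it.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy the JumpStart embedding model to a real-time endpoint.
# NOTE: the model_id below is assumed — verify it in the JumpStart catalog.
model = JumpStartModel(model_id="huggingface-textembedding-all-MiniLM-L6-v2")
predictor = model.deploy()

# The generated endpoint name is what you will need in the next steps.
print(predictor.endpoint_name)
```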
As soon as the model shows its status as In service, everything is ready to be used. Note that the endpoint name here is jumpstart-dft-hf-textembedding-all-20240117-062453. Note down your endpoint name; you will need it in the next step.
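Before wiring up Lambda, you can optionally sanity-check the endpoint directly with boto3. This is just a quick sketch that uses the same invoke_endpoint call the Lambda function below will use; replace <YOUR_ENDPOINT_NAME> with your endpoint name.

```python
import json
import boto3

sagemaker_runtime_client = boto3.client("sagemaker-runtime")

# Send a single piece of text to the endpoint and read back its embedding.
response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName="<YOUR_ENDPOINT_NAME>",
    Body=json.dumps({"text_inputs": "just a quick test"}),
    ContentType="application/json",
)
result = json.loads(response["Body"].read().decode())
print(len(result["embedding"][0]))  # should print 384 for this model
```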
Now that the model is set up and the endpoint is ready to be used, we can expose it for our server application. We won’t be exposing the SageMaker endpoint directly, though. Instead, we will be using AWS API Gateway and AWS Lambda.
Let’s start by creating the Lambda function that uses the endpoint to create embeddings.
AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is designed to enable developers to run code without provisioning or managing servers. It executes code in response to events and automatically manages the computing resources required by that code.
In the main AWS Console, go to AWS Lambda and click Create function. Choose Author from scratch, give your function a name (sageMakerLambda, for example), and choose the runtime. For this example, we’ll be running on Python. When everything is set correctly, create the function.
The following code snippet assumes that the Lambda function and the Amazon SageMaker endpoint are deployed in the same AWS account. All you have to do is replace <YOUR_ENDPOINT_NAME> with your actual endpoint name from the previous section. Note that the lambda_handler returns a status code and a body; it’s ready to be exposed as an endpoint using AWS API Gateway.

```python
import json
import boto3

sagemaker_runtime_client = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    try:
        # Extract the query parameter 'query' from the event
        query_param = event.get('queryStringParameters', {}).get('query', '')

        if query_param:
            embedding = get_embedding(query_param)
            return {
                'statusCode': 200,
                'body': json.dumps({'embedding': embedding})
            }
        else:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'No query parameter provided'})
            }

    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }

def get_embedding(synopsis):
    input_data = {"text_inputs": synopsis}
    response = sagemaker_runtime_client.invoke_endpoint(
        EndpointName="<YOUR_ENDPOINT_NAME>",
        Body=json.dumps(input_data),
        ContentType="application/json"
    )
    result = json.loads(response["Body"].read().decode())
    embedding = result["embedding"][0]
    return embedding
```
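Once you have deployed the function and attached the SageMaker permissions described next, you can test it without API Gateway by invoking it directly and simulating the proxy event it expects. A small sketch, assuming the function name sageMakerLambda used above:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

# Simulate the (simplified) API Gateway proxy event the handler reads.
event = {"queryStringParameters": {"query": "foo"}}

response = lambda_client.invoke(
    FunctionName="sageMakerLambda",
    Payload=json.dumps(event),
)
print(json.loads(response["Payload"].read()))
```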
Don’t forget to click Deploy!

One last thing we need to do before we can use this Lambda function is to make sure it actually has permission to execute the SageMaker endpoint. Head to the Configuration part of your Lambda function and then to Permissions. You can just click on the Role name link to get to the associated role in AWS Identity and Access Management (IAM).

In IAM, choose Add permissions. You can choose Attach policies to attach pre-created policies from the IAM policy list. For now, let’s use AmazonSageMakerFullAccess, but keep in mind to select only those permissions that you need for your specific application.
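If you want to follow that advice right away, the sketch below shows one way to scope the role down to just invoking your endpoint, using an inline IAM policy. The role name, region, account ID, and endpoint name are placeholders, so treat this as a starting point rather than a required part of the tutorial.

```python
import json
import boto3

iam_client = boto3.client("iam")

# Inline policy that only allows invoking one specific SageMaker endpoint.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "arn:aws:sagemaker:<REGION>:<ACCOUNT_ID>:endpoint/<YOUR_ENDPOINT_NAME>",
        }
    ],
}

iam_client.put_role_policy(
    RoleName="<YOUR_LAMBDA_ROLE_NAME>",
    PolicyName="InvokeSageMakerEndpointOnly",
    PolicyDocument=json.dumps(policy_document),
)
```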
Now, let’s head to AWS API Gateway, click Create API, and then Build on the REST API. Choose to create a new API and name it. In this example, we’re calling it sageMakerApi. That’s all you have to do for now. The API endpoint type can stay regional, assuming you created the Lambda function in the same region. Hit Create API.

First, we need to create a new resource. The resource path will be /. Pick a name like sageMakerResource.

Next, you'll get back to your API overview. This time, click Create method. We need a GET method that integrates with a Lambda function. Check Lambda proxy integration and choose the Lambda function that you created in the previous section. Then, create the method.

Finally, don’t forget to deploy the API. Choose a stage. This will influence the URL that we need to use (API Gateway will show you the full URL in a moment). Since we’re still testing, TEST might be a good choice. This is only a test for a tutorial, but before deploying to production, please also add security layers like API keys. When everything is ready, the Resources tab should look something like this.

When sending requests to API Gateway, we will receive the query as a URL query string parameter. The next step is to configure API Gateway to expect that parameter and tell it what to do with it.
Go to your Resources, click on GET again, and head to the Method request tab. Click Edit. In the URL query string parameters section, add a new query string by giving it a name. We chose query here. Set it to Required but not cached, and save it.
The new endpoint is created. At this point, we can grab the URL and test it via cURL to see if that part worked fine. You can find the full URL (including stage and endpoint) in the Stages tab by opening the stage and endpoint and clicking on GET. For this example, it’s https://4ug2td0e44.execute-api.ap-northeast-2.amazonaws.com/TEST/sageMakerResource. Your URL should look similar.

Using the Amazon Cloud Shell or any other terminal, try to execute a cURL request:
```shell
curl -X GET 'https://4ug2td0e44.execute-api.ap-northeast-2.amazonaws.com/TEST/sageMakerResource?query=foo'
```
If everything was set up correctly, you should get a result that looks like this (the array contains 384 entries in total):
1 {"embedding": [0.01623343490064144, -0.007662375457584858, 0.01860642433166504, 0.031969036906957626,................... -0.031003709882497787, 0.008777940645813942]}
Your embeddings REST service is ready. Congratulations! Now you can convert your data into a vector with 384 dimensions!
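As a quick illustration of how a server application might consume the new service from Python, here is a sketch using the requests package. The URL is a placeholder; substitute your own API ID, region, and stage from the Stages tab.

```python
import requests

# Replace with the invoke URL shown in your API Gateway Stages tab.
API_URL = "https://<YOUR_API_ID>.execute-api.<REGION>.amazonaws.com/TEST/sageMakerResource"

response = requests.get(API_URL, params={"query": "A movie about space travel"})
response.raise_for_status()

embedding = response.json()["embedding"]
print(len(embedding))  # 384-dimensional vector
```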
In the next and final part of the tutorial, we will use this endpoint to prepare vectors and execute a vector search using MongoDB Atlas.
✅ Already have an AWS account? Atlas supports paying for usage via the AWS Marketplace (AWS MP) without any upfront commitment — simply sign up for MongoDB Atlas via AWS Marketplace.