Skip to content
  • Privacy Policy
  • Privacy Policy
High DA, PA, DR Guest Blogs Posting Website – Pcp247.com

High DA, PA, DR Guest Blogs Posting Website – Pcp247.com

Pcp247.com

  • Computer
  • Fashion
  • Business
  • Lifestyle
  • Automobile
  • Login
  • Register
  • Technology
  • Travel
  • Post Blog
  • Toggle search form
  • India Sexual Wellness Market Trends And Growth, Forecast 2024-2032 Business
  • megamarkets.com is now megamarkets.co Business
  • The Road to Reliability: Breakdown Recovery Service in Swindon Business
  • AC Repairing Institute in Delhi: Unlocking Potentials Education
  • Build a Thriving Professional Career with Graphic Designing Courses Education
  • Step-by-Step: How to Download Kheloyar for Seamless Gaming Game
  • Consider experimentation and contingency Internet of Things
  • Strategies for Streamlining ELISA Assay Development Processes Healthcare

Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency

Posted on November 30, 2023 By Editorial Team

Today, we are announcing new Amazon SageMaker inference capabilities that can help you optimize deployment costs and reduce latency. With the new inference capabilities, you can deploy one or more foundation models (FMs) on the same SageMaker endpoint and control how many accelerators and how much memory is reserved for each FM. This helps to improve resource utilization, reduce model deployment costs on average by 50 percent, and lets you scale endpoints together with your use cases.

For each FM, you can define separate scaling policies to adapt to model usage patterns while further optimizing infrastructure costs. In addition, SageMaker actively monitors the instances that are processing inference requests and intelligently routes requests based on which instances are available, helping to achieve on average 20 percent lower inference latency.

Key components
The new inference capabilities build upon SageMaker real-time inference endpoints. As before, you create the SageMaker endpoint with an endpoint configuration that defines the instance type and initial instance count for the endpoint. The model is configured in a new construct, an inference component. Here, you specify the number of accelerators and amount of memory you want to allocate to each copy of a model, together with the model artifacts, container image, and number of model copies to deploy.

Let me show you how this works.

New inference capabilities in action
You can start using the new inference capabilities from SageMaker Studio, the SageMaker Python SDK, and the AWS SDKs and AWS Command Line Interface (AWS CLI). They are also supported by AWS CloudFormation.

For this demo, I use the AWS SDK for Python (Boto3) to deploy a copy of the Dolly v2 7B model and a copy of the FLAN-T5 XXL model from the Hugging Face model hub on a SageMaker real-time endpoint using the new inference capabilities.

Create a SageMaker endpoint configuration

import boto3
import sagemaker

role = sagemaker.get_execution_role()
sm_client = boto3.client(service_name="sagemaker")

sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ExecutionRoleArn=role,
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "InstanceType": "ml.g5.12xlarge",
        "InitialInstanceCount": 1,
		"RoutingConfig": {
            "RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"
        }
    }]
)

Create the SageMaker endpoint

sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

Before you can create the inference component, you need to create a SageMaker-compatible model and specify a container image to use. For both models, I use the Hugging Face LLM Inference Container for Amazon SageMaker. These deep learning containers (DLCs) include the necessary components, libraries, and drivers to host large models on SageMaker.

Prepare the Dolly v2 model

from sagemaker.huggingface import get_huggingface_llm_image_uri

# Retrieve the container image URI
hf_inference_dlc = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.9.3"
)

# Configure model container
dolly7b = {
    'Image': hf_inference_dlc,
    'Environment': {
        'HF_MODEL_ID':'databricks/dolly-v2-7b',
        'HF_TASK':'text-generation',
    }
}

# Create SageMaker Model
sagemaker_client.create_model(
    ModelName        = "dolly-v2-7b",
    ExecutionRoleArn = role,
    Containers       = [dolly7b]
)

Prepare the FLAN-T5 XXL model

# Configure model container
flant5xxlmodel = {
    'Image': hf_inference_dlc,
    'Environment': {
        'HF_MODEL_ID':'google/flan-t5-xxl',
        'HF_TASK':'text-generation',
    }
}

# Create SageMaker Model
sagemaker_client.create_model(
    ModelName        = "flan-t5-xxl",
    ExecutionRoleArn = role,
    Containers       = [flant5xxlmodel]
)

Now, you’re ready to create the inference component.

Create an inference component for each model
Specify an inference component for each model you want to deploy on the endpoint. Inference components let you specify the SageMaker-compatible model and the compute and memory resources you want to allocate. For CPU workloads, define the number of cores to allocate. For accelerator workloads, define the number of accelerators. RuntimeConfig defines the number of model copies you want to deploy.

# Inference compoonent for Dolly v2 7B
sm_client.create_inference_component(
    InferenceComponentName="IC-dolly-v2-7b",
    EndpointName=endpoint_name,
    VariantName=variant_name,
    Specification={
        "ModelName": "dolly-v2-7b",
        "ComputeResourceRequirements": {
		    "NumberOfAcceleratorDevicesRequired": 2, 
			"NumberOfCpuCoresRequired": 2, 
			"MinMemoryRequiredInMb": 1024
	    }
    },
    RuntimeConfig={"CopyCount": 1},
)

# Inference component for FLAN-T5 XXL
sm_client.create_inference_component(
    InferenceComponentName="IC-flan-t5-xxl",
    EndpointName=endpoint_name,
    VariantName=variant_name,
    Specification={
        "ModelName": "flan-t5-xxl",
        "ComputeResourceRequirements": {
		    "NumberOfAcceleratorDevicesRequired": 2, 
			"NumberOfCpuCoresRequired": 1, 
			"MinMemoryRequiredInMb": 1024
	    }
    },
    RuntimeConfig={"CopyCount": 1},
)

Once the inference components have successfully deployed, you can invoke the models.

Run inference
To invoke a model on the endpoint, specify the corresponding inference component.

import json
sm_runtime_client = boto3.client(service_name="sagemaker-runtime")
payload = {"inputs": "Why is California a great place to live?"}

response_dolly = sm_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName = "IC-dolly-v2-7b",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)

response_flant5 = sm_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName = "IC-flan-t5-xxl",
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)

result_dolly = json.loads(response_dolly['Body'].read().decode())
result_flant5 = json.loads(response_flant5['Body'].read().decode())

Next, you can define separate scaling policies for each model by registering the scaling target and applying the scaling policy to the inference component. Check out the SageMaker Developer Guide for detailed instructions.

The new inference capabilities provide per-model CloudWatch metrics and CloudWatch Logs and can be used with any SageMaker-compatible container image across SageMaker CPU- and GPU-based compute instances. Given support by the container image, you can also use response streaming.

Now available
The new Amazon SageMaker inference capabilities are available today in AWS Regions US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Jakarta, Mumbai, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Stockholm), Middle East (UAE), and South America (São Paulo). For pricing details, visit Amazon SageMaker Pricing. To learn more, visit Amazon SageMaker.

Get started
Log in to the AWS Management Console and deploy your FMs using the new SageMaker inference capabilities today!

— Antje

Amazon SageMaker, Announcements, AWS re:Invent, Generative AI, Launch, News

Post navigation

Previous Post: The Role of Aesthetic Body Sculpting and Infusion Therapy for Total Transformation
Next Post: Package and deploy models faster with new tools and guided workflows in Amazon SageMaker

Related Posts

  • Industrial Cloud Platform Marketfor Automation Market Size, Share & Trends Analysis Report News
  • Fresh Meat Packaging Market Size and Forecasts, Share and Trends News
  • Leverage foundation models for business analysis at scale with Amazon SageMaker Canvas Amazon SageMaker Canvas
  • AWS Weekly Roundup – CloudFront security dashboard, EBS snapshots improvements, and more – November 13, 2023 Amazon CloudFront
  • Amazon Managed Service for Prometheus collector provides agentless metric collection for Amazon EKS Amazon Elastic Kubernetes Service
  • AWS Weekly Roundup—Amazon Route53, Amazon EventBridge, Amazon SageMaker, and more – January 15, 2024 Amazon Elastic Container Service

lc_banner_enterprise_1

Top 30 High DA-PA Guest Blog Posting Websites 2024

Recent Posts

  • How AI Video Generators Are Revolutionizing Social Media Content
  • Expert Lamborghini Repair Services in Dubai: Preserving Luxury and Performance
  • What do you are familiar Oxycodone?
  • Advantages and Disadvantages of having White Sliding Door Wardrobe
  • The Future of Online Counseling: Emerging Technologies and their Impact on Mental Health Care

Categories

  • .NET
  • *Post Types
  • Amazon AppStream 2.0
  • Amazon Athena
  • Amazon Aurora
  • Amazon Bedrock
  • Amazon Braket
  • Amazon Chime SDK
  • Amazon CloudFront
  • Amazon CloudWatch
  • Amazon CodeCatalyst
  • Amazon CodeWhisperer
  • Amazon Comprehend
  • Amazon Connect
  • Amazon DataZone
  • Amazon Detective
  • Amazon DocumentDB
  • Amazon DynamoDB
  • Amazon EC2
  • Amazon EC2 Mac Instances
  • Amazon EKS Distro
  • Amazon Elastic Block Store (Amazon EBS)
  • Amazon Elastic Container Registry
  • Amazon Elastic Container Service
  • Amazon Elastic File System (EFS)
  • Amazon Elastic Kubernetes Service
  • Amazon ElastiCache
  • Amazon EMR
  • Amazon EventBridge
  • Amazon Fraud Detector
  • Amazon FSx
  • Amazon FSx for Lustre
  • Amazon FSx for NetApp ONTAP
  • Amazon FSx for OpenZFS
  • Amazon FSx for Windows File Server
  • Amazon GameLift
  • Amazon GuardDuty
  • Amazon Inspector
  • Amazon Interactive Video Service
  • Amazon Kendra
  • Amazon Lex
  • Amazon Lightsail
  • Amazon Location
  • Amazon Machine Learning
  • Amazon Managed Grafana
  • Amazon Managed Service for Apache Flink
  • Amazon Managed Service for Prometheus
  • Amazon Managed Streaming for Apache Kafka (Amazon MSK)
  • Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
  • Amazon MemoryDB for Redis
  • Amazon Neptune
  • Amazon Omics
  • Amazon OpenSearch Service
  • Amazon Personalize
  • Amazon Pinpoint
  • Amazon Polly
  • Amazon QuickSight
  • Amazon RDS
  • Amazon RDS Custom
  • Amazon Redshift
  • Amazon Route 53
  • Amazon S3 Glacier
  • Amazon S3 Glacier Deep Archive
  • Amazon SageMaker
  • Amazon SageMaker Canvas
  • Amazon SageMaker Data Wrangler
  • Amazon SageMaker JumpStart
  • Amazon SageMaker Studio
  • Amazon Security Lake
  • Amazon Simple Email Service (SES)
  • Amazon Simple Notification Service (SNS)
  • Amazon Simple Queue Service (SQS)
  • Amazon Simple Storage Service (S3)
  • Amazon Transcribe
  • Amazon Translate
  • Amazon VPC
  • Amazon WorkSpaces
  • Analytics
  • Announcements
  • Application Integration
  • Application Services
  • Artificial Intelligence
  • Auto Scaling
  • Automobile
  • AWS Amplify
  • AWS Application Composer
  • AWS Application Migration Service
  • AWS AppSync
  • AWS Audit Manager
  • AWS Backup
  • AWS Chatbot
  • AWS Clean Rooms
  • AWS Cloud Development Kit
  • AWS Cloud Financial Management
  • AWS Cloud9
  • AWS CloudTrail
  • AWS CodeArtifact
  • AWS CodeBuild
  • AWS CodePipeline
  • AWS Config
  • AWS Control Tower
  • AWS Cost and Usage Report
  • AWS Data Exchange
  • AWS Database Migration Service
  • AWS DataSync
  • AWS Direct Connect
  • AWS Fargate
  • AWS Glue
  • AWS Glue DataBrew
  • AWS Health
  • AWS HealthImaging
  • AWS Heroes
  • AWS IAM Access Analyzer
  • AWS Identity and Access Management (IAM)
  • AWS IoT Core
  • AWS IoT SiteWise
  • AWS Key Management Service
  • AWS Lake Formation
  • AWS Lambda
  • AWS Management Console
  • AWS Marketplace
  • AWS Outposts
  • AWS re:Invent
  • AWS SDK for Java
  • AWS Security Hub
  • AWS Serverless Application Model
  • AWS Service Catalog
  • AWS Snow Family
  • AWS Snowball Edge
  • AWS Step Functions
  • AWS Supply Chain
  • AWS Support
  • AWS Systems Manager
  • AWS Toolkit for AzureDevOps
  • AWS Toolkit for JetBrains IntelliJ IDEA
  • AWS Toolkit for JetBrains PyCharm
  • AWS Toolkit for JetBrains WebStorm
  • AWS Toolkit for VS Code
  • AWS Training and Certification
  • AWS Transfer Family
  • AWS Trusted Advisor
  • AWS Wavelength
  • AWS Wickr
  • AWS X-Ray
  • Best Practices
  • Billing & Account Management
  • Business
  • Business Intelligence
  • Compliance
  • Compute
  • Computer
  • Contact Center
  • Containers
  • CPG
  • Customer Enablement
  • Customer Solutions
  • Database
  • Dating
  • Developer Tools
  • DevOps
  • Education
  • Elastic Load Balancing
  • End User Computing
  • Events
  • Fashion
  • Financial Services
  • Game
  • Game Development
  • Gateway Load Balancer
  • General News
  • Generative AI
  • Generative BI
  • Graviton
  • Health and Fitness
  • Healthcare
  • High Performance Computing
  • Home Decor
  • Hybrid Cloud Management
  • Industries
  • Internet of Things
  • Kinesis Data Analytics
  • Kinesis Data Firehose
  • Launch
  • Lifestyle
  • Management & Governance
  • Management Tools
  • Marketing & Advertising
  • Media & Entertainment
  • Media Services
  • Messaging
  • Migration & Transfer Services
  • Migration Acceleration Program (MAP)
  • MySQL compatible
  • Networking & Content Delivery
  • News
  • Open Source
  • PostgreSQL compatible
  • Public Sector
  • Quantum Technologies
  • RDS for MySQL
  • RDS for PostgreSQL
  • Real Estate
  • Regions
  • Relationship
  • Research
  • Retail
  • Robotics
  • Security
  • Security, Identity, & Compliance
  • Serverless
  • Social Media
  • Software
  • Storage
  • Supply Chain
  • Technical How-to
  • Technology
  • Telecommunications
  • Thought Leadership
  • Travel
  • Week in Review

#digitalsat #digitalsattraining #satclassesonline #satexamscore #satonline Abortion AC PCB Repairing Course AC PCB Repairing Institute AC Repairing Course AC Repairing Course In Delhi AC Repairing Institute AC Repairing Institute In Delhi Amazon Analysis AWS Bird Blog business Care drug Eating fitness Food Growth health Healthcare Industry Trends Kheloyar kheloyar app kheloyar app download kheloyar cricket NPR peacock.com/tv peacocktv.com/tv People Review Share Shots site Solar Module Distributor Solar Panel Distributor solex distributor solplanet inverter distributor U.S Week

  • India Sexual Wellness Market Trends And Growth, Forecast 2024-2032 Business
  • megamarkets.com is now megamarkets.co Business
  • The Road to Reliability: Breakdown Recovery Service in Swindon Business
  • AC Repairing Institute in Delhi: Unlocking Potentials Education
  • Build a Thriving Professional Career with Graphic Designing Courses Education
  • Step-by-Step: How to Download Kheloyar for Seamless Gaming Game
  • Consider experimentation and contingency Internet of Things
  • Strategies for Streamlining ELISA Assay Development Processes Healthcare

Latest Posts

  • How AI Video Generators Are Revolutionizing Social Media Content
  • Expert Lamborghini Repair Services in Dubai: Preserving Luxury and Performance
  • What do you are familiar Oxycodone?
  • Advantages and Disadvantages of having White Sliding Door Wardrobe
  • The Future of Online Counseling: Emerging Technologies and their Impact on Mental Health Care

Gallery

Quick Links

  • Login
  • Register
  • Contact us
  • Post Blog
  • Privacy Policy

Powered by PressBook News WordPress theme