High-performance, low-cost machine learning infrastructure is accelerating innovation in the cloud

Artificial intelligence and machine learning (AI/ML) are key technologies that help organizations develop new ways to increase sales, reduce costs, streamline business processes, and understand their customers better. AWS helps customers accelerate their AI/ML adoption by delivering powerful compute, high-speed networking, and scalable high-performance storage options for any machine learning project. This lowers the barrier to entry for organizations looking to adopt the cloud to scale their ML applications.
Developers and data scientists are pushing the boundaries of technology and increasingly adopting deep learning, a type of machine learning based on neural network algorithms. These deep learning models are getting larger and more sophisticated, and as a result, the infrastructure costs of training and deploying them are rising.
To help customers accelerate their AI/ML transformation, AWS is building high-performance, low-cost machine learning chips. AWS Inferentia is the first machine learning chip AWS built from the ground up for the lowest-cost machine learning inference in the cloud. In fact, Inferentia-powered Amazon EC2 Inf1 instances deliver 2.3 times higher performance and up to 70% lower cost for machine learning inference than current-generation GPU-based EC2 instances. AWS Trainium is the second machine learning chip from AWS, purpose-built for training deep learning models, and it will be available by the end of 2021.
Customers across industries have deployed their ML applications in production on Inferentia and seen significant performance improvements and cost savings. For example, Airbnb's customer support platform enables intelligent, scalable, and exceptional service experiences for its community of millions of hosts and guests around the world. The company used Inferentia-based EC2 Inf1 instances to deploy the natural language processing (NLP) models that power its chatbots, achieving a 2x performance improvement over GPU-based instances.
With these innovations in silicon, AWS is enabling customers to train and run their deep learning models in production easily, with high performance and at significantly lower cost.
Machine learning challenges: shifting at speed to cloud-based infrastructure
Machine learning is an iterative process that requires teams to build, train, and deploy applications quickly, as well as to train, retrain, and experiment frequently to improve the prediction accuracy of their models. When deploying trained models into their business applications, organizations also need to scale those applications to serve new users around the world, and they need to be able to serve many requests arriving concurrently with real-time latency to ensure a great user experience.
Emerging use cases such as object detection, natural language processing (NLP), image classification, conversational AI, and time series data all rely on deep learning technology. Deep learning models are growing exponentially in size and complexity, going from millions of parameters to billions in just a couple of years.
Training and deploying these complex and sophisticated models translates into significant infrastructure costs, and those costs can quickly snowball as organizations scale their applications to deliver real-time experiences to their users and customers.
This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to compute, high-performance networking, and big data storage, seamlessly combined with ML operations and higher-level AI services, so that organizations can get started immediately and scale their AI/ML initiatives.
How AWS helps customers accelerate their AI/ML transformation
AWS Inferentia and AWS Trainium aim to democratize machine learning and make it accessible to developers regardless of experience or organization size. Inferentia's design is optimized for high performance, throughput, and low latency, which makes it ideal for deploying ML inference at scale.
Each AWS Inferentia chip contains four NeuronCores, each of which implements a high-performance systolic array matrix multiply engine that massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, reducing latency and increasing throughput.
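To make the idea of a systolic matrix multiply engine concrete, here is a toy, loop-level sketch in Python. It is purely illustrative, not AWS's implementation: the function name and dataflow mapping are hypothetical, and it only mimics the accumulation order of a weight-stationary array, where each grid position behaves like a processing element holding a fixed weight while activations stream past and partial sums accumulate in place.

```python
import numpy as np

def toy_systolic_matmul(A, B):
    """Illustrative weight-stationary dataflow for C = A @ B.

    In a real systolic array, each processing element (PE) at grid
    position (p, j) permanently holds one weight B[p, j]; activations
    stream across the array while partial sums flow between neighboring
    PEs, so every multiplier does useful work each cycle without
    fetching from external memory. This loop nest mimics that
    accumulation order in plain software.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(n):          # one row of activations enters per step
        for p in range(k):      # PE row p holds the stationary weights B[p, :]
            for j in range(m):  # partial sums accumulate down column j
                C[i, j] += A[i, p] * B[p, j]
    return C

# Sanity check against NumPy's reference matrix multiply
A = np.random.rand(4, 8)
B = np.random.rand(8, 3)
assert np.allclose(toy_systolic_matmul(A, B), A @ B)
```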
AWS Neuron, the software development kit for Inferentia, supports leading ML frameworks such as TensorFlow and PyTorch, so developers can keep using the frameworks and life cycle development tools they know and love. For many of their trained models, they can compile and deploy them on Inferentia by changing just a single line of code, with no additional application code changes.
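As a rough sketch of that workflow, the snippet below compiles a pretrained torchvision model with the torch-neuron package that ships for Inf1 instances. The model choice and file name are illustrative, and exact APIs can vary across Neuron SDK releases:

```python
import torch
import torch_neuron  # AWS Neuron plugin for PyTorch on Inf1; registers torch.neuron
from torchvision import models

# Load and prepare a stock model exactly as in any ordinary PyTorch workflow.
model = models.resnet50(pretrained=True)
model.eval()

# An example input with the shape the model will see in production.
example = torch.zeros(1, 3, 224, 224)

# The "one line" in question: trace/compile the model for NeuronCores.
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled artifact for deployment on an Inf1 instance.
model_neuron.save("resnet50_neuron.pt")
```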
The result is a high-performance inference deployment that scales easily while keeping costs under control.
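At serving time, the compiled artifact loads like any TorchScript model. Here is a minimal sketch, assuming the same torch-neuron setup and file name as above:

```python
import torch
import torch_neuron  # importing registers the Neuron runtime ops needed to load the model

# Load the compiled model on an Inf1 instance and run a real-time request.
model = torch.jit.load("resnet50_neuron.pt")
with torch.no_grad():
    scores = model(torch.zeros(1, 3, 224, 224))
print(scores.shape)
```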
Sprinklr, a software-as-a-service company, offers an AI-driven unified customer experience management platform that enables companies to gather real-time customer feedback across multiple channels and turn it into actionable insights. The result is proactive issue resolution, enhanced product development, improved content marketing, and better customer service. Sprinklr used Inferentia to deploy its NLP and some of its computer vision models, and saw significant performance improvements.
Several Amazon services also deploy their machine learning models on Inferentia.
Amazon Prime Video uses computer vision ML models to analyze the video quality of live events and ensure an optimal viewing experience for Prime Video members. It deployed its image classification ML models on EC2 Inf1 instances and saw a 4x improvement in performance and up to 40% in cost savings compared with GPU-based instances.
Another example is Amazon Alexa's AI- and ML-based intelligence, powered by Amazon Web Services, which is available on more than 100 million devices today. Alexa's promise to customers is that it is always becoming smarter, more conversational, more proactive, and even more delightful. Delivering on that promise requires continuous improvements in response times and machine learning infrastructure costs. By deploying Alexa's text-to-speech ML models on Inf1 instances, the team was able to lower inference latency by 25% and cost per inference by 30%, improving the service experience for the tens of millions of customers who use Alexa every month.
Unlocking new machine learning capabilities in the cloud
As companies race to future-proof their business by enabling the best digital products and services, no organization can afford to fall behind on deploying sophisticated machine learning models to help innovate its customer experiences. Over the past few years, the applicability of machine learning to a variety of use cases has increased dramatically, from personalization and prediction to fraud detection and supply chain forecasting.
Fortunately, cloud-based machine learning infrastructure is unlocking capabilities that were previously out of reach, making the technology far more accessible to non-expert practitioners. That's why AWS customers are already using Inferentia-powered Amazon EC2 Inf1 instances to provide the intelligence behind their recommendation engines and chatbots, and to derive actionable insights from customer feedback.
With AWS cloud-based machine learning infrastructure options suited to a range of skill levels, it's clear that any organization can accelerate innovation and embrace the entire machine learning life cycle at scale. As machine learning adoption continues to expand, organizations are now able to transform the customer experience, and the way they do business, with a cost-effective, high-performance cloud-based machine learning infrastructure.
Learn more about how AWS's machine learning platform can help your business innovate here.
This content was produced by AWS. It was not written by MIT Technology Review's editorial staff.