Tag: Inference
All the articles with the tag "Inference".
Rafay Launches Serverless Inference Platform for Kubernetes Applications
Published at 08:43 AM
Rafay has introduced a serverless inference platform to simplify AI deployments on Kubernetes by providing autoscaling, traffic management, and multi-cloud support, ultimately reducing operational complexities and costs.
Rafay Systems Launches Serverless Inference Offering for AI/ML Workloads
Published at 02:34 PM
Rafay Systems has launched a serverless inference offering, simplifying AI/ML model deployment and management. The offering supports popular frameworks, reduces operational overhead, and provides automatic scaling.
Sakura Internet Launches NVIDIA GPU-Based Cloud Service, Enhancing AI Capabilities in Japan
Published at 11:44 AM
Sakura Internet has launched an NVIDIA L4 GPU-powered cloud service in Japan, providing cost-effective AI inference for businesses. The service aims to democratize AI access and boost adoption across sectors.
Huawei Ascend 910D Aims to Challenge Nvidia's AI Dominance
Published at 04:18 AM
Huawei is reportedly developing the Ascend 910D AI processor to compete with Nvidia's Blackwell and Rubin, signaling China's push for self-sufficiency in AI technology. The chip appears to be focused on inference tasks and may offer a cost-effective alternative.
GPU-Accelerated Serverless Inference on Google Cloud Run: A Tutorial Analysis
Published at 01:36 PM
A tutorial on deploying GPU-accelerated serverless inference with Google Cloud Run and vLLM, highlighting scalability, cost-effectiveness, and ease of deployment for machine learning applications.
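Once a vLLM service like the one in the tutorial is deployed, it exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming a hypothetical Cloud Run service URL and a placeholder model name (neither is from the article):

```python
import requests

# Hypothetical Cloud Run endpoint; substitute your own deployed service URL.
SERVICE_URL = "https://vllm-example-uc.a.run.app"

# vLLM's OpenAI-compatible server accepts standard /v1/completions requests.
resp = requests.post(
    f"{SERVICE_URL}/v1/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        "prompt": "Explain serverless inference in one sentence.",
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

Because Cloud Run scales instances to zero between requests, the first call after an idle period may incur a cold start; the client itself needs no changes to handle that.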
MangoBoost Sets MLPerf Inference Record on AMD Instinct MI300X for Llama2 70B
Published at 04:31 AM
MangoBoost achieved record-breaking MLPerf Inference v5.0 results for Llama2 70B on AMD Instinct MI300X GPUs, showcasing AMD's capabilities in handling large language model inference.
NVIDIA Blackwell: MLPerf Inference Performance Breakthrough
Published at 03:57 PM
NVIDIA showcases the Blackwell architecture's impressive MLPerf inference performance, demonstrating significant improvements for AI workloads and solidifying its position in the AI hardware market.
Azure Introduces Serverless GPUs with Nvidia NIM Integration
Published at 12:59 PM
Azure has introduced serverless GPU capabilities with Nvidia NIM integration, simplifying AI workload deployments and providing on-demand, optimized GPU resources for AI inference.
GPU Analysis: Identifying Throughput Bottlenecks in Large Batch Inference
Published at 01:30 AM
The article analyzes performance bottlenecks in large batch GPU inference, focusing on memory management and GPU utilization to optimize throughput and improve efficiency for AI workloads.
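A quick way to see the kind of bottleneck the analysis describes is to sweep batch size and watch per-sample throughput stop scaling. This is a generic sketch under assumptions of my own (a single PyTorch linear layer as a stand-in workload; the article's actual model and profiling setup are not given here):

```python
import time
import torch

# Generic stand-in workload: one large linear layer, not the article's model.
device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(4096, 4096).to(device).eval()

for batch in (1, 8, 32, 128, 512):
    x = torch.randn(batch, 4096, device=device)
    # Warm-up run so lazy CUDA initialization doesn't skew the timing.
    with torch.no_grad():
        layer(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(50):
            layer(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    # Throughput that stops scaling with batch size signals a bottleneck.
    print(f"batch={batch:4d}  samples/s={50 * batch / elapsed:,.0f}")
```

A flattening curve indicates the GPU has hit some resource ceiling (compute, memory bandwidth, or capacity); profiling tools are then needed to attribute which one, as the article's analysis does.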
NVIDIA Introduces Dynamo: An Open-Source Framework for Scaling AI Inference
Published at 03:09 AM
NVIDIA's Dynamo is an open-source inference framework designed to accelerate and scale AI models, significantly improving performance and efficiency for large-scale AI deployments.