Tag: Inference
All the articles with the tag "Inference".
Rafay Launches Serverless Inference Platform for Kubernetes Applications
Published at 08:43 AM
Rafay has introduced a serverless inference platform to simplify AI deployments on Kubernetes by providing autoscaling, traffic management, and multi-cloud support, ultimately reducing operational complexities and costs.
Rafay Systems Launches Serverless Inference Offering for AI/ML Workloads
Published at 02:34 PM
Rafay Systems has launched a serverless inference offering, simplifying AI/ML model deployment and management. The offering supports popular frameworks, reduces operational overhead, and provides automatic scaling.
Sakura Internet Launches NVIDIA GPU-Based Cloud Service, Enhancing AI Capabilities in Japan
Published at 11:44 AM
Sakura Internet has launched an NVIDIA L4 GPU-powered cloud service in Japan, providing cost-effective AI inference for businesses. The service aims to democratize AI access and boost adoption across sectors.
Huawei Ascend 910D Aims to Challenge Nvidia's AI Dominance
Published at 04:18 AM
Huawei is reportedly developing the Ascend 910D AI processor to compete with Nvidia's Blackwell and Rubin, signaling China's push for self-sufficiency in AI technology. The chip appears to be focused on inference tasks and may offer a cost-effective alternative.
GPU-Accelerated Serverless Inference on Google Cloud Run: A Tutorial Analysis
Published at 01:36 PM
A tutorial on deploying GPU-accelerated serverless inference with Google Cloud Run and vLLM, highlighting scalability, cost-effectiveness, and ease of deployment for machine learning applications.
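Once a vLLM service like the one in the tutorial is deployed, it exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming a hypothetical Cloud Run service URL and a placeholder model name (neither is from the article):

```python
import requests

# Hypothetical Cloud Run endpoint; substitute your own deployed service URL.
SERVICE_URL = "https://vllm-example-uc.a.run.app"

# vLLM's OpenAI-compatible server accepts standard /v1/completions requests.
resp = requests.post(
    f"{SERVICE_URL}/v1/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        "prompt": "Explain serverless inference in one sentence.",
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

Because Cloud Run scales instances to zero between requests, the first call after an idle period may incur a cold start; the client itself needs no changes to handle that.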
MangoBoost Sets MLPerf Inference Record on AMD Instinct MI300X for Llama2 70B
Published at 04:31 AM
MangoBoost achieved record-breaking MLPerf Inference v5.0 results for Llama2 70B on AMD Instinct MI300X GPUs, showcasing AMD's capabilities in handling large language model inference.
NVIDIA Blackwell: MLPerf Inference Performance Breakthrough
Published at 03:57 PM
NVIDIA showcases the Blackwell architecture's impressive MLPerf inference performance, demonstrating significant improvements for AI workloads and solidifying its position in the AI hardware market.
Azure Introduces Serverless GPUs with Nvidia NIM Integration
Published at 12:59 PM
Azure has introduced serverless GPU capabilities with Nvidia NIM integration, simplifying AI workload deployments and providing on-demand, optimized GPU resources for AI inference.
GPU Analysis: Identifying Throughput Bottlenecks in Large Batch Inference
Published at 01:30 AM
The article analyzes performance bottlenecks in large batch GPU inference, focusing on memory management and GPU utilization to optimize throughput and improve efficiency for AI workloads.
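A quick way to see the kind of bottleneck the analysis describes is to sweep batch size and watch per-sample throughput stop scaling. This is a generic sketch under assumptions of my own (a single PyTorch linear layer as a stand-in workload; the article's actual model and profiling setup are not given here):

```python
import time
import torch

# Generic stand-in workload: one large linear layer, not the article's model.
device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(4096, 4096).to(device).eval()

for batch in (1, 8, 32, 128, 512):
    x = torch.randn(batch, 4096, device=device)
    # Warm-up run so lazy CUDA initialization doesn't skew the timing.
    with torch.no_grad():
        layer(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(50):
            layer(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    # Throughput that stops scaling with batch size signals a bottleneck.
    print(f"batch={batch:4d}  samples/s={50 * batch / elapsed:,.0f}")
```

A flattening curve indicates the GPU has hit some resource ceiling (compute, memory bandwidth, or capacity); profiling tools are then needed to attribute which one, as the article's analysis does.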
NVIDIA Introduces Dynamo: An Open-Source Framework for Scaling AI Inference
Published at 03:09 AM
NVIDIA's Dynamo is an open-source inference framework designed to accelerate and scale AI models, significantly improving performance and efficiency for large-scale AI deployments.