NVIDIA Multi-Instance GPU

Seven Independent Instances in a Single GPU

Multi-Instance GPU (MIG) expands the performance and value of NVIDIA H100, A100, and A30 Tensor Core GPUs. MIG can partition a GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. This lets administrators support every workload, from the smallest to the largest, with guaranteed quality of service (QoS), extending the reach of accelerated computing resources to every user.

Benefits Overview

Expand GPU Access to More Users

With MIG, you can achieve up to 7X more GPU resources on a single GPU. MIG gives researchers and developers more resources and flexibility than ever before.

Optimize GPU Utilization

MIG provides the flexibility to choose many different instance sizes, which allows provisioning of the right-sized GPU instance for each workload, ultimately optimizing utilization and maximizing data center investment.

Run Simultaneous Mixed Workloads

MIG enables inference, training, and high-performance computing (HPC) workloads to run at the same time on a single GPU with deterministic latency and throughput. Unlike time slicing, each workload runs in parallel, delivering high performance.

How the Technology Works

Without MIG, different jobs running on the same GPU, such as different AI inference requests, compete for the same resources. A job consuming larger memory bandwidth starves others, resulting in several jobs missing their latency targets. With MIG, jobs run simultaneously on different instances, each with dedicated resources for compute, memory, and memory bandwidth, resulting in predictable performance with QoS and maximum GPU utilization.
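
To make that isolation concrete, here is a minimal Python sketch that enumerates the MIG instances on a GPU and reports each one's dedicated memory budget. It assumes the nvidia-ml-py (pynvml) package and a MIG-enabled A100 or H100 at device index 0; treat it as an illustration, not a management tool.

```python
# Enumerate MIG instances and show that each reports its own memory budget.
# Assumes: pip install nvidia-ml-py, and a MIG-enabled GPU at index 0.
from pynvml import (
    NVMLError, nvmlInit, nvmlShutdown,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetMaxMigDeviceCount,
    nvmlDeviceGetMigDeviceHandleByIndex, nvmlDeviceGetName,
    nvmlDeviceGetMemoryInfo,
)

nvmlInit()
try:
    gpu = nvmlDeviceGetHandleByIndex(0)  # the physical GPU
    for i in range(nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except NVMLError:
            continue  # this MIG slot is not populated
        mem = nvmlDeviceGetMemoryInfo(mig)
        # Each instance owns its slice of memory, cache, and compute, so a
        # bandwidth-hungry neighbor cannot starve it.
        print(nvmlDeviceGetName(mig), f"{mem.total / 2**30:.1f} GiB dedicated")
finally:
    nvmlShutdown()
```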

Provision and Configure Instances as Needed

A GPU can be partitioned into different-sized MIG instances. For example, on an NVIDIA A100 40GB, an administrator could create two instances with 20 gigabytes (GB) of memory each, three instances with 10GB each, seven instances with 5GB each, or a mix of sizes.
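
As a hedged sketch of what that provisioning looks like in practice, the snippet below drives nvidia-smi's MIG interface from Python to carve GPU 0 into seven 1g.5gb instances (the 5GB profile name on A100 40GB; profile names and IDs vary by GPU, so list them first). Enabling MIG mode requires administrator privileges and an idle GPU.

```python
# Partition GPU 0 into seven 1g.5gb MIG instances via nvidia-smi.
# Assumes root privileges and an idle, MIG-capable GPU at index 0.
import subprocess

def smi(*args: str) -> str:
    """Run an nvidia-smi command and return its output."""
    return subprocess.run(["nvidia-smi", *args], check=True,
                          capture_output=True, text=True).stdout

smi("-i", "0", "-mig", "1")            # enable MIG mode on GPU 0
print(smi("mig", "-i", "0", "-lgip"))  # list available GPU instance profiles
# Create seven 1g.5gb GPU instances; -C also creates the default compute
# instance inside each one.
smi("mig", "-i", "0", "-cgi", ",".join(["1g.5gb"] * 7), "-C")
print(smi("-L"))                       # the seven instances now appear as devices
```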

MIG instances can also be dynamically reconfigured, enabling administrators to shift GPU resources in response to changing user and business demands. For example, seven MIG instances can be used during the day for low-throughput inference and reconfigured to one large MIG instance at night for deep learning training.
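
The day-to-night switch described above might look like the following sketch, which tears down the seven small instances and rebuilds the GPU as one full-size 7g.40gb instance (the A100 40GB name for the full profile). Instances must be idle before they can be destroyed.

```python
# Reconfigure GPU 0 from seven small instances to one large one.
# Assumes root privileges and that no processes are using the instances.
import subprocess

def smi(*args: str) -> None:
    subprocess.run(["nvidia-smi", *args], check=True)

smi("mig", "-i", "0", "-dci")                   # destroy compute instances first
smi("mig", "-i", "0", "-dgi")                   # then the GPU instances
smi("mig", "-i", "0", "-cgi", "7g.40gb", "-C")  # one full-size training instance
```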

Run Workloads in Parallel, Securely

With a dedicated set of hardware resources for compute, memory, and cache, each MIG instance delivers guaranteed QoS and fault isolation. That means that a failure in an application running on one instance doesn’t impact applications running on other instances.

It also means that different instances can run different types of workloads—interactive model development, deep learning training, AI inference, or HPC applications. Since the instances run in parallel, the workloads also run in parallel—but separate and isolated—on the same physical GPU.
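
One way to exercise that parallelism is to pin each workload to its own instance with CUDA_VISIBLE_DEVICES. In this sketch the MIG UUIDs are placeholders (list the real ones with nvidia-smi -L), and train.py and serve.py stand in for any two applications.

```python
# Launch two workloads in parallel, each confined to one MIG instance.
# The UUIDs below are placeholders; get real ones from `nvidia-smi -L`.
import os
import subprocess

jobs = [
    ("MIG-11111111-2222-3333-4444-555555555555", ["python", "train.py"]),
    ("MIG-66666666-7777-8888-9999-000000000000", ["python", "serve.py"]),
]

procs = []
for mig_uuid, cmd in jobs:
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=mig_uuid)
    # Each process sees exactly one "GPU": its own MIG instance. A fault in
    # one process cannot disturb work running on the other instance.
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()
```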

MIG in NVIDIA H100

Powered by the NVIDIA Hopper™ architecture, H100 enhances MIG by supporting multi-tenant, multi-user configurations in virtualized environments across up to seven GPU instances, securely isolating each instance with confidential computing at the hardware and hypervisor level. Dedicated video decoders for each MIG instance deliver secure, high-throughput intelligent video analytics (IVA) on shared infrastructure. And with Hopper’s concurrent MIG profiling, administrators can monitor all instances at once, right-sizing GPU acceleration and allocating resources across multiple users.

Rather than renting a full cloud instance, researchers with smaller workloads can use MIG to securely isolate a portion of a GPU, assured that their data is secure at rest, in transit, and in use. This gives cloud service providers the flexibility to price and address smaller customer opportunities.

Watch MIG in Action

NVIDIA A100 Tensor Core GPU

Running Multiple Workloads on a Single A100 GPU

This demo runs AI and high-performance computing (HPC) workloads simultaneously on the same A100 GPU.

Multi-Instance GPU on the NVIDIA A100 Tensor Core GPU

Boosting Performance and Utilization with Multi-Instance GPU

This demo shows inference performance on a single MIG slice, then scaling linearly across the entire A100.

Built for IT and DevOps

MIG enables fine-grained GPU provisioning by IT and DevOps teams. Each MIG instance behaves like a standalone GPU to applications, so there’s no change to the CUDA® platform. MIG can be used in all major enterprise computing environments.
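
Because an instance presents itself as an ordinary CUDA device, existing application code runs unmodified. The sketch below assumes PyTorch and a process started with CUDA_VISIBLE_DEVICES set to a single MIG UUID; nothing in it is MIG-specific.

```python
# Ordinary CUDA code running inside one MIG instance, unchanged.
# Assumes PyTorch and CUDA_VISIBLE_DEVICES=<one MIG UUID> in the environment.
import torch

assert torch.cuda.is_available()
print(torch.cuda.device_count())      # 1: the MIG instance is "the GPU"
print(torch.cuda.get_device_name(0))  # e.g. "... MIG 1g.5gb" on an A100
x = torch.randn(4096, 4096, device="cuda")
print((x @ x).norm().item())          # plain CUDA work, no code changes
```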

Achieve Ultimate Data Center Flexibility

An NVIDIA A100 GPU can be partitioned into different-sized MIG instances. For example, an administrator could create two instances with 20 gigabytes (GB) of memory each, three instances with 10GB each, seven instances with 5GB each, or a mix of sizes, so system administrators can provide right-sized GPUs to users for different types of workloads.

Deliver Exceptional Quality of Service

Each MIG instance has a dedicated set of hardware resources for compute, memory, and cache, delivering guaranteed quality of service (QoS) and fault isolation for the workload. That means that a failure in an application running on one instance doesn’t impact applications running on other instances. And different instances can run different types of workloads: interactive model development, deep learning training, AI inference, or HPC applications. Since the instances run in parallel, the workloads also run in parallel, separate and isolated, on the same physical A100 GPU.

MIG is a great fit for workloads such as AI model development and low-latency inference. These workloads can take full advantage of A100’s features and fit into each instance’s allocated memory.

MIG Specifications

Feature                       H100                                       A100
Confidential computing        Yes                                        No
Instance types                7x 10GB, 4x 20GB,                          7x 10GB, 3x 20GB,
                              2x 40GB (more compute capacity), 1x 80GB   2x 40GB, 1x 80GB
GPU profiling and monitoring  Concurrently on all instances              Only one instance at a time
Secure tenants                7x                                         1x
Media decoders                Dedicated NVJPEG and NVDEC per instance    Limited options

Preliminary specifications; subject to change.

Take a Deep Dive into the NVIDIA Hopper Architecture

Take a Deep Dive into the NVIDIA Ampere Architecture