Memory Tiering

Overview

The memtier policy extends the topology-aware policy. It supports the same features and configuration options as the topology-aware policy, such as topology hints and annotations. See the topology-aware policy documentation for a description of how that policy works and how it is configured.

The main goal of the memtier policy is to let workloads choose the kinds of memory they want to use. To that end, the topology-aware policy's scoring algorithm for selecting topology nodes is changed so that a workload can belong to both a CPU node and a memory node in the topology tree: the CPU allocation is reserved from the CPU node, and the memory controllers are selected from the memory node. Typically the aim is to make the CPU and memory allocations from the same node so that memory locality is as good as possible. However, memory may also be allocated from a wider pool of memory controllers if the amount of free memory on a topology node is too low.

Activation of the Memtier Policy

You can activate the memtier policy by setting the --policy parameter of cri-resmgr to memtier. For example:

cri-resmgr --policy memtier --reserved-resources cpu=750m

Configuration

The memtier policy knows of three kinds of memory: DRAM, PMEM, and HBM. The various memory types are accessed via separate memory controllers.

  • DRAM (dynamic random-access memory) is regular system main memory.

  • PMEM (persistent memory) is large-capacity memory, such as Intel® Optane™ memory.

  • HBM (high-bandwidth memory) is high-speed memory, typically found on some special-purpose computing systems.

To configure a pod to use a certain memory type, use the cri-resource-manager.intel.com/memory-type annotation in the pod spec. For example, to make a container request both the PMEM and DRAM memory types, you could use pod metadata such as this:

metadata:
  annotations:
    cri-resource-manager.intel.com/memory-type: |
      container1: dram,pmem

The memtier policy will then aim to allocate resources from a topology node which can satisfy the memory requirements.
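The annotation value shown above is itself a small YAML mapping keyed by container name, so a pod with several containers can presumably list one entry per container. The following is a sketch under that assumption; the second container name, container2, is hypothetical and not taken from the original example:

```yaml
metadata:
  annotations:
    cri-resource-manager.intel.com/memory-type: |
      # container1 may use both DRAM and PMEM (as in the example above)
      container1: dram,pmem
      # container2 is a hypothetical second container restricted to DRAM
      container2: dram
```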

The memtier policy supports “cold start” functionality. When cold start is enabled and the workload is allocated to a topology node with both DRAM and PMEM memory, the workload is initially given only the PMEM controller. The DRAM controller is added to the workload only after the cold start timeout has expired. As a result, large memory areas that are allocated but left unused do not need to be migrated to PMEM later, because they were allocated there to begin with. Cold start is configured in the pod metadata like this:

metadata:
  annotations:
    cri-resource-manager.intel.com/memory-type: |
      container1: dram,pmem
    cri-resource-manager.intel.com/cold-start: |
      container1:
        duration: 60s

In the above example, container1 would initially be granted only the PMEM memory controller, but after 60 seconds the DRAM controller would be added to the container's memset.

Container memory requests and limits

Because cri-resmgr calculates memory requests inaccurately for pods in the Burstable QoS class, you should either use Limit to set the amount of memory for containers in Burstable pods or run the resource-annotating webhook as described in the top-level README file.
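As an illustration of the first option, the following sketch shows a Burstable pod whose memory amount is given as a limit. The pod name, image, and resource amounts are placeholders chosen for this example, and the container name is assumed to be the one the memory-type annotation refers to:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memtier-example                # hypothetical pod name
  annotations:
    cri-resource-manager.intel.com/memory-type: |
      container1: dram,pmem
spec:
  containers:
  - name: container1                   # assumed to match the key used in the annotation
    image: registry.example.com/app:latest   # placeholder image
    resources:
      requests:
        cpu: 500m                      # a CPU request without a CPU limit keeps the pod Burstable
      limits:
        memory: 1Gi                    # memory amount stated as a limit
```

Because only a memory limit is given, Kubernetes sets the container's memory request equal to that limit, while the CPU request without a matching CPU limit keeps the pod in the Burstable QoS class.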