CPU Memory Optimization
On the CPU side, memory allocation libraries such as ptmalloc or jemalloc can cause a high rate of page faults when allocating the large memory chunks common in DL applications. To address this, DeepRec optimizes memory allocation to reduce memory usage and minor page faults, which improves running performance. When this optimization is enabled, DeepRec starts collecting memory allocation information once the number of steps reaches the START_STATISTIC_STEP threshold, and generates an allocation plan from the information collected in each step. While generating the plan, it checks whether the previously generated plan already meets the current allocation requirements; if it does, the step is counted as a stable step. DeepRec stops collecting memory information when the number of stable steps reaches the STABLE_STATISTIC_STEP threshold or the total number of collected steps reaches the MAX_STATISTIC_STEP threshold. Because the optimization depends on this collected information, the performance gain can only be observed after a certain number of steps.
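The stopping rule described above can be illustrated with a small simulation. This is only a sketch of the logic, not DeepRec's implementation: the class name CollectionState, the modeling of an allocation plan as a single peak size in bytes, and the exact comparison at the start threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectionState:
    """Hypothetical model of the collection phase (not DeepRec's real code)."""
    start_step: int = 100          # START_STATISTIC_STEP
    stable_steps_needed: int = 10  # STABLE_STATISTIC_STEP
    max_steps: int = 100           # MAX_STATISTIC_STEP
    plan: int = 0                  # assumption: a plan is just a peak size in bytes
    seen: int = 0                  # steps run so far
    collected: int = 0             # steps whose allocations were recorded
    stable: int = 0                # steps the existing plan already satisfied
    done: bool = False             # True once collection has stopped

    def observe(self, peak_bytes: int) -> None:
        """Feed the peak allocation of one training/inference step."""
        self.seen += 1
        # Assumption: collection begins on the step after the threshold.
        if self.done or self.seen <= self.start_step:
            return
        self.collected += 1
        if peak_bytes <= self.plan:
            # The previous plan meets this step's needs: a stable step.
            # The source does not say whether this counter resets on an
            # unstable step; this sketch does not reset it.
            self.stable += 1
        else:
            # The plan was too small, so regenerate it from the new data.
            self.plan = peak_bytes
        if (self.stable >= self.stable_steps_needed
                or self.collected >= self.max_steps):
            self.done = True


# Small thresholds so the behaviour is easy to follow.
state = CollectionState(start_step=2, stable_steps_needed=3, max_steps=10)
for peak in [50, 60, 100, 80, 90, 70, 100]:
    state.observe(peak)
```

With these inputs the first two steps are skipped, the third step sets the plan, and the next three steps fit the plan, so collection stops before the last step is recorded.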
The current version of DeepRec supports CPU memory optimization for both stand-alone and distributed training and inference. It is enabled by default and can be turned off with the following command:
export ENABLE_MEMORY_OPTIMIZATION=0
Several environment variables control the collection phase.
START_STATISTIC_STEP configures the step at which DeepRec starts collecting memory information.
STABLE_STATISTIC_STEP configures the number of stable steps after which collection ends.
MAX_STATISTIC_STEP configures the maximum number of collected steps after which collection ends.
The default values are 100, 10, and 100, respectively. These values generally do not need to be changed. START_STATISTIC_STEP can be increased when there are many initialization graphs, and MAX_STATISTIC_STEP can be increased when the main computational graph is irregular or when many computational graphs are running.
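As a rough illustration, the variables can also be set from Python before the framework is imported. The helper memory_opt_env below is hypothetical and not part of DeepRec, and the sketch assumes DeepRec reads these variables from the process environment when its runtime initializes. Only the variable names, the defaults, and the value 0 for disabling come from the text above.

```python
import os
from typing import Dict

# Defaults as documented above.
DEFAULTS = {
    "START_STATISTIC_STEP": 100,
    "STABLE_STATISTIC_STEP": 10,
    "MAX_STATISTIC_STEP": 100,
}


def memory_opt_env(disable: bool = False, **thresholds: int) -> Dict[str, str]:
    """Build the environment entries for CPU memory optimization.

    Hypothetical helper: only variables that differ from the documented
    behaviour are emitted, so unset variables keep their defaults.
    """
    env: Dict[str, str] = {}
    if disable:
        # The documented way to turn the optimization off.
        env["ENABLE_MEMORY_OPTIMIZATION"] = "0"
    for name, value in thresholds.items():
        if name not in DEFAULTS:
            raise ValueError(f"unknown variable: {name}")
        if int(value) < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        env[name] = str(int(value))
    return env


# Example: many initialization graphs and an irregular main graph.
settings = memory_opt_env(START_STATISTIC_STEP=200, MAX_STATISTIC_STEP=300)

# Apply before importing the framework (assumption: the variables are read
# at runtime start-up, so later changes may have no effect).
os.environ.update(settings)
```

The specific values 200 and 300 are placeholders; suitable values depend on the model.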
On the CPU side, the memory optimization can also be used together with the jemalloc library. After setting the MALLOC environment variable, preload the jemalloc dynamic library with LD_PRELOAD in front of the python command, for example:
LD_PRELOAD=./libjemalloc.so.2 python ...
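When the training script is started from another Python process, the same preload can be passed through the child's environment. The following is a minimal sketch: jemalloc_env and launch are hypothetical helper names, and the library path is simply the one from the example above. The MALLOC variable mentioned above is not set here, because the text does not give its value.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def jemalloc_env(lib_path: str,
                 base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with jemalloc added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon-separated list; keep earlier entries.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def launch(script_args: List[str],
           lib_path: str = "./libjemalloc.so.2") -> subprocess.CompletedProcess:
    """Run `python <script_args>` with jemalloc preloaded (hypothetical helper)."""
    if not os.path.isfile(lib_path):
        raise FileNotFoundError(lib_path)
    return subprocess.run([sys.executable, *script_args],
                          env=jemalloc_env(lib_path), check=True)
```

LD_PRELOAD has to be in place when the child process starts, which is why it is passed through env rather than set inside the training script itself.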