Architecture Support for Emerging Memory Technologies
With the widespread use of multicore and many-core processors, memory system designers keep pushing memory bandwidth and capacity to meet user demand. A negative side effect is the continuous increase in memory power consumption. In addition, main memory system design is severely limited by a rigid architecture that requires the memory controller to track the internal status of all memory devices (chips) and schedule the timing of all device operations. As a result, DRAM memory systems are approaching a scalability wall. New memory technologies such as Phase-Change Memory (PCM) and STT-RAM have emerged as potential replacements for DRAM in future memory systems. Although these technologies offer better energy efficiency and scalability than DRAM, they also suffer from low write endurance and long write latency. New memory architectures are therefore needed to support future memory systems and to balance performance, energy efficiency, capacity and lifetime. To address this need, we propose systematic architecture-level support for improving memory system efficiency, in three steps. Firstly, a new DRAM scheduling algorithm called Delayed Row Activation is proposed. It makes DRAM more energy-efficient by allowing memory ranks to stay in a low-power mode longer when the data bus ownership cannot be acquired immediately after row activation finishes. Secondly, we present a heterogeneous mini-rank memory architecture that allows concurrently running applications to use different sub-rank widths based on their memory access behavior. By dynamically assigning and changing the sub-rank configurations, a balance can be achieved between performance and power saving, and large performance losses can be avoided.
Lastly, we build a new memory architecture framework called Universal Memory Architecture (UniMA) that can support different memory technologies in one computer system by decoupling the scheduling of device operations from the memory controller. A bridge chip is added to each memory module to perform device-specific scheduling locally. The experimental results demonstrate that our schemes can save DRAM power, provide optimal energy efficiency for mini-rank style designs, and integrate diverse memory technologies into one memory system with small overhead.
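The idea behind Delayed Row Activation can be illustrated with a minimal sketch. This is not the scheduling algorithm evaluated in this work; it only shows the timing intuition stated above, under simplifying assumptions. The function names, the single timing parameter t_rcd (activate-to-column-command delay) and its value are hypothetical, and column access latency, refresh and bank conflicts are ignored.

```python
# Illustrative sketch only: names and timing values are assumptions,
# not taken from the original work.

T_RCD = 10  # hypothetical activate-to-column-command delay, in cycles


def activation_cycle(ready_cycle: int, bus_free_cycle: int, t_rcd: int = T_RCD) -> int:
    """Pick the cycle at which to issue the row activation.

    ready_cycle:    earliest cycle at which the activation could be issued
    bus_free_cycle: cycle at which the data bus becomes available
    """
    # If activated right away, the row is open at ready_cycle + t_rcd.
    row_open_cycle = ready_cycle + t_rcd
    if bus_free_cycle <= row_open_cycle:
        # The bus can be acquired as soon as activation finishes: no delay.
        return ready_cycle
    # Otherwise the open row would just wait for the bus. Delay the
    # activation so that it finishes exactly when the bus becomes free.
    return bus_free_cycle - t_rcd


def extra_low_power_cycles(ready_cycle: int, bus_free_cycle: int, t_rcd: int = T_RCD) -> int:
    """Cycles the rank can additionally spend in a low-power mode."""
    return activation_cycle(ready_cycle, bus_free_cycle, t_rcd) - ready_cycle


if __name__ == "__main__":
    # Bus busy until cycle 130, request ready at cycle 100.
    print(activation_cycle(100, 130))
    print(extra_low_power_cycles(100, 130))
```

In this simplified model the data transfer starts at the same cycle with or without the delay, so the extra low-power time comes without added latency. A real scheduler would also have to account for low-power mode exit latency and other DRAM timing constraints.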
New memory technology