Storage and Cloud Systems

Big data processing requires efficient system implementations. Emerging non-volatile memory (NVM) is considered a key enabler for big data systems because it offers non-volatility, byte-addressability, and fast access at the same time. To make the best use of these properties, programs should access NVM directly through CPU load and store instructions. To this end, durable transactions have become a common way for applications to access persistent memory data in a crash-consistent manner. Meanwhile, in-memory cluster computing (IMC) frameworks such as Spark achieve much higher performance than traditional on-disk cluster computing (ODC) frameworks, e.g., MapReduce/Hadoop and Dryad, for iterative and interactive applications, because they keep intermediate data in memory instead of writing it to disk between stages.
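For readers new to durable transactions, the Python sketch below shows undo logging, a standard textbook way to make in-place updates to persistent memory crash-consistent. It is not this group's design. NVM is simulated with Python containers, and every assignment is assumed to be immediately durable; on real hardware each log entry would need a cache-line flush and a store fence before the matching in-place update. The names `SimulatedNVM`, `UndoTx`, and `recover` are hypothetical.

```python
class SimulatedNVM:
    """Persistent memory modeled as plain Python containers.
    Assumption: each assignment is immediately durable, standing in for
    a store followed by a cache-line flush and a fence on real NVM."""

    def __init__(self, data):
        self.data = dict(data)  # persistent data, updated in place
        self.undo_log = []      # persistent undo log: (key, existed, old_value)


class UndoTx:
    """One durable transaction using undo logging (illustrative only)."""

    def __init__(self, nvm):
        self.nvm = nvm

    def read(self, key):
        # Data is updated in place, so reads go straight to it.
        return self.nvm.data[key]

    def write(self, key, value):
        data = self.nvm.data
        # 1. Persist the old value BEFORE the in-place update.
        self.nvm.undo_log.append((key, key in data, data.get(key)))
        # 2. Update the persistent data in place.
        data[key] = value

    def commit(self):
        # Truncating the undo log is the commit point.
        self.nvm.undo_log.clear()


def recover(nvm):
    """After a crash, roll back any uncommitted transaction, newest entry first."""
    for key, existed, old in reversed(nvm.undo_log):
        if existed:
            nvm.data[key] = old
        else:
            nvm.data.pop(key, None)
    nvm.undo_log.clear()


# Simulate a crash in the middle of a transaction.
nvm = SimulatedNVM({"a": 1, "b": 2})
tx = UndoTx(nvm)
tx.write("a", 10)
tx.write("c", 3)
recover(nvm)     # crash before commit(): only the persistent state survives
print(nvm.data)  # prints {'a': 1, 'b': 2}
```

Note that every write must persist its log entry before the in-place store; this per-write ordering requirement is the cost usually attributed to undo logging.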

Our research group has made contributions to improving the efficiency of big data and NVM systems. DudeTM [ASPLOS'17] is a crash-consistent durable transaction system that avoids the drawbacks of both undo logging and redo logging. DudeTM uses shadow DRAM to decouple the execution of a durable transaction into three fully asynchronous steps. This decoupling also allows an off-the-shelf transactional memory (TM) to be used as an independent component of the system. DAC [ASPLOS'18] is a datasize-aware auto-tuning approach that efficiently identifies high-dimensional configurations for in-memory cluster computing frameworks such as Spark, so that they achieve optimal performance on a given cluster.
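To make DudeTM's three-step decoupling more concrete, here is a minimal Python sketch. It is not the DudeTM implementation. NVM is simulated with dictionaries, a single global lock stands in for the off-the-shelf TM, and the class and method names (`DecoupledDurableTM`, `run`, `close`) are hypothetical. The steps follow the naming used in the DudeTM paper: Perform executes the transaction on the shadow DRAM copy and records a redo log, Persist writes that log to NVM, and Reproduce replays it onto the persistent data. Each step runs on its own thread, connected by FIFO queues.

```python
import queue
import threading


class DecoupledDurableTM:
    """Hypothetical sketch of a DudeTM-style decoupled durable TM.
    NVM is simulated; a global lock stands in for an off-the-shelf TM."""

    def __init__(self, initial):
        self.shadow = dict(initial)       # volatile shadow copy in DRAM
        self.nvm_data = dict(initial)     # simulated persistent data
        self.nvm_log = []                 # simulated persistent redo log
        self.reproduced_upto = 0          # last transaction ID applied to nvm_data
        self._tm_lock = threading.Lock()  # stand-in for the TM
        self._next_id = 0
        self._persist_q = queue.Queue()
        self._reproduce_q = queue.Queue()
        self._threads = [
            threading.Thread(target=self._persist_loop, daemon=True),
            threading.Thread(target=self._reproduce_loop, daemon=True),
        ]
        for t in self._threads:
            t.start()

    def run(self, tx_fn):
        """Step 1 (Perform): run tx_fn(read, write) on the shadow DRAM copy."""
        with self._tm_lock:
            redo = []

            def read(key):
                return self.shadow[key]

            def write(key, value):
                self.shadow[key] = value
                redo.append((key, value))  # record the redo entry

            tx_fn(read, write)
            self._next_id += 1
            # Enqueue under the lock so log order matches commit order;
            # do not wait for the log to become durable.
            self._persist_q.put((self._next_id, redo))

    def _persist_loop(self):
        """Step 2 (Persist): append each redo log to simulated NVM, in order."""
        while True:
            item = self._persist_q.get()
            if item is None:
                self._reproduce_q.put(None)
                return
            # Real NVM would need cache-line flushes and a fence here.
            self.nvm_log.append(item)
            self._reproduce_q.put(item)

    def _reproduce_loop(self):
        """Step 3 (Reproduce): replay persisted logs onto the NVM data."""
        while True:
            item = self._reproduce_q.get()
            if item is None:
                return
            tx_id, redo = item
            for key, value in redo:
                self.nvm_data[key] = value
            self.reproduced_upto = tx_id

    def close(self):
        """Drain both background steps and stop their threads."""
        self._persist_q.put(None)
        for t in self._threads:
            t.join()


tm = DecoupledDurableTM({"counter": 0})
for _ in range(100):
    tm.run(lambda read, write: write("counter", read("counter") + 1))
tm.close()
print(tm.nvm_data["counter"], tm.reproduced_upto)  # prints 100 100
```

In this sketch `run` returns as soon as Perform hands its redo log to the Persist queue, so a transaction is not yet durable at that point; it becomes durable only once the Persist thread has appended its log, and the persistent data catches up later in Reproduce.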