One of my personal highlights from last week’s ACM SIGMOD 2011 conference in Athens was a tutorial [1] on Thursday morning entitled “Data Management Over Flash Memory”, presented by Ioannis Koltsidas of IBM Zurich and Stratis Viglas of the University of Edinburgh. Here is the abstract:
Flash SSDs are quickly becoming mainstream and emerge as alternatives to magnetic disks. It is therefore imperative to incorporate them seamlessly into the enterprise. We present the salient results of research in the area, touching all aspects of the data management stack: from the fundamentals of flash technology, through storage for database systems and the manipulation of SSD-resident data, to query processing.
The tutorial summary, already available in the ACM Digital Library, includes a comprehensive bibliography of database systems’ usage of SSD technology, from indexing, to caching, to query execution methods that can take advantage of flash memory storage. Here is a brief summary of what I thought were the important points made by Ioannis and Stratis during their talk, from the notes I made during the tutorial:
- With an SSD it is important to determine what mapping is used between logical buffers on the device and the physical flash blocks. There are page-level mappings which are performant but require lots of RAM, and (cheaper) set-associative mappings that use much less RAM but can require moving lots of blocks around with write operations, substantially lowering write performance.
- Empirical testing demonstrates that increasing the proportion of write operations can substantially reduce I/O read performance in a mixed workload that contains both reads and writes. The impact of writes varies by the type of device and proportion of write operations, and can easily increase the latency of read operations by a factor of 2 (or even higher).
- Conclusions from the 2009 uFLIP study [2]:
  - with SSDs, reads and sequential writes are very efficient;
  - aligning I/O requests, and their sizes, to flash pages is very beneficial;
  - random writes within a small LBA address window incur almost the same latency as sequential ones;
  - parallel sequential writes to different partitions should be limited; and
  - pauses between requests do not improve overall performance.
- In summary:
  - flash has great potential for database system performance, particularly when utilized in a storage architecture that supports copy-on-write or external (disk) logging to reduce the need to write to the device;
  - there are multiple classes of SSD devices with significant variations in performance;
  - excellent random read latency is typical, but there are dramatic differences with write latencies;
  - SSD devices don’t do read caching – RAM on the device is used for mappings, sometimes for write cache;
  - across SSD devices, there are lots of differences in power consumption, throughput, reliability, wear leveling, and so on;
  - PCI-based devices are the best performing, but are also the most costly at roughly $20/GB.
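
To make the mapping trade-off in the first point of the list above a little more concrete, here is a rough sketch in Python. It is not from the tutorial: the device size, page size, pages per block, and bytes per mapping entry are round numbers I have assumed, and a plain block-level mapping stands in for the cheaper set-associative schemes as a deliberately simplified model. Garbage collection, log blocks, and erase costs are all ignored, and the function names are my own.

```python
def mapping_table_bytes(device_bytes, page_bytes, pages_per_block, entry_bytes):
    """Return (page_level_bytes, block_level_bytes) of RAM for the mapping table.

    All four parameters are illustrative assumptions, not vendor figures.
    """
    num_pages = device_bytes // page_bytes
    num_blocks = num_pages // pages_per_block
    return num_pages * entry_bytes, num_blocks * entry_bytes


def page_level_programs(lpns):
    """Page-level mapping: each logical page write programs exactly one
    flash page somewhere else and updates one table entry."""
    return len(lpns)


def block_level_programs(lpns, pages_per_block):
    """Simplified block-level mapping: a page's offset inside its block is
    fixed, so updating a page means rewriting the whole block into a fresh
    one. Writes that continue in ascending order within the same logical
    block share a single rewrite; anything else starts a new one."""
    programs = 0
    open_block = None
    last_offset = -1
    for lpn in lpns:
        block, offset = divmod(lpn, pages_per_block)
        if block != open_block or offset <= last_offset:
            programs += pages_per_block  # full block copy into a fresh block
            open_block = block
        last_offset = offset
    return programs


if __name__ == "__main__":
    # Assumed device: 64 GiB, 4 KiB pages, 64 pages per block, 4-byte entries.
    page_ram, block_ram = mapping_table_bytes(64 * 2**30, 4096, 64, 4)
    print(f"page-level table:  {page_ram // 2**20} MiB")
    print(f"block-level table: {block_ram // 2**20} MiB")

    sequential = list(range(128))   # two whole blocks, written in order
    scattered = [0, 64, 0, 64]      # alternating between two blocks
    for name, lpns in (("sequential", sequential), ("scattered", scattered)):
        pl = page_level_programs(lpns)
        bl = block_level_programs(lpns, 64)
        print(f"{name}: page-level {pl} programs, block-level {bl} programs")
```

Under these assumptions the page-level table is 64 times larger than the block-level one, while the block-level scheme programs 64 flash pages for every scattered page update. Sequential writes cost the same under both schemes, which is in line with the uFLIP observation that sequential writes are efficient.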
[1] Ioannis Koltsidas and Stratis D. Viglas (2011). Data management over flash memory. In Proceedings of the 2011 International Conference on Management of Data (SIGMOD ’11). ACM, New York, NY, USA, pp. 1209-1212. DOI=10.1145/1989323.1989455 http://doi.acm.org/10.1145/1989323.1989455
[2] Luc Bouganim, Bjorn Tor Jonsson and Philippe Bonnet (2009). uFLIP: Understanding Flash IO Patterns. In Proceedings of the Fourth Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, January 4-7, 2009.

Glenn Paulley is a Director of Engineering at Sybase iAnywhere.
