White Paper

 

 

STORAGE CACHING TECHNIQUES

October 1997

 

 

Record cache

The general principle of cache is that careful management of a small amount of fast (and thus relatively expensive) storage improves the performance of a much larger amount of slower storage. Cache usually delivers a performance benefit when accesses to data are organized in a predictable way that allows the cache management to keep the data most likely to be needed in the cache. Two characteristics of data access ease the task of managing cache:
· data reuse: when an application accesses data once, it often accesses the same data again shortly thereafter.
· locality of reference: when an application accesses some data, it frequently accesses data located nearby shortly thereafter.

Many applications exhibit these two characteristics, and that explains the widespread popularity of cache as a means of improving performance. Clearly, the design of the algorithms used to manage the cache is crucial to taking advantage of these characteristics.

Cache delivers benefits to applications whenever a cache hit I/O occurs. When a cache miss I/O occurs, a cost is associated with reading data into cache. For some applications the costs of managing cache are greater than the benefits, and the use of cache can actually deliver worse performance than not using cache at all.
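The trade-off between hit benefit and miss cost can be illustrated with a simple weighted average. The sketch below is only an illustration: the function name and all timing figures are hypothetical assumptions, not measurements from this paper.

```python
def effective_access_time(hit_ratio, hit_time_ms, miss_time_ms):
    """Average I/O time as a weighted mean of hit and miss service times."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_time_ms + (1.0 - hit_ratio) * miss_time_ms


# Hypothetical figures: 1 ms for a cache hit, 15 ms for a disk access with
# no cache at all, and 17 ms for a cache miss (the disk access plus the
# overhead of managing the cache and loading the data into it).
UNCACHED_MS = 15.0

for ratio in (0.05, 0.5, 0.9):
    cached_ms = effective_access_time(ratio, hit_time_ms=1.0, miss_time_ms=17.0)
    verdict = "helps" if cached_ms < UNCACHED_MS else "hurts"
    print(f"hit ratio {ratio:.0%}: {cached_ms:.1f} ms average, cache {verdict}")
```

With these assumed figures the cache only pays off once the hit ratio exceeds 12.5 percent; an application with little data reuse or locality would stay below that point.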

Cache and I/O Performance

A read cache may be used to hold data in anticipation that it will soon be read, either because data at immediately preceding addresses has recently been read (pre-fetch cache) or because the data itself has recently been accessed (most recently used cache). In either case, when an application requests cached data, the data can be delivered immediately, without the latency of disk seeking and rotation. I/O requests for cached data complete faster, improving application responsiveness.
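A minimal sketch of a read cache that combines both ideas follows. The class name, the dictionary standing in for the disk, and the choice to evict the least recently used block are assumptions made for illustration; the paper does not prescribe a replacement policy.

```python
from collections import OrderedDict


class ReadCache:
    """Read cache keeping recently used blocks and pre-fetching the next ones."""

    def __init__(self, disk, capacity, prefetch=1):
        self.disk = disk              # stand-in for the disk: block number -> data
        self.capacity = capacity      # maximum number of cached blocks
        self.prefetch = prefetch      # how many following blocks to read ahead
        self.blocks = OrderedDict()   # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def _store(self, block_no, data):
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.disk[block_no]             # the slow path: go to disk
        self._store(block_no, data)
        # Pre-fetch: also load the blocks at the immediately following addresses.
        for ahead in range(block_no + 1, block_no + 1 + self.prefetch):
            if ahead in self.disk and ahead not in self.blocks:
                self._store(ahead, self.disk[ahead])
        return data


disk = {n: f"block-{n}" for n in range(8)}
cache = ReadCache(disk, capacity=4, prefetch=1)
for n in (0, 1, 2, 3, 0):
    cache.read(n)
print(cache.hits, cache.misses)
```

In this run the sequential reads of blocks 1 and 3 hit because of pre-fetch, and the repeated read of block 0 hits because of data reuse.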

Write cache can be used similarly to improve application responsiveness. An application write request causes data to be copied into the write cache, after which the application is notified that the write is complete and can continue executing. The I/O subsystem writes the data to disk sometime later, ideally using time when the disk would otherwise be idle. Such a cache is called a write-behind cache, because the actual writing of data follows behind the notification that it has been written.

There is a limit to the number of requests this strategy can queue: if write requests arrive too rapidly, the cache fills, and the processor may be delayed while space is recovered by writing data to disk. Write-behind caching also puts the system in danger of losing data: if the power fails or the computer is turned off before data is transferred to disk, that data is lost. One option is to make the cache memory non-volatile by providing a UPS (uninterruptible power supply) or a battery backup. In this case, the write-behind cache is called write-back.
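The write-behind behavior, including the delay that occurs when pending writes exceed the cache space, can be sketched as follows. The class, its `max_pending` limit, and the dictionary standing in for the disk are hypothetical; a real I/O subsystem would flush in the background rather than on demand.

```python
class WriteBehindCache:
    """Acknowledge writes immediately and write them to disk later."""

    def __init__(self, disk, max_pending):
        self.disk = disk                # stand-in for the disk: block number -> data
        self.max_pending = max_pending  # cache space, in blocks
        self.pending = {}               # acknowledged but not yet on disk
        self.stalls = 0                 # times a writer had to wait for a flush

    def write(self, block_no, data):
        # A new block needs cache space; if none is left, the writer is
        # delayed while pending data is forced out to disk.
        if block_no not in self.pending and len(self.pending) >= self.max_pending:
            self.stalls += 1
            self.flush()
        self.pending[block_no] = data   # the write is now reported complete

    def flush(self):
        """Write all pending blocks to disk and return how many were written."""
        written = len(self.pending)
        self.disk.update(self.pending)
        self.pending.clear()
        return written


disk = {}
cache = WriteBehindCache(disk, max_pending=2)
cache.write(7, "a")
cache.write(7, "b")     # overwrites the pending copy of block 7 in cache
cache.write(8, "c")
print(disk)             # still empty: nothing has reached the disk yet
cache.write(9, "d")     # cache is full, so this write stalls on a flush
print(disk, cache.stalls)
```

Everything held in `pending` is exactly the data that would be lost in a power failure, which is why the paper recommends non-volatile or battery-backed cache memory for this mode.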

A write-through cache immediately writes sectors to disk and only then notifies the system that the data has been written. It holds a copy in the cache in case the system needs to read the data again. When a block is updated by a new write while the data from a previous write is still in cache, the cached data is simply overwritten in the cache; in a write-behind cache this also eliminates the need for a disk write of the superseded data.
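A minimal sketch of the write-through sequence, in which the disk is updated before the write is acknowledged and the cached copy serves later reads, might look like this. The class and counter names are hypothetical, and the cache is left unbounded to keep the example short.

```python
class WriteThroughCache:
    """Write to disk first, then keep a copy in cache for later reads."""

    def __init__(self, disk):
        self.disk = disk        # stand-in for the disk: block number -> data
        self.cache = {}         # copies of blocks already safely on disk
        self.disk_reads = 0
        self.disk_writes = 0

    def write(self, block_no, data):
        self.disk[block_no] = data    # the data reaches the disk first...
        self.disk_writes += 1
        self.cache[block_no] = data   # ...then a copy is kept for reads
        # Only at this point is the write reported complete to the caller.

    def read(self, block_no):
        if block_no not in self.cache:
            self.disk_reads += 1
            self.cache[block_no] = self.disk[block_no]
        return self.cache[block_no]


disk = {}
cache = WriteThroughCache(disk)
cache.write(1, "x")
print(cache.read(1), cache.disk_reads, cache.disk_writes)
```

Because the disk always holds the current data, nothing is lost if power fails, at the price of one disk write per application write.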

Write cache is particularly effective in overcoming the write penalty of RAID 5. This advantage disappears when a steady stream of write requests is sustained for a long period of time, because the cache then fills up and must be flushed to make room for newly arriving data at times when the disk is not idle.
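The paper does not spell out the RAID 5 write penalty, so the following sketch relies on the standard description as an assumption: updating a single block requires reading the old data and old parity, then writing the new data and new parity, for four disk I/Os. The helper names and sample bytes are hypothetical.

```python
from functools import reduce


def xor_blocks(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(blocks):
    """Parity of a full stripe: the XOR of all its data blocks."""
    return reduce(xor_blocks, blocks)


def small_write_parity(old_data, old_parity, new_data):
    """New parity for a one-block update: remove the old data, add the new."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]   # three data blocks
parity = parity_of(stripe)

# Small write: 2 reads (old data, old parity) + 2 writes (new data, new parity).
new_block = b"\xaa\x55"
updated_parity = small_write_parity(stripe[1], parity, new_block)
stripe[1] = new_block

# The incremental result matches parity recomputed over the whole stripe.
print(updated_parity == parity_of(stripe))
```

When a write cache has gathered every block of a stripe, parity can be computed from the cached data alone, so the stripe goes to disk with no preliminary reads.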

Caching appears at different levels of the memory hierarchy. Disk drives may act as cache for slower media such as optical drives or tapes, or even for slower disk drive configurations. For example, a RAID 1 configuration may cache written data before transferring it to a RAID 5 configuration.

Caching file systems

A caching file system is a file system-level utility that runs on top of the operating system's native file system and acts to improve data accessibility. It is an advanced, distributed file system that also has hierarchical storage management (HSM) capabilities. Its objectives are improved workstation performance, network data accessibility, and workstation storage management. Redundant copies of workstation data and files are created and maintained on the file servers. Server volumes can even be copied to other servers for additional levels of redundancy.

Figure SS3e1

The software allows for continuous accessibility even during planned and unplanned outages. Active user files are always located closest to where they are used (for performance reasons) on the individual workstations, allowing users to remain productive during planned or unplanned server outages. If the workstation is disconnected or inoperable, the user may sign onto the network at any other node (even from a portable computer). The user then has access to all their files because copies are resident on the server. When servers or workstations come back on-line, the file systems are re-synchronized. Even a storage failure at the workstation or at the server causes no interruption in service.

Performance is further enhanced by migrating infrequently used files off workstations to servers, while frequently accessed files are kept locally for maximum performance (just like a cache). The migration rules can typically be established on the basis of last access date or disk capacity thresholds. This capability is the same as hierarchical storage management (HSM).
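The two kinds of migration rule mentioned above can be sketched as a small selection function. Everything specific here is a hypothetical assumption chosen for illustration: the `FileInfo` record, the 30-day idle limit, and the 90 and 70 percent high and low water marks.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size_mb: int
    days_since_access: int


def select_for_migration(files, capacity_mb, max_idle_days=30,
                         high_water=0.9, low_water=0.7):
    """Pick workstation files to migrate to the server."""
    # Rule 1, last access date: migrate anything idle longer than the limit.
    chosen = [f for f in files if f.days_since_access > max_idle_days]
    used = sum(f.size_mb for f in files) - sum(f.size_mb for f in chosen)

    # Rule 2, capacity threshold: if the disk is still above the high water
    # mark, migrate the least recently used files until it drops to the
    # low water mark.
    if used > high_water * capacity_mb:
        remaining = sorted((f for f in files if f not in chosen),
                           key=lambda f: f.days_since_access, reverse=True)
        for f in remaining:
            if used <= low_water * capacity_mb:
                break
            chosen.append(f)
            used -= f.size_mb
    return chosen


files = [
    FileInfo("report.doc", 10, 2),
    FileInfo("old.dat", 40, 90),
    FileInfo("build.log", 30, 10),
    FileInfo("notes.txt", 15, 5),
]
print([f.name for f in select_for_migration(files, capacity_mb=100)])
print([f.name for f in select_for_migration(files, capacity_mb=60)])
```

On the larger disk only the idle file is migrated; on the smaller disk the capacity rule also moves the next least recently used file.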

Two basic forms of advanced file system products exist: caching file systems and distributed file systems. Redundant distributed file systems, such as Veritas' VxFS and Digital's Advanced File System, make multiple copies and keep them synchronized, as a caching file system does, but lack the HSM capabilities. Without HSM, they are unable to achieve the performance requirements for high data accessibility. For highest availability and flexibility, these advanced services must be layered upon flexible and reliable local file systems, such as NFS.

In the PC LAN market, several products have appeared that allow portable computers to connect, disconnect, and re-synchronize application files. These are application-specific and do not operate at the file system level.

Caching file systems are at the leading edge of an important evolution in storage management. File system-based processes will yet provide some of the key technology for automation of storage and systems management.

By: Farid Neema

This Paper was produced by:
PERIPHERAL CONCEPTS, INC.
351 Hitchcock Way, Suite #B-200
Santa Barbara, California, 93105
Tel: (805) 563-9491
fneema@periconcepts.com