Applications that require the transfer
or movement of large amounts of data are prime candidates for
SAN. They may be horizontal applications (e.g.,
backup, archiving, data replication, disaster protection, and
data warehousing) or vertical applications (e.g., online transaction
processing (OLTP), enterprise resource planning (ERP) business
applications, electronic commerce, broadcasting, prepress, medical,
and geophysics). See Figure 1. SAN is also well suited to making
performance and high availability more scalable and more affordable
in applications such as clustering and data sharing.
This article discusses two major horizontal applications, backup and data sharing, and how they interact with SAN.
Backup in a SAN Environment
One of the first applications that users want when implementing
SAN is to be able to back up and protect their data through the
SAN. They want to offload heavy backup traffic from the LAN, free
system bandwidth for production operations, and gain the speed
and security advantages of centralized management that SAN offers.
Effectively protecting data on a SAN requires a number of elements.
Many of them are currently in the early stages of implementation.
These items include:
· Centralized management
· Support for sharing removable-media libraries
· LAN-less and server-less backup
· Heterogeneous platform support
· Remote vaulting and mirroring
· Realtime backup
Centralized management: Ideally, a central console would manage all the
logical and physical storage resources of an enterprise network.
The console would automatically collect, correlate, and analyze
capacity, configuration, use, and performance information on all
storage resources. The logical resources monitored would include
file systems, directories, files, and application-specific storage
repositories. The physical resources tracked would include disks,
RAID systems, tape libraries, optical jukeboxes, Fibre Channel
components, Network Attached Storage (NAS), and SAN switches and
hubs. Nearly every vendor offers some degree of centralized management.
The leaders in this area are Veritas, Legato, Computer Associates
(CA), and IBM.
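To make the idea of collecting and correlating capacity information concrete, here is a minimal Python sketch of the kind of roll-up a central console might perform. The Resource record, the summarize function, and the sample inventory are all hypothetical and are not taken from any vendor's product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical record for one physical storage resource."""
    name: str
    kind: str          # e.g., "RAID system" or "tape library"
    capacity_gb: int
    used_gb: int


def summarize(resources):
    """Total capacity and use per resource kind."""
    totals = defaultdict(lambda: {"capacity_gb": 0, "used_gb": 0})
    for resource in resources:
        totals[resource.kind]["capacity_gb"] += resource.capacity_gb
        totals[resource.kind]["used_gb"] += resource.used_gb
    return dict(totals)


# Made-up inventory, for illustration only.
inventory = [
    Resource("raid-01", "RAID system", 400, 310),
    Resource("raid-02", "RAID system", 400, 120),
    Resource("lib-01", "tape library", 2000, 900),
]
report = summarize(inventory)
```

A real console would gather these figures automatically from the devices themselves. The sketch shows only the correlation step.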
Support for sharing removable-media libraries: Performing
backups often involves backing up many different servers to locally
attached tape drives. One benefit of SAN and NAS connectivity
is the ability to share resources (e.g., a large tape library)
among multiple backup servers. Shared resources enable administrators
to consolidate backups into one tape library.
However, the support must extend beyond simple connectivity to
a library and into management. Managing a library means managing
access to the media stored within it and requires dynamic drive
allocation among servers, so the server that needs a drive most
at a given time can get it (e.g., when recovering a large database).
Managing a library involves managing not just backup but any application
that might need access to tape or optical storage.
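Dynamic drive allocation can be pictured as a priority queue in front of a pool of drives. The following Python sketch assumes a hypothetical TapeLibrary class and a numeric priority supplied with each request. It illustrates the idea and is not the scheme any particular vendor uses.

```python
import heapq


class TapeLibrary:
    """Hypothetical shared library that grants drives by request priority."""

    def __init__(self, drive_count):
        self.free_drives = list(range(drive_count))
        self.waiting = []    # min-heap of (-priority, sequence, server)
        self.assigned = {}   # server name -> drive number
        self._seq = 0        # tie-breaker: equal priorities are first come, first served

    def request(self, server, priority):
        """Queue a request, then hand out any free drives."""
        heapq.heappush(self.waiting, (-priority, self._seq, server))
        self._seq += 1
        self._dispatch()

    def release(self, server):
        """Return a server's drive to the pool and serve the next waiter."""
        self.free_drives.append(self.assigned.pop(server))
        self._dispatch()

    def _dispatch(self):
        # Give each free drive to the highest-priority waiting server.
        while self.free_drives and self.waiting:
            _, _, server = heapq.heappop(self.waiting)
            self.assigned[server] = self.free_drives.pop(0)


library = TapeLibrary(drive_count=1)
library.request("file-server", priority=1)   # gets the only drive immediately
library.request("mail-server", priority=1)   # waits
library.request("db-restore", priority=9)    # waits, but ahead of mail-server
library.release("file-server")               # the drive goes to db-restore
```

In this example the database recovery arrives last but receives the drive first, which is the behavior the text calls for when one server needs a drive most.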
In many cases, the ability to connect a library to multiple backup
servers via the SAN will justify the expense of automation. In
this environment, Hierarchical Storage Management (HSM) becomes
economically desirable. Legato, Veritas, CA, and Seagate Software
are the leaders in developing shared tape-library support.
LAN-less and server-less backup: Backup data movement is evolving
in three phases. In the first phase, which is current practice, data
moves from the disk to the server it directly connects to, then through
the LAN to another server that, in turn, transfers the data to tape
(Figure 2). In the second phase, SAN lets you
perform backup outside the LAN. Data moves from the disk to the
server, which retransmits it through the SAN to a SAN-connected
library. This setup is sometimes called LAN-less backup (Figure
3). In the third phase, the server initiates the backup command.
Data moves directly from disk to tape through the SAN fabric without
further involving the server or the LAN. This configuration is
called server-less backup (Figure 4). Intelliguard, which Legato
recently acquired, has led the development of server-less backup.
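The three phases differ only in which components the backup data crosses. The Python sketch below records each data path as described above. The phase names and the touches helper are labels chosen for this illustration.

```python
# Components the backup data crosses in each phase, in order.
BACKUP_PATHS = {
    "conventional": ["disk", "application server", "LAN", "backup server", "tape"],
    "LAN-less": ["disk", "application server", "SAN", "tape library"],
    "server-less": ["disk", "SAN", "tape library"],
}


def touches(phase, component):
    """Return True if backup data passes through the component in this phase."""
    return component in BACKUP_PATHS[phase]


for phase, path in BACKUP_PATHS.items():
    print(f"{phase:12} " + " -> ".join(path))
```

Each phase removes a hop: LAN-less backup takes the LAN out of the data path, and server-less backup takes the server out as well, leaving it only to issue the command.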
Heterogeneous platform support: Early SAN implementations
are generally homogeneous. As SAN environments mature, they will
become more heterogeneous. Effective SAN management software will
need to be able to manage any vendor's server communicating with
any vendor's storage, hosting any database, application, or file
system, backing up to any tape drive or library, through any switch,
hub, router, or bridge. EMC and Veritas are examples of vendors
supporting heterogeneous platforms.
Remote vaulting and mirroring: The connectivity distances
that Fibre Channel allows (10 to 20 km, depending on usage) make
it easier to deploy remote sites for business continuance and
disaster recovery purposes. Use of remote backup, remote vaulting,
and remote mirroring techniques is likely to increase because of
this capability. SANs can also connect to WANs to achieve additional
levels of connectivity and protection. CommVault is one of the
vendors offering remote vaulting capability. CNT offers a SAN-to-WAN
solution for SCSI and Enterprise Systems Connection (ESCON)
connectivity, and is also developing support for remote Fibre Channel.
Realtime (or window-less) backup: The importance of window-less backup (also called hot backup) becomes obvious when you consider the large volume of data in a centralized SAN backup library. Realtime backup essentially lets you back up a volume or file periodically and automatically without affecting normal system operations. The technique commonly used is called a snapshot: you make a copy of the volume needing backup, and then back up the copy while users continue to access and modify the original volume in normal operations. Network Integrity leads in development, and EMC and HDS have implemented solutions in currently available products.
Major providers of total backup solutions include ADIC, ATL, StorageTek, Hewlett-Packard (HP), Exabyte, and Overland.
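The snapshot technique described above can be sketched in a few lines of Python. The sketch uses copy-on-write, one common way to get a point-in-time image without copying the whole volume up front. That choice is an assumption made for illustration, and the article does not say which mechanism any vendor uses. The Volume class and its block contents are hypothetical.

```python
class Volume:
    """Toy block volume that supports a single copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.saved = None   # block index -> contents at snapshot time

    def take_snapshot(self):
        """Start a snapshot; nothing is copied until a block changes."""
        self.saved = {}

    def write(self, index, data):
        # Preserve the old contents the first time a block is overwritten.
        if self.saved is not None and index not in self.saved:
            self.saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        """Read a block as it was when the snapshot was taken."""
        if self.saved is None:
            raise RuntimeError("no snapshot taken")
        return self.saved.get(index, self.blocks[index])


volume = Volume(["a0", "b0", "c0"])
volume.take_snapshot()
volume.write(1, "b1")   # production keeps writing during the backup
backup = [volume.read_snapshot(i) for i in range(3)]
```

The backup job reads the frozen image while applications see the updated volume, so no backup window is needed.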
Resource and Data Sharing
In a heterogeneous environment, where platforms
are by definition different, you must distinguish among resource
sharing, data copy sharing, and true data sharing.
Resource sharing: A storage subsystem attached to multiple
computer platforms is divided into partitions, each partition
being accessible only to its owning platform or to a certain number
of homogeneous platforms. The administrator can reassign storage
capacity to different platforms as needs change. One of the benefits
of SAN connectivity is its ability to share resources (e.g., a
large tape library) among multiple backup servers. Such sharing
enables administrators to consolidate backups, previously made from
many different servers to locally attached tape drives, into one tape library.
Dynamic resource sharing: All storage is available to any
connected host; hosts are allocated storage as they need it. If
one host needs the storage, it can use any or all the available
space. If a host deletes a file, that space is available to any
other host. This dynamic storage sharing operates automatically
and transparently. Dynamic resource sharing means that the systems
administrator doesn't have to partition the storage before storing
the data.
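As a rough illustration of dynamic resource sharing, the Python sketch below models one pool from which any host draws space on demand. The StoragePool class, the host names, and the capacities are hypothetical.

```python
class StoragePool:
    """Hypothetical unpartitioned pool shared by every connected host."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used = {}   # host name -> GB currently in use

    def free_gb(self):
        return self.capacity_gb - sum(self.used.values())

    def allocate(self, host, gb):
        """Give a host space from the common pool if enough remains."""
        if gb > self.free_gb():
            raise ValueError("not enough free space in the pool")
        self.used[host] = self.used.get(host, 0) + gb

    def release(self, host, gb):
        """Return space to the pool, e.g., after the host deletes a file."""
        if gb > self.used.get(host, 0):
            raise ValueError("host is releasing more than it holds")
        self.used[host] -= gb


pool = StoragePool(capacity_gb=100)
pool.allocate("nt-host", 70)
pool.release("nt-host", 50)      # freed space returns to the common pool
pool.allocate("unix-host", 80)   # another host can use it immediately
```

No space is reserved per host in advance, which is the difference from the partitioned resource sharing described earlier.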
Data copy sharing: This process involves replication of
the data. Data is the same across copies at the time of copy creation,
but the copies can change independently afterward. There is no
assurance that they will remain identical. Data access is usually
prevented during replication so the copy accurately reflects all
the data at a particular time. For large amounts of data, the
time needed to copy it may be significant, and the amount of storage
necessary to store the copy could be very large. SAN facilitates
data-copy sharing by allowing high-bandwidth connections to transfer
large volumes of data.
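A quick calculation shows why bandwidth matters for data-copy sharing. The figures in this Python sketch are illustrative assumptions rather than measurements: a 500 GB data set, the raw 12.5 MB/s line rate of 100 Mbit/s Ethernet, and the nominal 100 MB/s of 1 Gbit/s Fibre Channel, with protocol overhead ignored.

```python
def copy_hours(size_gb, rate_mb_per_s):
    """Hours needed to copy size_gb at a sustained rate, ignoring overhead."""
    seconds = size_gb * 1000 / rate_mb_per_s   # 1 GB taken as 1,000 MB
    return seconds / 3600


lan_hours = copy_hours(500, 12.5)   # assumed: 100 Mbit/s Ethernet, raw rate
san_hours = copy_hours(500, 100)    # assumed: nominal 1 Gbit/s Fibre Channel

print(f"LAN: {lan_hours:.1f} h, SAN: {san_hours:.1f} h")
```

Under these assumptions the copy takes roughly 11.1 hours over the LAN and about 1.4 hours over the SAN. Real throughput will be lower on both.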
True data sharing: If you are sharing data without making
a copy, multiple computer platforms can access the same physical
instance of the recorded data on a storage subsystem. This type
of sharing is called true data sharing. Different levels of performance
and complexity exist in implementing true data sharing: The first
level is when heterogeneous platforms can access data, but only
the original data owner can modify it. The second level is when
multiple heterogeneous platforms can update and rewrite a data
item, but only one at a time. In this case, you must use a locking
mechanism to momentarily prevent a platform from updating the
data. The third level is called concurrent data sharing and exists
when all platforms can either read or update the data at the same
time. The advantages of true data sharing are numerous. With only
one copy of data, you never need to replicate the data for use
elsewhere, you simplify data maintenance, and you eliminate problems
due to out-of-sync conditions. True data sharing among platforms
running heterogeneous operating systems requires translating to
one common operating system (see File management discussion under
SAN Management Software on page XX). Examples of vendors offering
implementations of true data sharing in a SAN architecture are
Sequent, Mercury Computer Systems, DataDirect, Transoft, Retrieve,
and Network Disk. In a NAS architecture, NetApp, EMC, Sun, IBM,
and Procom offer true data sharing solutions.
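The second level of true data sharing, where many platforms may update a data item but only one at a time, depends on a locking mechanism. The Python sketch below uses threads within one process to stand in for platforms. A real SAN would need a lock that works across machines, so treat the SharedRecord class as a hypothetical illustration of the principle only.

```python
import threading


class SharedRecord:
    """One physical copy of a data item, updated under a lock."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def update(self, change):
        # Only one updater at a time may read, modify, and write back.
        with self._lock:
            self.value = change(self.value)


record = SharedRecord()


def platform(updates):
    """Stand-in for one platform applying a series of updates."""
    for _ in range(updates):
        record.update(lambda current: current + 1)


threads = [threading.Thread(target=platform, args=(1000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because every update holds the lock for its whole read-modify-write cycle, none of the 4,000 increments is lost. Concurrent data sharing, the third level, needs finer-grained coordination than this single lock.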
By: Farid Neema
PERIPHERAL CONCEPTS, INC.
351 Hitchcock Way, Suite #B-200
Santa Barbara, California, 93105
Tel: (805) 563-9491
fneema@silcom.com
This article was published in the May 1999 issue of Windows NT Magazine