A cluster is a physically loosely-coupled group of systems/servers and their storage subsystems that are interconnected through hardware and software to function as a single system in terms of management, application, access and computing. The individual systems forming the cluster, called nodes, are each capable of performing the user's job. Should any node become unavailable, the other systems take over with little or no interruption. Additional systems can be added to speed up processing without needing to replace or interrupt existing systems. The goal is to provide fast, easily accessible, and highly available computing service through the use of off-the-shelf components. Loosely coupled means coupled by networking technology, as opposed to storage channels or specialized interconnects.
Clusters present a very cost-effective solution to availability and scalability problems. They play the same role at the system level that RAID plays for disks. Clusters combine the best features of fault-tolerant mirrored systems and SMPs (symmetric multiprocessors).
Clusters can be categorized in one of three major classes:
A Class A cluster, sometimes referred to
as an Availability Cluster (figure SS3e2), provides disk load balancing
across servers as well as server failover: when one server fails,
the remaining servers can access all disks. Its
characteristics and limitations are:
· Limited scale, usually two to four nodes
· No file sharing among clusters
· The recovery requires a transition period
· A server failure may trigger batch scripts to re-launch
applications
· Inter-node communication is simple: it only detects
that a server has failed
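The failover behavior described above can be illustrated with a minimal sketch. It assumes that each node records the time of its last heartbeat and that a node silent for longer than a timeout is treated as failed. The Node class, the detect_failures and fail_over functions, and the timeout value are hypothetical names chosen for this illustration, not part of any particular cluster product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    disks: set = field(default_factory=set)
    last_heartbeat: float = 0.0
    alive: bool = True


def detect_failures(nodes, now, timeout):
    """Return live nodes whose last heartbeat is older than `timeout` seconds."""
    return [n for n in nodes if n.alive and now - n.last_heartbeat > timeout]


def fail_over(nodes, failed):
    """Mark `failed` as down and hand its disks to the least-loaded survivor."""
    failed.alive = False
    survivors = [n for n in nodes if n.alive]
    if not survivors:
        raise RuntimeError("no surviving node can take over")
    target = min(survivors, key=lambda n: len(n.disks))
    target.disks |= failed.disks
    failed.disks = set()
    return target


# Hypothetical two-node cluster: node "b" has been silent for 10 seconds.
a = Node("a", {"disk1", "disk2"}, last_heartbeat=100.0)
b = Node("b", {"disk3"}, last_heartbeat=91.0)
cluster = [a, b]

for node in detect_failures(cluster, now=101.0, timeout=5.0):
    fail_over(cluster, node)
```

In a real Class A cluster, the takeover step is also where batch scripts would re-launch the failed server's applications, which accounts for the transition period noted above.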
A Class B cluster, referred to as a Scaling
Cluster (figure SS3e3), can scale applications
by spreading computing across nodes, allowing applications on multiple
nodes to coordinate access to shared disk data. It requires a
fast interconnect (FDDI, Fast Ethernet, or similar) to carry
heartbeats and locking traffic. Its limitations are:
· Disk sharing occurs at the raw disk level, not at the cluster
file system level
· It requires application modifications before advantages
are realized. In UNIX, it is practically restricted today to Oracle
Parallel Server Database applications
· It is usually limited to four to eight nodes
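Because sharing happens at the raw disk level, the application itself has to make sure that two nodes do not touch the same blocks at the same time. The sketch below shows one way such coordination might look, assuming locks are taken on half-open ranges of block numbers. The BlockRangeLocks class is hypothetical and runs in a single process; a real implementation would exchange these requests over the cluster interconnect.

```python
class BlockRangeLocks:
    """Tracks which node holds which half-open range [start, end) of raw disk blocks."""

    def __init__(self):
        self._held = []  # list of (start, end, node) tuples

    def acquire(self, node, start, end):
        """Grant the range to `node` unless another node holds an overlapping range."""
        for s, e, owner in self._held:
            if owner != node and start < e and s < end:
                return False
        self._held.append((start, end, node))
        return True

    def release(self, node, start, end):
        """Give up a range previously granted to `node`."""
        self._held.remove((start, end, node))


locks = BlockRangeLocks()
got_a = locks.acquire("node-a", 0, 100)    # granted: nothing is held yet
got_b = locks.acquire("node-b", 50, 150)   # refused: overlaps node-a's range
got_c = locks.acquire("node-b", 100, 200)  # granted: adjacent, no overlap
locks.release("node-a", 0, 100)
got_d = locks.acquire("node-b", 50, 150)   # granted: node-a has released
```

This kind of bookkeeping is the application modification the second limitation refers to: the database, not the file system, decides who may read or write a given region of the disk.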
Class C clusters, called Performance Clusters
(figure SS3e4), are characterized by a distributed lock manager
that the operating system can use fully. They allow
cluster file operation, where multiple nodes can perform I/O operations
and access files concurrently on the same disk. Any node can access
any device. Other advantages include:
· Simplified system management, since it is seen as a single
disk system
· The ability to use LAN/WAN as an interconnect, allowing
long distance or remote disaster protection
· Very large configurations
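A distributed lock manager coordinates access to shared resources across the nodes of a cluster, so that nodes do not make conflicting updates to the same data. Some implementations rely on hardware-based interconnects that let clustered machines exchange memory information more quickly than conventional networking architectures such as Ethernet or FDDI.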
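The arbitration that lets several nodes work on the same file concurrently can be sketched in software as follows. The sketch assumes two lock modes, shared for readers and exclusive for a writer, and keeps the lock table in a single process. The LockManager class and the resource name "/data/file1" are hypothetical; an actual distributed lock manager spreads this table across the nodes and supports more modes.

```python
from collections import defaultdict

SHARED, EXCLUSIVE = "shared", "exclusive"


class LockManager:
    """Grants shared or exclusive locks on named resources to cluster nodes."""

    def __init__(self):
        self._holders = defaultdict(dict)  # resource -> {node: mode}

    def request(self, node, resource, mode):
        """Return True if the lock is granted, False if it conflicts."""
        others = {n: m for n, m in self._holders[resource].items() if n != node}
        if mode == EXCLUSIVE and others:
            return False  # a writer needs the resource to itself
        if mode == SHARED and EXCLUSIVE in others.values():
            return False  # readers must wait for the writer
        self._holders[resource][node] = mode
        return True

    def release(self, node, resource):
        """Drop whatever lock `node` holds on `resource`, if any."""
        self._holders[resource].pop(node, None)


dlm = LockManager()
r1 = dlm.request("node-a", "/data/file1", SHARED)     # granted
r2 = dlm.request("node-b", "/data/file1", SHARED)     # granted: readers coexist
r3 = dlm.request("node-c", "/data/file1", EXCLUSIVE)  # refused: readers present
dlm.release("node-a", "/data/file1")
dlm.release("node-b", "/data/file1")
r4 = dlm.request("node-c", "/data/file1", EXCLUSIVE)  # granted: no other holders
r5 = dlm.request("node-a", "/data/file1", SHARED)     # refused: writer present
```

Locking at the level of named files rather than raw blocks is what allows the operating system, and not each application, to manage concurrent access in a Class C cluster.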
As clustering of servers becomes feasible,
it is possible to design storage architectures that allow access
to very large amounts of storage with increased reliability
and performance.
By: Farid Neema
This Paper was produced by:
PERIPHERAL CONCEPTS, INC.
351 Hitchcock Way, Suite #B-200
Santa Barbara, California, 93105
Tel: (805) 563-9491
fneema@periconcepts.com