White Paper

 

 

CLUSTERING

October 1997

 

 

 

A cluster is a physically loosely coupled group of systems or servers and their storage subsystems, interconnected through hardware and software so that they function as a single system in terms of management, application, access and computing. Each of the individual systems forming the cluster, called nodes, is capable of performing the user's job. Should any node become unavailable, other nodes take over with little or no interruption. Additional systems can be added to speed up processing without replacing or interrupting existing systems. The goal is to provide a fast, easily accessible, and highly available computing service through the use of off-the-shelf components. Loosely coupled means coupled by networking technology, as opposed to storage channels or specialized interconnects.

Clusters present a very cost-effective solution to availability and scalability problems. They play the same role at the system level that RAID plays for disks. Clusters combine the best features of fault-tolerant mirrored systems and SMPs (symmetric multiprocessors).
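The availability argument can be made concrete with a little arithmetic. The following is a minimal sketch, not a measurement: it assumes that nodes fail independently and that any surviving node can carry the workload, and the per-node availability of 0.99 is a hypothetical figure chosen only for illustration. The function name cluster_availability is likewise an assumption of this sketch.

```python
from math import comb


def cluster_availability(node_availability: float, nodes: int, needed: int = 1) -> float:
    """Probability that at least `needed` of `nodes` nodes are up.

    Assumes node failures are independent, which real clusters only
    approximate (shared power, shared network, shared disks).
    """
    a = node_availability
    # Binomial sum over every count of working nodes from `needed` to `nodes`.
    return sum(
        comb(nodes, k) * a**k * (1 - a) ** (nodes - k)
        for k in range(needed, nodes + 1)
    )


if __name__ == "__main__":
    # 0.99 is a hypothetical per-node availability.
    for n in (1, 2, 4):
        print(n, cluster_availability(0.99, n))
```

Under these assumptions each added node multiplies the probability of a total outage by the per-node failure probability, which is the same reasoning that makes redundant disks attractive in RAID.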

Clusters can be categorized into one of three major classes:

A Class A cluster, sometimes referred to as an Availability Cluster (figure SS3e2), ensures disk load balancing across servers, as well as server failover, in which the remaining servers can access all disks when one server fails. Its characteristics and limitations are:
· Limited scale, usually two to four nodes
· No file sharing among clusters
· The recovery requires a transition period
· A server failure may trigger batch scripts to re-launch applications
· Inter-node communication is simple: it only detects that a server has failed
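The failover behavior listed above can be sketched in a few lines. This is an illustrative model only, not any vendor's implementation: the Node class, the three-second HEARTBEAT_TIMEOUT, and the disk names are all hypothetical, and failure is detected simply by noticing that a node's last heartbeat is too old.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 3.0  # seconds; hypothetical value for illustration


@dataclass
class Node:
    name: str
    last_heartbeat: float
    disks: list[str] = field(default_factory=list)
    alive: bool = True


def detect_failures(nodes: list[Node], now: float) -> list[Node]:
    """Mark nodes whose last heartbeat is older than the timeout as failed."""
    failed = []
    for node in nodes:
        if node.alive and now - node.last_heartbeat > HEARTBEAT_TIMEOUT:
            node.alive = False
            failed.append(node)
    return failed


def fail_over(nodes: list[Node], failed: Node) -> None:
    """Hand each disk of the failed node to the least-loaded survivor."""
    survivors = [n for n in nodes if n.alive]
    if not survivors:
        raise RuntimeError("no surviving node can take over")
    for disk in failed.disks:
        target = min(survivors, key=lambda n: len(n.disks))
        target.disks.append(disk)
        # A real cluster would re-launch the affected applications here,
        # for example through a batch script.
    failed.disks = []


if __name__ == "__main__":
    cluster = [
        Node("a", last_heartbeat=10.0, disks=["d1", "d2"]),
        Node("b", last_heartbeat=10.0, disks=["d3"]),
        Node("c", last_heartbeat=5.0, disks=["d4", "d5"]),
    ]
    for node in detect_failures(cluster, now=10.5):
        fail_over(cluster, node)
    for node in cluster:
        print(node.name, node.alive, node.disks)
```

The time between the missed heartbeat and the completed reassignment corresponds to the transition period mentioned in the list.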

A Class B cluster, referred to as a Scaling Cluster (figure SS3e3), has the ability to scale applications by spreading computing across nodes, allowing applications on multiple nodes to coordinate access to shared disk data. It requires a fast interconnect (FDDI, Fast Ethernet, or similar) to carry heartbeats and locking traffic. Its limitations are:
· Disk sharing occurs at the raw disk level, not at the cluster file system level
· It requires application modifications before its advantages are realized. On UNIX, it is in practice restricted today to Oracle Parallel Server database applications
· It is usually limited to four to eight nodes
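Because sharing happens at the raw disk level, the application itself must make sure that two nodes do not modify the same blocks at once. A minimal sketch of that coordination follows. The BlockRangeLockTable class and the node names are hypothetical, and the table lives in one process here as a stand-in for a lock service that would be reached over the cluster interconnect.

```python
import threading


class BlockRangeLockTable:
    """Tracks which node holds an exclusive lock on which range of disk blocks."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._held: list[tuple[int, int, str]] = []  # (first, last, owner)

    def try_lock(self, owner: str, first: int, last: int) -> bool:
        """Grant blocks first..last (inclusive) unless another node holds an overlap."""
        if first > last:
            raise ValueError("first block must not exceed last block")
        with self._mutex:
            for held_first, held_last, held_owner in self._held:
                overlaps = first <= held_last and held_first <= last
                if overlaps and held_owner != owner:
                    return False
            self._held.append((first, last, owner))
            return True

    def unlock(self, owner: str, first: int, last: int) -> None:
        """Release a range previously granted; raises ValueError if not held."""
        with self._mutex:
            self._held.remove((first, last, owner))


if __name__ == "__main__":
    table = BlockRangeLockTable()
    print(table.try_lock("node1", 0, 99))     # first request for this range
    print(table.try_lock("node2", 50, 149))   # overlaps node1's range
    print(table.try_lock("node2", 100, 199))  # disjoint from node1's range
```

Every lock request is a round trip between nodes, which is one reason this class of cluster depends on a fast interconnect.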

Class C clusters, called Performance Clusters (figure SS3e4), are characterized by the operating system's ability to make full use of the distributed lock manager. They allow cluster file operation, in which multiple nodes can perform I/O operations and access files concurrently on the same disk. Any node can access any device. Other advantages include:
· Simplified system management, since the cluster is seen as a single disk system
· The ability to use LAN/WAN as an interconnect, allowing long distance or remote disaster protection
· Very large configurations

A distributed lock manager is a hardware-based technology that lets clustered machines exchange memory information more quickly than they could over conventional networking architectures such as Ethernet or FDDI.

As clustering of servers becomes feasible, it becomes possible to design storage architectures that give access to nearly unlimited amounts of storage with increased reliability and performance.

By: Farid Neema

This Paper was produced by:
PERIPHERAL CONCEPTS, INC.
351 Hitchcock Way, Suite #B-200
Santa Barbara, California, 93105
Tel: (805) 563-9491
fneema@periconcepts.com