
Creating a Resilient Windows Infrastructure
Proper implementation of clustering technologies can help ensure that your systems are always available.
by Danielle Ruest and Nelson Ruest

August 17, 2004

For This Solution: Windows Server 2003 Web, Standard, Enterprise, or Datacenter Editions; Windows Server 2003 Enterprise or Datacenter Editions; Server Cluster Service; Network Load Balancing Service

Business continuity, or disaster recovery as it is sometimes called, is made up of two different components: components that have little to do with each other, but that must be pieced together to ensure that your systems are always up and running.

The first is a proper backup strategy. The key to a backup strategy is not so much how backups are performed, but rather how complicated and time-consuming system or file restorations will be when and if you need them. Too many organizations put in place highly complex backup systems that cost an arm and a leg, only to find out when the time comes that they can't actually restore the data they need. Worse, many deploy complex backup technologies that don't really speak the Windows language: they rely on Windows' built-in backup tool, NTBackup.exe, to create the original backup to file in the first place so they can transport that file to a central location. It goes without saying that an expensive product that must rely on a free backup tool is only as good as the free tool itself.

The second core element of a resiliency or continuity strategy is focused on availability, or making sure systems are available to respond to demand when they are required. One of the best ways to ensure availability is to use clustering technologies, that is, use several servers to perform the same function. If one of the servers fails, another picks up the request and users are usually completely unaware of the failure.

There are several ways to put this type of resiliency in place. You can purchase special hardware and software and run a dedicated cluster service, or you can use the clustering technologies that are part and parcel of Windows Server 2003. Microsoft offers three clustering technologies in all, two of which are built into the operating system (see Figure 1). The two services offered by the operating system are focused on basic networking services: the Network Load Balancing (NLB) Service and the Microsoft Cluster Service (MSCS). The third, Component Load Balancing, is provided by Microsoft Application Center Server 2000. The three map to the three tiers of a modern Web service: presentation, logic, and data.

Presentation is provided by the NLB service because this service is designed to automatically redirect any client request to the first available server in the cluster. Since any server can provide the service, they usually have identical configurations and store identical content. Business logic or COM+ processing is provided by the Component Load Balancing service. Because the servers all process the same logic with the same parameters, once again the servers tend to be identical. The third layer deals with data storage, and depending on the configuration it might or might not require servers to be identical. Here, the clustering service determines which machines in the cluster respond to client requests.

Windows Server 2003 provides clustering services for the first and the last tier. Both have been available since Windows NT and were much improved in Windows 2000 Server, but they weren't widely available until the release of Windows Server 2003. The NLB service, for example, is now available in all editions of Windows Server, from the Web to the Datacenter edition. NLB supports up to 32 servers per cluster and can be expanded even further through the Domain Name System (DNS) round-robin technique, in which several IP addresses are made available to respond to requests; once an address has been used, it moves to the bottom of the list so that the next request is fulfilled by the second address, and so on. Together, these techniques provide a comprehensive response capability for any network-based service.
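The DNS round-robin rotation described above can be sketched in a few lines of Python. This is a conceptual illustration only; the addresses and function names are hypothetical and not part of any Windows API:

```python
from collections import deque

def make_resolver(addresses):
    """Return a resolver that hands out addresses round-robin style:
    the address just used rotates to the bottom of the list."""
    pool = deque(addresses)
    def resolve():
        addr = pool[0]
        pool.rotate(-1)  # move the used address to the bottom
        return addr
    return resolve

resolve = make_resolver(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([resolve() for _ in range(4)])
# → ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Each client request lands on the next address in the list, so load spreads evenly across the pool without any coordination between the servers themselves.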

For the last tier, Windows Server offers the Clustering Service, which is available through both the Enterprise and the Datacenter editions. This service can bring together from one to eight servers or nodes depending on the hardware you run it on. It is much more focused on data-driven applications such as the file service, the print service, Microsoft SQL Server, or Exchange Server.

Clustering 101
There are two types of clusters: shared-nothing and shared-everything. Shared-nothing means that although all servers or cluster nodes can own shared cluster resources, only one node can manage these resources at any one time. In a three-node cluster with nodes A, B, and C, if node C is currently running the SQL Server service, it has full access to the data stored on the shared disk. While node C is running this service, no other node can access the SQL Server data unless node C relinquishes its hold on it. This happens only if node C fails and the service is picked up by another node, which then becomes the exclusive owner of the data.
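As a conceptual illustration of the shared-nothing ownership rule (a toy model, not the actual MSCS implementation; all names here are hypothetical):

```python
class SharedNothingCluster:
    """Toy model of shared-nothing ownership: exactly one node owns a
    resource at a time; ownership moves only when the owner fails."""
    def __init__(self, nodes, owner):
        self.nodes = list(nodes)
        self.owner = owner

    def access(self, node, resource="sql-data"):
        if node != self.owner:
            raise PermissionError(f"{node} cannot touch {resource}; owned by {self.owner}")
        return f"{node} reads/writes {resource}"

    def fail(self, node):
        self.nodes.remove(node)
        if node == self.owner and self.nodes:
            self.owner = self.nodes[0]  # another node picks up the service

cluster = SharedNothingCluster(["A", "B", "C"], owner="C")
print(cluster.access("C"))  # node C owns the SQL Server data
cluster.fail("C")           # C fails; ownership fails over
print(cluster.owner)        # → A
```

Until the failover occurs, any attempt by node A or B to touch the data raises an error, which is exactly the exclusivity the shared-nothing model enforces.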

In shared-everything clusters, all nodes can own and access the shared resources at the same time. For example, in the case of a disk-shared resource, the shared-everything model requires some form of Distributed Lock Manager because it must queue requests to resources and process them as they come up in the queue.

In addition, clusters can be configured as active/active or active/passive. The former means that each node in the cluster is doing something. In our example, node A might be running an instance of SQL Server at the same time as node B, and node C can be running Microsoft Exchange but be ready to help pick up the slack in the event of a failure on another node. Active/passive clusters are more expensive to implement because they consist of one node running a service while another is waiting in standby mode for the service to fail or be transferred to it. You can use multiple configurations that include both active/active and active/passive services. It all depends on the number of nodes in your server cluster. For example, in a three-node cluster, you could configure node A to run SQL Server and node B to run Exchange while node C is passively waiting to run in the event of a failure of either node A or B. The more nodes in your cluster, the more complex your cluster configuration can become.

MSCS supports the shared-nothing model. While MSCS can be configured as an active/active cluster or active/passive cluster, each instance of the same service must have access to its own resources. For example, SQL Server on a cluster must have access to its own database. Therefore, you can use an MSCS cluster to run multiple instances of SQL Server, but each instance will provide a completely different service because each has exclusive access to its database. On the other hand, each instance can automatically provide failover for each of the others.

In a shared-everything cluster, each node has access to the same resources, but because a disk drive can only process one request at a time, each node must queue its resource access requests. This means that there is a potential bottleneck in this type of configuration. Your shared-everything cluster will only be as good as the Distributed Lock Manager included in your cluster service. The advantage is clear, however. Because each node has access to the same resources, every node can run exactly the same services. In this case, each node running SQL Server can run against the same database. This should result in even fewer outages because you don't have to wait for the clustering service to start a failed service on another node; each node is already running the same service on the same database.
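The queuing behavior at the heart of a shared-everything cluster can be sketched as a toy model. The class and method names are hypothetical, and real Distributed Lock Managers are far more sophisticated:

```python
from collections import deque

class DistributedLockManager:
    """Toy shared-everything disk: every node may request the resource,
    but requests are queued and served strictly one at a time."""
    def __init__(self):
        self.queue = deque()

    def request(self, node, op):
        self.queue.append((node, op))

    def process_next(self):
        node, op = self.queue.popleft()  # one disk request at a time
        return f"{node}: {op}"

dlm = DistributedLockManager()
dlm.request("A", "read page 7")
dlm.request("B", "write page 7")
print(dlm.process_next())  # → A: read page 7
print(dlm.process_next())  # → B: write page 7
```

The sketch makes the potential bottleneck visible: no matter how many nodes submit requests, throughput is bounded by how fast the single queue drains.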

The Microsoft Cluster Service
Windows Server 2003's Cluster Configuration Wizard makes setting up a cluster easier than in Windows 2000 (see Figure 2). The first thing the wizard does is ensure that you have the proper configuration to create your cluster. In addition, the cluster service is installed automatically on both the Enterprise and the Datacenter editions. In Windows 2000, you had to add the cluster service after setting up and rebooting the server; in Windows Server 2003, any machine running these editions can be set up as a cluster node quickly.

Cluster configurations vary from one to eight nodes depending on whether you use Small Computer Systems Interface (SCSI) or Fibre Channel to connect your servers to the central or shared data store. Clusters connected through SCSI can include up to two nodes. Clusters connected through Fibre Channel can include up to eight nodes, depending on the operating system and the storage technology. Table 1 outlines how many nodes each version will support. If your cluster runs more than one operating system, the number of nodes is limited to the capability of the oldest operating system in the cluster. The cluster service is usually used for multiple-node configurations, but Windows Server also supports a single-node cluster, containing only one server with direct-attached rather than shared storage. Single-node clusters have their uses and prove to be a valuable addition to the service (see the sidebar, "Single-Node Clusters").

Setting up a cluster is no more complicated than launching the Cluster Administration console (Start Menu | Administrative Tools | Cluster Administrator) and selecting the option to create a new cluster. Of course, you have to do some preparation first. The very core of an MSCS cluster is the shared resource: in this case, a shared disk drive that must be set up and prepared before you create the cluster. There are several possible configurations, but most commonly the server nodes are set up with just enough direct-attached storage for the operating system and are linked to a shared storage resource that the cluster service will use. Windows Server 2003 also supports clusters that don't include any direct-attached storage.

Because MSCS uses the shared-nothing model, only one node may access a shared resource at a time. This means you must properly design how the cluster will administer the services it will host and configure shared disk resources appropriately. In fact, you might have to create multiple drives; it's common to dedicate one to each of the different services the cluster will host (see Figure 3). Once each disk resource is created, you must link it to each of the servers that will act as nodes within the cluster. The tricky part is to make sure that once a disk has been formatted by one node, you only link it to the other nodes in Disk Manager and don't choose to reformat it.

Once this is done, you can create the cluster and begin to create clustered services. This is another area where Microsoft made improvements recently. In Windows 2000, for example, creating a print server cluster was a convoluted process. Drivers needed to be installed on each node and the shared service had to be prepared just the right way. In Windows Server 2003, all you have to do is create the shared spooler service on one node, connect to the virtual server, install and configure the printer on the first node, and then force a failover to the other nodes. This automatically installs all printer drivers on the other nodes. The same idea applies to other services the cluster can support. All are simpler to install and manage than before. Cluster administration can be performed through the Cluster Administration console. In addition, the cluster.exe command lets you perform all configuration and administration tasks from the command line. Overall, the cluster service is a major improvement that makes it much easier for any organization to use this technology.

The Network Load Balancing Service
Server clusters are designed to support complex services that must access a single shared storage resource, which makes them ideally suited to file, print, database, e-mail, and other disk-intensive services. Network Load Balancing (NLB), on the other hand, is designed to serve resources that are much more static in nature. Once again, the NLB service in Windows Server 2003 has been enhanced to make it easier and more accessible to organizations of all sizes. Like the cluster service, it is installed by default. This means that to create a cluster, all you need to do is launch the NLB Manager, right-click on NLB Clusters, and select New Cluster. This launches the NLB Cluster Wizard. Provide a name for the cluster, its IP address, and the cluster communication mode. Unicast is best with at least two network interface cards (NICs) in each server, though Windows Server also supports multicast. The latter should be used only when a single network card is available and only if your routing hardware supports multicasting. Ideally, all the nodes in your NLB cluster will have at least two NICs.

This is also where you configure remote administration for the cluster. Ideally, you will not enable this service. NLB cluster administration in Windows Server should be performed through the NLB Manager graphical interface and not through the nlb.exe command line interface because the latter only stores a single password for the administration account. This could cause a security breach on your NLB clusters. Once the virtual name for the cluster has been created, you can add additional IP addresses for the cluster, and add up to 32 nodes to the cluster.

NLB provides two clustering scenarios. The first is load balancing: each machine responds to client requests as they come in, and when one machine is busy, another takes up the load. In this configuration, each node should handle roughly the same number of clients, which means the machines should be as similar as possible. If they are identical, you can use disk images and the sysprep.exe command to duplicate the servers. You can also use NLB clusters for availability. In this case, you create the cluster with machines that are not identical and set a different priority for each machine, giving the highest priority (one being the highest) to the most powerful machine. Because this machine is more powerful than the others, it will always respond to requests unless it is down. Requests are then served by the second most powerful machine, for which you have set the second-highest priority, and if it fails as well, by the third machine, and so on.
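The availability scenario's priority-based selection can be modeled in a short sketch. The node names and data structure are hypothetical, for illustration only:

```python
def serve_request(nodes):
    """Pick the responding node in an availability-style NLB cluster:
    the lowest priority number wins, skipping nodes that are down."""
    live = [n for n in nodes if n["up"]]
    if not live:
        raise RuntimeError("no nodes available")
    return min(live, key=lambda n: n["priority"])["name"]

nodes = [
    {"name": "big-box",    "priority": 1, "up": True},
    {"name": "medium-box", "priority": 2, "up": True},
    {"name": "small-box",  "priority": 3, "up": True},
]
print(serve_request(nodes))  # → big-box
nodes[0]["up"] = False       # the most powerful machine fails
print(serve_request(nodes))  # → medium-box
```

As long as the highest-priority machine is up, it answers every request; the lower-priority machines only step in, in order, as the ones above them fail.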

The major advantage of any clustering service is that you can continue to service requests even when performing maintenance on one of the nodes. This is a great advantage, especially with the proliferation of patches and software updates we face today.

Putting It All Together
How do you choose one clustering mode over another? The best way is to focus on the service. Disk-intensive tasks with changing data should use the cluster service; tasks with static data should use NLB. For example, if you want to put together a comprehensive solution for the support of Terminal Services in your network, you would use both services. Use NLB to support the Terminal Services sessions your users open when they connect to the service. NLB is ideal here because all servers run identical software and all sessions are stored in RAM. But to provide a more refined user experience and allow users to disconnect from a session and reconnect to the appropriate session in the cluster, you need to run the Terminal Services Session Directory. Because the Session Directory stores data about each open session in a database, it requires the server cluster service. By using a server cluster, you ensure that the Session Directory service is always available and no one loses their connection to an open Terminal Services session. In addition, the server cluster can host the Licensing service for Terminal Services, so both of the Terminal Services components that require a database are made fully available (see Figure 4). The cluster could also run the file and print services users require when working through Terminal Services.

Another perfect example of a creative cluster implementation is the metropolitan cluster. In Windows Server 2003, the Microsoft Cluster Service supports a new type of shared configuration disk, or quorum, called a Majority Node Set. Normally, a cluster must use a shared disk, the quorum disk, to store information about the configuration and operation of the cluster. In a Majority Node Set, the quorum can be distributed across different physical disks that are not shared. Data is then replicated from one disk to the others to maintain the consistency of the cluster. Because the disks are separate and because Fibre Channel connections can span up to 10 km, organizations can set up metropolitan clusters, or clusters whose nodes are in separate physical locations (see Figure 5). This goes a long way toward supporting full business continuity: one or more sites can fail while the service remains available to users.
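The majority rule that gives the Majority Node Set its name is simple enough to sketch. This is a conceptual illustration of the rule of thumb, not the actual quorum algorithm:

```python
def has_quorum(total_nodes, live_nodes):
    """Majority Node Set rule of thumb: the cluster keeps running only
    while more than half of the configured nodes are still up."""
    return live_nodes > total_nodes // 2

# A four-node metropolitan cluster split across two sites:
print(has_quorum(4, 3))  # one node lost → True, cluster stays up
print(has_quorum(4, 2))  # even split   → False, cluster halts
```

This is why an even split between two sites is dangerous: neither half holds a majority, so planners often place an odd number of nodes, or a tie-breaking node at a third site.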

As you can see, the potential for business continuity with the cluster services provided by Windows is enormous. All you need to do is use a little creativity and ingenuity when designing your clustered services and your help desk will no longer get downed-service calls.

About the Author
Danielle Ruest and Nelson Ruest (MCSE, MCT) are multiple book authors focusing on systems design, administration, and management. They run a consulting company that concentrates on IT infrastructure architecture and change and configuration management. You can reach them at .