


Clustering and failover for CF and SQL Server

We were recently working on a site whose traffic was growing rapidly. How should it be clustered, and why? In putting together our recommendation we took into account the anticipated increase in traffic volume, and we also tried to avoid distributing assets too widely across the network. By avoiding Network Attached Storage (NAS) or a Storage Area Network (SAN), we believe you can achieve both maximum efficiency and maximum redundancy.

 

Starting with two physical ColdFusion servers and two physical SQL Servers, the failure of any one machine will not cause a total loss of service. If we used a NAS or SAN device, that device could become a single point of failure. The only complication of having two combined Web/ColdFusion/File servers is making sure that both systems are fully replicated. If this is seen as too big a job, we could easily add a file server connected via Gigabit Ethernet; however, that would once again introduce a single point of failure should the file server fail. There is software available to automate copying code across a server farm (more on that in a later blog post).
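To make "fully replicated" a little more concrete, here is a minimal one-way mirror written in Python. It assumes Server 1 holds the master copy of the code and that Server 2's web root is reachable as a local or mounted path. The function name `mirror_tree` is hypothetical, and this is only a sketch, not the replication software mentioned above: it never deletes files on the target and does nothing about locking or half-copied files.

```python
import filecmp
import os
import shutil


def mirror_tree(src, dst):
    """One-way mirror: copy new or changed files from src into dst.

    Files that exist only in dst are left alone (no deletions).
    Returns the sorted relative paths that were copied.
    """
    filecmp.clear_cache()  # avoid stale comparison results from earlier runs
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_dir = os.path.relpath(root, src)
        target_dir = os.path.join(dst, rel_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source_file = os.path.join(root, name)
            target_file = os.path.join(target_dir, name)
            # Copy when the target is missing or its bytes differ from the source.
            if not os.path.exists(target_file) or not filecmp.cmp(
                source_file, target_file, shallow=False
            ):
                shutil.copy2(source_file, target_file)  # copy2 also preserves timestamps
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

This could be run from a scheduled task on Server 1, for example `mirror_tree(r"D:\wwwroot", r"\\server2\wwwroot")` (both paths are hypothetical). A second run with no changes copies nothing, so it is cheap to run frequently.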

 

One other major point relates to clustering. The software clustering in ColdFusion cannot handle fail-over from server to server: if Server 1 of a two-server cluster fails, all users on Server 1 lose their sessions and any work in progress. In addition, without a hardware clustering device there is no effective way to share incoming traffic between the two ColdFusion servers. To deal with this we recommended a hardware clustering device (a load balancer) that handles all incoming traffic and whose main job is to send users to Server 2 should Server 1 fail.
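A toy model may help show why sessions are lost even when the hardware device fails over correctly: the device keeps each user "sticky" to one server and reroutes them when that server dies, but the session data itself lived in the failed server's memory. The classes `CFServer` and `LoadBalancer` are hypothetical illustrations; a real device detects failures with its own configurable health checks rather than a `healthy` flag.

```python
from dataclasses import dataclass, field


@dataclass
class CFServer:
    name: str
    healthy: bool = True
    # Session data lives in this server's memory only (no replication).
    sessions: dict = field(default_factory=dict)


class LoadBalancer:
    """Toy model of a load balancer with sticky sessions and failover."""

    def __init__(self, servers):
        self.servers = servers
        self.affinity = {}  # session id -> name of the server it is pinned to
        self._next = 0      # round-robin counter for new or rerouted sessions

    def route(self, session_id):
        by_name = {s.name: s for s in self.servers}
        pinned = by_name.get(self.affinity.get(session_id))
        if pinned is not None and pinned.healthy:
            return pinned  # keep the user on the server that holds their session
        healthy = [s for s in self.servers if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy servers")
        # New session, or its server failed: pick the next healthy server.
        server = healthy[self._next % len(healthy)]
        self._next += 1
        self.affinity[session_id] = server.name
        return server


if __name__ == "__main__":
    s1, s2 = CFServer("cf1"), CFServer("cf2")
    lb = LoadBalancer([s1, s2])
    server = lb.route("user-a")          # first pick goes to cf1
    server.sessions["user-a"] = {"cart": 3}
    s1.healthy = False                   # Server 1 fails
    server = lb.route("user-a")          # rerouted to cf2
    print(server.name, server.sessions.get("user-a"))
```

After the failure the user is served by `cf2`, but `cf2` has no record of their session, which is exactly the loss of work described above.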

 

We are also taking into account another good practice in database design and use. There are two distinct uses for databases:

 

OLTP: Online Transaction Processing. This is what we would characterize as the ongoing, day-to-day functional copy of the database. It is where data is added and updated but never overwritten or deleted. This is what most companies have, and what they use for everything, including their reporting needs, which are typically heavy.

 

OLAP: Online Analytical Processing. In this model data is stored in a format that enables efficient data mining and report generation. An OLAP design should accommodate reporting on very large record sets with little degradation in operational efficiency.
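A small illustration of the difference between the two uses described above, using Python's built-in `sqlite3` module as a stand-in for SQL Server. The table names, columns and sample rows are all hypothetical. The OLTP side records individual orders in short transactions; the OLAP-style side periodically builds a pre-aggregated summary table that reports can read without scanning every transaction row.

```python
import sqlite3


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_day TEXT, "
        "product TEXT, amount REAL)"
    )
    # OLTP: many small writes, each committed as its own transaction.
    sample_orders = [  # hypothetical sample data
        ("2008-05-26", "widget", 10.0),
        ("2008-05-26", "widget", 15.0),
        ("2008-05-26", "gadget", 7.5),
        ("2008-05-27", "widget", 20.0),
    ]
    for day, product, amount in sample_orders:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (order_day, product, amount) VALUES (?, ?, ?)",
                (day, product, amount),
            )
    return conn


def refresh_summary(conn):
    # OLAP-style: rebuild a denormalised summary table shaped for reporting.
    with conn:
        conn.execute("DROP TABLE IF EXISTS sales_by_day")
        conn.execute(
            "CREATE TABLE sales_by_day AS "
            "SELECT order_day, product, COUNT(*) AS orders, SUM(amount) AS total "
            "FROM orders GROUP BY order_day, product"
        )


if __name__ == "__main__":
    conn = build_demo()
    refresh_summary(conn)
    for row in conn.execute("SELECT * FROM sales_by_day ORDER BY order_day, product"):
        print(row)
```

In a real OLAP setup the summary would usually be much richer (cubes, star schemas), but the principle is the same: reports read data that has already been shaped for them.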

 

Our proposal involves two SQL Server databases operating in an active-passive configuration for OLTP work, the ongoing adding and updating of data. Database Server 1 will handle all OLTP traffic unless it fails, in which case all traffic will go to Database Server 2, which will hold an up-to-date replicated copy. Database Server 2 will also act as a read-only reporting server in an OLAP role; if Database Server 2 fails, its traffic can be sent to Database Server 1, which will handle both OLTP and OLAP duties until Database Server 2 recovers.
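The routing rules in this proposal are simple enough to write down as a single decision function. This is a sketch only: `Workload` and `choose_db` are hypothetical names, and in practice the decision might live in the application's datasource selection or in SQL Server's own failover tooling. It also assumes Database Server 2 can be made writable when it takes over OLTP duties.

```python
from enum import Enum


class Workload(Enum):
    OLTP = "oltp"         # ongoing inserts and updates
    REPORTING = "report"  # read-only, OLAP-style queries


def choose_db(workload, db1_up, db2_up):
    """Return which database server should receive this workload.

    DB1 is the active OLTP server; DB2 holds a replicated copy and serves
    reports. Each server covers for the other if it goes down.
    """
    if not (db1_up or db2_up):
        raise RuntimeError("both database servers are down")
    if workload is Workload.OLTP:
        return "DB1" if db1_up else "DB2"
    return "DB2" if db2_up else "DB1"
```

With both servers up, writes go to DB1 and reports to DB2; lose either one and the survivor takes both workloads.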

Using this method there is good redundancy and failover coverage, as well as some distribution of load for reporting needs.

 


Comments
I may be off base here, but I am 90% sure that with a ColdFusion cluster you can activate persistent sessions that allow all the servers in the cluster to access the session information for all users (a J2EE cluster). I remember reading about this in several ColdFusion books and never had a chance to finish playing around with it before moving to my new job.

While hardware is probably the optimal method, I am pretty sure you can get the same solution (not as simply) with Windows IP-address sharing and a cluster with persistent sessions. It might take a little more work, but it is free.
# Posted By Daniel Sellers | 5/27/08 7:20 PM
CF7+ supports session replication, as Daniel points out. It works well with small clusters, but large ones will see heavy use of local network bandwidth and server RAM to handle constant session replication. This has come to light in a couple of blogs I've been reading and commenting on. The other downside is that a failover can take 20 to 30 seconds from the perspective of an active user who is waiting for a request to complete. The big win (for me) is that if a server dies, users won't get logged out, and their request and session will continue, albeit with a delay.

The solution, from what I've learnt from others, is to store the session scope in a database. (We're not talking about client vars.) So if a server goes down and users are bounced to another server, your code should discover that the user already has an active session and restore it from the db. I haven't tried it yet but hope to have a play.

Windows NLB seems to handle a server going down and picks up a request if another server doesn't respond. I'm not convinced a hardware load balancer is the only solution.

NAS/SAN is not necessarily a single point of failure because some models have redundant controllers, PSUs, fans, and of course having mirrored arrays takes care of a single drive failure.
# Posted By Gary Fenton | 5/27/08 10:44 PM
Forgot to say it's good to hear how others are handling clustering. There's not enough info or blogged experience out there about this subject. I wish Adobe would address this, as clustering is one of the big features of CF Enterprise and it's what some people are paying $$$$ extra for.
# Posted By Gary Fenton | 5/27/08 10:50 PM
I've done something like this recently. I posted a blog about it here: http://ericrandle.com/blog/pt/default.aspx?id=23&a...

Essentially we still used hardware for load balancing, but my major issue was replication. I'd be happy to answer any questions about my experience.
# Posted By Eric | 6/5/08 1:45 PM