CFUnited - High Availability & Clustering Presentation - Hands On Details

My presentation on HA-Clustering at CFUnited actually consisted of two parts.  Firstly there was the PowerPoint bit which I uploaded to the CFUnited server and then there was the practical part which I would like to overview here for those who could not make the event.

I had intended to use two VMware copies of Windows 2003 Enterprise (32-bit) with Windows Network Load Balancing (NLB) running as the web server clustering mechanism, ColdFusion Enterprise 8.01 and MySQL 5.0.  I attempted this on my Windows XP notebook with 3GB RAM, and after several attempts I gave up because I just could not get enough responsiveness to get through the presentation effectively.  However, I do intend to attempt this again in future.

Instead I reverted to running the tests on my notebook and the details are as follows:

 

  • System - Dell Inspiron E1705 with 3GB Ram running Windows XP. 
  • CF Version - ColdFusion Enterprise 8.01, installed in the multi instance manner with two instances clustered with RoundRobin algorithm.
  • Servlet Container - JRun 4 Updater 6.
  • JVM - Sun 1.6 (aka 6).
  • Web Server - Apache 2.0.59
  • Test Application - cfwhisperer blog lab copy based on Mango blog.
  • Load Test Tool - Paessler Web Stress Tool.
  • Server Utility - SeeFusion version 4.0.7


I created a load test script by simply browsing the lab copy of the cfwhisperer blog in the URL recorder inside the Paessler Web Server Stress Tool and saving the URLs.  This took about 5 minutes in total and gave me a very usable test script, which is one of the reasons I really like the Paessler tool.  I recorded 18 clicks, as I have found around 20 to be optimal for most test scenarios.  As my presentation was only 60 minutes long, I showed just one 10 minute test.

The test ran against the Apache web server, which I had connected to the two-instance CF cluster with the wsconfig GUI utility.  It was a 10 minute load test with 20 concurrent vUsers and an 8 second think time between clicks.  While the test was running I first stopped one instance, and we immediately observed response times climb from an average of 300 milliseconds to a peak of over 20 seconds and then fall back to 300 milliseconds.  This took around 30 seconds, and although response times went up, the cluster continued to respond to requests.  I left this for a couple of minutes and then restarted the stopped instance.  There was no similar slowdown as the instance came back, although we did see a slight rise to 600 milliseconds.  I then repeated the exercise by stopping the other instance and observed similar results.
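To put the load figures in perspective, here is a rough back-of-the-envelope estimate written as a small Python sketch.  It assumes a simple closed-loop model in which each vUser waits for a response, pauses for the think time and then clicks again; the tool's actual pacing may differ, and a single recorded click may fetch more than one URL, so treat the numbers as approximate.

```python
# Rough closed-loop load estimate for the test described above.
# Assumption: each virtual user waits for a response, thinks, then clicks again.

V_USERS = 20              # concurrent vUsers in the test
THINK_TIME_S = 8.0        # think time between clicks
RESPONSE_TIME_S = 0.3     # typical average response time observed
TEST_LENGTH_S = 10 * 60   # 10 minute test


def requests_per_second(users, think_s, response_s):
    """Each user completes one click every (think + response) seconds."""
    return users / (think_s + response_s)


rate = requests_per_second(V_USERS, THINK_TIME_S, RESPONSE_TIME_S)
total = rate * TEST_LENGTH_S

print(f"Approximate click rate: {rate:.2f} clicks/second")
print(f"Approximate clicks in the test: {total:.0f}")
```

Under those assumptions the cluster was handling somewhere around two to three clicks per second, which is a light load; the interesting part of the test was the Fail-Over behaviour rather than raw throughput.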

My intention is to at least repeat this at cfdevcon in Brighton, England in September, 2008 and hopefully extend it a bit if I can.

CFUnited - High Availability & Clustering Presentation - Part 2 - Hardware & Software Clustering

This is the second in a series of blog posts leading up to my presentation on Clustering at CFUnited on June 20, 2008 at 2:00PM.  Clustering has been available to us in two apparently different forms: Hardware Clustering and Software Clustering.  I say apparently different as some digging reveals interesting items, particularly as they relate to the current Software Clustering offered with ColdFusion-JRun.

Before looking more deeply into those issues, I wanted to lay out some basic differences between Hardware and Software Clustering.  In my view a Hardware Clustering system exists when there is a dedicated piece of hardware whose only job is to handle Clustering (Fail-Over and/or Load-Balancing).  One detail of note here: the Hardware Clustering device still needs Clustering Software to operate.

Software Clustering typically means that the Clustering Software is installed on an existing server, not a dedicated Hardware device.  One noted example of Software Clustering is Windows Network Load Balancing (aka NLB).  This is a bit of a misnomer, as this is a Clustering mechanism which includes Load-Balancing as well as Fail-Over.   Another example of Software clustering is the Clustering offered in ColdFusion-JRun which is actually based on J2EE standards.

If we dig a little deeper into the J2EE clustering used in ColdFusion and JRun since CF moved to Java with the MX version, we see what I would class as a subset of full Software Clustering.  The reason I believe this is that J2EE clustering, as applied to CF-JRun, is purely peer-to-peer with no overall service watching all Cluster members.  Typically we would have not only multiple ColdFusion server instances but also multiple web servers, or we would not have full redundancy.  Fail-Over is not covered at the CF-JRun level if a Web server fails; there has to be a higher level system to do this, such as Windows NLB or a Hardware Clustering device.  As an historical point, prior to MX, ColdFusion-JRun had a much more fully featured Software Clustering system called “ClusterCATS”, which did embrace the concept of a central monitoring-management service and Web server Fail-Over.  This was always tricky, though, if ColdFusion was deployed in the “distributed-mode” where the Web server and ColdFusion were on different physical devices.

The last consideration in this piece is when we would use Hardware Clustering, Software Clustering, or both.  Typically Hardware Clustering scenarios are much more robust than Software Clustering, as that is their only job in life.  Having said that, Windows NLB claims support for up to 32 servers.  Going back to the Allaire-ColdFusion clustering, ClusterCATS, our testing showed that 8 servers was a good point to consider Hardware Clustering.

Lastly, the overall term here is “Clustering” and within Clustering we have Fail-Over and Load-Balancing so the often used term “Load Balancing Device” should actually be “Clustering Device”.

CFUnited - High Availability & Clustering Presentation - Part 1 - An Overview Of Clustering

I will be presenting at CFUnited on Friday June 20, 2008, on the subject of High Availability (HA)-Clustering for ColdFusion/JRun applications and I intend to make this as practical as possible. Having spent many years travelling the world helping to fix slow or unresponsive ColdFusion applications, I see HA as a natural progression to this and in fact Load-Balancing, which is a part of Clustering, has a direct impact on improving performance.

There is a point, often overlooked by even the manufacturers of clustering/load balancing equipment. Clustering is the overall term which, in my opinion, applies whenever two items or more appear as one, to the users. In our world, that typically means multiple web servers, with multiple application and database servers.

With this aspect of Clustering there are two services which are a part of the Clustering; Fail-Over and Load-Balancing. In my experience Fail-Over is always present, meaning if one member of the Cluster fails the remaining members ensure that continuity of service is maintained. This is a prime function of a Cluster.

Load-Balancing is the apportioning of load around members of the Cluster; typically an even distribution of the load is what is required. The most even distribution is via Round-Robin, which means each single request moves around the Cluster members, like this (this example shows a 3 member Cluster):

  • USER 1 > REQUEST1 > CLUSTERMEMBER1

  • USER 2 > REQUEST1 > CLUSTERMEMBER1

  • USER 1 > REQUEST2 > CLUSTERMEMBER2

  • USER 2 > REQUEST2 > CLUSTERMEMBER2

  • USER 1 > REQUEST3 > CLUSTERMEMBER3

  • USER 2 > REQUEST3 > CLUSTERMEMBER3

  • USER 1 > REQUEST4 > CLUSTERMEMBER1

  • USER 2 > REQUEST4 > CLUSTERMEMBER1

This is the most evenly balanced Load Balancing algorithm and, as I mentioned above, is the Round-Robin algorithm. Problems can occur with that algorithm if there are user specific items in memory on one of the Cluster members, for instance in-memory session state variables. If USER1 has logged in to CLUSTERMEMBER1 above and their details are in session variables on CLUSTERMEMBER1, then when the user's next request takes them to CLUSTERMEMBER2 those in-memory session state variables will not be there. My preference for the optimal Load-Balancing algorithm is Round-Robin with Sticky Sessions. In that case a user “sticks” to a Cluster member as follows:

  • USER 1 > REQUEST1 > CLUSTERMEMBER1

  • USER 2 > REQUEST1 > CLUSTERMEMBER2

  • USER 1 > REQUEST2 > CLUSTERMEMBER1

  • USER 2 > REQUEST2 > CLUSTERMEMBER2

  • USER 1 > REQUEST3 > CLUSTERMEMBER1

  • USER 2 > REQUEST3 > CLUSTERMEMBER2

  • USER 1 > REQUEST4 > CLUSTERMEMBER1

  • USER 2 > REQUEST4 > CLUSTERMEMBER2

This is not quite as evenly balanced as Round-Robin alone, but unless there is a failure of one Cluster member the user will not lose their session state variables, and the load balances across all Cluster members eventually.
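For readers who like to see this in code, here is a minimal Python sketch of the two algorithms.  It is only an illustration and not how JRun or any particular clustering device implements them; the function names are made up for this example, and the plain Round-Robin version assumes a single counter shared by all incoming requests.

```python
from itertools import cycle

MEMBERS = ["CLUSTERMEMBER1", "CLUSTERMEMBER2", "CLUSTERMEMBER3"]


def round_robin(requests):
    """Send every request to the next member in turn, ignoring who sent it."""
    members = cycle(MEMBERS)
    return [(user, next(members)) for user in requests]


def sticky_round_robin(requests):
    """Pick a member round-robin for a user's first request, then reuse it."""
    members = cycle(MEMBERS)
    assigned = {}   # user -> member chosen on that user's first request
    routed = []
    for user in requests:
        if user not in assigned:
            assigned[user] = next(members)
        routed.append((user, assigned[user]))
    return routed


# Two users taking turns to make requests.
requests = ["USER1", "USER2", "USER1", "USER2", "USER1", "USER2"]

print("Round-Robin:")
for user, member in round_robin(requests):
    print(f"  {user} > {member}")

print("Round-Robin with Sticky Sessions:")
for user, member in sticky_round_robin(requests):
    print(f"  {user} > {member}")
```

In the first listing each user lands on a different member from one request to the next, which is where in-memory session variables get lost.  In the second, each user stays on the member chosen for their first request.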

This article serves as the first in a series of posts leading up to the CFUnited presentation and my next one will delve into differences between Hardware and Software Clustering.

ColdFusion stability, metrics, logs and garbage collection

I was talking with TeraTech’s server tuning consultant Mike Brunt about ColdFusion tuning and clustering which we did for a client to improve stability and performance.  I thought others might be interested in what we did with server metrics, threads, logs and garbage collection in the Java Virtual Machine (JVM). We edited two files, as follows... 

1/ {JRun4_root}\servers\default\SERVER-INF\jrun.xml

We edited lines 126 through 130: we uncommented this section (it is sometimes already uncommented) and made sure the attribute was set to true.

 

<!-- This Service provides metrics information -->
<!-- ================================================================== -->
<service class="coldfusion.server.jrun4.metrics.MetricsServiceAdapter" name="MetricsService">
    <attribute name="bindToJNDI">true</attribute>
</service>

 

We then edited the section from line 148 through 151 as follows...

 

<!-- You may also need to uncomment MetricsService if you want metrics enabled -->
    <attribute name="metricsEnabled">true</attribute>
    <attribute name="metricsLogFrequency">60</attribute>
    <attribute name="metricsFormat">Web threads (busy/total): {busyTh}/{totalTh} Sessions: {sessions} Total Memory={totalMemory} Free={freeMemory}</attribute>

 

We set the value in <attribute name="metricsEnabled">true</attribute> to true, which turns on Metrics Logging.  We made no more changes to this section, but it is good to note what the other attributes do.  <attribute name="metricsLogFrequency">60</attribute> sets the frequency of the metrics logging to 60 second intervals.  That can be changed, but in my experience 60 seconds is typically a good interval.

The line below that determines what is displayed.  {busyTh}/{totalTh} shows busy threads and total threads.  The busy value is the number of threads actually in use when the snapshot was taken; the total is those busy threads plus threads in every other state.  There is a good write up of the different thread states here - http://www.bpurcell.org/blog/index.cfm?mode=entry&entry=934.  As a very broad rule of thumb, the greater the difference between busy and total, the more efficiently the application is running.

Sessions: {sessions} shows the number of Java-J2EE sessions running which is enabled by turning on "Use J2EE Session Variables" in the Memory Variables section of CF Admin.  This will show you how many J2EE sessions are running every 60 seconds.  This is an indication of activity and this number will rise and fall as user sessions begin and end.

Total Memory={totalMemory} Free={freeMemory} is fairly self explanatory and shows the memory state at 60 second intervals.  Ideally we do not want to see free memory drop too low; I would get uncomfortable if it dropped below 75 MB or 10% of the total.  Also, we want to see any used memory being released as quickly as possible.
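Once metrics logging is on, the entries are easy to process with a script.  The following Python sketch parses lines written in the metricsFormat shown above and flags any snapshot where free memory is below 10% of the total.  The sample line and its prefix are hypothetical, the function names are made up for this example, and the check only assumes that total and free memory are reported in the same unit.

```python
import re

# Matches the metricsFormat string shown above.
METRICS_PATTERN = re.compile(
    r"Web threads \(busy/total\): (?P<busy>\d+)/(?P<total>\d+) "
    r"Sessions: (?P<sessions>\d+) "
    r"Total Memory=(?P<total_mem>\d+) Free=(?P<free_mem>\d+)"
)


def parse_metrics(line):
    """Return the metric values from one log line as ints, or None if no match."""
    match = METRICS_PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def low_memory(metrics, min_free_ratio=0.10):
    """True when free memory is below the given fraction of total memory."""
    return metrics["free_mem"] < metrics["total_mem"] * min_free_ratio


# Hypothetical sample line; any prefix before the metrics text is
# skipped by search().
sample = ("01/15 10:32:00 metrics Web threads (busy/total): 3/25 "
          "Sessions: 41 Total Memory=786432 Free=65536")

metrics = parse_metrics(sample)
print(metrics)
print("Low memory warning:", low_memory(metrics))
```

Run over a whole log file, a check like this makes it straightforward to spot the periods where free memory stayed low or where busy threads crept up towards the total.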

We next changed the way logs are generated, splitting them out by log level, by adding {log.level} to line 155

<attribute name="filename">{jrun.rootdir}/logs/{jrun.server.name}-{log.level}-event.log</attribute> 

This last change enables more targeted logs which will be written to {JRun4_root}\logs.

I recommend making these changes on all Production CFMX 6.1 systems and leaving them on; the information produced can be invaluable for troubleshooting.

 

2/ {JRun4_root}\bin\jvm.config 

This file is the main configuration file for the JVM and is loaded at start time.  After observing application behavior metrics from the logging changes above we applied the following changes to jvm.config...

-Xms768m -Xmx768m - This sets the start and maximum memory allotted to the heap at 768MB.  I typically set this number as high as is needed by the metrics logging and verbose garbage collection information.  In practical terms, on 32 bit Windows, it cannot be set any higher than 1.4GB. 

-XX:PermSize=128m -XX:MaxPermSize=128m - Here we added a start size for the permanent generation, which is where the classes for JRun-CF are stored.  We did this because, in the verbose garbage collection log, we observed this memory space being fully consumed many times.

-Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000 - These arguments generate an explicit Full Garbage Collection (Full GC) at 10 minute intervals (600000 milliseconds).  We applied these because we observed Full GCs taking place at one minute intervals and in some cases at 30 second intervals.  This is far too often, because when there is a Full GC everything else on the server stops.

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -verbose:gc -Xloggc:teratechgc.log - This argument set turns on logging of all garbage collections in a verbose manner and this is where we got our information for the memory settings along with the metrics information.  We turned this logging off after making the changes and checking that all was OK.
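To show how the Full GC frequency can be read out of the verbose log, here is a small Python sketch that measures the gaps between Full GC entries.  It assumes each collection is written on a line that starts with the number of seconds since JVM start, as -XX:+PrintGCTimeStamps produces; the exact line layout varies with JVM version and flag combination, and the sample lines below are hypothetical.

```python
import re

# Assumed line shape with -XX:+PrintGCTimeStamps: seconds since JVM start,
# a colon, then the collection type, e.g. "612.407: [Full GC ...".
FULL_GC = re.compile(r"^(?P<seconds>\d+\.\d+): \[Full GC")


def full_gc_intervals(lines):
    """Return the gaps in seconds between successive Full GC entries."""
    stamps = [float(m.group("seconds"))
              for m in (FULL_GC.match(line) for line in lines) if m]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Hypothetical excerpt, trimmed to the parts the pattern looks at.
sample_log = [
    "12.250: [GC 65536K->8192K(753664K), 0.0312000 secs]",
    "60.500: [Full GC 131072K->40960K(753664K), 0.8410000 secs]",
    "121.000: [Full GC 139264K->41984K(753664K), 0.9020000 secs]",
    "150.750: [GC 106496K->43008K(753664K), 0.0287000 secs]",
    "181.500: [Full GC 147456K->43520K(753664K), 0.8870000 secs]",
]

for gap in full_gc_intervals(sample_log):
    print(f"Full GC after {gap:.1f} seconds")
```

Gaps of around 60 seconds, as in this made-up excerpt, are the kind of pattern that led us to set the gcInterval arguments above.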

ColdFusion Internship

People often ask me about the ColdFusion internship at TeraTech, how does it work, who is it good for, why do we do it, etc. The Internship is a chance for people new to CF to learn the latest programming techniques hands on for 6 months from experienced programmers. Just as important is learning how to work well on a team and day to day tips of software development that you can't learn from a book. Interns start on small internal projects then as their skills progress they help out on real world projects. Interns often go on to get full time jobs at TeraTech or other tech companies.

We only accept people who have done some programming in some language before (it doesn't have to be ColdFusion) or who have used an older version of CF and want to step up to the latest version. These are often people in career transition or people who need current job experience to move into their dream career. Because of all the mentoring interns receive, the internship is on site at our Rockville MD office for 20-40 hours per week. Interns attend all TeraTech classes and events such as CFUnited at no charge.

So why do we do it? It is a way to give back to the community, a way to find and interact with potential new employees over the long term, and a way to get help on community websites such as mdcfug.org and cfconf.org. Also, interns often have lots of questions, and questions and mentoring are a great way to keep our programming fresh and up to date. There is nothing like teaching some concept to someone else to really understand it yourself!

Does anyone else have an internship at their organization, and how does your internship work?

TeraTech Webinar: Tune Your Problem Servers for Better Performance 1/31/08

TeraTech's CF Server guru Mike Brunt will be presenting a webinar on ColdFusion server performance tuning at 1pm EST on Thursday 1/31/08. The webinar will cover fixing slow servers, performance bottlenecks location and diagnosis tips.

Mike is planning to show how to parse logs, interpret metrics, understand how the JVM works, test JVM configurations and use SeeFusion to troubleshoot and isolate stability issues.

The webinar is Thursday January 31, 2008 at 1 pm Eastern (that is 12 pm Central, 11 am Mountain, or 10 am Pacific). It will be approximately 45 minutes including time for Q and A. Mike is a former Allaire and Macromedia consultant. I got a free registration code for readers of this blog which is ST102. The webinar is free except for the cost of a long distance phone call for the audio. You can register at https://www1.gotomeeting.com/register/451098722. Hope to see you there!

Radiology and Diagnosing CF Server Issues

I recently talked with Mike Brunt, CF Server tuning guru for TeraTech about how he approaches server issues. Heads up we are planning a webinar on server tuning on Thursday 1/31/08 1pm EST - more details coming soon.

For three years, from 1996 to 1999, I worked in the medical software world, in what is known as TeleRadiology, to be precise. This involved creating secured Wide Area Networks which spanned hospital groups and typically involved a central reading center where a group of Radiologists would read images which had been transmitted from each individual hospital. Radiologists would then send results back to the hospital which had sent the images to be read. As a point of interest, there are actually two main types of Radiology Images. The first are currently analog and are printed to physical film (x-rays are typical of that sort of Radiology Image) which then have to be scanned or digitized to high resolution images; a typical chest x-ray is around 12-14MB per image. The second overall kind of Radiology Image is already digital, MRI, CT, PET, Nuclear etc. I digressed a little there but thought that worthy of a little more explanation.

The Radiologists performed two main kinds of reads. The first was from the Emergency Room, where a quick response was imperative and where the Radiologist would be looking in the area of a known trauma or affliction, to deliver a targeted response to the ER Doctors. Very often, those were literally life and death situations. Afterwards, typically the following day, the Radiologist would go back to the image(s) and look for any other issues the patient might have. The other main place that Radiology holds in medicine is that of a major preventative discipline, often discovering ailments before they get too serious.

These principles could apply directly to server diagnostics; sadly, almost all server diagnostic issues stop in the "ER". When I worked for Allaire and then Macromedia, we had a team of 37 engineers who were often called in when an application was in ER, regularly failing or running very slowly. It was interesting to note that the reaction of most client companies, when faced with a poorly performing application, was to "throw" more hardware at the situation. This literally never worked because typically the "disease" was in the application code or in the overall support mechanisms-network, etc. Adding hardware in this situation is a bit like trying to give someone a second heart whilst the first diseased heart is still there; it won't work and the result will likely be terminal!

So in our training with Allaire, we were taught to go in and prevent "death" as quickly as we could, and we did, and then to perform that second read, just like the Radiologist, so that we could give clients a clear path to server health. In addition, we would always hope to leave the client with the ability to self-diagnose issues so as to prevent applications from being "life-threatening" in the future. Server diagnostics are an essential tool in ensuring ongoing application health if, as happens with applications such as MySpace, mercurial growth occurs or if there is simply a need to support structured growth. Applied correctly, server diagnostics will always create stability with scalability.

SQL Reserved word or not, that is the question

Here is a problem that we had on a recent project that we were doing maintenance on. The code had some queries against a table called "user". The code worked fine on the production server but gave an error on our development server. The fix? Put brackets around the table name in all the queries.

eg

select name from [user]

instead of

select name from user


We had different versions of SQL Server on the two boxes and user is now a reserved word on the development box version.

We figured out which words are reserved using Pete Freitag's reserved word tool at:

http://www.petefreitag.com/tools/sql_reserved_words_checker/?word=user
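If queries are being built in code, the bracketing can be done in one place.  Here is a minimal Python sketch of that idea; the function names and the short word list are made up for illustration, and the real reserved-word list depends on the SQL Server version in use.

```python
# Illustrative subset only; the real reserved-word list depends on the
# SQL Server version.
RESERVED_WORDS = {"user", "order", "group", "table"}


def quote_identifier(name):
    """Wrap a name in square brackets, doubling any closing bracket inside it."""
    return "[" + name.replace("]", "]]") + "]"


def safe_identifier(name):
    """Bracket the name only when it collides with a reserved word."""
    return quote_identifier(name) if name.lower() in RESERVED_WORDS else name


print(f"select name from {safe_identifier('user')}")
print(f"select name from {safe_identifier('customers')}")
```

A simpler option is to bracket every table and column name regardless, which avoids having to maintain a word list at all.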

What do you want to see in CF 9?

What do you want to see in ColdFusion 9 (codename Centaur)? Yes, Kevin Lynch, Chief Architect of Adobe, told the crowd of 4300 attendees at MAX the other week that work is already underway on the new version, and showed ColdFusion's place on the Adobe product roadmap. That puts to rest all that hullabaloo the other month from a few folks thinking that CF was dead! The silent majority know CF rocks and is alive! So what do you want to see in CF 9?

ColdFusion developer security guidelines

These ColdFusion developer security guidelines from Adobe are cool! And so much code that I review from other (best unnamed) organizations doesn't follow these simple tips. Check it out at the URL below and make sure that your apps are secure!

http://www.adobe.com/devnet/coldfusion/articles/dev_security/coldfusion_security.pdf
