ColdFusion stability, metrics, logs and garbage collection

I was talking with TeraTech’s server tuning consultant Mike Brunt about the ColdFusion tuning and clustering we did for a client to improve stability and performance. I thought others might be interested in what we did with server metrics, threads, logs and garbage collection in the Java Virtual Machine (JVM). We edited two files, as follows...

1/ {JRun4_root}\servers\default\SERVER-INF\jrun.xml

We edited lines 126 through 130. We uncommented this section (it is sometimes already uncommented) and made sure bindToJNDI was set to true.

 

<!-- This Service provides metrics information -->

<!-- ================================================================== -->

<service class="coldfusion.server.jrun4.metrics.MetricsServiceAdapter" name="MetricsService">

    <attribute name="bindToJNDI">true</attribute>

</service>

 

We then edited the section from line 148 through 151 as follows...

 

<!-- You may also need to uncomment MetricsService if you want metrics enabled -->

    <attribute name="metricsEnabled">true</attribute>

    <attribute name="metricsLogFrequency">60</attribute>

    <attribute name="metricsFormat">Web threads (busy/total): {busyTh}/{totalTh} Sessions: {sessions} Total Memory={totalMemory} Free={freeMemory}</attribute>

 

We set <attribute name="metricsEnabled">true</attribute> to true, which turns on metrics logging. We made no more changes to this section, but it is good to note what the other attributes do. <attribute name="metricsLogFrequency">60</attribute> sets the frequency of the metrics logging to 60-second intervals. That can be changed, but in my experience 60 seconds is typically a good interval.

The line below that determines what is displayed. {busyTh}/{totalTh} shows the busy threads and the total threads. The busy value is the number of threads actually in use when the snapshot was taken; the total is those busy threads plus the threads in every other state. There is a good write-up of the different thread states here - http://www.bpurcell.org/blog/index.cfm?mode=entry&entry=934. As a very broad rule of thumb, the greater the difference between busy and total, the more efficiently the application is running.

Sessions: {sessions} shows the number of J2EE sessions running. This is enabled by turning on "Use J2EE Session Variables" in the Memory Variables section of CF Admin, and it reports how many J2EE sessions are running every 60 seconds. This is an indication of activity, and the number will rise and fall as user sessions begin and end.

Total Memory={totalMemory} Free={freeMemory} is fairly self-explanatory and shows the memory state at 60-second intervals. Ideally we do not want to see free memory drop too low; I would get uncomfortable if it dropped below 75 MB or 10% of the total. We also want to see any used memory being released as quickly as possible.
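As a rough illustration of how these metrics lines could be checked automatically, here is a minimal Python sketch. It assumes the log line contains the metricsFormat text shown above, and that the memory values are reported in kilobytes; both the sample line and the units are assumptions, so check them against your own logs. The function names are hypothetical.

```python
import re

# Pattern mirrors the metricsFormat attribute shown above. The prefix and
# numeric layout of a real log line are assumptions.
METRICS_RE = re.compile(
    r"Web threads \(busy/total\): (?P<busy>\d+)/(?P<total>\d+)\s+"
    r"Sessions: (?P<sessions>\d+)\s+"
    r"Total Memory=(?P<total_mem>\d+)\s+Free=(?P<free_mem>\d+)"
)


def parse_metrics(line):
    """Return the metrics fields as ints, or None if the line has no metrics."""
    match = METRICS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def low_memory(metrics, floor_kb=75 * 1024, floor_fraction=0.10):
    """True when free memory is under 75 MB or under 10% of the total.

    Assumes memory values are in kilobytes (an assumption, not confirmed
    by the article).
    """
    free, total = metrics["free_mem"], metrics["total_mem"]
    return free < floor_kb or free < total * floor_fraction


# Hypothetical sample line: 768 MB total, 60 MB free.
sample = ("01/15 10:00:00 metrics Web threads (busy/total): 3/25 "
          "Sessions: 42 Total Memory=786432 Free=61440")

metrics = parse_metrics(sample)
if metrics is not None and low_memory(metrics):
    print("Free memory is low:", metrics["free_mem"], "KB")
```

Run against each line of the metrics log, a check like this flags the snapshots where free memory falls under the comfort level described above.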

We next edited the way logs are generated, splitting the logs out by level by adding {log.level} to line 155:

<attribute name="filename">{jrun.rootdir}/logs/{jrun.server.name}-{log.level}-event.log</attribute> 

This last change enables more targeted logs which will be written to {JRun4_root}\logs.

I recommend making these changes on all Production CFMX 6.1 systems and leaving them on; the information produced can be invaluable for troubleshooting.

 

2/ {JRun4_root}\bin\jvm.config 

This file is the main configuration file for the JVM and is loaded at start time. After observing application behavior in the metrics from the logging changes above, we applied the following changes to jvm.config...

-Xms768m -Xmx768m - This sets the start and maximum memory allotted to the heap to 768MB. I typically set this number as high as the metrics logging and verbose garbage collection information show is needed. In practical terms, on 32-bit Windows, it cannot be set any higher than 1.4GB.

-XX:PermSize=128m -XX:MaxPermSize=128m - Here we added a start size for the permanent generation, which is where the classes for JRun-CF are stored. We did this because the verbose garbage collection log showed this memory space being fully consumed many times.

-Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000 - These arguments generate an explicit Full Garbage Collection (Full GC) at 10-minute intervals (600000 milliseconds). We applied these because we observed Full GCs taking place at one-minute intervals and in some cases at 30-second intervals. This is far too often, because when there is a Full GC, everything else on the server stops.

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -verbose:gc -Xloggc:teratechgc.log - This set of arguments turns on verbose logging of all garbage collections. This log, along with the metrics information, is where we got the information for our memory settings. We turned this logging off after making the changes and checking that all was OK.
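To spot Full GCs that happen too often, the gaps between them can be measured from the GC log. The Python sketch below is one way to do that. It assumes each Full GC entry starts with the JVM uptime in seconds followed by "[Full GC", which is how -XX:+PrintGCTimeStamps output commonly looks; the exact layout varies by JVM version, and the sample lines and function name here are hypothetical.

```python
import re

# Assumes a Full GC entry begins with the JVM uptime in seconds,
# e.g. "60.000: [Full GC ...". Real output varies by JVM version.
FULL_GC_RE = re.compile(r"^(?P<uptime>\d+\.\d+): \[Full GC")


def full_gc_intervals(lines):
    """Return the seconds elapsed between consecutive Full GC events."""
    stamps = []
    for line in lines:
        match = FULL_GC_RE.match(line)
        if match:
            stamps.append(float(match.group("uptime")))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Hypothetical, abbreviated log lines for illustration only.
sample_log = [
    "12.500: [GC 9350K->5382K(32704K), 0.0033 secs]",
    "60.000: [Full GC 20480K->10240K(786432K), 1.2500 secs]",
    "120.500: [Full GC 21000K->10300K(786432K), 1.3100 secs]",
    "150.500: [Full GC 20900K->10280K(786432K), 1.2900 secs]",
]

for gap in full_gc_intervals(sample_log):
    print("Seconds since previous Full GC:", gap)
```

Gaps of 30 to 60 seconds, as in this made-up sample, would match the behavior we saw before applying the 10-minute interval settings.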

ColdFusion Cluster Server Tuning training and consulting

TeraTech is helping a large organization in the central US learn how to tune their cluster of ColdFusion servers. We are flying to their location to analyze their current server issues so that the training will be customized to their exact set up and issues.

On clustering we will examine CF8 clustering options - pros and cons, how to set up and administer the cluster, deploying CF/code base options, Flex in a clustered environment and understanding how to integrate source control (such as Subversion) into the environment. We will also look at typical "gotchas" in a CF cluster, maintenance concerns and planning for the future (scaling).

Then we will look at Performance Tuning, including JVM optimization, CF Admin settings, best practices, performance monitoring and determining hardware needs for the projected load.

But what good is it to cluster your servers until you can show they will work well under future load? So we will explain how to do Load Testing, look at best-of-breed load testing tools, best practices for load testing and how to understand the results of a test.

Finally we will help them come up with a plan for migrating to the new cluster from their existing CF 5 servers. A lot of work, but with a mission-critical enterprise application it is better to be safe than sorry!

A Golden Resource For ColdFusion Troubleshooting

I recently talked with Mike Brunt, CF Server tuning guru for TeraTech about how he approaches slow and crashing server issues. Mike will be talking in more detail on this topic at the webinar on server tuning on Thursday 1/31/08 1pm EST. Here is what he said.

In our last blog posting I likened some time I spent in TeleRadiology to diagnosing web application problems, or more to the point, my observations as to how Radiologists work when diagnosing problems with patients. Obviously, Radiologists have medical images to use in their diagnoses; in the case of application or server problem diagnosis we do not have that sort of information to study. As a side note, SeeFusion and FusionReactor did usher in the era of great troubleshooting tools for ColdFusion-JRun. This was followed by Adobe with ColdFusion 8 and its server monitoring tools. So with these tools at our disposal, is that really enough to finally stem the flow of poorly performing applications getting into Production?

In the same way that seeing an image with a tumor will not cure the tumor, looking at data from server monitoring will not cure performance or stability issues. Also, what if we have a pre-CF8 application in trouble? We certainly do not have CF Server Diagnostics to help us, and if it is pre-CFMX we have no third-party tools to give us the sort of diagnoses that SeeFusion or FusionReactor provide. There is a savior for such situations, which has been with us since the early days of ColdFusion: the ColdFusion logs.

Until SeeFusion was launched, the CF logs were our main source of information for diagnosing and fixing ColdFusion application issues. They are still an invaluable tool for me in all of the work that I do in finding and fixing problems, and are particularly valuable to me before I go on site or commence any project. Here are the logs I have found to be the most useful for my work.

CFMX-CF8 - Since the launch of CFMX 6.1 there have been two sets of logs: the "traditional" ColdFusion logs and logs relating to the J2EE status of JRun and ColdFusion. The logs we should all know and love live in one of two places, depending on the install.

Traditional ColdFusion Logs - Location
COLDFUSION STANDARD-SERVER {drive-volume}\CFusionMX7\logs\
COLDFUSION ENTERPRISE MULTI-INSTANCE {drive-volume}\JRun4\servers\{instancename}\cfusion.ear\cfusion.war\WEB-INF\cfusion\logs\

Among these logs I find the application.log and the server.log the most useful, with slow-running request logging written to the server.log. This can be set in CF Admin. As a side note, the application.log for a well written and well running CF application in Production should be very small.

J2EE Logs - Location
COLDFUSION STANDARD-SERVER {drive-volume}\CFusionMX7\runtime\logs\
COLDFUSION ENTERPRISE MULTI-INSTANCE {drive-volume}\JRun4\logs\ (one set of logs for each server instance).

TeraTech Webinar: Tune Your Problem Servers for Better Performance 1/31/08

TeraTech's CF Server guru Mike Brunt will be presenting a webinar on ColdFusion server performance tuning at 1pm EST on Thursday 1/31/08. The webinar will cover fixing slow servers, locating performance bottlenecks, and diagnosis tips.

Mike is planning to show how to parse logs, how to interpret metrics, how the JVM works, how to test JVM configurations, and how to use SeeFusion to troubleshoot and isolate stability issues.

The webinar is Thursday January 31, 2008 at 1 pm Eastern (that is 12 pm Central, 11 am Mountain, or 10 am Pacific). It will be approximately 45 minutes including time for Q and A. Mike is a former Allaire and Macromedia consultant. I got a free registration code for readers of this blog which is ST102. The webinar is free except for the cost of a long distance phone call for the audio. You can register at https://www1.gotomeeting.com/register/451098722. Hope to see you there!

Radiology and Diagnosing CF Server Issues

I recently talked with Mike Brunt, CF Server tuning guru for TeraTech, about how he approaches server issues. Heads up: we are planning a webinar on server tuning on Thursday 1/31/08 1pm EST - more details coming soon.

For three years, from 1996 to 1999, I worked in the medical software world, in what is known as TeleRadiology, to be precise. This involved creating secured Wide Area Networks which spanned hospital groups and typically involved a central reading center where a group of Radiologists would read images which had been transmitted from each individual hospital. Radiologists would then send results back to the hospital which had sent the images to be read. As a point of interest, there are actually two main types of Radiology Images. The first are currently analog and are printed to physical film (x-rays are typical of that sort of Radiology Image) which then have to be scanned or digitized to high resolution images; a typical chest x-ray is around 12-14MB per image. The second overall kind of Radiology Image is already digital, MRI, CT, PET, Nuclear etc. I digressed a little there but thought that worthy of a little more explanation.

The Radiologists performed two main kinds of reads. The first was from the Emergency Room, where a quick response was imperative and where the Radiologist would be looking in the area of a known trauma or affliction, to deliver a targeted response to the ER Doctors. Very often, those were literally life and death situations. Afterwards, typically the following day, the Radiologist would go back to the image(s) and look for any other issues the patient might have. The other main place that Radiology holds in medicine is that of a major preventative discipline, often discovering ailments before they get too serious.

These principles could apply directly to server diagnostics; sadly, almost all server diagnostic issues stop in the "ER". When I worked for Allaire and then Macromedia, we had a team of 37 engineers who were often called in when an application was in ER, regularly failing or running very slowly. It was interesting to note that the reaction of most client companies, when faced with a poorly performing application, was to "throw" more hardware at the situation. This literally never worked because typically the "disease" was in the application code or in the overall support mechanisms-network, etc. Adding hardware in this situation is a bit like trying to give someone a second heart whilst the first diseased heart is still there; it won't work and the result will likely be terminal!

So in our training with Allaire, we were taught to go in and prevent "death" as quickly as we could, and we did, and then to perform that second read, just like the Radiologist, so that we could give clients a clear path to server health. In addition, we would always hope to leave the client with the ability to self-diagnose issues so as to prevent applications from being "life-threatening" in the future. Server diagnostics are an essential tool in ensuring ongoing application health if, as happens with applications such as MySpace, meteoric growth occurs or if there is simply a need to support structured growth. Applied correctly, server diagnostics will always create stability with scalability.


Copyright © TeraTech Inc 2007. All rights Reserved.