Thursday, September 21, 2006

Scorpion Queen !

When Tim asked for a spiffy Scorpio logo, a few CF engineers (Sanjeev, Chandan, Jayesh, Sandeep and Vamsee; special mention: Hemant and Praveen) got together to create her. Of course they didn't use Photoshop. We are very much used to the whiteboard :)



Did you notice the cool Scorpio tattoo she is wearing? And no, she would not execute <CFLAPDANCE /> !!!

Wednesday, September 20, 2006

Connecting to URL from behind a proxy server

I needed a way to set the proxy information on URL/URLConnection and I could not find any good way. One simple way that Java recommends is to set the information as system properties. These properties are "http.proxyHost" and "http.proxyPort".

So they can either be set as JVM arguments like

-DproxySet=true -Dhttp.proxyHost=proxyIP -Dhttp.proxyPort=port

or set in the code using System.setProperty().

However, since these are system properties, they are set on the VM itself and apply to every connection, so the proxy cannot be chosen per request. In ColdFusion, tags like 'cfhttp' take the proxy per call, and I wanted similar behaviour. After looking around for a while, I noticed that this capability was added in Java 1.5, aka the Tiger release. (Is it only me? I keep hitting things which I feel are lacking in the Java API and then I find them added in 1.5 :) )
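To make the "set on the VM itself" point concrete, here is a minimal sketch of the System.setProperty() route. The class name, the helper names and proxy.example.com are made up for illustration; only the two property names come from the text above. Note that System.clearProperty() is itself a 1.5 addition.

```java
class ProxySystemProperties {
    /** Sets the JVM-wide HTTP proxy properties and returns them as "host:port". */
    static String configure(String host, int port) {
        // These are global: every URLConnection opened afterwards in this VM sees them.
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        return System.getProperty("http.proxyHost") + ":" + System.getProperty("http.proxyPort");
    }

    /** Removes the proxy properties again (System.clearProperty needs Java 1.5+). */
    static void clear() {
        System.clearProperty("http.proxyHost");
        System.clearProperty("http.proxyPort");
    }

    public static void main(String[] args) {
        // proxy.example.com is a placeholder host, not a real proxy.
        System.out.println(configure("proxy.example.com", 8080));
        clear();
    }
}
```

Two requests that need two different proxies would have to take turns changing these values, which is exactly the problem.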

In Java 1.5, you can use

URLConnection conn = url.openConnection(proxy); // added in 1.5

where proxy is an object of java.net.Proxy. Pretty neat.
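A slightly fuller sketch of the 1.5 way, for reference. The class name, helper name and the proxy host are placeholders I made up; Proxy, Proxy.Type.HTTP and URL.openConnection(Proxy) are the real 1.5 API. Nothing here touches the network, because openConnection() only creates the connection object.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

class PerConnectionProxy {
    /** Builds an HTTP proxy descriptor without doing a DNS lookup up front. */
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder proxy address; substitute your own.
        Proxy proxy = httpProxy("proxy.example.com", 8080);
        URL url = new URL("http://www.adobe.com");
        // The proxy applies to this connection only; no system property is changed.
        URLConnection conn = url.openConnection(proxy);
        System.out.println(conn.getURL() + " via " + proxy.address());
    }
}
```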


However, this was not a solution for me as we are still developing on JDK 1.4 (we need to consider all the application servers that we have to support). I stumbled upon an interesting article by Daniel Horn, who faced the exact same problem. And guess what he did? Since you send the HTTP request to the proxy and the proxy then sends out the actual request, he created the URL object by passing proxyHost and proxyPort as the host and port, and gave the target URL string as the 'file' argument. This is what he did.


String actualUrl = "http://www.adobe.com";
URL url = new URL("http", proxyHost, proxyPort, actualUrl);
URLConnection conn = url.openConnection();
..
..

And it works! Brilliant!! I wonder why this is not documented in the Java API.
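My understanding of why it works: a client talking to an HTTP proxy puts the absolute URL in the request line (GET http://www.adobe.com HTTP/1.1) instead of just the path, and the 'file' part of a URL object is what ends up in that position. The sketch below only shows how the URL object splits up; the class name, helper name and proxy host are made up, and no connection is opened.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProxyAsHostUrl {
    /** Builds a URL whose host/port point at the proxy and whose 'file' is the real target. */
    static URL viaProxy(String proxyHost, int proxyPort, String targetUrl)
            throws MalformedURLException {
        return new URL("http", proxyHost, proxyPort, targetUrl);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder proxy address; substitute your own.
        URL url = viaProxy("proxy.example.com", 8080, "http://www.adobe.com");
        System.out.println("socket goes to: " + url.getHost() + ":" + url.getPort());
        System.out.println("request asks for: " + url.getFile());
    }
}
```

One thing to keep in mind is that everything Java derives from the URL (such as the Host header) now refers to the proxy, so treat this as a workaround rather than a replacement for real proxy support.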

Tuesday, September 19, 2006

A workaround for cfdocument missing images

This is with reference to the post Missing images in CFDocument. There are some cases where the images are local to the machine running ColdFusion but cfdocument is still not able to show them. The reasons could be:
1) ColdFusion is behind a firewall, because of which it is not able to send any HTTP request (even to itself).
2) The images are under a protected directory which needs authentication. Since cfdocument currently cannot send authentication information, it is not able to fetch the image.
3) ColdFusion is using HTTPS and is not configured properly to trust itself, so cfdocument cannot send an HTTPS request to itself.
4) Any other reason which prevents CFDocument from sending a request to the local server.

If the images are on the local machine, it is possible to use a file URL for the images (or CSS, JavaScript, etc.). In that case CFDocument will not send requests for the images over HTTP; it fetches them directly from the file system. Here is a simple way to use a file URL.


<cfdocument format="pdf">
<cfoutput>
Some html content
<br>
<img src="#localUrl("img1.gif")#"><br>
<img src="#localUrl("images/img.jpg")#">
</cfoutput>
</cfdocument>

<cffunction name="localUrl" >
<cfargument name="file" />
<cfset var fpath = ExpandPath(file)>
<cfset var f="">
<cfset f = createObject("java", "java.io.File")>
<cfset f.init(fpath)>
<cfreturn f.toUrl().toString()>
</cffunction>




Basically, I have a UDF which converts any path to a local file URL, and I use that UDF in the 'src' attribute of the image. This can be used to fetch images, CSS or any other similar content from the local machine. Note the <cfoutput> right under the cfdocument tag, which makes the UDF calls get evaluated before the content reaches the cfdocument body. This workaround is applicable only when the content is present on the same machine as ColdFusion.

This workaround has another advantage too. Normally, when the CFDocument body has any images, it fetches them by sending HTTP requests to the local server, which are served by web threads. This has its own overhead. In a way, CFDocument uses server resources to get something which is already available locally on the server. Those resources could instead be used to serve actual client HTTP requests. Images converted to local file URLs do not go through HTTP and thus should perform better.
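For the curious, here is roughly what the UDF does, written as plain Java. It is a sketch with made-up class and method names, and it deliberately differs from the UDF in one detail: it goes through File.toURI() (available since 1.4) before converting to a URL, because File.toURL() on its own does not escape characters such as spaces.

```java
import java.io.File;
import java.net.MalformedURLException;

class LocalFileUrl {
    /** Converts a file system path (relative or absolute) to a file: URL string. */
    static String localUrl(String path) throws MalformedURLException {
        // toURI() resolves the path against the working directory and escapes spaces etc.
        return new File(path).toURI().toURL().toString();
    }

    public static void main(String[] args) throws Exception {
        // The exact prefix depends on the directory this is run from.
        System.out.println(localUrl("images/img.jpg"));
        System.out.println(localUrl("my images/img1.gif"));
    }
}
```

Keep in mind that the CF version resolves the path with ExpandPath(), i.e. relative to the current template, whereas this sketch resolves it against the JVM's working directory.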

Friday, September 15, 2006

Update to CFThread POC tags

Damon posted an update to the CFThread proof of concept tag that was published some time back. I had wanted to do it for a long time but was busy with other Scorpio features and had to keep delaying it. Nevertheless, better late than never :) This update adds "thread safety" while retaining the syntax from the original post. Thanks to Dan Switzer, Derek, Mike and all the others who provided valuable feedback on it!

What's in the update
  • A thread acts very much like a function, and sometimes a little more than that.
  • Any attribute can be passed to cfthread. These attributes can be accessed using 'attributes.<varname>'. They are passed by value and hence are completely thread safe.
  • Any variable, unless defined with a scope prefix, goes in the thread-local scope. This is slightly different from a function. As in a function, a variable defined with var also goes in the thread-local scope. So in the following snippet, x, y, z and a will all be in the thread-local scope. However, since b is used inside the thread without being defined there, it is read from the page scope.


<cfset x = 10>
<cfset b = 20>
<cfthread name="t1">
<cfset var y = 10>
<cfset x = 20>
<cfset z = y*x>
...
<cfset a = z*Variables.x>
<cfset a = a/b>
</cfthread>


  • Variables in all other scopes are accessible using the appropriate prefix, like "Variables", "request", "Server" etc.
  • Threads have another scope, called the thread scope, to which only the owner thread can write but which all others can read. The owner thread accesses it using the 'thread' prefix; other threads or the main page thread access it using the thread name. This scope is useful when the owner thread wants to publish some data that needs to be accessed by other threads. The same thing could be achieved by each thread writing to its own variable in the page scope, but that needs some discipline from developers and is error prone. Having a separate thread scope which others can only read makes it thread safe and easier for developers.
  • If the thread name is dynamic, the obvious question is how to access that thread's data from another thread or the main page. It can easily be done using Variables[threadname].xxx. See the example below.
  • Threads can continue even after the main page is done.

Below is sample CF code that uses cfthread (a modified version of Dan's example. Thanks Dan!!).

<cfset CRLF = CHR(13) & CHR(10)>
<cfset sDirectory = expandPath(".") & "\tmp" />

<cfsavecontent variable="sContent">XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX </cfsavecontent>
<cfloop index="loopcounter" from="1" to="50">
<cfset threadname = "thread_" & #loopcounter#>
<cfthread name="#threadname#" filename="file#loopcounter#.txt" counter="#loopcounter#">
<cfset sOutput = "Content written by thread " & #attributes.counter#>
<cfloop index="i" from="1" to="100">
<cfset sOutput = sOutput & sContent>
</cfloop>
<cfset dest="#sDirectory#\tmp_#attributes.filename#">
<cffile action="write" file="#dest#" output="#sOutput#" />
<cfset thread.msg="Message from thread "&#attributes.counter#>
</cfthread>

</cfloop>
<cfloop index="loopcounter" from="1" to="50">
<cfset threadname = "thread_" & #loopcounter#>
<cfjoin thread="#threadname#"/>
<cfoutput><br>#threadname# output : #Variables[threadname].msg#</cfoutput>

</cfloop>
<br> Work Complete!<br>
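If you think in Java, the pattern in the sample (start named workers, give each its own copy of the counter, let each publish a message, join them all and read the messages by name) looks roughly like the sketch below. This is only an analogy with made-up names, not how the cfthread tag is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NamedWorkers {
    /** Starts 'count' named threads, waits for all of them, and returns their messages by name. */
    static Map<String, String> runWorkers(int count) throws InterruptedException {
        // Stand-in for the per-thread "thread scope": each worker writes only its own key.
        Map<String, String> messages = new ConcurrentHashMap<String, String>();
        Thread[] workers = new Thread[count];
        for (int i = 1; i <= count; i++) {
            final int counter = i;                   // copied per worker, like an attribute
            final String name = "thread_" + counter;
            workers[i - 1] = new Thread(
                () -> messages.put(name, "Message from thread " + counter), name);
            workers[i - 1].start();
        }
        for (Thread t : workers) {
            t.join();                                // same role as <cfjoin> in the sample
        }
        return messages;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> messages = runWorkers(5);
        for (int i = 1; i <= 5; i++) {
            System.out.println("thread_" + i + " output : " + messages.get("thread_" + i));
        }
    }
}
```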


Feel free to play around with this tag and send feedback if you have any. Once again, as Damon said, this is not a CF feature and hence it is unsupported. As to whether it will make it into Scorpio, I cannot guarantee anything, but stay tuned ;)
Enjoy !

Tuesday, September 12, 2006

Handling J2EE session with cookies disabled

Someone recently reported that when cookies are disabled and J2EE sessions are enabled, his sessions are not maintained for POST requests: CF, or rather the app server, creates a new session every time. His code looked like


<form method="post" action="test.cfm?#session.urltoken#">
...
<input type="submit" value="Submit" >
</form>


Can you see what's wrong with the above code?

As per the Servlet spec of J2EE, when cookies are disabled, the session is maintained by URL rewriting, which is done by appending ';jsessionid=' to the URI. Note the semicolon ';' before 'jsessionid'.

The above code appends session.urltoken, which looks like 'CFID=1600&CFTOKEN=59663989&jsessionid=2830a9edcf6f794ff481'. Therefore the URL becomes "test.cfm?CFID=1600&CFTOKEN=59663989&jsessionid=2830a9edcf6f794ff481" whereas it should have been "test.cfm;jsessionid=2830a9edcf6f794ff481?CFID=1600&CFTOKEN=59663989". Since jsessionid is passed as an ordinary query parameter rather than in the form the server expects, the server does not pick it up and creates a new session.
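To spell out the expected shape, here is a small sketch that assembles such a URL by hand: the jsessionid goes after the path with a semicolon, and the CFID/CFTOKEN pair stays in the query string. The class and method names are made up, and this is not how ColdFusion builds it internally.

```java
class SessionUrl {
    /** Inserts ";jsessionid=..." after the path and appends the CF tokens to the query string. */
    static String withSession(String url, String jsessionId, String cfTokens) {
        int q = url.indexOf('?');
        String path = q < 0 ? url : url.substring(0, q);
        String query = q < 0 ? "" : url.substring(q + 1);
        StringBuilder sb = new StringBuilder(path).append(";jsessionid=").append(jsessionId);
        // Keep any existing query parameters and add the CF tokens after them.
        String fullQuery = query.isEmpty() ? cfTokens : query + "&" + cfTokens;
        if (!fullQuery.isEmpty()) {
            sb.append('?').append(fullQuery);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints test.cfm;jsessionid=2830a9edcf6f794ff481?CFID=1600&CFTOKEN=59663989
        System.out.println(withSession("test.cfm", "2830a9edcf6f794ff481",
                "CFID=1600&CFTOKEN=59663989"));
    }
}
```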

So how do you handle it? One way is to get the session ID and urltoken from the session and build the URL in the expected form yourself (which is some effort on the developer's part). Alternatively, you can take the much simpler approach of using URLSessionFormat(url), which does exactly what is required here. URLSessionFormat() appends the necessary information if cookies are disabled; if they are enabled, it does nothing. Therefore it might be a better idea to always use this function for any GET or POST URL.

The above code should actually have been


<cfset myurl=URLSessionFormat("test.cfm")>
<form method="post" action="#myurl#">
...
<input type="submit" value="Submit" >
</form>

JRun Threadpool settings

The other day I was looking at JRun's thread pool implementation and it really took me SOME time to understand that piece of code. It is one of those pieces of code which are not meant to be understood by others :D. It got me really confused about 'Active Handler Threads', 'Min Handler Threads' and 'Max Handler Threads'. It was very different from what I had assumed. Too much 'creating runnables', 'swapping runnables', 'destroying runnables'.. phew.. I think this is exactly why Doug Lea and team had to introduce a standard thread pool implementation in the 'Tiger' release. It's so simple, neat and elegant that I wonder why it wasn't introduced earlier in the JDK.

Anyway, enough cribbing. After my enlightenment about JRun's (or CFMX's) thread pool, I thought it would be nice to share it with you all. So what are these thread counts? (I am sure most of you have already figured it out. Nevertheless.. :) )

Min handler threads - This is the number of web threads that are spawned initially and wait for HTTP requests. Effectively, it is the number of threads waiting on serversocket.accept(), so it controls the number of requests that can be accepted concurrently. Ideally it is the minimum number of concurrent users that you expect on the server. As soon as a thread gets a client request (i.e. comes out of serversocket.accept()), it enters a throttle before processing the request. This is where active handler threads come into the picture. Before the thread starts processing the request, it spawns another thread, if required, to listen for incoming requests.

Active handler threads - This decides how many requests are processed concurrently. The throttle mentioned above allows at most "active handler" threads to proceed; the rest wait until another thread exits the throttle. This, along with "min handler threads", controls the throughput of the server. The value of active handler threads must be between the min handler count and the max handler count.

Max handler threads - This is the maximum number of threads that can be created in the pool. It includes the threads queued in the throttle, the threads processing requests and the threads waiting on the server socket. Once the server reaches the max handler thread count, it starts denying requests with a "Server Busy" error. You can also see the "Server Busy" error without the server reaching the "max handler" limit, if threads in the throttle queue time out. So if you see this error, don't just start increasing the max handler thread count. You might need to tune all three counts.

Having both min handler and active handler counts helps JRun absorb any sudden spike in load. Let's say your min handler count is 20 and your active handler count is 40, and suddenly you have 40 concurrent requests: all of them will be served without any queuing or delay. When the load on the server eases, it lets the extra threads die and brings the thread count back down to the min handler count, i.e. 20.
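For comparison, here is how the same "grow on a spike, refuse beyond a hard limit" idea looks with the Tiger ThreadPoolExecutor. This is a sketch with made-up names and deliberately tiny numbers, not JRun's code: I am loosely mapping the minimum count to corePoolSize and the maximum to maximumPoolSize, and the JDK pool has no direct equivalent of the active handler throttle. A rejected task stands in for "Server Busy".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class HandlerPoolSketch {
    /** Fires 'requests' long-running tasks at a pool; returns {peak pool size, rejected count}. */
    static int[] simulateSpike(int min, int max, int requests) {
        // SynchronousQueue means no queuing: a task gets a thread or is rejected.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                min, max, 30, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();             // pretend to be a slow request
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;                          // the "Server Busy" case
            }
        }
        int peak = pool.getPoolSize();
        release.countDown();                         // let the fake requests finish
        pool.shutdown();
        return new int[] { peak, rejected };
    }

    public static void main(String[] args) {
        int[] result = simulateSpike(2, 4, 5);
        // prints peak=4 rejected=1
        System.out.println("peak=" + result[0] + " rejected=" + result[1]);
    }
}
```

With a keep-alive of 30 seconds, the threads above the minimum would be allowed to die once they sit idle, which is the same shrinking behaviour described above.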

The next question that naturally comes to mind is: what are the appropriate values for my server? Too low a value would mean that you are not using the server's potential well, and requests start queuing up even though the server could handle them. Too high a value would mean too many context switches, and server performance will deteriorate. (Too many context switches means the CPU is busy scheduling threads rather than executing them, which hurts performance.) So what are the appropriate values? Well.. there cannot be one answer or a formula to compute them. It depends on your application, the traffic it expects, the memory and processors of the machine it runs on, etc.

By default the values in "ColdFusion standalone" are

- Min handler threads - 1
- Active handler threads - 8
- Max handler threads - 1000

The min and active counts here are fine for a development machine but definitely not for a production machine. And in my opinion the value of 'max handler' is a bit high even for a production machine. Creating a large number of threads does not necessarily increase the throughput of your server. It can actually lower it because of the high number of context switches the VM will have to make. Moreover, it might not be possible to create thousands of threads because of OS limitations. On many OSes you will get an OutOfMemoryError because the VM will not be able to spawn that many native threads. I think a max handler count in the range of 300-400 should be good enough.

Regarding tuning these counts, there are a huge number of articles around which will tell you how to go about it. Since the notion of these counts exists on all application/web servers, the articles need not be CF or JRun specific. In brief, you need to run load tests with different values of the min handler and active handler counts, note the throughput and plot a graph. This graph should help you arrive at appropriate values for these settings.