Thursday, July 19, 2007

ListToArray in ColdFusion 8

There has been a lot of discussion in CF blogs and forums about ListToArray not supporting empty elements. In response, I had blogged about a simple way to do this. Thanks to Charlie Griefer for pointing out the problem in it, and thanks to Ben Nadel and Andrew Clark for pointing me in the right direction.

Though it got pretty late, I was able to sneak this change into ColdFusion 8. ListToArray() now takes an additional optional argument "includeEmptyElements", which, if 'true', includes the empty elements of the list in the array. The default is of course 'false'. It also takes care of empty elements at the end of the list and of multiple delimiters. Here is how the function looks

ListToArray(list, delimiter, includeEmptyElements) returns Array

Let's take a look at a couple of examples to see it working

<cfset list = "a,b,,c, ,d,,">
<cfset arr = ListToArray(list, ',', true)>
<cfdump var="#arr#">
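
Here is how the output looks: the dumped array should have eight elements, namely "a", "b", an empty string, "c", a single space, "d" and two trailing empty strings.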

Here is another example.

<cfset list = "one,/$/,six">
<cfset arr = listToArray(list, ",$/",true)>
<cfdump var="#arr#">

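The output for this should be an array of six elements: "one", four empty strings (one between each pair of adjacent delimiters) and "six".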

Though we wanted to, there was just not enough time to make a similar change in all the list functions for CF 8. Something for CF 9 :-)
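
To see the difference the new argument makes, here is a minimal sketch that converts the same list with and without it. The list value and variable names are only illustrative. With the default, the empty element is skipped, so the first count should be 3; with includeEmptyElements set to true it should be 4.

```cfml
<!--- Sketch: same list, default behaviour vs. includeEmptyElements = true --->
<cfset list = "a,b,,c">
<cfset withoutEmpty = ListToArray(list, ",")>
<cfset withEmpty = ListToArray(list, ",", true)>
<cfoutput>
Default: #ArrayLen(withoutEmpty)# elements<br>
Including empty elements: #ArrayLen(withEmpty)# elements
</cfoutput>
```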

Enhancements to CFDocument in ColdFusion 8

As you all know, the CFDocument tag is used to easily create PDF or FlashPaper documents from HTML/CFML content. ColdFusion 8 adds a lot of enhancements to it, and in this post we will talk about those enhancements.

1. Bookmark : You can create bookmarks for each section of the PDF using the "bookmark" attribute. The bookmarks created are one level deep, and each takes its name from the documentsection's name. Here is sample code for creating a PDF with bookmarks.

<cfdocument format="PDF" bookmark="yes">
<cfdocumentsection name="Introduction">
<p>The introduction goes here.</p>
</cfdocumentsection>
<cfdocumentsection name="Chapter 1">
<h3>Chapter 1: Getting Started</h3>
<p>Chapter 1 goes here.</p>
</cfdocumentsection>
<cfdocumentsection name="Chapter 2">
<h3>Chapter 2: Building Applications</h3>
<p>Chapter 2 goes here.</p>
</cfdocumentsection>
<cfdocumentsection name="Conclusion">
<p>The conclusion goes here.</p>
</cfdocumentsection>
</cfdocument>

2. Proxy Support : With ColdFusion 8, you can now provide proxy configuration to cfdocument for retrieving external content like images. This is useful in situations where the machine hosting ColdFusion is connected to the external world via a proxy. The new attributes added for this (which are self-explanatory) are listed below

  • proxyHost
  • proxyPort
  • proxyUser
  • proxyPassword
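
As a rough sketch of how these attributes fit together, the snippet below routes the image request through a proxy. The proxy host, port, credentials and image URL are all placeholders, not values from any real setup.

```cfml
<!--- Sketch: every attribute value and the image URL below are placeholders --->
<cfdocument format="PDF"
            proxyHost="proxy.example.com"
            proxyPort="8080"
            proxyUser="proxyuser"
            proxyPassword="secret">
<p>Logo fetched through the proxy:</p>
<img src="http://www.example.com/images/logo.png">
</cfdocument>
```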

3. Content from URL : Though this was added in 7.0.2, I think it makes sense to add here because it was not there in 7. :-) Have you ever created or wanted to create a pdf from a web page? If yes, then the new attribute "src" in cfdocument and cfdocumentsection tag makes it very easy to do this. Here is an example to do this.

<cfdocument format="pdf" src="" />

4. Basic Authentication : If the CFDocument body contains a resource (e.g. an image or URL) that is protected with basic authentication, ColdFusion 7 cannot retrieve it, and this was one of the reasons for getting a "red-x" for images. (See my old post on this). ColdFusion 8 addresses this by adding these two attributes to the cfdocument and cfdocumentsection tags.

  • authuser - user name to be used for basic authentication.
  • authpassword - password to be used for basic authentication.
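
A minimal sketch of these two attributes in use follows. The credentials and the protected image URL are made up for illustration.

```cfml
<!--- Sketch: placeholder credentials and a placeholder protected URL --->
<cfdocument format="PDF" authUser="reportuser" authPassword="secret">
<p>Chart served from a directory protected with basic authentication:</p>
<img src="http://intranet.example.com/protected/chart.png">
</cfdocument>
```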

5. User Agent : There are some cases where the web server is configured to allow requests only from a certain set of browsers (user agents, to be precise) to prevent spiders and bots from overloading the server. In ColdFusion 7, when CFDocument creates a URLConnection for an image, it sends a "User-Agent" header in the HTTP request that looks like "User-Agent:Java/1.4.2_07". If the web server does not recognize the "Java" user agent, it returns a status code of 404 (resource not found) and hence the images cannot be displayed. ColdFusion 8 adds a new attribute "userAgent" to address this.

  • userAgent : User agent to be used for making the HTTP connection. The default value will now be "ColdFusion". If this also does not work, you can use the same string that browsers like IE or Firefox use. This attribute has been added to the <cfdocument> and <cfdocumentsection> tags.
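
Here is a small sketch of overriding the user agent. In the tag syntax the attribute is written as userAgent; the browser string below is just one example of a Firefox-style value, and the image URL is a placeholder.

```cfml
<!--- Sketch: the user agent string is an example value, the URL is a placeholder --->
<cfdocument format="PDF"
            userAgent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4">
<img src="http://www.example.com/images/banner.jpg">
</cfdocument>
```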

6. PDF name : When a pdf generated by cfdocument is sent to the browser and you try to save it, the browser will prompt with the cfm name for the pdf which is generally not desirable. With ColdFusion 8, you can provide the appropriate name in "saveAsName" attribute of cfdocument.

  • saveAsName - Name that appears in the 'saveAs' dialog when you try to save the pdf from the browser (Generally File -> saveAs). This name will not work if you use "save" dialog of the pdf plugin.

Here is a sample code for the same.

<cfdocument format="pdf" saveAsName="mypdf">
<p>This is a PDF document.</p>
</cfdocument>

7. Local URLs : When the CFDocument body contains a relative URL, ColdFusion resolves it to an absolute URL and sends an HTTP request for that URL. A side effect of this is that the server ends up sending HTTP requests even for local URLs or images that are lying on the local file system, which obviously hurts performance. In ColdFusion 8, we have added a new attribute "localURL" to the cfdocument tag which, if enabled, will try to resolve relative URLs as files on the local machine.

  • localURL : "true" | "false" - It should be enabled if the images used in the cfdocument body are on the local machine. This makes the cfdocument engine retrieve the images directly from the file system rather than asking the server for them over HTTP.

This attribute helps reduce the load on the server, so that the same web server thread can now serve user requests instead of serving local images to CFDocument. This also addresses some of the "missing image" problems which I mentioned here. Here is sample code using this attribute.

<cfdocument format="PDF" localUrl="true">
<table><tr>
<td><img src="images/bird.jpg"></td>
<td><img src="images/rose.jpg"></td>
</tr></table>
</cfdocument>

8. Section Page Counts : CFDOCUMENT scope contains two new variables which give you the page counts for document section.

  • TOTALSECTIONPAGECOUNT - total no of pages in the current section
  • CURRENTSECTIONPAGENUMBER - Current page number in the current section.
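
As an illustration, the two variables can be printed in a section footer. This is only a sketch; the section name and body text are placeholders.

```cfml
<!--- Sketch: a per-section "Page X of Y" footer --->
<cfdocument format="PDF">
<cfdocumentsection name="Chapter 1">
    <cfdocumentitem type="footer">
        <cfoutput>Page #cfdocument.currentsectionpagenumber# of #cfdocument.totalsectionpagecount#</cfoutput>
    </cfdocumentitem>
    <p>Chapter 1 goes here.</p>
</cfdocumentsection>
</cfdocument>
```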

9. Dynamic header and footer : CFDOCUMENT scope variables can now be used in expressions inside <cfdocumentitem>, which makes it possible to have dynamic headers and footers. You can now build logic for header/footer content based on the page number. Below is a code snippet which creates a dynamic header based on whether the page number is even.

<cfdocumentitem type="header">
<cfif (cfdocument.currentpagenumber mod 2) is 0>
<cfoutput>#cfdocument.currentpagenumber# of #cfdocument.TOTALPAGECOUNT#</cfoutput>
</cfif>
</cfdocumentitem>

10. We have also fixed most of the CFDocument related bugs, e.g. text chopping, image cropping or red-x, image scaling, "Document has no pages" etc.

Tuesday, July 17, 2007

New File I/O in ColdFusion 8 - Part II

In my previous post I talked about how the new file I/O functions address working with large files. In this post I will talk about the other file functions that have been added in ColdFusion 8. Most of these functions are equivalent to the cffile operations, and we have retained the same behaviour as cffile. A side effect of that is that relative paths used in these functions are relative to the temp directory. I don't really like that, and I believe they should have been relative to the template. But since these functions were supposed to replicate cffile behaviour, we had to live with it. :-)

Here we go with the list of those new functions

FileRead(filepath, [charset]) - Similar to cffile, this function reads the entire content of a text file and returns it. You can also optionally pass the charset to be used to read the text file.

FileReadBinary(filepath) - This reads the entire content of a binary file and returns the byte array.

FileWrite(filepath, textdata | binarydata, [charset]) - Writes the specified content to the file. The content can be binary as well as text. If the specified content is a text data, you can optionally specify the charset so that the data can be written properly to the file.

FileCopy(source, destination) - As the name suggests, it copies the source file to destination file. Similar to cffile, if the destination is a directory, then source will be copied to that directory otherwise source file will be copied to the destination file.

FileMove(source, destination) - Moves the file from source to destination. Here again, if the destination is a directory, then source is moved under destination directory. Otherwise source is renamed to the destination.

FileDelete(filepath) - Deletes the specified file. The only important thing to note here is that if the file is readOnly, it will not be deleted.

FileSetAttribute(filepath, attribute) - Sets the attributes on file. Applies to Windows. 'attribute' here is a comma-delimited list of attributes to set on the file. Possible attribute values are "readOnly" | "hidden" | "normal".

FileSetAccessMode(filepath, mode) - Sets the file access mode for Unix or Linux systems where the mode is octal values of UNIX chmod command assigned to owner, group, and other, respectively. To give full permission to everyone for a file, the mode should be 777.

GetFileInfo(filepath) - Up to ColdFusion 7, there was no good way to find information like the size or last modified date of a file. The only way you could do that was to use the cfdirectory tag to list the directory, get the query from it, loop over the query until you hit the desired file and then fetch the required metadata. The new function GetFileInfo in ColdFusion 8 provides an easy way to get all the metadata about a file or directory. It returns a struct which is described below.
  • name - Name of the file/directory specified. This is just the file name and not the absolute path.
  • path - Full path of the file/directory.
  • parent - Full path of the parent directory.
  • type - "directory" if the filepath is a directory else "file".
  • size - size of the file in bytes.
  • lastmodified - DateTime at which this file/directory was last modified.
  • canRead - "true" if this file/directory has 'read' permission. "false" Otherwise.
  • canWrite - "true" if this file/directory has 'write' permission. "false" Otherwise.
  • isHidden - "true" if this file/directory is hidden. "false" Otherwise.
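
To tie a few of these functions together, here is a minimal sketch that writes a file, copies it, inspects the copy with GetFileInfo and then cleans up. The paths are hypothetical absolute paths (chosen to sidestep the relative-path behaviour mentioned above), and the c:\temp directory is assumed to exist.

```cfml
<cfscript>
// Hypothetical absolute paths; adjust for your own machine
source = "c:\temp\notes.txt";
backup = "c:\temp\notes_backup.txt";

FileWrite(source, "Hello from ColdFusion 8", "utf-8"); // one-shot write
FileCopy(source, backup);                              // copy to a second file
info = GetFileInfo(backup);                            // struct of file metadata
content = FileRead(backup, "utf-8");                   // one-shot read
FileDelete(backup);                                    // remove the copy again
</cfscript>
<cfoutput>
#info.name# is a #info.type# of #info.size# bytes, last modified #info.lastmodified#.<br>
Content: #HTMLEditFormat(content)#
</cfoutput>
```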

Wednesday, July 11, 2007

Few more details on File handle

In my previous post I talked about the file object that you get from FileOpen(), which is nothing but a handle to the native file. Did you ever try to dump this object? This file object provides a lot of valuable information. If I run the code below

<cfset myfile = FileOpen("C:\cfunited_notes.txt")>
<cfdump var="#myfile#">
<cfloop condition="Not FileIsEOF(myfile)">
<cfset line = FileReadLine(myfile)>
</cfloop>
<cfset FileClose(myfile)>

this is what gets dumped.

As you can see, it gives you information like lastmodified time, mode in which the file was opened, name, path, size in bytes and status of this file object whether this is still open or closed.

This object acts very much like a struct. So you can access these data from the file object using the simple dot notation. For example, to find out the last modified time, you can use fileObj.lastmodified

That gives another useful tool in your hand. While you are writing a file incrementally, you can easily find out the size of the file written so far using fileObj.size. This will be very helpful if you want to build a logging application where log files are rotated. While you are logging the data, as soon as the file size becomes more than your certain limit, you can close the file object and start writing to a new log file.
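
A rough sketch of that log rotation idea is shown below. The log path, the size limit and the archive naming scheme are all assumptions made for illustration, and the sketch relies on the size property reflecting what has been written so far, as described above.

```cfml
<cfscript>
// Hypothetical settings: log location and rotation threshold
logPath = "c:\temp\app.log";
maxBytes = 1024 * 1024; // rotate once the log grows past roughly 1 MB

logFile = FileOpen(logPath, "append");
FileWriteLine(logFile, "#Now()# something worth logging");

// Check the size through the file object before closing the handle
needsRotation = logFile.size GT maxBytes;
FileClose(logFile);

if (needsRotation) {
    // Rename the full log; the next FileOpen in "append" mode starts a new one
    archivePath = "c:\temp\app_" & DateFormat(Now(), "yyyymmdd") & "_" & TimeFormat(Now(), "HHmmss") & ".log";
    FileMove(logPath, archivePath);
}
</cfscript>
```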

Tuesday, July 10, 2007

New File I/O in ColdFusion 8

Till now we have been using <cffile> for all kinds of file operations, and it does a very good job. If you want to read a file, you give the file to this tag and it gives you the content. If you want to write content to a file, you give the content and file name to this tag and it will do that. If you want to copy, delete or move your files, this tag will do all of that. All very simple and short. But there are two particular issues which <cffile> does not address.

1. Reading/writing big files - Since <cffile> is a tag, it can only perform one-shot operations. To read, it has to read everything in one shot, and to write, you have to provide the entire content, which means that <cffile> has to keep the entire content in memory. That is not much of a concern if the file is just a few KB, but as the size increases beyond 100 KB or reaches a few MB, it can really hurt. It creates a memory crunch on the server, and if the load is high and many reads/writes of large files happen simultaneously, it can even lead to an OutOfMemory error on the server. Apart from creating a memory crunch, it also slows down the server, because the VM needs to allocate and deallocate larger chunks of memory, which leads to longer and more frequent garbage collection cycles. At this point, you might ask, why would I ever read or write such a big file? Well, I can think of a few reasons:

  • You need to process the data that comes in a flat file
  • csv parsing
  • Finding the mime type of a file like mp3, image, video etc
  • you want to create a log viewer
  • ... many more
You get the idea.. right?

2. Again, since <cffile> is a tag, it is not very easy to use inside cfscript. Either you have to move out of cfscript to use this tag, or you wrap the tag in a function and call that function. That's true of all tags, but cffile is so commonly used that it looks like a limitation.

The new file I/O introduced in ColdFusion 8 addresses both these problems. It is entirely based on functions, and hence that automatically takes care of problem 2. That means you no longer need to use cffile if you are inside cfscript. I will give more details on handling problem 2 in my next post. In this post I will mainly focus on reading/writing files in chunks using the new I/O.

The new I/O is based on the same philosophy that is used in other languages, i.e.

  1. You first open a file
  2. Perform read/write operations on it
  3. and close the file.

Let's see each of the steps in a little detail.

Step 1 : Open a file : Here is the function to open a file

FileOpen(filepath [,mode] [,charset]) -> fileobject

Both mode and charset here are optional. Mode can be "read", "readBinary", "write" or "append".

"read" mode, which is default, is used to read a text file and hence any read operation will give you text data from it. When the file is a text file, you can also optionally specify the charset of the file. So if the file contains UTF-8 or UTF-16 characters (or characters from any other charset), you need to specify it while opening the file.

"readBinary" is used to read a binary file and hence any read operation will give you the binary data i.e byte array.

"write" mode will open the file in write mode which means that if the file already exists, it will be overwritten.

"append" mode, as the name suggests, will open the file in append mode which means that any write operation on that file object will write it at the end of file.

FileOpen function returns you a handle to the native file and you need to use this handle for all further read/write operation. Of course you should keep in mind that you can not perform "read" operation on a file handle that was opened in "write" mode and vice versa.
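
As a small sketch of the open step, the snippet below opens a text file in "read" mode with an explicit charset, reads the first line if there is one, and closes the handle. The path and charset are placeholders.

```cfml
<cfscript>
// Hypothetical path; "utf-8" is just an example charset
reader = FileOpen("c:\temp\myfile.txt", "read", "utf-8");
firstLine = "";
if (NOT FileIsEOF(reader)) {
    firstLine = FileReadLine(reader); // text data, because the mode is "read"
}
FileClose(reader);
</cfscript>
<cfoutput>#HTMLEditFormat(firstLine)#</cfoutput>
```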

Step 2 : Do Read/Write operations : Once you get the handle to the file object, you can perform multiple read/write operations using this handle. There are several functions to do that.

2A. Read Operation :

i) FileRead(fileobj, no of characters/bytes to read) : This provides a way to read a chunk of data (say 1 KB) from the file at a time. Since you only read a chunk of data at a time, it does not create a memory crunch on the server. Since this is a read operation, the file must have been opened in "read" or "readBinary" mode. Depending on the mode in which the file was opened, this function returns the text or binary data read. One thing to note here - if the data remaining is less than the requested size, this function returns only the remaining data, i.e. if 100 characters are remaining in the file being read and you request 1000 characters, it will return 100 characters only.

ii) FileReadLine(fileobject) - This reads one line from the text file. To call this method, the file must have been opened in "read" mode.

Both these read operations can be called multiple times until you reach the end of the file. Once the end of the file has been reached, any further read call will result in an "EndOfFile" error. So in order to avoid this error, you should always check whether you have reached the end of the file. The function to do that is

FileIsEOF(fileobj) : Just to be more clear, EOF here stands for "End of File". This function will return true if the end of file has been reached otherwise will return false.

Here are a few examples of reading content from a file.

Read 1 KB of binary data at a time:

myfile = FileOpen("c:\temp\song.mp3", "readbinary");
while (! FileIsEOF(myfile)) { // continue the loop if the end of file has not been reached
x = FileRead(myfile, 1024); // read 1 KB of binary data
...// process this binary data..
}
FileClose(myfile);

Process a text file line by line:

myfile = FileOpen("c:\temp\myfile.txt", "read");
while (! FileIsEOF(myfile)) { // continue the loop if the end of file has not been reached
x = FileReadLine(myfile); // read a line
...// process this line..
}
FileClose(myfile);

2B. Write operation

i) FileWrite(fileobject, content) - This will add the text or binary content to the file. The file must have been opened in "write" or "append" mode.

ii) FileWriteLine(fileobject, text) - This will add the text followed by a new line character to the file. Here again, the file must have been opened in "write" or "append" mode.

You might wonder: if both write operations add the content to the file, what's the difference between "write" and "append" mode? The difference is only at the time of opening the file. As I said earlier, opening the file in "write" mode will overwrite the file if it already existed and put the file pointer at the beginning of the file, whereas opening the file in "append" mode will simply put the file pointer at the end of the file.

Any subsequent "write" calls, irrespective of "write" or "append" mode, will append the content to the file.
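
The difference between the two modes can be sketched as follows. The path is hypothetical, and the comments describe what each step should leave in the file according to the behaviour explained above.

```cfml
<cfscript>
path = "c:\temp\demo.txt"; // hypothetical path

// "write" mode: starts from an empty file; later writes on the same handle are appended
f = FileOpen(path, "write");
FileWriteLine(f, "first");
FileWriteLine(f, "second"); // lands after "first"
FileClose(f);

// "append" mode: existing content is kept, writing continues at the end
f = FileOpen(path, "append");
FileWriteLine(f, "third"); // file should now hold first, second, third
FileClose(f);

// "write" mode again: the existing content is overwritten
f = FileOpen(path, "write");
FileWriteLine(f, "fresh start"); // file should now hold only this line
FileClose(f);
</cfscript>
```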

Here is an example of writing content to a file. This reads one line from an input file, does some processing on it, and writes the resultant data to another file.

infile = FileOpen("c:\temp\input.txt", "read");
outfile = FileOpen("C:\temp\result.txt", "write");
while (! FileIsEOF(infile)) { // continue the loop if the end of file has not been reached
x = FileReadLine(infile); // read a line
data = processLine(x);
FileWriteLine(outfile, data);
}

Step 3 : Close the file : Once you are done with read/write operations, you *must* close the handle to the file. The function to do that is

FileClose(fileobj)

What if you don't close the file object? Well, that file will remain locked by the server as long as it is open, and no other process can modify, rename or delete it.

You might also ask, why doesn't ColdFusion automatically take care of closing the file? Why should the developer be bothered about it? Well, ColdFusion does take care of it when the file object goes out of scope and is not kept in any accessible scope, but you can never be certain when exactly this will happen. It might happen immediately, or it might happen hours later :-).

Bottom line, you should make it a practice to call FileClose() once you are done with the file object.

Just to show its usage, I will complete the example I used in the write section.

infile = FileOpen("c:\temp\input.txt", "read");
outfile = FileOpen("C:\temp\result.txt", "write");
while (! FileIsEOF(infile)) { // continue the loop if the end of file has not been reached
x = FileReadLine(infile); // read a line
data = processLine(x);
FileWriteLine(outfile, data);
}
FileClose(infile); // release both handles once the work is done
FileClose(outfile);
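
If the processing in the middle can throw an error, the handle can end up staying open. One possible way to guard against that, sketched here with tags and a hypothetical input path, is to close the handle in a catch block before rethrowing the error. This is just one approach, not the only one.

```cfml
<!--- Sketch: make sure the handle is closed even if processing fails --->
<cfset infile = FileOpen("c:\temp\input.txt", "read")>
<cftry>
    <cfloop condition="NOT FileIsEOF(infile)">
        <cfset line = FileReadLine(infile)>
        <!--- process the line here --->
    </cfloop>
    <cfset FileClose(infile)>
    <cfcatch type="any">
        <!--- release the handle, then pass the original error on --->
        <cfset FileClose(infile)>
        <cfrethrow>
    </cfcatch>
</cftry>
```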

This set of functions is a great help if you need to work with a file larger than 10 KB.

Apart from this set of functions, ColdFusion 8 also adds a new language construct to read text files. With ColdFusion 8, you can use <cfloop> to iterate over "lines" or "characters" in a text file. This makes it very easy and convenient to do any kind of text file parsing or processing in your application. Let's take a look at the new syntax of cfloop for reading a file (and I really love this syntax :-)).

New attributes in cfloop for reading file :

"file" - path of the file to read
"characters" - no of characters to read in one iteration.

  1. Reading Lines : Below is the simplest syntax to read one line at a time from the file in a loop. This would read the entire file and the loop would end when the file has been completely read. The read content will be available in the index variable specified.

    <cfloop file="c:\temp\myfile.txt" index="line">
    <cfoutput>#line#</cfoutput> <!--- or do whatever with the line --->
    </cfloop>

    With cfloop, you can also iterate over a part of the file by specifying "from" and "to" values.
    Here is an example to loop over lines between 10 and 20.

    <cfloop file="c:\temp\myfile.txt" index="line" from="10" to="20">
    <cfoutput>#line#</cfoutput> <!--- or do whatever with the line --->
    </cfloop>

    "from" and "to" both are optional attributes where "from" defaults to '1' i.e start of file and "to" defaults to the last line of the file.

    So to read first 10 lines from the file, you can use

    <cfloop file="c:\temp\myfile.txt" index="line" to="10">
    <cfoutput>#line#</cfoutput> <!--- or do whatever with the line --->
    </cfloop>

    One word of caution here - if you use the "to" attribute, its value must not exceed the number of lines in the file, otherwise you will get an "EndOfFile" error. For example, if I had only 5 lines in my file and the value of "to" is 7, this would throw an error because lines 6 and 7 do not exist.

  2. Reading characters : For reading characters instead of lines, you need to provide a value for the "characters" attribute, and that many characters will be read in one iteration. The loop will automatically end when the end of the file has been reached. The read content will be available in the index variable specified.

    An example for that is

    <cfloop file="c:\temp\myfile.txt" index="chars" characters="1000">
    <cfset x=chars>
    <!--- do whatever with the characters --->
    </cfloop>

    One important thing to note here: in the last iteration, when the end of the file has been reached, the index variable will only have the remaining characters. For example, if I have 130 characters in the file and I run the loop to read chunks of 20 characters, in the last iteration the index variable will only have the last 10 characters.
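
Given the caution about "to" going past the end of the file, one possible safeguard is to count the lines first and clamp the upper bound. The sketch below uses a hypothetical path and a hypothetical wanted value; it trades an extra pass over the file for not having to know the line count in advance.

```cfml
<!--- Sketch: clamp "to" to the real line count before looping over a range --->
<cfset path = "c:\temp\myfile.txt">
<cfset wanted = 7>

<!--- First pass: count the lines in the file --->
<cfset total = 0>
<cfloop file="#path#" index="line">
    <cfset total = total + 1>
</cfloop>

<!--- Second pass: never ask for more lines than the file has --->
<cfset last = Min(wanted, total)>
<cfif last GTE 1>
    <cfloop file="#path#" index="line" from="1" to="#last#">
        <cfoutput>#HTMLEditFormat(line)#<br></cfoutput>
    </cfloop>
</cfif>
```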

This completes the first part of the new file I/O, which mainly addresses the problem of working with larger files. However, this does not mean that you cannot or should not use these functions for smaller files. You can very much use them for all kinds of files. They are very simple to use and perform really well. Go ahead and play around with them!

Thursday, July 05, 2007

CFUnited '07 experience

This was my first CFUnited conference and I was amazed by the energy and passion of the ColdFusion community. I have been to other non-CF conferences before, and I can say that I had never seen such a passionate and loyal developer community. The interaction that I had with folks there was really awesome and very valuable. The excitement and buzz around ColdFusion 8 was tremendous, and all of us in the team are really excited by the response. I am sure ColdFusion 8 will be the best ColdFusion release so far.

All the sessions around CF8, including my session on "CFML Enhancements in ColdFusion 8", were packed and very well received. Thanks a lot, guys, if you attended my session :-). Here are the slides for my session in case you missed it.