Real Time = Real Problem

Most Web-based applications operate in real time. Add an article to a database and it shows up immediately on content pages. Update a user address and the new contact information is available immediately. Add or remove an employee and the phone directory is correct when next viewed. Real-time data in a real-time world. That's a good thing, isn't it?

Real Time Is Real Expensive
In an ideal world, real-time everything would indeed be a good thing. But we don't live in an ideal world. As appealing as always being up-to-date is, real time comes with a real cost:

  • Performance: To put it quite simply, nothing eats up performance as much as real-time processing. The more dynamic your application, the worse it will perform. After all, static pages don't suffer from performance problems, ever.
  • Scalability: This is an extension of the above: the more real-time processing your application performs, the less well it will scale. If you make your applications work harder, they can handle fewer concurrent requests; there's no way around that one.

    If you were to analyze all the timings and debug output from your ColdFusion applications, you'd most likely find that more time is spent processing <CFQUERY> tags than anything else (possibly more than everything else combined). In other words, eliminate all your <CFQUERY> tags and your application will fly. And while avoiding databases is not at all practical (or even advisable), understanding the price of real-time data interaction is important.

    Of course, database access and <CFQUERY> are not the only culprits; you likely use Web services and CFX tags and calls to COM and Java and more, and all of those can impact performance too. But databases are a good place to start because they're so prevalent and because making changes (where appropriate) is actually not that difficult.
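
If you'd like to see the cost of a single query for yourself, a rough timing can be taken with GetTickCount(). The following is a minimal sketch, not a benchmark; the datasource (myDSN), table, column, and log file names are all hypothetical placeholders.

```cfml
<!--- Hypothetical datasource, table, and column names; substitute your own. --->
<cfset startTick = GetTickCount()>

<cfquery name="getEmployees" datasource="myDSN">
    SELECT EmployeeID, FirstName, LastName
    FROM Employees
    ORDER BY LastName, FirstName
</cfquery>

<!--- Elapsed wall-clock time in milliseconds for the query above. --->
<cfset queryMs = GetTickCount() - startTick>

<!--- Write the timing to a log file (the file name is an assumption). --->
<cflog file="querytiming" type="information"
       text="getEmployees took #queryMs# ms for #getEmployees.RecordCount# rows">
```

Timing a few of your busiest pages this way is usually enough to show which queries are worth caching.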

    Is Real Time Really Necessary?
    Obviously, some applications (or parts thereof) must be real time. Could you imagine eBay auctions if bids were updated only periodically? How do you think users would react to Amazon.com listing products as in stock only to send out an "oops" e-mail after checkout? What would users do if they changed their AOL or Yahoo or MSN passwords only to find that the password change took some unknown length of time to take effect? All of these sites utilize real-time processing to ensure the best customer experience - some operations simply must occur in real time.

    And that's key - some operations, but not all. If you were to place an online classified listing on any of the major classified sites (including Yahoo) you'd find that your ad did not appear instantaneously. Rather, listings are updated with current information at regular intervals. Similarly, online user directories take in new data on a regular schedule, and individual listings can take months to reflect a change.

    So why are some operations performed in real time and others not? Simply because there is a tradeoff to be made - the more of your data is served in real time, the greater the hit on performance and scalability. As such, developers of applications have to choose between the two, and for many, the choice is to not perform real-time processing unless it is absolutely required.

    But most ColdFusion developers don't make the choice at all. ColdFusion makes implementing real-time processing so easy (much easier than implementing anything non-real time) that they go the real-time route by default. That's a real problem.

    Reducing Database Reads
    I'm not going to be able to cover every real-time scenario in this column, but I would like to point out some ideas you should think about as a starting point. The simplest (and remarkably effective) change you can make to your application involves the reading of data from database tables. <CFQUERY> is a very powerful tag; it lets you access all sorts of databases easily, maybe too easily. And so developers tend to overuse <CFQUERY>, often rereading data that likely has not changed (or has changed in ways that need not be reflected immediately).

    I covered reducing database access via caching extensively in a column entitled "Caching in on Performance" (CFDJ, Vol. 1, issue 2). As explained there:

    Where would you use caching within your applications? Here are some examples:

  • Almost every form that prompts for an address displays a list of states. Those states should never be hard coded (even though there's no 51st state scheduled to join the U.S. at this time); instead, state lists should be populated by a query against a states table. But as that states list doesn't change often (it's 40 years since Hawaii came on board), reading it from the database every time it's needed is a waste of database resources. The states list is thus a primary candidate for caching.
  • Employee lists are another good example. While it's true that employee lists can change frequently, it's doubtful that they change so often that they have to be read from the database each time they're needed (if they do, do yourself a favor: find a new employer, and quickly). Caching employee lists for a few hours will reduce database activity, and the only penalty is that personnel changes won't be immediately reflected in your lists.

    Even though frequently retrieved data is likely cached by the database server itself, retrieving the data again is obviously more resource intensive than not requesting it at all. Furthermore, as ColdFusion usually isn't running on the same box as the database server, eliminating unnecessary database requests can also reduce network traffic between the two machines, which in turn further eliminates potential performance bottlenecks.

    ColdFusion provides two different ways to cache database reads:

  • Query-based caching using CACHEDWITHIN.
  • Variable-based caching in which queries are stored in persistent scopes.

    I am not going to explain these here; refer to the previously mentioned column to learn more.
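
For readers without that column at hand, here is a minimal sketch of both approaches applied to the states list. It assumes a hypothetical datasource (myDSN) and States table, and that the application scope has been enabled for your application; treat it as an illustration rather than the earlier column's code.

```cfml
<!--- 1. Query-based caching: reuse the cached result for up to six hours. --->
<cfquery name="getStates" datasource="myDSN"
         cachedwithin="#CreateTimeSpan(0, 6, 0, 0)#">
    SELECT StateCode, StateName
    FROM States
    ORDER BY StateName
</cfquery>

<!--- 2. Variable-based caching: keep the query in the application scope. --->
<cfif NOT StructKeyExists(application, "states")>
    <cflock scope="application" type="exclusive" timeout="10">
        <!--- Check again inside the lock in case another request got here first. --->
        <cfif NOT StructKeyExists(application, "states")>
            <cfquery name="qStates" datasource="myDSN">
                SELECT StateCode, StateName
                FROM States
                ORDER BY StateName
            </cfquery>
            <cfset application.states = qStates>
        </cfif>
    </cflock>
</cfif>

<!--- Use the cached copy exactly as you would a fresh query. --->
<cfset states = application.states>
<select name="state">
    <cfoutput query="states">
        <option value="#StateCode#">#HTMLEditFormat(StateName)#</option>
    </cfoutput>
</select>
```

The first form is the least work: the database is hit only when no cached result younger than the time span exists. The second form gives you control over when the data is refreshed, at the cost of managing the variable yourself.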

    Reducing Dynamic Processing
    Beyond database access, you likely have entire blocks of your application that are being generated programmatically in real time, but that perhaps need not be. Take the employee list mentioned above. Not hitting the database unnecessarily is a great first step, but you also loop through the results creating output and embedding formatting. Does that really need to occur on each and every page request?

    ColdFusion provides several options that may be used to reduce dynamic processing, and you may use any or all of them (or roll your own). In order of granularity:

  • <CFSAVECONTENT> can be used to save the results of any processing to a variable, perhaps a variable in a persistent scope. Combined with a timestamp check, this lets developers mark blocks of code to be re-executed only after a timeout or at specified intervals. Using <CFSAVECONTENT> it is possible to cache Web service results, file reads, parsed XML, and much more.
  • <CFCACHE> can be used to cache entire pages so that the generated page output is saved and served up on future requests until a specified timeout. Using <CFCACHE> an entire page can be returned in what is effectively a <CFFILE ACTION="read"> and a few other CFML statements.
  • It is also possible to save generated CFM pages as static HTML files so that no application server processing need occur at all at runtime.

    Of course, each of these options requires that you give something up; if you serve cached content you are serving old (not real-time) content. But depending on what your app is, that may be entirely acceptable. And if so, all you have to lose are performance and scalability problems.
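
As an illustration of the first option, the sketch below builds the employee list HTML with <CFSAVECONTENT>, stores it in the application scope, and rebuilds it at most once an hour. The datasource, table, columns, and the application.employeeListHtml and application.employeeListBuilt variable names are hypothetical, and the one-hour interval is an arbitrary choice.

```cfml
<!--- Rebuild the employee list HTML at most once every 60 minutes. --->
<cfset maxAgeMinutes = 60>

<cfif NOT StructKeyExists(application, "employeeListHtml")
      OR DateDiff("n", application.employeeListBuilt, Now()) GTE maxAgeMinutes>

    <cfquery name="getEmployees" datasource="myDSN">
        SELECT FirstName, LastName, Extension
        FROM Employees
        ORDER BY LastName, FirstName
    </cfquery>

    <!--- Capture the generated markup in a variable instead of sending it. --->
    <cfsavecontent variable="listHtml">
        <table>
        <cfoutput query="getEmployees">
            <tr>
                <td>#HTMLEditFormat(LastName)#, #HTMLEditFormat(FirstName)#</td>
                <td>#HTMLEditFormat(Extension)#</td>
            </tr>
        </cfoutput>
        </table>
    </cfsavecontent>

    <!--- Store the markup and its build time together. --->
    <cflock scope="application" type="exclusive" timeout="10">
        <cfset application.employeeListHtml = listHtml>
        <cfset application.employeeListBuilt = Now()>
    </cflock>
</cfif>

<!--- Every request, fresh or cached, just outputs the stored string. --->
<cfoutput>#application.employeeListHtml#</cfoutput>
```

Two requests arriving at the same moment may both rebuild the list; that is wasteful but harmless here. For whole pages, a single <CFCACHE> tag at the top of the template plays a similar role; its attributes have varied between ColdFusion versions, so check the documentation for yours.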

    Using Delayed or Batch Processing
    One of the most important (and least trivial) concepts in the real-time discussion is the use of delayed or batch processing. The best way to understand the idea is via examples. So:

  • You need to import data from a text file into your database. You know that reading the file and then inserting (or updating) each row using a <CFQUERY> within a <CFLOOP> is slow and highly error prone. So you use a batch upload utility (like SQL Server's bcp) to dump all the data into a new empty table. A scheduled event on the database server checks for data in this table, and if any is present, fires off a stored procedure that reads each row, validates it, breaks it up into the appropriate relational components, and then performs the database INSERT or UPDATE operations as needed.
  • You want contact information in your database tables to be clean and consistent - all states are two-letter abbreviations in upper case (with no trailing period), all names are first letter capped, titles like Mr. and Mrs. must have a trailing period, phone numbers must be formatted in a specific way, leading and trailing spaces on all fields must be removed, and so on. Initially you did all that cleanup in CFML before the SQL INSERT, but then you realized that in doing so you were not only hurting performance, you also were not cleaning up any data that did not originate in a ColdFusion application. And so you change the app so that data is written as is, and whenever an INSERT or UPDATE occurs, you set a row level flag named DIRTY to true. You then create a database scheduled event that runs once a day and performs all the cleanup for any rows where DIRTY is true, and then upon completion sets DIRTY to false to flag rows as clean.
  • Your e-commerce site allows users to pay by credit card. You know that most failed credit card transactions fail because of poor data entry (a bad number or expiration date, for example), so you do basic error checking after data entry to ensure that the credit card number is valid. But you do not actually submit the credit card information for approval while the user waits. Rather, on the assumption that most credit card transactions do not fail (especially those from known repeat customers), you thank the customer for the order and place the credit card transaction in a queue. You notify the customer via e-mail only if there is a problem. This keeps the application responsive, can prevent double billing (which could occur if a customer were to submit a form twice), and creates a better user experience. (FYI, you may be surprised to know that some of the largest e-commerce sites on the Net do just this.)
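
The credit card example might be sketched as follows. The column does not say which basic check to use; the Luhn checksum shown here is one common choice and is my assumption, as are the isLuhnValid function, the PaymentQueue table, and the form field names.

```cfml
<!--- Basic sanity check: Luhn checksum over the digits of the card number. --->
<cffunction name="isLuhnValid" returntype="boolean" output="false">
    <cfargument name="cardNumber" type="string" required="true">
    <cfset var digits = REReplace(arguments.cardNumber, "[^0-9]", "", "all")>
    <cfset var total = 0>
    <cfset var i = 0>
    <cfset var d = 0>
    <cfset var pos = 0>

    <!--- Too short to be a card number (minimum length is an assumption). --->
    <cfif Len(digits) LT 12>
        <cfreturn false>
    </cfif>

    <!--- Walk right to left, doubling every second digit. --->
    <cfloop from="#Len(digits)#" to="1" step="-1" index="i">
        <cfset d = Val(Mid(digits, i, 1))>
        <cfset pos = Len(digits) - i>
        <cfif pos MOD 2 EQ 1>
            <cfset d = d * 2>
            <cfif d GT 9>
                <cfset d = d - 9>
            </cfif>
        </cfif>
        <cfset total = total + d>
    </cfloop>

    <cfreturn (total MOD 10 EQ 0)>
</cffunction>

<cfif isLuhnValid(form.cardNumber)>
    <!--- Queue the order for later authorization instead of waiting on it. --->
    <cfquery datasource="myDSN">
        INSERT INTO PaymentQueue (OrderID, Status, QueuedAt)
        VALUES (
            <cfqueryparam value="#form.orderID#" cfsqltype="cf_sql_integer">,
            'PENDING',
            <cfqueryparam value="#Now()#" cfsqltype="cf_sql_timestamp">
        )
    </cfquery>
    <p>Thank you for your order.</p>
<cfelse>
    <p>That card number does not look right. Please check it and try again.</p>
</cfif>
```

A separate scheduled process would then work through the PENDING rows, submit each for approval, and e-mail the customer only when a transaction is declined.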

    In all of these examples, some processing is postponed and/or batched. The result? Not only are the applications faster and more scalable, but the developers also have greater control over exactly what operations occur and when.
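
The DIRTY flag example can be sketched in the same way. The column runs the cleanup as a scheduled event inside the database, which is what lets it catch rows written by other applications too; the cleanup statement is wrapped in <CFQUERY> below only to keep the sketch in one language. The Contacts table, its columns, and the form fields are hypothetical, and only a few of the cleanup rules are shown.

```cfml
<!--- Write side: store the data exactly as entered and flag the row as dirty. --->
<cfquery datasource="myDSN">
    INSERT INTO Contacts (FirstName, LastName, State, Dirty)
    VALUES (
        <cfqueryparam value="#form.firstName#" cfsqltype="cf_sql_varchar">,
        <cfqueryparam value="#form.lastName#" cfsqltype="cf_sql_varchar">,
        <cfqueryparam value="#form.state#" cfsqltype="cf_sql_varchar">,
        1
    )
</cfquery>

<!--- Cleanup side: run once a day, touching only rows flagged as dirty. --->
<cfquery datasource="myDSN">
    UPDATE Contacts
    SET FirstName = LTRIM(RTRIM(FirstName)),
        LastName  = LTRIM(RTRIM(LastName)),
        State     = UPPER(REPLACE(LTRIM(RTRIM(State)), '.', '')),
        Dirty     = 0
    WHERE Dirty = 1
</cfquery>
```

The page that accepts the data now does one plain INSERT and nothing else; all of the formatting work happens later, in one pass, when nobody is waiting for it.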

    Conclusion
    Real time has become the norm by default, not by necessity. And real time causes real problems. Database caching, dynamic output caching, and delayed or batch processing are all concepts that can (and should) be leveraged so as to improve application performance and scalability. The truth is, there is no right or wrong here - everything is a tradeoff. Not all options will always be usable (you'd not want to use delayed batched credit card processing, for example, if you are selling access to a paid Web site). As a developer you get to make the real-time versus non-real-time choice. The important thing is that you actually make the choice. And I think you'll find that most parts of most applications actually need not function in real time at all.

    About Ben Forta
    Ben Forta is Adobe's Senior Technical Evangelist. In that capacity he spends a considerable amount of time talking and writing about Adobe products (with an emphasis on ColdFusion and Flex), and providing feedback to help shape the future direction of the products. By the way, if you are not yet a ColdFusion user, you should be. It is an incredible product, and is truly deserving of all the praise it has been receiving. In a prior life he was a ColdFusion customer (he wrote one of the first large high visibility web sites using the product) and was so impressed he ended up working for the company that created it (Allaire). Ben is also the author of books on ColdFusion, SQL, Windows 2000, JSP, WAP, Regular Expressions, and more. Before joining Adobe (well, Allaire actually, and then Macromedia and Allaire merged, and then Adobe bought Macromedia) he helped found a company called Car.com which provides automotive services (buy a car, sell a car, etc) over the Web. Car.com (including Stoneage) is one of the largest automotive web sites out there, was written entirely in ColdFusion, and is now owned by Auto-By-Tel.

    Reader Feedback

    Kevin wrote: Great article, but small site owners remember that you probably don't need to worry too much about this, but keep it in mind if you ever want to grow.

    I just switched a site over to a full CFML backend that writes static .html files to disk only when the admin tells it to. There is tons of data in databases, but it only changes now and then, a few times a week at most. A full site rebuild only takes about 3 seconds when the .html files are created.

    Steve Sommers wrote: Harry, Ross,

    I've been programming for over 25 years now and this is the first time I have ever seen this definition of "real-time" -- "Real time refers to an environment in which some bit of code can be executed within a very precise time frame. Due to net latency alone, no web application is real time as the execution time cannot be guaranteed within *any* boundaries (other than 0 to infinity :)."

    Maybe I'm straining at gnats here but I always considered our payment processing solution "real time" and some of the front end communication takes place over the Internet and some of the back end communications takes place over other nets -- does this automatically disqualify it as real time? Or does the fact that our front end "over the web" communication modules which compensate for timeouts and protocol failures much more quickly than the standard HTTP protocol bring us back into the real time environment?

    I'm VERY confused by the above stated definition. Here's my quandary: Can "very precise time frame" be defined in such a way that no application known to man can achieve "real time" status or can the "guaranteed within any boundaries" be overcome by simply defining a fixed timeout or can no piece of code which must communicate to any other piece of code (including a database server) be real time?

    Ross Fortini wrote: I agree 100%, Harry. I think that most misusers of the term "real-time" _really_ mean "synchronous," i.e. "I cannot successfully complete my task until this process I'm dependent upon finishes."

    I am likewise bothered by the use of "near-time"--though in this case it's just that the word "near" is much more ambiguous than it appears at first blush. Too bad "asynchronous" is apparently too difficult to use. (I once heard a Java developer refer to that term as "too mainframey!")

    Harry Slaughter wrote: I wish folks would stop mis-using the term "real time". Real time refers to an environment in which some bit of code can be executed within a very precise time frame. Due to net latency alone, no web application is real time as the execution time cannot be guaranteed within *any* boundaries (other than 0 to infinity :). A web app may seem very very fast, but it's still not real time.

    This may seem like a trivial point, but misusing terms like this one defeats the original purpose/definition of the term :)

    Martin Ladner wrote:
    - Do it now. If you do the task now, the user you're dealing with has an expectation that there might be a short pause while you perform the task requested or triggered by the user. The longer you put off a task, the greater the resource hit; a long-delayed task will result in a drop in performance noted by all users at that moment, a drop that has nothing to do with them. And, it will affect them longer than it would have affected the single user requesting a task to be performed. In a contest between scanning a large database for unfinished tasks or performing a task (typically transaction/message file creation and a database update) while the appropriate keys are known, doing it now will always win.
    - If you put off a task, you have to build a scheduling mechanism to see if it needs to be performed. You need to create a "user" or "users" whose job it is to execute such tasks. You need to build a tracking mechanism to be sure the task gets done. If the scheduler kicks off the activity on the back end, now you need to have the back end and front end cooperate in locking records.
    - Since my sister-in-law pointed out the chaos I created by letting clean laundry sit in piles for days instead of putting it away in a real-time fashion, I do it real-time now (well, most of the time). You've heard the old saw that if you always tell the truth, you don't need as good a memory. Well, if you take care of tasks as they come up, you don't have to build an elaborate infrastructure to help you get around to them eventually, and you don't have to worry about the resource hit associated with a batched process interrupting users who aren't being helped by that batched process. =Marty=

