A faster web, part III
Author : Stephane Rodriguez
Creation date : May 8, 2001
Topic : better use of the HTML programming language, part 3
Audience : standard, programmers
Keywords : HTML language, W3C, tags, optimization, web cache, web publishing
In part III of our series 'A faster web', we discuss the pros and cons of a web technique called server-side includes, and show how much it can affect the size of actual web pages. Parts I and II demonstrated that bloated HTML pages are one reason why the web is slow to surf these days.
In the beginning, five years ago, people designed simple web pages. These pages were flat: they were written by hand in HTML with any notepad-like text editor, and everything was mixed together, without any separation, to produce the final layout on screen. A single page held the page title, the navigation bar, the content itself, links, and other things. Moreover, the navigation bar was more of a concept than an HTML object, since nothing in the HTML code said where the navigation bar began, what it contained, and where it ended. No separation, no object encapsulation, no hierarchy.
Because web pages were flat, any change or update was very costly: you had to find the thing to change somewhere in the mix of HTML code. HTML code reads much like unindented C code, so editing HTML content has always been a mess. In other words, HTML is not a programming language, and it was not designed for code reuse either. In first-generation web sites, HTML pages were built offline, published in a central repository (in fact a simple folder), and simply served on request.
After the incredible success of HTML content a few years ago, people wanted to avoid both editing raw HTML directly and writing and duplicating the same code over and over again in every web page that shows, for instance, a navigation bar around articles. Every piece of code had to be replicated on every web page, leading to very high maintenance costs. Programmers behind web servers started to design an open system, much like the C language, that would allow HTML to be used more or less like any other code and, along the way, to be componentized. Things began to speed up.
In the meantime, HTML matured to version 4.0 thanks to the W3C, but without a single change in the foundation.
So what? In traditional C code, code written once can be reused thanks to the #include keyword, which includes code from another file. Simply put, the same idea allows HTML to be designed in components. Let's begin with a navigation bar. Design it, code it, and put all its HTML code in a separate file. Then, in the main HTML file, use the #include keyword. What happens? The web server inserts the content of the separate file at that position. What if you need this navigation bar somewhere else on your site? Simple, just use #include wherever you need it. What about maintenance? Straightforward too, just edit the separate file. The effect is immediate on all HTML pages that include the separate file, at the expense of a Refresh in the browser.
When HTML components are assembled by a web server process, it's called server-side scripting. Server-side scripting encompasses several proprietary languages, such as Microsoft ASP and Allaire Cold Fusion, as well as several techniques such as server-side includes. Server-side includes are now supported by all web servers that support web scripting, and this again has a huge effect on maintenance cost, human resources and ease of coding.
For instance, using ASP, you would see something like :
<% @LANGUAGE=VBScript %>
<%
Option Explicit
Response.Expires = 0
%>
<HTML>
<BODY>
<TABLE border=0>
  <TR>
    <TD width="40%">
      <!-- #include virtual="/components/navigationbar.asp" -->
    </TD>
    <TD width="60%">
      <!-- #include virtual="/today/article.asp" -->
    </TD>
  </TR>
</TABLE>
</BODY>
</HTML>
<%...%>
Now what about our topic? We are dealing with the size of web pages. Note that a web server uses a run-time to execute server-side scripts and process all keywords in order to build the HTML page, which is sent to you as plain HTML. All keywords are expanded and flattened so that the final HTML page contains only HTML tags. Though this is required for us to be able to see the actual web page in our web browser, one can wonder whether the resulting HTML is not unnecessarily flat and big. After all, if #include keywords were allowed on the client side, it would be possible to download the navigation bar separately, once and for all, and then retrieve it from the client-side cache on request. That way, instead of the navigation bar code being flattened and repeated in every HTML page, the code would remain simple and the HTML pages would be much smaller.
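The flattening step can be sketched in a few lines of Javascript. This is only an illustration of the idea, not how ASP or any real web server implements it: the function name expandIncludes, the in-memory table of files and the file contents are all assumptions made for the example.

```javascript
// Hypothetical in-memory "file system": file name -> file content.
// A real server would read these files from disk.
const files = {
  "navigationbar.inc": "<TABLE border=1><TR><TD>Prev</TD><TD>Home</TD><TD>Next</TD></TR></TABLE>",
  "article.inc": "blah blah blah"
};

// Replace every <!-- #include file="..." --> directive by the content of
// the named file, recursively, so that included files may include others.
// (This sketch has no protection against circular includes.)
function expandIncludes(source, fileTable) {
  const directive = /<!--\s*#include\s+file="([^"]+)"\s*-->/g;
  return source.replace(directive, function (match, name) {
    if (!Object.prototype.hasOwnProperty.call(fileTable, name)) {
      throw new Error("include file not found: " + name);
    }
    return expandIncludes(fileTable[name], fileTable);
  });
}

const page =
  '<HTML><BODY>' +
  '<!-- #include file="navigationbar.inc" -->' +
  '<!-- #include file="article.inc" -->' +
  '</BODY></HTML>';

// The browser only ever receives the flattened result.
const flattened = expandIncludes(page, files);
console.log(flattened);
console.log(page.length + " bytes of source, " + flattened.length + " bytes sent");
```

Every page that includes the navigation bar carries a full copy of it once flattened, which is exactly the overhead discussed here.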
It is certainly a pity that the people behind the HTML standard did not see the use of an include keyword on the client side. But there are several workarounds. As a consequence, it is possible to design HTML pages that use object encapsulation both on the server side and on the client side, with the immediate benefit for the end user that HTML pages are much smaller. And smaller HTML pages are faster to load and render.
This is a real-world use case, as web sites use the same components in their web pages over and over again. Incidentally, these components give the web site its look and act much like identity cards, which is recommended in terms of design.
Thanks to the client-side cache, it's also possible to optimize rendering on screen. Here we are dealing with proprietary technologies embedded in web browsers. Each web browser has its own algorithms and workflows, which determine how well HTML components are processed once rather than every time they are declared. But this discussion is slightly off topic.
So what do client-side includes look like? We are going to discuss two of the most used techniques. For a list of the possibilities offered by HTML (most of which result in using the standard in a way it was not designed for...) and by web browsers themselves (each web browser has built-in proprietary tags and behaviours worth looking at), it is straightforward to get your hands on the HTML working draft and check all tags that have a src attribute. src is short for source, a standard attribute used to request other web-served data from within an HTML page. For instance, pictures have a src attribute, and their loading is optimized when the cache is used, so that each picture is downloaded once and for all.
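A back-of-the-envelope calculation gives an idea of the order of magnitude. The figures below (a 4000-byte navigation bar, a 6000-byte article, a 60-byte reference, 20 pages visited) are invented for illustration and are not measurements. The sketch also assumes that the browser keeps the separate file in its cache after the first download.

```javascript
// Total bytes downloaded when the component is flattened into every page.
function bytesInlined(pages, articleBytes, componentBytes) {
  return pages * (articleBytes + componentBytes);
}

// Total bytes downloaded when each page only carries a short reference
// (for instance a tag with a src attribute) and the component itself is
// fetched once, then served from the client-side cache.
function bytesCached(pages, articleBytes, componentBytes, referenceBytes) {
  return pages * (articleBytes + referenceBytes) + componentBytes;
}

// Made-up figures, for illustration only.
const pages = 20, article = 6000, navigationBar = 4000, reference = 60;

const inlined = bytesInlined(pages, article, navigationBar);
const cached = bytesCached(pages, article, navigationBar, reference);

console.log("inlined: " + inlined + " bytes"); // inlined: 200000 bytes
console.log("cached:  " + cached + " bytes");  // cached:  125200 bytes
console.log("saved:   " + (inlined - cached) + " bytes"); // saved:   74800 bytes
```

The saving grows with the number of pages visited and with the weight of the shared components relative to the content itself.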
HTML 4.0 defines the SCRIPT tag, through which a page can run Javascript, a scripting language that processes small amounts of data dynamically while the page loads or while the user interacts with the content. Javascript can be embedded in HTML pages exactly like other tags. But Javascript can also be executed from a separate file. Let's take an example, beginning with a simple HTML page :
<HTML>
<BODY>
<TABLE border=0>
  <TR>
    <TD width="40%">
      <!-- here begins the navigation bar -->
      <TABLE border=1>
        <TR>
          <TD><A href="prevarticle.asp" title="Previous article">Prev</A></TD>
          <TD><A href="/default.asp" title="Home page">Home page</A></TD>
          <TD><A href="nextarticle.asp" title="Next article">Next</A></TD>
        </TR>
      </TABLE>
      <!-- here ends the navigation bar -->
    </TD>
    <TD width="60%">
      <!-- here begins the article -->
      blah blah blah
      <!-- here ends the article -->
    </TD>
  </TR>
</TABLE>
</BODY>
</HTML>
Let's turn the navigation bar into Javascript code. This is simply done by embedding the HTML tags in document.write("") method calls :
<HTML>
<BODY>
<TABLE border=0>
  <TR>
    <TD width="40%">
      <!-- here begins the navigation bar -->
      <SCRIPT language="Javascript">
        document.write ("<TABLE border=1>");
        document.write ("<TR><TD><A href='prevarticle.asp' title='Previous article'>Prev</A></TD>");
        document.write ("<TD><A href='/default.asp' title='Home page'>Home page</A></TD>");
        document.write ("<TD><A href='nextarticle.asp' title='Next article'>Next</A></TD>");
        document.write ("</TR>");
        document.write ("</TABLE>");
      </SCRIPT>
      <!-- here ends the navigation bar -->
    </TD>
    <TD width="60%">
      <!-- here begins the article -->
      blah blah blah
      <!-- here ends the article -->
    </TD>
  </TR>
</TABLE>
</BODY>
</HTML>
And now let's make a separate file of it :
<HTML>
<BODY>
<TABLE border=0>
  <TR>
    <TD width="40%">
      <!-- here begins the navigation bar -->
      <SCRIPT language="Javascript" src="navigationbar.js"></SCRIPT>
      <!-- here ends the navigation bar -->
    </TD>
    <TD width="60%">
      <!-- here begins the article -->
      blah blah blah
      <!-- here ends the article -->
    </TD>
  </TR>
</TABLE>
</BODY>
</HTML>
navigationbar.js :
// Navigation bar
//
document.write ("<TABLE border=1>");
document.write ("<TR><TD><A href='prevarticle.asp' title='Previous article'>Prev</A></TD>");
document.write ("<TD><A href='/default.asp' title='Home page'>Home page</A></TD>");
document.write ("<TD><A href='nextarticle.asp' title='Next article'>Next</A></TD>");
document.write ("</TR>");
document.write ("</TABLE>");
As simple as it sounds ! The benefits are :
- for the end user : HTML pages get smaller, since navigation bars and other components are used many times but no longer repeated in each page. Smaller means faster to load. As separate resource files are cached, the code for the navigation bar is downloaded only once. Less remote loading and better use of the client-side cache in turn lead to faster rendering on screen.
- for the publisher : no need to know Javascript. All of this can be performed at publish time by black boxes. To be honest, the black boxes assume that the HTML is componentized. This is a fair assumption, as today's high-traffic web sites are already componentized to reduce maintenance cost. This again can be managed by third-party tools.
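To give an idea of what such a publish-time black box could do, here is a minimal sketch that turns an HTML fragment into the text of a .js file. The function names toStringLiteral and htmlToDocumentWrite are hypothetical and do not refer to any existing tool. The sketch emits document.writeln rather than document.write, so that the line breaks of the original fragment are kept.

```javascript
// Turn one line of HTML into a Javascript string literal in double quotes.
// Backslashes and double quotes are escaped, and "</" is written "<\/" so
// that a closing tag inside the string cannot end an inline SCRIPT block.
function toStringLiteral(line) {
  return '"' +
    line.replace(/\\/g, "\\\\")
        .replace(/"/g, '\\"')
        .replace(/<\//g, "<\\/") +
    '"';
}

// Convert an HTML fragment into the text of a .js file made of
// document.writeln calls, one per line of the fragment.
function htmlToDocumentWrite(html) {
  return html
    .split(/\r?\n/)
    .map(function (line) {
      return "document.writeln(" + toStringLiteral(line) + ");";
    })
    .join("\n");
}

const fragment = [
  '<TABLE border=1>',
  '<TR><TD><A href="/default.asp" title="Home page">Home page</A></TD></TR>',
  '</TABLE>'
].join("\n");

// This text is what the tool would save as navigationbar.js.
console.log(htmlToDocumentWrite(fragment));
```

The publisher keeps editing plain HTML components, and the conversion runs each time the site is published.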
Let's add some dynamics. What if, in our example, the URLs depend on a specific context ID ? Let's just add a variable for it.
<HTML>
<BODY>
<TABLE border=0>
  <TR>
    <TD width="40%">
      <!-- here begins the navigation bar -->
      <SCRIPT language="Javascript">id=40</SCRIPT>
      <SCRIPT language="Javascript" src="navigationbar.js"></SCRIPT>
      <!-- here ends the navigation bar -->
    </TD>
    <TD width="60%">
      <!-- here begins the article -->
      blah blah blah
      <!-- here ends the article -->
    </TD>
  </TR>
</TABLE>
</BODY>
</HTML>
navigationbar.js :
// Navigation bar
//
// param : context ID should be assigned before calling this code
//
document.write ("<TABLE border=1>");
document.write ("<TR><TD><A href='prevarticle.asp?id="+id+"' title='Previous article'>Prev</A></TD>");
document.write ("<TD><A href='/default.asp' title='Home page'>Home page</A></TD>");
document.write ("<TD><A href='nextarticle.asp?id="+id+"' title='Next article'>Next</A></TD>");
document.write ("</TR>");
document.write ("</TABLE>");
Going further, note that if(...) tests, for(...) loops, string manipulation and so on are part of the Javascript language, making it adequate for small processing tasks.
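As an illustration, the navigation bar could be built with a loop over a list of links instead of one hard-coded document.write call per cell. This is only a sketch: the function buildNavigationBar, the links array and its needsContext field are names invented for the example, and titles and labels are assumed not to contain quotes or markup.

```javascript
// Build the navigation bar from a list of links instead of hard-coding
// every cell. "links" and "contextId" are hypothetical names; in the
// article's example the context is the global variable id.
function buildNavigationBar(links, contextId) {
  let html = "<TABLE border=1><TR>";
  for (let i = 0; i < links.length; i++) {
    let href = links[i].href;
    // Only links marked as context-dependent receive the ?id= parameter.
    if (links[i].needsContext) {
      href += "?id=" + encodeURIComponent(contextId);
    }
    html += "<TD><A href='" + href + "' title='" + links[i].title + "'>" +
            links[i].label + "</A></TD>";
  }
  return html + "</TR></TABLE>";
}

const links = [
  { href: "prevarticle.asp", title: "Previous article", label: "Prev", needsContext: true },
  { href: "/default.asp", title: "Home page", label: "Home page", needsContext: false },
  { href: "nextarticle.asp", title: "Next article", label: "Next", needsContext: true }
];

const navigationBar = buildNavigationBar(links, 40);

// In a browser, navigationbar.js would end with document.write(navigationBar).
// The guard lets the sketch also run outside a browser.
if (typeof document !== "undefined") {
  document.write(navigationBar);
} else {
  console.log(navigationBar);
}
```

Adding a link to the bar then means adding one entry to the list, in one cached file.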
Now what if I don't want to use Javascript ? Here we go with the next example.
IFrame is an HTML 4.0 tag that loads another HTML page into a frame within the same overall web page. IFrames act much like small containers. Surprisingly, each of them can contain an entire HTML web page (with headers like HTML and BODY), not only a few HTML tags, say a component.
IFrames can help us componentize a web page using solely HTML tags, with no Javascript. The Iframe trick is thus seamless.
An example should make this clear. Let's take our navigation bar again and move its code to a separate file :
<HTML>
<BODY>
<TABLE border=0>
  <TR>
    <TD width="40%">
      <!-- here begins the navigation bar -->
      <IFRAME src="navigationbar.html" frameborder="0" width="100%"></IFRAME>
      <!-- here ends the navigation bar -->
    </TD>
    <TD width="60%">
      <!-- here begins the article -->
      blah blah blah
      <!-- here ends the article -->
    </TD>
  </TR>
</TABLE>
</BODY>
</HTML>
navigationbar.html :
<!-- navigation bar -->
<TABLE border=1>
  <TR>
    <TD><A href="prevarticle.asp" title="Previous article">Prev</A></TD>
    <TD><A href="/default.asp" title="Home page">Home page</A></TD>
    <TD><A href="nextarticle.asp" title="Next article">Next</A></TD>
  </TR>
</TABLE>
This looks straightforward again. One limitation of Iframes, however, is that you cannot use variables or pass contexts, because they are pure HTML tags. It is therefore hard to see how components that depend heavily on context can be factorized this way. In practice, this leads to creating as many separate files as there are contexts. If a component in a given context is used several times in a web page or across a web site, there is still a benefit in componentizing it, even if it is only used in its own context.
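To make the "one file per context" consequence concrete, here is a small sketch of a publish-time step that generates one static navigation bar per context ID. The function name navigationBarFiles and the file naming scheme navigationbar_<id>.html are assumptions made for the example, and the result is kept in memory rather than written to disk.

```javascript
// Produce the content of one static navigation bar file per context ID.
// The file naming (navigationbar_<id>.html) is a hypothetical convention.
function navigationBarFiles(contextIds) {
  const files = {};
  for (const id of contextIds) {
    files["navigationbar_" + id + ".html"] =
      "<TABLE border=1><TR>" +
      "<TD><A href=\"prevarticle.asp?id=" + id + "\" title=\"Previous article\">Prev</A></TD>" +
      "<TD><A href=\"/default.asp\" title=\"Home page\">Home page</A></TD>" +
      "<TD><A href=\"nextarticle.asp?id=" + id + "\" title=\"Next article\">Next</A></TD>" +
      "</TR></TABLE>";
  }
  return files;
}

// The page for context 40 would then reference its own file:
// <IFRAME src="navigationbar_40.html" frameborder="0" width="100%"></IFRAME>
const generated = navigationBarFiles([40, 41, 42]);
console.log(Object.keys(generated));
```

Each of these files is cached separately, so the benefit only appears when the same context is displayed more than once.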
Due to this limitation, Javascript components can be regarded as a preferred solution.
In conclusion, we have shown that client-side caching in the web browser can dramatically reduce the size of web pages, making them much thinner than they are these days. Using the SCRIPT tag (with Javascript code) or the IFRAME tag, both part of HTML 4.0, a standard supported by all of today's browsers, it is possible to turn parts of web pages into components, even with their contexts, and thus to factorize them and let the cache play its role. As a consequence, web site designers should not rely only on server-side technologies, but should also take advantage of the wonderful opportunities of the client side. One thing remains to be said: this approach is complementary to other approaches. Also, we have only discussed examples taken from standard HTML, though recent web browsers have more to offer.
Stephane Rodriguez
May 8th, 2001
To be continued in part IV.