HTML inside or outside?

A classic web application is a program that runs on the server and generates a character string which is valid HTML. It is hard to imagine a programming language that could not generate a string of characters, so any programming language should be able to do this.

Let's look at some examples.

The three languages of the web

But first a quick review. The invention of the web introduced three new languages, shown here in no particular order:

  1. a one-liner language for specifying resource locations: URL (Uniform Resource Locator)
  2. a protocol (to run on top of TCP/IP) for request/response: HTTP (HyperText Transfer Protocol)
  3. a markup language for content: HTML (HyperText Markup Language)

The person viewing the web either enters a URL into the location bar of her browser or clicks on a link in a document she is already viewing in the browser. The link specifies a URL, which is not visible except that some browsers show it in the bottom left corner when she hovers over the link.

The browser reacts to the URL by requesting (using the HTTP protocol*) the content of that resource from the web server (and port) named in the initial portion of the URL.

The web server responds (using the HTTP protocol) to that request by returning the content of the resource to the browser. The browser interprets the markup of HTML content and displays it to the person.

The action outlined in the previous three paragraphs repeats until the person tires of surfing the web.
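As a rough sketch of the first half of that cycle, the bash function below splits a URL into server, port, and resource path, then prints the request a browser might send. The function name build_request and the example.com address are made up for illustration, and the sketch assumes a plain http:// URL with no query string; a real browser does far more.

```shell
#!/bin/bash
# Hypothetical helper (not part of any browser): split a simple http:// URL
# and print the HTTP request that would be sent for it.
build_request() {
  local url="$1"
  local rest="${url#http://}"        # drop the scheme
  local hostport="${rest%%/*}"       # server and optional port
  local resource="/"                 # default when the URL names no path
  case "$rest" in
    */*) resource="/${rest#*/}" ;;   # everything after the first slash
  esac
  local host="${hostport%%:*}"
  local port=80                      # default HTTP port
  case "$hostport" in
    *:*) port="${hostport##*:}" ;;
  esac
  echo "connect to $host on port $port"
  # On the wire each request line ends with CR LF; plain newlines are used here.
  echo "GET $resource HTTP/1.1"
  echo "Host: $hostport"
  echo
}

build_request "http://www.example.com:8080/docs/index.html"
```

Run as written, this prints the connection target (www.example.com, port 8080) followed by the request lines "GET /docs/index.html HTTP/1.1" and "Host: www.example.com:8080".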

Web page vs web application


When the web server receives a request (over the HTTP protocol), it either finds an HTML page already built and sitting in its file system, or it calls upon a web application to generate the HTML that was requested.
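A minimal sketch of that decision follows. The respond and generate_page functions and the throwaway document root are all hypothetical, and real web servers decide from their configuration rather than from a simple file test.

```shell
#!/bin/bash
# Stand-in for a web application: builds the HTML at request time.
generate_page() {
  echo "<html><body><p>Generated for $1 on request.</p></body></html>"
}

# Hypothetical dispatcher: serve a prebuilt page if one exists under the
# document root, otherwise hand the request to the web application.
respond() {
  local docroot="$1" resource="$2"
  if [ -f "$docroot$resource" ]; then
    cat "$docroot$resource"        # static web page, returned as-is
  else
    generate_page "$resource"      # web application generates the HTML
  fi
}

# Demonstration with a throwaway document root.
demo_root=$(mktemp -d)
echo "<html><body><p>Prebuilt page.</p></body></html>" > "$demo_root/about.html"
respond "$demo_root" /about.html   # found on disk: static page
respond "$demo_root" /visits       # not on disk: generated
rm -r "$demo_root"
```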

Web application with HTML inside


The web application is a program written in some language. That language will be used to produce a character string whose content conforms to the HTML language**. So, there is a language inside a language. This sounds confusing, but any programmer can make it happen.

CGI script written in bash


CGI (Common Gateway Interface) is a technique that lets the web server call programs as web applications. The applications can be written in many different programming languages. Here is an example using a scripting language called bash.

#!/bin/bash
echo "Content-type: text/html"
echo
echo -n 1 >>../../tallies
COUNT=$(wc -c < ../../tallies)
cat <<ENDMARKER
<!doctype html>
<html>
<head>
<title>Contrivance without conclusion</title>
<meta name="format-detection" content="telephone=no">
</head>
<body>
<h1>Con without con</h1>
<h2>Contrivance without conclusion</h2>
<p>This page has been visited $COUNT times.</p>
<p>Latest visit from IP $REMOTE_ADDR on $(date).</p>
<p>Sample program from
<a href="http://conwithoutcon.blogspot.com/2014/05/contrivance-without-conclusion.html"
 target="_BLANK">Contrivance without conclusion</a>.</p>
</body>
</html>
ENDMARKER

The code above can also be found in a code repository, here. A very detailed description of this particular web application can be found in the blog post that the generated page links back to.

The thing to notice is that everything between the two lines containing "ENDMARKER" (lines 6 and 23) is simply a web page written in the HTML language, and it sits inside the web application program.

The web application has HTML inside of it.

Variations in the page require a web application


A careful reader will have noticed that the HTML contains three odd constructs (led by a dollar sign). These embed some information available to the web application that is running, remember, on the server. In order of appearance, they are:
  1. $COUNT, a variable in the web application which holds a number (computed in line 5)
  2. $REMOTE_ADDR, an environment variable from the web server which holds the browser's IP address
  3. $(date), a call out to the date program provided by the server's operating system, which prints the current date and time (in the time zone of the server)

So, as the program computes the string that will ultimately be returned to the browser as HTML (see line 2 of the code above), it embeds a number, an IP address, and a date and time into the HTML code in those locations.
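The substitution can be tried outside a web server with a small sketch. The values of COUNT and REMOTE_ADDR are set by hand here (an assumption for the demo; under CGI the web server supplies REMOTE_ADDR), and the function names are made up. The second function quotes the marker, which tells bash to leave the dollar-sign constructs alone, so the text comes out exactly as written.

```shell
#!/bin/bash
COUNT=3                    # set by hand for the demo
REMOTE_ADDR=203.0.113.7    # documentation-only address; normally set by the web server

# Unquoted marker: bash fills in variables and $(...) before cat sees the text.
render() {
  cat <<ENDMARKER
<p>Visit $COUNT from $REMOTE_ADDR in $(date +%Y).</p>
ENDMARKER
}

# Quoted marker: no substitution, the text is passed through literally.
render_literal() {
  cat <<'ENDMARKER'
<p>Visit $COUNT from $REMOTE_ADDR in $(date +%Y).</p>
ENDMARKER
}

render
render_literal
```

The first call prints the paragraph with the number, address, and current year filled in; the second prints the dollar-sign constructs unchanged, which is what a static page containing the same text would show.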

It is these variations that make us need a web application instead of a pre-prepared static asset that would always be the same.

Web application that looks like HTML

In the previous section we saw a program that had HTML embedded inside of it. 

Here we consider a program that has the HTML outside.

<html>
<head><title>First JSP</title></head>
<body>
  <%
    double num = Math.random();
    if (num > 0.95) {
  %>
      <h2>You'll have a lucky day!</h2><p>(<%= num %>)</p>
  <%
    } else {
  %>
      <h2>Well, life goes on ... </h2><p>(<%= num %>)</p>
  <%
    }
  %>
  <a href="<%= request.getRequestURI() %>"><h3>Try Again</h3></a>
</body>
</html>

This web application is written in a language known as JSP (JavaServer Pages).

A superficial glance at the first few lines might remind one of HTML. 

But it also contains things that look a bit like HTML tags with percent signs in them, such as <% ... %> and <%= ... %>. What could this mean?

JSP belongs to a family of programming languages with a long tradition, dating back to PHP (which originally stood for Personal Home Page) in the early days of the web, quickly followed by ASP (Active Server Pages). The initiator of this concept, PHP, used question marks and exclamation marks (today's form is <?php ... ?>) where ASP and JSP use the percent sign.

This example comes from Getting Starting with JSP with Examples, with thanks to the author thereof.


How does it work?

When the web server sees a resource whose URL ends in ".jsp", it converts the page into an actual Java program (called a servlet), compiles it, and runs that program to produce the page.

In essence, the server turns the JSP inside out***: the resulting servlet is a web application with HTML inside.

For the curious, an excerpt of the servlet code is shown in the getting started page linked to above, in its section "Behind the Scene".
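To make "inside out" concrete, here is a sketch of the same page logic written in the HTML-inside style. It is written in bash, like the earlier CGI example, and is not the Java servlet that the server actually generates. The function name lucky_page is made up, the random number is a whole number from 0 to 99 rather than a fraction (bash has no built-in floating point), and SCRIPT_NAME is the CGI variable assumed to hold the script's own URL path.

```shell
#!/bin/bash
# Hypothetical rewrite of the JSP logic with the HTML inside the program.
# $1 is a number from 0 to 99, $2 is the path the "Try Again" link points to.
lucky_page() {
  local num="$1" self="$2"
  echo "<html>"
  echo "<head><title>Inside out</title></head>"
  echo "<body>"
  if [ "$num" -ge 95 ]; then        # same 5% chance as num > 0.95 in the JSP
    echo "  <h2>You'll have a lucky day!</h2><p>($num)</p>"
  else
    echo "  <h2>Well, life goes on ... </h2><p>($num)</p>"
  fi
  echo "  <a href=\"$self\"><h3>Try Again</h3></a>"
  echo "</body>"
  echo "</html>"
}

echo "Content-type: text/html"
echo
lucky_page "$((RANDOM % 100))" "${SCRIPT_NAME:-/}"
```

Where the JSP had program fragments scattered through a page of HTML, this version has fragments of HTML scattered through a program, which is the shape of the generated servlet.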

Notes

* Yes, the author is aware that the "P" of HTTP means "protocol", so it is unnecessary (not to mention redundant) to say "HTTP protocol". This is written deliberately to emphasize the role of the language being mentioned.

** See * but for the "L" of HTML.

*** If you chance to meet a JSP, do not leave it so! Quickly turn it inside out and make that servlet go. (to the tune of "if you chance to meet a frown")
