HTML via Java

Learning HTML, the language of the Web, by way of Java (a full-featured, general-purpose programming language) is hardly the usual way to go about it. But including HTML in Javanook calls for an unusual approach.

There are two benefits from this approach:

- To publish any document on the Web, you need to know HTML.
- If you are a Java programmer (you must be, or you would not waste your time here), you need to know how to output HTML pages from within your programs.

Hypertext Markup Language, simply called HTML, is a simple mechanism to output data in a form that a web browser can render in a platform-neutral way. An HTML file is a text file, so it can be created in any text editor. For the same reason, it can also be generated dynamically by any programming language.

If you are familiar with Microsoft Word documents, you will have noticed that text can be styled so that it is rendered as plain, bold, italic or underlined. This styling process is called text formatting. The word processor applies the style when you select the text and click the appropriate button (B, I or U, as the case may be).

Text written in a plain text editor such as Notepad cannot be formatted, nor can the editor render text that has been formatted. However, it is possible to wrap the text with formatting instructions, and a program can be developed that executes the instructions and renders the text according to the formatting rules.

The formatting instructions are provided by way of a markup language - you guessed it - it is HTML. The program that renders formatted text according to markup tags provided around it is called - no marks for guessing - a Web browser.

Here is what a simple HTML file looks like:

listing 1:
<html>
<head><title>A Sample HTML Page</title>
</head>
<body>
<h2>Sample Web Page</h2>
<hr>
<p>To mimic MS Word's bold, italic and underline formatting, we use the following tags in HTML.
<br>
<u>Underline</u> the <b>bold</b> word in <i>italic</i>.
<br><br>
</body>
</html>

When you save this file as test.html and open it in a browser, you see the heading, a horizontal rule, and the text with the marked words underlined, bold and italic.

All markup is contained within the start <html> tag and the close </html> tag. The text within the title tag appears on the browser's title bar. The body tag encloses the document content. In the sample, the heading is enclosed in an h2 tag and is followed by a horizontal rule, hr. Note how the text is marked up with style tags for underline (u), bold (b) and italic (i).
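The idea of wrapping text in a start tag and a matching close tag can be sketched in a few lines of Java. The class name TagWrapper and its wrap method are hypothetical names chosen for this illustration; they are not part of any library.

```java
// Illustrative sketch only: TagWrapper and wrap() are hypothetical names.
class TagWrapper {

    // Surround the text with a start tag and the matching close tag.
    static String wrap(String tag, String text) {
        return "<" + tag + ">" + text + "</" + tag + ">";
    }

    public static void main(String[] args) {
        // Tags can be nested: a bold word inside an italic phrase.
        System.out.println(wrap("i", "very " + wrap("b", "important")));
        // prints <i>very <b>important</b></i>
    }
}
```

Because the helper returns a String rather than printing, calls can be nested, which mirrors the way the tags themselves nest.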

The World Wide Web Consortium (W3C) oversees the development of the Web and releases the specifications that govern its functioning. The HTML specification released by W3C is currently at version 4.0, and most browsers support it in their latest versions.

The HTML 4.0 specification is available at http://www.w3.org/TR/html4/. For a 10-minute guide to HTML, point your browser to http://www.w3.org/MarkUp/Guide/ .

There are many HTML authoring tools on the market today. From simple text markup editors to WYSIWYG editors, you have a range of tools to pick from. However, not all of them are free, and the graphical ones are expensive.

We will consider HTML generation using Java. For our discussion here we will focus on the following issues - 

The bulleted list is the one used in the guide mentioned above, and it is the one we are going to use to generate the tags from Java.

We begin by specifying an object to represent an HTML document - HTMLDoc. Using UML notation, we illustrate the object's properties and operations.

HTMLDoc
-title:String
-header:String
-style:String
-imagePath:String
-items:String[]
+addTitle()
+addBody()
+addHeader()
+addContent()
+addStyle()
+addImage()
+addLink()
+addList()

This class specification is by no means complete, but we will expand it as we go along.

To output listing 1 above using this class, we write the following code:

public class Listing1 {

  private String title = "A Sample HTML Page";
  private String header = "Sample Web Page";
  private int headerLevel = 2;

//style Level 1-bold, 2-italic, 3-underline, 0-plain default

  public Listing1() {
    addTitle(title);
    addBody();
  }

  public void addTitle(String title) {
    println("<html><head><title>"+title+"</title></head>");
  }

  public void addBody() {
    println("<body>");
    addHeader(header, headerLevel);
    addRuler();
    addContent();
    println("</body></html>");
  }

  public void addHeader(String header, int level) {
    switch(level) {
      case 1:
        println("<h1>"+header+"</h1>");
        return;
      case 2:
        println("<h2>"+header+"</h2>");
        return;
      case 3:
      default:
        println("<h3>"+header+"</h3>");
        return;
    }
  }

  public void addRuler() {
    // end the line so that <hr> sits on its own line, as in listing 1
    println("<hr>");
  }

  public void addContent() {
    addStyle("<p>To mimic MS Word's bold, italic and underline formatting, we use the following tags in HTML.", 0);
    addLineBreak();
    addStyle("Underline", 3);
    addStyle(" the ", 0);
    addStyle("bold", 1);
    addStyle(" word in ", 0);
    addStyle("italic", 2);
    // closing period of the sentence, as in listing 1
    addStyle(".", 0);
    addLineBreak();
    addLineBreak();
  }

  public void addStyle(String text, int style) {
    switch(style) {
      case 1:
        print("<b>"+text+"</b>");
        return;
      case 2:
        print("<i>"+text+"</i>");
        return;
      case 3:
        print("<u>"+text+"</u>");
        return;
      case 0:
      default:
        print(text);
        return;
    }
  }

  public void addLineBreak() {
    println("<br>");
  }

  private void print(String matter) {
    // write to console - write to a file if you like
    System.out.print(matter);
  }

  private void println(String matter) {
    // write to console - write to a file if you like
    System.out.println(matter);
  }

  public static void main(String[] args) {
    new Listing1();
  }

}

Copy the code into your favorite editor, then compile and run it. The output on the console is essentially the markup shown in listing 1, with some line breaks in slightly different places.
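As a rough sketch of how the same idea might move toward the HTMLDoc specification given earlier, the class below collects the markup in a StringBuilder instead of printing it directly. The method signatures, the build() method and the fallback to h3 for unknown heading levels are assumptions made for this illustration; the UML listing above names the operations but does not define their parameters.

```java
// Illustrative sketch of the HTMLDoc idea; all signatures are assumptions.
class HTMLDoc {

    // Collects the markup so the caller decides where to send it.
    private final StringBuilder out = new StringBuilder();

    // Open the document, emit the head with the title, and open the body.
    HTMLDoc addTitle(String title) {
        out.append("<html>\n<head><title>").append(title)
           .append("</title></head>\n<body>\n");
        return this;
    }

    // Emit a heading; levels outside 1..6 fall back to h3.
    HTMLDoc addHeader(String header, int level) {
        int h = (level >= 1 && level <= 6) ? level : 3;
        out.append("<h").append(h).append(">").append(header)
           .append("</h").append(h).append(">\n");
        return this;
    }

    // Emit a bulleted (unordered) list with one li element per item.
    HTMLDoc addList(String[] items) {
        out.append("<ul>\n");
        for (String item : items) {
            out.append("<li>").append(item).append("</li>\n");
        }
        out.append("</ul>\n");
        return this;
    }

    // Close the body and the document, and return the complete markup.
    String build() {
        return out.toString() + "</body>\n</html>\n";
    }

    public static void main(String[] args) {
        String page = new HTMLDoc()
            .addTitle("A Sample HTML Page")
            .addHeader("Sample Web Page", 2)
            .addList(new String[] {"first item", "second item"})
            .build();
        System.out.print(page);
    }
}
```

Returning the markup as a String keeps the class independent of the destination, so the same page can go to the console or to a file. Note that this sketch does not escape special characters such as & or < in the text it is given.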

This approach, though useful, is not comprehensive: it does not let you use the full range of features in the HTML specification. For that, you need an API that covers every HTML element.

We are lucky to have such a Java API for HTML markup generation from the Apache Jakarta Project. It goes by the name ECS, for Element Construction Set. The API is extensible, so you can also use it to generate other markup languages. See http://jakarta.apache.org/ecs/ .

Using ECS, you write code such as the following (from the Apache Web site):

Html html = new Html()
              .addElement(new Head()
                  .addElement(new Title("Demo")))
              .addElement(new Body()
              .addElement(new H1("Demo Header"))
              .addElement(new H3("Sub Header:"))
              .addElement(new Font().setSize("+1")
                         .setColor(HtmlColor.WHITE)
                         .setFace("Times")
                         .addElement("The big dog & the little cat chased each other.")));
out.println(html.toString()); 
// or write to the outputstream directly
output(out);

HTML is derived from SGML (Standard Generalized Markup Language), the mother of all markup languages. With the advent of XML on the Web front, plain HTML is passé. It has many irregularities that make it difficult to parse. To make it easier for parsers to process, HTML has been XMLized (that is, reformulated in XML). The XML form of HTML is called XHTML, and it is endorsed and promoted by W3C.

See the XHTML specification at http://www.w3.org/TR/xhtml2/ .
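To illustrate why the XML form is easier on parsers, the following sketch feeds two fragments to the XML parser that ships with the JDK. The class name WellFormedCheck and the sample strings are made up for this illustration, and the check covers XML well-formedness only, not validity against the XHTML DTD (the DOCTYPE and namespace declarations are left out for brevity).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only: WellFormedCheck is a hypothetical name.
class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // HTML style: <br> and <p> are never closed.
        String html = "<html><body><p>one<br>two</body></html>";
        // XHTML style: every element is closed.
        String xhtml = "<html><body><p>one<br />two</p></body></html>";
        System.out.println(isWellFormed(html));   // prints false
        System.out.println(isWellFormed(xhtml));  // prints true
    }
}
```

The HTML-style fragment is acceptable to a browser but is rejected by the XML parser, while the XHTML-style fragment can be processed by any standard XML tool.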