BCIT

British Columbia Institute of Technology

It all starts here in one form or another: HTML

HTML started as a language defined using SGML (Standard Generalized Markup Language) and is now used to mark up billions of pages, making up the bulk of the Web.
From a common origin, three distinct branches have emerged:
1. HTML 1 - 4.01: Developed in rapid succession in the 1990s
2. XHTML 1, 1.1, 2: A stricter version, created as an answer to the loose development of HTML
3. HTML5: A reaction to the draconian ruleset proposed by the W3C for the future of markup languages beyond XHTML 1.1

In late 1991, Tim Berners-Lee released the first iteration of what would become HTML 2.0 (there was never a formal 1.0 specification)
HTML was created at the dawn of the Web, using SGML as a template
Over the next decade, various additions, improvements and changes were introduced
The last official specification was the service release of HTML 4.01 in December 1999
Initially a place for documents and the technically inclined, the Web did not have the international pervasiveness it does today
Most developers were new to the language and there were no real classes in it
People had to learn as they went along
As a result, a lot of early sites were poorly designed

Despite its reputation as a loose language, HTML 4.01 can be written strictly, just as XHTML and HTML5 can
The problem was not entirely with the language or the authors. It was with the browsers.
Browser manufacturers created User Agents (UAs, i.e. browsers) that would read code, interpret it, silently fix any mistakes and render a page without any visible errors.
This led coders to exploit browser quirks and create pages that broke when new specs or updated User Agents came out
Okay, so now there is a problem. Millions of pages are being produced with sloppy code because the UAs let it happen
The web is becoming filled with Tag Soup that makes it hard to index, with search tools having to trust things like META keywords and descriptions
A better solution is required: Get Strict!
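
The forgiving behaviour described above can be sketched in a few lines. This is only an illustration, assuming Python's standard html.parser module as a stand-in for a lenient User Agent (it tokenizes but does not repair or render anything). The TAG_SOUP string and the TagCollector and collect_tags names are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start and end tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


# Hypothetical tag soup: uppercase names, an unquoted attribute value,
# two <P> elements that are never closed, and mis-nested <B>/<I>.
TAG_SOUP = "<P align=center>Hello <B><I>world</B></I><P>Second paragraph"


def collect_tags(markup):
    """Feed markup to the lenient parser and return (opened, closed) tags."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.opened, parser.closed


opened, closed = collect_tags(TAG_SOUP)
print("opened:", opened)  # four start tags are reported
print("closed:", closed)  # only two end tags, and no error is raised
```

The parser reports four opening tags and only two closing tags, yet never complains. That silence is what let sloppy markup spread.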

After years of HTML development, 4.01 became the "Last Official Release"
The W3C's HTML Working Group had taken a look at XML (Extensible Markup Language) and liked what they saw: a requirement for valid markup in order for an XML application to work
Why not take the ruleset from XML (lowercase element and attribute names, every attribute value in quotes, every tag closed) and apply it to HTML? Yay!
That should fix everything, right?
Coding wasn't actually fixed by the introduction of XHTML. Fixes were suggested by the specification, but most browsers would render the page anyway.
This is because almost all XHTML pages report their MIME type as text/html, so browsers treat them as ordinary HTML
To be processed as real XHTML, as the DOCTYPE claims, pages would have to be delivered with the MIME type application/xhtml+xml
Using that MIME type, however, brought with it a rather dire proviso:
The error handling that XHTML inherits from XML means that any well-formedness error will cause the browser to cease rendering
Yes, you read that correctly. That means any error will throw an error message to the user and the page will not be shown
Try to imagine the Web actually working in this way
Why was this done? Because the developers of XHTML wanted to force strict coding rules so that the Web could be seen as a giant application. Reliable code on every page meant you could use the data anywhere
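
The difference between the two MIME types can be sketched as follows. This is a rough model rather than how any real browser is implemented: it assumes Python's xml.etree.ElementTree stands in for a strict XML processor and html.parser for a forgiving HTML one. The process function, its return labels and the two sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical samples: one follows the XML rules, one is tag soup.
WELL_FORMED = '<p class="intro">Hello <b><i>world</i></b></p>'
TAG_SOUP = "<P class=intro>Hello <B><I>world</B></I>"


def process(markup, mime_type):
    """Return 'rendered' or 'fatal error' depending on the MIME type.

    application/xhtml+xml -> strict XML parsing, any error is fatal.
    anything else         -> lenient HTML parsing, errors are ignored.
    """
    if mime_type == "application/xhtml+xml":
        try:
            ET.fromstring(markup)  # raises ParseError if not well-formed
        except ET.ParseError:
            return "fatal error"
        return "rendered"
    # Treated as text/html: the lenient parser accepts whatever it is given.
    parser = HTMLParser()
    parser.feed(markup)
    parser.close()
    return "rendered"


for mime in ("text/html", "application/xhtml+xml"):
    for name, sample in (("well-formed", WELL_FORMED), ("tag soup", TAG_SOUP)):
        print(mime, "|", name, "|", process(sample, mime))
```

Only the combination of tag soup and application/xhtml+xml fails, which is why serving XHTML as text/html hid every mistake.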