When I look at the state of HTML on the internet, it seems likely that many web developers share my experience. That’s why I think relearning this fundamental technology should be on most development professionals’ to-do lists.
How Bad Is the Problem?
Many of the world’s top sites do not author their HTML properly. Even internet juggernauts like Facebook don’t properly use semantic HTML tags and have unintelligible document outlines. There are more than 110 tags in the HTML specification, yet developers frequently use only about 25 of them across the web. Let’s take a look at where this trend of poor HTML authorship comes from.
What Is HTML?
Before we dive into the topic, it’s important to understand the background terms that we’ll be using, especially semantics and user agents.
The purpose of HTML is to provide a layer of semantics (in other words, “meaning”) on top of your document content (text, images, videos, etc.). An example of this is the <time> tag. When you mark up a section of content with this tag, you are telling the user agent something about the meaning of that content. The markup <time>Sat, July 10th</time> is clearly a time or date, while the bare text “Sat, July 10th” is not as clear. These tags may seem unimportant to us because, as people, we understand the semantics of a written date implicitly, but they are essential to helping non-human users understand the information on your webpage.
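As a minimal sketch of the markup described above, the snippet below wraps the same date in a time element. The datetime attribute supplies a machine-readable value alongside the human-friendly text. The year 2021 and the surrounding sentence are assumptions made purely for illustration, since the text above gives only the day and month.

```html
<!-- The visible text stays human-friendly; datetime gives user agents an unambiguous value. -->
<!-- Assumption: the year (2021) and the sentence are illustrative, not taken from the article. -->
<p>The workshop takes place on <time datetime="2021-07-10">Sat, July 10th</time>.</p>
```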
In web development, your immediate audience is not the person using your website, but the software that reads your page content and displays it to that person. That software is referred to in web development as the “user agent”: in other words, the technology acting on behalf of the user. The user agent is often a browser, but it could also be a web crawler, a screen reader, or another type of website parser. User agents are the systems that web developers need to write HTML for, in addition to human users.
What Is Good HTML?
The primary indicator of well-written HTML is a logical information hierarchy. Each page should use the heading elements in order, from the largest to the smallest and without skipping levels, to outline the information on the page.
For example, each level of heading element should define a subtopic of the level above it, as in the example below. This is fundamental to allowing a user agent to understand what the information on your web page is about.
[h1]The Lord of the Rings
├── [h2]Contents
└── [h2]Plot
    └── [h3]The Fellowship of the Ring
        ├── [h4]Prologue
        ├── [h4]Book I: The Ring Sets Out
        └── [h4]Book II: The Ring Goes South
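The outline above could be expressed in markup roughly as follows. This is only a sketch: the body content that would normally sit under each heading is omitted, and the nesting is conveyed by heading level alone.

```html
<!-- Heading levels mirror the outline: each level is a subtopic of the one above it. -->
<h1>The Lord of the Rings</h1>

<h2>Contents</h2>

<h2>Plot</h2>
<h3>The Fellowship of the Ring</h3>
<h4>Prologue</h4>
<h4>Book I: The Ring Sets Out</h4>
<h4>Book II: The Ring Goes South</h4>
```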
Secondly, effective HTML correctly uses the available HTML tags. There are more than 110 tags in the HTML specification, but most developers use only about 25 of them. The more specific you can be with your tagging of content, the better your webpage will be understood by a user agent.
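To illustrate what more specific tagging can look like, here is a hedged before-and-after sketch of a quotation. The class names in the first version and the quotation itself are illustrative choices, not examples from any particular site.

```html
<!-- Generic: the class names hint at meaning to people, but not to a user agent. -->
<div class="quote">
  <div class="quote-text">Not all those who wander are lost.</div>
  <div class="quote-source">The Fellowship of the Ring</div>
</div>

<!-- Specific: blockquote, figure, figcaption, and cite carry the meaning themselves. -->
<figure>
  <blockquote>
    <p>Not all those who wander are lost.</p>
  </blockquote>
  <figcaption><cite>The Fellowship of the Ring</cite></figcaption>
</figure>
```

Both versions can be styled to look identical, but only the second tells a crawler or screen reader that the content is a quotation with a cited source.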
Why Do Developers Write Bad HTML?
The first and most significant reason that developers write poor HTML is the HTML specification itself. The specification is long, esoteric, and difficult for the layperson to read. On top of that, the specification is inconsistent with how browsers actually behave, even when it comes to extremely important topics like the document outline, which is the algorithm the spec indicates that browsers should use to interpret information hierarchy. The specification insists that browsers should take document sections into account when calculating heading relevance. For seven years, though, no browser has actually done so. This gap between the spec and the real world has led to a lot of confusion in the development community and a lack of understanding of the specification.
The second reason is the advent of many front-end frameworks that reduce developers’ need to interact with HTML directly. Many developers work with front-end frameworks that are very opinionated in their syntax, like React.js or Vue.js. While these tools make feature development very fast, the lack of contact with the underlying technology leads to developers not understanding the fundamentals of writing good markup. One example is a basic browser form collecting user input. The React.js framework layers its own abstractions over the HTML <form> element, giving developers simpler ways to make forms on their websites. Because of this, many developers might skip learning the APIs of the <form> element, or even skip using this fundamental HTML element entirely.
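For comparison, here is a minimal sketch of what the native form element offers without any framework. The "/subscribe" endpoint and the field names are hypothetical and chosen only for illustration.

```html
<!-- Assumption: "/subscribe" is a hypothetical endpoint used only for illustration. -->
<form action="/subscribe" method="post">
  <!-- The for/id pairing ties the label to its input for assistive technology. -->
  <label for="email">Email address</label>
  <!-- type="email" and required let the browser validate input before submitting. -->
  <input type="email" id="email" name="email" required>
  <button type="submit">Subscribe</button>
</form>
```

With nothing but markup, the browser handles submission, keyboard interaction, and basic validation, which is the behavior a developer may never encounter when a framework manages the form instead.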
The final factor contributing to poor HTML authorship is the volume and depth of the HTML tags defined in HTML5. There was a major paradigm shift in 2014 when the HTML5 specification was released. The specification, among other things, introduced semantic HTML tags, a framework for defining the content arrangement on your page with tags like <header>, <main>, <nav>, and <footer>. These top-level tags are the foundation of semantic HTML, but they are only a few of the more than one hundred tags in the specification, each with a different semantic meaning. The volume of these new tags has led to slow adoption by the development community, starting with the browsers that support the spec and continuing with developers.
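A page skeleton built on these top-level tags might look like the sketch below. The titles, link targets, and placeholder text are assumptions for illustration; only the arrangement of header, nav, main, and footer is the point.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
  </head>
  <body>
    <!-- Introductory content for the page, including site navigation. -->
    <header>
      <h1>Site title</h1>
      <nav>
        <a href="/">Home</a>
        <a href="/articles">Articles</a>
      </nav>
    </header>
    <!-- The dominant content of the page; there should be only one visible main. -->
    <main>
      <h2>Article heading</h2>
      <p>Primary page content goes here.</p>
    </main>
    <!-- Closing information such as copyright and contact details. -->
    <footer>
      <p>Copyright and contact details.</p>
    </footer>
  </body>
</html>
```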
Because of a bulky specification and a shift of focus away from semantic HTML in many modern frameworks, many developers may still need to “relearn” some fundamental concepts of writing HTML.
Improving your skill as an HTML author will greatly benefit you in all areas of web development. The HTML document is the foundation of every web page, so improving its consistency and clarity will not only result in a highly accessible webpage, but will also reduce maintenance overhead, reduce your overall code size, and keep your code more browser-compliant.