Based on material by Carlos Scheidegger and Kevin Sun
In this module, we will go over the basics of how the content of a web page is represented in HTML, as we will use HTML and other web technologies to create data visualizations. HTML stands for “HyperText Markup Language”. 25 years ago, that used to be a meaningful description of what HTML actually did: it has links (hypertext), and it is a markup language. But we will be using many things from the HTML5 standard, which does much, much more: graphics, audio, video, etc.
Importantly, we will not be building websites, but web applications, i.e., complex programs that only 15 years ago were not possible in the browser. So it is easier to think of HTML as “whatever it is that web browsers know how to interpret”, and just not think about the actual term.
The important thing about HTML is that the markup is represented by elements. An HTML element is a portion of the content that is surrounded by a pair of tags of the same name. Like this:
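A minimal sketch of such an element, using the strong tag and the text discussed next:

```html
<strong>This is an HTML element</strong>
```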
These elements sometimes have meaningful names. In this element, strong is the name of the tag; the opening tag is <strong>, and the matching closing tag is </strong>. The way you should interpret this is that the text “This is an HTML element” should be “strong”, i.e., typically this will be bold text.
A comprehensive and well-structured list of all elements can be found at MDN.
Closing and Self-closing elements
Most elements apply to the content between their tags; for the strong example, that is the text between the opening <strong> and closing </strong> tags.
Some elements don’t have internal content between their opening and closing tags, such as <br> (breaking a line) or <img> (including an image). The former doesn’t even need attributes; the latter is typically fully specified by attributes.
Note that HTML5, unlike XHTML, is not an XML-based language, so elements without content, such as <br> or <img>, can be written without a closing tag. What you’ll also see in slightly older code is the following shorthand notation: <foo />, which in XML-based markup is equivalent to <foo></foo>.
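A short sketch of elements without content. The file name chart.png and the alt text are placeholders chosen for illustration:

```html
<!-- <br> needs no attributes and no closing tag -->
<p>First line<br>Second line</p>

<!-- <img> is fully specified by its attributes -->
<img src="chart.png" alt="A bar chart" width="300">

<!-- The older shorthand notation; browsers accept it for these elements -->
<img src="chart.png" alt="A bar chart" width="300" />
```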
HTML elements can and commonly do nest:
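As an illustration (the sentence itself is made up), here is an em element nested inside a strong element, which is in turn nested inside a paragraph:

```html
<p>
  This paragraph contains
  <strong>strong text with <em>emphasized text</em> inside it</strong>.
</p>
```

Inner elements must be closed before the elements that contain them: </em> comes before </strong>, which comes before </p>.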
In addition to the names, opening tags can contain extra information about the element. These are called attributes:
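A minimal sketch of an element with an attribute. The link target and link text are illustrative choices, not taken from the original example:

```html
<!-- href is an attribute of the opening tag; its value is quoted -->
<a href="https://developer.mozilla.org/en-US/docs/Web/HTML">HTML documentation at MDN</a>

<!-- Single quotes work as well, as long as they match -->
<a href='https://developer.mozilla.org/en-US/docs/Web/HTML'>HTML documentation at MDN</a>
```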
In this case, we’re using the a element (which originally stood for “anchor”, but is now almost universally used as a “link”). The attribute href means “hypertext reference”, which actually makes sense for a change. The meaning given to each attribute changes from element to element. Note that you can use either single ' or double " quotes, but you have to be consistent (you can’t use one to open and another one to close).
We will use element attributes in pretty much every example from now on. The most important ones are
Metadata and Basics
HTML documents contain metadata, which is specified using tags that don’t have visual equivalents on the website:
<html> creates the entire HTML container.
<head> creates the header (generally where the title and links to style sheets/scripts are found).
<meta> is used to provide meta-information, such as character encoding or other instructions to a browser.
<script> links to or embeds a script (we will do that a lot).
<style> embeds a style in the website.
<link> references an external document, often a CSS document like this: <link rel="stylesheet" type="text/css" href="theme.css">. The rel attribute defines the relationship to the active document; the type attribute tells the browser which type of file to expect.
<body> marks the container of the content of the website.
An HTML5 document has a little bit of necessary boilerplate that you should just copy and paste every time you need to get started. Every HTML5 document you create in class should have this skeleton:
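A minimal HTML5 skeleton might look like the following sketch. The title text and the lang value are placeholders, and the exact boilerplate used in class may differ:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Character encoding, given as meta-information -->
  <meta charset="utf-8">
  <title>Page title goes here</title>
</head>
<body>
  <!-- Visible content goes here -->
</body>
</html>
```

The first line, <!DOCTYPE html>, tells the browser to treat the document as HTML5.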
Here is a bit more content and a demonstration of some extra elements:
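The following sketch is one possible version of such a page. The choice of elements (headings, paragraphs, emphasis, a line break, a comment) and all text are assumptions made for illustration:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>A bit more content</title>
</head>
<body>
  <!-- Comments like this one are not rendered -->
  <h1>A top-level heading</h1>
  <p>A paragraph with <em>emphasized</em> and <strong>strong</strong> text.</p>

  <h2>A second-level heading</h2>
  <p>Another paragraph,<br>broken across two lines.</p>
</body>
</html>
```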
Lists and Tables
Lists and tables are useful for structured information:
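A brief sketch of an unordered list, an ordered list, and a table. The content is made up for illustration, and the fragment is meant to be placed inside the <body> of a page:

```html
<!-- Unordered (bulleted) list -->
<ul>
  <li>Apples</li>
  <li>Oranges</li>
</ul>

<!-- Ordered (numbered) list -->
<ol>
  <li>First step</li>
  <li>Second step</li>
</ol>

<!-- Table: tr is a row, th a header cell, td a data cell -->
<table>
  <thead>
    <tr><th>Country</th><th>Capital</th></tr>
  </thead>
  <tbody>
    <tr><td>France</td><td>Paris</td></tr>
    <tr><td>Japan</td><td>Tokyo</td></tr>
  </tbody>
</table>
```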
Structuring / Layout
It’s very common to have multiple “panels” on a web page, e.g., for the header of a page or for navigation on the side. We can achieve this with
There are also other, semantically meaningful elements, such as
These elements don’t have styling; that is defined later with CSS. A nuance is that all of the elements mentioned here are “block-level” elements (i.e., they break the line if not otherwise specified in CSS), except for <span>, which is an inline element that doesn’t break the line. A <span> is hence useful for styling a few words within flowing text.
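A sketch of how such panels could be marked up, assuming <div> as the generic block-level container and <header>, <nav>, <main>, and <footer> as examples of semantically meaningful elements. The ids and text are hypothetical, and the fragment is meant to be placed inside <body>:

```html
<header>
  <h1>Page header</h1>
</header>

<nav>
  <a href="#overview">Overview</a>
</nav>

<main>
  <!-- A div is a generic block-level panel -->
  <div id="overview">
    <p>
      This paragraph is a block, but
      <span>this span stays inline</span>
      within the flowing text.
    </p>
  </div>
</main>

<footer>
  <p>Page footer</p>
</footer>
```

Without CSS, each block-level element simply starts on a new line; the span produces no visible change until it is styled.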
Forms provide opportunities for input by users. This includes things like buttons, text-boxes, and sliders. Here are a few examples:
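A minimal sketch of the three kinds of input mentioned above. The ids, names, labels, and value ranges are placeholders, and the fragment is meant to be placed inside <body>:

```html
<form>
  <!-- Text box -->
  <label for="username">Name:</label>
  <input type="text" id="username" name="username">

  <!-- Slider -->
  <label for="size">Size:</label>
  <input type="range" id="size" name="size" min="0" max="100" value="50">

  <!-- Button; type="button" means clicking it does not submit the form -->
  <button type="button">Click me</button>
</form>
```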
The Document Object Model (DOM)
As we have seen above, a markup document looks a lot like a tree: it has a root, the <html> element, and elements can have children that are themselves elements.
While HTML is a textual representation of a markup document, the DOM is a programming interface for it. DOM stands for “Document Object Model”, and in this class we will use “DOM” to mean the tree created by the web browsers to represent the document, and the API that they provide in order to access it.
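To make the API side concrete, here is a small sketch that looks up an element and walks its neighbors in the tree. The id fruits and the list content are made up for illustration; the output appears in the console of the Developer Tools:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>DOM sketch</title>
</head>
<body>
  <ul id="fruits">
    <li>Apple</li>
    <li>Banana</li>
  </ul>
  <script>
    // Look up a node in the tree by its id attribute.
    const list = document.getElementById("fruits");

    // Navigate the tree: the parent of the list is the body element.
    console.log(list.parentNode.tagName);       // BODY

    // The list's element children are its two li elements.
    console.log(list.children.length);          // 2
    console.log(list.children[0].textContent);  // Apple
  </script>
</body>
</html>
```

The script is placed after the list so that the list already exists in the tree when the script runs.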
Inspecting the DOM of a website.
Perhaps the most important habit you will learn in these first web lessons is the following: when in doubt, go to the Developer Tools. In this case, we’ll look at the element tree, which in Chrome you can open from the menu bar: View → Developer → Developer Tools. Alternatively, you can right-click on any part of the webpage and choose “Inspect Element”. Notice that there can be a big difference between what is in the DOM and what is in the source. In fact, much of this class is about dynamically generating DOM elements. Here is a good overview of the developer tools.
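The difference between source and DOM can be seen in a small sketch like this one, where the source contains an empty list and a script fills it in. The id items and the labels are hypothetical:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Source vs. DOM</title>
</head>
<body>
  <!-- In the source, this list has no items -->
  <ul id="items"></ul>
  <script>
    const list = document.getElementById("items");

    // Create three li elements and attach them to the list.
    for (const label of ["one", "two", "three"]) {
      const item = document.createElement("li");
      item.textContent = label;
      list.appendChild(item);
    }
  </script>
</body>
</html>
```

Viewing the page source shows the empty <ul>, while inspecting the same list in the Developer Tools shows the three <li> children that the script added.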