Lightweight markup languages are inadequate for technical writing
The family of lightweight markup languages includes Markdown, AsciiDoc, reStructuredText, and more. These are all simple textual formats that can be transformed into HTML, thereby avoiding the need to write verbose HTML in the first place. All these lightweight markup languages have their own unique issues, but they share the same fundamental problem of having a lack of expressivity.
Markdown is the ubiquitous writing format of the 21st century. Many dedicated writing apps now default to, or support only Markdown. Even software that isn’t aimed at writing itself, such as Slack, have adopted a variation of it.
As a shorthand for HTML, these markup languages achieve their goals splendidly. They are far more compact than HTML, and easier to both write and read. Their syntax stays out of the way when writing, so that one can focus on what to write rather than how to mark it up.
While Markdown has shortcomings (which I’m describing in this note), there still is nothing better than Markdown out there for any kind of writing. Even this web site is primarily written in Markdown, purely because it is so convenient.
That said, I regularly find myself using Markdown’s HTML fallback to mark up things outside of Markdown’s formatting toolkit. As the creator of Nanoc (a static-site generator) and the markup language D★Mark, it might not come as a surprise that I’m always thinking about ways to improve the writing experience.
To elaborate: What am I trying to achieve, exactly? Good typography is probably a good chunk of the answer here. Document/content management is also related (I believe that a piece of writing rarely lives in isolation).
Alternatives (non-lightweight markup languages)
Here’s a list of all the markup languages I can think of:
-
HTML: To do: Write about plain old HTML.
-
DITA and DocBook: On the XML side of things, there is DITA (Darwin Information Typing Architecture) and DocBook. DITA, despite being a standard and having been around since 2005, has no web site, and no official resources to help anyone ease into it. DocBook has been around since the early 90s, and is primarily used in conjunction with DocBook XSL, a set of XSLT stylesheets that is so monstrous that I have not found any custom styling for it, and the default HTML output looks like it comes straight from the 90s.
-
LaTeX: LaTeX is a document preparation system that is great for creating beautiful-looking documents. It is primarily used for generating PDF documents (for printing) and is not usable for the web.
-
roff and friends: The roff family of tools (nroff, troff, groff, etc) use an arcane syntax that is nearly unreadable to the untrained eye.
-
D★Mark: To do: Write about D★Mark. What about the idea of reusing some D★Mark-like markup inside Markdown — how problematic would this be?
-
Pollen: Pollen is an extensive publishing system that comes with its own syntax. It treats content as a programming language rather than a text (markup) language, which gives it a great deal of power. (In terms of syntax, Pollen’s markup and D★Mark are rather alike — coincidentally, as I was not aware of Pollen when I created D★Mark.)
Shortcomings
To do: This section talks about Markdown specifically, but most of these are applicable to other lightweight markup languages too.
To do: Elaborate on this: Markdown is still the best tool for getting thoughts into writing. It is easy to use, well-supported by tons of software, and does 95% of what is needed. Still, it helps to think about ways in which it is flawed so that we can improve on it and build something better.
To do: Take into account the ARIA enhancements, especially DPUB-ARIA.
Markdown achieves what it needs to achieve. It is not intended as a tool for technical writing. It is a shorthand for HTML. To do: While HTML itself is also rather inadequate for technical writing, it at least provides extensibility, e.g. with classes and appropriate amounts of CSS.
To elaborate: Markdown has a very useful HTML fallback, though Markdown inside HTML is not supported.
In the last few years, the CommonMark project has emerged as a de-facto standard for Markdown. It has helped in improving consistency across Markdown implementations. Having a standard is beneficial, but it does not solve the shortcomings listed in this section: the issues are shortcomings of the language itself, not of the lack of a standard.
Interestingly, Markdown’s original author, John Gruber, is not involved in CommonMark. He’s against it:
There’s no need, and in fact, that was would be a terrible idea. One True Markdown Spec is a bad idea. Markdown, fundamentally, is an idea, not a spec. The fact that there are numerous popular variants is a feature not a bug. Different variants to suit different contexts.
Also, Markdown (the original) has not been updated since 2004.
Emphasis and italics
Markdown provides markup for emphasis (*bleep*
becomes <em>bleep</em>
) and strong emphasis (**bloop**
becomes <strong>bloop</strong>
). It does not provide markup for italic (<i>
) nor bold (<b>
). While this is a good design choice, Markdown doesn’t provide markup options where italic would otherwise be used.
When to use italic depends on the style guide you follow. Commonly, italic is used when introducing a new term, or when referencing a word in a foreign language, among other uses. For example, Brian David Gilbert describes zjierbness as that feeling when you bite into a pickle and it’s a little squishier than you expected
.
Don’t use <em>
when no emphasis is intended. This matters to search engines, any natural-language processing tooling (e.g. text summarization), and especially screen readers. A screen reader might prefer to pause slightly before and after introducing a new term, and not give it undue emphasis.
In Markdown, this requires a fallback to HTML.
URLs and paths
Markdown does not provide a way to mark up URLS or file paths. A common alternative approach is to mark up URLs or file paths as code, using backticks: /home/denis/archive/2021/notes/markdown.md
.
While it can be aesthetically pleasing to use the formatting for the code
element, using code
for URLs is problematic because it either hyphenates incorrectly (if hyphenation is enabled), or does not hyphenate at all, creating an awkward space on the previous line, or making the text overflow, running past the right edge of its container.
Marking up URLs and paths explicitly is a good idea, because URLs and paths require special word-breaking rules. Markdown does not provide appropriate markup for this use case, but a fallback to HTML is possible.
Cross-references
To do
To do: Describe that this is part of a larger topic of documents being part of a whole. A Markdown file can be standalone, but often it is part of a larger entity, such as a web site or a book.
To do: Describe that cross-references look different in print (e.g. physical page numbers). Though, even online, cross-references can look different, depending on whether a reference refers to the currently-open page, or a different one. It’ll also be interesting to see this evolves as the web moves away from the concept of a “page” (or does it?).
Admonitions
To do
Citations and bibliographies
To do
Footnotes, endnotes, sidenotes
To do
Definition lists
To do
Glossaries
To do
Figures
To do
Callouts
To do
Indexing
To do
Composability
To do: You can’t embed a HTML document inside a HTML document directly (because you’d end up duplicating <html>
+ <head>
+ body
). You can’t easily do that with Markdown either, because a Markdown document always translates to at least one top-level element such as p
.
See also
- List of external textual DSLs — though this primarily deals with non-prose structured data.