Lightweight markup languages are inadequate for technical writing

The family of lightweight markup languages includes Markdown, AsciiDoc, reStructuredText, and more. These are all simple textual formats that can be transformed into HTML, thereby avoiding the need to write verbose HTML in the first place. All these lightweight markup languages have their own unique issues, but they share the same fundamental problem of having a lack of expressivity.

Markdown is the ubiquitous writing format of the 21st century. Many dedicated writing apps now default to, or support only Markdown. Even software that isn’t aimed at writing itself, such as Slack, have adopted a variation of it.

As a shorthand for HTML, these markup languages achieve their goals splendidly. They are far more compact than HTML, and easier to both write and read. Their syntax stays out of the way when writing, so that one can focus on what to write rather than how to mark it up.

While Markdown has shortcomings (which I’m describing in this note), there still is nothing better than Markdown out there for any kind of writing. Even this web site is primarily written in Markdown, purely because it is so convenient.

That said, I regularly find myself using Markdown’s HTML fallback to mark up things outside of Markdown’s formatting toolkit. As the creator of Nanoc (a static-site generator) and the markup language D★Mark, it might not come as a surprise that I’m always thinking about ways to improve the writing experience.

To elaborate: What am I trying to achieve, exactly? Good typography is probably a good chunk of the answer here. Document/content management is also related (I believe that a piece of writing rarely lives in isolation).

Alternatives (non-lightweight markup languages)

Here’s a list of all the markup languages I can think of:

Shortcomings

To do: This section talks about Markdown specifically, but most of these are applicable to other lightweight markup languages too.

To do: Elaborate on this: Markdown is still the best tool for getting thoughts into writing. It is easy to use, well-supported by tons of software, and does 95% of what is needed. Still, it helps to think about ways in which it is flawed so that we can improve on it and build something better.

To do: Take into account the ARIA enhancements, especially DPUB-ARIA.

Markdown achieves what it needs to achieve. It is not intended as a tool for technical writing. It is a shorthand for HTML. To do: While HTML itself is also rather inadequate for technical writing, it at least provides extensibility, e.g. with classes and appropriate amounts of CSS.

To elaborate: Markdown has a very useful HTML fallback, though Markdown inside HTML is not supported.

In the last few years, the CommonMark project has emerged as a de-facto standard for Markdown. It has helped in improving consistency across Markdown implementations. Having a standard is beneficial, but it does not solve the shortcomings listed in this section: the issues are shortcomings of the language itself, not of the lack of a standard.

Interestingly, Markdown’s original author, John Gruber, is not involved in CommonMark. He’s against it:

There’s no need, and in fact, that was would be a terrible idea. One True Markdown Spec is a bad idea. Markdown, fundamentally, is an idea, not a spec. The fact that there are numerous popular variants is a feature not a bug. Different variants to suit different contexts.

 — John Gruber, December 2021

Also, Markdown (the original) has not been updated since 2004.

Emphasis and italics

Markdown provides markup for emphasis (*bleep* becomes <em>bleep</em>) and strong emphasis (**bloop** becomes <strong>bloop</strong>). It does not provide markup for italic (<i>) nor bold (<b>). While this is a good design choice, Markdown doesn’t provide markup options where italic would otherwise be used.

When to use italic depends on the style guide you follow. Commonly, italic is used when introducing a new term, or when referencing a word in a foreign language, among other uses. For example, Brian David Gilbert describes zjierbness as that feeling when you bite into a pickle and it’s a little squishier than you expected.

Don’t use <em> when no emphasis is intended. This matters to search engines, any natural-language processing tooling (e.g. text summarization), and especially screen readers. A screen reader might prefer to pause slightly before and after introducing a new term, and not give it undue emphasis.

In Markdown, this requires a fallback to HTML.

URLs and paths

Markdown does not provide a way to mark up URLS or file paths. A common alternative approach is to mark up URLs or file paths as code, using backticks: /home/denis/archive/2021/notes/markdown.md.

While it can be aesthetically pleasing to use the formatting for the code element, using code for URLs is problematic because it either hyphenates incorrectly (if hyphenation is enabled), or does not hyphenate at all, creating an awkward space on the previous line, or making the text overflow, running past the right edge of its container.

Marking up URLs and paths explicitly is a good idea, because URLs and paths require special word-breaking rules. Markdown does not provide appropriate markup for this use case, but a fallback to HTML is possible.

Cross-references

To do

To do: Describe that this is part of a larger topic of documents being part of a whole. A Markdown file can be standalone, but often it is part of a larger entity, such as a web site or a book.

To do: Describe that cross-references look different in print (e.g. physical page numbers). Though, even online, cross-references can look different, depending on whether a reference refers to the currently-open page, or a different one. It’ll also be interesting to see this evolves as the web moves away from the concept of a “page” (or does it?).

Admonitions

To do

Citations and bibliographies

To do

Footnotes, endnotes, sidenotes

To do

Definition lists

To do

Glossaries

To do

Figures

To do

Callouts

To do

Indexing

To do

Composability

To do: You can’t embed a HTML document inside a HTML document directly (because you’d end up duplicating <html> + <head> + body). You can’t easily do that with Markdown either, because a Markdown document always translates to at least one top-level element such as p.

See also