Avoid naughty words in hex digest strings by changing the set of characters

Hexadecimal strings typically contain 0–9 and a–f, which gives plenty of opportunity for (naughty) words to form (Batchelder, 2007), e.g. “deadbabe”.

If we also accept “9” as a substitute for “g,” then even more (naughty) words can be built, e.g. “defacedfa9”. Your imagination is the limit.

Remove vowels

To avoid accidentally generating words, we can remove vowels (A E I O U, and arguably Y) from the list of possible characters, and additionally remove the digits that can stand in for vowels (0 1 3 4, which resemble O, I, E, and A).

The set of characters we use for representing hexadecimal strings therefore changes: instead of 0123456789abcdef, we now have 256789bcdfghjklm. Each hex digit maps to the character at the same position in the new set.

Performing this replacement is trivial with tr on the command line:

% echo 1337cafebabe | tr 0123456789abcdef 256789bcdfghjklm
577cjgmlhghl

Ruby’s String#tr does the same:

'1337cafebabe'.tr('0123456789abcdef', '256789bcdfghjklm')
# => "577cjgmlhghl"
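
The replacement set does not have to be typed out by hand. As a minimal sketch, assuming the candidates are taken in the order 0–9 then a–z and that we keep the first 16 survivors, the set can be derived and then applied to a real digest. The names HEX, VOWEL_LIKE, NO_VOWELS, and wordless_digest are made up for this illustration, and SHA-256 is only an example digest:

```ruby
require 'digest'

HEX = '0123456789abcdef'

# Characters to drop: vowels (including y) and the digits that can stand in for vowels.
VOWEL_LIKE = 'aeiouy0134'

# Candidates in the assumed order: digits first, then lowercase letters.
CANDIDATES = ('0'..'9').to_a + ('a'..'z').to_a

# Keep the first 16 survivors, one replacement per hex digit.
NO_VOWELS = CANDIDATES.reject { |c| VOWEL_LIKE.include?(c) }.first(16).join

# Hypothetical helper: a SHA-256 hex digest rewritten with the vowel-free set.
def wordless_digest(input)
  Digest::SHA256.hexdigest(input).tr(HEX, NO_VOWELS)
end

puts NO_VOWELS                         # => 256789bcdfghjklm
puts '1337cafebabe'.tr(HEX, NO_VOWELS) # => 577cjgmlhghl
puts wordless_digest('hello')          # 64 characters, none of them a vowel
```

The result is still base 16 and has the same length as the original digest. Only the characters differ.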

Remove ambiguous characters

We can go further and remove ambiguous characters as well. While not necessary, it is helpful when writing down hexadecimal strings by hand (especially when using non-conventional characters, as we do here).

Ambiguous characters include B D G J Q S and 0 1 2 5 6 8 (0 and 1 were already removed because they can substitute for a vowel). Removing these as well changes the set of characters once more: instead of 256789bcdfghjklm (which replaced 0123456789abcdef), we now have 79cfhklmnprtvwxz.

We can use it the same way. Command line:

% echo 1337cafebabe | tr 0123456789abcdef 79cfhklmnprtvwxz
9ffmvrzxtrtx

Ruby:

'1337cafebabe'.tr('0123456789abcdef', '79cfhklmnprtvwxz')
# => "9ffmvrzxtrtx"
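
Because every hex digit maps to exactly one replacement character, the translation can be undone by swapping the two arguments to String#tr. A small sketch of the round trip (the variable names are only for illustration):

```ruby
hex      = '0123456789abcdef'
alphabet = '79cfhklmnprtvwxz'

# Encode: hex digits to the replacement set.
encoded = '1337cafebabe'.tr(hex, alphabet)

# Decode: swap the arguments to get the original hex string back.
decoded = encoded.tr(alphabet, hex)

puts encoded # => 9ffmvrzxtrtx
puts decoded # => 1337cafebabe
```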

Note how 79cfhklmnprtvwxz is exactly 16 characters and ends with z. A wonderful coincidence.
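
The coincidence can be checked with a few lines of Ruby. This sketch assumes the candidates are 0–9 and a–z and that z is kept, since it appears in the final set. The variable names are made up for the example:

```ruby
candidates = ('0'..'9').to_a + ('a'..'z').to_a

vowel_like = 'aeiouy0134' # vowels and their digit stand-ins
ambiguous  = 'bdgjqs2568' # easily confused characters (assumption: z is kept)

# Drop everything unwanted. No truncation to 16 is needed here.
remaining = candidates.reject { |c| (vowel_like + ambiguous).include?(c) }

puts remaining.join # => 79cfhklmnprtvwxz
puts remaining.size # => 16
```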

See also

Good human-usable tokens are not random strings

References

Batchelder, Ned. “Hex Words,” March 20, 2007. https://nedbatchelder.com/text/hexwords.html.