Module systems in programming languages

Defreyne, Denis

Modern programming languages have module systems and package managers, for easy structuring of large applications and reusing source code. There are plenty different ways in which such module systems can work, though, and this note attempts to describe some of the ideas.

Some definitions:

A module is the smallest collection of independent units (functions, constants, structs, classes, and more). It often corresponds to a file, but not always. For example, a math module could contain all basic math functions and constants.
A package is a collection of related modules. It is independently distributable. For example, a stdlib module could contain all basic modules that would be expected from a programming language (IO, string manipulation, math, and more).
An import statement is a source code construct that makes the code in a module available in the current file, module, or scope.

Note that the terms “module” and “package” are not always used in the way described above. For example, Java (with its new module system) has swapped the two.

Locations

A nontrivial module will load reusable source code from three places:

The standard library
Installed packages
The local package

The local package refers to the product that’s being developed, and that contains the module that does the importing.

In my opinion, it is important to be clear where an import statement loads modules from:

C (despite its archaic approach modularity) distinguishes between imports in the local package (#include "mymodule.h") and standard library or installed packages (#include <stdbool.h>).
Ruby’s #require can load from anywhere, depending on the load path (a search path, set as an environment variable). It does have #require_relative, which indicates an import from the local package.
Dart distinguishes between all three by having import 'dart:async' (prefix dart:) for standard library imports, import 'package:something' (prefix package:) for package imports, and import 'myfile' for local imports.

What are modules?

Modules can correspond one-to-one to files. This is the approach used in Python, Ruby, JavaScript, and others. Modules can correspond one-to-one to directories. This is the approach that Go uses. Modules can correspond to a combination of files and directories as well; this is the case with Rust, Java, and others.

A module name can be implicitly derived from the filename (usually when modules correspond to files, e.g. in Python or JavaScript), explicitly given (e.g. with a package mypkgname declaration, like in Java or Go).

If modules and files correspond one-to-one, then it is useful to have a main module (file) that re-exports declarations from other modules inside the package. For example, a math package might have dozens of modules, but its main module would export some constants and functions defined in submodules: math.trig.sin might be re-exported as math.sin for convenience.

A main module could be a regular module (file), or it could have a special filename such as __init__ (which is what Python uses), which the interpreter/compiler looks for. This is useful for simplifying the directory structure. Compare with __init__:

src/
  math/
    __init__.xx
    trig.xx
    vec.xx

… and without __init__:

src/
  math.xx
  math/
    trig.xx
    vec.xx

Open question: If modules correspond to directories rather than files, are re-exports still useful?

Visibility

Visibility determines whether a symbol is accessible to other modules. Typically there is private (not accessible to other module), and public (accessible to other modules). More granular options are possible, e.g. package-private (visible to modules in the current package).

To do: Private by default, with export keyword? Go-style public/private? Underscore for private?

To do: Re-exports: how do those work?

Visibility is not a property of the symbol

The visibility of a symbol is a property of the symbol’s relationship to the module, rather than a property of the symbol itself. This is relevant in languages that determine visibility based on the symbol name. For example:

import math

println(math.Cos(3.14))

In the example above, Cos is exported from math and is public, because it uses the Go-style convention of public symbols starting with an uppercase letter. There is no issue in this case.

It gets complicated when importing symbols directly:

import Cos from math

println(Cos(3.14))

In this case, the Cos symbol is imported from the math module, where Cos is public. However, in the example above, Cos appears to be public with regard to the current module too, which might not be intentional or desired.

In other words, if visibility is determined by the symbol name (e.g. through the use of an initial uppercase letter, or the presence of a leading underscore), then importing symbols directly can lead to confusion of what visibility rules apply.

Non-source imports

JavaScript (or at least Webpack) allows importing non-source files, such as CSS files.

This is interesting, because it provides a mechanism to bundle resources into a single output file, which makes distribution easier.

With compile-time function execution, it would even be possible to import a non-source file and process it ahead of time. For example, it might allow converting Sass to CSS ahead of time. (Though perhaps not directly: Sass can include other files, which means that a non-source import would need to be able to access other files as well.)

Within the JavaScript ecosystem, usually the non-source imports are handled by Webpack. Unfortunately, the Webpack configuration to make that work is too opaque; a language with built-in support for this (with zero configuration) would be nice to have.

Non-source imports only work when modules map one-to-one to files. It might be necessary to have an alternative import-file instruction apart from the regular import-module instruction.

Miscellaneous

An import statement should not modify the global namespace. This is what Ruby does and leads to weird effects: if I import a package that imports the yaml package, I can use the functionality from the yaml package without having imported it.
The import syntax for a package needs to follow the package name. Python does not do this, so you can install a package with one name and then import it with the other — very confusing.
To do: Aliasing (import x as y), partial imports (from x import xyz, import x for xyz). Full imports (Python-like star imports) are bad!
To do: Needs a package manager, with checksums. Perhaps with vendoring/caching support. Probably needs a way to bundle all dependencies together.
To do: If a package is imported, it’s nice to be able to have multiple versions of it. JavaScript’s npm ecosystem allows this.
To do: Inline package manager

Note last edited August 2024.

Incoming links:

Programming language design