Module systems in programming languages
Modern programming languages have module systems and package managers, for easy structuring of large applications and reusing source code. There are plenty different ways in which such module systems can work, though, and this note attempts to describe some of the ideas.
Some definitionsThe terms “module” and “package” are not always used this way. For example, Java (with its new module system) has swapped the two. :
A module is the smallest collection of independent units (functions, constants, structs, classes, and more). It often corresponds to a file, but not always. For example, a
mathmodule could contain all basic math functions and constants.
A package is a collection of related modules. It is independently distributable. For example, a
stdlibmodule could contain all basic modules that would be expected from a programming language (IO, string manipulation, math, and more).
An import statement is a source code construct that makes the code in a module available in the current file, module, or scope.
A nontrivial module will load reusable source code from three places:
- The standard library
- Installed packages
- The local package
The local package refers to the product that’s being developed, and that contains the module that does the importing.
In my opinion, it is important to be clear where an import statement loads modules from:
C (despite its archaic approach modularity) distinguishes between imports in the local package (
#include "mymodule.h") and standard library or installed packages (
#requirecan load from anywhere, depending on the load path (a search path, set as an environment variable). It does have
#require_relative, which indicates an import from the local package.
Dart distinguishes between all three by having
dart:) for standard library imports,
package:) for package imports, and
import 'myfile'for local imports.
What are modules?
package mypkgname declaration, like in Java or Go).
If modules and files correspond one-to-one, then it is useful to have a main module (file) that re-exports declarations from other modules inside the package. For example, a
math package might have dozens of modules, but its main module would export some constants and functions defined in submodules:
math.trig.sin might be re-exported as
math.sin for convenience.
A main module could be a regular module (file), or it could have a special filename such as
__init__ (which is what Python uses), which the interpreter/compiler looks for. This is useful for simplifying the directory structure. Compare with
src/ math/ __init__.xx trig.xx vec.xx
… and without
src/ math.xx math/ trig.xx vec.xx
Open question: If modules correspond to directories rather than files, are re-exports still useful?
Visibility determines whether a symbol is accessible to other modules. Typically there is private (not accessible to other module), and public (accessible to other modules). More granular options are possible, e.g. package-private (visible to modules in the current package).
To do: Private by default, with export keyword? Go-style public/private? Underscore for private?
To do: Re-exports: how do those work?
Visibility is not a property of the symbol
The visibility of a symbol is a property of the symbol’s relationship to the module, rather than a property of the symbol itself. This is relevant in languages that determine visibility based on the symbol name. For example:
import math println(math.Cos(3.14))
In the example above,
Cos is exported from
math and is public, because it uses the Go-style convention of public symbols starting with an uppercase letter. There is no issue in this case.
It gets complicated when importing symbols directly:
import Cos from math println(Cos(3.14))
In this case, the
Cos symbol is imported from the
math module, where
Cos is public. However, in the example above,
Cos appears to be public with regard to the current module too, which might not be intentional or desired.
In other words, if visibility is determined by the symbol name (e.g. through the use of an initial uppercase letter, or the presence of a leading underscore), then importing symbols directly can lead to confusion of what visibility rules apply.
This is interesting, because it provides a mechanism to bundle resources into a single output file, which makes distribution easier.
With compile-time function execution, it would even be possible to import a non-source file and process it ahead of time. For example, it would allow converting Sass to CSS ahead of timePerhaps not directly: Sass can include other files, which means that a non-source import would need to be able to access other files as well. .
Non-source imports only work when modules map one-to-one to files. It might be necessary to have an alternative import-file instruction apart from the regular import-module instruction.
An import statement should not modify the global namespace. This is what Ruby does and leads to weird effects: if I import a package that imports the
yamlpackage, I can use the functionality from the
yamlpackage without having imported it.
The import syntax for a package needs to follow the package name. Python does not do this, so you can install a package with one name and then import it with the other — very confusing.
To do: Aliasing (import x as y), partial imports (from x import xyz, import x for xyz). Full imports (Python-like star imports) are bad!
To do: Needs a package manager, with checksums. Perhaps with vendoring/caching support. Probably needs a way to bundle all dependencies together.
To do: Inline package manager