What NInja Does
The build file, once parsed, describes a graph of dependencies: the final output binary depends on linking a number of objects, each of which is the result of compiling sources. Specifically it is a bipartite graph, where "nodes" (input files) point to "edges" (build commands) which point to nodes (output files)6. The build process then traverses this graph.
Chrome(roughly 40,000 files) is roughly the size of the Linux kernel so the approach taken there ought to work.
Supporting Windows
- straightforward changes like making the path separator a backslash or changing the Ninja syntax to allow colons in a path (like c:\foo.txt)
- Windows has a relatively low limit on the length of a command, an issue that comes up when constructing the command passed to a final link step that may mention most files in the project.
- Getting the modification time of a file–GetFileAttributesEx() on Windows13 and stat() on non-Windows platforms–seems to be about 100 times slower on Windows than it is on Linux.14
- The Git version control system, which similarly needs to get the state of many files, can use multiple threads on Windows to execute file checks in parallel. Ninja ought to adopt this feature.
Executing a Build
Ninja runs build commands in parallel by default, based on the number of available CPUs on the system.
The Build Log
- Linux kernel build system tracks the commands used to generate outputs.
- Consider a motivating example: you compile an input foo.c into an output foo.o, and then change the build file such that it should be rebuilt with different compilation flags.
- For the build system to know that it needs to rebuild the output, it must either note that foo.o depends on the build files themselves (which, depending on the organization of the project, might mean that a change to the build files would cause the entire project to rebuild), or record the commands used to generate each output and compare them for each build.
- Ninja writes out a build log that records the full commands used to generate each output.
- Then for each subsequent build, Ninja loads the previous build log and compares the new build's commands
- Rather than recording commands, which are frequently very long and take a lot of time to parse, Ninja instead records a hash of the command.
- Using hashes reduced the size of the build log dramatically–from 200 MB to less than 2 MB on Mac OS X–and made it over 20 times faster to load.
Dependency Files
A better approach relies on the fact that at compile time gcc (and Microsoft's Visual Studio) can output which headers were used to build the output.
Canonicalization
- Ninja avoids using strings to identify paths. Instead, Ninja maps each path it encounters to a unique Node object and the Node object is used in the rest of the code. Reusing this object ensures that a given path is only ever checked on disk once, and the result of that check (i.e., the modification time) can be reused in other code.
- To always use the same Node for a single file, Ninja must reliably map every possible name for a file into the same Node object. This requires a canonicalization pass on all paths mentioned in input files,