The compiler transforms a DeepDive application's deepdive.conf object into a collection of shell scripts and a Makefile, which can later be used for actual execution.
deepdive-compile is the user-facing facade command and the entry point to the compilation. The rest of the commands each implement a specific step or transformation in the overall compilation.
Here's a brief summary of how the compilation is done.
- DDlog rules and declarations in app.ddlog are first compiled and combined with the user's deepdive.conf and schema.json. The HOCON syntax used by deepdive.conf is interpreted by hocon2json, and everything is converted into a single JSON config object that holds everything under the key "deepdive".
- The config object is first extended with some implied extractors, such as those for initializing the database and loading input tables. Then the dependencies of extractors are normalized, and their names are qualified with corresponding prefixes (by compile-config_normalized) to make it easier and clearer to produce the final code for execution. DeepDive's built-in processes for variables and factors, such as grounding, learning, inference, and calibration, are added to the config object after the normalization. The user's original config is kept intact under "deepdive", while the normalized one is created under a different key, "deepdive_".
- The dependency information in the config object is captured in a Makefile that is later used by the execution engine to produce an execution plan for any data product defined in the application, and to keep track of when each process has run or each data product has been produced.
- The actual computation that needs to be performed for each process, e.g., running a SQL query or a Python UDF, is compiled as a shell script (by compile-code-*). Each compiler component takes the normalized config object as input and generates a code fragment for the part it is responsible for, e.g., compile-code-tsv_extractor handles the tsv_extractors. Compiled code fragments are again represented as JSON objects that map file path names to file contents. These objects are merged, then passed to a code generator that materializes the code as executable shell scripts.
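The last three steps can be illustrated with a small Python sketch. It is not DeepDive's implementation: the config layout, the process/ prefix, the .done target suffix, the dependencies_ key, the run.sh file name, and every function name below are assumptions made up for illustration. Only the overall flow comes from the description above: keep "deepdive" intact, build a normalized view under "deepdive_", derive Makefile rules from the dependencies, and represent code fragments as path-to-contents mappings that are merged and then written out.

```python
import os
import stat

# Hypothetical, simplified user config; real deepdive.conf layouts differ.
example_config = {
    "deepdive": {
        "extractors": {
            "load_docs": {"style": "tsv_extractor", "dependencies": []},
            "ext_people": {"style": "tsv_extractor", "dependencies": ["load_docs"]},
        }
    }
}

PREFIX = "process/"  # assumed prefix, chosen for illustration only


def normalize(config):
    """Add a normalized view under "deepdive_", leaving "deepdive" intact."""
    processes = {}
    for name, ext in config["deepdive"]["extractors"].items():
        processes[PREFIX + name] = {
            "style": ext["style"],
            "dependencies_": [PREFIX + d for d in ext.get("dependencies", [])],
        }
    return {**config, "deepdive_": {"processes": processes}}


def makefile_rules(config):
    """Render one 'target: prerequisites' line per normalized process."""
    lines = []
    for name, proc in sorted(config["deepdive_"]["processes"].items()):
        prereqs = [d + ".done" for d in proc["dependencies_"]]
        lines.append(" ".join([name + ".done:"] + prereqs))
    return "\n".join(lines) + "\n"


def compile_tsv_extractors(config):
    """Emit a code fragment: a mapping from file path to file contents."""
    return {
        name + "/run.sh": "#!/bin/sh\necho running " + name + "\n"
        for name, proc in config["deepdive_"]["processes"].items()
        if proc["style"] == "tsv_extractor"
    }


def merge_fragments(fragments):
    """Merge path-to-contents mappings; later fragments win on conflicts."""
    merged = {}
    for fragment in fragments:
        merged.update(fragment)
    return merged


def materialize(files, root):
    """Write each file under root and mark it executable."""
    for path, contents in files.items():
        full = os.path.join(root, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as f:
            f.write(contents)
        os.chmod(full, os.stat(full).st_mode | stat.S_IXUSR)
```

In this sketch, calling normalize(example_config), passing the result to compile_tsv_extractors, and handing the merged fragments to materialize with a target directory would write one executable run.sh per extractor. Treating fragments as plain mappings keeps each compiler component independent of the file system until the final step.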
Everything compiled is kept under the run/ directory of the application.
The runner uses the compiled Makefile and executable files to plan and execute various data processing and computation tasks for the DeepDive application.
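As a rough illustration of what planning from dependency information involves, the sketch below orders the processes needed for one target so that each runs after its prerequisites, and skips anything already marked as done. The function name, the dependency mapping, and the process names are hypothetical; the actual runner relies on the compiled Makefile rather than on code like this.

```python
def plan(target, deps, done=frozenset()):
    """Return the processes to run, in order, to produce target.

    deps maps each process to the processes it depends on.
    Anything listed in done is treated as already completed and skipped.
    """
    order = []
    visited = set()

    def visit(node):
        if node in visited or node in done:
            return
        visited.add(node)
        for dep in deps.get(node, []):
            visit(dep)      # prerequisites are scheduled first
        order.append(node)  # then the process itself

    visit(target)
    return order


# Hypothetical dependency graph for illustration.
example_deps = {
    "process/init/db": [],
    "process/load_docs": ["process/init/db"],
    "process/ext_people": ["process/load_docs"],
}
```

With this graph, plan("process/ext_people", example_deps) lists the database initialization first, then the load, then the extractor, while passing done={"process/load_docs"} leaves only the extractor to run.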