Skip to content

OLD: DataNode Implementation

VermillionAzure edited this page Jan 20, 2018 · 1 revision

DataNode is the core data structure for the Scheme interpreter. It represents the fundamental data type that represents all possible values within the Scheme interpreter's type system.

As Scheme's values are dynamically typed, we are required to use some form of the algebraic sum type. In C++11, no native or standard library sum type is available, but the Boost C++ libraries contains one in the form of boost::variant. Note that although std::variant exists in C++17 and later, we decided to go with boost::variant in the name of sticking to our original C++11 versioning.

Data - The sum type

Data is simply an alias for a boost::variant that contains various different fundamental data types, and a few extra, such as a std::shared_ptr<shaka::Environment>.

Fundamentally, DataNode is a linked list node that contains two Data elements. Interestingly enough, since a Data an contain a std::shared_ptr<DataNode>, every DataNode is potentially a binary tree.

Although DataNode sports the capability of hosting a single Data value in either of its two fields of head and tail, a single value in our object-oriented implementation consists of a DataNode where its head contains the single value of type Data, and its tail contains a std::shared_ptr<DataNode>(nullptr). This is not the most efficient implementation, but it makes dealing with DataNode much more uniform and clean in terms of design.

FAQ

Why not make Data recursive?

Making a boost::variant recursive has caused significant problems for us in the past. It is common to find that something like shaka::Procedure can depend on shaka::DataNode which depends on shaka::Data, which, in turn, would depend on shaka::Procedure.

The solution is to "break" the circular dependency chain of types by using a special boost::recursive_wrapper<shaka::Procedure> within the boost::variant definition to introduce a "loose" dependency of types. (It introduces some form of T& or T* so as to not require the entire T type to be defined or instantiated yet.) Unfortunately, when such a solution is left out by mistake, it creates a significant amount of errors (a well-known frustration with the current state of C++ compilers, in general).

Can DataNode be optimized to use a tree-based layout instead of a linked-list-based layout?

Possibly, but our prior approach to DataNode was based on this concept but deemed unviable. Formerly, DataNode was structured to contain a single piece of a shaka::Data (which was still a boost::variant at that time), but also contained a std::vector<std::shared_ptr<shaka::Data>> which represented the children. For more than one reason, we abandoned the approach:

  • Scheme's primitive expressions do not have a unique and "same everywhere" AST since keywords like define can be redefined inside of a local scope (e.g. a top-level lambda can contain a different binding of define like (define define 1)).

  • Manipulation of lists with a tree-based implementation where the root represents a hosting dummy "list" node becomes bulky. Implementing car on a tree becomes taking the first child of the tree, while taking the car of a tree-based list means taking only the slice of elements with the first element cut off from the vector of pointers to children. In C++11, there is no standard library facility for taking a "view" or "slice" of a vector except for copying or moving parts of it. The mechanics of duplicating the nodes is much more complicated than simply returning the pointer to the next element in the list. We rewrote a lot of code to go back towards a basic list-based approach after going down the AST-route; currently, we affirm it was the right decision.

  • Using a fixed AST-based evaluation system where the core interpreter functions must expect a certain AST form is flawed when the meaning of any of the AST forms can be changed by re-definitions of primitive expression symbols.

    • Parsing based off of an fixed AST-based system becomes possibly invalid due to the same reason (e.g. (define 1 2 3) is valid if (define define (lambda (x y z) z)) was executed for the local scope in question before).
  • Manipulating a non-uniform AST with special "primitive expression" nodes quickly becomes hard when paired with the different semantics of evaluating macros while also coordinating evaluation of macros with non-macro forms.

In short, if you can re-define the core syntactic forms to have different meanings, there exists no universal, fixed rules for parsing or even evaluating a form. No semantics can be assumed except for that which doesn't change at run-time, such as the syntax of lists. All bindings must be checked for even primitive procedures to ensure they still represent the correct primitive expression forms as specified by R7RS.