-
Notifications
You must be signed in to change notification settings - Fork 24
OLD: DataNode Implementation
DataNode is the core data structure for the Scheme interpreter. It represents the fundamental data type that represents all possible values within the Scheme interpreter's type system.
As Scheme's values are dynamically typed, we are required to use some form of the algebraic sum type. In C++11, no native or standard library sum type is available, but the Boost C++ libraries contains one in the form of boost::variant
. Note that although std::variant
exists in C++17 and later, we decided to go with boost::variant
in the name of sticking to our original C++11 versioning.
Data
is simply an alias for a boost::variant
that contains various different fundamental data types, and a few extra, such as a std::shared_ptr<shaka::Environment>
.
Fundamentally, DataNode is a linked list node that contains two Data elements. Interestingly enough, since a Data
an contain a std::shared_ptr<DataNode>
, every DataNode is potentially a binary tree.
Although DataNode sports the capability of hosting a single Data value in either of its two fields of head
and tail
, a single value in our object-oriented implementation consists of a DataNode where its head
contains the single value of type Data
, and its tail
contains a std::shared_ptr<DataNode>(nullptr)
. This is not the most efficient implementation, but it makes dealing with DataNode much more uniform and clean in terms of design.
Making a boost::variant
recursive has caused significant problems for us in the past. It is common to find that something like shaka::Procedure
can depend on shaka::DataNode
which depends on shaka::Data
, which, in turn, would depend on shaka::Procedure
.
The solution is to "break" the circular dependency chain of types by using a special boost::recursive_wrapper<shaka::Procedure>
within the boost::variant
definition to introduce a "loose" dependency of types. (It introduces some form of T&
or T*
so as to not require the entire T
type to be defined or instantiated yet.) Unfortunately, when such a solution is left out by mistake, it creates a significant amount of errors (a well-known frustration with the current state of C++ compilers, in general).
Possibly, but our prior approach to DataNode was based on this concept but deemed unviable. Formerly, DataNode was structured to contain a single piece of a shaka::Data
(which was still a boost::variant
at that time), but also contained a std::vector<std::shared_ptr<shaka::Data>>
which represented the children. For more than one reason, we abandoned the approach:
-
Scheme's primitive expressions do not have a unique and "same everywhere" AST since keywords like
define
can be redefined inside of a local scope (e.g. a top-level lambda can contain a different binding ofdefine
like(define define 1)
). -
Manipulation of lists with a tree-based implementation where the root represents a hosting dummy "list" node becomes bulky. Implementing
car
on a tree becomes taking the first child of the tree, while taking thecar
of a tree-based list means taking only the slice of elements with the first element cut off from the vector of pointers to children. In C++11, there is no standard library facility for taking a "view" or "slice" of a vector except for copying or moving parts of it. The mechanics of duplicating the nodes is much more complicated than simply returning the pointer to the next element in the list. We rewrote a lot of code to go back towards a basic list-based approach after going down the AST-route; currently, we affirm it was the right decision. -
Using a fixed AST-based evaluation system where the core interpreter functions must expect a certain AST form is flawed when the meaning of any of the AST forms can be changed by re-definitions of primitive expression symbols.
- Parsing based off of an fixed AST-based system becomes possibly invalid due to the same reason (e.g.
(define 1 2 3)
is valid if(define define (lambda (x y z) z))
was executed for the local scope in question before).
- Parsing based off of an fixed AST-based system becomes possibly invalid due to the same reason (e.g.
-
Manipulating a non-uniform AST with special "primitive expression" nodes quickly becomes hard when paired with the different semantics of evaluating macros while also coordinating evaluation of macros with non-macro forms.
In short, if you can re-define the core syntactic forms to have different meanings, there exists no universal, fixed rules for parsing or even evaluating a form. No semantics can be assumed except for that which doesn't change at run-time, such as the syntax of lists. All bindings must be checked for even primitive procedures to ensure they still represent the correct primitive expression forms as specified by R7RS.