# Decision Tree Classifier CDecisionTree

A decision tree is a classification method that compares the object's features with a set of threshold values; the result of each comparison tells us which of the child nodes to move to. Once we reach a leaf node, we assign the object to the class this node represents.

In the NeoML library this method is implemented by the CDecisionTree class. It exposes a Train method that lets you train a multi-class classification model.

## Training settings

The training parameters are represented by the CDecisionTree::CParams structure.

- MinContinuousSubsetSize — [for continuous features] the minimum number of vectors corresponding to a node subtree.
- MinDiscreteSubsetSize — [for discrete features] the minimum number of vectors corresponding to a node subtree.
- MinDiscreteSubsetPart — [for discrete features] the minimum weight of the vectors in a subtree relative to the parent node weight (may be from 0 to 1).
- MinContinuousSubsetPart — [for continuous features] the minimum weight of the vectors in a subtree relative to the parent node weight (may be from 0 to 1).
- MinSplitSize — the minimum number of vectors in a node subtree for it to be divided further.
- MaxTreeDepth — the maximum depth of the tree.
- MaxNodesCount — the maximum number of nodes in the tree.
- SplitCriterion — the criterion for subset splitting.
- ConstNodeThreshold — the proportion of identical elements in the subset at which a constant node is created (may be from 0 to 1).
- RandomSelectedFeaturesCount — no more than this number of randomly selected features will be used for each node; set the value to -1 to use all features every time.
- AvailableMemory — the memory limit for the algorithm, in bytes; if the training step fails, try increasing this value (the default is 1 GB).
- MulticlassMode — the approach used for multi-class classification: SingleClassifier (default), OneVsAll, or OneVsOne; see the configuration sketch after this list.
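For illustration, a random-forest-style configuration that limits the features considered at each node might look like the sketch below. The parameter values are arbitrary, and the MM_OneVsAll enumerator name is an assumption; check your NeoML headers for the exact spelling.

```cpp
CDecisionTree::CParams params;
params.MaxTreeDepth = 8; // cap the depth to limit overfitting
params.RandomSelectedFeaturesCount = 20; // consider at most 20 randomly selected features per node
params.MulticlassMode = MM_OneVsAll; // assumed enumerator name: a separate classifier per class
```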

## Model

The trained model is a tree in which every non-leaf node contains a pair (the index of a feature, a set of its values), and every leaf node contains the probabilities of the object belonging to each of the classes.

To classify a given vector, the algorithm walks the tree from the root node, at each node comparing the value of the specified feature with the stored threshold; the result of the comparison selects the next node.
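Since IDecisionTreeModel inherits from IModel, a trained tree can be applied to a vector directly. A minimal sketch, assuming the IModel::Classify overload that takes a CFloatVectorDesc and fills a CClassificationResult (verify the exact signature against your NeoML version):

```cpp
// Classifies one dense vector with a trained tree (illustrative sketch)
bool classifyVector( const IDecisionTreeModel& model, const CFloatVectorDesc& vector )
{
	CClassificationResult result;
	if( !model.Classify( vector, result ) ) {
		return false; // classification failed
	}
	// result.PreferredClass is the index of the most probable class;
	// result.Probabilities stores the per-class probabilities
	return true;
}
```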

The model is described by the IDecisionTreeModel interface:

```cpp
// Trained classification model interface
class NEOML_API IDecisionTreeModel : public IModel {
public:
	virtual ~IDecisionTreeModel() = 0;

	// Gets the number of child nodes
	virtual int GetChildrenCount() const = 0;

	// Gets the child node with the specified index
	virtual CPtr<IDecisionTreeModel> GetChild( int index ) const = 0;

	// Gets the node information
	virtual void GetNodeInfo( CDecisionTreeNodeInfo& info ) const = 0;

	// Serializes the model
	virtual void Serialize( CArchive& ) = 0;
};
```
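These three methods are enough to traverse a trained tree. As an illustrative sketch (not part of the library), the following counts the tree's nodes recursively:

```cpp
// Returns the total number of nodes in a trained tree
static int countNodes( const CPtr<IDecisionTreeModel>& node )
{
	int result = 1; // count this node itself
	for( int i = 0; i < node->GetChildrenCount(); i++ ) {
		result += countNodes( node->GetChild( i ) );
	}
	return result;
}
```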

## Sample

See a simple example of training a decision tree classifier. The input data set contains only vectors with continuous features.

```cpp
CPtr<IDecisionTreeModel> buildModel( IProblem* data )
{
	CDecisionTree::CParams param;
	param.MinContinuousSubsetPart = 0.10; // each subtree should get no less than 10% of the parent node's vector weight
	param.MinContinuousSubsetSize = 128; // only continuous features are used, so only the continuous limits are set
	param.MinSplitSize = 16; // a node may be split only if it contains at least 16 vectors
	param.MaxTreeDepth = 10; // the tree depth shouldn't be more than 10
	param.SplitCriterion = CDecisionTree::SC_InformationGain;

	CDecisionTree builder( param );

	return dynamic_cast<IDecisionTreeModel*>( builder.Train( data ).Ptr() );
}
```
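A trained model can be written to a file through its Serialize method. A sketch, assuming the CArchiveFile and CArchive serialization helpers NeoML uses elsewhere in its documentation (the file name is arbitrary, and the direction constants may differ by version):

```cpp
void storeModel( IDecisionTreeModel& model )
{
	CArchiveFile file( "DecisionTree.carchive", CArchive::store );
	CArchive archive( &file, CArchive::SD_Storing ); // open the archive for writing
	model.Serialize( archive ); // the same call with a loading archive restores the model
}
```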