docs: add tutorial on obtaining information about distributions
younesStrittmatter committed Nov 10, 2023
1 parent 58f5bd6 commit cb669f1
Showing 10 changed files with 390 additions and 11 deletions.
2 changes: 1 addition & 1 deletion docs/index.md
@@ -5,7 +5,7 @@ The Equation Tree package is an equation toolbox with symbolic regression in min
- [**Equation Sampling**](user-guide/equation-sampling.md)
- Calculating [Distance Metrics](user-guide/distance-metrics.md) between equations

It also encompasses a variety of [additional features](user-guide/additional-features.md). For example, to obtain information about existing equation list that can, in turn, be used in our sampling method.
It also encompasses a variety of [additional features](user-guide/additional-features.md), including the capability to analyse the distributions of operators, functions, features, and structures in a given set of equations. For example, to obtain information about a specific scientific field, one can use the [equation scraper](https://autoresearch.github.io/equation-scraper/tutorials/equation_scraper_tutorial/) to scrape equations from Wikipedia and then use the sampler to generate equations that resemble those of that field.

## Relevant Publication

212 changes: 212 additions & 0 deletions docs/tutorials/Analysing Equation Distribution.ipynb
@@ -0,0 +1,212 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Analysing Distributions\n",
"\n",
"The Equation Tree package can be used to extract information from existing distributions of equations (for example, equations obtained with the equation scraper: https://autoresearch.github.io/equation-scraper/tutorials/equation_scraper_tutorial/).\n",
"\n",
"## Installation"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"!pip install equation-tree"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Equation Database\n",
"\n",
"Here, we use a list of SymPy equations to demonstrate the functionality:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 1,
"outputs": [],
"source": [
"# import functionality from sympy\n",
"from sympy import sympify\n",
"\n",
"eq_1 = sympify('x_1 + x_2')\n",
"eq_2 = sympify('exp(x_1) * 2.5')\n",
"eq_3 = sympify('sin(x_1) + 2 * cos(x_2)')\n",
"equation_list = [eq_1, eq_2, eq_3]"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Analyse the List\n",
"\n",
"We can obtain information about a list of equations, for example the relative frequencies of structures, features, functions, and operators:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 4,
"outputs": [
{
"data": {
"text/plain": "{'max_depth': {3: 0.3333333333333333,\n 4: 0.3333333333333333,\n 7: 0.3333333333333333},\n 'depth': {1: 0.3333333333333333,\n 2: 0.3333333333333333,\n 3: 0.3333333333333333},\n 'structures': {'[0, 1, 1]': 0.3333333333333333,\n '[0, 1, 1, 2]': 0.3333333333333333,\n '[0, 1, 2, 1, 2, 2, 3]': 0.3333333333333333},\n 'features': {'constants': 0.2857142857142857,\n 'variables': 0.7142857142857143},\n 'functions': {'exp': 0.3333333333333333,\n 'sin': 0.3333333333333333,\n 'cos': 0.3333333333333333},\n 'operators': {'+': 0.5, '*': 0.5},\n 'function_conditionals': {'exp': {'features': {'constants': 0.0,\n 'variables': 1.0},\n 'functions': {},\n 'operators': {}},\n 'sin': {'features': {'constants': 0.0, 'variables': 1.0},\n 'functions': {},\n 'operators': {}},\n 'cos': {'features': {'constants': 0.0, 'variables': 1.0},\n 'functions': {},\n 'operators': {}}},\n 'operator_conditionals': {'+': {'features': {'constants': 0.0,\n 'variables': 1.0},\n 'functions': {'sin': 1.0},\n 'operators': {'*': 1.0}},\n '*': {'features': {'constants': 1.0, 'variables': 0.0},\n 'functions': {'exp': 0.5, 'cos': 0.5},\n 'operators': {}}}}"
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from equation_tree import get_frequencies\n",
"\n",
"# get the relative frequencies of the equation properties\n",
"frequencies = get_frequencies(equation_list)\n",
"frequencies"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"Instead of relative frequencies, we can also obtain absolute counts:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 3,
"outputs": [
{
"data": {
"text/plain": "{'max_depth': {3: 1, 4: 1, 7: 1},\n 'depth': {1: 1, 2: 1, 3: 1},\n 'structures': {'[0, 1, 1]': 1, '[0, 1, 1, 2]': 1, '[0, 1, 2, 1, 2, 2, 3]': 1},\n 'features': {'constants': 2, 'variables': 5},\n 'functions': {'exp': 1, 'sin': 1, 'cos': 1},\n 'operators': {'+': 2, '*': 2},\n 'function_conditionals': {'exp': {'features': {'constants': 0,\n 'variables': 1},\n 'functions': {},\n 'operators': {}},\n 'sin': {'features': {'constants': 0, 'variables': 1},\n 'functions': {},\n 'operators': {}},\n 'cos': {'features': {'constants': 0, 'variables': 1},\n 'functions': {},\n 'operators': {}}},\n 'operator_conditionals': {'+': {'features': {'constants': 0, 'variables': 2},\n 'functions': {'sin': 1},\n 'operators': {'*': 1}},\n '*': {'features': {'constants': 2, 'variables': 0},\n 'functions': {'exp': 1, 'cos': 1},\n 'operators': {}}}}"
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from equation_tree import get_counts\n",
"\n",
"counts = get_counts(equation_list)\n",
"counts"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"The obtained frequencies can be passed directly to the sampler to generate new equations:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 6,
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/younesstrittmatter/Documents/GitHub/AutoRA/equation-tree/src/equation_tree/util/io.py:27: UserWarning: No hashed prior found. Sample frequencies may diverge from the prior. Consider burning this prior first.\n",
" warnings.warn(\n",
"Processing: 100%|██████████| 10/10 [00:00<00:00, 90.79iteration/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[c_1*cos(x_1), 2*x_1, 2*x_1, 2*x_1, 2*x_1, x_1 + sin(x_1), c_1*cos(x_1), c_1*cos(x_1), x_1 + sin(x_1), x_1 + sin(x_1)]\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"from equation_tree import sample\n",
"\n",
"# sample equations\n",
"equations = sample(10, frequencies)\n",
"print(equations)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 7,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'max_depth': {4: 0.6, 3: 0.4}, 'depth': {2: 0.6, 1: 0.4}, 'structures': {'[0, 1, 1, 2]': 0.6, '[0, 1, 1]': 0.4}, 'features': {'constants': 0.35, 'variables': 0.65}, 'functions': {'cos': 0.5, 'sin': 0.5}, 'operators': {'*': 0.7, '+': 0.3}, 'function_conditionals': {'cos': {'features': {'constants': 0.0, 'variables': 1.0}, 'functions': {}, 'operators': {}}, 'sin': {'features': {'constants': 0.0, 'variables': 1.0}, 'functions': {}, 'operators': {}}}, 'operator_conditionals': {'*': {'features': {'constants': 0.6363636363636364, 'variables': 0.36363636363636365}, 'functions': {'cos': 1.0}, 'operators': {}}, '+': {'features': {'constants': 0.0, 'variables': 1.0}, 'functions': {'sin': 1.0}, 'operators': {}}}}\n"
]
}
],
"source": [
"# compare the frequencies of the sampled equations with the original ones\n",
"print(get_frequencies(equations))"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
6 changes: 4 additions & 2 deletions docs/user-guide/additional-features.md
@@ -6,8 +6,10 @@ The package features an extensive list of additional features to make benchmarki
- Export To SR Bench
- ...

## Feature Extraction
## Analysing Equation Distributions

Given an equation, our package can extract features such as the number of constants and variables, as well as various measures of equation complexity (for example, the number of nodes and the tree depth).
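
The following is a minimal, standard-library-only sketch of these single-equation features. It does not use the package: it parses Python-syntax strings with the built-in `ast` module, and the function name `expression_features` is hypothetical. Depth here counts the edges on the longest root-to-leaf path. With that convention, the three equations from the tutorial notebook get depths 1, 2, and 3 and node counts 3, 4, and 7, and together they contain 2 constants and 5 variables, which agrees with the `depth`, structure lengths, and `features` counts reported in the notebook.

```python
import ast


def expression_features(expr: str) -> dict:
    """Count constants, variables, nodes and depth of a Python-syntax expression.

    Hypothetical helper for illustration; not part of the equation_tree API.
    """
    stats = {"constants": 0, "variables": 0, "nodes": 0, "depth": 0}

    def visit(node: ast.AST, level: int) -> None:
        stats["nodes"] += 1
        stats["depth"] = max(stats["depth"], level)
        if isinstance(node, ast.Constant):  # numeric leaf, e.g. 2.5
            stats["constants"] += 1
        elif isinstance(node, ast.Name):  # variable leaf, e.g. x_1
            stats["variables"] += 1
        elif isinstance(node, ast.BinOp):  # operator node with two children
            visit(node.left, level + 1)
            visit(node.right, level + 1)
        elif isinstance(node, ast.Call):  # function node; its name is not a variable
            for arg in node.args:
                visit(arg, level + 1)
        else:
            raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    visit(ast.parse(expr, mode="eval").body, 0)
    return stats


for expr in ["x_1 + x_2", "exp(x_1) * 2.5", "sin(x_1) + 2 * cos(x_2)"]:
    print(expr, expression_features(expr))
```

Unary minus and other syntax not handled above raise a `ValueError` in this sketch.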

For a list of equations, our package is capable to easily access frequencies for operators, functions, features, and structures. These frequencies can in turn be used to sample new equations that mimic the original list in these aspects.
For a list of equations, our package can compute the frequencies of operators, functions, features, and structures. These frequencies can, in turn, be used to sample new equations that mimic the original list in these aspects.
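
The two outputs in the tutorial notebook are consistent with `get_frequencies` normalising every innermost dictionary of `get_counts` so that it sums to 1 (for example, `'features': {'constants': 2, 'variables': 5}` becomes `{'constants': 0.2857..., 'variables': 0.7142...}`). The sketch below reproduces that relationship on an excerpt of the notebook counts. It is our reading of the outputs, not the library's implementation, and `counts_to_frequencies` is a hypothetical name.

```python
def counts_to_frequencies(counts: dict) -> dict:
    """Normalise each innermost {name: count} dict so its values sum to 1.

    Hypothetical helper mirroring the get_counts -> get_frequencies outputs.
    """
    values = list(counts.values())
    if values and all(isinstance(v, dict) for v in values):
        # Nested level (e.g. 'operator_conditionals'): recurse into each entry.
        return {key: counts_to_frequencies(value) for key, value in counts.items()}
    total = sum(values)
    # Empty dicts stay empty; an all-zero category maps to zeros.
    return {key: (value / total if total else 0.0) for key, value in counts.items()}


# Excerpt of the counts shown in the tutorial notebook.
counts = {
    "features": {"constants": 2, "variables": 5},
    "operators": {"+": 2, "*": 2},
    "operator_conditionals": {
        "*": {
            "features": {"constants": 2, "variables": 0},
            "functions": {"exp": 1, "cos": 1},
            "operators": {},
        }
    },
}
print(counts_to_frequencies(counts))
```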

A possible application of this feature is to analyse equations scraped from Wikipedia. The [equation scraper](https://autoresearch.github.io/equation-scraper/tutorials/equation_scraper_tutorial/), another package developed by the AutoRA group, can be used to obtain such a list of equations.
3 changes: 3 additions & 0 deletions docs/user-guide/equation-formats.md
@@ -9,6 +9,9 @@ While the underlying format of the Equation Tree is an incomplete binary tree, i
## Tree Representation
coming soon ...

### Tree Structure
coming soon ...

## String Representation
coming soon ...

130 changes: 129 additions & 1 deletion docs/user-guide/equation-sampling.md
@@ -14,4 +14,132 @@ Various features of the underlying equation distribution can be customized. For

In our sampling method, we distinguish between equation structure and equation content and sample them separately:
- First, in the *(1) Structure Sampling* step, we sample the structure of the underlying tree. Here, complexity is adjusted, and we can use prior information about structures.
- Second, in the *(2) Attribute Sampling* step, we sample the content of each tree node individually. Here, we can use prior information about the occurrence probabilities of specific operators and frequencies. This information can be conditioned on the parent nodes. For example, we can use prior information about the likelihood of + appearing in a sine function.
- Second, in the *(2) Attribute Sampling* step, we sample the content of each tree node individually. Here, we can use prior information about the occurrence probabilities of specific operators and functions. This information can be conditioned on the parent node. For example, we can use prior information about the likelihood of `+` appearing inside a sine function.
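
To make the two steps concrete, here is a toy sketch, not the package's sampler. It reads a structure as a pre-order list of node depths (our reading of keys such as `'[0, 1, 1]'`), derives each node's number of children, and then fills leaves from `features`, one-child nodes from `functions`, and two-child nodes from `operators`. It ignores conditionals, constant and variable indices, and the package's simplification rules; `child_counts` and `sample_toy_equation` are hypothetical names.

```python
import ast
import random


def child_counts(structure: list) -> list:
    """Number of direct children of each node in a pre-order depth list."""
    counts = []
    for i, depth in enumerate(structure):
        n = 0
        for later in structure[i + 1:]:
            if later <= depth:  # we have left this node's subtree
                break
            if later == depth + 1:  # a direct child
                n += 1
        counts.append(n)
    return counts


def sample_toy_equation(prior: dict, rng: random.Random) -> tuple:
    def draw(distribution: dict) -> str:
        names = list(distribution)
        return rng.choices(names, weights=[distribution[k] for k in names])[0]

    # (1) Structure sampling: draw a tree shape from the structure prior.
    structure = ast.literal_eval(draw(prior["structures"]))
    # (2) Attribute sampling: fill every node according to its number of children.
    category = {0: "features", 1: "functions", 2: "operators"}
    nodes = [draw(prior[category[n]]) for n in child_counts(structure)]
    return structure, nodes


prior = {
    "structures": {"[0, 1, 1]": .3, "[0, 1, 2]": .3, "[0, 1, 2, 3]": .4},
    "features": {"constants": .2, "variables": .8},
    "functions": {"sin": .5, "cos": .5},
    "operators": {"+": 1., "-": .0},
}
print(sample_toy_equation(prior, random.Random(0)))
```

Separating the shape from the content is what lets the structure prior control complexity independently of which operators and functions appear.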

## How To Use The Sampler

To use our sampler, import the functionality and call the sample function:
```python
from equation_tree import sample

equations = sample()
```
This will return a list of sampled equations. You can customize the number of equations and the dimension of the input via the keyword arguments `n` and `max_num_variables`. For example, to sample 100 equations with at most 3 input variables, write:
```python
equations = sample(n=100, max_num_variables=3)
```
The most versatile way to further customize the sampling is to use a prior. You pass it to the sampler as a dictionary with entries for structures, features, functions, and operators. For example:
```python
prior = {
'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3]': .4},
'features': {'constants': .2, 'variables': .8},
'functions': {'sin': .5, 'cos': .5},
'operators': {'+': 1., '-': .0},
}

# Pass the prior to the sampler via the keyword argument `prior`
equations = sample(prior=prior)
```
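
Every prior on this page assigns probabilities that sum to 1 within each category. If you write priors by hand, a quick check such as the following sketch can catch typos before sampling. `find_prior_problems` is a hypothetical helper, and we have not confirmed whether `sample` rejects, warns about, or renormalises priors that do not sum to 1.

```python
import math


def find_prior_problems(prior: dict, path: str = "prior") -> list:
    """List categories whose weights are negative or do not sum to 1.

    Hypothetical helper; recurses into nested conditionals.
    """
    problems = []
    for key, value in prior.items():
        where = f"{path}[{key!r}]"
        if not isinstance(value, dict) or not value:
            continue  # skip non-dict and empty entries
        if all(isinstance(v, dict) for v in value.values()):
            problems += find_prior_problems(value, where)  # e.g. conditionals
        elif any(v < 0 for v in value.values()):
            problems.append(f"{where} has negative weights")
        elif not math.isclose(sum(value.values()), 1.0):
            problems.append(f"{where} sums to {sum(value.values())}")
    return problems


prior = {
    "structures": {"[0, 1, 1]": .3, "[0, 1, 2]": .3, "[0, 1, 2, 3]": .4},
    "features": {"constants": .2, "variables": .8},
    "functions": {"sin": .5, "cos": .5},
    "operators": {"+": .5, "-": .2},  # deliberate mistake: sums to 0.7
}
print(find_prior_problems(prior))
```
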
You can also include conditionals. These set the likelihood of a specific attribute being sampled given its parent node; for example, how likely a `-` is to occur inside a sine function:
```python
prior = {
'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3]': .4},
'features': {'constants': .2, 'variables': .8},
'functions': {'sin': .5, 'cos': .5},
'operators': {'+': 1., '-': .0},
'function_conditionals': {
'sin': {
'features': {'constants': 0., 'variables': 1.},
'functions': {'sin': 0., 'cos': 1.},
'operators': {'+': 0., '-': 1.}
},
'cos': {
'features': {'constants': 0., 'variables': 1.},
'functions': {'cos': 1., 'sin': 0.},
'operators': {'+': 0., '-': 1.}
}
},
'operator_conditionals': {
'+': {
'features': {'constants': .5, 'variables': .5},
'functions': {'sin': 1., 'cos': 0.},
'operators': {'+': 1., '-': 0.}
},
'-': {
'features': {'constants': .3, 'variables': .7},
'functions': {'cos': .5, 'sin': .5},
'operators': {'+': .9, '-': .1}
}
},
}

equations = sample(prior=prior)
```
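
One plausible reading of how such conditionals are used is sketched below: when a child node is filled, the distribution for the child's category is looked up under the parent's entry, with a fallback to the unconditional distribution. This is an illustration under that assumption, not the package's code; `attribute_distribution` is a hypothetical name, and the real sampler may combine or handle missing entries differently.

```python
def attribute_distribution(prior: dict, parent, category: str) -> dict:
    """Distribution over `category` ('features', 'functions', 'operators')
    for a child of `parent`, falling back to the unconditional prior.

    Hypothetical helper; `parent` is None for the root node.
    """
    for table in ("function_conditionals", "operator_conditionals"):
        conditional = prior.get(table, {}).get(parent, {})
        if category in conditional:
            return conditional[category]
    return prior[category]


prior = {
    "functions": {"sin": .5, "cos": .5},
    "operators": {"+": 1., "-": .0},
    "function_conditionals": {
        "sin": {"operators": {"+": 0., "-": 1.}},
    },
}
print(attribute_distribution(prior, "sin", "operators"))  # conditional entry used
print(attribute_distribution(prior, None, "operators"))   # root: unconditional prior
```
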

### Possible Attributes
The following attributes are supported natively.

*You can use custom attributes for operators and functions, but other functionality like distance metrics or the evaluation of equations might not work with custom attributes.*

#### Structures
Structures use the notation described in the [equation formats](equation-formats.md#tree-structure) section.
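
Since the format page for tree structures is still marked as coming soon, here is our reading, based on the tutorial notebook: a structure key lists the depth of every node in pre-order, so `sin(x_1) + 2 * cos(x_2)` appears as `'[0, 1, 2, 1, 2, 2, 3]'`. Under that assumption, the hypothetical helper below checks hand-written keys: the list starts at the root (depth 0), each step goes at most one level deeper, and no node has more than two children.

```python
import ast


def is_valid_structure(key: str) -> bool:
    """Check a structure key such as '[0, 1, 1]' (assumed pre-order depths)."""
    depths = ast.literal_eval(key)
    if not isinstance(depths, list) or not depths or depths[0] != 0:
        return False  # must be a non-empty list starting at the root
    for previous, current in zip(depths, depths[1:]):
        if current < 1 or current > previous + 1:
            return False  # a second root, or a jump of more than one level
    for i, depth in enumerate(depths):
        children = 0
        for later in depths[i + 1:]:
            if later <= depth:
                break  # left this node's subtree
            if later == depth + 1:
                children += 1
        if children > 2:
            return False  # operators take at most two inputs
    return True


for key in ["[0, 1, 1]", "[0, 1, 2, 3]", "[0, 2]", "[0, 1, 1, 1]"]:
    print(key, is_valid_structure(key))
```
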

The Equation Tree package provides convenience functions to obtain uniform structure priors from the tree depth or from the maximum number of nodes. To use them, pass the corresponding keyword argument to the sample function:
```python
# Sample equations with only a specified tree depth
equations = sample(depth=...)

# Sample equations up to a specified depth
equations = sample(max_depth=...)
```
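
The idea behind these options can be sketched as follows, assuming that structures are pre-order depth lists and that nodes have zero, one, or two children: enumerate every tree shape of the requested depth and give each the same probability. The helper names are hypothetical, and the package's own enumeration may differ, for example by excluding a single-leaf tree or shapes it would simplify away. The number of shapes grows very quickly with depth (13 shapes up to depth 2, 183 up to depth 3).

```python
def structures_up_to(max_depth: int, depth: int = 0) -> list:
    """All pre-order depth lists of trees with 0, 1 or 2 children per node."""
    shapes = [[depth]]  # a single leaf
    if max_depth > 0:
        subtrees = structures_up_to(max_depth - 1, depth + 1)
        # one child (function node)
        shapes += [[depth] + child for child in subtrees]
        # two children (operator node): every ordered pair of subtrees
        shapes += [[depth] + left + right for left in subtrees for right in subtrees]
    return shapes


def uniform_structure_prior(depth: int, exact: bool = True) -> dict:
    """Uniform prior over shapes of exactly `depth`, or up to `depth` if not exact."""
    shapes = [s for s in structures_up_to(depth)
              if (max(s) == depth if exact else max(s) <= depth)]
    return {str(s): 1 / len(shapes) for s in shapes}


print(uniform_structure_prior(1))  # {'[0, 1]': 0.5, '[0, 1, 1]': 0.5}
```
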


#### Features
`constants`: the likelihood of a leaf being a constant. In the Equation Tree package, constants are represented as `c` followed by an index (`c_{}`). The sampler never samples the same constant twice.

`variables`: the likelihood of a leaf being a variable. Variables are represented as `x` followed by an index (`x_{}`). Variables are sampled with replacement.

*Attention*: A function never has a constant as its child, since a function applied to a constant can be simplified to a single constant.
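
The naming rules above can be illustrated with a small sketch: each constant receives a fresh index, so no constant repeats, while variables are drawn with replacement. We assume variables are drawn uniformly from `x_1` to `x_n`, with `n` the maximum number of input variables; `name_leaves` is hypothetical and not the package's code.

```python
import random


def name_leaves(kinds: list, max_num_variables: int, rng: random.Random) -> list:
    """Turn leaf kinds ('constants' / 'variables') into symbol names."""
    names = []
    next_constant = 1
    for kind in kinds:
        if kind == "constants":
            names.append(f"c_{next_constant}")  # fresh index, never reused
            next_constant += 1
        else:
            # variables are drawn with replacement, so x_1 may appear twice
            names.append(f"x_{rng.randint(1, max_num_variables)}")
    return names


print(name_leaves(["variables", "constants", "variables", "constants"], 2, random.Random(0)))
```
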

#### Functions
Functions are mathematical operations with a single input. The package natively supports the following functions; please use exactly these names:
- sin
- cos
- tan
- exp
- log
- sqrt
- abs

The following functions can be added but are not included in the default priors:
- acos
- arg
- asin
- sinh
- cosh
- tanh
- cot

*Additionally, you can use `squared` and `cubed` as keys, but this might not be fully supported in all functions of the equation sampler. For example, converting to sympy expressions might lead to unexpected results.*

#### Operators
Operators are mathematical operations with two input values. Our package supports the following natively.
- \+
- \-
- \*
- \/
- \**
- max
- min
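
Because keys must match these names exactly, a quick check against the lists above can catch misspellings such as `'sine'`. The sketch below copies the names from this page into sets; `unsupported_names` is a hypothetical helper. A flagged name is not necessarily an error, since custom attributes are allowed, but they may not work with every part of the package.

```python
NATIVE_FUNCTIONS = {"sin", "cos", "tan", "exp", "log", "sqrt", "abs"}
OPTIONAL_FUNCTIONS = {"acos", "arg", "asin", "sinh", "cosh", "tanh", "cot"}
PARTIAL_FUNCTIONS = {"squared", "cubed"}  # may not be fully supported everywhere
NATIVE_OPERATORS = {"+", "-", "*", "/", "**", "max", "min"}


def unsupported_names(prior: dict) -> dict:
    """Return function and operator keys that are not listed on this page."""
    known_functions = NATIVE_FUNCTIONS | OPTIONAL_FUNCTIONS | PARTIAL_FUNCTIONS
    return {
        "functions": sorted(set(prior.get("functions", {})) - known_functions),
        "operators": sorted(set(prior.get("operators", {})) - NATIVE_OPERATORS),
    }


print(unsupported_names({"functions": {"sine": .5, "cos": .5}, "operators": {"+": 1.}}))
```
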

#### Conditionals
- In the function conditionals, each function can have its own prior consisting of a feature, function, and operator prior.
- In the operator conditionals, each operator can have its own prior consisting of a feature, function, and operator prior.

#### Convenience
The Equation Tree package has a convenience function that transforms a space (a list of attributes) into a uniform prior:
```python
from equation_tree.prior import prior_from_space

# For example if you only want to include primitive operators
operator_prior = prior_from_space(["+", "-", "*", "/"])
```
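
We assume, from its description, that `prior_from_space` assigns every entry the same probability. The stand-alone sketch below reproduces that assumption without importing the package (`uniform_prior` is a hypothetical name; dropping duplicate entries is our own choice) and combines several such priors into the dictionary format that `sample(prior=...)` accepts above.

```python
def uniform_prior(space: list) -> dict:
    """Equal probability for each distinct entry (assumed prior_from_space behaviour)."""
    unique = list(dict.fromkeys(space))  # drop duplicates, keep order
    return {name: 1 / len(unique) for name in unique}


prior = {
    "structures": uniform_prior(["[0, 1, 1]", "[0, 1, 2]"]),
    "features": {"constants": .2, "variables": .8},
    "functions": uniform_prior(["sin", "cos", "exp"]),
    "operators": uniform_prior(["+", "-", "*", "/"]),
}
print(prior["operators"])  # {'+': 0.25, '-': 0.25, '*': 0.25, '/': 0.25}
```
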

3 changes: 3 additions & 0 deletions src/equation_tree/__init__.py
@@ -3,9 +3,12 @@
EquationTree,
instantiate_constants,
)
from equation_tree.analysis import get_frequencies, get_counts

__all__ = ["EquationTree",
"sample",
"burn",
"instantiate_constants",
"get_frequencies",
"get_counts"
]
