docs: add tutorial on obtaining information about distributions

AutoResearch · Nov 10, 2023 · cb669f1 · cb669f1
1 parent 58f5bd6
commit cb669f1
Showing 10 changed files with 390 additions and 11 deletions.
diff --git a/docs/index.md b/docs/index.md
@@ -5,7 +5,7 @@ The Equation Tree package is an equation toolbox with symbolic regression in min
 - [**Equation Sampling**](user-guide/equation-sampling.md)
 - Calculating [Distance Metrics](user-guide/distance-metrics.md) between equations
 
-It also encompasses a variety of [additional features](user-guide/additional-features.md). For example, to obtain information about existing equation list that can, in turn, be used in our sampling method.
+It also encompasses a variety of [additional features](user-guide/additional-features.md), including the capability to analyse distribution parameters for a given set of equations. For example, to obtain information about a specific field, one can use the [equation scraper](https://autoresearch.github.io/equation-scraper/tutorials/equation_scraper_tutorial/) to scrape equations from Wikipedia and then use the sampler to generate equations that resemble the equations of that scientific field.
 
 ## Relevant Publication
 

diff --git a/docs/tutorials/Analysing Equation Distribution.ipynb b/docs/tutorials/Analysing Equation Distribution.ipynb
@@ -0,0 +1,212 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Analysing Distributions\n",
+    "\n",
+    "The equation tree can be used to extract information from existing distributions of equations (for example, priors obtained by scraping: https://autoresearch.github.io/equation-scraper/tutorials/equation_scraper_tutorial/).\n",
+    "\n",
+    "## Installation"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "!pip install equation-tree"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Equation Database\n",
+    "\n",
+    "Here, we use a list of sympy expressions to demonstrate the functionality:"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "outputs": [],
+   "source": [
+    "# import functionality from sympy\n",
+    "from sympy import sympify\n",
+    "\n",
+    "eq_1 = sympify('x_1 + x_2')\n",
+    "eq_2 = sympify('exp(x_1) * 2.5')\n",
+    "eq_3 = sympify('sin(x_1) + 2 * cos(x_2)')\n",
+    "equation_list = [eq_1, eq_2, eq_3]"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Analyse the List\n",
+    "\n",
+    "We can obtain information about equations and lists of equations:"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "{'max_depth': {3: 0.3333333333333333,\n  4: 0.3333333333333333,\n  7: 0.3333333333333333},\n 'depth': {1: 0.3333333333333333,\n  2: 0.3333333333333333,\n  3: 0.3333333333333333},\n 'structures': {'[0, 1, 1]': 0.3333333333333333,\n  '[0, 1, 1, 2]': 0.3333333333333333,\n  '[0, 1, 2, 1, 2, 2, 3]': 0.3333333333333333},\n 'features': {'constants': 0.2857142857142857,\n  'variables': 0.7142857142857143},\n 'functions': {'exp': 0.3333333333333333,\n  'sin': 0.3333333333333333,\n  'cos': 0.3333333333333333},\n 'operators': {'+': 0.5, '*': 0.5},\n 'function_conditionals': {'exp': {'features': {'constants': 0.0,\n    'variables': 1.0},\n   'functions': {},\n   'operators': {}},\n  'sin': {'features': {'constants': 0.0, 'variables': 1.0},\n   'functions': {},\n   'operators': {}},\n  'cos': {'features': {'constants': 0.0, 'variables': 1.0},\n   'functions': {},\n   'operators': {}}},\n 'operator_conditionals': {'+': {'features': {'constants': 0.0,\n    'variables': 1.0},\n   'functions': {'sin': 1.0},\n   'operators': {'*': 1.0}},\n  '*': {'features': {'constants': 1.0, 'variables': 0.0},\n   'functions': {'exp': 0.5, 'cos': 0.5},\n   'operators': {}}}}"
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from equation_tree import get_frequencies\n",
+    "\n",
+    "# Compute the relative frequencies of structures, features, functions, and operators\n",
+    "frequencies = get_frequencies(equation_list)\n",
+    "frequencies"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Instead of frequencies, we can also obtain absolute values:"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "outputs": [
+    {
+     "data": {
+      "text/plain": "{'max_depth': {3: 1, 4: 1, 7: 1},\n 'depth': {1: 1, 2: 1, 3: 1},\n 'structures': {'[0, 1, 1]': 1, '[0, 1, 1, 2]': 1, '[0, 1, 2, 1, 2, 2, 3]': 1},\n 'features': {'constants': 2, 'variables': 5},\n 'functions': {'exp': 1, 'sin': 1, 'cos': 1},\n 'operators': {'+': 2, '*': 2},\n 'function_conditionals': {'exp': {'features': {'constants': 0,\n    'variables': 1},\n   'functions': {},\n   'operators': {}},\n  'sin': {'features': {'constants': 0, 'variables': 1},\n   'functions': {},\n   'operators': {}},\n  'cos': {'features': {'constants': 0, 'variables': 1},\n   'functions': {},\n   'operators': {}}},\n 'operator_conditionals': {'+': {'features': {'constants': 0, 'variables': 2},\n   'functions': {'sin': 1},\n   'operators': {'*': 1}},\n  '*': {'features': {'constants': 2, 'variables': 0},\n   'functions': {'exp': 1, 'cos': 1},\n   'operators': {}}}}"
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from equation_tree import get_counts\n",
+    "\n",
+    "counts = get_counts(equation_list)\n",
+    "counts"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "Note: We can directly use the obtained frequencies to sample new equations:"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/younesstrittmatter/Documents/GitHub/AutoRA/equation-tree/src/equation_tree/util/io.py:27: UserWarning: No hashed prior found. Sample frequencies may diverge from the prior. Consider burning this prior first.\n",
+      "  warnings.warn(\n",
+      "Processing: 100%|██████████| 10/10 [00:00<00:00, 90.79iteration/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[c_1*cos(x_1), 2*x_1, 2*x_1, 2*x_1, 2*x_1, x_1 + sin(x_1), c_1*cos(x_1), c_1*cos(x_1), x_1 + sin(x_1), x_1 + sin(x_1)]\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from equation_tree import sample\n",
+    "\n",
+    "# sample equations\n",
+    "equations = sample(10, frequencies)\n",
+    "print(equations)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'max_depth': {4: 0.6, 3: 0.4}, 'depth': {2: 0.6, 1: 0.4}, 'structures': {'[0, 1, 1, 2]': 0.6, '[0, 1, 1]': 0.4}, 'features': {'constants': 0.35, 'variables': 0.65}, 'functions': {'cos': 0.5, 'sin': 0.5}, 'operators': {'*': 0.7, '+': 0.3}, 'function_conditionals': {'cos': {'features': {'constants': 0.0, 'variables': 1.0}, 'functions': {}, 'operators': {}}, 'sin': {'features': {'constants': 0.0, 'variables': 1.0}, 'functions': {}, 'operators': {}}}, 'operator_conditionals': {'*': {'features': {'constants': 0.6363636363636364, 'variables': 0.36363636363636365}, 'functions': {'cos': 1.0}, 'operators': {}}, '+': {'features': {'constants': 0.0, 'variables': 1.0}, 'functions': {'sin': 1.0}, 'operators': {}}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "# check the frequencies\n",
+    "print(get_frequencies(equations))"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/docs/user-guide/additional-features.md b/docs/user-guide/additional-features.md
@@ -6,8 +6,10 @@ The package features an extensive list of additional features to make benchmarki
 - Export To SR Bench
 - ...
 
-## Feature Extraction
+## Analysing Equation Distributions
 
 Given an equation, our package can extract features like number of constants, and variables, and various equation complexity measurements (For example, number of nodes and tree depth.)
 
-For a list of equations, our package is capable to easily access frequencies for operators, functions, features, and structures. These frequencies can in turn be used to sample new equations that mimic the original list in these aspects.
+For a list of equations, our package can access frequencies for operators, functions, features, and structures. These frequencies can in turn be used to sample new equations that mimic the original list in these aspects.
+
+A possible application of this feature is analysing equations scraped from an external source such as Wikipedia. Another package developed by the AutoRA group can be used to obtain such a list of equations: [https://autoresearch.github.io/equation-scraper/tutorials/equation_scraper_tutorial/](https://autoresearch.github.io/equation-scraper/tutorials/equation_scraper_tutorial/)
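
Note on the counts and frequencies shown in the tutorial output: the reported frequencies appear to be the counts divided by their total within each category. The following is a minimal plain-Python sketch of that relationship, not the package's implementation. The helper `normalize` is hypothetical, and the input values are copied from the `get_counts` output in the notebook.

```python
def normalize(counts):
    """Turn a flat {key: count} mapping into relative frequencies.

    Hypothetical helper for illustration; not part of equation_tree.
    """
    total = sum(counts.values())
    if total == 0:
        # An empty category (e.g. 'functions': {}) stays empty.
        return {}
    return {key: value / total for key, value in counts.items()}


# Values taken from the get_counts output in the tutorial notebook.
feature_counts = {'constants': 2, 'variables': 5}
operator_counts = {'+': 2, '*': 2}

feature_frequencies = normalize(feature_counts)
operator_frequencies = normalize(operator_counts)

print(feature_frequencies)
print(operator_frequencies)
```

With these inputs, constants make up 2 of 7 leaves and each operator accounts for half of the operator nodes, which agrees with the `get_frequencies` output in the notebook.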
diff --git a/docs/user-guide/equation-formats.md b/docs/user-guide/equation-formats.md
@@ -9,6 +9,9 @@ While the underlying format of the Equation Tree is an incomplete binary tree, i
 ## Tree Representation
 coming soon ...
 
+### Tree Structure
+coming soon ...
+
 ## String Representation
 coming soon ...
 

diff --git a/docs/user-guide/equation-sampling.md b/docs/user-guide/equation-sampling.md
@@ -14,4 +14,132 @@ Various features of the underlying equation distribution can be customized. For
 
 In our sampling method, we distinct equation structure and equation content and sample both separately:
 - First, in the *(1) Structure Sampling* step, we sample the structure of the underlying tree. Here, complexity is adjusted, and we can use prior information about structures.
-- Second, in the *(2) Attribute Sampling* step, we sample the content of each tree node individually. Here, we can use prior information about the occurrence probabilities of specific operators and frequencies. This information can be conditioned on the parent nodes. For example, we can use prior information about the likelihood of + appearing in a sine function.
+- Second, in the *(2) Attribute Sampling* step, we sample the content of each tree node individually. Here, we can use prior information about the occurrence probabilities of specific operators and frequencies. This information can be conditioned on the parent nodes. For example, we can use prior information about the likelihood of + appearing in a sine function.
+
+## How To Use The Sampler
+
+To use our sampler, import the functionality and call the sample function: 
+```python
+from equation_tree import sample
+
+equations = sample()
+```
+This will return a list of sampled equations. You can customize the number of equations and the dimension of the input via the keyword arguments `n` and `max_num_variables`. For example, to sample 100 equations with a maximum of 3 input variables, write:
+```python
+equations = sample(n=100, max_num_variables=3)
+```
+The most versatile way to further customize the sampling is the use of a prior. You can pass this to the sampler as a dictionary with entries for a structures prior, features, functions and operators. Here, we give an example:
+```python
+prior = {
+    'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3]': .4},
+    'features': {'constants': .2, 'variables': .8},
+    'functions': {'sin': .5, 'cos': .5},
+    'operators': {'+': 1., '-': .0},
+}
+
+# To use the prior use the keyword argument `prior`
+equations = sample(prior=prior)
+```
+You can also include conditionals. These influence the likelihood of a specific attribute being sampled given its parent node. For example, they describe how likely a `-` is to occur inside a sine function:
+```python
+prior = {
+    'structures': {'[0, 1, 1]': .3, '[0, 1, 2]': .3, '[0, 1, 2, 3]': .4},
+    'features': {'constants': .2, 'variables': .8},
+    'functions': {'sin': .5, 'cos': .5},
+    'operators': {'+': 1., '-': .0},
+    'function_conditionals': {
+        'sin': {
+            'features': {'constants': 0., 'variables': 1.},
+            'functions': {'sin': 0., 'cos': 1.},
+            'operators': {'+': 0., '-': 1.}
+        },
+        'cos': {
+            'features': {'constants': 0., 'variables': 1.},
+            'functions': {'cos': 1., 'sin': 0.},
+            'operators': {'+': 0., '-': 1.}
+        }
+    },
+    'operator_conditionals': {
+        '+': {
+            'features': {'constants': .5, 'variables': .5},
+            'functions': {'sin': 1., 'cos': 0.},
+            'operators': {'+': 1., '-': 0.}
+        },
+        '-': {
+            'features': {'constants': .3, 'variables': .7},
+            'functions': {'cos': .5, 'sin': .5},
+            'operators': {'+': .9, '-': .1}
+        }
+    },
+}
+```
+
+### Possible Attributes
+Here, we present which attributes are supported natively. 
+
+*You can use custom attributes for operators and functions, but other functionality like distance metrics or the evaluation of equations might not work with custom attributes.*
+
+#### Structures
+Here, we use the structure notation described in the [format](equation-formats.md#tree-structure) section.
+
+The Equation Tree package provides convenience functions to obtain uniform structure priors from the tree depth or from the maximum number of nodes. To use them, pass the corresponding keyword argument to the sample function:
+```python
+# Sample equations with only a specified tree depth
+equations = sample(depth=...)
+
+# Sample equations up to a specified depth
+equations = sample(max_depth=...)
+```
+
+
+#### Features
+`constants`: the likelihood of a leaf being a constant. In the Equation Tree package, constants are represented as c followed by an index (`c_{}`). The sampler doesn't sample the same constant twice. 
+
+`variables`: the likelihood of a leaf being a variable. Variables are represented as an x followed by an index (`x_{}`). Variables are sampled with replacement.
+
+*Attention*: A function will never have a constant as its child, since a function applied to a constant can be simplified to a single constant.
+
+#### Functions
+Functions are mathematical operations with only one input value. Our package supports the following natively. Please use the exact notation.
+- sin
+- cos
+- tan
+- exp
+- log
+- sqrt
+- abs
+
+The following functions can be added, but are not in the default priors:
+- acos
+- arg
+- asin
+- sinh
+- cosh
+- tanh
+- cot
+
+*Additionally, you can use `squared` and `cubed` as keys, but this might not be fully supported in all functions of the equation sampler. For example, converting to sympy expressions might lead to unexpected results.*
+
+#### Operators
+Operators are mathematical operations with two input values. Our package supports the following natively.
+- \+
+- \-
+- \*
+- \/
+- \**
+- max
+- min
+
+#### Conditionals
+- In the function conditionals, each function can have its own prior consisting of a feature, function, and operator prior.
+- In the operator conditionals, each operator can have its own prior consisting of a feature, function, and operator prior.
+
+#### Convenience
+The Equation Tree package has a convenience function that transforms a space into a uniform prior:
+```python
+from equation_tree.prior import prior_from_space
+
+# For example if you only want to include primitive operators
+operator_prior = prior_from_space(["+", "-", "*", "/"])
+```
+
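
Note on uniform priors: the sketch below illustrates in plain Python what a uniform prior over a space looks like and how a single attribute could be drawn from it. It is an assumption-laden illustration, not the code behind `prior_from_space` or `sample`. The helpers `uniform_prior`, `is_normalized`, and `draw` are hypothetical names introduced only for this example.

```python
import math
import random


def uniform_prior(space):
    """Assign equal probability to every entry of the space.

    Hypothetical stand-in for illustration; not the library code.
    """
    return {key: 1 / len(space) for key in space}


def is_normalized(distribution, tol=1e-9):
    """Check that the probabilities of a flat prior sum to one."""
    return math.isclose(sum(distribution.values()), 1.0, abs_tol=tol)


def draw(distribution, rng):
    """Draw one key with probability proportional to its weight."""
    keys = list(distribution)
    weights = [distribution[key] for key in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


operator_prior = uniform_prior(["+", "-", "*", "/"])
rng = random.Random(0)  # fixed seed so repeated runs draw the same sequence
samples = [draw(operator_prior, rng) for _ in range(5)]

print(operator_prior)
print(is_normalized(operator_prior))
print(samples)
```

A check like `is_normalized` can be useful before passing a hand-written prior to the sampler, since each flat entry in the examples above (`features`, `functions`, `operators`) sums to one.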
diff --git a/src/equation_tree/__init__.py b/src/equation_tree/__init__.py
@@ -3,9 +3,12 @@
     EquationTree,
     instantiate_constants,
 )
+from equation_tree.analysis import get_frequencies, get_counts
 
 __all__ = ["EquationTree",
            "sample",
            "burn",
            "instantiate_constants",
+           "get_frequencies",
+           "get_counts"
            ]