Any interest in a multi-dataset backtesting wrapper? #508

mikelovesrobots · 2021-10-19T08:26:01Z

I wanted to build a strategy that works well across LOTS of cryptocurrencies, the idea being that it might be less overfit than my usual optimization runs. It turned out not to be that hard, so I'm wondering: if I cleaned this up and added tests, would you like me to open a PR for inclusion in master?

Library Code (you might want to skip ahead to the example)

from backtesting import Backtest
from tqdm.auto import tqdm as _tqdm
import pandas as pd

class KrakenDataset:
    """A named OHLCV dataframe plus the Backtest built for it."""

    def __init__(self, name, data):
        self.name = name
        self.data = data
        self.backtest = None  # set by KrakenBacktest


class KrakenBacktest:
    """Runs the same strategy against several datasets."""

    def __init__(self, datasets, strategy, **kwargs):
        # One Backtest per dataset; kwargs are passed straight through
        for dataset in datasets:
            dataset.backtest = Backtest(
                dataset.data,
                strategy=strategy,
                **kwargs
            )
        self.datasets = datasets

    def run(self):
        results = [dataset.backtest.run() for dataset in self.datasets]

        # One column of stats per dataset
        dataframe_results = pd.DataFrame(results).transpose()
        dataframe_results.columns = [dataset.name for dataset in self.datasets]

        return dataframe_results

    def optimize(self, **kwargs):
        optimize_args = {
            "return_heatmap": True,
            **kwargs
        }
        heatmaps = []

        for dataset in _tqdm(self.datasets, desc="KrakenBacktest.optimize"):
            _best_stats, heatmap = dataset.backtest.optimize(**optimize_args)
            heatmaps.append(heatmap)

        # One row per dataset, one column per parameter combination
        return pd.DataFrame(heatmaps)

Example

Let's define a simple strategy:

from backtesting import Backtest, Strategy
import pandas as pd
import ta

def SimpleSMA(values, n=12):
    """
    Return simple moving average of `values`, at
    each step taking into account `n` previous values.
    """
    return ta.trend.sma_indicator(values.s, n, True)

def SimpleSMH(values, n):
    """
    Return max of `values`,
    each step taking into account `n` previous values.
    """
    return pd.Series(values).rolling(n).max()

class BeatingPreviousHighs(Strategy):
    n_ma_window = 36
    n_previous_highs_window = 5

    def init(self):
        self.ma = self.I(SimpleSMA, self.data.Close, self.n_ma_window, overlay=True)
        self.previous_highs = self.I(SimpleSMH, self.data.Close, self.n_previous_highs_window, overlay=True)

    def next(self):
        if not self.position and self.ma > self.previous_highs:
            self.buy()
        elif self.position and self.ma <= self.previous_highs:
            self.position.close()

And let's fetch a whole lot of altcoin data. I kept the frames to a really short period just so it'd run fast, but in production I'd want to stretch these windows to cover as much data as I could possibly get.

# didn't include the source for fetch_data(), but it's fetching ohlcv pandas dataframes from my broker
ada_data = fetch_data('ADA-USDT', '5min', '1 Sept 2021', '5 Sept 2021')
xlm_data = fetch_data('XLM-USDT', '5min', '1 Sept 2021', '5 Sept 2021')
eth_data = fetch_data('ETH-USDT', '5min', '1 Sept 2021', '5 Sept 2021')
atom_data = fetch_data('ATOM-USDT', '5min', '1 Sept 2021', '5 Sept 2021')
matic_data = fetch_data('MATIC-USDT', '5min', '1 Sept 2021', '5 Sept 2021')
doge_data = fetch_data('DOGE-USDT', '5min', '1 Sept 2021', '5 Sept 2021')
shib_data = fetch_data('SHIB-USDT', '5min', '1 Sept 2021', '5 Sept 2021')

Let's define our multi-backtest:

datasets = [
    KrakenDataset('ADA-USDT', ada_data),
    KrakenDataset('XLM-USDT', xlm_data),
    KrakenDataset('ETH-USDT', eth_data),
    KrakenDataset('ATOM-USDT', atom_data),
    KrakenDataset('MATIC-USDT', matic_data),
    KrakenDataset('DOGE-USDT', doge_data),
    KrakenDataset('SHIB-USDT', shib_data),
]

kraken_backtest = KrakenBacktest(
    datasets, 
    strategy=BeatingPreviousHighs,
    cash=100000,
    commission=.001,
    exclusive_orders=True
)

kraken_backtest.run()

That spits out our familiar stats, only with one column per dataset, which is pretty cool.

Now let's optimize for the best n_ma_window and n_previous_highs_window params:

multi_heatmap = kraken_backtest.optimize(
  n_ma_window=range(3,41),
  n_previous_highs_window=range(3,13),
  maximize='Equity Final [$]',
)

from backtesting.lib import plot_heatmaps
plot_heatmaps(multi_heatmap.quantile(0.25), agg='mean')

Aside: you'll notice an interesting little bit in there, multi_heatmap.quantile(0.25). That's how I'm smashing the multiple heatmaps down into one. You could swap in all sorts of different reducers, like .mean() (average results), .min() (worst results) or .max() (best results). I found the 25th percentile usefully pessimistic: it gives a score that three quarters of the currencies I tested did better than.
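To make the aside concrete, here is a minimal sketch of how those reducers collapse several heatmaps into one. The dataset names (AAA, BBB, CCC) and all the scores are made up for illustration; only the shape (one Series per dataset, indexed by parameter combination) mirrors what the wrapper above collects.

```python
import pandas as pd

# Hypothetical 2x2 parameter grid, mimicking the heatmap index
params = pd.MultiIndex.from_product(
    [[30, 31], [4, 5]],
    names=["n_ma_window", "n_previous_highs_window"],
)

# Made-up scores for three made-up datasets (not real results)
heatmaps = [
    pd.Series([1.0, 2.0, 3.0, 4.0], index=params, name="AAA"),
    pd.Series([2.0, 4.0, 1.0, 8.0], index=params, name="BBB"),
    pd.Series([3.0, 6.0, 2.0, 0.0], index=params, name="CCC"),
]

# One row per dataset, one column per parameter combination
multi_heatmap = pd.DataFrame(heatmaps)

# Each reducer collapses the dataset axis, leaving one score per combination
pessimistic = multi_heatmap.quantile(0.25)
average = multi_heatmap.mean()
worst = multi_heatmap.min()
best = multi_heatmap.max()

print(pd.DataFrame({"q25": pessimistic, "mean": average, "min": worst, "max": best}))
```

Note how (31, 5) has the highest maximum but the lowest minimum in this toy data: a single bad dataset drags the pessimistic reducers down, which is the kind of fragility the quantile is meant to expose.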

Anyway, here's our 25th percentile graph.

Hovering around a little, it looks like 31, 5 (n_ma_window=31, n_previous_highs_window=5) is a good combo. Being reasonably pessimistic, I could hope for +1.6% or better returns using those parameters.
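Instead of hovering over the plot, the best combination can also be read straight off the combined heatmap. This is a rough sketch with made-up scores; in practice `combined` would be the Series returned by multi_heatmap.quantile(0.25).

```python
import pandas as pd

params = pd.MultiIndex.from_tuples(
    [(30, 4), (30, 5), (31, 4), (31, 5)],
    names=["n_ma_window", "n_previous_highs_window"],
)

# Made-up combined scores, one per parameter combination (not real results)
combined = pd.Series([10.0, 12.5, 9.0, 13.0], index=params)

best_params = combined.idxmax()  # index label of the highest score, a tuple here
best_score = combined.max()
top3 = combined.nlargest(3)      # a few runners-up, to see how flat the peak is

print(best_params, best_score)
print(top3)
```

Looking at the runners-up as well as the single winner is a cheap sanity check: if the neighbours score much worse, the peak may just be noise.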

Anyway, let me know if you'd like me to open a PR for it. We could call it MultiBacktest or something. It doesn't need to be quite as fanciful a name.


shaunpatterson · 2021-11-13T19:18:23Z

Definitely think this should be included

shaunpatterson · 2021-11-16T13:23:30Z

I solved this a slightly different way

class MultiBacktest(Backtest):
  datasets = []

  def __init__(self, datasets, strategy, **kwargs):
    for dataset in datasets:
      dataset.backtest = Backtest(
        dataset.data,
        strategy=strategy,
        **kwargs
      )
    self.datasets = datasets

  def run(self, *args, **kwargs):
    results = [dataset.backtest.run(*args, **kwargs) for dataset in self.datasets]
    aggregate = pd.DataFrame(results).mean()
    aggregate['_strategy'] = results[0]['_strategy']         # Save the strategy used for this round... mean() blows it away
    return aggregate

  def optimize(self, **kwargs):
    optimize_args = {
      "return_heatmap": True,
      **kwargs
    }
    return super().optimize(**optimize_args)

This averages the stats across all the datasets on every run, so optimize() picks the parameters with the best mean result.
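One thing to watch with the averaging step: the stats contain non-numeric fields (such as _strategy), and how mean() treats those may depend on the pandas version. A minimal sketch of averaging only the numeric fields, using made-up stats with just three fields (real backtesting.py stats have many more):

```python
import pandas as pd

# Made-up stats for two hypothetical datasets (not real results)
results = [
    pd.Series({"Return [%]": 2.0, "# Trades": 10, "_strategy": "MyStrategy"}),
    pd.Series({"Return [%]": 4.0, "# Trades": 20, "_strategy": "MyStrategy"}),
]

frame = pd.DataFrame(results)

# Coerce every column to numbers; anything non-numeric becomes NaN
numeric = frame.apply(pd.to_numeric, errors="coerce")

# Average per field, drop the all-NaN (non-numeric) fields, and switch to
# object dtype so a non-numeric value can be stored alongside the floats
aggregate = numeric.mean().dropna().astype(object)

# Restore the strategy field, as in the snippet above
aggregate["_strategy"] = results[0]["_strategy"]

print(aggregate)
```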

zha0yangchen · 2022-01-24T08:55:33Z

Definitely think this should be included

reisenmachtfreude · 2022-06-15T23:16:47Z

Thanks for sharing this. I had the same questions in my mind.
