The PandasData DataSource used by PolygonBacktesting keeps the Data entities for every symbol the strategy touches in memory for the whole run. For strategies that work with hundreds of stocks this leads to huge memory consumption: things slow down once virtual-memory swapping kicks in, and eventually the process is killed with an OOM error. For example, in a 4-year backtest each symbol uses about 10 MB on average (ranging from 1 MB to 50 MB) of in-memory storage, as reported by Data.df.memory_usage().sum(). When backtesting a strategy that trades a few thousand different stocks, the Python process reaches around 60 GB before the OOM, less than 10% of the way through the 4 years.
A quick fix would be to replace the pandas_data and _data_store in-memory dict caches with an LRU cache that evicts the least-recently-used Data entries once a size limit is reached (see the sketch below).
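For illustration, a minimal sketch of what that could look like, using an OrderedDict-based wrapper. LRUDataStore and the max_items default are hypothetical names of mine, not part of lumibot, and the capacity would need tuning against the real per-symbol sizes reported above:

```python
from collections import OrderedDict


class LRUDataStore(OrderedDict):
    """Hypothetical bounded dict that evicts the least-recently-used entry.

    Something like this could back pandas_data / _data_store instead of a
    plain dict, capping the number of symbols held in memory at once.
    """

    def __init__(self, max_items=200, *args, **kwargs):
        self.max_items = max_items
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        # Mark the entry as most recently used on every access.
        self.move_to_end(key)
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.move_to_end(key)
        # Evict the least-recently-used Data entry once over capacity.
        while len(self) > self.max_items:
            self.popitem(last=False)
```

Evicted symbols would simply be re-loaded from the on-disk Polygon cache the next time the strategy asks for them, so the trade-off is extra load time versus bounded memory.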
A question I have: why are there two, almost but not quite identical, dicts in the first place?