OOM when using PolygonBacktesting with many (100+) stocks #391

Closed
jimwhite opened this issue Mar 10, 2024 · 0 comments · Fixed by #392

Comments

@jimwhite
Collaborator

The PandasData DataSource used by PolygonBacktesting stores the Data entities for every symbol accessed by the strategy. For strategies that work with hundreds of stocks, this results in huge memory consumption, which first slows things down as virtual memory swapping kicks in and then ends in an OOM error. For example, a 4-year backtest uses an average of about 10 MB of in-memory storage per symbol (ranging from 1 MB to 50 MB), as reported by Data.df.memory_usage().sum(). When backtesting a strategy that trades a few thousand different stocks, the Python process grows to around 60 GB and hits OOM less than 10% of the way through the 4 years.
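For reference, a rough sketch of how that per-symbol figure can be totalled up, assuming the data source exposes its cache as a dict of Data objects (the pandas_data dict mentioned below; the helper name here is illustrative, not part of Lumibot):

```python
def total_data_memory_mb(pandas_data):
    """Sum the in-memory size of every cached Data.df, in megabytes.

    Assumes each value in `pandas_data` exposes its pandas DataFrame as `.df`,
    as observed in the measurements above.
    """
    total_bytes = sum(
        data.df.memory_usage().sum() for data in pandas_data.values()
    )
    return total_bytes / 1e6
```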

A quick fix is to replace the pandas_data and _data_store in-memory dict caches with an LRU cache.

A question I have is why there are two almost, but not quite, identical dicts in the first place.
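A minimal sketch of the LRU idea, assuming the store is a plain dict keyed by symbol; the class name and size limit are illustrative and not Lumibot's actual API:

```python
from collections import OrderedDict


class LRUDataStore(OrderedDict):
    """Dict that evicts the least-recently-used entry once it holds more
    than `maxsize` items. Illustrative sketch only, not Lumibot's API."""

    def __init__(self, maxsize=100, *args, **kwargs):
        self.maxsize = maxsize
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)          # mark as most recently used
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.move_to_end(key)
        if len(self) > self.maxsize:
            self.popitem(last=False)   # drop the least recently used entry
```

With something like this bounding both caches, data for symbols that fall out of the LRU window would simply be reloaded from Polygon (or the on-disk cache) the next time the strategy touches them, trading a little extra I/O for a hard cap on memory.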
