OOM when using PolygonBacktesting with many (100+) stocks #391

Closed
jimwhite opened this issue Mar 10, 2024 · 0 comments · Fixed by #392

Comments

@jimwhite
Collaborator

The PandasData DataSource used by PolygonBacktesting stores the Data entities for every symbol accessed by the strategy. For strategies that work with hundreds of stocks, this results in huge memory consumption, which first slows things down as virtual memory swapping kicks in and then ends in an OOM error. For example, a 4-year backtest uses an average of about 10 MB of in-memory storage per symbol (ranging from 1 MB to 50 MB), as reported by Data.df.memory_usage().sum(). When backtesting a strategy that trades a few thousand different stocks, the Python process grows to around 60 GB and hits OOM less than 10% of the way through the 4 years.
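For reference, a rough sketch of how that per-symbol figure can be totalled up, assuming the data source exposes its cache as a dict of Data objects (the pandas_data dict mentioned below; the helper name here is illustrative, not part of Lumibot):

```python
def total_data_memory_mb(pandas_data):
    """Sum the in-memory size of every cached Data.df, in megabytes.

    Assumes each value in `pandas_data` exposes its pandas DataFrame as `.df`,
    as observed in the measurements above.
    """
    total_bytes = sum(
        data.df.memory_usage().sum() for data in pandas_data.values()
    )
    return total_bytes / 1e6
```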

A quick fix is to replace the pandas_data and _data_store in-memory dict caches with an LRU cache.

A question I have is why there are two almost, but not quite, identical dicts in the first place.
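A minimal sketch of the LRU idea, assuming the store is a plain dict keyed by symbol; the class name and size limit are illustrative and not Lumibot's actual API:

```python
from collections import OrderedDict


class LRUDataStore(OrderedDict):
    """Dict that evicts the least-recently-used entry once it holds more
    than `maxsize` items. Illustrative sketch only, not Lumibot's API."""

    def __init__(self, maxsize=100, *args, **kwargs):
        self.maxsize = maxsize
        super().__init__(*args, **kwargs)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        self.move_to_end(key)          # mark as most recently used
        return value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.move_to_end(key)
        if len(self) > self.maxsize:
            self.popitem(last=False)   # drop the least recently used entry
```

With something like this bounding both caches, data for symbols that fall out of the LRU window would simply be reloaded from Polygon (or the on-disk cache) the next time the strategy touches them, trading a little extra I/O for a hard cap on memory.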
