Dataset Representation Refactor #70

Merged
merged 63 commits into from
Oct 10, 2023

sslivkoff (Member) commented Oct 1, 2023

This PR updates how cryo defines datasets and performs orchestration, significantly simplifying both.

Highlights

  1. Extraction logic is now entirely separate from transformation logic. Previously, each dataset definition had to specify large, ugly, deeply nested functions that performed both extraction and transformation. Now each dataset just defines simple extract and transform functions and makes no mention of tokio.
  2. Dataset-agnostic processes like orchestration, partitioning, and dataframe creation are also abstracted away from each dataset definition. These processes are nearly identical for each dataset, so they are now handled by dataset-agnostic functions and macros (store! and cryo_to_df::to_df).
  3. Instead of just partitioning datasets by blocks or by tx hashes, datasets can now be partitioned by any relevant dataset parameter including addresses, call_datas, or log topics. This is made possible and simple thanks to the changes in (1) and (2).
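To illustrate (3), here is a minimal sketch of partitioning along an arbitrary dimension. The names `Dim`, `Partition`, `split_each`, and `partition_by` are hypothetical and chosen for illustration only; they are not cryo's actual types, and the real implementation covers more dimensions and chunking strategies.

```rust
/// Dimensions a partition can be split along (hypothetical, not cryo's enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Dim {
    Block,
    Address,
    Topic,
}

/// A unit of work. Every field that is `Some` constrains the query.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Partition {
    pub blocks: Option<Vec<u64>>,
    pub addresses: Option<Vec<String>>,
    pub topics: Option<Vec<String>>,
}

/// Turn an optional list of values into one single-value list per element.
/// An unset dimension (`None`) yields exactly one unset entry.
fn split_each<T: Clone>(values: &Option<Vec<T>>) -> Vec<Option<Vec<T>>> {
    match values {
        Some(vs) => vs.iter().map(|v| Some(vec![v.clone()])).collect(),
        None => vec![None],
    }
}

impl Partition {
    /// Produce one partition per value along `dim`,
    /// leaving all other dimensions unchanged.
    pub fn partition_by(&self, dim: Dim) -> Vec<Partition> {
        match dim {
            Dim::Block => split_each(&self.blocks)
                .into_iter()
                .map(|blocks| Partition { blocks, ..self.clone() })
                .collect(),
            Dim::Address => split_each(&self.addresses)
                .into_iter()
                .map(|addresses| Partition { addresses, ..self.clone() })
                .collect(),
            Dim::Topic => split_each(&self.topics)
                .into_iter()
                .map(|topics| Partition { topics, ..self.clone() })
                .collect(),
        }
    }
}
```

In this sketch, splitting a partition that holds three addresses by `Dim::Address` gives three partitions that each keep the full block list, while splitting along a dimension that is not set returns the partition unchanged.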

This cleans up the ugliest part of the cryo codebase. The code is now:

  • significantly shorter and simpler
  • easier to read, maintain, and test
  • more robust
  • more extensible for new features and new contributors

Code guide

  • Each dataset now defines its extraction and transformation logic using the CollectByBlock and/or CollectByTransaction traits.
  • Each dataset simply defines these parameters.
  • A new crate cryo_to_df defines a procedural macro that adds a to_df() function to each ColumnData.
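As a rough picture of the extract/transform split, the sketch below uses a simplified, synchronous stand-in trait. `CollectSketch`, `collect`, `GasUsed`, and `GasColumns` are hypothetical names, and the `HashMap` stands in for an RPC source; the real `CollectByBlock` and `CollectByTransaction` traits are async and have different signatures.

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous stand-in for a per-dataset collection trait.
pub trait CollectSketch {
    /// Raw data fetched for one block.
    type Response;
    /// Column buffers that rows are appended to.
    type Columns: Default;

    /// Extraction: fetch raw data for one block. Knows nothing about columns.
    fn extract(
        block: u64,
        source: &HashMap<u64, Vec<u64>>,
    ) -> Result<Self::Response, String>;

    /// Transformation: append one response to the columns. Performs no I/O.
    fn transform(response: Self::Response, columns: &mut Self::Columns);
}

/// Dataset-agnostic driver: the same loop serves every dataset.
pub fn collect<D: CollectSketch>(
    blocks: &[u64],
    source: &HashMap<u64, Vec<u64>>,
) -> Result<D::Columns, String> {
    let mut columns: D::Columns = Default::default();
    for &block in blocks {
        let response = D::extract(block, source)?;
        D::transform(response, &mut columns);
    }
    Ok(columns)
}

/// Example column buffers for a toy dataset (illustrative only).
#[derive(Default, Debug, PartialEq)]
pub struct GasColumns {
    pub block_number: Vec<u64>,
    pub gas_used: Vec<u64>,
}

/// Toy dataset: gas used by each transaction in a block.
pub struct GasUsed;

impl CollectSketch for GasUsed {
    type Response = (u64, Vec<u64>);
    type Columns = GasColumns;

    fn extract(
        block: u64,
        source: &HashMap<u64, Vec<u64>>,
    ) -> Result<Self::Response, String> {
        source
            .get(&block)
            .cloned()
            .map(|gas| (block, gas))
            .ok_or_else(|| format!("block {} not found", block))
    }

    fn transform((block, gas): Self::Response, columns: &mut Self::Columns) {
        // One output row per transaction in the block.
        for g in gas {
            columns.block_number.push(block);
            columns.gas_used.push(g);
        }
    }
}
```

Calling `collect::<GasUsed>(&blocks, &source)` runs the shared loop for the toy dataset. Adding another dataset in this sketch means writing only a new `extract`/`transform` pair, which is the property the refactor is aiming for.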

sslivkoff marked this pull request as ready for review October 10, 2023 04:51
sslivkoff merged commit 2f6c829 into main Oct 10, 2023