ADR 038: State Listening #8012

Merged
merged 10 commits into from
Feb 5, 2021
auxiliary streaming/queue service
i-norden committed Feb 4, 2021
commit 94535a50e43b32986f2cb9205a1be599b860a45a
9 changes: 8 additions & 1 deletion docs/architecture/adr-038-state-listening.md
@@ -348,11 +348,18 @@ Writing to a file is the simplest approach for streaming the data out to consume
This approach also provides the advantages of being persistent and durable, and the files can be read directly,
or an auxiliary streaming service can read from the files and serve the data over a remote interface.

#### Auxiliary streaming service

We will create a separate standalone process that reads and internally queues the state as it is written out to these files
and serves the data over a gRPC API. This API will allow filtering of requested data, e.g. by block number, block/tx hash, ABCI message type,
i-norden marked this conversation as resolved.
Member

Is this for clients or caching to a database? Feels like the cache process should just read the files and talk to the DB and skip grpc. Or we should use an actual message queue instead of files + grpc.

Contributor Author

I think in the case of downstream IPLD-ization in Postgres we would just consume the files directly, since we know we want to consume everything as quickly as possible and will run that process in the same environment with direct access to the files. But if we want to be more selective about what we consume, and consume it remotely, then that's where this auxiliary streaming service comes into play, as I see it. @robert-zaremba may be able to back me up here.

Contributor Author

To actually answer your question: this would be useful primarily for clients but also potentially database caching in those cases where the caching wants to leverage this more selective interface and/or perform the caching remotely.

Collaborator

Yes, it's for clients, e.g. data warehousing / ETL. We were discussing queues but didn't make a decision. Such a service will only need to expose a network port, without exposing the filesystem.

whether a DeliverTx message failed or succeeded, etc.
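
The filtering described for the auxiliary streaming service could look roughly like the sketch below. It only illustrates the matching logic, not the gRPC API itself, and every name in it (`StateChange`, `Filter`, `Matches`, `Select`) is a hypothetical stand-in rather than anything defined by this ADR.

```go
package main

import "fmt"

// StateChange is a hypothetical, simplified record of one queued state write.
// The field names are illustrative and are not the ADR's actual schema.
type StateChange struct {
	BlockNumber int64
	TxHash      string // empty for BeginBlock/EndBlock records
	MsgType     string // e.g. "BeginBlock", "DeliverTx", "EndBlock"
	Succeeded   bool   // only meaningful for DeliverTx records
}

// Filter is a hypothetical set of request criteria.
// A zero-valued (or nil) field means "do not filter on this".
type Filter struct {
	FromBlock int64 // inclusive lower bound
	ToBlock   int64 // inclusive upper bound; 0 means no upper bound
	TxHash    string
	MsgType   string
	Succeeded *bool // nil means either outcome
}

// Matches reports whether a record satisfies every criterion that is set.
func (f Filter) Matches(c StateChange) bool {
	if c.BlockNumber < f.FromBlock {
		return false
	}
	if f.ToBlock != 0 && c.BlockNumber > f.ToBlock {
		return false
	}
	if f.TxHash != "" && c.TxHash != f.TxHash {
		return false
	}
	if f.MsgType != "" && c.MsgType != f.MsgType {
		return false
	}
	// The success criterion only applies to DeliverTx records.
	if f.Succeeded != nil && (c.MsgType != "DeliverTx" || c.Succeeded != *f.Succeeded) {
		return false
	}
	return true
}

// Select returns the records that match the filter, preserving their order.
func Select(changes []StateChange, f Filter) []StateChange {
	var out []StateChange
	for _, c := range changes {
		if f.Matches(c) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	failed := false
	changes := []StateChange{
		{BlockNumber: 10, MsgType: "BeginBlock"},
		{BlockNumber: 10, TxHash: "AA", MsgType: "DeliverTx", Succeeded: true},
		{BlockNumber: 11, TxHash: "BB", MsgType: "DeliverTx", Succeeded: false},
	}
	// Ask only for DeliverTx messages that failed.
	for _, c := range Select(changes, Filter{MsgType: "DeliverTx", Succeeded: &failed}) {
		fmt.Printf("block %d tx %s\n", c.BlockNumber, c.TxHash)
	}
}
```

A real service would presumably apply such a predicate to its internal queue before streaming results to a client, so that consumers only receive the records they asked for.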

#### File pruning

-Without pruning the number of files can grow indefinitely, this will need to be managed by
+Without pruning the number of files can grow indefinitely, this may need to be managed by
the developer in an application or even module-specific manner (e.g. log rotation).
i-norden marked this conversation as resolved.
The file naming schema facilitates pruning by block number and/or ABCI message.
The gRPC auxiliary streaming service introduced above will include an option to remove the files as it consumes their data.

### Configuration
