gateway: separate metrics for Signed IPNS from DNSLink #409

Open
lidel opened this issue Jul 11, 2023 · 1 comment
Labels
  • dif/expert: Extensive knowledge (implications, ramifications) required
  • effort/hours: Estimated to take one or several hours
  • kind/enhancement: A net-new feature or improvement to an existing feature
  • P2: Medium: Good to have, but can wait until someone steps up
  • topic/gateway: Issues related to HTTP Gateway

Comments


lidel commented Jul 11, 2023

Extracted from ipfs/kubo#9927 (comment)

Currently, boxo/gateway/metrics.go exposes only basic request counts and durations, split by namespace (gateway=ipfs and gateway=ipns):

# HELP ipfs_http_gw_get_duration_seconds The time to GET a successful response to a request (all content types).
# TYPE ipfs_http_gw_get_duration_seconds histogram
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="0.05"} 8
[..]
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="1920"} 11
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="+Inf"} 11
ipfs_http_gw_get_duration_seconds_sum{gateway="ipfs"} 1.185360469

Problem

  • /ipns supports both DNSLink and Signed IPNS records, but we have no visibility into what percentage of requests each accounts for
  • we measure successful requests only, so we have no visibility into the percentage of IPNS record failures vs DNSLink failures
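
To make the distinction concrete, here is a minimal sketch of how an /ipns/ request could be classified before a metric is recorded. The helper `ipnsNameKind` is hypothetical and not part of boxo. It assumes a simple heuristic: DNS names contain a dot, while base-encoded IPNS keys do not. A real implementation would more likely try to parse the name as an IPNS key first. The key in the example is a placeholder, not a real record.

```go
package main

import (
	"fmt"
	"strings"
)

// ipnsNameKind is a hypothetical helper (not a boxo API). It guesses which
// kind of name an /ipns/{name}/... content path refers to.
// Assumption: DNSLink names contain at least one dot, signed IPNS names
// (base-encoded keys) never do.
func ipnsNameKind(contentPath string) string {
	if !strings.HasPrefix(contentPath, "/ipns/") {
		return "unknown"
	}
	rest := strings.TrimPrefix(contentPath, "/ipns/")
	// Keep only the first path segment, which is the name itself.
	name, _, _ := strings.Cut(rest, "/")
	if name == "" {
		return "unknown"
	}
	if strings.Contains(name, ".") {
		return "dnslink"
	}
	return "signed_ipns"
}

func main() {
	paths := []string{
		"/ipns/en.wikipedia-on-ipfs.org/wiki/",
		"/ipns/k51qzi5uqu5dexamplekey", // placeholder key, for illustration only
		"/ipfs/bafyexample",
	}
	for _, p := range paths {
		fmt.Printf("%s -> %s\n", p, ipnsNameKind(p))
	}
}
```

The returned strings match the metric names proposed under Requirements, so they could be used directly as a label value.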

Solution

Requirements

TBD. Initial requirements:

  • we need a dedicated metric for each type of /ipns/ request:
    • signed_ipns
    • dnslink
  • we need to be able to tell:
    • how many requests were sent by clients
    • how many requests succeeded vs errored
    • how long successes and errors take (could be precomputed P50/P95)
  • we need to make sure this is visible in Thunderdome testing, so we can catch regressions during the release phase

Open questions

  • do we have separate metrics for success and failure, or a single metric with a success/error attribute?
  • do we use a histogram with predefined duration buckets and an implicit counter (like ipfs_http_gw_get_duration_seconds)?
  • or, instead of picking arbitrary duration buckets (as in the legacy metrics), should we have P50, P75, P95, P99 Objectives, like we do here?

BigLep commented Aug 6, 2023

I added a requirement for Thunderdome testing visibility so we can catch regressions more easily.
This was an action item from the 0.22 retro: https://www.notion.so/pl-strflt/Kubo-0-22-Retro-d9800a96661b44a3ba5fa046926323cb?pvs=4#ad81265cf9ae4082805c8f566d54e243
