From 1125eb4ded59102436d7a160cc40df4431a6376d Mon Sep 17 00:00:00 2001
From: Chris Chinchilla
Date: Fri, 22 Jan 2021 12:07:03 +0100
Subject: [PATCH] [DOCS] Update to cluster docs (#3084)

* Some cluster section overhaul

Signed-off-by: ChrisChinchilla

* Add minikube note

Signed-off-by: ChrisChinchilla

* Add query config

Signed-off-by: ChrisChinchilla

* Finalise changes

Signed-off-by: ChrisChinchilla

* Fix broken links

Signed-off-by: ChrisChinchilla
---
 site/content/cluster/binaries_cluster.md      |  56 +---
 site/content/cluster/kubernetes_cluster.md    | 288 +++++++++++++++++-
 site/content/includes/cluster-architecture.md |  31 ++
 site/content/includes/cluster-common-steps.md |  29 ++
 .../includes/m3query/annotated_config.yaml    | 246 +++++++++++++++
 site/content/operator/api.md                  |  39 +--
 6 files changed, 606 insertions(+), 83 deletions(-)
 create mode 100644 site/content/includes/cluster-architecture.md
 create mode 100644 site/content/includes/m3query/annotated_config.yaml

diff --git a/site/content/cluster/binaries_cluster.md b/site/content/cluster/binaries_cluster.md
index 5c2ac410b1..961a793c1c 100644
--- a/site/content/cluster/binaries_cluster.md
+++ b/site/content/cluster/binaries_cluster.md
@@ -10,30 +10,7 @@ This guide shows you the steps involved in creating an M3 cluster using M3 binar
 This guide assumes you have read the [quickstart](/docs/quickstart/binaries), and builds upon the concepts in that guide.
 {{% /notice %}}
 
-## M3 Architecture
-
-Here's a typical M3 deployment:
-
-
-
-![Typical Deployment](/cluster_architecture.png)
-
-An M3 deployment typically has two main node types:
-
-- **Coordinator node**: `m3coordinator` nodes coordinate reads and writes across all nodes in the cluster. It's a lightweight process, and does not store any data. This role typically runs alongside a Prometheus instance, or is part of a collector agent such as statsD.
-- **Storage node**: The `m3dbnode` processes are the workhorses of M3, they store data and serve reads and writes.
-
-A `m3coordinator` node exposes two ports:
-
-- `7201` to manage the cluster topology, you make most API calls to this endpoint
-- `7203` for Prometheus to scrape the metrics produced by M3DB and M3Coordinator
-
-## Prerequisites
-
-M3 uses [etcd](https://etcd.io/) as a distributed key-value storage for the following functions:
-
-- Update cluster configuration in realtime
-- Manage placements for distributed and sharded clusters
+{{< fileinclude file="cluster-architecture.md" >}}
 
 ## Download and Install a Binary
 
@@ -52,8 +29,6 @@ You can download the latest release as [pre-compiled binaries from the M3 GitHub
 
 ## Provision a Host
 
-Enough background, let's create a real cluster!
-
 M3 in production can run on local or cloud-based VMs, or bare-metal servers. M3 supports all popular Linux distributions (Ubuntu, RHEL, CentOS), and [let us know](https://github.com/m3db/m3/issues/new/choose) if you have any issues with your preferred distribution.
 
 ### Network
@@ -236,35 +211,6 @@ curl -X POST {{% apiendpoint %}}database/create -d '{
 
 If you need to setup multiple namespaces, you can run the command above multiple times with different namespace configurations.
 
-### Ready a Namespace
-
-Once a namespace has finished bootstrapping, you must mark it as ready before receiving traffic by using the _{{% apiendpoint %}}namespace/ready_.
-
-{{< tabs name="ready_namespaces" >}}
-{{% tab name="Command" %}}
-
-{{< codeinclude file="docs/includes/quickstart/ready-namespace.sh" language="shell" >}}
-
-{{% /tab %}}
-{{% tab name="Output" %}}
-
-```json
-{
-  "ready": true
-}
-```
-
-{{% /tab %}}
-{{< /tabs >}}
-
-### Replication factor
-
-We recommend a replication factor of **3**, with each replica spread across failure domains such as a physical server rack, data center or availability zone. Read our [replication factor recommendations](/docs/operational_guide/replication_and_deployment_in_zones) for more details.
-
-### Shards
-
-Read the [placement configuration guide](/docs/operational_guide/placement_configuration) to determine the appropriate number of shards to specify.
-
 {{< fileinclude file="cluster-common-steps.md" >}}
+
+
+## Organizing Data with Placements and Namespaces
+
+A time series database (TSDB) typically consists of one node (or instance) that stores metrics data. This setup is simple to use, but becomes difficult to scale as the quantity of metrics data written and read increases.
+
+As a distributed TSDB, M3 helps solve this problem by spreading metrics data, and demand for that data, across multiple nodes in a cluster. M3 does this by splitting data into shards and distributing those shards across the nodes in the cluster.
+
+
+
+If you've worked with a distributed database before, then these concepts are probably familiar to you, but M3 uses different terminology for some of them.
+
+- Every cluster has **one** placement that maps shards to nodes in the cluster.
+- A cluster can have **0 or more** namespaces that are similar conceptually to tables in other databases, and each node serves every namespace for the shards it owns.
+
+
+
+For example, if the cluster placement states that node A owns shards 1, 2, and 3, then node A owns shards 1, 2, and 3 for all configured namespaces in the cluster. Each namespace has its own configuration options, including a name and retention time for the data.
+
+## Create a Placement and Namespace
+
+This guide uses the _{{% apiendpoint %}}database/create_ endpoint, which creates a namespace, and also creates the placement if it doesn't already exist, based on the `type` argument.
+
+You can create [placements](/docs/operational_guide/placement_configuration/) and [namespaces](/docs/operational_guide/namespace_configuration/#advanced-hard-way) separately if you need more control over their settings.
+
+In another terminal, use the following command.
+
+{{< tabs name="create_placement_namespace" >}}
+{{< tab name="Command" >}}
+
+{{< codeinclude file="docs/includes/create-database.sh" language="shell" >}}
+
+{{< /tab >}}
+{{% tab name="Output" %}}
+
+```json
+{
+  "namespace": {
+    "registry": {
+      "namespaces": {
+        "default": {
+          "bootstrapEnabled": true,
+          "flushEnabled": true,
+          "writesToCommitLog": true,
+          "cleanupEnabled": true,
+          "repairEnabled": false,
+          "retentionOptions": {
+            "retentionPeriodNanos": "43200000000000",
+            "blockSizeNanos": "1800000000000",
+            "bufferFutureNanos": "120000000000",
+            "bufferPastNanos": "600000000000",
+            "blockDataExpiry": true,
+            "blockDataExpiryAfterNotAccessPeriodNanos": "300000000000",
+            "futureRetentionPeriodNanos": "0"
+          },
+          "snapshotEnabled": true,
+          "indexOptions": {
+            "enabled": true,
+            "blockSizeNanos": "1800000000000"
+          },
+          "schemaOptions": null,
+          "coldWritesEnabled": false,
+          "runtimeOptions": null
+        }
+      }
+    }
+  },
+  "placement": {
+    "placement": {
+      "instances": {
+        "m3db_local": {
+          "id": "m3db_local",
+          "isolationGroup": "local",
+          "zone": "embedded",
+          "weight": 1,
+          "endpoint": "127.0.0.1:9000",
+          "shards": [
+            {
+              "id": 0,
+              "state": "INITIALIZING",
+              "sourceId": "",
+              "cutoverNanos": "0",
+              "cutoffNanos": "0"
+            },
+            …
+            {
+              "id": 63,
+              "state": "INITIALIZING",
+              "sourceId": "",
+              "cutoverNanos": "0",
+              "cutoffNanos": "0"
+            }
+          ],
+          "shardSetId": 0,
+          "hostname": "localhost",
+          "port": 9000,
+          "metadata": {
+            "debugPort": 0
+          }
+        }
+      },
+      "replicaFactor": 1,
+      "numShards": 64,
+      "isSharded": true,
+      "cutoverTime": "0",
+      "isMirrored": false,
+      "maxShardSetId": 0
+    },
+    "version": 0
+  }
+}
+```
+
+{{% /tab %}}
+{{< /tabs >}}
+
+Placement initialization can take a minute or two. Once all the shards have the `AVAILABLE` state, the node has finished bootstrapping, and you should see the following messages in the node console output.
+
+
+
+```shell
+{"level":"info","ts":1598367624.0117292,"msg":"bootstrap marking all shards as bootstrapped","namespace":"default","namespace":"default","numShards":64}
+{"level":"info","ts":1598367624.0301404,"msg":"bootstrap index with bootstrapped index segments","namespace":"default","numIndexBlocks":0}
+{"level":"info","ts":1598367624.0301914,"msg":"bootstrap success","numShards":64,"bootstrapDuration":0.049208827}
+{"level":"info","ts":1598367624.03023,"msg":"bootstrapped"}
+```
+
+You can check on the status by calling the _{{% apiendpoint %}}services/m3db/placement_ endpoint:
+
+{{< tabs name="check_placement" >}}
+{{% tab name="Command" %}}
+
+```shell
+curl {{% apiendpoint %}}services/m3db/placement | jq .
+```
+
+{{% /tab %}}
+{{% tab name="Output" %}}
+
+```json
+{
+  "placement": {
+    "instances": {
+      "m3db_local": {
+        "id": "m3db_local",
+        "isolationGroup": "local",
+        "zone": "embedded",
+        "weight": 1,
+        "endpoint": "127.0.0.1:9000",
+        "shards": [
+          {
+            "id": 0,
+            "state": "AVAILABLE",
+            "sourceId": "",
+            "cutoverNanos": "0",
+            "cutoffNanos": "0"
+          },
+          …
+          {
+            "id": 63,
+            "state": "AVAILABLE",
+            "sourceId": "",
+            "cutoverNanos": "0",
+            "cutoffNanos": "0"
+          }
+        ],
+        "shardSetId": 0,
+        "hostname": "localhost",
+        "port": 9000,
+        "metadata": {
+          "debugPort": 0
+        }
+      }
+    },
+    "replicaFactor": 1,
+    "numShards": 64,
+    "isSharded": true,
+    "cutoverTime": "0",
+    "isMirrored": false,
+    "maxShardSetId": 0
+  },
+  "version": 2
+}
+```
+
+{{% /tab %}}
+{{< /tabs >}}
+
+{{% notice tip %}}
+[Read more about the bootstrapping process](/docs/operational_guide/bootstrapping_crash_recovery/).
+{{% /notice %}}
+
+### Ready a Namespace
+
+Once a namespace has finished bootstrapping, you must mark it as ready before it can receive traffic by using the _{{% apiendpoint %}}services/m3db/namespace/ready_ endpoint.
+
+{{< tabs name="ready_namespaces" >}}
+{{% tab name="Command" %}}
+
+{{% codeinclude file="docs/includes/quickstart/ready-namespace.sh" language="shell" %}}
+
+{{% /tab %}}
+{{% tab name="Output" %}}
+
+```json
+{
+  "ready": true
+}
+```
+
+{{% /tab %}}
+{{< /tabs >}}
+
+### View Details of a Namespace
+
+You can also view the attributes of all namespaces by calling the _{{% apiendpoint %}}services/m3db/namespace_ endpoint.
+
+{{< tabs name="check_namespaces" >}}
+{{% tab name="Command" %}}
+
+```shell
+curl {{% apiendpoint %}}services/m3db/namespace | jq .
+```
+
+{{% notice tip %}}
+Add `?debug=1` to the request to convert nano units in the output into standard units.
+{{% /notice %}}
+
+{{% /tab %}}
+{{% tab name="Output" %}}
+
+```json
+{
+  "registry": {
+    "namespaces": {
+      "default": {
+        "bootstrapEnabled": true,
+        "flushEnabled": true,
+        "writesToCommitLog": true,
+        "cleanupEnabled": true,
+        "repairEnabled": false,
+        "retentionOptions": {
+          "retentionPeriodNanos": "43200000000000",
+          "blockSizeNanos": "1800000000000",
+          "bufferFutureNanos": "120000000000",
+          "bufferPastNanos": "600000000000",
+          "blockDataExpiry": true,
+          "blockDataExpiryAfterNotAccessPeriodNanos": "300000000000",
+          "futureRetentionPeriodNanos": "0"
+        },
+        "snapshotEnabled": true,
+        "indexOptions": {
+          "enabled": true,
+          "blockSizeNanos": "1800000000000"
+        },
+        "schemaOptions": null,
+        "coldWritesEnabled": false,
+        "runtimeOptions": null
+      }
+    }
+  }
+}
+```
+
+{{% /tab %}}
+{{< /tabs >}}
+
 {{< fileinclude file="cluster-common-steps.md" >}}
\ No newline at end of file
diff --git a/site/content/includes/cluster-architecture.md b/site/content/includes/cluster-architecture.md
new file mode 100644
index 0000000000..d715c2743d
--- /dev/null
+++ b/site/content/includes/cluster-architecture.md
@@ -0,0 +1,31 @@
+## M3 Architecture
+
+
+
+![Typical Deployment](/cluster_architecture.png)
+
+### Node types
+
+An M3 deployment typically has two main node types:
+
+- **[Storage nodes](/docs/m3db)** (`m3dbnode`) are the workhorses of M3: they store data and serve reads and writes.
+- **[Coordinator nodes](/docs/m3coordinator)** (`m3coordinator`) coordinate reads and writes across all nodes in the cluster. They are lightweight processes that don't store any data, and typically run alongside a Prometheus instance or as part of a collector agent such as StatsD.
+
+An `m3coordinator` node exposes two external ports (see the example below the node type lists):
+
+- `7201` to manage the cluster topology; you make most API calls to this endpoint
+- `7203` for Prometheus to scrape the metrics produced by M3DB and M3Coordinator
+
+There are two other less-commonly used node types:
+
+- **[Query nodes](/docs/m3query)** (`m3query`) are an alternative query option to using M3's built-in PromQL support.
+- **[Aggregator nodes](/docs/how_to/aggregator)** cluster and aggregate metrics before storing them in storage nodes. Coordinator nodes can also perform this role but are not cluster-aware.
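+
+A minimal sketch of checking both ports, assuming a coordinator running locally on the default ports:
+
+```shell
+# Cluster topology and management API (port 7201)
+curl http://localhost:7201/api/v1/services/m3db/placement
+
+# Prometheus-format metrics produced by the coordinator (port 7203)
+curl http://localhost:7203/metrics
+```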
+
+
+
+## Prerequisites
+
+M3 uses [etcd](https://etcd.io/) as a distributed key-value store for the following functions:
+
+- Update cluster configuration in real time
+- Manage placements for distributed and sharded clusters
\ No newline at end of file
diff --git a/site/content/includes/cluster-common-steps.md b/site/content/includes/cluster-common-steps.md
index 08e8c731d9..53b07dd82d 100644
--- a/site/content/includes/cluster-common-steps.md
+++ b/site/content/includes/cluster-common-steps.md
@@ -1,3 +1,32 @@
+### Ready a Namespace
+
+Once a namespace has finished bootstrapping, you must mark it as ready before it can receive traffic by using the _{{% apiendpoint %}}namespace/ready_ endpoint.
+
+{{< tabs name="ready_namespaces" >}}
+{{% tab name="Command" %}}
+
+{{< codeinclude file="docs/includes/quickstart/ready-namespace.sh" language="shell" >}}
+
+{{% /tab %}}
+{{% tab name="Output" %}}
+
+```json
+{
+  "ready": true
+}
+```
+
+{{% /tab %}}
+{{< /tabs >}}
+
+### Replication factor
+
+We recommend a replication factor of **3**, with each replica spread across failure domains such as a physical server rack, data center or availability zone. Read our [replication factor recommendations](/docs/operational_guide/replication_and_deployment_in_zones) for more details.
+
+### Shards
+
+Read the [placement configuration guide](/docs/operational_guide/placement_configuration) to determine the appropriate number of shards to specify.
+
 ## Writing and Querying Metrics
 
 ### Writing Metrics
diff --git a/site/content/includes/m3query/annotated_config.yaml b/site/content/includes/m3query/annotated_config.yaml
new file mode 100644
index 0000000000..01c9c027fc
--- /dev/null
+++ b/site/content/includes/m3query/annotated_config.yaml
@@ -0,0 +1,246 @@
+# The server listen address
+listenAddress:
+
+# Metrics configuration
+# TODO: Which is what?
+metrics:
+  # Scope of metrics root
+  # TODO: Again, which is?
+  scope:
+    # Prefix prepended to metrics collected
+    prefix:
+    # Reporting frequency of metrics collected
+    reportingInterval:
+    # Tags shared by metrics collected
+    tags:
+  # Configuration for a Prometheus reporter (if used)
+  prometheus:
+    # Metrics collection endpoint for application
+    # Default = "/metrics"
+    handlerPath:
+    # Listen address for metrics
+    # Default = "0.0.0.0:7203"
+    listenAddress:
+  # Metric sanitization type, valid options: [none, m3, prometheus]
+  # Default = "none"
+  sanitization:
+  # Sampling rate for metrics. min=0.0, max=1.0
+  # TODO: What does this mean exactly?
+  samplingRate:
+  # Enable Go runtime metrics, valid options: [none, simple, moderate, detailed]
+  # See https://github.com/m3db/m3/blob/master/src/x/instrument/extended.go#L39:L64 for more details
+  extended:
+
+# Logging configuration
+# TODO: More detail than this
+# https://github.com/m3db/m3/blob/9f129cf9f16430cc5a399f60aa5684fb72b55bb5/src/cmd/services/m3query/config/config.go#L116
+logging:
+  level: info
+
+# Enables tracing. If nothing is configured, tracing is disabled.
+tracing:
+  # Name for tracing service
+  serviceName:
+  # Tracing backend to use, valid options: [jaeger, lightstep]
+  backend:
+  # If using Jaeger, options to send to tracing backend
+  jaeger:
+  # If using Lightstep, options to send to tracing backend
+  lightstep:
+
+clusters:
+  - namespaces:
+      - namespace: default
+        type: unaggregated
+        retention: 48h
+    client:
+      config:
+        service:
+          # TODO: ?
+          env: default_env
+          # Availability zone, valid options: [user-defined, embedded]
+          zone:
+          # TODO: ??
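+          # The service name under which this cluster's metadata, such as the
+          # placement, is keyed in etcd; `m3db` is the usual default.
+          # (Assumption: inferred from M3's etcd key layout rather than stated
+          # in the original annotation.)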
+          service: m3db
+          # Directory to store cached etcd data
+          cacheDir:
+          # Identify the etcd hosts this node should connect to
+          etcdClusters:
+            # TODO: Confusing, if you use embedded, why do you still need endpoints?
+            # TODO: Embedded vs seed nodes embedded??
+            # Availability zone, valid options: [user-defined, embedded]
+            - zone:
+              # Member nodes of the etcd cluster, in the form url:port
+              endpoints:
+                -
+      seedNodes:
+        initialCluster:
+          - hostID: m3db_local
+            endpoint: http://127.0.0.1:2380
+      # The consistency level for writing to a cluster, valid options: [none, one, majority, all]
+      writeConsistencyLevel:
+      # The consistency level for reading from a cluster, valid options: [none, one, unstrict_majority, majority, unstrict_all, all]
+      readConsistencyLevel:
+      # The timeout for writing data
+      # TODO: Defaults?
+      writeTimeout:
+      # The fetch timeout for any given query
+      # Range = 30s to 5m
+      fetchTimeout:
+      # The cluster connect timeout
+      connectTimeout:
+      # Configuration for retrying write operations
+      writeRetry:
+        initialBackoff:
+        # Factor for exponential backoff
+        backoffFactor:
+        # Maximum backoff time
+        maxBackoff:
+        # Maximum retry attempts
+        maxRetries:
+        # Add randomness to wait intervals
+        jitter:
+      # Configuration for retrying fetch operations
+      # TODO: Query?
+      fetchRetry:
+        initialBackoff:
+        # Factor for exponential backoff
+        backoffFactor:
+        # Maximum backoff time
+        maxBackoff:
+        # Maximum retry attempts
+        maxRetries:
+        # Add randomness to wait intervals
+        jitter:
+      # The number of times a background check can fail before a connection is taken out of consideration
+      backgroundHealthCheckFailLimit:
+      # The factor of the host connect time to sleep between a failed health check and the next check
+      backgroundHealthCheckFailThrottleFactor:
+
+# TODO:
+local:
+
+# Configuration for the placement, namespaces and database management endpoints.
+clusterManagement:
+  # etcd client configuration
+  etcd:
+    # TODO: ?
+    env: default_env
+    # Availability zone, valid options: [user-defined, embedded]
+    zone:
+    # TODO: ??
+    service: m3db
+    # Directory to store cached etcd data
+    cacheDir:
+    # Identify the etcd hosts this node should connect to
+    etcdClusters:
+    m3sd:
+    # The revision that watch requests start from
+    watchWithRevision:
+    newDirectoryNode:
+    retry:
+    # The timeout for etcd requests
+    requestTimeout:
+    # The timeout for a watchChan initialization
+    watchChanInitTimeout:
+    # Frequency to check if a watch chan is no longer subscribed and should be closed
+    watchChanCheckInterval:
+    # The delay before resetting the etcd watch chan
+    watchChanResetInterval:
+
+# TODO:
+filter:
+
+# TODO:
+rpc:
+
+# TODO:
+backend:
+
+# The worker pool policy for read requests
+readWorkerPoolPolicy:
+  # Worker pool automatically grows to capacity
+  grow:
+  # Static pool size, or initial size for dynamically growing pools
+  size:
+
+# The worker pool policy for write requests
+writeWorkerPoolPolicy:
+  # Worker pool automatically grows to capacity
+  grow:
+  # Static pool size, or initial size for dynamically growing pools
+  size:
+
+# TODO:
+writeForwarding:
+
+# TODO:
+downsample:
+
+# TODO:
+ingest:
+
+# Configuration for the carbon server
+# TODO: Which is?
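+# (Carbon here means Graphite's Carbon protocol: this block configures M3's
+# Carbon-compatible ingestion path. Assumption: based on M3's Graphite
+# support, not stated in the original annotation.)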
+carbon:
+  ingester:
+    aggregateNamespacesAllData:
+    # A constant time to shift start by
+    shiftTimeStart:
+    # A constant time to shift end by
+    shiftTimeEnd:
+    # A constant set of steps to shift start by
+    shiftStepsStart:
+    # A constant set of steps to shift end by
+    shiftStepsEnd:
+    # A constant set of steps to shift start by, if and only if the start is an exact match to the resolution boundary of a query
+    shiftStepsStartWhenAtResolutionBoundary:
+    # A constant set of steps to shift end by, if and only if the end is an exact match to the resolution boundary of a query
+    shiftStepsEndWhenAtResolutionBoundary:
+    # A constant set of steps to shift end by, if and only if the start is an exact match to the resolution boundary of a query, and the end is NOT an exact match to the resolution boundary
+    shiftStepsEndWhenStartAtResolutionBoundary:
+    # A constant set of steps to shift start by, if and only if the end is an exact match to the resolution boundary of a query, and the start is NOT an exact match to the resolution boundary
+    shiftStepsStartWhenEndAtResolutionBoundary:
+    # Render partial datapoints when the start time is between a datapoint's resolution step size
+    renderPartialStart:
+    # Render partial datapoints when the end time is between a datapoint's resolution step size
+    renderPartialEnd:
+    # Render series that have only NaNs for the entire output instead of returning an empty array of datapoints
+    renderSeriesAllNaNs:
+    # Escape all characters using a backslash in a quoted string instead of only escaping quotes
+    compileEscapeAllNotOnlyQuotes:
+
+# TODO:
+query:
+
+# TODO:
+limits:
+
+# Additional configuration for metrics tags
+# Read https://m3db.io/docs/how_to/query/#id-generation for more details
+tagOptions:
+  # TODO: To do…
+  idScheme:
+
+# Sets the lookback duration for queries
+# TODO: Which means what?
+# Default = 5m
+lookbackDuration:
+
+# The result options for a query
+resultOptions:
+  # Keeps NaNs before returning query results.
+  # Default = false
+  keepNans:
+
+# TODO:
+experimental:
+
+# TODO:
+storeMetricsType:
+
+# TODO:
+multiProcess:
+
+# TODO:
+debug:
\ No newline at end of file
diff --git a/site/content/operator/api.md b/site/content/operator/api.md
index fc6c4f7d2b..2e48d4c783 100644
--- a/site/content/operator/api.md
+++ b/site/content/operator/api.md
@@ -8,24 +8,25 @@ chapter: true
 
 This document enumerates the Custom Resource Definitions used by the M3DB Operator. It is auto-generated from code comments.
 ## Table of Contents
-* [ClusterCondition](#clustercondition)
-* [ClusterSpec](#clusterspec)
-* [ExternalCoordinatorConfig](#externalcoordinatorconfig)
-* [IsolationGroup](#isolationgroup)
-* [M3DBCluster](#m3dbcluster)
-* [M3DBClusterList](#m3dbclusterlist)
-* [M3DBStatus](#m3dbstatus)
-* [NodeAffinityTerm](#nodeaffinityterm)
-* [AggregatedAttributes](#aggregatedattributes)
-* [Aggregation](#aggregation)
-* [AggregationOptions](#aggregationoptions)
-* [DownsampleOptions](#downsampleoptions)
-* [IndexOptions](#indexoptions)
-* [Namespace](#namespace)
-* [NamespaceOptions](#namespaceoptions)
-* [RetentionOptions](#retentionoptions)
-* [PodIdentity](#podidentity)
-* [PodIdentityConfig](#podidentityconfig)
+- [Table of Contents](#table-of-contents)
+- [ClusterCondition](#clustercondition)
+- [ClusterSpec](#clusterspec)
+- [ExternalCoordinatorConfig](#externalcoordinatorconfig)
+- [IsolationGroup](#isolationgroup)
+- [M3DBCluster](#m3dbcluster)
+- [M3DBClusterList](#m3dbclusterlist)
+- [M3DBStatus](#m3dbstatus)
+- [NodeAffinityTerm](#nodeaffinityterm)
+- [AggregatedAttributes](#aggregatedattributes)
+- [Aggregation](#aggregation)
+- [AggregationOptions](#aggregationoptions)
+- [DownsampleOptions](#downsampleoptions)
+- [IndexOptions](#indexoptions)
+- [Namespace](#namespace)
+- [NamespaceOptions](#namespaceoptions)
+- [RetentionOptions](#retentionoptions)
+- [PodIdentity](#podidentity)
+- [PodIdentityConfig](#podidentityconfig)
 
 ## ClusterCondition
 
@@ -220,7 +221,7 @@ Namespace defines an M3DB namespace or points to a preset M3DB namespace.
 
 ## NamespaceOptions
 
-NamespaceOptions defines parameters for an M3DB namespace. See https://m3db.io/docs/operational_guide/namespace_configuration/ for more details.
+NamespaceOptions defines parameters for an M3DB namespace. Read [the namespace configuration guide](/docs/operational_guide/namespace_configuration) for more details.
 
 | Field | Description | Scheme | Required |
 | ----- | ----------- | ------ | -------- |