
tidb-in-kubernetes: add deploy k8s requirements documentation #1565

Merged
merged 47 commits on Jul 16, 2019

Conversation

onlymellb
Contributor

@CLAassistant

CLAassistant commented Jul 2, 2019

CLA assistant check
All committers have signed the CLA.

luolibin and others added 3 commits July 4, 2019 15:20
Co-Authored-By: Tennix <tennix@users.noreply.github.com>
A TiDB cluster consists of three components: PD, TiKV, and TiDB. Capacity planning is usually done in terms of how many TiDB clusters the deployment needs to support. Here we assume a standard TiDB cluster of 3 PD + 3 TiKV + 2 TiDB instances. Below is one suggested way to plan each component:

1. PD component. PD uses few resources; at this cluster scale, allocating 2C 4GB is enough, plus a small amount of local disk. For ease of management, the PD instances of all clusters can be placed on the master nodes. For example, to support 5 TiDB clusters, plan for each of the three master nodes to host 5 PD instances; the 5 PD instances can share one SSD (a disk of two to three hundred GB is enough). Create 5 directories on that SSD as mount points via bind mounts, as sketched after this excerpt; see the [documentation](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/docs/operations.md#sharing-a-disk-filesystem-by-multiple-filesystem-pvs) for the procedure. If more machines are added later to support more TiDB clusters, keep adding PD instances on the masters in the same way; if the masters run out of resources, pick other work nodes and add PD instances there in the same way. The advantage of this approach is that PD instances are easy to plan and manage; the drawback is that the PD instances are so concentrated that if two of these machines go down, all TiDB clusters become unavailable. Therefore, our recommendation here is to take one SSD from every machine in the cluster and host PD instances on it, just like on the master nodes. For example, with 7 machines in total supporting 7 standard TiDB clusters, each machine needs to be able to host 3 PD instances. If a cluster later needs more capacity by adding machines, you only need to create PD instances on the new machines.
2. TiKV component. TiKV performance depends heavily on disk I/O and its data volume is usually large, so each TiKV instance should have a dedicated NVMe disk, with 8C 32GB of resources. To deploy multiple TiKV instances on one machine, use these figures to choose a suitable machine, and leave enough buffer when planning capacity.
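Item 1 above mentions creating multiple mount-point directories on one SSD via bind mounts so that local-static-provisioner discovers them as separate filesystem PVs. A minimal sketch of that layout follows; the device name /dev/sdb1, the mount point /mnt/ssd0, and the discovery directory /mnt/disks are placeholder assumptions, so adjust them to your environment and the linked operations guide.

```shell
# Minimal sketch: share one SSD among 5 PD instances as separate filesystem PVs.
# /dev/sdb1, /mnt/ssd0, and /mnt/disks are placeholder names.
mkdir -p /mnt/ssd0
mount /dev/sdb1 /mnt/ssd0
for i in 1 2 3 4 5; do
  mkdir -p /mnt/ssd0/pd${i} /mnt/disks/pd${i}
  # Each bind mount is discovered by local-static-provisioner as its own PV.
  mount --bind /mnt/ssd0/pd${i} /mnt/disks/pd${i}
done
# Add matching entries to /etc/fstab so the mounts survive a reboot.
```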
Member

8C 32GB is for a testing environment

Contributor Author

This just provides an example for the user to refer to; the actual usage needs to be planned according to the specific situation.

| Software | Version |
| --- | --- |
| Docker | docker-ce 18.09.6 |
| K8s | v1.12.2 |
| CentOS | 7.3 or later |
Member

The minimal kernel version is missing.

Contributor Author

This document also does not contain the minimal kernel version requirements.

Member

As this blog post suggests, CentOS 7.6 kernel 3.10.0-957 or later is required.

@yikeke
Contributor

yikeke commented Jul 9, 2019

Important Update: We have added a markdownlint static check to the CI to improve the quality of our documentation, so your open PR might now fail the CI check.

First, merge upstream master into your branch; the upstream changes might already resolve the issues found by the CI check. Then click Details beside the "ci/circleci: lint — Your tests failed on CircleCI" message to see the details of the failure and fix all the issues in your PR. It is recommended to install the "markdownlint" extension in your VS Code editor, so it can flag the potential issues that would fail the CI check.

@onlymellb If you have any problem with troubleshooting, please consult @yikeke for help. Thanks for your cooperation~

Ref: #1494

luolibin and others added 6 commits July 9, 2019 18:25
@onlymellb
Contributor Author

@yikeke PTAL again

@yikeke
Contributor

yikeke commented Jul 10, 2019

> @yikeke PTAL again

PTAL again @lilin90

onlymellb and others added 2 commits July 10, 2019 10:18
onlymellb and others added 20 commits July 11, 2019 15:30
Co-Authored-By: Lilian Lee <lilin@pingcap.com>
| Software | Version |
| --- | --- |
| Docker | Docker CE 18.09.6 |
| Kubernetes | v1.12.2 |
Contributor

@cofyc suggests v1.12.5+

2. If you need to deploy a monitoring system for the Kubernetes cluster and the monitoring data needs to be persisted to disk, also plan a SAS disk for Prometheus; the logging system later also needs a large SAS disk. Considering that it is best to purchase homogeneous machines, each machine should preferably have two large SAS disks. In production it is recommended to build these two kinds of disks into RAID 5 (a rough sketch follows this list); how many disks to use for the RAID 5 is up to you.
3. The recommended distribution of etcd is to match the Kubernetes master nodes, that is, deploy as many etcd nodes as there are master nodes. etcd data should be stored on SSDs.
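Item 2 mentions building the SAS disks into a RAID 5 array in production. A rough sketch with mdadm follows; the device names and mount point are placeholder assumptions, and the number of member disks is up to your own planning.

```shell
# Rough sketch: build a RAID 5 array from three SAS disks and mount it.
# /dev/sdc, /dev/sdd, /dev/sde, and /mnt/sas0 are placeholder names.
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdc /dev/sdd /dev/sde
mkfs.ext4 /dev/md0
mkdir -p /mnt/sas0
mount /dev/md0 /mnt/sas0
```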

## tidb cluster resource requirements
Contributor

Suggested change
## tidb cluster resource requirements
## TiDB cluster resource requirements


A TiDB cluster consists of three components: PD, TiKV, and TiDB. Capacity planning is usually done in terms of how many TiDB clusters the deployment needs to support. Here we assume a standard TiDB cluster of 3 PD + 3 TiKV + 2 TiDB instances. Below is one suggested way to plan each component:

1. PD component. PD uses few resources; at this cluster scale, allocating 2C 4GB is enough, plus a small amount of local disk. For ease of management, the PD instances of all clusters can be placed on the master nodes. For example, to support 5 TiDB clusters, plan for each of the three master nodes to host 5 PD instances; the 5 PD instances can share one SSD (a disk of two to three hundred GB is enough). Create 5 directories on that SSD as mount points via bind mounts; see the [documentation](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/docs/operations.md#sharing-a-disk-filesystem-by-multiple-filesystem-pvs) for the procedure. If more machines are added later to support more TiDB clusters, keep adding PD instances on the masters in the same way; if the masters run out of resources, pick other work nodes and add PD instances there in the same way. The advantage of this approach is that PD instances are easy to plan and manage; the drawback is that the PD instances are so concentrated that if two of these machines go down, all TiDB clusters become unavailable. Therefore, the recommendation here is to take one SSD from every machine in the cluster and host PD instances on it, just like on the master nodes. For example, with 7 machines in total supporting 7 standard TiDB clusters, each machine needs to be able to host 3 PD instances. If a cluster later needs more capacity by adding machines, you only need to create PD instances on the new machines.


Consistent with the server requirements for deploying a TiDB binary cluster, choose 64-bit general-purpose commodity servers with the Intel x86-64 architecture and 10 Gigabit NICs. For the specific requirements of deploying a TiDB cluster on physical machines, see [here](/dev/how-to/deploy/hardware-recommendations.md).

The choice of server disk, memory, and CPU depends on the capacity planning for the cluster and the deployment topology. To ensure high availability, an online Kubernetes deployment usually needs three master nodes, three etcd nodes, and a number of work nodes. Meanwhile, to make full use of machine resources, the master nodes usually also act as work nodes (that is, workloads can also be scheduled onto the master nodes). Use the kubelet to set aside [reserved resources](https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/) so that the system processes and the core Kubernetes processes on each machine still have enough resources to run under heavy workloads, which keeps the whole system stable.
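As a rough illustration of the kubelet resource reservation mentioned above, the kubelet can be started with reservation flags like the following; the values are placeholder assumptions and should be tuned for each machine, as described in the linked Kubernetes documentation.

```shell
# Example kubelet resource-reservation flags (illustrative values only;
# other required kubelet flags are omitted here).
kubelet \
  --system-reserved=cpu=500m,memory=1Gi \
  --kube-reserved=cpu=1,memory=2Gi \
  --eviction-hard=memory.available<500Mi
```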
Member

@onlymellb I want to confirm: is it work or worker?

Contributor Author

I will change to use worker instead

- master + work node: 48C 192GB, 2 SSDs, one RAID 5 SAS volume, three NVMe disks
- work node: 48C 192GB, 1 SSD, one RAID 5 SAS volume, two NVMe disks

With the machine configurations above, after the resources used by each component are deducted, there is still a fair amount of reserved capacity. If monitoring and logging components are also added, use the same method to plan the machine types and configurations to purchase. In addition, in production, try not to deploy TiDB instances on the master nodes, or deploy as few as possible. The main concern here is NIC bandwidth: if the master node's NIC is saturated (网卡打满), the heartbeat reports between the work nodes and the master nodes are affected, which can lead to serious problems.
Member

Could you use more formal words to replace "网卡打满"? And please try to explain it in English, otherwise, it might be hard to translate. 😅 Thanks!

Contributor Author

I will use "网卡满负荷工作" (the NIC working at full load) instead

Member

@tennix tennix left a comment

LGTM

6 participants