server: add diagnostics service #2024

Merged
merged 1 commit into from
Dec 16, 2019

lonng
Member

@lonng lonng commented Dec 16, 2019

Signed-off-by: Lonng <heng@lonng.org>

What problem does this PR solve?

This PR embeds the Diagnostics service in PD to make it diagnosable. (Part of pingcap/tidb#13567)

What is changed and how it works?

Register the Diagnostics service with PD at server startup.
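
For readers unfamiliar with the pattern, here is a rough sketch of what "register at server startup" means. It is not the code from this PR: `Server`, `Service`, `pdService`, `diagnosticsService`, and `startServer` are all hypothetical stand-ins written with the standard library only, and the real change goes through PD's gRPC server rather than a map. The point is only that every service is attached before the server starts serving.

```go
package main

import (
	"errors"
	"fmt"
)

// Service is a hypothetical stand-in for an RPC service implementation.
type Service interface {
	Name() string
}

// Server is a hypothetical stand-in for the RPC server that PD starts.
type Server struct {
	services map[string]Service
	started  bool
}

// NewServer returns a server with no services registered yet.
func NewServer() *Server {
	return &Server{services: make(map[string]Service)}
}

// Register attaches a service. It must be called before Start.
func (s *Server) Register(svc Service) error {
	if s.started {
		return errors.New("cannot register a service after the server has started")
	}
	if _, dup := s.services[svc.Name()]; dup {
		return fmt.Errorf("service %q is already registered", svc.Name())
	}
	s.services[svc.Name()] = svc
	return nil
}

// Start marks the server as serving; registration is closed from here on.
func (s *Server) Start() { s.started = true }

// Has reports whether a service with the given name is registered.
func (s *Server) Has(name string) bool {
	_, ok := s.services[name]
	return ok
}

// pdService stands in for the existing PD service.
type pdService struct{}

func (pdService) Name() string { return "PD" }

// diagnosticsService stands in for the newly embedded service.
// logFile is an assumed field: a log-search service needs to know where the log is.
type diagnosticsService struct{ logFile string }

func (diagnosticsService) Name() string { return "Diagnostics" }

// startServer registers both services and only then starts the server.
func startServer(logFile string) (*Server, error) {
	srv := NewServer()
	for _, svc := range []Service{pdService{}, diagnosticsService{logFile: logFile}} {
		if err := srv.Register(svc); err != nil {
			return nil, err
		}
	}
	srv.Start()
	return srv, nil
}

func main() {
	srv, err := startServer("pd.log")
	if err != nil {
		panic(err)
	}
	fmt.Println("Diagnostics registered:", srv.Has("Diagnostics"))
}
```

Registering before `Start` is the design choice being illustrated: clients that connect to the running server can reach the Diagnostics service immediately.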

Check List

Tests

  • Manual test (add detailed scripts or steps below)
mysql> select * from cluster_log where type='pd' and level='warn' and message like '%';
+-------------------------+------+-----------------+-------+---------------------------------------------------------------------------------------------------------------------------+
| TIME                    | TYPE | ADDRESS         | LEVEL | MESSAGE                                                                                                                   |
+-------------------------+------+-----------------+-------+---------------------------------------------------------------------------------------------------------------------------+
| 2019/12/16 16:28:28.357 | pd   | 127.0.0.1:49901 | Warn  | [store.go:1288] ["simple token is not cryptographically signed"]                                                          |
| 2019/12/16 16:28:29.852 | pd   | 127.0.0.1:49901 | Warn  | [history_buffer.go:138] ["load history index failed"] [error="leveldb: not found"]                                        |
| 2019/12/16 16:29:03.773 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283320320] |
| 2019/12/16 16:29:13.774 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283267072] |
| 2019/12/16 16:29:23.775 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283267072] |
| 2019/12/16 16:29:33.776 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283267072] |
| 2019/12/16 16:29:43.777 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283267072] |
| 2019/12/16 16:29:53.778 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283267072] |
| 2019/12/16 16:30:03.779 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283267072] |
| 2019/12/16 16:30:13.780 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283267072] |
| 2019/12/16 16:30:23.781 | pd   | 127.0.0.1:49901 | Warn  | [cluster.go:383] ["store does not have enough disk space"] [store-id=1] [capacity=1574498656256] [available=280283262976] |
+-------------------------+------+-----------------+-------+---------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.00 sec)
mysql> select * from cluster_log where type='pd' and message like '%scheduler%' and message regexp '.*coordinator.*';
+-------------------------+------+-----------------+-------+-----------------------------------------------------------------------------------------+
| TIME                    | TYPE | ADDRESS         | LEVEL | MESSAGE                                                                                 |
+-------------------------+------+-----------------+-------+-----------------------------------------------------------------------------------------+
| 2019/12/16 16:28:56.764 | pd   | 127.0.0.1:49901 | Info  | [coordinator.go:172] ["coordinator starts to run schedulers"]                           |
| 2019/12/16 16:28:56.765 | pd   | 127.0.0.1:49901 | Info  | [coordinator.go:236] ["create scheduler"] [scheduler-name=balance-region-scheduler]     |
| 2019/12/16 16:28:56.766 | pd   | 127.0.0.1:49901 | Info  | [coordinator.go:236] ["create scheduler"] [scheduler-name=balance-leader-scheduler]     |
| 2019/12/16 16:28:56.766 | pd   | 127.0.0.1:49901 | Info  | [coordinator.go:236] ["create scheduler"] [scheduler-name=balance-hot-region-scheduler] |
| 2019/12/16 16:28:56.767 | pd   | 127.0.0.1:49901 | Info  | [coordinator.go:236] ["create scheduler"] [scheduler-name=label-scheduler]              |
+-------------------------+------+-----------------+-------+-----------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)

@lonng lonng added the component/api HTTP API. label Dec 16, 2019
Contributor

@lhy1024 lhy1024 left a comment

LGTM, but there are some errors with CI.

Contributor

@crazycs520 crazycs520 left a comment

LGTM

Signed-off-by: Lonng <heng@lonng.org>
@disksing
Contributor

/merge

@sre-bot sre-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 16, 2019
@sre-bot
Contributor

sre-bot commented Dec 16, 2019

/run-all-tests

@sre-bot
Contributor

sre-bot commented Dec 16, 2019

@lonng merge failed.

@disksing
Contributor

CI did not run. Retrying...

/merge

@disksing
Contributor

/merge

@sre-bot
Contributor

sre-bot commented Dec 16, 2019

/run-all-tests

1 similar comment
@lonng
Member Author

lonng commented Dec 16, 2019

/run-all-tests

@sre-bot
Contributor

sre-bot commented Dec 16, 2019

@lonng merge failed.

@lonng lonng merged commit 7b856c8 into tikv:master Dec 16, 2019
@lonng lonng deleted the diagnostics-service branch December 16, 2019 09:57
Huster-ljw pushed a commit to Huster-ljw/pd that referenced this pull request Dec 18, 2019
Signed-off-by: Lonng <heng@lonng.org>
Labels
component/api HTTP API. status/can-merge Indicates a PR has been approved by a committer.
5 participants