
infoschema: add metric database/table to query cluster metric table. #13757

Merged: 39 commits, merged Dec 20, 2019

crazycs520 (Contributor) commented Nov 26, 2019

What problem does this PR solve?

  • Add the METRIC_SCHEMA database and its tables for querying cluster metrics.
  • Add 2 session variables to control the time range of metric queries:
    • tidb_metric_schema_range_duration: the range duration of the query; it is substituted into the generated PromQL.
    • tidb_metric_schema_step: the query resolution step; it is used as the step value when the PromQL is queried.

Related test PR: https://github.com/pingcap/tidb-test/pull/968 (merge the test PR first).

mysql>use METRIC_SCHEMA;
mysql>select * from up;
+----------------------------+-------+-----------------+-------------------+
| time                       | value | instance        | job               |
+----------------------------+-------+-----------------+-------------------+
| 2019-12-04 18:46:33.398000 | 1.0   | 127.0.0.1:10080 | tidb              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:10081 | tidb              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:10082 | tidb              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:10083 | tidb              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:10090 | tidb              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:10091 | tidb              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:10092 | tidb              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:10093 | tidb              |
| 2019-12-04 18:46:33.398000 | 1.0   | 127.0.0.1:20180 | tikv              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:20181 | tikv              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:20182 | tikv              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:20183 | tikv              |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:20184 | tikv              |
| 2019-12-04 18:46:33.398000 | 1.0   | 127.0.0.1:2379  | pd                |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:2479  | pd                |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:2579  | pd                |
| 2019-12-04 18:46:33.398000 | 0.0   | 127.0.0.1:9100  | overwritten-nodes |
| 2019-12-04 18:46:33.398000 | 1.0   | localhost:9090  | prometheus        |
+----------------------------+-------+-----------------+-------------------+
mysql>select * from query_duration;
+----------------------------+------------------+----------+----------+----------+
| time                       | value            | instance | sql_type | quantile |
+----------------------------+------------------+----------+----------+----------+
| 2019-12-04 18:46:37.134000 | 0.00430970543826 |          |          | 0.9      |
+----------------------------+------------------+----------+----------+----------+

You can also use EXPLAIN (or DESC) to check the generated PromQL:

mysql>desc select * from query_duration;
+----------------+----------+------+------------------------------------------------------------------------------------------------------------+
| id             | count    | task | operator info                                                                                              |
+----------------+----------+------+------------------------------------------------------------------------------------------------------------+
| MemTableScan_4 | 10000.00 | root | promQL:histogram_quantile(0.9, sum(rate(tidb_server_handle_query_duration_seconds_bucket{}[60s])) by (le)) |
+----------------+----------+------+------------------------------------------------------------------------------------------------------------+

What is changed and how it works?

Metric data is queried with PromQL. Every metric table has an associated PromQL template.
For example, the query_duration and up metric tables are defined in code as follows:

	"query_duration": {
		promQL:   `histogram_quantile($QUANTILE, sum(rate(tidb_server_handle_query_duration_seconds_bucket{$LABEL_CONDITION}[$RANGE_DURATION])) by (le))`,
		labels:   []string{"instance", "sql_type"},
		quantile: 0.90,
	},
	"up": {
		promQL: `up{$LABEL_CONDITION}`,
		labels: []string{"instance", "job"},
	},

From these definitions, TiDB generates the following metric tables:

mysql>desc query_duration;
+----------+-------------------+------+-----+-------------------+-------+
| Field    | Type              | Null | Key | Default           | Extra |
+----------+-------------------+------+-----+-------------------+-------+
| time     | datetime unsigned | YES  |     | CURRENT_TIMESTAMP |       |
| value    | double unsigned   | YES  |     | <null>            |       |
| instance | varchar(512)      | YES  |     | <null>            |       |
| sql_type | varchar(512)      | YES  |     | <null>            |       |
| quantile | double unsigned   | YES  |     | 0.9               |       |
+----------+-------------------+------+-----+-------------------+-------+
mysql>desc up;
+----------+-------------------+------+-----+-------------------+-------+
| Field    | Type              | Null | Key | Default           | Extra |
+----------+-------------------+------+-----+-------------------+-------+
| time     | datetime unsigned | YES  |     | CURRENT_TIMESTAMP |       |
| value    | double unsigned   | YES  |     | <null>            |       |
| instance | varchar(512)      | YES  |     | <null>            |       |
| job      | varchar(512)      | YES  |     | <null>            |       |
+----------+-------------------+------+-----+-------------------+-------+

As shown above, the PromQL templates contain 3 variables, which are replaced when the SQL query is executed:

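  • $QUANTILE: the quantile, defaulting to the value in the table definition (0.90 for query_duration).
  • $LABEL_CONDITION: the label matchers built from conditions on the label columns in the WHERE clause.
  • $RANGE_DURATION: the range duration, taken from the session variable tidb_metric_schema_range_duration.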

For example, executing the following SQL:

select * from query_duration where time > "2019-11-25 00:00:00" and time < "2019-11-25 00:01:00" and sql_type="general";

generates the following PromQL:

histogram_quantile(0.90, sum(rate(tidb_server_handle_query_duration_seconds_bucket{sql_type="general"}[60s])) by (le))

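Here $QUANTILE is 0.90 (the default quantile in the query_duration definition), $RANGE_DURATION is 60s (the value of the session variable tidb_metric_schema_range_duration), and $LABEL_CONDITION is built from the sql_type="general" condition in the WHERE clause.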
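Building $LABEL_CONDITION from the WHERE clause can be sketched as turning equality conditions on label columns into a comma-separated PromQL matcher list. This assumes only label = 'value' conditions are pushed down (the time range is handled separately), and buildLabelCondition is a hypothetical helper, not the PR's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// buildLabelCondition is a hypothetical helper: it turns equality conditions on
// label columns (for example sql_type = "general" in the WHERE clause) into a
// PromQL label matcher list. Keys are sorted so the output is deterministic.
func buildLabelCondition(eq map[string]string) string {
	keys := make([]string, 0, len(eq))
	for k := range eq {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		// strconv.Quote escapes quotes and backslashes inside the value;
		// PromQL string literals follow the same Go escaping rules.
		parts = append(parts, k+"="+strconv.Quote(eq[k]))
	}
	return strings.Join(parts, ",")
}

func main() {
	// Prints: sql_type="general"
	fmt.Println(buildLabelCondition(map[string]string{"sql_type": "general"}))
	// Prints: instance="127.0.0.1:10080",sql_type="general"
	fmt.Println(buildLabelCondition(map[string]string{"sql_type": "general", "instance": "127.0.0.1:10080"}))
}
```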

TiDB then sends the PromQL to PD to query the metric data between "2019-11-25 00:00:00" and "2019-11-25 00:01:00".

Attention

Currently, the variables in the PromQL always use their default values; support for using values specified in the SQL statement will be added in the next PR.

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Code changes

Side effects

Related changes

Release note

  • Write release note for bug-fix or new feature.

domain/domain.go (outdated, resolved)
infoschema/infoschema_test.go (outdated, resolved)
codecov bot commented Dec 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@eef8c39).
The diff coverage is 50.6172%.

@@             Coverage Diff             @@
##             master     #13757   +/-   ##
===========================================
  Coverage          ?   80.0891%           
===========================================
  Files             ?        487           
  Lines             ?     121431           
  Branches          ?          0           
===========================================
  Hits              ?      97253           
  Misses            ?      16435           
  Partials          ?       7743

crazycs520 requested a review from a team as a code owner December 9, 2019 03:35
ghost requested review from alivxxx and lzmhhh123 and removed request for a team December 9, 2019 03:35
infoschema/metricschema/table.go (outdated, resolved)
sessionctx/variable/varsutil.go (resolved)
infoschema/metricschema/init.go (outdated, resolved)
infoschema/metricschema/init.go (outdated, resolved)
infoschema/metricschema/query.go (outdated, resolved)
infoschema/metricschema/table.go (outdated, resolved)
meta/autoid/autoid.go (outdated, resolved)
sessionctx/variable/sysvar.go (resolved)
djshow832 (Contributor) left a comment:

You missed IsMemOrSysDB in util/misc.go.

executor/cluster_reader.go (outdated, resolved)
crazycs520 (Contributor Author) posted the following comments:

/run-all-tests
/run-all-tests tidb-test=pr/968
/run-common-test tidb-test=pr/968
/run-all-tests tidb-test=pr/968
/run-integration-copr-test tidb-test=pr/968
/run-common-test tidb-test=pr/968
/run-all-tests tidb-test=pr/968

djshow832 (Contributor) left a comment:

LGTM

djshow832 added the status/LGT1 label (Indicates that a PR has LGTM 1.) Dec 20, 2019
crazycs520 (Contributor Author) posted the following comments:

/run-all-tests tidb-test=pr/968
/run-all-tests tidb-test=pr/968
/run-integration-copr-test tidb-test=pr/968

lonng merged commit 933715f into pingcap:master Dec 20, 2019
lonng added the status/LGT2 label (Indicates that a PR has LGTM 2.) and removed the status/LGT1 label (Indicates that a PR has LGTM 1.) Dec 20, 2019
Labels
status/LGT2 Indicates that a PR has LGTM 2. type/usability