infoschema: add metrics_summary_by_label table to query all detail metrics #14663

Merged
merged 9 commits into from
Feb 11, 2020

Conversation

crazycs520
Contributor

@crazycs520 crazycs520 commented Feb 6, 2020

Signed-off-by: crazycs <crazycs520@gmail.com>

What problem does this PR solve?

Add the metrics_summary_by_label table to query all metrics in detail. It is the metrics_summary table broken down by label.
This table helps users quickly find abnormal metrics, in detail, by comparing two different time ranges.

For example:

>select * from `METRICS_SUMMARY_BY_LABEL` order by sum_value desc;
+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
| METRIC_NAME                                      | LABEL                                                                                                                         | TIME                       | SUM_VALUE                   | AVG_VALUE                   | MIN_VALUE                   | MAX_VALUE                   |
+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
| tikv_auto_gc_safepoint                           | instance = 127.0.0.1:20180                                                                                                    | 2020-02-06 20:26:51.628000 |           1.58099122657e+13 |           1.58099122657e+12 |           1.58099098657e+12 |           1.58099158656e+12 |
| pd_cluster_status                                | instance, type = 127.0.0.1:2379, storage_capacity                                                                             | 2020-02-06 20:26:51.628000 |           4.99963170816e+12 |           4.99963170816e+11 |           4.99963170816e+11 |           4.99963170816e+11 |
| pd_scheduler_store_status                        | address, instance, store, type = 127.0.0.1:20160, 127.0.0.1:2379, 1, store_capacity                                           | 2020-02-06 20:26:51.628000 |           4.99963170816e+12 |           4.99963170816e+11 |           4.99963170816e+11 |           4.99963170816e+11 |
| tikv_store_size                                  | instance, type = 127.0.0.1:20180, capacity                                                                                    | 2020-02-06 20:26:51.628000 |           4.99963170816e+12 |           4.99963170816e+11 |           4.99963170816e+11 |           4.99963170816e+11 |
| pd_scheduler_store_status                        | address, instance, store, type = 127.0.0.1:20160, 127.0.0.1:2379, 1, store_available                                          | 2020-02-06 20:26:51.628000 |           8.95173033984e+11 | 89517303398.4               | 89205329920.0               | 89868484608.0               |
| tikv_store_size                                  | instance, type = 127.0.0.1:20180, available                                                                                   | 2020-02-06 20:26:51.628000 |           8.94733672448e+11 | 89473367244.8               | 89183318016.0               | 89855328256.0               |
| pd_scheduler_store_status                        | address, instance, store, type = 127.0.0.1:20160, 127.0.0.1:2379, 1, region_score                                             | 2020-02-06 20:26:51.628000 | 10736564536.5               |  1073656453.65              |  1073656118.73              |  1073656751.17              |
| tikv_allocator_stats                             | instance, type = 127.0.0.1:20180, mapped                                                                                      | 2020-02-06 20:26:51.628000 |  2642391040.0               |   264239104.0               |   263536640.0               |   264978432.0               |
| tikv_allocator_stats                             | instance, type = 127.0.0.1:20180, resident                                                                                    | 2020-02-06 20:26:51.628000 |  1942179840.0               |   194217984.0               |   193515520.0               |   194957312.0               |
| tikv_allocator_stats                             | instance, type = 127.0.0.1:20180, active                                                                                      | 2020-02-06 20:26:51.628000 |  1800548352.0               |   180054835.2               |   179871744.0               |   180322304.0               |
| tikv_allocator_stats                             | instance, type = 127.0.0.1:20180, allocated                                                                                   | 2020-02-06 20:26:51.628000 |  1650098608.0               |   165009860.8               |   164888600.0               |   165097960.0               |
| heap_mem_usage                                   | instance, job = 127.0.0.1:10081, tidb                                                                                         | 2020-02-06 20:26:51.628000 |  1494873992.0               |   149487399.2               |   127208008.0               |   168826440.0               |
| heap_mem_usage                                   | instance, job = localhost:9090, prometheus                                                                                    | 2020-02-06 20:26:51.628000 |  1110194840.0               |   111019484.0               |   103603152.0               |   118016312.0               |
| pd_scheduler_store_status                        | address, instance, store, type = 127.0.0.1:20160, 127.0.0.1:2379, 1, store_used                                               | 2020-02-06 20:26:51.628000 |   654853972.0               |    65485397.2               |    65476826.0               |    65493634.0               |
| pd_cluster_status                                | instance, type = 127.0.0.1:2379, storage_size                                                                                 | 2020-02-06 20:26:51.628000 |   654853972.0               |    65485397.2               |    65476826.0               |    65493634.0               |
| tikv_engine_size                                 | instance, type, db = 127.0.0.1:20180, default, raft                                                                           | 2020-02-06 20:26:51.628000 |   576435976.0               |    57643597.6               |    57630460.0               |    57655804.0               |
| tikv_memtable_size                               | cf, instance, type, db = default, 127.0.0.1:20180, mem-tables, raft                                                           | 2020-02-06 20:26:51.628000 |   571758336.0               |    57175833.6               |    57162696.0               |    57188040.0               |
| tikv_engine_size                                 | instance, type, db = 127.0.0.1:20180, write, kv                                                                               | 2020-02-06 20:26:51.628000 |   346880962.0               |    34688096.2               |    34686165.0               |    34689925.0               |
| tikv_memtable_size                               | cf, instance, type, db = write, 127.0.0.1:20180, mem-tables, kv                                                               | 2020-02-06 20:26:51.628000 |   345936192.0               |    34593619.2               |    34591688.0               |    34595448.0               |
| heap_mem_usage                                   | instance, job = 127.0.0.1:2379, pd                                                                                            | 2020-02-06 20:26:51.628000 |   234594768.0               |    23459476.8               |    15726376.0               |    30543536.0               |
| tikv_engine_size                                 | instance, type, db = 127.0.0.1:20180, raft, kv                                                                                | 2020-02-06 20:26:51.628000 |   158033300.0               |    15803330.0               |    15801314.0               |    15805322.0               |
| tikv_memtable_size                               | cf, instance, type, db = raft, 127.0.0.1:20180, mem-tables, kv                                                                | 2020-02-06 20:26:51.628000 |   158004880.0               |    15800488.0               |    15798472.0               |    15802480.0               |
| tikv_allocator_stats                             | instance, type = 127.0.0.1:20180, fragmentation                                                                               | 2020-02-06 20:26:51.628000 |   150449744.0               |    15044974.4               |    14866824.0               |    15312544.0               |
| tikv_allocator_stats                             | instance, type = 127.0.0.1:20180, metadata                                                                                    | 2020-02-06 20:26:51.628000 |   113197920.0               |    11319792.0               |    11319792.0               |    11319792.0               |
| tikv_engine_size                                 | instance, type, db = 127.0.0.1:20180, lock, kv                                                                                | 2020-02-06 20:26:51.628000 |    79471680.0               |     7947168.0               |     7942544.0               |     7951584.0               |
| tikv_memtable_size                               | cf, instance, type, db = lock, 127.0.0.1:20180, mem-tables, kv                                                                | 2020-02-06 20:26:51.628000 |    79471680.0               |     7947168.0               |     7942544.0               |     7951584.0               |
| tikv_engine_size                                 | instance, type, db = 127.0.0.1:20180, default, kv                                                                             | 2020-02-06 20:26:51.628000 |    70468030.0               |     7046803.0               |     7046803.0               |     7046803.0               |
| tikv_memtable_size                               | cf, instance, type, db = default, 127.0.0.1:20180, mem-tables, kv                                                             | 2020-02-06 20:26:51.628000 |    68931920.0               |     6893192.0               |     6893192.0               |     6893192.0               |
| tikv_allocator_stats                             | instance, type = 127.0.0.1:20180, dirty                                                                                       | 2020-02-06 20:26:51.628000 |    28433568.0               |     2843356.8               |     2020880.0               |     3548688.0               |
| tikv_block_cache_size                            | cf, instance, db = all, 127.0.0.1:20180, kv                                                                                   | 2020-02-06 20:26:51.628000 |     6222680.0               |      622268.0               |      622268.0               |      622268.0               |
......
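
The LABEL column in the output above packs the label names and their values into one string, with names on the left of " = " and values on the right. As a minimal sketch of how a client could split such a cell, the hypothetical helper `parseLabel` below (not part of this PR) assumes that names and values are both separated by ", " and that no value contains ", " or " = " itself.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabel splits a LABEL cell such as
// "instance, type = 127.0.0.1:20180, capacity" into name/value pairs.
// Hypothetical helper: it assumes names and values are ", "-separated
// and that no value itself contains ", " or " = ".
func parseLabel(cell string) (map[string]string, error) {
	pairs := map[string]string{}
	cell = strings.TrimSpace(cell)
	if cell == "" {
		// Metrics without labels show an empty LABEL cell.
		return pairs, nil
	}
	sides := strings.SplitN(cell, " = ", 2)
	if len(sides) != 2 {
		return nil, fmt.Errorf("label %q has no \" = \" separator", cell)
	}
	names := strings.Split(sides[0], ", ")
	values := strings.Split(sides[1], ", ")
	if len(names) != len(values) {
		return nil, fmt.Errorf("label %q has %d names but %d values",
			cell, len(names), len(values))
	}
	for i, name := range names {
		pairs[name] = values[i]
	}
	return pairs, nil
}

func main() {
	pairs, err := parseLabel("instance, type = 127.0.0.1:20180, capacity")
	if err != nil {
		panic(err)
	}
	fmt.Println(pairs["instance"], pairs["type"])
}
```

Returning an error on a name/value count mismatch keeps a malformed cell from being silently misread.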

-- Compare metrics across two different time ranges and order by ratio.

> select t1.avg_value / t2.avg_value as ratio, t1.metric_name, t1.label, t1.avg_value, t2.avg_value
from METRICS_SUMMARY_BY_LABEL as t1 join METRICS_SUMMARY_BY_LABEL as t2
where
  t1.metric_name = t2.metric_name and t1.label = t2.label and
  t1.time > "2020-02-06 20:30:00" and t1.time < "2020-02-06 20:35:00" and
  t2.time > "2020-02-06 20:37:00" and t2.time < "2020-02-06 20:42:00"
order by ratio desc;
+--------------------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-----------------------------+-----------------------------+
| ratio              | metric_name                                      | label                                                                                                                         | avg_value                   | avg_value                   |
+--------------------+--------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-----------------------------+-----------------------------+
| 16.0702113766      | tikv_cop_total_rocksdb_perf_statistics           | instance, req, metric = 127.0.0.1:20180, select, block_cache_hit_count                                                        |          36.7687130543      |           2.28800432009     |
|  5.35837757549     | tikv_scheduler_scan_details                      | instance, tag, req, cf = 127.0.0.1:20180, next, get, write                                                                    |          19.4401938439      |           3.628             |
|  5.0               | tidb_gc_lifetime                                 | instance = 127.0.0.1:10081                                                                                                    |         600.0               |         120.0               |
|  5.0               | tidb_gc_config                                   | instance, type = 127.0.0.1:10081, tikv_gc_life_time                                                                           |         600.0               |         120.0               |
|  4.62741058075     | pd_tso_rpc_duration_0.999                        |                                                                                                                               |           0.00628           |           0.0013571304924   |
|  4.62741058075     | pd_handle_requests_duration_0.999                | type = tso                                                                                                                    |           0.00628           |           0.0013571304924   |
|  4.62741058075     | pd_handle_request_duration_0.999                 | instance, type = 127.0.0.1:10081, tso                                                                                         |           0.00628           |           0.0013571304924   |
|  4.46616284271     | tikv_cop_scan_details                            | instance, tag, req, cf = 127.0.0.1:20180, get, select, lock                                                                   |          27.4402184844      |           6.14402552051     |
|  4.31760073068     | tikv_pd_request_avg_duration                     | instance, type = 127.0.0.1:20180, store_heartbeat                                                                             |           0.0042097538      |           0.000975021560025 |
|  4.15884097494     | pd_client_cmd_duration_0.999                     | instance, type = 127.0.0.1:10081, wait                                                                                        |           0.00793493333333  |           0.00190796747967  |
|  4.15884097494     | pd_tso_wait_duration_0.999                       |                                                                                                                               |           0.00793493333333  |           0.00190796747967  |
|  4.0               | tikv_scheduler_latch_wait_duration_1             | instance, type = 127.0.0.1:20180, scan_lock                                                                                   |           0.002             |           0.0005            |
|  3.809202887       | pd_start_tso_wait_duration_0.999                 |                                                                                                                               |           0.007138304       |           0.00187396266667  |
|  3.66516564812     | tidb_kv_request_duration_0.99                    | instance, type, store = 127.0.0.1:10081, BatchGet, 1                                                                          |           0.0303866666667   |           0.00829066666667  |
|  3.6               | pd_start_tso_wait_duration_1                     |                                                                                                                               |           0.009216          |           0.00256           |
|  3.57937723328     | tikv_grpc_messge_duration_0.99                   | instance, type = 127.0.0.1:20180, kv_batch_get                                                                                |           0.028048          |           0.007836          |
|  3.48548548549     | tikv_scheduler_latch_wait_duration_0.999         | instance, type = 127.0.0.1:20180, scan_lock                                                                                   |           0.001741          |           0.0004995         |
|  3.04999251801     | pd_client_cmd_duration_0.999                     | instance, type = 127.0.0.1:10081, tso                                                                                         |           0.00959474666667  |           0.0031458262963   |
|  3.0370149805      | tidb_kv_request_duration_0.99                    | instance, type, store = 127.0.0.1:10081, Get, 1                                                                               |           0.0133710666667   |           0.00440270026738  |
|  3.0               | pd_tso_rpc_duration_1                            |                                                                                                                               |           0.0096            |           0.0032            |
|  3.0               | pd_handle_request_duration_1                     | instance, type = 127.0.0.1:10081, tso                                                                                         |           0.0096            |           0.0032            |
|  3.0               | pd_handle_requests_duration_1                    | type = tso                                                                                                                    |           0.0096            |           0.0032            |
|  2.82352941176     | pd_tso_wait_duration_1                           |                                                                                                                               |           0.0096            |           0.0034            |
|  2.82352941176     | pd_client_cmd_duration_1                         | instance, type = 127.0.0.1:10081, wait                                                                                        |           0.0096            |           0.0034            |
|  2.78382096671     | tikv_scheduler_command_duration_0.99             | instance, type = 127.0.0.1:20180, batch_get                                                                                   |           0.0101173333333   |           0.00363433333333  |
|  2.78260869565     | pd_client_cmd_duration_1                         | instance, type = 127.0.0.1:10081, tso                                                                                         |           0.0128            |           0.0046            |
|  2.72921951382     | tikv_grpc_messge_duration_0.99                   | instance, type = 127.0.0.1:20180, kv_get                                                                                      |           0.0122831622807   |           0.00450061353383  |
|  2.72691863344     | tikv_grpc_messge_duration_0.999                  | instance, type = 127.0.0.1:20180, kv_batch_get                                                                                |           0.0330448         |           0.012118          |
|  2.70819428995     | tidb_meta_operation_duration_0.99                | instance, type, result = 127.0.0.1:10081, get_ddl_job, ok                                                                     |           0.01328           |           0.00490363636364  |
|  2.65982071049     | tidb_kv_request_duration_0.999                   | instance, type, store = 127.0.0.1:10081, BatchGet, 1                                                                          |           0.0361586666667   |           0.0135944         |
|  2.625             | tikv_grpc_messge_duration_1                      | instance, type = 127.0.0.1:20180, kv_batch_get                                                                                |           0.0336            |           0.0128            |
......
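
The comparison query above joins the table with itself on metric name and label, then ranks by the ratio of the two averages. The same logic can be sketched client-side in Go. The types `summaryRow` and `ratioRow` and the function `compareWindows` are hypothetical names for illustration. The sketch assumes at most one row per (metric name, label) in the second window and skips pairs whose second average is zero, where the SQL division would yield NULL.

```go
package main

import (
	"fmt"
	"sort"
)

// summaryRow is a hypothetical, trimmed-down view of one result row.
type summaryRow struct {
	MetricName string
	Label      string
	AvgValue   float64
}

// ratioRow pairs the averages of one metric/label across two windows.
type ratioRow struct {
	MetricName, Label string
	Avg1, Avg2, Ratio float64
}

// compareWindows joins rows on (metric name, label) and sorts the result
// by Avg1/Avg2 in descending order. Rows without a partner, or whose
// second-window average is zero, are skipped.
func compareWindows(t1, t2 []summaryRow) []ratioRow {
	type key struct{ name, label string }
	second := make(map[key]float64, len(t2))
	for _, r := range t2 {
		second[key{r.MetricName, r.Label}] = r.AvgValue
	}
	var out []ratioRow
	for _, r := range t1 {
		avg2, ok := second[key{r.MetricName, r.Label}]
		if !ok || avg2 == 0 {
			continue
		}
		out = append(out, ratioRow{
			MetricName: r.MetricName,
			Label:      r.Label,
			Avg1:       r.AvgValue,
			Avg2:       avg2,
			Ratio:      r.AvgValue / avg2,
		})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Ratio > out[j].Ratio })
	return out
}

func main() {
	// Values taken from two rows of the sample output above.
	first := []summaryRow{
		{"tikv_scheduler_latch_wait_duration_1", "instance, type = 127.0.0.1:20180, scan_lock", 0.002},
		{"tidb_gc_lifetime", "instance = 127.0.0.1:10081", 600},
	}
	second := []summaryRow{
		{"tikv_scheduler_latch_wait_duration_1", "instance, type = 127.0.0.1:20180, scan_lock", 0.0005},
		{"tidb_gc_lifetime", "instance = 127.0.0.1:10081", 120},
	}
	for _, r := range compareWindows(first, second) {
		fmt.Printf("%v\t%s\t%s\n", r.Ratio, r.MetricName, r.Label)
	}
}
```

A ratio far above 1 flags a metric whose average was much higher in the first window than in the second, which is how the sample output surfaces the changed tidb_gc_lifetime setting.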

What is changed and how it works?

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Signed-off-by: crazycs <crazycs520@gmail.com>
@crazycs520 crazycs520 requested a review from a team as a code owner February 6, 2020 12:49
@ghost ghost requested review from eurekaka and francis0407 and removed request for a team February 6, 2020 12:49
@eurekaka eurekaka removed their request for review February 6, 2020 12:49
@@ -853,6 +853,10 @@ func createSessionFunc(store kv.Storage) pools.Factory {
		if err != nil {
			return nil, errors.Trace(err)
		}
		err = variable.SetSessionSystemVar(se.sessionVars, variable.MaxAllowedPacket, types.NewStringDatum("67108864"))
Contributor Author

In PR #11137, concat_ws needs this variable, so I set it here for internal SQL.
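
For reference, the value 67108864 in the diff is 64 MiB (64 × 1024 × 1024 bytes). The sketch below is not TiDB's implementation. It only illustrates, under the assumption that a string function rejects results longer than the packet limit (as MySQL documents for string-valued functions), why an unset limit would make such a function fail. The names `maxAllowedPacket` and `concatWS` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// maxAllowedPacket mirrors the value set in the diff: 64 MiB.
const maxAllowedPacket = 64 << 20 // 67108864 bytes

// concatWS joins parts with sep. It reports ok=false when the result
// would be longer than limit bytes. Hypothetical sketch of a length
// check, not the actual TiDB code.
func concatWS(limit int, sep string, parts ...string) (result string, ok bool) {
	total := 0
	for i, p := range parts {
		if i > 0 {
			total += len(sep)
		}
		total += len(p)
	}
	if total > limit {
		return "", false
	}
	return strings.Join(parts, sep), true
}

func main() {
	fmt.Println(maxAllowedPacket)
	fmt.Println(concatWS(maxAllowedPacket, ", ", "instance", "type"))
	// With a limit of zero, any non-empty result is rejected.
	fmt.Println(concatWS(0, ", ", "instance", "type"))
}
```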

@crazycs520 crazycs520 removed the request for review from francis0407 February 6, 2020 12:59
Signed-off-by: crazycs <crazycs520@gmail.com>
@crazycs520 crazycs520 changed the title infoschema: add metric_detail table to query all detail metrics infoschema: add metrics_summary_by_label table to query all detail metrics Feb 7, 2020
@crazycs520
Contributor Author

/run-all-tests

startTime := e.extractor.StartTime.Format(plannercore.MetricTableTimeFormat)
endTime := e.extractor.EndTime.Format(plannercore.MetricTableTimeFormat)
for name, def := range infoschema.MetricTableMap {
	sqls := e.genMetricQuerySQLS(name, startTime, endTime, def.Quantile, quantiles, def)
Contributor

The e.genMetricQuerySQLS method already accepts def, so I think passing def.Quantile as a separate argument is unnecessary.

Contributor Author

great, done.
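
The simplification agreed on above can be sketched as follows. This is an illustration with hypothetical names (`metricTableDef`, `genQuerySQLs`), not the code merged in this PR, and the SQL text it builds is an assumption for demonstration only. The point is that the function reads the quantile from the definition it already receives.

```go
package main

import "fmt"

// metricTableDef is a hypothetical stand-in for the metric definition.
// Only the field relevant to the review comment is modelled.
type metricTableDef struct {
	Quantile float64 // 0 means the metric has no quantile dimension
}

// genQuerySQLs builds one query per requested quantile. It reads
// def.Quantile itself, so callers do not pass that value separately.
// The SQL text is illustrative only.
func genQuerySQLs(name, startTime, endTime string, quantiles []float64, def metricTableDef) []string {
	base := fmt.Sprintf("select * from metrics_schema.%s where time >= '%s' and time <= '%s'",
		name, startTime, endTime)
	if def.Quantile == 0 {
		return []string{base}
	}
	sqls := make([]string, 0, len(quantiles))
	for _, q := range quantiles {
		sqls = append(sqls, base+fmt.Sprintf(" and quantile = %v", q))
	}
	return sqls
}

func main() {
	def := metricTableDef{Quantile: 0.9}
	quantiles := []float64{0.99, 0.999}
	for _, sql := range genQuerySQLs("tidb_kv_request_duration",
		"2020-02-06 20:30:00", "2020-02-06 20:35:00", quantiles, def) {
		fmt.Println(sql)
	}
}
```

Dropping the redundant parameter removes the chance of a caller passing a quantile that disagrees with the definition.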

Contributor

@lonng lonng left a comment

Rest LGTM

@lonng
Contributor

lonng commented Feb 10, 2020

LGTM

Contributor

@Deardrops Deardrops left a comment

LGTM

@crazycs520 crazycs520 merged commit 7ecb7e6 into pingcap:master Feb 11, 2020