Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect and expose client errors/warnings as a table #14433

Closed
wwar opened this issue Jan 9, 2020 · 9 comments · Fixed by #22382
Closed

Collect and expose client errors/warnings as a table #14433

wwar opened this issue Jan 9, 2020 · 9 comments · Fixed by #22382
Labels
component/infoschema type/enhancement The issue or PR belongs to an enhancement. type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@wwar
Copy link

wwar commented Jan 9, 2020

Feature Request

I really like the work being done in metric_schema and information_schema :-) I was analyzing it today for the purposes of creating something similar to the SYS schema in MySQL.

SYS is designed to be a set of views that are task-oriented for DBAs (versus i_s and p_s which are just the raw metrics). The main deficiency is that I could not see any metrics related to either the user, or the client's host.

For example, in MySQL I can do this:

> select * from sys.host_summary\G
*************************** 1. row ***************************
                  host: localhost
            statements: 960
     statement_latency: 758.15 ms
 statement_avg_latency: 789.74 us
           table_scans: 28
              file_ios: 938
       file_io_latency: 92.93 ms
   current_connections: 2
     total_connections: 10
          unique_users: 2
        current_memory: 9.83 MiB
total_memory_allocated: 97.08 MiB
1 row in set (0.01 sec)

This helps me find out if I have one particular rogue host, perhaps because the application was accidentally unpatched. Or maybe I allow reporting users to remotely connect, and I have one that is doing naughty things? SYS is even not as detailed as it could be here. In MySQL 8.0 there are performance_schema metrics per host of each error code:

> select * from performance_schema.events_errors_summary_by_host_by_error where sum_error_raised > 0;
+-----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
| HOST      | ERROR_NUMBER | ERROR_NAME                               | SQL_STATE | SUM_ERROR_RAISED | SUM_ERROR_HANDLED | FIRST_SEEN          | LAST_SEEN           |
+-----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
| NULL      |         3719 | ER_DEPRECATED_UTF8_ALIAS                 | HY000     |               34 |                 0 | 2019-12-25 12:20:37 | 2019-12-25 12:20:37 |
| NULL      |         3778 | ER_WARN_DEPRECATED_UTF8MB3_COLLATION     | HY000     |               73 |                 0 | 2019-12-25 12:20:37 | 2019-12-25 12:20:37 |
| localhost |         1050 | ER_TABLE_EXISTS_ERROR                    | 42S01     |                1 |                 0 | 2019-12-25 18:54:13 | 2019-12-25 18:54:13 |
| localhost |         1146 | ER_NO_SUCH_TABLE                         | 42S02     |                1 |                 0 | 2020-01-09 10:36:02 | 2020-01-09 10:36:02 |
| localhost |         1159 | ER_NET_READ_INTERRUPTED                  | 08S01     |                2 |                 0 | 2019-12-26 02:54:20 | 2019-12-26 03:36:53 |
| localhost |         1287 | ER_WARN_DEPRECATED_SYNTAX                | HY000     |              340 |                 0 | 2019-12-25 18:54:12 | 2020-01-09 12:12:13 |
| localhost |         1305 | ER_SP_DOES_NOT_EXIST                     | 42000     |                1 |                 0 | 2019-12-25 12:27:09 | 2019-12-25 12:27:09 |
| localhost |         1585 | ER_NATIVE_FCT_NAME_COLLISION             | HY000     |                2 |                 0 | 2020-01-09 10:30:09 | 2020-01-09 12:07:18 |
| localhost |         1681 | ER_WARN_DEPRECATED_SYNTAX_NO_REPLACEMENT | HY000     |                2 |                 0 | 2020-01-09 10:30:09 | 2020-01-09 12:07:18 |
| localhost |         3554 | ER_NO_SYSTEM_TABLE_ACCESS                | HY000     |              278 |                 0 | 2019-12-25 12:20:38 | 2020-01-09 12:12:15 |
| localhost |         3719 | ER_DEPRECATED_UTF8_ALIAS                 | HY000     |                2 |                 0 | 2020-01-09 10:30:09 | 2020-01-09 12:07:18 |
+-----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
11 rows in set (0.01 sec)

.. this helps identify that their might be an unpatched client as well.

The other thing that MySQL has, is aggregations per-user. This is helpful for the case that I provide separate users for a multi-tenant database server (TiDB is used as a pool with many apps on it), or a single application schema with many clients. Some examples:

select * from sys.user_summary;
+------------+------------+-------------------+-----------------------+-------------+----------+-----------------+---------------------+-------------------+--------------+----------------+------------------------+
| user       | statements | statement_latency | statement_avg_latency | table_scans | file_ios | file_io_latency | current_connections | total_connections | unique_hosts | current_memory | total_memory_allocated |
+------------+------------+-------------------+-----------------------+-------------+----------+-----------------+---------------------+-------------------+--------------+----------------+------------------------+
| msandbox   |        800 | 512.04 ms         | 640.05 us             |          18 |      570 | 32.99 ms        |                   2 |                 5 |            1 | 4.23 MiB       | 65.17 MiB              |
| root       |         69 | 51.78 ms          | 750.37 us             |           4 |      123 | 15.90 ms        |                   0 |                 5 |            1 | 848.01 KiB     | 3.39 MiB               |
| background |          1 | 316.41 us         | 316.41 us             |           0 |    11615 | 375.21 ms       |                  42 |                47 |            0 | 54.19 MiB      | 72.11 MiB              |
+------------+------------+-------------------+-----------------------+-------------+----------+-----------------+---------------------+-------------------+--------------+----------------+------------------------+
3 rows in set (0.01 sec)
select * from performance_schema.users;
+----------+---------------------+-------------------+
| USER     | CURRENT_CONNECTIONS | TOTAL_CONNECTIONS |
+----------+---------------------+-------------------+
| NULL     |                  42 |                47 |
| msandbox |                   2 |                 5 |
| root     |                   0 |                 5 |
+----------+---------------------+-------------------+
3 rows in set (0.00 sec)
> select * from performance_schema.events_errors_summary_by_user_by_error where sum_error_raised > 0;
+----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
| USER     | ERROR_NUMBER | ERROR_NAME                               | SQL_STATE | SUM_ERROR_RAISED | SUM_ERROR_HANDLED | FIRST_SEEN          | LAST_SEEN           |
+----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
| NULL     |         3719 | ER_DEPRECATED_UTF8_ALIAS                 | HY000     |               34 |                 0 | 2019-12-25 12:20:37 | 2019-12-25 12:20:37 |
| NULL     |         3778 | ER_WARN_DEPRECATED_UTF8MB3_COLLATION     | HY000     |               73 |                 0 | 2019-12-25 12:20:37 | 2019-12-25 12:20:37 |
| msandbox |         1050 | ER_TABLE_EXISTS_ERROR                    | 42S01     |                1 |                 0 | 2019-12-25 18:54:13 | 2019-12-25 18:54:13 |
| msandbox |         1146 | ER_NO_SUCH_TABLE                         | 42S02     |                1 |                 0 | 2020-01-09 10:36:02 | 2020-01-09 10:36:02 |
| msandbox |         1159 | ER_NET_READ_INTERRUPTED                  | 08S01     |                1 |                 0 | 2019-12-26 02:54:20 | 2019-12-26 02:54:20 |
| msandbox |         1287 | ER_WARN_DEPRECATED_SYNTAX                | HY000     |              511 |                 0 | 2019-12-25 18:54:12 | 2020-01-09 12:18:07 |
| msandbox |         1305 | ER_SP_DOES_NOT_EXIST                     | 42000     |                1 |                 0 | 2019-12-25 12:27:09 | 2019-12-25 12:27:09 |
| msandbox |         1585 | ER_NATIVE_FCT_NAME_COLLISION             | HY000     |                2 |                 0 | 2020-01-09 10:30:09 | 2020-01-09 12:07:18 |
| msandbox |         1681 | ER_WARN_DEPRECATED_SYNTAX_NO_REPLACEMENT | HY000     |                2 |                 0 | 2020-01-09 10:30:09 | 2020-01-09 12:07:18 |
| msandbox |         3554 | ER_NO_SYSTEM_TABLE_ACCESS                | HY000     |              382 |                 0 | 2019-12-25 12:20:38 | 2020-01-09 12:18:09 |
| msandbox |         3719 | ER_DEPRECATED_UTF8_ALIAS                 | HY000     |                3 |                 0 | 2020-01-09 10:30:09 | 2020-01-09 12:17:17 |
| root     |         1159 | ER_NET_READ_INTERRUPTED                  | 08S01     |                1 |                 0 | 2019-12-26 03:36:53 | 2019-12-26 03:36:53 |
| root     |         1287 | ER_WARN_DEPRECATED_SYNTAX                | HY000     |               33 |                 0 | 2019-12-25 19:35:05 | 2019-12-25 19:35:05 |
| root     |         3554 | ER_NO_SYSTEM_TABLE_ACCESS                | HY000     |               16 |                 0 | 2019-12-25 19:35:05 | 2019-12-25 19:35:06 |
+----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
14 rows in set (0.03 sec)

I took a look at #13567 but couldn't see anything related to user/client statistics. Feel free to correct me if this is something planned that I missed :-)

Of course, the other thing missing to implement SYS is helper functions for formatting. This was mentioned already in #14211

@wwar wwar added the type/enhancement The issue or PR belongs to an enhancement. label Jan 9, 2020
@XuHuaiyu
Copy link
Contributor

PTAL
@lonng

@lonng
Copy link
Contributor

lonng commented Jan 14, 2020

@wwar Thank you very much for paying attention to this point. We implement the metric schema mainly to implement diagnostic rules. These diagnostic rules will soon begin to add, and if you are interested, you can get involved.

In addition, for the time being we do not have the same thing as sys, we consider adding a partially compatible view in the future.

@ghost ghost changed the title metric_schema omits client statistics Collect and expose client statistics as a table Aug 6, 2020
@ghost ghost added the type/feature-request Categorizes issue or PR as related to a new feature. label Aug 6, 2020
@zz-jason zz-jason added the feature/reviewing This feature request is reviewing by product managers label Aug 10, 2020
@zz-jason
Copy link
Member

@wwar It would be helpful if you can clearly specify the tables you want to add to tidb.

@ghost
Copy link

ghost commented Aug 12, 2020

Let me fill in the request. I would like to see the following tables in information schema:

CREATE TABLE client_errors_summary_by_host (
  `HOST` char(255) CHARACTER SET ascii COLLATE ascii_general_ci DEFAULT NULL,
  `ERROR_NUMBER` int DEFAULT NULL,
  `ERROR_NAME` varchar(64) DEFAULT NULL,
  `SQL_STATE` varchar(5) DEFAULT NULL,
  `SUM_ERROR_RAISED` bigint unsigned NOT NULL,
  `SUM_ERROR_HANDLED` bigint unsigned NOT NULL,
  `FIRST_SEEN` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  `LAST_SEEN` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  UNIQUE KEY `HOST` (`HOST`,`ERROR_NUMBER`)
);

CREATE TABLE `client_errors_summary_by_user` (
  `USER` char(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
  `ERROR_NUMBER` int DEFAULT NULL,
  `ERROR_NAME` varchar(64) DEFAULT NULL,
  `SQL_STATE` varchar(5) DEFAULT NULL,
  `SUM_ERROR_RAISED` bigint unsigned NOT NULL,
  `SUM_ERROR_HANDLED` bigint unsigned NOT NULL,
  `FIRST_SEEN` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  `LAST_SEEN` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  UNIQUE KEY `USER` (`USER`,`ERROR_NUMBER`)
);
CREATE TABLE `client_errors_summary_global` (
  `ERROR_NUMBER` int DEFAULT NULL,
  `ERROR_NAME` varchar(64) DEFAULT NULL,
  `SQL_STATE` varchar(5) DEFAULT NULL,
  `SUM_ERROR_RAISED` bigint unsigned NOT NULL,
  `SUM_ERROR_HANDLED` bigint unsigned NOT NULL,
  `FIRST_SEEN` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  `LAST_SEEN` timestamp NULL DEFAULT '0000-00-00 00:00:00',
  UNIQUE KEY `ERROR_NUMBER` (`ERROR_NUMBER`)
);

Here is an example output:

SELECT * FROM client_errors_summary_by_host WHERE sum_error_raised > 0;
+-----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
| HOST      | ERROR_NUMBER | ERROR_NAME                               | SQL_STATE | SUM_ERROR_RAISED | SUM_ERROR_HANDLED | FIRST_SEEN          | LAST_SEEN           |
+-----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
| NULL      |         3719 | ER_DEPRECATED_UTF8_ALIAS                 | HY000     |               34 |                 0 | 2020-08-10 21:10:11 | 2020-08-10 21:10:12 |
| NULL      |         3778 | ER_WARN_DEPRECATED_UTF8MB3_COLLATION     | HY000     |               73 |                 0 | 2020-08-10 21:10:11 | 2020-08-10 21:10:12 |
| localhost |         1046 | ER_NO_DB_ERROR                           | 3D000     |                5 |                 0 | 2020-08-12 10:29:50 | 2020-08-12 10:29:50 |
| localhost |         1050 | ER_TABLE_EXISTS_ERROR                    | 42S01     |                1 |                 0 | 2020-08-12 10:44:29 | 2020-08-12 10:44:29 |
| localhost |         1136 | ER_WRONG_VALUE_COUNT_ON_ROW              | 21S01     |                1 |                 0 | 2020-08-12 10:44:32 | 2020-08-12 10:44:32 |
| localhost |         1159 | ER_NET_READ_INTERRUPTED                  | 08S01     |                1 |                 0 | 2020-08-11 05:33:44 | 2020-08-11 05:33:44 |
| localhost |         1287 | ER_WARN_DEPRECATED_SYNTAX                | HY000     |              289 |                 0 | 2020-08-10 21:10:22 | 2020-08-12 11:52:25 |
| localhost |         1292 | ER_TRUNCATED_WRONG_VALUE                 | 22007     |                4 |                 0 | 2020-08-12 10:07:46 | 2020-08-12 10:45:57 |
| localhost |         1298 | ER_UNKNOWN_TIME_ZONE                     | HY000     |                2 |                 0 | 2020-08-12 10:44:29 | 2020-08-12 10:44:37 |
| localhost |         1681 | ER_WARN_DEPRECATED_SYNTAX_NO_REPLACEMENT | HY000     |                3 |                 0 | 2020-08-10 21:10:19 | 2020-08-10 21:33:58 |
| localhost |         3554 | ER_NO_SYSTEM_TABLE_ACCESS                | HY000     |               98 |                 0 | 2020-08-10 21:10:22 | 2020-08-12 11:56:46 |
+-----------+--------------+------------------------------------------+-----------+------------------+-------------------+---------------------+---------------------+
11 rows in set (0.01 sec)

(Note: This is modeled on the performance_schema tables in MySQL, but for compatibility TiDB leaves performance_schema empty, and adds information_schema tables. I'm fine with that.)

@lonng
Copy link
Contributor

lonng commented Aug 13, 2020

Good idea.
@nullnotnil Do we need to persistent this summary?

@ghost
Copy link

ghost commented Aug 13, 2020

Nope, it can have similar semantics to STATEMENTS_SUMMARY / STATEMENTS_SUMMARY_HISTORY where it ages out eventually. It would also be useful to have an SQL command FLUSH CLIENT_ERRORS_SUMMARY.

But that's basically it. Adding this can also help cleanup some of the current log spam mentioned in #19053

@lonng
Copy link
Contributor

lonng commented Aug 14, 2020

@nullnotnil Do we need to add a frontend page in Dashboard for them? just like STATEMENTS_SUMMARY.

@ghost
Copy link

ghost commented Aug 14, 2020

Yes. There are a few problems you can find with this - for example:

  • You can find regressions introduced from a new version of an app deployed (where MySQL behavior is to return a warning which was ignored.. but some warnings are super dangerous to ignore).

  • There might be configuration errors. Logically restoring data from a system that had strict mode disabled, but with it enabled you can see clients are getting errors.

  • You update login credentials in your app, but missed a server. The host will show up as the one with errors.

cc @breeswish

@scsldb scsldb added this to the Requirement pool milestone Sep 22, 2020
@ghost ghost self-assigned this Oct 1, 2020
@ghost ghost removed the feature/reviewing This feature request is reviewing by product managers label Oct 30, 2020
@ghost ghost removed their assignment Dec 15, 2020
@morgo morgo changed the title Collect and expose client statistics as a table Collect and expose client errors/warnings as a table Jan 12, 2021
@breezewish
Copy link
Member

Is it also possible to provide a column, or a table, to map error types, so that DBAs can distinguish whether this error is a client mistake (like syntax errors), or an infra problem (like region unavailable)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/infoschema type/enhancement The issue or PR belongs to an enhancement. type/feature-request Categorizes issue or PR as related to a new feature.
Projects
None yet
7 participants