Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a prefix index to filecache.path, attempt 2 #28541

Merged
merged 2 commits into from
Oct 19, 2021
Merged

Conversation

icewind1991
Copy link
Member

Second attempt at #26070, this time with a shorter prefix to hopefully not have the same issues.


The reason that filecache.path hasn't had an index added is the mysql limitation of ~1kb for indexeded fields,
which is to small for the path, however mysql supports indexing only the first N bytes of a column instead of the entire column,
allowing us to add an index even if the column is to long.

Because the index doesn't cover the entire column it can't be used in all situations where a normal index would be used, but it does cover the path like 'folder/path/%' queries that are used in various places.

Sqlite and Postgresql don't support prefix indexes, but they also don't have the 1kb limit and DBAL handles the differences in index creation.

Signed-off-by: Robin Appelman robin@icewind.nl

@icewind1991 icewind1991 added the 3. to review Waiting for reviews label Aug 20, 2021
@icewind1991 icewind1991 added this to the Nextcloud 23 milestone Aug 20, 2021
@nickvergessen
Copy link
Member

Can't remember completely why it failed back then. Maybe we can ask people to setup the index manually on several instances to not fail again.

@icewind1991 icewind1991 force-pushed the path-prefix-index2 branch 2 times, most recently from 0717cbf to c35470b Compare August 24, 2021 17:17
Copy link
Member

@skjnldsv skjnldsv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is good, no idea about what @nickvergessen was referring to. Let's wait for another review :)

@nickvergessen
Copy link
Member

no idea about what nickvergessen was referring to. Let's wait for another review :)

It 💥 on several servers (including c.nc.c iirc) back then (when attempt one was done). We can try it again, but in case it breaks again we have to revert it again.

@icewind1991
Copy link
Member Author

After some more testing I found that postgresql requires setting some additional options ("operator class") on the index before it can be used for LIKE pattern matching when using a non C locale.

Since dbal does not currently support setting these options I've disabled creating the index on pgsql for now so we can more easily add the correct index later and we don't get stuck with a useless index.

@skjnldsv
Copy link
Member

Any news? :)

@nickvergessen
Copy link
Member

nickvergessen commented Sep 22, 2021

Seems it's helping lot of times, but e.g. with groupfolders the index is not used:

EXPLAIN SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `metadata_etag`, `creation_time`, `upload_time` FROM
`oc_filecache` `file` LEFT JOIN `oc_filecache_extended` `fe` ON `file`.`fileid` = `fe`.`fileid` WHERE (`storage` = 330) AND (((`path`  COLLATE utf8mb4_general_ci LIKE '__groupfolders/22/%') OR (`path_hash` = '')) AND (`name`  COLLATE u
tf8mb4_general_ci LIKE '%…%')) ORDER BY `mtime` desc LIMIT 5;
+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+----------+---------+---------------------+------+-------------+
| id   | select_type | table | type   | possible_keys                                                                                       | key      | key_len | ref                 | rows | Extra       |
+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+----------+---------+---------------------+------+-------------+
|    1 | SIMPLE      | file  | index  | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | fs_mtime | 8       | NULL                |   76 | Using where |
|    1 | SIMPLE      | fe    | eq_ref | PRIMARY                                                                                             | PRIMARY  | 8       | db_biss.file.fileid |    1 | Using where |
+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+----------+---------+---------------------+------+-------------+

@skjnldsv
Copy link
Member

skjnldsv commented Oct 8, 2021

Applied on NC production. 1 sec per 100k rows in the filecache. Expecting minimal downtime then.
So far so good!

@icewind1991
Copy link
Member Author

@nickvergessen make sure that you have #28476 and #28608 applied

The reason that `filecache.path` hasn't had an index added is the mysql limitation of ~1kb for indexeded fields,
which is to small for the `path`, however mysql supports indexing only the first N bytes of a column instead of the entire column,
allowing us to add an index even if the column is to long.

Because the index doesn't cover the entire column it can't be used in all situations where a normal index would be used, but it does cover the `path like 'folder/path/%'` queries that are used in various places.

Sqlite and Postgresql don't support prefix indexes, but they also don't have the 1kb limit and DBAL handles the differences in index creation.

Signed-off-by: Robin Appelman <robin@icewind.nl>
having the index work properly for the queries we need it for requires some additional options which dbal does not support at the momement.
to prevent making it harder to add the correct index later on we don't create the index for now on postgresql

Signed-off-by: Robin Appelman <robin@icewind.nl>
@skjnldsv
Copy link
Member

Let's merge? and backport to 20! 🚀

@skjnldsv
Copy link
Member

/backport to stable22

@skjnldsv
Copy link
Member

/backport to stable21

@skjnldsv
Copy link
Member

/backport to stable20

@skjnldsv skjnldsv mentioned this pull request Oct 13, 2021
@icewind1991
Copy link
Member Author

Let's merge?

Counting as approval, merging

@backportbot-nextcloud
Copy link

The backport to stable20 failed. Please do this backport manually.

@solracsf
Copy link
Member

Some feedback about this PR here: #29377

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants