
Add audit tool to extract inconsistencies between users and buckets #1202

Merged
4 commits merged on Jul 31, 2015


@shino (Contributor) commented Jul 30, 2015

Inconsistencies can arise between the users bucket moss.users
and the buckets bucket moss.buckets because no transaction is (or can be) used to update both.

This PR adds a tool for detecting inconsistencies between them.
There are three kinds of inconsistencies:

  1. A bucket object in the buckets bucket is not tracked in the value
    of its owner's user object. ({not_tracked, {User, Bucket}})
  2. A user believes it owns a bucket, but no object for that bucket
    exists in the buckets bucket. ({no_bucket_object, {User, Bucket}})
  3. A user believes it owns a bucket, but the bucket is actually
    owned by a different user. ({different_user, {User, Bucket, DifferentUser}})
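The three categories can be derived by comparing two views: the {User, Bucket} pairs claimed by user objects, and the Bucket -> Owner mapping recorded in bucket objects. The following is a minimal sketch, not the PR's implementation: it assumes both views have already been loaded into memory as plain lists (the real tool reads moss.users and moss.buckets from Riak), and the module name `bucket_audit_sketch`, the function `audit/2`, and the input shapes are all hypothetical.

```erlang
-module(bucket_audit_sketch).
-export([audit/2]).

%% Hypothetical input shapes (assumptions, not Riak CS record formats):
%%   Users   :: [{UserKey :: binary(), [BucketName :: binary()]}]
%%   Buckets :: [{BucketName :: binary(), OwnerKey :: binary()}]
audit(Users, Buckets) ->
    %% Every {User, Bucket} pair that some user object claims to own.
    Tracked = gb_sets:from_list([{U, B} || {U, Bs} <- Users, B <- Bs]),
    %% Bucket name -> owner, as recorded in the buckets bucket.
    Owners = dict:from_list(Buckets),
    %% 1. Bucket objects whose recorded owner does not list the bucket.
    NotTracked = [{not_tracked, {Owner, B}}
                  || {B, Owner} <- Buckets,
                     not gb_sets:is_member({Owner, B}, Tracked)],
    %% 2 and 3. Buckets a user lists that are missing or owned by someone else.
    FromUsers =
        gb_sets:fold(
          fun({U, B}, Acc) ->
                  case dict:find(B, Owners) of
                      error       -> [{no_bucket_object, {U, B}} | Acc];
                      {ok, U}     -> Acc;  %% consistent: owner matches
                      {ok, Other} -> [{different_user, {U, B, Other}} | Acc]
                  end
          end, [], Tracked),
    FromUsers ++ NotTracked.
```

With this classification, one misassigned bucket can produce two entries: a different_user for the user who lists it and a not_tracked for the recorded owner. The bob2 entries in the sample output below show the same pattern.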

Sample execution of the console command, and the log output it produces:

% dev/dev1/bin/riak-cs-admin audit-bucket-ownership
[{no_bucket_object,{<<"T_RNSHIN22TIJ1MKF8BP">>,<<"corrupt">>}},
 {different_user,{<<"N4ADKYOVGZLKL5RVD0MA">>,<<"bob2">>,
                  <<"T_RNSHIN22TIJ1MKF8BP">>}},
 {not_tracked,{<<"T_RNSHIN22TIJ1MKF8BP">>,<<"missing-in-list">>}},
 {not_tracked,{<<"T_RNSHIN22TIJ1MKF8BP">>,<<"bob2">>}},
 {not_tracked,{<<"N4ADKYOVGZLKL5RVD0MA">>,<<"missing-in-bobs-list">>}}]
16:54:20.475 [info] Bucket is not tracked by user: {User, Bucket} = {N4ADKYOVGZLKL5RVD0MA, missing-in-bobs-list}
16:54:20.475 [info] Bucket is not tracked by user: {User, Bucket} = {T_RNSHIN22TIJ1MKF8BP, bob2}
16:54:20.475 [info] Bucket is not tracked by user: {User, Bucket} = {T_RNSHIN22TIJ1MKF8BP, missing-in-list}
16:54:20.475 [info] Bucket is owned by different user: {User, Bucket, DifferentUser} = {N4ADKYOVGZLKL5RVD0MA, bob2, T_RNSHIN22TIJ1MKF8BP}
16:54:20.475 [info] Bucket does not exist: {User, Bucket} = {T_RNSHIN22TIJ1MKF8BP, corrupt}

This PR is in request-for-comment state. Any comments on categorization,
log format, command-line output, implementation, etc. are appreciated.

@shino added this to the 2.1.0 milestone Jul 30, 2015
@kuenishi (Contributor)

Note: this work is for #1161 (RCS-211).

{ok, RcPid} = riak_cs_riak_client:start_link([]),
try
    Inconsistencies = audit_bucket_ownership0(RcPid),
    io:format("~p", [Inconsistencies])
Contributor

Adding ~n would make this look better. Also, a message like "No inconsistency found." would be more informative when Inconsistencies is just [].

@kuenishi (Contributor)

It worked:

$ curl -XPUT  'http://localhost:8098/buckets/moss.buckets/keys/test' -H "X-Riak-Vclock: a85hYGBgzGDKBVI8ypz/fp7QY+KCCCUy5rEyKCjLnOfLAgA=" -d "sub-key"
$ rel/riak-cs/bin/riak-cs-admin audit-bucket-ownership
[{different_user,{<<"admin-key">>,<<"test">>,<<"sub-key">>}},
 {not_tracked,{<<"sub-key">>,<<"test">>}}]

We could follow this up with some pretty-printing or documentation.

Inconsistencies0 =
    gb_sets:fold(
      fun ({U, B}, Acc) ->
              lager:info(
Contributor

It would also be nice to output these to the console.

Contributor Author

As discussed in chat, console output may be very noisy when there is a large number of inconsistencies.

- Explicitly output "no inconsistencies"
- For tuple output, add a period after each term so the output can be consulted afterward
  (e.g. with file:consult/1), and a line break for human friendliness
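The two output rules above can be sketched as a small formatting helper. This is a hypothetical illustration, not the code merged in this PR: the module `audit_output_sketch` and its `format/1` and `print/1` functions are invented names. Each inconsistency is printed with "~p.~n", which makes the non-empty output a sequence of period-terminated Erlang terms that file:consult/1 can read back.

```erlang
-module(audit_output_sketch).
-export([format/1, print/1]).

%% Hypothetical helper: render an inconsistency list for the console.
format([]) ->
    %% Say so explicitly instead of printing a bare [].
    "No inconsistencies found.\n";
format(Inconsistencies) ->
    %% One term per line, each terminated by a period, so the text
    %% can be saved and read back with file:consult/1.
    lists:flatten([io_lib:format("~p.~n", [I]) || I <- Inconsistencies]).

print(Inconsistencies) ->
    io:put_chars(format(Inconsistencies)).
```

The empty-case message is plain text rather than a term, so a script consulting saved output should check for that message first.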
@shino (Contributor Author) commented Jul 31, 2015

Thanks for the review 😄 Updated.

borshop added a commit that referenced this pull request Jul 31, 2015
Add audit tool to extract inconsistencies between users and buckets

Reviewed-by: kuenishi
@shino (Contributor Author) commented Jul 31, 2015

@borshop merge

@shino (Contributor Author) commented Jul 31, 2015

For the release notes: please see the description at the top.

@borshop merged commit bd00b41 into develop Jul 31, 2015
@shino deleted the feature/user-bucket-consistency branch July 31, 2015 06:13
@shino (Contributor Author) commented Jul 31, 2015

Micro-benchmark with 100,000 users, 27,000 buckets, and a small number of inconsistencies:

% time ./dev/dev1/erts-5.10.3/bin/nodetool -name 'rcs-dev1@127.0.0.1' -setcookie riak rpc_infinity riak_cs_console audit_bucket_ownership
[{no_bucket_object,{<<"T_RNSHIN22TIJ1MKF8BP">>,<<"corrupt">>}},
 {different_user,{<<"N4ADKYOVGZLKL5RVD0MA">>,<<"bob2">>,
                  <<"T_RNSHIN22TIJ1MKF8BP">>}},
 {not_tracked,{<<"T_RNSHIN22TIJ1MKF8BP">>,<<"missing-in-list">>}},
 {not_tracked,{<<"T_RNSHIN22TIJ1MKF8BP">>,<<"bob2">>}},
 {not_tracked,{<<"N4ADKYOVGZLKL5RVD0MA">>,<<"missing-in-bobs-list">>}}].
./dev/dev1/erts-5.10.3/bin/nodetool -name 'rcs-dev1@127.0.0.1' -setcookie ria  0.28s user 0.11s system 0% cpu 2:18.44 total

@shino (Contributor Author) commented Jul 31, 2015

In my local environment (a single-node cluster with ring size 8, which is not ideal for benchmarking), listing keys took only a few seconds, so fetching every object serially seems to be the current bottleneck. One more point: the Riak CS node consumed about 300% CPU, which feels strange. Further investigation and optimization may follow in another ticket or issue.

3 participants