Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make all kind of timeouts hitting Riak configurable #1021

Merged
merged 6 commits into from
Dec 10, 2014

Conversation

kuenishi
Copy link
Contributor

This branch still has an easy bug, but sending it out to keep tracking. I'd be more than happy if somebody takes this (or it's also fine to omit and start a completely new work).

@kellymclaughlin
Copy link
Contributor

Can you elaborate on the "easy bug" you mentioned in the desription?

@kuenishi
Copy link
Contributor Author

kuenishi commented Dec 3, 2014

I thought I had seen so many existing riak_tests failed and believed there should be an easy bug. But today I ran riak_test and then found just one test failing. I apologise about my bad memory if it isn't easy in advance.

10:57:54.007 [warning] gc_tests failed: {{badmatch,["/home/kuenishi/rt/riak_cs/current/dev/dev1/lib/riak_cs-1.5.2-5-g9827939/priv/tools/repair_gc_bucket.erl","/home/kuenishi/rt/riak_cs/current/dev/dev1/lib/riak_cs/priv/tools/repair_gc_bucket.erl"]},[{rtcs,repair_gc_bucket,2,[{file,"riak_test/src/rtcs.erl"},{line,565}]},{gc_tests,repair_gc_bucket,1,[{file,"riak_test/tests/gc_tests.erl"},{line,157}]},{gc_tests,'-confirm/0-lc$^0/1-0-',1,[{file,"riak_test/tests/gc_tests.erl"},{line,52}]},{gc_tests,confirm,0,[{file,"riak_test/tests/gc_tests.erl"},{line,52}]}]}
10:57:54.007 [error] 
================ gc_tests failure stack trace =====================
{{badmatch,["/home/kuenishi/rt/riak_cs/current/dev/dev1/lib/riak_cs-1.5.2-5-g9827939/priv/tools/repair_gc_bucket.erl",
            "/home/kuenishi/rt/riak_cs/current/dev/dev1/lib/riak_cs/priv/tools/repair_gc_bucket.erl"]},
 [{rtcs,repair_gc_bucket,2,[{file,"riak_test/src/rtcs.erl"},{line,565}]},
  {gc_tests,repair_gc_bucket,1,
            [{file,"riak_test/tests/gc_tests.erl"},{line,157}]},
  {gc_tests,'-confirm/0-lc$^0/1-0-',1,
            [{file,"riak_test/tests/gc_tests.erl"},{line,52}]},
  {gc_tests,confirm,0,[{file,"riak_test/tests/gc_tests.erl"},{line,52}]}]}

- All configuration items gathered into `riak_cs_config:*_timeout/0`
- Defaults varies from 5 to 60 seconds according to old code
- New items has defaults by 60 seconds defined in `riak_cs.hrl`
- You can set all configurations at once by setting `riakc_timeouts`
  in `riak_cs` section of `app.config`
- Riak client calls that stem from manual operations does not have
  timeout values configurable

 #Please enter the commit message for your changes. Lines starting
@kellymclaughlin
Copy link
Contributor

Ok I think I've squashed all the issues causing test failures. I'll reverify tomorrow and probably push a rebased version of this branch that makes dialyzer happy and addresses the issues.

@kellymclaughlin kellymclaughlin force-pushed the feature/gh1008-configurable-timeouts branch from e0e3b02 to 40389f2 Compare December 8, 2014 21:24
@kellymclaughlin
Copy link
Contributor

All tests are passing so I'll give this one more riak_test run tomorrow to double-check it and then try to get it merged. In the meantime, happy for any feedback on my commits, but not strictly necessary.

@@ -100,7 +100,8 @@ sum_bucket(BucketName) ->
ok = riak_cs_riak_client:set_bucket_name(RcPid, BucketName),
{ok, ManifestPbc} = riak_cs_riak_client:manifest_pbc(RcPid),
ManifestBucket = riak_cs_utils:to_bucket_name(objects, BucketName),
case riakc_pb_socket:mapred(ManifestPbc, ManifestBucket, Query) of
Timeout = riak_cs_config:storage_calc_timeout(),
case riakc_pb_socket:mapred(ManifestPbc, ManifestBucket, Query, Timeout) of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This very line causes this assertion in the cs743_regression_test to fail which is actually a good thing because it means the calculation is actually returning instead of timing out. I tried even up to 10000 objects and the test still passed. I will update to the test to expect success now instead of timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#743 and #759 were fix for crash caused by timeout. So the assertion here should be ?assertEqual to check calculation has successfully detected the timeout without crashing. This timeout setting which is not working any more should be changed to storage_calc_timeout. wrong place

The increase in the default timeout for the mapreduce query used to
retrieve storage calculation results causes the previous expectation
of timeout in this test to no longer be valid.  Update the test
assertion to check that the response is not a timeout instead.
@kellymclaughlin
Copy link
Contributor

CI testing looks good and all the riak_tests are passing for me. I'll defer merging this until I get in tomorrow in case there is any feedback on my commits from today.

?assertEqual(<<"{error,{timeout,[]}}">>, ErrorStr),
ResultStr ->
?assert(not is_integer(ResultStr)),
?assertNotEqual(<<"{error,{timeout,[]}}">>, ResultStr),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#743 and #759 were fix for crash caused by timeout. So the assertion here should be ?assertEqual to check calculation has successfully detected the timeout without crashing. This timeout setting which is not working any more should be changed to storage_calc_timeout.

@kuenishi
Copy link
Contributor Author

Changes except the one I commented above are all awesome, thanks @kellymclaughlin !

@@ -353,7 +353,7 @@ mark_manifests(RiakObject, Bucket, Key, UUIDsToMark, ManiFunction, RcPid) ->
%% with vector clock. This allows us to do a PUT
%% again without having to re-retrieve the object
{ok, ManifestPbc} = riak_cs_riak_client:manifest_pbc(RcPid),
riak_cs_pbc:put(ManifestPbc, UpdObj, [return_body]).
riak_cs_pbc:put(ManifestPbc, UpdObj, [return_body], riak_cs_config:get_gckey_timeout()).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put manifest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, riak_cs_config:put_manifest_timeout() might be more suitable here.

@shino
Copy link
Contributor

shino commented Dec 10, 2014

Only one tiny comment from me. This feature will be great help for production environment! Nice work @kuenishi @kellymclaughlin .

@@ -493,9 +492,9 @@ make_bkeys(ManifestBucketName, Keys) ->
-spec send_map_reduce_request(riak_client(), list()) -> streaming_req_response().
send_map_reduce_request(RcPid, BKeyTuples) ->
%% TODO: change this:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this comment any more because TODO has been done here?

borshop added a commit that referenced this pull request Dec 10, 2014
Make all kind of timeouts hitting Riak configurable

Reviewed-by: kellymclaughlin
@kellymclaughlin
Copy link
Contributor

@borshop merge

@borshop borshop merged commit 0f2c354 into release/1.5 Dec 10, 2014
@shino shino added this to the 1.5.3 milestone Dec 11, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants