Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BugFix: Add ImmutableCFOption disable_preload_pinning option #8463

Open
wants to merge 41 commits into
base: main
Choose a base branch
from

Conversation

matthewvon
Copy link
Contributor

@matthewvon matthewvon commented Jun 25, 2021

This PR is to address #8397 and an addition use case.

There are two use cases handled by this PR:

  1. Memory management:
    VersionBuilder::Rep::LoadTableHandlers() currently pins 25% of the max_open_files table_readers within the table_cache. Over time these table readers become irrelevant and just chew up memory and disk space. It also creates a very bad environment for users that are manipulating max_open_files dynamically as part of a general memory management scheme. (This scenario is what brought this problem to our attention.) kFilePreloadWithoutPinning will prevent these files from becoming "stuck" within the table cache.

  2. Dynamic corruption recovery:
    Even if pinning by LoadTableHandlers is disabled, the normal database open sequence attempts to open .sst files. If one of those files happens to be corrupt, the database open fails. We have incorporated regular column family checkpoints within our application. It is possible for us to automatically recover to saved checkpoints if and only if the database starts. kFilePreloadDisabled allows the database to open without tripping over corrupted .sst files. Application code now has the ability to detect and correct without user intervention.

The new option is FilePreload file_preload. Its default, kFilePreloadWithPinning, matches current behavour.

@matthewvon
Copy link
Contributor Author

matthewvon commented Aug 17, 2021

Today, beginning work to convert disable_preload_pinning to a 3 way selection:

kFilePreloadWithPinning
kFilePreloadWithoutPinning
kFilePreloadDisabled

This will impact both LoadTableHandlers() and MaybeInitializeFileMetaData() during DB::Open() calls.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes all seem reasonable to me but I am not an expert in this area.

What is missing is showing some scenario/test that bad things go awry with pinnings enabled that pinnings disabled will allow to pass. Is it possible to add some test cases that show a problem that this resolves that does not crash?

@matthewvon
Copy link
Contributor Author

matthewvon commented Sep 24, 2021

This issue has escalated internally. I have two day’s work on another feature then back to this. We previously added GetIntProperties() access to table_cache. Should be able to prove / disprove my claims easily.

Thank you for asking.

@matthewvon
Copy link
Contributor Author

@mrambacher I have finished making the code changes and building a unit test within version_builder_test.cc (last part of the file). Looks like I have some merging required. But I am otherwise code complete.

@matthewvon
Copy link
Contributor Author

@mrambacher @pdillinger The Angel of Software has smiled on me. CircleCI has finally passed this PR. There are a bunch of unrelated clang-format/whitespace changes that Circle demanded. My apologies. There is also a fix to configurable.cc that is again unrelated but blocked CircleCI (serialization of a zero length vector was suppressed, so upon reload said vector did not replace the default value ... fun).

There is a unit test at the bottom of version_builder_test.cc that shows the successful rocksdb Open even with corrupt .sst file ... which is the whole point.

Thanks tons, Matthew

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why there are so many format changes in files/areas that you did not touch?

} else {
// do not accumulate failed files
// (could be hundreds with bad options code)
(void)env_->DeleteFile(file_name).ok();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than ok(), please use PermitUncheckedError()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

files_meta.emplace_back(file_meta, level);
statuses.emplace_back(Status::OK());
Status ret;
if (0 < max_load) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (max_load > 0) would be clearer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (!file_meta->table_reader_handle) {
files_meta.emplace_back(file_meta, level);
statuses.emplace_back(Status::OK());
Status ret;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little worried about "Status ret" being moved so far from its setting/use. It seems likely that something will change where we return early and get an unchecked Status. Can the code be restructured (maybe move the files_meta/status outside the if ()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have shifted the code such that the lambda function is away from the threaded preload processing / loops. Added a couple of minor comments. My understanding of the unchecked status stuff is that the naked "Status ret;" will not be seen as unchecked since it is a return value. The caller will need to perform the check.

// If the file has been opened before, just skip it.
if (!file_meta->table_reader_handle) {
files_meta.emplace_back(file_meta, level);
statuses.emplace_back(Status::OK());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is part of the old code, but is this necessary? Seems like the values will be overwritten in the loop below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code depends upon the vector being properly sized when used later since multiple threads access the vector, though at different indexes. I would say this is "code safety" and/or "thread safety".

100U, 0, 100, 100,
kEntriesPerFile, kDeletionsPerFile,
(i < kTotalSamples));
ToString((i + 100) * 1000 + 999).c_str(), 100U, 0, 100, 100,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might just be me but this ToString (and the earlier one) look dangerous to me. Aren't we doing a c_str() on something that is about to be destroyed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is existing code that got reformatted by clang-format. I will change if required for acceptance, but noting "not my code".


TEST_F(FilePreloadTest, PreloadCaching) {
// create a DB with 3 files
// the day.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the day?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No clue how that got in there ... both instances removed.

Comment on lines 1737 to 1743
ASSERT_TRUE(WriteStringToFile(db_->GetEnv(), garbage, fail_file, true).ok());

ASSERT_FALSE(TryReopen(new_options).ok())
<< "reopen should fail with corrupted .sst";

new_options = GetOptions(kPreloadDisabled);
ASSERT_TRUE(TryReopen(new_options).ok())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change these to ASSERT_OK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -35,6 +35,7 @@ enum class OptionType {
kCompactionPri,
kCompressionType,
kCompactionStopStyle,
kFilePreload,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this type needed or can this just be an Enum? I rather not see new types added if not required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrambacher I will attempt the kEnum. But I want to point out that there is no current option using that type. Leaves me unsettled. I basically copied the code for the kCompaction* as a model previously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I have found the correct example code and implemented equivalently. All accessory code to my previous method is eliminated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +611 to +612
} else if (!value.empty() ||
(opt_info.IsVector() && opt_info.IsMutable())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this change has to do with the rest of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to do with making the CircleCI complete. There is a test that randomizes option values and then attempts to serialize and parse. It started putting zero has number of levels, num_levels, consistently. That demonstrated that a zero length vector did not work.

My goal was to simply get the test to pass. The alternate fix is to change the random number for num_levels from 0 through 99 to be 1 through 100. But again it has nothing to do with my change ... like some of the white space only files.

@matthewvon
Copy link
Contributor Author

@mrambacher Wait. My double check revealed that one concern about Status ret was not addressed. Apologies. Also, happy to fix configurable.cc via the options randomizer as discussed. But one of the two solutions is needed to fix serialize/parser options sequence failure with zero length vector ... or new code to better identify zero length vector on both serialization and parsing.

@matthewvon
Copy link
Contributor Author

@mrambacher I think the code is ready for another round of review. The one failing test makes no sense. Did not fail previously and is not failing locally. Likely next push will find it working ... I do not have clearance to force a retry of just that test.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, this looks okay. I think we need an explanation somewhere of why those files never get removed from the table cache and should understand why that is a good thing.

Some of the formats look odd. Please check that your format file is up-to-date. Also, some files appear to only have format changes. You might try to revert those to the original and see if the format stuff comes back (hopefully it does not).

Comment on lines 820 to 825
std::unordered_map<std::string, FilePreload>
OptionsHelper::file_preload_string_map = {
{"kFilePreloadWithPinning", FilePreload::kFilePreloadWithPinning},
{"kFilePreloadWithoutPinning", FilePreload::kFilePreloadWithoutPinning},
{"kFilePreloadDisabled", FilePreload::kFilePreloadDisabled}};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This map does not need to be part of OptionsHelper and can be static within cf_options.cc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

tools/ldb_cmd.cc Outdated
}
//cretaing variables for row count of each type
// cretaing variables for row count of each type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Creating". I know, not your change...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed.

@matthewvon
Copy link
Contributor Author

@mrambacher Any way to get this PR back on track?

You asked "I think we need an explanation somewhere of why those files never get removed from the table cache and should understand why that is a good thing."

Where would that comment go?

@matthewvon
Copy link
Contributor Author

Oh god. All checks have passed. I am caught completely unprepared and without experience as to how to continue.

@mrambacher
Copy link
Contributor

@siying Can you review this as well? It looks fine to me but am not sure why the original code behaved the way it did and if this is the best alternative

@facebook-github-bot
Copy link
Contributor

@mrambacher has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@siying
Copy link
Contributor

siying commented Feb 18, 2022

Can you explain what the PR does in the summary? It's not clear to me.

@siying
Copy link
Contributor

siying commented Feb 18, 2022

The background of the 1/4 pinning threshold is that we hope when users set a very large options.max_open_files that is far larger than number of files in the DB, we hope they don't have to pay the overhead of cache access in table cache. If it ends up with a problem with a setup where number of files is actually larger than options.max_open_files, the ideal solution is to come up with a way that this scenario is detected and the 1/4 pinning is disabled. We always try to avoid yet another options, and if we can make the system adaptive to avoid it, it will be the best scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants