db_stress: db_stress segmentation fault #9175

Open
Wants to merge 1 commit into base: main

Conversation

@aravind-wdc (Contributor) commented Nov 16, 2021

db_stress seg faults with the commands below.

"rm -rf /tmp/rocksdbtest*; sudo ./db_stress --ops_per_thread=1000 --reopen=5 --fs_uri=zenfs://dev:nvmeXnY"
"rm -rf /tmp/rocksdbtest*; db_stress --ops_per_thread=1000 --reopen=5"
=======================================
Error opening unique id file for append: IO error: No such file or directory:
 While open a file for appending: /tmp/rocksdbtest-0/dbstress/.unique_ids:
 No such file or directory
Choosing random keys with no overwrite
Creating 2621440 locks
Starting continuous_verification_thread
2021/11/15-08:46:49  Initializing worker threads
2021/11/15-08:46:49  Starting database operations
2021/11/15-08:46:49  Reopening database for the 1th time
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
DB path: [/tmp/rocksdbtest-0/dbstress]
Segmentation fault
=======================================

db_stress is also broken for custom filesystems such as ZenFS (a RocksDB plugin).

This happens on a new database: UniqueIdVerifier's constructor tries to read the
".unique_ids" file, and if the file is not present, data_file_writer_ is left as
an invalid (null) pointer. Subsequent calls (~UniqueIdVerifier()
and UniqueIdVerifier::Verify()) then dereference this null pointer and crash.

This patch uses the filesystem interface from the relevant environment and creates
the .unique_ids file if it does not exist, so that data_file_writer_
points to a valid file and is never null. This also fixes the
issue for custom filesystems.
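
For illustration, here is a minimal sketch of that approach (the helper name EnsureUniqueIdsFile and the exact error handling are assumptions, not the PR diff): obtain the FileSystem from the Env in use, create ".unique_ids" if it is missing, and only then open it for append.

// Sketch only: ensure ".unique_ids" exists on the Env's own FileSystem so that
// data_file_writer_ can always be backed by a valid file, including on custom
// filesystems such as ZenFS.
#include <memory>
#include <string>

#include "rocksdb/env.h"
#include "rocksdb/file_system.h"

using ROCKSDB_NAMESPACE::Env;
using ROCKSDB_NAMESPACE::FileOptions;
using ROCKSDB_NAMESPACE::FileSystem;
using ROCKSDB_NAMESPACE::FSWritableFile;
using ROCKSDB_NAMESPACE::IOOptions;
using ROCKSDB_NAMESPACE::IOStatus;

IOStatus EnsureUniqueIdsFile(Env* env, const std::string& db_name,
                             std::unique_ptr<FSWritableFile>* out) {
  const std::shared_ptr<FileSystem>& fs = env->GetFileSystem();
  const std::string path = db_name + "/.unique_ids";
  IOStatus s = fs->FileExists(path, IOOptions(), /*dbg=*/nullptr);
  if (s.IsNotFound()) {
    // Create an empty file first so the append-style open below can succeed.
    std::unique_ptr<FSWritableFile> wf;
    s = fs->NewWritableFile(path, FileOptions(), &wf, /*dbg=*/nullptr);
    if (!s.ok()) {
      return s;
    }
    s = wf->Close(IOOptions(), /*dbg=*/nullptr);
    if (!s.ok()) {
      return s;
    }
  }
  // Open for append; the caller can wrap *out in a WritableFileWriter.
  return fs->ReopenWritableFile(path, FileOptions(), out, /*dbg=*/nullptr);
}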

Signed-off-by: Aravind Ramesh <Aravind.Ramesh@wdc.com>

@aravind-wdc (Contributor, Author) commented Nov 16, 2021

(gdb) bt
#0  rocksdb::UniqueIdVerifier::~UniqueIdVerifier (this=0x7ffff739d1b8, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:360
#1  0x0000555555641a5e in rocksdb::DbStressListener::~DbStressListener (this=0x7ffff739d140, __in_chrg=<optimized out>) at ./db_stress_tool/db_stress_listener.h:62
#2  rocksdb::DbStressListener::~DbStressListener (this=0x7ffff739d140, __in_chrg=<optimized out>) at ./db_stress_tool/db_stress_listener.h:62
#3  std::_Sp_counted_ptr<rocksdb::DbStressListener*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:377
#4  0x0000555555639ccb in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7ffff73a8540) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#5  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7ffff73a8540) at /usr/include/c++/9/bits/shared_ptr_base.h:148
#6  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7ffff7343ce8, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
#7  std::__shared_ptr<rocksdb::EventListener, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7ffff7343ce0, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
#8  std::shared_ptr<rocksdb::EventListener>::~shared_ptr (this=0x7ffff7343ce0, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
#9  std::_Destroy<std::shared_ptr<rocksdb::EventListener> > (__pointer=0x7ffff7343ce0) at /usr/include/c++/9/bits/stl_construct.h:98
#10 std::_Destroy_aux<false>::__destroy<std::shared_ptr<rocksdb::EventListener>*> (__last=<optimized out>, __first=0x7ffff7343ce0) at /usr/include/c++/9/bits/stl_construct.h:108
#11 std::_Destroy<std::shared_ptr<rocksdb::EventListener>*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/9/bits/stl_construct.h:137
#12 std::_Destroy<std::shared_ptr<rocksdb::EventListener>*, std::shared_ptr<rocksdb::EventListener> > (__last=<optimized out>, __first=<optimized out>)
    at /usr/include/c++/9/bits/stl_construct.h:206
#13 std::vector<std::shared_ptr<rocksdb::EventListener>, std::allocator<std::shared_ptr<rocksdb::EventListener> > >::_M_erase_at_end (this=0x7ffff73b2200, __pos=0x7ffff7343ce0)
    at /usr/include/c++/9/bits/stl_vector.h:1793
#14 std::vector<std::shared_ptr<rocksdb::EventListener>, std::allocator<std::shared_ptr<rocksdb::EventListener> > >::clear (this=0x7ffff73b2200) at /usr/include/c++/9/bits/stl_vector.h:1496
#15 rocksdb::StressTest::Open (this=<optimized out>) at db_stress_tool/db_stress_test_base.cc:2497
#16 0x000055555563c5dd in rocksdb::StressTest::Reopen (this=0x7ffff73b2000, thread=<optimized out>) at db_stress_tool/db_stress_test_base.cc:2790
#17 0x000055555563ecb5 in rocksdb::StressTest::OperateDb (this=0x7ffff73b2000, thread=thread@entry=0x7ffff507d600) at ./db_stress_tool/db_stress_shared_state.h:196
#18 0x000055555562386e in rocksdb::ThreadBody (v=0x7ffff507d600) at ./db_stress_tool/db_stress_shared_state.h:196
#19 0x00005555557f2bc9 in rocksdb::(anonymous namespace)::StartThreadWrapper (arg=0x7ffff5190030) at env/env_posix.cc:454
#20 0x00007ffff7f9a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007ffff775d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@pdillinger (Contributor) left a comment

Thanks for the PR!

I initially tried passing in Env or FileSystem, but db_stress_env might use FaultInjectionTestFS and we don't want fault injection in this test-supporting I/O path. (It's not an I/O path under test.) Do @zhichao-cao or @siying know the right answer here?

And apparently you're running db_stress in a release build. I didn't know that was done, because the point of db_stress is to detect bugs, and the debug build is generally much more sensitive to anomalies. We can change my assert(false) cases to std::abort() to ensure the intended invariants even in release builds.
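
As a rough illustration of that last point (helper name and call site assumed, not the actual db_stress code), the idea is to fail loudly even when NDEBUG compiles assert() away:

#include <cstdio>
#include <cstdlib>

#include "rocksdb/status.h"

// Sketch: report a fatal setup error and abort even in release builds, where
// assert(false) would be a no-op and could leave data_file_writer_ null.
void AbortOnBadStatus(const ROCKSDB_NAMESPACE::Status& s, const char* what) {
  if (!s.ok()) {
    std::fprintf(stderr, "%s: %s\n", what, s.ToString().c_str());
    std::abort();  // unlike assert(false), this also fires with NDEBUG defined
  }
}

For example, AbortOnBadStatus(s, "Error opening unique id file for append") at the point where the constructor currently asserts.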

@@ -84,7 +88,8 @@ UniqueIdVerifier::UniqueIdVerifier(const std::string& db_name)
}

UniqueIdVerifier::~UniqueIdVerifier() {
data_file_writer_->Close(IOOptions(), /*dbg*/ nullptr);
if (data_file_writer_)

Contributor:

data_file_writer_ is not intended to be nullable, because these checks are not intended to be optional. (If they were optional, Verify() would also need updating)

Contributor Author:

Ok, it is expected to be non-null, so will remove this check.

Status s2 = fs->FileExists(path_, opts, /*dbg*/ nullptr);
if (s2.IsNotFound()) {
  std::unique_ptr<FSWritableFile> wf;
  s = fs->NewWritableFile(path_, FileOptions(), &wf, /*dbg*/ nullptr);

Contributor:

This is unnecessary according to the API docs of ReopenWritableFile

@aravind-wdc (Contributor, Author) commented Nov 17, 2021:

Now that you pointed it out, I dug into this a bit more, and it turns out the StressTest() constructor deletes the "dbstress" directory under "/tmp/rocksdbtest-0/" because the "--destroy_db_initially" option is on by default.
In the UniqueIdVerifier() constructor, ReopenWritableFile() then tries to create the file
"/tmp/rocksdbtest-0/dbstress/.unique_ids", but this fails because the "dbstress" directory is missing. Since the ".unique_ids" file cannot be created, "data_file_writer_" becomes an invalid pointer, causing seg faults in ~UniqueIdVerifier() and Verify() in release builds and hitting assert(false) in debug builds.

So, the command below seg faults on both debug and release builds:
"rm -rf /tmp/rocksdbtest*; db_stress --ops_per_thread=1000 --reopen=5"
The command below passes on both debug and release builds:
"rm -rf /tmp/rocksdbtest*; db_stress --ops_per_thread=1000 --reopen=5 --destroy_db_initially=0"

So I added a change to create the db directory if it is missing, so that ReopenWritableFile() works as expected and there is no need for NewWritableFile().

// Use default FileSystem to avoid fault injection, etc.
FileSystem& fs = *FileSystem::Default();
// Use the FileSystem from the environment
const std::shared_ptr<rocksdb::FileSystem> fs = env->GetFileSystem();

Contributor:

No rocksdb::

Contributor Author:

Ok

@@ -27,14 +27,14 @@ UniqueIdVerifier::UniqueIdVerifier(const std::string& db_name)
// very good probability for the quantities in this test.
offset_ = Random::GetTLSInstance()->Uniform(17); // 0 to 16

// Use default FileSystem to avoid fault injection, etc.
FileSystem& fs = *FileSystem::Default();
// Use the FileSystem from the environment

Contributor:

As described in the comment, for the listener we use the default FS (a new instance) so that the calls will not be executed by fault_injection_fs. If we use the one from env_, then we may get an IOError here, which is not expected.

Contributor:

A better way might be to create another wrapper for the customized FileSystem.

Contributor Author:

Thanks for the suggestion.
Could you please elaborate a bit more on the new wrapper that you are suggesting? What would this wrapper include/exclude?

Contributor:

@zhichao-cao @aravind-wdc What is the goal here? You want to get the FileSystem from inside the FaultInjectionTestFS? Could you just use static_cast<FileSystem*>(env->GetFileSystem()->Inner())?

@aravind-wdc (Contributor, Author) commented Nov 17, 2021

Thanks @pdillinger and @zhichao-cao for the review comments.
I am thinking of splitting this patch into 2 PRs.

PR 1: fix the seg fault issue for normal db_stress. Below is the patch for that; I will send it as a new PR. It basically creates the db directory if it is missing.

diff --git a/db_stress_tool/db_stress_listener.cc b/db_stress_tool/db_stress_listener.cc
index 9968f5465..290d49008 100644
--- a/db_stress_tool/db_stress_listener.cc
+++ b/db_stress_tool/db_stress_listener.cc
@@ -31,6 +31,13 @@ UniqueIdVerifier::UniqueIdVerifier(const std::string& db_name)
   FileSystem& fs = *FileSystem::Default();
   IOOptions opts;
 
+  Status st = fs.CreateDirIfMissing(db_name, opts, nullptr);
+  if (!st.ok()) {
+    fprintf(stderr, "Failed to create directory %s: %s\n",
+                   db_name.c_str(), st.ToString().c_str());
+    exit(1);
+  }
+
   {
     std::unique_ptr<FSSequentialFile> reader;
     Status s =

PR 2: once I get clarification from @zhichao-cao on the wrapper, I will send another PR that uses the environment-based filesystem so that custom filesystems like ZenFS can run db_stress.

Thoughts?

@pdillinger (Contributor):

> PR 1: fix the seg fault issue for normal db_stress. Below is the patch for that; I will send it as a new PR. It basically creates the db directory if it is missing.

I'm not convinced this is needed if using the right Env/FS.

> PR 2: once I get clarification from @zhichao-cao on the wrapper, I will send another PR that uses the environment-based filesystem so that custom filesystems like ZenFS can run db_stress.

I think we need to save a copy of raw_env for this kind of use before we wrap it here: https://github.com/facebook/rocksdb/blob/main/db_stress_tool/db_stress_tool.cc#L101
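
If it helps, here is a rough sketch of that "save a copy before wrapping" idea; the names (StressEnvSetup, SetUpStressEnv, fault_injection_requested) and control flow are assumptions, not the actual db_stress_tool.cc code:

#include <memory>

#include "env/composite_env_wrapper.h"
#include "rocksdb/env.h"
#include "utilities/fault_injection_fs.h"

using namespace ROCKSDB_NAMESPACE;

// Keep a handle to the unwrapped FileSystem so test-support code such as
// DbStressListener can do its bookkeeping I/O without fault injection, while
// the DB under test still goes through FaultInjectionTestFS.
struct StressEnvSetup {
  Env* db_stress_env = nullptr;                    // what the DB under test uses
  std::shared_ptr<FileSystem> listener_fs;         // un-wrapped FS for the listener
  std::shared_ptr<FaultInjectionTestFS> fault_fs;  // injection layer, if any
  std::shared_ptr<Env> fault_env_guard;            // keeps the wrapper Env alive
};

StressEnvSetup SetUpStressEnv(Env* base_env, bool fault_injection_requested) {
  StressEnvSetup setup;
  setup.listener_fs = base_env->GetFileSystem();  // saved before any wrapping
  setup.db_stress_env = base_env;
  if (fault_injection_requested) {
    setup.fault_fs = std::make_shared<FaultInjectionTestFS>(setup.listener_fs);
    setup.fault_env_guard =
        std::make_shared<CompositeEnvWrapper>(base_env, setup.fault_fs);
    setup.db_stress_env = setup.fault_env_guard.get();
  }
  return setup;
}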

@mrambacher (Contributor):

> > PR 1: fix the seg fault issue for normal db_stress. Below is the patch for that; I will send it as a new PR. It basically creates the db directory if it is missing.
>
> I'm not convinced this is needed if using the right Env/FS.
>
> > PR 2: once I get clarification from @zhichao-cao on the wrapper, I will send another PR that uses the environment-based filesystem so that custom filesystems like ZenFS can run db_stress.
>
> I think we need to save a copy of raw_env for this kind of use before we wrap it here: https://github.com/facebook/rocksdb/blob/main/db_stress_tool/db_stress_tool.cc#L101

I do not think you need to save the raw env, as you can get its file system using target(). If you really need to, you could verify whether the outer FileSystem is a FaultInjection one using IsInstanceOf.

One point of the configurable/customizable infrastructure was to eliminate some of the guesswork here and make things simpler...
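
A minimal sketch of that alternative, assuming the fault-injecting FS is a FileSystemWrapper (as FaultInjectionTestFS is) and that RTTI is available for the dynamic_cast; in an RTTI-free build an IsInstanceOf-style check could play the same role:

#include "rocksdb/env.h"
#include "rocksdb/file_system.h"

using ROCKSDB_NAMESPACE::Env;
using ROCKSDB_NAMESPACE::FileSystem;
using ROCKSDB_NAMESPACE::FileSystemWrapper;

// Return a FileSystem suitable for the listener's bookkeeping I/O: if the
// Env's FS is a wrapper (e.g. a fault-injection layer), peel it off and use
// the wrapped FS (which may be a custom one such as ZenFS) directly.
// The returned pointer's lifetime is tied to the Env's FileSystem.
FileSystem* FileSystemForListener(Env* env) {
  FileSystem* fs = env->GetFileSystem().get();
  if (auto* wrapper = dynamic_cast<FileSystemWrapper*>(fs)) {
    return wrapper->target();  // the inner, un-instrumented FileSystem
  }
  return fs;
}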

@pdillinger (Contributor):

> I do not think you need to save the raw env, as you can get its file system using target(). If you really need to, you could verify whether the outer FileSystem is a FaultInjection one using IsInstanceOf.
>
> One point of the configurable/customizable infrastructure was to eliminate some of the guesswork here and make things simpler...

That approach is less robust to code evolution, e.g. if another layer of indirection is added, mysterious failures might ensue.

@aravind-wdc (Contributor, Author):

> I'm not convinced this is needed if using the right Env/FS.

Since the issue is that the "dbstress" directory is not available at this point, even with the right Env/FS it will fail to create the ".unique_ids" file. Creating the "dbstress" directory if it is missing ensures that ReopenWritableFile() is able to create the file.

@pdillinger (Contributor):

> Since the issue is that the "dbstress" directory is not available at this point, even with the right Env/FS it will fail to create the ".unique_ids" file. Creating the "dbstress" directory if it is missing ensures that ReopenWritableFile() is able to create the file.

OK, yes, thanks. For whatever reason, our db_stress setup creates the directory before starting db_stress.

@aravind-wdc (Contributor, Author):

> OK, yes, thanks. For whatever reason, our db_stress setup creates the directory before starting db_stress.

Thanks. I have sent PR #9219 to fix this in db_stress for the regular use case.
I will send another PR/commit to fix the same issue for other environments like ZenFS.

@aravind-wdc (Contributor, Author) commented Jan 3, 2022

Thanks for pulling in the earlier change.
I was on vacation, so I could not submit this any earlier; apologies.
db_stress fails when run on custom filesystems like ZenFS, which operates as a plugin to RocksDB:
"sudo ./db_stress --ops_per_thread=1000 --reopen=5 --fs_uri=zenfs://dev:nvmeXnY"
This is because the UniqueIdVerifier() constructor uses a default filesystem, so it does not access the correct filesystem. In the earlier patch I passed the env to the UniqueIdVerifier constructor and obtained the correct filesystem interface from it, but I got feedback that when fault injection is enabled, we do not want to use the FaultInjectionTestFS in DbStressListener.

So I have submitted another PR (#9352) with changes where the fault injection file system is not passed to DbStressListener, even when fault injection is enabled.

Please review.

@aravind-wdc (Contributor, Author):

@pdillinger Could you please take a look at PR #9352, which is based on the feedback I got here?
I also got feedback from riversand963; I will rework it once I have your feedback as well.
