Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swap sqlite cursor with dictionary and set data structures #24

Merged
merged 4 commits into from
Mar 3, 2017

Conversation

liloman
Copy link
Contributor

@liloman liloman commented Feb 22, 2017

  1. Use 2 new data structures:
    -paths_fs (set) contains all the paths(files) in the actual filesystem
    -hashes_dict_db (dictionary) substitute the sqlite query with dict[hash] = set(db paths)
  2. Minimal unitary tests created with bats (bash script)

See #23 for details.

Copy link
Owner

@ambv ambv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty close, just a few minor changes. Mostly blocking on the exception handling.

src/bitrot.py Outdated
)

return renamed
except:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use bare except:, specify which exceptions you expect. That would also let me understand why you need to wrap this block in a try:except: in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception was for hashes_dict_db[new_sha1], as new_sha1 could be a new one sha1 indeed. ;)

I'm sure this block can be reworked in something simpler/cleaner.


#WARNING!
#Be careful don't use ((, cause (( $status == pp )) && echo Really WRONG!
#the issue is that (( 0 == letters )) is always true ... :(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. "Don't use ((", so what should I use instead? You're using "((" in lines 29, 45, and so on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the default header for my bats tests.

If you have the chance of having a typo with (( $status == 0 )) and swap the 0 with a O or whatever string it will be always true.

status=0; (( $status == pp )) && echo Really WRONG!

It's just another bash "bug" that I need to report some day (they will think on it as a feature...).

PD: (( )) in bash script is for testing integers, so I reckon strings decay to 0 in bash (( )).

src/bitrot.py Outdated
@@ -180,12 +179,13 @@ def run(self):
errors = []
current_size = 0
missing_paths = self.select_all_paths(cur)
paths, total_size = list_existing_paths(
hashes_dict_db = self.select_all_hashes(cur)
paths_fs, total_size = list_existing_paths(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep the paths name as it was before? I don't see the value of changing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the suffixes to differentiate between fs and db info. It was clearer for me to read hashes_db than hashes (and have to remember everytime that they came from the DB), the same reason for paths (do they come from the DB or the FS, ummm? ).

src/bitrot.py Outdated
b'.', expected=missing_paths, ignored={bitrot_db, bitrot_sha512},
follow_links=self.follow_links,
)

for p in paths:
for p in paths_fs:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to wrap this in sorted() for reproducible results if it doesn't affect the execution speed too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool.
I'll try it, reproducible results are really important. ;)

The first thing I tried was using just bisect.bisect_left with the former sorted path, but O(1) beats O(log n) really hard, so I opted for using set rather than list. ;)

src/bitrot.py Outdated
chash = ''
while row:
rhash, rpath = row
if chash != rhash:
Copy link
Owner

@ambv ambv Feb 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's simpler if you rather do:

result.setdefault(rhash, set()).add(rpath)

Then you don't need to order the SELECT which should be even faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice syntantic sugar I didn't know about setdefault. :)

I'll try it.

src/bitrot.py Outdated
try: # if doesn't exist the fs path of the database
found = [path for path in hashes_dict_db[new_sha1] if path not in paths_fs]
renamed = found.pop()
if renamed :
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Unnecessary space before the colon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.
And what about the found.pop() could I use something different?. I don't know why but I found it odd.

src/bitrot.py Outdated
@@ -180,12 +179,13 @@ def run(self):
errors = []
current_size = 0
missing_paths = self.select_all_paths(cur)
paths, total_size = list_existing_paths(
hashes_dict_db = self.select_all_hashes(cur)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the name hashes would be simpler and contains the same information.

src/bitrot.py Outdated

# update the path in the database
try: # if doesn't exist the fs path of the database
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm. :S
What if I reword it as:

"If the fs path doesn't exist in the database yet" (/native grammar mode off)
Or just
"If the path isn't in the dabatase"

@ambv ambv merged commit a8e5262 into ambv:master Mar 3, 2017
@ambv
Copy link
Owner

ambv commented Mar 3, 2017

Thanks! ✨ 🍰 ✨

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants