Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix symlink issues when scanning for files #21723

Closed
wants to merge 3 commits into from

Conversation

adrium
Copy link

@adrium adrium commented Jul 6, 2020

Description

Hi, there are issues with the file scanner and symlinks.
This PR fixes the ones that are crashing the files:scan command for local storage:

  1. Inexisting symlink targets
  2. Symlinks pointing outside the data directory

I am using Nextcloud more like a GUI to browse files online and do the sync or copy by other means.
Hence, some of the symlinks are not valid anymore when copied to the storage folder.

With the fixes applied, I was finally able to run a full scan of all files:

Folders: 46353
Files: 433857
Elapsed time: 00:20:28

The commits may not pass all checks when submitting.
I will try to update the code and force-push the branch with improvements.
Let me know about other adjustments so that the code is ready for merging.

Case 1

The scanner uses OC\Files\Storage\Storage::getDirectoryContent() which calls OC\Files\Storage\Local::getMetaData() for local storage to populate the array of children.
Because invalid symlinks are not readable, the array can contain null which is not expected by OC\Files\Cache\Scanner::handleChildren()

The PR fixes this by ignoring unreadable entries.

Case 2

OC\Files\Storage\Local::getMetaData() uses OC\Files\Storage\Local::getSourcePath() to resolve the filename.
The method can throw a ForbiddenException if the path is outside the data directory which is not handled gracefully.
Since null is returned, if the file is not readable, it is consistent to return null in that case as well.

Copy link
Contributor

@kesselb kesselb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking the time to debug and submit a patch 👍

$fullPath = $this->getSourcePath($path);
$fullPath = null;
try {
$fullPath = $this->getSourcePath($path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comment for Local.php below. It's getting a bit longer...

@@ -415,6 +415,10 @@ private function handleChildren($path, $recursive, $reuse, $folderId, $lock, &$s
$childQueue = [];
$newChildNames = [];
foreach ($newChildren as $fileMeta) {
if ($fileMeta === null) {
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure. If we are not able to process symlinks properly getDirectoryContent should not return them.

if (!Filesystem::isIgnoredDir($file) && !Filesystem::isFileBlacklisted($file)) {

We might add is_link check there and not yield the meta data. Probably is_link could be added to isFileBlacklisted if every symlink is actually forbidden (cc @icewind1991)

Copy link
Author

@adrium adrium Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comment below. It's getting a bit longer...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrium
Copy link
Author

adrium commented Jul 7, 2020

Hey @kesselb Thanks a lot for your review! I gave this some thought...

It makes sense to return only readable data from getDirectoryContent.
However, leaving this PR aside, there is still the possibility that getMetaData returns null:

  • In Local.php: if (!@stat($fullPath))
  • In Common.php: if (!$this->getPermissions($path) & \OCP\Constants::PERMISSION_READ)

And by the way, in Common.php it is written: can't read, nothing we can do, return null so it seems that getting null from this function is valid. (Documentation fix necessary?)

Hence, it seams reasonable to fix it in getDirectoryContent and I would propose to directly filter out the null values instead of other checks, because:

  • is_link alone is not sufficient, because symlinks are followed in the current implementation (there are other bugs related to that)
  • isFileBlacklisted does not seem appropriate because it seems to be a config-related function
  • isIgnoredDir could be considered
    • But then the problem with getMetaData returning null will occur again

@J0WI
Copy link
Contributor

J0WI commented Jul 8, 2020

I just found another symlink issue. If you scanned a folder that contains symlinks and try to delete the folder afterwards in Nextcloud, it will fail with the following error: rmdir(/var/www/html/data/admin/files/path/to/symlink): Directory not empty at /var/www/html/lib/private/Files/Storage/Local.php#115

@adrium
Copy link
Author

adrium commented Jul 9, 2020

Thanks for your report. This PR tries to fix issues with the scanner.
What happens if you delete the file by other means (rm on the server) and run files:scan again?

@kesselb
Copy link
Contributor

kesselb commented Oct 22, 2020

@adrium would you mind to rebase this branch and solve the conflicts? I will request another review then. Sorry it took so long. Thanks again for debugging 👍

@adrium
Copy link
Author

adrium commented Nov 13, 2020

I am not using nextcloud for this anymore, but I can try set up another test case, if there is interest.

Please give me about two weeks.

@adrium adrium force-pushed the fix-symlinks-scanner branch 2 times, most recently from 3e5f43e to 7bcd87b Compare January 3, 2021 20:48
@adrium
Copy link
Author

adrium commented Jan 3, 2021

Hey, I worked on the PR again and tested it in the VM. It also fixes cyclic symlinks now!

  1. Inexisting symlink targets (fixed thanks to change in filter files containing a hash in the path for ftp storages #21789)
  2. Symlinks pointing outside the data directory (fixed)
    • Same fix as initial PR by catching the ForbiddenException
  3. Cyclic symlinks (fixed)
    • Fixed by detecting a parent path. It is deliberate to not introduce new functions such as is_link()
    • Also throwing ForbiddenException

What to do about the DCO check? Do I just have to repeat my name in the commit message?

The below script can be used to create files to test the scanner.

NCDATA=/mnt/ncdata
NCUSER=admin

NCTEST=$NCDATA/$NCUSER/files/Tests

DIR=$NCTEST/valid
mkdir -p $DIR
echo '# Valid links' > $DIR/README.md
ln -s ../../Photos $DIR/photos
ln -s ../../Photos/Library.jpg $DIR/file.jpg

# Case 1
DIR=$NCTEST/invalid
mkdir -p $DIR
echo '# Test for invalid link target' > $DIR/README.md
ln -s /invalid $DIR/link1
ln -s /tmp/invalid $DIR/link2

# Case 2
DIR=$NCTEST/outside
mkdir -p $DIR
mkdir /tmp/hello
echo '# Test for link target outside the user folder' > $DIR/README.md
echo '# Hello World' > /tmp/hello/README.md
ln -s /tmp/hello $DIR/link

# Case 3
DIR=$NCTEST/cyclic
mkdir -p $DIR
echo '# Test link pointing to parent' > $DIR/README.md
mkdir -p $DIR/directory/files
ln -s ../../directory $DIR/directory/files/parent

@adrium adrium requested a review from kesselb January 3, 2021 23:33
@J0WI J0WI added 3. to review Waiting for reviews and removed 2. developing Work in progress labels Jan 5, 2021
@kesselb
Copy link
Contributor

kesselb commented Jan 8, 2021

To reference:
#24989
#22943

I'm not sure if we need this patch anymore. That needs some testing.

@adrium
Copy link
Author

adrium commented Jan 9, 2021

Hi, from the first look at OC\Files\Storage\Local::getSourcePath() it can still throw a ForbiddenException which is not handled in getMetaData() so I believe this is still relevant.

Do you need support with testing?

@AndreKoepke
Copy link

@icewind1991 and @kesselb what is the current status? Can we help?

@BUFU1610
Copy link

Just one quick question: Is this still a problem now?
How has this not been reviewed in half a year otherwise?

@skjnldsv
Copy link
Member

Just one quick question: Is this still a problem now? How has this not been reviewed in half a year otherwise?

It seems like this isn't such an issue for most people and failed to gain attraction :(
As there is no feedback since a while I will close this ticket.
If you will decide to work on this feature again and if it hasn't been fixed or implemented already, feel free to re-open and solve the various conflicts.

Thanks for the interest in Nextcloud and the effort put into this! 🙇

@skjnldsv skjnldsv closed this Feb 27, 2024
@skjnldsv skjnldsv removed the 3. to review Waiting for reviews label Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trying to access array offset on value of type null at /var/www/html/lib/private/Files/Cache/Scanner.php#418
6 participants