/learn not working properly for directories in 2.14.0 #748

Closed
michaelchia opened this issue Apr 26, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@michaelchia
Collaborator

Description

When given a directory, /learn does not index all of the files in it.

def split(path, all_files: bool, splitter):
    chunks = []

    # Check if the path points to a single file
    if os.path.isfile(path):
        dir = os.path.dirname(path)
        filenames = [os.path.basename(path)]
    else:
        for dir, subdirs, filenames in os.walk(path):
            # Filter out hidden filenames, hidden directories, and excluded directories,
            # unless "all files" are requested
            if not all_files:
                subdirs[:] = [
                    d for d in subdirs if not (d[0] == "." or d in EXCLUDE_DIRS)
                ]
                filenames = [f for f in filenames if not f[0] == "."]

    for filename in filenames:
        filepath = Path(os.path.join(dir, filename))
        # Lower case everything to make sure file extension comparisons are not case sensitive
        if filepath.suffix.lower() not in {j.lower() for j in SUPPORTED_EXTS}:
            continue

        document = dask.delayed(path_to_doc)(filepath)
        chunk = dask.delayed(split_document)(document, splitter)
        chunks.append(chunk)

    flattened_chunks = dask.delayed(flatten)(*chunks)
    return flattened_chunks

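split only keeps the files from the last iteration of os.walk(path): the loop over filenames runs once, after the walk has finished, so dir and filenames hold only the last directory visited and files in every earlier directory are skipped.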
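One way to avoid this is to accumulate file paths inside the walk rather than after it. The following is a minimal sketch under stated assumptions, not the change made in #747, whose contents are not shown here. The helper name collect_filepaths is hypothetical, the values given to SUPPORTED_EXTS and EXCLUDE_DIRS are placeholders, and the dask steps are left out so the example runs on its own.

```python
import os
from pathlib import Path

# Placeholder values (assumptions); the real constants live in the project.
SUPPORTED_EXTS = {".py", ".md", ".txt"}
EXCLUDE_DIRS = {"node_modules", "__pycache__"}


def collect_filepaths(path, all_files: bool):
    """Hypothetical helper: return supported files from every os.walk iteration."""
    if os.path.isfile(path):
        # A single file is taken as-is, as in the original split().
        candidates = [Path(path)]
    else:
        candidates = []
        for dir, subdirs, filenames in os.walk(path):
            # Filter out hidden filenames, hidden directories, and excluded
            # directories, unless "all files" are requested.
            if not all_files:
                subdirs[:] = [
                    d for d in subdirs if not (d[0] == "." or d in EXCLUDE_DIRS)
                ]
                filenames = [f for f in filenames if not f[0] == "."]
            # Accumulate inside the loop so earlier directories are not lost.
            candidates.extend(Path(dir, f) for f in filenames)

    # Compare extensions case-insensitively.
    supported = {ext.lower() for ext in SUPPORTED_EXTS}
    return [p for p in candidates if p.suffix.lower() in supported]
```

With a helper like this, split could iterate over the returned paths and build its delayed path_to_doc and split_document tasks exactly as before.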

I've opened a PR that fixes this: #747

@michaelchia michaelchia added the bug Something isn't working label Apr 26, 2024
@dlqqq
Member

dlqqq commented Apr 29, 2024

Closed by #747. Thank you, @michaelchia!

@dlqqq dlqqq closed this as completed Apr 29, 2024