BUG FIX: daemonic processes are not allowed to have children #135

Open · wants to merge 1 commit into base: master

rishiraj

Error:
Raised while converting crowd.pdf from the Marker benchmark dataset:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 216, in thread_wrapper
    res = future.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/app/app.py", line 20, in use_marker
    result = markdown_extractor.extract(content, config)
  File "/home/user/app/marker/markdown_extractor.py", line 31, in extract
    full_text, images, out_meta = convert_single_pdf(inputtmpfile.name, self.model_lst, max_pages=params.max_pages, langs=params.langs, batch_multiplier=params.batch_multiplier)
  File "/usr/local/lib/python3.10/site-packages/marker/convert.py", line 86, in convert_single_pdf
    surya_detection(doc, pages, detection_model, batch_multiplier=batch_multiplier)
  File "/usr/local/lib/python3.10/site-packages/marker/ocr/detection.py", line 24, in surya_detection
    predictions = batch_text_detection(images, det_model, processor, batch_size=int(get_batch_size() * batch_multiplier))
  File "/usr/local/lib/python3.10/site-packages/surya/detection.py", line 135, in batch_text_detection
    results = list(executor.map(parallel_get_lines, preds, orig_sizes))
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 766, in map
    results = super().map(partial(_process_chunk, fn),
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 610, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 610, in <listcomp>
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 738, in submit
    self._start_executor_manager_thread()
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 678, in _start_executor_manager_thread
    self._launch_processes()
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 705, in _launch_processes
    self._spawn_process()
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 714, in _spawn_process
    p.start()
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
```

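Reason:
Python's multiprocessing refuses to start a child from a daemonic process: `BaseProcess.start()` asserts that the current process is not a daemon. In the traceback, `surya.detection.batch_text_detection` calls `executor.map` on a `ProcessPoolExecutor`, which then tries to spawn its worker processes. The outer frames (`spaces/zero/wrappers.py`) show that the call runs under the Hugging Face `spaces` ZeroGPU wrapper, which most likely executes `use_marker` inside a daemonic process, so the spawn fails.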
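A minimal reproduction, independent of Marker and surya, may help confirm the cause. It assumes a POSIX system (it uses the "fork" start method) and the function names `daemon_task` and `reproduce` are illustrative only: a daemonic process created with multiprocessing tries to use a `ProcessPoolExecutor`, the same pattern the traceback points to.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def square(x: int) -> int:
    return x * x


def daemon_task(queue) -> None:
    """Runs inside a daemonic process and tries to start a process pool."""
    try:
        with ProcessPoolExecutor(max_workers=2) as executor:
            queue.put(list(executor.map(square, [1, 2, 3])))
    except AssertionError as exc:
        # multiprocessing refuses to start children from a daemonic process.
        queue.put(f"AssertionError: {exc}")


def reproduce() -> object:
    # "fork" (POSIX only) keeps the sketch free of spawn-time re-imports.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=daemon_task, args=(queue,), daemon=True)
    proc.start()
    result = queue.get(timeout=30)  # read before join() so the child can exit
    proc.join()
    return result


if __name__ == "__main__":
    # prints "AssertionError: daemonic processes are not allowed to have children"
    print(reproduce())
```

Running the same `daemon_task` in a process started with `daemon=False` lets the pool start normally, which is the first of the two fixes below.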

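Fix:
Either make sure the code that creates the `ProcessPoolExecutor` does not run inside a daemonic process, or replace it with a `ThreadPoolExecutor`, which creates threads rather than processes and is therefore not subject to this restriction. Threads give real parallelism mainly for I/O-bound work or for code that releases the GIL; pure-Python CPU-bound work will run roughly serially, but it will no longer crash. Note that the restriction applies to daemonic processes, not daemon threads: a regular (non-daemonic) process may start child processes from any of its threads.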
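One way to combine both options is to choose the executor type at run time. The sketch below is a hypothetical helper (`pick_executor_class` and `make_executor` are illustrative names, not part of surya or marker, and not necessarily what this PR's commit does): it checks `multiprocessing.current_process().daemon` and falls back to a `ThreadPoolExecutor` when worker processes cannot be started.

```python
import multiprocessing as mp
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional, Type


def pick_executor_class(in_daemon_process: bool) -> Type[Executor]:
    # A daemonic process may not start children, so it gets threads instead.
    return ThreadPoolExecutor if in_daemon_process else ProcessPoolExecutor


def make_executor(max_workers: Optional[int] = None) -> Executor:
    """Hypothetical helper: use worker processes only where they are allowed."""
    in_daemon = mp.current_process().daemon
    return pick_executor_class(in_daemon)(max_workers=max_workers)


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    # In a normal script this is a ProcessPoolExecutor; inside a daemonic
    # process it is a ThreadPoolExecutor. Either way the call succeeds.
    with make_executor(max_workers=2) as executor:
        print(list(executor.map(square, [1, 2, 3])))  # prints [1, 4, 9]
```

Applied to the traceback above, such a helper would replace the place where surya's `batch_text_detection` builds its executor; whether threads are fast enough for `parallel_get_lines` would still need to be measured.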

github-actions bot commented Jun 15, 2024

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@rishiraj (Author)

I have read the CLA Document and I hereby sign the CLA
