Attempt
https://www.notion.so/24-7-4-USIO-1fb3c634391142c2baaede470f4285b4?pvs=4#2ecdd64c66464d68ac7bf09920bc39d4
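Uses concurrent.futures.ProcessPoolExecutor to process multiple pages concurrently.
Splits the work into _process_pdfminer_pages_parallel and _process_single_page to support parallel processing.
Each page is processed independently, and the results are collected afterwards and appended to the elements list.
Once every page has been processed, all elements are sorted in a single pass.
A new PDFResourceManager and PDFPageAggregator are created for each page.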
I wrote this based on the approach proposed in the code at the link below.
https://blog.shikoan.com/pdfminer-parallel/#google_vignette
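For reference, the structure described above (a per-page worker, a fan-out over pages, and one final sort) can be sketched roughly as follows. This is not the code in this PR: `process_single_page_stub`, `process_pages_parallel`, the `executor_cls` parameter, and the `(page_number, top_y, text)` element shape are all hypothetical names chosen for illustration, and the pdfminer work is replaced by a placeholder so the sketch runs with the standard library only.

```python
from concurrent.futures import ProcessPoolExecutor


def process_single_page_stub(page_number):
    """Placeholder for the per-page work (hypothetical).

    In the real change, this is where a fresh PDFResourceManager and
    PDFPageAggregator would be created for the page. Here we only return
    two made-up elements shaped as (page_number, top_y, text).
    """
    return [
        (page_number, 200.0, f"page {page_number} body"),
        (page_number, 700.0, f"page {page_number} header"),
    ]


def process_pages_parallel(page_numbers, executor_cls=ProcessPoolExecutor,
                           max_workers=None):
    """Process pages independently, gather the results, then sort once."""
    elements = []
    with executor_cls(max_workers=max_workers) as executor:
        # Each worker call handles one page and returns that page's elements.
        for page_elements in executor.map(process_single_page_stub,
                                          page_numbers):
            elements.extend(page_elements)
    # Single sort after all pages are done: by page number, then top to
    # bottom (assuming a larger y means higher on the page).
    elements.sort(key=lambda e: (e[0], -e[1]))
    return elements
```

With the default ProcessPoolExecutor, the call should sit under an `if __name__ == "__main__":` guard, and the worker has to be a module-level function so it can be pickled. Passing `ThreadPoolExecutor` as `executor_cls` is a convenient way to exercise the same flow without spawning processes.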
Let's try it.
on-prem
EKS
Conclusion
The parallel attempt yielded no gain. There was no difference in processing time.