Tags: TashaSkyUp/unstructured
Unstructured v0.12.6 release (Unstructured-IO#2626)

## 0.12.6

### Enhancements

* **Improve ability to capture embedded links in `partition_pdf()` for `fast` strategy** Previously, a threshold value that affects the capture of embedded links was set to a fixed value by default. This allows users to specify the threshold value for better capturing.
* **Refactor `add_chunking_strategy` decorator to dispatch by name.** Add `chunk()` function to be used by the `add_chunking_strategy` decorator to dispatch chunking call based on a chunking-strategy name (that can be dynamic at runtime). This decouples chunking dispatch from only those chunkers known at "compile" time and enables runtime registration of custom chunkers.

### Features

* **Added Unstructured Platform Documentation** The Unstructured Platform is currently in beta. The documentation provides how-to guides for setting up workflow automation, job scheduling, and configuring source and destination connectors.

### Fixes

* **Partitioning raises on file-like object with `.name` not a local file path.** When partitioning a file using the `file=` argument, and `file` is a file-like object (e.g. io.BytesIO) having a `.name` attribute, and the value of `file.name` is not a valid path to a file present on the local filesystem, `FileNotFoundError` is raised. This prevents use of the `file.name` attribute for downstream purposes to, for example, describe the source of a document retrieved from a network location via HTTP.
* **Fix SharePoint dates with inconsistent formatting** Adds logic to conditionally support dates returned by office365 that may vary in date formatting or may be a datetime rather than a string.
* **Include warnings** about the potential risk of installing a version of `pandoc` which does not support RTF files + instructions that will help resolve that issue.
* **Incorporate the `install-pandoc` Makefile recipe** into relevant stages of CI workflow, ensuring it is a version that supports RTF input files.
* **Fix Google Drive source key** Allow passing string for source connector key.
* **Fix table structure evaluations calculations** Replaced special value `-1.0` with `np.nan` and corrected rows filtering of files metrics basing on that.
* **Fix Sharepoint-with-permissions test** Ignore permissions metadata, update test.
* **Fix table structure evaluations for edge case** Fixes the issue when the prediction does not contain any table - no longer errors in such case.
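The dispatch-by-name refactor described under Enhancements can be pictured with a small registry. This is only a sketch of the general pattern, not the library's implementation: the `_CHUNKERS` dict, `register_chunker()`, the `pair_up` chunker, and the signature given to `chunk()` here are all hypothetical, and plain strings stand in for document elements.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a strategy name to a chunker callable.
# None of these names are taken from the unstructured codebase.
Chunker = Callable[[List[str]], List[str]]
_CHUNKERS: Dict[str, Chunker] = {}


def register_chunker(name: str, chunker: Chunker) -> None:
    """Make a chunker available under `name` at runtime."""
    _CHUNKERS[name] = chunker


def chunk(elements: List[str], chunking_strategy: str) -> List[str]:
    """Look up the chunker by name and apply it (assumed signature)."""
    try:
        chunker = _CHUNKERS[chunking_strategy]
    except KeyError:
        raise ValueError(
            f"unrecognized chunking strategy: {chunking_strategy!r}"
        ) from None
    return chunker(elements)


def pair_up(elements: List[str]) -> List[str]:
    """Toy custom chunker: join elements two at a time."""
    return [" ".join(elements[i : i + 2]) for i in range(0, len(elements), 2)]


# A custom chunker registered after import time is still dispatchable,
# because the lookup happens by name when chunk() is called.
register_chunker("pair_up", pair_up)
chunks = chunk(["a", "b", "c"], "pair_up")
```

Because the strategy is resolved at call time, a caller can pass a name that was registered only moments earlier; an unknown name fails with a `ValueError` in this sketch.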
build(release): release commit for 0.12.5 (Unstructured-IO#2585)
build(release): release commit for 0.12.4 (Unstructured-IO#2525)

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>
build(release): release commit for 0.12.3 (Unstructured-IO#2466)
drop python3.8 (Unstructured-IO#2372)

### Description

Remove all uses of python3.8

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: rbiseck3 <rbiseck3@users.noreply.github.com>
Unstructured SaaS API subscription guide (Unstructured-IO#2341)

To test:

> cd docs && make html

Sections:

- New User sign-up: (i) registration form, (ii) payment processing, and (iii) use API key & URL
- API Account maintenance: (i) update billing, (ii) opt-in email, (iii) rotate API key, and (iv) cancel plan
- Get Support
fix: Fix api_url param to partition_via_api (Unstructured-IO#2342)

Closes Unstructured-IO#2340

We need to make sure the custom url is passed to our client. The client constructor takes the base url, so for compatibility we can continue to take the full url and strip off the path.

To verify, run the api locally and confirm you can make calls to it.

```
# In unstructured-api
make run-web-app

# In ipython in this repo
from unstructured.partition.api import partition_via_api
filename = "example-docs/layout-parser-paper.pdf"
partition_via_api(filename=filename, api_url="http://localhost:8000")
```
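The "take the full url and strip off the path" step can be sketched with the standard library. This is an illustration of the idea rather than the code in the fix: `base_url_from()` is a hypothetical helper, and the endpoint path in the example URL is only a sample value.

```python
from urllib.parse import urlsplit, urlunsplit


def base_url_from(api_url: str) -> str:
    """Keep scheme and host[:port]; drop any path, query, and fragment.

    Hypothetical helper, not part of unstructured.
    """
    parts = urlsplit(api_url)
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


# A full endpoint URL and a bare base URL both reduce to the same base.
full = base_url_from("http://localhost:8000/general/v0/general")
bare = base_url_from("http://localhost:8000")
```

Accepting either form keeps existing callers that pass a full endpoint URL working while still handing the client constructor the base URL it expects.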