Tags: TashaSkyUp/unstructured

0.12.6

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Unstructured v0.12.6 release (Unstructured-IO#2626)

## 0.12.6

### Enhancements

* **Improve ability to capture embedded links in `partition_pdf()` for
`fast` strategy** Previously, the threshold value that affects the
capture of embedded links was fixed. Users can now specify the threshold
value to capture links more reliably.
* **Refactor `add_chunking_strategy` decorator to dispatch by name.**
Add `chunk()` function to be used by the `add_chunking_strategy`
decorator to dispatch chunking call based on a chunking-strategy name
(that can be dynamic at runtime). This decouples chunking dispatch from
only those chunkers known at "compile" time and enables runtime
registration of custom chunkers.
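
The name-based dispatch described in the chunking entry above can be sketched as a small registry. This is a minimal illustration, not the library's implementation: the `_CHUNKERS` registry, `register_chunker()`, the simplified `chunk()` signature, and the toy `pair_up` chunker are all hypothetical names chosen for this example, and elements are modeled as plain strings.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a strategy name to a chunker callable.
Chunker = Callable[[List[str]], List[str]]
_CHUNKERS: Dict[str, Chunker] = {}


def register_chunker(name: str, chunker: Chunker) -> None:
    """Register a chunker under a name so it can be looked up at runtime."""
    _CHUNKERS[name] = chunker


def chunk(elements: List[str], chunking_strategy: str) -> List[str]:
    """Dispatch to whichever chunker is registered under `chunking_strategy`."""
    try:
        chunker = _CHUNKERS[chunking_strategy]
    except KeyError:
        raise ValueError(
            f"unrecognized chunking strategy {chunking_strategy!r}"
        ) from None
    return chunker(elements)


def pair_up(elements: List[str]) -> List[str]:
    """Toy chunker: join every two consecutive elements into one chunk."""
    return [" ".join(elements[i:i + 2]) for i in range(0, len(elements), 2)]


# Registration happens at runtime, so the dispatcher needs no prior
# knowledge of this chunker.
register_chunker("pair_up", pair_up)
```

Because the lookup is keyed by a string, the strategy name can come from configuration or user input, and a custom chunker only has to be registered before the first call that names it.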

### Features

* **Added Unstructured Platform Documentation** The Unstructured
Platform is currently in beta. The documentation provides how-to guides
for setting up workflow automation, job scheduling, and configuring
source and destination connectors.

### Fixes

* **Partitioning raises on file-like object with `.name` not a local
file path.** When partitioning a file using the `file=` argument, where
`file` is a file-like object (e.g. `io.BytesIO`) having a `.name`
attribute whose value is not a valid path to a file present on the local
filesystem, `FileNotFoundError` was raised. This prevented use of the
`file.name` attribute for downstream purposes such as describing the
source of a document retrieved from a network location via HTTP.
* **Fix SharePoint dates with inconsistent formatting** Adds logic to
conditionally support dates returned by office365 that may vary in date
formatting or may be a datetime rather than a string.
* **Include warnings** about the potential risk of installing a version
of `pandoc` which does not support RTF files, along with instructions
that help resolve that issue.
* **Incorporate the `install-pandoc` Makefile recipe** into relevant
stages of the CI workflow, ensuring the installed `pandoc` version
supports RTF input files.
* **Fix Google Drive source key** Allow passing a string for the source
connector key.
* **Fix table structure evaluation calculations** Replaced the special
value `-1.0` with `np.nan` and corrected the row filtering of file
metrics based on that value.
* **Fix Sharepoint-with-permissions test** Ignore permissions metadata,
update test.
* **Fix table structure evaluations for edge case** Fixes an error that
occurred when the prediction does not contain any table; evaluation no
longer fails in that case.
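
The first fix in this list (a file-like object whose `.name` is not a local path) can be illustrated with a small sketch. It does not reproduce the library's code; `describe_source()` and `last_modified()` are hypothetical helpers, and the URL is a made-up example. The idea shown is to treat `file.name` as a descriptive label and to consult the filesystem only when the name really points at a local file.

```python
import io
import os
from typing import IO, Optional


def describe_source(file: IO[bytes]) -> Optional[str]:
    """Return `file.name` as a descriptive label without touching the filesystem."""
    name = getattr(file, "name", None)
    return name if isinstance(name, str) else None


def last_modified(file: IO[bytes]) -> Optional[float]:
    """Look up a modification time only when `file.name` is a real local file."""
    name = describe_source(file)
    if name is None or not os.path.isfile(name):
        # A URL or arbitrary label: skip the lookup instead of raising.
        return None
    return os.path.getmtime(name)


# A document fetched over HTTP, held in memory, and labeled with its origin.
buffer = io.BytesIO(b"%PDF-1.4 ...")
buffer.name = "https://example.com/report.pdf"  # label only, not a local path
```

With this guard, `describe_source(buffer)` still yields the URL for downstream use, while `last_modified(buffer)` returns `None` rather than raising `FileNotFoundError`.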

0.12.5

build(release): release commit for 0.12.5 (Unstructured-IO#2585)

0.12.4

build(release): release commit for 0.12.4 (Unstructured-IO#2525)

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>

0.12.3

build(release): release commit for 0.12.3 (Unstructured-IO#2466)

0.12.2

Fixed sphinx-build error by pinning alabaster=-0.7.13 (Unstructured-IO#2436)

0.12.1

v0.12.1 release (Unstructured-IO#2429)

0.12.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
drop python3.8 (Unstructured-IO#2372)

### Description
Remove all uses of python3.8

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: rbiseck3 <rbiseck3@users.noreply.github.com>

0.11.8

Unstructured SaaS API subscription guide (Unstructured-IO#2341)

To test:
> cd docs && make html

Sections:
- New User sign-up: (i) registration form, (ii) payment processing, and
(iii) use API key & URL
- API Account maintenance: (i) update billing, (ii) opt-in email, (iii)
rotate API key, and (iv) cancel plan
- Get Support

0.11.7

fix: Fix api_url param to partition_via_api (Unstructured-IO#2342)

Closes Unstructured-IO#2340 

We need to make sure the custom URL is passed to our client. The client
constructor takes the base URL, so for compatibility we can continue to
accept the full URL and strip off the path.
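
Stripping the path from a full URL can be sketched with the standard library alone. The helper name `base_url()` and the example path are assumptions made for illustration; the actual change may be implemented differently.

```python
from urllib.parse import urlsplit, urlunsplit


def base_url(api_url: str) -> str:
    """Keep only scheme and host[:port], dropping path, query, and fragment."""
    parts = urlsplit(api_url)
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


# Both a full endpoint URL (hypothetical path) and a bare base URL
# reduce to the same value.
full = base_url("http://localhost:8000/some/endpoint/path")
bare = base_url("http://localhost:8000")
```

Accepting either form keeps existing callers working while giving the client constructor the base URL it expects.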

To verify, run the api locally and confirm you can make calls to it.

```
# In unstructured-api
make run-web-app

# In ipython in this repo
from unstructured.partition.api import partition_via_api
filename = "example-docs/layout-parser-paper.pdf"
partition_via_api(filename=filename, api_url="http://localhost:8000")
```

0.11.6

build: release commit for 0.11.6 (Unstructured-IO#2304)