Add x_tolerance_ratio param to extract_text and similar functions (now properly linted!) #1041

Merged: 8 commits, Nov 9, 2023
Add x_tolerance_ratio test for extract_words(...)
jsvine committed Nov 8, 2023
commit 0145647c58cfedf96d8a333f99606cd3ccf4aade
13 changes: 8 additions & 5 deletions tests/test_utils.py
@@ -66,11 +66,14 @@ def test_decode_psl_list(self):

     def test_x_tolerance_ratio(self):
         pdf = pdfplumber.open(os.path.join(HERE, "pdfs/issue-987-test.pdf"))
-        assert pdf.pages[0].extract_text() == "Big Te xt\nSmall Text"
-        assert pdf.pages[0].extract_text(x_tolerance=4) == "Big Te xt\nSmallText"
-        assert (
-            pdf.pages[0].extract_text(x_tolerance_ratio=0.15) == "Big Text\nSmall Text"
-        )
+        page = pdf.pages[0]
+
+        assert page.extract_text() == "Big Te xt\nSmall Text"
+        assert page.extract_text(x_tolerance=4) == "Big Te xt\nSmallText"
+        assert page.extract_text(x_tolerance_ratio=0.15) == "Big Text\nSmall Text"
+
+        words = page.extract_words(x_tolerance_ratio=0.15)
+        assert "|".join(w["text"] for w in words) == "Big|Text|Small|Text"

     def test_extract_words(self):
         path = os.path.join(HERE, "pdfs/issue-192-example.pdf")
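For readers wondering why a fixed x_tolerance cannot satisfy both lines of the test PDF while x_tolerance_ratio can, here is a minimal standalone sketch. It does not use pdfplumber and is not its implementation: group_words and layout are hypothetical helpers, the character coordinates and sizes are invented, and the rule "tolerance = ratio × previous character's size" is an assumption about how the parameter behaves.

```python
def group_words(chars, x_tolerance=3, x_tolerance_ratio=None):
    """Split a left-to-right run of characters into words.

    Hypothetical helper: a new word starts when the horizontal gap to the
    previous character exceeds the tolerance. With x_tolerance_ratio set,
    the tolerance is assumed to scale with the previous character's size.
    """
    words, current, prev = [], "", None
    for ch in chars:
        if prev is not None:
            if x_tolerance_ratio is None:
                tol = x_tolerance
            else:
                tol = x_tolerance_ratio * prev["size"]
            if ch["x0"] - prev["x1"] > tol:
                words.append(current)
                current = ""
        current += ch["text"]
        prev = ch
    if current:
        words.append(current)
    return words


def layout(text, gaps, size):
    """Place characters on one line; gaps[i] is the space before text[i].

    The half-size character width is an arbitrary choice for the demo.
    """
    chars, x, width = [], 0.0, size * 0.5
    for ch, gap in zip(text, gaps):
        x += gap
        chars.append({"text": ch, "x0": x, "x1": x + width, "size": size})
        x += width
    return chars


# Invented geometry: the large line has a 5pt kerning gap inside "Text" and
# a 12pt word space; the small line has a 3.5pt word space.
big = layout("BigText", [0, 0, 0, 12, 0, 5, 0], size=40)
small = layout("SmallText", [0, 0, 0, 0, 0, 3.5, 0, 0, 0], size=10)

for kwargs in ({}, {"x_tolerance": 4}, {"x_tolerance_ratio": 0.15}):
    lines = [" ".join(group_words(line, **kwargs)) for line in (big, small)]
    print(kwargs, lines)
```

With these made-up numbers, the default tolerance splits the large word, a tolerance of 4 still splits it and also merges the small words, and the ratio handles both lines, which mirrors the three assertions in the test.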