some PDF's can't be opened and generate "Invalid or corrupted PDF file." errors #3474

Closed
jumde opened this issue Feb 24, 2019 · 21 comments
Labels
closed/duplicate Issue has already been reported closed/invalid extension/PDFJS webcompat/not-shields-related Sites are breaking because of something other than Shields.

Comments

@jumde
Contributor

jumde commented Feb 24, 2019

Test plan

See brave/brave-core#2342

Description

Cannot open PDF in Brave

Steps to Reproduce

  1. Navigate to http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf

Actual result:

PDF is corrupted

Expected result:

PDF should be displayed

Reproduces how often:

Easily

Brave version (brave://version info)

0.62.5 Chromium: 73.0.3683.39 (Official Build) dev (64-bit)

Reproducible on current release:

  • Does it reproduce on brave-browser dev/beta builds? Dev

Website problems only:

  • Does the issue resolve itself when disabling Brave Shields? No
  • Is the issue reproducible on the latest version of Chrome? No
@jumde jumde added the webcompat/not-shields-related Sites are breaking because of something other than Shields. label Feb 24, 2019
@rebron rebron added extension/PDFJS priority/P3 The next thing for us to work on. It'll ride the trains. labels Feb 26, 2019
@kjozwiak kjozwiak changed the title Corrupted PDF warning: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf some PDF's can't be opened and generate "Invalid or corrupted PDF file." errors Feb 27, 2019
@kjozwiak
Member

Reproduced the above issue when I attempted to open several banking statements under https://easyweb.td.com. The PDF opened in a new window and displayed the following (same error that @jumde mentioned above):

PDF.js v2.0.673 (build: 31012570)
Message: Invalid PDF structure

Seeing the following in the terminal:

[6473:775:0226/221130.910014:ERROR:CONSOLE(1)] "Active tab not found", source: chrome-extension://mnojpmjdmbbfmejpflffifhffcmidifd/js/background.bundle.js (1)
[6473:775:0226/221133.424143:ERROR:CONSOLE(0)] "Unchecked runtime.lastError: This extension has no action specified.", source: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/pdfHandler.html (0)
[6473:775:0226/221133.424197:ERROR:CONSOLE(0)] "Unchecked runtime.lastError: This extension has no action specified.", source: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/pdfHandler.html (0)
[6473:775:0226/221133.726921:ERROR:CONSOLE(862)] "Uncaught (in promise) Error: Invalid or corrupted PDF file.", source: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/content/web/viewer.js (862)

Example of the browser console:

screen shot 2019-02-26 at 10 27 44 pm

Checked the following versions:

  • 0.60.45 Chromium: 72.0.3626.109 (release) --> reproduced
  • 0.61.38 Chromium: 73.0.3683.39 (beta) --> reproduced
  • 0.62.8 Chromium: 73.0.3683.39 (dev) --> reproduced

@SilverPuppy

Duplicate of #884

@kjozwiak kjozwiak added priority/P2 A bad problem. We might uplift this to the next planned release. and removed priority/P3 The next thing for us to work on. It'll ride the trains. labels Mar 7, 2019
@kjozwiak
Member

kjozwiak commented Mar 7, 2019

Changing this to P2 so it matches #884. We can decide whether to close this one or #884. We should probably fix this sooner rather than later, since most banking and government websites generate their PDFs on demand.

@simonhong
Member

simonhong commented Mar 11, 2019

With that link, the PDF displayed after several reloads.
The images below show the captured headers for both cases (success and failure).
The difference is 200 OK vs. 302 Found. I think pdf.js sometimes can't handle the redirect properly.
I also reproduced this issue with pdf.js on Chrome stable.
Success:
Screen Shot 2019-03-11 at 09 41 34
Failure:
Screen Shot 2019-03-11 at 09 43 59

@simonhong simonhong self-assigned this Mar 11, 2019
@simonhong
Member

simonhong commented Mar 11, 2019

I found more reliable repro steps.
When opening the link with a click, the PDF always shows as invalid, then displays properly after a reload.
When opening the link with cmd + click, it is always invalid. After a reload, the PDF loads fine.
When pasting the link URL into another tab (not the tab that already shows the invalid PDF), it is also invalid, and fine after a reload.
I suspect pdf.js does not work properly with a PDF link that redirects.

@SilverPuppy

SilverPuppy commented Mar 11, 2019

I can confirm that refreshing the page with the error does load the PDF correctly in my application. This is a workaround, of course, but a good step forward in identifying the real issue so it can be resolved.

@tripp-lc

I want to add that I also receive this error (PDF.js v2.0.673 (build: 31012570)
Message: Invalid PDF structure) from my EHR site (https://www.therapynotes.com/app/) when opening a PDF file.

  • It is a site that I am logged into securely.
  • I am not able to refresh and eventually get it to load correctly.
  • I am also not able to copy the link to another tab and have it open.
  • This happens with Brave Shields Down or Up.
  • The Download button (top right of PDF Viewer Window) will still download the pdf file correctly.
  • When I hit the print button (top right) it gives this warning: Warning: The PDF is not fully loaded for printing.
  • Browser is up to date: Version 0.60.48 Chromium: 72.0.3626.121 (Official Build) (64-bit)

@SilverPuppy

  • The Download button (top right of PDF Viewer Window) will still download the pdf file correctly.

VERY interesting. I'd never thought to try that.

@simonhong
Member

What is the difference between loading
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf and
chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf?
The latter is the same URL with the pdfjs extension ID prepended, and the PDF displays fine when it loads.
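
For readers following along, the relationship between the two URLs can be sketched as a pair of string helpers. This is only an illustration of the URL shape quoted in this thread: the extension ID is copied from the comment above, while `toViewerUrl` and `fromViewerUrl` are hypothetical names, not functions from the pdfjs extension.

```typescript
// Extension ID copied from the URLs quoted in this thread.
const PDFJS_EXTENSION_ID = "oemmndcbldboiebfnladdacbdfmadadm";
const VIEWER_PREFIX = "chrome-extension://" + PDFJS_EXTENSION_ID + "/";

// Hypothetical helper: build the viewer URL by prepending the extension origin.
function toViewerUrl(pdfUrl: string): string {
  return VIEWER_PREFIX + pdfUrl;
}

// Hypothetical helper: recover the original URL,
// or return null if the input is not a viewer URL.
function fromViewerUrl(viewerUrl: string): string | null {
  if (viewerUrl.indexOf(VIEWER_PREFIX) !== 0) {
    return null;
  }
  return viewerUrl.slice(VIEWER_PREFIX.length);
}
```

Loading the prefixed form directly skips the initial navigation to the plain URL, which may be why it behaves differently.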

@simonhong
Member

After some debugging, I found that http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf redirects to https://citeseerx.ist.psu.edu/messages/downloadsexceeded.html when it is loaded twice in quick succession.
That is why pdf.js fails: it tries to parse the https://citeseerx.ist.psu.edu/messages/downloadsexceeded.html file.
This will always happen, because after the first load the pdf.js extension hooks the load and requests the file again using the extension-prefixed URL. So it always gets the exceed.html content instead of the PDF.

When I load chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf, it succeeds because the PDF is requested only once.

I can reproduce this in Chrome with its built-in viewer (PDFium). When I reload quickly, I see the exceed.html page in Chrome.

I'm trying to find the way to resolve this, but I'm not sure it can be fixed with pdf.js because pdf.js should request pdf url twice.
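
The failure mode described above (an HTML page handed to a PDF parser) can be illustrated with a small content check. This is a minimal sketch, not code from pdf.js: `asciiBytes` and `looksLikePdf` are hypothetical helpers, and the 1024-byte search window is an assumption modeled on the leniency of common PDF readers. The only fact relied on is that a PDF file begins with a `%PDF-` header.

```typescript
// Convert an ASCII string to bytes (helper for building sample inputs).
function asciiBytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    out[i] = text.charCodeAt(i) & 0xff;
  }
  return out;
}

// Returns true if the "%PDF-" marker starts within the first 1024 bytes.
// The window size is an assumption; a strict check would require offset 0.
function looksLikePdf(body: Uint8Array): boolean {
  const marker = asciiBytes("%PDF-");
  const limit = Math.min(body.length, 1024) - marker.length;
  for (let start = 0; start <= limit; start++) {
    let matched = true;
    for (let j = 0; j < marker.length; j++) {
      if (body[start + j] !== marker[j]) {
        matched = false;
        break;
      }
    }
    if (matched) {
      return true;
    }
  }
  return false;
}
```

A check like this would let a viewer report "the server returned something other than a PDF" instead of "Invalid or corrupted PDF file" when the second request is answered with an HTML page.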

@SilverPuppy

I'm trying to find the way to resolve this, but I'm not sure it can be fixed with pdf.js because pdf.js should request pdf url twice.

Do you mean it should request it once? It seems that it ought to request it once and parse it, but the second request for the attempted workaround is triggering the exceed.html, which is apparently normal behavior for that server, probably as a DDOS attack mitigation. The server self-protection behavior is stopping the workaround from working, but the root problem is still the same as what I and others have been experiencing.

@tripp-lc

Something else, if it can help.
A comparison of sites where I am logged in. Two that work and one that doesn't.

All of these begin with the standard: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/

Then two sites that work:

  • https://spi.INSURANCE_COMPANY.com/mmis?rq=STFileToken&Payload=BUNCH_OF_LETTERS
  • https://secure10.THE_BANK.com/QNBOnline/mobilews/accountstatement/27060/0/pdf?q2token=BUNCH_OF_LETTERS

The site that doesn't work:

  • https://www.ELECTRONIC_HEALTH_RECORD.com/app/patients/files/view/BUNCH_OF_LETTERS/

Is there something related to the site that doesn't work not specifically defining token=
Or that the one that doesn't work ends in a hash / ?

@simonhong
Member

simonhong commented Mar 13, 2019

@SilverPuppy Brave sends two requests to that server when the user clicks a PDF link. The first is issued by the click itself.
And the other is issued by pdfjs extension. This is inevitable with current pdfjs extension implementation. Because of this two requests in a very short interval, I think that server responded with exceed.html.

I assume that for any server that handles PDF requests like that, the current pdfjs extension will not work with its PDF links.

It would be nice if the pdfjs extension could display the contents from the first request.
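
The two-request behavior described above can be modeled with a toy simulation. Everything here is an assumption made for illustration: the server model, the 1000 ms throttle window used in the usage note, and the 50 ms gap between requests are invented, and none of the names come from Brave or pdf.js.

```typescript
interface SimResponse {
  statusCode: number;
  body: string;
}

// Hypothetical server model: a repeat request inside minIntervalMs gets a 302
// to the "downloads exceeded" page instead of the PDF.
function makeThrottledServer(minIntervalMs: number): (nowMs: number) => SimResponse {
  let lastServedMs: number | null = null;
  return function handle(nowMs: number): SimResponse {
    if (lastServedMs !== null && nowMs - lastServedMs < minIntervalMs) {
      return { statusCode: 302, body: "downloadsexceeded.html" };
    }
    lastServedMs = nowMs;
    return { statusCode: 200, body: "%PDF-1.4 ..." };
  };
}

// Flow described in this comment: the click fetches once, the viewer fetches again.
function loadWithTwoRequests(server: (nowMs: number) => SimResponse): SimResponse {
  server(0); // request issued by the link click; its body is discarded
  return server(50); // request issued by the viewer 50 ms later
}

// Wished-for flow: the viewer renders the body of the first response.
function loadReusingFirstResponse(server: (nowMs: number) => SimResponse): SimResponse {
  return server(0);
}
```

With `makeThrottledServer(1000)`, `loadWithTwoRequests` ends on the 302 while `loadReusingFirstResponse` gets the 200, which matches the symptom reported for the citeseerx link.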

@simonhong
Member

@tripp-lc Thanks for checking! If it works with the chrome-extension:// prefixed URL, that server might also be denying the extension's request (the second request; the first one is the user's click).

Is there something related to the site that doesn't work not specifically defining token=
Or that the one that doesn't work ends in a hash / ?

I don't think so.

@simonhong
Member

simonhong commented Mar 14, 2019

I'll continue the discussion of this issue at mozilla/pdf.js#10639.
I think it's hard to fix this issue on the Brave side.
cc: @bbondy

@shge

shge commented Mar 17, 2019

probably as a DDOS attack mitigation

Because of this two requests in a very short interval, I think that server responded with exceed.html.

I came across a website that blocks the plugin's second request because it is "a direct request".
It lets me download the file only when I reach it by clicking a link on a specific web page (the referer has to be that page).

Anyway, it should make a single request that includes the referer information.
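
A referer gate of the kind described above might look like the following on the server side. This is a guess at the general shape, not the actual site's logic: `allowsDownload`, the lowercase `referer` header key, and the example.com URLs in the usage note are all hypothetical.

```typescript
// Hypothetical server-side rule: serve the PDF only when the Referer header
// matches the page that is expected to link to it.
function allowsDownload(
  requestHeaders: { [key: string]: string },
  requiredReferer: string
): boolean {
  return requestHeaders["referer"] === requiredReferer;
}
```

Under this model, a click from the listing page (`{ referer: "https://example.com/reports" }`) passes, while a follow-up request sent with no referer (`{}`) is rejected as a direct request.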

@gustavnikolaj

gustavnikolaj commented Mar 18, 2019

I am experiencing a variant of this issue, with the same symptoms. In my case though, it's a result of the requests for the pdf file not including the cookies, resulting in a redirect to a login page.

I commented on this issue describing the same problem, which was closed a few months ago: #2048

Version 0.61.51 Chromium: 73.0.3683.75 on macos

Edit: It works fine when downloading the PDF and loading it from a file:// url.
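
The cookie variant described in this comment can be sketched with another toy server model. The cookie name `session` and the `/login` path are placeholders chosen for the example, and `serveProtectedPdf` is a hypothetical function, not part of any real site or of the pdfjs extension.

```typescript
interface SimReply {
  statusCode: number;
  redirectTo: string | null;
}

// Hypothetical server model: requests without the session cookie are
// redirected to a login page instead of receiving the PDF.
function serveProtectedPdf(cookieHeader: string): SimReply {
  const hasSession = cookieHeader.split(";").some(function (pair) {
    return pair.trim().indexOf("session=") === 0;
  });
  if (!hasSession) {
    return { statusCode: 302, redirectTo: "/login" };
  }
  return { statusCode: 200, redirectTo: null };
}
```

If the viewer's request arrives without the cookie, the body it ends up parsing is the login page's HTML, which would produce the same "Invalid PDF structure" symptom as the other reports.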

@tripp-lc

I want to confirm that I just updated my browser to:

Version 0.61.51 Chromium: 73.0.3683.75 (Official Build) (64-bit)

and the PDFs generated through my EHR site now render correctly in a new tab.

Thanks for your hard work.

@simonhong
Member

Edit: It works fine when downloading the PDF and loading it from a file:// url.

@gustavnikolaj Loading a local file works even though it also needs two requests, for the same reason as above. The local file system always returns the file whenever it is requested :)

@SilverPuppy

SilverPuppy commented Mar 19, 2019

Unfortunately, whatever changed is not a full fix, because Google Calendar's PDFs for printing are still problematic and require the refresh workaround. I am also on 0.61.51.

@bsclifton
Member

bsclifton commented Apr 26, 2019

Closing as a duplicate of #884

@bsclifton bsclifton added closed/duplicate Issue has already been reported and removed QA/Yes priority/P2 A bad problem. We might uplift this to the next planned release. labels Apr 26, 2019
@bsclifton bsclifton added closed/duplicate Issue has already been reported and removed closed/duplicate Issue has already been reported labels May 26, 2019
@NejcZdovc NejcZdovc added this to the Dupe / Invalid / Not actionable milestone Jun 3, 2019
@bbondy bbondy removed this from the Dupe / Invalid / Not actionable milestone May 30, 2020