"Invalid or corrupted PDF files" is displayed #10639

Closed
simonhong opened this issue Mar 12, 2019 · 11 comments

Comments

@simonhong

Attach (recommended) or Link to PDF file here:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf

Configuration:

  • Web browser and its version: Chrome 72.0.3621.121 (Official Build) (64-bit)
  • Operating system and its version: macOS
  • PDF.js version: PDF.js v2.0.673 (build: 3101257)
  • Is a browser extension: Yes

Steps to reproduce the problem:

  1. Click the above link

What is the expected behavior? (add screenshot)
This is what I see when I paste the chrome-extension-prefixed URL or reload the error page:
(chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf)
Screen Shot 2019-03-12 at 17 20 26

What went wrong? (add screenshot)
This is what I see when I click the above link:
Screen Shot 2019-03-12 at 17 19 38

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
https://chrome.google.com/webstore/detail/pdf-viewer/oemmndcbldboiebfnladdacbdfmadadm

@Snuffleupagus
Collaborator

Possibly a duplicate of #10562.

@simonhong
Author

@Snuffleupagus I think this is a different issue from #10562.
This one is about some PDFs not opening properly, whereas #10562 is about PDFs being opened by Chrome's internal PDF viewer (PDFium?) instead of the PDF.js extension.

@simonhong
Author

I found the cause of this issue.
The server receives two requests within a very short interval.
For the second request, the server redirects to downloadsexceeded.html instead of returning the PDF content.
So PDF.js reports an invalid/corrupted PDF file.
I think this may be the server's DDoS protection.
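
To illustrate why an HTML page served in place of the PDF ends up as "invalid or corrupted", here is a minimal sketch of a header check. `looksLikePdf` is a hypothetical helper, not a PDF.js function, and the 1024-byte search window is an assumption made for this example. It only relies on the fact that PDF files start with the `%PDF-` marker.

```javascript
// Hypothetical helper (not part of PDF.js): checks whether a downloaded body
// looks like a PDF by searching for the "%PDF-" marker near the beginning.
// The 1024-byte search window is an assumption for this sketch.
function looksLikePdf(bytes, searchLimit = 1024) {
  const marker = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  const end = Math.min(bytes.length, searchLimit) - marker.length;
  for (let i = 0; i <= end; i++) {
    if (marker.every((b, j) => bytes[i + j] === b)) {
      return true;
    }
  }
  return false;
}

const encoder = new TextEncoder();
const pdfBody = encoder.encode("%PDF-1.4\n...");
const htmlBody = encoder.encode("<!DOCTYPE html><html>downloads exceeded</html>");

console.log(looksLikePdf(pdfBody));  // true
console.log(looksLikePdf(htmlBody)); // false
```

A body like `htmlBody`, which is roughly what a redirect to an error page would deliver, fails this kind of check.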

Why are two requests issued when the user clicks that link?
The first one is issued by the browser for the user's click.
Then the PDF.js extension intercepts the response headers and redirects to the extension URL.
Then PDF.js issues one more request.
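
The flow above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the extension's actual code: `isPdfResponse`, `viewerUrl`, and `handleHeaders` are hypothetical names, and the URL layout simply mirrors the chrome-extension-prefixed URL shown earlier in this issue. The `responseHeaders` and `redirectUrl` shapes follow Chrome's `webRequest` API, but the listener registration is omitted.

```javascript
// Hypothetical helpers sketching the described flow; names and URL layout
// are assumptions for illustration only.

// Decide from the response headers whether the response is a PDF.
function isPdfResponse(headers) {
  const contentType = headers.find(
    (h) => h.name.toLowerCase() === "content-type"
  );
  if (!contentType || !contentType.value) {
    return false;
  }
  const mime = contentType.value.toLowerCase().split(";")[0].trim();
  return mime === "application/pdf";
}

// Build the extension-prefixed URL in the form shown in this issue.
function viewerUrl(extensionId, pdfUrl) {
  return "chrome-extension://" + extensionId + "/" + pdfUrl;
}

// Request 1 has already been sent by the browser for the user's click.
// This handler only sees the response headers of that request.
function handleHeaders(details, extensionId) {
  if (!isPdfResponse(details.responseHeaders || [])) {
    return {}; // not a PDF: let the browser continue normally
  }
  // Redirecting abandons the first response body; the viewer page then
  // downloads the same URL again (request 2).
  return { redirectUrl: viewerUrl(extensionId, details.url) };
}

const result = handleHeaders(
  {
    url: "http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf",
    responseHeaders: [{ name: "Content-Type", value: "application/pdf" }],
  },
  "oemmndcbldboiebfnladdacbdfmadadm"
);
console.log(result.redirectUrl);
```

The point of the sketch is that the decision is made from headers alone, so the body of the first response is never handed to the viewer.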

I think we can improve this.
How about using the content received from the first request instead of requesting again?
This is just an idea.
(Sorry if this idea doesn't make sense. I don't fully understand the PDF.js/extension implementation yet.)
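
As a rough sketch of the "request once, reuse the result" idea, independent of any browser API: `createOnceFetcher` and `fakeFetch` below are hypothetical names, and whether the extension can actually get at the body of the first response is not established in this thread. The sketch only shows the deduplication itself.

```javascript
// Hypothetical sketch: remember the in-flight or completed download per URL
// so that a second consumer reuses it instead of contacting the server again.
function createOnceFetcher(fetchBytes) {
  const cache = new Map(); // url -> Promise resolving to the response bytes
  return function fetchOnce(url) {
    if (!cache.has(url)) {
      cache.set(url, fetchBytes(url));
    }
    return cache.get(url);
  };
}

// Usage with a fake fetcher that counts how often the "server" is contacted.
let serverHits = 0;
const fakeFetch = (url) => {
  serverHits++;
  return Promise.resolve(new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]));
};

const fetchOnce = createOnceFetcher(fakeFetch);
const first = fetchOnce("https://example.com/paper");
const second = fetchOnce("https://example.com/paper");

console.log(serverHits);       // 1
console.log(first === second); // true
```

With this shape, a server that rejects a rapid second request would only ever see one request per URL.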

WDYT? @timvandermeij @Rob--W

@shge

shge commented Mar 17, 2019

@simonhong
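Would this work as a temporary fix for this problem?

// Rewrite links ending in ".pdf" so they open directly in the PDF.js extension.
document.querySelectorAll('a[href]').forEach(function (a) {
    // Escape the dot so that only a literal ".pdf" suffix matches.
    if (/\.pdf$/i.test(a.href)) {
        a.setAttribute('href', 'chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/' + a.href);
    }
});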

@simonhong
Author

@shge good try. I think it would work for links that end with the .pdf suffix.
However, it is easy to find PDF links that don't have that suffix.
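
The link from this very issue is one such case. A minimal check, assuming a strict `.pdf` suffix test on the URL path (`hasPdfSuffix` is a hypothetical helper, and `example.com` is a placeholder):

```javascript
// Hypothetical helper: strict ".pdf" suffix test applied to the URL path only.
function hasPdfSuffix(href) {
  return /\.pdf$/i.test(new URL(href).pathname);
}

console.log(hasPdfSuffix("https://example.com/docs/paper.pdf")); // true
console.log(
  hasPdfSuffix(
    "http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.7135&rep=rep1&type=pdf"
  )
); // false
```

The second URL serves a PDF but its path is `/viewdoc/download`, so a suffix-based rewrite would skip it.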

@ghost

ghost commented Mar 25, 2019

Did you try reloading the web page after this error is displayed? Does that solve the issue? This error is thrown when the PDF data is loaded for the first time.

@shge

shge commented Mar 26, 2019

@864534182 Did you try reloading the web page after this error is displayed? Does that solve the issue? This error is thrown when the PDF data is loaded for the first time.

Yes, but it does not work on some pages that require referer information.

I came across a website that blocks the second request from the plugin because it is "a direct request".
It lets me download the file only when I reach it by clicking a link on a specific web page (the referer has to be that page).
In any case, the file should be requested only once, with the referer information.
brave/brave-browser#3474 (comment)

@Rob--W
Member

Rob--W commented Mar 26, 2019

The referrer thing is a regression caused by a change in Chrome - see #10645

@ghost

ghost commented May 21, 2019

I will post this on the brave-browser repository too:
Browser: Brave-browser.
In this link:
https://projecteuclid.org/euclid.rmjm/1181072068
there is a button linking to a PDF file. When I click on the button, it shows the already mentioned "Invalid or corrupted PDF file" message.

  • The "reloading" workaround does not work.
  • Even after reloading the page with Ctrl+R, the download button in the upper right corner only lets me download an HTML file, not a PDF file.
  • When I tried to load this supposedly "direct" link to the PDF file:
    https://projecteuclid.org/download/pdf_1/euclid.rmjm/1181072068
    it redirected me to the original link given above in this post. In other words, there is no real direct link to the PDF file.
  • Exactly the same problem occurs with the following link:
    https://projecteuclid.org/euclid.rmjm/1181069828

@SilverPuppy

This appears to have been resolved by the most recent update, which I installed today.

@timvandermeij
Contributor

Closing since this seems to work again.
