Blank files #67

Open
Jack-Lewis1 opened this issue Sep 8, 2023 · 4 comments

Comments

@Jack-Lewis1

Hey,

I'm running a fairly simple command:

wayback_machine_downloader absglobal.com --all-timestamps --from 20110101000000 --to 20221231235959 --concurrency 5 --only "/(\/$|\.(html|htm|aspx)$)/i" --all
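The `--only` filter in this command is a case-insensitive regular expression. It is meant to keep only URLs that end in `/`, `.html`, `.htm`, or `.aspx`. Below is a minimal Python sketch of the same pattern. It assumes the tool applies the filter as a plain regex search against each archived URL, which is an assumption on my part. The helper name `matches_only_filter` is hypothetical. The sketch shows which URLs the filter would keep:

```python
import re

# Python equivalent of the --only pattern "/(\/$|\.(html|htm|aspx)$)/i":
# match a trailing slash, or a .html/.htm/.aspx extension, case-insensitively.
# Assumption: the filter is applied as a regex search over each archived URL.
ONLY_PATTERN = re.compile(r"(/$|\.(html|htm|aspx)$)", re.IGNORECASE)


def matches_only_filter(url: str) -> bool:
    """Hypothetical helper: True if the URL would pass the --only filter."""
    return ONLY_PATTERN.search(url) is not None


if __name__ == "__main__":
    for url in [
        "http://absglobal.com/contact-us/europe/",
        "http://absglobal.com/about.ASPX",
        "http://absglobal.com/logo.png",
    ]:
        print(url, matches_only_filter(url))
```

Because the pattern is anchored to the end of the URL, a page with a query string, such as `page.aspx?id=1`, would not match.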

The downloader partly works, but I get quite a few errors like this:

Failed to open TCP connection to web.archive.org:443 (Connection refused - connect(2) for "web.archive.org" port 443)

I can live with that. The real problem is this:

I get a folder structure like 20110101152320/contact-us/europe/index.html, but the HTML file is just blank.
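One way to see how widespread the blank-file problem is would be to scan the download directory for HTML files that are empty or contain only whitespace. Below is a minimal Python sketch. It assumes the downloads sit under a local directory laid out like `20110101152320/contact-us/europe/index.html`. The function name `find_blank_files` and its default suffix list are hypothetical.

```python
from pathlib import Path


def find_blank_files(root: str, suffixes=(".html", ".htm", ".aspx")) -> list[Path]:
    """Hypothetical helper: list files under `root` whose content is empty or whitespace-only."""
    blank = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Read raw bytes so undecodable content can't raise; strip() drops ASCII whitespace.
            if not path.read_bytes().strip():
                blank.append(path)
    return blank


if __name__ == "__main__":
    import sys

    # Scan the directory given on the command line, or the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_blank_files(root):
        print(path)
```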

@jsvine
Owner

jsvine commented Sep 28, 2023

@tempo660 Thanks for flagging. Can you share the command and URL you're using? And can you confirm that the Wayback Machine's online version for that particular timestamp is not itself blank?

@Jack-Lewis1 Based on your message, I suspect that you're using a different tool than is represented by this repository. This repository supplies waybackpack, not wayback_machine_downloader.

@jsvine
Owner

jsvine commented Oct 7, 2023

Thanks, @tempo660! I can confirm that I get the same result on my end. The good news is that there seems to be an easy fix: just append a trailing / to the URL, i.e., https://www.fat-pie.com/. With that, I get proper results.
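If you're scripting many downloads, you could apply the trailing-slash workaround automatically before passing each URL to waybackpack. Below is a minimal Python sketch that uses only the standard library. The helper name `with_trailing_slash` is hypothetical. It adds the slash only when the URL has no path at all, so URLs pointing at specific pages are left unchanged:

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url: str) -> str:
    """Hypothetical helper: append '/' to a URL whose path is empty, e.g. 'https://www.fat-pie.com'."""
    parts = urlsplit(url)
    if parts.path == "":
        # Only bare domains are changed; existing paths are kept as-is.
        parts = parts._replace(path="/")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(with_trailing_slash("https://www.fat-pie.com"))
    print(with_trailing_slash("https://www.fat-pie.com/menu.html"))
```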
