How to download a file with original name or content-disposition ? #85

Closed
Baltazar500 opened this issue Feb 24, 2022 · 8 comments

@Baltazar500

How can I download a file under its original name (or the name given in Content-Disposition) using xidel alone, without curl/wget? The "--download=" switch only lets me save the file under a file name I specify :(

@Reino17

Reino17 commented Feb 24, 2022

Hello Baltazar500,

The argument to --download is actually an "extended string", just without the opening x" and the closing ".
So with --download '{...}' you can insert any variable or function call, just as you would for -e/--extract.
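
As an illustration of what an expression inside `{...}` can compute, here is a minimal shell sketch of the file-name rule `replace($url, "^.*/", "")` that comes up later in this thread. It does not call xidel: `url_basename` is a hypothetical helper, the example.com URLs are made up, and `sed` stands in for the XPath `replace` function.

```shell
#!/bin/sh
# Hypothetical helper, not part of xidel: mimics replace($url, "^.*/", "").
url_basename() {
  # The greedy pattern ^.*/ matches everything up to and including the
  # last slash, so only the final path segment is left.
  printf '%s\n' "$1" | sed 's|^.*/||'
}

url_basename 'https://example.com/files/List%231.txt'        # prints List%231.txt
url_basename 'https://example.com/get/report.pdf?token=abc'  # prints report.pdf?token=abc
```

Note that a query string stays in the resulting name, and a URL ending in `/` yields an empty name, so this pattern is best suited to links that end in a plain file name.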

Related (duplicate?) issue: #38.

@Baltazar500
Author

@Reino17, thanks. It works, but only for a single expression. When following page by page, I get an error after the file is downloaded:

xidel -f 'very-long-expression' --download '{replace($url, "^.*/", "")}' -f '//div[@file="text"]/a/@href'

Save as: List#1.txt
Error:
err:XPTY0004: Need context item that is a node to get root element
Possible backtrace:
  $08120890  TXQUERYENGINE__EVALUATESINGLESTEPQUERY,  line 9358 of /home/benito/hg/components/pascal/data/xquery.pas: perhaps TXQTermTryCatch + 136624 ? but unlikely
  $080F5547  TXQTERMPATH__EVALUATE,  line 3302 of /home/benito/hg/components/pascal/data/xquery_terms.inc: perhaps TXQTermBinaryOp + 3959 ? but unlikely
  $080E090A  TXQUERY__EVALUATE,  line 7524 of /home/benito/hg/components/pascal/data/xquery.pas: perhaps Q{http://www.w3.org/2005/xpath-functions}concat + 41114 ? but unlikely
  $080E0A3D  TXQUERY__EVALUATE,  line 7549 of /home/benito/hg/components/pascal/data/xquery.pas: perhaps Q{http://www.w3.org/2005/xpath-functions}concat + 41421 ? but unlikely
  $08080A64  TPROCESSINGCONTEXT__EVALUATEQUERY,  line 2218 of xidelbase.pas: perhaps ? ? but unlikely
  $0808002C  SUBPROCESS,  line 2062 of xidelbase.pas: perhaps ? ? but unlikely
  $0807F64C  TPROCESSINGCONTEXT__PROCESS,  line 2079 of xidelbase.pas: perhaps ? ? but unlikely
  $080801F0  PROCESSFOLLOWTO,  line 1998 of xidelbase.pas: perhaps ? ? but unlikely
  $08080061  SUBPROCESS,  line 2065 of xidelbase.pas: perhaps ? ? but unlikely
  $0807F8D1  TPROCESSINGCONTEXT__PROCESS,  line 2098 of xidelbase.pas: perhaps ? ? but unlikely
  $0808A3BB  PERFORM,  line 3891 of xidelbase.pas: perhaps ? ? but unlikely
  $080493D9  main,  line 84 of xidel.pas: perhaps ? ? but unlikely

Call xidel with --trace-stack to get an actual backtrace

When I put --download after following to the next page:

xidel -f 'very-long-expression' -f '//div[@file="text"]/a/@href' --download '{replace($url, "^.*/", "")}'

the file is not downloaded and I get an error.

@Reino17

Reino17 commented Feb 25, 2022

That's because, as the error message says, there's no context item. You didn't provide any input (a file or a URL).

@benibela
Owner

If the download name is a directory, xidel uses the file name from the URL. So you can do --download .

Also, the last option should preferably not be a -f.

@Baltazar500
Author

@Reino17

> That's because, as the error-message mentions, there's no context item. You didn't provide input (a file or an url).

When I use the 'very-long-expression' as an extraction (-e), I get the links from every following page. When I use it with follow (and download), it stops at the first page :(

@benibela

> If the download name is a directory, it uses the name from the URL. So you can do --download .

This works the same as the expression --download '{replace($url, "^.*/", "")}', but only the file from the base link is downloaded. The next (followed) page does not load and throws an error.

Error:
err:XPTY0004: Need context item that is a node to get root element

@benibela
Owner

benibela commented Mar 3, 2022

> When using a 'very-long-expression' as an extraction (-e), I get links from each following page. When using follow (and download) it ends on the first page :(

Try it with some input:

xidel '<start/>' -f 'very-long-expression' -f '//div[@file="text"]/a/@href' --download '{replace($url, "^.*/", "")}'

@Baltazar500
Author

@benibela, sorry, the site I'm extracting data from has stopped working. Once it is back up, I will try this trick. Thanks :)

@Baltazar500
Author

My problem was solved by using a "[ -f xxx ]" loop:

xidel [ -f 'very-long-expression' --download '{replace($url, "^.*/", "")}' ] -f '//div[@file="text"]/a/@href'
