Cookies consent page #1

Closed
TeddyBear06 opened this issue May 20, 2023 · 5 comments · Fixed by #6
Labels
bug Something isn't working

Comments

@TeddyBear06
Contributor

TeddyBear06 commented May 20, 2023

Hi,

First, thanks for this tool, really useful.

As reported on HN by users in Europe, YouTube serves a cookie consent page that blocks retrieval of the channel_id first and, consequently, all other requests.

French version

English version

File ".../yt-fts/yt_fts.py", line 29, in download
    channel_id = get_channel_id(channel_url)
  File ".../yt-fts/yt_fts.py", line 176, in get_channel_id
    channel_id = re.search('channelId":"(.{24})"', html).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
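
The AttributeError comes from re.search returning None when the consent page is served instead of the channel page. A minimal sketch of a guard follows, assuming a hypothetical helper named extract_channel_id that only parses HTML which has already been downloaded (the real get_channel_id in yt_fts.py takes a channel URL, and the rest of its body is not shown in this thread):

```python
import re

# Same pattern as the one shown in the traceback.
CHANNEL_ID_RE = re.compile(r'channelId":"(.{24})"')


def extract_channel_id(html):
    """Return the 24-character channel id or raise a readable error.

    Hypothetical helper for illustration; not part of yt-fts.
    """
    match = CHANNEL_ID_RE.search(html)
    if match is None:
        # No match: most likely the cookie consent page was returned.
        raise ValueError(
            "channelId not found; YouTube may have returned the cookie consent page"
        )
    return match.group(1)
```

This does not solve the consent problem; it only turns the opaque AttributeError into a message that points at the likely cause.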

I have run into this issue before: adding a cookie to a requests session that indicates consent has been given can "solve" it.

import requests

s = requests.Session()
# Tell YouTube that consent was already given, so the consent page is skipped.
s.cookies.set("CONSENT", "YES+1")
[...]
res = s.get(url)

To respect the original purpose of this consent page, we can ask users to give their consent through a CLI argument, like so:

python yt_fts.py download "https://www.youtube.com/@ycombinator/videos" --cookies_consent=1

It's just a suggestion: it could also be a question prompted in the CLI during download, but that requires knowing that the user is in Europe (or it could apply to all users, which would be annoying when it isn't really needed).
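
The suggested flag could be wired up roughly as follows. This is only a sketch using argparse from the standard library; how yt_fts.py actually parses its arguments is not shown in this thread, so build_parser and the subcommand layout are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical CLI layout; the real yt_fts.py may parse arguments differently."""
    parser = argparse.ArgumentParser(prog="yt_fts.py")
    subcommands = parser.add_subparsers(dest="command", required=True)

    download = subcommands.add_parser("download")
    download.add_argument("channel_url")
    # Off by default, so the consent cookie is only set when explicitly requested.
    download.add_argument(
        "--cookies_consent",
        type=int,
        choices=[0, 1],
        default=0,
        help="set to 1 to accept YouTube's cookie consent on your behalf",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.channel_url, args.cookies_consent)
```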

I tried to analyse the "Reject all" behavior, but the CONSENT cookie's content is still PENDING+{RANDOM NUMBER} (perhaps not random from Google's POV, but I can't explain this value), so from my point of view only "Accept all" is "working".

Do you have any thoughts about this?

Kind regards,

@theAkito

theAkito commented May 20, 2023

Isn't this an old issue that has already been solved? Last time I built a YouTube scraping tool, I already applied this workaround.


Indeed.

https://github.com/theAkito/mini-tools-nim/blob/274fb8a52678cda31daea81c828c7f0ef3b366a1/generic/web/youtubestreamlive/youtubestreamlive.nim#L32

https://stackoverflow.com/a/66940841

Obviously, rejecting cookies should be preferred...

@TeddyBear06
Contributor Author

TeddyBear06 commented May 20, 2023

@theAkito Sure. The issue here is not really how to "solve" it, but how to implement it (ethically speaking).

Even if it's scoped to the requests session, users have to be aware of the implicit "consent" (in my opinion).

EDIT: Sure, rejecting would be the best option, but I didn't succeed in baking a reject cookie.

@theAkito

I would go against anything that would restrict or complect the end-user experience. For example, the argument for consenting shouldn't be there. I know some programs, for example ones related to Let's Encrypt, that require the user or administrator to specifically consent to an EULA or something. It's just a terrible user experience.

I think the only reasonable option that makes sense at all is to put a big fat disclosure into the README or the product's description, and then at least try to reject the consent. The disclosure would be there only for cosmetics, since everyone using Google should have the two brain cells that tell them the obvious: Google, & therefore YouTube, is a black hole for user data. That's it.

In no way should the user experience be diminished by a CLI argument or a manually required configuration change that forces the user to waste time on something useless, because everyone ought to know what Google & YouTube are.

If someone is using this tool to get something off YouTube, it ought to be absolutely obvious that data is sucked into a black hole either way. If we are going to ask people for consent regarding stuff like this, we might as well start explaining how to breathe air, eat food & drink water. It's stuff everyone has to know. Period.

@TeddyBear06
Contributor Author

TeddyBear06 commented May 20, 2023

Thanks for taking the time to share your thoughts.

I agree, you're totally right.

The "big fat disclosure" in the README seems a reasonable way to address this issue.

I'll try to understand how the reject cookie works.

@TeddyBear06
Contributor Author

TeddyBear06 commented May 20, 2023

By analysing both "Accept all" and "Reject all" POST requests, I was able to bake a valid "Reject all" cookie.

"Accept all" POST parameters:

gl=DE
m=0
app=0
pc=yt
continue=https://www.youtube.com/@TimDillonShow/videos?cbrd=1
x=6
bl=boq_identityfrontenduiserver_20230514.09_p0
hl=de
src=1
cm=2
set_eom=false
set_ytc=true
set_apyt=true

"Reject all" POST parameters:

gl=DE
m=0
app=0
pc=yt
continue=https://www.youtube.com/@TimDillonShow/videos?cbrd=1
x=6
bl=boq_identityfrontenduiserver_20230514.09_p0
hl=de
src=1
cm=2
set_eom=true

We can see that set_eom is set to true and both set_ytc and set_apyt are missing from a "Reject all" request.
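
The comparison above can be reproduced mechanically. The sketch below copies the two parameter sets from this comment into dicts and diffs them; diff_params is a hypothetical helper written for this illustration.

```python
COMMON = {
    "gl": "DE",
    "m": "0",
    "app": "0",
    "pc": "yt",
    "continue": "https://www.youtube.com/@TimDillonShow/videos?cbrd=1",
    "x": "6",
    "bl": "boq_identityfrontenduiserver_20230514.09_p0",
    "hl": "de",
    "src": "1",
    "cm": "2",
}

# Parameter sets exactly as captured from the two POST requests.
accept_all = {**COMMON, "set_eom": "false", "set_ytc": "true", "set_apyt": "true"}
reject_all = {**COMMON, "set_eom": "true"}


def diff_params(first, second):
    """Return keys only in first, keys only in second, and differing values."""
    only_first = sorted(set(first) - set(second))
    only_second = sorted(set(second) - set(first))
    changed = {
        key: (first[key], second[key])
        for key in sorted(set(first) & set(second))
        if first[key] != second[key]
    }
    return only_first, only_second, changed


if __name__ == "__main__":
    only_accept, only_reject, changed = diff_params(accept_all, reject_all)
    print("only in Accept all:", only_accept)
    print("only in Reject all:", only_reject)
    print("different values:", changed)
```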

By declaring a global requests session variable and making a POST request to get a valid "Reject all" consent cookie before making any other requests, we can address this issue:

[...]

s = requests.Session()  # one shared session, so the consent cookies are reused by every request

[...]

def download(channel_url, channel_id):
    data = {
        "gl":"DE",
        "pc":"yt",
        "continue":"https://www.youtube.com/@TimDillonShow/videos?cbrd=1",
        "x":"6",
        "bl":"boq_identityfrontenduiserver_20230514.09_p0",
        "hl":"de",
        "set_eom":"true"
    }
    s.post("https://consent.youtube.com/save", data=data)

    [...]

[...]

def get_vid_title(vid_url):
    res = s.get(vid_url)

If you loop over the requests session's CookieJar, you can see that we have just 4 cookies (versus at least 6 when we "Accept all"):

CONSENT: PENDING+443
SOCS: CAESNQgDEitib3FfaWRlbnRpdHlmcm********Vpc2VydmVyXzIwMjMwNTE0LjA5X3AwGgJkZSACGgYIgJugowY
YSC: 5j-C****St8
__Secure-YEC: CgtkMV********VGSSjd8aOjBg%3D%3D
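
A listing like the one above can be produced with a short loop. This is a sketch: cookie_summary and print_cookies are hypothetical helpers, and they only assume that iterating the jar yields objects with name and value attributes, which is how the cookie jar on a requests session behaves.

```python
def cookie_summary(jar):
    """Return (name, value) pairs for every cookie in an iterable cookie jar."""
    return [(cookie.name, cookie.value) for cookie in jar]


def print_cookies(jar):
    """Print one 'NAME: value' line per cookie."""
    for name, value in cookie_summary(jar):
        print(f"{name}: {value}")
```

With the shared session from the snippet above, this would be called as print_cookies(s.cookies) after the POST to consent.youtube.com.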

BTW, the SOCS cookie seems to contain the consent rejection (needs confirmation).

Obviously, the POST parameters are likely to change in the future (but I have no idea how often).

This applies especially to the bl parameter, as it seems to contain a server version with a date (currently boq_identityfrontenduiserver_20230514.09_p0).

EDIT: The bl value and the others are present in the cookie consent webpage (as hidden inputs), so maybe we can extract them from the page to be "sure" that we have the right value for each parameter.
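
Extracting the hidden inputs could look roughly like this. It is a sketch under assumptions: the consent page's actual markup is not shown in this thread, so the code only assumes plain input elements with type="hidden", and it merges the inputs of every form on the page into one dict. HiddenInputParser, extract_hidden_inputs and build_reject_payload are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_inputs(html):
    """Return a dict of all hidden input fields found in the HTML."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def build_reject_payload(fields):
    """Build a "Reject all" payload as observed above:
    set_eom=true, with set_ytc and set_apyt left out."""
    payload = {
        name: value
        for name, value in fields.items()
        if name not in ("set_ytc", "set_apyt")
    }
    payload["set_eom"] = "true"
    return payload
```

The resulting dict could then be passed as data to the s.post call shown earlier instead of the hard-coded values, so that a changed bl value is picked up automatically.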

@NotJoeMartinez NotJoeMartinez added the bug Something isn't working label May 21, 2023

3 participants