Random misleading Unknown Interaction errors #5558

Open
ImRodry opened this issue Oct 17, 2022 · 56 comments
Labels
bug synced Synced to internal tracker

Comments

@ImRodry
Contributor

ImRodry commented Oct 17, 2022

Description

I've seen this issue reported by many people but so far no one has been able to gather enough information to reliably explain what's going on. An example can be seen at discordjs/discord.js#7005
In summary, every now and then, seemingly at random, a bot's reply to an interaction fails with an Unknown Interaction error when, in reality, the reply succeeded and was shown to the user (by reply I mean a regular reply, deferred reply or update). I know this because I've been investigating this issue on a bot I manage for around a week now and I asked some users who were impacted by this.
In the following screenshots I'm logging the time it took for me to reply by subtracting the interaction's created_timestamp from the current timestamp, and then logging the time it took for the bot to receive the error by subtracting the timestamp taken just before the request was submitted from the timestamp at which the error was received. You can see that the reply is sent pretty fast and in time for Discord to accept it; however, the error comes 5 seconds later, indicating some sort of issue on Discord's end.
image
And of course I could be faking those numbers but it would make no sense for me to do that so I'm gonna have to ask you to trust that.
I later asked the user impacted by this issue to see what the bot responded with, and they showed that the reply was indeed deferred, which means that that error was a false positive and everything worked fine on our end.
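
A minimal sketch of the two measurements described above, written in Python rather than the discord.js code the bot actually uses. The class and field names are hypothetical; `created_ms` stands for the interaction's creation timestamp in milliseconds, and the 3000 ms limit is Discord's window for the initial response.

```python
from dataclasses import dataclass

# Window Discord allows for the initial response to an interaction.
INITIAL_RESPONSE_WINDOW_MS = 3000


@dataclass
class ReplyTiming:
    """Timestamps (Unix ms) around one interaction response. Names are illustrative."""

    created_ms: int         # when Discord created the interaction
    request_sent_ms: int    # taken just before the response request was submitted
    error_received_ms: int  # taken when the error came back

    @property
    def reply_latency_ms(self) -> int:
        # How old the interaction was when the bot replied.
        return self.request_sent_ms - self.created_ms

    @property
    def error_latency_ms(self) -> int:
        # How long the request was in flight before the error arrived.
        return self.error_received_ms - self.request_sent_ms

    @property
    def replied_in_time(self) -> bool:
        return self.reply_latency_ms < INITIAL_RESPONSE_WINDOW_MS
```

With a reply sent 250 ms after creation and an error arriving 5000 ms after the request (made-up numbers shaped like the report above), `replied_in_time` is true even though the request failed.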

Steps to Reproduce

There are no steps to consistently reproduce this issue as it only happens randomly. What I can tell is that the error comes when the API takes too long to send the response back but actually acknowledges and processes it.

Expected Behavior

The reply is sent correctly (happening) and a success message is returned

Current Behavior

The reply is sent correctly but an "Unknown Interaction" error is thrown

Screenshots/Videos

Can only attach what I've shown above already
image
image
(Bot is thinking but in Portuguese)

Client and System Information

discord.js v14.6.0 on Node v18.11.0 running on Debian 11 (bullseye)

@ImRodry ImRodry added the bug label Oct 17, 2022
@DV8FromTheWorld
Member

Can you provide a code snippet showing how these logs were generated?
I'm curious as to whether it is all coming from one request or possibly retries. There are a number of interacting systems here, so additional information to help debug the issue would be beneficial.

@DV8FromTheWorld DV8FromTheWorld added the waiting for response Discord is waiting for a response and will re-triage the issue at a later point label Oct 17, 2022
@ImRodry
Contributor Author

ImRodry commented Oct 17, 2022

afaik discord.js only retries to submit requests when getting a 429 response which I assume not to be the case here on freshly created interactions, so there are no retries being done here to my knowledge
image
This is the first line that gets executed in the entire event that isn't an if statement and nothing above it interacts with the API other than this line. Hope this helps

@ImRodry
Contributor Author

ImRodry commented Oct 17, 2022

Keep in mind this issue can happen with other kinds of interaction replies, not only deferred messages (I tested with showing a modal). I only showed that snippet because it's the most basic one that should never generate that error

@DV8FromTheWorld DV8FromTheWorld removed the waiting for response Discord is waiting for a response and will re-triage the issue at a later point label Oct 19, 2022
@ImRodry
Contributor Author

ImRodry commented Oct 22, 2022

Hey @DV8FromTheWorld do you have any updates on this?

@ooliver1

ooliver1 commented Oct 22, 2022

I have been getting this too, we check if it has been 3 seconds and it definitely has not at the time of request, but sometimes get this response.
nextcord@2.2.0

@kenyonbowers

Yeah, I have been getting this error as well. And I haven't changed my code since updating Discord.js to v14.6.0.

@DV8FromTheWorld
Member

DV8FromTheWorld commented Oct 24, 2022

I have not looked deeper into this issue at this time. This is the first time I've heard of this issue.
Before assuming it is a problem with Discord I would likely investigate the underlying library implementation.

For debugging purposes: Is there a way in your library (or tech stack) to track outbound network traffic? If there is, it would be useful for indicating whether the library is re-attempting a network call or if the initial network call is actually taking 5 seconds. From your code snippet that isn't possible to determine.

@DV8FromTheWorld DV8FromTheWorld added the waiting for response Discord is waiting for a response and will re-triage the issue at a later point label Oct 24, 2022
@ImRodry
Contributor Author

ImRodry commented Oct 24, 2022

I’m not sure if there is but I can dig into the source code and add that myself. I do, however, doubt that is the case, as we’ve seen @ooliver1 say they are experiencing the same behavior and they’re using a python library, which is completely different from the one I’m using

@ooliver1

It's pretty hard to reproduce confidently, since it's been random a lot of the time

@kenyonbowers

I have found that if the bot is left running for multiple hours after starting, the error stops appearing, until you turn the bot off and try to run it again without letting it sit.

@ImRodry
Contributor Author

ImRodry commented Oct 30, 2022

@DV8FromTheWorld I believe there is not much more debugging I can do here. Due to this issue happening at a random chance and requiring a high volume of interactions it would be impossible to gather enough data to be able to tell exactly why it's happening. All I can tell is that, on discord.js, after calling deferReply() the request is sent to this method which I am not familiar with and I would probably need to spend a lot of time figuring out all the quirks with this class and the whole package itself.
I would, however, like to emphasize that I've seen people face this issue long before discord.js had this rest package, and also other people on other languages and libraries claim to be facing the same so could you look into this? If needed I can start gathering timestamps of when this issue happens and send them to you if that helps, I just can't log anything from the internal parts of the library unfortunately

@SuperSajuuk

SuperSajuuk commented Oct 30, 2022

If it helps, I also get totally random, out of nowhere, “unknown interaction” errors in my bot logs [I run a bot using the discord.py library, so totally unrelated to the OP who uses discord.js] when sending a response to an interaction. In my case, it's just an immediate ephemeral response message [e.g. interaction.response.send_message(mymsghere, ephemeral=True)], rather than a deferred response followed by use of the followup webhook.

I’ve never bothered trying to work out why it happens since the error traceback shows it's more likely to be a Discord issue, rather than to do with anyone's library implementation [unless every single lib dev has implemented interactions wrongly for 2 years lol]. Also, it’ll happen once, then never again for several days, usually when I’m sleeping [i.e. overnight], so it's hardly something I can spend time debugging, since there’s no chance I’ll be able to find out why it's happening.

@muhitrhn

muhitrhn commented Dec 9, 2022

I'm facing some issues with showing a Modal in my bot. The same code works 99% of the time, but in some cases the interaction returns Unknown Interaction when trying to show the Modal. When I replace the Modal with a Reply to the interaction, it works every time. But as soon as I revert back to using the Modal it starts failing again. This happens in certain button interactions set through certain slash command data. The issue persists even if I repost that post. But if I try posting it again with same data, the error persists.

@yash1441

Have been getting random unknown interaction errors as well on deferReply() and showModal() rarely. Decided to check how much time each reply is taking (even though everything is deferred) using console.time() and console.timeEnd() and surprisingly that one day no errors occurred.

@yonilerner yonilerner added the synced Synced to internal tracker label Dec 21, 2022
@yonilerner
Member

Would someone with this issue be willing to provide a complete, runnable code sample that reproduces this issue? It's very difficult to figure out if this is even a bug or not

@ImRodry
Contributor Author

ImRodry commented Dec 21, 2022

@yonilerner like I've said above, simply set up an event listener that all it does is either reply or defer the interaction it receives. Let that sit for a couple hours with a good amount of interactions coming through and you should see the error. There's no reproducible code sample because it really is random

@Conklins

event listener that all it does is either reply or defer the interaction it receives. Let that sit for a couple hours with a good amount of interactions coming through and you should see the error. There's no reproducible

@DV8FromTheWorld
Member

DV8FromTheWorld commented Jan 5, 2023

The problem here is that there isn't enough information here to actually debug anything. I recognize that people are occasionally receiving "Unknown Interaction", but that usually indicates a problem with the developer's code.

Personally, I would try capturing a variety of information:

  1. Capture network logs.
    i) Ensure there are no retries
    ii) Ensure the network request is actually being sent to discord, as opposed to being queued for # seconds due to some ratelimiting, and thus exceeding the timelimit
    • I throw this bullet point in because in the screenshots I'm seeing multi-second delays before the error is received which, to me, indicates a ratelimiter is holding things up.
      The fact that the error came from the "sequential requester" further makes me think that is at play
  2. Time the event was received
  3. Time the event was supposedly responded to
  4. The time the request was actually sent by the network requester
  5. The type of event response (deferred reply, etc)
  6. Information about the internal ratelimiter to see if anything was triggered
  7. Generally I'd turn on any debug-logging around the network layer / requester

Unfortunately, until we have better concrete information with a timeline of events in a failed interaction request there isn't a ton we can do here.
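
One way to collect the timeline requested in the list above is a small per-interaction recorder. The following is a minimal sketch, not part of any Discord library: `InteractionTimeline`, its event names, and the injectable `clock` are all hypothetical, and hooking `mark()` into a real library's request layer is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class InteractionTimeline:
    """Collects named, timestamped events for one interaction (illustrative helper)."""

    interaction_id: str
    response_type: str  # e.g. "deferred reply", "modal"
    clock: Callable[[], float] = time.monotonic
    events: List[Tuple[str, float]] = field(default_factory=list)

    def mark(self, name: str) -> None:
        # Record that `name` happened now, according to the configured clock.
        self.events.append((name, self.clock()))

    def elapsed(self, start: str, end: str) -> Optional[float]:
        """Seconds between the first occurrences of two events, or None if either is missing."""
        first_seen: Dict[str, float] = {}
        for name, stamp in self.events:
            first_seen.setdefault(name, stamp)
        if start not in first_seen or end not in first_seen:
            return None
        return first_seen[end] - first_seen[start]

    def count(self, name: str) -> int:
        # More than one "request_sent" for the same interaction means a retry happened.
        return sum(1 for event_name, _ in self.events if event_name == name)
```

Marking `respond_called` when the handler calls its reply method and `request_sent` when the HTTP client actually writes the request makes `elapsed("respond_called", "request_sent")` the time spent queued (item 1.ii), and `count("request_sent") > 1` indicates a retry (item 1.i).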

@ImRodry
Contributor Author

ImRodry commented Jan 6, 2023

Alright thank you, I will try to get that information for you. Unfortunately it might not be very easy since my bot is using a package and it's hard to get that info from the package itself on prod, but I'll look into it

@ckohen
Contributor

ckohen commented Jan 6, 2023

For what it's worth, with the increasing number of times we've seen this, I decided to finally look into it a bit. In djs there shouldn't be anything getting in the way of the request firing, but I am implementing a separate request handler to handle specifically interaction callbacks. While in theory this won't change the external facing behavior of the request, it at least should streamline the process and make it a little easier to debug.

@devsnek
Contributor

devsnek commented Mar 7, 2023

Been a few months here so I'm assuming the behavior isn't being seen anymore.

@lumap

lumap commented Apr 6, 2023

still getting those to this day, notably from people with a bad internet connection

@davfsa
Contributor

davfsa commented Apr 6, 2023

still getting those to this day, notably from people with a bad internet connection

A bad connection doesn't seem to have any effect here all the time (please refer to my previous comment).

@muhitrhn

muhitrhn commented Apr 7, 2023

This is happening more often now. Around 20+ times per day. My bot is in 41 Servers with a total of 127k Users. Mainly happens on showModal for me and sometimes on ApplicationCommand. In case of showModal even though the error is thrown, the modal still gets sent to the end user. But in case of ApplicationCommand it just straight up errors the whole response.

@JustRoxy

JustRoxy commented Apr 30, 2023

It's happening not only in discord.js, but in discord.net as well.
The problem is inconsistently reproducible and it feels like Discord's servers are just throttling some defer requests and discarding them later with an interaction timeout.

@MockirY

MockirY commented Jun 12, 2023

This is happening more often now. Around 20+ times per day. My bot is in 41 servers with a total of 127k users. It mainly happens on showModal for me and sometimes on ApplicationCommand. In the case of showModal, even though the error is thrown, the modal still gets sent to the end user. But in the case of ApplicationCommand, the whole response fails outright.

Yeah, I tried to fix it by changing my hosting service, but it made no difference...

@HiroNxw

HiroNxw commented Oct 2, 2023

Is it fixed? I'm getting this error every time I click a button while I get a reply.

@ozgur3512

This just happens randomly and lasts 15-20 minutes then gone

@RecycleFix

I had a testbot using disnake with around 10 test users - and had been running for a couple of months. Ran into this issue a week ago and still happened today.
I then created a new bot, invited it to a guild/server, grabbed the new ID + token and placed into the code I had issues with.
This worked without any issues.
I'm using Buttons, Select Menu and Modals - and the error occurred when using the buttons.

@AlecM33

AlecM33 commented Mar 20, 2024

I can reliably reproduce this error for my bot. As another concluded in this thread, fundamentally it's not an issue with any specific library. Discord's API is giving you an HTTP 404 because it can't find the interaction for what is very likely a legitimate reason. In my case it comes down to Node.js processing capabilities and the interaction between two of my bot's commands, one deferred and one not.

I have a CPU-intensive command that constructs and attaches an image. At the high end this command can take 5 or 6 seconds. I defer this command, but due to suboptimal programming on my part the processing still blocks the event loop for the duration. If I send another one of my bot's commands--a performant one that is not deferred--Discord's API immediately starts the 3-second or so timer for my bot's reply to that one. If the event loop is blocked past that 3 second window, Discord cancels the interaction and thus it no longer exists. Some time after that, the event loop is unblocked and my bot attempts to reply to the now non-existent interaction, which results in HTTP 404 "unknown interaction".

Below is an example. I actually sent the RANDOM command a little after 16:20:18 GMT, but the code in my interaction handler didn't start executing until 16:20:22 GMT.

image

Personally I think for this issue to remain open, others need to provide more concrete data on why HTTP 404 is not warranted
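
The blocking scenario described above can be reproduced without Discord at all. The sketch below swaps Node.js for Python's asyncio, which has the same single-threaded event-loop property; `heavy_command`, `quick_command` and `measure_start_delay` are hypothetical names, and `time.sleep` stands in for the CPU-bound image work.

```python
import asyncio
import time


async def heavy_command(block_s: float) -> None:
    # Stand-in for CPU-bound work: time.sleep blocks the whole event loop,
    # just as a long synchronous computation would.
    time.sleep(block_s)


async def quick_command(arrived_at: float) -> float:
    # Returns how long this handler waited before it got to run.
    return time.monotonic() - arrived_at


async def measure_start_delay(block_s: float) -> float:
    heavy = asyncio.create_task(heavy_command(block_s))
    # The quick command "arrives" right after the heavy one was scheduled.
    quick = asyncio.create_task(quick_command(time.monotonic()))
    delay = await quick
    await heavy
    return delay


if __name__ == "__main__":
    delay = asyncio.run(measure_start_delay(0.5))
    print(f"quick command started {delay:.3f}s after it arrived")
```

The quick handler does no work of its own, yet its start is delayed by roughly the blocking time of the heavy one. With a block longer than the 3-second window, the reply would be sent to an interaction that has already expired.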

@Olzie-12

Olzie-12 commented Apr 9, 2024

I'm getting the same with JDA as well... What's weird is that the error is being thrown in my console, but the message did actually reach Discord in time.

@maxibue

maxibue commented May 4, 2024

Still getting the same error in d.js-14.14.1.

@marcustyphoon

I see this multiple times a week on a deferred request using https://github.com/Snazzah/slash-create (yet another completely different library), on a bot that's in one server and gets ~5 requests/day. Mine's hosted on Cloudflare Workers; is it possible that some global rate limit is getting hit on a hosting provider basis? (I don't know how I would investigate that.)

@milenakos

milenakos commented May 14, 2024

Happens to me regularly in both discord.py and nextcord. I have 70ms latency and usually respond to commands well within a second, so I think it's an API issue.

@dev-737

dev-737 commented May 19, 2024

This happens to me at random times (I use discord.js) when I try to defer replies, before which there is no other time consuming processing going on. I do have ~300ms of latency so I'm not sure if it's a spike causing it, but it happens far too often in a week.

@real2two

+1 I sometimes get random Unknown Interaction errors using discord.js and discordeno as well. I've gotten to the point where I use HTTP interactions as a workaround.

@SemiMute

Getting this issue on latest versions of JDA and Discord.JS, seems like something wrong with discord itself not any single library.

@marcustyphoon

marcustyphoon commented May 29, 2024

I modified my code to repeatedly resend the failed deferred request with a 1000ms delay, and I just had 4 404 Unknown Webhook failures logged followed by a success (with no other activity in that time period, nor any other activity that day).

So at least in my case, the webhook and interaction are definitely (eventually) valid; it's not a matter of a URL being incorrect or a timeout (or a reasonable minimum time between requests being required to prevent a serverside race condition, which clients could plausibly implement; 4 seconds is way too long for that). That, to me, makes it sound very much like it is in fact a Discord-side problem (re: AlecM33's comment).
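
A fixed-delay retry of the kind described above could look like the following. This is a minimal sketch, not the slash-create code actually used: `send_with_retries` and the `send` callable are hypothetical, and the choice to retry only on 404 is an assumption made for illustration.

```python
import time
from typing import Callable, List, Tuple


def send_with_retries(
    send: Callable[[], int],
    max_attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, List[int]]:
    """Call `send` until it returns a 2xx status or attempts run out.

    `send` is a placeholder for whatever performs the HTTP request and
    returns its status code. Only 404 is retried; any other failure stops
    immediately. Returns (succeeded, list of status codes seen).
    """
    statuses: List[int] = []
    for attempt in range(max_attempts):
        status = send()
        statuses.append(status)
        if 200 <= status < 300:
            return True, statuses
        if status != 404:
            break
        if attempt < max_attempts - 1:
            sleep(delay_s)  # fixed delay between attempts
    return False, statuses
```

Returning the full list of status codes keeps the failed attempts visible in logs, which is what makes a "four 404s, then a success" pattern observable in the first place.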

@ghost

ghost commented Jun 7, 2024

Using JDA, with commands that are otherwise instant (a /ping command), I get this error.

The weird part is that it's really random, but once it happens, it just won't go away. Neither the command itself nor my code seems to be the problem, since when it works, the interaction responds instantly. The device on which the code runs is good, and the network is fast.

I was not able to identify a pattern, unfortunately; hope it gets fixed soon

edit : the commands fail instantly, it does not wait 3 seconds, and when it happens, JDA doesn't get a SlashCommandInteractionEvent

edit (8th of june, 12:00) : removing the bot from a guild and re-adding it seems to have fixed it for now ? For the record, the interactions would fail no matter the guild, or if sent through dms, and the failure was persistent across restarts of the app

edit (10th of june, early morning) : correcting what I said two days ago, once the bug gets triggered, it will only do so in existing guilds, adding it to a new guild will make commands work, but only in dms (for members of that guild) and the guild

@AlecM33

AlecM33 commented Jun 11, 2024

@marcustyphoon I still think there are probably legitimate explanations for the scenarios where this occurs, but I of course won't claim that for sure since everyone's situation is different. Since I had a 100% reliable way to reproduce the problem, I thought it would be useful to provide my explanation and more or less challenge those here, since so much of the info here is anecdotal and difficult to act on from discord's perspective. It does sound like your scenario is simple and somewhat consistent, so perhaps you could provide a minimally reproducible code example. That would probably help this gain traction in the event a maintainer checks in on this.

Just personally, whenever I've run into this it's had a client-side explanation. In any case - I'm not just looking to dismiss people's troubles. Rather I hoped to facilitate since this has been open for some time.

@timotejroiko

timotejroiko commented Jul 9, 2024

I have been seeing this issue ever since i first implemented slash commands over a year ago but i have largely dismissed it as being caused by network lag and interactions that arrive too late, but now I'm convinced there is an actual issue going on and decided to investigate further, so here are my two cents.

My setup is as follows:

I have a website that receives interactions via webhook URL, hosted on a Hetzner vps located in Ashburn US, running nginx 1.25.4 with an upstream proxy to Node.js 22.4 which runs my own custom code.

Here is an example of the timings i observe multiple times per day:

Sample 1
interaction ID: 1260125566783193129
snowflake timestamp: 2024-07-09T06:49:07.122Z
timeline:

  1. ? - received by nginx (logs at the end of the request)
  2. 2024-07-09T06:49:07.173Z - received by node, defer response sent back to nginx
  3. 09/Jul/2024:06:49:07 - nginx logs request completed:
    09/Jul/2024:06:49:07 +0000 client=35.196.132.85 host=redacted path=/ request=POST / HTTP/1.1 status=200 request_length=2031 bytes_sent=183 body_bytes_sent=20 user_agent=Discord-Interactions/1.0 (+https://discord.com) upstream_status=200 request_time=0.001 upstream_response_time=0.001 upstream_connect_time=0.000 upstream_header_time=0.001
  4. 2024-07-09T06:49:07.173Z - command code runs
  5. 2024-07-09T06:49:07.729Z - command code ends and response initiates
  6. 2024-07-09T06:49:07.729Z - node http request created (http.request())
  7. 2024-07-09T06:49:07.729Z - node http stream write started (request.write())
  8. 2024-07-09T06:49:07.773Z - node http stream write ended (request.end())
  9. 2024-07-09T06:49:07.773Z - node http request emitted finish event
  10. 2024-07-09T06:49:10.199Z - node http emitted response event
  11. 2024-07-09T06:49:10.199Z - node http response emitted end event
  12. 2024-07-09T06:49:10.202Z - bot emitted error event status 404 "Unknown Webhook" "code: 10015"

Sample 2
interaction ID: 1260117594833293343
snowflake timestamp: 2024-07-09T06:17:26.461Z

  1. ? - received by nginx (logs at the end of the request)
  2. 2024-07-09T06:17:26.484Z - received by node, defer response sent back to nginx
  3. 09/Jul/2024:06:17:26 - nginx logs request completed:
    09/Jul/2024:06:17:26 +0000 client=35.237.4.214 host=redacted path=/ request=POST / HTTP/1.1 status=200 request_length=1951 bytes_sent=183 body_bytes_sent=20 user_agent=Discord-Interactions/1.0 (+https://discord.com) upstream_status=200 request_time=0.001 upstream_response_time=0.001 upstream_connect_time=0.000 upstream_header_time=0.001
  4. 2024-07-09T06:17:26.484Z - command code runs
  5. 2024-07-09T06:17:26.485Z - command code ends and response initiates
  6. 2024-07-09T06:17:26.485Z - node http request created (http.request())
  7. 2024-07-09T06:17:26.485Z - node http stream write started (request.write())
  8. 2024-07-09T06:17:26.485Z - node http stream write ended (request.end())
  9. 2024-07-09T06:17:26.485Z - node http request emitted finish event
  10. 2024-07-09T06:17:29.601Z - node http emitted response event
  11. 2024-07-09T06:17:29.601Z - node http response emitted end event
  12. 2024-07-09T06:17:29.603Z - bot emitted error event status 404 "Unknown Webhook" "code: 10015"

My conclusion:

Discord is somehow not acknowledging the defer from the interaction webhook response and then delaying the follow up request until it expires. I thought about the possibility that the follow up is sent too fast, before the response from nginx is received by discord, but it doesn't seem to be the case as the issue persists even when the command takes 500ms+ to run.

I hope this is useful in getting this resolved, i can provide more information and more tests if needed.

(edit: formatting + typos)

@ghost

ghost commented Aug 10, 2024

update :

After more experimentation, it turned out to be a client-side issue: for a reason I don't understand, reloading Discord fixed it. Since this error seems to be generic, this won't be useful for everyone, but it could help people who get this error in the same way I did

edit (11 of august) : some of the users reported the exact same issue to me, the issue does not seem to appear on mobile, only desktop

@milenakos

observation: you can solve the issue by deferring the response, this results in an unnecessary request (note: both defer and response happen in under a second, this is a workaround and not the intended use of defers)

@mayeradelman

observation: you can solve the issue by deferring the response, this results in an unnecessary request (note: both defer and response happen in under a second, this is a workaround and not the intended use of defers)

I've been experiencing this issue exclusively with commands that were already deferred.

@timotejroiko

timotejroiko commented Sep 7, 2024

update:

I was able to greatly reduce the number of errors by not responding to the webhook itself.

My setup now is as follows:

  1. [nginx] interaction webhook received via http, proxied to node
  2. [node] interaction webhook received via http
  3. [node] initial response sent via rest api callback
  4. [nginx] request terminated with code 499, meaning discord acknowledged the rest api callback and terminated the webhook on their side
  5. [node] reply/followup sent via rest api

This solution is only applicable when receiving interactions via webhook, but it seems to work well for now.
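
Step 3 above (sending the initial response through the REST API instead of the webhook's HTTP response) can be sketched as follows. This is not the author's Node.js code: `build_defer_callback` is a hypothetical helper written in Python, it only builds the request without sending it, and it assumes API version v10. The callback path and callback type 5 (deferred channel message) are taken from Discord's public API documentation.

```python
import json
import urllib.request

API_BASE = "https://discord.com/api/v10"  # API version is an assumption
DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE = 5  # interaction callback type for "defer"


def build_defer_callback(interaction_id: str, interaction_token: str) -> urllib.request.Request:
    """Build (but do not send) the REST call that acknowledges an interaction."""
    url = f"{API_BASE}/interactions/{interaction_id}/{interaction_token}/callback"
    body = json.dumps({"type": DEFERRED_CHANNEL_MESSAGE_WITH_SOURCE}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. The ID and token come from the interaction payload received in step 2.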
