API google.pubsub.v1.Publisher exceeded 5000 milliseconds when running on Cloud Run #1442
Comments
FYI I experienced the same behavior running on Cloud Functions. Rolling back to version 2.12.0 unblocked me. |
Yeah we are experiencing issues on cloud run. |
I don't believe this is the issue in production. Most likely cloud run now kills the network access when the container is inactive (aka when the request has ended) which happens more often on low load. It used to work though, so I believe it might be a change in cloud run infrastructure. I am making sure that all the pubsub calls are awaited properly on my side. You should also set |
For context: I was referring to an issue with ports on the emulator. I have deleted the relevant comment because I also believe it's not what's happening here. Apologies for the confusion. |
Can confirm, we are facing the same issue on Cloud Functions with |
What's new about this case? I have the same problem |
We reverted to 2.17. I believe the bug was introduced in 2.18. |
@Sytten Just checking: the issue does not reoccur with 2.17, only with 2.18? Which minor version were you using? I do see a change that went out re: publish timeouts, https://github.com/googleapis/nodejs-pubsub/releases/tag/v2.18.0 and then updated in https://github.com/googleapis/nodejs-pubsub/releases/tag/v2.18.3. It looks like @feywind did some investigation in an earlier issue here: #1425 and it seems like there was an upstream change we're trying to figure out. Our current belief is that 2.18.3 fixes the issue, but please let me know if this isn't the case. |
Getting the same issue in google cloud function trigger. |
@meredithslota We had the problem with 2.18.3 too |
same here too, with 2.18.3 and 2.18.4 versions |
Tried this solution it did not work for my case, while using a background cloud function.
|
@apettiigrew Remove ^... use version only. "@google-cloud/pubsub": "2.17.0" |
It happened to me today as well after upgrading from 2.17.x. I reverted, but why did they introduce a 5s timeout? |
Same here, too. Running 2.18.4 with dockerized node application. |
Same problem here with 2.15.1 |
I had the same issue using Google Cloud Functions and "@google-cloud/pubsub": "^2.17.0". I was publishing successive messages probably the wrong way : |
Inside Cloud Run, I needed to disable batching (maxMessages = 1) and make sure the promise was awaited before the HTTP response was returned. Note: by default, Cloud Run does not allocate CPU after the request is finished (https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation) |
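A minimal sketch of the "publish first, respond second" ordering described above. It is not the commenter's code. `TopicLike` and `ResponseLike` are hypothetical stand-ins so the example runs on its own. In a real service the topic would presumably come from `new PubSub().topic(name, { batching: { maxMessages: 1 } })`, and the response object from your HTTP framework.

```typescript
// Hypothetical structural types; they mirror only the calls used below.
interface TopicLike {
  publishMessage(message: { data: Buffer }): Promise<string>;
}
interface ResponseLike {
  status(code: number): ResponseLike;
  send(body: string): void;
}

// Await the publish BEFORE sending the HTTP response, so the instance
// still has CPU allocated while the message is in flight.
async function publishThenRespond(
  topic: TopicLike,
  payload: unknown,
  res: ResponseLike
): Promise<void> {
  try {
    const data = Buffer.from(JSON.stringify(payload));
    const messageId = await topic.publishMessage({ data });
    res.status(200).send(messageId);
  } catch (err) {
    // Surface the failure to the caller instead of losing it after the response.
    res.status(500).send(`publish failed: ${(err as Error).message}`);
  }
}
```

The point of the ordering is that nothing Pub/Sub-related is left pending once `send` has been called.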
We are facing the same issue as well, using pub sub version 2.18.3, is downgrading the only solution? |
We are seeing this error: Received error while publishing: Total timeout of API google.pubsub.v1.Publisher exceeded 600000 milliseconds before any response was received. |
We only use Pub/Sub in Node.js for some tests locally. We use the Java SDK in production, where it works fine. I tried deploying it to gcloud just to see and didn't get the timeout there. I do not know why, but I think there is something wrong with the way we have implemented it in Node.js. Will do some more tests with some other ways of writing it today or tomorrow. |
Same here, although we are using version 2.18.1. So I guess downgrading needs to go at least below that, if it helps at all. We also opened (yet another) P3 support case at Google. They will refer it to the product team; let's hope this issue gets resolved quickly. |
Does anyone know why it would take over a minute to publish? |
If the queue has a lot of data to process, it will take more time. You can reproduce the issue by load testing with multiple requests. The issue will be fixed by increasing the timeout value to a max value of 540000 i.e. 540 secs. |
Something to consider - we noticed this issue arising from an HttpFunction where we were sending a response before publishing to PubSub. Having awaits after sending a response is contraindicated by google, and when we switched the order to publish first, the issue went away. |
Some point to consider about this issue, in case your application is not receiving consistent traffic/load, this error may affect the initial request(s)/action(s) that boot(s) up your cloud run instance(s). With subsequent requests/actions in the same timeframe, you'll notice that this error may not re-occur. I cannot explain why it takes that long to publish to pub/sub but increasing the timeout as in koushil-mankali's solution here will save you some headaches. |
@bharathjinka09 Did you find any solutions for your last error? I'm facing the same problem, and I have tried several solutions but haven't succeeded. Solutions I have tried:
|
Just to reiterate what I said back in June: While the symptom of these issues (publisher timeouts) is similar across the cases, it is highly likely that the underlying causes are different and so this GitHub issue now covers a lot of different cases where more user-specific information is needed. If you are still experiencing issues, please create a support case. Thanks! Increasing the gax timeout may be a viable solution in some instances, but it is not a general-purpose answer to how to address timeouts in all cases. We need specific information for each individual case in order to properly diagnose these issues and so a support case is the proper venue for that exchange of information. |
I solved this issue for me by having a single instance of the topic so you can change your code to something like this:
Give it a try. |
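One way the "single instance of the topic" idea might look, as a sketch rather than the commenter's original code. `createTopicCache` is a hypothetical helper; the wiring to the real client is shown only in comments and assumes the usual `new PubSub().topic(name)` call.

```typescript
// Hypothetical helper: build each topic object once per process and reuse it,
// instead of calling the factory again on every publish.
function createTopicCache<T>(factory: (name: string) => T): (name: string) => T {
  const cache = new Map<string, T>();
  return (name: string): T => {
    let topic = cache.get(name);
    if (topic === undefined) {
      topic = factory(name);
      cache.set(name, topic);
    }
    return topic;
  };
}

// Assumed wiring with the real client (not executed here):
//   import { PubSub } from "@google-cloud/pubsub";
//   const pubsub = new PubSub();
//   const getTopic = createTopicCache((name) => pubsub.topic(name));
//   await getTopic("my-topic").publishMessage({ data: Buffer.from("hello") });
```

Creating the cache at module scope means warm invocations of the same instance share one topic object and its publisher.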
I have solved this issue. The problem was with environment variables. Google Cloud Functions parses each variable added from the GUI console and adds escape sequences. It's strange that the Pub/Sub SDK does not return a proper error message; it always returns an API timeout, no matter how much I increase the timeout. Removing the escape sequences from the account's private key solves the problem. |
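A sketch of the kind of clean-up this describes. It assumes the damage is literal backslash-n sequences where the PEM key should have real newlines; the comment does not say exactly which escape sequences were added. `normalizePrivateKey` and the environment variable names in the comments are hypothetical.

```typescript
// Replace the two-character sequence "\" + "n" with a real newline, which is
// what a PEM-formatted private key is expected to contain.
function normalizePrivateKey(raw: string): string {
  return raw.replace(/\\n/g, "\n");
}

// Assumed wiring (variable names are illustrative, not from the thread):
//   import { PubSub } from "@google-cloud/pubsub";
//   const pubsub = new PubSub({
//     projectId: process.env.GCP_PROJECT_ID,
//     credentials: {
//       client_email: process.env.GCP_CLIENT_EMAIL,
//       private_key: normalizePrivateKey(process.env.GCP_PRIVATE_KEY ?? ""),
//     },
//   });
```

A key that already contains real newlines passes through unchanged, so the helper is safe to apply unconditionally.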
According to googleapis/nodejs-pubsub#1442 (comment) suggestion, add publishOptions to adjust timeout to 100000.
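For readers who cannot see the referenced comment, here is a sketch of what such publish options might look like. It assumes the timeout is passed through `gaxOpts.timeout` in the options given to `pubsub.topic()`. `PublishOptionsSketch` and `buildPublishOptions` are hypothetical names for illustration.

```typescript
// Hypothetical, trimmed-down shape of the publish options used here.
interface PublishOptionsSketch {
  gaxOpts?: { timeout?: number };
}

// Build publish options with a per-call timeout in milliseconds.
function buildPublishOptions(timeoutMs: number): PublishOptionsSketch {
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new RangeError("timeoutMs must be a positive number of milliseconds");
  }
  return { gaxOpts: { timeout: timeoutMs } };
}

const publishOptions = buildPublishOptions(100000);

// Assumed wiring with the real client (not executed here):
//   import { PubSub } from "@google-cloud/pubsub";
//   const topic = new PubSub().topic("my-topic", publishOptions);
```

As noted elsewhere in this thread, a larger timeout may hide the symptom without addressing the cause, so treat it as one option among several.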
Hi @ts-geek22, were you able to publish messages through the service at all? Or did the error occur after the application had been running for around 1-2 hours? |
It's a string parsing issue, so it occurs from the start, as a wrong string is always the wrong string. A timeout error is a general error in the case of cloud publishing, as it throws the same error for multiple reasons and does not provide any specific contextual information. This is a link to my Stack Overflow issue, where I have listed a few different solutions I have tried; feel free to give them a try; you might find one that works for you. |
I'm experiencing the same problem, and the solution that worked for me was to initiate Pub/Sub inside the function and then close the topic. The reason for this was apparently that we kept batching it, resulting in the cloud function closing before it had a chance to send it.
EDIT: Added max batching and timeout just due to this thread. Still need to test it. However, somebody changed it according to documentation a while ago and we started having that issue again just recently, so need to retest that. Just throwing this idea out there, maybe it will help somebody. Will try to give an update after more testing. In the mentions above, I noticed people talking about the importance of awaiting pubsub calls. In our use case we return a response from a server after we call a function that triggers pubsub that is not awaited, to pretty much avoid having pubsub fail before we get a chance to respond. However, we await within the called function. Example
We use Firebase Functions gen 2, which are pretty much Google Cloud Functions at this point, to my knowledge. In my head everything here is correct, and the processes initiated in initPubsubMessages, which has been awaited, should be processed without a hitch. But I'm also aware that Cloud Run has some quirks, so I thought I'd ask for somebody else's opinion on that. Would Firebase Functions/Cloud Run stop all processes as soon as the server responds, is my question I guess? |
@jdziek I don't know much about firebase functions, but Cloud function will unallocate memory and CPU as soon as you return response in HTTP cloud function. It assumes that your function's execution is complete as you have returned a response. |
The same issue about Cloud Run was already mentioned here before: #1442 (comment) |
Was worried that this might be the case. The thing is that this has been happening in places where we await as well. Will add an await just to be safe in this case, but doubt it will solve the issue fully :/ |
Was good for about two weeks. Checked we are awaiting everything. Put in tweaks mentioned above. Just had a pubsub error chain this morning. Really at a loss here. Was really hoping that it would work. Any suggestions at this point welcome. Might try to put pubsub back to how it was set up in the documentation; maybe this will work this time with proper awaiting. |
Are you referring to your configuration mentioned earlier?
|
Yeah. Noticed though that it's happening only in one service now. So currently reviewing the code there. Reverted the pubsub implementation to how it's shown in the docs and pushed. So hopefully nothing new will appear in two weeks |
Hey @jdziek are you using any other library to connect to any other Google API's? |
yeah, no. Just google-cloud ones for firestore, pubsub, and storage. EDIT. I guess also firebase modules. |
Update those packages to the latest versions; hopefully, that will fix your issue. We faced a similar problem with a few of our services; in our case, it was the Google Secret Manager package. Once we had bumped it to the latest version, that fixed this problem. I believe it's due to the gRPC library that most of the npm packages from Google use for communication with their APIs |
Good point. I only updated Pubsub really. Will review them now. Thanks |
Hey, I'm subscribing as well to this issue. I have the same error in all my cloud run services, everything was fine until a few weeks ago when we started having this issue. We batched to 1, we opened and closed the client for each message, but we're still having it. I'm trying to bump all google dependencies to see if it changes anything, let's keep each other posted! |
Hi, folks - I am publishing almost 100k messages to a topic. After publishing for a while, I started getting the "Error: Total timeout of API google.pubsub.v1.Publisher exceeded 600000 milliseconds before any response was received" error, and no matter how much I increase the timeout value, the error is still there; it just appears after waiting for whatever amount of time is specified as the timeout value. |
@vinay-panwar-04 Same i have the timeout increased on the function and can confirm the timeout on the function is larger than 60000ms and i can also see the acknowledgment deadline for the pubsub subscription is also greater than 60000ms but yet i still see this error
I don't see a way to set the |
Hey @jdziek, did updating the packages fix the issue for you? |
Hey all, We had this timeout issue happening randomly on our Cloud Run instances, that could be mitigated by deploying a new revision. I haven't investigated the root cause, but instinctively it looks like when the Cloud Run instance runs for too long, the connection to pubsub stops working, maybe for something related to auth or long-term gax connections. We finally managed to fix the issue. We needed to upgrade all google packages, including If (like us) you're using yarn v1, here's the list of commands that can help you:
Hope it helped, good luck! |
@Kripu77 So far so good. We haven't had issues. Yet. The problem has been so erratic that I can't say for sure yet, but it seems at least like it's improved. |
With this new configuration the timeout errors seem to have disappeared completely. There were no occurrences in the last 2 weeks:
Previously, I did increase the timeout of the cloud function itself, which did not help. |
I had this issue as well when publishing messages from a V2 node cloud function and I found a solution which resolved my problem. Increasing the timeout as @bkovari recommended prevented the timeout error from being repeated, but there are a few other things to keep in mind when firing large volumes of messages from cloud functions. I found that batching messages in large numbers led to the node PubSub client hanging in the Cloud Function's execution and not delivering anything. I resolved this by limiting the batch size to one. This may not be ideal for your use case, but for the vast majority it will be fine:
The next thing I can recommend if you are working with JS or TS is resolving the promise from the message send immediately before moving on to your next message send. Allowing numerous unresolved message send promises to build up seems to result in the PubSub client hanging and quietly failing. Here is the TS code that worked for me:
export const pubsubService = { // initialize pubsub client with token
interface Attributes { /**
// Add two custom attributes, origin and username, to the message
const dataBuffer = Buffer.from(data);
In your cloud function you use the function like so and you should have no more (or at the very least fewer) problems with message delivery:
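The original code did not survive in this copy of the thread. As a rough sketch of the approach described (a batch size of one, and each send awaited before the next begins), something like the following could work. `TopicLike` and `publishSequentially` are hypothetical names, and the real topic is assumed to be created with `batching: { maxMessages: 1 }`.

```typescript
// Hypothetical structural type; it mirrors only the call used below.
interface TopicLike {
  publishMessage(message: {
    data: Buffer;
    attributes?: Record<string, string>;
  }): Promise<string>;
}

// Publish one message at a time: each send is resolved before the next one
// starts, so unresolved publish promises never pile up.
async function publishSequentially(
  topic: TopicLike,
  payloads: string[],
  attributes?: Record<string, string>
): Promise<string[]> {
  const messageIds: string[] = [];
  for (const payload of payloads) {
    const data = Buffer.from(payload);
    messageIds.push(await topic.publishMessage({ data, attributes }));
  }
  return messageIds;
}

// Assumed wiring with the real client (not executed here):
//   import { PubSub } from "@google-cloud/pubsub";
//   const topic = new PubSub().topic("my-topic", { batching: { maxMessages: 1 } });
//   await publishSequentially(topic, ["a", "b"], { origin: "example" });
```

This trades throughput for predictability, which matches the caveat above that a batch size of one may not suit every workload.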
|
When the image is run locally on Docker the application is able to successfully publish messages to Cloud PubSub.
When the same image is deployed to Cloud Run, it won't publish a single message. All attempts fail with the error:
Sample Code:
The code hangs after logging this line:
console.info('Worker reminder-report-vitals', 'event payload', JSON.stringify(payload))
Sample Event Payload:
Environment details
@google-cloud/pubsub version: 2.18.4
Steps to reproduce