Google pubsub 88% of requests come back as 503 - google-cloud-platform

Why do Pub/Sub requests seem to trigger such a high number of 503 errors? Is this common? Other people seem to see something similar, but the majority of my requests end up that way.
Similar to
Google Pubsub: UNAVAILABLE: The service was unable to fulfill your request
Catch error code from GCP pub/sub

This is expected behavior. Streaming pull, which the client libraries use, creates a bidirectional stream for receiving messages and sending back acknowledgements. These streams stay open for long periods and do not close with a successful response code when messages are received. Instead, they terminate with an error condition when the stream disconnects, perhaps because the server handling the request restarted or because of a brief network blip. Therefore, even if you are receiving messages successfully, you will still see error response codes for the streams themselves. The new streaming pull docs address this question directly.

Related

Cloud Run: 429: The request was aborted because there was no available instance

We (as a company) experience large spikes every day. We use Pub/Sub -> Cloud Run combination.
The issue we experience is that when high traffic hits, Pub/Sub tries to push messages to Cloud Run all at the same time without any flow control. The result?
429: The request was aborted because there was no available instance.
Although this is marked as a warning, every 4xx HTTP response causes the message to be redelivered.
Messages therefore go back to the queue and wait. If a message is retried while the instances are still busy, Cloud Run returns 429 again and the message is sent back to the queue. This repeats x times (depending on the value we set for Maximum delivery attempts). After that, the message goes to the dead-letter queue.
We want to avoid this and ideally not get any 429s, so that messages don't travel back and forth and don't end up in the dead-letter subscription. A 429 is not one of the application errors we want to keep there, but rather a warning caused by Pub/Sub not controlling the flow and not coordinating with Cloud Run.
Neither Pub/Sub nor a push subscription (which is required for Cloud Run) has any flow control feature.
Is there any way to control how many messages are sent to Cloud Run to avoid getting the 429 response? Also, why does Pub/Sub even try to deliver when it is obvious that Cloud Run has hit its instance limit? The best option would be to keep the messages in a queue until instances free up.
Most of the answers would probably suggest increasing the limit of instances. We already set 1000. This would not be scalable because even if we set the limit to 1500 and a huge spike comes, we would pass the limit and get the 429 messages again.
The only option I can think of is some form of flow control. So far, we have read about Cloud Tasks, but we are not sure whether it can help us. Ideally, we don't want to introduce any new service, but we will if necessary.
Thank you for all your tips and time! :)

Twilio throwing error as unreachable destination handset

We are using Twilio for sending messages, but because the Twilio (Text Messaging) integration was shutting down, we deployed the integration on Cloud Run by following the steps from https://github.com/GoogleCloudPlatform/dialogflow-integrations/tree/master/twilio#readme
After deployment, messages were sent successfully, but now we are suddenly getting errors in Twilio such as "Unreachable destination handset".
Some messages are sent successfully and for others we get the error. Can anybody help me with this? Thanks in advance.
According to the Twilio docs, there are several possible causes for "Unreachable destination handset":
1. The destination handset you are trying to reach is switched off or otherwise unavailable.
2. The device you are trying to reach does not have sufficient signal.
3. The device cannot receive SMS (for example, the phone number belongs to a landline).
4. There is an issue with the mobile carrier.
Possible Solutions
The first step to troubleshooting this issue is to attempt to replicate the problems.
Attempt to send another test message to this user via a REST API request, or through the API Explorer in the Twilio Console.

Google Cloud PubSub Message Delivered More than Once before reaching deadline acknowledgement time

Background:
We configured a Cloud Pub/Sub topic so that multiple App Engine services can interact with each other.
We configured push-based subscribers there, with the acknowledgement deadline set to 600 seconds.
Issue:
We have observed that Pub/Sub pushed the same message twice (more than twice for some other topics) to its subscribers. Looking at the logs, I can see that these pushes happened just 1 second apart. Since we have configured ackDeadline to 600 seconds, Pub/Sub should ideally re-attempt delivery only after 600 seconds.
Need following answers:
Why was the same message delivered more than once within 1 second?
Does Pub/Sub not honor the ackDeadline configuration before reattempting message delivery?
References:
- https://cloud.google.com/pubsub/docs/subscriber
Message redelivery can happen for a couple of reasons. First of all, it is possible that a message got published twice. Sometimes the publisher will get back an error like a deadline exceeded, meaning the publish took longer than anticipated. The message may or may not have actually been published in this situation. Often, the correct action is for the publisher to retry the publish and in fact that is what the Google-provided client libraries do by default. Consequently, there may be two copies of the message that were successfully published, even though the client only got confirmation for one of them.
Secondly, Google Cloud Pub/Sub guarantees at-least-once delivery. This means that occasionally, messages can be redelivered, even if the ackDeadline has not yet passed or an ack was sent back to the service. Acknowledgements are best effort and most of the time, they are successfully processed by the service. However, due to network glitches, server restarts, and other regular occurrences of that nature, sometimes the acknowledgements sent by the subscriber will not be processed, resulting in message redelivery.
A subscriber should be designed to be resilient to these occasional redeliveries, generally by ensuring that operations are idempotent, i.e., that the results of processing the message multiple times are the same, or by tracking and catching duplicates. Alternatively, one can use Cloud Dataflow as a subscriber to remove duplicates.
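As a rough illustration of the "tracking and catching duplicates" option, here is a minimal sketch in plain Python. It is not the Pub/Sub client library's API: `DedupingHandler`, `handle`, and the in-memory set are hypothetical names made up for this example, and a real push endpoint would need a shared, persistent store (for example a database table keyed by message ID) rather than process memory. Note also that this only catches redeliveries of the same message. Duplicates created by a publisher retry will normally carry different message IDs, so catching those would require an application-level key inside the payload.

```python
import threading


class DedupingHandler:
    """Wraps a processing function so each message ID is handled at most once.

    Hypothetical helper for illustration; state lives in memory only and is
    lost on restart.
    """

    def __init__(self, process):
        self._process = process        # business logic supplied by the caller
        self._seen = set()             # message IDs already handled
        self._lock = threading.Lock()  # push handlers may run concurrently

    def handle(self, message_id, data):
        """Return True if the message was processed, False if it was a duplicate."""
        with self._lock:
            if message_id in self._seen:
                return False
            self._seen.add(message_id)
        try:
            self._process(data)
        except Exception:
            # Forget the ID so that a later redelivery can retry the work.
            with self._lock:
                self._seen.discard(message_id)
            raise
        return True


processed = []
handler = DedupingHandler(processed.append)
first = handler.handle("msg-1", b"hello")
second = handler.handle("msg-1", b"hello")  # simulated redelivery
```

In both cases the endpoint should still acknowledge the message (return a success status for a push subscription), otherwise the duplicate will simply be delivered again.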

Google Places API error 502 - The server encountered a temporary error

We run a website that obtains location data through the Google Places API. We have 150k daily searches available, which we haven't reached yet as the website has been live for only a few weeks. We have suddenly received a 502 error. A notification in the Console says: "The server encountered a temporary error and could not complete your request." Is this a temporary error? Are there any suggestions on what we can do? The website hasn't been available for 40 minutes.
When you receive a 5xx status or UNKNOWN_ERROR in the response, you should implement retry logic. Google has the following recommendation in their web services documentation:
In rare cases something may go wrong serving your request; you may receive a 4XX or 5XX HTTP response code, or the TCP connection may simply fail somewhere between your client and Google's server. Often it is worthwhile re-trying the request as the followup request may succeed when the original failed. However, it is important not to simply loop repeatedly making requests to Google's servers. This looping behavior can overload the network between your client and Google causing problems for many parties.
A better approach is to retry with increasing delays between attempts. Usually the delay is increased by a multiplicative factor with each attempt, an approach known as Exponential Backoff.
https://developers.google.com/maps/documentation/directions/web-service-best-practices#exponential-backoff
However, if retrying with exponential backoff doesn't help and the error persists for a long time, you should file a bug in the Google issue tracker.
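To make the retry recommendation above concrete, here is a minimal sketch of exponential backoff in plain Python. Nothing here comes from a Google client library: `TransientError`, `retry_with_backoff`, and all the default values are assumptions chosen for illustration, and `flaky` merely simulates a call that returns 502 twice before succeeding. The random jitter is an addition of mine that is commonly paired with backoff so that many clients don't retry in lockstep.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (5xx or UNKNOWN_ERROR)."""


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, factor=2.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call `call()` until it succeeds, waiting longer after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the last error
            # Wait between 50% and 100% of the current delay (jitter).
            sleep(min(delay, max_delay) * random.uniform(0.5, 1.0))
            delay *= factor  # multiplicative increase for the next attempt


attempts = {"count": 0}


def flaky():
    """Simulated request: fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("502")
    return "OK"


waits = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky, sleep=waits.append)
```

In real code you would leave `sleep` at its default and raise `TransientError` (or your HTTP library's equivalent) only for responses that are worth retrying.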
I hope this addresses your doubt!
UPDATE
There was an issue on Google's side yesterday (November 6, 2017). You can refer to the following bug, which explains the issue:
https://issuetracker.google.com/issues/68938173

Is there any possibility that Amazon S3 will notify of a completed upload that actually failed?

Our application will depend upon uploads of fairly large files to an S3 Bucket via 3rd party apps like S3CMD (command line) and S3 Browser free version (GUI) for Windows from many locations around the world -- some with very shaky and slow internet connections. It is highly likely that packets may get lost and internet may cut out unexpectedly.
The S3 Bucket will be configured to send notifications to an SNS Topic which will forward the message to our application rest endpoint, using an XML file inside the notifications subresource, following the instructions in the Documentation here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
1) Is there any remote chance that a failed or incomplete upload will be reported as complete?
The notifications that can be sent to SNS seem to be fairly limited and there appears to be no method for conveying errors. The error notification documentation for S3 seems to be directed at someone implementing and handling their own upload mechanisms.
http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
2) Am I correct that there is no way to set up forwarding of such error messages to SNS?
The SDKs provide their own mechanisms for catching errors when an upload fails.
A failed upload (access denied, content-length mismatch, connection timeout, content-md5 mismatch, multipart never completed nor aborted, or any other reason) will not trigger a notification. There's not a way to generate events from failed uploads -- S3 wouldn't necessarily even be aware of the failure, depending on the cause of the failure.
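Since only successful uploads produce a notification, the receiving endpoint can treat each one as a completed object. Below is a minimal sketch of unpacking such a notification in plain Python. It assumes the SNS HTTP delivery format (a JSON envelope whose `Message` field is itself a JSON string) and the S3 event record layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) as I understand them, so check both against the current AWS documentation. `completed_objects` and the sample payload are made up for illustration.

```python
import json
from urllib.parse import unquote_plus


def completed_objects(sns_body):
    """Extract bucket, key and size for each created object in an SNS-wrapped S3 event."""
    envelope = json.loads(sns_body)
    event = json.loads(envelope["Message"])  # S3 event is a JSON string inside SNS
    results = []
    for record in event.get("Records", []):  # test events carry no Records
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        results.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in the notification.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size"),
        })
    return results


# Hypothetical sample payload, trimmed to the fields used above.
sample = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({"Records": [{
        "eventName": "ObjectCreated:CompleteMultipartUpload",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "uploads/big+file.bin", "size": 1048576},
        },
    }]}),
})
objects = completed_objects(sample)
```

If you need extra assurance, the reported size can be compared with what the uploader expected, but that check would be an application-level choice rather than something S3 requires.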