Resolving intermittent 502 Bad Gateway with CloudFront pulling from S3 - amazon-web-services

We have an AWS production setup with pieces including EC2, S3, and CloudFront (among others). The website on EC2 generates XML feeds that include a number of images for each item (over 300k images in total). The XML feed is consumed by a third party, which processes the feed and downloads any new images. All image links point to CloudFront with an S3 bucket as its origin.
Reviewing the third party's logs, many of those images are downloaded successfully. But many images still fail: they get 502 Bad Gateway responses. Looking at the CloudFront logs, all I'm seeing is OriginError, with no indication of what's causing the error. Most discussions I've found about CloudFront 502 errors point to SSL issues and involve people getting a 502 on every request. SSL isn't a factor here, and most requests process successfully, so this is an intermittent issue - and I haven't been able to replicate it manually.
I suspect something like S3 rate limiting, but even with that many images, I don't think the third party is grabbing them anywhere near fast enough to trigger throttling. I could be wrong, though. Either way, I can't figure out what's causing the issue - and therefore can't figure out how to fix it - since I'm not getting a more specific error from S3/CloudFront. Below is one row from the CloudFront log, broken down by field.
log file: ABC.2021-10-21-21.ABC
date: 2021-10-21
time: 21:09:47
x-edge-location: DFW53-C1
sc-bytes: 508
c-ip: ABCIP
cs-method: GET
cs(Host): ABC.cloudfront.net
cs-uri-stem: /ABC.jpg
sc-status: 502
cs(Referer): -
cs(User-Agent): ABCUA
cs-uri-query: -
cs(Cookie): -
x-edge-result-type: Error
x-edge-request-id: ABCID
x-host-header: ABC.cloudfront.net
cs-protocol: https
cs-bytes: 294
time-taken: 4.045
x-forwarded-for: -
ssl-protocol: TLSv1.2
ssl-cipher: ECDHE-RSA-AES128-GCM-SHA256
x-edge-response-result-type: Error
cs-protocol-version: HTTP/1.1
fle-status: -
fle-encrypted-fields: -
c-port: 11009
time-to-first-byte: 4.045
x-edge-detailed-result-type: OriginError
sc-content-type: application/json
sc-content-len: 36
sc-range-start: -
sc-range-end: -
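Since x-edge-detailed-result-type is the only real signal in that row, one way to narrow this down is to tally result types across all the failing requests and see whether the 502s fall into a single error bucket. A minimal sketch, assuming the standard gzip-compressed CloudFront access logs have already been synced locally (bucket and directory names are placeholders):

```python
# Tally CloudFront result types for 502 responses across downloaded access-log
# files, e.g. after `aws s3 sync s3://my-log-bucket/cloudfront/ ./cf-logs/`
# (bucket name is a placeholder).
import glob
import gzip
from collections import Counter

status_by_result = Counter()

for path in glob.glob("./cf-logs/*.gz"):
    with gzip.open(path, "rt") as fh:
        fields = []
        for line in fh:
            if line.startswith("#Fields:"):
                fields = line.split()[1:]          # column names from the log header
                continue
            if line.startswith("#") or not fields:
                continue
            row = dict(zip(fields, line.rstrip("\n").split("\t")))
            if row.get("sc-status") == "502":
                key = (row.get("x-edge-result-type"),
                       row.get("x-edge-detailed-result-type"))
                status_by_result[key] += 1

for key, count in status_by_result.most_common():
    print(key, count)
```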

Related

Do Amazon CloudFront or Azure CDN support dynamic compression for HTTP range requests?

AWS CloudFront and Azure CDN can dynamically compress files under certain circumstances. But do they also support dynamic compression for HTTP range requests?
I couldn't find any hints in either documentation, only in the Google Cloud Storage docs.
Azure:
Range requests may be compressed into different sizes. Azure Front Door requires the content-length values to be the same for any GET HTTP request. If clients send byte range requests with the accept-encoding header that leads to the Origin responding with different content lengths, then Azure Front Door will return a 503 error. You can either disable compression on Origin/Azure Front Door or create a Rules Set rule to remove accept-encoding from the request for byte range requests.
See: https://learn.microsoft.com/en-us/azure/frontdoor/standard-premium/how-to-compression
AWS:
HTTP status code of the response
CloudFront compresses objects only when the HTTP status code of the response is 200, 403, or 404.
--> a range request response has status code 206 (so it would not be compressed)
See:
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/206
• Yes, Azure CDN also supports dynamic compression for HTTP range requests; there it is handled through 'object chunking'. Object chunking means dividing the file to be retrieved from the origin server into smaller chunks of 8 MB. When a large file is requested, the CDN retrieves smaller pieces of the file from the origin: after the CDN POP server receives a full or byte-range file request, the CDN edge server requests the file from the origin in chunks of 8 MB.
• After the chunk arrives at the CDN edge, it's cached and immediately served to the user. The CDN then prefetches the next chunk in parallel. This prefetch ensures that the content stays one chunk ahead of the user, which reduces latency. This process continues until the entire file is downloaded (if requested), all byte ranges are available (if requested), or the client terminates the connection.
Also, object chunking relies on the origin server's ability to support byte-range requests; if the origin server doesn't support byte-range requests, requests to download data larger than 8 MB will fail.
See the following link for more details on object chunking:
https://learn.microsoft.com/en-us/azure/cdn/cdn-large-file-optimization#object-chunking
Also see the following link for more detail on the types of compression supported for Azure CDN profiles:
https://learn.microsoft.com/en-us/azure/cdn/cdn-improve-performance#azure-cdn-standard-from-microsoft-profiles
Some tests have shown that when dynamic compression is enabled in AWS CloudFront, range support is disabled: the Range and If-Range headers are removed from all requests.
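One way to check this behavior against your own distribution is to send a byte-range request with and without gzip negotiation and compare the responses: a 206 with a Content-Range header means the range was honored, while a 200 with the full body means it was ignored. A rough client-side probe (the URL is a placeholder for an object behind your own distribution):

```python
# Probe whether the CDN honors a Range request, and whether that changes when
# the client also advertises gzip support. Sketch only; the URL is a placeholder.
import requests

URL = "https://dxxxxxxxxxxxx.cloudfront.net/some-large-file.json"  # placeholder

for accept_encoding in ("identity", "gzip"):
    resp = requests.get(
        URL,
        headers={"Range": "bytes=0-1023", "Accept-Encoding": accept_encoding},
    )
    print(
        "Accept-Encoding:", accept_encoding,
        "| status:", resp.status_code,          # 206 = range honored, 200 = range ignored
        "| Content-Range:", resp.headers.get("Content-Range"),
        "| Content-Encoding:", resp.headers.get("Content-Encoding"),
    )
```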

Media Tailor ad returning 504 error in AWS

I'm using AWS MediaTailor to test an ad insertion demo. The demo page is this one: https://github.com/aws-samples/aws-media-services-simple-vod-workflow/tree/master/12-AdMarkerInsertion.
When I place my manifest into TheoPlayer I always get a 504 error. My manifest page is: https://ebf348c58b834d189af82777f4f742a6.mediatailor.us-west-2.amazonaws.com/v1/master/3c879a81c14534e13d0b39aac4479d6d57e7c462/MyTestCampaign/llama.m3u8.
I have also tried with: https://ebf348c58b834d189af82777f4f742a6.mediatailor.us-west-2.amazonaws.com/v1/master/3c879a81c14534e13d0b39aac4479d6d57e7c462/MyTestCampaign/llama_with_slates.m3u8.
The specific error is:
{"message":"failed to generate manifest: Unable to obtain template playlist. sessionId:[c915d529-3527-4e37-89e0-087e393e75de]"}
I have read about this error: https://docs.aws.amazon.com/mediatailor/latest/ug/playback-errors-examples.html
But don't know how to fix it.
Maybe I did something wrong, or do I need a quota increase in AWS?
Any idea?
Thanks for the inquiry!
The following example shows the result when a timeout occurs between AWS Elemental MediaTailor and either the ad decision server (ADS) or the origin server.
An HTTP 504 error is known as a Gateway Timeout, meaning that a resource was unresponsive and prevented the request from completing successfully. In this case, since MediaTailor is returning an HTTP 504, either the ADS or the origin failed to respond within the timeout period.
To troubleshoot this, you will need to determine which dependency is failing to respond to MediaTailor and correct it. Typically the issue is the ADS failing to respond to a VAST request performed by MediaTailor, which you can confirm by reviewing your CloudWatch logs.
https://docs.aws.amazon.com/mediatailor/latest/ug/monitor-cloudwatch-ads-logs.html
Make sure that your ADS follows the guidelines listed below for integrating with MediaTailor.
https://docs.aws.amazon.com/mediatailor/latest/ug/vast-integration.html
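If it helps, the ADS interaction events can be pulled out of CloudWatch Logs and filtered by the sessionId from the error message. A minimal sketch, assuming the log group name used for ADS interaction logs matches the one described in the CloudWatch docs linked above (adjust the region and session id for your setup):

```python
# Fetch MediaTailor ADS interaction events for the failing session from
# CloudWatch Logs, to see whether the ADS or the origin timed out.
# The log group name is an assumption based on the MediaTailor docs.
import boto3

logs = boto3.client("logs", region_name="us-west-2")
session_id = "c915d529-3527-4e37-89e0-087e393e75de"   # from the error message

resp = logs.filter_log_events(
    logGroupName="MediaTailor/AdDecisionServerInteractions",
    filterPattern=f'"{session_id}"',                   # match events for this session
)
for event in resp.get("events", []):
    print(event["timestamp"], event["message"])
```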

Boto3 Upload Issues

I have a very strange issue with uploading to S3 from Boto. In our (Elastic Beanstalk-)deployed instances, we have no problems uploading to S3, and other developers with the same S3 credentials also have no issues. However, when testing locally using the same Dockerfile, I can upload files up to exactly 1391 bytes, but anything 1392 bytes and above just gives me a connection that times out and retries a few times.
2018-03-27 18:14:34 botocore.vendored.requests.packages.urllib3.connectionpool INFO Starting new HTTPS connection (1): xxx.s3.amazonaws.com
2018-03-27 18:14:34 botocore.vendored.requests.packages.urllib3.connectionpool INFO Starting new HTTPS connection (1): xxx.s3.xxx.amazonaws.com
2018-03-27 18:15:14 botocore.vendored.requests.packages.urllib3.connectionpool INFO Resetting dropped connection: xxx.s3.xxx.amazonaws.com
I've tried this with every variant of uploading to S3 from Boto, including boto3.resource('s3').meta.client.upload_file, boto3.resource('s3').meta.client.upload_fileobj, and boto3.resource('s3').Bucket('xxx').put_object.
Any ideas what could be wrong here?
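One way to get more detail than the INFO-level retry messages is to turn on botocore's debug logging and shorten the timeouts so the hang surfaces quickly with wire-level detail. A minimal sketch; the bucket, key, and test file names are placeholders:

```python
# Reproduce the upload with botocore debug logging and short timeouts so the
# failure shows up quickly with full request/response detail.
import logging

import boto3
from botocore.config import Config

boto3.set_stream_logger("botocore", logging.DEBUG)   # wire-level logging of every request/retry

s3 = boto3.client(
    "s3",
    config=Config(connect_timeout=5, read_timeout=10, retries={"max_attempts": 1}),
)
# Placeholder test file (just over the 1391-byte threshold), bucket, and key.
s3.upload_file("test_1392_bytes.bin", "xxx", "debug/test_1392_bytes.bin")
```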

Error: EntityTooLarge Status 400 AmazonAWS

I'm trying to upload a video from a Cordova app to an Amazon AWS S3 bucket from an Android/iPhone. But it sometimes fails, giving sporadic reports of this error from the AWS bucket:
http_status:400,
<Code>EntityTooLarge</Code>
Some of the files are tiny, some around 300 MB or so.
What can I do to resolve this at the AWS end?
The 400 Bad Request error is sometimes used by S3 to indicate conditions that make the request in some sense invalid -- not just syntactically invalid, which is the traditional sense of 400 errors.
EntityTooLarge
Your proposed upload exceeds the maximum allowed object size.
400 Bad Request
http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
Note the word "proposed." This appears to be a reaction to the Content-Length request header you are sending. You may want to examine that. Perhaps the header is inconsistent with the actual size of the file, or the file is being detected as larger than it actually is.
Note that while the maximum object size in S3 is 5 TiB, the maximum upload size is 5 GiB. (Objects larger than 5 GiB have to be uploaded in multiple parts.)
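For what it's worth, the high-level transfer helpers in the AWS SDKs handle the multipart split automatically once a file crosses a size threshold. A boto3 sketch of the idea (bucket and file names are placeholders; the original question uses the Cordova/JS SDK, where the managed upload class plays a similar role):

```python
# Illustration of the 5 GiB single-PUT limit vs. multipart upload: boto3's
# transfer manager switches to multipart automatically above the threshold.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # use multipart for anything over 64 MB
    multipart_chunksize=64 * 1024 * 1024,
)

s3 = boto3.client("s3")
# Bucket and file names are placeholders.
s3.upload_file("big_video.mp4", "my-bucket", "videos/big_video.mp4", Config=config)
```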
413 errors occur when the request body is larger than the server is configured to allow. I believe this is not an error that AWS S3 itself is throwing, because S3 supports objects up to 5 TB in size.
If you are first accepting this video on your own server and then making the request to Amazon S3 from there, your server may not be configured to accept large entities in a request.
Refer to -set-entity-size for different servers. If your server is not listed there, you will need to figure out how to increase the maximum entity size for your server.

pyzabbix requests.exceptions.HTTPError: 504 error using trigger.get method

I was using pyzabbix and trying to use the trigger.get method to get all trigger info, but it returned a 504 Gateway Timeout exception. This never happened before. When I get trigger info for a single host by specifying the host name with the filter keyword, it works well. I think it results from the growing number of hosts, which means a large number of triggers returned. I have about 1800 hosts so far. Any solutions to this problem?
While the Zabbix API has performance issues in general, and there are various PHP parameters to control timeouts such as max_execution_time, an HTTP 504 response code sounds suspicious. If you are using a proxy (transparent, reverse, etc.), check the timeouts there and consider hitting the Zabbix API directly.
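If the proxy timeout can't be raised, another option is to break the call up so no single request has to return the triggers for all ~1800 hosts at once. A rough pyzabbix sketch; the URL, credentials, and batch size are placeholders to tune for your environment:

```python
# Work around the gateway timeout by fetching triggers in batches of hosts
# instead of one huge trigger.get call.
from pyzabbix import ZabbixAPI

zapi = ZabbixAPI("https://zabbix.example.com")   # placeholder URL
zapi.login("api_user", "api_password")           # placeholder credentials

# Fetch all host IDs once, then pull triggers in manageable batches.
host_ids = [h["hostid"] for h in zapi.host.get(output=["hostid"])]

triggers = []
batch_size = 200   # tune so each call finishes well within the proxy timeout
for i in range(0, len(host_ids), batch_size):
    batch = host_ids[i:i + batch_size]
    triggers.extend(zapi.trigger.get(hostids=batch, output="extend"))

print(len(triggers), "triggers fetched")
```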