Does AWS Application Load Balancer still suffer HOL blocking with clients?

We recently switched to Amazon's new ALBs (Application Load Balancers) and were excited to get the benefits of HTTP/2 multiplexing. But it appears that there is still a HOL (head-of-line) blocking issue going on.
The client is able to request the images in parallel but still has to wait for a period of time before it can begin downloading them. My guess is that AWS's Application Load Balancer terminates HTTP/2 and then talks to the EC2 instances via HTTP/1.1, creating the HOL delay.
I may be reading this chart wrong; if so, could someone please explain why content isn't being downloaded faster? In other words, I would expect the green portion to be smaller and the blue bar to appear sooner. Does networking between the load balancer and the EC2 instance not suffer from HOL? Is there some magic stuff happening?

I believe that what you see might be related to how current Chrome (around v54) prioritizes requests.
Exclusive dependencies
Chrome uses HTTP/2's exclusive dependencies (that can be dynamically re-arranged as the page is parsed and new resources are discovered) in order to have all streams depend strictly on one another. That means that all resources are effectively sent one after the other. I wrote a small utility that parses the output of chrome://net-internals/#http2 in order to show the tree for a given page: https://github.com/deweerdt/h2priograph.
A dependency from one stream to another can either be exclusive or not. Section 5.3.1 of RFC 7540 covers that part: https://www.rfc-editor.org/rfc/rfc7540#section-5.3.1, here's the example they give when adding a new stream D to A:
Not exclusive:

    A                 A
   / \      ==>      /|\
  B   C             B D C

Exclusive:

    A                 A
   / \                |
  B   C     ==>       D
                      / \
                     B   C
Example
Let's take this page for example
Chrome only uses the exclusive dependencies, so the dependency tree might at first look like:
HTML
|
PNG
|
PNG
When the javascript is discovered, Chrome reprioritizes it above the images (because the javascript is more likely to affect how the page is rendered), so Chrome puts the JS right below the HTML:
HTML
|
JS
|
PNG
|
PNG
Because all the requests are daisy-chained, it might look, as in the waterfall you posted, as though requests are executed one after another like in HTTP/1, but they're not: they can be re-ordered on the fly, and all the requests are sent to the server as soon as possible.
Firefox on the other hand will have resources of the same type share the same priority so you should be able to see some interlacing there.
Please note that what Chrome does isn't equivalent to what used to happen with HTTP/1 since all the requests have been sent to the server, so the server always has something to be sent on the wire. In addition to that, ordering is dynamic, so if a higher priority resource is discovered in the page, Chrome will re-order the priority tree so that this new resource takes precedence over existing lower priority resources.
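To make the exclusive-dependency mechanics concrete, here is a minimal sketch (not Chrome's actual code; the class and method names are made up) of how an exclusive insertion re-parents existing children, which is also what happens when the newly discovered JS is moved directly below the HTML:

class Stream:
    """Toy model of an HTTP/2 priority-tree node (RFC 7540, section 5.3.1)."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child, exclusive=False):
        if exclusive:
            # The new stream adopts all of the parent's current children
            # and becomes the parent's only child.
            child.children.extend(self.children)
            self.children = [child]
        else:
            self.children.append(child)

    def dump(self, depth=0):
        print("  " * depth + self.name)
        for child in self.children:
            child.dump(depth + 1)

html, png1, png2, js = Stream("HTML"), Stream("PNG"), Stream("PNG"), Stream("JS")
html.add_child(png1)
png1.add_child(png2)
# JS is discovered later and inserted exclusively under HTML, pushing the
# PNG chain below it: HTML -> JS -> PNG -> PNG.
html.add_child(js, exclusive=True)
html.dump()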

Might be super late for an answer, but still.
AWS ELB documentation clearly states that:
The backend connection (between ALB and target) supports HTTP/1.1 only
The backend connection does not support HTTP/1.1 pipelining (which might have somewhat replicated HTTP/2 multiplexing).
So essentially what you observe are multiplexed requests on the frontend connection that are processed one by one by the ALB, each waiting for the backend reply before the next frontend request is processed and routed to a target.
You can refer to the "HTTP connections" section of the documentation.
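For completeness, here is a minimal sketch (the port and handler name are arbitrary) of a target that prints the protocol version it sees on the backend connection; if the above is accurate for your setup, requests that arrived at the ALB over HTTP/2 should still show up here as HTTP/1.1:

from http.server import BaseHTTPRequestHandler, HTTPServer

class ProtocolEcho(BaseHTTPRequestHandler):
    def do_GET(self):
        # request_version is the protocol spoken on this backend connection,
        # e.g. "HTTP/1.1" even when the client spoke HTTP/2 to the ALB.
        print("backend connection protocol:", self.request_version)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ProtocolEcho).serve_forever()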

Related

clear old AWS MediaPackage content from Live Stream

I am using AWS MediaLive and MediaPackage to deliver an HLS livestream.
However, when the stream ends there is always one minute of content still available in the .m3u8 playlist.
The setting "Startover window (sec.): 0" does not seem to solve this.
Deleting and creating a new .m3u8 playlist would be very inconvenient because all players would have to be updated.
Does anyone have any advice?
Cheers, Richy
Thanks for your post. If I understand correctly, you are referring to the MediaPackage endpoint, which serves up a manifest with the last known segments (60 seconds' worth of segments by default).
There are several ways to alter or stop this behavior. I suggest testing some of these methods to see which you prefer:
[a] Delete the public-facing MediaPackage endpoint shortly (perhaps 10s) after your event ends. All subsequent requests to that endpoint will return an error. Segments already retrieved and cached by the player will not be affected, but no new data will be served. Note: you may also maintain a private endpoint on the same Channel to allow for viewing + harvesting of the streamed content if you wish.
[b] Use an AWS CloudFront CDN Distribution with a short Time to Live (TTL) in front of your MediaPackage Channel (which acts as the origin) to deliver content segments to your viewers. When the event ends, you can immediately disable or delete this CDN Distribution, and all requests for content segments will return an error. Segments already retrieved and cached by the player will not be affected, but no new data will be served from this distribution.
[c] Encrypt the content using MediaPackage encryption, then disable the keys at the end of the event. The same approach applies to CDN Authorization headers, which you can mandate for event playback and then delete after the event completes.
[e] Use DNS redirection to your MediaPackage endpoint. When the event ends, remove the DNS redirector so that any calls to the old domain will fail.
I think one or a combination of these methods will work for you. Good Luck!
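For option [a], a minimal boto3 sketch (the endpoint id is a placeholder; you could run this from a script or a scheduled Lambda shortly after the event ends):

import boto3

mediapackage = boto3.client("mediapackage")

def tear_down_public_endpoint(endpoint_id="my-public-endpoint"):
    # After this call, playback requests to the endpoint URL return errors;
    # a private endpoint on the same channel can stay up for harvesting.
    mediapackage.delete_origin_endpoint(Id=endpoint_id)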

Optimization and load balancing of a microservice-based backend

I have a client with a pretty popular ticket-selling service, to the point that the microservice-based backend is struggling to keep up, and I need to come up with a solution to optimize and load-balance the system. The infrastructure works through a series of interconnected microservices.
When a user enters the sales channels (mobile or web app), the request is directed to an AWS API Gateway, which is in charge of orchestrating the communication towards the microservice responsible for obtaining the requested resources.
These resources are provided by a third-party API.
This third party has physical servers in each venue in charge of synchronizing the information between the POS systems and the digital sales channels.
We have a Redis instance in charge of caching the requests that we make to the third-party API; we cache each endpoint with a TTL relative to how frequently the information is updated.
Here is some background info:
We get traffic mostly from 2 major countries
On a normal day, about 100 thousand users will use the service, with a 70%/30% traffic split between the two countries.
On important days, each country has different opening hours (country A starts sales at 10 am UTC, but country B starts at 5 pm UTC); on those days the traffic increases some n times.
We have a main middleware through which all requests made by clients are processed.
We have a Redis cache database that stores GETs with different TTLs for each endpoint.
We have a middleware that decides whether to make the request to the cache or to the third party's API, as the case may be.
And these are the complaints I have gotten that need to be dealt with:
When a country receives a high volume of requests, the country with the least traffic gets negatively affected: the clients do not respond, or respond only partially, because the computation layer's limit was exceeded, and so the users have a bad experience.
Every time the above happens, the computation layer must be manually scaled up from the infrastructure side.
Each request has different response times: stadiums respond in +/- 40 seconds and movie theaters in 3 seconds. These requests enter a queue and are answered in order of arrival.
The error handling is not clear. The errors are mixed up, and you can't tell which country the errors are coming from or how many errors there are.
The responses from the third-party API are not cached correctly in the cache layer, since errors are stored for the duration of the TTL.
I was thinking of a couple of things that I could suggest:
Adding in instrumentation of the requests by using AWS X-Ray
Adding in a separate table for errors in the redis cache layer (old data has to be better than no data for the end user)
Adding in AWS elastic load balancing for the main middleware
But I'm not sure how realistic it would be to implement these three things, and I'm not sure whether they would even solve the problem; I personally don't have much experience optimizing this type of backend. I would appreciate any suggestions, recommendations, links, documentation, etc. I'm really desperate for a solution to this problem.
A few thoughts:
When a country receives a high volume of requests, the country with the least traffic gets negatively affected: the clients do not respond, or respond only partially, because the computation layer's limit was exceeded, and so the users have a bad experience.
A common approach in AWS is to regionalize the stack; assuming you are using CDK/CloudFormation, creating a regionalized stack should be a straightforward task.
But it is a question whether this will solve the problem. Your system suffers from availability issues, and regionalization will only isolate the problem to individual regions. So we should be able to do better (see below).
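A minimal CDK v2 (Python) sketch of that regionalization, assuming one copy of the same stack per country's region (stack names, regions and contents are placeholders):

import aws_cdk as cdk
from constructs import Construct

class TicketingStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # API Gateway, middleware compute, Redis, etc. would be defined here.

app = cdk.App()
# One stack per country/region, so load in one region cannot exhaust the
# capacity that serves the other.
TicketingStack(app, "Ticketing-CountryA", env=cdk.Environment(region="us-east-1"))
TicketingStack(app, "Ticketing-CountryB", env=cdk.Environment(region="sa-east-1"))
app.synth()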
Every time the above happens, the computation layer must be manually scaled up from the infrastructure side.
AWS has an option to automatically scale up and down based on traffic patterns. This is a neat feature, provided you set limits to make sure you are not overcharged.
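For example, with an EC2 Auto Scaling group the limits and the scaling policy could look roughly like this (the group name, sizes and target value are placeholders; ECS and Lambda have equivalent mechanisms via Application Auto Scaling):

import boto3

autoscaling = boto3.client("autoscaling")

# Cap the group so a traffic spike cannot scale (and bill) without bound.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="middleware-asg",
    MinSize=2,
    MaxSize=20,
)

# Scale out and in automatically to hold average CPU around 60%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="middleware-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)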
Each request has different response times: stadiums respond in +/- 40 seconds and movie theaters in 3 seconds. These requests enter a queue and are answered in order of arrival.
It seems that the large variance is because you have to contact the servers at the venues. I recommend decoupling that activity. Basically, calls to venues should be done asynchronously; there are several ways you could do that, with queues and consumer push/pull being the usual approaches (please comment if more details are needed, but this is quite a standard problem and there is lots of material on the internet).
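A rough sketch of that decoupling with SQS and the existing Redis cache (the queue URL, key names and fetch_from_venue helper are illustrative, not from the original setup):

import json
import boto3
import redis

sqs = boto3.client("sqs")
cache = redis.Redis()
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/venue-refresh"  # placeholder

def enqueue_refresh(venue_id):
    # Called by the middleware instead of calling the venue synchronously.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps({"venue_id": venue_id}))

def worker_loop(fetch_from_venue):
    # Background consumer: performs the slow (3-40 s) call and fills the cache.
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            venue_id = json.loads(msg["Body"])["venue_id"]
            data = fetch_from_venue(venue_id)
            cache.setex(f"venue:{venue_id}", 300, json.dumps(data))
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])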
The error handling is not clear. The errors are mixed up, and you can't tell which country the errors are coming from or how many errors there are.
That's a code fix, assuming you do send data to CloudWatch (do you?). You could attach the country as context to all requests, via a filter or something similar, so that when an error is logged that context is logged as well. You probably need the venue id even more than the country, since you can derive the country from the venue id.
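For example, a standard-library logging filter that stamps every record with the venue and country (the names are illustrative; the same idea applies whether the logs end up in CloudWatch or elsewhere):

import logging

class VenueContextFilter(logging.Filter):
    # Attaches venue/country so errors can be grouped per region.
    def __init__(self, venue_id, country):
        super().__init__()
        self.venue_id = venue_id
        self.country = country

    def filter(self, record):
        record.venue_id = self.venue_id
        record.country = self.country
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s country=%(country)s venue=%(venue_id)s %(message)s"))
logger = logging.getLogger("middleware")
logger.addHandler(handler)
logger.addFilter(VenueContextFilter("venue-123", "A"))
logger.error("third-party API timed out")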
The responses from the third-party API are not cached correctly in the cache layer, since errors are stored for the duration of the TTL.
Don't store errors in the cache, and add a circuit breaker pattern.
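A minimal sketch of both points together (the Redis key layout, thresholds and the fetch callable are assumptions): cache only successful responses, and stop hammering the third-party API after repeated failures:

import time
import redis

r = redis.Redis()

class CircuitBreaker:
    # Very small circuit breaker: opens after N consecutive failures.
    def __init__(self, max_failures=5, reset_after=30):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self):
        return not (self.opened_at and time.time() - self.opened_at < self.reset_after)

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()

breaker = CircuitBreaker()

def get_cached(endpoint, fetch, ttl):
    cached = r.get(endpoint)
    if cached is not None:
        return cached
    if not breaker.allow():
        return None  # fail fast; serve stale/empty rather than hammering the API
    try:
        value = fetch(endpoint)          # must return a str/bytes payload
        r.setex(endpoint, ttl, value)    # only successful responses are cached
        breaker.record(True)
        return value
    except Exception:
        breaker.record(False)            # errors are never written to the cache
        raise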

Making GCP Load-balancer HTTP logs integrate with Cloud Trace (in GCP Logs Explorer)?

Cloud Trace and Cloud Logging integrate quite nicely in most cases, described in https://cloud.google.com/trace/docs/trace-log-integration
Unfortunately, this doesn't seem to include the HTTP request logs generated by a Load Balancer when request logging is enabled.
The LB logs show the traces icon, and are correctly associated with an overall trace in the Cloud Trace system, but the context menu 'show trace details' is greyed out for those log items.
A similar problem arose with my application level logging/tracing, and was solved by setting the traceSampled attribute on the LogEntry, but this can't work for LB logs, since I'm not in control of their generation.
In this instance I'm tracing 100% of requests since the service is M2M and fairly low volume, but in the general case it makes sense that the LB can't know if something is actually generating traces without being told.
I can't find any good references in the docs, but in theory a response header indicating it was sampled could be observed by the LB and cause it to issue the appropriate log.
Any ideas if such a feature exists, in this form or any other?
(Last-ditch workaround might be to use Logs Router to feed LB logs into a pubsub queue (and exclude them from normal logging sinks), and resubmit them to the normal sink(s) with fields appropriately set by some Cloud Function or other pubsub consumer, but that seems like a lot of work and complexity for this purpose)
There is currently a Feature Request created for this; you can follow its status in the following link 1.
As a workaround, you could implement target proxies along with your Load Balancer, according to the documentation for a Global external HTTP(S) load balancer:
The proxies set HTTP request/response headers as follows:
Via: 1.1 google (requests and responses)
X-Forwarded-Proto: [http | https] (requests only)
X-Cloud-Trace-Context: <trace-id>/<span-id>;<trace-options> (requests only) Contains parameters for Cloud Trace.
X-Forwarded-For: [<supplied-value>,]<client-ip>,<load-balancer-ip> (see X-Forwarded-For header) (requests only)
You can find the complete documentation about external HTTP(S) load balancers and target proxies here 2.
And finally, take a look at the documentation on how to use and configure target proxies here 3.
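On the application side of the integration, a minimal structured-logging sketch (the project id is a placeholder) that copies the trace id from X-Cloud-Trace-Context into the fields Cloud Logging uses for trace correlation; this is essentially the traceSampled trick mentioned in the question, not a fix for the LB's own log entries:

import json
import sys

PROJECT_ID = "my-project"  # placeholder

def log_with_trace(message, trace_header):
    trace_id = trace_header.split("/")[0] if trace_header else None
    entry = {"message": message, "severity": "INFO"}
    if trace_id:
        entry["logging.googleapis.com/trace"] = f"projects/{PROJECT_ID}/traces/{trace_id}"
        entry["logging.googleapis.com/trace_sampled"] = True
    print(json.dumps(entry), file=sys.stdout)

# e.g. in a request handler:
# log_with_trace("handled request", request.headers.get("X-Cloud-Trace-Context"))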

Camel route POSTs to service that takes 20+ minutes to respond

I have an Apache Camel (version 2.15.3) route that is configured as follows (using a mix of XML and Java DSL):
Read a file from one of several folders on an FTP site.
Set a header to indicate which folder it was read from.
Do some processing and auditing.
Synchronously POST to an external REST service (jax-rs 1.1, Glassfish, Java EE 6).
The REST service takes a long time to do its job, 20+ minutes.
Receive the reply.
Do some more processing and auditing.
Write the response to one of several folders on an FTP site.
Use the header set at the start to know which folder to write to.
This is all configured in a single path of chained routes.
The problem is that the connection to the external REST service will time out while the service is still processing. The infrastructure is a bit complex (edge servers, load balancers, Glassfish), and regardless, I don't think increasing the timeout is the right solution.
How can I implement this route such that I avoid timeouts while still meeting all my requirements to (1) write the response to the appropriate FTP folder, (2) audit the transaction, and (3) meet other transaction/context-specific requirements?
I'm relatively new to Camel and REST, so maybe this is easy, but I don't know what Camel and REST tools and techniques to use.
(Questions and suggestions for improvement are welcome.)
Isn't it possible to break the two main steps apart and have two asynchronous operations?
I would do as follows.
Read a file from one of several folders on an FTP site.
Set a header to indicate which folder it was read from.
Save the header, file name, and other relevant information in a cache. There is a Camel component called camel-cache that is relatively easy to set up, and you can store key-value pairs or any other objects.
Do some processing and auditing. Asynchronously POST to an external REST service (jax-rs 1.1, Glassfish, Java EE 6). Note that we are posting asynchronously here.
Step 2.
Receive the reply.
Look up the reply identifier, i.e. the filename or some other identifier, in the cache to match the reply, and then fetch the header.
Do some more processing and auditing.
Write the response to one of several folders on an FTP site.
This way, you don't need to wait, and processing can take 20 minutes or longer. You just set your cache values to not expire for, say, 24 hours.
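The answer above uses the camel-cache component; purely to illustrate the correlation idea (this is not Camel code), here is a sketch with Redis as a stand-in, where the key layout and helper names are made up:

import redis

r = redis.Redis()

def write_to_ftp(folder, filename, body):
    # Placeholder for the FTP producer step at the end of the route.
    ...

def on_file_read(filename, source_folder):
    # Step 1: remember which folder the file came from before posting async.
    r.hset(f"ticket:{filename}", mapping={"folder": source_folder})

def on_reply(filename, response_body):
    # Step 2: when the (much later) reply arrives, look the folder back up.
    folder = r.hget(f"ticket:{filename}", "folder").decode()
    write_to_ftp(folder, filename, response_body)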
This is a typical asynchronous use case. Can the REST service give you a token id or some other unique id immediately after you hit it?
That way you can have a batch job or some other Camel route which will pick up this id from a database/cache and hit the REST service again after 20 minutes.
This is the ideal solution I can think of, if the REST service can provision this.
You are right, waiting for 20 minutes on a synchronous call is a crazy idea. Also, what is the estimated size of the file/payload that you are planning to post to the REST service?

Amazon EC2 EBS i/o costs

I am hosting my application on Amazon EC2, on one of their micro Linux instances.
It costs (apart from other costs) $0.11 per 1 million I/O requests. I was wondering how many I/O requests it would take when I have, say, 1000 users using it for 1 hour per day for 1 month.
I guess my main concern is: if a hacker keeps hitting my login page (simple HTML), will it increase the I/O request count? I guess yes, as every time the server needs to do something to serve that page.
There are a lot of factors that will impact your I/O requests; as @datasage says, try it and see how it behaves under your scenario. Micro Linux instances are incredibly cheap to begin with, but if you are really concerned, set up a billing alert that will notify you when your usage passes a pre-determined threshold - if it suddenly spikes up, you can take some action to shut it down if that is what you want.
https://portal.aws.amazon.com/gp/aws/developer/account?ie=UTF8&action=billing-alerts
Take a look at CloudWatch, and (for free) set up VolumeWriteOps and VolumeReadOps alarms to work with Amazon Simple Notification Service (SNS) to send you a text message and email notice right away if things get too busy, before the bill gets high! (A billing alert will let you know too late - after it has reached the threshold.)
In general though, from my experience, you will not have the problem you outline. Scan the EC2 Discussion Forum at forums.aws.amazon.com, where you would find evidence of this kind of problem if it were prevalent; it does not seem to be happening.
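As a sketch of that alarm (the volume id, threshold and SNS topic ARN are placeholders):

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="ebs-write-ops-spike",
    Namespace="AWS/EBS",
    MetricName="VolumeWriteOps",
    Dimensions=[{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=100000,  # tune to your baseline
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:io-alerts"],
)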
@Dilpa yes, you are right. If a brute-force attack hits your website, e.g. somebody repeatedly hitting your login page, it will increase the server I/O if you have logging enabled for your web server. The web server will write every event to its log files, and that will increase your I/O. Just check your web server logs for that kind of attack so you can prevent it.