How much latency does X-Ray add to AWS Lambda functions - amazon-web-services

I've gone over the documentation and cannot find a clear statement about how much latency X-Ray tracing is supposed to add to Lambda function executions (and to other services as well). It should be minimal, but since it's sending out traces, some latency is expected.
Does anyone have the numbers?

The AWS X-Ray SDKs that you use in your application do not send trace segments to the X-Ray service directly. The segments are transmitted over UDP to the X-Ray daemon running on localhost, so the latency your application sees comes only from in-memory updates to the segment data. Only when a segment is complete is it sent over UDP to localhost. Hence, you should expect minimal overhead on your application. Also, the daemon, which runs as a separate process, does not send segments to the service immediately; it buffers them for a short period and periodically sends them in batches using the PutTraceSegments API call.
If you are interested in digging further, most AWS X-Ray SDKs are open sourced on GitHub. The Java SDK, for example: https://github.com/aws/aws-xray-sdk-java
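To make the mechanism above concrete, here is a minimal sketch using the Python SDK (aws-xray-sdk assumed; the function names are made up). Inside Lambda the function segment is created by the service itself, so application code only opens subsegments, which live in memory until they are closed and then go over UDP to the local daemon:

# Minimal sketch, assuming the aws-xray-sdk package and an X-Ray daemon
# listening on localhost (inside Lambda this is provided by the runtime).
from aws_xray_sdk.core import xray_recorder, patch_all

patch_all()  # instrument supported libraries such as boto3 and requests

@xray_recorder.capture('do_work')  # records a subsegment around each call
def do_work(item_id):
    # business logic here; timing is only an in-memory update until
    # the subsegment closes and is emitted to the daemon
    return item_id

def handler(event, context):
    # the Lambda service creates the function segment; the SDK attaches
    # the 'do_work' subsegment to it
    return {"result": do_work(event.get("id"))}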

Related

AWS SQS SendMessage takes hundreds of seconds to complete

I am developing a web application where an HTTP request triggers a long background task. To decouple the HTTP request processing I am using AWS SQS: while handling the HTTP request I enqueue a message describing the background work, and the message is then picked up by a background process which actually does the work. This way the latency of my application is kept low.
Recently I noticed worryingly high latencies when sending messages to SQS. From what I could find by googling, normal latency should be in the hundreds of milliseconds.
The problem is that the latency sometimes spikes to over 130,000 ms! The background processing actually takes less time than enqueuing the work.
I am using a Standard queue, which I understand is more of a best-effort service. Is this kind of latency a common thing with AWS SQS? How can I proceed with debugging this issue?
The messages are short JSONs containing the ID of the object which should be processed in the background.
{"type":"DO_BACKGROUND_WORK","ids":["123456"]}
The obvious cause could be a networking issue between AWS and my server. However, the SES endpoint does not show such latencies, and neither does the Rollbar error logging.
The application is hosted in Europe (Contabo) but not on AWS. The CPU load is normal during the work, and RAM usage is normal as well.
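One way to gather data on where the time goes is to time each SendMessage call and set explicit client timeouts, so that a slow network path to SQS surfaces as a timeout and retry instead of one very long call. A minimal sketch, assuming boto3; the region and queue URL are hypothetical:

import json
import time

import boto3
from botocore.config import Config

sqs = boto3.client(
    "sqs",
    region_name="eu-central-1",  # assumption: an EU region close to the server
    config=Config(connect_timeout=2, read_timeout=5,  # fail fast on bad paths
                  retries={"max_attempts": 2}),
)

QUEUE_URL = "https://sqs.eu-central-1.amazonaws.com/123456789012/background-work"  # hypothetical

def enqueue(ids):
    body = json.dumps({"type": "DO_BACKGROUND_WORK", "ids": ids})
    start = time.monotonic()
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"SendMessage took {elapsed_ms:.0f} ms")  # log and correlate the spikes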

What does `Active tracing` mean in Lambda with X-Ray?

I deployed a Lambda function with X-Ray enabled, and I am able to see all the traces from my Lambda in the X-Ray console. But the console shows a warning saying that Active tracing requires permissions that are not configured for the Lambda, and I don't understand what Active tracing means. I have read articles like https://docs.aws.amazon.com/xray/latest/devguide/xray-services-lambda.html but they don't explain it very well.
So what does Active tracing mean, and does it cost much?
I also had this warning under "Active tracing." If you click Edit, it gives a bit more explanation, saying it needs permission to send trace data.
You can find the documentation here, but the short version is that you'll want to add the AWSXRayDaemonWriteAccess policy to your Lambda function's execution role.
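As a sketch of that fix done programmatically (boto3 assumed; the role name is hypothetical):

import boto3

iam = boto3.client("iam")

# Allow the function's execution role to upload trace data to X-Ray.
iam.attach_role_policy(
    RoleName="my-function-execution-role",  # hypothetical role name
    PolicyArn="arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess",
)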
The different levels of X-Ray integration with AWS services are explained here:
Active instrumentation – Samples and instruments incoming requests.
Passive instrumentation – Instruments requests that have been sampled by another service.
Request tracing – Adds a tracing header to all incoming requests and propagates it downstream.
Tooling – Runs the X-Ray daemon to receive segments from the X-Ray SDK.
AWS Lambda supports both active and passive instrumentation. So basically you use passive instrumentation if your function handles requests that have already been sampled by some other service (e.g. API Gateway). In contrast, if your function gets "raw" un-sampled requests, you should use active instrumentation so that the sampling takes place.
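A small sketch of switching between the two modes (boto3 assumed; the function name is hypothetical): "Active" makes Lambda sample and instrument incoming requests itself, while "PassThrough" only traces requests already sampled upstream:

import boto3

lam = boto3.client("lambda")

lam.update_function_configuration(
    FunctionName="my-function",        # hypothetical
    TracingConfig={"Mode": "Active"},  # or "PassThrough" for passive tracing
)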

Performance testing for serverless applications in AWS

In Traditional Performance Automation Testing:
There is an application server where all the request hits are received. So in this case we know the server configuration (CPU, RAM etc.) and can perform load testing (of, let's say, 5k concurrent users) using JMeter or any other load testing tool and check the server's performance.
In the case of AWS serverless there is no server, so to speak; all servers are managed by AWS. The code only resides in Lambdas, and AWS decides at run time how to balance the load when volumes are high.
So now we have a web app hosted on AWS using the Serverless Framework and we want to measure its performance for 5K concurrent users. With no backend server information, the only option here is to rely on frontend or browser-based response times. Should this suffice?
Is there a better way to check performance of serverless applications?
I haven't worked with AWS, but in my opinion performance testing of serverless applications should work pretty much the same way as with traditional, self-managed physical servers.
Despite the name serverless, physical servers are still used (they are just managed by AWS).
So I would approach this task with the following steps:
send backend metrics (response time, request count and so on) to some metrics system (Graphite, Prometheus, etc.); a sketch of this step follows after this answer
build a dashboard in that metrics system (ideally you should see the request count and response time per instance, and the number of instances)
take a load testing tool (JMeter, Gatling or whatever) and run your load test scenario
During and after the test you will see how many requests your app is processing, its response times, and how the number of instances changes depending on the number of concurrent requests.
This way you stay independent of AWS's management tools (although AWS probably has some management dashboard, and afterwards it would be good to compare its results with yours).
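A possible sketch of the first step, using CloudWatch via boto3 purely as one example of a metrics system (the namespace, dimension and function names are made up); it publishes a per-request response time and a request count that a dashboard can then plot:

import time

import boto3

cloudwatch = boto3.client("cloudwatch")

def record_request(handler, event):
    start = time.monotonic()
    result = handler(event)
    elapsed_ms = (time.monotonic() - start) * 1000

    cloudwatch.put_metric_data(
        Namespace="MyApp/LoadTest",  # hypothetical namespace
        MetricData=[
            {"MetricName": "ResponseTime",
             "Dimensions": [{"Name": "Function", "Value": "my-function"}],
             "Unit": "Milliseconds", "Value": elapsed_ms},
            {"MetricName": "RequestCount",
             "Dimensions": [{"Name": "Function", "Value": "my-function"}],
             "Unit": "Count", "Value": 1},
        ],
    )
    return result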
"Loadtesting" a serverless application is not the same as that of a traditional application. The reason for this is that when you write code that will run on a machine with a fixed amount CPU and RAM, many HTTP requests will be processed on that same machine at the same time. This means you can suffer from the noisy-neighbour effect where one request is consuming so much CPU and RAM that it is negatively affecting other requests. This could be for many reasons including sub-optimal code that is consuming a lot of resources. An attempted solution to this issue is to enable auto-scaling (automatically spin up additional servers if the load on the current ones reaches some threshold) and load balancing to spread requests across multiple servers.
This is why you need to load test a traditional application; you need to ensure that the code you wrote is performant enough to handle the influx of X number of visitors and that the underlying scaling systems can absorb the load as needed. It's also why, when you are expecting a sudden burst of traffic, you will pre-emptively spin up additional servers to help manage all that load ahead of time. The problem is you cannot always predict that; a famous person mentions your service on Facebook and suddenly your systems need to respond in seconds and usually can't.
In serverless applications, a lot of the issues around noisy neighbours in compute are removed for a number of reasons:
A lot of what you usually did in code is now done in a managed service; most web frameworks will route HTTP requests in code, whereas API Gateway in AWS takes that over.
Lambda functions are isolated and each instance of a Lambda function has a certain quantity of memory and CPU allocated to it. It has little to no effect on other instances of Lambda functions executing at the same time (this also means if a developer makes a mistake and writes sub-optimal code, it won't bring down a server; serverless compute is far more forgiving to mistakes).
All of this is not to say you don't need to do your homework to make sure your serverless application can handle the load; you just do it differently. Instead of trying to push fake users at your application to see if it can handle them, consult the documentation for the various services you use. AWS, for example, publishes the limits of these services and guarantees those numbers as part of the service. For example, API Gateway has a limit of 10,000 requests per second. Do you expect traffic greater than 10,000 requests per second? If not, you're good! If you do, contact AWS and they may be able to increase that limit for you. Similar limits apply to AWS Lambda, DynamoDB, S3 and all other services.
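If it helps, those published limits can also be read programmatically rather than load-tested toward; a hedged sketch using the Service Quotas API through boto3 (service codes such as "lambda" or "apigateway"):

import boto3

quotas = boto3.client("service-quotas")

# e.g. "lambda" for AWS Lambda, "apigateway" for API Gateway
for quota in quotas.list_service_quotas(ServiceCode="lambda")["Quotas"]:
    print(f'{quota["QuotaName"]}: {quota["Value"]}')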
As you have mentioned, since serverless architecture (FaaS) doesn't give us a physical or virtual server, we cannot monitor the traditional metrics. Instead, we can capture the following:
Auto Scalability:
Since the main advantage of this platform is scalability, we need to verify the auto scaling behaviour by increasing the load.
More requests, lower response time:
When hit with a huge number of requests, traditional servers show increasing response times, whereas this approach should keep them lower. We need to monitor the response time.
Lambda insights in Cloudwatch:
There is an option to monitor the performance of multiple Lambda functions - throttles, invocations and errors, memory usage, CPU usage and network usage. We can configure the Lambdas we need and monitor them in the 'Performance monitoring' column.
Container CPU and Memory usage:
In CloudWatch, we can create a dashboard with widgets to capture the CPU and memory usage of the containers, the task count and the load balancer response time (if any).
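A rough sketch of pulling some of those metrics for one function over a test window (boto3 assumed; the function name is hypothetical):

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Duration",  # also useful: Invocations, Throttles, Errors
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],  # hypothetical
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=60,
    Statistics=["Average", "Maximum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])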

What Elastic Beanstalk Environment should I choose?

My job is to move our existing Java calculation (a servlet packaged as a WAR file) from our own server to AWS. This is a calculation without a user interface or database. Other companies should be able to call the calculation from their programs. The servlet takes a POST request with a JSON payload, and the response sends a JSON payload back to the client after the calculation is performed. The calculation is relatively heavy and therefore time-consuming (1-2 sec.).
I have decided to use AWS Elastic Beanstalk for the cloud computing, but I'm in doubt as to which EB environment to use (web server or worker environment), and whether I should put AWS API Gateway in front of EB.
Hopefully somebody can clarify this for me.
A worker environment provisions an SQS queue that you submit your jobs into. To enable access to it from outside of AWS you would have to front it with API Gateway (the preferred way).
However, the worker environment works asynchronously. It does not return job results to the caller, so you would need some other mechanism for your clients to get the results back, e.g. through a different API call.
An alternative is a web server environment, where the clients get the response back directly from your JSON processing application. 1-2 seconds is not that long a wait for an HTTP request.
For a more complex solution based on EB, one could look at Creating links between Elastic Beanstalk environments. You would have a front-end environment for your clients, linked with a worker environment that does the JSON job processing.
The other way would be to rewrite the app as a Lambda function, if possible of course. Lambda seems like a good fit for 1-2 second processing tasks.
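Just to illustrate that last alternative (a sketch only, shown in Python for brevity even though the existing service is a Java servlet; all names are made up), behind API Gateway the function would take JSON in and send JSON back:

import json

def run_calculation(payload):
    # placeholder for the actual 1-2 second computation
    return {"result": sum(payload.get("inputs", []))}

def handler(event, context):
    payload = json.loads(event.get("body") or "{}")  # API Gateway proxy event
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(run_calculation(payload)),
    }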

Are latency and throughput in AWS SNS good enough to replace a dedicated MQ for pub/sub?

For the sake of HA I'm considering switching from a self-hosted solution (ZeroMQ) to AWS Simple Notification Service for pub/sub in an application. It is the backend for an app, and thus should be reasonably real-time.
What latency and throughput can I expect of SNS?
Is the app going to be hosted on EC2? If so, the latency will be far lower, as the communication will go over Amazon's network rather than through the public internet.
If you are going to call AWS services from boxes not hosted on EC2, here's a cool site that attempts to give you an idea of the amount of latency between you and various AWS services and locations.
How are you measuring the HTTP Ping Request Latency?
We are making an HTTP GET request to AWS service endpoints (like EC2, SQS, SNS etc.) for PING and measuring the observed latency for it across all regions.
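That is the kind of measurement you could also reproduce yourself; a small sketch (the requests library is assumed, and the endpoints are just examples) timing a GET against regional SNS endpoints:

import time

import requests

ENDPOINTS = {  # example regional SNS endpoints
    "eu-west-1": "https://sns.eu-west-1.amazonaws.com",
    "us-east-1": "https://sns.us-east-1.amazonaws.com",
}

for region, url in ENDPOINTS.items():
    start = time.monotonic()
    requests.get(url, timeout=5)  # any response counts; we only time the round trip
    print(f"{region}: {(time.monotonic() - start) * 1000:.0f} ms")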
As for throughput, that is left up to you. You can use various strategies to increase throughput, like multi-threading, batching messages, etc.
Keep in mind that you will have to code for some side effects, like possibly seeing the same message twice (at-least-once delivery) and not being able to rely on FIFO ordering.
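A small sketch of those points (boto3 assumed; the topic ARN is hypothetical): publishing to a topic, plus a subscriber-side guard that drops duplicate deliveries by remembering MessageIds, since delivery is at-least-once:

import json

import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:app-events"  # hypothetical

def publish(event_type, payload):
    sns.publish(TopicArn=TOPIC_ARN,
                Message=json.dumps({"type": event_type, "data": payload}))

_seen_ids = set()  # in production this would be a shared store (DynamoDB, Redis, ...)

def handle(sns_message):
    # sns_message is the JSON document SNS delivers to its subscribers
    if sns_message["MessageId"] in _seen_ids:
        return  # duplicate delivery, ignore
    _seen_ids.add(sns_message["MessageId"])
    process(json.loads(sns_message["Message"]))

def process(event):
    ...  # application-specific handling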