Cost efficiency for AWS Lambda Provisioned Concurrency

I'm running a system with lots of AWS Lambda functions. Our load is not huge, let's say a function gets 100k invocations per month.
For quite a few of the Lambda functions, we're using warmup plugins to reduce cold start times. This is effectively a CloudWatch event triggered every 5 minutes that invokes the function with a dummy event; the event is ignored, but it keeps a Lambda instance running. In most cases, this means one instance will be "warm".
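For illustration, the handler side of this hack is roughly the following sketch (shown in Python; the "warmup" marker field and the function names are placeholders I'm using for this example, not part of any plugin's contract):

# Hypothetical Lambda handler that short-circuits on the dummy warmup event.
def handler(event, context):
    # "warmup" is an assumed marker field set by the scheduled CloudWatch event
    if isinstance(event, dict) and event.get("warmup"):
        return {"status": "warm"}  # do no real work, just keep this instance alive
    return do_real_work(event)

def do_real_work(event):
    # placeholder for the actual business logic
    return {"status": "ok"}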
I'm now looking at the native solution to the cold start problem, AWS Lambda Provisioned Concurrency, which at first glance looks awesome. But when I start calculating, either I'm missing something, or this will simply be a large cost increase for a system with only medium load.
Example, with prices from the eu-west-1 region as of 2020-09-16:
Consider a function with RAM M (GB), average execution time t (s), N million requests per month, and provisioned concurrency C ("number of instances"):
Without provisioned concurrency
Cost per month = N⋅(16.6667⋅M⋅t + 0.20)
= $16.87 per million requests # M = 1 GB, t = 1 s
= $1.87 per million requests # M = 1 GB, t = 100 ms
= $1.69 per 100,000 requests # M = 1 GB, t = 1 s
= $1686.67 per 100M requests # M = 1 GB, t = 1 s
With provisioned concurrency
Cost per month = C⋅0.000004646⋅M⋅60⋅60⋅24⋅30 + N⋅(10.8407⋅M⋅t + 0.20) ≈ 12.04⋅C⋅M + N⋅(10.84⋅M⋅t + 0.20)
= $12.04 + $11.04 = $23.08 per million requests # M = 1 GB, t = 1 s, C = 1
= $12.04 + $1.28 = $13.32 per million requests # M = 1 GB, t = 100 ms, C = 1
= $12.04 + $1.10 = $13.14 per 100,000 requests # M = 1 GB, t = 1 s, C = 1
= $12.04 + $1104.07 = $1116.11 per 100M requests # M = 1 GB, t = 1 s, C = 1
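To make the comparison easy to reproduce, here is a small script that encodes the two formulas above (my own arithmetic with the quoted eu-west-1 prices, not an AWS tool):

# Monthly Lambda cost in USD, using the eu-west-1 prices quoted above.
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

def cost_on_demand(m_gb, t_s, n_millions):
    # $16.6667 per million GB-seconds of duration + $0.20 per million requests
    return n_millions * (16.6667 * m_gb * t_s + 0.20)

def cost_provisioned(m_gb, t_s, n_millions, c):
    # provisioned-concurrency charge + cheaper duration price + request price
    static = c * 0.000004646 * m_gb * SECONDS_PER_MONTH   # ~= 12.04 * C * M
    return static + n_millions * (10.8407 * m_gb * t_s + 0.20)

print(cost_on_demand(1, 1, 1))         # ~= 16.87
print(cost_provisioned(1, 1, 1, 1))    # ~= 23.08
print(cost_on_demand(1, 1, 100))       # ~= 1686.67
print(cost_provisioned(1, 1, 100, 1))  # ~= 1116.11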
There are obviously several factors to take into account here:
How many requests per month is expected? (N)
How much RAM does the function need? (M)
What is the average execution time? (t)
What is the traffic pattern: a few small bursts or even traffic? (This might mean C is low, high, or must be changed dynamically to follow peak hours, etc.)
In the end though, my initial conclusion is that Provisioned Concurrency will only be a good deal if you have a lot of traffic. In my example, at 100M requests per month there's a substantial saving (however, at that traffic it's perhaps likely that you would need a higher value of C as well; break-even is at about C = 48). Even with C = 1, you need roughly two million requests per month (at t = 1 s) before the duration savings cover the static cost.
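The break-even point follows directly from the same formulas (again my own arithmetic, not an AWS figure):

# Requests per month (in millions) needed before provisioned concurrency pays off.
# The duration saving is (16.6667 - 10.8407) = 5.826 USD per million GB-seconds.
def break_even_millions(m_gb, t_s, c):
    static = 12.04 * c * m_gb                # fixed provisioned-concurrency cost
    saving_per_million = 5.826 * m_gb * t_s  # duration-price difference
    return static / saving_per_million

print(break_even_millions(1, 1, 1))   # ~= 2.07 million requests/month at C = 1
print(break_even_millions(1, 1, 48))  # ~= 99, i.e. roughly 100M requests at C ~= 48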
Now, there are obviously other benefits of using the native solution (no ugly dummy events, no log pollution, a flexible number of warm instances, ...), and there are also probably other hidden costs of custom solutions (CloudWatch events, additional CloudWatch logging for dummy invocations, etc.), but I think they are pretty much negligible.
Is my analysis fairly correct or am I missing something?

I think of provisioned concurrency as something that eliminates cold starts, not something that saves money. There is a bit of saving if you can keep the Lambda function running all the time (100% utilization), but as you've calculated, it becomes quite expensive when the provisioned capacity sits idle.

Related

Google Dataflow Pricing Streaming Mode

I'm new to Dataflow.
I'd like to use the Dataflow streaming template "Pub/Sub Subscription to BigQuery" to transfer some messages, say 10,000 per day.
My question is about pricing, since I don't understand how it's computed for the streaming mode, with Streaming Engine enabled or not.
I've used the Google pricing calculator, which asks for the following:
machine type, number of worker nodes used by the job, whether it is a streaming or batch job, number of GB of Persistent Disk (PD), and hours the job runs per month.
Consider the easiest case, since I don't need many resources, i.e.
Machine type: n1-standard-1
Max Workers: 1
Job Type: Streaming
Price: in us-central1
Case 1: Streaming Engine DISABLED
Hours using the vCPU = 730 hours (1 month, always active). Is this always true for streaming mode? Or can there be a case in streaming mode in which the usage is lower?
Persistent Disks: 430 GB HDD, which is the default value.
So I will pay:
(vCPU) 730 x $0.069(cost vCPU/hour) = $50.37
(PD) 730 x $0.000054 x 430 GB = $16.95
(RAM) 730 x $0.003557 x 3.75 GB = $9.74
TOTAL: $77.06, as confirmed by the calculator.
Case 2: Streaming Engine ENABLED.
Hours using the vCPU = 730 hours
Persistent Disks: 30 GB HDD, which is the default value
So I will pay:
(vCPU) 730 x $0.069 (cost vCPU/hour) = $50.37
(PD) 730 x $0.000054 x 30 GB = $1.18
(RAM) 730 x $0.003557 x 3.75 GB = $9.74
TOTAL: $61.29 PLUS the amount of Data Processed (which is extra with Streaming Engine)
Considering messages of 1024 Byte, we have a traffic of 1024 x 10000 x 30 Bytes = 0.307 GB, and an extra cost of 0.307 GB x $0.018 = $0.005 (almost zero).
Actually, with this kind of traffic, I would save about $15 by using Streaming Engine.
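Here is the comparison as a small script, using the unit prices above (my own arithmetic, not the official calculator):

# Monthly Dataflow cost for one n1-standard-1 worker, us-central1 prices from above.
HOURS = 730

def monthly_cost(streaming_engine, gb_processed=0.0):
    vcpu = HOURS * 0.069                                        # 1 vCPU
    ram = HOURS * 0.003557 * 3.75                               # 3.75 GB of RAM
    pd = HOURS * 0.000054 * (30 if streaming_engine else 430)   # persistent disk
    data = 0.018 * gb_processed if streaming_engine else 0.0    # Streaming Engine data charge
    return vcpu + ram + pd + data

print(monthly_cost(False))        # ~= 77.06
print(monthly_cost(True, 0.307))  # ~= 61.30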
Am I correct? Is there something else to consider or something wrong with my assumptions and my calculations?
Additionally, considering the low amount of data, is Dataflow really fitted for this kind of use? Or should I approach this problem in a different way?
Thank you in advance!
It's not wrong, but it's not perfectly accurate.
In streaming mode, your Dataflow job always listens to the Pub/Sub subscription, and thus it needs to be up full time.
In batch processing, you normally start the batch, it performs its job, and then it stops.
In your comparison, you assume a batch job that runs full time. That's not impossible, but I don't think it fits your use case.
As for streaming versus batch, it all depends on how much you need real-time data.
If you want to ingest the data into BigQuery with low latency (within a few seconds) to have real-time data, streaming is the right choice.
If having the data updated only every hour or every day is enough, batch is the more suitable solution.
A final remark: if your task is only to get messages from Pub/Sub and stream them into BigQuery, you can consider coding it yourself on Cloud Run or Cloud Functions. With only 10k messages per day, it will be free!
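For example, a minimal sketch of such a function could look like the following (the table ID and the assumption that each Pub/Sub message body is a JSON object matching the table schema are mine, for illustration only):

# Minimal Pub/Sub-triggered Cloud Function (Python, 1st gen) that streams rows into BigQuery.
import base64
import json

from google.cloud import bigquery

TABLE_ID = "my-project.my_dataset.my_table"  # hypothetical table, replace with yours

client = bigquery.Client()

def pubsub_to_bigquery(event, context):
    # event["data"] is the base64-encoded Pub/Sub message body
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    errors = client.insert_rows_json(TABLE_ID, [payload])  # streaming insert
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")

Note that BigQuery streaming inserts are billed separately, but at roughly 0.3 GB per month that charge is also negligible.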

How to get current time in CP when using Intervals for scheduling

I am trying to schedule tasks on different machines. These machines have dynamic available resources, for example:
machine 1: max capacity 4 cores.
At T=t1 => available CPU = 2 cores;
At T=t2 => available CPU = 1 core;
Each interval has a fixed time (Ex: 1 minute).
So in CPLEX, I have a cumulFunction to sum the used resources on a machine:
cumulFunction cumuls[host in Hosts] =
sum(task in Jobs) pulse(itvs[task][host], requests[task]);
Now the problem is in the constraint:
forall(host in Hosts) {
cumuls[host] <= ftoi(available_res_function[host](<<Current Period>>));
}
I can't find a way to get the current period so that I can compare the used resources to the resources available in that specific period.
PS: available_res_function is a stepFunction of the available resources.
Thank you so much for your help.
What you can do is add a set of pulses to your cumul function.
For instance, in the sched_cumul example you could change:
cumulFunction workersUsage =
sum(h in Houses, t in TaskNames) pulse(itvs[h][t],1);
into
cumulFunction workersUsage =
sum(h in Houses, t in TaskNames) pulse(itvs[h][t],1)+pulse(1,40,3);
if you want to express that 3 fewer workers are available between times 1 and 40.

Improve UPDATE-per-second performance of SQLite?

My question comes directly from this one, although I'm only interested in UPDATE, and only that.
I have an application written in C/C++ which makes heavy use of SQLite, mostly SELECT/UPDATE, at a very frequent interval (about 20 queries every 0.5 to 1 second).
My database is not big, about 2,500 records at the moment; here is the table structure:
CREATE TABLE player (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name VARCHAR(64) UNIQUE,
stats VARBINARY,
rules VARBINARY
);
Up to this point I did not use transactions because I was improving the code and wanted stability rather than performance.
Then I measured my database performance by merely executing 10 update queries like the following (in a loop with different values):
// 10 times execution of this
UPDATE player SET stats = ? WHERE (name = ?)
where stats is a JSON string of exactly 150 characters and name is 5-10 characters long.
Without a transaction, the result is unacceptable: about 1 full second in total (0.096 s each).
With a transaction, the time drops by a factor of about 7.5: about 0.11-0.16 seconds in total (0.013 s each).
I tried deleting a large part of the database and/or re-ordering / deleting columns to see if that changes anything, but it did not. I get the above numbers even if the database contains just 100 records (tested).
I then tried playing with PRAGMA options:
PRAGMA synchronous = NORMAL
PRAGMA journal_mode = MEMORY
Gave me smaller times but not always, more like about 0.08 - 0.14 seconds
PRAGMA synchronous = OFF
PRAGMA journal_mode = MEMORY
Finally gave me extremely small times, about 0.002-0.003 seconds, but I don't want to use it since my application saves to the database every second and there's a high chance of a corrupted database on an OS crash or power failure.
My C SQLite code for queries is: (comments/error handling/unrelated parts omitted)
// start the transaction
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
// prepare the query
sqlite3_stmt *statement = NULL;
int out = sqlite3_prepare_v2(db, query.c_str(), -1, &statement, NULL);
// bindings
for (size_t x = 0, sz = bindings.size(); x < sz; x++) {
    out = sqlite3_bind_text(statement, x + 1, bindings[x].text_value.c_str(), bindings[x].text_value.size(), SQLITE_TRANSIENT);
    ...
}
// execute; a completed UPDATE returns SQLITE_DONE, not SQLITE_OK
out = sqlite3_step(statement);
if (out != SQLITE_DONE) {
    // handle the error
}
// finalize the statement regardless of the outcome
if (statement != NULL) {
    sqlite3_finalize(statement);
}
// end the transaction
sqlite3_exec(db, "END TRANSACTION", NULL, NULL, NULL);
As you can see, it's a pretty typical table, the number of records is small, and I'm doing a plain simple UPDATE exactly 10 times. Is there anything else I could do to decrease my UPDATE times? I'm using the latest SQLite, 3.16.2.
NOTE: The timings above come directly from a single END TRANSACTION query. Queries are done inside a single transaction and I'm
using a prepared statement.
UPDATE:
I performed some tests with transactions enabled and disabled and various update counts. I performed the tests with the following settings:
VACUUM;
PRAGMA synchronous = NORMAL; -- def: FULL
PRAGMA journal_mode = WAL; -- def: DELETE
PRAGMA page_size = 4096; -- def: 1024
The results follows:
no transactions (10 updates)
0.30800 secs (0.0308 per update)
0.30200 secs
0.36200 secs
0.28600 secs
no transactions (100 updates)
2.64400 secs (0.02644 each update)
2.61200 secs
2.76400 secs
2.68700 secs
no transactions (1000 updates)
28.02800 secs (0.028 each update)
27.73700 secs
..
with transactions (10 updates)
0.12800 secs (0.0128 each update)
0.08100 secs
0.16400 secs
0.10400 secs
with transactions (100 updates)
0.088 secs (0.00088 each update)
0.091 secs
0.052 secs
0.101 secs
with transactions (1000 updates)
0.08900 secs (0.000089 each update)
0.15000 secs
0.11000 secs
0.09100 secs
My conclusion is that with transactions, there's effectively no per-query time cost. Perhaps the times get bigger with a colossal number of updates, but I'm not interested in those numbers. There's literally no time-cost difference between 10 and 1000 updates in a single transaction. However, I'm wondering if this is a hardware limit on my machine and there isn't much I can do: it seems I cannot go below ~100 milliseconds for a single transaction covering 10-1000 updates, even by using WAL.
Without transactions there's a fixed time cost of around 0.025-0.03 seconds per update.
With such small amounts of data, the time for the database operation itself is insignificant; what you're measuring is the transaction overhead (the time needed to force the write to the disk), which depends on the OS, the file system, and the hardware.
If you can live with its restrictions (mostly, no network), you can use asynchronous writes by enabling WAL mode.
You may still be limited by the time it takes to commit a transaction. In your first example each transaction took about 0.10 seconds to complete, which is pretty close to the transaction time for writing 10 records. What kind of results do you get if you batch 100 or 1000 updates in a single transaction?
Also, SQLite expects around 60 transactions per second on an average hard drive, while you're only getting about 10. Could your disk performance be the issue here?
https://sqlite.org/faq.html#q19
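To see the batching effect in isolation, here is a small standalone sketch (it uses Python's built-in sqlite3 module rather than the C API, purely to illustrate the same batched-transaction idea; the file name and row counts are arbitrary):

# Rough comparison of per-UPDATE cost with and without an explicit transaction.
import sqlite3
import time

con = sqlite3.connect("bench.db", isolation_level=None)  # autocommit; we issue BEGIN/COMMIT ourselves
con.execute("PRAGMA journal_mode = WAL")
con.execute("PRAGMA synchronous = NORMAL")
con.execute("CREATE TABLE IF NOT EXISTS player (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name VARCHAR(64) UNIQUE, stats VARBINARY, rules VARBINARY)")
con.executemany("INSERT OR IGNORE INTO player (name, stats) VALUES (?, ?)",
                [(f"player{i}", "{}") for i in range(100)])

def run(updates, batched):
    start = time.time()
    if batched:
        con.execute("BEGIN")
    for i in range(updates):
        con.execute("UPDATE player SET stats = ? WHERE name = ?",
                    (f'{{"value": {i}}}', f"player{i % 100}"))
    if batched:
        con.execute("COMMIT")      # one fsync for the whole batch
    return time.time() - start     # unbatched: one transaction (and fsync) per UPDATE

print("unbatched:", run(100, batched=False))
print("batched:  ", run(100, batched=True))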
Try adding indexes to your database:
CREATE INDEX IDXname ON player (name)

How to setup Concurrency Thread Group

I have the following test plan to test concurrent user load on a website:
Configuration is set as:
Target Concurrency = 10
Ramp up Time = 1
Ramp up step count = 1
Hold Target rate time = 6
This is creating confusion. What I expect is that it will send only 10 requests at a time within 1 second, but the result is that it sends the first 10 requests within 1 second and then continues sending requests for 60 seconds.
Why is that?
Set Hold Target Rate Time to 1 sec to match your expectation.
The graph should reflect the settings you made.
Note: In the graph you shared, it is clearly visible that you kept Hold Target Rate Time at 60 sec (reflected in the graph), which resulted in 60 seconds of execution after the ramp-up time.
Reference:
Refer to the Concurrency Thread Group section in the link.
As per your requirement of simulating 10 requests at a time within 1 second:
Target Concurrency = 10
Ramp up Time = 1
Ramp up step count = 1
Hold Target rate time = 1
Set Hold Target Rate Time to however long you want the test to run,
e.g., 1 sec to run the test plan for 1 sec, 1 min to run it for 1 min.
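To see why the load keeps flowing, here is a rough sketch of the thread-count profile implied by these settings (the linear ramp is an assumption based on the plugin's documented ramp-up/hold behaviour, not its exact code); the key point is that the total load duration is always ramp-up time plus hold time:

# Approximate active-thread count of a Concurrency Thread Group at time t (seconds).
def active_threads(t, target=10, ramp_up=1, hold=60):
    if t < 0 or t > ramp_up + hold:
        return 0                          # test finished, no more load
    if t < ramp_up:
        return int(target * t / ramp_up)  # ramp-up phase (approximated as linear)
    return target                         # hold phase: all threads keep looping requests

# With Hold = 60 the 10 threads keep sending requests for 61 s in total;
# with Hold = 1 the load stops after about 2 s.
for hold in (60, 1):
    print(hold, [active_threads(t, hold=hold) for t in (0, 1, 30, 61, 62)])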

SoapUI load test, calculate cnt in variance strategy

I work with a SoapUI project and I have one question. In the following example I got 505 requests in 5 seconds with thread count = 5. I would like to understand how that count was calculated in this example.
For example, if I want 1000 request in 1 minute what setting should I set in variance strategy?
Regards, Evgeniy
The variance strategy, as the name implies, varies the number of threads over time. Within the specified interval the thread count will increase and decrease as per the variance value, thus simulating a realistic, real-time load on the target web service.
How the variance is calculated: it's not calculated using the mathematical variance formula; it's just a multiplication. (If threads = 10 and variance = 0.5, then 10 * 0.5 = 5, so the thread count will be incremented and decremented by 5.)
For example:
Threads = 20
variance = 0.8
Strategy = variance
interval = 60
limit = 60 seconds
The above will vary the thread count by 16 (because 20 * 0.8 = 16); that is, the thread count will increase to 36, decrease to 4, and end at the original 20 within the 60 seconds.
If your requirement is to start with 500 threads and peak at 1000, set your variance to 1, and so on.
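A quick sketch of that arithmetic (just restating the multiplication rule above, nothing SoapUI-specific):

# Thread-count swing of the SoapUI "variance" strategy: amplitude = threads * variance.
def variance_bounds(threads, variance):
    amplitude = int(threads * variance)
    return threads - amplitude, threads, threads + amplitude

print(variance_bounds(10, 0.5))  # (5, 10, 15)   -> swings by 5 around 10
print(variance_bounds(20, 0.8))  # (4, 20, 36)   -> the example above
print(variance_bounds(500, 1))   # (0, 500, 1000) -> peaks at 1000 starting from 500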
Reference link:
Check the third bullet, "simulating different types of load", on the SoapUI site.
Book for reference:
Web Services Testing with soapUI by Charitha Kankanamge