How to get current time in CP when using Intervals for scheduling

I am trying to schedule tasks on different machines. These machines have dynamic available resources, for example:
Machine 1: max capacity 4 cores.
At T=t1 => available CPU = 2 cores;
At T=t2 => available CPU = 1 core;
Each interval has a fixed length (e.g., 1 minute).
So in CPLEX, I have a cumulFunction to sum the resources used on a machine:
cumulFunction cumuls[host in Hosts] =
  sum(job in Jobs) pulse(itvs[job][host], requests[job]);
Now the problem is in the constraint:
forall(host in Hosts) {
  cumuls[host] <= ftoi(available_res_function[host](<<Current Period>>));
}
I can't find a way to get the current period so that I could compare the used resources to the resources available in that specific period.
PS: available_res_function is a stepFunction of the available resources.
Thank you so much for your help.

What you can do is add a set of pulses to your cumul function.
For instance, in the sched_cumul example you could change:
cumulFunction workersUsage =
  sum(h in Houses, t in TaskNames) pulse(itvs[h][t], 1);
into
cumulFunction workersUsage =
  sum(h in Houses, t in TaskNames) pulse(itvs[h][t], 1) + pulse(1, 40, 3);
if you want to express that 3 fewer workers are available between times 1 and 40.
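For illustration, here is a minimal sketch of the same idea in CP Optimizer's docplex Python API (not from the original post; the task size, the 4-core capacity and the capacity_drops data are made-up assumptions):

from docplex.cp.model import *

mdl = CpoModel()

# One task that needs 2 cores for 10 time units on this machine (made-up numbers).
task = interval_var(size=10, name="task")
usage = pulse(task, 2)

# Periods of reduced availability, as (start, end, cores_unavailable).
capacity_drops = [(1, 40, 3)]
for s, e, lost in capacity_drops:
    # A fixed, always-present interval whose pulse "uses up" the unavailable cores,
    # mirroring the pulse(1, 40, 3) trick from the OPL answer above.
    drop = interval_var(start=s, end=e, name="drop_%d_%d" % (s, e))
    usage += pulse(drop, lost)

# The machine's maximum capacity (4 cores) is never exceeded.
mdl.add(usage <= 4)

sol = mdl.solve(TimeLimit=10)
if sol:
    print(sol.get_var_solution(task))

With this formulation there is no need to look up the "current period": the step-shaped availability is encoded once as a set of fixed pulses, and the solver enforces the capacity everywhere on the timeline.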

Related

Cost efficiency for AWS Lambda Provisioned Concurrency

I'm running a system with lots of AWS Lambda functions. Our load is not huge, let's say a function gets 100k invocations per month.
For quite a few of the Lambda functions, we're using warmup plugins to reduce cold start times. This is effectively a CloudWatch event triggered every 5 minutes to invoke the function with a dummy event which is ignored, but keeps that Lambda VM running. In most cases, this means one instance will be "warm".
I'm now looking at the native solution to the cold start problem: AWS Lambda Provisioned Concurrency, which at first glance looks awesome. But when I start calculating, either I'm missing something, or this will simply be a large cost increase for a system with only medium load.
Example, with prices from the eu-west-1 region as of 2020-09-16:
Consider function RAM M (GB), average execution time t (s), million requests per month N, provisioned concurrency C ("number of instances"):
Without provisioned concurrency
Cost per month = N⋅(16.6667⋅M⋅t + 0.20)
= $16.87 per million requests    # M = 1 GB, t = 1 s
= $1.87 per million requests     # M = 1 GB, t = 100 ms
= $1.69 per 100,000 requests     # M = 1 GB, t = 1 s
= $1686.67 per 100M requests     # M = 1 GB, t = 1 s
With provisioned concurrency
Cost per month = C⋅0.000004646⋅M⋅60⋅60⋅24⋅30 + N⋅(10.8407⋅M⋅t + 0.20) = 12.04⋅C⋅M + N⋅(10.84⋅M⋅t + 0.20)
= $12.04 + $11.04 = $23.08 per million requests      # M = 1 GB, t = 1 s, C = 1
= $12.04 + $1.28 = $13.32 per million requests       # M = 1 GB, t = 100 ms, C = 1
= $12.04 + $1.10 = $13.14 per 100,000 requests       # M = 1 GB, t = 1 s, C = 1
= $12.04 + $1104.07 = $1116.11 per 100M requests     # M = 1 GB, t = 1 s, C = 1
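To double-check the numbers, here is a small Python sketch of the two formulas above (prices hard-coded from the eu-west-1 figures quoted in the question; not an official AWS calculator):

# Monthly Lambda cost estimates; prices are the eu-west-1 figures quoted above.
ON_DEMAND_GB_S    = 0.0000166667   # $/GB-s, on-demand duration
PROVISIONED_GB_S  = 0.0000108407   # $/GB-s, duration when provisioned concurrency is on
PC_GB_S           = 0.000004646    # $/GB-s, provisioned concurrency itself
PER_MILLION_REQ   = 0.20           # $ per million requests
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

def cost_on_demand(M, t, N):
    """M = GB of RAM, t = avg execution time (s), N = million requests/month."""
    return N * (1_000_000 * ON_DEMAND_GB_S * M * t + PER_MILLION_REQ)

def cost_provisioned(M, t, N, C):
    """Same as above plus C provisioned instances kept warm the whole month."""
    static = C * PC_GB_S * M * SECONDS_PER_MONTH            # ~12.04 * C * M
    return static + N * (1_000_000 * PROVISIONED_GB_S * M * t + PER_MILLION_REQ)

print(cost_on_demand(1, 1, 1))          # ~16.87
print(cost_provisioned(1, 1, 1, 1))     # ~23.08
print(cost_on_demand(1, 1, 100))        # ~1686.67
print(cost_provisioned(1, 1, 100, 1))   # ~1116.11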
There are obviously several factors to take into account here:
How many requests per month is expected? (N)
How much RAM does the function need? (M)
What is the average execution time? (t)
What is the traffic pattern: a few small bursts or even traffic? (This might mean C is low, high, or must be changed dynamically to follow peak hours, etc.)
In the end though, my initial conclusion is that Provisioned Concurrency will only be a good deal if you have a lot of traffic? In my example, at 100M requests per month there's a substantial saving (however, at that traffic it's perhaps likely that you would need a higher value of C as well; break-even at about C = 30). Even with C = 1, you need almost a million requests per month to cover the static costs.
Now, there are obviously other benefits of using the native solution (no ugly dummy events, no log pollution, a flexible number of warm instances, ...), and there are also probably other hidden costs of custom solutions (CloudWatch events, additional CloudWatch logging for dummy invocations, etc.), but I think they are pretty much negligible.
Is my analysis fairly correct or am I missing something?
I think of provisioned concurrency as something that eliminates cold starts, not something that saves money. There is a bit of saving if you can keep the Lambda function busy all the time (100% utilization), but as you've calculated, it becomes quite expensive when the provisioned capacity sits idle.

Aerospike error: All batch queues are full

I am running an Aerospike cluster in Google Cloud. Following the recommendation in this post, I updated to the latest version (3.11.1.1) and re-created all servers. In fact, this change caused my 5 servers to operate at a much lower CPU load (it was around 75% before; now it is around 20%, as shown in the graph below).
Because of this low load, I decided to reduce the cluster size to 4 servers. When I did this, my application started to receive the following error:
All batch queues are full
I found this discussion about the topic, which recommends changing the parameters batch-index-threads and batch-max-unused-buffers with the command
asadm -e "asinfo -v 'set-config:context=service;batch-index-threads=NEW_VALUE'"
I tried many values for batch-index-threads (2, 4, 8, 16) and none of them solved the problem. I keep receiving the "All batch queues are full" error.
Here is the relevant part of my aerospike.conf:
service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
paxos-recovery-policy auto-reset-master
pidfile /var/run/aerospike/asd.pid
service-threads 32
transaction-queues 32
transaction-threads-per-queue 4
batch-index-threads 40
proto-fd-max 15000
batch-max-requests 30000
replication-fire-and-forget true
}
I use 300GB SSD disks on these servers.
A quick note which may or may not pertain to you:
A common mistake we have seen in the past is that developers decide to use 'batch get' as a general-purpose 'get' for single and multiple record requests. The single-record get will perform better for single-record requests.
It's possible that you are being constrained by the network between the clients and servers. Reducing from 5 to 4 nodes reduced the aggregate pipe. In addition, removing a node will start cluster migrations, which add additional network load.
I would look at the batch-max-buffer-per-queue config parameter.
Maximum number of 128KB response buffers allowed in each batch index
queue. If all batch index queues are full, new batch requests are
rejected.
In conjunction with raising this value from the default of 255, you will also want to raise batch-max-unused-buffers to at least batch-index-threads x batch-max-buffer-per-queue + 1. If you do not, new buffers will be created and destroyed constantly, because the number of free (unused) buffers is smaller than the number in use: the moment a batch response is served, the system will strive to trim the buffers down to the max unused number. You will see this reflected in the batch_index_created_buffers metric constantly rising.
Be aware that you need to have enough DRAM for this. For example, if you raise batch-max-buffer-per-queue to 320, you will consume
40 (`batch-index-threads`) x 320 (`batch-max-buffer-per-queue`) x 128K = 1600MB
For the sake of performance, batch-max-unused-buffers should be set to 13000, which gives a maximum memory consumption of 1625 MB (1.59 GB) per node.
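As a sanity check, the memory math above can be scripted (the 128 KiB buffer size and the example values are the ones used in this answer):

# Rough DRAM sizing for Aerospike batch index buffers (128 KiB per buffer).
BUFFER_KIB = 128

def queue_buffer_memory_mib(batch_index_threads, batch_max_buffer_per_queue):
    # One batch index queue per batch-index-thread, each holding up to
    # batch-max-buffer-per-queue response buffers.
    return batch_index_threads * batch_max_buffer_per_queue * BUFFER_KIB / 1024

def unused_buffer_memory_mib(batch_max_unused_buffers):
    # Upper bound on the pool of free buffers kept around between batch requests.
    return batch_max_unused_buffers * BUFFER_KIB / 1024

print(queue_buffer_memory_mib(40, 320))   # 1600.0 MiB
print(unused_buffer_memory_mib(13000))    # 1625.0 MiB per node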

AWS EMR Parallel Mappers?

I am trying to determine how many nodes I need for my EMR cluster. As part of best practices, the recommendation is:
(Total Mappers needed for your job + Time taken to process) / (per instance capacity + desired time) as outlined here: http://www.slideshare.net/AmazonWebServices/amazon-elastic-mapreduce-deep-dive-and-best-practices-bdt404-aws-reinvent-2013, page 89.
The question is how to determine how many parallel mappers each instance will support, since AWS doesn't publish this? https://aws.amazon.com/emr/pricing/
Sorry if I missed something obvious.
Wayne
To determine the number of parallel mappers, you will need to check the EMR documentation called Task Configuration, where EMR has a predefined set of configurations for every instance type that determines the number of mappers/reducers.
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-task-config.html
For example: let's say you have 5 m1.xlarge core nodes. According to the default mapred-site.xml configuration values for that instance type from the EMR docs, we have
mapreduce.map.memory.mb = 768
yarn.nodemanager.resource.memory-mb = 12288
yarn.scheduler.maximum-allocation-mb = 12288 (same as above)
You can simply divide the latter by the former to get the maximum number of mappers supported by one m1.xlarge node: 12288 / 768 = 16.
So, for the 5-node cluster, a maximum of 16 * 5 = 80 mappers can run in parallel (for a map-only job). The same applies to the maximum number of parallel reducers (30). You can do similar math for a combination of mappers and reducers.
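The same arithmetic as a quick Python sketch (the m1.xlarge values are the EMR defaults quoted above; any other instance type just needs its own numbers from the Task Configuration page):

# Max parallel map containers under the YARN capacity scheduler with
# DefaultResourceCalculator (memory is the only resource considered).
def max_parallel_mappers(nodes, nodemanager_memory_mb, map_memory_mb):
    # Each NodeManager fits floor(yarn.nodemanager.resource.memory-mb /
    # mapreduce.map.memory.mb) map containers at a time.
    return nodes * (nodemanager_memory_mb // map_memory_mb)

# 5 x m1.xlarge core nodes with EMR's default task configuration:
print(max_parallel_mappers(5, 12288, 768))   # 80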
So, if you want to run more mappers in parallel, you can either re-size the cluster or reduce mapreduce.map.memory.mb (and its heap, mapreduce.map.java.opts) on every node and restart the NodeManager for the change to take effect.
To understand what the above mapred-site.xml properties mean and why you need to do these calculations, you can refer to:
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Note: the above calculations and statements hold if EMR stays in its default configuration, using the YARN capacity scheduler with DefaultResourceCalculator. If, for example, you configure your capacity scheduler to use DominantResourceCalculator, it will consider vCPUs plus memory on every node (not just memory) to decide on the number of parallel mappers.

How to setup Concurrency Thread Group

I have the following test plan to run a concurrent user load test of a website.
The configuration is set as:
Target Concurrency = 10
Ramp up Time = 1
Ramp up step count = 1
Hold Target rate time = 6
This is creating confusion: I expected it to send only 10 requests at a time in 1 second, but the result is that it sends the first 10 requests in 1 second and then continues sending requests for 60 seconds.
Why is that?
Set Hold Target Rate Time to 1 sec to match your expectations.
The graph should then reflect the settings you made.
Note: in the graph you shared, it is clearly visible that you kept Hold Target Rate Time at 60 sec, which resulted in 60 seconds of execution after the ramp-up time.
Reference:
Refer to the Concurrency Thread Group section in the link.
As per the requirement of simulating 10 requests at a time in 1 second:
Target Concurrency = 10
Ramp up Time = 1
Ramp up step count = 1
Hold Target rate time = 1
Set Hold Target Rate Time to however long you want the test to run,
e.g. 1 sec to run the test plan for 1 sec, 1 min to run the test plan for 1 min.
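As a rough sketch of the timeline (assuming Ramp up step count = 1, as in the settings above): total run time is simply ramp-up time plus hold time, which is why the original settings kept firing requests well past the first second:

# Approximate Concurrency Thread Group timeline, all values in seconds.
def total_run_time(ramp_up_time_s, hold_target_rate_time_s):
    # The target concurrency is reached after the ramp-up and then held
    # for the whole "Hold Target Rate Time".
    return ramp_up_time_s + hold_target_rate_time_s

print(total_run_time(1, 60))   # 61 -> requests keep flowing for ~60 s after ramp-up
print(total_run_time(1, 1))    # 2  -> roughly "10 users for about 1 second"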

AKKA thread pool configuration

Based on my use case, I would like to create one Akka actor per item id. This way each item's actions are executed by one and only one actor/thread, but different items can be processed concurrently. But I have stumbled on how to configure the number of threads for my system, since I may have 1000 items coming in at the same time and would like to use some maximum number of threads, say 20, to service those 1000 items. Is this possible with Akka?
Thanks,
Ravi
Based on the docs, you should try these parameters in your Akka config:
# This will be used if you have set "executor = "fork-join-executor""
fork-join-executor {
  # Min number of threads to cap factor-based parallelism number to
  parallelism-min = 8
  # The parallelism factor is used to determine thread pool size using the
  # following formula: ceil(available processors * factor). Resulting size
  # is then bounded by the parallelism-min and parallelism-max values.
  parallelism-factor = 3.0
  # Max number of threads to cap factor-based parallelism number to
  parallelism-max = 64
}
I'm assuming you have not changed your default-executor parameter, because if you had, you would already know where to look.