Let's say I have a task definition on AWS ECS and want to schedule it to run as multiple instances with different env variables (~20 parallel tasks). I have some ideas for how to do that, but I'm not sure which one is correct.
Create multiple task definitions with different env variables, which sounds inefficient and silly.
Create multiple targets for scheduled tasks, but the number of targets is limited to 5, which doesn't work for me.
Create 20 container overrides, but I didn't find a way to do it in the user interface, and I'm not sure that's correct either.
Could you please give me an idea? Thanks
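The container-override idea can also be driven from code rather than the console. Below is a minimal sketch, assuming boto3 is used to call the ECS RunTask API once per environment set. The cluster, task definition and container names and the SHARD variable are hypothetical placeholders, and the actual call is left in comments because it needs credentials (and, for Fargate, a launch type and network configuration).

```python
def build_run_task_requests(cluster, task_definition, container_name, env_sets):
    """Return one RunTask keyword-argument dict per environment set.

    Each dict overrides only the environment of `container_name`; the
    task definition itself stays unchanged.
    """
    requests = []
    for env in env_sets:
        requests.append({
            "cluster": cluster,
            "taskDefinition": task_definition,
            "count": 1,
            "overrides": {
                "containerOverrides": [{
                    "name": container_name,
                    # ECS expects a list of {"name": ..., "value": ...} string pairs.
                    "environment": [
                        {"name": key, "value": str(value)}
                        for key, value in sorted(env.items())
                    ],
                }]
            },
        })
    return requests


# Hypothetical usage: 20 tasks that differ only in the SHARD variable.
requests = build_run_task_requests(
    "my-cluster", "my-task:1", "app", [{"SHARD": i} for i in range(20)]
)

# To actually launch them (assumes boto3 is installed and configured):
#   import boto3
#   ecs = boto3.client("ecs")
#   for kwargs in requests:
#       ecs.run_task(**kwargs)
```

This keeps a single task definition and moves the per-task differences into the RunTask call, at the cost of needing something (a small script or Lambda function, for example) to issue the calls on a schedule.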
Related
I am trying to build a multi-node parallel job in AWS Batch running an R script. My R script runs independently multiple statistical models for multiple users. Hence, I want to split and distribute this job running on parallel on a cluster of several servers for faster execution. My understanding is that at some point I have to prepare a containerized version of my R-application code using a Dockerfile pushed to ECR. My question is:
Should the parallel logic be placed inside the R code, using a single Dockerfile? If yes, how does Batch know how to split my job, and into how many chunks? Is the for-loop in the R code enough?
Or should I define the parallel logic somewhere in the Dockerfile, saying that container1 runs the models for user1-5, container2 runs the models for user6-10, etc.?
Could you please share some ideas or code on that topic for better understanding? Much appreciated.
AWS Batch does not inspect or change anything in your container; it just runs it. So you would need to handle the distribution of the work within the container itself.
Since these are independent processes (they don't communicate with each other over MPI, etc.), you can use AWS Batch array jobs. Batch multi-node parallel (MNP) jobs are for tightly coupled workloads that need inter-instance or inter-GPU communication using Elastic Fabric Adapter.
Your application code in the container can use the AWS_BATCH_JOB_ARRAY_INDEX environment variable to select the subset of users it processes.
You can see an example in the AWS Batch docs for how to use the index.
Note that AWS_BATCH_JOB_ARRAY_INDEX is zero based, so you will need to account for that if your user numbering / naming scheme is different.
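To make the index idea concrete, here is a minimal sketch in Python (the same arithmetic carries over to R). It assumes the container knows the full user list and the array size; the function name and how those values reach the container (for example, a job parameter) are assumptions, not part of Batch itself.

```python
import os


def users_for_index(users, index, array_size):
    """Return the contiguous slice of `users` handled by child job `index`.

    `index` is zero based, matching AWS_BATCH_JOB_ARRAY_INDEX.
    """
    chunk = -(-len(users) // array_size)  # ceiling division
    return users[index * chunk:(index + 1) * chunk]


# Batch sets this variable in each child of an array job; default to 0 so
# the script also runs outside Batch.
index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))

users = [f"user{i}" for i in range(1, 11)]  # user1 .. user10
my_users = users_for_index(users, index, array_size=2)
```

With ten users and an array size of 2, the child with index 0 gets user1 to user5 and the child with index 1 gets user6 to user10, which is the split described in the question.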
This is the only doc I have found for Task Group, and it doesn't explain how or where to create one.
I can't find any docs that adequately explain what a Task Group actually is, with an example of how to create and use one. It sounds like it's a way for a service to run multiple different Task Definitions, which would be useful to me.
For example, I added a container to a task definition and the service is balancing multiple instances of it on the cluster. But I have another container I want to deploy along with the first one, but I only want a single instance of it to run. So I can't add it to the same task definition because I'd be creating multiple instances of it and consuming unnecessary resources. Seems like this is what Task Groups are for.
You are indeed correct, there exists no proper documentation on this (I opened a support case with our AWS team to verify!).
However, all is not lost. A solution to your conundrum does exist, and it's one we use every day. You don't have to use the task group, whatever that is. We don't actually know yet; an AWS engineer is writing up some docs for me, and I'll post them here when I get them.
All you need are placement constraints (see your same doc), which are easy enough to set up. If you have a launch configuration, you can add something like this to the Advanced > User Data section so that it gets run during boot. You can also add it when launching your instance manually or, if you're feeling exceptionally hacky, log on to your instance and run the commands manually (for science and stuff):
echo ECS_INSTANCE_ATTRIBUTES={\"env\": \"prod\",\"primary\": \"app1\",\"secondary\": \"app2\"} >> /etc/ecs/ecs.config
Everything in quotes is arbitrarily defined by you, so use whatever tags and values make sense for your use case. If you go this route, make sure you add the following line to your docker launch command: --env-file=/etc/ecs/ecs.config
So now that you have an instance that's properly tagged, you can go ahead and create your ECS service like you wanted to. Make sure only the single instance you want carries these attributes, which probably means a dedicated launch configuration for this specific type of instance. Also make sure you set up your Task Placement correctly, to match the roles that are now configured for your instances:
So for the example above, this service is configured to only launch this task on instances that are configured for both env==prod and secondary==app2 -- since your other two instances aren't configured for secondary==app2, they're not allowed to host this task.
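For reference, the same constraint can be written out as data. This is a minimal sketch, assuming the service is created through the API (for example boto3's create_service) rather than the console; the helper name is hypothetical, and the attribute names are the ones from the example above.

```python
def attribute_constraint(**required):
    """Build a memberOf placement constraint requiring every given
    custom instance attribute to match, joined with `and`."""
    expression = " and ".join(
        f"attribute:{name} == {value}" for name, value in sorted(required.items())
    )
    return [{"type": "memberOf", "expression": expression}]


constraints = attribute_constraint(env="prod", secondary="app2")
# constraints[0]["expression"] is
#   "attribute:env == prod and attribute:secondary == app2"

# Hypothetical usage with boto3 (needs credentials, so left as a comment):
#   ecs.create_service(cluster="my-cluster", serviceName="app2",
#                      taskDefinition="app2:1", desiredCount=1,
#                      placementConstraints=constraints)
```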
It can be confusing at first, and took us a while to get right, but I hope this helps!
Response from AWS Support
I looked into the procedure for using Task Groups, and here are my findings:
- The assumption is that you already have a task group named "databases" if you had existing tasks launched from the RunTask/StartTask API.
- When you launch a task using the RunTask or StartTask action, you can specify the name of the task group for the task. If you don't specify a task group for the task, the default name is the family name of the task definition (for example, family:my-task-definition).
- So to create a Task Group, either define a TaskGroup (say webserver) while creating a Task in the Task Console, or use the following command:
$ aws ecs run-task --cluster <ecs-cluster> --task-definition taskGroup-td:1 --group webserver
Once created, you will notice a Task running with the group webserver.
Now you can use the following placement constraint with the Task Definition to place your tasks only on the container instances that are running tasks with this Task Group.
"placementConstraints":
[
{
"expression": "task:group == webserver", "type": "memberOf"
}
]
If you try to run a task with the above placementConstraint, but you do not have any task running with taskGroup webserver, you will receive the following error: Run tasks failed Reasons : ["memberOf constraint unsatisfied"].
References: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-placement-constraints.html
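The two steps from the support response can be sketched as RunTask arguments. This is only an illustration, assuming boto3 is used instead of the CLI; the helper names are hypothetical, while `group` and `placementConstraints` are the RunTask parameters the response refers to.

```python
def run_task_in_group(cluster, task_definition, group):
    """Arguments for a task that is started in (and thereby creates) a task group."""
    return {"cluster": cluster, "taskDefinition": task_definition, "group": group}


def run_task_next_to_group(cluster, task_definition, group):
    """Arguments for a task that may only be placed on container instances
    already running a task from `group`."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "placementConstraints": [
            {"type": "memberOf", "expression": f"task:group == {group}"}
        ],
    }


first = run_task_in_group("my-cluster", "taskGroup-td:1", "webserver")
second = run_task_next_to_group("my-cluster", "sidecar-td:1", "webserver")

# With credentials configured, each dict would be passed as
#   boto3.client("ecs").run_task(**first)
```

Here "my-cluster" and "sidecar-td:1" are made-up names; only "taskGroup-td:1" and "webserver" come from the response above.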
We are using queue-based managed instance scaling. We need to set up environment variables on the VMs per instance group (so that the same VM image can be used to subscribe to different queues in different instance groups). I don't see an option to define environment variables when I create an instance group.
Is there a way to use the same image across multiple instance groups and still achieve different VM behavior based on either different environment variable at instance group level or some other way?
Example: Create 2 managed instance groups with the same VM image. One has the environment variable 'queue-name' set to 'queue-1' and the other has 'queue-name' set to 'queue-2'. The application deployed to the VMs in the first instance group pulls tasks from the pub/sub queue 'queue-1', and in the other group it pulls tasks from 'queue-2'.
Using two templates, same VM image
To create two instance groups with the same VM image but different behaviour, you can use two different instance templates.
This way you can change the networking configuration, the startup and shutdown scripts, or the metadata.
For example, you could use a startup script to set different environment variables and in this way connect to one queue or the other, for example like here.
Using the same template, same VM image
On the other hand, if you cannot use two different templates, I would propose a small hack, though I guess there are several ways to do it.
As you noticed, there is no direct way to do it (since the customisation is already possible at template creation).
I would add to the startup script a small piece of code that uses the gcloud command to find out the name of the instance group the VM belongs to and, based on that, sets an environment variable to a different value.
This way you merely need to follow some kind of pattern when naming your instances, but I am sure you can find a more elegant solution.
Or you could even base your decision on the hostname of the machine (but I like this solution even less).
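As a rough sketch of the startup-script hack: instead of the gcloud command mentioned above, this variant reads the instance's created-by metadata attribute from the metadata server, which for VMs created by a managed instance group is assumed to hold the path of the group manager. The group names, the queue mapping and the function names are hypothetical.

```python
import urllib.request

# Metadata server endpoint for the attribute assumed to be set on MIG-created VMs.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/attributes/created-by"
)


def fetch_created_by(timeout=2.0):
    """Read the created-by attribute; this only works on a Compute Engine VM."""
    request = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def group_name_from_created_by(created_by):
    """Take the last path segment, e.g. '.../instanceGroupManagers/workers-a'."""
    return created_by.rstrip("/").rsplit("/", 1)[-1]


def queue_for_group(group_name, mapping, default=None):
    """Look up the queue a group should subscribe to."""
    return mapping.get(group_name, default)


# Hypothetical mapping from instance group name to queue name.
QUEUES = {"workers-a": "queue-1", "workers-b": "queue-2"}

# On a VM the startup code would do something like:
#   group = group_name_from_created_by(fetch_created_by())
#   queue_name = queue_for_group(group, QUEUES)
```

The mapping could equally live in the application's configuration; the point is only that the group name is the single thing that differs between otherwise identical VMs.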
Is there a way to add additional templates to the 'default' EC2 scheduler (https://aws.amazon.com/answers/infrastructure-management/ec2-scheduler/)?
So say I want two separate functions/tags:
start VM # 8am on a weekday
stop VM # 8pm on a weekday
There's a bit of confusion where I work, with VMs not starting up because we only have a stopVM schedule, and custom start tag values are being set up wrong or not at all.
Going by the doco, it seems like I need to set up one or the other (or a single instance that starts and stops VMs),
then use custom values in the individual tags of the VMs to assign a custom value to the start or stop time,
but I want something simpler,
eg add Ec2Scheduler:startAt8 - true
Ec2Scheduler:StopAt8 - true
Do I need to have 2 instances of the scheduler running, or can I add another row to the DynamoDB table?
The doco.pdf is not very good at explaining this.
I would suggest using a tool that has a simple scheduler, like CloudTiming. Unfortunately it's not free, but it is pretty cheap. You can set up a schedule for any resource in any region. From my perspective, Amazon's interface is not very user-friendly for such a simple action.
I've followed the "Getting Started Guide", the Two-Shards / Two-Replicas scenario, and everything worked perfectly.
Then I started using the Collections API, which is the preferred way of managing collections, shards and replication.
I launched two instances locally (afterwards with AWS, same problem)
I created a new collection with two shards using the following command:
/admin/collections?action=CREATE&name=collection1&numShards=2&collection.configName=collection
This successfully created two shards, one on each instance.
Then I launched another instance, expecting it to automatically set itself as a replica for the first shard, just like in the example. That didn't happen.
Is there something I'm missing?
There were two ways I was able to achieve this:
Manually, using the Collections API, I added a replica to shard1 and then another to shard2.
This is not good enough, as I will need to have this done automatically with Auto Scaling, so I'd need to micro-manage each server "role" - which replicas of which shards of which collections it's handling - which complicates things a lot.
The second way, which I couldn't find any documentation for, is to launch an instance with a folder named "collectionX" containing a file named core.properties, with the following line in it:
collection=collection1
Each instance I launched that way was automatically added as a replica in a round-robin way.
(Also working with several collections)
That's actually not a bad way at all, as I can pass parameter when I launch an AMI/instance in AWS.
Thanks everyone.
Amir
1) You are running the wrong command; the complete command is as follows:
curl 'http://localhost:8080/solr/admin/collections?action=CREATE&name=corename&numShards=2&replicationFactor=2&maxShardsPerNode=10'
Here I have specified a replication factor, so it will create the replicas for your shards.
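If you build this request from a launch script, the URL can be assembled programmatically. A minimal sketch, assuming the same base URL and parameters as the curl command above; the helper name is hypothetical, and sending the request is left to whatever HTTP client you use.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor,
                          max_shards_per_node=None, config_name=None):
    """Build a Collections API CREATE URL with an explicit replication factor."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    # Optional parameters are only added when given.
    if max_shards_per_node is not None:
        params["maxShardsPerNode"] = max_shards_per_node
    if config_name is not None:
        params["collection.configName"] = config_name
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


url = create_collection_url("http://localhost:8080/solr", "corename", 2, 2,
                            max_shards_per_node=10)
```

This produces the same URL as the curl command; adding config_name="collection" would reproduce the collection.configName parameter from the question.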