Mounting a volume for AWS Batch - aws-batch

We have some AWS Batch processes that run nicely, using images from ECS. We do not assign any volumes or storage, and it seems we get 8 GB by default. I'm not actually sure why/where that is defined.
Anyway, we now have a situation where we need more space. It's only temporary processing space - we need to extract an archive, convert it, re-compress it and then upload it to S3. We already have this process; it's just that we've now run out of space in our 8 GB allowance.
So, just to be absolutely sure, how should we go about adding this space? I see a few things about connecting EFS to the instance - is that a good use case? Are there considerations regarding multiple jobs running at the same time, etc.? (There are scenarios where this is allowed, since it's a generic unzipper process that gets called many times.)
So the requirement is a throwaway storage volume that doesn't need to persist; it can disappear once the AWS Batch job finishes. The data files that have currently blown it up are 9 GB, and I'm not sure how much our image itself uses - Alpine Linux, so presumably not a huge amount.
Or of course, if we can simply tune that initial 8 GB up by a couple of GB, then we're laughing...

The only info I'm finding is about specifying a custom launch template in the compute environment definition; the launch template setup is described here: https://aws.amazon.com/premiumsupport/knowledge-center/batch-mount-efs/

As of now, EC2 instances launched by Batch have 30 GB of EBS storage attached by default. This can be tuned by adding a custom launch template, as shown here:
{
  "LaunchTemplateName": "increase-container-volume-encrypt",
  "LaunchTemplateData": {
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/xvda",
        "Ebs": {
          "Encrypted": true,
          "VolumeSize": 100,
          "VolumeType": "gp2"
        }
      }
    ]
  }
}
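If you prefer to create the launch template from code rather than pasting the JSON into the console or CLI, a minimal boto3 sketch that mirrors the JSON above (the name and the 100 GB size are just the example values) could look like this:

import boto3

ec2 = boto3.client("ec2")

# Mirrors the launch template JSON above; bump VolumeSize to whatever headroom you need.
ec2.create_launch_template(
    LaunchTemplateName="increase-container-volume-encrypt",
    LaunchTemplateData={
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/xvda",
                "Ebs": {"Encrypted": True, "VolumeSize": 100, "VolumeType": "gp2"},
            }
        ]
    },
)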
Then, add it to the compute environment definition:
{
  "computeEnvironmentName": "",
  "type": "MANAGED",
  "state": "ENABLED",
  "computeResources": {
    "type": "EC2",
    "allocationStrategy": "BEST_FIT_PROGRESSIVE",
    "minvCpus": 2,
    "maxvCpus": 20,
    "desiredvCpus": 2,
    "instanceTypes": [
      "c6i"
    ],
    "imageId": "",
    "subnets": [
      ""
    ],
    "launchTemplate": {
      "launchTemplateName": "increase-container-volume-encrypt"
    }
  }
}
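If you manage the compute environment from code as well, the same definition can be passed straight to boto3. This is only a sketch: it assumes compute-environment.json is the JSON above with the blank fields (name, imageId, subnets) and any extras your account needs (e.g. serviceRole, instanceRole, securityGroupIds) filled in.

import json
import boto3

batch = boto3.client("batch")

# Load the definition above (with the blanks filled in) and register it.
with open("compute-environment.json") as f:
    compute_env = json.load(f)

batch.create_compute_environment(**compute_env)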

Related

Can the cdk.context.json file be auto-generated for a specific CDK environment?

I have inherited a small CDK project that features a cdk.context.json file:
{
  "vpc-provider:account=9015xxxxxxxx:filter.tag:Name=preprod-eu-west-1:region=eu-west-1:returnAsymmetricSubnets=true": {
    "vpcId": "vpc-0d891xxxxxxxxxxxx",
    "vpcCidrBlock": "172.35.0.0/16",
    "availabilityZones": [],
    "subnetGroups": [
      {
        "name": "Private",
        "type": "Private",
        "subnets": [
          {
            "subnetId": "subnet-0ad04xxxxxxxxxxxx",
            "cidr": "172.35.a.0/22",
            "availabilityZone": "eu-west-1b",
            "routeTableId": "rtb-0fee4xxxxxxxxxxxx"
          },
          {
            "subnetId": "subnet-08598xxxxxxxxxxxx",
            "cidr": "172.35.z.0/22",
            "availabilityZone": "eu-west-1c",
            "routeTableId": "rtb-0f477xxxxxxxxxxxx"
          }
        ]
      },
      {
        "name": "Public",
        "type": "Public",
        "subnets": [
          {
            "subnetId": "subnet-0fba3xxxxxxxxxxxx",
            "cidr": "172.35.y.0/22",
            "availabilityZone": "eu-west-1b",
            "routeTableId": "rtb-02dfbxxxxxxxxxxxx"
          },
          {
            "subnetId": "subnet-0a3b8xxxxxxxxxxxx",
            "cidr": "172.35.x.0/22",
            "availabilityZone": "eu-west-1c",
            "routeTableId": "rtb-02dfbxxxxxxxxxxxx"
          }
        ]
      }
    ]
  }
}
This works fine for the preprod environment it connects to, but I now need to deploy to production. I could generate the same file by hand, but this feels like a waste of time - I suspect this is a generated file.
I have tried cdk context --build in the hope that it would produce this file, but it seems not - it appears just to echo the preprod one that already exists. What can I do to avoid having to hand-build this (and to avoid all the guessing which VPCs and subnets it wants)?
The file is generated for whatever environment your CDK stacks are being deployed to.
If you want to populate it with values for both your staging environment and production, you should simply run cdk synth after activating your production AWS credentials (using --profile, for example). This will populate the context with new values (in addition to the ones that are already there), and you should commit these changes to VCS.
It is advised to keep the file in your VCS specifically to avoid having it re-populated on each deployment. This makes your CDK app deterministic.
From the best practices:
Commit cdk.context.json to avoid non-deterministic behavior
AWS CDK includes a mechanism called context providers to record a snapshot of non-deterministic values, allowing future synthesis operations to produce exactly the same template. The only changes in the new template are the changes you made in your code. When you use a construct's .fromLookup() method, the result of the call is cached in cdk.context.json, which you should commit to version control along with the rest of your code to ensure future executions of your CDK app use the same value.
A better way to organize this would be to make your stacks environment-specific - that is, to instantiate your top-level stack or stage once for each environment you want to deploy to, specifying the env prop. Deploying then becomes cdk deploy MyStagingStack and cdk deploy MyProductionStack. Although this is off-topic for the question.
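For illustration, a minimal CDK (Python) sketch of that env-specific setup (stack names, account IDs and the VPC tag filter are placeholders; the Vpc.from_lookup() call is what gets recorded in cdk.context.json the first time you synth with each environment's credentials):

import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2
from constructs import Construct


class NetworkStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, *, vpc_name: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        # Resolved at synth time and cached in cdk.context.json.
        self.vpc = ec2.Vpc.from_lookup(self, "Vpc", tags={"Name": vpc_name})


app = cdk.App()
NetworkStack(app, "MyPreprodStack", vpc_name="preprod-eu-west-1",
             env=cdk.Environment(account="901500000000", region="eu-west-1"))
NetworkStack(app, "MyProductionStack", vpc_name="prod-eu-west-1",
             env=cdk.Environment(account="111111111111", region="eu-west-1"))
app.synth()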
This file is auto-generated per deployment. You should exclude this file from your code repo.

How much of Amazon Connect is scriptable, either through Terraform, Ansible or another approach?

This question concerns Amazon Connect, the cloud-based call center service. For those who have been involved in the setup and configuration of Amazon Connect: is any portion of Amazon Connect configurable through a continuous integration flow, other than any possible Lambda touchpoints? What I am looking for is scripting various functions such as loading exported flows, etc.
Looking at the AWS CLI, I see a number of Amazon Connect calls, but the majority are for retrieving information (https://docs.aws.amazon.com/cli/latest/reference/connect/index.html); very few are available to configure portions of Amazon Connect.
There is basically nothing at this time. All contact flows must be imported/exported by hand. Other settings (e.g. routing profiles, prompts, etc.) must be re-created manually.
Someone has created a "beta" Connect CloudFormation template, though, that actually uses Puppeteer behind the scenes to automate the import/export process. I imagine Amazon will eventually support this, because DevOps is certainly one of the rough edges of the platform right now.
For new people checking this question: Amazon has recently published the APIs you are looking for, e.g. create-contact-flow.
It uses a JSON-based language specific to Amazon Connect; below is an example:
{
  "Version": "2019-10-30",
  "StartAction": "12345678-1234-1234-1234-123456789012",
  "Metadata": {
    "EntryPointPosition": {"X": 88, "Y": 100},
    "ActionMetadata": {
      "12345678-1234-1234-1234-123456789012": {
        "Position": {"X": 270, "Y": 98}
      },
      "abcdef-abcd-abcd-abcd-abcdefghijkl": {
        "Position": {"X": 545, "Y": 92}
      }
    }
  },
  "Actions": [
    {
      "Identifier": "12345678-1234-1234-1234-123456789012",
      "Type": "MessageParticipant",
      "Transitions": {
        "NextAction": "abcdef-abcd-abcd-abcd-abcdefghijkl",
        "Errors": [],
        "Conditions": []
      },
      "Parameters": {
        "Prompt": {
          "Text": "Thanks for calling the sample flow!",
          "TextType": "text",
          "PromptId": null
        }
      }
    },
    {
      "Identifier": "abcdef-abcd-abcd-abcd-abcdefghijkl",
      "Type": "DisconnectParticipant",
      "Transitions": {},
      "Parameters": {}
    }
  ]
}
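For example, a minimal boto3 sketch of pushing the flow above through that API (the instance ID and flow name are placeholders, and flow.json is assumed to contain the JSON shown above):

import boto3

connect = boto3.client("connect")

# Read the flow definition shown above and create it on a Connect instance.
with open("flow.json") as f:
    flow_content = f.read()

connect.create_contact_flow(
    InstanceId="12345678-aaaa-bbbb-cccc-1234567890ab",  # your Connect instance ID
    Name="sample-inbound-flow",
    Type="CONTACT_FLOW",
    Content=flow_content,
)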
Exporting from the GUI does not produce JSON in this format. Obviously, a problem with this is keeping state. I am keeping a close eye on Terraform/CloudFormation/CDK and will update this post if there is any support (that does not use Puppeteer).
I think it's doable now; with the newest APIs, you can do many things to script the entire process. There are some issues with the contact flows themselves, but I think this will improve over the next few months.
In the meantime, there is some effort to add Amazon Connect to Terraform. Here are the issues and the WIP PRs:
Github Issue
Service PR
Resource PR

Execute stack after CloudFormation deploy

I have an API Gateway built with the Serverless Application Model that I integrated with GitHub via CodePipeline. Everything is running fine: the pipeline reads the webhook, builds using the buildspec.yml and deploys the CloudFormation template, creating or updating the stack.
The thing is, after the stack is updated it still needs an approval in the console. How can I make the change set execution on the stack update run automatically?
It sounds like your pipeline is doing one of two things, unless I'm misunderstanding you:
1. Making a change set but not executing it in the CloudFormation console.
2. Proceeding to a manual approval step in the pipeline and awaiting your confirmation.
Since #2 is simply solved by removing that step, let's talk about #1.
Assuming you are successfully creating a change set called ChangeSetName, you need a step in your pipeline with the following (CloudFormation JSON template syntax):
"Name": "StepName",
"ActionTypeId": {"Category": "Deploy",
"Owner": "AWS",
"Provider": "CloudFormation",
"Version": "1"
},
"Configuration": {
"ActionMode": "CHANGE_SET_EXECUTE",
"ChangeSetName": {
"Ref": "ChangeSetName"
},
...
Keep the other parameters (e.g. RoleArn) consistent, as usual.
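As an aside, if a change set has already been created and you just want to execute it outside the pipeline (scenario #1), the equivalent API call is ExecuteChangeSet. A minimal boto3 sketch, with placeholder stack and change set names:

import boto3

cfn = boto3.client("cloudformation")

# Executes an already-created change set against the stack.
cfn.execute_change_set(
    StackName="my-api-stack",
    ChangeSetName="pipeline-changeset",
)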

Editing CloudFormation templates terminates existing instances and creates new instances

We are using a CloudFormation template for a set of VMs, and after each code deployment we need to edit the package version in the template parameters so that Auto Scaling picks up the latest package from the S3 bucket.
The issue is that editing the CloudFormation template triggers a CloudFormation-driven upgrade of the instances (which involves destroying the existing machines and creating new ones from scratch, which is time-consuming).
Is there any way we can prevent this?
Basically, we don't want the CloudFormation template to destroy and recreate the instances whenever we edit it.
EDIT: This is my Auto Scaling group setting:
"*********":{
"Type":"AWS::AutoScaling::AutoScalingGroup",
"Properties":{
"AvailabilityZones":[
{
"Ref":"PrimaryAvailabilityZone"
}
],
"Cooldown":"300",
"DesiredCapacity":"2",
"HealthCheckGracePeriod":"300",
"HealthCheckType":"EC2",
"LoadBalancerNames":[
{
"Ref":"elbxxbalancer"
}
],
"MaxSize":"8",
"MinSize":"1",
"VPCZoneIdentifier":[
{
"Ref":"PrivateSubnetId"
}
],
"Tags":[
{
"Key":"Name",
"Value":"my-Server",
"PropagateAtLaunch":"true"
},
{
"Key":"VPCRole",
"Value":{
"Ref":"VpcRole"
},
"PropagateAtLaunch":"true"
}
],
"TerminationPolicies":[
"Default"
],
"LaunchConfigurationName":{
"Ref":"xxlaunch"
}
},
"CreationPolicy":{
"ResourceSignal":{
"Timeout":"PT10M",
"Count":"1"
}
},
"UpdatePolicy":{
"AutoScalingRollingUpdate":{
"MinInstancesInService":"1",
"MaxBatchSize":"1",
"PauseTime":"PT10M",
"WaitOnResourceSignals":"true"
}
}
},
You can look at the documentation and check the Update requires field for the property you are modifying in your CloudFormation template.
If it says Replacement, the resource will be recreated and gets a new physical ID.
If it says Some interruptions, the resource becomes temporarily unavailable (in the EC2 case, the instance is restarted), but it is not recreated and keeps the same physical ID.
If it says No interruption, the update does not impact the running instance at all.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-instance.html
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html
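If you want to check ahead of time whether a given edit will replace resources, one option (not from the links above, just a common technique) is to create a change set and inspect its Replacement flags before executing anything. A boto3 sketch with placeholder stack, change set and parameter names:

import boto3

cfn = boto3.client("cloudformation")

# Preview an update that only bumps the package-version parameter.
cfn.create_change_set(
    StackName="my-stack",
    ChangeSetName="preview-package-bump",
    UsePreviousTemplate=True,
    Parameters=[
        {"ParameterKey": "PackageVersion", "ParameterValue": "1.2.3"},
        # Keep the other parameters unchanged, e.g.:
        # {"ParameterKey": "PrimaryAvailabilityZone", "UsePreviousValue": True},
    ],
)
cfn.get_waiter("change_set_create_complete").wait(
    StackName="my-stack", ChangeSetName="preview-package-bump"
)

# Each resource change reports whether it is applied in place or via Replacement.
for change in cfn.describe_change_set(
    StackName="my-stack", ChangeSetName="preview-package-bump"
)["Changes"]:
    rc = change["ResourceChange"]
    print(rc["LogicalResourceId"], rc["Action"], rc.get("Replacement"))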

Want to server-side encrypt S3 data node file created by ShellCommandActivity

I created a ShellCommandActivity with stage = "true". The shell command creates a new file and stores it in ${OUTPUT1_STAGING_DIR}. I want this new file to be server-side encrypted in S3.
According to the documentation, all files created in an S3 data node are server-side encrypted by default. But after my pipeline completes, an unencrypted file gets created in S3. I tried setting s3EncryptionType to SERVER_SIDE_ENCRYPTION explicitly on the S3DataNode, but that doesn't help either. I want to encrypt this new file.
Here is the relevant part of the pipeline:
{
  "id": "DataNodeId_Fdcnk",
  "schedule": {
    "ref": "DefaultSchedule"
  },
  "directoryPath": "s3://my-bucket/test-pipeline",
  "name": "s3DataNode",
  "s3EncryptionType": "SERVER_SIDE_ENCRYPTION",
  "type": "S3DataNode"
},
{
  "id": "ActivityId_V1NOE",
  "schedule": {
    "ref": "DefaultSchedule"
  },
  "name": "FileGenerate",
  "command": "echo 'This is a test' > ${OUTPUT1_STAGING_DIR}/foo.txt",
  "workerGroup": "my-worker-group",
  "output": {
    "ref": "DataNodeId_Fdcnk"
  },
  "type": "ShellCommandActivity",
  "stage": "true"
}
Short answer: Your pipeline definition looks correct. You need to ensure you're running the latest version of the Task Runner. I will try to reproduce your issue and let you know.
P.S. Let's keep the conversation within a single thread, here or in the AWS Data Pipeline forums, to avoid confusion.
Answer on the official AWS Data Pipeline forum page:
This issue was resolved when I downloaded the new TaskRunner-1.0.jar. I was running an older version.