I created a ShellCommandActivity with stage = "true". The shell command creates a new file and stores it in ${OUTPUT1_STAGING_DIR}. I want this new file to be server-side encrypted in S3.
According to the documentation, all files created in an S3 data node are server-side encrypted by default. But after my pipeline completes, an unencrypted file is created in S3. I tried setting s3EncryptionType to SERVER_SIDE_ENCRYPTION explicitly on the S3DataNode, but that doesn't help either. I want this new file to be encrypted.
Here is the relevant part of the pipeline:
{
"id": "DataNodeId_Fdcnk",
"schedule": {
"ref": "DefaultSchedule"
},
"directoryPath": "s3://my-bucket/test-pipeline",
"name": "s3DataNode",
"s3EncryptionType": "SERVER_SIDE_ENCRYPTION",
"type": "S3DataNode"
},
{
"id": "ActivityId_V1NOE",
"schedule": {
"ref": "DefaultSchedule"
},
"name": "FileGenerate",
"command": "echo 'This is a test' > ${OUTPUT1_STAGING_DIR}/foo.txt",
"workerGroup": "my-worker-group",
"output": {
"ref": "DataNodeId_Fdcnk"
},
"type": "ShellCommandActivity",
"stage": "true"
}
Short answer: Your pipeline definition looks correct. You need to ensure you're running the latest version of the Task Runner. I will try to reproduce your issue and let you know.
P.S. Let's keep the conversation within a single thread, either here or in the AWS Data Pipeline forums, to avoid confusion.
Answer from the official AWS Data Pipeline forum page:
This issue was resolved when I downloaded the new TaskRunner-1.0.jar; I was running an older version.
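For anyone hitting the same issue, here is a minimal sketch of starting the updated Task Runner against the pipeline's worker group, following the Task Runner documentation (the credentials path, region, and log bucket are placeholders):
# download the latest TaskRunner jar from AWS, then start it on the worker host
java -jar TaskRunner-1.0.jar \
    --config ~/credentials.json \
    --workerGroup=my-worker-group \
    --region=us-east-1 \
    --logUri=s3://my-bucket/task-runner-logs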
Related
I have inherited a small CDK project that features a cdk.context.json file:
{
"vpc-provider:account=9015xxxxxxxx:filter.tag:Name=preprod-eu-west-1:region=eu-west-1:returnAsymmetricSubnets=true": {
"vpcId": "vpc-0d891xxxxxxxxxxxx",
"vpcCidrBlock": "172.35.0.0/16",
"availabilityZones": [],
"subnetGroups": [
{
"name": "Private",
"type": "Private",
"subnets": [
{
"subnetId": "subnet-0ad04xxxxxxxxxxxx",
"cidr": "172.35.a.0/22",
"availabilityZone": "eu-west-1b",
"routeTableId": "rtb-0fee4xxxxxxxxxxxx"
},
{
"subnetId": "subnet-08598xxxxxxxxxxxx",
"cidr": "172.35.z.0/22",
"availabilityZone": "eu-west-1c",
"routeTableId": "rtb-0f477xxxxxxxxxxxx"
}
]
},
{
"name": "Public",
"type": "Public",
"subnets": [
{
"subnetId": "subnet-0fba3xxxxxxxxxxxx",
"cidr": "172.35.y.0/22",
"availabilityZone": "eu-west-1b",
"routeTableId": "rtb-02dfbxxxxxxxxxxxx"
},
{
"subnetId": "subnet-0a3b8xxxxxxxxxxxx",
"cidr": "172.35.x.0/22",
"availabilityZone": "eu-west-1c",
"routeTableId": "rtb-02dfbxxxxxxxxxxxx"
}
]
}
]
}
}
This works fine for the preprod environment it connects to, but I now need to deploy to production. I could generate the same file by hand, but this feels like a waste of time - I suspect this is a generated file.
I have tried cdk context --build in the hope that it would produce this file, but it seems not to - it appears just to echo the preprod content that already exists. What can I do to avoid having to hand-build this (and to avoid guessing which VPCs and subnets it wants)?
The file is generated for whatever environment your CDK stacks are being deployed to.
If you want to populate it with values for both your staging environment and production, you should simply run cdk synth after activating your production AWS credentials (using --profile, for example). This will populate the context with new values (in addition to the ones that are already there), and you should commit these changes to VCS.
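For example (assuming an AWS CLI profile named prod - the profile name is hypothetical):
cdk synth --profile prod
# the production VPC/subnet lookups are appended to cdk.context.json alongside the existing preprod entries
git add cdk.context.json
git commit -m "Add production VPC context"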
It is advised to keep the file in your VCS specifically to avoid having it re-populated on each deployment. This makes your CDK app deterministic.
From the best practices:
Commit cdk.context.json to avoid non-deterministic behavior
AWS CDK includes a mechanism called context providers to record a snapshot of non-deterministic values, allowing future synthesis operations to produce exactly the same template. The only changes in the new template are the changes you made in your code. When you use a construct's .fromLookup() method, the result of the call is cached in cdk.context.json, which you should commit to version control along with the rest of your code to ensure future executions of your CDK app use the same values.
A better way to organize this would be to make your stacks environment-specific - that is, to instantiate your top-level stack or stage once for each environment you want to deploy to, specifying the env prop. Deploying then becomes cdk deploy MyStagingStack and cdk deploy MyProductionStack, although this is somewhat off-topic for the question.
This file is auto-generated per deployment. You should exclude this file from your code repo.
This question concerns AWS Connect, the cloud-based call center. For those who have been involved in the setup and configuration of Amazon Connect: is any portion of Amazon Connect configurable through a continuous integration flow, other than the possible Lambda touchpoints? What I am looking for is scripting various functions, such as loading exported contact flows.
Looking at the AWS CLI, I see a number of Amazon Connect calls (https://docs.aws.amazon.com/cli/latest/reference/connect/index.html), but the majority only retrieve information; very few are available to configure portions of Amazon Connect.
There is basically nothing at this time. All contact flows must be imported/exported by hand. Other settings (e.g. routing profiles, prompts, etc.) must be re-created manually.
Someone has created a "beta" Connect CloudFormation template, though, that actually uses Puppeteer behind the scenes to automate the import/export process. I imagine that Amazon will eventually support this, because devops is certainly one of the rough edges of the platform right now.
For new people checking this question: Amazon has recently published the APIs you are looking for, such as create-contact-flow.
It uses a JSON-based language specific to Amazon Connect; below is an example:
{
"Version": "2019-10-30",
"StartAction": "12345678-1234-1234-1234-123456789012",
"Metadata": {
"EntryPointPosition": {"X": 88,"Y": 100},
"ActionMetadata": {
"12345678-1234-1234-1234-123456789012": {
"Position": {"X": 270, "Y": 98}
},
"abcdef-abcd-abcd-abcd-abcdefghijkl": {
"Position": {"X": 545, "Y": 92}
}
}
},
"Actions": [
{
"Identifier": "12345678-1234-1234-1234-123456789012",
"Type": "MessageParticipant",
"Transitions": {
"NextAction": "abcdef-abcd-abcd-abcd-abcdefghijkl",
"Errors": [],
"Conditions": []
},
"Parameters": {
"Prompt": {
"Text": "Thanks for calling the sample flow!",
"TextType": "text",
"PromptId": null
}
}
},
{
"Identifier": "abcdef-abcd-abcd-abcd-abcdefghijkl",
"Type": "DisconnectParticipant",
"Transitions": {},
"Parameters": {}
}
]
}
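A hedged CLI sketch of registering a flow in that format (the instance ID and file name are placeholders):
aws connect create-contact-flow \
    --instance-id 12345678-aaaa-bbbb-cccc-123456789012 \
    --name "sample-flow" \
    --type CONTACT_FLOW \
    --content file://sample-flow.json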
Exporting from the GUI does not produce JSON in this format. Obviously, a problem with this is keeping state. I am keeping a close eye on Terraform/CloudFormation/CDK and will update this post if there is any support (that does not use Puppeteer).
I think it's doable now: with the newest APIs, you can do many things to script the entire process. There are still some issues with the contact flows themselves, but I think this will improve over the next few months.
In the meantime, there is some effort to add Amazon Connect to Terraform. Here are the issue and the WIP PRs:
Github Issue
Service PR
Resource PR
I have an API Gateway built with the Serverless Application Model that I integrated with GitHub via CodePipeline. Everything is running fine: the pipeline picks up the webhook, runs the buildspec.yml, and deploys the CloudFormation template, creating or updating the stack.
The thing is, after the stack is updated it still needs an approval in the console. How can I make the execution of the stack update run automatically?
It sounds like your pipeline is doing one of two things, unless I'm misunderstanding you:
Making a change set but not executing it in the CloudFormation console.
Proceeding to a manual approval step in the pipeline and awaiting your confirmation.
Since #2 is simply solved by removing that step, let's talk about #1.
Assuming you are successfully creating a change set called ChangeSetName, you need a step in your pipeline with the following (cfn JSON template syntax):
"Name": "StepName",
"ActionTypeId": {"Category": "Deploy",
"Owner": "AWS",
"Provider": "CloudFormation",
"Version": "1"
},
"Configuration": {
"ActionMode": "CHANGE_SET_EXECUTE",
"ChangeSetName": {
"Ref": "ChangeSetName"
},
...
Keep the other parameters (e.g. RoleArn) consistent per usual.
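For completeness, the change set referenced above is typically created by an earlier CloudFormation action of the same pipeline using CHANGE_SET_REPLACE; a rough sketch (stack name, artifact name, and template file are placeholders):
"Name": "CreateChangeSet",
"ActionTypeId": {
  "Category": "Deploy",
  "Owner": "AWS",
  "Provider": "CloudFormation",
  "Version": "1"
},
"Configuration": {
  "ActionMode": "CHANGE_SET_REPLACE",
  "StackName": "MyStack",
  "ChangeSetName": {
    "Ref": "ChangeSetName"
  },
  "TemplatePath": "BuildOutput::packaged-template.yaml",
  "Capabilities": "CAPABILITY_IAM"
},
...
Run the CHANGE_SET_REPLACE action before the CHANGE_SET_EXECUTE action so the change set always exists when the execute step runs.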
We have deployed a PowerBI Embedded workspace collection with the following really simple ARM template:
{
"$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {},
"variables": {},
"resources": [
{
"comments": "Test Power BI workspace collection",
"apiVersion": "2016-01-29",
"type": "Microsoft.PowerBI/workspaceCollections",
"location": "westeurope",
"sku": {
"name": "S1",
"tier": "Standard"
},
"name": "myTestPowerBiCollection",
"tags": {
"displayNmae": "Test Power BI workspace collection"
}
}
],
"outputs": {}
}
For deployment we used the well-known PowerShell command New-AzureRmResourceGroupDeployment. After the creation, if we try to execute the command again, it fails with the following message:
New-AzureRmResourceGroupDeployment : Resource Microsoft.PowerBI/workspaceCollections 'myTestPowerBiCollection' failed with message
{
"error": {
"code": "BadRequest",
"message": ""
}
}
If we delete the collection and execute again, it succeeds without a problem. I tried both options for the -Mode parameter (Incremental and Complete) and neither helped, even though Incremental is the default option.
This is a major issue for us, as we want to provision the collection as part of our Continuous Delivery and we execute this several times.
Any ideas on how to bypass this problem?
As you mentioned, if a PowerBI Workspace Collection with that name already exists, deploying the workspace collection again throws an exception.
If it is possible to add custom logic, we could use Get-AzureRmPowerBIWorkspaceCollection to check whether the PowerBI Workspace Collection exists: if it does, the cmdlet returns the workspace collection object; otherwise it throws a "not found" exception.
We could also use the Remove-AzureRmPowerBIWorkspaceCollection command to remove the PowerBI Workspace Collection. So if the workspace collection already exists, we could either skip the deployment or delete and recreate it, according to our logic, as sketched below.
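A minimal PowerShell sketch of that check-before-deploy logic (resource group and template file names are placeholders; depending on how the cmdlet reports "not found", the error handling may need adjusting):
# deploy the workspace collection only if it does not already exist
$existing = Get-AzureRmPowerBIWorkspaceCollection `
    -ResourceGroupName "myResourceGroup" `
    -WorkspaceCollectionName "myTestPowerBiCollection" `
    -ErrorAction SilentlyContinue

if (-not $existing) {
    New-AzureRmResourceGroupDeployment `
        -ResourceGroupName "myResourceGroup" `
        -TemplateFile ".\powerbi-collection.json"
}
else {
    Write-Output "Workspace collection already exists; skipping deployment."
}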
I have just created an account on Amazon AWS and I am going to use Data Pipeline to schedule my queries. Is it possible to run multiple complex SQL queries from a .sql file using the SqlActivity of Data Pipeline?
My overall objective is to process the raw data from Redshift/S3 using SQL queries from Data Pipeline and save the result to S3. Is this a feasible way to go?
Any help in this regard will be appreciated.
Yes. If you plan on moving the data from Redshift to S3, you need to use an UNLOAD command, documented here: http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
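A minimal UNLOAD sketch (table name, bucket, and IAM role ARN are placeholders):
UNLOAD ('SELECT * FROM my_table')
TO 's3://my-bucket/exports/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
DELIMITER ',' GZIP ALLOWOVERWRITE;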
The input of your SQL queries will be a single data node, and the output will be a single data file. Data Pipeline provides only one "Select query" field, in which you write your extraction/transformation query. I don't think there is any use case for a file of multiple queries.
However, if you want to make your pipeline configurable, you can do so by adding "parameters" and "values" objects to your pipeline definition JSON.
{
"objects":[
{
"selectQuery":"#{myRdsSelectQuery}"
}
],
"parameters":[
{
"description":"myRdsSelectQuery",
"id":"myRdsSelectQuery",
"type":"String"
}
],
"values":{
"myRdsSelectQuery":"Select Query"
}
}
If you want to execute and schedule multiple SQL scripts, you can do so with a ShellCommandActivity; see the sketch below.
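One hedged way to structure that (bucket, script, and resource names are placeholders) is to keep the .sql files plus a small driver script in S3 and point the ShellCommandActivity at the driver:
{
  "id": "RunSqlScripts",
  "name": "RunSqlScripts",
  "type": "ShellCommandActivity",
  "scriptUri": "s3://my-bucket/scripts/run_queries.sh",
  "scriptArgument": ["s3://my-bucket/scripts/queries/"],
  "runsOn": { "ref": "Ec2Instance" }
}
Here run_queries.sh would, for example, copy the .sql files from the given prefix and execute them one by one with psql against the Redshift cluster.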
I managed to execute a script with multiple insert statements with the following AWS Data Pipeline configuration:
{
"id": "ExecuteSqlScript",
"name": "ExecuteSqlScript",
"type": "SqlActivity",
"scriptUri": "s3://mybucket/inserts.sql",
"database": { "ref": "rds_mysql" },
"runsOn": { "ref": "Ec2Instance" }
}, {
"id": "rds_mysql",
"name": "rds_mysql",
"type": "JdbcDatabase",
"username": "#{myUsername}",
"*password": "#{*myPassword}",
"connectionString" : "#{myConnStr}",
"jdbcDriverClass": "com.mysql.jdbc.Driver",
"jdbcProperties": ["allowMultiQueries=true","zeroDateTimeBehavior=convertToNull"]
},
It is important to allow the MySQL driver to execute multiple queries with allowMultiQueries=true; the script's S3 path is provided via scriptUri.