How to configure AWS Elastic Transcoder

I'm trying to use AWS's Elastic Transcoder to implement HTTP Live Streaming (HLS) for an iPad app. Suppose that I have an output bucket called "output". I want Elastic Transcoder to transcode a video and put the .ts files for each HLS output in their own folder, inside a folder called "camera", inside a folder called "tutorials". The resulting directory structure would look like:
output/tutorials/camera/hls20M/*.ts
output/tutorials/camera/hls15M/*.ts
output/tutorials/camera/hls10M/*.ts
etc.
The master playlist would go in the /camera folder: output/tutorials/camera/index.m3u8
I'm having trouble figuring out how to set up the "output key prefix" and the "output key" in my job in order to achieve this structure.

I think this is the gist of it:
CreateJob
{
...
"Outputs": [
{
"Key": "hls20M/fileName"
},
{
"Key": "hls15M/fileName"
},
{
"Key": "hls10M/fileName"
}
],
"OutputKeyPrefix": "output/tutorials/camera/",
"Playlists": [
{
"Name": "index"
}
]
}
All outputs (including the master playlist) are prefixed with the OutputKeyPrefix; each output's Key then places it in the desired subfolder under that prefix.
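For reference, here is a minimal boto3 sketch of submitting such a job. The pipeline ID, input key, and HLS preset IDs are placeholders you would swap for your own values (the preset IDs shown are assumptions, not a recommendation):

import boto3

transcoder = boto3.client('elastictranscoder', region_name='us-east-1')

# Hypothetical pipeline ID and input key; replace with your own.
response = transcoder.create_job(
    PipelineId='1111111111111-abcde1',
    Input={'Key': 'tutorials/camera/source.mp4'},
    OutputKeyPrefix='output/tutorials/camera/',
    Outputs=[
        # HLS presets; substitute the system or custom presets you actually use.
        {'Key': 'hls20M/fileName', 'PresetId': '1351620000001-200010', 'SegmentDuration': '10'},
        {'Key': 'hls15M/fileName', 'PresetId': '1351620000001-200020', 'SegmentDuration': '10'},
        {'Key': 'hls10M/fileName', 'PresetId': '1351620000001-200030', 'SegmentDuration': '10'},
    ],
    Playlists=[
        {
            'Name': 'index',  # becomes output/tutorials/camera/index.m3u8
            'Format': 'HLSv3',
            'OutputKeys': ['hls20M/fileName', 'hls15M/fileName', 'hls10M/fileName'],
        }
    ],
)
print(response['Job']['Id'])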

You basically need to do something like this:
elastic_transcoder.create_job(pipeline_id=PIPELINE_ID, input_name=input_object, outputs=output_objects)
# where output_objects is defined as:
output_objects = [
{
'Key': '%s/video/%s_1080.mp4'%(project_name,video_id),
'PresetId': '1351620000001-000001',
'Rotate': 'auto',
'ThumbnailPattern': '',
},
{
'Key': '%s/video/%s_720.mp4'%(project_name,video_id),
'PresetId': '1351620000001-000010',
'Rotate': 'auto',
'ThumbnailPattern': '',
},
{
'Key': '%s/video/%s_480.mp4'%(project_name,video_id),
'PresetId': '1351620000001-000020',
'Rotate': 'auto',
'ThumbnailPattern': '',
},
{
'Key': '%s/video/%s_360.mp4'%(project_name,video_id),
'PresetId': '1351620000001-000040',
'Rotate': 'auto',
'ThumbnailPattern': '',
}
]
The preset IDs listed here are for different output renditions; one of them could be your iPad version.
For a detailed walkthrough of how to set up the inputs and outputs, see this post; it explains the whole process in detail. Hope it helps someone.

Related

How to use CDK fine-grained assertions against DefinitionString in AWS::StepFunctions::StateMachine?

Following the approach for fine-grained assertions, specifically the example against AWS::StepFunctions::StateMachine in
https://docs.aws.amazon.com/cdk/v2/guide/testing.html#testing_fine_grained, I can't get the use of Match.serializedJson() to work for my own State Machine definition.
Here is a snippet of my generated CloudFormation template -
"StateMachine985JK630": {
"Type": "AWS::StepFunctions::StateMachine",
"Properties": {
"RoleArn": {
"Fn::GetAtt": [
"StateMachineRoleJ4IK852F",
"Arn"
]
},
"DefinitionString": {
"Fn::Join": [
"",
[
"{\"StartAt\":\"My Task One\",\"States\":{\"My Task One \":{\"Next\":\"My Task Two\",\"Retry\":[{\"ErrorEquals\":[\"Lambda.ServiceException\",\"Lambda.AWSLambdaException\",\"Lambda.SdkClientException\"],\"IntervalSeconds\":2,\"MaxAttempts\":6,\"BackoffRate\":2}],\"Type\":\"Task\",\"ResultPath\":\"$.truNarrativeAuth\",\"ResultSelector\":{\"authString.$\":\"$.Payload\"},\"Resource\":\"arn:",
{
"Ref": "AWS::Partition"
},
":states:::lambda:invoke\",\"Parameters\":{\"FunctionName\":\"",
{
"Fn::GetAtt": [
"MyTaskOne",
"Arn"
]
},
"\",\"Payload.$\":\"$\"}},\"My Task Two\":{\"Next\":\"My Task Three\",\"Retry\"
...
]
...
The test against DefinitionString in the documentation's example doesn't handle the arrays or the Fn::Join property seen in the CloudFormation template above, which has the rough shape DefinitionString: { 'Fn::Join': [ '', [Array] ] }. My test below passes as is, but without checking properties such as StartAt it's obviously useless.
My Test -
// PASSES
it('should create My Task 1', () => {
template.hasResourceProperties('AWS::StepFunctions::StateMachine', {
DefinitionString: {
'Fn::Join': [
'',
Match.arrayWith([
// Match.serializedJson(Match.objectLike({ StartAt: 'My Task One' })), // FAILS
// Match.serializedJson({ StartAt: 'My Task One' }) // FAILS
// Match.objectLike(Match.serializedJson({ StartAt: 'My Task One' })) // FAILS
{
Ref: 'AWS::Partition',
},
]),
],
},
});
});
I have tried:
Lifting the example exactly as it is from the docs (changing the expected properties to my own), which fails with
Expected JSON as a string but found object at /Properties/DefinitionString
which makes sense given the makeup of DefinitionString.
Adding any of the commented-out lines to the Match.arrayWith([]), which fails with
Missing element at pattern index 0 at /Properties/DefinitionString/Fn::Join[1] (using arrayWith matcher)
but I can't quite make sense of this, as the functions I am using are 'like' matchers.
Pasting the entire string "{\"StartAt\":\"My Task One\"... directly into the Match.arrayWith([]), which passes but is far from ideal.
My question is how do I parse the array of strings and objects, then match only what I want? Any guidance would be much appreciated.
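One thing that may be worth trying: because DefinitionString is an Fn::Join over string fragments and intrinsics, each fragment is only part of a JSON document, so Match.serializedJson() has nothing complete to parse; matching the fragment as plain text is a possible workaround. The rough sketch below uses the Python binding of the same assertions module (aws-cdk-lib v2) purely for illustration; the TypeScript equivalents are Match.stringLikeRegexp and Match.arrayWith, and the test name is made up:

from aws_cdk.assertions import Match, Template

def test_contains_my_task_one(stack):  # `stack` is the Stack under test (e.g. a pytest fixture)
    template = Template.from_stack(stack)
    template.has_resource_properties("AWS::StepFunctions::StateMachine", {
        "DefinitionString": {
            "Fn::Join": ["", Match.array_with([
                # The joined fragments are not complete JSON documents, so
                # serialized_json() cannot parse them; match the text instead.
                Match.string_like_regexp('"StartAt":"My Task One"'),
                {"Ref": "AWS::Partition"},
            ])],
        },
    })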

AWS Appflow flow against Salesforce is generating additional folders with flow execution ID in S3 destination bucket

I have an AWS Appflow flow against Salesforce. These are the properties:
{
description: 'salesforce_Account',
kMSArn: 'arn:aws:kms:us-west-2:<account_id>:key/e8e57dff-31ab-42c9-a997-2da88ffa3fa7',
destinationFlowConfigList: [
{
connectorType: 'S3',
destinationConnectorProperties: {
s3: {
bucketName: 'kavak-landing-raw-prod',
bucketPrefix: 'salesforce',
s3OutputFormatConfig: {
aggregationConfig: {
aggregationType: 'None',
},
fileType: 'PARQUET',
prefixConfig: {
prefixFormat: 'MONTH',
prefixType: 'PATH',
},
},
},
},
},
],
flowName: 'salesforce_Account',
sourceFlowConfig: {
connectorProfileName: 'appflow-salesforce-conn',
connectorType: 'Salesforce',
incrementalPullConfig: {
datetimeTypeFieldName: 'LastModifiedDate',
},
sourceConnectorProperties: {
salesforce: {
enableDynamicFieldUpdate: true,
includeDeletedRecords: true,
object: 'Account',
},
},
},
tasks: [
{
connectorOperator: {
salesforce: 'NO_OP',
},
sourceFields: [],
taskProperties: [
{
key: 'EXCLUDE_SOURCE_FIELDS_LIST',
value: '[]',
},
],
taskType: 'Map_all',
},
],
triggerConfig: {
triggerProperties: {
dataPullMode: 'Incremental',
scheduleExpression: 'rate(30minutes)',
},
triggerType: 'Scheduled',
},
}
The flow runs without problems, but I'm facing an issue: in the S3 folder structure, the flow generates an additional folder before the Parquet files. These folders correspond to the execution ID of each flow run. Here are some screenshots:
So, my question is: why are these folders being created? Is this normal AppFlow behavior, or is it something related to the flow properties? I couldn't find anything in the official documentation or in recent posts online about this. Can anybody help me understand it?
Thanks!

How do you properly format the syntax in an AWS System Manager Document using downloadContent sourceInfo StringMap

My goal is to have an AWS System Manager Document download a script from S3 and then run that script on the selected EC2 instance. In this case, it will be a Linux OS.
According to AWS documentation for aws:downloadContent the sourceInfo Input is of type StringMap.
The example code looks like this:
{
"schemaVersion": "2.2",
"description": "aws:downloadContent",
"parameters": {
"sourceType": {
"description": "(Required) The download source.",
"type": "String"
},
"sourceInfo": {
"description": "(Required) The information required to retrieve the content from the required source.",
"type": "StringMap"
}
},
"mainSteps": [
{
"action": "aws:downloadContent",
"name": "downloadContent",
"inputs": {
"sourceType":"{{ sourceType }}",
"sourceInfo":"{{ sourceInfo }}"
}
}
]
}
This code assumes you will run the document by hand (console or CLI) and then supply sourceInfo as a parameter. When running the document by hand, however, anything entered in the parameter (an S3 URL) isn't accepted. In any case, I'm not trying to run this by hand but programmatically, and I want to hard-code the S3 URL into sourceInfo in mainSteps.
AWS does give an example of syntax that looks like this:
{
"path": "https://s3.amazonaws.com/aws-executecommand-test/powershell/helloPowershell.ps1"
}
I've coded the document action in mainSteps like this:
{
"action": "aws:downloadContent",
"name": "downloadContent",
"inputs": {
"sourceType": "S3",
"sourceInfo":
{
"path": "https://s3.amazonaws.com/bucketname/folder1/folder2/script.sh"
},
"destinationPath": "/tmp"
}
},
However, it doesn't seem to work and I receive this error:
invalid format in plugin properties map[sourceInfo:map[path:https://s3.amazonaws.com/bucketname/folder1/folder2/script.sh] sourceType:S3];
error json: cannot unmarshal object into Go struct field DownloadContentPlugin.sourceInfo of type string
Note: I have seen this post that references how to format it for Windows. I tried it; it didn't work, and it doesn't seem relevant to my Linux needs.
So my questions are:
Do you need a parameter for sourceInfo of type StringMap - something that won't be used within the aws:downloadContent {{ sourceInfo }} mainSteps?
How do you properly format the aws:downloadContent action sourceInfo StringMap in mainSteps?
Thank you for your effort in advance.
I had a similar issue, as I did not want anyone to have to type the values when running the document, so I added a default to the downloadContent sourceInfo parameter:
"sourceInfo": {
"description": "(Required) Blah.",
"type": "StringMap",
"displayType": "textarea",
"default": {
"path": "https://mybucket-public.s3-us-west-2.amazonaws.com/automation.sh"
}
}
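When the document is invoked programmatically, the StringMap parameter can also be supplied as a JSON-encoded string at run time rather than typed by hand. A small boto3 sketch of what I mean (the document name, instance ID, and bucket path are placeholders):

import boto3

ssm = boto3.client('ssm')

# The StringMap parameter is passed as a JSON string inside the Parameters list.
ssm.send_command(
    DocumentName='MyDownloadAndRunScript',    # hypothetical document name
    InstanceIds=['i-0123456789abcdef0'],      # hypothetical instance
    Parameters={
        'sourceType': ['S3'],
        'sourceInfo': ['{"path": "https://s3.amazonaws.com/bucketname/folder1/folder2/script.sh"}'],
    },
)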

HIVE_INVALID_METADATA in Amazon Athena

How can I work around the following error in Amazon Athena?
HIVE_INVALID_METADATA: com.facebook.presto.hive.DataCatalogException: Error: : expected at the position 8 of 'struct<x-amz-request-id:string,action:string,label:string,category:string,when:string>' but '-' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null)
When looking at position 8 in the database table connected to Athena generated by AWS Glue, I can see that it has a column named attributes with a corresponding struct data type:
struct <
x-amz-request-id:string,
action:string,
label:string,
category:string,
when:string
>
My guess is that the error occurs because the attributes field is not always populated (c.f. the _session.start event below) and does not always contain all fields (e.g. the DocumentHandling event below does not contain the attributes.x-amz-request-id field). What is the appropriate way to address this problem? Can I make a column optional in Glue? Can (should?) Glue fill the struct with empty strings? Other options?
Background: I have the following backend structure:
Amazon PinPoint Analytics collects metrics from my application.
The PinPoint event stream has been configured to forward the events to an Amazon Kinesis Firehose delivery stream.
Kinesis Firehose writes data to S3
Use AWS Glue to crawl S3
Use Athena to write queries based on the databases and tables generated by AWS Glue
I can see PinPoint events successfully being added to json files in S3, e.g.
First event in a file:
{
"event_type": "_session.start",
"event_timestamp": 1524835188519,
"arrival_timestamp": 1524835192884,
"event_version": "3.1",
"application": {
"app_id": "[an app id]",
"cognito_identity_pool_id": "[a pool id]",
"sdk": {
"name": "Mozilla",
"version": "5.0"
}
},
"client": {
"client_id": "[a client id]",
"cognito_id": "[a cognito id]"
},
"device": {
"locale": {
"code": "en_GB",
"country": "GB",
"language": "en"
},
"make": "generic web browser",
"model": "Unknown",
"platform": {
"name": "macos",
"version": "10.12.6"
}
},
"session": {
"session_id": "[a session id]",
"start_timestamp": 1524835188519
},
"attributes": {},
"client_context": {
"custom": {
"legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
}
},
"awsAccountId": "672057540201"
}
Second event in the same file:
{
"event_type": "DocumentHandling",
"event_timestamp": 1524835194932,
"arrival_timestamp": 1524835200692,
"event_version": "3.1",
"application": {
"app_id": "[an app id]",
"cognito_identity_pool_id": "[a pool id]",
"sdk": {
"name": "Mozilla",
"version": "5.0"
}
},
"client": {
"client_id": "[a client id]",
"cognito_id": "[a cognito id]"
},
"device": {
"locale": {
"code": "en_GB",
"country": "GB",
"language": "en"
},
"make": "generic web browser",
"model": "Unknown",
"platform": {
"name": "macos",
"version": "10.12.6"
}
},
"session": {},
"attributes": {
"action": "Button-click",
"label": "FavoriteStar",
"category": "Navigation"
},
"metrics": {
"details": 40.0
},
"client_context": {
"custom": {
"legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
}
},
"awsAccountId": "[aws account id]"
}
Next, AWS Glue has generated a database and a table. Specifically, I see that there is a column named attributes that has the value of
struct <
x-amz-request-id:string,
action:string,
label:string,
category:string,
when:string
>
However, when I attempt to Preview table from Athena, i.e. execute the query
SELECT * FROM "pinpoint-test"."pinpoint_testfirehose" limit 10;
I get the error message described earlier.
Side note, I have tried to remove the attributes field (by editing the database table from Glue), but that results in Internal error when executing the SQL query from Athena.
This is a known limitation: Athena table and database names allow only the underscore special character.
Athena table and database names cannot contain special characters, other than underscore (_).
Source: http://docs.aws.amazon.com/athena/latest/ug/known-limitations.html
Use backticks (`) when the table name has a - in it.
Example:
SELECT * FROM `pinpoint-test`.`pinpoint_testfirehose` limit 10;
Make sure you select "default" database on the left pane.
I believe the problem is your struct element name, x-amz-request-id: the "-" in the name.
I'm currently dealing with a similar issue, since the elements in my struct have "::" in their names.
Sample data:
some_key: {
"system::date": date,
"system::nps_rating": 0
}
Glue-derived struct schema (it tried to escape them with \):
struct <
system\:\:date:String
system\:\:nps_rating:Int
>
But that still gives me an error in Athena.
I don't have a good solution for this other than changing Struct to STRING and trying to process the data that way.
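To illustrate that last option: if the attributes column is redefined as a plain string in the Glue table, the individual fields can still be pulled out at query time with Athena's JSON functions. A sketch using boto3 (the results bucket is a placeholder; the table and database names are taken from the question):

import boto3

athena = boto3.client('athena')

# Assumes the attributes column has been changed from struct to string in Glue.
query = """
SELECT json_extract_scalar(attributes, '$.action')   AS action,
       json_extract_scalar(attributes, '$.label')    AS label,
       json_extract_scalar(attributes, '$.category') AS category
FROM "pinpoint-test"."pinpoint_testfirehose"
LIMIT 10
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'pinpoint-test'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-query-results/'},  # placeholder bucket
)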

Use the name of the table from Amazon RDS in the output csv being sent to S3

I successfully managed to get a data pipeline to transfer data from a set of tables in Amazon RDS (Aurora) to a set of .csv files in S3 with a "copyActivity" connecting the two DataNodes.
However, I'd like the .csv file to have the name of the table (or view) it came from. I can't quite figure out how to do this; I think the best approach is to use an expression in the filePath parameter of the S3 DataNode.
But, I've tried #{table}, #{node.table}, #{parent.table}, and a variety of combinations of node.id and parent.name without success.
Here's a couple of JSON snippets from my pipeline:
"database": {
"ref": "DatabaseId_abc123"
},
"name": "Foo",
"id": "DataNodeId_xyz321",
"type": "MySqlDataNode",
"table": "table_foo",
"selectQuery": "select * from #{table}"
},
{
"schedule": {
"ref": "DefaultSchedule"
},
"filePath": "#{myOutputS3Loc}/#{parent.node.table.help.me.here}.csv",
"name": "S3_BAR_Bucket",
"id": "DataNodeId_w7x8y9",
"type": "S3DataNode"
}
Any advice you can provide would be appreciated.
I see that you have #{table} (did you mean #{myTable}?). If you are using a parameter to pass the name of the DB table, you can use it in the S3 filePath as well, like this:
"filePath": "#{myOutputS3Loc}/#{myTable}.csv",