Enforcing AWS Glue table properties order

I'm using boto3 to update a Glue table's table parameters, using the update_table method.
However, whatever order I put the keys of the TableInput Parameters dict in, it gets ignored.
For example, I push the following dict:
...
"Parameters": {
    "AVTOR": "",
    "PROJEKT": "",
    "PODROCJE": "",
    "CrawlerSchemaDeserializerVersion": "1.0",
    "CrawlerSchemaSerializerVersion": "1.0",
    ...
After updating the table, the table properties appear in a random order.
Is there a way to enforce the table properties order with update_table?
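For reference, the update is being done roughly like this (database and table names are placeholders, and the list of read-only fields stripped during the get_table/update_table round trip is an assumption about my setup):

import boto3

glue = boto3.client("glue")

# Read the current definition (database/table names are placeholders).
table = glue.get_table(DatabaseName="my_db", Name="my_table")["Table"]

# TableInput only accepts a subset of what get_table returns,
# so drop the read-only fields before sending the definition back.
read_only = {
    "DatabaseName", "CreateTime", "UpdateTime", "CreatedBy",
    "IsRegisteredWithLakeFormation", "CatalogId", "VersionId",
}
table_input = {k: v for k, v in table.items() if k not in read_only}

# Set the parameters in the order I would like them to appear.
table_input["Parameters"] = {
    "AVTOR": "",
    "PROJEKT": "",
    "PODROCJE": "",
    "CrawlerSchemaDeserializerVersion": "1.0",
    "CrawlerSchemaSerializerVersion": "1.0",
}

glue.update_table(DatabaseName="my_db", TableInput=table_input)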

Related

AWS AppFlow - rename source field

I have an AppFlow set up with Salesforce as the source and S3 as the destination. I am able to move all columns over by using a Map_all task type in the flow definition, and leaving the source fields empty.
However, now I want to move just a few columns to S3 and rename them as well. I was trying to do something like this:
"Tasks": [
{
"SourceFields": ["Website"],
"DestinationField": "Website",
"TaskType": "Map",
"TaskProperties": {},
},
{
"SourceFields": ["Account Name"],
"DestinationField": "AccountName",
"TaskType": "Map",
"TaskProperties": {},
},
{
"SourceFields": ["Account ID"],
"DestinationField": "AccountId",
"TaskType": "Map",
"TaskProperties": {},
}
],
but I get the error
Create Flow request failed: [Task Validation Error: You must specify a projection task or a MAP_ALL task].
How can I select a few columns as well as rename them before moving them to S3 without resorting to something like Glue?
Figured it out: first add a Projection task to fetch the fields needed, and then Map tasks, one per field being renamed.
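In case it helps, a sketch of the resulting task list built for boto3's appflow.create_flow (field names are the ones from the question; the keys here are the lowerCamelCase API names, whereas CloudFormation uses the PascalCase keys shown above; the NO_OP operator on the Map tasks is an assumption):

import boto3

appflow = boto3.client("appflow")

# Source field -> destination field (names taken from the question).
fields = {"Website": "Website", "Account Name": "AccountName", "Account ID": "AccountId"}

# Projection task: tells the flow which source fields to pull at all.
tasks = [
    {
        "sourceFields": list(fields),
        "connectorOperator": {"Salesforce": "PROJECTION"},
        "taskType": "Filter",
        "taskProperties": {},
    }
]

# One Map task per field, renaming it on the way to S3.
for source, destination in fields.items():
    tasks.append(
        {
            "sourceFields": [source],
            "connectorOperator": {"Salesforce": "NO_OP"},
            "destinationField": destination,
            "taskType": "Map",
            "taskProperties": {},
        }
    )

# `tasks` can then be passed as the tasks argument to appflow.create_flow(...).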

Glue JSON serialization and Athena query return the full record in each field

I've been trying for a long time to get Glue crawlers to recognize the .json files in my S3 bucket so they can be queried in Athena, but after various changes to the settings, the best result I got is still wrong.
The Glue crawler even recognizes the column structure of my .json; however, when the table is queried in Athena, the columns are set up correctly but every item ends up on the same line, one item per column, as shown below.
My Classifier setting is "$[*]".
The .json data structure
[
{ "id": "TMA.fid--4f6e8018_18596f01b4f\_-5e3a", "airspace_p": 1061, "codedistv1": "SFC", "fid": 299 },
{ "id": "TMA.fid--4f6e8018_18596f01b4f\_-5e39", "airspace_p": 408, "codedistv1": "STD", "fid": 766 },
{ "id": "TMA.fid--4f6e8018_18596f01b4f\_-5e38", "airspace_p": 901, "codedistv1": "STD", "fid": 806 },
...
]
(Screenshot: configuration result in Glue.)
(Screenshot: result in Athena from this table.)
I already tried different .json structures and different classifiers, and changed and added the JsonSerDe.
If you can change the data source, use the JSON lines format instead, then run the Glue crawler without any custom classifier.
{"id": "TMA.fid--4f6e8018_18596f01b4f_-5e3a","airspace_p": 1061,"codedistv1": "SFC","fid": 299}
{"id": "TMA.fid--4f6e8018_18596f01b4f_-5e39","airspace_p": 408,"codedistv1": "STD","fid": 766}
{"id": "TMA.fid--4f6e8018_18596f01b4f_-5e38","airspace_p": 901,"codedistv1": "STD","fid": 806}
The cause of your issue is that Athena doesn't support the custom JSON classifier.
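If you control how the files are produced, a minimal sketch of converting an existing JSON array file to JSON Lines (file paths are placeholders):

import json

# Read a file containing a single JSON array (path is a placeholder).
with open("airspaces.json") as src:
    records = json.load(src)

# Write one compact JSON object per line (JSON Lines), which the crawler
# and Athena's JSON SerDe handle without a custom classifier.
with open("airspaces.jsonl", "w") as dst:
    for record in records:
        dst.write(json.dumps(record) + "\n")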

AWS Step Function get all Range keys with a single primary key in DynamoDB

I'm building AWS Step Function state machines. My goal is to read all Items from a DynamoDB table with a specific value for the Hash key (username) and any (wildcard) Sort keys (order_id).
Basically something that would be done from SQL with:
SELECT username,order_id FROM UserOrdersTable
WHERE username = 'daffyduck'
I'm using Step Functions to get configurations for AWS Glue jobs from a DynamoDB table and then spawn a Glue job via the step function's Map state for each DynamoDB item (with parameters read from the database).
"Read Items from DynamoDB": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Parameters": {
"TableName": "UserOrdersTable",
"Key": {
"username": {
"S": "daffyduck"
},
"order_id": {
"S": "*"
}
}
},
"ResultPath": "$",
"Next": "Invoke Glue jobs"
}
But I can't get the state machine to read all order_ids for the user daffyduck with the step function task above. No output is displayed with the code above, apart from HTTP stats.
Is there a wildcard for order_id? Is there another way of getting all order_ids? The query customization seems to be rather limited inside step functions:
https://docs.amazonaws.cn/amazondynamodb/latest/APIReference/API_GetItem.html#API_GetItem_RequestSyntax
Basically I'm trying to accomplish what can be done from the command line like so:
$ aws dynamodb query \
    --table-name UserOrdersTable \
    --key-condition-expression "Username = :username" \
    --expression-attribute-values '{
        ":username": { "S": "daffyduck" }
    }'
Any ideas? Thanks
I don't think that is possible with the Step Functions DynamoDB service integration yet.
It currently supports GetItem, PutItem, DeleteItem and UpdateItem, not Query or Scan.
For GetItem we need to pass the entire key (partition + sort key):
For the primary key, you must provide all of the attributes. For example, with a simple primary key, you only need to provide a value for the partition key. For a composite primary key, you must provide values for both the partition key and the sort key.
We need to write a Lambda function that queries DynamoDB and returns the items as a map, and invoke that Lambda function from the step.
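A minimal sketch of such a Lambda, assuming the table and key names from the question (the handler name and the shape of the returned map are made up):

import json
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("UserOrdersTable")

def _plain(value):
    # The resource API returns numbers as Decimal, which Lambda's default
    # JSON serialization rejects; convert them back to int/float.
    if isinstance(value, Decimal):
        return int(value) if value == int(value) else float(value)
    raise TypeError(f"Unsupported type: {type(value)}")

def handler(event, context):
    # The state machine passes e.g. {"username": "daffyduck"}.
    condition = Key("username").eq(event["username"])
    response = table.query(KeyConditionExpression=condition)
    items = response["Items"]

    # Follow pagination in case the result set exceeds 1 MB.
    while "LastEvaluatedKey" in response:
        response = table.query(
            KeyConditionExpression=condition,
            ExclusiveStartKey=response["LastEvaluatedKey"],
        )
        items.extend(response["Items"])

    # The returned list can feed the Map state, one Glue job per item.
    return {"items": json.loads(json.dumps(items, default=_plain))}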

Is it possible to iterate through a DynamoDB table within a step function's map state?

Just what the title says, basically. I have read through the documentation:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
This describes how to get a single item of information out of a DynamoDB table from a step function. What I would like to do is iterate through the entire table and start execution of another state machine for each item. Each new state machine would have an individual item as input. I have attempted the following code, which unfortunately is not functional:
{
    "StartAt": "OuterFunction",
    "States": {
        "OuterFunction": {
            "Type": "Map",
            "Iterator": {
                "StartAt": "InnerFunction",
                "States": {
                    "InnerFunction": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::dynamodb:getItem.sync",
                        "Parameters": {
                            "StateMachineArn": "other-state-machine-arn",
                            "TableName": "TestTable"
                        },
                        "End": true
                    }
                }
            },
            "End": true
        }
    }
}
Is it actually possible to iterate through a DynamoDB table in this way?
You are now able to call DynamoDB directly from Step Functions, and this includes the Query and Scan operations. With the result, you can then iterate through the items. The one less convenient caveat is that it does not use the document client, so the results are in the DynamoDB JSON format.
https://docs.aws.amazon.com/step-functions/latest/dg/connect-ddb.html
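If the raw Query/Scan output is handed to a Lambda inside the Map state, boto3's TypeDeserializer can unwrap the DynamoDB JSON; a small sketch (the sample item is made up):

from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()

def to_plain(item):
    # Converts DynamoDB JSON such as {"username": {"S": "daffyduck"}}
    # into a plain dict such as {"username": "daffyduck"}.
    return {key: deserializer.deserialize(value) for key, value in item.items()}

# Example with the item shape returned by a Query or Scan state:
items = [{"username": {"S": "daffyduck"}, "order_id": {"S": "order-1"}}]
print([to_plain(item) for item in items])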
No, getItem is designed to fetch a particular DynamoDB item. You need to write a custom Lambda that will .query() or .scan() your table and then use a Map state to iterate over the results (most likely you won't need getItem at that point, because you can load all the data with the query/scan operation).

Analytics in WSO2DAS

I'm getting a Table Not Found error while running a select query in the Spark console of WSO2 DAS. I've kept all the default configurations intact after the installation. I'm unable to fetch the data from the event stream even though it is shown under the table dropdown of the Data Explorer.
When data is first moved into WSO2 DAS, it is persisted in the data store you specify.
However, these are not the tables that are created in Spark. You need to write a Spark query to create a temporary table in Spark that references the table you have persisted.
For example, if your stream is
{
    "name": "sample",
    "version": "1.0.0",
    "nickName": "",
    "description": "",
    "payloadData": [
        {
            "name": "ID",
            "type": "INT"
        },
        {
            "name": "NAME",
            "type": "STRING"
        }
    ]
}
you need to run the following Spark query in the Spark console:
CREATE TEMPORARY TABLE sample_temp USING CarbonAnalytics OPTIONS (tableName "sample", schema "ID INT, NAME STRING");
After executing the above script, try the following:
select * from sample_temp;
This should fetch the data you have pushed into WSO2DAS.
Happy learning!! :)