RDS to Neptune DMS task translates all rows into a single vertex

I have a table in RDS defined as follows:
| id | first_name | prime_faculty | etc...|
My GraphMappingConfig JSON is defined as follows:
{
  "rules": [
    {
      "rule_id": "1",
      "rule_name": "researcher_row_to_vertex",
      "table_name": "table_name",
      "vertex_definitions": [
        {
          "vertex_id_template": "RESEARCHER_{id}",
          "vertex_label": "researcher",
          "vertex_definition_id": "1",
          "vertex_properties": [
            {
              "property_name": "first_name",
              "property_value_template": "{first_name}",
              "property_value_type": "String"
            },
            {
              "property_name": "prime_faculty",
              "property_value_template": "{prime_faculty}",
              "property_value_type": "String"
            }
          ]
        }
      ]
    }
  ]
}
However, when the DMS task completes, the output I get from Gremlin is a single vertex with all of the property values collected into an array.
The query I used was g.V().valueMap()
Is there something I'm doing wrong with my GraphMappingConfig?
The DMS task completes successfully and even reports that all the rows from the RDS database were loaded; it just doesn't load them in the correct format.
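As a sanity check, one way to tell whether every row really collapsed onto a single vertex ID (rather than it merely looking that way in the valueMap() output) is to count the researcher vertices and print a few of them with their IDs included. A minimal sketch using gremlinpython, with a placeholder Neptune endpoint:

# pip install gremlinpython
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# placeholder endpoint; replace with your Neptune cluster endpoint
conn = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(conn)

# how many researcher vertices did the DMS task actually create?
print(g.V().hasLabel("researcher").count().next())

# valueMap(True) includes the vertex id and label, so duplicate IDs are easy to spot
for vm in g.V().hasLabel("researcher").limit(5).valueMap(True).toList():
    print(vm)

conn.close()

If the count comes back as 1, every row is being written to the same vertex ID, which would point at the vertex_id_template / {id} substitution rather than the property definitions.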

GCP - BigTable to BigQuery

I am trying to query Bigtable data in BigQuery using the external table configuration. I have the following SQL command that I am working with. However, I get an error stating invalid bigtable_options for format CLOUD_BIGTABLE.
The code works when I remove the columns field. For context, the raw data looks like this (running the query without the columns field):
| rowkey | aAA.column.name | aAA.column.cell.value |
| 4271   | xxx             | 30                    |
|        | yyy             | 25                    |
But I would like the table to look like this:
| rowkey | xxx |
| 4271   | 30  |
CREATE EXTERNAL TABLE dev_test.telem_test
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/telem/instances/dbb-bigtable/tables/db1'],
  bigtable_options =
  """
  {
    bigtableColumnFamilies: [
      {
        "familyId": "aAA",
        "type": "string",
        "encoding": "string",
        "columns": [
          {
            "qualifierEncoded": string,
            "qualifierString": string,
            "fieldName": "xxx",
            "type": string,
            "encoding": string,
            "onlyReadLatest": false
          }
        ]
      }
    ],
    readRowkeyAsString: true
  }
  """
);
I think you left the default placeholder value for each column attribute. Here, string indicates the type of value you are supposed to provide, not the literal value itself, so it is not valid JSON as written. Try adding double quotes, like this:
CREATE EXTERNAL TABLE dev_test.telem_test
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/telem/instances/dbb-bigtable/tables/db1'],
  bigtable_options =
  """
  {
    bigtableColumnFamilies: [
      {
        "familyId": "aAA",
        "type": "string",
        "encoding": "string",
        "columns": [
          {
            "qualifierEncoded": "string",
            "qualifierString": "string",
            "fieldName": "xxx",
            "type": "string",
            "encoding": "string",
            "onlyReadLatest": false
          }
        ]
      }
    ],
    readRowkeyAsString: true
  }
  """
);
The false value is correct because that attribute is a boolean. The encoding value "string" will still be erroneous (use a real encoding type).
The error here is in this part:
bigtableColumnFamilies: [
It should be:
"columnFamilies": [
Concerning adding columns for string values, you only need to add:
"columns": [{
  "qualifierString": "name_of_column_from_bt",
  "fieldName": "if_i_want_rename"
}],
fieldName is not required.
However, to access your field value you will still have to use SQL like this:
SELECT
aAA.xxx.cell.value as xxx
FROM dev_test.telem_test
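Putting those fixes together, here is a rough consolidated sketch, run through the google-cloud-bigquery Python client. The family-level type STRING and encoding TEXT values are assumptions for text cells (adjust them to your data), and the column entry is reduced to qualifierString plus the optional fieldName as described above:

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="telem")  # project id taken from the question's URI

ddl = """
CREATE EXTERNAL TABLE dev_test.telem_test
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/telem/instances/dbb-bigtable/tables/db1'],
  bigtable_options =
  '''
  {
    "columnFamilies": [
      {
        "familyId": "aAA",
        "type": "STRING",
        "encoding": "TEXT",
        "columns": [
          {
            "qualifierString": "xxx",
            "fieldName": "xxx"
          }
        ]
      }
    ],
    "readRowkeyAsString": true
  }
  '''
)
"""
client.query(ddl).result()  # runs the DDL; the field is then read via aAA.xxx.cell.value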

Appsync custom resolver error "Unable to transform Response Template"

I am trying to write a "BatchPutItem" custom resolver, so that I can create multiple items (not more than 25 at a time); it should accept a list of arguments and then perform a batch operation.
Here is the code I have in the custom resolver:
#set($pdata = [])
#foreach($item in ${ctx.args.input})
$util.qr($item.put("createdAt", $util.time.nowISO8601()))
$util.qr($item.put("updatedAt", $util.time.nowISO8601()))
$util.qr($item.put("__typename", "UserNF"))
$util.qr($item.put("id", $util.defaultIfNullOrBlank($item.id, $util.autoId())))
$util.qr($pdata.add($util.dynamodb.toMapValues($item)))
#end
{
  "version" : "2018-05-29",
  "operation" : "BatchPutItem",
  "tables" : {
    "Usertable1-staging": $utils.toJson($pdata)
  }
}
Response in the query console section:
{
  "data": {
    "createBatchUNF": null
  },
  "errors": [
    {
      "path": [
        "createBatchUserNewsFeed"
      ],
      "data": null,
      "errorType": "MappingTemplate",
      "errorInfo": null,
      "locations": [
        {
          "line": 2,
          "column": 3,
          "sourceName": null
        }
      ],
      "message": "Unsupported operation 'BatchPutItem'. Datasource Versioning only supports the following operations (TransactGetItems,PutItem,BatchGetItem,Scan,Query,GetItem,DeleteItem,UpdateItem,Sync)"
    }
  ]
}
And the query is:
mutation MyMutation {
  createBatchUNF(input: [{seen: false, userNFUserId: "userID", userNFPId: "pID", userNFPOwnerId: "ownerID"}]) {
    items {
      id
      seen
    }
  }
}
Conflict detection is also turned off.
And when I check the CloudWatch logs I find this error there as well (screenshot not included).
I was able to solve this issue by disabling DataStore for the entire API.
When we create a new project/backend app using the Amplify console, the Data tab asks us to enable DataStore so that we can start modelling our GraphQL API. That is what enables versioning for the entire API, and it is what prevents batch writes from executing.
So in order to solve this issue, we can follow these steps:
[Note: I am using @aws-amplify/cli]
$ amplify update api
? Please select from one of the below mentioned services: GraphQL
? Select from the options below: Disable DataStore for entire API
And then run amplify push.
This will fix the issue.
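One follow-up on the 25-item limit mentioned in the question: even with DataStore disabled, BatchPutItem still caps each call at 25 items, so the caller has to chunk its input. A rough client-side sketch in Python; the endpoint, API key, and CreateUserNFInput type name are hypothetical, inferred from the mutation above, and the auth header would differ for a Cognito-protected API:

import requests

APPSYNC_URL = "https://xxxx.appsync-api.eu-west-1.amazonaws.com/graphql"  # hypothetical
API_KEY = "da2-xxxx"                                                      # hypothetical

MUTATION = """
mutation CreateBatch($input: [CreateUserNFInput]) {
  createBatchUNF(input: $input) { items { id seen } }
}
"""

def create_in_batches(items, size=25):
    # BatchPutItem accepts at most 25 items per request, so send the list in chunks
    for i in range(0, len(items), size):
        chunk = items[i:i + size]
        response = requests.post(
            APPSYNC_URL,
            headers={"x-api-key": API_KEY},
            json={"query": MUTATION, "variables": {"input": chunk}},
        )
        response.raise_for_status()
        print(response.json())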

AWS Redshift - Loading GeoJSON Into Geometry Field

Currently it is not possible to load GeoJSON directly into a Redshift GEOMETRY column using the COPY command, but a workaround has been suggested at:
Copying GeoJSON data from S3 to Redshift
This involves ingesting the data as WKT and then converting it to geometry using a spatial function. However, I'm not entirely sure how to get from GeoJSON to WKT; I am sure there must be some converter available.
But this is where my limited understanding of spatial data comes in. Let's say I want to load weather GeoJSON objects like the one shown below into a table in Redshift.
If I understand it correctly, each Feature in the FeatureCollection would be a row in the table, with just the contents of the JSON geometry field loaded into a field of type GEOMETRY, i.e. this field does not take any of the properties or attributes of the feature. The properties would then be loaded into completely separate fields using conventional data types. Then, if I wanted to export that feature as GeoJSON, I would have to stitch the geometry and properties back together again.
Is that correct?
Or does the GEOMETRY type actually have the facility to store the properties as well as the geometry field contents?
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "abcd1234",
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
          [
            [
              [7.51, 48.04],
              [8.12, 48.05],
              [8.19, 47.95]
            ]
          ]
        ]
      },
      "geometry_name": "contour",
      "properties": {
        "identifier": "abcd1234",
        "analysisTime": "2019-04-06T14:15:00Z",
        "convectionCellType": "CELL_BASE",
        "speed": 3,
        "area": "MSG",
        "phasetype": null,
        "top": 10363,
        "intensityValue": null,
        "ice": true,
        "created_at": "2020-07-21T12:01:25.651Z"
      }
    }
  ],
  "totalFeatures": 1,
  "numberMatched": 1,
  "numberReturned": 1,
  "timeStamp": "2021-02-03T16:39:48.963Z",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::4326"
    }
  }
}
I found this Gist which provides the following Python snippet to do the conversion:
from shapely.geometry import shape
o = {
"coordinates": [[[23.314208, 37.768469], [24.039306, 37.768469], [24.039306, 38.214372], [23.314208, 38.214372], [23.314208, 37.768469]]],
"type": "Polygon"
}
geom = shape(o)
# Now it's very easy to get a WKT/WKB representation
geom.wkt
geom.wkb
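Extending that to the FeatureCollection above, here is a rough sketch (assuming the payload is saved in a hypothetical weather.json, and picking analysisTime and speed as example properties) that splits each Feature into a WKT string plus plain-typed columns, i.e. the per-row shape described earlier, ready to be converted with ST_GeomFromText on the Redshift side:

import json
from shapely.geometry import shape

# hypothetical file containing the FeatureCollection shown above
with open("weather.json") as f:
    fc = json.load(f)

rows = []
for feature in fc["features"]:
    wkt = shape(feature["geometry"]).wkt   # geometry -> WKT for ST_GeomFromText
    props = feature["properties"]          # ordinary attributes -> conventional columns
    rows.append((feature["id"], props["analysisTime"], props["speed"], wkt))

# each tuple becomes one table row, e.g. staged as CSV, COPYed into a table with a
# VARCHAR wkt column, then converted with ST_GeomFromText(wkt, 4326) into GEOMETRY
for row in rows:
    print(row)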

AWS IoT rule - timestamp for Elasticsearch

I have a bunch of IoT devices (ESP32) which publish a JSON object to things/THING_NAME/log for general debugging (to be extended to other topics with values in the future).
Here is the IoT rule, which kind of works:
{
  "sql": "SELECT *, parse_time(\"yyyy-mm-dd'T'hh:mm:ss\", timestamp()) AS timestamp, topic(2) AS deviceId FROM 'things/+/stdout'",
  "ruleDisabled": false,
  "awsIotSqlVersion": "2016-03-23",
  "actions": [
    {
      "elasticsearch": {
        "roleArn": "arn:aws:iam::xxx:role/iot-es-action-role",
        "endpoint": "https://xxxx.eu-west-1.es.amazonaws.com",
        "index": "devices",
        "type": "device",
        "id": "${newuuid()}"
      }
    }
  ]
}
I'm not sure how to set #timestamp inside Elasticsearch to allow time-based searches.
Maybe I'm going about this all wrong, but it almost works!
Elasticsearch can recognize date strings matching dynamic_date_formats.
The following format is automatically mapped as a date field in AWS Elasticsearch 7.1:
SELECT *, parse_time("yyyy/MM/dd HH:mm:ss", timestamp()) AS timestamp FROM 'events/job/#'
This approach does not require creating a preconfigured index, which is important for dynamically created indexes, e.g. with daily rotation for logs:
devices-${parse_time("yyyy.MM.dd", timestamp(), "UTC")}
According to the elastic.co documentation, the default value for dynamic_date_formats is:
[ "strict_date_optional_time","yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]
#timestamp is just a convention, as the # prefix is the default prefix for Logstash-generated fields. Because you are not using Logstash as a middleman between IoT and Elasticsearch, you don't have a default mapping for #timestamp.
But basically, it is just a name, so call it what you want; the only thing that matters is that you declare it as a date field in the mappings section of the Elasticsearch index.
If for some reason you still need it to be called #timestamp, you can either SELECT it with that prefix right away in the AS section (there might be an issue with IoT SQL restrictions, I'm not sure):
SELECT *, parse_time(\"yyyy-mm-dd'T'hh:mm:ss\", timestamp()) AS #timestamp, topic(2) AS deviceId FROM 'things/+/stdout'
Or you can use the copy_to functionality when declaring your mapping:
PUT devices/device
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "copy_to": "#timestamp"
      },
      "#timestamp": {
        "type": "date"
      }
    }
  }
}
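If you do want an explicit date mapping but the indexes are created on the fly (such as the daily devices-* rotation mentioned above), one option is to register an index template once so every new index inherits it. A rough sketch using Python requests against a hypothetical endpoint; it assumes the domain's access policy lets the request through and reuses the same date pattern as the parse_time call:

import requests

es = "https://xxxx.eu-west-1.es.amazonaws.com"  # hypothetical endpoint

template = {
    "index_patterns": ["devices-*"],  # matches the daily-rotated indexes
    "mappings": {
        "properties": {
            "timestamp": {"type": "date", "format": "yyyy/MM/dd HH:mm:ss"}
        }
    },
}
response = requests.put(f"{es}/_template/devices", json=template)
print(response.status_code, response.text)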

Inserting data into AWS ES from S3 via a Lambda function doesn't work

I would like to ask how to correctly put data from S3 into an ES domain. I've created and configured a new ES domain, a bucket, and the Lambda function (from this example:
https://github.com/awslabs/amazon-elasticsearch-lambda-samples). All of them are created in the same region.
Everything looked fine at first: I placed a new JSON file in my bucket, the Lambda function detected it, and it showed results like:
{
  "Records": [
    "bucket": {
      "name": "test",
      "...."
    },
    "object": {
      "key": "test.json",
      "size": 22,
      "eTag": "",
      "sequencer": ""
    }
    ....
  ]
}
2016-04-08T07:34:xxxxxxx 0 All 26 log records added to ES.
After that, I tried to search for something in ES, but it doesn't show me any new indexes. I've checked this via the URL:
https://search-xxxx.us-west-2.es.amazonaws.com/_aliases
What am I doing wrong?
Cheers :)
The index must be created beforehand.
The Lambda function from the code you reference, https://github.com/awslabs/amazon-elasticsearch-lambda-samples/blob/master/src/s3_lambda_es.js, points at an existing index:
/* Globals */
var esDomain = {
  endpoint: 'my-search-endpoint.amazonaws.com',
  region: 'my-region',
  index: 'logs',
  doctype: 'apache'
};
You will have updated those values, and they must match the index you created.
If you left it as logs, you must create the logs index in Elasticsearch before the function runs, for example:
curl -XPUT 'https://search-xxxx.us-west-2.es.amazonaws.com/logs'
Just make sure the index name is aligned between its creation and your Lambda function.
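If you would rather verify from Python than curl, here is a small sketch (hypothetical endpoint; it assumes the domain's access policy permits the request) to confirm the index exists and that the Lambda's documents are actually arriving:

import requests

ES = "https://search-xxxx.us-west-2.es.amazonaws.com"  # hypothetical endpoint

# list all indexes (the logs index should appear here once created)
print(requests.get(f"{ES}/_cat/indices?v").text)

# the document count should grow after each S3 upload the Lambda processes
print(requests.get(f"{ES}/logs/_count").json())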