The post looks long, but that is only because of the data (samples and errors).
I am trying to create a bucket modeled on the buildFailed sample in CEP 2.1.0 (that sample works).
I have created my own stream and my own sample data.
Yet it seems that the CEP input handler is having trouble with my events.
So far I have not found the issue.
The stream definition:
"version": "1.2.0",
"nickName": "poc sample",
"description": "poc sample stream",
{ "name":"code",
The event data:
"metaData" : [""] ,
"correlationData" : ["PSOR", "Appli2", "Ref-1"] ,
"payloadData" : ["1363700128138496600", "6", "BIZ", "6"]
"metaData" : [""] ,
"correlationData" : ["PSOR", "Appli2", "Ref-0"] ,
"payloadData" : ["1363700126353394500", "6", "BIZ", "6"]
"metaData" : [""] ,
"correlationData" : ["PSOR", "Appli2", "Ref-3"] ,
"payloadData" : ["1363700131731702100", "6", "BIZ", "6"]
"metaData" : [""] ,
"correlationData" : ["PSOR", "Appli2", "Ref-2"] ,
"payloadData" : ["1363700129894597000", "6", "BIZ", "6"]
"metaData" : [""] ,
"correlationData" : ["PSOR", "Appli2", "Ref-4"] ,
"payloadData" : ["1363700133472801700", "6", "BIZ", "6"]
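A quick way to rule out the data itself is to compare each event's value lists with the attribute lists of the stream definition. The Python sketch below is only an illustration: it assumes the definition declares its attributes as objects under `metaData`, `correlationData` and `payloadData`, and the attribute names and types are placeholders because the full definition is not shown above.

```python
SECTIONS = ("metaData", "correlationData", "payloadData")


def arity_mismatches(stream_def, events):
    """List (event_index, section, declared, received) for every event whose
    value count differs from the number of attributes the stream declares."""
    problems = []
    for i, event in enumerate(events):
        for section in SECTIONS:
            declared = len(stream_def.get(section, []))
            received = len(event.get(section, []))
            if declared != received:
                problems.append((i, section, declared, received))
    return problems


# Hypothetical stream definition: attribute names/types are placeholders,
# because the full definition is not shown in the question.
stream_def = {
    "name": "poc_sample",
    "version": "1.2.0",
    "metaData": [{"name": "source", "type": "STRING"}],
    "correlationData": [{"name": "domain", "type": "STRING"},
                        {"name": "app", "type": "STRING"},
                        {"name": "ref", "type": "STRING"}],
    "payloadData": [{"name": "timestamp", "type": "LONG"},
                    {"name": "status", "type": "STRING"},
                    {"name": "kind", "type": "STRING"},
                    {"name": "level", "type": "STRING"}],
}

events = [
    # First event from the question.
    {"metaData": [""], "correlationData": ["PSOR", "Appli2", "Ref-1"],
     "payloadData": ["1363700128138496600", "6", "BIZ", "6"]},
    # Deliberately broken copy with one correlation value missing.
    {"metaData": [""], "correlationData": ["PSOR", "Appli2"],
     "payloadData": ["1363700126353394500", "6", "BIZ", "6"]},
]

print(arity_mismatches(stream_def, events))  # [(1, 'correlationData', 3, 2)]
```

An empty result only means the counts line up; it says nothing about whether the bucket's input mapping refers to the right attribute names.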
When I send the stream definition, there is no error and no log output except the message that admin connected.
More feedback would be useful here. I use the curl POST command.
When I send the events, I get errors:
[2013-03-19 14:58:00,586] ERROR {org.wso2.carbon.databridge.core.internal.queue.QueueWorker} - Error in passing event eventList [
correlationData=[PSOR, Appli2, Ref-1],
payloadData=[1363700128138496600, 6, BIZ, 6],
correlationData=[PSOR, Appli2, Ref-0],
payloadData=[1363700126353394500, 6, BIZ, 6],
correlationData=[PSOR, Appli2, Ref-3],
payloadData=[1363700131731702100, 6, BIZ, 6],
correlationData=[PSOR, Appli2, Ref-2],
payloadData=[1363700129894597000, 6, BIZ, 6],
correlationData=[PSOR, Appli2, Ref-4],
payloadData=[1363700133472801700, 6, BIZ, 6],
] to subscriber$AgentBrokerCallback#2d7fbbd6
at org.wso2.carbon.cep.core.mapping.input.mapping.TupleInputMapping.getValue(
at org.wso2.carbon.cep.core.mapping.input.mapping.TupleInputMapping.convertToEventTuple(
at org.wso2.carbon.cep.core.mapping.input.mapping.InputMapping.convert(
at org.wso2.carbon.cep.core.listener.TopicEventListener.onEvent(
at org.wso2.carbon.cep.core.listener.BrokerEventListener.onEvent(
at java.util.concurrent.Executors$
at java.util.concurrent.FutureTask$Sync.innerRun(
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
at java.util.concurrent.ThreadPoolExecutor$
Does anyone have any hints?
I really need this to continue my proof-of-concept CEP project.
Best regards,

I have gone through the details you gave above. Without the bucket configuration and the complete error log it is hard to say what went wrong. However, I checked the stream definition and the events you posted, and they work perfectly for me without any issue, so I suspect you made a simple mistake when creating the bucket. Here I am sharing the bucket XML that I created (note: change the email address in the output topic):
events json : link [1]
stream json : link [2]
bucket xml : link [3]
curl command for Stream :
curl -k --user admin:admin https://localhost:9443/datareceiver/1.0.0/streams/ --data @streamdefn2.json -H "Accept: application/json" -H "Content-type: application/json" -X POST
curl command for events :
curl -k --user admin:admin https://localhost:9443/datareceiver/1.0.0/stream/ --data @events2.json -H "Accept: application/json" -H "Content-type: application/json" -X POST
(Please follow the doc [4] thoroughly for more details.)
Hope this helps.



I tried to follow this example to load data into Neptune:
curl -X POST -H 'Content-Type: application/json' https://endpoint:port/loader -d '
{
  "source" : "s3://source.csv",
  "format" : "csv",
  "iamRoleArn" : "role",
  "region" : "region",
  "failOnError" : "FALSE",
  "parallelism" : "MEDIUM",
  "updateSingleCardinalityProperties" : "FALSE",
  "queueRequest" : "TRUE"
}'
{
  "status" : "200 OK",
  "payload" : {
    "loadId" : "411ee078-3c44-4620-85ac-e22ef5466bbb"
  }
}
I get status 200, but when I then check whether the data was loaded, I get this:
curl -G 'https://endpoint:port/loader/411ee078-3c44-4620-85ac-e22ef5466bbb'
"status" : "200 OK",
"payload" : {
"feedCount" : [
"overallStatus" : {
"fullUri" : "s3://source.csv",
"runNumber" : 1,
"retryNumber" : 1,
"status" : "LOAD_FAILED",
"totalTimeSpent" : 4,
"startTime" : 1617653964,
"totalRecords" : 10500,
"totalDuplicates" : 0,
"parsingErrors" : 0,
"datatypeMismatchErrors" : 0,
"insertErrors" : 10500
I had no idea why I got LOAD_FAILED, so I used the get-status API to see which errors caused the load failure, and got this:
curl -X GET 'endpoint:port/loader/411ee078-3c44-4620-85ac-e22ef5466bbb?details=true&errors=true'
"status" : "200 OK",
"payload" : {
"feedCount" : [
"overallStatus" : {
"fullUri" : "s3://source.csv",
"runNumber" : 1,
"retryNumber" : 1,
"status" : "LOAD_FAILED",
"totalTimeSpent" : 4,
"startTime" : 1617653964,
"totalRecords" : 10500,
"totalDuplicates" : 0,
"parsingErrors" : 0,
"datatypeMismatchErrors" : 0,
"insertErrors" : 10500
"failedFeeds" : [
"fullUri" : "s3://source.csv",
"runNumber" : 1,
"retryNumber" : 1,
"status" : "LOAD_FAILED",
"totalTimeSpent" : 1,
"startTime" : 1617653967,
"totalRecords" : 10500,
"totalDuplicates" : 0,
"parsingErrors" : 0,
"datatypeMismatchErrors" : 0,
"insertErrors" : 10500
"errors" : {
"startIndex" : 1,
"endIndex" : 10,
"loadId" : "411ee078-3c44-4620-85ac-e22ef5466bbb",
"errorLogs" : [
"errorMessage" : "Either from vertex, '1414', or to vertex, '70', is not present.",
"fileName" : "s3://source.csv",
"recordNum" : 0
What does this error mean, and how can I fix it?
It looks as if you were trying to load some edges. When an edge is loaded, the two vertices that the edge will be connecting must already have been loaded/created. The message:
"errorMessage" : "Either from vertex, '1414', or to vertex, '70', is not present.",
is letting you know that one (or both) of the vertices with ID values of '1414' and '70' are missing. All vertices referenced by a CSV file containing edges must already exist (have been created or loaded) prior to loading edges that reference them. If the CSV files for vertices and edges are in the same S3 location then the bulk loader can figure out the order to load them in. If you just ask the loader to load a file containing edges but the vertices are not yet loaded, you will get an error like the one you shared.
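To find the edges that will fail before submitting a load, you can compare the `~from`/`~to` columns of the edge file against the `~id` column of the vertex file. A minimal Python sketch, assuming both files use the Neptune Gremlin CSV headers (`~id`, `~label` for vertices; `~id`, `~from`, `~to`, `~label` for edges); the sample rows and edge IDs are made up:

```python
import csv
import io


def missing_endpoints(vertex_csv, edge_csv):
    """Return (edge_id, from_id, to_id) for every edge whose source or target
    vertex ID does not appear in the ~id column of vertex_csv."""
    vertex_ids = {row["~id"] for row in csv.DictReader(io.StringIO(vertex_csv))}
    missing = []
    for row in csv.DictReader(io.StringIO(edge_csv)):
        if row["~from"] not in vertex_ids or row["~to"] not in vertex_ids:
            missing.append((row["~id"], row["~from"], row["~to"]))
    return missing


vertices = "~id,~label\n70,person\n71,person\n"
edges = "~id,~from,~to,~label\ne1,1414,70,knows\ne2,71,70,knows\n"
print(missing_endpoints(vertices, edges))  # [('e1', '1414', '70')]
```

Only the vertex file passed in is checked, so vertices that already exist in the cluster from an earlier load would be reported as missing here.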

Unable to connect to Snowflake from an EMR cluster with PySpark via the Airflow EMR operator

I am trying to connect to Snowflake from an EMR cluster launched by the Airflow EMR operator, but I'm getting the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling
o147.load. : java.lang.ClassNotFoundException: Failed to find data
source: net.snowflake.spark.snowflake. Please find packages at
These are the steps I add with my EmrAddStepsOperator to run the script; I specify my Snowflake packages in "Args":
"Name" : "convo_facts",
"ActionOnFailure" : "TERMINATE_CLUSTER",
"HadoopJarStep" : {
"Jar" : "command-runner.jar",
"Args" : ["spark-submit", "s3://dev-data-lake/spark_files/cf/", \
"--packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4", \
"INPUT=s3://dev-data-lake/table_exports/public/", \
'Name' : 'cftest',
'LogUri' : 's3://dev-data-lake/emr_logs/cf/log.txt',
'ReleaseLabel' : 'emr-5.32.0',
'Instances' : {
'InstanceGroups' : [
'Name' : 'Master nodes',
'Market' : 'ON_DEMAND',
'InstanceRole' : 'MASTER',
'InstanceType' : 'r6g.4xlarge',
'InstanceCount' : 1,
'Name' : 'Slave nodes',
'Market' : 'ON_DEMAND',
'InstanceRole' : 'CORE',
'InstanceType' : 'r6g.4xlarge',
'InstanceCount' : 3,
'KeepJobFlowAliveWhenNoSteps' : True,
'TerminationProtected' : False
'Applications' : [{
'Name' : 'Spark'
'JobFlowRole' : 'EMR_EC2_DefaultRole',
'ServiceRole' : 'EMR_DefaultRole'
And this is how I pass the Snowflake credentials in my script to read the data into a PySpark DataFrame:
# Set options below
sfOptions = {
    "sfURL" : "",
    "sfUser" : "user",
    "sfPassword" : "xxxx",
    "sfDatabase" : "",
    "sfSchema" : "PUBLIC",
    "sfWarehouse" : ""
}
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
query_sql = """select * from cf"""
# assumes an existing SparkSession named spark
messages_new = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("query", query_sql) \
    .load()
I'm not sure what I am missing or where I am going wrong.
The --packages option should be placed before s3://.../ in the spark-submit command. Otherwise, it will be treated as an application argument.
Try this:
"Name": "convo_facts",
"ActionOnFailure": "TERMINATE_CLUSTER",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
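The point about argument order can be illustrated with a Python step definition for the Airflow operator. This is a sketch, not the answer's exact step: the script path keeps the truncated `s3://dev-data-lake/spark_files/cf/` from the question, and `--packages` and its value are written as separate list elements, because each element of "Args" becomes one command-line argument (unlike the single `"--packages net.snowflake:..."` string in the question).

```python
# Comma-separated Maven coordinates, as taken from the question.
SNOWFLAKE_PACKAGES = ",".join([
    "net.snowflake:snowflake-jdbc:3.8.0",
    "net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4",
])

# Everything before the script path is parsed by spark-submit itself;
# everything after it is passed to the application as its own arguments.
SPARK_STEPS = [
    {
        "Name": "convo_facts",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--packages", SNOWFLAKE_PACKAGES,
                "s3://dev-data-lake/spark_files/cf/",  # script path (truncated in the question)
                "INPUT=s3://dev-data-lake/table_exports/public/",
            ],
        },
    }
]
```

The list can then be passed to the operator as `steps=SPARK_STEPS`.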

AWS managed Elasticsearch restore: node does not match index setting

I'm trying to restore an Elasticsearch snapshot taken from AWS managed Elasticsearch (version 5.6, instance type i3.2xlarge).
When I restore it on a VM, the cluster status immediately goes red and all the shards are unassigned:
{
  "cluster_name" : "es-cluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 5,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 480,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 0.0
}
When I use the allocation explain API, I get the response below:
"node_id" : "3WEV1tHoRPm6OguKyxp0zg",
"node_name" : "node-1",
"transport_address" : "",
"node_decision" : "no",
"deciders" : [
"decider" : "replica_after_primary_active",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index setting [index.routing.allocation.include] filters [instance_type:\"i2.2xlarge OR i3.2xlarge\"]"
"decider" : "throttling",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
This is strange, and I have never run into it before. The snapshot itself completed, so how can I ignore this setting while restoring? I even tried the query below, but I still get the same issue:
curl -X POST "localhost:9200/_snapshot/restore/awsnap/_restore?pretty" -H 'Content-Type: application/json' -d'
{"ignore_index_settings": [
I found the cause and the solution.
Detailed troubleshooting steps are here, but I'm leaving this comment as well so that others can benefit from it.
This is an AWS-specific thing, so I used this to solve it:
curl -X POST "localhost:9200/_snapshot/restore/awsnap/_restore?pretty" -H 'Content-Type: application/json' -d'
{"ignore_index_settings": [
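For reference, the restore API accepts an `ignore_index_settings` list in the request body, naming index settings to drop from the restored indices. A Python sketch that builds such a request, assuming the setting to drop is `index.routing.allocation.include.instance_type` (inferred from the `filter` decider message above, which reports an `index.routing.allocation.include` filter on `instance_type`); the repository `restore` and snapshot `awsnap` come from the URL in the question:

```python
import json

REPOSITORY = "restore"
SNAPSHOT = "awsnap"
# Assumption: the AWS-specific allocation filter reported by the "filter" decider.
AWS_FILTER_SETTING = "index.routing.allocation.include.instance_type"


def restore_request():
    """Return (path, JSON body) for a snapshot restore that drops the AWS filter."""
    path = f"/_snapshot/{REPOSITORY}/{SNAPSHOT}/_restore"
    body = {"ignore_index_settings": [AWS_FILTER_SETTING]}
    return path, json.dumps(body)


path, body = restore_request()
print(path)
print(body)
```

The printed path and body can be sent with the same `curl -X POST ... -H 'Content-Type: application/json' -d'...'` form used above.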

Unexpected SQS message format when subscribed to an SNS topic

I'm having problems with the following design:
When my SQS subscriber receives the message, the message structure is wrong, for example:
"Type" : "Notification",
"MessageId" : "7a6789f0-02f0-5ed3-8a11-deebcd08f145",
"TopicArn" : "arn:aws:sns:us-east-2:167186109795:name_sns_topic",
"Message" : "My JSON message",
"Timestamp" : "1987-04-23T17:17:44.897Z",
"SignatureVersion" : "1",
"Signature" : "string",
"SigningCertURL" : "url",
"UnsubscribeURL" : "url",
"MessageAttributes" : {
"X-Header1" : {"Type":"String","Value":"value1"},
"X-Header2" : {"Type":"String","Value":"value2"},
"X-Header3" : {"Type":"String","Value":"value3"},
"X-HeaderN" : {"Type":"String","Value":"value4"}
The usual structure of a message received from SQS should be:
"Records": [
"messageId": "19dd0b57-b21e-4ac1-bd88-01bbb068cb78",
"receiptHandle": "MessageReceiptHandle",
"body": "Hello from SQS!",
"attributes": {
"ApproximateReceiveCount": "1",
"SentTimestamp": "1523232000000",
"SenderId": "123456789012",
"ApproximateFirstReceiveTimestamp": "1523232000001"
"messageAttributes": {},
"md5OfBody": "7b270e59b47ff90a553787216d55d91d",
"eventSource": "aws:sqs",
"eventSourceARN": "arn:{partition}:sqs:{region}:123456789012:MyQueue",
"awsRegion": "{region}"
My Java Lambda handler (example code) throws an exception because the structure of the received message is not an SQSEvent:
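import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.LambdaLogger;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
public class MyHandler implements RequestHandler<SQSEvent, String> {
    public String handleRequest(SQSEvent event, Context context) {
        LambdaLogger logger = context.getLogger();
        for (SQSEvent.SQSMessage msg : event.getRecords()) {
            logger.log("SQS message body: " + msg.getBody());
            logger.log("Get attributes: " + msg.getMessageAttributes().toString());
            msg.getMessageAttributes().forEach(
                (k, v) -> {
                    logger.log("key: " + k + ", value: " + v.getStringValue());
                });
        }
        return "Successful";
    }
}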
How can I handle the message that it is receiving?
In my opinion this isn't documented very well, but it's not bad once you figure it out.
The first thing is that I don't use the predefined Lambda event objects. I read everything into a String and take it from there. So the base of my Lambda function is:
public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
    // copy InputStream to String, avoiding 3rd party libraries
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    int length;
    while ((length = inputStream.read(buffer)) != -1) {
        result.write(buffer, 0, length);
    }
    // jsonString now holds the raw event JSON
    String jsonString = result.toString("UTF-8");
}
When you go directly from SNS to Lambda, the message looks something like this (some fields removed for the sake of length):
"Records": [
"EventSource": "aws:sns",
"EventVersion": "1.0",
"Sns": {
"Type": "Notification",
"Subject": "the message subject",
"Message": "{\"message\": \"this is the message\", \"value\": 100}",
"Timestamp": "2020-04-24T21:44:28.220Z",
"SignatureVersion": "1"
I had sent a test message in JSON with two simple fields. Using JsonPath, the innermost "message" field is read with:
String snsMessage = JsonPath.read(jsonString, "$.Records[0].Sns.Message");
String realMessage = JsonPath.read(snsMessage, "$.message");
But when it goes SNS -> SQS -> Lambda (or indeed any SNS -> SQS path), the SNS message is wrapped and escaped inside an SQS message:
"Records": [
"messageId": "ca8c53e5-8417-4479-a720-d4ecf970ca68",
"body": "{\n \"Type\" : \"Notification\",\n \"Subject\" : \"the message subject\",\n \"Message\" : \"{\\\"message\\\": \\\"this is the message\\\", \\\"value\\\": 100}\"\n}",
"attributes": {
"ApproximateReceiveCount": "1"
"md5OfBody": "6a4840230aca6a7bf7934bf191a529b8",
"eventSource": "aws:sqs"
So in this case, the value is in Records[0].body, but that contains another JSON object. I'll admit there is likely an easier way, but from what I found I had to parse three times:
String sqsBody = jsonString; // as read in the Lambda
String recordBody = JsonPath.read(sqsBody, "$.Records[0].body");
String internalMessage = JsonPath.read(recordBody, "$.Message");
// now read out of the SNS message
String theSnsMessage = JsonPath.read(internalMessage, "$.message");
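The same unwrapping can be sketched in Python with only the standard `json` module, handling both shapes shown above (direct SNS delivery and SNS -> SQS). This is an illustration rather than a port of the answer's Java; the sample event is the SQS example from this answer with the omitted fields left out, and the `message`/`value` keys belong to the test payload.

```python
import json


def sns_messages(event):
    """Yield the raw SNS "Message" string from each record of a Lambda event.

    Direct SNS -> Lambda records carry it under record["Sns"]["Message"];
    SNS -> SQS -> Lambda records carry the whole SNS envelope as a JSON
    string in record["body"], so it needs one extra json.loads.
    Records from any other source are skipped.
    """
    for record in event["Records"]:
        if record.get("eventSource") == "aws:sqs":
            envelope = json.loads(record["body"])
            yield envelope["Message"]
        elif record.get("EventSource") == "aws:sns":
            yield record["Sns"]["Message"]


sqs_event = {
    "Records": [{
        "messageId": "ca8c53e5-8417-4479-a720-d4ecf970ca68",
        "body": json.dumps({
            "Type": "Notification",
            "Subject": "the message subject",
            "Message": json.dumps({"message": "this is the message", "value": 100}),
        }),
        "eventSource": "aws:sqs",
    }]
}

for raw in sns_messages(sqs_event):
    payload = json.loads(raw)  # the application's own JSON
    print(payload["message"], payload["value"])  # this is the message 100
```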

How to get all request headers/query strings in the Serverless Framework?

I have started trying the Serverless Framework, but some points are a little confusing.
One of them is request headers/query strings.
I made a request template like this:
"apiRequestTemplate": {
"application/json": {
"httpMethod": "$context.httpMethod",
"body": "$input.json('$')",
"queryParams" : "$input.params().querystring",
"headerParams" : "$input.params().header",
"headerParamNames" : "$input.params().header.keySet()",
"contentTypeValue" : "$input.params().header.get('Content-Type')"
"requestParameters": {},
"requestTemplates": "$${apiRequestTemplate}",
With this setting, I expected to get a request like this:
"body" : {}
"contentTypeValue" : ""
"headerParamNames" : ["Accept", "Accept-Encoding", ... ],
"headerParams" : {
"Accept" : "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Encoding" : "gzip, deflate, sdch, br, Accept-Language=ja,en-US;q=0.8,en;q=0.6",
"httpMethod" : "GET",
"queryParams" : {
"category" : "Some Category"
But what I actually get is:
"body" : {}
"contentTypeValue" : ""
"headerParamNames" : "[Accept,Accept-Encoding, ... ]",
"headerParams" : "{Accept=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8", Accept-Encoding=gzip, deflate, sdch, br, Accept-Language=ja,en-US;q=0.8,en;q=0.6", ...}",
"httpMethod" : "GET",
"queryParams" : "{category=Some Category}"
This is inconvenient to handle.
I also know of a method like the one below:
"requestParameters": {},
"requestTemplates": {
"application/json": "{\"category\":\"$input.params('category')\"}"
But this is also inconvenient, because I need to specify every parameter in the configuration.
Is there any way to get all request headers / query strings as a JSON object in the Lambda function?
Edit after answer:
I tried changing s-templates.json to:
"queryParams" : "$util.parseJson($input.params().querystring)",
"headerParams" : "$util.parseJson($input.params().header)",
But the result was the same.
In the AWS documentation, what I want can be seen here:
#set($allParams = $input.params())
"params" : {
#foreach($type in $allParams.keySet())
#set($params = $allParams.get($type))
"$type" : {
#foreach($paramName in $params.keySet())
"$paramName" : "$util.escapeJavaScript($params.get($paramName))"
But I don't know how to put this into the Serverless Framework's s-templates.json.
I use the following request template. It wraps data, path, headers, params, and query into a JSON object and passes it to the function.
"application/json": {
"data": "$input.json('$')",
"path": "$context.resourcePath",
"method": "$context.httpMethod",
"headers": "{#foreach($header in $input.params().header.keySet())\"$header\": \"$util.escapeJavaScript($input.params().header.get($header))\" #if($foreach.hasNext),#end#end}",
"params": "{#foreach($param in $input.params().path.keySet())\"$param\": \"$util.escapeJavaScript($input.params().path.get($param))\" #if($foreach.hasNext),#end#end}",
"query": "{#foreach($queryParam in $input.params().querystring.keySet())\"$queryParam\": \"$util.escapeJavaScript($input.params().querystring.get($queryParam))\" #if($foreach.hasNext),#end#end}"
You can refer to the Apache Velocity Template Language documentation for a better understanding of the inner syntax, such as #foreach($header in .....).
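On the Lambda side, the `headers`, `params` and `query` values from that template may arrive as JSON-encoded strings rather than objects, since each `{...}` is wrapped in quotes. A defensive Python sketch that accepts either form; the handler and the sample event (including its `/items` path) are hypothetical, and only the field names come from the template above:

```python
import json


def as_dict(value):
    """Return value as a dict whether it arrived as an object or a JSON string."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str) and value.strip():
        return json.loads(value)
    return {}


def handler(event, context):
    # Field names follow the request template above.
    return {
        "method": event.get("method"),
        "headers": as_dict(event.get("headers")),
        "params": as_dict(event.get("params")),
        "query": as_dict(event.get("query")),
    }


sample_event = {
    "method": "GET",
    "path": "/items",
    "headers": '{"Accept": "text/html" ,"Accept-Encoding": "gzip, deflate"}',
    "params": "{}",
    "query": '{"category": "Some Category" }',
}
print(handler(sample_event, None)["query"])  # {'category': 'Some Category'}
```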
Have you tried $util.parseJson()? It takes a JSON string and turns it into an object you can work with in the template.