WSO2 CEP Siddhi filter issue

I am trying to use the Siddhi query language, but it seems I am misusing it.
I have some events with the following stream definition:
{
  'name': 'eu.ima.stat.events',
  'version': '1.1.0',
  'nickName': 'Flux event Information',
  'description': 'Details of Analytics Statistics',
  'metaData': [
    {'name': 'HostIP', 'type': 'STRING'}
  ],
  'correlationData': [
    {'name': 'ProcessType', 'type': 'STRING'},
    {'name': 'Flux', 'type': 'STRING'},
    {'name': 'ReferenceId', 'type': 'STRING'}
  ],
  'payloadData': [
    {'name': 'Timestamp', 'type': 'STRING'},
    {'name': 'EventCode', 'type': 'STRING'},
    {'name': 'Type', 'type': 'STRING'},
    {'name': 'EventInfo', 'type': 'STRING'}
  ]
}
I am just trying to filter events with a given process value and a given flux value, using a query like this one:
from myEventStream[processus == 'SomeName' and flux == 'someOtherName' ]
insert into someStream
processus, flux, timestamp
Whenever I try this, no output is generated. When I get rid of the filter
from myEventStream
insert into someStream
processus, flux, timestamp
all my events are there in the output.
What's wrong with my query?

I can see some spelling mistakes in your query. In the filter you have used an attribute called "processus", which is not in the event stream; that is why the query does not give any output. When you create a bucket in WSO2 CEP, make sure that the bucket is deployed correctly on the CEP server, and check it in the management console (CEP Buckets --> List).
In your situation, the bucket will not be deployed because of the wrong configuration, and there will also be error messages printed in the terminal where the CEP server runs. After correcting this mistake your query will run without any issue.
Regards,
Mohan

Considering Mohan's answer, either rename 'ProcessType' or change your query like this:
from myEventStream[ ProcessType == 'SomeName' and flux == 'someOtherName' ]
insert into someStream
ProcessType, flux, timestamp
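Note also that Siddhi attribute names are case-sensitive, so 'flux' and 'timestamp' should match the stream definition as well ('Flux' and 'Timestamp' above). A hedged sketch of the fully aligned query (depending on the CEP version, correlation attributes may also be exposed with a correlation_ prefix in the imported Siddhi stream, so verify the attribute names shown in the bucket editor):
from myEventStream[ ProcessType == 'SomeName' and Flux == 'someOtherName' ]
insert into someStream
ProcessType, Flux, Timestamp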

Related

AWS CloudWatch Logs Insights Nested JSON

My JSON log object is like this:
{
  "FileName": "file1.xlsx",
  "IsSuccess": false,
  "LogList": [
    {"ErrorDetail": "Text1"},
    {"ErrorDetail": "Text2"},
    {"ErrorDetail": "Text3"}
  ]
}
When I write the query in CloudWatch Logs Insights like the one below, it lists the nested JSON in a single line:
fields #timestamp, FileName, LogList.ErrorDetail
[Screenshot: Logs Insights query result showing the nested JSON on a single line]
This makes it very difficult to read, as the user needs to scroll horizontally. I want the list to be displayed vertically. How can this be achieved?

Dealing With Incoming Null Values In Cloud Data Fusion When Building a Data Pipeline

I have started trying out Google Cloud Data Fusion as a prospective ETL tool that I may finally decide to use. When building a data pipeline to fetch data from a REST API source and load it into a MySQL database, I am facing this error: 'Expected a string but was NULL at line 1 column 221. Please check the system logs for more details.' And yes, it's true: I have a field that is null in the JSON response I am seeing:
"systemanswertime": null
How do I deal with null values, since the 'string' type available in the Cloud Data Fusion Studio dropdown is not working? Are there other data types that I can use?
Below are two screenshots showing my current data pipeline structure:
[Screenshot: general view of the pipeline]
[Screenshot: view showing the mapping and the output schema]
Thank You!!
What you need to do is tell the HTTP plugin that you are expecting a null by checking the null checkbox in front of the output field on the right side. See the example below.
You might be getting this error because of how you are defining the value properties in the JSON schema. You should allow the systemanswertime parameter to be NULL.
You could try to parse the JSON value as follows:
"systemanswertime": {
  "type": [
    "string",
    "null"
  ]
}
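For context, a minimal sketch of how that nullable field would sit inside the full Avro-style output schema that Data Fusion plugins use (the surrounding record name here is just a placeholder, not something from the question):
{
  "type": "record",
  "name": "outputSchema",
  "fields": [
    { "name": "systemanswertime", "type": [ "string", "null" ] }
  ]
}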
In case you don't have access to the JSON file, you could try to use this plugin to let the HTTP source handle nullable values by dynamically substituting configurations served by an HTTP server. You will need to be able to construct an accessible HTTP endpoint that can serve content similar to:
{
  "name" : "output.schema",
  "type" : "schema",
  "value" : [
    { "name" : "id", "type" : "int", "nullable" : true },
    { "name" : "first_name", "type" : "string", "nullable" : true },
    { "name" : "last_name", "type" : "string", "nullable" : true },
    { "name" : "email", "type" : "string", "nullable" : true }
  ]
}
In case you are facing an error such as No matching schema found for union type: ["string","null"], you could try the following workaround. The root cause of this error is that some entries in the response from the API don't have all the fields they are expected to have. For example, some entries may have callerId, channel, last_channel, last data, etc., but other entries may not have last_channel or some other field from the JSON. This leads to a mismatch with the schema provided in the HTTP source, and the pipeline fails right away.
As per this, when nodes encounter null values, logical errors, or other sources of errors, you may use an error handler plugin to catch the errors. The approach is as follows:
In the HTTP source plugin, change the following:
Change the output schema to account for the custom field.
Change the JSON/XML field mapping to account for the custom field.
Change the Non-HTTP Error Handling field to Send to Error. This way, records are pushed through the error collector and the pipeline proceeds with subsequent records.
Then add an Error Collector and a sink to capture the error records.
With this method you will be able to run the pipeline and have the problematic fields detected.
Kind regards,
Manuel

Log entries API not retrieving log entries

I am trying to retrieve custom logs for a particular project in Google Cloud. I am using this API:
https://logging.googleapis.com/v2/entries:list
as per the example given in this link.
Below is the payload:
{
  "filter": "projects/projectA/logs/slow_log",
  "resourceNames": [
    "projects/projectA"
  ]
}
There is a custom log-based metric called slow_log that I created in projectA, which gathers query logs from the Cloud SQL database in that project. I also generated data before calling this API. I am able to see the data in the Stackdriver console, but I am unable to get it from the REST call.
Every time I run this API, I only get this response and nothing else:
"nextPageToken": "EAA4suKu3qnLwbtrSg8iDSIDCgEAKgYIgL7q8wVSBwibvMSMvhhglPDiiJzdjt_zAWocCgwI2buKhAYQlvTd2gESCAgLEMPV7ukCGAAgAQ"
Is there anything missing here?
How is it possible to pass a time range in this query?
Update
I changed the request as per the comment below and gave the full path of the logs; still, only the token is displayed:
{
  "filter": "projects/projectA/logs/cloudsql.googleapis.com%2Fmysql-slow.log",
  "projectIds": [
    "projectA"
  ],
  "orderBy": "timestamp desc"
}
I also ran this command from the command line:
gcloud logging read logName="projects/projectA/logs/cloudsql.googleapis.com%2Fmysql-slow.log"
and it fetches the logs in the command line, so I am not sure what I am missing in the API Explorer and Postman, where I get only the nextPageToken.
resourceNames, filter, and orderBy are mandatory; try it like this:
{
  "resourceNames": [
    "projects/projectA"
  ],
  "filter": "projects/projectA/logs/cloudsql.googleapis.com%2Fmysql-slow.log",
  "orderBy": "timestamp desc"
}
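Regarding the time range: as a hedged sketch, the Logging filter language accepts timestamp comparisons with RFC 3339 values, so the range can be appended to the filter (the logName= form mirrors the gcloud command above; the dates below are placeholders):
{
  "resourceNames": [
    "projects/projectA"
  ],
  "filter": "logName=\"projects/projectA/logs/cloudsql.googleapis.com%2Fmysql-slow.log\" AND timestamp >= \"2019-01-01T00:00:00Z\" AND timestamp <= \"2019-01-02T00:00:00Z\"",
  "orderBy": "timestamp desc"
}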

How to see progress when using Glue to export DynamoDB table

I'm trying to export every item in a DynamoDB table to S3. I found this tutorial https://aws.amazon.com/blogs/big-data/how-to-export-an-amazon-dynamodb-table-to-amazon-s3-using-aws-step-functions-and-aws-glue/ and followed the example. Basically,
# Read the entire DynamoDB table into a DynamicFrame
table = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": read_percentage,
        "dynamodb.splits": splits
    }
)

# Write the DynamicFrame out to S3 in the requested format
glueContext.write_dynamic_frame.from_options(
    frame=table,
    connection_type="s3",
    connection_options={
        "path": output_path
    },
    format=output_format,
    transformation_ctx="datasink"
)
I tested it on a tiny table in a nonprod environment and it works fine. But my DynamoDB table in production is over 400 GB, with 200 million items. I suppose it'll take a while, but I have no idea what to expect: hours, or even days? Is there any way to show progress? For example, showing a count of how many items have been processed. I don't want to blindly start this job and wait.
One way would be to enable continuous logging for your AWS Glue Job to monitor its progress.
Another way would be to trigger a Lambda function whenever a file has been stored in S3, using Amazon S3 event notifications.
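As a hedged sketch of the first suggestion (continuous logging): it can be turned on through Glue's special job parameters when starting the run. The job name below is hypothetical, and the two argument names are the documented Glue special parameters as I recall them, so verify them against the Glue docs:
import boto3

glue = boto3.client("glue")

response = glue.start_job_run(
    JobName="dynamodb-export-job",  # hypothetical job name
    Arguments={
        # Stream driver/executor logs to CloudWatch while the job runs
        "--enable-continuous-cloudwatch-log": "true",
        # Emit job metrics to CloudWatch as well
        "--enable-metrics": "true",
    },
)
print("Started run:", response["JobRunId"])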
Did you try the custom waiter class from the AWS docs?
For instance, a custom waiter for a Glue job should look something like this:
class JobCompleteWaiter(CustomWaiter):
    def __init__(self, client):
        super().__init__(
            "JobComplete",
            "get_job_run",
            "JobRun.JobRunState",
            {"SUCCEEDED": WaitState.SUCCEEDED, "FAILED": WaitState.FAILED},
            client,
            max_tries=100,
        )

    def wait(self, JobName, RunId):
        self._wait(JobName=JobName, RunId=RunId)
According to the boto3 docs, you should expect one of the following possible states for a job run: 'STARTING' | 'RUNNING' | 'STOPPING' | 'STOPPED' | 'SUCCEEDED' | 'FAILED' | 'TIMEOUT'.
So I chose to check whether the state was SUCCEEDED or FAILED.
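A hedged usage sketch of that waiter (the job name is hypothetical, and CustomWaiter/WaitState are the helper classes from the AWS docs example mentioned above):
import boto3

glue = boto3.client("glue")

# Start the export job, then block until it reaches SUCCEEDED or FAILED
run = glue.start_job_run(JobName="dynamodb-export-job")  # hypothetical job name
waiter = JobCompleteWaiter(glue)
waiter.wait(JobName="dynamodb-export-job", RunId=run["JobRunId"])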

How to parse mixed text and JSON log entries in AWS CloudWatch for Log Metric Filter

I am trying to parse log entries which are a mix of text and JSON. The first line is a text representation and the next lines are the JSON payload of the event. One possible example is:
2016-07-24T21:08:07.888Z [INFO] Command completed lessonrecords-create
{
  "key": "lessonrecords-create",
  "correlationId": "c1c07081-3f67-4ab3-a5e2-1b3a16c87961",
  "result": {
    "id": "9457ce88-4e6f-4084-bbea-14fff78ce5b6",
    "status": "NA",
    "private": false,
    "note": "Test note",
    "time": "2016-02-01T01:24:00.000Z",
    "updatedAt": "2016-07-24T21:08:07.879Z",
    "createdAt": "2016-07-24T21:08:07.879Z",
    "authorId": null,
    "lessonId": null,
    "groupId": null
  }
}
For these records I am trying to define a Log Metric Filter to a) match the records and b) select data or dimensions if possible.
According to the AWS docs, the JSON pattern should look like this:
{ $.key = "lessonrecords-create" }
However, it does not match anything. My guess is that this is because of the mix of text and JSON in a single log entry.
So, the questions are:
1. Is it possible to define a pattern that will match this log format?
2. Is it possible to extract dimensions, values from such a log format?
3. Help me with a pattern to do this.
If you set up the metric filter in the way that you have defined, the test will not register any matches (I have also had this issue); however, when you deploy the metric filter it will still register matches (at least mine did). Just keep in mind that there is no way (as far as I am aware) to run this metric filter backwards, i.e. it will only capture data from when it is created. [If you're trying to get stats on past data, you're better off using Logs Insights queries.]
I am currently experimenting with different parse statements to try to extract data (it's also a mix of JSON and text); this thread MAY help you (it didn't for me): Amazon CloudWatch Logs Insights with JSON fields.
UPDATE!
I have found a way to parse the text, but it's a little bit clunky. If you export your CloudWatch logs to SumoLogic using a Lambda function, their search tool allows for MUCH better log manipulation and lets you parse JSON fields (if you treat the entire entry as text). SumoLogic is also really helpful because you can just extract your search results as a CSV. For my purposes, I parse the entire log message in SumoLogic, extract all the logs as a CSV, and then use regex in Python to filter through and extract the values I need.
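A minimal sketch of that last step (filtering the exported CSV with Python). It assumes the export has a column named "message" holding the raw log entry; that column name, and the file name, are hypothetical:
import csv
import json
import re

json_part = re.compile(r"\{.*\}", re.DOTALL)  # grab the JSON payload of the mixed entry

with open("exported_logs.csv", newline="") as f:
    for row in csv.DictReader(f):
        match = json_part.search(row["message"])  # "message" column name is an assumption
        if not match:
            continue
        payload = json.loads(match.group(0))
        if payload.get("key") == "lessonrecords-create":
            print(payload["correlationId"], payload["result"]["status"])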
Let's say you have the following log:
2021-09-29 15:51:18,624 [main] DEBUG com.company.app.SparkResources - AUDIT : {"user":"Raspoutine","method":"GET","pathInfo":"/analysis/123"}
You can parse it like this to be able to handle the part after "AUDIT : " as JSON:
fields #message
| parse #message "* [*] * * - AUDIT : *" as timestamp, thread, logLevel, clazz, msg
| filter ispresent(msg)
| filter method = "GET" # You can use fields which are contained in the JSON String of 'msg' field. Do not use 'msg.method' but directly 'method'
The fields contained in your isolated/parsed JSON field are automatically added as fields usable in the query.
You can use CloudWatch Events for this purpose (a.k.a. Subscription Filters). What you will need to do is define a CloudWatch rule which uses an expression statement to match your logs.
Here, I will let you do all the reading:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/Create-CloudWatch-Events-Scheduled-Rule.html
:)
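As a hedged sketch of attaching such a subscription filter with boto3 (the log group and destination ARN are hypothetical; the pattern is a plain-text term match like the string-based filter suggested in another answer below, since the JSON pattern syntax may not match these mixed text-and-JSON entries):
import boto3

logs = boto3.client("logs")

logs.put_subscription_filter(
    logGroupName="/my/app/log-group",  # hypothetical log group
    filterName="lessonrecords-create",
    # Quoted-term match copied from the string-based filter suggested further down
    filterPattern=r'"\"key\":\"lessonrecords-create\""',
    # Hypothetical Lambda destination; it must grant CloudWatch Logs permission to invoke it
    destinationArn="arn:aws:lambda:us-east-1:123456789012:function:process-lessonrecords",
)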
Split the message into 3 fields and the 3rd field will be valid JSON. I think in your case it would be:
fields #timestamp, #message
| parse #message '[] * {"*"}' as field1, field2, field3
| limit 50
field3 is the valid JSON.
[INFO] will be the first field.
You can search the JSON string representation, which is not as powerful.
For your example,
instead of { $.key = "lessonrecords-create" }
try "\"key\":\"lessonrecords-create\"".
This filter is not semantically identical to your requirement, though. It will also match events where key is not at the root of the JSON.
You can use the Fluentd agent to send logs to CloudWatch. Create a custom grok pattern based on your metric filter.
Steps:
Install the Fluentd agent on your server.
Install the fluent-plugin-cloudwatch-logs and fluent-plugin-grok-parser plugins.
Write your custom grok pattern based on your log format (a minimal configuration sketch follows below).
Please refer to this blog for more information.
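A minimal configuration sketch for those steps, tailored to the text-prefix-plus-JSON entries above. The parameter names are recalled from the fluent-plugin-grok-parser and fluent-plugin-cloudwatch-logs READMEs, and the file paths, tag, region, and log group/stream names are hypothetical, so treat all of it as an assumption to verify:
<source>
  @type tail
  path /var/log/myapp/app.log              # hypothetical application log file
  pos_file /var/log/td-agent/app.log.pos
  tag myapp.audit
  <parse>
    @type grok
    # Text prefix parsed by grok; the remainder (the JSON payload) lands in "payload"
    grok_pattern %{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\] %{GREEDYDATA:payload}
  </parse>
</source>

<match myapp.audit>
  @type cloudwatch_logs
  region us-east-1                         # hypothetical region
  log_group_name myapp-audit               # hypothetical group and stream names
  log_stream_name myapp-audit-stream
  auto_create_stream true
</match>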