How to format date in Logstash Configuration - regex

I am using Logstash to parse log entries from an input log file.
Log line:
TID: [0] [] [2016-05-30 23:02:02,602] INFO {org.wso2.carbon.registry.core.jdbc.EmbeddedRegistryService} - Configured Registry in 572ms {org.wso2.carbon.registry.core.jdbc.EmbeddedRegistryService}
Grok Pattern:
TID:%{SPACE}\[%{INT:SourceSystemId}\]%{SPACE}\[%{DATA:ProcessName}\]%{SPACE}\[%{TIMESTAMP_ISO8601:TimeStamp}\]%{SPACE}%{LOGLEVEL:MessageType}%{SPACE}{%{JAVACLASS:MessageTitle}}%{SPACE}-%{SPACE}%{GREEDYDATA:Message}
My grok pattern is working fine. I am sending these parsed entries to a REST-based API that I built myself.
Configuration:
output {
  stdout { }
  http {
    url => "http://localhost:8086/messages"
    http_method => "post"
    format => "json"
    mapping => ["TimeStamp","%{TimeStamp}","CorrelationId","986565","Severity","NORMAL","MessageType","%{MessageType}","MessageTitle","%{MessageTitle}","Message","%{Message}"]
  }
}
In the current output, I am getting the date as it is parsed from the logs:
Current Output:
{
  "TimeStamp": "2016-05-30 23:02:02,602"
}
Problem Statement:
The problem is that my API does not expect the date in this format; it expects the generic XSD dateTime format, as shown below:
Expected Output:
{
  "TimeStamp": "2016-05-30T23:02:02:602"
}
Can somebody please guide me on what changes I need to make in my filter or output mapping to achieve this?

In order to transform
2016-05-30 23:02:02,602
to the XSD datetime format
2016-05-30T23:02:02.602
you can simply add a mutate/gsub filter to replace the space character with a T and the comma with a period:
filter {
  mutate {
    gsub => [
      "TimeStamp", "\s", "T",
      "TimeStamp", ",", "."
    ]
  }
}
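Note that the gsub runs on the TimeStamp field produced by grok, so it must appear after the grok filter in the pipeline. A minimal combined filter section using the grok pattern from the question (a sketch, not a tested configuration) would be:
filter {
  grok {
    match => [ "message", "TID:%{SPACE}\[%{INT:SourceSystemId}\]%{SPACE}\[%{DATA:ProcessName}\]%{SPACE}\[%{TIMESTAMP_ISO8601:TimeStamp}\]%{SPACE}%{LOGLEVEL:MessageType}%{SPACE}{%{JAVACLASS:MessageTitle}}%{SPACE}-%{SPACE}%{GREEDYDATA:Message}" ]
  }
  mutate {
    gsub => [
      "TimeStamp", "\s", "T",
      "TimeStamp", ",", "."
    ]
  }
}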

Related

How do I write a query for JSON in Logs Insights?

I have a simple message in JSON form, like below, in one of my log groups. The query I use is {$.level = "INFO"}. This doesn't bring up any results. What could be the problem? Can somebody help please?
{
  "level": "INFO",
  "location": "lambda_handler:31",
  "message": {
    "msg": "abc",
    "event": {
      "Records": [
        {
          ...
        }]
    }
  }
}
CloudWatch Logs Insights now allows filtering based on JSON fields.
The syntax is as follows:
Filter based on field 'level'
filter level = 'INFO'
| display level, @message
Filter based on nested fields
filter message.msg != '123'
| display message.msg, @message
Documentation:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html#CWL_AnalyzeLogData-discoverable-JSON-logs
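(My addition, not part of the original answer.) If you need to run the same kind of query programmatically, the boto3 CloudWatch Logs client exposes Logs Insights via start_query and get_query_results. A rough sketch, with a placeholder log group name and time window:
import time
import boto3

logs = boto3.client("logs")

# Start a Logs Insights query against a hypothetical log group
start = logs.start_query(
    logGroupName="/aws/lambda/my-function",  # placeholder log group name
    startTime=int(time.time()) - 3600,       # last hour
    endTime=int(time.time()),
    queryString="filter level = 'INFO' | display level, @message",
)

# Poll until the query completes, then print the matching rows
results = logs.get_query_results(queryId=start["queryId"])
while results["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    results = logs.get_query_results(queryId=start["queryId"])

for row in results["results"]:
    print(row)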

AWS Sagemaker Monitoring Error: Encoding mismatch

I am trying to run model monitoring on a model in AWS SageMaker. The monitoring jobs are failing with "Encoding mismatch: Encoding is JSON for endpointInput, but Encoding is base64 for endpointOutput. We currently only support the same type of input and output encoding at the moment."
The encoding is JSON for endpointInput and base64 for endpointOutput, but JSON is expected for both input and output.
I tried using json_content_types in the DataCaptureConfig, but the endpointOutput is still base64 encoded.
Below is the DataCaptureConfig I used in the deployment:
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    json_content_types='application/json',
    destination_s3_uri=MY_BUCKET
)
My capture files from the model look something like this:
{
  "captureData": {
    "endpointInput": {
      "observedContentType": "application/json",
      "mode": "INPUT",
      "data": "{ === json data ===}",
      "encoding": "JSON"
    },
    "endpointOutput": {
      "observedContentType": "*/*",
      "mode": "OUTPUT",
      "data": "{====base 64 encoded output ===}",
      "encoding": "BASE64"
    }
  },
  "eventMetadata": {
    === some metadata ===
  }
}
I have observed that the output content type is not being recognized as application/json.
So I need a workaround/procedure to get the output in JSON-encoded form.
Please help me get JSON encoding for both the input and output data.
A similar issue is reported here, but there is no response.
I came across a similar issue earlier while invoking the endpoint using the boto3 sagemaker-runtime client. Try adding the 'Accept' request parameter to the invoke_endpoint call with the value 'application/json'.
For more details, refer to https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html#API_runtime_InvokeEndpoint_RequestSyntax
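(My addition.) As a rough sketch, with a placeholder endpoint name and a hypothetical payload, the call would look like this:
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",        # placeholder endpoint name
    ContentType="application/json",    # content type of the request body
    Accept="application/json",         # ask SageMaker to return JSON, so the capture is not base64
    Body=json.dumps({"instances": [[1.0, 2.0, 3.0]]}),  # hypothetical payload
)

result = json.loads(response["Body"].read())
print(result)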
While deploying the endpoint, please set the CaptureContentTypeHeader in the DataCaptureConfig and map the output appropriately to either JsonContentTypes or CsvContentTypes.
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CaptureContentTypeHeader.html
Doing this sets the encoding accordingly. If it is not set, the default is base64 encoding, hence the issue.
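(My addition.) With the SageMaker Python SDK this would look roughly like the sketch below; note that json_content_types takes a list, and it maps to CaptureContentTypeHeader.JsonContentTypes in the API. The bucket URI and deploy arguments are placeholders:
from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://my-bucket/data-capture",  # placeholder bucket/prefix
    json_content_types=["application/json"],           # a list, not a bare string
)

# predictor = model.deploy(
#     initial_instance_count=1,
#     instance_type="ml.m5.large",
#     data_capture_config=data_capture_config,
# )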

Not able to get desired search results in Elasticsearch search API

I have a field "xyz" on which I want to search. The type of the field is keyword. The different values of the field "xyz" are -
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Now for the following query -
{
  "query": {
    "query_string" : {
      "query" : "(xyz:(\"a/b/c\"*))"
    }
  }
}
I should only get these two results -
a/b/c/d
a/b/c/e
but I get all four results -
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Edit -
Actually, I am not directly querying Elasticsearch. I am using this API https://atlas.apache.org/api/v2/resource_DiscoveryREST.html#resource_DiscoveryREST_searchWithParameters_POST which creates the above-mentioned query for Elasticsearch, so I don't have much control over the Elasticsearch query_string. What I can change is the Elasticsearch analyzer for this field, or its type.
You'll need to let the query_string parser know you'll be using regex so wrap the whole thing in /.../ and escape the forward slashes:
{
  "query": {
    "query_string": {
      "query": "xyz:/(a\\/b\\/c\\/.*)/"
    }
  }
}
Or, you might as well use a regexp query:
{
  "query": {
    "regexp": {
      "xyz": "a/b/c/.*"
    }
  }
}
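(My addition, not from the original answer.) Since xyz is a keyword field, a prefix query is another option that avoids regex escaping altogether:
{
  "query": {
    "prefix": {
      "xyz": "a/b/c/"
    }
  }
}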

WSO2DSS 3.5.1 - Issue in JSON response from a RESTful service

I have defined a RESTful service with nested queries. The output mapping is defined in XML. I get a proper response as XML, but if I request a JSON response using Accept: application/json, I get
{
  "Fault": {
    "faultcode": "soapenv:Server",
    "faultstring": "Error while writing to the output stream using JsonWriter",
    "detail": ""
  }
}
I was getting the exception below in 3.5.0, and I found a JIRA saying it is fixed in 3.5.1. So I tried 3.5.1; now I am not getting the exception below, but I get the same output.
javax.xml.stream.XMLStreamException: Invalid Staring element
Please note I have also tried the escapeNonPrintableChar="true" option in my queries, but to no avail. The strange thing is that it works for other data sets; just one particular data set produces this output.
I have changed the JSON formatters as below and got it to work, but there is a problem with that.
<messageFormatter contentType="application/json" class="org.apache.axis2.json.JSONMessageFormatter"/>
<!--messageFormatter contentType="application/json" class="org.apache.axis2.json.gson.JsonFormatter" / -->
<messageBuilder contentType="application/json" class="org.apache.axis2.json.JSONOMBuilder"/>
<!--messageBuilder contentType="application/json" class="org.apache.axis2.json.gson.JsonBuilder" /-->
If I use the above formatter, null values are not represented properly. For example, I get
"Person": {
"Name": {
"#nil": "true"
}
but I want it as (like the other JSON formatter used to give)
"Person": {
"Name": null
}
Any help, please? Is there still a bug in this area?
When you are creating the query for your output response, you define the format in which you want to receive the response; you can select XML or JSON. In the case you mention, you can select the JSON option and then select Generate Response, which creates this JSON structure:
{
  "entries": {
    "entry": [
      {
        "field1": "$column1",
        "field2": "$column2"
      }
    ]
  }
}
Then you can modify the response as needed with your own fields. Here is an example of how I use it in my query:
{
  "Pharmacies": {
    "Pharmacy": [
      {
        "ID": "$Id",
        "Descripcion": "$Desc",
        "Latitude": "$Latitude",
        "Longitude": "$Longitude",
        "Image": "$Image"
      }
    ]
  }
}
The values with "$" correspond to the column names of the query.
Regards

Good resources for grok patterns for Python log files

I want to use Logstash for parsing Python log files; where can I find resources that help me do that? For example:
20131113T052627.769: myapp.py: 240: INFO: User Niranjan Logged-in
From this I need to capture the time information and also some of the data.
I had exactly the same problem/need. I couldn't really find a solution to this. No available grok patterns really matched the Python logging output, so I simply went ahead and wrote a custom grok pattern, which I've naively added to patterns/grok-patterns.
DATESTAMP_PYTHON %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND},%{INT}
The Logstash configuration I wrote gave me nice fields:
#timestamp
level
message
I added an extra field called pymodule, which should show you the Python module that produced the log entry.
My Logstash configuration file looks like this (ignore the sincedb_path; it is simply a way of forcing Logstash to read the entire log file every time you run it):
input {
  file {
    path => "/tmp/logging_file"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => [ "message", "%{DATESTAMP_PYTHON:timestamp} - %{DATA:pymodule} - %{LOGLEVEL:level} - %{GREEDYDATA:logmessage}" ]
  }
  mutate {
    rename => [ "logmessage", "message" ]
  }
  date {
    timezone => "Europe/Luxembourg"
    locale => "en"
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss,SSS" ]
  }
}
output {
  stdout {
    codec => json
  }
}
Please note that I give absolutely no guarantee that this is the best or even a slightly acceptable solution.
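(My addition, assuming a standard logging setup.) For reference, a format string along these lines produces records that the grok match above can parse, since asctime defaults to "YYYY-MM-DD HH:MM:SS,mmm":
import logging

logging.basicConfig(
    filename="/tmp/logging_file",
    level=logging.DEBUG,
    # "%(asctime)s" matches DATESTAMP_PYTHON; the " - " separators match the grok pattern
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

logging.getLogger("myapp").info("User Niranjan Logged-in")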
Our Python log file has a slightly different format:
[2014-10-08 19:05:02,846] (6715) DEBUG:Our debug message here
So I was able to create a configuration file without any need for special patterns:
input {
  file {
    path => "/path/to/python.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\] \(%{DATA:pyid}\) %{LOGLEVEL:level}\:%{GREEDYDATA:logmessage}" ]
  }
  mutate {
    rename => [ "logmessage", "message" ]
  }
  date {
    timezone => "Europe/London"
    locale => "en"
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss,SSS" ]
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
  stdout {
    codec => rubydebug
  }
}
And this seems to work fine.