fluentd multiline parser in parser filter - regex

I'm trying to parse multiline logs from my applications in fluentd on kubernetes.
I currently have the following filter dropped into my fluentd container:
<filter kubernetes.**>
@type parser
key_name log
emit_invalid_record_to_error false # do not fail on non-matching log messages
reserve_data true # keep the log key (needed for non-matching records)
<parse>
@type multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(?<level>\S+)(?:\s+\[[^\]]*\])?\s+(?<pid>\d+)\s+---\s+\[\s*(?<thread>[^\]]+)\]\s+(?<class>\S+)\s+:\s+(?<message>.*)/
time_format %Y-%m-%d %H:%M:%S.%L
types pid:integer
</parse>
</filter>
This filter should parse spring boot style logs (which is not that important, as it is not working for my other filters as well).
Single-line logs are parsed fine! All capture groups are detected, the time format is applied, and pid is stored as an integer. But for a multi-line log statement, each continuation line is left as it is and saved as its own entry.
I got the idea for this parser from the fluentd documentation: https://docs.fluentd.org/parser/multiline
The documentation says currently in_tail plugin works with multiline but other input plugins do not work with it.
The container I'm using uses the in_tail plugin to get the logs. But I'm using the parser inside a filter. Not sure if this might be the problem? In the documentation the parser filter plugin (https://docs.fluentd.org/filter/parser) just links to the Parser Plugin Overview (https://docs.fluentd.org/parser) without mentioning anything about single parsers not working.
It would be great if someone could point me in the right direction!
Thanks in advance!

I recently ran into exactly the same issue and couldn't find an obvious solution, so I had to figure it out myself. It is exactly as the documentation says: the multiline parser you mention works only in the parse section of an input plugin (in_tail only). Unfortunately, it does not work in the parser filter plugin.
But for me this plugin helped:
https://github.com/fluent-plugins-nursery/fluent-plugin-concat
You just have to add one filter section above your main one to do the concatenation. My example looks exactly like this (in my logs a timestamp marks the start of a real new log entry; a line without a timestamp is always part of the stack trace of the preceding error):
<filter XYZ.**>
@type concat
key log
multiline_start_regexp /\d{4}-\d{1,2}-\d{1,2}/
</filter>
<filter>
# here the original filter
</filter>
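To make the grouping rule concrete, here is a minimal Python sketch of the idea behind multiline_start_regexp: a line that matches the pattern opens a new record, and every other line is appended to the record before it. This is not the plugin's code. The function name concat_records, the newline separator, and the sample lines are assumptions made for illustration, and the real plugin has more to it (buffering and flushing, for example) that is left out here.

```python
import re

# Same start-of-record pattern as multiline_start_regexp in the filter above.
START = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")


def concat_records(lines, start=START, separator="\n"):
    """Group raw lines into records; a line matching `start` opens a new record."""
    records, current = [], []
    for line in lines:
        # A matching line closes the record collected so far (if any).
        if start.search(line) and current:
            records.append(separator.join(current))
            current = []
        current.append(line)
    if current:
        records.append(separator.join(current))
    return records


# Hypothetical input: one normal event, then an error with a two-line stack trace.
sample = [
    "2020-01-01 10:00:00.000  INFO 1 --- [main] c.e.App : started",
    "2020-01-01 10:00:01.000 ERROR 1 --- [main] c.e.App : failed",
    "java.lang.IllegalStateException: boom",
    "\tat c.e.App.main(App.java:10)",
]
records = concat_records(sample)
```

With this input, records holds two entries, and the second one carries the error line together with its stack trace, which is the shape the parser filter needs to see in the log key.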


Regex Filter Error in google_logging_project_sink Terraform Script

I'm trying to create a Cloud Logging Sink with Terraform, that contains a regex as part of the filter.
textPayload=~ '^The request'
There have been many errors around the format of the regex, and I can't see anything in the documentation or other SO questions on how to properly create the script. Sinks are also not a valid option for a script generated by Terraformer, so I can't export the filter created via the UI.
When including the regex as a standard string, the following error is thrown.
Unparseable filter: regular expressions must begin and end with '"' at line 1, column 106, token ''^The',
And when the regex is included as a variable, with or without slash escapes:
variable "search" { default = "/^The request/" }
the following error is thrown:
Unparseable filter: unrecognized node at token 'MEMBER'
I'd be grateful for any tips, or links to documentation on how I would be able to include a regex as part of a logging filter.
The problem is not with your query, which is a valid query for searching Google Cloud Logging. I think it comes from using another tool (Terraform) to deploy everything: Terraform transforms your string values and passes them to GCP as JSON. We ran into a similar issue and it caused me some headaches as well. What we came up with was the following:
"severity>=ERROR AND NOT protoPayload.@type=\"type.googleapis.com/google.cloud.audit.AuditLog\" AND NOT (resource.type=\"cloud_scheduler_job\" AND jsonPayload.status=\"UNKNOWN\")"
Applying this logic to your query:
filter = "textPayload=~\"^The request\""
Another option is to exclude the quotes:
filter = "textPayload=~^The request"
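To see why the escaped form is needed, here is a small Python sketch that separates the filter text the Logging API should receive from the quoted string literal you type in the .tf file. The helper regex_filter is hypothetical and not part of any Google or Terraform library, and escaping embedded double quotes with a backslash is my assumption about how the filter language treats them.

```python
import json


def regex_filter(field, pattern):
    """Build `field=~"pattern"` (hypothetical helper).

    Assumption: a double quote inside the pattern is escaped with a backslash.
    """
    escaped = pattern.replace('"', '\\"')
    return '{}=~"{}"'.format(field, escaped)


log_filter = regex_filter("textPayload", "^The request")

# The filter text itself: the regex is wrapped in double quotes.
print(log_filter)

# The same text written as a quoted string literal. For this input the JSON
# form is identical to the escaped value in the first Terraform line above.
print(json.dumps(log_filter))
```

The first print shows textPayload=~"^The request", which is what the error message about regular expressions beginning and ending with '"' is asking for. The second print shows that text with the inner quotes escaped, as in the answer.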

SoapUI xPath match tests

I'm writing some tests for my web service right now and can't find much information about the XPath Match and Contains assertions. I'm looking for examples as well.
1) For example, I would like to check whether a date has the format YYYY-MM-DD.
Do I have to write a regular expression in the expected result?
http://prntscr.com/jhlxml
2) How can I check if answer equals to one of allowed values (using xsd enumeration)?
http://prntscr.com/jhm07g
If you have control over the WSDL, those sorts of format and simple content validations can be built into the XML Schema within your WSDL that defines your response messages. Then, just use a Schema Compliance assertion on your test step.
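For reference, the two checks from the question amount to a pattern test and a set-membership test. The Python sketch below only illustrates that logic outside SoapUI; it is not SoapUI or XSD syntax, and the names is_iso_date, is_allowed and the values in ALLOWED_STATUSES are made up for the example.

```python
import re
from datetime import datetime

# Shape check for YYYY-MM-DD: four digits, two digits, two digits.
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Hypothetical list of allowed values, standing in for an enumeration.
ALLOWED_STATUSES = {"NEW", "ACTIVE", "CLOSED"}


def is_iso_date(text):
    """True if text looks like YYYY-MM-DD and is a real calendar date."""
    if not DATE_SHAPE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_allowed(value, allowed=ALLOWED_STATUSES):
    """True if value is one of the allowed enumeration values."""
    return value in allowed
```

A regex alone accepts strings such as 2018-13-01, which is why the sketch also parses the date. Schema validation has the advantage that both rules live in one place next to the message definition.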

Grok Parse Failure on Custom Log Format and regex in logstash

I have a custom log format. I am new to it, so I am trying to figure out how it works. It is not getting parsed in Logstash. Can someone help identify the issue?
Logformat is as follows
{u'key_id': u'1sdfasdfvaa/sd456dfdffas/zasder==', u'type': u'AUDIO'}, {u'key_id': u'iu-dsfaz+ka/q1sdfQ==', u'type': u'HD'}], u'model': u'Level1', u'license_metadata': {u'license_type': u'STREAMING THE SET', u'request_type': u'NEW', u'content_id': u'AAAA='}, u'message_type': u'LICENSE', u'cert_serial_number': u'AAAASSSSEERRTTYUUIIOOOasa='}
I need to get it parsed in Logstash and then store it in Elasticsearch.
The problem is that none of the existing grok patterns handle it, and I am not familiar with writing custom regex configuration.
Alain's comment may be useful to you. If that log is, in fact, coming in as JSON, you may want to look at the JSON Filter to automatically parse a JSON message into an Elasticsearch-friendly format, or use the JSON Codec in your input.
If you want to stick with grok, a great resource for building custom grok patterns is Grok Constructor.
It seems like you're dumping a json hash from python 2.x to a logfile, and then trying to parse it from logstash.
First - Fix your json format and encoding:
Your file does not contain correctly generated JSON strings. My recommendation is to fix this in your application before trying to consume the data from Logstash; otherwise you will have to resort to some tricks on the Logstash side:
import json
# Disable the ASCII-only default and encode to UTF-8
js_string = json.dumps(u"someCharactersHere", ensure_ascii=False).encode('utf8')
# Validate that the new string is correct
print(js_string)
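To illustrate why the logged line is not JSON, and what the application-side fix changes, here is a short Python sketch using one of the dictionaries from the sample log. It assumes the line was produced by printing a Python dict directly; ast.literal_eval is used here only to recover the data for the demonstration, and the better fix is still to log json.dumps(record) in the application.

```python
import ast
import json

# One dictionary from the sample log, as a Python 2 program would print it.
logged = "{u'key_id': u'iu-dsfaz+ka/q1sdfQ==', u'type': u'HD'}"

# Single quotes and u'' prefixes are Python literal syntax, not JSON.
try:
    json.loads(logged)
    is_json = True
except ValueError:
    is_json = False

# literal_eval parses Python literals without executing arbitrary code.
record = ast.literal_eval(logged)

# This is what the application should write to the log file instead.
as_json = json.dumps(record, sort_keys=True)
print(as_json)  # {"key_id": "iu-dsfaz+ka/q1sdfQ==", "type": "HD"}
```

Once the application writes the json.dumps form, the line can be consumed by a JSON parser as is.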
Second - Use the Logstash JSON filter
Grok is a module intended to parse any kind of text using regular expressions. Every expression is captured into a variable, and those variables can be converted to event fields. You could do it that way, but it would be much more complex and prone to errors.
Your input already has a format (JSON), so you can make use of the Logstash JSON filter. It will do all the heavy lifting for you by converting the JSON structure into fields:
filter {
  json {
    # this is your default input; you shouldn't need to touch it
    source => "message"
    # you can map the result into a variable; simply uncomment the
    # following:
    # target => "doc"
    # note: if you don't use the target option, the filter will try to
    # map the JSON string into fields at the root of your event
  }
}
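The effect of source and target can be pictured with a small Python emulation. This is only a sketch of the behaviour described in the comments above, not Logstash code; the function json_filter and the sample event are hypothetical, and the _jsonparsefailure tag mirrors what the filter adds by default when parsing fails.

```python
import json


def json_filter(event, source="message", target=None):
    """Emulate the idea of the JSON filter on a dict-shaped event (sketch)."""
    result = dict(event)
    try:
        parsed = json.loads(event[source])
    except (KeyError, ValueError):
        parsed = None
    if target is not None and parsed is not None:
        # With a target, the parsed structure is stored under that field.
        result[target] = parsed
    elif isinstance(parsed, dict):
        # Without a target, the parsed keys land at the root of the event.
        result.update(parsed)
    else:
        # Parsing failed (or the result cannot be merged): tag the event.
        result["tags"] = list(event.get("tags", [])) + ["_jsonparsefailure"]
    return result


# Hypothetical event whose message field holds a JSON string.
event = {"host": "app-1", "message": '{"message_type": "LICENSE", "model": "Level1"}'}
at_root = json_filter(event)
under_doc = json_filter(event, target="doc")
```

In at_root the keys message_type and model sit next to host, while in under_doc they are nested under doc, which helps when the JSON keys could clash with existing event fields.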
Hope it helps,

Jenkins Groovy to extract Regex from the Current build log and call REST API?

I would like to add a build step that summarizes my build, using Groovy to read the build log that was generated to date.
I've seen several other questions on SO about related topics, but not all of the answers run. I'm a bit confused by the API docs, and overall I can't seem to get this exact thing running.
Below is the code/resultant failure I have currently.
I have a few questions, if it is OK to put them all together here:
1. Is it safe to test things in the console window? Or, stated differently, when can it be that something works in the /script Groovy console editor window, but it will fail as a Groovy build step? (I think the API differs for the two but I'm not clear how.)
2. Is there a repo anywhere of Groovy Jenkins script examples?
3. How can I do the following?
Read the console log.
Parse it with regex for words of interest, eg "step#2 success".
Rearrange those words into a nice string with some newlines.
Call our internal REST API to submit the results.
thank you so much!
Anne
//Groovy command from SO Post#23139654
def log = manager.build.logFile.text
def summary = log =~ /(?ms)(TEST SUMMARY.*?failures)/
//From there you can extract the matches or as in my case further parse the match:
def total = summary[0] =~ /\d+ tests/
The result includes:
ERROR: Build step failed with exception
groovy.lang.MissingPropertyException: No such property: manager for class: Script1
Here are my answers.
1. The Groovy console and the Groovy build step differ, as per "Jenkins Packages on Groovy Classpath?"
2. Examples are available from a user's 2011 blog post and, undated, on the wiki: "Jenkins, Groovy System scripts (and Maven) | Code Snippets" https://mriet.wordpress.com/2011/06/23/groovy-jenkins-system-script/
and https://wiki.jenkins-ci.org/display/JENKINS/Jenkins+Script+Console
3. To parse the console log and grep the output, simply enter the following into the web box provided as input for the Editable Email plugin [4] post-build step.
Do NOT use dollar-curlyBrace syntax: use simple dollar-variable or dollar-paren syntax as shown here which is my first crack at the 'Default Content'.
STATUS=$BUILD_STATUS
$DEFAULT_CONTENT
GIT Changelog , Revision = $GIT_REVISION
$CHANGES
LOG-SNIPPETS: Regex Hits/Rules for words that give Unit Test Summaries, Error, Failure, etc =
$BUILD_LOG_REGEX( regex="^.*?BUILD FAILED.*?$", linesBefore=0, linesAfter=10, maxMatches=5, showTruncatedLines=false, escapeHtml=true)
3B. To call the REST plugin, that would need to be done in a separate step, so for now I did not do that.
I had not properly understood the Email-Ext (aka "Editable Email Notification") plugin - that is why I was trying to do this directly in Groovy.
[4] Email-ext plugin (Jenkins Wiki): https://wiki.jenkins-ci.org/display/JENKINS/Email-ext+plugin
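For completeness, the parse-and-summarize steps from the question can be sketched outside Jenkins in plain Python. This does not use any Jenkins API: the log text, the "failure" status word, the function names and the URL are all hypothetical, and the request is only prepared, not sent.

```python
import json
import re
import urllib.request


def summarize(log_text):
    """Collect 'step#N <status>' hits and format them one per line."""
    # Assumption: steps are logged as 'step#<number> success' or '... failure'.
    hits = re.findall(r"step#(\d+)\s+(success|failure)", log_text)
    return "\n".join("step {}: {}".format(num, status) for num, status in hits)


def build_request(summary, url):
    """Prepare (but do not send) a JSON POST carrying the summary."""
    body = json.dumps({"summary": summary}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical console log text and endpoint.
log_text = "checkout\nstep#1 success\ncompiling\nstep#2 success\nstep#3 failure\n"
summary = summarize(log_text)
request = build_request(summary, "http://build-api.example.invalid/summaries")
```

Sending would be a call to urllib.request.urlopen(request). In Jenkins itself the same steps would have to be rewritten in Groovy against whichever log accessor is available in that context.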

How will I filter out only errors in Jenkins-email-ext, BUILD_LOG_REGEX?

Currently I'm using BUILD_LOG_REGEX in the Jenkins Editable Email Notification to get a log of the errors via email. But I get a lot of junk, and I want the email to contain only the errors. Any help?
Your question is rather non-specific. As Juuso Ohtonen notes in a comment, what you do depends heavily on what can usually be found in your log. Here's an example of what we use in one of our jobs. It is rather generic (if not to say minimalistic):
${BUILD_LOG_REGEX, regex="^.*?BUILD FAILED.*?$", linesBefore=0, linesAfter=10, maxMatches=5, showTruncatedLines=false, escapeHtml=true}
I would suggest the following: create a job that logs some text that contains types of errors you encounter (you may just spew some text file that you place in the job's workspace), then play with Java regex patterns - java.util.regex.Pattern - in the Plugin until you get the desired result. Make sure you send the e-mails from the job only to yourself :)
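If you want to iterate on a pattern before touching the job, the selection logic can be approximated offline. The Python sketch below mimics the regex, linesBefore, linesAfter and maxMatches parameters on a list of lines. It is an approximation and not the plugin's implementation: Python's re module differs from java.util.regex.Pattern in some details, treating max_matches=0 as "no limit" is an assumption of this sketch, and the sample log is made up.

```python
import re


def build_log_regex(lines, regex, lines_before=0, lines_after=0, max_matches=0):
    """Return matching lines plus surrounding context, in original order.

    Assumption of this sketch: max_matches=0 means no limit.
    """
    pattern = re.compile(regex)
    keep = set()
    matches = 0
    for i, line in enumerate(lines):
        if pattern.search(line):
            matches += 1
            start = max(0, i - lines_before)
            end = min(len(lines), i + lines_after + 1)
            keep.update(range(start, end))
            if max_matches and matches >= max_matches:
                break
    return [lines[i] for i in sorted(keep)]


# Hypothetical build log.
log_lines = ["compiling", "BUILD FAILED", "Total time: 3 seconds", "cleanup", "done"]
snippet = build_log_regex(log_lines, r"^.*?BUILD FAILED.*?$", lines_after=1)
```

Here snippet contains the BUILD FAILED line and the one line after it. Once a pattern selects what you want on sample text, confirm it in the plugin itself, since that is where the Java regex semantics apply.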
To use custom HTML - here's a quote from the Plugin's Content Token reference:
${JELLY_SCRIPT, template} - Custom message content generated from a Jelly script
template. There are two templates provided: "html" and "text". Custom Jelly templates
should be placed in $JENKINS_HOME/email-templates. When using custom templates, the
template filename without ".jelly" should be used for the "template" argument.
template - the template name. Defaults to "html".
The default template that you can use as your starting point is located in
$JENKINS_HOME/plugins/email-ext/WEB-INF/classes/hudson/plugins/emailext/templates/html.jelly