Log parsing rules in Jenkins - regex

I'm using the Jenkins Log Parser plugin to extract and display the build log.
The rule file looks like this:
# Compiler Error
error /(?i) error:/
# Compiler Warning
warning /(?i) warning:/
Everything works fine, but for some reason, at the end of the "Parsed Output Console", I see this message:
NOTE: Some bad parsing rules have been found:
Bad parsing rule: , Error:1
Bad parsing rule: , Error:1
This, I'm sure, is a trivial issue, but I'm not able to figure it out at the moment.
Please help :)
EDIT:
Based on Kobi's answer, and having looked into the "Parsing rules files" documentation, I fixed it this way (with a single space after the colon). This worked perfectly, as expected.
# Compiler Error
error /(?i)error: /
# Compiler Warning
warning /(?i)warning: /

The Log Parser Plugin does not support spaces in your pattern.
This can be clearly seen in their source code:
final String ruleParts[] = parsingRule.split("\\s");
String regexp = ruleParts[1];
They should probably have used .split("\\s", 2).
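To make the difference concrete, here is a quick illustration. The plugin itself is Java; this Go sketch just shows the difference between an unlimited split and a two-part split, using only the standard regexp package:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    rule := "error /(?i) error: /"
    ws := regexp.MustCompile(`\s`)
    // Splitting on every whitespace, as the plugin does, breaks the pattern apart:
    fmt.Printf("%q\n", ws.Split(rule, -1)) // ["error" "/(?i)" "error:" "/"]
    // Limiting the split to two parts keeps the pattern intact:
    fmt.Printf("%q\n", ws.Split(rule, 2)) // ["error" "/(?i) error: /"]
}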
As an alternative, you can use \s, \b, or an escape sequence such as \u0020.
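For example, the rules above could be written without literal spaces, as a sketch using \s in place of the space:

# Compiler Error
error /(?i)error:\s/
# Compiler Warning
warning /(?i)warning:\s/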

I had tried no spaces in the pattern, but that did not work.
Turns out that the parsing rules file does not support
empty lines in it. Once I removed the empty lines, I no longer
got the "Bad parsing rule: , Error:1" message.
I think that since the line is empty, it doesn't echo any rule after
the first colon. It would have been nice if the line number where the
problem is had been reported.

Varnish modsecurity rules syntax errors

I am trying to run the modsecurity CRS within Varnish (varnish-6.0.3), but I have a problem with rules 932106, 932150, 932105 and 932100 (RCE related). The VCC compiler throws a syntax error, for example for rule 932106:
if(req.url ~ "(?:;|\{|\||\|\||&|&&|\n|\r|\$\(|\$\(\(|`|\${|<\(|>\(|\(\s*\))\s*(?:{|\s*\(\s*|\w+=(?:[^\s]*|\$.*|\$.*|<.*|>.*|\'.*\'|\".*\")\s+|!\s*|\$)*\s*(?:'|\")*(?:[\?\*\[\]\(\)\-\|+\w'\"\.\/\\\\]+\/)?[\\\\'\"]*(?:(?:(?:a[\\\\'\"]*p[\\\\'\"]*t[\\\\'\"]*i[\\\\'\"]*t[\\\\'\"]*u[\\\\'\"]*d|u[\\\\'\"]*p[\\\\'\"]*2[\\\\'\"]*d[\\\\'\"]*a[\\\\'\"]*t)[\\\\'\"]*e|d[\\\\'\"]*n[\\\\'\"]*f|v[\\\\'\"]*i)[\\\\'\"]*(?:\s|<|>).*|p[\\\\'\"]*(?:a[\\\\'\"]*c[\\\\'\"]*m[\\\\'\"]*a[\\\\'\"]*n[\\\\'\"]*(?:\s|<|>).*|w[\\\\'\"]*d|s)|w[\\\\'\"]*(?:(?:\s|<|>).*|h[\\\\'\"]*o))\b{
[compiler output omitted: a long run of '-' characters with a single '#' marking the position of the error in the pattern]
without any further explanation. The marker points at \", which is kind of surprising, as the " character is escaped.
Has anyone had a similar issue in the past, or any idea how to solve it?
The regexes from the rulesets are not modified and are the stable ones from CRS (v3.0).
I suggest you use long strings to reduce the risk of escaping issues.
This is what a long string looks like in Varnish:
{"Some string, including "double quotes""}
That way, you don't need to escape double quotes, and maybe that will solve your problem.
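For example, here is a minimal sketch of how one of these conditions might look with a long string (abbreviated pattern and hypothetical handling, not the full CRS rule):

if (req.url ~ {"(?:'|")+\s*\w+"}) {
    # The inner " characters need no escaping inside {" ... "}.
    return (synth(403, "Forbidden"));
}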

AWS CloudWatch filtering compact JSON with # in it

We use Serilog to output from our .NET Core app. We are using compact JSON to keep the size down. In compact form, it seems to write the level key with a # sign:
"#l": "Warning"
I can't seem to get a filter working; it either returns no results or says error. I've tried many things, but I'm sure this should work:
{ $.#l = "Warning" }
Can anyone suggest where I'm going wrong?
I don't think you can use # in the selector. From the docs:
Property selectors are alphanumeric strings that also support '-' and '_' characters.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html#extract-json-log-event-values
One way to get around this would be to match the line as if it's not part of the JSON.
For example, if your log line looks like this:
"#l": "Warning"
you could filter it out with:
[key="#l", colon, value=Warning]
I had the same issue. Most likely you used Serilog.Formatting.Compact.CompactJsonFormatter, as I did.
Implementing your own ITextFormatter is a workaround, because prefixes like # and $ are hardcoded inside CompactJsonFormatter.
I used CompactJsonFormatter as a basis, replaced its usage of # and $ with s_, and it works.
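With the prefixes replaced, the standard JSON property selector syntax should then work, for example (assuming the renamed s_l key):

{ $.s_l = "Warning" }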

Go validator.v2 gives error "unknown tag" for regexp

I have been trying to understand why some regular expressions give me an "unknown tag" error while using the validator.v2 package in Go. It works for some regular expressions, but does not work with some that have "{}" inside them; when I use validator.Validate(), it gives me the runtime error "unknown tag".
Here's the code:
type Company struct {
    Name string `validate:"regexp=^[a-zA-Z .]{1,100}$"`
}
which gives me the following error at runtime:
Name: unknown tag
However, this regex works perfectly fine:
type Company struct {
    Name string `validate:"regexp=^[a-zA-Z .]*$"`
}
I am using the braces because of the length restriction that I want to put on the string. There could be other ways to do it, but I feel the regex is the way to go, and it's easier to have the restriction along with the other rules right there in the expression.
The problem appears to be the , char in your first regex. You can see in the validator source code that the tag is split on ,. By UTSLing, I see no support for escaped commas in the tags; this is probably an oversight on the part of the project author. I suggest filing a bug/feature request.
The problem pointed out by @Flimzy is correct, but it's not a bug.
It has probably been fixed since then; at the moment the validator supports the escape sequence \\ for such cases, and you can do it like this:
type Company struct {
    Name string `validate:"regexp=^[a-zA-Z .]{1\\,100}$"`
}
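For completeness, here is a minimal runnable sketch of that tag in use (assuming the usual gopkg.in/validator.v2 import path):

package main

import (
    "fmt"

    "gopkg.in/validator.v2"
)

// Company restricts Name to 1-100 letters, spaces, or dots.
type Company struct {
    Name string `validate:"regexp=^[a-zA-Z .]{1\\,100}$"`
}

func main() {
    // Matches the pattern, so Validate returns nil.
    fmt.Println(validator.Validate(Company{Name: "Acme Corp."}))
    // The empty string fails the {1,100} length restriction.
    fmt.Println(validator.Validate(Company{Name: ""}))
}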

Is my regex just wrong, or is there buggy behaviour in td-agent's format parsing?

I am using fluentd, Elasticsearch and Kibana to organize logs. Unfortunately, these logs are not written using any standard format like Apache's, so I had to come up with the regex for the format myself. I used this site to verify that it is working: http://fluentular.herokuapp.com/
The logs have roughly this format:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
The format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website, which is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this:
Record
Key      Value
pri      DEBUG
date     24.04.2014
subject  SingleActivityStrategy
msg      Start Activitiy 'barbecue' zu verabeiten.
Instead, though, I have this bug(?) where pri is always shortened to DEBU. The same goes for ERROR, which becomes ERRO; only INFO stays INFO. I am not very experienced with regular expressions, and I find it hard to believe that this is a bug, but it confuses me and any help is greatly appreciated.
I'm not sure I can link the complete config file, because I don't personally own these log files and I am trying to keep this at a level where my boss won't get mad at me for posting sensitive information; should it definitely be needed, I will post it later on, after having asked him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO; next the date; next what we call the subject, which is always written in [ ]; and finally just a message.
Here is a link to Fluentular with the format I am using and a test string that produces the right result in Fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?
Here is how the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... where your spaces and colon should be. That is what cuts off the last letter: [DEBUG] is a character class matching one character from the set D, E, B, U, G, so the engine backtracks and gives up the final G, letting the three dots consume "G: " before the date. I'm not too sure why this works in Fluentular, but you should match the \: explicitly, and each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
  type tail
  path /var/log/foo/bar.log
  pos_file /var/log/td-agent/foo-bar.log.pos
  tag foo.bar
  format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
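To see why the original pattern dropped the last letter, here is a quick self-contained check. Go is used here rather than fluentd's Ruby, and Go spells named groups (?P<name>...), but the backtracking outcome is the same for this pattern:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // The original format: character classes plus three dots before the date.
    re := regexp.MustCompile(`(?P<pri>([INFO]|[DEBUG]|[ERROR])+)...(?P<date>\d{2}\.\d{2}\.\d{4})`)
    m := re.FindStringSubmatch("DEBUG: 24.04.2014 16:00:00")
    // Prints "DEBU": the engine gives back the final G so that the three
    // dots can consume "G: " and the date can start at "24".
    fmt.Println(m[1])
}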
I would also take a look at Logstash as a comparison to Fluentd. I like Logstash far more, because you create Grok filters to match the type of data you want; it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like RegExr, which gives immediate feedback and lets you set global and multiline matching as well.

RGoogleAnalytics replacing unexpectedly escaped characters with gsub

I'm using RGoogleAnalytics; I'm just at the learning stage at the moment.
I'm following the code in the tutorial here: https://code.google.com/p/r-google-analytics/
But when I try to run
ga.goals <- conf$GetGoals()
ga.goals
I get an error message telling me there is an unexpected escaped character '\.' at pos 7
I get a similar message for the next two lines of code (GetSegments)
This question deals with a similar problem in the Facebook Graph API:
How to replace "unexpected escaped character" in R
I've tried using a similar bit of code
confGoalsSub <- gsub('\\.', ' ', conf$GetGoals())
to remove the escaped characters, but I get another error:
cannot coerce type 'closure' to vector of type 'character'
Out of desperation I have tried confGoalsSub <- gsub('\\.', ' ', conf) which returns a character vector that is just garbage (it's just the code for conf with the decimal points stripped out).
Can anyone suggest a better expression than gsub that will return a useful object?
EDIT: As per the suggestion below, I've now added the brackets at the end of the function call, but I still get the same error message about unexpected escape characters. I get the same error when I try to call other, similar functions such as $GetSegments().
I saw in one video at the weekend that this package was broken for a long time, although the speaker did not provide details as to why. Perhaps I should give up and try one of the other Google Analytics packages in R.
It seems odd, given that this one is supposed to be Google-supported.
I think this error arises when the RJSON library isn't able to parse the Google Analytics data feed properly and convert it into a nested list. The updated version of RGoogleAnalytics (http://cran.r-project.org/web/packages/RGoogleAnalytics/index.html) fixes this problem. Currently, you won't be able to retrieve goals and segments from your Google Analytics account using the library, but beyond that it supports the full range of dimensions and metrics.