How to ignore a specific sub-string from Splunk query - regex

I need some help writing an appropriate Splunk query. I have searched for this but could not come up with a solution.
Currently, I want to ignore all error alerts generated for logs whose only error term is ev31=error;. If we use NOT ev31=error; in the search query, it also removes results that contain valid error terms. So the current query fails when a log contains both error and ev31=error;, producing incorrect results.
Can anyone suggest an example query that ignores the ev31=error; term altogether but keeps logs containing the error term?

Try including the string you want to ignore in quotes, so your search might look something like index=myIndex NOT "ev31=error"
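If the goal is to keep events that contain both ev31=error; and a separate error term, one way to think about the logic is "strip the ignored substring first, then look for error in what remains". The Python sketch below only illustrates that logic outside Splunk; it is not SPL, and the function name and sample log lines are made up for illustration.

```python
import re

# Substring that should never count as an error on its own (from the question).
IGNORED = "ev31=error;"


def has_real_error(line: str) -> bool:
    """Return True if the line still contains 'error' after removing IGNORED."""
    stripped = line.replace(IGNORED, "")
    return re.search(r"\berror\b", stripped, re.IGNORECASE) is not None


# Hypothetical log lines, for illustration only.
samples = [
    "ev31=error; status=ok",                   # only the ignored term
    "ev31=error; msg=error connecting to db",  # ignored term plus a real error
    "level=ERROR ev31=ok",                     # real error only
]
for line in samples:
    print(has_real_error(line), line)
```

A plain NOT "ev31=error" filter would drop the second sample line as well, which is the case the question is trying to avoid.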

How do I make a Gerrit query that spans a few specific projects?

I tried for a few hours to find the right syntax for a regex query that returns reviews from 2-3 different projects, but I failed and decided to crowdsource the task ;)
The search is documented at https://review.openstack.org/Documentation/user-search.html and mentions possible use of REGEX,... but it just didn't work.
Task: return all CRs from openstack-infra/gerritlib and openstack-infra/git-review projects from https://review.openstack.org
Doing it for one project works well project:openstack-infra/gerritlib
Ideally I would like to search for something like ^openstack-infra\/(gerritlib|git-review), or at least this is the standard regex syntax.
Still, I have found it impossible to use parentheses so far: every time I used them, the query stopped returning any results.
1) You don't need to escape the "/" character.
2) You need to use double quotes to make the parentheses work.
So the following search should work for you:
project:"^openstack-infra/(gerritlib|git-review)"
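As a quick sanity check of the pattern itself, the same expression can be tried against a few project names in Python. This is only a sketch: Gerrit's own regex dialect may differ from Python's re module, and the two non-matching project names in the list are made-up examples.

```python
import re

# Same pattern as in the Gerrit query; "/" needs no escaping.
pattern = re.compile(r"^openstack-infra/(gerritlib|git-review)")

# The first two names come from the question; the others are hypothetical.
projects = [
    "openstack-infra/gerritlib",
    "openstack-infra/git-review",
    "openstack-infra/some-other-project",
    "openstack/another-project",
]

matching = [p for p in projects if pattern.search(p)]
print(matching)
```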

Regex for Notepad++

I have a log for an app that executes every minute or so and at the end it reports number of records it failed with like this: TaskBlahBlah: [0] records failed
Up until now I would simply search the whole document for the string ] records failed and visually identify the lines with more than zero records. Is there a way to use a regex to search specifically for any non-zero value, so I don't have to go through the list by eye and potentially miss something?
I tried applying some regexes in Notepad++, but it seems that either I did it wrong or Notepad++ has a 'different' regex flavour or something.
thank you
EDIT: just to list some of the things I tried:
[1-9][0-9]|[1-9]
\[[1-9][0-9]\] records failed|\[[1-9]\] records failed
For some reason it picks up things like [1] records failed but not [10] records failed
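I guess this should get what you want (the slashes are only delimiters; leave them out when typing the pattern into Notepad++'s Find dialog in Regular expression mode):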
/\[[1-9]\d*\] records failed/
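To see what the pattern accepts, here is a small Python sketch that runs it over a few lines. The task names and counts are invented for illustration; only the "[N] records failed" shape comes from the question.

```python
import re

# A leading digit 1-9 followed by any number of digits: any count except 0.
NONZERO_FAILED = re.compile(r"\[[1-9]\d*\] records failed")

# Hypothetical log lines in the format described in the question.
lines = [
    "TaskBlahBlah: [0] records failed",
    "TaskBlahBlah: [1] records failed",
    "TaskBlahBlah: [10] records failed",
    "TaskBlahBlah: [205] records failed",
]

hits = [line for line in lines if NONZERO_FAILED.search(line)]
for line in hits:
    print(line)
```

Only the [0] line is skipped; single-digit and multi-digit counts are both reported.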

Is my regex just wrong, or is there buggy behaviour in td-agent's format handling?

I am using fluentd, elasticsearch and kibana to organize logs. Unfortunately, these logs are not written using any standard like apache, so I had to come up with the regex for the format myself. I used this site here to verify that they are working: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
the format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead, though, I have this ?bug? where pri is always shortened to DEBU. The same happens for ERROR, which becomes ERRO; only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug. Still, it confuses me, and any help is greatly appreciated.
I'm not sure I can link the complete config file, because I don't personally own these log files and I am trying to keep this at a level where my boss won't get mad at me for posting sensitive information. If it is definitely needed, I will post it later, after asking him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date , next what we call the subject which is always written in [ ] and finally just a message.
Here is a link to fluentular with the format I am using and a teststring that produces the right result in fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?
How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... where your spaces and colon should be. I'm not too sure why this works in Fluentular, but you should match the \: explicitly, as well as each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
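The truncation to DEBU can be reproduced outside td-agent. The sketch below uses Python's re module, so the named groups are written (?P<name>...) instead of Ruby's (?<name>...); apart from that, the two patterns are the ones from the question and from this answer, applied to the sample line from the question. In the original pattern, [DEBUG] is a character class that matches one letter at a time, and ... demands three characters where only two (a colon and a space) separate the priority from the date, so the engine backtracks and gives up the last letter of the priority.

```python
import re

# Sample line copied from the question.
LINE = ("DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] "
        "Start Activitiy 'barbecue' zu verabeiten.")

# Pattern from the question, with Python-style named groups.
ORIGINAL = re.compile(
    r"(?P<pri>([INFO]|[DEBUG]|[ERROR])+)...(?P<date>(\d{2}\.\d{2}\.\d{4}))."
    r"(?P<time>(\d{2}:\d{2}:\d{2})).\[(?P<subject>(.*))\].(?P<msg>(.*))"
)

# Pattern from the answer, with Python-style named groups.
CORRECTED = re.compile(
    r"(?P<pri>(INFO|ERROR|DEBUG))\:\s+(?P<date>(\d{2}\.\d{2}\.\d{4}))\s"
    r"(?P<time>(\d{2}:\d{2}:\d{2}))\s\[(?P<subject>(.*))\]\s(?P<msg>(.*))"
)

original_fields = ORIGINAL.search(LINE).groupdict()
corrected_fields = CORRECTED.search(LINE).groupdict()

print(original_fields["pri"])   # DEBU
print(corrected_fields["pri"])  # DEBUG
```

If INFO lines are padded with an extra space after the colon (an assumption, since no INFO sample is shown in the question), the three dots would fit exactly there, which would explain why only INFO came through intact.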
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.

Is there a way to search terms in order with RegexpQuery in lucene?

I would like to search my indexed documents in order using RegexpQuery.
For example, I have 2 documents:
text: Oracle unveils better than expected quarterly results.
text: Research In Motion shares gained almost 13 per cent on the Toronto Stock Exchange Friday, a day after the smartphone maker posted better than expected quarterly results.
So far I have tried this, but had no luck.
Query regexq = new RegexpQuery(new Term("text", "^.+better.+quarterly.+results"));
Is there another way of implementing this?
Thanks
I believe a PhraseQuery fits what you are looking for better. You can use PhraseQuery.setSlop(int) to allow other terms to appear between the terms of the query. This would look like:
PhraseQuery pq = new PhraseQuery(); // declared as PhraseQuery so that add() and setSlop() are available
pq.add(new Term("text", "better"));
pq.add(new Term("text", "quarterly"));
pq.add(new Term("text", "results"));
pq.setSlop(10); //Or whatever is an appropriate slop value for you.
This sort of query is also supported by the standard QueryParser, as seen here, like:
text:"better quarterly results"~10
I think a PhraseQuery is most definitely the better implementation here, but...
Regarding RegexpQuery:
I believe it is intended to compare terms against the regex, and since the phrase you are searching for (I am assuming) is tokenized, no single Term matches your whole regex. You would need to index the entire field as a single Term to make this work, using StringField, KeywordAnalyzer, or similar.
I believe it works like Matcher.matches(), rather than Matcher.find(), which is to say, it must match the entire input term, rather than a portion of it. So, if you had specified "text" as a StringField, you would need to add a .* to the end to consume the rest of the input.
On a similar note, I'm not sure whether it supports the character "^" as the start of input, given that it is redundant in that case. I don't see it specified in Lucene's RegExp documentation, but I have seen references to its use, so I'm not sure whether it would be accepted or not.
To summarize, a RegexpQuery could work like:
Query regexq = new RegexpQuery(new Term("text", ".+better.+quarterly.+results.*"));
This assumes you used a StringField or a KeywordAnalyzer to index the entire field as a single Term.
With the leading wildcard in your regexp, though, you could expect very poor performance from it (See the warning at the top of the RegexpQuery documentation).
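The difference between matching a whole term and finding a match inside it can be illustrated with Python's re module, where fullmatch plays the role of Matcher.matches() and search the role of Matcher.find(). This is only an analogy for the anchoring behaviour described above: Lucene's regex syntax is not the same as Python's, and the whitespace split below is a stand-in for a real analyzer.

```python
import re

text = "Oracle unveils better than expected quarterly results."
pattern = r".+better.+quarterly.+results"

# find()-style: the pattern only has to occur somewhere in the input.
found_anywhere = re.search(pattern, text) is not None

# matches()-style: the pattern must consume the whole input, so the
# trailing "." in the text makes this fail...
whole_without_tail = re.fullmatch(pattern, text) is not None

# ...until a trailing .* is added to consume the rest.
whole_with_tail = re.fullmatch(pattern + ".*", text) is not None

# With a tokenized field, each term is tested on its own, and no single
# token can satisfy the whole pattern.
tokens = text.lower().split()
any_token_matches = any(re.fullmatch(pattern + ".*", t) for t in tokens)

print(found_anywhere, whole_without_tail, whole_with_tail, any_token_matches)
```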

graph api for search is not functioning if the search pattern is a single word

The Graph API search does not work if the search pattern is a single word. It gives an empty result.
eg:
https://graph.facebook.com/search?q=mark&type=user
If you use two word query like below, you will get the result.
https://graph.facebook.com/search?q=ho hie&type=user
Even if it is URL encoded, like "ho%20%hie", the result will be empty.
Can anybody please help me find the reason?
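For reference, a space in a query value is percent-encoded as %20 with no second % after it. The sketch below builds the two URLs from the question using Python's standard library; it only constructs the strings and sends no request, so it says nothing about why the API returns an empty result. The helper name build_search_url is made up for illustration.

```python
from urllib.parse import quote, urlencode

BASE = "https://graph.facebook.com/search"


def build_search_url(query: str, search_type: str = "user") -> str:
    """Build the search URL with the query value percent-encoded."""
    params = urlencode({"q": query, "type": search_type}, quote_via=quote)
    return f"{BASE}?{params}"


print(build_search_url("mark"))
print(build_search_url("ho hie"))  # the space becomes %20
```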