Graylog regex search with numbers in text - regex

I use graylog 2.0 (http://docs.graylog.org/en/2.0/pages/queries.html) and it's super useful.
I want to refine my full_message search.
Currently I'm:
- searching graylog for all full_message occurrences of the start of the string
- I then export this to excel
- Split the text (text to columns)
- Apply an autofilter
- Filter for any items > 20
search pattern:
full_message: "Running queue with*"
search text:
Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and 1 items
Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and 5 items
Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and 25 items
Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and 200 items
I'm wondering if a better regex search could just list any record with items > 20.
e.g. the search string would be
full_message: "Running queue with [insert better regex here]"
Thanks

You can use the pattern
Running queue with id: \S+ and (?:\d{3,}|[3-9]\d|2[1-9])
The final group there allows for either:
\d{3,} Any number with three or more digits, or
[3-9]\d Any number 30-99, or
2[1-9] Any number 21-29
https://regex101.com/r/ctLvQD/1
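As a quick sanity check, the pattern can be run against the four sample lines from the question. This is only a sketch using Python's re module; whether Graylog's query parser accepts the same regex syntax is not verified here, and the names PATTERN, LINES and over_twenty are made up for the illustration.

```python
import re

# Pattern from the answer; the final group accepts 100 and up, 30-99, or 21-29.
PATTERN = re.compile(r"Running queue with id: \S+ and (?:\d{3,}|[3-9]\d|2[1-9])")

PREFIX = "Network Queue: Running queue with id: dd82c225-fab7-44ce-9618-67d1ef332a03 and "
LINES = [PREFIX + f"{n} items" for n in (1, 5, 25, 200)]


def over_twenty(lines):
    """Return only the lines whose item count is greater than 20."""
    return [line for line in lines if PATTERN.search(line)]


for line in over_twenty(LINES):
    print(line)
```

Only the lines ending in "25 items" and "200 items" are printed; a count of exactly 20 is rejected because none of the three alternatives matches it.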

Related

Parse lines from messages in a Splunk query to be displayed as a chart on a dashboard

I generate events on multiple computers that list service names that aren't running. I want to make a chart that displays the top offending service names.
I can use the following to get a table for the dashboard:
ComputerName="*.ourDomain.com" sourcetype="WinEventLog:Application" EventCode=7223 SourceName="internalSystem"
| eval Date_Time=strftime(_time, "%Y-%m-%d %H:%M")
| table host, Date_Time, Message, EventCode
Typical Message(s) will contain:
The following services were not running after 5603 seconds and a start command has been sent:
Service1
Service2
The following services were not running after 985 seconds and a start command has been sent:
Service2
Service3
Using regex I can make a named group of everything but the first line with (?<Services>((?<=\n)).*)
However, I don't think this is the right approach as I don't know how to do a valuation for the chart with this information.
So in essence, how do I grab and tally service names from messages in Splunk?
Edit 1:
Coming back to this after a few days.
I created a field extraction called "Services" with regex that grabs the contents of each message after the first line.
If I use | stats count BY Services it counts each message as a whole instead of the lines inside. The results look like this:
Service1 Service2 | Count: 1
Service2 Service3 | Count: 1
My intention is to have it treat each line as its own value so the results would look like:
Service1 | Count: 1
Service2 | Count: 2
Service3 | Count: 1
I tried | mvexpand Services but it didn't change the output so I assume I'm either using it improperly or it's not applicable here.
I think you can do it with the stats command.
| stats count by service
will give a number of appearances for each service. You then can choose the bar chart visualization to create a graph.
I ended up using split() and mvexpand to solve this problem.
This is what worked in the end:
My search
| eval events=split(Service, "
")
| mvexpand events
| eval events=replace(events, "[\n\r]", "")
| stats count BY events
I had to add the replace() call because an event with just one service listed was being treated differently from an event with several. After the split on an event with multiple services, each service still had a carriage return attached, hence the replace.
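The same split, expand and count logic can be sketched outside Splunk. The snippet below is plain Python, not SPL; it assumes the message lines are separated by "\r\n" (the answer mentions carriage returns) and uses the two sample messages from the question. The names messages and tally_services are made up for the illustration.

```python
from collections import Counter

HEADER = "The following services were not running after {} seconds and a start command has been sent:"

# The two sample messages from the question; "\r\n" line endings are an assumption.
messages = [
    HEADER.format(5603) + "\r\nService1\r\nService2",
    HEADER.format(985) + "\r\nService2\r\nService3",
]


def tally_services(msgs):
    """Count each service name once per line, across all messages."""
    counts = Counter()
    for message in msgs:
        # Skip the first line, as the "Services" field extraction does.
        for line in message.split("\n")[1:]:
            # Mirrors replace(events, "[\n\r]", "") in the search above.
            name = line.replace("\r", "").replace("\n", "")
            if name:
                counts[name] += 1
    return counts


for service, count in sorted(tally_services(messages).items()):
    print(service, count)
```

Each line becomes its own value before counting, which is what mvexpand does after split() in the search.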
For a chart drop-down that is clean:
index="yourIndex" "<searchCriteria>"
| stats count(eval(searchmatch("<searchCriteria>"))) as TotalCount
count(eval(searchmatch("search1"))) as Name1
count(eval(searchmatch("search2"))) as Name2
count(eval(searchmatch("search3"))) as Name3
| transpose 5
| rename column as "Name", "row 1" as "Count"
Horizontal table example with percentages:
index=something "Barcode_Fail" OR "Barcode_Success"
| stats count(eval(searchmatch("Barcode_Success"))) as SuccessCount
count(eval(searchmatch("Barcode_Fail"))) as FailureCount
count(eval(searchmatch("Barcode_*"))) as Totals
| eval Failure_Rate=FailureCount/Totals
| eval Success_Rate=SuccessCount/Totals

Get a string after a specific word, using a program that has limited regex features?

Looking for help on building a regex that captures a 1-line string after a specific word.
The challenge I'm running into is that the program where I need to build this regex uses a single line format, in other words dot matches new line. So the formula I created isn't working. See more details below. Any advice or tips?
More specific regex task:
I'm trying to grab the line that comes after the word Details from entries like the ones below. The goal is to pull out 100% Silk, or 100% Velvet. This is the material of the product, which always comes after Details.
Raw data:
<p>Loose fitted blouse green/yellow lily print.
V-neck opening with a closure string.
Small tie string on left side of top.</p>
<h3>Details</h3> <p>100% Silk.</p>
<p>Made in Portugal.</p> <h3>Fit</h3>
<p>Model is 5’10,” size 2 wearing size 34.</p> <p>Size 34 measurements</p>
OR
<p>The velvet version of this dress. High waist fit with hook and zipper closure.
Seams run along edges of pants to create a box-like.</p>
<h3>Details</h3> <p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5’10”, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"</p>
<p>These pants run small. We recommend sizing up.</p>
Here is the current formula I created that's not working:
Replace (.*)(\bDetails\s+(.*)) with $3
The output gives the below:
<p>100% Silk.</p>
<p>Made in Portugal.</p>
<h3>Fit</h3>
<p>Model is 5’10,” size 2 wearing size 34.</p>
<p>Size 34 measurements</p>
OR
<p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5’10”, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"</p>
<p>These pants run small. We recommend sizing up.</p>
How do I capture just the desired string? Let me know if you have any tips! Thank you!
It is difficult to provide a working solution for your situation, as you mention that your program has "limited regex features" but don't say what the limitations are.
Here is a Regex you can try to work with to capture the target string
^(?:<h3>Details<\/h3>)(.*)$
I would personally use BeautifulSoup for something like this, but here are two solutions you could use:
Match the line after "Details", then pull out the data.
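import re
# re.MULTILINE lets $ match at the end of every line, not only at the end of the text
matches = re.findall('(?<=Details<).*$', text, flags=re.MULTILINE)
matches = [i.strip('<>') for i in matches]
matches = [i.split('<')[0] for i in [j.split('>')[-1] for j in matches]]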
Replace "Details<...>data" with "Detailsdata", then find the data.
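# Lazy quantifiers stop at the first closing tag and the opening <p>; "Details" is kept as a marker
text = re.sub('Details<.*?<.*?>', 'Details', text)
matches = re.findall('(?<=Details).*?(?=<)', text)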
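Since the tool treats the dot as matching newlines, one option is a single pattern with a lazy quantifier, so the capture stops at the first closing </p> instead of running to the end of the text. The sketch below uses Python's re with DOTALL to imitate that mode; whether the tool in question supports lazy quantifiers and capture groups is an assumption, and the names DETAILS, material and sample are made up for the illustration.

```python
import re

# DOTALL imitates the "dot matches newline" mode described in the question.
# The lazy (.*?) stops at the first </p> after the Details heading.
DETAILS = re.compile(r"<h3>Details</h3>\s*<p>(.*?)</p>", re.DOTALL)


def material(html):
    """Return the text of the first paragraph after the Details heading, or None."""
    match = DETAILS.search(html)
    return match.group(1) if match else None


sample = (
    "<p>Loose fitted blouse green/yellow lily print.\n"
    "V-neck opening with a closure string.\n"
    "Small tie string on left side of top.</p>\n"
    "<h3>Details</h3> <p>100% Silk.</p>\n"
    "<p>Made in Portugal.</p> <h3>Fit</h3>\n"
)

print(material(sample))
```

In a replace-style tool, the equivalent would be to match the whole text around this group and substitute the group reference, provided the tool allows it.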

Data Preparation Identify String using Regex and move to new column

Hello I am using Talend to prepare product data for import into DB. I want to use the extract string parts function for Talend.
I have the following data in one cell. (The length of the data varies; it is not a fixed-width format.)
Measurement: Ring Head Width: 6.8 Ring Height: 5.5 Ring Shank Width: 1.1 Ladies Band Width: 2.5 Ladies band shank Width: 1.2
I need help creating a regex format to match each measurement value and extract it to a new column.
What regex would match the following text?
Ring Head Width: 6.8
and extract the numeric value following it, which is
6.8
Similarly I want to create regex for all the above measurements. I am assuming the format will be the same.
Thanks for your time and help.
If you don't mind using multiple actions to achieve this result, I suggest that you use:
the "Split text in parts" action on ":"
and then use "remove whitespaces" to have a clean value.
If you really need to keep it to one action, there is the "Remove part of the text" action, which takes a regex based on the Java Pattern class.
Using regex ".*:\s" works fine
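To see what the regex does and how all measurements could be pulled out in one pass, here is a small sketch in Python. It is not Talend code: Talend uses Java's regex engine, and the PAIR pattern and the measurements function are made up for the illustration. It assumes every value is a plain decimal number that follows a label and a colon.

```python
import re

cell = (
    "Measurement: Ring Head Width: 6.8 Ring Height: 5.5 Ring Shank Width: 1.1 "
    "Ladies Band Width: 2.5 Ladies band shank Width: 1.2"
)

# The regex from the answer, applied to a single measurement: everything up to
# and including the last ": " is removed, which leaves only the number.
single = re.sub(r".*:\s", "", "Ring Head Width: 6.8")

# Hypothetical pattern: a label made of letters and spaces, a colon, then a number.
PAIR = re.compile(r"([A-Za-z ]+):\s*(\d+(?:\.\d+)?)")


def measurements(text):
    """Map each measurement label to its numeric value."""
    return {label.strip(): float(value) for label, value in PAIR.findall(text)}


print(single)
for label, value in measurements(cell).items():
    print(label, value)
```

The leading "Measurement:" is skipped on its own because it is not directly followed by a number.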

How to use window.frequent?

Can anybody give me an example of how to use window.frequent?
For example, I wrote this test:
"define stream cseEventStream (symbol string, price float, time long);" +
"" +
"@info(name = 'query1') " +
"from cseEventStream[700 > price]#window.frequent(3, symbol) " +
"select symbol, price, time " +
"insert expired events into outputStream;";
But I can't work out the rule behind what appears in outputStream.
Thanks.
In this particular query, 'window.frequent(3, symbol)' makes the query keep track of the 3 most frequent symbols (the 3 symbols with the highest number of occurrences). However, the query inserts only expired events into outputStream, so its output consists of the events that have expired from the frequent window.
In a frequent window, expired events are events that no longer belong to a frequent group. In this case, those are events whose symbol is no longer among the 3 symbols with the highest number of occurrences.
For example, if you send the following sequence of events,
{"symbolA", 71.36f, 100}
{"symbolB", 72.36f, 100}
{"symbolB", 74.36f, 100}
{"symbolC", 73.36f, 100}
{"symbolC", 76.36f, 100}
{"symbolD", 76.36f, 100}
{"symbolD", 76.36f, 100}
The query will output {"symbolA", 71.36f, 100}.
When you send the events with 'symbolD', symbolA is no longer among the top 3 symbols by number of occurrences, so the event with symbolA expires and {"symbolA", 71.36f, 100} is emitted.
For every event that passes the price filter, this window retains the 3 most frequent items based on symbol, and since the output type is 'expired events', you only receive output once an event loses its position as a frequent event.
Ex: for frequent window of size 2
Input
WSO2 1000 1
WSO2 1000 2
ABC 700 3
XYZ 800 4
Output
ABC 700 3
The ABC event was in the frequent window and expired when the XYZ event was received. If you use the default output type, 'current events', the query outputs all incoming events that are selected as frequent events and put into the window.
The implementation is based on the Misra-Gries counting algorithm.
Documentation : https://docs.wso2.com/display/CEP400/Inbuilt+Windows#InbuiltWindows-frequent
Test cases : https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/test/java/org/wso2/siddhi/core/query/window/FrequentWindowTestCase.java
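The expiry behaviour described in both answers can be reproduced with a small model of Misra-Gries counting. This is a simplified sketch in Python, not Siddhi's code: the function frequent_window is hypothetical, it ignores the price filter, and the assumption that Siddhi behaves the same way in every detail is untested. It does reproduce the two examples above.

```python
def frequent_window(events, size, key=lambda event: event[0]):
    """Yield the events that expire from a Misra-Gries style frequent window."""
    counts = {}  # key -> counter
    latest = {}  # key -> most recent event seen for that key
    for event in events:
        k = key(event)
        if k in counts:
            # Known key: bump its counter and remember the newest event.
            counts[k] += 1
            latest[k] = event
        elif len(counts) < size:
            # Free slot: start tracking the new key.
            counts[k] = 1
            latest[k] = event
        else:
            # Window full: decrement every counter and expire keys that reach zero.
            for other in list(counts):
                counts[other] -= 1
                if counts[other] == 0:
                    del counts[other]
                    yield latest.pop(other)
            # The new key only gets in if a slot was freed.
            if len(counts) < size:
                counts[k] = 1
                latest[k] = event


first = [
    ("symbolA", 71.36, 100), ("symbolB", 72.36, 100), ("symbolB", 74.36, 100),
    ("symbolC", 73.36, 100), ("symbolC", 76.36, 100), ("symbolD", 76.36, 100),
    ("symbolD", 76.36, 100),
]
second = [("WSO2", 1000, 1), ("WSO2", 1000, 2), ("ABC", 700, 3), ("XYZ", 800, 4)]

print(list(frequent_window(first, 3)))
print(list(frequent_window(second, 2)))
```

In both runs the only expired event is the one whose counter drops to zero when a new symbol arrives at a full window: the symbolA event in the first case and the ABC event in the second.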

How to parse through a string in perl to extract certain value?

I have following string
> show box detail
2 boxes:
1) Box ID: 1
IP: 127.0.0.1
Interface: 1/1
Priority: 31
2) Box ID: 2
IP: 192.68.1.1
Interface: 1/2
Priority: 31
How do I get the Box IDs from the above string in Perl?
The number of boxes here can vary. Given "n" boxes, how do I extract the box IDs if the show box detail output can contain up to n entries in the same format?
my @ids = $string =~ /Box ID: ([0-9]+)/g;
More restrictive:
my @ids = $string =~ /^[0-9]+\) Box ID: ([0-9]+)$/mg;
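For comparison, the same two patterns behave the same way in Python's re module. This is only an analogue of the Perl one-liners, using the sample output from the question; the variable names are made up for the illustration.

```python
import re

output = """> show box detail
2 boxes:
1) Box ID: 1
IP: 127.0.0.1
Interface: 1/1
Priority: 31
2) Box ID: 2
IP: 192.68.1.1
Interface: 1/2
Priority: 31
"""

# Loose pattern: every number that follows "Box ID: ".
ids = re.findall(r"Box ID: ([0-9]+)", output)

# Restrictive pattern: the whole line must look like "<n>) Box ID: <id>".
# MULTILINE plays the role of Perl's /m flag.
strict_ids = re.findall(r"^[0-9]+\) Box ID: ([0-9]+)$", output, flags=re.MULTILINE)

print(ids)
print(strict_ids)
```

Both lists contain one entry per box regardless of how many boxes the output lists, which is what the /g flag achieves in list context in Perl.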