regex exclude hit if text contains a second string match - regex

I am trying to search through log files to see whether any warnings have appeared, so that I can warn in a Jenkins pipeline using the Jenkins plugin "Text Finder".
However, I have a case where I do not want hits on the string "CRIT" in the log file if the line also contains plms.
E.g.
I have the following text in the log file:
<CRIT> 23-Jun-2014::10:57:13.649 Upgrade committed
<CRIT> 23-Jun-2014::10:57:13.703 no registration found for callpoint plmsView/get_next of type=external
I am not interested in having a warning for the second line, so I have added the following regex to Text Finder in Jenkins:
WARN|ERROR|<ERR>|/^(?=<CRIT>)(?=^(?:(?!plms).)*$).*$/
This should get a hit on CRIT only if the line does not also contain plms, i.e. the first line, but I do not get a hit on either line.
I got the code from here: Combine Regexp
Could someone please help me correct this? Thanks!

You should use something like this:
WARN|ERROR|<ERR>|<CRIT>(?!.*?no registration found)
Change the no registration found part to match the <CRIT> message you want to exclude.
Note that this expression also matches the following line, because WARN occurs inside WARNER:
<INFO> User WARNER registered
so you should consider using something like:
^(WARN|ERROR|<ERR>|<CRIT>(?!.*?no registration found))
that matches only if the tokens are at the beginning of the line (change the tokens accordingly).
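To sanity-check the two patterns outside Jenkins, a minimal Python sketch follows. It assumes that Text Finder's per-line search behaves like Python's re.search for these constructs (Text Finder uses Java regexes, so treat this as an approximation). The helper name hits is made up for illustration.

```python
import re

# Sample lines from the question, plus the false-positive line from the answer.
LINES = [
    "<CRIT> 23-Jun-2014::10:57:13.649 Upgrade committed",
    "<CRIT> 23-Jun-2014::10:57:13.703 no registration found for callpoint plmsView/get_next of type=external",
    "<INFO> User WARNER registered",
]

UNANCHORED = re.compile(r"WARN|ERROR|<ERR>|<CRIT>(?!.*?no registration found)")
ANCHORED = re.compile(r"^(WARN|ERROR|<ERR>|<CRIT>(?!.*?no registration found))")


def hits(pattern, lines):
    """Return the indexes of the lines in which the pattern is found."""
    return [i for i, line in enumerate(lines) if pattern.search(line)]


print("unanchored:", hits(UNANCHORED, LINES))
print("anchored:  ", hits(ANCHORED, LINES))
```

To exclude on plms as in the question, the lookahead would become (?!.*plms).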

This should work for you:
^<CRIT>(.(?!plms))*$
Demo and explanation
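One caveat, illustrated with a Python sketch (Python's re is assumed to behave like the Jenkins engine for these constructs): because the lookahead sits after the dot, the pattern never tests the position directly after <CRIT>, so a line with plms immediately following the tag would still produce a hit. Putting the lookahead in front of the dot closes that gap. The EDGE line is invented for illustration.

```python
import re

AS_POSTED = re.compile(r"^<CRIT>(.(?!plms))*$")
LOOKAHEAD_FIRST = re.compile(r"^<CRIT>(?:(?!plms).)*$")

KEEP = "<CRIT> 23-Jun-2014::10:57:13.649 Upgrade committed"
SKIP = "<CRIT> 23-Jun-2014::10:57:13.703 no registration found for callpoint plmsView/get_next of type=external"
EDGE = "<CRIT>plms adjacent to the tag"  # hypothetical line, not from the log


def results(pattern):
    """Return whether the pattern matches KEEP, SKIP and EDGE, in that order."""
    return [bool(pattern.search(line)) for line in (KEEP, SKIP, EDGE)]


print("as posted:      ", results(AS_POSTED))
print("lookahead first:", results(LOOKAHEAD_FIRST))
```

For the two lines in the question both variants behave the same; they differ only on the edge case.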

Related

TCL Regex Skipping Over a Set of Characters and Matching to a New line

I'm working with expect scripting in order to ssh into a device and pull information off of it. However, I'm facing issues parsing the expect_out(buffer) for the data from the commands I send.
This is the contents of my expect_out(buffer):
"mca-cli-op info\r\n\r\nModel: UAP-AC-Lite\r\nVersion: 6.0.21.13673\r\nMAC Address: 10:9f:5r:20:c5:7e\r\nIP Address: 123.123.1.123\r\nHostname: UAP-AC-Lite\r\nUptime: 152662 seconds\r\n\r\nStatus: Connected (http://base_controller<url;>/inform)\r\nUAP-AC-Lite-BZ.6.0.21# "
Right now I'm trying to get the Model (UAP-AC-Lite) without the Model tag.
So the regex expression I'm using is,
expect -re {(?=(Model: ))+[.*\$]}
set model "$expect_out(0,string)"
puts $model
The command doesn't work, but my thought process was that I would perform a look ahead for the Model tag, then match only the subsequent characters after it to the new line. I've tried replacing the "$" with \r\n but that doesn't work either. Can anyone explain what I'm doing wrong? Thanks for the help!
Note: If possible, I wouldn't want to include the newline either, as it might mess up commands that I run which use these variables.
You're close, but the regex is incorrect. Try
expect -re {Model:\s+([^\r]+)}
set model $expect_out(1,string)
The 1 in $expect_out(1,string) means the first set of capturing parentheses.
Regexes are documented at http://www.tcl-lang.org/man/tcl8.6/TclCmd/re_syntax.htm
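The same pattern can be tried outside expect. The Python sketch below assumes that this particular pattern means the same thing in Python's re as in Tcl's regex engine, and it pastes the buffer from the question in as a string literal.

```python
import re

# expect_out(buffer) contents from the question, as a Python string.
BUFFER = (
    "mca-cli-op info\r\n\r\nModel: UAP-AC-Lite\r\nVersion: 6.0.21.13673\r\n"
    "MAC Address: 10:9f:5r:20:c5:7e\r\nIP Address: 123.123.1.123\r\n"
    "Hostname: UAP-AC-Lite\r\nUptime: 152662 seconds\r\n\r\n"
    "Status: Connected (http://base_controller<url;>/inform)\r\nUAP-AC-Lite-BZ.6.0.21# "
)

# [^\r]+ stops at the carriage return, so the captured value has no line ending.
match = re.search(r"Model:\s+([^\r]+)", BUFFER)
model = match.group(1) if match else None
print(repr(model))
```

Group 1 here plays the role of $expect_out(1,string).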

Regex to extract all strings from source code used when calling a function

We have an old, grown project with thousands of php files and need to clean it up.
Throughout the whole project we do have a lot of function calls similar to:
trans('somestring1');
trans("SomeString2");
trans('more_string',$somevar);
trans("anotherstring4",$somevar);
trans($tx_key);
trans($anotherKey,$somevar);
All of those are embedded into the code and represent translation keys. I would like to find a way to extract all "translation keys" in all occurrences.
The PHP project is in VS Code, so a RegEx Search would be helpful to list the results.
Or I could search through the project with any other tool you would recommend.
However, I would also need to "export" just the strings to a text file or similar.
The ideal result would be:
somestring1
SomeString2
more_string
anotherstring4
$tx_key
$anotherKey
As a bonus - if someone knows, how I could get the above list including filename where the result has been found - that would be really fantastic!
Any help would be greatly appreciated!
Update:
The RegEx I came up with:
/(trans)+\([^\)]*\)(\.[^\)]*\))?/gim
lists the full occurrence. How can I get just the first part of the result (between single quotes, between double quotes, or beginning with $)?
See here: regexr.com/548d4
Here are some steps to get exactly what you want. Using this you can do a find and replace on your search results!
So you could do sequential regex find/replaces in the right circumstances.
The replace can be just within the search results editor and not affect the underlying files at all - which is what you want.
You can also have the replace action actually edit the underlying files if you wish.
[Hint: This technique can also make doing a find item a / replace with b in files that contain term c much easier to do.]
(1) Open a new search editor: press Ctrl+Shift+P to open the Command Palette and run "Search Editor: Open New Search Editor".
(That command is currently unbound to a keybinding.)
(2) Paste this regex into the Search input box (with the regex option .* selected):
`(.*?)(\btrans\(['"]?)([^,'")]+)(.*)` - a relatively simple regex
regex101 demo
See my other answer for a regex to work with up to 6 entries per line:
(\s*\d+:\s)?((.*?)(\btrans\(['"]?)([^,'")]*)((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?)(.*)
(3) You will get a list of files with the search results. Now open a Find widget (Ctrl+F) in this Search editor.
(4) Put the same regex into that Find input, with the regex option selected, and put $3 into the Replace field. Then Replace All. This replaces only within this Search editor, not in the original files (although that can also be done if you want).
If using the 1-6 version regex, replace with:
$1$5 $9 $13 $17 $21 $25
(5) Voila. You can now save this Search Editor as a file.
The first answer works for one desired capture per line as in the original question. But that relatively simple regex won't work if there are two or more per line.
The regex below works for up to 6 entries per line, like
trans('somestring1');
stuff trans("SomeString2"); some content trans("SomeString2a");more stuff [repeat, repeat]
But it doesn't for 7+ - you'll need a regex guru for that.
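If running a script is an option, the per-line limit goes away, because a regex engine can simply be asked for every match. The Python sketch below is one way to do that under stated assumptions: it reuses the key-capturing part of the simple regex above, and the function names extract_keys and scan as well as the project path in the usage note are made up for illustration.

```python
import re
from pathlib import Path

# Optional quote after "trans(", then the key up to the next quote, comma or ")".
TRANS_CALL = re.compile(r"""\btrans\(['"]?([^,'")]+)""")


def extract_keys(text):
    """Return the first argument of every trans(...) call in the text, in order."""
    return TRANS_CALL.findall(text)


def scan(root):
    """Yield (file path, key) pairs for all *.php files below root."""
    for path in sorted(Path(root).rglob("*.php")):
        source = path.read_text(encoding="utf-8", errors="replace")
        for key in extract_keys(source):
            yield str(path), key


SAMPLE = """\
trans('somestring1');
stuff trans("SomeString2"); some content trans("SomeString2a");
trans('more_string',$somevar);
trans($tx_key);
"""
print(extract_keys(SAMPLE))
```

Something like `for path, key in scan("path/to/project"): print(path, key, sep="\t")` would then give the list including filenames, which can be redirected to a text file.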
Here is the process again with a twist of using a snippet in the Search Editor instead of a Find/Replace. Using a snippet allows more control over the formatting of the final result.
(1) Open a new search editor: press Ctrl+Shift+P to open the Command Palette and run "Search Editor: Open New Search Editor". (That command is currently unbound to a keybinding.)
(2) Paste this regex into the Search input box (with the regex option .* selected):
`((.*?)(\btrans\(['"]?)([^,'")]*)((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?)(.*)`
regex101 demo
(3) You will get a list of files with the search results. Now select all your results individually with Ctrl+Shift+L.
(4) Trigger this keybinding:
{
"key": "alt+i", // whatever keybinding you like
"command": "editor.action.insertSnippet",
"when": "editorTextFocus",
"args": {
"snippet": "${TM_SELECTED_TEXT/((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*)((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?)(.*)/$4${8:+\n }$8${12:+\n }$12${16:+\n }$16${20:+\n }$20${24:+\n }$24/g}"
}
},
That snippet will be applied to each selection in your search result. This part ${8:+\n } is a conditional which adds a newline and some spaces if there is a capture group 8 - which would be a second trans(...) on a line.

Regex to skip first word and parse the rest of the message

I've been trying to get the right regex for skipping the first word and parsing the rest of the message.
I've been testing the regex by running Logstash locally:
grok {
match => { "resource" => "/[^/]+/[^/]+(/|)(?<repo>[^/]+)?(/%{GREEDYDATA:resource_path})?" }
}
Test Messages:
/list/Lighter-test-group/xyz/123
/list/
/list
For messages,
/list/Lighter-test-group/xyz/123 gives us repo value as "Lighter-test-group" which is valid
/list/ gives us repo value as null which is valid
but /list gives repo value as "list" which is an invalid value. The correct value needs to be empty or null.
Not sure if you are restricted to using one really long regex but I would look into custom patterns to ignore the first word.
Using this grok debugger, I set up some custom patterns in the 3rd box:
IGNORE /\b\w+\b
REPO [A-Za-z]([A-Za-z0-9+\-.]+)+
And tested out this grok pattern in the 2nd box:
%{IGNORE}(/)?(%{REPO:repo})?(%{GREEDYDATA:resource_path})
Using these custom patterns, I was able to get what I think is your desired output but test them out with more use cases if you have any.
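For a quick check without Logstash, the two custom patterns can be expanded by hand into a single regex. The Python sketch below does that, assuming Python's re is close enough to grok's regex engine for these constructs; GREEDYDATA is written out as .*, and the helper name parse is made up for illustration.

```python
import re

# The custom patterns from above, substituted into the grok expression by hand.
IGNORE = r"/\b\w+\b"
REPO = r"[A-Za-z](?:[A-Za-z0-9+\-.]+)+"
PATTERN = re.compile(rf"{IGNORE}(/)?(?P<repo>{REPO})?(?P<resource_path>.*)")


def parse(resource):
    """Return (repo, resource_path) for a resource string, or None if nothing matches."""
    m = PATTERN.match(resource)
    return (m.group("repo"), m.group("resource_path")) if m else None


for message in ("/list/Lighter-test-group/xyz/123", "/list/", "/list"):
    print(message, "->", parse(message))
```

In this sketch a missing repo shows up as None, which corresponds to the empty or null value asked for in the question.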

How to format a WinMerge filter to ignore part of the line

I would like WinMerge to compare the full text but exclude a variable substring.
Orientation="West" PhysicalAddress="2395226" DefFieldFrmt="Uf4d0" UnitCustomText="sec"
Orientation="West" PhysicalAddress="2395230" DefFieldFrmt="Uf4d1" UnitCustomText="sec"
In the lines above I want to ignore the PhysicalAddress="xxx" and locate the changed DefFieldFrmt="Uf4d1"
I have tried adding the filter:
PhysicalAddress=".*"
However this filters the complete line.
The actual text before and after the PhysicalAddress="xxx" will vary so I need a filter that says: match prefix and match suffix but ignore target variable substring.
Help please.
According to the documentation, it is not possible to use the line filters for this:
When a rule matches any part of the line, the entire difference is ignored. Therefore, you cannot filter just part of a line.
However, since WinMerge's source code is on GitHub, it is possible to add a feature request for this to its list of issues.
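Outside WinMerge, one possible workaround is to blank out the variable attribute in copies of both files before comparing them. The Python sketch below illustrates the idea on the two sample lines; it is not a WinMerge feature, and the names mask and differing_lines are made up for illustration.

```python
import re

# [^"]* stops at the closing quote; a greedy .* would run on to the last quote on the line.
PHYSICAL_ADDRESS = re.compile(r'PhysicalAddress="[^"]*"')


def mask(line):
    """Blank out the PhysicalAddress value so it no longer counts as a difference."""
    return PHYSICAL_ADDRESS.sub('PhysicalAddress=""', line)


def differing_lines(left, right):
    """Return 1-based numbers of the line pairs that still differ after masking."""
    return [n for n, (a, b) in enumerate(zip(left, right), start=1) if mask(a) != mask(b)]


LEFT = ['Orientation="West" PhysicalAddress="2395226" DefFieldFrmt="Uf4d0" UnitCustomText="sec"']
RIGHT = ['Orientation="West" PhysicalAddress="2395230" DefFieldFrmt="Uf4d1" UnitCustomText="sec"']
print(differing_lines(LEFT, RIGHT))
```

The masked copies could also be written to temporary files and compared in WinMerge as usual.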

Are my regex just wrong or is there a buggy behaviour in td-agent's format behaviour?

I am using fluentd, elasticsearch and kibana to organize logs. Unfortunately, these logs are not written using any standard like apache, so I had to come up with the regex for the format myself. I used this site here to verify that they are working: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
the format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead though, I have this ?bug? that pri is always shortened to DEBU. Same for ERROR which becomes ERRO, only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug, still it confuses me and any help is greatly appreciated.
I'm not sure I can link the complete config file because I don't personally own these log files and I am trying to keep it on a level that my boss won't get mad at me for posting sensitive information, but should it definitely be needed, I will post them later on after having asked him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date, next what we call the subject, which is always written in [ ], and finally just a message.
Here is a link to fluentular with the format I am using and a teststring that produces the right result in fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?
How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
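In your format string, you were using . and ... where your spaces and colon should be. I'm not too sure why this works in Fluentular, but you should match the \: explicitly, along with each space between the values. Note also that [INFO], [DEBUG] and [ERROR] are character classes, not literal words, so ([INFO]|[DEBUG]|[ERROR])+ matches any run of those letters. Since ... demands exactly three characters and the sample line above has only two (a colon and a space) between DEBUG and the date, the engine backtracks and gives up the final G, which is why pri comes out as DEBU.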
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
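A minimal Python sketch to reproduce the DEBU truncation and check the corrected pattern against the sample line from the question. Python is used only as a stand-in for the Ruby regex engine behind td-agent, so named groups are written (?P<name>...) instead of (?<name>...); the unnecessary inner groups of the corrected pattern are dropped.

```python
import re

LINE = "DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten."

# The original format, unchanged apart from the Python named-group syntax.
ORIGINAL = re.compile(
    r"(?P<pri>([INFO]|[DEBUG]|[ERROR])+)...(?P<date>(\d{2}\.\d{2}\.\d{4}))"
    r".(?P<time>(\d{2}:\d{2}:\d{2})).\[(?P<subject>(.*))\].(?P<msg>(.*))"
)

# The corrected format: literal words, explicit colon and whitespace.
CORRECTED = re.compile(
    r"(?P<pri>INFO|ERROR|DEBUG):\s+(?P<date>\d{2}\.\d{2}\.\d{4})"
    r"\s(?P<time>\d{2}:\d{2}:\d{2})\s\[(?P<subject>.*)\]\s(?P<msg>.*)"
)

original = ORIGINAL.search(LINE).groupdict()
corrected = CORRECTED.search(LINE).groupdict()
print("original pri: ", original["pri"])
print("corrected pri:", corrected["pri"])
print(corrected)
```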
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.