A regex with Splunk - regex

I'm having some trouble with my regex.
I have some lines like this:
SomeText#"C:\\","Shadow Copy Components:\\","E:\\",""
SomeText#"D:\\"
SomeText#"E:\\","Shadow Copy Components:\\"
SomeText#"SET SNAP_ID=serv.a.x.com_1380312019","BACKUP H:\\ USING \\\\?\\GLOBALROOT\\Device\\HarddiskVolumeShadowCopy47\\ OPTIONS:ALT_PATH_PREFIX=c:\\VERITAS\\NetBackup\\temp\\_vrts_frzn_img_3200\"
SomeText#"SET SNAP_ID=serv.a.x.com_1380312019","BACKUP Y:\\Libs USING \\\\?\\GLOBALROOT\\Device\\HarddiskVolumeShadowCopy47\\ OPTIONS:ALT_PATH_PREFIX=c:\\VERITAS\\NetBackup\\temp\\_vrts_frzn_img_3200\"
What I would like is a group named jobFileList containing, for each line:
"C:\\","Shadow Copy Components:\\","E:\\",""
"D:\\"
"E:\\","Shadow Copy Components:\\"
H:\\
Y:\\Libs
As you can see, I only want the file list. Sometimes it is the full text after the # mark, and sometimes there is a lot of other text that I need to remove.
I can't use a script in this case, so I need to do this with only ONE regexp; I can't just do a string replace on the other stuff after the regex.
What i did is :
SomeText(#.*BACKUP (?P<jobFileList>.*?) .*)?(#(?P<jobFileList>.*))?
But it seems I can't use the same group name twice :( If I replace the second jobFileList with another name it works perfectly, but that is not what I need.
Thanks for your help,
EDIT:
I can also have some lines like:
SomeText#/ahol5d72_1_2
SomeText#/p7ol4a1p_1_2
SomeText#Gvadag04SANDsk_Daily
SomeText#/bck_reco_a9ol5765_1_2_827497669
In all these cases I need to have all the text after the # mark.

A version which doesn't rely on the double quotes after the double backslash:
SomeText#(?:(.*?BACKUP) )?(?P<jobFileList>(?(1)[^ ]*|.*$))
This: (?(1)[^ ]*|.*$) is a conditional group that is supported in Python 2.7.5 (probably works for higher versions but I don't know for previous ones). If there's BACKUP, it grabs all the non-spaces and if there's no BACKUP, it grabs everything till the end of the string.
regex101 demo
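As a quick way to check the conditional-group pattern above outside Splunk, here is a minimal Python sketch. The helper name job_file_list is made up for illustration, and the BACKUP sample line is a shortened version of the ones in the question; whether Splunk's regex engine accepts the same syntax is not verified here.

```python
import re

# The pattern from the answer above, unchanged. Group 1 is set only when
# "BACKUP" occurs, and the conditional (?(1)...|...) switches on that.
PATTERN = re.compile(
    r'SomeText#(?:(.*?BACKUP) )?(?P<jobFileList>(?(1)[^ ]*|.*$))'
)


def job_file_list(line):
    """Return the jobFileList group for one line, or None if nothing matches."""
    match = PATTERN.search(line)
    return match.group('jobFileList') if match else None


samples = [
    r'SomeText#"D:\\"',
    r'SomeText#"E:\\","Shadow Copy Components:\\"',
    # Shortened version of the BACKUP lines in the question (assumption).
    r'SomeText#"SET SNAP_ID=serv.a.x.com_1380312019","BACKUP Y:\\Libs USING \\\\?\\GLOBALROOT"',
    'SomeText#/ahol5d72_1_2',
]

for line in samples:
    print(job_file_list(line))
```

Lines without BACKUP fall through to the "everything after #" branch, while the BACKUP line yields only the path that follows the keyword.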
EDIT: As per comment, the regex that worked after @timmalos' modifications:
\#(?P<G>.*?[^E]BACKUP\s)?(?P<G2>f:\\\\Mailbox\\\)?(?P<jobFileList>(?(G)(?(G2)[^\]|\S)*|.*))

It is possible to match this with a single regular expression; however, I know nothing of Splunk. Maybe this will help:
("?[A-Z]:\\\\(?:".+|\S+)?)
Live demonstration here

Related

How can I create a Regex that matches and transforms a period delimited path?

I am using den4b Renamer to rename a lot of files that follow a specific pattern. The program allows me to use RegEx: (https://www.den4b.com/wiki/ReNamer:Regular_Expressions)
I am stuck trying to conjure up an expression for a specific pattern.
My current RegEx:
Expression: ^(com\.)(([\w\s]*\.){0,4})([\w\s]*)$
Replace: \L$1\L$2\u$4
Note: \L and \u transform the sub-expression to upper and lower case as defined in the table below:
Here are a few example strings so you can get an idea of the input:
Android File Transfer.svg
Angular Console.svg
Au.Edu.Uq.Esys.Escript.svg
Avidemux.svg
Blackmagic Fusion8.svg
Broken Sword.svg
Browser360 Beta.svg
Btsync GUI.svg
Buttercup Desktop.svg
Calc.svg
Calibre EBook Edit.svg
Calibre Viewer.svg
Call Of Duty.svg
com.GitHub.Plugarut.Pwned Checker.svg
com.GitHub.Plugarut.Wingpanel Monitor.svg
com.GitHub.Rickybas.Date Countdown.svg
com.GitHub.Spheras.Desktopfolder.svg
com.GitHub.Themix Project.Oomox.svg
com.GitHub.Unrud.Remote Touchpad.svg
com.GitHub.Unrud.Video Downloader.svg
com.GitHub.Weclaw1.Image Roll.svg
com.GitHub.Zelikos.Rannum.svg
com.Gitlab.Miridyan.Mt.svg
com.Inventwithpython.Flippy.svg
com.Neatdecisions.Detwinner.svg
com.Rafaelmardojai.Share Preview.svg
com.Rafaelmardojai.Webfont Kit Generator.svg
Distributor Logo Antix.svg
Distributor Logo Archlabs.svg
Distributor Logo Dragonflybsd.svg
DOSBox.svg
Drawio.svg
Drweb GUI.svg
For this question I am focused on the strings that begin with com.xxx.xxx.
Since I can't only target those names in Renamer, the expression has to "play nice" with the other input file names and correctly leave them alone. That's why I've prefixed my expression with ^(com\.)
What I want:
Transform the entire string to lower case except for the last period separated part of the string.
Strip white space from the entire string.
For instance:
Original: com.GitHub.Alcadica.Develop.svg
After my Regex: com.github.alcadica.Develop.svg
What I want: com.github.alcadica.Develop.svg
This specific file is correctly renamed. What I'm having trouble with are names that have spaces in any part of the string. I can't figure out how to strip whitespace:
Original: com.Belmoussaoui.Read it Later.svg
After my Regex: com.belmoussaoui.Read it Later.svg
What I want: com.belmoussaoui.ReaditLater.svg
Here is a hypothetical example because I couldn't find a file with more than four parts. I want my pattern to be robust enough to handle this:
Original: com.Shatteredpixel.Another Level.Next.Pixel Dungeon.svg
After my Regex: com.shatteredpixel.another level.next.Pixel Dungeon.svg
What I want: com.shatteredpixel.anotherlevel.next.PixelDungeon.svg
Note that since I'm not using any kind of programming language, I don't have access to common string operations like trim, etc. I can, however, stack expressions. But this would create more overhead and since I am renaming thousands of files at a time I'd ideally like to keep it to one find/replace expression.
Any help would be greatly appreciated. Please let me know if I can provide any more information to make this more clear.
Edit:
I got it to work with the following rules:
Really inefficient, but it works. (Thanks to Jeremy in the comments for the idea)
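For reference, the target transformation described in the question can be written down as a short Python sketch. This is not a ReNamer rule and not the rule set mentioned above; the function name rename is made up, and the sketch only pins down the expected output so that any regex rules can be checked against it.

```python
import re


def rename(name):
    """Lowercase every dot-separated part except the last and strip whitespace.

    Only names starting with "com." are touched; the extension is kept as-is.
    """
    stem, dot, ext = name.rpartition(".")
    if not stem.startswith("com."):
        return name  # leave all other file names alone
    # Remove whitespace inside every part of the dotted path.
    parts = [re.sub(r"\s+", "", part) for part in stem.split(".")]
    # Lowercase everything except the final part.
    parts = [part.lower() for part in parts[:-1]] + parts[-1:]
    return ".".join(parts) + dot + ext


print(rename("com.Belmoussaoui.Read it Later.svg"))
print(rename("com.Shatteredpixel.Another Level.Next.Pixel Dungeon.svg"))
print(rename("Android File Transfer.svg"))
```

Unlike the {0,4} quantifier in the original expression, this sketch puts no limit on the number of parts.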

Regex with multiple groups, some of which are optional

I have trouble matching multiple groups, some of which are optional. I've tried variations of greedy/non greedy, but can't get it to work.
As input, I have cells which look like this:
SEPA Overboeking IBAN: AB1234 BIC: LALA678 Naam: John Smith Omschrijving: Hello hello Kenmerk: 03-05-2019 23:12 533238
I want to split these up into groups of IBAN, BIC, Naam, Omschrijving, Kenmerk.
For this example, this yields: AB1234; LALA678; John Smith; Hello hello; 03-05-2019 23:12 533238.
To obtain this, I've used:
.*IBAN: (.*)\s+BIC: (.*)\s+Naam: (.*)\s+Omschrijving: (.*)\s+Kenmerk: (.*)
This works perfectly as long as all these groups are present in the input. Some cells, however don't have the "Omschrijving" and/or "Kenmerk" part. As output, I would like to have empty groups if they're not present. Right now, nothing is matched.
I've tried variations with greedy/non greedy, but couldn't get it to work.
Help would be greatly appreciated!
N.B.: I'm working in KNIME (open source data analysis tool)
I was able to split your input using the following regular expression:
^.*
\s+IBAN\:\s*(?<IBAN>.*?)
\s+BIC\:\s*(?<BIC>.*?)
\s+Naam\:\s*(?<Naam>.*?)
(?:\s+Omschrijving\:\s*(?<Omschrijving>.*?))?
(?:\s+Kenmerk\:\s*(?<Kenmerk>.*?))?
$
This requires your fields to follow the given order and treats the fields IBAN, BIC and Naam as required; Omschrijving and Kenmerk are optional. I am pretty sure this can still be optimized, but it results in the following output, which should be fine for you (or at least a starting point):
For evaluation and testing in KNIME, I used Palladian's Regex Extractor node, that can be configured as follows and provides a nice preview functionality:
I added an example workflow to my NodePit Space. It contains some example lines, parses them and provides the above seen output.
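Outside KNIME, the same expression can be tried with a small Python sketch. Two things differ from the answer and are assumptions of this sketch: Python spells named groups (?P<name>...) rather than (?<name>...), and the multi-line layout is handled with re.VERBOSE. The function name split_cell is hypothetical.

```python
import re

# Same structure as the expression above: three required fields followed by
# two optional ones. re.VERBOSE lets the pattern span several lines.
CELL = re.compile(
    r"""
    ^.*
    \s+IBAN:\s*(?P<IBAN>.*?)
    \s+BIC:\s*(?P<BIC>.*?)
    \s+Naam:\s*(?P<Naam>.*?)
    (?:\s+Omschrijving:\s*(?P<Omschrijving>.*?))?
    (?:\s+Kenmerk:\s*(?P<Kenmerk>.*?))?
    $
    """,
    re.VERBOSE,
)


def split_cell(cell):
    """Return the five fields as a dict; missing optional fields become ''."""
    match = CELL.match(cell)
    return match.groupdict(default="") if match else None


full = ("SEPA Overboeking IBAN: AB1234 BIC: LALA678 Naam: John Smith "
        "Omschrijving: Hello hello Kenmerk: 03-05-2019 23:12 533238")
short = "SEPA Overboeking IBAN: AB1234 BIC: LALA678 Naam: John Smith"

print(split_cell(full))
print(split_cell(short))
```

Passing default="" to groupdict gives the empty groups the question asks for when Omschrijving or Kenmerk is absent.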

Notepad++ - Selecting or Highlighting multiple sections of repeated text IN 1 LINE

I have a text file in Notepad++ that contains about 66,000 words all in 1 line, and it is a set of 200 "lines" of output that are all unique and placed in 1 line in the basic JSON form {output:[{output1},{output2},...}]}.
There is a set of characters matching the RegEx expression "id":.........,"kind":"track" that occurs about 285 times in total, and I am trying to either single them out, or copy all of them at once.
Basically, without some super complicated RegEx terms, I am stuck because I can't figure out how to highlight all of them at once, and also the Remove Unbookmarked Lines feature does not apply because this is all in one line. I have only managed to be able to Mark every single occurrence.
So does this require a large number of steps to get the file into multiple lines and work from there, or is there something else I am missing?
Edit: I have come up with a set of Macro schemes that make the process of doing this manually work much faster. It's another alternative but still takes a few steps and quite some time.
Edit 2: I intended there to be an answer for actually just highlighting the different sections all at once, but I guess that is not possible. The answer here turns out to be more useful in my case, allowing me to have a list of IDs without everything else.
You seem to already have a regex which matches single instances of your pattern, so assuming it works and that we must use Notepad++ for this:
Replace .*?("id":.........,"kind":"track").*?(?="id":.........,"kind":"track"|$) with \1.
If this textfile is valid JSON, this opens you up to other, non-notepad++ options, like using Python with the json module.
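As a rough sketch of the non-Notepad++ route, the same pattern can be run with Python's re.findall to collect every occurrence at once. The sample text below is invented for illustration (the real file's structure is not shown in the question), and the sketch uses the regex from the question rather than the json module, since the JSON layout is unknown.

```python
import re

# Hypothetical one-line input; the real file is not shown in the question.
text = ('{"output":[{"id":"abc123x","kind":"track","name":"a"},'
        '{"id":"zzz","kind":"album"},'
        '{"id":"def456y","kind":"track","name":"b"}]}')

# The pattern from the question: "id": followed by exactly nine characters.
TRACK = re.compile(r'"id":.........,"kind":"track"')

matches = TRACK.findall(text)
for occurrence in matches:
    print(occurrence)
```

Each match is printed on its own line, which gives the list of IDs without the surrounding text.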
Edited to remove unnecessary steps

How to use VI to remove occurrence of character on lines matching regex?

I'm trying to change the case of method names for some functions from lowercase_with_underscores to lowerCamelCase for lines that begin with public function get_method_name(). I'm struggling to get this done in a single step.
So far I have used the following
:%s/\(get\)\([a-zA-Z]*\)_\(\w\)/\1\2\u\3/g
However, this only replaces one _ character at a time. What I would like is a search and replace that does something like the following:
Identify all lines containing the string public function [gs]et.
On these lines, perform the following search and replace :s/_\(\w\)/\u\1/g
EDIT:
Suppose I have lines get_method_name() and set_method_name($variable_name) and I only want to change the case of the method name and not the variable name, how might I do that? The get_method_name() case is simpler of course, but I'd like a solution that works for both in a single command. I've been able to use :%g/public function [gs]et/ . . . as per the solution listed below to solve for the get_method_name() case, but unfortunately not the set_method_name($variable_name) case.
If I've understood you correctly (I don't know why the things you've tried haven't worked), you can use :g to run an Ex command on lines matching a pattern.
Your example would be something like:
:%g/public function [gs]et/:s/_\(\w\)/\u\1/g
Update:
To match only the method names, we can use the fact that there will only be method names before the first $, as this looks to be PHP.
To do that, we can use a negative lookbehind, \@<!:
:%g/public function [gs]et/:s/\(\$.\+\)\@<!_\(\w\)/\u\2/g
This will look behind (\@<!) for any $ followed by one or more characters and only match _\(\w\) if no such $ is found.
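To make the intent of the lookbehind substitution concrete, here is a small Python sketch of the same logic: only lines containing public function get/set are touched, and only underscores before the first $ are converted. It is an illustration of the idea, not a replacement for the Vim command, and the function name camel_case_method is made up.

```python
import re


def camel_case_method(line):
    """Convert snake_case to camelCase before the first '$' on accessor lines."""
    if not re.search(r"public function [gs]et", line):
        return line  # mirrors the :g filter, other lines are left alone
    # Everything after the first '$' (the parameter list) stays untouched.
    head, dollar, tail = line.partition("$")
    head = re.sub(r"_(\w)", lambda m: m.group(1).upper(), head)
    return head + dollar + tail


print(camel_case_method("public function get_method_name()"))
print(camel_case_method("public function set_method_name($variable_name)"))
print(camel_case_method("$some_variable = 1;"))
```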
Bonus points(?):
To do this for multiple buffers stick a bufdo in front of the %g
You want to use a substitute with an expression (:h sub-replace-expression).
Match the complete string you want to process, then pass that string to a second substitute command to actually change it:
:%s/\(get\|set\)\zs_\w\+/\=substitute(submatch(0), '_\([A-Za-z]\)', '\U\1', 'g')
Running the above on
get_method_name($variable_name)
set_method_name($variable_name)
returns
getMethodName($variable_name)
setMethodName($variable_name)
To have vi replace sad with happy on all lines in a file:
:1, $ s/sad/happy/g
(It is the :1, $ before the substitute command that instructs vi to execute the command on every line in the file.)

Maven replacer plugin - repeat while matches exist

I am using the maven replacer plugin and I've run into a situation where I have a regular expression that matches across lines which I need to run on the input file until all matches have been replaced. The configuration for this expression looks like this:
<regexFlags>
<regexFlag>DOTALL</regexFlag>
</regexFlags>
<replacements>
<replacement>
<token>\#([^\n\r=\#]+)\#=([^\n\r]*)(.*)(\#default\.\1\#=[^\n\r]*)(.*)</token>
<value>#$1#=$2$3$5</value>
</replacement>
</replacements>
The input could look like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#default.a.b.c#=QQQ
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
and I want the output to look like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#asdfasd.fasdfs.asdfa#=23423
#default.RR.TT#=393993
The intention is to re-write the file, but without the tokens with a #default prefix, where another token without the prefix has already been defined.
#default.a.b.c#=QQQ and #default.h.i.j#=234 have been removed from the output because other tokens already contain a.b.c and h.i.j.
The current problem I have is that the replacer plugin only replaces the first match, so my output looks like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
Here, #default.a.b.c#=QQQ is gone, which is correct, but #default.h.i.j#=234 is still present.
If I were writing this in code, I think I could probably just loop while attempting to match on the entire output, and break when there are no matches. Is there a way to do this with the replacer plugin?
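The loop described here could look roughly like the following Python sketch. It reuses the first token expression and its DOTALL flag from the configuration above and says nothing about what the replacer plugin itself can do; the function name strip_shadowed_defaults is hypothetical. Because each match swallows the rest of the text, only one replacement happens per pass, so the sketch repeats until a pass changes nothing.

```python
import re

# The token expression from the configuration above, compiled with DOTALL.
TOKEN = re.compile(
    r"\#([^\n\r=\#]+)\#=([^\n\r]*)(.*)(\#default\.\1\#=[^\n\r]*)(.*)",
    re.DOTALL,
)


def strip_shadowed_defaults(text):
    """Apply the replacement repeatedly until no match is left."""
    while True:
        # Equivalent of the <value> #$1#=$2$3$5, one replacement per pass.
        text, count = TOKEN.subn(r"#\1#=\2\3\5", text, count=1)
        if count == 0:
            return text


sample = "\n".join([
    "#d.e.f#=y",
    "#a.b.c#=x",
    "#h.i.j#=aaaa",
    "#default.a.b.c#=QQQ",
    "#asdfasd.fasdfs.asdfa#=23423",
    "#default.h.i.j#=234",
    "#default.RR.TT#=393993",
])

result = strip_shadowed_defaults(sample)
print(result)
```

The removed tokens leave empty lines behind, since the expression does not consume the line break, and this sketch only covers defaults that come after the token they duplicate.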
Edit: I may have over simplified my example. A more realistic one is:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#default.a.b.c#=QQQ
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
#x.y.z#=0
#default.q.r.s#=1
#l.m.n#=8.3
#q.r.s#=78
#blah.blah.blah#=blah
This shows that it's possible for a default.x.x.x=y token to precede an x.x.x=y token (as #default.q.r.s#=1 precedes #q.r.s#=78); my prior example wasn't clear about this. I do actually have an expression to capture this; it looks a bit like this:
\#default\.([^\n\r=#|]+)#=([^\n\r|]*)(.*)#\1#=([^\n\r|]*)(.*)
I know line separators are missing from this even though they were in the other one - I was experimenting with removing all line separators and treating it as a single line but that hasn't helped. I can resolve this problem simply by running each replacement multiple times by copying and pasting the configurations a few times, but that is not a good solution and will fail eventually.
I don't believe you can solve this problem as is. A workaround is to reverse the order of the file top to bottom, apply a lookahead regex, and then reverse the order of the result again:
pattern = #default\.(.*?)#[^\r\n]+(?=[\s\S]*#\1#) Demo
Another way (depending on the capabilities of "Maven") is to run this pattern
#(.*)(#[\s\S]*)#default\.\1.*
and replace with #$1$2 Demo in a loop until there are no matches
then run this pattern
#default\.(.*)#.*(?=[\s\S]*\1)
and replace with nothing Demo in a loop until there are no matches
It doesn't look like the replacer plugin can actually do what I want. I got around this by using regular expressions to build multiple filter files, and then applying them to the resource files.
My original goal had been to use regular expressions to build a single, clean, and tidy filter file. In the end, I discovered that I was able to get away with just using multiple filters (not as clean or tidy) and apply them in the correct order.