Perl: regex pattern matching

Whenever I find the string ".abc.corp:" in a line of a file, I would like to exclude that line.
Example line:
kubernte-fileserver-NN.abc.corp:/srv/export/storage/nsp_updates 1231231 123112 123123 89% /devops
Can someone help me find the correct regex pattern?
I'm trying the pattern below, but I'm unable to figure it out:
/^(.*(?!\.abc\.corp).*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\%\s+([\/\w\d\.-]+)$/
I'm confused about how the subpattern group $1 interacts with the negative lookaround!

By "exclude it", I assume you want to exclude the entire line.
Your attempt will not exclude anything, because Perl can always find some point at which to split the share path so that the split point is not immediately followed by .abc.corp, for example:
kubernte-fileserver-N
N.abc.corp:/srv/export/storage/nsp_updates
or (as it's actually going to do) it just lets the first .* consume the whole share path, leaving nothing for the second one.
I'd instead first try to match the string you're trying to avoid and, only if that fails, proceed with the actual handling:
if (/^\S+\.abc\.corp:/) {
    # SKIP: the line starts with a host under .abc.corp
}
elsif (/^(.*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\%\s+([\/\w\d\.-]+)$/) {
    ...
}
Besides actually working, this makes the code much more readable.
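The same skip-first idea can be sketched outside Perl too. Below is a minimal Python illustration (Python is used here only as a neutral language for the example; the question itself is about Perl). The function name parse_df_line and the second sample host in the usage note are hypothetical.

```python
import re

# Lines whose first field is a host under .abc.corp are skipped outright.
SKIP = re.compile(r"^\S+\.abc\.corp:")

# Share, three numeric columns, use percentage, mount point.
FIELDS = re.compile(r"^(.*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)%\s+([/\w.-]+)$")


def parse_df_line(line):
    """Return the captured fields, or None if the line is skipped or malformed."""
    if SKIP.search(line):
        return None  # excluded: matches the pattern we want to avoid
    m = FIELDS.match(line)
    return m.groups() if m else None
```

Calling parse_df_line on the example line from the question returns None, while a line such as "otherhost.example.org:/srv/data 100 40 60 40% /mnt/data" (a made-up host) yields its six fields.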

Related

Regex to remove a whole phrase from the match

I am trying to remove a whole phrase from my regex (PCRE) matches.
Given the following strings:
test:test2:test3:test4:test5:1.0.department
test:test2:test3:test4:test5:1.0.foo.0.bar
user.0.display
"test:test2:test3:test4:test5:1.0".division
I want to write a regex that will return:
.department
.foo.0.bar
user.0.display
.division
Now, I thought a good way to do this would be to match everything and then remove test:test2:test3:test4:test5:1.0 and "test:test2:test3:test4:test5:1.0", but I am struggling to do this.
I tried the following:
\b(?!(test:test2:test3:test4:test5:1\.0)|("test:test2:test3:test4:test5:1\.0"))\b.*
but this seems to just remove the first "test" from each string, and that's all. Could anyone help with where I am going wrong, or suggest a better approach?
I suggest searching for the following pattern:
"?test:test2:test3:test4:test5:1\.0"?
and replacing it with an empty string.
The quotation marks on both ends are made optional with a ? (1 or 0 times) quantifier.
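As a quick check of this search-and-replace approach, here is a minimal Python sketch (the question is about PCRE, but this particular pattern uses no engine-specific syntax). The helper name strip_prefix is hypothetical.

```python
import re

# Optional opening quote, the literal prefix, optional closing quote.
PREFIX = re.compile(r'"?test:test2:test3:test4:test5:1\.0"?')


def strip_prefix(s):
    """Remove the (optionally quoted) prefix wherever it occurs."""
    return PREFIX.sub("", s)


samples = [
    "test:test2:test3:test4:test5:1.0.department",
    "test:test2:test3:test4:test5:1.0.foo.0.bar",
    "user.0.display",
    '"test:test2:test3:test4:test5:1.0".division',
]
results = [strip_prefix(s) for s in samples]
```

Strings that do not contain the prefix, such as user.0.display, pass through unchanged.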

Perl regex to match only if not followed by both patterns

I am trying to write a pattern that matches only when a string is not followed by either of the following patterns. Right now I have a pattern that I've tried to manipulate, but I can't seem to get it to match correctly.
Current pattern:
/(address|alias|parents|members|notes|host|name)(?!(\t{5}|\S+))/
I am trying to match when a string is not spaced correctly but not if it is part of a larger word.
For example, I want it to match:
host \t{4} something
but not:
hostgroup \t{5} something
In the above example it will match hostgroup and end up separating it into two words, "host" and "group".
Match:
notes \t{4} something
but not:
notes_url \t{5} something
Using my pattern, it ends up turning into:
notes \t{5} _url
Hopefully that makes a bit more sense.
I'm not at all clear what you want, but word boundaries will probably do what you ask.
Does this work for you?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5})/
Update
Having understood your problem better, does this do what you want?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))/
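To see how the word boundaries and the nested lookahead behave, here is a small Python sketch of the updated pattern. It assumes the keyword is followed directly by tab characters (the spaces around \t{4} in the question are taken to be notation only), and the function name badly_spaced is hypothetical.

```python
import re

# \b ... \b keeps "host" from matching inside "hostgroup" or "notes" inside
# "notes_url" (the underscore is a word character). The lookahead rejects
# the keyword only when it is followed by exactly five tabs.
KEYWORD = re.compile(
    r"\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))"
)


def badly_spaced(line):
    """Return the keyword if it is not followed by exactly five tabs, else None."""
    m = KEYWORD.search(line)
    return m.group(1) if m else None
```

With four or six tabs after host the function returns "host"; with exactly five tabs, or with hostgroup or notes_url, it returns None.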

RegEx find pattern and ignore all white space

I would like to find the following For Next loop within a script file. I have tried the following regex, but it doesn't work. I can't figure out how to skip all the whitespace. The text in the middle of the For Next loop can vary as well.
RegEx...
/[fF]or [eE]ach.*[aA]s [lL]ist[iI]tem [iI]n .*\.[]tems\s*.*[nN]ext/
Search for this...
For Each item As ListItem In CheckBoxList1.Items
If item.Selected = True Then
MyList.Add(item.Text)
End If
Next
If what is between For and Next can vary, I think you are looking for:
/[fF]or[\S\s]*?[nN]ext/
This will match anything that looks like For (anything at all, up to the nearest following) Next.
If you need to match more of the words you describe, I would use something like:
/[fF]or\s+?[eE]ach[\S\s]+?[aA]s\s+?[lL]ist[iI]tem\s+?[iI]n[\S\s]+?[iI]tems[\S\s]+?[nN]ext/
Let me know if you want a more detailed description of this, but your example in the comments below will not ensure the in between words are also there.
\s will match whitespace:
/[fF]or\s+[eE]ach.*[aA]s\s+[sS]tring\s+[iI]n\s+.*\.[iI]tems\s*.*[nN]ext/
I think this one will do it:
/[fF]or\s+[eE]ach.*?[aA]s\s+[sS]tring\s+[iI]n\s+.*?\.[iI]tems\s+.*?\s+[nN]ext/
Remember that .*? will match the shortest string possible.
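Putting the pieces from these answers together (\s+ between keywords, a lazy [\s\S]*? for the loop body), a minimal Python sketch might look like the following. It uses a case-insensitive flag instead of spelling out [fF], [eE] and so on, which is a substitution rather than what the answers wrote. Note that a plain . does not cross line breaks in most engines unless a dot-all option is set, which is why [\s\S] is used for the body. The first line of the sample script is a made-up declaration added for context.

```python
import re

FOR_EACH = re.compile(
    r"for\s+each\s+\w+\s+as\s+listitem\s+in\s+[\w.]+\.items\b"  # loop header
    r"[\s\S]*?"   # loop body: any characters, including newlines, as few as possible
    r"\bnext\b",  # closing keyword
    re.IGNORECASE,
)

script = """Dim MyList As New List(Of String)
For Each item As ListItem In CheckBoxList1.Items
    If item.Selected = True Then
        MyList.Add(item.Text)
    End If
Next
"""

match = FOR_EACH.search(script)
loop_text = match.group(0) if match else None
```

Because the body is matched lazily, the match ends at the first Next after the header; nested loops would need a real parser rather than a single regex.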

Regular Expression Troubles

Given the following type of string:
"#First Thing# #Another One##No Space# Main String #After Main# #EndString#"
I would like to come up with a regular expression that can return all the text surrounded by the # symbols as matches. One of the things giving me grief is the fact that the # symbol is both the opening and closing delimiter. All of my attempts at a regex have just returned the entire string. The other issue is that it is possible for part of the string to not be surrounded by # symbols, as shown by the substring "Main String" above. Does anyone have any ideas? I have toyed around with Negative Look-behind assertion a bit, but haven't been able to get it to work. There may or may not be a space in between the groups of #'s but I want to ignore them (not match against them) if there are. The other option would be to just write a string parser routine, which would be fairly easy, but I would prefer to use a regex if possible.
/((#[^#]+#)|([^#]+))/
Perhaps something like the above will match what you want.
This will match the space in between two hashes. Hmm.
/((#[^#]+#)|([^#]*[^#\s]+[^#]*))/
That will get rid of the nasty space, I think.
[Edit]
I think that this is what you need:
(?<=#)[^#]+?(?=#)
With the input #First Thing# #Another One##No Space# Main String #After Main# it matches:
First Thing
(a single space)
Another One
No Space
Main String
After Main
The second match is the space between Thing# and #Another.
[EDIT] To ignore space:
(?<=)(?!\s+)[^#]+?(?=#)
If you want to ignore trailing spaces:
(?<=)(?!\s+)[^#]+?(?=\s*#)
Try this. The first and last groups should not be captured, and the .*? should be lazy:
(?:#)(.*?)(?:#)
I think this is what you really need:
((#[^#]+#)|([^#]*[^#\s]+[^#]*))
but it will not capture the #'s around Main String.
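The difference between consuming the # delimiters and merely looking around them is easy to check. The Python sketch below compares two patterns against the string from the question: #([^#]+)# is a simplified consuming variant of the answers above, and the second is the lookaround pattern quoted earlier. The variable names are illustrative only.

```python
import re

text = "#First Thing# #Another One##No Space# Main String #After Main# #EndString#"

# Consuming both delimiters: each '#' can belong to only one match, so the
# text between two groups (a lone space, or "Main String") is never reported.
consumed = re.findall(r"#([^#]+)#", text)

# Lookarounds leave the '#' characters unconsumed, so whatever sits between
# a closing '#' and the next opening '#' is reported as a match as well.
lookaround = re.findall(r"(?<=#)[^#]+?(?=#)", text)
```

Here consumed holds only the five delimited groups, while lookaround also contains the single spaces and " Main String " with its surrounding spaces.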

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expressions, though. I feel like if I can just get my head around the first one, the rest will just click. Here is an example:
The following is sample input:
You can make things **bold** or //italic// or **//both//** or //**both**//.
Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
My first attempt was:
<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />
Then I realized that it would not match where the ** is not given, and it should end where there are two carriage returns.
So I tried this:
<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />
and it is close but for some reason it gives you this:
You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.
Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
Any ideas?
PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.
The [...] represents a character class, so this:
[(\*\*)|(\r\n\r\n)]
is effectively the same as this:
[()*|\r\n]
i.e. it matches a single character (a "*", a "|", a parenthesis, a carriage return, or a line feed), and the "|" isn't an alternation.
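The stray "*" in the question's output follows directly from this. A small Python sketch (used here only for illustration; ColdFusion's engine is not being tested) reproduces it with both the original class and the equivalent shorter one:

```python
import re

# The asker's pattern: the bracketed part is a character class, not a group.
broken = re.compile(r"\*\*(.*?)[(\*\*)|(\r\n\r\n)]")

# The same class written without the redundant characters.
equivalent = re.compile(r"\*\*(.*?)[()*|\r\n]")

sample = "You can make things **bold** or //italic//."
broken_out = broken.sub(r"<strong>\1</strong>", sample)
equivalent_out = equivalent.sub(r"<strong>\1</strong>", sample)
```

The class consumes only the first of the two closing stars, so the second one is left behind right after the closing strong tag.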
Another problem is that you replace the double linefeed. Even if your match succeeded you would end up merging paragraphs. You need to either restore it or not consume it in the first place. I'd use a positive lookahead to do the latter.
In Perl I'd write it this way:
$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;
Taking a wild guess, the ColdFusion probably looks like this:
REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
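For comparison, here is a minimal Python sketch of the same idea: a non-capturing alternation between the closing ** and a lookahead for a blank line, so the paragraph break is not consumed. The function name render_bold is hypothetical, \r?\n is used so that both line-ending styles are accepted, and whether ColdFusion's REreplace supports every construct used here is not verified.

```python
import re

# DOTALL lets "." cross single line breaks inside a paragraph.
# The lookahead stops the match at a blank line without consuming it.
BOLD = re.compile(r"\*\*(.*?)(?:\*\*|(?=\r?\n\r?\n))", re.DOTALL)


def render_bold(text):
    """Wrap **...** runs in <strong> tags, closing open runs at paragraph end."""
    return BOLD.sub(r"<strong>\1</strong>", text)


wiki = (
    "You can make things **bold** or //italic//.\n"
    "\n"
    "Character formatting extends across line breaks: **bold,\n"
    "this is still bold. This line deliberately does not end in star-star.\n"
    "\n"
    "Not bold."
)
html = render_bold(wiki)
```

An unterminated ** in the final paragraph (with no blank line after it) would be left untouched by this pattern; handling that case would need an extra end-of-input alternative.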
You really should change your
(.*?)
to something like
[^*]*?
to match any character except the *. I don't know if that is the problem, but it could be that the any-character . is eating one of your stars. It is also a generally accepted "best practice", when trying to balance matching characters like the double star or HTML start/end tags, to explicitly exclude them from your match set for the inner text.
*Disclaimer, I didn't test this in ColdFusion for the nuances of the regex engine - but the idea should hold true.
I know this is an older question, but in response to where Ryan Guill said "I tried the $1 but it put a literal $1 in there instead of the match": for ColdFusion you should use \1 instead of $1.
I always use a regex web page. It seems like I start from scratch every time I use regex.
Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.
Getting closer with this:
\*\*(.*?)\*\*|//(.*?)//
The tricky part is the //** or **//
OK, first checking for **//bold//**, then //**bold**//, then **bold**, then //bold//:
\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//
I find this app immensely helpful when I'm doing anything with regex:
http://www.gskinner.com/RegExr/desktop/
Still doesn't help with your actual issue, but could be useful going forward.