RegEx find pattern and ignore all white space - regex

I would like to find the following For Next loop within a script file. I have tried the following regex, but it doesn't work. I can't figure out how to skip all the white space. The text in the middle of the For Next loop can vary as well.
RegEx...
/[fF]or [eE]ach.*[aA]s [lL]ist[iI]tem [iI]n .*\.[]tems\s*.*[nN]ext/
Search for this...
For Each item As ListItem In CheckBoxList1.Items
If item.Selected = True Then
MyList.Add(item.Text)
End If
Next

If what is between For and Next can vary I think you are looking for:
/[fF]or[\S\s]*?[nN]ext/
This will match anything that looks like For (anything at all, up to the nearest Next) Next.
If you need to match more of the words you describe, I would use something like:
/[fF]or\s+?[eE]ach[\S\s]+?[aA]s\s+?[lL]ist[iI]tem\s+?[iI]n[\S\s]+?[iI]tems[\S\s]+?[nN]ext/
Let me know if you want a more detailed description of this, but your example in the comments below will not ensure that the in-between words are also there.
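To make the longer pattern concrete, here is a minimal sketch in Python rather than the original engine. It assumes a case-insensitive flag replaces the [fF]-style classes, and it adds \b around For and Next, which is not in the pattern above. The lines before and after the loop are hypothetical padding.

```python
import re

# The loop from the question, with hypothetical lines before and after it.
script = (
    "Dim MyList As New List(Of String)\n"
    "For Each item As ListItem In CheckBoxList1.Items\n"
    "    If item.Selected = True Then\n"
    "        MyList.Add(item.Text)\n"
    "    End If\n"
    "Next\n"
    "Label1.Text = MyList.Count.ToString()\n"
)

# [\s\S] matches any character, including newlines, so the loop body may span lines.
# The lazy +? stops at the first place where the rest of the pattern can match.
loop_pattern = re.compile(
    r"\bfor\s+each[\s\S]+?as\s+listitem\s+in[\s\S]+?items[\s\S]+?\bnext\b",
    re.IGNORECASE,
)

match = loop_pattern.search(script)
loop_text = match.group(0) if match else None
print(loop_text)
```

The match starts at "For Each" and ends at the first "Next" that follows the Items reference, regardless of how the body is indented.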

\s will match whitespace:
/[fF]or\s+[eE]ach.*[aA]s\s+[sS]tring\s+[iI]n\s+.*\.[iI]tems\s*.*[nN]ext/

I think this one will do it:
/[fF]or\s+[eE]ach.*?[aA]s\s+[sS]tring\s+[iI]n\s+.*?\.[iI]tems\s+.*?\s+[nN]ext/
Remember that .*? will match the shortest string possible.
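The difference between the greedy .* and the lazy .*? is easiest to see with two loops in one string. This is a small Python sketch with made-up input. Note that in Python the dot only crosses line breaks when the DOTALL flag is set, and other engines may behave differently.

```python
import re

# Two hypothetical loops separated by an unrelated line.
text = "For a\nNext\nx = 1\nFor b\nNext\n"

# Greedy: .* runs to the LAST "Next", swallowing both loops in one match.
greedy = re.findall(r"For.*Next", text, flags=re.DOTALL)

# Lazy: .*? stops at the FIRST "Next", giving one match per loop.
lazy = re.findall(r"For.*?Next", text, flags=re.DOTALL)

print(len(greedy), len(lazy))
```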

Related

Perl: regex pattern matching

Whenever I find the string ".abc.corp:" in a line of a file, I would like to exclude that line:
Example Line:
kubernte-fileserver-NN.abc.corp:/srv/export/storage/nsp_updates 1231231 123112 123123 89% /devops
Can someone help me find the correct regex pattern?
I'm trying the pattern below, but I'm unable to figure it out:
/^(.*(?!\.abc\.corp).*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\%\s+([\/\w\d\.-]+)$/
I'm confused about how subpattern group $1 interacts with the negative lookahead!
By "exclude it", I assume you want to exclude the entire line.
Your attempt will not exclude anything, because Perl can always find some point in the share path to split it where the split point is not immediately followed by .abc.corp, for example if it splits it like this:
kubernte-fileserver-N
N.abc.corp:/srv/export/storage/nsp_updates
or (as it's actually going to do) just consume everything by the first .*, with nothing left for the second one.
I'd instead first try to match the string you're trying to avoid, and failing to do so, proceed with the actual handling:
if (/^\S+\.abc\.corp:/) {
    # SKIP
}
elsif (/^(.*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\%\s+([\/\w\d\.-]+)$/) {
    ...
}
Besides actually working, this makes the code much more readable.
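The same "reject first, then parse" idea can be sketched in Python. The function name parse_line and the second sample line (nas01.example.net) are hypothetical. The two patterns are the ones from the Perl code above, lightly simplified for Python.

```python
import re

# Lines whose first field is a host ending in .abc.corp: are skipped outright.
SKIP = re.compile(r"^\S+\.abc\.corp:")

# share, three numeric columns, usage percentage, mount point
ROW = re.compile(r"^(.*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)%\s+([/\w.-]+)$")

def parse_line(line):
    """Return the captured fields, or None if the line is skipped or malformed."""
    if SKIP.search(line):
        return None
    m = ROW.match(line)
    return m.groups() if m else None

lines = [
    "kubernte-fileserver-NN.abc.corp:/srv/export/storage/nsp_updates 1231231 123112 123123 89% /devops",
    "nas01.example.net:/srv/export 500 200 300 40% /data",  # hypothetical line
]
parsed = [parse_line(line) for line in lines]
print(parsed)
```

Keeping the exclusion as its own test means the parsing pattern does not need a lookahead at all.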

RegEx extract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have come up with something like this:
^[^#][^\(]+
Unfortunately, it does not give the result I want. Do you have any idea why it's not working?
The regex you tried, ^[^#][^\(]+, would match:
From the beginning of the string, one character that is not a #: ^[^#]
Then one or more characters up to the first opening parenthesis (you don't have to escape the parenthesis in a character class): [^\(]+
So this would match *DataFromAdHoc, including the *, because it is not a #.
What you could do is capture this part, [^\(]+, in a group, like ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
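A quick check of both versions, sketched in Python (the question does not say which language is in use, so the engine here is an assumption):

```python
import re

s = "*DataFromAdHoc(cbgv)"

# Original attempt: the leading * is consumed by [^#] and is part of the match.
whole_match = re.match(r"^[^#][^(]+", s).group(0)

# With a capture group, [^#] still consumes the *, but group 1 holds only the name.
name = re.match(r"^[^#]([^(]+)", s).group(1)

print(whole_match, name)
```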
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
Your answer may depend on which language you're running your regex in; please include that in your question.

Perl regex to match only if not followed by both patterns

I am trying to write a pattern match to only match when a string is not followed by both following patterns. Right now I have a pattern that I've tried to manipulate but I can't seem to get it to match correctly.
Current pattern:
/(address|alias|parents|members|notes|host|name)(?!(\t{5}|\S+))/
I am trying to match when a string is not spaced correctly but not if it is part of a larger word.
For example I want it to match,
host \t{4} something
but not,
hostgroup \t{5} something
In the above example it will match hostgroup and end up separating it into two words, "host" and "group".
Match:
notes \t{4} something
but not,
notes_url \t{5} something
Using my pattern it ends up turning into:
notes \t{5} _url
Hopefully that makes a bit more sense.
I'm not at all clear what you want, but word boundaries will probably do what you ask.
Does this work for you?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5})/
Update
Having understood your problem better, does this do what you want?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))/
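Here is a small Python sketch of how the updated pattern behaves, assuming real tab characters stand in for the \t{4} and \t{5} notation in the question. The helper name is_misaligned is hypothetical.

```python
import re

KEYWORDS = r"address|alias|parents|members|notes|host|name"

# \b...\b keeps "host" from matching inside "hostgroup" or "notes" inside "notes_url"
# (the underscore counts as a word character). The nested lookahead rejects the
# keyword only when it is followed by exactly five tabs.
BAD_SPACING = re.compile(r"\b(" + KEYWORDS + r")\b(?!\t{5}(?!\t))")

def is_misaligned(line):
    """True when a keyword is present but not followed by exactly five tabs."""
    return BAD_SPACING.search(line) is not None

expected = {
    "host" + "\t" * 4 + "something": True,        # too few tabs
    "host" + "\t" * 5 + "something": False,       # correctly spaced
    "host" + "\t" * 6 + "something": True,        # too many tabs
    "hostgroup" + "\t" * 5 + "something": False,  # part of a larger word
    "notes_url" + "\t" * 5 + "something": False,  # part of a larger word
}
results = {line: is_misaligned(line) for line in expected}
print(results == expected)
```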

replacing all open tags with a string

Before somebody points me to that question, I know that one can't parse html with regex :) And this is not what I am trying to do.
What I need is:
Input: a string containing html.
Output: every opening tag <tag> replaced with
***<tag>
So if I get
<a><b><c></a></b></c>, I want
***<a>***<b>***<c></a></b></c>
as output.
I've tried something like:
(<[~/].+>)
and replace it with
***$1
But it doesn't really seem to work the way I want it to. Any pointers?
Clarification: it's guaranteed that there are no self closing tags nor comments in the input.
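You just have two problems: ^ is the character that negates a character class, not ~; and the .+ is greedy, so it will match as many characters as possible before the final >. Make the quantifier lazy, and use * rather than + so that single-letter tags such as <a> still match on their own (with .+? the match needs at least one more character after the a and runs on into the next tag). Change it to:
(<[^/].*?>)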
You can also probably drop the parentheses and replace with $0 or $&, depending on the language.
Try using: (<[^/].*?>) and replace it with ***$1
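As a rough check of the (<[^/].*?>) pattern, here is a Python sketch. The function name mark_open_tags is hypothetical, and a function is used for the replacement instead of ***$1 because replacement-string syntax differs between languages.

```python
import re

# "<", then any character that is not "/", then as little as possible up to ">".
OPEN_TAG = re.compile(r"<[^/].*?>")

def mark_open_tags(html, marker="***"):
    """Prefix every opening tag with the marker; closing tags are left alone."""
    return OPEN_TAG.sub(lambda m: marker + m.group(0), html)

marked = mark_open_tags("<a><b><c></a></b></c>")
with_attr = mark_open_tags('<a href="x">text</a>')
print(marked)
print(with_attr)
```

In Python the dot does not cross line breaks by default, so a tag split over several lines would need the DOTALL flag or [^>]* in place of .*?.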

Regex matching in ColdFusion OR condition

I am attempting to write a CF component that will parse wikiCreole text. I am having trouble getting the correct matches with some of my regular expressions, though. I feel like if I can just get my head around the first one, the rest will just click. Here is an example:
The following is sample input:
You can make things **bold** or //italic// or **//both//** or //**both**//.
Character formatting extends across line breaks: **bold,
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
My first attempt was:
<cfset out = REreplace(out, "\*\*(.*?)\*\*", "<strong>\1</strong>", "all") />
Then I realized that it would not match where the ** is not given, and it should end where there are two carriage returns.
So I tried this:
<cfset out = REreplace(out, "\*\*(.*?)[(\*\*)|(\r\n\r\n)]", "<strong>\1</strong>", "all") />
and it is close but for some reason it gives you this:
You can make things <strong>bold</strong>* or //italic// or <strong>//both//</strong>* or //<strong>both</strong>*//.
Character formatting extends across line breaks: <strong>bold,</strong>
this is still bold. This line deliberately does not end in star-star.
Not bold. Character formatting does not cross paragraph boundaries.
Any ideas?
PS: If anyone has any suggestions for better tags, or a better title for this post I am all ears.
The [...] represents a character class, so this:
[(\*\*)|(\r\n\r\n)]
Is effectively the same as this:
[()*|\r\n]
i.e. it matches a single character such as "*", and the "|" isn't an alternation.
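This also explains the stray * in the output from the question. A short Python sketch (not ColdFusion, so engine details may differ) shows the bracket expression matching one character at a time, and the attempted replace consuming only the first closing star:

```python
import re

# Inside [...] the parentheses and | are literal members of the set.
char_class = re.compile(r"[(\*\*)|(\r\n\r\n)]")
singles = char_class.findall("a*b|c(d)")

# The attempt from the question: the class eats one "*", leaving the second behind.
attempt = re.sub(
    r"\*\*(.*?)[(\*\*)|(\r\n\r\n)]",
    r"<strong>\1</strong>",
    "make things **bold** or",
)

print(singles)
print(attempt)
```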
Another problem is that you replace the double linefeed. Even if your match succeeded you would end up merging paragraphs. You need to either restore it or not consume it in the first place. I'd use a positive lookahead to do the latter.
In Perl I'd write it this way:
$string =~ s/\*\*(.*?)(?:\*\*|(?=\n\n))/<strong>$1<\/strong>/sg;
Taking a wild guess, the ColdFusion probably looks like this:
REreplace(out, "\*\*(.*?)(?:\*\*|(?=\r\n\r\n))", "<strong>\1</strong>", "all")
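For comparison, the Perl version translates to Python roughly as follows. This is a sketch only: it assumes paragraphs are separated by \n\n rather than \r\n\r\n, and bold_to_strong is a hypothetical name.

```python
import re

# Bold ends at a closing ** or, without consuming anything, just before a blank line.
# DOTALL lets (.*?) run across single line breaks inside a paragraph.
BOLD = re.compile(r"\*\*(.*?)(?:\*\*|(?=\n\n))", re.DOTALL)

def bold_to_strong(text):
    return BOLD.sub(r"<strong>\1</strong>", text)

sample = "make **bold** text: **bold,\nstill bold.\n\nNot bold."
converted = bold_to_strong(sample)
print(converted)
```

Because the paragraph break is only looked at, never consumed, the two paragraphs stay separate after the replace.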
You really should change your
(.*?)
to something like
([^*]*?)
to match any character except the *. I don't know if that is the problem, but it could be that the any-character . is eating one of your stars. It is also a generally accepted "best practice", when trying to balance matching characters like the double star or HTML start/end tags, to explicitly exclude them from your match set for the inner text.
*Disclaimer, I didn't test this in ColdFusion for the nuances of the regex engine - but the idea should hold true.
I know this is an older question, but in response to where Ryan Guill said "I tried the $1 but it put a literal $1 in there instead of the match": for ColdFusion you should use \1 instead of $1.
I always use a regex web page. It seems like I start from scratch every time I use regex.
Try using '$1' instead of \1 for this one - the replace is slightly different... but I think the pattern is what you need to get working.
Getting closer with this:
\*\*(.*?)\*\*|//(.*?)//
The tricky part is the //** or **//
Ok, first checking for **//bold//**, then //**bold**//, then **bold**, then //bold//:
\*\*//(.*?)//\*\*|//\*\*(.*?)\*\*//|\*\*(.*?)\*\*|//(.*?)//
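Assuming the alternation above is meant with the stars escaped and a lazy (.*?) in each group, the "most specific first" ordering can be tried out in Python like this. The render function and the choice of output tags are assumptions for illustration.

```python
import re

# Combined forms come first so that plain ** or // does not match part of them.
INLINE = re.compile(
    r"\*\*//(.*?)//\*\*"    # **//both//**
    r"|//\*\*(.*?)\*\*//"   # //**both**//
    r"|\*\*(.*?)\*\*"       # **bold**
    r"|//(.*?)//",          # //italic//
    re.DOTALL,
)

def render(m):
    """Pick the HTML wrapper according to which alternative matched."""
    bold_italic, italic_bold, bold, italic = m.groups()
    if bold_italic is not None:
        return f"<strong><em>{bold_italic}</em></strong>"
    if italic_bold is not None:
        return f"<em><strong>{italic_bold}</strong></em>"
    if bold is not None:
        return f"<strong>{bold}</strong>"
    return f"<em>{italic}</em>"

line = "You can make things **bold** or //italic// or **//both//** or //**both**//."
html = INLINE.sub(render, line)
print(html)
```

This sketch would also treat the // in a URL such as http://... as the start of italics, so real wiki text needs extra handling for links.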
I find this app immensely helpful when I'm doing anything with regex:
http://www.gskinner.com/RegExr/desktop/
Still doesn't help with your actual issue, but could be useful going forward.