sublime text regex multiple parameters - regex

I want to capture Parameter1 using a regex in Sublime Text. The rest of the text will not be used.
Initial tags:
<description><![CDATA[<b>Parameter1</b></br></br>
This not to be copied and can be long]]></description>
I tried this expression in Sublime Text's regex search:
<description><!\[CDATA\[<b>(\w+)</b></br></br>(\w*)\]\]</description>
but it cannot find what I need (it stops matching partway through).

Your regex doesn't match the test string.
There is whitespace between the words, which \w does not match.
It also won't match non-word characters such as punctuation.
Below are two regexes.
1. This one is just to match your test string.
# <description>\s*<!\[CDATA\[\s*<b>([\s\w]+)</b>\s*</br>\s*</br>([\s\w]*)\]\]>\s*</description>
<description>
\s*
<!\[CDATA\[
\s*
<b>
( # (1)
[\s\w]+
)
</b> \s* </br> \s* </br>
( # (2)
[\s\w]*
)
\]\]> # CDATA closes with ]]>
\s*
</description>
2. This is how it should be done if your engine supports lookahead assertions.
# (?s)<description>\s*<!\[CDATA\[\s*<b>((?:(?!\]\]|\s*</b>).)+?)\s*</b>\s*</br>\s*</br>\s*((?:(?!\s*\]\]).)*)\s*\]\]>\s*</description>
(?s)
<description>
\s*
<!\[CDATA\[
\s*
<b>
( # (1)
(?:
(?! \]\] | \s* </b> )
.
)+?
)
\s* </b> \s* </br> \s* </br> \s*
( # (2)
(?:
(?! \s* \]\] )
.
)*
)
\s*
\]\]> # CDATA closes with ]]>
\s*
</description>
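To check the lookahead version outside Sublime Text, here is a minimal sketch using Python's `re` module (an assumption made for testing only; Sublime Text's own engine may differ in details). The sample string is the one from the question, the helper name `first_parameter` is made up for illustration, and the pattern matches the full `]]>` that closes the CDATA section.

```python
import re

# Sample taken from the question.
SAMPLE = (
    "<description><![CDATA[<b>Parameter1</b></br></br>\n"
    "This not to be copied and can be long]]></description>"
)

# Compact form of regex 2, split across lines for readability.
PATTERN = re.compile(
    r"(?s)<description>\s*<!\[CDATA\[\s*<b>"
    r"((?:(?!\]\]|\s*</b>).)+?)"          # group 1: text inside <b>...</b>
    r"\s*</b>\s*</br>\s*</br>\s*"
    r"((?:(?!\s*\]\]).)*)"                # group 2: everything up to ]]
    r"\s*\]\]>\s*</description>"
)


def first_parameter(text):
    """Return the text of the <b> element, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


print(first_parameter(SAMPLE))
```

Only group 1 is needed for the question; group 2 is kept so the pattern mirrors the expanded listing.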

Related

Using regular expression extractor to extract a value?

I am trying to extract the value from the following code. Even though my regex seems fine, it still does not extract the value.
token" value="(.+?)"
This does give me the exact match when I check it on regex101.com against:
<input type="hidden" name="token" value="GSYGEP2UUWOTMZ2SFV1G5D2M8L247KIG">
What should the regex be?
Your original regular expression is just fine:
value="(.+?)"
The cause might be additional spaces, or a problem in the surrounding code. Try removing the token" part, or escaping the " characters if necessary.
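As a quick sanity check of the simple expression, here is a small sketch in Python's `re` module (the question does not say which tool runs the extractor, so Python is an assumption). It uses `\s+` instead of a literal space so that extra whitespace between the attributes does not break the match.

```python
import re

# Input element taken from the question.
HTML = '<input type="hidden" name="token" value="GSYGEP2UUWOTMZ2SFV1G5D2M8L247KIG">'

# Same idea as the question's regex, tolerating any whitespace between attributes.
m = re.search(r'token"\s+value="(.+?)"', HTML)
token = m.group(1) if m else None
print(token)
```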
Try this
<input(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(['"])\s*token\s*\1)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\svalue\s*=\s*(['"])((?:(?!\2)[\S\s])*)\2)\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>
The Value content you're after is in Capture Group 3
https://regex101.com/r/HJhStT/1
https://regex101.com/r/8BWONb/1
Explained
< input # Input tag
(?= # Name attribute: Assert (a pseudo atomic group)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s name \s* = \s* # name =
( ['"] ) # (1), Quote
\s* token \s* # token
\1
)
(?= # Value attribute
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s value \s* = \s* # value =
( ['"] ) # (2), Quote
( # (3 start), value content
(?:
(?! \2 )
[\S\s]
)*
) # (3 end)
\2
)
# Just get rest of tag
\s+
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
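The lookahead-based expression can be tried as-is in Python's `re` module (assumed here only as a test bed). The sketch below shows that the value lands in capture group 3 regardless of attribute order or quote style; the helper name `token_value` and the second and third test strings are made up for illustration.

```python
import re

# The expression from the answer, split into its three parts.
TOKEN_INPUT = re.compile(
    r"""<input(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(['"])\s*token\s*\1)"""
    r"""(?=(?:[^>"']|"[^"]*"|'[^']*')*?\svalue\s*=\s*(['"])((?:(?!\2)[\S\s])*)\2)"""
    r"""\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>"""
)


def token_value(html):
    """Return the value attribute of the input named 'token', or None."""
    m = TOKEN_INPUT.search(html)
    return m.group(3) if m else None


print(token_value('<input type="hidden" name="token" value="GSYGEP2UUWOTMZ2SFV1G5D2M8L247KIG">'))
print(token_value("<input value='abc123' type=\"hidden\" name='token'>"))
```

Because both attributes are checked in lookaheads anchored at the tag start, the order in which they appear inside the tag does not matter.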

Is there a regex expression to remove text between tags containing a specific word

I have written a regex to remove the text between <FormattingRule and </FormattingRule>.
Now I also want to add an extra condition: the text must contain EdtJobEmpId.
Can someone assist me with this?
I have tried the following regex:
<FormattingRule(.|\n)*?<\/FormattingRule>
It can be found at https://regex101.com/r/ttUMON/1
I want to remove the following text based on the extra condition:
<FormattingRule Action="OnChange">
<Triggers>
<Trigger PropertyName="${EdtJobEmpId}" />
</Triggers>
<Choose>
<When Condition="${EdtJobSkcId}==Empty">
<Assign PropertyName="${EdtJobSkcId.Value}" Value="=${EdtEmpSkcId.Value}" />
</When>
</Choose>
</FormattingRule>
This regex matches <FormattingRule> nodes only if they contain EdtJobEmpId:
(?s)<FormattingRule((?!/FormattingRule).)*EdtJobEmpId((?!/FormattingRule).)*\/FormattingRule>
See live demo.
It works by using the "dot matches newline" flag (?s), so that . can span lines, and the negative lookahead (?!/FormattingRule) to avoid matching beyond the end of the current tag.
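A minimal sketch of this approach in Python's `re` module (assumed here for illustration), with the dot allowed to match newlines via `re.DOTALL` so the pattern can span a multi-line rule. The sample document and the `EdtOther` property are made up; only the rule that mentions EdtJobEmpId should be removed.

```python
import re

# Tempered dot: never step over "/FormattingRule", so a match cannot
# run from one rule into the next.
RULE_WITH_ID = re.compile(
    r"<FormattingRule((?!/FormattingRule).)*EdtJobEmpId"
    r"((?!/FormattingRule).)*/FormattingRule>",
    re.DOTALL,
)

SAMPLE = """<Rules>
<FormattingRule Action="OnLoad">
  <Trigger PropertyName="${EdtOther}" />
</FormattingRule>
<FormattingRule Action="OnChange">
  <Trigger PropertyName="${EdtJobEmpId}" />
</FormattingRule>
</Rules>"""

cleaned = RULE_WITH_ID.sub("", SAMPLE)
print(cleaned)
```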
There is no regular expression that will get this 100% right every time. For example, most attempts will be defeated by such things as comments, CDATA sections, and entity or character references in the source.
The right tool for this job is XSLT.
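XSLT itself is not shown here. As a related parser-based alternative (a swapped-in technique, not XSLT), the sketch below uses Python's standard `xml.etree.ElementTree` to drop every FormattingRule element whose serialized content mentions EdtJobEmpId. The function name `drop_rules` and the sample document are made up for illustration, and it assumes the input is well-formed XML.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Rules>
<FormattingRule Action="OnLoad">
  <Trigger PropertyName="${EdtOther}" />
</FormattingRule>
<FormattingRule Action="OnChange">
  <Trigger PropertyName="${EdtJobEmpId}" />
</FormattingRule>
</Rules>"""


def drop_rules(xml_text, needle="EdtJobEmpId"):
    """Remove FormattingRule elements whose serialized form contains needle."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "FormattingRule" and needle in ET.tostring(child, encoding="unicode"):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


result = drop_rules(SAMPLE)
print(result)
```

A parser handles comments, CDATA sections and entity references for you, which is the weakness of the regex approach named above.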
This is the way it is done.
If you think you will run into problems where your html/xml has
constructs that could hide markup like Comments or CDATA (or anything else)
and you are worried about it, let me know and I'll patch up this
regex with a couple of functions to consume those bad boys.
(?:<(?:(FormattingRule)(?:\s+(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+)?\s*>)(?:(?!</\1\s*>)[\S\s])*?EdtJobEmpId(?:[\S\s]*?</\1\s*>|(*SKIP)(*FAIL)))
https://regex101.com/r/Plih3R/1
Readable version
(?:
<
(?:
( # (1 start), End tag req'd
FormattingRule
) # (1 end)
(?:
\s+
(?>
" [\S\s]*? "
| ' [\S\s]*? '
| (?:
(?! /> )
[^>]
)?
)+
)?
\s* >
)
(?:
(?! </ \1 \s* > )
[\S\s]
)*?
EdtJobEmpId
(?:
[\S\s]*? </ \1 \s* >
|
(*SKIP)(*FAIL)
)
)

How to match fuzzy empty div with a regular expression?

I have the following HTML code:
<div id="page126-div" style="position:relative;width:918px;height:1188px;">
</div>
<div id="page127-div" style="position:relative;width:918px;height:1188px;">
sometext for example
</div>
<div id="page128-div" style="position:relative;width:918px;height:1188px;">
</div>
My task is to match empty divs. Empty means in this context that they have no content at all (no characters between the opening > and the closing <), or contain just a newline, just a space, or fewer than 5 characters. So emptiness is pretty fuzzy.
If I wanted to match all divs, not only the empty ones, I would use the following regex:
\<div id="page.*?"\>.*?\<\/div\>
Naturally I should use it with dotall modifier.
But to match only empty divs, I tried this expression:
\<div id="page.*?"\>.{0,5}?\<\/div\>
I expect to get first and last(third) divs, because they contain: opening div tag with attributes, then div content that can be from 0 to 5 characters and closing div tag.
First match is right, but second match is second and third divs stacked together instead of third div only.
I do not understand why.
This regex is pretty straight-forward:
<div id=\"[^"]+?\" style=[^>]+?>(\s|\n|[^\n]{0,5})<\/div>
Note that it doesn't necessarily require the exact same id and style properties.
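To see why the original attempt stacks two divs together, and how negated character classes avoid it, here is a small comparison in Python's `re` module (assumed for illustration). With dotall on, the lazy `.*?` in the id part is still free to grow past the end of its own tag when the rest of the pattern fails, so it runs on to the next `">`. The classes `[^"]` and `[^>]` cannot leave the tag. The quantifier is written `{0,5}` because some engines treat `{,5}` as literal text.

```python
import re

# Sample taken from the question.
HTML = (
    '<div id="page126-div" style="position:relative;width:918px;height:1188px;">\n'
    '</div>\n'
    '<div id="page127-div" style="position:relative;width:918px;height:1188px;">\n'
    'sometext for example\n'
    '</div>\n'
    '<div id="page128-div" style="position:relative;width:918px;height:1188px;">\n'
    '</div>'
)

# The question's attempt: lazy dot with dotall.
lazy_dot = re.compile(r'<div id="page.*?">.{0,5}?</div>', re.DOTALL)
# The answer's approach: negated classes keep each part inside one tag.
negated = re.compile(r'<div id="[^"]+?" style=[^>]+?>(\s|\n|[^\n]{0,5})</div>')

stacked = [m.group(0) for m in lazy_dot.finditer(HTML)]
empty_divs = [m.group(0) for m in negated.finditer(HTML)]

print(len(stacked), len(empty_divs))
```

Here the second entry of `stacked` contains both page127 and page128, while `empty_divs` holds only the page126 and page128 divs.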
You can give this a try.
Scraper Series
/(?><div(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sid\s*=\s*(?:(['"])\s*page(?:(?!\1)[\S\s])*\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|(?:(?!\/>)[^>])?)+>)\s*[\S\s]{0,5}\s*<\/div\s*>/
https://regex101.com/r/x8jf8D/1
Formatted
(?>
< div # div tag
(?= # Assertion (a pseudo atomic group)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s id \s* = \s*
(?:
( ['"] ) # (1), Quote
\s* page # With 'id = "page XXX"
(?:
(?! \1 )
[\S\s]
)*
\1
)
)
\s+
(?:
" [\S\s]*? "
| ' [\S\s]*? '
| (?:
(?! /> )
[^>]
)?
)+
>
)
\s* # Optional whitespaces (remove if necessary)
[\S\s]{0,5} # Optional 0-5 anything (including wsp)
\s* # Optional whitespaces (remove if necessary)
</div \s* >

how to match the iframe text, then skip and match another string in wordpress

I have this iframe code, and I want to match both the <iframe text at the very beginning of the string and, further along in the code, the text "soundcloud":
<iframe src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/297769462&color=%23ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&show_teaser=true" width="100%" height="166" frameborder="no" scrolling="no"></iframe>
My regex is (<iframe.*?><\/iframe>), which tries to match the iframe and anything in between.
What I want is to match <iframe, then skip everything in between until it finds soundcloud. If both conditions are fulfilled, then it's a match.
Any help would be great, thank you.
Try this
(?i)<iframe(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s(src\s*=\s*(['"])(?:(?!\3)[\S\s])*?soundcloud(?:(?!\3)[\S\s])*\3)(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\1\s*</iframe\s*>
https://regex101.com/r/KkJH6x/1
Formatted
(?i) # Case insensitive modifier
< iframe # The iframe tag
(?= # Assertion (a pseudo atomic group)
( # (1 start)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
( # (2 start), src attribute with 'soundcloud' in value
src \s* = \s*
( ['"] ) # (3), Quote
(?:
(?! \3 )
[\S\s]
)*?
soundcloud # 'Soundcloud'
(?:
(?! \3 )
[\S\s]
)*
\3 # Close quote
) # (2 end)
# The remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (1 end)
)
\1
\s*
</iframe \s* >
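The expression can be exercised in Python's `re` module (assumed here as a test bed; WordPress itself would run it through PHP). The first test string is a shortened version of the iframe from the question, and the second, pointing at example.com, is made up to show a non-match.

```python
import re

# The expression from the answer, split into three parts for readability.
SOUNDCLOUD_IFRAME = re.compile(
    r"""(?i)<iframe(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s"""
    r"""(src\s*=\s*(['"])(?:(?!\3)[\S\s])*?soundcloud(?:(?!\3)[\S\s])*\3)"""
    r"""(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\1\s*</iframe\s*>"""
)

soundcloud = (
    '<iframe src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/297769462"'
    ' width="100%" height="166"></iframe>'
)
other = '<iframe src="https://example.com/player" width="100%"></iframe>'

hit = SOUNDCLOUD_IFRAME.search(soundcloud)
miss = SOUNDCLOUD_IFRAME.search(other)
print(bool(hit), bool(miss))
```

Group 2 holds the whole src attribute, which is handy if only the URL is needed after the match.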

Validating Email Id using Regex in C#

I have an input string ("My Email id is abc # gmail.com"). I need to find the email address in that string using a regex and replace it with (xxxxxxx).
I am using the pattern below, but it doesn't work if the email address contains whitespace.
\\w+([-+.']\\w+)*#\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*
Thanks.
If all you want to do is allow whitespace alongside the word characters while keeping
the original regex intact, it starts to get ugly:
// (?=\\s*\\w)[\\w\\s]+(?:[-+.'](?=\\s*\\w)[\\w\\s]+)*#(?=\\s*\\w)[\\w\\s]+(?:[-.](?=\\s*\\w)[\\w\\s]+)*\\.(?=\\s*\\w)[\\w\\s]+(?:[-.](?=\\s*\\w)[\\w\\s]+)*
(?= \s* \w )
[\w\s]+
(?:
[-+.']
(?= \s* \w )
[\w\s]+
)*
#
(?= \s* \w )
[\w\s]+
(?:
[-.]
(?= \s* \w )
[\w\s]+
)*
\.
(?= \s* \w )
[\w\s]+
(?:
[-.]
(?= \s* \w )
[\w\s]+
)*
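To see what this expression does with the input from the question, here is a small sketch in Python's `re` module (the question is about C#, so Python is used only as a quick test bed, and the separator is kept as it appears in the pattern above). One consequence worth knowing: because whitespace is now allowed inside every part, the first `[\w\s]+` also absorbs the words in front of the address, so the whole sentence is replaced.

```python
import re

# The whitespace-tolerant expression, one raw string per part.
SPACED_EMAIL = re.compile(
    r"(?=\s*\w)[\w\s]+(?:[-+.'](?=\s*\w)[\w\s]+)*"
    r"#(?=\s*\w)[\w\s]+(?:[-.](?=\s*\w)[\w\s]+)*"
    r"\.(?=\s*\w)[\w\s]+(?:[-.](?=\s*\w)[\w\s]+)*"
)

text = "My Email id is abc # gmail.com"
match = SPACED_EMAIL.search(text)
masked = SPACED_EMAIL.sub("(xxxxxxx)", text)

print(repr(match.group(0)))
print(masked)
```

If only the address should be masked, the local part needs a tighter rule than "any run of word characters and spaces", for example allowing whitespace only right next to the separator.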