Replacing chemform in wiki - regexp - regex

Could you please give me some advice? I'm replacing the <chemform> markup in my wiki, which is no longer supported. The strings are usually simple, like these:
<chemform>CH3COO-</chemform>
<chemform>Ba2+</chemform>
<chemform>H2CO3</chemform>
I need them to be replaced by these:
CH<sub>3</sub>COO<sup>-</sup>
Ba<sub>2</sub><sup>+</sup>
H<sub>2</sub>CO<sub>3</sub>
So far I came up with this regexp for the RegExr tool:
match: <chemform\b[^>]*>(\D*?)([0-9]*)(\D*?)(\D*?)([0-9]*)(\D*?)([-+]*?)</chemform>
replace: $1<sub>$2</sub>$3$4<sub>$5</sub>$6<sup>$7</sup>
I know the code is horrible, but so far it has been working for me, except that it produces empty tags such as <sub></sub>:
<sub></sub>CH<sub>3</sub>COO<sup>-</sup>
<sub></sub>Ba<sub>2</sub><sup>+</sup>
H<sub>2</sub>CO<sub>3</sub><sup></sup>
How can I get rid of these without running a second search and replace? Thanks a lot!

You could use Notepad++, which supports conditional replacements (see the earlier post by Wiktor Stribiżew for details).
Use the following patterns:
match: ([A-Za-z]+(?=[-+\d]))(?<sub>\d+)?(?<sup>[-+])?(?=[-+\w]*</chemform>)
replace: $1(?{sub}<sub>$+{sub}</sub>)(?{sup}<sup>$+{sup}</sup>)
Given your input sample, I get:
<chemform>CH<sub>3</sub>COO<sup>-</sup></chemform>
<chemform>Ba<sub>2</sub><sup>+</sup></chemform>
<chemform>H<sub>2</sub>CO<sub>3</sub></chemform>
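For anyone scripting this outside Notepad++, the same conditional idea can be sketched in Python with a replacement callback. This is only a sketch under assumptions: the names CHEMFORM, TOKEN and convert are made up for illustration, every digit run is treated as a subscript and every run of + or - as a superscript (which mirrors the desired output in the question, not real chemical notation), and the <chemform> wrapper itself is dropped.

```python
import re

# Hypothetical helper names; not part of any wiki tooling.
CHEMFORM = re.compile(r"<chemform\b[^>]*>(.*?)</chemform>")
TOKEN = re.compile(r"(\d+)|([-+]+)")


def _markup(match):
    # Group 1 is a digit run, group 2 is a run of charge signs.
    if match.group(1) is not None:
        return f"<sub>{match.group(1)}</sub>"
    return f"<sup>{match.group(2)}</sup>"


def convert(text):
    # Rewrite only the inside of each <chemform> element and drop the wrapper.
    return CHEMFORM.sub(lambda m: TOKEN.sub(_markup, m.group(1)), text)


for formula in ("CH3COO-", "Ba2+", "H2CO3"):
    print(convert(f"<chemform>{formula}</chemform>"))
```

Because a tag is only emitted when a token actually matches, no empty <sub></sub> or <sup></sup> pairs can appear, so a second clean-up pass is not needed.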

Related

Nino Regular Expression

I have the following text, for example:
nino&searchPhrase=jn123456&alphabetical
And I want to extract jn123456.
I've put together the following regex to extract NINOs:
(\bnino?\b.*?|Nino?\b.*?)[a-zA-Z]{2}[0-9]{6}
The problem I have is at the very end of the regex where I'm matching the last alpha character which may or may not be there.
I've tried adding the following at the end of the regex shown above without any luck:
?[a-zA-Z]{1} and
[?a-zA-Z]{1}
Could someone please look at this and let me know where I've gone wrong?
Many thanks and kind regards
Chris
You may use something like this:
^[Nn]ino&?\w*=([a-z]{2}\d{6})
which will capture "jn123456" in the first capturing group.
If the character & can be anything else, then you may use . instead.
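A quick way to check the suggested pattern is to run it in Python. The sketch below uses the sample text from the question; the second pattern, with an optional trailing letter, is an assumption about what the asker meant by "the last alpha character which may or may not be there" and is not part of the answer above. The names NINO, NINO_OPTIONAL_SUFFIX and extract are hypothetical.

```python
import re

# Pattern from the answer above.
NINO = re.compile(r"^[Nn]ino&?\w*=([a-z]{2}\d{6})")

# Assumed variant: also accept one optional trailing letter.
NINO_OPTIONAL_SUFFIX = re.compile(r"^[Nn]ino&?\w*=([a-zA-Z]{2}\d{6}[a-zA-Z]?)")


def extract(pattern, text):
    # Return the first capturing group, or None when nothing matches.
    match = pattern.search(text)
    return match.group(1) if match else None


print(extract(NINO, "nino&searchPhrase=jn123456&alphabetical"))
print(extract(NINO_OPTIONAL_SUFFIX, "nino&searchPhrase=jn123456a&alphabetical"))
```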

Using RegEx with Alteryx to replace string

I have a simple issue: Using Alteryx, I want to take a string, match a certain pattern and return the matched pattern.
This is my current approach:
Regex_replace("CP:ConsumerProducts&Retail</td><td><strong><fontcl","[^\<]+","$1")
According to various sources and tools like regex101, the first matched sequence should be "CP:ConsumerProducts&Retail". However, Alteryx returns
<<<<
Alteryx uses the Perl RegEx Syntax (https://help.alteryx.com/2018.2/boost/syntax_perl.html), therefore, it should have no problem with the pattern itself.
I believe I am missing something obvious but I cannot figure it out.
I have received a reply through a different forum. A solution that works for me is to use the following pattern: ([^\<]+).*
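The difference between the two patterns can be reproduced in Python, which helps explain the <<<< result. This is a sketch, not Alteryx itself: it assumes that Alteryx expands a reference to a nonexistent group ($1 when the pattern has no parentheses) to an empty string, which is consistent with the output reported above. A replace function substitutes every match, so without a capture group each run of non-"<" characters is deleted and only the "<" characters remain.

```python
import re

text = "CP:ConsumerProducts&Retail</td><td><strong><fontcl"

# Original approach: every run of non-"<" characters is replaced.
# Assumption: the missing group expands to "", so "" is used here.
stripped = re.sub(r"[^<]+", "", text)

# Working pattern: capture the first run, let .* swallow the remainder,
# and keep only the captured group.
first_field = re.sub(r"([^<]+).*", r"\1", text)

print(stripped)
print(first_field)
```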

use regex to get both link and text associated with it (anchor tag)

I created a regex string that I hoped would get both the link and the associated text in an html page. For instance, if I had a link such as:
<a href='www.la.com/magic.htm'>magicians of los angeles</a>
Then the link I want is 'www.la.com/magic.htm' and the text I want is 'magicians of los angeles'.
I used the following regex expression:
strsearch = "\<a\s+(.*?)\>(.*?)\</a\s*?\>|"
But my vb program told me I was getting too many matches.
Is there something wrong with the regEx expression?
The circle-brackets are meant to get 'groups' that can be back-referenced.
Thanks
What about this one:
\<a href=.+\</a>
All that is left to do is to go over each match and extract the substrings using regular string manipulation.
Check here (although regexr follows the JavaScript regex implementation, it is still useful in our scenario).
That said, I often see people state that regexes are not suited for parsing HTML, so you might need an HTML parser for this. The two I know of are HtmlAgilityPack, which is not maintained anymore, and AngleSharp.
I tried the following pattern, and it worked.
\<a href=(.*?)\>(.*?)\<\/a\s*?\>|
I also found two errors in your original string:
the / in </a is not escaped
the keyword 'href' ends up in the first captured group
Finally, I would like to recommend a great site for testing regex strings. It will help you debug much faster. See this link (it also demonstrates the result you want):
REGEX101
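One detail worth checking in the pattern from the question is the trailing |. It adds an empty alternative, so the regex also matches the empty string at every position where no anchor tag starts, which would explain a "too many matches" result. The Python sketch below illustrates this on a made-up sentence wrapped around the sample link; the LINK pattern is a hypothetical tightened version, not the asker's or the answerers' code, and regex engines differ in detail between Python and VB.

```python
import re

html = "see <a href='www.la.com/magic.htm'>magicians of los angeles</a> now"

# The pattern from the question, including its trailing "|".
with_empty_branch = re.findall(r"<a\s+(.*?)>(.*?)</a\s*?>|", html)

# Hypothetical tightened pattern: no trailing "|", href value captured alone.
LINK = re.compile(r"<a\s+href=['\"]?([^'\" >]+)['\"]?[^>]*>(.*?)</a\s*>")
links = LINK.findall(html)

# The first list contains many empty ("", "") results besides the real one.
print(len(with_empty_branch))
print(links)
```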

Notepad++ Wildcard Find/Replace

I'm using Notepad++ and need to update a file in which the earlier sections of each line vary, so I think wildcards may help here. From the research I've done so far, it isn't clear what syntax to use for this.
Here's an example of the original string:
"EEID","SUPLIFE","Voluntary Life Insurance","500000.00","500000.00",0,276,10.62.0,0,0,"20151112","","A","","","","",""
I'd like to find a way to add wildcards in the places noted below as WILDCARD:
"EEID","SUPLIFE","Voluntary Life Insurance","WILDCARD","WILDCARD",WILDCARD,WILDCARD,WILDCARD,WILDCARD,WILDCARD,WILDCARD,"20151112","","A","","","","",""
The final output would then look like the following after the find/replace with wildcards to add VLIFE:
"EEID","SUPLIFE","Voluntary Life Insurance","500000.00","500000.00",0,276,10.62.0,0,0,"20151112","","A","VLIFE","","","",""
Thanks,
Brandon
Tested in Notepad++ and appears to work:
("EEID","SUPLIFE","Voluntary Life Insurance",([^,]+,){8}"","A",)("")(.*)
and replace pattern:
\1"VLIFE"\4
Regex101 example
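The same find/replace pair can be tried in Python, whose re module accepts the \1 and \4 backreferences used above. This is only a sketch with a hypothetical PATTERN name, run against the single sample line from the question. Group 2 is the repeated field inside group 1 and group 3 is the empty "" being replaced, which is why the replacement jumps from \1 to \4.

```python
import re

line = ('"EEID","SUPLIFE","Voluntary Life Insurance","500000.00","500000.00",'
        '0,276,10.62.0,0,0,"20151112","","A","","","","",""')

# ([^,]+,){8} plays the role of the eight wildcard fields.
PATTERN = re.compile(
    r'("EEID","SUPLIFE","Voluntary Life Insurance",([^,]+,){8}"","A",)("")(.*)'
)

updated = PATTERN.sub(r'\1"VLIFE"\4', line)
print(updated)
```

Note that [^,]+ assumes no field contains a comma inside its quotes; a line with such a field would need a stricter field pattern.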

Issues with RegEx

I am trying to make an if-then-else statement using RegEx. I want to match the text if it contains Monty and also contains Python. Also the text should get matched if Monty is not present in the text.
RegEx
(?(?=Monty)(?(?=Python).*|)|^.*).*$
Kindly help!
How about this:
(^(?!.*Monty(?!.*Python.*).*).*$|^.*Python.*Monty.*$)
This passes my tests, but let me know if it works for you.
I am not well versed in lookahead regexes; I just tried to build the pattern from what I understood of the description above. Check the link to see if this is what you are trying to do.
Try this instead:
((?=Monty)((?=Python).*|)|^.*).*$
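The stated rule (match when Monty is absent, or when both Monty and Python are present) can be checked against the first suggested pattern, the one that starts with (^(?!.*Monty, using a few sample strings in Python. The sample texts and the helper names accepted and expected are made up for illustration; expected is a plain-string restatement of the rule used as a cross-check, not a replacement for the regex.

```python
import re

PATTERN = re.compile(r"(^(?!.*Monty(?!.*Python.*).*).*$|^.*Python.*Monty.*$)")


def accepted(text):
    # True when the regex matches the text.
    return PATTERN.search(text) is not None


def expected(text):
    # The rule from the question written without regex.
    return "Monty" not in text or "Python" in text


samples = ["Monty Python", "Python and Monty", "just Monty", "nothing here"]
for sample in samples:
    print(sample, accepted(sample), expected(sample))
```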