Regex change to detect more subdomains - regex

I am new to regular expressions and am trying to learn by modifying the following regex; I was wondering if you could help me.
The following regex detects links such as:
<http://www.ijs.si/software/delet.obo#VO_Broker>
regex:
<(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9]\.[^\s]{2,})
How do I change the same regex so that it also detects:
<http://kt.ijs.si/software/delet.obo#VO_Broker>
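A likely reason the pattern misses this link, shown as a minimal Python sketch. Python's re module is an assumption here, since the question does not name an engine. Every branch requires the host label before the first dot to be either exactly one character or at least three, so the two-character label kt falls through. RELAXED is a hypothetical variant that accepts a label of any length; it is one possible fix, not the only one.

```python
import re

# Pattern from the question, split across lines but otherwise unchanged.
ORIGINAL = re.compile(
    r"<(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}"
    r"|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}"
    r"|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]\.[^\s]{2,}"
    r"|www\.[a-zA-Z0-9]\.[^\s]{2,})"
)

# Hypothetical variant: the middle and last characters of the label are
# optional, so labels of one, two, or more characters are all accepted.
RELAXED = re.compile(
    r"<(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.[^\s]{2,}"
    r"|www\.[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.[^\s]{2,})"
)

WWW_LINK = "<http://www.ijs.si/software/delet.obo#VO_Broker>"
KT_LINK = "<http://kt.ijs.si/software/delet.obo#VO_Broker>"


def detects(pattern, text):
    """Return True if the pattern finds a link anywhere in the text."""
    return pattern.search(text) is not None
```

With these definitions, detects(ORIGINAL, KT_LINK) is False while detects(RELAXED, KT_LINK) is True. Both patterns end in [^\s]{2,}, so the captured group also includes the closing ">".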

I use the following regex to check for URLs:
(\b(https?|ftp|file):\/\/[-\w+&##\/%?=~_|!:,.;]*[-\w+&##\/%=~_|])
It works fine and is a lot simpler.
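To check the simpler pattern against both links from the question, here is a small Python sketch. Python's re module and the find_urls helper are assumptions for illustration; the pattern itself is copied as posted.

```python
import re

# Pattern from the answer, unchanged. The final character class omits
# punctuation such as "." and ",", so a URL cannot end on those characters.
URL = re.compile(r"(\b(https?|ftp|file):\/\/[-\w+&##\/%?=~_|!:,.;]*[-\w+&##\/%=~_|])")


def find_urls(text):
    """Return every URL found in the text (group 1 is the whole URL)."""
    return [m.group(1) for m in URL.finditer(text)]
```

Because ">" is not in either character class, the match stops before the closing bracket, so find_urls returns the bare URL for both the www and the kt link.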


Using RegEx with Alteryx to replace string

I have a simple issue: Using Alteryx, I want to take a string, match a certain pattern and return the matched pattern.
This is my current approach:
Regex_replace("CP:ConsumerProducts&Retail</td><td><strong><fontcl","[^\<]+","$1")
According to various sources and tools like regex101, the first matched sequence should be "CP:ConsumerProducts&Retail". However, Alteryx returns
<<<<
Alteryx uses the Perl RegEx Syntax (https://help.alteryx.com/2018.2/boost/syntax_perl.html), therefore, it should have no problem with the pattern itself.
I believe I am missing something obvious but I cannot figure it out.
I have received a reply through a different forum. A solution that works for me is to use the following pattern: ([^\<]+).*
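The behaviour can be reproduced outside Alteryx. The sketch below uses Python's re module as a stand-in for Alteryx's engine, which is an assumption; it also assumes that a "$1" with no capture group expands to an empty string, which is consistent with the "<<<<" output reported above. A replace call substitutes every match, so without a group each run of non-"<" characters is deleted and only the four "<" characters remain.

```python
import re

TEXT = "CP:ConsumerProducts&Retail</td><td><strong><fontcl"

# No capture group: every run of non-"<" characters is replaced. The empty
# replacement stands in for a "$1" that refers to a group that does not exist.
without_group = re.sub(r"[^<]+", "", TEXT)

# With a capture group followed by ".*", a single match covers the whole
# string and is replaced by the captured prefix only.
with_group = re.sub(r"([^<]+).*", r"\1", TEXT)
```

Here without_group comes out as "<<<<" and with_group as "CP:ConsumerProducts&Retail", which matches what the working pattern returns.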

Matching REGEX website url

Hello, I am having trouble creating a regular expression that matches a particular set of URLs.
http://www.somedomain.com/example/some-other-page/news/?p=12 Fail
http://www.somedomain.com/some-page/chat/?p=123 Pass
http://www.somedomain.com/example/path/test/chat/?p=12345 Fail
http://www.somedomain.com/example/?p=4321 Pass
http://www.somedomain.com/some-page/chat/?p=1 Fail
This is what I have so far: ^http://www.somedomain.com(/(some-page)(/chat)(/?)(\?.*)?) I am not very comfortable with regular expressions.
I solved this problem using regex101.com. This is the regex I came up with: ^http://www.somedomain.com(/(?:some-page|example)(/(?:chat/\?p=123|\?p=4321))(\?.*)?) Thanks
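The accepted pattern can be checked against the five sample URLs with a short Python sketch. Python's re module, the CASES list, and the passes helper are assumptions made for illustration.

```python
import re

# Pattern from the self-answer, with stray invisible characters removed.
PATTERN = re.compile(
    r"^http://www.somedomain.com(/(?:some-page|example)(/(?:chat/\?p=123|\?p=4321))(\?.*)?)"
)

# Sample URLs from the question, paired with the expected outcome.
CASES = [
    ("http://www.somedomain.com/example/some-other-page/news/?p=12", False),
    ("http://www.somedomain.com/some-page/chat/?p=123", True),
    ("http://www.somedomain.com/example/path/test/chat/?p=12345", False),
    ("http://www.somedomain.com/example/?p=4321", True),
    ("http://www.somedomain.com/some-page/chat/?p=1", False),
]


def passes(url):
    """Return True if the URL matches the pattern from its first character."""
    return PATTERN.match(url) is not None


results = [passes(url) for url, _ in CASES]
```

All five cases agree with the Pass/Fail labels. Note that the pattern has no trailing $, so a URL such as .../some-page/chat/?p=1234 also passes, and the unescaped dots in the host match any character; adding $ and writing \. would tighten both.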

Trying to exclude strings from my regex search

I'm using the following expression to filter Oracle Java vulnerabilities from a list. This works just fine:
^(?!.*Oracle Java.*).*$
I'm having a tough time adding another string to exclude.
I got the expression from an earlier question here:
Regular Expression to exclude set of Keywords
I've tried all the examples from this link but the answer Tim gave was the only one that worked for me.
Does anyone know how I could add another string to this?
^(?!.*Oracle Java.*).*$
You can use regex alternation inside the lookahead:
^(?!.*(Oracle Java|excluded1|excluded2).*).*$
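A minimal Python sketch of the alternation approach. The second term "Adobe Flash" and the sample findings are hypothetical placeholders for excluded1 and the real list, and Python's re module is an assumption since the question does not name the tool.

```python
import re

# Same shape as the pattern above; "Adobe Flash" is a hypothetical second term.
FILTER = re.compile(r"^(?!.*(Oracle Java|Adobe Flash).*).*$")

# Hypothetical findings, invented for this example.
findings = [
    "Oracle Java SE Critical Patch Update",
    "Adobe Flash Player Multiple Vulnerabilities",
    "OpenSSL Heartbeat Information Disclosure",
    "Microsoft Windows SMB Remote Code Execution",
]

# A line is kept only when the negative lookahead finds none of the terms.
kept = [line for line in findings if FILTER.match(line)]
```

Only the OpenSSL and Microsoft lines survive. Further terms can be appended to the alternation with another | separator.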

Issues with RegEx

I am trying to make an if-then-else statement using RegEx. I want to match the text if it contains Monty and also contains Python. Also the text should get matched if Monty is not present in the text.
RegEx
(?(?=Monty)(?(?=Python).*|)|^.*).*$
Kindly help!
How about this:
(^(?!.*Monty(?!.*Python.*).*).*$|^.*Python.*Monty.*$)
This passes my tests, but let me know if it works for you.
I am not well versed in lookaheads; I just built the regex from what I understood of the description above. Check the link to see if this is what you are trying to do.
Try this instead:
((?=Monty)((?=Python).*|)|^.*).*$
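The requirement "contains both Monty and Python, or does not contain Monty" reduces to "does not contain Monty, or contains Python". The Python sketch below compares the first answer's pattern with a hypothetical simplified pattern built on that reduction. Python's re module is an assumption; it does not accept a lookahead as a conditional test, so the (?(?=...)...) form from the question cannot be tried there.

```python
import re

# Pattern from the first answer above, unchanged.
POSTED = re.compile(r"(^(?!.*Monty(?!.*Python.*).*).*$|^.*Python.*Monty.*$)")

# Hypothetical simplification: either no "Monty" anywhere, or "Python" somewhere.
SIMPLIFIED = re.compile(r"^(?:(?!.*Monty)|(?=.*Python)).*$")

# Hypothetical sample texts paired with the outcome the question asks for.
SAMPLES = [
    ("Monty Python's Flying Circus", True),   # both words present
    ("Python was named after Monty", True),   # both words, reversed order
    ("Monty on his own", False),              # Monty without Python
    ("nothing relevant here", True),          # Monty absent
]


def matches(pattern, text):
    """Return True if the pattern matches the single-line text."""
    return pattern.match(text) is not None
```

On these four samples both patterns give the expected result. Neither pattern is tested here on multi-line input, where . would not cross line breaks.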

Replacing chemform in wiki - regexp

Could you please give me some advice? I'm replacing the <chemform> code in my wiki, which is no longer used. The strings are usually simple, like these:
<chemform>CH3COO-</chemform>
<chemform>Ba2+</chemform>
<chemform>H2CO3</chemform>
I need them to be replaced by these:
CH<sub>3</sub>COO<sup>-</sup>
Ba<sub>2</sub><sup>+</sup>
H<sub>2</sub>CO<sub>3</sub>
So far I came up with this regexp for the RegExr tool:
match: <chemform\b[^>]*>(\D*?)([0-9]*)(\D*?)(\D*?)([0-9]*)(\D*?)([-+]*?)</chemform>
replace: $1<sub>$2</sub>$3$4<sub>$5</sub>$6<sup>$7</sup>
I know the code is horrible, but so far it has been working for me, except that it produces empty elements like <sub></sub>:
<sub></sub>CH<sub>3</sub>COO<sup>-</sup>
<sub></sub>Ba<sub>2</sub><sup>+</sup>
H<sub>2</sub>CO<sub>3</sub><sup></sup>
How can I get rid of these without doing a second search and replace? Thanks a lot!
You could use Notepad++, which can perform conditional replacements (details are in a previous post by Wiktor Stribiżew).
Use the following patterns:
match: ([A-Za-z]+(?=[-+\d]))(?<sub>\d+)?(?<sup>[-+])?(?=[-+\w]*</chemform>)
replace: $1(?{sub}<sub>$+{sub}</sub>)(?{sup}<sup>$+{sup}</sup>)
Given your input sample, I get:
<chemform>CH<sub>3</sub>COO<sup>-</sup></chemform>
<chemform>Ba<sub>2</sub><sup>+</sup></chemform>
<chemform>H<sub>2</sub>CO<sub>3</sub></chemform>
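If a script is an option instead of an editor, the same conversion can be sketched in Python with a replacement function, which avoids empty tags because a tag is only written when there is something to wrap. This is a swapped-in technique, not the Notepad++ approach above; the names CHEMFORM, format_formula, and convert are hypothetical. Unlike the output just shown, it also drops the <chemform> wrapper, as the question asks. It assumes simple formulas like the three samples, where every digit run is a subscript and only a trailing sign is a charge.

```python
import re

# Matches one <chemform>...</chemform> element and captures its content.
CHEMFORM = re.compile(r"<chemform\b[^>]*>(.*?)</chemform>")


def format_formula(match):
    """Build the replacement text for one <chemform> element."""
    formula = match.group(1)
    # Every run of digits becomes a subscript.
    formula = re.sub(r"(\d+)", r"<sub>\1</sub>", formula)
    # A trailing run of "+" or "-" becomes a superscript charge.
    formula = re.sub(r"([-+]+)$", r"<sup>\1</sup>", formula)
    return formula


def convert(text):
    """Replace every <chemform> element in the text with sub/sup markup."""
    return CHEMFORM.sub(format_formula, text)
```

For example, convert("<chemform>Ba2+</chemform>") yields Ba<sub>2</sub><sup>+</sup>, and text outside <chemform> elements is left untouched.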