A regular expression to exclude a word/string - regex

I have a regular expression as follows:
^/[a-z0-9]+$
This matches strings such as /hello or /hello123.
However, I would like it to exclude a couple of string values such as /ignoreme and /ignoreme2.
I've tried a few variants but can't seem to get any to work!
My latest feeble attempt was
^/(((?!ignoreme)|(?!ignoreme2))[a-z0-9])+$
Any help would be gratefully appreciated :-)

Here's yet another way (using a negative look-ahead):
^/(?!ignoreme|ignoreme2|ignoremeN)([a-z0-9]+)$
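Note: There's only one capturing group: ([a-z0-9]+). Because the look-ahead is not anchored, this also rejects any path that merely starts with an ignored word, such as /ignoremenot.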

This should do it:
^/\b([a-z0-9]+)\b(?<!ignoreme|ignoreme2|ignoreme3)
You can add as many ignored words as you like. Here is a simple PHP implementation:
$ignoredWords = array('ignoreme', 'ignoreme2', 'ignoreme...');
preg_match('~^/\b([a-z0-9]+)\b(?<!' . implode('|', array_map('preg_quote', $ignoredWords)) . ')~i', $string);

As you want to exclude both words, you need a conjunction:
^/(?!ignoreme$)(?!ignoreme2$)[a-z0-9]+$
Now both conditions must be true (neither ignoreme nor ignoreme2 is allowed) to have a match.
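The two look-aheads can also be folded into one and generated from a list of words. The following is only a sketch in Python; the helper name build_exclusion_pattern and the sample paths are made up for illustration:

```python
import re

def build_exclusion_pattern(ignored_words):
    # Escape each word so any regex metacharacters in it are taken literally.
    alternatives = "|".join(re.escape(word) for word in ignored_words)
    # The `$` inside the look-ahead limits the exclusion to whole-path matches.
    return re.compile(rf"^/(?!(?:{alternatives})$)([a-z0-9]+)$")

pattern = build_exclusion_pattern(["ignoreme", "ignoreme2"])

for path in ["/hello", "/hello123", "/ignoreme", "/ignoreme2", "/ignoremenot"]:
    print(path, bool(pattern.match(path)))
```

With the `$` in place, /ignoremenot is still accepted, because only the exact words are excluded.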

This excludes every row that contains ignoreme anywhere in it. It works regardless of which other characters appear in the row:
^((?!ignoreme).)*$
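As a quick check of that pattern, here is a small Python sketch that filters a list of rows; the sample rows are invented for the example:

```python
import re

# At every position, assert that "ignoreme" does not start here,
# then consume one character.
no_ignoreme = re.compile(r"^((?!ignoreme).)*$")

rows = ["/hello", "/ignoreme", "prefix ignoreme suffix", "ignore me", ""]
kept = [row for row in rows if no_ignoreme.match(row)]
print(kept)
```

Note that the empty row is kept as well, since zero repetitions satisfy the pattern.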

This worked for me:
^((?!\bignoreme1\b)(?!\bignoreme2\b)(?!\bignoreme3\b).)*$

This worked for me in Python 3.x with make_column_selector in a machine learning pipeline, for including and excluding certain columns of a dataframe. To exclude columns, use ^(?!(col2|co4|col6)).*$
categoral_selector = make_column_selector(pattern = "(col2|co4|col6)")
numeric_selector = make_column_selector(pattern = "^(?!(col2|co4|col6)).*$")
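To see what the two patterns select without running a pipeline, the sketch below applies them to plain column names with re.search. It assumes the selector uses search semantics (as pandas' str.contains does); the column list is hypothetical:

```python
import re

# Hypothetical column names, for illustration only.
columns = ["col1", "col2", "col3", "co4", "col5", "col6"]

include_pattern = "(col2|co4|col6)"
exclude_pattern = "^(?!(col2|co4|col6)).*$"

# Keep names where the pattern is found anywhere in the name.
categorical = [c for c in columns if re.search(include_pattern, c)]
# Keep names that do not start with one of the listed names.
numeric = [c for c in columns if re.search(exclude_pattern, c)]

print(categorical)
print(numeric)
```

Neither pattern is anchored at the end, so a column such as col20 would be treated like col2.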

Simpler, in Python (note that this skips anything that starts with ignoreme):
re.findall(r'/(?!ignoreme)(\w+)', "/hello /ignoreme and /ignoreme2 /ignoreme2M.")
you will get:
['hello']

Related

RegEx remove part of string and replace another part

I have a challenge getting the desired result with RegEx (using C#) and I hope that the community can help.
I have a URL in the following format:
https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1
I want to make two modifications, specifically:
1) Remove everything after 'value' e.g. '&ida=0&idb=1'
2) Replace 'category' with e.g. 'newcategory'
So the result is:
https://somedomain.com/subfolder/newcategory/?abc=text:value
I can remove the string from 1) e.g. ^[^&]+ above but I have been unable to figure out how to replace the 'category' substring.
Any help or guidance would be much appreciated.
Thank you in advance.
Use the following:
Find: /(category/.+?value)&.+
Replace: /new$1 or /new\1 depending on your regex flavor
Update according to comment.
If the new name is completely_different_name, use the following:
Find: /category(/.+?value)&.+
Replace: /completely_different_name$1
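The same find/replace pair can be tried out in Python with re.sub, where the backreference is written \1. This is only a sketch using the URL from the question and newcategory as the replacement name:

```python
import re

url = "https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"

# Group 1 keeps everything from the slash after "category" up to "value";
# the trailing "&..." part is matched but not put back.
new_url = re.sub(r"/category(/.+?value)&.+", r"/newcategory\1", url)
print(new_url)
```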
I mainly work in Python, so this solution is in Python (value holds the original URL):
import re
url = re.sub('category', 'newcategory', re.search('^https.*value', value).group(0))
Explanation
re.sub(a, b, c) replaces every match of a with b in c.
re.search looks for the pattern in the string and returns a match object. group(0) is the whole matched text, so in the code above it holds everything from "https" up to "value".
Using Python and only built-in string methods (there is no need for regular expressions here):
url = r"https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"
new_url = (url.split('value')[0] + "value").replace("category", 'newcategory')
print(new_url)
Outputs:
https://somedomain.com/subfolder/newcategory/?abc=text:value

Regex to match everything between multiple sets of brackets

I am trying to match everything between multiple sets of brackets.
Example of data
[[42.30722,-83.181125],[42.30722,-83.18112667],[42.30722167,-83.18112667,[42.30721667,-83.181125],[+42.30721667,-83.181125]]
I need to match everything within the inner brackets as below
42.30722,-83.181125,
42.30722,-83.18112667,
42.30722167,-83.18112667,
42.30721667,-83.181125,
+42.30721667,-83.181125
How do I do that? I tried \[([^\[\]]|)*\] but it gives me the values with the brackets included. Can anybody please help me with this? Thanks in advance.
Seems like one of them is missing a bracket maybe, or if not, maybe some expression similar to:
\[([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)\s*\]?
might be OK to start with.
Test
import re
expression = r"\[([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)\s*\]?"
string = """
[[42.30722,-83.181125],[42.30722,-83.18112667],[42.30722167,-83.18112667,[42.30721667,-83.181125],[+42.30721667,-83.181125]]
"""
print([list(i) for i in re.findall(expression, string)])
print(re.findall(expression, string))
Output
[['42.30722', '-83.181125'], ['42.30722', '-83.18112667'], ['42.30722167', '-83.18112667'], ['42.30721667', '-83.181125'], ['+42.30721667', '-83.181125']]
[('42.30722', '-83.181125'), ('42.30722', '-83.18112667'), ('42.30722167', '-83.18112667'), ('42.30721667', '-83.181125'), ('+42.30721667', '-83.181125')]
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top-right panel and shows how it matches against sample inputs.
A little late, but figured I would include it anyhow.
Your 3rd set is missing a ']'.
If that is in there, then in Alteryx, you can just use Text to Columns splitting to Rows and ignore delimiter in brackets

regular expression match domain

I need a regular expression to match the following domains as follows:
http://www.cnn.com/fred = www.cnn.com
cnn.com = cnn.com
www.cnn.com:8080 = www.cnn.com
I have the following regular expression (using pcre):
([^/]+://)?([^:/]+)
The above works fine in cases 2 and 3. In case 1, however, the matched string still has http:// at the front. Is there a regular expression option I can use to skip the http part?
Many thanks in advance.
This one should suit your needs:
^(?:(?:f|ht)tps?://)?([^/:]+)
The first group will contain what you're looking for.
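A short Python sketch to check that pattern against the three cases from the question; the helper name extract_host is made up for the example:

```python
import re

# Optional scheme (http, https, ftp, ftps), then capture up to the first "/" or ":".
HOST_PATTERN = re.compile(r"^(?:(?:f|ht)tps?://)?([^/:]+)")

def extract_host(url):
    match = HOST_PATTERN.match(url)
    return match.group(1) if match else None

for url in ["http://www.cnn.com/fred", "cnn.com", "www.cnn.com:8080"]:
    print(url, "=", extract_host(url))
```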
This is the closest I could get to what I want. It is not perfect (it needs a leading ww, so it does not match plain cnn.com), but it seems to get the job done:
www?([^/:]+)

MATLAB 2012 regular expression

I have a set of strings that I'd like to parse in MATLAB 2012 that all have the following format:
string-int-int-int-int-string
I'd like to pluck out the third integer (the rest are 'don't cares'), but I haven't used MATLAB in ages and need to refresh on regular expressions. I tried using the regular expression '(.*)-(.*)-(.*)-\d-(.*)' but no dice. I did check out the MATLAB regexp page, but wasn't able to figure out how to apply that information to this case.
Anyone know how I might get the desired result? If so, could you explain what the expression you're using is doing to get that result so that others might be able to apply the answer to their unique situation?
Thanks in advance!
str = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt';
[tok] = regexp(str, '^.+?-.+?-.+?-(\d+?)-.+?-.+?', 'tokens');
tok{:}
ans =
'1000'
Update
Explanation, upon request.
^ - "Anchor", or match beginning of string.
.+? - Wildcard match, one or more, non-greedy.
- - Literal dash/hyphen.
(\d+?) - Digits match, one or more, non-greedy, captured into a token.
^.*?-.*?-.*?-(\d+)-.*?-.*?$
OR
^(?:[^-]*?-){3}(\d+)(?:.*?)$
Group1 now contains your required data
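For readers without MATLAB at hand, the second pattern can be checked in Python. This is only a transcription sketch using the sample string from the answer above, with a plain split shown for comparison:

```python
import re

name = "XyzStr-1-2-1000-56789-ILoveStackExchange.txt"

# Skip three "anything up to a dash" fields, then capture the digits that follow.
match = re.match(r"^(?:[^-]*-){3}(\d+)", name)
third_int = match.group(1) if match else None
print(third_int)

# With a fixed dash-separated layout, splitting gives the same field.
print(name.split("-")[3])
```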

How do I escape an apostrophe in my XPath text query with Perl and Selenium?

I have an XPath query which needs to match some text in a span element, as follows:
my $perl_query = qq(span[text\(\)='It's a problem']);
$sel->click_ok($perl_query);
Where the text has no apostrophe there is no problem.
I've tried the following instead of 'It's a problem':
'It\'s a problem'
'It&apos\;s a problem'
'It\${apos}s a problem' #some thread on Stackoverflow suggested that this was a solution implemented by Selenium, but it doesn't work.
Any ideas?
On a different note, if I can't solve this, I'd be happy enough matching 'a problem' but not sure how to do regex matching in XPath with Selenium.
Thanks for any pointers
It's an XPath problem rather than a Perl problem.
The problem was discussed and answered here in great detail:
http://kushalm.com/the-perils-of-xpath-expressions-specifically-escaping-quotes (broken link)
In a nutshell, modify your XPath query to assemble the quote-containing string using concat():
my $perl_query = qq(span[text\(\)=concat("It","'","s a problem")]);
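The concat() idea can be generalised into a small helper that quotes any string for XPath. The sketch below is in Python rather than Perl, and the function name xpath_literal is hypothetical, not part of Selenium:

```python
def xpath_literal(text):
    # No apostrophe: single quotes are safe.
    if "'" not in text:
        return f"'{text}'"
    # Apostrophe but no double quote: double quotes are safe.
    if '"' not in text:
        return f'"{text}"'
    # Both kinds of quote: split on apostrophes and rejoin with concat(),
    # writing each apostrophe as the double-quoted literal "'".
    pieces = []
    for index, part in enumerate(text.split("'")):
        if index > 0:
            pieces.append('"\'"')
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"

print("span[text()=" + xpath_literal("It's a problem") + "]")
```

For the text in the question there is no double quote, so the helper falls back to the simpler double-quoted form and concat() is only needed when both kinds of quote occur.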
A couple of suggestions; hopefully at least one of them will work:
my $perl_query = qq!span[text()='It\\'s a problem']!;
my $perl_query = qq!span[text()="It's a problem"]!;
I just had the same problem and Google didn't give me a satisfactory solution.
I tried to take the substring after this: value=' (ending with an apostrophe).
The XPath that works for me looks like:
"substring-after(., concat('value=', ''''))"
So four Apostrophes in a row.
The post is quite old, but here is my working answer for those who still come looking for how to escape a single apostrophe and cannot find a proper answer.
Text = It's a problem
Solution xpath = //div[text()=\"It's a problem\"]
or
Solution xpath = //div[contains(text(),\"It's a\")]
Is it possible that the actual text on the web page is a curly quote and not a straight apostrophe? Also, you may have extra space at the beginning and end of the span, so that the strict equality against your string won't match.
Consider breaking up your string if possible:
my $spanValue = q/text()='It's a problem'/;
my $perlQuery = qq/span[$spanValue]/;
# $perlQuery = span[text()='It's a problem']
XPath 1.0 has no escape for quotes inside string literals, so this works only where XPath 2.0 is supported. There, the solution is to double the apostrophe, e.g.
qq(span[text()='It''s a problem'])