RegEx remove part of string and replace another part - regex

I have a challenge getting the desired result with RegEx (using C#) and I hope that the community can help.
I have a URL in the following format:
https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1
I want to make two modifications, specifically:
1) Remove everything after 'value' e.g. '&ida=0&idb=1'
2) Replace 'category' with e.g. 'newcategory'
So the result is:
https://somedomain.com/subfolder/newcategory/?abc=text:value
I can handle 1) by matching ^[^&]+ (everything before the first &), but I have been unable to figure out how to replace the 'category' substring.
Any help or guidance would be much appreciated.
Thank you in advance.

Use the following:
Find: /(category/.+?value)&.+
Replace: /new$1 or /new\1 depending on your regex flavor
Update according to comment.
If the new name is completely_different_name, use the following:
Find: /category(/.+?value)&.+
Replace: /completely_different_name$1
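The question asks about C#, where Regex.Replace writes the group reference as $1 in the replacement. To keep all examples here in one language, the sketch below runs the same two Find/Replace pairs through Python's re.sub instead, where \g<1> is the unambiguous spelling of $1 / \1. It is an illustration of the patterns above, not a C# implementation.

```python
import re

url = "https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"

# Variant 1: "category" is inside group 1, so the replacement only adds
# the "new" prefix; "&.+" swallows the unwanted query parameters.
prefixed = re.sub(r"/(category/.+?value)&.+", r"/new\g<1>", url)

# Variant 2: "category" is outside the group, so it is dropped and an
# arbitrary new folder name is written in its place.
renamed = re.sub(r"/category(/.+?value)&.+", r"/completely_different_name\g<1>", url)

print(prefixed)  # https://somedomain.com/subfolder/newcategory/?abc=text:value
print(renamed)   # https://somedomain.com/subfolder/completely_different_name/?abc=text:value
```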

You haven't specified a language here; I mainly work in Python, so the solution is in Python (url holds the original URL string):
import re
url = re.sub('category', 'newcategory', re.search('^https.*value', url).group(0))
Explanation
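re.sub(pattern, repl, string) replaces every match of pattern in string with repl.
re.search(pattern, string) returns a match object for the first place the pattern matches, and group(0) is the entire matched text. So in the code above, re.search keeps everything from "https" up to (and including) the last "value" in the URL, because .* is greedy, and re.sub then renames category in that shortened string.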

Using Python and only built-in string methods (there is no need for regular expressions here):
url = r"https://somedomain.com/subfolder/category/?abc=text:value&ida=0&idb=1"
new_url = (url.split('value')[0] + "value").replace("category", 'newcategory')
print(new_url)
Outputs:
https://somedomain.com/subfolder/newcategory/?abc=text:value

Related

Regular Expression for BBcode replacement

I need a PCRE regex to replace this:
[quote author=MEMBER link=topic=8.msg1111 date=1438798587]
sample text[/quote]
with this one:
[quote="MEMBER, post: 1111"]sample text[/quote]
So I need to:
Delete the date=xxxxxxxxxx attribute and put a " at the end of the tag (after the post ID)
Replace link=topic=8.msg with post:
Replace author= with ="
Can somebody help, please?
Thanks!
Well, you can try this:
\[quote\sauthor=([^\s]+)\s*.*?msg(\d+)\s[^]]*]\s*([^[]*)(\[\/quote])
That said, I can't be sure it covers all of your inputs, since you only provided one sample.
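The answer's demo link, and with it the replacement string, did not survive, so the replacement used below is an assumption reconstructed from the desired output in the question. This sketch runs the same pattern in Python's re (the [ and ] inside the character classes are escaped so the class boundaries are unambiguous there):

```python
import re

text = '[quote author=MEMBER link=topic=8.msg1111 date=1438798587]\nsample text[/quote]'

# Group 1: author, group 2: post id (digits after "msg"),
# group 3: quoted body, group 4: closing [/quote] tag.
# [^\]]* skips the remaining attributes such as date=... up to the "]".
pattern = r'\[quote\sauthor=([^\s]+)\s*.*?msg(\d+)\s[^\]]*]\s*([^\[]*)(\[/quote])'

# Assumed replacement: rebuild the opening tag in the target format,
# then re-append the body and the closing tag.
result = re.sub(pattern, r'[quote="\1, post: \2"]\3\4', text)
print(result)  # [quote="MEMBER, post: 1111"]sample text[/quote]
```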

MATLAB 2012 regular expression

I have a set of strings that I'd like to parse in MATLAB 2012 that all have the following format:
string-int-int-int-int-string
I'd like to pluck out the third integer (the rest are 'don't cares'), but I haven't used MATLAB in ages and need to refresh on regular expressions. I tried using the regular expression '(.*)-(.*)-(.*)-\d-(.*)' but no dice. I did check out the MATLAB regexp page, but wasn't able to figure out how to apply that information to this case.
Anyone know how I might get the desired result? If so, could you explain what the expression you're using is doing to get that result so that others might be able to apply the answer to their unique situation?
Thanks in advance!
str = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt';
[tok] = regexp(str, '^.+?-.+?-.+?-(\d+?)-.+?-.+?', 'tokens');
tok{:}
ans =
'1000'
Update
Explanation, upon request.
^ - "Anchor", or match beginning of string.
.+? - Wildcard match, one or more, non-greedy.
- - Literal dash/hyphen.
(\d+?) - Digits match, one or more, non-greedy, captured into a token.
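Python's re module supports the same lazy quantifiers, so the pattern can be sanity-checked outside MATLAB. A minimal sketch (Python stands in for MATLAB here; the variable names are illustrative). It also shows that the original attempt finds no match at all, partly because \d without a quantifier matches exactly one digit:

```python
import re

s = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt'

# Three lazy "anything up to the next dash" fields, then the captured digits.
m = re.match(r'^.+?-.+?-.+?-(\d+?)-.+?-.+?', s)
third_int = m.group(1)
print(third_int)  # 1000

# The asker's attempt: "-\d-" needs a single-digit field after at least
# two earlier dashes, which this string does not contain.
attempt = re.match(r'(.*)-(.*)-(.*)-\d-(.*)', s)
print(attempt)  # None
```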
^.*?-.*?-.*?-(\d+)-.*?-.*?$
OR
^(?:[^-]*?-){3}(\d+)(?:.*?)$
Group 1 then contains the required data.
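The {3} version can be checked the same way. For strictly dash-delimited data, splitting the string is an alternative that needs no regex at all; that is a swapped-in technique, not one from the answers. A sketch in Python, reusing the sample string from the MATLAB answer:

```python
import re

s = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt'

# (?:[^-]*?-){3} skips exactly three dash-terminated fields;
# (\d+) then captures the run of digits that follows.
m = re.match(r'^(?:[^-]*?-){3}(\d+)(?:.*?)$', s)
via_regex = m.group(1)

# Without a regex: field index 3 (the fourth field) is the third integer.
via_split = s.split('-')[3]

print(via_regex, via_split)  # 1000 1000
```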

Regex Assistance for a url filepath

Can someone assist in creating a Regex for the following situation:
I have about 2000 records on which I need to do a search/replace, where I need to replace a known item in each record that looks like this:
<li>View Product Information</li>
The FILEPATH and FILE are variable, but the surrounding HTML is always the same. Can someone assist with what kind of Regex I would substitute for the "FILEPATH/FILE" part of the search?
You can match the constant parts and use capture groups to put them back:
(<li>View Product Information</li>)
Then replace the match with $1your_replacement$2, where $1 is the first capture group and $2 the second (in Python, for instance, they are available as match.group(1) and match.group(2)).
If you're using Java instead, you would have to escape the \ characters in the string literal.
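The original markup in the question did not survive, so this sketch assumes a hypothetical shape, <li><a href="FILEPATH/FILE">View Product Information</a></li>, with the variable path inside the href. The two constant parts are captured and re-emitted around the new value, as described above:

```python
import re

# Hypothetical record; the real markup from the question is not shown.
record = '<li><a href="docs/old/manual.pdf">View Product Information</a></li>'

# Groups 1 and 2 capture the constant HTML on either side of the path;
# [^"]* matches the variable FILEPATH/FILE part up to the closing quote.
pattern = r'(<li><a href=")[^"]*(">View Product Information</a></li>)'

# \g<1> and \g<2> are Python's equivalents of $1 and $2.
new_record = re.sub(pattern, r'\g<1>files/new/manual.pdf\g<2>', record)
print(new_record)
```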

A regular expression to exclude a word/string

I have a regular expression as follows:
^/[a-z0-9]+$
This matches strings such as /hello or /hello123.
However, I would like it to exclude a couple of string values such as /ignoreme and /ignoreme2.
I've tried a few variants but can't seem to get any to work!
My latest feeble attempt was
^/(((?!ignoreme)|(?!ignoreme2))[a-z0-9])+$
Any help would be gratefully appreciated :-)
Here's yet another way (using a negative look-ahead):
^/(?!ignoreme|ignoreme2|ignoremeN)([a-z0-9]+)$
Note: There's only one capturing expression: ([a-z0-9]+).
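A quick check of this pattern in Python. Because the lookahead is not anchored to the end of the string, it rejects not only the listed words but any path that merely starts with one of them (for example /ignoremefoo):

```python
import re

# Negative lookahead right after the slash; the only capturing group is ([a-z0-9]+).
pattern = re.compile(r'^/(?!ignoreme|ignoreme2|ignoremeN)([a-z0-9]+)$')

def captured_name(path):
    """Return the captured name, or None if the path is rejected."""
    m = pattern.match(path)
    return m.group(1) if m else None

paths = ['/hello', '/hello123', '/ignoreme', '/ignoreme2', '/ignoremefoo']
results = {p: captured_name(p) for p in paths}
print(results)
```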
This should do it:
^/\b([a-z0-9]+)\b(?<!ignoreme|ignoreme2|ignoreme3)
You can add as many ignored words as you like; here is a simple PHP implementation:
$ignoredWords = array('ignoreme', 'ignoreme2', 'ignoreme...');
preg_match('~^/\b([a-z0-9]+)\b(?<!' . implode('|', array_map('preg_quote', $ignoredWords)) . ')~i', $string);
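Python's re module only accepts fixed-width lookbehind, and these alternatives differ in length, so the lookbehind pattern above would be rejected there. A sketch of the same "build it from a word list" idea in Python, swapping in a single anchored negative lookahead (equivalent to the conjunction approach); the word list and test paths are illustrative:

```python
import re

ignored_words = ['ignoreme', 'ignoreme2', 'ignoreme3']

# re.escape plays the role of preg_quote; the (?:...)$ inside the lookahead
# means only exact matches of a listed word are excluded.
alternation = '|'.join(map(re.escape, ignored_words))
pattern = re.compile(r'^/(?!(?:' + alternation + r')$)[a-z0-9]+$', re.IGNORECASE)

def is_allowed(path):
    return pattern.match(path) is not None

candidates = ['/hello', '/ignoreme', '/IgnoreMe2', '/ignoremefoo']
allowed = [p for p in candidates if is_allowed(p)]
print(allowed)  # ['/hello', '/ignoremefoo']
```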
Since you want to exclude both words, you need a conjunction:
^/(?!ignoreme$)(?!ignoreme2$)[a-z0-9]+$
Now both conditions must be true (neither ignoreme nor ignoreme2 is allowed) to have a match.
This excludes every line that contains ignoreme anywhere from the search results, no matter what other characters the line contains:
^((?!ignoreme).)*$
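The pattern checks the lookahead before consuming each character, so a line matches only if ignoreme appears nowhere in it. A sketch applying it line by line with re.MULTILINE in Python; the sample lines are made up:

```python
import re

text = "keep this line\nskip ignoreme here\n/ignoreme2 also skipped\nand keep this"

# With re.MULTILINE, ^ and $ match at each line boundary, and . never
# crosses a newline, so each match is one whole line without "ignoreme".
kept = [m.group(0) for m in re.finditer(r'^((?!ignoreme).)*$', text, re.MULTILINE)]
print(kept)  # ['keep this line', 'and keep this']
```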
This worked for me:
^((?!\bignoreme1\b)(?!\bignoreme2\b)(?!\bignoreme3\b).)*$
This worked for me in Python 3.x with scikit-learn's make_column_selector in a machine-learning pipeline, to include and exclude certain columns of a DataFrame. To exclude the columns, use ^(?!(col2|co4|col6)).*$:
from sklearn.compose import make_column_selector
categorical_selector = make_column_selector(pattern="(col2|co4|col6)")
numeric_selector = make_column_selector(pattern="^(?!(col2|co4|col6)).*$")
Simpler, in Python:
import re
re.findall(r'/(?!ignoreme)(\w+)', "/hello /ignoreme and /ignoreme2 /ignoreme2M.")
You will get:
['hello']

REGEX Help for ColdFusion - In a URL, removing the ? and everything after it

I'm looking for some REGEX help
Given the following URL: http://news.cnet.com/8301-13924_3-10315534-64.html?part=rss&subj=news&tag=2547-1_3-0-20
What is the REGEX to obtain the following:
http://news.cnet.com/8301-13924_3-10315534-64.html
Thus removing the ? and everything after it
Thanks, B
You can certainly use a regex for this, but it would be more efficient to use
listfirst(theurl, '?')
which finds the first part of a list delimited by question marks.
In ColdFusion you could use regex replace:
myURL = REReplace(myURL,"\?.*$","")
That would leave you with everything before the question mark.
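For comparison outside ColdFusion, the two approaches above (a regex replace of \?.*$, and keeping the part before the first ?, as listfirst does) look like this in Python. This only illustrates the technique; it is not ColdFusion code:

```python
import re

url = 'http://news.cnet.com/8301-13924_3-10315534-64.html?part=rss&subj=news&tag=2547-1_3-0-20'

# Regex: delete the first "?" and everything after it.
# The "?" is escaped because it is otherwise a quantifier.
via_regex = re.sub(r'\?.*$', '', url)

# No regex: keep the text before the first "?" (split at most once).
via_split = url.split('?', 1)[0]

print(via_regex)  # http://news.cnet.com/8301-13924_3-10315534-64.html
```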
This regular expression will do the trick:
^([^?]+)
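Just take the first capture group from the match. (With ColdFusion's REFind and returnsubexpressions set to true, that is the second element of the returned pos and len arrays; the first element covers the entire match, not the original string.)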
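Group numbering is easy to mix up across APIs. In Python, for instance, group(0) is the entire match and group(1) is the first parenthesized group; for ^([^?]+) the two hold the same text. A sketch:

```python
import re

url = 'http://news.cnet.com/8301-13924_3-10315534-64.html?part=rss&subj=news&tag=2547-1_3-0-20'

m = re.match(r'^([^?]+)', url)
whole_match = m.group(0)   # the entire match, not the whole input string
first_group = m.group(1)   # the ([^?]+) capture group
print(whole_match == first_group, first_group)
```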
@Ben Doom: If I'm not mistaken, the #url# variable is a complex object and cannot be treated as a string or list. The way I go about getting everything before the query string is:
<cfset myURL = "http://" & #cgi.HTTP_HOST# & #cgi.SCRIPT_NAME# />