Regexp to search and replace with transformed result - regex

In our project we had translations called like
Resources.Blabla.MooFoo.GetString("I am a whatever string!");
It was using Resgen, and now we want to use "standard" way for it.
We parsed out the source text files(removing special chars from keys) into resx files, and now want to search-replace whole project to change every call to
Resources.Translate("Iamawhateverstring");
The point here is that aside from replacing the call signature, which is not a problem, I need to parse out symbols like spaces, dots, etc. from parameter so that
"I am a whatever string!"
turns into
"Iamawhateverstring"
How can I do it ?

Regex for spaces replacement:
(?<=(GetString\(")[A-Za-z0-9 ]+) (?=(.*?("\)){1}))
(?<=(GetString\(")[A-Za-z0-9 ]+) looks before space character for GetString("[a-Z0-9] including space if there is one or more space occurance in string
(?=(.*?("\)){1})) looks after space character for ")

I would replace it using this regex: DEMO
(Resources.Blabla.MooFoo.GetString)(\(".*"\);)
then when you get the capture groups:
Replace capture group 1 with "Resources.Translate"
Replace capture group 2 using captureGroup2.replace(/\s/g, '')
So essentially it is a two step process.

Related

Regex Find Spaces between single qoutes and replace with underscore

I have a database table that I have exported. I need to replace the image file name with a space and would like to use notepad++ and regex to do so. I have:
'data/green tea powder.jpg'
'data/prod_img/lumina herbal shampoo.JPG'
'data/ALL GREEN HERBS.jpeg'
'data/prod_img/PSORIASIS KIT (640x530) (2).jpg'
and need to make them look like this:
'data/green_tea_powder.jpg'
'data/prod_img/lumina_herbal_shampoo.JPG'
'data/ALL_GREEN_HERBS.jpeg'
'data/prod_img/PSORIASIS_KIT_(640x530)_(2).jpg'
I just want to change the spaces between the quotes (I don't want to change the capitalization). To be more specific I would like to replace any and all spaces between 'data/ and ' because there are other spaces between quotes in the DB, for example:
'data/ REPLACE ANY SPACE HERE '
I found this:
\s(?!(?:[^']*'[^']*')*[^']*$)
but there are other places where there are spaces between quotes so I'd like to search for data/ in the beging and not just a single quote but I can't figure out how. I tried \s(?!(?:[^'data\/]*'[^']*')*[^']*$) but it didn't work and I am not familiar enough with regex to make it do so.
An example of a full line from the database is:
(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),
I don't want to replace the spaces in the time and data stamp at the end of the line, just the image file names.
Thanks in advance for your help!
You have to use a \G based pattern to ensure that matches are contiguous.
search: (?:\G(?!^)|'data/)[^' ]*\K[ ]replace: _
The first match uses the second branch of the alternation, then the next matches are contiguous and use the first branch.

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is the User row can also contain a |, to make it even worse.
I can use a macro to fix this, but the file is a few millions lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open for suggestions on how to do this with another program, command line utility, either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Repalce $1
Basically, Anchor to the end of the line, and remove columns [-1] and [-2] (I assume columns can't be empty. Replace + with * if they can)
If you need finer detail then that, I'd recommend writing a Java or Python script to manual parse and rewrite the file for you.
I've captured three groups and given them names. If you use a replace utility like sed or vimregex, you can replace remove with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
Demo
From Notepad++ hit ctrl + h then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex Explain:
\|\d+ : match 1st string that starts with | followed by number
\|\d+ : match 2nd string that starts with | followed by number
(\|[0-9a-z]+): match and capture the string after the 2nd number.
$ : This is will force regex search to match the end of the string.
Replacement:
$1 : replace the found string with whatever we have between the captured group which is whatever we have between the parentheses (\|[0-9a-z]+)

VIM - Replace based on a search regex

I've got a file with several (1000+) records like :
lbc3.*'
ssa2.*'
lie1.*'
sld0.*'
ssdasd.*'
I can find them all by :
/s[w|l].*[0-9].*$
What i want to do is to replace the final part of each pattern found with \.*'
I can't do :%s//s[w|l].*[0-9].*$/\\\\\.\*' because it'll replace all the string, and what i need is only replace the end of it from
.'
to
\.'
So the file output is llike :
lbc3\\.*'
ssa2\\.*'
lie1\\.*'
sld0\\.*'
ssdasd\\.*'
Thanks.
In general, the solution is to use a capture. Put \(...\) around the part of the regex that matches what you want to keep, and use \1 to include whatever matched that part of the regex in the replacement string:
s/\(s[w|l].*[0-9].*\)\.\*'$/\1\\.*'/
Since you're really just inserting a backslash between two strings that you aren't changing, you could use a second set of parens and \2 for the second one:
s/\(s[w|l].*[0-9].*\)\(\.\*'\)$/\1\\\2/
Alternatively, you could use \zs and \ze to delimit just the part of the string you want to replace:
s/s[w|l].*p0-9].*\zs\ze\*\'$/\\/

Replacing part of delimited string with R's regex

I have the following list of strings:
name <- c("hsa-miR-555p","hsa-miR-519b-3p","hsa-let-7a")
What I want to do is for each of the above strings
replace the text after second delimiter (-) with "zzz".
Yielding:
hsa-miR-zzz
hsa-miR-zzz
hsa-let-zzz
What's the way to do it?
Might as well use something like:
gsub("^((?:[^-]*-){2}).*", "\\1zzz", name)
(?:[^-]*-) is a non-capturing group which consists of several non-dash characters followed by a single dash character and the {2} just after means this group occurs twice only. Then, match everything else for the replacement. Note I used an anchor just in case to avoid unintended substitutions.
Perhaps something like this:
> gsub("([A-Za-z]+-)([A-Za-z]+-)(.*)", "\\1\\2zzz", name)
[1] "hsa-miR-zzz" "hsa-miR-zzz" "hsa-let-zzz"
There are actually several ways to approach this, depending on how "regular" your expressions actually are. For example, do they all start with "hsa-"? What are the options for the "middle" group? Might there be more than three dashes?

How to distinguish between saved segment and alternative?

From the following text...
Acme Inc.<SPACE>12345<SPACE or TAB>bla bla<CRLF>
... I need to extract company name + zip code + rest of the line.
Since either a TAB or a SPACE character can separate the second from the third tokens, I tried using the following regex:
FIND:^(.+) (\d{5})(\t| )(.+)$
REPLACE:\1\t\2\t\3
However, the contents of the alternative part is put in the \3 part, so the result is this:
Acme Inc.<TAB>12345<TAB><TAB or SPACE here>$
How can I tell the (Perl) regex engine that (\t| ) is an alternative instead of a token to be saved in RAM?
Thank you.
You want:
^(.+?) (\d{5})[\t ](.+)$
Since you are matching one character or the other, you can use a character class instead. Also, I made your first quantifier non-greedy (+? instead of +) to reduce the amount of backtracking the engine has to do to find the match.
In general, if you want to make capture groups not capture anything, you can add ?: to it, like:
^(.+?) (\d{5})(?:\t| )(.+)$
Use non-capturing parentheses:
^(.+) (\d{5})(?:\t| )(.+)$
One way is to use \s instead of ( |\t) which will match any whitespace char.
See Backslash-sequences for how Perl defines "whitespace".