Regex np++ Change different instances of "this" to different word - regex

I have several of text files that contain the something like the following on many different lines:
this_is_THIS.doc
What I need to accomplish is to replace THIS with different objects for the first 5 occurrences and disregard the rest.
I would like for it to appear like the following:
this_is_TREE.doc
this_is_CAR.doc
this_is_CAT.doc
this_is_DONKEY.doc
this_is_ROCK.doc
I will have to do this many times in the future with the words changing so I feel a regex that I can alter in the future would help me a lot. I have searched but found nothing useful. Thanks for any help, you folks are great here.

As long as you want to replace just 5 instances of THIS, I think the following solution is manageable. For this particular case, you can replace:
^(.+?)THIS(.+?)THIS(.+?)THIS(.+?)THIS(.+?)THIS
With
$1TREE$2CAR$3CAT$4DONKEY$5ROCK
Change the above texts like CAT, CAR as per your requirements.
Click for Demo
Before Replacing:
Don't forget to check . matches newline and Match case settings as shown below.
After Replacing:
Note: Even I wouldn't recommend this method if you need to replace say 100 instances of THIS. The regex is going to be too long in that case.

Related

Notepad++ / RegEx - Find & replace multiple parts of a line while ignoring any words between

I appreciate what I'm asking may be very simple to more experienced folks. I've spent several hours trying to get my head around RegEx and have gotten close to what I need, but as this is something I'm trying to achieve for a hobby project (RegEx is not something I require in my day job) I'm hoping some of you may be able to help me out.
In short, I have a very large file with tens of thousands of lines of code that I am converting to be readable by another program. All I need to accomplish this is to change some formatting.
I need to find every instance where the tag "{#graphic examplename}" is used, and change it so that only "examplename" remains in square [[ ]] brackets.
Examples of how the tags currently appear (example names can be either single words or multiple):
"{#graphic example1}",
"{#graphic example2}",
"{#graphic example3 with multiple words}"
What I want them to look like when done, replacing the { with [[, removing #graphic, and replacing } with ]].
"[[example1]]",
"[[example2]]",
"[[example3 with multiple words]]"
It's easy enough to do a simple find-and-replace to replace "{#graphic " with "[[", as the #graphic tag is something I want to remove universally however the issue I'm running into is that I can't replicate that with the "}" at the end, because I can't find a way to specify that I only want to replace examples of "}" that come after an instance of "{#graphic " while leaving any other words (the examplename) intact.
Any assistance gratefully received - if the above needs any elaboration please don't hesitate to ask, I understand I may be putting this in amateurish terms.
Regards,
K
Often programs have a way of capturing groups and referencing them later, often with $
so find {#graphic ([^}]+)}
replace [[$1]]
Captures what is inside the () and makes it available in the replace as $1, i.e. the first "capture group".
Regex 101 is an excellent resource for trying these things out:
https://regex101.com/r/r5OX8I/1

Matching within matches by extending an existing Regex

I'm trying to see if its possible to extend an existing arbitrary regex by prepending or appending another regex to match within matches.
Take the following example:
The original regex is cat|car|bat so matching output is
cat
car
bat
I want to add to this regex and output only matches that start with 'ca',
cat
car
I specifically don't want to interpret a whole regex, which could be quite a long operation and then change its internal content to match produce the output as in:
^ca[tr]
or run the original regex and then the second one over the results. I'm taking the original regex as an argument in python but want to 'prefilter' the matches by adding the additional code.
This is probably a slight abuse of regex, but I'm still interested if it's possible. I have tried what I know of subgroups and the following examples but they're not giving me what I need.
Things I've tried:
^ca(cat|car|bat)
(?<=ca(cat|car|bat))
(?<=^ca(cat|car|bat))
It may not be possible but I'm interested in what any regex gurus think. I'm also interested if there is some way of doing this positionally if the length of the initial output is known.
A slightly more realistic example of the inital query might be [a-z]{4} but if I create (?<=^ca([a-z]{4})) it matches against 6 letter strings starting with ca, not 4 letter.
Thanks for any solutions and/or opinions on it.
EDIT: See solution including #Nick's contribution below. The tool I was testing this with (exrex) seems to have a slight bug that, following the examples given, would create matches 6 characters long.
You were not far off with what you tried, only you don't need a lookbehind, but rather a lookahead assertion, and a parenthesis was misplaced. The right thing is: Put the original pattern in parentheses, and prepend (?=ca):
(?=ca)(cat|car|bat)
(?=ca)([a-z]{4})
In the second example (without | alternative), the parentheses around the original pattern wouldn't be required.
Ok, thanks to #Armali I've come to the conclusion that (?=ca)(^[a-z]{4}$) works (see https://regexr.com/3f4vo). However, I'm trying this with the great exrex tool to attempt to produce matching strings, and it's producing matches that are 6 characters long rather than 4. This may be a limitation of exrex rather than the regex, which seems to work in other cases.
See #Nick's comment.
I've also raised an issue on the exrex GitHub for this.

Extract only the text field needed

I am at the beginning of learning Regex, and I use every opportunity to understand how it's working. Currently I am trying to extract dates from a text file (which is in fact a vnt-file type from my mobile phone). It looks like following:
BEGIN:VNOTE
VERSION:1.1
BODY;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:18.07.=0A14.08.=0A15.09.=0A15.10.=
=0A13.11.=0A13.12.=0A12.01.=0A03.02. Grippe=0A06.03.=0A04.04.2015=0A0=
5.05.2015=0A03.06.2015=0A03.07.2015=0A02.08.2015=0A30.08.2015=0A28.09=
17.11.2017=0A
DCREATED:20171118T095601
X-IRMC-LUID:150
END:VNOTE
I want to extract all dates, so that the final list is like that:
18.07.
14.08.
15.09.
15.10.
and so on. If the date has also a year, it should also be displayed.
I almost found out how to detect the dates by the following regex:
.+(\d\d\.\d\d\.(2015|2016|2017)?).+
But it only detect very few of the dates. The result is this:
BEGIN:VNOTE
VERSION:1.1
15.10.
04.04.2015
30.08.2015
24.01.2016
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE
Then I tried to add a question mark which makes the .+ not greedy, as far as I read in tutorials. Then the regex looks like:
.+?(\d\d\.\d\d\.(2015|2016|2017)?).+?
But the result is still not what I am looking for:
BEGIN:VNOTE
VERSION:1.1
21.03.20.04.18.05.18.06.18.07.14.08.15.09.15.10.
13.11.13.12.12.01.03.02.06.03.04.04.20150A0=
03.06.201503.07.201502.08.201530.08.20150A28.09=
28.10.201525.11.201528.12.201524.01.20160A
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE
For someone who is familiar with regex I am pretty sure this is very easy to solve, but I don't get it. It's very confusing when you are new to regex. I tried to find a hint in some tutorials or stackoverflow posts, but all I found is this: Notepad++ how to extract only the text field which is needed?
But it doesn't work for me. I assume it might have something to do with the fact that my text file is not one single line.
I have my example on regex101 too.
I would be very thankful if maybe someone can give me a hint what else I can try.
Edit: I would like to detect the dates with the regex and as a result have a list with only the dates (maybe it is called substitute?)
Edit 2: Sorry for not mentioning it earlier: I just want to use the regex in e.g. Notepad++ or an online regex test website. Just to get the result of the dates and save the result in a new txt-file. I don't want to use the regex in an programming language. My apologies for not being precisely before.
Edit 3: The result should be a list with the dates, and each date in a new line:
I want to extract all dates, so that the final list is like that:
18.07.
14.08.
15.09.
15.10.
I suggest this pattern:
(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)
This makes use of the \G flag that, in this case, allows for multiple matches from the very start of the match without letting any single unmatched character in the text, thus allowing the removal of all but what's wanted.
If you want to remove the extra matches as well, add |.* at the end:
(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)|.*
regex101 demo
In N++, make sure the options underlined are selected, and that the cursor is at the beginning. In the picture below, I replaced then undid the replacement, only to show that matches were identified (16 replacements).
You can try using the following pattern:
\d{2}\.\d{2}\.(?:\d{4})?
This will match day.month dates of the form 18.07., but it also allows such a date to be followed by a four digit year, e.g. 18.07.2017. While it would be nice to make the pattern more restrictive, to avoid false fire matches, I do not see anything obvious which can be added to the above pattern. Follow the demo link below to see the pattern in action.
Demo

How do I join two regular expressions into one in Notepad++?

I've been searching a lot in the web and in here but I can't find a solution to this.
I have to make two replacements in all registry paths saved in a text file as follows:
replace all asterisc with: [#42]
replace all single backslashes with two.
I already have two expressions that do this right:
1st case:
Find: (\*) - Replace: \[#42\]
2nd case:
Find: ([^\\])(\\)([^\\]) - Replace: $1$2\\$3
Now, all I want is to join them together into just one expression so that I can do run this in one time only.
I'm using Notepad++ 6.5.1 in Windows 7 (64 bits).
Example line in which I want this to work (I include backslashes but i don't know if they will appear right in the html):
HKLM\SOFTWARE\Classes\*\shellex\ContextMenuHandlers\
I already tried separating it with a pipe, like I do in Jscript (WSH), but it doesn't work here. I also tried a lot of other things but none worked.
Any help?
Thanks!
Edit: I have put all the backslashes right, but the page html seem to be "eating" some of them!
Edit2: Someone reedited my text to include an accent that doesn't remove the backslashes, so the expressions went wrong again. But I got it and fixed it. ;-)
Sorry, but this was my first post here. :)
As everyone else already mentioned this is not possible.
But, you can achieve what you want in Notepad++ by using a Macro.
Go to "Macro" > "Start Recording" menu, apply those two search and replace regular expressions, press "Stop Recording", then "Save Current Recorded Macro", there give it a name, assign a shortcut, and you are done. You now can reuse the same replacements whenever you want with one shortcut.
Since your replacement strings are totally different and use data that come not from any capture (i.e. [#42]), you can't.
Keep in mind that replacement strings are only masks, and can not contain any conditional content.

Regular Expression matching anything after a word

I am looking to find anything that matches this pattern, the beginning word will be:
organism aogikgoi egopetkgeopt foprkgeroptk 13
So anything that starts with organism needs to be found using regex.
^organism will match anything starting with "organism".
^organism(.*) will also capture everything that follows, into the variable that contains the first match (which varies according to language -- in Perl it's $1).
Also just wanna add for others newbies like me and their various circumstances, you can do it in various ways depending on your text and what you are tryna do.
Like here's an Example where I wanna delete everything after ?spam so I could use .?spm.+ or .?spm.+ or any other ways as long you are creative about it lol.
This might come in handy, here's a Link | Link where you can find some basic necessary regex and their meanings.