Using Regex in Word to Mass Modify Entries - regex

How can I find the expression '([anumber][anumber],' in Word?
I have [0-9][0-9], but this pattern occurs several times per line, so replacing it removes every occurrence. How do I either match ([0-9][0-9], including the left parenthesis, or remove [0-9][0-9], only for its first occurrence on each line?

Let's clarify your problem first:
You have something like (0123, (4567, etc., and
you want to make them (23, (67, respectively.
In the Replace dialog (with Use wildcards enabled),
put ([(])[0-9][0-9] in the Find what box, and
put \1 in the Replace with box.
Putting a literal ( in the Replace with box works just as well, but \1 is the more flexible option.
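For anyone who wants to check the substitution outside Word, here is a minimal sketch in Python. It uses Python's re module rather than Word's wildcard engine, and the function name trim_leading_digits is only an illustrative assumption.

```python
import re

def trim_leading_digits(text: str) -> str:
    # Match "(" followed by two digits and keep only the captured "(".
    # Two-digit runs that are not preceded by "(" are left alone.
    return re.sub(r"([(])[0-9][0-9]", r"\1", text)

if __name__ == "__main__":
    print(trim_leading_digits("(0123, (4567, 89,"))  # (23, (67, 89,
```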

Related

How to use regex to look around a complex pattern?

I have the following html element in Sublime Text:
<div class="exg"><div><strong class="syn">investigate</strong><span class="syn">, conduct investigations into, make inquiries into, inquire into, probe, examine, explore, research, study, look into, go into</span></div>
I want to use regex to select the content after and including the 5th comma in this element, stopping before
</span></div>.
So, in this case I'd want to select:
, examine, explore, research, study, look into, go into
So far, I was able to write this regex, which works:
(<div class="exg"><div><strong class="syn">(\w+)((\s)?(\w+)?)+</strong><span class="syn">((\,((\s)?(\w+)?)+)?){5})
This allows me to select the part before what I need to select. I tried to use this with a positive lookbehind, but it isn't working and I can't figure out how to fix it. Here is what I tried:
(?<=(<div class="exg"><div><strong class="syn">(\w+)((\s)?(\w+)?)+</strong><span class="syn">((\,((\s)?(\w+)?)+)?){3}))((\,?((\s)?(\w+)?)+?)+)
You make heavy use of parentheses, and your expression for catching the words between commas could be simpler. If you replace your groups with non-capturing ones, you'll get the expected match in the first (and only) capturing group with this regex:
(?<=<div class="exg"><div><strong class="syn">)(?:\s?\w)*<\/strong><span class="syn">(?:,(?:\s?\w)*){4}(.*?)(?=<\/span><\/div>)
BTW, if you want the match to start at the 5th comma, I think your quantifier should be {4} (but I might have misunderstood).
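As a rough check of that pattern, here is a sketch in Python's re module (not Sublime Text's engine). The lookbehind is a fixed-length literal, so Python accepts it. The names HTML, TAIL and tail_from_fifth_comma are assumptions made for this example.

```python
import re

# The sample element from the question, split over several string literals.
HTML = ('<div class="exg"><div><strong class="syn">investigate</strong>'
        '<span class="syn">, conduct investigations into, make inquiries into, '
        'inquire into, probe, examine, explore, research, study, look into, '
        'go into</span></div>')

# Same structure as the regex above; "/" needs no escaping in Python.
TAIL = re.compile(
    r'(?<=<div class="exg"><div><strong class="syn">)(?:\s?\w)*</strong>'
    r'<span class="syn">(?:,(?:\s?\w)*){4}(.*?)(?=</span></div>)'
)

def tail_from_fifth_comma(html):
    # Group 1 holds everything from the 5th comma up to </span></div>.
    match = TAIL.search(html)
    return match.group(1) if match else None

if __name__ == "__main__":
    print(tail_from_fifth_comma(HTML))
```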
Update:
If you're looking to delete the matched group (i.e. replace it with an empty string), just do the opposite: build one group before it and one after it:
(<div class="exg"><div><strong class="syn">(?:\s?\w)*<\/strong><span class="syn">(?:,(?:\s?\w)*){4}).*?(<\/span><\/div>)
Then replace in your editor with \1\2 (the two groups one after the other, without the previously matched string in between).
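The two-group replacement can be sketched the same way in Python. Again this uses Python's re module rather than the editor's engine, and HTML, KEEP and drop_tail are hypothetical names for the example.

```python
import re

HTML = ('<div class="exg"><div><strong class="syn">investigate</strong>'
        '<span class="syn">, conduct investigations into, make inquiries into, '
        'inquire into, probe, examine, explore, research, study, look into, '
        'go into</span></div>')

# Group 1: everything up to and including the 4th synonym.
# Group 2: the closing tags. The text between them is matched but not kept.
KEEP = re.compile(
    r'(<div class="exg"><div><strong class="syn">(?:\s?\w)*</strong>'
    r'<span class="syn">(?:,(?:\s?\w)*){4}).*?(</span></div>)'
)

def drop_tail(html: str) -> str:
    # Re-emit the two groups back to back, discarding what lay between them.
    return KEEP.sub(r"\1\2", html)

if __name__ == "__main__":
    print(drop_tail(HTML))
```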

Limit g flag in regex substitution to part of line?

I have a file like this:
"File_name_1.dat" "File_name_1.dat"
"File_name_2.dat" "File_name_2.dat"
"Some_other_thing.dat" "Some_other_thing.dat"
Is there a regex technique I can use to replace the underscores in only the second file name on each line, like this?
"File_name_1.dat" "File name 1.dat"
"File_name_2.dat" "File name 2.dat"
"Some_other_thing.dat" "Some other thing.dat"
I tried matching the column (\%XXc in Vim), but it seems to disable the g flag.
This only replaces the first underscore after column 25:
:%s/\%25c\([^_]*\)\zs_/ /g
This only replaces the last underscore in the line:
:%s/\%25c\(.*\)\zs_/ /g
I know I could repeat that command until they're gone, but I was wondering if there is a slicker way to do it.
Yes, there is an easy way to do this with visual selections. It's convenient that your data is laid out so neatly; otherwise this wouldn't work.
Visually select all of the second filenames
Run this regex:
:'<,'>s/\%V_/ /g
The \%V restricts the substitution to the inside of the current visual selection.
There are probably many ways to do this. Since the data is formatted nicely, I would probably visually select and delete the first column (with <c-v>), then run :%s/_/ /g, then paste the first column back.
If you really wanted to do this in a single regex, you would need to use a lookbehind:
:%s/\(\%25c.\{-}\)\@<=_/ /g
Here \@<= matches only if the preceding atom matches just before the current position. See :help \@<=
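Outside Vim, the same result can be had without a lookbehind. The sketch below is in Python and deliberately swaps in a different technique: it captures the two quoted names and rewrites only the second one in a replacement callback. PAIR and fix_second_name are hypothetical names, and the two-quoted-fields-per-line layout is assumed from the sample data.

```python
import re

# Group 1: first quoted name plus the whitespace after it.
# Group 2: second quoted name.
PAIR = re.compile(r'^("[^"]*"\s+)("[^"]*")$')

def fix_second_name(line: str) -> str:
    def repl(match):
        # Leave group 1 untouched; swap underscores for spaces in group 2 only.
        return match.group(1) + match.group(2).replace("_", " ")
    return PAIR.sub(repl, line)

if __name__ == "__main__":
    for line in ['"File_name_1.dat" "File_name_1.dat"',
                 '"Some_other_thing.dat" "Some_other_thing.dat"']:
        print(fix_second_name(line))
```

Lines that do not fit the assumed layout are returned unchanged.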

Going through lines and doing deletion of a text on particular criteria

Is it possible to delete everything after first word in a line using Notepad++? I will need this to go through each line.
Also I would like to know if it is possible to delete only first word in a line, leaving everything behind intact.
You can do this with a regular expression replacement. In "Find what" you put
^(\S+)(.*)$
and in Replace with you put
$1
This assumes the first word is followed by a space, a tab or similar whitespace.
For the second task, put
$2
in "Replace with" instead. Note that this leaves the separator after the first word (the space or tab) in place, if that's what you want.
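Here is a small Python sketch of both replacements, for trying the pattern out on sample text. It uses Python's re module, where the replacement groups are written \1 and \2 instead of $1 and $2; FIRST_WORD, keep_first_word and drop_first_word are names made up for the example.

```python
import re

# MULTILINE makes ^ and $ anchor at every line, as Notepad++ does.
FIRST_WORD = re.compile(r"^(\S+)(.*)$", re.MULTILINE)

def keep_first_word(text: str) -> str:
    # Task 1: delete everything after the first word on each line.
    return FIRST_WORD.sub(r"\1", text)

def drop_first_word(text: str) -> str:
    # Task 2: delete only the first word; the separator that followed it stays.
    return FIRST_WORD.sub(r"\2", text)

if __name__ == "__main__":
    sample = "alpha beta gamma\none two"
    print(keep_first_word(sample))
    print(drop_first_word(sample))
```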

Remove text appearing after numbers in Notepad++ using regular expressions

I have a large text file which contains many timestamps. The timestamps look like this: 2013/11/14 06:52:38AM. I need to remove the last two characters (am/pm/AM/PM) from each of these. The problem is that a simple find and replace of "AM" may remove text from other parts of the file (which contains a lot of other text).
I have done a find using the regular expression (:\d\d[ap]m), which in the above example would track down the last bit of the timestamp: :38AM. I now need to replace this with :38, but I don't know how this is done (allowing for any combination of two digits after the colon).
Any help would be much appreciated.
EDIT: What I needed was to replace (:\d\d)[ap]m with \1
Change (:\d\d[ap]m) to (:\d\d)[ap]m so that only the part you want to keep is captured, then use $1 as the replacement (Notepad++ accepts \1 as well).
Go to Search > Replace menu (shortcut CTRL+H) and do the following:
Find what:
[0-9]{2}\K[AP]M
Replace:
[leave empty]
Select radio button "Regular Expression"
Then press Replace All
You can test it at regex101.
Note: [0-9] is often preferable to \d, which in some engines also matches non-ASCII digits, and using \K removes the need for a capture group such as $1. Neither point matters much in your case, but both are good to know :)
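Python's built-in re module has no \K, so a sketch there needs a fixed-width lookbehind instead. The version below is an approximation of the Notepad++ pattern, not the same expression; it also requires the colon before the two digits and ignores case so that am/pm are covered. MERIDIEM and strip_meridiem are hypothetical names.

```python
import re

# Match AM/PM (any case) only when it directly follows ":dd".
MERIDIEM = re.compile(r"(?<=:[0-9]{2})[AP]M", re.IGNORECASE)

def strip_meridiem(text: str) -> str:
    # The lookbehind is not part of the match, so only the AM/PM is removed.
    return MERIDIEM.sub("", text)

if __name__ == "__main__":
    print(strip_meridiem("2013/11/14 06:52:38AM AMPLE 07:01:09pm"))
    # 2013/11/14 06:52:38 AMPLE 07:01:09
```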

Remove everything before and after variable=int

I'm terrible at regex and need to remove everything from a large portion of text except for a certain variable declaration that occurs numerous times. I'd like to remove everything except for instances of mc_gross=anyint.
Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (not sure about notepad++, and even then probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using notepad++, I'd recommend selecting your target, then inverting the selection.
This will select each instance, allowing for optional white space either side of the '=' sign.
mc_gross\s*=\s*\d+
The following answer over on Super User explains how to use bookmarks in Notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex they're using over there, with the one above.
You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained. In the search the ^.*\b matches from start-of-line to a word boundary before the wanted text; the \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13 then delete the \bs from the search.
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
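If the text can be processed outside Notepad++, a different route is to collect the wanted matches instead of deleting everything else. The Python sketch below does that with the whitespace-tolerant pattern and word boundaries discussed in the answers above; MC_GROSS and extract_mc_gross are names assumed for the example.

```python
import re

# Optional whitespace around "=", with word boundaries on both sides.
MC_GROSS = re.compile(r"\bmc_gross\s*=\s*\d+\b")

def extract_mc_gross(text: str) -> list:
    # Return every occurrence in order; everything else is simply not kept.
    return MC_GROSS.findall(text)

if __name__ == "__main__":
    sample = "foo=1&mc_gross=13&bar mc_gross = 250 xmc_gross=9"
    print("\n".join(extract_mc_gross(sample)))
```

Unlike the line-based replace, this keeps every occurrence even when a line contains several.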