How do i select only specific words? - regex

With the string ant apple bat I want to select only ant and bat without selecting apple or any white space.
I've tried a few different things like;
\bant\b\bbat\b
or
ant\bbbat\b
or
(\b(ant)\b.*\b(bat)\b)
or
ant(?:.*)bat
or
ant(?:\sapple\s)bat
or
(ant)*[^a-z\s]*(bat)
I know the above examples are stupid but i'm just messing around on a regex tester trying to figure it out. I did regex before and I figured it out but can't seem to remember how i did it. Seems pretty simple just to select "ant" and "bat" but obviously not. Is there some way to deselect a match or to remove part of the match?

Use a non capturing group with alternation:
(?:\bant\b|\bbat\b)

I guess you've understood it wrong. A single regex match will match only contiguous text. A simple regex like ant|bat with either match ant or bat. But if you want to match ant and bat ignoring the text in between would not work.
When using regex with any programming languages like python or java, you would loop through the multiple matches and perform the action accordingly.

If you just want two different words, that looks like this:
\b(ant|bat)\b
The \b is a word boundary
The parenthesis is a group
The pipe | is an or statement
a word boundary
followed by either the word ant or the word bat
ending in a word boundary
A word boundary is probably what you would expect it to be, whitespace, end of a line, punctuation

Related

How to capture the word between is after certain text after end with some text in regex?

I would like to find something like this:
-(IBOutlet)UIView *aView;
I would like to find aView, something that I can confirm is -(IBOutlet) must be a prefix, but it comes with not ensure a space or another string, after that, we need to string that must begin with '*', until it match the ;.
So, my regex look like that:
(IBOutlet)*\*?;
For sure, it can't capture what I want. Any advise?
You just have to build it up incrementally. The best reference that I have found (by far) is http://www.regular-expressions.info. After learning the basics, you can then use one of many online pattern matching tools, here is one:
https://regex101.com
With that, your goal is easily determined (with some allowances for free space):
^\s*-\s*\(IBOutlet\)(\w*)\s*(\*\w*)
First problem: you don't have a capturing group so how do you get aView back after the match?
Second, the \*? means "match the * character literally, 0 or 1 times", which I guess isn't what you want either.
Try this pattern:
(IBOutlet)*\*(.+);
RegEx 101 can explain what each component means.

Remove everything before and after variable=int

I'm terrible at regex and need to remove everything from a large portion of text except for a certain variable declaration that occurs numerous times, id like to remove everything except for instances of mc_gross=anyint.
Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (not sure about notepad++, and even then probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using notepad++, I'd recommend selecting your target, then inverting the selection.
This will select each instance, allowing for optional white space either side of the '=' sign.
mc_gross\s*=\s*\d+
The following answer over on super user explains how to use bookmarks in notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex they're using over there, with the one above.
You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained. In the search the ^.*\b matches from start-of-line to a word boundary before the wanted text; the \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13 then delete the \bs from the search.
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.

Notepad++ Regex: Find all 1 and 2 letter words

I’m working with a text file with 200.000+ lines in Notepad++. Each line has only one word. I need to strip out and remove all words which only contains one letter (e.g.: I) and words which contains only two letters (e.g.: as).
I thought I could just pas in regular regex like this [a-zA-Z]{1,2} but I does not recognize anything (I’m trying to Mark them).
I’ve done manual search and I know that there do exists words of that length so therefor can it only be my regex code that’s wrong. Anyone knows how to do this in Notepad++ ???
Cheers,
- Mestika
If you want to remove only the words but leave the lines empty, this works:
^[a-zA-Z]{1,2}$
Replace this with an empty string. ^ and $ are anchors for the beginning and the end of a line (because Notepad++'s regexes work in multi-line mode).
If you want to remove the lines completely, search for this:
^[a-zA-Z]{1,2}\r\n
And replace with an empty string. However, this won't work before Notepad++ 6, so make sure yours is up-to-date.
Note that you will have to replace \r\n with the specific line-endings of your file!
As Tim Pietzker suggested, a platform independent solution that also removes empty lines would be:
^[a-zA-Z]{1,2}[\r\n]+
A platform-independent solution that does not remove empty lines but only those with one or two letters would be:
^[a-zA-Z]{1,2}(\r\n?|\n)
I don't use Notepad++ but my guess is it could be because you have too many matches - try including word boundaries (your exp will match every set of 2 letters)
\b[a-zA-Z]{1,2}\b
The regex you specified should find 1-or-2 characters (even in Notepad++'s Find-dialog), but not in the way you'd think. You want to have the regex make sure it starts at the beginning of the line and ends at the end with ^ and $, respecitevely:
^[a-zA-Z]{1,2}$
Notepad++ version 6.0 introduced the PCRE engine, so if this doesn't work in your current version try updating to the most recent.
You seem to use the version of Notepad++ that doesn't support explicit quantifiers: that's why there's no match at all (as { and } are treated as literals, not special symbols).
The solution is to use their somewhat more lengthy replacement:
\w\w?
... but that's only part of the story, as this regex will match any symbol, and not just short words. To do that, you need something like this:
^\w\w?$

Replacing char in a String with Regular Expression

I got a string like this:
PREFIX-('STRING WITH SPACES TO REPLACE')
and i need this:
PREFIX-('STRING_WITH_SPACES_TO_REPLACE')
I'm using Notepad++ for the Regex Search and Replace, but i'm shure every other Editor capable of regex replacements can do it to.
I'm using:
PREFIX-\('(.*)(\s)(.*)'\)
for search and
PREFIX-('\1_\3')
for replace
but that replaces only one space from the string.
The regex search feature in Notepad++ is very, very weak. The only way I can see to do this in NPP is to manually select the part of the text you want to work on, then do a standard find/replace with the In selection box checked.
Alternatively, you can run the document through an external script, or you can get a better editor. EditPad Pro has the best regex support I've ever seen in an editor. It's not free, but it's worth paying for. In EPP all I had to do was this:
search: ((?:PREFIX-\('|\G)[^\s']+)\s+
replace: $1_
EDIT: \G matches the position where the previous match ended, or the beginning of the input if there was no previous match. In other words, the first time you apply the regex, \G acts like \A. You can prevent that by adding a negative lookahead, like so:
((?:PREFIX-\('|(?!\A)\G)[^\s']+)\s+
If you want to prevent a match at the very beginning of the text no matter what it starts with, you can move the lookahead outside the group:
(?!\A)((?:PREFIX-\('|\G)[^\s']+)\s+
And, just in case you were wondering, a lookbehind will work just as well as a lookahead:
((?:PREFIX-\('|(?<!\A)\G)[^\s']+)\s+
You have to keep matching from the beggining of the string untill you can match no more.
find /(PREFIX-\('[^\s']*)\s([^']*'\))/
replace $1_$2
like: while (/(PREFIX-\('[^\s']*)\s([^']*'\))/$1_$2/) {}
How about using Replace all for about 20 times? Or until you're sure no string contains more spaces
Due to nature of regex, it's not possible to do this in one step by normal regular expression.
But if I be in your place, I do such replaces in several steps:
find such patterns and mark them with special character
(Like replacing STRING WITH SPACES TO REPLACE with #STRING WITH SPACES TO REPLACE#
Replace #([^#\s]*)\s to #\1_ server times.
Remove markers!
I studied a little the regex tool in Notepad++ because I didn't know their possibilities.
I conclude that they aren't powerful enough to do what you want.
Your are obliged to learn and use a programming language having a real regex capability. There are a number of them. Personnaly, I use Python. It would take 1 mn to do what you want with it
You'd have to run the replace several times for each space but this regex will work
/(?<=PREFIX-\(')([^\s]+)\s+/g
Replace with
\1_ or $1_
See it working at http://refiddle.com/10z

Notepad++ Find/Replace Regex Help

I am having issues doing a string replacement in Notepad++, and need some help.
My file:
LastName,(tab)FirstName[optional]MiddleName
Some times there is data that has a middle name, sometimes not.
Public,JohnQ.
Doe,John
Clinton,WilliamJefferson
would be:
Public(tab)John(tab)Q
Doe(tab)John
Clinton(tab)William(tab)Jefferson
I want to split it out into this:
LastName(tab)FirstName(tab)MiddleName
Thanks for adding the sample input. It helps immensely to have that around. Try this and see if it does what you want.
Find, making sure Match case is checked:
([A-Z][a-z]*),([A-Z][a-z]*)(.*)
Replace with:
\1(tab)\2(tab)\3
Of course, (tab) is actually a tab character that you have to place in the replacement string yourself.
An ugly regex like this works for me on the example you've provided:
(\w+),(\w+?)(([A-Z]\w*\.?)?)\n
replace with
\1\t\2\t\3\n
Note:
This only works if the middle name starts with a letter in the A-Z. You might be able to replace [A-Z] with [[:upper:]] if notepad++ supports it (I don't know).
I need that second bracket around the middle name part because I need to match at least an empty string when there is no middle name.