How to delete words containing a specific character in RegEx?

Using Notepad++'s find and replace feature using regular expressions, I want to get rid of every word with the number symbol (#) attached to it, particularly in the beginning of the word in my case.
For example, how do I make:
The #7kfe dog likes #9kea to eat pizza
into:
The dog likes to eat pizza
Any help would be greatly appreciated. Thank you.

Most editors that support find-and-replace with regular expressions work similarly. In the 'find' field, search for #\w* and leave the 'replace' field empty. This leaves double spaces behind (the space that was before your word and the space that was after it). You can either tweak the expression to something like #\w* ? (the space is optional, in case the word in question is the last word of the line), or do a second search-and-replace that collapses multiple spaces into one.
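To illustrate the two options, here is a minimal sketch in JavaScript rather than Notepad++ itself (for this input the patterns behave the same way). The function names are made up for the example.

```javascript
// Hypothetical helper names; the patterns are the ones suggested above.
const input = "The #7kfe dog likes #9kea to eat pizza";

// Option 1: remove the tagged word plus one optional trailing space.
function stripTagged(text) {
  return text.replace(/#\w* ?/g, "");
}

// Option 2: remove only the tagged word, then collapse runs of spaces.
function stripThenCollapse(text) {
  return text.replace(/#\w*/g, "").replace(/ {2,}/g, " ");
}

console.log(stripTagged(input));       // The dog likes to eat pizza
console.log(stripThenCollapse(input)); // The dog likes to eat pizza
```

Note that option 1 leaves a trailing space when the tagged word is the last word on the line.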

Most other answers match words starting with #, which seems to be what you want. But since your question says "particularly in the beginning", here is a pattern that selects every word with a # anywhere in it:
/(\w*#\w+|\w+#\w*)/
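A quick way to see what this pattern picks up is to list its matches. The sketch below does that in JavaScript; the helper name and the sample text are assumptions for illustration.

```javascript
// Hypothetical helper: returns every "word" that has a # at the start,
// in the middle, or at the end.
function wordsWithHash(text) {
  return text.match(/(\w*#\w+|\w+#\w*)/g) || [];
}

console.log(wordsWithHash("a#b #c d# e")); // [ 'a#b', '#c', 'd#' ]
```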

find: (\W)#\w+
replace: \1
(obviously also set it to regex mode)
The \W looks for a non-word character, to ensure the # is at the beginning of a word. The \1 in the replace puts that character back.
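Here is a small sketch of how this behaves, written in JavaScript (where the backreference in the replacement is spelled $1 instead of \1). Because \W has to consume a character, a tag at the very start of the text is not matched; the second function, which also accepts the start of the text, is my own assumed variant and not part of the answer above.

```javascript
// Pattern from the answer: a non-word character, then # and a word.
function removeLeadingTags(text) {
  return text.replace(/(\W)#\w+/g, "$1");
}

// Assumed variant: also accept the start of the text before the #.
function removeLeadingTagsAnchored(text) {
  return text.replace(/(^|\W)#\w+/g, "$1");
}

console.log(removeLeadingTags("foo#bar"));      // foo#bar (the # is mid-word)
console.log(removeLeadingTags("#x y"));         // #x y   (nothing precedes the #)
console.log(removeLeadingTagsAnchored("#x y")); // " y"
```

The space that is put back still leaves a double space behind, so a second pass to collapse spaces may be needed.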

Use this regex:
#\w*
It matches every # together with the word characters that follow it.

Related

RegEx help for NotePad++

I need help with a regex that I just can't figure out. I need to search for broken hashtags that contain a space.
For example, the strings look like this:
#ThisIsaHashtagWith Space
But the words "With Space" can also appear on their own, and I don't want to replace those.
So the important part is that the string starts with "#", followed by any characters, and then the words "With Space", which I want to replace with "WithSpace" to repair the hashtag.
I have a document with 10k of these broken hashtags, and I have been trying all day without success.
I have tried the following regex on regex101.com:
^#+(?:.*?)+(With Space)
Even though it seems to work on regex101.com, it doesn't in Notepad++.
Any help is appreciated.
Thanks a lot.
BR
Your current regex matches one or more #, then any characters, and then captures With Space in a group.
You could instead move the capturing group so that it captures the first part of the match:
(#+.*?)With Space
Then you could use that group in the replacement:
$1WithSpace
As an alternative, you could first match one or more # followed by any character zero or more times, non-greedily (.*?), and then use \K to reset the starting point of the reported match.
Then match With Space.
#+(?:.*?)\KWith Space
In the replacement use WithSpace
The + quantifier after # matches one or more # characters. If the match should start at the beginning of the line, you could add the anchor ^ at the start of the regex.
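The capture-group approach can be tried outside Notepad++ as well. Below is a minimal JavaScript sketch with a made-up function name; JavaScript's regex engine has no \K, so only the $1 variant is shown.

```javascript
// Hypothetical helper: repairs "With Space" only when a # precedes it
// on the same line.
function repairHashtag(line) {
  return line.replace(/(#+.*?)With Space/g, "$1WithSpace");
}

console.log(repairHashtag("#ThisIsaHashtagWith Space")); // #ThisIsaHashtagWithSpace
console.log(repairHashtag("Plain text With Space"));     // Plain text With Space
```

Since . does not match a line break here, a # on one line does not cause a "With Space" on a later line to be rewritten.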
Try using ^(#.+?)(With\s+Space) for your regex, as it also matches multiple spaces and tab characters. If you have multiple rows that you want to affect, use gmi for the flags. I just tried it with the following two strings, each on a separate line in Notepad++:
#blablaWith Space
#hello###$aWith Space
The replacement value is set to $1WithSpace. I tried both Replace All and replacing one by one, and both give the following result:
#blablaWithSpace
#hello###$aWithSpace
Feel free to comment with other strings you want replaced. Also make sure that you have selected the Regular expression search mode in Notepad++.
Try this: (#.*)( )
I tried this in Notepad++ and you should be able to just replace all with $1. Make sure you set the find mode to regular expressions first.
const str = "#ThisIsAHashtagWith Space";
console.log(str.replace(/(#.*)( )/g, "$1"));

Allowing words picked up in regex in certain cases only

I have a regex to catch people just sticking "N/A" or similar into a form field.
^(?!(\b(N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)\b))
Probably not the most elegant I am sure. However I cannot for the life of me get it to allow the above words if followed by something.
So if someone just types "yes" then I want it to fail the regex check. But if someone types "yes, I have blah blah etc etc" I want it to pass.
The expression I have allows the word to be used as long as it isn't the first word in the sentence. I just want to disallow the listed words as the ONLY words in the field.
Any ideas?
Thanks
You may remove the first \b (it is redundant between the start of string and a word char) and replace the second one with $ (end of string):
^(?!(?:N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)$)
With a case insensitive option, you may reduce the pattern to
^(?!(?:n/?a|yes|no)$)
Details
^ - start of string, then...
(?!(?:n/?a|yes|no)$) - a location in the string that is not immediately followed by n/?a (na, n/a), yes or no and then the end of the string.
In plain words, the start of the string is matched only if the whole string is not equal to one of the alternatives inside the alternation group.
The easiest way would be to match all the forbidden strings exactly and invert the result.
Try ^(n/?a|yes|no)$ with a case-insensitive option and invert the result.
^ matches the beginning of the string. $ matches the end of the string.
When you don't have a case-insensitive option, use ^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$.
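Both approaches from the answers can be compared side by side. The following is a minimal JavaScript sketch; the constant and function names are assumptions for illustration, and the case-insensitive flag is used.

```javascript
// Approach 1: match the forbidden answers exactly, then invert the result.
const FORBIDDEN = /^(?:n\/?a|yes|no)$/i;
function isAcceptable(value) {
  return !FORBIDDEN.test(value);
}

// Approach 2: a negative lookahead that succeeds unless the whole
// value is one of the forbidden answers.
const NOT_FORBIDDEN = /^(?!(?:n\/?a|yes|no)$)/i;
function isAcceptableLookahead(value) {
  return NOT_FORBIDDEN.test(value);
}

console.log(isAcceptable("yes"));                   // false
console.log(isAcceptable("yes, I have blah blah")); // true
console.log(isAcceptableLookahead("N/A"));          // false
```

Neither version trims the input, so a value such as "yes " with a trailing space still passes; whether that matters depends on the form.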

Regex to find a string that may contain a single string word

I am trying to write a regex that matches for a specific string as long as it does not contain a single string word.
Below, I want to return "I think one is cool" but not "one" because I only want it as long as it's not by itself.
Ex.
one
I think one is cool <--- I want this "one"
Any help would be greatly appreciated
For regex, the beginning of a string is typically signified with ^ (caret) and the end with $ (dollar sign).
Many flavors of regex allow you to do forward/backward lookarounds, so basically you want to find the word one that is not by itself, but part of a string.
You're looking for the word one, so you can use \b around the word, which is usually syntax for a word boundary. This helps you filter out searches like none.
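So here is a regex that would work for you:
(?<!^)\bone\b|\bone\b(?!$)
The first alternative accepts the word one when it is not at the start of the string, the second when it is not at the end. This means that out of the following strings, only the word one in the second and third lines is a match: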
one
is the one
one for all
i can none of
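A lookaround pattern like this is easy to check against the four sample strings. The JavaScript sketch below uses the alternation form (?<!^)\bone\b|\bone\b(?!$), which accepts "one" when something precedes it or something follows it; the function name is hypothetical, and lookbehind needs a reasonably recent JavaScript engine.

```javascript
// "one" as a whole word, but not as the entire string.
const ONE_IN_CONTEXT = /(?<!^)\bone\b|\bone\b(?!$)/;

// Hypothetical helper name.
function hasOneInContext(text) {
  return ONE_IN_CONTEXT.test(text);
}

for (const s of ["one", "is the one", "one for all", "i can none of"]) {
  console.log(JSON.stringify(s), hasOneInContext(s));
}
```

Requiring both lookarounds at once (without the alternation) would only accept "one" strictly in the middle of the string, as in "I think one is cool".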
If I understand you correctly, you want to match lines containing your word, but not consisting of only your word.
Depending on your programming language there might be better ways to do this, but you can search for the regex /(\w+\sone|one\s\w+)/ to find lines containing something like a word, then a space, then "one", or "one", a space, and then something like a word. So this would match every line here:
one two three
this is one line
the number one
but no line here:
one
lonely
something else
If you want it to match something like "lonely", remove the whitespace escape sequences (\s). If you want to match more than word characters before and/or after, replace the \w with a dot (.).
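As a quick check of the two lists above, the pattern can be used to filter lines. This is a small JavaScript sketch; the variable names are made up.

```javascript
// The six sample lines from the answer above.
const lines = [
  "one two three",
  "this is one line",
  "the number one",
  "one",
  "lonely",
  "something else",
];

// A word, whitespace, then "one", or "one", whitespace, then a word.
const ONE_WITH_NEIGHBOUR = /(\w+\sone|one\s\w+)/;

const kept = lines.filter((line) => ONE_WITH_NEIGHBOUR.test(line));
console.log(kept); // [ 'one two three', 'this is one line', 'the number one' ]
```

Because there is no \b in the pattern, a line such as "someone else" would also be kept.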

Regex for deleting characters before a certain character?

I'm very new at regex, and to be completely honest it confounds me. I need to grab the string after a certain character is reached in said string. I figured the easiest way to do this would be using regex, however like I said I'm very new to it. Can anyone help me with this or point me in the right direction?
For instance:
I need to check the string "23444:thisstring" and save "thisstring" to a new string.
If this is your string:
I'm very new at regex, and to be completely honest it confounds me
and you want to grab everything after the first "c", then this regular expression will work:
/c(.*)/s
It will return this match in the first matched group:
"ompletely honest it confounds me"
Explanation:
The c is the character you are looking for
.* (in combination with /s) matches everything left
(.*) captures what .* matched, making it available in $1 and returned in list context.
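Applied to the string from the question, the same idea looks roughly like this in JavaScript, where the capture is read from index 1 of the match result rather than from $1. The function name is an assumption.

```javascript
// Hypothetical helper: returns everything after the first colon,
// or null when there is no colon at all.
function afterFirstColon(text) {
  const match = text.match(/:(.*)/s); // the s flag lets . match newlines too
  return match ? match[1] : null;
}

console.log(afterFirstColon("23444:thisstring")); // thisstring
console.log(afterFirstColon("a:b:c"));            // b:c
```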
Regex for deleting characters before a certain character!
You can use lookahead like this
.*(?=x)
where x is a particular character, word or string. (Characters like ., $, ^, * and + have special meaning in regex, so don't forget to escape them when using them within x.)
EDIT
for your sample string it would be
.*(?=thisstring)
.* matches zero or more characters up to thisstring
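To delete what the lookahead pattern matches, replace it with an empty string. The JavaScript sketch below builds the pattern from a literal string and escapes special characters first; both function names are hypothetical.

```javascript
// Escape characters that have a special meaning in a regex.
function escapeRegex(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: deletes everything before `marker`, keeping the
// marker itself because the lookahead does not consume it.
function dropBefore(text, marker) {
  const pattern = new RegExp(".*(?=" + escapeRegex(marker) + ")");
  return text.replace(pattern, "");
}

console.log(dropBefore("23444:thisstring", "thisstring")); // thisstring
console.log(dropBefore("price: $5", "$"));                 // $5
```

Since .* is greedy, the cut happens at the last occurrence of the marker on the line, so dropBefore("a:b:c", ":") gives ":c".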
Here is a one-line solution for matching everything after "before"
print $1."\n" if "beforeafter" =~ m/before(.*)/;
Edit:
While using lookbehind is possible, it's not required. Grouping provides an easier solution.
To get the string after : in your example, you can use [^:][^:]*:\(.*\) (basic regex syntax, where \( and \) delimit the capture group). Notice that it requires at least one [^:], followed by any number of further [^:]s, followed by an actual :, the character you are searching for.

Regex to match whole word with a particular definition of a word

I am doing a file search and replace for occurrences of specific words in perl. I'm not usually much of a perl or regex user. I have searched for other regex questions here but I couldn't find one which was quite right so I'm asking for help. My search and replace currently looks like this:
s/originalword/originalword_suffix/g
This matches cases of originalword that appear in the middle of another word, which I don't want. In my application of search and replace, a whole word can be defined as having the letters of the latin alphabet in lowercase or capital letters and the digits 0-9 and the symbol _ in any uninterrupted sequence. Anything else besides these characters, including any other symbols or any form of whitespace including line breaks or tabs, indicate operations or separators of some kind so they are outside the word boundaries. How do I modify my search and replace to only match whole words as I've defined them, without matching substrings?
Examples:
in the case that originalword = cat and originalword_suffix = cat_tastic
:cat { --> :cat_tastic {
:catalog { --> no change
Use the \b anchor to match only on a word boundary:
s/\bcat\b/cat_tastic/g
Note, however, that Perl has a slightly different definition of what a "word" is. Reading the perlre reference guide a couple of times might help you understand regexps a bit better.
Running perl -pi -e "YOUR_REGEXP" in a terminal and entering in lines of text can help you understand and debug what a particular regexp is doing.
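The effect of \b can also be tried outside Perl. Here is a minimal JavaScript sketch (the function name is made up); in JavaScript, \b treats letters, digits and _ as word characters, which happens to coincide with the definition in the question.

```javascript
// Hypothetical helper: suffix "cat" only where it stands as a whole word.
function addSuffix(text) {
  return text.replace(/\bcat\b/g, "cat_tastic");
}

console.log(addSuffix(":cat {"));     // :cat_tastic {
console.log(addSuffix(":catalog {")); // :catalog {
console.log(addSuffix("cat_nip"));    // cat_nip (the _ is a word character)
```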
You could try:
s/([^0-9a-z_])([0-9a-z_]+)([^0-9a-z_])/$1$2_tastic$3/gi
Basically, a non-word character, then a set of word characters, followed by a non-word character. The $1,$2,$3 represent the captured groups, and you replace $2 with $2_suffix.
Hope that helps; I'm not a Perl guy, but I'm pretty regex-savvy. Note that the above will fail if the word is the very first or very last thing in a string. I'm not sure whether Perl regexes allow the syntax, but if so, the first/last issue could be fixed with:
s/(^|[^0-9a-z_])([0-9a-z_]+)([^0-9a-z_]|$)/$1$2_tastic$3/gi
Using ^ and $ to match beginning/end of string.
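One thing to watch with patterns that consume the separators on both sides is that two words sharing a single separator cannot both match. A possible alternative, which is my own sketch and not what the answer above proposes, is to check the neighbours with lookarounds so that nothing around the word is consumed. It is shown in JavaScript for the specific word cat, with a hypothetical function name.

```javascript
// Assumed variant: "cat" with no word character (as defined in the
// question) directly before or after it. Lookarounds consume nothing,
// so the start and end of the string need no special handling.
function suffixWholeWord(text) {
  return text.replace(/(?<![0-9A-Za-z_])cat(?![0-9A-Za-z_])/g, "cat_tastic");
}

console.log(suffixWholeWord("cat cat"));    // cat_tastic cat_tastic
console.log(suffixWholeWord(":catalog {")); // :catalog {
```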
Here is an example from a page that explains boundary matchers:
Enter your regex: \bdog\b
Enter input string to search: The dog plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.
Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.