Regular expression to remove parenthesis and space before it - regex

I'm trying to write a regular expression (inside a Google Spreadsheet) to remove parenthesis, the text inside the parenthesis, and space before the parenthesis. Or in other words, I'm trying to extract only the name inside of the text. For example, I'd like the string "A.J. Smith (iOS Developer, San Francisco)" to become "A.J. Smith"
So far I've gotten both =REGEXEXTRACT(D2,"[^()]*") and =REGEXEXTRACT(D2,"^[^(]+") to extract "A.J. Smith " but it leaves that last space at the end. This is probably a really easy problem to solve, I'm just not great with regex.

Just use word boundary.
=REGEXEXTRACT(D2,"^[^(]+\\b")
^[^(]+ greedily matches all the characters upto the first ( symbol including the space which exists before (. Then it backtracks to the last word boundary appears on the matched string because of \b present in the regex.
DEMO

Try this instead:
=REGEXREPLACE(D2,"\s\(.*","")
What I'm doing is replacing everything from a space next to a parenthesis to the end of the string with nothing.

I used https://regoio.herokuapp.com/ to help build a regex to match. This regex would match this example without the space. ^(.+)\s\(
The regex works like this, The ^ matches the beginning of the string, the parenthesis captures whatever expression is inside that you want to use. in this case .+ which matches any character 1 or more times. The \s matchs a whitespace character and \( matches the opening parenthesis.
If you want a regex that removes whitespace at the beginning of the string and any before the parenthesis this should work: ^[\s]*(.+)[\s]+\(
With this regex you can extract all the text you wanted in a single REGEXEXTRACT instead of using multiple ones:
=REGEXEXTRACT(D2,"^[\s]*(.+)[\s]+\(")

I found that =REGEXEXTRACT(D2,"(.*)\s\(") also worked for me.

This should work to remove all parentheses and white space before:
=REGEXTRACT(D2,"\s|\(|\)|\[|]|{|}|")
Feel free to play around with this on rubular.

Related

Find everything except the search match

I need to write a notepad ++ regex to match everything besides my search criteria.
Fore example, if I have
James Bond (E1R1)
I have a regex to match E1R1. But I need to reverse it so I can get rid of everything besides E1R1.
So far I have ^(?!(?<=\().+?(?=\))$).*$. But it seems to match everything.
Use
^.*\(([^()\n\r]*)\)$|^(?!.*\(([^()\n\r]*)\)$).*\R?
Replace with $1.
See regex proof.
The expression finds lines ending with round brackets at the end, and removes all text outside those brackets. It will remove the entire line that contains no brackets at the end.
You could match from an opening till closing parenthesis and skip that match. Then match any single character which should be replaced by an empty string.
\([^()\r\n]*\)(*SKIP)(*F)|.
Explanation
\([^()\r\n]*\) Match from an opening till closing parenthesis (....)
(*SKIP)(*F) Skip the match
| Or
. Match any character except a newline
Regex demo

Modifying regex to match beginning and end characters

I am new to regex and playing around with writing regex to match markdown syntaxes, particularly italic text like:
this is markdown with some *italic text*
After writing some naive implementations I found this regex which seems to do the job quite nicely (dealing with edge-cases) and matches the entire string:
(?<!\*)\*([^ ][^*\n]*?)\*(?!\*)
However, I don't want to match the entire string - I only want to match the beginning and end * characters (so that I can do some special formatting to those characters). How might I go about doing that?
The tricky thing is that I only want to the match the * characters when the rest of the string matches the correct format of a string in italics (i.e. meets the requirements of that regex above). So a simple regex like (\*|\*) isn't going to cut it.
Except from using a capturing group for the asterix at the start and at the end, you can add an asterix to the first negated character class to prevent matching a double **.
Note that as pointed out by #toto you don't really need the capturing groups around the asterix (\*). You can also match them and add the replacement characters before and after the single capturing group for the content in the middle.
It also means that it should match at least a single character other then an asterix.
You don't have to make the first character class non greedy *? as it can not cross the * boundary that follows.
(?<!\*)(\*)([^*\s][^*\r\n]*)(\*)(?!\*)
Regex demo
If there can also not be a space before the ending asterix, you can repeat matching a space followed by matching any non whitespace char except an asterix (?: [^*\s]+)*
The \r\n in the negated character class is to prevent newline boundaries which are also matched by \s. If that should not be the case, you can replace that by a space or tab and space.
(?<!\*)(\*)([^*\s]+(?: [^*\s]+)*)(\*)(?!\*)
Regex demo
Just change the first and second \* to capturing groups and you can change at will:
(?<!\*)(\*)([^ ][^*\n]*?)(\*)(?!\*)
Demo

How to allow spaces in between words?

EDIT: I've been experimenting, and it seems like putting this:
\(\w{1,12}\s*\)$
works, however, it only allows space at the end of the word.
example,
Matches
(stuff )
(stuff )
Does not
(st uff)
Regexp:
\(\w{1,12}\)
This matches the following:
(stuff)
But not:
(stu ff)
I want to be able to match spaces too.
I've tried putting \s but it just broke the whole thing, nothing would match. I saw one post on here that said to enclose the whole thing in a ^[]*$ with space in there. That only made the regex match everything.
This is for Google Forms validation if that helps. I'm completely new to regex, so go easy on me. I looked up my problem but could not find anything that worked with my regex. (Is it because of the parenthesis?)
For matching text like (st uff) or (st uff some more) you will need to write your regex like this,
\(\w{1,12}(?:\s+\w{1,12})*\)
Regex explanation:
\( - Literal start parenthesis
\w{1,12} - Match a word of length 1 to 12 like you wanted
(?:\s+\w{1,12})* - You need this pattern so it can match one or more space followed by a word of length 1 to 12 and whole of this pattern to repeat zero or more times
\) - Literal closing parenthesis
Demo
Now if you want to optionally also allow spaces just after starting parenthesis and ending parenthesis, you can just place \s* in the regex like this,
\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)
^^^ ^^^
Demo with optional spaces
If you are trying to get 12 characters between parentheses:
\([^\)]{1,12}\)
The [^\)] segment is a character class that represents all characters that aren't closing parentheses (^ inverts the class).
If you want some specific characters, like alphanumeric and spaces, group that into the character class instead:
\([\w ]{1,12}\)
Or
\([\w\s]{1,12}\)
If you want 12 word characters with an arbitrary number of spaces anywhere in between:
\(\s*(?:\w\s*){1,12}\)

RegEx help for NotePad++

I need help with RegEx I just can't figure it out I need to search for broken Hashtags which have an space.
So the strings are for Example:
#ThisIsaHashtagWith Space
But there could also be the Words "With Space" which I don't want to replace.
So important is that the String starts with "#" then any character and then the words "With Space" which I want to replace to "WithSpace" to repair the Hashtags.
I have a Document with 10k of this broken Hashtags and I'm kind of trying the whole day without success.
I have tried on regex101.com
with following RegEx:
^#+(?:.*?)+(With Space)
Even I think it works on regex101.com it doesn't in Notepad++
Any help is appreciated.
Thanks a lot.
BR
In your current regex you match a # and then any character and in a capturing group match (With Space).
You could change the capturing group to capture the first part of the match.
(#+.*?)With Space
Then you could use that group in the replacement:
$1WithSpace
As an alternative you could first match a single # followed by zero or more times any character non greedy .*? and then use \K to reset the starting point of the reported match.
Then match With Space.
#+(?:.*?)\KWith Space
In the replacement use WithSpace
If you want to match one or more times # you could use a quantifier +. If the match should start at the beginning of string you could use an anchor ^ at the start of the regex.
Try using ^(#.+?)(With\s+Space) for your regex as it also matches multiple spaces and tab characters - if you have multiple rows that you want to affect do gmi for the flags. I just tried it with the following two strings, each on a separate line in Notepad++
#blablaWith Space
#hello###$aWith Space
The replace with value is set to $1WithSpace and I've tried both replaceAll and replace one by one - seems to result in the following.
#blablaWithSpace
#hello###$aWithSpace
Feel free to comment with other strings you want replaced. Also be sure that you have selected the Regular Extension search mode in NPP.
Try this? (#.*)( ).
I tried this in Notepad++ and you should be able to just replace all with $1. Make sure you set the find mode to regular expressions first.
const str = "#ThisIsAHashtagWith Space";
console.log(str.replace(/(#.*)( )/g, "$1"));

Regex for deleting characters before a certain character?

I'm very new at regex, and to be completely honest it confounds me. I need to grab the string after a certain character is reached in said string. I figured the easiest way to do this would be using regex, however like I said I'm very new to it. Can anyone help me with this or point me in the right direction?
For instance:
I need to check the string "23444:thisstring" and save "thisstring" to a new string.
If this is your string:
I'm very new at regex, and to be completely honest it confounds me
and you want to grab everything after the first "c", then this regular expression will work:
/c(.*)/s
It will return this match in the first matched group:
"ompletely honest it confounds me"
Try it at the regex tester here: regex tester
Explanation:
The c is the character you are looking for
.* (in combination with /s) matches everything left
(.*) captures what .* matched, making it available in $1 and returned in list context.
Regex for deleting characters before a certain character!
You can use lookahead like this
.*(?=x)
where x is a particular character or word or string.{using characters like .,$,^,*,+ have special meaning in regex so don't forget to escape when using it within x}
EDIT
for your sample string it would be
.*(?=thisstring)
.* matches 0 to many characters till thisisstring
Here is a one-line solution for matching everything after "before"
print $1."\n" if "beforeafter" =~ m/before(.*)/;
Edit:
While using lookbehind is possible, it's not required. Grouping provides an easier solution.
To get the string before : in your example, you have to use [^:][^:]*:\(.*\). Notice that you should have at least one [^:] followed by any number of [^:]s followed by an actual :, the character you are searching for.