Regex for matching some characters that should be left out later - regex

I will try to explain my situation with an example, consider the following string:
03 - The-Basics-of-Querying-the-Dom.mov
I need to remove all -s (hyphens) excluding the one after the digits. In other words, all hyphens in between the words.
This is the regex I created: /([^\s])\-/. But the problem is that, when I try to replace, the character before the hyphen is also removed.
The following is the result I am aiming for:
03 - The Basics of Querying the Dom.mov
I think I could use something like excluding groups? I tried to use ?: and ?! in the capture group to keep that character from being matched, but that didn't give any positive results.

You can do:
(?<=\w)-(?=\w)
Demo

I just modified your already proposed RegEx by using a positive lookbehind (which only asserts the correct position):
/(?<=[^\s])\-/
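To check both patterns against the sample string, here is a minimal sketch in Python. The question doesn't name an engine, so the re module and the variable names are my assumptions. It also shows that the original capture-group pattern works if the captured character is put back in the replacement.

```python
import re

title = "03 - The-Basics-of-Querying-the-Dom.mov"

# Hyphen with a word character on both sides; the lookarounds only
# assert, so nothing besides the hyphen is consumed or replaced.
between_words = re.sub(r"(?<=\w)-(?=\w)", " ", title)

# Hyphen preceded by any non-whitespace character.
after_non_space = re.sub(r"(?<=[^\s])-", " ", title)

# The original pattern consumes the preceding character, so the
# replacement has to put it back via the group reference \1.
with_backreference = re.sub(r"([^\s])-", r"\1 ", title)

print(between_words)
print(after_non_space)
print(with_backreference)
```

All three calls should leave the hyphen after "03" alone, because it is preceded by a space.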

Related

Select Northings from a 1 Line String

I have the following string;
Start: 738392E, 6726376N
I extracted 738392 OK using (?<=.art\:\s)([0-9A-Z]*). This gave me a one-group match, allowing me to extract it as a column value.
I want to extract 6726376 the same way, with only one group appearing, because I am parsing that into a column value.
I am not sure why (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) is giving me the entire line after S.
Helping me get it right, with an explanation, will go a long way.
Because you used positive lookaheads. Those just make some assertions, but don't "move the head along".
(?=(art\:\s\s*)) makes sure you're before "art: ...". The next thing is another positive lookahead that you quantify with a star to make it optional. Finally you match anything, so you get the rest of the line in your capture group.
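To illustrate the point that a lookahead asserts without consuming, here is a small sketch in Python (my choice of engine, the question doesn't name one). It uses simplified patterns rather than the exact expression from the question.

```python
import re

line = "Start: 738392E, 6726376N"

# The lookahead only checks that "art: " comes next; it consumes nothing,
# so .* begins at the "a" of "art" and runs to the end of the line.
lookahead = re.search(r"(?=art:\s)(.*)", line)

# A lookbehind asserts that "art: " comes before the current position,
# so the match begins right after it.
lookbehind = re.search(r"(?<=art:\s)(.*)", line)

print(lookahead.group(1))
print(lookbehind.group(1))
```

The first group still contains "art: ", the second starts at the first digit.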
I propose a simpler regex:
(?<=(art\:\s))(\d+)\D+(\d+)
Demo
First we use a positive lookbehind that makes sure we're after "art: ", then we match two numbers, separated by non-digits.
There is no need for you to make it this complicated. Just use something like
Start: (\d+)E, (\d+)N
or
\b\d+(?=[EN]\b)
if you need to match each bit separately.
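For what it's worth, both suggestions can be tried like this in Python (an assumption, since no engine was mentioned; the variable names are made up for the example).

```python
import re

line = "Start: 738392E, 6726376N"

# One pattern with two capture groups: easting and northing.
easting, northing = re.search(r"Start: (\d+)E, (\d+)N", line).groups()

# Or match each number on its own: digits that are followed by
# an E or N which ends a word. No capture group is needed here.
numbers = re.findall(r"\b\d+(?=[EN]\b)", line)

print(easting, northing)
print(numbers)
```

The second form returns each number as a whole match, which may be easier if the tool only exposes one group per match.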
Your expression (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) has several problems besides the ones already mentioned: 1) your first and second lookahead match at different locations, 2) your second lookahead is quantified, which, in 25 years, I have never seen someone do, so kudos. ;), 3) your capturing group matches about anything, including any line or the empty string.
You match the whole part after it because you use .* which will match until the end of the line.
Note that this part [0-9]* at the end of the pattern does not match because it is optional and the preceding .* already matches until the end of the string.
You could get the match without any lookarounds:
(art:\s)(\d+)[^,]+,\s(\d+)
Regex demo
If you want the matches only, you could make use of the PyPI regex module, which supports variable-length lookbehind:
(?<=\bStart:(?:\s+\d+[A-Z],)* )\d+(?=[A-Z])
Regex demo (For example only, using a different engine) | Python demo

Regex for selecting words ending in 'ing' unless

I want to select words ending in ing with a regular expression, but I want to exclude words that end in thing. For example:
everything
running
catching
nothing
Of these words, running and catching should be selected, everything and nothing should be excluded.
I've tried the following:
.+ing$
But that selects everything. I'm thinking look aheads/look arounds could be the solution, but I haven't been able to get one that works.
Solutions that work in Python or R would be helpful.
In Python you can use a negative lookbehind assertion like this:
^.*(?<!th)ing$
RegEx Demo
(?<!th) is a negative lookbehind expression that will fail the match if th comes before ing at the end of the string.
Note that if you are matching words that are not on separate lines then instead of anchors use word boundaries as:
\w+(?<!th)ing\b
Something like \b\w+(?<!th)ing\b maybe.
You might also use a negative lookahead (?! to assert that what is on the right is not 0+ times a word character followed by thing and a word boundary:
\b(?!\w*thing\b)\w*ing\b
Regex demo | Python demo
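Here is a short Python sketch that runs the patterns from the answers above against the four sample words. The list and variable names are only for illustration.

```python
import re

words = ["everything", "running", "catching", "nothing"]

# One word per string: fullmatch plays the role of the ^ and $ anchors.
selected = [w for w in words if re.fullmatch(r".*(?<!th)ing", w)]

# Words inside running text: use word boundaries instead of anchors.
text = " ".join(words)
via_lookbehind = re.findall(r"\b\w+(?<!th)ing\b", text)
via_lookahead = re.findall(r"\b(?!\w*thing\b)\w*ing\b", text)

print(selected)
print(via_lookbehind)
print(via_lookahead)
```

On this input the lookbehind and lookahead variants should agree; they can differ on edge cases such as the bare word "ing", which only the \w* version matches.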

Regex: how do I match a character before other capture characters?

I'm trying to match on a list of strings where I want to make sure the character before the match is not the equals sign, but without capturing that character. So, for a list (excerpted from pip freeze) like:
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
I want the captured output to look like this:
==3.10
==4.0.0
==0.5.1
I first thought using a negative lookahead (?![^=]) would work, but with a regular expression of (?![^=])==[0-9]+.* it ends up capturing the line I don't want:
==3.10
==2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
==4.0.0
==0.5.1
I also tried using a non-capturing group (?:[^=]) with a regex of (?:[^=])==[0-9]+.* but that ends up capturing the first character which I also don't want:
y==3.10
l==4.0.0
s==0.5.1
So the question is this: How can one match but not capture a string before the rest of the regex?
A negative lookbehind would be the way to go:
(?<!=)==[0-9.]+
Also, here is the site I like to use:
http://www.rubular.com/
Of course it does sometimes help if you say which engine/software you are using, so we know what limitations there might be.
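As a sanity check, this is how the negative lookbehind behaves in Python (assumed here because the sample comes from pip freeze; the original engine wasn't stated). The second list reruns the lookahead attempt from the question to show why it still picks up the === line: (?![^=]) looks forward, not backward.

```python
import re

frozen = [
    "ply==3.10",
    "powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-",
    "psutil==4.0.0",
    "ptyprocess==0.5.1",
]

# "==" that is not preceded by another "=", followed by digits and dots.
pattern = re.compile(r"(?<!=)==[0-9.]+")
versions = [m.group() for m in map(pattern.search, frozen) if m]

# The attempt from the question: the lookahead inspects the next character,
# so the match can still start at the second "=" of "===".
attempt = re.compile(r"(?![^=])==[0-9]+.*")
lookahead_hits = [m.group() for m in map(attempt.search, frozen) if m]

print(versions)
print(len(lookahead_hits))
```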
If you want to remove the version numbers from the text, you could capture a character that is not an equals sign, ([^=]), in the first capturing group, followed by matching == and the version number, \d+(?:\.\d+)+. Then in the replacement you would use your capturing group.
Regex
([^=])==\d+(?:\.\d+)+
Replacement
Group 1 $1
Note
You could also use ==[0-9]+.* or ==[0-9.]+ to match the double equals signs and version numbers but that would be a very broad match. The first would also match ====1test and the latter would also match ==..
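A rough Python version of this replacement, in case it helps (Python is my assumption; there the group reference is written \1 rather than $1). The last two lines check the two "broad match" examples from the note.

```python
import re

frozen = [
    "ply==3.10",
    "powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-",
    "psutil==4.0.0",
    "ptyprocess==0.5.1",
]

# Capture the character before "==" and put it back, dropping the version.
names = [re.sub(r"([^=])==\d+(?:\.\d+)+", r"\1", line) for line in frozen]

# The broader patterns accept input that is not a version at all.
broad_first = re.search(r"==[0-9]+.*", "====1test")
broad_second = re.fullmatch(r"==[0-9.]+", "==..")

print(names)
```

The === line is left untouched because no non-equals character is directly followed by == and a digit.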
There's another regex operator called a 'lookbehind assertion' (also called positive lookbehind) ?<= - and in my above example using it in the expression (?<=[^=])==[0-9]+.* results in the expected output:
==3.10
==4.0.0
==0.5.1
At the time of this writing, it took me a while to discover this - notably the lookbehind assertion currently isn't supported in the popular regex tool regexr.
If there are alternatives to using lookbehind to solve this, I'd love to hear them.

Capture groups inside string using regular expression

I don't know much about regular expressions, and with what I've learned I can't solve my entire problem.
I have this String:
04 credits between subjects of block 02
I'm only sure I will have [00-99] at the beginning and at the end.
I want to capture the beginning and the end IF the middle has "credits between". The system can have other formats as input, so I want to be sure that the captured fields come from the correct pattern.
This is what I've tried to do:
(\w\w) ^credits between$.+ (\w\w)
I'm using the Regexr website to see what i'm doing, but no success.
You may use the following regex:
^(\d{2})\b.*credits between.*\b(\d{2})$
See regex demo
It will match and capture 2 digits at the beginning and end if the string itself contains credits between. Note that newlines can be supported with [\s\S] instead of ..
The word boundaries \b just make the engine match the digits followed by a non-word character (you may remove it if that is not expected behavior). Then, you'd need to use ^(\d{2})\b.*credits between.*?(\d{2})$ with the lazy matching .*? at the end.
If the number of digits in the numbers at both ends can vary, just use
^(\d+).*credits between.*?(\d+)$
See another demo
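To see the patterns in action, here is a small Python sketch (the engine and the second and third sample strings are my own assumptions, not from the question). It also shows why the lazy .*? matters once the trailing number can have more than one digit.

```python
import re

strict = re.compile(r"^(\d{2})\b.*credits between.*\b(\d{2})$")

first, last = strict.match("04 credits between subjects of block 02").groups()

# A line in some other format is rejected (match returns None).
rejected = strict.match("04 hours between subjects of block 02")

# Variable-length numbers: the lazy .*? stops as early as possible,
# so the last group gets all of the trailing digits.
sample = "4 credits between subjects of block 102"
lazy = re.match(r"^(\d+).*credits between.*?(\d+)$", sample)

# With a greedy .* the last group is left with a single digit.
greedy = re.match(r"^(\d+).*credits between.*(\d+)$", sample)

print(first, last)
print(lazy.groups(), greedy.groups())
```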

Regex: negative match on group of characters?

I want to create a regular expression that will match all strings starting with 0205052I0 and then where the next two characters are not BB.
So I want to match:
0205052I0AAAAAA
0205052I0ACAAAA
0205052I0BCABAA
But not match:
0205052I0BBAA
How can I do this with PCRE regular expressions?
I've been trying $0205052I0^(BB) on https://regex101.com/ but it doesn't work.
You can use a negative lookahead:
"0205052I0(?!BB).*"
See demo https://regex101.com/r/mO6uV4/1
Also note that you have put the anchors in the wrong positions. If you want to use anchors, you can use the following regex:
"^0205052I0(?!BB).*$"
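A quick way to try the anchored pattern is the following Python sketch. The question asks about PCRE; I'm assuming Python's re here only because this construct is written the same way in both.

```python
import re

codes = [
    "0205052I0AAAAAA",
    "0205052I0ACAAAA",
    "0205052I0BCABAA",
    "0205052I0BBAA",
]

# The prefix must not be directly followed by "BB"; the lookahead
# checks the next two characters without consuming them.
pattern = re.compile(r"^0205052I0(?!BB).*$")
kept = [code for code in codes if pattern.match(code)]

print(kept)
```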
Just in case: ^ is for NOT in character classes, only. E.g.: [^B]. In your case, you would need something like
0205052I0(B[^B]|[^B]B|[^B][^B])
for the described effect.
See it in action: RegEx 101
Which is rather cumbersome, though. The negative lookahead as suggested by @Kasra is by far the better option.
Still - if you actually wanted to capture the matched expression, you needed to add parentheses:
(0205052I0(?:B[^B]|[^B]B|[^B][^B]).*)
or -again- better (in the sense of readability/extensibility/maintainability)
(0205052I0(?!BB).*)
RegEx 101
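To compare the two variants, here is a rough sketch in Python (assumed for illustration; the question is about PCRE, where these constructs look the same). On the sample strings they agree, but they are not strictly equivalent: the alternation needs two characters after the prefix, the lookahead does not. The short string at the end is a made-up example of that.

```python
import re

lookahead = re.compile(r"0205052I0(?!BB).*")
alternation = re.compile(r"0205052I0(?:B[^B]|[^B]B|[^B][^B]).*")

codes = [
    "0205052I0AAAAAA",
    "0205052I0ACAAAA",
    "0205052I0BCABAA",
    "0205052I0BBAA",
]

# Both variants accept and reject the same sample strings.
same_on_samples = all(
    bool(lookahead.match(code)) == bool(alternation.match(code))
    for code in codes
)

# Only one character after the prefix: the lookahead variant matches,
# the alternation variant does not.
short = "0205052I0A"
short_lookahead = lookahead.match(short)
short_alternation = alternation.match(short)

print(same_on_samples)
```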
But if you want to keep only the strings which do not contain the BB, you might be better off matching the ones that do contain it and replacing them with nothing: (0205052I0(?=BB).*)
RegEx 101
Since your sample strings have leading blanks, I didn't bring anchors into the picture...
However, talking of anchors: $ is for end of line - but not for line break as your attempt might be read...
Please comment, if and as this requires adjustment / further detail.