Find slashes that are NOT followed by a non-word character - regex

I am trying to write a regex that finds only those slashes that are not followed by a special character.
For example, if the string is
/PErs/#loc/g/2, then the regex should find the slashes (/) that come before P, g and 2. It should not return the slash before #, as # is a special character.
I could write \/\w, but that returns /P, /g and /2 rather than just the slashes.

The simplest approach is to use a word boundary, \b:
\/\b
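\b matches at the position between a word character and a non-word character. Since / is itself a non-word character, \/\b matches only a slash that is followed by a word character.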
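As a quick check, here is a minimal sketch in Python (the question does not name a regex engine, so the use of Python's re module is an assumption) that lists where the pattern matches in the sample string. The variable names are illustrative only.

```python
import re

text = "/PErs/#loc/g/2"

# "/\b" matches a slash only when a word boundary follows it, i.e. when
# the next character is a word character (the slash is a non-word char).
slash_positions = [m.start() for m in re.finditer(r"/\b", text)]
print(slash_positions)
```

The slash before # is skipped because both / and # are non-word characters, so there is no boundary between them.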

You want to use the lookahead operator.
Positive lookahead or detect if something is present after (ahead)
Try this regex instead:
\/(?=\w)
Here we use the positive lookahead operator (?=). It checks that the given expression follows the current position, but does not include that expression in the match.
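The difference between consuming the word character and only looking ahead at it can be sketched as follows. This assumes Python's re module; the question itself does not say which engine is used.

```python
import re

text = "/PErs/#loc/g/2"

# Consuming pattern: the word character becomes part of each match.
consumed = re.findall(r"/\w", text)

# Lookahead pattern: the word character is checked but not consumed.
lookahead = re.findall(r"/(?=\w)", text)

print(consumed)   # ['/P', '/g', '/2']
print(lookahead)  # ['/', '/', '/']
```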
Negative lookahead or detect if something is NOT present after (ahead)
Alternatively, you can use the negative lookahead operator (?!).
\/(?![#])
This will match any / NOT followed by #.
Negative lookahead with multiple special characters
If you have more special characters, simply add them to the character class.
For example, if # and % were special characters, the regular expression above would become:
\/(?![#%])
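A small sketch of how extending the character class changes the result, assuming Python's re module. The sample string is hypothetical and was chosen so that one slash is followed by # and another by %.

```python
import re

# Hypothetical sample: "#" and "%" both count as special characters.
text = "/a/#b/%c/"

# Only "#" is excluded: the slash before "%" still matches.
only_hash = [m.start() for m in re.finditer(r"/(?![#])", text)]

# Both "#" and "%" are excluded.
hash_or_percent = [m.start() for m in re.finditer(r"/(?![#%])", text)]

print(only_hash)
print(hash_or_percent)
```

Note that the trailing slash matches in both cases, because a negative lookahead also succeeds at the end of the string.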

Matching slashes NOT followed by a NON-word character is not the same as matching slashes followed by a word character.
Try:
/(?!\W)
This matches slashes that are NOT followed by a NON-word character.
Unlike \/(?=\w), it also matches a slash at the end of the string, such as the final slash in PErs/
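The end-of-string difference can be illustrated with a short sketch (Python's re module is assumed here):

```python
import re

text = "PErs/"

# Negative lookahead: succeeds when nothing follows the slash at all.
not_followed_by_nonword = re.findall(r"/(?!\W)", text)

# Positive lookahead: requires an actual word character after the slash.
followed_by_word = re.findall(r"/(?=\w)", text)

print(not_followed_by_nonword)  # ['/']
print(followed_by_word)         # []
```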

Related

Regex lookahead with unknown number of spaces

I am trying to capture a string that can contain any character but must always be followed by ';'
I want to capture it and trim the white space around it. I've tried using positive lookahead but that does not seem to exclude the whitespace.
Example:
this is a match ;
this is not a match
regex:
.+(?=\s*;)
result:
"this is a match " is captured, including the trailing whitespace.
expected result:
"this is a match" (without whitespace)
You have to make sure that the first and the last characters of your match are not spaces. To do that, put a non-whitespace match (\S) before and after the any-character match (.*). Because there may be nothing between those two characters, the any-character part must be optional, so we use * instead of +.
\S.*\S(?=\s*;)
If the string can start with a space, use .*\S(?=\s*;).
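Here is a minimal sketch of both variants, assuming Python's re module (the question does not name an engine). The helper capture and the variable names are hypothetical. It also shows one consequence of the first pattern: it needs at least two non-space characters, so a one-character value is only found by the second variant.

```python
import re

def capture(pattern, line):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, line)
    return m.group(0) if m else None

strict = r"\S.*\S(?=\s*;)"   # first and last characters must be non-space
relaxed = r".*\S(?=\s*;)"    # only the last character must be non-space

with_semicolon = capture(strict, "this is a match ;")
without_semicolon = capture(strict, "this is not a match")
single_strict = capture(strict, "a ;")
single_relaxed = capture(relaxed, "a ;")

print(repr(with_semicolon), without_semicolon, single_strict, repr(single_relaxed))
```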
Thanks to @CarySwoveland for improving the answer.
You can match
.*(?<!\s)(?=\s*;)
provided the regex engine supports negative lookbehinds.
Note that this returns an empty string if the string is " ;".
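The lookbehind variant, including the empty-string case mentioned above, can be sketched like this. Python's re module is assumed; it supports fixed-width negative lookbehinds such as (?<!\s).

```python
import re

pattern = re.compile(r".*(?<!\s)(?=\s*;)")

# The match must end right after a non-space character, before "\s*;".
trimmed = pattern.search("this is a match ;").group(0)

# With only whitespace before the semicolon, the match is empty.
empty = pattern.search(" ;").group(0)

# Without a semicolon the lookahead can never succeed.
no_match = pattern.search("this is not a match")

print(repr(trimmed), repr(empty), no_match)
```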
You can make the dot non-greedy and start the match with a non-whitespace character:
\S.*?(?=\s*;)
If the non-whitespace character itself should also not be a semicolon:
[^\s;].*?(?=\s*;)
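To see why excluding the semicolon from the first character can matter, here is a sketch with a hypothetical input that contains two semicolon-terminated values on one line (Python's re module is assumed):

```python
import re

# Hypothetical input with two values, each followed by " ;".
text = "a ; b ;"

# \S also accepts ";" as the first character of a match.
with_any_nonspace = re.findall(r"\S.*?(?=\s*;)", text)

# [^\s;] refuses to start a match on a semicolon.
without_semicolon_start = re.findall(r"[^\s;].*?(?=\s*;)", text)

print(with_any_nonspace)
print(without_semicolon_start)
```

With \S, the second match starts at the first semicolon; with [^\s;], each value is captured on its own.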

Regex that matches strings that are all lower case and do not contain specific string

I need a regular expression to ensure that entries in a form 1) are all lower case AND 2) do not contain the string ".net"
I can do either of those separately:
^((?!.net).)*$ gives me strings that do not contain .net.
[a-z] only matches lower-cased inputs. But I have not been able to combine these.
I've tried:
^((?!.net).)(?=[a-z])*$
(^((?!.net).)*$)([a-z])
And a few others.
Can anyone spot my error? Thanks!
Because the dot in your pattern matches any character except a newline, you can replace it with a negated character class that excludes uppercase characters and newlines.
As suggested by @Wiktor Stribiżew, to rule out a string that contains .net you can use a negative lookahead (?!.*\.net), where \.net (note the escaped dot) is preceded by .* to match any character 0+ times.
^(?!.*\.net)[^\nA-Z]+$
^ Start of string
(?!.*\.net) negative lookahead to make sure the string does not contain .net
[^\nA-Z]+ Match 1+ times any character except a newline or a char A-Z
$ End of string
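A minimal sketch of the combined pattern, assuming Python's re module. The sample strings are hypothetical. It also shows what the unescaped dot in the original attempt does: .net then matches any character followed by net, so a word such as internet is rejected too.

```python
import re

pattern = re.compile(r"^(?!.*\.net)[^\nA-Z]+$")

# Hypothetical form entries.
samples = ["example.com", "example.net", "Example.com", "internet"]
accepted = [s for s in samples if pattern.search(s)]
print(accepted)

# Original attempt with an unescaped dot: "rnet" in "internet" matches ".net".
unescaped = re.compile(r"^((?!.net).)*$")
internet_rejected = unescaped.search("internet") is None
print(internet_rejected)
```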

Unmatch complete words if a negative lookahead is satisfied

I need to match only those words that don't contain special characters like # and :.
For example:
git#github.com shouldn't match
list should return a valid match
show should also return a valid match
I tried it using a negative lookahead \w+(?![#:])
But it matches gi out of git#github.com, and it shouldn't match that either.
You may add \w to the lookahead:
\w+(?![\w#:])
The equivalent is using a word boundary:
\w+\b(?![#:])
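The backtracking behind the gi match, and the effect of the two fixes, can be sketched as follows. Python's re module is assumed, and the sample text simply joins the three examples from the question.

```python
import re

text = "git#github.com list show"

# Original attempt: \w+ backtracks from "git" to "gi", which is
# followed by "t" and therefore passes the lookahead.
original = re.findall(r"\w+(?![#:])", text)

# Adding \w to the lookahead forbids stopping in the middle of a word.
with_word_in_lookahead = re.findall(r"\w+(?![\w#:])", text)

# The word-boundary form gives the same result.
with_boundary = re.findall(r"\w+\b(?![#:])", text)

print(original)
print(with_word_in_lookahead)
print(with_boundary)
```

In this sample, github and com are still matched, which is what the left-hand boundary discussed next addresses.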
Besides, you may consider adding a left-hand boundary to avoid matching words inside non-word non-whitespace chunks of text:
^\w+(?![\w#:])
Or
(?<!\S)\w+(?![\w#:])
The ^ will match the word at the start of the string and (?<!\S) will match only if the word is preceded by whitespace or the start of the string.
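A short sketch of the two left-hand boundaries, assuming Python's re module and the same sample text built from the question's examples:

```python
import re

text = "git#github.com list show"

# Anchored at the start: only the first word of the string is a
# candidate, and "git" is rejected because "#" follows it.
anchored = re.findall(r"^\w+(?![\w#:])", text)

# Whitespace boundary on the left: a word must start the string or
# follow whitespace, so "github" and "com" are no longer matched.
left_boundary = re.findall(r"(?<!\S)\w+(?![\w#:])", text)

print(anchored)
print(left_boundary)
```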
Why not use the whitespace boundaries, (?<!\S)\w+(?!\S)? Since you are building a lexer, you most probably have to deal with natural language sentences where words are likely to be followed by punctuation, and the (?!\S) negative lookahead would make \w+ match only when it is followed by whitespace or is at the end of the string.
You can use negative lookbehind and negative lookahead patterns around a word pattern to make sure that the word is not preceded or followed by a non-space character, or in other words, to make sure that it is surrounded by either a space or a string boundary:
(?<!\S)\w+(?!\S)
Demo: https://regex101.com/r/cjhUUM/2
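The trade-off between the two right-hand conditions shows up as soon as punctuation follows a word. The sentence below is a hypothetical input, and Python's re module is assumed:

```python
import re

# Hypothetical sentence with punctuation after some words.
text = "show list, then stop."

# Whitespace boundaries on both sides: words followed by "," or "."
# are rejected.
whitespace_bounded = re.findall(r"(?<!\S)\w+(?!\S)", text)

# Only "#" and ":" (and word characters) are forbidden on the right.
special_chars_only = re.findall(r"(?<!\S)\w+(?![\w#:])", text)

print(whitespace_bounded)
print(special_chars_only)
```

Which behaviour is preferable depends on whether trailing punctuation should disqualify a word.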

Python regex match certain floating point numbers

I'm trying to match: 0 or more numbers followed by a dot followed by ( (0 or more numbers) but not (if followed by a d,D, or _))
Some examples and what should match/not:
match:
['1.0','1.','0.1','.1','1.2345']
not match:
['1d2','1.2d3','1._dp','1.0_dp','1.123165d0','1.132_dp','1D5','1.2356D6']
Currently I have:
"([0-9]*\.)([0-9]*(?!(d|D|_)))"
This correctly matches everything in the match list. But among the strings it should not match, it incorrectly matches:
['1.2d3','1.0_dp','1.123165d0','1.132_dp','1.2356D6']
and correctly does not match:
['1d2','1._dp','1D5']
So it appears I have a problem with the ([0-9]*(?!(d|D|_))) part, which is trying to not match if there is a d, D or _ after the dot (with zero or more numbers in between). Any suggestions?
Instead of using a negative lookahead, you might use a negated character class to match any character that is not in the character class.
If you only want to match word characters other than d, D and _ (and no whitespace characters), you could use [^\W_Dd\s].
You could also drop \W and \s to match any character except d, D and _, i.e. [^_Dd].
^[0-9]*\.[^\W_Dd\s]*$
Explanation
^ Start of string
[0-9]*\. Match 0+ times a digit 0-9 followed by a dot
[^\W_Dd\s]* Negated character class, match 0+ times a word character without _ D d or whitespace char
$ End of string
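The anchored pattern can be checked against the two lists from the question with a short sketch. The list names are illustrative; the values are copied from the question.

```python
import re

pattern = re.compile(r"^[0-9]*\.[^\W_Dd\s]*$")

should_match = ['1.0', '1.', '0.1', '.1', '1.2345']
should_not_match = ['1d2', '1.2d3', '1._dp', '1.0_dp',
                    '1.123165d0', '1.132_dp', '1D5', '1.2356D6']

# Strings from each list that the pattern accepts.
matched = [s for s in should_match if pattern.search(s)]
wrongly_matched = [s for s in should_not_match if pattern.search(s)]

print(matched)
print(wrongly_matched)
```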
If you don't want to use anchors to assert the start and the end of the string, you could also use lookarounds to assert that what is on the left and on the right is not a non-whitespace character:
(?<!\S)[0-9]*\.[^\W_Dd\s]*(?!\S)
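The lookaround version is useful when the numbers are embedded in a longer line. Here is a sketch on a hypothetical expression that mixes plain floats with d-exponent and _dp literals:

```python
import re

pattern = re.compile(r"(?<!\S)[0-9]*\.[^\W_Dd\s]*(?!\S)")

# Hypothetical line of source text containing several numeric literals.
line = "x = 1.0 + 1.2d3 + .5 + 1.0_dp"

floats = pattern.findall(line)
print(floats)
```

Only whitespace-delimited tokens that consist entirely of the float pattern are returned.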
\d*[.](?!.*[_Dd]).* is what you are looking for.
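This lookahead-based pattern can be tried against the lists from the question as well. The sketch below uses re.fullmatch, which is an assumption about how the pattern is meant to be applied (to one candidate string at a time):

```python
import re

pattern = re.compile(r"\d*[.](?!.*[_Dd]).*")

should_match = ['1.0', '1.', '0.1', '.1', '1.2345']
should_not_match = ['1d2', '1.2d3', '1._dp', '1.0_dp',
                    '1.123165d0', '1.132_dp', '1D5', '1.2356D6']

# The lookahead rejects any d, D or _ anywhere after the dot.
accepted = [s for s in should_match if pattern.fullmatch(s)]
rejected = [s for s in should_not_match if not pattern.fullmatch(s)]

print(accepted)
print(rejected)
```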

Regular expression with a set with a character followed by a character

I'm writing a regular expression in Java for capturing some word without spaces.
The word can contain only letters, numbers, hyphens and dots.
The character set [\w+\-\\.] works well.
Now I want to edit the set to allow a single space after a dot.
How do I have to edit my regular expression?
You can add an alternation that matches this additional requirement
([\w\-.]|(?<=\.) )+
(?<=\.) is a lookbehind assertion. It ensures that space is only matched, if it is preceded by a dot.
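A quick sketch of how the alternation behaves. The question is about Java, but this check uses Python's re module as a stand-in; the pattern uses only constructs that both engines support, so the results should carry over, though that is an assumption worth verifying in Java. The sample strings are hypothetical.

```python
import re

pattern = re.compile(r"([\w\-.]|(?<=\.) )+")

def is_valid(s):
    """True when the whole string consists of allowed characters."""
    return pattern.fullmatch(s) is not None

no_space = is_valid("foo.bar")          # no space at all
space_after_dot = is_valid("foo. bar")  # single space right after a dot
space_elsewhere = is_valid("foo bar")   # space not preceded by a dot
two_spaces = is_valid("foo.  bar")      # second space follows a space

print(no_space, space_after_dot, space_elsewhere, two_spaces)
```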
Other hints:
\w contains the underscore and matches per default only ASCII letters/digits. If you care about Unicode, use either the modifier UNICODE_CHARACTER_CLASS to enable Unicode for \w or use the Unicode properties \p{L} and \p{Nd} to match Unicode letters and digits.
You don't need to escape the dot in a character class.
You have \w+ inside your character class. Are you aware that this just adds the "+" character to the accepted characters?
In case of a dot followed by a space, I suppose this pattern should be neither the first, nor the last in the matched string? You may want to enclose it in word boundaries \b:
([0-9A-Za-z-]|\b\.( \b)?)+
I deliberately did not use \w, to exclude underscores.
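The effect of the word boundaries can be checked with a few hypothetical inputs. As before, Python's re module stands in for Java here, on the assumption that \b behaves the same for these ASCII samples.

```python
import re

pattern = re.compile(r"([0-9A-Za-z-]|\b\.( \b)?)+")

def is_valid(s):
    """True when the whole string matches the pattern."""
    return pattern.fullmatch(s) is not None

dot_space_inside = is_valid("foo. bar")  # dot and space between words
plain_dot = is_valid("foo.bar")          # dot between words, no space
leading_dot = is_valid(".foo")           # no word character before the dot
trailing_space = is_valid("foo. ")       # no word character after the space
underscore = is_valid("foo_bar")         # underscore is not in the class

print(dot_space_inside, plain_dot, leading_dot, trailing_space, underscore)
```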
For allowing ONLY a single space after the dot you can use this regex:
^(?!.*?\. {2})[\w.-]+$
You don't need to escape the dot OR the hyphen inside a character class, as long as the hyphen is the first or last character in the class
(?!.*?\. {2}) is a negative lookahead that disallows 2 or more spaces after a dot