Regex - Finding fullstops (periods) that aren't followed by a space - regex

I'm trying to create a simple Grammar correction tool.
I want to create a regular expression that finds fullstops (" . ") that are not followed by a space so I can replace that with a fullstop and space.
For example: This is a sentence.This is another sentence.
Only the first fullstop in the above example should be matched in the expression.
I've tried /\.[^\s]/g but it returns an additional character after the matched fullstop. I would like to match only the fullstop.
How can I do this?

The negated character class [^\s] in the pattern has to consume a character (any character except a whitespace character), which is why the match includes the additional character.
If you want to match the dot only, you could use a negative lookahead to assert that what is on the right is neither a whitespace char nor the end of the string:
\.(?!\s|$)
To match a dot unless it is followed by a whitespace char other than a newline (so a dot directly before a line break, or at the end of the string, still matches):
\.(?![^\S\r\n])
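As a rough illustration of the lookahead in a replacement, here is a minimal Python sketch. Python is only an assumption here (the question's /…/g syntax suggests JavaScript), and the names MISSING_SPACE and add_space_after_dots are made up for the example.

```python
import re

# Dot that is neither followed by whitespace nor at the end of the string.
# The lookahead consumes nothing, so only the dot itself is replaced.
MISSING_SPACE = re.compile(r"\.(?!\s|$)")


def add_space_after_dots(text: str) -> str:
    """Insert one space after every dot matched by MISSING_SPACE."""
    return MISSING_SPACE.sub(". ", text)


fixed = add_space_after_dots("This is a sentence.This is another sentence.")
print(fixed)
```

Note that the pattern has no notion of sentences: it also fires inside decimals such as 3.14 and inside ellipses, so a real grammar tool would probably need extra conditions.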

You can look for all dots using:
(\.)
This will match all dots in the examples below:
This is a sentence.This is another sentence.
i am looking. for dots. . ...
You can add |$ to look for the end of the line, and with a little tweak you get a regex that matches every dot that is neither followed by whitespace nor at the end of a line:
(\.(?!\ |$))
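Note that the whitespace here is a literal space. In flavours that support POSIX bracket expressions (PCRE, for example, but not JavaScript), a version that covers every whitespace character looks like: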
(\.(?![[:space:]]|$))
If your flavour does not accept it, check the regex reference for the language you use; \s is the common equivalent.
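The difference between a literal space, \s, and the "end of line" reading of $ can be checked with a small Python sketch. Python's re module does not support [[:space:]], so \s stands in for it here; the sample text and the helper dot_positions are assumptions made for the illustration.

```python
import re

# Two lines; the first dot sits directly before a line break.
text = "First line.\nSecond.Third."

literal_space = re.compile(r"\.(?! |$)")                     # $ = end of string
literal_space_multiline = re.compile(r"\.(?! |$)", re.MULTILINE)  # $ = end of each line
any_whitespace = re.compile(r"\.(?!\s|$)")                   # \s also covers "\n"


def dot_positions(pattern, s):
    """Return the start offsets of every match of pattern in s."""
    return [m.start() for m in pattern.finditer(s)]


for name, pattern in [
    ("literal space", literal_space),
    ("literal space + MULTILINE", literal_space_multiline),
    ("\\s", any_whitespace),
]:
    print(name, dot_positions(pattern, text))
```

Without re.MULTILINE the literal-space version also reports the dot before the line break, because a newline is not a space and $ only matches at the very end.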

Related

Match a part of a string using regex

I have a string and would like to match a part of it.
The string is Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace after PAI: can be a word, so I would like to use .* there, but if I do, it matches most of the string.
This pattern shows what I match if I use a literal whitespace instead of .*:
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
This pattern shows what I'm trying to achieve with .*, but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds but failed.
Any idea?
thanks
Instead of matching a single space, you can use a character class that matches either a space or a word character and repeat it 1 or more times, so that at least one occurrence is required.
Note that you don't have to escape the spaces, and in both places you can use an optional character class matching either a space or a hyphen: [ -]?
If you only want the match, you can omit the 2 capturing groups.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#
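A quick way to check the character-class version is a Python sketch like the following. The language, the constant names HEADERS and PAI_NUMBER, and the display name Alice in the second call are assumptions for the example; the header string is the one from the question.

```python
import re

HEADERS = (
    "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
    "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3."
    "1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>"
)

# Group 1: "PAI:" plus spaces/word characters up to "<sip:".
# Group 2: the ten-digit number, with optional separators.
PAI_NUMBER = re.compile(
    r"(PAI:[ \w]+<sip:)"
    r"((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#"
)

match = PAI_NUMBER.search(HEADERS)
print(match.group(0))

# Hypothetical header where a word sits between "PAI:" and "<sip:".
with_name = PAI_NUMBER.search("PAI: Alice <sip:4168755400#1.1.1.238>")
print(with_name.group(1))
```

Because [ \w] cannot match <, the class stops at the first <sip: and never runs on into the From: or To: headers.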
The regex should be like
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? finds the literal PAI: followed by anything (.*); the ? makes the quantifier lazy, so it matches as few characters as possible before the next part of the pattern can match.
(<sip:.*?#) is the capturing group that holds the result we want.
<sip:.*?# finds <sip: followed by as few characters as possible up to the next #.
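To see what the lazy quantifiers change, the sketch below runs the lazy pattern and a greedy variant against the header string from the question. Python and the variable names are assumptions made for the illustration; the greedy variant is not from the answer and is there only for contrast.

```python
import re

HEADERS = (
    "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
    "From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3."
    "1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>"
)

lazy = re.search(r"PAI:.*?(<sip:.*?#)", HEADERS)
greedy = re.search(r"PAI:.*(<sip:.*#)", HEADERS)  # contrast only

# Lazy: stops at the first "<sip:" and the first "#" after it.
print(lazy.group(0))
# Greedy: runs on to the last "<sip:" (the To: header) before stopping.
print(greedy.group(0))
```

Both versions capture the same text in group 1 for this input, but the overall greedy match swallows the From: header as well.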

Regex match last occurrence of substring among the same substrings in the string

For example we have a string:
asd/asd/asd/asd/1#s_
I need to match this part: /asd/1#s_ or asd/1#s_
How is it possible to do with plain regex?
I've tried a negative lookahead like this, but it didn't work:
\/(?:.(?!\/))?(asd)(\/(([\W\d\w]){1,})|)$
it matches this '/asd/asd/asd/asd/asd/asd/1#s_'
from this 'prefix/asd/asd/asd/asd/asd/asd/1#s_'
and I need to match '/asd/1#s_' without all preceding /asd/'s
Match should work with plain regex
Without any helper functions of any programming language
https://regexr.com/
I use this site to check if regex matches or not
here's the possible strings:
prefix/asd/asd/asd/1#s
prefix/asd/asd/asd/1s#
prefix/asd/asd/asd/s1#
prefix/asd/asd/asd/s#1
prefix/asd/asd/asd/#1s
prefix/asd/asd/asd/#s1
and asd part could be replaced with any word like
prefix/a1sd/a1sd/a1sd/1#s
prefix/a1sd/a1sd/a1sd/1s#
...
So I need to match last repeating part with everything to the right
And everything to the right could be character, not character, digit, in any order
A more complicated string example:
prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd
this should match that part:
/a1sd/22$$#!/123/321/asd
If you want the match only, you can use \K to reset the match buffer right before the parts that you want to match:
^.*\K/a\d?sd/\S+
The pattern will match
^ Start of string
.* Match any char except a newline until the end of the line (the engine then backtracks to the last position where the rest of the pattern matches)
\K Forget what is matched until now
/a\d?sd/ match a, an optional digit and sd between forward slashes
\S+ Match 1+ non whitespace chars
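Not every engine has \K; Python's built-in re module, for one, does not. A capturing group after the greedy .* gives the same text, which is the substitute technique used in this sketch. The names LAST_SEGMENT and last_repeated_part are invented for the example.

```python
import re

# Greedy ".*" runs to the end and backtracks to the LAST place where
# "/a", an optional digit, "sd/" and at least one non-space char follow.
# Group 1 plays the role of the text after \K.
LAST_SEGMENT = re.compile(r"^.*(/a\d?sd/\S+)")


def last_repeated_part(path):
    """Return the last /asd/... (or /a1sd/...) part, or None if absent."""
    m = LAST_SEGMENT.search(path)
    return m.group(1) if m else None


print(last_repeated_part("asd/asd/asd/asd/1#s_"))
print(last_repeated_part(
    "prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd"
))
```

Like the pattern in the answer, this hard-codes the word as a, optional digit, sd; it does not detect an arbitrary repeated word.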

How to create proper regular expression to find last character which I want to?

I need to create regex to find last underscore in string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore within the matched element, but I cannot get it to work.
I hope you can help me :)
Just use a negated character class [^_], which matches any character except an underscore (this ensures that no other underscore follows), and then the end of string $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so what you want back is the submatch rather than the whole match. Group 1 (your underscore) is the part you would replace.
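One way to act on group 1 only is to use its offsets, as in this Python sketch. The language, the helper name replace_last_underscore and the replacement character "-" are assumptions for the illustration.

```python
import re

# Group 1 is the underscore; [^_]*$ guarantees no underscore follows it.
LAST_UNDERSCORE = re.compile(r"(_)[^_]*$")


def replace_last_underscore(s, replacement):
    """Replace only the text of group 1, leaving the tail untouched."""
    m = LAST_UNDERSCORE.search(s)
    if m is None:
        return s  # no underscore at all
    return s[:m.start(1)] + replacement + s[m.end(1):]


print(replace_last_underscore("012344_2.0224.71_3", "-"))
print(replace_last_underscore("012354_5.00123.AR_3.335_8", "-"))
```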
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
Explanation:
.*\K_: Matches any characters up to an underscore. Since the * quantifier is greedy, it runs to the last underscore. \K then discards what was matched so far, and only the underscore remains in the match.
_(?=[^_]*$): Matches an underscore that is followed only by non-underscore characters up to the end of the line
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
The pattern [^.]+$ matches any character except a dot 1+ times and then asserts the end of the string. That will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_) if that is supported which asserts what is at the right does not match any character followed by an underscore
_(?!.*_)
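The two lookahead variants can be compared side by side with a short Python sketch. Python's built-in re has no \K, so the .*\K_ form is left out; the dictionary and variable names are invented for the example.

```python
import re

candidates = {
    "positive lookahead": re.compile(r"_(?=[^_]*$)"),
    "negative lookahead": re.compile(r"_(?!.*_)"),
}
samples = ["012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"]

# Offset of the single underscore each pattern matches in each sample.
positions = {
    name: [pattern.search(s).start() for s in samples]
    for name, pattern in candidates.items()
}
print(positions)

# Only the underscore is matched, so a plain substitution is enough.
replaced = [candidates["positive lookahead"].sub("-", s) for s in samples]
print(replaced)
```

On single-line input the two agree. On multi-line input they can differ, since . does not cross a newline by default while [^_] does.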

Regex capturing any text between

I'm trying to capture text (any text) that falls between some kind of delimiter with word boundaries on each end, like so:
This is not the text. ##This is the text I want to capture.## This is also not the text. ##But I would like to capture this, too##.
I thought this would be easy with regex like this
\b([#]{2})(.*)(\1)\b
This doesn't produce a match and I can't figure why.
Note, I would also like to avoid capturing the text between the first '##' and the last '##', capturing both sections with all the text in between.
In other words I don't want one of the matches to be:
##This is the text I want to capture.## This is also not the text. ##But I would like to capture this, too##
georg and Ulugbek Umirov posted the answer to this question as comments. I repeat the expression here with an explanation, mainly so that the question has an answer and drops off the list of unanswered questions.
##\b(.+?)## searches for a string
starting and ending with ## and
with a word character at beginning and
having 1 or more characters between.
Because of the parentheses the string found between ## is marked for backreference.
The question mark ? after the + multiplier changes the matching behavior from greedy to non greedy. The greedy expression .+ matches everything from first ## to last ## whereas the non greedy expression .+? matches just everything from first ## to next ##.
\b means word boundary and therefore the first character after ## must be a word character (letter, digit or underscore).
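The behaviour described above can be reproduced with a small Python sketch (the language and variable names are assumptions). It also runs the pattern from the question, which finds nothing because \b before ## needs a word character directly in front of the first #, and the text has a space or a full stop there.

```python
import re

text = (
    "This is not the text. ##This is the text I want to capture.## "
    "This is also not the text. ##But I would like to capture this, too##."
)

lazy = re.findall(r"##\b(.+?)##", text)     # one result per ##...## pair
greedy = re.findall(r"##\b(.+)##", text)    # first ## to last ##
original = re.search(r"\b([#]{2})(.*)(\1)\b", text)  # pattern from the question

print(lazy)
print(greedy)
print(original)
```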
The matching behavior of . depends on a flag. The dot can match any character including line terminating characters, or any character except line terminating characters. Line terminating characters are carriage return (= \r = CR) and line feed (= newline = \n = LF).
If matching everything between two delimiter strings should be independent of the dot's matching behavior, it is better to use the regular expression ##\b([\w\W]+?)##, as Ulugbek Umirov suggested: \w matches any word character and \W matches any non-word character, so the two together in a character class always match any character, including CR and LF.
It would also be possible to use ##\b([\s\S]+?)##, where \s matches any whitespace character and \S any non-whitespace character; together in a character class they likewise match any character, including CR and LF.
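A short Python sketch makes the dependence on the dot flag visible. In Python the flag is re.DOTALL; the two-line sample text is made up for the illustration.

```python
import re

# The first delimited section spans a line break.
text = "##first\nline## and ##second##"

dot_default = re.findall(r"##\b(.+?)##", text)             # . stops at "\n"
dot_all = re.findall(r"##\b(.+?)##", text, re.DOTALL)      # . matches "\n" too
class_based = re.findall(r"##\b([\s\S]+?)##", text)        # flag-independent

print(dot_default)
print(dot_all)
print(class_based)
```

With the default dot the multi-line section is missed entirely, while the [\s\S] version gives the same result whether or not the flag is set.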
Further, it would be possible to use ##(\w[\s\S]*?)## or ##\w([\w\W]*?)## or ##(\w.*?)##, all of which match the same text as the expressions above if the dot matches any character including CR and LF. Note that in ##\w([\w\W]*?)## the first word character sits outside the group, so the captured text lacks it.
Last, if the regular expression engine supports lookbehind and lookahead, it is also possible to match only the string between the ## delimiters, without the delimiters themselves, by using for example the regular expression (?<=##)\b[\w\W]+?(?=##), which makes a capturing group unnecessary. (?<=##) is a positive lookbehind and (?=##) is a positive lookahead, both for the string ##.
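The lookaround form can be tried in Python as sketched below (language and names are assumptions). The second input, ##a##b##, is a made-up edge case: since lookarounds do not consume the delimiters, a closing ## can also serve as the opening ## of the next match.

```python
import re

text = (
    "This is not the text. ##This is the text I want to capture.## "
    "This is also not the text. ##But I would like to capture this, too##."
)

LOOKAROUND = re.compile(r"(?<=##)\b[\w\W]+?(?=##)")
CONSUMING = re.compile(r"##\b(.+?)##")

# No group needed: the whole match is already the text between delimiters.
matches = LOOKAROUND.findall(text)
print(matches)

# Edge case where the two approaches disagree.
edge = "##a##b##"
print(LOOKAROUND.findall(edge), CONSUMING.findall(edge))
```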

Regex search for characters like "/", "<" and ">"

What should be the regex pattern if my texts contain the characters like "\ / > <" etc and I want to find them. That's because regex treats "/" like it's part of the search pattern and not an individual character.
For example, I want to find Super Kings from the string <span>Super Kings</span>, using VB 2010.
Thanks!
Just try this:
\bYour_Keyword_to_find\b
\b is used in RegEx for matching word boundary.
[EDIT]
You might be looking for this:
(?<=<span>)([^<>]+?)(?=</span>)
Explanation:
<!--
(?<=<span>)([^<>]+?)(?=</span>)
Options: case insensitive; ^ and $ match at line breaks
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=<span>)»
Match the characters “<span>” literally «<span>»
Match the regular expression below and capture its match into backreference number 1 «([^<>]+?)»
Match a single character NOT present in the list “<>” «[^<>]+?»
Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=</span>)»
Match the characters “</span>” literally «</span>»
-->
[/EDIT]
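The question is about VB 2010, but the same pattern can be checked quickly in Python, which is what this sketch assumes. re.IGNORECASE mirrors the "case insensitive" option listed in the explanation; the name SPAN_TEXT is invented.

```python
import re

html = "<span>Super Kings</span>"

# Lookbehind and lookahead assert the tags without including them, so the
# whole match (and group 1) is just the text between the tags.
SPAN_TEXT = re.compile(r"(?<=<span>)([^<>]+?)(?=</span>)", re.IGNORECASE)

m = SPAN_TEXT.search(html)
print(m.group(0))
```

Neither < nor > nor / needed escaping here, since none of them is a metacharacter outside special constructs such as (?<=...).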
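You only need to escape / (as \/) when / is also the pattern delimiter, as in JavaScript regex literals; in flavours where the pattern is a plain string, such as .NET, the escape is optional and harmless.
For instance, try <span>(.*)<\/span>, <span>([^<]*)<\/span> or <span>(.*?)<\/span>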
Read more from:
http://www.regular-expressions.info/characters.html
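The three suggestions differ once there is more than one span, which a Python sketch can show (Python is an assumption, and the second span with the text Other is made up). It also uses re.escape, Python's helper for searching a string literally, on the characters from the question.

```python
import re

html = "<span>Super Kings</span><span>Other</span>"  # second span is hypothetical

escaped = re.findall(r"<span>(.*?)<\/span>", html)
unescaped = re.findall(r"<span>(.*?)</span>", html)   # same pattern without "\/"
greedy = re.findall(r"<span>(.*)<\/span>", html)      # runs to the LAST </span>
negated = re.findall(r"<span>([^<]*)<\/span>", html)

print(escaped, unescaped, greedy, negated)

# Searching for the literal characters \ / > < without hand-escaping them.
needle = r"\ / > <"
found = re.search(re.escape(needle), r"a \ / > < b")
print(found is not None)
```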