Match character and any whitespaces around it [duplicate] - regex

This question already has answers here:
Splitting a String by number of delimiters
(2 answers)
Closed 2 years ago.
I have a file containing information in the following format:
Fred,Frank , Marcel Godwin , Marion,Ryan
I need to match commas and any whitespace around them, but not any comma inside brackets.
My problem is that with my current regex [\s,]+ the whitespaces between words are matched. So in this example the whitespace between Marcel and Godwin.
I thought about using something like \s,\s* but it wouldn't match parts when there is no whitespace around the comma, like between Fred and Frank
Surely, it's a simple fix but I can't figure it out.

I think this will match the commas including the whitespace before and afterwards like you explained in your question.
\s*(?=\,)\,(?<=\,)\s*
This is a positive lookahead: (?=\,). It asserts that a comma follows the leading whitespace.
This is a positive lookbehind: (?<=\,). It asserts that a comma comes right before the trailing whitespace.
Try it out yourself. You can use this page to check the output in your browser.
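Since both lookarounds only re-check the comma that the pattern consumes anyway, the expression should behave the same as the shorter \s*,\s*. A minimal Python sketch to check that on the sample line from the question (the variable names are illustrative, not from the original posts):

```python
import re

line = "Fred,Frank , Marcel Godwin , Marion,Ryan"

# Pattern from the answer, and the shorter form it should be equivalent to.
with_lookarounds = re.compile(r"\s*(?=\,)\,(?<=\,)\s*")
simple = re.compile(r"\s*,\s*")

# Splitting on the delimiter keeps the space inside "Marcel Godwin".
names = simple.split(line)
print(names)
print(with_lookarounds.split(line) == names)
```

Whitespace that is not next to a comma is never part of a match, so the space between Marcel and Godwin is left alone.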

Related

Regex if character matches then, else [duplicate]

This question already has answers here:
In regex, match either the end of the string or a specific character
(2 answers)
Closed 7 months ago.
I have two regular expressions that work fine to extract text between characters:
(?<=\$)(.*)(?=\*)
(?<=\$)(.*)(?=)
For my example text $66* the first expression extracts 66. When the asterisk is not present in the text (i.e. $66), the second expression extracts 66.
How can I combine the two to use the first one if an asterisk is present and the second one if no asterisk is present?
I tried with what I thought would be an if|then|else like below but am doing something wrong: (?(?=\*)(?<=\$)(.*)(?=\*)|(?<=\$)(.*)(?=))
You can use a negated character set to exclude asterisks in your match instead:
(?<=\$)[^*]+
Demo: https://regex101.com/r/vuGBiJ/2
As you are already using a capture group, you could also match the $ and capture 1+ characters except the asterisk.
\$([^*]+)
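As a quick check, here is a small Python sketch that runs both suggested patterns against the two inputs from the question. Python is only an assumption here, since the question does not name a regex flavor.

```python
import re

capture = re.compile(r"\$([^*]+)")        # match the $ and capture what follows
lookbehind = re.compile(r"(?<=\$)[^*]+")  # assert the $ without consuming it

for text in ("$66*", "$66"):
    print(text, "->", capture.search(text).group(1), lookbehind.search(text).group(0))
```

Both patterns stop at the first asterisk if there is one and otherwise run to the end of the text, so no conditional is needed.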

Notepad++: How to remove all string except containing period [duplicate]

This question already has answers here:
How to match only strings that do not contain a dot (using regular expressions)
(3 answers)
Closed 3 years ago.
I have numerous SELECT statements conjoined by the UNION keyword in a single file. What I want to do is extract only the db.table strings. How can I delete all words not containing a period (.) using regex in the Notepad++ editor? Database and table are the only ones with a period.
It's okay with me even if new lines are not removed. Though, as a learning bonus for everyone seeing this post, you can also show the regex that trims the new lines, that will show this output:
db.table1
db.table2
...
db.tablen
You may try the following find and replace, in regex mode:
Find: (?<=^|\s)[^.]+(?=$|\s)
Replace: <empty string>
Note that my replacement only removes the undesired terms in the query; it does not make an effort to remove stray or leftover whitespace. To do that, you can easily do a quick second replacement to remove whitespace you don't want.
Edit:
It appears that Notepad++ doesn't like the variable width lookbehinds I used in the pattern. Here is a refactored, and more verbose version, which uses strictly fixed width lookbehinds:
(^[^.]+$)|(^[^.]+(?=\s))|((?<=\s)[^.]+$)|((?<=\s)[^.]+(?=\s))
The logic in both of the above patterns is to match a word consisting entirely of non dot characters, which are surrounded on either side by one or more of the following:
start of the string (^)
end of the string ($)
any type of whitespace (\s)
My guess is that maybe this expression:
([\s\S]*?)(\S*(\.)\S*)
being replaced with $2\n or:
(\S*(\.)\S*)|(.+?)
with $1 might work.
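Outside Notepad++, the same idea (keep only the tokens that contain a period) can be sketched in Python by collecting matches instead of deleting everything else. The query text below is hypothetical, and the pattern is the \S*\.\S* core of the expressions above with the capture groups dropped.

```python
import re

# Hypothetical query text; the column and table names are made up for the example.
sql = """SELECT a FROM db.table1
UNION
SELECT b FROM db.table2
UNION
SELECT c FROM db.tablen"""

# A run of non-whitespace characters that contains at least one period.
tables = re.findall(r"\S*\.\S*", sql)
print("\n".join(tables))
```

This relies on the question's assumption that database.table references are the only tokens with a period; a qualified column name would be picked up as well.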

Regex validation: Allow new lines but not spaces [duplicate]

This question already has answers here:
How to matches anything except space and new line?
(3 answers)
Closed 3 years ago.
I know there are loads of questions on this site around regex and validating against white spaces, believe me I've spent the past few hours looking. I've been unable to create a regex validation that matches the following requirements:
Fail validation if there are any spaces in this text (including at the start and end of the text)
Allow validation if the text has new lines in it
I very quickly found \s was not a good option as this fails on my second point. The closest I've managed to get is [A-Z]*[a-z]*[\" *"$] which flags exactly the reverse (spaces pass but everything else fails).
I've tried reversing it somehow but not having much success.
Anchor to the beginning of the string, repeatedly match anything but a space with [^ ]*, and anchor to the end of the string:
^[^ ]*$
This matches 0 or more non-space characters, but will permit newlines. (If you want 1 or more non-space characters, use + instead of *)
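A minimal Python sketch of that validation follows; is_valid is a hypothetical helper name, and the question does not say which language the validation runs in.

```python
import re

no_spaces = re.compile(r"^[^ ]*$")

def is_valid(text: str) -> bool:
    """Return True when the text contains no space character."""
    # [^ ] also matches newlines, so multi-line text passes.
    return no_spaces.search(text) is not None

for sample in ("line1\nline2", "two words", " leading", "trailing "):
    print(repr(sample), is_valid(sample))
```

Only the literal space character is rejected; tabs and other whitespace would still pass with this pattern.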

Does not match when the string does not have a dot but it will match multiple dots [duplicate]

This question already has answers here:
Regex to allow alphanumeric and dot
(3 answers)
Closed 4 years ago.
I am trying to match the string when there are 0 or multiple dots. The regex I have only matches strings that contain dots, not strings with 0 dots.
(\w*)((\w*\.)+\w*)
These are the test strings I am using
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
abc
The Regex will match these
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
But not this one:
abc
https://regexr.com/?38ed7
If you really must use a regex, here is one (but it is inefficient):
/^(?![^.]*\.[^.]*$).*$/
It says:
Match a string so that the beginning of the string is not followed by a whole string with a single dot.
It does some backtracking when parsing the negative lookahead.
As mentioned in the comments to the question, I do think, unless you must have a regex, that a simple function might be better. But if you like the conciseness of a regex and performance is not a huge concern, you can go with the one I gave above. Regexes with "nots" in them are generally a tad messy, but once you understand lookarounds they do become doable. Cheers.
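To see what the negative lookahead accepts and rejects, here is a short Python sketch run against a few of the test strings from the question (the surrounding slashes are delimiters and are dropped in Python):

```python
import re

# Reject strings that contain exactly one dot; accept everything else.
not_exactly_one_dot = re.compile(r"^(?![^.]*\.[^.]*$).*$")

samples = ["abc", "dial.check.Catch.Url", "12fd.dafd", "."]
for s in samples:
    print(repr(s), bool(not_exactly_one_dot.match(s)))
```

Strings with no dot or with several dots match, while the single-dot strings 12fd.dafd and . do not.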
/\..*\.|^[^.]*$/
Or, in plain English:
Match EITHER a dot, then any number of characters, then another dot; OR the beginning of the string, then any number of non-dots, then the end of the string.
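A small Python sketch of this alternation is shown below. It uses search rather than match because the first alternative is not anchored to the start of the string.

```python
import re

# Two or more dots anywhere, OR a whole string without any dot.
zero_or_many_dots = re.compile(r"\..*\.|^[^.]*$")

for s in ["abc", "dial.check.Catch.Url", "12fd.dafd", "."]:
    print(repr(s), bool(zero_or_many_dots.search(s)))
```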

Regex for string containing one string, but not another [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
Have regex in our project that matches any url that contains the string
"/pdf/":
(.+)/pdf/.+
Need to modify it so that it won't match urls that also contain "help"
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
If lookarounds are supported, this is very easy to achieve:
(?=.*/pdf/)(?!.*help)(.+)
See a demo on regex101.com.
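As a minimal sketch in Python (an assumption, since the project's language is not stated), the two lookaheads can be checked against the example URLs; is_wanted is a hypothetical helper name.

```python
import re

pdf_without_help = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")

def is_wanted(url: str) -> bool:
    # match() only tries position 0, so both lookaheads scan the whole URL.
    return pdf_without_help.match(url) is not None

print(is_wanted("/dealer/us/en/pdf/simple.pdf"))
print(is_wanted("/dealer/help/us/en/pdf/simple.pdf"))
```

The pattern has no ^ anchor, so an unanchored search could start after the "h" of "help" and still succeed; matching from the start of the string (or adding ^) avoids that.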
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First thing is match either a space or the start of a line
(?:^|\s)
Then we match anything that is not a space or h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/; then we match non-space characters (\S) any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to reject help after the /pdf/, we can repeat the same sub-pattern after it.
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match a whitespace character or the end of the line/string ($)
(?:$|\s)
The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
Example on regex101
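The difference between the full match and capture group 1 can be seen in a short Python sketch; the padded input string is made up for illustration.

```python
import re

pattern = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)")

# Hypothetical input with a leading and a trailing space around the URL.
text = " /dealer/us/en/pdf/simple.pdf "
m = pattern.search(text)
print(repr(m.group(0)))  # full match, including the surrounding spaces
print(repr(m.group(1)))  # capture group 1, the URL only

# A URL with "help" before /pdf/ is not matched at all.
print(pattern.search("/dealer/help/us/en/pdf/simple.pdf"))
```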