Having at least one non-whitespace - regex

str must be true if it has at least one non-whitespace character enclosed in the parentheses:
str = (a)
str = ( as bs)
str = (as e)
and false if it has no non-whitespace at all:
str = ( )
I'm not sure if I can do that with +, but this condition also passes with zero non-whitespace characters. Please correct it.
/^\([\S+\s*]+\)$/.test(str)

You can use this:
/^\(.*\S.*\)$/.test(str)
This matches any characters, then a non-whitespace character (which guarantees at least one non-whitespace character), and then any characters up to the closing parenthesis at the end.
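A quick check of that pattern against the strings from the question, as a minimal sketch (the helper name hasNonWhitespace is made up for illustration):

```javascript
// Regex taken from the answer above; the helper name is hypothetical.
const hasNonWhitespace = (str) => /^\(.*\S.*\)$/.test(str);

// Print the result for each sample string from the question.
for (const s of ['(a)', '( as bs)', '(as e)', '( )']) {
  console.log(JSON.stringify(s), hasNonWhitespace(s));
}
```

The first three samples should report true and the last one false.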

You can use the following:
^\((?!\s*\)).+\)$
This matches an opening parenthesis ( and then fails if it is followed only by whitespace and a ); otherwise it accepts the rest of the line up to a final ).
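The negative lookahead can be tried out the same way. This is only a sketch; the helper name isFilled is an assumption, not part of the answer:

```javascript
// Lookahead variant from the answer above; the helper name is hypothetical.
const isFilled = (str) => /^\((?!\s*\)).+\)$/.test(str);

// '(\t \t)' contains only tabs and a space between the parentheses.
for (const s of ['(a)', '( as bs)', '( )', '()', '(\t \t)']) {
  console.log(JSON.stringify(s), isFilled(s));
}
```

Only the first two samples contain a non-whitespace character, so only they should pass.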

Assuming str must satisfy TRUE and FALSE and nesting is implicitly not allowed
^(?:[^()]*\([^\S()]*[^\s()][^()]*\))+[^()]*$
expanded
^
(?:
[^()]*
\(
[^\S()]*
[^\s()]
[^()]*
\)
)+
[^()]*
$

Related

Regex: Match every occurrence of dash in between two specific symbols

I have this string: ](-test-word-another)
I'm trying to find every single occurrence of - in between ] and )
Basically, every - in ](-test-word-another) should be matched.
Currently I have (?<=\]\()(-)(?=\)), but that only matches when there is a single - between them.
Thank you in advance
Try this: /(?<=\].*)-(?=.*\))/gm
Test here: https://regex101.com/r/xkmTZs/3
This basically matches all - only if they occur after a ] and before ).
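A small sketch of how that pattern behaves in JavaScript, assuming an engine with variable-length lookbehind (ES2018 or later). The names dashBetween and underscoreDashes are made up for the example:

```javascript
// Pattern from the answer above; needs lookbehind support (ES2018+).
const dashBetween = /(?<=\].*)-(?=.*\))/gm;

// Hypothetical helper: replace every matched dash with an underscore.
const underscoreDashes = (text) => text.replace(dashBetween, '_');

console.log(underscoreDashes('](-test-word-another)'));

// Caveat: ] and ) only need to appear somewhere on the same line,
// so this dash is matched as well even though it is not inside ](...).
console.log(underscoreDashes('] a-b (c)'));
```

The second call shows why this pattern is looser than "inside ](...)": it only checks that a ] occurs earlier and a ) occurs later on the line.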
You can use
text.replace(/]\([^()]*\)/g, (x) => x.replace(/-/g, '_'))
See the demo below:
const text = 'Word-word ](-test-word-another) text-text-.';
console.log( text.replace(/]\([^()]*\)/g, (x) => x.replace(/-/g, '_')) );
The ]\([^()]*\) regex matches a ]( substring, then any zero or more chars other than ( and ) and then a ), and then all - chars inside the match value (x here) are replaced with _ using (x) => x.replace(/-/g, '_').
Another, single regex solution can look like
(?<=]\((?:(?!]\()[^)])*)-(?=[^()]*\))
See this regex demo. It matches any - that is immediately preceded with ]( and any zero or more chars other than ) that does not start a ]( char sequence, and is immediately followed with any zero or more chars other than ( and ) and then a ) char.
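The single-regex variant can be exercised on the same sample text. A minimal sketch, again assuming lookbehind support in the JavaScript engine; dashInside and replaceInside are illustrative names only:

```javascript
// Single-regex variant from the answer above (needs ES2018+ lookbehind).
const dashInside = /(?<=]\((?:(?!]\()[^)])*)-(?=[^()]*\))/g;

// Hypothetical helper: replace only the dashes found inside ](...).
const replaceInside = (text) => text.replace(dashInside, '_');

const text = 'Word-word ](-test-word-another) text-text-.';
console.log(replaceInside(text));

// Dashes that are not inside a ](...) group are left alone.
console.log(replaceInside('] a-b (c)'));
```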

Regex: Deal \r\n as normal word

I'm doing a small project which can calculate the count of functions in C++ files(.cpp).
I used the following Regex as "function pattern":
/[a-z|A-Z]+\s*::\s*~?[a-z|A-Z]+\(.*\)/gm
It works for most cases, but fails when there are new line breaks in ().
void CXYZRScanPanel::OnPrepareScanning()
{
//This one is ok.
}
void CXYZRScanPanel::OnPrepareScanning(int k)
{
//This one is ok.
}
void CXYZRScanPanel::OnPrepareScanning(int k,
int j)
{
//This one fails.
}
I'm wondering whether there is anything "stronger" than .* that can also match across \r\n.
Thanks for any help.
If there is no such thing, I will probably remove all \r\n within () before doing the search.
You could write the pattern using a negated character class starting with [^ matching any char except ( and ) which will also match a newline.
Note that you can omit the | in the character class.
[a-zA-Z]+\s*::\s*~?[a-zA-Z]+(\([^()]*\))
The pattern matches:
[a-zA-Z]+ Match 1+ times chars a-zA-Z
\s*::\s* Match :: between optional whitespace chars
~? Match an optional ~ char
[a-zA-Z]+ Match 1+ times chars a-zA-Z
( Capture group 1
\([^()]*\) Match (, then zero or more chars other than ( and ) (newlines included), then )
) Close group 1
See a regex demo
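To see the difference between the two patterns, here is a rough JavaScript sketch that counts matches in a condensed version of the sample from the question. The variable and helper names are made up for illustration:

```javascript
// Condensed version of the C++ sample, joined with Windows line endings.
const source = [
  'void CXYZRScanPanel::OnPrepareScanning()',
  '{',
  '}',
  'void CXYZRScanPanel::OnPrepareScanning(int k)',
  '{',
  '}',
  'void CXYZRScanPanel::OnPrepareScanning(int k,',
  '    int j)',
  '{',
  '}',
].join('\r\n');

// Pattern from the question: .* stops at line breaks.
const original = /[a-z|A-Z]+\s*::\s*~?[a-z|A-Z]+\(.*\)/gm;
// Pattern from the answer: [^()]* also matches \r and \n.
const negated = /[a-zA-Z]+\s*::\s*~?[a-zA-Z]+(\([^()]*\))/g;

// Hypothetical helper: count all matches of a global regex.
const countMatches = (re, text) => [...text.matchAll(re)].length;

console.log(countMatches(original, source)); // misses the multi-line signature
console.log(countMatches(negated, source));  // finds all three
```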

Split string on commas ignoring commas, brackets, braces in parenthesis, quotes

I am attempting to split a comma separated list. I want to ignore commas that are in parenthesis, brackets, braces and quotes using regex. To be more precise I am trying to do this in postgres POSIX regexp_split_to_array.
My knowledge of regex is not great and by searching on stack overflow I was able to get a partial solution, I can split the string if it does not contain nested parenthesis, brackets, braces. Here is the regex:
,(?![^()]*+\))(?![^{}]*+})(?![^\[\]]*+\])(?=(?:[^"]|"[^"]*")*$)
Test case:
0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3], "text, text (test)", {a1:1, a2:3, a3:{a1=1, s2=2}, a4:"asasad, sadsas, asasdasd"}
Here is the demo
The problem is that in e.g. (1,2,(1,2)) the first 2 commas get matched if there is a nested parenthesis.
Even though regex is not the best way to go, here is a solution with recursive matching:
(?>(?>\([^()]*(?R)?[^()]*\))|(?>\[[^[\]]*(?R)?[^[\]]*\])|(?>{[^{}]*(?R)?[^{}]*})|(?>"[^"]*")|(?>[^(){}[\]", ]+))(?>[ ]*(?R))*
If we break it down, there is a group with some stuff inside, followed by more of the same kind of matching, separated by optional spaces.
(?> <---- start matching
... <---- some stuff inside
) <---- end matching
(?>
[ ]* <---- optional spaces
(?R) <---- match the entire thing again
)* <---- can be repeated
From your example 0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3],..., we want to match:
0
(1,2)
(1,2,(1,2)) [1,2,3,[1,2]]
[1,2,3]
...
For the third match, the stuff inside will match (1,2,(1,2)) and [1,2,3,[1,2]], which are separated by a space.
The stuff inside is a series of options:
(?>
(?>...)| <---- will match balanced ()
(?>...)| <---- will match balanced []
(?>...)| <---- will match balanced {}
(?>...)| <---- will match "..."
(?>...) <---- will match anything else without space or comma
)
Here are the options:
\( <---- literal (
[^()]* <---- any number of chars except ( or )
(?R)? <---- match the entire thing optionally
[^()]* <---- any number of chars except ( or )
\) <---- literal )
\[ <---- literal [
[^[\]]* <---- any number of chars except [ or ]
(?R)? <---- match the entire thing optionally
[^[\]]* <---- any number of chars except [ or ]
\] <---- literal ]
{ <---- literal {
[^{}]* <---- any number of chars except { or }
(?R)? <---- match the entire thing optionally
[^{}]* <---- any number of chars except { or }
} <---- literal }
" <---- literal "
[^"]* <---- any number of chars except "
" <---- literal "
[^(){}[\]", ]+ <---- one or more chars except comma, or space, or these: (){}[]"
Note that this does not match a comma-separated list, but the items in such a list. The exclusion of comma and space in the last option above causes it to stop matching at comma or space (except for space we explicitly allowed between repeated matches).
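Since regex is not the best tool here, the same split can also be done with a small scanner that tracks nesting depth. The following is a sketch of that alternative technique in JavaScript, not the recursive regex above and not something that runs inside Postgres; the function name splitTopLevel is made up, and it assumes quotes are plain "..." without escapes:

```javascript
// Sketch: split on commas that are outside (), [], {} and "...".
function splitTopLevel(input) {
  const closers = new Map([['(', ')'], ['[', ']'], ['{', '}']]);
  const stack = [];   // closing brackets we still expect, innermost last
  const items = [];
  let current = '';
  let inQuotes = false;

  for (const ch of input) {
    if (inQuotes) {
      // Inside quotes everything is literal until the closing quote.
      if (ch === '"') inQuotes = false;
      current += ch;
    } else if (ch === '"') {
      inQuotes = true;
      current += ch;
    } else if (closers.has(ch)) {
      stack.push(closers.get(ch));
      current += ch;
    } else if (stack.length > 0 && ch === stack[stack.length - 1]) {
      stack.pop();
      current += ch;
    } else if (ch === ',' && stack.length === 0) {
      // A top-level comma ends the current item.
      items.push(current.trim());
      current = '';
    } else {
      current += ch;
    }
  }
  items.push(current.trim());
  return items;
}

const testCase = '0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3], "text, text (test)", {a1:1, a2:3, a3:{a1=1, s2=2}, a4:"asasad, sadsas, asasdasd"}';
console.log(splitTopLevel(testCase));
```

On the test case from the question this yields six items, with (1,2,(1,2)) [1,2,3,[1,2]] kept together as the third one.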

How can I replace the last word using Regex?

I have a String extension:
func replaceLastWordWithUsername(_ username: String) -> String {
let pattern = "#*[A-Za-z0-9]*$"
do {
Log.info("Replacing", self, username)
let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpression.Options.caseInsensitive)
let range = NSMakeRange(0, self.characters.count)
return regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: username )
} catch {
return self
}
}
let oldString = "Hey jess"
let newString = oldString.replaceLastWordWithUsername("#jessica")
newString now equals Hey #jessica #jessica. The expected result should be Hey #jessica
I think it's because the * regex operator will
Match 0 or more times. Match as many times as possible.
This might be causing it to also match the 'no characters at the end' in addition to the word at the end, resulting in two replacements.
As mentioned by @Code Different, if you use let pattern = "\\w+$" instead, it will only match if there are characters, eliminating the 'no characters' match.
"Word1 Word2"
^some characters and then end
^0 characters and then end
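The two matches can be made visible by listing them. This sketch uses JavaScript's regex engine rather than NSRegularExpression, on the assumption that both report the trailing empty match in the same way; findMatches is a hypothetical helper:

```javascript
// Hypothetical helper: list [index, text] for every match of a global regex.
const findMatches = (re, text) =>
  [...text.matchAll(re)].map((m) => [m.index, m[0]]);

// Original pattern: the * quantifiers also allow an empty match at the end.
console.log(findMatches(/#*[A-Za-z0-9]*$/g, 'Hey jess'));

// \w+ needs at least one character, so the empty match disappears.
console.log(findMatches(/\w+$/g, 'Hey jess'));
```

The first call reports the word at index 4 plus an empty match at index 8; the second reports only the word.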
Use this regex:
(?<=\s)\S+$
Sample: https://regex101.com/r/kGnQEM/1
/(?<=\s)\S+$/g
Positive Lookbehind (?<=\s)
Assert that the Regex below matches
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\S+ matches any non-whitespace character (equal to [^\r\n\t\f\v ])
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line
terminator right at the end of the string (if any)
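A short sketch of the lookbehind pattern in JavaScript (not Swift), with a made-up helper name replaceLastWord. Note that the lookbehind requires whitespace before the last word, so a string consisting of a single word is left unchanged:

```javascript
// Hypothetical helper using the lookbehind pattern from this answer.
const replaceLastWord = (text, username) =>
  text.replace(/(?<=\s)\S+$/, username);

console.log(replaceLastWord('Hey jess', '#jessica'));

// No whitespace precedes the only word, so nothing is replaced.
console.log(replaceLastWord('jess', '#jessica'));
```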
Just change your pattern:
let pattern = "\\w+$"
\w matches any word character (letters, digits and the underscore)
+ means one or more

Stripping comments from Forth source code using regular expressions

I am trying to match all content between parentheses, including parentheses in a non-greedy way. There should be a space before and after the opening parentheses (or the start of a line before the opening parentheses) and a space before and after the closing parentheses. Take the following text:
( )
( This is a comment )
1 2 +
\ a
: square dup * ;
( foo bar
baz )
(quux)
( ( )
(
( )
The first line should be matched, the second line including its content should be matched, the second last line should not be matched (or raise an error) and the last line should be matched. The two lines foo bar baz should be matched, but (quux) should not as it doesn't contain a space before and after the parentheses. The line with the extra opening parentheses inside should be matched.
I tried a few conventional regexes for matching content between parentheses but without much success. The regex engine is that of Go's.
re := regexp.MustCompile(`(?s)\(( | .*? )\)`)
s = re.ReplaceAllString(s, "")
Playground: https://play.golang.org/p/t93tc_hWAG
Regular expressions "can't count" (that's over-simplified, but bear with me), so you can't match on an unbounded amount of parenthesis nesting. I guess you're mostly concerned about matching only a single level in this case, so you would need to use something like:
foo := regexp.MustCompile(`^ *\( ([^)]*? )?\)$`)
This does require the comment to be the very last thing on a line, so it may be better to add "match zero or more spaces" there. This does NOT match the string "( ( ) )" or try to cater for arbitrary nesting, as that's well outside the counting that regular expressions can do.
What they can do in terms of counting is "count a specific number of times", they can't "count how many blah, then make sure there's the same number of floobs" (that requires going from a regular expression to a context-free grammar).
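To illustrate the kind of counting meant here, checking that every ( has a matching ) needs a counter that can grow without bound. A minimal sketch in JavaScript rather than Go; isBalanced is a made-up name and is not part of the answer's solution:

```javascript
// Sketch: track nesting depth with a counter.
function isBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === '(') {
      depth += 1;
    } else if (ch === ')') {
      depth -= 1;
      // A closing parenthesis without a matching opener.
      if (depth < 0) return false;
    }
  }
  // Every opener must have been closed.
  return depth === 0;
}

console.log(isBalanced('( ( ) )'));
console.log(isBalanced('( ( )'));
```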
Playground
Here is a way to match all the 3 lines in question:
(?m)^[\t\p{Zs}]*\([\p{Zs}\t](?:[^()\n]*[\p{Zs}\t])?\)[\p{Zs}\t]*$
See the Go regex demo at the new regex101.com
Details:
(?m) - multiline mode on
^ - due to the above, the start of a line
[\t\p{Zs}]* - 0+ horizontal whitespaces
\( - a (
[\p{Zs}\t] - exactly 1 horizontal whitespace
(?:[^()\n]*[\p{Zs}\t])? - an optional sequence matching:
[^()\n]* - a negated character class matching 0+ characters other than (, ) and a newline
[\p{Zs}\t] - horizontal whitespace
\) - a literal )
[\p{Zs}\t]* - 0+ horizontal whitespaces
$ - due to (?m), the end of a line.
Go playground demo:
package main
import (
"regexp"
"fmt"
)
func main() {
var re = regexp.MustCompile(`(?m)^[\t\p{Zs}]*\([\p{Zs}\t](?:[^()\n]*[\p{Zs}\t])?\)[\p{Zs}\t]*$`)
var str = ` ( )
( This is a comment )
1 2 +
\ a
: square dup * ;
( foo bar
baz )
(quux)
( ( )
(
( )`
for i, match := range re.FindAllString(str, -1) {
fmt.Println("'", match, "' (found at index", i, ")")
}
}