c# Regex expression to extract all non-numeric values in brackets - regex

This is the Regex expression i have built so far \{([^{]*[^0-9])\}.
"This is the sample string {0} {1} {} {abc} {12abc} {abc123}"
I wish to extract everything within the string that includes brackets and that does not contain only an integer.
(e.g) '{}'
'{abc}' '{12abc}' '{abc123}'
However the last one which contains numbers at the end is not extracted with the rest.
{abc123}
How can i extract all values in the string that are in curly brackets and do not contain an Integer?

You may use
var res = Regex.Matches(s, #"{(?!\d+})[^{}]*}")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
See the regex demo and the online C# demo.
Pattern details
{ - a { char
(?!\d+}) - no 1+ digits and then } allowed immediately to the right of the current location
[^{}]* - 0+ chars other than { and }
} - a } char.

Related

Why does the regex [a-zA-Z]{5} return true for non-matching string?

I defined a regular expression to check if the string only contains alphabetic characters and with length 5:
use regex::Regex;
fn main() {
let re = Regex::new("[a-zA-Z]{5}").unwrap();
println!("{}", re.is_match("this-shouldn't-return-true#"));
}
The text I use contains many illegal characters and is longer than 5 characters, so why does this return true?
You have to put it inside ^...$ to match the whole string and not just parts:
use regex::Regex;
fn main() {
let re = Regex::new("^[a-zA-Z]{5}$").unwrap();
println!("{}", re.is_match("this-shouldn't-return-true#"));
}
Playground.
As explained in the docs:
Notice the use of the ^ and $ anchors. In this crate, every expression is executed with an implicit .*? at the beginning and end, which allows it to match anywhere in the text. Anchors can be used to ensure that the full text matches an expression.
Your pattern returns true because it matches any consecutive 5 alpha chars, in your case it matches both 'shouldn't' and 'return'.
Change your regex to: ^[a-zA-Z]{5}$
^ start of string
[a-zA-Z]{5} matches 5 alpha chars
$ end of string
This will match a string only if the string has a length of 5 chars and all of the chars from start to end fall in range a-z and A-Z.

regex to extract substring for special cases

I have a scenario where i want to extract some substring based on following condition.
search for any pattern myvalue=123& , extract myvalue=123
If the "myvalue" present at end of the line without "&", extract myvalue=123
for ex:
The string is abcdmyvalue=123&xyz => the it should return myvalue=123
The string is abcdmyvalue=123 => the it should return myvalue=123
for first scenario it is working for me with following regex - myvalue=(.?(?=[&,""]))
I am looking for how to modify this regex to include my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in Advace!
Some notes about the pattern that you tried
if you want to only match, you can omit the capture group
e* matches 0+ times an e char
the part .*?(?=[&,""]) matches as least chars until it can assert eiter & , or " to the right, so the positive lookahead expects a single char to the right to be present
You could shorten the pattern to a match only, using a negated character class that matches 0+ times any character except a whitespace char or &
myvalue=[^&\s]*
Regex demo
function regex(data) {
var test = data.match(/=(.*)&/);
if (test === null) {
return data.split('=')[1]
} else {
return test[1]
}
}
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
here is your working code if there is no & at end of string it will have null and will go else block there we can simply split the string and get the value, If & is present at the end of string then regex will simply extract the value between = and &
if you want to use existing regex then you can do it like that
var test = data1.match(/=(.*)&|=(.*)/)
const result = test[1] ? test[1] : test[2];
console.log(result);

Regex Express Return All Chars before a '/' but if there are 2 '/' Return all before that

I have been trying to get a regex expression to return me the following in the following situations.
XX -> XX
XXX -> XXX
XX/XX -> XX
XX/XX/XX -> XX/XX
XXX/XXX/XX -> XXX/XXX
I had the following Regex, however they do no work.
^[^/]+ => https://regex101.com/r/xvCbNB/1
=========
([A-Z])\w+ => https://regex101.com/r/xvCbNB/2
They are close but are not there.
Any Help would be appreciated.
You want to get all text from the start till the last occurrence of a specific character or till the end of string if the character is missing.
Use
^(?:.*(?=\/)|.+)
See the regex demo and the regex graph:
Details
^ - start of string
(?:.*(?=\/)|.+) - a non-capturing group that matches either of the two alternatives, and if the first one matches first the second won't be tried:
.*(?=\/) - any 0+ chars other than line break chars, as many as possible upt to but excluding /
| - or
.+ - any 1+ chars other than line break chars, as many as possible.
It will be easier to use a replace here to match / followed by non-slash characters before end of line:
Search regex:
/[^/]*$
Replacement String:
""
Updated RegEx Demo 1
If you're looking for a regex match then use this regex:
^(.*?)(?:/[^/]*)?$
Updated RegEx Demo 2
Any special reason it has to be a regular expression? How about just splitting the string at the slashes, remove the last item and rejoin:
function removeItemAfterLastSlash(string) {
const list = string.split(/\//);
if (list.length == 1) [
return string;
}
list.pop();
return list.join("/");
}
Or look for the last slash an remove it:
function removeItemAfterLastSlash(string) {
const index = string.lastIndexOf("/");
if (index === -1) {
return string;
}
return string.splice(0, index);
}

Scala regex : capture between group

In below regex I need "test" as output but it gives complete string which matches the regex. How can I capture string between two groups?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
case pattern(i) => i
case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
See the Scala demo online.
Or, you may change the pattern so that the first part is no longer as consuming pattern (it matches a string of fixed length, so it is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
See this Scala demo.
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.

regex - how to specify the expressions to exclude

I need to replace two characters {, } with {\n, \n}.
But they must be not surrounded in '' or "".
I tried this code to achieve that
text = 'hello(){imagine{myString("HELLO, {WORLD}!")}}'
replaced = re.sub(r'{', "{\n", text)
Ellipsis...
Naturally, This code replaces curly brackets that are surrounded in quote marks.
What are the negative statements like ! or not that can be used in regular expressions?
And the following is what I wanted.
hello(){
imagine{
puts("{HELLO}")
}
}
In a nutshell - what I want to do is
Search { and }.
If that is not enclosed in '' or ""
replace { or } to {\n or \n}
In the opposite case, I can solve it with (?P<a>\".*){(?P<b>.*?\").
But I have no clue how I can solve it in my case.
First replace all { characters with {\n. You will also be replacing {" with {\n". Now, you can replace back all {\n" characters with {".
text = 'hello(){imagine{puts("{HELLO}")}}'
replaced = text.replace('{', '{\n').replace('{\n"','{"')
You may match single and double quoted (C-style) string literals (those that support escape entities with backslashes) and then match { and } in any other context that you may replace with your desired values.
See Python demo:
import re
text = 'hello(){imagine{puts("{HELLO}")}}'
dblq = r'(?<!\\)(?:\\{2})*"[^"\\]*(?:\\.[^"\\]*)*"'
snlq = r"(?<!\\)(?:\\{2})*'[^'\\]*(?:\\.[^'\\]*)*'"
rx = re.compile(r'({}|{})|[{{}}]'.format(dblq, snlq))
print(rx.pattern)
def repl(m):
if m.group(1):
return m.group(1)
elif m.group() == '{':
return '{\n'
else:
return '\n}'
# Examples
print(rx.sub(repl, text))
print(rx.sub(repl, r'hello(){imagine{puts("Nice, Mr. \"Know-all\"")}}'))
print(rx.sub(repl, "hello(){imagine{puts('MORE {HELLO} HERE ')}}"))
The pattern that is generated in the code above is
((?<!\\)(?:\\{2})*"[^"\\]*(?:\\.[^"\\]*)*"|(?<!\\)(?:\\{2})*'[^'\\]*(?:\\.[^'\\]*)*')|[{}]
It can actually be reduced to
(?<!\\)((?:\\{2})*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'))|[{}]
See the regex demo.
Details:
The pattern matches 2 main alternatives. The first one matches single- and double-quoted string literals.
(?<!\\) - no \ immediately to the left is allowed
((?:\\{2})*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')) - Group 1:
(?:\\{2})* - 0+ repetitions of two consecutive backslashes
(?: - a non-capturing group:
"[^"\\]*(?:\\.[^"\\]*)*" - a double quoted string literal
| - or
'[^'\\]*(?:\\.[^'\\]*)*' - a single quoted string literal
) - end of the non-capturing group
| - or
[{}] - a { or }.
In the repl method, Group 1 is checked for a match. If it matched, the single- or double-quoted string literal is matched, it must be put back where it was. Else, if the match value is {, it is replaced with {\n, else, with \n}.
Replace { with {\n:
text.replace('{', '{\n')
Replace } with \n}:
text.replace('}', '\n}')
Now to fix the braces that were quoted:
text.replace('"{\n','"{')
and
text.replace('\n}"', '}"')
Combined together:
replaced = text.replace('{', '{\n').replace('}', '\n}').replace('"{\n','"{').replace('\n}"', '}"')
Output
hello(){
imagine{
puts("{HELLO}")
}
}
You can check the similarities with the input and try to match them.
text = 'hello(){imagine{puts("{HELLO}")}}'
replaced = text.replace('){', '){\n').replace('{puts', '{\nputs').replace('}}', '\n}\n}')
print(replaced)
output:
hello(){
imagine{
puts("{HELLO}")
}
}
UPDATE
try this: https://regex101.com/r/DBgkrb/1