Why does this regex throw an exception? - c++

I am trying to use std::regex_replace in C++11 (Visual Studio 2013), but the regex I am trying to create throws an exception:
Microsoft C++ exception: std::regex_error at memory location 0x0030ED34
Why is this the case? This is my definition:
std::string regexStr = R"(\([A - Za - z] | [0 - 9])[0 - 9]{2})";
std::regex rg(regexStr); // <-- This is where the exception is thrown
line = std::regex_replace(line, rg, this->protyp->getUTF8Character("$&"));
What I want to do: find all matches inside a string that have the following format:
"\X99" OR "\999"
where X = A-Z or a-z and 9 = 0-9.
I also tried the Boost regex library, but it also throws an exception.
(Another question: can I use the backreference as I am doing in the last line? I want to replace dynamically according to the match.)
Thanks for any help.

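You need to fix your regex. In the raw string, \( is an escaped literal parenthesis, so the final ) has no opening partner and the std::regex constructor throws. To match a literal backslash the regex needs \\, which is written "\\\\" in an ordinary string literal (or R"(\\)" as a raw string literal). The spaces inside the character classes are also matched literally and should be removed.
My code that prints the first capture group of every match: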
std::string line = "\\X99 \\999";
std::string regexStr = "(\\\\([A-Za-z]|[0-9])[0-9]{2})";
std::regex rg(regexStr); // <-- This is where the exception was thrown before
std::smatch sm;
while (std::regex_search(line, sm, rg)) {
    std::cout << sm[1] << std::endl;
    line = sm.suffix().str(); // continue searching after this match
}
Output:
\X99
\999
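Regarding calling a method inside the replacement string: this->protyp->getUTF8Character("$&") is evaluated once, before regex_replace runs, so the method receives the literal text "$&" rather than each match. I cannot find any callback functionality in the regex_replace documentation, which describes the argument as a plain format string:
fmt - the regex replacement format string, exact syntax depends on the value of flags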
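One way to get a per-match replacement anyway is to walk the matches yourself and build the result string. The following is only a sketch: replace_with is a hypothetical helper (not part of the standard library), and the callback stands in for getUTF8Character, whose real signature is not shown in the question.

```cpp
#include <functional>
#include <regex>
#include <string>

// Hypothetical helper: replaces every match of `re` in `input` with the
// string that `fn` computes from that match.
std::string replace_with(const std::string& input, const std::regex& re,
                         const std::function<std::string(const std::smatch&)>& fn) {
    std::string out;
    auto last = input.cbegin();  // end of the previous match
    for (std::sregex_iterator it(input.cbegin(), input.cend(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy the text between matches unchanged
        out += fn(m);                  // append the computed replacement
        last = m[0].second;
    }
    out.append(last, input.cend());    // copy the tail after the last match
    return out;
}
```

It would be called with the corrected pattern and a lambda, for example replace_with(line, rg, [&](const std::smatch& m) { return convert(m.str()); }), where convert is whatever conversion function you need.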

Split the string at the particular occurrence of special character (+) using regex in Java

I want to split the following string around +, but I couldn't succeed in getting the correct regex for this.
String input = "SOP3a'+bEOP3'+SOP3b'+aEOP3'";
I want to have a result like this
[SOP3a'+bEOP3', SOP3b'+aEOP3']
In some cases I may have the following string
c+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2
which should be split as
[c, SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2]
I have tried the following regex but it doesn't work.
input.split("(SOP[0-9](.*)EOP[0-9])*\\+((SOP)[0-9](.*)(EOP)[0-9])*");
Any help or suggestions are appreciated.
Thanks
You can match the string with the following regex and read the expected parts from its capture groups:
(?m)(.*?)\+(SOP.*?$)
The following Java code works for this input:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitExample {
    public static void main(String[] args) {
        String input = "SOP3a'+bEOP3'+SOP3b'+aEOP3'";
        // Lazily match up to the first '+' that is directly followed by "SOP".
        String pattern = "(?m)(.*?)\\+(SOP.*?$)";
        Pattern regex = Pattern.compile(pattern);
        Matcher m = regex.matcher(input);
        if (m.find()) {
            System.out.println("Whole match: " + m.group(0));
            System.out.println("First part: " + m.group(1));
            System.out.println("Second part: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
m.group(1) and m.group(2) are the values you are looking for.
Do you really need to use the split method?
And what are the rules? They are unclear to me.
Anyway, starting from the regex you provided, I only removed some unnecessary groups and got what you are looking for. Instead of splitting, I collect the two captured groups, because splitting would generate some empty elements.
const str = "SOP1a+bEOP1+SOP2SOP3a'+bEOP3'+SOP3b'+aEOP3'EOP2";
const regex = RegExp(/(SOP[0-9].*EOP[0-9])*\+(SOP[0-9].*EOP[0-9])*/)
const matches = str.match(regex);
console.log('Matches ', matches);
console.log([matches[1],matches[2]]);
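If the intended rule is "split on every + that is not enclosed in an SOPn ... EOPn pair", a regex is awkward because the pairs can nest. A small scanner that tracks the nesting depth handles both examples from the question. This is a sketch in C++ that swaps the regex for a hand-written loop. The rule itself is my reading of the examples, and split_top_level is a hypothetical name.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits `s` on every '+' that lies outside all SOPn ... EOPn pairs.
// Assumption: markers are "SOP"/"EOP" followed by a digit and are properly nested.
std::vector<std::string> split_top_level(const std::string& s) {
    std::vector<std::string> parts;
    std::string current;
    int depth = 0;  // number of SOPn markers currently open
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool has_digit = i + 3 < s.size() &&
                               std::isdigit(static_cast<unsigned char>(s[i + 3]));
        if (has_digit && s.compare(i, 3, "SOP") == 0) {
            ++depth;
        } else if (has_digit && s.compare(i, 3, "EOP") == 0 && depth > 0) {
            --depth;
        } else if (s[i] == '+' && depth == 0) {
            parts.push_back(current);  // top-level '+': close the current part
            current.clear();
            continue;                  // the separator itself is dropped
        }
        current += s[i];
    }
    parts.push_back(current);
    return parts;
}
```

With the first input this yields the two SOP3 blocks, and with the second input it yields c and the whole SOP2 ... EOP2 block.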

Extract variables from string with Regex

I'm trying to extract from a string variables with the following format: ${var}
Given this string:
val s = "This is a string with ${var1} and ${var2} and {var3}"
The result should be
List("var1","var2")
This is the attempt, it ends in an exception. What's wrong with this regex?
val pattern = """\${([^\s}]+)(?=})""".r
val s = "This is a string with ${var1} and ${var2} and {var3}"
val vals = pattern.findAllIn(s)
println(vals.toList)
and the exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal repetition near index 1 \${([^\s}]+)(?=})
Note: { has a special meaning in a regex. It starts a repetition quantifier, e.g. a{2,10} matches a between 2 and 10 times, so a literal { must be escaped.
Solution 1
val pattern = """\$\{([^\s}]+)(?=})""".r
You need to read the first capturing group of each match and then convert the results to a list, e.g. pattern.findAllMatchIn(s).map(_.group(1)).toList.
Solution 2
You can also use lookbehind like
val pattern = """(?<=\$\{)[^\s}]+(?=})""".r
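For comparison, here is a sketch of the capture-group approach (Solution 1) in C++. The default std::regex grammar has no lookbehind, so Solution 2 does not carry over. extract_vars is a hypothetical helper name, and the pattern closes with an escaped } instead of a lookahead.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the names found in ${name} placeholders, in order of appearance.
std::vector<std::string> extract_vars(const std::string& s) {
    // \$\{ matches the literal "${"; group 1 captures everything up to '}'.
    static const std::regex re(R"(\$\{([^\s}]+)\})");
    std::vector<std::string> names;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        names.push_back((*it)[1].str());  // keep only the captured name
    }
    return names;
}
```

For the string in the question this returns var1 and var2 and skips {var3}, which has no leading $.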

Find and replace with regular expressions

I'm trying to replace a bunch of function calls using regular expressions but can't seem to be getting it right. This is a simplified example of what I'm trying to do:
GetPetDog();
GetPetCat();
GetPetBird();
I want to change to:
GetPet<Animal_Dog>();
GetPet<Animal_Cat>();
GetPet<Animal_Bird>();
Use the regex below:
(GetPet)([^(]*) with substitution \1<Animal_\2>
You can use the following regex and code for that:
std::string ss("GetPetDog();");
static const std::regex ee("GetPet([^()]*)");
std::string result = std::regex_replace(ss, ee, "GetPet<Animal_$1>");
std::cout << result << std::endl;
Regex:
GetPet - Matches GetPet literally (we need no capturing group here)
([^()]*) - A capturing group to match any characters other than ( or ) 0 or more times (*)
Output:
GetPet<Animal_Dog>();
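The same replacement can be applied to a whole block of source text in one call. The sketch below is a slightly stricter variant, not the answer's exact pattern: it requires at least one word character after GetPet and the opening parenthesis, so a plain GetPet() call is left alone. add_animal_template is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Rewrites every GetPetXxx( call in `source` to GetPet<Animal_Xxx>(.
std::string add_animal_template(const std::string& source) {
    // \b keeps names such as MyGetPetDog from matching; group 1 is the animal.
    static const std::regex call(R"(\bGetPet(\w+)\()");
    return std::regex_replace(source, call, "GetPet<Animal_$1>(");
}
```

std::regex_replace replaces all non-overlapping matches by default, so the three calls from the question are rewritten in one pass.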

C++ regex: getting all matches on a line

When reading a file line by line, I call this function on each line to look for function calls (names). It should match any run of the valid characters a-z, 0-9 and _ followed by '('. My problem is that I do not fully understand C++-style regex, or how to get it to look through the entire line for all possible matches. The regex is simple and straightforward, yet it does not work as expected, though I am learning that this is the norm in C++.
void readCallbacks(const std::string lines)
{
std::string regxString = "[a-z0-9]+\(";
regex regx(regxString, std::regex_constants::icase);
smatch result;
if(regex_search(lines.begin(), lines.end(), result, regx, std::regex_constants::match_not_bol))
{
cout << result.str() << "\n";
}
}
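You need to escape the backslash or use a raw string literal. In an ordinary string literal, "\(" is not a valid escape sequence, so the backslash typically never reaches the regex engine and the pattern ends in an unbalanced (.
std::regex pattern("[a-z0-9]+\\(", std::regex_constants::icase);   // ordinary literal: the backslash is doubled
std::regex pattern(R"([a-z0-9]+\()", std::regex_constants::icase); // raw literal: the text between R"( and )" is taken verbatim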
Also, your character range doesn't contain the desired underscore (_).
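As for looking through the entire line: regex_search stops at the first match, so a single if only ever reports one name. A std::sregex_iterator visits every non-overlapping match. The sketch below assumes that \w (letters, digits and underscore) is the character set you want. find_call_names is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every identifier in `line` that is directly followed by '('.
std::vector<std::string> find_call_names(const std::string& line) {
    // Group 1 is the name; the trailing \( is required but not captured.
    static const std::regex call(R"((\w+)\()");
    std::vector<std::string> names;
    for (std::sregex_iterator it(line.begin(), line.end(), call), end;
         it != end; ++it) {
        names.push_back((*it)[1].str());
    }
    return names;
}
```

Note that this also reports keywords such as if or while when they are written directly before a parenthesis, so a real scanner would need to filter those out.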

Using RegEx split the string

I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]'. I'd like a pattern that yields the list results below, but I can't figure it out. Basically the comma is the separator, but [6,7,8] itself contains commas as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=stuff) construct is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach (String s in Regex.Split(inputstring, @",(?=\[)"))
    System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for (String s : p.split(inputstring))
    System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
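The same lookahead split can be sketched in C++, where std::sregex_token_iterator with submatch index -1 yields the text between matches. split_groups is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits `input` on every comma that is immediately followed by '['.
std::vector<std::string> split_groups(const std::string& input) {
    static const std::regex comma_before_bracket(R"(,(?=\[))");
    std::vector<std::string> parts;
    // Index -1 selects the pieces between matches instead of the matches.
    std::sregex_token_iterator it(input.begin(), input.end(),
                                  comma_before_bracket, -1), end;
    for (; it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

Because the lookahead consumes nothing, each piece keeps its opening bracket, and the commas inside [6,7,8] are never treated as separators.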
Although I believe the best approach here is to use split (as presented in @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (which may make it easier to see what's going on) is the following, assuming "#" never occurs in the data:
substitute "]#[" for "],["
split on "#"
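A variant of this idea that needs no placeholder character is to search for the three-character sequence "],[" directly and cut between the comma's neighbours. A minimal sketch in C++, with split_on_bracket_comma as a hypothetical name:

```cpp
#include <string>
#include <vector>

// Splits `input` at every "],[" sequence, keeping both brackets.
std::vector<std::string> split_on_bracket_comma(const std::string& input) {
    const std::string delimiter = "],[";
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (std::size_t pos = input.find(delimiter); pos != std::string::npos;
         pos = input.find(delimiter, start)) {
        parts.push_back(input.substr(start, pos - start + 1));  // include the closing ']'
        start = pos + 2;                                        // resume at the opening '['
    }
    parts.push_back(input.substr(start));  // last piece after the final delimiter
    return parts;
}
```

Only the comma is dropped, so the output matches the regex-based answers.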