C++ getline - Extracting a substring using regex


I have a file with contents like this:
Random text
+-------------------+------+-------+-----------+-------+
| Data | A | B | C | D |
+-------------------+------+-------+-----------+-------+
| Data 1 | 1403 | 0 | 2520 | 55.67 |
| Data 2 | 1365 | 2 | 2520 | 54.17 |
| Data 3 | 1 | 3 | 1234 | 43.12 |
Some more random text
I want to extract the value in column D of the row Data 1, i.e. the value 55.67 in the example above. I am parsing this file line by line using getline:
while(getline(inputFile1,line)) {
if(line.find("| Data 1") != string::npos) {
subString = //extract the desired value
}
}
How can I extract the desired substring from the line? Is there any way to extract it using boost::regex?

While regex may have its uses, it's probably overkill for this.
Bring in a trim function (one that strips leading and trailing whitespace; it is not shown here) and:
char delim;
std::string line, data;
int a, b, c;
double d;
while(std::getline(inputFile1, line)) {
std::istringstream is(line);
if( std::getline(is >> delim, data, '|') >>
a >> delim >> b >> delim >> c >> delim >> d >> delim)
{
trim(data);
if(data == "Data 1") {
std::cout << a << ' ' << b << ' ' << c << ' ' << d << '\n';
}
}
}

Yes, you can easily extract your substring with a regex. There is no need for Boost; the standard C++ <regex> library is enough.
We read all lines of the source file in a simple for loop and use std::regex_match to match each line against our regex. If a line matches, the result is in group 1 of the std::smatch sm.
Because the regex is designed to match a floating-point number, we get exactly what we need, without any surrounding spaces.
We can convert it to a double and show the result on the screen. And because the captured text is always a valid number, std::stod will not throw std::invalid_argument.
The resulting program is straightforward and easy to understand:
#include <iostream>
#include <string>
#include <sstream>
#include <regex>
// Please note: for std::getline it does not matter whether we read from a
// std::istringstream or a std::ifstream; both are std::istreams. And because
// we do not have files here on SO, we use an istringstream as the data source.
// If you want to read from a file later, simply create an std::ifstream inputFile1.
// Source file with all data
std::istringstream inputFile1{ R"(
Random text
+-------------------+------+-------+-----------+-------+
| Data | A | B | C | D |
+-------------------+------+-------+-----------+-------+
| Data 1 | 1403 | 0 | 2520 | 55.67 |
| Data 2 | 1365 | 2 | 2520 | 54.17 |
| Data 3 | 1 | 3 | 1234 | 43.12 |
Some more random text)"
};
// Regex for finding the desired data
const std::regex re(R"(\|\s+Data 1\s+\|.*?\|.*?\|.*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|)");
int main() {
// The result will be in here
std::smatch sm;
// Read all lines of the source file
for (std::string line{}; std::getline(inputFile1, line);) {
// If we found our matching string
if (std::regex_match(line, sm, re)) {
// Then extract the column D info
double data1D = std::stod(sm[1]);
// And show it to the user.
std::cout << data1D << "\n";
}
}
}
For most people the tricky part is defining the regular expression. Pages like "Online regex tester and debugger" help here: they give a breakdown of the regex with an understandable explanation.
For our regex
\|\s+Data 1\s+\|.*?\|.*?\|.*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|
we get the following explanation:
\| - matches the character | literally
\s+ - matches one or more whitespace characters (equal to [\r\n\t\f\v ]), as many as possible (greedy)
Data 1 - matches the characters Data 1 literally (case sensitive)
\s+ - matches one or more whitespace characters, as many as possible (greedy)
\| - matches the character | literally
.*? - matches any character except line terminators, zero or more times, as few as possible (lazy)
\| - matches the character | literally
.*? - matches any character except line terminators, zero or more times, as few as possible (lazy)
\| - matches the character | literally
.*? - matches any character except line terminators, zero or more times, as few as possible (lazy)
\| - matches the character | literally
\s* - matches zero or more whitespace characters (greedy)
([-+]?[0-9]*\.?[0-9]+) - 1st capturing group: an optional sign, optional digits, an optional decimal point and one or more digits, i.e. the number in column D
\s* - matches zero or more whitespace characters (greedy)
\| - matches the character | literally
By the way, a stricter regex, which also requires columns A, B and C to contain whole numbers, would be:
\|\s+Data 1\s+\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|

Related

regex to match all whitespace except those between words and surrounding hyphens?

I'd like to sanitize a string so that all whitespace is removed, except whitespace between words and around hyphens:
1234 - Text | OneWord , Multiple Words | Another Text , 456 -> 1234 - Text|OneWord,Multiple Words|Another Text,456
std::regex regex(R"(\B\s+|\s+\B)"); //get rid of whitespaces except between words
auto newStr = std::regex_replace(str, regex, "*");
newStr = std::regex_replace(newStr, std::regex(R"(\*-\*)"), " - ");
newStr = std::regex_replace(newStr, std::regex(R"(\*)"), "");
This is what I currently use, but it is rather ugly, and I'm wondering if there is a regex that can do this in one go.

You can use
(\s+-\s+|\b\s+\b)|\s+
Replace the matches with $1, the backreference to the substring captured in Group 1. See the regex demo. Details:
(\s+-\s+|\b\s+\b) - Group 1: a - with one or more whitespaces on both sides, or one or more whitespaces in between word boundaries
| - or
\s+ - one or more whitespaces.
See the C++ demo:
std::string s("1234 - Text | OneWord , Multiple Words | Another Text , 456");
std::regex reg(R"((\s+-\s+|\b\s+\b)|\s+)");
std::cout << std::regex_replace(s, reg, "$1") << std::endl;
// => 1234 - Text|OneWord,Multiple Words|Another Text,456

How do I replace the nth occurrence of a special character, say, a pipe delimiter with another in Scala?

I'm new to Spark using Scala and I need to replace every nth occurrence of the delimiter with the newline character.
So far, I have been successful at entering a new line after the pipe delimiter.
I'm unable to replace the delimiter itself.
My input string is
val txt = "January|February|March|April|May|June|July|August|September|October|November|December"
println(txt.replaceAll(".\\|", "$0\n"))
The above statement generates the following output.
January|
February|
March|
April|
May|
June|
July|
August|
September|
October|
November|
December
I referred to the suggestion at https://salesforce.stackexchange.com/questions/189923/adding-comma-separator-for-every-nth-character but when I enter the number in the curly braces, I only end up adding the newline after 2 characters after the delimiter.
I'm expecting my output to be as given below.
January|February
March|April
May|June
July|August
September|October
November|December
How do I change my regular expression to get the desired output?
Update:
My friend suggested I try the following statement
println(txt.replaceAll("(.*?\\|){2}", "$0\n"))
and this produced the following output
January|February|
March|April|
May|June|
July|August|
September|October|
November|December
Now I just need to get rid of the pipe symbol at the end of each line.

You want to move the 2nd bar | outside of the capture group.
txt.replaceAll("([^|]+\\|[^|]+)\\|", "$1\n")
//val res0: String =
// January|February
// March|April
// May|June
// July|August
// September|October
// November|December
Regex Explained (regex is not Scala)
( - start a capture group
[^|] - any character as long as it's not the bar | character
[^|]+ - 1 or more of those (any) non-bar chars
\\| - followed by a single bar char |
[^|]+ - followed by 1 or more of any non-bar chars
) - close the capture group
\\| - followed by a single bar char (not in capture group)
"$1\n" - replace the entire matching string with just the first $1 capture group ($0 is the entire matching string) followed by the newline char
UPDATE
For the general case of N repetitions, the regex becomes a bit more cumbersome, at least if you're trying to do it with a single regex.
The simplest thing to do (not the most efficient but simple to code) is to traverse the String twice.
val n = 5
txt.replaceAll(s"(\\w+\\|){$n}", "$0\n")
.replaceAll("\\|\n", "\n")
//val res0: String =
// January|February|March|April|May
// June|July|August|September|October
// November|December

You could first split the string on '|' to get an array of strings, then loop over it, joining every two elements with '|' and ending each pair with a newline:
val txt = "January|February|March|April|May|June|July|August|September|October|November|December"
val out = txt.split("\\|")
var output: String = ""
for(i<-0 until out.length -1 by 2){
val ref = out(i) + "|" + out(i+1) + "\n"
output = output + ref
}
val finalout = output.stripSuffix("\n") // drop the trailing newline
println(finalout)

Reluctant matching in ANTLR 4.4

I'm trying to parse two different tokens from my input, operand1 and operator, and I want them to behave like reluctant quantifiers in regular expressions: the operator token should be matched on its own instead of being greedily absorbed into the operand1 token.
Example,
Input:
Active Indicator in ("A", "D", "S")
(To simplify I have removed the code relevant for operand2)
Expected operand1:
Active Indicator
Expected operator:
in
Actual output for operand1:
Active Indicator in
and none for the operator rule.
Below is my grammar code:
grammar Test;
condition: leftOperand WHITESPACE* operator;
leftOperand: ALPHA_NUMERIC_WS ;
operator: EQUALS | NOT_EQUALS | IN | NOT_IN;
EQUALS : '=';
NOT_EQUALS : '!=';
IN : 'in';
NOT_IN : 'not' WHITESPACE 'in';
WORD: (LOWERCASE | UPPERCASE )+ ;
ALPHA_NUMERIC_WS: WORD ( WORD| DIGIT | WHITESPACE )* ( WORD | DIGIT)+ ;
WHITESPACE : (' ' | '\t')+;
fragment DIGIT: '0'..'9' ;
LOWERCASE : [a-z] ;
UPPERCASE : [A-Z] ;

One solution to this would be to not produce one token for several words but one token per word instead.
Your grammar would then look like this:
grammar Test;
condition: leftOperand operator;
leftOperand: ALPHA_NUMERIC+ ;
operator: EQUALS | NOT_EQUALS | IN | NOT_IN;
EQUALS : '=';
NOT_EQUALS : '!=';
IN : 'in';
NOT_IN : 'not' WHITESPACE 'in';
fragment WORD: (LOWERCASE | UPPERCASE )+ ; // fragment, so "Active" is not lexed as WORD instead of ALPHA_NUMERIC
ALPHA_NUMERIC: WORD ( WORD| DIGIT)* ;
WHITESPACE : (' ' | '\t')+ -> skip; // ignoring WS completely
fragment DIGIT: '0'..'9' ;
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
With this grammar the lexer no longer matches the whole input as one ALPHA_NUMERIC_WS token: whitespace cannot occur inside an ALPHA_NUMERIC token, so any whitespace ends the current token and each word becomes a token of its own. Therefore the following input, such as in, gets a chance to be matched by the other lexer rules. When two rules match the same input, the one defined first in the grammar wins, which is why IN beats ALPHA_NUMERIC for "in".

Regex for matching C++ string constant

I'm currently working on a C++ preprocessor, and I need to match string constants containing at least one character, like "hey I'm a string".
I'm currently working with this one here \"([^\\\"]+|\\.)+\" but it fails on one of my test cases.
Test cases:
std::cout << "hello" << " world";
std::cout << "He said: \"bananas\"" << "...";
std::cout << "";
std::cout << "\x12\23\x34";
Expected output:
std::cout << String("hello") << String(" world");
std::cout << String("He said: \"bananas\"") << String("...");
std::cout << "";
std::cout << String("\x12\23\x34");
On the second one I instead get
std::cout << String("He said: \")bananas\"String(" << ")...";
Short repro code (using the regex by AR.3):
std::string in_line = "std::cout << \"He said: \\\"bananas\\\"\" << \"...\";";
std::regex r("\"([^\"]+|\\.|(?<=\\\\)\")+\"");
in_line = std::regex_replace(in_line, r, "String($&)");

Lexing a source file is a good job for regexes, but for such a task, let's use a more capable regex engine than std::regex: PCRE (or boost::regex) to start with. At the end of this post, I'll show what you can do with a less feature-packed engine.
We only need to do partial lexing, ignoring all unrecognized tokens that won't affect string literals. What we need to handle is:
Singleline comments
Multiline comments
Character literals
String literals
We'll be using the extended (x) option, which ignores whitespace in the pattern.
Comments
Here's what [lex.comment] says:
The characters /* start a comment, which terminates with the characters */. These comments do not nest.
The characters // start a comment, which terminates immediately before the next new-line character. If
there is a form-feed or a vertical-tab character in such a comment, only white-space characters shall appear
between it and the new-line that terminates the comment; no diagnostic is required. [ Note: The comment
characters //, /*, and */ have no special meaning within a // comment and are treated just like other
characters. Similarly, the comment characters // and /* have no special meaning within a /* comment.
— end note ]
# singleline comment
// .* (*SKIP)(*FAIL)
# multiline comment
| /\* (?s: .*? ) \*/ (*SKIP)(*FAIL)
Easy peasy. If you match anything there, just (*SKIP)(*FAIL) - meaning that you throw away the match. The (?s: .*? ) applies the s (singleline) modifier to the . metacharacter, meaning it's allowed to match newlines.
Character literals
Here's the grammar from [lex.ccon]:
character-literal:
encoding-prefix(opt) ’ c-char-sequence ’
encoding-prefix:
one of u8 u U L
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the source character set except the single-quote ’, backslash \, or new-line character
escape-sequence
universal-character-name
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
simple-escape-sequence: one of \’ \" \? \\ \a \b \f \n \r \t \v
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
Let's define a few things first, which we'll need later on:
(?(DEFINE)
(?<prefix> (?:u8?|U|L)? )
(?<escape> \\ (?:
['"?\\abfnrtv] # simple escape
| [0-7]{1,3} # octal escape
| x [0-9a-fA-F]{1,2} # hex escape
| u [0-9a-fA-F]{4} # universal character name
| U [0-9a-fA-F]{8} # universal character name
))
)
prefix is defined as an optional u8, u, U or L
escape is defined as per the standard, except that I've merged universal-character-name into it for the sake of simplicity
Once we have these, a character literal is pretty simple:
(?&prefix) ' (?> (?&escape) | [^'\\\r\n]+ )+ ' (*SKIP)(*FAIL)
We throw it away with (*SKIP)(*FAIL)
Simple strings
They're defined in almost the same way as character literals. Here's a part of [lex.string]:
string-literal:
encoding-prefix(opt) " s-char-sequence(opt) "
encoding-prefix(opt) R raw-string
s-char-sequence:
s-char
s-char-sequence s-char
s-char:
any member of the source character set except the double-quote ", backslash \, or new-line character
escape-sequence
universal-character-name
This will mirror the character literals:
(?&prefix) " (?> (?&escape) | [^"\\\r\n]+ )* "
The differences are:
The character sequence is optional this time (* instead of +)
The double quote is disallowed when unescaped instead of the single quote
We actually don't throw it away :)
Raw strings
Here's the raw string part:
raw-string:
" d-char-sequence(opt) ( r-char-sequence(opt) ) d-char-sequence(opt) "
r-char-sequence:
r-char
r-char-sequence r-char
r-char:
any member of the source character set, except a right parenthesis )
followed by the initial d-char-sequence (which may be empty) followed by a double quote ".
d-char-sequence:
d-char
d-char-sequence d-char
d-char:
any member of the basic source character set except:
space, the left parenthesis (, the right parenthesis ), the backslash \,
and the control characters representing horizontal tab,
vertical tab, form feed, and newline.
The regex for this is:
(?&prefix) R " (?<delimiter>[^ ()\\\t\x0B\r\n]*) \( (?s:.*?) \) \k<delimiter> "
[^ ()\\\t\x0B\r\n]* is the set of characters that are allowed in delimiters (d-char)
\k<delimiter> refers to the previously matched delimiter
The full pattern
The full pattern is:
(?(DEFINE)
(?<prefix> (?:u8?|U|L)? )
(?<escape> \\ (?:
['"?\\abfnrtv] # simple escape
| [0-7]{1,3} # octal escape
| x [0-9a-fA-F]{1,2} # hex escape
| u [0-9a-fA-F]{4} # universal character name
| U [0-9a-fA-F]{8} # universal character name
))
)
# singleline comment
// .* (*SKIP)(*FAIL)
# multiline comment
| /\* (?s: .*? ) \*/ (*SKIP)(*FAIL)
# character literal
| (?&prefix) ' (?> (?&escape) | [^'\\\r\n]+ )+ ' (*SKIP)(*FAIL)
# standard string
| (?&prefix) " (?> (?&escape) | [^"\\\r\n]+ )* "
# raw string
| (?&prefix) R " (?<delimiter>[^ ()\\\t\x0B\r\n]*) \( (?s:.*?) \) \k<delimiter> "
See the demo here.
boost::regex
Here's a simple demo program using boost::regex:
#include <string>
#include <iostream>
#include <boost/regex.hpp>
static void test()
{
boost::regex re(R"regex(
(?(DEFINE)
(?<prefix> (?:u8?|U|L) )
(?<escape> \\ (?:
['"?\\abfnrtv] # simple escape
| [0-7]{1,3} # octal escape
| x [0-9a-fA-F]{1,2} # hex escape
| u [0-9a-fA-F]{4} # universal character name
| U [0-9a-fA-F]{8} # universal character name
))
)
# singleline comment
// .* (*SKIP)(*FAIL)
# multiline comment
| /\* (?s: .*? ) \*/ (*SKIP)(*FAIL)
# character literal
| (?&prefix)? ' (?> (?&escape) | [^'\\\r\n]+ )+ ' (*SKIP)(*FAIL)
# standard string
| (?&prefix)? " (?> (?&escape) | [^"\\\r\n]+ )* "
# raw string
| (?&prefix)? R " (?<delimiter>[^ ()\\\t\x0B\r\n]*) \( (?s:.*?) \) \k<delimiter> "
)regex", boost::regex::perl | boost::regex::no_mod_s | boost::regex::mod_x | boost::regex::optimize);
std::string subject(R"subject(
std::cout << L"hello" << " world";
std::cout << "He said: \"bananas\"" << "...";
std::cout << "";
std::cout << "\x12\23\x34";
std::cout << u8R"hello(this"is\a\""""single\\(valid)"
raw string literal)hello";
"" // empty string
'"' // character literal
// this is "a string literal" in a comment
/* this is
"also inside"
//a comment */
// and this /*
"is not in a comment"
// */
"this is a /* string */ with nested // comments"
)subject");
std::cout << boost::regex_replace(subject, re, "String\\($&\\)", boost::format_all) << std::endl;
}
int main(int argc, char **argv)
{
try
{
test();
}
catch(const std::exception& ex)
{
std::cerr << ex.what() << std::endl;
}
return 0;
}
For some reason, I had to take the ? quantifier out of prefix (change (?<prefix> (?:u8?|U|L)? ) to (?<prefix> (?:u8?|U|L) ) and (?&prefix) to (?&prefix)?) to make the pattern work. I believe it's a bug in boost::regex, as both PCRE and Perl work just fine with the original pattern.
What if we don't have a fancy regex engine at hand?
Note that while this pattern technically uses recursion, it never nests recursive calls. Recursion could be avoided by inlining the relevant reusable parts into the main pattern.
A couple of other constructs can be avoided at the price of reduced performance. We can safely replace the atomic groups (?>...) with normal groups (?:...) as long as we don't nest quantifiers, since nested quantifiers are what could otherwise cause catastrophic backtracking.
We can also avoid (*SKIP)(*FAIL) if we add one line of logic into the replacement function: All the alternatives to skip are grouped in a capturing group. If the capturing group matched, just ignore the match. If not, then it's a string literal.
All of this means we can implement this in JavaScript, which has one of the simplest regex engines you can find, at the price of breaking the DRY rule and making the pattern illegible. The regex becomes this monstrosity once converted:
(\/\/.*|\/\*[\s\S]*?\*\/|(?:u8?|U|L)?'(?:\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]{1,2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|[^'\\\r\n])+')|(?:u8?|U|L)?"(?:\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]{1,2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|[^"\\\r\n])*"|(?:u8?|U|L)?R"([^ ()\\\t\x0B\r\n]*)\([\s\S]*?\)\2"
And here's an interactive demo you can play with:
function run() {
var re = /(\/\/.*|\/\*[\s\S]*?\*\/|(?:u8?|U|L)?'(?:\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]{1,2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|[^'\\\r\n])+')|(?:u8?|U|L)?"(?:\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]{1,2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|[^"\\\r\n])*"|(?:u8?|U|L)?R"([^ ()\\\t\x0B\r\n]*)\([\s\S]*?\)\2"/g;
var input = document.getElementById("input").value;
var output = input.replace(re, function(m, ignore) {
return ignore ? m : "String(" + m + ")";
});
document.getElementById("output").innerText = output;
}
document.getElementById("input").addEventListener("input", run);
run();
<h2>Input:</h2>
<textarea id="input" style="width: 100%; height: 50px;">
std::cout << L"hello" << " world";
std::cout << "He said: \"bananas\"" << "...";
std::cout << "";
std::cout << "\x12\23\x34";
std::cout << u8R"hello(this"is\a\""""single\\(valid)"
raw string literal)hello";
"" // empty string
'"' // character literal
// this is "a string literal" in a comment
/* this is
"also inside"
//a comment */
// and this /*
"is not in a comment"
// */
"this is a /* string */ with nested // comments"
</textarea>
<h2>Output:</h2>
<pre id="output"></pre>

Regular expressions can be tricky for beginners, but once you understand their basics and the well-tested divide-and-conquer strategy, they will become your go-to tool.
What you need is to search for a quote (") and read all characters up to the next quote that is not preceded by a backslash.
The regex I came up is (".*?[^\\]"). See a code snippet below.
std::string in_line = "std::cout << \"He said: \\\"bananas\\\"\" << \"...\";";
std::regex re(R"((".*?[^\\]"))");
in_line = std::regex_replace(in_line, re, "String($1)");
std::cout << in_line << std::endl;
Output:
std::cout << String("He said: \"bananas\"") << String("...");
Regex Explanation:
(".*?[^\\]")
Options: Case sensitive; Numbered capture; Allow zero-length matches; Regex syntax only
Match the regex below and capture its match into backreference number 1 (".*?[^\\]")
Match the character “"” literally "
Match any single character that is NOT a line break character (line feed, carriage return) .*?
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) *?
Match any character that is NOT the backslash character [^\\]
Match the character “"” literally "
String($1)
Insert the character string “String” literally String
Insert an opening parenthesis (
Insert the text that was last matched by capturing group number 1 $1
Insert a closing parenthesis )

Read the relevant sections of the C++ standard; they are called lex.ccon and lex.string.
Then convert each rule you find there into a regular expression (if you really want to use regular expressions; it might turn out that they are not capable of doing this job).
Then, build more complicated regular expressions out of them. Be sure to name your regular expressions exactly as the rules from the C++ standard, so that you can recheck them later.
If, instead of using regular expressions, you want to use an existing tool, here is one: http://clang.llvm.org/doxygen/Lexer_8cpp_source.html. Have a look at the LexStringLiteral function.

Regular Expression : Splitting a string of list of multivalues

My goal is to split this string with a regular expression:
AA(1.2,1.3)+,BB(125)-,CC(A,B,C)-,DD(QWE)+
into a list of:
AA(1.2,1.3)+
BB(125)-
CC(A,B,C)-
DD(QWE)+
Regards.

This regex works with your sample string:
,(?![^(]+\))
This splits on comma, but uses a negative lookahead to assert that the next bracket character is not a right bracket. It will still split even if there are no following brackets.
Here's some Java code demonstrating it working with your sample, plus some more general input showing its robustness:
String input = "AA(1.2,1.3)+,BB(125)-,FOO,CC(A,B,C)-,DD(QWE)+,BAR";
String[] split = input.split(",(?![^(]+\\))");
for (String s : split) System.out.println(s);
Output:
AA(1.2,1.3)+
BB(125)-
FOO
CC(A,B,C)-
DD(QWE)+
BAR

I don't know what language you are working with, but this does it with grep:
$ grep -o '[A-Z]*([A-Z0-9.,]*)[^,]*' file
AA(1.2,1.3)+
BB(125)-
CC(A,B,C)-
DD(QWE)+
Explanation
[A-Z]*([A-Z0-9.,]*)[^,]*
[A-Z]* - list of chars from A to Z
( - the ( char
[A-Z0-9.,]* - A-Z, 0-9, . or , chars
) - the ) char
[^,]* - everything but a comma
