Regex with BoostRegex C++

Hi, I want to extract the values from the following expression:
POLYGON(100 20, 30 40, 20 10, 21 21)
When I execute the code below, I get this result:
Searching POLYGON(100 20, 30 40, 20 10, 21 21)
POLYGON(100 20, 30 40, 20 10, 21 21)
result = 100 20
r2 = 100
r2 = 20
r2 = , 21 21
r2 = 21
size = 7
I don't understand why I don't get the middle values.
Thanks for your help.
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
void testMatch(const boost::regex &ex, const string st) {
cout << "Matching " << st << endl;
if (boost::regex_match(st, ex)) {
cout << " matches" << endl;
}
else {
cout << " doesn’t match" << endl;
}
}
void testSearch(const boost::regex &ex, const string st) {
cout << "Searching " << st << endl;
string::const_iterator start, end;
start = st.begin();
end = st.end();
boost::match_results<std::string::const_iterator> what;
boost::match_flag_type flags = boost::match_default;
while(boost::regex_search(start, end, what, ex, flags))
{
cout << " " << what.str() << endl;
cout << " result = " << what[1] << endl;
cout << " r2 = " << what[2] << endl;
cout << " r2 = " << what[3] << endl;
cout << " r2 = " << what[4] << endl;
cout << " r2 = " << what[5] << endl;
cout << "size = " << what.size() << endl;
start = what[0].second;
}
}
int main(int argc, char *argv[])
{
static const boost::regex ex("POLYGON\\(((\\-?\\d+) (\\-?\\d+))(\\, (\\-?\\d+) (\\-?\\d+))*\\)");
testSearch(ex, "POLYGON(1 2)");
testSearch(ex, "POLYGON(-1 2, 3 4)");
testSearch(ex, "POLYGON(100 20, 30 40, 20 10, 21 21)");
return 0;
}

I am not a regex expert, but I read your regular expression and it seems to be correct.
This forum post appears to describe exactly the same issue: Boost.Regex returns only the last match of a repeated capture group. By default, Boost keeps track of only the last match of a repetition. However, there is an experimental feature that allows you to change this. More info here, under "Repeated Captures".
There are two other "solutions", though:
Use a regex to match the first pair of numbers, then take the substring with that pair removed and run the regex again on that substring, until all input is consumed.
Use Boost.Spirit; it is probably better suited to parsing input than Boost.Regex.
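The first suggestion (match one pair, then continue on the rest of the input) can be sketched with a regex iterator. This is only an illustration under assumptions: it uses std::regex rather than Boost.Regex so that it needs no extra library, the function name polygon_points is made up for this example, and it does not validate the surrounding POLYGON(...) wrapper.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects every "x y" pair found in the text.
std::vector<std::pair<int, int>> polygon_points(const std::string& text) {
    // One pair per match, so no repeated capture group is needed.
    static const std::regex pair_re("(-?\\d+) (-?\\d+)");
    std::vector<std::pair<int, int>> points;
    // The iterator restarts the search right after the previous match.
    for (std::sregex_iterator it(text.begin(), text.end(), pair_re), end;
         it != end; ++it) {
        points.emplace_back(std::stoi((*it)[1].str()),
                            std::stoi((*it)[2].str()));
    }
    return points;
}
```

Calling polygon_points("POLYGON(100 20, 30 40, 20 10, 21 21)") yields four pairs, including the middle ones that the single repeated group loses.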

I got the answer from an IRC channel.
The regular expression is:
static const boost::regex ex("[\\d\\s]+");
or, to also accept a minus sign:
static const boost::regex ex("[\\-\\d\\s]+");

Related

How to ignore certain input lines in C++?

Okay, so a little background: this code is supposed to read through a file containing DNA, count the nucleotides A, C, T and G, print the counts, and do a few other small calculations. My code runs fine for most files, except for files that contain lines starting with # or +. I need to skip those lines to get accurate counts. So my question is: how do I skip or ignore these lines in my calculations?
My code is:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>
int main(int argc, char** argv) {
// Ignore how the above argc and argv are used here
auto arguments = std::vector<std::string>(argv, argv + argc);
// "arguments" box has what you wrote on the right side after &&
if (arguments.size() != 2) {
// ensure you wrote a file name after "./a.out"
std::cout << "Please give a file name as argument\n";
return 1;
}
auto file = std::fstream(arguments[1]);
if (!file) {
// ensure the file name you gave is from the available files
std::cout << "Cannot open " << arguments[1] << "\n";
return 1;
}
auto counts = std::map<char,int>({{'G',0},{'A',0},{'C',0},{'T',0}});
// Just a test loop to print all lines from the file
for (auto dna = std::string(); std::getline(file, dna); ) {
//std::cout << dna << "\n";
for (auto nucleotide:dna) {
counts[nucleotide]=counts[nucleotide] + 1;
}
}
double total = counts['A'] + counts['T'] + counts['G'] + counts['C'];
double GC = (counts['G'] + counts['C'])*100/total;
double AT = (counts['A'] + counts['T'])*100/total;
double ratio = AT/GC;
auto classification = "";
if (40.0 < GC && GC < 60.0) {
classification = "moderate GC content";
}
if (60 <= GC) {
classification = "high GC content";
}
if (GC <= 40.0) {
classification = "low GC content";
}
std::cout << "GC-content: " << GC << "\n";
std::cout << "AT-content: " << AT << "\n";
std::cout << "G count: " << counts['G'] << "\n";
std::cout << "C count: " << counts['C'] << "\n";
std::cout << "A count: " << counts['A'] << "\n";
std::cout << "T count: " << counts['T'] << "\n";
std::cout << "Total count: " << total << "\n";
std::cout << "AT/GC Ratio: " << ratio << "\n";
std::cout << "GC Classification: " << classification << "\n";
}
The file that is giving me trouble looks like this:
#ERR034677.1 HWI-EAS349_0046:7:1:2144:972#0 length=76
NGATGATAAACAAGAGGGTAAAAAGAAAAAAGCTACAGACATTTCTGCTAATCTATTATTTTGTTCCTTTTTTTTT
+ERR034677.1 HWI-EAS349_0046:7:1:2144:972#0 length=76
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
If anyone can help me with this, I will be very grateful. I only need a hint, or an idea of the concept I am missing, so I can make my code work with all files. Thanks in advance.

Your actual problem seems to be the standard case of "input is not always clean syntax".
The solution is always "do not expect clean syntax".
First read whole lines into a buffer.
Then check for syntax.
Skip broken syntax.
Scan clean syntax from buffer.

Is it possible to have memory problems that don’t crash a program?

I wrote a text cipher program. It seems to work on text strings a few characters long but not on longer ones. It gets the input text by reading from a text file. On longer text strings, it still runs without crashing, but it doesn't seem to work properly.
Below I have isolated the code that performs the text scrambling. In case it is useful, I am running this in a virtual machine running Ubuntu 19.04. When running the code, enter auto when prompted. I removed the rest of the code so it wasn't too long.
#include <iostream>
#include <string>
#include <sstream>
#include <random>
#include <cmath>
#include <cctype>
#include <chrono>
#include <fstream>
#include <new>
bool run_cypher(char (&a)[27],char (&b)[27],char (&c)[11],char (&aa)[27],char (&bb)[27],char (&cc)[11]) {
//lowercase cypher, uppercase cypher, number cypher, lowercase original sequence, uppercase original sequence, number original sequence
std::ifstream out_buffer("text.txt",std::ios::in);
std::ofstream file_buffer("text_out.txt",std::ios::out);
//out_buffer.open();
out_buffer.seekg(0,out_buffer.end);
std::cout << "size of text: " << out_buffer.tellg() << std::endl;//debug
const int size = out_buffer.tellg();
std::cout << "size: " << size << std::endl;//debug
out_buffer.seekg(0,out_buffer.beg);
char *out_array = new char[size + 1];
std::cout << "size of out array: " << sizeof(out_array) << std::endl;//debug
for (int u = 0;u <= size;u = u + 1) {
out_array[u] = 0;
}
out_buffer.read(out_array,size);
out_buffer.close();
char original[size + 1];//debug
for (int bn = 0;bn <= size;bn = bn + 1) {//debug
original[bn] = out_array[bn];//debug
}//debug
for (int y = 0;y <= size - 1;y = y + 1) {
std::cout << "- - - - - - - -" << std::endl;
std::cout << "out_array[" << y << "]: " << out_array[y] << std::endl;//debug
int match;
int case_n; //0 = lowercase, 1 = uppercase
if (isalpha(out_array[y])) {
if (islower(out_array[y])) {
//std::cout << "out_array[" << y << "]: " << out_array[y] << std::endl;//debug
//int match;
for (int ab = 0;ab <= size - 1;ab = ab + 1) {
if (out_array[y] == aa[ab]) {
match = ab;
case_n = 0;
std::cout << "matched letter: " << aa[match] << std::endl;//debug
std::cout << "letter index: " << match << std::endl;//debug
std::cout << "case_n: " << case_n << std::endl;//debug
}
}
}
if (isupper(out_array[y])) {
for (int cv = 0;cv <= size - 1;cv = cv + 1) {
if (out_array[y] == bb[cv]) {
case_n = 1;
match = cv;
std::cout << "matched letter: " << bb[match] << std::endl;//debug
std::cout << "letter index: " << match << std::endl;//debug
std::cout << "case_n: " << case_n << std::endl;//debug
}
}
}
if (case_n == 0) {
out_array[y] = a[match];
std::cout << "replacement letter: " << a[match] << " | new character: " << out_array[y] << std::endl;//debug
}
if (case_n == 1) {
std::cout << "replacement letter: " << b[match] << " | new character: " << out_array[y] << std::endl;//debug
out_array[y] = b[match];
}
}
if (isdigit(out_array[y])) {
for (int o = 0;o <= size - 1;o = o + 1) {
if (out_array[y] == cc[o]) {
match = o;
std::cout << "matched letter: " << cc[match] << std::endl;//debug
std::cout << "letter index: " << match << std::endl;//debug
}
}
out_array[y] = c[match];
std::cout << "replacement number: " << c[match] << " | new character: " << out_array[y] << std::endl;//debug
}
std::cout << "- - - - - - - -" << std::endl;
}
std::cout << "original text: " << "\n" << original << "\n" << std::endl;
std::cout << "encrypted text: " << "\n" << out_array << std::endl;
delete[] out_array;
return 0;
}
int main() {
const int alpha_size = 27;
const int num_size = 11;
char l_a_set[] = "abcdefghijklmnopqrstuvwxyz";
char cap_a_set[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char n_a_set[] = "0123456789";
std::cout << "sizeof alpha_set: " << std::endl;//debug
char lower[alpha_size] = "mnbvcxzasdfghjklpoiuytrewq";
char upper[alpha_size] = "POIUYTREWQASDFGHJKLMNBVCXZ";
char num[num_size] = "9876543210";
int p_run; //control variable. 1 == running, 0 == not running
int b[alpha_size]; //array with values expressed as index numbers
std::string mode;
int m_set = 1;
while (m_set == 1) {
std::cout << "Enter 'auto' for automatic cypher generation." << std::endl;
std::cout << "Enter 'manual' to manually enter in a cypher. " << std::endl;
std::cin >> mode;
std::cin.ignore(1);
std::cin.clear();
if (mode == "auto") {
p_run = 2;
m_set = 0;
}
if (mode == "manual") {
p_run = 3;
m_set = 0;
}
}
if (p_run == 2) { //automatic mode
std::cout <<"lower cypher: " << lower << "\n" << "upper cypher: " << upper << "\n" << "number cypher: " << num << std::endl;//debug
run_cypher(lower,upper,num,l_a_set,cap_a_set,n_a_set);
return 0;//debug
}
while (p_run == 3) {//manual mode
return 0;//debug
}
return 0;
}
For example, using an array containing “mnbvcxzasdfghjklpoiuytrewq” as the cipher for lower case letters, I get “mnbv” if the input is “abcd”. This is correct.
If the input is “a long word”, I get “m gggz zzzv” as the output when it should be “m gkjz rkov”. Sort of correct, but still wrong. If I use “this is a very very long sentence that will result in the program failing” as the input, I get "uas” as the output, which is completely wrong. The program still runs, but it fails to function as intended. So as you can see, it does work, but not on text strings that are even remotely long. Is this a memory problem, or did I make a horrible mistake somewhere?

For your specific code, you should run it through a memory checking tool such as valgrind, or compile with an address sanitizer.
Here are some examples of memory problems that most likely won't crash your program:
Forgetting to delete a small object, which is allocated only once in the program. A memory leak can remain undetected for decades, if it does not make the program run out of memory.
Reading from allocated uninitialized memory. May still crash if the system allocates objects lazily at the first write.
Writing slightly out of bounds after an object that sits on the heap and whose size satisfies sizeof(obj) % 8 != 0. This is because heap allocation is usually done in multiples of 8 or 16. You can read about it in the answers to this SO question.
Dereferencing a nullptr does not crash on some systems. For example AIX used to put zeros at and near address 0x0. Newer AIX might still do it.
On many systems without memory management, address zero is either a regular memory address, or a memory mapped register. This memory can be accessed without crashing.
On any system I have tried (POSIX based), it was possible to allocate valid memory at address zero through memory mapping. Doing so can even make writing through nullptr work without crashing.
This is only a partial list.
Note: these memory problems are undefined behavior. This means that even if the program does not crash in debug mode, the compiler might make wrong assumptions during optimization. If it does, it might generate optimized code that crashes.
For example, most compilers will optimize this:
int a = *p; // implies that p != nullptr
if (p)
boom(p);
Into this:
int a = *p;
boom(p);
If a system allows dereferencing nullptr, then this code might crash after optimization. It will not crash due to the dereferencing, but because the optimization did something the programmer did not foresee.
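One thing a memory checker would likely flag in the question's code: the lookup loops run up to the text size (size - 1) while indexing the 27-element alphabet arrays. For short inputs the search stops too early and a stale match index is reused; for long inputs the loops read past the end of the arrays. A minimal sketch of a bounded lookup, assuming the same lowercase cipher as in the question; the names substitute and encrypt_lower are hypothetical and not part of the original program:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the cipher character for c, or c itself
// when c is not part of the plain alphabet.
char substitute(char c, const std::string& plain, const std::string& cipher) {
    const std::size_t index = plain.find(c);   // search is bounded by plain.size()
    if (index == std::string::npos || index >= cipher.size()) {
        return c;                              // leave unknown characters untouched
    }
    return cipher[index];
}

// Applies the lowercase cipher from the question to a whole string.
std::string encrypt_lower(const std::string& text) {
    const std::string plain  = "abcdefghijklmnopqrstuvwxyz";
    const std::string cipher = "mnbvcxzasdfghjklpoiuytrewq";
    std::string out = text;
    for (char& c : out) {
        c = substitute(c, plain, cipher);
    }
    return out;
}
```

With this bound, encrypt_lower("a long word") gives the "m gkjz rkov" the question expects, independent of the input length.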

Integer overflow and std::stoi

If x > INT_MAX or x < INT_MIN, the function should return 0... or that's what I'm trying to do :)
In my test case I pass in INT_MAX + 1 (2147483648) to introduce integer overflow and see how the program handles it.
When I step through, my IDE debugger says that the value immediately becomes -2147483648 upon overflow, and for some reason the program executes beyond both of these statements:
if (x > INT_MAX)
if (x < INT_MIN)
and then crashes at int revInt = std::stoi(strNum);
saying out of range.
It must be something simple, but it's got me stumped. Why isn't the program returning before it ever gets to that std::stoi(), given x > INT_MAX? Any help appreciated. Thanks! Full listing of the function and test bed below (sorry, I'm having trouble with the code insertion formatting):
#include <iostream>
#include <algorithm>
#include <climits> // INT_MAX, INT_MIN
#include <string> //using namespace std;
class Solution {
public: int reverse(int x)
{
// check special cases for int and set flags:
// is x > max int, need to return 0 now
if(x > INT_MAX)
return 0;
// is x < min int, need to return 0 now
if(x < INT_MIN)
return 0;
// is x < 0, need negative sign handled at end
// does x end with 0, need to not start new int with 0 if it's ploy numeric and the functions used handle that for us
// do conversion, reversal, output:
// convert int to string
std::string strNum = std::to_string(x);
// reverse string
std::reverse(strNum.begin(), strNum.end());
// convert reversed string to int
int revInt = std::stoi(strNum);
// multiply by -1 if x was negative
if (x < 0)
revInt = revInt * -1;
// output reversed integer
return revInt;
}
};
Main:
#include <iostream>
int main(int argc, const char * argv[]) {
// test cases
// instance Solution and call it's method
Solution sol;
int answer = sol.reverse(0); // 0
std::cout << "in " << 0 << ", out " << answer << "\n";
answer = sol.reverse(-1); // -1
std::cout << "in " << -1 << ", out " << answer << "\n";
answer = sol.reverse(10); // 1
std::cout << "in " << 10 << ", out " << answer << "\n";
answer = sol.reverse(12); // 21
std::cout << "in " << 12 << ", out " << answer << "\n";
answer = sol.reverse(100); // 1
std::cout << "in " << 100 << ", out " << answer << "\n";
answer = sol.reverse(123); // 321
std::cout << "in " << 123 << ", out " << answer << "\n";
answer = sol.reverse(-123); // -321
std::cout << "in " << -123 << ", out " << answer << "\n";
answer = sol.reverse(1024); // 4201
std::cout << "in " << 1024 << ", out " << answer << "\n";
answer = sol.reverse(-1024); // -4201
std::cout << "in " << -1024 << ", out " << answer << "\n";
answer = sol.reverse(2147483648); // 0
std::cout << "in " << 2147483648 << ", out " << answer << "\n";
answer = sol.reverse(-2147483648); // 0
std::cout << "in " << -2147483648 << ", out " << answer << "\n";
return 0;
}

Any test like (x > INT_MAX) with x of type int will never evaluate to true, since the value of x cannot exceed INT_MAX.
Anyway, even though 2147483647 is within range, its reverse, 7463847412, is not.
So I think it's better to let stoi "try" to convert the value and "catch" any out_of_range exception. The following code illustrates this approach:
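#include <iostream>
#include <stdexcept>
#include <string>
int convert() {
const char* num = "12345678890123424542";
try {
int x = std::stoi(num);
return x;
} catch (const std::out_of_range &) {
// the digits do not fit into an int
std::cout << "invalid." << std::endl;
return 0;
}
}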

C++ Regex Alpha without Equal sign

I'm new to regex and C++.
My problem is that '=' matches when I search for [a-zA-Z]. But shouldn't this match only letters, not '='?
Can anyone help me, please?
string string1 = "s=s;";
enum states state = s1;
regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
regex rg_left_letter("[a-zA-Z]");
regex rg_equal("[=]");
regex rg_right_letter("[a-zA-Z0-9]");
regex rg_semicolon("[;]");
for (const auto &s : string1) {
cout << "Current Value: " << s << endl;
// step(&state, s);
if (regex_search(&s, rg_left_letter)) {
cout << "matching: " << s << endl;
} else {
cout << "not matching: " << s << endl;
}
// cout << "Step Executed with sate: " << state << endl;
}
This outputs:
Current Value: s
matching: s
Current Value: =
matching: =
Current Value: s
matching: s
Current Value: ;
not matching: ;

When you write
regex_search(&s, rg_left_letter)
you basically search the C-String &s for a match character-wise, beginning at the character s. Therefore, your loop will search for a match in the remaining sub-strings
s=s;
=s;
s;
;
Each of these searches succeeds, except the last one, because every other substring still contains at least one character that fits your regex. Note that this relies on the string's buffer being null-terminated. Since C++11 the storage of a std::string is contiguous and ends in a '\0', so this works, but passing the address of a single character as a C string is fragile and hides what is actually being searched.
What you really want to use is the function regex_match, together with your original regex, as simply as:
#include <iostream>
#include <regex>
int main()
{
std::regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
if(std::regex_match("s=s;", statement)) { std::cout << "Hooray!\n"; }
}
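The difference between searching the remaining C string and matching a single character can be shown in isolation. This is a small sketch with made-up helper names (contains_letter, is_letter); it only illustrates the behaviour described above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if ANY character of the C string is a letter.
// This is what regex_search(&s, ...) effectively asks in the question.
bool contains_letter(const char* text) {
    static const std::regex letter("[a-zA-Z]");
    return std::regex_search(text, letter);
}

// Hypothetical helper: true only if this one character is a letter.
bool is_letter(char c) {
    static const std::regex letter("[a-zA-Z]");
    return std::regex_match(std::string(1, c), letter);
}
```

contains_letter("=s;") is true because of the trailing s, which is why '=' was reported as matching, while is_letter('=') is false.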

This is working for me:
int main(void) {
string string1 = "s=s;";
enum states state = s1;
regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
regex rg_left_letter("[a-zA-Z]");
regex rg_equal("[=]");
regex rg_right_letter("[a-zA-Z0-9]");
regex rg_semicolon("[;]");
//for (const auto &s : string1) {
for (int i = 0; i < string1.size(); i++) {
cout << "Current Value: " << string1[i] << endl;
// step(&state, s);
if (regex_match(string1.substr(i, 1), rg_left_letter)) {
cout << "matching: " << string1[i] << endl;
} else {
cout << "not matching: " << string1[i] << endl;
}
// cout << "Step Executed with sate: " << state << endl;
}
cout << endl;
return 0;
}

Finding a number between 2 numbers using regex/boost in c++

I feel like this is a pretty basic question but I did not find a post for it. If you know one please link it below.
So what I'm trying to do is look through a string and extract the numbers in groups of two digits.
Here is my code:
int main() {
string line = "P112233";
boost::regex e ("P([0-9]{2}[0-9]{2}[0-9]{2})");
boost::smatch match;
if (boost::regex_search(line, match, e))
{
boost::regex f("([0-9]{2})"); //finds 11
boost::smatch match2;
line = match[0];
if (boost::regex_search(line, match2, f))
{
float number1 = boost::lexical_cast<float>(match2[0]);
cout << number1 << endl; // this works and prints out 11.
}
boost::regex g(" "); // here I want it to find the 22
boost::smatch match3;
if (boost::regex_search(line, match3, g))
{
float number2 = boost::lexical_cast<float>(match3[0]);
cout << number2 << endl;
}
boost::regex h(" "); // here I want it to find the 33
boost::smatch match4;
if (boost::regex_search(line, match4, h))
{
float number3 = boost::lexical_cast<float>(match4[0]);
cout << number3 << endl;
}
}
else
cout << "found nothing"<< endl;
return 0;
}
I was able to get the first number, but I have no idea how to get the second (22) and third (33).
What is the proper expression I need to use?

As @Cornstalks mentioned, you need to use three capture groups, which you then access like this:
#include <boost/regex.hpp>
#include <iostream>
#include <string>
int main()
{
std::string line = "P112233";
boost::regex e("P([0-9]{2})([0-9]{2})([0-9]{2})");
boost::smatch match;
if (boost::regex_search(line, match, e))
{
std::cout << match[0] << std::endl; // prints the whole string
std::cout << match[1] << ", " << match[2] << ", " << match[3] << std::endl;
}
return 0;
}
Output:
P112233
11, 22, 33
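The captured groups are sub-matches, that is, text. If numeric values are needed, each group can be converted afterwards. The sketch below assumes std::regex instead of Boost.Regex (the match interface used here is the same) and uses a made-up function name, digit_pairs.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the three two-digit groups as integers,
// or an empty vector when the pattern is not found.
std::vector<int> digit_pairs(const std::string& line) {
    static const std::regex e("P([0-9]{2})([0-9]{2})([0-9]{2})");
    std::smatch match;
    std::vector<int> numbers;
    if (std::regex_search(line, match, e)) {
        // match[0] is the whole match; the capture groups start at index 1.
        for (std::size_t i = 1; i < match.size(); ++i) {
            numbers.push_back(std::stoi(match[i].str()));
        }
    }
    return numbers;
}
```

For "P112233" this returns the values 11, 22 and 33 as int rather than as strings.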

I don't favour regular expressions for this kind of parsing. The key point being that the numbers are still strings when you're done with that hairy regex episode.
I'd use Boost Spirit here instead, which parses into the numbers all at once, and you don't even have to link to the Boost Regex library either, because Spirit is header-only.
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
static qi::int_parser<int, 10, 2, 2> two_digits;
int main() {
std::string const s = "P112233";
std::vector<int> nums;
if (qi::parse(s.begin(), s.end(), "P" >> *two_digits, nums))
{
std::cout << "Parsed " << nums.size() << " pairs of digits:\n";
for(auto i : nums)
std::cout << " * " << i << "\n";
}
}
Output:
Parsed 3 pairs of digits:
* 11
* 22
* 33
