C++ std::regex segmentation fault

I have various data that I need to parse in order to extract a weight from it.
I'm using:
C++11
std::regex
Debian 9.9
gcc 6.3.0
The problem is that a segmentation fault occasionally occurs, although very rarely.
The input that triggers it consists mostly of space and newline characters.
Here is the regex:
(?:\b(?:(kilogram\.*s*\.*|kg\.*s*\.*)(?:[^[:alnum:]])*)(?:\s*weight\s*)*(?:\s*is\s*|\s*are\s*)*)\W*([\d\.,]*\d+\b)|(?:(?:[\s\.]?|^)([\d\.,]*\d+)\W*(kilogram\.*s*\.*|kg\.*s*\.*)\b)
Here is an example that works on regex101.com but causes a segmentation fault in C++ on my Debian server: regex101
Here are some more regex101 examples of input, to quickly give an idea of what the regex is searching for.
Here is an example of C++ code that fails.
And here is the same C++ code that works, but using another online compiler (cpp.sh).
Can someone please help me to solve this segmentation fault problem?
Thank you.

I have the same issue with simple regexes such as .+ and [a-zA-Z0-9\\+/=]+.
I have tried different compilers: g++, clang++, clang-cl on Windows, and g++, clang++ on Linux (WSL).
On Windows, the application freezes and then exits abruptly. On Ubuntu (WSL), I get a segmentation fault.
With g++ on Windows, the error occurs with C++11, C++14, C++17 and C++20.
Limit
In your example, the data (regex101) has 31275 characters, which I suppose is too many for regex_match.
Here is the program I used to guess the maximal length of the data.
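#include <iostream>
#include <regex>
#include <string>

int main(int argc, char **argv) {
    // Input length: first command-line argument, or 30000 by default.
    int length = argc > 1 ? std::stoi(std::string(argv[1])) : 30000;
    std::regex testRegex(".+");
    // Build a string made of `length` copies of 'a'.
    std::string data = "";
    for (int i = 0; i < length; ++i) {
        data += "a";
    }
    // When the input is too long, the program crashes here instead of printing.
    std::cout << "Match: " << std::regex_match(data, testRegex) << std::endl;
    return 0;
}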
// Limit before crash (it's a bit random so the limit is not accurate)
// Windows 11
// clang++ Windows : 4999998
// clang-cl Windows : 4999998
// g++ Windows : 6833
// WSL Ubuntu 20.04
// clang++ WSL : 23804
// g++ WSL : 26187
How to solve
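According to this test, the data has a size limit that depends on the compiler and platform, and the application crashes if the limit is exceeded. This is consistent with a stack overflow: libstdc++'s matcher is known to recurse as it consumes the input, so a long enough input can exhaust the stack.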
What you can do is:
Remove some unnecessary spaces before using regex_match
Split the data in half
On Windows, you can use clang++ to increase the limit to 5M chars
In my case, I split the data in half, because the regex [a-zA-Z0-9\\+/=]+ does not need to see the entire input at once.
If anybody knows how to increase the limit (with some flags or a #define), I am interested.
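The first two workarounds listed above can be sketched as follows. This is a minimal sketch, not a definitive fix: collapse_whitespace, matches_in_chunks and the chunk size of 4096 are hypothetical names and values chosen for illustration (4096 is simply below the smallest limit measured above). Chunked matching is only valid for patterns such as [a-zA-Z0-9+/=]+, where every non-empty piece of a matching input also matches. It would not work for the weight regex in the question, where a match could straddle a chunk boundary.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Replace every run of whitespace with a single space. This deliberately
// avoids std::regex so that the clean-up step cannot hit the same limit.
std::string collapse_whitespace(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    bool in_space = false;
    for (unsigned char c : input) {
        if (std::isspace(c)) {
            if (!in_space) out += ' ';
            in_space = true;
        } else {
            out += static_cast<char>(c);
            in_space = false;
        }
    }
    return out;
}

// Check that the whole input matches `re` by testing fixed-size pieces.
// Assumption: the pattern is a simple repeated character class, so the
// input matches exactly when every piece matches.
bool matches_in_chunks(const std::string& data, const std::regex& re,
                       std::size_t chunk_size = 4096) {
    if (data.empty()) return std::regex_match(data, re);
    for (std::size_t pos = 0; pos < data.size(); pos += chunk_size) {
        if (!std::regex_match(data.substr(pos, chunk_size), re)) return false;
    }
    return true;
}
```

For the question's input, which is mostly spaces and newlines, calling collapse_whitespace before searching should shrink the data well below the observed limits.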

Related

c++ (on Clion) for loop stops in the middle with no errors (exit code 0) [duplicate]

When using CLion I have found the output sometimes cuts off.
For example when running the code:
main.cpp
#include <stdio.h>
int main() {
    int i;
    for (i = 0; i < 1000; i++) {
        printf("%d\n", i);
    }
    fflush(stdout); // Shouldn't be needed as each line ends with "\n"
    return 0;
}
Expected Output
The expected output is the numbers 0-999, each on a new line.
Actual Output
After executing the code multiple times within CLion, the output often changes:
Sometimes it executes perfectly and shows all the numbers 0-999
Sometimes it cuts off at different points (e.g. 0-840)
Sometimes it doesn't output anything
The return code is always 0!
Running the code in a terminal (i.e. not in CLion itself)
However, the code outputs the numbers 0-999 perfectly when compiling and running the code using the terminal.
I have spent so much time on this thinking it was a problem with my code and a memory issue until I finally realised that this was just an issue with CLion.
OS: Ubuntu 14.04 LTS
Version: 2016.1
Build: #CL-145.258
Update
The consensus is that this is an IDE issue. Therefore, I have reported the bug.
A suitable workaround is to run the code in debug mode (no breakpoint required; thanks to @olaf).
I will update this question, as soon as this bug is fixed.
Update 1
WARNING: You should not change information in registry unless you have been asked specifically by JetBrains. Registry is not in the main menu for a reason! Use the following solution at your own risk!!!
JetBrains have contacted me and provided a suitable solution:
Go to the Find Action Dialog box (CTRL+SHIFT+A)
Search for "Registry..."
Untick run.processes.with.pty
Should then work fine!
Update 2
The bug has been added here:
https://youtrack.jetbrains.com/issue/CPP-6254
Feel free to upvote it!

regex_error being thrown when trying to do simple things like [:digit:] or \d

Every time I put [:digit:] in a regex, as in regex r("[:digit:]"), it throws an exception, and .what() just returns regex_error instead of a descriptive, meaningful explanation of the error. The same thing happens when I try regex r("\\d"). And when I try regex r("\d"), my compiler says that \d is an unfamiliar character escape sequence. I'm using Code::Blocks, by the way. Here's my code:
#include <regex>
#include <iostream>
using namespace std;
int main()
{
    regex r("\d"); // and/or r("[:digit:]")
    string i = "5";
    if(regex_match(i,r))
    {
        cout << "Integer";
    }
    return 0;
}
After getting a newer version of Code::Blocks and the MinGW GCC compiler suite it worked.
P.S. I kept having an error when trying to set the compiler after downloading Code::Blocks. I had to go to Global compiler settings and click Reset defaults for it to auto-detect my compiler. As seen here.
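For reference, here is a minimal sketch of the two spellings once a working <regex> implementation is available (this assumes a reasonably recent standard library, for example GCC 4.9 or later). The POSIX class name has to sit inside a bracket expression, so it is written [[:digit:]], and \d needs either a doubled backslash or a raw string literal. The helper names is_digit_posix, is_digit_shorthand and compiles are hypothetical.

```cpp
#include <regex>
#include <string>

// True when `text` is exactly one decimal digit, using the POSIX class.
// Note the doubled brackets: [:digit:] only has its special meaning
// inside a bracket expression.
bool is_digit_posix(const std::string& text) {
    static const std::regex re("[[:digit:]]");
    return std::regex_match(text, re);
}

// Same check with the \d shorthand. The raw string literal R"(...)"
// avoids having to write the backslash twice.
bool is_digit_shorthand(const std::string& text) {
    static const std::regex re(R"(\d)");
    return std::regex_match(text, re);
}

// Try to compile a pattern and report whether the library accepts it.
// std::regex_error also exposes code() for a more specific error category.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

If compiles("\\d") returns false on a given toolchain, that points at the library rather than at the pattern, which matches the outcome described above.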

Clearing the screen in C++ using other compilers

Hello, I'm interested in learning a way to clear the screen. I'm using C++, but it seems that some of the code I could use is only known to work with Windows compilers. I'm using Ubuntu with the g++ compiler.
Here is the code I have researched and tried.
These don't work with the g++ compiler:
system("cls"); error: sh: 1: cls: not found
system("clrscr"); sh: 1: clrscr: not found
I stumbled upon this code, which works, although I know it just prints lots of lines:
cout << string(50, '\n');
Are there any cleaner methods that I could use?
The Unix command for clearing the terminal is clear.
Alternatively, send the terminal codes for doing the same (this varies by terminal, but this sequence works for most):
cout << "\033[H\033[2J";
(I got the sequence by simply running clear | less on my system. Try it and see if you get the same result.)

Regex library not working correctly in c++

I have been looking for resources on working with regex in C++, as I want to learn regular expressions in C++ (please share a step-by-step link if you have one). I am using g++ to compile my programs and working on Ubuntu.
Earlier my programs were not compiling, but then I read this post, which said to compile the program with
"g++ -std=c++0x sample.cpp"
to use the regex header.
My first program works correctly; I tried implementing regex_match:
#include<stdio.h>
#include<iostream>
#include<regex>
using namespace std;
int main()
{
    string str = "Hello world";
    regex rx ("ello");
    if(regex_match(str.begin(), str.end(), rx))
    {
        cout<<"True"<<endl;
    }
    else
        cout<<"False"<<endl;
    return(0);
}
for which my program returned false (as the expression does not match the whole string).
I also rechecked it by making it match, and it works.
Now I am writing another program to use regex_replace and regex_search, neither of which works (for regex_search, just replace regex_match in the above program with regex_search). Kindly help; I don't know where I am going wrong.
The <regex> header is not fully supported by GCC versions before 4.9.
You can see GCC support here.
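With a toolchain where <regex> is fully implemented (assumed here to be GCC 4.9 or later), the three calls from the question behave as sketched below. The wrapper names whole_match, contains_match and replace_all are hypothetical and only there to make the difference easy to test.

```cpp
#include <regex>
#include <string>

// regex_match succeeds only if the pattern matches the entire string.
bool whole_match(const std::string& s, const std::string& pattern) {
    return std::regex_match(s, std::regex(pattern));
}

// regex_search succeeds if the pattern matches anywhere in the string.
bool contains_match(const std::string& s, const std::string& pattern) {
    return std::regex_search(s, std::regex(pattern));
}

// regex_replace substitutes every match of the pattern with `with`.
std::string replace_all(const std::string& s, const std::string& pattern,
                        const std::string& with) {
    return std::regex_replace(s, std::regex(pattern), with);
}
```

For the string "Hello world" and the pattern "ello", whole_match is false and contains_match is true, which is the difference the question ran into.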

swprintf fails with unicode characters in xcode, but works in visual studio

While trying to convert some existing code to support Unicode characters, this problem popped up. If I try to pass a Unicode character (in this case I'm using the euro symbol) into any of the *wprintf functions, it fails, but seemingly only in Xcode. The same code works fine in Visual Studio, and I was even able to get a friend to test it successfully with gcc on Linux. Here is the offending code:
wchar_t _teststring[10] = L"";
int _iRetVal = swprintf(_teststring, 10, L"A¥€");
wprintf(L"return: %d\n", _iRetVal);
// print values stored in string to check if anything got corrupted
for (int i=0; i<wcslen(_teststring); ++i) {
    wprintf(L"%d: (%d)\n", i, _teststring[i]);
}
In xcode the call to swprintf will return -1, while in visual studio it will succeed and proceed to print out the correct values for each of the 3 chars (65, 165, 8364).
I have googled long and hard for solutions, one suggestion that has appeared a number of times is using a call such as:
setlocale(LC_CTYPE, "UTF-8");
I have tried various combinations of arguments with this function with no success. Upon further investigation, it appears to return null if I try to set the locale to any value other than the default "C".
I'm at a loss as to what else I can try to solve this problem, and the fact that it works in other compilers/platforms just makes it all the more frustrating. Any help would be much appreciated!
EDIT:
Just thought I would add that when the swprintf call fails, it sets an error code (92), which is defined as:
#define EILSEQ 92 /* Illegal byte sequence */
It should work if you fetch the locale from the environment:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main(void) {
    setlocale(LC_ALL, "");
    wchar_t _teststring[10] = L"";
    int _iRetVal = swprintf(_teststring, 10, L"A¥€");
    wprintf(L"return: %d\n", _iRetVal);
    // print values stored in string to check if anything got corrupted
    for (int i=0; i<wcslen(_teststring); ++i) {
        wprintf(L"%d: (%d)\n", i, _teststring[i]);
    }
}
On my OS X 10.6, this works as expected with GCC 4.2.1, but when compiled with Clang 1.6, it places the UTF-8 bytes in the result string.
I could also compile this with Xcode (using the standard C++ console application template), but because graphical applications on OS X don't have the required locale environment variables, it doesn't work in Xcode's console. On the other hand, it always works in the Terminal application.
You could also set the locale to en_US.UTF-8 (setlocale(LC_ALL, "en_US.UTF-8")), but that is non-portable. Depending on your goal, there may be better alternatives to swprintf.
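Here is a minimal sketch of the approach above, with a check on setlocale's return value, since the question reports it returning null for some arguments. use_environment_locale and format_test_string are hypothetical helper names. The characters are written as universal character names (\u00A5 and \u20AC) so that the source file's encoding does not matter. Whether swprintf succeeds still depends on the platform and on the locale found in the environment.

```cpp
#include <clocale>
#include <cstddef>
#include <cwchar>

// Adopt whatever locale the environment specifies (LANG, LC_ALL, ...).
// Returns false if the C library rejects the request.
bool use_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Format the three test characters from the question ('A', yen, euro)
// into `buffer`. Returns swprintf's result: the number of wide
// characters written, or a negative value on failure.
int format_test_string(wchar_t* buffer, std::size_t size) {
    return std::swprintf(buffer, size, L"A\u00A5\u20AC");
}
```

A caller would typically run use_environment_locale() once at startup and treat a negative result from format_test_string as a sign that the active locale cannot represent the characters.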
If you are using Xcode 4+ make sure you have set an appropriate encoding for your files that contain your strings. You can find the encoding settings on a right pane under "Text Settings" group.
Microsoft had a plan to be compatible with other compilers starting from VS 2015 but finally it never happened because of problems with legacy code, see link.
Fortunately, you can still enable ISO C (C99) standard behavior in VS 2015 by defining the _CRT_STDIO_ISO_WIDE_SPECIFIERS preprocessor macro. This is recommended when writing portable code.
I found that using "%S" (upper case) in the formatting string works.
"%s" is for 8-bit characters, and "%S" is for 16-bit or 32-bit characters.
See: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Strings/Articles/formatSpecifiers.html
I'm using Qt Creator 4.11, which uses Clang 10.