How to match "\n" in Poco::RegularExpression C++? - c++

My current code is:
#include <iostream>
#include <Poco/Foundation.h>
#include <Poco/RegularExpression.h>
int main()
{
Poco::RegularExpression regex("[A-Z]+\s+[A-Z]+");
Poco::RegularExpression::MatchVec mvec;
const std::string astring = "ABC\nDEFG";
int matches = regex.match(astring,0,mvec);
std::cout << "Hello World\n";
return 0;
}
The separator at the position of the '\n' in the string I am trying to match can be a single space, multiple spaces, or a newline (hence why I am using the whitespace metacharacter).
The number of matches returned is zero. Is there a flag I need to set or something?

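The problem is the escape sequence in your regex.
In this case you want the regex pattern to contain a backslash (\) followed by s, i.e. the token \s, but inside a C/C++ (or Java) string literal a backslash must be written as a double \\. A single \s is not a valid escape sequence in the literal, so most compilers warn and drop the backslash, and the pattern silently becomes [A-Z]+s+[A-Z]+. So, to fix your problem you must add another backslash: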
Poco::RegularExpression regex("[A-Z]+\\s+[A-Z]+");
Here you can find the reference:
http://en.cppreference.com/w/cpp/language/escape
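The same escaping rule applies to any regex engine that receives its pattern from a C++ string literal. As a sketch, here is the corrected pattern with std::regex standing in for Poco::RegularExpression (Poco is not assumed to be available; the helper names twoWordsSeparatedByWhitespace and rawPattern are hypothetical):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `text` contains two runs of capital letters
// separated by whitespace (spaces, tabs, newlines, ...).
inline bool twoWordsSeparatedByWhitespace(const std::string& text)
{
    // "\\s" in the C++ literal puts the two characters '\' and 's' into the
    // pattern, which the regex engine then reads as the whitespace class.
    static const std::regex pattern("[A-Z]+\\s+[A-Z]+");
    return std::regex_search(text, pattern);
}

// The same pattern as a C++11 raw string literal: no doubling needed.
inline std::string rawPattern() { return R"([A-Z]+\s+[A-Z]+)"; }
```

With this pattern, "ABC\nDEFG" and "ABC   DEFG" both match, while "ABCDEFG" does not.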

This should work:
Poco::RegularExpression s ("\\s"); // Whitespace character
Poco::RegularExpression n ("\\n"); // New line
Poco::RegularExpression r ("\\r"); // Carriage return
Poco::RegularExpression t ("\\t"); // Tab
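The patterns above can be checked with a small sketch. Poco is not assumed to be installed here, so std::regex stands in for Poco::RegularExpression, and the helper found() is hypothetical. Note that "\\n" hands the regex engine the escape \n, whereas a single-backslash "\n" would also work because the literal then already contains the newline character itself; \s has no such equivalent, which is why it must be doubled.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: does `pattern` match anywhere in `text`?
inline bool found(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}
```

For example, found("\\s", "\t") and found("\\n", "\n") are both true, while found("\\n", "\\n") is false because that text holds a backslash and an n, not a newline.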

Related

Garbled character output when using cout API

I am trying to run this simple code in VS 2015:
#include "stdafx.h"
# include <iostream>
int main()
{
char * szOldPath = "\"C:\icm\scripts\StartupSync\runall.bat\" nonprod";
std::cout << szOldPath << std::endl;
return 0;
}
However, the output of szOldPath is not correct; the console prints:
unall.bat" nonprodupSync
I suspect this might be because of Unicode and that I should be using wcout. So I disabled Unicode by going to Configuration Properties -> General -> Character Set and tried setting it to Not Set or Multi-Byte, but I still run into this issue.
I understand it is not good to disable UNICODE, but I am trying to understand some legacy code written at our company and this experiment is part of that exercise. Is there any way I can get the cout command to print szOldPath successfully?
Your issue has nothing to do with Unicode.
\r is the escape sequence for a carriage return. The other unrecognized escapes in your literal (\i, \s, \S) simply lose their backslash, so you are printing out "C:icmscriptsStartupSync, and then \r tells the terminal to move the cursor back to the beginning of the current line, and then unall.bat" nonprod is printed, overwriting what was already there and leaving the trailing upSync visible.
You need to escape all of the \ characters in your string literal, just like you had to escape the " characters.
Also, your variable needs to be declared as a pointer to const char when assigning a string literal to the pointer. This is enforced in C++11 and later:
#include "stdafx.h"
#include <iostream>
int main()
{
const char * szOldPath = "\"C:\\icm\\scripts\\StartupSync\\runall.bat\" nonprod";
std::cout << szOldPath << std::endl;
return 0;
}
Alternatively, in C++11 and later, you can use a raw string literal instead to avoid having to escape any characters with a leading \:
const char * szOldPath = R"("C:\icm\scripts\StartupSync\runall.bat" nonprod)";
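To see where unall.bat" nonprodupSync comes from, here is a rough sketch that models a single console line, assuming the behaviour described above: unknown escapes lose their backslash, \r returns the cursor to column 0, and later characters overwrite earlier ones. renderLine() is a hypothetical helper, not a console API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rough model of one console line: '\r' moves the cursor back to column 0,
// every other character overwrites the cell under the cursor (or appends).
// Real consoles also handle '\n', tabs, etc.; this sketch ignores them.
inline std::string renderLine(const std::string& output)
{
    std::string screen;
    std::size_t cursor = 0;
    for (char c : output) {
        if (c == '\r') {
            cursor = 0;                 // carriage return: back to column 0
        } else {
            if (cursor < screen.size()) screen[cursor] = c;  // overwrite
            else screen.push_back(c);                        // extend line
            ++cursor;
        }
    }
    return screen;
}
```

Feeding it "\"C:icmscriptsStartupSync\runall.bat\" nonprod" (what the original literal becomes) reproduces the garbled output, while the properly escaped literal renders unchanged.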

Problem removing backslash characters on std::string

I'm trying to execute CMD commands that I get by deserializing a JSON message.
When I deserialize the message, I store the value in a std::string variable whose value is "tzutil /s \"Romance Standard Time_dstoff\"".
I would like to remove the backslash characters ('\') when I receive commands with quoted parameters (e.g. "tzutil /s "Romance Standard Time_dstoff"").
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
system(command.c_str());
Is there any way to do it?
I would appreciate any kind of help.
If you wish to remove all occurrences of the character then you may use
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), char_to_remove), str.end());
If you wish to replace them with another character then try
#include <algorithm>
std::replace(str.begin(), str.end(), old_char, new_char);
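A sketch wrapping both idioms in small functions (withoutChar and withCharReplaced are hypothetical names), applied to a string that really does contain backslashes:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Erase-remove idiom: std::remove shifts the kept characters to the front
// and returns the new logical end; erase() then drops the leftover tail.
inline std::string withoutChar(std::string str, char charToRemove)
{
    str.erase(std::remove(str.begin(), str.end(), charToRemove), str.end());
    return str;
}

// std::replace overwrites every occurrence of oldChar in place.
inline std::string withCharReplaced(std::string str, char oldChar, char newChar)
{
    std::replace(str.begin(), str.end(), oldChar, newChar);
    return str;
}
```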
Here is a function I made in C++ for one of my own projects for replacing sub-strings.
std::string
Replace(std::string str,
const std::string& oldStr,
const std::string& newStr)
{
size_t index = str.find(oldStr);
while(index != str.npos)
{
str = str.substr(0, index) +
newStr + str.substr(index + oldStr.size());
index = str.find(oldStr, index + newStr.size());
}
return str;
}
int main(){
std::string command = GetCommandFromJsonSource(); // placeholder for however you obtain the deserialized command
command = Replace(command, "\\\"", "\""); // unescape only double quotes
}
Although the source code of your program does contain backslashes, the string represented by the literal doesn't contain any, as demonstrated by the following example:
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
std::cout << command;
// output:
tzutil /s "Romance Standard Time_dstoff"
As such, there is nothing to remove from the string.
Backslash is an escape character. \" is an escape sequence that represents a single character, the double quote. It is a way to type a double quote character within a string literal without that quote being interpreted as the end of the string instead.
To write a backslash into a string literal, you can escape it with another backslash. The following string does contain backslashes: "tzutil /s \\\"Romance Standard Time_dstoff\\\"". In this case, removing all backslashes can be done like so:
command.erase(std::remove(command.begin(), command.end(), '\\'), command.end());
However, simply removing all instances of the character might not be sensible. If your string contains escape sequences, what you probably want to do instead is to unescape them. This is somewhat more complicated: you wouldn't want to remove all backslashes, but instead replace \" with ", \\ with \, \n with a newline, and so on.
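A minimal sketch of such an unescaper, assuming only the common sequences \", \\, \n, \t and \r need translating. unescape() is a hypothetical helper and not a complete JSON or C++ escape decoder (for example, \uXXXX is not handled).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch: translate \n, \t, \r into their control characters and keep any
// other escaped character (including " and the backslash) without its backslash.
inline std::string unescape(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '\\' || i + 1 == in.size()) {
            out += in[i];          // ordinary character, or a trailing lone backslash
            continue;
        }
        char next = in[++i];       // the character after the backslash
        switch (next) {
            case 'n': out += '\n'; break;
            case 't': out += '\t'; break;
            case 'r': out += '\r'; break;
            default:  out += next; break;  // covers \" and the doubled backslash
        }
    }
    return out;
}
```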
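You can use std::quoted (C++14) to read a double-quoted string and to write one back out. Note that on input it only removes the backslash: \" becomes " and \\ becomes \, but \n becomes a plain n rather than a newline.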
#include <iomanip> // -> std::quoted
#include <iostream>
#include <sstream>
#include <string>
int main() {
std::istringstream s("\"Hello world\\n\"");
std::string hello;
s >> std::quoted(hello); // hello is now: Hello worldn (the escaped n loses its backslash)
std::cout << std::quoted(hello) << ": " << hello;
}
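As a sketch of what std::quoted does in each direction (readQuoted and writeQuoted are hypothetical wrappers), including the fact that an escaped n comes back as a plain n:

```cpp
#include <cassert>
#include <iomanip>   // std::quoted (C++14)
#include <sstream>
#include <string>

// Reads one double-quoted token from `text`: std::quoted strips the outer
// quotes and drops the backslash in front of any escaped character.
inline std::string readQuoted(const std::string& text)
{
    std::istringstream in(text);
    std::string value;
    in >> std::quoted(value);
    return value;
}

// Writes `value` surrounded by quotes, escaping embedded quotes and backslashes.
inline std::string writeQuoted(const std::string& value)
{
    std::ostringstream out;
    out << std::quoted(value);
    return out.str();
}
```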

Regex for different newlines

Say I have a text, represented as a std::string, which contains several different newline sequences, e.g. \r\n but also just \n or even just \r.
I would like now to unify this by replacing all non \r\n newlines, namely all \r and all \n newlines with \r\n.
A simple boost::replace_all(text, "\n", "\r\n"); unfortunately doesn't work, because it would also replace the \n within the already valid \r\n sequences.
I think std::regex should be a good way to handle this... but how should I express this in a regex? Here is some code:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string text = "a\rb\nc\r\nd\n";
std::regex reg(""); // What to put here?
text = std::regex_replace(text, reg, "\r\n");
std::cout << text;
}
At the end, the text should just be "a\r\nb\r\nc\r\nd\r\n".
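std::regex reg("\r\n|\r|\n");
should match. The alternatives are tried from left to right, so \r\n must come first; otherwise an existing \r\n pair would be replaced twice.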
More info here:
Match linebreaks - \n or \r\n?
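A sketch of the alternation approach as a complete function (toCrLf is a hypothetical name). The pattern goes into the std::regex object, and the replacement is simply "\r\n":

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replace every line break with "\r\n". Alternatives are tried left to
// right, so an existing "\r\n" is consumed as one unit before the lone
// "\r" or "\n" alternatives get a chance.
inline std::string toCrLf(const std::string& text)
{
    static const std::regex lineBreak("\r\n|\r|\n");
    return std::regex_replace(text, lineBreak, "\r\n");
}
```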
You could do that in two steps:
\n -> \r\n
\r\r\n -> \r\n
or in one step:
(?:\r\n|\n|\r) -> \r\n
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string text = "a\rb\nc\r\nd\n";
text = std::regex_replace(text, std::regex("(?:\\r\\n|\\n|\\r)"), "\r\n");
std::cout << text;
}
To replace a "\n" with no preceding "\r" you can use a lookahead, capturing the preceding character and inserting the "\r" after it:
std::regex_replace(text, std::regex("([^\\r])(?=\\n)"), "$1\r");
This cannot handle a newline at the very beginning of the text (there is no preceding character to match), so you would need another operation.
To replace a "\r" with no following "\n" is a bit easier, using a negative lookahead so that the following character is not consumed:
std::regex_replace(text, std::regex("\\r(?!\\n)"), "\r\n");
Note that std::regex's default ECMAScript grammar does not support lookbehind, in case you were considering it.
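The lookahead idea can be combined into one function. This is a sketch assuming the default ECMAScript grammar of std::regex, which supports (?=...) and (?!...); toCrLfWithLookahead is a hypothetical name, and a leading "\n" is patched by hand because the first regex needs a preceding character.

```cpp
#include <cassert>
#include <regex>
#include <string>

inline std::string toCrLfWithLookahead(std::string text)
{
    // 1) A "\n" preceded by anything but "\r": capture that character and
    //    put it back followed by "\r"; the "\n" is only looked at, not consumed.
    text = std::regex_replace(text, std::regex("([^\\r])(?=\\n)"), "$1\r");
    // 2) A "\r" not followed by "\n": append the missing "\n".
    text = std::regex_replace(text, std::regex("\\r(?!\\n)"), "\r\n");
    // 3) Step 1 cannot see a "\n" at position 0, so fix that case by hand.
    if (!text.empty() && text[0] == '\n') text.insert(0, 1, '\r');
    return text;
}
```

Because step 1 never consumes the "\n", consecutive newlines such as "\n\n\n" are all converted.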
In regex flavors that support it (such as PCRE or Perl, but not std::regex), \R stands for any kind of line break, i.e. \r\n, \n or \r.

unchecked exception while running regex - get file name without extension from file path

I have this simple program
string str = "D:\Praxisphase 1 project\test\Brainstorming.docx";
regex ex("[^\\]+(?=\.docx$)");
if (regex_match(str, ex)){
cout << "match found"<< endl;
}
I expect the result to be true, and my regex works when I try it online, but when I run it in C++, the app throws an unhandled exception.
First of all, use raw string literals when defining a regex to avoid issues with backslashes: \. is not a valid escape sequence in an ordinary literal (you need "\\." or R"(\.)"), and "[^\\]" reaches the regex engine as [^\], an unclosed character class, which makes the std::regex constructor throw std::regex_error. Second, regex_match requires the whole string to match, so use regex_search.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string str = R"(D:\Praxisphase 1 project\test\Brainstorming.docx)";
// OR
// string str = "D:\\Praxisphase 1 project\\test\\Brainstorming.docx";
regex ex(R"([^\\]+(?=\.docx$))");
if (regex_search(str, ex)){
cout << "match found"<< endl;
}
return 0;
}
Note that R"([^\\]+(?=\.docx$))" = "[^\\\\]+(?=\\.docx$)": the backslashes in the first are literal backslashes (and you need two backslashes in a regex pattern to match a \ symbol), while in the second, the 4 backslashes are needed to declare 2 literal backslashes that will match a single \ in the input text.
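A sketch that wraps the corrected regex and also demonstrates why the original pattern threw. docxStem() and isValidRegex() are hypothetical helpers; the second one compiles a pattern and reports whether std::regex rejected it.

```cpp
#include <cassert>
#include <regex>
#include <string>

// File name without its ".docx" extension, or "" if the path does not end
// in ".docx". Uses the answer's regex with regex_search.
inline std::string docxStem(const std::string& path)
{
    static const std::regex ex(R"([^\\]+(?=\.docx$))");
    std::smatch m;
    return std::regex_search(path, m, ex) ? m.str() : std::string();
}

// True if `pattern` compiles. The question's literal yields the pattern
// [^\]+(?=.docx$) whose character class is never closed, so it throws.
inline bool isValidRegex(const std::string& pattern)
{
    try {
        std::regex re(pattern);
        (void)re;
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```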

using \ in a string as literal instead of an escape

bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a\sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
So the problem right now is: Xcode is not reading in the '\'. When I was debugging the stringMatch function, expr appeared to be only 'asb' instead of the literal 'a\sb'.
And Xcode is spitting out a warning at the line:
string a = "a\sb" : Unknown escape sequence
Edit: I have already tried using "a\\sb", but it reads in as "a\\sb" literally.
bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a\\sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
C and C++ treat a backslash as the start of an escape sequence by default. You have to tell the compiler not to treat your backslash that way by adding an extra backslash to your string.
These are the common escape sequences:
\a - Bell(beep)
\b - Backspace
\f - Formfeed
\n - New line
\r - Carriage Return
\t - Horizontal Tab
\\ - Backslash
\' - Single Quotation Mark
\" - Double Quotation Mark
\ooo - Octal Representation
\xdd - Hexadecimal Representation
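As a sketch of the table in reverse, asLiteralBody() (a hypothetical helper) spells an arbitrary string as the body of a C++ string literal, emitting the escapes listed above for backslash, double quote, newline, tab and carriage return:

```cpp
#include <cassert>
#include <string>

// Sketch: the inverse of what the compiler does with escape sequences.
// Only the common escapes are produced; other characters are copied as-is.
inline std::string asLiteralBody(const std::string& text)
{
    std::string out;
    for (char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;  // one backslash becomes two
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            case '\r': out += "\\r";  break;
            default:   out += c;      break;
        }
    }
    return out;
}
```

For the question's string, the four characters a, \, s, b come back as a\\sb, which is exactly what has to be typed between the quotes.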
EDIT: Xcode is behaving abnormally on your machine, so I can suggest this:
bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a" "\x5C" "sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
Don't worry about the spaces in the string a declaration: adjacent string literals separated only by whitespace are concatenated by the compiler (this is standard C and C++, not specific to Xcode).
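A quick check of the concatenation trick (both function names are hypothetical). Keeping "\x5C" as its own literal also matters because a hexadecimal escape keeps consuming hex digits: "\x5Cb" would be read as the single escape \x5Cb, which is out of range for a char.

```cpp
#include <cassert>
#include <string>

// Adjacent literals are joined during translation, so both functions
// return the same four characters: 'a', backslash, 's', 'b'.
inline std::string viaHexEscape() { return "a" "\x5C" "sb"; }
inline std::string viaDoubledBackslash() { return "a\\sb"; }
```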
EDIT 2: Indeed Xcode is reading your "a\\sb" literally; that's how it deals with escaped backslashes. When you output string a = "a\\sb" to the console, you'll see a\sb. But when you pass string a between methods as an argument or as a private member, it will take the extra backslash literally. You have to design your code with this in mind so that it ignores the extra backslash. It's up to you how you handle the string.
EDIT 3: Edit 1 is your optimal answer here, but here's another one.
Add code in your stringMatch() method to replace double backslashes with single backslash.
You just need to add this extra line at the very start of the function:
expr=[expr stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];
This should solve the double backslash problem.
EDIT 4:
Some people think Edit 3 is Objective-C and thus not optimal, so here is another option in plain C++ (which also works in Objective-C++).
void searchAndReplace(std::string& value, std::string const& search,std::string const& replace)
{
std::string::size_type next;
for(next = value.find(search); // Try and find the first match
next != std::string::npos; // next is npos if nothing was found
next = value.find(search,next) // search for the next match starting after
// the last match that was found.
)
{
// Inside the loop. So we found a match.
value.replace(next,search.length(),replace); // Do the replacement.
next += replace.length(); // Move to just after the replace
// This is the point were we start
// the next search from.
}
}
EDIT 5: If you change the const char * parameters of stringMatch() to std::string, it will be less complex for you:
expr.replace(/*size_t*/ pos1, /*size_t*/ n1, /*const string&*/ str );
EDIT 6: From C++11 on, there exists something like raw string literals.
This means you don't have to escape, instead, you can write the following:
string a = R"raw(a\sb)raw";
Note that raw in the literal can be replaced by any delimiter of your choosing (up to 16 characters). This is needed when the string itself contains )", which would otherwise end a plain R"(...)" literal. Raw string literals mainly make sense when you would otherwise have to escape many characters, as in combination with std::regex.
P.S. You have all the answers now, so it's up to you which one you implement to get the best results.
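A sketch showing when the custom delimiter is needed (function names hypothetical): the content below contains )", so a plain R"(...)" literal would end too early, while R"raw(...)raw" ends only at )raw".

```cpp
#include <cassert>
#include <string>

// Same text twice: once as a raw literal with a custom delimiter,
// once as an ordinary literal with escaped quotes and backslash.
inline std::string withDelimiter() { return R"raw(x = f(")"); \s)raw"; }
inline std::string withEscapes() { return "x = f(\")\"); \\s"; }
```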
Xcode is spitting out that warning because it is interpreting \s in "a\sb" as an escape sequence, but \s is not a valid escape sequence. It gets replaced with just s so the string becomes "asb".
Escaping the backslash like "a\\sb" is the correct solution. If this somehow didn't work for you please post more details on that.
Here's an example.
#include <iostream>
#include <string>
int main() {
std::string a = "a\\sb";
std::cout << a.size() << ' ' << a << '\n';
}
The output of this program looks like:
4 a\sb
If you get different output please post it. Also please post exactly what problem you observed when you tried "a\\sb" earlier.
Regexes can be a pain in C++ because backslashes have to be escaped this way. C++11 has raw string literals, which don't process escape sequences at all, so escaping the backslash is unnecessary: R"(a\sb)".
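For illustration, here is a hypothetical implementation of the question's stringMatch(), assuming \s in expr should match exactly one whitespace character and everything else must match literally. Passing R"(a\sb)" or "a\\sb" gives it the same four characters.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stringMatch(): each character of expr must equal the
// corresponding character of str, except that the two-character
// sequence backslash + 's' matches one whitespace character.
inline bool stringMatch(const char* expr, const char* str)
{
    while (*expr != '\0') {
        if (expr[0] == '\\' && expr[1] == 's') {
            if (!std::isspace(static_cast<unsigned char>(*str))) return false;
            expr += 2;
        } else {
            if (*expr != *str) return false;
            ++expr;
        }
        ++str;
    }
    return *str == '\0';  // both strings must be fully consumed
}
```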