Regex for different newlines - c++

Say I have a text, represented as std::string, which contains several different newlines, e.g. \r\n but also just \n or even just \r.
I would like to unify this by replacing all non-\r\n newlines, namely all lone \r and all lone \n, with \r\n.
A simple boost::replace_all(text, "\n", "\r\n"); unfortunately doesn't work, because it would also replace the \n within the already valid \r\n's.
I think std::regex should be a good way to handle this... but how should I express this in a regex? Here is some code:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string text = "a\rb\nc\r\nd\n";
std::regex reg(""); // What to put here?
text = std::regex_replace(text, reg, "\r\n");
std::cout << text;
}
The text should at the end just be "a\r\nb\r\nc\r\nd\r\n"

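std::regex reg("\r\n|\r|\n");
should match. The pattern belongs in the std::regex, not in the replacement string, and \r\n has to be the first alternative: ECMAScript alternation is tried left to right, so with \r first a Windows line ending would be replaced as two separate breaks.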
More info here:
Match linebreaks - \n or \r\n?

You could handle lone \n in two steps (a lone \r is not covered by these and would need a third step):
\n -> \r\n
\r\r\n -> \r\n
or in one step:
(?:\r\n|\n|\r) -> \r\n
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string text = "a\rb\nc\r\nd\n";
text = std::regex_replace(text, std::regex("(?:\\r\\n|\\n|\\r)"), "\r\n");
std::cout << text;
}

To replace a "\n" that is not preceded by "\r", a lookbehind would be the natural tool, but std::regex (ECMAScript grammar) does not support lookbehinds. A capture group gets close:
std::regex_replace(text, std::regex("([^\r])\n"), "$1\r\n");
Because each match also consumes the character before the "\n", this misses a "\n" at the very start of the string and can miss some of the newlines in a run such as "\n\n\n", so you would need another operation.
To replace a "\r" that is not followed by "\n" is easier, because std::regex does support negative lookahead:
std::regex_replace(text, std::regex("\r(?!\n)"), "\r\n");

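\R stands for any kind of linebreak, e.g. \n, \r or \r\n. It is available in Boost.Regex and PCRE-style engines, but std::regex (ECMAScript grammar) does not support it.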

Related

Problem removing backslash characters on std::string

I'm trying to execute CMD commands which I'm getting deserializing a JSON message.
When I deserialize the message, I store the value in a std::string variable whose value is "tzutil /s \"Romance Standard Time_dstoff\"".
I would like to remove the backslash characters ('\') when I receive commands with quoted parameters (e.g. "tzutil /s "Romance Standard Time_dstoff"").
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
system(command.c_str());
Is there any way to do it?
I would appreciate any kind of help.
If you wish to remove all occurrences of the character, then you may use
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), char_to_remove), str.end());
If you wish to replace them with another character then try
#include <algorithm>
std::replace(str.begin(), str.end(), old_char, new_char);
Here is a function I made in C++ for one of my own projects for replacing sub-strings.
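#include <string>

std::string
Replace(std::string str,
        const std::string& oldStr,
        const std::string& newStr)
{
    if (oldStr.empty())
        return str; // nothing to search for; also avoids an endless loop
    size_t index = str.find(oldStr);
    while (index != std::string::npos)
    {
        // Replace the match in place, then resume searching after the
        // inserted text so that newStr itself is never re-scanned.
        str.replace(index, oldStr.size(), newStr);
        index = str.find(oldStr, index + newStr.size());
    }
    return str;
}
int main(){
    std::string command = GetCommandFromJsonSource(); // placeholder for your own JSON deserialization
    command = Replace(command, "\\\"", "\""); // unescape only double quotes
}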
Although the source code of your program does contain backslashes, the string represented by the literal doesn't contain any, as demonstrated by the following example:
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
std::cout << command;
// output:
tzutil /s "Romance Standard Time_dstoff"
As such, there is nothing to remove from the string.
Backslash is an escape character. \" is an escape sequence that represents a single character, the double quote. It is a way to type a double quote character within a string literal without that quote being interpreted as the end of the string instead.
To write a backslash into a string literal, escape it with another backslash. The following string does contain backslashes: "tzutil /s \\\"Romance Standard Time_dstoff\\\"". In this case, removing all backslashes can be done like so:
command.erase(std::remove(command.begin(), command.end(), '\\'), command.end());
However, simply removing all instances of the character might not be sensible. If your string contains escape sequences, what you probably want instead is to unescape them. This is somewhat more complicated: rather than removing all backslashes, you would replace \" with ", \\ with \, \n with a newline, and so on.
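You can use std::quoted to convert to and from a quoted string. Note that it only understands its own escape character: when reading, \" becomes " and \\ becomes \, but any other character after a backslash is taken literally, so \n turns into a plain n, not a newline.
#include <iomanip> // -> std::quoted
#include <iostream>
#include <sstream>
#include <string>
int main() {
    std::istringstream s(R"("tzutil /s \"Romance Standard Time_dstoff\"")");
    std::string command;
    s >> std::quoted(command);                 // reads and unescapes: tzutil /s "Romance Standard Time_dstoff"
    std::cout << command << '\n';
    std::cout << std::quoted(command) << '\n'; // writes it back escaped: "tzutil /s \"Romance Standard Time_dstoff\""
}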

How to match "\n" in Poco::RegularExpression C++?

My current code is:
#include <iostream>
#include <Poco/Foundation.h>
#include <Poco/RegularExpression.h>
int main()
{
Poco::RegularExpression regex("[A-Z]+\s+[A-Z]+");
Poco::RegularExpression::MatchVec mvec;
const std::string astring = "ABC\nDEFG";
int matches = regex.match(astring,0,mvec);
std::cout << "Hello World\n";
return 0;
}
The character at the position of the '\n' in the string I am trying to match can be a single space, multiple spaces, or a newline (hence why I am using the whitespace metacharacter).
The number of matches returned is zero. Is there a flag I need to set or something?
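The problem is the escape sequence in your regex.
In this case you want the regex engine to receive a backslash (\) followed by s, i.e. the token \s, but inside a C/C++ (or Java) string literal a backslash must be written as \\. A plain "\s" is not a valid escape sequence; compilers typically warn about it and drop the backslash, so the engine sees s+ instead of \s+. To fix your problem you must add another backslash: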
Poco::RegularExpression regex("[A-Z]+\\s+[A-Z]+");
Here you can find the reference:
http://en.cppreference.com/w/cpp/language/escape
This should work
Poco::RegularExpression s ("\\s"); // Whitespace character
Poco::RegularExpression n ("\\n"); // Newline
Poco::RegularExpression r ("\\r"); // Carriage return
Poco::RegularExpression t ("\\t"); // Tab

Unchecked exception while running regex - get file name without extension from file path

I have this simple program
string str = "D:\Praxisphase 1 project\test\Brainstorming.docx";
regex ex("[^\\]+(?=\.docx$)");
if (regex_match(str, ex)){
cout << "match found"<< endl;
}
I expect the result to be true. My regex works when I try it online, but when I run it in C++, the app throws an unhandled exception.
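First of all, use raw string literals when defining a regex to avoid issues with backslashes. In an ordinary literal, \. is not a valid escape sequence (you need "\\." or R"(\.)"), and "[^\\]" reaches the regex engine as [^\], where \] escapes the closing bracket, so the character class is never closed and the std::regex constructor throws std::regex_error. Second, regex_match requires the whole string to match, so use regex_search instead.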
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string str = R"(D:\Praxisphase 1 project\test\Brainstorming.docx)";
// OR
// string str = "D:\\Praxisphase 1 project\\test\\Brainstorming.docx";
regex ex(R"([^\\]+(?=\.docx$))");
if (regex_search(str, ex)){
cout << "match found"<< endl;
}
return 0;
}
Note that R"([^\\]+(?=\.docx$))" is the same pattern as "[^\\\\]+(?=\\.docx$)". In the raw literal, every \ is a literal backslash (and a regex pattern needs two backslashes to match a single \ symbol); in the ordinary literal, four backslashes are needed to produce those two backslashes, which then match a single \ in the input text.

Erase first line from a string

What is the simplest way to erase the first line from a string?
Example:
"abc\ndef\nghi"
=>
"def\nghi"
Use .find to locate the first \n, then .erase to remove everything from the first character up to and including that \n. If the string contains no \n, find returns std::string::npos, and npos + 1 wraps around to 0, so nothing is erased.
#include <iostream>
#include <string>
int main()
{
std::string myString = "abc\ndef\nghi";
myString.erase(0, myString.find("\n") + 1);
std::cout << myString;
}
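Caesar's answer fails when the text uses lone \r line endings, because the conventions differ between platforms:
\n => Unix (including modern macOS)
\r\n => Windows
\r => classic Mac OS (before OS X)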
A better way using boost::regex could be:
boost::regex kNewLine("\r\n|\n|\r");
boost::split_regex(oSplitMessage, iRawMessage, kNewLine);
I hope it helps.

Getting rid of newline in CString in C++

I have an HTML/XML document that is originally a CString, and I want to get rid of all the newlines, essentially putting everything onto one line. I've tried converting it to std::string and using:
#include <algorithm>
#include <string>
str.erase(std::remove(str.begin(), str.end(), '\n'), str.end());
But it didn't work.
To stop your block of text from looking odd, you'd want to replace the line breaks with a space. Make sure to replace both newline ('\n') and carriage return ('\r') characters.
CString str = _T("Line 1 Windows Style\r\n Line 2 Unix Style\n Line 3");
str.Replace(_T("\r"), _T(" "));
str.Replace(_T("\n"), _T(" "));
str.Replace(_T("  "), _T(" ")); // collapse double spaces into one
You only need to use the Remove method:
CString str = _T("Test newline \nremove"), str2;
str.Remove('\n');
How about this?
str.Replace(_T("\n"), _T(""));