What is the simplest way to erase the first line from a string?
Example:
"abc\ndef\nghi"
=>
"def\nghi"
Use .find to locate the first \n, then use .erase to remove everything from the first character up to and including that \n.
#include <iostream>
#include <string>
int main()
{
    std::string myString = "abc\ndef\nghi";
    myString.erase(0, myString.find("\n") + 1);
    std::cout << myString;
}
Caesar's answer fails when the text comes from classic Mac OS, because line endings differ by platform:
\n => Un*x
\r\n => Windows
\r => classic Mac OS (before OS X)
A more robust way, using boost::regex, could be:
boost::regex kNewLine("\r\n|\n|\r");
boost::split_regex(oSplitMessage, iRawMessage, kNewLine);
I hope it helps.
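If Boost is not an option, the same idea can be sketched with std::string alone. The helper name erase_first_line is hypothetical, and treating a string with no line terminator as a single line (so the result is empty) is an assumption, not something the answers above specify.

```cpp
#include <string>

// Hypothetical helper: removes the first line of `text`, accepting
// "\n", "\r\n", or a lone "\r" as the line terminator.
std::string erase_first_line(std::string text)
{
    const std::string::size_type pos = text.find_first_of("\r\n");
    if (pos == std::string::npos) {
        // Assumption: no terminator means the whole string is one line.
        text.clear();
        return text;
    }
    // A "\r" immediately followed by "\n" counts as one two-character terminator.
    std::string::size_type terminator_length = 1;
    if (text[pos] == '\r' && pos + 1 < text.size() && text[pos + 1] == '\n') {
        terminator_length = 2;
    }
    text.erase(0, pos + terminator_length);
    return text;
}
```

Calling erase_first_line("abc\ndef\nghi") yields "def\nghi", matching the example in the question.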
I use various regexes to parse a C source file, line by line. First I read the whole content of the file into a string:
ifstream file_stream("commented.cpp",ifstream::binary);
std::string txt((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());
Then I apply a set of regexes in turn until one of them matches. Here I will give only one as an example:
vector<regex> rules = { regex("^//[^\n]*$") };
char * search =(char*)txt.c_str();
int position = 0, length = 0;
for (int i = 0; i < rules.size(); i++) {
    cmatch match;
    if (regex_search(search + position, match, rules[i],regex_constants::match_not_bol | regex_constants::match_not_eol))
    {
        position += ( match.position() + match.length() );
    }
}
But it doesn't work. It matches a comment that is not on the current line, because it searches the whole string for the first match. I expected regex_constants::match_not_bol and regex_constants::match_not_eol to make regex_search treat ^ and $ as the start/end of a line only, not the start/end of the whole block. So here is my file:
commented.cpp:
#include <stdio.h>
//comment
The match should fail: my reasoning is that, with those options passed to regex_search, it should look for the pattern only in the first line:
#include <stdio.h>
But instead it searches the whole string and immediately finds //comment. I need help making regex_search match only in the current line. The options match_not_bol and match_not_eol do not help me. Of course I could read the file line by line into a vector and then try every rule on each string in the vector, but that is very slow. I have done that, and it takes too long to parse a big file, which is why I want to let the regex deal with newlines and use a position counter.
If this is not what you want, please comment and I will delete the answer.
What you are doing is not the correct way to use a regex library.
So here are my suggestions for anyone who wants to use the std::regex library.
It defaults to the ECMAScript grammar, which is somewhat weaker than what most modern regex libraries offer.
It has plenty of bugs (these are just the ones I found):
the same regex but different results on Linux and Windows only C++
std::regex and ignoring flags
std::regex_match and lazy quantifier with strange behavior
In some cases (I tested specifically with std::match_results) it is 200 times slower than std.regex in the D language.
Its match flags are very confusing and mostly do not work (at least for me).
Conclusion: do not use it at all.
But if you still need to use C++, you can:
Use boost::regex from the Boost library, because:
It supports PCRE-style syntax
It has fewer bugs (I have not seen any)
It produces a smaller binary (I mean the executable file after compiling)
It is faster than std::regex
Use gcc version 7.1.0 or later, NOT earlier. The last bug I found is in version 6.3.0
Use clang version 3 or above
If you have been persuaded NOT to use C++, you can:
Use the D regular expression library, std.regex, for large tasks. Why:
Fast (see Faster Command Line Tools in D)
Easy
Flexible drn
Use native pcre or pcre2, which are written in C
Extremely fast but a little complicated
Use Perl for simple tasks, especially Perl one-liners
#include <stdio.h>
//comment
The match should fail: my reasoning is that, with those options passed to regex_search, it should look for the pattern only in the first line:
#include <stdio.h>
But instead it searches the whole string and immediately finds //comment. I need help making regex_search match only in the current line.
Are you trying to match all // comments in a source code file, or only the first line?
The former can be done like this:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
int main()
{
    auto input = std::ifstream{"stream_union.h"};
    // Compile the pattern once, outside the loop.
    const auto pattern = std::regex(R"(//)");
    for(auto line = std::string{}; std::getline(input, line); )
    {
        auto submatch = std::smatch{};
        // Skip lines that do not contain a // comment.
        if(!std::regex_search(line, submatch, pattern)) continue;
        std::cout << line << std::endl;
    }
    std::cout << std::endl;
    return EXIT_SUCCESS;
}
And the latter can be done like this:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
int main()
{
    auto input = std::ifstream{"stream_union.h"};
    auto line = std::string{};
    std::getline(input, line);
    auto submatch = std::smatch{};
    const auto pattern = std::regex(R"(//)");
    // Fail unless the first line contains a // comment.
    if(!std::regex_search(line, submatch, pattern)) { return EXIT_FAILURE; }
    std::cout << line << std::endl;
    return EXIT_SUCCESS;
}
If for any reason you need the position of the match, submatch.position(0) gives its offset within the line, and input.tellg() gives the stream position just past the line that was read.
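For the in-memory buffer from the question, one way to keep each search inside the current line without copying lines into a vector is to pass regex_search an iterator range that covers only that line. The sketch below is an illustration under assumptions: matching_lines is a hypothetical helper, lines are taken to end in "\n" (with an optional "\r" before it), and it reports line numbers rather than advancing a position counter.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the 0-based numbers of the lines in `text`
// that match `rule`. Each search is limited to one line's iterator range,
// so ^ and $ anchor to that line and no line is copied.
std::vector<std::size_t> matching_lines(const std::string& text, const std::regex& rule)
{
    std::vector<std::size_t> hits;
    std::size_t line_no = 0;
    auto line_begin = text.cbegin();
    while (line_begin != text.cend()) {
        const auto line_end = std::find(line_begin, text.cend(), '\n');
        // Exclude a trailing '\r' so Windows line endings behave the same.
        auto content_end = line_end;
        if (content_end != line_begin && *(content_end - 1) == '\r') {
            --content_end;
        }
        if (std::regex_search(line_begin, content_end, rule)) {
            hits.push_back(line_no);
        }
        if (line_end == text.cend()) {
            break;
        }
        line_begin = line_end + 1;
        ++line_no;
    }
    return hits;
}
```

With the two-line commented.cpp from the question and the rule ^//[^\n]*$, only line 1 (the //comment line) is reported; the #include line no longer picks up a match from the line after it.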
My current code is:
#include <iostream>
#include <Poco/Foundation.h>
#include <Poco/RegularExpression.h>
int main()
{
    Poco::RegularExpression regex("[A-Z]+\s+[A-Z]+");
    Poco::RegularExpression::MatchVec mvec;
    const std::string astring = "ABC\nDEFG";
    int matches = regex.match(astring,0,mvec);
    std::cout << "Hello World\n";
    return 0;
}
The position occupied by the '\n' in the string I am trying to match can hold a single space, multiple spaces, or a newline (hence the whitespace metacharacter).
The number of matches returned is zero. Is there a flag I need to set or something?
The problem is the escape sequence in your regex.
You want the regex pattern to contain a backslash (for the \s token), but in a C/C++ or Java string literal a backslash must be written as \\. So, to fix your problem, add another backslash:
Poco::RegularExpression regex("[A-Z]+\\s+[A-Z]+");
Here you can find the reference:
http://en.cppreference.com/w/cpp/language/escape
This should work:
Poco::RegularExpression s ("\\s"); // Whitespace character
Poco::RegularExpression n ("\\n"); // Newline
Poco::RegularExpression r ("\\r"); // Carriage return
Poco::RegularExpression t ("\\t"); // Tab
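An alternative to doubling every backslash is a C++11 raw string literal, where \s reaches the regex engine unchanged. The sketch below uses std::regex only to stay dependency-free; the helper name has_two_upper_words is hypothetical, and passing the same literal to Poco::RegularExpression is an assumption that has not been tested here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` contains two runs of uppercase letters
// separated by whitespace (spaces, tabs, or newlines).
bool has_two_upper_words(const std::string& s)
{
    // Raw string literal: the backslash in \s is not treated as a C++ escape.
    static const std::regex pattern(R"([A-Z]+\s+[A-Z]+)");
    return std::regex_search(s, pattern);
}
```

For the string from the question, has_two_upper_words("ABC\nDEFG") returns true.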
Say I have a text, represented as a std::string, which contains several different newline sequences, e.g. \r\n but also just \n or even just \r.
I would now like to unify these by replacing every newline that is not \r\n, namely every lone \r and every lone \n, with \r\n.
A simple boost::replace_all(text, "\n", "\r\n"); unfortunately doesn't work, because it would also replace the \n inside the already valid \r\n sequences.
I think std::regex should be a good way to handle this... but how should I express this in a regex? Here is some code:
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string text = "a\rb\nc\r\nd\n";
    std::regex reg(""); // What to put here?
    text = std::regex_replace(text, reg, "\r\n");
    std::cout << text;
}
The text should end up as "a\r\nb\r\nc\r\nd\r\n".
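std::regex reg("\r\n|\r|\n");
should match. The alternatives belong in the pattern; the third argument of std::regex_replace is the replacement text ("\r\n").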
More info here:
Match linebreaks - \n or \r\n?
You could do that in two steps (note that this leaves a lone \r untouched):
\n -> \r\n
\r\r\n -> \r\n
or in one step:
(?:\r\n|\n|\r) -> \r\n
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string text = "a\rb\nc\r\nd\n";
    text = std::regex_replace(text, std::regex("(?:\\r\\n|\\n|\\r)"), "\r\n");
    std::cout << text;
}
To add the missing "\r" before a "\n" that has no preceding "\r", you can use a lookahead:
text = std::regex_replace(text, std::regex("([^\r])(?=\n)"), "$1\r");
This cannot handle a "\n" at the very start of the text, because there is no preceding character to match, so you would need another operation for that.
Handling a "\r" with no following "\n" is a bit easier, using a negative lookahead:
text = std::regex_replace(text, std::regex("\r(?!\n)"), "\r\n");
Note that the ECMAScript grammar used by std::regex does not support lookbehind, in case you were considering it.
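In Perl-compatible flavors such as PCRE and Boost.Regex, \R stands for any kind of linebreak, i.e. \n or \r or \r\n. The ECMAScript grammar of std::regex does not have it.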
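The same normalization can also be done without a regex, in a single pass over the characters. This is a sketch rather than anything proposed in the answers above; the function name to_crlf is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `in` where every line ending
// ("\r\n", lone "\r", or lone "\n") is written as "\r\n".
std::string to_crlf(const std::string& in)
{
    std::string out;
    out.reserve(in.size() + in.size() / 8);
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == '\r') {
            // Consume the '\n' of an existing "\r\n" so it is not doubled.
            if (i + 1 < in.size() && in[i + 1] == '\n') {
                ++i;
            }
            out += "\r\n";
        } else if (c == '\n') {
            out += "\r\n";
        } else {
            out += c;
        }
    }
    return out;
}
```

For the text in the question, to_crlf("a\rb\nc\r\nd\n") gives "a\r\nb\r\nc\r\nd\r\n".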
My regex doesn't work. Why?
boost::regex re("anonuuid|anon_id", boost::regex::icase);
target_string = "anonuuid final.device_anonuuid anon_id";
boost::replace_all(target_string, "anonuuid", "device_anonuuid");
The idea is to find and replace the WHOLE word anonuuid OR anon_id. I've tried the word boundary \b, but even with it, it's not working. Below is the result of my code:
device_anonuuid final.device_device_anonuuid anon_id
But I want to get this:
device_anonuuid final.device_anonuuid device_anonuuid
Thanks in advance.
You want replace_all_regex; see it Live On Coliru.
Also note:
you need to escape \b (e.g. "\\b")
you need to pass an lvalue as the format
there's also replace_all_regex_copy, which returns a new string instead of modifying the input string
#include <boost/regex.hpp>
#include <boost/algorithm/string_regex.hpp>
#include <string>
#include <iostream>
int main()
{
    boost::regex re("\\b(anonuuid|anon_id)\\b", boost::regex::icase);
    std::string target_string = "anonuuid final.device_anonuuid anon_id";
    std::string format = "QQQQ";
    boost::replace_all_regex(target_string, re, format, boost::match_flag_type::match_default);
    std::cout << target_string;
}
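If Boost is not available, the same whole-word replacement can be sketched with std::regex_replace. The helper name rename_ids is hypothetical, and the replacement text is the device_anonuuid asked for in the question rather than the QQQQ placeholder used above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces the whole words "anonuuid" and "anon_id"
// (case-insensitively) with "device_anonuuid". The \b anchors keep
// "device_anonuuid" itself from matching, because '_' is a word character.
std::string rename_ids(const std::string& input)
{
    static const std::regex re(R"(\b(anonuuid|anon_id)\b)", std::regex::icase);
    return std::regex_replace(input, re, "device_anonuuid");
}
```

Applied to the string from the question, this produces "device_anonuuid final.device_anonuuid device_anonuuid".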
I have an HTML/XML document that is originally a CString, and I want to get rid of all the newlines, essentially putting everything on one line. I've tried converting it to std::string and using:
#include <algorithm>
#include <string>
str.erase(std::remove(str.begin(), str.end(), '\n'), str.end());
But it didn't work.
To stop your block of text from looking odd, replace the line breaks with a space. Make sure to replace both newline ('\n') and carriage return ('\r') characters.
CString str = _T("Line 1 Windows Style\r\n Line 2 Unix Style\n Line 3");
str.Replace(_T('\r'), _T(' ')); // carriage returns become spaces
str.Replace(_T('\n'), _T(' ')); // newlines become spaces
str.Replace(_T("  "), _T(" ")); // turn double spaces into single spaces
You only need to use the Remove method:
CString str = _T("Test newline \nremove"), str2;
str.Remove('\n');
How about?
str.Replace("\n", "");
Documented here
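For the std::string route that the question started with, a likely reason removing only '\n' looked wrong is that the '\r' of each Windows line ending stays behind. The sketch below drops both characters and inserts a single space where a line break separated two pieces of text. The helper name join_lines is hypothetical, and dropping leading and trailing line breaks entirely is an assumption.

```cpp
#include <string>

// Hypothetical helper: puts `in` on one line. Every run of '\r' and '\n'
// is removed, and one space is inserted in its place unless a space is
// already present on either side (or the run is at the start or end).
std::string join_lines(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    bool pending_break = false;
    for (const char c : in) {
        if (c == '\r' || c == '\n') {
            pending_break = true;
            continue;
        }
        if (pending_break && !out.empty() && out.back() != ' ' && c != ' ') {
            out += ' ';
        }
        pending_break = false;
        out += c;
    }
    return out;
}
```

For example, join_lines("Line 1\r\n Line 2\nLine 3") returns "Line 1 Line 2 Line 3".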