The code below doesn't give the same result in Visual Studio 2015 and on IDEOne.com (C++14). Stranger still, the results are incorrect in both cases!
#include <iostream>
#include <regex>
int main()
{
const char* pszTestString = "ENDRESS+HAUSER*ST-DELL!HP||BESTMATCH&&ABCD\\ABCD";
const char* pszExpectedString = "ENDRESS\\+HAUSER\\*ST\\-DELL\\!HP\\||BESTMATCH\\&&ABCD\\\\ABCD";
std::cout << std::regex_replace(pszTestString, std::regex("[-+!\"\\[\\](){}^~*?:]|&&|\\|\\|"), "\\$0") << std::endl;
std::cout << pszExpectedString << std::endl;
return 0;
}
Under Visual Studio 2015 I get this strange result (for both compilers, the second line of output is the expected result):
ENDRESS\$0HAUSER\$0ST\$0DELL\$0HP\$0BESTMATCH\$0ABCD\ABCD
ENDRESS\+HAUSER\*ST\-DELL\!HP\||BESTMATCH\&&ABCD\\ABCD
With IDEOne (C++14 compiler) :
ENDRESS\+HAUSER\*ST\-DELL\!HP\||BESTMATCH\&&ABCD\ABCD
ENDRESS\+HAUSER\*ST\-DELL\!HP\||BESTMATCH\&&ABCD\\ABCD
We can see that the latter is wrong too: there must be two backslashes before the last "ABCD", not a single one.
What the heck is going on? I have written a manual parser instead of using std::regex_replace for the moment, but I really want to make it work under VS2015 (and ideally any other compiler) and run a benchmark before settling on the manual parsing solution.
The default VS2015 compiler does not treat $0 as a backreference to the whole match. You need to use the "native" ECMAScript $& backreference to refer to the whole match from inside the replacement pattern.
Also, revo is right: in order to match \ you need to add it to the character class.
And note that you can use raw string literals in VS2015. It is best practice to define regex patterns with raw string literals, as they help avoid over-escaping (also known as backslash hell).
Solution:
std::cout << std::regex_replace(pszTestString,
std::regex(R"([-+!\\\"\[\](){}^~*?:]|&&|\|\|)"), "\\$&") << std::endl;
^^ ^^
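For reference, here is a self-contained sketch of the fix wrapped in a helper. The function name escapeSpecialChars is made up for illustration; the pattern and the $& replacement are the ones from the solution above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: puts a backslash in front of every special
// character (or && / ||) found in the input.
std::string escapeSpecialChars(const std::string& input) {
    // Raw string literal: \\ matches one backslash, \[ and \] match brackets.
    static const std::regex special(R"([-+!\\\"\[\](){}^~*?:]|&&|\|\|)");
    // $& refers to the whole match; the leading backslash is a literal character.
    return std::regex_replace(input, special, R"(\$&)");
}
```

Passing pszTestString from the question to this helper should yield the contents of pszExpectedString, including the doubled backslash before the last ABCD.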
Related
I am trying to replace one backslash with two. To do that I tried using the following code
str = "d:\test\text.txt"
str.replace("\\","\\\\");
The code does not work. The whole idea is to pass str to a deletefile function, which requires double backslashes.
Since C++11, you can try using regex:
#include <regex>
#include <iostream>
int main() {
auto s = std::string(R"(\tmp\)");
s = std::regex_replace(s, std::regex(R"(\\)"), R"(\\)");
std::cout << s << std::endl;
}
A bit overkill, but it does the trick if you want a "quick" solution.
There are two errors in your code.
First line: you forgot to double the \ in the literal string.
It happens that \t is a valid escape representing the tab character, so you get no compiler error, but your string doesn't contain what you expect.
Second line: according to the reference for string::replace,
you can replace a substring with another substring based on the substring's position.
However, there is no overload that performs a substitution, i.e. replaces all occurrences of a given substring with another one.
This doesn't exist in the standard library. It does exist in, for example, the Boost library; see the Boost string algorithms. The algorithm you are looking for is called replace_all.
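If Boost is not an option, the same effect can be had with a short loop over std::string::find and std::string::replace. A minimal sketch, assuming a helper named replaceAll (the name is hypothetical and not part of any library):

```cpp
#include <string>

// Hypothetical helper: returns a copy of text with every occurrence of
// `from` replaced by `to`.
std::string replaceAll(std::string text, const std::string& from, const std::string& to) {
    if (from.empty()) return text;  // nothing to search for; also avoids an endless loop
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.length(), to);
        pos += to.length();  // continue after the inserted text so it is not rescanned
    }
    return text;
}
```

With the path written correctly as "d:\\test\\text.txt", calling replaceAll(str, "\\", "\\\\") doubles each backslash. Skipping past the inserted text matters here, because the replacement itself contains the search string.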
Is there a simple way to escape all occurrences of \ in a string? I start with the following string:
#include <string>
#include <iostream>
std::string escapeSlashes(std::string str) {
// I have no idea what to do here
return str;
}
int main () {
std::string str = "a\b\c\d";
std::cout << escapeSlashes(str) << "\n";
// Desired output:
// a\\b\\c\\d
return 0;
}
Basically, I am looking for the inverse to this question. The problem is that I cannot search for \ in the string, because C++ already treats it as an escape sequence.
NOTE: I am not able to change the string str in the first place. It is parsed from a LaTeX file. Thus, this answer to a similar question does not apply. Edit: The parsing failed due to an unrelated problem; the question here is about string literals.
Edit: There are nice solutions to find and replace known escape sequences, such as this answer. Another option is to use boost::regex("\p{cntrl}"). However, I haven't found one that works for unknown (erroneous) escape sequences.
You can use raw string literal. See http://en.cppreference.com/w/cpp/language/string_literal
#include <string>
#include <iostream>
int main() {
std::string str = R"(a\b\c\d)";
std::cout << str << "\n";
return 0;
}
Output:
a\b\c\d
It is not possible to convert the string literal a\b\c\d to a\\b\\c\\d, i.e. escaping the backslashes.
Why? Because the compiler converts \c and \d directly to c and d, respectively, giving you a warning about Unknown escape sequence \c and Unknown escape sequence \d (\b is fine as it is a valid escape sequence). This happens directly to the string literal before you have any chance to work with it.
To see this, you can compile to assembler
gcc -S main.cpp
and you will find the following line somewhere in your assembler code:
.string "a\bcd"
Thus, your problem is either in your parsing function or you use string literals for experimenting and you should use raw strings R"(a\b\c\d)" instead.
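Once the backslashes really are present in the string (read from a file, or written as a raw literal), doubling them is straightforward. A minimal sketch that fills in the question's escapeSlashes, assuming the input holds real backslash characters:

```cpp
#include <string>

// Doubles every backslash that is actually present in the string data.
std::string escapeSlashes(const std::string& str) {
    std::string result;
    result.reserve(str.size());
    for (char c : str) {
        if (c == '\\') result += '\\';  // emit one extra backslash before each backslash
        result += c;
    }
    return result;
}
```

Called with R"(a\b\c\d)", this returns the eleven characters a\\b\\c\\d. Called with the non-raw literal "a\b\c\d", it has nothing to double, because the compiler has already consumed the backslashes.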
first time regex (in c++ that is)
I have a hard time writing
(?<=name=")(?:[^\\"]+|\\.)*(?=")
that matches for example name="blabla" xyz as blabla as code...
How do I
std::regex TheName("(?<=name=")(?:[^\\"]+|\\.)*(?=")");
correctly please?
You need to use capturing rather than positive lookbehind in C++ regex. Also, it is advisable to use the unroll-the-loop principle to unroll your ([^"\\]|\\.)* subpattern to make the regex as fast as it can be, see [^\"\\]*(?:\\.[^\"\\]*)*. Also, it is advisable to use raw string literals (see R"(<PATTERN>)") when defining regex patterns in order to avoid overescaping.
See the C++ demo:
#include <iostream>
#include <regex>
using namespace std;
int main() {
std::string s = "name=\"bla \\\"bla\\\"\"";
std::regex TheName(R"(name=\"([^\"\\]*(?:\\.[^\"\\]*)*)\")");
std::smatch m;
if (regex_search(s, m, TheName)) {
std::cout << m[1].str() << std::endl;
}
return 0;
}
Result: bla \"bla\"
tl;dr
How can I concatenate const char* with std::string, neatly and
elegantly, without multiple function calls. Ideally in one function
call and have the output be a const char*. Is this impossible, what
is an optimum solution?
Initial Problem
The biggest barrier I have experienced with C++ so far is how it handles strings. In my opinion, of all the widely used languages, it handles strings the most poorly. I've seen other questions similar to this that either have an answer saying "use std::string" or simply point out that one of the options is going to be best for your situation.
However, this is useless advice when trying to use strings dynamically the way they are used in other languages. I cannot guarantee that I will always be able to use std::string, and for the times when I have to use const char* I hit the obvious wall of "it's constant, you can't concatenate it".
Every solution to any string manipulation problem I've seen in C++ requires repetitive multiple lines of code that only work well for that format of string.
I want to be able to concatenate any set of characters with the + symbol or make use of a simple format() function just how I can in C# or Python. Why is there no easy option?
Current Situation
Standard Output
I'm writing a DLL and so far I've been outputting text to cout via the << operator. Everything has been going fine so far using simple char arrays in the form:
cout << "Hello world!"
Runtime Strings
Now it comes to the point where I want to construct a string at runtime and store it with a class, this class will hold a string that reports on some errors so that they can be picked up by other classes and maybe sent to cout later, the string will be set by the function SetReport(const char* report). So I really don't want to use more than one line for this so I go ahead and write something like:
SetReport("Failure in " + __FUNCTION__ + ": foobar was " + foobar + "\n"); // __FUNCTION__ gets the name of the current function, foobar is some variable
Immediately of course I get:
expression must have integral or unscoped enum type and...
'+': cannot add two pointers
Ugly Strings
Right. So I'm trying to add two or more const char*s together and this just isn't an option. So I find that the main suggestion here is to use std::string, sort of weird that typing "Hello world!" doesn't just give you one of those in the first place but let's give it a go:
SetReport(std::string("Failure in ") + std::string(__FUNCTION__) + std::string(": foobar was ") + std::to_string(foobar) + std::string("\n"));
Brilliant! It works! But look how ugly that is!! That's some of the ugliest code I've ever seen. We can simplify to this:
SetReport(std::string("Failure in ") + __FUNCTION__ + ": foobar was " + std::to_string(foobar) + "\n");
Still possibly the worst way I've ever encountered of getting to a simple one-line string concatenation, but everything should be fine now, right?
Convert Back To Constant
Well no, if you're working on a DLL, something that I tend to do a lot because I like to unit test so I need my C++ code to be imported by the unit test library, you will find that when you try to set that report string to a member variable of a class as a std::string the compiler throws a warning saying:
warning C4251: class 'std::basic_string<_Elem,_Traits,_Alloc>' needs to have dll-interface to be used by clients of class'
The only real solution to this problem that I've found other than "ignore the warning"(bad practice!) is to use const char* for the member variable rather than std::string but this is not really a solution, because now you have to convert your ugly concatenated (but dynamic) string back to the const char array you need. But you can't just tag .c_str() on the end (even though why would you want to because this concatenation is becoming more ridiculous by the second?) you have to make sure that std::string doesn't clean up your newly constructed string and leave you with garbage. So you have to do this inside the function that receives the string:
const std::string constString = (input);
m_constChar = constString.c_str();
Which is insane. Because now I have traipsed across several different types of string, made my code ugly, added more lines than should be needed, and all just to stick some characters together. Why is this so hard?
Solution?
So what's the solution? I feel that I should be able to write a function that concatenates const char*s together but also handles other object types such as std::string, int or double. I feel strongly that this should be possible in one line, and yet I'm unable to find any examples of it being achieved. Should I be working with char* rather than the constant variant, even though I've read that you should never change the value of a char*, so how would this help?
Are there any experienced C++ programmers who have resolved this issue and are now comfortable with C++ strings, what is your solution? Is there no solution? Is it impossible?
The standard way to build a string, formatting non-string types as strings, is a string stream:
#include <sstream>
std::ostringstream ss;
ss << "Failure in " << __FUNCTION__ << ": foobar was " << foobar << "\n";
SetReport(ss.str());
If you do this often, you could write a variadic template to do that:
template <typename... Ts> std::string str(Ts&&...);
SetReport(str("Failure in ", __FUNCTION__, ": foobar was ", foobar, '\n'));
The implementation is left as an exercise for the reader.
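One possible way to do that exercise is to stream every argument into a std::ostringstream. This is a sketch, not necessarily what the answer's author had in mind; it sticks to C++11 pack expansion so that it does not depend on C++17 fold expressions.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Streams each argument into one ostringstream, in order, and returns the result.
template <typename... Ts>
std::string str(Ts&&... parts) {
    std::ostringstream out;
    // A braced initializer list evaluates its elements left to right,
    // so the arguments are written in the order they were passed.
    int expand[] = {0, ((out << std::forward<Ts>(parts)), 0)...};
    (void)expand;  // the array exists only to drive the pack expansion
    return out.str();
}
```

Anything that has an operator<< for std::ostream can be passed, so ints, doubles, chars, string literals and std::string all work in a single call.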
In this particular case, string literals can be concatenated by simply writing one after the other (this includes __FUNCTION__ on compilers such as Visual C++, where it expands to a string literal); and, assuming foobar is a std::string, that can be concatenated with string literals using +:
SetReport("Failure in " __FUNCTION__ ": foobar was " + foobar + "\n");
If foobar is a numeric type, you could use std::to_string(foobar) to convert it.
Plain string literals (e.g. "abc" and __FUNCTION__) and char const* do not support concatenation. These are just plain C-style char const[] and char const*.
Solutions are to use some string formatting facilities or libraries, such as:
std::string and concatenation using +. May involve too many unnecessary allocations, unless operator+ employs expression templates.
std::snprintf. This one does not allocate buffers for you and is not type safe, so people end up creating wrappers for it.
std::stringstream. Ubiquitous and standard but its syntax is at best awkward.
boost::format. Type safe but reportedly slow.
cppformat. Reportedly modern and fast.
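As an illustration of the kind of wrapper people write around std::snprintf, here is a minimal sketch. The name formatString is hypothetical. It calls std::snprintf twice, once to measure and once to write, and it is no more type safe than snprintf itself.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical wrapper: formats into a buffer sized by a first, measuring call.
template <typename... Args>
std::string formatString(const char* fmt, Args... args) {
    // With a null buffer and size 0, snprintf only reports the length needed.
    int needed = std::snprintf(nullptr, 0, fmt, args...);
    if (needed < 0) return std::string();  // encoding error: give up with an empty string
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);  // +1 for the terminator
    std::snprintf(buffer.data(), buffer.size(), fmt, args...);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

A call such as formatString("Failure in %s: foobar was %d\n", __FUNCTION__, foobar) then fits on one line, provided foobar is an int and the format specifiers match the argument types.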
One of the simplest solutions is to use an empty C++ string. Here I declare an empty string variable named _ and use it at the front of the string concatenation. Make sure you always put it at the front.
#include <cstdio>
#include <string>
using namespace std;
string _ = "";
int main() {
char s[] = "chararray";
string result =
_ + "function name = [" + __FUNCTION__ + "] "
"and s is [" + s + "]\n";
printf( "%s", result.c_str() );
return 0;
}
Output:
function name = [main] and s is [chararray]
Regarding __FUNCTION__, I found that in Visual C++ it is a macro while in GCC it is a variable, so SetReport("Failure in " __FUNCTION__ "; foobar was " + foobar + "\n"); will only work on Visual C++. See: https://msdn.microsoft.com/en-us/library/b0084kay.aspx and https://gcc.gnu.org/onlinedocs/gcc/Function-Names.html
The solution using empty string variable above should work on both Visual C++ and GCC.
My Solution
I've continued to experiment with different things and I've got a solution which combines tivn's answer that involves making an empty string to help concatenate long std::string and character arrays together and a function of my own which allows single line copying of that std::string to a const char* which is safe to use when the string object leaves scope.
I would have used Mike Seymour's variadic templates but they don't seem to be supported by the Visual Studio 2012 I'm running and I need this solution to be very general so I can't rely on them.
Here is my solution:
Strings.h
#ifndef _STRINGS_H_
#define _STRINGS_H_
#include <string>
// tivn's empty string in the header file
extern const std::string _;
// My own version of .c_str() which produces a copy of the contents of the string input
const char* ToCString(std::string input);
#endif
Strings.cpp
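#include "Strings.h"
#include <cstring> // strcpy_s (as provided by Visual C++)
// Definition of the empty string declared in Strings.h
const std::string _ = "";
// Returns a heap-allocated copy; the caller owns it and must release it with delete[]
const char* ToCString(std::string input)
{
char* result = new char[input.length()+1];
strcpy_s(result, input.length()+1, input.c_str());
return result;
}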
Usage
m_someMemberConstChar = ToCString(_ + "Hello, world! " + someDynamicValue);
I think this is pretty neat and works in most cases. Thank you everyone for helping me with this.
As of C++20, fmtlib has made its way into the ISO standard (as std::format, in the <format> header) but, even on older iterations, you can still download and use it.
It gives similar capabilities as Python's str.format()(a), and your "ugly strings" example then becomes a relatively simple:
#include <fmt/format.h>
// Later on, where code is allowed (inside a function for example) ...
SetReport(fmt::format("Failure in {}: foobar was {}\n", __FUNCTION__, foobar));
It's much like the printf() family but with extensibility and type safety built in.
(a) But, unfortunately, not its string interpolation feature (use of f-strings), which has the added advantage of putting the expressions in the string at the place where they're output, something like:
set_report(f"Failure in {__FUNCTION__}: foobar was {foobar}\n");
If fmtlib ever got that capability, I'd probably wet my pants in excitement :-)
Let
exp = ^[0-9!##$%^&*()_+-=[]{};':"\|,.<>/?\s]*$
be a regular expression that allows me to find all sequences of numbers with or without special characters.
By using exp I manage to extract all sequences of numbers that are greater than 5. But the number 98200 cannot be extracted. I am not using any limit on how long the sequence of numbers should be.
Source code:
#include <boost/regex.hpp>
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s = "16000";
string exp = "^[0-9!##$%^&*()_+-=[]{};':\"\\|,.<>\\/?\\s]*$";
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
In C#, you need to escape the ]. You don't need to escape [ {} () when they are inside a character class. Also, if you want to include the dash as an included character in the character class, it should be at the beginning or end of the list. The sequence that you have of +-= translates to [+,-./0123456789:;<=] which makes your regex redundant. Finally, because of the terminal quantifier, you are allowing matching of zero length strings. This may be what you want, but if not, consider the '+' quantifier.
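To make those points concrete, here is a sketch of the character class with the brackets escaped and the dash moved to the front so that it cannot form a range. It uses std::regex rather than boost::regex so that it needs no external library. The function name isNumberSequence is hypothetical, and the + quantifier is an assumption that empty input should be rejected.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if s consists only of digits and the listed
// special characters, and is not empty.
bool isNumberSequence(const std::string& s) {
    // The leading '-' is a literal dash; \[ and \] are literal brackets;
    // \\ is a literal backslash; \s is any whitespace character.
    static const std::regex pattern(R"(^[-0-9!#$%^&*()_+=\[\]{};':"\\|,.<>/?\s]+$)");
    return std::regex_match(s, pattern);
}
```

The duplicated # from the original expression is dropped, since repeating a character inside a class has no effect. With this pattern, "98200" and "98-[2]00" are accepted while "16A000" is rejected.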
What about simply
[^A-Za-z]+
with or without the ^ $ anchors at the beginning/end
Indiscriminately escaping everything works for me.. :)
string exp = "^[0-9\\!##\\$\\%\\^&*\\(\\)_\\+\\-=\\[\\]\\{\\};\\\':\\\"\\\\|,\\.<>\\/?\\s]*$";
Note the double backslashes. I'm sure you can work out which of the characters in your list have a special meaning and escape only those; as I don't have the time to look up what has special meaning in this context, I escaped everything, and this works fine for the few cases I tested:
16000 => returns 1
16A000 => returns 0
16#000 => returns 1
Which I'm guessing is what you want...
I have moved the brackets to the front of the character class, and with that I get the output 1 for 98200 using the following code:
#include <string>
#include <boost/regex.hpp>
#include <iostream>
using namespace std;
int main()
{
std::cout << "main()\n";
string s = "98200";
string exp = "^[][0-9!##$%^&*()_+-={};':\"\\|,.<>\\/?\\s]*$";
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
/**
Local Variables:
compile-command: "g++ -g test.cc -o test.exe -lboost_regex-mt; ./test.exe"
End:
*/
EDIT: Note that I relied on my experience with Emacs regular expressions. The Emacs info pages explain: "To include a ] in a character set, you must make it the first character." I tried this with boost::regex and it worked. Later on, when I had more time, I read in the Boost manual
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets
that this is not specified for the Perl regular expression syntax. The Perl syntax is the default setting for boost::regex. According to the specification, the comment by https://stackoverflow.com/users/2872922/ron-rosenfeld is the best answer.
In the following program I eliminate the character range which was incidentally encoded into your regular expression.
Testing shows that a closing bracket at the beginning of the character set is included in the character set. So it turns out that my statement was right even though it is not specified in the official manual of boost::regex.
Nevertheless, I suggest that https://stackoverflow.com/users/2872922/ron-rosenfeld inserts his comment as an answer and you mark it as the solution. This will help others reading this thread.
#include <string>
#include <boost/regex.hpp>
#include <iostream>
using namespace std;
int main()
{
std::cout << "main()\n";
string s = "98-[2]00";
string exp = "^[][0-9!##$%^&*()_+={};':\"|,.<>/?\\s-]*$";
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
/**
Local Variables:
compile-command: "g++ -g test.cc -o test.exe -lboost_regex-mt; ./test.exe"
End:
*/
I asked at http://lists.boost.org/boost-users/2013/12/80707.php
The answer of John Maddock (the author of the boost::regex library) is:
>I discovered that if one uses an closing bracket as the first character of
>a
>character class the character class includes this bracket.
>This works with the standard setting of boost::regex (i.e., perl-regular
>expressions) but it is not documented in the
>manual page
>
>http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/
>perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets
>
>Is this an undocumented feature, a bug or did I misinterpret something in
>the manual?
It's a feature, both Perl and POSIX extended regular expression behave the
same way.
John.