Matching of strings with special characters - c++

I need to generate a string that matches another string, where both contain special characters. I wrote what I thought would be a simple function, but so far nothing has produced a successful match.
I know that special characters in C++ are preceded by a "\". For example, a single quote would be written as "\'".
string json_string(const string& incoming_str)
{
string str = "\\\"" + incoming_str + "\\\"";
return str;
}
And this is the string I have to compare to:
bool comp = json_string("hello world") == "\"hello world\"";
I can see in the cout stream that in fact I'm generating the string as needed but the comparison still gives a false value.
What am I missing? Any help would be appreciated.

One way is to filter one string and compare this filtered string. For example:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;
std::string filterBy(std::string unfiltered, std::string specialChars)
{
std::string filtered;
std::copy_if(unfiltered.begin(), unfiltered.end(),
std::back_inserter(filtered), [&specialChars](char c){return specialChars.find(c) == std::string::npos;});
return filtered;
}
int main() {
std::string specialChars = "\"";
std::string string1 = "test";
std::string string2 = "\"test\"";
std::cout << (string1 == filterBy(string2, specialChars) ? "match" : "no match");
return 0;
}
Output is match. This code also works if you add an arbitrary number of characters to specialChars.
If both strings contain special characters, you can also put string1 through the filterBy function. Then, something like:
"\"hello\" world \"" == "\"hello world "
will also match.
If the comparison is performance-critical, you could instead walk both strings with two iterators and skip the special characters as you go. That avoids building the filtered copies and takes time linear in N + M (for a fixed set of special characters), where N and M are the sizes of the two strings.
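A minimal sketch of that two-index idea, assuming the set of characters to ignore is passed as a string; the name equalIgnoring is made up for illustration and does not come from the answer above:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: compares two strings while skipping every character
// that appears in `ignored`. No filtered copies are built.
bool equalIgnoring(const std::string& a, const std::string& b,
                   const std::string& ignored)
{
    auto skip = [&ignored](char c) {
        return ignored.find(c) != std::string::npos;
    };

    std::size_t i = 0, j = 0;
    while (true) {
        // Advance each index past characters that should be ignored.
        while (i < a.size() && skip(a[i])) ++i;
        while (j < b.size() && skip(b[j])) ++j;

        // If either string is exhausted, they match only if both are.
        if (i == a.size() || j == b.size())
            return i == a.size() && j == b.size();
        if (a[i] != b[j])
            return false;
        ++i;
        ++j;
    }
}
```

Calling equalIgnoring("test", "\"test\"", "\"") should behave like the filterBy comparison, just without the temporary strings.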

bool comp = json_string("hello world") == "\"hello world\"";
This will definitely yield false. json_string("hello world") creates the string \"hello world\" (backslashes included), but you are comparing it to "hello world" (quotes only, no backslashes).
The problem is here:
string str = "\\\"" + incoming_str + "\\\"";
In the first string literal of str, the leading backslash that you assume acts as an escape character is itself escaped, so it ends up as a literal backslash in the string. The same happens in the last string literal.
Do this:
string str = "\"" + incoming_str + "\"";
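To see the difference, here is a small sketch that puts both versions side by side. The function names json_string_broken and json_string_fixed are invented for this comparison:

```cpp
#include <string>

// The question's version: "\\\"" is a two-character literal (backslash, quote).
std::string json_string_broken(const std::string& incoming_str)
{
    return "\\\"" + incoming_str + "\\\"";
}

// The corrected version: "\"" is a one-character literal (just the quote).
std::string json_string_fixed(const std::string& incoming_str)
{
    return "\"" + incoming_str + "\"";
}
```

For "hello world" (11 characters) the broken version returns 15 characters and the fixed one 13, which is why only the fixed one compares equal to "\"hello world\"".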

In C++ string literals are delimited by quotes.
Then the problem arises: How can I define a string literal that does itself contain quotes? In Python (for comparison), this can get easy (but there are other drawbacks with this approach not of interest here): 'a string with " (quote)'.
C++ doesn't have this alternative string representation [1]; instead, you are limited to escape sequences (which are available in Python, too, just for completeness): within a string (or character) literal (but nowhere else!), the sequence \" is replaced by a single double-quote character in the resulting string.
So "\"hello world\"" defined as character array would be:
{ '"', 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '"', 0 };
Note that now the escape character is not necessary...
Within your json_string function, you append additional backslashes, though:
"\\\""
{ '\', '"', 0 }
//^^^
Note that I wrote '\' just for illustration! How would you define a single quote as a character literal? By escaping again: '\''. But now you need to escape the escape character, too, so a single backslash actually has to be written as '\\' here (whereas you don't have to escape the single quote in a string literal, "i am 'singly quoted'", just as you didn't have to escape the double quote in the character literal).
As JSON uses double quotes for strings, too, you'd most likely want to change your function:
return "\"" + incoming_str + "\"";
or even much simpler:
return '"' + incoming_str + '"';
Now
json_string("hello world") == "\"hello world\""
would yield true...
[1] Side note (taken from an answer that has since been deleted): Since C++11, there are raw string literals, too. With these, you don't have to escape either kind of quote.
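As a rough illustration of the raw string literals mentioned in the side note (the function names are invented for the example):

```cpp
#include <string>

// Raw string literal: everything between R"( and )" is taken verbatim,
// so the double quotes need no escaping.
std::string quotedRaw()     { return R"("hello world")"; }

// The same string written with escape sequences.
std::string quotedEscaped() { return "\"hello world\""; }

// Inside a raw literal a backslash stays a backslash, so this one
// reproduces the string the question's function built by accident.
std::string backslashRaw()  { return R"(\"hello world\")"; }
```

quotedRaw() and quotedEscaped() hold the same 13 characters, while backslashRaw() holds 15 because the two backslashes are kept.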

Related

C ++ about char and string

What do char ctemp = ' '; and string stemp = ""; mean, when ' ' and "" are written with nothing between the quotes? Any help is appreciated.
Single quotes (') indicate a character literal: a single character. Double quotes (") denote a string literal, i.e: an array of characters.
' ' is a single space character, while " " is a single space character followed by a null terminator, as is customary for C-style strings.
Character literals are directly assignable to char variables.
The type of a string literal is const char[N], where N is the length of the literal, including the null terminator. In C and C++, a static array decays to (is implicitly convertible to) a pointer to the first element, and std::string is constructible from a const char * pointer (see constructor (5)), which in C usually means a pointer to an array of characters terminated by a null terminator.
The char ctemp = ' ' will put the value ' ' (32 in ASCII decimal) inside the ctemp variable.
The string stemp = ""; will create an empty string in stemp.
Here
char ctemp = ' ';
you are assigning a whitespace character ' ' to ctemp.
Here
string stemp = "";
the initializer "" creates an empty string.
' ' is the space character. "" is an empty string. " " is a string that contains only the space character.
Note that a statement like string stemp = "" implicitly invokes the string(char const *) constructor to create a new string instance from a char const * pointer.
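A short sketch of those differences, with names chosen only for this example. sizeof on a literal counts the null terminator, while std::string::size() does not:

```cpp
#include <cstddef>
#include <string>

// sizeof on a string literal reports the array size, terminator included.
constexpr std::size_t spaceLiteralSize = sizeof(" ");  // const char[2]
constexpr std::size_t emptyLiteralSize = sizeof("");   // const char[1]

char makeSpaceChar()          { return ' '; }  // a single space character
std::string makeEmptyString() { return ""; }   // holds no characters
std::string makeSpaceString() { return " "; }  // holds exactly one character
```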
The first one is a whitespace character, like the one you type with the space bar to separate words. That space is still a character, so your char holds exactly that one space.
The second one is of type string, but it holds even less than a space: it is a completely empty string.
A string is an array (collection) of char.
ctemp = ' '
means a whitespace character.
stemp = ""
means an empty string, with no characters in it.
You can assign ' ' to a char variable.
You can assign " " to an array of char.
In C++, single quotes denote a single character, and double quotes denote string literals. The string literal "x" contains the character 'x' and a null terminator '\0', so "x" is a two-character array.
Some Examples:
string s = "" ; => empty string
char s =' ' ; => space (you should have only one character inside the single quotes)
string s = " " ; => space followed by '\0' character (two character array)
Whitespace character and empty string. You can see the string as a sequence of characters, but they are two different types

c++11/regex - search for exact string, escape [duplicate]

Say you have a string which is provided by the user. It can contain any kind of character. Examples are:
std::string s1{"hello world"};
std::string s1{".*"};
std::string s1{"*{}97(}{.}}\\testing___just a --%#$%# literal%$#%^"};
...
Now I want to search in some text for occurrences of >> followed by the input string s1 followed by <<. For this, I have the following code:
std::string input; // the input text
std::regex regex{">> " + s1 + " <<"};
if (std::regex_match(input, regex)) {
// add logic here
}
This works fine if s1 does not contain any special characters. However, if s1 contains special characters that the regex engine recognizes, it doesn't work.
How can I escape s1 such that std::regex considers it as a literal, and therefore does not interpret s1? In other words, the regex should be:
std::regex regex{">> " + ESCAPE(s1) + " <<"};
Is there a function like ESCAPE() in std?
Important: I simplified my question. In my real case, the regex is much more complex. As I am only having trouble with the fact that s1 is interpreted, I left these details out.
You will have to escape all special characters in the string with \. The most straightforward approach would be to use another expression to sanitize the input string before creating the expression regex.
// matches any characters that need to be escaped in RegEx
std::regex specialChars { R"([-[\]{}()*+?.,\^$|#\s])" };
std::string input = ">> "+ s1 +" <<";
std::string sanitized = std::regex_replace( input, specialChars, R"(\$&)" );
// "sanitized" can now safely be used in another expression
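Wrapped up as functions, the same idea might look like the sketch below. The names escapeRegex and containsMarked are hypothetical (std has no built-in escape function), and the character class only lists the ECMAScript metacharacters rather than the broader set used above:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: prefixes every ECMAScript regex metacharacter
// in `text` with a backslash so the text is matched literally.
std::string escapeRegex(const std::string& text)
{
    static const std::regex metachars{R"([.^$|()\[\]{}*+?\\])"};
    return std::regex_replace(text, metachars, R"(\$&)");
}

// Returns true if `input` contains ">> <literal> <<" anywhere.
bool containsMarked(const std::string& input, const std::string& literal)
{
    const std::regex pattern{">> " + escapeRegex(literal) + " <<"};
    return std::regex_search(input, pattern);
}
```

This sketch uses std::regex_search because std::regex_match only succeeds when the whole input matches the pattern, which does not seem to be what "search in some text" is after.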

How to replace/remove a character in a character buffer?

I am trying to modify someone's code which uses this line:
out.write(&vecBuffer[0], x.length());
However, I want to modify the buffer beforehand so it removes any bad characters I don't want to be output. For example if the buffer is "Test%string" and I want to get rid of %, I want to change the buffer to "Test string" or "Teststring" whichever is easier.
std::replace will allow replacing one specific character with
another, e.g. '%' with ' '. Just call it normally:
std::replace( vecBuffer.begin(), vecBuffer.end(), '%', ' ' );
Replace the '%' with a predicate object, call replace_if,
and you can replace any character for which the predicate
object returns true. But always with the same character. For
more flexibility, there's std::transform, which you pass
a function which takes a char, and returns a char; it will
be called on each character in the buffer.
Alternatively, you can do something like:
vecBuffer.erase(
std::remove( vecBuffer.begin(), vecBuffer.end(), '%' ),
vecBuffer.end() );
To remove the characters. Here too, you can replace remove
with remove_if, and use a predicate, which may match many
different characters.
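A sketch of the remove_if variant, assuming vecBuffer is a std::vector<char>; the helper name removeChars and the idea of passing the unwanted characters as a string are illustrative choices, not part of the original code:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: erases every character of `buffer` that occurs in `bad`.
void removeChars(std::vector<char>& buffer, const std::string& bad)
{
    buffer.erase(
        std::remove_if(buffer.begin(), buffer.end(),
                       [&bad](char c) { return bad.find(c) != std::string::npos; }),
        buffer.end());
}
```

Since the buffer shrinks, the length passed to out.write afterwards would have to be the new buffer.size() rather than the old x.length().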
The simplest library you can use is probably the Boost String Algorithms library.
boost::replace_all(buffer, "%", "");
will replace all occurrences of % by nothing, in place. You could specify " " as a replacement, or even "REPLACEMENT", as suits you.
std::string str("Test string");
std::replace_if(str.begin(), str.end(), boost::is_any_of(" "), '_'); // '' is not a valid character literal; replace_if always needs a replacement character
std::cout << str << '\n';
You do not need to use the boost library. The easiest way is to replace the % character with a space, using std::replace() from the <algorithm> header:
std::replace(vecBuffer.begin(), vecBuffer.end(), '%', ' ');
I assume that vecBuffer, as its name implies, is an std::vector. If it's actually a plain array (or pointer), then you would do:
std::replace(vecBuffer, vecBuffer + SIZE_OF_BUFFER, '%', ' ');
SIZE_OF_BUFFER should be the size of the array (or the amount of characters in the array you want to process, if you don't want to convert the whole buffer.)
Assuming you have a function
bool goodChar( char c );
that returns true for characters you approve of and false otherwise,
then how about
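// Compacts buf in place, keeping only characters accepted by goodChar().
// Returns the number of characters kept; bytes past that point are stale.
unsigned int fixBuf( char* buf, unsigned int len ) {
    unsigned int co = 0;
    for ( unsigned int cb = 0 ; cb < len ; cb++ ) {
        if ( goodChar( buf[cb] ) ) {
            buf[co] = buf[cb];
            co++;
        }
    }
    return co;
}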

How do I add a backslash after every character in a string?

I need to transform a literal filepath (C:/example.txt) to one that is compatible with the various WinAPI Registry functions (C://example.txt) and I have no idea on how to go about doing it.
I've broken it down to having to add a backslash after a certain character (/ in this case), but I'm completely stuck after that.
Guidance and Code Examples will be greatly appreciated.
I'm using C++ and VS2012.
In C++, strings are made up of individual characters, like "foo". Strings can be composed of printable characters, such as the letters of the alphabet, or non-printable characters, such as the enter key or other control characters.
You cannot type one of these non-printable characters in the normal way when populating a string. For example, if you want a string that contains "foo" then a tab, and then "bar", you can't create this by typing:
fooTABbar
because this will simply insert that many spaces -- it won't actually insert the TAB character.
You can specify these non-printable characters by "escaping" them out. This is done by inserting a back slash character (\) followed by the character's code. In the case of the string above TAB is represented by the escape sequence \t, so you would write: "foo\tbar".
The character \ is not itself a non-printable character, but C++ (and C) recognize it to be special -- it always denotes the beginning of an escape sequence. To include the character "\" in a string, it has to itself be escaped, with \\.
So in C++ if you want a string that contains:
c:\windows\foo\bar
You code this using escape sequences:
string s = "c:\\windows\\foo\\bar";
'\\' is not two chars; it is one char (a single backslash):
for(size_t i = 0, sz = sPath.size() ; i < sz ; i++)
if(sPath[i]=='/') sPath[i] = '\\';
But be aware that some APIs work with \ and some with /, so you need to check in which cases to use this replacement.
If replacing every occurrence of a forward slash with two backslashes is really what you want, then this should do the job:
size_t i = str.find('/');
while (i != string::npos)
{
string part1 = str.substr(0, i);
string part2 = str.substr(i + 1);
str = part1 + R"(\\)" + part2; // Use "\\\\" instead of R"(\\)" if your compiler doesn't support C++11's raw string literals
i = str.find('/', i + 1);
}
EDIT:
P.S. If I misunderstood the question and your intention is actually to replace every occurrence of a forward slash with just one backslash, then there is a simpler and more efficient solution (as @RemyLebeau points out in a comment):
size_t i = str.find('/');
while (i != string::npos)
{
str[i] = '\\';
i = str.find('/', i + 1);
}
Or, even better:
std::replace_if(str.begin(), str.end(), [] (char c) { return (c == '/'); }, '\\');
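If two backslashes per forward slash really are wanted, the substr-based loop earlier in this answer copies the string on every hit. A single-pass alternative could look like this sketch (doubleBackslashes is a made-up name):

```cpp
#include <string>

// Hypothetical helper: returns a copy of `path` in which every '/' is
// replaced by two backslash characters, built in one pass.
std::string doubleBackslashes(const std::string& path)
{
    std::string result;
    result.reserve(path.size());
    for (char c : path) {
        if (c == '/')
            result += "\\\\";  // two backslash characters
        else
            result += c;
    }
    return result;
}
```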

Can scanf/sscanf deal with escaped characters?

int main()
{
char* a = " 'Fools\' day' ";
char b[64];
sscanf(a, " '%[^']s ", b);
printf ("%s", b);
}
--> puts "Fools" in b
Obviously, I want to have "Fools' day" in b. Can I tell sscanf() not to consider escaped apostrophes as the end of the character sequence?
Thanks!
No. Those functions just read plain old characters. They don't interpret the contents according to any escaping rules because there's nothing to escape from — quotation marks, apostrophes, and backslashes aren't special in the input string.
You'll have to use something else to parse your string. You can write a little state machine to read the string one character at a time, keeping track of whether the previous character was a backslash. (Don't just scan to the next apostrophe and then look one character backward; if you're allowed to escape backslashes as well as apostrophes, then you could end up re-scanning all the way back to the start of the string to see whether you have an odd or even number of escape characters. Always parse strings forward, not backward.)
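A minimal sketch of such a forward-scanning state machine, assuming backslash is the escape character and apostrophes delimit the field; readQuoted is a hypothetical name:

```cpp
#include <string>

// Hypothetical helper: extracts the first apostrophe-delimited field from
// `input`, treating backslash as an escape character. Returns an empty
// string if there is no opening apostrophe; an unterminated field yields
// whatever was read up to the end of the input.
std::string readQuoted(const std::string& input)
{
    std::string result;
    bool inside = false;   // have we passed the opening apostrophe?
    bool escaped = false;  // was the previous character an unconsumed backslash?

    for (char c : input) {
        if (!inside) {
            if (c == '\'') inside = true;
        } else if (escaped) {
            result += c;       // take the escaped character literally
            escaped = false;
        } else if (c == '\\') {
            escaped = true;
        } else if (c == '\'') {
            break;             // an unescaped apostrophe closes the field
        } else {
            result += c;
        }
    }
    return result;
}
```

Note that for the backslash to reach this function at all, the test string in source code has to be written with a doubled backslash, as in " 'Fools\\' day' ".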
Replace
char* a = " 'Fools\' day' ";
with
char* a = " 'Fools' day' ";
The ' character isn't special inside a C string (although it is special within a character literal). So there is no need to escape it.
Also, if all you want is "Fools' day", why put the extra 's at the start and end? Maybe you are confusing C strings with those in some other language?
Edit:
As Rob Kennedy's comment says, I was assuming you are supplying the string yourself. Otherwise, see Rob's answer.
Why on earth would you write such a thing, instead of using std::string? Since your question is tagged C++.
int main(int argc, char* argv[])
{
std::string a = " 'Fools' day' ";
std::string b(a.begin() + 2, std::find(a.begin() + 2, a.end(), ' '));
std::cout << b;
std::cin.get();
}
Edit: Oh wait a second, you want to read a string within a string? Just use escaped double quotes, e.g.
int main(int argc, char* argv[]) {
std::string a = " \"Fool's day\" ";
auto it = std::find(a.begin(), a.end(), '"');
std::string b(it + 1, std::find(it + 1, a.end(), '"')); // start after the opening quote, stop at the closing one
std::cout << b;
}
If the user put the string in, they won't have to escape single quotes, although they would have to escape double quotes, and you'd have to make your own system for that.