using \ in a string as literal instead of an escape - c++

bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a\sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
So the problem right now is: Xcode is not reading in the '\'. When I was debugging in the stringMatch function, expr appeared to be only 'asb' instead of the literal 'a\sb'.
And Xcode is emitting a warning at the line:
string a = "a\sb" : Unknown escape sequence
Edit: I have already tried using "a\\sb", it reads in as "a\\sb" as literal.

bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a\\sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
C and C++ treat a backslash in a string literal as the start of an escape sequence. To get a literal backslash, you have to escape it by adding an extra backslash to your string.
These are the common escape sequences:
\a - Bell(beep)
\b - Backspace
\f - Formfeed
\n - New line
\r - Carriage Return
\t - Horizontal Tab
\\ - Backslash
\' - Single Quotation Mark
\" - Double Quotation Mark
\ooo - Octal Representation
\xdd - Hexadecimal Representation
EDIT: Xcode seems to be behaving abnormally on your machine, so I can suggest this instead.
bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a" "\x5C" "sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
Don't worry about the spaces in the declaration of string a; the compiler concatenates adjacent string literals.
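A small sketch of what the split literal evaluates to. The helper names patternViaHex and checkPatternViaHex are made up for this illustration. Splitting the literal also ends the hex escape early, which is harmless here (s is not a hex digit) but would matter if the next character were one, such as b.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the question's pattern written without "\\".
// The compiler joins adjacent string literals; \x5C is the hex escape
// for the backslash character.
std::string patternViaHex() {
    return "a" "\x5C" "sb";
}

// Illustrative check that both spellings produce the same four characters.
void checkPatternViaHex() {
    const std::string viaEscape = "a\\sb";
    assert(patternViaHex() == viaEscape);
    assert(patternViaHex().size() == 4);
    assert(patternViaHex()[1] == '\\');
}
```

Calling checkPatternViaHex() from main should pass silently if the two spellings are equivalent.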
EDIT 2: Indeed, Xcode is reading your "a\\sb" literally; that is how it deals with escaped backslashes. When you output string a = "a\\sb" to the console, you will see a\sb. But when you pass string a between methods as an argument or as a private member, it will take the extra backslash literally. You have to design your code with this in mind so that it ignores the extra backslash. It's up to you how you handle the string.
EDIT 3: Edit 1 is your optimal answer here, but here's another one.
Add code in your stringMatch() method to replace double backslashes with single backslash.
You just need to add this extra line at the very start of the function:
expr=[expr stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];
This should solve the double backslash problem.
EDIT 4:
Some people think Edit 3 is Objective-C and thus not optimal, so here is another option in plain C++.
void searchAndReplace(std::string& value, std::string const& search,std::string const& replace)
{
std::string::size_type next;
for(next = value.find(search); // Try and find the first match
next != std::string::npos; // next is npos if nothing was found
next = value.find(search,next) // search for the next match starting after
// the last match that was found.
)
{
// Inside the loop. So we found a match.
value.replace(next,search.length(),replace); // Do the replacement.
next += replace.length(); // Move to just after the replace
// This is the point where we start
// the next search from.
}
}
EDIT 5: If you change the const char * in stringMatch() to string, it will be less complex for you.
expr.replace(/*size_t*/ pos1, /*size_t*/ n1, /*const string&*/ str );
EDIT 6: From C++11 on, there exists something like raw string literals.
This means you don't have to escape, instead, you can write the following:
string a = R"raw(a\sb)raw";
Note that the raw in the string can be replaced by any delimiter of your choosing. This is for the case where you want to use a substring like )raw in the actual string. Raw string literals mainly make sense when you would otherwise have to escape a lot of characters, for example in combination with std::regex.
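A minimal sketch comparing raw string literals with their escaped equivalents. The function names and the sample text in rawWithDelimiter are invented for the example.

```cpp
#include <cassert>
#include <string>

// Raw string literal: the backslash is kept as typed, no escaping needed.
std::string rawPattern() {
    return R"(a\sb)";
}

// A custom delimiter ("raw" here, chosen arbitrarily) lets the text itself
// contain the sequence )" without ending the literal.
std::string rawWithDelimiter() {
    return R"raw(say "(hi)" \now)raw";
}

// Illustrative check against the same strings written with escapes.
void checkRawStrings() {
    assert(rawPattern() == "a\\sb");
    assert(rawWithDelimiter() == "say \"(hi)\" \\now");
}
```

The second function would not compile with the plain R"(...)" form, because the )" after hi would close the literal early.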
P.S. You have all the answers now, so it's up to you which one you implement to get the best results.

Xcode is spitting out that warning because it is interpreting \s in "a\sb" as an escape sequence, but \s is not a valid escape sequence. It gets replaced with just s so the string becomes "asb".
Escaping the backslash like "a\\sb" is the correct solution. If this somehow didn't work for you please post more details on that.
Here's an example.
#include <iostream>
#include <string>
int main() {
std::string a = "a\\sb";
std::cout << a.size() << ' ' << a << '\n';
}
The output of this program looks like:
4 a\sb
If you get different output please post it. Also please post exactly what problem you observed when you tried "a\\sb" earlier.
Regexes can be a pain in C++ because backslashes have to be escaped this way. C++11 has raw string literals, which do no escape processing at all, so escaping the backslash is unnecessary: R"(a\sb)".
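The question leaves the body of stringMatch open, so the following is only one possible sketch of it. It assumes that \s in expr stands for exactly one whitespace character and that any other escaped character matches itself; neither rule comes from the original post.

```cpp
#include <cassert>
#include <cctype>

// Sketch only: assumes "\s" in expr stands for exactly one whitespace
// character and that any other escaped character matches itself.
bool stringMatch(const char *expr, const char *str) {
    while (*expr != '\0') {
        if (*expr == '\\' && *(expr + 1) != '\0') {
            const char escaped = *(expr + 1);
            if (escaped == 's') {
                if (!std::isspace(static_cast<unsigned char>(*str))) return false;
            } else if (*str != escaped) {
                return false;
            }
            expr += 2;  // skip the backslash and the escaped character
        } else {
            if (*str != *expr) return false;
            ++expr;
        }
        ++str;
    }
    return *str == '\0';  // both inputs must end together
}

// Illustrative check using the strings from the question.
void checkStringMatch() {
    assert(stringMatch("a\\sb", "a b"));   // literal a\sb matches "a b"
    assert(!stringMatch("asb", "a b"));    // what "a\sb" degrades to
}
```

With the escaped literal "a\\sb" the function sees the backslash at expr[1]; with the unescaped "a\sb" it never does, which is the behaviour described in the question.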

Related

How to match *anything* until a delimiter is encountered in RE-flex lexer?

I am using the RE/flex lexer for my project. I want to match the syntax corresponding to ('*)".*?"\1. For example, it should match "foo" and ''"bar"'', but should not match ''"baz"'.
But RE/flex matcher doesn't work with lookaheads, lookbehinds and backreferences. So, is there a correct way to match this using reflex matcher? The nearest I could achieve was the following lexer:
%x STRING
%%
'*\" {
textLen = 0uz;
quoteLen = size();
start(STRING);
}
<STRING> {
\"'* {
if (size() - textLen < quoteLen) goto MORE_TEXT;
matcher().less(textLen + quoteLen);
start(INITIAL);
res = std::string{matcher().begin(), textLen};
return TokenKind::STR;
}
[^"]* {
MORE_TEXT:
textLen = size();
matcher().more();
}
<<EOF>> {
std::cerr << "Lexical error: Unterminated 'STRING' \n";
return TokenKind::ERR;
}
}
%%
The meta-character . in RE/flex matches any character, whether or not it is part of a valid UTF-8 sequence, whereas an inverted character class [^...] matches only valid UTF-8 sequences that are absent from the class.
So the problem with the above lexer is that it matches only valid UTF-8 sequences inside strings, whereas I want it to match anything inside the string until the delimiter.
I considered three workarounds, but all three seem to have issues:
1. Use skip(). This skips all characters until it reaches the delimiter, but in the process it consumes all the string content, so I don't get to keep it.
2. Use .*?/\" instead of [^"]*. This works for every properly terminated string, but jams the lexer if the string is not terminated.
3. Consume the string content character by character using . (dot). Since . is synchronizing, it can match even invalid UTF-8 sequences, but this approach feels far too slow.
So is there a better approach for solving this?
I didn't find any proper way to solve the problem, but I did a dirty hack with the 2nd workaround mentioned above.
Instead of relying on the RE/flex generated scanner loop, I added a custom loop inside the string begin rule. There, instead of failing with a scanner jammed error, I flush the remaining text and display an unterminated string error message.
%x STRING
%%
'*\" {
auto textLen = 0uz;
const auto quoteLen = size();
matcher().pattern(PATTERN_STRING);
while (true) {
switch (matcher().scan()) {
case 1:
if (size() - textLen < quoteLen) break;
matcher().less(textLen + quoteLen);
res = std::string{matcher().begin(), textLen};
return TokenKind::STR;
case 0:
if (!matcher().at_end()) matcher().set_end(true);
std::cerr << "Lexical error: Unterminated 'STRING' \n";
return TokenKind::ERR;
default:
std::unreachable();
case 2:;
}
textLen = size();
matcher().more();
}
}
<STRING>{
\"'* |
.*?/\" |
<<EOF>> std::unreachable();
}
%%

C++ breaking string into new line after punctuation

I'm trying to learn strings and I've figured out how to replace as well as insert into an existing string. I have 3 strings at the moment which I've declared as constants, and I've merged them into one string variable which puts them one after another.
I've also changed every single occurrence of "Hi" to "Bye" in those strings. My 3 strings bundled into a single one are as follows:
"Hi! My name is xxxx! I would like to be on my own but I don't know how to, could you help me?"
I want it to display as:
Hi!
My name is xxxx!
I would like to be on my own but I don't know how to, could you help me?
As soon as a punctuation mark occurs I'd like to insert a line break "\n". Using replace works, but that means the punctuation will disappear; using insert will insert the line break before the punctuation, and it won't continue to the next one, which results in:
"Hi!
My name is xxxx! I would like to be on my own but I don't know how to, could you help me?"
I changed the code to only include dots to simplify it, once solved the same solution can be applied to any other part such as question marks or exclamation marks.
Any tips on how to fix this?
#include <iostream>
#include <string>
using namespace std;
string const Text0 = "Hi.";
string const Text1 = "My name is xxxx.";
string const Text2 = "I would like to be on my own but I don't know how to, could you help me.";
string const Text3 = "I would, but I don't know how to.";
string text = Text0 + Text1 + Text2 + Text3;
int main() {
while (text.find("I") != string::npos) {
text.replace(text.find("I"), 1, "J");
}
while (text.find("like") != string::npos) {
text.replace(text.find("like"), 4, "milk");
}
text.insert(text.find("."), "\n");
cout << text;
return 0;
}
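You can write your own short function that adds a newline after every punctuation mark.
For example:
void addNewLines(std::string *text)
{
// Stop one character early so that (*text)[i + 1] is always a valid index.
for (std::size_t i = 0; i + 1 < text->length(); i++)
{
const char c = (*text)[i];
if ((c == '!' || c == '?' || c == '.') && (*text)[i + 1] == ' ')
{
(*text)[i + 1] = '\n'; // turn the space after the punctuation into a line break
}
}
}
As you can see in this piece of code, the for loop goes from the first to the last character of the string, and after every punctuation mark it replaces the following space with a \n character. Note that this only works when each sentence is followed by a space; in your code the constants are concatenated without spaces, so you would need to add them (or insert the newline instead of overwriting a character).
I'm using a pointer here to avoid copying the string into the function, in case it is a huge string, but you could pass it by reference instead, which makes the syntax a little cleaner.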
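For input where the sentences are not separated by spaces, as with the question's concatenated constants, inserting the line break may be the safer route. This is a minimal sketch that reuses a following space when there is one and inserts a newline otherwise. The name breakAfterPunctuation is made up, and skipping the line break after the final punctuation mark is a choice made for this example.

```cpp
#include <cassert>
#include <string>

// Sketch: puts a line break after every '.', '!' or '?' that is followed
// by more text. A single following space is reused as the line break.
std::string breakAfterPunctuation(std::string text) {
    std::string::size_type pos = text.find_first_of(".!?");
    while (pos != std::string::npos) {
        ++pos;                           // step past the punctuation mark
        if (pos >= text.size()) break;   // nothing follows, so no line break
        if (text[pos] == ' ') {
            text[pos] = '\n';            // reuse the space as the line break
        } else {
            text.insert(pos, "\n");      // no space to reuse, so insert one
        }
        pos = text.find_first_of(".!?", pos + 1);
    }
    return text;
}

// Illustrative check with sentences joined without spaces.
void checkBreakAfterPunctuation() {
    assert(breakAfterPunctuation("Hi.My name.") == "Hi.\nMy name.");
}
```

The search resumes after the newline that was just placed, so the loop continues to the next punctuation mark instead of stopping at the first one.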

How to use regex_replace()

I need to insert a backslash before certain special characters like(',",\,?) when they are present in a string.
I don't want to use boost or any other string functions. Preferably algorithms of c++.
#include <stdio.h>
#include <regex>
#include <bits/stdc++.h>
int main(){
std::string str;
std::cout <<"Enter the string : ";
std::getline(std::cin, str);
str=std::regex_replace(str, std::regex("\\"), "\\\\");
str=std::regex_replace(str, std::regex("\'"), "\\\'");
str=std::regex_replace(str, std::regex("\?"), "\\\?");
str=std::regex_replace(str, std::regex("\""), "\\\"");
std::cout<< str<<std::endl;
}
input: testing\"input"?
output:testing\\\"input\"\?
Error message:
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
This can be done with a very simple approach, but you need to look up more documentation on regex. Without special flags, std::regex uses the ECMAScript syntax (std::regex::ECMAScript).
You can put all your search characters in a character class, that is, in [] brackets. Inside the class the backslash itself has to be escaped, so that it is matched as well. Example:
R"(['"\\?])"
Then, for the replacement string, you need to read about std::regex_replace. In the "fmt" string, you can use special sequences for back-referencing.
For example, "$&" will give you a copy of the complete match.
With that, your program will be as simple as
#include <iostream>
#include <regex>
#include <string>
int main()
{
std::string text{R"(one 'two' ?three? "four" \five\)"};
std::cout << std::regex_replace(text, std::regex(R"(['"\\?])"), R"(\$&)") << "\n";
return 0;
}
The raw string R"(some_raw_string)" spares you the otherwise hard-to-read pile of escape characters.
I need to insert a backslash before certain special characters like(',",\,?) when they are present in a string.
Ok sure, so the regex_replace function will definitely do that for you. The trap to watch out for in this case is literal escaping and the interpretation of special characters.
The first level here is the special characters in C++ for string literals. This mainly concerns the double-quote character to start and end string literals, and the backslash character used to escape special characters, or to encode non-alphanumeric characters.
The second level is the special characters as far as the regular expression engine is concerned, which has its own regular expression grammar. This is more complex than the string literals in the language.
So if you want to encode a special character for a regular string literal, you need to escape it once. If you want to encode a special character to pass it literally to the regex compiler, you need to escape it twice.
For example, if you type:
"abc\n"
then the backslash-n will be interpreted as a linefeed character, so gives the byte sequence (including null-termination):
{ 0x61, 0x62, 0x63, 0x0a, 0x00 }
So if you want the backslash to be interpreted literally, you have to escape it, thus:
"abc\\n"
which results in:
{ 0x61, 0x62, 0x63, 0x5c, 0x6e, 0x00 }
If you just want to print this string, you will get the expected results. But if you pass this string to the regex engine, it will see the fourth byte is the backslash and treat it specially, escaping or interpreting the following character. If this is not valid, it throws an exception - which is what you're seeing.
When dealing with regular expressions, I think it's easier to work with raw strings. This is a special way you can write a literal string so the compiler does no interpretation of the string contents. This means you can pass strings to the regex engine directly, and essentially skip to the second level.
This feature was introduced in C++11: you prefix the string with a capital R and enclose the string contents in parentheses, with an optional delimiter string (which simply needs to be unique).
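The two levels can be checked directly. In this sketch the helper compiles is a hypothetical name; it only reports whether std::regex accepts a pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when the pattern compiles, false when std::regex rejects it.
bool compiles(const std::string &pattern) {
    try {
        std::regex re(pattern);
        (void)re;  // only the construction matters here
        return true;
    } catch (const std::regex_error &) {
        return false;
    }
}

void checkEscapingLevels() {
    // "\\" is ONE backslash after the compiler's pass; the regex engine
    // then sees a dangling escape and rejects it.
    assert(!compiles("\\"));
    // "\\\\" is two backslashes for the regex engine: an escaped backslash.
    assert(compiles("\\\\"));
    // The raw literal spells the same two-character pattern directly.
    assert(std::string("\\\\") == R"(\\)");
    // That pattern matches a string holding a single backslash.
    assert(std::regex_match("\\", std::regex(R"(\\)")));
}
```

The first assertion corresponds to the std::regex("\\") call in the question, which is where the regex_error comes from.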
I have tweaked your program to work the way you describe, using raw strings:
//
// Build with minimum C++ language level of C++11, eg:
//
// c++ --std=c++11 -o ans ans.cpp
#include <iostream>
#include <regex>
int main (int argc, char* argv[])
{
std::string str;
std::cout << "Enter the string : ";
std::getline(std::cin, str);
str = std::regex_replace(str, std::regex(R"(\\)"), R"(\\)");
str = std::regex_replace(str, std::regex(R"(')"), R"(\')");
str = std::regex_replace(str, std::regex(R"(\?)"), R"(\?)");
str = std::regex_replace(str, std::regex(R"(\")"), R"(\")");
std::cout << str << std::endl;
return 0;
}
Here's a sample session, exercising all the symbols:
Enter the string : one 'two' ?three? "four" \five\
one \'two\' \?three\? \"four\" \\five\\
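Since the question asks for something closer to plain C++ algorithms, the same result can also be sketched without <regex>. The function name escapeSpecials and its default character set are assumptions taken from the characters listed in the question.

```cpp
#include <cassert>
#include <string>

// Sketch without <regex>: copies the input and puts a backslash in front of
// each character listed in `special` (the default set comes from the question).
std::string escapeSpecials(const std::string &input,
                           const std::string &special = "'\"\\?") {
    std::string out;
    out.reserve(input.size());
    for (char c : input) {
        if (special.find(c) != std::string::npos) {
            out += '\\';  // prefix the special character
        }
        out += c;
    }
    return out;
}

// Illustrative check using the input and output given in the question.
void checkEscapeSpecials() {
    assert(escapeSpecials(R"(testing\"input"?)") == R"(testing\\\"input\"\?)");
}
```

Because every character is visited exactly once, there is no ordering problem such as having to replace the backslash before the other characters.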

Problem removing backslash characters on std::string

I'm trying to execute CMD commands that I get by deserializing a JSON message.
When I deserialize the message, I store the value in a std::string variable whose value is "tzutil /s \"Romance Standard Time_dstoff\"".
I would like to remove the backslash characters ('\') when I receive commands with quoted parameters (e.g. "tzutil /s "Romance Standard Time_dstoff"").
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
system(command.c_str());
Is there any way to do it?
I will appreciate any kind of help.
If you wish to remove all occurrences of a character, then you may use
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), char_to_remove), str.end());
If you wish to replace them with another character then try
#include <algorithm>
std::replace(str.begin(), str.end(), old_char, new_char);
Here is a function I made in C++ for one of my own projects for replacing sub-strings.
std::string
Replace(std::string str,
const std::string& oldStr,
const std::string& newStr)
{
size_t index = str.find(oldStr);
while(index != str.npos)
{
str = str.substr(0, index) +
newStr + str.substr(index + oldStr.size());
index = str.find(oldStr, index + newStr.size());
}
return str;
}
int main(){
std::string command = GetCommandFromJsonSource();
command = Replace(command, "\\\"", "\""); // unescape only double quotes
}
Although the source code of your program does contain backslashes, the string represented by the literal doesn't contain any, as demonstrated by the following example:
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
std::cout << command;
// output:
tzutil /s "Romance Standard Time_dstoff"
As such, there is nothing to remove from the string.
Backslash is an escape character. \" is an escape sequence that represents a single character, the double quote. It is a way to type a double quote character within a string literal without that quote being interpreted as the end of the string instead.
To write a backslash into a string literal, you escape it with another backslash. The following string literal does contain backslashes: "tzutil /s \\\"Romance Standard Time_dstoff\\\"". In this case, removing all backslashes can be done like so:
command.erase(std::remove(command.begin(), command.end(), '\\'), command.end());
However, simply removing all instances of the character might not be sensible. If your string contains escape sequences, what you probably should want to do instead is to unescape them. This is somewhat more complicated. You wouldn't want to remove all backslashes, but instead replace \" with " and \\ with \ and \n with a newline and so on.
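A minimal sketch of such an unescape step. The set of sequences it understands (\" \\ \n \t) is an assumption for the example; a real input format may define more, and the name unescape is made up.

```cpp
#include <cassert>
#include <string>

// Sketch: translates a small, assumed set of escape sequences
// (\" \\ \n \t); any other backslash pair is copied through unchanged.
std::string unescape(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            switch (in[i + 1]) {
                case '"':  out += '"';  ++i; continue;
                case '\\': out += '\\'; ++i; continue;
                case 'n':  out += '\n'; ++i; continue;
                case 't':  out += '\t'; ++i; continue;
                default:   break;  // unknown escape: keep the backslash
            }
        }
        out += in[i];
    }
    return out;
}

// Illustrative check with a command that really contains backslashes.
void checkUnescape() {
    assert(unescape(R"(tzutil /s \"Romance Standard Time_dstoff\")")
           == "tzutil /s \"Romance Standard Time_dstoff\"");
}
```

Unlike removing every backslash, this keeps a backslash that was itself escaped, so a\\b becomes a\b rather than ab.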
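You can use std::quoted (C++14, from <iomanip>) to read and write quoted strings. Note that it only handles the delimiter and the escape character, so \" and \\ are unescaped, but a sequence such as \n is not turned into a newline (the backslash is simply dropped).
#include <iomanip> // -> std::quoted
#include <iostream>
#include <sstream>
#include <string>
int main() {
std::istringstream s("\"Hello world\\n\"");
std::string hello;
s >> std::quoted(hello); // strips the outer quotes and the escape characters
std::cout << std::quoted(hello) << ": " << hello; // quoted form, then the plain string
}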

How to match "\n" in Poco::RegularExpression C++?

My current code is:
#include <iostream>
#include <Poco/Foundation.h>
#include <Poco/RegularExpression.h>
int main()
{
Poco::RegularExpression regex("[A-Z]+\s+[A-Z]+");
Poco::RegularExpression::MatchVec mvec;
const std::string astring = "ABC\nDEFG";
int matches = regex.match(astring,0,mvec);
std::cout << "Hello World\n";
return 0;
}
The '\n' in the string I am trying to match could also be a single space, multiple spaces, or a newline (hence the whitespace metacharacter).
The number of matches returned is zero. Is there a flag I need to set or something?
The problem is the escape sequence in your regex.
You want the regex engine to receive the token \s, which contains a backslash (\), but in a C/C++ or Java string literal a backslash must be written doubled, as \\. So, to fix your problem, you must add another backslash:
Poco::RegularExpression regex("[A-Z]+\\s+[A-Z]+");
Here you can find the reference:
http://en.cppreference.com/w/cpp/language/escape
This should work
Poco::RegularExpression s ("\\s"); // Whitespace character
Poco::RegularExpression n ("\\n"); // New line
Poco::RegularExpression r ("\\r"); // Carriage return
Poco::RegularExpression t ("\\t"); // Tab
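The effect of the doubled backslash can be checked without Poco. The sketch below uses std::regex instead of Poco::RegularExpression, purely as a stand-in, so it says nothing about Poco's own API. The function name hasTwoWords is made up.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sketch using std::regex rather than Poco::RegularExpression, only to show
// that the doubled backslash delivers \s to the regex engine.
bool hasTwoWords(const std::string &text) {
    static const std::regex pattern("[A-Z]+\\s+[A-Z]+");
    return std::regex_search(text, pattern);
}

// Illustrative check with the string from the question.
void checkHasTwoWords() {
    assert(hasTwoWords("ABC\nDEFG"));   // newline counts as whitespace
    assert(!hasTwoWords("ABCDEFG"));    // no whitespace between the words
}
```

With a single backslash the compiler would process the \s before the regex engine ever saw it, so the whitespace token would never reach the pattern.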