How to use regex_replace() - c++

I need to insert a backslash before certain special characters like(',",\,?) when they are present in a string.
I don't want to use boost or any other string functions. Preferably algorithms of c++.
#include <stdio.h>
#include <regex>
#include <bits/stdc++.h>
int main(){
std::string str;
std::cout <<"Enter the string : ";
std::getline(std::cin, str);
str=std::regex_replace(str, std::regex("\\"), "\\\\");
str=std::regex_replace(str, std::regex("\'"), "\\\'");
str=std::regex_replace(str, std::regex("\?"), "\\\?");
str=std::regex_replace(str, std::regex("\""), "\\\"");
std::cout<< str<<std::endl;
}
input: testing\"input"?
output:testing\\\"input\"\?
Error message:
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error

This can be done very simply once you know a little more about regex syntax. Without special flags, std::regex uses the ECMAScript grammar (std::regex_constants::ECMAScript).
You can put all your search characters in a character class, that is, in [] brackets. Inside the class the backslash must itself be escaped (\\), otherwise \? is read as an escaped question mark and backslashes are not matched. Example:
R"(['"\\?])"
Then, for the replacement string, read about std::regex_replace. In the "fmt" string, you can use special sequences for back-references.
For example "$&" gives you a copy of the complete match.
With that, your program will be as simple as
#include <iostream>
#include <regex>
int main()
{
std::string text{R"(one 'two' ?three? "four" \five\)"};
std::cout << std::regex_replace(text, std::regex(R"(['"\\?])"), R"(\$&)") << "\n";
return 0;
}
The raw string R"(some_raw_string)" spares you most of the hard-to-read escape sequences.

I need to insert a backslash before certain special characters like(',",\,?) when they are present in a string.
Ok sure, so the regex_replace function will definitely do that for you. The trap to watch out for in this case is literal escaping and the interpretation of special characters.
The first level here is the special characters in C++ for string literals. This mainly concerns the double-quote character to start and end string literals, and the backslash character used to escape special characters, or to encode non-alphanumeric characters.
The second level is the special characters as far as the regular expression engine is concerned, which has its own regular expression grammar. This is more complex than the string literals in the language.
So if you want to encode a special character for a regular string literal, you need to escape it once. If you want to encode a special character to pass it literally to the regex compiler, you need to escape it twice.
For example, if you type:
"abc\n"
then the backslash-n will be interpreted as a linefeed character, so gives the byte sequence (including null-termination):
{ 0x61, 0x62, 0x63, 0x0a, 0x00 }
So if you want the backslash to be interpreted literally, you have to escape it, thus:
"abc\\n"
which results in:
{ 0x61, 0x62, 0x63, 0x5c, 0x6e, 0x00 }
If you just want to print this string, you will get the expected results. But if you pass this string to the regex engine, it will see the fourth byte is the backslash and treat it specially, escaping or interpreting the following character. If this is not valid, it throws an exception - which is what you're seeing.
When dealing with regular expressions, I think it's easier to work with raw strings. This is a special way you can write a literal string so the compiler does no interpretation of the string contents. This means you can pass strings to the regex engine directly, and essentially skip to the second level.
This is a feature introduced in C++11: you prefix the string with a capital R and enclose the string contents in parentheses, optionally with a delimiter string between the quote and the parenthesis (it only needs to be a sequence that does not occur in the contents).
I have tweaked your program to work the way you describe, using raw strings:
//
// Build with minimum C++ language level of C++11, eg:
//
// c++ --std=c++11 -o ans ans.cpp
#include <iostream>
#include <regex>
int main (int argc, char* argv[])
{
std::string str;
std::cout << "Enter the string : ";
std::getline(std::cin, str);
str = std::regex_replace(str, std::regex(R"(\\)"), R"(\\)");
str = std::regex_replace(str, std::regex(R"(')"), R"(\')");
str = std::regex_replace(str, std::regex(R"(\?)"), R"(\?)");
str = std::regex_replace(str, std::regex(R"(\")"), R"(\")");
std::cout << str << std::endl;
return 0;
}
Here's a sample session, exercising all the symbols:
Enter the string : one 'two' ?three? "four" \five\
one \'two\' \?three\? \"four\" \\five\\

Related

How to remove duplicate phrases that are separated by being inside double quotes or separated by a comma in a file with c++

I use this function to remove duplicate words in a file,
but I need it to remove duplicate expressions instead.
For example, here is what the function currently does.
If I have the expressions
"Hello World"
"beautiful world"
the function removes the word "world" from both expressions.
I need the function to remove an entire expression only if it is found more than once in the file.
For example,
if I have the expressions
"Hello World"
"Hello World"
"beautiful world"
"beautiful world"
the function should remove the duplicates of "Hello World" and "beautiful world" and leave only one of each, but it should not touch the word "world", because everything within the quotes is treated as one word.
This is the code I use now:
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_set>
using namespace std;
using ist = istreambuf_iterator<char>; // iterator used to read the whole file
void Remove_Duplicate_Words(string str)
{
ofstream Write_to_file{ "test.txt" };
// Used to split string around spaces.
istringstream ss(str);
// To store individual visited words
unordered_set<string> hsh;
// Traverse through all words
string word;
while (ss >> word)
{
// If current word is not seen before.
if (hsh.find(word) == hsh.end()) {
cout << word << '\n';
Write_to_file << word << endl; // write to outfile
hsh.insert(word);
}
}
}
int main()
{
ifstream Read_from_file{ "test.txt" };
string file_content{ ist {Read_from_file}, ist{} };
Remove_Duplicate_Words(file_content);
return 0;
}
How do I remove duplicate expressions instead of duplicate words?
Unfortunately my knowledge on this subject is very basic and usually what I do is try all kinds of things until I succeed. I tried to do it here too and I just can not figure out how to do it
Any help would be greatly appreciated
This requires a little bit of string parsing.
Your example works by reading tokens, which are similar to words (but not exactly). For your problem, the token becomes word OR quoted string. The more complex your definition of tokens, the harder the problem becomes. Try starting by thinking of tokens as either words or quoted strings on the same line. A quoted string across lines might be a little more complex.
Here's a similar SO question to get you started: Reading quoted string in c++. You need to do something similar, but instead of having set positions, your quoted string can occur anywhere in the line. So you read tokens something like this:
1. Read the next word token (as you're doing now).
2. If the token just read starts with a quote character ("), read up to the next (") as a single token.
3. Check the set and output the token only if it isn't already there (if the token is quoted, don't forget to output the quotes).
4. Insert the token into the set.
5. Repeat till EOF.
Hope that helps

Problem removing backslash characters on std::string

I'm trying to execute CMD commands that I get by deserializing a JSON message.
When I deserialize the message, I store the value in a std::string variable whose value is "tzutil /s \"Romance Standard Time_dstoff\"":
I would like to remove the backslash characters ('\') when I receive commands with quoted parameters (e.g. "tzutil /s "Romance Standard Time_dstoff"").
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
system(command.c_str());
Is there any way to do it?
I will appreciate any kind of help.
If you wish to remove all occurrences of the character then you may use
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), char_to_remove), str.end());
If you wish to replace them with another character then try
#include <algorithm>
std::replace(str.begin(), str.end(), old_char, new_char);
Here is a function I made in C++ for one of my own projects for replacing sub-strings.
std::string
Replace(std::string str,
const std::string& oldStr,
const std::string& newStr)
{
size_t index = str.find(oldStr);
while(index != str.npos)
{
str = str.substr(0, index) +
newStr + str.substr(index + oldStr.size());
index = str.find(oldStr, index + newStr.size());
}
return str;
}
int main(){
std::string command = GetCommandFromJsonSource();
command = Replace(command, "\\\"", "\""); // unescape only double quotes
}
Although the source code of your program does contain backslashes, the string represented by the literal doesn't contain any, as demonstrated by the following example:
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
std::cout << command;
// output:
tzutil /s "Romance Standard Time_dstoff"
As such, there is nothing to remove from the string.
Backslash is an escape character. \" is an escape sequence that represents a single character, the double quote. It is a way to type a double quote character within a string literal without that quote being interpreted as the end of the string instead.
To write a backslash into a string literal, escape it with another backslash. The following string does contain backslashes: "tzutil /s \\\"Romance Standard Time_dstoff\\\"". In this case, removing all backslashes can be done like so:
command.erase(std::remove(command.begin(), command.end(), '\\'), command.end());
However, simply removing all instances of the character might not be sensible. If your string contains escape sequences, what you probably should want to do instead is to unescape them. This is somewhat more complicated. You wouldn't want to remove all backslashes, but instead replace \" with " and \\ with \ and \n with a newline and so on.
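You can use std::quoted to read and write quoted strings. On input it strips the surrounding quotes and drops the escape character in front of " and \ (it does not translate sequences such as \n); on output it adds them back.
#include <iomanip> // -> std::quoted
#include <iostream>
#include <sstream>
#include <string>
int main() {
std::istringstream s("\"Hello world\\n\"");
std::string hello;
s >> std::quoted(hello); // strips the quotes and the escape character
std::cout << std::quoted(hello) << ": " << hello;
}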

unchecked exception while running regex - get file name without extension from file path

I have this simple program
string str = "D:\Praxisphase 1 project\test\Brainstorming.docx";
regex ex("[^\\]+(?=\.docx$)");
if (regex_match(str, ex)){
cout << "match found"<< endl;
}
I expect the result to be true. My regex works when I try it online, but when I run it in C++ the app throws an unchecked exception.
First of all, use raw string literals when defining regex to avoid issues with backslashes (the \. is not a valid escape sequence, you need "\\." or R"(\.)"). Second, regex_match requires a full string match, thus, use regex_search.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string str = R"(D:\Praxisphase 1 project\test\Brainstorming.docx)";
// OR
// string str = "D:\\Praxisphase 1 project\\test\\Brainstorming.docx";
regex ex(R"([^\\]+(?=\.docx$))");
if (regex_search(str, ex)){
cout << "match found"<< endl;
}
return 0;
}
Note that R"([^\\]+(?=\.docx$))" = "[^\\\\]+(?=\\.docx$)", the \ in the first are literal backslashes (and you need two backslashes in a regex pattern to match a \ symbol), and in the second, the 4 backslashes are necessary to declare 2 literal backslashes that will match a single \ in the input text.

Passing string Argument, read from a file

I am trying to find a regex pattern in a text. Let's call the text the original text.
The following is the code for the patternFinder() program:
vector <pair <long,long> >CaddressParser::patternFinder(string pattern)
{
string m_text1=m_text;
int begin =0;
int end=0;
smatch m;
regex e (pattern);
vector<pair<long, long>> indices;
if(std::regex_search(m_text1,m,e))
{
begin=m.position();
end=m.position()+m.length()-1;
m_text1 = m.suffix().str();
indices.push_back(make_pair(begin,end));
while(end<m_length&&std::regex_search(m_text1,m,e))
{
begin=end+m.prefix().length()+1;
end=end+m.prefix().length()+m.length();
indices.push_back(make_pair(begin,end));
m_text1 = m.suffix().str();
}
return indices;
}
else return indices;
}
I have the following regular Expression:
"\\b[0-9]{3}\\b.*(Street).*[0-9]{5}"
and the Original text mentioned at the beginning is:
way 10.01.2013 700 West Market Street OH 35611 asdh
and only the part 700 West Market Street OH 35611 is supposed to match the regex.
Now the problem is that when the regex is passed as a string that has been read from a text file, patternFinder() does not recognize the pattern. When a direct string (identical to the one in the text file) is passed as an argument to patternFinder(), it works.
Where could this problem be coming from?
The following is the code of my fileReader() function which I don't think is very relevant to mention:
string CaddressParser::fileReader(string fileName)
{
string text;
FILE *fin;
fin=fopen(fileName.c_str(),"rb" );
int length=getLength(fileName);
char *buffer= new char[length];
fread(buffer,length,1,fin);
buffer[length]='\0';
text =string(buffer);
fclose(fin);
return text;
}
Note that there is an apparent syntactic difference when writing the regex directly into C++ code and when reading it from a file.
In C++, the backslash character has escape semantics, so to put a literal backslash into a string literal, you must escape it with another backslash. So to get the two-character string \b in memory, you have to use the string literal "\\b". The two backslashes are interpreted by the C++ compiler as a single backslash character to be stored in the literal. In other words, strlen("\\b") is 2.
On the other hand, contents of a text file are read by your program and never processed by the C++ compiler. So to get the two characters \ and b into a string read from a file, write just the two-character string \b into the file.
The problem is probably in the function reading the string from the file. Print the string read and make sure the regular expression is being read correctly.
The problem is in these 2 lines
buffer[length]='\0';
text =string(buffer);
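buffer holds only length chars, so buffer[length] writes one past the end of the array. The last valid index is length - 1; to keep every byte that was read, allocate length + 1 chars instead, or construct the string with string(buffer, length).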

using \ in a string as literal instead of an escape

bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a\sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
So the problem right now is: Xcode is not reading in the '\'. When I was debugging in the stringMatch function, expr appeared to be only 'asb' instead of the literal 'a\sb'.
And Xcode is spitting out a warning at the line:
string a = "a\sb" : Unknown escape sequence
Edit: I have already tried using "a\\sb", it reads in as "a\\sb" as literal.
bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a\\sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
C and C++ treat a backslash in a string literal as the start of an escape sequence. To get a literal backslash, you have to escape it by adding an extra backslash to your string.
These are the common escape sequences:
\a - Bell(beep)
\b - Backspace
\f - Formfeed
\n - New line
\r - Carriage Return
\t - Horizontal Tab
\\ - Backslash
\' - Single Quotation Mark
\" - Double Quotation Mark
\ooo - Octal Representation
\xdd - Hexadecimal Representation
EDIT: Xcode seems to be behaving abnormally on your machine, so I can suggest this instead.
bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a" "\x5C" "sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
Don't worry about the spaces in the declaration of string a: the compiler concatenates adjacent string literals.
EDIT 2: The literal "a\\sb" stores a single backslash: when you output string a = "a\\sb" to the console, you'll see a\sb. A debugger may display the value in escaped form as "a\\sb", which can look as if the extra backslash were part of the string. If your code really does receive doubled backslashes from some other source, it has to collapse them itself. It's up to you how you handle the string.
EDIT 3: Edit 1 is your optimal answer here, but here's another one.
Add code in your stringMatch() method to replace double backslashes with single backslash.
You just need to add this extra line at the very start of the function:
expr=[expr stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];
This should solve the double backslash problem.
EDIT 4:
Some people pointed out that Edit 3 is Objective-C and thus not optimal, so here is another option in plain C++ (which also compiles as Objective-C++).
void searchAndReplace(std::string& value, std::string const& search,std::string const& replace)
{
std::string::size_type next;
for(next = value.find(search); // Try and find the first match
next != std::string::npos; // next is npos if nothing was found
next = value.find(search,next) // search for the next match starting after
// the last match that was found.
)
{
// Inside the loop. So we found a match.
value.replace(next,search.length(),replace); // Do the replacement.
next += replace.length(); // Move to just after the replace
// This is the point where we start
// the next search from.
}
}
EDIT 5: If you change the const char * parameters of stringMatch() to string, it will be less complex for you.
expr.replace(/*size_t*/ pos1, /*size_t*/ n1, /*const string&*/ str );
EDIT 6: From C++11 on, there exists something like raw string literals.
This means you don't have to escape, instead, you can write the following:
string a = R"raw(a\sb)raw";
Note that raw can be replaced by any delimiter of your choosing, or omitted entirely. A delimiter is needed when the string itself contains the sequence )", which would otherwise end the literal. Raw string literals mainly make sense when you would otherwise have to escape many characters, as with std::regex.
P.S. You have all the answers now, so it's up to you which one you implement to get the best results.
Xcode is spitting out that warning because it is interpreting \s in "a\sb" as an escape sequence, but \s is not a valid escape sequence. It gets replaced with just s so the string becomes "asb".
Escaping the backslash like "a\\sb" is the correct solution. If this somehow didn't work for you please post more details on that.
Here's an example.
#include <iostream>
#include <string>
int main() {
std::string a = "a\\sb";
std::cout << a.size() << ' ' << a << '\n';
}
The output of this program looks like:
4 a\sb
If you get different output please post it. Also please post exactly what problem you observed when you tried "a\\sb" earlier.
Regexes can be a pain in C++ because backslashes have to be escaped this way. C++11 has raw strings, in which no escape sequences are processed, so escaping the backslash is unnecessary: R"(a\sb)".