formatting a string which contains quotation marks - c++

I am having a problem formatting a string which contains quotation marks.
For example, I got this std::string: server/register?json={"id"="monkey"}
This string needs to have the four quotation marks replaced by \", because it will be used as a c_str() for another function.
How does one do this the best way on this string?
{"id"="monkey"}
EDIT: I need a solution which uses STL libraries only, preferably only with String.h. I have confirmed I need to replace " with \".
EDIT2: Nvm, found the bug in the framework

It is perfectly legal to have the '"' char in a C string, so the short answer is that you need to do nothing. Escaping the quotes is only required when typing the string in source code:
std::string str("server/register?json={\"id\"=\"monkey\"}");
my_c_function(str.c_str());// Nothing to do here
However, in general, if you want to replace one substring with another, use the Boost string algorithms.
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
int main(int, char**)
{
std::string str = "Hello world";
boost::algorithm::replace_all(str, "o", "a"); //modifies str
std::string str2 = boost::algorithm::replace_all_copy(str, "ll", "xy"); //doesn't modify str
std::cout << str << " - " << str2 << std::endl;
}
// Displays : Hella warld - Hexya warld
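Since the question asks for a standard-library-only solution, here is a minimal sketch of the same replacement without Boost, built on std::string::find and std::string::replace. The name replace_all is a hypothetical helper chosen for this example, not a standard function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of `text` in which every
// occurrence of `from` has been replaced by `to`.
std::string replace_all(std::string text, const std::string& from, const std::string& to)
{
    if (from.empty()) {
        return text; // nothing to search for; also avoids an endless loop
    }
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size(); // continue after the inserted text so it is not rescanned
    }
    return text;
}
```

Called as replace_all(str, "\"", "\\\""), it would put a backslash in front of every quotation mark, which is the replacement the question describes.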

If your std::string contains server/register?json={"id"="monkey"}, there's no need to replace anything, as it will already be correctly formatted.
The only place you would need this is if you hard-coded the string and assigned it manually. But then, you can just replace the quotes manually.

Related

C++ Regex always matching entire string

Whenever I use a regex function it matches the entire string for some reason.
#include <iostream>
#include <regex>
int main() {
std::string text = "This (is a) test";
std::regex pattern("\(.+\)");
std::cout << std::regex_replace(text, pattern, "isnt") << std::endl;
return 0;
}
Output: isnt
Unfortunately, your pattern is not what it seems to be. Here is the problem.
Imagine that for some reason you want to match tabs with your regex. You might try this.
std::regex my_regex("\t");
This would work, but the string your std::regex object sees contains an actual tab character, not the two characters \t. This is because of how C++ treats escape sequences in string literals. To pass a literal \t to the regex engine, you have to do the following.
std::regex my_regex("\\t");
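In your pattern, "\(" is not a recognised escape sequence, so most compilers (usually with a warning) pass it on as a plain "(". The regex engine then sees (.+), a capture group that matches the entire string. So the correct syntax for your regex is: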
std::regex pattern("\\(.+\\)");
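A raw string literal is another way to write the same pattern without doubling the backslashes. The sketch below assumes a C++11 compiler; replace_parenthesised is a hypothetical helper name used only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces a parenthesised span such as "(is a)"
// with `replacement`.
std::string replace_parenthesised(const std::string& text, const std::string& replacement)
{
    // Inside R"(...)" backslashes are not escape characters, so the regex
    // engine receives exactly \(.+\) and matches literal parentheses.
    static const std::regex pattern(R"(\(.+\))");
    return std::regex_replace(text, pattern, replacement);
}
```

With the text from the question, replace_parenthesised("This (is a) test", "isnt") gives "This isnt test".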

How to remove duplicate phrases that are separated by being inside double quotes or separated by a comma in a file with c++

I use this function to remove duplicate words from a file, but I need it to remove duplicate expressions instead.
For example, this is what the function currently does.
If I have the expressions
"Hello World"
"beautiful world"
the function will remove the word "world" from both expressions.
What I need is for the function to remove an entire expression, and only if that expression is found more than once in the file.
For example, if I have the expressions
"Hello World"
"Hello World"
"beautiful world"
"beautiful world"
the function should remove the duplicated expressions "Hello World" and "beautiful world", leaving only one of each, but it should not touch the word "world" on its own, because it should treat everything within the quotes as one word.
This is the code I use now
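#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_set>
using namespace std;
using ist = istreambuf_iterator<char>; // assumed meaning of the "ist" used in main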
void Remove_Duplicate_Words(string str)
{
ofstream Write_to_file{ "test.txt" };
// Used to split string around spaces.
istringstream ss(str);
// To store individual visited words
unordered_set<string> hsh;
// Traverse through all words
do
{
string word;
ss >> word;
// If current word is not seen before.
while (hsh.find(word) == hsh.end()) {
cout << word << '\n';
Write_to_file << word << endl; // write to outfile
hsh.insert(word);
}
} while (ss);
}
int main()
{
ifstream Read_from_file{ "test.txt" };
string file_content{ ist {Read_from_file}, ist{} };
Remove_Duplicate_Words(file_content);
return 0;
}
How do I remove duplicate expressions instead of duplicate words?
Unfortunately my knowledge on this subject is very basic and usually what I do is try all kinds of things until I succeed. I tried to do it here too and I just can not figure out how to do it
Any help would be greatly appreciated
This requires a little bit of string parsing.
Your example works by reading tokens, which are similar to words (but not exactly). For your problem, the token becomes word OR quoted string. The more complex your definition of tokens, the harder the problem becomes. Try starting by thinking of tokens as either words or quoted strings on the same line. A quoted string across lines might be a little more complex.
Here's a similar SO question to get you started: Reading quoted string in c++. You need to do something similar, but instead of having set positions, your quoted string can occur anywhere in the line. So you read tokens something like this:
Read the next word token (as you're doing now)
If the token just read starts with a quote character ("), keep reading up to the next (") and treat the whole thing as a single token
Check the set and output the token only if it isn't already there (if the token is quoted, don't forget to output the quotes)
Insert token into set.
Repeat till EOF
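The steps above can be sketched as follows. This is a minimal illustration, not a drop-in replacement for the function in the question: unique_tokens is a hypothetical name, it works on a string already read from the file, and it assumes a quoted phrase starts at the beginning of a token and may span lines.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: a token is either a whitespace-delimited word or a
// whole "quoted phrase" (quotes included). Returns each distinct token once,
// in order of first appearance.
std::vector<std::string> unique_tokens(const std::string& text)
{
    std::vector<std::string> result;
    std::unordered_set<std::string> seen;
    std::size_t i = 0;
    while (i < text.size()) {
        if (std::isspace(static_cast<unsigned char>(text[i]))) {
            ++i; // skip whitespace between tokens
            continue;
        }
        const std::size_t start = i;
        if (text[i] == '"') {
            // Quoted phrase: read up to and including the closing quote.
            const std::size_t close = text.find('"', i + 1);
            i = (close == std::string::npos) ? text.size() : close + 1;
        } else {
            // Plain word: read up to the next whitespace character.
            while (i < text.size() && !std::isspace(static_cast<unsigned char>(text[i]))) {
                ++i;
            }
        }
        const std::string token = text.substr(start, i - start);
        if (seen.insert(token).second) { // true only the first time a token is seen
            result.push_back(token);
        }
    }
    return result;
}
```

Writing each element of the returned vector to the output file, one per line, would then replace the word-by-word loop in Remove_Duplicate_Words.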
Hope that helps

How to assign string a char array that starts from the middle of the array?

For example in the following code:
char name[20] = "James Johnson";
I want to assign all the characters after the white space, up to the end of the char array, to a string, so that the string ends up like the following (this is not meant as an initialization, it just shows the idea):
string s = "Johnson";
Therefore, essentially, the string will only accept the last name. How can I do this?
I think you want something like this:
string s="";
for(int i=strlen(name)-1;i>=0;i--)
{
if(name[i]==' ')break;
else s+=name[i];
}
reverse(s.begin(),s.end());
You need to
#include <algorithm> // for reverse
#include <cstring> // for strlen
There's always more than one way to do it - it depends on exactly what you're asking.
You could either:
search for the position of the first space, and then point a char* at one-past-that position (look up strchr in <cstring>)
split the string into a list of sub-strings, where your split character is a space (look up strtok or boost split)
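As a rough sketch of the first option, std::strchr from <cstring> returns a pointer to the first space, and a std::string can be built from the character after it. The function name after_first_space is hypothetical, and falling back to the whole input when there is no space is an assumption of this example.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: returns everything after the first space in `text`,
// or the whole text if it contains no space.
std::string after_first_space(const char* text)
{
    const char* space = std::strchr(text, ' ');
    return space ? std::string(space + 1) : std::string(text);
}
```

For the array in the question, after_first_space(name) yields "Johnson".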
std::string has a whole arsenal of functions for string manipulation, and I recommend you use those.
You can find the first whitespace character using std::string::find_first_of, and split the string from there:
char name[20] = "James Johnson";
// Convert whole name to string
std::string wholeName(name);
// Create a new string from the whole name starting from one character past the first whitespace
std::string lastName(wholeName, wholeName.find_first_of(' ') + 1);
std::cout << lastName << std::endl;
If you're worried about multiple names (a middle name, for example), you can also use std::string::find_last_of to split at the last space instead.
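A minimal sketch of the find_last_of variant follows. The name last_name is hypothetical, and returning the whole input when no space is found is an assumption made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the part of `whole_name` after the last
// space, or the whole string if there is no space at all.
std::string last_name(const std::string& whole_name)
{
    const std::size_t pos = whole_name.find_last_of(' ');
    return pos == std::string::npos ? whole_name : whole_name.substr(pos + 1);
}
```

With this, both "James Johnson" and "James Earl Johnson" give "Johnson".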
If you're worried about the names not being separated by a space, you could use std::string::find_first_not_of and search for letters of the alphabet. The example given in the link is:
std::string str ("look for non-alphabetic characters...");
std::size_t found = str.find_first_not_of("abcdefghijklmnopqrstuvwxyz ");
if (found!=std::string::npos)
{
std::cout << "The first non-alphabetic character is " << str[found];
std::cout << " at position " << found << '\n';
}

Boost xpressive regex results in garbage character

I am trying to write some code that changes a string like "/path/file.extension" to another specified extension. I am trying to use boost::xpressive to do so. But, I am having problems. It appears that a garbage character appears in the output:
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
using namespace std;
int main()
{
std::string str( "xml.xml.xml.xml");
sregex date = sregex::compile( "(\\.*)(\\.xml)$");
std::string format( "\1.zipxml");
std::string str2 = regex_replace( str, date, format );
std::cout << "str = " << str << "\n";
std::cout << "str2 = " << str2 << "\n";
return 0;
}
Now compile and run it:
[bitdiot#kantpute foodir]$ g++ badregex.cpp
[bitdiot#kantpute foodir]$ ./a.out > output
[bitdiot#kantpute foodir]$ less output
[bitdiot#kantpute foodir]$ cat -vte output
str = xml.xml.xml.xml$
str2 = xml.xml.xml^A.zipxml$
In the above example, I redirect output to a file, and use cat to print out the non-printable character. Notice the ctrl-A in the str2.
Anyway, am I using the Boost libraries incorrectly? Is this a Boost bug? Is there another regular expression I can use that lets me replace the ".tail" with some other string? (It's fixed in my example.)
thanks.
At least as I'm reading things, the culprit is right here: std::string format( "\1.zipxml");.
You forgot to escape the backslash, so \1 is giving you a control-A. You almost certainly want \\1.
Alternatively (if your compiler is new enough) you could use a raw string instead, so it would be something like: R"(\1.zipxml)", and you wouldn't have to escape your backslashes. I probably wouldn't bother to mention this, except for the fact that if you're writing REs in C++ strings, raw strings are pretty much your new best friend (IMO, anyway).
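To make the difference concrete, here is a small sketch comparing the three spellings of the format string. The variable names are made up for this illustration; only the literals themselves come from the discussion above.

```cpp
#include <string>

// "\1" is an octal escape sequence: a single character with value 1 (Ctrl-A).
const std::string unescaped = "\1.zipxml";

// "\\1" is a backslash followed by the digit 1, as intended.
const std::string escaped = "\\1.zipxml";

// In a raw string literal backslashes are left alone, so this equals `escaped`.
const std::string raw = R"(\1.zipxml)";
```

Here unescaped has 8 characters and starts with the control character seen in the cat -vte output, while escaped and raw both have 9 characters and compare equal.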
As Jerry Coffin pointed out to me, it was a stupid mistake on my part.
The errant code is the following:
std::string format( "\1.zipxml");
This should be replaced with:
std::string format( "$1.zipxml");
Thanks for your help everyone.

Split a wstring by specified separator

I have a std::wstring variable that contains some text, and I need to split it by a separator. How could I do this? I would rather not use Boost, since it generates some warnings. Thank you.
EDIT 1
this is an example text:
hi how are you?
and this is the code:
typedef boost::tokenizer<boost::char_separator<wchar_t>, std::wstring::const_iterator, std::wstring> Tok;
boost::char_separator<wchar_t> sep;
Tok tok(this->m_inputText, sep);
for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
{
wcout << *tok_iter << endl; // the tokens are std::wstring, so use wcout; one token per line
}
the results are:
hi
how
are
you
?
I don't understand why the last character is always split off into a separate token...
In your code, question mark appears on a separate line because that's how boost::tokenizer works by default.
If your desired output is four tokens ("hi", "how", "are", and "you?"), you could
a) change char_separator you're using to
boost::char_separator<wchar_t> sep(L" ", L"");
b) use boost::split which, I think, is the most direct answer to "split a wstring by specified character"
#include <string>
#include <iostream>
#include <vector>
#include <boost/algorithm/string.hpp>
int main()
{
std::wstring m_inputText = L"hi how are you?";
std::vector<std::wstring> tok;
split(tok, m_inputText, boost::is_any_of(L" "));
for(std::vector<std::wstring>::iterator tok_iter = tok.begin();
tok_iter != tok.end(); ++tok_iter)
{
std::wcout << *tok_iter << '\n';
}
}
test run: https://ideone.com/jOeH9
You're default constructing boost::char_separator. The documentation says:
The function std::isspace() is used to identify dropped delimiters and std::ispunct() is used to identify kept delimiters. In addition, empty tokens are dropped.
Since std::ispunct(L'?') is true, it is treated as a "kept" delimiter, and reported as a separate token.
Hi, you can use the wcstok function.
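A minimal sketch of that idea, assuming the three-argument std::wcstok declared in <cwchar> (some older toolchains only provide a two-argument version). The name split_wcstok is a hypothetical helper for this example.

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any of the characters in `delims`.
// `text` is taken by value because wcstok overwrites delimiters in the buffer.
std::vector<std::wstring> split_wcstok(std::wstring text, const wchar_t* delims)
{
    std::vector<std::wstring> tokens;
    wchar_t* state = nullptr; // wcstok keeps its position here between calls
    for (wchar_t* tok = std::wcstok(&text[0], delims, &state);
         tok != nullptr;
         tok = std::wcstok(nullptr, delims, &state)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Calling split_wcstok(L"hi how are you?", L" ") gives the four tokens "hi", "how", "are" and "you?", with the question mark kept on the last token.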
You said you don't want boost so...
This is maybe a weird approach to use in C++, but I used it once in a MUD where I needed a lot of tokenization in C.
take this block of memory assigned to the char * chars:
char chars[] = "I like to fiddle with memory";
If you need to tokenize on a space character:
create an array of char* called splitvalues, big enough to store all tokens
walk through chars one character at a time until the value is '\0'
if splitvalues[counter] is not already set, point it at the current character
if the value is ' ', write 0 there
and increment counter
When you finish, the original string has been destroyed, so do not use it as a whole any more. Instead, you have the array of strings pointing to the tokens. The counter variable tells you how many tokens there are (the upper bound of the array).
the approach is this:
iterate over the string and, on the first character of each token, update the token start pointer
convert the char you need to split on to zeroes that mean string termination in C
count how many times you did this
PS. Not sure if you can use a similar approach in a Unicode environment, though.