Getting rid of newline in CString in C++ - c++

I have an HTML/XML document that is originally a CString, and I want to get rid of all the newlines, essentially putting everything on one line. I've tried converting it to std::string and using:
#include <algorithm>
#include <string>
str.erase(std::remove(str.begin(), str.end(), '\n'), str.end());
But it didn't work.

To keep your block of text from looking odd, replace the line breaks with a space. Make sure to replace both newline ('\n') and carriage return ('\r') characters.
CString str = _T("Line 1 Windows Style\r\n Line 2 Unix Style\n Line 3");
str.Replace(_T('\r'), _T(' '));
str.Replace(_T('\n'), _T(' '));
str.Replace(_T("  "), _T(" ")); // collapse the doubled spaces the line breaks leave behind
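A likely reason the std::string attempt in the question "didn't work" is that the document has Windows line endings ("\r\n"), so erasing only '\n' leaves the '\r' characters behind. Here is a minimal std::string sketch of the same replace-with-space idea. It assumes a single space is an acceptable separator, and the function name flatten_lines is made up for illustration:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: put a multi-line document on one line.
// Every '\r' or '\n' becomes a space, then runs of spaces are collapsed to one.
std::string flatten_lines(std::string text) {
    std::replace(text.begin(), text.end(), '\r', ' ');
    std::replace(text.begin(), text.end(), '\n', ' ');
    // std::unique drops an element when the predicate says it "duplicates"
    // the previous kept element; here that means two adjacent spaces.
    auto new_end = std::unique(text.begin(), text.end(),
                               [](char a, char b) { return a == ' ' && b == ' '; });
    text.erase(new_end, text.end());
    return text;
}
```

Collapsing with std::unique handles runs of any length in one pass, unlike a single replace of two spaces with one.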

You only need the Remove method:
CString str = _T("Test newline \nremove");
str.Remove(_T('\n')); // for Windows "\r\n" line endings, also call str.Remove(_T('\r'))
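If you have already converted to std::string, the erase-remove idiom from the question works once it also targets '\r'. A short sketch, using a hypothetical helper name, that deletes both line-break characters in one pass:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: delete every line-break character in place.
// Text with Windows line endings uses "\r\n", so removing only '\n'
// would leave stray '\r' characters behind.
void remove_line_breaks(std::string& s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char c) { return c == '\r' || c == '\n'; }),
            s.end());
}
```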

How about:
str.Replace(_T("\n"), _T(""));
This is documented under CString::Replace.

Related

Problem removing backslash characters on std::string

I'm trying to execute CMD commands that I get by deserializing a JSON message.
When I deserialize the message, I store the value in a std::string variable whose value is "tzutil /s \"Romance Standard Time_dstoff\"".
I would like to remove the backslash characters ('\') when I receive commands with quoted parameters (e.g. "tzutil /s "Romance Standard Time_dstoff"").
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
system(command.c_str());
Is there any way to do it?
I will appreciate any kind of help.
If you wish to remove all occurrences of a character, you can use
#include <algorithm>
str.erase(std::remove(str.begin(), str.end(), char_to_remove), str.end());
If you wish to replace them with another character then try
#include <algorithm>
std::replace(str.begin(), str.end(), old_char, new_char);
Here is a function I made in C++ for one of my own projects for replacing sub-strings.
std::string
Replace(std::string str,
const std::string& oldStr,
const std::string& newStr)
{
size_t index = str.find(oldStr);
while(index != str.npos)
{
str = str.substr(0, index) +
newStr + str.substr(index + oldStr.size());
index = str.find(oldStr, index + newStr.size());
}
return str;
}
int main(){
std::string command = GetCommandFromJsonSource();
command = Replace(command, "\\\"", "\""); // unescape only double quotes
}
Although the source code of your program does contain backslashes, the string represented by the literal doesn't contain any, as the following example demonstrates:
std::string command = "tzutil /s \"Romance Standard Time_dstoff\""; //Problem
std::cout << command;
// output:
tzutil /s "Romance Standard Time_dstoff"
As such, there is nothing to remove from the string.
Backslash is an escape character. \" is an escape sequence that represents a single character, the double quote. It lets you type a double quote inside a string literal without it being interpreted as the end of the string.
To write a backslash into a string literal, you escape it with another backslash. The following string does contain backslashes: "tzutil /s \\\"Romance Standard Time_dstoff\\\"" (each \\ produces one backslash and each \" produces one double quote). In this case, removing all backslashes can be done like so:
command.erase(std::remove(command.begin(), command.end(), '\\'), command.end());
However, simply removing all instances of the character might not be sensible. If your string contains escape sequences, what you probably want to do instead is unescape them. This is somewhat more complicated: rather than removing every backslash, you replace \" with ", \\ with \, \n with a newline, and so on.
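A minimal sketch of such an unescaper. It assumes only the escapes \", \\, \n and \t matter; anything else is left untouched. The name unescape is made up, and this is not a full JSON string decoder (it ignores \uXXXX, for example):

```cpp
#include <string>

// Hypothetical helper: decode a few backslash escape sequences.
// Unknown sequences and a trailing lone backslash are copied unchanged.
std::string unescape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] != '\\' || i + 1 == in.size()) {
            out += in[i];              // ordinary character
            continue;
        }
        char next = in[++i];           // consume the character after '\'
        switch (next) {
            case '"':  out += '"';  break;
            case '\\': out += '\\'; break;
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            default:   out += '\\'; out += next; break;  // keep as-is
        }
    }
    return out;
}
```

Applied to the command from the question, this turns tzutil /s \"Romance Standard Time_dstoff\" into tzutil /s "Romance Standard Time_dstoff".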
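You can use std::quoted (C++14) to read and write quoted strings. Note that it only handles the delimiter and the escape character: a backslash followed by any character yields that character, so \n becomes a plain n rather than a newline.
#include <iomanip> // -> std::quoted
#include <iostream>
#include <sstream>
#include <string>
int main() {
std::istringstream s("\"Hello world\\n\"");
std::string hello;
s >> std::quoted(hello); // hello is now: Hello worldn
std::cout << std::quoted(hello) << ": " << hello;
}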
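For the tzutil case, where only quotes and backslashes are escaped, std::quoted is a good fit. A small round-trip sketch (C++14; the helper names quote and unquote are hypothetical):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Writing through std::quoted adds surrounding quotes and escapes
// embedded '"' and '\' with a backslash.
std::string quote(const std::string& s) {
    std::ostringstream out;
    out << std::quoted(s);
    return out.str();
}

// Reading through std::quoted strips the surrounding quotes and drops
// each escaping backslash (so \n becomes plain n, not a newline).
std::string unquote(const std::string& s) {
    std::istringstream in(s);
    std::string result;
    in >> std::quoted(result);
    return result;
}
```

If the input does not start with a quote, extraction through std::quoted falls back to ordinary whitespace-delimited reading.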

Retrieve char defined in string

I'm currently writing an assembler and VM program. My assembler reads in a .asm file and converts it to byte code that my VM then runs.
Currently I read in a line from my assembly file, break that line into its components, and then determine what the line contains (a directive or an instruction).
getline(assemblyFile, line);
istringstream iss(line);
vector<string> instruction{
std::istream_iterator<std::string>(iss),{}
};
This gives me a vector of strings that has been working well for me up to this point. If my directive is an int, I'm able to retrieve it simply by saying
mem[dataCounter] = stoi(instruction[VALUE]);
This was also working well when I was using ASCII values for my characters. However, I'm trying now to be able to provide either ASCII representation, or use a notation like
J .BYT 'J'
Here the first J is a label, .BYT tells me the data type, and 'J' is the byte I want to store in my byte array. If I don't use quotes,
J .BYT J
the following works nicely
mem[dataCounter] = int(instruction[VALUE].c_str()[0]);
(gives me the decimal/byte value), where instruction holds the whole line and VALUE is an index of 2. If I use the quoted form, it of course returns the opening quote. Not using quotes may be the solution in itself; however, I'm also having trouble reading in special characters such as spaces or newlines. In the case of spaces, my directive looks like
SPACE .BYT ' '
which returns me a vector that has four elements, "SPACE", ".BYT", "'" and "'", and in the case of my newline which I've been attempting as
NEWLN .BYT \n
I have three elements with the last being "\n".
In none of these cases have I been able to find yet a way to retrieve the characters I am attempting to represent in my .asm file to their equivalent char/decimal value. I would like to continue to use string as it's been convenient and changing would require a fair bit of refactoring, but can be done to support the functionality.
What methods/functions are available that can help me retrieve these characters, in particular the special characters?
I would use strtok() and treat special characters with caution.
For example, I would check whether the token is a newline and, if it is, handle it explicitly.
For the ' ', I would search for it in the string, and if found, remember its information (starting position for example in the string) and then erase it from the string. Afterwards, I would split into tokens.
Minimal Example for demonstrative purposes only:
#include <cstdio>
#include <cstring>
#include <string>
#include <iostream>
int main ()
{
//std::string str ="SPACE .BYT \n";
//std::string str = "J .BYT 'J'";
std::string str ="SPACE .BYT ' '";
std::size_t start_position_to_erase = str.find("' '");
if(start_position_to_erase != std::string::npos) {
std::cout << "Found: " << std::string(str, start_position_to_erase, 3) << std::endl; // arguments are (string, pos, count)
str.erase(start_position_to_erase, 3);
}
char * pch;
printf ("Splitting string \"%s\" into tokens:\n", str.c_str());
pch = strtok (&str[0]," "); // strtok writes into the buffer, so pass writable storage rather than casting away c_str()'s const
while (pch != NULL)
{
if(pch[0] == '\n')
printf ("\\n\n");
else
printf ("%s\n",pch);
pch = strtok (NULL, " ");
}
return 0;
}
Output:
Found: ' '
Splitting string "SPACE .BYT " into tokens:
SPACE
.BYT
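An alternative sketch, not the strtok approach above: take the operand as the rest of the raw line, so ' ' is never split into two pieces, and then decode it. The names operand_of and parse_byte_literal are hypothetical. The set of supported escapes is an assumption, and trailing comments or whitespace on the line are not handled:

```cpp
#include <string>

// Return the text after the first two whitespace-separated fields
// (label and directive), e.g. "' '" for "SPACE .BYT ' '".
std::string operand_of(const std::string& line) {
    const char* ws = " \t";
    std::string::size_type pos = 0;
    for (int field = 0; field < 2; ++field) {
        pos = line.find_first_not_of(ws, pos);  // start of this field
        if (pos == std::string::npos) return "";
        pos = line.find_first_of(ws, pos);      // end of this field
        if (pos == std::string::npos) return "";
    }
    pos = line.find_first_not_of(ws, pos);      // skip to the operand
    return pos == std::string::npos ? "" : line.substr(pos);
}

// Decode a .BYT operand into its byte value, or -1 if unrecognised.
// Accepts a bare character (J), a quoted character ('J', ' '), and the
// escapes \n \t \0 \\ \' with or without quotes (\n or '\n').
int parse_byte_literal(std::string tok) {
    if (tok.size() >= 3 && tok.front() == '\'' && tok.back() == '\'')
        tok = tok.substr(1, tok.size() - 2);    // strip the surrounding quotes
    if (tok.size() == 1)
        return static_cast<unsigned char>(tok[0]);
    if (tok.size() == 2 && tok[0] == '\\') {
        switch (tok[1]) {
            case 'n':  return '\n';
            case 't':  return '\t';
            case '0':  return '\0';
            case '\\': return '\\';
            case '\'': return '\'';
        }
    }
    return -1;
}
```

With this, the question's code could store mem[dataCounter] = parse_byte_literal(operand_of(line)) for .BYT lines and keep the existing vector<string> split for everything else.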

Boost xpressive regex results in garbage character

I am trying to write some code that changes a string like "/path/file.extension" to another specified extension, using boost::xpressive. But I am having problems: a garbage character appears in the output:
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
using namespace std;
int main()
{
std::string str( "xml.xml.xml.xml");
sregex date = sregex::compile( "(\\.*)(\\.xml)$");
std::string format( "\1.zipxml");
std::string str2 = regex_replace( str, date, format );
std::cout << "str = " << str << "\n";
std::cout << "str2 = " << str2 << "\n";
return 0;
}
Now compile and run it:
[bitdiot#kantpute foodir]$ g++ badregex.cpp
[bitdiot#kantpute foodir]$ ./a.out > output
[bitdiot#kantpute foodir]$ less output
[bitdiot#kantpute foodir]$ cat -vte output
str = xml.xml.xml.xml$
str2 = xml.xml.xml^A.zipxml$
In the above example, I redirect output to a file, and use cat to print out the non-printable character. Notice the ctrl-A in the str2.
Anyway, am I using the boost libraries incorrectly? Is this a boost bug? Is there another regular expression I can use to replace the ".tail" with some other string? (It's fixed in my example.)
thanks.
At least as I'm reading things, the culprit is right here: std::string format( "\1.zipxml");.
You forgot to escape the backslash, so \1 is giving you a control-A. You almost certainly want \\1.
Alternatively (if your compiler is new enough) you could use a raw string instead, so it would be something like: R"(\1.zipxml)", and you wouldn't have to escape your backslashes. I probably wouldn't bother to mention this, except for the fact that if you're writing REs in C++ strings, raw strings are pretty much your new best friend (IMO, anyway).
As Jerry Coffin pointed out to me, it was a stupid mistake on my part.
The errant code is the following:
std::string format( "\1.zipxml");
This should be replaced with:
std::string format( "$1.zipxml");
Thanks for your help everyone.
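For comparison, here is a sketch using the standard <regex> library instead of boost::xpressive (a swapped-in technique, assuming C++11 or later). It combines both fixes from this thread: a raw string for the pattern, so no backslashes need doubling, and "$1" as the back-reference in the format. The pattern uses (.*) for "everything before the extension", which is my assumption about the intent. The question's (\\.*) matches only literal dots. The function name is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace a trailing ".xml" with ".zipxml".
// Greedy (.*) captures everything up to the last ".xml".
std::string change_extension(const std::string& path) {
    static const std::regex ext(R"((.*)\.xml$)");
    return std::regex_replace(path, ext, "$1.zipxml");
}
```

Paths that do not end in ".xml" are returned unchanged, because regex_replace copies unmatched text through.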

formatting a string which contains quotation marks

I am having a problem formatting a string that contains quotation marks.
For example, I got this std::string: server/register?json={"id"="monkey"}
This string needs to have the four quotation marks replaced by \", because it will be used as a c_str() for another function.
What is the best way to do this on this string?
{"id"="monkey"}
EDIT: I need a solution which uses STL libraries only, preferably only with String.h. I have confirmed I need to replace " with \".
EDIT2: Nvm, found the bug in the framework
It is perfectly legal to have the '"' character in a C string, so the short answer is that you need to do nothing. Escaping the quotes is only required when typing the string in source code:
std::string str("server/register?json={\"id\"=\"monkey\"}");
my_c_function(str.c_str());// Nothing to do here
However, in general if you want to replace a substring by an other, use boost string algorithms.
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
int main(int, char**)
{
std::string str = "Hello world";
boost::algorithm::replace_all(str, "o", "a"); //modifies str
std::string str2 = boost::algorithm::replace_all_copy(str, "ll", "xy"); //doesn't modify str
std::cout << str << " - " << str2 << std::endl;
}
// Displays : Hella warld - Hexya warld
If your std::string contains server/register?json={"id"="monkey"}, there's no need to replace anything, because it is already correctly formatted.
The only place you would need this is if you hard-coded the string and assigned it manually. But then, you can just replace the quotes manually.
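The asker's EDIT requests a standard-library-only solution. If you genuinely do need to turn each " into \" (for example, to embed the text in another quoted context), here is a minimal sketch using only std::string::find and std::string::replace. The function name replace_all is hypothetical:

```cpp
#include <string>

// Replace every occurrence of `from` with `to`, standard library only.
// Searching resumes after the inserted text, so `to` may itself contain
// `from` (as when replacing "\"" with "\\\"") without looping forever.
std::string replace_all(std::string s, const std::string& from, const std::string& to) {
    if (from.empty()) return s;  // an empty pattern would match everywhere
    for (std::string::size_type pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
    return s;
}
```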

Split a wstring by specified separator

I have a std::wstring variable that contains text, and I need to split it by a separator. How could I do this? I'd rather not use Boost, because it generates some warnings. Thank you
EDIT 1
this is an example text:
hi how are you?
and this is the code:
typedef boost::tokenizer<boost::char_separator<wchar_t>, std::wstring::const_iterator, std::wstring> Tok;
boost::char_separator<wchar_t> sep;
Tok tok(this->m_inputText, sep);
for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
{
std::wcout << *tok_iter << L'\n';
}
the results are:
hi
how
are
you
?
I don't understand why the last character is always split into a separate token...
In your code, question mark appears on a separate line because that's how boost::tokenizer works by default.
If your desired output is four tokens ("hi", "how", "are", and "you?"), you could
a) change char_separator you're using to
boost::char_separator<wchar_t> sep(L" ", L"");
b) use boost::split which, I think, is the most direct answer to "split a wstring by specified character"
#include <string>
#include <iostream>
#include <vector>
#include <boost/algorithm/string.hpp>
int main()
{
std::wstring m_inputText = L"hi how are you?";
std::vector<std::wstring> tok;
boost::split(tok, m_inputText, boost::is_any_of(L" "));
for(std::vector<std::wstring>::iterator tok_iter = tok.begin();
tok_iter != tok.end(); ++tok_iter)
{
std::wcout << *tok_iter << '\n';
}
}
test run: https://ideone.com/jOeH9
You're default constructing boost::char_separator. The documentation says:
The function std::isspace() is used to identify dropped delimiters and std::ispunct() is used to identify kept delimiters. In addition, empty tokens are dropped.
Since std::ispunct(L'?') is true, it is treated as a "kept" delimiter, and reported as a separate token.
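Since the question prefers to avoid Boost, here is a standard-library-only sketch that splits on one separator character with std::getline on a std::wistringstream. Punctuation such as '?' stays attached to its word. Skipping empty tokens (from repeated separators) is my choice, not something the question specifies. The function name is hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a wide string on a single separator character.
std::vector<std::wstring> split_wstring(const std::wstring& text, wchar_t sep) {
    std::wistringstream in(text);
    std::vector<std::wstring> tokens;
    for (std::wstring tok; std::getline(in, tok, sep); ) {
        if (!tok.empty()) tokens.push_back(tok);  // drop empty tokens
    }
    return tokens;
}
```

For L"hi how are you?" and L' ', this yields the four tokens "hi", "how", "are" and "you?".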
You can use the wcstok function.
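A sketch of that approach, assuming the standard three-argument std::wcstok from <cwchar> (some older Microsoft compilers shipped a non-standard two-argument version). wcstok writes terminators into the buffer it scans, so the sketch copies the wstring into a writable, null-terminated buffer first. The function name is hypothetical:

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Split `text` on any of the characters in `delims` using std::wcstok.
std::vector<std::wstring> split_wcstok(const std::wstring& text, const wchar_t* delims) {
    std::vector<wchar_t> buf(text.begin(), text.end());
    buf.push_back(L'\0');                  // wcstok needs a terminated, writable buffer
    std::vector<std::wstring> tokens;
    wchar_t* state = nullptr;              // wcstok keeps its position here
    for (wchar_t* tok = std::wcstok(buf.data(), delims, &state); tok != nullptr;
         tok = std::wcstok(nullptr, delims, &state)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Like strtok, wcstok skips runs of delimiters, so empty tokens never appear.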
You said you don't want boost so...
This may be a weird approach to use in C++, but I used it once in a MUD where I needed a lot of tokenization in C.
Take this block of memory held in the char array chars:
char chars[] = "I like to fiddle with memory";
If you need to tokenize on a space character:
create an array of char* called splitvalues, big enough to store all tokens
advance a pointer through chars until it reaches '\0', and at each character:
if splitvalues[counter] is not set yet, set it to the current address (the start of the token)
if the value is ' ', write 0 there and increment counter
When you finish, the original string has been destroyed (its spaces are now terminators), so do not use it; instead you have the array of strings pointing to the tokens. The counter variable tracks how many tokens there are (the upper bound of the array).
the approach is this:
iterate the string and on first occurence update token start pointer
convert the char you need to split on to zeroes that mean string termination in C
count how many times you did this
PS. I'm not sure whether a similar approach works in a Unicode environment, though.
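Here is a sketch of the in-place approach described above, written as a template so the same code also runs on wchar_t buffers (what a std::wstring holds). It assumes the separator is a single code unit. The name split_in_place is hypothetical. One deviation from the steps above, which is my choice: runs of separators do not produce empty tokens, and scanning stops once the pointer array is full.

```cpp
#include <cstddef>

// Walk the buffer, record where each token starts, and overwrite each
// separator with a terminator so every recorded pointer is its own C string.
// The buffer is modified. Returns the number of tokens stored.
template <typename CharT>
std::size_t split_in_place(CharT* buf, CharT sep, CharT** tokens, std::size_t max_tokens) {
    std::size_t count = 0;
    bool in_token = false;
    for (CharT* p = buf; *p != CharT(); ++p) {
        if (*p == sep) {
            *p = CharT();                    // terminate the current token
            in_token = false;
        } else if (!in_token) {
            if (count == max_tokens) break;  // no room for another token
            tokens[count++] = p;             // first character of a new token
            in_token = true;
        }
    }
    return count;
}
```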