Convert escape sequences the way a compiler would [duplicate]


What's the easiest way to convert a C++ std::string to another std::string, which has all the unprintable characters escaped?
For example, for the string of two characters [0x61,0x01], the result string might be "a\x01" or "a%01".

Take a look at Boost's String Algorithm Library. You can use its is_print classifier (together with its operator! overload) to pick out nonprintable characters, and its find_format() functions can replace them with whatever formatting you wish.
#include <iostream>
#include <string>
#include <boost/format.hpp>
#include <boost/algorithm/string.hpp>
struct character_escaper
{
    template<typename FindResultT>
    std::string operator()(const FindResultT& Match) const
    {
        std::string s;
        for (typename FindResultT::const_iterator i = Match.begin();
             i != Match.end();
             i++) {
            // cast through unsigned char so bytes >= 0x80 print as two hex digits
            s += str(boost::format("\\x%02x") % static_cast<int>(static_cast<unsigned char>(*i)));
        }
        return s;
    }
};
int main (int argc, char **argv)
{
    std::string s("a\x01");
    boost::find_format_all(s, boost::token_finder(!boost::is_print()), character_escaper());
    std::cout << s << std::endl;
    return 0;
}

Assumes the execution character set is a superset of ASCII and CHAR_BIT is 8. For OutIter, pass a back_inserter (e.g. into a vector<char> or another string), an ostream_iterator, or any other suitable output iterator.
#include <string>
template<class OutIter>
OutIter write_escaped(std::string const& s, OutIter out) {
    *out++ = '"';
    for (std::string::const_iterator i = s.begin(), end = s.end(); i != end; ++i) {
        unsigned char c = *i;
        if (' ' <= c and c <= '~' and c != '\\' and c != '"') {
            *out++ = c;
        }
        else {
            *out++ = '\\';
            switch(c) {
            case '"': *out++ = '"'; break;
            case '\\': *out++ = '\\'; break;
            case '\t': *out++ = 't'; break;
            case '\r': *out++ = 'r'; break;
            case '\n': *out++ = 'n'; break;
            default:
                char const* const hexdig = "0123456789ABCDEF";
                *out++ = 'x';
                *out++ = hexdig[c >> 4];
                *out++ = hexdig[c & 0xF];
            }
        }
    }
    *out++ = '"';
    return out;
}

Assuming that "easiest way" means short and easy to understand, without depending on any other resources (such as libraries), I would do it this way:
#include <cctype>
#include <sstream>
#include <string>
// escape_unprintable: wrapper function (name chosen for illustration) so the snippet compiles on its own
std::string escape_unprintable(std::string const& your_string)
{
    // s is our escaped output string
    std::string s = "";
    // loop through all characters
    for(char c : your_string)
    {
        // check if a given character is printable
        // the cast is necessary to avoid undefined behaviour
        if(std::isprint((unsigned char)c))
            s += c;
        else
        {
            std::stringstream stream;
            // if the character is not printable
            // we'll convert it to a hex string using a stringstream
            // note that since char may be signed we have to cast it to unsigned char first
            stream << std::hex << (unsigned int)(unsigned char)(c);
            std::string code = stream.str();
            s += std::string("\\x")+(code.size()<2?"0":"")+code;
            // alternatively for URL encodings:
            //s += std::string("%")+(code.size()<2?"0":"")+code;
        }
    }
    return s;
}

One person's unprintable character is another's multi-byte character. So you'll have to define the encoding before you can work out what bytes map to what characters, and which of those is unprintable.

Have you seen the article about how to Generate Escaped String Output Using Spirit.Karma?

Related

How to output two versions of a string, one with escape character and the other not, in C++?

I have only one chance to define the string, let's say
string s = "abc\"def\\hhh\"i";
After this definition, I want to output (using ofstream to write to a text file) two versions of this string afterwards. The first one is the output of s by default:
abc"def\hhh"i
The second one I want is:
abc\"def\\hhh\"i
I am writing a sort of "recursive" definition, so defining another string with extra escape characters is not a solution.
I also looked at raw strings, but with them I can only output the second version, not the first, and they are a C++11 feature, which is too new for some of the compilers I have to support.
How can I output the second version of the string without using C++11? If I have to use C++11, how can I avoid defining the string twice?

It is very simple to write such functionality:
#include <cstddef>
#include <string>
std::string to_quoted(std::string const& src) {
    std::string result;
    result.reserve(src.size()); // Not necessary, but avoids repeated reallocations
    for (std::size_t i = 0; i < src.length(); i++) {
        switch (src[i]) {
        case '"': result += "\\\""; break;
        case '\\': result += "\\\\"; break;
        // ...
        default: result += src[i];
        }
    }
    return result;
}
There may be better solutions; I think this is the simplest and quickest one. I don't understand what you mean by "defining another string". If you mean that constructing another string is not allowed, just write to the stream instead of concatenating characters.

I tend to write something like this as a template with range input and iterator output. This provides a lot of flexibility, as you can output to a stream, another string, or anything else you can wrap in an output iterator, all with the same function.
The input doesn't even have to be a std::string: it could be a std::vector, a plain array, or any type for which begin() and end() are provided (the requirements of the range-based for loop).
Another advantage over simply returning a std::string from the function is that you don't have to create a temporary string for the result, which avoids memory allocations and should improve performance.
#include <iostream>
#include <string>
#include <iterator>
template< typename Range, typename OutputIterator >
OutputIterator copy_escaped( const Range& in, OutputIterator out ){
    for( const auto& c : in ){
        switch( c ){
            case '"':
                *out++ = '\\';
                *out++ = '"';
                break;
            case '\\':
                *out++ = '\\';
                *out++ = '\\';
                break;
            case '\n':
                *out++ = '\\';
                *out++ = 'n';
                break;
            case '\r':
                *out++ = '\\';
                *out++ = 'r';
                break;
            case '\t':
                *out++ = '\\';
                *out++ = 't';
                break;
            // Could add more stuff to escape here
            // case ...:
            default:
                *out++ = c;
        }
    }
    return out;
}
You could easily extend the function to escape additional characters.
Usage examples:
int main()
{
    std::string s = "abc\"def\\hhh\"i";
    // output normal
    std::cout << s << std::endl;
    // output escaped
    copy_escaped( s, std::ostream_iterator<char>( std::cout ) );
    std::cout << std::endl;
    // output escaped to other string
    std::string escaped_s;
    escaped_s.reserve( s.size() ); // not required but improves performance
    copy_escaped( s, std::back_inserter( escaped_s ) );
    std::cout << escaped_s << std::endl;
}

Go word-by-word through a text file and replace certain words

My intended program is simple: Take each word of a text file and replace it with asterisks if it's a swear word. For instance, if the text file was "Hello world, bitch" then it would be modified to "Hello world, *****".
I have the tool for taking a word as a string and replacing it with asterisks if needed. I need help setting up the main part of my program because I get confused with all the fstream stuff. Should I instead make a new file with the replaced words and then overwrite the previous file?
#include <iostream>
#include <string>
#include <fstream>
const char* BANNED_WORDS[] = {"fuck", "shit", "bitch", "ass", "damn"};
void filter_word(std::string&);
void to_lower_case(std::string&);
int main (int argc, char* const argv[]) {
    return 0;
}
void filter_word(std::string& word) {
    std::string wordCopy = word;
    to_lower_case(wordCopy);
    for (int k = 0; k < sizeof(BANNED_WORDS)/sizeof(const char*); ++k)
        if (wordCopy == BANNED_WORDS[k])
            word.replace(word.begin(), word.end(), word.size(), '*');
}
void to_lower_case(std::string& word) {
    for (std::string::iterator it = word.begin(); it != word.end(); ++it) {
        switch (*it) {
            case 'A': *it = 'a'; break;
            case 'B': *it = 'b'; break;
            case 'C': *it = 'c'; break;
            case 'D': *it = 'd'; break;
            case 'E': *it = 'e'; break;
            case 'F': *it = 'f'; break;
            case 'G': *it = 'g'; break;
            case 'H': *it = 'h'; break;
            case 'I': *it = 'i'; break;
            case 'J': *it = 'j'; break;
            case 'K': *it = 'k'; break;
            case 'L': *it = 'l'; break;
            case 'M': *it = 'm'; break;
            case 'N': *it = 'n'; break;
            case 'O': *it = 'o'; break;
            case 'P': *it = 'p'; break;
            case 'Q': *it = 'q'; break;
            case 'R': *it = 'r'; break;
            case 'S': *it = 's'; break;
            case 'T': *it = 't'; break;
            case 'U': *it = 'u'; break;
            case 'V': *it = 'v'; break;
            case 'W': *it = 'w'; break;
            case 'X': *it = 'x'; break;
            case 'Y': *it = 'y'; break;
            case 'Z': *it = 'z'; break;
        }
    }
}

The usual solution to modifying a file is to generate a new
file, then delete the old one and rename the new one. In your case,
because your replacement text has exactly the same length as
the text it replaces, you can do it in place, with something like:
std::fstream file( fileName, std::ios_base::in | std::ios_base::out );
if ( !file.is_open() ) {
    // put error handling here...
}
// banned: a container of std::string holding the banned words
// (e.g. built from BANNED_WORDS)
std::string word;
std::fstream::pos_type startOfWord;
while ( file.peek() != std::fstream::traits_type::eof() ) {
    if ( ::isalpha( file.peek() ) ) {
        if ( word.empty() ) {
            startOfWord = file.tellg();
        }
        word += static_cast<char>( file.get() );
    } else {
        if ( !word.empty() ) {
            if ( std::any_of( banned.begin(), banned.end(),
                    [&]( std::string const& b ) { return CaseInsensitiveCompare()( b, word ); } ) ) {
                file.seekp( startOfWord );
                file.write( std::string( word.size(), '*' ).c_str(), word.size() );
                // a seek is required when switching from writing back to reading
                file.seekg( file.tellp() );
            }
            word.clear();
        }
        file.get();
    }
}
with:
struct CaseInsensitiveCompare
{
    bool operator()( unsigned char lhs, unsigned char rhs ) const
    {
        return ::tolower( lhs ) == ::tolower( rhs );
    }
    bool operator()( std::string const& lhs, std::string const& rhs ) const
    {
        return lhs.size() == rhs.size()
            && std::equal( lhs.begin(), lhs.end(), rhs.begin(), *this );
    }
};
The tellg and seekp probably aren't the most efficient
operations around, but if the file is large, and you don't have
to seek too often, it may still be more efficient than writing
a completely new file. Of course, if efficiency is an issue,
you might want to consider mmap, and doing the job directly in
memory. That would certainly be the most efficient, and
probably the easiest to code as well. But it would be platform
dependent, and would require extra effort to handle files larger
than your available address space.
Also, for the future (since there is a standard tolower that
you can use), when doing code translation (which is really what
to_lower_case does), use a table. It's much simpler and
faster:
char
to_lower_case( char ch )
{
    // static: built once, not on every call
    static char const translationTable[] =
    {
        // ...
    };
    return translationTable[static_cast<unsigned char>( ch )];
}
If you don't want your code to be dependent on the encoding, you
can use dynamic initialization:
if ( !initialized ) {
    for ( int i = 0; i <= UCHAR_MAX; ++ i ) {
        translationTable[i] = i;
    }
    static char const from[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static char const to[] = "abcdefghijklmnopqrstuvwxyz";
    // sizeof(from) - 1: skip the terminating '\0'
    for ( std::size_t i = 0; i != sizeof(from) - 1; ++ i ) {
        translationTable[static_cast<unsigned char>( from[i] )] = to[i];
    }
    initialized = true;
}
This is not a good idea for things like tolower, however;
you would have to know all of the possible upper case
characters, which in turn depends on the encoding. (The
functions in <ctype.h> do something like this, and
redefine the translation table each time you change locale.) It
can be useful for other types of mappings.

I think you need code that reads the file word by word and replaces each word that is one of BANNED_WORDS.
So here is a solution for main():
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
int main()
{
    std::vector <std::string> words; // Vector to hold the words we read in.
    std::string str; // Temporary string to read each word into
    std::cout << "Read from a file!" << std::endl;
    std::ifstream fin("thisfile.txt"); // Open it up!
    while (fin >> str) // Will read up to eof() and stop at every
    {                  // whitespace it hits. (like spaces!)
        words.push_back(str);
    }
    fin.close(); // Close that file!
    std::ofstream fout("temp.txt"); // open temp file
    for (std::size_t i = 0; i < words.size(); ++i)
    { // replace all words and add them to the temp file (one word per line)
        filter_word(words.at(i));
        fout << words.at(i) << std::endl;
    }
    // Add code to replace the original file with temp.txt
    return 0;
}
And for to_lower_case() you can use tolower(), as suggested by Paul Evans:
#include <ctype.h>
// ...
*it = tolower(static_cast<unsigned char>(*it)); // the cast avoids undefined behaviour for negative chars
Hope this helps.

How to add dot character to a character in string?

I want to add a '.' character next to another character in a string, but I don't know how to do it. Is it possible?
#include <iostream>
#include <string.h>
using namespace std;
int main(int argc, char *argv[]) {
    string input;
    char dot='.';
    cin>>input;
    for(int i=0;i<input.length();i++)
    {
        if( input[i]>=65 && input[i]<=90)
        {
            input[i]=input[i]+32;
        }
        if( (input[i]=='a') || (input[i]=='e') || (input[i]=='i') || (input[i]=='o') || input[i]=='y' || input[i]=='u' )
        {
            input.erase(i,i+1);
        }
        input[i]+=dot;
    }
    cout<<input<<endl;
    return 0;
}

From the cplusplus.com reference ( http://www.cplusplus.com/reference/string/string/insert/ ):
// inserting into a string
#include <iostream>
#include <string>
using namespace std;
int main ()
{
    string str="to be question";
    string str2="the ";
    string str3="or not to be";
    string::iterator it;
    // used in the same order as described above:
    str.insert(6,str2);                 // to be (the )question
    str.insert(6,str3,3,4);             // to be (not )the question
    str.insert(10,"that is cool",8);    // to be not (that is )the question
    str.insert(10,"to be ");            // to be not (to be )that is the question
    str.insert(15,1,':');               // to be not to be(:) that is the question
    it = str.insert(str.begin()+5,','); // to be(,) not to be: that is the question
    // remember the offset: the next insert may reallocate and invalidate 'it'
    string::difference_type pos = it - str.begin();
    str.insert (str.end(),3,'.');       // to be, not to be: that is the question(...)
    str.insert (str.begin()+pos+2,str3.begin(),str3.begin()+3); // (or )
    cout << str << endl;
    return 0;
}
Also, check these links:
http://www.cplusplus.com/reference/string/string/
http://www.cplusplus.com/reference/string/string/append/
http://www.cplusplus.com/reference/string/string/push_back/

Before you try writing the code, you should write a detailed
specification of what it is supposed to do. With your code, I
can only guess: convert to lower case (naïvely, pretending that
you'll only encounter the 26 unaccented letters in ASCII), then
delete all vowels (again, very naïvely, since determining
whether something is a vowel or not is non-trivial, even in
English—consider the y in yet and day), and finally
inserting a dot after each character. The most obvious way of
doing that would be something like:
std::string results;
for ( std::string::const_iterator current = input.begin(),
            end = input.end();
        current != end;
        ++ current ) {
    static std::string const vowels( "aeiouAEIOU" );
    if ( std::find( vowels.begin(), vowels.end(), *current )
            == vowels.end() ) {
        // not a vowel: keep it, in lower case, followed by a dot
        results.push_back(
            tolower( static_cast<unsigned char>( *current ) ) );
        results.push_back( '.' );
    }
}
But again, I'm just guessing as to what you are trying to do.
Another alternative would be to use std::transform on the
initial string to make it all lower case. If you're doing this
sort of thing regularly, you'll have a ToLower functional
object; otherwise, it's probably too much of a bother to write
one just to be able to use std::transform once.

I’m assuming you want this input:
Hello world!
To give you this output:
h.ll. w.rld!
Rather than trying to modify the string in place, you can simply produce a new string as you go:
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char *argv[]) {
    string input;
    getline(cin, input);
    string output;
    const string vowels = "aeiouy";
    for (int i = 0; i < input.size(); ++i) {
        // cast to unsigned char: tolower() is undefined for negative values
        const char c = tolower(static_cast<unsigned char>(input[i]));
        if (vowels.find(c) != string::npos) {
            output += '.';
        } else {
            output += c;
        }
    }
    cout << output << '\n';
    return 0;
}
Notes:
<cctype> is for tolower() and toupper().
<string.h> is the C header for functions like strcpy() (<cstring> in C++); std::string needs <string>.
Read whole lines with getline(); istream::operator>>() reads words.
Use tolower(), toupper(), etc. for character transformations. c + 32 doesn't describe your intent.
When you need comparisons, c >= 'A' && c <= 'Z' will work in ASCII, where the letters are contiguous; you don't need to use numeric ASCII codes.
Use const for things that will not change.

I'm not sure how this old question got bumped back onto the current list, but after reviewing the answers, it looks like all will miss the mark if the input is more than a single word. From your comments, it appears you want to remove all vowels and place a '.' before the character immediately prior to where the removal occurred. Thus your example "tour" becomes ".t.r".
Drawing from the other answers, and shamelessly removing 'y' from the list of vowels, you can do something similar to:
#include <iostream>
#include <string>
int main()
{
    std::string input;
    if (!getline (std::cin, input)) {
        return 1;
    }
    for (std::size_t i = 0; i < input.size(); i++)
    {
        switch (input[i])
        {
            case 'A': /* case fall-through intentional */
            case 'E':
            case 'I':
            case 'O':
            case 'U':
            case 'a':
            case 'e':
            case 'i':
            case 'o':
            case 'u':
            {
                std::size_t pos = input.find_first_not_of("AEIOUaeiou", i+1);
                if (pos == std::string::npos) {
                    pos = input.length();
                }
                input.erase(i, pos-i);
                if (pos - i > 1) {
                    input.insert(i, 1, '.');
                }
                // a vowel at the very start has no preceding character to mark
                if (i > 0) {
                    input.insert(i-1, 1, '.');
                }
                break;
            }
        }
    }
    std::cout << input << '\n';
}
Example Use/Output
Your example:
$ ./bin/vowels-rm-mark
tour
.t.r
A longer example:
$ ./bin/vowels-rm-mark
My dog has fleas and my cat has none.
My .dg .hs f.l.s. nd my .ct .hs .n.n.

Based on your comments, it sounds like you want something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
int main(int argc, char *argv[])
{
    std::string input;
    std::cin >> input;
    // cast to unsigned char: passing a negative char to tolower is undefined
    std::transform (input.begin(), input.end(), input.begin(),
                    [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::size_t i = 0;
    while (i < input.length())
    {
        switch (input[i])
        {
            case 'a':
            case 'e':
            case 'i':
            case 'o':
            case 'y':
            case 'u':
            {
                std::size_t pos = input.find_first_not_of("aeioyu", i+1);
                if (pos == std::string::npos)
                    pos = input.length();
                input.erase(i, pos-i);
                break;
            }
            default:
            {
                input.insert(i, 1, '.'); // or: input.insert(i, ".");
                i += 2;
                break;
            }
        }
    }
    std::cout << input << std::endl;
    return 0;
}

string conversion c++

I have a string and the first element is, for example, 'a'. I already declared a variable called a (so int a=1, for example). My question now is, how can I convert the whole string to numbers (a=1,b=2,c=3,...z=26)? Example:
string str="hello"; this has to be changed to "85121215" and then changed to 85121215.

#include <algorithm>
#include <functional>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
// transformation itself doesn't care what encoding we use
std::string transform_string(std::string const &in, std::function<int(char)> op)
{
    std::ostringstream out;
    std::transform(in.begin(), in.end(),
                   std::ostream_iterator<int>(out),
                   op);
    return out.str();
}
// the per-character mapping is easy to isolate
int ascii_az_map(char ch)
{
    if (ch < 'a' || ch > 'z') {
        std::ostringstream error;
        error << "character '" << ch << "'=" << (int)ch
              << " not in range a-z";
        throw std::out_of_range(error.str());
    }
    return 1 + ch - 'a';
}
// so we can support other encodings if necessary
// NB. ebcdic_to_ascii isn't actually implemented here
int ebcdic_az_map(char ch)
{
    return ascii_az_map(ebcdic_to_ascii(ch));
}
// and even detect the platform encoding automatically (w/ thanks to Phresnel)
// (you can still explicitly select a non-native encoding if you want)
int default_az_map(char ch)
{
#if ('b'-'a' == 1) && ('j' - 'i' == 1)
    return ascii_az_map(ch);
#elif ('j'-'i' == 8)
    return ebcdic_az_map(ch);
#else
#error "unknown character encoding"
#endif
}
// use as:
std::string str = "hello";
std::string trans = transform_string(str, ascii_az_map);
// OR ... transform_string(str, ebcdic_az_map);
Note that since the per-character mapping is completely isolated, it's really easy to change the mapping to a lookup table, support different encodings etc.

Your definition is a bit sparse:
"hello" = "85121215"
h = 8
e = 5
l = 12
o = 15
I assume you mean that
a = 1
b = 2
...
z = 26
in which case it is not that hard:
#include <stdexcept>
#include <string>
std::string meh_conv(char c) {
    switch(c) { // (or `switch(tolower(c))` and save some typing)
        case 'a': case 'A': return "1";
        case 'b': case 'B': return "2";
        ....
        case 'z': case 'Z': return "26";
        ....
        // insert other special characters here
    }
    throw std::range_error("meh");
}
std::string meh_conv(std::string const &src) {
    std::string dest;
    for (const auto c : src)
        dest += meh_conv(c);
    return dest;
}
or use std::transform():
#include <algorithm>
std::string dest;
std::transform (src.begin(), src.end(), back_inserter(dest),
                meh_conv);
(this doesn't work as is: meh_conv is overloaded, and each call returns a std::string, while the back_inserter into dest expects single chars)
Addendum.
You possibly want to parametrize the replacement map:
std::map<char, std::string> repl;
repl['a'] = repl['A'] = "0";
repl[' '] = " ";
std::string src = "hello";
std::string dest;
for (const auto c : src) dest += repl[c];

I wrote you a simple example. It creates a map that contains the a-1, b-2, c-3 ... pairs, then concatenates the values using a stringstream:
#include <iostream>
#include <map>
#include <sstream>
#include <string>
int main()
{
    std::string str = "abc";
    std::map<char,int> dictionary;
    int n = 1;
    for(char c='a'; c<='z'; c++)
        dictionary.insert(std::pair<char,int>(c,n++));
    //EDIT if you want uppercase characters too:
    n=1;
    for(char c='A'; c<='Z'; c++)
        dictionary.insert(std::pair<char,int>(c,n++));
    std::stringstream strstream;
    for(int i=0; i<str.size(); i++)
        strstream<<dictionary[str[i]];
    std::string numbers = strstream.str();
    std::cout<<numbers;
    return 0;
}
C++ experts are probably going to kill me for this solution, but it works ;)

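Easy approach:
you can take each character modulo 96 (96 is the ASCII code of '`', the character just before 'a'); for the lowercase letters a-z this gives values in the range 1-26. It does not work for upper case letters or other characters ('A' % 96 is 65, for example).
#include <iostream>
#include <string>
using namespace std;
int main()
{
    int value;
    string s;
    cin>>s;
    for(int i=0; i<s.size();i++){
        value = s[i]%96;
        cout<<value<<endl;
    }
}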
