Convert string with explicit escape sequence into relative character - c++

I need a function to convert "explicit" escape sequences into the corresponding non-printable characters.
E.g.:
char str[] = "\\n";
cout << "Line1" << convert_esc(str) << "Line2" << endl;
would give this output:
Line1
Line2
Is there any function that does this?

I think you must write such a function yourself, since escape sequences are a compile-time feature: when you write "\n", the compiler replaces the \n sequence with the newline character. The resulting string has length 1 (excluding the terminating zero character).
In your case the string "\\n" has length 2 (again excluding the terminating zero) and contains \ and n.
You need to scan your string and, when you encounter \, check the following char. If it is one of the legal escapes, replace both of them with the corresponding character; otherwise skip them or leave them both as is.
( http://ideone.com/BvcDE ):
string unescape(const string& s)
{
string res;
string::const_iterator it = s.begin();
while (it != s.end())
{
char c = *it++;
if (c == '\\' && it != s.end())
{
switch (*it++) {
case '\\': c = '\\'; break;
case 'n': c = '\n'; break;
case 't': c = '\t'; break;
// all other escapes
default:
// invalid escape sequence - skip it. alternatively you can copy it as is, throw an exception...
continue;
}
}
res += c;
}
return res;
}

You can do that fairly easily using the Boost string algorithm library. For example:
#include <string>
#include <iostream>
#include <boost/algorithm/string.hpp>
void escape(std::string& str)
{
boost::replace_all(str, "\\\\", "\\");
boost::replace_all(str, "\\t", "\t");
boost::replace_all(str, "\\n", "\n");
// ... add others here ...
}
int main()
{
std::string str = "This\\tis\\n \\\\a test\\n123";
std::cout << str << std::endl << std::endl;
escape(str);
std::cout << str << std::endl;
return 0;
}
This is surely not the most efficient way to do this (because it iterates the string multiple times), but it is compact and easy to understand.
Update:
As ybungalobill has pointed out, this implementation gives wrong results whenever a replacement produces a character sequence that a later replacement searches for, or whenever a replacement removes or modifies a character sequence that should have been replaced.
An example of the first case is "\\\\n" -> "\\n" -> "\n". If you put the "\\\\" -> "\\" replacement last (which looks like the solution at first glance), you get an example of the second case: "\\\\n" -> "\\\n". There is no simple fix for this, so the technique is only feasible for very simple sets of escape sequences.
If you need a generic (and more efficient) solution, you should implement a state machine that iterates the string, as proposed by davka.
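A minimal sketch of such a single-pass state machine, to show why it does not suffer from the ordering problem described above: every input character is consumed exactly once, so the output of one escape can never be re-read as the start of another. The name unescape_once and the policy of copying unknown escapes through unchanged are assumptions made for illustration, not part of any answer here.

```cpp
#include <string>

// Hypothetical helper: decodes \\, \n and \t in one left-to-right pass.
// Unknown escapes are copied through unchanged (one possible policy).
std::string unescape_once(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    bool escaped = false;                    // true right after an unconsumed backslash
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (!escaped) {
            if (c == '\\') escaped = true;   // start of an escape sequence
            else           out += c;         // ordinary character
            continue;
        }
        escaped = false;
        switch (c) {
            case '\\': out += '\\'; break;
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            default:   out += '\\'; out += c; break;  // keep unknown escapes as written
        }
    }
    if (escaped) out += '\\';                // lone trailing backslash: keep it
    return out;
}
```

With this approach the problematic input from the update, "\\\\n" (backslash, backslash, n), becomes a backslash followed by the letter n, regardless of the order of the cases in the switch.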

I'm sure that there is, written by someone, but it's so trivial that I doubt it's been specifically published anywhere.
Just recreate it yourself from the various "find"/"replace"-esque algorithms in the standard library.

Have you considered using printf? (or one of its relatives)

Here's a cute way to do it on Unixy platforms.
It calls the operating system's echo command to make the conversion.
string convert_escapes( string input )
{
string buffer(input.size()+1,0);
string cmd = "/usr/bin/env echo -ne \""+input+"\"";
FILE * f = popen(cmd.c_str(),"r"); assert(f);
buffer.resize(fread(&buffer[0],1,buffer.size()-1,f));
pclose(f); // a stream opened with popen must be closed with pclose, not fclose
return buffer;
}

Related

wordexp and strings with spaces

I am trying to expand variables in a string that contains a unix file path. For example the string is:
std::string path = "$HOME/Folder With Two Spaces Next To Each Other".
This is my code for wordexp I use:
#include <wordexp.h>
#include <string>
#include <iostream>
std::string env_subst(const std::string &path)
{
std::string result = "";
wordexp_t p;
if (!::wordexp(path.c_str(), &p, 0))
{
if (p.we_wordc >= 1)
{
result = std::string(p.we_wordv[0]);
for (uint32_t i = 1; i < p.we_wordc; ++i)
{
result += " " + std::string(p.we_wordv[i]);
}
}
::wordfree(&p);
return result;
}
else
{
// Illegal chars found
return path;
}
}
int main()
{
std::string teststring = "$HOME/Folder With Two Spaces Next To Each Other";
std::string result = env_subst(teststring);
std::cout << "Result: " << result << std::endl;
return 0;
}
The output is:
Result: /home/nidhoegger/Folder With Two Spaces Next To Each Other
You see, that whereas there were two spaces between the words in the input, there is now only a single space.
Is there an easy way to fix that?
The reason your code removes the double spaces in your path is because your for loop adds only a single space after every word, regardless of the actual number of spaces. A possible solution to this problem would be to locate all the spaces in your path string beforehand, and then add them in. For example, you could use something like this:
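std::vector<std::string> spaces; // needs #include <vector>; one entry per run of spaces
std::string::size_type pos = path.find(' ');
while (pos != std::string::npos) {
    // Find where this run of spaces ends.
    std::string::size_type end = path.find_first_not_of(' ', pos);
    if (end == std::string::npos)
        end = path.size();
    spaces.push_back(std::string(end - pos, ' '));
    pos = path.find(' ', end);
}
to iterate through your path using std::string::find and store each run of spaces in a vector of strings. Then, you could modify the line in your for loop to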
result += spaces[i-1] + std::string(p.we_wordv[i]);
to add in the appropriate number of spaces.
If you want to preserve spaces in an unusually named file, enclose it in quotes: std::string teststring = "\"~/filename with spaces\"";. But it doesn't make sense to record how many spaces there were in the original string, since you'd have to skip pairs of " and basically redo what wordexp() does. Leaving multiple spaces in a command doesn't make much sense either: ls    -al is exactly the same as ls -al, so the trimming is justified. And the OP's code is perfectly valid; there is no need to add anything else.
P.S. Decided to add this as a note because I fell in the same pit as OP.
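The quoting suggestion can be sketched as a small wrapper. This is only an illustration under stated assumptions: the name expand_keep_spaces is hypothetical, and the input is assumed to contain no double quotes, backslashes or backquotes of its own. Variables such as $HOME are still expanded inside double quotes, but a leading ~ is not.

```cpp
#include <wordexp.h>
#include <string>

// Hypothetical helper: expands variables in `path` while treating the whole
// string as a single shell word, so runs of spaces survive.
// Assumption: `path` has no double quotes, backslashes or backquotes in it.
std::string expand_keep_spaces(const std::string& path)
{
    const std::string quoted = "\"" + path + "\"";
    wordexp_t p;
    if (::wordexp(quoted.c_str(), &p, 0) != 0)
        return path;                       // expansion failed: return the input unchanged
    std::string result;
    if (p.we_wordc >= 1)
        result = p.we_wordv[0];            // the quoted input expands to exactly one word
    ::wordfree(&p);
    return result;
}
```

Calling expand_keep_spaces on a string with two consecutive spaces should return the expanded path with both spaces intact, because no field splitting happens inside the quotes.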

Sub-strings and delimiters

I'm trying to get a sentence delimited by certain characters (either a space, comma, or a dot) to check if it's a palindrome. If the input is "hello,potato.", I'll check this symmetry on "hello" alone and then potato alone.
The problem is, while I'm doing the first iteration of the loop that searches for the delimiter, the word "hello" is stored in the sub-sentence, but on the second iteration the word that should be stored as "potato" will be "potato.". And I am unable to remove the "." delimiter from the end of the input string.
for(int i=0;i<sentence.length();i++)
{
if(sentence[i]==' '||sentence[i]=='.'||sentence[i]==',')
{ //the couts is just to help me debug/trace
cout<<"i is now : "<<i<<endl;
if(i==delindex && i==sentence.length()-1)
{
subsentence=sentence.substr(temp+1,subsentence.length()-1);
}
else
{
subsentence=sentence.substr(delindex,i);
cout<<subsentence<<endl;
temp=delindex-1;
delindex=i+1;
}
}
}
What would be the best way to go about this?
God bless you, man, that strtok is what I have been looking for.
Actually, you don't need strtok (and should probably avoid it for various safety reasons), as std::string has a wonderful method called find_first_of which acts pretty much like strtok: it accepts a set of chars and returns the index of the first position where any of them occurs. However, to make a robust tokenizer, a combination of find_first_of and find_first_not_of is more suitable in this case.
Therefore you could simplify your token searching to:
#include <iostream>
#include <string>
int main()
{
std::string sentence = "hello,potato tomato.";
std::string delims = " .,";
size_t beg, pos = 0;
while ((beg = sentence.find_first_not_of(delims, pos)) != std::string::npos)
{
pos = sentence.find_first_of(delims, beg + 1);
std::cout << sentence.substr(beg, pos - beg) << std::endl;
}
}
https://ideone.com/rhMyvG
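To connect the tokenizer back to the original goal, here is a sketch that collects the tokens and checks each one for symmetry. The helper names split_words and is_palindrome are made up for this example; the splitting loop is the same find_first_not_of / find_first_of combination as above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character in `delims`,
// skipping empty tokens.
std::vector<std::string> split_words(const std::string& text, const std::string& delims)
{
    std::vector<std::string> words;
    std::string::size_type beg, pos = 0;
    while ((beg = text.find_first_not_of(delims, pos)) != std::string::npos) {
        pos = text.find_first_of(delims, beg + 1);
        words.push_back(text.substr(beg, pos - beg));  // pos may be npos: substr clamps the count
    }
    return words;
}

// Hypothetical helper: true if `word` reads the same in both directions.
bool is_palindrome(const std::string& word)
{
    // Compare the first half with the reversed second half.
    return std::equal(word.begin(), word.begin() + word.size() / 2, word.rbegin());
}
```

For the input "hello,potato level." with the delimiters " .,", split_words yields hello, potato and level, and only level passes is_palindrome. The trailing dot never ends up in a token because delimiters are skipped at both ends.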

How do I print "\n" explicitly in a string in c++

Hi, I have an unknown string in C++ containing "\n", "\t", etc.
Say:
string unknown1=read_from_file();
if unknown1="\n\t\n\t\n\n\n\n\n" now I want to print
"\n\t\n\t\n\n\n\n\n"
to the screen explicitly, rather than a bunch of blank space. How should I do that? Remember that I don't know what is in unknown1.
To emphasize: I know that we could print \n explicitly if we changed "\n" into "\\n" for every such character. But the problem is that I don't know what is inside unknown1; it is read from a file.
Thanks for answering, however we have further concerns:
The procedure in the answers has one more problem. Suppose I don't know that the only possible special characters in the file are \n and \t; maybe there is something like \u or \l? I think we can't exhaust every possibility, right? Is there a built-in C++ function that just outputs the corresponding characters?
\n , \t are escape sequences, but you can print them by adding an extra \ before them, \\ is used to obtain a single backslash. A single backslash means it is an escape sequence ( if it is a valid one ), but two backslashes represent the backslash character, so whenever you need to output a backslash, just add two of them.
So, if you use
string unknown1="\\n\\t\\n\\t\\n\\n\\n\\n\\n";
You will get your desired output.
If you are reading from a file , then do something like this
string unknown1="\n\t\n\t\n\n\n\n\n";
for ( int i = 0 ; i < unknown1.length() ; i++ )
{
if( unknown1[i] == '\n')
cout<<"\\n";
}
Like that, you will have to check for each escape sequence that may be used.
Run specific checks for the non-printable characters you are worried about like this.
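int c; // int rather than char, so that EOF can be told apart from real characters
while ((c = getchar()) != EOF) { // getchar() stands in for however you read the next character
    if (c == '\n') printf("\\n");
    else if (c == '\t') printf("\\t");
    // ... and so on for the other characters you care about
    else putchar(c);
}
Oops, I wrote C instead of C++, but @Arun A.S has the right syntax.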
See below example. You can add your own characters to the switch to extend the characters it handles.
#include <iostream>
#include <string>
std::string escapeSpecialChars(const std::string& str)
{
std::string result;
for(auto c : str)
{
switch(c)
{
case '\n':
result += "\\n";
break;
case '\t':
result += "\\t";
break;
default:
result += c;
break;
}
}
return result;
}
int main()
{
std::string str = "\n\n\n\t";
std::cout << escapeSpecialChars(str);
return 0;
}
You could create your own function to print characters using a std::map:
#include <iostream>
#include <map>
#include <string>
void printChar( const char ch )
{
static const std::map< char, std::string > char_map = {
{ '\n', "\\n" }
// add mappings as needed
};
auto found = char_map.find( ch );
if ( found != char_map.end() )
std::cout << found->second;
else
std::cout << ch;
}
// usage
std::string str = "a\nbc";
for ( auto ch : str ) printChar ( ch );
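On the follow-up concern that the set of special characters is not known in advance: as far as I know there is no standard function that does this for you, but a fallback branch can cover everything that is not listed explicitly. The sketch below is one way to do it, under assumptions: the name make_visible is hypothetical, the \xHH notation for unlisted bytes is just one possible choice, and std::isprint is evaluated in the default "C" locale, so bytes above 0x7F are also shown as hex.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: returns `s` with every non-printable byte made visible.
std::string make_visible(const std::string& s)
{
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        switch (c) {
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            case '\r': out += "\\r";  break;
            case '\\': out += "\\\\"; break;   // keeps the output unambiguous
            default:
                if (std::isprint(c)) {
                    out += static_cast<char>(c);
                } else {
                    char buf[5];               // "\xHH" plus the terminating zero
                    std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
                    out += buf;
                }
        }
    }
    return out;
}
```

Usage would be std::cout << make_visible(unknown1); the named cases only make the most common characters prettier, while the default branch guarantees that nothing is printed raw.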

Convert first letter in string to uppercase

I have a string: "apple". How can I convert only the first character to uppercase and get a new string in the form of "Apple"?
I can also have a string with multibyte characters.
What if the first character of the string is a multibyte character ?
string str = "something";
str[0] = toupper(str[0]);
That's all you need to do. It also works for C strings.
Like what Carneigie said,
string str = "something";
str[0] = toupper(str[0]);
but also remember to:
#include <string>
#include <cctype>
at the very top of the file.
I cannot use str[0] because, I can have string which has multibyte characters
I don't know of any CRT implementation that supports non-ASCII character classification and conversion. If you want to support Unicode then everything is much more complicated since "converting the first character to uppercase" may be meaningless in other languages. You have to use a Unicode library written by experts for this.
To illustrate how complicated it is, consider the following case in English. Converting the three code-point sequence 'ﬁle' (with the f-i ligature) must break the first code point into two separate letters, resulting in 'File'. Please note that the standard C/C++ interfaces for doing case classification and conversion don't take such cases into account, so it's even impossible to implement them to support Unicode correctly.
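Short of pulling in a Unicode library, one conservative option is to upper-case the first character only when it is plain ASCII and to leave anything else untouched, so that a multibyte sequence is never corrupted. This is a sketch under the assumption that the string is UTF-8 encoded; the name capitalize_ascii_first is made up for the example, and it deliberately does nothing for non-ASCII first letters.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases the first character only if it is a
// single-byte (ASCII) character. Assumption: `s` is UTF-8 encoded.
std::string capitalize_ascii_first(std::string s)
{
    if (!s.empty()) {
        const unsigned char first = static_cast<unsigned char>(s[0]);
        if (first < 0x80)                                  // ASCII: one byte is one character
            s[0] = static_cast<char>(std::toupper(first));
        // otherwise: lead byte of a multibyte sequence, left untouched
    }
    return s;
}
```

This turns "apple" into "Apple" and returns a string such as "été" unchanged, which is incomplete but at least never produces invalid UTF-8.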
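#include <cctype>
#include <iostream>
#include <string>
using namespace std;
// Upper-cases the first letter of every whitespace-separated word.
void capitalize (string &s)
{
    bool cap = true;
    for(string::size_type i = 0; i < s.length(); i++)
    {
        unsigned char ch = static_cast<unsigned char>(s[i]); // <cctype> functions need a non-negative value
        if (isalpha(ch) && cap)
        {
            s[i] = static_cast<char>(toupper(ch));
            cap = false;
        }
        else if (isspace(ch))
        {
            cap = true;
        }
    }
}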
(Only works with 'ASCII' characters.)
std::wstring s = L"apple";
if(islower(s.at(0) <= 'z' ? s.at(0) : 'A'))
s[0] += 'A' - 'a';
Or if you are feeling fancy and feel like torturing any future readers of your code:
std::wstringstream wss;
wss << std::uppercase << s[0]
<< std::nouppercase << s.substr(1);
wss >> s;

std::string manipulation: whitespace, "newline escapes '\'" and comments #

Kind of looking for affirmation here. I have some hand-written code, which I'm not shy to say I'm proud of, which reads a file, removes leading whitespace, processes newline escapes '\' and removes comments starting with #. It also removes all empty lines (also whitespace-only ones). Any thoughts/recommendations? I could probably replace some std::cout's with std::runtime_errors... but that's not a priority here :)
const int RecipeReader::readRecipe()
{
ifstream is_recipe(s_buffer.c_str());
if (!is_recipe)
cout << "unable to open file" << endl;
while (getline(is_recipe, s_buffer))
{
// whitespace+comment
removeLeadingWhitespace(s_buffer);
processComment(s_buffer);
// newline escapes + append all subsequent lines with '\'
processNewlineEscapes(s_buffer, is_recipe);
// store the real text line
if (!s_buffer.empty())
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}
is_recipe.close();
return 0;
}
void RecipeReader::processNewlineEscapes(string &s_string, ifstream &is_stream)
{
string s_temp;
size_t sz_index = s_string.find_first_of("\\");
while (sz_index <= s_string.length())
{
if (getline(is_stream,s_temp))
{
removeLeadingWhitespace(s_temp);
processComment(s_temp);
s_string = s_string.substr(0,sz_index-1) + " " + s_temp;
}
else
cout << "Error: newline escape '\' found at EOF" << endl;
sz_index = s_string.find_first_of("\\");
}
}
void RecipeReader::processComment(string &s_string)
{
size_t sz_index = s_string.find_first_of("#");
s_string = s_string.substr(0,sz_index);
}
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
const size_t sz_length = s_string.size();
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index <= sz_length)
s_string = s_string.substr(sz_index);
else if ((sz_index > sz_length) && (sz_length != 0)) // "empty" lines with only whitespace
s_string.clear();
}
Some extra info: the first s_buffer passed to the ifstream contains the filename, std::string s_buffer is a class data member, and so is std::vector<std::string> v_s_recipe. Any comment is welcome :)
UPDATE: for the sake of not being ungrateful, here is my replacement, all-in-one function that does what I want for now (future holds: parenthesis, maybe quotes...):
void readRecipe(const std::string &filename)
{
string buffer;
string line;
size_t index;
ifstream file(filename.c_str());
if (!file)
throw runtime_error("Unable to open file.");
while (getline(file, line))
{
// whitespace removal
line.erase(0, line.find_first_not_of(" \t\r\n\v\f"));
// comment removal TODO: store these for later output
index = line.find_first_of("#");
if (index != string::npos)
line.erase(index, string::npos);
// ignore empty buffer
if (line.empty())
continue;
// process newline escapes
index = line.find_first_of("\\");
if (index != string::npos)
{
line.erase(index,string::npos); // ignore everything after '\'
buffer += line;
continue; // read next line
}
else // no newline escapes found
{
buffer += line;
recipe.push_back(buffer);
buffer.clear();
}
}
}
Definitely ditch the hungarian notation.
It's not bad, but I think you're thinking of std::basic_string<T> too much as a string and not enough as an STL container. For example:
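// needs #include <algorithm> and #include <cctype>; the lambda requires C++11
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
    s_string.erase(s_string.begin(),
        std::find_if(s_string.begin(), s_string.end(),
            [](unsigned char c) { return !std::isspace(c); }));
}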
A few comments:
As another answer (+1 from me) said - ditch the hungarian notation. It really doesn't do anything but add unimportant trash to every line. In addition, ifstream yielding an is_ prefix is ugly. is_ usually indicates a boolean.
Naming a function with processXXX gives very very little information on what it is actually doing. Use removeXXX, like you did with the RemoveLeadingWhitespace function.
The processComment function does an unnecessary copy and assignment. Use s.erase(index, string::npos); (npos is default, but this is more obvious).
It's not clear what your program does, but you might consider storing a different file format (like html or xml) if you need to post-process your files like this. That would depend on the goal.
Using find_first_of('#') may give you some false positives. If the # is inside quotes, it does not necessarily indicate a comment. (But again, this depends on your file format.)
Using find_first_of(c) with one character can be simplified to find(c).
The processNewlineEscapes function duplicates some functionality from the readRecipe function. You may consider refactoring to something like this:
string s_buffer;
string s_line;
while (getline(is_recipe, s_line)) {
// Sanitize the raw line.
removeLeadingWhitespace(s_line);
removeComments(s_line);
// Skip empty lines.
if (s_line.empty()) continue;
// Collect buffer across all escaped lines, dropping the trailing backslash.
if (*s_line.rbegin() == '\\') {
s_buffer.append(s_line, 0, s_line.size() - 1);
continue;
}
// This line is not escaped: add it to the buffer, which can now be processed.
s_buffer += s_line;
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}
I'm not big on methods that modify the parameters. Why not return strings rather than modifying the input arguments? For example:
string RecipeReader::processComment(const string &s)
{
size_t index = s.find_first_of("#");
return s.substr(0, index);
}
I personally feel this clarifies intent and makes it more obvious what the method is doing.
I'd consider replacing all your processing code (almost everything you've written) with boost::regex code.
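As a rough idea of what the regex route could look like, here is a sketch that uses std::regex from the standard library (C++11) instead of boost::regex, purely so that the example has no external dependency. The function name clean_line is hypothetical, and the sketch assumes that # never appears inside quoted text.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: strips leading whitespace and a trailing '#' comment.
// Assumption: '#' never appears inside quoted text in the input.
std::string clean_line(const std::string& line)
{
    static const std::regex leading_ws("^[ \\t]+");   // spaces and tabs at the start
    static const std::regex comment("#.*$");          // from the first '#' to the end
    return std::regex_replace(std::regex_replace(line, leading_ws, ""), comment, "");
}
```

Lines for which clean_line returns an empty string can then be skipped by the caller; handling of newline escapes would still have to be layered on top.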
A few comments:
If s_buffer contains the file name to be opened, it should have a better name like s_filename.
The s_buffer member variable should not be reused to store temporary data from reading the file. A local variable in the function would do as well, no need for the buffer to be a member variable.
If there is no need to have the filename stored as a member variable, it could just be passed as a parameter to readRecipe().
processNewlineEscapes() should check that the found backslash is at the end of the line before appending the next line. At the moment any backslash at any position triggers adding of the next line at the position of the backslash. Also, if there are several backslashes, find_last_of() would probably be easier to use than find_first_of().
When checking the result of find_first_of() in processNewlineEscapes() and removeLeadingWhitespace() it would be cleaner to compare against string::npos to check if anything was found.
The logic at the end of removeLeadingWhitespace() could be simplified:
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index != s_string.npos)
s_string = s_string.substr(sz_index);
else // "empty" lines with only whitespace
s_string.clear();
You might wish to have a look at the Boost String Algorithms Library. It's a simple collection of algorithms for working with strings, and notably features trim methods :)
Now, on to the review itself:
Don't bother to remove the hungarian notation, if it's your style then use it, however you should try and improve the names of methods and variables. processXXX is definitely not indicating anything useful...
Functionally, I am worried about your assumptions: the main issue here is that you do not account for escape sequences (\n uses a backslash, for example) and you do not account for the presence of strings of characters: std::cout << "Process #" << pid << std::endl; would yield an invalid line because of your "comment" preprocessing.
Furthermore, since you remove the comments before processing the newline escapes:
i = 3; # comment \
running comment
will be parsed as
i = 3; running comment
which is syntactically incorrect.
From an interface point of view: there is no benefit in having the methods be class members here; you don't really need an instance of RecipeReader...
And finally, I find it awkward that two methods would read from the stream.
Little peeve of mine: returning by const value does not serve any purpose.
Here is my own version, as I believe that showing is easier than discussing:
// header file
#include <fstream>
#include <string>
#include <utility>
#include <vector>
std::vector<std::string> readRecipe(const std::string& fileName);
std::string extractLine(std::ifstream& file);
std::pair<std::string,bool> removeNewlineEscape(const std::string& line);
std::string removeComment(const std::string& line);
// source file
#include <iostream>
#include <boost/algorithm/string.hpp>
std::vector<std::string> readRecipe(const std::string& fileName)
{
std::vector<std::string> result;
std::ifstream file(fileName.c_str());
if (!file) std::cout << "Could not open: " << fileName << std::endl;
std::string line = extractLine(file);
while(!line.empty())
{
result.push_back(line);
line = extractLine(file);
} // looping on the lines
return result;
} // readRecipe
std::string extractLine(std::ifstream& file)
{
std::string line, buffer;
while(getline(file, buffer))
{
std::pair<std::string,bool> r = removeNewlineEscape(buffer);
line += boost::trim_left_copy(r.first); // remove leading whitespace
// based on the current locale
if (!r.second) break;
line += " "; // as we append, we insert a whitespace
// in order to avoid unintended token concatenation
}
return removeComment(line);
} // extractLine
//< Returns the line, minus the '\' character
//< if it was the last significant one
//< Returns a boolean indicating whether or not the line continue
//< (true if it's necessary to concatenate with the next line)
std::pair<std::string,bool> removeNewlineEscape(const std::string& line)
{
std::pair<std::string,bool> result(line, false); // by default, return the line unchanged
size_t pos = line.find_last_not_of(" \t");
if (std::string::npos != pos && line[pos] == '\\')
{
result.second = true;
// keep everything before the '\' character, which we don't want in the string
result.first = line.substr(0, pos);
}
return result;
} // removeNewlineEscape
//< The main difficulty here is NOT to confuse a # inside a string
//< with a # signalling a comment
//< assuming strings are contained within "", let's roll
std::string removeComment(const std::string& line)
{
size_t pos = line.find_first_of("\"#");
while(std::string::npos != pos)
{
if (line[pos] == '"')
{
// We have detected the beginning of a string, we move pos to its end
// beware of the tricky presence of a '\' right before '"'...
pos = line.find_first_of("\"", pos+1);
while (std::string::npos != pos && line[pos-1] == '\\')
pos = line.find_first_of("\"", pos+1);
if (std::string::npos == pos) break; // unterminated string: nothing left to scan
}
else // line[pos] == '#'
{
// We have found the comment marker in a significant position
break;
}
pos = line.find_first_of("\"#", pos+1);
} // looking for comment marker
return line.substr(0, pos);
} // removeComment
It is fairly inefficient (but I trust the compiler for optimizations), but I believe it behaves correctly, though it's untested, so take it with a grain of salt. I have focused mainly on solving the functional issues; the naming convention I follow is different from yours, but I don't think it should matter.
I want to point out a small and sweet version which lacks \ support but skips whitespace-only lines and comments. (Note the std::ws in the call to std::getline.)
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::stringstream input(
" # blub\n"
"# foo bar\n"
" foo# foo bar\n"
"bar\n"
);
std::string line;
while (std::getline(input >> std::ws, line)) {
line.erase(std::find(line.begin(), line.end(), '#'), line.end());
if (line.empty()) {
continue;
}
std::cout << "line: \"" << line << "\"\n";
}
}
Output:
line: "foo"
line: "bar"
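If \ support is needed, the same loop can be extended with a pending buffer. The following is only a sketch: the function name read_logical_lines is hypothetical, and it assumes that a continuation is signalled by a backslash as the very last character of the line once the comment has been removed. A backslash followed by a space or a comment is therefore not treated as a continuation.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads logical lines from `in`, skipping leading
// whitespace, blank lines and '#' comments. A line whose last character
// (after comment removal) is '\' is joined with the next one.
std::vector<std::string> read_logical_lines(std::istream& in)
{
    std::vector<std::string> result;
    std::string line, pending;
    while (std::getline(in >> std::ws, line)) {
        line.erase(std::find(line.begin(), line.end(), '#'), line.end());
        if (line.empty())
            continue;
        if (line[line.size() - 1] == '\\') {
            pending.append(line, 0, line.size() - 1);  // drop the '\' and keep collecting
            continue;
        }
        result.push_back(pending + line);
        pending.clear();
    }
    if (!pending.empty())
        result.push_back(pending);   // input ended in the middle of a continuation
    return result;
}
```

Feeding it a std::istringstream that contains the line foo followed by a space and a backslash, and then an indented line bar, produces the single logical line foo bar, since the leading whitespace of the continuation line is skipped by std::ws.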