how to define my own special character in cout

how to define my own special character in cout - c++

for example :
cout << " hello\n400";
will print:
hello
400
another example:
cout << " hello\r400";
will print:
400ello
there is a option to define my own special character?
i would like to make somthing like:
cout << " hello\d400";
would give:
hello
400
(/d is my special character, and i already got the function to get the stdout cursor one line down(cursorDown()),but i just don't how to define a special character that each time will be writted will call to my cursorDown() function)

As said by others there is no way you can make cout understand user defined characters , however what you could do is
std::cout is an object of type std::ostream which overloads operator<<. You could create an object of the struct which parses your string for your special characters and other user defined characters before printing it to a file or console using ostream similar to any log stream.
Example
or
Instead of calling cout << "something\dsomething"
you can call a method special_cout(std::string); which parses the string for user defined characters and executes the calls.

There is no way to define "new" special characters.
But you can make the stream interpret specific characters to have new meanings (that you can define). You can do this using locals.
Some things to note:
The characters in the string "xyza" is just a simple way of encoding a string. Escaped characters are C++ way of allowing you to represent representing characters that are not visible but are well defined. Have a look at an ASCII table and you will see that all characters in the range 00 -> 31 (decimal) have special meanings (often referred to as control characters).
See Here: http://www.asciitable.com/
You can place any character into a string by using the escape sequence to specify its exact value; i.e. \x0A used in a string puts the "New Line" character in the string.
The more commonly used "control characters" have shorthand versions (defined by the C++ language). '\n' => '\x0A' but you can not add new special shorthand characters as this is just a convenience supply by the language (its like a tradition that most languages support).
But given a character can you give it a special meaning in an IO stream. YES. You need to define a facet for a locale then apply that locale to the stream.
Note: Now there is a problem with applying locals to std::cin/std::out. If the stream has already been used (in any way) applying a local may fail and the OS may do stuff with the stream before you reach main() and thus applying a locale to std::cin/std::cout may fail (but you can do it to file and string streams easily).
So how do we do it.
Lets use "Vertical Tab" as the character we want to change the meaning of. I pick this as there is a shortcut for it \v (so its shorter to type than \x0B) and usually has no meaning for terminals.
Lets define the meaning as new line and indent 3 spaces.
#include <locale>
#include <algorithm>
#include <iostream>
#include <fstream>
class IndentFacet: public std::codecvt<char,char,std::mbstate_t>
{
public:
explicit IndentFacet(size_t ref = 0): std::codecvt<char,char,std::mbstate_t>(ref) {}
typedef std::codecvt_base::result result;
typedef std::codecvt<char,char,std::mbstate_t> parent;
typedef parent::intern_type intern_type;
typedef parent::extern_type extern_type;
typedef parent::state_type state_type;
protected:
virtual result do_out(state_type& tabNeeded,
const intern_type* rStart, const intern_type* rEnd, const intern_type*& rNewStart,
extern_type* wStart, extern_type* wEnd, extern_type*& wNewStart) const
{
result res = std::codecvt_base::ok;
for(;(rStart < rEnd) && (wStart < wEnd);++rStart,++wStart)
{
if (*rStart == '\v')
{
if (wEnd - wStart < 4)
{
// We do not have enough space to convert the '\v`
// So stop converting and a subsequent call should do it.
res = std::codecvt_base::partial;
break;
}
// if we find the special character add a new line and three spaces
wStart[0] = '\n';
wStart[1] = ' ';
wStart[2] = ' ';
wStart[3] = ' ';
// Note we do +1 in the for() loop
wStart += 3;
}
else
{
// Otherwise just copy the character.
*wStart = *rStart;
}
}
// Update the read and write points.
rNewStart = rStart;
wNewStart = wStart;
// return the appropriate result.
return res;
}
// Override so the do_out() virtual function is called.
virtual bool do_always_noconv() const throw()
{
return false; // Sometime we add extra tabs
}
};
Some code that uses the locale.
int main()
{
std::ios::sync_with_stdio(false);
/* Imbue std::cout before it is used */
std::cout.imbue(std::locale(std::locale::classic(), new IndentFacet()));
// Notice the use of '\v' after the first lien
std::cout << "Line 1\vLine 2\nLine 3\n";
/* You must imbue a file stream before it is opened. */
std::ofstream data;
data.imbue(std::locale(std::locale::classic(), new IndentFacet()));
data.open("PLOP");
// Notice the use of '\v' after the first lien
data << "Loki\vUses Locale\nTo do something silly\n";
}
The output:
> ./a.out
Line 1
Line 2
Line 3
> cat PLOP
Loki
Uses Locale
To do something silly
BUT
Now writing all this is not really worth it. If you want a fixed indent like that us a named variable that has those specific characters in it. It makes your code slightly more verbose but does the trick.
#include <string>
#include <iostream>
std::string const newLineWithIndent = "\n ";
int main()
{
std::cout << " hello" << newLineWithIndent << "400";
}

Related

Garbled character output when using cout API

I am trying to run this simple code in VS 2015
#include "stdafx.h"
# include <iostream>
int main()
{
char * szOldPath = "\"C:\icm\scripts\StartupSync\runall.bat\" nonprod";
std::cout << szOldPath << std::endl;
return 0;
}
However, the output of szOldPath is not proper and the console is printing--
unall.bat" nonprodupSync
I suspect this might be because of Unicode and I should be using wcout. So I disabled Unicode by going to Configuration Properties -> General --> Character Set and tried setting it to Not Set or Multi Byte. But still running into this issue.
I understand it is not good to disable UNICODE but I am trying to understand some legacy code written in our company and this experiment is a part of this exercise,is there any way I can get the cout command to print szOldPath successfully?

Your issue has nothing to do with Unicode.
\r is the escape sequence for a carriage return. So, you are printing out "C:\icm\scripts\StartupSync, and then \r tells the terminal to move the cursor back to the beginning of the current line, and then unall.bat" nonprod is printed, overwriting what was already there.
You need to escape all of the \ characters in your string literal, just like you had to escape the " characters.
Also, your variable needs to be declared as a pointer to const char when assigning a string literal to the pointer. This is enforced in C++11 and later:
#include "stdafx.h"
#include <iostream>
int main()
{
const char * szOldPath = "\"C:\\icm\\scripts\\StartupSync\\runall.bat\" nonprod";
std::cout << szOldPath << std::endl;
return 0;
}
Alternatively, in C++11 and later, you can use a raw string literal instead to avoid having to escape any characters with a leading \:
const char * szOldPath = R"("C:\icm\scripts\StartupSync\runall.bat" nonprod)";

Why a "no matching function" error for call by reference with literal number?

The problem asks to create a program that asks the user to enter some text and that text will be surrounded by asterisks depending on the width of the screen for example if the user inputs "Hello world" the output should be:
****************
* Hello World! *
****************
I've tried to create the functions but I'm stuck becaus of a compiler error with the shown minimal code.
Question: Why does it tell me no matching function for within_width(text, 80)?
Some of the code I have is below:
#include <iostream>
#include <string>
void display_header (std::string &header) {
std::string text;
header = text;
}
bool within_width (std::string& text, unsigned short int& max_width) {
}
int main() {
std::string text;
std::cout << "Please enter header text: ";
std::getline(std::cin, text);
if (within_width(text, 80)) {
// call the display_header function and pass in the text
// inputted by the user
} else {
std::cout << text;
}
return 0;
}

This declaration of the function
bool within_width (std::string& text, unsigned short int& max_width)
asks for an unsigned short int variable, because it has a reference parameter, see the second &.
To satisfy it, you need to put the value 80 into a variable and give the variable as parameter.
unsigned short int MyWidth=80;
if (within_width(text, MyWidth))
Alternatively (but I assume you are not allowed) you can use a call by value parameter
bool within_width (std::string& text, unsigned short int max_width)
Then you could call as shown.

I won't give a full answer to the exercise here, just some clues.
the display_header() and within_width() functions need to know the string given in parameters but may not modify it ; thus the type of this parameter should be const std::string & (the const was missing).
the second parameter of the within_width() function is just an integer that will be compared to the length of the string ; you don't need to pass it by reference (or at least const), rather by value. Here, the (non-const) reference prevents from passing the literal constant 80.
(it seems to be the main concern of the question after edition)
You need to reason step by step.
all of this depends on the size of the string (12 for Hello World!) ; this information is available via size(text) (or text.size())
(https://en.cppreference.com/w/cpp/iterator/size)
(https://en.cppreference.com/w/cpp/string/basic_string/size)
This size will have to be compared to max_width
Displaying the line with header will require 4 more characters because * will be prepended and * will be appended.
Thus the two surrounding lines will have the length size(header)+4 too.
In order to create such a string made of *, you could use a constructor of std::string taking two parameters : the count of characters and the character to be repeated.
(https://en.cppreference.com/w/cpp/string/basic_string/basic_string)
Send all of this to std::cout in the correct order.

Edit: Just noticing that this answer probably goes far beyond the scope of the task you have been given (just filling in some skeleton that has been provided by your teacher).
I'll still leave it here to illustrate what could be done with arbitrary input. Maybe you want to experiment a little further than what you have been asked...
bool within_width(...)
Pretty simple: string.length() <= max – just wait a second, you need to consider asterisks and spaces at beginning and end of output, so: max - 4
But you can do better, you can split the string, best at word boundaries. That's a bit difficult more difficult, though:
std::vector<std::string> lines;
// we'll be starting with an initially empty line:
auto lineBegin = text.begin();
auto lineEnd = text.begin();
for(auto i = text.begin(); i != text.end(); ++)
// stop condition empty: we'll stop from inside the loop...
{
// ok, we need to find next whitespace...
// we might try using text.find_first_of("..."), but then we
// need to know any whitespace characters ourselves, so I personally
// would rather iterate manually and use isspace function to determine;
// advantage: we can do other checks at the same time, too
auto distance = std::distance(lineBegin, i);
if(std::distance(lineBegin, i) > maxLineLength)
{
if(lineEnd == lineBegin)
{
// OK, now we have a problem: the word itself is too long
// decide yourself, do you want to cut the word somewhere in the
// middle (you even might implement syllable division...)
// or just refuse to print (i. e. throw an exception you catch
// elsewhere) - decide yourself...
}
else
{
lines.emplace_back(lineBegin, lineEnd);
lineBegin = lineEnd; // start next line...
}
}
// OK, now handle current character appropriately
// note: no else: we need to handle the character in ANY case,
// if we terminated the previous line or not
if(std::isspace(static_cast<unsigned char>(*i)))
{
lineEnd = i;
}
// otherwise, we're inside a word and just go on
}
// last line hasn't been added!
lines.emplace_back(lineBegin, lineEnd);
Now you can calculate maximum length over all the strings contained. Best: Do this right when adding a new line to the vector, then you don't need a separate loop...
You might have noticed that I didn't remove whitespace at the end of the strings, so you wouldn't need to add you own one, apart, possibly, from the very last string (so you might add a lines.back() += ' ';).
The ugly part, so far, is that I left multiple subsequent whitespace. Best is removing before splitting into lines, but be aware that you need to leave at least one. So:
auto end = text.begin();
bool isInWord = false; // will remove leading whitespace, if there is
for(auto c : text)
{
if(std::isspace(static_cast<unsigned char>(c)))
{
if(isInWord)
{
*end++ = ' '; // add a single space
isInWord = false;
}
}
else
{
*end++ = c;
isInWord = true;
}
}
This would have moved all words towards the beginning of the string, but we yet to drop the surplus part of the string yet contained:
text.erase(end, text.end());
Fine, the rest is pretty simple:
iterate over maximum length, printing a single asterisk in every loop
iterate over all of your strings in the vector: std::cout << "* " << line << "*\n";
repeat the initial loop to print second line of asterisks
Finally: You introduced a fix line limit of 80 characters. If console is larger, you just won't be using the entire available width, which yet might be acceptable, if it is smaller, you will get lines broken at the wrong places.
You now could (but that's optional) try to detect the width of the console – which has been asked before, so I won't go any deeper into.
Final note: The code presented above is untested, so no guarantee to be bugfree!

How do I make an alphabetized list of all distinct words in a file with the number of times each word was used?

I am writing a program using Microsoft Visual C++. In the program I must read in a text file and print out an alphabetized list of all distinct words in that file with the number of times each word was used.
I have looked up different ways to alphabetize a string but they do not work with the way I have my string initialized.
// What is inside my text file
Any experienced programmer engaged in writing programs for use by others knows
that, once his program is working correctly, good output is a must. Few people
really care how much time and trouble a programmer has spent in designing and
debugging a program. Most people see only the results. Often, by the time a
programmer has finished tackling a difficult problem, any output may look
great. The programmer knows what it means and how to interpret it. However,
the same cannot be said for others, or even for the programmer himself six
months hence.
string lines;
getline(input, lines); // Stores what is in file into the string
I expect an alphabetized list of words with the number of times each word was used. So far, I do not know how to begin this process.

It's rather simple, std::map automatically sorts based on key in the key/value pair you get. The key/value pair represents word/count which is what you need. You need to do some filtering for special characters and such.
EDIT: std::stringstream is a nice way of splitting std::string using whitespace delimiter as it's the default delimiter. Therefore, using stream >> word you will get whitespace-separated words. However, this might not be enough due to punctuation. For example: Often, has comma which we need to filter out. Therefore, I used std::replaceif which replaces puncts and digits with whitespaces.
Now a new problem arises. In your example, you have: "must.Few" which will be returned as one word. After replacing . with we have "must Few". So I'm using another stringstream on the filtered "word" to make sure I have only words in the final result.
In the second loop you will notice if(word == "") continue;, this can happen if the string is not trimmed. If you look at the code you will find out that we aren't trimming after replacing puncts and digits. That is, "Often," will be "Often " with trailing whitespace. The trailing whitespace causes the second loop to extract an empty word. This is why I added the condition to ignore it. You can trim the filtered result and then you wouldn't need this check.
Finally, I have added ignorecase boolean to check if you wish to ignore the case of the word or not. If you wish to do so, the program will simply convert the word to lowercase and then add it to the map. Otherwise, it will add the word the same way it found it. By default, ignorecase = true, if you wish to consider case, just call the function differently: count_words(input, false);.
Edit 2: In case you're wondering, the statement counts[word] will automatically create key/value pair in the std::map IF there isn't any key matching word. So when we call ++: if the word isn't in the map, it will create the pair, and increment value by 1 so you will have newly added word. If it exists already in the map, this will increment the existing value by 1 and hence it acts as a counter.
The program:
#include <iostream>
#include <map>
#include <sstream>
#include <cstring>
#include <cctype>
#include <string>
#include <iomanip>
#include <algorithm>
std::string to_lower(const std::string& str) {
std::string ret;
for (char c : str)
ret.push_back(tolower(c));
return ret;
}
std::map<std::string, size_t> count_words(const std::string& str, bool ignorecase = true) {
std::map<std::string, size_t> counts;
std::stringstream stream(str);
while (stream.good()) {
// wordW may have multiple words connected by special chars/digits
std::string wordW;
stream >> wordW;
// filter special chars and digits
std::replace_if(wordW.begin(), wordW.end(),
[](const char& c) { return std::ispunct(c) || std::isdigit(c); }, ' ');
// now wordW may have multiple words seperated by whitespaces, extract them
std::stringstream word_stream(wordW);
while (word_stream.good()) {
std::string word;
word_stream >> word;
// ignore empty words
if (word == "") continue;
// add to count.
ignorecase ? counts[to_lower(word)]++ : counts[word]++;
}
}
return counts;
}
void print_counts(const std::map<std::string, size_t>& counts) {
for (auto pair : counts)
std::cout << std::setw(15) << pair.first << " : " << pair.second << std::endl;
}
int main() {
std::string input = "Any experienced programmer engaged in writing programs for use by others knows \
that, once his program is working correctly, good output is a must.Few people \
really care how much time and trouble a programmer has spent in designing and \
debugging a program.Most people see only the results.Often, by the time a \
programmer has finished tackling a difficult problem, any output may look \
great.The programmer knows what it means and how to interpret it.However, \
the same cannot be said for others, or even for the programmer himself six \
months hence.";
auto counts = count_words(input);
print_counts(counts);
return 0;
}
I have tested this with Visual Studio 2017 and here is the part of the output:
a : 5
and : 3
any : 2
be : 1
by : 2
cannot : 1
care : 1
correctly : 1
debugging : 1
designing : 1

As others have already noted, an std::map handles the counting you care about quite easily.
Iostreams already have a tokenize to break an input stream up into words. In this case, we want to to only "think" of letters as characters that can make up words though. A stream uses a locale to make that sort of decision, so to change how it's done, we need to define a locale that classifies characters as we see fit.
struct alpha_only: std::ctype<char> {
alpha_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white space
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except lower- and upper-case letters, which are classified accordingly:
std::fill(&rc['a'], &rc['z'], std::ctype_base::lower);
std::fill(&rc['A'], &rc['Z'], std::ctype_base::upper);
return &rc[0];
}
};
With that in place, we tell the stream to use our ctype facet, then simply read words from the file and count them in the map:
std::cin.imbue(std::locale(std::locale(), new alpha_only));
std::map<std::string, std::size_t> counts;
std::string word;
while (std::cin >> word)
++counts[to_lower(word)];
...and when we're done with that, we can print out the results:
for (auto w : counts)
std::cout << w.first << ": " << w.second << "\n";

Id probably start by inserting all of those words into an array of strings, then start with the first index of the array and compare that with all of the other indexes if you find matches, add 1 to a counter and after you went through the array you could display the word you were searching for and how many matches there were and then go onto the next element and compare that with all of the other elements in the array and display etc. Or maybe if you wanna make a parallel array of integers that holds the number of matches you could do all the comparisons at one time and the displays at one time.

EDIT:
Everyone's answer seems more elegant because of the map's inherent sorting. My answer functions more as a parser, that later sorts the tokens. Therefore my answer is only useful to the extent of a tokenizer or lexer, whereas Everyone's answer is only good for sorted data.
You first probably want to read in the text file. You want to use a streambuf iterator to read in the file(found here).
You will now have a string called content, which is the content of you file. Next you will want to iterate, or loop, over the contents of this string. To do that you'll want to use an iterator. There should be a string outside of the loop that stores the current word. You will iterate over the content string, and each time you hit a letter character, you will add that character to your current word string. Then, once you hit a space character, you will take that current word string, and push it back into the wordString vector. (Note: that means that this will ignore non-letter characters, and that only spaces denote word separation.)
Now that we have a vector of all of our words in strings, we can use std::sort, to sort the vector in alphabetical order.(Note: capitalized words take precedence over lowercase words, and therefore will be sorted first.) Then we will iterate over our vector of stringWords and convert them into Word objects (this is a little heavy-weight), that will store their appearances and the word string. We will push these Word objects into a Word vector, but if we discover a repeat word string, instead of adding it into the Word vector, we'll grab the previous entry and increment its appearance count.
Finally, once this is all done, we can iterate over our Word object vector and output the word followed by its appearances.
Full Code:
#include <vector>
#include <fstream>
#include <iostream>
#include <streambuf>
#include <algorithm>
#include <string>
class Word //define word object
{
public:
Word(){appearances = 1;}
~Word(){}
int appearances;
std::string mWord;
};
bool isLetter(const char x)
{
return((x >= 'a' && x <= 'z') || (x >= 'A' && x <= 'Z'));
}
int main()
{
std::string srcFile = "myTextFile.txt"; //what file are we reading
std::ifstream ifs(srcFile);
std::string content( (std::istreambuf_iterator<char>(ifs) ),
( std::istreambuf_iterator<char>() )); //read in the file
std::vector<std::string> wordStringV; //create a vector of word strings
std::string current = ""; //define our current word
for(auto it = content.begin(); it != content.end(); ++it) //iterate over our input
{
const char currentChar = *it; //make life easier
if(currentChar == ' ')
{
wordStringV.push_back(current);
current = "";
continue;
}
else if(isLetter(currentChar))
{
current += *it;
}
}
std::sort(wordStringV.begin(), wordStringV.end(), std::less<std::string>());
std::vector<Word> wordVector;
for(auto it = wordStringV.begin(); it != wordStringV.end(); ++it) //iterate over wordString vector
{
std::vector<Word>::iterator wordIt;
//see if the current word string has appeared before...
for(wordIt = wordVector.begin(); wordIt != wordVector.end(); ++wordIt)
{
if((*wordIt).mWord == *it)
break;
}
if(wordIt == wordVector.end()) //...if not create a new Word obj
{
Word theWord;
theWord.mWord = *it;
wordVector.push_back(theWord);
}
else //...otherwise increment the appearances.
{
++((*wordIt).appearances);
}
}
//print the words out
for(auto it = wordVector.begin(); it != wordVector.end(); ++it)
{
Word theWord = *it;
std::cout << theWord.mWord << " " << theWord.appearances << "\n";
}
return 0;
}
Side Notes
Compiled with g++ version 4.2.1 with target x86_64-apple-darwin, using the compiler flag -std=c++11.
If you don't like iterators you can instead do
for(int i = 0; i < v.size(); ++i)
{
char currentChar = vector[i];
}
It's important to note that if you are capitalization agnostic simply use std::tolower on the current += *it; statement (ie: current += std::tolower(*it);).
Also, you seem like a beginner and this answer might have been too heavyweight, but you're asking for a basic parser and that is no easy task. I recommend starting by parsing simpler strings like math equations. Maybe make a calculator app.

Properly handle escape sequences in strings from argv in C++

I'm writing a larger program that takes arguments from the command line after the executable. Some of the arguments are expected to be passed after the equals sign of an option. For instance, the output to the log is a comma separated vector by default, but if the user wants to change the separator to a period or something else instead of a comma, they might give the argument as:
./main --separator="."
This works fine, but if a user wants the delimiter be a special character (for example: tab), they might expect to pass the escape sequence in one of the following ways:
./main --separator="\t"
./main --separator='\t'
./main --separator=\t
It doesn't behave the way I want it to (to interpret \t as a tab) and instead prints out the string as written (sans quotes, and with no quotes it just prints 't'). I've tried using double slashes, but I think I might just be approaching this incorrectly and I'm not sure how to even ask the question properly (I tried searching).
I've recreated the issue in a dummy example here:
#include <string>
#include <iostream>
#include <cstdio>
// Pull the string value after the equals sign
std::string get_option( std::string input );
// Verify that the input is a valid option
bool is_valid_option( std::string input );
int main ( int argc, char** argv )
{
if ( argc != 2 )
{
std::cerr << "Takes exactly two arguments. You gave " << argc << "." << std::endl;
exit( -1 );
}
// Convert from char* to string
std::string arg ( argv[1] );
if ( !is_valid_option( arg ) )
{
std::cerr << "Argument " << arg << " is not a valid option of the form --<argument>=<option>." << std::endl;
exit( -2 );
}
std::cout << "You entered: " << arg << std::endl;
std::cout << "The option you wanted to use is: " << get_option( arg ) << "." << std::endl;
return 0;
}
std::string get_option( std::string input )
{
int index = input.find( '=' );
std::string opt = input.substr( index + 1 ); // We want everything after the '='
return opt;
}
bool is_valid_option( std::string input )
{
int equals_index = input.find('=');
return ( equals_index != std::string::npos && equals_index < input.length() - 1 );
}
I compile like this:
g++ -std=c++11 dummy.cpp -o dummy
With the following commands, it produces the following outputs.
With double quotes:
/dummy --option="\t"
You entered: --option=\t
The option you wanted to use is: \t.
With single quotes:
./dummy --option='\t'
You entered: --option=\t
The option you wanted to use is: \t.
With no quotes:
./dummy --option=\t
You entered: --option=t
The option you wanted to use is: t.
My question is: Is there a way to specify that it should interpret the substring \t as a tab character (or other escape sequences) rather than the string literal "\t"? I could parse it manually, but I'm trying to avoid re-inventing the wheel when I might just be missing something small.
Thank you very much for your time and answers. This is something so simple that it's been driving me crazy that I'm not sure how to fix it quickly and simply.

The escape sequences are already parsed from the shell you use, and are passed to your command line parameters array argv accordingly.
As you noticed only the quoted versions will enable you to detect that a "\\t" string was parsed and passed to your main().
Since most shells may just skip a real TAB character as a whitespace, you'll never see it in your command line arguments.
But as mentioned it's mainly a problem of how the shell interprets the command line, and what's left going to your program call arguments, than how to handle it with c++ or c.
My question is: Is there a way to specify that it should interpret the substring \t as a tab character (or other escape sequences) rather than the string literal "\t"? I could parse it manually, but I'm trying to avoid re-inventing the wheel when I might just be missing something small.
You actually need to scan for a string literal
"\\t"
within the c++ code.

Can I use 2 or more delimiters in C++ function getline? [duplicate]

This question already has answers here:
How can I read and parse CSV files in C++?
(39 answers)
Closed 4 years ago.
I would like to know how can I use 2 or more delimiters in the getline functon, that's my problem:
The program reads a text file... each line is goning to be like:
New Your, Paris, 100
CityA, CityB, 200
I am using getline(file, line), but I got the whole line, when I want to to get CityA, then CityB and then the number; and if I use ',' delimiter, I won't know when is the next line, so I'm trying to figure out some solution..
Though, how could I use comma and \n as a delimiter?
By the way,I'm manipulating string type,not char, so strtok is not possible :/
some scratch:
string line;
ifstream file("text.txt");
if(file.is_open())
while(!file.eof()){
getline(file, line);
// here I need to get each string before comma and \n
}

You can read a line using std::getline, then pass the line to a std::stringstream and read the comma separated values off it
string line;
ifstream file("text.txt");
if(file.is_open()){
while(getline(file, line)){ // get a whole line
std::stringstream ss(line);
while(getline(ss, line, ',')){
// You now have separate entites here
}
}

No, std::getline() only accepts a single character, to override the default delimiter. std::getline() does not have an option for multiple alternate delimiters.
The correct way to parse this kind of input is to use the default std::getline() to read the entire line into a std::string, then construct a std::istringstream, and then parse it further, into comma-separate values.
However, if you are truly parsing comma-separated values, you should be using a proper CSV parser.

Often, it is more intuitive and efficient to parse character input in a hierarchical, tree-like manner, where you start by splitting the string into its major blocks, then go on to process each of the blocks, splitting them up into smaller parts, and so on.
An alternative to this is to tokenize like strtok does -- from the beginning of input, handling one token at a time until the end of input is encountered. This may be preferred when parsing simple inputs, because its is straightforward to implement. This style can also be used when parsing inputs with nested structure, but this requires maintaining some kind of context information, which might grow too complex to maintain inside a single function or limited region of code.
Someone relying on the C++ std library usually ends up using a std::stringstream, along with std::getline to tokenize string input. But, this only gives you one delimiter. They would never consider using strtok, because it is a non-reentrant piece of junk from the C runtime library. So, they end up using streams, and with only one delimiter, one is obligated to use a hierarchical parsing style.
But zneak brought up std::string::find_first_of, which takes a set of characters and returns the position nearest to the beginning of the string containing a character from the set. And there are other member functions: find_last_of, find_first_not_of, and more, which seem to exist for the sole purpose of parsing strings. But std::string stops short of providing useful tokenizing functions.
Another option is the <regex> library, which can do anything you want, but it is new and you will need to get used to its syntax.
But, with very little effort, you can leverage existing functions in std::string to perform tokenizing tasks, and without resorting to streams. Here is a simple example. get_to() is the tokenizing function and tokenize demonstrates how it is used.
The code in this example will be slower than strtok, because it constantly erases characters from the beginning of the string being parsed, and also copies and returns substrings. This makes the code easy to understand, but it does not mean more efficient tokenizing is impossible. It wouldn't even be that much more complicated than this -- you would just keep track of your current position, use this as the start argument in std::string member functions, and never alter the source string. And even better techniques exist, no doubt.
To understand the example's code, start at the bottom, where main() is and where you can see how the functions are used. The top of this code is dominated by basic utility functions and dumb comments.
#include <iostream>
#include <string>
#include <utility>
namespace string_parsing {
// in-place trim whitespace off ends of a std::string
inline void trim(std::string &str) {
auto space_is_it = [] (char c) {
// A few asks:
// * Suppress criticism WRT localization concerns
// * Avoid jumping to conclusions! And seeing monsters everywhere!
// Things like...ah! Believing "thoughts" that assumptions were made
// regarding character encoding.
// * If an obvious, portable alternative exists within the C++ Standard Library,
// you will see it in 2.0, so no new defect tickets, please.
// * Go ahead and ignore the rumor that using lambdas just to get
// local function definitions is "cheap" or "dumb" or "ignorant."
// That's the latest round of FUD from...*mumble*.
return c > '\0' && c <= ' ';
};
for(auto rit = str.rbegin(); rit != str.rend(); ++rit) {
if(!space_is_it(*rit)) {
if(rit != str.rbegin()) {
str.erase(&*rit - &*str.begin() + 1);
}
for(auto fit=str.begin(); fit != str.end(); ++fit) {
if(!space_is_it(*fit)) {
if(fit != str.begin()) {
str.erase(str.begin(), fit);
}
return;
} } } }
str.clear();
}
// get_to(string, <delimiter set> [, delimiter])
// The input+output argument "string" is searched for the first occurance of one
// from a set of delimiters. All characters to the left of, and the delimiter itself
// are deleted in-place, and the substring which was to the left of the delimiter is
// returned, with whitespace trimmed.
// <delimiter set> is forwarded to std::string::find_first_of, so its type may match
// whatever this function's overloads accept, but this is usually expressed
// as a string literal: ", \n" matches commas, spaces and linefeeds.
// The optional output argument "found_delimiter" receives the delimiter character just found.
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters, char& found_delimiter) {
const auto pos = str.find_first_of(std::forward<D>(delimiters));
if(pos == std::string::npos) {
// When none of the delimiters are present,
// clear the string and return its last value.
// This effectively makes the end of a string an
// implied delimiter.
// This behavior is convenient for parsers which
// consume chunks of a string, looping until
// the string is empty.
// Without this feature, it would be possible to
// continue looping forever, when an iteration
// leaves the string unchanged, usually caused by
// a syntax error in the source string.
// So the implied end-of-string delimiter takes
// away the caller's burden of anticipating and
// handling the range of possible errors.
found_delimiter = '\0';
std::string result;
std::swap(result, str);
trim(result);
return result;
}
found_delimiter = str[pos];
auto left = str.substr(0, pos);
trim(left);
str.erase(0, pos + 1);
return left;
}
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters) {
char discarded_delimiter;
return get_to(str, std::forward<D>(delimiters), discarded_delimiter);
}
inline std::string pad_right(const std::string& str,
std::string::size_type min_length,
char pad_char=' ')
{
if(str.length() >= min_length ) return str;
return str + std::string(min_length - str.length(), pad_char);
}
inline void tokenize(std::string source) {
std::cout << source << "\n\n";
bool quote_opened = false;
while(!source.empty()) {
// If we just encountered an open-quote, only include the quote character
// in the delimiter set, so that a quoted token may contain any of the
// other delimiters.
const char* delimiter_set = quote_opened ? "'" : ",'{}";
char delimiter;
auto token = get_to(source, delimiter_set, delimiter);
quote_opened = delimiter == '\'' && !quote_opened;
std::cout << " " << pad_right('[' + token + ']', 16)
<< " " << delimiter << '\n';
}
std::cout << '\n';
}
}
int main() {
string_parsing::tokenize("{1.5, null, 88, 'hi, {there}!'}");
}
This outputs:
{1.5, null, 88, 'hi, {there}!'}
[] {
[1.5] ,
[null] ,
[88] ,
[] '
[hi, {there}!] '
[] }

I don't think that's how you should attack the problem (even if you could do it); instead:
Use what you have to read in each line
Then split up that line by the commas to get the pieces that you want.
If strtok will do the job for #2, you can always convert your string into a char array.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

how to define my own special character in cout - c++

Related

Garbled character output when using cout API

Why a "no matching function" error for call by reference with literal number?

How do I make an alphabetized list of all distinct words in a file with the number of times each word was used?

Properly handle escape sequences in strings from argv in C++

Can I use 2 or more delimiters in C++ function getline? [duplicate]

Categories

Resources