I'm reading over a C++ class for parsing CSV files in one of my programming books for class. I primarily write in C# for work and don't interact with C++ code very often. One of the functions, getline, uses an uninitialized char variable and I'm confused as to whether it's a typo or not.
// getline: get one line, grow as needed
int Csv::getline(string& str)
{
char c;
for (line = ""; fin.get(c) && !endofline(c); )
line += c;
split();
str = line;
return !fin.eof();
}
fin is an istream. The documentation I'm reading shows the get (char& c); function being passed a reference, but which char in the stream is returned? What's the initial value of c?
The initial value of c is indeterminate, but that does not matter, because the call to get sets c before anything reads it. There is a sequence point after the left-hand operand of the && and || operators, so all side effects of get are complete before endofline is evaluated, and endofline sees the updated value of c. If get fails, && short-circuits and c is never read at all.
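A minimal sketch of the same loop outside the Csv class may make the ordering easier to see. The names is_end_of_line and read_one_line are hypothetical stand-ins (the book's endofline may behave differently), and the stream is passed in as a parameter instead of being a member.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the book's endofline: true for '\r' or '\n'.
bool is_end_of_line(char c) {
    return c == '\r' || c == '\n';
}

// Reads characters up to (but not including) the first end-of-line character.
std::string read_one_line(std::istream& in) {
    std::string line;
    char c;  // left uninitialized on purpose: in.get(c) writes it before it is read
    while (in.get(c) && !is_end_of_line(c)) {
        line += c;  // only reached when get succeeded, so c holds a real character
    }
    return line;
}
```

Calling read_one_line repeatedly on a std::istringstream holding "a,b\nc,d\n" should yield "a,b", then "c,d", then an empty string once the stream is exhausted.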
Related
I am confused about the usage of std::cin.peek() and std::cin.get(), specifically the versions that return an int.
As I've read in C++ Primer, we should never assign the return value of these functions to a char:
char c = std::cin.get(); // erroneous
But cppreference does exactly that in its example, as do many websites and many programmers, including me until I learned the reason for not doing so.
https://en.cppreference.com/w/cpp/io/basic_istream/get
And I see also such usage checking for a new line:
while(std::cin.peek() != '\n' )
;// do something
In the above code I think it is OK because in fact there is no assignment from int to char but a comparison in which the newline character '\n' is promoted first to int then compared with int which I think is not evil.
If the code is OK then what is the point in using std::char_traits<char>::to_int_type() function?
So is there no way to assign any value returned from peek() and get() to a char?
I've seen some recommended code like:
char c;
while(std::cin.peek() != std::char_traits<char>::to_int_type('\n')){
std::cin.get(c);
std::cout.put(c);
}
So what is the difference between implicit conversion of '\n' to int and using this trait function?
I've just been introduced to toupper, and I'm a little confused by the syntax; it seems like it's repeating itself. What I've been using it for is for every character of a string, it converts the character into an uppercase character if possible.
for (int i = 0; i < string.length(); i++)
{
if (isalpha(string[i]))
{
if (islower(string[i]))
{
string[i] = toupper(string[i]);
}
}
}
Why do you have to list string[i] twice? Shouldn't this work?
toupper(string[i]); (I tried it, so I know it doesn't.)
toupper is a function that takes its argument by value. It could have been defined to take a reference to character and modify it in-place, but that would have made it more awkward to write code that just examines the upper-case variant of a character, as in this example:
// compare chars case-insensitively without modifying anything
if (std::toupper(*s1++) == std::toupper(*s2++))
...
In other words, toupper(c) doesn't change c for the same reasons that sin(x) doesn't change x.
To avoid repeating expressions like string[i] on the left and right side of the assignment, take a reference to a character and use it to read and write to the string:
for (size_t i = 0; i < string.length(); i++) {
char& c = string[i]; // reference to character inside string
c = std::toupper(c);
}
Using range-based for, the above can be written more briefly (and executed more efficiently) as:
for (auto& c: string)
c = std::toupper(c);
According to the documentation, the character is passed by value.
Because of that, the answer is no, it shouldn't work.
The prototype of toupper is:
int toupper( int ch );
As you can see, the character is passed by value, transformed and returned by value.
If you don't assign the returned value to a variable, it is simply lost.
That's why your example assigns it back to string[i], replacing the original character.
As many of the other answers already say, the argument to std::toupper is passed and the result returned by-value which makes sense because otherwise, you wouldn't be able to call, say std::toupper('a'). You cannot modify the literal 'a' in-place. It is also likely that you have your input in a read-only buffer and want to store the uppercase-output in another buffer. So the by-value approach is much more flexible.
What is redundant, on the other hand, is your checking for isalpha and islower. If the character is not a lower-case alphabetic character, toupper will leave it alone anyway so the logic reduces to this.
#include <cctype>
#include <iostream>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
for (auto s = text; *s != '\0'; ++s)
*s = std::toupper(*s);
std::cout << text << '\n';
}
You could further eliminate the raw loop by using an algorithm, if you find this prettier.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
std::transform(std::cbegin(text), std::cend(text), std::begin(text),
[](auto c){ return std::toupper(c); });
std::cout << text << '\n';
}
toupper takes an int by value and returns, as an int, the uppercase equivalent of that character. Whenever a function takes a parameter by value rather than by pointer or reference, it works on a copy of the argument, so changes made inside the function are not visible to the caller. The only way to get the result is to save what the function returns, in this case the upper-cased character.
Note that there is a nasty gotcha in isalpha(), which is the following: the function only works correctly for inputs in the range 0-255 + EOF.
So what, you think.
Well, if your char type happens to be signed, and you pass a value greater than 127, this is considered a negative value, and thus the int passed to isalpha will also be negative (and thus outside the range of 0-255 + EOF).
In Visual Studio, this will crash your application. I have complained about this to Microsoft, on the grounds that a character classification function that is not safe for all inputs is basically pointless, but received an answer stating that this was entirely standards conforming and I should just write better code. Ok, fair enough, but nowhere else in the standard does anyone care about whether char is signed or unsigned. Only in the isxxx functions does it serve as a landmine that could easily make it through testing without anyone noticing.
The following code crashes Visual Studio 2015 (and, as far as I know, all earlier versions):
int x = toupper ('é');
So not only is the isalpha() in your code redundant, it is in fact actively harmful, as it will cause any strings that contain characters with values greater than 127 to crash your application.
See http://en.cppreference.com/w/cpp/string/byte/isalpha: "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."
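One common way to stay inside the range quoted above is to convert the char to unsigned char before calling the classification or conversion function. The sketch below assumes the default "C" locale; to_upper_char and to_upper_copy are hypothetical helper names, not standard functions.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: uppercase one char without ever handing
// std::toupper a negative value (other than EOF, which cannot occur here).
char to_upper_char(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Returns an upper-cased copy; no isalpha/islower checks are needed
// because std::toupper leaves other characters unchanged.
std::string to_upper_copy(std::string text) {
    for (char& c : text) {
        c = to_upper_char(c);
    }
    return text;
}
```

With this wrapper, bytes above 127 are passed as values in the range 128 to 255, which the standard allows, and in the "C" locale they come back unchanged.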
In c++, there are several ways of taking input. What is the difference between the two following cases?
char x;
x=cin.get();
/* The above code can be a one-liner */
vs
char x;
cin.get(x);
In this case they're the same (in terms of behavior of get and the character extracted from the stream). From documentation:
std::istream::get
int get();
istream& get (char& c);
"Extracts a single character from the stream.
The character is either returned (first signature), or set as the value of its argument (second signature)."
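There is basically no difference between these two cases when a character is available:
cin.get() takes no arguments and returns the extracted character as an int, or EOF if nothing could be read
cin.get(char &c) stores the extracted character in the passed variable and returns a reference to the stream, which can be tested for success
Neither form is meaningfully cheaper than the other. The one-liner char x = cin.get(); is shorter, but converting the result straight to char discards the information needed to detect end-of-file; keep the result in an int if you need that check.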
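The two overloads report end-of-input differently, which a short sketch can show. The function names below are hypothetical, and the stream is passed in so that the same code works with std::cin or a std::istringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Style 1: int get(). Compare against eof() while the value is still an
// int_type, and only then narrow it to char.
std::string read_all_with_int_get(std::istream& in) {
    using traits = std::char_traits<char>;
    std::string out;
    for (traits::int_type ch = in.get();
         !traits::eq_int_type(ch, traits::eof());
         ch = in.get()) {
        out += traits::to_char_type(ch);
    }
    return out;
}

// Style 2: get(char&). The returned stream converts to false on failure,
// so no sentinel value is needed.
std::string read_all_with_char_get(std::istream& in) {
    std::string out;
    char c;
    while (in.get(c)) {
        out += c;
    }
    return out;
}
```

Both functions should return the same text for the same input, including input that contains the byte 0xFF, which is the case that goes wrong when the int result is narrowed to char before the end-of-file comparison.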
This code is part of a larger program that indexes files and tokenizes the words in each file, so that you can search for a certain word in the large number of files you have (like Google).
This function is supposed to search your files for a word that you want to find. But I don't completely understand how it works!
Can someone please explain what this code does and how it does it?
In addition, I have several questions:
1) What exactly is "infile"?
2) What does the built-in function c_str() do?
3) Why does the variable "currentlineno" start at 1? Couldn't the first line in a file start at 0?
4) What is the difference between ++x and x++?
5) What is the difference between the condition "currentlineno < lineNumber" and "currentlineno != lineNumber" ?
This is the code:
void DisplayResult(string fileName, int lineNumber)
{
ifstream infile(fileName.c_str(), ifstream::in);
char line[1000];
int currentlineno = 1;
while(currentlineno < lineNumber)
{
infile.getline(line, 1000);
++currentlineno;
}
infile.getline(line, 1000);
cout<<endl<<"\nResult from ("<<fileName<<" ), line #"<<lineNumber<<": "<<endl;
cout<<"\t"<<line;
infile.close();
}
This function displays the line at the line number passed as a parameter.
1/ infile opens a file as an input stream: http://www.cplusplus.com/reference/fstream/ifstream/
2/ c_str() converts a string object to a plain const char* (a pointer to a char array). That is the representation used in the C language, which explains why the method is named "c_str". In C++ we usually use string rather than char* because it is much simpler.
3/ Why does currentlineno start at 1? The loop reads and discards the lines before the given line number. Then the function reads one more line, which is the one to display.
4/ ++x is pre-increment, x++ is post-increment.
With ++x, x is incremented before its value is used; with x++, x is incremented after.
int x = 1;
cout << ++x; // display 2
x = 1;
cout << x++; // display 1
5/ Look at operators : http://www.cplusplus.com/doc/tutorial/operators/
1) What exactly is "infile"?
ANS:: It constructs an ifstream object and optionally opens a file.
2) What does the built-in function c_str() do?
ANS:: It is needed to get a const char* representation of the text stored inside a std::string class.
3) Why does the variable "currentlineno" start at 1? Couldn't the first line in a file start at 0?
ANS:: Depends on the second input parameter of the function DisplayResult.
4) What is the difference between ++x and x++?
ANS:: These are the pre-increment (++x) and post-increment (x++) operators, which you may have heard of.
5) What is the difference between the condition "currentlineno < lineNumber" and "currentlineno != lineNumber" ?
ANS:: Value of currentlineno should not exceed the value of lineNumber when condition is currentlineno < lineNumber. Value of currentlineno may exceed or may be less than the value of lineNumber but should not be equal to the value of lineNumber when condition is currentlineno != lineNumber.
This function does not search for words.
It takes as input a file name and a line number. It tries to find and read that line.
The output starts with two line breaks, followed by a line stating: "Result from (fileName ), line #lineNumber: "
It is followed by a tab and then the contents of the found line. This second line of output is left incomplete (not followed by a newline).
The found contents are empty if the file has fewer than the requested number of lines or if any of the lines before the requested line has more than 999 characters.
If the requested line has more than 999 characters it is truncated to 999 characters.
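The 999-character limit comes from the fixed char buffer. As a sketch of one alternative (not the original author's code), the free function std::getline reads into a std::string that grows as needed. The helper name read_line_number is hypothetical, and it takes any std::istream so that an std::ifstream or a string stream can be passed in.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: stores line number `lineNumber` (counting from 1) in `out`.
// Returns false if lineNumber is less than 1 or the stream has fewer lines.
bool read_line_number(std::istream& in, int lineNumber, std::string& out) {
    if (lineNumber < 1) {
        return false;
    }
    for (int current = 1; current <= lineNumber; ++current) {
        // std::getline resizes `out`, so long lines are neither truncated
        // nor able to put the stream into a failed state.
        if (!std::getline(in, out)) {
            return false;
        }
    }
    return true;
}
```

Returning a bool lets the caller distinguish a genuinely empty line from a line that does not exist, which the original function cannot do.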
Other questions:
1) infile is a function-scope object of automatic storage duration and type std::basic_ifstream<char, std::char_traits<char>>, which is initialized for reading from the file named in fileName.
2) The member function c_str() built into the standard library string class returns a pointer to the string contents as a non-modifiable, nul-terminated character array, which is the format typically used in C for strings (type const char *). For historical reasons the file-based standard library streams take their file name arguments in this format.
3) Humans typically count line numbers starting with one. That is the convention used for the lineNumber parameter. The algorithm used must match this. The currentlineno local variable is used to mean 'the number of the next line to be read'. As such it must be initialized with 1. (This is somewhat confusing, considering the name of the variable.) Other implementations that initialize the line counter with 0 are possible - and indeed natural to most C++ programmers.
4) See any textbook or online reference of C++. Look for "pre-increment" (++x) and "post-increment" (x++) operators. They have the same side effect (increment x), but differ in the value of the expression. If you don't use the result they are equivalent (for basic types).
C++ programmers usually prefer pre-increment as it can generally be implemented more efficiently for user-defined types.
5) Even more basic textbook question. a < b tests for a less-than relationship, a != b tests for inequality.
Note: All answers assume that the types used are from the standard C++ library, i.e. that the <string>, <iostream> and <fstream> headers are included and the necessary using directives or declarations are present.
I am taking a line of input which is separated by a space and trying to read the data into two integer variables.
for instance: "0 1" should give child1 == 0, child2 == 1.
The code I'm using is as follows:
int separator = input.find(' ');
const char* child1_str = input.substr(0, separator).c_str(); // Everything is as expected here.
const char* child2_str = input.substr(
separator+1, //Start with the next char after the separator
input.length()-(separator+1) // And work to the end of the input string.
).c_str(); // But now child1_str is showing the same location in memory as child2_str!
int child1 = atoi(child1_str);
int child2 = atoi(child2_str); // and thus are both of these getting assigned the integer '1'.
// do work
What's happening is perplexing me to no end. I'm monitoring the sequence with the Eclipse debugger (gdb). When the function starts, child1_str and child2_str are shown to have different memory locations (as they should). After splitting the string at separator and getting the first value, child1_str holds '0' as expected.
However, the next line, which assigns a value to child2_str not only assigns the correct value to child2_str, but also overwrites child1_str. I don't even mean the character value is overwritten, I mean that the debugger shows child1_str and child2_str to share the same location in memory.
What the what?
1) Yes, I'll be happy to listen to other suggestions to convert a string to an int -- this was how I learned to do it a long time ago, and I've never had a problem with it, so never needed to change, however:
2) Even if there's a better way to perform the conversion, I would still like to know what's going on here! This is my ultimate question. So even if you come up with a better algorithm, the selected answer will be the one that helps me understand why my algorithm fails.
3) Yes, I know that std::string is C++ and const char* is standard C. atoi requires a c string. I'm tagging this as C++ because the input will absolutely be coming as a std::string from the framework I am using.
First, the superior solutions.
In C++11 you can use the newfangled std::stoi function:
int child1 = std::stoi(input.substr(0, separator));
Failing that, you can use boost::lexical_cast:
int child1 = boost::lexical_cast<int>(input.substr(0, separator));
Now, an explanation.
input.substr(0, separator) creates a temporary std::string object that dies at the semicolon. Calling c_str() on that temporary object gives you a pointer that is only valid as long as the temporary lives. This means that, on the next line, the pointer is already invalid. Dereferencing that pointer has undefined behaviour. Then weird things happen, as is often the case with undefined behaviour.
The value returned by c_str() is invalid after the string is destructed. So when you run this line:
const char* child1_str = input.substr(0, separator).c_str();
The substr function returns a temporary string. After the line is run, this temporary string is destructed and the child1_str pointer becomes invalid. Accessing that pointer results in undefined behavior.
What you should do is assign the result of substr to a local std::string variable. Then you can call c_str() on that variable, and the result will be valid until the variable is destructed (at the end of the block).
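A minimal sketch of that fix, keeping atoi as in the question: each substring is stored in a named local before c_str() is called on it. The function name parse_children is hypothetical, and returning 0 for a missing second number is an assumption made only to keep the example well defined.

```cpp
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical helper: parses "A B" into two ints using atoi.
std::pair<int, int> parse_children(const std::string& input) {
    const std::string::size_type separator = input.find(' ');
    // Named locals live until the end of the function, so the pointers
    // returned by their c_str() stay valid while atoi reads them.
    const std::string first = input.substr(0, separator);
    const std::string second =
        separator == std::string::npos ? std::string()
                                       : input.substr(separator + 1);
    return {std::atoi(first.c_str()), std::atoi(second.c_str())};
}
```

For the input "0 1" this should return the pair (0, 1), because the two strings now occupy separate storage for as long as the pointers are in use.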
Others have already pointed out the problem with your current code. Here's how I'd do the conversion:
std::istringstream buffer(input);
buffer >> child1 >> child2;
Much simpler and more straightforward, not to mention considerably more flexible (e.g., it'll continue to work even if the input has a tab or two spaces between the numbers).
input.substr returns a temporary std::string. Since you are not saving it anywhere, it gets destroyed. Anything that happens afterwards depends solely on your luck.
I recommend using an istringstream.
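For completeness, here is a small sketch of the istringstream approach with a success check added. The function name parse_two_ints and the bool return value are assumptions for illustration; the extraction itself is the same buffer >> child1 >> child2 shown in the earlier answer.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts two whitespace-separated ints from `input`.
// Returns false if either extraction fails.
bool parse_two_ints(const std::string& input, int& child1, int& child2) {
    std::istringstream buffer(input);  // owns its own copy of the text
    return static_cast<bool>(buffer >> child1 >> child2);
}
```

No temporaries or raw pointers are involved, so the lifetime problem from the question cannot occur, and any run of spaces or tabs between the numbers is skipped automatically.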