Why can't I read apostrophes using ifstream without it crashing? - c++

I'm using this code:
std::string word;
std::ifstream f((file_name + ".txt").c_str());
while (f >> word) {
good_input = true;
for (int i = 0; i < word.length(); ++i) {
if (ispunct(word.at(i))) {
word.erase(i--, 1);
}
else if (isupper(word.at(i))){
word.at(i) = tolower(word.at(i));
}
}
Every time I read the word "doesn't" from a text file, I get this error:
Debug Assertion Failed!
Program: directory\SortingWords(Length).exe
File: minkernel\crts\ucrt\src\appcrt\convert\istype.cpp
Line: 36
Expression: c >= -1 && c <= 255
For more information please visit... [etc.]
When I click "abort", my program exits with code 3. Don't know if that's helpful?
It looks like it's got something to do with the apostrophe, maybe? This code works fine for all other words in my document up until this one. It also works great with documents that don't include apostrophes, even though they include plenty of other punctuation...
I tried changing the encoding of the text file (simply made with notepad), but that didn't help. Generally found lots of complaints about apostrophes but no working answers. Can anyone help me figure out what's going on?

As documentation for ispunct says:
The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF.
Visual C++ is nice enough to add an almost explicit message for this error if you link to the debug runtime (this is often the case with undefined behaviour - with the release runtime, it just crashes or behaves strangely; with the debug runtime, you get an error dialog box).
In theory, this means that in the character set used by your environment, ' is not representable as an unsigned char, i.e. its character code is too big or too low.
In practice, this seems very unlikely and perhaps even impossible on Windows. It is much more likely that your file doesn't really contain an apostrophe but a character that merely looks like one, e.g. an accent: ´
Here's how you can reproduce the problem in a simple manner:
#include <ctype.h>
int main()
{
ispunct('\'');
ispunct('´'); // undefined behaviour (crash or error message with Visual C++)
}
isupper has the same problem.
You can use those functions safely with static_cast, e.g.:
if (ispunct(static_cast<unsigned char>(word.at(i))))
Of course, now ispunct will return zero for the character. If you really need to cover ´, you have to do so explicitly, for example with a helper function like this:
bool extended_ispunct(int c)
{
return ispunct(static_cast<unsigned char>(c)) || c == '´';
}

Related

isdigit() function pass a Chinese parameter

When I call the isdigit() function with a Chinese character, it triggers an assert in Visual Studio 2013 in Debug mode, but there is no problem in Release mode.
If this function is meant to determine whether its argument is a digit, why doesn't it simply return 0 for a Chinese character?
This is my code:
string testString = "abcdefg12345中文";
int count = 0;
for (const auto &c : testString) {
if (isdigit(c)) {
++count;
}
}
and this is the assert :
You broke the contract of isdigit(int), which accepts only values in the range stated in its documentation:
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.
Your standard library implementation is being kind and asserting, rather than going on to blow stuff up.
There is an alternative, locale-aware isdigit(charT ch, const locale&) that you may be able to use here.
I suggest performing some further research on how "characters" work in computers, particularly with regards to encoding more "exotic"1 character sets.
1 From the perspective of computer history. Of course, to you, it is the less exotic alternative!
The isdigit() and related functions / macros in <ctype.h> expect an int converted from an unsigned char, or EOF, which on most systems means a value in the range 0-255 (or -1 for EOF). So any value not in the range -1…255 is incorrect.
Problem 1: You are passing in a char, which on your system has range -128…+127. Solution to this problem is simple:
if (isdigit(static_cast<unsigned char>(c)))
This won't crash, however, it's not quite correct for Chinese characters.
Problem 2: Non-ASCII characters should probably use iswdigit() instead. This will correctly handle Chinese characters:
wstring testString = L"abcdefg12345中文";
int count = 0;
for (const auto &c : testString) {
if (iswdigit(c)) {
++count;
}
}

How do you import c++ 11 into eclipse neon? My code is giving me errors and I heard that is the solution.

The if-statement is giving me an error and I don't know why. Is it possible something is not imported? The for loop is giving me a notice saying, "range-based for loop is a C++11 extension".
string line;
string temp = "";
string beginning_time;
void convertTimeintoInt(string beginning_time)
{
for(char a : beginning_time)
{
if(a == ":")
continue;
else
temp += a;
}
}
Your error happens because you are comparing char a to string ":", instead of char ':'. You are comparing apples and oranges here.
As for the C++11 warning, I don't know about Eclipse Neon but it seems strange to me that it would understand it enough to know what it is, but not actually support it. I'm guessing there is a switch somewhere you need to enable to get C++11 (or 14/17/...) support.

C++ toupper Syntax

I've just been introduced to toupper, and I'm a little confused by the syntax; it seems like it's repeating itself. What I've been using it for is for every character of a string, it converts the character into an uppercase character if possible.
for (int i = 0; i < string.length(); i++)
{
if (isalpha(string[i]))
{
if (islower(string[i]))
{
string[i] = toupper(string[i]);
}
}
}
Why do you have to list string[i] twice? Shouldn't this work?
toupper(string[i]); (I tried it, so I know it doesn't.)
toupper is a function that takes its argument by value. It could have been defined to take a reference to character and modify it in-place, but that would have made it more awkward to write code that just examines the upper-case variant of a character, as in this example:
// compare chars case-insensitively without modifying anything
if (std::toupper(*s1++) == std::toupper(*s2++))
...
In other words, toupper(c) doesn't change c for the same reasons that sin(x) doesn't change x.
To avoid repeating expressions like string[i] on the left and right side of the assignment, take a reference to a character and use it to read and write to the string:
for (size_t i = 0; i < string.length(); i++) {
char& c = string[i]; // reference to character inside string
c = std::toupper(c);
}
Using range-based for, the above can be written more briefly (and executed more efficiently) as:
for (auto& c: string)
c = std::toupper(c);
According to the documentation, the character is passed by value.
Because of that, the answer is no, it shouldn't work.
The prototype of toupper is:
int toupper( int ch );
As you can see, the character is passed by value, transformed, and returned by value.
If you don't assign the returned value to a variable, it is lost.
That's why your example assigns it back to string[i], replacing the original character.
As many of the other answers already say, the argument to std::toupper is passed and the result returned by-value which makes sense because otherwise, you wouldn't be able to call, say std::toupper('a'). You cannot modify the literal 'a' in-place. It is also likely that you have your input in a read-only buffer and want to store the uppercase-output in another buffer. So the by-value approach is much more flexible.
What is redundant, on the other hand, is your checking for isalpha and islower. If the character is not a lower-case alphabetic character, toupper will leave it alone anyway so the logic reduces to this.
#include <cctype>
#include <iostream>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
for (auto s = text; *s != '\0'; ++s)
*s = std::toupper(*s);
std::cout << text << '\n';
}
You could further eliminate the raw loop by using an algorithm, if you find this prettier.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
std::transform(std::cbegin(text), std::cend(text), std::begin(text),
[](auto c){ return std::toupper(c); });
std::cout << text << '\n';
}
toupper takes an int by value and returns, as an int, the uppercase version of that character. Whenever a function does not take a pointer or reference parameter, the argument is passed by value: the function works on a copy, so no change is visible from outside the function. The way to capture the result is to save what the function returns, in this case the upper-cased character.
Note that there is a nasty gotcha in isalpha(), which is the following: the function only works correctly for inputs in the range 0-255 + EOF.
So what, you think.
Well, if your char type happens to be signed, and you pass a value greater than 127, this is considered a negative value, and thus the int passed to isalpha will also be negative (and thus outside the range of 0-255 + EOF).
In Visual Studio, this will crash your application. I have complained about this to Microsoft, on the grounds that a character classification function that is not safe for all inputs is basically pointless, but received an answer stating that this was entirely standards conforming and I should just write better code. Ok, fair enough, but nowhere else in the standard does anyone care about whether char is signed or unsigned. Only in the isxxx functions does it serve as a landmine that could easily make it through testing without anyone noticing.
The following code crashes when built with Visual Studio 2015 (and, as far as I know, all earlier versions):
int x = toupper ('é');
So not only is the isalpha() in your code redundant, it is in fact actively harmful, as it will cause any strings that contain characters with values greater than 127 to crash your application.
See http://en.cppreference.com/w/cpp/string/byte/isalpha: "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."

C++ code with GCC optimisation causes core with invalid free() on strings

I have C++ code that is built with gcc (4.1.2) with -O2.
When this code is compiled and run with no optimisation, the program executes without any issue.
When compiled with -O1/-O2/-O3, the code crashes, with valgrind reporting an invalid free.
This has been narrowed to the string variables inside the function.
The code will read in a file, and will iterate the contents.
I have removed all processing code, and the following code snippet causes the core...
int MyParser::iParseConfig(Config &inConfig)
{
bool keepGoing = true;
while(keepGoing)
{
string valueKey = "";
keepGoing = false;
}
return 0;
}
When this is run with non-optimised, it works fine.
When I build and run this optimised, it will not work.
It looks to be an issue with the way GCC optimises the string class.
Any ideas how we can circumvent this?
If you are overflowing charIndex (when i gets higher than 99), who knows what state your program is in... the storage you declare is not very big (2 chars and a null).
I cannot explain why exactly this code crashes for you when compiled with optimizations, perhaps i gets more than 2 digits and you have a buffer overflow, maybe it's something different, but anyway I would change the code:
sprintf(charIndex, "%d", i++);
string valueKey = "";
valueKey.append("Value").append(charIndex);
string value = inConfig.sFindField(valueKey);
like this:
stringstream ss;
ss << "Value" << i++;
string value(ss.str());
It is more C++-like and should work. Try it.
If you are curious if this is really a buffer overflow situation, insert the line:
assert(i < 99);
before the call to sprintf. Or use snprintf:
snprintf(charIndex, sizeof(charIndex), "%d", i++);
Or make your buffer bigger.
This was an issue with header files being incorrectly included - there was a duplicate include of the MyParser.h file in the list of includes.
This caused some strange scenario around the string optimisation within the GCC optimisation levels.

C++: cin.peek(), cin >> char, cin.get(char)

I've got this code that uses the cin.peek() method. I noticed strange behaviour: when the input to the program looks like qwertyu$[Enter], everything works fine, but when it looks like qwerty[Enter]$, it only works if I type a double dollar sign, qwerty[Enter]$$. On the other hand, when I use cin.get(char), everything works fine too.
#include <iostream>
#include <cstdlib>
using namespace std;
int main()
{
char ch;
int count = 0;
while ( cin.peek() != '$' )
{
cin >> ch; //cin.get(ch);
count++;
}
cout << count << " letter(s)\n";
system("pause");
return 0;
}
//Input:
// qwerty$<Enter> It's ok
//////////////////////////
//qwerty<Enter>
//$ Doesn't work
/////////////////////////////
//qwerty<Enter>
//$$ works(?)
It's because your program won't get input from the console until the user presses the ENTER key (and then it won't see anything typed on the next line until ENTER is pressed again, and so on). This is normal behavior, there's nothing you can do about it. If you want more control, create a UI.
Honestly I don't think the currently accepted answer is that good.
Hmm, looking at it again, I think that since operator>> is a formatted input operation and get() is an unformatted one, the formatted version could be waiting for more input than one character to do some formatting magic.
I presume it is way more complicated than get() if you look at what it can do. I think >> will hang until it is absolutely sure it has read a char according to all the flags set, and then will return. Hence it can wait for more input than just one character. For example, you can specify skipws.
It clearly would need to look at more than one character of input to get a char from \t\t\t test.
I think get() is unaffected by such flags and will just extract a character from a string, which is why it is easier for get() to behave in a non-blocking fashion.
The reason I consider the currently accepted answer wrong is that it states that the program will not get any input until [enter] or some other flush-like thing. In my opinion this is obviously not the case, since the get() version works. Why would it, if it did not get the input?
It probably still can block due to buffering, but I think it far less likely, and it is not the case in your example.