Why can I assign/compare int and char in C - c++

I have the code like this :
#include <stdio.h>
main()
{
int c;
c = getchar();
while (c != EOF) {
putchar(c);
c = getchar();
}
}
The C documentation says that getchar() returns an int value, and in the above program c is declared as an int. Most importantly, EOF is an integer constant defined in the header file.
Now if the code changes to something like this:
#include <stdio.h>
main()
{
char c;
c = getchar();
while (c != EOF) {
putchar(c);
c = getchar();
}
}
This code also works! Wait a minute: as per the C documentation getchar() returns int, but in the above code I'm storing it in a char, and the C compiler doesn't throw any error. Also, in the while loop I compare c, which is a char, with EOF, which is an int, and the compiler doesn't throw any error and my program executes!
Why doesn't the compiler throw any error in the above two cases?
Thanks in advance.

No. It simply means that the returned value which is an int, implicitly converts into char type. That is all.
The compiler may generate warning messages for such conversion, as sizeof(int) is greater than sizeof(char). For example, if you compile your code with -Wconversion option with GCC, it gives these warning messages:
c.c:5:7: warning: conversion to 'char' from 'int' may alter its value
c.c:8:8: warning: conversion to 'char' from 'int' may alter its value
That means, you should use int to avoid such warning messages.

I'm afraid that the term "dynamic programming language" is too vaguely defined to make such a fine distinction in this case.
Though I'd argue that implicitly converting one numeric type to another is not a dynamic language feature, just syntactic sugar.

No. Let's look at Wikipedia's definition:
These behaviors could include extension of the program, by adding new
code, by extending objects and definitions, or by modifying the type
system, all during program execution. These behaviors can be emulated
in nearly any language of sufficient complexity, but dynamic languages
provide direct tools to make use of them.
What you have demonstrated is that a char and int in C/C++ are pretty much the same, and C/C++ automatically casts between the two. Nothing more. There's no modification of the type system here.

Let's rewrite your code to illustrate what's going on
#include <stdio.h>
int main(int argc, char** argv)
{
char c;
c = EOF; /* supposing getchar() returns eof */
return (c == EOF) ? 0 : 1;
}
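What should the return value of this program be? EOF is not a char, but the assignment converts it to one. In the comparison, c is promoted back to int; where plain char is signed, the squashed value converts back to EOF, so the program returns 0. (Where plain char is unsigned, it comes back as a positive value such as 255 and the comparison fails.) Another way of rewriting this to make it clear what's going on is: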
#include <stdio.h>
main()
{
int c;
c = getchar();
while ((char)c != (char)EOF) {
putchar((char)c);
c = getchar();
}
}
EOF is getting squashed; it doesn't matter how it's getting squashed, it could be squashed to the letter 'M', but since it gets squashed the same way every time, you still see it as EOF.

C will let you do what you want, but you have to accept the consequences. If you want to assign an int to a char then you can. Doesn't make it a dynamic language.
(As an aside, the title of this question should be something like "why does c let me assign an int to a char?" and just contain the final paragraph. But presumably that wouldn't attract enough attention. If it had attracted any upvotes then I'd edit the title, but since it hasn't I'll leave it as an example of how not to ask a question.)

If you are not reading 7-bit ASCII data, it is possible that \xFF is valid data. If EOF is \xFFFFFFFF (that is, -1 as a 32-bit int) and you assign the value of getchar() to a char, then you cannot distinguish EOF from \xFF as data.

Related

I have problem understanding the std::cin.peek() and std::cin.get()

I am confused about the usage of std::cin.peek() and std::cin.get(); I mean the versions that return an int.
As I've read in C++ Primer, we should never assign the return value of these functions to a char:
char c = std::cin.get(); // erroneous
But the example on cppreference does exactly that, as do many websites and many programmers, including me until I discovered the reason not to.
https://en.cppreference.com/w/cpp/io/basic_istream/get
And I see also such usage checking for a new line:
while(std::cin.peek() != '\n' )
;// do something
In the above code I think it is OK because in fact there is no assignment from int to char but a comparison in which the newline character '\n' is promoted first to int then compared with int which I think is not evil.
If the code is OK then what is the point in using std::char_traits<char>::to_int_type() function?
So is there no way to assign a value returned from peek() or get() to a char?
I've seen some recommended code like:
char c;
while(std::cin.peek() != std::char_traits<char>::to_int_type('\n')){
std::cin.get(c);
std::cout.put(c);
}
So what is the difference between implicit conversion of '\n' to int and using this trait function?

C++ getting length of char array using a second function

I'm trying to get the length of a character array in a second function. I've looked at a few questions on here (1 2) but they don't answer my particular question (although I'm sure something does, I just can't find it). My code is below, but I get the error "invalid conversion from 'char' to 'const char*'". I don't know how to convert my array to what is needed.
#include <cstring>
#include <iostream>
int ValidInput(char, char);
int main() {
char user_input; // user input character
char character_array[26];
int valid_guess;
valid_guess = ValidGuess(user_input, character_array);
// another function to do stuff with valid_guess output
return 0;
}
int ValidGuess (char user_guess, char previous_guesses) {
for (int index = 0; index < strlen(previous_guesses); index++) {
if (user_guess == previous_guesses[index]) {
return 0; // invalid guess
}
}
return 1; // valid guess, reaches this if for loop is complete
}
Based on what I've done so far, I feel like I'm going to have a problem with previous_guesses[index] as well.
char user_input;
defines a single character
char character_array[26];
defines an array of 26 characters.
valid_guess = ValidGuess(user_input, character_array);
calls the function
int ValidGuess (char user_guess, char previous_guesses)
where char user_guess accepts a single character, lining up correctly with the user_input argument, and char previous_guesses accepts a single character, not the 26 characters of character_array. previous_guesses needs a different type to accommodate character_array. This is the cause of the reported error.
Where this gets tricky is character_array will decay to a pointer, so
int ValidGuess (char user_guess, char previous_guesses)
could be changed to
int ValidGuess (char user_guess, char * previous_guesses)
or
int ValidGuess (char user_guess, char previous_guesses[])
both ultimately mean the same thing.
Now for where things get REALLY tricky. When an array decays to a pointer it loses how big it is. The asker has gotten around this problem, kudos, with strlen, which computes the length, but this needs a bit of extra help. strlen zips through an array, counting until it finds a null terminator, and there are no signs of character_array being null terminated. This is bad. Without knowing where to stop, strlen will probably keep going[1]. A quick solution to this is to go back up to the definition of character_array and change it to
char character_array[26] = {};
to force all of the slots in the array to 0, which just happens to be the null character.
That gets the program back on its feet, but it could be better. Every call to strlen may recount the characters in the string (compilers are smart and could compute the length once and store it if they can prove the contents won't change). Even then, that is at least one scan through the entries of character_array looking for the null, when what you really want to do is scan for user_input. Basically the program looks at every item in the array twice.
Instead, look for both the null terminator and user_input in the same loop.
int index = 0;
while (previous_guesses[index] != '\0' ) {
if (user_guess == previous_guesses[index]) {
return 0; // prefer returning false here. The intent is clearer
}
index++;
}
You can also wow your friends by using pointers and eliminating the need for the index variable.
while (*previous_guesses != '\0' ) {
if (user_guess == *previous_guesses) {
return false;
}
previous_guesses++;
}
The compiler knows and uses this trick too, so use the one that's easier for you to understand.
For 26 entries it probably doesn't matter, but if you really want to get fancy, or have a lot more than 26 possibilities, use a std::set or a std::unordered_set. They allow only one of an item and have much faster look-up than scanning a list one by one, so long as the list is large enough to get over the added complexity of a set and take advantage of its smarter logic. ValidGuess is replaced with something like
if (used.find(user_input) != used.end())
Side note: Don't forget to make the user read a value into user_input before the program uses it. I've also left out how to store the previous inputs because the question does as well.
[1]: I say probably because the Standard doesn't say what to do. This is called Undefined Behaviour. C++ is littered with the stuff. Undefined Behaviour can do anything -- work, not work, visibly not work, look like it works until it doesn't, melt your computer, anything -- but what it usually does is the easiest and fastest thing. In this case that's just keep going until the program crashes or finds a null.

C++ toupper Syntax

I've just been introduced to toupper, and I'm a little confused by the syntax; it seems like it's repeating itself. What I've been using it for is for every character of a string, it converts the character into an uppercase character if possible.
for (int i = 0; i < string.length(); i++)
{
if (isalpha(string[i]))
{
if (islower(string[i]))
{
string[i] = toupper(string[i]);
}
}
}
Why do you have to list string[i] twice? Shouldn't this work?
toupper(string[i]); (I tried it, so I know it doesn't.)
toupper is a function that takes its argument by value. It could have been defined to take a reference to character and modify it in-place, but that would have made it more awkward to write code that just examines the upper-case variant of a character, as in this example:
// compare chars case-insensitively without modifying anything
if (std::toupper(*s1++) == std::toupper(*s2++))
...
In other words, toupper(c) doesn't change c for the same reasons that sin(x) doesn't change x.
To avoid repeating expressions like string[i] on the left and right side of the assignment, take a reference to a character and use it to read and write to the string:
for (size_t i = 0; i < string.length(); i++) {
char& c = string[i]; // reference to character inside string
c = std::toupper(c);
}
Using range-based for, the above can be written more briefly (and executed more efficiently) as:
for (auto& c: string)
c = std::toupper(c);
As the documentation says, the character is passed by value.
Because of that, the answer is no, it shouldn't.
The prototype of toupper is:
int toupper( int ch );
As you can see, the character is passed by value, transformed and returned by value.
If you don't assign the returned value to a variable, it is lost.
That's why your example reassigns it: to replace the original character.
As many of the other answers already say, the argument to std::toupper is passed and the result returned by-value which makes sense because otherwise, you wouldn't be able to call, say std::toupper('a'). You cannot modify the literal 'a' in-place. It is also likely that you have your input in a read-only buffer and want to store the uppercase-output in another buffer. So the by-value approach is much more flexible.
What is redundant, on the other hand, is your checking for isalpha and islower. If the character is not a lower-case alphabetic character, toupper will leave it alone anyway so the logic reduces to this.
#include <cctype>
#include <iostream>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
for (auto s = text; *s != '\0'; ++s)
*s = std::toupper(*s);
std::cout << text << '\n';
}
You could further eliminate the raw loop by using an algorithm, if you find this prettier.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
std::transform(std::cbegin(text), std::cend(text), std::begin(text),
[](auto c){ return std::toupper(c); });
std::cout << text << '\n';
}
toupper takes an int by value and returns the int value of the uppercase version of that character. Whenever a function doesn't take a pointer or reference as a parameter, the argument is passed by value: the parameter is a copy of the variable passed to the function, so changes to it cannot be seen from outside. The way to catch the change is to save what the function returns, in this case the upper-cased character.
Note that there is a nasty gotcha in isalpha(), which is the following: the function only works correctly for inputs in the range 0-255 + EOF.
So what, you think.
Well, if your char type happens to be signed, and you pass a value greater than 127, this is considered a negative value, and thus the int passed to isalpha will also be negative (and thus outside the range of 0-255 + EOF).
In Visual Studio, this will crash your application. I have complained about this to Microsoft, on the grounds that a character classification function that is not safe for all inputs is basically pointless, but received an answer stating that this was entirely standards conforming and I should just write better code. Ok, fair enough, but nowhere else in the standard does anyone care about whether char is signed or unsigned. Only in the isxxx functions does it serve as a landmine that could easily make it through testing without anyone noticing.
The following code crashes Visual Studio 2015 (and, as far as I know, all earlier versions):
int x = toupper ('é');
So not only is the isalpha() in your code redundant, it is in fact actively harmful, as it will cause any strings that contain characters with values greater than 127 to crash your application.
See http://en.cppreference.com/w/cpp/string/byte/isalpha: "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."

How does this C++ code work?

enum STR2INT_ERROR { SUCCESS, OVERFLOW, UNDERFLOW, INCONVERTIBLE };
STR2INT_ERROR str2int (int &i, char const *s, int base = 0)
{
char *end;
long l;
errno = 0;
l = strtol(s, &end, base);
if ((errno == ERANGE && l == LONG_MAX) || l > INT_MAX) {
return OVERFLOW;
}
if ((errno == ERANGE && l == LONG_MIN) || l < INT_MIN) {
return UNDERFLOW;
}
if (*s == '\0' || *end != '\0') {
return INCONVERTIBLE;
}
i = l;
return SUCCESS;
}
I'm trying to write a program that can parse strings read in from a file into integer values. While looking for a method to do this I found this piece of code above on a stackoverflow post:
How to parse a string to an int in C++?
However, I can't understand how it works.
Specifically, why is the programmer checking if errno == ERANGE if errno is assigned to 0? (is ERANGE a special value? )
secondly, what does "char const *s" - in the arguments list- mean?
PS: I'm not very experienced when it comes to C++ programming.
The code is using strtol() to do the parsing. This is a standard C library function. You can find documentation on strtol() here amongst other places:
strtol() man page on die.net
The errno variable is a special global variable defined by the standard C library. If a function encounters an error it is set to an error code. So while errno is assigned zero at the start of the routine, the strtol() function will assign a new value to errno if it encounters an error. The following if-statements are checking for the overflow and underflow error conditions.
The char const *s parameter is the string to be parsed. It's a pointer to a constant (read-only) string of characters. By convention strings are terminated by a null byte.
Whenever I have done string to int conversions in C++ I used the atoi function. There should be plenty of examples online that suit what you want to do.
Most of the specialness here is with errno, not the values being compared to.
errno is a global that's used by some (especially older) library functions to signal errors. You assign 0 to it (which implicitly means there's no problem). Then, if it runs into a problem, a library function can assign some non-zero value to it to tell you what went wrong.
After calling the library function, you then typically check 1) whether it's now non-zero, and 2) if so, what value it has. Based on the value that's been assigned, you can react to the type of error that arose.
I should add, however, that many uses of errno are mostly non-portable. The C standard says that errno exists and that no library function assigns 0 to errno, but not a lot more than that. It does not specify what non-zero values any particular function may assign to it (well, it specifies some non-zero values that some functions assign, but doesn't limit assignments to those values or those functions).
First of all, this is clearly a C program in C++ disguise.
strtol is a function from the standard C library, which does the actual work. Its documentation may be accessed here: http://linux.die.net/man/3/strtol
All other things are just preliminaries and checks.
errno is a special global variable from the C library which may be modified by standard functions in order to set an appropriate error code (yes, it's C legacy and this is not thread-safe). Its value may be set to values defined in standard header "errno.h".
errno is a library-provided global variable that strtol (as well as other library functions) uses to indicate error conditions. In the above code strtol could change errno after the user set it to 0. ERANGE is indeed a named constant provided by the standard library, which stands for some special value used by strtol to indicate out-of-range errors.
Your char const *s question is too vague. What specifically do you not understand in it? The const part means that the user code inside str2int will not be allowed to modify the string pointed by s. The compiler will do its best to prevent any modifying (or potentially modifying) operations on string pointed by s.

printf with std::string?

My understanding is that string is a member of the std namespace, so why does the following occur?
#include <iostream>
int main()
{
using namespace std;
string myString = "Press ENTER to quit program!";
cout << "Come up and C++ me some time." << endl;
printf("Follow this command: %s", myString);
cin.get();
return 0;
}
Each time the program runs, myString prints a seemingly random string of 3 characters, such as in the output above.
C++23 Update
We now finally have std::print as a way to use std::format for output directly:
#include <print>
#include <string>
int main() {
// ...
std::print("Follow this command: {}", myString);
// ...
}
This combines the best of both approaches.
Original Answer
It's compiling because printf isn't type safe, since it uses variable arguments in the C sense[1]. printf has no option for std::string, only a C-style string. Using something else in place of what it expects definitely won't give you the results you want. It's actually undefined behaviour, so anything at all could happen.
The easiest way to fix this, since you're using C++, is printing it normally with std::cout, since std::string supports that through operator overloading:
std::cout << "Follow this command: " << myString;
If, for some reason, you need to extract the C-style string, you can use the c_str() method of std::string to get a const char * that is null-terminated. Using your example:
#include <iostream>
#include <string>
#include <stdio.h>
int main()
{
using namespace std;
string myString = "Press ENTER to quit program!";
cout << "Come up and C++ me some time." << endl;
printf("Follow this command: %s", myString.c_str()); //note the use of c_str
cin.get();
return 0;
}
If you want a function that is like printf, but type safe, look into variadic templates (C++11, supported on all major compilers as of MSVC12). You can find an example of one here. There's nothing I know of implemented like that in the standard library, but there might be in Boost, specifically boost::format.
[1]: This means that you can pass any number of arguments, but the function relies on you to tell it the number and types of those arguments. In the case of printf, that means a string with encoded type information like %d meaning int. If you lie about the type or number, the function has no standard way of knowing, although some compilers have the ability to check and give warnings when you lie.
Please don't use printf("%s", your_string.c_str());
Use cout << your_string; instead. Short, simple and typesafe. In fact, when you're writing C++, you generally want to avoid printf entirely -- it's a leftover from C that's rarely needed or useful in C++.
As to why you should use cout instead of printf, the reasons are numerous. Here's a sampling of a few of the most obvious:
As the question shows, printf isn't type-safe. If the type you pass differs from that given in the conversion specifier, printf will try to use whatever it finds on the stack as if it were the specified type, giving undefined behavior. Some compilers can warn about this under some circumstances, but some compilers can't/won't at all, and none can under all circumstances.
printf isn't extensible. You can only pass primitive types to it. The set of conversion specifiers it understands is hard-coded in its implementation, and there's no way for you to add more/others. Most well-written C++ should use these types primarily to implement types oriented toward the problem being solved.
It makes decent formatting much more difficult. For an obvious example, when you're printing numbers for people to read, you typically want to insert thousands separators every few digits. The exact number of digits and the characters used as separators varies, but cout has that covered as well. For example:
std::locale loc("");
std::cout.imbue(loc);
std::cout << 123456.78;
The nameless locale (the "") picks a locale based on the user's configuration. Therefore, on my machine (configured for US English) this prints out as 123,456.78. For somebody who has their computer configured for (say) Germany, it would print out something like 123.456,78. For somebody with it configured for India, it would print out as 1,23,456.78 (and of course there are many others). With printf I get exactly one result: 123456.78. It is consistent, but it's consistently wrong for everybody everywhere. Essentially the only way to work around it is to do the formatting separately, then pass the result as a string to printf, because printf itself simply will not do the job correctly.
Although they're quite compact, printf format strings can be quite unreadable. Even among C programmers who use printf virtually every day, I'd guess at least 99% would need to look things up to be sure what the # in %#x means, and how that differs from what the # in %#f means (and yes, they mean entirely different things).
use myString.c_str() if you want a c-like string (const char*) to use with printf
thanks
Use std::printf and c_str()
example:
std::printf("Follow this command: %s", myString.c_str());
You can use snprintf to determine the number of characters needed and allocate a buffer of the right size.
int length = std::snprintf(nullptr, 0, "There can only be %i\n", 1 );
char* str = new char[length+1]; // one more character for null terminator
std::snprintf( str, length + 1, "There can only be %i\n", 1 );
std::string cppstr( str );
delete[] str;
This is a minor adaptation of an example on cppreference.com.
printf accepts a variable number of arguments. Those can only have Plain Old Data (POD) types. Code that passes anything other than POD to printf only compiles because the compiler assumes you got your format right. %s means that the respective argument is supposed to be a pointer to a char. In your case it is a std::string, not a const char*. printf does not know that, because the argument type is lost and is supposed to be restored from the format parameter. When that std::string argument is interpreted as a const char*, the resulting pointer will point to some irrelevant region of memory instead of your desired C string. For that reason your code prints out gibberish.
While printf is an excellent choice for printing out formatted text, (especially if you intend to have padding), it can be dangerous if you haven't enabled compiler warnings. Always enable warnings because then mistakes like this are easily avoidable. There is no reason to use the clumsy std::cout mechanism if the printf family can do the same task in a much faster and prettier way. Just make sure you have enabled all warnings (-Wall -Wextra) and you will be good. In case you use your own custom printf implementation you should declare it with the __attribute__ mechanism that enables the compiler to check the format string against the parameters provided.
The main reason is probably that a C++ string is a struct that includes a current-length value, not just the address of a sequence of chars terminated by a 0 byte. Printf and its relatives expect to find such a sequence, not a struct, and therefore get confused by C++ strings.
Speaking for myself, I believe that printf has a place that can't easily be filled by C++ syntactic features, just as table structures in html have a place that can't easily be filled by divs. As Dijkstra wrote later about the goto, he didn't intend to start a religion and was really only arguing against using it as a kludge to make up for poorly-designed code.
It would be quite nice if the GNU project would add the printf family to their g++ extensions.
Printf is actually pretty good to use if size matters. Meaning if you are running a program where memory is an issue, then printf is actually a very good and underrated solution. Cout essentially shifts bits over to make room for the string, while printf just takes in some sort of parameters and prints it to the screen. If you were to compile a simple hello world program, the printf version would come to less than 60,000 bits, whereas the cout version would take over 1 million bits.
For your situation, I'd suggest using cout simply because it is much more convenient to use. Although, I would argue that printf is something good to know.
Here’s a generic way of doing it.
#include <string>
#include <stdio.h>
auto print_helper(auto const & t){
return t;
}
auto print_helper(std::string const & s){
return s.c_str();
}
std::string four(){
return "four";
}
template<class ... Args>
void print(char const * fmt, Args&& ...args){
printf(fmt, print_helper(args) ...);
}
int main(){
std::string one {"one"};
char const * three = "three";
print("%c %d %s %s, %s five", 'c', 3+4, one + " two", three, four());
}