Char data type in C/C++

I am trying to call a C++ DLL from Java. In its C++ header file, there are the following lines:
#define a '102001'
#define b '102002'
#define c '202001'
#define d '202002'
What data type do a, b, c, and d have? Are they char or char arrays? And what is the corresponding data type in Java that I should convert them to?

As Mysticial pointed out, these are multicharacter literals. Their value is implementation-defined, and because each one encodes six 8-bit characters (48 bits), it won't fit in a 32-bit Java int; you need a Java long.
In Java, you need to convert them to long manually:
static long toMulticharConst(String s) {
    long res = 0;
    for (char c : s.toCharArray()) {
        res <<= 8;                // make room for the next character
        res |= ((long) c) & 0xFF; // append its low byte
    }
    return res;
}
final long a = toMulticharConst("102001");
final long b = toMulticharConst("102002");
final long c = toMulticharConst("202001");
final long d = toMulticharConst("202002");

I might try to answer the first two questions. Not being familiar with Java, I have to leave the last question to others.
Single and double quotes mean very different things in C. A character enclosed in single quotes is just the same as the integer representing it in the collating sequence(e.g. in ASCII implementation, 'a' means exactly the same as 97).
However, a string enclosed in double quotes is a short-hand way of writing a pointer to the initial character of a nameless array that has been initialized with the characters between the quotes and an extra character whose binary value is 0.
Because an integer is large enough to hold several characters, some C compilers allow multiple characters in a character constant as well as in a string constant, which means that writing 'abc' instead of "abc" may well go undetected. Yet "abc" means a pointer to an array containing four characters (a, b, c, and \0), while the meaning of 'abc' is platform-dependent. Many C compilers take it to mean "an integer that is composed somehow of the values of the characters a, b, and c."
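To make the distinction concrete, here is a small sketch (not from the original post): the integer value of 'abc' is implementation-defined, the values in the comments are what GCC and Clang typically produce on a system with 32-bit int and ASCII, and most compilers warn about multicharacter literals.
#include <cstdio>

int main() {
    // "abc" is an array of 4 chars: 'a', 'b', 'c' and '\0'.
    printf("sizeof \"abc\" = %zu\n", sizeof "abc"); // 4 (four chars)
    // 'abc' is a multicharacter constant: type int, implementation-defined value.
    printf("sizeof 'abc' = %zu\n", sizeof 'abc');   // typically 4 (sizeof(int))
    printf("'abc' = %d\n", 'abc');                  // e.g. 6382179 (0x616263)
    return 0;
}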
For more information, you might read Section 1.4 of the book "C Traps and Pitfalls".


Characters in C and C++

What I am asking is: is my understanding below completely correct?
char x = 'b'; (x has one value, which can be represented in two ways: one as an int and the other as a char, in the two languages (C and C++))
In C:
If I let the compiler choose, it will choose (give priority to) the int value, because the C standard makes a character literal an int (giving priority to int, not char), as in this code:
#include <stdio.h>

int main() {
    printf("%d", sizeof('b'));
    return 0;
}
The output is 4 (the size of int), meaning the compiler treats 'b' as the int 98 first and takes its size as an int.
But I can use the char value if I choose, as in this code:
#include <stdio.h>

int main() {
    char x = 'b';
    printf("%c", x);
    return 0;
}
The output is b (a char).
In C++:
If I let the compiler choose, it will choose (give priority to) the char value, because the C++ standard makes a character literal a char (giving priority to char, not int), as in this code:
#include <stdio.h>

int main() {
    printf("%d", sizeof('b'));
    return 0;
}
The output is 1 (the size of char), meaning the compiler treats 'b' as a char.
But I can use the int value if I choose, as in this code:
#include <stdio.h>

int main() {
    char x = 'b';
    printf("%d", x);
    return 0;
}
The output is 98 (98 is the int that represents b in ASCII).
Character constants like 'A' are of type int in C and of type char in C++. The compiler picks the type specified by the respective language standard.
Declared character variables of type char are always char and 1 byte large (typically 8 bits on non-exotic systems).
printf("%c", some_char); is a variadic function (accepts any number of parameters) and those have special implicit type promotion rules. Something called default argument promotion will integer-promote the passed character variable to int. Read about integer promotion here: Implicit type promotion rules.
printf expects that promotion to happen, so %c means the parameter is converted back to char inside printf. This holds true in C++ as well, though stdio.h should be avoided there.
printf("%d", x); In case x is a char this would pedantically be undefined behavior. But in practice the above mentioned integer promotion is likely to occur and so it prints an integer. Also note that there's nothing magic with char as such, they are just 1 byte large integers. So it already has the value 98 before conversion.
Some of the things you've said are true, some are mistaken. Let's go through them in turn.
char x='b'; (x has two values, one of them is int and the other is char, in the two languages (C and C++))
Well, no, not really. When you say char x you get a variable generally capable of holding one character, and this is perfectly, equally true in both C and C++. The only difference, as we'll see, is the type (not the value) of the character constant 'b'.
in C if I let the compiler choose
I'm not sure what you mean by "choose", because there's not really any choice in the matter.
it will choose (will give priority to) the int value, like in this code
printf("%d", sizeof('b'));
output is: 4 (int size), meaning the compiler changes b to 98 first and takes its size as an int
Not exactly. You got 4 because, yes, the type of a character constant like 'b' is int. In ASCII, the character constant 'b' has the value 98 no matter what. The compiler didn't change a character to an int here, and the value didn't change from 'b' to 98 here. It was an int all along (because that's the type of character constants in C), and it had the value 98 all along (because that's the value of the letter b in ASCII).
but I can use the char value if I choose like in this code
printf("%c", x);
Right. But there's nothing magical about that. Consider:
char c1 = 'b';
int i1 = 'b';
char c2 = 98;
int i2 = 98;
printf("%c %c %c %c\n", c1, i1, c2, i2);
printf("%d %d %d %d\n", c1, i1, c2, i2);
This prints
b b b b
98 98 98 98
You can print an int, or a char, using %c, and you'll get the character with that value.
You can print an int, or a char, using %d, and you'll get a numeric value.
(More on this later.)
in C++, if I let the compiler choose, it will choose (will give priority to) the char value, like in this code
printf("%d", sizeof('b'));
output is: 1 (char size), meaning the compiler did not change b to an int
What you're seeing is one of the big differences between C and C++: character constants like 'b' are type int in C, but type char in C++.
(Why is it this way? I'll speculate on that later.)
but I can use the int value if I choose like in this code
char x = 'b';
printf("%d", x);
output is 98 (98 is the int that represents b in ASCII)
Right. And this would work exactly the same in C and C++. (Also my earlier printfs of c1, i1, c2, and i2 would work exactly the same in C and C++.)
So why are the types of character constants different? I'm not sure, but I believe it's like this:
C likes to promote everything to int. (Incidentally, that's why we were able to pass characters straight to printf and print them using %d: the characters all get promoted to int before being passed to printf, so it works just fine.) So there would be no point having character constants of type char, because any time you used one, for anything, it would get promoted to int. So character constants might as well start out being int.
In C++, on the other hand, the type of things matters more. And you might have two overloaded functions, f(int) and f(char), and if you called f('b'), clearly you want the version of f() called that accepts a char. So in C++ there was a reason, a good reason, to have character constants be type char, just like it looks like they are.
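A minimal sketch of that overload-resolution point (the function name f is purely illustrative, not from the original answer):
#include <cstdio>

void f(int)  { std::puts("f(int)"); }
void f(char) { std::puts("f(char)"); }

int main() {
    f('b'); // in C++, 'b' has type char, so this calls f(char)
    f(98);  // calls f(int)
    return 0;
}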
Addendum:
The fundamental issue that you're asking about here, and that we've been kind of dancing around in this answer and these comments, is that in C (as in most languages) there are several forms of constant, which let you write constants in forms that are convenient and meaningful to you. For any form of constant you can write, there are several things of interest, but most importantly: what is the actual value, and what is the type?
It may be easier to show this by example. Here is a rather large number of ways of representing the constant value 98 in a C or C++ program:
Form of constant    Base    Type        Value
98                  10      int         98
0142                 8      int         98
0x62                16      int         98
98.                 10      double      98
9.8e1               10      double      98
98.f                10      float       98
9.8e1f              10      float       98
98L                 10      long        98
98U                 10      unsigned    98
'b'                 ASCII   int/char    98
This table is not even complete; there are more ways than these to write constants in C and C++.
Every row in this table is equally true of both C and C++, except the last one. In C, the type of a character constant is int. In C++, the type of a character constant is char.
The type of a constant determines what happens when a constant appears in a larger expression. In that respect the type of a constant functions analogously to the type of a variable. If I write
int a = 1, b = 3;
double c = a / b;
it doesn't work right, because the rule in C is that when you divide an int by an int, you get truncating integer division. But the point is that the type of an operand directly determines the meaning of an expression. So the type of a constant becomes very interesting, too, as seen by the different behavior of these two lines:
double c2 = 1 / 3;
double c3 = 1. / 3;
Similarly, it can make a difference whether a constant has type int or type char. An expression that depends on whether a character constant has type int or type char will behave slightly differently in C versus C++. (In practice, pretty much the only difference that can be easily seen concerns sizeof.)
For completeness, it may be useful to look at the several other forms of character constant, in the same framework:
Form of constant    Base    Type        Value
'b'                 ASCII   int/char    98
'\142'               8      int/char    98
'\x62'              16      int/char    98

C++ Turning Character types into int type

So I read and was taught that subtracting '0' from a given character turns it into an int. However, Visual Studio isn't accepting that here; it says a value of type "const char*" cannot be used to initialize an entity of type int:
bigint::bigint(const char* number) : bigint() {
    int number1 = number - '0'; // error code
    for (int i = 0; number1 != 0; ++i)
    {
        digits[i] = number1 % 10;
        number1 /= 10;
        digits[i] = number1;
    }
}
The goal of the first half is simply to turn the given number into an int. The second half outputs that number backwards with no leading zeroes. Please note this function is a part of the class declared in a header file:
class bigint {
public:
    static const int MAX_DIGITS = 50;
private:
    int digits[MAX_DIGITS];
public:
    // constructors
    bigint();
    bigint(int number);
    bigint(const char* number);
};
Is there any way to convert the char parameter to an int so I can then output an int, without using the standard library or strlen? I know there is a way to use the '0' char, but I can't seem to get it right.
You can turn a single character in the range '0'..'9' into a single digit 0..9 by subtracting '0', but you cannot turn a string of characters into a number by subtracting '0'. You need a parsing function like std::stoi() to do the conversion work character-by-character.
But that's not what you need here. If you convert the string to a number, you then have to take the number apart. The string is already in pieces, so:
bigint::bigint(const char* number) : bigint() {
    int i = 0;
    while (*number) // keep looping until we hit the string's null terminator
    {
        digits[i] = *number - '0'; // store the digit for the current character
        ++number;                  // advance to the next character
        ++i;                       // move to the next digit slot
    }
}
There could be some extra work involved in a more advanced version, such as sizing digits appropriately to fit the number of digits in number. Currently we have no way to know how many slots are actually in use in digits, and this will lead to problems later when the program has to figure out where to stop reading digits.
I don't know what your understanding is, so I will go over everything I see in the code snippet.
First, what you're passing to the function is a pointer to char, with the const keyword making the char immutable, or "read only" if you prefer.
A char is actually an 8-bit [1] integer. It can store a numerical value in binary form, which can also be interpreted as a character.
Fundamental types - cppreference.com
The standard also expects char to be "a type for character representation". That representation could be ASCII, but it could be something else, like EBCDIC. For future reference, remember that ASCII is not guaranteed, although you're unlikely ever to use a system where it isn't ASCII. And it's not so much that char somehow enforces an encoding; rather, the functions you pass those chars and char pointers to interpret their content as characters in ASCII, while on some obscure or legacy platforms they could interpret them in some less common encoding. The standard does, however, demand one property of the encoding: the codes for the characters '0' to '9' are contiguous, and thus '9' - '0' means: subtract the code of '0' from the code of '9'. The result is 9, because the code for '9' is 9 positions past the code for '0'. In ASCII the ranges 'a'-'z' and 'A'-'Z' are contiguous as well, in case you need that (though the standard does not guarantee it), but things get a little trickier if your input is in a base higher than 10, like the popular base 16, hexadecimal.
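A tiny illustration of that '0'-to-'9' guarantee (a minimal sketch, not from the original answer; the behavior is identical in C and C++):
#include <cstdio>

int main() {
    char ch = '7';
    // '0'..'9' are contiguous in every encoding the standard allows,
    // so subtracting '0' yields the numeric value of the digit.
    int digit = ch - '0';
    printf("%d\n", digit); // prints 7
    return 0;
}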
A pointer stores an address, so the most basic functionality for it is to "point" to a variable. But it can be used in various ways, one of which, very frequent in C, is to store the address of the beginning of an array of variables of the same type. Those could be chars. We could interpret such an array as a line of text, or a string (the concept, not to be confused with the C++ string class).
Since a pointer does not contain information about the length or end of such an array, we need to get that information across to the function we pass the pointer to. Sometimes we can just provide the length; sometimes we provide an end pointer. When dealing with "lines of text", or C-style strings, we use (and C standard library functions expect) what is called a null-terminated string. In such a string, the first char after the last one used for the text is a null character, which is, to simplify, basically a 0. A 0, but not a '0'.
So what you're passing to the function, and what you interpret as, say, 416, is actually a pointer to a place in memory where '4' is encoded and stored as a number, followed by '1' and then '6', taking up three bytes. And depending on how you obtained this line of text, '6' is probably followed by a null, that is, a zero.
NULL - cppreference.com
Conversion of such a string to a number first requires a data type able to hold it. In the case of 416 it could be anything from short upwards. If you wanted to do the conversion on your own, you would need to iterate over the entire line of text and add the digits multiplied by the proper powers of 10, take care of signedness too, and maybe check for edge cases. You could, however, use a standard function like int atoi(const char* str);
atoi - cplusplus.com
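A minimal sketch of that manual approach (the helper name parse_decimal is hypothetical; no sign handling or overflow checks), built on the multiply-by-10 idea just described:
#include <cstdio>

// Hypothetical helper: parse an unsigned decimal C string into an int.
int parse_decimal(const char* s) {
    int value = 0;
    for (; *s >= '0' && *s <= '9'; ++s)
        value = value * 10 + (*s - '0'); // shift left one decimal place, add the digit
    return value;
}

int main() {
    printf("%d\n", parse_decimal("416")); // prints 416
    return 0;
}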
Now, that would be nice of course, but you're trying to work with "bigints". However you define them, your class's purpose is to deal with numbers too big to be stored in built-in types. So there is no way to convert them just like that.
What you're trying to write right now seems to be a constructor that creates a bigint out of a number represented as a C-style string. How shall I put it... you want to store your bigint internally as an array of its digits in base 10 (a good choice for code simplicity, readability, and maintainability, as well as for interoperating with base-10 textual representation, though it doesn't make efficient use of memory and processing power), and your input is also an array of digits in base 10, except that internally you store numbers as numbers, while your input is encoded characters. You need to (a sketch follows the list below):
- Sanitize the input. You need criteria for what kind of input is acceptable, e.g. whether there can be leading or trailing whitespace, whether the number can be followed by non-numerical characters that should be discarded, how to represent signedness, whether + for positive numbers is optional or forbidden, etc. Throw an exception if the input is invalid.
- Convert whatever standard you enforce for your input into whatever uniform standard you employ internally, e.g. strip leading whitespace, remove the + sign if it's optional and you don't use it internally, etc.
- Once you know which positions in your internal array correspond to which positions in the input string, iterate over it and copy every digit, decoding it from ASCII first.
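Here is the sketch promised above (an illustration, not the one true implementation). It assumes the bigint class from the question, stores the most significant digit first, and, since the question rules out the standard library, simply stops at the first non-digit instead of throwing:
bigint::bigint(const char* number) : bigint() {
    while (*number == ' ') ++number; // sanitize: skip leading spaces
    if (*number == '+') ++number;    // normalize: accept an optional '+'
    int i = 0;
    // copy digits until a non-digit (or the null terminator) ends the number
    while (*number >= '0' && *number <= '9' && i < MAX_DIGITS) {
        digits[i++] = *number++ - '0'; // decode each digit from its character code
    }
}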
A side note: I can't be sure what exactly you expect your input to be, because it's only likely that it is a textual representation; it could just as easily be an array of unencoded chars. Of course it's obviously the former, which I know from your post, but the function prototype (the line with the return type and argument types) does not assure anyone of that. Just another thing to be aware of.
Hope this answer helped you understand what is happening there.
PS. I cannot emphasize strongly enough that the biggest problem with your code is that even if this line worked:
int number1 = number - '0'; // error code
you'd be trying to store a number on the order of 10^50 into a variable capable of holding values on the order of 10^9.
The crucial part of this problem, which I have a vague feeling you may have found on spoj.com, is that you're handling BIGints: integers too big to be stored in a trivial manner.
[1] The standard does not actually require char to be this size directly, but indirectly it requires it to be at least 8 bits, possibly more on obscure platforms. And yes, I think there were some platforms where it was indeed over 8 bits. The same goes for pointers, which may behave strangely on obscure architectures.

Why is parameter to isdigit integer?

The function std::isdigit is:
int isdigit(int ch);
The return value (non-zero if the character is a numeric character, zero otherwise) smells like the function was inherited from C, but even that does not explain why the parameter type is int rather than char, while at the same time...
The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF.
Is there any technical reason why isdigit takes an int and not a char?
The reason is to allow EOF as input. And EOF is (from here):
EOF integer constant expression of type int and negative value
The accepted answer is correct, but I believe the question deserves more detail.
A char in C++ is either signed or unsigned depending on your implementation (and yet it's a distinct type from both signed char and unsigned char).
Where C grew up, char was typically unsigned and assumed to be an n-bit byte that could represent [0..2^n-1]. (Yes, there were some machines with byte sizes other than 8 bits.) In fact, chars were considered virtually indistinguishable from bytes, which is why functions like memcpy take char * rather than something like uint8_t *, why sizeof(char) is always 1, and why CHAR_BIT isn't named BYTE_BIT.
But the C standard, which was the baseline for C++, only promised that char could hold any value in the execution character set. They might hold additional values, but there was no guarantee. The source character set (basically 7-bit ASCII minus some control characters) required something like 97 values. For a while, the execution character set could be smaller, but in practice it almost never was. Eventually there was an explicit requirement that a char be large enough to hold an 8-bit byte.
But the range was still uncertain. If unsigned, you could rely on [0..255]. Signed chars, however, could, in theory, use a sign+magnitude representation that would give a range of [-127..127]. Note that's only 255 unique values, not the 256 values ([-128..127]) you'd get from two's complement. If you were language-lawyerly enough, you could argue that you cannot store every possible value of an 8-bit byte in a char, even though that was a fundamental assumption throughout the design of the language and its run-time library. C++20 finally closed that apparent loophole by requiring, in effect, that signed integer types use two's complement.
When it came time to design fundamental input/output functions, they had to think about how to return a value or a signal that you've reached the end of the file. It was decided to use a special value rather than an out-of-band signaling mechanism. But what value to use? The Unix folks generally had [128..255] available and others had [-128..-1].
But that's only if you're working with text. The Unix/C folks thought of textual characters and binary byte values as the same thing. So getc() was also for reading bytes from a binary file. All 256 possible values of a char, regardless of its signedness, were already claimed.
K&R C (before the first ANSI standard) didn't require function prototypes. The compiler made assumptions about parameter and return types. This is why C and C++ have the "default promotions," even though they're less important now than they once were. In effect, you couldn't return anything smaller than an int from a function. If you did, it would just be converted to int anyway.
The natural solution was therefore to have getc() return an int containing either the character value or a special end-of-file value, imaginatively dubbed EOF, a macro for -1.
The default promotions not only mandated that a function couldn't return an integral type smaller than int; they also made it difficult to pass in a small type. So int was also the natural parameter type for functions that expected a character. And thus we ended up with function signatures like int isdigit(int ch).
If you're a Posix fan, this is basically all you need.
For the rest of us, there's a remaining gotcha: If your chars are signed, then -1 might represent a legitimate character in your execution character set. How can you distinguish between them?
The answer is that functions don't really traffic in char values at all. They're really using unsigned char values dressed up as ints.
int x = getc(source_file);
if (x == EOF) { /* reached end of file */ }
else if (0 <= x && x < 128) { /* plain 7-bit character */ }
else if (128 <= x && x < 256) {
    // Here it gets interesting.
    bool b1 = isdigit(x); // OK
    bool b2 = isdigit(static_cast<char>(x)); // NOT PORTABLE
    bool b3 = isdigit(static_cast<unsigned char>(x)); // CORRECT!
}

What is union doing to help translate a byte array into a different type such as a word in this c++ code?

I am looking at some C++ code and I want to find out what the union is doing to help translate a byte array into, well, a different type, such as a word. At least that is what I think is going on. What I truly want is to figure out the purpose of this code, but I think I understand some of it.
My research has brought me bits and pieces of understanding, but I am not confident that I see the big picture correctly.
So let's say I have a union defined as:
typedef union _BYTE_TO_WORD {
    BYTE b[2];
    WORD w;
    short s;
} BYTE_TO_WORD;
Note that BYTE here is 8 bits, WORD is an unsigned short, and both shorts (signed and unsigned) are 16 bits.
Then what happens if in the main code I have a struct:

BYTE data[] = /* someData */;
struct TWO_WORDS {
    _BYTE_TO_WORD word1;
    _BYTE_TO_WORD word2;
} *theWordsIWant = (struct TWO_WORDS*)&data;
I think the code above takes two bytes of data and puts them into word1, and then the next two bytes are put into word2. With all the information about unions and structs out there, I can't seem to pin down a search that explains this code. If I am wrong here, please tell me.
So if I am right about that, which member of word1 or word2 has the value? My research says that word1 would have a byte array in it, since it can only hold one value.
The translation must be another part of the code (that I haven't found yet) where we do this (assuming I could cast the byte to a WORD):
theWordsIWant.w = (WORD)theWordsIWant.b;
So then the bonus question is, why go to all this trouble with the union, when you could simply cast it as a different variable?
WORD w = (WORD)theWordsIWant.b;
Perhaps what is really going on is that the code will "cast a pointer to anything" as one answer here suggests (How to convert from byte array to word array in c).
I am pretty sure I am missing something, either in the motivation for doing this, or the way it works. But then again, maybe I actually understand it after all? I don't know. You tell me.
This statement:
theWordsIWant.w = (WORD)theWordsIWant.b;
will not have the effect of loading two bytes from b and making them into a word. Since b is an array, the expression theWordsIWant.b produces a pointer to its first element, a BYTE * pointer. Its value is the address of the two characters, so you're converting the address of the bytes to type WORD, not the contents of the bytes themselves.
What the union saves you from doing (at the cost of portability) is, rather, this type of code:
WORD w = ((WORD) b[1] << 8) | b[0];
The union does it using logic that is very similar to this type of code:
WORD w = *(WORD *) b; // rather than: WORD w = (WORD) b;
That is: convert the pointer to the bytes to a WORD * pointer (pointer to WORD) and then dereference it to access both bytes simultaneously as a single WORD. What we are doing here is using pointer conversions to do type punning: we are creating an aliased view of b[0] and b[1] as if they were a single object of type WORD.
The union type in C and C++ does this declaratively. A union is like a struct, except that all the members are at offset 0: they overlap. The union has well-defined, portable behavior if we always access only that member which we last stored there. If we assign a value to w, and then access w, the behavior is uncontroversial. With unions, the possibility is that we can assign to members b[0] and b[1] and then retrieve w. The behavior is then "unspecified" (in C, as of the C99 standard).
In C++, use of a union for type punning is no more defined than using pointers for the same purpose; it is undefined behavior. Any sense in which such code works is thanks to the implementation.
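For contrast, here is a small sketch (an illustration added for this discussion, not from the original answer) of the approach that is well-defined in both languages: copy the bytes instead of aliasing them. C++20 also offers std::bit_cast for the same job.
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    unsigned char b[2] = {0x01, 0x02};
    std::uint16_t w;
    std::memcpy(&w, b, sizeof w); // well-defined: copies the bytes into w
    // Prints 0x0201 on a little-endian machine, 0x0102 on a big-endian one.
    std::printf("0x%04X\n", (unsigned) w);
    return 0;
}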

return value of sizeof() operator in C & C++ [duplicate]

#include <stdio.h>

int main()
{
    printf("%d", sizeof('a'));
    return 0;
}
Why does the above code produce different results when compiled as C and as C++?
In C, it prints 4, while in C++ it prints the more intuitive answer, i.e. 1.
When I replace the 'a' inside sizeof() with a char variable declared in the main function, the result is 1 in both cases!
Because, and this might be shocking, C and C++ are not the same language.
C defines character literals as having type int, while C++ considers them to have type char.
This is a case where multi-character constants can be useful:
const int foo = 'foo';
That will generate an integer whose value will probably be 6713199 or 7303014, depending on the byte ordering and the compiler's mood. In other words, multi-character character literals are not "portable"; you cannot depend on the resulting value being easy to predict.
As commenters have pointed out (thanks!), this is valid in both C and C++; it seems C++ gives multi-character character literals a different type (int) than single-character ones. Clever!
Also, as a minor note that I like to mention when on topic, note that sizeof is not a function and that values of size_t are not int. Thus:
printf("the size of a character is %zu\n", sizeof 'a');
or, if your compiler is too old to support C99:
printf("the size of a character is %lu\n", (unsigned long) sizeof 'a');
represent the simplest and most correct way to print the sizes you're investigating.
In C, the 'a' is a character constant, which is treated as an integer, so you get a size of 4, whereas in C++ it's treated as a char.
Possible duplicate: Size of character ('a') in C/C++
Your example is one of the cases where C++ is not compatible with C. In C, APIs that return a single character (like getch) return an int, because this allows space for an EOF marker (-1). So for this reason C sees 'c' as an int.
C++ introduces operator and function overloading, which means that we probably want to handle
cout << 'c'
differently from:
cout << 99
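A short sketch of that difference in action (illustrative only):
#include <iostream>

int main() {
    std::cout << 'c' << '\n'; // operator<< for char prints the character: c
    std::cout << 99 << '\n';  // operator<< for int prints the number: 99
    return 0;
}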