I know the basics of casting in C++, or at least I thought I did.
Yesterday, I was trying to convert an 8-element uint8_t array to a 2-element uint32_t array.
I cast my values to a 32-bit type, but when I try to display them as 32-bit values, the computer gives me their address, not their values. The comments in the code show where I became confused.
#include <cstdint>
#include <iostream>
int main()
{
uint8_t info[8];
info[0] = '2';
info[1] = '0';
info[2] = '2';
info[3] = '0';
info[4] = '0';
info[5] = '0';
info[6] = '0';
info[7] = '0';
uint32_t *divided = (uint32_t*)&info[0];
uint32_t *dividedTwo = (uint32_t*)&info[4];
std::cout << "Address of info " << &info << std::endl; //Output 0x7ffdd25cabe0 as expected.
std::cout << "Expected value of info " << (uint8_t*)info<< std::endl; //Output 20200000 as expected.
std::cout << "Expected value of divided " << (uint32_t*)divided << std::endl; //Output 0x7ffdd25cabe0 not as my expected. What is the reason?
std::cout << "But why this return the true value? " << (uint8_t*)divided << std::endl; //Output 20200000 but why 8bit returns the true value instead of 32bit casting?
std::cout << "Same here, my value was type of 32... " << (uint8_t*)dividedTwo << std::endl; //Output 0000
}
Many of your print statements cause undefined behaviour or don't make sense.
std::cout << "Address of info " << &info << std::endl; //Output 0x7ffdd25cabe0 as expected.
This line prints the address of your array. As you see, you get some pointer value out.
std::cout << "Expected value of info " << (uint8_t*)info<< std::endl; //Output 20200000 as expected.
This line prints the contents of your array as a C string, and causes undefined behaviour since your string is not null terminated.
std::cout << "Expected value of divided " << (uint32_t*)divided << std::endl; //Output 0x7ffdd25cabe0 not as my expected. What is the reason?
This line prints the same pointer as in #1, just with a different type.
std::cout << "But why this return the true value? " << (uint8_t*)divided << std::endl; //Output 20200000 but why 8bit returns the true value instead of 32bit casting?
Same pointer, same string as in #2. Same undefined behaviour.
std::cout << "Same here, my value was type of 32... " << (uint8_t*)dividedTwo << std::endl; //Output 0000
This line prints the second half of your string - same undefined behaviour.
What you actually mean to print is likely something along these lines (in C to take the implicit type handling of iostream out of the example):
#include <inttypes.h>
#include <stdio.h>
printf("%p\n", (void *)info); // address of the array
printf("%" PRIx64 "\n", *(uint64_t *)info); // entire 64-bit value
printf("%" PRIx32 "\n", *divided); // first 32 bits of the 64-bit array
printf("%" PRIx32 "\n", *dividedTwo); // second 32 bits of the 64-bit array
Note that you are filling your array with char literals (e.g. '2'), not integer literals - you may want to fix that to make the output clearer for yourself.
Watch out for potential alignment problems with this type of casting. Strictly speaking it is not legal: reading a uint8_t array through a uint32_t pointer violates the aliasing rules.
std::cout << "Expected value of info " << (uint8_t*)info<< std::endl; //Output 20200000 as expected.
uint8_t is not only an integer type, but also a character type. On typical implementations it is an alias of unsigned char. When you insert a pointer to a character type into a character stream, the behaviour is to treat it as a null-terminated character string. Null termination is a precondition, and the lack of it results in undefined behaviour.
Your array is not null terminated. Therefore the behaviour of the program is undefined.
std::cout << "But why this return the true value? " << (uint8_t*)divided << std::endl; //Output 20200000 but why 8bit returns the true value instead of 32bit casting?
std::cout << "Same here, my value was type of 32... " << (uint8_t*)dividedTwo << std::endl; //Output 0000
Both of these are the same: each attempts to print a string that is not null-terminated, which results in undefined behaviour.
std::cout << "Expected value of divided " << (uint32_t*)divided << std::endl; //Output 0x7ffdd25cabe0 not as my expected. What is the reason?
uint32_t is not a character type. Pointers to types other than character types are treated differently. Instead of printing a null terminated character string, the address of the pointed object is printed. In this case the address happens to be 0x7ffdd25cabe0. It is unclear what you expected instead.
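As a rough sketch of the two behaviours described above, the helpers below make the choice explicit instead of leaving it to overload resolution. The names as_text and as_address are made up for this illustration. The first builds a string from an explicit byte count, so no terminator is required; the second converts to const void*, which always selects the address-printing overload.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: interprets `count` bytes as characters.
// The explicit length means the buffer does not need a '\0' terminator.
std::string as_text(const std::uint8_t* bytes, std::size_t count) {
    return std::string(reinterpret_cast<const char*>(bytes), count);
}

// Hypothetical helper: formats the address held by any object pointer.
// Inserting a const void* never treats the pointee as a string.
std::string as_address(const void* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}
```

With the array from the question, as_text(info, 8) yields the eight characters, while as_address(info) and as_address(&info) yield the same address text.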
Note that attempting to access the pointed object through the reinterpreted divided and dividedTwo pointers would result in undefined behaviour because no object of such type exists at the pointed address.
Is there any better solution to convert an 8-element uint8_t array to a 2-element uint32_t array instead of "shifting (<< >> etc.)"?
Shifting is usually the best way because it can be used to produce the same output regardless of the byte endianness of the CPU, and is therefore portable and can be used for communication between separate systems over the network or transfer of files.
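A minimal sketch of the shifting approach follows. It assumes the first byte of each group is the most significant one (big-endian order); that choice, and the names load_be32 and split_8_bytes, are assumptions made for this example and not something the question specifies.

```cpp
#include <cstdint>

// Hypothetical helper: combines four bytes into one 32-bit value,
// treating bytes[0] as the most significant byte (assumed order).
std::uint32_t load_be32(const std::uint8_t* bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}

// Hypothetical helper: converts eight bytes into two 32-bit values.
void split_8_bytes(const std::uint8_t (&in)[8], std::uint32_t (&out)[2]) {
    out[0] = load_be32(in);      // bytes 0..3
    out[1] = load_be32(in + 4);  // bytes 4..7
}
```

Because the result is computed from the byte values and not from their layout in memory, it is the same on little-endian and big-endian hosts.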
Other, non-shifting ways to convert produce output depending on the endianness, so they cannot be used for example in communication between different systems. But, here is a correct example of how to do that:
uint8_t info8 [8] = ...;
uint32_t info32[sizeof info8 / sizeof(uint32_t)];
std::memcpy(info32, info8, sizeof info32);
Related
The code successfully compiles it but I can't understand why, for certain values of number, the program crashes and for other values it doesn't. Could someone explain the behavior of adding a long int with a char* that the compiler uses?
#include <iostream>
int main()
{
long int number=255;
std::cout<< "Value 1 : " << std::flush << ("" + number) << std::flush << std::endl;
number=15155;
std::cout<< "Value 2 : " << std::flush << ("" + number) << std::flush << std::endl;
return 0;
}
Test results:
Value 1 : >
Value 2 : Segmentation fault
Note: I'm not looking for a solution on how to add a string with a number.
In C++, "" is a const char[1] array, which decays into a const char* pointer to the first element of the array (in this case, the string literal's '\0' nul terminator).
Adding an integer to a pointer performs pointer arithmetic, which will advance the memory address in the pointer by the specified number of elements of the type the pointer is declared as (in this case, char).
So, in your example, ... << ("" + number) << ... is equivalent to ... << &""[number] << ..., or more generically:
const char *ptr = &""[0];
ptr = reinterpret_cast<const char*>(
reinterpret_cast<const uintptr_t>(ptr)
+ (number * sizeof(char))
);
... << ptr << ...
Which means you are going out of bounds of the array when number is any value other than 0, thus your code has undefined behavior and anything could happen when operator<< tries to dereference the invalid pointer you give it.
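To illustrate the same pointer arithmetic while staying inside the array, here is a small sketch. The function name suffix_from is invented for this example; it checks the offset against the string length before adding it to the pointer, so the out-of-bounds case from the question returns an empty string instead of invoking undefined behavior.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns the text that starts `offset` characters
// into `text`, or an empty string when the offset is out of range.
std::string suffix_from(const char* text, std::size_t offset) {
    const std::size_t length = std::strlen(text);
    if (offset > length) {
        return std::string();  // refuse to step outside the array
    }
    return std::string(text + offset);  // text + offset is &text[offset]
}
```

For example, suffix_from("hello", 2) gives "llo", whereas suffix_from("", 255) is rejected by the bounds check.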
Unlike in many scripting languages, ("" + number) is not the correct way to convert an integer to a string in C++. You need to use an explicit conversion function instead, such as std::to_string(), eg:
#include <iostream>
#include <string>
int main()
{
long int number = 255;
std::cout << "Value 1 : " << std::flush << std::to_string(number) << std::flush << std::endl;
number = 15155;
std::cout << "Value 2 : " << std::flush << std::to_string(number) << std::flush << std::endl;
return 0;
}
Or, you can simply let std::ostream::operator<< handle that conversion for you, eg:
#include <iostream>
int main()
{
long int number = 255;
std::cout<< "Value 1 : " << std::flush << number << std::flush << std::endl;
number = 15155;
std::cout<< "Value 2 : " << std::flush << number << std::flush << std::endl;
return 0;
}
Pointer arithmetic is the culprit.
A const char* is accepted by operator<<, but will not point to a valid memory address in your example.
If you switch on -Wall, you will see a compiler warning about that:
main.cpp: In function 'int main()':
main.cpp:6:59: warning: array subscript 255 is outside array bounds of 'const char [1]' [-Warray-bounds]
6 | std::cout<< "Value 1 : " << std::flush << ("" + number) << std::flush << std::endl;
| ^
main.cpp:8:59: warning: array subscript 15155 is outside array bounds of 'const char [1]' [-Warray-bounds]
8 | std::cout<< "Value 2 : " << std::flush << ("" + number) << std::flush << std::endl;
| ^
Value 1 : q
The following code:
#include<iostream>
int main (void) {
int lista[5] = {0,1,2,3,4};
std::cout << lista << std::endl;
std::cout << &lista << std::endl;
std::cout << lista+1 << std::endl;
std::cout << &lista+1 << std::endl;
std::cout << lista+2 << std::endl;
std::cout << &lista+2 << std::endl;
std::cout << lista+3 << std::endl;
std::cout << &lista+3 << std::endl;
return (0);
}
Outputs:
0x22ff20
0x22ff20
0x22ff24
0x22ff34
0x22ff28
0x22ff48
0x22ff2c
0x22ff5c
I understand that an array is another way of expressing a pointer, except that we cannot change its address to point anywhere else after declaration. I also understand that an array's value is its first position in memory. Therefore, 0x22ff20 in this example is the location of the array's starting position, and the first variable is stored there.
What I do not understand is why the other variables are not stored in sequence with the array address. In other words, why is lista+1 different from &lista+1? Shouldn't they be the same?
In pointer arithmetic, types matter.
It's true that lista and &lista have the same value, but their types are different: lista (in the expression used in the cout call) has type int*, whereas &lista has type int (*)[5].
So when you add 1 to lista, it points to the "next" int. But &lista + 1 points to the location after 5 ints, which is one past the end of the array and must not be dereferenced.
Answering the question as asked:
std::cout << &lista+1 << std::endl;
This code takes the address of the array lista and adds 1 to it. The type of &lista is int (*)[5], and the size of the array is sizeof(int) * 5, so incrementing that pointer by 1 advances the address by sizeof(int) * 5 bytes. That is the number you see.
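The difference in step size can be measured directly. The sketch below uses made-up helper names (byte_distance, element_step, whole_array_step) and returns how many bytes each kind of "+ 1" moves the address.

```cpp
#include <cstddef>

// Hypothetical helper: distance in bytes between two addresses.
std::ptrdiff_t byte_distance(const void* from, const void* to) {
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

// array decays to int*, so + 1 advances by one int.
std::ptrdiff_t element_step(int (&array)[5]) {
    return byte_distance(array, array + 1);
}

// &array has type int (*)[5], so + 1 advances by the whole array.
std::ptrdiff_t whole_array_step(int (&array)[5]) {
    return byte_distance(&array, &array + 1);
}
```

On a platform where int is 4 bytes, as in the output shown in the question, these return 4 and 20 (0x14), which matches 0x22ff24 and 0x22ff34.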
I was trying to perform bitwise operations on a char array, as if it were an int, essentially treating bytes like a contiguous area in memory. The code below illustrates my problem.
char *cstr = new char[5];
std::strcpy(cstr, "abcd");
int *p = (int *)(void *)cstr;
std::cout << *p << " " << p << "\n";
std::cout << cstr << " " << (void *)cstr << "\n";
std::cout << sizeof(*p) << "\n";
(*p)++;
std::cout << *p << " " << p << "\n";
std::cout << cstr << " " << (void *)cstr << "\n";
The following output is produced:
1684234849 0x55f046e7de70
abcd 0x55f046e7de70
4
1684234850 0x55f046e7de70
bbcd 0x55f046e7de70
Quick explanation of the code and how it works (to my understanding):
I initialize cstr with "abcd"
char *cstr = new char[5];
std::strcpy(cstr, "abcd");
I point p to the address of cstr and specify that I want it to be an int
int *p = (int *)(void *)cstr;
I test that p is pointing where it should and that it occupies 4 bytes
std::cout << *p << " " << p << "\n";
std::cout << cstr << " " << (void *)cstr << "\n";
std::cout << sizeof(*p) << "\n";
I then increment the integer at the address p is pointing to
(*p)++;
So now, since "abcd" is a contiguous block of 32 bits in memory, incrementing by 1 should produce "abce". Instead, the code increments the integer successfully, but leaves the char array as "bbcd". This last part checks the new values of the integer and cstr
std::cout << *p << " " << p << "\n";
std::cout << cstr << " " << (void *)cstr << "\n";
Is this expected behavior?
PS: I compiled the code on a linux machine using this command: g++ main.cpp -o main.
file main
produces the following output: "1: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0"
x86-64 CPUs (like yours) store the least significant byte of multi-byte integers at the lowest memory address. So incrementing the integer that "abcd" corresponds to results in incrementing its least-significant byte, which is stored first in memory. This converted the "a" character into a "b". How code like this behaves is very dependent on how the CPU encodes integers and strings and your expectations of what this code will do have to take those details into account.
To expect the string "abce", you have to make lots of assumptions:
You have to expect integers to occupy 4 bytes.
You have to expect the least significant byte to be stored last.
You have to expect the encoding of the character "e" to be one more than the encoding of the character "d".
You have to expect that incrementing a "d" to an "e" won't overflow when viewed as a signed integer increment.
Some of these are reasonable assumptions and some of them aren't, but unless you have reasonable grounds for all these assumptions, your expectation isn't justified.
Is this expected behavior?
It is what people familiar with your platform would expect. But generally it's easy to avoid relying on these kinds of assumptions and so the best advice is not to rely on them. Assumption 3 is often unavoidable and reasonable on all modern platforms.
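One way to avoid relying on the byte-order assumption is to state the order explicitly. The sketch below is only an illustration, with invented names: increment_native reproduces what the question does (using memcpy instead of a pointer cast), while increment_last_byte_lowest treats the last of the four bytes as least significant on any host. Both still assume a 4-byte group and a character encoding in which 'e' follows 'd'.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reports whether this host stores the least
// significant byte of an integer at the lowest address.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Increments the first four bytes of `text` using the host's byte order.
void increment_native(char* text) {
    std::uint32_t value = 0;
    std::memcpy(&value, text, sizeof value);
    ++value;
    std::memcpy(text, &value, sizeof value);
}

// Increments the first four bytes of `text`, always treating text[3]
// as the least significant byte.
void increment_last_byte_lowest(char* text) {
    unsigned char* b = reinterpret_cast<unsigned char*>(text);
    std::uint32_t value = (static_cast<std::uint32_t>(b[0]) << 24) |
                          (static_cast<std::uint32_t>(b[1]) << 16) |
                          (static_cast<std::uint32_t>(b[2]) << 8) |
                          static_cast<std::uint32_t>(b[3]);
    ++value;
    b[0] = static_cast<unsigned char>(value >> 24);
    b[1] = static_cast<unsigned char>(value >> 16);
    b[2] = static_cast<unsigned char>(value >> 8);
    b[3] = static_cast<unsigned char>(value);
}
```

Starting from "abcd", increment_last_byte_lowest produces "abce" on any host, while increment_native produces "bbcd" on a little-endian machine such as x86-64.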
Why does (int)value give different output after using strdup(value) than it did before?
How can I get the same output?
My short example was not good, so please use the long one.
Here is the full code for testing:
#include <stdio.h>
#include <string.h>
#include <iostream>
int main()
{
//The First Part
char *c = "ARD-642564";
char *ca = "ARD-642564";
std::cout << c << std::endl;
std::cout << ca << std::endl;
//c and ca are equal
std::cout << (int)c << std::endl;
std::cout << (int)ca << std::endl;
//The Second Part
c = strdup("ARD-642564");
ca = strdup("ARD-642564");
std::cout << c << std::endl;
std::cout << ca << std::endl;
//c and ca are NOT equal Why?
std::cout << (int)c << std::endl;
std::cout << (int)ca << std::endl;
int x;
std::cin >> x;
}
Because an array decays to a pointer in your case, you are printing a pointer (ie, on non-exotic computers, a memory address). There is no guarantee that a pointer fits in an int.
In the first part of your code, c and ca don't have to be equal. Your compiler performs a sort of memory optimization (see here for a full answer).
In the second part, strdup dynamically allocates a string twice, so the returned pointers are not equal. The compiler does not optimize these calls away, presumably because it cannot see the definition of strdup.
In both cases, c and ca may not be equal.
"The strdup() function shall return a pointer to a new string, which is a duplicate of the string pointed to by s1." source
So it's quite understandable that the pointers differ.
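If the intent is to check whether two strings hold the same text, the contents should be compared, not the pointers. The following sketch (the struct and function names are invented for this example, and strdup is a POSIX function, not part of ISO C++) duplicates a string twice and reports both kinds of comparison.

```cpp
#include <cstdlib>
#include <cstring>
#include <string.h>  // strdup (POSIX; availability is an assumption)

// Hypothetical result type for this illustration.
struct Comparison {
    bool same_text;     // the characters are equal
    bool same_address;  // both pointers refer to the same storage
};

// Duplicates `text` twice and compares the two copies both ways.
Comparison compare_duplicates(const char* text) {
    char* first = strdup(text);
    char* second = strdup(text);
    Comparison result = {false, false};
    if (first != nullptr && second != nullptr) {
        result.same_text = std::strcmp(first, second) == 0;
        result.same_address = (first == second);
    }
    std::free(first);   // strdup allocates with malloc
    std::free(second);
    return result;
}
```

To print an address for inspection, converting to const void* is safer than casting to int, since a pointer is not guaranteed to fit in an int.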
In a function that receives an unsigned char buffer and its length:
void pcap_callback(u_char *args, const struct pcap_pkthdr* pkthdr, const u_char* packet)
{
std::vector<unsigned char> vec(packet, packet+pkthdr->len); // optimized from foo.
std::stringstream scp;
for (int i=0;i<pkthdr->len;i++) {
scp<<vec[i];
}
std::string mystr = std::string(scp.rdbuf()->str());
std::cout << "WAS: " << packet << std::endl;
std::cout << "GOOD: " << scp.str() << std::endl;
std::cout << "BAD: " << scp.str().c_str() << std::endl;
std::cout << "TEST: " << mystr.size() << std::endl;
assert(mystr.size() == pkthdr->len);
}
Results:
WAS: prints nothing (guess there is a pointer to const.. case)
GOOD: prints data
BAD: prints nothing
TEST, assert: prints that mystr.size() is equal to passed unsigned char size.
I tried:
string.assign(scp.rdbuf());
memcpy(char, scp.str(), 10);
different methods of creating/allocating temporary chars, strings
None of this helped. The goal is a std::string that contains the data and can be written to std::cout (the data comes from foo, an unsigned char buffer holding the packet data).
My guess is that either the original foo is not null-terminated, or the problem is something similarly simple that I just can't see. What should I look for here?
(This code is another attempt to use libpcap just to print packets the C++ way, without using known C++ wrappers like libpcapp.)
For a quick test, throw in a check for scp.str().size() == strlen(scp.str().c_str()) to see if there are embedded '\0' characters in the string, which is what I suspect is happening.
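That check can be wrapped in a small function. The name has_embedded_null is made up for this sketch; it relies on the fact that strlen stops at the first '\0' while std::string::size() counts every stored character.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: true when the string contains a '\0' before its end,
// which makes c_str() appear shorter than the string really is.
bool has_embedded_null(const std::string& s) {
    return std::strlen(s.c_str()) != s.size();
}
```

A string that begins with a null byte would print as empty through c_str() even though size() reports the full packet length.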
I think you're going about this the wrong way. It looks like you're dealing with binary data here, in which case you can't expect to meaningfully output it to the screen as text. What you really need is a hex dump.
const unsigned char* ucopy = packet;
std::ios_base::fmtflags old_flags = std::cout.flags();
std::cout.setf(std::ios::hex, std::ios::basefield);
for (const unsigned char* p = ucopy, *e = p + pkthdr->len; p != e; ++p) {
std::cout << std::setw(2) << std::setfill('0') << static_cast<unsigned>(*p) << " ";
}
std::cout.flags(old_flags);
This will output the data byte-by-byte, and let you examine the individual hex values of the binary data. A null byte will simply be output as 00.
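A variation on the same idea, sketched here under the assumption that a string result is more convenient than writing to std::cout directly: the hypothetical function hex_dump formats into a local stream, so the formatting state of std::cout is never touched and does not need to be restored.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns the bytes as two-digit hex values
// separated by single spaces.
std::string hex_dump(const unsigned char* data, std::size_t length) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');  // both settings persist on `out`
    for (std::size_t i = 0; i < length; ++i) {
        if (i != 0) {
            out << ' ';
        }
        // setw applies only to the next insertion, so repeat it per byte.
        out << std::setw(2) << static_cast<unsigned>(data[i]);
    }
    return out.str();
}
```

In the callback from the question this could be used as std::cout << hex_dump(packet, pkthdr->len) << "\n";.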
Check std::cout.good() after the failed output attempt. My guess is that there's some failure on output (i.e. trying to write a nonprintable character to the console), which is setting failbit on cout.
Also check to ensure the string does not start with a null character ('\0'), which would make empty output the expected behavior :)
(Side note, please use reinterpret_cast for unsigned char *ucopy = (unsigned char*)packet; if you're in C++ ;) )