How to get string from an address? - c++

I have a ULONG value that contains an address.
The address points to a string (an array of wchar_t terminated by a null character).
I want to retrieve that string.
What is the best way to do that?

@KennyTM's answer is right on the money if by "basically of a string" you mean it's a pointer to an instance of the std::string class. If you mean it's a pointer to a C string, which I suspect may be more likely, you need:
char *s = reinterpret_cast<char *>(your_ulong);
Or, in your case:
wchar_t *s = reinterpret_cast<wchar_t *>(your_ulong);
Note also that you can't safely store pointers in any old integral type. I can make a compiler with a 32-bit long type and a 64-bit pointer type. If your compiler supports it, a proper way to store pointers in integers is to use the stdint.h types intptr_t and uintptr_t, which are guaranteed (by the C99 standard) to be big enough to store pointer types.
However, C99 isn't part of C++, and many C++ compilers (read: Microsoft) may not provide this kind of functionality (because who needs to write portable code?). Fortunately, stdint.h is useful enough that workarounds exist, and portable (and free) implementations of stdint.h for compatibility with older compilers can be found easily on the internet.
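A minimal sketch of the conversion described above, assuming the integer really holds the address of a null-terminated wchar_t array that is still alive. The helper name wstring_from_address is hypothetical, and std::uintptr_t (from <cstdint>, available since C++11) stands in for ULONG so that the integer is wide enough on 64-bit targets:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: copies the null-terminated wide string found at
// `address` into a std::wstring. Assumes `address` is either 0 or the
// address of a live, null-terminated wchar_t array.
std::wstring wstring_from_address(std::uintptr_t address) {
    if (address == 0) {
        return std::wstring();  // treat a zero address as "no string"
    }
    const wchar_t* text = reinterpret_cast<const wchar_t*>(address);
    return std::wstring(text);  // copies up to, but not including, the terminator
}
```

Copying into a std::wstring means the result stays valid even if the original buffer is later freed.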

string& s = *reinterpret_cast<string*>(your_ulong);

Related

C++ Why do we need to explicitly cast from one type to another?

I was writing some code recently and found myself doing a lot of c-style casts, such as the following:
Client* client = (Client*)GetWindowLong(hWnd, GWL_USERDATA);
I thought to myself; why do we actually need to do these?
I can somewhat understand why this would be needed in circumstances where there is a lot of code and the compiler may not know what types can be converted to what, such as when using reflection.
But when casting from a long to a pointer where both types are of the same size, I don't understand why the compiler would not allow us to do this?
when casting from a long to a pointer where both types are of the same size, I don't understand why the compiler would not allow us to do this?
Ironically, this is the place where compiler's intervention is most important!
In vast majority of situations, converting between long and a pointer is a programming error which you don't want to go unnoticed, even if your platform allows it.
For example, when you write this
unsigned long *ptr = getLongPtr();
unsigned long val = ptr; // Probably an error
it is almost a certainty that you are missing an asterisk in front of ptr:
unsigned long val = *ptr; // This is what it should be
Finding errors like this without the compiler's help is very hard, hence the compiler wants you to tell it that you know what you are doing on conversions like that.
Moreover, something that is fine on one platform may not work on other platforms. For example, an integral type and a pointer may have the same size on 32-bit platforms, but have different sizes on a 64-bit platform. If you want to maintain any degree of portability, the compiler should warn you of the conversion even on the 32-bit platform, where the sizes are identical. The compiler warning will help you identify an error and switch to a portable pointer-as-integer type, intptr_t.
I think the idea is that we want the compiler to tell us when we are doing something dodgy and/or potentially unintended. That way we don't do it by accident. So the compiler complains unless we explicitly tell the compiler that this is what we really, really want. We do that by using a cast.
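As a rough illustration of the portable pointer-as-integer approach, here is a sketch that round-trips a pointer through std::intptr_t. The Client struct and both helper names are hypothetical stand-ins, not part of any Windows API:

```cpp
#include <cstdint>

// Hypothetical stand-in for the application's own type.
struct Client {
    int id;
};

// Store a pointer in an integer that is wide enough to hold it.
std::intptr_t to_user_data(Client* client) {
    return reinterpret_cast<std::intptr_t>(client);
}

// Recover the original pointer; the explicit cast documents the intent.
Client* from_user_data(std::intptr_t value) {
    return reinterpret_cast<Client*>(value);
}
```

Both directions still need an explicit cast, which is the point: the conversion cannot happen by accident.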
Edited to add:
It might be better to ask why we are allowed to cast between types. Originally C was created as a strongly typed language. Although it allows promotion/conversion between related object types (like between ints and floats) it is supposed to prevent access and assignment to the wrong type as a language feature, a safety measure. However occasionally this is useful so casting was put in the language to allow us to circumvent the type rules on those occasions when we need to.

Length of a C string: std::strlen() vs. std::char_traits<char>::length()

Both are equivalent in that they return the length of the null-terminated character sequence. Are there reasons of preferring one to the other?
Use the simpler alternative. std::char_traits::length is great and all, but for C strings it does the same and is much longer code.
Do yourself a favour and avoid code bloat. I’m a huge fan of C++ functions over C equivalent (e.g. I will never use std::strcpy or std::memcpy, there’s a perfectly fine std::copy). But avoiding std::strlen is just silly.
One reason to use C++ functions exclusively is interface uniformity: for instance, both std::strcpy and std::memcpy have atrocious interfaces. However, std::strlen is a perfectly fine algorithm in the best tradition of C++. It doesn’t generalise, true, but neither do other class-specific free functions found in the standard library.
std::strlen() is a holdover from the C Standard Library and only operates on a const char* (it is unsafe in that it has undefined behavior if the string is not null terminated). If the string is using a wide character set (e.g. const unsigned short*), std::strlen() is useless.
std::char_traits<T>::length() will operate on whatever the T type is (e.g. if it is an unsigned short, it will still operate properly, but also requires a null terminated value - that is the last value must be T(0) - if an array of T's passed to it is not null terminated, behavior is undefined as well).
In general, when dealing with strings, it is better to use std::string::length() instead of using C-style character strings.
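To make the difference concrete, here is a small sketch. The function names narrow_length and generic_length are made up for illustration; the standard calls they wrap are std::strlen and std::char_traits<CharT>::length:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Works only for narrow, null-terminated char strings.
std::size_t narrow_length(const char* text) {
    return std::strlen(text);
}

// Works for any character type that has a char_traits specialization
// (char, wchar_t, char16_t, char32_t); the input must still end in CharT(0).
template <typename CharT>
std::size_t generic_length(const CharT* text) {
    return std::char_traits<CharT>::length(text);
}
```

For char input the two give the same result; the template only pays off when the character type varies.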
Both Zack Howland and Konrad Rudolph have a point. Thanks. I accept both answers. The summarized reply would be:
There doesn't seem to be any except personal preference either for shorter code or the C++ standard library (I leave out generalization since it wasn't the point of the question as can be seen from the title).
std::strlen() is a C standard library compatibility (even though it is part of ISO C++) function that takes const char* as an argument. length() is a method of the std::string family of classes. So if you want to use strlen() on std::string you'd have to write:
strlen(mystring.c_str())
which is less tidy than mystring.length(). Apart from that, there should be no tangible difference (for char type, that is).

What type for operations on pointers

I've searched quite a bit, but couldn't find anything helpful - but then I'm not sure I'm searching for the right thing.
Is there any scalar defined by the standard that has to be at least as large as a pointer? I.e. sizeof(?) >= sizeof(void*).
I need it because I'm writing a small garbage collector and want something along the lines of this:
struct Tag {
uint32_t desc:sizeof(uint32_t)*8-2; // pointer to typedescriptor
uint32_t free:1;
uint32_t mark:1;
};
I'd prefer something that's valid according to the standard (if we're at it, I was quite surprised that sizeof(uint32_t)*8-2 is valid for the bitfield definition - but VS2010 allows it).
So does size_t fulfill this requirement?
Edit: So after my inclusion of both C and C++ led to some problems (well, and there I thought they would be similar in that regard), I'd actually settle for one of them (I don't really need C++ for this part of the code and I can link C and C++ together, so that should work). And C99 seems to be the right standard in this case from the answers.
You could include <stdint.h> (or <cstdint>) and use uintptr_t or intptr_t.
Since MSVC refuses to support C99, you may need to include <Windows.h> and use ULONG_PTR or LONG_PTR instead. (See C99 stdint.h header and MS Visual Studio)
(Also, please use CHAR_BIT instead of 8.)
C99 has the optional uintptr_t in <stdint.h>, which guarantees that you can convert between a uintptr_t and a pointer value, though it doesn't say anything about any operations on the integer.
Generally, on common platforms a void* is the same as any other pointer, and converting a pointer to an integer, manipulating that integer and converting it back to a pointer yields well-defined results, but C does not guarantee this, so you'll have to know the compilers/platforms you want to target.
The best you can probably do is use the above-mentioned uintptr_t if you have a C99 compiler, or compile a program on the target platform that checks whether sizeof(void*) equals the sizeof of unsigned short, unsigned int, unsigned long or unsigned long long, and generates a header file in which you typedef your own uintptr according to what the program found.
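One possible shape for such a tag, sketched with uintptr_t instead of a 32-bit bitfield. This assumes that type descriptors are aligned to at least 4 bytes, so the two low bits of their address are zero, and that the platform's pointer-to-integer mapping keeps those low bits intact (true on mainstream targets, but implementation-defined). TypeDescriptor and all function names here are hypothetical:

```cpp
#include <cstdint>

// Hypothetical descriptor type; only its alignment matters for this sketch.
struct TypeDescriptor {
    int size;
};

// The two flag bits live in the low bits of the address, so those bits
// must be unused: require an alignment of at least 4.
static_assert(alignof(TypeDescriptor) >= 4, "two low address bits must be free");

constexpr std::uintptr_t kFreeBit = 1;
constexpr std::uintptr_t kMarkBit = 2;
constexpr std::uintptr_t kFlagMask = kFreeBit | kMarkBit;

struct Tag {
    std::uintptr_t bits;  // descriptor address combined with the flag bits
};

inline Tag make_tag(const TypeDescriptor* desc, bool free_flag, bool mark_flag) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(desc);
    if (free_flag) bits |= kFreeBit;
    if (mark_flag) bits |= kMarkBit;
    return Tag{bits};
}

// Mask the flags off again before turning the integer back into a pointer.
inline const TypeDescriptor* descriptor(Tag tag) {
    return reinterpret_cast<const TypeDescriptor*>(tag.bits & ~kFlagMask);
}

inline bool is_free(Tag tag) { return (tag.bits & kFreeBit) != 0; }
inline bool is_marked(Tag tag) { return (tag.bits & kMarkBit) != 0; }
```

Unlike the uint32_t bitfield, this layout does not truncate the pointer on 64-bit targets.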

Best Practices: Should I create a typedef for byte in C or C++?

Do you prefer to see something like t_byte* (with typedef unsigned char t_byte) or unsigned char* in code?
I'm leaning towards t_byte in my own libraries, but have never worked on a large project where this approach was taken, and am wondering about pitfalls.
If you're using C99 or newer, you should use stdint.h for this. uint8_t, in this case.
C++ didn't get this header until C++11, calling it cstdint. Old versions of Visual C++ didn't let you use C99's stdint.h in C++ code, but pretty much every other C++98 compiler did, so you may have that option even when using old compilers.
As with so many other things, Boost papers over this difference in boost/integer.hpp, providing things like uint8_t if your compiler's standard C++ library doesn't.
I suggest that if your compiler supports it use the C99 <stdint.h> header types such as uint8_t and int8_t.
If your compiler does not support it, create one. Here's an example for VC++, older versions of which do not have stdint.h. GCC does support stdint.h, and indeed most of C99.
One problem with your suggestion is that the sign of char is implementation-defined, so if you do create a type alias, you should at least be explicit about the sign. There is some merit in the idea, since in C#, for example, a char is 16-bit. But it has a byte type as well.
Additional note...
There was no problem with your suggestion, you did in fact specify unsigned.
I would also suggest that plain char is used if the data is in fact character data, i.e. is a representation of plain text such as you might display on a console. This will present fewer type agreement problems when using standard and third-party libraries. If on the other hand the data represents a non-character entity such as a bitmap, or if it is numeric 'small integer' data upon which you might perform arithmetic manipulation, or data that you will perform logical operations on, then one of the stdint.h types (or even a type defined from one of them) should be used.
I recently got caught out on a TI C54xx compiler where char is in fact 16bit, so that is why using stdint.h where possible, even if you use it to then define a byte type is preferable to assuming that unsigned char is a suitable alias.
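A short sketch of that approach: define the byte type on top of a stdint type and state the 8-bit assumption explicitly, so a platform with wider chars fails at compile time instead of misbehaving silently. The names byte_t and checksum are illustrative only:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Make the assumption explicit: this code expects 8-bit bytes.
static_assert(CHAR_BIT == 8, "byte_t assumes 8-bit bytes");

// Byte alias built on the stdint type rather than on unsigned char directly.
typedef std::uint8_t byte_t;

// Illustrative use: sum a buffer of raw (non-character) data.
unsigned checksum(const byte_t* data, std::size_t size) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < size; ++i) {
        sum += data[i];
    }
    return sum;
}
```

On a platform without an exact 8-bit type, std::uint8_t is not defined at all, so the typedef itself would also fail to compile.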
I prefer for types to convey the meaning of the values stored in it. If I need a type describing a byte as it is on my machine, I very much prefer byte_t over unsigned char, which could mean just about anything. (I have been working in a code base that used either signed char or unsigned char to store UTF-8 strings.) The same goes for uint8_t. It could just be used as that: an 8bit unsigned integer.
With byte_t (as with any other aptly named type), there rarely ever is a need to look up what it is defined to (and if so, a good editor will take 3secs to look it up for you; maybe 10secs, if the code base is huge), and just by looking at it it's clear what's stored in objects of that type.
Personally I prefer boost::int8_t and boost::uint8_t.
If you don't want to use Boost, you could borrow boost/cstdint.hpp.
Another option is to use portable version of stdint.h (link from this answer).
Besides your awkward naming convention, I think that might be okay. Keep in mind boost does this for you, to help with cross-platform-ability:
#include <boost/integer.hpp>
typedef boost::uint8_t byte_t;
Note that types are usually suffixed with _t, as in byte_t.
I prefer to use standard types, unsigned char, uint8_t, etc., so any programmer looking at the source does not have to refer back to headers to grok the code. The more typedefs you use the more time it takes for others to get to know your typing conventions. For structures, absolutely use typedefs, but for primitives use them sparingly.

static_cast wchar_t* to int* or short* - why is it illegal?

In both Microsoft VC2005 and g++ compilers, the following results in an error:
On win32 VC2005: sizeof(wchar_t) is 2
wchar_t *foo = 0;
static_cast<unsigned short *>(foo);
Results in
error C2440: 'static_cast' : cannot convert from 'wchar_t *' to 'unsigned short *' ...
On Mac OS X or Linux g++: sizeof(wchar_t) is 4
wchar_t *foo = 0;
static_cast<unsigned int *>(foo);
Results in
error: invalid static_cast from type 'wchar_t*' to type 'unsigned int*'
Of course, I can always use reinterpret_cast. However, I would like to understand why it is deemed illegal by the compiler to static_cast to the appropriate integer type. I'm sure there is a good reason...
You cannot cast between unrelated pointer types. The size of the type pointed to is irrelevant. Consider the case where the types have different alignment requirements: allowing a cast like this could generate illegal code on some processors. It is also possible for pointers to different types to have different sizes. This could result in the pointer you obtain being invalid and/or pointing at an entirely different location. reinterpret_cast is one of the escape hatches you have if you know that, for your program, compiler, architecture and OS, you can get away with it.
As with char, the signedness of wchar_t is not defined by the standard. Put this together with the possibility of non-2's complement integers, and for a wchar_t value c,
*reinterpret_cast<unsigned short *>(&c)
may not equal:
static_cast<unsigned short>(c)
In the second case, on implementations where wchar_t is a sign+magnitude or 1's complement type, any negative value of c is converted to unsigned using modulo 2^N, which changes the bits. In the former case the bit pattern is picked up and used as-is (if it works at all).
Now, if the results are different, then there's no realistic way for the implementation to provide a static_cast between the pointer types. What could it do, set a flag on the unsigned short* pointer, saying "by the way, when you load from this, you have to also do a sign conversion", and then check this flag on all unsigned short loads?
That's why it's not, in general, safe to cast between pointers to distinct integer types, and I believe this unsafety is why there is no conversion via static_cast between them.
If the type you're casting to happens to be the so-called "underlying type" of wchar_t, then the resulting code would almost certainly be OK for the implementation, but would not be portable. So the standard doesn't offer a special case allowing you a static_cast just for that type, presumably because it would conceal errors in portable code. If you know reinterpret_cast is safe, then you can just use it. Admittedly, it would be nice to have a straightforward way of asserting at compile time that it is safe, but as far as the standard is concerned you should design around it, since the implementation is not required even to dereference a reinterpret_casted pointer without crashing.
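One way to design around the pointer cast is to convert the values rather than reinterpret the memory. The sketch below copies a wide string into a vector of fixed-width integers with a per-element static_cast, and uses a static_assert to check at compile time that no value can be truncated. The function name to_code_units is hypothetical:

```cpp
#include <cstdint>
#include <vector>

// Compile-time check that every wchar_t value fits in the target type.
static_assert(sizeof(wchar_t) <= sizeof(std::uint32_t),
              "wchar_t must fit in uint32_t");

// Converts each wchar_t by value; no pointer reinterpretation is involved,
// so the result does not depend on the representation of wchar_t.
std::vector<std::uint32_t> to_code_units(const wchar_t* text) {
    std::vector<std::uint32_t> units;
    for (; *text != L'\0'; ++text) {
        units.push_back(static_cast<std::uint32_t>(*text));
    }
    return units;
}
```

This costs a copy, but static_cast on the values is well defined on every implementation, which the pointer cast is not.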
By the spec, static_cast between pointer or reference types is restricted to related types, e.g. std::ostream& to std::ofstream&. In standard C++, wchar_t is a distinct built-in type rather than an alias for an integer type.
Your case (if you really need it) should be handled with reinterpret_cast.
By the way, MSVC++ has an option to treat wchar_t either as a typedef (for unsigned short) or as a stand-alone data type.
Pointers are not magic "no limitations, anything goes" tools.
They are, by the language specification actually very constrained. They do not allow you to bypass the type system or the rest of the C++ language, which is what you're trying to do.
You are trying to tell the compiler to "pretend that the wchar_t you stored at this address earlier is actually an int. Now read it."
That does not make sense. The object stored at that address is a wchar_t, and nothing else. You are working in a statically typed language, which means that every object has one, and just one, type.
If you're willing to wander into implementation-defined behavior-land, you can use a reinterpret_cast to tell the compiler to just pretend it's ok, and interpret the result as it sees fit. But then the result is not specified by the standard, but by the implementation.
Without that cast, the operation is meaningless. A wchar_t is not an int or a short.