Integer pointer to char array - c++

I am working on an application (C++, Visual Studio) in which all strings are referred to through an integer pointer.
For example, the class I am using holds this integer pointer to the data plus a variable for the size:
{
..
..
unsigned short int *pData;
int iLen;
}
I would like to know:
Are there any advantages to using an int pointer instead of a char pointer?
After thinking about it a lot, I suspect the reason may be to avoid the application crash that can happen if a char pointer is used without null termination, but I am not 100% sure.
During debugging, how can I check the content of the pointer when the content is a char array or string (in Visual Studio)?
I can only see the address when I inspect it during debugging, which makes debugging difficult.
Using printf would display the content, but I can't do that everywhere.
Maybe the integer pointer was chosen to avoid such programming errors; please correct me if you think this cannot be the reason.

Please let us know which language you're using so that we can help you a bit better. As for your questions:
There is no benefit to using an int array over a char array. In fact, an integer array takes up more space (if each int represents a single character), because a char takes up one byte whereas an int typically takes up four.
As for printing things while debugging, I'm not a master of Visual Studio since I haven't used it in some time, but most modern IDEs allow you to cast things before you print them. For example, in lldb you can do po (char)myIntArray[0] (po stands for "print object"). In Visual Studio, writing a custom visualizer should do the trick.

I am not sure why you would want to do this, but if you wanted to store an EOF character in your string for some reason, you would need a pointer to int.

MS Visual Studio often uses UTF-16 to store strings. UTF-16 requires a 16-bit data type; your application uses unsigned short for that, though a more precise name would be uint16_t (or maybe wchar_t, but I am not sure about that).
Another way to store strings uses UTF-8; if you go that route, use a pointer to char (not convenient) or std::string (more convenient). However, you don't have any choice here; you are not going to change how your application stores strings (it would probably be too tedious).
To view your UTF-16-encoded string, use a dedicated format specifier. For example, in a quick-watch window, enter:
object->pData,su
instead of just
object->pData
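If you need to dump the string from code rather than from the debugger, a minimal sketch like the following may help. It assumes pData points at UTF-16 code units and that iLen counts code units (not bytes); the helper name toWString is made up for illustration:
#include <string>
#include <iostream>

// Copy the raw 16-bit buffer into a std::wstring so it can be printed.
// Assumes iLen is the number of 16-bit code units, not bytes.
std::wstring toWString(const unsigned short* pData, int iLen)
{
    std::wstring result;
    if (iLen > 0)
        result.reserve(iLen);
    for (int i = 0; i < iLen; ++i)
        result.push_back((wchar_t)pData[i]);
    return result;
}

// Usage (on Windows, where wchar_t is 16 bits wide):
// std::wcout << toWString(object->pData, object->iLen) << std::endl;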

Related

int8_t and char: converts between pointers to integer types with different sign - but it doesn't

I'm working with some embedded code, and I am writing something new from scratch, so I prefer to stick with the uint8_t, int8_t, and similar types.
However, when porting a function:
void functionName(char *data)
to:
void functionName(int8_t *data)
I get the compiler warning "converts between pointers to integer types with different sign" when passing a string literal to the function (i.e. when calling functionName("put this text in");).
Now, I understand why this happens, and these lines are only for debugging, but I wonder what people feel is the most appropriate way of handling this, short of typecasting every string literal. I don't feel that blanket typecasting is any safer in practice than using potentially ambiguous types like "char".
You seem to be doing the wrong thing here.
Characters are not defined by C as being 8-bit integers, so why would you ever choose to use int8_t or uint8_t to represent character data, unless you are working with UTF-8?
In C, a string literal has type array of char (which decays to pointer to char), and char is not at all guaranteed to be 8 bits wide.
It is also not specified whether plain char is signed or unsigned, so just use const char * for string literals.
To answer your addendum (the original question was nicely answered by #unwind). I think it mostly depends on the context. If you are working with text i.e. string literals you have to use const char* or char* because the compiler will convert the characters accordingly. Short of writing your own string implementation you are probably stuck with whatever the compiler provides to you. However, the moment you have to interact with someone/something outside of your CPU context e.g. network, serial, etc. you have to have control over the exact size (which I suppose is where your question stems from). In this case I would suggest writing functions to convert strings or any data-type for that matter to uint8_t buffers for serialized sending (or receiving).
const char* my_string = "foo bar!";
uint8_t* buffer = string2sendbuffer(my_string);
my_send(buffer, destination);
The string2sendbuffer function would know everything there is to know about putting characters in a buffer. For example, it might know that you have to encode each char into two buffer elements using big-endian byte ordering. This function is almost certainly platform-dependent, but it encapsulates all of that platform dependence, so you gain a lot of flexibility.
The same goes for every other complex data type. For everything else (where the compiler does not have such a strong opinion) I would advise using the (u)intX_t types provided by stdint.h (which should be portable).
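As a rough illustration of the idea (not the answerer's actual code), string2sendbuffer might look something like this for a plain one-byte-per-character encoding; the out_len parameter is an addition here so the length survives the conversion, and the caller owns the returned buffer:
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: copy an ASCII string into a freshly allocated uint8_t buffer,
   one byte per character, without the trailing '\0'.
   The caller must free() the buffer after my_send(). */
uint8_t* string2sendbuffer(const char* str, size_t* out_len)
{
    size_t len = strlen(str);
    uint8_t* buffer = (uint8_t*)malloc(len);
    memcpy(buffer, str, len);
    if (out_len)
        *out_len = len;
    return buffer;
}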
It is implementation-defined whether the type char is signed or unsigned. It looks like you are using an environment where it is unsigned.
So, you can either use uint8_t or stick with char, whenever you are dealing with characters.

Convert pointer to int and back to typed object

I need a simple way to convert a pointer to an int, and then back to the pointer. I need it as an int for portability, since an int is easier to carry around in my project than a pointer.
Start player and return pointer:
SoundPlayer* player = new FxPlayerTiny();
return (int)player; // this works, but is it correct?
Stop player using pointer int:
FxPlayerTiny* player = static_cast<FxPlayerTiny*>((int*)num); // this gives me an error
FxPlayerTiny* player = (FxPlayerTiny*)((int*)obj); // this works, but is it correct?
delete player;
The best thing to do is to fix your project so it can deal with pointers. Integers and pointers are two different things. You can convert back and forth, but such conversions can lose information if you're not careful.
Converting a pointer value to int and back again can easily lose information, depending on the implementation and the underlying platform. For example, there are systems on which int is smaller than a pointer. If you have 32-bit ints and 64-bit pointers, then converting a pointer to an int and back again will almost certainly give you an invalid pointer.
It's very likely that long or unsigned long is wide enough to hold a converted pointer value without loss of information; I've never worked on a system where it isn't. (Personally, I tend to prefer unsigned types, but not for any really good reason; either should work equally well.)
So you could write, for example:
SoundPlayer* player = new FxPlayerTiny();
return reinterpret_cast<unsigned long>(player);
and convert the unsigned long value back to a pointer using reinterpret_cast<SoundPlayer*>.
Newer implementations provide the typedefs uintptr_t and intptr_t, which are unsigned and signed integer types guaranteed to work correctly for round-trip pointer-to-integer-to-pointer conversions. They were introduced in C99 and are optionally defined in the <stdint.h> header. (Implementations on which pointers are bigger than any integer type won't define them, but such implementations are rare.) C++ adopted them with the 2011 standard, defining them in the <cstdint> header; Microsoft Visual C++ supports them as of the 2010 version.
This guarantee applies only to ordinary pointers, not to function pointers or member pointers.
So if you must do this, you can write:
#include <cstdint>
SoundPlayer* player = new FxPlayerTiny();
return reinterpret_cast<std::uintptr_t>(player);
But first, consider this. new FxPlayerTiny() gives you a pointer value, and you're going to want a pointer value. If at all possible, just keep it as a pointer. Converting it to an integer means that you have to decide which of several techniques to use, and you have to keep track of which pointer type your integer value is supposed to represent. If you make a mistake, either using an integer type that isn't big enough or losing track of which pointer type you've stored, the compiler likely won't warn you about it.
Perhaps you have a good reason for needing to do that, but if you can just store pointers as pointers your life will be a lot easier.
Using an int is not recommended, precisely for portability reasons: sizeof (int) and sizeof (void*) can differ (for example when compiling for Windows x64).
If ints are needed for a user interface, it would probably be better to use a table of pointers and access them via indices.
However, if you do want to convert a pointer to an integer and vice versa, a simple cast is enough, but only if sizeof (int) == sizeof (void*).
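For illustration, the table-of-pointers idea could be as simple as the following sketch (the helper names are invented here; SoundPlayer is the type from the question):
#include <vector>
#include <cstddef>

class SoundPlayer;                     // from the question

std::vector<SoundPlayer*> g_players;   // the table; the index is the handle

// Store the pointer and hand out a small integer handle instead.
int registerPlayer(SoundPlayer* p)
{
    g_players.push_back(p);
    return (int)g_players.size() - 1;
}

// Turn the handle back into the pointer; returns NULL for a bad handle.
SoundPlayer* lookupPlayer(int handle)
{
    if (handle < 0 || (std::size_t)handle >= g_players.size())
        return NULL;
    return g_players[handle];
}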

Does buffer overflow happen in C++ strings?

This concerns strings in C++. I have not touched C/C++ for a very long time; in fact, I did programming in those languages only during my first year of college, about 7 years ago.
In C, to hold strings I had to create character arrays (whether static or dynamic is not the concern here). That meant I had to guess well in advance the size of the string the array would contain. I applied the same approach in C++. I was aware there was a std::string class, but I never got around to using it.
My question is: since we never declare the size of an array/string with the std::string class, can a buffer overflow occur when writing to it? I mean, in C, if the array's size was 10 and I typed more than 10 characters on the console, the extra data would be written into some other object's memory adjacent to the array. Can a similar thing happen with std::string when using the cin object?
Do I have to guess the size of the string beforehand in C++ when using std::string?
Well! Thanks to you all. There is no one right answer on this page (a lot of different explanations provided), so I am not selecting any single one as such. I am satisfied with the first 5. Take Care!
Depending on the member(s) you are using to access the string object, yes. So, for example, if you use reference operator[](size_type pos) where pos > size(), yes, you would.
Assuming no bugs in the standard library implementation, no. A std::string always manages its own memory.
Unless, of course, you subvert the accessor methods that std::string provides, and do something like:
std::string str = "foo";
char *p = (char *)str.c_str();
strcpy(p, "blah");
You have no protection here, and are invoking undefined behaviour.
std::string generally protects against buffer overflows, but there are still situations in which programming errors can lead to one. For example, the checked accessor at() throws an out_of_range exception when an index refers outside the bounds of the string, but the subscript operator [] performs no bounds checking at all.
Another problem occurs when converting std::string objects to C-style strings. If you use string::c_str() to do the conversion, you get a properly null-terminated C-style string. However, string::data(), which returns a pointer to the string's internal array, historically did not guarantee null termination; the only difference between c_str() and data() was that c_str() adds the trailing null byte. (Since C++11 the two are equivalent and both are null terminated.)
Finally, many existing C++ programs and libraries have their own string classes. To use these libraries, you may have to use these string types or constantly convert back and forth. Such libraries are of varying quality when it comes to security. It is generally best to use the standard library (when possible) or to understand the semantics of the selected library. Generally speaking, libraries should be evaluated based on how easy or complex they are to use, the type of errors that can be made, how easy these errors are to make, and what the potential consequences may be.
Refer to https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/coding/295-BSI.html
In C, the cause can be illustrated as follows:
#include <string.h>

void function (char *str) {
    char buffer[16];
    strcpy (buffer, str);   /* no bounds checking */
}

int main () {
    char *str = "I am greater than 16 bytes"; /* length of str = 27 bytes */
    function (str);
}
This program causes undefined behavior, because a string (str) of 27 bytes has been copied to a location (buffer) that has been allocated for only 16 bytes. The extra bytes run past the buffer and overwrite the space allocated for the frame pointer, return address, and so on, which in turn corrupts the process stack. The function used to copy the string is strcpy, which performs no bounds checking; using strncpy would have prevented this corruption of the stack. This classic example shows that a buffer overflow can overwrite a function's return address, which in turn can alter the program's execution path. Recall that a function's return address is the address of the next instruction in memory, which is executed immediately after the function returns.
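For completeness, a sketch of the strncpy-based fix mentioned above; note that strncpy does not null-terminate when the source is too long, so the terminator has to be written explicitly:
#include <string.h>

void function (char *str) {
    char buffer[16];
    strncpy (buffer, str, sizeof (buffer) - 1);  /* copy at most 15 bytes */
    buffer[sizeof (buffer) - 1] = '\0';          /* strncpy may omit the terminator */
}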
Here is a good tutorial that can give a satisfactory answer.
In C++, a std::string starts out with a small initial capacity (or you can reserve a starting capacity). If that capacity is exceeded, std::string allocates more dynamic memory.
Assuming the library providing std::string is correctly written, you cannot cause a buffer overflow by adding characters to a std::string object.
Of course, bugs in the library are not impossible.
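To make the contrast concrete, a small sketch: std::getline grows the string to fit whatever is typed, whereas a fixed char array has no such protection (before C++20, extraction into a plain char array is unbounded):
#include <iostream>
#include <string>

int main()
{
    std::string line;
    std::getline(std::cin, line);   // no size guess needed; the string grows as required
    std::cout << line.size() << " characters read\n";

    char fixed[10];
    std::cin >> fixed;              // unbounded before C++20: typing more than 9 characters overflows
    return 0;
}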
"Do buffer overflows occur in C++ code?"
To the extent that C programs are legal C++ code (they almost all are), and C programs have buffer overflows, C++ programs can have buffer overflows.
Being richer than C, I'm sure C++ can have buffer overflows in ways that C cannot :-}

View integer on memory locations allocated as char in Visual C++ 2008 debugger

I'm using Visual C++ 2008 to write and debug my project. I have a char* pointer. I want to view 4 bytes starting at my pointer as an integer in the debugger. How do I do it? (int)(*pointer) comes to mind but I'm afraid it will simply take the 1-byte value pointed to by pointer and convert it to an integer.
You have to cast your pointer to the desired pointer type and then dereference, like so:
*(int*)(pointer)
This works in GDB, though I imagine it's similar in other debuggers.
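As far as I know, the same cast-and-dereference expression can also be entered in the Visual Studio Watch or QuickWatch window:
*(int*)pointer
and appending a format specifier such as ,x shows the value in hexadecimal.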

Fast 64 bit comparison

I'm working on a GUI framework where I want all the elements to be identified by ASCII strings of up to 8 characters (7 would be OK too).
Every time an event is triggered (some are just clicks, but some are continuous), the framework would callback to the client code with the id and its value.
I could use actual strings and strcmp(), but I want this to be really fast (for mobile devices), so I was thinking of using multi-character constants (e.g. int id = 'BTN1';) so you'd be doing a single int comparison to test for the id. However, 4 chars isn't readable enough.
I tried an experiment, something like:
long int id = L'abcdefg';
...but it looks as if character constants can only hold 4 characters, and the only thing a wide character constant gives you is that your 4 characters can each be twice as wide, not that you can have twice as many of them. Am I missing something here?
I want to make it easy for the person writing the client code. The GUI is stored in XML, so the IDs are loaded from strings, but there would be constants written in the client code to compare them against.
So, the long and the short of it is: I'm looking for a cross-platform way to do a quick 7-8 character comparison. Any ideas?
Are you sure this is not premature optimisation? Have you profiled another GUI framework that is slow purely because of string comparisons? Why are you so sure string comparisons will be too slow? Surely you're not doing that many string compares. Also, consider that strcmp should have a near-optimal implementation, possibly written in assembly tailored for the CPU you're compiling for.
Anyway, other frameworks just use named integers, for example:
static const int MY_BUTTON_ID = 1;
You could consider that instead, avoiding the string issue completely. Alternatively, you could simply write a helper function to convert a const char[9] into a 64-bit integer. This should accept a null-terminated string "like so" of up to 8 characters (assuming you intend to throw away the null character). Then your program is passing around 64-bit integers, but the programmer is dealing with strings.
Edit: here's a quick function that turns a string into a number:
#include <string.h>

__int64 makeid(const char* str)
{
    __int64 ret = 0;
    strncpy((char*)&ret, str, sizeof(__int64));   // copies up to 8 chars, zero-padded
    return ret;
}
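A hypothetical usage, comparing an ID loaded from the XML against a constant in the client code (idFromXml is an invented name):
if (makeid(idFromXml) == makeid("BTN_OK")) {
    // handle the OK button
}
In a real build you would probably precompute the constant side once rather than call makeid on every comparison.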
One possibility is to define your IDs as a union of a 64-bit integer and an 8-character string:
union ID {
    Int64 id;      // Assuming Int64 is an appropriate typedef somewhere
    char name[8];
};
Now you can do things like:
ID id;
strncpy(id.name, "Button1", 8);
if (anotherId.id == id.id) ...
The concept of string interning can be useful for this problem, turning string compares into pointer compares.
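A minimal interning sketch (the names are invented here): each distinct string gets a stable small integer the first time it is seen, so later comparisons are plain int compares:
#include <map>
#include <string>
#include <utility>

int intern(const std::string& s)
{
    static std::map<std::string, int> table;
    std::map<std::string, int>::iterator it = table.find(s);
    if (it != table.end())
        return it->second;
    int id = (int)table.size();
    table.insert(std::make_pair(s, id));
    return id;
}

// IDs are interned once at load time; event dispatch then compares plain ints.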
Easy to get pre-rolled Components
binary search tree for the win -- you get a red-black tree from most STL implementations of set and map, so you might want to consider that.
Intrusive versions of the STL containers perform MUCH better when you move the container nodes around a lot (in the general case) -- however they have quite a few caveats.
Specific Opinion -- First Alternative
If I were you I'd stick to a 64-bit integer type, bundle it in an intrusive container, and use the library provided by Boost. However, if you are new to this sort of thing, then use std::map; it is conceptually simpler to grasp, and there is less chance of leaking resources, since there is more literature and there are more guides out there for these types of containers and their best practices.
Alternative 2
The problem you are trying to solve, I believe, is to have a global naming scheme that maps to handles. You can create a mapping of names to handles so that you can use the names to retrieve handles:
// WidgetHandle is a polymorphic base class (i.e., it has a virtual method),
// and foo::Luv implement WidgetHandle's interface (public inheritance)
foo::WidgetHandle * LuvComponent =
Factory.CreateComponent<foo::Luv>( "meLuvYouLongTime");
....
.... // in different function
foo::WidgetHandle * LuvComponent =
Factory.RetrieveComponent<foo::Luv>("meLuvYouLongTime");
Alternative 2 is a common idiom for IPC: you create an IPC object, say a pipe, in one process, and another process can ask the kernel to retrieve the other end of the pipe by name.
I see a distinction between easily read identifiers in your code, and the representation being passed around.
Could you use an enumerated type (or a large header file of constants) to represent the identifier? The names of the enumerated types could then be as long and meaningful as you wish, and still fit in (I am guessing) a couple of bytes.
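For example (illustrative names), an enumeration keeps the client code readable while the value passed around is just a small integer:
enum WidgetId {
    ID_OK_BUTTON = 1,
    ID_CANCEL_BUTTON,
    ID_VOLUME_SLIDER
};

// callback(ID_VOLUME_SLIDER, value);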
In C++0x, you'll be able to use user-defined string literals, so you could add something like 7chars..id or "7chars.."id:
template <char...> constexpr unsigned long long operator ""id();
constexpr unsigned long long operator ""id(const char *, size_t);
Although I'm not sure you can use constexpr for the second one.
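For what it's worth, a sketch of how the second form could be implemented once relaxed constexpr (C++14) is available; the underscore in _id is deliberate, since literal suffixes without one are reserved for the implementation, and the name is otherwise illustrative:
#include <cstddef>

// Packs up to 8 characters of the literal into a 64-bit value at compile time.
constexpr unsigned long long operator""_id(const char* s, std::size_t n)
{
    unsigned long long v = 0;
    for (std::size_t i = 0; i < n && i < 8; ++i)
        v |= (unsigned long long)(unsigned char)s[i] << (8 * i);
    return v;
}

// Usage: constexpr unsigned long long okId = "BTN_OK"_id;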