Global typecast operator overload? - c++

I'm writing some 'portable' code (meaning that it targets 32- and 64-bit MSVC2k10 and GCC on Linux) in which I have, more or less:
typedef unsigned char uint8;
In my code, C-strings are always uint8, for string-processing reasons. Legacy code needs char compiled as signed, so I can't set compiler switches to make it unsigned by default. But if I'm processing a plain char string, I can't safely use its characters to index an array:
char foo[500];
char *ptr = (foo + 4);
*ptr = some_array_that_normalizes_it[*ptr];
You can't index an array with a negative number at run-time without serious consequences. Keeping C-strings unsigned makes it much easier to protect against such bugs.
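The usual workaround for the negative-index problem is to convert each char to unsigned char right before using it as an index. A minimal sketch, assuming an ASCII-compatible character set and a hypothetical 256-entry table named `normalize_table` (standing in for `some_array_that_normalizes_it`) that simply folds upper case to lower case:

```cpp
#include <array>
#include <cstddef>

// Hypothetical 256-entry table: maps 'A'..'Z' to 'a'..'z' (assumes ASCII),
// every other byte to itself.
static std::array<unsigned char, 256> make_normalize_table() {
    std::array<unsigned char, 256> t{};
    for (std::size_t i = 0; i < t.size(); ++i) {
        t[i] = static_cast<unsigned char>(i);
    }
    for (int c = 'A'; c <= 'Z'; ++c) {
        t[static_cast<std::size_t>(c)] = static_cast<unsigned char>(c - 'A' + 'a');
    }
    return t;
}

static const std::array<unsigned char, 256> normalize_table = make_normalize_table();

// Normalizes a NUL-terminated char string in place. Converting each char to
// unsigned char first means bytes >= 0x80 index 128..255 instead of a
// negative offset, whether or not plain char is signed.
inline void normalize(char *s) {
    for (; *s != '\0'; ++s) {
        unsigned char idx = static_cast<unsigned char>(*s);
        *s = static_cast<char>(normalize_table[idx]);
    }
}
```

The cast is local to the one place where a char becomes an index, so the rest of the code can keep using plain char.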
I would really like to not have to keep casting to (char *) every time I use a function that takes char *s, and also to stop duplicating class functions so that they accept either type. This is especially a pain because a string literal is implicitly passed as a const char *:
int foo = strlen("Hello"); // "Hello" is passed as a const char *
I want all of these to work:
char foo[500] = "Hello!"; // Works
uint8 foo2[500] = "Hello!"; // Works
uint32 len = strlen(foo); // Works
uint32 len2 = strlen(foo2); // Doesn't work
uint32 len3 = strlen((char *)foo2); // Works
There are probably caveats to allowing implicit type conversions of this nature; however, it'd be nice to use functions that take a char * without a cast every time.
So, I figured something like this would work:
operator char* (const uint8* foo) { return (char *)foo; }
However, it does not. I can't figure out any way to make it work. I also can't find anything that tells me why there seems to be no way to do this. I can see the possible logic (implicit conversions like that could be a cause of FAR too many bugs), but I can't find anything that says "this will not work in C++" or why, or how to make it work (short of making uint8 a class, which is ridiculous).

Global cast (typecast) operators, global assignment operators, global array subscript operators and global function call operators cannot be overloaded in C++: conversion functions, operator=, operator[] and operator() must all be non-static member functions.
MSVC generates error C2801 for them. See Wikipedia's list of C++ operators for each operator's overloading rules.
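The rule can be seen in a small sketch: the same conversion that is rejected as a free function is accepted as a member of a class type. The wrapper name `u8str` is hypothetical, and it is essentially the "make uint8 a class" route the question wanted to avoid, so treat it as an illustration of the rule rather than a recommendation:

```cpp
#include <cstring>

typedef unsigned char uint8;

// A conversion function is only allowed as a non-static member, so the
// implicit conversion has to live inside a class type.
struct u8str {
    uint8 *p;
    explicit u8str(uint8 *ptr) : p(ptr) {}
    // Member conversion operator: accepted by the compiler.
    operator char *() const { return reinterpret_cast<char *>(p); }
};

// The free-standing form from the question is rejected (MSVC: C2801),
// so it appears here only as a comment:
// operator char *(const uint8 *foo);
```

With this, `std::strlen(u8str(foo2))` compiles, because the user-defined conversion to `char *` is followed by the implicit qualification conversion to `const char *`.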

I'm not a big fan of operator [ab]using, but that's what C++ is for, right?
You can do the following:
const char* operator+(const uint8* foo)
{
return (const char *)foo;
}
char* operator+(uint8* foo)
{
return (char *)foo;
}
With those defined, your example from above:
uint32 len2 = strlen(foo2);
will become
uint32 len2 = strlen(+foo2);
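It is not an automatic cast, but this way you have an easy, yet explicit, way of doing it. One caveat: standard C++ requires an overloaded operator to have at least one parameter of class or enumeration type, so conforming compilers reject these pointer-only overloads of unary +.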
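A short named helper gives a similarly explicit call site using only ordinary overloaded functions, which have no restriction on parameter types. A minimal sketch; the name `as_chars` is hypothetical and `uint8` is the typedef from the question:

```cpp
#include <cstring>

typedef unsigned char uint8;

// Hypothetical helpers that view unsigned-byte strings as char strings.
// Accessing unsigned char data through a char* is allowed by the aliasing rules.
inline const char *as_chars(const uint8 *s) {
    return reinterpret_cast<const char *>(s);
}

inline char *as_chars(uint8 *s) {
    return reinterpret_cast<char *>(s);
}
```

The call from the question then reads `uint32 len2 = strlen(as_chars(foo2));`, and the const overload keeps const-correctness intact for read-only buffers.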

Both compilers you mention do have a "treat chars as unsigned" switch (-funsigned-char in GCC, /J in MSVC). Why not use that?

Related

How to pass on 'char* data' when the data is stored as vector of uint8_t?

I have a class defined like this:
class sco
{
private:
public:
vector<uint8_t> data;
sco(vector<uint8_t> data);
~sco();
};
Where the constructor is:
sco::sco(vector<uint8_t> data) {
this->data = data;
}
I then have a function which is declared like this:
void send(unsigned& id, char* data, char len);
My problem is that I need to pass the data of a sco member to it, but the difference in types and the pointer is confusing me. If I have a member newSco with some data in it, would it be reasonable to call this send function as send(someId, (char*)&(newSco.data.begin()), newSco.data.size() ); ? Please note that the function send is for a microcontroller, and it takes a char type, so I can't change that; neither can I change uint8_t, as that is the type coming in from a serial communication. I have wasted over 3 days trying to convert the types to something common, only to revert it because it broke everything. I give up; I will no longer try to manipulate the types, as I just do not have that sort of time and need it to work even if it is bad practice. I thought uint8_t and char were the same size, so it shouldn't matter.
send(someId, (char*)&(newSco.data.begin()), newSco.data.size() )
You almost had it with this, but not quite.
Here's why:
begin() gives you an iterator. You're taking the address of that iterator, so you're a level of indirection off. Using a C-style cast has masked what would otherwise be a type-related compilation error.
We could write (char*)&*(newSco.data.begin()) instead to dereference the iterator then take the address of the resulting first element.
But if the container were empty, this is very broken. You can't dereference a thing that doesn't exist.
So now we try:
send(someId, (char*)&newSco.data[0], newSco.data.size() )
Unfortunately this is also not safe if the container is empty, since .data[0] also effectively dereferences an element that may not exist. Some argue that the implied &* "cancels it out", but that's controversial and I've never believed it.
If you ever move to C++11 or later, you can use the perfectly safe:
send(someId, (char*)newSco.data.data(), newSco.data.size() )
Otherwise, stick with &newSco.data[0] but skip the entire send call when newSco.data.size() is zero. Can't emphasise that enough.
The cast to char* is itself safe; you can freely interpret uint8_ts as chars in this manner; there's a special rule for it. I've used this pattern myself a few times.
But, as above, a C-style cast is less than ideal. Prefer a nice reinterpret_cast. I'd also add some const for good measure (skip this if your microcontroller's API doesn't accept a const char *):
if (!newSco.data.empty())
{
send(
someId,
reinterpret_cast<const char*>(&newSco.data[0]),
newSco.data.size()
);
}
There. Gorgeous. 😊
As a final note, since the API takes a char for the final parameter, I'd consider putting an upper bound on the container size. You can run the function in a loop, sending either CHAR_MAX bytes or the number of remaining bytes at a time (whichever is smaller). Otherwise, if you expect more than CHAR_MAX elements in the container, you're going to get a nasty overflow!
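The chunking idea can be sketched as a loop that never passes more than CHAR_MAX bytes to one call. This assumes the `send(unsigned&, char*, char)` signature from the question; the `send` below is a hypothetical stand-in that only records chunk sizes so the sketch is self-contained, and `send_all` is likewise a hypothetical name:

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the microcontroller API: records each chunk size.
std::vector<std::size_t> sent_sizes;
void send(unsigned &id, char *data, char len) {
    (void)id;
    (void)data;
    sent_sizes.push_back(static_cast<std::size_t>(len));
}

// Sends the whole buffer in pieces of at most CHAR_MAX bytes, so each length
// fits in the char parameter. An empty vector results in no calls at all.
void send_all(unsigned &id, std::vector<std::uint8_t> &data) {
    const std::size_t chunk_max = static_cast<std::size_t>(CHAR_MAX);
    std::size_t offset = 0;
    while (offset < data.size()) {
        std::size_t n = std::min(chunk_max, data.size() - offset);
        send(id, reinterpret_cast<char *>(&data[offset]), static_cast<char>(n));
        offset += n;
    }
}
```

Because the loop only runs while bytes remain, it also covers the empty-container case discussed above. If char is unsigned on the target, CHAR_MAX is 255 rather than 127, and the loop adapts automatically.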
This should work (the pointer needs reinterpret_cast, since static_cast cannot convert uint8_t* to char*):
send(someId, reinterpret_cast<char*>(&newSco.data[0]), static_cast<char>(newSco.data.size()));
Additional Info:
How to convert vector to array
What is the difference between static_cast<> and C style casting?
Yes, it is reasonable to cast the pointer to char* in this case. In general, you may not access the same memory location through pointers of different types, due to the strict aliasing rules. However, char is a special case, so the cast is allowed.
You shouldn't cast a float* to uint32_t*, though, for example, even though it may work in some cases.
Edit: As the comments below noted, casting from uint8_t* to char* may be fine, but casting from an iterator is not. Use .data() instead of .begin().
If you can use at least C++11, the std::vector class provides data(), which returns a pointer to the underlying array. Before C++11, you can use &a[0] on a non-empty vector (vector storage is contiguous), or copy the elements into a separate char buffer as shown below.
But you cannot static_cast an unsigned char* to a char*; it is not allowed. Luckily, char* is an exception to the strict aliasing rule, so you can use reinterpret_cast instead.
So you may solve your problem as follows:
Before C++11:
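std::vector<uint8_t> a; // For the example
a.push_back('a');
a.push_back('b');
a.push_back('c');
a.push_back('d');
std::vector<char> b(a.size()); // char b[a.size()] would be a VLA, which is not standard C++
for(std::size_t i = 0; i < a.size(); ++i)
{
b[i] = static_cast<char>(a[i]);
}
// &b[0] is now a char* to the copied bytes (valid because a is non-empty)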
After C++11:
std::vector<uint8_t> a {'a', 'b', 'c', 'd'}; //uint8_t aka unsigned char
char * b = reinterpret_cast<char*>(a.data()); // b is a char* so the reinterpret_cast is safe
I hope it can help.
Try using reinterpret_cast.
Example:
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>
int main()
{
std::vector<std::uint8_t> v{65,66,67,68,69,70};
char* ptr = reinterpret_cast<char*>(v.data());
for (std::size_t i = 0; i < v.size(); i++) {
std::cout << *ptr++ << std::endl;
}
return 0;
}
For your case:
send(someId, reinterpret_cast<char*>(newSco.data.data()), newSco.data.size());

working with binary data and unsigned char

I'm writing a program that reads the contents of a binary file (specifically, a Windows PE file; see the Wikipedia page and the detailed PE structure).
Because the file contains binary data, the bytes often fall outside the ASCII range (0-127), and when they are read as (signed) char, that results in negative values.
To make sure I won't work with unwanted negative values, I can either pass const unsigned char * or convert the resulting char in the calculation to unsigned char.
On one hand, passing const unsigned char * makes sense because the data is non-ASCII with a numeric value, and thus should be treated as non-negative.
In addition, it'll let me perform calculations without the need to cast the result to unsigned char.
On the other hand, I can't pass constant strings (const char *, such as pre-defined strings "MZ", "PE\0\0" etc.) to functions without first casting them to const unsigned char *.
What would be the better approach or best-practice in this scenario?
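To make the negative-value problem concrete: the PE format stores the offset of the "PE\0\0" header as a little-endian 32-bit value (e_lfanew) at offset 0x3C of the DOS header. Assembling it from `const unsigned char *` bytes needs no casts on the data, whereas reading the same bytes through a signed `char` would sign-extend any byte of 0x80 or above (0x80 becomes -128, which converts to 0xFFFFFF80). A minimal sketch; the function names are hypothetical and the buffer is assumed to hold at least 0x40 bytes:

```cpp
#include <cstdint>

// Reads a little-endian 32-bit value starting at p. Each byte is already
// in 0..255, so the shifts and ORs cannot pick up sign-extended bits.
inline std::uint32_t read_le32(const unsigned char *p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns e_lfanew, the file offset of the "PE\0\0" signature.
// Assumes the caller has checked that at least 0x40 bytes are available.
inline std::uint32_t pe_header_offset(const unsigned char *file) {
    return read_le32(file + 0x3C);
}
```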
I think I'd use unsigned char, but avoid casting, and instead define a little class named ustring (or something similar). You have a couple of choices with that. One would be to instantiate std::basic_string over unsigned char. This can be useful: it gives you all of std::string's functionality, but with unsigned chars instead of chars. The obvious disadvantage is that it's probably overkill, and has essentially no compatibility with std::string, even though it's almost exactly the same thing.
The other obvious possibility would be to define your own class. Since you apparently care mostly about string literals, I'd probably go this way. The class would be initialized with a string literal, and it would just hold a pointer to the string, but as unsigned char * instead of just char *.
Then there's one more step to make life better: define a user defined literal operator named something like _us, so creating an object of your type from a string literal will look something like this: auto DOS_sig = "MZ"_us;
#include <cstddef>
#include <cstring>
class ustring {
unsigned char const *data;
std::size_t len;
public:
ustring(unsigned char const *s, std::size_t len)
: data(s)
, len(len)
{}
// unsigned char const * does not convert implicitly to char const *
operator char const *() const { return reinterpret_cast<char const *>(data); }
bool operator==(ustring const &other) const {
// note: memcmp treats what you pass it as unsigned chars.
return len == other.len && 0 == std::memcmp(data, other.data, len);
}
// you probably want to add more stuff here.
};
// A string literal operator's length parameter must be std::size_t.
ustring operator"" _us(char const * const s, std::size_t len) {
return ustring(reinterpret_cast<unsigned char const *>(s), len);
}
If I'm not mistaken, this should be pretty easy to work with. For example, let's assume you've memory-mapped what you think is a PE file, with its base address (as an unsigned char *) at mapped_file. To see if it has a DOS signature, you might do something like this:
if (ustring(&mapped_file[0], 2) == "MZ"_us)
std::cerr << "File appears to be an executable.\n";
else
std::cerr << "file does not appear to be an executable.\n";
Caution: I haven't tested this, so fencepost errors and such are likely--for example, I don't remember whether the length passed to the user defined literal operator includes the NUL terminator or not. This isn't intended to represent finished code, just a sketch of a general direction that might be useful to explore.

c++ char * initialization in constructor

I'm just curious, I want to know what's going on here:
class Test
{
char * name;
public:
Test(char * c) : name(c){}
};
1) Why won't Test(const char * c) : name(c){} work? Because char * name isn't const? But what about this:
int main(){
char * name = "Peter";
}
name is char*, but "Peter" is const char*, right? So how does that initialization work?
2) Test(char * c) : name(c){ c[0] = 'a'; } - this crashes the program. Why?
Sorry for my ignorance.
Why won't Test(const char * c) : name(c) {} work? Because char * name isn't const?
Correct.
how does this initialization work: char * name = "Peter";
A C++ string literal is of type char const[N] (as opposed to just char[N] in C, as C originally didn't have the const keyword[1]). This initialization is deprecated in C++, yet it is still allowed[2] for backward compatibility with C.
Test(char * c) : name(c) { c[0] = 'a'; } crashes the program. Why?
What are you passing to Test when initializing it? If you're passing a string literal or an invalid pointer, doing c[0] = 'a' is undefined behavior, and a crash is a typical result.
[1] The old version of the C programming language (as described in the K&R book published in 1978) did not include the const keyword. Since then, ANSI C borrowed the idea of const from C++.
[2] Valid in C++03, no longer valid in C++11.
A conversion to const is a one-way street, so to speak.
You can convert from T * to T const * implicitly.
Conversion from T const * to T * requires an explicit cast. Even if you started from T *, then converted to T const *, converting back to T * requires an explicit cast, even though it's really just "restoring" the access you had to start with.
Note that throughout, T const * and const T * are precisely equivalent, and T stands for "some arbitrary type" (char in your example, but could just as easily be something else like int or my_user_defined_type).
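The one-way street can be checked at compile time with the standard `<type_traits>` header: `T *` converts implicitly to `T const *`, the reverse does not, and getting back requires `const_cast`. A minimal sketch using `char` and `int` as the arbitrary `T` (the function name `restore_and_write` is hypothetical):

```cpp
#include <type_traits>

// T* -> T const*: implicit.
static_assert(std::is_convertible<char *, char const *>::value,
              "adding const is implicit");
// T const* -> T*: not implicit, for any T.
static_assert(!std::is_convertible<char const *, char *>::value,
              "removing const needs a cast");
static_assert(!std::is_convertible<int const *, int *>::value,
              "same rule for int");

// "Restoring" access you started with: legal because the object itself is
// not const, but it still takes an explicit const_cast.
inline int restore_and_write() {
    int value = 1;
    int const *read_only = &value;                 // implicit
    int *writable = const_cast<int *>(read_only);  // explicit cast required
    *writable = 2;                                 // OK: value is not a const object
    return value;
}
```

Writing through the cast pointer is only well-defined because `value` was never const; doing the same to a string literal is the undefined behavior discussed next.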
Initializing a char * from a string literal (e.g., char *s = "whatever";) was allowed even though it violates this general rule (the literal itself is basically const, but you're creating a non-const pointer to it). This is simply because there's lots of code that depends on doing this, and nobody was willing to break that code, so there was a rule to allow it. That rule was deprecated in C++03 and removed in C++11, so a conforming compiler now has to diagnose such code, although many compilers still accept it with just a warning.
Since the string literal itself is basically const, any attempt at modifying it results in undefined behavior. On most modern systems, this will result in the process being terminated, because the memory storing the string literal will be marked as read-only. That's not the only possible result. Just for example, back in the days of MS-DOS, it would often succeed. It could still have bizarre side effects, though. For one example, many compilers "knew" that string literals were supposed to be read-only, so they'd "merge" identical string literals. Therefore, if you had something like:
char *a = "Peter"; a[1] = 'a';
char *b = "Peter";
cout << b;
The compiler would have "merged" a and b to actually point at the same memory -- so when you modified a, that change would also affect b, so it would print out "Pater" instead of "Peter".
Note that the string literals didn't need to be entirely identical for this to happen either. As long as one was identical to the end of another, they could be merged:
char *a = "this?";
char *b = "What's this?";
a[2] = 'a';
a[3] = 't';
cout << b; // could print "What's that?"
Mandating one behavior didn't make sense, so the result was (and is) simply undefined.
First of all this is C++, you have std::string. You should really consider using it.
Regarding your question, "Peter" is a string literal, hence it is unmodifiable and you certainly can't write to it. You can:
have a const char * member variable and initialize it as you are doing, name(c), passing "Peter" as a const char *
have a char * member variable and copy the contents, e.g. name(strdup(c)) (and remember to release it with free() in the destructor).
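Both options can be sketched side by side. The second class stores a `std::string` copy (as other answers here recommend) instead of calling `strdup`, so no destructor or copy handling is needed. The class names `TestView` and `TestCopy` and the `set_first` member are hypothetical variants of the question's `Test`:

```cpp
#include <string>

// Option 1: keep a pointer, but make it const so string literals are fine.
// The caller must keep the pointed-to string alive.
class TestView {
    const char *name;
public:
    explicit TestView(const char *c) : name(c) {}
    const char *get() const { return name; }
};

// Option 2: keep a private, modifiable copy of the characters.
class TestCopy {
    std::string name;
public:
    explicit TestCopy(const char *c) : name(c) {}
    // Modifies the copy, never the caller's (possibly read-only) string.
    void set_first(char ch) {
        if (!name.empty()) name[0] = ch;
    }
    const std::string &get() const { return name; }
};
```

With `TestCopy`, the question's `c[0] = 'a'` becomes `set_first('a')`, which is safe even when the constructor argument was a string literal.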
Correct.
"Peter" is typically stored in a read-only memory location (actually, it depends on what type of device we are on) because it is a string literal. It is undefined what happens when you attempt to modify a string literal (but you can probably guess that you shouldn't).
You should use std::string anyways.
1a) Right
1b) In C++, "Peter" is a const char[6], not a char*; in C it is a plain char array, but it still may not be modified. The conversion to char* is allowed for compatibility with the times before const existed in the language. A lot of code already existed that said char* p = "fred"; and they couldn't just make that code illegal overnight.
2) Can't say why that would crash the program without seeing how you are using that constructor.

Conversion from CString to char*/TCHAR*

I am well aware of techniques to convert CString to a C-style character. One of them is to use strcpy/_tcscpy, and others include using CStrBuf.
The problem:
char Destination[100];
CStringA Source; // A is for simplicity and explicit ANSI specification.
Source = "This is source string.";
Now I want this:
Destination = Source;
To happen automatically. Logically, that means writing a conversion operator in the CString class. But, convenient as that would be, I don't have privileges to change the CString class.
I thought of writing a global conversion operator and a global assignment operator. But it doesn't work:
operator char* (const CStringA&); // Error - At least must be class-type
operator = ... // Won't work either - cannot be global.
Yes, it is definitely possible to write a function (preferably a templated one). But that involves calling the function, and it is not as smooth as the assignment operator.
You cannot assign to arrays. This makes what you want impossible. Also, honestly, it's a pretty wrong thing to do - a magic-number-sized buffer?
Well, I don't want to say that this is in any way recommendable, but you could hijack some lesser-used operator for a quick hack:
#include <cstring>
#include <string>
void operator<<=(char * dst, const std::string & s)
{
std::strcpy(dst, s.c_str());
}
int main()
{
char buf[100];
std::string s = "Hello";
buf <<= s;
}
You could even rig up a moderately safe templated version for statically sized arrays:
template <std::size_t N>
inline void operator<<=(char (&dst)[N], const std::string & s)
{
std::strncpy(dst, s.c_str(), N - 1);
dst[N - 1] = '\0'; // strncpy does not terminate when s is too long
}
An operator on CString won't solve the problem, since you need to copy into the Destination buffer, whereas this assignment would change the value of Destination itself, which is impossible for an array.
Somehow, you need an operator to achieve this line:
strcpy(Destination, LPCSTR(Source)); // + buffer overflow protection
As you can see, converting Source is only half way. You still need to copy to the destination buffer.
Also, I wouldn't recommend it because the line Destination = Source is completely misleading in regard of the char[] semantics.
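The only thing resembling such an assignment would be an initialization of Destination, and even that is only allowed from a string literal or a braced list, not from a CStringA, so this does not compile either: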
char Destination[100] = Source;
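What the strcpy line with "buffer overflow protection" has to do can be sketched as a function template that deduces the buffer size, copies at most N - 1 characters, and always NUL-terminates. This assumes the source is available as a `const char *` (a `CStringA` converts to `LPCSTR` for this, and since only the first parameter is deduced, that implicit conversion is allowed); `copy_to_buffer` is a hypothetical name:

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dst, truncating if needed, and always NUL-terminates.
// Returns false if the string had to be truncated.
template <std::size_t N>
bool copy_to_buffer(char (&dst)[N], const char *src) {
    static_assert(N > 0, "destination needs room for the terminator");
    const std::size_t len = std::strlen(src);
    const std::size_t n = len < N - 1 ? len : N - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n == len;
}
```

The call site becomes `copy_to_buffer(Destination, Source);`, which is still a function call rather than an assignment, but the buffer size can no longer be passed wrongly.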

implementation Strcat Function

I've got a programming question about the implementation of the strcat() function.
I have been trying to solve that problem and I got an access violation error.
My created function:
char str_cat(char str1, char str2)
{
return str1-'\0'+str2;
}
what is wrong in the above code?
One more question please,
is "iostream" a header file? where can I get it?
thanks
Unfortunately, everything is wrong with this function, even the return type and argument types. It should look like
char * strcat(const char *str1, const char *str2)
and it should work by allocating a new block of memory large enough to hold the concatenated strings using either malloc (for C) or new (for C++), then copy both strings into it. I think you've got your (home)work cut out for you, though, as I don't think you know much of what any of that means.
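The allocate-and-copy approach described above can be sketched in C++ with `new[]`. The name `str_cat_alloc` is hypothetical (it avoids clashing with the standard `strcat`), and the caller owns the result and must release it with `delete[]`:

```cpp
#include <cstddef>
#include <cstring>

// Returns a newly allocated string holding str1 followed by str2.
// The caller must free the result with delete[].
inline char *str_cat_alloc(const char *str1, const char *str2) {
    const std::size_t len1 = std::strlen(str1);
    const std::size_t len2 = std::strlen(str2);
    char *result = new char[len1 + len2 + 1];    // +1 for the terminating '\0'
    std::memcpy(result, str1, len1);
    std::memcpy(result + len1, str2, len2 + 1);  // also copies str2's '\0'
    return result;
}
```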
Nothing is right in the above code.
You need to take char * parameters
You need to return a char * if you have to return something (which isn't needed)
You'll need to loop over the string copying individual characters - no easy solution with + and -
You'll need to 0-terminate the result
E.g. like this:
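// named str_cat to avoid clashing with the standard strcat
void str_cat(char * Dest, char const * Src) {
char * d = Dest;
while (*d) ++d; // stop ON the terminating '\0', not one past it
char const * s = Src;
while (*s) { *d++ = *s++; }
*d = 0;
}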
Why do you need to do this? There's a perfectly good strcat in the standard library, and there's a perfectly good std::string which you can use + on.
Don't want to sound negative but there is not much right with this code.
Firstly, strings in C are char*, not char.
Second, there is no way to 'add' or 'subtract' them the way you would hope (which is sort of kind of possible in, say, Python).
iostream is the standard I/O header for C++, it should be bundled with your distribution.
I would really suggest a tutorial on pointers to get you going (I found one just by googling "ansi c pointers"). I'm guessing the problem asks you for a C answer as opposed to C++, since in C++ you would use std::string and its overloaded operator+.