Strings of unsigned chars

Here's an interesting one. I'm writing an AES encryption algorithm, and have managed to get it making accurate encryptions. The trouble comes when I attempt to write the result to a file. I was getting files with incorrect output. Hex values would be mangled and it was just generally nonsensical (even by encrypted standards).
I did some debugging by sampling my encryption output before sending it to the file. What I found was that I was getting some type of overflow somewhere. When the correct hex value was supposed to be 9e, I would get ffffff9e. It would do this only to hex values above 7F, i.e. characters in the "extended" character set weren't being handled properly. This had happened to me earlier in my project as well, and the problem then had been using a char[][] container instead of an unsigned char[][] container.
My code uses strings to pass the encrypted data between the user interface and AES encryption class. I'm guessing that std::strings don't support the extended character set. So my question is: is there a way to instantiate an unsigned string, or will I have to find a way to replace all of my usage of strings?

std::string is really just a typedef, something like:
namespace std {
typedef basic_string<char> string;
}
It's fairly easy to create a variant for unsigned char:
typedef basic_string<unsigned char> ustring;
You will, however, have to change your code to use a ustring (or whatever name you prefer) instead of std::string.
Depending on how you've written your code, that may not require editing all of it. In particular, if you have something like:
namespace crypto {
using std::string;
class AES {
string data;
// ..
};
}
You can change the string type by changing only the using declaration:
namespace unsigned_types {
typedef std::basic_string<unsigned char> string;
}
// ...
namespace crypto {
using unsigned_types::string;
class AES {
string data;
};
}
Also note that different instantiations of a template are entirely separate types, even when the types over which they're instantiated are related, so the fact that you can convert implicitly between char and unsigned char doesn't mean you'll get a matching implicit conversion between basic_string<char> and basic_string<unsigned char>.
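Since the two instantiations are unrelated types, moving data between a signed and an unsigned container takes an explicit copy. A minimal sketch of such a copy through the iterator-range constructors follows. It uses std::vector<unsigned char> as the unsigned container, which is an assumption made here because the standard only requires std::char_traits for the character types, so basic_string<unsigned char> may not build with every standard library. The names Bytes, to_bytes and to_text are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical alias for a buffer of raw bytes.
using Bytes = std::vector<unsigned char>;

// Copy a std::string into an unsigned buffer; each char is converted
// to unsigned char, so 0x9e stays 0x9e.
Bytes to_bytes(const std::string& s) {
    return Bytes(s.begin(), s.end());
}

// Copy the bytes back into a std::string, e.g. to hand them to an
// interface that still expects std::string.
std::string to_text(const Bytes& b) {
    return std::string(b.begin(), b.end());
}
```

The same range constructor should work with a ustring typedef where the library supports it; only the implicit conversion is missing.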

std::string is nothing more or less than a specialization of the std::basic_string<> template, so you can simply do a
typedef std::basic_string<unsigned char> ustring;
to get what you want.
Note that the C and C++ standards leave it implementation-defined whether plain char is signed or unsigned, so any program that converts a char directly to a larger integer type relies on implementation-defined behaviour.

Cast your value to unsigned char first:
char input = 250; // just an example
unsigned int n = static_cast<unsigned char>(input); // NOT: "unsigned int n = input;"
// ^^^^^^^^^^^^^^^^^^^^^^^^^^
The problem is that your char happens to be signed, and so its value is not the "byte value" that you want -- you have to convert to unsigned char to get that.
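The ffffff9e symptom from the question is what this sign extension looks like when a signed char is widened before being printed as hex. A small sketch of a hex dump that avoids it, assuming the encrypted data arrives in a std::string; the helper name to_hex is hypothetical:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format every byte of `data` as two lowercase hex digits.
std::string to_hex(const std::string& data) {
    std::string out;
    char buf[3];  // two hex digits plus the terminating '\0'
    for (char c : data) {
        // Go through unsigned char first, so that 0x9e is widened to
        // 0x0000009e rather than 0xffffff9e where plain char is signed.
        const unsigned value = static_cast<unsigned char>(c);
        std::snprintf(buf, sizeof buf, "%02x", value);
        out += buf;
    }
    return out;
}
```

The bytes stored in the std::string are unchanged throughout; only the widening step needs the cast.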

Related

working with binary data and unsigned char

I'm writing a program that reads the contents of a binary file (specifically a Windows PE file; see the Wikipedia page and the detailed PE structure).
Because of the binary data in the file, the characters often "fall out" of the ASCII range (0-127), which results in negative values.
To make sure I won't work with unwanted negative values, I can either pass const unsigned char * or convert the resulting char in the calculation to unsigned char.
On one hand, passing const unsigned char * makes sense because the data is non-ASCII with a numeric value and thus should be treated as positive.
In addition, it'll let me perform calculations without the need to cast the result to unsigned char.
On the other hand, I can't pass constant strings (const char *, such as pre-defined strings "MZ", "PE\0\0" etc.) to functions without first casting them to const unsigned char *.
What would be the better approach or best-practice in this scenario?

I think I'd use unsigned char, but avoid casting, and instead define a little class named ustring (or something similar). You have a couple of choices with that. One would be to instantiate std::basic_string over unsigned char. This can be useful (it gives you all of std::string's functionality, but with unsigned chars instead of chars). The obvious disadvantage is that it's probably overkill, and has essentially no compatibility with std::string, even though it's almost exactly the same thing.
The other obvious possibility would be to define your own class. Since you apparently care mostly about string literals, I'd probably go this way. The class would be initialized with a string literal, and it would just hold a pointer to the string, but as unsigned char * instead of just char *.
Then there's one more step to make life better: define a user defined literal operator named something like _us, so creating an object of your type from a string literal will look something like this: auto DOS_sig = "MZ"_us;
#include <cstddef>  // std::size_t
#include <cstring>  // std::memcmp
class ustring {
    unsigned char const *data;
    std::size_t len;
public:
    ustring(unsigned char const *s, std::size_t len)
        : data(s)
        , len(len)
    {}
    // data points at unsigned char, so this conversion needs an explicit cast.
    operator char const *() const { return reinterpret_cast<char const *>(data); }
    bool operator==(ustring const &other) const {
        // note: memcmp treats what you pass it as unsigned chars.
        return len == other.len && 0 == std::memcmp(data, other.data, len);
    }
    // you probably want to add more stuff here.
};
// A string literal operator must take (char const *, std::size_t).
ustring operator""_us(char const * const s, std::size_t len) {
    return ustring(reinterpret_cast<unsigned char const *>(s), len);
}
If I'm not mistaken, this should be pretty easy to work with. For example, let's assume you've memory mapped what you think is a PE file, with its base address at mapped_file. To see if it has a DOS signature, you might do something like this:
if (ustring(&mapped_file[0], 2) == "MZ"_us)
std::cerr << "File appears to be an executable.\n";
else
std::cerr << "file does not appear to be an executable.\n";
Caution: I haven't tested this, so fencepost errors and such are likely--for example, I don't remember whether the length passed to the user defined literal operator includes the NUL terminator or not. This isn't intended to represent finished code, just a sketch of a general direction that might be useful to explore.
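On the open question in the caution above: the length passed to a string literal operator does not include the terminating NUL, and embedded NULs in the literal are counted, so "PE\0\0" arrives with a length of 4. A pared-down sketch of the same idea that can be compiled and checked; ByteView, _bv and starts_with are hypothetical names, not part of the answer's ustring class:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical non-owning view of a run of bytes.
struct ByteView {
    const unsigned char* data;
    std::size_t len;
};

// String literal operator: `n` excludes the terminating NUL but counts
// any NULs written inside the literal.
ByteView operator""_bv(const char* s, std::size_t n) {
    return { reinterpret_cast<const unsigned char*>(s), n };
}

// True when the buffer begins with the given signature.
bool starts_with(const unsigned char* buf, std::size_t buf_len, ByteView sig) {
    return buf_len >= sig.len && std::memcmp(buf, sig.data, sig.len) == 0;
}
```

With this, a check such as starts_with(mapped_file, file_size, "MZ"_bv) needs no cast at the call site (file_size is an assumed variable holding the mapped length).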

string vs char* as class member variables. Why use char* at all?

class Student {
public:
string name;
};
vs
class Student {
public:
char* name;
};
Please correct me if I'm wrong. If we were to use char* instead of string, we would have to write our own copy constructor, as we must whenever a class has pointer data members. Right?
So, my question is: Why use char* at all?
Using string, in the constructor, we can directly do:
Student(string s) {
name = s;
}
which is simpler compared to char*, which needs:
Student(string s) {
    name = new char[s.size() + 1]; // extra 1 to store the terminating '\0'
    strcpy(name, s.c_str());
}
Why not use string at all times instead of char* when being used as a data member of a class?

I think the only reason char* is used in C++ as a string is because of C. I'm sure if it was a new language, one which didn't strive to be compatible with C, char* would not be used like that. You will notice that functions that handle char* as if it were a string all come from C.
Note that in C, there is no string, so
struct Student { char* name; };
is perfectly valid C code, whereas
struct Student { string name; };
is not. Therefore, it is not unusual, when dealing with code which previously targeted C, to see those char* types.
There is usually little reason for using char* as a string, unless you are either writing a new string class, interfacing with C functions, or dealing with legacy code.
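To illustrate the copy-constructor point from the question: with a std::string member, the compiler-generated copy operations already make deep copies, so none of the manual memory management is needed. A minimal sketch, written with std:: qualification and a member initializer list as a stylistic assumption:

```cpp
#include <string>
#include <utility>

class Student {
public:
    // Take the name by value and move it into the member.
    explicit Student(std::string s) : name(std::move(s)) {}

    // No user-written copy constructor, assignment operator or destructor:
    // std::string manages its own storage, so the defaults copy deeply.
    std::string name;
};
```

Copying a Student yields an independent name, whereas copying the char* version without a hand-written copy constructor would leave two objects pointing at the same buffer.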

You use char * instead of string, because a string is a string and a char * is a pointer to a character-aligned address.
Expanding on that, a string is an abstraction of a vector of characters with defined semantics. In C land, and in a lot of C++ programs, it represents an allocated block of memory along with a guarantee that it's terminated with the ascii NUL character 0x00. But a C++ implementation of string could instead use, say, a Pascal string with associated length, or it could represent strings in a string pool as a linked list.
A char * isn't providing that guarantee at all, and in fact might not be a string -- for example, it might be a collection of data with embedded 0x00 values. All it promises is that it's an address of something that the underlying architecture thinks is a character.

If you need a string, use std::string, not char*.
An obvious exception is interfacing to legacy code that uses char* to represent strings. Still, don't use char* outside of the interface layer.
You need to use char* when your data isn't a string, but a raw unstructured array of bytes. This is the case when you read or write binary data from files or network interfaces.

Sometimes it is simpler to use char* instead of string, for example when you are dealing with a network and need to transform a string full of bytes into an integer, float, etc. I think that it's simpler to use a char* than a string:
Using a char*
char buffer[4];
read(buffer, 4); // A random read operation
int value = *((int*)(&buffer[0])); // Not sure if it was like that...
Using a string
std::string buffer;
buffer.resize(4);
read(buffer.data(), 4); // Will not work before C++17, as buffer.data() returns a const char*
int value = *((int*)(&buffer.data()[0]));
The problem with string is that it's designed to prevent bad usage or strange manipulations. As someone said, it's also because C++ is derived from C, so there are functions (from libc/glibc) which take a char* and not a string.
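Casting the buffer pointer to int* as in the snippets above can run into alignment and aliasing problems; copying the bytes with std::memcpy is a commonly suggested alternative that works the same way for a char array and a std::string. A sketch under those assumptions, where read_i32 is a hypothetical helper and the value is interpreted in host byte order, exactly like the cast it replaces:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a 32-bit integer stored at `offset` in `buffer`,
// in the byte order of the machine running the code.
std::int32_t read_i32(const std::string& buffer, std::size_t offset = 0) {
    std::int32_t value = 0;
    if (buffer.size() < sizeof value || offset > buffer.size() - sizeof value) {
        throw std::out_of_range("read_i32: not enough bytes in buffer");
    }
    // memcpy has no alignment requirement and does not violate aliasing rules.
    std::memcpy(&value, buffer.data() + offset, sizeof value);
    return value;
}
```

For data that crosses machines, the byte order would still have to be fixed separately.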
EDIT
Even if char* is different from char**, it's pretty complex to build a two-dimensional array using std::vector or std::string; you should either write your own class, use char**, or use a library-specific implementation (Boost, maths libraries, etc.).

About the only place where a competent C++ programmer will use char* is in the interface of an extern "C" program, or in very low level code, like an implementation of malloc (where you need to add a number of bytes to a void*). Even when calling into a C ABI, the char* needed by the interface will generally come from a &s[0], where s is an std::string, or if the interface is not const aware (and a lot of C interfaces aren't), then the results of a const_cast.
char const* is a bit more frequent: a string literal is, after all, a char const[], and I will occasionally define something like:
struct S
{
int value;
char const* name;
};
But only for static data, eg:
S const table[] =
{
{ 1, "one" },
{ 2, "two" },
// ...
};
This can be used to avoid order of initialization issues; in the above, the initialization is static, and guaranteed to take place before any dynamic initialization.
There are a few other cases: I've used char const*, for example, when marshalling between two C ABIs. But they are rare.

C++ Class that works on array of bytes (like string on chars)

Is there any C++ class that can be used like a string and has everything needed, such as comparison operators?
I want something like the string class that works on an array of bytes instead of chars. I'm asking because I don't want to rewrite something that already exists. I will use this class in std::map and so on.

That's exactly what an std::string is. A char is essentially a byte. It takes up one byte of space and it accepts all the bitwise operators (bit shifting: <<, >>; bitwise AND and OR: &, |; etc.).
If for some reason you need something like an std::string but for a different datatype, simply use std::basic_string<DATATYPE>. In the STL, string itself is a typedef for basic_string<char>.

There was no dedicated byte type in C++ before C++17 added std::byte. You can use std::vector with unsigned char, which plays a role similar to a byte array in Java (though Java's byte is signed).
typedef unsigned char BYTE;
typedef std::vector<BYTE> ByteString;
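Since the question mentions std::map: std::vector already provides operator== and a lexicographic operator<, so a ByteString defined this way can be used as a map key without extra code. A short sketch, in which the table contents and the function name make_signature_table are made up for illustration:

```cpp
#include <map>
#include <string>
#include <vector>

typedef unsigned char BYTE;
typedef std::vector<BYTE> ByteString;

// Hypothetical lookup table keyed by raw byte sequences.
std::map<ByteString, std::string> make_signature_table() {
    return {
        { ByteString{'M', 'Z'}, "DOS header" },
        { ByteString{'P', 'E', 0, 0}, "PE header" },  // embedded zero bytes are fine
    };
}
```

Unlike a NUL-terminated char*, the vector carries its own length, so zero bytes inside a key are compared like any other value.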

Win32 data types equivalent in Linux

I am converting a C++ Win32 library that makes wide use of DWORD, CString and BYTE into a Linux program.
I am using openSUSE 12.3 and the Anjuta IDE for this. Which types should I use instead of the ones mentioned?
I think I should use unsigned int for DWORD, string for CString and unsigned char for BYTE. Is that right?

CString will not convert directly to std::string, but it is a rough equivalent.
BYTE is indeed unsigned char and DWORD is unsigned int. WORD is unsigned short.
You should definitely use typedef actual_type WINDOWS_NAME; to fix the code up, rather than going through everywhere to replace the types. I would add a new header file called something like "wintypes.h", and include it everywhere "windows.h" is used.
Edit for comment:
With CString, it really depends on how it is used (and whether the code is using what MS calls "Unicode" or "ASCII" strings). I would be tempted to create a class CString and then use std::string inside that. Most of it can probably be done by simply calling the equivalent std::string function, but some functions may need a bit more programming - again, it does depend on what member functions of CString are actually being used.
For LP<type>, that is just a pointer to the <type>, so typedef BYTE* LPBYTE; and typedef DWORD* LPDWORD; will do that.
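A sketch of what such a "wintypes.h" might contain, assuming the fixed-width types from <cstdint> are available (they are a choice made here to pin the sizes; the answer itself names unsigned char, unsigned short and unsigned int). Only the types discussed in this thread are covered:

```cpp
// wintypes.h: stand-ins for the Win32 type names on non-Windows builds.
#include <cstdint>

typedef std::uint8_t  BYTE;   // 8-bit unsigned
typedef std::uint16_t WORD;   // 16-bit unsigned
typedef std::uint32_t DWORD;  // 32-bit unsigned on every platform

typedef BYTE*  LPBYTE;        // LP<type> is just a pointer to <type>
typedef DWORD* LPDWORD;

// Fail the build early if a platform ever disagrees with the Win32 sizes.
static_assert(sizeof(BYTE) == 1, "BYTE must be 1 byte");
static_assert(sizeof(WORD) == 2, "WORD must be 2 bytes");
static_assert(sizeof(DWORD) == 4, "DWORD must be 4 bytes");
```

Keeping the Windows names behind typedefs means the existing code compiles unchanged and the mapping can be adjusted in one place.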

DWORD = uint32_t
BYTE = uint8_t
These types are not OS-specific and were added in C++11. You need to include <cstdint> to get them. If you have an old compiler you could use boost/cstdint, which is header-only.
Use std::string instead of CString, but you will need to change some code.
With these changes your code should compile on both Windows and Linux.

I would suggest using uint32_t and uint8_t from <stdint.h> for DWORD and BYTE, and normal char * or const char * for strings (or the std::string class for C++).
Probably the best approach is to use typedefs for existing code:
typedef unsigned char BYTE;
These can be changed easily.
If you rewrite the code, use char, int and long where they are useful, and the (u)intX_t types where you need a defined size.

typedef unsigned long DWORD; // caution: unsigned long is 64 bits on 64-bit Linux; use uint32_t if DWORD must stay 32 bits
typedef unsigned char BYTE;
CString -> maybe basic_string<TCHAR> ?

How to convert char* to unsigned short in C++

I have a char* name which is a string representation of the short I want, such as "15" and need to output this as unsigned short unitId to a binary file. This cast must also be cross-platform compatible.
Is this the correct cast: unitId = unsigned short(temp);
Please note that I am at a beginner level in understanding binary.

I assume that your char* name contains a string representation of the short that you want, i.e. "15".
Do not cast a char* directly to a non-pointer type. Casts in C don't actually change the data at all (with a few exceptions); they just inform the compiler that you want to treat one type as another. If you cast a char* to an unsigned short, you'll be taking the value of the pointer (which has nothing to do with the contents) and throwing away everything that doesn't fit into a short. This is absolutely not what you want.
Instead use the std::strtoul function, which parses a string and gives you back the equivalent number:
unsigned short number = (unsigned short) strtoul(name, NULL, 0);
(You still need to use a cast, because strtoul returns an unsigned long. This cast is between two different integer types, however, and so is valid. The worst that can happen is that the number inside name is too big to fit into a short--a situation that you can check for elsewhere.)
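The "check for elsewhere" mentioned above can be folded into a small wrapper around strtoul. A sketch with a hypothetical helper named parse_ushort, which assumes decimal input (base 10 rather than the auto-detecting base 0 used above) and rejects trailing characters:

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse decimal text into an unsigned short.
// Returns false, leaving `out` untouched, if the text is not a number,
// has trailing characters, or does not fit.
bool parse_ushort(const char* text, unsigned short& out) {
    errno = 0;
    char* end = nullptr;
    const unsigned long value = std::strtoul(text, &end, 10);
    if (end == text || *end != '\0') {
        return false;  // no digits at all, or junk after the digits
    }
    if (errno == ERANGE || value > USHRT_MAX) {
        return false;  // too large for unsigned long or for unsigned short
    }
    out = static_cast<unsigned short>(value);
    return true;
}
```

Called as parse_ushort(name, unitId), it stores 15 for "15" and reports failure for input such as "70000" or "15abc".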

#include <boost/lexical_cast.hpp>
unitId = boost::lexical_cast<unsigned short>(temp);

To convert a string to binary in C++ you can use stringstream.
#include <sstream>
. . .
int somefunction()
{
    unsigned short num;
    const char *name = "123"; // string literals are const in C++
    std::stringstream ss(name);
    ss >> num;
    if (ss.fail() == false)
    {
        // You can write out the binary value of num. Since you mention
        // cross platform in your question, be sure to enforce a byte order.
    }
    return 0;
}

That cast will give you a (truncated) integer version of the pointer, assuming temp is also a char*. This is almost certainly not what you want (and the syntax is wrong too).
Take a look at the function atoi; it may be what you need, e.g. unitId = (unsigned short)(atoi(temp));
Note that this assumes that (a) temp is pointing to a string of digits and (b) the digits represent a number that can fit into an unsigned short.

Is the pointer name the id, or the string of chars pointed to by name? That is if name contains "1234", do you need to output 1234 to the file? I will assume this is the case, since the other case, which you would do with unitId = unsigned short(name), is certainly wrong.
What you want then is the strtoul() function.
char *endp;
unitId = (unsigned short)strtoul(name, &endp, 0);
if (endp == name) {
/* The conversion failed. The string pointed to by name does not look like a number. */
}
Be careful about writing binary values to a file; the result of doing the obvious thing may work now but will likely not be portable.

If you have a string (char* in C) representation of a number you must use the appropriate function to convert that string to the numeric value it represents.
There are several functions for doing this. They are documented here:
http://www.cplusplus.com/reference/clibrary/cstdlib
