What do I declare message length as? int or char? - c++

I am relatively new to coding and would like to know how to declare the message length (mlen) and ciphertext length (clen) in my C++ code. However, I am not sure what I am supposed to declare them as. (int? char? unsigned long long?)
The formula that was given to me to include in my code is:
*clen = mlen + CRYPTO_ABYTES
Information that was given to me is:
It operates on a state of 320-bits with message blocks of 64 bits.
UPDATE: Sorry for the bad question. I realized I was given unsigned long long for the message length; it was written in a smaller font and I did not notice it.

If there is no strict requirement regarding how big your type has to be, you should use the native C++ type designed to represent sizes:
std::size_t
The main advantages of std::size_t over types such as unsigned int or unsigned short are:
It is capable of holding the maximum possible size of any C++ type, including arrays (since it's defined as the result type of the sizeof operator).
As such, it's often optimized to your architecture (e.g. RAM address space).
It is typically used as the type to represent indices and lengths in new[], std::vector, and large parts of the STL in general. As you are likely going to use such data structures and algorithms, you can avoid useless conversions between different types, making your code more readable, without compiler warnings.
The type std::size_t in your code tells you something about its semantics (indices and sizes; unsigned implied). Other integer types such as unsigned int can be used to represent anything.
For additional information, check out cppreference.com.
On the other hand, if you need to pack your data tightly (e.g. for a network protocol), you should use the smallest possible type, such as std::uint16_t or std::uint32_t.
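As a minimal sketch of the question's formula with std::size_t (the value of CRYPTO_ABYTES below is a made-up placeholder; in the asker's API it comes from the cipher's headers, which, per the update, actually use unsigned long long):

#include <cstddef>   // std::size_t

// Hypothetical placeholder: in the real API this constant comes from
// the cipher's header and is the authentication-tag overhead in bytes.
constexpr std::size_t CRYPTO_ABYTES = 16;

// The question's formula: ciphertext length = message length + tag overhead.
std::size_t ciphertextLength(std::size_t mlen)
{
    return mlen + CRYPTO_ABYTES;
}

int main()
{
    std::size_t mlen = 64;                      // a 64-byte message
    std::size_t clen = ciphertextLength(mlen);  // 64 + 16 = 80
    return clen == 80 ? 0 : 1;
}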

Related

Primitive that occupies 8 bits in OCaml

I was surprised to discover, when using Spacetime to profile my OCaml, that my char and even bool arrays used a word to represent each element. That's 8 bytes on my 64-bit machine, and causes way too much memory to be used.
I've substituted char array with Bytes where possible, but I also have char list and dynamic arrays (char BatDynArray). Is there some primitive or general method that I can use across all of these vector data structures and get an underlying 8 bit representation?
Edit: I read your question too fast: it’s possible you already know that; sorry! Here is a more targeted answer.
I think the general advice for storing a varying number of chars (i.e. when doing IO) is to use buffers, possibly resizable ones. Module Buffer implements a resizable character buffer, which is better than both char list (bad design, except perhaps for very short lists) and char BatDynArray (whose genericity incurs a memory penalty here, as you noticed).
Below is the original answer.
That’s due to the uniform representation of values. Whatever their type, every OCaml value is a machine word: either an immediate value (anything that can fit in a 31- or 63-bit integer, so int, char, bool, etc.), or a pointer to a block, i.e. a sequence of machine words (a C-fashion array) prefixed with a header. When the value is a pointer to a block, we say that it is “boxed”.
Cells of OCaml arrays are always machine words.
In OCaml, like in C++ but without the ad-hoc overloading, we just define specializations of array in the few cases where we actually want to save space. In your case:
instead of char array use string (immutable) or bytes (mutable) or Buffer.t (mutable appendable and resizable); these types signal to the GC that their cells are never pointers, so they can pack arbitrary binary data;
Unfortunately, the standard library has no specialization for bool array, but we can implement one (e.g. using bytes); you can find one in several third-party libraries, for instance module CCBV (“bitvectors”) in package containers-data.
Finally, you may not have realized it, but floats are boxed! That’s because they require 64 bits (IEEE 754 double-precision), which is more than the 31 or even 63 bits that are available for immediates. Fortunately(?), the compiler and runtime have some adhoc-ery to avoid boxing them as much as possible. In particular float array is specially optimized, so that it stores the raw floating-point numbers instead of pointers to them.
Some more background: we can distinguish between pointers and immediates just by testing one bit. Uniform representation is highly valuable for:
implementing garbage collection,
free parametric polymorphism (no code duplication, by contrast with what you’d get in a template language such as C++).

Does "size_t" make code portable, or just 'more' portable?

I'm struggling to understand the usefulness of the C++ std::size_t data type. I realize that this data type is platform dependent, and is supposedly meant to make code more portable. However, it seems like it doesn't solve all the problems.
Say, for example, I'm working on a machine that has 32-bit ints. Let's say I decide to write a C-style function for this machine that just copies the bytes from one object to another. Inside this function, the memcpy function is used to write the data from object2 to object1. I've chosen an arbitrarily large number of bytes.
#include <cstring>  // memcpy

void writeBytes(obj *pobj1, obj *pobj2)
{
    memcpy(pobj1, pobj2, 1048575);
}
This code should (hopefully) compile just fine. Because memcpy uses size_t in its declaration, and because size_t on this platform should be 32 bits, the value 1048575 should work just fine.
But now let's say I decide to port this function over to a machine that has 16-bit ints. Now memcpy interprets size_t as being 16 bits wide. In this case, 1048575 exceeds the range of values that memcpy was declared to accept, and the code fails to compile.
So my question: how exactly was size_t useful in this case? How did it make our code more portable?
size_t is able to hold the size of the largest object you can create. It is not required to be your platform's largest native integer type.
Given your example, your code would work regardless of the native integer being 16-bit or 32-bit if your platform allows 1048575 byte objects. Or the inverse -- if 1048575 doesn't fit in a size_t, you never could have created an object that large to memcpy.
size_t is subject to the same overflow and underflow rules as any other integral type. It's only a typedef of an unsigned integral type. Its purpose is not to prevent you from assigning or casting values that are outside of its range. It defines a standard type that will:
"store the maximum size of a theoretically possible object of any type
(including array)."
If you care about that maximum size, use
std::numeric_limits<std::size_t>::max()
and make decisions off of that.
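For illustration, a tiny program that queries that limit, and the type's width, on the current platform:

#include <climits>   // CHAR_BIT
#include <cstddef>
#include <iostream>
#include <limits>

int main()
{
    // The width in bits and the largest representable value of size_t
    // on this platform.
    std::cout << "size_t: " << sizeof(std::size_t) * CHAR_BIT << " bits, max "
              << std::numeric_limits<std::size_t>::max() << '\n';
}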
I think size_t makes (not only your) code more readable and consistent, not necessarily more portable.
It will help porting it, however.
Imagine that functions working with sizes used a happy mixture of int, short, long, unsigned, unsigned short, ...
When one such function dealing with a size calls another that needs a size parameter too, it's very helpful when one size_t fits all (or at least most, like for the result of read()).
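A small sketch with hypothetical helper functions to show the point: when one size type is used throughout, sizes flow through a call chain without casts or truncation warnings:

#include <cstddef>
#include <iostream>

// Made-up helpers, just for illustration: every function in the chain
// takes and returns std::size_t, so no conversions are needed anywhere.
std::size_t payloadSize() { return 1000; }
std::size_t withHeader(std::size_t payload) { return payload + 16; }

int main()
{
    std::cout << withHeader(payloadSize()) << '\n';  // prints 1016
}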

Why does anybody ever use int data type?

In C++ (which I am learning and am still very, very new to), I have noticed that almost everybody uses the int data type. But why? I know that short, long, and long long have pretty much definite sizes, but int seems like it might be short or long depending on the system. So why aren't people more specific about the types? If they put a number into an int that is too big for a short, then on certain systems it will be really bad. If the number you're putting into an int is small enough to fit into a short, then on systems where it defaults to long, memory space is wasted. So why does everybody use int?
According to the standard (C++11, §3.9.1/2),
"Plain ints have the natural size suggested by the architecture of the execution environment; the other signed integer types are provided to meet special needs."
So int is the type you should use unless you have a good reason to use any other type, because int is supposed to map to the type that the architecture is optimized to use most of the time.
You're correct in that int has a variable size. But short, long, and long long also have variable sizes. So they aren't a better option.
I'm not going to speculate on why people use int because that would just be my opinion.
If you need an integer of a guaranteed exact size, though, you should use int32_t, uint32_t, or int64_t. These types have required sizes.
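A quick way to see the difference (assuming 8-bit bytes, as on virtually all modern platforms):

#include <cstdint>
#include <iostream>

int main()
{
    // int's width is implementation-defined; the <cstdint> types are exact.
    std::cout << "int:     " << sizeof(int) * 8          << " bits\n";
    std::cout << "int32_t: " << sizeof(std::int32_t) * 8 << " bits\n";  // always 32
    std::cout << "int64_t: " << sizeof(std::int64_t) * 8 << " bits\n";  // always 64
}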

C++ BOOL (typedef int) vs bool for performance

I read somewhere that using BOOL (typedef int) is better than using the standard C++ type bool because the size of BOOL is 4 bytes (i.e. a multiple of 4), and it saves alignment operations of variables into registers, or something along those lines...
Is there any truth to this? I imagine that the compiler would pad the stack frames in order to keep alignments of multiple of 4s even if you use bool (1 byte)?
I'm by no means an expert on the underlying workings of alignments, registers, etc so I apologize in advance if I've got this completely wrong. I hope to be corrected. :)
Cheers!
First of all, sizeof(bool) is not necessarily 1. It is implementation-defined, giving the compiler writer freedom to choose a size that's suitable for the target platform.
Also, sizeof(int) is not necessarily 4.
There are multiple issues that could affect performance:
alignment;
memory bandwidth;
CPU's ability to efficiently load values that are narrower than the machine word.
What -- if any -- difference that makes to a particular piece of code can only be established by profiling that piece of code.
The only guaranteed size you can get in C++ is with char, unsigned char, and signed char [2], which are always exactly one byte and defined for every platform. [0][1]
[0] Though a byte does not have a defined bit width: sizeof(char) is always 1 byte, but that byte might in fact be 40 bits.
[1] Yes, there are uint32_t and friends, but their presence is optional for actual C++ implementations. Use them, but you may get compile-time errors if they are not available (compile-time errors are always good).
[2] char, unsigned char, and signed char are distinct types, and it is not defined whether char is signed or not. Keep this in mind when overloading functions and writing templates.
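Those guarantees are easy to check on any given platform:

#include <climits>   // CHAR_BIT
#include <iostream>

int main()
{
    // sizeof(char) is 1 by definition; CHAR_BIT says how many bits that byte has.
    std::cout << "sizeof(char) = " << sizeof(char) << '\n';  // always 1
    std::cout << "CHAR_BIT     = " << CHAR_BIT << '\n';      // 8 on mainstream platforms
}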
There are three commonly accepted performance-driven practices with regard to booleans:
In if statements, the order in which the expressions are checked matters, and one needs to be careful about it.
If a check of a boolean expression causes a lot of branch mispredictions, then it should (if possible) be substituted with a bit-twiddling hack.
Since bool is the smallest data type, boolean members should be declared last in structures and classes, so that padding does not add noticeable holes in the structure's memory layout (see the layout sketch below).
I've never heard of any performance gain from substituting a boolean with an (unsigned?) integer, however.
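Here is the layout sketch for the third practice; the byte counts in the comments assume a typical platform where double is 8 bytes with 8-byte alignment (the exact numbers are implementation-defined):

#include <iostream>

// Same members, different order, different size.
struct Wasteful {
    bool   a;  // 1 byte + 7 bytes padding
    double b;  // 8 bytes
    bool   c;  // 1 byte + 7 bytes padding
};

struct Compact {
    double b;  // 8 bytes
    bool   a;  // 1 byte
    bool   c;  // 1 byte + 6 bytes padding
};

int main()
{
    std::cout << sizeof(Wasteful) << '\n';  // typically 24
    std::cout << sizeof(Compact)  << '\n';  // typically 16
}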

What are the arguments against using size_t?

I have an API like this:
class IoType {
    ......
    StatusType writeBytes(......, size_t& bytesWritten);
    StatusType writeObjects(......, size_t& objsWritten);
};
A senior member of the team, whom I respect, seems to have a problem with the type size_t and suggests that I use C99 types. I know it sounds stupid, but I always think C99 types like uint32_t and uint64_t look ugly. I do use them, but only when it's really necessary; for instance, when I need to serialize/deserialize a structure, I do want to be specific about the sizes of my data members.
What are the arguments against using size_t? I know it's not a "real" type, but if I know for sure that even a 32-bit integer is enough for me, a size type seems appropriate for a number of bytes or a number of objects, etc.
Use exact-size types like uint32_t whenever you're dealing with serialization of any sort (binary files, networking, etc.). Use size_t whenever you're dealing with the size of an object in memory; that's what it's intended for. All of the functions that deal with object sizes, like malloc and strlen, as well as the sizeof operator, use size_t.
If you use size_t correctly, your program will be maximally portable, and it will not waste time and memory on platforms where it doesn't need to. On 32-bit platforms, a size_t will be 32 bits; if you instead used a uint64_t, you'd waste time and space. Conversely, on 64-bit platforms, a size_t will be 64 bits; if you instead used a uint32_t, your program could behave incorrectly (maybe even crash or open up a security vulnerability) if it ever had to deal with a piece of memory larger than 4 GB.
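A hedged sketch of that split (the wire format and function names here are made up, and a real protocol would also pin down byte order): exact-size types describe the bytes that leave the machine, while size_t handles in-memory sizes:

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical wire header: exact-size types so the layout is identical
// on every platform.
struct WireHeader {
    std::uint32_t payload_len;  // on-the-wire length, always 32 bits
};

// In-memory bookkeeping uses size_t (here via std::vector::size()).
std::vector<unsigned char> frame(const std::vector<unsigned char>& payload)
{
    WireHeader h;
    h.payload_len = static_cast<std::uint32_t>(payload.size());

    std::vector<unsigned char> out(sizeof h + payload.size());
    std::memcpy(out.data(), &h, sizeof h);
    if (!payload.empty())
        std::memcpy(out.data() + sizeof h, payload.data(), payload.size());
    return out;
}

int main()
{
    std::vector<unsigned char> msg(100, 0xab);
    return frame(msg).size() == sizeof(WireHeader) + 100 ? 0 : 1;
}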
I can't think of anything wrong with using size_t in contexts where you don't need to serialize values. Also, using size_t correctly will increase the code's safety/portability across 32- and 64-bit platforms.
Uhm, it's not a good idea to replace size_t (a maximally portable thing) with a less portable C99 fixed size or minimum size unsigned type.
On the other hand, you can avoid a lot of technical problems (wasted time) by using the signed ptrdiff_t type instead. The standard library's use of an unsigned size type is just for historical reasons. It made sense in its day, and even today on 16-bit architectures, but generally it is nothing but trouble & verbosity.
Making that change requires some support, though, in particular a general size function that returns array or container size as ptrdiff_t.
Now, regarding your function signature
StatusType writeBytes(......, size_t& bytesWritten);
This forces the calling code's choice of type for the bytes-written count.
And then, with the unsigned type size_t forced on it, it is easy to introduce a bug, e.g. when checking whether that count is less than or greater than some computed quantity.
A grotesque example: std::string("ah").length() < -5 is guaranteed true.
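Spelled out as a runnable check (it compiles, typically with a sign-compare warning):

#include <iostream>
#include <string>

int main()
{
    // -5 is converted to a huge unsigned value before the comparison,
    // so this prints "true".
    std::cout << std::boolalpha
              << (std::string("ah").length() < -5) << '\n';
}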
So instead, make that …
Size writeBytes(......);
or, if you do not want to use exceptions,
Size writeBytes(......, StatusType& status );
It is OK to have the enumeration of possible statuses as an unsigned type or as whatever, because the only operations on status values will be equality checks and possibly use as keys.
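A sketch of the suggested shape, under the assumption that Size is an alias for the signed ptrdiff_t and with a made-up status enumeration:

#include <cstddef>

// Assumed alias: a signed size type, as the answer recommends.
using Size = std::ptrdiff_t;

enum class StatusType { ok, ioError };  // hypothetical status codes

// Hypothetical sketch of the suggested signature: the byte count is the
// return value, so the caller's choice of type is no longer forced.
Size writeBytes(const unsigned char* data, Size n, StatusType& status)
{
    // ... do the actual write; this stub pretends it always succeeds ...
    (void)data;
    status = StatusType::ok;
    return n;  // number of bytes actually written, never negative on success
}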