Fortran storage_size intrinsic function - fortran

I am looking at the storage_size intrinsic function introduced in Fortran 2008 to obtain the size of a user-defined type man storage size. It returns the size in bits, not bytes. I am wondering what the rationale is behind returning the size in bits instead of bytes.
Since I need the size in bytes, I am simply going to divide the result by 8. Is it safe to assume that the size returned will always be divisible by 8?

It is not even safe to expect byte is always 8 bits (see CHARACTER_STORAGE_SIZE in module iso_fortran_env)! For rationale for the storage_size() contact someone from SC22/WG5 or X3J3, but one of the former members always says (on comp.lang.fortran) these questions don't have much sense and a single clear answer. There was often just someone pushing this variant and not the other.
My guess would be the symmetry with the former function bit_size() is one of the reasons. And why is there bit_size() and not byte_size()? I would guess you do not have to multiply it with the byte size (and check how large is one byte) and you can apply the bit manipulation procedures instantly.
To your last question. Yes, on a machine with 8-bit bytes (other machines do not have Fortran 2008 compilers AFAIK) the bit size will always be divisible by 8 as one byte is the smallest addressable piece of memory and structures cannot use just part of one byte.

Related

What is a good example of leveraging the differences between int*_t, int_fast*_t and int_least*_t in C++11?

According to online documentation, there are differences in between these fixed width integer types. For int*_t we fixed the width to whatever the value of * is. Yet for the other two types, the adjectives fastest and smallest are used in the description to request the fastest or the smallest instances provided by the underlying data model.
What are the objective meanings of "the fastest" or the "smallest"? What is an example in where this would be advantageous or even necessary?
There is no objective meaning to "fastest"; it's basically a judgement call by the compiler writer. Typically, it means expanding smaller values to the native register width of the architecture, but that's not always fastest (e.g. a 1 billion entry array would probably be processed quicker if it were 8 bit values, but uint_fast8_t might be a 32 bit value because the CPU register manipulation goes faster for that size).
"smallest" usually means "the same size as the bits requested", but on weird architectures with limited size values to choose from (e.g. old Crays had everything as a 64 bit type), int_least16_t would work (and seamlessly become a 64 bit value), while the compiler would likely error out on int16_t (because it's impossible to make a true 16 bit integer value there).
Point is, if you're relying on overflow behaviors, you need to use an exact fixed width type. Otherwise, you should probably default to least types for maximum portability, switching to fast types in hot code paths, but profiling would be needed to determine if it really makes any difference.

Capacity of fundamental types across different platforms

I know that sizeof(type) will return different values, depending on the platform and the compiler.
However, I know that whenever talking about ints (int32) it is said that it can be one of 2^32 values.
If I'm on a platform where int32 is 8 bytes, it's theoretical maximum is 2^64. Can it really store that much data, or does it always store 4 bytes and use 4 bytes for padding?
The question really is, while I know that sizes of types will differ, I want to know whether asking for max_int on various platform will be constant or will it give me the value according to the type size.
Particularly when dealing with files. If I write int32 to file, will it always store 4 bytes, or will it depend?
EDIT:
Given all the comments and the fact that I'm trying to create an equivalent of C# BinaryReader, I think that using fixed size type is the best choice, since it would delegate all this to whoever uses it (making it more flexible). Right?
std::int32_t has always a size of 32bit (usually 4 bytes).
The size of int can vary and depends on the platform you compile for, but at least 16 bit (usually 2 bytes).
You can check the max value of your type in C++:
#include <limits>
std::numeric_limits<std::int32_t>::max()
std::numeric_limits<int>::max()

Example of systems/compilers where sizeof(bool) is higher than 1 (in c++)? [duplicate]

Size of char, signed char and unsigned char is defined to be 1 byte, by the C++ Standard itself. I'm wondering why it didn't define the sizeof(bool) also?
C++03 Standard $5.3.3/1 says,
sizeof(char), sizeof(signed char) and
sizeof(unsigned char) are 1; the
result of sizeof applied to any other
fundamental type (3.9.1) is
implementation-defined. [Note: in
particular,sizeof(bool) and
sizeof(wchar_t) are
implementation-defined.69)
I understand the rationale that sizeof(bool) cannot be less than one byte. But is there any rationale why it should be greater than 1 byte either? I'm not saying that implementations define it to be greater than 1, but the Standard left it to be defined by implementation as if it may be greater than 1.
If there is no reason sizeof(bool) to be greater than 1, then I don't understand why the Standard didn't define it as just 1 byte, as it has defined sizeof(char), and it's all variants.
The other likely size for it is that of int, being the "efficient" integer type for the platform.
On architectures where it makes any difference whether the implementation chooses 1 or sizeof(int) there could be a trade-off between size (but if you're happy to waste 7 bits per bool, why shouldn't you be happy to waste 31? Use bitfields when size matters) vs. performance (but when is storing and loading bool values going to be a genuine performance issue? Use int explicitly when speed matters). So implementation flexibility wins - if for some reason 1 would be atrocious in terms of performance or code size, it can avoid it.
As #MSalters pointed out, some platforms work more efficiently with larger data items.
Many "RISC" CPUs (e.g., MIPS, PowerPC, early versions of the Alpha) have/had a considerably more difficult time working with data smaller than one word, so they do the same. IIRC, with at least some compilers on the Alpha a bool actually occupied 64 bits.
gcc for PowerPC Macs defaulted to using 4 bytes for a bool, but had a switch to change that to one byte if you wanted to.
Even for the x86, there's some advantage to using a 32-bit data item. gcc for the x86 has (or at least used to have -- I haven't looked recently at all) a define in one of its configuration files for BOOL_TYPE_SIZE (going from memory, so I could have that name a little wrong) that you could set to 1 or 4, and then re-compile the compiler to get a bool of that size.
Edit: As for the reason behind this, I'd say it's a simple reflection of a basic philosophy of C and C++: leave as much room for the implementation to optimize/customize its behavior as reasonable. Require specific behavior only when/if there's an obvious, tangible benefit, and unlikely to be any major liability, especially if the change would make it substantially more difficult to support C++ on some particular platform (though, of course, if the platform is sufficiently obscure, it might get ignored).
Many platforms cannot effectively load values smaller than 32 bits. They have to load 32 bits, and use a shift-and-mask operation to extract 8 bits. You wouldn't want this for single bools, but it's OK for strings.
The operation resulted in 'sizeof' is MADUs (minimal addresible unit), not bytes. So family processors C54 *. C55 * Texas Instuments, the expression 1 MADU = 2 bytes.
For this platform sizeof (bool) = sizeof (char) = 1 MADUs = 2 bytes.
This does not violate the C + + standard, but clarifies the situation.

What platforms have something other than 8-bit char?

Every now and then, someone on SO points out that char (aka 'byte') isn't necessarily 8 bits.
It seems that 8-bit char is almost universal. I would have thought that for mainstream platforms, it is necessary to have an 8-bit char to ensure its viability in the marketplace.
Both now and historically, what platforms use a char that is not 8 bits, and why would they differ from the "normal" 8 bits?
When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?
In the past I've come across some Analog Devices DSPs for which char is 16 bits. DSPs are a bit of a niche architecture I suppose. (Then again, at the time hand-coded assembler easily beat what the available C compilers could do, so I didn't really get much experience with C on that platform.)
char is also 16 bit on the Texas Instruments C54x DSPs, which turned up for example in OMAP2. There are other DSPs out there with 16 and 32 bit char. I think I even heard about a 24-bit DSP, but I can't remember what, so maybe I imagined it.
Another consideration is that POSIX mandates CHAR_BIT == 8. So if you're using POSIX you can assume it. If someone later needs to port your code to a near-implementation of POSIX, that just so happens to have the functions you use but a different size char, that's their bad luck.
In general, though, I think it's almost always easier to work around the issue than to think about it. Just type CHAR_BIT. If you want an exact 8 bit type, use int8_t. Your code will noisily fail to compile on implementations which don't provide one, instead of silently using a size you didn't expect. At the very least, if I hit a case where I had a good reason to assume it, then I'd assert it.
When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?
It's not so much that it's "worth giving consideration" to something as it is playing by the rules. In C++, for example, the standard says all bytes will have "at least" 8 bits. If your code assumes that bytes have exactly 8 bits, you're violating the standard.
This may seem silly now -- "of course all bytes have 8 bits!", I hear you saying. But lots of very smart people have relied on assumptions that were not guarantees, and then everything broke. History is replete with such examples.
For instance, most early-90s developers assumed that a particular no-op CPU timing delay taking a fixed number of cycles would take a fixed amount of clock time, because most consumer CPUs were roughly equivalent in power. Unfortunately, computers got faster very quickly. This spawned the rise of boxes with "Turbo" buttons -- whose purpose, ironically, was to slow the computer down so that games using the time-delay technique could be played at a reasonable speed.
One commenter asked where in the standard it says that char must have at least 8 bits. It's in section 5.2.4.2.1. This section defines CHAR_BIT, the number of bits in the smallest addressable entity, and has a default value of 8. It also says:
Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
So any number equal to 8 or higher is suitable for substitution by an implementation into CHAR_BIT.
Machines with 36-bit architectures have 9-bit bytes. According to Wikipedia, machines with 36-bit architectures include:
Digital Equipment Corporation PDP-6/10
IBM 701/704/709/7090/7094
UNIVAC 1103/1103A/1105/1100/2200,
A few of which I'm aware:
DEC PDP-10: variable, but most often 7-bit chars packed 5 per 36-bit word, or else 9 bit chars, 4 per word
Control Data mainframes (CDC-6400, 6500, 6600, 7600, Cyber 170, Cyber 176 etc.) 6-bit chars, packed 10 per 60-bit word.
Unisys mainframes: 9 bits/byte
Windows CE: simply doesn't support the `char` type at all -- requires 16-bit wchar_t instead
There is no such thing as a completely portable code. :-)
Yes, there may be various byte/char sizes. Yes, there may be C/C++ implementations for platforms with highly unusual values of CHAR_BIT and UCHAR_MAX. Yes, sometimes it is possible to write code that does not depend on char size.
However, almost any real code is not standalone. E.g. you may be writing a code that sends binary messages to network (protocol is not important). You may define structures that contain necessary fields. Than you have to serialize it. Just binary copying a structure into an output buffer is not portable: generally you don't know neither the byte order for the platform, nor structure members alignment, so the structure just holds the data, but not describes the way the data should be serialized.
Ok. You may perform byte order transformations and move the structure members (e.g. uint32_t or similar) using memcpy into the buffer. Why memcpy? Because there is a lot of platforms where it is not possible to write 32-bit (16-bit, 64-bit -- no difference) when the target address is not aligned properly.
So, you have already done a lot to achieve portability.
And now the final question. We have a buffer. The data from it is sent to TCP/IP network. Such network assumes 8-bit bytes. The question is: of what type the buffer should be? If your chars are 9-bit? If they are 16-bit? 24? Maybe each char corresponds to one 8-bit byte sent to network, and only 8 bits are used? Or maybe multiple network bytes are packed into 24/16/9-bit chars? That's a question, and it is hard to believe there is a single answer that fits all cases. A lot of things depend on socket implementation for the target platform.
So, what I am speaking about. Usually code may be relatively easily made portable to certain extent. It's very important to do so if you expect using the code on different platforms. However, improving portability beyond that measure is a thing that requires a lot of effort and often gives little, as the real code almost always depends on other code (socket implementation in the example above). I am sure that for about 90% of code ability to work on platforms with bytes other than 8-bit is almost useless, for it uses environment that is bound to 8-bit. Just check the byte size and perform compilation time assertion. You almost surely will have to rewrite a lot for a highly unusual platform.
But if your code is highly "standalone" -- why not? You may write it in a way that allows different byte sizes.
It appears that you can still buy an IM6100 (i.e. a PDP-8 on a chip) out of a warehouse. That's a 12-bit architecture.
Many DSP chips have 16- or 32-bit char. TI routinely makes such chips for example.
The C and C++ programming languages, for example, define byte as "addressable unit of data large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). Since the C char integral data type must contain at least 8 bits (clause 5.2.4.2.1), a byte in C is at least capable of holding 256 different values. Various implementations of C and C++ define a byte as 8, 9, 16, 32, or 36 bits
Quoted from http://en.wikipedia.org/wiki/Byte#History
Not sure about other languages though.
http://en.wikipedia.org/wiki/IBM_7030_Stretch#Data_Formats
Defines a byte on that machine to be variable length
The DEC PDP-8 family had a 12 bit word although you usually used 8 bit ASCII for output (on a Teletype mostly). However, there was also a 6-BIT character code that allowed you to encode 2 chars in a single 12-bit word.
For one, Unicode characters are longer than 8-bit. As someone mentioned earlier, the C spec defines data types by their minimum sizes. Use sizeof and the values in limits.h if you want to interrogate your data types and discover exactly what size they are for your configuration and architecture.
For this reason, I try to stick to data types like uint16_t when I need a data type of a particular bit length.
Edit: Sorry, I initially misread your question.
The C spec says that a char object is "large enough to store any member of the execution character set". limits.h lists a minimum size of 8 bits, but the definition leaves the max size of a char open.
Thus, the a char is at least as long as the largest character from your architecture's execution set (typically rounded up to the nearest 8-bit boundary). If your architecture has longer opcodes, your char size may be longer.
Historically, the x86 platform's opcode was one byte long, so char was initially an 8-bit value. Current x86 platforms support opcodes longer than one byte, but the char is kept at 8 bits in length since that's what programmers (and the large volumes of existing x86 code) are conditioned to.
When thinking about multi-platform support, take advantage of the types defined in stdint.h. If you use (for instance) a uint16_t, then you can be sure that this value is an unsigned 16-bit value on whatever architecture, whether that 16-bit value corresponds to a char, short, int, or something else. Most of the hard work has already been done by the people who wrote your compiler/standard libraries.
If you need to know the exact size of a char because you are doing some low-level hardware manipulation that requires it, I typically use a data type that is large enough to hold a char on all supported platforms (usually 16 bits is enough) and run the value through a convert_to_machine_char routine when I need the exact machine representation. That way, the platform-specific code is confined to the interface function and most of the time I can use a normal uint16_t.
what sort of consideration is it worth giving to platforms with non-8-bit char?
magic numbers occur e.g. when shifting;
most of these can be handled quite simply
by using CHAR_BIT and e.g. UCHAR_MAX instead of 8 and 255 (or similar).
hopefully your implementation defines those :)
those are the "common" issues.....
another indirect issue is say you have:
struct xyz {
uchar baz;
uchar blah;
uchar buzz;
}
this might "only" take (best case) 24 bits on one platform,
but might take e.g. 72 bits elsewhere.....
if each uchar held "bit flags" and each uchar only had 2 "significant" bits or flags that
you were currently using, and you only organized them into 3 uchars for "clarity",
then it might be relatively "more wasteful" e.g. on a platform with 24-bit uchars.....
nothing bitfields can't solve, but they have other things to watch out
for ....
in this case, just a single enum might be a way to get the "smallest"
sized integer you actually need....
perhaps not a real example, but stuff like this "bit" me when porting / playing with some code.....
just the fact that if a uchar is thrice as big as what is "normally" expected,
100 such structures might waste a lot of memory on some platforms.....
where "normally" it is not a big deal.....
so things can still be "broken" or in this case "waste a lot of memory very quickly" due
to an assumption that a uchar is "not very wasteful" on one platform, relative to RAM available, than on another platform.....
the problem might be more prominent e.g. for ints as well, or other types,
e.g. you have some structure that needs 15 bits, so you stick it in an int,
but on some other platform an int is 48 bits or whatever.....
"normally" you might break it into 2 uchars, but e.g. with a 24-bit uchar
you'd only need one.....
so an enum might be a better "generic" solution ....
depends on how you are accessing those bits though :)
so, there might be "design flaws" that rear their head....
even if the code might still work/run fine regardless of the
size of a uchar or uint...
there are things like this to watch out for, even though there
are no "magic numbers" in your code ...
hope this makes sense :)
The weirdest one I saw was the CDC computers. 6 bit characters but with 65 encodings. [There were also more than one character set -- you choose the encoding when you install the OS.]
If a 60 word ended with 12, 18, 24, 30, 36, 40, or 48 bits of zero, that was the end of line character (e.g. '\n').
Since the 00 (octal) character was : in some code sets, that meant BNF that used ::= was awkward if the :: fell in the wrong column. [This long preceded C++ and other common uses of ::.]
ints used to be 16 bits (pdp11, etc.). Going to 32 bit architectures was hard. People are getting better: Hardly anyone assumes a pointer will fit in a long any more (you don't right?). Or file offsets, or timestamps, or ...
8 bit characters are already somewhat of an anachronism. We already need 32 bits to hold all the world's character sets.
The Univac 1100 series had two operational modes: 6-bit FIELDATA and 9-bit 'ASCII' packed 6 or 4 characters respectively into 36-bit words. You chose the mode at program execution time (or compile time.) It's been a lot of years since I actually worked on them.

Usage of 'short' in C++

Why is it that for any numeric input we prefer an int rather than short, even if the input is of very few integers.
The size of short is 2 bytes on my x86 and 4 bytes for int, shouldn't it be better and faster to allocate than an int?
Or I am wrong in saying that short is not used?
CPUs are usually fastest when dealing with their "native" integer size. So even though a short may be smaller than an int, the int is probably closer to the native size of a register in your CPU, and therefore is likely to be the most efficient of the two.
In a typical 32-bit CPU architecture, to load a 32-bit value requires one bus cycle to load all the bits. Loading a 16-bit value requires one bus cycle to load the bits, plus throwing half of them away (this operation may still happen within one bus cycle).
A 16-bit short makes sense if you're keeping so many in memory (in a large array, for example) that the 50% reduction in size adds up to an appreciable reduction in memory overhead. They are not faster than 32-bit integers on modern processors, as Greg correctly pointed out.
In embedded systems, the short and unsigned short data types are used for accessing items that require less bits than the native integer.
For example, if my USB controller has 16 bit registers, and my processor has a native 32 bit integer, I would use an unsigned short to access the registers (provided that the unsigned short data type is 16-bits).
Most of the advice from experienced users (see news:comp.lang.c++.moderated) is to use the native integer size unless a smaller data type must be used. The problem with using short to save memory is that the values may exceed the limits of short. Also, this may be a performance hit on some 32-bit processors, as they have to fetch 32 bits near the 16-bit variable and eliminate the unwanted 16 bits.
My advice is to work on the quality of your programs first, and only worry about optimization if it is warranted and you have extra time in your schedule.
Using type short does not guarantee that the actual values will be smaller than those of type int. It allows for them to be smaller, and ensures that they are no bigger. Note too that short must be larger than or equal in size to type char.
The original question above contains actual sizes for the processor in question, but when porting code to a new environment, one can only rely on weak relative assumptions without verifying the implementation-defined sizes.
The C header <stdint.h> -- or, from C++, <cstdint> -- defines types of specified size, such as uint8_t for an unsigned integral type exactly eight bits wide. Use these types when attempting to conform to an externally-specified format such as a network protocol or binary file format.
The short type is very useful if you have a big array full of them and int is just way too big.
Given that the array is big enough, the memory saving will be important (instead of just using an array of ints).
Unicode arrays are also encoded in shorts (although other encode schemes exist).
On embedded devices, space still matters and short might be very beneficial.
Last but not least, some transmission protocols insists in using shorts, so you still need them there.
Maybe we should consider it in different situations. For example, x86 or x64 should consider more suitable type, not just choose int. In some cases, int have faster speed than short. The first floor have answered this question