What is the bit size of long on 64-bit Windows? - c++

Not too long ago, someone told me that long is not 64 bits on 64-bit machines and that I should always use int. This did not make sense to me. I have seen docs (such as the one on Apple's official site) say that long is indeed 64 bits when compiling for a 64-bit CPU. I looked up what it is on 64-bit Windows and found
Windows: long and int remain 32-bit in length, and special new data types
are defined for 64-bit integers.
(from http://www.intel.com/cd/ids/developer/asmo-na/eng/197664.htm?page=2)
What should I use? Should I define something like uw, sw ((un)signed width) as a long if not on Windows, and otherwise do a check on the target CPU bitsize?

In the Unix world, there were a few possible arrangements for the sizes of integers and pointers on 64-bit platforms. The two most widely used were ILP64 (actually, only a very few examples of this; Cray was one such) and LP64 (for almost everything else). The acronyms come from 'int, long, pointers are 64-bit' and 'long, pointers are 64-bit'.
Type        ILP64   LP64   LLP64
char            8      8       8
short          16     16      16
int            64     32      32
long           64     64      32
long long      64     64      64
pointer        64     64      64
The ILP64 system was abandoned in favour of LP64 (that is, almost all later entrants used LP64, based on the recommendations of the Aspen group; only systems with a long heritage of 64-bit operation use a different scheme). All modern 64-bit Unix systems use LP64. MacOS X and Linux are both modern 64-bit systems.
Microsoft uses a different scheme for transitioning to 64-bit: LLP64 ('long long, pointers are 64-bit'). This has the merit of meaning that 32-bit software can be recompiled without change. It has the demerit of being different from what everyone else does, and also requires code to be revised to exploit 64-bit capacities. There always was revision necessary; it was just a different set of revisions from the ones needed on Unix platforms.
Design your software around platform-neutral integer type names, probably using the C99 <inttypes.h> header, which, when the types are available on the platform, provides the following types in signed form (listed) and unsigned form (not listed; prefix with 'u'):
int8_t - 8-bit integers
int16_t - 16-bit integers
int32_t - 32-bit integers
int64_t - 64-bit integers
uintptr_t - unsigned integers big enough to hold pointers
intmax_t - biggest size of integer on the platform (might be larger than int64_t)
You can then code your application using these types where it matters, and be very careful with system types (which might be different). There is an intptr_t type - a signed integer type for holding pointers; you should plan on not using it, or only using it to hold the result of a subtraction of two uintptr_t values (ptrdiff_t).
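For example, here is a minimal sketch of using these fixed-width types in portable code (assuming a C99/C++11 toolchain; the PRI* macros come from <inttypes.h> / <cinttypes> and expand to the right printf length modifiers on each platform):

#include <cinttypes>   // PRId64, PRIu32, PRIxPTR (use <inttypes.h> in C)
#include <cstdint>     // int64_t, uint32_t, uintptr_t
#include <cstdio>

int main() {
    int64_t big = INT64_C(9000000000);      // exactly 64 bits wherever the type exists
    uint32_t mask = UINT32_C(0xFFFFFFFF);   // exactly 32 bits
    uintptr_t addr = reinterpret_cast<uintptr_t>(&big);  // wide enough to hold a pointer

    std::printf("big  = %" PRId64 "\n", big);
    std::printf("mask = %" PRIu32 "\n", mask);
    std::printf("addr = 0x%" PRIxPTR "\n", addr);
    return 0;
}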
But, as the question points out (in disbelief), there are different systems for the sizes of the integer data types on 64-bit machines. Get used to it; the world isn't going to change.

It is not clear if the question is about the Microsoft C++ compiler or the Windows API. However, there is no [c++] tag so I assume it is about the Windows API. Some of the answers have suffered from link rot so I am providing yet another link that can rot.
For information about Windows API types like INT, LONG etc. there is a page on MSDN:
Windows Data Types
The information is also available in various Windows header files like WinDef.h. I have listed a few relevant types here:
Type | S/U | x86 | x64
----------------------------+-----+--------+-------
BYTE, BOOLEAN | U | 8 bit | 8 bit
----------------------------+-----+--------+-------
SHORT | S | 16 bit | 16 bit
USHORT, WORD | U | 16 bit | 16 bit
----------------------------+-----+--------+-------
INT, LONG | S | 32 bit | 32 bit
UINT, ULONG, DWORD | U | 32 bit | 32 bit
----------------------------+-----+--------+-------
INT_PTR, LONG_PTR, LPARAM | S | 32 bit | 64 bit
UINT_PTR, ULONG_PTR, WPARAM | U | 32 bit | 64 bit
----------------------------+-----+--------+-------
LONGLONG | S | 64 bit | 64 bit
ULONGLONG, QWORD | U | 64 bit | 64 bit
The column "S/U" denotes signed/unsigned.
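A quick sketch to confirm these sizes for the build you are targeting (this assumes the Windows SDK headers are available; %zu needs a reasonably recent MSVC runtime):

#include <windows.h>
#include <cstdio>

int main() {
    std::printf("LONG     : %zu bytes\n", sizeof(LONG));      // 4 on both x86 and x64
    std::printf("LONG_PTR : %zu bytes\n", sizeof(LONG_PTR));  // 4 on x86, 8 on x64
    std::printf("LONGLONG : %zu bytes\n", sizeof(LONGLONG));  // 8 on both
    std::printf("WPARAM   : %zu bytes\n", sizeof(WPARAM));    // 4 on x86, 8 on x64
    return 0;
}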

This article on MSDN references a number of type aliases (available on Windows) that are a bit more explicit with respect to their width:
http://msdn.microsoft.com/en-us/library/aa505945.aspx
For instance, although you can use ULONGLONG to reference a 64-bit unsigned integral value, you can also use UINT64. (The same goes for ULONG and UINT32.) Perhaps these will be a bit clearer?

Microsoft has also defined UINT_PTR and INT_PTR for integers that are the same size as a pointer.
Here is a list of Microsoft specific types - it's part of their driver reference, but I believe it's valid for general programming as well.

The easiest way to find out for your compiler/platform:
#include <iostream>
int main() {
    std::cout << sizeof(long) * 8 << std::endl;
}
The multiplication by 8 is to get bits from bytes.
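A slightly more general sketch prints several widths and uses CHAR_BIT from <climits> rather than assuming 8-bit bytes:

#include <climits>
#include <iostream>

int main() {
    std::cout << "int:       " << sizeof(int)       * CHAR_BIT << " bits\n";
    std::cout << "long:      " << sizeof(long)      * CHAR_BIT << " bits\n";
    std::cout << "long long: " << sizeof(long long) * CHAR_BIT << " bits\n";
    std::cout << "void*:     " << sizeof(void*)     * CHAR_BIT << " bits\n";
}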
When you need a particular size, it is often easiest to use one of the predefined types of a library. If that is undesirable, you can do what often happens with autoconf software and have the configuration system determine the right type for the needed size.

For the history of how the choices were made for UNIX & Windows (the different choices were plausible; Microsoft wasn't being dumb, given its code base), see The Long Road to 64 Bits - Double, Double, Toil and Trouble:
https://queue.acm.org/detail.cfm?id=1165766 (ACM Queue, 2006)
or
https://dl.acm.org/doi/pdf/10.1145/1435417.1435431 (CACM, 2009)
Note: I helped design the 64/32-bit MIPS R4000, made the suggestion that led to <inttypes.h> and wrote the section of C99 explaining motivation for long long.

The size of long on Windows platforms is 32 bits (4 bytes).
You can check this using sizeof(long), which returns the size in bytes.

If you need to use integers of a certain length, you probably should use some platform-independent headers to help you. Boost is a good place to look.
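For example, a sketch using Boost's portable integer typedefs (this assumes Boost is installed; <boost/cstdint.hpp> falls back to its own definitions on platforms without <stdint.h>):

#include <boost/cstdint.hpp>
#include <iostream>

int main() {
    boost::uint64_t big = 0xFFFFFFFFFFFFFFFFULL;  // exactly 64 bits on every supported platform
    boost::int32_t  val = -12345;                 // exactly 32 bits
    std::cout << big << ' ' << val << '\n';
}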

Related

Why are there different names for same type of data unit?

As far as I know, in C++ on a 32-bit compiler, int = __int32 = long = DWORD. But why have so many? Why not just one?
If I were to pick a name, int32 seems most appropriate since there is no confusion there as to what it could be.
int is a pre-C99 type which is guaranteed to be at least 16 bits, but is 32 bits on most modern architectures. (It was originally intended to be the "native" word size, but even on 64-bit architectures it is usually still 32 bits, largely for backwards compatibility reasons.)
long is a pre-C99 type which is guaranteed to be at least 32 bits, but is allowed to be wider. (Few compilers make it longer, even on 64-bit architectures, largely for backwards compatibility reasons.)
__int32/__int32_t is a nonstandard typedef which was implemented by many C compilers and runtime libraries, to guarantee a fixed width pre-C99.
int32_t is a C99 type which is guaranteed to be exactly 32 bits.
DWORD is a typedef from the original Windows API which is guaranteed to be exactly 32 bits, from the days when there was no language-defined type of exactly 32 bits.
So basically, the large number of ways to say "32-bit integer" come from how C dragged its feet on standardizing fixed-width types, and from the long tenure of 32-bit processors dominating the field, causing everyone to standardize on 32 bits as the "normal" integer size.
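A quick way to check these equivalences on a particular target is a set of compile-time assertions (a sketch; it assumes an MSVC/Windows build where <windows.h> defines DWORD and the __int32 extension is available):

#include <windows.h>
#include <cstdint>

static_assert(sizeof(int)          == 4, "int is 32 bits on this target");
static_assert(sizeof(long)         == 4, "long is 32 bits on Windows");
static_assert(sizeof(DWORD)        == 4, "DWORD is always 32 bits");
static_assert(sizeof(__int32)      == 4, "__int32 is MSVC's fixed-width extension");
static_assert(sizeof(std::int32_t) == 4, "int32_t is exactly 32 bits by definition");

int main() { return 0; }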
Because of legacy applications. An int doesn't describe how big it is at all. It's an integer. Big deal.
In the 16-bit era, an int was not a long. DWORD being a double-word was precise. A word is known as 2 bytes, and therefore a DWORD must be two of them.
__intXX are Microsoft specific.
So, there are lots of different reasons why different projects (e.g. Microsoft Windows) use different types.
While compilers TODAY are typically 32-bit, this has not always been the case. And there are compilers that are 64-bit.
The term DWORD originates from way back when Windows was a 16-bit segmented mode application (many members here have probably never worked on a 16-bit segmented mode environment). It is "two 16-bit words", treated, at least these days, as an unsigned 32-bit value.
The type int32_t is defined by the C standard document (and through inheritance, also in C++). It is GUARANTEED to exist only if it is actually exactly 32 bits. On a machine with 36-bit words, there is no int32_t (there is an int_least32_t, which should exist on all systems that support AT LEAST 32 bits).
long is 32 bits in a Windows 32- or 64-bit compiler, but 64-bits in a Linux 64-bit compiler, and 32-bits in a Linux 32-bit compiler. So it's definitely "variable size".
It is also often a good idea to pick your OWN name for types. That is assuming you do care at all - it's also fine to use int, long, etc, as long as you are not RELYING on them being some size - for(i = 0; i < 10; i++) x += i; will work with i and x being any integer type - the sum is even below 128, so char would work. Using int here will be fine, since it's likely to be a "fast" type. In some architectures, using long may make the code slower - especially in 16-bit architectures where long takes up two 16-bit words and needs to be dealt with using (typically) two or more operations for addition and subtraction for example. This can really slow code down in sensitive places.
It is because they represent different types which can be translated to different sizes.
int is a default 'integer' and its size is not specified.
'int32' says it is 32-bit (a four-byte integer).
long is a 'longer version integer' which can occupy a larger number of bytes. On your 32-bit compiler it is still a 4-byte integer. A 'long long' type, which on Windows, as I recall, was __int64, is 64-bit.
DWORD is a Microsoft-introduced type. It is a 'double word', where a word, at that time, meant 'two bytes'.
Your choice of int32 is good when you know that you need a 32-bit integer.

`int` assumed to always be 32 bit in OpenCV?

It appears that in OpenCV, the int datatype is always assumed to be 32 bits. This is reflected in the documentation (for example, in the introduction), and also in the source code (for example, in the comments of modules/core/include/opencv2/core/cvdef.h, and the fact that it defines uint to be a 32-bit unsigned integer, but doesn't define a corresponding signed type).
How does this not break OpenCV on systems in which int isn't 32 bits? After all, int is only guaranteed to be 16 bits by the standard.
I would have expected OpenCV to define datatypes for all sizes that it uses (just like it does for int64), or use uint8_t and friends.
How does this not break OpenCV on systems in which int isn't 32 bits?
It probably does break. You should try building on such a system to be sure. Then again, I wish you good luck finding such a system that still has enough memory and CPU power to do meaningful computer vision; a 16-bit int is typically found on very small embedded systems these days.
The clean way to get a fast type that is at least 32 bits wide is to use the int_fast32_t type from <stdint.h>, but this requires C99 support, which Microsoft's C compiler went a long time without providing.
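A sketch of that approach with <cstdint> (int_fast32_t may be wider than 32 bits if that is faster on the target, so the value is cast to intmax_t for printing):

#include <cstdint>
#include <cstdio>

int main() {
    int_fast32_t sum = 0;                 // fast type, at least 32 bits wide
    for (int_fast32_t i = 0; i < 10000; ++i)
        sum += i;
    std::printf("sum = %jd\n", static_cast<intmax_t>(sum));
    return 0;
}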

Long Vs. Int C/C++ - What's The Point?

As I've learned recently, a long in C/C++ is the same length as an int. To put it simply, why? It seems almost pointless to even include the datatype in the language. Does it have any uses specific to it that an int doesn't have? I know we can declare a 64-bit int like so:
long long x = 0;
But why does the language choose to do it this way, rather than just making a long well...longer than an int? Other languages such as C# do this, so why not C/C++?
When writing in C or C++, every datatype is architecture- and compiler-specific. On one system int is 32 bits, but you can find ones where it is 16 or 64; it's not fixed by the language, so it's up to the compiler.
As for long and int, it comes from the days when the standard integer was 16-bit and long was a 32-bit integer - and it indeed was longer than int.
The specific guarantees are as follows:
char is at least 8 bits (1 byte by definition, however many bits it is)
short is at least 16 bits
int is at least 16 bits
long is at least 32 bits
long long (in versions of the language that support it) is at least 64 bits
Each type in the above list is at least as wide as the previous type (but may well be the same).
Thus it makes sense to use long if you need a type that's at least 32 bits, int if you need a type that's reasonably fast and at least 16 bits.
Actually, at least in C, these lower bounds are expressed in terms of ranges, not sizes. For example, the language requires that INT_MIN <= -32767 and INT_MAX >= +32767. The 16-bit requirement follows from this and from the requirement that integers are represented in binary.
C99 adds <stdint.h> and <inttypes.h>, which define types such as uint32_t, int_least32_t, and int_fast16_t; these are typedefs, usually defined as aliases for the predefined types.
(There isn't necessarily a direct relationship between size and range. An implementation could make int 32 bits, but with a range of only, say, -2^23 .. +2^23-1, with the other 8 bits (called padding bits) not contributing to the value. It's theoretically possible (but practically highly unlikely) that int could be larger than long, as long as long has at least as wide a range as int. In practice, few modern systems use padding bits, or even representations other than 2's-complement, but the standard still permits such oddities. You're more likely to encounter exotic features in embedded systems.)
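A small sketch that looks at ranges rather than sizes, using <limits> (digits reports the number of value bits, excluding the sign bit and any padding bits):

#include <iostream>
#include <limits>

int main() {
    std::cout << "int range:  " << std::numeric_limits<int>::min()
              << " .. " << std::numeric_limits<int>::max() << '\n';
    std::cout << "long range: " << std::numeric_limits<long>::min()
              << " .. " << std::numeric_limits<long>::max() << '\n';
    std::cout << "int value bits: " << std::numeric_limits<int>::digits << '\n';
}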
long is not the same length as an int. According to the specification, long is at least as large as int. For example, on Linux x86_64 with GCC, sizeof(long) = 8, and sizeof(int) = 4.
long is not the same size as int, it is at least the same size as int. To quote the C++03 standard (3.9.1-2):
There are four signed integer types: “signed char”, “short int”,
“int”, and “long int.” In this list, each type provides at least as
much storage as those preceding it in the list. Plain ints have the
natural size suggested by the architecture of the execution
environment; the other signed integer types are provided to meet special needs.
My interpretation of this is "just use int, but if for some reason that doesn't fit your needs and you are lucky to find another integral type that's better suited, be our guest and use that one instead". One way that long might be better is if you're on an architecture where it is... longer.
I was looking for something completely unrelated, stumbled across this, and felt I needed to answer. Yeah, this is old, so for people who surf on in later...
Frankly, I think all the answers on here are incomplete.
The size of a long is the number of bits your processor can operate on at one time. It's also called a "word". A "half-word" is a short. A "doubleword" is a long long and is twice as large as a long (and originally was only implemented by vendors and not standard), and even bigger than a long long is a "quadword", which is twice the size of a long long but had no formal name (and is not really standard).
Now, where does the int come in? Partly from the registers on your processor, and partly from your OS. Your registers define the native sizes the CPU handles, which in turn define the size of things like the short and long. Processors are also designed with a data size that is the most efficient size for them to operate on. That should be an int.
On today's 64-bit machines you'd assume, since a long is a word and a word on a 64-bit machine is 64 bits, that a long would be 64 bits and an int whatever the processor is designed to handle, but it might not be. Why? Your OS has chosen a data model and defined these data sizes for you (pretty much by how it's built). Ultimately, if you're on Windows (and using Win64) it's 32 bits for both a long and an int. Solaris and Linux use different definitions (the long is 64 bits). These definitions are called things like ILP64, LP64, and LLP64. Windows uses LLP64 and Solaris and Linux use LP64:
Model       ILP64   LP64   LLP64
int            64     32      32
long           64     64      32
pointer        64     64      64
long long      64     64      64
Where, e.g., ILP means int-long-pointer, and LLP means long-long-pointer
To get around this, most compilers seem to support setting the size of an integer directly, with types like int32 or int64.
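In standard C and C++ those spellings are the fixed-width typedefs from <cstdint> (int32_t, int64_t); __int32 and __int64 are the MSVC-specific equivalents. A minimal sketch:

#include <cstdint>

std::int32_t a = 0;  // exactly 32 bits wherever the typedef exists
std::int64_t b = 0;  // exactly 64 bits wherever the typedef exists
static_assert(sizeof(a) == 4 && sizeof(b) == 8, "fixed-width as advertised");

int main() { return 0; }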

I notice ints and longs have the same size. Why?

Just noticed this on OSX and I found it curious as I expected long to be bigger than int.
Is there any good reason for making them the same size?
This is a result of the loose nature of size definitions in the C and C++ language specifications. I believe C has specific minimum sizes, but the only rule in C++ is this:
1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
Moreover, sizeof(int) and sizeof(long) are not guaranteed to be equal on all platforms. Every 64-bit platform I've worked with has had long fit the natural word size, so 32 bits on a 32-bit architecture, and 64 bits on a 64-bit architecture.
int is essentially the most convenient and efficient integer type
long is/was the largest integer type
short is the smallest integer type
If the longest integer type is also the most efficient, then int is the same as long. A while ago (think pre-32-bit), sizeof(int) == sizeof(short) on a number of platforms, since 16 bits was the widest natural integer.
int is supposed to be the natural word size of the architecture. In the old days, on 16 bit machines like the original IBM PC, ints were 16 bits and longs were 32 bits. On 32 bit machines like the 68000 series, ints were still "the natural word size", which was now 32 bits, and longs remained at 32 bits. Over time, longs grew to be 64 bits, and then we started using 64 bit architectures like the Intel Core 2, and so I expect int to grow to 64 bits sooner or later.
Interesting fact: On my laptop, with a Core 2 Duo and Mac OS X 10.5, int and long are both 32 bits. On my Linux box, also with a Core 2 Duo and Ubuntu, int is 32 bits and long is 64 bits.
Years ago, I was asked in a job interview where an int pointer would be after you added 3 to it. I answered "3 times sizeof(int) past where it is now". The interviewer pressed me, and I said it would depend on the architecture, since (at that time) Windows used 16-bit ints but since I was doing Unix programming I was more used to 32-bit ints. I didn't get the job - I suspect the interviewer didn't like the fact that I knew more than him.
As Tom correctly pointed out, the only standard size in C++ is char, whose size is 1(*). From there on, only a 'not smaller than' relation holds between types. Most people will claim that it depends on the architecture, but it is more of a compiler/OS decision. The same hardware running MacOSX, Windows (32/64 bits) or Linux (32/64) will have different sizes for the same data types. Different compilers on the same architecture and OS can have different sizes. Even the exact same compiler on the same OS on the same hardware can produce different sizes depending on compilation flags:
$ cat test.cpp
#include <iostream>
int main()
{
std::cout << "sizeof(int): " << sizeof(int) << std::endl;
std::cout << "sizeof(long): " << sizeof(long) << std::endl;
}
$ g++ -o test32 test.cpp; ./test32
sizeof(int): 4
sizeof(long): 4
$ g++ -o test64 test.cpp -m64; ./test64
sizeof(int): 4
sizeof(long): 8
That is the result of using the gcc compiler on MacOSX Leopard. As you can see, the hardware and software are the same, and yet the sizes differ between two executables born out of the same code.
If your code depends on sizes, then you are better off not using the default types but specific types for your compiler that make the size explicit. Or use some portable libraries that offer that support; as an example, with ACE, ACE_UINT64 will be an unsigned integer type of 64 bits, regardless of the compiler/OS/architecture. The library will detect the compiler and environment and use the appropriate data type on each platform.
(*) I have rechecked the C++ standard 3.9.1: char size shall be 'large enough to store any member of the implementation's basic character set'. Later, in 5.3.3: sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1, so yes, the size of a char is 1 byte.
After reading other answers I found one that states that bool is the smallest integer type. Again, the standard is loose in the requirements and only states that it can represent true and false, but not its size. The standard is explicit to that extent: 5.3.3, footnote: "sizeof(bool) is not required to be 1".
Note that some C++ implementations have decided to use bools larger than 1 byte for other reasons. On Apple MacOSX PPC systems with gcc, sizeof(bool)==4.
int and long are not always the same size, so do not assume that they are in code. Historically there have been 8 bit and 16 bit, as well as the more familiar 32 bit and 64 bit architectures. For embedded systems smaller word sizes are still common. Search the net for ILP32 and LP64 for way too much info.
