How to implement a bit array in Modern Fortran?

I am new to Fortran 2008 and am trying to implement a Sieve of Atkin. In C++ I implemented this using a std::bitset but was unable to find anything in Fortran 2008 that serves this purpose.
Can anyone point me at any example code or explain an implementation strategy for one?

Standard Fortran doesn't have a precise analogue of what I understand std::bitset to be -- though I grant you my understanding may be defective. Generally, and if you want to stick to standard Fortran, you would use integers as sets of bits. If one integer doesn't have enough bits for your purposes, use arrays of integers. This does mean, though, that the responsibility for tracking where, say, the 307th bit of your bitset is falls on you.
Prior to the 2008 standard you have functions such as bit_size, iand, ibset, btest and others (see your compiler documentation or Google for language references, or try the Intel Fortran documentation) for bit manipulation.
If you are unfamiliar with Fortran's BOZ literals then familiarise yourself with them. You can, for example, set the bits of an integer using a statement such as this (strictly, the standard only allows BOZ literals in DATA statements and in certain intrinsics such as int, but most compilers accept direct assignment as an extension)
integer :: mybits
...
mybits = b'00000011000000100000000000001111'
With the b edit descriptor you can read and write binary literals too. For example the statements
write(*,*) mybits
write(*,'(b32.32)') mybits
will produce the output
50462735
00000011000000100000000000001111
If you can lay your hands on a modern-enough compiler then you will find that the 2008 standard added new bit-twiddling functions such as bge, bgt, dshiftl, iall and a whole lot more. These are defined for integer or integer-array arguments, though I don't have any experience of using them to pass on.
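By way of illustration, here is a minimal, hedged sketch exercising two of them (iall reduces an integer array with bitwise AND; dshiftl performs a combined left shift across a pair of integers):
program f2008_bits
  implicit none
  integer :: a(3) = [1, 3, 7]
  write (*, '(b32.32)') iall(a)    ! 1 AND 3 AND 7 = 1
  write (*, *) dshiftl(1, 0, 4)    ! shifts the pair (1, 0) left by 4 bits: 16
end program f2008_bits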
This should be enough to get you started.

Fortran has bit intrinsics for manipulating the bits of default integers. Bit arrays are straightforward to build off that...
Determine how many bits you need, divide by the number of bits in a default integer, and allocate an integer array of default kind of that size (+1 if the remainder of the division was non-zero), and you're essentially done; a sketch follows. The bit intrinsics are well covered in Metcalf and Reid.
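For concreteness, here is a minimal sketch of that strategy (the module and procedure names are illustrative; bits are numbered from 0):
module bitarray_mod
  implicit none
  type :: bitarray
    integer, allocatable :: words(:)
  end type bitarray
contains
  subroutine bitarray_init(ba, nbits)
    type(bitarray), intent(out) :: ba
    integer, intent(in) :: nbits
    integer :: nwords
    nwords = nbits / bit_size(0)
    if (mod(nbits, bit_size(0)) /= 0) nwords = nwords + 1
    allocate(ba%words(nwords))
    ba%words = 0
  end subroutine bitarray_init

  subroutine bitarray_set(ba, pos)          ! pos counts from 0
    type(bitarray), intent(inout) :: ba
    integer, intent(in) :: pos
    integer :: w, b
    w = pos / bit_size(0) + 1               ! which word holds this bit
    b = mod(pos, bit_size(0))               ! which bit within that word
    ba%words(w) = ibset(ba%words(w), b)
  end subroutine bitarray_set

  logical function bitarray_test(ba, pos)   ! pos counts from 0
    type(bitarray), intent(in) :: ba
    integer, intent(in) :: pos
    integer :: w, b
    w = pos / bit_size(0) + 1
    b = mod(pos, bit_size(0))
    bitarray_test = btest(ba%words(w), b)
  end function bitarray_test
end module bitarray_mod
A clear operation would use ibclr in exactly the same way.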

What you may want could look like:
program test
  logical, allocatable :: flips(:)
  ...
  allocate(flips(ntris), stat=err)
  call tris(ntris, ..., flips)
  ...
end

subroutine tris(nnewtris, ..., flips)
  logical flips(nnewtris)
  ...
  if (flips(i)) then
    ...
  end if
  return
end

Related

Using underscores to define kind/precision

I've been using an underscore to define an integer as a specific kind in Fortran.
Here is a snippet of code to demonstrate what 1_8 means, for example:
program main
  implicit none
  integer(2) :: tiny
  integer(4) :: short
  integer(8) :: long
  tiny = 1
  short = 1
  long = 1
  print*, huge(tiny)
  print*, huge(1_2)
  print*, huge(short)
  print*, huge(1_4)
  print*, huge(long)
  print*, huge(1_8)
end program main
Which returns (with PGI or gfortran):
32767
32767
2147483647
2147483647
9223372036854775807
9223372036854775807
I'm using the huge intrinsic function to return the largest number of the given kind. So 1_8 is clearly the same kind as integer(8). This works for real numbers also, although I haven't shown it here.
However, I've been unable to find any documentation of this functionality and I don't remember where I learned it. My question is:
Is using _KIND standard Fortran? Does anybody have a source for this?
Edit: It's been pointed out that the kind values I've used (2,4,8) are not portable - different compilers/machines may give different values for huge(1_4), for example.
It is standard Fortran, as of Fortran 90, though the set of valid kind values for each type and the meaning of a kind value is processor dependent.
The Fortran standard is the definitive source for what is "standard" Fortran. ISO/IEC 1539-1:2010 is the current edition, which you can buy, or pre-publication drafts of that document are available from various places. https://gcc.gnu.org/wiki/GFortranStandards has a collection of useful links.
I would expect that many compiler manuals (e.g - see section 2.2 of the current PGI Fortran reference manual) and all reasonable textbooks on modern Fortran would also describe this feature.
As already answered, this notation is standard Fortran, but using these specific numeric values for kinds is not portable. Previous answers have described compilers that use other schemes. There are several other approaches that are portable.
First, use the kind scheme to define symbols, then use those symbols for numeric constants:
integer, parameter :: RegInt_K = selected_int_kind (8)
write (*, *) huge (1_RegInt_K)
Or use the kind definitions that are provided in the ISO_FORTRAN_ENV module, e.g.,
write (*, *) huge (1_int32)
The kinds in this module are specified by the number of bits used for storage. Some of the kinds are only available with Fortran 2008. If those are missing in your compiler, you can use the kind definitions provided with ISO_C_BINDING, which is older.
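As a minimal, self-contained illustration of this second approach (assuming a compiler that provides the Fortran 2008 constants):
program kinds_demo
  use iso_fortran_env, only: int32, int64, real64
  implicit none
  write (*, *) huge(1_int32)       ! 2147483647
  write (*, *) huge(1_int64)       ! 9223372036854775807
  write (*, *) huge(1.0_real64)
end program kinds_demo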
To add to IanH's answer, the specification for an integer literal constant in Fortran 2008 is given in 4.4.2.2. Something like 1_8 fits in with R407 and R408 (for the obvious digit-string):
int-literal-constant  is  digit-string [ _kind-param ]
kind-param            is  digit-string
                      or  scalar-int-constant-name
You can find similar for other data types (although note that for characters the kind parameter is at the front of the literal constant).
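For example (a sketch; selected_char_kind is Fortran 2003 and returns -1 if the processor does not support the requested character set):
integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
character(kind=ucs4, len=5) :: s
s = ucs4_"hello"    ! note the kind parameter in front of the literal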

Convert FORTRAN DEC UNION/MAP extensions to anything else

Edit: Gfortran 6 now supports these extensions :)
I have some old f77 code that extensively uses UNIONs and MAPs. I need to compile this using gfortran, which does not support these extensions. I have figured out how to convert all non-supported extensions except for these and I am at a loss. I have had several thoughts on possible approaches, but haven't been able to successfully implement anything. I need for the existing UDTs to be accessed in the same way that they currently are; I can reimplement the UDTs but their interfaces must not change.
Example of what I have:
TYPE TEST
  UNION
    MAP
      INTEGER*4 test1
      INTEGER*4 test2
    END MAP
    MAP
      INTEGER*8 test3
    END MAP
  END UNION
END TYPE
Access to the elements has to be available in the following manners: TEST%test1, TEST%test2, TEST%test3
My thoughts thus far:
Replace somehow with Fortran EQUIVALENCE.
Define the structs in C/C++ and somehow make them visible to the FORTRAN code (doubt that this is possible).
I imagine that there must have been lots of refactoring of f77 to f90/95 when the UNION and MAP were excluded from the standard. How if at all was/is this handled?
EDIT: The accepted answer has a workaround to allow memory overlap, but as far as preserving the API, it is not possible.
UNION and MAP were never part of any FORTRAN standard; they are vendor extensions (see, e.g., http://fortranwiki.org/fortran/show/Modernizing+Old+Fortran). So they weren't really excluded from the Fortran 90/95 standard. They cause variables to overlap in memory. If the code actually uses this feature, then you will need to use equivalence.
The preferred way to move data between variables of different types without conversion is the transfer intrinsic, but to use that you would have to identify every place where a conversion is necessary, while with equivalence it is taking place implicitly. Of course, that makes the code less understandable.
If the memory overlays are just to save space and the equivalence of the variables is not used, then you could get rid of this "feature". If the code is like your example, with small integers, then I'd guess that the memory overlay is being used. If the overlays are large arrays, it might have been done to conserve memory. If these declarations were also creating new types, you could use user defined types, which are definitely part of Fortran >=90.
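To illustrate the transfer route (a sketch only; the value you get from reinterpreting the bits depends on endianness):
program transfer_demo
  use iso_fortran_env, only: int32, int64
  implicit none
  integer(int32) :: pair(2) = [2_int32, 4_int32]
  integer(int64) :: whole
  whole = transfer(pair, whole)   ! copies the bit pattern; no numeric conversion
  write (*, *) whole
end program transfer_demo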
If the code is using memory equivalence of variables of different types, this might not be portable, e.g., the internal representation of integers and reals are probably different between the machine on which this code originally ran and the current machine. Or perhaps the variables are just being used to store bits. There is a lot to figure out.
P.S. In response to the question in the comment, here is a code sample. But ... to be clear ... I do not think that using equivalence is good coding practice. With the compiler options that I normally use with gfortran to debug code, gfortran rejects this code. With looser options, gfortran will compile it. So will ifort.
module my_types
  use ISO_FORTRAN_ENV
  type test_p1_type
    sequence
    integer (int32) :: int1
    integer (int32) :: int2
  end type test_p1_type
  type test_p2_type
    sequence
    integer (int64) :: int3
  end type test_p2_type
end module my_types

program test
  use my_types
  type (test_p1_type) :: test_p1
  type (test_p2_type) :: test_p2
  equivalence (test_p1, test_p2)
  test_p1 % int1 = 2
  test_p1 % int2 = 4
  write (*, *) test_p1 % int1, test_p1 % int2, test_p2 % int3
end program test
The question is whether the union was used to save space or to have alternative representations of the same data. If you are porting, see how it is used. Maybe, because the space was limited, it was written in a way where the variables had to be shared. Nowadays, with larger amounts of memory, this may no longer be necessary and the union may not be required, in which case it is just two separate types.
For those just wanting to compile the code with these extensions: Gfortran now supports UNION, MAP and STRUCTURE in version 6. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56226

Ada to C++: Pass an unsigned 64-bit value

I need to pass 2 pieces of data from an Ada program to some C++ code for processing.
Data - double.
Time - unsigned 64 bits.
I was able to make a procedure in Ada that worked with my C++ method using a Long_Float (double in C++) and Integer (int in C++, obviously not 64-bits though).
I used the following code (code not on me so syntax might be slightly off):
procedure send_data (this : in hidden_ptr; data : in Long_Float; time : in Integer);
pragma import (CPP, send_data, "MyClass::sendData");
Now that that's working, I'm trying to expand the time to the full 64-bits and ideally would like to have an unsigned long long on the C++ side. I don't see any types in Ada that match that so I created my own type:
type U64 is mod 2 ** 64;
When using that type with my send_data method I get an error saying there are no possible ways to map that type to a C++ type (something along those lines, again don't have the code or exact error phrase on me).
Is there a way to pass a user-defined type in Ada to C++? Perhaps there's another type in Ada I can use as an unsigned 64-bit value that would work? Is there a way to pass the address of my U64 type as a parameter to the C++ method instead if that's easier? I'm using the green hills adamulti compiler v3.5 (very new to ada, not sure if that info helps or not). Examples would be greatly appreciated!
As an addendum to #KeithThompson's comment/answer...
Ada's officially supported C interface types are in Interfaces.C, and there's no extra-long int or unsigned in there (in the 2005 version. Is the 2012 version official yet?).
You did the right thing to work around it. If your compiler didn't support that, more drastic measures will have to be taken.
The first obvious thing to try is to pass the 64-bit int by reference. That will require changes on the C side though.
I know C/C++ TIME structs tend to be 64-bit values, defined as unioned structs. So what you could do is define an Ada record to mimic one of those C structs, make sure it gets laid out the way C would (eg: with record representation and size clauses), and then make that object what your imported routine uses for its parameter.
After that you'll have to pull nasty parameter tricks. For example, you could try changing the Ada import's side of the parameter to a 64-bit float, unchecked_converting the actual parameter into a 64-bit float, and trying to pass it that way. The problem there is that a lot of CPUs pass floats in different registers than ints. So that likely won't work, and if it does it most certainly isn't portable.
There may well be other ways to fake it out, if you figure out how your compiler's CPP calling convention works. For example, if it uses two adjacent 32-bit registers to pass 64-bit ints, you could split your Ada 64-bit int into two and pass it in two parameters on the Ada side. If it passes 64-bit values by reference, you could just tell the Ada side that you're passing a pointer to your U64. Again, these solutions won't be portable, but they would get you going.
Put_Line("First = " & Interfaces.C.long'Image(Interfaces.C.long'First));
Put_Line("Last = " & Interfaces.C.long'Image(Interfaces.C.long'Last));
result:
First = -9223372036854775808
Last = 9223372036854775807

Fortran 90 kind parameter

I am having trouble understanding Fortran 90's kind parameter. As far as I can tell, it does not determine the precision (i.e., float or double) of a variable, nor does it determine the type of a variable.
So, what does it determine and what exactly is it for?
The KIND of a variable is an integer label which tells the compiler which of its supported kinds it should use.
Beware that although it is common for the KIND parameter to be the same as the number of bytes stored in a variable of that KIND, it is not required by the Fortran standard.
That is, on a lot of systems,
REAL(KIND=4) :: xs ! 4 byte IEEE float
REAL(KIND=8) :: xd ! 8 byte IEEE float
REAL(KIND=16) :: xq ! 16 byte IEEE float
but there may be compilers, for example, with:
REAL(KIND=1) :: XS ! 4 BYTE FLOAT
REAL(KIND=2) :: XD ! 8 BYTE FLOAT
REAL(KIND=3) :: XQ ! 16 BYTE FLOAT
Similarly for integer and logical types.
(If I went digging, I could probably find examples. Search the usenet group comp.lang.fortran for kind to find examples. The most informed discussion of Fortran occurs there, with some highly experienced people contributing.)
So, if you can't count on a particular kind value giving you the same data representation on different platforms, what do you do? That's what the intrinsic functions SELECTED_REAL_KIND and SELECTED_INT_KIND are for. Basically, you tell the function what sort of numbers you need to be able to represent, and it will return the kind you need to use.
I usually use these kinds, as they usually give me 4 byte and 8 byte reals:
!--! specific precisions, usually same as real and double precision
integer, parameter :: r6 = selected_real_kind(6)
integer, parameter :: r15 = selected_real_kind(15)
So I might subsequently declare a variable as:
real(kind=r15) :: xd
Note that this may cause problems when you use mixed-language programs and you need to absolutely specify the number of bytes that variables occupy. If you need to make sure, there are enquiry intrinsics that will tell you about each kind, from which you can deduce the memory footprint of a variable, its precision, exponent range and so on. Or, you can revert to the non-standard but commonplace real*4, real*8 etc. declaration style.
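For example, a small sketch of those enquiry intrinsics (storage_size is Fortran 2008):
program kind_enquiry
  implicit none
  integer, parameter :: r15 = selected_real_kind(15)
  real(kind=r15) :: xd
  write (*, *) 'precision =', precision(xd)     ! decimal digits
  write (*, *) 'range     =', range(xd)         ! decimal exponent range
  write (*, *) 'bits      =', storage_size(xd)  ! Fortran 2008
end program kind_enquiry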
When you start with a new compiler, it's worth looking at the compiler specific kind values so you know what you're dealing with. Search the net for kindfinder.f90 for a handy program that will tell you about the kinds available for a compiler.
I suggest using the kind constants INT8, INT16, INT32, INT64, REAL32, REAL64 and REAL128, which come from the intrinsic module ISO_FORTRAN_ENV (the module exists from Fortran 2003; these constants were added in Fortran 2008). Raw numeric kind parameters are an inconsistent way to ensure you always get the appropriate number of bits of representation.
Just expanding the other (very good) answers, specially Andrej Panjkov's answer:
The KIND of a variable is an integer label which tells the compiler
which of its supported kinds it should use.
Exactly. Even though, for all the numeric intrinsic types, the KIND parameter is used to specify the "model for the representation and behavior of numbers on a processor" (words from the Section 16.5 of the standard), that in practice means their bit model, that's not the only thing a KIND parameter may represent.
A KIND parameter for a type is any variation in its nature, model or behavior that is available for the programmer to choose at compile time. For example, for the intrinsic character type, the kind parameter represents the character sets available on the processor (ASCII, UCS-4, ...).
You can even define your own model/behaviour variations on your own derived types (from Fortran 2003 onwards). You can create a transform-matrix type and have a version with KIND=2 for 2D space (in which the underlying array would be 3x3) and KIND=3 for 3D space (with a 4x4 underlying array). Just remember that there is no automatic kind conversion for non-intrinsic types.
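A sketch of that idea using a kind-parameterized derived type (Fortran 2003; compiler support for these arrived late and unevenly, and the type name here is made up):
type :: transform(k)
  integer, kind :: k = 3
  real :: mat(k+1, k+1)      ! 3x3 matrix for k = 2, 4x4 for k = 3
end type transform
...
type(transform(2)) :: t2d    ! 2D space
type(transform(3)) :: t3d    ! 3D space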
From the Portland Group Fortran Reference, the KIND parameter "specifies a precision for intrinsic data types." Thus, in the declaration
real(kind=4) :: float32
real(kind=8) :: float64
the variable float64 is declared as an 8-byte real (the old Fortran DOUBLE PRECISION) and the variable float32 is declared as a 4-byte real (the old Fortran REAL).
This is nice because it allows you to fix the precision of your variables independent of the compiler and machine you are running on. Suppose you are running a computation that requires more precision than the traditional IEEE single-precision real (which, if you're taking a numerical analysis class, is very probable), but declare your variable as real :: myVar. You'll be fine if the compiler is set to default all real values to double precision, but changing the compiler options or moving your code to a different machine with different default sizes for real and integer variables will lead to some possibly nasty surprises (e.g. your iterative matrix solver blows up).
Fortran also includes some functions that will help pick a KIND parameter to be what you need - SELECTED_INT_KIND and SELECTED_REAL_KIND - but if you are just learning I wouldn't worry about those at this point.
Since you mentioned that you're learning Fortran as part of a class, you should also see this question on Fortran resources and maybe look at the reference manuals from the compiler suite that you are using (e.g. Portland Group or Intel) - these are usually freely available.
One of the uses of kind is to make sure that, on different machines or operating systems, the code truly uses the same precision and gives the same result, so that the code is portable. E.g.,
integer, parameter :: r8 = selected_real_kind(15,9)
real(kind=r8) :: a
Now this variable a is always of kind r8, a true "double precision" (it occupies 64 bits of memory), no matter what machine/OS the code is running on.
Also, therefore you can write things like,
a = 1.0_r8
and this _r8 make sure that 1.0 is converted to r8 type.
To summarize other answers: the kind parameter specifies storage size (and thus indirectly, the precision) for intrinsic data types, such as integer and real.
However, the recommended way now is NOT to specify the kind value of variables in source code; instead, use compiler options to specify the precision we want. For example, we write in the code real :: abc and then compile it with the option -fdefault-real-8 (for gfortran) to specify an 8-byte float number. For ifort, the corresponding option is -r8.
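For instance (a sketch; -fdefault-real-8 and -r8 are the gfortran and ifort flags named above, and prog.f90 is just a placeholder file name):
! prog.f90 -- compile with: gfortran -fdefault-real-8 prog.f90
!                       or: ifort -r8 prog.f90
program prog
  implicit none
  real :: abc
  write (*, *) storage_size(abc), 'bits'   ! 64 with the options above, 32 without
end program prog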
Update:
It seems Fortran experts here strongly object to the recommended way stated above. In spite of this, I still think the above way is a good practice that helps reduce the chance of introducing bugs in Fortran codes, because it guarantees that you are using the same kind value throughout your program (the chance that you really need different kind values in different parts of a code is rare) and thus avoids the frequently encountered bugs of kind-value mismatch between dummy and actual arguments in a function call.

How to handle arbitrarily large integers

I'm working on a programming language, and today I got to the point where I could compile the factorial function (recursive). However, due to the maximum size of an integer, the largest value I can compute is factorial(12). What are some techniques for handling integers of an arbitrary maximum size? The language currently works by translating code to C++.
If you need larger than 32-bits you could consider using 64-bit integers (long long), or use or write an arbitrary precision math library, e.g. GNU MP.
If you want to roll your own arbitrary precision library, see Knuth's Seminumerical Algorithms, volume 2 of his magnum opus.
If you're building this into a language (for learning purposes I'd guess), I think I would probably write a little BCD library. Just store your BCD numbers inside byte arrays.
In fact, with today's gigantic storage abilities, you might just use a byte array where each byte holds a digit (0-9). You then write your own routines to add, subtract, multiply and divide your byte arrays.
(Divide is the hard one, but I bet you can find some code out there somewhere.)
I can give you some Java-like pseudocode but can't really do C++ from scratch at this point:
class BigAssNumber {
    private byte[] value;   // decimal digits, least significant digit first

    // This constructor can handle numbers where overflows have occurred.
    public BigAssNumber(byte[] value) {
        this.value = normalize(value);
    }

    // Adds two numbers and returns the sum. Originals not changed.
    public BigAssNumber add(BigAssNumber other) {
        byte[] longer  = value.length >= other.value.length ? value : other.value;
        byte[] shorter = (longer == value) ? other.value : value;
        // Byte-by-byte copy into newly allocated space, not a pointer copy!
        byte[] dest = java.util.Arrays.copyOf(longer, longer.length);
        // Just add each pair of digits, like in a pencil-and-paper addition problem.
        for (int i = 0; i < shorter.length; i++)
            dest[i] += shorter[i];
        // constructor will fix overflows.
        return new BigAssNumber(dest);
    }

    // Fix digits that might have overflowed: 22,17,0 will turn into 2,9,1 (= 192).
    private byte[] normalize(byte[] value) {
        if (the most significant (last) digit of value is not zero)
            extend the byte array by a few zero bytes at the most significant end;
        // Simple cheap adjust. Could lose inner loop easily if it mattered.
        for (int i = 0; i < value.length - 1; i++)
            while (value[i] > 9) {
                value[i]     -= 10;
                value[i + 1] += 1;
            }
        return value;
    }
}
I use the fact that we have a lot of extra room in a byte to help deal with addition overflows in a generic way. Can work for subtraction too, and help on some multiplies.
There's no easy way to do it in C++. You'll have to use an external library such as GNU Multiprecision, or use a different language which natively supports arbitrarily large integers such as Python.
Other posters have given links to libraries that will do this for you, but it seems like you're trying to build this into your language. My first thought is: are you sure you need to do that? Most languages would use an add-on library as others have suggested.
Assuming you're writing a compiler and you do need this feature, you could implement integer arithmetic functions for arbitrarily large values in assembly.
For example, a simple (but non-optimal) implementation would represent the numbers as Binary Coded Decimal. The arithmetic functions could use the same algorithms as you'd use if you were doing the math with pencil and paper.
Also, consider using a specialized data type for these large integers. That way "normal" integers can use the standard 32 bit arithmetic.
My preferred approach would be to use my current int type for 32-bit ints (or maybe change it internally to be a long long or some such, so long as it can continue to use the same algorithms), then when it overflows, have it change to storing as a bignum, whether of my own creation or using an external library. However, I feel like I'd need to be checking for overflow on every single arithmetic operation, roughly 2x overhead on arithmetic ops. How could I solve that?
If I were to implement my own language and wanted to support arbitrary-length numbers, I would use a target language with the carry/borrow concept. But since no HLL implements this without severe performance implications (like exceptions), I would certainly implement it in assembly. It would probably take a single instruction (as in JC in x86) to check for overflow and handle it (as in ADC in x86), which is an acceptable compromise for a language implementing arbitrary precision. Then I would use a few functions written in assembly instead of regular operators; if you can utilize overloading for more elegant output, even better. But I don't expect generated C++ to be maintainable (or meant to be maintained) as a target language.
Or, just use a library which has more bells and whistles than you need and use it for all your numbers.
As a hybrid approach, detect overflow in assembly and call the library function if overflow instead of rolling your own mini library.