Split line at the tab character - fortran

I would like to split a line at the tab character and read commas as ordinary characters. I tried the approach below, searching with INDEX for a pattern of "\t" or " ", but it returns 0, so it doesn't find any tab. What would be the right solution?
INTEGER :: i, dots, commas, A, T, C, G, InDel, M, Z, L, s, sf, numsize, InDelSlide, pos, base, cov
CHARACTER(len=1) :: ref
CHARACTER(len=10000) :: arg, seq, qual
CHARACTER(len=1024) :: buffer
CHARACTER(len=6) :: num
CHARACTER(len=5) chr
READ(5,'(A)') buffer
PRINT *, buffer
pos = INDEX(buffer, " ")
arg = buffer(1:pos-1)
READ(buffer(pos+1:), *) chr, base, ref, cov, seq, qual

The tab character in Fortran is simply achar(9). Use
pos = INDEX(buffer, achar(9))
The achar() function returns a character with the ASCII value you pass to it.
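As a minimal sketch of how this can be used to split a line, the module below cuts a string at the first tab by slicing with INDEX. The names tab_split, split_at_tab, head and tail are made up for illustration and do not come from the question. Because the pieces are taken by substring slicing rather than list-directed READ, any commas in the line stay part of the text.

```fortran
module tab_split
  implicit none
  ! ASCII code 9 is the horizontal tab
  character(len=1), parameter :: tab = achar(9)
contains
  ! Split line at its first tab: head is the text before the tab,
  ! tail is the text after it (trailing blanks removed).
  ! If there is no tab, head is the trimmed line and tail is empty.
  subroutine split_at_tab(line, head, tail)
    character(len=*), intent(in) :: line
    character(len=:), allocatable, intent(out) :: head, tail
    integer :: pos

    pos = index(line, tab)
    if (pos == 0) then
      head = trim(line)
      tail = ''
    else
      head = line(1:pos-1)
      tail = trim(line(pos+1:))
    end if
  end subroutine split_at_tab
end module tab_split
```

Calling split_at_tab(buffer, arg, rest) with allocatable arg and rest would leave the first field in arg; calling it again on rest peels off the next tab-separated field.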

Related

Is there a function that removes all nonalphanumeric characters in Fortran?

New to Fortran, been trying to think of a function that removes all non-alphanumeric characters and spaces from a string, so that it turns something like [AS:1] into AS1.
Anyone here got a clue how to?
I got a trimmer for open spaces to work, but I don't know how to make it work for any non-alphanumeric character.
The intrinsic function SCAN can be used for membership searches.
If we have a character char of length-1 and a set set of non-zero size, then we have that SCAN(char, set) returns 1 (0) if char is in (not in) the set. (SCAN will return 0 if the set is of size zero.)
This function is elemental so, for example, SCAN(char_array, set) returns an indicator for which elements of char_array are in the set.
We also have PACK which returns another array corresponding to a selection mask:
print*, PACK(char_array, SCAN(char_array,set).eq.1)
Which means we can write a subroutine like
subroutine s(in, out, keep, len)
integer, intent(in) :: len
character, intent(in) :: in(len), keep*(*)
character, intent(out) :: out(len)
integer :: i
out = PACK(in, SCAN(in,keep).eq.1, [(' ',i=1,len)])
end subroutine s
taking an input character array of size len and returning an output character array of the same size with the elements which are in keep (and trailing elements being blanks).
Naturally, we don't like working with character arrays instead of scalars, so let's provide a nice subroutine using sequence association:
subroutine strip(in, out, keep)
character(*), intent(in) :: in, keep
character(*), intent(out) :: out
call s(in, out, keep, LEN(in))
end subroutine
Complete example:
module stripping
implicit none
private s
contains
subroutine strip(in, out, keep)
character(*), intent(in) :: in, keep
character(*), intent(out) :: out
call s(in, out, keep, len(in))
end subroutine strip
subroutine s(in, out, keep, len)
integer, intent(in) :: len
character, intent(in) :: in(len), keep*(*)
character, intent(out) :: out(len)
integer :: i
out = PACK(in, SCAN(in,keep).eq.1, [(' ',i=1,len)])
end subroutine s
end module stripping
program test
use stripping, only : strip
implicit none
character(10) in, out
character(*), parameter :: keep="abcd"
in = "a1b*2sdc]a"
call strip(in, out, keep)
print*, TRIM(out)
end program
There are doubtless better and clearer ways to do this: this answer mostly serves to have you think about what intrinsic functions there are and how they can be applied. There isn't an intrinsic function to do what you want in one step.
You need to define an external verification procedure that tells if a given character is to be kept or discarded. Then replace the equivalence check typically done in replace routines with this external function.
Here is an implementation that achieves the goal,
module str_mod
implicit none
integer, parameter :: IK = kind(0)
integer, parameter :: SK = kind("a")
integer, parameter :: LK = kind(.false.)
contains
! Returns `.true.` if it is a desired character.
function isDesired(char) result(desired)
character(1, SK), intent(in) :: char
logical(LK) :: desired
desired = (SK_"0" <= char .and. char <= SK_"9") .or. &
(SK_"A" <= char .and. char <= SK_"Z") .or. &
(SK_"a" <= char .and. char <= SK_"z")
end function
function replace(str, isDesired) result(strrep)
character(*, SK), intent(in) :: str
character(:, SK), allocatable :: strrep
procedure(logical(LK)) :: isDesired
integer(IK) :: i, counter
allocate(character(len(str), SK) :: strrep)
counter = 0_IK
do i = 1, len(str, kind = IK)
if (.not. isDesired(str(i:i))) cycle
counter = counter + 1_IK
strrep(counter:counter) = str(i:i)
end do
strrep = strrep(1:counter)
end function
end module str_mod
use str_mod
print *, replace("Fortran", isDesired)
print *, replace("(Fortran)", isDesired)
print *, replace("(Fortran) (Is) [_A_] (GREAT) {language}.", isDesired)
print *, replace("[AS:1]", isDesired)
end
Here is the program output,
Fortran
Fortran
FortranIsAGREATlanguage
AS1
Note that this implementation allocates the output string twice. You could likely avoid this by first counting the desired characters in str, then allocating the output string at the proper size and filling it with the identified characters. Any performance difference will likely be negligible in most scenarios. You would likely see a larger benefit by reimplementing replace() as a subroutine whose str argument is an allocatable with intent(inout). In that case you avoid an extra copy on exit from the procedure, which can lead to a ~25% runtime speedup for small arrays. But again, such performance concerns become relevant only when you call replace() on the order of billions of times.
You would have to write a function to do it. As inspiration, here's a subroutine I recently wrote to do SQL "escaping" of quotes in a string. The key here is having separate indexes for input and output position. Your requirement is even easier - if the character is not alphanumeric or space, don't advance the output length. There are several ways of doing the comparison, an exercise left for the reader.
subroutine escape (text)
character(*), intent(inout) :: text
character(100) :: newtext
integer i,j
newtext = ' '
j = 1
do i=1,len_trim(text)
if (text(i:i) == '"') then
newtext(j:j) = "\"
j = j + 1
end if
newtext(j:j) = text(i:i)
j = j + 1
end do
text = newtext
end subroutine escape
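One possible take on the exercise, as a sketch only: the function below uses the same two-index idea (i for input, j for output) and tests membership with INDEX against a string of allowed characters. The names alnum_filter, keep_alnum and allowed are invented for this example, and it assumes that spaces should be dropped as well, matching the [AS:1] to AS1 example in the question.

```fortran
module alnum_filter
  implicit none
  ! Characters that survive the filter (assumption: ASCII letters and digits only)
  character(len=*), parameter :: allowed = &
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
contains
  ! Return text with every character not found in allowed removed.
  function keep_alnum(text) result(res)
    character(len=*), intent(in) :: text
    character(len=:), allocatable :: res
    character(len=len(text)) :: buf   ! output can never be longer than input
    integer :: i, j

    j = 0                             ! j counts characters written so far
    do i = 1, len(text)
      if (index(allowed, text(i:i)) > 0) then
        j = j + 1
        buf(j:j) = text(i:i)
      end if
    end do
    res = buf(1:j)
  end function keep_alnum
end module alnum_filter
```

With this module, keep_alnum('[AS:1]') gives the three-character string AS1. Adding a blank to allowed would keep spaces instead of removing them.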

Haskell: How to carry a value from one function to another

I have a function 'one' that creates a string of length i,
fillWithEmpty :: Int -> String
fillWithEmpty i =
if i == 0 then "." else "." ++ fillWithEmpty(i - 1)
I then want the system to remember the length i so that it can replace a character in the string with 'S' at a position in the string of length i, given a value of a position needed to be replaced, e
replaceWithS :: String -> Int -> String
replaceWithS i e=
if i == e then "S" else "." ++ replaceWithS(i - 1)
Any help would be appreciated. Thanks
You can use explicit recursion to walk over the list. Each recursive call decrements the index, and when the index is 0 you emit an S instead of the character of the given string, so:
replaceWithS :: String -> Int -> String
replaceWithS "" _ = ""
replaceWithS (_:xs) 0 = … : …
replaceWithS (x:xs) i = … : replaceWithS … …
Here x is thus the head of the string (its first character), and xs is a list with the remaining characters. You here still need to fill in the … parts.

Writing a function that accepts any two numbers (any real or any integer)

I have a function that accepts two numbers and I don't care if they are integers or real or 32bits or 64bits. For the example below, I just write it as a simple multiplication. In Fortran 90 you could do this with an interface block, but you'd have to write 16 (!) functions if you wanted to cover all the possible interactions of multiplying two numbers, each of which could be int32, int64, real32, or real64.
With Fortran 2003 you have some other options like class(*) for polymorphism and I found one way to do this by simply converting all the inputs to reals, before multiplying:
! compiled on linux with gfortran 4.8.5
program main
integer, target :: i = 2
real(4), target :: x = 2.0
real(8), target :: y = 2.0
character, target :: c = 'a'
print *, multiply(i,x)
print *, multiply(x,i)
print *, multiply(i,i)
print *, multiply(y,y)
print *, multiply(c,c)
contains
function multiply(p,q)
real :: multiply
class(*) :: p, q
real :: r, s
r = 0.0 ; s = 0.0
select type(p)
type is (integer(4)) ; r = p
type is (integer(8)) ; r = p
type is (real(4)) ; r = p
type is (real(8)) ; r = p
class default ; print *, "p is not a real or int"
end select
select type(q)
type is (integer(4)) ; s = q
type is (integer(8)) ; s = q
type is (real(4)) ; s = q
type is (real(8)) ; s = q
class default ; print *, "q is not a real or int"
end select
multiply = r * s
end function multiply
end program main
This seems like an improvement. At least the amount of code here is linear in the number of types rather than quadratic, but I wonder if there is still a better way to do this? As you can see I still have to write the select type code twice, changing 'r' to 's' and 'p' to 'q'.
I tried to convert the select type blocks into a function but couldn't get that to work. But I am interested in any and all alternatives that can further improve on this. It seems like this would be a common problem but I so far haven't found any general approach that is better than this.
Edit to add: Apparently there are plans to improve Fortran w.r.t. this issue in the future, as noted in the comment by @SteveLionel. @roygvib further provides a link to a specific proposal which also does a nice job of explaining the issue: https://j3-fortran.org/doc/year/13/13-236.txt
Not a solution for generics, but for "converting the select type blocks into a function", the following code seems to work (which might be useful if some nontrivial conversion is included (?)).
program main
implicit none
integer :: i = 2
real*4 :: x = 2.0
real*8 :: y = 2.0
character(3) :: c = 'abc'
print *, multiply( i, x )
print *, multiply( x, i )
print *, multiply( i, i )
print *, multiply( y, y )
print *, multiply( c, c )
contains
function toreal( x ) result( y )
class(*) :: x
real :: y
select type( x )
type is (integer) ; y = x
type is (real(4)) ; y = x
type is (real(8)) ; y = x
type is (character(*)) ; y = len(x)
class default ; stop "no match for x"
endselect
end
function multiply( p, q ) result( ans )
class(*) :: p, q
real :: ans
ans = toreal( p ) * toreal( q )
end
end program
! gfortran-8 test.f90 && ./a.out
4.00000000
4.00000000
4.00000000
4.00000000
9.00000000
Another approach may be just converting the actual arguments to reals (although it may not be useful for more practical purposes...)
program main
implicit none
integer :: i = 2
real*4 :: x = 2.0
real*8 :: y = 2.0
character :: c = 'a'
print *, multiply( real(i), real(x) )
print *, multiply( real(x), real(i) )
print *, multiply( real(i), real(i) )
print *, multiply( real(y), real(y) )
! print *, multiply( real(c), real(c) ) ! error
contains
function multiply( p, q ) result( ans )
real :: p, q
real :: ans
ans = p * q
end
end program
Here's an alternate approach using a statically overloaded function via an interface block, as implicitly referred to in my question and @roygvib's answer. (I figured it makes sense to have this written explicitly, especially if someone can improve on it.)
Two advantages of the interface block method are:
It's approximately 3x faster (as @roygvib also found, although I don't know exactly how he wrote the function)
It only requires Fortran 90 (not Fortran 2003)
The main disadvantage is that you have to write the function multiple times. As noted in the question, in this example you'd have to write the multiplication function 16 times, to handle all combos of 32 & 64 bit reals and ints. It's not that terrible here, with the function being a single line of code, but you can easily see that this is more serious for many realistic use cases.
Below is the code I used to test the interface block method. To keep it relatively concise, I tested only the 4 permutations of 32 bit reals and ints. I re-used the main program to also test the @roygvib code. On my 2015 MacBook, it took about 16 seconds (interface block) vs 48 seconds (class(*) method).
Module:
module mult_mod
use, intrinsic :: iso_fortran_env, only: i4 => int32, r4 => real32
interface mult
module procedure mult_real4_real4
module procedure mult_int4_real4
module procedure mult_real4_int4
module procedure mult_int4_int4
end interface mult
contains
function mult_real4_real4( p, q ) result( ans )
real(r4) :: p, q
real(r4) :: ans
ans = p * q
end function mult_real4_real4
function mult_int4_real4( p, q ) result( ans )
integer(i4) :: p
real(r4) :: q
real(r4) :: ans
ans = p * q
end function mult_int4_real4
function mult_real4_int4( p, q ) result( ans )
real(r4) :: p
integer(i4) :: q
real(r4) :: ans
ans = p * q
end function mult_real4_int4
function mult_int4_int4( p, q ) result( ans )
integer(i4) :: p, q
real(r4) :: ans
ans = p * q
end function mult_int4_int4
end module mult_mod
Program:
program main
use mult_mod
integer(i4) :: i = 2
real(r4) :: x = 2.0
integer(i4) :: i_end = 1e9
real(r4) :: result
do j = 1, i_end
result = mult( x, x )
result = mult( x, i )
result = mult( i, x )
result = mult( i, i )
end do
end program main
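A possible middle ground, sketched here under assumptions rather than taken from any of the answers: overload only a conversion function, one specific per type, and write the arithmetic once in real64. That keeps static dispatch while the number of specifics grows linearly with the number of types. The names to_real_mod, to_r8 and multiply_r8 are hypothetical, and the approach assumes it is acceptable to do all arithmetic in real64 (large int64 values lose precision in that conversion).

```fortran
module to_real_mod
  use, intrinsic :: iso_fortran_env, only: int32, int64, real32, real64
  implicit none
  private
  public :: to_r8, multiply_r8

  ! One conversion per supported type: 4 specifics instead of 16
  interface to_r8
    module procedure i4_to_r8, i8_to_r8, r4_to_r8, r8_to_r8
  end interface to_r8

contains

  elemental function i4_to_r8(x) result(y)
    integer(int32), intent(in) :: x
    real(real64) :: y
    y = real(x, real64)
  end function i4_to_r8

  elemental function i8_to_r8(x) result(y)
    integer(int64), intent(in) :: x
    real(real64) :: y
    y = real(x, real64)
  end function i8_to_r8

  elemental function r4_to_r8(x) result(y)
    real(real32), intent(in) :: x
    real(real64) :: y
    y = real(x, real64)
  end function r4_to_r8

  elemental function r8_to_r8(x) result(y)
    real(real64), intent(in) :: x
    real(real64) :: y
    y = x
  end function r8_to_r8

  ! The actual work is written once, for real64 arguments only
  elemental function multiply_r8(p, q) result(ans)
    real(real64), intent(in) :: p, q
    real(real64) :: ans
    ans = p * q
  end function multiply_r8

end module to_real_mod
```

A call then looks like multiply_r8(to_r8(i), to_r8(x)). The cost is that the caller has to wrap each argument, so this trades some convenience at the call site for less duplicated code in the module.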

How to format to a string with minimum possible size

I have the following function:
function fname(proc, ct) result(filename)
implicit none
integer, intent(in) :: proc, ct
character(len=100) :: filename
write(filename,"(a,i9.9,a,i0,a)") "/step", ct, "-proc", proc, ".txt"
end function fname
In modern Fortran, is there an automatic way to have the resulting string to have the minimum possible size to fit all the formatted data? Notice the use of i0 format makes the size of the resulting string variable.
It is possible, but not directly in the write statement. If you wrap the integer to string conversion into a function:
function itoa(i) result(res)
character(:),allocatable :: res
integer,intent(in) :: i
character(range(i)+2) :: tmp
write(tmp,'(i0)') i
res = trim(tmp)
end function
you can then use an allocatable deferred-length string
character(:), allocatable :: filename
filename = "/step" // itoa99(ct) // "-proc" // itoa(proc) // ".txt"
Here itoa99 stands for a second version of itoa that writes with the i9.9 format instead of i0. You can adjust the function to take the integer format as a dummy argument instead of making more versions of it.
The other possibility is to have a large temporary string and trim it
character(100) :: tmp
character(:), allocatable :: filename
write(tmp,"(a,i9.9,a,i0,a)") "/step", ct, "-proc", proc, ".txt"
filename = trim(tmp)
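The suggestion of passing the integer format as a dummy argument could be sketched as follows. This is one possible variant, not the answer's own code: the module name itoa_mod, the optional fmt argument and the fixed 64-character scratch buffer are assumptions made for the example. A format that produces more than 64 characters would fail at run time.

```fortran
module itoa_mod
  implicit none
contains
  ! Convert an integer to a string of exactly the needed length.
  ! fmt is an optional edit descriptor such as '(i9.9)'; default is '(i0)'.
  function itoa(i, fmt) result(res)
    integer, intent(in) :: i
    character(len=*), intent(in), optional :: fmt
    character(len=:), allocatable :: res
    character(len=64) :: tmp   ! assumed large enough for the formatted value

    if (present(fmt)) then
      write(tmp, fmt) i
    else
      write(tmp, '(i0)') i
    end if
    ! Drop leading blanks (from fixed-width formats) and trailing padding
    res = trim(adjustl(tmp))
  end function itoa
end module itoa_mod
```

The filename from the question could then be built as "/step" // itoa(ct, '(i9.9)') // "-proc" // itoa(proc) // ".txt", with the result assigned to a character(:), allocatable variable.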

Fortran Class(*)

I have written a routine to convert a character string to an integer:
Integer :: j
Write (*,*) '# Call str_to_num ("12", j)'
Call str_to_num ("12", j)
Write (*,*) "j: ", j
I am using class(*) and getting the error
Program received signal 11 (SIGSEGV): Segmentation fault.
However, when I replace Class(*) with Integer, " Subroutine str_to_num" does get printed.
Furthermore, changing Intent(out) to Intent(inout) gets the routine to work.
Subroutine str_to_num(s, num, fmt, wrn)
Character (len=*), Intent (in) :: s
Character (len=*), Intent (in), Optional :: fmt
Class (*), Intent (out) :: num
Character (len=*), Intent (inout), Optional :: wrn
Integer :: ios
Character (len=65) :: frmt
Write (*,*) " Subroutine str_to_num"
Select Type (num)
Type Is (Integer)
Read (s, *, iostat=ios) num
Type Is (Real)
Read (s, *, iostat=ios) num
Type Is (Double Precision)
Read (s, *, iostat=ios) num
Type Is (Real(Real128))
Read (s, *, iostat=ios) num
End Select
End Subroutine
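Following the observation above that Intent(inout) gets the routine to work, a trimmed-down version might look like the sketch below. It is not a diagnosis of the segmentation fault. The module name str_conv and the ios output argument are additions for this example, the fmt and wrn arguments are left out, and the routine is placed in a module because a class(*) dummy argument requires an explicit interface. It also assumes default real and real64 are different kinds, which holds with default compiler settings.

```fortran
module str_conv
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Read a number from the string s into num, whose dynamic type
  ! selects the conversion. ios is 0 on success, nonzero otherwise.
  subroutine str_to_num(s, num, ios)
    character(len=*), intent(in) :: s
    class(*), intent(inout) :: num   ! inout, as reported to work in the question
    integer, intent(out) :: ios

    ios = 1                          ! stays nonzero if the type is not supported
    select type (num)
    type is (integer)
      read (s, *, iostat=ios) num
    type is (real)
      read (s, *, iostat=ios) num
    type is (real(real64))
      read (s, *, iostat=ios) num
    end select
  end subroutine str_to_num
end module str_conv
```

With use str_conv in the caller, Call str_to_num("12", j, ios) should set j to 12 and ios to 0, while passing a variable of an unsupported type leaves ios nonzero.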