Aligator: Pathway analysis fortran program crash with "Segmentation fault" - fortran
I try to run Aligator (http://x004.psycm.uwcm.ac.uk/~peter/) but it fails on my 64-bit Linux machine. When running aligator as stated in the document-file http://x004.psycm.uwcm.ac.uk/~peter/README_ALIGATOR.doc) I get this error:
./aligator < aligator.run > out.txt
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7FDCD5079577
#1 0x7FDCD5079B7E
#2 0x7FDCD457DCAF
#3 0x401DBF in MAIN__ at aligator.f:?
Segmentation fault
I have a feeling that this fortran-software was written for 32-bit system, and that's why I also had to compile the aligator.f like this:
f77 aligator.f -mcmodel=large -o aligator
The source-code of aligator.f:
CHARACTER*50 INFILE,OUTFILE
CHARACTER*8 PROCNAM(5),PXN
INTEGER JSNP(6000000,25),NSNP(6000000)
INTEGER ISNP(6000000)
INTEGER ILIS(100,2)
INTEGER NGS(1000000),IDSNP(1000000)
INTEGER IGS(1000000,25)
INTEGER ICH(20000),IGOC(20000)
INTEGER IDGE(30000),IPFC(35000,2)
INTEGER IDGO(20000),NGC(30000)
INTEGER JDGO(20000),MGC(30000)
INTEGER MPFC(20000),MCG(30000),NCG(30000)
INTEGER IGC(30000,1000),JSCO(20000)
INTEGER JGC(30000,1000),ISGSNP(200000)
INTEGER KSCO(50000,10000),MSCO(20000)
INTEGER NUMSIG(3,5)
INTEGER MUMSIG(3,5),ISLIM(200)
INTEGER ISCO(20000),MUM(20000),NUN(20000)
REAL PR(20000),CRIT(3),CUM(1000001),QR(20000)
REAL ZR(20000),YR(3,5),GM(20000),QZ(5),PMIN(5)
REAL DUM(1000001),QMIN(5)
REAL*8 QP(20000),CHX,RU(20000),QQ(20000)
REAL*8 XX,ST,YY
REAL*8 GETRAN
EXTERNAL GETRAN,QUIKSORT
OPEN(10,FILE='allsnp_gene_NCBI_0KB.txt')
C THIS FILE CONTAINS A LIST OF ALL SNPs ANNOTATED
C BY THE GENES THEY LIE WITHIN (NCBI ID NUMBERS)
OPEN(25,FILE='allsnp_gene_NCBI_0KB.list')
OPEN(11,FILE='gene_go_17_03_09.dat')
OPEN(12,FILE='pfc_17_03_09.dat')
READ(*,*) INFILE
READ(*,*) OUTFILE
OPEN(14,FILE=INFILE)
OPEN(15,FILE=OUTFILE)
READ(*,*) NSNPLIST
READ(*,*) DRIT
READ(*,*) NWIN
READ(*,*) NREP
READ(*,*) ST
NSG=5770877
C TOTAL NUMBER OF SNPS IN FILE allsnp_gene_NCBI_0KB.txt
NGO=754498
C TOTAL NUMBER OF GENE/GO CATEGORY ASSIGNMENTS
NPFC=28339
C TOTAL NUMBER OF GO CATEGORIES WITH NAMES/FUNCTION DATA
PROCNAM(1)='ALL CATS'
PROCNAM(2)='CELLULAR'
PROCNAM(3)='FUNCTION'
PROCNAM(4)='PROCESS '
PROCNAM(5)='PRO+FUNC'
CRIT(1)=0.05
CRIT(2)=0.01
CRIT(3)=0.001
NUM=0
DO 873 I=1,NPFC
READ(12,*) (IPFC(I,K),K=1,2)
873 CONTINUE
DO 871 I=1,1000000
NGS(I)=0
871 CONTINUE
DO 872 I=1,30000
MGC(I)=0
MCG(I)=0
872 CONTINUE
DO 604 I=10,99
READ(25,*) II,ILIS(I,1),ILIS(I,2)
604 CONTINUE
C READING IN START AND END POINTS ON THE COMPLETE LIST OF SNPS WHOSE
C RS NUMBERS START WITH 10, 11, .... 99
C TO MAKE IT FASTER TO LOCATE THE SNPS USED IN A PARTICULAR STUDY.
MMSP=0
DO 503 I=1,NSG
READ(10,*) ISNP(I),NSNP(I),(JSNP(I,K),K=1,NSNP(I))
503 CONTINUE
C READS IN COMPLETE LIST OF SNPS WITHIN GENES
MGE=0
MSP=0
DO 605 I=1,NSNPLIST
C THIS SECTION READS IN THE SNPS USED IN THE STUDY BEING ANALYSED
C CHECKS THEM AGAINST THE TOTAL LIST OF SNPS WITHIN GENES
C TO CREATE A LIST OF SNPS WITHIN GENES AND THE GENES WITHIN WHICH THEY LIE
READ(14,*) II,PP
RS=REAL(II)
RX=LOG10(RS)
IX=INT(RX)-1
RJ=REAL(IX)
IC=INT(RS/(10**RJ))
IUS=0
DO 696 J=ILIS(IC,1),ILIS(IC,2)
IF (II.EQ.ISNP(J)) THEN
IUS=J
GOTO 697
ENDIF
696 CONTINUE
697 CONTINUE
IF (IUS.EQ.0) THEN
GOTO 605
ENDIF
MSP=MSP+1
IDSNP(MSP)=ISNP(IUS)
NGS(MSP)=NSNP(IUS)
DO 698 J=1,NSNP(IUS)
JGX=JSNP(IUS,J)
JNE=0
DO 802 II=1,MGE
IF (JGX.EQ.IDGE(II)) THEN
JNE=II
ENDIF
802 CONTINUE
IF (JNE.EQ.0) THEN
MGE=MGE+1
IDGE(MGE)=JGX
JNE=MGE
ENDIF
IGS(MSP,J)=JNE
698 CONTINUE
IF (PP.LT.DRIT) THEN
NUM=NUM+1
ISGSNP(NUM)=MSP
ENDIF
C CREATES LIST OF SIGNIFICANT SNPS WITH P-val < SPECIFIED CRITERION
699 CONTINUE
605 CONTINUE
DUM(1)=0.0
DO 603 I=1,MSP
DUM(I+1)=DUM(I)+(1.0/FLOAT(MSP))
603 CONTINUE
DUM(MSP+1)=1.0
PRINT*,'SNPS READ'
PRINT*,'#SNPS = ',MSP
PRINT*,'#GENES = ',MGE
DO 502 I=1,NGO
READ(11,*) JGX,IID
INE=0
DO 842 II=1,MGE
IF (JGX.EQ.IDGE(II)) THEN
INE=II
GOTO 937
ENDIF
842 CONTINUE
937 CONTINUE
IF (INE.EQ.0) THEN
GOTO 502
ENDIF
JNE=0
DO 602 II=1,MMGO
IF (IID.EQ.JDGO(II)) THEN
JNE=II
ENDIF
602 CONTINUE
IF (JNE.EQ.0) THEN
MMGO=MMGO+1
C PRINT*,MMGO,IID
JDGO(MMGO)=IID
JNE=MMGO
ENDIF
INF=0
DO 722 J=1,MGC(INE)
IF (JNE.EQ.JGC(INE,J)) THEN
INF=J
ENDIF
722 CONTINUE
IF (INF.EQ.0) THEN
MGC(INE)=MGC(INE)+1
MCG(JNE)=MCG(JNE)+1
JGC(INE,MGC(INE))=JNE
ENDIF
C JGC CONTAINS ALL GO GATEGORY MEMBERSHIPS FOR EACH GENE
502 CONTINUE
MGO=0
DO 607 I=1,MMGO
C THIS SECTION REMOVES GO CATEGORIES WITH FEWER THAN 3 GENES
C OR MORE THAN 5000 (USER CAN EDIT AND RECOMPILE IF DESIRED)
IF (MCG(I).LT.3.OR.MCG(I).GT.5000) THEN
ICH(I)=0
GOTO 607
ENDIF
MGO=MGO+1
NCG(MGO)=MCG(I)
IDGO(MGO)=JDGO(I)
ICH(I)=MGO
IPF=0
DO 874 J=1,NPFC
IF (IDGO(MGO).EQ.IPFC(J,1)) THEN
IPF=J
GOTO 931
ENDIF
874 CONTINUE
931 CONTINUE
IF (IPF.EQ.0) THEN
PRINT*,'MISSING CATEGORY ',IDGO(MGO)
ELSE
MPFC(MGO)=IPFC(IPF,2)
ENDIF
607 CONTINUE
DO 609 I=1,MGE
NGC(I)=0
NGC(I)=0
DO 608 J=1,MGC(I)
JX=JGC(I,J)
NN=ICH(JX)
IF (NN.NE.0) THEN
NGC(I)=NGC(I)+1
IGC(I,NGC(I))=NN
ENDIF
608 CONTINUE
609 CONTINUE
PRINT*,'CATS READ'
PRINT*,'No.CATS = ',MGO
NMA=0
JMA=0
DO 226 I=1,MGE
IF (NGC(I).GE.NMA) THEN
NMA=NGC(I)
JMA=I
ENDIF
226 CONTINUE
PRINT*,'MAX #CATS PER GENE = ',NMA
PRINT*,'GENE ID ',IDGE(JMA)
NMA=0
JMA=0
DO 2 I=1,MSP
IF (NGS(I).GE.NMA) THEN
NMA=NGS(I)
JMA=I
ENDIF
2 CONTINUE
PRINT*,'NO. SNPS IN STUDY = ',NSNPLIST
PRINT*,'NO. SNPS IN GENES = ',MSP
PRINT*,'MAX #GENES PER SNP = ',NMA
PRINT*,'SNP ID ',IDSNP(JMA)
NMA=0
JMA=0
DO 3 I=1,MGO
IF (NGC(I).GE.NMA) THEN
NMA=NGC(I)
JMA=I
ENDIF
3 CONTINUE
DO 987 IY=1,20000
ISCO(IY)=0
JSCO(IY)=0
987 CONTINUE
DO 587 IY=1,3
DO 787 IZ=1,5
NUMSIG(IY,IZ)=0
787 CONTINUE
587 CONTINUE
JGE=0
DO 14 I=1,NUM
DO 198 J=1,20000
IGOC(J)=0
198 CONTINUE
KU=ISGSNP(I)
C GET THE ID OF EACH SIGNIFICANT SNP
DO 16 J=1,NGS(KU)
ICG=IGS(KU,J)
C WHAT GENES DOES IT LIE IN ?
IUS=0
DO 26 K=1,JGE
IF (ICG.EQ.JSCO(K)) THEN
IUS=K
ENDIF
26 CONTINUE
C IS THIS GENE ALREADY ON THE LIST ?
IF (IUS.EQ.0) THEN
JGE=JGE+1
JSCO(JGE)=ICG
C ADD GENE TO LIST
DO 18 KK=1,NGC(ICG)
K=IGC(ICG,KK)
IF (IGOC(K).EQ.0) THEN
ISCO(K)=ISCO(K)+1
IGOC(K)=1
ENDIF
C INCREASE THE COUNT OF SIGNIFICANT GENES FOR ALL GO CATEGORIES
C CONTAINING THIS GENE BY 1 - BUT ONLY IF THIS SNP HASN'T ALREADY
C CONTRIBUTED A GENE TO THIS CATEGORY. IF A SNP LIES
C IN SEVERAL OVERLAPPING GENES ALL IN THE SAME GO CATEGORY,
C THIS PREVENTS ALL THE GENES BEING COUNTED SEPARATELY.
18 CONTINUE
ENDIF
16 CONTINUE
14 CONTINUE
PRINT*,'P-VALUE CRITERION FOR DEFINING SIGNIFICANT SNPS = ',DRIT
PRINT*,'NUMBER OF GENES WITH SIGNIFICANT SNPS = ',JGE
PRINT*,'NUMBER OF SIMULATED REPLICATE GENE-LISTS = ',NWIN
PRINT*,'NUMBER OF BOOTSTRAP STUDIES = ',NREP
DO 979 IT=1,NWIN
PRINT*,'SIMULATED REPLICATE GENELIST',IT
KRE=1
DO 887 IY=1,10000
JSCO(IY)=0
GM(IY)=0
KSCO(IT,IY)=0
887 CONTINUE
KGE=0
666 CONTINUE
YY=GETRAN(XX,ST)
R=REAL(XX)
KU=0
DO 551 K=1,MSP
IF (R.GE.DUM(K).AND.R.LT.DUM(K+1)) THEN
KU=K
ENDIF
551 CONTINUE
C SELECT A RANDOM SNP
DO 298 J=1,20000
IGOC(J)=0
298 CONTINUE
DO 116 J=1,NGS(KU)
ICG=IGS(KU,J)
C WHAT GENES DOES THE SNP LIE IN ?
IUS=0
DO 216 K=1,KGE
IF (ICG.EQ.JSCO(K)) THEN
IUS=K
ENDIF
216 CONTINUE
C GENE ALREADY ON LIST ?
C PRINT*,II,NID-I,IDJ,IUS,ICG
IF (IUS.EQ.0) THEN
KGE=KGE+1
C PRINT*,IT,KGE
IF (KGE.GT.JGE) THEN
GOTO 994
ENDIF
C GENELIST REACHED THE SAME LENGTH AS THE ORIGINAL ?
JSCO(KGE)=ICG
DO 118 KK=1,NGC(ICG)
K=IGC(ICG,KK)
IF (IGOC(K).EQ.0) THEN
KSCO(IT,K)=KSCO(IT,K)+1
IGOC(K)=1
ENDIF
118 CONTINUE
C ADD 1 TO EACH GO CATEGORY CONTAINING THE GENE (SEE ABOVE)
ENDIF
116 CONTINUE
GOTO 666
994 CONTINUE
979 CONTINUE
DO 812 IX=1,5
PMIN(IX)=99.0
812 CONTINUE
NSCA=0
DO 22 I=1,MGO
NMP=MPFC(I)
IF (ISCO(I).LE.1) THEN
GOTO 22
ENDIF
C ONLY COUNT GO CATEGORIES WITH TWO OR MORE SIGNIFICANT GENES
PR(I)=0
ZR(I)=0
RU(I)=0.0d0
CUM(1)=0
DO 32 J=1,NWIN
IF (ISCO(I).LE.KSCO(J,I)) THEN
PR(I)=PR(I)+(1.0/FLOAT(NWIN))
ENDIF
C CATEGORY-SPECIFIC P-VALUE = NUMBER OF SIMULATED GENELISTS
C WHERE THE SCORE FOR THE CATEGORY (KSCO) IS AT LEAST AS HIGH
C AS THAT IN THE REAL DATA (ISCO)
CUM(J+1)=CUM(J)+(1.0/FLOAT(NWIN))
GM(I)=GM(I)+(FLOAT(KSCO(J,I))/FLOAT(NWIN))
C CALCULATING EXPECTED NUMBER OF SIGNIFICANT GENES IN THE CATEGORY
32 CONTINUE
CUM(NWIN+1)=1.0
IF (PR(I).LT.PMIN(1)) THEN
PMIN(1)=PR(I)
ENDIF
DO 813 J=1,3
IF (NMP.EQ.J.AND.PR(I).LE.PMIN(J+1)) THEN
PMIN(J+1)=PR(I)
ENDIF
813 CONTINUE
IF (NMP.GE.2.AND.PR(I).LE.PMIN(5)) THEN
PMIN(5)=PR(I)
ENDIF
C CALCULATING MINIMUM P-VALUES FOR CATEGORIES OF EACH FUNCTIONAL TYPE
DO 33 J=1,3
IF (PR(I).LE.CRIT(J)) THEN
NUMSIG(J,1)=NUMSIG(J,1)+1
NUMSIG(J,NMP+1)=NUMSIG(J,NMP+1)+1
IF (NMP.GE.2) THEN
NUMSIG(J,5)=NUMSIG(J,5)+1
ENDIF
ENDIF
C COUNTING NUMBER OF CATEGORIES REACHING A GIVEN CATEGORY-SPECIFIC P-VALUE
33 CONTINUE
QP(I)=DBLE(PR(I))
22 CONTINUE
NBOOT=NWIN
DO 716 IC=1,3
DO 816 ID=1,5
YR(IC,ID)=0
816 CONTINUE
716 CONTINUE
DO 819 I=1,5
QZ(I)=0
819 CONTINUE
DO 51 I=1,NREP
C PRINT*,I
C PERFORMING REPLICATE "STUDIES" TO GET SIGNIFICANCE OF NUMBERS OF
C SIGNIFICANTLY OVER-REPRESENTED CATEGORIES
YY=GETRAN(XX,ST)
R=REAL(XX)
IYS=0
DO 53 K=1,NWIN
IF (R.GE.CUM(K).AND.R.LT.CUM(K+1)) THEN
IYS=K
ENDIF
53 CONTINUE
C CHOOSE A REPLICATE GENE-LIST TO ACT AS THE "REAL DATA"
DO 54 K=1,MGO
MSCO(K)=KSCO(IYS,K)
QR(K)=0
54 CONTINUE
DO 58 K=1,3
DO 589 KK=1,5
MUMSIG(K,KK)=0
589 CONTINUE
58 CONTINUE
DO 52 J=1,NBOOT
528 CONTINUE
YY=GETRAN(XX,ST)
R=REAL(XX)
JYS=0
DO 55 K=1,NWIN
IF (R.GE.CUM(K).AND.R.LT.CUM(K+1)) THEN
JYS=K
ENDIF
55 CONTINUE
IF (JYS.EQ.IYS) THEN
GOTO 528
ENDIF
C SELECT REPLICATE GENE LISTS (WITH REPLACEMENT)
DO 56 K=1,MGO
IF (MSCO(K).LE.KSCO(JYS,K)) THEN
QR(K)=QR(K)+(1.0/FLOAT(NBOOT))
ENDIF
C OBTAIN CATEGORY-SPECIFIC P-VALUES FOR THE "REAL DATA"
56 CONTINUE
52 CONTINUE
DO 814 K=1,5
QMIN(K)=99.0
814 CONTINUE
DO 57 K=1,MGO
NMP=MPFC(K)
IF (MSCO(K).LE.1) THEN
GOTO 57
ENDIF
JGG=IDGO(K)
IF (QR(K).LT.QMIN(1)) THEN
QMIN(1)=QR(K)
ENDIF
DO 815 JX=1,3
IF (NMP.EQ.JX.AND.QR(K).LE.QMIN(JX+1)) THEN
QMIN(JX+1)=QR(K)
ENDIF
815 CONTINUE
IF (NMP.GE.2.AND.QR(K).LE.QMIN(5)) THEN
QMIN(5)=QR(K)
ENDIF
C OBTAIN MINIMUM P-VALUES FOR CATEGORIES OF VARIOUS FUNCTIONAL TYPES
DO 63 JJ=1,3
IF (QR(K).LE.CRIT(JJ)) THEN
MUMSIG(JJ,1)=MUMSIG(JJ,1)+1
MUMSIG(JJ,NMP+1)=MUMSIG(JJ,NMP+1)+1
IF (NMP.GE.2) THEN
MUMSIG(JJ,5)=MUMSIG(JJ,5)+1
ENDIF
ENDIF
C OBTAIN NUMBERS OF CATEGORIES WITH CATEGORY-SPECIFIC P-VALUES LESS THAN
C VARIOUS CRITERIA
63 CONTINUE
57 CONTINUE
DO 67 K=1,MGO
IF (ISCO(K).LE.1) THEN
GOTO 67
ENDIF
IF (PR(K).GE.QMIN(1)) THEN
ZR(K)=ZR(K)+(1.0/FLOAT(NREP))
ENDIF
C PROBABILITY THAT ANY CATEGORY-SPECIFIC P-VALUE IN A STUDY WHERE THERE
C IS NO FUNCTIONAL ASSOCIATION (i.e. RANDOMLY-SELECTED "REAL DATA") IS
C LESS THAN OR EQUAL TO THE P-VALUE OF A GIVEN CATEGORY IN THE ORIGINAL
C DATASET (i.e. A STUDY-WIDE P-VALUE CORRECTED FOR TESTING MULTIPLE NON-
C INDEPENDENT CATEGORIES)
DO 77 KK=1,MGO
IF (MSCO(KK).LE.1) THEN
GOTO 77
ENDIF
IF (PR(K).GE.QR(KK)) THEN
RU(K)=RU(K)+(1.0d0/DFLOAT(NREP))
ENDIF
C EXPECTED NUMBER OF CATEGORIES PER STUDY WITH CATEGORY-SPECIFIC P-vALUES
C LESS THAN OR EQUAL TO THAT OF A GIVEN CATEGORY IN THE ORIGINAL DATASET
77 CONTINUE
67 CONTINUE
DO 817 JJ=1,5
IF (QMIN(JJ).LE.PMIN(JJ)) THEN
QZ(JJ)=QZ(JJ)+(1.0/FLOAT(NREP))
ENDIF
C PROBABILITY THAT THE MINIMUM CATEGORY-SPECIFIC P-VALUE IN A STUDY WHERE THERE
C IS NO FUNCTIONAL ASSOCIATION (i.e. RANDOMLY-SELECTED "REAL DATA") IS
C LESS THAN OR EQUAL TO THE MINIMUM CATEGORY-SPECIFIC P-VALUE IN THE ORIGINAL
C DATA
817 CONTINUE
DO 64 JJ=1,3
DO 642 JH=1,5
IF (MUMSIG(JJ,JH).GE.NUMSIG(JJ,JH)) THEN
YR(JJ,JH)=YR(JJ,JH)+(1.0/FLOAT(NREP))
ENDIF
C PROBABILITY THAN THE NUMBER OF CATEGORIES REACHING SIGNIFICANCE IN A STUDY
C WHERE THERE IS NO FUNCTIONAL ASSOCIATION (i.e. RANDOMLY-SELECTED "REAL DATA")
C IS LESS THAN OR EQUAL TO THAT IN THE ORIGINAL DATA (i.e. TESTING FOR A
C SIGNIFICANTLY LARGE NUMBER OF OVER-REPRESENTED CATEGORIES)
642 CONTINUE
64 CONTINUE
51 CONTINUE
DO 712 IT=1,MGO
IF (ISCO(IT).EQ.0) THEN
QP(IT)=1d0+DBLE(GM(IT))
ENDIF
712 CONTINUE
X=QUIKSORT(QP,MGO,MUM)
DO 23 I=1,MGO
J=MUM(I)
NMP=MPFC(J)
PXN=PROCNAM(NMP+1)
JG=IDGO(J)
JH=ISCO(J)
IF (ISCO(J).LE.1) THEN
GOTO 23
ENDIF
C ONLY COUNT GO CATEGORIES WITH TWO OR MORE SIGNIFICANT GENES
WRITE(15,44) JG,PXN,JH,JGE,NCG(J),MGE,GM(J),PR(J),ZR(J),RU(J)
23 CONTINUE
WRITE(15,*)
WRITE(15,*)
DO 691 ISS=1,5
WRITE(15,*) PROCNAM(ISS)
WRITE(15,*)
WRITE(15,*) 'MIN P =',PMIN(ISS),' SIG = ',QZ(ISS)
WRITE(15,*)
DO 69 IS=1,3
WRITE(15,*) 'NO. OF P<',CRIT(IS),' =',NUMSIG(IS,ISS)
WRITE(15,*) 'SIG = ',YR(IS,ISS)
69 CONTINUE
WRITE(15,*)
691 CONTINUE
CLOSE(15)
44 FORMAT(I8,2X,A8,3(2X,I5),2X,I5,2X,F7.2,2X,F7.5,2X,F6.4,2X,F9.2)
STOP
END
REAL*8 FUNCTION GETRAN(X,S)
REAL*8 X,S,A,B,SN,SS,RN
INTEGER IS
A=7D0**5
B=(2D0**31)-1
SN=S*A
IS=INT(SN/B)
SS=DFLOAT(IS)
C PRINT*,SN,SS,IS
RN=SN-(SS*B)
S=RN
X=S/B
GETRAN=1D0
RETURN
END
REAL FUNCTION QUIKSORT(RLOD,NLO,NUM)
REAL*8 RLOD(20000)
REAL*8 SORT1(20000),SORT2(20000),SORT3(20000)
INTEGER NUM(20000),NUM1(20000),NUM2(20000)
DO 5 I=1,NLO
SORT1(I)=-1000.0D0
NUM1(I)=0
5 CONTINUE
DO 10 I=1,NLO
XLO=RLOD(I)
C PRINT*,XLO
DO 20 J=NLO+1-I,NLO
IF (XLO.GE.SORT1(J)) THEN
C PRINT*,XLO,SORT1(J),J
IPOS=J
ENDIF
20 CONTINUE
C PRINT*,IPOS
SORT2(IPOS)=XLO
NUM2(IPOS)=I
DO 30 K=2,IPOS
SORT2(K-1)=SORT1(K)
NUM2(K-1)=NUM1(K)
30 CONTINUE
DO 40 K=IPOS+1,NLO
SORT2(K)=SORT1(K)
NUM2(K)=NUM1(K)
40 CONTINUE
DO 50 L=1,NLO
NUM1(L)=NUM2(L)
SORT1(L)=SORT2(L)
C PRINT*,L,SORT1(L)
50 CONTINUE
10 CONTINUE
DO 60 I=1,NLO
NUM(I)=NUM1(I)
60 CONTINUE
QUIKSORT=1.0
RETURN
END
Because I am not a fortran-coder, I wonder if this code can be changed to fully support a 64-bit system?
EDIT:
I tried compiling like this:
gfortran aligator.f -o aligator -g -fcheck=all -fbacktrace
with those errors:
/tmp/ccbJsCPS.o: In function `MAIN__':
/home/alexsson/Programs/Aligator/aligator.f:354:(.text+0x628a): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:355:(.text+0x62ff): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:356:(.text+0x6374): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:357:(.text+0x6380): relocation truncated to fit: R_X86_64_PC32 against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:360:(.text+0x65c5): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:360:(.text+0x65eb): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:365:(.text+0x66c3): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:365:(.text+0x66e9): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:369:(.text+0x697d): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:370:(.text+0x69ee): relocation truncated to fit: R_X86_64_32S against `.bss'
/home/alexsson/Programs/Aligator/aligator.f:371:(.text+0x6a6a): additional relocation overflows omitted from the output
collect2: error: ld returned 1 exit status
Im not sure if I get any smarter...
Solution:
Thanks to tkoenig, the solution is to change the source-code of aligator.f to this: http://pastebin.com/tnYhuba3
It works on my Slackware 64-bit current (multilib enabled) - system with 16 GB RAM.
You have some really large arrays in there which cause problems with a 32-bit addressing mode. If you make the arrays allocatable and allocate them at the start of the program, as with with
CHARACTER*50 INFILE,OUTFILE
CHARACTER*8 PROCNAM(5),PXN
integer, dimension(:,:), allocatable :: jsnp
integer, dimension(:), allocatable :: nsnp, isnp
INTEGER ILIS(100,2)
INTEGER NGS(1000000),IDSNP(1000000)
INTEGER IGS(1000000,25)
INTEGER ICH(20000),IGOC(20000)
INTEGER IDGE(30000),IPFC(35000,2)
INTEGER IDGO(20000),NGC(30000)
INTEGER JDGO(20000),MGC(30000)
INTEGER MPFC(20000),MCG(30000),NCG(30000)
integer, dimension(:,:), allocatable :: igc, jgc, ksco
INTEGER JSCO(20000)
INTEGER ISGSNP(200000)
INTEGER MSCO(20000)
INTEGER NUMSIG(3,5)
INTEGER MUMSIG(3,5),ISLIM(200)
INTEGER ISCO(20000),MUM(20000),NUN(20000)
REAL PR(20000),CRIT(3),CUM(1000001),QR(20000)
REAL ZR(20000),YR(3,5),GM(20000),QZ(5),PMIN(5)
REAL DUM(1000001),QMIN(5)
REAL*8 QP(20000),CHX,RU(20000),QQ(20000)
REAL*8 XX,ST,YY
REAL*8 GETRAN
EXTERNAL GETRAN,QUIKSORT
allocate(JSNP(6000000,25))
allocate(NSNP(6000000), ISNP(6000000))
allocate(IGC(30000,1000),JGC(30000,1000),KSCO(50000,10000))
this should no longer be a concern.
I cannot check this because of lack of memory.
Depending on the range of the values you need to use for the INTEGER, it might also be interesting to change the default integer to a two-byte or one-byte kind, as in
integer, parameter :: i2 = selected_int_kind(4) ! select range of at least 10000, usually translates into two bytes
integer(kind=i2), dimension(:,:), allocatable :: jsnp
or
integer, parameter :: i1 = selected_int_kind(2) ! one-byte integer
This can lead to significant savings in memory and speed.
Related
Why does SAS skip an entire row of data values due to missing value?
When I run the following code the third observation is not output. Why does SAS omit the third observation? data info; input Gender $ Age Height Weight; datalines; M 45 72 149 F 64 62 M 61 72 271 F 29 73 125 M 16 65 178 ; Run; title "Listing of Dataset Demographics"; proc print data=info; run;
Defaults will get you, the default in SAS is FLOWOVER, so if a record is missing it looks for it on the next line. You want MISSOVER or TRUNCOVER instead. Your log tells you this happened with the following note: NOTE: SAS went to a new line when INPUT statement reached past the end of a line. This works: data info; infile cards truncover; input Gender $ Age Height Weight; datalines; M 45 72 149 F 64 62 M 61 72 271 F 29 73 125 M 16 65 178 ; Run; More details are available in the Example 2 in the documentation here. Specifically: When you omit the MISSOVER option or use FLOWOVER (which is the default), SAS moves the input pointer to line 2 and reads values for TEMP4 and TEMP5 (variables it cannot find). The next time the DATA step executes, SAS reads a new line which, in this case, is line 3. This message appears in the SAS log: NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
Lines of text do not have "observations". They just have lines. It didn't skip any of the lines of data. It just used two lines for the second observation because the first of the lines only had values for 3 of the 4 variables the INPUT statement requested. This behavior is what SAS calls the flowover option of the INFILE statement. This allows you to have more than one line of text to represent the data for a single observation without having to be too persnickety about which fields you insert the line breaks between across the different observations of data. If you don't want it to have to go hunt for the next field on the next line of text then make sure every variable has a value in the text lines. You can represent missing values by using a period for either numeric or character variables. So use something like this: data info; input Gender $ Age Height Weight; datalines; M 45 72 149 F 64 62 . M 61 72 271 . 29 73 125 M 16 65 178 ; When using flowover you can insert as many extra line breaks as you want as long as each new observation starts on a new line. Like this data info; input Gender $ Age Height Weight; datalines; M 45 72 149 F 64 62 . M 61 72 271 F 29 73 125 M 16 65 178 ; If you want SAS to just give up when a there are no more values on the line use the flowover option on the infile statement. data info; infile datalines flowover; input Gender $ Age Height Weight; datalines; M 45 72 149 F 64 62 M 61 72 271 F 29 73 125 M 16 65 178 ; There is also the older missover option, but you would normally never want that as it will set values at the end of the line that too short for an explicit INFORMAT width to missing instead of just use the number of characters that are available. PS Don't indent lines of data. That will just make the code harder to read and the diagnostic messages about invalid data values harder to interpret. To make it easier don't intend the DATALINES (aka CARDS) statement line either. That will also make it clearer the data step definition ends where the lines of data starts and prevent you from accidentally inserting other statements for the data step after the data.
How to determine the number of filled drums, and the room left in each drum
Not quite a homework problem, but it may as well be: You have a long list of positive integer values stored in column A. These are packets in unit U. A Drum can fit up to 500 U, but you cannot break up packets. How many drums are required for any given list of values in column A? This does not have to be the most efficient answer, processing in row order is absolutely fine. I Think you should be able to solve this with a formula, but the closest I got was =CEILING(SUM(A1:A1000)/500;1) Of course, this breaks up packets. Additionally, this problem requires me to be able to find the room left in each drum used, but emphasis for this question should remain on just the number required.
This cannot be done with a single simple formula. Each drum and packet needs to be counted. However contrary to my comment, for this particular problem a spreadsheet works well, and there is no need for a macro. First, set B2 to 500 for use in other formulas. If column A is not yet filled, use the formula =RANDBETWEEN(1,B$2) to add some values. Column C is the main formula that determines how full each drum is. Set C2 to =A2. C3 is =IF(C2+A3>B$2,A3,C2+A3). Fill C3 down to fill the remaining rows. For column D, use =IF(C2+A3>B$2,B$2-C2,""). However the last row of column D is shorter: =B$2-C21 and change 21 to whatever the last row is. Finally in column E we find the answer, which is simply =COUNT(D2:D21). Packets Drum Size How Full Room left in each drum used Number of filled drums ------- --------- -------- --------------------------- ---------------------- 206 500 206 294 13 309 309 68 377 84 461 39 305 305 195 387 387 113 118 118 8 126 374 479 479 21 492 492 8 120 120 291 411 89 262 262 108 370 130 440 440 60 88 88 100 188 102 290 210 478 478 22 87 87 413 For OpenOffice Calc, use semicolons ; instead of commas , in formulas.
gruobi: used model.write but cannot find the file
I am using Gurobi with C++ and want to save the Lp as a file.lp. Therefore, I used model.write("model.lp"); model.optimize(); This is my output and nor error occurs: Optimize a model with 105 rows, 58 columns and 186 nonzeros Coefficient statistics: Matrix range [1e+00, 1e+00] Objective range [1e+00, 1e+00] Bounds range [0e+00, 0e+00] RHS range [1e+00, 6e+00] Presolve removed 105 rows and 58 columns Presolve time: 0.00s Presolve: All rows and columns removed Iteration Objective Primal Inf. Dual Inf. Time 0 -0.0000000e+00 0.000000e+00 0.000000e+00 0s Solved in 0 iterations and 0.00 seconds Optimal objective -0.000000000e+00 obj: -0 Status2 Process finished with exit code 0 So there is probably a mistake in my LP, since the optimal solution should not be 0. This is why I want to have a look at the model.lp file. However, I cannot find it. I searched my whole computer. Am I missing anything?
Reading a formatted data file in Fortran 90
I have a file to read whose structure is: 26.0 24773 -55.4 -86.8 1 0.01 60 27 617.7 617.8 617. 24.2 25230 -55.7 -86.7 1 0.01 64 22 629.7 629.8 629. Station information and sounding indices I'm interested in reading only the number. I used the following format: read(333,10) a(i),b(i)... 10 format(1x,f6.1,2x,i5,... read(333,iostat=stat) Station if(stat/=0)then else endif if(stat==0)exit from this the error appears: forrtl: severe (64): input conversion error, unit 333, file
Pandas quantile failing with NaN's present
I've encountered an interesting situation while calculating the inter-quartile range. Assuming we have a dataframe such as: import pandas as pd index=pd.date_range('2014 01 01',periods=10,freq='D') data=pd.np.random.randint(0,100,(10,5)) data = pd.DataFrame(index=index,data=data) data Out[90]: 0 1 2 3 4 2014-01-01 33 31 82 3 26 2014-01-02 46 59 0 34 48 2014-01-03 71 2 56 67 54 2014-01-04 90 18 71 12 2 2014-01-05 71 53 5 56 65 2014-01-06 42 78 34 54 40 2014-01-07 80 5 76 12 90 2014-01-08 60 90 84 55 78 2014-01-09 33 11 66 90 8 2014-01-10 40 8 35 36 98 # test for q1 values (this works) data.quantile(0.25) Out[111]: 0 40.50 1 8.75 2 34.25 3 17.50 4 29.50 # break it by inserting row of nans data.iloc[-1] = pd.np.NaN data.quantile(0.25) Out[115]: 0 42 1 11 2 34 3 12 4 26 The first quartile can be calculated by taking the median of values in the dataframe that fall below the overall median, so we can see what data.quantile(0.25) should have yielded. e.g. med = data.median() q1 = data[data<med].median() q1 Out[119]: 0 37.5 1 8.0 2 19.5 3 12.0 4 17.0 It seems that quantile is failing to provide an appropriate representation of q1 etc. since it is not doing a good job of handling the NaN values (i.e. it works without NaNs, but not with NaNs). I thought this may not be a "NaN" issue, rather it might be quantile failing to handle even-numbered data sets (i.e. where the median must be calculated as the mean of the two central numbers). However, after testing with dataframes with both even and odd-numbers of rows I saw that quantile handled these situations properly. The problem seems to arise only when NaN values are present in the dataframe. I would like to use quntile to calculate the rolling q1/q3 values in my dataframe, however, this will not work with NaN's present. Can anyone provide a solution to this issue?
Internally, quantile uses numpy.percentile over the non-null values. When you change the last row of data to NaNs you're essentially left with an array array([ 33., 46., 71., 90., 71., 42., 80., 60., 33.]) in the first column Calculating np.percentile(array([ 33., 46., 71., 90., 71., 42., 80., 60., 33.]) gives 42. From the docstring: Given a vector V of length N, the qth percentile of V is the qth ranked value in a sorted copy of V. A weighted average of the two nearest neighbors is used if the normalized ranking does not match q exactly. The same as the median if q=50, the same as the minimum if q=0 and the same as the maximum if q=100.