I am trying to use gprof and the legend reads for the calls column
calls the number of times this function was invoked, if
this function is profiled, else blank.
I have some functions for which that field is missing? What does it mean? I didn't give any special options.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
68.77 9.38 9.38 void BilinearForm::int3d...
27.71 13.16 3.78 void BilinearForm::int2d...
1.54 13.37 0.21 BilinearForm::finalize()
0.73 13.47 0.10 11275600 0.00 0.00 frame_dummy
...
This is how the head looks like.
I have a cluster of consumers (50 or so instance) consuming from kafka partitions.
I notice that there is this one server that is consistently slow. Its cpu usage is always around 80-100%. While the other partitions is around 50%.
Originally I thought there is a slight chance that this is traffic dependent, so I manually switch the partitions that the slow loader is consuming.
However I did not observe an increase in processing speed.
I also don't see cpu steal from iostat, but since all consumer is running the same code I suspect there is some bottle neck in the hardware.
Unfortunately, I can't just replace the server unless I can provide conclusive proof that the hardware is the problem.
So I want to write a load testing script that pin point the bottle neck.
My plan is to write a while loop in python that does n computations, and find out what is the max computation that the slow consumer can do and what is the max computation that the fast consumer can do.
What other testing strategy can I do?
Perhaps I should test disk bottle neck by having my python script write to txt file?
Here is fast consumer iostat
avg-cpu: %user %nice %system %iowait %steal %idle
50.01 0.00 3.96 0.13 0.12 45.77
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 1.06 0.16 11.46 422953 30331733
xvdb 377.63 0.01 46937.99 35897 124281808572
xvdc 373.43 0.01 46648.25 26603 123514631628
md0 762.53 0.01 93586.24 22235 247796440032
Here is slow consumer iostat
avg-cpu: %user %nice %system %iowait %steal %idle
81.58 0.00 5.28 0.11 0.06 12.98
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 1.02 0.40 13.74 371145 12685265
xvdb 332.85 0.02 40775.06 18229 37636091096
xvdc 327.42 0.01 40514.44 10899 37395540132
md0 676.47 0.01 81289.50 11287 75031631060
I have FORTRAN 77 code from an engineering textbook that I would like to make use of. The problem is that I am unable to understand how I input the data into the arrays that are called namely: FDAM1(61),FDAM2(61),FPOW1(61),FPOW2(61),UDAM(61) and UPOW(61).
For your reference the code has been taken from Page 49 of this book: https://books.google.pt/books?id=i2hyniQpecYC&lpg=PR6&dq=optimal%20design%20siddall&pg=PA49#v=onepage&q=optimal%20design%20siddall&f=false
C PROGRAM TST (INPUT,OUTPUT,TAPE5=INPUT,TAPE6=OUTPUT)
C
C PROGRAM TO ESTIMATE MAXIMUM EXPECTED VALUE FOR ALTERNATE DESIGNS
C
C FDENS(I)= ARRAYS FOR DATA DEFINING DENSITY FUNCTIONS
C FDAM1(I)= ARRAY DEFINING DENSITY FUNCTION FOR DAMAGE IN DESIGN 1
C DFAM2(I)= ARRAY DEFINING DENSITY FUNCTION FOR DAMAGE IN DESIGN 2
C FPOW1(I)= ARRAY DEFINING DENSITY FUNCTION FOR POWER IN DESIGN 1
C FPOW2(I)= ARRAY DEFINING DENSITY FUNCTION FOR POWER IN DESIGN 2
C UDAM(I)= VALUE CURVE FOR DAMAGE
C UPOW(I)= VALUE CURVE FOR POWER
C
DIMENSION FDENS(61),FDAM1(61),FDAM2(61),FPOW1(61),FPOW2(61),
1UDAM(61),UPOW(61),FUNC(61)
C
C NORMALIZE DENSITY FUNCTIONS
C
DO 1 I=1,4
READ(5,10)(FDENS(J),J=1,61)
READ(5,11)RANGE
AREA=FSIMP(FDENS,RANGE,61)
DO 2 J=1,61
GO TO(3,4,5,6)I
3 FDAM1(J)=FDENS(J)/AREA
GO TO 2
4 FDAM2(J)=FDENS(J)/AREA
GO TO 2
5 FPOW1(J)=FDENS(J)/AREA
GO TO 2
6 FPOW2(J)=FDENS(J)/AREA
2 CONTINUE
1 CONTINUE
C
C DETERMINE EXPECTED VALUES
C
READ(5,10)(UDAM(J),J=1,61)
READ(5,10)(UPOW(J),J=1,61)
DO 20 I=1,6
GO TO (30,31,32,33,34,35)I
30 DO 40 J=1,61
40 FUNC(J)=FDAM1(J)*UDAM(J)
RANGE=12.
E1=FSIMP(FUNC,RANGE,61)
GO TO 20
31 DO 41 J=1,61
41 FUNC(J)=FDAM2(J)*UDAM(J)
C
RANGE=12.
E2=FSIMP(FUNC,RANGE,61)
GO TO 20
32 DO 42 J=1,61
RANGE=60.
42 FUNC(J)=FPOW1(J)*UPOW(J)
E3=FSIMP(FUNC,RANGE,61)
33 DO 43 J=1,61
43 FUNC(J)=FPOW2(J)*UPOW(J)
RANGE=60.
E4=FSIMP(FUNC,RANGE,61)
GO TO 20
34 E5=8.17
GO TO 20
35 E6=2.20
20 CONTINUE
DES1=E1+E3+E5
DES2=E2+E4+E6
C
C OUTPUT
C
WRITE(6,100)
100 FORMAT(/,1H ,15X,24HEXPECTED VALUES OF VALUE,//)
WRITE(6,101)
101 FORMAT(/,1H ,12X,6HDAMAGE,7X,5HPOWER,9X,5HPARTS,8X,5HTOTAL,//)
WRITE(6,102)E1,E3,E5,DES1
102 FORMAT(/,1H ,8HDESIGN 1,4X,F5.3,8X,F5.3,9X,F5.3,8X,F6.3)
WRITE(6,103)E2,E4,E6,DES2
103 FORMAT(/,1H ,8HDESIGN 2,4X,F5.3,8X,F5.3,9X,F5.3,8X,F6.3)
10 FORMAT(16F5.2)
11 FORMAT(F5.0)
STOP
END
SUBROUTINE FSIMP
FUNCTION FSIMP(FUNC,RANGE,MINT)
C.... CALCULATES INTEGRAL BY SIMPSONS RULE WITH
C MODIFICATION IF MINT IS EVEN
C.... INPUT
C FUNC = ARRAY OF EQUALLY SPACED VALUES OF FUNCTION
C DIMENSION MINT
C RANGE = RANGE OF INTEGRATION
C MINT = NUMBER OF STATIONS
C.... OUTPUT
C FSIMP = AREA
DIMENSION FUNC(1)
C.... CHECK MINT FOR ODD OR EVEN
XX=RANGE/(3.*FLOAT(MINT-1))
M=MINT/2*2
IF(M.EQ.MINT) GO TO 3
C.... ODD
AREA=FUNC(1)+FUNC(M)
MM=MINT-1
DO 1 I=2,MM,2
1 AREA=AREA+4.*FUNC(I)
MM=MM-1
DO 2 I=3,MM,2
2 AREA=AREA+2.*FUNC(I)
FSIMP=XX*AREA
RETURN
C.... EVEN
C.... USE SIMPSONS RULE FOR ALL BUT THE LAST 3 INTERVALS
3 M=MINT-3
AREA=FUNC(1)+FUNC(M)
MM=M-1
DO 4 I=2,MM,2
4 AREA=AREA+4.*FUNC(I)
MM=MM-1
DO 5 I=3,MM,2
5 AREA=AREA+2.*FUNC(I)
FSIMP=XX*AREA
C.... USE NEWTONS 3/3 RULE FOR LAST THREE INTERVALS
FSIMP=FSIMP+9./3.*XX*(FUNC(MINT-3)+3.*(FUNC(MINT-2)+FUNC(MINT-1))
1 +FUNC(MINT))
RETURN
END
Here is a minimal example to help you get started:
C Minimal working example of creaky old FORTRAN I/O
PROGRAM ABYSS
IMPLICIT NONE
C
REAL FDENS(61)
REAL XRANGE
INTEGER J
C
10 FORMAT(16F5.2)
11 FORMAT(F5.0)
909 FORMAT(/, 'BEHOLD! A DENSITY DISTRIBUTION',/)
910 FORMAT(10(F5.2, 3X),/)
911 FORMAT(/, 'XRANGE is ', F6.1)
C
CONTINUE
C
READ(5,10) (FDENS(J), J=1,61)
READ(5,11) XRANGE
C
WRITE(6,909)
WRITE(6,910) (FDENS(J), J=1,61)
WRITE(6,911) XRANGE
C
STOP
END
Apologies for writing this in F77; I'm sticking with the style of the code posted above for the sake of this example. Ideally, you'd use a F03 or F08 for new code or a completely different language which actually has decent I/O features and a rich standard library. But I digress.
This code will operate on the data (be careful to preserve the spaces):
0.1 0.3 0.5 0.9 1.30 1.90 2.50 3.20 3.80 4.20
4.70 5.0 5.1 5.2 5.2 5.1 4.9 4.7 4.6 4.4 4.2 3.9 3.8 3.6 3.4 3.2
3.0 2.9 2.7 2.5 2.4 2.2 2.1 1.9 1.8 1.6 1.5 1.4 1.2 1.1 1.0 0.9
0.8 0.7 0.6 0.5 0.4 0.3 0.3 0.2 0.1 0.1
12.
to produce
BEHOLD! A DENSITY DISTRIBUTION
0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.30 0.50 0.90
1.30 1.90 2.50 3.20 3.80 4.20 4.70 5.00 5.10 5.20
5.20 5.10 4.90 4.70 4.60 4.40 4.20 3.90 3.80 3.60
3.40 3.20 3.00 2.90 2.70 2.50 2.40 2.20 2.10 1.90
1.80 1.60 1.50 1.40 1.20 1.10 1.00 0.90 0.80 0.70
0.60 0.50 0.40 0.30 0.30 0.20 0.10 0.10 0.00 0.00
0.00
XRANGE is 12.0
If the code is in abyss.f, the input data is in abyss.dat, you should be able to build the code with
gfortran -g -Wall -Og -o abyss abyss.f
and generate similar results by running
abyss < abyss.dat > abyss.out
A key point to note is that the original code is reading from unit 5 (traditionally taken as stdin, now officially canonized in iso_fortran_env as INPUT_UNIT). In your own code, I'd suggest reading from a data file, so replace the literal 5 with whatever variable contains the unit number of the file you're reading from (hint: consider using the newunit argument to the open command introduced in Fortran 2008. It solves the perennially stupid Fortran problem of trying to find a free I/O unit number.) While you can use I/O redirection, it's suboptimal; it's used here to show how to work around the limitations of the original code.
Also, for the sake of later generations and your own sanity, please avoid taking advantage of Cold-War-era FORTRAN misfeatures such as this spaces-equal-zeroes nonsense. If your data is worth using, it's worth putting in a sensible format which can be easily parsed; columnar, space-delimited values are as good a choice as any. Fortran may actually get a standard library which can read and write CSV files sometime around 2156 (give or take a century) so you have plenty of time to design something decent...
I read the manual about profiling (ocamlprof): http://caml.inria.fr/pub/docs/manual-ocaml-4.01/profil.html
I have a hard time to use it. The way I tried to do an example with gprof is:
For example I have a file name: ex.ml
I run: sudo ocamlopt -p ex.ml -o ex
then I use: gprof ex > profile.txt
It shows me a bunch of information but the column related to time is all 0
For instance (this taken from my real function):
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 415 0.00 0.00 caml_page_table_modify
0.00 0.00 0.00 57 0.00 0.00 caml_get_exception_backtrace
I don't understand why at the column time all functions return 0.00.
In the link above there is a file ocamlprof.dump, I don't know how to write a command generate it. How can I generate ocamlprof.dump. How can I know the locate of a name for example :caml_page_table_modify ?
Thank you very much for your help.
I've been profiling a program with gprof on Linux (Ubuntu 11.04) and Windows (7, latest version of MinGW), same program on more or less the same dataset each time, and getting significantly different results. (Significantly as in they would lead to different conclusions about what part of the code needs optimizing.)
It's possible that the results could be legitimately different on the two systems, but I also have to consider the possibility that one result set is inaccurate and should be ignored, and a priori the more likely one would be MinGW as gprof is less extensively tested on Windows than on Linux. A stronger argument for that conclusion is that the results on Windows look distinctly weird:
% cumulative self self total
time seconds seconds calls us/call us/call name
27.43 1.13 1.13 68589813 0.02 0.02 addt
21.48 2.02 0.89 tok
19.17 2.81 0.79 hash
9.95 3.21 0.41 slot
7.89 3.54 0.33 nextx
4.85 3.74 0.20 next
3.52 3.88 0.14 27809047 0.01 0.01 get
0.85 3.92 0.04 eol
0.73 3.95 0.03 __mingw_pformat
0.73 3.98 0.03 ch
0.73 4.01 0.03 tokx
0.49 4.03 0.02 slot
0.49 4.05 0.02 tok
0.24 4.06 0.01 166896 0.06 0.06 mk2
0.24 4.07 0.01 6693 1.49 1.49 initt
0.24 4.08 0.01 __pformat_putchars
0.24 4.09 0.01 hashs
0.24 4.10 0.01 pop
0.24 4.11 0.01 quoted
0.12 4.12 0.01 eat
0.12 4.12 0.01 expand
0.00 4.12 0.00 145841014 0.00 0.00 initparse
There are a lot of gaps, and then initparse, which is an initialization function called only once that calls almost nothing else, is reported as having been called one hundred and forty-five million times.
Should I disregard the results from Windows and just use the ones from Linux? Or is there some issue with the reporting of number of calls on Windows that doesn't affect the percentage time results? Or am I misreading the output or otherwise misusing the tool?