MinGW gprof inaccurate results? - profiling

I've been profiling a program with gprof on Linux (Ubuntu 11.04) and Windows (7, latest version of MinGW), same program on more or less the same dataset each time, and getting significantly different results. (Significantly as in they would lead to different conclusions about what part of the code needs optimizing.)
It's possible that the results could be legitimately different on the two systems, but I also have to consider the possibility that one result set is inaccurate and should be ignored, and a priori the more likely one would be MinGW as gprof is less extensively tested on Windows than on Linux. A stronger argument for that conclusion is that the results on Windows look distinctly weird:
     % cumulative    self               self    total
  time   seconds  seconds     calls  us/call  us/call  name
 27.43      1.13     1.13  68589813     0.02     0.02  addt
 21.48      2.02     0.89                              tok
 19.17      2.81     0.79                              hash
  9.95      3.21     0.41                              slot
  7.89      3.54     0.33                              nextx
  4.85      3.74     0.20                              next
  3.52      3.88     0.14  27809047     0.01     0.01  get
  0.85      3.92     0.04                              eol
  0.73      3.95     0.03                              __mingw_pformat
  0.73      3.98     0.03                              ch
  0.73      4.01     0.03                              tokx
  0.49      4.03     0.02                              slot
  0.49      4.05     0.02                              tok
  0.24      4.06     0.01    166896     0.06     0.06  mk2
  0.24      4.07     0.01      6693     1.49     1.49  initt
  0.24      4.08     0.01                              __pformat_putchars
  0.24      4.09     0.01                              hashs
  0.24      4.10     0.01                              pop
  0.24      4.11     0.01                              quoted
  0.12      4.12     0.01                              eat
  0.12      4.12     0.01                              expand
  0.00      4.12     0.00 145841014     0.00     0.00  initparse
There are a lot of gaps, and then initparse, which is an initialization function called only once that calls almost nothing else, is reported as having been called one hundred and forty-five million times.
Should I disregard the results from Windows and just use the ones from Linux? Or is there some issue with the reporting of number of calls on Windows that doesn't affect the percentage time results? Or am I misreading the output or otherwise misusing the tool?

Related

Understanding why some functions are not profiled in Gprof?

I am trying to use gprof, and the legend for the calls column reads:
calls the number of times this function was invoked, if
this function is profiled, else blank.
That field is missing for some of my functions. What does that mean? I didn't pass any special options.
Flat profile:
Each sample counts as 0.01 seconds.
     % cumulative    self               self    total
  time   seconds  seconds     calls  ms/call  ms/call  name
 68.77      9.38     9.38                              void BilinearForm::int3d...
 27.71     13.16     3.78                              void BilinearForm::int2d...
  1.54     13.37     0.21                              BilinearForm::finalize()
  0.73     13.47     0.10  11275600     0.00     0.00  frame_dummy
...
This is what the head of the output looks like.

rocksdb write stall with many writes of the same data

First of all, I have to explain our setup, as it is quite different from a typical database server with RAID groups and so on. Our customers usually buy a server such as an HP DL380 Gen10 with two 300 GB HDDs (not SSDs), configured as a RAID 1 and running Windows.
We only manage the metadata of other storage systems here, so that a client can query us to locate its data on those large storage systems.
Because our old database was frequently corrupted, we looked for a more stable database with low overhead and found RocksDB (currently version 6.12.0).
Unfortunately after it ran for some hours, it seems to block my program for many minutes due to a write stall:
2020/10/01-15:58:44.678646 1a64 [WARN] [db\column_family.cc:876] [default]
Stopping writes because we have 2 immutable memtables (waiting for flush),
max_write_buffer_number is set to 2
Am I right that the write workload is too much for the machine?
Our software is a service that receives at least one update per second from up to 2000 different servers (the limit might be raised in the future if possible). Most of the time the same database entries are simply updated/written again, because one of the fields in each entry is the current time of the respective server. Of course I could write the data to the HDD less often, but then our information would not be up to date when a client requests it.
So my questions are:
1. I assume that currently every write request really is written to the disk. Is there a way to enable some kind of cache (or increase its size, if it is not sufficient) so that the data is written to the HDD less often, but read requests still return the correct data from memory?
2. I also see that there is a merge operator, but I'm not sure when this merge would take place. Is there already a cache like the one mentioned under 1., so that the data is collected for some time, then merged, and then written to the HDD?
3. Are there any other optimizations which could help me in this situation?
Any help is welcome. Thanks in advance.
Here are some more log lines in case they are useful:
** File Read Latency Histogram By Level [default] **
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp
Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
L0 2/0 17.12 MB 0.5 0.0 0.0 0.0 0.4 0.4 0.0 1.0 0.0 81.5 5.15 0.00 52 0.099 0 0
L1 3/0 192.76 MB 0.8 3.3 0.4 2.9 3.2 0.3 0.0 8.0 333.3 327.6 10.11 0.00 13 0.778 4733K 119K
L2 20/0 1.02 GB 0.4 1.6 0.4 1.2 1.2 -0.0 0.0 3.1 387.5 290.0 4.30 0.00 7 0.614 2331K 581K
Sum 25/0 1.22 GB 0.0 4.9 0.8 4.1 4.9 0.7 0.0 11.8 257.4 254.5 19.56 0.00 72 0.272 7064K 700K
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0
** Compaction Stats [default] **
Priority Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
Low 0/0 0.00 KB 0.0 4.9 0.8 4.1 4.5 0.3 0.0 0.0 349.5 316.4 14.40 0.00 20 0.720 7064K 700K
High 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.4 0.4 0.0 0.0 0.0 81.7 5.12 0.00 51 0.100 0 0
User 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 50.9 0.03 0.00 1 0.030 0 0
Uptime(secs): 16170.9 total, 0.0 interval
Flush(GB): cumulative 0.410, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 4.86 GB write, 0.31 MB/s write, 4.92 GB read, 0.31 MB/s read, 19.6 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [default] **
2020/10/01-15:53:21.248110 1a64 [db\db_impl\db_impl_write.cc:1701] [default] New memtable created with log file: #10465. Immutable memtables: 0.
2020/10/01-15:58:44.678596 1a64 [db\db_impl\db_impl_write.cc:1701] [default] New memtable created with log file: #10466. Immutable memtables: 1.
2020/10/01-15:58:44.678646 1a64 [WARN] [db\column_family.cc:876] [default] Stopping writes because we have 2 immutable memtables (waiting for flush), max_write_buffer_number is set to 2
2020/10/01-16:02:57.448977 2328 [db\db_impl\db_impl.cc:900] ------- DUMPING STATS -------
2020/10/01-16:02:57.449034 2328 [db\db_impl\db_impl.cc:901]
** DB Stats **
Uptime(secs): 16836.8 total, 665.9 interval
Cumulative writes: 20M writes, 20M keys, 20M commit groups, 1.0 writes per commit group, ingest: 3.00 GB, 0.18 MB/s
Cumulative WAL: 20M writes, 0 syncs, 20944372.00 writes per sync, written: 3.00 GB, 0.18 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 517K writes, 517K keys, 517K commit groups, 1.0 writes per commit group, ingest: 73.63 MB, 0.11 MB/s
Interval WAL: 517K writes, 0 syncs, 517059.00 writes per sync, written: 0.07 MB, 0.11 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
Our watchdog triggered a breakpoint after a mutex had been blocked for several minutes. After that, this showed up in the log file:
2020/10/02-17:44:18.602776 2328 [db\db_impl\db_impl.cc:900] ------- DUMPING STATS -------
2020/10/02-17:44:18.602990 2328 [db\db_impl\db_impl.cc:901]
** DB Stats **
Uptime(secs): 109318.0 total, 92481.2 interval
Cumulative writes: 20M writes, 20M keys, 20M commit groups, 1.0 writes per commit group, ingest: 3.00 GB, 0.03 MB/s
Cumulative WAL: 20M writes, 0 syncs, 20944372.00 writes per sync, written: 3.00 GB, 0.03 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
L0 2/0 17.12 MB 0.5 0.0 0.0 0.0 0.4 0.4 0.0 1.0 0.0 81.5 5.15 0.00 52 0.099 0 0
L1 3/0 192.76 MB 0.8 3.3 0.4 2.9 3.2 0.3 0.0 8.0 333.3 327.6 10.11 0.00 13 0.778 4733K 119K
L2 20/0 1.02 GB 0.4 1.6 0.4 1.2 1.2 -0.0 0.0 3.1 387.5 290.0 4.30 0.00 7 0.614 2331K 581K
Sum 25/0 1.22 GB 0.0 4.9 0.8 4.1 4.9 0.7 0.0 11.8 257.4 254.5 19.56 0.00 72 0.272 7064K 700K
Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0
** Compaction Stats [default] **
Priority Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
Low 0/0 0.00 KB 0.0 4.9 0.8 4.1 4.5 0.3 0.0 0.0 349.5 316.4 14.40 0.00 20 0.720 7064K 700K
High 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.4 0.4 0.0 0.0 0.0 81.7 5.12 0.00 51 0.100 0 0
User 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 50.9 0.03 0.00 1 0.030 0 0
Uptime(secs): 109318.0 total, 92481.2 interval
Flush(GB): cumulative 0.410, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 4.86 GB write, 0.05 MB/s write, 4.92 GB read, 0.05 MB/s read, 19.6 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 1 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [default] **
RocksDB does have a built-in cache, which you can definitely explore, but I suspect your write stalls can be handled by increasing either the max_write_buffer_number or max_background_flushes configuration parameters.
The former allows more memtables to accumulate in memory before writes are stopped, while the latter increases the number of background threads available for flush operations.
It's hard to say whether the workload is too high for the machine just from this information, because there are a lot of moving parts, but I doubt it. For starters, two background flush threads is pretty low. Also, I'm obviously not familiar with the workload you're testing for, but since you're not using the caching mechanism, the performance can only improve from here.
Disk writes are tricky. Any Linux application "writing to disk" is really calling the write system call and thus writing to an intermediate kernel buffer, which is written to its intended destination once the kernel deems fit and proper. The process of actually updating the data on the disk is known as writeback (I highly recommend reading the file I/O chapter of Robert Love's Linux System Programming for more on this).
You can use the fsync(2) system call to manually commit a file's system cache to disk, but it's not recommended, since the kernel tends to know when the best time to do this is.
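As a rough illustration of the write-then-fsync sequence just described, here is a minimal C sketch for a POSIX system. The function name write_durably and the file name in the usage note are made up for this example, and the error handling is deliberately simple (for instance, it does not retry on EINTR). On Windows, which the question targets, the corresponding call would be FlushFileBuffers rather than fsync.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of buf to path, then ask the kernel
   to push its cached copy of the file out to the storage device.
   Returns 0 on success, -1 on any error. */
int write_durably(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write(2) may accept fewer bytes than requested, so loop until done.
       At this point the data only sits in the kernel's buffer. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* fsync(2) blocks until the kernel reports the file as written back. */
    int rc = fsync(fd);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

For example, write_durably("meta.tmp", "hello\n", 6) returns 0 once fsync has reported success. Doing this for every single update is exactly the kind of forced writeback the answer advises against on a slow HDD.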
Before rolling any bespoke optimizations, I would check out the tuning guide. If tuning the configuration for your workload doesn't solve the problem, I would probably look into a RAID configuration. Write stalls are a disk I/O throughput limitation, so data striping might give you enough of a boost, even giving you the option to add some redundancy into your storage solution, depending on the RAID configuration you opt for.
Either way, my gut feeling is that 2000 connections per second seems way too low for the problem to be the current machine's CPU or even memory. Unless each request requires a significant amount of work to process, I doubt this is it. Of course, it never hurts to make sure your endpoint's server configuration is also optimally tuned.

Visual Studio profiler too much noise in results

Can I restrict the results in Visual Studio? I just want to see results for code that I have written. I am following the "500 line openGL" tutorial and I am jealous of his results in the tutorial.
     % cumulative    self               self    total
  time   seconds  seconds     calls  ms/call  ms/call  name
 69.16      2.95     2.95   3000000     0.00     0.00  line(int, int, int, int, TGAImage&, TGAColor)
 19.46      3.78     0.83 204000000     0.00     0.00  TGAImage::set(int, int, TGAColor)
  8.91      4.16     0.38 207000000     0.00     0.00  TGAColor::TGAColor(TGAColor const&)
  1.64      4.23     0.07         2    35.04    35.04  TGAColor::TGAColor(unsigned char, unsigned char, unsigned char, unsigned char)
  0.94      4.27     0.04                              TGAImage::get(int, int)
Here are my results below :( I am using Visual Studio Community 2017 with the instrumentation method, because that is what the "Performance Wizard" set up. If I just use "CPU Usage" I get an invalid process error; I read that this happens because my program exits too quickly.

How to read data into an array

I have FORTRAN 77 code from an engineering textbook that I would like to use. The problem is that I don't understand how to input the data into the arrays, namely FDAM1(61), FDAM2(61), FPOW1(61), FPOW2(61), UDAM(61) and UPOW(61).
For your reference the code has been taken from Page 49 of this book: https://books.google.pt/books?id=i2hyniQpecYC&lpg=PR6&dq=optimal%20design%20siddall&pg=PA49#v=onepage&q=optimal%20design%20siddall&f=false
C     PROGRAM TST (INPUT,OUTPUT,TAPE5=INPUT,TAPE6=OUTPUT)
C
C     PROGRAM TO ESTIMATE MAXIMUM EXPECTED VALUE FOR ALTERNATE DESIGNS
C
C     FDENS(I)= ARRAYS FOR DATA DEFINING DENSITY FUNCTIONS
C     FDAM1(I)= ARRAY DEFINING DENSITY FUNCTION FOR DAMAGE IN DESIGN 1
C     FDAM2(I)= ARRAY DEFINING DENSITY FUNCTION FOR DAMAGE IN DESIGN 2
C     FPOW1(I)= ARRAY DEFINING DENSITY FUNCTION FOR POWER IN DESIGN 1
C     FPOW2(I)= ARRAY DEFINING DENSITY FUNCTION FOR POWER IN DESIGN 2
C     UDAM(I)= VALUE CURVE FOR DAMAGE
C     UPOW(I)= VALUE CURVE FOR POWER
C
      DIMENSION FDENS(61),FDAM1(61),FDAM2(61),FPOW1(61),FPOW2(61),
     1UDAM(61),UPOW(61),FUNC(61)
C
C     NORMALIZE DENSITY FUNCTIONS
C
      DO 1 I=1,4
      READ(5,10)(FDENS(J),J=1,61)
      READ(5,11)RANGE
      AREA=FSIMP(FDENS,RANGE,61)
      DO 2 J=1,61
      GO TO(3,4,5,6)I
3     FDAM1(J)=FDENS(J)/AREA
      GO TO 2
4     FDAM2(J)=FDENS(J)/AREA
      GO TO 2
5     FPOW1(J)=FDENS(J)/AREA
      GO TO 2
6     FPOW2(J)=FDENS(J)/AREA
2     CONTINUE
1     CONTINUE
C
C     DETERMINE EXPECTED VALUES
C
      READ(5,10)(UDAM(J),J=1,61)
      READ(5,10)(UPOW(J),J=1,61)
      DO 20 I=1,6
      GO TO (30,31,32,33,34,35)I
30    DO 40 J=1,61
40    FUNC(J)=FDAM1(J)*UDAM(J)
      RANGE=12.
      E1=FSIMP(FUNC,RANGE,61)
      GO TO 20
31    DO 41 J=1,61
41    FUNC(J)=FDAM2(J)*UDAM(J)
C
      RANGE=12.
      E2=FSIMP(FUNC,RANGE,61)
      GO TO 20
32    DO 42 J=1,61
42    FUNC(J)=FPOW1(J)*UPOW(J)
      RANGE=60.
      E3=FSIMP(FUNC,RANGE,61)
      GO TO 20
33    DO 43 J=1,61
43    FUNC(J)=FPOW2(J)*UPOW(J)
      RANGE=60.
      E4=FSIMP(FUNC,RANGE,61)
      GO TO 20
34    E5=8.17
      GO TO 20
35    E6=2.20
20    CONTINUE
      DES1=E1+E3+E5
      DES2=E2+E4+E6
C
C     OUTPUT
C
      WRITE(6,100)
100   FORMAT(/,1H ,15X,24HEXPECTED VALUES OF VALUE,//)
      WRITE(6,101)
101   FORMAT(/,1H ,12X,6HDAMAGE,7X,5HPOWER,9X,5HPARTS,8X,5HTOTAL,//)
      WRITE(6,102)E1,E3,E5,DES1
102   FORMAT(/,1H ,8HDESIGN 1,4X,F5.3,8X,F5.3,9X,F5.3,8X,F6.3)
      WRITE(6,103)E2,E4,E6,DES2
103   FORMAT(/,1H ,8HDESIGN 2,4X,F5.3,8X,F5.3,9X,F5.3,8X,F6.3)
10    FORMAT(16F5.2)
11    FORMAT(F5.0)
      STOP
      END
C     SUBROUTINE FSIMP
      FUNCTION FSIMP(FUNC,RANGE,MINT)
C.... CALCULATES INTEGRAL BY SIMPSONS RULE WITH
C     MODIFICATION IF MINT IS EVEN
C.... INPUT
C     FUNC = ARRAY OF EQUALLY SPACED VALUES OF FUNCTION
C     DIMENSION MINT
C     RANGE = RANGE OF INTEGRATION
C     MINT = NUMBER OF STATIONS
C.... OUTPUT
C     FSIMP = AREA
      DIMENSION FUNC(1)
C.... CHECK MINT FOR ODD OR EVEN
      XX=RANGE/(3.*FLOAT(MINT-1))
      M=MINT/2*2
      IF(M.EQ.MINT) GO TO 3
C.... ODD
      AREA=FUNC(1)+FUNC(MINT)
      MM=MINT-1
      DO 1 I=2,MM,2
1     AREA=AREA+4.*FUNC(I)
      MM=MM-1
      DO 2 I=3,MM,2
2     AREA=AREA+2.*FUNC(I)
      FSIMP=XX*AREA
      RETURN
C.... EVEN
C.... USE SIMPSONS RULE FOR ALL BUT THE LAST 3 INTERVALS
3     M=MINT-3
      AREA=FUNC(1)+FUNC(M)
      MM=M-1
      DO 4 I=2,MM,2
4     AREA=AREA+4.*FUNC(I)
      MM=MM-1
      DO 5 I=3,MM,2
5     AREA=AREA+2.*FUNC(I)
      FSIMP=XX*AREA
C.... USE NEWTONS 3/8 RULE FOR LAST THREE INTERVALS
      FSIMP=FSIMP+9./8.*XX*(FUNC(MINT-3)+3.*(FUNC(MINT-2)+FUNC(MINT-1))
     1 +FUNC(MINT))
      RETURN
      END
Here is a minimal example to help you get started:
C     Minimal working example of creaky old FORTRAN I/O
      PROGRAM ABYSS
      IMPLICIT NONE
C
      REAL FDENS(61)
      REAL XRANGE
      INTEGER J
C
10    FORMAT(16F5.2)
11    FORMAT(F5.0)
909   FORMAT(/, 'BEHOLD! A DENSITY DISTRIBUTION',/)
910   FORMAT(10(F5.2, 3X),/)
911   FORMAT(/, 'XRANGE is ', F6.1)
C
      CONTINUE
C
      READ(5,10) (FDENS(J), J=1,61)
      READ(5,11) XRANGE
C
      WRITE(6,909)
      WRITE(6,910) (FDENS(J), J=1,61)
      WRITE(6,911) XRANGE
C
      STOP
      END
Apologies for writing this in F77; I'm sticking with the style of the code posted above for the sake of this example. Ideally, you'd use Fortran 2003 or 2008 for new code, or a completely different language which actually has decent I/O features and a rich standard library. But I digress.
This code will operate on the data (be careful to preserve the spaces):
                                0.1  0.3  0.5  0.9 1.30 1.90 2.50 3.20 3.80 4.20
 4.70  5.0  5.1  5.2  5.2  5.1  4.9  4.7  4.6  4.4  4.2  3.9  3.8  3.6  3.4  3.2
  3.0  2.9  2.7  2.5  2.4  2.2  2.1  1.9  1.8  1.6  1.5  1.4  1.2  1.1  1.0  0.9
  0.8  0.7  0.6  0.5  0.4  0.3  0.3  0.2  0.1  0.1
  12.
to produce
BEHOLD! A DENSITY DISTRIBUTION
0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.30 0.50 0.90
1.30 1.90 2.50 3.20 3.80 4.20 4.70 5.00 5.10 5.20
5.20 5.10 4.90 4.70 4.60 4.40 4.20 3.90 3.80 3.60
3.40 3.20 3.00 2.90 2.70 2.50 2.40 2.20 2.10 1.90
1.80 1.60 1.50 1.40 1.20 1.10 1.00 0.90 0.80 0.70
0.60 0.50 0.40 0.30 0.30 0.20 0.10 0.10 0.00 0.00
0.00
XRANGE is 12.0
If the code is in abyss.f and the input data is in abyss.dat, you should be able to build the code with
gfortran -g -Wall -Og -o abyss abyss.f
and generate similar results by running
abyss < abyss.dat > abyss.out
A key point to note is that the original code is reading from unit 5 (traditionally taken as stdin, now officially canonized in iso_fortran_env as INPUT_UNIT). In your own code, I'd suggest reading from a data file, so replace the literal 5 with whatever variable contains the unit number of the file you're reading from (hint: consider using the newunit argument to the open command introduced in Fortran 2008. It solves the perennially stupid Fortran problem of trying to find a free I/O unit number.) While you can use I/O redirection, it's suboptimal; it's used here to show how to work around the limitations of the original code.
Also, for the sake of later generations and your own sanity, please avoid taking advantage of Cold-War-era FORTRAN misfeatures such as this spaces-equal-zeroes nonsense. If your data is worth using, it's worth putting in a sensible format which can be easily parsed; columnar, space-delimited values are as good a choice as any. Fortran may actually get a standard library which can read and write CSV files sometime around 2156 (give or take a century) so you have plenty of time to design something decent...
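To make the spaces-equal-zeroes behaviour concrete for readers who don't speak Fortran, here is a rough C sketch (not part of the original program) of how a record read with FORMAT(16F5.2) is cut up: the line is sliced into consecutive 5-character fields, and a blank field counts as zero. The helper name parse_f5_record is made up for this illustration, and the sketch ignores the F5.2 rule that a field without a decimal point gets two implied decimal places.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_WIDTH 5u  /* the "5" in F5.2 */

/* Hypothetical helper: cut one input line into consecutive 5-character
   fields, the layout FORMAT(16F5.2) expects. An all-blank field, or a
   field past the end of a short line, is stored as 0.0. */
void parse_f5_record(const char *line, double *out, int nfields)
{
    assert(line != NULL && out != NULL && nfields >= 0);

    size_t len = strcspn(line, "\r\n");    /* ignore the line terminator */
    for (int i = 0; i < nfields; i++) {
        char field[FIELD_WIDTH + 1];
        memset(field, ' ', FIELD_WIDTH);   /* start from a blank field */
        field[FIELD_WIDTH] = '\0';

        size_t start = (size_t)i * FIELD_WIDTH;
        if (start < len) {
            size_t n = len - start;
            if (n > FIELD_WIDTH)
                n = FIELD_WIDTH;
            memcpy(field, line + start, n);
        }
        /* strtod returns 0.0 when the field holds no number at all */
        out[i] = strtod(field, NULL);
    }
}
```

For a line consisting of 30 blanks followed by `  0.1  0.3`, the first six values come back as 0.0, which matches the six leading zeros in the output shown above. It also shows why losing or adding a single space shifts every later value into the wrong field.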

How to generate ocamlprof.dump by ocamlcp or ocamloptp

I read the manual about profiling (ocamlprof): http://caml.inria.fr/pub/docs/manual-ocaml-4.01/profil.html
I am having a hard time using it. Here is how I tried to profile an example with gprof:
For example I have a file name: ex.ml
I run: sudo ocamlopt -p ex.ml -o ex
then I use: gprof ex > profile.txt
It shows me a bunch of information but the column related to time is all 0
For instance (this taken from my real function):
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
     % cumulative    self               self    total
  time   seconds  seconds     calls  Ts/call  Ts/call  name
  0.00      0.00     0.00       415     0.00     0.00  caml_page_table_modify
  0.00      0.00     0.00        57     0.00     0.00  caml_get_exception_backtrace
I don't understand why the time columns show 0.00 for all functions.
The link above mentions a file ocamlprof.dump, but I don't know which command generates it. How can I generate ocamlprof.dump? And how can I find out where a name such as caml_page_table_modify is located?
Thank you very much for your help.