What is it that configures the clock_gettime resolution? - C++

I'm experimenting with the time management of Linux on a Raspberry Pi. For that I'm looking at clock_gettime (and clock_getres) for CLOCK_REALTIME.
I noticed that clock_getres always reports nanosecond resolution (it returns 0 for tv_sec and 1 for tv_nsec), and the values returned by clock_gettime point in the same direction (e.g. I get values like 1642070078.415542996) - but only if the system is fully booted with systemd etc. running! When I run my test program as a replacement for init (e.g. init=/test-clock in cmdline.txt), clock_getres still returns 1 for tv_nsec, yet all values returned by clock_gettime suddenly have only microsecond resolution!
test-program:
#include <stdint.h>
#include <stdio.h>
#include <time.h>

void test_clock(const clockid_t id, const char *const name)
{
    struct timespec ts { 0, 0 };
    uint64_t n = 0;
    struct timespec start { 0, 0 }, end { 0, 0 };

    // count how many clock_gettime() calls fit into 2.5 seconds
    clock_gettime(id, &start);
    uint64_t start_ts = start.tv_sec * uint64_t(1000000000) + start.tv_nsec, end_ts = 0;
    do {
        clock_gettime(id, &end);
        n++;
        end_ts = end.tv_sec * uint64_t(1000000000) + end.tv_nsec;
    }
    while(end_ts - start_ts < 2500000000);

    // count how many out of 1000 samples have sub-microsecond digits set
    struct timespec now { 0, 0 };
    int ns = 0;
    for(int i=0; i<1000; i++) {
        clock_gettime(id, &now);
        if (now.tv_nsec % 1000)
            ns++;
    }

    if (clock_getres(id, &ts))
        fprintf(stderr, "clock_getres(%d) failed (%s)\n", id, name);
    else
        printf("%s:\t%ld.%09ld\t%f/s\t%d\t%ld.%09ld\n", name, ts.tv_sec, ts.tv_nsec, n / 2.5, ns, now.tv_sec, now.tv_nsec);
}

int main(int argc, char *argv[])
{
    printf("clock\tresolution\tcalls/s\t# with ns\tnow\n");
    test_clock(CLOCK_REALTIME, "CLOCK_REALTIME");
    test_clock(CLOCK_TAI, "CLOCK_TAI");
    test_clock(CLOCK_MONOTONIC, "CLOCK_MONOTONIC");
    test_clock(CLOCK_MONOTONIC_RAW, "CLOCK_MONOTONIC_RAW");
    return 0;
}
Compile with:
g++ -Ofast -ggdb3 -o test-clock test-clock.cpp -lrt
Expected output:
clock resolution calls/s # with ns now
CLOCK_REALTIME: 0.000000001 48881594.400000/s 1000 1642071062.213603835
CLOCK_TAI: 0.000000001 49500959.200000/s 1000 1642071101.713668922
CLOCK_MONOTONIC: 0.000000001 49248353.200000/s 1000 2402707.303582035
CLOCK_MONOTONIC_RAW: 0.000000001 47072281.600000/s 1000 2402705.604860726
What I see when starting test-clock as init replacement:
clock resolution calls/s # with ns now
CLOCK_REALTIME: 0.000000001 853001.200000/s 0 19.216404000
CLOCK_TAI: 0.000000001 736536.000000/s 0 21.718848000
CLOCK_MONOTONIC: 0.000000001 853367.200000/s 0 24.220166000
CLOCK_MONOTONIC_RAW: 0.000000001 855598.800000/s 0 26.721360000
The 4th column tells me that none of the readings had nanosecond resolution.
So what I would like to know is: how can I configure the kernel/glibc/whatever so that it gives me nanosecond resolution at boot as well?
Any ideas?

Related

The cpu_time obtained by proc_pid_rusage does not meet expectations on the macOS M1 chip

I need to calculate the CPU usage of a certain process on macOS (the target process is not related to the current process). I use the proc_pid_rusage API. The method is to call it periodically and compute the difference of ri_user_time and ri_system_time between two calls, and from that the percentage of CPU usage over that interval.
I used this on a macOS system with a non-M1 chip and the results were in line with expectations (basically the same as what I saw in Activity Monitor), but recently I found that the value obtained on an M1 macOS system is far too small. For example, one of my processes that consumes 30+% CPU (according to Activity Monitor) shows up as less than 1%.
Here is demo code; you can create a new project and run it directly:
//
//  main.cpp
//  SimpleMonitor
//
//  Created by m1 on 2021/2/23.
//
#include <stdio.h>
#include <stdlib.h>
#include <libproc.h>
#include <stdint.h>
#include <iostream>
#include <thread>   // std::this_thread::sleep_for
#include <chrono>   // std::chrono::seconds

int main(int argc, const char * argv[]) {
    // insert code here...
    std::cout << "run simple monitor!\n";
    // TODO: change process id:
    int64_t pid = 12483;
    struct rusage_info_v4 ru;
    struct rusage_info_v4 ru2;

    int64_t success = (int64_t)proc_pid_rusage((pid_t)pid, RUSAGE_INFO_V4, (rusage_info_t *)&ru);
    if (success != 0) {
        std::cout << "get cpu time fail \n";
        return 0;
    }
    std::cout << "getProcessPerformance, pid=" + std::to_string(pid) + " ru.ri_user_time=" + std::to_string(ru.ri_user_time) + " ru.ri_system_time=" + std::to_string(ru.ri_system_time) << std::endl;

    std::this_thread::sleep_for(std::chrono::seconds(10));

    int64_t success2 = (int64_t)proc_pid_rusage((pid_t)pid, RUSAGE_INFO_V4, (rusage_info_t *)&ru2);
    if (success2 != 0) {
        std::cout << "get cpu time fail \n";
        return 0;
    }
    std::cout << "getProcessPerformance, pid=" + std::to_string(pid) + " ru2.ri_user_time=" + std::to_string(ru2.ri_user_time) + " ru2.ri_system_time=" + std::to_string(ru2.ri_system_time) << std::endl;

    int64_t cpu_time = ru2.ri_user_time - ru.ri_user_time + ru2.ri_system_time - ru.ri_system_time;
    // percentage:
    double cpu_usage = (double)cpu_time / 10 / 1000000000 * 100;
    std::cout << pid << " cpu usage: " << cpu_usage << std::endl;
}
Here I want to know whether there is a problem with my calculation method; if not, how can I handle the inaccurate results on M1 macOS systems?
You have to multiply the CPU time by a conversion constant. Here are some snippets of code from a diff.
+#include <mach/mach_time.h>

mach_timebase_info_data_t sTimebase;
mach_timebase_info(&sTimebase);
timebase_to_ns = (double)sTimebase.numer / (double)sTimebase.denom;

syscpu.total = task_info.ptinfo.pti_total_system * timebase_to_ns / 1000000;
usercpu.total = task_info.ptinfo.pti_total_user * timebase_to_ns / 1000000;
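Applied to the question's code, the idea would be to convert ri_user_time/ri_system_time with that timebase before computing the percentage; on Apple Silicon those fields appear to be in Mach time units rather than nanoseconds. A minimal sketch, assuming the same hard-coded pid and 10-second window as the demo above:
#include <libproc.h>
#include <mach/mach_time.h>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

int main() {
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);                      // numer/denom converts Mach time units to ns
    double ticks_to_ns = (double)tb.numer / (double)tb.denom;

    pid_t pid = 12483;                            // placeholder pid, as in the question

    rusage_info_v4 a, b;
    if (proc_pid_rusage(pid, RUSAGE_INFO_V4, (rusage_info_t *)&a) != 0) return 1;
    std::this_thread::sleep_for(std::chrono::seconds(10));
    if (proc_pid_rusage(pid, RUSAGE_INFO_V4, (rusage_info_t *)&b) != 0) return 1;

    double cpu_ns = ((double)(b.ri_user_time - a.ri_user_time)
                   + (double)(b.ri_system_time - a.ri_system_time)) * ticks_to_ns;
    double cpu_usage = cpu_ns / 10 / 1e9 * 100;   // percent of one core over the 10 s window
    printf("%d cpu usage: %.2f%%\n", pid, cpu_usage);
    return 0;
}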

Clang C++ new operator large dynamic array slow compile time

I am facing a very unusual issue using clang-1200.0.32.21 on macOS Catalina 10.15.7.
Essentially, this code takes a long time to compile and has extremely high RAM usage (peaks at around 10 GB):
m_Table = new MapGenerator::Block[MAP_DIAMETER * MAP_HEIGHT * MAP_DIAMETER]();
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 50.8973 seconds (55.3352 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
34.9917 (100.0%) 15.9056 (100.0%) 50.8973 (100.0%) 55.3352 (100.0%) Clang front-end timer
34.9917 (100.0%) 15.9056 (100.0%) 50.8973 (100.0%) 55.3352 (100.0%) Total
Changing it to the following instantly fixes it:
uint32_t table_size = Map::MAP_DIAMETER * Map::MAP_HEIGHT * Map::MAP_DIAMETER * sizeof(MapGenerator::Block);
m_Table = reinterpret_cast<MapGenerator::Block*>(malloc(table_size));
memset(m_Table, 0, table_size);
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 1.3105 seconds (2.1116 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.1608 (100.0%) 0.1497 (100.0%) 1.3105 (100.0%) 2.1116 (100.0%) Clang front-end timer
1.1608 (100.0%) 0.1497 (100.0%) 1.3105 (100.0%) 2.1116 (100.0%) Total
If you are curious, these are the relevant definitions:
enum : int { MAP_HEIGHT = 15 };
enum : int { MAP_DIAMETER = 1000 };

union Block
{
    struct
    {
        uint8_t valid : 1;    // block valid
        /* used to determine visible faces */
        uint8_t v_top : 1;
        uint8_t v_bottom : 1;
        uint8_t v_front : 1;
        uint8_t v_back : 1;
        uint8_t v_right : 1;
        uint8_t v_left : 1;
        uint8_t v_base : 1;
        uint8_t discard : 1;  // delete flag
    };
    uint16_t bits;
};

Block* m_Table;
Is there any logical reason why new takes this long to compile? To my understanding it should not behave differently. Also, I do not have this issue with the MSVC (Microsoft C++) compiler on Windows.
EDIT:
This is a minimal reproducible sample:
#include <cstdint>
#include <cstdlib>
#include <cstring>

enum : int { MAP_HEIGHT = 15 };
enum : int { MAP_DIAMETER = 1000 };

union Block
{
    struct
    {
        uint8_t valid : 1;    // block valid
        /* used to determine visible faces */
        uint8_t v_top : 1;
        uint8_t v_bottom : 1;
        uint8_t v_front : 1;
        uint8_t v_back : 1;
        uint8_t v_right : 1;
        uint8_t v_left : 1;
        uint8_t v_base : 1;
        uint8_t discard : 1;  // delete flag
    };
    uint16_t bits;
};

Block* table = nullptr;

// UNCOMMENT THIS:
// #define USE_MALLOC

int main(int argc, char* argv[])
{
#ifndef USE_MALLOC
    table = new Block[MAP_DIAMETER * MAP_HEIGHT * MAP_DIAMETER]();
#else
    uint32_t table_size = MAP_DIAMETER * MAP_HEIGHT * MAP_DIAMETER * sizeof(Block);
    table = reinterpret_cast<Block*>(malloc(table_size));
    memset(table, 0, table_size);
#endif
    (void) table;
#ifndef USE_MALLOC
    delete[] table;
#else
    free(table);
#endif
    return 0;
}
I compile it with the command:
g++ -std=c++17 -g -Wall -Werror -ftime-report -c test.cpp -o test.o

C++: <sys/sysctl.h> fails to declare functions CTL_HW and HW_NCPU

Aloha all!
I'm working with the following script (which I did not write). This is one of many files I've been working on modifying to initiate a build/make on Linux.
Everything I've found online suggests that sys/sysctl.h should properly declare these functions:
CTL_HW and HW_NCPU
However, running the following (called "machineInfo.cpp"):
#include "machineInfo.h"
#include <sys/sysctl.h>
#include <linux/sysctl.h>
#include <cstdio>
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
int StMachineInfo::numProcs(void) {
int numCPU = 0;
int nprocs;
size_t len = sizeof(nprocs);
static int mib[2] = { CTL_HW, HW_NCPU };
/* get the number of CPUs from the system */
sysctl(mib, 2, &numCPU, &len, NULL, 0);
if( numCPU < 1 )
{
mib[1] = HW_NCPU;
if (sysctl (mib, ARRAY_SIZE(mib), &nprocs, &len, NULL, 0) == 0 && len == sizeof (nprocs) && 0 < nprocs)
numCPU = nprocs;
if( numCPU < 1 )
numCPU = 1;
}
return numCPU;
}
...results in the following error output:
g++ -c machineInfo.cpp
machineInfo.cpp: In function ‘int StMachineInfo::numProcs()’:
machineInfo.cpp:14:24: error: ‘CTL_HW’ was not declared in this scope
static int mib[2] = { CTL_HW, HW_NCPU };
^
machineInfo.cpp:14:32: error: ‘HW_NCPU’ was not declared in this scope
static int mib[2] = { CTL_HW, HW_NCPU };
^
Makefile:33: recipe for target 'machineinfo.o' failed
make: *** [machineinfo.o] Error 1
Is there something wrong with the code itself? Or do I need to #include another header? I've experimented with this and Googled for a couple of hours, to no avail.
Many thanks,
Sean
I believe the problem here is that the sysctl() call does not have a usable glibc wrapper on Linux, and to my best understanding those constants (CTL_HW, HW_NCPU) are only available on the BSDs.
I'd be happy to be proven wrong, as I'm trying to understand if this uname -p behavior could ever work on Linux.
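For what it's worth, here is a minimal sketch of a Linux-friendly alternative using POSIX sysconf() instead of the BSD CTL_HW/HW_NCPU mib; note this replaces the sysctl mechanism rather than fixing it (std::thread::hardware_concurrency() would be another option):
#include <unistd.h>

int numProcs(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);  // number of processors currently online
    return (n > 0) ? (int)n : 1;             // fall back to 1 if the query fails
}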

C++ libm based program taking too much time on baremetal ubuntu server 16 compared to VM ubuntu server 12

I am trying to run a math-intensive C++ program on Ubuntu Server, and surprisingly Ubuntu Server 16 running bare metal on a Core i7 6700 is taking more time than a dual-core Ubuntu Server 12.04.5 running in a VM over Windows 10 on the same machine. I am using GCC 5.4.1 on both. I also tried compiling with -Ofast and -ffast-math, but it didn't make any difference; neither did building with the latest GCC 7.2 on the bare metal, nor fetching the latest libm (glibc) - the numbers did not change at all. Can someone please help me figure out where things are going wrong?
Running callgrind over the program (I am using a third-party .so library, so I have no control over its code), I see most of the time being spent in libm. The only difference between the two environments other than the server version is the libm version: on the VM, which performed well, it is 2.15, and on the bare metal, which takes more time, it is 2.23. Any suggestions will be greatly appreciated. Thanks.
The build command is :
g++ -std=c++14 -O3 -o scicomplintest EuroFutureOption_test.cpp -L. -lFEOption
The program calculates option Greeks for a set of 22 strike prices using a library whose source code isn't available. However, I would be able to answer any questions w.r.t. the test code.
I have simplified the latency calculation using the class below:
typedef std::chrono::high_resolution_clock::time_point TimePoint;
typedef std::chrono::high_resolution_clock SteadyClock;

template <typename precision = std::chrono::microseconds>
class EventTimerWithPrecision
{
public:
    EventTimerWithPrecision() { _beg = SteadyClock::now(); }

    long long elapsed() {
        return std::chrono::duration_cast<precision>(SteadyClock::now()
                                                     - _beg).count();
    }

    void reset() { _beg = SteadyClock::now(); }

private:
    TimePoint _beg;
};

typedef EventTimerWithPrecision<> EventTimer;
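For reference, a quick usage sketch (do_expensive_work() is just a placeholder for the code being measured; the full test code below shows the real use):
EventTimer timer;                  // starts timing at construction
do_expensive_work();               // placeholder
long long us = timer.elapsed();    // elapsed microseconds (the default precision)
timer.reset();                     // restart for the next measurement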
Now I am getting the times below:
Ubuntu server 12.04.5 on VM with dual core (over windows 10):
siril#ubuntu:/media/sf_workshare/scicompeurofuturestest$ ./scicomplintest
Mean time: 61418 us
Min time: 44990 us
Max time: 79033 us
Ubuntu server 16 on Core i7 6700 bare metal:
Mean time: 104888 us
Min time: 71015 us
Max time: 125928 us
on Windows 10 (MSVC 14) on Core i7 6700 bare metal:
D:\workshare\scicompeurofuturestest\x64\Release>scicompwintest.exe
Mean time: 53322 us
Min time: 39655 us
Max time: 64506 us
I can understand windows 10 performing faster than linux on VM but why is the baremetal ubuntu so slow?
Unable to come to any conclusion, I am pasting the whole test code below. Please help (I am really curious to know why it's behaving this way).
#include <iostream>
#include <vector>
#include <numeric>
#include <algorithm>
#include "FEOption.h"
#include <chrono>
#define PRINT_VAL(x) std::cout << #x << " = " << (x) << std::endl
typedef std::chrono::high_resolution_clock::time_point TimePoint;
typedef std::chrono::high_resolution_clock SteadyClock;

template <typename precision = std::chrono::microseconds>
class EventTimerWithPrecision
{
public:
    EventTimerWithPrecision() { _beg = SteadyClock::now(); }

    long long elapsed() {
        return std::chrono::duration_cast<precision>(SteadyClock::now() - _beg).count();
    }

    void reset() { _beg = SteadyClock::now(); }

private:
    TimePoint _beg;
};

typedef EventTimerWithPrecision<> EventTimer;
int main(){
    int cnt, nWarmup = 10, nTimer = 100000;
    double CompuTime;

    // Option Parameters
    double Omega[] = {
        -1, -1, -1, 1, 1, 1, 1,
        -1, -1, -1, 1, 1, 1, 1,
        -1, -1, -1, 1, 1, 1, 1,
        -1, -1, -1, 1, 1, 1, 1,
        -1, -1, -1, 1, 1, 1, 1,
        -1, -1, -1, 1, 1, 1, 1
    };
    double Strike[] = {
        92.77434863, 95.12294245, 97.5309912,  100, 102.5315121, 105.1271096, 107.7884151,
        89.93652726, 93.17314234, 96.52623599, 100, 103.598777,  107.327066,  111.1895278,
        85.61884708, 90.16671558, 94.95615598, 100, 105.311761,  110.90567,   116.796714,
        80.28579206, 86.38250571, 92.9421894,  100, 107.5937641, 115.7641807, 124.5550395,
        76.41994703, 83.58682355, 91.4258298,  100, 109.3782799, 119.6360811, 130.8558876,
        73.30586976, 81.30036598, 90.16671558, 100, 110.90567,   123.0006763, 136.4147241
    };
    double Expiration[] = {
        7,   7,   7,   7,   7,   7,   7,
        14,  14,  14,  14,  14,  14,  14,
        30,  30,  30,  30,  30,  30,  30,
        60,  60,  60,  60,  60,  60,  60,
        90,  90,  90,  90,  90,  90,  90,
        120, 120, 120, 120, 120, 120, 120
    };
    int TradeDaysPerYr = 252;

    // Market Parameters
    double ValueDate = 0;
    double Future = 100;
    double annualSigma = 0.3;
    double annualIR = 0.05;

    // Numerical Parameters
    int GreekSwitch = 2;
    double annualSigmaBump = 0.01;
    double annualIRBump = 0.0001;
    double ValueDateBump = 1;

    double PV;
    double Delta;
    double Gamma;
    double Theta;
    double Vega;
    double Rho;

    sciStatus_t res;
    int nData = sizeof(Strike) / sizeof(double);
    std::vector<long long> v(nData);

    for (int i = 0; i < nData; i++)
    {
        for (cnt = 0; cnt < nWarmup; ++cnt){
            res = EuroFutureOptionFuncC(annualIR, annualSigma, Omega[i], ValueDate, Expiration[i], Future, Strike[i], TradeDaysPerYr, annualIRBump + cnt*1.0e-16,
                                        annualSigmaBump, ValueDateBump, GreekSwitch,
                                        &PV, &Delta, &Gamma, &Theta, &Vega, &Rho);
            if (res != SCI_STATUS_SUCCESS) {
                std::cout << "Failure with error code " << res << std::endl;
                return -1;
            }
        }

        EventTimer sci;
        for (cnt = 0; cnt < nTimer; ++cnt){
            res = EuroFutureOptionFuncC(annualIR, annualSigma, Omega[i], ValueDate, Expiration[i], Future, Strike[i], TradeDaysPerYr, annualIRBump + cnt*1.0e-16,
                                        annualSigmaBump, ValueDateBump, GreekSwitch,
                                        &PV, &Delta, &Gamma, &Theta, &Vega, &Rho);
            if (res != SCI_STATUS_SUCCESS) {
                std::cout << "Failure with error code " << res << std::endl;
                return -1;
            }
        }
        v[i] = sci.elapsed();
    }

    long long sum = std::accumulate(v.begin(), v.end(), 0);
    long long mean_t = (double)sum / v.size();
    long long max_t = *std::max_element(v.begin(), v.end());
    long long min_t = *std::min_element(v.begin(), v.end());

    std::cout << "Mean time: " << mean_t << " us" << std::endl;
    std::cout << "Min time: " << min_t << " us" << std::endl;
    std::cout << "Max time: " << max_t << " us" << std::endl;
    std::cout << std::endl;

    PRINT_VAL(PV);
    PRINT_VAL(Delta);
    PRINT_VAL(Gamma);
    PRINT_VAL(Theta);
    PRINT_VAL(Vega);
    PRINT_VAL(Rho);
    return 0;
}
The callgrind graph is as follows:
[callgrind graph]
More updates:
Tried -fopenacc and -fopenmp on both the bare metal and the VM Ubuntu with the same g++ 7.2. The VM showed a little improvement, but the bare-metal Ubuntu keeps showing the same numbers. Also, since the majority of the time is spent in libm, is there any way to upgrade that library (glibc)? I don't see any newer version of it in apt-cache though.
Used callgrind and plotted a graph using dot. According to that, 42.27% of the time is spent in libm exp (version 2.23) and 15.18% in libm log.
Finally found a similar post (so pasting it here for others): The program runs 3 times slower when compiled with g++ 5.3.1 than the same program compiled with g++ 4.8.4, the same command
The problem, as suspected, was in the libraries (according to the post), and setting LD_BIND_NOW brought the execution times down drastically (now lower than in the VM). That post also has a couple of links to bugs that were filed against that version of glibc. I will go through them and give more details here. Thanks for all the valuable inputs.
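For reference, LD_BIND_NOW is an environment variable read by the dynamic linker (ld.so): if set, all symbols are resolved at program startup instead of lazily on first use, e.g.:
LD_BIND_NOW=1 ./scicomplintest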

How to get the time in milliseconds in C++

In Java you can do this:
long now = (new Date()).getTime();
How can I do the same but in C++?
Because C++0x is awesome
#include <chrono>   // needed for std::chrono

namespace sc = std::chrono;
auto time = sc::system_clock::now();          // get the current time
auto since_epoch = time.time_since_epoch();   // get the duration since epoch
// system_clock's native tick period is implementation-defined,
// but this duration_cast will do the right thing either way
auto millis = sc::duration_cast<sc::milliseconds>(since_epoch);
long now = millis.count();                    // just like Java's (new Date()).getTime()
This works with gcc 4.4+. Compile it with --std=c++0x. I don't know if VS2010 implements std::chrono yet.
There is no such method in standard C++ (in standard C++, there is only second accuracy, not milliseconds). You can do it in non-portable ways, but since you didn't specify, I will assume that you want a portable solution. Your best bet, I would say, is the Boost function microsec_clock::local_time().
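A minimal sketch of that Boost route, assuming Boost.Date_Time is available (this prints milliseconds since local midnight, just to show the call):
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

int main() {
    boost::posix_time::ptime t = boost::posix_time::microsec_clock::local_time();
    // time_of_day() is the duration since local midnight (not since the epoch)
    std::cout << t.time_of_day().total_milliseconds() << std::endl;
    return 0;
}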
I like to have a function called time_ms defined as such:
// Used to measure intervals and absolute times
typedef int64_t msec_t;
// Get current time in milliseconds from the Epoch (Unix)
// or the time the system started (Windows).
msec_t time_ms(void);
The implementation below should work in Windows as well as Unix-like systems.
#if defined(_WIN32)
#include <windows.h>

// timeGetTime() needs winmm.lib / -lwinmm
msec_t time_ms(void)
{
    return timeGetTime();
}
#else
#include <sys/time.h>

msec_t time_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (msec_t)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}
#endif
Note that the time returned by the Windows branch is milliseconds since the system started, while the time returned by the Unix branch is milliseconds since 1970. Thus, if you use this code, only rely on differences between times, not the absolute time itself.
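Usage then looks like this (do_work() is just a placeholder for whatever is being timed); as noted above, only the difference between two readings is meaningful across platforms:
msec_t start = time_ms();
do_work();                              // placeholder for the code being measured
msec_t elapsed_ms = time_ms() - start;  // interval in milliseconds on both branches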
You can try this code (taken from the Stockfish chess engine source code (GPL)):
#include <iostream>
#include <cstdio>   // for getchar()

#if !defined(_WIN32) && !defined(_WIN64) // Linux - Unix
#  include <sys/time.h>
typedef timeval sys_time_t;
inline void system_time(sys_time_t* t) {
    gettimeofday(t, NULL);
}
inline long long time_to_msec(const sys_time_t& t) {
    return t.tv_sec * 1000LL + t.tv_usec / 1000;
}
#else // Windows and MinGW
#  include <sys/timeb.h>
typedef _timeb sys_time_t;
inline void system_time(sys_time_t* t) { _ftime(t); }
inline long long time_to_msec(const sys_time_t& t) {
    return t.time * 1000LL + t.millitm;
}
#endif

int main() {
    sys_time_t t;
    system_time(&t);
    long long currentTimeMs = time_to_msec(t);
    std::cout << "currentTimeMs:" << currentTimeMs << std::endl;
    getchar(); // wait for keyboard input
}
Standard C++ does not have a time function with subsecond precision.
However, almost every operating system does. So you have to write code that is OS-dependent.
Win32:
GetSystemTime()
GetSystemTimeAsFileTime()
Unix/POSIX:
gettimeofday()
clock_gettime()
Boost has a useful library for doing this:
http://www.boost.org/doc/libs/1_43_0/doc/html/date_time.html
ptime microsec_clock::local_time() or ptime second_clock::local_time()
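As an illustration of the POSIX clock_gettime() option listed above, reduced to milliseconds (CLOCK_REALTIME is wall-clock time since the Unix epoch):
#include <time.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);  // seconds + nanoseconds since the epoch
    int64_t ms = (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    printf("%lld\n", (long long)ms);
    return 0;
}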
Java:
package com.company;
public class Main {
public static void main(String[] args) {
System.out.println(System.currentTimeMillis());
}
}
c++:
#include <stdio.h>
#include <windows.h>
__int64 currentTimeMillis() {
    FILETIME f;
    GetSystemTimeAsFileTime(&f);
    // FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC)
    __int64 ticks = ((__int64)f.dwHighDateTime << 32LL) + (__int64)f.dwLowDateTime;
    // 116444736000000000 = 100-ns intervals between 1601-01-01 and 1970-01-01;
    // dividing by 10000 converts 100-ns units to milliseconds
    return (ticks - 116444736000000000LL) / 10000;
}

int main() {
    printf("%lli\n", currentTimeMillis());
    return 0;
}