I'm trying to write a simple single header benchmarker and I understand that std::clock will give me the time that a process (thread) is in actual use.
So, given the following simplified program:
int main() {
using namespace std::literals::chrono_literals;
auto start_cpu = std::clock();
auto start_wall = std::chrono::high_resolution_clock::now();
// clobber();
std::this_thread::sleep_for(1s);
// clobber();
auto finish_cpu = std::clock();
auto finish_wall = std::chrono::high_resolution_clock::now();
std::cerr << "cpu: "
<< start_cpu << " " << finish_cpu << " "
<< (finish_cpu - start_cpu) / (double)CLOCKS_PER_SEC << " s" << std::endl;
std::cerr << "wall: "
// << FormatTime(start_wall) << " " << FormatTime(finish_wall) << " "
<< (finish_wall - start_wall) / 1.0s << " s" << std::endl;
return 0;
}
Demo
We get the following output:
cpu: 4820 4839 1.9e-05 s
wall: 1.00007 s
I just want to confirm that the CPU time is the time spent executing my code, and not the sleep_for call, since the sleep is actually handled by the kernel, which std::clock doesn't track. So to check, I changed what I was timing:
int main() {
using namespace std::literals::chrono_literals;
int value = 0;
auto start_cpu = std::clock();
auto start_wall = std::chrono::high_resolution_clock::now();
// clobber();
for (int i = 0; i < 1000000; ++i) {
srand(value);
value = rand();
}
// clobber();
std::cout << "value = " << value << std::endl;
auto finish_cpu = std::clock();
auto finish_wall = std::chrono::high_resolution_clock::now();
std::cerr << "cpu: "
<< start_cpu << " " << finish_cpu << " "
<< (finish_cpu - start_cpu) / (double)CLOCKS_PER_SEC << " s" << std::endl;
std::cerr << "wall: "
// << FormatTime(start_wall) << " " << FormatTime(finish_wall) << " "
<< (finish_wall - start_wall) / 1.0s << " s" << std::endl;
return 0;
}
Demo
This gave me an output of:
cpu: 4949 1398224 1.39328 s
wall: 2.39141 s
value = 354531795
So far, so good. I then tried this on my Windows box running MSYS2's g++ compiler. The output for the last program gave me:
value = 0
cpu: 15 15 0 s
wall: 0.0080039 s
Why is std::clock() always outputting 15? Is the compiler's implementation of std::clock() broken?
It turns out I had assumed that CLOCKS_PER_SEC would be the same everywhere. However, on the MSYS2 compiler it is 1000x smaller than on godbolt.org.
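To double-check this on a new platform, a minimal sketch that just prints CLOCKS_PER_SEC and converts a measured interval to seconds might look like this (the concrete values in the comments are what the two platforms typically define, not something the standard guarantees):
#include <ctime>
#include <iostream>
int main() {
    // CLOCKS_PER_SEC is implementation-defined: POSIX fixes it at 1000000,
    // while the Windows CRT used by MSYS2/MinGW typically defines it as 1000.
    std::cout << "CLOCKS_PER_SEC = " << CLOCKS_PER_SEC << "\n";
    // Always convert ticks to seconds by dividing by CLOCKS_PER_SEC instead of
    // assuming a particular tick size.
    std::clock_t begin = std::clock();
    volatile long sink = 0;
    for (long i = 0; i < 10000000L; ++i) sink = sink + i;  // burn some CPU time
    std::clock_t end = std::clock();
    std::cout << "cpu: " << double(end - begin) / CLOCKS_PER_SEC << " s\n";
    return 0;
}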
Related
I am trying to make a text game with a timer; if the game is finished in 60 seconds or less, bonus points are awarded. However, I have no idea how to get the time value from chrono without cout-ing it. I want to use the value for calculating the bonus points. I can cout the value through .count(), but I cannot get that value into a variable to use in the condition.
Here's my code for the scoring part:
void Game::score(auto start, auto end) {
int bonus = 0;
int total = 0;
string name;
box();
gotoxy(10,8); cout << "C O N G R A T U L A T I O N S";
gotoxy(15,10); cout << "You have successfully accomplished all the levels!";
gotoxy(15,11); cout << "You are now a certified C-O-N-N-E-C-T-o-r-I-s-T" << char(002) << char(001);
gotoxy(20,13); cout << "= = = = = = = = = = GAME STATS = = = = = = = = = =";
gotoxy(25,15); cout << "Time Taken: " << chrono::duration_cast<chrono::seconds>(end - start).count() << " seconds";
gotoxy(25,16); cout << "Points: " << pts << " points";
if (chrono::duration_cast<chrono::seconds>(end - start).count() <= 60) {
bonus == 5000;
} else if (chrono::duration_cast<chrono::seconds>(end - start).count() <= 90) {
bonus == 3000;
} else if (chrono::duration_cast<chrono::seconds>(end - start).count() <= 120) {
bonus == 1000;
}
gotoxy(30,17); cout << "Bonus Points (Time Elapsed): " << bonus;
total = pts + bonus;
gotoxy(25,18); cout << "Total Points: " << total << " points";
gotoxy(20,20); cout << "Enter your name: ";
cin >> name;
scoreB.open("scoreboard.txt",ios::app);
scoreB << name << "\t" << total << "\n";
scoreB.close();
}
You should really use the chrono literals for comparing durations. See example here:
#include <chrono>
#include <iostream>
#include <thread>
using Clock = std::chrono::system_clock;
void compareTimes(std::chrono::time_point<Clock> startTime,
std::chrono::time_point<Clock> finishTime) {
using namespace std::chrono_literals;
std::chrono::duration<float> elapsed = finishTime - startTime;
std::cout << "elapsed = " << elapsed.count() << "\n";
if (elapsed > 10ms) {
std::cout << "over 10ms\n";
}
if (elapsed < 60s) {
std::cout << "under 60s\n";
}
}
int main() {
using namespace std::chrono_literals;
auto startTime = Clock::now();
std::this_thread::sleep_for(20ms);
auto finishTime = Clock::now();
compareTimes(startTime, finishTime);
return 0;
}
Demo: https://godbolt.org/z/hqv58acoY
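Applied to the scoring function from the question, a minimal sketch (a fragment for Game::score reusing its start and end parameters, not a standalone program) would compute the count once, store it in a variable, and reuse it in the conditions; note that the original code also uses == where = is intended:
auto elapsedSecs = std::chrono::duration_cast<std::chrono::seconds>(end - start).count();
gotoxy(25,15); cout << "Time Taken: " << elapsedSecs << " seconds";
int bonus = 0;
if (elapsedSecs <= 60) {
    bonus = 5000;          // assignment, not '==' as in the original
} else if (elapsedSecs <= 90) {
    bonus = 3000;
} else if (elapsedSecs <= 120) {
    bonus = 1000;
}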
I'm encountering unexpected performance with my OpenCL code (more precisely, I use boost::compute 1.67.0). For now, I just want to add each element of two buffers: c[i] = a[i] + b[i].
I noticed some slowdown compared with an existing SIMD implementation, so I isolated each step to highlight which one is time-consuming. Here is my code sample:
Chrono chrono2;
chrono2.start();
Chrono chrono;
ipReal64 elapsed;
// creating the OpenCL context and other stuff
// ...
std::string kernel_src = BOOST_COMPUTE_STRINGIZE_SOURCE(
__kernel void add_knl(__global const uchar* in1, __global const uchar* in2, __global uchar* out)
{
size_t idx = get_global_id(0);
out[idx] = in1[idx] + in2[idx];
}
);
boost::compute::program* program = new boost::compute::program;
try {
chrono.start();
*program = boost::compute::program::create_with_source(kernel_src, context);
elapsed = chrono.elapsed();
std::cout << "Create program : " << elapsed << "s" << std::endl;
chrono.start();
program->build();
elapsed = chrono.elapsed();
std::cout << "Build program : " << elapsed << "s" << std::endl;
}
catch (boost::compute::opencl_error& e) {
std::cout << "Error building program : " << std::endl << program->build_log() << std::endl << e.what() << std::endl;
return;
}
boost::compute::kernel* kernel = new boost::compute::kernel;
try {
chrono.start();
*kernel = program->create_kernel("add_knl");
elapsed = chrono.elapsed();
std::cout << "Create kernel : " << elapsed << "s" << std::endl;
}
catch (const boost::compute::opencl_error& e) {
std::cout << "Error creating kernel : " << std::endl << e.what() << std::endl;
return;
}
try {
chrono.start();
// Pass the argument to the kernel
kernel->set_arg(0, bufIn1);
kernel->set_arg(1, bufIn2);
kernel->set_arg(2, bufOut);
elapsed = chrono.elapsed();
std::cout << "Set args : " << elapsed << "s" << std::endl;
}
catch (const boost::compute::opencl_error& e) {
std::cout << "Error setting kernel arguments: " << std::endl << e.what() << std::endl;
return;
}
try {
chrono.start();
queue.enqueue_1d_range_kernel(*kernel, 0, sizeX*sizeY, 0);
elapsed = chrono.elapsed();
std::cout << "Kernel calculation : " << elapsed << "s" << std::endl;
}
catch (const boost::compute::opencl_error& e) {
std::cout << "Error executing kernel : " << std::endl << e.what() << std::endl;
return;
}
std::cout << "[Function] Full duration " << chrono2.elapsed() << std::endl;
chrono.start();
delete program;
elapsed = chrono.elapsed();
std::cout << "Delete program : " << elapsed << "s" << std::endl;
delete kernel;
elapsed = chrono.elapsed();
std::cout << "Delete kernel : " << elapsed << "s" << std::endl;
And here is a sample result (I ran my program on an NVIDIA GeForce GT 630, with the NVIDIA SDK Toolkit):
Create program : 0.0013123s
Build program : 0.0015421s
Create kernel : 6.6e-06s
Set args : 1.7e-06s
Kernel calculation : 0.0001639s
[Function] Full duration : 0.0077794
Delete program : 4.1e-06s
Delete kernel : 0.0879901s
I know my program is simple and I don't expect the kernel execution to be the most time-consuming step. However, I thought the kernel deletion would take only a few ms, like creating or building the program.
Is this a normal behaviour?
Thanks
I'll point out that I've never used boost::compute, but it looks like it's a fairly thin wrapper over OpenCL, so the following should be correct:
Enqueueing the kernel does not wait for it to complete. The enqueue function returns an event, which you can then wait for, or you can wait for all tasks enqueued onto the queue to complete. You are timing neither of those things. What is likely happening is that when you destroy your kernel, it waits for all queued instances which are still pending to complete before returning from the destructor.
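As a rough way to check this with boost::compute — a fragment reusing the queue, kernel, sizeX and sizeY objects from the question, not a standalone program — you could wait on the event returned by the enqueue call (or call queue.finish()) before stopping the timer:
chrono.start();
// enqueue_1d_range_kernel only submits the kernel; it returns an event handle
boost::compute::event evt = queue.enqueue_1d_range_kernel(*kernel, 0, sizeX * sizeY, 0);
evt.wait();               // block until this kernel instance has completed
// or: queue.finish();    // block until everything enqueued on the queue is done
elapsed = chrono.elapsed();
std::cout << "Kernel calculation (to completion) : " << elapsed << "s" << std::endl;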
+++ See update below +++
This is code for printing the contents of an array in reverse. I used three slightly different methods: indexing directly with the array dimension in the for loop, using an iterator, and using a reverse_iterator, and I measured the execution time of each printing loop.
#include <iostream>
#include <vector>
#include <chrono>
using get_time = std::chrono::high_resolution_clock;
int main() {
std::cout << "Enter the array dimension:";
int N;
std::cin >> N;
//Read the array elements
std::cout << "Enter the array elements:" <<'\n';
std::vector <int> v;
int input;
for(size_t i=0; i<N; i++){
std::cin >> input;
v.push_back(input);
}
auto start = get_time::now();
for(int i=N-1; i>=0; i--){
std::cout << v[i] <<" ";
}
auto finish = get_time::now();
auto time_diff=finish-start;
std::cout << "Elapsed time,non-iterator= " << std::chrono::duration<double>
(time_diff).count() << " Seconds" << '\n';
auto start2 = get_time::now();
std::vector <int>::reverse_iterator ri;
for(ri=v.rbegin(); ri!=v.rend(); ri++){
std::cout << *ri <<" ";
}
auto finish2 = get_time::now();
auto time_diff2=finish2-start2;
std::cout << "Elapsed time, reverse iterator= " << std::chrono::duration<double>
(time_diff2).count() << " Seconds" << '\n';
auto start3 = get_time::now();
std::vector <int>::iterator i;
for(i=v.end()-1; i>=v.begin(); i--){
std::cout << *i <<" ";
}
auto finish3 = get_time::now();
auto time_diff3=finish3-start3;
std::cout << "Elapsed time, iterator= " << std::chrono::duration<double>
(time_diff3).count() << " Seconds" << '\n';
return 0;
}
The output is as follows:
Output:
5 4 3 2 1 Elapsed time,non-iterator= 2.7913e-05 Seconds
5 4 3 2 1 Elapsed time, reverse iterator= 5.57e-06 Seconds
5 4 3 2 1 Elapsed time, iterator= 4.56e-06 Seconds
My question is:
Why is the direct method almost 5 times slower than both the iterator and reverse_iterator methods? Also, is the faster execution of the iterator versions machine dependent?
This is a prototype, but I will need to deal with much bigger matrices; that is why I am asking this question. Thank you.
+++ Update +++
I am posting the updated results after incorporating the comments. It was too big for a comment.
I changed the for loops to evaluate the sum of an array with 100000 elements. I evaluated the same sum using the three methods above (compiled with -O3 in clang++) and averaged the execution time of the three methods over 10000 runs. Here are the results:
Average (10000 runs) elapsed time, non-iterator= 2.50183e-05
Average (10000 runs) elapsed time, reverse-iterator= 3.48299e-05
Average (10000 runs) elapsed time, iterator= 7.35307e-05
The results are much more uniform now, and the non-iterator method is now the fastest! Any insights? Or is even this result meaningless, and should I do more tests?
The updated code:
#include <iostream>
#include <vector>
#include <chrono>
using get_time = std::chrono::high_resolution_clock;
int main() {
double time1,time2,time3;
int run=10000;
for(int k=0; k<run; k++){
//Read the array elements
std::vector <int> v;
int input,N=100000;
for(size_t i=0; i<N; i++){
v.push_back(i);
}
int sum1{0},sum2{0},sum3{0};
auto start = get_time::now();
for(int i=N-1; i>=0; i--){
sum1+=v[i];
}
auto finish = get_time::now();
auto time_diff=finish-start;
std::cout << "Sum= " << sum1 << " " << "Elapsed time,non-iterator= " << std::chrono::duration<double>
(time_diff).count() << " Seconds" << '\n';
auto start2 = get_time::now();
std::vector <int>::reverse_iterator ri;
for(ri=v.rbegin(); ri!=v.rend(); ri++){
sum2+=*ri;
}
auto finish2 = get_time::now();
auto time_diff2=finish2-start2;
std::cout << "Sum= " << sum2 <<" Elapsed time, reverse iterator= " << std::chrono::duration<double>
(time_diff2).count() << " Seconds" << '\n';
auto start3 = get_time::now();
std::vector <int>::iterator i;
for(i=v.end()-1; i>=v.begin(); i--){
sum3+=*i;
}
auto finish3 = get_time::now();
auto time_diff3=finish3-start3;
std::cout << "Sum= " <<sum3 << " Elapsed time, iterator= " << std::chrono::duration<double>
(time_diff3).count() << " Seconds" << '\n';
time1+=std::chrono::duration<double>(time_diff).count();
time2+=std::chrono::duration<double>(time_diff2).count();
time3+=std::chrono::duration<double>(time_diff3).count();
}
std::cout << "Average (" << run << " runs)" << " elapsed time, non-iterator= " << time1/double(run) <<'\n';
std::cout << "Average (" << run << " runs)" << " elapsed time, reverse-iterator= " << time2/double(run) <<'\n';
std::cout << "Average (" << run << " runs)" << " elapsed time, iterator= " << time3/double(run) <<'\n';
return 0;
}
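As a side note on methodology (a sketch only, not a definitive benchmarking recipe): with -O3 the compiler may elide a sum it can prove unused, and high_resolution_clock is not required to be steady, so a more defensive version of the averaging loop could use steady_clock and a volatile sink, for example:
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>
int main() {
    const int N = 100000;
    const int runs = 10000;
    std::vector<int> v(N);
    std::iota(v.begin(), v.end(), 0);          // same data as the updated test
    volatile long long sink = 0;               // keeps the sums observable so -O3 cannot drop the loops
    std::chrono::duration<double> total{0};
    for (int k = 0; k < runs; ++k) {
        auto start = std::chrono::steady_clock::now();   // steady_clock is guaranteed monotonic
        long long sum = 0;
        for (int i = N - 1; i >= 0; --i) sum += v[i];
        auto finish = std::chrono::steady_clock::now();
        sink = sink + sum;
        total += finish - start;
    }
    std::cout << "Average (" << runs << " runs) elapsed time = "
              << total.count() / runs << " seconds\n";
    return 0;
}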
Recently, I have been trying to modify the classification.cpp provided on the official caffe website so that I can use my trained models with C++. Although I compile it successfully, an unhandled exception occurs. How can I solve this?
Here is the part of the code that I have modified (the definitions of the variables and functions are not changed):
int WinMain(int argc, char** argv) {
clock_t start_time1, end_time1, start_time2, end_time2;
string model_file = "D:\\mySoftware\\caffe-windows\\examples\\mnist\\lenet.prototxt";
string trained_file = "D:\\mySoftware\\caffe-windows\\examples\\mnist\\model1\\lenet_train_test.prototxt_iter_10000.caffemodel";
string mean_file = "D:\\mySoftware\\caffe-windows\\examples\\mnist\mean_file\\mean.binaryproto";
string label_file = "D:\\mySoftware\\caffe-windows\\examples\\mnist\\label.txt";
string img_file = "D:\\mySoftware\\caffe-windows\\examples\\mnist\\data1\\my_10_images\\0.png";
cv::Mat img = cv::imread(img_file);
CHECK(!img.empty()) << "Unable to decode image " << img;
start_time1 = clock();
Classifier classifier(model_file, trained_file, mean_file, label_file);
end_time1 = clock();
double seconds1 = (double)(end_time1 - start_time1) / CLOCKS_PER_SEC;
std::cout << "init time=" << seconds1 << "s" << std::endl;
start_time2 = clock();
std::vector<Prediction> predictions = classifier.Classify(img);
end_time2 = clock();
double seconds2 = (double)(end_time2 - start_time2) / CLOCKS_PER_SEC;
std::cout << "classify time=" << seconds2 << "s" << std::endl;
/* Print the top N predictions. */
for (size_t i = 0; i < predictions.size(); ++i) {
Prediction p = predictions[i];
std::cout << std::fixed << std::setprecision(4) << p.second << " - \""
<< p.first << "\"" << std::endl;
}
}
This week I found out about boost::object_pool and was amazed that it was about 20-30% quicker than plain new and delete.
For testing I wrote a small C++ app that uses boost::chrono to time the different heap allocators/deallocators (shared_ptr, etc.). Each function does a simple loop of 60M iterations with a new and a delete. Below is the code:
#include <iostream>
#include <memory>
using std::shared_ptr;
#include <boost/smart_ptr.hpp>
#include <boost/chrono.hpp>
#include <boost/chrono/chrono_io.hpp>
#include <boost/pool/object_pool.hpp>
#include <SSVUtils/SSVUtils.h>
#include "TestClass.h"
const long lTestRecursion = 60000000L;
void WithSmartPtrs()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
for (long i=0; i < lTestRecursion; ++i)
{
boost::shared_ptr<TestClass> spTC = boost::make_shared<TestClass>("Test input data!");
}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
void WithSTDSmartPtrs()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
for (long i=0; i < lTestRecursion; ++i)
{
std::shared_ptr<TestClass> spTC = std::make_shared<TestClass>("Test input data!");
}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
template<typename T> struct Deleter {
void operator()(T *p)
{
delete p;
}
};
void WithSmartPtrsUnique()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
for (long i=0; i < lTestRecursion; ++i)
{
boost::unique_ptr<TestClass, Deleter<TestClass> > spTC = boost::unique_ptr<TestClass, Deleter<TestClass> >(new TestClass("Test input data!"));
}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
void WithSmartPtrsNoMakeShared()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
for (long i=0; i < lTestRecursion; ++i)
{
boost::shared_ptr<TestClass> spTC = boost::shared_ptr<TestClass>( new TestClass("Test input data!"));
}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
void WithoutSmartPtrs()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
for (long i=0; i < lTestRecursion; ++i)
{
TestClass* pTC = new TestClass("Test input data!");
delete pTC;
}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
void WithObjectPool()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
{
boost::object_pool<TestClass> pool;
for (long i=0; i < lTestRecursion; ++i)
{
TestClass* pTC = pool.construct("Test input data!");
pool.destroy(pTC);
}
}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
void WithObjectPoolNoDestroy()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
//{
boost::object_pool<TestClass> pool;
for (long i=0; i < lTestRecursion; ++i)
{
TestClass* pTC = pool.construct("Test input data!");
//pool.destroy(pTC);
}
//}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
void WithSSVUtilsPreAllocDyn()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
{
ssvu::PreAlloc::PreAllocDyn preAllocatorDyn(1024*1024);
for (long i=0; i < lTestRecursion; ++i)
{
TestClass* pTC = preAllocatorDyn.create<TestClass>("Test input data!");
preAllocatorDyn.destroy(pTC);
}
}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
void WithSSVUtilsPreAllocStatic()
{
boost::chrono::system_clock::time_point startTime = boost::chrono::system_clock::now();
std::cout << "Start time: " << startTime << std::endl;
{
ssvu::PreAlloc::PreAllocStatic<TestClass> preAllocatorStat(10);
for (long i=0; i < lTestRecursion; ++i)
{
TestClass* pTC = preAllocatorStat.create<TestClass>("Test input data!");
preAllocatorStat.destroy(pTC);
}
}
boost::chrono::system_clock::time_point endTime = boost::chrono::system_clock::now();
std::cout << "End time: " << endTime << std::endl;
boost::chrono::duration<double> d = endTime - startTime;
std::cout << "Duration: " << d << std::endl;
}
int main()
{
std::cout << " With OUT smartptrs (new and delete): " << std::endl;
WithoutSmartPtrs();
std::cout << std::endl << " With smartptrs (boost::shared_ptr withOUT make_shared): " << std::endl;
WithSmartPtrsNoMakeShared();
std::cout << std::endl << " With smartptrs (boost::shared_ptr with make_shared): " << std::endl;
WithSmartPtrs();
std::cout << std::endl << " With STD smart_ptr (std::shared_ptr with make_shared): " << std::endl;
WithSTDSmartPtrs();
std::cout << std::endl << " With Object Pool (boost::object_pool<>): " << std::endl;
WithObjectPool();
std::cout << std::endl << " With Object Pool (boost::object_pool<>) but without destroy called!: " << std::endl;
WithObjectPoolNoDestroy();
std::cout << std::endl << " With SSVUtils PreAllocDyn(1024*1024)!: " << std::endl;
WithSSVUtilsPreAllocDyn();
std::cout << std::endl << " With SSVUtils PreAllocStatic(10)!: " << std::endl;
WithSSVUtilsPreAllocStatic();
return 0;
}
Results:
On Ubuntu LTS 12.04 x64 with GNU C++ 4.6 and boost 1.49. Each run shows the time in seconds, the difference against the new/delete baseline in seconds, and the percentage of the baseline; the last two columns are the averaged time and percentage over the three runs.
Method                                          Run 1                 Run 2                 Run 3                 Average
No smart ptrs (new/delete)                      5.08024          100  5.1387           100  5.1108           100  5.1099  100
boost::shared_ptr without boost::make_shared    7.36128  2.2810  145  7.34522  2.2065  143  7.28801  2.1772  143  7.3315  143
boost::shared_ptr with boost::make_shared       6.60351  1.5233  130  6.82849  1.6898  133  6.61059  1.4998  129  6.6809  131
std::shared_ptr with std::make_shared           6.07756  0.9973  120  5.93100  0.7923  115  5.9037   0.7929  116  5.9708  117
boost::unique_ptr                               4.97147 -0.1088  100  5.0428  -0.0959   98  4.96625 -0.1445   97  4.9935   98
boost::object_pool                              3.53291 -1.5473   70  3.60357 -1.5351   70  3.52986 -1.5809   69  3.5554   70
boost::object_pool (without calling destroy)    4.52430 -0.5559   89  4.51602 -0.6227   88  4.52137 -0.5894   88  4.5206   88
Results including SSVUtils PreAllocDyn on my MacBook Pro:
Compiled with:
g++-mp-4.8 -I$BOOSTHOME/include -I$SSVUTILSHOME/include -std=c++11 -O2 -L$BOOSTHOME/lib -lboost_system -lboost_chrono -o smartptrtest smartptr.cpp
With OUT smartptrs (new and delete):
Start time: 1381596718412786000 nanoseconds since Jan 1, 1970
End time: 1381596731642044000 nanoseconds since Jan 1, 1970
Duration: 13.2293 seconds
With smartptrs (boost::shared_ptr withOUT make_shared):
Start time: 1381596731642108000 nanoseconds since Jan 1, 1970
End time: 1381596753651561000 nanoseconds since Jan 1, 1970
Duration: 22.0095 seconds
With smartptrs (boost::shared_ptr with make_shared):
Start time: 1381596753651611000 nanoseconds since Jan 1, 1970
End time: 1381596768909452000 nanoseconds since Jan 1, 1970
Duration: 15.2578 seconds
With STD smart_ptr (std::shared_ptr with make_shared):
Start time: 1381596768909496000 nanoseconds since Jan 1, 1970
End time: 1381596785500599000 nanoseconds since Jan 1, 1970
Duration: 16.5911 seconds
With Object Pool (boost::object_pool<>):
Start time: 1381596785500638000 nanoseconds since Jan 1, 1970
End time: 1381596793484515000 nanoseconds since Jan 1, 1970
Duration: 7.98388 seconds
With Object Pool (boost::object_pool<>) but without destroy called!:
Start time: 1381596793484551000 nanoseconds since Jan 1, 1970
End time: 1381596805774318000 nanoseconds since Jan 1, 1970
Duration: 12.2898 seconds
With SSVUtils PreAllocDyn(1024*1024)!:
Start time: 1381596815742696000 nanoseconds since Jan 1, 1970
End time: 1381596824173405000 nanoseconds since Jan 1, 1970
Duration: 8.43071 seconds
With SSVUtils PreAllocStatic(10)!:
Start time: 1381596824173448000 nanoseconds since Jan 1, 1970
End time: 1381596832034965000 nanoseconds since Jan 1, 1970
Duration: 7.86152 seconds
My question:
Are there, besides shared_ptr/unique_ptr/boost::object_pool, other heap allocation mechanisms that can be used for fast allocation/deallocation of a large set of objects?
NOTE: I also have more results on other machines and operating systems.
EDIT 1: Added SSVUtils PreAllocDyn Results
EDIT 4: Added my compiler commandline options and retested with SSVUtils PreAllocStatic(10)
Thanks
When I needed a fast new/delete mechanism, I wrote it myself. Of course, I had to compromise on the requirements of general dynamic memory allocation; this restriction gave me the ability to code exactly what I needed.
In a nutshell:
No support for arrays is needed.
Pre-allocation is a must (like any heap, though).
The idea is very simple:
Pre-allocate an array of the objects that need fast allocation/deallocation, e.g. MyType preMyType[1000].
Push the addresses of the pre-allocated objects onto a stack.
On new, pop an address.
On delete, push the returned address back onto the stack.
I packed everything into a nice, simple-to-use framework that requires little from its users.
It ends up as deriving from a base class and declaring the initial size.
I can elaborate, including a code sample, if you wish.
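For illustration, a minimal sketch of that idea (my own illustrative version with invented names such as FixedPool, not the actual framework) could look like this:
#include <cstddef>
#include <iostream>
#include <new>
#include <string>
#include <utility>
#include <vector>
// Illustrative fixed-capacity pool: storage is pre-allocated once, the free
// slot addresses live on a stack, "new" pops an address, "delete" pushes it back.
template <typename T, std::size_t Capacity>
class FixedPool {
public:
    FixedPool() {
        free_.reserve(Capacity);
        for (std::size_t i = 0; i < Capacity; ++i)
            free_.push_back(storage_ + i * sizeof(T));
    }
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_.empty()) return nullptr;                 // pool exhausted
        void* slot = free_.back();
        free_.pop_back();
        return new (slot) T(std::forward<Args>(args)...);  // placement new into the slot
    }
    void destroy(T* p) {
        p->~T();                                           // explicit destructor call
        free_.push_back(p);                                // address goes back on the stack
    }
private:
    alignas(T) unsigned char storage_[Capacity * sizeof(T)];  // raw, uninitialized storage
    std::vector<void*> free_;                                 // stack of free addresses
};
int main() {
    FixedPool<std::string, 1000> pool;
    std::string* s = pool.create("Test input data!");
    std::cout << *s << "\n";
    pool.destroy(s);
    return 0;
}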
There's a wacky idea I once had: replacing the array of available slots with a single integer. Check it out here:
https://code.google.com/p/cpppractice/source/browse/trunk/staticdelegate.hpp