Handle std::thread::hardware_concurrency() - c++

In my question about std::thread, I was advised to use std::thread::hardware_concurrency(). I read somewhere (I can no longer find the source; it seemed to be a local code repository or something similar) that this feature is not implemented for versions of g++ prior to 4.8.
As a matter of fact, I was in the same position as this user: the function simply returns 0. I found a user implementation in this answer. Comments on whether that answer is good or not are welcome!
So I would like to do this in my code:
unsigned int cores_n;
#if g++ version < 4.8
cores_n = my_hardware_concurrency();
#else
cores_n = std::thread::hardware_concurrency();
#endif
However, I could not find a way to achieve this. What should I do?

There is an alternative to using the GCC Common Predefined Macros: check whether std::thread::hardware_concurrency() returns zero, which means the feature is not (yet) implemented.
unsigned int hardware_concurrency()
{
unsigned int cores = std::thread::hardware_concurrency();
return cores ? cores : my_hardware_concurrency();
}
You may take inspiration from awgn's source code (GPL v2 licensed) to implement my_hardware_concurrency():
auto my_hardware_concurrency()
{
    // Count the "processor" entries in /proc/cpuinfo (Linux-specific).
    std::ifstream cpuinfo("/proc/cpuinfo");
    return std::count(std::istream_iterator<std::string>(cpuinfo),
                      std::istream_iterator<std::string>(),
                      std::string("processor"));
}

Based on the Common Predefined Macros link kindly provided by Joachim, I did:
int p;
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8) // GCC 4.8 or newer
const int P = std::thread::hardware_concurrency();
#else
const int P = my_hardware_concurrency();
#endif
p = (trees_no < P) ? trees_no : P;
std::cout << P << " concurrent threads are supported.\n";

Related

Is cvCeil() faster than the standard library?

I see that OpenCV implements the cvCeil function:
CV_INLINE int cvCeil( double value )
{
#if defined _MSC_VER && defined _M_X64 || (defined __GNUC__ && defined __SSE2__ && !defined __APPLE__)
__m128d t = _mm_set_sd( value );
int i = _mm_cvtsd_si32(t);
return i + _mm_movemask_pd(_mm_cmplt_sd(_mm_cvtsi32_sd(t,i), t));
#elif defined __GNUC__
int i = (int)value;
return i + (i < value);
#else
int i = cvRound(value);
float diff = (float)(i - value);
return i + (diff < 0);
#endif
}
I'm curious about the first part of this implementation, i.e. the _mm_set_sd-related calls. Will they be faster than MSVCRT / libstdc++ / libc++? And why?
The simple benchmark below tells me that std::ceil works more than 3 times faster on my SSE4-enabled machine, but about 2 times slower when SSE4 is not enabled.
#include <cmath>
#include <chrono>
#include <sstream>
#include <iostream>
#include <opencv2/core/fast_math.hpp>
auto currentTime() { return std::chrono::steady_clock::now(); }
template<typename T, typename P>
std::string toString(std::chrono::duration<T,P> dt)
{
std::ostringstream str;
using namespace std::chrono;
str << duration_cast<microseconds>(dt).count()*1e-3 << " ms";
return str.str();
}
int main()
{
volatile double x=34.234;
volatile double y;
constexpr auto MAX_ITER=100'000'000;
const auto t0=currentTime();
for(int i=0;i<MAX_ITER;++i)
y=std::ceil(x);
const auto t1=currentTime();
for(int i=0;i<MAX_ITER;++i)
y=cvCeil(x);
const auto t2=currentTime();
std::cout << "std::ceil: " << toString(t1-t0) << "\n"
"cvCeil : " << toString(t2-t1) << "\n";
}
I tested with the -O3 option on GCC 8.3.0, glibc 2.27, Ubuntu 18.04.1 x86_64, on an Intel Core i7-3930K at 3.2 GHz.
Output when compiled with -msse4:
std::ceil: 39.357 ms
cvCeil : 143.224 ms
Output when compiled without -msse4:
std::ceil: 274.945 ms
cvCeil : 146.218 ms
It's easy to understand: SSE4.1 introduces the ROUNDSD instruction, which is essentially what std::ceil does (ROUNDSD with the round-toward-positive-infinity mode). Without it the compiler has to resort to comparison/conditional-move tricks, and it also has to make sure these don't overflow. Thus the cvCeil version, sacrificing well-definedness for value > INT_MAX and for value < INT_MIN, gets a speedup for the values for which it is well-defined. For the others it has undefined behavior (or, with the intrinsics, simply gives wrong results).
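For reference, a ceil built directly on the SSE4.1 ROUNDSD instruction could look like the sketch below (assuming compilation with -msse4.1); with that flag, std::ceil itself typically compiles to the same instruction:
#include <smmintrin.h> // SSE4.1 intrinsics

static inline int ceil_sse41(double value)
{
    __m128d t = _mm_set_sd(value);
    t = _mm_ceil_sd(t, t);      // ROUNDSD, rounding toward +infinity
    // The value is already integral, so truncation is exact.
    // Same caveat as cvCeil: only meaningful for values representable as int.
    return _mm_cvttsd_si32(t);
}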

Enable Boost.Log only on debug

I need a logger for debugging purposes and I'm using Boost.Log (1.54.0, with a patch from the boost.org homepage).
It all works fine; I've created a macro like this:
#define LOG_MESSAGE( lvl ) BOOST_LOG_TRIVIAL( lvl )
Now, is there a way for LOG_MESSAGE( lvl ) to expand to BOOST_LOG_TRIVIAL( lvl ) only in debug mode and be ignored in release?
For example:
LOG_MESSAGE( critical ) << "If I read this message we're in debug mode"
Edit:
My first attempt is to create a null stream... I think that in release mode the compiler will optimize it away...
#if !defined( NDEBUG )
#include <boost/log/trivial.hpp>
#define LOG_MESSAGE( lvl ) BOOST_LOG_TRIVIAL( lvl )
#else
#if defined( __GNUC__ )
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-value"
#endif
#include <ostream> // the full definition is needed to derive from std::ostream
struct nullstream : public std::ostream {
    nullstream() : std::ios(0), std::ostream(0) {}
};
static nullstream g_nullstream;
#define LOG_MESSAGE( lvl ) g_nullstream
#if defined( __GNUC__ )
#pragma GCC diagnostic pop
#endif
#endif
The severity level of the log entry merely acts as a filter for sinks. The sink will decide what to do with the message (print it or not) based on the severity level, but the message will still be sent.
If you are trying not to send the message at all, then you'll need to redefine LOG_MESSAGE to something which actually does nothing. There might be something in the Boost library for this; otherwise, you'll have to write your own. Perhaps this will be a start:
class NullLogger
{
public:
    template <typename SeverityT> NullLogger (SeverityT) {}
    template <typename Val> NullLogger& operator<< (const Val&) { return *this; }
};
...and then:
#define LOG_MESSAGE(lvl) NullLogger(lvl)
Note however that even though nothing is being done with the log message or the expressions that make it up, the expressions are still evaluated. If some of these expressions are expensive, you will still take the performance hit. For example:
LOG_MESSAGE (debug) << SomeSuperExpensiveFunction();
Even if you are using the NullLogger above, SomeSuperExpensiveFunction() is still going to be called.
I would suggest as an alternative adding a flag that is evaluated at runtime, and decide at runtime whether or not to do the logging:
if (mLogStuff)
{
LOG_MESSAGE (debug) << SomeSuperExpensiveFunction();
}
Boolean comparisons are super cheap, and you may find one day that the ability to turn logging on and off at runtime comes in very handy. Also, doing this means you don't need to add yet another #define, which is always a good thing.
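As a sketch of how such a runtime flag could be folded into the macro itself (reusing the mLogStuff flag from above and assuming it is visible wherever the macro is used), the if/else form keeps the expensive right-hand side unevaluated when logging is off and avoids dangling-else surprises:
#define LOG_MESSAGE( lvl ) if (!mLogStuff) {} else BOOST_LOG_TRIVIAL( lvl )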
I like John's NullLogger class. The only change I would make is as follows
#define LOG_MESSAGE(lvl) while (0) NullLogger (lvl)
Unfortunately this may generate warnings, but I would hope a decent compiler would then be able to eliminate all the associated logging code.
It is possible to achieve this without defining a NullLogger or similar:
#define TEST_LOG(lvl) \
if constexpr(boost::log::trivial::lvl >= boost::log::trivial::MAX_LOG_LEVEL) \
BOOST_LOG_TRIVIAL(lvl)
Then compile with -DMAX_LOG_LEVEL=info to statically deactivate all log messages below info. (Note that if constexpr requires C++17.)
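If you want the build to keep working when the flag is not passed, a default can be provided; a sketch (MAX_LOG_LEVEL is just the name used above, not something defined by Boost):
#ifndef MAX_LOG_LEVEL
#define MAX_LOG_LEVEL trace // keep everything unless the build says otherwise
#endif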
Also note that with a properly implemented macro (like TEST_LOG, but also like BOOST_LOG_TRIVIAL) expensive functions are not evaluated when the message is filtered out:
// We either log with trace or warning severity, so this filter
// does not let any message pass
logging::core::get()->set_filter(
logging::trivial::severity >= logging::trivial::error);
// Filtered at compile time
{
auto start = std::chrono::steady_clock::now();
for (size_t i = 0; i < 1000 * 1000; i++) {
TEST_LOG(trace) << "Hello world!";
}
auto end = std::chrono::steady_clock::now();
std::cerr << std::chrono::duration<double>(end-start).count() << "s" << std::endl;
// Prints: 1.64e-07s
}
// Filtered at compile time
{
auto start = std::chrono::steady_clock::now();
for (size_t i = 0; i < 1000 * 1000; i++) {
TEST_LOG(trace) << ComputeExpensiveMessage();
}
auto end = std::chrono::steady_clock::now();
std::cerr << std::chrono::duration<double>(end-start).count() << "s" << std::endl;
// Prints: 8.5e-08s
}
// Filtered at run time
{
auto start = std::chrono::steady_clock::now();
for (size_t i = 0; i < 1000 * 1000; i++) {
TEST_LOG(warning) << "Hello world!";
}
auto end = std::chrono::steady_clock::now();
std::cerr << std::chrono::duration<double>(end-start).count() << "s" << std::endl;
// Prints: 0.249306s
}
// Filtered at run time
{
auto start = std::chrono::steady_clock::now();
for (size_t i = 0; i < 1000 * 1000; i++) {
TEST_LOG(warning) << ComputeExpensiveMessage();
}
auto end = std::chrono::steady_clock::now();
std::cerr << std::chrono::duration<double>(end-start).count() << "s" << std::endl;
// Prints: 0.250101s
}
John's NullLogger class doesn't compile correctly on MSVC, and it still requires a Boost dependency for SeverityT, which is actually not needed.
I propose the following change to the class:
class NullLogger
{
public:
template <typename Val> NullLogger& operator<< (const Val&) { return *this; };
};
#define BOOST_LOG_TRIVIAL(lvl) NullLogger()

Assigning value to passed references using variable argument list (error in VS2010)

The following code compiles and runs in Code::Blocks, but raises the following error in VS2010:
"Undhandled exception at 0x770815de in test2.exe: 0xC0000005: Access violation writing to location 0x00000002."
I realise the code is sort of dangerous; it's basically prototyping an idea I have for another project. What I want to be able to do is pass references to any given number of ints, each followed by a value, then put each value into the referenced int and Bob's your uncle. And it works, which is nice. But not in VS2010, which bothers me. I'm not the most experienced with pointers, so I don't know if I'm doing something wrong or if this kind of operation is just not something that VS2010 is fond of. Which is a problem, because the project I'm testing this for is all in VS2010, so I need it to work there!
EDIT: I'm sorry, I'm new to the Code::Blocks thing. I guess I should specify which compiler I use in Code::Blocks? :D I use the MinGW implementation of the GNU GCC compiler (or something like that). I hope it makes sense to you experienced Code::Blocks users!
#include <iostream>
#include <stdarg.h>
using namespace std;
void getMonkey(int Count, ... )
{
int test;
va_list Monkeys;
va_start(Monkeys, Count );
for(int i = 0; i < (Count / 2); i++ )
{
*va_arg(Monkeys, int*) = va_arg(Monkeys, int);
}
va_end(Monkeys);
}
int main()
{
int monkey1 = 0;
int monkey2 = 0;
int monkey3 = 0;
getMonkey(6, &monkey1, 2, &monkey2, 4, &monkey3, 5);
cout << monkey1 << " " << monkey2 << " " << monkey3;
return 0;
}
Turns out the operands of the assignment are NOT evaluated in the order I assumed; the two va_arg calls are unsequenced relative to each other! TY stackoverflow!
Updated getMonkey method:
void getMonkey(int Count, ... )
{
int test;
va_list Monkeys;
va_start(Monkeys, Count );
for(int i = 0; i < (Count / 2); i++ )
{
int* tempMonkeyPtr = va_arg(Monkeys, int*); //herp
*tempMonkeyPtr = va_arg(Monkeys, int); //derp
}
va_end(Monkeys);
}
Yey! I'm getting the hang of this pointer business, methinks!
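For what it's worth, on a compiler with C++11 support (VS2013 or later, or a recent GCC/Clang; VS2010 does not have variadic templates), the same idea can be expressed type-safely without va_arg at all. A sketch, with setValues being a hypothetical helper rather than anything from the question:
#include <iostream>
#include <utility>

void setValues() {} // base case: no more (reference, value) pairs

template<typename... Rest>
void setValues(int& target, int value, Rest&&... rest)
{
    target = value;                         // store the value into the referenced int
    setValues(std::forward<Rest>(rest)...); // recurse on the remaining pairs
}

int main()
{
    int monkey1 = 0, monkey2 = 0, monkey3 = 0;
    setValues(monkey1, 2, monkey2, 4, monkey3, 5);
    std::cout << monkey1 << " " << monkey2 << " " << monkey3; // prints 2 4 5
}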

Limitation on Qt and boost thread local storage

I have following questions on QThreadStorage and boost's thread_specific_ptr:
1) Is there any limitation on the number of objects that can be stored in QThreadStorage? I came across a Qt query about 256 QThreadStorage objects, so I'd like to clarify what this limitation refers to.
2) Does QThreadStorage work only with QThreads?
3) Is there any limitation on Boost TLS?
4) I have a use case where I want to operate on TLS and sync the data to the main thread when all threads finish, for further processing. I wrote the code below and would like to check if it is okay.
#include <iostream>
#include <cstdio>   // sprintf, puts
#include <cstdlib>  // malloc
#include <boost/thread/thread.hpp>
#include <boost/thread/tss.hpp>
boost::mutex mutex1;
int glob = 0;
class data
{
public:
char* p;
data()
{
p = (char*)malloc(10);
sprintf(p, "test%d\n", ++glob);
}
};
char* global_p[11] = {0};
int index = -1;
void cleanup(data* _ignored) {
std::cout << "TLS cleanup" << std::endl;
boost::mutex::scoped_lock lock(mutex1);
global_p[++index] = _ignored->p;
}
boost::thread_specific_ptr<data> value(cleanup);
void thread_proc()
{
value.reset(new data()); // initialize the thread's storage
std::cout << "here" << std::endl;
}
int main(int argc, char* argv[])
{
boost::thread_group threads;
for (int i=0; i<10; ++i)
threads.create_thread(&thread_proc);
threads.join_all();
for (int i=0; i<10; ++i)
puts(global_p[i]);
}
I can partially answer your question.
The 256 limit belongs to old Qt. You are probably reading old documentation; newer Qt versions (i.e. above 4.6) do not have such a limit.
QThreadStorage can destroy contained items at thread exit because it works closely with QThread, so separating the two is not a wise idea in my opinion.
Here I think you are asking about the number of objects that can be stored with Boost TLS. I am not aware of any limitation on Boost TLS; you should be fine.
Your code looks OK to me, except that in the constructor of data you need to take a mutex lock around ++glob, otherwise you may not get correctly incrementing values.
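A sketch of that fix, reusing the mutex1 already declared in your code:
data()
{
    p = (char*)malloc(10);
    int id;
    {
        boost::mutex::scoped_lock lock(mutex1); // serialize access to glob
        id = ++glob;
    }
    sprintf(p, "test%d\n", id);
}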
I hope this helps.

How to visualize bytes with C/C++

I'm working my way through some C++ training. So far so good, but I need some help reinforcing some of the concepts I am learning. My question is how do I go about visualizing the byte patterns for objects I create. For example, how would I print out the byte pattern for structs, longs, ints etc?
I understand it in my head and can understand the diagrams in my study materials, I'd just like to be able to programmatically display byte patterns from within some of my study programs.
I realize this is pretty trivial but any answers would greatly help me hammer in these concepts.
Thanks.
Edit: I use mostly XCode for my other development projects, but have VMs for Windows7 and fedora core. At work I use XP with visual studio 2005.
(I can't comment as I am still a n00b here :D)
I used unwind's solution, which is about what I am looking for. I am also thinking that maybe I could just use the DOS DEBUG command, as I'd like to look at chunks of memory too. Again, this is just to help me reinforce what I am learning. Thanks again, people!
You can use a function such as this, to print the bytes:
static void print_bytes(const void *object, size_t size)
{
#ifdef __cplusplus
const unsigned char * const bytes = static_cast<const unsigned char *>(object);
#else // __cplusplus
const unsigned char * const bytes = object;
#endif // __cplusplus
size_t i;
printf("[ ");
for(i = 0; i < size; i++)
{
printf("%02x ", bytes[i]);
}
printf("]\n");
}
Usage would look like this, for instance:
int x = 37;
float y = 3.14;
print_bytes(&x, sizeof x);
print_bytes(&y, sizeof y);
This shows the bytes as raw numerical values, in hexadecimal, which is commonly used for "memory dumps" like these.
On a random (might even be virtual, for all I know) Linux machine running an "Intel(R) Xeon(R)" CPU, this prints:
[ 25 00 00 00 ]
[ c3 f5 48 40 ]
This also handily demonstrates that the Intel family of CPUs really is little-endian.
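The same helper doubles as a quick endianness check; for example (a sketch using the print_bytes function above):
#include <stdint.h>

uint32_t v = 0x01020304;
print_bytes(&v, sizeof v); /* prints [ 04 03 02 01 ] on a little-endian machine */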
If you are using gcc and X, you can use the DDD debugger to draw pretty pictures of your data structures for you.
Just for completeness, a C++ example:
#include <iostream>
template <typename T>
void print_bytes(const T& input, std::ostream& os = std::cout)
{
const unsigned char* p = reinterpret_cast<const unsigned char*>(&input);
os << std::hex << std::showbase;
os << "[";
for (unsigned int i=0; i<sizeof(T); ++i)
os << static_cast<int>(*(p++)) << " ";
os << "]" << std::endl;;
}
int main()
{
int i = 12345678;
print_bytes(i);
float x = 3.14f;
print_bytes(x);
}
Or if you have the Boost library and want to use lambda expressions, you can do it this way ...
#include <algorithm>
#include <iostream>
#include <typeinfo>
#include <boost/lambda/lambda.hpp>
#include <boost/lambda/casts.hpp>
using namespace boost::lambda;

template<class T>
void bytePattern( const T& object )
{
    typedef unsigned char byte_type;
    typedef const byte_type* iterator;
    std::cout << "Object type:" << typeid( T ).name() << std::hex;
    std::for_each(
        reinterpret_cast<iterator>(&object),
        reinterpret_cast<iterator>(&object) + sizeof(T),
        std::cout << constant(' ') << (ll_static_cast<int>(_1) & 0xFF) );
    std::cout << "\n";
}
Most (visual) debuggers have a "View Memory" option. IIRC the one in Xcode is pretty basic, just showing bytes in hex and ASCII, with a variable line length. Visual Studio (Debug->Windows->Memory in VS2008) can format the hex portion as different integer lengths or as floating point, change the endianness, and display ANSI or Unicode text. You can also set just about any number for the width of the window (I think Xcode only lets you go to 64 bytes wide). The other IDE I have here at work has a lot of options, though not quite as many as VS.
A little bit-by-bit console program I whipped up; hope it helps somebody:
#include <iostream>
#include <cstdio>    // printf
#include <inttypes.h>
#include <vector>
using namespace std;
typedef vector<uint8_t> ByteVector;
///////////////////////////////////////////////////////////////
uint8_t Flags[8] = { 0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80};
void print_bytes(ByteVector Bv){
for (unsigned i = 0; i < Bv.size(); i++){
printf("Byte %d [ ",i);
for (int j = 0;j < 8;++j){
Bv[i] & Flags[j] ? printf("1") : printf("0");
}
printf("]\n");
}
}
int main(){
ByteVector Bv;
for (int i = 0; i < 4; ++i) { Bv.push_back(i); }
print_bytes(Bv);
}
try this:
MyClass* myObj = new MyClass();
int size = sizeof(*myObj);
// unsigned char is the closest approximation to a raw byte
const unsigned char* ptr = reinterpret_cast<const unsigned char*>(myObj);
for( int i = 0; i < size; i++ )
    std::cout << std::hex << static_cast<int>(ptr[i]) << std::endl;
Cheers,
jrh.