My application runs on Mac OS X and needs to retrieve details about the machine it is running on for a system information report. One of the items I need is details about the processor(s) installed in the computer.
My code currently works but is far from ideal; in fact I consider it a bad solution, but I have had no luck finding a better one.
After some formatting, the information I currently report looks like:
Processor: Intel Core 2 Duo 2.1 GHz, Family 6 Model 23 Stepping 6
All of the info I get is through command-line utilities called from a popen(). The readable part of the processor description is taken from the "system_profiler" command output and the Family, Model, and Stepping values are taken from the "sysctl" command.
These command-line utilities must be getting their information from somewhere. I'm wondering if there is a programmatic interface available to get this same info?
Related:
How can display driver version be obtained on the Mac?
Use sysctlbyname rather than sysctl, e.g.
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>
uint64_t get_cpu_freq(void)
{
    uint64_t freq = 0;
    size_t size = sizeof(freq);
    if (sysctlbyname("hw.cpufrequency", &freq, &size, NULL, 0) < 0)
    {
        perror("sysctl");
    }
    return freq;
}
You can get a list of the names that can be passed to sysctlbyname by looking at the output of sysctl -a from the command line.
You need to look at the IOKit APIs. The IORegistryExplorer application (part of the standard devtools installation) will help you locate what you're looking for.
For instance, on my MacBook Pro, in IORegistryExplorer I select 'IODeviceTree' from the pull-down at the top-left of the window, and I can see two CPUs in the tree view below. Selecting either one gets me the following information:
IORegistryExplorer screenshot http://blog.alanquatermain.net/images/IORegistryExplorer-CPUs.png
'bus-frequency', 'clock-frequency', and 'timebase-frequency' are all 32-bit integers wrapped in data objects, and must therefore be byte-swapped to interpret here (little-endian i386 machine words). They work out to the following values:
bus-frequency: 1064000000 Hz => 1.064 GHz
clock-frequency: 2530000000 Hz => 2.53 GHz
timebase-frequency: 1000000000 Hz => 1.0 GHz
If you're reading these via IOKit, however, you'll get back a CFDataRef, and can just copy the bytes into your own uint32_t like so:
uint32_t bus_frequency = 0;
CFDataGetBytes( theData, CFRangeMake(0, sizeof(uint32_t)), (UInt8 *) &bus_frequency );
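To make the byte interpretation described above concrete, here is a minimal sketch that assembles a 32-bit value from four raw bytes stored in little-endian order, as they would appear in the data object. The name decode_le32 is hypothetical and not part of IOKit; on a little-endian Mac the plain byte copy gives the same result.

```cpp
#include <cstdint>

// Hypothetical helper (not an IOKit API): interpret four raw bytes as a
// little-endian 32-bit unsigned integer, independent of host byte order.
std::uint32_t decode_le32(const unsigned char bytes[4])
{
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

For example, the bytes 80 BC CC 96 decode to 2530000000, the clock-frequency value listed above.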
Next, you can get processor info using the NXGetLocalArchInfo() call, declared in <mach-o/arch.h>. This returns a pointer to an NXArchInfo structure containing cpu type and subtype codes along with C-string names and descriptions. If that doesn't include a stepping ID, the only way I can think of to obtain that (off the top of my head) is via the CPUID instruction. Create a .s and .h file, and put in the following code:
.s file:
#if defined(__i386__) || defined(__x86_64__)
.macro ENTRY
.text
.private_extern $0
.align 4, 0x90
$0:
.endmacro
// __private_extern__ unsigned long GetCPUSteppingID( void )
ENTRY _GetCPUSteppingID
#if __x86_64__
push %rbp // store existing frame pointer
mov %rsp,%rbp // make a new frame pointer from stack pointer
push %rbx // we save %rbx because the cpuid instruction
// will overwrite it, and it's expected
// to be unchanged in the caller after
// calling another function
#else
push %ebp // store existing frame pointer
mov %esp,%ebp // make a new frame pointer from stack pointer
push %ebx // saved for the same reason as %rbx above
#endif
movl $1,%eax // fetch cpu info
cpuid // stepping-id is in low 4 bits of %eax now
and $0x0000000f,%eax // clear out everything we don't want; on x86_64
// writing %eax also zeroes the top half of %rax,
// so the result register is ready on both archs
#if __x86_64__
pop %rbx // restore saved value of %rbx
#else
pop %ebx // restore saved value of %ebx
#endif
leave // restores prior stack frame from the frame pointer
ret // returns to caller
#endif // __i386__ || __x86_64__
.h file:
#ifndef __GET_STEPPING_ID__
#define __GET_STEPPING_ID__
/* unsigned long is register-sized on both 32-bit and 64-bit OS X */
__private_extern__ unsigned long GetCPUSteppingID( void );
#endif /* __GET_STEPPING_ID__ */
Please note that I'm not certain about the x86_64 bit above; in theory what I've typed there will ensure that the same code compiles for 64-bit, and will return a 64-bit value in that case. It will also save/restore the %rbx register, the 64-bit version of the %ebx register. Theoretically that will cover all bases.
sysctl(3) is probably a good place to start. You probably want the stuff defined by the CTL_HW selectors.
A variant of Paul R's method
#include <iostream>
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>
void show_cpu_info(void)
{
    char buffer[1024];
    size_t size = sizeof(buffer);
    if (sysctlbyname("machdep.cpu.brand_string", buffer, &size, NULL, 0) < 0) {
        perror("sysctlbyname");
        return; // buffer is not valid on failure
    }
    std::cout << buffer << '\n';
}
will directly show something like Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz.
If you specifically want CPU information then use the cpuid instruction (in C, __asm cpuid). It returns detailed information about a CPU, including its family, model, vendor, number of cores etc. Most APIs ultimately use this instruction to retrieve CPU information. You can find detailed information on CPUID on the web, including sample code and tutorials.
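As a rough sketch of how the Family, Model and Stepping values from the question come out of CPUID, the function below decodes the EAX value returned by CPUID function 1. The struct and function names are hypothetical, and the rule for folding in the extended model field follows Intel's documented convention (AMD applies it only to family 15), so treat it as an illustration rather than a complete decoder.

```cpp
#include <cstdint>

// Display values decoded from the CPUID function 1 EAX "signature".
struct CpuSignature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

// Hypothetical helper: decode family, model and stepping from the EAX value
// returned by CPUID function 1 (Intel's convention for the extended fields).
CpuSignature decode_signature(std::uint32_t eax)
{
    const unsigned stepping   =  eax        & 0xF;   // EAX[3:0]
    const unsigned base_model = (eax >> 4)  & 0xF;   // EAX[7:4]
    const unsigned base_fam   = (eax >> 8)  & 0xF;   // EAX[11:8]
    const unsigned ext_model  = (eax >> 16) & 0xF;   // EAX[19:16]
    const unsigned ext_fam    = (eax >> 20) & 0xFF;  // EAX[27:20]

    CpuSignature sig;
    // The extended family is only added when the base family is 15.
    sig.family = (base_fam == 0xF) ? base_fam + ext_fam : base_fam;
    // The extended model supplies the high nibble for families 6 and 15.
    sig.model = (base_fam == 0x6 || base_fam == 0xF)
                    ? ((ext_model << 4) | base_model)
                    : base_model;
    sig.stepping = stepping;
    return sig;
}
```

A signature of 0x00010676 decodes to Family 6 Model 23 Stepping 6, matching the report line in the question.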
I'm learning C++, and in my random number generator code I'm always getting the same number:
random_device rd;
mt19937 x{rd()};
uniform_int_distribution<int> ran{1, 100};
cout << ran(x);
But srand()/rand() works:
srand (time(0));
cout << rand()%100;
I think that it has to do with time(). But how do I get the first code to work?
Assuming your problem is with the MinGW g++ compiler, you can define a header that wraps <random>, like this:
#pragma once
#ifndef MY_NO_FIX_OF_RANDOM_DEVICE
# ifdef __GNUC__
# undef _GLIBCXX_USE_RANDOM_TR1
# define _GLIBCXX_USE_RANDOM_TR1
# endif
#endif
#include <random>
That's just a modified-for-SO version of a header in the Wrapped stdlib library.
I would recommend using a forced include (command line option) of this fix, or just defining _GLIBCXX_USE_RANDOM_TR1 on the command line.
By inspecting the source code of my MinGW g++ 7.3.0, files <random.h> and random.cc, it appears that the approach works on most PCs because (with that compiler) _GLIBCXX_USE_RANDOM_TR1 selects number generation via the rdrand instruction, if available, and else via the "/dev/urandom" *nix world device, if available.
So, criteria for “works”:
the processor supports the rdrand instruction, or
fopen succeeds in opening "/dev/urandom".
According to the Wikipedia article about rdrand:
"AMD added support for the instruction in June 2015."
... so this approach may fail on a Windows PC (no "/dev/urandom") with AMD processor produced before that time (no rdrand instruction).
Technical details:
With _GLIBCXX_USE_RANDOM_TR1 defined the random_device default constructor calls the following function with argument "default":
void
random_device::_M_init(const std::string& token)
{
const char *fname = token.c_str();
if (token == "default")
{
#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
unsigned int eax, ebx, ecx, edx;
// Check availability of cpuid and, for now at least, also the
// CPU signature for Intel's
if (__get_cpuid_max(0, &ebx) > 0 && ebx == signature_INTEL_ebx)
{
__cpuid(1, eax, ebx, ecx, edx);
if (ecx & bit_RDRND)
{
_M_file = nullptr;
return;
}
}
#endif
fname = "/dev/urandom";
}
else if (token != "/dev/urandom" && token != "/dev/random")
fail:
std::__throw_runtime_error(__N("random_device::"
"random_device(const std::string&)"));
_M_file = static_cast<void*>(std::fopen(fname, "rb"));
if (!_M_file)
goto fail;
}
If __cpuid reports that the processor supports the rdrand instruction then this causes the _M_file member to be zeroed, which in turn causes the number generation code to use the rdrand instruction.
And otherwise, this code attempts to open the *nix random device, and if that fails then it and hence the random_device construction fails with an exception.
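If random_device cannot be relied on with a given toolchain, one workaround (my own assumption, not part of the fix described above) is to mix its output with a clock reading through std::seed_seq, so the engine is seeded differently on each run even when rd() returns a fixed value or throws. The function names below are hypothetical, and a clock-based seed is of course not suitable for anything security related.

```cpp
#include <chrono>
#include <cstdint>
#include <exception>
#include <random>

// Hypothetical helper: build a mt19937 whose seed mixes std::random_device
// output (when it works) with the current clock reading.
std::mt19937 make_seeded_engine()
{
    std::uint32_t entropy = 0;
    try {
        std::random_device rd;
        entropy = rd();   // may be deterministic on some toolchains
    } catch (const std::exception&) {
        // Construction or use can throw; fall back to the clock alone.
    }
    const auto ticks = static_cast<std::uint64_t>(
        std::chrono::high_resolution_clock::now().time_since_epoch().count());
    std::seed_seq seq{entropy,
                      static_cast<std::uint32_t>(ticks),
                      static_cast<std::uint32_t>(ticks >> 32)};
    return std::mt19937(seq);
}

// Draw one value in [1, 100], as in the question.
int roll_1_to_100(std::mt19937& engine)
{
    std::uniform_int_distribution<int> dist(1, 100);
    return dist(engine);
}
```

Create the engine once at startup and reuse it for every draw; re-seeding before each call would defeat the purpose.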
I use Linux x86_64 and clang 3.3.
Is this even possible in theory?
std::atomic<__int128_t> doesn't work (undefined references to some functions).
__atomic_add_fetch also doesn't work ('error: cannot compile this atomic library call yet').
Both std::atomic and __atomic_add_fetch work with 64-bit numbers.
It's not possible to do this with a single instruction, but you can emulate it and still be lock-free. Except for the very earliest AMD64 CPUs, x64 supports the CMPXCHG16B instruction. With a little multi-precision math, you can do this pretty easily.
I'm afraid I don't know the intrinsic for CMPXCHG16B in GCC, but hopefully you get the idea of having a spin loop of CMPXCHG16B. Here's some untested code for VC++:
// atomically adds 128-bit src to dst, with src getting the old dst.
// Both are arrays of two 64-bit words: [0] is the low half and [1] the high
// half, which is the layout _InterlockedCompareExchange128 expects.
// dst must be 16-byte aligned.
void fetch_add_128b(uint64_t *dst, uint64_t* src)
{
    uint64_t srclo, srchi, olddst[2], exchlo, exchhi;
    srclo = src[0];
    srchi = src[1];
    olddst[0] = dst[0];
    olddst[1] = dst[1];
    do
    {
        exchlo = srclo + olddst[0];
        exchhi = srchi + olddst[1] + (exchlo < srclo); // add and carry
    }
    while(!_InterlockedCompareExchange128((long long*)dst,
                                          exchhi, exchlo,
                                          (long long*)olddst));
    src[0] = olddst[0];
    src[1] = olddst[1];
}
Edit: here's some untested code going off of what I could find for the GCC intrinsics:
// atomically adds 128-bit src to dst, returning the old dst.
__uint128_t fetch_add_128b(__uint128_t *dst, __uint128_t src)
{
__uint128_t dstval, olddst;
dstval = *dst;
do
{
olddst = dstval;
dstval = __sync_val_compare_and_swap(dst, dstval, dstval + src);
}
while(dstval != olddst);
return dstval;
}
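The "add and carry" step inside the compare-exchange loop can be looked at in isolation. The sketch below is plain, non-atomic arithmetic on two 64-bit halves, using a hypothetical U128 struct and add128 function; it only illustrates how the carry out of the low half is detected, not the atomic update itself.

```cpp
#include <cstdint>

// A 128-bit unsigned value stored as two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Hypothetical helper: plain (non-atomic) 128-bit addition with carry.
// This is only the arithmetic computed inside the compare-exchange loop.
U128 add128(U128 a, U128 b)
{
    U128 sum;
    sum.lo = a.lo + b.lo;                          // wraps modulo 2^64
    const std::uint64_t carry = (sum.lo < a.lo) ? 1u : 0u;
    sum.hi = a.hi + b.hi + carry;                  // propagate the carry
    return sum;
}
```

The carry test works because an unsigned sum that wrapped around is always smaller than either operand.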
That isn't possible. There is no x86-64 instruction that does a 128-bit add in one instruction, and to do something atomically, a basic starting point is that it is a single instruction (there are some instructions which aren't atomic even then, but that's another matter).
You will need to use some other lock around the 128-bit number.
Edit: It is possible that one could come up with something that uses something like this:
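/* arg1 is the 16-byte aligned __uint128_t destination, arg2 the value to add */
uint64_t oldlo = (uint64_t)arg1, oldhi = (uint64_t)(arg1 >> 64);
__asm__ __volatile__(
"1:\n"
" mov %%rax, %%rbx\n"
" mov %%rdx, %%rcx\n"
" add %3, %%rbx\n"
" adc %4, %%rcx\n"
" lock; cmpxchg16b %0\n"
" jnz 1b\n"
: "+m"(arg1), "+a"(oldlo), "+d"(oldhi)
: "r"((uint64_t)arg2), "r"((uint64_t)(arg2 >> 64))
: "rbx", "rcx", "cc", "memory");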
That's just something I just hacked up, and I haven't compiled it, never mind validated that it will work. But the principle is that it repeats until it compares equal.
Edit2: Darn, typing too slow. Cory Nelson just posted the same thing, but using intrinsics.
Edit3: Updated the loop to avoid unnecessarily re-reading memory; CMPXCHG16B does that for us.
Yes; you need to tell your compiler that you're on hardware that supports it.
This answer is going to assume you're on x86-64; there's likely a similar spec for arm.
From the generic x86-64 microarchitecture levels, you'll want at least x86-64-v2 to let the compiler know that you have the cmpxchg16b instruction.
Here's a working godbolt, note the compiler flag -march=x86-64-v2:
https://godbolt.org/z/PvaojqGcx
For more reading on the x86-64-psABI, the spec is published here.
I've scoured the web to no avail.
Is there a way for Xcode and Visual C++ to treat denormalised numbers as 0? I would have thought there's an option in the IDE preferences to turn on this option but can't seem to find it.
I'm doing some cross-platform audio stuff and need to stop certain processors hogging resources.
Cheers
You're looking for a platform-defined way to set FTZ and/or DAZ in the MXCSR register (on x86 with SSE or x86-64); see https://stackoverflow.com/a/2487733/567292
Usually this is called something like _controlfp; Microsoft documentation is at http://msdn.microsoft.com/en-us/library/e9b52ceh.aspx
You can also use the _MM_SET_FLUSH_ZERO_MODE macro: http://msdn.microsoft.com/en-us/library/a8b5ts9s(v=vs.71).aspx - this is probably the most cross-platform portable method.
For disabling denormals globally I use these 2 macros:
//warning: these macros have to be used in the same scope
#define MXCSR_SET_DAZ_AND_FTZ \
int oldMXCSR__ = _mm_getcsr(); /*read the old MXCSR setting */ \
int newMXCSR__ = oldMXCSR__ | 0x8040; /* set DAZ and FZ bits */ \
_mm_setcsr( newMXCSR__ ); /*write the new MXCSR setting to the MXCSR */
#define MXCSR_RESET_DAZ_AND_FTZ \
/*restore old MXCSR settings to turn denormals back on if they were on*/ \
_mm_setcsr( oldMXCSR__ );
I call the first one at the beginning of the process and the second at the end.
Unfortunately this does not seem to work well on Windows.
To flush denormals locally I use this
const Float32 k_DENORMAL_DC = 1e-25f;
inline void FlushDenormalToZero(Float32& ioFloat)
{
ioFloat += k_DENORMAL_DC;
ioFloat -= k_DENORMAL_DC;
}
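The constant 0x8040 used in the macros above combines two MXCSR bits: bit 6 (DAZ, denormals-are-zero for inputs) and bit 15 (FTZ, flush-to-zero for results). A small sketch with named constants, using hypothetical names and pure bit arithmetic so it does not touch the register itself:

```cpp
#include <cstdint>

// MXCSR bit positions (x86 SSE control/status register).
constexpr std::uint32_t kMxcsrDaz = 1u << 6;   // denormals-are-zero (inputs)
constexpr std::uint32_t kMxcsrFtz = 1u << 15;  // flush-to-zero (results)

// Hypothetical helper: the value to write back to MXCSR to enable both modes.
constexpr std::uint32_t with_daz_and_ftz(std::uint32_t mxcsr)
{
    return mxcsr | kMxcsrDaz | kMxcsrFtz;
}

static_assert((kMxcsrDaz | kMxcsrFtz) == 0x8040u, "matches the 0x8040 mask");
```

On x86 the result would be passed to _mm_setcsr(), e.g. _mm_setcsr(with_daz_and_ftz(_mm_getcsr())).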
See update (4 Aug 2022) at the end of this entry.
To do this, use the Intel Intrinsics macros during program startup. For example:
#include <immintrin.h>
int main() {
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}
In my version of MSVC, this emitted the following assembly code:
stmxcsr DWORD PTR tv805[rsp]
mov eax, DWORD PTR tv805[rsp]
bts eax, 15
mov DWORD PTR tv807[rsp], eax
ldmxcsr DWORD PTR tv807[rsp]
MXCSR is the control and status register, and this code is setting bit 15, which turns flush zero mode on.
One thing to note: this only affects denormals resulting from a computation. If you want to also set denormals to zero if they're used as input, you also need to set the DAZ flag (denormals are zero), using the following command:
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
See https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-setting-the-ftz-and-daz-flags for more information.
Also note that you need to set MXCSR for each thread, as the values contained are local to each thread.
Update 4 Aug 2022
I've now had to deal with ARM processors as well. The following is a cross-platform macro that works on ARM and Intel:
#ifndef __ARM_ARCH
extern "C" {
extern unsigned int _mm_getcsr();
extern void _mm_setcsr(unsigned int);
}
#define MY_FAST_FLOATS _mm_setcsr(_mm_getcsr() | 0x8040U)
#else
#define MY_FPU_GETCW(fpcr) __asm__ __volatile__("mrs %0, fpcr" : "=r"(fpcr))
#define MY_FPU_SETCW(fpcr) __asm__ __volatile__("msr fpcr, %0" : : "r"(fpcr))
#define MY_FAST_FLOATS \
{ \
uint64_t eE2Hsb4v {}; /* random name to avoid shadowing warnings */ \
MY_FPU_GETCW(eE2Hsb4v); \
eE2Hsb4v |= (1 << 24) | (1 << 19); /* FZ flag, FZ16 flag; flush denormals to zero */ \
MY_FPU_SETCW(eE2Hsb4v); \
} \
static_assert(true, "require semi-colon after macro with this assert")
#endif
I have a multi threaded c++ application that runs on Windows, Mac and a few Linux flavors.
To make a long story short: in order for it to run at maximum efficiency, I have to be able to instantiate a single thread per physical processor/core. Creating more threads than there are physical processors/cores degrades the performance of my program considerably. I can already correctly detect the number of logical processors/cores on all three of these platforms. To be able to detect the number of physical processors/cores correctly I'll have to detect whether hyper-threading is supported AND active.
My question therefore is whether there is a way to detect whether Hyper-Threading is supported and enabled. If so, how exactly?
EDIT: This is no longer 100% correct due to Intel's ongoing befuddlement.
The way I understand the question is that you are asking how to detect the number of CPU cores vs. CPU threads which is different from detecting the number of logical and physical cores in a system. CPU cores are often not considered physical cores by the OS unless they have their own package or die. So an OS will report that a Core 2 Duo, for example, has 1 physical and 2 logical CPUs and an Intel P4 with hyper-threads will be reported exactly the same way even though 2 hyper-threads vs. 2 CPU cores is a very different thing performance wise.
I struggled with this until I pieced together the solution below, which I believe works for both AMD and Intel processors. As far as I know, and I could be wrong, AMD does not yet have CPU threads but they have provided a way to detect them that I assume will work on future AMD processors which may have CPU threads.
In short here are the steps using the CPUID instruction:
Detect CPU vendor using CPUID function 0
Check for HTT bit 28 in CPU features EDX from CPUID function 1
Get the logical core count from EBX[23:16] from CPUID function 1
Get actual non-threaded CPU core count
If vendor == 'GenuineIntel' this is 1 plus EAX[31:26] from CPUID function 4
If vendor == 'AuthenticAMD' this is 1 plus ECX[7:0] from CPUID function 0x80000008
Sounds difficult but here is a, hopefully, platform independent C++ program that does the trick:
#include <iostream>
#include <string>
using namespace std;
void cpuID(unsigned i, unsigned regs[4]) {
#ifdef _WIN32
__cpuid((int *)regs, (int)i);
#else
asm volatile
("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
: "a" (i), "c" (0));
// ECX is set to zero for CPUID function 4
#endif
}
int main(int argc, char *argv[]) {
unsigned regs[4];
// Get vendor
char vendor[12];
cpuID(0, regs);
((unsigned *)vendor)[0] = regs[1]; // EBX
((unsigned *)vendor)[1] = regs[3]; // EDX
((unsigned *)vendor)[2] = regs[2]; // ECX
string cpuVendor = string(vendor, 12);
// Get CPU features
cpuID(1, regs);
unsigned cpuFeatures = regs[3]; // EDX
// Logical core count per CPU
cpuID(1, regs);
unsigned logical = (regs[1] >> 16) & 0xff; // EBX[23:16]
cout << " logical cpus: " << logical << endl;
unsigned cores = logical;
if (cpuVendor == "GenuineIntel") {
// Get DCP cache info
cpuID(4, regs);
cores = ((regs[0] >> 26) & 0x3f) + 1; // EAX[31:26] + 1
} else if (cpuVendor == "AuthenticAMD") {
// Get NC: Number of CPU cores - 1
cpuID(0x80000008, regs);
cores = ((unsigned)(regs[2] & 0xff)) + 1; // ECX[7:0] + 1
}
cout << " cpu cores: " << cores << endl;
// Detect hyper-threads
bool hyperThreads = cpuFeatures & (1 << 28) && cores < logical;
cout << "hyper-threads: " << (hyperThreads ? "true" : "false") << endl;
return 0;
}
I haven't actually tested this on Windows or OSX yet but it should work as the CPUID instruction is valid on i686 machines. Obviously, this won't work for PowerPC but then they don't have hyper-threads either.
Here is the output on a few different Intel machines:
Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz:
logical cpus: 2
cpu cores: 2
hyper-threads: false
Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz:
logical cpus: 4
cpu cores: 4
hyper-threads: false
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (w/ x2 physical CPU packages):
logical cpus: 16
cpu cores: 8
hyper-threads: true
Intel(R) Pentium(R) 4 CPU 3.00GHz:
logical cpus: 2
cpu cores: 1
hyper-threads: true
Note that this does not give the number of physical cores as intended, but the number of logical cores.
If you can use C++11 (thanks to alfC's comment beneath):
#include <iostream>
#include <thread>
int main() {
std::cout << std::thread::hardware_concurrency() << std::endl;
return 0;
}
Otherwise maybe the Boost library is an option for you. Same code but different include as above. Include <boost/thread.hpp> instead of <thread>.
Windows-only solution described here:
GetLogicalProcessorInformation
For Linux, read the /proc/cpuinfo file. I am not running Linux now so can't give you more detail. You can count physical/logical processor instances. If the logical count is twice the physical count, then you have HT enabled (true only for x86).
The current highest voted answer using CPUID appears to be obsolete. It reports both the wrong number of logical and physical processors. This appears to be confirmed from this answer cpuid-on-intel-i7-processors.
Specifically, using CPUID.1.EBX[23:16] to get the logical processors or CPUID.4.EAX[31:26]+1 to get the physical ones with Intel processors does not give the correct result on any Intel processor I have.
For Intel, CPUID.Bh should be used (see "Intel thread/core and cache topology"). The solution does not appear to be trivial. For AMD a different solution is necessary.
Here is source code by Intel which reports the correct number of physical and logical cores as well as the correct number of sockets https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/. I tested this on an 80 logical core, 40 physical core, 4 socket Intel system.
Here is source code for AMD http://developer.amd.com/resources/documentation-articles/articles-whitepapers/processor-and-core-enumeration-using-cpuid/. It gave the correct result on my single socket Intel system but not on my four socket system. I don't have an AMD system to test.
I have not dissected the source code yet to find a simple answer (if one exists) with CPUID. It seems that if the solution can change (as it seems to have) that the best solution is to use a library or OS call.
Edit:
Here is a solution for Intel processors with CPUID leaf 11 (Bh). The way to do this is to loop over the logical processors, get the x2APIC ID for each logical processor from CPUID, and count the number of x2APIC IDs where the least significant bit is zero. For systems without hyper-threading the x2APIC ID will always be even. For systems with hyper-threading each x2APIC ID will have an even and odd version.
// input: eax = functionnumber, ecx = 0
// output: eax = output[0], ebx = output[1], ecx = output[2], edx = output[3]
//static inline void cpuid (int output[4], int functionnumber)
int getNumCores(void) {
//Assuming an Intel processor with CPUID leaf 11
int cores = 0;
#pragma omp parallel reduction(+:cores)
{
int regs[4];
cpuid(regs,11);
if(!(regs[3]&1)) cores++;
}
return cores;
}
The threads must be bound for this to work. OpenMP by default does not bind threads. Setting export OMP_PROC_BIND=true will bind them or they can be bound in code as shown at thread-affinity-with-windows-msvc-and-openmp.
I tested this on my 4 core/8 HT system and it returned 4 both with hyper-threading enabled and with it disabled in the BIOS. I also tested it on a 4 socket system with each socket having 10 cores / 20 HT and it returned 40 cores.
AMD processors or older Intel processors without CPUID leaf 11 have to do something different.
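The even/odd test above assumes a single SMT bit in the x2APIC ID. A slightly more general sketch, assuming the IDs have already been collected (one per bound thread, as described above) and that the SMT shift width has been read from EAX[4:0] of CPUID leaf 0Bh, sub-leaf 0, is to drop the SMT bits and count the distinct values that remain. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: given the x2APIC ID of every logical processor and the
// SMT shift width (assumed to be less than 32), count the distinct cores.
std::size_t count_cores(const std::vector<std::uint32_t>& x2apic_ids,
                        unsigned smt_shift)
{
    std::set<std::uint32_t> core_ids;
    for (std::uint32_t id : x2apic_ids) {
        core_ids.insert(id >> smt_shift);   // drop the SMT (thread) bits
    }
    return core_ids.size();
}
```

With a shift of 1, IDs 0 to 7 give 4 cores, and the all-even IDs 0, 2, 4, 6 of a system with hyper-threading disabled also give 4.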
From gathering ideas and concepts from some of the above ideas, I have come up with this solution. Please critique.
//EDIT INCLUDES
#ifdef _WIN32
#include <windows.h>
#elif MACOS
#include <sys/param.h>
#include <sys/sysctl.h>
#else
#include <unistd.h>
#endif
For almost every OS, the standard "Get core count" feature returns the logical core count. But in order to get the physical core count, we must first detect if the CPU has hyper threading or not.
uint32_t registers[4];
unsigned logicalcpucount;
unsigned physicalcpucount;
#ifdef _WIN32
SYSTEM_INFO systeminfo;
GetSystemInfo( &systeminfo );
logicalcpucount = systeminfo.dwNumberOfProcessors;
#else
logicalcpucount = sysconf( _SC_NPROCESSORS_ONLN );
#endif
We now have the logical core count, now in order to get the intended results, we first must check if hyper threading is being used or if it's even available.
__asm__ __volatile__ ("cpuid " :
"=a" (registers[0]),
"=b" (registers[1]),
"=c" (registers[2]),
"=d" (registers[3])
: "a" (1), "c" (0));
unsigned CPUFeatureSet = registers[3];
bool hyperthreading = CPUFeatureSet & (1 << 28);
No Intel CPU with hyper-threading enables it on only one core (at least not from what I have read), which lets us find the answer in a really painless way. If hyper-threading is available, the logical processor count will be exactly double the physical processor count. Otherwise, the operating system will detect a logical processor for every single core, meaning the logical and the physical core counts will be identical.
if (hyperthreading){
physicalcpucount = logicalcpucount / 2;
} else {
physicalcpucount = logicalcpucount;
}
fprintf (stdout, "LOGICAL: %u\n", logicalcpucount);
fprintf (stdout, "PHYSICAL: %u\n", physicalcpucount);
To follow on from math's answer, as of boost 1.56 there exists the physical_concurrency attribute which does exactly what you want.
From the documentation - http://www.boost.org/doc/libs/1_56_0/doc/html/thread/thread_management.html#thread.thread_management.thread.physical_concurrency
The number of physical cores available on the current system. In contrast to hardware_concurrency() it does not return the number of virtual cores, but it counts only physical cores.
So an example would be
#include <iostream>
#include <boost/thread.hpp>
int main()
{
std::cout << boost::thread::physical_concurrency();
return 0;
}
I know this is an old thread, but no one mentioned hwloc. The hwloc library is available on most Linux distributions and can also be compiled on Windows. The following code will return the number of physical processors: 4 in the case of an i7 CPU.
#include <hwloc.h>
int nPhysicalProcessorCount = 0;
hwloc_topology_t sTopology;
if (hwloc_topology_init(&sTopology) == 0 &&
hwloc_topology_load(sTopology) == 0)
{
nPhysicalProcessorCount =
hwloc_get_nbobjs_by_type(sTopology, HWLOC_OBJ_CORE);
hwloc_topology_destroy(sTopology);
}
if (nPhysicalProcessorCount < 1)
{
#ifdef _OPENMP
nPhysicalProcessorCount = omp_get_num_procs();
#else
nPhysicalProcessorCount = 1;
#endif
}
It is not sufficient to test if an Intel CPU has hyperthreading, you also need to test if hyperthreading is enabled or disabled. There is no documented way to check this. An Intel guy came up with this trick to check if hyperthreading is enabled: Check the number of programmable performance counters using CPUID[0xa].eax[15:8] and assume that if the value is 8, HT is disabled, and if the value is 4, HT is enabled (https://software.intel.com/en-us/forums/intel-isa-extensions/topic/831551).
There is no problem on AMD chips: The CPUID reports 1 or 2 threads per core depending on whether simultaneous multithreading is disabled or enabled.
You also have to compare the thread count from the CPUID with the thread count reported by the operating system to see if there are multiple CPU chips.
I have made a function that implements all of this. It reports both the number of physical processors and the number of logical processors. I have tested it on Intel and AMD processors in Windows and Linux. It should work on Mac as well. I have published this code at
https://github.com/vectorclass/add-on/tree/master/physical_processors
On OS X, you can read these values from sysctl(3) (the C API, or the command line utility of the same name). The man page should give you usage information. The following keys may be of interest:
$ sysctl hw
hw.ncpu: 24
hw.activecpu: 24
hw.physicalcpu: 12 <-- number of cores
hw.physicalcpu_max: 12
hw.logicalcpu: 24 <-- number of cores including hyper-threaded cores
hw.logicalcpu_max: 24
hw.packages: 2 <-- number of CPU packages
hw.ncpu = 24
hw.availcpu = 24
On Windows, there are GetLogicalProcessorInformation and GetLogicalProcessorInformationEx, available for Windows XP SP3 or later and Windows 7+ respectively. The difference is that GetLogicalProcessorInformation doesn't support setups with more than 64 logical cores, which might be important for server setups, but you can always fall back to GetLogicalProcessorInformation if you're on XP. Example usage for GetLogicalProcessorInformationEx (source):
PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX buffer = NULL;
PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX ptr = NULL;
BOOL rc;
DWORD length = 0;
DWORD offset = 0;
DWORD ncpus = 0;
DWORD prev_processor_info_size = 0;
for (;;) {
rc = psutil_GetLogicalProcessorInformationEx(
RelationAll, buffer, &length);
if (rc == FALSE) {
if (GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
if (buffer) {
free(buffer);
}
buffer = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)malloc(length);
if (NULL == buffer) {
return NULL;
}
}
else {
goto return_none;
}
}
else {
break;
}
}
ptr = buffer;
while (offset < length) {
// Advance ptr by the size of the previous
// SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX struct.
ptr = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*)\
(((char*)ptr) + prev_processor_info_size);
if (ptr->Relationship == RelationProcessorCore) {
ncpus += 1;
}
// When offset == length, we've reached the last processor
// info struct in the buffer.
offset += ptr->Size;
prev_processor_info_size = ptr->Size;
}
free(buffer);
if (ncpus != 0) {
return ncpus;
}
else {
return NULL;
}
return_none:
if (buffer != NULL)
free(buffer);
return NULL;
On Linux, parsing /proc/cpuinfo might help.
I don't know that all three expose the information in the same way, but if you can safely assume that the NT kernel will report device information according to the POSIX standard (which NT supposedly has support for), then you could work off that standard.
However, differences in device management are often cited as one of the stumbling blocks to cross-platform development. At best I would implement this as three strands of logic; I wouldn't try to write one piece of code to handle all platforms evenly.
Ok, all that's assuming C++. For ASM, I presume you'll only be running on x86 or amd64 CPUs? You'll still need two branch paths, one for each architecture, and you'll need to test Intel separate from AMD (IIRC) but by and large you just check for the CPUID. Is that what you're trying to find? The CPUID from ASM on Intel/AMD family CPUs?
OpenMP should do the trick:
// test.cpp
#include <omp.h>
#include <iostream>
using namespace std;
int main(int argc, char** argv) {
int nThreads = omp_get_max_threads();
cout << "Can run as many as: " << nThreads << " threads." << endl;
}
Most compilers support OpenMP. If you are using a gcc-based compiler (*nix, MacOS), you need to compile using:
$ g++ -fopenmp -o test.o test.cpp
(you might also need to tell your compiler to use the stdc++ library):
$ g++ -fopenmp -o test.o -lstdc++ test.cpp
As far as I know OpenMP was designed to solve this kind of problem.
This is very easy to do in Python:
$ python -c "import psutil; print(psutil.cpu_count(logical=False))"
4
Maybe you could look at the psutil source code to see what is going on?
You may use the library libcpuid (Also on GitHub - libcpuid).
As can be seen in its documentation page:
#include <stdio.h>
#include <libcpuid.h>
int main(void)
{
if (!cpuid_present()) { // check for CPUID presence
printf("Sorry, your CPU doesn't support CPUID!\n");
return -1;
}
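struct cpu_raw_data_t raw; // holds the raw CPUID data
struct cpu_id_t data; // holds the decoded CPU information
if (cpuid_get_raw_data(&raw) < 0) { // obtain the raw CPUID data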
printf("Sorry, cannot get the CPUID raw data.\n");
printf("Error: %s\n", cpuid_error()); // cpuid_error() gives the last error description
return -2;
}
if (cpu_identify(&raw, &data) < 0) { // identify the CPU, using the given raw data.
printf("Sorry, CPU identification failed.\n");
printf("Error: %s\n", cpuid_error());
return -3;
}
printf("Found: %s CPU\n", data.vendor_str); // print out the vendor string (e.g. `GenuineIntel')
printf("Processor model is `%s'\n", data.cpu_codename); // print out the CPU code name (e.g. `Pentium 4 (Northwood)')
printf("The full brand string is `%s'\n", data.brand_str); // print out the CPU brand string
printf("The processor has %dK L1 cache and %dK L2 cache\n",
data.l1_data_cache, data.l2_cache); // print out cache size information
printf("The processor has %d cores and %d logical processors\n",
data.num_cores, data.num_logical_cpus); // print out CPU cores information
}
As can be seen, data.num_cores holds the number of physical cores of the CPU.
How can I detect which CPU is being used at runtime? The C++ code needs to differentiate between AMD and Intel architectures. I'm using gcc 4.2.
The cpuid instruction, used with EAX=0 will return a 12-character vendor string in EBX, EDX, ECX, in that order.
For Intel, this string is "GenuineIntel". For AMD, it's "AuthenticAMD". Other companies that have created x86 chips have their own strings. The Wikipedia page for cpuid has many (all?) of the strings listed, as well as an example ASM listing for retrieving the details.
You really only need to check if ECX matches the last four characters. You can't use the first four, because some Transmeta CPUs also start with "Genuine"
For Intel, this is 0x6c65746e
For AMD, this is 0x444d4163
If you convert each byte in those to a character, they'll appear to be backwards. This is just a result of the little endian design of x86. If you copied the register to memory and looked at it as a string, it would work just fine.
Example Code:
bool IsIntel() // returns true on an Intel processor, false on anything else
{
    unsigned eax = 0; // EAX = 0 selects the vendor-string function
    unsigned ebx, ecx, edx; // cpuid overwrites all four registers
    __asm__ ("cpuid" // run the cpuid instruction
             : "+a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx));
    // ECX holds the last four characters of the vendor ID string:
    // "ntel" reads as 0x6c65746e when viewed as a little-endian integer
    return ecx == 0x6c65746e;
}
EDIT: One other thing - this can easily be changed into an IsAMD function, IsVIA function, IsTransmeta function, etc. just by changing the magic number in the if.
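To see why the magic numbers read backwards, here is a small sketch that rebuilds the vendor string from the three register values, taking the lowest byte of each register first. The function name is hypothetical, and it works on plain integers so it can be checked without executing cpuid.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: rebuild the 12-character vendor string from the
// EBX, EDX and ECX values returned by cpuid with EAX=0 (in that order).
std::string vendor_from_registers(std::uint32_t ebx, std::uint32_t edx,
                                  std::uint32_t ecx)
{
    const std::uint32_t regs[3] = {ebx, edx, ecx};
    std::string vendor;
    for (std::uint32_t reg : regs) {
        for (int byte = 0; byte < 4; ++byte) {
            // The lowest byte of each register holds the earliest character.
            vendor.push_back(static_cast<char>((reg >> (8 * byte)) & 0xFF));
        }
    }
    return vendor;
}
```

Passing 0x756e6547, 0x49656e69 and 0x6c65746e yields "GenuineIntel"; the last of those is the Intel constant quoted above.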
If you're on Linux (or on Windows running under Cygwin), you can figure that out by reading the special file /proc/cpuinfo and looking for the line beginning with vendor_id. If the string is GenuineIntel, you're running on an Intel chip. If you get AuthenticAMD, you're running on an AMD chip.
#include <stdio.h>
#include <string.h>
void get_vendor_id(char *vendor_id) // must be at least 13 bytes
{
    vendor_id[0] = 0; // an empty string signals failure to the caller
    FILE *cpuinfo = fopen("/proc/cpuinfo", "r");
    if(cpuinfo == NULL)
        return; // handle error
    char line[256];
    while(fgets(line, sizeof line, cpuinfo))
    {
        if(strncmp(line, "vendor_id", 9) == 0)
        {
            char *colon = strchr(line, ':');
            if(colon == NULL || colon[1] == 0)
                break; // handle error
            strncpy(vendor_id, colon + 2, 12); // skip ": ", copy 12 characters
            vendor_id[12] = 0; // strncpy does not terminate here
            fclose(cpuinfo);
            return;
        }
    }
    // if we got here, handle error
    fclose(cpuinfo);
}
If you know you're running on an x86 architecture, a less portable method would be to use the CPUID instruction:
void get_vendor_id(char *vendor_id) // must be at least 13 bytes
{
    unsigned eax = 0, ebx, ecx, edx; // EAX = 0 selects the vendor string
    // GCC inline assembler; cpuid overwrites all four registers
    __asm__ __volatile__
        ("cpuid"
         : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx));
    // the string is returned in EBX, EDX, ECX, in that order (needs <string.h>)
    memcpy(vendor_id, &ebx, 4);
    memcpy(vendor_id + 4, &edx, 4);
    memcpy(vendor_id + 8, &ecx, 4);
    vendor_id[12] = 0;
}
int main(void)
{
char vendor_id[13];
get_vendor_id(vendor_id);
if(strcmp(vendor_id, "GenuineIntel") == 0)
; // it's Intel
else if(strcmp(vendor_id, "AuthenticAMD") == 0)
; // it's AMD
else
; // other
return 0;
}
On Windows, you can use the GetNativeSystemInfo function
On Linux, try sysinfo
You probably should not check at all. Instead, check whether the CPU supports the features you need, e.g. SSE3. The differences between two Intel chips might be greater than between AMD and Intel chips.
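As a sketch of what a feature check looks like, the snippet below tests individual bits of the ECX value returned by cpuid with EAX=1, where SSE3 is bit 0 and SSSE3 is bit 9. The constant and function names are hypothetical; the register value would come from a cpuid wrapper like the ones shown in the other answers.

```cpp
#include <cstdint>

// Feature flags reported in ECX by cpuid with EAX=1.
constexpr std::uint32_t kFeatureSse3  = 1u << 0;
constexpr std::uint32_t kFeatureSsse3 = 1u << 9;

// Hypothetical helper: test one feature bit in a cpuid(1) ECX value.
constexpr bool has_feature(std::uint32_t leaf1_ecx, std::uint32_t feature)
{
    return (leaf1_ecx & feature) != 0;
}
```

The calling code can then pick an SSE3 code path when has_feature(ecx, kFeatureSse3) is true, regardless of which vendor made the chip.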
You have to define it in your Makefile, e.g. arch=`uname -p 2>&1`, then use #ifdef i386 ... #endif for different architectures.
I have posted a small project:
http://sourceforge.net/projects/cpp-cpu-monitor/
which uses the libgtop library and exposes data through UDP. You can modify it to suit your needs. GPL open-source. Please ask if you have any questions regarding it.