setting processor affinity with C++ that will run on Linux [duplicate] - c++

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
CPU Affinity
I'm running on Linux and I want to write a C++ program that will set 2 specific processors that my 2 applications that will run in parallel (i.e. setting each process to run on a different core/CPU). I want to use processor affinity tool with C++. Please can anyone help with C++ code.

From the command line you can use taskset(1), or from within your code you can use sched_setaffinity(2).
E.g.
#ifdef __linux__ // Linux only
#include <sched.h> // sched_setaffinity
#endif
int main(int argc, char *argv[])
{
#ifdef __linux__
int cpuAffinity = argc > 1 ? atoi(argv[1]) : -1;
if (cpuAffinity > -1)
{
cpu_set_t mask;
int status;
CPU_ZERO(&mask);
CPU_SET(cpuAffinity, &mask);
status = sched_setaffinity(0, sizeof(mask), &mask);
if (status != 0)
{
perror("sched_setaffinity");
}
}
#endif
// ... your program ...
}

You need to call sched_setaffinity or pthread_setaffinity_np
See also this question

Related

C++/C Code to have different execution block on M1 and Intel

I am writing a code for macOS application.
The application would be running on M1 based Macs as well as Intel based Macs also.
What would be the switch to differentiate M1 and Intel?
if (M1)
{
do something for M1
}
else if (Intel)
{
do something for Intel
}
I think, you can use __arm__ to detect arm architecture:
#ifdef __arm__
//do smth on arm (M1)
#else
//do smth on x86 (Intel)
#endif
I was just fooling around with this and found this reference for Objective-C from apple that seemed to work with clang for C++.
// Objective-C example
#include "TargetConditionals.h"
#if TARGET_OS_OSX
// Put CPU-independent macOS code here.
#if TARGET_CPU_ARM64
// Put 64-bit Apple silicon macOS code here.
#elif TARGET_CPU_X86_64
// Put 64-bit Intel macOS code here.
#endif
#elif TARGET_OS_MACCATALYST
// Put Mac Catalyst-specific code here.
#elif TARGET_OS_IOS
// Put iOS-specific code here.
#endif
https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary
I specifically checked to see if TARGET_CPU_ARM64 was defined in my header.
Hopefully this helps someone.
If you need a runtime check instead of compile time check, you can think of using something like below
#include <sys/sysctl.h>
#include <mach/machine.h>
int main(int argc, const char * argv[])
{
cpu_type_t type;
size_t size = sizeof(type);
sysctlbyname("hw.cputype", &type, &size, NULL, 0);
int procTranslated;
size = sizeof(procTranslated);
// Checks whether process is translated by Rosetta
sysctlbyname("sysctl.proc_translated", &procTranslated, &size, NULL, 0);
// Removes CPU_ARCH_ABI64 or CPU_ARCH_ABI64_32 encoded with the Type
cpu_type_t typeWithABIInfoRemoved = type & ~CPU_ARCH_MASK;
if (typeWithABIInfoRemoved == CPU_TYPE_X86)
{
if (procTranslated == 1)
{
cout << "ARM Processor (Running x86 application in Rosetta)";
}
else
{
cout << "Intel Processor";
}
}
else if (typeWithABIInfoRemoved == CPU_TYPE_ARM)
{
cout << "ARM Processor";
}
}

create thread but process shows up?

#include <unistd.h>
#include <stdio.h>
#include <cstring>
#include <thread>
void test_cpu() {
printf("thread: test_cpu start\n");
int total = 0;
while (1) {
++total;
}
}
void test_mem() {
printf("thread: test_mem start\n");
int step = 20;
int size = 10 * 1024 * 1024; // 10Mb
for (int i = 0; i < step; ++i) {
char* tmp = new char[size];
memset(tmp, i, size);
sleep(1);
}
printf("thread: test_mem done\n");
}
int main(int argc, char** argv) {
std::thread t1(test_cpu);
std::thread t2(test_mem);
t1.join();
t2.join();
return 0;
}
Compile it with g++ -o test test.cc --std=c++11 -lpthread
I run the program in Linux, and run top to monitor it.
I expect to see ONE process however I saw THREE.
It looks like std::thread is creating threads, why do I end up with getting processes?
Linux does not implement threads. It only has Light Weight Processes (LWP) while pthread library wraps them to provide POSIX-compatible thread interface. The main LWP creates its own address space while each subsequent thread LWP shares address space with main LWP.
Many utils, such as HTOP (which seems to be on the screenshot) by default list LWP. In order to hide thread LWPs you can open Setup (F2) -> Display Options and check Hide kernel threads and Hide userland process threads options. There is also an option to highlight threads - Display threads in different color.

how to use std::atomic_signal_fence() with semaphore and volatile?

std::atomic_signal_fence() Establishes memory synchronization ordering ... between a thread and a signal handler executed on the same thread.
-- cppreference
In order to find an example for this illustration, I looked at bames53's similar question in stackoverflow. However the answer may not suit my x86_64 environment, since x86_64 CPU is strong memory model and forbids Store-Store re-ordering ^1. Its example will correctly execute even without std::atomic_signal_fence() in my x86_64 environment.
So I made a Store-Load re-ordering example suitable for x86_64 after Jeff Preshing's post. The example code is not that short, so I opened this question instead of appending onto bames53's similar question.
main() and signal_handler() will run in the same thread(i.e. they will share the same tid) in a single core environment. main() can be interrupted at any time by signal_handler(). If no signal_fences are used, in the generated binary X = 1; r1 = Y; will be exchanged their ordering if compiled with g++ -O2(Store(X)-Load(Y) is optimized to Load(Y)-Store(X)). The same with Y = 1; r2 = X;. So if main() is interrupted just after 'Load(Y)', it results r1 == 0 and r2 == 0 at last. Thus in the following code line (C) will assert fail. But if line (A) and (B) are uncommented, it should never assert fail since a signal_fence is used to protect synchronization between main() and signal_handler(), to the best of my understanding.
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <semaphore.h>
#include <signal.h>
#include <unistd.h>
sem_t endSema;
// volatile int synchronizer;
int X, Y;
int r1, r2;
void signal_handler(int sig) {
signal(sig, SIG_IGN);
Y = 1;
// std::atomic_signal_fence(std::memory_order_seq_cst); // (A) if uncommented, assert still may fail
r2 = X;
signal(SIGINT, signal_handler);
sem_post(&endSema); // if changed to the following, assert never fail
// synchronizer = 1;
}
int main(int argc, char* argv[]) {
std::srand(std::time(nullptr));
sem_init(&endSema, 0, 0);
signal(SIGINT, signal_handler);
for (;;) {
while(std::rand() % std::stol(argv[1]) != 0); // argv[1] ~ 1000'000
X = 1;
// std::atomic_signal_fence(std::memory_order_seq_cst); // (B) if uncommented, assert still may fail.
r1 = Y;
sem_wait(&endSema); // if changed to the following, assert never fail
// while (synchronizer == 0); synchronizer = 0;
std::cout << "r1=" << r1 << " r2=" << r2 << std::endl;
if (r1 == 0) assert(r2 != 0); // (C)
Y = 0; r1 = 0; r2 = 0; X = 0;
}
return 0;
}
Firstly semaphore is used to synchronize main() with signal_handler(). In this version, the assert always fail after around received 30 SIGINTs with or without the signal fence. It seems that std::atomic_signal_fence() did not work as I expected.
Secondly If semaphore is replaced with volatile int synchronizer, the program seems never fail with or without the signal fence.
What's wrong with the code? Did I miss-understand the cppreference doc? Or is there any more proper example code for this topic in x86_64 environment that I can observe the effects of std::atomic_signal_fence?
Below is some relevant info:
compiling & running env: CentOS 8 (Linux 4.18.0) x86_64 single CPU core.
Compiler: g++ (GCC) 8.3.1 20190507
Compiling command g++ -std=c++17 -o ordering -O2 ordering.cpp -pthread
Run with ./ordering 1000000, then keep pressing Ctrl-C to invoke the signal handler.

How to handle eof() in platform independent code? C++

I have a c++ code where, I'm redirecting the stdout to a string. The code should be platform independent. In windows works fine, but under linux it doesn't builds.
Here is my code:
#include <fcntl.h>
int outBuffer[2];
int orgStdOutBuffer;
std::string stringBuff;
enum PIPES {READ, WRITE};
orgStdOutBuffer = dup(fileno(stdout));
outBuffer[READ] = 0;
outBuffer[WRITE] = 0;
if (_pipe(mStdOutBuffer, 65535, O_BINARY) == -1) {
return false;
}
fflush(stdout);
dup2(outBuffer[WRITE], fileno(stdout));
The problem is where I'm checking the end of the pipe:
if (!eof(outBuffer[READ])) {
stdOutBufferSize = read(outBuffer[READ], &(*stringBuff.begin()), buffSize);
}
Under linux gives the following error:
'eof' was not declared in this scope
'O_BINARY' was not declared in this scope
Can anybody help how can I make this code run under linux?

How can Tcl_CreateInterp end up in an inifinite loop?

I'm using Tcl library version 8.6.4 (compiled with Visual Studio 2015, 64bits) to interpret some Tcl commands from a C/C++ program.
I noticed that if I create interpreters from different threads, the second one ends up in an infinite loop:
#include "tcl.h"
#include <boost/thread.hpp>
#include <boost/filesystem.hpp>
void runScript()
{
Tcl_Interp* pInterp = Tcl_CreateInterp();
std::string sTclPath = boost::filesystem::current_path().string() + "/../../stg/Debug/lib/tcl";
const char* setvalue = Tcl_SetVar( pInterp, "tcl_library", sTclPath.c_str(), TCL_GLOBAL_ONLY );
assert( setvalue != NULL );
int i = Tcl_Init( pInterp );
assert( i == TCL_OK );
int nTclResult = Tcl_Eval( pInterp, "puts \"Hello\"" );
assert( nTclResult == TCL_OK );
Tcl_DeleteInterp( pInterp );
}
int main( int argc, char* argv[] )
{
Tcl_FindExecutable(NULL);
runScript();
runScript();
boost::thread thrd1( runScript );
thrd1.join(); // works OK
boost::thread thrd2( runScript );
thrd2.join(); // never joins
return 1;
}
Infinite loop is here, within Tcl source code:
void
TclInitNotifier(void)
{
ThreadSpecificData *tsdPtr;
Tcl_ThreadId threadId = Tcl_GetCurrentThread();
Tcl_MutexLock(&listLock);
for (tsdPtr = firstNotifierPtr; tsdPtr && tsdPtr->threadId != threadId;
tsdPtr = tsdPtr->nextPtr) {
/* Empty loop body. */
}
// I never exit this loop because, after first thread was joined
// at some point tsdPtr == tsdPtr->nextPtr
Am I doing something wrong? Is there any special function call I'm missing?
Note: TCL_THREADS was not set while I compiled Tcl. However, I feel like I'm doing nothing wrong here. Also, adding
/* Empty loop body. */
if ( tsdPtr != NULL && tsdPtr->nextPtr == tsdPtr )
{
tsdPtr = NULL;
break;
}
within the loop apparently fixes the issue. But I'm not very confident in modifying 3rd party library source code...
After reporting a bug to Tcl team, I was asked to try again with Tcl library compiled with TCL_THREADS enabled. It fixed the issue.
TCL_THREADS was disabled because I compiled on windows using a CMake Lists.txt file I found on the web: this one was actually written for Linux and disabled thread support because it was unable to find pthread on my machine. I finally compiled Tcl libraries using the scripts provided by the Tcl team: threading is enabled by default and the infinite loop is gone!