Getting memory usage of program from another program in C++ (LINUX)

I would like to measure the maximum memory usage of abc.exe on random tests generated by gen.exe. How could I do that?
My code that runs abc.exe on tests from gen.exe looks like this:
#include <bits/stdc++.h>
using namespace std;
int main()
{
int i = 0;
while (true)
{
string si = to_string(i);
cout << i << "\n";
if (system(("echo " + si + "| ./gen.exe > test.in").c_str())) // gen.exe is test generator
{
cout << "gen error\n";
break;
}
if (system(("./abc.exe < test.in > a.out"))) // abc.exe is the program I want to test
{
cout << "abc error\n";
break;
}
i++;
}
}
I know that I can use time -v ./abc.exe, but then the memory usage is only printed in the terminal. I'd like to be able to save it to a variable.

You can use getrusage( RUSAGE_CHILDREN, ... ) to obtain the maximum resident set size. Note that RUSAGE_CHILDREN only covers children that have terminated and been waited for, and ru_maxrss reports the largest such child so far (in kilobytes on Linux), not the current usage of a child that is still running.
In the example below I used boost::process because it gives better control, but it is up to you whether to use std::system instead; it works the same way.
#include <string>
#include <cstdint>
#include <string.h>
#include <iostream>
#include <boost/process/child.hpp>
#include <sys/resource.h>
namespace bp = boost::process;
int parent( const std::string& exename )
{
// Loop from 0 to 10 megabytes
for ( int j=0; j<10; ++j )
{
// Command name is the name of this executable plus one argument with size
std::string gencmd = exename + " " + std::to_string(j);
// Start process
bp::child child( gencmd );
// Wait for it to allocate memory
sleep(1);
// Query the memory usage at this point in time
struct rusage ru;
getrusage( RUSAGE_CHILDREN, &ru );
std::cerr << "Loop:" << j << " mem:"<< ru.ru_maxrss/1024. << " MB" << std::endl;
// Wait for process to quit
child.wait();
if ( child.exit_code()!=0 )
{
std::cerr << "Error executing child:" << child.exit_code() << std::endl;
return 1;
}
}
return 0;
}
int child( int size ) {
// Allocate "size" megabytes explicitly
size_t memsize = size*1024*1024;
uint8_t* ptr = (uint8_t*)malloc( memsize );
memset( ptr, size, memsize );
// Wait for the parent to sample our memory usage
sleep( 2 );
// Free memory
free( ptr );
return 0;
}
int main( int argc, char* argv[] )
{
// Without arguments, it is the parent.
// Pass the name of the binary
if ( argc==1 ) return parent( argv[0] );
return child( std::atoi( argv[1] ) );
}
It prints
$ ./env_test
Loop:0 mem:0 MB
Loop:1 mem:3.5625 MB
Loop:2 mem:4.01953 MB
Loop:3 mem:5.05469 MB
Loop:4 mem:6.04688 MB
Loop:5 mem:7.05078 MB
Loop:6 mem:7.78516 MB
Loop:7 mem:8.97266 MB
Loop:8 mem:9.82031 MB
Loop:9 mem:10.8867 MB
If you cannot use the boost libraries, you will have to do a little more work, but it is still feasible.
If you just want to know the maximum size ever of your children processes then the following works with std::system:
#include <cstdio>
#include <string>
#include <iostream>
#include <sstream>
#include <string.h>
#include <unistd.h>
#include <sys/resource.h>
int main(int argc, char* argv[]) {
if (argc > 1) {
size_t size = ::atol(argv[1]);
size_t memsize = size * 1024 * 1024;
void* ptr = ::malloc(memsize);
memset(ptr, 0, memsize);
::sleep(2);
::free(ptr);
return 0;
}
for (int j = 0; j < 10; ++j) {
std::ostringstream cmd;
cmd << argv[0] << " " << j;
int res = std::system(cmd.str().c_str());
if (res < 0) {
fprintf(stderr, "ERROR system: %s\n", strerror(errno));
break;
}
struct rusage ru;
res = getrusage(RUSAGE_CHILDREN, &ru);
long maxmem = ru.ru_maxrss; // kilobytes on Linux
fprintf(stderr, "Loop:%d MaxMem:%ld\n", j, maxmem);
}
return 0;
}
It prints
Loop:0 MaxMem:3552
Loop:1 MaxMem:4192
Loop:2 MaxMem:5148
Loop:3 MaxMem:6228
Loop:4 MaxMem:7364
Loop:5 MaxMem:8456
Loop:6 MaxMem:9120
Loop:7 MaxMem:10188
Loop:8 MaxMem:11324
Loop:9 MaxMem:12256
However, if you want to keep track of memory usage while the child process is running, you cannot use std::system(), because it blocks until the child exits. Instead, call fork() to spawn a new process and then execv() to run the command through bash.
#include <string>
#include <cstdint>
#include <string.h>
#include <unistd.h>
#include <iostream>
#include <sys/resource.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <vector>
int parent(const std::string& exename) {
// Loop from 0 to 10 megabytes
for (int j = 0; j < 10; ++j) {
// Command name is the name of this executable plus one argument with size
std::string gencmd = exename + " " + std::to_string(j);
// Start process
pid_t pid = fork();
if (pid == 0) { // child
const char* args[] = {"/bin/bash", "-c", gencmd.c_str(), (char*)0};
int res = execv("/bin/bash", (char**)args);
// Should never return
std::cerr << "execv error: " << strerror(errno) << std::endl;
return 1;
}
// parent
long maxmem = 0;
while (true) {
int status;
pid_t rid = ::waitpid(pid, &status, WNOHANG);
if (rid < 0) {
if (errno != ECHILD) {
std::cerr << "waitpid:" << strerror(errno) << std::endl;
return 2;
}
break;
}
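if (rid == pid && (WIFEXITED(status) || WIFSIGNALED(status))) {
// The child has been reaped, so RUSAGE_CHILDREN now includes it
struct rusage ru;
if (getrusage(RUSAGE_CHILDREN, &ru) == 0 && maxmem < ru.ru_maxrss) {
maxmem = ru.ru_maxrss;
}
break;
}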
// Wait for it to allocate memory
usleep(10000);
// Query the memory usage at this point in time
struct rusage ru;
int res = getrusage(RUSAGE_CHILDREN, &ru);
if (res != 0) {
if (errno != ECHILD) {
std::cerr << "getrusage:" << errno << strerror(errno) << std::endl;
}
break;
}
if (maxmem < ru.ru_maxrss) {
maxmem = ru.ru_maxrss;
}
}
std::cerr << "Loop:" << j << " mem:" << maxmem / 1024. << " MB" << std::endl;
}
return 0;
}
int child(int size) {
// Allocate "size" megabytes explicitly
size_t memsize = size * 1024 * 1024;
uint8_t* ptr = (uint8_t*)malloc(memsize);
memset(ptr, size, memsize);
// Wait for the parent to sample our memory usage
sleep(2);
// Free memory
free(ptr);
return 0;
}
int main(int argc, char* argv[]) {
// Without arguments, it is the parent.
// Pass the name of the binary
if (argc == 1) return parent(argv[0]);
return child(std::atoi(argv[1]));
}
The result on my machine is:
$ ./fork_test
Loop:0 mem:3.22656 MB
Loop:1 mem:3.69922 MB
Loop:2 mem:4.80859 MB
Loop:3 mem:5.92578 MB
Loop:4 mem:6.87109 MB
Loop:5 mem:8.05469 MB
Loop:6 mem:8.77344 MB
Loop:7 mem:9.71875 MB
Loop:8 mem:10.7422 MB
Loop:9 mem:11.6797 MB
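As an alternative to polling, here is a minimal sketch that stores the peak memory of one specific child in a variable. It is not the method used above: it assumes Linux, where wait4() returns the resource usage of the child it reaps and ru_maxrss is in kilobytes. The helper name peak_rss_kb is hypothetical.

```cpp
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `cmd` through /bin/sh and returns the peak
// resident set size of that child in kilobytes (the Linux unit of
// ru_maxrss), or -1 if the command could not be run or exited non-zero.
long peak_rss_kb(const char* cmd) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: replace this process with the shell running the command.
        execl("/bin/sh", "sh", "-c", cmd, static_cast<char*>(nullptr));
        _exit(127);  // only reached if execl failed
    }
    int status = 0;
    struct rusage ru;
    // wait4() reaps this particular child and fills in its resource usage.
    if (wait4(pid, &status, 0, &ru) != pid) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return ru.ru_maxrss;
}
```

From the test loop in the question this could be called as long kb = peak_rss_kb("./abc.exe < test.in > a.out"); where a result of -1 plays the role of the "abc error" branch.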

Value in shared memory different after shared memory access C++

I am trying to create shared memory, but whenever I access it from a child process its value is different than what it should be. I think that I am using shmget() correctly. I have tried a lot of stuff that I have found online, but I can't find anyone with the same problem I am having. No matter what I enter num as, whenever I try to get l->returnLicense it outputs 0. I'm really at a loss as to what to try next.
#include "license.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <errno.h>
#include <string>
#include <sys/wait.h>
#include <iostream>
#include <unistd.h>
using namespace std;
int validateArguments (int num) {
if (num == -69) {
//no arg
return 10;
}
if (num < 1 || num > 20) {
//warning use 20 as num
return 10;
}
return num;
}
int initSharedMemory (License *l) {
key_t key = ftok("/tmp", 'J');
cout << "key: " << key << endl;
int shmid = shmget(key, sizeof(l), 0666|IPC_CREAT);
if (shmid == -1) {
perror("Shared memory");
return -1;
}
l = (License*)shmat(shmid, (void*)0, 0);
if (l == (void*) -1) {
perror("Shared memory attach");
return -1;
}
return shmid;
}
void detachSharedMemory (License *l) {
shmdt(l);
}
void destroySharedMemory (int shmid) {
shmctl(shmid, IPC_RMID, NULL);
}
void spawn (int shmid) {
pid_t c_pid = fork();
if (c_pid == -1) {
perror("fork");
} else if (c_pid > 0) {
cout << "parent" << shmid << endl;
c_pid = wait(NULL);
} else {
cout << "child" << endl;
License *l;
key_t key = ftok("/tmp", 'J');
cout << "key: " << key << endl;
int shmid = shmget(key, sizeof(l), 0666);
cout << shmid;
l = (License*) shmat(shmid,0,0);
if(l == (void*) -1) {
perror("memory attach");
exit(0);
}
int num = l->returnLicense();
cout << num << "num\n";
shmdt(l);
char* args[] = {"./testChild", NULL};
execvp(args[0],args);
exit(0);
}
}
int main (int argv, char *argc[]) {
int num;
if (argv == 2) {
num = atoi(argc[1]);
} else {
num = -69;
}
num = validateArguments (num);
License *l;
int shmid = initSharedMemory (l);
License *tmp = l;
tmp->initLicense(num);
spawn(shmid);
cout << l->returnLicense() << endl;
detachSharedMemory(l);
destroySharedMemory(shmid);
return 0;
}
I'm not including the entirety of my code, but I think this is enough to illustrate my problem. I copied code from the testChild that I exec from this process so that you can see the problem I'm facing all in one file.
License *l; // uninitialized
int shmid = initSharedMemory (l); // pass l by value, UB!
License *tmp = l; // copy uninitialized pointer, UB!
tmp->initLicense(num); // call member function through uninitialized pointer, BOOM!
spawn(shmid);
cout << l->returnLicense() << endl; // call member function through uninitialized pointer, BOOM!
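You probably meant for initSharedMemory() to take its parameter by reference (License *&l), so that it would affect the License *l variable in main(). Note also that sizeof(l) is the size of a pointer, not of a License object; the segment should be sized with sizeof(License).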
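The difference between the two parameter types can be sketched without any shared memory. In this illustration License is a hypothetical stand-in struct, g_segment plays the role of the attached segment, and both function names are made up for the example.

```cpp
// Hypothetical stand-in for the real License class.
struct License {
    int count;
};

// Stands in for the memory that shmat() would return.
static License g_segment{0};

// Pointer passed by value: only the local copy is reassigned,
// so the caller's pointer is left unchanged.
void attachByValue(License* l) {
    l = &g_segment;
    (void)l;  // silence the unused-value warning
}

// Pointer passed by reference: the caller's pointer is updated.
void attachByReference(License*& l) {
    l = &g_segment;
}
```

After attachByValue(p) a null p is still null, while after attachByReference(p) it points at g_segment, which is the behaviour main() in the question relies on.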

Multiple Definitions Error of Global Arrays [duplicate]

This question already has answers here:
c++ multiple definitions of a variable
(5 answers)
multiple definition error c++
(2 answers)
What exactly is One Definition Rule in C++?
(1 answer)
Closed 2 years ago.
I am attempting to compile my C++ code, and I keep getting the error:
/tmp/ccEsZppG.o:(.bss+0x0): multiple definition of `mailboxes'
/tmp/ccEZq43v.o:(.bss+0x0): first defined here
/tmp/ccEsZppG.o:(.bss+0xc0): multiple definition of `threads'
/tmp/ccEZq43v.o:(.bss+0xc0): first defined here
/tmp/ccEsZppG.o:(.bss+0x120): multiple definition of `semaphores'
/tmp/ccEZq43v.o:(.bss+0x120): first defined here
collect2: error: ld returned 1 exit status
Here is my code:
addem.cpp
#include <stdio.h>
#include <iostream>
#include <stdlib.h>
#include <semaphore.h>
#include <pthread.h>
#include "mailbox.h"
using namespace std;
void *sumUp(void *arg);
int main(int argc, char *argv[]) {
int numThreads, minThreads, maxInt, minInt;
if (argc < 3) {
cout << "Error: Need three arguments" << endl;
return 1;
}
numThreads = atoi(argv[1]);
maxInt = atoi(argv[2]);
minThreads = 1;
minInt = 1;
if (numThreads < 1) {
cout << "Cannot work with less than one thread\n"
<< "It's okay but do better next time!\n"
<< "We'll work with 1 thread this time.\n";
numThreads = minThreads;
} else if (numThreads > MAXTHREAD) {
cout << "Sorry, the max for threads is 10.\n"
<< "We'll work with 10 threads this time.\n";
numThreads = MAXTHREAD;
}
if (maxInt < 1) {
cout << "What do you want me to do? I can't count backwards!\n"
<< "I can barely count forwards! Let's make the max number\n"
<< "be 1 to save time\n";
maxInt = minInt;
}
struct msg outgoingMail[numThreads];
int divider = maxInt / numThreads;
int count = 1;
//initialize arrays (mailboxes, semaphores)
for (int i = 0; i < numThreads; i++) {
sem_init(&semaphores[i], 0, 1);
outgoingMail[i].iSender = 0;
outgoingMail[i].type = RANGE;
outgoingMail[i].value1 = count;
count = count + divider;
if (i = numThreads - 1) {
outgoingMail[i].value2 = maxInt;
} else {
outgoingMail[i].value2 = count;
}
}
for (int message = 0; message < numThreads; message++) {
SendMsg(message+1, outgoingMail[message]);
}
int thread;
for (thread = 0; thread <= numThreads; thread++) {
pthread_create(&threads[thread], NULL, &sumUp, (void *)(intptr_t)(thread+1));
}
struct msg incomingMsg;
int total = 0;
for (thread = 0; thread < numThreads; thread++) {
RecvMsg(0, incomingMsg);
total = total + incomingMsg.value1;
}
cout << "The total for 1 to " << maxInt << " using "
<< numThreads << " threads is " << total << endl;
return 0;
}
void *sumUp(void *arg) {
int index,total;
index = (intptr_t)arg;
struct msg message;
RecvMsg(index, message);
message.iSender = index;
message.type = ALLDONE;
total = 0;
for (int i = message.value1; i <= message.value2; i++) {
total += i;
}
SendMsg(0, message);
return (void *) 0;
}
mailbox.cpp
#include <stdio.h>
#include <iostream>
#include "mailbox.h"
using namespace std;
int SendMsg(int iTo, struct msg &Msg) {
if (safeToCall(iTo)) {
cout << "Error calling SendMsg" << endl;
return 1;
}
sem_wait(&semaphores[iTo]);
mailboxes[iTo] = Msg;
sem_post(&semaphores[iTo]);
return 0;
}
int RecvMsg(int iFrom, struct msg &Msg) {
sem_wait(&semaphores[iFrom]);
if (safeToCall(iFrom)) {
cout << "Error calling RecvMsg" << endl;
return 1;
}
mailboxes[iFrom] = Msg;
sem_post(&semaphores[iFrom]);
return 0;
}
bool safeToCall(int location) {
bool safe = !(location < 0 || location > MAXTHREAD + 1);
return safe;
//return true;
}
mailbox.h
#ifndef MAILBOX_H_
#define MAILBOX_H_
#define RANGE 1
#define ALLDONE 2
#define MAXTHREAD 10
#include <semaphore.h>
#include <pthread.h>
struct msg {
int iSender; /* sender of the message (0 .. numThreads)*/
int type; /* its type */
int value1; /* first value */
int value2; /* second value */
};
struct msg mailboxes[MAXTHREAD + 1];
pthread_t threads[MAXTHREAD + 1];
sem_t semaphores[MAXTHREAD + 1];
int SendMsg(int iTo, struct msg &Msg);
int RecvMsg(int iFrom, struct msg &Msg);
bool safeToCall(int location);
#endif
I am compiling the code with the command
g++ -o addem addem.cpp mailbox.cpp -lpthread
I have tried commenting out all of the function bodies in the source code to leave them as stub functions, and the same error occurs. The only way I have been able to compile is to comment out the function bodies and remove
#include "mailbox.h"
from at least one of the files. I suspect it has to do with how I am initializing the arrays, but I cannot figure out a workaround.

Fail to query via move_pages()

#include <cstdint>
#include <iostream>
#include <numaif.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <limits>
int main(int argc, char** argv) {
const constexpr uint64_t size = 16lu * 1024 * 1024;
const constexpr uint32_t nPages = size / (4lu * 1024 * 1024);
int32_t status[nPages];
std::fill_n(status, nPages, std::numeric_limits<int32_t>::min());
void* pages[nPages];
auto fd = shm_open("test_shm", O_RDWR|O_CREAT, 0666);
void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) {
if (fd > 0) close(fd);
throw "failed to map hugepages";
}
for (uint32_t i = 0; i < nPages; i++) {
pages[i] = (char*)ptr + 4 * 1024 * 1024;
}
if (0 != move_pages(0, nPages, pages, nullptr, status, 0)) {
std::cout << "failed to inquiry pages because " << strerror(errno) << std::endl;
}
else {
for (uint32_t i = 0; i < nPages; i++) {
std::cout << "page # " << i << " locates at numa node " << status[i] << std::endl;
}
}
munmap(ptr, size);
close(fd);
}
And it prints:
page # 0 locates at numa node -2
page # 1 locates at numa node -2
page # 2 locates at numa node -2
page # 3 locates at numa node -2
According to the manpage, it states:
nodes is an array of integers that specify the desired location for each page.
Each element in the array is a node number. nodes can also be NULL, in which
case move_pages() does not move any pages but instead will return the node where
each page currently resides, in the status array. Obtaining the status of each
page may be necessary to determine pages that need to be moved.
Why does it print negative values although the query returns success? My machine only has two NUMA nodes, 0 and 1.
kernel version: 3.10.0-862.2.3.el7.x86_64
Here is the version for hugepages:
#include <cstdint>
#include <iostream>
#include <numaif.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <limits>
int main(int argc, char** argv) {
const int32_t dst_node = strtoul(argv[1], nullptr, 10);
const constexpr uint64_t size = 4lu * 1024 * 1024;
const constexpr uint64_t pageSize = 2lu * 1024 * 1024;
const constexpr uint32_t nPages = size / pageSize;
int32_t status[nPages];
std::fill_n(status, nPages, std::numeric_limits<int32_t>::min());
void* pages[nPages];
int32_t dst_nodes[nPages];
void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB, -1, 0);
if (ptr == MAP_FAILED) {
throw "failed to map hugepages";
}
memset(ptr, 0x41, nPages*pageSize);
for (uint32_t i = 0; i < nPages; i++) {
pages[i] = &((char*)ptr)[i*pageSize];
dst_nodes[i] = dst_node;
}
std::cout << "Before moving" << std::endl;
if (0 != move_pages(0, nPages, pages, nullptr, status, 0)) {
std::cout << "failed to inquiry pages because " << strerror(errno) << std::endl;
}
else {
for (uint32_t i = 0; i < nPages; i++) {
std::cout << "page # " << i << " locates at numa node " << status[i] << std::endl;
}
}
// real move
if (0 != move_pages(0, nPages, pages, dst_nodes, status, MPOL_MF_MOVE_ALL)) {
std::cout << "failed to move pages because " << strerror(errno) << std::endl;
exit(-1);
}
const constexpr uint64_t smallPageSize = 4lu * 1024;
const constexpr uint32_t nSmallPages = size / smallPageSize;
void* smallPages[nSmallPages];
int32_t smallStatus[nSmallPages] = {std::numeric_limits<int32_t>::min()};
for (uint32_t i = 0; i < nSmallPages; i++) {
smallPages[i] = &((char*)ptr)[i*smallPageSize];
}
std::cout << "after moving" << std::endl;
if (0 != move_pages(0, nSmallPages, smallPages, nullptr, smallStatus, 0)) {
std::cout << "failed to inquiry pages because " << strerror(errno) << std::endl;
}
else {
for (uint32_t i = 0; i < nSmallPages; i++) {
std::cout << "page # " << i << " locates at numa node " << smallStatus[i] << std::endl;
}
}
}
The interesting thing is that move_pages() seems to understand hugepages: after the hugepages are moved, I query with the small page size, and it reports the expected NUMA node IDs.
Your usage of shm_open and mmap probably will not get you huge pages as you want.
The move_pages syscall (and the libnuma wrapper) works on standard pages of 4096 bytes on x86_64.
You are also using move_pages incorrectly, with a wrong third argument, "pages". It should not be a pointer to the memory itself, but a pointer to an array that contains nPages pointers:
http://man7.org/linux/man-pages/man2/move_pages.2.html
long move_pages(int pid, unsigned long count, void **pages,
const int *nodes, int *status, int flags);
pages is an array of pointers to the pages that should be moved.
These are pointers that should be aligned to page boundaries.
Addresses are specified as seen by the process specified by pid.
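Without correct pointers in "pages" you will get -14, which is EFAULT according to errno 14 (a tool from the moreutils package). The -2 in your output is -ENOENT, which means the page is not present, for example because it has never been touched; the code below writes to every page with memset before querying.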
//https://stackoverflow.com/questions/54546367/fail-to-query-via-move-pages
//g++ 54546367.move_pages.cc -o 54546367.move_pages -lnuma -lrt
#include <cstdint>
#include <iostream>
#include <numaif.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <limits>
int main(int argc, char** argv) {
const constexpr uint64_t size = 256lu * 1024;// * 1024;
const constexpr uint32_t nPages = size / (4lu * 1024);
void * pages[nPages];
int32_t status[nPages];
std::fill_n(status, nPages, std::numeric_limits<int32_t>::min());
// auto fd = shm_open("test_shm", O_RDWR|O_CREAT, 0666);
// void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
std::cout << "Ptr is " << ptr << std::endl;
if (ptr == MAP_FAILED) {
// if (fd > 0) close(fd);
throw "failed to map hugepages";
}
memset(ptr, 0x41, nPages*4096);
for(uint32_t i = 0; i<nPages; i++) {
pages[i] = &((char*)ptr)[i*4096];
}
if (0 != move_pages(0, nPages, pages, nullptr, status, 0)) {
std::cout << "failed to inquiry pages because " << strerror(errno) << std::endl;
}
else {
for (uint32_t i = 0; i < nPages; i++) {
std::cout << "page # " << i << " locates at numa node " << status[i] << std::endl;
}
}
munmap(ptr, size);
// close(fd);
}
On a NUMA machine it reports the same node for every page when started as taskset -c 7 ./54546367.move_pages, and interleaved nodes (0 1 0 1) when started as numactl -i all ./54546367.move_pages.
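Negative entries in the status array are negated errno values, so they can be turned into readable text with strerror. The following is only a small sketch; describe_status is a hypothetical helper, not part of libnuma.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: turns one entry of the move_pages() status array
// into text. Non-negative values are NUMA node numbers; negative values
// are negated errno codes (for example -ENOENT for a page that is not
// present, or -EFAULT for an address that is not mapped).
std::string describe_status(int status) {
    if (status >= 0) {
        return "node " + std::to_string(status);
    }
    return std::string("error: ") + std::strerror(-status);
}
```

In the printing loops above, status[i] could be replaced by describe_status(status[i]) to make results such as -2 self-explanatory.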

boost interprocess message_queue and fork

I am trying to communicate with a forked child process using a message queue from the boost interprocess library. When the child process calls receive, it throws an exception with the message
boost::interprocess_exception::library_error
I am using GCC 6.3 on Debian 9 x64.
#include <iostream>
#include <unistd.h>
#include <boost/interprocess/ipc/message_queue.hpp>
#include <memory>
int main(int argc, char* argv[])
{
using namespace boost::interprocess;
const char* name = "foo-552b8ae9-6037-4b77-aa0d-d4dc9dad790b";
const int max_num_msg = 100;
const int max_msg_size = 32;
bool is_child = false;
message_queue::remove(name);
auto mq = std::make_unique<message_queue>(create_only, name, max_num_msg, max_msg_size);
auto child_pid = fork();
if (child_pid == -1)
{
std::cout << "fork failed" << std::endl;
return -1;
}
else if (child_pid == 0)
{
is_child = true;
}
if (is_child)
{
// does child needs to reopen it?
mq.reset( new message_queue(open_only, name) );
}
int send_num = 0;
while(true)
{
unsigned int priority = 0;
if (is_child)
{
message_queue::size_type bytes = 0;
try
{
int num;
// Always throws. What is wrong ???????
mq->receive(&num, sizeof(num), bytes, priority);
std::cout << num << std::endl;
}
catch(const std::exception& e)
{
std::cout << "Receive caused execption " << e.what() << std::endl;
}
sleep(1);
}
else
{
mq->send(&send_num, sizeof(send_num), priority);
send_num++;
sleep(5);
}
}
return 0;
}
Also, is the child process required to reopen the message queue created by the parent process? I tried it both ways and neither worked. I get the same exception on receive.
The problem is that your receive buffer is smaller than max_msg_size. Assuming 4-byte integers, this should work:
int num[8];
mq.receive(num, sizeof(num), bytes, priority);
std::cout << *num << std::endl;
Also, I see no reason to play fast and loose with the actual queue instance. Just create it per process:
#include <boost/interprocess/ipc/message_queue.hpp>
#include <boost/exception/diagnostic_information.hpp>
#include <iostream>
#include <memory>
#include <unistd.h>
int main() {
namespace bip = boost::interprocess;
const char *name = "foo-552b8ae9-6037-4b77-aa0d-d4dc9dad790b";
{
const int max_num_msg = 100;
const int max_msg_size = 32;
bip::message_queue::remove(name);
bip::message_queue mq(bip::create_only, name, max_num_msg, max_msg_size);
}
auto child_pid = fork();
if (child_pid == -1) {
std::cout << "fork failed" << std::endl;
return -1;
}
bip::message_queue mq(bip::open_only, name);
if (bool const is_child = (child_pid == 0)) {
while (true) {
unsigned int priority = 0;
bip::message_queue::size_type bytes = 0;
try {
int num[8];
mq.receive(num, sizeof(num), bytes, priority);
std::cout << *num << std::endl;
} catch (const bip::interprocess_exception &e) {
std::cout << "Receive caused exception " << boost::diagnostic_information(e, true) << std::endl;
}
sleep(1);
}
} else {
// parent
int send_num = 0;
while (true) {
unsigned int priority = 0;
mq.send(&send_num, sizeof(send_num), priority);
send_num++;
sleep(5);
}
}
}

POSIX semaphore doesn't work under high contention/load

Using C++11 on Linux kernel 4.4.0-57, I'm trying to run two busy-looping processes (say p1 and p2) pinned to the same core (via pthread_setaffinity_np) and to enforce an interleaved execution order using a POSIX semaphore (semaphore.h) and sched_yield(). But it did not work out well.
Below is the parent code (parent-task) that spawns 2 processes and each executes child-task code.
#include <stdio.h>
#include <cstdlib>
#include <errno.h> // errno
#include <iostream> // cout cerr
#include <semaphore.h> // semaphore
#include <fcntl.h> // O_CREAT
#include <unistd.h> // fork
#include <string.h> // cpp string
#include <sys/types.h> //
#include <sys/wait.h> // wait()
int init_semaphore(){
std::string sname = "/SEM_CORE";
sem_t* sem = sem_open ( sname.c_str(), O_CREAT, 0644, 1 );
if ( sem == SEM_FAILED ) {
std::cerr << "sem_open failed!\n";
return -1;
}
sem_init( sem, 0, 1 );
return 0;
}
// Fork and exec child-task.
// Return pid of child
int fork_and_exec( std::string pname, char* cpuid ){
int pid = fork();
if ( pid == 0) {
// Child
char* const params[] = { "./child-task", "99", strdup( pname.c_str() ), cpuid, NULL };
execv( params[0], params );
exit(0);
}
else {
// Parent
return pid;
}
}
int main( int argc, char* argv[] ) {
if ( argc <= 1 )
printf( "Usage ./parent-task <cpuid> \n" );
char* cpuid = argv[1];
std::string pnames[2] = { "p111", "p222" };
init_semaphore();
int childid[ 2 ] = { 0 };
int i = 0;
for( std::string pname : pnames ){
childid[ i ] = fork_and_exec( pname, cpuid );
}
for ( i=0; i<2; i++ )
if ( waitpid( childid[i], NULL, 0 ) < 0 )
perror( "waitpid() failed.\n" );
return 0;
}
The child-task code looks like this:
#include <cstdlib>
#include <stdio.h>
#include <sched.h>
#include <pthread.h>
#include <stdint.h>
#include <errno.h>
#include <semaphore.h>
#include <iostream>
#include <sys/types.h>
#include <fcntl.h> // O_CREAT
sem_t* sm;
int set_cpu_affinity( int cpuid ) {
pthread_t current_thread = pthread_self();
cpu_set_t cpuset;
CPU_ZERO( &cpuset );
CPU_SET( cpuid, &cpuset );
return pthread_setaffinity_np( current_thread,
sizeof( cpu_set_t ), &cpuset );
}
int lookup_semaphore() {
sm = sem_open( "/SEM_CORE", O_RDWR );
if ( sm == SEM_FAILED ) {
std::cerr << "sem_open failed!" << std::endl ;
return -1;
}
}
int main( int argc, char* argv[] ) {
printf( "Usage: ./child-task <PRIORITY> <PROCESS-NAME> <CPUID>\n" );
printf( "Setting SCHED_RR and priority to %d\n", atoi( argv[1] ) );
set_cpu_affinity( atoi( argv[3] ) );
lookup_semaphore();
int res;
uint32_t n = 0;
while ( 1 ) {
n += 1;
if ( !( n % 1000 ) ) {
res = sem_wait( sm );
if( res != 0 ) {
printf(" sem_wait %s. errno: %d\n", argv[2], errno);
}
printf( "Inst:%s RR Prio %s running (n=%u)\n", argv[2], argv[1], n );
fflush( stdout );
sem_post( sm );
sched_yield();
}
sched_yield();
}
sem_close( sm );
}
In the child-task code, I use if ( !( n % 1000 ) ) to experiment with reducing the contention/load on waiting for and posting the semaphore. With n % 1000, one of the child processes is always in the Sleep state (according to top) while the other executes properly. However, if I use n % 10000, i.e. less load/contention, both processes run and print their output in an interleaved fashion, which is the outcome I expect.
Does anyone know if this is a limitation of semaphore.h, or whether there is a better way to ensure the execution order of processes?
Updated: I wrote a simple example with threads and a semaphore. Note that sched_yield may help avoid unnecessary wakeups of the thread whose turn it is not, but yielding is not a guarantee. I also show an example with a mutex and condition variable that is guaranteed to work, with no yield necessary.
#include <stdexcept>
#include <semaphore.h>
#include <pthread.h>
#include <thread>
#include <iostream>
using std::thread;
using std::cout;
sem_t sem;
int count = 0;
const int NR_WORK_ITEMS = 10;
void do_work(int worker_id)
{
cout << "Worker " << worker_id << '\n';
}
void foo(int work_on_odd)
{
int result;
int contention_count = 0;
while (count < NR_WORK_ITEMS)
{
result = sem_wait(&sem);
if (result) {
throw std::runtime_error("sem_wait failed!");
}
if (count % 2 == work_on_odd)
{
do_work(work_on_odd);
count++;
}
else
{
contention_count++;
}
result = sem_post(&sem);
if (result) {
throw std::runtime_error("sem_post failed!");
}
result = sched_yield();
if (result < 0) {
throw std::runtime_error("yield failed!");
}
}
cout << "Worker " << work_on_odd << " terminating. Nr of redundant wakeups from sem_wait: " <<
contention_count << '\n';
}
int main()
{
int result = sem_init(&sem, 0, 1);
if (result) {
throw std::runtime_error("sem_init failed!");
}
thread t0 = thread(foo, 0);
thread t1 = thread(foo, 1);
t0.join();
t1.join();
return 0;
}
Here is one way to do it with condition variables and mutexes. Translating from C++ std threads to pthreads should be trivial. To do it between processes, you would have to use a pthread mutex type that can be shared between processes. Maybe the condvar and the mutex can both be placed in shared memory, to achieve the same thing I do below with threads.
See also the manpage pthread_condattr_setpshared (3) or
http://manpages.ubuntu.com/manpages/wily/man3/pthread_condattr_setpshared.3posix.html
On the other hand, maybe it is simpler to just use a SOCK_STREAM unix domain socket between the two worker processes, and just block on the socket until the peer worker pings you (i.e. send one char) over the socket.
#include <cassert>
#include <iostream>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <unistd.h>
using std::thread;
using std::condition_variable;
using std::mutex;
using std::unique_lock;
using std::cout;
condition_variable cv;
mutex mtx;
int count;
void dowork(int arg)
{
std::thread::id this_id = std::this_thread::get_id();
cout << "Arg: " << arg << ", thread id: " << this_id << '\n';
}
void tfunc(int work_on_odd)
{
assert(work_on_odd < 2);
// count is a global, so it must not appear in the capture list
auto check_can_work = [work_on_odd]() { return (count % 2) == work_on_odd; };
while (count < 10)
{
unique_lock<mutex> lk(mtx);
cv.wait (lk, check_can_work);
dowork(work_on_odd);
count++;
cv.notify_one();
// Lock is unlocked automatically here, but with threads and condvars,
// it is actually better to unlock manually before notify_one.
}
}
int main()
{
count = 0;
thread t0 = thread(tfunc, 0);
thread t1 = thread(tfunc, 1);
sleep(1);
cv.notify_one();
t0.join();
t1.join();
}
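The process-shared variant mentioned above could look roughly like the following. This is only a sketch under several assumptions: Linux, an anonymous MAP_SHARED mapping inherited across fork() instead of a named shared-memory object, and at most 16 work items. The names Shared, worker and run_alternating are made up for the example.

```cpp
#include <pthread.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// State shared by both processes; it lives in a MAP_SHARED mapping.
struct Shared {
    pthread_mutex_t mtx;
    pthread_cond_t cv;
    int count;      // next work item; its parity decides whose turn it is
    int order[16];  // records which worker handled each item
};

// Each worker handles the items whose parity matches its id (0 or 1).
static void worker(Shared* s, int id, int items) {
    for (;;) {
        pthread_mutex_lock(&s->mtx);
        while (s->count < items && s->count % 2 != id)
            pthread_cond_wait(&s->cv, &s->mtx);
        if (s->count >= items) {
            pthread_mutex_unlock(&s->mtx);
            return;
        }
        s->order[s->count] = id;  // the "work"
        s->count++;
        pthread_cond_broadcast(&s->cv);
        pthread_mutex_unlock(&s->mtx);
    }
}

// Forks two workers and returns true if they strictly alternated.
bool run_alternating(int items) {
    if (items < 0 || items > 16) return false;
    void* mem = mmap(nullptr, sizeof(Shared), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    Shared* s = static_cast<Shared*>(mem);
    s->count = 0;

    // Both objects must be marked process-shared before initialisation.
    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&s->mtx, &ma);
    pthread_mutexattr_destroy(&ma);

    pthread_condattr_t ca;
    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&s->cv, &ca);
    pthread_condattr_destroy(&ca);

    pid_t pids[2];
    for (int id = 0; id < 2; ++id) {
        pids[id] = fork();
        if (pids[id] == 0) {
            worker(s, id, items);
            _exit(0);
        }
        if (pids[id] < 0) {
            // fork failed: stop the worker that is already running, if any
            if (id == 1) {
                kill(pids[0], SIGKILL);
                waitpid(pids[0], nullptr, 0);
            }
            munmap(mem, sizeof(Shared));
            return false;
        }
    }
    for (int id = 0; id < 2; ++id) waitpid(pids[id], nullptr, 0);

    bool ok = (s->count == items);
    for (int i = 0; i < items && ok; ++i) ok = (s->order[i] == i % 2);
    munmap(mem, sizeof(Shared));
    return ok;
}
```

Calling run_alternating(10) from main() runs ten items split between the two processes. For the separately exec'ed child-task programs in the question, the mapping would have to come from a named object (for example shm_open) rather than an anonymous one, and older glibc versions need -pthread when linking.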