Performance of pthreads lower than single thread - C++

I'm confused about the performance of my code: the single-threaded version takes only 13s, but the multithreaded version takes 80s. I don't know whether the vector can only be accessed by one thread at a time; if so, I may have to use a struct array to store the data instead of a vector. Could anyone kindly help?
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <vector>
#include <iterator>
#include <string>
#include <ctime>
#include <bangdb/database.h>
#include "SEQ.h"

#define NUM_THREADS 16

using namespace std;
typedef struct _thread_data_t {
    std::vector<FDT> *Query;
    unsigned long start;
    unsigned long end;
    connection* conn;
    int thread;
} thread_data_t;

void *thr_func(void *arg) {
    thread_data_t *data = (thread_data_t *)arg;
    std::vector<FDT> *Query = data->Query;
    unsigned long start = data->start;
    unsigned long end = data->end;
    connection* conn = data->conn;
    printf("thread %d started %lu -> %lu\n", data->thread, start, end);
    for (unsigned long i = start; i <= end; i++)
    {
        FDT *fout = conn->get(&((*Query).at(i)));
        if (fout == NULL)
        {
            //printf("%s\tNULL\n", s);
        }
        else
        {
            printf("Thread:%d\t%s\n", data->thread, fout->data);
        }
    }
    pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        printf("USAGE: ./seq <.txt>\n");
        printf("/home/rd/SCRIPTs/12X18610_L5_I052.R1.clean.code.seq\n");
        exit(-1);
    }
    printf("%s\n", argv[1]);
    vector<FDT> Query;
    FILE* fpin;
    if ((fpin = fopen(argv[1], "r")) == NULL) {
        printf("Can't open Input file %s\n", argv[1]);
        return -1;
    }
    char *key = (char *)malloc(36);
    while (fscanf(fpin, "%s", key) != EOF)
    {
        SEQ *sequence = new SEQ(key);
        FDT *fk = new FDT((void*)sequence, sizeof(*sequence));
        Query.push_back(*fk);
    }
    unsigned long Querysize = (unsigned long)(Query.size());
    std::cout << "myvector stores " << Querysize << " numbers.\n";
    // create database, table and connection
    database* db = new database((char*)"berrydb");
    // get a table, a new one or existing one, walog tells if log is on or off
    table* tbl = db->gettable((char*)"hg19", JUSTOPEN);
    if (tbl == NULL)
    {
        printf("ERROR:table NULL error");
        exit(-1);
    }
    // get a new connection
    connection* conn = tbl->getconnection();
    if (conn == NULL)
    {
        printf("ERROR:connection NULL error");
        exit(-1);
    }
    cerr << "begin querying...\n";
    time_t begin, end;
    double duration;
    begin = clock();
    unsigned long ThreadDealSize = Querysize / NUM_THREADS;
    cerr << "Querysize:" << ThreadDealSize << endl;
    pthread_t thr[NUM_THREADS];
    int rc;
    thread_data_t thr_data[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
    {
        unsigned long ThreadDealStart = ThreadDealSize * i;
        unsigned long ThreadDealEnd = ThreadDealSize * (i + 1) - 1;
        if (i == (NUM_THREADS - 1))
        {
            ThreadDealEnd = Querysize - 1;
        }
        thr_data[i].conn = conn;
        thr_data[i].Query = &Query;
        thr_data[i].start = ThreadDealStart;
        thr_data[i].end = ThreadDealEnd;
        thr_data[i].thread = i;
    }
    for (int i = 0; i < NUM_THREADS; i++)
    {
        if ((rc = pthread_create(&thr[i], NULL, thr_func, &thr_data[i])))
        {
            fprintf(stderr, "error: pthread_create, rc: %d\n", rc);
            return EXIT_FAILURE;
        }
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_join(thr[i], NULL);
    }
    cerr << "done\n" << endl;
    end = clock();
    duration = double(end - begin) / CLOCKS_PER_SEC;
    cerr << "runtime: " << duration << "\n" << endl;
    db->closedatabase(OPTIMISTIC);
    delete db;
    printf("Done\n");
    return EXIT_SUCCESS;
}

Like all data structures in the standard library, vector's methods are reentrant but not thread-safe. That means different instances can be accessed by multiple threads independently, but each instance may only be accessed by one thread at a time, and you have to ensure that yourself. But since your threads only ever read from the shared vector, each over its own disjoint range, that's not your problem.
What probably is your problem is the printf. printf is thread-safe, meaning you can call it from any number of threads at the same time, but at the cost of being wrapped in a mutual-exclusion lock internally.
The majority of the work in the threaded part of your program is done inside printf. So what probably happens is that all the threads start and quickly reach the printf, where all but the first stop. When the printf finishes and releases the mutex, the system considers scheduling the threads that were waiting for it. It probably does, so a rather slow context switch happens, and this repeats after every printf.
How exactly this plays out depends on which locking primitive is actually used, which depends on your operating system and standard-library version. The system should wake only the next sleeper each time, but many implementations actually wake all of them. So in addition to the printfs executing in mostly round-robin fashion, incurring one context switch each, there may be quite a few additional spurious wake-ups in which a thread merely finds the lock held and goes back to sleep.
So the lesson from this is that threads don't make things automagically faster. They only help when:
The thread spends most of its time doing blocking system calls. In things like network servers, the threads wait for data from the socket, then for the response data to come from disk, and finally for the network to accept the response. In such cases, having many threads helps as long as they are mostly independent.
There are no more threads than there are hardware threads. Currently the usual number is 4 (either quad-core, or dual-core with hyper-threading). More threads can't physically run in parallel, so they provide no gain and incur a bit of overhead; 16 threads is thus overkill.
And they never help when they all manipulate the same objects, because they end up spending most of their time waiting for locks anyway. In addition to any of your own objects that you lock, keep in mind that input and output file handles have to be internally locked as well.
Memory allocation also needs internal synchronization between threads, but modern allocators keep separate per-thread pools to avoid most of it; if the default allocator proves too slow with many threads, there are specialized ones you can use.
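If you want to see the loop's real parallel speed, one common restructuring (a sketch, not part of the original answer; the worker below is illustrative and stands in for the real database lookup) is to let each thread accumulate output in a private buffer and print it once after pthread_join, so the only serialized writes happen outside the hot loop:

#include <pthread.h>
#include <cstdio>
#include <string>

#define NUM_THREADS 16

struct worker_data {
    unsigned long start, end;
    std::string out;   // per-thread buffer: no locking needed while running
};

void *worker(void *arg) {
    worker_data *d = (worker_data *)arg;
    char line[64];
    for (unsigned long i = d->start; i <= d->end; i++) {
        // ... the real per-item work (e.g. the conn->get lookup) goes here ...
        snprintf(line, sizeof(line), "item %lu\n", i);
        d->out += line;   // append to the private buffer instead of printf
    }
    return NULL;
}

int main() {
    pthread_t thr[NUM_THREADS];
    worker_data data[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
        data[i].start = i * 1000ul;
        data[i].end = data[i].start + 999ul;
        pthread_create(&thr[i], NULL, worker, &data[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(thr[i], NULL);
        fputs(data[i].out.c_str(), stdout);  // one serialized write per thread
    }
    return 0;
}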

Related

Program with pipes and threads exiting before completion

I am writing a C++ program with threading and pipes. I am implementing a parallelized algorithm, and the idea is that I have a main thread that writes data to child threads. Each child thread must read this data, process it, and write the result back to the main thread.
I have stripped this down to a minimal, compiling reproduction of the core logic of the communication and commented the places where I have more code. The program runs and exits without printing "complete". Usually the last value of i printed is between 1 and 9, and the program just terminates without saying anything. I would expect the program to run to completion, but I am not getting any errors and the program exits gracefully, so I am not sure how to debug this.
NOTE: Pipes and pthreads are mandated from elsewhere and are hard requirements. Please don't suggest a solution that uses std::thread or that just communicates between threads within the same address space.
#include <iostream>
#include "pthread.h"
#include "unistd.h"
#include <cstdio>   // fdopen, fprintf
#include <string>
#include <vector>
using namespace std;

void* func(void* args)
{
    std::vector<int> v = *(std::vector<int>*)(args);
    auto FH = fdopen(v[0], "r");
    char buffer[1024];
    int buffer_len = 1024;
    while (fgets(buffer, buffer_len, FH))
    {
        std::string x{buffer};
    }
    // process the result and return it to the parent
    return NULL;
}

int main()
{
    std::vector<std::vector<int> *> pipes{};
    std::vector<pthread_t *> threads{};
    for (int i = 0; i < 20; i++)
    {
        std::cout << i << std::endl;
        int fd[2];
        if (pipe(fd) < 0)
        {
            std::cout << "failed" << std::endl;
            exit(0);
        }
        int fd2[2];
        if (pipe(fd2) < 0)
        {
            std::cout << "failed" << std::endl;
            exit(0);
        }
        std::vector<int> *pipe_info = new std::vector<int>{fd[0], fd[1], fd2[0], fd2[1]};
        auto F = fdopen(fd[1], "w");
        pthread_t *thread = new pthread_t;
        threads.push_back(thread);
        pipes.push_back(pipe_info);
        pthread_create(thread, NULL, func, (void*) pipe_info);
        for (int j = 0; j < 100; j++)
            fprintf(F, "%d", 3);
    }
    // read the data returned from the child threads
    // using fd2 (indices 2,3) in each pipe in pipes.
    // free all allocated memory
    for (auto thread : threads)
    {
        pthread_join(*thread, NULL);
        delete thread;
    }
    std::cout << "complete" << std::endl;
    return 0;
}
I cannot reproduce the problem, and the symptoms you describe seem improbable. On my system the program prints all the numbers and does not terminate; it hangs.
The reason is that pipe() is not a constructor, and fdopen() is not a constructor either. They are C interfaces, and they don't close anything by virtue of leaving the scope. You have to close the fds and FILEs manually. You don't do it, so the threads patiently wait in fgets for more data or EOF, and until main closes the writing end of a pipe, there will be no EOF.
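For illustration, here is a condensed single-pipe version of the question's program with the missing cleanup added (a sketch following the question's names; error handling omitted). The fclose on the write end is what lets fgets in the reader see EOF:

#include <cstdio>
#include <pthread.h>
#include <unistd.h>
#include <vector>

// Reader: consumes lines until EOF, which only arrives once the
// writer closes the write end of the pipe.
void *func(void *args) {
    std::vector<int> *v = (std::vector<int> *)args;
    FILE *fh = fdopen((*v)[0], "r");
    char buffer[1024];
    while (fgets(buffer, sizeof(buffer), fh)) {
        // process the data
    }
    fclose(fh);              // also closes the read end, fd[0]
    return NULL;
}

int main() {
    int fd[2];
    if (pipe(fd) < 0) return 1;
    std::vector<int> pipe_info{fd[0], fd[1]};
    pthread_t thread;
    pthread_create(&thread, NULL, func, &pipe_info);
    FILE *f = fdopen(fd[1], "w");
    for (int i = 0; i < 100; i++)
        fprintf(f, "%d\n", 3);
    fclose(f);               // closes fd[1]; the reader now sees EOF and exits
    pthread_join(thread, NULL);
    std::printf("complete\n");
    return 0;
}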

How to selectively ignore some inotify events?

Let's say I have two processes (simulated in this example with two threads) in a producer-consumer setup. That is, one process writes data to a file, and the other process consumes the data in the file, then clears said file.
The setup I currently have, based on bits and pieces I've thrown together from various resources online, is to use a lock file to ensure that only one process can access the data file at a time. The producer acquires the lock, writes to the file, then releases the lock. Meanwhile, the consumer waits with inotify for modify events, at which point it acquires the lock, consumes the data, and empties the file.
This seems relatively straightforward, but the part that's tripping me up is that when I empty the file in my consumer thread, that triggers another inotify modify event, which sets off the whole flow again and ends with the data file being cleared again, repeating forever.
I've tried a few ways to work around this problem, but none of them seem quite right. I'm worried that doing this wrong will introduce race conditions, or that I'll end up skipping modify events.
Here is my current code:
#include <fstream>
#include <iostream>
#include <string>
#include "fcntl.h"
#include "pthread.h"
#include "sys/file.h"
#include "sys/inotify.h"
#include "sys/stat.h"
#include "unistd.h"

const char* lock_filename = "./test_lock_file";
const char* data_filename = "./test_data_file";

int AquireLock(char const* lockName) {
    mode_t m = umask(0);
    int fd = open(lockName, O_RDWR | O_CREAT, 0666);
    umask(m);
    if (fd < 0 || flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

void ReleaseLock(int fd, char const* lockName) {
    if (fd < 0) return;
    remove(lockName);
    close(fd);
}

void* ConsumerThread(void*) {
    // Set up inotify.
    int file_descriptor = inotify_init();
    if (file_descriptor < 0) return nullptr;
    int watch_descriptor =
        inotify_add_watch(file_descriptor, data_filename, IN_MODIFY);
    if (watch_descriptor < 0) return nullptr;
    char buf[4096] __attribute__((aligned(__alignof__(inotify_event))));
    while (true) {
        // Read new events.
        const inotify_event* event;
        ssize_t numRead = read(file_descriptor, buf, sizeof(buf));
        if (numRead <= 0) return nullptr;
        // For each event, do stuff.
        for (int i = 0; i < numRead; i += sizeof(inotify_event) + event->len) {
            event = reinterpret_cast<inotify_event*>(&buf[i]);
            // Critical section!
            int fd = AquireLock(lock_filename);
            // Read from the file.
            std::string line;
            std::ifstream data_file(data_filename);
            if (data_file.is_open()) {
                while (getline(data_file, line)) {
                    std::cout << line << std::endl;
                }
                data_file.close();
                // Clear the file by opening then closing without writing to it.
                std::ofstream erase_data_file(data_filename);
                erase_data_file.close();
                std::cout << "file cleared." << std::endl;
            }
            ReleaseLock(fd, lock_filename);
            // Critical section over!
        }
    }
    return nullptr;
}

int main(int argc, char** argv) {
    // Set up the consumer thread.
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, ConsumerThread, nullptr);
    if (rc) return rc;
    // Producer thread: periodically write to the file.
    while (true) {
        sleep(3);
        // Critical section!
        int fd = AquireLock(lock_filename);
        // Write some text to the file.
        std::ofstream data_file(data_filename);
        int counter = 0;
        if (data_file.is_open()) {
            std::cout << "Writing to file.\n";
            data_file << "This is some example data. " << counter++ << "\n";
            data_file.close();
        }
        ReleaseLock(fd, lock_filename);
        // Critical section over!
    }
    pthread_exit(NULL);
    return 0;
}
One idea I had was to disable tracking of modify events at the start of the consumer thread's critical section with inotify_rm_watch, then re-add the watch right before leaving the critical section. This doesn't seem to work, though: even with the watch removed, modify events are still triggered and I'm not sure why.
I've also considered using a boolean to record whether the file had any contents while consuming it, and only clearing the file if it wasn't empty (see the sketch below). This feels kind of hacky since it still does a second, unnecessary iteration of the loop, but if I can't find a better solution I might just go with that. Ideally there would be a way to have only the producer thread's modifications trigger events, while the consumer's own file modifications are somehow ignored or disabled, but I'm not sure how to achieve that.
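For reference, the boolean workaround described above might look like this as a helper called from the consumer's critical section (a sketch against the question's code, not a fix for the underlying self-triggering):

#include <fstream>
#include <iostream>
#include <string>

// Sketch of the workaround: consume the file, but only truncate it if it
// actually had content. The modify event triggered by the truncation then
// finds an empty file and clears nothing, so the cycle stops after one
// extra, harmless iteration.
void ConsumeAndMaybeClear(const char* data_filename) {
    bool had_content = false;
    std::string line;
    std::ifstream data_file(data_filename);
    while (std::getline(data_file, line)) {
        std::cout << line << std::endl;
        had_content = true;
    }
    data_file.close();
    if (had_content) {
        std::ofstream erase_data_file(data_filename);  // open + close truncates
        std::cout << "file cleared." << std::endl;
    }
}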

Parsing large file with MPI in C++

I have a C++ program in which I want to parse a huge file, looking for matches of a regex that I've implemented. The program worked fine when executed sequentially, but then I wanted to run it using MPI.
I started the adaptation to MPI by differentiating the master (the one that coordinates the execution) from the workers (the ones that parse the file in parallel) in the main function. Something like this:
MPI::Init(argc, argv);
...
if (rank == 0) {
    ...
    // Master sends initial and ending byte to every worker
    for (int i = 1; i < total_workers; i++) {
        array[0] = (i-1) * first_worker_file_part;
        array[1] = i * first_worker_file_part;
        MPI::COMM_WORLD.Send(array, 2, MPI::INT, i, 1);
    }
}
if (rank != 0)
    readDocument();
...
MPI::Finalize();
The master sends every worker an array with two positions: position 0 contains the byte offset where that worker should start reading the file, and position 1 the byte where it should stop reading.
The readDocument() function looks like this for now (no parsing yet, just each worker reading its part of the file):
void readDocument()
{
    array = new int[2];
    MPI::COMM_WORLD.Recv(array, 2, MPI::INT, 0, 1, status); // receive the two byte offsets
    int read_length = array[1] - array[0];
    char* buffer = new char[read_length];
    if (infile)
    {
        infile.seekg(array[0]); // Start reading at the assigned byte
        infile.read(buffer, read_length);
    }
}
I've tried different examples, from writing the output of the read to a file to running with different numbers of processes. What happens is that when I run the program with 20 processes instead of 10, for example, it takes twice as long to read the file. I expected it to take nearly half the time, and I can't figure out why this is happening.
Also, on a different matter, I want to make the master wait for all the workers to complete their execution and then print the final time. Is there any way to "block" it while the workers are processing, like a cond_wait in C pthreads?
In my experience people working on computer systems with parallel file systems tend to know about those parallel file systems so your question marks you out, initially, as someone not working on such a system.
Without specific hardware support reading from a single file boils down to the system positioning a single read head and reading a sequence of bytes from the disk to memory. This situation is not materially altered by the complex realities of many modern file systems, such as RAID, which may in fact store a file across multiple disks. When multiple processes ask the operating system for access to files at the same time the o/s parcels out disk access according to some notion, possibly of fairness, so that no process gets starved. At worst the o/s spends so much time switching disk access from process to process that the rate of reading drops significantly. The most efficient, in terms of throughput, approach is for a single process to read an entire file in one go while other processes do other things.
This situation, multiple processes contending for scarce disk i/o resources, applies whether or not those processes are part of a parallel, MPI (or similar) program or entirely separate programs running concurrently.
The impact is what you observe: instead of 10 processes each waiting to get their own 1/10th share of the file, you have 20 processes each waiting for their 1/20th share. Oh, you cry, but each process is only reading half as much data, so the whole gang should take the same amount of time to get the file. No, I respond, you've forgotten to add the time it takes the o/s to position and reposition the read/write heads between accesses. Read time comprises latency (how long it takes for reading to start once the request has been made) and throughput (how fast the i/o system can pass the bytes to and fro).
It should be easy to come up with some reasonable estimates of latency and bandwidth that explains the twice as long reading by 20 processes as by 10.
How can you solve this ? You can't, not without a parallel file system. But you might find that having the master process read the whole file and then parcel it out to be faster than your current approach. You might not, you might just find that the current approach is the fastest for your whole computation. If read time is, say, 10% of total computation time you might decide it's a reasonable overhead to live with.
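To illustrate the master-reads-then-parcels-out alternative from the last paragraph, here is a minimal sketch using MPI_Scatterv (illustrative only: naive chunk boundaries, no overlap for a regex spanning a boundary, and minimal error handling):

#include <mpi.h>
#include <cstdio>
#include <vector>

// Sketch: rank 0 reads the entire file, then scatters contiguous chunks
// to all ranks, so only one process touches the disk.
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long fsize = 0;
    std::vector<char> whole;
    if (rank == 0) {
        FILE *f = fopen(argv[1], "rb");
        fseek(f, 0, SEEK_END);
        fsize = ftell(f);
        fseek(f, 0, SEEK_SET);
        whole.resize(fsize);
        if (fread(whole.data(), 1, fsize, f) != (size_t)fsize) fsize = 0;
        fclose(f);
    }
    MPI_Bcast(&fsize, 1, MPI_LONG, 0, MPI_COMM_WORLD);

    // Split the byte range as evenly as possible.
    std::vector<int> counts(size), displs(size);
    for (int i = 0; i < size; i++) {
        counts[i] = (int)(fsize / size) + (i < (int)(fsize % size) ? 1 : 0);
        displs[i] = (i == 0) ? 0 : displs[i - 1] + counts[i - 1];
    }
    std::vector<char> chunk(counts[rank]);
    MPI_Scatterv(whole.data(), counts.data(), displs.data(), MPI_CHAR,
                 chunk.data(), counts[rank], MPI_CHAR, 0, MPI_COMM_WORLD);
    printf("Rank %d got %d bytes\n", rank, counts[rank]);

    MPI_Finalize();
    return 0;
}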
To add to High Performance Mark's correct answer: one can use MPI-IO to do the file reading, providing (in this case) hints to the IO routines not to read from every processor; and this same code with a modified (or empty) MPI_Info should be able to take advantage of a parallel file system as well, should you move to a cluster that has one. For the most common MPI-IO implementation, ROMIO, the manual describing the available hints is here; in particular, we're using
MPI_Info_set(info, "cb_config_list","*:1");
to set the number of readers to one per node. The code below lets you try reading the file using either MPI-IO or POSIX (e.g., seek).
#include <iostream>
#include <fstream>
#include <string>
#include <mpi.h>

void partitionFile(const int filesize, const int rank, const int size,
                   const int overlap, int *start, int *end) {
    int localsize = filesize/size;
    *start = rank * localsize;
    *end = *start + localsize-1;
    if (rank != 0) *start -= overlap;
    if (rank != size-1) *end += overlap;
}

void readdataMPI(MPI_File *in, const int rank, const int size, const int overlap,
                 char **data, int *ndata) {
    MPI_Offset filesize;
    int start;
    int end;
    // figure out who reads what
    MPI_File_get_size(*in, &filesize);
    partitionFile((int)filesize, rank, size, overlap, &start, &end);
    *ndata = end - start + 1;
    // allocate memory
    *data = new char[*ndata + 1];
    // everyone reads in their part
    MPI_File_read_at_all(*in, (MPI_Offset)start, *data,
                         (MPI_Offset)(*ndata), MPI_CHAR, MPI_STATUS_IGNORE);
    (*data)[*ndata] = '\0';
}

void readdataSeek(std::ifstream &infile, int array[2], char *buffer)
{
    int read_length = array[1] - array[0];
    if (infile)
    {
        infile.seekg(array[0]); // Start reading at the assigned byte
        infile.read(buffer, read_length);
    }
}

int main(int argc, char **argv) {
    MPI_File in;
    int rank, size;
    int ierr;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (argc != 3) {
        if (rank == 0)
            std::cerr << "Usage: " << argv[0] << " infilename [MPI|POSIX]" << std::endl;
        MPI_Finalize();
        return -1;
    }

    std::string optionMPI("MPI");
    if ( !optionMPI.compare(argv[2]) ) {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_config_list","*:1"); // ROMIO: one reader per node
        // Eventually, should be able to use io_nodes_list or similar
        ierr = MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY, info, &in);
        if (ierr) {
            if (rank == 0)
                std::cerr << "Usage: " << argv[0] << " Couldn't open file " << argv[1] << std::endl;
            MPI_Finalize();
            return -1;
        }
        const int overlap = 1;
        char *data;
        int ndata;
        readdataMPI(&in, rank, size, overlap, &data, &ndata);
        std::cout << "MPI: Rank " << rank << " has " << ndata << " characters." << std::endl;
        delete [] data;
        MPI_File_close(&in);
        MPI_Info_free(&info);
    } else {
        int fsize;
        if (rank == 0) {
            std::ifstream file( argv[1], std::ios::ate );
            fsize = file.tellg();
            file.close();
        }
        MPI_Bcast(&fsize, 1, MPI_INT, 0, MPI_COMM_WORLD);
        int start, end;
        partitionFile(fsize, rank, size, 1, &start, &end);
        int array[2] = {start, end};
        char *buffer = new char[end-start+2];
        std::ifstream infile;
        infile.open(argv[1], std::ios::in);
        readdataSeek(infile, array, buffer);
        buffer[end-start+1] = '\0';
        std::cout << "Seeking: Rank " << rank << " has " << end-start+1 << " characters." << std::endl;
        infile.close();
        delete [] buffer;
    }
    MPI_Finalize();
    return 0;
}
On my desktop, I don't get much of a performance difference, even when oversubscribing the cores (e.g., running many seeking processes):
$ time mpirun -np 20 ./read-chunks moby-dick.txt POSIX
Seeking: Rank 0 has 62864 characters.
[...]
Seeking: Rank 8 has 62865 characters.
real 0m1.250s
user 0m0.290s
sys 0m0.190s
$ time mpirun -np 20 ./read-chunks moby-dick.txt MPI
MPI: Rank 1 has 62865 characters.
[...]
MPI: Rank 4 has 62865 characters.
real 0m1.272s
user 0m0.337s
sys 0m0.265s

Is this piece of code correct?

I compiled it on Linux with: g++ test.c -o test
I rewrote the original example.
Now I make the first process wait 2 seconds (so that process 2 can write to the shared memory), then I have process 1 read from that memory. Is this test correct?
Second question: where should I put:
shmdt(tests[0]); // or 1
shmctl(statesid, IPC_RMID, 0);
// Global scope
char *state[2];
//...
//...
struct teststruct {
    int stateid;
    teststruct *next;
    // other things
};

void write(teststruct &t, char* what)
{
    strcpy(state[t.next->stateid], what);
    printf("\n\nI am (%d), I wrote on: %d", t.stateid, t.next->stateid);
}

void read(teststruct &t)
{
    printf("\n\nI am (%d), I read: **%s**", t.stateid, state[t.stateid]);
}

int main() {
    key_t key;
    if ((key = ftok(".", 'a')) == -1) {
        perror("ftok");
        exit(1);
    }
    int statesid;
    if ((statesid = shmget(key, sizeof(char*)*50, 0600 | IPC_CREAT )) == -1) {
        perror("shmget error");
        exit(1);
    }
    state[0] = (char*)shmat(statesid, NULL, 0);
    state[1] = (char*)shmat(statesid, NULL, 0);

    teststruct tests[2];
    tests[0].stateid = 0;
    tests[0].next = &tests[1];
    tests[1].stateid = 1;
    tests[1].next = &tests[0];

    int t0, t1;
    switch (t0 = fork()) {
    case (0):
        sleep(2);
        read(tests[0]);
        exit(0);
    case (-1):
        printf("\nError!");
        exit(-1);
    default:
        wait();
    }
    switch (t1 = fork()) {
    case (0):
        write(tests[1], "1 write on 0 in theory.");
        exit(0);
    case (-1):
        printf("\nError!");
        exit(-1);
    default:
        wait();
    }
    return 0;
}
In particular, I am asking whether "state" is really shared between the two processes, and whether what I've done is a good way to do that.
My goal is to make char *state[2] shared (for reading and modifying) between the two processes after fork.
You don't need to call shmat() twice. You've only allocated enough space for two pointers, so you can't communicate much between the two processes. And you can't rely on being able to copy a pointer to memory in the first process into shared memory and then have the second process read and use it. The address may be valid in the first process and not in the second; it may well point at completely different data in the second process (dynamic memory allocation in particular could screw this up). You can only rely on the contents of the shared memory being the same in both processes. You should allocate enough shared memory to hold the shared data.
However, with that said, the two processes should be sharing that small piece of shared memory, and in both processes, state[0] and state[1] will point at the shared memory and you should be able to communicate between the two by writing in the shared memory. Note that after forking, if either process changes the value stored in its state[0] or state[1], the other process will not see that change — the other process can only see what changes in the shared memory those pointers point to.
Of course, you've not set up any synchronization mechanism, so the access will likely be chaotic.
How can I modify my code just to make it work as intended (without considering synchronization issues)?
It isn't entirely clear how it is intended to work, which complicates answering the question. However, if you want (for sake of example) the child process to write a word to the shared memory and the parent process to read the word from shared memory, then you allocate enough shared memory for the biggest word you're willing to process, then arrange for the child to copy a word from its per-process memory into the shared memory (and notify the parent that it has done so), and then the parent can copy or read the word from shared memory and compare it with data from its per-process memory.
Because you have a parent-child process which are forks of the same process, you will find that the two processes share a lot of the same memory addresses containing the same information. This is, however, coincidental. You can have unrelated processes connect to shared memory, and they need not have any addresses in common. Thus, it would be trivial to get spurious results from your current setup.
Working Code
For some definitions of 'working', the following C++ code does. The code is subtly C++: it relies on struct teststruct declaring the type teststruct, and it uses references as parameters.
Note that the (revised) code in the question has its wait() calls infelicitously placed.
shm2.cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

static char *state = 0;

struct teststruct
{
    int stateid;
    teststruct *next;
};

void sm_write(teststruct &t, char* /*what*/)
{
    //strcpy(state[t.next->stateid], what);
    printf("[%5d] I am (%d), I wrote on: %d\n", (int)getpid(), t.stateid, t.next->stateid);
}

void sm_read(teststruct &t)
{
    printf("[%5d] I am (%d), I read: **%s**\n", (int)getpid(), t.stateid, state);
}

int main(void)
{
    key_t key;
    if ((key = ftok(".", 'a')) == -1) {
        perror("ftok");
        exit(1);
    }
    int statesid;
    if ((statesid = shmget(key, sizeof(char)*512, 0600 | IPC_CREAT )) == -1) {
        perror("shmget error");
        exit(1);
    }
    if ((state = (char*)shmat(statesid, NULL, 0)) == 0)
    {
        perror("shmat");
        exit(1);
    }
    sprintf(state, "This is a string in shared memory %d", 919);

    teststruct tests[2];
    tests[0].stateid = 0;
    tests[0].next = &tests[1];
    tests[1].stateid = 0;
    tests[1].next = &tests[0];

    int t0, t1;
    if ((t0 = fork()) < 0)
    {
        perror("fork-1");
        exit(1);
    }
    else if (t0 == 0)
    {
        sm_read(tests[0]);
        printf("[%5d] sleeping\n", (int)getpid());
        sleep(2);
        printf("[%5d] waking\n", (int)getpid());
        sm_read(tests[0]);
        exit(0);
    }
    else if ((t1 = fork()) < 0)
    {
        perror("fork-2");
        exit(-1);
    }
    else if (t1 == 0)
    {
        printf("[%5d] sleeping\n", (int)getpid());
        sleep(1);
        printf("[%5d] waking\n", (int)getpid());
        strcpy(state, "1 write on 0 in theory.");
        sm_write(tests[1], state);
        exit(0);
    }

    int corpse;
    int status;
    while ((corpse = wait(&status)) > 0)
        printf("PID %5d died with status 0x%.4X\n", corpse, status);

    return 0;
}
Example run
[20440] sleeping
[20440] waking
[20440] I am (0), I wrote on: 0
[20439] I am (0), I read: **This is a string in shared memory 919**
[20439] sleeping
[20439] waking
[20439] I am (0), I read: **1 write on 0 in theory.**
PID 20440 died with status 0x0000
PID 20439 died with status 0x0000
You have a problem with the size of the shared memory. In:
(statesid = shmget(key, sizeof(char*)*2, 0600 | IPC_CREAT )
you are only reserving space for two pointers to char. You need to allocate enough space for all your data, which, based on the struct, is a kind of linked structure. The code could be something like the following, though the purpose of the fork() and the shared memory is not very clear to me:
#include <cstdio>
#include <cstdlib>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

struct teststruct {
    int stateid;
    teststruct *next;
    // other things
};

void dosomething(teststruct &t) {
    // forget about global space, you don't need it
}

int main() {
    key_t key;
    if ((key = ftok(".", 'a')) == -1) {
        perror("ftok");
        exit(1);
    }
    int statesid;
    int size_struct = sizeof(teststruct)*2; // assuming you will have only 1 level of linking
    if ((statesid = shmget(key, size_struct, 0600 | IPC_CREAT)) == -1) {
        perror("shmget error");
        exit(1);
    }
    teststruct tests[2] = {}; // the question's objects (undefined in the original snippet)
    // if you need to hold just one teststruct object's data, you can do
    teststruct* p_test_struct = (teststruct*)shmat(statesid, NULL, 0);
    for (int i = 0; i < 2; i++) {
        *p_test_struct = tests[i]; // this actually writes tests[i] into shared mem
        int t0;
        switch (t0 = fork()) {
        case (0):
            dosomething(*p_test_struct);
            exit(0);
        case (-1):
            printf("\nError!");
            exit(-1);
        default:
            wait(NULL);
        }
    }
    return 0;
}
No, it is not. You are using fork() (multiple processes) instead of threads (multithreading). Memory is not shared between the parent and child processes: the child starts with a copy containing the same values, but from then on each process's memory is independent of the other's.
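A tiny demonstration of that distinction (a sketch: the ordinary global is copied at fork(), while the shmat() segment really is shared):

#include <cstdio>
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int plain = 0;  // ordinary memory: each process gets its own copy at fork()

int main() {
    int id = shmget(IPC_PRIVATE, 64, 0600 | IPC_CREAT);
    char *shared = (char *)shmat(id, NULL, 0);  // shared between processes
    strcpy(shared, "before");
    if (fork() == 0) {
        plain = 42;                  // changes only the child's copy
        strcpy(shared, "after");     // visible to the parent
        return 0;
    }
    wait(NULL);
    printf("plain=%d shared=%s\n", plain, shared);  // prints: plain=0 shared=after
    shmdt(shared);
    shmctl(id, IPC_RMID, NULL);
    return 0;
}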

WinAPI's Sleep doesn't work inside child thread

I'm a beginner and I'm trying to reproduce a race condition in order to familiarize myself with the issue. To do that, I created the following program:
#include <Windows.h>
#include <iostream>
using namespace std;

#define numThreads 1000

DWORD __stdcall addOne(LPVOID pValue)
{
    int* ipValue = (int*)pValue;
    *ipValue += 1;
    Sleep(5000ull);
    *ipValue += 1;
    return 0;
}

int main()
{
    int value = 0;
    HANDLE threads[numThreads];
    for (int i = 0; i < numThreads; ++i)
    {
        threads[i] = CreateThread(NULL, 0, addOne, &value, 0, NULL);
    }
    WaitForMultipleObjects(numThreads, threads, true, INFINITE);
    cout << "resulting value: " << value << endl;
    return 0;
}
I added a Sleep inside the thread function in order to provoke the race condition since, as I understand it, with just a single increment as the workload the race doesn't manifest: a thread is created, runs its workload, and happens to finish before the thread created in the next iteration starts its own. My problem is that the Sleep() inside the thread function seems to be ignored. I set the parameter to 5 seconds and expect the program to run for at least 5 seconds, but instead it finishes immediately. When I place Sleep(5000) inside the main function, the program runs as expected (> 5 secs). Why is the Sleep inside the thread function ignored?
But anyway, even if the Sleep() is ignored, the program outputs this every time it is launched:
resulting value: 1000
while the correct answer should be 2000. Can you guess why that is happening?
WaitForMultipleObjects only allows waiting for up to MAXIMUM_WAIT_OBJECTS (which is currently 64) threads at a time. If you take that into account:
#include <Windows.h>
#include <iostream>
using namespace std;

#define numThreads MAXIMUM_WAIT_OBJECTS

DWORD __stdcall addOne(LPVOID pValue) {
    int* ipValue = (int*)pValue;
    *ipValue += 1;
    Sleep(5000);
    *ipValue += 1;
    return 0;
}

int main() {
    int value = 0;
    HANDLE threads[numThreads];
    for (int i = 0; i < numThreads; ++i) {
        threads[i] = CreateThread(NULL, 0, addOne, &value, 0, NULL);
    }
    WaitForMultipleObjects(numThreads, threads, true, INFINITE);
    cout << "resulting value: " << value << endl;
    return 0;
}
...things work much more as you'd expect. Whether you'll actually see results from the race condition is, of course, a rather different story, but on multiple runs I do see slight variations in the resulting value (e.g., a low of around 125).
Jerry Coffin has the right answer, but just to save you typing:
#include <Windows.h>
#include <iostream>
#include <assert.h>
using namespace std;

#define numThreads 1000

DWORD __stdcall addOne(LPVOID pValue)
{
    int* ipValue = (int*)pValue;
    *ipValue += 1;
    Sleep(5000);
    *ipValue += 1;
    return 0;
}

int main()
{
    int value = 0;
    HANDLE threads[numThreads];
    for (int i = 0; i < numThreads; ++i)
    {
        threads[i] = CreateThread(NULL, 0, addOne, &value, 0, NULL);
    }
    DWORD Status = WaitForMultipleObjects(numThreads, threads, true, INFINITE);
    assert(Status != WAIT_FAILED);
    cout << "resulting value: " << value << endl;
    return 0;
}
When things go wrong, make sure you've asserted the return value of any Windows API function that can fail. If you really badly need to wait on lots of threads, it is possible to overcome the 64-object limit by chaining: for every additional 64 threads you need to wait on, you sacrifice a thread whose sole purpose is to wait on 64 other threads, and so on. We (Windows Developer's Journal) published an article demonstrating the technique years ago, but I can't recall the author's name off the top of my head.
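As an illustration of working around the limit without the dedicated waiter threads, here is a minimal sketch that simply waits in sequential batches of MAXIMUM_WAIT_OBJECTS; it is adequate when, as here, you need all threads to finish (this is not the chaining technique from the article):

#include <Windows.h>

// Wait for an arbitrary number of handles by processing them in batches
// of at most MAXIMUM_WAIT_OBJECTS. Because we must wait for all of them
// anyway, waiting for the batches one after another is equivalent.
BOOL WaitForManyObjects(HANDLE *handles, DWORD count)
{
    while (count > 0) {
        DWORD batch = count < MAXIMUM_WAIT_OBJECTS ? count : MAXIMUM_WAIT_OBJECTS;
        DWORD status = WaitForMultipleObjects(batch, handles, TRUE, INFINITE);
        if (status == WAIT_FAILED)
            return FALSE;
        handles += batch;
        count -= batch;
    }
    return TRUE;
}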