memory leak with sockets and map - c++

I have a socket server; every time a new connection is made, an XClient object is instantiated and inserted into a map. I am watching the memory usage through Task Manager. Let's say that every time a new connection is made, the memory usage of my program increases by 800 KB (just an example, not a precise value). Inside that class there is a connected variable, which tells me whether the client is active or not. I created a thread that runs endlessly, iterates through all the elements of my map, and checks whether connected is true or false. If it is false, I am (at least I think I am...) releasing the memory used by the previously instantiated XClient object. BUT the memory usage decreases by only half of the 800 KB. So, when a client connects: +800 KB; when the client disconnects: -400 KB. Do I have a memory leak? With 100 clients, the 400 KB that is not released per client would add up to 40,000 KB of unused(?) memory, and that would be a problem.
So, here is my code.
The thread to iterate through all elements:
DWORD Update(XSockets *sockets)
{
while(true)
{
for(sockets->it = sockets->clients.begin(); sockets->it != sockets->clients.end(); sockets->it++)
{
int key = (*sockets->it).first;
if(sockets->clients[key]->connected == false) // remove the client, releasing memory
{
delete sockets->clients[key];
}
}
Sleep(100);
}
return true;
}
The code that is adding new XClients instances to my map:
bool XSockets::AcceptConnections()
{
struct sockaddr_in from;
while(true)
{
try
{
int fromLen = sizeof(from);
SOCKET client = accept(this->loginSocket,(struct sockaddr*)&from,&fromLen);
if(client != INVALID_SOCKET)
{
srand(time(NULL));
int clientKey = rand();
XClient* clientClass = new XClient(inet_ntoa(from.sin_addr),clientKey,client);
this->clients.insert(make_pair(clientKey,clientClass));
}
Sleep(100);
}
catch(...)
{
printf("error accepting incoming connection!\r\n");
break;
}
}
closesocket(this->loginSocket);
WSACleanup();
return true;
}
And the declarations:
map<int,XClient*> clients;
map<int,XClient*>::iterator it;

You've got several problems, but the chief one is that you appear to be sharing a map between threads without any synchronization at all. That can lead to all kinds of trouble.

Are you using C++11 or Boost? To avoid memory-leak nightmares like this, you could use a map of shared pointers. That way, the structure cleans itself up.
This is how I would do it:
#include <memory>
#include <map>
#include <mutex>
typedef std::shared_ptr<XClient> XClientPtr;
std::map<int, XClientPtr> clients; // the XSockets member, replacing map<int,XClient*> clients
std::mutex the_lock;               // guards every access to clients
bool XSockets::AcceptConnections()
{
/* snip */
auto clientClass = std::make_shared<XClient>(/*... params ...*/);
{
std::lock_guard<std::mutex> guard(the_lock); // released automatically, even if an exception is thrown
clients[clientKey] = clientClass;
}
/* snip */
}
DWORD Update(XSockets *sockets) {
while(true) { /* You should probably have some kind of
exit condition here. Like a global "running" bool
so that the thread will eventually stop. */
{
std::lock_guard<std::mutex> guard(the_lock);
auto it = sockets->clients.begin(), end = sockets->clients.end();
for(; it != end; ) {
if (!it->second->connected)
//Clients will be destructed here if their refcount goes to 0
sockets->clients.erase(it++);
else
++it;
}
} // lock released here, so AcceptConnections is not blocked during Sleep
Sleep(100);
}
return 1;
}
Note: Above code is untested. I haven't even tried to compile it.
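The sketch below shows the same clean-up without real sockets. FakeClient, add_client, disconnect_client and reap_disconnected are hypothetical names that stand in for XClient, AcceptConnections and the Update loop. The destructor counter is there only to make the shared_ptr lifetimes visible.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for XClient: it only carries the `connected` flag and
// counts how many instances have been destroyed.
struct FakeClient {
    static int destroyed;
    bool connected = true;
    ~FakeClient() { ++destroyed; }
};
int FakeClient::destroyed = 0;

using FakeClientPtr = std::shared_ptr<FakeClient>;

std::map<int, FakeClientPtr> clients;  // shared between threads
std::mutex clients_lock;               // guards every access to `clients`

// What AcceptConnections would do for each accepted socket.
void add_client(int key) {
    auto client = std::make_shared<FakeClient>();
    std::lock_guard<std::mutex> guard(clients_lock);
    clients[key] = client;
}

// Marks a client as disconnected, as the socket code would when it closes.
void disconnect_client(int key) {
    std::lock_guard<std::mutex> guard(clients_lock);
    auto it = clients.find(key);
    if (it != clients.end()) it->second->connected = false;
}

// One pass of the Update loop. Erasing drops the map's reference; the client
// is destroyed as soon as no other shared_ptr refers to it.
void reap_disconnected() {
    std::lock_guard<std::mutex> guard(clients_lock);
    for (auto it = clients.begin(); it != clients.end(); ) {
        if (!it->second->connected)
            clients.erase(it++);  // advance first, then erase the old position
        else
            ++it;
    }
}
```

Another thread may still hold a copy of a client's shared_ptr when its entry is erased. In that case the object survives until that copy is released. A raw delete in the same situation would cause a use-after-free.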

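See the question "What happens to an STL iterator after erasing it in VS, UNIX/Linux?". You are not erasing every element, so you cannot increment the iterator unconditionally in the header of a for loop. Use a while loop that advances the iterator manually. Also erase the map entry after deleting the object, so the map never holds a dangling pointer: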
sockets->it = sockets->clients.begin();
while (sockets->it != sockets->clients.end())
{
if (sockets->it->second->connected == false) // remove the client, releasing memory
{
delete sockets->it->second;              // free the XClient object
sockets->clients.erase(sockets->it++);   // it++ moves on before the old position is erased
}
else
{
++sockets->it;
}
}
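If shared ownership is not needed, a minimal alternative (assuming C++11 or later) is to let the map own each client through std::unique_ptr. The loop can then use the iterator that map::erase returns. Client and remove_disconnected are hypothetical names, and Client stands in for XClient.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical minimal client type; only the `connected` flag matters here.
struct Client {
    bool connected = true;
};

// The map owns each client, so erasing an entry also deletes the client:
// there is no separate `delete` to forget or to run twice.
using ClientMap = std::map<int, std::unique_ptr<Client>>;

// Removes every disconnected client and returns how many were removed.
// Since C++11, map::erase returns the iterator after the erased element.
std::size_t remove_disconnected(ClientMap& clients) {
    std::size_t removed = 0;
    for (auto it = clients.begin(); it != clients.end(); ) {
        if (!it->second->connected) {
            it = clients.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Across threads this still needs a mutex, as in the shared_ptr answer above. unique_ptr fixes ownership, not synchronization.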

Related

Let recv() only the previous accepted socket

I am using this server application:
I'd like to add some conditions to FD_ISSET() before recv():
if (`client's socket` was the previous `accepted socket`) {
canRecv = TRUE;
} else {
canRecv = FALSE;
}
This is my idea of the program's functionality:
1. recv only from the previous accepted socket
2. Wait for the communication to end
3. FD_CLR()
I don't know how to:
1. loop through each fd from select()
2. let only one recv()
3. return the others to the queue of select()
I use simple example from IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzab6/xnonblock.htm
You could create a std::vector<int> sockets; to keep your sockets. Checking if it's the latest you added will then be done by just checking if(current_socket == sockets[sockets.size()-1]) ...
Here's an example with a helper class to keep a list of your sockets and function for waiting on activity.
#include <algorithm>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>
#include <sys/select.h>
constexpr unsigned other_socket = 0b00;
constexpr unsigned server_socket = 0b01;
constexpr unsigned latest_addition = 0b10;
class SocketList {
public:
explicit SocketList(int server) { FD_ZERO(&all_fds); add(server); }
void add(int s) {
sockets.push_back(s);
FD_SET(s, &all_fds);
if(s > max_fd) max_fd = s;
}
// stop watching a socket (e.g. once its communication has ended)
void remove(int s) {
FD_CLR(s, &all_fds);
sockets.erase(std::remove(sockets.begin(), sockets.end(), s), sockets.end());
}
// return the ready sockets and a state for each
std::vector<std::pair<int, unsigned>> wait() {
fd_set readfds;
int ready_sockets;
do {
readfds = all_fds; // select() overwrites its argument, so pass a copy
ready_sockets = select(max_fd + 1, &readfds, nullptr, nullptr, nullptr);
} while(ready_sockets == -1 && errno == EINTR); // retry if interrupted
// throw if an error occurred
if(ready_sockets == -1) throw std::runtime_error(std::strerror(errno));
std::vector<std::pair<int, unsigned>> result;
// loop through each fd used in the select()
for(int s : sockets) {
if(FD_ISSET(s, &readfds)) {
auto x = other_socket;
if(s == sockets.front()) x |= server_socket;
if(s == sockets.back()) x |= latest_addition;
result.emplace_back(s, x);
}
}
return result;
}
private:
int max_fd = 0;
fd_set all_fds; // every socket being watched
std::vector<int> sockets;
};
It can be used like this:
int server = socket(...);
SocketList ss(server);
// all sockets in result are ready
auto result = ss.wait();
for(auto [sock, state] : result) {
if(state & server_socket) {
// do server things on sock
} else if(state & latest_addition) {
// do stuff if sock was the latest addition
} else {
// do this if sock is not the server socket or the latest addition
}
}
recv only from the previous accepted socket
Wait for the communication to end
FD_CLR()
For that you really don't need select. Just recv directly on the previously accepted socket. This is usually not good behavior for a server that is supposed to serve many clients simultaneously. A bad client could connect without sending anything, and that would stop the server from responding to any new clients until the bad client decides to disconnect (if that ever happens).
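Reading straight from the previously accepted socket could look like the sketch below, assuming a blocking, connected stream socket. recv_until_closed is a hypothetical helper. It keeps calling recv() until the peer closes its side (recv() returns 0), which corresponds to "wait for the communication to end". After that, the server would close() the socket.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Reads from one connected socket until the peer closes its side and appends
// everything received to `out`. This blocks: while it runs, the server serves
// nobody else, which is the drawback described above.
// Returns false on a receive error other than EINTR.
bool recv_until_closed(int sock, std::string& out) {
    char buf[4096];
    for (;;) {
        ssize_t n = recv(sock, buf, sizeof buf, 0);
        if (n > 0) {
            out.append(buf, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return true;              // orderly shutdown by the peer
        } else if (errno != EINTR) {
            return false;             // real error
        }                             // EINTR: just retry
    }
}
```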
I don't know how to:
1. loop through each fd from select()
That is shown in the code above.
let only one recv()
When you have the result vector in the example above, you can loop through them and only keep the part dealing with latest_addition:
if(state & latest_addition) {
// do stuff if sock was the latest addition
}
return the others to the queue of select()
The state of the other ready sockets in result will remain unchanged if you don't read from them, so they are returned automatically. This also means that the next select will return immediately if you don't read from all fds that are ready, so the program will spin really fast until there's some action on the latest added socket again, effectively making this a polling program and the select is sort of useless.
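The level-triggered behaviour described above can be checked with a local socket pair. select() keeps reporting a descriptor as readable until its data has been consumed. readable_now is a hypothetical helper that polls one descriptor with a zero timeout.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns true if `fd` is readable right now. A zero timeout makes select()
// a pure poll that never waits.
bool readable_now(int fd) {
    fd_set readfds;
    FD_ZERO(&readfds);       // select() overwrites the set, so rebuild it each call
    FD_SET(fd, &readfds);
    timeval zero{};          // tv_sec = 0, tv_usec = 0
    int r = select(fd + 1, &readfds, nullptr, nullptr, &zero);
    return r > 0 && FD_ISSET(fd, &readfds);
}
```

Calling it twice on a socket with unread data gives true both times. This is why a select loop that skips reading some ready sockets keeps waking up immediately.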

No need for mutex, race conditions not always bad, do they?

I'm getting this crazy idea that mutex synchronization can be omitted in some cases when most of us would typically want and would use mutex synchronization.
Ok suppose you have this case:
Buffer *buffer = new Buffer(); // Initialized by main thread;
...
// The call to buffer's `accumulateSomeData` method is thread-safe
// and is heavily executed by many workers from different threads simultaneously.
buffer->accumulateSomeData(data); // While the code inside is equivalent to vector->push_back()
...
// All lines of code below are executed by a totally separate timer
// thread that executes once per second until the program is finished.
auto bufferPrev = buffer; // A temporary pointer to previous instance
// Switch buffers, put old one offline
buffer = new Buffer();
// As of this line of code all the threads will switch to new instance
// of buffer. Which yields that calls to `accumulateSomeData`
// are executed over new buffer instance. Which also means that old
// instance is kinda taken offline and can be safely operated from a
// timer thread.
bufferPrev->flushToDisk(); // Ok, so we can safely flush
delete bufferPrev;
Obviously, while buffer = new Buffer(); executes, there can still be unfinished operations adding data to the previous instance. But since disk operations are slow, we get a natural kind of barrier.
So how do you estimate the risk of running such code without mutex synchronisation?
Edit
It's so hard these days to ask a question on SO without getting mugged by a couple of angry guys for no reason.
Here is my code, correct in every respect:
#include <cassert>
#include "leveldb/db.h"
#include "leveldb/filter_policy.h"
#include "leveldb/write_batch.h"
#include <atomic>
#include <thread>
#include <iostream>
#include <boost/asio.hpp>
#include <boost/chrono.hpp>
#include <boost/thread.hpp>
#include <boost/filesystem.hpp>
#include <boost/lockfree/stack.hpp>
#include <boost/lockfree/queue.hpp>
#include <boost/uuid/uuid.hpp> // uuid class
#include <boost/uuid/uuid_io.hpp> // streaming operators etc.
#include <boost/uuid/uuid_generators.hpp> // generators
#include <CommonCrypto/CommonDigest.h>
using namespace std;
using namespace boost::filesystem;
using boost::mutex;
using boost::thread;
enum FileSystemItemType : char {
Unknown = 1,
File = 0,
Directory = 4,
FileLink = 2,
DirectoryLink = 6
};
// Structure packing optimizations are used in the code below
// http://www.catb.org/esr/structure-packing/
class FileSystemScanner {
private:
leveldb::DB *database;
boost::asio::thread_pool pool;
leveldb::WriteBatch *batch;
std::atomic<int> queue_size;
std::atomic<int> workers_online;
std::atomic<int> entries_processed;
std::atomic<int> directories_processed;
std::atomic<uintmax_t> filesystem_usage;
boost::lockfree::stack<boost::filesystem::path*, boost::lockfree::fixed_sized<false>> directories_pending;
void work() {
workers_online++;
boost::filesystem::path *item;
if (directories_pending.pop(item) && item != NULL)
{
queue_size--;
try {
boost::filesystem::directory_iterator completed;
boost::filesystem::directory_iterator iterator(*item);
while (iterator != completed)
{
bool isFailed = false, isSymLink, isDirectory;
boost::filesystem::path path = iterator->path();
try {
isSymLink = boost::filesystem::is_symlink(path);
isDirectory = boost::filesystem::is_directory(path);
} catch (const boost::filesystem::filesystem_error& e) {
isFailed = true;
isSymLink = false;
isDirectory = false;
}
if (!isFailed)
{
if (!isSymLink) {
if (isDirectory) {
directories_pending.push(new boost::filesystem::path(path));
directories_processed++;
boost::asio::post(this->pool, [this]() { this->work(); });
queue_size++;
} else {
filesystem_usage += boost::filesystem::file_size(iterator->path());
}
}
}
int result = ++entries_processed;
if (result % 10000 == 0) {
cout << entries_processed.load() << ", " << directories_processed.load() << ", " << queue_size.load() << ", " << workers_online.load() << endl;
}
++iterator;
}
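} catch (const boost::filesystem::filesystem_error &) {
// unreadable directory or entry: skip the rest of it
}
delete item; // free the path even when iteration threw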
}
workers_online--;
}
public:
FileSystemScanner(int threads, leveldb::DB* database):
database(database), pool(threads), batch(nullptr), queue_size(0), workers_online(0), entries_processed(0), directories_processed(0), filesystem_usage(0), directories_pending(0)
{
}
void scan(string path) {
queue_size++;
directories_pending.push(new boost::filesystem::path(path));
boost::asio::post(this->pool, [this]() { this->work(); });
}
void join() {
pool.join();
}
};
int main(int argc, char* argv[])
{
leveldb::Options opts;
opts.create_if_missing = true;
opts.compression = leveldb::CompressionType::kSnappyCompression;
opts.filter_policy = leveldb::NewBloomFilterPolicy(10);
leveldb::DB* db;
leveldb::DB::Open(opts, "/temporary/projx", &db);
FileSystemScanner scanner(std::thread::hardware_concurrency(), db);
scanner.scan("/");
scanner.join();
return 0;
}
My question is: can I omit synchronization for batch, which I'm not using yet? It's thread-safe, so shouldn't it be enough to just switch buffers before actually committing any results to disk?
You have a serious misunderstanding. You think that when you have a race condition, there are some specific list of things that can happen. This is not true. A race condition can cause any kind of failure, including crashes. So absolutely, definitely not. You absolutely cannot do this.
That said, even with this misunderstanding, this is still a disaster.
Consider:
buffer = new Buffer();
Suppose this is implemented by first allocating memory, then setting buffer to point to that memory, and then calling the constructor. Other threads may operate on the unconstructed buffer. boom.
Now, you can fix this. But it's just one of the many ways I can imagine this screwing up. And it can screw up in ways that we're not clever enough to imagine. So, for all that is holy, do not even think of doing this ever again.
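For contrast, here is a minimal sketch of the buffer swap done with a mutex, assuming the buffer can be modelled as a std::vector<int>. SwapBuffer, take() and run_swap_demo are hypothetical names, and summing the values stands in for flushToDisk(). Workers append and the timer thread swaps under the same lock. So once take() returns, no worker can still be writing to the old buffer, and it can be flushed and freed without holding the lock.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class SwapBuffer {
public:
    // Called by the worker threads.
    void accumulate(int value) {
        std::lock_guard<std::mutex> guard(mutex_);
        current_->push_back(value);
    }

    // Called by the timer thread: installs a fresh buffer and returns the old
    // one, which from now on is owned by the caller alone.
    std::unique_ptr<std::vector<int>> take() {
        auto fresh = std::make_unique<std::vector<int>>();  // allocate outside the lock
        std::lock_guard<std::mutex> guard(mutex_);
        current_.swap(fresh);
        return fresh;  // now holds the previous buffer
    }

private:
    std::mutex mutex_;
    std::unique_ptr<std::vector<int>> current_ = std::make_unique<std::vector<int>>();
};

// Usage sketch: `workers` threads each append `per_worker` ones while the
// calling thread keeps swapping buffers out; returns the total "flushed".
long run_swap_demo(int workers, int per_worker) {
    SwapBuffer buffer;
    std::atomic<int> done{0};
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w)
        threads.emplace_back([&] {
            for (int i = 0; i < per_worker; ++i) buffer.accumulate(1);
            ++done;
        });
    long flushed = 0;
    auto flush = [&](std::unique_ptr<std::vector<int>> old) {
        for (int v : *old) flushed += v;  // stands in for flushToDisk()
    };
    while (done.load() < workers) flush(buffer.take());
    for (auto& t : threads) t.join();
    flush(buffer.take());  // pick up anything appended after the last swap
    return flushed;
}
```

The lock is held only for a push_back or a pointer swap, so contention stays low even with many workers. The slow disk write happens outside it.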

NET-SNMP and multithreading

I am writing a C++ SNMP server using the NET-SNMP library. I read the documentation and still have one question. Can multiple threads share a single SNMP session and use it simultaneously in functions like snmp_sess_synch_response()? Or must I initialize and open a new session in each thread?
Well, when I call snmp_sess_synch_response() simultaneously from two different threads using the same opaque session pointer, one of three errors always occurs: a memory access violation, an endless WaitForSingleObject() in both threads, or a heap allocation error.
I suppose I can treat this as the answer: sharing a single session between multiple threads is unsafe, because using it simultaneously in functions like snmp_sess_synch_response() causes errors.
P.S. Here is the code described above:
void* _opaqueSession;
boost::mutex _sessionMtx;
std::shared_ptr<netsnmp_pdu> ReadObjectValue(Oid& objectID)
{
netsnmp_pdu* requestPdu = snmp_pdu_create(SNMP_MSG_GET);
netsnmp_pdu* response = 0;
snmp_add_null_var(requestPdu, objectID.GetObjId(), objectID.GetLen());
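void* opaqueSessionCopy;
{
//Locks the _opaqueSession, wherever it appears
boost::mutex::scoped_lock lock(_sessionMtx);
opaqueSessionCopy = _opaqueSession;
}
//Errors here! The lock above only protects copying the pointer;
//this call uses the shared session with no lock held.
snmp_sess_synch_response(opaqueSessionCopy, requestPdu, &response);
//The response PDU is allocated by NET-SNMP, so release it with snmp_free_pdu, not delete
std::shared_ptr<netsnmp_pdu> result(response, snmp_free_pdu);
return result;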
}
void ExecuteThread1()
{
Oid sysName(".1.3.6.1.2.1.1.5.0");
try
{
while(true)
{
boost::this_thread::interruption_point();
ReadObjectValue(sysName);
}
}
catch(...)
{}
}
void ExecuteThread2()
{
Oid sysServices(".1.3.6.1.2.1.1.7.0");
try
{
while(true)
{
boost::this_thread::interruption_point();
ReadObjectValue(sysServices);
}
}
catch(...)
{}
}
int main()
{
std::string community = "public";
std::string ipAddress = "127.0.0.1";
snmp_session session;
{
SNMP::snmp_sess_init(&session);
session.timeout = 500000;
session.retries = 0;
session.version = SNMP_VERSION_2c;
session.remote_port = 161;
session.peername = (char*)ipAddress.c_str();
session.community = (u_char*)community.c_str();
session.community_len = community.size();
}
_opaqueSession = snmp_sess_open(&session);
boost::thread thread1 = boost::thread(&ExecuteThread1);
boost::thread thread2 = boost::thread(&ExecuteThread2);
boost::this_thread::sleep(boost::posix_time::seconds(30));
thread1.interrupt();
thread1.join();
thread2.interrupt();
thread2.join();
return 0;
}
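To keep a single session, one option is to hold the mutex for the entire call. ReadObjectValue, by contrast, holds _sessionMtx only while copying _opaqueSession. This sketch assumes the session tolerates strictly serialized use from different threads, which you should confirm in the NET-SNMP threading notes. Serialized and run_two_threads are hypothetical. A plain long counter stands in for the session; in the real code the callable would invoke snmp_sess_synch_response on the wrapped pointer. The alternative, one session per thread, avoids the lock at the cost of extra sessions.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Wraps an object that must never be used by two threads at once. Every call
// runs with the mutex held, so only one thread touches the object at a time.
template <class T>
class Serialized {
public:
    explicit Serialized(T value) : value_(std::move(value)) {}

    template <class F>
    auto call(F&& f) -> decltype(f(std::declval<T&>())) {
        std::lock_guard<std::mutex> guard(mutex_);  // held for the whole call
        return f(value_);
    }

private:
    std::mutex mutex_;
    T value_;
};

// Usage sketch: two threads share one Serialized<long>, the way
// ExecuteThread1 and ExecuteThread2 share _opaqueSession; returns the count.
long run_two_threads(int iterations) {
    Serialized<long> shared(0);
    auto worker = [&] {
        for (int i = 0; i < iterations; ++i)
            shared.call([](long& v) { ++v; });
    };
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    return shared.call([](long& v) { return v; });
}
```

Serializing means only one request is in flight at a time, so a slow agent delays every thread that uses the session.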

Does the use of an anonymous pipe introduce a memory barrier for interthread communication?

For example, say I allocate a struct with new and write the pointer into the write end of an anonymous pipe.
If I read the pointer from the corresponding read end, am I guaranteed to see the 'correct' contents on the struct?
I'm also interested in whether socketpair() on Unix and a self-connection over TCP loopback on Windows give the same guarantees.
The context is a server design that centralizes event dispatch with select/epoll.
For example, say I allocate a struct with new and write the pointer into the write end of an anonymous pipe.
If I read the pointer from the corresponding read end, am I guaranteed to see the 'correct' contents on the struct?
No. There is no guarantee that the writing CPU will have flushed the write out of its cache and made it visible to the other CPU that might do the read.
I'm also interested in whether socketpair() on Unix and a self-connection over TCP loopback on Windows give the same guarantees.
No.
In practice, calling write(), which is a system call, will end up locking one or more data structures in the kernel, which should take care of the reordering issue. For example, POSIX requires subsequent reads to see data written before their call, which implies a lock (or some kind of acquire/release) by itself.
As for whether that's part of the formal spec of the calls, probably it's not.
A pointer is just a memory address. Provided you are in the same process, the pointer will be valid in the receiving thread and will point to the same struct. If you are in different processes, at best you will immediately get a memory error. At worst you will read (or write) random memory, which is essentially Undefined Behaviour.
Will you read the correct content? Neither better nor worse than if your pointer was in a static variable shared by both threads: you still have to do some synchronization if you want consistency.
Does the transfer channel matter: static memory shared by threads, anonymous pipes, socket pairs, TCP loopback, etc.? No. All those channels transfer bytes, so if you pass a memory address, you get your memory address back. What remains is synchronization, because here you are just sharing a memory address.
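A single-threaded sketch shows that a pipe only carries the bytes of the address. The pointer written into the pipe comes back unchanged and still points at the same object. Payload and round_trip are hypothetical names. No second thread is involved, so this says nothing about memory ordering. That question only arises when another thread dereferences the received pointer.

```cpp
#include <cassert>
#include <sys/types.h>
#include <unistd.h>

struct Payload { int value; };  // hypothetical struct being handed over

// Writes the pointer value itself (sizeof(void*) bytes) into a pipe and reads
// it back. Only those bytes travel; the struct is not copied, and the pipe is
// not, by itself, a C++ synchronization primitive.
Payload* round_trip(Payload* p) {
    int fds[2];
    if (pipe(fds) != 0) return nullptr;
    Payload* received = nullptr;
    bool ok = write(fds[1], &p, sizeof p) == static_cast<ssize_t>(sizeof p) &&
              read(fds[0], &received, sizeof received) == static_cast<ssize_t>(sizeof received);
    close(fds[0]);
    close(fds[1]);
    return ok ? received : nullptr;
}
```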
If you do not use any other synchronization, anything can happen (did I already mention Undefined Behaviour?):
- the reading thread can access the memory before the writing thread has written it, giving stale data
- the reading thread can keep using cached values, here again getting stale data (declaring the struct members volatile does not fix this in C++; only real synchronization does)
- the reading thread can read partially written data, meaning incoherent data
Interesting question with, so far, only one correct answer from Cornstalks.
Within the same (multi-threaded) process there are no guarantees since pointer and data follow different paths to reach their destination.
Implicit acquire/release guarantees do not apply since the struct data cannot piggyback on the pointer through the cache and formally you are dealing with a data race.
However, looking at how the pointer and the struct data itself reach the second thread (through the pipe and memory cache respectively), there is a real chance that this mechanism is not going to cause any harm.
Sending the pointer to a peer thread takes 3 system calls: write() in the sending thread, then select() and read() in the receiving thread. That is relatively expensive, so by the time the pointer value is available in the receiving thread, the struct data has probably arrived long before.
Note that this is just an observation, the mechanism is still incorrect.
I believe your case can be reduced to this two-thread model:
#include <atomic>
#include <cassert>
int data = 0;
std::atomic<int*> atomicPtr{nullptr};
//...
void thread1()
{
data = 42;
atomicPtr.store(&data, std::memory_order_release);
}
void thread2()
{
int* ptr = nullptr;
while(!ptr)
ptr = atomicPtr.load(std::memory_order_consume); // current compilers treat consume as acquire
assert(*ptr == 42);
}
Since you have 2 processes, you can't use one atomic variable across them. But since you listed Windows, you can replace atomicPtr.load(std::memory_order_consume) with a plain load on the consuming side. AFAIK, all the architectures Windows runs on guarantee this dependent load to be correct without any barrier on the loading side. In fact, I think there aren't many architectures where that instruction would not be a no-op (I've only heard of DEC Alpha).
I agree with Serge Ballesta's answer. Within the same process, it's feasible to send and receive object address via anonymous pipe.
A write() to a pipe is guaranteed to be atomic when the message size is at most PIPE_BUF (4096 bytes on Linux; POSIX requires at least 512). So multiple producer threads will not mess up each other's object addresses (8 bytes each in a 64-bit application).
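Here is a small sketch of the size argument, assuming a POSIX system. A single pointer is far below the minimum PIPE_BUF that POSIX guarantees (_POSIX_PIPE_BUF, 512 bytes), and fpathconf() reports the actual limit for a given pipe. pipe_atomic_limit is a hypothetical helper.

```cpp
#include <cassert>
#include <limits.h>
#include <unistd.h>

// POSIX guarantees that pipe writes of at most PIPE_BUF bytes are atomic, and
// PIPE_BUF is never below _POSIX_PIPE_BUF (512), so one pointer always fits.
static_assert(sizeof(void*) <= _POSIX_PIPE_BUF, "pointer larger than minimum PIPE_BUF");

// Runtime query of the actual atomic-write limit for an open pipe
// (4096 on Linux); returns -1 if the limit cannot be determined.
long pipe_atomic_limit(int fd) {
    return fpathconf(fd, _PC_PIPE_BUF);
}
```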
Talk is cheap, here is the demo code for Linux (defensive code and error handlers are omitted for simplicity). Just copy & paste to pipe_ipc_demo.cc then compile & run the test.
#include <unistd.h>
#include <string.h>
#include <pthread.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/select.h>
#include <string>
#include <list>
template<class T> class MPSCQ { // pipe based Multi Producer Single Consumer Queue
public:
MPSCQ();
~MPSCQ();
int producerPush(const T* t);
T* consumerPoll(double timeout = 1.0);
private:
void _consumeFd();
int _selectFdConsumer(double timeout);
T* _popFront();
private:
int _fdProducer;
int _fdConsumer;
char* _consumerBuf;
std::string* _partial;
std::list<T*>* _list;
static const int _PTR_SIZE;
static const int _CONSUMER_BUF_SIZE;
};
template<class T> const int MPSCQ<T>::_PTR_SIZE = sizeof(void*);
template<class T> const int MPSCQ<T>::_CONSUMER_BUF_SIZE = 1024;
template<class T> MPSCQ<T>::MPSCQ() :
_fdProducer(-1),
_fdConsumer(-1) {
_consumerBuf = new char[_CONSUMER_BUF_SIZE];
_partial = new std::string; // for holding partial pointer address
_list = new std::list<T*>; // unconsumed T* cache
int fd_[2];
int r = pipe(fd_);
_fdConsumer = fd_[0];
_fdProducer = fd_[1];
}
template<class T> MPSCQ<T>::~MPSCQ() { /* omitted */ }
template<class T> int MPSCQ<T>::producerPush(const T* t) {
return t == NULL ? 0 : write(_fdProducer, &t, _PTR_SIZE);
}
template<class T> T* MPSCQ<T>::consumerPoll(double timeout) {
T* t = _popFront();
if (t != NULL) {
return t;
}
if (_selectFdConsumer(timeout) <= 0) { // timeout or error
return NULL;
}
_consumeFd();
return _popFront();
}
template<class T> void MPSCQ<T>::_consumeFd() {
memcpy(_consumerBuf, _partial->data(), _partial->length());
// read after the carried-over partial bytes so they are not overwritten
ssize_t r = read(_fdConsumer, _consumerBuf + _partial->length(), _CONSUMER_BUF_SIZE - _partial->length());
if (r <= 0) { // EOF or error, error handler omitted
return;
}
const char* p = _consumerBuf;
int remaining_len_ = _partial->length() + r;
T* t;
while (remaining_len_ >= _PTR_SIZE) {
memcpy(&t, p, _PTR_SIZE);
_list->push_back(t);
remaining_len_ -= _PTR_SIZE;
p += _PTR_SIZE;
}
*_partial = std::string(p, remaining_len_);
}
template<class T> int MPSCQ<T>::_selectFdConsumer(double timeout) {
int r;
int nfds_ = _fdConsumer + 1;
fd_set readfds_;
struct timeval timeout_;
int64_t usec_ = timeout * 1000000.0;
while (true) {
timeout_.tv_sec = usec_ / 1000000;
timeout_.tv_usec = usec_ % 1000000;
FD_ZERO(&readfds_);
FD_SET(_fdConsumer, &readfds_);
r = select(nfds_, &readfds_, NULL, NULL, &timeout_);
if (r < 0 && errno == EINTR) {
continue;
}
return r;
}
}
template<class T> T* MPSCQ<T>::_popFront() {
if (!_list->empty()) {
T* t = _list->front();
_list->pop_front();
return t;
} else {
return NULL;
}
}
// = = = = = test code below = = = = =
#define _LOOP_CNT 5000000
#define _ONE_MILLION 1000000
#define _PRODUCER_THREAD_NUM 2
struct TestMsg { // all public
int _threadId;
int _msgId;
int64_t _val;
TestMsg(int thread_id, int msg_id, int64_t val) :
_threadId(thread_id),
_msgId(msg_id),
_val(val) { };
};
static MPSCQ<TestMsg> _QUEUE;
static int64_t _SUM = 0;
void* functor_producer(void* arg) {
int my_thr_id_ = pthread_self();
TestMsg* msg_;
for (int i = 0; i <= _LOOP_CNT; ++ i) {
if (i == _LOOP_CNT) {
msg_ = new TestMsg(my_thr_id_, i, -1);
} else {
msg_ = new TestMsg(my_thr_id_, i, i + 1);
}
_QUEUE.producerPush(msg_);
}
return NULL;
}
void* functor_consumer(void* arg) {
int msg_cnt_ = 0;
int stop_cnt_ = 0;
TestMsg* msg_;
while (true) {
if ((msg_ = _QUEUE.consumerPoll()) == NULL) {
continue;
}
int64_t val_ = msg_->_val;
delete msg_;
if (val_ <= 0) {
if ((++ stop_cnt_) >= _PRODUCER_THREAD_NUM) {
printf("All done, _SUM=%ld\n", _SUM);
break;
}
} else {
_SUM += val_;
if ((++ msg_cnt_) % _ONE_MILLION == 0) {
printf("msg_cnt_=%d, _SUM=%ld\n", msg_cnt_, _SUM);
}
}
}
return NULL;
}
int main(int argc, char* const* argv) {
pthread_t consumer_;
pthread_create(&consumer_, NULL, functor_consumer, NULL);
pthread_t producers_[_PRODUCER_THREAD_NUM];
for (int i = 0; i < _PRODUCER_THREAD_NUM; ++ i) {
pthread_create(&producers_[i], NULL, functor_producer, NULL);
}
for (int i = 0; i < _PRODUCER_THREAD_NUM; ++ i) {
pthread_join(producers_[i], NULL);
}
pthread_join(consumer_, NULL);
return 0;
}
And here is the test result ( 2 * sum(1..5000000) == (1 + 5000000) * 5000000 == 25000005000000 ):
$ g++ -o pipe_ipc_demo pipe_ipc_demo.cc -lpthread
$ ./pipe_ipc_demo ## output may vary except for the final _SUM
msg_cnt_=1000000, _SUM=251244261289
msg_cnt_=2000000, _SUM=1000708879236
msg_cnt_=3000000, _SUM=2250159002500
msg_cnt_=4000000, _SUM=4000785160225
msg_cnt_=5000000, _SUM=6251640644676
msg_cnt_=6000000, _SUM=9003167062500
msg_cnt_=7000000, _SUM=12252615629881
msg_cnt_=8000000, _SUM=16002380952516
msg_cnt_=9000000, _SUM=20252025092401
msg_cnt_=10000000, _SUM=25000005000000
All done, _SUM=25000005000000
The technique shown here is used in our production applications. One typical use is a consumer thread acting as a log writer, so worker threads can write log messages almost asynchronously. "Almost" means that writer threads may sometimes block in write() when the pipe is full. That blocking is a reliable congestion-control mechanism provided by the OS.

Understanding unix child processes that use semaphore and shared memory

I'm going to do my best to ask this question with the understanding that I have.
I'm doing a programming assignment (let's just get that out of the way now) that uses C or C++ on a Unix server to fork four children and use semaphore and shared memory to update a global variable. I'm not sure I have an issue yet, but my lack of understanding has me questioning my structure. Here it is:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/sem.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define NUM_REPEATS 10
#define SEM_KEY 1111
#define SHM_KEY 2222
int globalCounter = 0;
/***** Test function for confirming a process type ******/
int checkProcessType(const char *whoami)
{
printf("I am a %s. My pid is:%d my ppid is %d\n",
whoami, getpid(), getppid() );
for(int i = 1; i<=3; i++){
printf("%s counting %d\n", whoami, i);
}
return 1;
}
int main (void) {
pid_t process_id; // PID (child or zero)
int sharedMemID; // Shared memory ID
int sharedMemSize; // shared memory size
struct my_mem * sharedMemPointer; // pointer to the attached shared memory
// Definition of shared memory //
struct my_mem {
long counter;
int parent;
int child;
};
// Gathering size of shared memory in bytes //
sharedMemSize = sizeof(my_mem);
if(sharedMemSize <= 0){
fprintf(stderr, "error collecting shared memory size: Exiting...\n");
exit(EXIT_FAILURE);
}
// Creating Shared Memory //
sharedMemID = shmget(SHM_KEY, sharedMemSize, 0666 | IPC_CREAT);
if (sharedMemID < 0) {
perror("Creating shared memory has failed: Exiting...");
exit(EXIT_FAILURE);
}
// Attaching Shared Memory //
sharedMemPointer = (struct my_mem *)shmat(sharedMemID, NULL, 0);
if (sharedMemPointer == (struct my_mem*) -1) {
perror("Attaching shared memory has failed. Exiting...");
exit(EXIT_FAILURE);
}
// Initializing Shared Memory //
sharedMemPointer->counter = 0;
sharedMemPointer->parent = 0;
sharedMemPointer->child = 0;
pid_t adder, reader1, reader2, reader3;
adder = fork();
if(adder > 0)
{
// In parent
reader1 = fork();
if(reader1 > 0)
{
// In parent
reader2 = fork();
if(reader2 > 0)
{
//In parent
reader3 = fork();
if (reader3 > 0)
{
//In parent
}
else if (reader3 < 0)
{
// Error
perror("fork() error");
}
else
{
// In reader3
}
}
else if(reader2 < 0)
{
//Error
perror("fork() error");
}
else
{
// In reader2
}
}
else if(reader1 < 0)
{
// Error
perror("fork() error");
}
else
{
// In reader1
}
}
else if(adder < 0 )
{
// Error
perror("fork() error");
}
else
{
// In adder
//LOOP here for global var in critical section
}
}
Here's some info on what I'm doing (I think). I'm creating a chunk of shared memory that will contain a variable, let's call it counter. It will be updated only by adder and by the parent, which becomes a subtractor after all child processes are active. I'm still trying to figure out the semaphore part, which I will use so that adder and subtractor execute in a critical section. But my main question is this.
How can I know where I am in this structure? My adder should have a loop that does some job (update the global var), and the parent/subtractor should have a loop for its job (also updating the global var). And all the readers can look at any time. Does the loop placement for the parent/subtractor matter? I basically have 3 places where I know I'm in the parent. But since all children need to be created first, does the loop have to go in the last conditional after my third fork, where I know I'm in the parent? When I use my test method I get scattered output: child one's output can come after the parent's, then child three's, etc. It's never in any order, and from what I understand of fork, that's expected.
I really have about three questions going on, but I need to first wrap my head around the structure. So let me try to say this again concisely, without any junk, because I'm hung up on loop and critical-section placement that isn't even written yet.
More directly: when does the parent know that all children exist? And with this structure, can one child do a task and somehow come back to it? For example, adder (the first child) adds to the global variable once and exits, then some other child does its thing, etc.
I still feel like I'm not asking the right thing, and I believe this is due to still trying to grasp concepts. Hopefully my stammering will kind of show what I'm stuck on conceptually. If not I can clarify.
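On the "when does the parent know" part: fork() returns each child's PID to the parent. So once the last fork() has returned a positive value in the parent, all children exist, though they may already be running or even finished. The sketch below assumes Linux with System V IPC, as in the question. It forks in a loop instead of nesting, and each child does its job and calls _exit(), so no child falls through into the parent's code. SharedCounter, sem_change, update and run_demo are hypothetical names. IPC_PRIVATE replaces SEM_KEY/SHM_KEY, so repeated runs do not collide.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// glibc leaves this union for the program to define (needed for SETVAL).
union semun { int val; struct semid_ds* buf; unsigned short* array; };

struct SharedCounter { long counter; };  // hypothetical shared-memory layout

// semop() wrapper: -1 waits until the semaphore is positive and takes it,
// +1 gives it back.
static void sem_change(int semid, short delta) {
    struct sembuf op;
    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = 0;
    if (semop(semid, &op, 1) < 0) { perror("semop"); exit(EXIT_FAILURE); }
}

// Adds `delta` to the shared counter `repeats` times, each inside the critical section.
static void update(int semid, SharedCounter* mem, long delta, int repeats) {
    for (int i = 0; i < repeats; ++i) {
        sem_change(semid, -1);   // enter critical section
        mem->counter += delta;
        sem_change(semid, +1);   // leave critical section
    }
}

// Forks `adders` children that each add 1 `repeats` times while the parent
// subtracts 1 `repeats` times; returns the final counter value.
long run_demo(int adders, int repeats) {
    int shmid = shmget(IPC_PRIVATE, sizeof(SharedCounter), 0600 | IPC_CREAT);
    int semid = semget(IPC_PRIVATE, 1, 0600 | IPC_CREAT);
    if (shmid < 0 || semid < 0) { perror("shmget/semget"); exit(EXIT_FAILURE); }
    void* addr = shmat(shmid, nullptr, 0);
    if (addr == (void*)-1) { perror("shmat"); exit(EXIT_FAILURE); }
    SharedCounter* mem = static_cast<SharedCounter*>(addr);
    mem->counter = 0;
    union semun arg;
    arg.val = 1;                               // 1 = critical section is free
    semctl(semid, 0, SETVAL, arg);

    for (int c = 0; c < adders; ++c) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); exit(EXIT_FAILURE); }
        if (pid == 0) {                        // child: do its job and leave
            update(semid, mem, +1, repeats);
            shmdt(mem);
            _exit(0);                          // never continue into the fork loop
        }
    }
    // Only the parent gets here, after fork() has returned for every child.
    update(semid, mem, -1, repeats);
    for (int c = 0; c < adders; ++c) wait(nullptr);  // reap every child
    long result = mem->counter;
    shmdt(mem);
    shmctl(shmid, IPC_RMID, nullptr);
    semctl(semid, 0, IPC_RMID);
    return result;
}
```

A child cannot "come back" after _exit(). If the adder must run again, it keeps its loop going, and the semaphore decides whose turn it is.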