Does semop perform the checks of all sops atomically?

I am implementing a read/write lock with semop as follows.
My question is: does semop perform the checks for all sops atomically? That is, does semop take some internal lock first and then check the three sops one by one?
If not, can the following scenario happen?
One thread calls CSemRWLock::CWriteLock::lock(), finds that there is no reader by checking the first sop, and moves on to the second sop. At the same time, another thread calling CReadLock::lock() changes the first semaphore to a non-zero value. Both threads then acquire a lock, one for reading and one for writing.
If so, CWriteLock::lock() would need to swap the first two sops, so that it first declares that there is a writer and only then checks that there is no reader. Right?
bool CSemRWLock::CWriteLock::lock()
{
//judge no reader
sbuf[0].sem_num = 0;
sbuf[0].sem_op = 0;
sbuf[0].sem_flg = 0;
//specify there is writer
sbuf[1].sem_num = 1;
sbuf[1].sem_op = 1;
sbuf[1].sem_flg = SEM_UNDO;
//occupy the writer resource
sbuf[2].sem_num = 2;
sbuf[2].sem_op = -1;
sbuf[2].sem_flg = SEM_UNDO;
semop(m_iSemID, sbuf, 3);
......
}
bool CSemRWLock::CReadLock::lock()
{
//add a reader
sbuf[0].sem_num = 0;
sbuf[0].sem_op = 1;
sbuf[0].sem_flg = SEM_UNDO;
//judge no writer
sbuf[1].sem_num = 1;
sbuf[1].sem_op = 0;
sbuf[1].sem_flg = 0;
semop(m_iSemID, sbuf, 2);
......
}
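The POSIX description of semop states that the array of operations is performed atomically: either every operation in sops is applied, or none is. Under that reading, the interleaving described in the question should not occur, and the order of the sops should not matter for correctness. A minimal sketch to observe this follows. The helper name check_semop_all_or_nothing, the union name semun_arg and the two-semaphore layout are made up for illustration, and IPC_NOWAIT is used only so that the call fails instead of blocking.

```cpp
#include <sys/ipc.h>
#include <sys/sem.h>
#include <cerrno>

// Linux expects the caller to define the semctl() argument union itself.
// The name semun_arg is an assumption, chosen to avoid clashing with
// systems that already declare `union semun`.
union semun_arg {
    int val;
    struct semid_ds* buf;
    unsigned short* array;
};

// Returns 1 if a failing multi-operation semop() left semaphore 0 untouched,
// 0 if a partial update was observed, -1 if the experiment could not be run.
int check_semop_all_or_nothing()
{
    int id = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    semun_arg arg;
    arg.val = 0;
    semctl(id, 0, SETVAL, arg); // semaphore 0: reader count, no readers yet
    arg.val = 1;
    semctl(id, 1, SETVAL, arg); // semaphore 1: writer count, one writer present

    // Same shape as CReadLock::lock(): add a reader, then require "no writer".
    struct sembuf ops[2];
    ops[0].sem_num = 0; ops[0].sem_op = 1; ops[0].sem_flg = IPC_NOWAIT;
    ops[1].sem_num = 1; ops[1].sem_op = 0; ops[1].sem_flg = IPC_NOWAIT;

    int rc = semop(id, ops, 2);          // the second operation cannot proceed
    int err = errno;
    int readers = semctl(id, 0, GETVAL); // stays 0 if the first op was not applied

    semctl(id, 0, IPC_RMID);             // remove the semaphore set again

    if (rc != -1 || err != EAGAIN)
        return -1;                       // unexpected outcome, nothing to conclude
    return readers == 0 ? 1 : 0;
}
```

If the function returns 1, the reader count was not incremented even though it comes first in the array, which is the all-or-nothing behaviour the question asks about.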

Related

Is it possible to wait for a transfer from the staging buffer to complete without calling vkQueueWaitIdle?

The following piece of code shows how I transfer vertex buffer data from the staging buffer to a local memory buffer:
bool Vulkan::UpdateVertexBuffer(std::vector<VERTEX>& data, VULKAN_BUFFER& vertex_buffer)
{
std::memcpy(this->staging_buffer.pointer, &data[0], vertex_buffer.size);
size_t flush_size = static_cast<size_t>(vertex_buffer.size);
unsigned int multiple = static_cast<unsigned int>(flush_size / this->physical_device.properties.limits.nonCoherentAtomSize);
flush_size = this->physical_device.properties.limits.nonCoherentAtomSize * ((uint64_t)multiple + 1);
VkMappedMemoryRange flush_range = {};
flush_range.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
flush_range.pNext = nullptr;
flush_range.memory = this->staging_buffer.memory;
flush_range.offset = 0;
flush_range.size = flush_size;
vkFlushMappedMemoryRanges(this->device, 1, &flush_range);
VkResult result = vkWaitForFences(this->device, 1, &this->transfer.fence, VK_FALSE, 1000000000);
if(result != VK_SUCCESS) {
#if defined(_DEBUG)
std::cout << "UpdateVertexBuffer => vkWaitForFences : Timeout" << std::endl;
#endif
return false;
}
vkResetFences(this->device, 1, &this->transfer.fence);
VkCommandBufferBeginInfo command_buffer_begin_info = {};
command_buffer_begin_info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
command_buffer_begin_info.pNext = nullptr;
command_buffer_begin_info.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
command_buffer_begin_info.pInheritanceInfo = nullptr;
vkBeginCommandBuffer(this->transfer.command_buffer, &command_buffer_begin_info);
VkBufferCopy buffer_copy_info = {};
buffer_copy_info.srcOffset = 0;
buffer_copy_info.dstOffset = 0;
buffer_copy_info.size = vertex_buffer.size;
vkCmdCopyBuffer(this->transfer.command_buffer, this->staging_buffer.handle, vertex_buffer.handle, 1, &buffer_copy_info);
VkBufferMemoryBarrier buffer_memory_barrier = {};
buffer_memory_barrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
buffer_memory_barrier.pNext = nullptr;
buffer_memory_barrier.srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT;
buffer_memory_barrier.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
buffer_memory_barrier.srcQueueFamilyIndex = this->queue_stack[this->transfer_stack_index].index;
buffer_memory_barrier.dstQueueFamilyIndex = this->queue_stack[this->graphics_stack_index].index;
buffer_memory_barrier.buffer = vertex_buffer.handle;
buffer_memory_barrier.offset = 0;
buffer_memory_barrier.size = VK_WHOLE_SIZE;
vkCmdPipelineBarrier(this->transfer.command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, 0, 0, nullptr, 1, &buffer_memory_barrier, 0, nullptr);
vkEndCommandBuffer(this->transfer.command_buffer);
VkSubmitInfo submit_info = {};
submit_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submit_info.pNext = nullptr;
submit_info.waitSemaphoreCount = 0;
submit_info.pWaitSemaphores = nullptr;
submit_info.pWaitDstStageMask = nullptr;
submit_info.commandBufferCount = 1;
submit_info.pCommandBuffers = &this->transfer.command_buffer;
submit_info.signalSemaphoreCount = 0;
submit_info.pSignalSemaphores = nullptr;
result = vkQueueSubmit(this->queue_stack[this->transfer_stack_index].handle, 1, &submit_info, this->transfer.fence); // reuse `result` declared above; redeclaring it does not compile
if(result != VK_SUCCESS) {
#if defined(_DEBUG)
std::cout << "UpdateVertexBuffer => vkQueueSubmit : Failed" << std::endl;
#endif
return false;
}
#if defined(_DEBUG)
std::cout << "UpdateVertexBuffer : Success" << std::endl;
#endif
return true;
}
It works perfectly without any validation layer warning. But when I call it twice, both buffers contain the same data, namely the data from the second call. For example:
UpdateVertexBuffer(cube_data, cube_buffer);
UpdateVertexBuffer(prism_data, prism_buffer);
This results in a prism in both cube_buffer and prism_buffer. To fix this, I can simply wait a few milliseconds between the two calls:
UpdateVertexBuffer(cube_data, cube_buffer);
std::this_thread::sleep_for(std::chrono::milliseconds(100));
UpdateVertexBuffer(prism_data, prism_buffer);
or, preferably, I can replace the fence with a call to
vkQueueWaitIdle(this->queue_stack[this->transfer_stack_index].handle);
In my opinion this will result in a performance loss, and the fence is supposed to be the optimal way to wait for a transfer operation to complete properly. So why is my first buffer filled with the second buffer's data when I'm using a fence? And is there a way to do this properly without using vkQueueWaitIdle?
Thanks for your help.

You wait for the fence for the previous upload after you have already written the data to the staging buffer. That's too late; the fence is there to prevent you from writing data to memory that's being read.
But really, your problem is that your design is wrong. Your design is such that sequential updates all use the same memory. They shouldn't. Instead, sequential updates should use different regions of the same memory, so that they cannot overlap. That way, you can perform the transfers and not have to wait on fences at all (or at least, not until next frame).
Basically, you should treat your staging buffer like a ring buffer. Every operation that wants to do some staged transfer work should "allocate" X bytes of memory from the staging ring buffer. The staging buffer system allocates memory sequentially, wrapping around if there is insufficient space. But it also remembers where the last memory region is that it synchronized with. If you try to stage too much work, then it has to synchronize.
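The ring-buffer idea described above can be sketched without any Vulkan calls, since it is only bookkeeping over offsets. The class below is a hypothetical illustration: StagingRing, Allocation, allocate and release are invented names, alignment (for example to nonCoherentAtomSize) is left out for brevity, and releases are assumed to happen in the same order as allocations, once a fence shows the GPU has finished reading that region.

```cpp
#include <cstddef>

// Hypothetical bookkeeping for a staging buffer used as a ring.
// It only hands out offsets; mapping, copying and fences stay with the caller.
class StagingRing {
public:
    struct Allocation {
        std::size_t offset;   // where the caller may write inside the staging buffer
        std::size_t reserved; // bytes to pass to release(), including wrap padding
    };

    explicit StagingRing(std::size_t capacity) : capacity_(capacity) {}

    // Reserves `size` contiguous bytes. Returns false when the request would
    // overlap a region that is still in flight, so the caller must sync first.
    bool allocate(std::size_t size, Allocation& out) {
        if (size == 0 || size > capacity_)
            return false;
        std::size_t padding = 0;
        std::size_t offset = head_;
        if (head_ + size > capacity_) { // does not fit before the end: wrap to 0
            padding = capacity_ - head_;
            offset = 0;
        }
        if (in_flight_ + padding + size > capacity_)
            return false;
        in_flight_ += padding + size;
        head_ = offset + size;
        out.offset = offset;
        out.reserved = padding + size;
        return true;
    }

    // Call after a fence proves the oldest `reserved` bytes are no longer read.
    void release(std::size_t reserved) {
        in_flight_ -= (reserved < in_flight_) ? reserved : in_flight_;
    }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;      // next offset to hand out
    std::size_t in_flight_ = 0; // bytes reserved and not yet released
};
```

With this shape, the cube and prism uploads from the question would receive different offsets, so the second memcpy could not overwrite data the first transfer has yet to read.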
Also, one of the purposes behind mapping memory is that you can write directly to that memory, rather than writing to some other CPU memory and copying it in. So instead of passing in a VULKAN_BUFFER (whatever that is), the process that generated that data should have fetched a pointer to a region of the active staging buffer and written its data into that.
Oh, and one more thing: never, ever create a command buffer and immediately submit it. Just don't do it. There's a reason why vkQueueSubmit can take multiple command buffers, and multiple batches of command buffers. For any one queue, you should never be submitting more than once (or maybe twice) per frame.

Fastest and safest way to call functions in extern process

Description of the problem:
We need to call a function in an external process as fast as possible. Boost interprocess shared memory is used for communication. The external process is either an MPI master or a single executable. The calculation time of the function lies between 1 ms and 1 s. The function should be called up to 10^8-10^9 times.
I've tried a lot of possibilities, but I still have some problems with each of them. Here are the two best-working implementations.
Version 1 (using interprocess conditions)
Main-process
bool calculate(double& result, std::vector<double> c){
// data_ptr_ is a structure in shared memory
data_ptr_->validCalculation = false;
bool timeout = false;
// write data (cVec_ is a vector in shared memory )
cVec_->clear();
for (int i = 0; i < c.size(); ++i)
{
cVec_->push_back(c[i]);
}
// cond_input_data is boost interprocess condition
data_ptr_->cond_input_data.notify_one();
boost::system_time const waittime = boost::get_system_time() + boost::posix_time::seconds(maxWaitTime_in_sec);
// lock slave process
scoped_lock<interprocess_mutex> lock_output(data_ptr_->mutex_output);
// wait till data calculated
timeout = !(data_ptr_->cond_output_data.timed_wait(lock_output, waittime)); // true if timeout, false if no timeout
if (!timeout)
{
// get result
result = *result_;
return data_ptr_->validCalculation;
}
else
{
return false;
}
};
The external process runs a loop (until the abort condition is fulfilled):
do {
scoped_lock<interprocess_mutex> lock_input(data_ptr_->mutex_input);
boost::system_time const waittime = boost::get_system_time() + boost::posix_time::seconds(maxWaitTime_in_sec);
timeout = !(data_ptr_->cond_input_data.timed_wait(lock_input, waittime)); // true if timeout, false if no timeout
if (!timeout)
{
if (!*abort_flag_) {
c.clear();
for (int i = 0; i < (*cVec_).size(); ++i) //Insert data in the vector
{
c.push_back(cVec_->at(i));
}
// calculate value
if (call_of_function_here(result, c)) { // valid calculation ?
*result_ = result;
data_ptr_->validCalculation = true;
}
}
}
// Notify the other process that the data is available or that we did not get the input data
data_ptr_->cond_output_data.notify_one();
} while (!*abort_flag_); // while abort flag is not set, check if some values should be calculated
This is the best-working version, but it sometimes hangs if the calculation time is short (~1 ms). I assume this happens when the main process reaches
data_ptr_->cond_input_data.notify_one();
before the external process has started waiting in
timeout = !(data_ptr_->cond_input_data.timed_wait(lock_input, waittime));
So we probably have some kind of synchronisation problem.
A second condition does not help (i.e. waiting only if the input data is not set, similar to the anonymous condition example with the message_in flag), since it is still possible for one process to notify the other before the other one is waiting for the notification.
Version 2 (using a boolean flag and a while loop with some delay)
Main-process
bool calculate(double& result, std::vector<double> c){
data_ptr_->validCalculation = false;
bool timeout = false;
// write data
cVec_->clear();
for (int i = 0; i < c.size(); ++i) //Insert data in the vector
{
cVec_->push_back(c[i]);
}
// this is the flag in shared memory used for communication
*calc_flag_ = true;
clock_t test_begin = clock();
clock_t calc_time_begin = clock();
do
{
calc_time_begin = clock();
boost::this_thread::sleep(boost::posix_time::milliseconds(while_loop_delay_m_s));
// wait till data calculated
timeout = (double(calc_time_begin - test_begin) / CLOCKS_PER_SEC > maxWaitTime_in_sec);
} while (*(calc_flag_) && !timeout);
if (!timeout)
{
// get result
result = *result_;
return data_ptr_->validCalculation;
}
else
{
return false;
}
};
and the external process:
do {
// we wait till input data is set
wait_begin = clock();
do
{
wait_end = clock();
timeout = (double(wait_end - wait_begin) / CLOCKS_PER_SEC > maxWaitTime_in_sec);
boost::this_thread::sleep(boost::posix_time::milliseconds(while_loop_delay_m_s));
} while (!(*calc_flag_) && !(*abort_flag_) && !timeout);
if (!timeout)
{
if (!*abort_flag_) {
c.clear();
for (int i = 0; i < (*cVec_).size(); ++i) //Insert data in the vector
{
c.push_back(cVec_->at(i));
}
// calculate value
if (call_of_local_function(result, c)) { // valid calculation ?
*result_ = result;
data_ptr_->validCalculation = true;
}
}
}
// Notify the other process that the data is available or that we did not get the input data
*calc_flag_ = false;
} while (!*abort_flag_); // while abort flag is not set, check if some values should be calculated
The problem with this version is the delay time. Since we have calculation times close to 1 ms, we have to set the delay to at least this value. For smaller delays the CPU load is high; for larger delays we lose a lot of performance to unnecessary waiting.
Do you have an idea how to improve one of these versions? Or maybe there is a better solution?
Thanks.
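A common way to avoid the lost notification suspected in Version 1 is to keep a flag in the shared state, change it only while holding the same mutex the waiter uses, and wait with a predicate, so that a notification sent before the wait starts is still seen through the flag. The sketch below uses std::mutex and std::condition_variable inside a single process purely to stay self-contained; Channel, post_request and wait_for_request are invented names. Boost.Interprocess conditions also provide wait overloads that take a predicate, so the same shape should carry over to shared memory, though that is not tested here.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Illustrative stand-in for the structure placed in shared memory.
struct Channel {
    std::mutex m;
    std::condition_variable cv;
    bool request_ready = false; // the state that the condition variable guards
};

// Producer side: publish the request under the mutex, then notify.
void post_request(Channel& ch)
{
    {
        std::lock_guard<std::mutex> lock(ch.m);
        ch.request_ready = true;
    }
    ch.cv.notify_one();
}

// Consumer side: returns true if a request arrived before the timeout.
// The predicate is evaluated before blocking, so a request posted earlier
// is picked up even though its notification has already passed.
bool wait_for_request(Channel& ch, std::chrono::milliseconds timeout)
{
    std::unique_lock<std::mutex> lock(ch.m);
    bool arrived = ch.cv.wait_for(lock, timeout, [&ch] { return ch.request_ready; });
    if (arrived)
        ch.request_ready = false; // consume the request
    return arrived;
}
```

The same pattern, applied in the other direction with a second flag, would cover the result notification. Compared with Version 2, the waiter sleeps inside the condition variable instead of polling with a fixed delay.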

C++ Debug assertion failed, using Windows.h mutex

I have a problem caused by this code:
char KernelFS::mount(Partition* part) {
WaitForSingleObject(mutexFS,INFINITE);
int pos;
for(pos=0; pos<26; pos++)
if(mountedPartitions[pos] == 0)
break;
if(pos < 26) {
mountedPartitions[pos] = part;
bitVectors[pos] = new BitVector(part);
fileEvidention[pos] = new ListHandler();
openedFiles[pos] = 0;
forbidOpening[pos] = false;
ReleaseMutex(mutexFS);
return intToChar(pos);
}
else {
ReleaseMutex(mutexFS);
return '0';
}
}
and
char KernelFS::format(char part){
WaitForSingleObject(mutexFS,INFINITE);
forbidOpening[charToInt(part)] = true;
ReleaseMutex(mutexFS);
while(openedFiles[charToInt(part)]>0)
WaitForSingleObject(unmountSem,INFINITE);
WaitForSingleObject(mutexFS,INFINITE);
// write fresh bit vector to cluster 0 of partition
bitVectors[charToInt(part)]->formatBitVector();
openedFiles[charToInt(part)] = 0;
forbidOpening[charToInt(part)] = false;
delete fileEvidention; //!!***!!
fileEvidention[charToInt(part)] = new ListHandler();
// some other stuff, irrelevant
ReleaseMutex(mutexFS);
return 1;
}
There are three threads executing: one is blocked and two are running through this code;
they first call mount, then format (each has its own Partition object as argument, p1 and p2).
The first call to mount always goes through; then an assertion failure occurs at random during one of the subsequent calls to mount/format by either of the two running threads.
Usually it fails in thread 1: it calls mount(..), completes it, then calls format(...) and fails around:
delete fileEvidention[charToInt(pos)];
(In debug mode, when I reach this instruction, the assertion fails even if I try to step into it with F11.)
In case it matters... this is the initialization:
char KernelFS::firstLetter = 'A'; // 'A' = 65
Partition* KernelFS::mountedPartitions[26] = {0}; // init. no partitions are mounted
BitVector* KernelFS::bitVectors[26] = {0}; // init. no partitions are mounted
bool KernelFS::forbidOpening[26] = {false};
long KernelFS::openedFiles[26] = {0};
ListHandler* KernelFS::fileEvidention[26] = {0};
HANDLE KernelFS::mutexFS = CreateMutex(0,0,0);
HANDLE KernelFS::unmountSem = CreateSemaphore(0,0,INFINITE,0);
I have never had this error before, and I have no idea how to debug it or what could cause it.
Thanks in advance for the help.
EDIT:
When I remove the marked line of code (and ignore the memory leak), there is no assertion failure. What is this witchcraft? :)

Solved. The marked line should be
delete fileEvidention[charToInt(part)];
......
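For context, fileEvidention is a static array of 26 pointers, so `delete fileEvidention;` hands delete the address of the array itself, which never came from new; the debug runtime presumably reports that as the assertion. One way to make this kind of slip harder is to let a smart pointer own each slot, as in the sketch below. The ListHandler here is only a stand-in with an instance counter, and resetSlot is a hypothetical helper, not part of the original KernelFS class.

```cpp
#include <array>
#include <cstddef>
#include <memory>

// Stand-in for the real ListHandler; it only counts live instances.
struct ListHandler {
    static int live;
    ListHandler() { ++live; }
    ~ListHandler() { --live; }
};
int ListHandler::live = 0;

// One owning pointer per partition letter; all slots start out empty.
std::array<std::unique_ptr<ListHandler>, 26> fileEvidention;

// Replaces the handler in one slot. The previous object in that slot,
// if any, is destroyed automatically; no other slot is touched.
void resetSlot(std::size_t pos)
{
    fileEvidention[pos].reset(new ListHandler());
}
```

With owning pointers there is no manual delete to mistype, and formatting the same partition twice does not leak the earlier handler.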

"printf" appears to be non-deterministic in Qt?

I know "printf" is standard C and should be deterministic. But when it runs in Qt I see a more non-deterministic response (in clock cycles). Could this be due to Qt adding some "pork" to its response?
I have multiple threads that call a function that uses a mutex. When one thread enters, it sets a switch so the others can't enter until it is done. Things appeared to work OK for a couple of seconds, and then the threads appeared to be killed off, going from 10 threads down to 1. So I tried adding a delay: k=k+1 did not help and looping over k=k+1 did not help, but usleep works, and so does printf, at creating a random delay and allowing all threads to continue running.
void CCB::Write(int iThread)
{
static bool bUse = false;
bool bDone = false;
char cStr[20];
int posWrite;// = *m_posWrite; // issue of posWrite being altered on the next entrance
long k = 0;
long m = 0;
m_threadCount++;
while(bDone == false){
if(bUse == false){
bUse = true;
posWrite = *m_posWrite;
memcpy(m_cmMessageCB + posWrite, &m_cmMessageWrite, sizeof(typeCanMessage));
memset(cStr, '\0', 20);
memcpy(cStr, (m_cmMessageCB + posWrite)->cMessage, 11); //fails: every 20
*m_posWrite = *m_posWrite + 1;
if(*m_posWrite == m_iNBufferLength)
*m_posWrite = 0;
bDone = true;
bUse = false;
}else if(bUse == true){
//why are threads being killed ?
// printf("T%d_%d ", iThread, m_threadCount);//non-deterministic value ?
usleep(1);//non-deterministic value
//k++;//delay of a couple clock cycles was not enough
/*
for(k = 0; k < iThread * 100; k++){//deterministic and fails to resolve thread problem
m++;
}
*/
}
}
}
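The bUse flag in the code above is read and then written in separate steps, so two threads can both see false and enter the critical section together; a plain bool does not behave like a mutex. A minimal sketch of the same circular-buffer write guarded by std::mutex follows. MessageRing, CanMessage and their members are illustrative names rather than the asker's CCB class, the buffer length is assumed to be greater than zero, and in a Qt project QMutex with QMutexLocker would presumably fill the same role.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Illustrative message type; the 11-byte payload mirrors the memcpy above.
struct CanMessage {
    char cMessage[11];
};

class MessageRing {
public:
    // `length` must be greater than zero.
    explicit MessageRing(std::size_t length) : slots_(length), posWrite_(0) {}

    // Copies one message into the next slot and returns the slot index used.
    // The lock is held for the whole read-modify-write of the write position,
    // so concurrent callers are serialized instead of spinning on a flag.
    std::size_t write(const CanMessage& msg)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t pos = posWrite_;
        slots_[pos] = msg;
        posWrite_ = (posWrite_ + 1) % slots_.size(); // wrap at the end
        return pos;
    }

private:
    std::mutex mutex_;
    std::vector<CanMessage> slots_;
    std::size_t posWrite_;
};
```

Threads that cannot take the lock block inside the mutex, so no usleep or printf delay is needed to keep them out of each other's way.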
