Shared memory SPSC queue for strings without allocations in C++

Shared memory SPSC queue for strings without allocations in C++ - c++

I am looking for something similar to SHM (SHared Memory) SPSC queue setup offered by boost::lockfree::spsc_queue and boost::interprocess but without allocating strings and storing them flat i.e. next to each other for maximum efficiency.
If I understand correctly that setup stores strings offset in the queue and allocates memory for the string somewhere else in the SHM.
Queue design can be:
| size | string 1 | size | string 2 | size | string 3 | ...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SHM segment
in a circular buffer fashion. Idea:
struct Writer {
std::byte *shm;
void write(std::string_view str) {
// write size
const uint32_t sz = str.size();
std::memcpy(shm, &sz, sizeof(sz));
shm += sizeof(sz);
// write string
std::memcpy(shm, str.data(), sz);
shm += sz;
}
};

tl;dr
It is possible, although you'll have to deal with a bunch of extra edge-cases.
Here's a working godbolt of a spsc queue that fulfils your requirements.
(see bottom of this post in case the link goes bad)
Please also check 2. Potential alternatives for viable alternatives.
1. Assumptions
Given the information you provided i'm going to assume you want the following properties for the single-producer, single-consumer (spsc) queue:
It should be backed by a ring buffer (that might be located within a shared memory region)
(this is also the case for boost::lockfree::spsc_queue)
Elements should be returned in FIFO order
(like boost::lockfree::spsc_queue)
The string-elements should be stored entirely within the ring buffer (no additional allocations)
The queue should be wait-free
(like boost::lockfree::spsc_queue, but not like most stuff in boost::interprocess (e.g. boost::interprocess::message_queue utilizes mutexes))
2. Potential alternatives
Depending on your requirements there could be a few alternative options:
Fixed-length strings:
Like #John Zwinck suggested in his answer you could use a fixed-length string buffer.
The trade-off would be that your maximum string length is bounded by this size, and - depending on your expected variation of possible string sizes - might result in a lot of unused buffer space.
If you go this route i'd recommend you to use boost::static_string - it's essentially a std::string with the dynamic allocation stuff removed and solely relying on its internal buffer.
i.e. boost::lockfree::spsc_queue<boost::static_string<N>>, where N is the maximum size for the string values.
Only store pointers in the queue and allocate the strings separately:
If you're already using boost::interprocess you could use a boost::interprocess::basic_string with a boost::interprocess::allocator that allocates the string separately in the same shared memory region.
Here's a answer that contains a full example of this approach (it even uses boost::lockfree::spsc_queue) (direct link to code example)
The trade-off in this case is that the strings will be stored somewhere outside the spsc queue (but still within the same shared memory region).
If your strings are relatively long this might even be faster than storing the strings directly within the queue (the ring buffer can be alot smaller if it only needs to store pointers, and therefore would have a much better cache-locality).
(cache locality won't help in this case - see this excellent comment from #Peter Cordes)
3. Design Considerations
3.1 Wrapped-around writes
A ringbuffer with fixed-size elements essentially splits the raw buffer into element-sized slots that are contiguous within the buffer, e.g.:
/---------------------- buffer ----------------------\
/ \
+----------+----------+----------+----------+----------+
| Object 1 | Object 2 | Object 3 | ... | Object N |
+----------+----------+----------+----------+----------+
This automatically ensures that all objects within the buffer are contiguous. i.e. you never have to deal with a wrapped-around object:
/---------------------- buffer ----------------------\
/ \
+-----+----------+----------+----------+----------+-----+
|ct 1 | | | | | Obje|
+-----+----------+----------+----------+----------+-----+
If the element-size is not fixed however, we do have to handle this case somehow.
Assume for example an empty 8-byte ringbuffer with the read and write pointer on the 7th byte (the buffer is currently empty):
/--------- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| | | | | | | | |
+----+----+----+----+----+----+----+----+
^
\---------- write pointer
If we now attempt to write "bar" into the buffer (prefixed by it's length), we would get a wrapped-around string:
/--------- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| a | r | | | | | 03 | b |
+----+----+----+----+----+----+----+----+
^
\----------------------------- write pointer
There are 2 ways to deal with this:
Let the consumer deal with wrapped-around writes.
Don't allow wrapped-around writes; ensure each string is written in a contiguous block within the buffer.
3.1.1 Making it part of the interface
The 1st option would be the easiest to implement, but it would be quite cumbersome to use for the consumer of the queue, because it needs to deal with 2 separate pointers + sizes that in combination represent the string.
i.e. a potential interface for this could be:
class spsc_queue {
// ...
bool push(const char* str, std::size_t size);
bool read(
const char*& str1, std::size_t& size1,
const char*& str2, std::size_t& size2
);
bool pop();
// ...
};
// Usage example (assuming the state from "bar" example above)
const char* str1;
std::size_t size1;
const char* str2;
std::size_t size2;
if(queue.read(str1, size1, str2, size2)) {
// str1 would point to buffer[7] ("b")
// size1 would be 1
// str2 would point to buffer[0] ("ar")
// size2 would be 2
queue.pop();
}
Having to deal with 2 pointers and 2 sizes all the time for the odd case of a wrapped-around write is not the best solution imho, so i went with option 2:
3.1.2 Prevent wrap-arounds
The alternative option would be to prevent wrap-arounds from occuring in the first place.
A simple way to achieve this is to add a special "wrap-around" marker that tells the reader to immediately jump back to the beginning of the buffer (and therefore skip over all bytes after the wrap-around marker)
Example writing "bar": (WA represents a wrap-around marker)
/--------- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| 03 | b | a | r | | | WA | |
+----+----+----+----+----+----+----+----+
^
\------------------- write pointer
So once the reader tries to read the next element it'll encounter the wrap-around marker.
This instructs it to directly go back to index 0, where the next element is located:
/--------------------------------------- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| 03 | b | a | r | | | WA | |
+----+----+----+----+----+----+----+----+
^
\------------------- write pointer
This technique allows all strings to be stored contiguously within the ring buffer - with the small trade-off that the end of the buffer might not be fully utilized and a couple extra branches in the code.
For this answer i chose the wrap-around marker approach.
3.2 What if there's no space for a wrap-around marker?
Another problem comes up once you want a string-size that's above 255 - at that point the size needs to be larger than 1 byte.
Assume we use a 2 byte-length and write "foo12" (length 5) into the ring buffer:
/--------------------------------------- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| 05 | 00 | f | o | o | 1 | 2 | |
+----+----+----+----+----+----+----+----+
^
\---- write pointer
so far so good, but as soon as the read pointer catches up we have a problem:
there is only a single byte left to write before we need to wrap around, which is not enough to fit a 2-byte length!
So we would need to wrap-around the length on the next write (writing "foo" (length 3) into the ringbuffer):
/---- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| 00 | f | o | o | | | | 03 |
+----+----+----+----+----+----+----+----+
^
\------------------- write pointer
There are three potential ways this could be resolved:
Deal with wrapped-around lengths in the implementation.
The downside to this approach is that it introduces a bunch more branches into the code (a simple memcpy for the size won't suffice anymore), it makes aligning strings more difficult and it goes against the design we chose in 3.1.
Use a single byte wrap-around marker (regardless of the string size)
This would allow us to place a wrap-around marker and prevent wrapped-around string sizes:
/---- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| 03 | 00 | f | o | o | | | WA |
+----+----+----+----+----+----+----+----+
^
\-------------- write pointer
The downside with this approach is that it's rather difficult to differentiate between a wrap-around marker and an actual string-size. We also would need to probe the first byte of each length first to check if it's a wrap-around marker before reading the full length integer.
Automatically advance the write pointer if the remaining size is insufficient for a length entry.
This is the simplest solution i managed to come up with, and the one i ended up implementing. It basically allows the writer to advance the write pointer back to the beginning of the buffer (without writing anything in it), if the remaining size would be less than the size of the length integral type. (The reader recognizes this and jumps back to the beginning as well)
/---- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| 03 | 00 | f | o | o | | | |
+----+----+----+----+----+----+----+----+
^
\-------------- write pointer
3.3 Destructive interference
Depending on how fast your reader is in comparison to your writer you might have a bit of destructive interference within the ring buffer.
If for example your L1 cache line size is 128 bytes, using 1-byte lengths for the strings in the ring-buffer, and only pushing length 1 strings, e.g.:
/--------------------------------------- read pointer
v
+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+----+----+----+----+----+----+----+----+
| 01 | a | 01 | b | 01 | c | | | ...
+----+----+----+----+----+----+----+----+
^
\--------- write pointer
Then this would result in +/- 64 string entries being stored on the same cache line, which are continously written by the producer while they get read from the consumer => a lot of potential interference.
This can be prevented by padding the strings within the ring buffer to a multiple of your cache line size (in C++ available as std::hardware_destructive_interference_size)
i.e. strings padded to 4-bytes:
/--------------------------------------- read pointer
v
+----+----+----+----+----+----+----+----+----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+----+----+----+----+----+----+----+----+----+
| 01 | a | | | 01 | c | | | | ...
+----+----+----+----+----+----+----+----+----+
^
\--------- write pointer
The trade-off here is of course that this will potentially waste a lot of space within the ring buffer.
So you'll have to profile how much padding you want for your string values.
The padding value you choose should be between those two values:
1 <= N <= std::hardware_destructive_interference_size
Where 1 is basically no padding => best space utilization, worst potential interference
And std::hardware_destructive_interference_size=> worst space utilization, no potential interference
4. Implementation
This is a full implementation of a wait-free, string only, spsc fifo queue - based on the design considerations listed above.
I've only implemented the bare minimum interface required, but you can easily create all the utility functions boost::lockfree::spsc_queue provides around those 3 core functions:
godbolt
template<std::unsigned_integral size_type, std::size_t padding = 1>
class lockfree_spsc_string_queue {
public:
lockfree_spsc_string_queue(void* buffer, std::size_t buffer_size, bool init = false);
// producer:
// tries to push the given string into the queue
// returns true if adding the string was successfull
// (contents will be copied into the ringbuffer)
bool push(std::string_view str);
// consumer:
// reads the next element in the queue.
// returns an empty optional if the queue is empty,
// otherwise a string_view that points to the string
// within the ringbuffer.
// Does NOT remove the element from the queue.
std::optional<std::string_view> front();
// Removes the next element from the queue.
// Returns true if an element has been removed.
bool pop();
};
size_type is the integral type that is used to store the length of the strings within the buffer, i.e. if you use unsigned char each string could be at most 254 bytes in length (with unsigned short (assuming 2 bytes) it would be 65534, etc...) (the maximum value is used as a wrap-around marker)
padding is the alignment that's used for the string values. If it is set to 1 then strings will be packed as tightly as possible into the ring buffer (best space utilization). If you set it to std::hardware_destructive_interference_size then there will be no interference between different string values in the ring buffer, at the cost of space utilization.
Usage example: godbolt
void producer() {
lockfree_spsc_string_queue<unsigned short> queue(BUFFER_POINTER, BUFFER_SIZE, true);
while(true) {
// retry until successful
while(!queue.push("foobar"));
}
}
void consumer() {
lockfree_spsc_string_queue<unsigned short> queue(BUFFER_POINTER, BUFFER_SIZE);
while(true) {
std::optional<std::string_view> result;
// retry until successful
while(!result) result = queue.front();
std::cout << *result << std::endl;
bool pop_result = queue.pop();
assert(pop_result);
}
}
boost::lockfree::spsc_queue::consume_all e.g. could be implemented like this (in terms of the 3 functions provided by this minimal implementation):
template<class Functor>
std::size_t consume_all(Functor&& f) {
std::size_t cnt = 0;
for(auto el = front(); el; el = front()) {
std::forward<Functor>(f)(*el);
bool pop_result = pop();
assert(pop_result);
++cnt;
}
return cnt;
}
Full implementation: godbolt
// wait-free single-producer, single consumer fifo queue
//
// The constructor must be called once with `init = true` for a specific region.
// After the construction of the queue with `init = true` has succeeded additional instances
// can be created for the region by passing `init = false`.
template<
std::unsigned_integral size_type,
std::size_t padding = 1>
class lockfree_spsc_string_queue {
// we use the max value of size_type as a special marker
// to indicate that the writer needed to wrap-around early to accommodate a string value.
// this means the maximum size a string entry can be is `size_type_max - 1`.
static constexpr size_type size_type_max = std::numeric_limits<size_type>::max();
// calculates the padding necessary that is required after
// a T member to align the next member onto the next cache line.
template<class T>
static constexpr std::size_t padding_to_next_cache_line =
std::hardware_destructive_interference_size -
sizeof(T) % std::hardware_destructive_interference_size;
public:
// internal struct that will be placed in the shared memory region
struct spsc_shm_block {
using atomic_size = std::atomic<std::size_t>;
// read head
atomic_size read_offset;
char pad1[padding_to_next_cache_line<atomic_size>];
// write head
atomic_size write_offset;
char pad2[padding_to_next_cache_line<atomic_size>];
std::size_t usable_buffer_size;
char pad3[padding_to_next_cache_line<std::size_t>];
// actual data
std::byte buffer[];
[[nodiscard]] static inline spsc_shm_block* init_shm(void* ptr, std::size_t size) {
spsc_shm_block* block = open_shm(ptr, size);
// atomics *must* be lock-free, otherwise they won't work across process boundaries.
assert(block->read_offset.is_lock_free());
assert(block->write_offset.is_lock_free());
block->read_offset = 0;
block->write_offset = 0;
block->usable_buffer_size = size - offsetof(spsc_shm_block, buffer);
return block;
}
[[nodiscard]] static inline spsc_shm_block* open_shm(void* ptr, std::size_t size) {
// this type must be trivially copyable, otherwise we can't implicitly start its lifetime.
// It also needs to have a standard layout for offsetof.
static_assert(std::is_trivially_copyable_v<spsc_shm_block>);
static_assert(std::is_standard_layout_v<spsc_shm_block>);
// size must be at least as large as the header
assert(size >= sizeof(spsc_shm_block));
// ptr must be properly aligned for the header
assert(reinterpret_cast<std::uintptr_t>(ptr) % alignof(spsc_shm_block) == 0);
// implicitly start lifetime of spsc_shm_block
return std::launder(reinterpret_cast<spsc_shm_block*>(ptr));
}
};
public:
inline lockfree_spsc_string_queue(void* ptr, std::size_t size, bool init = false)
: block(init ? spsc_shm_block::init_shm(ptr, size) : spsc_shm_block::open_shm(ptr, size))
{
// requires a buffer at least 1 byte larger than size_type
assert(block->usable_buffer_size > sizeof(size_type));
}
// prevent copying / moving
lockfree_spsc_string_queue(lockfree_spsc_string_queue const&) = delete;
lockfree_spsc_string_queue(lockfree_spsc_string_queue&&) = delete;
lockfree_spsc_string_queue& operator=(lockfree_spsc_string_queue const&) = delete;
lockfree_spsc_string_queue& operator=(lockfree_spsc_string_queue&&) = delete;
// producer: tries to add `str` to the queue.
// returns true if the string has been added to the queue.
[[nodiscard]] inline bool push(std::string_view str) {
std::size_t write_size = pad_size(sizeof(size_type) + str.size());
// impossible to satisfy write (not enough space / insufficient size_type)
if(write_size > max_possible_write_size() || str.size() >= size_type_max) [[unlikely]] {
assert(write_size < max_possible_write_size());
assert(str.size() < size_type_max);
return false;
}
std::size_t write_off = block->write_offset.load(std::memory_order_relaxed);
std::size_t read_off = block->read_offset.load(std::memory_order_acquire);
std::size_t new_write_off = write_off;
if(try_align_for_push(read_off, new_write_off, write_size)) {
new_write_off = push_element(new_write_off, write_size, str);
block->write_offset.store(new_write_off, std::memory_order_release);
return true;
}
if(new_write_off != write_off) {
block->write_offset.store(new_write_off, std::memory_order_release);
}
return false;
}
// consumer: discards the current element to be read (if there is one)
// returns true if an element has been removed, false otherwise.
[[nodiscard]] inline bool pop() {
std::size_t read_off;
std::size_t str_size;
if(!read_element(read_off, str_size)) {
return false;
}
std::size_t read_size = pad_size(sizeof(size_type) + str_size);
std::size_t new_read_off = advance_offset(read_off, read_size);
block->read_offset.store(new_read_off, std::memory_order_release);
return true;
}
// consumer: returns the current element to be read (if there is one)
// this does not remove the element from the queue.
[[nodiscard]] inline std::optional<std::string_view> front() {
std::size_t read_off;
std::size_t str_size;
if(!read_element(read_off, str_size)) {
return std::nullopt;
}
// return string_view into buffer
return std::string_view{
reinterpret_cast<std::string_view::value_type*>(&block->buffer[read_off + sizeof(size_type)]),
str_size
};
}
private:
// handles implicit and explicit wrap-around for the writer
[[nodiscard]] inline bool try_align_for_push(
std::size_t read_off,
std::size_t& write_off,
std::size_t write_size) {
std::size_t cont_avail = max_avail_contiguous_write_size(write_off, read_off);
// there is enough contiguous space in the buffer to push the string in one go
if(write_size <= cont_avail) {
return true;
}
// not enough contiguous space in the buffer.
// check if the element could fit contiguously into
// the buffer at the current write_offset.
std::size_t write_off_to_end = block->usable_buffer_size - write_off;
if(write_size <= write_off_to_end) {
// element could fit at current position, but the reader would need
// to consume more elements first
// -> do nothing
return false;
}
// element can NOT fit contiguously at current write_offset
// -> we need a wrap-around
std::size_t avail = max_avail_write_size(write_off, read_off);
// not enough space for a wrap-around marker
// -> implicit wrap-around
if(write_off_to_end < sizeof(size_type)) {
// the read marker has advanced far enough
// that we can perform a wrap-around and try again.
if(avail >= write_off_to_end) {
write_off = 0;
return try_align_for_push(read_off, write_off, write_size);
}
// reader must first read more elements
return false;
}
// explicit wrap-around
if(avail >= write_off_to_end) {
std::memcpy(&block->buffer[write_off], &size_type_max, sizeof(size_type));
write_off = 0;
return try_align_for_push(read_off, write_off, write_size);
}
// explicit wrap-around not possible
// (reader must advance first)
return false;
}
// writes the element into the buffer at the provided offset
// and calculates new write_offset
[[nodiscard]] inline std::size_t push_element(
std::size_t write_off,
std::size_t write_size,
std::string_view str) {
// write size + string into buffer
size_type size = static_cast<size_type>(str.size());
std::memcpy(&block->buffer[write_off], &size, sizeof(size_type));
std::memcpy(&block->buffer[write_off + sizeof(size_type)], str.data(), str.size());
// calculate new write_offset
return advance_offset(write_off, write_size);
}
// returns true if there is an element that can be read (and sets read_off & str_size)
// returns false otherwise.
// internally handles implicit and explicit wrap-around.
[[nodiscard]] inline bool read_element(std::size_t& read_off, std::size_t& str_size) {
std::size_t write_off = block->write_offset.load(std::memory_order_acquire);
std::size_t orig_read_off = block->read_offset.load(std::memory_order_relaxed);
read_off = orig_read_off;
str_size = 0;
if(read_off == write_off) {
return false;
}
// remaining space would be insufficient for a size_type
// -> implicit wrap-around
if(block->usable_buffer_size - read_off < sizeof(size_type)) {
read_off = 0;
if(read_off == write_off) {
block->read_offset.store(read_off, std::memory_order_release);
return false;
}
}
size_type size;
std::memcpy(&size, &block->buffer[read_off], sizeof(size_type));
// wrap-around marker
// -> explicit wrap-around
if(size == size_type_max) {
read_off = 0;
if(read_off == write_off) {
block->read_offset.store(read_off, std::memory_order_release);
return false;
}
std::memcpy(&size, &block->buffer[read_off], sizeof(size_type));
}
// modified read_off -> store
if(read_off != orig_read_off) {
block->read_offset.store(read_off, std::memory_order_release);
}
str_size = size;
return true;
}
// the maximum number of contiguous bytes we are currently able
// to fit within the memory block (without wrapping around)
[[nodiscard]] inline std::size_t max_avail_contiguous_write_size(
std::size_t write_off,
std::size_t read_off) {
if(write_off >= read_off) {
std::size_t ret = block->usable_buffer_size - write_off;
ret -= read_off == 0 ? 1 : 0;
return ret;
}
// write_off < read_off
return read_off - write_off - 1;
}
// the maximum number of bytes we are currently able
// to fit within the memory block (might include a wrap-around)
[[nodiscard]] inline std::size_t max_avail_write_size(std::size_t write_off, std::size_t read_off) {
std::size_t avail = read_off - write_off - 1;
if (write_off >= read_off)
avail += block->usable_buffer_size;
return avail;
}
// the largest possible size an element could be and still
// fit within the memory block.
[[nodiscard]] inline std::size_t max_possible_write_size() {
return block->usable_buffer_size - 1;
}
// pads a given size to be a multiple of the template parameter padding
[[nodiscard]] inline std::size_t pad_size(std::size_t size) {
if(size % padding != 0) {
size += padding - size % padding;
}
return size;
}
// advances offset and wraps around if required
[[nodiscard]] inline std::size_t advance_offset(std::size_t offset, std::size_t element_size) {
std::size_t new_offset = offset + element_size;
// wrap-around
if(new_offset >= block->usable_buffer_size) {
new_offset -= block->usable_buffer_size;
}
return new_offset;
}
private:
spsc_shm_block* block;
};

Create your own string type that does what you want:
struct MyString
{
uint8_t size; // how much of data is actually populated
std::array<char, 127> data; // null terminated? up to you
};
Now an spsc_queue<MyString> can store strings without separate allocation.

Related

Is optimizing small std::strings wiht c-style functions often pointless?

Let's say we've got the following piece of code and we've decided to optimise it a bit:
/// BM_NormalString
bool value = false;
std::string str;
str = "orthogonal";
if (str == "orthogonal") {
value = true;
}
So far we've come up with two pretty simple and straightforward strategies:
/// BM_charString
bool value = false;
char *str = new char[11];
std::strcpy(str, "orthogonal");
if (std::strcmp(str, "orthogonal") == 0) {
value = true;
}
delete[] str;
/// BM_charStringMalloc
bool value = false;
char *str = (char *) std::malloc(11);
std::strcpy(str, "orthogonal");
if (std::strcmp(str, "orthogonal") == 0) {
value = true;
}
free(str);
If we try and benchmark our three approaches we, quite surprisingly, won't see much of a difference.
Although bench-marking it locally gives me even more surprisingly disconcerting results:
| Benchmark | Time | CPU | Iterations |
|----------------------|------------------|-----------|---------------|
| BM_charString | 52.1 ns | 51.6 ns | 11200000 |
| BM_charStringMalloc | 47.4 ns | 47.5 ns | 15448276 |
| **BM_NormalString** | 17.1 ns | 16.9 ns | 40727273 |
Would you say then, that for such rather small strings there's almost no point in going 'bare metal' style (by using C-style string functions)?

For small strings, there's no point using dynamic storage. The allocation itself is slower than the comparison. Standard library implementers know this and have optimised std::basic_string to not use dynamic storage with small strings.
Using C-strings is not an "optimisation".

Is there any difference between map and unordered_map in c++ in terms of memory usage?

I am solving a problem on InterviewBit and come across a question,
here's link https://www.interviewbit.com/problems/diffk-ii/.
When I have used c++ STL map to solve this problem, it shows me the message
Memory Limit Exceeded. Your submission didn't complete in the allocated memory limit.
here's my code
int Solution::diffPossible(const vector<int> &A, int B) {
int n = A.size();
map< int , int > mp;
for(int i =0;i<n; i++)
mp[A[i]] = i;
int k = B;
for(int i =0; i<n; i++){
if(mp.find(A[i]+k) != mp.end() && mp[A[i]+k] != i){
return 1;
}
if(mp.find(A[i]-k) != mp.end() && mp[A[i]-k] != i){
return 1;
}
}
return 0;
}
and when I have replaced map by unorderd_map solution accepted.
Here's code
int Solution::diffPossible(const vector<int> &A, int B) {
int n = A.size();
unordered_map< int , int > mp;
for(int i =0;i<n; i++)
mp[A[i]] = i;
int k = B;
for(int i =0; i<n; i++){
if(mp.find(A[i]+k) != mp.end() && mp[A[i]+k] != i){
return 1;
}
if(mp.find(A[i]-k) != mp.end() && mp[A[i]-k] != i){
return 1;
}
}
return 0;
}
It means map is taking more memory than unordered_map.
Can anyone explain how this is happening? Why map is taking more memory
space than unordered_map?

Maps are implemented as binary search trees and each node (additionally to useful data) typically stores 3 pointers (to a left child, a right child, and a parent).
Unordered maps are implemented as hash tables, where each node is in a linked list. In case of a singly linked list (that belongs to a relevant bucket), there is only 1 pointer per node. UPDATE: However, there is also an additional pointer for each bucket. In an ideal case where there are no collisions, there would be 2 pointers per element stored in memory.
Note that 2 ints will typically occupy 8 bytes, same as a single pointer.
For example, look at the GNU libstdc++ implementation. The RB tree's node is defined as follows:
struct _Rb_tree_node_base
{
typedef _Rb_tree_node_base* _Base_ptr;
typedef const _Rb_tree_node_base* _Const_Base_ptr;
_Rb_tree_color _M_color;
_Base_ptr _M_parent;
_Base_ptr _M_left;
_Base_ptr _M_right;
...
There, you can observe those 3 pointers.
Generally, it's hard to say which container will consume less overall memory. However, I created a benchmark to insert 1M random numbers into both containers and measured maximum resident size (MaxRSS) to reflect all the consumed memory space including, e.g., heap housekeeping data. The results were as follows:
48,344 kB for std::map,
50 888 kB for std::unordered_map,
40,932 kB for std::unordered_map with reserve.
Note that the memory consumption for the unordered map (ad 2.) was higher due to reallocations of bucket lists. That is what reserve member function is for. If one cares about memory consumption and knows the number of elements beforehand, he/she should always preallocate (this is the same situation as for vectors).

A map is basically a binary search tree, while unordered_map is implemented as hash map. If you look at implementations of both, you'll quickly notice that a BST is quite a bit bigger.
It also means that map is a lot slower than unordered_map.
| map | unordered_map
---------------------------------------------------------
Ordering | increasing order | no ordering
| (by default) |
Implementation | Self balancing BST | Hash Table
| like Red-Black Tree |
search time | log(n) | O(1) -> Average
| | O(n) -> Worst Case
Insertion time | log(n) + Rebalance | Same as search
Deletion time | log(n) + Rebalance | Same as search
BST:
struct node
{
int data;
node* left;
node* right;
};
HashMap:
struct hash_node {
int key;
int value;
hash_node* next;
}
Reference: https://www.geeksforgeeks.org/map-vs-unordered_map-c/

Writing aligned data in binary file

I am creating a file with some data objects inside. data object have different sizes and are something like this (very simplified):
struct Data{
uint64_t size;
char blob[MAX_SIZE];
// ... methods here:
}
At some later step, the file will be mmap() in memory,
so I want the beginning of every data objects to starts on memory address aligned by 8 bytes where uint64_t size will be stored (let's ignore endianness).
Code looks more or less to this (currently hardcoded 8 bytes):
size_t calcAlign(size_t const value, size_t const align_size){
return align_size - value % align_size;
}
template<class ITERATOR>
void process(std::ofstream &file_data, ITERATOR begin, ITERATOR end){
for(auto it = begin; it != end; ++it){
const auto &data = *it;
size_t bytesWriten = data.writeToFile(file_data);
size_t const alignToBeAdded = calcAlign(bytesWriten, 8);
if (alignToBeAdded != 8){
uint64_t const placeholder = 0;
file_data.write( (const char *) & placeholder, (std::streamsize) alignToBeAdded);
}
}
}
Is this the best way to achieve alignment inside a file?

you don't need to rely on writeToFile to return the size, you can use ofstream::tellp
const auto beginPos = file_data.tellp();
// write stuff to file
const auto alignSize = (file_data.tellp()-beginPos)%8;
if(alignSize)
file_data.write("\0\0\0\0\0\0\0\0",8-alignSize);
EDIT post OP comment:
Tested on a minimal example and it works.
#include <iostream>
#include <fstream>
int main(){
using namespace std;
ofstream file_data;
file_data.open("tempfile.dat", ios::out | ios::binary);
const auto beginPos = file_data.tellp();
file_data.write("something", 9);
const auto alignSize = (file_data.tellp() - beginPos) % 8;
if (alignSize)
file_data.write("\0\0\0\0\0\0\0\0", 8 - alignSize);
file_data.close();
return 0;
}

You can optimize the process by manipulating the input buffer instead of the file handling. Modify your Data struct so the code that fills the buffer takes care of the alignment.
struct Data{
uint64_t size;
char blob[MAX_SIZE];
// ... other methods here
// Ensure buffer alignment
static_assert(MAX_SIZE % 8 != 0, "blob size must be aligned to 8 bytes to avoid Buffer Overflow.");
uint64_t Fill(const char* data, uint64_t dataLength) {
// Validations...
memcpy(this->blob, data, dataLength);
this->size = dataLength;
const auto paddingLen = calcAlign(dataLength, 8) % 8;
if (padding > 0) {
memset(this->blob + dataLength, 0, paddingLen);
}
// Return the aligned size
return dataLength + paddingLen;
}
};
Now when you pass the data to the "process" function simply use the size returned from Fill, which ensures 8 byte alignment.
This way you still takes care of the alignment manually but you don't have to write twice to the file.
note: This code assumes you use Data also as the input buffer. You should use the same principals if your code uses some another object to hold the buffer before it is written to the file.
If you can use POSIX, see also pwrite

How can a null pointer change its value?

I used something like this:
struct Node
{
char name[50];
Node *left,*right;
};
int main()
{
char cmd[10];
Node* p=NULL;
scanf("%s",&cmd);
if (p == NULL)
{
// do something
// THIS NEVER GETS EXECUTED
// WHYYYYY????
//THIS IS STRANGE
}
}
So basically, the pointer p changes its value after I read into the cmd variable. I tried to commented out the scanf code, and then everything works. Very strange.

You have a buffer overflow.
The memory looks like this:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| | | | | | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
^ ^
cmd[10] p
If scanf reads more than 10 bytes, it will overflow to p. Any non-zero value will fail the NULL check, even if it is not a valid pointer.

Depth of a node in binary tree

I wrote the following function that returns the depth of a specific node of a binary tree. Consider the tree here: If I ask for the depth of node 5, I should get an answer of 3, from the path 1 -> 2 -> 5.
It's not working; I get 0, even though I return the height from the function.
Here "data" is the value whose depth is to be found, and root is the root node of tree. The initial value of height is 1 (the root node is level 1).
int height_target(node *root,int data,int height){ if(root==NULL) return 0;
if(root->info==data)
return height;
height_target(root->left,data,height+1);
height_target(root->right,data,height+1);
}

Most notably, your recursive branch returns nothing. Passing the height back one level isn't enough: you have to pass it all the way up the line.
You need to capture and return the maximum of the left & right subtree calls.
EDIT: remove that last sentence ...
Thus, you need to return the value from whichever call (left or right) finds the desired node. You didn't return anything. Something like:
ldepth = height_target(root->left , data, height+1);
if (ldepth > 0)
return ldepth;
rdepth = height_target(root->right, data, height+1);
if (rdepth > 0)
return rdepth;
return 0;
This will return whichever branch is successful in finding the desired node, and 0 on failure.

You aren't doing anything with the value returned from height_target.
Presumably you want something like:
return std::max(height_target(root->left,data,height+1),
height_target(root->right,data,height+1));

I think you get 0 as answwer because the condition
if(root->info==data)is never satisfied if there isn't a node with value equal to data. If you get 0 as a response it would be that the "data" you search in the binary tree there isn't.

Ok. I think i get it. Try with a function like this:
void height_target(node* root,int data,int height,int& h)
{
if(root==NULL)
return;
if(root->info==data)
h=height;
height(root->left,data,height+1,h);
height(root->right,data,height+1,h);
}
I think you supposend that there can't be two equal values in the tree

It's not working;
Here is one possible fix to your code: (updated - this code now compiles)
Summary - the last two lines of your code discard results during the 'decursion' (when recursion is unravelling).
int height_target(node* current, int data, int height)
{
int retVal = 0;
do {
if (nullptr == current)
break; // 0 indicates not found
if (current->info == data) { retVal = height; break; }
// found the node at 'height'; now de-curse
retVal = height_target (current->left, data, height+1);
if (retVal) break; // found data in left branch
retVal = height_target(current->right, data, height+1);
if(retVal) break; // found data in right branch
}while(0);
return retVal;
}
Imagine your search item is found 5 layers 'up', so that highest recursion returns 5.
At layer 4, the data was not found, so your code then searched both the left branch and right branch.
With either branch, when your code returned (from layer 5) with the value '5', your code simply discarded the result.
In this possible solution, I tested retVal after return from either left or right. Now if the return value (from layer 5) is not zero, the function will return the non-zero value; eventually popping the value off the stack all the way back 'down' to the bottom of your recursion.
Perhaps a simplified call trace can illustrate:
height_target (., ., 1); // first call, data not found at root
| height_target (., ., 2); // recursive call, data not found
| | height_target (., ., 3); // recurse, not found
| | | height_target (., ., 4); // recurse, not found
| | | | height_target (., data, 5); // found! 'decursion' begins
| | | | |
| | | | returns 5 // height 5 returns 5
| | | returns 5 // height 4 return 5
| | returns 5 // height 3 returns 5
| returns 5 // height 2 returns 5
returns 5 // height 1 returns 5
First call now returns 5.

The problem is in your understanding of what return does: it does not terminate all calls to the current function, it only ends the current iteration of said function and passes the value back to the previous stack frame.
Consider:
#include <iostream>
int f(int args) {
if (args == 3)
return args;
f(args + 1);
std::cout << args << "\n";
}
int main() {
f(0);
}
http://ideone.com/59Dl7X
Your compiler should have been warning you about the function not returning a value.
What you want is something like this:
static const size_t NODE_NOT_FOUND = 0;
size_t height_target_impl(node *root, int data, size_t height)
{
if (root == nullptr)
return NODE_NOT_FOUND;
if (root->info == data)
return height;
int branch_height = height_target_impl(root->left, data, height + 1);
if (branch_height != NODE_NOT_FOUND)
return branch_height;
return height_target_impl(root->right, data, height + 1);
}
size_t height_target(node* root, int data)
{
return height_target_impl(root, data, 1);
}
Called as:
int h = height_target(root, data);
This returns a value of 0 if the node wasn't found or a 1-based height, with 1 indicating data was found in the root node.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Shared memory SPSC queue for strings without allocations in C++ - c++

Create your own string type that does what you want: struct MyString { uint8_t size; // how much of data is actually populated std::array<char, 127> data; // null terminated? up to you }; Now an spsc_queue<MyString> can store strings without separate allocation.

Related

Is optimizing small std::strings wiht c-style functions often pointless?

Is there any difference between map and unordered_map in c++ in terms of memory usage?

Writing aligned data in binary file

How can a null pointer change its value?

Depth of a node in binary tree

Categories

Resources