An effective way to concatenate multiple buffers - c++

I need to design an efficient and readable class with 2 main functions:
add_buffer(char* buffer) - add a buffer.
char* read_all() - get one big buffer that contains all the buffers the user has added so far, in order.
For example:
char first_buffer[] = {1,2,3};
char second_buffer[] = {4,5,6};
MyClass instance;
instance.add_buffer(first_buffer);
instance.add_buffer(second_buffer);
char* big_buffer = instance.read_all(); // big_buffer = [1,2,3,4,5,6]
NOTE: There are a lot of solutions to this problem, but I'm looking for an efficient one. In real life the buffers will be many and big, and I want to avoid a lot of copying and reallocations (like what std::vector does). I also want readable C++ code.
NOTE: The real-life problem is this: I'm reading data from an HTTP request that arrives in separate chunks. After all chunks have arrived, I want to return the whole data to the user.

Use a std::vector<char> with enough memory reserved up front. Since C++11, you can access the internal buffer with std::vector::data() (before C++11, you have to use &*std::vector::begin()).

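If your buffers can be held as std::string and you can use Boost, boost::algorithm::join will do. It takes a separator; pass "" to concatenate without one: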
#include <boost/algorithm/string/join.hpp>
#include <iostream>
#include <string>
#include <vector>
int main(int, char **)
{
    std::vector<std::string> list;
    list.push_back("Hello");
    list.push_back("World!");
    // Concatenate the elements, inserting ", " between them.
    std::string joined = boost::algorithm::join(list, ", ");
    std::cout << joined << std::endl;
}
Output:
Hello, World!
Original answer by Tristram Gräbener

Use a standard growth strategy: start with some initial memory, say 256 bytes, and whenever it fills up, reallocate and double the size.
If you don't want to do it yourself, use an STL container such as std::vector<char>. It automatically reallocates memory for you when the buffer is full.

Related

Is there a standard library class for substrings?

I have a big read-only string that I scan for syntax and based on that simple syntax I extract a bunch of smaller strings that I use later for further processing. Based on testing, creating and copying most of the big string into the small strings is kind of a performance bottleneck (there are thousands of them per each big string).
I figured that I don't actually need to allocate and copy the data, though. What I really need is a sort of string-snippet type that would only store a pointer to the start of the relevant data and the length, but that would at the same time be a drop-in replacement for std::string in all its standard library interactions.
I could roll my own class for that and implement the functions I need, which would be the easiest option anyway, but if there is already something like it in the standard library, why bother?
So basically, is there a substring sort of class in STL?
Yes, since C++17 you have std::string_view.
Example:
#include <iostream>
#include <string>
#include <string_view>
int main() {
std::string foo = "Hello world";
std::string_view a(foo.c_str(), 5);
std::string_view b(foo.c_str() + 6, 5);
std::cout << a << '\n' // prints Hello
<< b << '\n'; // prints world
}
This is where std::string_view pays off compared to std::string: it avoids copying the original string, and std::string_view::substr returns another view instead of allocating a new string.
Instead of copying the strings you are operating on, a string view provides a view to the underlying string - pretty much just the pointer to the start of the string and the size of it.

insert a struct into a vector as binary data for network transmission

I am using an older network transmission function for a legacy product, which takes a char array and transmits it over the network. This char array is just data; there is no need for it to make sense (or be null-terminated). As such, in the past the following was done:
struct robot_info {
int robot_number;
int robot_type;
...
}; // A robot info data structure for sending info.
char str[1024], *currentStrPos = str;
robot_info r_info;
... // str has some header data added to it.
... // robot info structure is filled out
memcpy(currentStrPos, (char *)&r_info, sizeof(robot_info)); // Add the robot info
currentStrPos += sizeof(robot_info); // Count the robot info in the length
scanSocket.writeTo(str, currentStrPos - str); // Write to the socket.
We have just added a bunch of stuff to robot_info, but I am not happy with the fixed-length buffer in the code above. I would prefer a dynamically allocated RAII type so that it can grow, especially since there can be multiple robot_info structures. I propose the following:
std::vector<char> str;
... // str has some header information added to it.
... // r_info is filled out.
str.insert(str.end(), (char *)&r_info, (char *)&r_info + sizeof r_info);
scanSocket.writeTo(str.data(), str.size());
This uses the std::vector insert function, with pointers to the start of r_info and one past its end as the iterator range. It relies on the fact that the struct is aligned to at least a char and can be operated on like this. The struct has no dynamic memory elements and no inheritance.
Will this have well defined behavior? Is there a better way to perform the same action?
While this works, it is ultimately solving a compile time problem with a run time solution. Since robot_info is a defined type, a better solution would be this:
std::array<char, sizeof(robot_info)> str;
memcpy(str.data(), &r_info, sizeof(robot_info));
scanSocket.writeTo(str.data(), str.size());
This has the advantages:
It can never be oversized or undersized.
Automatic storage duration (stack allocation) means it is potentially faster.

How to use inplace const char* as std::string content

I am working on an embedded software project. A lot of strings are stored in flash memory. I would like to use these strings (usually const char* or const wchar_t*) as a std::string's data. That means I want to avoid creating a copy of the original data because of memory restrictions.
An extended use might be to read the flash data via stringstream directly out of the flash memory.
Example, which unfortunately copies the data instead of working in place:
const char* flash_adr = reinterpret_cast<const char*>(0x00300000);
size_t length = 3000;
std::string str(flash_adr, length); // copies 3000 bytes into the string
Any ideas will be appreciated!
If you are willing to go with compiler and library specific implementations, here is an example that works in MSVC 2013.
#include <iostream>
#include <string>
int main() {
std::string str("A std::string with a larger length than yours");
char *flash_adr = "Your source from the flash";
char *reset_adr = str._Bx._Ptr; // Keep the old address around
// Change the inner buffer
(char*)str._Bx._Ptr = flash_adr;
std::cout << str << std::endl;
// Reset the pointer or the program will crash
(char*)str._Bx._Ptr = reset_adr;
return 0;
}
It will print Your source from the flash.
The idea is to reserve a std::string capable of fitting the strings in your flash and keep on changing its inner buffer pointer.
You need to customize this for your compiler and, as always, be very, very careful.
I have now used string_span, described in the C++ Core Guidelines (https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md). The GSL provides a complete implementation (GSL: Guidelines Support Library https://github.com/Microsoft/GSL).
If you know the address of your string inside flash memory you can just use the address directly with the following constructor to create a string_span.
constexpr basic_string_span(pointer ptr, size_type length) noexcept
: span_(ptr, length)
{}
As Captain Obvlious (https://stackoverflow.com/users/845568/captain-obvlious) pointed out in my favourite comment, std::string_view might have done the same job.
I am quite happy with the solution. It performs well and also reads well.

Read in data of a dynamic size into char*?

I was wondering how the following code works.
#include <iostream>
using namespace std;
int main()
{
char* buffer = new char(NULL);
while(true)
{
cin >> buffer;
cout << buffer;
cout << endl;
}
return 0;
}
I can input any amount of text of any size and it will print it back out to me. How does this work? Is it dynamically allocating space for me?
Also, if I enter a space, it prints the next section of text on a new line.
This, however, is fixed by using gets(buffer); (which is unsafe).
Also, is this code 'legal'?
It's not safe at all. It's overwriting whatever memory happens to lie after the buffer, and then reading it back. The fact that this works is coincidental. It happens because your cin/cout operations don't say "oh, a pointer to one char, I should just write one char" but "oh, you have enough space allocated for me."
Improvement #1:
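char* buffer = new char[10000]; or simply char buffer[10000];
(Note the square brackets: new char(10000) allocates a single char.) Now you can write long-ish paragraphs without issue, as long as no single word read by cin >> buffer exceeds 9999 characters.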
Improvement #2:
std::string buffer;
To answer your question in the comment: C++ is all for letting you make big memory mistakes. As noted in a comment, this is because it's a "don't pay for what you don't need" language. Some people really need this level of optimization in their code, although you are probably not one of them.
However, it also gives you plenty of ways to do things without having to think about memory at all. I will say firmly: if you are using new and delete or char[] for any reason other than a familiar design pattern that requires them, or 3rd-party or C libraries that require them, there is a safer way to do it.
Some guidelines that will save you 80% of the time:
-Don't use char[]. Use string.
-Don't use pointers to pass or return arguments. Pass by reference, return by value.
-Don't use arrays (e.g. int[]). Use vectors. You still have to check your own bounds.
With just those three you'll be writing "pretty safe", non-C-like code.
This is what std::string is for:
std::string s;
while (true)
{
std::cin >> s;
std::cout << s << std::endl;
}
std::string WILL dynamically allocate space for you, so you don't have to worry about overwriting memory elsewhere.

Is it possible to use an std::string for read()?

Example:
std::string data;
read(fd, data, 42);
Normally, we have to use a char*, but is it possible to use a std::string directly? (I'd prefer not to create a char* to store the result.)
Well, you'll need to create a char* somehow, since that's what the
function requires. (BTW: you are talking about the Posix function
read, aren't you, and not std::istream::read?) The problem isn't
the char*, it's what the char* points to (which I suspect is what
you actually meant).
The simplest and usual solution here would be to use a local array:
char buffer[43];
ssize_t len = read(fd, buffer, 42);
if ( len < 0 ) {
// read error...
} else if ( len == 0 ) {
// eof...
} else {
std::string data(buffer, len);
}
If you want to capture directly into an std::string, however, this is
possible (although not necessarily a good idea):
std::string data;
data.resize( 42 );
ssize_t len = read( fd, &data[0], data.size() );
// error handling as above...
data.resize( len ); // If no error...
This avoids the copy, but quite frankly... The copy is insignificant
compared to the time necessary for the actual read and for the
allocation of the memory in the string. This also has the (probably
negligible) disadvantage of the resulting string having an actual buffer
of 42 bytes (rounded up to whatever), rather than just the minimum
necessary for the characters actually read.
(And since people sometimes raise the issue, with regards to the
contiguity of the memory in std::string: this was an issue ten or more
years ago. The original specifications for std::string were designed
expressly to allow non-contiguous implementations, along the lines of
the then popular rope class. In practice, no implementor found this
to be useful, and people did start assuming contiguity. At which point,
the standards committee decided to align the standard with existing
practice, and require contiguity. So... no implementation has ever not
been contiguous, and no future implementation will forego contiguity,
given the requirements in C++11.)
No, you cannot, and you should not. std::string implementations usually store other information internally, such as the size of the allocated memory and the length of the actual string. The C++ standard states that modifying the characters through the pointer returned by c_str() (or, before C++17, data()) results in undefined behaviour.
If the read function requires a char *, then no. You could use the address of the first element of a std::vector of char, as long as it has been resized first. I don't think old (pre-C++11) strings are guaranteed to have contiguous memory; otherwise you could do something similar with the string.
No, but
std::string data;
cin >> data;
works just fine. If you really want the behaviour of read(2), then you need to allocate and manage your own buffer of chars.
Because read() is intended for raw data input, std::string is actually a bad choice, since std::string is meant for text. std::vector<char> seems like the right choice for raw data.
std::getline from the strings library (see cplusplus.com) can read from a stream and write directly into a string object. Example (again taken from cplusplus.com, the first Google hit for getline):
#include <iostream>
#include <string>
using namespace std;
int main () {
string str;
cout << "Please enter full name: ";
getline (cin,str);
cout << "Thank you, " << str << ".\n";
}
So this works both when reading from stdin (cin) and from a file (ifstream).