What is wrong with my implementation of overflow()? - c++

I am trying to implement a stream buffer and I'm having trouble with making overflow() work. I resize the buffer by 10 more characters and reset the buffer using setp. Then I increment the pointer back where we left off. For some reason the output is not right:
template <class charT, class traits = std::char_traits<charT>>
class stringbuf : public std::basic_stringbuf<charT, traits>
{
public:
using char_type = charT;
using traits_type = traits;
using int_type = typename traits::int_type;
public:
stringbuf()
: buffer(10, 0)
{
this->setp(&buffer.front(), &buffer.back());
}
int_type overflow(int_type c = traits::eof())
{
if (traits::eq_int_type(c, traits::eof()))
return traits::not_eof(c);
std::ptrdiff_t diff = this->pptr() - this->pbase();
buffer.resize(buffer.size() + 10);
this->setp(&buffer.front(), &buffer.back());
this->pbump(diff);
return traits::not_eof(traits::to_int_type(*this->pptr()));
}
// ...
std::basic_string<charT> str()
{
return buffer;
}
private:
std::basic_string<charT> buffer;
};
int main()
{
stringbuf<char> buf;
std::ostream os(&buf);
os << "hello world how are you?";
std::cout << buf.str();
}
When I print the string it comes out as:
hello worl how are ou?
It's missing the d and the y. What did I do wrong?

The first thing to note is that you are deriving from std::basic_stringbuf<char> for whatever reason without overriding all of the relevant virtual functions. For example, you don't override xsputn() or sync(): whatever these functions end up doing, you'll inherit. I'd strongly recommend deriving your stream buffer from std::basic_streambuf<char> instead!
The overflow() method announces a buffer which is one character smaller than the string to the stream buffer: &buffer.back() isn't a pointer to the end of the array but to the last character in the string. Personally, I would use
this->setp(&this->buffer.front(), &this->buffer.front() + this->buffer.size());
There is no problem so far. However, after making space for more characters you omitted adding the overflowing character, i.e., the argument passed to overflow() to the buffer:
this->pbump(diff);
*this->pptr() = traits::to_char_type(c);
this->pbump(1);
There are a few more little things which are not quite right:
It is generally a bad idea to give overriding virtual functions a default argument. The base class function already provides the default, and the new default is only picked up when the function is called explicitly on the derived type.
The string returned may contain a number of null characters at the end because the held string is actually bigger than the sequence which was written so far, unless the buffer is exactly full. You should probably implement the str() function differently:
std::basic_string<charT> str() const
{
return this->buffer.substr(0, this->pptr() - this->pbase());
}
Growing the string by a constant value is a major performance problem: the cost of writing n characters is n * n. For larger n (they don't really need to become huge) this will cause problems. You are much better off growing your buffer exponentially, e.g., doubling it every time or growing by a factor of 1.5 if you feel doubling isn't a good idea.
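Putting these points together, here is a minimal sketch of how the buffer could look. It assumes deriving from std::basic_streambuf, a half-open put area, storing the overflowing character, and doubling the storage on each overflow. The names growing_buf, reset_put_area and demo_growing_buf are made up for this illustration and are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical name: an output-only buffer that grows by doubling.
template <class charT, class traits = std::char_traits<charT>>
class growing_buf : public std::basic_streambuf<charT, traits> {
public:
    using int_type = typename traits::int_type;

    growing_buf() : buffer(10, charT()) { reset_put_area(0); }

    // Only the characters written so far, not the unused tail of the storage.
    std::basic_string<charT, traits> str() const {
        return buffer.substr(0, static_cast<std::size_t>(this->pptr() - this->pbase()));
    }

protected:
    // No default argument here: the base class already declares one.
    int_type overflow(int_type c) override {
        if (traits::eq_int_type(c, traits::eof()))
            return traits::not_eof(c);
        const std::size_t used = static_cast<std::size_t>(this->pptr() - this->pbase());
        buffer.resize(buffer.size() * 2);          // exponential growth
        reset_put_area(used);
        *this->pptr() = traits::to_char_type(c);   // store the overflowing character
        this->pbump(1);
        return c;
    }

private:
    // Announce [first, first + size) as the put area and skip what is already used.
    void reset_put_area(std::size_t used) {
        charT* first = &buffer[0];
        this->setp(first, first + buffer.size());
        this->pbump(static_cast<int>(used));
    }

    std::basic_string<charT, traits> buffer;
};

// Usage sketch: writes the text from the question and returns what was stored.
inline std::string demo_growing_buf() {
    growing_buf<char> buf;
    std::ostream os(&buf);
    os << "hello world how are you?";
    return buf.str();
}
```

With the end pointer one past the last element and the overflowing character stored, the 'd' and the 'y' from the question are no longer dropped.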

Related

writing pointer contents without copy

I'm trying to print N characters pointed to by a pointer; there is no terminating character. Let's just say I have something like this (hopefully my ASCII artwork is OK here; I want to write the chars/string "bcd" to a file or stdout):
char* ptr ----> 'a' , 'b' , 'c' , 'd' , 'e' , 'f'
^ ^
| |
begin end
Now I have no terminating character there. I have a pointer to the beginning and to the end of the chars I want to write to stdout (or, say, a log file). Say performance is really important and I want to avoid the overhead of copy-constructing a std::string (using the begin/end pointers).
What's the fastest way to accomplish this, can anybody tell me? I've googled around but can't see anything. I could iterate over begin->end and print/write each char one at a time, but I'd like to get something faster/ready-made. This is a theoretical question (for my own benefit) but I'd like to know how this is done in high performance applications (think FIX message strings in low latency applications).
Thanks much
Graham
If you would like to do custom buffering, I could suggest something like this:
#include <cstddef>
#include <fstream>
#include <iostream>
#include <streambuf>
#include <string>
class buffered_stream_buf : public std::streambuf {
public:
buffered_stream_buf(std::ostream* stream)
: _size(), _stream(stream), _out(_stream->rdbuf()) {
_stream->rdbuf(this);
}
~buffered_stream_buf() {
_stream->flush();
_stream->rdbuf(_out);
}
int sync() override {
if (_size) {
_out->sputn(_buffer, _size);
_size = 0;
}
return _out->pubsync();
}
int overflow(int c) override {
if (c == std::streambuf::traits_type::eof()) {
// Nothing to store; report success with a value other than eof().
return std::streambuf::traits_type::not_eof(c);
}
_buffer[_size] = static_cast<char>(c);
++_size;
if (_size == sizeof(_buffer) && sync() != 0)
return std::streambuf::traits_type::eof();
return c;
}
private:
char _buffer[8 * 1024];
size_t _size;
std::ostream* _stream;
std::streambuf* _out;
};
int main() {
// Unbuffering `cout` might be a good idea:
// to avoid double copying
std::cout.setf(std::ios::unitbuf);
buffered_stream_buf mycoutbuf(&std::cout);
std::ofstream f("testmybuffer.txt", std::ios_base::out);
buffered_stream_buf myfbuf(&f);
std::cout << "Hello";
f << "Hello";
std::string my_long_string("long_long_long");
auto b = my_long_string.begin() + 3;
auto e = my_long_string.begin() + 5;
for (; b != e; ++b) {
std::cout << *b;
f << *b;
}
return 0;
}
The question is: will this improve performance? I am not sure; it can even make performance worse. Why? cout and fstream are usually already buffered, with a buffer size that is probably well chosen for your machine. That means that before data is sent to OS objects (files, pipes, etc.), the C++ standard library implementation may already buffer it (though the standard does not require this). Adding another layer of buffering might be unnecessary and hurt performance (you first copy the data to your buffer, then later copy it again into the standard library's buffer). Second, OS objects such as files and pipes are already mapped to memory, i.e. buffered. However, with this language-level buffering you make fewer system calls, which can be expensive. To find out whether your own buffering helps, you should benchmark it. If you do not have time for that, I recommend leaving it out of your scope and relying on the standard library and the OS, which are usually quite good at this.
You can use std::basic_ostream::write:
std::cout.write(begin, end - begin);
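As a small self-contained illustration of the write() call above: the helper names write_range and demo_write_range are made up here, and a std::ostringstream stands in for std::cout or a log file so the result can be inspected.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Writes the characters in [begin, end) to `out` without building a std::string.
inline std::ostream& write_range(std::ostream& out, const char* begin, const char* end) {
    return out.write(begin, static_cast<std::streamsize>(end - begin));
}

// Usage sketch: the array has no terminating character, as in the question.
inline std::string demo_write_range() {
    const char data[] = {'a', 'b', 'c', 'd', 'e', 'f'};
    std::ostringstream sink;                 // stands in for std::cout or a log file
    write_range(sink, data + 1, data + 4);   // begin -> 'b', end -> one past 'd'
    return sink.str();
}
```

write() is an unformatted output function, so it hands the whole range to the stream buffer in one call rather than formatting character by character.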

Implementing a String class with implicit conversion to char* (C++)

It might not be advisable according to what I have read at a couple of places (and that's probably the reason std::string doesn't do it already), but in a controlled environment and with careful usage, I think it might be ok to write a string class which can be implicitly converted to a proper writable char buffer when needed by third party library methods (which take only char* as an argument), and still behave like a modern string having methods like Find(), Split(), SubString() etc. While I can try to implement the usual other string manipulation methods later, I first wanted to ask about the efficient and safe way to do this main task. Currently, we have to allocate a char array of roughly the maximum size of the char* output that is expected from the third party method, pass it there, then convert the return char* to a std::string to be able to use the convenient methods it allows, then again pass its (const char*) result to another method using string.c_str(). This is both lengthy and makes the code look a little messy.
Here is my very initial implementation so far:
MyString.h
#pragma once
#include<string>
using namespace std;
class MyString
{
private:
bool mBufferInitialized;
size_t mAllocSize;
string mString;
char *mBuffer;
public:
MyString(size_t size);
MyString(const char* cstr);
MyString();
~MyString();
operator char*() { return GetBuffer(); }
operator const char*() { return GetAsConstChar(); }
const char* GetAsConstChar() { InvalidateBuffer(); return mString.c_str(); }
private:
char* GetBuffer();
void InvalidateBuffer();
};
MyString.cpp
#include "MyString.h"
MyString::MyString(size_t size)
:mAllocSize(size)
,mBufferInitialized(false)
,mBuffer(nullptr)
{
mString.reserve(size);
}
MyString::MyString(const char * cstr)
:MyString()
{
mString.assign(cstr);
}
MyString::MyString()
:MyString((size_t)1024)
{
}
MyString::~MyString()
{
if (mBufferInitialized)
delete[] mBuffer;
}
char * MyString::GetBuffer()
{
if (!mBufferInitialized)
{
mBuffer = new char[mAllocSize]{ '\0' };
mBufferInitialized = true;
}
if (mString.length() > 0)
memcpy(mBuffer, mString.c_str(), mString.length());
return mBuffer;
}
void MyString::InvalidateBuffer()
{
if (mBufferInitialized && mBuffer && strlen(mBuffer) > 0)
{
mString.assign(mBuffer);
mBuffer[0] = '\0';
}
}
Sample usage (main.cpp)
#include "MyString.h"
#include <iostream>
void testSetChars(char * name)
{
if (!name)
return;
//This length is not known to us, but the maximum
//return length is known for each function.
char str[] = "random random name";
strcpy_s(name, strlen(str) + 1, str);
}
int main(int, char*)
{
MyString cs("test initializer");
cout << cs.GetAsConstChar() << '\n';
testSetChars(cs);
cout << cs.GetAsConstChar() << '\n';
getchar();
return 0;
}
Now, I plan to call the InvalidateBuffer() in almost all the methods before doing anything else. Now some of my questions are :
Is there a better way to do it in terms of memory/performance and/or safety, especially in C++ 11 (apart from the usual move constructor/assignment operators which I plan to add to it soon)?
I had initially implemented the 'buffer' using a std::vector of chars, which was easier to implement and more C++ like, but was concerned about performance. So the GetBuffer() method would just return the beginning pointer of the resized vector of chars. Do you think there are any major pros/cons of using a vector instead of char* here?
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
Is it too much overkill rather than just doing the usual way of passing char array pointer, converting to a std::string and doing our work with it. The third party function calls expecting char* arguments are used heavily in the code and I plan to completely replace both char* and std::string with this new string if it works.
Thank you for your patience and help!
If I understood you correctly, you want this to work:
mystring foo;
c_function(foo);
// use the filled foo
with a c_function like ...
void c_function(char * dest) {
strcpy(dest, "FOOOOO");
}
Instead, I propose this:
template<std::size_t max>
struct string_filler {
char data[max+1];
std::string & destination;
string_filler(std::string & d) : destination(d) {
data[0] = '\0'; // paranoia
}
~string_filler() {
destination = data;
}
operator char *() {
return data;
}
};
and using it like:
std::string foo;
c_function(string_filler<80>{foo});
This way you provide a "normal" buffer to the C function with a maximum that you specify (which you should know either way ... otherwise calling the function would be unsafe). On destruction of the temporary (which, according to the standard, must happen after that expression with the function call) the string is copied (using std::string assignment operator) into a buffer managed by the std::string.
Addressing your questions:
Do you think there are any major pros/cons of using a vector instead of char* here?
Yes: Using a vector frees you from manual memory management. This is a huge pro.
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
A union is a bad idea. How do you know which member is currently active? You need a flag outside of the union. Do you really want every string to carry that around? Instead look what the standard library is doing: It's using templates to provide this abstraction.
Is it too much overkill [..]
Writing a string class? Yes, way too much.
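To illustrate the template suggestion for wide char support, here is a minimal sketch that follows the pattern std::basic_string itself uses. The names basic_text, text and wtext are hypothetical, and the class only wraps a std::basic_string to show the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One class template parameterized on the character type,
// instead of a union of {char, string} and {wchar_t, wstring}.
template <class CharT>
class basic_text {
public:
    explicit basic_text(const CharT* s) : value_(s) {}
    std::size_t length() const { return value_.size(); }
    const CharT* c_str() const { return value_.c_str(); }

private:
    std::basic_string<CharT> value_;
};

// The character type is fixed at compile time, so no runtime flag is needed.
using text = basic_text<char>;
using wtext = basic_text<wchar_t>;
```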
What you want to do already exists. For example with this plain old C function:
/**
* Write n characters into buffer.
* n cannot be more than size
* Return number of written characters
*/
ssize_t fillString(char * buffer, ssize_t size);
Since C++11:
std::string str;
// Resize string to be sure to have memory
str.resize(80);
auto newSize = fillString(&str[0], str.size());
str.resize(newSize);
or without first resizing:
std::string str;
if (!str.empty()) // To avoid UB
{
auto newSize = fillString(&str[0], str.size());
str.resize(newSize);
}
But before C++11, std::string isn't guaranteed to be stored in a single chunk of contiguous memory, so you have to go through a std::vector<char> first:
std::vector<char> v;
// Resize the vector to be sure to have memory
v.resize(80);
ssize_t newSize = fillString(&v[0], v.size());
std::string str(v.begin(), v.begin() + newSize);
You can use it easily with something like Daniel's proposition.
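The resize, call, resize-again pattern can be wrapped in a small helper. This is only a sketch under assumptions: fill_chars is a made-up stand-in for the third-party C function (it uses std::size_t rather than ssize_t to stay portable), and fill_string is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a third-party C function: writes at most `size`
// characters into `buffer` (no terminator) and returns how many it wrote.
inline std::size_t fill_chars(char* buffer, std::size_t size) {
    static const char text[] = "random random name";
    const std::size_t n = size < sizeof(text) - 1 ? size : sizeof(text) - 1;
    std::memcpy(buffer, text, n);
    return n;
}

// Pre-sizes a std::string to `max_size`, lets `fill(char*, std::size_t)` write
// into it directly (contiguous storage is guaranteed since C++11), then trims
// the string to the number of characters actually written.
template <class Fill>
std::string fill_string(std::size_t max_size, Fill fill) {
    std::string result(max_size, '\0');
    if (max_size == 0)
        return result;                       // nothing to write into
    const std::size_t written = fill(&result[0], result.size());
    result.resize(written < max_size ? written : max_size);
    return result;
}
```

A call such as fill_string(80, fill_chars) keeps the maximum size visible at the call site, which is the same information the string_filler<80> approach needs.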

Adapting C++ std/boost iostreams to provide cyclic write to memory block

Background:
I'm trying to optimize a logging system so that it uses memory-mapped files. I need to provide an std::ostream-like interface so that the logging system can write to that memory.
I have identified std::strstream (which is deprecated, though) and boost::iostreams::basic_array_sink as candidates that could fit my needs.
Now I want to have the logging cyclic, meaning that when the output pointer is near the end of the memory block it should start over at the beginning again.
My question is where would be the best point to start in order to implement this specific behaviour.
I'm rather overwhelmed by the std::iostreams class hierarchy and don't grasp all the internal workings as for now.
I'm uncertain whether I should/need to derive from ostream, streambuf, or both.
Are these made for being derived from, anyway?
Or, using boost::iostreams, would I need to write my own Sink?
EDIT:
The following attempt compiles and produces the expected output:
class rollingstreambuf : public std::basic_streambuf<TCHAR>
{
public:
typedef std::basic_streambuf<TCHAR> Base;
rollingstreambuf(Base::char_type* baseptr, size_t size)
{
setp(baseptr, baseptr + size);
}
protected:
virtual int_type overflow (int_type c)
{
// reset position to start of buffer
setp(pbase(), epptr());
return putchar(c);
}
virtual std::streamsize xsputn (const char* s, std::streamsize n)
{
if (n >= epptr() - pptr())
// reset position to start of buffer
setp(pbase(), epptr());
return Base::xsputn(s, n);
}
};
char buffer[100];
rollingstreambuf buf(buffer, sizeof(buffer));
std::basic_ostream<TCHAR> out(&buf);
for (int i=0; i<10; i++)
{
out << "Mumblemumble " << i << '\n';
}
out << std::ends; //write terminating NULL char
Printing the buffer gives:
Mumblemumble 6
Mumblemumble 7
Mumblemumble 8
Mumblemumble 9
(which confirms the roll-over has taken place)
What it does is that it makes the streambuf use the provided buffer as a cyclic output buffer (put area), without ever advancing the buffer window in the output sequence (stream).
(Using terminology from http://en.cppreference.com/w/cpp/io/basic_streambuf)
Now I feel very uncertain about the robustness and quality of this implementation. Please review and comment on it.
This is a valid approach. overflow() should return:
traits::eof() or throws an exception if the function fails. Otherwise, returns some value other than traits::eof() to indicate success.
E.g.:
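virtual int_type overflow (int_type c)
{
// reset position to start of buffer
setp(pbase(), epptr());
if (traits_type::eq_int_type(c, traits_type::eof()))
return traits_type::not_eof(c);
// store the overflowing character at the start of the rewound buffer
*pptr() = traits_type::to_char_type(c);
pbump(1);
return c;
}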
xsputn() should probably write the beginning of the sequence to the end of the buffer, then rewind and write the remaining sequence to the front of the buffer. You could probably get away with the default implementation of xsputn() that calls sputc(c) for each character and then overflow() when the buffer is full.
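The wrap-around xsputn() described above could be sketched roughly as follows. This uses plain char instead of TCHAR, and ring_streambuf and demo_ring are made-up names; treat it as one possible shape, not a reviewed implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical name: treats a caller-provided block as a cyclic put area.
class ring_streambuf : public std::streambuf {
public:
    ring_streambuf(char* base, std::size_t size) { setp(base, base + size); }

protected:
    int_type overflow(int_type c) override {
        if (pbase() == epptr())
            return traits_type::eof();       // zero-sized block: nothing can be stored
        setp(pbase(), epptr());              // rewind pptr() to the start
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        *pptr() = traits_type::to_char_type(c);
        pbump(1);
        return c;
    }

    std::streamsize xsputn(const char* s, std::streamsize n) override {
        if (pbase() == epptr())
            return 0;
        std::streamsize written = 0;
        while (written < n) {
            if (pptr() == epptr())
                setp(pbase(), epptr());      // wrap around to the front
            const std::streamsize room = epptr() - pptr();
            const std::streamsize chunk = std::min(room, n - written);
            std::copy(s + written, s + written + chunk, pptr());
            pbump(static_cast<int>(chunk));
            written += chunk;
        }
        return n;
    }
};

// Usage sketch: 10 characters into an 8-character block.
inline std::string demo_ring() {
    char storage[8];
    std::fill(storage, storage + 8, '.');
    ring_streambuf buf(storage, sizeof(storage));
    std::ostream out(&buf);
    out << "abcdefghij";
    return std::string(storage, sizeof(storage));
}
```

Unlike rewinding whenever the next write does not fit, this fills the tail of the block first, so no space at the end is left holding stale data.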

Easiest way to get a line when data is coming in chunks?

I am using win32's ReadFile to read from a child process's pipe. That gives me chunks of characters at a time and the size of each chunk, but they may or may not have new lines. I want to process the output line by line. What is the simplest way of doing so?
I thought about appending each chunk to a string, and at the end, using a stringstream to process it line by line, but I'd like to do this as data is coming in. I suppose the trick is, how and where should I detect the new line ending? If only streamreader's getline returned nothing when no delimiter is found...
Append to a string until you encounter newline char or end of data. Then you have a line in the string, process it, empty string and repeat. Reason for using the string: that you don't know how long a line may be, and a string does the reallocation etc. for you as necessary.
Special case: end of data with nothing earlier on that line should probably be not-a-line, ignored.
Cheers & hth.
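The append-until-newline approach can be sketched as a small class that is fed each chunk as it arrives. The names line_assembler, feed and finish are made up for this illustration; it splits on '\n' only and does not strip a preceding '\r'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class line_assembler {
public:
    // Appends one chunk and returns every line it completes (without the '\n').
    std::vector<std::string> feed(const char* data, std::size_t size) {
        std::vector<std::string> lines;
        pending_.append(data, size);
        std::size_t start = 0;
        for (std::size_t nl = pending_.find('\n', start);
             nl != std::string::npos;
             nl = pending_.find('\n', start)) {
            lines.push_back(pending_.substr(start, nl - start));
            start = nl + 1;
        }
        pending_.erase(0, start);            // keep only the unfinished tail
        return lines;
    }

    // Call once at end of data. Returns false if nothing is pending, so an
    // empty remainder is treated as not-a-line.
    bool finish(std::string& last) {
        if (pending_.empty())
            return false;
        last.swap(pending_);
        pending_.clear();
        return true;
    }

private:
    std::string pending_;
};
```

Each ReadFile result would be passed to feed() with the byte count it reported, and finish() would be called once the pipe is closed.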
What I can think of is using a buffer to store the chunks as they arrive. You know the size of each chunk via lpNumberOfBytesRead, so on each chunk you check whether it contains newline characters; if it does, you output all the buffered characters up to the newline, then keep buffering until you receive another chunk with newline characters, and so on.
Some pseudo code might look like:
wchar_t buffer[BUFFER_SIZE]; // enough for a few chunks (use new to allocate dynamic memory)
ReadFile(hFile, buffer, nNumberOfBytesToRead, lpNumberOfBytesRead, lpOverlapped);
if (buffer has new_line_character) {
process_whole_line(); // don't forget to clear up the buffer
}
You can extend std::basic_streambuf and implement xsputn so it stores data in an internal buffer, and detect newlines or do whatever processing you need. If you want to process only complete lines, upon detection you can push the buffer to a std::queue up to the newline, and erase the respective part from the buffer. You'll need to implement overflow too. For example:
template<typename CharT, typename Traits = std::char_traits<CharT> >
class line_streambuf : public std::basic_streambuf<CharT, Traits> {
public:
typedef CharT char_type;
typedef Traits traits_type;
typedef std::basic_string<char_type> string_type;
typedef typename string_type::size_type size_type;
typedef typename traits_type::int_type int_type;
line_streambuf(char_type separator = '\n') : _separator(separator) {}
virtual ~line_streambuf() {}
bool getline(string_type& output) {
if (_processed.empty())
return false; // no complete line available yet
output = _processed.front();
_processed.pop();
return true;
}
protected:
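virtual int_type overflow(int_type v) {
if (traits_type::eq_int_type(v, traits_type::eof()))
return traits_type::not_eof(v); // nothing to store
const char_type c = traits_type::to_char_type(v);
if (traits_type::eq(c, _separator)) {
_processed.push(_buffer);
_buffer.clear();
} else {
_buffer += c;
}
return v;
}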
virtual std::streamsize xsputn(const char_type* p, std::streamsize n) {
_buffer.append(p, p + n);
while (true) {
// The 1st find could be smarter - finding only after old_length+p
size_type pos = _buffer.find(_separator);
if (pos == string_type::npos)
break;
_processed.push(string_type(_buffer.begin(), _buffer.begin()+pos));
_buffer.erase(_buffer.begin(), _buffer.begin() + pos + 1);
}
return n;
}
private:
char_type _separator;
string_type _buffer;
std::queue<string_type> _processed;
};
NOTE: highly untested code. Let me know if you find a problem, or feel free to edit.

l-value substr method in C++

I want to create a substr method in C++ in a string class that I made.
The string class is based on C-style string of course, and I take care of the memory management.
I want to write a substr(start, length) function that can work in the regular way:
CustomString mystring = "Hello";
cout << mystring.substr(0,2); // will print "He"
And also in this way:
mystring.substr(1,3) = "DD"; // mystring will be "HDDo"
Notice that even though I get a 3 chars long sub-string, I put in the assignment only 2 chars and the output string will be HDDo, still.
Any idea how to get this done?
Thanks!
To support that, you'll probably have to write your substr() to return a proxy object that keeps track of what part of the original string is being referred to. The proxy object will overload operator=, and in it will replace the referred-to substring with the newly assigned one.
Edit in response to comments: the idea of a proxy is that it's similar enough to the class for which it's a proxy that returning a proxy is still a closed operation -- i.e. from the user's viewpoint, all that's visible is the original type of object, but it has capabilities that wouldn't be possible (or would be much more difficult to implement) without the proxy. In this case, the proxy class would be private to the string class, so the user could never create an instance of the proxy class except as a temporary. That temporary can be used to modify its parent string if you assign to it. Using the proxy in any other way just yields a string.
As to what this buys you over attempting to do it all inside the original string: each proxy object is a temporary object -- the compiler can/will/does keep track of how to create temporaries as needed, destroys them properly at the end of a full expression, etc. The compiler also keeps track of what substring a particular assignment refers to, automatically converts one to a string when we try to use its value, and so on. Simply put, the compiler handles nearly all the hard work involved.
Here's some working code. The surrounding string class is pretty minimal (e.g. it has no searching capability). I'd expect to add a fair amount to a useful version of the string class. The proxy class, however, is complete -- I wouldn't expect to see it change much (if at all) in a feature-complete version of the string class.
#include <cstring> // for strlen
#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>
class string {
std::vector<char> data;
public:
string(char const *init) {
data.clear();
data.assign(init, init+strlen(init));
}
string(string const &s, size_t pos, size_t len) {
data.assign(s.data.begin()+pos, s.data.begin()+pos+len);
}
friend class proxy;
class proxy {
string &parent;
size_t pos;
size_t length;
public:
proxy(string &s, size_t start, size_t len) : parent(s), pos(start), length(len) {}
operator string() { return string(parent, pos, length); }
proxy &operator=(string const &val) {
parent.data.erase(parent.data.begin()+pos, parent.data.begin()+pos+length);
parent.data.insert(parent.data.begin()+pos, val.data.begin(), val.data.end());
return *this;
}
};
proxy substr(size_t start, size_t len) {
return proxy(*this, start, len);
}
friend std::ostream &operator<<(std::ostream &os, string const &s) {
std::copy(s.data.begin(), s.data.end(), std::ostream_iterator<char>(os));
return os;
}
};
#ifdef TEST
int main() {
string x("Hello");
std::cout << x << std::endl;
std::cout << x.substr(2, 3) << std::endl;
x.substr(2, 3) = "DD";
std::cout << x << std::endl;
return 0;
}
#endif
Edit 2:
As far as substrings of substrings go, it depends. The one situation that's not currently covered is if you want to assign to a substring of a substring, and have it affect the original string. If you want something like x=y.substr(1,4).substr(1,2); it'll work fine as-is. The first proxy will be converted to a string, and the second substr will be invoked on that string.
If you want: x.substr(1,4).substr(1,2) = "whatever"; it won't currently work. I'm not sure it accomplishes much, but on the assumption that it does, the addition to support it is fairly minimal -- you'd add a substr member to proxy:
proxy substr(size_t start, size_t len) {
return proxy(parent, pos+start, len);
}
Presumably you want substr to return a string, rather than some other proxy class. You'd therefore need to make your string class capable of holding a pointer to its own copy of the string data and also a pointer to another string object that it was created from (as the return value of substr), along with information about which part of the string it was created from.
This might get quite complicated when you call substr on a string returned from another call to substr.
The complexity is probably not worth the attractiveness of the interface.
The first requirement is simple; look up operator's standard implementation.
Loosely, c_string& substr(int, int)
The second part, not so much, I don't think. It'll look similar, I believe. However, I'll think about it and get back to you over the weekend.