Taking ownership of streambuf/stringbuf data

Taking ownership of streambuf/stringbuf data - c++

I'd like an interface for writing to an automatically resizing array. One way to do this is with a generic std::ostream *.
Then consider if ostringstream is the target:
void WritePNG(ostream *out, const uint8_t *pixels);
void *WritePNGToMemory(uint8_t *pixels)
{
ostringstream out;
WritePng(&out, pixels);
uint8_t *copy = new uint8_t[out.tellp()];
memcpy(copy, out.str().c_str(), out.tellp()];
return copy;
}
But I want to avoid the memcpy(). Is there a way to take ownership of the array in the underlying stringbuf class and return that?
I get the feeling this can't be done using standard library, since the stream buffer might not even be a contiguous array.

If you're willing to use the old, deprecated <strstream> interface, this is fairly easy - just create a std::strstreambuf pointing at your storage, and it will work by magic. std::ostrstream even has a constructor to do this for you:
#include <iostream>
#include <strstream>
int main()
{
char copy[32] = "";
std::ostrstream(copy, sizeof copy) << "Hello, world!"
<< std::ends << std::flush;
std::cout << copy << '\n';
}
With the more modern <sstream> interface, you need to access the string stream's buffer, and call pubsetbuf() to make it to point at your storage:
#include <iostream>
#include <sstream>
int main()
{
char copy[32] = "";
{
std::ostringstream out{};
out.rdbuf()->pubsetbuf(copy, sizeof copy);
out << "Hello, world!" << std::ends << std::flush;
}
std::cout << copy << '\n';
}
Obviously, in both cases, you'll need a way to know in advance how much memory to allocate for copy, because you can't wait until tellp() is ready for you...

Here's the solution I ended up using. The idea is the same as the one proposed by HostileFork - only needing to implement overflow(). But as already hinted, it has much better throughput by buffering. It also optionally supports random access (seekp(), tellp()).
class MemoryOutputStreamBuffer : public streambuf
{
public:
MemoryOutputStreamBuffer(vector<uint8_t> &b) : buffer(b)
{
}
int_type overflow(int_type c)
{
size_t size = this->size(); // can be > oldCapacity due to seeking past end
size_t oldCapacity = buffer.size();
size_t newCapacity = max(oldCapacity + 100, size * 2);
buffer.resize(newCapacity);
char *b = (char *)&buffer[0];
setp(b, &b[newCapacity]);
pbump(size);
if (c != EOF)
{
buffer[size] = c;
pbump(1);
}
return c;
}
#ifdef ALLOW_MEM_OUT_STREAM_RANDOM_ACCESS
streampos MemoryOutputStreamBuffer::seekpos(streampos pos,
ios_base::openmode which)
{
setp(pbase(), epptr());
pbump(pos);
// GCC's streambuf doesn't allow put pointer to go out of bounds or else xsputn() will have integer overflow
// Microsoft's does allow out of bounds, so manually calling overflow() isn't needed
if (pptr() > epptr())
overflow(EOF);
return pos;
}
// redundant, but necessary for tellp() to work
// https://stackoverflow.com/questions/29132458/why-does-the-standard-have-both-seekpos-and-seekoff
streampos MemoryOutputStreamBuffer::seekoff(streamoff offset,
ios_base::seekdir way,
ios_base::openmode which)
{
streampos pos;
switch (way)
{
case ios_base::beg:
pos = offset;
break;
case ios_base::cur:
pos = (pptr() - pbase()) + offset;
break;
case ios_base::end:
pos = (epptr() - pbase()) + offset;
break;
}
return seekpos(pos, which);
}
#endif
size_t size()
{
return pptr() - pbase();
}
private:
std::vector<uint8_t> &buffer;
};
They say a good programmer is a lazy one, so here's an alternate implementation I came up with that needs even less custom code. However, there's a risk for memory leaks because it hijacks the buffer inside MyStringBuffer, but doesn't free MyStringBuffer. In practice, it doesn't leak for GCC's streambuf, which I confirmed using AddressSanitizer.
class MyStringBuffer : public stringbuf
{
public:
uint8_t &operator[](size_t index)
{
uint8_t *b = (uint8_t *)pbase();
return b[index];
}
size_t size()
{
return pptr() - pbase();
}
};
// caller is responsible for freeing out
void Test(uint8_t *&_out, size_t &size)
{
uint8_t dummy[sizeof(MyStringBuffer)];
new (dummy) MyStringBuffer; // construct MyStringBuffer using existing memory
MyStringBuffer &buf = *(MyStringBuffer *)dummy;
ostream out(&buf);
out << "hello world";
_out = &buf[0];
size = buf.size();
}

IIRC the whole reason stringstream exists (vs strstream) was to sort out the fuzzy questions of memory ownership that would come up by giving direct buffer access. e.g. I think that change was to specifically prevent what you are asking to do.
One way or another I think you'd have to do it yourself, by overriding the stream buffer. To answer a similar question I suggested something for input streams that wound up getting quite a few upvotes. But honestly I didn't know what I was talking about then, nor now when I suggest the following:
Hacking up this link from the web for doing an "uppercasing stream buffer" to one that just echoes and gives you a reference to its buffer might give:
#include <iostream>
#include <streambuf>
class outbuf : public std::streambuf {
std::string data;
protected:
virtual int_type overflow (int_type c) {
if (c != EOF)
data.push_back(c);
return c;
}
public:
std::string& get_contents() { return data; }
};
int main() {
outbuf ob;
std::ostream out(&ob);
out << "some stuff";
std::string& data = ob.get_contents();
std::cout << data;
return 0;
}
I'm sure it's broken in all kinds of ways. But the uppercase-buffer-authors seemed to think that overriding the overflow() method alone would let them uppercase all output to the stream, so I guess one could argue that it's enough to see all output if writing to one's own buffer.
But even so, going one character at a time seems suboptimal...and who knows what overhead you get from inheriting from streambuf in the first place. Consult your nearest C++ iostream expert for what the actual right way is. But hopefully it's proof that something of the sort is possible.

Related

When should I use std::string / std::string_view for parameter / return type

Introduction
I'm writing some communication application. Before C++17 (without Boost), I use std::string and its const reference as cls1.
Since C++17, I introduced std::string_view to my code as cls2.
However, I don't have clear policy when should I use std::string_view. My communication application receives data from the network and it is stored to recv_buffer. And creates some application classes from recv_buffer.
Construction
If I focus only cls1's constructor, move construction is efficient. But I think that where the parameter s from. If it is originally from the recv_buffer, I can create std::string_view at the receiving (very early) point. And during recv_buffer's lifetime is enabled, use std::string_view everywhere. If I need to store the part of recv_buffer then create std::string.
An only exception I noticed is the recv_buffer is always contained complete data for my application class. In this case, move construction is efficient.
Getter
I think using the return type as std::string_view has advantage. Some member function such as substr() is efficient. But I don't see any disadvantage, so far.
Question
I suspect that I might see only pros of std::string_view. Before re-writing many codes, I would like to know your ideas.
PoC code
#include <string>
struct cls1 {
explicit cls1(std::string s):s_(std::move(s)) {}
std::string const& get() const { return s_; }
private:
std::string s_;
};
struct cls2 {
explicit cls2(std::string_view s):s_(s) {}
std::string_view get() const { return s_; }
private:
std::string s_;
};
#include <iostream>
int main() {
// If all of the receive buffer is the target
{
std::string recv_buffer = "ABC";
cls1 c1(std::move(recv_buffer)); // move construct
std::cout << c1.get().substr(1, 2) << std::endl; // create new string
}
{
std::string recv_buffer = "ABC";
cls2 c2(recv_buffer); // copy happend
std::cout << c2.get().substr(1, 2) << std::endl; // doesn't create new string
}
// If a part of the receive buffer is the target
{
std::string recv_buffer = "<<<ABC>>>";
cls1 c1(recv_buffer.substr(3, 3)); // copy happend and move construct
std::cout << c1.get().substr(1, 2) << std::endl; // create new string
}
{
std::string recv_buffer = "<<<ABC>>>";
std::string_view ref = recv_buffer;
cls2 c2(ref.substr(3, 3)); // string create from the part of buffer directly
std::cout << c2.get().substr(1, 2) << std::endl; // doesn't create new string
}
}
Running Demo: https://wandbox.org/permlink/TW8w3je3q3D46cjk

std::string_view is a way to get some std::string const member functions without creating a std::string if you have some char* or you want to reference subset of a string.
Consider it as a const reference. If the object it refers vanishes (or changes) for any reason, you have a problem. If your code can return a reference, you can return a string_view.
Example:
#include <cstdio>
#include <string>
#include <vector>
#include <string.h>
#include <iostream>
int main()
{
char* a = new char[10];
strcpy(a,"Hello");
std::string_view s(a);
std::cout << s; // OK
delete[] a;
std::cout << s; // whops. UD. If it was std::string, no problem, it would have been a copy
}
More info.
Edit: It doesn't have a c_str() member because this needs the creation of a \0 at the end of the substring which cannot be done without modification.

Don't return a string view when:
The caller needs a null terminated string. This is often the case when dealing with C API's.
You don't store the string itself somewhere. You do store the string in a member in this case.
Do realise, that the string view becomes invalidated by operations on the original string such as changing the capacity, as well as if the original string is destroyed. If the caller needs the string for a longer than the life time of the object that stores the string, then they can copy from the view into their own storage.

Extending std::cout

I want to extend the usage of std::cout to use my own console/cout wrapper class.
Ideally I would have 2 ostreams, one for regular printing and one that appends a new line.
std::ostream Write;
Write << "Hello, I am " << 99 << " years old.";
prints Hello, I am 99 years old.
std::ostream WriteLine;
WriteLine << "Hello, I am " << 99 << " years old.";
prints Hello, I am 99 years old.\n (an actual new line, not just it escaped)
I would then like to extend this to have error streams (Error and ErrorLine, for example) which prefix "ERROR: " before the message and prints in a different color.
I know I have to create my own streams to add in this functionality, and I followed C++ cout with prefix for prefixing std::cout which was almost what I wanted but not quite. I couldn't figure out how to add a new line to the end of the stream, and the prefix was unreliable, especially when I would do more than a single print statement.
I should also mention I don't want to use overloaded operators to achieve this effect, because I want to be able to daisy-chain things on.
What didn't work
If I did WriteLine >> "First"; then WriteLine << "Second"; I would get weird results like SecondFirst\n or Second\nFirst. My ideal output would be First\nSecond\n. I think it is due to not closing/flushing/resetting the stream properly, but nothing I tried got it to work reliably.
I could get it to work for a single statement, but as soon as I added another print statement, the things I tried to print would switch order, the post/pre fix wouldn't get added in the correct spot, or I would end up with garbage.
I don't care about wchars, because we will always only need a single byte for a single char. Also we will only be working on Windows 10.
This is what I have so far:
Console.h
#include <windows.h>
#include <iostream>
#include <sstream>
#include <string>
class Console {
using Writer = std::ostream;
Console() {}
static const char newline = '\n';
class error_stream: public std::streambuf {
public:
error_stream(std::streambuf* s) : sbuf(s) {}
~error_stream() { overflow('\0'); }
private:
typedef std::basic_string<char_type> string;
int_type overflow(int_type c) {
if(traits_type::eq_int_type(traits_type::eof(), c))
return traits_type::not_eof(c);
switch(c) {
case '\n':
case '\r':
{
SetColor(ConsoleColor::Red);
prefix = "ERROR: ";
buffer += c;
if(buffer.size() > 1)
sbuf->sputn(prefix.c_str(), prefix.size());
int_type rc = sbuf->sputn(buffer.c_str(), buffer.size());
buffer.clear();
SetColor(ConsoleColor::White);
return rc;
}
default:
buffer += c;
return c;
}
}
std::string prefix;
std::streambuf* sbuf;
string buffer;
};
class write_line_stream: public std::streambuf {
public:
write_line_stream(std::streambuf* s) : sbuf(s) {}
~write_line_stream() { overflow('\0'); }
private:
typedef std::basic_string<char_type> string;
int_type overflow(int_type c) {
if(traits_type::eq_int_type(traits_type::eof(), c))
return traits_type::not_eof(c);
switch(c) {
case '\n':
case '\r':
{
buffer += c;
int_type rc = sbuf->sputn(buffer.c_str(), buffer.size());
sbuf->sputn(&newline, 1);
buffer.clear();
return rc;
}
default:
buffer += c;
return c;
}
}
std::streambuf* sbuf;
string buffer;
};
static output_stream outputStream;
static error_stream errorStream;
static write_line_stream writeLineStream;
public:
static void Setup();
static Writer Write;
static Writer WriteLine;
static Writer Err;
};
Console.cpp
#include "Console.h"
Console::Writer Console::Write(nullptr);
Console::Writer Console::WriteLine(nullptr);
Console::Writer Console::Err(nullptr);
Console::error_stream Console::errorStream(std::cout.rdbuf());
Console::write_line_stream Console::writeLineStream(std::cout.rdbuf());
void Console::Setup() {
Write.rdbuf(std::cout.rdbuf());
Err.rdbuf(&errorStream);
WriteLine.rdbuf(&writeLineStream);
}
Main.cpp
int main() {
Console::Setup();
Console::Write << "First" << "Second";
Console::WriteLine << "Third";
Console::WriteLine << "Fourth";
Console::Write << "Fifth";
Console::Error << "Sixth";
Console::ErrorLine << "Seventh";
Console::WriteLine << "Eighth";
}
Which should give an output of
FirstSecondThird
Fourth
FifthERROR: SixthERROR: Seventh
Eighth
Press any key to continue...
Any help and/or suggestions are appreciated.

There are multiple concerns here which do require different approaches. Some of the description also seems that the actual desire isn't quite clear. The most problematic requirement is that a newline needs to be inserted apparently at the end of a statement. That's certainly doable but effectively does require a temporary being around.
Before going there I want to point out that most other languages offering a print-line(....) construct delineate what is going onto a line using a function call. There is no doubt where the newline goes. If C++ I/O would be created now I'd be quite certain that it would be based on a variadic (not vararg) function template. This way print something at the end of the expression is trivial. Using a suitable manipulator at the end of the line (although probably not std::endl but maybe a custom nl) would be an easy approach.
The basics of adding a newline at the end of expression would be using a destructor of a suitable temporary object to add it. The straight forward way would be something like this:
#include <iostream>
class newline_writer
: public std::ostream {
bool need_newline = true;
public:
newline_writer(std::streambuf* sbuf)
: std::ios(sbuf), std::ostream(sbuf) {
}
newline_writer(newline_writer&& other)
: newline_writer(other.rdbuf()) {
other.need_newline = false;
}
~newline_writer() { this->need_newline && *this << '\n'; }
};
newline_writer writeline() {
return newline_writer(std::cout.rdbuf());
}
int main() {
writeline() << "hello, " << "world";
}
This works reasonable nice. The notation in the question doesn't use a function call, though. So, instead of writing
writeline() << "hello";
it seems necessary to write
writeline << "hello";
instead and still add a newline. This complicates matters a bit: essentially, writeline now needs to be an object which somehow causes another object to jump into existence upon use so the latter can do its work in the destructor. Using a conversion won't work. However, overloading an output operator to return a suitable object does work, e.g.:
class writeliner {
std::streambuf* sbuf;
public:
writeliner(std::streambuf* sbuf): sbuf(sbuf) {}
template <typename T>
newline_writer operator<< (T&& value) {
newline_writer rc(sbuf);
rc << std::forward<T>(value);
return rc;
}
newline_writer operator<< (std::ostream& (*manip)(std::ostream&)) {
newline_writer rc(sbuf);
rc << manip;
return rc;
}
} writeline(std::cout.rdbuf());
int main() {
writeline << "hello" << "world";
writeline << std::endl;
}
The primary purpose of the overloaded shift operators is to create a suitable temporary object. They don't try to mess with the content of the character stream. Personally, I'd rather have the extra parenthesis than using this somewhat messy approach but it does work. What is kind of important is that the operator is also overloaded for manipulators, e.g., to allow the second statement with std::endl. Without the overload the type of endl can't be deduce.
The next bit is writing the prefix and mixing multiple streams. The important bit here is to realize that you'd want to one of two things:
Immediately write the characters to a common buffer. The buffer is most like just another stream buffer, e.g., the destination std::streambuf.
If the character should be buffered locally in separate stream buffers, the corresponding streams need be flushed in a timely manner, e.g., after each insertion (by setting the std::ios_base::unitbuf bit) or, latest, at the end of the expression, e.g., using an auxiliary class similar to the newline_writer.
Passing through the characters immediately is fairly straight forward. The only slight complication is to know when to write a prefix: upon the first non-newline, non-carriage-return return after a newline or a carriage return (other definitions are possibly and should be easily adaptable). The important aspect is that stream buffer doesn't really buffer but actually passes through the character to the underlying [shared] stream buffer:
class prefixbuf
: public std::streambuf {
std::string prefix;
bool need_prefix = true;
std::streambuf* sbuf;
int overflow(int c) {
if (c == std::char_traits<char>::eof()) {
return std::char_traits<char>::not_eof(c);
}
switch (c) {
case '\n':
case '\r':
need_prefix = true;
break;
default:
if (need_prefix) {
this->sbuf->sputn(this->prefix.c_str(), this->prefix.size());
need_prefix = false;
}
}
return this->sbuf->sputc(c);
}
int sync() {
return this->sbuf->pubsync();
}
public:
prefixbuf(std::string prefix, std::streambuf* sbuf)
: prefix(std::move(prefix)), sbuf(sbuf) {
}
};
The remaining business is to set up the relevant objects in the Console namespace. However, doing so is rather straight forward:
namespace Console {
prefixbuf errorPrefix("ERROR", std::cout.rdbuf());
std::ostream Write(std::cout.rdbuf());
writeliner WriteLine(std::cout.rdbuf());
std::ostream Error(&errorPrefix);
writeliner ErrorLine(&errorPrefix);
}
I except that the approach adding the newline creates a custom type I think that matches the original goes. I don't think the temporary object can be avoided to automatically create a newline at the end of a statement.
All that said, I think you should use C++ idioms and not try to replicate some other language in C++. The way to choose whether a line end in newline or not in C++ is to write, well, a newline where one should appear potentially by way of a suitable manipulator.

Save struct with an array of pointers into a file

I'm working on c++ and I need to save this struct into a file:
struct myStruct{
int value;
int value2;
MyClass * myClass[10];
};
The way that I'm saving this struct is the following:
myStruct my_struct;
my_struct.value = 1;
my_struct.value2 = 2;
for ( int i = 0; i < 10; i++){
my_struct.myClass[i] = new MyClass();
}
FILE* f = fopen(path.c_str(), "wb");
if ( f != NULL){
fwrite(&my_struct, sizeof(myStruct), 1, f);
fclose(f);
}
But, when I want to read this file, my program crashes when try to access to the array of "MyClass":
FILE* f = fopen(path.c_str(), "rb");
if ( f != NULL){
fread(&my_struct2, sizeof(struct myStruct), 1, f);
fclose(f);
}
for ( int i = 0; i < 10; i ++ ){
if ( my_struct2.myClass[i] != NULL ){
//Here is the crash
}
}
I've been searching but I can't find a solution. I only find topics about arrays of structs. I know that maybe I'm not searching very well.
Thanks.

Your MyStruct contains twenty pointers to other structures.
By fwrite()ing the contents of your MyStruct to a file, you have successfully written twenty raw memory addresses of your other structures into the file, together with the other members of the MyStruct class.
Which, of course, is utterly meaningless when you try to read them back in another process. You've read back twenty raw memory addresses. Which mean nothing to a completely unrelated process. And, accessing those memory addresses, unsurprisingly, leads to a crash since those memory addresses, for all intents and purposes, are completely random values.
What your code needs to do is not write twenty raw pointer addresses to the file, but the contents of those pointers, and what they point to.

I want to add some things to Sam's answer, even if I know this is not code review, you are writing C in C++.
C++ is not meant to be coded in C, it doesn't want to... It fought its entire life to break its bound with its deprecated father, to surpass him, to explore new meanings and way to solve problems and build efficient code. Don't do this to him... (I love C by the way, deprecated was a joke obviously ;) )
Here's how I'd do it:
#include <fstream>
#include <iostream>
class MyClass
{
public:
MyClass() : _type(-1) {}
MyClass(int type) : _type(type) {}
inline const int &type() const
{ return _type; }
private:
int _type;
};
// -- overload of operator<< that permits me to write a MyClass* to a stream
std::ostream &operator<<(std::ostream &stream, MyClass *myClass)
{
stream << "myClass::type: " << myClass->type();
return stream;
}
struct MyStruct
{
int value;
int value2;
MyClass *myClasses[10];
MyStruct()
{
value = -1;
value2 = 1;
for (std::size_t i = 0 ; i < 10 ; ++i)
{ myClasses[i] = new MyClass(-i); }
}
};
// -- overload of operator<< that permits me to write a MyStruct to a stream
std::ostream &operator<<(std::ostream &stream, const MyStruct &myStruct)
{
stream << "myStruct::"
<< "\n\t value: " << myStruct.value
<< "\n\t value2: " << myStruct.value2
<< "\n\t myClasses: ";
for (std::size_t i = 0 ; i < 10 ; ++i)
{ stream << "\n\t\t " << myStruct.myClasses[i]; }
return stream;
}
int main()
{
std::ofstream outputFile("output.txt");
if (outputFile.is_open() == false)
{ std::cerr << "Could not open file." << std::endl; return -1; }
outputFile << MyStruct() << std::endl; // -- this is where the information is written into the file
outputFile.close();
}
See simple way to write a struct, you could even get it back into the struct the same way with operator>> overload, bonus is you can use on any ostream, which means it will work with sstream, std::cout and everything!
Still this is not really c++-like as there is too much (unprotected) pointers and unchecked magical number sizes (MyClass *myClasses[10]; this is a no-no for me, because it implies this thing: for (std::size_t i = 0 ; i < 10 ; ++i), and this shit gives me shivers).
I would probably use an std::array here
, but I wanted to keep MyStruct as you defined it so the example stay "close" to what you wrote. Another way would have been to use std::unique_ptr or std::shared_ptr.
This can seem as quite a bit of work or intimidating, but you may find that useful in the future. Same goes for using the std containers(array, set, vector, map, etc...), unique_ptr and shared_ptr. But I assure you it's worth giving some time to understand them and learn how to use them. It makes things simpler and safer.
What gave me shivers earlier would be written like this:
std::array<MyClass, 10> myClasses;
Loops would go like this:
for (std::size_t i = 0 ; i < myClasses.size() ; ++i)
{ myClasses[i].type(); }
for (std::array<MyClass, 10>::iterator itC = myClasses.begin() ; itC != myClasses.end() ; ++itC)
{ itC->type(); }
Or even better, a c++11 way to write a loop, that I find easier to read and write:
for (auto myClass : myClasses)
{ myClass.type(); }
Note that if you want to modify myClass inside this one you need to write auto& myClass : myClasses
Hope it helps you.

Using fwrite(&my_struct, sizeof(myStruct), 1, f); is good if your struct my_struct contains purely static data(i.e the data for which memory was allocated at compile time). If it contains dynamic data(i.e the data for which memory is allocated at runtime) then you need to manually store such dynamic data.
Overloading operator<< as shown my #vianney is a good method of saving/serializing dynamic data.

Difference between strstream and stringstream

I was going through these two class implementations and found out that the strstream class is deprecated.
And if I use the stringstream class as replacement, then there is big difference between how they log in the buffer since the stringstream class object maintains a deep copy of the buffer.
Has anyone faced any problem while replacing strstream with stringstream class?
What would be the output of this code and why?
#include<iostream>
#include <sstream>
#include <strstream>
int main(){
char strArr[] = "Soheb Khan is great";
char stringArr[] = "TurboCharging";
std::strstream strStream(strArr,19);
std::stringstream stringStream(std::string(stringArr,19));
std::cout<<"Before Modification strArr= "<<strArr<<" & stringArr= "<<stringArr<<std::endl;
strStream << "Fifa 2012 is nice";
stringStream << "Sometimes its sucks";
std::cout<<"After Modification strArr= "<<strArr<<" & stringArr= "<<stringArr<<std::endl;
return 0;
}

The classes from <strstream> are hideous to use. When they were more popular I haven't seen any correct production used (well, they got corrected when I spotted the problems). Either people didn't terminate the string using std::ends or they didn't release the memory using s.freeze(0) (or, most often, both). Although the <sstream> class do create a copy I haven't found this to be a problem.
In case memory allocation actually matters for your use case, either because you need to allocate large chunks or because you have many of them, you can take control easily and have data read from or written to buffers you provide using a custom stream buffer. For example, a stream buffer writing to a readily allocate chunk of memory is trivial to write:
struct omembuf
: std::streambuf {
{
omembuf(char* base, std::size_t size) {
this->setp(base, base + size);
}
char* begin() const { return this->pbase(); }
char* end() const { return this->pptr(); }
};

Realloc()/Resize an object in C++ for a string implementation

When they are represented in memory, are C++ objects the same as C structs?
For example, with C, I could do something like this:
struct myObj {
int myInt;
char myVarChar;
};
int main() {
myObj * testObj = (myObj *) malloc(sizeof(int)+5);
testObj->myInt = 3;
strcpy((char*)&testObj->myVarChar, "test");
printf("String: %s", (char *) &testObj->myVarChar);
}
I don't think C++ allows overloading the + operator for the built-in char * type.
So i'd like to create my own lightweight string class which has no extra overhead that std::string has. I think std::string is represented contiguously:
(int)length, (char[])data
I want exactly the same functionality but without prefixing the length (saving 8 bytes overhead).
Here is the code i'm using to test, but it results in a segfault
#include <iostream>
using namespace std;
class pString {
public:
char c;
pString * pString::operator=(const char *);
};
pString * pString::operator=(const char * buff) {
cout << "Address of this: " << (uint32_t) this << endl;
cout << "Address of this->c: " << (uint32_t) &this->c << endl;
realloc(this, strlen(buff)+1);
memcpy(this, buff, strlen(buff));
*(this+strlen(buff)) = '\0';
return this;
};
struct myObj {
int myInt;
char myVarChar;
};
int main() {
pString * myString = (pString *) malloc(sizeof(pString));
*myString = "testing";
cout << "'" << (char *) myString << "'";
}
Edit: nobody really understands what i want to do. Yes i know i can have a pointer to the string in the class but thats 8 bytes more expensive than a plain cstring, i wanted exactly the same internal representation. Thanks for trying though
Edit: The end result of what i wanted to achieve was being able to use the + operator with NO extra memory usage compared to using strcat etc
const char * operator+(const char * first, const char * second);

You should not waste your time writing string classes - there's a reason people spent time writing them in the first place and it's naive to think they wrote them because they wanted to create big obfuscated and overheaded code that you could easily improve in a matter of hours.
For example your code has quadratic complexity for memory reallocations in the assignment operator - each assignment of a sting greater by 1 character will use a new memory block greater by 1 byte resulting in big memory fragmentation after a "few" assignments like this - you save a few bytes but potentialy lose megabytes to address space and memory page fragmentation.
Also designing this way you have no way of efficiently implementing the += operator as instead of just copying the appended string in most cases you will always need to copy the whole string - thus reaching quadratic complexity again in case you append small strings to one bigger one a few times.
Sorry but your idea looks to have great chances of becoming terrible to maintain and orders of magnitude less efficient then the typical string implementations like std::string.
Don't worry - this is true for practicaly all great ideas of "writing your own better version of a standard container" :)

struct myObj {
//...
char myVarChar;
};
This won't work. You either need a fixed size array, a pointer to char or use the struct hack. You won't be able to assign a pointer to this myVarChar.
so i'd like to create my own lightweight string class which has no extra overhead std::string has.
What extra overhead are you referring to? Have you tested and measured to see if std::string is really a bottleneck?
I think std::string is represented contiguously
Yes, mostly, the character buffer part. However, the following:
(int)length(char[])data
is not required by the standard. Translated: A string implementation need not use this particular layout of its data. And it may have additional data.
Now, your lightweight string class is frought with errors:
class pString {
public:
char c; // typically this is implementation detail, should be private
pString * pString::operator=(const char *);
// need ctors, dtors at least as well
// won't you need any functions on strings?
};
Try something along the lines of the following:
/* a light-weight string class */
class lwstring {
public:
lwstring(); // default ctor
lwstring(lwstring const&); // copy ctor
lwstring(char const*); // consume C strings as well
lwstring& operator=(lwstring const&); // assignment
~lwstring(); // dtor
size_t length() const; // string length
bool empty() const; // empty string?
private:
char *_myBuf;
size_t _mySize;
};

Wow. What you're trying to do is a complete abuse of C++, would be totally compiler dependent if it worked, and would surely land you in TheDailyWTF some day.
The reason you're getting a segfault is probably because your operator= is reallocating the object to a different address, but you're not updating the myString pointer in main. I hesitate to even call it an object at this point, since no constructor was ever called.
I think what you're trying to do is make pString a smarter pointer to a string, but you're going about it all wrong. Let me take a crack at it.
#include <iostream>
using namespace std;
class pString {
public:
char * c;
pString & operator=(const char *);
const char * c_str();
};
pString & pString::operator=(const char * buff) {
cout << "Address of this: " << (uint32_t) this << endl;
cout << "Address of this->c: " << (uint32_t) this->c << endl;
c = (char *) malloc(strlen(buff)+1);
memcpy(c, buff, strlen(buff));
*(c+strlen(buff)) = '\0';
return *this;
};
const char * pString::c_str() {
return c;
}
int main() {
pString myString;
myString = "testing";
cout << "'" << myString.c_str() << "'";
}
Now I wouldn't use malloc but new/delete instead, but I left this as close to your original as possible.
You might think you are wasting the space of a pointer in your class, but you aren't - you're trading it for the pointer you previously kept in main. I hope this example makes it clear - the variables are the same size, and the amount of additional memory allocated by malloc/realloc is the same as well.
pString myString;
char * charString;
assert(sizeof(myString) == sizeof(charString));
P.S. I should point out that this code still needs a lot of work, it's full of holes. You need a constructor to initialize the pointer, and a destructor to free it when it's done, just for starters. You can do your own implementation of operator+, too.

You cannot change the size of an object/struct in either C or C++. Their sizes are fixed at compile time.

when they are represented in memory are objects C++ objects the same as C structs.
Strictly speaking, no. In general, yes. C++ classes and structs are identical in memory layout to C structs except:
Bit fields have different packing rules
Sizes are fixed at compile time
If there are any virtual functions, the compiler will add a vtable entry to the memory layout.
If the object inherits a base class, the new class' layout will be appended to the base class layout, including vtable, if any.
I don't think C++ allows overloading the + operator for the built in char * type. so i'd like to create my own lightweight string class which has no extra overhead std::string has. I think std::string is represented contiguously
You can create a operator+ overload for the char* type. Normal behavior is pointer arithmetic. std::string overloads operator+ to append char* data to the string. The string is stored in memory as a C string, plus additional information. The c_str() member function returns a pointer to the internal char array.
In your C example, you're relying on undefined behavior. Don't realloc like that. It can result in Bad Things - namely bizarre segfaults.
Your C++ example is also doing Bad Things in doing realloc(this). Instead, you should carry a char* and get a new char[] buffer to store the chars in instead of a realloc(). Behavior for such a realloc is undefined.

There is a lot a wrong with your class definition/usage. If you want to store a string you should use a pointer type, like char* a member, not an individual char. Using a single char means that only a single character of memory is allocated.
Another mistake is the allocation code where you do a realloc on this - you can potentially change the memory allocated, but not the value of this. You must assign the result to this to achieve this (this = (*pString)realloc(this, strlen(buff+1));) and that is really bad practice anyway. Much better to use realloc on a char* member.
Unfortunately C++ proper has no alternative for realloc or expand and you must instead use new and delete, doing any copying yourself.

Why do you write in C with classes, why don't use C++?

I do not think 'this' works the way you think it works.
Specifically, you cannot reallocate this to point to a larger buffer in a member function, because whatever called that member function still has a pointer to the old 'this'. Since it's not passed by reference there is no way that you can update it.
The obvious way around that is that your class should hold a pointer to the buffer and reallocate that. However, reimplementing a string class is a good way to give yourself lots of headaches down the line. A simple wrapper function would probably accomplish what you wanted (assuming "being able to use the + operator with NO extra memory usage compared to using strcat" is really what you wanted):
void concatenate(std::string& s, const char* c) {
s.reserve(s.size() + strlen(c));
s.append(c);
}
There's some probability that append may do that internally anyway though.

don't mind the lack of const correctness, as this is a mock up, but how about this:
class light_string {
public:
light_string(const char* str) {
size_t length = strlen(str);
char* buffer = new char[sizeof(size_t) + length + 1];
memcpy(buffer, &length, sizeof(size_t));
memcpy(buffer + sizeof(size_t), str, length);
memset(buffer + sizeof(size_t) + length, 0, 1);
m_str = buffer + sizeof(size_t);
}
~light_string() {
char* addr = m_str - sizeof(size_t);
delete [] addr;
}
light_string& operator =(const char* str) {
light_string s = str;
std::swap(*this, s);
return *this;
}
operator const char*() {
return m_str;
}
size_t length() {
return
*reinterpret_cast<size_t *>(m_str - sizeof(size_t));
}
private:
char* m_str;
};
int main(int argc, char* argv[])
{
cout<<sizeof(light_string)<<endl;
return 0;
}

You are moving the "this" pointer. Thats not going to work.
I think what you really want is just a wrapper around a buffer.

#include <iostream>
using namespace std;
class pString {
public:
char c[1];
pString * pString::operator=(const char *);
};
pString * pString::operator=(const char * buff) {
cout << "Address of this: " << (uint32_t) this << endl;
cout << "Address of this->c: " << (uint32_t) &this->c << endl;
realloc(this->c, strlen(buff)+1);
memcpy(this->c, buff, strlen(buff));
*(this->c+strlen(buff)) = '\0';
return this;
};
struct myObj {
int myInt;
char myVarChar;
};
int main() {
pString * myString = (pString *) malloc(sizeof(pString));
*myString = "testing vijay";
cout << "'" << ((char*)myString << "'";
}
This should work. But its not advisable.

#include <iostream>
using namespace std;
class pString {
public:
char c;
pString * pString::operator=(const char *);
};
pString * pString::operator=(const char * buff) {
cout << "Address of this: " << (uint32_t) this << endl;
cout << "Address of this->c: " << (uint32_t) &this->c << endl;
char *newPoint = (char *)realloc(this, strlen(buff)+1);
memcpy(newPoint, buff, strlen(buff));
*((char*)newPoint+strlen(buff)) = '\0';
cout << "Address of this After: " << (uint32_t) newPoint << endl;
return (pString*)newPoint;
};
int main() {
pString * myString = (pString *) malloc(sizeof(pString));
*myString = "testing";
cout << "Address of myString: " << (uint32_t) myString << endl;
cout << "'" << (char *) myString << "'";
}
Works When realloc doesn't re-assign the pointer i.e.
Address of this: 1049008
Address of this->c: 1049008
Address of this After: 1049008
Address of myString: 1049008 'testing'
Works, but when the the following happens it fails
Address of this: 1049008
Address of this->c: 1049008
Address of this After: 1049024
Address of myString: 1049008 ''
the obvious solution is to have
this = (pString*) newPoint;
But the compiler complains about an invalid lvalue in assignment. Does anyone the correct way to update this (just for completeness, i doubt i'll use the code since everyone seems to hate it). Thanks

If you want something that is basically the same as std::string except that it doesn't know how long the string is, you should learn how std::string works, what operator overloads it has, etc. and then mimic that, with just the differences you want.
There is unlikely to be any real point to this, however.
Regarding your latest update - you say you want a design in which general application code will be passing around naked pointers to heap objects. With no automatic cleanup.
This is, quite simply, a very bad idea.

What you want to do doesn't and cannot work in C++. What you are looking for is the C99-feature of flexible arrays. This works nice in C99 for two reasons, first you don't have build-in constructors and second you don't have inheritance (at least not as a language feature). If a class inherits from another the memory used by the subclass is packed by hind the memory of the parent class, but a flexible array needs to be at the end the structure/class.
class pString {
char txt[];
}
class otherString : pString { // This cannot work because now the
size_t len; // the flexible array is not at the
} // end
Take std::string it was written by experts of C++, I'm sure they didn't leaved out a "good trick" without a reason. If you still find out that they don't perform very well in your programm, use plain C strings instead, of course, they don't provide the sweet API, you want.

You can't realloc C++ objects. As others pointed out this is not really a pointer you can modify, there's no guarantee that it will be pointing to an area realloc has access.
One solution to concatenation is to implement a class hierarchy that will defer the real concatenation until it is needed.
Something like this
class MyConcatString;
class MyString {
public:
MyString(const MyConcatString& c) {
reserve(c.l.length()+c.r.lenght());
operator = (l);
operator += (r);
}
MyConcatString operator + (const MyString& r) const {
return MyConcatString(*this, r);
}
};
class MyConcatString {
public:
friend class MyString;
MyConcatString(const MyString& l, const MyString& r):l(l), r(r) {};
...
operator MyString () {
MyString tmp;
tmp.reserve(l.length()+r.length());
tmp = l;
tmp += r;
return tmp;
}
private:
MyString& l;
MyString& r;
}
So if you have
MyString a = "hello";
MyString b = " world";
MyString c = a + b;
Will turn into
MyString c = MyConcatString(a, b);
For more detail check "The C++ Programming language".
Other solution, is to wrap char* inside a struct, but you seem to no like this idea.
But whatever solution you will choose, objects in C++ can't be relocated.

If you want performance, you can write your class like this:
template<int max_size> class MyString
{
public:
size_t size;
char contents[max_size];
public:
MyString(const char* data);
};
Initialize max_size to an appropriate value under context. In this way the object can be created on stack, and no memory allocation is involved.
It is possible to create what you desired by overloading new operator:
class pstring
{
public:
int myInt;
char myVarchar;
void* operator new(size_t size, const char* p);
void operator delete(void* p);
};
void* pstring::operator new(size_t size, const char* p)
{
assert(sizeof(pstring)==size);
char* pm = (char*)malloc(sizeof(int) + strlen(p) +1 );
strcpy(sizeof(int)+pm, p);
*(int*)(pm) = strlen(p); /* assign myInt */
return pm;
}
void pstring::operator delete(void* p)
{
::free(p);
}
pstring* ps = new("test")pstring;
delete ps;

This code is a mess and RnR and others suggested is not advisable. But it works for what i want it to do:
#include <iostream>
using namespace std;
struct pString {
/* No Member Variables, the data is the object */
/* This class cannot be extended & will destroy a vtable */
public:
pString * pString::operator=(const char *);
};
pString& operator+(pString& first, const char *sec) {
int lenFirst;
int lenSec = strlen(sec);
void * newBuff = NULL;
if (&first == NULL)
{
cout << "NULL" << endl;
lenFirst = 0;
newBuff = malloc(sizeof(pString)+lenFirst+lenSec+1);
} else {
lenFirst = strlen((char*)&first);
newBuff= (pString*)realloc(&first, lenFirst+lenSec+1);
}
if (newBuff == NULL)
{
cout << "Realloc Failed"<< endl;
free(&first);
exit(0);
}
memcpy((char*)newBuff+lenFirst, sec, lenSec);
*((char*)newBuff+lenFirst+lenSec) = '\0';
cout << "newBuff: " << (char*)newBuff << endl;
return *(pString*)newBuff;
};
pString * pString::operator=(const char * buff) {
cout << "Address of this: " << (uint32_t) this << endl;
char *newPoint = (char *)realloc(this, strlen(buff)+200);
memcpy(newPoint, buff, strlen(buff));
*((char*)newPoint+strlen(buff)) = '\0';
cout << "Address of this After: " << (uint32_t) newPoint << endl;
return (pString*)newPoint;
};
int main() {
/* This doesn't work that well, there is something going wrong here, but it's just a proof of concept */
cout << "Sizeof: " << sizeof(pString) << endl;
pString * myString = NULL;
//myString = (pString*)malloc(1);
myString = *myString = "testing";
pString& ref = *myString;
//cout << "Address of myString: " << myString << endl;
ref = ref + "test";
ref = ref + "sortofworks" + "another" + "anothers";
printf("FinalString:'%s'", myString);
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Taking ownership of streambuf/stringbuf data - c++

Related

When should I use std::string / std::string_view for parameter / return type

Extending std::cout

Save struct with an array of pointers into a file

Difference between strstream and stringstream

Realloc()/Resize an object in C++ for a string implementation

Categories

Resources