boost::interprocess::basic_string as std::string - c++

I am trying to replace a class method which returns const std::string & with one that returns const boost::interprocess::basic_string &. The main challenge I am facing is the incompatibility between the two classes despite the similarity of their implementations. To explain more clearly, here it is in code:
class A
{
public:
const std::string & StrVal() { return m_str; }
private:
std::string m_str;
};
Now this class has to look like this:
typedef boost::interprocess::allocator<char,boost::interprocess::managed_shared_memory::segment_manager> ShmemAllocatorChar;
typedef boost::interprocess::basic_string<char, std::char_traits<char>,ShmemAllocatorChar> ShMemString;
class A
{
public:
const ShMemString & StrVal() { return m_str; }
private:
ShMemString m_str;
};
The problem is that we have a huge code base depending on this:
A a;
const std::string & str = a.StrVal();
// Many string specific operations go here, comparing str with other std::strings for instance
Even if I went over all the code replacing the expected result types with const ShMemString &, fixing the uses that follow would be even more work. I was surprised to find that Boost's string does not provide any comparison with, or construction from, std::string.
Any ideas on how to approach this?

Even if boost::interprocess::basic_string<> did have a conversion to std::basic_string<>, it would be completely useless for your purposes -- after the conversion, the interprocess string would be destroyed, and its allocator is the important one (i.e., the one holding the data in shared memory, which I assume is your motivation for switching basic_string<> implementations in the first place).
So, in the end, you have no choice but to go over all the code replacing the expected results with ShMemString const& (or auto const& if your compiler is recent enough to support it).
To make this less painful in the future, typedef judiciously:
struct A
{
typedef ShMemString StrValType;
StrValType const& StrVal() { return m_str; }
private:
StrValType m_str;
};
// ...
A a;
A::StrValType const& str = a.StrVal();
This way, only the typedef inside of A needs to change and all code relying on it will automatically use the correct type.

The problem is that we have a huge code base depending on this:
Why does A::StrVal in the second one return an interprocess::basic_string? It is an implementation detail of the class A that it uses interprocess::basic_string internally. The string class its interface uses does not have to be the same. This is simply poor refactoring.
A::StrVal should return a std::string, just like always (well, not a const& of course, but user code won't need to change because of that). And therefore, A::StrVal will need to do the conversion between the two string types. That's how proper refactoring is done: you change the implementation, but the interface stays the same.
Yes, this means you're going to have to copy the string data. Live with it.
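A minimal sketch of that idea, under loud assumptions: OtherAlloc and OtherString are hypothetical stand-ins for the shared-memory allocator and ShMemString, used only so the example compiles without Boost. The conversion relies on nothing more than data() and size(), which the interprocess string should also expose, but check that against your Boost version.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical stand-in for the shared-memory allocator: any allocator other
// than std::allocator makes the string type incompatible with std::string.
template <typename T>
struct OtherAlloc {
    using value_type = T;
    OtherAlloc() = default;
    template <typename U>
    OtherAlloc(const OtherAlloc<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const OtherAlloc<T>&, const OtherAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const OtherAlloc<T>&, const OtherAlloc<U>&) { return false; }

// Stand-in for ShMemString.
using OtherString = std::basic_string<char, std::char_traits<char>, OtherAlloc<char>>;

class A {
public:
    explicit A(const char* s) : m_str(s) {}
    // Callers still receive a std::string. It is returned by value, so the
    // characters are copied out of the internal storage on every call.
    std::string StrVal() const { return std::string(m_str.data(), m_str.size()); }
private:
    OtherString m_str;  // implementation detail, invisible to callers
};
```

Existing call sites of the form const std::string & str = a.StrVal(); keep compiling, because binding the returned temporary to a const reference extends its lifetime to that of the reference.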

Standard way to handle the encapsulated access to values stored in private map without breaking the abstraction in C++

I want to create a class in order to manage markup language (such as HTML) in C++. I would like my class to retain attributes and sub-tags. The problem is, given encapsulated containers, how to properly abstract the accesses and what to return in order to provide an easy way to check if the value returned is valid.
I defined my class with two maps as private members (namely std::map<std::string, Tag> _children; and std::map<std::string, std::string> _attr;). I defined two functions to populate these fields, and I would like to define two functions for read access to the stored elements.
The problem is, I don't want to break my abstraction and, as I'm doing this in order to work on my c++ skills, I would like to find the proper way (or cleaner way, or standard way) to do it.
One basic solution would be to simply call return map.find(s);, but then I would have to define the return type of my function as std::map<std::string, Tag>::const_iterator, which would break the abstraction. So I could dereference the iterator returned by map.find(), but if the value is not in the map I would dereference a non-dereferenceable iterator (_children.cend()).
What I defined so far:
using namespace std;
class Tag {
static const regex re_get_name, re_get_attributes;
string _name;
map<string,string> _attr;
map<string,Tag> _children;
public:
Tag(const string &toParse) {
/* Parse line using the regex */
}
const string& name() const {
return _name;
}
Tag& add_child(const Tag& child) {
_children.insert(make_pair(child._name, child));
return *this;
}
SOMETHING get_child(const string& name) const {
map<string,Tag>::const_iterator val = _children.find(name);
/* Do something here, but what ? */
return something;
}
SOMETHING attr(const string& name) const {
map<string, string>::const_iterator val = _attr.find(name);
/* Do something here, but what ? */
return something;
}
};
const regex Tag::re_get_name("^<([^\\s]+)");
const regex Tag::re_get_attributes(" ([^\\s]+) = \"([^\\s]+)\"");
What would be the proper way to handle this kind of situation in C++? Should I create my own Tag::const_iterator type? If so, how? Should I go for a more "C" approach, where I just define the return type as Tag* and return NULL if the map doesn't contain my key? Should I be more OOP, with a static member static const Tag NOT_FOUND, and return a reference to this object if the element isn't in my map? I also thought of throwing an exception, but exception handling seems to be quite heavy and inefficient in C++.
std::optional could help you, but it needs a C++17-ready standard library, so in the meantime you could use boost::optional, which is more or less the same, since AFAIK std::optional's design was based on the Boost one. (Boost is often the source of new C++ standard proposals.)
Although I am reluctant to propose a solution because of a general problem with your approach, I wrote one anyway; please consider the points after the code:
#include <string>
#include <regex>
#include <map>
#include <boost/optional.hpp>
class Tag {
static const std::regex re_get_name, re_get_attributes;
using string = std::string;
string _name;
std::map<string,string> _attr;
std::map<string,Tag> _children;
public:
Tag(const string &toParse) {
/* Parse line using the regex */
}
const string& name() const {
return _name;
}
Tag& add_child(const Tag& child) {
_children.emplace(child._name, child);
return *this;
}
boost::optional<Tag> get_child(const string& name) const {
auto val = _children.find(name);
return val == _children.cend() ? boost::optional<Tag>{} : boost::optional<Tag>{val->second};
}
boost::optional<string> attr(const string& name) const {
auto val = _attr.find(name);
return val == _attr.cend() ? boost::optional<string>{} : boost::optional<string>{val->second};
}
};
As you can see, you are basically just reimplementing the container semantics of std::map, with parser logic built in. I strongly disagree with this approach: parsing gets ugly very fast, and mixing value-generation code into a container that could, and should, be used as a value class makes things even worse.
My first suggestion is to just declare/use your Tag class/struct as a value class, so just containing the std::maps as public members. Put your parsing functions in a namespace along with the Tag container, and let them just be functions or distinct classes if needed.
My second suggestion is a small one: avoid the _ prefix. Names that begin with an underscore followed by an uppercase letter, or that contain a double underscore, are reserved, as is any name with a leading underscore in the global namespace; a leading underscore is also widely considered bad style. A trailing underscore is fine. Also, don't put using namespace directives at global scope: it's bad style in a .cpp file and extremely bad style in a header (.h/.hpp).
My third suggestion: use the Boost.Spirit Qi parser framework. You would just declare your value classes as in my first suggestion, and Qi would fill them automatically via Boost.Fusion. If you already know EBNF notation, you can write an EBNF-like grammar in C++, and the compiler will generate a parser via template magic. Qi, and especially Fusion, have some issues, but they make things much easier in the long run. Regexes do only half of the parsing logic, at best.
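To make the first suggestion concrete, here is a minimal sketch of Tag as a plain value type with a lookup helper living next to it as a free function. The namespace markup and the helper find_attr are hypothetical names chosen for illustration, and child tags are left out to keep it short.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace markup {

// Plain value type: the data is public and there is no parsing logic inside.
struct Tag {
    std::string name;
    std::map<std::string, std::string> attr;
};

// Hypothetical helper: returns a pointer to the attribute value,
// or nullptr when the attribute is absent.
inline const std::string* find_attr(const Tag& tag, const std::string& key) {
    auto it = tag.attr.find(key);
    return it == tag.attr.end() ? nullptr : &it->second;
}

}  // namespace markup
```

Parsing functions would live in the same namespace and simply return a filled-in Tag, so the value type never needs to know how it was produced.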

How to shorten this method signature?

I have the following class with a method signature as below:
class Foo
{
public:
std::vector<std::string> barResults(const std::vector<std::string>&, const std::vector<std::string>&);
};
In the implementation file, I've got this:
std::vector<std::string> Foo::barResults(const std::vector<std::string>& list1, const std::vector<std::string>& list2)
{
std::vector<std::string> results;
// small amount of implementation here...
return results;
}
So I thought to myself, let's see if I can simplify this function signature a bit with some auto-magic as it's getting to be a "bit of a line-full"! So I tried this...
class Foo
{
public:
auto barResults(const std::vector<std::string>&, const std::vector<std::string>&);
};
auto Foo::barResults(const std::vector<std::string>& list1, const std::vector<std::string>& list2)
{
std::vector<std::string> results;
// small amount of implementation here...
return results;
}
Now ignoring the fact that, yes I can use a "using namespace std", to trim it down a lot, I was wondering why the compiler gave me an error "a function that returns 'auto' cannot be used before it is defined".
I personally would have thought the compiler would have easily been able to deduce the return type of the method, but in this case it doesn't seem so. Sure, you can fix it with a trailing return type as below:
class Foo
{
public:
auto barResults(const std::vector<std::string>&, const std::vector<std::string>&) -> std::vector<std::string>;
};
But then if you use the above, it's no better than it was before. So, apart from "using namespace std", is there a nicer way to do the above, and why can't the compiler deduce the return type in this instance? Or does it depend on how this method is invoked, and is that what prevents the compiler from figuring out the return type?
The issue here is an issue of how include files work. Your error:
a function that returns 'auto' cannot be used before it is defined
means that, in the file where you use the function, its definition (i.e. its implementation) does not appear anywhere before the use. The compiler therefore can't deduce the function's return type when compiling the calling code, because deduction requires access to the definition. The most likely reason is that the definition lives in its own source file (.cpp, .c, etc.) that is not included. To more fully understand this, I recommend reading this answer, and perhaps this answer as well.
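For illustration, a sketch of the case where deduction does work (C++14 or later): the body is defined inside the class, so every file that includes the class definition sees it before any call. The concatenation in the body is only a placeholder for the real implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

class Foo
{
public:
    // Deduced return type: legal here because the body is visible to every
    // translation unit that includes this class definition.
    auto barResults(const std::vector<std::string>& list1,
                    const std::vector<std::string>& list2)
    {
        std::vector<std::string> results(list1);  // placeholder logic
        results.insert(results.end(), list2.begin(), list2.end());
        return results;  // return type deduced as std::vector<std::string>
    }
};
```

The cost is that the implementation now lives in the header, which is usually a worse trade than a slightly long signature.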
To address the titular question, likely the easiest way to shorten that signature is with a typedef. More specifically, you can add the following code wherever you see appropriate, provided the scoping is appropriate (I would add it as a public member in your class):
typedef std::vector<std::string> strvec;
This allows you to re-write your method signature as the much more manageable:
strvec barResults(const strvec&, const strvec&);
If you are sticking to C++11, you can't rely on deduced return types and need a trailing return type (C++14 does allow deduced return types). As there is nothing special about the trailing return type in your case, i.e., no expression is passed to decltype to determine the return type, I would instead shorten the method signature with a type alias:
class Foo
{
public:
using StrVec = std::vector<std::string>;
StrVec barResults(const StrVec&, const StrVec&);
};
Foo::StrVec Foo::barResults(const StrVec& list1, const StrVec& list2)
{
StrVec results;
// small amount of implementation here...
return results;
}
If you are just looking for a visually appealing way to deal with longer signatures, stop forcing everything to be on one line. Be liberal with your vertical spacing and insert line breaks. The signature contains quality information for the caller and what you may be looking for is a multi-line signature. If you are working with 80 character page widths, reconsider the indentation on the access specifier.
class Foo
{
public:
std::vector<std::string> barResults(const std::vector<std::string>&,
const std::vector<std::string>&);
};
std::vector<std::string>
Foo::barResults(const std::vector<std::string>& list1,
const std::vector<std::string>& list2)
{
std::vector<std::string> results;
// small amount of implementation here...
return results;
}
There are many styles when it comes to splitting up a declaration. Having a tool like clang-format in your toolset will do this automatically and consistently for you.

const correctness for configuration structures

I have a configuration file which gets read in, parsed and put into structures at the beginning of my program's run time.
The problem I am having is that I want these structures to be constant, since the values in them should not change during the program's lifetime.
Currently I am doing the following:
config.h
#pragma warning(push)
#pragma warning(disable: 4510) /*-- we don't want a default constructor --*/
#pragma warning(disable: 4610) /*-- we don't want this to ever be user instantiated --*/
typedef struct SerialNode {
private:
void operator=(SerialNode&);
public:
const char* const port;
const char* const format;
} SerialNode;
#pragma warning(pop)
typedef std::map<const char*, const SerialNode*, MapStrComp> SerialMap;
SerialMap SerialConfig;
config.cpp
/*-- so we don't fall out of scope --*/
SerialNode* global_sn;
SerialNode local_sn = {port, format};
global_sn = new SerialNode(local_sn);
SerialConfig[key_store] = global_sn;
This works fine. However my problem is that now I am dealing with more complicated configuration data which requires me to pull a structure back out of the list, modify it and then put it back.
Obviously I can't modify it, so the solution would be something like:
SerialNode* global_sn;
const SerialNode* old_sn = SerialConfig[key_store];
SerialNode local_sn = {port, format, old_sn->old_data, old_sn->more_old_data};
global_sn = new SerialNode(local_sn);
SerialConfig[key_store] = global_sn;
delete old_sn;
But this strikes me as bad programming practice. Is there a better way to achieve what I'm going for that doesn't require such a hacked-looking solution?
For reference, I'm using Visual Studio 2010.
As always, the best thing you can do is not re-implement something that has already been written. There are a large number of libraries and frameworks that will help with serialization for C++:
Boost Serialization
Qt
Protocol Buffers
msgpack
Cap'n Proto
Ideally the serialization framework you choose will exactly recreate the data graph that you are trying to store. Regardless of whether you have done any fixup, your goal will likely be to only provide const access to the global configuration data. Just make sure that mutators (including non const pointers) are not exposed via a header file.
The simple answer is what Thomas suggests, but done correctly (that is, without causing undefined behavior):
Create a mutable configuration object but pass it to the rest of the components by constant reference. When you create (and where you maintain) the real object you can change it, but the rest of the application won't be able to modify the config. A common pattern I have used in the past was:
class SomeObject {
Configuration const & config;
public:
SomeObject(Configuration const & config) : config(config) {}
void f() {
if (config.someParam()) { /* ... */ }
// ...
}
};
void loadConfiguration(Configuration & config) { /* ... */ }
int main() {
Configuration config;
loadConfiguration(config); // config is a non-const &, can modify
SomeObject object(config); // object holds a const&, can only read
object.f();
// ...
}
This is not an answer to your question, just some observations to your code.
You don't need the typedef struct SerialNode { ... } SerialNode;, this is a C idiom. In C++, you just write struct SerialNode { ... }; and use SerialNode as a type name.
If you want to prevent a default constructor, make it private as you already do with the assignment operator:
class SerialNode {
private:
SerialNode();
SerialNode &operator=(SerialNode&);
...
};
Don't use char* members, use std::string instead. C++ strings are much easier and safer to use than plain char pointers and the associated heap allocation.
Same goes for the map key; if you use std::string as a key, you don't need MapStrComp anymore, because std::string already provides an appropriate comparison.
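A minimal sketch of those two points combined, written without C++11 features so it should also build on Visual Studio 2010. The loader loadSerialConfig and the sample values ("gps", "COM1", "8N1") are hypothetical; a real loader would parse the configuration file.

```cpp
#include <cassert>
#include <map>
#include <string>

struct SerialNode {
    std::string port;
    std::string format;
};

// std::string keys are ordered by content, so no custom comparator is needed.
typedef std::map<std::string, SerialNode> SerialMap;

// Hypothetical loader: builds and freely modifies the map, then hands it out.
inline SerialMap loadSerialConfig() {
    SerialMap m;
    SerialNode node = {"COM1", "8N1"};
    m["gps"] = node;
    m["gps"].format = "7E1";  // entries are still mutable while loading
    return m;
}
```

Storing the result as const SerialMap config = loadSerialConfig(); gives the rest of the program read-only access, and there is no new/delete bookkeeping because the map owns its nodes by value.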
Probably nicer is to wrap the whole thing in a singleton class:
class Config {
public:
static Config const& get() { return *config; }
static void load();
// const, so it can be called through the const& returned by get()
SerialNode const* operator[](const char*) const;
private:
static Config* config;
SerialMap map;
};
Config* Config::config = nullptr;
void Config::load() {
config = new Config();
// put things into it
}
Disclaimer: not tested, and haven't used C++ in a while, so there might be some syntax errors :)

How to read through types of a struct in C/C++

I am trying to find the "types" of the variables in different structs and be able to read them. (Keep in mind this is pseudocode.)
For Example:
#include "stream.h" //a custom stream reader class I made
typedef unsigned char BYTE;
/***** SERIES OF DIFFERENT STRUCTS ******/
struct asset
{
char *name;
int size;
BYTE *data;
};
struct asset2
{
char *lang;
char *entry;
};
/*****************************************/
void readAsset( Enumerable<struct> &istruct)
{
foreach( object o in istruct )
{
switch( o )
{
case int:
&o = _stream->ReadInt32();
break;
case char:
&o = _stream->ReadChar();
break;
case *:
&o = _stream->ReadInt32();
break;
default: break;
}
}
}
I want it to be able to do the following:
asset a1;
asset2 a2;
readAsset( a1 );
readAsset( a2 );
and pass all the info from the file to a1 and a2.
I was wondering if there is a way in C/C++ to get the type of each member of a struct and then read based on that. Is it possible with complex enums? Sorry for the bad code, but I wanted to make it easier to understand what I'm trying to do.
Additional Info:
_stream is a pointer to a Stream class I made, similar to StreamReader in .NET. It reads data from a file and advances its position by the size of the data read.
I'll be happy to re-phrase if you don't understand what I'm asking.
There is no way to iterate through the members of a structure without listing them all out.
You can iterate through something like a structure at compile time using ::std::tuple in C++11.
You also can't really switch on type in that fashion. You can do it, but the way you do it is to have several functions with the same name that each take a different parameter type. Something like:
void doRead(StreamType &stream, int &data)
{
data = stream.readInt32();
}
void doRead(StreamType &stream, char &data)
{
data = stream.readChar();
}
// etc...
Then you just call doRead with your structure member and poof the compiler magically picks the right one based on the type.
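A small usage sketch of that overload approach. The StreamType below is a hypothetical in-memory stand-in for the asker's stream class (little-endian byte order is an assumption), and Header is an invented struct used only to show the calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the asker's Stream class: reads from a byte buffer.
class StreamType {
public:
    explicit StreamType(const std::vector<unsigned char>& bytes) : bytes_(bytes), pos_(0) {}
    std::int32_t readInt32() {  // little-endian, assumed for this sketch
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(bytes_.at(pos_++)) << (8 * i);
        return static_cast<std::int32_t>(v);
    }
    char readChar() { return static_cast<char>(bytes_.at(pos_++)); }
private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_;
};

void doRead(StreamType& stream, int& data) { data = stream.readInt32(); }
void doRead(StreamType& stream, char& data) { data = stream.readChar(); }

struct Header {
    int size;
    char kind;
};

// Members are still listed by hand, but each call picks its overload by type.
void readHeader(StreamType& stream, Header& h) {
    doRead(stream, h.size);
    doRead(stream, h.kind);
}
```

If a member's type changes later, say from char to int, the matching doRead is selected automatically and readHeader does not need to be touched.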
In C++, the way to solve the problem you're solving here is a serialization library. If you have control of both the format written and the format read, you can use something like protobuf or boost::serialization to do this relatively easily without having to write a lot of your own code.
Additionally, a note on your code: avoid a leading _ character in identifiers. Identifiers that begin with an underscore followed by an uppercase letter, or that contain a double underscore, are reserved for the compiler and standard library implementation everywhere, and any identifier with a leading underscore is reserved in the global namespace. Many compilers have compiler-specific keywords and language extensions that start with an _ character. Using such identifiers may result in your code mysteriously failing to compile with all kinds of strange, inscrutable errors in some environments.
You can get something like a struct that is enumerable at compile time. But it's ugly:
#include <tuple>
#include <string>
#include <vector>
#include <type_traits>
class asset : public ::std::tuple< ::std::string, ::std::vector<BYTE> >
{
public:
::std::string &name() { return ::std::get<0>(*this); }
const ::std::string &name() const { return ::std::get<0>(*this); }
::std::vector<BYTE> &data() { return ::std::get<1>(*this); }
const ::std::vector<BYTE> &data() const { return ::std::get<1>(*this); }
};
void writeToStream(Stream *out, const ::std::string &field)
{
out->writeString(field);
}
void writeToStream(Stream *out, const ::std::vector<BYTE> &field)
{
out->writeInt(field.size());
out->writeRaw(field.data(), field.size());
}
template <unsigned int fnum, typename... T>
typename ::std::enable_if< (fnum < sizeof...(T)), void >::type
writeToStream_n(Stream *out, const ::std::tuple<T...> &field)
{
writeToStream(out, ::std::get<fnum>(field));
writeToStream_n<fnum+1, T...>(out, field);
}
template <unsigned int fnum, typename... T>
typename ::std::enable_if< (fnum >= sizeof...(T)) >::type
writeToStream_n(Stream *, const ::std::tuple<T...> &)
{
}
template <typename... Tp>
void writeToStream(Stream *out, const ::std::tuple<Tp...> &composite)
{
writeToStream_n<0, Tp...>(out, composite);
}
void foo(Stream *out, const asset &a)
{
writeToStream(out, a);
}
Notice that there is no explicit writeToStream for the asset type. The compiler generates it at compile time by unpacking the ::std::tuple it's derived from and writing out each individual field.
Also, if you have bare pointers, you're writing poor C++. Please write idiomatic, good C++ if you're going to write C++. This whole thing you want to do with runtime reflection is just not the way to do things.
That's the reason I converted your char *name to a ::std::string and the size-delimited BYTE array represented by your size and data fields into a ::std::vector. Using those types is the idiomatically correct way to write C++; using bare pointers the way you were is not. Additionally, two fields with strongly related values (data and size) that have no behavior or any other indication that they're associated would make it hard even for a compiler that does introspection at runtime to figure out the right thing to do. It can't know how big the BYTE array pointed to by data is, and it can't know about your decision to encode this in size.
What you're asking for is something called Reflection - which is:
the ability of a computer program to examine
and modify the structure and behavior (specifically the values,
meta-data, properties and functions) of an object at runtime.
C++ doesn't have that "natively".
What I mean is - there have been some attempts at introducing some aspects of it - with varied degrees of success - which have produced some aspects of Reflection, but not "full" Reflection itself as you will get in a language like Ruby or so.
However, if you are feeling adventurous, you can try a Reflection library called Reflectabit:
To see whether it might be worthwhile (which it might be, considering your code), see the article below, which has quite a few examples of how to use the API:
http://www.altdevblogaday.com/2011/09/25/reflection-in-c-part-1-introduction/
Good luck!
The usual pattern in C++ is not to try and figure out what the members of the type are, but rather provide an operator, implemented by the implementor of the type, that is able to serialize/deserialize to disk.
You can take a look at, for example, the Boost.Serialization library (boost::serialization). The usage is not too complex: you provide a function that lists your members in some order, and the library takes it from there and implements serialization to different formats.
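The shape of that pattern can be sketched without any library. TextWriter below is a hypothetical stand-in for a library archive, not Boost's API, and the one-argument serialize is a simplification (Boost.Serialization's own serialize function also takes a version number). The asker's asset2 is shown with std::string members instead of char*.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a library archive: writes each field on its own line.
class TextWriter {
public:
    template <typename T>
    TextWriter& operator&(const T& value) {
        out_ << value << '\n';
        return *this;
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

struct asset2 {
    std::string lang;
    std::string entry;

    // The type itself lists its members, once, in a fixed order.
    template <typename Archive>
    void serialize(Archive& ar) {
        ar & lang & entry;
    }
};
```

A reading archive would define the same operator& to fill the fields instead of writing them, which is why a single member list can serve both directions.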

How can I reduce allocation for lookup in C++ map/unordered_map containers?

Suppose I am using std::unordered_map<std::string, Foo> in my code. It's nice and convenient, but unfortunately every time I want to do a lookup (find()) in this map I have to come up with an instance of std::string.
For instance, let's say I'm tokenizing some other string and want to call find() on every token. This forces me to construct an std::string around every token before looking it up, which requires an allocator (std::allocator, which amounts to a CRT malloc()). This can easily be slower than the actual lookup itself. It also contends with other threads since heap management requires some form of synchronization.
A few years ago I found the Boost.intrusive library; it was just a beta version back then. The interesting thing was it had a container called boost::intrusive::iunordered_set which allowed code to perform lookups with any user-supplied type.
I'll explain it how I'd like it to work:
struct immutable_string
{
const char *pf, *pl;
struct equals
{
bool operator()(const std::string& left, const immutable_string& right) const
{
if (left.length() != static_cast<size_t>(right.pl - right.pf))
return false;
return std::equal(right.pf, right.pl, left.begin());
}
};
struct hasher
{
size_t operator()(const immutable_string& s) const
{
return boost::hash_range(s.pf, s.pl);
}
};
};
struct string_hasher
{
size_t operator()(const std::string& s) const
{
return boost::hash_range(s.begin(), s.end());
}
};
std::unordered_map<std::string, Foo, string_hasher> m;
m["abc"] = Foo(123);
immutable_string token; // token refers to a substring inside some other string
auto it = m.find(token, immutable_string::equals(), immutable_string::hasher());
Another thing would be to speed up the "find and insert if not found" use case; the trick with lower_bound() only works for ordered containers. The intrusive container has methods called insert_check() and insert_commit(), but that's for a separate topic I guess.
Turns out boost::unordered_map (as of 1.42) has a find overload that takes CompatibleKey, CompatibleHash, CompatiblePredicate types, so it can do exactly what I asked for here.
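For readers limited to the standard library, a related technique (not the Boost overload described above) is heterogeneous lookup. The sketch below assumes C++17 and uses the ordered std::map with the transparent comparator std::less<>; the helper name lookup is hypothetical. Equivalent support for std::unordered_map arrived only in C++20 and needs a transparent hash and equality functor.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

struct Foo {
    int value;
};

// std::less<> is a transparent comparator, so find() accepts any type that
// can be compared with std::string, including std::string_view.
using FooMap = std::map<std::string, Foo, std::less<>>;

// Looks up a token without constructing a temporary std::string.
inline const Foo* lookup(const FooMap& m, std::string_view token) {
    auto it = m.find(token);
    return it == m.end() ? nullptr : &it->second;
}
```

While tokenizing, each token can be passed as a std::string_view into the source buffer, so the lookup itself performs no heap allocation.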
When it comes to lexing, I personally use two simple tricks:
I use StringRef (similar to LLVM's) which just wraps a char const* and a size_t and provides string-like operations (only const operations, obviously)
I pool the encountered strings using a bump allocator (using lumps of say 4K)
The two combined are quite efficient, though one needs to understand that every StringRef pointing into the pool is invalidated as soon as the pool is destroyed.
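A rough sketch of the two tricks together, with the caveat that the names StringRef::equals, StringPool, and copy are made up for this example and are not LLVM's API. Giving oversized strings their own chunk is also just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Non-owning view of characters; valid only while the pool that holds them lives.
struct StringRef {
    const char* data;
    std::size_t size;
    bool equals(const char* s) const {
        return std::strlen(s) == size && std::memcmp(data, s, size) == 0;
    }
};

// Bump allocator: hands out consecutive bytes from fixed-size chunks and
// releases everything at once when the pool is destroyed.
class StringPool {
public:
    explicit StringPool(std::size_t chunkSize = 4096)
        : chunkSize_(chunkSize), remaining_(0), next_(nullptr) {}

    // Copies n characters into the pool and returns a view of the copy.
    StringRef copy(const char* s, std::size_t n) {
        if (next_ == nullptr || n > remaining_) {
            // Start a new chunk; an oversized string gets a chunk of its own size.
            std::size_t cap = n > chunkSize_ ? n : chunkSize_;
            chunks_.push_back(std::unique_ptr<char[]>(new char[cap]));
            next_ = chunks_.back().get();
            remaining_ = cap;
        }
        std::memcpy(next_, s, n);
        StringRef ref = {next_, n};
        next_ += n;       // "bump" the pointer past the stored characters
        remaining_ -= n;
        return ref;
    }

private:
    std::size_t chunkSize_;
    std::size_t remaining_;
    char* next_;
    std::vector<std::unique_ptr<char[]>> chunks_;
};
```

Each copy costs one memcpy and a pointer increment in the common case, and the heap is touched only once per chunk rather than once per string.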