Memory-efficient C++ strings (interning, ropes, copy-on-write, etc) [closed] - c++

My application is having memory problems, including copying lots of strings about, using the same strings as keys in lots of hashtables, etc. I'm looking for a base class for my strings that makes this very efficient.
I'm hoping for:
String interning (multiple strings of the same value use the same memory),
copy-on-write (I think this comes for free in nearly all std::string implementations),
something with ropes would be a bonus (for O(1)-ish concatenation).
My platform is g++ on Linux (but that is unlikely to matter).
Do you know of such a library?
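On the rope point: since the platform is g++ on Linux, it may be worth noting that libstdc++ still ships the old SGI rope extension in <ext/rope>, which gives cheap concatenation of large strings. A minimal, non-portable sketch:

#include <ext/rope>   // non-standard SGI extension shipped with libstdc++
#include <iostream>

int main() {
    // crope is __gnu_cxx::rope<char>; concatenation shares the underlying
    // tree nodes instead of copying the character data.
    __gnu_cxx::crope a("a long chunk of text ...");
    __gnu_cxx::crope b("... and another long chunk");
    __gnu_cxx::crope c = a + b;
    std::cout << c.c_str() << '\n';   // c_str() flattens the rope for output
}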

copy-on-write (I think this comes for free in nearly all std::string implementations)
I don't believe this is the case any longer. Copy-on-write causes problems when you modify strings through iterators: it either produces unwanted results (i.e. no copy, and both strings get modified) or unnecessary overhead (since the iterators cannot be implemented purely in terms of pointers: they need to perform additional checks when being dereferenced). In fact, the C++11 iterator and reference invalidation rules effectively rule out copy-on-write implementations of std::string altogether.
Additionally, all modern C++ compilers perform NRVO and eliminate the need to copy returned strings in most cases. Since returning strings by value was one of the most common motivations for copy-on-write, implementations have dropped it because of the downsides above.
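To make the overhead concrete, here is a toy, hand-rolled copy-on-write sketch (not how any real std::string is implemented): every non-const access has to assume a write is coming and detach, which is exactly the hidden cost described above.

#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string: the buffer is shared until a non-const access.
class CowString {
    std::shared_ptr<std::string> data_;
public:
    explicit CowString(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    // Read-only access is cheap: the buffer stays shared.
    char operator[](std::size_t i) const { return (*data_)[i]; }

    // Any potentially mutating access must detach (copy) first, because we
    // cannot know whether the caller will actually write through the reference.
    char& operator[](std::size_t i) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);   // the hidden copy
        return (*data_)[i];
    }
};

int main() {
    CowString a("hello");
    CowString b = a;    // cheap: both objects share one buffer
    b[0] = 'H';         // b detaches here; a is untouched
}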

If most of your strings are immutable, the Boost Flyweight library might suit your needs.
It will do the string interning, but I don't believe it does copy-on-write.
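A minimal sketch of what that interning looks like, assuming Boost is available (interned_string is just a local alias): boost::flyweight<std::string> keeps one shared copy per distinct value and hands out small handles.

#include <boost/flyweight.hpp>
#include <cassert>
#include <string>

typedef boost::flyweight<std::string> interned_string;

int main() {
    interned_string a("some frequently repeated key");
    interned_string b("some frequently repeated key");

    // Both handles refer to the same shared std::string instance.
    assert(&a.get() == &b.get());

    // Equality is a cheap handle comparison, which also makes these
    // attractive as hash-table keys.
    assert(a == b);
}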

Andrei Alexandrescu's 'Policy Based basic_string implementation' may help.

Take a look at The Better String Library from the legendary Paul Hsieh.

Related

Choice between using std::string and character array [closed]

I've read that performance-wise character arrays are better/faster than std::string. But personally I find using std::string much easier.
I'm currently writing some database APIs which will be fetching/inserting data into the database. In these APIs I want to use std::string, but I'm not sure how much of a performance penalty I will pay for that choice. My APIs will query the database, so network I/O will be involved.
Is the performance penalty much smaller than the network latency (~10 ms)? Because in that case, I would happily use std::string.
As with nearly all performance questions, the answer is to measure. Modern std::string implementations are very likely not going to be the bottleneck on inserting data into a database. Until you have profiling data that suggests that they are, you're probably best off not worrying about it.
You asked:
Is the performance penalty much smaller than the network latency (~10 ms)? Because in that case, I would happily use std::string.
The blunt answer is Yes.
A quick comparison of const char* vs. std::string:
const char* pros:
uses a little less space, since it doesn't store the size of the string. That is about the only performance advantage I can come up with.
std::string pros:
stores the size of the string, which is generally better since it means not having to scan through the string to find its size, etc.
(and to avoid copies, pass const std::string&, as in the sketch below)
std::string is basically just that: a char pointer plus a size_t (the size/length of the string), if you ignore what is called the "small string optimization" (another advantage; look it up yourself).
So I wouldn't worry about performance (if everything is handled properly). My advice: stop worrying about performance, do some testing and profiling, and see what shows up in the profiler. That being said, knowing how 'stuff' works and performs is a good thing.
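As a small illustration of the const std::string& advice above (log_query is a made-up function, loosely themed on the database use case):

#include <iostream>
#include <string>

// Passing by const reference: no copy is made, the caller's buffer is reused,
// and the stored size is available without scanning the string.
void log_query(const std::string& sql) {
    std::cout << "executing: " << sql << " (" << sql.size() << " bytes)\n";
}

int main() {
    std::string q = "SELECT name FROM users WHERE id = 42";
    log_query(q);         // no copy of q
    log_query("BEGIN");   // a temporary std::string is built from the literal
}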

What are allocators and when is their use necessary? [closed]

While reading books on C++ and the standard library, I see frequent references to allocators.
For example, Nicolai Josuttis's The C++ Standard Library discusses them in detail in the last chapter, and both items 10 ("be aware of allocators' conventions & restrictions") and 11 ("understand the legitimate uses of custom allocators") in Scott Meyers's Effective STL are about their use.
My question is, how do allocators represent a special memory model? Is the default STL memory management not enough? When should allocators be used instead?
If possible, please explain with a simple memory model example.
An allocator abstracts allocating raw memory, and constructing/destroying objects in that memory.
In most cases, the default Allocator is perfectly fine. In some cases, however, you can increase efficiency by replacing it with something else. The classic example is when you need/want to allocate a large number of very small objects. Consider, for example, a vector of strings that might each be only a dozen bytes or so. The normal allocator uses operator new, which might impose pretty high overhead for such small objects. Creating a custom allocator that allocates a larger chunk of memory, then sub-divides it as needed can save quite a bit of both memory and time.
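A minimal sketch of that idea, using a C++11-style allocator; the Arena class and all the names are made up for illustration, and alignment is ignored for brevity. Everything is carved out of one big block, deallocation is a no-op, and the whole block is released when the arena goes away.

#include <cstddef>
#include <new>
#include <vector>

// One big block of memory handed out in small pieces and released all at once
// when the arena is destroyed. Alignment is ignored for brevity.
class Arena {
    std::vector<char> buf_;
    std::size_t used_ = 0;
public:
    explicit Arena(std::size_t bytes) : buf_(bytes) {}
    void* allocate(std::size_t n) {
        if (used_ + n > buf_.size()) throw std::bad_alloc();
        void* p = buf_.data() + used_;
        used_ += n;
        return p;
    }
};

// Minimal C++11 allocator drawing from an Arena; deallocate() is a no-op
// because everything is freed in bulk when the Arena goes away.
template <class T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena* a) : arena(a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T)));
    }
    void deallocate(T*, std::size_t) {}
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}

int main() {
    Arena arena(1 << 20);                            // 1 MiB up front
    ArenaAllocator<int> alloc(&arena);
    std::vector<int, ArenaAllocator<int>> v(alloc);
    for (int i = 0; i < 1000; ++i) v.push_back(i);   // element storage comes from the arena
}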

Linux C or C++ library to diff and patch strings? [closed]

Possible Duplicate:
Is there a way to diff files from C++?
I have long text strings that I wish to diff and patch. That is, given strings a and b:
string a = ...;
string b = ...;
string a_diff_b = create_patch(a,b);
string a2 = apply_patch(a_diff_b, b);
assert(a == a2);
If a_diff_b were human readable, that would be a bonus.
One way to implement this would be to use system(3) to call the diff and patch shell commands from diffutils and pipe them the strings. Another way would be to implement the functions myself (I was thinking of treating each line atomically and using the standard quadratic edit-distance algorithm line-wise, with backtracking).
I was wondering if anyone knows of a good Linux C or C++ library that would do the job in-process?
You could Google implementations of the Myers diff algorithm ("An O(ND) Difference Algorithm and Its Variations") or libraries that solve the "longest common subsequence" problem.
As far as I know, the situation with diff/patch libraries in C++ isn't good: there are several (including diff match patch and libmba), but in my experience they are either somewhat poorly documented, have heavy external dependencies (diff match patch requires Qt 4, for example), are specialized on a type you don't need (std::string when you need Unicode, for example), aren't generic enough, or use a generic algorithm with very high memory requirements (O((M+N)^2), where M and N are the lengths of the input sequences).
You could also try to implement the Myers algorithm (O(N+M) memory requirements) yourself, but it is quite difficult to understand: expect to spend at least a week reading the paper and related write-ups. A somewhat human-readable explanation of the Myers algorithm is available here.
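If you do decide to roll your own, a line-based diff built on the quadratic LCS dynamic program (not Myers) is far easier to get right, at the cost of O(N*M) time and memory. A rough, self-contained sketch that only prints a human-readable listing; a real create_patch/apply_patch pair would record the same edits in a machine-readable form instead.

#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split a blob of text into lines, the unit of comparison.
static std::vector<std::string> split_lines(const std::string& s) {
    std::vector<std::string> out;
    std::istringstream in(s);
    std::string line;
    while (std::getline(in, line)) out.push_back(line);
    return out;
}

// Naive O(N*M) line diff via the longest-common-subsequence table.
void print_line_diff(const std::string& a_text, const std::string& b_text) {
    std::vector<std::string> a = split_lines(a_text), b = split_lines(b_text);
    std::size_t n = a.size(), m = b.size();

    // lcs[i][j] = length of the LCS of a[i..] and b[j..]
    std::vector<std::vector<std::size_t>> lcs(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = n; i-- > 0; )
        for (std::size_t j = m; j-- > 0; )
            lcs[i][j] = (a[i] == b[j]) ? lcs[i + 1][j + 1] + 1
                                       : std::max(lcs[i + 1][j], lcs[i][j + 1]);

    // Walk the table to emit kept (' '), deleted ('-') and inserted ('+') lines.
    std::size_t i = 0, j = 0;
    while (i < n && j < m) {
        if (a[i] == b[j]) { std::cout << "  " << a[i] << '\n'; ++i; ++j; }
        else if (lcs[i + 1][j] >= lcs[i][j + 1]) { std::cout << "- " << a[i] << '\n'; ++i; }
        else { std::cout << "+ " << b[j] << '\n'; ++j; }
    }
    while (i < n) std::cout << "- " << a[i++] << '\n';
    while (j < m) std::cout << "+ " << b[j++] << '\n';
}

int main() {
    print_line_diff("alpha\nbeta\ngamma\n", "alpha\ngamma\ndelta\n");
}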
I believe that https://github.com/cubicdaiya/dtl/wiki/Tutorial may have what you need.
http://code.google.com/p/google-diff-match-patch/
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.
Currently available in Java, JavaScript, Dart, C++, C#, Objective C, Lua and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.

Looking for one linear algebra library for embedded systems (without malloc and free) [closed]

I work with microcontrollers. The RTOSs that I employ in my applications do not provide free and malloc (or other calls like assert); sometimes they are available, but I prefer to have everything static in my system.
I have started to use linear algebra, but most libraries need dynamic memory. My matrices are dense and 'small' (no more than 10x10).
I really like Eigen (everything can be decided statically at compile time), but apparently there is a bug where it calls asserts that are not provided by my RTOS (even with -DNDEBUG). The library should provide matrix decomposition routines (QR, Cholesky, LU, ...).
I would prefer C over C++. Any suggestions?
Many thanks in advance!
Anything wrong with CLAPACK? Or even straight Fortran LAPACK (you can compile it with gfortran, which is part of gcc).
[C]LAPACK's routines take all memory buffers in their arguments as already allocated, and do not do themselves any heap allocation whatsoever. For the routines that take "work" buffers in addition to the other arguments (for example, dgesdd for computing an SVD), you can usually call them with special "size only" argument, and get back in response the required size for the work buffers, which you can then allocate however you wish.
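For the routines that support it, the "size only" call is the lwork = -1 convention. A sketch of what that looks like from C++ for dgeqrf (QR factorization); the trailing-underscore symbol name is the usual gfortran convention but may differ with other toolchains, and all buffers are static to match the no-malloc constraint.

#include <algorithm>
#include <cstdio>

// Fortran LAPACK symbol; link against -llapack or CLAPACK.
extern "C" void dgeqrf_(const int* m, const int* n, double* a, const int* lda,
                        double* tau, double* work, const int* lwork, int* info);

int main() {
    // Small dense matrix in column-major (Fortran) order, all storage static.
    const int m = 4, n = 4, lda = 4;
    static double a[lda * n] = {
        4, 2, 0, 1,
        2, 5, 1, 0,
        0, 1, 3, 2,
        1, 0, 2, 6,
    };
    static double tau[4];
    static double work[64];   // generous fixed workspace for matrices up to ~10x10
    int info = 0;

    // 1. Workspace query: lwork = -1 asks LAPACK to report the optimal size
    //    in work[0] without performing the factorization.
    int lwork = -1;
    dgeqrf_(&m, &n, a, &lda, tau, work, &lwork, &info);
    std::printf("optimal lwork = %d\n", static_cast<int>(work[0]));

    // 2. Actual QR factorization using the fixed buffer; any lwork >= n works,
    //    the optimal value just enables the blocked code path.
    lwork = std::min(static_cast<int>(work[0]), 64);
    dgeqrf_(&m, &n, a, &lda, tau, work, &lwork, &info);
    std::printf("dgeqrf info = %d\n", info);   // 0 means success
}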
Redefining the assert macro seems a good solution.
But you can even provide your own malloc and free implementation, or statically link against an appropriate memory management library:
http://blog.reverberate.org/2009/02/one-malloc-to-rule-them-all.html
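For the Eigen assert problem specifically: Eigen routes its internal checks through the eigen_assert macro, which you can redefine before including any Eigen header. In the sketch below, my_rtos_fatal is a made-up stand-in for whatever error hook your RTOS actually provides.

#include <cstdio>
#include <cstdlib>

// Stand-in for whatever fatal-error hook the RTOS actually provides.
static void my_rtos_fatal(const char* msg) { std::puts(msg); std::abort(); }

// Must be defined before the first Eigen include so Eigen picks it up
// instead of falling back to the standard assert().
#define eigen_assert(x) \
    ((x) ? static_cast<void>(0) : my_rtos_fatal("Eigen assertion failed: " #x))

#include <Eigen/Dense>

int main() {
    // Fixed-size matrix types keep all storage on the stack: no malloc/free.
    Eigen::Matrix<double, 4, 4> A = Eigen::Matrix<double, 4, 4>::Random();
    Eigen::Matrix<double, 4, 1> b = Eigen::Matrix<double, 4, 1>::Random();
    Eigen::Matrix<double, 4, 1> x = A.fullPivLu().solve(b);
    (void)x;   // suppress unused-variable warnings
}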
If Q16.16 fixed point math is fine for your application, libfixmatrix can be an option:
https://github.com/PetteriAimonen/libfixmatrix

Efficiency of C++ built-ins [closed]

I am fairly new to C++, having much more C experience.
I am writing a program that will use the string class, and began to wonder about the efficiency of the "length()" method.
I realized though that I didn't have a good answer to this question, and so was wondering if the answer to this and similar questions exist somewhere. While I am more than capable of determining the runtime of my own code, I'm at a bit of a loss when it comes to provided code, and so I find I can't accurately judge the efficiency of my programs.
Is there C++ documentation (online, or in "man" format) that includes information on the runtime of provided code?
Edit: I'm interested in this in general, not just string::length.
At present, the time complexity of size() for all STL containers is underspecified. There's an open C++ defect report about it.
The present ISO C++ standard says that STL containers should have size() of constant complexity:
21.3[lib.basic.string]/2
The class template basic_string conforms to the requirements of a Sequence, as specified in (23.1.1). Additionally, because the iterators supported by basic_string are random access iterators (24.1.5), basic_string conforms to the requirements of a Reversible Container, as specified in (23.1).
23.1[lib.container.requirements]/5
Expression: a.size()
Complexity: (Note A)
Those entries marked "(Note A)" should have constant complexity.
However, "should" is not a binding requirement in the Standard parlance; indeed, the above applies to std::list as well, but in practice some implementations (notably g++) have O(N) std::list::size().
The only thing that can be guaranteed is that (end() - begin()) for a string is (possibly amortized) O(1). This is because string iterators are guaranteed to be random-access, and random-access iterators are guaranteed to have constant time operator-.
As a more practical issue, for all existing C++ implementations out there, the following holds:
std::string::size() is O(1)
std::vector::size() is O(1)
This is fairly obvious, as both strings and vectors are most efficiently implemented as contiguous arrays with a separately stored size: contiguous because that gives the fastest element access while satisfying all the other complexity requirements, and with a stored size because the container requirements demand that end() be constant time.
All of the implementations I've seen are O(1).
The documentation you're looking for is the C++ standard; I believe C++03 is the latest one at present. It isn't available online or in man format; it's sold commercially. There's a list of the places to find it, and recent prices, here.