Redefine data area for faster access - c++

New to c++. I've searched but probably using wrong terms.
I want to find which slot in an array of many slots a few bytes long literal value is stored. Currently check each slot sequentially.
If I can use an internal function to scan the whole array as if it was one big string, I feel this would be much faster. (Old COBOL programmer).
Any way I can do this please?

I want to find which slot in an array of many slots a few bytes long literal value is stored. Currently check each slot sequentially.
OK, I'm going to take a punt and infer that:
you want to store string literals of any length in some kind of container.
the container must be mutable (i.e. you can add literals at will)
there will not be duplicates in the container.
you want to know whether a string literal as been stored in the container previously, and what "position" it was at so that you can remove it if necessary.
the string literals will be inserted in random lexicographical order and need not be sorted.
The container that springs to mind is the std::unordered_set
#include <unordered_set>
std::unordered_set<std::string> tokens;
int main()
{
tokens.emplace("foo");
tokens.emplace("bar");
auto it = tokens.find("baz");
assert(it == tokens.end()); // not found
it = tokens.find("bar"); // will be found
assert(it != tokens.end());
tokens.erase(it); // remove the token
}
The search time complexity of this container is O(1).

As you already found out by the comments, "scanning as one big string" is not the way to go in C++.
Typical in C++ when using C-style arrays and normally fast enough for linear search is
auto myStr = "result";
auto it = std::find_if(std::begin(arr), std::end(arr),
[myStr](const char* const str) { return std::strcmp(mystr,str) == 0; });
Remember that string compare function stop at the first wrong character.
More C++ style:
std::vector<std::string> vec = { "res1", "res2", "res3" };
std::string myStr = "res2";
auto it = std::find(vec.begin(), vec.end(), myStr);
If you are interested in very fast lookup for a large container, std::unordered_set is the way to go, but the "slot" has lost its meaning then, but maybe in that case std::unordered_map can be used.
std::unordered_set<std::string> s= { "res1", "res2", "res3" };
std::string myStr = "res2";
auto it = s.find(myStr);
All code is written as example, not compiled/tested

Related

Best way to group string members of object in a vector

I am trying to store a vector of objects and sort them by a string member possessed by each object. It doesn't need to be sorted alphabetically, it only needs to group every object with an identical string together in the vector.
IE reading through the vector and outputting the strings from beginning to end should return something like:
string_bulletSprite
string_bulletSprite
string_bulletSprite
string_playerSprite
string_enemySprite
string_enemySprite
But should NEVER return something like:
string_bulletSprite
string_playerSprite
string_bulletSprite
[etc.]
Currently I am using std:sort and a custom comparison function:
std::vector<GameObject*> worldVector;
[...]
std::sort(worldVector.begin(), worldVector.end(), compString);
And the comparison function used in the std::sort looks like this:
bool compString(GameObject* a, GameObject* b)
{
return a->getSpriteNameAndPath() < b->getSpriteNameAndPath();
}
getSpriteNameAndPath() is a simple accessor which returns a normal string.
This seems to work fine. I've stress tested this a fair bit and it seems to always group things together the way I wanted.
My question is, is this the ideal or most logical/efficient way of accomplishing the stated goal? I get the impression Sort isn't quite meant to be used this way and I'm wondering if there's a better way to do this if all I want to do is group but don't care about doing so in alphabetic order.
Or is this fine?
If you have lots of equivalent elements in your range, then std::sort is less efficient than manually sorting the elements.
You can do this by shifting the minimum elements to the beginning of the range, and then repeating this process on the remaining non-minimum elements
// given some range v
auto b = std::begin(v); // keeps track of remaining elements
while (b != std::end(v)) // while there's elements to be arranged
{
auto min = *std::min_element(b, std::end(v)); // find the minimum
// move elements matching that to the front
// and simultaneously update the remaining range
b = std::partition(b, std::end(v),
[=](auto const & i) {
return i == min;
});
}
Of course, a custom comparator can be passed to min_element, and the lambda in partition can be modified if equivalence is defined some other way.
Note that if you have very few equivalent elements, this method is much less efficient than using std::sort.
Here's a demo with a range of ints.
I hope I understood your question correctly, if so, I will give you a little example of std::map which is great for grouping things by keys, which will most probably be a std::string.
Please take a look:
class Sprite
{
public:
Sprite(/* args */)
{
}
~Sprite()
{
}
};
int main(int argc, char ** argv){
std::map <std::string, std::map<std::string, Sprite>> sprites;
std::map <std::string, Sprite> spaceships;
spaceships.insert(std::make_pair("executor", Sprite()));
spaceships.insert(std::make_pair("millennium Falcon", Sprite()));
spaceships.insert(std::make_pair("death star", Sprite()));
sprites.insert(std::make_pair("spaceships",spaceships));
std::cout << sprites["spaceships"]["executor"].~member_variable_or_function~() << std::endl;
return 0;
}
Seems like Functor or Lambda is the way to go for this particular program, but I realized some time after posting that I could just create an ID for the images and sort those instead of strings. Thanks for the help though, everyone!

Appending strings in C++ [duplicate]

This question already has answers here:
C++ equivalent of StringBuffer/StringBuilder?
(10 answers)
Closed 9 years ago.
Consider this piece of code:
public String joinWords(String[] words) {
String sentence = "";
for(String w : words) {
sentence = sentence + w;
}
return sentence;
}
On each concatenation a new copy of the string is created, so that the overall complexity is O(n^2). Fortunately in Java we could solve this with a StringBuffer, which has O(1) complexity for each append, then the overall complexity would be O(n).
While in C++, std::string::append() has complexity of O(n), and I'm not clear about the complexity of stringstream.
In C++, are there methods like those in StringBuffer with the same complexity?
C++ strings are mutable, and pretty much as dynamically sizable as a StringBuffer. Unlike its equivalent in Java, this code wouldn't create a new string each time; it just appends to the current one.
std::string joinWords(std::vector<std::string> const &words) {
std::string result;
for (auto &word : words) {
result += word;
}
return result;
}
This runs in linear time if you reserve the size you'll need beforehand. The question is whether looping over the vector to get sizes would be slower than letting the string auto-resize. That, i couldn't tell you. Time it. :)
If you don't want to use std::string itself for some reason (and you should consider it; it's a perfectly respectable class), C++ also has string streams.
#include <sstream>
...
std::string joinWords(std::vector<std::string> const &words) {
std::ostringstream oss;
for (auto &word : words) {
oss << word;
}
return oss.str();
}
It's probably not any more efficient than using std::string, but it's a bit more flexible in other cases -- you can stringify just about any primitive type with it, as well as any type that has specified an operator <<(ostream&, its_type&) override.
This is somewhat tangential to your Question, but relevant nonetheless. (And too big for a comment!!)
On each concatenation a new copy of the string is created, so that the overall complexity is O(n^2).
In Java, the complexity of s1.concat(s2) or s1 + s2 is O(M1 + M2) where M1 and M2 are the respective String lengths. Turning that into the complexity of a sequence of concatenations is difficult in general. However, if you assume N concatenations of Strings of length M, then the complexity is indeed O(M * N * N) which matches what you said in the Question.
Fortunately in Java we could solve this with a StringBuffer, which has O(1) complexity for each append, then the overall complexity would be O(n).
In the StringBuilder case, the amortized complexity of N calls to sb.append(s) for strings of size M is O(M*N). The key word here is amortized. When you append characters to a StringBuilder, the implementation may need to expand its internal array. But the expansion strategy is to double the array's size. And if you do the math, you will see that each character in the buffer is going to be copied on average one extra time during the entire sequence of append calls. So the complexity of the entire sequence of appends still works out as O(M*N) ... and, as it happens M*N is the final string length.
So your end result is correct, but your statement about the complexity of a single call to append is not correct. (I understand what you mean, but the way you say it is facially incorrect.)
Finally, I'd note that in Java you should use StringBuilder rather than StringBuffer unless you need the buffer to be thread-safe.
As an example of a really simple structure that has O(n) complexity in C++11:
template<typename TChar>
struct StringAppender {
std::vector<std::basic_string<TChar>> buff;
StringAppender& operator+=( std::basic_string<TChar> v ) {
buff.push_back(std::move(v));
return *this;
}
explicit operator std::basic_string<TChar>() {
std::basic_string<TChar> retval;
std::size_t total = 0;
for( auto&& s:buff )
total+=s.size();
retval.reserve(total+1);
for( auto&& s:buff )
retval += std::move(s);
return retval;
}
};
use:
StringAppender<char> append;
append += s1;
append += s2;
std::string s3 = append;
This takes O(n), where n is the number of characters.
Finally, if you know how long all of the strings are, just doing a reserve with enough room makes append or += take a total of O(n) time. But I agree that is awkward.
Use of std::move with the above StringAppender (ie, sa += std::move(s1)) will significantly increase performance for non-short strings (or using it with xvalues etc)
I do not know the complexity of std::ostringstream, but ostringstream is for pretty printing formatted output, or cases where high performance is not important. I mean, they aren't bad, and they may even out perform scripted/interpreted/bytecode languages, but if you are in a rush, you need something else.
As usual, you need to profile, because constant factors are important.
A rvalue-reference-to-this operator+ might also be a good one, but few compilers implement rvalue references to this.

Best way to concatenate and condense a std::vector<std::string>

Disclaimer: This problem is more of a theoretical, rather than a practical interest. I want to find out various different ways of doing this, with speed as icing on the new year cake.
The Problem
I want to be able to store a list of strings, and be able to quickly combine them into 1 if needed.
In short, I want to condense a structure (currently a std::vector<std::string>) that looks like
["Hello, ", "good ", "day ", " to", " you!"]
to
["Hello, good day to you!"]
Is there any idiomatic way to achieve this, ala python's [ ''.join(list_of_strings) ]?
What is the best way to achieve this in C++, in terms of time?
Possible Approaches
The first idea I had is to
loop over the vector,
append each element to the first,
simultaneously delete the element.
We will be concatenating with += and reserve(). I assume that max_size() will not be reached.
Approach 1 (The Greedy Approach)
So called because it ignores conventions and operates in-place.
#if APPROACH == 'G'
// Greedy Approach
void condense(std::vector< std::string >& my_strings, int total_characters_in_list)
{
// Reserve the size for all characters, less than max_size()
my_strings[0].reserve(total_characters_in_list);
// There are strings left, ...
for(auto itr = my_strings.begin()+1; itr != my_strings.end();)
{
// append, and...
my_strings[0] += *itr;
// delete, until...
itr = my_strings.erase(itr);
}
}
#endif
Now I know, you would say that this is risky and bad. So:
loop over the vector,
append each element to another std::string,
clear the vector and make the string first element of the vector.
Approach 2 (The "Safe" Haven)
So called because it does not modify the container while iterating over it.
#if APPROACH == 'H'
// Safe Haven Approach
void condense(std::vector< std::string >& my_strings, int total_characters_in_list)
{
// Store the whole vector here
std::string condensed_string;
condensed_string.reserve(total_characters_in_list);
// There are strings left...
for(auto itr = my_strings.begin(); itr != my_strings.end(); ++itr)
{
// append, until...
condensed_string += *itr;
}
// remove all elements except the first
my_strings.resize(1);
// and set it to condensed_string
my_strings[0] = condensed_string;
}
#endif
Now for the standard algorithms...
Using std::accumulate from <algorithm>
Approach 3 (The Idiom?)
So called simply because it is a one-liner.
#if APPROACH == 'A'
// Accumulate Approach
void condense(std::vector< std::string >& my_strings, int total_characters_in_list)
{
// Reserve the size for all characters, less than max_size()
my_strings[0].reserve(total_characters_in_list);
// Accumulate all the strings
my_strings[0] = std::accumulate(my_strings.begin(), my_strings.end(), std::string(""));
// And resize
my_strings.resize(1);
}
#endif
Why not try to store it all in a stream?
Using std::stringstream from <sstream>.
Approach 4 (Stream of Strings)
So called due to the analogy of C++'s streams with flow of water.
#if APPROACH == 'S'
// Stringstream Approach
void condense(std::vector< std::string >& my_strings, int) // you can remove the int
{
// Create out stream
std::stringstream buffer(my_strings[0]);
// There are strings left, ...
for(auto itr = my_strings.begin(); itr != my_strings.end(); ++itr)
{
// add until...
buffer << *itr;
}
// resize and assign
my_strings.resize(1);
my_strings[0] = buffer.str();
}
#endif
However, maybe we can use another container rather than std::vector?
In that case, what else?
(Possible) Approach 5 (The Great Indian "Rope" Trick)
I have heard about the rope data structure, but have no idea if (and how) it can be used here.
Benchmark and Verdict:
Ordered by their time efficiency (currently and surprisingly) is1:
Approaches Vector Size: 40 Vector Size: 1600 Vector Size: 64000
SAFE_HAVEN: 0.1307962699997006 0.12057728999934625 0.14202970000042114
STREAM_OF_STRINGS: 0.12656566000077873 0.12249500000034459 0.14765803999907803
ACCUMULATE_WEALTH: 0.11375975999981165 0.12984520999889354 3.748660090001067
GREEDY_APPROACH: 0.12164988000004087 0.13558526000124402 22.6994204800023
timed with2:
NUM_OF_ITERATIONS = 100
test_cases = [ 'greedy_approach', 'safe_haven' ]
for approach in test_cases:
time_taken = timeit.timeit(
f'system("{approach + ".exe"}")',
'from os import system',
number = NUM_OF_ITERATIONS
)
print(approach + ": ", time_taken / NUM_OF_ITERATIONS)
Can we do better?
Update: I tested it with 4 approaches (so far), as I could manage in my little time. More incoming soon. It would have been better to fold the code, so that more approaches could be added to this post, but it was declined.
1 Note that these readings are only for a rough estimate. There are a lot of things that influence the execution time, and note that there are some inconsistencies here as well.
2 This is the old code, used to test only the first two approaches. The current code is a good deal longer, and more integrated, so I am not sure I should add it here.
Conclusions:
Deleting elements is very costly.
You should just copy the strings somewhere, and resize the vector.
Infact, better reserve enough space too, if copying to another string.
You could also try std::accumulate:
auto s = std::accumulate(my_strings.begin(), my_strings.end(), std::string());
Won't be any faster, but at least it's more compact.
With range-v3 (and soon with C++20 ranges), you might do:
std::vector<std::string> v{"Hello, ", "good ", "day ", " to", " you!"};
std::string s = v | ranges::view::join;
Demo
By default, I would use std::stringstream. Simply construct the steam, stream in all the strings from the vector, and then return the output string. It isn't very efficient but it is clear what it does.
In most cases, one doesn't need fast method when dealing with strings and printing - so the "easy to understand and safe" methods are better. Plus, compilers nowadays are good at optimizing inefficiencies in simple cases.
The most efficient way... it is a hard question. Some applications require efficiency on multiple fronts. In these cases you might need to utilize multithreading.
Personally, I'd construct a second vector to hold a single "condensed" string, construct the condensed string, and then swap vectors when done.
void Condense(std::vector<std::string> &strings)
{
std::vector<std::string> condensed(1); // one default constructed std::string
std::string &constr = &condensed.begin(); // reference to first element of condensed
for (const auto &str : strings)
constr.append(str);
std::swap(strings, condensed); // swap newly constructed vector into original
}
If an exception is thrown for some reason, then the original vector is left unchanged, and cleanup occurs - i.e. this function gives a strong exception guarantee.
Optionally, to reduce resizing of the "condensed" string, after initialising constr in the above, one could do
// optional: compute the length of the condensed string and reserve
std::size_t total_characters_in_list = 0;
for (const auto &str : strings)
total_characters_in_list += str.size();
constr.reserve(total_characters_in_list);
// end optional reservation
As to how efficient this is compared with alternatives, that depends. I'm also not sure it's relevant - if strings keep on being appended to the vector, and needing to be appended, there is a fair chance that the code that obtains the strings from somewhere (and appends them to the vector) will have a greater impact on program performance than the act of condensing them.

Substring of an element in a set

Is there a way to find and replace subset of a char*/string in a set?
Example:
std::set<char*> myset;
myset.insert("catt");
myset.insert("world");
myset.insert("hello");
it = myset.subsetfind("tt");
myset.replace(it, "t");
There are at least three reasons why this won't work.
std::set provides only the means to search the set for a value that compares equally to the value being searched for, and not to a value that matches some arbitrary portion of the value.
The shown program is undefined behavior. A string literal, such as "hello" is a const char *, and not a char *. No self-respecting C++ compiler will allow you to insert a const char * into a container of char *s. And you can't modify const values, by definition, anyway.
Values in std::set cannot be modified. To effect the modification of an existing value in a set, it must be erase()d, then the new value insert()ed.
std::set is simply not the right container for the goals you're trying to accomplish.
No, you can't (or at least shouldn't) modify the key while it's in the set. Doing so could change the relative order of the elements, in which case the modification would render the set invalid.
You need to start with a set of things you can modify. Then you need to search for the item, remove it from the set, modify it, then re-insert the result back into the set.
std::set<std::string> myset {"catt", "world", "hello"};
auto pos = std::find_if(myset.begin(), myset.end(), [](auto const &s) { return s.find("tt");};
if (pos != myset.end()) {
auto temp = *pos;
myset.remove(pos);
auto p= temp.find("tt");
temp.replace(p, 2, "t");
myset.insert(temp);
}
You cannot modify elements within a set.
You can find strings that contain the substring using std::find_if. Once you find matching elements, you can remove each from the set and add a modified copy of the string, with the substring replaced with something else.
PS. Remember that you cannot modify string literals. You will need to allocate some memory for the strings.
PPS. Implicit conversion of string literal to char* has been deprecated since C++ was standardized, and since C++11 such conversion is ill-formed.
PPPS. The default comparator will not be correct when you use pointers as the element type. I recommend you to use std::string instead. (A strcmp based comparator approach would also be possible, although much more prone to memory bugs).
You could use std::find_if with a predicate function/functor/lambda that searches for the substring you want.

Prepend std::string

What is the most efficient way to prepend std::string? Is it worth writing out an entire function to do so, or would it take only 1 - 2 lines? I'm not seeing anything related to an std::string::push_front.
There actually is a similar function to the non-existing std::string::push_front, see the below example.
Documentation of std::string::insert
#include <iostream>
#include <string>
int
main (int argc, char *argv[])
{
std::string s1 (" world");
std::string s2 ("ello");
s1.insert (0, s2); // insert the contents of s2 at offset 0 in s1
s1.insert (0, 1, 'h'); // insert one (1) 'h' at offset 0 in s1
std::cout << s1 << std::endl;
}
output:
hello world
Since prepending a string with data might require both reallocation and copy/move of existing data you can get some performance benefits by getting rid of the reallocation part by using std::string::reserve (to allocate more memory before hand).
The copy/move of data is sadly quite inevitable, unless you define your own custom made class that acts like std::string that allocates a large buffer and places the first content in the center of this memory buffer.
Then you can both prepend and append data without reallocation and moving data, if the buffer is large enough that is. Copying from source to destination is still, obviously, required though.
If you have a buffer in which you know you will prepend data more often than you append a good alternative is to store the string backwards, and reversing it when needed (if that is more rare).
myString.insert(0, otherString);
Let the Standard Template Library writers worry about efficiency; make use of all their hours of work rather than re-programming the wheel.
This way does both of those.
As long as the STL implementation you are using was thought through you'll have efficient code. If you're using a badly written STL, you have bigger problems anyway :)
If you're using std::string::append, you should realize the following is equivalent:
std::string lhs1 = "hello ";
std::string lhs2 = "hello ";
std::string rhs = "world!";
lhs1.append(rhs);
lhs2 += rhs; // equivalent to above
// Also the same:
// lhs2 = lhs2 + rhs;
Similarly, a "prepend" would be equivalent to the following:
std::string result = "world";
result = "hello " + result;
// If prepend existed, this would be equivalent to
// result.prepend("hello");
You should note that it's rather inefficient to do the above though.
There is an overloaded string operator+ (char lhs, const string& rhs);, so you can just do your_string 'a' + your_string to mimic push_front.
This is not in-place but creates a new string, so don't expect it to be efficient, though. For a (probably) more efficient solution, use resize to gather space, std::copy_backward to shift the entire string back by one and insert the new character at the beginning.
The problem is efficiency: inserting to the beginning of the string is more expensive as it requires both reallocation and shifting of existing characters.
If you are only prepending to the string, the most efficient way is appending, and then either reverse the string, or even better, go through the string in reverse order.
string s;
for (auto c: "foobar") {
s.push_back(c);
}
for (auto it=s.rbegin(); it!=s.rend(); it++) {
// do something
}
If you need a mix of prepending and appending, I'd suggest using a deque, and then construct a string from it.
The double-ended queue supports O(1) insertion and deletion at the beginning and end.
deque<char> dq;
dq.push_front('f');
dq.push_back('o');
dq.push_front('o');
string s {dq.begin(), dq.end()};