What is the best and fastest way to extract substring from string? - c++

What is the best and most effective way to extract a string from a string? I will need this operation to be preforms thousands of times.
I have this string and I'd like to extract the URL. The URL is always after the "url=" substring until the end of the string. For example:
http://foo.com/fooimage.php?d=AQA4GxxxpcDPnw&w=130&h=130&url=http00253A00252F00252Fi1.img.com00252Fvi00252FpV4Taseyww00252Fhslt.jpg
and I need to extract the
http00253A00252F00252Fi1.img.com00252Fvi00252FpV4Taseyww00252Fhslt.jpg
I want to avoid using split and such.

If you absolutely need the results as a string, you'll have to measure,
but I doubt that anything will be significantly faster than the most
intuitive:
std::string
getTrailer( std::string const& original, std::string const& key )
{
std::string::const_iterator pivot
= std::search( original.begin(), original.end(), key.begin(), key.end() );
return pivot == original.end()
? std::string() // or some error condition...
: std::string( pivot + key.size(), original.end() );
}
However, the fastest way is probably not to extract the string at all,
but to simply keep it as a pair of iterators. If you need this a lot,
it might be worth defining a Substring class which encapsulates this.
(I've found a mutable variant of this to be very effective when
parsing.) If you go this way, don't forget that the iterators will
become invalid if the original string disappears; be sure to convert
anything you want to keep into a string before this occurs.

std::string inStr;
//this step is necessary
size_t pos = inStr.find("url=");
if(pos != std::string::npos){
char const * url = &inStr[pos + 4];
// it is fine to do any read only operations with url
// if you would apply some modifications to url, please make a copy string
}

you can use std::string::find() :
if its a char* than just move the pointer to the position right after "url="
yourstring = (yourstring + yourstring.find("url=")+4 );
I cant think of anything faster..

You could also look into the boost libraries.
For example boost::split()
I don't know how they actually perform in terms of speed, but it's definitely worth a try.

Related

Redefine data area for faster access

New to c++. I've searched but probably using wrong terms.
I want to find which slot in an array of many slots a few bytes long literal value is stored. Currently check each slot sequentially.
If I can use an internal function to scan the whole array as if it was one big string, I feel this would be much faster. (Old COBOL programmer).
Any way I can do this please?
I want to find which slot in an array of many slots a few bytes long literal value is stored. Currently check each slot sequentially.
OK, I'm going to take a punt and infer that:
you want to store string literals of any length in some kind of container.
the container must be mutable (i.e. you can add literals at will)
there will not be duplicates in the container.
you want to know whether a string literal as been stored in the container previously, and what "position" it was at so that you can remove it if necessary.
the string literals will be inserted in random lexicographical order and need not be sorted.
The container that springs to mind is the std::unordered_set
#include <unordered_set>
std::unordered_set<std::string> tokens;
int main()
{
tokens.emplace("foo");
tokens.emplace("bar");
auto it = tokens.find("baz");
assert(it == tokens.end()); // not found
it = tokens.find("bar"); // will be found
assert(it != tokens.end());
tokens.erase(it); // remove the token
}
The search time complexity of this container is O(1).
As you already found out by the comments, "scanning as one big string" is not the way to go in C++.
Typical in C++ when using C-style arrays and normally fast enough for linear search is
auto myStr = "result";
auto it = std::find_if(std::begin(arr), std::end(arr),
[myStr](const char* const str) { return std::strcmp(mystr,str) == 0; });
Remember that string compare function stop at the first wrong character.
More C++ style:
std::vector<std::string> vec = { "res1", "res2", "res3" };
std::string myStr = "res2";
auto it = std::find(vec.begin(), vec.end(), myStr);
If you are interested in very fast lookup for a large container, std::unordered_set is the way to go, but the "slot" has lost its meaning then, but maybe in that case std::unordered_map can be used.
std::unordered_set<std::string> s= { "res1", "res2", "res3" };
std::string myStr = "res2";
auto it = s.find(myStr);
All code is written as example, not compiled/tested

Delete recurring characters

For a little something I was trying out in C++, I have accepted a string (say 'a tomato is red') and gotten rid of spaces ('atomatoisred').
Now how would I go about deleting recurring characters only, on condition that the first instance of that character gets to stay (so our example becomes,'atomisred')?
Thanks in advance!
You can use the erase-remove idiom in conjunction with a set keeping track of the duplicate characters:
std::set<char> dupes;
str.erase(
std::remove_if(
str.begin(), str.end(),
[&](char c) { return not dupes.insert(c).second; }),
str.end());
This also uses the fact that the return value of std::set::insert is a pair whose second element is a bool indicating whether the insertion took place.
If you want to implement it yourself (without stl), there are a number of ways.
Through sorting. This works if you don't care about the order of the chars. Sort your string first, then go through it, performing a very simple check on every element:
if( currentElement == elemebtBeforeIt )
deleteCurrentElement
Another way is to have an array dedicated to the unique characters (well, maybe not an array, but you'll get the idea). Go through your string, and for each charcter, check:
foreach Element of the string:
if( arrayOfUniqueElements contains currentElement )
do nothing
else
put currentElement into the arrayOfUniquElements
After this, you will have all unique elements in the dedicated array.

Search string for first char different than "X"

The opposite of str.find('X') -
What is the most efficient way of finding first character in std::string that is different than specific char? If I have a string that consists of mainly X'es, but at some point there is another char - how do I find it quickly?
std::string str = "XXXXXXXXXXXXXXX.XXXXXXXXXXX";
size_t index = str.find_first_not_of('X');
But a plain old for loop will be just as good.
Or, if you want an iterator instead of an index, perhaps like this:
std::string::iterator = std::find_if(str.begin(), str.end(),
[](char c){ return c != 'X'; });
I think the most efficient way would be to iterate through the string and compare each character with 'X', returning the first one that is different.
Without any prior knowledge about the string, I don't see an approach better than O(n), and calling find('X') succesively might be worse than just iterating through the characters.

How can you compare words within a string array?

Basically I want to check a string array to see if any of the words match "and".
Is this possible?
Can you push me in the right direction?
Thanks
I should make it clear that the words are char put together best way to explain is an example
abc defg hijk and lmnop <-- each character is in its own element
I recommend you use std::string and not null-terminated char* strings (maybe you already are -- hard to be sure). And use a standard container rather than an array. Then use std::find (which would work on an array too, but containers are better).
Iterate through the array and use int string::compare ( const string& str ) const; to check for matches.
Break from the loop on first match.
I am guessing you'd like to take care of lower\upper casing and the word appearing in the beginning\end of the string.
std::string data;
std::transform(data.begin(), data.end(), data.begin(), ::tolower);
data.append(' ');
if (data.find("and ") != std::string::npos) ......

Prepend std::string

What is the most efficient way to prepend std::string? Is it worth writing out an entire function to do so, or would it take only 1 - 2 lines? I'm not seeing anything related to an std::string::push_front.
There actually is a similar function to the non-existing std::string::push_front, see the below example.
Documentation of std::string::insert
#include <iostream>
#include <string>
int
main (int argc, char *argv[])
{
std::string s1 (" world");
std::string s2 ("ello");
s1.insert (0, s2); // insert the contents of s2 at offset 0 in s1
s1.insert (0, 1, 'h'); // insert one (1) 'h' at offset 0 in s1
std::cout << s1 << std::endl;
}
output:
hello world
Since prepending a string with data might require both reallocation and copy/move of existing data you can get some performance benefits by getting rid of the reallocation part by using std::string::reserve (to allocate more memory before hand).
The copy/move of data is sadly quite inevitable, unless you define your own custom made class that acts like std::string that allocates a large buffer and places the first content in the center of this memory buffer.
Then you can both prepend and append data without reallocation and moving data, if the buffer is large enough that is. Copying from source to destination is still, obviously, required though.
If you have a buffer in which you know you will prepend data more often than you append a good alternative is to store the string backwards, and reversing it when needed (if that is more rare).
myString.insert(0, otherString);
Let the Standard Template Library writers worry about efficiency; make use of all their hours of work rather than re-programming the wheel.
This way does both of those.
As long as the STL implementation you are using was thought through you'll have efficient code. If you're using a badly written STL, you have bigger problems anyway :)
If you're using std::string::append, you should realize the following is equivalent:
std::string lhs1 = "hello ";
std::string lhs2 = "hello ";
std::string rhs = "world!";
lhs1.append(rhs);
lhs2 += rhs; // equivalent to above
// Also the same:
// lhs2 = lhs2 + rhs;
Similarly, a "prepend" would be equivalent to the following:
std::string result = "world";
result = "hello " + result;
// If prepend existed, this would be equivalent to
// result.prepend("hello");
You should note that it's rather inefficient to do the above though.
There is an overloaded string operator+ (char lhs, const string& rhs);, so you can just do your_string 'a' + your_string to mimic push_front.
This is not in-place but creates a new string, so don't expect it to be efficient, though. For a (probably) more efficient solution, use resize to gather space, std::copy_backward to shift the entire string back by one and insert the new character at the beginning.
The problem is efficiency: inserting to the beginning of the string is more expensive as it requires both reallocation and shifting of existing characters.
If you are only prepending to the string, the most efficient way is appending, and then either reverse the string, or even better, go through the string in reverse order.
string s;
for (auto c: "foobar") {
s.push_back(c);
}
for (auto it=s.rbegin(); it!=s.rend(); it++) {
// do something
}
If you need a mix of prepending and appending, I'd suggest using a deque, and then construct a string from it.
The double-ended queue supports O(1) insertion and deletion at the beginning and end.
deque<char> dq;
dq.push_front('f');
dq.push_back('o');
dq.push_front('o');
string s {dq.begin(), dq.end()};