string manipulation - character counting

string manipulation - character counting - c++

I have a string S = "&|&&|&&&|&" where we should get the number of '&' between 2 indexes of the string.
So the output with the 2 indexes 1 and 8 here should be 5. And here's my brute force style code:
std::size_t cnt = 0;
for (std::size_t i = start; i < end; i++) {
    if (S[i] == '&')
        cnt++;
}
cout << cnt << endl;
The problem I faced was my code was getting timed out for larger inputs in a coding platform. Can anyone suggest a better way to reduce the time complexity here?

I decided to try several approaches, including the ones proposed by the other two answers to this question. I made several assumptions about the input, with the goal to find a fast implementation for a single large string that would only be searched once for a single character. For a string that will have multiple queries made against it for more than one character, I suggest building a segment tree as suggested in a comment by user Jefferson Rondan.
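For the multiple-query case mentioned above, and assuming the string never changes between queries and only one character is of interest, a prefix-count array is a simpler alternative to a segment tree. The sketch below is only an illustration; the helper names build_prefix_counts and count_in_range are made up for this example and were not part of my benchmark.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: prefix[i] holds the number of `target` characters in s[0, i).
// Built once in O(n).
std::vector<std::size_t> build_prefix_counts(const std::string& s, char target)
{
    std::vector<std::size_t> prefix(s.size() + 1, 0);
    for (std::size_t i = 0; i < s.size(); ++i)
        prefix[i + 1] = prefix[i] + (s[i] == target ? 1 : 0);
    return prefix;
}

// Hypothetical helper: count of `target` in the half-open range [start, end).
// Assumes start <= end <= s.size(); each query is O(1).
std::size_t count_in_range(const std::vector<std::size_t>& prefix,
                           std::size_t start, std::size_t end)
{
    return prefix[end] - prefix[start];
}
```

With the string from the question, building the prefix array for '&' and querying the range [1, 8) gives 5. The trade-off is O(n) extra memory per tracked character.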
I used std::chrono::steady_clock::now() to measure implementation times.
Assumptions
The program prompts the user for a string size, search character, start index, and end index.
The inputs are well formed (start <= end <= size).
The string is randomly generated from a uniform distribution of ascii characters between ' ' and '~'.
The underlying data in the string object is stored contiguously in memory.
Approaches
Naive for loop: an index variable is incremented, and the string is indexed, character by character, using the index.
Iterator loop: a string iterator is used, dereferenced at each iteration, and compared to the search character.
Underlying data pointer: a pointer to the underlying character array of the string is found, and this is incremented in a loop. The dereferenced pointer is compared to the search character.
Index mapping (as suggested by GyuHyeon Choi): An int-type array of max printable ascii character elements is initialized to 0, and for each character encountered while iterating through the array, that corresponding index is incremented by one. At the end, the index of the search character is dereferenced to find how many of that character were found.
Just use std::count (as suggested by Atul Sharma): Just use the built-in counting functionality.
Recast the underlying data as a pointer to a larger data type and iterate: the underlying const char* const pointer that holds the string data is reinterpreted as a pointer to a wider data type (in this case a pointer to type uint64_t). Each dereferenced uint64_t is then XOR'ed with a mask made up of the search character, and each byte of the uint64_t masked with 0xff. This reduces the number of pointer increments needed to step through the entire array.
Results
For a 1,000,000,000 size string searching from index 5 to 999999995, the results of each method follow:
Naive for loop: 843 ms
Iterator loop: 818 ms
Underlying data pointer: 750 ms
Index mapping (as suggested by GyuHyeon Choi): 929 ms
Just use std::count (as suggested by Atul Sharma): 819 ms
Recast the underlying data as a pointer to a larger data type and iterate: 664 ms
Discussion
The best performing implementation was my own data pointer recast, which completed in a little over 75% of the time it took for the naive solution. The fastest "simple" solution is pointer iteration over the underlying data structure. This method has the benefit of being easy to implement, understand, and maintain. The index mapping method, despite being marketed as 2x faster than the naive solution, didn't see such speedups on my benchmarks. The std::count method is about as fast as the by-hand pointer iteration, and even simpler to implement. If speed really matters, consider recasting the underlying pointer. Otherwise, stick with std::count.
The Code
#include <algorithm>
#include <iostream>
#include <random>
#include <string>
#include <functional>
#include <cstdint>
#include <chrono>
int main(int argc, char** argv)
{
std::random_device device;
std::mt19937 generator(device());
std::uniform_int_distribution<short> short_distribution(' ', '~');
auto next_short = std::bind(short_distribution, generator);
std::string random_string = "";
size_t string_size;
size_t start_search_index;
size_t end_search_index;
char search_char;
std::cout << "String size: ";
std::cin >> string_size;
std::cout << "Search char: ";
std::cin >> search_char;
std::cout << "Start search index: ";
std::cin >> start_search_index;
std::cout << "End search index: ";
std::cin >> end_search_index;
if (!(start_search_index <= end_search_index && end_search_index <= string_size))
{
std::cout << "Requires start_search <= end_search <= string_size\n";
return 0;
}
for (size_t i = 0; i < string_size; i++)
{
random_string += static_cast<char>(next_short());
}
// naive implementation
size_t count = 0;
auto start_time = std::chrono::steady_clock::now();
for (size_t i = start_search_index; i < end_search_index; i++)
{
if (random_string[i] == search_char)
count++;
}
auto end_time = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Naive implementation. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// Iterator solution
count = 0;
start_time = std::chrono::steady_clock::now();
for (auto it = random_string.begin() + start_search_index, end = random_string.begin() + end_search_index;
it != end;
it++)
{
if (*it == search_char)
count++;
}
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Iterator solution. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// Iterate on data
count = 0;
start_time = std::chrono::steady_clock::now();
for (auto it = random_string.data() + start_search_index,
end = random_string.data() + end_search_index;
it != end; it++)
{
if (*it == search_char)
count++;
}
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Iterate on underlying data solution. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// use index mapping
count = 0;
size_t count_array['~' + 1]{ 0 }; // '~' + 1 elements so that index '~' itself is valid
start_time = std::chrono::steady_clock::now();
for (size_t i = start_search_index; i < end_search_index; i++)
{
count_array[random_string.at(i)]++;
}
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
count = count_array[search_char];
std::cout << "Using index mapping. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// using std::count
count = 0;
start_time = std::chrono::steady_clock::now();
count = std::count(random_string.begin() + start_search_index
, random_string.begin() + end_search_index
, search_char);
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Using std::count. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// Iterate on larger type than underlying char
count = end_search_index - start_search_index;
start_time = std::chrono::steady_clock::now();
// Iterate through the underlying data until the address is 8-byte aligned
{
auto it = random_string.data() + start_search_index;
auto end = random_string.data() + end_search_index;
// iterate until we reach a pointer that is divisible by 8
for (; (reinterpret_cast<std::uintptr_t>(it) & 0x07) && it != end; it++)
{
if (*it != search_char)
count--;
}
// iterate on 8-byte sized chunks until we reach the last full chunk that is 8-byte aligned
auto chunk_it = reinterpret_cast<const uint64_t* const>(it);
auto chunk_end = reinterpret_cast<const uint64_t* const>((reinterpret_cast<std::uintptr_t>(end)) & ~0x07);
uint64_t search_xor_mask = 0;
for (size_t i = 0; i < 64; i+=8)
{
search_xor_mask |= (static_cast<uint64_t>(search_char) << i);
}
constexpr uint64_t all_ones = 0xff;
// use < rather than != so that a range too short to contain an aligned chunk is skipped
for (; chunk_it < chunk_end; chunk_it++)
{
auto chunk = (*chunk_it ^ search_xor_mask);
if (chunk & (all_ones << 56))
count--;
if (chunk & (all_ones << 48))
count--;
if (chunk & (all_ones << 40))
count--;
if (chunk & (all_ones << 32))
count--;
if (chunk & (all_ones << 24))
count--;
if (chunk & (all_ones << 16))
count--;
if (chunk & (all_ones << 8))
count--;
if (chunk & (all_ones << 0))
count--;
}
// iterate on the remainder of the bytes, should be no more than 7, tops
it = reinterpret_cast<decltype(it)>(chunk_it);
for (; it != end; it++)
{
if (*it != search_char)
count--;
}
}
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Iterate on underlying data with larger step sizes. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
}
Example Output
String size: 1000000000
Search char: &
Start search index: 5
End search index: 999999995
Naive implementation. Found: 10527454
Elapsed time: 843179us.
Iterator solution. Found: 10527454
Elapsed time: 817762us.
Iterate on underlying data solution. Found: 10527454
Elapsed time: 749513us.
Using index mapping. Found: 10527454
Elapsed time: 928560us.
Using std::count. Found: 10527454
Elapsed time: 819412us.
Iterate on underlying data with larger step sizes. Found: 10527454
Elapsed time: 664338us.

int cnt[125] = {0}; // ASCII '&' = 38, '|' = 124
for(int i = start; i < end; i++) {
    cnt[S.at(i)]++;
}
cout << cnt['&'] << endl;
An if is expensive because it compares and branches, so indexing into a counter array without a branch should be faster.
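Another way to avoid the explicit if, shown here only as a sketch, is to add the result of the comparison directly to the counter. The function name count_branchless is hypothetical, and whether the compiler really emits branch-free code for it is not guaranteed, so measure before relying on it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts `target` in s[start, end) without an explicit if.
// Assumes start <= end <= s.size().
std::size_t count_branchless(const std::string& s, std::size_t start,
                             std::size_t end, char target)
{
    std::size_t cnt = 0;
    for (std::size_t i = start; i < end; ++i)
        cnt += static_cast<std::size_t>(s[i] == target); // comparison yields 0 or 1
    return cnt;
}
```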

You can use the std::count from algorithm standard C++ library.
Just include the header <algorithm>
std::string s{"&|&&|&&&|&"};
// https://en.cppreference.com/w/cpp/algorithm/count
auto const count = std::count(s.begin() + 1 // starting index
,s.begin() + 8 // one past the last index
,'&');

Related

C++ code - problems with code execution time

Here is the code.
#include "pch.h"
#include <algorithm>
#include <iostream>
#include <vector>
#include <stdlib.h>
using namespace std;
vector<int> SearchInt(vector<int> vec, int num) {
vector<int> temp(2);
sort(begin(vec), end(vec));
int j = 0;
for (int i : vec) {
if (i > num) {
temp[0] = i;
temp[1] = j;
return { temp };
}
//cout << i << " !>= " << num << endl ;
j++;
}
cout << "NO";
exit(0);
}
int main()
{
int n;
cin >> n;
vector<int> nums(n, 0);
vector<int> NewNums(n, 0);
for (int i = 0; i < n; i++) {
cin >> nums[i];
}
if (n != nums.size()) {
cout << "://";
return 0;
}
sort(begin(nums), end(nums));
NewNums[1] = nums[nums.size() - 1];
nums.erase(nums.begin() + nums.size() - 1);
NewNums[0] = nums[nums.size() - 1];
nums.erase(nums.begin() + nums.size() - 1);
for (int j = 2; j <= NewNums.size() - 1; j++) {
NewNums[j] = SearchInt(nums, NewNums[j-1]- NewNums[j-2])[0];
nums.erase(nums.begin() + SearchInt(nums, NewNums[j] - NewNums[j - 1])[1]);
}
if (NewNums[NewNums.size()-1] < NewNums[NewNums.size() - 2] + NewNums[0]) {
cout << "YES" << endl;
for (int i : NewNums) {
cout << i << " ";
}
return 0;
}
else {
cout << "NO";
return 0;
}
}
The task is to check whether the given numbers can be arranged so that each number is less than the sum of its two neighbors.
But there is a problem: with a large number of inputs, the code takes too long. Please help me optimize it, or just give some advice.
The numbers cannot be zero.
time limit: 3.0 s
n <= 500000
You are given n numbers a1, a2,…, an. Is it possible to arrange them in a circle so that each number is strictly less than the sum of its neighbors?
For example, for the array [1,4,5,6,7,8], the left array satisfies the condition, while the right array does not, since 5≥4 + 1 and 8> 1 + 6.
Input data
The first line contains one integer n (3 ≤ n ≤ 10^5) - the number of numbers.
The second line contains n integers a1, a2,…, an (1 ≤ ai ≤ 10^9) - the numbers themselves. The given numbers are not necessarily different.
Output
If there is no solution, print "NO" on the first line.
If it exists, print "YES" on the first line. After that, on the second line print n numbers - the elements of the array in the order in which they will stand on the circle. The first and last elements you print are considered neighbors on the circle. If there are multiple solutions, output any of them. You can print a circle starting with any of the numbers.

First I'll only briefly analyze technical shortcomings of your code - without analyzing its meaning. After that I'll write my solution of the problem you defined.
Performance problems of your code are due to some strange decisions:
(1) passing std::vector<int> by value and not by reference to SearchInt function - this implies allocating and copying of the whole array on each function invocation,
(2) call SearchInt two times per loop iteration in function main instead of only one,
(3) sort array within each invocation of SearchInt - it is already sorted before the loop.
To be honest your code feels ridiculously time-consuming. I'm only wondering if that was your intention to make it as slow as you possibly can...
I will not analyze correctness of your code according to problem description. To be honest even after fixing technical shortcomings your code seems to me utterly sub-optimal and quite incomprehensible - so it is just easier to solve the problem from scratch to me.
The answer to the problem as defined is YES if the biggest number is smaller than the sum of the second and third biggest, and NO otherwise. This follows from the fact that all numbers are positive (in the range 1 to 10^9 according to the newly found problem description). If the answer is YES, then to build a circle that satisfies the problem description you just need to take the sorted sequence of input numbers and swap the biggest number with the second biggest one - that's all.
Here is my code for that (for slightly relaxed input format - I'm not checking if number of items is on a separate line and that all items are on the same line - but all correct inputs will be parsed just fine):
#include <set>
#include <iostream>
int main()
{
std::multiset<unsigned> input_set;
unsigned n;
if( !( std::cin >> n ) )
{
std::cerr << "Input error - failed to read number of items." << std::endl;
return 2;
}
if( n - 3U > 100000U - 3U )
{
std::cerr << "Wrong number of items value - " << n << " (must be 3 to 100000)" << std::endl;
return 2;
}
for( unsigned j = 0; j < n; ++j )
{
unsigned x;
if( !( std::cin >> x ) )
{
std::cerr << "Input error - failed to read item #" << j << std::endl;
return 2;
}
if( x - 1U > 1000000000U - 1U )
{
std::cerr << "Wrong item #" << j << " value - " << x << " (must be 1 to 1000000000)" << std::endl;
return 2;
}
input_set.insert(x);
}
std::multiset<unsigned>::const_reverse_iterator it = input_set.rbegin();
std::multiset<unsigned>::const_reverse_iterator it0 = it;
std::multiset<unsigned>::const_reverse_iterator it1 = ++it;
if( *it0 >= *it1 + *++it )
{
std::cout << "NO (the biggest number is not smaller than the sum of the second and third biggest numbers)" << std::endl;
return 1;
}
std::cout << "YES" << std::endl;
std::cout << "Circle: " << *it1 << ' ' << *it0;
do
{
std::cout << ' ' << *it;
}
while( ++it != input_set.rend() );
std::cout << std::endl;
return 0;
}
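The same idea can be sketched with a plain std::vector and std::sort instead of a multiset. This is only an illustration of the rule described above, not a drop-in submission: the function name arrange_circle is hypothetical, reading the input and printing YES/NO are left to the caller, and an empty result is used here to mean "NO".

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns a valid circular arrangement,
// or an empty vector if no arrangement exists.
// Assumes a.size() >= 3 and that every value is positive.
std::vector<long long> arrange_circle(std::vector<long long> a)
{
    std::sort(a.begin(), a.end());
    const std::size_t n = a.size();
    // The biggest number must be strictly smaller than the sum of the next two.
    if (a[n - 1] >= a[n - 2] + a[n - 3])
        return {};
    // Swap the two biggest so the biggest sits between the second and third biggest.
    std::swap(a[n - 1], a[n - 2]);
    return a;
}
```

For the array [1,4,5,6,7,8] from the problem statement this yields 1 4 5 6 8 7, where 8 sits between 6 and 7, and 7 sits between 8 and 1 (the first and last elements are neighbors on the circle).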

C++ finding uint8_t in vector<uint8_t>

I have the following simple code. I declare a vector and initialize it with one value 21 in this case. And then i am trying to find that value in the vector using find. I can see that the element "21" in this case is in the vector since i print it in the for loop. However why the iterator of find does not resolve to true?
vector<uint8_t> v = { 21 };
uint8_t valueToSearch = 21;
for (vector<uint8_t>::const_iterator i = v.begin(); i != v.end(); ++i){
cout << unsigned(*i) << ' ' << endl;
}
auto it = find(v.begin(), v.end(), valueToSearch);
if ( it != v.end() )
{
string m = "valueToSearch was found in the vector " + valueToSearch;
cout << m << endl;
}

are you sure it doesn't work?
I just tried it:
#include <algorithm>
#include <cstdint> // uint8_t
#include <iostream> // std::cout
#include <string>
#include <vector>
using namespace std;
int main()
{
vector<uint8_t> v = { 21 };
uint8_t valueToSearch = 21;
for (vector<uint8_t>::const_iterator i = v.begin(); i != v.end(); ++i){
cout << unsigned(*i) << ' ' << endl;
}
auto it = find(v.begin(), v.end(), valueToSearch);
if ( it != v.end() )
{// if we hit this condition, we found the element
string error = "valueToSearch was found in the vector ";
cout << error << int(valueToSearch) << endl;
}
return 0;
}
There are two small modifications:
First, in the last lines inside the "if": you cannot directly add a number to a string literal. The expression below performs pointer arithmetic on the literal, advancing it by 21 characters instead of appending "21":
string m = "valueToSearch was found in the vector " + valueToSearch;
With the fix, the program prints:
21
valueToSearch was found in the vector 21
Second, although cout supports the insertion operator (<<) for uint8_t, it treats the value as a character (uint8_t is typically an alias for unsigned char) and prints the character with that code rather than the number, so you need to convert it to an integer type first:
cout << error << int(valueToSearch) << endl;
This is to say that the find is working correctly: it is telling you that it found the number in the first position, and for that reason it != v.end() (end is not a valid element, but it is a valid iterator that marks the end of your container).
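If you do want the number inside the string itself rather than streamed separately, one option is std::to_string. The sketch below is only an illustration; the function name found_message is made up for this example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds the message with the value rendered as a decimal number.
std::string found_message(std::uint8_t value)
{
    // std::to_string has no uint8_t overload; the argument is promoted to int,
    // so the result is the decimal text (e.g. "21"), not a character.
    return "valueToSearch was found in the vector " + std::to_string(value);
}
```

Here the left operand of + is still a string literal, but the right operand is a std::string, so real concatenation takes place instead of pointer arithmetic.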

vector element compare c++

This program takes a word from text and puts it in a vector; after this it compares every element with the next one.
So I'm trying to compare element of a vector like this:
sort(words.begin(), words.end());
int cc = 1;
int compte = 1;
int i;
//browse the vector
for (i = 0; i <= words.size(); i++) { // comparison
if (words[i] == words[cc]) {
compte = compte + 1;
}
else { // displaying the word with comparison
cout << words[i] << " Repeated : " << compte; printf("\n");
compte = 1; cc = i;
}
}
My problem is in the bounds: i+1 may exceed the vector borders. How do I handle this case?

You need to pay more attention to the initial conditions and bounds when you iterate and compare at the same time. It is usually a good idea to step through your code with pen and paper first.
sort(words.begin(), words.end()); // make sure !words.empty()
int cc = 0; // index of the word we need to compare.
int compte = 1; // counting of the number of occurrence.
for( size_t i = 1; i < words.size(); ++i ){
// since you already count the first word, now we are at i=1
if( words[i] == words[cc] ){
compte += 1;
}else{
// words[i] is going to be different from words[cc].
cout << words[cc] << " Repeated : " << compte << '\n';
compte = 1;
cc = i;
}
}
// to output the last word with its repeat
cout << words[cc] << " Repeated : " << compte << '\n';
Just for some additional information.
There are better ways to count the number of word appearances.
For example, one can use unordered_map<string,int>.
Hope this helps.
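A minimal sketch of the unordered_map idea, assuming the words are already stored in a std::vector<std::string>. The function name count_words is hypothetical. No sorting is needed, but the counts come back in no particular order.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each distinct word to its number of occurrences.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (const auto& w : words)
        ++counts[w]; // operator[] value-initializes a missing entry to 0
    return counts;
}
```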

C++ uses zero-based indexing, e.g., an array of length 5 has indices: {0, 1, 2, 3, 4}. This means that index 5 is outside of the range.
Similarly, given an array arr of characters:
char arr[] = {'a', 'b', 'c', 'd', 'e'};
The loop for (int i = 0; i <= std::size(arr); ++i) { arr[i]; } will cause a read from outside of the range when i is equal to the length of arr, which causes undefined behaviour. To avoid this the loop must stop before i is equal to the length of the array.
for (std::size_t i = 0; i < std::size(arr); ++i) { arr[i]; }
Also note the use of std::size_t as type of the index counter. This is common practice in C++.
Now, let's finish with an example of how much easier this can be done using the standard library.
std::sort(std::begin(words), std::end(words));
std::map<std::string, std::size_t> counts;
std::for_each(std::begin(words), std::end(words), [&] (const auto& w) { ++counts[w]; });
Output using:
for (auto&& [word, count] : counts) {
std::cout << word << ": " << count << std::endl;
}

My problem is in the bounds: i+1 may exceed the vector borders. How do I handle this case?
In modern C++ coding, the problem of an index going past vector bounds can be avoided. Use the STL containers and avoid using indices. With a little effort devoted to learning how to use containers this way, you should never see these kind of 'off-by-one' errors again! As a benefit, the code becomes more easily understood and maintained.
#include <iostream>
#include <vector>
#include <map>
#include <string>
using namespace std;
int main() {
// a test vector of words
vector< string > words { "alpha", "gamma", "beta", "gamma" };
// map unique words to their appearance count
map< string, int > mapwordcount;
// loop over words
for( auto& w : words )
{
// insert word into map
auto ret = mapwordcount.insert( pair<string,int>( w, 1 ) );
if( ! ret.second )
{
// word already present
// so increment count
ret.first->second++;
}
}
// loop over map
for( auto& m : mapwordcount )
{
cout << "word '" << m.first << "' appears " << m.second << " times\n";
}
return 0;
}
Produces
word 'alpha' appears 1 times
word 'beta' appears 1 times
word 'gamma' appears 2 times
https://ideone.com/L9VZt6
If some book or person is teaching you to write code full of
for (i = 0; i < ...
then you should run away quickly and learn modern coding elsewhere.

Same repeated words counting using some C++ STL goodies via multiset and upper_bound:
#include <iostream>
#include <vector>
#include <string>
#include <set>
int main()
{
std::vector<std::string> words{ "one", "two", "three", "two", "one" };
std::multiset<std::string> ms(words.begin(), words.end());
for (auto it = ms.begin(), end = ms.end(); it != end; it = ms.upper_bound(*it))
std::cout << *it << " is repeated: " << ms.count(*it) << " times" << std::endl;
return 0;
}
https://ideone.com/tPYw4a

In which situation will std::map<A,B> be faster than a sorted std::vector<std::pair<A,B>>?

I was using a map in some code to store ordered data. I found out that for huge maps, destruction could take a while. In this code, replacing map by vector<pair> reduced processing time by a factor of 10000...
In the end, I was so surprised that I decided to compare map performance with a sorted vector of pairs.
And I'm surprised because I could not find a situation where map was faster than a sorted vector of pair (filled randomly and later sorted)...there must be some situations where map is faster....else what's the point in providing this class?
Here is what I tested:
Test one, compare map filling and destroying vs vector filling, sorting (because I want a sorted container) and destroying:
#include <iostream>
#include <time.h>
#include <cstdlib>
#include <map>
#include <vector>
#include <algorithm>
int main(void)
{
clock_t tStart = clock();
{
std::map<float,int> myMap;
for ( int i = 0; i != 10000000; ++i )
{
myMap[ ((float)std::rand()) / RAND_MAX ] = i;
}
}
std::cout << "Time taken by map: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;
tStart = clock();
{
std::vector< std::pair<float,int> > myVect;
for ( int i = 0; i != 10000000; ++i )
{
myVect.push_back( std::make_pair( ((float)std::rand()) / RAND_MAX, i ) );
}
// sort the vector, as we want a sorted container:
std::sort( myVect.begin(), myVect.end() );
}
std::cout << "Time taken by vect: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;
return 0;
}
Compiled with g++ main.cpp -O3 -o main and got:
Time taken by map: 21.7142
Time taken by vect: 7.94725
map's 3 times slower...
Then, I said, "OK, vector is faster to fill and sort, but search will be faster with the map"....so I tested:
#include <iostream>
#include <time.h>
#include <cstdlib>
#include <map>
#include <vector>
#include <algorithm>
int main(void)
{
clock_t tStart = clock();
{
std::map<float,int> myMap;
float middle = 0;
float last;
for ( int i = 0; i != 10000000; ++i )
{
last = ((float)std::rand()) / RAND_MAX;
myMap[ last ] = i;
if ( i == 5000000 )
middle = last; // element we will later search
}
std::cout << "Map created after " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;
float sum = 0;
for ( int i = 0; i != 10; ++i )
sum += myMap[ last ]; // search it
std::cout << "Sum is " << sum << std::endl;
}
std::cout << "Time taken by map: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;
tStart = clock();
{
std::vector< std::pair<float,int> > myVect;
std::pair<float,int> middle;
std::pair<float,int> last;
for ( int i = 0; i != 10000000; ++i )
{
last = std::make_pair( ((float)std::rand()) / RAND_MAX, i );
myVect.push_back( last );
if ( i == 5000000 )
middle = last; // element we will later search
}
std::sort( myVect.begin(), myVect.end() );
std::cout << "Vector created after " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;
float sum = 0;
for ( int i = 0; i != 10; ++i )
sum += (std::find( myVect.begin(), myVect.end(), last ))->second; // search it
std::cout << "Sum is " << sum << std::endl;
}
std::cout << "Time taken by vect: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;
return 0;
}
Compiled with g++ main.cpp -O3 -o main and got:
Map created after 19.5357
Sum is 1e+08
Time taken by map: 21.41
Vector created after 7.96388
Sum is 1e+08
Time taken by vect: 8.31741
Even search is apparently faster with the vector (10 searches with the map took almost 2 sec, and only half a second with the vector)....
So:
Did I miss something?
Are my tests not correct/accurate?
Is map simply a class to avoid, or are there really situations where map offers good performance?

Generally a map will be better when you're doing a lot of insertions and deletions interspersed with your lookups. If you build the data structure once and then only do lookups, a sorted vector will almost certainly be faster, if only because of processor cache effects. Since insertions and deletions at arbitrary locations in a vector are O(n) instead of O(log n), there will come a point where those will become the limiting factor.

std::find has linear time complexity whereas a map search has log N complexity.
When you find that one algorithm is 100000x faster than the other you should get suspicious! Your benchmark is invalid.
You need to compare realistic variants. Probably, you meant to compare map with a binary search. Run each of those variants for at least 1 second of CPU time so that you can realistically compare the results.
When a benchmark returns "0.00001 seconds" time spent you are well in the clock inaccuracy noise. This number means nothing.
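For reference, a binary search on the sorted vector could look like the sketch below, which uses std::lower_bound with a comparator on the key only. The function name find_sorted is hypothetical, and the exact float comparison assumes you search for a key that was stored unchanged, as in the question.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: looks up `key` in a vector sorted by .first.
// Returns a pointer to the mapped value, or nullptr if the key is absent.
// O(log n), the fair counterpart to a std::map lookup.
const int* find_sorted(const std::vector<std::pair<float, int>>& sorted, float key)
{
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const std::pair<float, int>& entry, float k) { return entry.first < k; });
    if (it != sorted.end() && it->first == key)
        return &it->second;
    return nullptr;
}
```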

Check if a string contains a string in C++

I have a variable of type std::string. I want to check if it contains a certain std::string. How would I do that?
Is there a function that returns true if the string is found, and false if it isn't?

Use std::string::find as follows:
if (s1.find(s2) != std::string::npos) {
std::cout << "found!" << '\n';
}
Note: "found!" will be printed if s2 is a substring of s1, both s1 and s2 are of type std::string.

You can try using the find function:
string str ("There are two needles in this haystack.");
string str2 ("needle");
if (str.find(str2) != string::npos) {
//.. found.
}

Starting from C++23 you can use std::string::contains
#include <string>
const auto haystack = std::string("haystack with needles");
const auto needle = std::string("needle");
if (haystack.contains(needle))
{
// found!
}

Alternatively, you can use the Boost library; I think std::string doesn't provide enough methods for all the common string operations. In Boost, you can just use boost::algorithm::contains:
#include <iostream>
#include <string>
#include <boost/algorithm/string.hpp>
int main() {
std::string s("gengjiawen");
std::string t("geng");
bool b = boost::algorithm::contains(s, t);
std::cout << b << std::endl;
return 0;
}

You can try this
string s1 = "Hello";
string s2 = "el";
if(strstr(s1.c_str(),s2.c_str()))
{
cout << " S1 Contains S2";
}

If this functionality is critical to your system, it is actually beneficial to use the old strstr function. The std::search function from <algorithm> was the slowest in my measurements. My guess would be that it takes a lot of time to create those iterators.
The code that I used to time the whole thing is:
#include <string>
#include <cstring>
#include <iostream>
#include <algorithm>
#include <random>
#include <chrono>
std::string randomString( size_t len );
int main(int argc, char* argv[])
{
using namespace std::chrono;
const size_t haystacksCount = 200000;
std::string haystacks[haystacksCount];
std::string needle = "hello";
bool sink = true;
high_resolution_clock::time_point start, end;
duration<double> timespan;
int sizes[10] = { 10, 20, 40, 80, 160, 320, 640, 1280, 5120, 10240 };
for(int s=0; s<10; ++s)
{
std::cout << std::endl << "Generating " << haystacksCount << " random haystacks of size " << sizes[s] << std::endl;
for(size_t i=0; i<haystacksCount; ++i)
{
haystacks[i] = randomString(sizes[s]);
}
std::cout << "Starting std::string.find approach" << std::endl;
start = high_resolution_clock::now();
for(size_t i=0; i<haystacksCount; ++i)
{
if(haystacks[i].find(needle) != std::string::npos)
{
sink = !sink; // useless action
}
}
end = high_resolution_clock::now();
timespan = duration_cast<duration<double>>(end-start);
std::cout << "Processing of " << haystacksCount << " elements took " << timespan.count() << " seconds." << std::endl;
std::cout << "Starting strstr approach" << std::endl;
start = high_resolution_clock::now();
for(size_t i=0; i<haystacksCount; ++i)
{
if(strstr(haystacks[i].c_str(), needle.c_str()))
{
sink = !sink; // useless action
}
}
end = high_resolution_clock::now();
timespan = duration_cast<duration<double>>(end-start);
std::cout << "Processing of " << haystacksCount << " elements took " << timespan.count() << " seconds." << std::endl;
std::cout << "Starting std::search approach" << std::endl;
start = high_resolution_clock::now();
for(size_t i=0; i<haystacksCount; ++i)
{
if(std::search(haystacks[i].begin(), haystacks[i].end(), needle.begin(), needle.end()) != haystacks[i].end())
{
sink = !sink; // useless action
}
}
end = high_resolution_clock::now();
timespan = duration_cast<duration<double>>(end-start);
std::cout << "Processing of " << haystacksCount << " elements took " << timespan.count() << " seconds." << std::endl;
}
return 0;
}
std::string randomString( size_t len)
{
static const char charset[] = "abcdefghijklmnopqrstuvwxyz";
static const int charsetLen = sizeof(charset) - 1;
static std::default_random_engine rng(std::random_device{}());
static std::uniform_int_distribution<> dist(0, charsetLen - 1); // bounds are inclusive
auto randChar = []() -> char // statics need no capture
{
return charset[ dist(rng) ];
};
std::string result(len, 0);
std::generate_n(result.begin(), len, randChar);
return result;
}
Here I generate random haystacks and search them for the needle. The haystack count is fixed, but the length of each haystack string increases from 10 at the beginning to 10240 at the end. Most of the time the program is actually generating random strings, but that is to be expected.
The output is:
Generating 200000 random haystacks of size 10
Starting std::string.find approach
Processing of 200000 elements took 0.00358503 seconds.
Starting strstr approach
Processing of 200000 elements took 0.0022727 seconds.
Starting std::search approach
Processing of 200000 elements took 0.0346258 seconds.
Generating 200000 random haystacks of size 20
Starting std::string.find approach
Processing of 200000 elements took 0.00480959 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00236199 seconds.
Starting std::search approach
Processing of 200000 elements took 0.0586416 seconds.
Generating 200000 random haystacks of size 40
Starting std::string.find approach
Processing of 200000 elements took 0.0082571 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00341435 seconds.
Starting std::search approach
Processing of 200000 elements took 0.0952996 seconds.
Generating 200000 random haystacks of size 80
Starting std::string.find approach
Processing of 200000 elements took 0.0148288 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00399263 seconds.
Starting std::search approach
Processing of 200000 elements took 0.175945 seconds.
Generating 200000 random haystacks of size 160
Starting std::string.find approach
Processing of 200000 elements took 0.0293496 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00504251 seconds.
Starting std::search approach
Processing of 200000 elements took 0.343452 seconds.
Generating 200000 random haystacks of size 320
Starting std::string.find approach
Processing of 200000 elements took 0.0522893 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00850485 seconds.
Starting std::search approach
Processing of 200000 elements took 0.64133 seconds.
Generating 200000 random haystacks of size 640
Starting std::string.find approach
Processing of 200000 elements took 0.102082 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00925799 seconds.
Starting std::search approach
Processing of 200000 elements took 1.26321 seconds.
Generating 200000 random haystacks of size 1280
Starting std::string.find approach
Processing of 200000 elements took 0.208057 seconds.
Starting strstr approach
Processing of 200000 elements took 0.0105039 seconds.
Starting std::search approach
Processing of 200000 elements took 2.57404 seconds.
Generating 200000 random haystacks of size 5120
Starting std::string.find approach
Processing of 200000 elements took 0.798496 seconds.
Starting strstr approach
Processing of 200000 elements took 0.0137969 seconds.
Starting std::search approach
Processing of 200000 elements took 10.3573 seconds.
Generating 200000 random haystacks of size 10240
Starting std::string.find approach
Processing of 200000 elements took 1.58171 seconds.
Starting strstr approach
Processing of 200000 elements took 0.0143111 seconds.
Starting std::search approach
Processing of 200000 elements took 20.4163 seconds.

If the strings are relatively big (hundreds of bytes or more) and C++17 is available, you might want to use a Boyer-Moore searcher (std::boyer_moore_searcher; the Boyer-Moore-Horspool variant is std::boyer_moore_horspool_searcher). Example from cppreference.com:
#include <iostream>
#include <string>
#include <algorithm>
#include <functional>
int main()
{
std::string in = "Lorem ipsum dolor sit amet, consectetur adipiscing elit,"
" sed do eiusmod tempor incididunt ut labore et dolore magna aliqua";
std::string needle = "pisci";
auto it = std::search(in.begin(), in.end(),
std::boyer_moore_searcher(
needle.begin(), needle.end()));
if(it != in.end())
std::cout << "The string " << needle << " found at offset "
<< it - in.begin() << '\n';
else
std::cout << "The string " << needle << " not found\n";
}
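For comparison, here is a minimal sketch using the Horspool variant, std::boyer_moore_horspool_searcher, which also lives in <functional> since C++17. The helper name countMatches is hypothetical and not part of the answer above; the point it illustrates is that the searcher can be built once and reused across many haystacks.

```cpp
#include <algorithm>   // std::search
#include <functional>  // std::boyer_moore_horspool_searcher (C++17)
#include <string>
#include <vector>

// Hypothetical helper: counts how many haystacks contain the needle.
// The searcher preprocesses the needle once and is reused for every haystack.
std::size_t countMatches(const std::vector<std::string>& haystacks,
                         const std::string& needle)
{
    const std::boyer_moore_horspool_searcher searcher(needle.begin(), needle.end());
    std::size_t matches = 0;
    for (const std::string& haystack : haystacks) {
        // std::search returns haystack.end() when the needle is absent
        if (std::search(haystack.begin(), haystack.end(), searcher) != haystack.end())
            ++matches;
    }
    return matches;
}
```

Whether this beats std::string::find or strstr depends on the needle length and the standard library implementation, so it is worth measuring on your own data.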

If you don't want to use standard library functions, below is one solution.
#include <iostream>
#include <string>
bool CheckSubstring(const std::string& firstString, const std::string& secondString){
if(secondString.size() > firstString.size())
return false;
// Try every start position that leaves room for the whole second string
for (std::size_t i = 0; i + secondString.size() <= firstString.size(); i++){
std::size_t j = 0;
// Check the bound on j first, then compare the characters
while (j < secondString.size() && firstString[i + j] == secondString[j])
j++;
// All characters matched (an empty second string matches immediately)
if (j == secondString.size())
return true;
}
return false;
}
int main(){
std::string firstString, secondString;
std::cout << "Enter first string:";
std::getline(std::cin, firstString);
std::cout << "Enter second string:";
std::getline(std::cin, secondString);
if(CheckSubstring(firstString, secondString))
std::cout << "Second string is a substring of the first string.\n";
else
std::cout << "Second string is not a substring of the first string.\n";
return 0;
}

std::regex_search is also a good option, and a stepping stone toward making the search more generic. Below is an example with comments.
//THE STRING IN WHICH THE SUBSTRING TO BE FOUND.
std::string testString = "Find Something In This Test String";
//THE SUBSTRING TO BE FOUND.
auto pattern{ "In This Test" };
//std::regex_constants::icase - TO IGNORE CASE.
auto rx = std::regex{ pattern,std::regex_constants::icase };
//SEARCH THE STRING.
bool isStrExists = std::regex_search(testString, rx);
This requires #include <regex>.
Suppose the input string might instead be something like "Find Something In This Example String", and you want to match either "In This Test" or "In This Example". The search can then be extended simply by adjusting the pattern:
//THE SUBSTRING TO BE FOUND.
auto pattern{ "In This (Test|Example)" };
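Put together, the regex approach could look like the following sketch. The function name containsPhrase is hypothetical; the pattern and the icase flag are the ones used in the snippets above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if text contains "In This Test" or
// "In This Example", ignoring case.
bool containsPhrase(const std::string& text)
{
    // Built once; constructing a std::regex is comparatively expensive.
    static const std::regex rx{ "In This (Test|Example)",
                                std::regex_constants::icase };
    return std::regex_search(text, rx);
}
```

Keep in mind that the needle is interpreted as a regular expression here, so characters such as '.', '(' or '*' in a literal search string would need to be escaped first.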

What about std::string::find?
string response = "hello world";
string findMe = "world";
if(response.find(findMe) != string::npos)
{
//found
}
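Since find returns the position of the match rather than a bool, the same call can also report where the needle occurs. A minimal sketch, assuming overlapping matches should be reported and an empty needle should yield no matches (findAll is a hypothetical name):

```cpp
#include <string>
#include <vector>

// Hypothetical helper: start offsets of all (possibly overlapping)
// occurrences of needle in text.
std::vector<std::size_t> findAll(const std::string& text, const std::string& needle)
{
    std::vector<std::size_t> positions;
    if (needle.empty())
        return positions; // assumption: an empty needle reports no matches
    // find(needle, from) resumes the search at offset from
    for (std::size_t pos = text.find(needle); pos != std::string::npos;
         pos = text.find(needle, pos + 1))
        positions.push_back(pos);
    return positions;
}
```

For example, findAll("hello world, world", "world") yields the offsets 6 and 13.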

#include <algorithm> // std::search
#include <string>
using std::search; using std::string;
int main() {
const string mystring = "The needle in the haystack";
const string str = "needle";
// if string is found... search returns an iterator to str's first element in mystring
// if string is not found... search returns mystring.end()
string::const_iterator it = search(mystring.begin(), mystring.end(),
str.begin(), str.end());
if (it != mystring.end()) {
// string is found
} else {
// not found
}
return 0;
}

None of the many answers on this site gave me a clear answer, so I spent 5-10 minutes working it out myself.
There are two cases:
Either you KNOW the position of the substring you are looking for in the string,
or you don't know the position and have to search for it, char by char.
So, let's assume we search for the substring "cd" in the string "abcde", using the simple built-in substr function in C++.
For case 1:
#include <iostream>
#include <string>
using namespace std;
int i;
int main()
{
string a = "abcde";
string b = a.substr(2,2); // 2 will be c. Why? because we start counting from 0 in a string, not from 1.
cout << "substring of a is: " << b << endl;
return 0;
}
For case 2:
#include <iostream>
#include <string>
using namespace std;
int i;
int main()
{
string a = "abcde";
for (i=0;i<a.length(); i++)
{
if (a.substr(i,2) == "cd")
{
cout << "substring of a is: " << a.substr(i,2) << endl; // i iterates from 0 to 4; the substring is displayed only when the condition is fulfilled
}
}
return 0;
}
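The loop in case 2 creates a temporary string with substr on every iteration. A possible variation, sketched here under the assumption that only the first match is wanted, compares in place with std::string::compare instead (indexOf is a hypothetical name):

```cpp
#include <string>

// Hypothetical helper: index of the first occurrence of needle in text,
// or std::string::npos if there is none.
std::size_t indexOf(const std::string& text, const std::string& needle)
{
    if (needle.size() > text.size())
        return std::string::npos;
    // Only start positions that leave room for the whole needle are tried
    for (std::size_t i = 0; i + needle.size() <= text.size(); ++i) {
        // compare(pos, len, str) compares text[pos, pos + len) with str
        // without building a temporary substring
        if (text.compare(i, needle.size(), needle) == 0)
            return i;
    }
    return std::string::npos;
}
```

With the strings from the example, indexOf("abcde", "cd") returns 2.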

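Here is a simple function that searches character by character:
bool find(const string& line, const string& sWord)
{
if (sWord.empty())
{
return true; // the empty word is found in any line
}
size_t index = 0; // number of characters of sWord matched so far
for (size_t i = 0; i < line.size(); i++)
{
if (sWord.at(index) == line.at(i))
{
index++;
if (index == sWord.size())
{
return true;
}
}
else
{
// Mismatch: step back so the next attempt starts one character
// after the start of the failed one (needed for cases like "ab" in "aab")
i -= index;
index = 0;
}
}
return false;
}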

If you are using C++/CLI (.NET), you can also use the System namespace.
Then you can use the Contains method.
#include <iostream>
using namespace System;
int main(){
String ^ wholeString = "My name is Malindu";
if(wholeString->ToLower()->Contains("malindu")){
std::cout<<"Found";
}
else{
std::cout<<"Not Found";
}
}

Note: I know the question asks for a function, which suggests the user is looking for something simpler. I am posting this anyway in case anyone finds it useful.
This approach uses a Suffix Automaton. It is built once from a string (the haystack); after that you can run hundreds of thousands of queries (needles) and each response will be very fast, even if the haystack and/or the needles are very long strings.
Read about the data structure being used here: https://en.wikipedia.org/wiki/Suffix_automaton
#include <bits/stdc++.h>
using namespace std;
struct State {
int len, link;
map<char, int> next;
};
struct SuffixAutomaton {
vector<State> st;
int sz = 1, last = 0;
SuffixAutomaton(string& s) {
st.assign(s.size() * 2 + 1, State()); // +1 so the root state exists even for an empty string
st[0].len = 0;
st[0].link = -1;
for (char c : s) extend(c);
}
void extend(char c) {
int cur = sz++, p = last;
st[cur].len = st[last].len + 1;
while (p != -1 && !st[p].next.count(c)) st[p].next[c] = cur, p = st[p].link;
if (p == -1)
st[cur].link = 0;
else {
int q = st[p].next[c];
if (st[p].len + 1 == st[q].len)
st[cur].link = q;
else {
int clone = sz++;
st[clone].len = st[p].len + 1;
st[clone].next = st[q].next;
st[clone].link = st[q].link;
while (p != -1 && st[p].next[c] == q) st[p].next[c] = clone, p = st[p].link;
st[q].link = st[cur].link = clone;
}
}
last = cur;
}
};
bool is_substring(SuffixAutomaton& sa, string& query) {
int curr = 0;
for (char c : query)
if (sa.st[curr].next.count(c))
curr = sa.st[curr].next[c];
else
return false;
return true;
}
// How to use:
// Execute the code
// Type the first string so the program reads it. This will be the string
// to search substrings on.
// After that, type a substring. When pressing enter you'll get the message showing the
// result. Continue typing substrings.
int main() {
string S;
cin >> S;
SuffixAutomaton sa(S);
string query;
while (cin >> query) {
cout << "is substring? -> " << is_substring(sa, query) << endl;
}
}

We can use this method instead.
Here is an example from one of my projects; it includes a few extras.
Pay attention to the if statements!
/*
Every C++ program should have an entry point. Usually, this is the main function.
Every C++ Statement ends with a ';' (semi-colon)
But, pre-processor statements do not have ';'s at end.
Also, every console program can be ended using "cin.get();" statement, so that the console won't exit instantly.
*/
#include <string>
#include <bits/stdc++.h> //Non-standard GCC header that pulls in iostream and algorithm (needed for transform).
using namespace std;
int main(){ //The main function. This runs first in every program.
string input;
while(cin>>input){ //Stops when the input ends.
transform(input.begin(),input.end(),input.begin(),::tolower); //Converts to lowercase.
if(input=="exit") break; //Typing "exit" (in any case) ends the program.
if(input.find("name") != std::string::npos){ //Gets a boolean value regarding the availability of the said text.
cout<<"My Name is AI \n";
}
if(input.find("age") != std::string::npos){
cout<<"My Age is 2 minutes \n";
}
}
}


string manipulation - character counting - c++

int cnt[128] = {0}; // one counter per ASCII code ('&' = 38, '|' = 124); assumes S holds only ASCII characters
for(int i = start; i < end; i++) {
cnt[S.at(i)]++;
}
cout << cnt['&'] << endl;
An if is expensive because it compares and branches, so counting every character without a branch may be faster.

You can use std::count from the standard C++ algorithm library. Just include the header <algorithm>:
std::string s{"&|&&|&&&|&"};
// https://en.cppreference.com/w/cpp/algorithm/count
auto const count = std::count(s.begin() + 1 // starting index
,s.begin() + 8 // one-past-the-end index
,'&');
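If the time limit is hit because many (start, end) queries are made against the same string, a different technique from the ones above may help: precompute prefix counts once, then answer each query with a subtraction. This is only a sketch under that assumption; buildPrefixCounts and countInRange are hypothetical names, and the range is half-open like the loop in the question.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: prefix[i] is the number of target characters in s[0, i).
std::vector<std::size_t> buildPrefixCounts(const std::string& s, char target)
{
    std::vector<std::size_t> prefix(s.size() + 1, 0);
    for (std::size_t i = 0; i < s.size(); ++i)
        prefix[i + 1] = prefix[i] + (s[i] == target ? 1 : 0);
    return prefix;
}

// Number of target characters in the half-open range [start, end).
// Assumes start <= end <= s.size() for the string the table was built from.
std::size_t countInRange(const std::vector<std::size_t>& prefix,
                         std::size_t start, std::size_t end)
{
    return prefix[end] - prefix[start];
}
```

For S = "&|&&|&&&|&", building the table for '&' and calling countInRange(prefix, 1, 8) gives 5, matching the expected output in the question. Building the table is linear in the string length, and each query afterwards takes constant time.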
