I have a variable of type std::string. I want to check if it contains a certain std::string. How would I do that?
Is there a function that returns true if the string is found, and false if it isn't?
Use std::string::find as follows:
if (s1.find(s2) != std::string::npos) {
std::cout << "found!" << '\n';
}
Note: "found!" is printed if s2 is a substring of s1; both s1 and s2 are of type std::string.
You can try using the find function:
string str ("There are two needles in this haystack.");
string str2 ("needle");
if (str.find(str2) != string::npos) {
//.. found.
}
Starting with C++23, you can use std::string::contains:
#include <string>
const auto haystack = std::string("haystack with needles");
const auto needle = std::string("needle");
if (haystack.contains(needle))
{
// found!
}
You can also use the Boost library; std::string does not offer methods for every common string operation. With Boost, you can use boost::algorithm::contains:
#include <iostream>
#include <string>
#include <boost/algorithm/string.hpp>
int main() {
std::string s("gengjiawen");
std::string t("geng");
bool b = boost::algorithm::contains(s, t);
std::cout << b << std::endl;
return 0;
}
You can try this
string s1 = "Hello";
string s2 = "el";
if(strstr(s1.c_str(),s2.c_str()))
{
cout << " S1 Contains S2";
}
If this functionality is performance-critical for your system, it can actually be beneficial to use the old strstr function. In my measurements, std::search from <algorithm> was the slowest of the three approaches. My guess is that it spends a lot of time creating those iterators.
The code that I used to time the whole thing is:
#include <string>
#include <cstring>
#include <iostream>
#include <algorithm>
#include <random>
#include <chrono>
#include <vector>
std::string randomString( size_t len );
int main(int argc, char* argv[])
{
using namespace std::chrono;
const size_t haystacksCount = 200000;
std::vector<std::string> haystacks(haystacksCount); // heap storage; a local array this large can overflow the stack
std::string needle = "hello";
bool sink = true;
high_resolution_clock::time_point start, end;
duration<double> timespan;
int sizes[10] = { 10, 20, 40, 80, 160, 320, 640, 1280, 5120, 10240 };
for(int s=0; s<10; ++s)
{
std::cout << std::endl << "Generating " << haystacksCount << " random haystacks of size " << sizes[s] << std::endl;
for(size_t i=0; i<haystacksCount; ++i)
{
haystacks[i] = randomString(sizes[s]);
}
std::cout << "Starting std::string.find approach" << std::endl;
start = high_resolution_clock::now();
for(size_t i=0; i<haystacksCount; ++i)
{
if(haystacks[i].find(needle) != std::string::npos)
{
sink = !sink; // useless action
}
}
end = high_resolution_clock::now();
timespan = duration_cast<duration<double>>(end-start);
std::cout << "Processing of " << haystacksCount << " elements took " << timespan.count() << " seconds." << std::endl;
std::cout << "Starting strstr approach" << std::endl;
start = high_resolution_clock::now();
for(size_t i=0; i<haystacksCount; ++i)
{
if(strstr(haystacks[i].c_str(), needle.c_str()))
{
sink = !sink; // useless action
}
}
end = high_resolution_clock::now();
timespan = duration_cast<duration<double>>(end-start);
std::cout << "Processing of " << haystacksCount << " elements took " << timespan.count() << " seconds." << std::endl;
std::cout << "Starting std::search approach" << std::endl;
start = high_resolution_clock::now();
for(size_t i=0; i<haystacksCount; ++i)
{
if(std::search(haystacks[i].begin(), haystacks[i].end(), needle.begin(), needle.end()) != haystacks[i].end())
{
sink = !sink; // useless action
}
}
end = high_resolution_clock::now();
timespan = duration_cast<duration<double>>(end-start);
std::cout << "Processing of " << haystacksCount << " elements took " << timespan.count() << " seconds." << std::endl;
}
return 0;
}
std::string randomString( size_t len)
{
static const char charset[] = "abcdefghijklmnopqrstuvwxyz";
static const int charsetLen = sizeof(charset) - 1;
static std::default_random_engine rng(std::random_device{}());
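// Bounds are inclusive: charsetLen itself would select the terminating '\0',
// and an embedded '\0' makes strstr stop early.
static std::uniform_int_distribution<> dist(0, charsetLen - 1);
// charset, dist and rng are static, so the lambda needs no captures
auto randChar = []() -> char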
{
return charset[ dist(rng) ];
};
std::string result(len, 0);
std::generate_n(result.begin(), len, randChar);
return result;
}
Here I generate random haystacks and search each of them for the needle. The number of haystacks is fixed, but the length of each haystack string increases from 10 at the start to 10240 at the end. The program spends most of its time generating the random strings, but that is to be expected.
The output is:
Generating 200000 random haystacks of size 10
Starting std::string.find approach
Processing of 200000 elements took 0.00358503 seconds.
Starting strstr approach
Processing of 200000 elements took 0.0022727 seconds.
Starting std::search approach
Processing of 200000 elements took 0.0346258 seconds.
Generating 200000 random haystacks of size 20
Starting std::string.find approach
Processing of 200000 elements took 0.00480959 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00236199 seconds.
Starting std::search approach
Processing of 200000 elements took 0.0586416 seconds.
Generating 200000 random haystacks of size 40
Starting std::string.find approach
Processing of 200000 elements took 0.0082571 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00341435 seconds.
Starting std::search approach
Processing of 200000 elements took 0.0952996 seconds.
Generating 200000 random haystacks of size 80
Starting std::string.find approach
Processing of 200000 elements took 0.0148288 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00399263 seconds.
Starting std::search approach
Processing of 200000 elements took 0.175945 seconds.
Generating 200000 random haystacks of size 160
Starting std::string.find approach
Processing of 200000 elements took 0.0293496 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00504251 seconds.
Starting std::search approach
Processing of 200000 elements took 0.343452 seconds.
Generating 200000 random haystacks of size 320
Starting std::string.find approach
Processing of 200000 elements took 0.0522893 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00850485 seconds.
Starting std::search approach
Processing of 200000 elements took 0.64133 seconds.
Generating 200000 random haystacks of size 640
Starting std::string.find approach
Processing of 200000 elements took 0.102082 seconds.
Starting strstr approach
Processing of 200000 elements took 0.00925799 seconds.
Starting std::search approach
Processing of 200000 elements took 1.26321 seconds.
Generating 200000 random haystacks of size 1280
Starting std::string.find approach
Processing of 200000 elements took 0.208057 seconds.
Starting strstr approach
Processing of 200000 elements took 0.0105039 seconds.
Starting std::search approach
Processing of 200000 elements took 2.57404 seconds.
Generating 200000 random haystacks of size 5120
Starting std::string.find approach
Processing of 200000 elements took 0.798496 seconds.
Starting strstr approach
Processing of 200000 elements took 0.0137969 seconds.
Starting std::search approach
Processing of 200000 elements took 10.3573 seconds.
Generating 200000 random haystacks of size 10240
Starting std::string.find approach
Processing of 200000 elements took 1.58171 seconds.
Starting strstr approach
Processing of 200000 elements took 0.0143111 seconds.
Starting std::search approach
Processing of 200000 elements took 20.4163 seconds.
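If the strings are relatively big (hundreds of bytes or more) and C++17 is available, you might want to use a Boyer-Moore searcher; the standard library offers std::boyer_moore_searcher and the Boyer-Moore-Horspool variant std::boyer_moore_horspool_searcher (example from cppreference.com):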
#include <iostream>
#include <string>
#include <algorithm>
#include <functional>
int main()
{
std::string in = "Lorem ipsum dolor sit amet, consectetur adipiscing elit,"
" sed do eiusmod tempor incididunt ut labore et dolore magna aliqua";
std::string needle = "pisci";
auto it = std::search(in.begin(), in.end(),
std::boyer_moore_searcher(
needle.begin(), needle.end()));
if(it != in.end())
std::cout << "The string " << needle << " found at offset "
<< it - in.begin() << '\n';
else
std::cout << "The string " << needle << " not found\n";
}
If you don't want to use standard library functions, below is one solution.
#include <iostream>
#include <string>
// Returns true if secondString occurs anywhere inside firstString.
bool CheckSubstring(const std::string& firstString, const std::string& secondString){
    if(secondString.size() > firstString.size())
        return false;
    // Try every start position that leaves room for the whole second string
    for (std::size_t i = 0; i + secondString.size() <= firstString.size(); i++){
        std::size_t j = 0;
        // Check the bound before indexing, then compare character by character
        while (j < secondString.size() && firstString[i + j] == secondString[j])
            j++;
        if (j == secondString.size())
            return true;
    }
    return false;
}
int main(){
std::string firstString, secondString;
std::cout << "Enter first string:";
std::getline(std::cin, firstString);
std::cout << "Enter second string:";
std::getline(std::cin, secondString);
if(CheckSubstring(firstString, secondString))
std::cout << "Second string is a substring of the first string.\n";
else
std::cout << "Second string is not a substring of the first string.\n";
return 0;
}
std::regex_search is also a good option, and a stepping stone toward making the search more generic. Below is an example with comments.
//THE STRING IN WHICH THE SUBSTRING TO BE FOUND.
std::string testString = "Find Something In This Test String";
//THE SUBSTRING TO BE FOUND.
auto pattern{ "In This Test" };
//std::regex_constants::icase - TO IGNORE CASE.
auto rx = std::regex{ pattern,std::regex_constants::icase };
//SEARCH THE STRING.
bool isStrExists = std::regex_search(testString, rx);
This requires #include <regex>.
Suppose the input string is instead something like "Find Something In This Example String", and you want to match either "In This Test" or "In This Example". The search can then be extended simply by adjusting the pattern as shown below.
//THE SUBSTRING TO BE FOUND.
auto pattern{ "In This (Test|Example)" };
What about this:
string response = "hello world";
string findMe = "world";
if(response.find(findMe) != string::npos)
{
//found
}
#include <algorithm> // std::search
#include <iostream>
#include <string>
using std::search; using std::string;
int main() {
    string mystring = "The needle in the haystack";
    string str = "needle";
    // if string is found... search returns an iterator to str's first element in mystring
    // if string is not found... search returns mystring.cend()
    string::const_iterator it = search(mystring.cbegin(), mystring.cend(),
                                       str.cbegin(), str.cend());
    if (it != mystring.cend())
        std::cout << "found\n";     // string is found
    else
        std::cout << "not found\n"; // not found
    return 0;
}
Among the many answers on this site I didn't find a clear one, so I worked it out myself in 5-10 minutes.
There are two cases:
1. You know the position of the substring you are looking for in the string.
2. You don't know the position and have to search for it, character by character.
So, let's assume we search for the substring "cd" in the string "abcde", and we use the simple built-in substr function in C++.
for 1:
#include <iostream>
#include <string>
using namespace std;
int i;
int main()
{
string a = "abcde";
string b = a.substr(2,2); // 2 will be c. Why? because we start counting from 0 in a string, not from 1.
cout << "substring of a is: " << b << endl;
return 0;
}
for 2:
#include <iostream>
#include <string>
using namespace std;
int i;
int main()
{
string a = "abcde";
for (i=0;i<a.length(); i++)
{
if (a.substr(i,2) == "cd")
{
cout << "substring of a is: " << a.substr(i,2) << endl; // i iterates from 0 to 4 and the substring is displayed only when the condition is fulfilled
}
}
return 0;
}
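This is a simple function:
// Returns true if sWord occurs anywhere inside line.
bool find(const string& line, const string& sWord)
{
    if (sWord.size() > line.size())
    {
        return false;
    }
    // try every start position that leaves room for the whole word
    for (size_t start = 0; start + sWord.size() <= line.size(); start++)
    {
        size_t index = 0;
        // advance while the characters match; restart from the next
        // position on a mismatch so overlapping prefixes are not missed
        while (index < sWord.size() && line.at(start + index) == sWord.at(index))
        {
            index++;
        }
        if (index == sWord.size())
        {
            return true;
        }
    }
    return false;
}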
If you are using C++/CLI (Microsoft's .NET extension of C++, not standard C++), you can also use the System namespace.
Then you can use the String::Contains method.
#include <iostream>
using namespace System;
int main(){
String ^ wholeString = "My name is Malindu";
if(wholeString->ToLower()->Contains("malindu")){
std::cout<<"Found";
}
else{
std::cout<<"Not Found";
}
}
Note: I know that the question requires a function, which means the user is trying to find something simpler. But still I post it in case anyone finds it useful.
Approach using a Suffix Automaton. It accepts a string (haystack), and after that you can input hundreds of thousands of queries (needles) and the response will be very fast, even if the haystack and/or needles are very long strings.
Read about the data structure being used here: https://en.wikipedia.org/wiki/Suffix_automaton
#include <bits/stdc++.h>
using namespace std;
struct State {
int len, link;
map<char, int> next;
};
struct SuffixAutomaton {
vector<State> st;
int sz = 1, last = 0;
SuffixAutomaton(string& s) {
st.assign(max<size_t>(1, s.size() * 2), State()); // always room for the initial state, even for an empty string
st[0].len = 0;
st[0].link = -1;
for (char c : s) extend(c);
}
void extend(char c) {
int cur = sz++, p = last;
st[cur].len = st[last].len + 1;
while (p != -1 && !st[p].next.count(c)) st[p].next[c] = cur, p = st[p].link;
if (p == -1)
st[cur].link = 0;
else {
int q = st[p].next[c];
if (st[p].len + 1 == st[q].len)
st[cur].link = q;
else {
int clone = sz++;
st[clone].len = st[p].len + 1;
st[clone].next = st[q].next;
st[clone].link = st[q].link;
while (p != -1 && st[p].next[c] == q) st[p].next[c] = clone, p = st[p].link;
st[q].link = st[cur].link = clone;
}
}
last = cur;
}
};
bool is_substring(SuffixAutomaton& sa, string& query) {
int curr = 0;
for (char c : query)
if (sa.st[curr].next.count(c))
curr = sa.st[curr].next[c];
else
return false;
return true;
}
// How to use:
// Execute the code
// Type the first string so the program reads it. This will be the string
// to search substrings on.
// After that, type a substring. When pressing enter you'll get the message showing the
// result. Continue typing substrings.
int main() {
string S;
cin >> S;
SuffixAutomaton sa(S);
string query;
while (cin >> query) {
cout << "is substring? -> " << is_substring(sa, query) << endl;
}
}
We can also use std::string::find for this.
The example below is from one of my projects and includes some extras; the relevant part is the if statements.
/*
Every C++ program should have an entry point. Usually, this is the main function.
Every C++ Statement ends with a ';' (semi-colon)
But, pre-processor statements do not have ';'s at end.
Also, every console program can be ended using "cin.get();" statement, so that the console won't exit instantly.
*/
#include <string>
#include <bits/stdc++.h> //Can Use instead of iostream. Also should be included to use the transform function.
using namespace std;
int main(){ //The main function. This runs first in every program.
string input;
while(input!="exit"){
cin>>input;
transform(input.begin(),input.end(),input.begin(),::tolower); //Converts to lowercase.
if(input.find("name") != std::string::npos){ //Gets a boolean value regarding the availability of the said text.
cout<<"My Name is AI \n";
}
if(input.find("age") != std::string::npos){
cout<<"My Age is 2 minutes \n";
}
}
}
Related
I have a string S = "&|&&|&&&|&" where we should get the number of '&' between 2 indexes of the string.
So the output with the 2 indexes 1 and 8 here should be 5. And here's my brute force style code:
std::size_t cnt = 0;
for(i = start; i < end; i++) {
if (S[i] == '&')
cnt++;
}
cout << cnt << endl;
The problem I faced was my code was getting timed out for larger inputs in a coding platform. Can anyone suggest a better way to reduce the time complexity here?
I decided to try several approaches, including the ones proposed by the other two answers to this question. I made several assumptions about the input, with the goal to find a fast implementation for a single large string that would only be searched once for a single character. For a string that will have multiple queries made against it for more than one character, I suggest building a segment tree as suggested in a comment by user Jefferson Rondan.
I used std::chrono::steady_clock::now() to measure implementation times.
Assumptions
The program prompts the user for a string size, search character, start index, and end index.
The inputs are well formed (start <= end <= size).
The string is randomly generated from a uniform distribution of ascii characters between ' ' and '~'.
The underlying data in the string object is stored contiguously in memory.
Approaches
Naive for loop: an index variable is incremented, and the string is indexed, character by character, using the index.
Iterator loop: a string iterator is used, dereferenced at each iteration, and compared to the search character.
Underlying data pointer: a pointer to the underlying character array of the string is found, and this is incremented in a loop. The dereferenced pointer is compared to the search character.
Index mapping (as suggested by GyuHyeon Choi): An integer array with one element per character code, up to the largest printable ASCII character, is initialized to 0, and for each character encountered while iterating through the string, the corresponding element is incremented by one. At the end, the element at the index of the search character holds how many of that character were found.
Just use std::count (as suggested by Atul Sharma): Just use the built-in counting functionality.
Recast the underlying data as a pointer to a larger data type and iterate: the underlying const char* const pointer that holds the string data is reinterpreted as a pointer to a wider data type (in this case a pointer to type uint64_t). Each dereferenced uint64_t is then XOR'ed with a mask made up of the search character, and each byte of the uint64_t masked with 0xff. This reduces the number of pointer increments needed to step through the entire array.
Results
For a 1,000,000,000 size string searching from index 5 to 999999995, the results of each method follow:
Naive for loop: 843 ms
Iterator loop: 818 ms
Underlying data pointer: 750 ms
Index mapping (as suggested by GyuHyeon Choi): 929 ms
Just use std::count (as suggested by Atul Sharma): 819 ms
Recast the underlying data as a pointer to a larger data type and iterate: 664 ms
Discussion
The best performing implementation was my own data pointer recast, which completed in a little over 75% of the time it took for the naive solution. The fastest "simple" solution is pointer iteration over the underlying data structure. This method has the benefit of being easy to implement, understand, and maintain. The index mapping method, despite being marketed as 2x faster than the naive solution, didn't see such speedups on my benchmarks. The std::count method is about as fast as the by-hand pointer iteration, and even simpler to implement. If speed really matters, consider recasting the underlying pointer. Otherwise, stick with std::count.
The Code
#include <algorithm>
#include <iostream>
#include <random>
#include <string>
#include <functional>
#include <typeinfo>
#include <chrono>
#include <cstdint> // uint64_t, std::uintptr_t
int main(int argc, char** argv)
{
std::random_device device;
std::mt19937 generator(device());
std::uniform_int_distribution<short> short_distribution(' ', '~');
auto next_short = std::bind(short_distribution, generator);
std::string random_string = "";
size_t string_size;
size_t start_search_index;
size_t end_search_index;
char search_char;
std::cout << "String size: ";
std::cin >> string_size;
std::cout << "Search char: ";
std::cin >> search_char;
std::cout << "Start search index: ";
std::cin >> start_search_index;
std::cout << "End search index: ";
std::cin >> end_search_index;
if (!(start_search_index <= end_search_index && end_search_index <= string_size))
{
std::cout << "Requires start_search <= end_search <= string_size\n";
return 0;
}
for (size_t i = 0; i < string_size; i++)
{
random_string += static_cast<char>(next_short());
}
// naive implementation
size_t count = 0;
auto start_time = std::chrono::steady_clock::now();
for (size_t i = start_search_index; i < end_search_index; i++)
{
if (random_string[i] == search_char)
count++;
}
auto end_time = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Naive implementation. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// Iterator solution
count = 0;
start_time = std::chrono::steady_clock::now();
for (auto it = random_string.begin() + start_search_index, end = random_string.begin() + end_search_index;
it != end;
it++)
{
if (*it == search_char)
count++;
}
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Iterator solution. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// Iterate on data
count = 0;
start_time = std::chrono::steady_clock::now();
for (auto it = random_string.data() + start_search_index,
end = random_string.data() + end_search_index;
it != end; it++)
{
if (*it == search_char)
count++;
}
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Iterate on underlying data solution. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// use index mapping
count = 0;
size_t count_array['~' + 1]{ 0 }; // one slot per character code up to and including '~'
start_time = std::chrono::steady_clock::now();
for (size_t i = start_search_index; i < end_search_index; i++)
{
count_array[random_string.at(i)]++;
}
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
count = count_array[search_char];
std::cout << "Using index mapping. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// using std::count
count = 0;
start_time = std::chrono::steady_clock::now();
count = std::count(random_string.begin() + start_search_index
, random_string.begin() + end_search_index
, search_char);
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Using std::count. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
// Iterate on larger type than underlying char
count = end_search_index - start_search_index;
start_time = std::chrono::steady_clock::now();
// Iterate through the underlying data byte by byte until the address is 8-byte aligned
{
auto it = random_string.data() + start_search_index;
auto end = random_string.data() + end_search_index;
// iterate until we reach a pointer that is divisible by 8
for (; (reinterpret_cast<std::uintptr_t>(it) & 0x07) && it != end; it++)
{
if (*it != search_char)
count--;
}
// iterate on 8-byte sized chunks until we reach the last full chunk that is 8-byte aligned
auto chunk_it = reinterpret_cast<const uint64_t* const>(it);
auto chunk_end = reinterpret_cast<const uint64_t* const>((reinterpret_cast<std::uintptr_t>(end)) & ~0x07);
uint64_t search_xor_mask = 0;
for (size_t i = 0; i < 64; i+=8)
{
search_xor_mask |= (static_cast<uint64_t>(search_char) << i);
}
constexpr uint64_t all_ones = 0xff;
for (; chunk_it < chunk_end; chunk_it++) // '<' rather than '!=': for a range shorter than one aligned chunk, chunk_end lies before chunk_it
{
auto chunk = (*chunk_it ^ search_xor_mask);
if (chunk & (all_ones << 56))
count--;
if (chunk & (all_ones << 48))
count--;
if (chunk & (all_ones << 40))
count--;
if (chunk & (all_ones << 32))
count--;
if (chunk & (all_ones << 24))
count--;
if (chunk & (all_ones << 16))
count--;
if (chunk & (all_ones << 8))
count--;
if (chunk & (all_ones << 0))
count--;
}
// iterate on the remainder of the bytes, should be no more than 7, tops
it = reinterpret_cast<decltype(it)>(chunk_it);
for (; it != end; it++)
{
if (*it != search_char)
count--;
}
}
end_time = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time);
std::cout << "Iterate on underlying data with larger step sizes. Found: " << count << "\n";
std::cout << "Elapsed time: " << duration.count() << "us.\n\n";
}
Example Output
String size: 1000000000
Search char: &
Start search index: 5
End search index: 999999995
Naive implementation. Found: 10527454
Elapsed time: 843179us.
Iterator solution. Found: 10527454
Elapsed time: 817762us.
Iterate on underlying data solution. Found: 10527454
Elapsed time: 749513us.
Using index mapping. Found: 10527454
Elapsed time: 928560us.
Using std::count. Found: 10527454
Elapsed time: 819412us.
Iterate on underlying data with larger step sizes. Found: 10527454
Elapsed time: 664338us.
int cnt[125] = {0}; // zero-initialized; ASCII '&' = 38, '|' = 124
cnt['&'] = 0;
for(int i = start; i < end; i++) {
cnt[S.at(i)]++;
}
cout << cnt['&'] << endl;
An if is comparatively expensive because it compares and branches, so counting by index without it may be faster.
You can use std::count from the C++ standard library.
Just include the <algorithm> header.
std::string s{"&|&&|&&&|&"};
// https://en.cppreference.com/w/cpp/algorithm/count
auto const count = std::count(s.begin() + 1 // starting index
,s.begin() + 8 // one past the end index
,'&');
std::count returns a value and I need this value to reset to 0 for all characters in the variable 'counter' after executing the inner for loop. Goal is to count how many times a character appears. If this character appears twice in the string, add one to variable 'd'. If it appears three times, add one to variable 'e'.
Not sure what else to try or if there is potentially a better function to achieve my result.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <cstring>
int main() {
std::string data;
std::vector<std::string> myString;
std::vector<char> myChar;
int d = 0, e = 0;
std::ifstream inFile;
inFile.open("C:\\Users\\Administrator\\Desktop\\c++ files\\input2.txt");
if (!inFile) {
std::cout << "oops";
}
for (int i = 0; i < 1; i++) {
inFile >> data;
std::copy(data.begin(), data.end(), std::back_inserter(myChar)); //copy from string data to vector myChar via back inserter.
char counter = 'a';
for (int i = 0; i < 26; i++) {
int myCount = std::count(myChar.begin(), myChar.end(), counter);
if (myCount == 2) {
d++;
}
else if (myCount == 3) {
e++;
}
std::cout << "Counter : " << counter << " myCount : " << myCount << "\n";
counter++;
}
}
std::cout << "d is: " << d << "\n";
std::cout << "e is: " << e << "\n";
return 0;
}
input file -- https://adventofcode.com/2018/day/2
The program works correctly on first inner for loop, but second and after return values that are too high (albeit correct) for the 'myCount' variable.
std::count doesn't just give you a random value; it gives you a specific value based on the contents of the range you give it. You can't change that behaviour, nor should you want to.
Instead, look at that range. Why does std::count give values that you don't expect? They are either "too high" or they are "correct" and cannot be both; fortunately they are the latter.
This is because you repeatedly append to the vector through std::back_inserter inside your loop. As the loop progresses, you keep counting the old characters from the last time!
If you cleared myChar first, you wouldn't have the problem. Or, ideally, bring the declaration of myChar inside the loop.
A few fixes
1) On error the program should end, not continue:
if (!inFile)
{
std::cout << "oops";
return 1;
}
2) a) myChar accumulates the chars of all previously read words, so it has to be cleared before use on every pass of the loop; it is best to move its declaration into the block where it is needed;
b) if a counter is only used to drive the loop and not for its value, it is better to iterate over the data itself; in this case, get rid of i and iterate directly over the characters with checked_char:
while (inFile >> data)
{
std::vector< char > myChar;
std::copy(data.begin(),
data.end(),
std::back_inserter(myChar)); //copy from string data to vector myChar via back inserter.
for (char checked_char = 'a'; checked_char <= 'z'; ++checked_char)
{
int myCount = std::count(myChar.begin(), myChar.end(), checked_char);
if (myCount == 2)
{
d++;
}
else if (myCount == 3)
{
e++;
}
std::cout << "Counter : " << checked_char << " myCount : " << myCount << "\n";
}
}
This program takes a word from text and puts it in a vector; after this it compares every element with the next one.
So I'm trying to compare elements of a vector like this:
sort(words.begin(), words.end());
int cc = 1;
int compte = 1;
int i;
//browse the vector
for (i = 0; i <= words.size(); i++) { // comparison
if (words[i] == words[cc]) {
compte = compte + 1;
}
else { // displaying the word with comparison
cout << words[i] << " Repeated : " << compte; printf("\n");
compte = 1; cc = i;
}
}
My problem is in the bounds: i+1 may exceed the vector borders. How do I handle this case?
You need to pay more attention to the initial conditions and bounds when you iterate and compare at the same time. It is usually a good idea to trace through your code with pen and paper first.
sort(words.begin(), words.end()); // make sure !words.empty()
int cc = 0; // index of the word we need to compare.
int compte = 1; // counting of the number of occurrence.
for( size_t i = 1; i < words.size(); ++i ){
// since you already count the first word, now we are at i=1
if( words[i] == words[cc] ){
compte += 1;
}else{
// words[i] is going to be different from words[cc].
cout << words[cc] << " Repeated : " << compte << '\n';
compte = 1;
cc = i;
}
}
// to output the last word with its repeat
cout << words[cc] << " Repeated : " << compte << '\n';
Just for some additional information.
There are better ways to count the number of word appearances.
For example, one can use unordered_map<string,int>.
Hope this helps.
C++ uses zero-based indexing, e.g., an array of length 5 has indices: {0, 1, 2, 3, 4}. This means that index 5 is outside of the range.
Similarly, given an array arr of characters:
char arr[] = {'a', 'b', 'c', 'd', 'e'};
The loop for (int i = 0; i <= std::size(arr); ++i) { arr[i]; } will cause a read from outside of the range when i is equal to the length of arr, which causes undefined behaviour. To avoid this the loop must stop before i is equal to the length of the array.
for (std::size_t i = 0; i < std::size(arr); ++i) { arr[i]; }
Also note the use of std::size_t as type of the index counter. This is common practice in C++.
Now, let's finish with an example of how much easier this can be done using the standard library.
std::sort(std::begin(words), std::end(words));
std::map<std::string, std::size_t> counts;
std::for_each(std::begin(words), std::end(words), [&] (const auto& w) { ++counts[w]; });
Output using:
for (auto&& [word, count] : counts) {
std::cout << word << ": " << count << std::endl;
}
My problem is in the bounds: i+1 may exceed the vector borders. How do I handle this case?
In modern C++ coding, the problem of an index going past vector bounds can be avoided. Use the STL containers and avoid using indices. With a little effort devoted to learning how to use containers this way, you should never see these kinds of 'off-by-one' errors again! As a benefit, the code becomes easier to understand and maintain.
#include <iostream>
#include <vector>
#include <map>
#include <string>
using namespace std;
int main() {
// a test vector of words
vector< string > words { "alpha", "gamma", "beta", "gamma" };
// map unique words to their appearance count
map< string, int > mapwordcount;
// loop over words
for( auto& w : words )
{
// insert word into map
auto ret = mapwordcount.insert( pair<string,int>( w, 1 ) );
if( ! ret.second )
{
// word already present
// so increment count
ret.first->second++;
}
}
// loop over map
for( auto& m : mapwordcount )
{
cout << "word '" << m.first << "' appears " << m.second << " times\n";
}
return 0;
}
Produces
word 'alpha' appears 1 times
word 'beta' appears 1 times
word 'gamma' appears 2 times
https://ideone.com/L9VZt6
If some book or person is teaching you to write code full of
for (i = 0; i < ...
then you should run away quickly and learn modern coding elsewhere.
Same repeated words counting using some C++ STL goodies via multiset and upper_bound:
#include <iostream>
#include <vector>
#include <string>
#include <set>
int main()
{
std::vector<std::string> words{ "one", "two", "three", "two", "one" };
std::multiset<std::string> ms(words.begin(), words.end());
for (auto it = ms.begin(), end = ms.end(); it != end; it = ms.upper_bound(*it))
std::cout << *it << " is repeated: " << ms.count(*it) << " times" << std::endl;
return 0;
}
https://ideone.com/tPYw4a
The program adds different strings to a set and uses an iterator to check whether a certain string is already in it. What I want is the line at which that string was first found. Is this possible with a set, or do I have to create a vector? I use a set because I also want to end up with no duplicates. I know it is a bit confusing; I hope you'll understand.
Edit: if a duplicate is found, I want the line number of the original element that already exists in the set.
#include <iostream>
#include <set>
#include <string>
#include <vector>
#include <atlstr.h>
#include <sstream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
set<string> test;
set<string>::iterator it;
vector<int> crossproduct(9, 0);
for (int i = 0; i < 6; i++)
{
crossproduct[i] = i+1;
}
crossproduct[6] = 1;
crossproduct[7] = 2;
crossproduct[8] = 3;
for (int i = 0; i < 3; i++)
{
ostringstream cp; cp.precision(1); cp << fixed;
ostringstream cp1; cp1.precision(1); cp1 << fixed;
ostringstream cp2; cp2.precision(1); cp2 << fixed;
cp << crossproduct[i*3];
cp1 << crossproduct[i*3+1];
cp2 << crossproduct[i*3+2];
string cps(cp.str());
string cps1(cp1.str());
string cps2(cp2.str());
string cpstot = cps + " " + cps1 + " " + cps2;
cout << "cpstot: " << cpstot << endl;
it = test.find(cpstot);
if (it != test.end())
{
//Display here the line where "1 2 3" was found
cout << "i: " << i << endl;
}
test.insert(cpstot);
}
set<string>::iterator it2;
for (it2 = test.begin(); it2 != test.end(); ++it2)
{
cout << *it2 << endl;
}
cin.get();
return 0;
}
"Line number" is not very meaningful to a std::set<string>,
because as you add more strings to the set you may change the
order in which the existing strings are iterated through
(which is about as much of a "line number" as the std::set template
itself will give you).
Here's an alternative that may work better:
std::map<std::string, int> test;
The way you use this is you keep a "line counter" n somewhere.
Each time you need to put a new string cpstot in your set,
you have code like this:
std::map<std::string, int>::iterator it = test.find(cpstot);
if (it == test.end())
{
test[cpstot] = n;
// alternatively, test.insert(std::pair<std::string, int>(cpstot, n))
++n;
}
else
{
// this prints out the integer that was associated with cpstot in the map
std::cout << "i: " << it->second;
// Notice that we don't try to insert cpstot into the map in this case.
// It's already there, and we don't want to change its "line number",
// so there is nothing good we can accomplish by an insertion.
// It's a waste of effort to even try.
}
If you set n = 0 before you start putting any strings in test
(and don't change the value of n in any other way),
then you will end up with strings at "line numbers" 0, 1, 2, etc.
in test, and n will be the number of strings stored in test.
By the way, neither std::map<std::string, int>::iterator nor
std::set<std::string>::iterator is guaranteed to iterate through
the strings in the sequence in which they were first inserted.
Instead, what you'll get is the strings in whatever order the
template's comparison object puts the string values.
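(By default that is std::less, which for std::string means lexicographic
order by character value, so roughly "alphabetized", with uppercase letters
sorting before lowercase ones in ASCII.)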
But when you store the original "line number" of each string in
std::map<std::string, int> test, when you are ready to
print out the list of strings you can copy the string-integer pairs
from test to a new object, std::map<int, std::string> output_sequence,
and now (assuming you do not override the default comparison object)
when you iterate through output_sequence you will get its
contents sorted by line number.
(You will then probably want to get the string
from the second field of the iterator.)
I need to build a string progressively and am trying to find the best way to do it. It can grow to at most about 10k, so I was planning to do something like this:
const unsigned long long configSize = 10240; //Approximation
void makeMyConfig() {
std::string resp;
std::string input;
resp.reserve(configSize);
while ( getInput(input) ) {
resp += input;
}
if ( resp.length() > configSize )
log << "May need to adjust configSize. Validate" << endl;
if (!sendConfig(resp)){
log << "Error sending config" << endl;
}
}
getInput may read from a file, a TCP connection or FTP; which one is decided at runtime. It receives a const char* and puts it into a string (which I may be able to avoid, but I left it in for convenience).
However, I heard there is a much more efficient way of doing this with string streams, but I am not sure how. I would appreciate any insights.
Looks pretty great to me.
You're pre-allocating the buffer, avoiding ongoing allocations and copies.
You already have it implemented.
Don't optimize until you actually have a performance problem and can measure any performance changes. You might end up making it worse without realizing!
For anyone interested in the difference between reserving and not reserving a string, consider this code:
#include <iostream>
#include <string>
#include <ctime>
using namespace std;
const int BIG = 10000000;
const int EXTRA = 100;
int main() {
string s1;
s1.reserve( BIG );
clock_t t = clock();
for ( int i = 0; i < BIG + EXTRA; i++ ) {
s1 += 'x';
}
cout << clock() - t << endl;
string s2;
t = clock();
for ( int i = 0; i < BIG + EXTRA; i++ ) {
s2 += 'x';
}
cout << clock() - t << endl;
}
In the first case, capacity is reserved up front; in the second it is not. This prints the following timings (in clock() ticks):
60
78
for g++ compiled with -O2.