Checking if a word is contained within an array - c++

I want to check for a word contained within a bigger string, but not necessarily in the same order. Example: The program will check if the word "car" exists in "crqijfnsa". In this case, it does, because the second string contains c, a, and r.

You could build a map whose keys are the letters of "car", with every value set to 0. Cycle through the array of letters, and whenever a letter is a key in the map, change its value to 1. If every key in the map ends up with a value greater than 0, then the word can be constructed. Try implementing this.
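A minimal sketch of the map idea described above, assuming plain std::string inputs. The function name containsLetters is made up for illustration. Note that, as described, the map only records whether each letter was seen, so a letter that occurs twice in the word is not counted twice.

```cpp
#include <map>
#include <string>

// Hypothetical helper: true if every distinct letter of `word` occurs in `letters`.
// Caveat: repeated letters in `word` are not counted separately.
bool containsLetters(const std::string& word, const std::string& letters)
{
    std::map<char, int> seen;
    for (char c : word)
        seen[c] = 0;                  // one key per distinct letter of the word

    for (char c : letters) {
        auto it = seen.find(c);
        if (it != seen.end())
            it->second = 1;           // mark the letter as found
    }

    for (const auto& entry : seen)
        if (entry.second == 0)
            return false;             // at least one letter never appeared
    return true;
}
```

With this sketch, containsLetters("car", "crqijfnsa") is true, but so is containsLetters("ccar", "crqijfnsa"), which shows why a counting approach is needed when duplicates matter.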

An anagram is a type of word play, the result of rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly once.
So what you are actually looking for is an algorithm to check whether two words are anagrams or not.
The following thread provides pseudocode that might be helpful:
Finding anagrams for a given word
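For reference, a strict anagram test can be sketched by sorting copies of both words and comparing them. The helper name isAnagram is an assumption for illustration only.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if `a` and `b` use exactly the same letters.
bool isAnagram(std::string a, std::string b)   // by value, so the copies can be sorted
{
    if (a.size() != b.size())
        return false;
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

By this definition "car" and "crqijfnsa" are not anagrams, because their lengths differ, so a strict anagram test is narrower than what the question asks for.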

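A very primitive version would be something like this (str is the string being searched, str2 is the word to look for, and str2 is consumed in the process):
for (std::string::iterator it = str.begin(); it != str.end(); ++it)
    for (std::string::iterator it2 = str2.begin(); it2 != str2.end(); ++it2) {
        if (*it == *it2) {
            str2.erase(it2);  // erase the matched letter from the word (it belongs to str, not str2)
            break;            // it2 is invalid after erase, so stop scanning the word
        }
    }
if (str2.empty())
    found = true;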

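You could build up a table with the count of each letter in the word you are searching for, then decrement those counts as you work through the search string.
#include <array>
#include <numeric>
bool IsWordInString(const char* word, const char* str)
{
    // build up table of characters in word to match
    std::array<int, 256> cword = {0};
    for (; *word; ++word) {
        // cast to unsigned char: plain char may be negative and index out of range
        cword[static_cast<unsigned char>(*word)]++;
    }
    // work through str matching characters in word
    for (; *str; ++str) {
        const unsigned char c = static_cast<unsigned char>(*str);
        if (cword[c] > 0) {
            cword[c]--;
        }
    }
    // all counts are back to zero only if every letter of word was matched
    return std::accumulate(cword.begin(), cword.end(), 0) == 0;
}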
It's also possible to return as soon as you find a match, but the code isn't as simple.
bool IsWordInString(const char* word, const char* str)
{
    // empty string
    if (*word == 0)
        return true;
    // build up table of characters in word to match
    int unmatched = 0;
    int cword[256] = {0};  // int rather than char, so long words cannot overflow a count
    for (; *word; ++word) {
        cword[static_cast<unsigned char>(*word)]++;
        unmatched++;
    }
    // work through str matching characters in word
    for (; *str; ++str) {
        const unsigned char c = static_cast<unsigned char>(*str);
        if (cword[c] > 0) {
            cword[c]--;
            unmatched--;
            if (unmatched == 0)
                return true;
        }
    }
    return false;
}
Some test cases
"" in "crqijfnsa" => 1
"car" in "crqijfnsa" => 1
"ccar" in "crqijfnsa" => 0
"ccar" in "crqijfnsac" => 1

I think the easiest (and probably fastest, test that yourself :) ) implementation would be done with std::includes:
std::string testword {"car"};
std::string testarray {"crqijfnsa"};
std::sort(testword.begin(),testword.end());
std::sort(testarray.begin(),testarray.end());
bool is_in_array = std::includes(testarray.begin(),testarray.end(),
testword.begin(),testword.end());
This also handles all cases of duplicate letters correctly.
The complexity of this approach should be O(n * log n), where n is the length of testarray (sort is O(n log n) and includes has linear complexity).

Related

Generate string lexicographically larger than input

Given an input string A, is there a concise way to generate a string B that is lexicographically larger than A, i.e. A < B == true?
My raw solution would be to say:
B = A;
++B.back();
but in general this won't work because:
A might be empty
The last character of A may be close to wraparound, in which case the resulting character will have a smaller value i.e. B < A.
Adding an extra character every time is wasteful and will quickly result in unreasonably large strings.
So I was wondering whether there's a standard library function that can help me here, or if there's a strategy that scales nicely when I want to start from an arbitrary string.
You can duplicate A into B then look at the final character. If the final character isn't the final character in your range, then you can simply increment it by one.
Otherwise you can look at last-1, last-2, last-3. If you get to the front of the list of chars, then append to the length.
Here is my dummy solution:
std::string make_greater_string(std::string const &input)
{
std::string ret{std::numeric_limits<
std::string::value_type>::min()};
if (!input.empty())
{
if (std::numeric_limits<std::string::value_type>::max()
== input.back())
{
ret = input + ret;
}
else
{
ret = input;
++ret.back();
}
}
return ret;
}
Ideally I'd hope to avoid the explicit handling of all special cases, and use some facility that can handle them more naturally. Already looking at the answer by @JosephLarson I see that I could increment more than the last character, which would improve the range achievable without adding more characters.
And here's the refinement after the suggestions in this post:
std::string make_greater_string(std::string const &input)
{
constexpr char minC = ' ', maxC = '~';
// Working with limits was a pain,
// using ASCII typical limit values instead.
std::string ret{minC};
auto rit = input.rbegin();
while (rit != input.rend())
{
if (maxC == *rit)
{
++rit;
if (rit == input.rend())
{
ret = input + ret;
break;
}
}
else
{
ret = input;
++(*(ret.rbegin() + std::distance(input.rbegin(), rit)));
break;
}
}
return ret;
}
You can copy the string and append some letters - this will produce a lexicographically larger result.
B = A + "a"
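As a small sketch of the append idea: any appended character works, because a string always compares less than a longer string that it is a prefix of. Appending the character with value 0 gives the closest possible successor under std::string's default comparison. The name nextString is only illustrative.

```cpp
#include <string>

// Illustrative helper: returns a string that compares greater than `a`.
std::string nextString(const std::string& a)
{
    std::string b = a;
    b.push_back('\0');   // `a` is a proper prefix of `b`, so a < b
    return b;
}
```

This also works for an empty input, at the cost of growing the string by one character on every call.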

C++ comparing a string to a word

I'm trying to build a program that takes a word input from the user and then converts it to a number for the system to work with.
int FunctionName(string WordInput)
{
int ReferenceNumber;
if (WordInput.find("Specific Word"))
{
return ReferenceNumber = 0;
}
else if (WordInput.find("Specific Word 2"))
{
return ReferenceNumber = 1;
}
The list goes on and has an else to get out.
Right now the code goes into the first "if" statement and returns 0 regardless of what I pass as WordInput.
Can ".find" not be used this way? Is there a way to do this without having to make each Specific Word its own string?
Thanks
if (WordInput.find("Specific Word"))
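std::string::find() returns a value of type size_t: the position of the search term in the string. If the search term isn't found, it returns std::string::npos. Used directly as a condition, that result is true for any non-zero value, including npos, and false only when the term is found at position 0, so the test has to compare against npos explicitly.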
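To make the pitfall concrete, here is a small sketch. The helper names containsWord and findAsBool are assumptions, not part of the question's code. The first compares against std::string::npos; the second spells out what using the result of find as a condition really tests.

```cpp
#include <string>

// Hypothetical helper: true if `needle` occurs anywhere in `haystack`.
bool containsWord(const std::string& haystack, const std::string& needle)
{
    // find() returns an index, or npos when there is no match.
    return haystack.find(needle) != std::string::npos;
}

// What `if (s.find(x))` actually tests: "the returned index is non-zero".
bool findAsBool(const std::string& haystack, const std::string& needle)
{
    return haystack.find(needle) != 0;
}
```

For a word found at the very start of the input, containsWord is true while findAsBool is false; for a word that is absent, it is the other way round.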
Instead of a long list of if and else if statements, you could use a map here. The lookup will be faster, and if you use a std::unordered_map it will be O(1) on average (constant time).
So your function can be simplified to:
int FunctionName(const std::string &word)
{
//Map of string and reference number
const static std::unordered_map<std::string, int> WordMap =
{ {"one", 1 }, { "two", 2 }, { "three", 3 } };
auto it = WordMap.find(word);
if (it != WordMap.end())
return it->second;
return 0;
}
Here each entry in the map is a pair of (string, int). If a given word is found, the corresponding int is returned, else 0.
Consider this example:
std::string str = "Hello world, what's up today on SO?";
if (str.find("world") != std::string::npos)
return 1;
else if (str.find("what's up") != std::string::npos)
return 2;
else
return 0; // or something
Here the result of find is compared with std::string::npos to see whether the specified word occurs anywhere in the string. If it does, a value is returned for later use, as you asked.

What's time complexity of this algorithm for getting all word ladders?

Word Ladder
Given two words (start and end), and a dictionary, find all
shortest transformation sequence(s) from start to end, such that:
Only one letter can be changed at a time
Each intermediate word must exist in the dictionary
For example, Given:
start = "hit"
end = "cog"
dict = ["hot","dot","dog","lot","log"]
Return
[
["hit","hot","dot","dog","cog"],
["hit","hot","lot","log","cog"]
]
Note: All words have the same length. All words contain only lowercase alphabetic characters.
Personally, I think the time complexity of this algorithm depends on the input (start, end, dict) and cannot be written out as a single expression of the form O(?).
Thank you AbcAeffchen. The tight time complexity is O(len*N*(26^(N/2))), where len is the length of the given start string (or end string) and N is the number of elements in dict (assuming the C++ unordered_set is implemented as a hash set). Please check the details below.
Idea of this solution: BFS (map) + DFS. [C++]
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <deque>
#include <string>
using namespace std;
struct Node {
string val;
int level;
vector<Node *> prevs;
Node (string val, int level): val(val), level(level) {};
};
class Solution {
public:
vector<vector<string>> findLadders(string start, string end, unordered_set<string> &dict) {
vector<vector<string>> list;
// Input validation.
if (start.compare(end) == 0) {
vector<string> subList = {start, end};
list.push_back(subList);
return list;
}
deque<string> queue;
unordered_map<string, Node *> map;
queue.push_back(start);
Node *start_node = new Node(start, 0);
map.emplace(start, start_node);
while (!queue.empty()) {
// Dequeue.
string curr_string = queue.front();
queue.pop_front();
Node *curr_node = map.find(curr_string)->second;
int curr_level = curr_node->level;
int len = curr_string.length();
if (curr_string.compare(end) == 0) {
// Find the end.
vector<string> subList;
subList.push_back(curr_node->val);
getAllPathes(curr_node, list, subList);
return list;
}
// Iterate all children.
for (int i = 0; i < len; i ++) {
char curr_original_char = curr_string[i];
// Have a try.
for (char c = 'a'; c <= 'z'; c ++) {
if (c == curr_original_char) continue;
curr_string[i] = c;
if (dict.find(curr_string) != dict.end()) {
if (map.find(curr_string) == map.end()) {
// The new string has not been visited.
Node *child = new Node(curr_string, curr_level + 1);
// Add the parents of the current into prevs.
child->prevs.push_back(curr_node);
// Enqueue.
queue.push_back(curr_string);
map.emplace(curr_string, child);
} else {
// The new string has been visited.
Node *child = map.find(curr_string)->second;
if (child->level == curr_level + 1) {
child->prevs.push_back(curr_node);
}
}
}
}
// Roll back.
curr_string[i] = curr_original_char;
}
}
return list;
}
void getAllPathes(Node *end, vector<vector<string>> &list, vector<string> &subList) {
// Base case.
if (end == NULL) {
// Reached the start node: there is no earlier level.
vector<string> one_rest(subList);
list.push_back(one_rest);
return;
}
vector<Node *> prevs = end->prevs;
if (prevs.size() > 0) {
for (vector<Node *>::iterator it = prevs.begin();
it != prevs.end(); it ++) {
// Have a try.
subList.insert(subList.begin(), (*it)->val);
// Do recursion.
getAllPathes((*it), list, subList);
// Roll back.
subList.erase(subList.begin());
}
} else {
// Do recursion.
getAllPathes(NULL, list, subList);
}
}
};
Split-up
Let's split the complexity into three parts:
Find a next word in transformation sequence
The length of a shortest transformation sequence
The number of transformation sequences
Assumptions
Let n be the length of the given words and N the number of words in the dictionary. Let's also assume that the dictionary is sorted.
1. Part
Then you can find a next word in O(n ⋅ 26 ⋅ log(N)) = O(n log N) steps.
n characters in your word, that can be changed.
26 possible changes per character.
log(N) to look up, if this word exists in the dictionary.
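The three factors above can be sketched as follows, assuming the dictionary is a sorted std::vector of lowercase words so that std::binary_search can be used for the lookup. The function name nextWords is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: all dictionary words that differ from `word` in exactly one letter.
// Assumes `sortedDict` is sorted and contains only lowercase a-z words.
std::vector<std::string> nextWords(std::string word,
                                   const std::vector<std::string>& sortedDict)
{
    std::vector<std::string> result;
    for (std::size_t i = 0; i < word.size(); ++i) {      // n positions
        const char original = word[i];
        for (char c = 'a'; c <= 'z'; ++c) {              // 26 candidates each
            if (c == original)
                continue;
            word[i] = c;
            // binary search: O(log N) comparisons per lookup
            if (std::binary_search(sortedDict.begin(), sortedDict.end(), word))
                result.push_back(word);
        }
        word[i] = original;                              // restore before the next position
    }
    return result;
}
```

For the question's example, nextWords("hit", {"dog","dot","hot","log","lot"}) yields only "hot".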
2. Part
How long can a shortest transformation sequence be?
Example: Let the start word be "aa", the end word "zz" and the dictionary
["ab", "bb", "bc", "cc", ..].
This example needs 26 transformations. I think you can build worst case inputs that need something like 26n-1 transformations.
But this depends on the words in the dictionary. So the worst case will be N, i.e. all words in the dictionary are used.
3. Part
How many different sequences exist?
Every time you look for the next word in the sequence, it is possible to find 26 different next steps. But this only applies to the first half of the length of the shortest sequence, because the same holds if you switch the start and end words. So there could be up to O(26^(N/2)) different sequences, as long as the worst case length of a shortest sequence is O(N).
Summary
O(n log N) for finding the next transformation in a sequence.
O(N) transformations per sequence.
O(26^(N/2)) different sequences.
In total you get O(26^(N/2) ⋅ N ⋅ n log N).
Notice
This holds only if your dictionary can contain any sequence of characters as "words". If you only allow words that exist in a real language, you can use statistics to prove a better complexity.
The length of a shortest sequence is correlated with the number of different sequences. If you have a lot of words in your dictionary, the sequence can become very long, but if you have too many, you may get more different sequences that are also shorter. Maybe one can use some statistics here as well to prove a better complexity.
I hope this helps

Match a structure against set of patterns

I need to match a structure against a set of patterns and take some action for each match.
Patterns should support wildcards, and I need to determine which patterns match the incoming structure. Example set:
action=new_user email=*
action=del_user email=*
action=* email=*#gmail.com
action=new_user email=*#hotmail.com
Those patterns can be added or removed in real time. There can be thousands of connections, each with its own pattern, and I need to notify each connection whose pattern matches a structure I have received. Patterns are not full regexes; I just need to match a string with the wildcard * (which simply matches any number of characters).
When the server receives a message (let's call it message A) with the structure action=new_user email=testuser#gmail.com, I need to find out that patterns 1 and 3 match this message, and then perform an action for each matching pattern (send structure A to the corresponding connection).
How can this be done in the most efficient way? I can iterate over the patterns and check them one by one, but I'm looking for a more efficient and thread-safe way to do this. It is probably possible to group the patterns to reduce checking. Any suggestions on how this can be done?
UPD: Please note that I want to match multiple patterns (thousands) against a fixed "string" (actually a struct), not vice versa. In other words, I want to find which patterns fit a given structure A.
Convert the patterns to regular expressions, and match them using RE2, which is written in C++ and is one of the fastest.
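A rough sketch of the conversion step, using std::regex from the standard library instead of RE2 purely to keep the example self-contained. The helper names globToRegex and globMatch are assumptions, and only the * wildcard from the question is handled.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turn a '*'-wildcard pattern into a regex.
// Every character other than '*' is matched literally.
std::regex globToRegex(const std::string& glob)
{
    static const std::string special = R"(\^$.|?+()[]{})";
    std::string re;
    for (char c : glob) {
        if (c == '*') {
            re += ".*";                 // '*' matches any run of characters
        } else {
            if (special.find(c) != std::string::npos)
                re += '\\';             // escape regex metacharacters
            re += c;
        }
    }
    return std::regex(re);
}

// True if the whole of `text` matches the wildcard pattern `glob`.
bool globMatch(const std::string& glob, const std::string& text)
{
    return std::regex_match(text, globToRegex(glob));
}
```

In a server with thousands of patterns you would presumably compile each pattern once when it is added and keep the compiled object, rather than rebuilding it for every message as globMatch does here.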
Actually, if I understood correctly, the fourth pattern is redundant, since the first pattern is more general and includes every string that is matched by the fourth. That leaves only 3 patterns, which can be easily checked by this function:
bool matches(const char* name, const char* email)
{
return strstr(name, "new_user") || strstr(name, "del_user") || strstr(email, "#gmail.com");
}
And if you prefer to parse whole string, not just match the values of action and email, then the following function should do the trick:
bool matches2(const char* str)
{
bool match = strstr(str, "action=new_user ") || strstr(str, "action=del_user ");
if (!match)
{
const char* emailPtr = strstr(str, "email=");
if (emailPtr)
{
match = strstr(emailPtr, "#gmail.com");
}
}
return match;
}
Note that the strings you pass as arguments must be null-terminated (end with \0), as strstr requires.
This strglobmatch supports * and ? only.
#include <string.h> /* memcmp, strchr, strlen */
char* strfixstr(char *s1, char *needle, int needle_len) {
int l1;
if (!needle_len) return (char *) s1;
if (needle_len==1) return strchr(s1, needle[0]); /* strchr replaces the legacy index() */
l1 = strlen(s1);
while (l1 >= needle_len) {
l1--;
if (0==memcmp(s1,needle,needle_len)) return (char *) s1;
s1++;
}
return 0;
}
int strglobmatch(char *str, char *glob) {
/* Test: strglobmatch("almamxyz","?lmam*??") */
int min;
while (glob[0]!='\0') {
if (glob[0]!='*') {
if ((glob[0]=='?') ? (str[0]=='\0') : (str[0]!=glob[0])) return 0;
glob++; str++;
} else { /* a greedy search is adequate here */
min=0;
while (glob[0]=='*' || glob[0]=='?') min+= *glob++=='?';
while (min--!=0) if (*str++=='\0') return 0;
min=0; while (glob[0]!='*' && glob[0]!='?' && glob[0]!='\0') { glob++; min++; }
if (min==0) return 1; /* glob ends with star */
if (!(str=strfixstr(str, glob-min, min))) return 0;
str+=min;
}
}
return str[0]=='\0';
}
If all you want is wildcard matching, then you might try this algorithm. The idea is to check that all the non-wildcard substrings of the pattern occur in the string, one after another.
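patterns = ["*#gmail.com", "akalenuk#*", "a*a#*", "ak*#gmail.*", "ak*#hotmail.*", "*#*.ua"]
string = "akalenuk#gmail.com"
preprocessed_patterns = [p.split('*') for p in patterns]
def match(s, pp):
    if len(pp) == 1:
        return s == pp[0]  # no wildcard: must match exactly
    # the first and last pieces are anchored to the ends of the string
    if not s.startswith(pp[0]) or not s.endswith(pp[-1]):
        return False
    i, end = len(pp[0]), len(s) - len(pp[-1])
    for w in pp[1:-1]:  # middle pieces must appear in order, without overlapping
        wi = s.find(w, i, end)
        if wi == -1:
            return False
        i = wi + len(w)
    return i <= end
print([match(string, pp) for pp in preprocessed_patterns])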
But it might still be best to use regexps in case you need more than a wildcard in the future.

skipping a character in an array if previous character is the same

I'm iterating through an array of chars to do some manipulation. I want to "skip" an iteration if there are two adjacent characters that are the same.
e.g. x112abbca
skip----------^
I have some code, but it's not elegant and I was wondering if anyone can think of a better way. I have a few cases in the switch statement and would be happy if I didn't have to use an if statement inside the switch.
switch(ent->d_name[i])
{
if(i > 0 && ent->d_name[i] == ent->d_name[i-1])
continue;
case ' ' :
...//code omitted
case '-' :
...
}
By the way, an instructor once told me "avoid continues unless much code is required to replace them". Does anyone second that? (Actually he said the same about breaks)
Put the if outside the switch.
While I don't have anything against using continue and break, you can certainly bypass them this time without much code at all: simply invert the condition and put the whole switch statement within the if-block.
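A minimal sketch of that restructuring, under the assumption that the loop walks a null-terminated char array. The function countSeparators and its two cases are invented for illustration and stand in for the omitted switch body.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative only: counts ' ' and '-' characters in `name`,
// ignoring any character equal to the one directly before it.
int countSeparators(const char* name)
{
    int count = 0;
    const std::size_t len = std::strlen(name);
    for (std::size_t i = 0; i < len; ++i) {
        // Inverted condition: run the switch only when this is not a repeat.
        if (i == 0 || name[i] != name[i - 1]) {
            switch (name[i]) {
            case ' ':
            case '-':
                ++count;
                break;
            default:
                break;
            }
        }
    }
    return count;
}
```

No continue is needed: a repeated character simply fails the if test and the loop moves on.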
Answering the rectified question: what's clean depends on many factors. How long is this list of characters to consider: should you iterate over them yourself, or perhaps use a utility function from <algorithm>? In any case, if you are referring to the same character multiple times, perhaps you ought to give it an alias:
std::string interesting_chars("-_;,.abc");
// ...
char prev = '\0';
for (i...) {
    char cur = abc->def[i];
    if (cur != prev || interesting_chars.find(cur) == std::string::npos)
        switch (cur) // ...
    prev = cur;  // remember this character for the next iteration
}
char chr = '\0';
char *cur = &ent->d_name[0];
while (*cur != '\0') {
if (chr != *cur) {
switch(...) {
}
}
chr = *cur++;
}
If you can clobber the content of the array you are analyzing, you can preprocess it with std::unique():
ent->d_name.erase(std::unique(ent->d_name.begin(), ent->d_name.end()), ent->d_name.end());
This should replace all sequences of identical characters by a single copy and shorten the string appropriately. If you can't clobber the string itself, you can create a copy in which each run of identical characters is reduced to a single character:
std::string tmp;
std::unique_copy(ent->d_name.begin(), ent->d_name.end(), std::back_inserter(tmp));
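Both variants can be tried in isolation with a plain std::string. This is only a sketch: the function names collapseRuns and collapsedCopy are made up, and it assumes the data is held in a std::string rather than a raw char array.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// In-place: collapse each run of identical adjacent characters to one.
void collapseRuns(std::string& s)
{
    s.erase(std::unique(s.begin(), s.end()), s.end());
}

// Copying variant: leaves `s` untouched and returns the collapsed copy.
std::string collapsedCopy(const std::string& s)
{
    std::string tmp;
    std::unique_copy(s.begin(), s.end(), std::back_inserter(tmp));
    return tmp;
}
```

For the question's example "x112abbca", both produce "x12abca".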
In case you are using C-strings: use std::string instead. If you insist on using C-strings and don't want to play with std::unique(), a nicer approach than yours is to keep track of the previous character, initialized to 0 (this can't be part of a C-string, after all):
char previous(0);
for (size_t i(0); ent->d_name[i]; ++i) {
if (ent->d_name[i] != previous) {
switch (previous = ent->d_name[i]) {
...
}
}
}
I hope I understand what you are trying to do, anyway this will find matching pairs and skip over a match.
char c_anotherValue[] = "Hello World!";
int i_len = strlen(c_anotherValue);
for(int i = 0; i < i_len-1;i++)
{
if(c_anotherValue[i] == c_anotherValue[i+1])
{
printf("%c%c",c_anotherValue[i],c_anotherValue[i+1]);
i++;//this will force the loop to skip
}
}