Trouble with case-insensitive partial matching of two strings? - C++

I am trying to partially match two strings without case sensitivity. I do not want to use the Boost libraries, as most people don't have them installed with their compilers. I tried .find() from the standard C++ library, but it only seems to check whether the user's input is in the first word of the stored string. For example, if I have a DVD named Harry_Potter_Goblet and I search for "goblet" or "Goblet", the program doesn't show Harry_Potter_Goblet as a result; only a case-sensitive search for "Harry" shows a match. What am I doing wrong here? Here is my code.

Define a case-insensitive character comparison function:
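#include <cctype>
bool case_insensitive_comp(char lhs, char rhs)
{
    // Cast to unsigned char: std::toupper is undefined for negative values other than EOF.
    return std::toupper(static_cast<unsigned char>(lhs)) ==
           std::toupper(static_cast<unsigned char>(rhs));
}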
Then use std::search to find the substring within the larger string:
#include <algorithm>
#include <string>
....
std::string s1 = "Harry_Potter_Goblet";
std::string s2 = "goblet";
bool found = std::search(s1.begin(), s1.end(), s2.begin(), s2.end(), case_insensitive_comp) != s1.end();
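Wrapped into a reusable function, the same std::search idea might look like the sketch below. The helper name contains_ci is hypothetical; the comparator is the one shown above, with the unsigned char cast that keeps std::toupper well-defined.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Compare two characters ignoring case.
bool case_insensitive_comp(char lhs, char rhs)
{
    return std::toupper(static_cast<unsigned char>(lhs)) ==
           std::toupper(static_cast<unsigned char>(rhs));
}

// Hypothetical helper: true if needle occurs anywhere in haystack, ignoring case.
bool contains_ci(const std::string& haystack, const std::string& needle)
{
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end(),
                       case_insensitive_comp) != haystack.end();
}
```

For example, contains_ci("Harry_Potter_Goblet", "goblet") and contains_ci("Harry_Potter_Goblet", "POTTER") should both be true, because std::search tries every starting position, not just the first word.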

Related

Spell checker in C++: checking whether a word is in the dictionary

I'm new here, so I might not ask my question clearly, but I really do need help. My homework is to create a spell checker in C++ that takes a text file and compares it with a dictionary text file. I have a specific snippet of code that I need help with. I created a helper function, isValidWord, which takes the dictionary (stored in an unordered_set) and a string, and returns true if the string matches a word in the dictionary. Here is what I have so far. My problem is that the string doesn't get compared against everything in the dictionary; it only seems to check some of the words.
#include <unordered_set>
#include <string>
bool isValidWord(std::unordered_set<std::string> dictionary, std::string& word) {
std::unordered_set<std::string>::iterator it;
for (it = dictionary.begin(); it != dictionary.end(); ++it) {
if (word == *it) {
return true;
}
}
return false;
}
There is a built-in find method in unordered_set that you can use instead of reinventing the wheel. It is also a good idea to pass the dictionary by reference to avoid pointless copying.
You can simplify your function to this (I also added the missing const and reference qualifiers):
bool isValidWord(const std::unordered_set<std::string>& dictionary,
const std::string& word)
{
return dictionary.count(word) != 0;
}
Your current implementation is correct but not efficient:
you pass your dictionary by copy, so the whole set is copied on every call;
you use a linear search, whereas the container provides better complexity (std::unordered_set::find or std::unordered_set::count run in average constant time).
A final note: if you want to retrieve all invalid words, you may look at std::set_difference (it requires both the words and the dictionary to be sorted, which an unordered_set is not).
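A minimal sketch of the std::set_difference route, assuming the text has already been split into a vector of words. Because std::set_difference needs sorted input and an unordered_set has no order, both sides are copied into sorted vectors first. The function name invalid_words is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the distinct words of `text` that are not in `dictionary`.
std::vector<std::string> invalid_words(std::vector<std::string> text,
                                       const std::unordered_set<std::string>& dictionary)
{
    // set_difference requires sorted ranges, so copy the dictionary and sort both sides.
    std::vector<std::string> dict(dictionary.begin(), dictionary.end());
    std::sort(dict.begin(), dict.end());
    std::sort(text.begin(), text.end());
    text.erase(std::unique(text.begin(), text.end()), text.end());  // report each word once

    std::vector<std::string> result;
    std::set_difference(text.begin(), text.end(), dict.begin(), dict.end(),
                        std::back_inserter(result));
    return result;
}
```

Since the dictionary is already an unordered_set, a plain loop calling dictionary.count(word) on each word would also work and avoids the sorting; set_difference pays off mainly when both collections are kept sorted anyway.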

Prebuilt function to find character sequence in a string?

I'm working on a multithreading project where, for one segment, I need to find whether a given character sequence exists within a string. I'm wondering if C or C++ has any prebuilt functions that can handle this, but I'm having trouble figuring out the exact term to search for.
I know about 'strstr' and 'find'; the issue is that the function needs to find a sequence that is split across the string.
Given the string 'Hello World', I need a function that returns true if the sequence 'H-W-l' exists. Is there anything prebuilt which can handle this?
As far as I know, subsequence searching as such is not part of either the standard C library or the standard C++ library.
However, you can express subsequence searching as either a regular expression or a "glob". POSIX mandates both regex and glob matching functions, while the C++ standard library includes regular expressions since C++11. Both of these techniques require modifying the search string:
Regular expression: HWl ⇒ H.*W.*l. regexec will do a search for the regular expression (unless anchored, which this one is not); in C++, you would want to use std::regex_search rather than std::regex_match.
Glob: HWl ⇒ *H*W*l*. Glob matching is always a complete match, although in all the implementations I know of a trailing * is optimized. This is available as the fnmatch function in the POSIX header fnmatch.h. For this application, provide 0 for the flags parameter.
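Both pattern rewrites can be sketched as small helpers that build the pattern from the needle and then call std::regex_search or fnmatch. The function names are hypothetical. The sketch escapes characters that are special in each syntax, and it assumes the haystack contains no line breaks, because '.' does not match them in the ECMAScript grammar. fnmatch is POSIX only, so the glob half will not build on platforms without fnmatch.h.

```cpp
#include <fnmatch.h>  // POSIX, not part of ISO C or C++
#include <regex>
#include <string>

// Builds "H.*W.*l" from "HWl"; regex metacharacters in the needle are escaped.
bool has_subsequence_regex(const std::string& haystack, const std::string& needle)
{
    const std::string special = "^$\\.*+?()[]{}|";
    std::string pattern;
    for (char c : needle) {
        if (!pattern.empty())
            pattern += ".*";
        if (special.find(c) != std::string::npos)
            pattern += '\\';
        pattern += c;
    }
    // regex_search (not regex_match), so the pattern need not cover the whole string.
    return std::regex_search(haystack, std::regex(pattern));
}

// Builds "*H*W*l*" from "HWl"; with flags == 0 a backslash escapes glob metacharacters.
bool has_subsequence_glob(const std::string& haystack, const std::string& needle)
{
    std::string pattern = "*";
    for (char c : needle) {
        if (c == '*' || c == '?' || c == '[' || c == '\\')
            pattern += '\\';
        pattern += c;
        pattern += '*';
    }
    return fnmatch(pattern.c_str(), haystack.c_str(), 0) == 0;  // 0 means "matched"
}
```

Either helper should report "HWl" as present in "Hello World" and "WH" as absent.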
If you don't like any of the above, you can use the standard C strchr function in a simple loop:
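#include <string.h>
bool has_subsequence(const char* haystack, const char* needle) {
    const char* p;
    // Find each needle character at or after p, then step past the match
    // so the same haystack character cannot be used twice.
    for (p = haystack; *needle && (p = strchr(p, *needle)); ++needle, ++p) {
    }
    return p != NULL;
}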
If I understand correctly, you're trying to search for characters in a given order that aren't necessarily contiguous. If you're in C++, I don't see why you couldn't use the std::find function from the <algorithm> header. I would load both into std::string objects and then search as follows:
#include <algorithm>
#include <string>
bool has_noncontig_sequence(const std::string& str, const std::string& subStr)
{
    typedef std::string::const_iterator iter;
    iter start = str.begin();
    // For each character of subStr, search only after the previous match.
    for (iter i = subStr.begin(); i != subStr.end(); ++i) {
        start = std::find(start, str.end(), *i);
        if (start == str.end())
            return false;  // character not found in the remaining text
        ++start;           // don't let the next character reuse this position
    }
    return true;
}
Each std::find call positions start on the next matching character in str, searching only after the previous match, and start is then advanced past it so the next character of subStr cannot reuse the same position. If a character can't be found, std::find returns str.end() and the function reports failure.

Parse a string into an unknown number of regex groups in C++

I know the exact format of the text I should be getting. In particular, it should match a regex with a variable number of groups.
I want to use the C++ regex library to determine (a) if it is valid text, and (b) to parse those groups into a vector. How can I do this? I can find examples online to do (a), but not (b).
#include <string>
#include <regex>
#include <vector>
bool parse_this_text(std::string & text, std::vector<std::string> & group) {
// std::string text_regex = "^([a-z]*)(,[0-9]+)*$"
// if the text matches the regex, return true and parse each group into the vector
// else return false
???
}
Such that the following lines of code return the expected results.
std::vector<std::string> group;
parse_this_text("green,1", group);
// should return true with group = {"green", ",1"};
parse_this_text("yellow", group);
// should return true with group = {"yellow"};
parse_this_text("red,1,2,3", group);
// should return true with group = {"red", ",1", ",2", ",3"};
parse_this_text("blue,1.0,3.0,1,a", group);
// should return false (since it doesn't match the regex)
Thanks!
(?=^([a-zA-Z]*)(?:\,\d+)+$)^.*?(?:((?:\,\d+)+)).*?$
You can use this. It first validates the input using a lookahead and then returns two groups:
1) the name
2) all of the remaining integers (these can easily be split on commas), or you can use re.findall here.
Though it doesn't fully answer your question (the + requires at least one ",number" part, so "yellow" on its own is rejected), it might be of help.
Have a look.
http://regex101.com/r/wE3dU7/3
One option is to scan the string twice, the first time to check for validity and the second time to split it into fields. With the example in the OP, you don't really need regexen to split the line, once you know that it is correct; you can simply split on commas. But for the sake of exposition, you could use a std::regex_token_iterator (assuming you have a C++ library which supports those), something like this:
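#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>
bool parse_this_text(const std::string& s, std::vector<std::string>& result) {
    // Validity check only; nosubs because no capture groups are needed here.
    static const std::regex check("[[:alpha:]][[:alnum:]]*(,[[:digit:]]+)*",
                                  std::regex_constants::ECMAScript | std::regex_constants::nosubs);
    static const std::regex split(",");
    if (!std::regex_match(s, check))
        return false;
    // Submatch index -1 yields the text between the commas (the commas are dropped).
    std::sregex_token_iterator tokens(s.begin(), s.end(), split, -1);
    result.clear();
    std::copy(tokens, std::sregex_token_iterator(), std::back_inserter(result));
    return true;
}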
For more complicated cases, or applications in which the double scan is undesired, you can tokenize using successive calls to std::regex_search(), supplying the end of the previous match as the starting point and std::regex_constants::match_continuous as the match flags; that will anchor each search to the character after the previous match. You could, in that case, use a std::regex_iterator, but I'm not convinced that the resulting code is any simpler.
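A sketch of that single-pass approach, using std::regex_search with std::regex_constants::match_continuous so each search must start exactly where the previous match ended. It keeps the leading comma on each numeric group, which is what the question's expected output shows, and it reuses the question's grammar ([a-z]* followed by any number of ,[0-9]+). This is one possible shape, not the only one.

```cpp
#include <regex>
#include <string>
#include <vector>

// Tokenizes s as "<name>(,<digits>)*" in one pass; group is only written on success.
bool parse_this_text(const std::string& s, std::vector<std::string>& group) {
    static const std::regex name("[a-z]*");
    static const std::regex number(",[0-9]+");
    const auto flags = std::regex_constants::match_continuous;
    std::vector<std::string> out;
    std::smatch m;
    auto it = s.cbegin();

    // The name may be empty, so this always succeeds at the start of the string.
    if (!std::regex_search(it, s.cend(), m, name, flags))
        return false;
    out.push_back(m.str());
    it = m[0].second;

    // Every later search is anchored at the end of the previous match.
    while (it != s.cend()) {
        if (!std::regex_search(it, s.cend(), m, number, flags))
            return false;  // leftover text that is not ",<digits>"
        out.push_back(m.str());
        it = m[0].second;
    }
    group = out;
    return true;
}
```

With the question's inputs this should give {"green", ",1"}, {"yellow"} and {"red", ",1", ",2", ",3"}, and reject "blue,1.0,3.0,1,a" because the search anchored at ".0" fails.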

How to order strings case-insensitively (not lexicographically)?

I'm attempting to sort a list read from a file alphabetically (not lexicographically). So, if the list were:
C
d
A
b
I need it to become:
A
b
C
d
Not the lexicographic ordering:
A
C
b
d
I'm using string variables to hold the input, so I'm looking for some way to convert the strings I'm comparing to all uppercase or all lowercase, or, if there's some easier way to force an alphabetic comparison, please impart that wisdom. Thanks!
I should also mention that we are limited to the following libraries for this assignment: iostream, iomanip, fstream, string, as well as C libraries, like cstring, cctype, etc.
It looks like I'm going to have to solve this with some very tedious method of extracting each character of each string and calling toupper on it.
Converting the individual strings to upper case and comparing them is not made much harder by being restricted from using <algorithm>, <iterator>, etc. The comparison logic is about four lines of code. Even though it would be nice not to have to write those four lines, having to write a sorting algorithm is far more difficult and tedious. (That assumes the usual C version of toupper is acceptable in the first place.)
Below I show a simple case-insensitive comparison function, case_insensitive_less(), and then put it to use in a complete program that uses the restricted libraries. The comparison function itself doesn't use any restricted library.
#include <string>
#include <cctype>
#include <iostream>
// Convert s to upper case in place.
void to_upper(std::string &s) {
    for (char &c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
// Case-insensitive "less than". Takes copies so it can uppercase them freely.
// (Not named strcasecmp, which POSIX already declares with a different meaning.)
bool case_insensitive_less(std::string lhs, std::string rhs) {
    to_upper(lhs); to_upper(rhs);
    return lhs < rhs;
}
// restricted libraries used below
#include <algorithm>
#include <iterator>
#include <vector>
// Example usage:
// > ./a.out <<< "C d A b"
// A b C d
int main() {
    std::vector<std::string> input;
    std::string word;
    while (std::cin >> word) {
        input.push_back(word);
    }
    std::sort(std::begin(input), std::end(input), case_insensitive_less);
    std::copy(std::begin(input), std::end(input),
              std::ostream_iterator<std::string>(std::cout, " "));
    std::cout << '\n';
}
You don't have to modify the strings before sorting. If every item is a single character, as in your example, you can sort the characters of one string in place with a case-insensitive character comparator and std::sort:
#include <algorithm>
#include <cctype>
#include <string>
bool case_insensitive_cmp(char lhs, char rhs) {
    return std::toupper(static_cast<unsigned char>(lhs)) <
           std::toupper(static_cast<unsigned char>(rhs));
}
std::string input = ....;
std::sort(input.begin(), input.end(), case_insensitive_cmp);
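To sort a list of whole strings rather than the characters of one string, the same character comparator can be lifted to strings with std::lexicographical_compare. A minimal sketch; the names case_insensitive_string_less and sort_case_insensitive are hypothetical, and this uses <algorithm>, so it would not fit the assignment's library restrictions.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Character comparator: compares the upper-case forms.
bool case_insensitive_cmp(char lhs, char rhs) {
    return std::toupper(static_cast<unsigned char>(lhs)) <
           std::toupper(static_cast<unsigned char>(rhs));
}

// Lifts the character comparator to whole strings, element by element.
bool case_insensitive_string_less(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
                                        case_insensitive_cmp);
}

void sort_case_insensitive(std::vector<std::string>& words) {
    std::sort(words.begin(), words.end(), case_insensitive_string_less);
}
```

With the question's list {"C", "d", "A", "b"} this should produce A, b, C, d, and no copies of the strings are made.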
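#include <algorithm>
#include <string>
#include <strings.h>  // POSIX strcasecmp(); not part of standard C++
#include <vector>
std::vector<std::string> vec {"A", "a", "lorem", "Z"};
std::sort(vec.begin(),
          vec.end(),
          [](const std::string& s1, const std::string& s2) {
              return strcasecmp(s1.c_str(), s2.c_str()) < 0;
          });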
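Use strcasecmp() inside the comparison function you pass to qsort(). qsort() hands the comparator pointers to the array elements, so strcasecmp() cannot be passed to it directly.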
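A minimal sketch of that wrapper, assuming the words are held in an array of C strings. The names compare_caseless and sort_caseless are hypothetical, and strcasecmp() comes from the POSIX header strings.h, so it is not available everywhere (and is outside the assignment's cstring/cctype list).

```cpp
#include <cstdlib>    // std::qsort
#include <strings.h>  // POSIX strcasecmp(); not part of ISO C or C++

// qsort passes pointers to the elements; each element here is a const char*,
// so each argument really points at a const char*.
int compare_caseless(const void* a, const void* b) {
    const char* lhs = *static_cast<const char* const*>(a);
    const char* rhs = *static_cast<const char* const*>(b);
    return strcasecmp(lhs, rhs);  // <0, 0 or >0, as qsort expects
}

void sort_caseless(const char** words, std::size_t count) {
    std::qsort(words, count, sizeof *words, compare_caseless);
}
```

Only the pointers are reordered; the strings themselves stay where they are.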
I am not completely sure how to write it, but what you want to do is convert the strings to lowercase or uppercase before comparing them.
If the strings are in an array to begin with, you could run through the list and save their indexes, in sorted order, in an int array.
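One reading of that suggestion is an index sort: leave the strings where they are and sort an int array of positions instead. The sketch below assumes that reading and stays within <string> and <cctype>, using a hand-written insertion sort instead of <algorithm>. The names less_ignoring_case and sort_indexes are hypothetical.

```cpp
#include <cctype>
#include <string>

// Case-insensitive "less than" using only <string> and <cctype>.
bool less_ignoring_case(const std::string& a, const std::string& b) {
    std::string::size_type n = a.size() < b.size() ? a.size() : b.size();
    for (std::string::size_type i = 0; i < n; ++i) {
        int ca = std::toupper(static_cast<unsigned char>(a[i]));
        int cb = std::toupper(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return ca < cb;
    }
    return a.size() < b.size();  // a shorter prefix sorts first
}

// Fills index[0..count) so that words[index[0]], words[index[1]], ...
// are in case-insensitive order. Insertion sort; words is not modified.
void sort_indexes(const std::string words[], int index[], int count) {
    for (int i = 0; i < count; ++i) {
        int j = i;
        // Shift larger entries right until word i's slot is found.
        while (j > 0 && less_ignoring_case(words[i], words[index[j - 1]])) {
            index[j] = index[j - 1];
            --j;
        }
        index[j] = i;
    }
}
```

Printing words[index[k]] for k = 0 .. count-1 then gives the sorted list; insertion sort is quadratic, which is usually acceptable for an assignment-sized input.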
If you're just comparing ASCII letters, then a terrible hack that will work is to mask the upper three bits off each character (c & 0x1F). Then upper and lower case letters fall on top of each other.
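A sketch of the mask hack, assuming both strings contain only ASCII letters. In ASCII, 'A'..'Z' are 0x41..0x5A and 'a'..'z' are 0x61..0x7A, so keeping the low five bits maps both cases of a letter to the same value 1..26, in alphabetical order. Any other character (digits, punctuation) will collide with unrelated values, which is what makes it a hack. The name less_letters_only is hypothetical.

```cpp
#include <string>

// Compares ASCII-letter-only strings, ignoring case, via the low five bits.
bool less_letters_only(const std::string& a, const std::string& b) {
    std::string::size_type n = a.size() < b.size() ? a.size() : b.size();
    for (std::string::size_type i = 0; i < n; ++i) {
        int ca = a[i] & 0x1F;  // 'A' and 'a' both become 1, 'Z' and 'z' both 26
        int cb = b[i] & 0x1F;
        if (ca != cb)
            return ca < cb;
    }
    return a.size() < b.size();
}
```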

Vector of strings: need to search for a particular character in it

I have a vector of strings, and I need to search for a particular character in it:
vector<string> users;
users.push_back("user25_5");
users.push_back("user65_6");
users.push_back("user95_9");
I have to search for the number 65 in the vector.
std::find on the vector only matches an entire string; it does not look for particular characters inside each string.
You can use std::find_if with a suitable predicate:
#include <string>
bool has_65(const std::string& s)
{
    // std::string::find returns npos when "65" does not occur in s
    return s.find("65") != std::string::npos;
}
then
auto it = std::find_if(users.begin(), users.end(), has_65);
For finding strings inside strings, have a look at std::string::find.
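Putting the two together, a lambda can capture the text to look for, so the same predicate works for "65" or any other substring. A minimal sketch; the helper name find_user_containing and its empty-string-on-failure convention are assumptions for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: first user name containing `needle`, or "" if none does.
std::string find_user_containing(const std::vector<std::string>& users,
                                 const std::string& needle)
{
    auto it = std::find_if(users.begin(), users.end(),
                           [&needle](const std::string& s) {
                               // std::string::find searches inside each element.
                               return s.find(needle) != std::string::npos;
                           });
    return it != users.end() ? *it : std::string();
}
```

With the question's vector, searching for "65" should return "user65_6".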