String matching implementation

I wrote the code below to check whether a certain string exists in a text or not. The problem is that the match() function always returns false, even when the pattern exists in the text.
int main(){
char *text="hello my name is plapla";
char *patt="my";
cout<<match(patt,text);
system("pause");
return 0;
}
bool match(char* patt,char* text){
int textLoc=0, pattLoc=0, textStart=0;
while(textLoc <= (int) strlen(text) && pattLoc <= (int)strlen(patt)){
if( *(patt+pattLoc) == *(text+textLoc) ){
textLoc= textLoc+1;
pattLoc= pattLoc+1;
}
else{
textStart=textStart+1;
textLoc=textStart;
pattLoc=0;
}
}
if(pattLoc > (int) strlen(patt))
return true;
else return false;
}

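Try pattLoc < (int)strlen(patt) in your while loop.
The loop will then stop when pattLoc == 2, so you avoid comparing the '\0' of "my" with the ' ' that follows "my" in "hello my name is plapla", which resets pattLoc to 0 and makes the function return false. The final test then has to become pattLoc >= (int)strlen(patt).
Or better, use std::string and substr.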

The obvious solution is:
bool
match( std::string const& pattern, std::string const& text )
{
return std::search( text.begin(), text.end(),
pattern.begin(), pattern.end() )
!= text.end();
}
This is idiomatic C++, and the way I would expect any C++ programmer to
write it, at least in a professional environment.
If the goal is to learn how to write such a function, then of course,
the above isn't much of a solution. The solution then should be more
divide and conquer; there's much too much in match for you to put it
in one function. I'd recommend something like:
bool
startsWith( std::string::const_iterator begin,
            std::string::const_iterator end,
            std::string const& pattern )
{
    // Cast so the signed iterator difference is compared with an unsigned size.
    return static_cast<std::string::size_type>( end - begin ) >= pattern.size()
        && std::equal( pattern.begin(), pattern.end(), begin );
}

bool
match( std::string const& pattern, std::string const& text )
{
    std::string::const_iterator current = text.begin();
    // Advance until the pattern is found at current or the text is exhausted.
    while ( current != text.end()
            && !startsWith( current, text.end(), pattern ) ) {
        ++ current;
    }
    return current != text.end();
}
This can obviously be improved; for example, there's no point in
continuing in the while loop when the length of the remaining text is
less than the length of the pattern.
And if your professor insists on your using char const* (if he insists
on char*, then he's totally incompetent, and should be fired), this
can easily be rewritten to do so: just replace all calls to begin with
the pointer, and all calls to end with pointer + strlen(pointer).
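As a rough sketch of the char const* rewrite described above, and of the improvement mentioned earlier (stopping once the remaining text is shorter than the pattern): the four-argument startsWith signature is my own choice for illustration, not something the original code had.

```cpp
#include <algorithm>
#include <cstring>

// Illustrative helper: true if [pattern, patternEnd) occurs at the start
// of [begin, end).
bool startsWith(char const* begin, char const* end,
                char const* pattern, char const* patternEnd)
{
    return end - begin >= patternEnd - pattern
        && std::equal(pattern, patternEnd, begin);
}

// Pointer-based variant of match(): begin() becomes the pointer itself,
// end() becomes pointer + strlen(pointer).
bool match(char const* pattern, char const* text)
{
    char const* const patternEnd = pattern + std::strlen(pattern);
    char const* const textEnd = text + std::strlen(text);
    char const* current = text;
    // Stop as soon as the remaining text is shorter than the pattern.
    while (textEnd - current >= patternEnd - pattern) {
        if (startsWith(current, textEnd, pattern, patternEnd)) {
            return true;
        }
        ++current;
    }
    return false;
}
```

With this version an empty pattern is treated as found in any text, which is a design choice of the sketch rather than something the answer specifies.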

I have solved the problem:
while(textLoc <= (int) strlen(text) && pattLoc <= (int)strlen(patt))
should be:
while(textLoc < (int) strlen(text) && pattLoc < (int)strlen(patt))
and
if(pattLoc > (int) strlen(patt))
to
if(pattLoc >= (int) strlen(patt))
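Put together, the corrected function might look like the sketch below. Two things are my own additions rather than part of the fix: the strlen() calls are hoisted out of the loop, and the parameters are const char* so that string literals can be passed (converting a literal to char* is ill-formed since C++11).

```cpp
#include <cstring>

// The original algorithm with the corrected bounds.
bool match(const char* patt, const char* text)
{
    const int textLen = static_cast<int>(std::strlen(text));
    const int pattLen = static_cast<int>(std::strlen(patt));
    int textLoc = 0, pattLoc = 0, textStart = 0;
    while (textLoc < textLen && pattLoc < pattLen) {
        if (patt[pattLoc] == text[textLoc]) {
            ++textLoc;
            ++pattLoc;
        } else {
            // Mismatch: restart the pattern one position further into the text.
            ++textStart;
            textLoc = textStart;
            pattLoc = 0;
        }
    }
    // True only if every pattern character was matched before the text ran out.
    return pattLoc >= pattLen;
}
```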

Related

Determine If String Has All Same Character

Is there a function like find_first_not_of that returns true or false as opposed to a position? I do not need the position, but rather whether or not the string contains all of the same char.

You could write your own function:
bool all_chars_same(string testStr) {
char letter = testStr[0];
for (int i = 1; i < testStr.length(); i++) {
if (testStr[i] != letter)
return false;
}
return true;
}
Or use the built in find_first_not_of:
bool all_chars_same(string testStr) {
return testStr.find_first_not_of(testStr[0]) == string::npos;
}

Just check the value returned by find_first_not_of for string::npos:
// needs to check if str.size() > 0
bool all_same = str.find_first_not_of(str[0]) == string::npos;
Alternatively, since you're looking for a single character, there's also std::all_of.
bool all_same = std::all_of(str.cbegin(), str.cend(), [&](char c){ return str[0] == c; });
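For completeness, here is a small sketch that wraps the std::all_of approach in a function and handles the empty string explicitly. Treating an empty string as "all the same" is an assumption on my part; pick whichever answer suits your use case.

```cpp
#include <algorithm>
#include <string>

// Returns true if every character equals the first one.
// Assumption: an empty string counts as "all the same".
bool all_chars_same(const std::string& str)
{
    if (str.empty()) {
        return true;
    }
    const char first = str.front();
    return std::all_of(str.begin(), str.end(),
                       [first](char c) { return c == first; });
}
```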

use yourstring.find(keyword);
you can get detail here
http://www.cplusplus.com/reference/string/string/find/

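I would recommend a define, it is the fastest way. Note that this one evaluates to true when the string does not consist of a single repeated character.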
#define find_not_of(a) (a.find_first_not_of(a[0]) != std::string::npos)

The best and quickest way I can think of is to create a map and put the first character of the string in it as a key. Then iterate through the string; as soon as you find a character that is not in the map, you are done.
bool allSameCharacters ( string s){
unordered_map < char , int> m;
// m.reserve(s.size());
m[s[0]]++;
for (char c : s ){
if (m.find(c) == m.end()) return false;
}
return true;
}

Recursively find a string within a string in C++

(C++)
Given myString, I want to check if myString contains substring. Here's what I have so far, but it only returns true if the string begins with the substring.
bool find(string myString, string substring)
{
if(mystring.length() < substring.length())
{
return false;
}
if(mystring == substring)
{
return true;
}
for(int i = 0; i < substring.length() - 1 ; ++i)
{
if(mystring.at(i) == substring.at(i))
{
continue;
}
else
{
string string2 = mystring.substr(1, mystring.length() - 1);
return find(string2, substring);
}
return true;
}
return false;
}
What is wrong with this function?

Check this function. It is based on your code, with the extra code removed and the errors fixed.
I also changed the signature to take const references, which is more efficient.
bool find(const string& myString, const string& substring)
{
if(myString.length() < substring.length()){
return false;
}
else if(myString.substr(0,substring.size()) == substring){
return true;
}
else if (myString.length() > substring.length()){
return find(myString.substr(1), substring);
}
else{
return false;
}
}

First of all the function can be written simpler. For example
bool find( const std::string &myString, const std::string &subString )
{
return
( myString.substr( 0, subString.size() ) == subString ) ||
( subString.size() < myString.size() && find( myString.substr( 1 ), subString ) );
}
Here is a demonstrative program
#include <iostream>
#include <iomanip>
#include <string>
bool find( const std::string &myString, const std::string &subString )
{
return
( myString.substr( 0, subString.size() ) == subString ) ||
( subString.size() < myString.size() && find( myString.substr( 1 ), subString ) );
}
int main()
{
std::cout << std::boolalpha << find( "Hello World", "World" ) << std::endl;
std::cout << std::boolalpha << find( "Hello C", "C++" ) << std::endl;
}
The program output is
true
false
As for your function, it returns true only when both strings have the same length and are equal to each other:
if(myString == substring){
return true;
}
And in case when myString.length() > substring.length() the function returns nothing
else if (myString.length() > substring.length()){
int start = 1;
int end = (int) myString.length() - 1;
string string2 = myString.substr(start, end);
find(string2, substring);
}
I think you mean
return find(string2, substring);
in this code snippet.
EDIT: I see that you changed the code of the function in your post. But in any case this code snippet
for(int i = 0; i < substring.length() - 1 ; ++i)
{
if(mystring.at(i) == substring.at(i))
{
continue;
}
else
{
string string2 = mystring.substr(1, mystring.length() - 1);
return find(string2, substring);
}
return true;
}
makes no sense.

You're missing a return before the recursive call to find. As it stands it falls through to the return false at the end.
Also, if (mystring == substring) should be checking if mystring starts with substring, not exact equality.

First, this is expensive because of the memory copies in substr.
Second, you haven't checked for substring length > 0.
Third, the "else if" check for mystring.length > 0 is redundant if you have done the other checks (including substring length > 0).
Now to your core logic. In the recursion your start never moves, so you are tied to the beginning. What you need to do is start with position 1, increment start at every recursion, and use substr to extract the characters from "start" to "start + substring.length". That way you start from the beginning, keep moving forward, and check the correct length. You could also start from the end (as you have) and move back. What you would have to do there is find the start position (end position minus length of substring), and check that the start position is not less than zero before calling the function recursively.
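A minimal sketch of the moving-start idea, assuming it is acceptable to add a helper that carries the index. The names contains and containsFrom are hypothetical. The sketch starts at index 0 so that a match at the very beginning is covered, and it uses std::string::compare so that no temporary strings are created.

```cpp
#include <string>

// Hypothetical helper: does text contain pattern at or after index start?
bool containsFrom(const std::string& text, const std::string& pattern,
                  std::string::size_type start)
{
    // Not enough characters left for the pattern to fit.
    if (start > text.size() || text.size() - start < pattern.size()) {
        return false;
    }
    // Compare pattern.size() characters in place, without copying.
    if (text.compare(start, pattern.size(), pattern) == 0) {
        return true;
    }
    return containsFrom(text, pattern, start + 1);
}

bool contains(const std::string& text, const std::string& pattern)
{
    return containsFrom(text, pattern, 0);
}
```

The recursion depth grows with the length of the text, so for long inputs a plain loop (or std::string::find) is the safer choice.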

You're just removing the leftmost characters of myString and then comparing the rest to your substring. Obviously, this is not going to work in a general case, when your substring is somewhere in the middle of myString.
On each iteration try comparing not the whole myString, but rather the first substring.size() characters of it. This should fix your issue.

Here's what I have so far, but it only returns true if the string
begins with the substring.
It also fails for find("foo", "f").
To see why, add some test output to the function:
bool find(string myString, string substring)
{
std::cout << myString << ", " << substring << "\n";
// ...
}
It will print:
foo, f
oo, f
o, f
You see why this cannot work? You just keep removing the first character, until only the last character is compared with the substring to be found.
But it fails even for find("foo", "o"):
foo, o
oo, o
o, o
That's because of this line:
find(string2, substring);
You don't return the result of the recursive call.
All things considered, I think you just have the wrong algorithm here. It simply cannot work the way you have written the code.
A few other observations:
int start = 1;
int end = (int) myString.length() - 1;
That's not good style. For historical reasons, a std::string's size is unsigned, and you are using a C-style cast where static_cast should be preferred. You should just use std::string::size_type here, because it's just an internal piece of implementation code and you gain nothing from casting to int.
string string2 = myString.substr(start, end);
The second argument of substr defines the length of the substring, not the index of the last character. end sounds like you use the value as the index of the last character. Have a look at http://en.cppreference.com/w/cpp/string/basic_string/substr.
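To make the count-versus-index point concrete, here is a small sketch. The helper name slice is hypothetical; it converts an inclusive end index into the count that substr expects.

```cpp
#include <string>

// Hypothetical helper: characters from index first through index last,
// inclusive. substr() takes a count, so the end index is converted.
// Precondition (not checked here): first <= last and first <= s.size().
std::string slice(const std::string& s,
                  std::string::size_type first,
                  std::string::size_type last)
{
    return s.substr(first, last - first + 1);
}
```

For example, std::string("hello").substr(1, 3) yields "ell" (three characters starting at index 1), and leaving out the second argument, as in substr(1), takes everything from index 1 to the end.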

How can I validate if string is numeric and/or one character "." is allowed

I am trying to validate whether a string is numeric. I want to detect strings that contain one or more characters that are not allowed, that is, anything other than digits and a single "." character.
My code is:
// This code calls is_number. sTempArray[3] is an amount such as $00.00.
if(!is_number(sTempArray[3]))
{
cout << "Your amount contains characters that are not allowed!";
}
// is_number is the function that gets called above.
bool MyThread::is_number(const string& data)
{
string::const_iterator it = data.begin();
while (it != data.end() && std::isdigit(*it))
{
++it;
}
return !data.empty() && it == data.end();
}
I want to validate that the string is allowed. For example, the value 500.00 should be allowed, but it is always rejected because of the period character in the string. On the other hand, the value 500.00a should not be allowed.

In the while loop of your is_number function you could add an if statement that checks whether the current character is a digit or a "." (and maybe add a boolean value to check whether there was only one "."?).
It would look something like this:
bool MyThread::is_number(const string& data)
{
    string::const_iterator it = data.begin();
    while (it != data.end())
    {
        // Accept digits and '.'; stop at the first other character.
        if (std::isdigit(static_cast<unsigned char>(*it)) || *it == '.'){
            ++it;
        }
        else{
            break;
        }
    }
    return !data.empty() && it == data.end();
}

You can add boolean flags that record whether a dot and a digit have already been seen, and modify the loop:
bool MyThread::is_number(const string& data)
{
bool dot_met = false, digit_met = false;
for( string::const_iterator it = data.begin(); it != data.end(); ++it )
{
if( std::isdigit( static_cast<unsigned char>( *it ) ) ) {
digit_met = true;
continue;
}
if( *it == '.' ) {
if( !digit_met || dot_met ) return false;
dot_met = true;
continue;
}
return false;
}
return digit_met;
}
In this form the function will not accept a number that starts with . (like .05); if you do want that, the change is trivial. Alternatively, you can use a regular expressions library with the expression "\\d+\\.?\\d*".
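The regular expression alternative could be sketched as follows, assuming C++11 and a free function instead of the MyThread member from the question. std::regex_match only succeeds when the whole string matches, so trailing characters such as the a in 500.00a are rejected.

```cpp
#include <regex>
#include <string>

// Free-function stand-in for MyThread::is_number (an assumption made
// to keep the sketch self-contained).
bool is_number(const std::string& data)
{
    // One or more digits, an optional '.', then any number of digits.
    static const std::regex pattern("\\d+\\.?\\d*");
    return std::regex_match(data, pattern);
}
```

Like the loop version, this accepts 500 and 500.00 but rejects an empty string, .05 and 5.0.0.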

Idiomatic way to match a regular expression in C++

I want to compare two strings in C++:
There is a function getName() that returns a string.
Now I can write Out << getName(); this will print the string.
However I want to print the string only if its value is arpit or arpit*N* where N is an integer. I don't want to print it if its value is arpita, arpitx, where N is something other than an integer or an empty string.
I know this can be easily done, but I want to do this in a minimal number of lines.
What I have done so far is:
char name1[] = getName();
char name2[] = "arpit";
for (int x = 0; x <= 4; x++){
if (name1[x] == name2[x]) continue;
else return ( Out << "not equal") ;
}
while(name1[x] ! = "\0"){
if(isdigit(name1[x])
x++;
else return (Out << "not equal") ;
}
Out << getName();
UPDATE 1
getName() returns a string until it encounters white space, and it will not return any line or 2 or more words.

If you have C++11:
static std::regex const matcher( "arpit\\d*" );
if ( regex_match( name, matcher ) ) {
// matches...
}
If you don't have C++11, boost::regex is practically identical.
If you don't have C++11, and you can't use boost:
if ( name1.size() >= name2.size()
&& std::equal( name2.begin(), name2.end(), name1.begin() )
&& std::find_if( name1.begin() + name2.size(),
name1.end(),
[]( unsigned char ch ) { return !isdigit( ch ); }
) == name1.end() ) {
// matches...
}
For the rest, your code has quite a few errors, and shouldn't
compile. In particular, there is nothing which getName()
could return which can be used to initialize a char []; the
type of a string in C++ is std::string, and your variables
should be:
std::string name1( getName() );
std::string name2( "arpit" );
(except that you need better names. The second might be
something like reference or header, for example.)
And of course, it's undefined behavior to call isdigit with
a char; you have to convert to unsigned char first.
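One caveat about the fallback above: the lambda it uses is itself a C++11 feature. For a compiler without lambdas, a hand-written predicate does the same job. This is only a sketch; the names IsNotDigit and hasPrefixThenDigits are made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Predicate usable without lambdas.
struct IsNotDigit
{
    bool operator()(char ch) const
    {
        // Convert to unsigned char before calling isdigit.
        return !std::isdigit(static_cast<unsigned char>(ch));
    }
};

// Hypothetical helper: does name consist of prefix followed only by digits?
bool hasPrefixThenDigits(const std::string& name, const std::string& prefix)
{
    return name.size() >= prefix.size()
        && std::equal(prefix.begin(), prefix.end(), name.begin())
        && std::find_if(name.begin() + prefix.size(), name.end(), IsNotDigit())
               == name.end();
}
```

A call such as hasPrefixThenDigits(getName(), "arpit") would then decide whether to print the name.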

auto s = getName();
if(s.size() >= 5 && s.substr(0,5) == "arpit")
Out << s;

std::set<std::string> m_allowed_strings;
std::string validate_string(const string & s)
{
if(m_allowed_strings.find(s) != m_allowed_strings.end())
return s;
return "";
}
cout << validate_string(getName());

Efficient way to check if std::string has only spaces

I was just talking with a friend about what would be the most efficient way to check if a std::string has only spaces. He needs to do this on an embedded project he is working on and apparently this kind of optimization matters to him.
I came up with the following code, which uses strtok().
bool has_only_spaces(std::string& str)
{
char* token = strtok(const_cast<char*>(str.c_str()), " ");
while (token != NULL)
{
if (*token != ' ')
{
return true;
}
}
return false;
}
I'm looking for feedback on this code and more efficient ways to perform this task are also welcome.

if(str.find_first_not_of(' ') != std::string::npos)
{
// There's a non-space.
}

In C++11, the all_of algorithm can be employed:
// Check if s consists only of whitespaces
bool whiteSpacesOnly = std::all_of(s.begin(), s.end(), [](unsigned char c){ return std::isspace(c); });

Why so much work, so much typing?
bool has_only_spaces(const std::string& str) {
return str.find_first_not_of (' ') == str.npos;
}

Wouldn't it be easier to do:
bool has_only_spaces(const std::string &str)
{
for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
{
if (*it != ' ') return false;
}
return true;
}
This has the advantage of returning early as soon as a non-space character is found, so it will be marginally more efficient than solutions that examine the whole string.

To check if string has only whitespace in c++11:
bool is_whitespace(const std::string& s) {
return std::all_of(s.begin(), s.end(), [](unsigned char c){ return std::isspace(c); });
}
in pre-c++11:
bool is_whitespace(const std::string& s) {
for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
if (!isspace(*it)) {
return false;
}
}
return true;
}

Here's one that only uses STL (Requires C++11)
inline bool isBlank(const std::string& s)
{
return std::all_of(s.cbegin(),s.cend(),[](unsigned char c) { return std::isspace(c); });
}
It relies on the fact that if the string is empty (begin == end), std::all_of also returns true.
Here is a small test program: http://cpp.sh/2tx6

Using strtok like that is bad style! strtok modifies the buffer it tokenizes (it replaces the delimiter chars with \0).
Here's a non-modifying version. Like the code in the question, it returns true when the string contains a non-space character.
const char* p = str.c_str();
while(*p == ' ') ++p;
return *p != 0;
It can be optimized even further, if you iterate through it in machine word chunks. To be portable, you would also have to take alignment into consideration.

I do not approve of you const_casting above and using strtok.
A std::string can contain embedded nulls but let's assume it will be all ASCII 32 characters before you hit the NULL terminator.
One way you can approach this is with a simple loop, and I will assume const char *.
bool all_spaces( const char * v )
{
for ( ; *v; ++v )
{
if( *v != ' ' )
return false;
}
return true;
}
For larger strings, you can check word-at-a-time until you reach the last word, and then assume the 32-bit word (say) will be 0x20202020 which may be faster.

Something like:
return std::find_if(
str.begin(), str.end(),
std::bind2nd( std::not_equal_to<char>(), ' ' ) )
== str.end();
If you're interested in white space, and not just the space character,
then the best thing to do is to define a predicate, and use it:
struct IsNotSpace
{
bool operator()( char ch ) const
{
return ! ::isspace( static_cast<unsigned char>( ch ) );
}
};
If you're doing any text processing at all, a collection of such simple
predicates will be invaluable (and they're easy to generate
automatically from the list of functions in <ctype.h>).
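As a sketch of how such a predicate plugs into std::find_if (the wrapper name hasOnlyWhitespace is hypothetical):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

struct IsNotSpace
{
    bool operator()(char ch) const
    {
        // Convert to unsigned char before calling isspace.
        return !std::isspace(static_cast<unsigned char>(ch));
    }
};

// Hypothetical wrapper: true if str holds no non-whitespace character.
bool hasOnlyWhitespace(const std::string& str)
{
    return std::find_if(str.begin(), str.end(), IsNotSpace()) == str.end();
}
```

An empty string is reported as whitespace-only here, because find_if on an empty range returns the end iterator.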

It's highly unlikely you'll beat a compiler-optimized naive algorithm for this, e.g.
string::iterator it(str.begin()), end(str.end());
for(; it != end && *it == ' '; ++it);
return it == end;
EDIT: Actually - there is a quicker way (depending on size of string and memory available)..
std::string ns(str.size(), ' ');
return ns == str;
EDIT: actually above is not quick.. it's daft... stick with the naive implementation, the optimizer will be all over that...
EDIT AGAIN: dammit, I guess it's better to look at the functions in std::string
return str.find_first_not_of(' ') == string::npos;

I had a similar problem in a programming assignment, and here is another solution I came up with after reviewing the others. Here I simply build a new sentence without the extra spaces: where there is a run of several spaces, I skip all but one.
string sentence;
string newsent; //reconstruct new sentence
string dbl = " ";
getline(cin, sentence);
int len = sentence.length();
for(int i = 0; i < len; i++){
    // If there is a run of whitespace, skip to its end, then step back one
    // so that a single whitespace character is kept.
    if (isspace(sentence[i]) && isspace(sentence[i+1])) {
        do {
            i++;
        } while (isspace(sentence[i]));
        i--;
    }
    newsent += sentence[i];
}
cout << newsent << "\n";

Hm... I'd do this:
for (auto i = str.begin(); i != str.end(); ++i)
    if (!isspace(static_cast<unsigned char>(*i)))
        return false;
return true;
This is the body of a function returning bool; isspace is located in cctype for C++.
Edit: Thanks to James for pointing out that isspace has undefined behavior on signed chars.

If you are using CString, you can do
CString myString = " "; // All whitespace
if(myString.Trim().IsEmpty())
{
// string is all whitespace
}
This has the benefit of trimming all newline, space and tab characters.
