Matching a whole word with a case-insensitive search - C++

I'm looking for a function, if one is available, that matches a whole word. For example:
std::string str1 = "I'm using firefox browser";
std::string str2 = "The quick brown fox.";
std::string str3 = "The quick brown fox jumps over the lazy dog.";
Only str2 and str3 should match the word fox. A symbol such as a period (.) or a comma (,) directly before or after the word should not prevent a match, and the search also has to be case-insensitive.
I've found many ways to do a case-insensitive string search, but I would like to know how to match a whole word.

I would recommend std::regex from C++11, but its implementation in g++ 4.8 is incomplete (it works from GCC 4.9 onward). So I recommend boost::regex as a replacement.
#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>

int main()
{
    const std::vector<std::string> strs = {"I'm using firefox browser",
                                           "The quick brown fox.",
                                           "The quick brown Fox jumps over the lazy dog."};
    // \< and \> are the Boost.Regex word-start and word-end anchors.
    const boost::regex word("\\<fox\\>", boost::regex::icase);
    for (const auto& s : strs) {
        std::cout << "\n s: " << s << '\n';
        if (boost::regex_search(s, word)) {
            std::cout << "\n Match: " << s << '\n';
        }
    }
    return 0;
}
/*
Local Variables:
compile-command: "g++ --std=c++11 test.cc -lboost_regex -o ./test.exe && ./test.exe"
End:
*/
The output is (blank lines omitted):
s: I'm using firefox browser
s: The quick brown fox.
Match: The quick brown fox.
s: The quick brown Fox jumps over the lazy dog.
Match: The quick brown Fox jumps over the lazy dog.

Related

Split a line read from a file in C++

How do I access individual elements of a line read from a file?
I used the following to read a line from a file:
getline(infile, data) // where infile is an object of ifstream and data is the variable where the line will be stored
The following line is stored in data : "The quick brown fox jumped over the lazy dog"
How do I access particular elements of the line now? What if I want to play around with the second element ( quick ) of the line or get hold of a certain word in the line? How do I select it?
Any help will be appreciated
Since data is a std::string holding "The quick brown fox jumped over the lazy dog" and your delimiter is " ", you can use std::string::find() to find the position of the delimiter and std::string::substr() to get a token:
std::string data = "The quick brown fox jumped over the lazy dog";
std::string delimiter = " ";
std::string token = data.substr(0, data.find(delimiter)); // token is "The"
Since your text is space separated, you can use std::istringstream to separate the words.
std::vector<std::string> words;
const std::string data = "The quick brown fox jumped over the lazy dog";
std::string w;
std::istringstream text_stream(data);
while (text_stream >> w)
{
words.push_back(w);
std::cout << w << "\n";
}
The operator>> skips any leading whitespace and then reads characters into the string until the next whitespace character (space, tab or newline) is found.

regexp, how to match longer strings first?

There must be an option/flag for this I missed with matlab:
I want to use regular expressions to match to my given string, but multiple matches are possible. I want them sorted to first match the longer ones, before the shorter ones.
How can this be achieved?
regexpi('A quick brown fox jumps over the lazy dog.','quick|the|a','match','once')
%returns 'A', would like it to return 'quick'
Maybe you can try the following code
% match all possible key words, don't use argument 'once' in `regexpi()`
v = regexpi('A quick brown fox jumps over the lazy dog.','quick|the|a','match');
% calculate the lengths of matched words
lens = cellfun(@length,v);
% output the longest word
v{lens == max(lens)}
such that
ans = quick
You could do
>> regexpi('A quick brown fox jumps over the lazy dog.','.*?(quick)|.*?(the)|.*?(a)','tokens','once')
ans =
1×1 cell array
{'quick'}
but that's pretty ugly. Another solution, which is a smidge less ugly, is
>> str = "A quick brown fox jumps over the lazy dog.";
>> list = ["quick" "the" "a"];
>> list(find(arrayfun(@(x)contains(str,x), list), 1))
ans =
"quick"
I think I like Thomas' solution the most.

How does split and strip function work together in python?

headers = table.by_tag('th')
labels = [str(t.content).split('(')[0].strip() for t in headers[3:-1]]
I know what split() and strip() do. But what does split('(')[0] mean? headers is content taken from a table.
For example. HTML was..
<table>
<tr><th>Jerry Brown (D)</th><th>Meg Whitman(D)</th></tr>
<tr><td>1</td><td>4</td></tr>
<tr><td>2</td><td>1</td></tr>
<tr><td>3</td><td>2</td></tr>
</table>
The headers may have been extracted with BeautifulSoup, and the result is a list containing the following:
["<th>Jerry Brown (D)</th>", "<th>Meg Whitman(D)</th>"]
so t.content is "Jerry Brown (D)" for the first header and "Meg Whitman(D)" for the second.
"Jerry Brown (D)".split('(') = ["Jerry Brown ", "D)"]
"Meg Whitman(D)".split('(') = ["Meg Whitman", "D)"]
["Jerry Brown ", "D)"][0] = "Jerry Brown "
["Meg Whitman", "D)"][0] = "Meg Whitman"
strip() then removes whitespace from both ends of each string, so
labels is ["Jerry Brown", "Meg Whitman"]

Compare part of the string with another string from STDLB

I'm wondering if there is a function like PHP's preg_match that finds or matches one string inside another.
// array `word` (index)      // array `part` (index of the matching word)
"Backdoor"  0                "mark"   3  (matches "Market")
"DVD"       1                "of"     2  (matches "Get off")
"Get off"   2                ""      -1  (no match)
"Market"    3                "VD"     1  (matches "DVD")
A function that matches just part of a string would be great. As far as I know there is only strcmp, but that compares whole strings, so in my case it would always report no match.
std::strstr(). It doesn't do regexes, but it does do simple string-in-string matching.
const char *foo = "Quick brown fox";
const char *bar = "brown";
printf("%td\n", strstr(foo, bar) - foo); // Displays "6" (%td because a pointer difference is a ptrdiff_t)
And as you're in C++, there's also std::string::find():
std::string foo = "Quick brown fox";
std::string bar = "brown";
std::cout << foo.find(bar) << "\n"; // Displays "6"
You can use std::string::find(), or you can use std::strstr().
As another alternative, you can implement the matching yourself with dynamic programming or backtracking (of the two, dynamic programming performs better).
I know this question is not really about algorithms, but I think this answer may still be useful.

why is my std::string being cut off?

I initialize a string as follows:
std::string myString = "'The quick brown fox jumps over the lazy dog' is an English-language pangram (a phrase that contains all of the letters of the alphabet)";
and the myString ends up being cut off like this:
'The quick brown fox jumps over the
lazy dog' is an English-language
pangram (a phrase that contains
Where can i set the size limit?
I tried the following without success:
std::string myString;
myString.resize(300);
myString = "'The quick brown fox jumps over the lazy dog' is an English-language pangram (a phrase that contains all of the letters of the alphabet)";
Many thanks!
Of course it was just the debugger cutting it off (xcode). I'm just getting started with xcode/c++, so thanks a lot for the quick replies.
Are you sure?
kkekan> ./a.out
'The quick brown fox jumps over the lazy dog' is an English-language pangram (a phrase that contains all of the letters of the alphabet)
There is no good reason why this should have happened!
Try the following (in debug mode):
assert(!"Congratulations, I am in debug mode! Let's do a test now...");
std::string myString = "'The quick brown fox jumps over the lazy dog' is an English-language pangram (a phrase that contains all of the letters of the alphabet)";
assert(myString.size() > 120);
Does the (second) assertion fail?
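When printing or displaying text, the output machinery buffers the output. You can tell it to flush the buffers (display all remaining text) by using std::endl or by calling the flush() method. Writing '\n' alone ends the line but is not guaranteed to flush, although a line-buffered terminal will usually show the text at that point: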
#include <iostream>
#include <string>
using std::cout;
using std::endl;

int main()
{
    std::string myString =
        "'The quick brown fox jumps over the lazy dog'" // Compiler concatenates
        " is an English-language pangram (a phrase"     // these contiguous text
        " that contains all of the letters of the"      // literals automatically.
        " alphabet)";
    // Method 1: use '\n'
    // A newline ends the line; a line-buffered terminal usually
    // shows the text here, but a flush is not guaranteed.
    cout << myString << '\n';
    // Method 2: use std::endl
    // std::endl sends '\n' to the output and then flushes the buffer.
    cout << myString << endl;
    // Method 3: use the flush() method
    cout << myString;
    cout.flush();
    return 0;
}
For more information about buffers, search Stack Overflow for "C++ output buffer".