How to find string in a string - c++

I somehow need to find the longest string in other string, so if string1 will be "Alibaba" and string2 will be "ba" , the longest string will be "baba". I have the lengths of strings, but what next ?
char* fun(char* a, char& b)
{
int length1=0;
int length2=0;
int longer;
int shorter;
char end='\0';
while(a[i] != tmp)
{
i++;
length1++;
}
int i=0;
while(b[i] != tmp)
{
i++;
length++;
}
if(dlug1 > dlug2){
longer = length1;
shorter = length2;
}
else{
longer = length2;
shorter = length1;
}
//logics here
}
int main()
{
char name1[] = "Alibaba";
char name2[] = "ba";
char &oname = *name2;
cout << fun(name1, oname) << endl;
system("PAUSE");
return 0;
}

Wow lots of bad answers to this question. Here's what your code should do:
Find the first instance of "ba" using the standard string searching functions.
In a loop look past this "ba" to see how many of the next N characters are also "ba".
If this sequence is longer than the previously recorded longest sequence, save its length and position.
Find the next instance of "ba" after the last one.
Here's the code (not tested):
string FindLongestRepeatedSubstring(string longString, string shortString)
{
// The number of repetitions in our longest string.
int maxRepetitions = 0;
int n = shortString.length(); // For brevity.
// Where we are currently looking.
int pos = 0;
while ((pos = longString.find(shortString, pos)) != string::npos)
{
// Ok we found the start of a repeated substring. See how many repetitions there are.
int repetitions = 1;
// This is a little bit complicated.
// First go past the "ba" we have already found (pos += n)
// Then see if there is still enough space in the string for there to be another "ba"
// Finally see if it *is* "ba"
for (pos += n; pos+n < longString.length() && longString.substr(pos, n) == shortString; pos += n)
++repetitions;
// See if this sequence is longer than our previous best.
if (repetitions > maxRepetitions)
maxRepetitions = repetitions;
}
// Construct the string to return. You really probably want to return its position, or maybe
// just maxRepetitions.
string ret;
while (maxRepetitions--)
ret += shortString;
return ret;
}

What you want should look like this pseudo-code:
i = j = count = max = 0
while (i < length1 && c = name1[i++]) do
if (j < length2 && name2[j] == c) then
j++
else
max = (count > max) ? count : max
count = 0
j = 0
end
if (j == length2) then
count++
j = 0
end
done
max = (count > max) ? count : max
for (i = 0 to max-1 do
print name2
done
The idea is here but I feel that there could be some cases in which this algorithm won't work (cases with complicated overlap that would require going back in name1). You may want to have a look at the Boyer-Moore algorithm and mix the two to have what you want.

The Algorithms Implementation Wikibook has an implementation of what you want in C++.

http://www.cplusplus.com/reference/string/string/find/
Maybe you made it on purpose, but you should use the std::string class and forget archaic things like char* string representation.
It will make you able to use lots of optimized methods, such as string research, etc.

why dont you use strstr function provided by C.
const char * strstr ( const char * str1, const char * str2 );
char * strstr ( char * str1, const char * str2 );
Locate substring
Returns a pointer to the first occurrence of str2 in str1,
or a null pointer if str2 is not part of str1.
The matching process does not include the terminating null-characters.
use the length's now and create a loop and play with the original string anf find the longest string inside.

Related

Search a string for all occurrences of a substring in C++

Write a function countMatches that searches the substring in the given string and returns how many times the substring appears in the string.
I've been stuck on this awhile now (6+ hours) and would really appreciate any help I can get. I would really like to understand this better.
int countMatches(string str, string comp)
{
int small = comp.length();
int large = str.length();
int count = 0;
// If string is empty
if (small == 0 || large == 0) {
return -1;
}
// Increment i over string length
for (int i = 0; i < small; i++) {
// Output substring stored in string
for (int j = 0; j < large; j++) {
if (comp.substr(i, small) == str.substr(j, large)) {
count++;
}
}
}
cout << count << endl;
return count;
}
When I call this function from main, with countMatches("Hello", "Hello"); I get the output of 5. Which is completely wrong as it should return 1. I just want to know what I'm doing wrong here so I don't repeat the mistake and actually understand what I am doing.
I figured it out. I did not need a nested for loop because I was only comparing the secondary string to that of the string. It also removed the need to take the substring of the first string. SOOO... For those interested, it should have looked like this:
int countMatches(string str, string comp)
{
int small = comp.length();
int large = str.length();
int count = 0;
// If string is empty
if (small == 0 || large == 0) {
return -1;
}
// Increment i over string length
for (int i = 0; i < large; i++) {
// Output substring stored in string
if (comp == str.substr(i, small)) {
count++;
}
}
cout << count << endl;
return count;
}
The usual approach is to search in place:
std::string::size_type pos = 0;
int count = 0;
for (;;) {
pos = large.find(small, pos);
if (pos == std::string::npos)
break;
++count;
++pos;
}
That can be tweaked if you're not concerned about overlapping matches (i.e., looking for all occurrences of "ll" in the string "llll", the answer could be 3, which the above algorithm will give, or it could be 2, if you don't allow the next match to overlap the first. To do that, just change ++pos to pos += small.size() to resume the search after the entire preceding match.
The problem with your function is that you are checking that:
Hello is substring of Hello
ello is substring of ello
llo is substring of llo
...
of course this matches 5 times in this case.
What you really need is:
For each position i of str
check if the substring of str starting at i and of length = comp.size() is exactly comp.
The following code should do exactly that:
size_t countMatches(const string& str, const string& comp)
{
size_t count = 0;
for (int j = 0; j < str.size()-comp.size()+1; j++)
if (comp == str.substr(j, comp.size()))
count++;
return count;
}

Finding substring inside string with any order of characters of substring in C/C++

Suppose I have a string "abcdpqrs",
now "dcb" can be counted as a substring of above string as the characters are together.
Also "pdq" is a part of above string. But "bcpq" is not. I hope you got what I want.
Is there any efficient way to do this.
All I can think is taking help of hash to do this. But it is taking long time even in O(n) program as backtracking is required in many cases. Any help will be appreciated.
Here is an O(n * alphabet size) solution:
Let's maintain an array count[a] = how many times the character a was in the current window [pos; pos + lenght of substring - 1]. It can be recomputed in O(1) time when the window is moved by 1 to the right(count[s[pos]]--, count[s[pos + substring lenght]]++, pos++). Now all we need is to check for each pos that count array is the same as count array for the substring(it can be computed only once).
It can actually be improved to O(n + alphabet size):
Instead of comparing count arrays in a naive way, we can maintain the number diff = number of characters that do not have the same count value as in a substring for the current window. The key observation is that diff changes in obvious way we apply count[c]-- or count[c]++ (it either gets incremented, decremented or stays the same depending on only count[c] value). Two count arrays are the same if and only if diff is zero for current pos.
Lets say you have the string "axcdlef" and wants to search "opde":
bool compare (string s1, string s2)
{
// sort both here
// return if they are equal when sorted;
}
you would need to call this function for this example with the following substrings of size 4(same as length as "opde"):
"axcd"
"xcdl"
"cdle"
"dlef"
bool exist = false;
for (/*every split that has the same size as the search */)
exist = exist || compare(currentsplit, search);
You can use a regex (i.e boost or Qt) for this. Alternately you an use this simple approach. You know the length k of the string s to be searched in string str. So take each k consecutive characters from str and check if any of these characters is present in s.
Starting point ( a naive implementation to make further optimizations):
#include <iostream>
/* pos position where to extract probable string from str
* s string set with possible repetitions being searched in str
* str original string
*/
bool find_in_string( int pos, std::string s, std::string str)
{
std::string str_s = str.substr( pos, s.length());
int s_pos = 0;
while( !s.empty())
{
std::size_t found = str_s.find( s[0]);
if ( found!=std::string::npos)
{
s.erase( 0, 1);
str_s.erase( found, 1);
} else return 0;
}
return 1;
}
bool find_in_string( std::string s, std::string str)
{
bool found = false;
int pos = 0;
while( !found && pos < str.length() - s.length() + 1)
{
found = find_in_string( pos++, s, str);
}
return found;
}
Usage:
int main() {
std::string s1 = "abcdpqrs";
std::string s2 = "adcbpqrs";
std::string searched = "dcb";
std::string searched2 = "pdq";
std::string searched3 = "bcpq";
std::cout << find_in_string( searched, s1);
std::cout << find_in_string( searched, s2);
std::cout << find_in_string( searched2, s1);
std::cout << find_in_string( searched3, s1);
return 0;
}
prints: 1110
http://ideone.com/WrSMeV
To use an array for this you are going to need some extra code to map where each character goes in there... Unless you know you are only using 'a' - 'z' or something similar that you can simply subtract from 'a' to get the position.
bool compare(string s1, string s2)
{
int v1[SIZE_OF_ALFABECT];
int v2[SIZE_OF_ALFABECT];
int count = 0;
map<char, int> mymap;
// here is just pseudocode
foreach letter in s1:
if map doesnt contain this letter already:
mymap[letter] = count++;
// repeat the same foreach in s2
/* You can break and return false here if you try to add new char into map,
that means that the second string has a different character already... */
// count will now have the number of distinct chars that you have in both strs
// you will need to check only 'count' positions in the vectors
for(int i = 0; i < count; i++)
v1[i] = v2[i] = 0;
//another pseudocode
foreach letter in s1:
v1[mymap[leter]]++;
foreach letter in s1:
v2[mymap[leter]]++;
for(int i = 0; i < count; i++)
if(v1[i] != v2[i])
return false;
return true;
}
Here is a O(m) best case, O(m!) worst case solution - m being the length of your search string:
Use a suffix-trie, e.g. a Ukkonnen Trie (there are some floating around, but I have no link at hand at the moment), and search for any permutation of the substring. Note that any lookup needs just O(1) for each chararacter of the string to search, regardless of the size of n.
However, while the size of n does not matter, this becomes inpractical for large m.
If however n is small enough anf one is willing to sacrifice lookup performance for index size, the suffix trie can store a string that contains all permutations of the original string.
Then the lookup will always be O(m).
I'd suggest to go with the accepted answer for the general case. However, here you have a suggestion that can perform (much) better for small substrings and large string.

Getting last N segments of URL in C++

I need to write a function to return the last N segments of a given URL, i.e. given /foo/bar/zoo and N=2, I expect to get back /bar/zoo. Boundary conditions should be handled appropriately. I have no problem doing it in C, but the best C++ version I could come up is this:
string getLastNSegments(const string& url, int N)
{
basic_string<char>::size_type found = 0, start = path.length()+1;
int segments = 2;
while (start && segments && (start = path.find_last_of('/', start-1)) != string::npos) {
found = start;
segments--;
}
return url.substr(found);
}
cout << "result: " << getLastNSegments("/foo/bar/zoo", 2) << endl;
Is there a more idiomatic (STL+algorithms) way of doing this?
Use std::string and rfind().
You call rfind successively N times feeding the last index as parameter. You now have the start index of the string you're looking for and use substr to extract the substring.
std::string x("http:/example.org/a/b/abc/bcd");
int N = 3;
int idx = x.length();
while ( idx >= 0 && --N > 0 )
{
idx = x.rfind('/',idx) - 1;
}
std::string final = x.substr(idx);
Nothing wrong with just using a loop.. Don't know of any STL string functions that will do what you want in a single call.
By the way, what happens when You ask for the last 3 segments of http://www.google.com/?
Call me old-school, but personally I would not use any STL searches here... What's the matter with this:
if( N <= 0 || url.length() == 0 ) return "";
const char *str = url.c_str();
const char *start = str + url.length();
int remain = N;
while( --start != str )
{
if( *start == '/' && --remain == 0 ) break;
}
return string(start);
Last but not least, a simple boost split solution
string getLastNSegments(const string& url, int n)
{
string selected;
vector<string> elements;
boost::algorithm::split(elements, url, boost::is_any_of("/"));
for (int i = 0; i < min(n, int(elements.size())); i++)
selected = "/" + elements.at(elements.size()-1-i) + selected;
return selected;
}

To remove garbage characters from a string using regex

I want to remove characters from a string other then a-z, and A-Z. Created following function for the same and it works fine.
public String stripGarbage(String s) {
String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
String result = "";
for (int i = 0; i < s.length(); i++) {
if (good.indexOf(s.charAt(i)) >= 0) {
result += s.charAt(i);
}
}
return result;
}
Can anyone tell me a better way to achieve the same. Probably regex may be better option.
Regards
Harry
Here you go:
result = result.replaceAll("[^a-zA-Z0-9]", "");
But if you understand your code and it's readable then maybe you have the best solution:
Some people, when confronted with a
problem, think "I know, I'll use
regular expressions." Now they have
two problems.
The following should be faster than anything using regex, and your initial attempt.
public String stripGarbage(String s) {
StringBuilder sb = new StringBuilder(s.length());
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
if ((ch >= 'A' && ch <= 'Z') ||
(ch >= 'a' && ch <= 'z') ||
(ch >= '0' && ch <= '9')) {
sb.append(ch);
}
}
return sb.toString();
}
Key points:
It is significantly faster use a StringBuilder than string concatenation in a loop. (The latter generates N - 1 garbage strings and copies N * (N + 1) / 2 characters to build a String containing N characters.)
If you have a good estimate of the length of the result String, it is a good idea to preallocate the StringBuilder to hold that number of characters. (But if you don't have a good estimate, the cost of the internal reallocations etc amortizes to O(N) where N is the final string length ... so this is not normally a major concern.)
Searching testing a character against (up to) 3 character ranges will be significantly faster on average than searching for a character in a 62 character String.
A switch statement might be faster especially if there are more character ranges. However, in this case it will take many more lines of code to list the cases for all of the letters and digits.
If the non-garbage characters match existing predicates of the Character class (e.g. Character.isLetter(char) etc) you could use those. This would be a good option if you wanted to match any letter or digit ... rather than just ASCII letters and digits.
Other alternatives to consider are using a HashSet<Character> or a boolean[] indexed by character that were pre-populated with the non-garbage characters. These approaches work well if the set of non-garbage characters is not known at compile time.
This regex works:
result=s.replace(/[^A-Z0-9a-z]/ig,'');
s being the string passed to you function and result is the string with alphanumeric and numbers only.
I know this post is old, but you can shorten Stephen C's answer a little by using the System.Char structure.
public String RemoveNonAlphaNumeric(String value)
{
StringBuilder sb = new StringBuilder(value);
for (int i = 0; i < value.Length; i++)
{
char ch = value[i];
if (Char.IsLetterOrDigit(ch))
{
sb.Append(ch);
}
}
return sb.ToString();
}
Still accomplishes the same thing in a more compact fashion.
The Char has some really great functions for checking text. Here are some for your future reference.
Char.GetNumericValue()
Char.IsControl()
Char.IsDigit()
Char.IsLetter()
Char.IsLower()
Char.IsNumber()
Char.IsPunctuation()
Char.IsSeparator()
Char.IsSymbol()
Char.IsWhiteSpace()
this works:
public static String removeGarbage(String s) {
String r = "";
for ( int i = 0; i < s.length(); i++ )
if ( s.substring(i,i+1).matches("[A-Za-z]") ) // [A-Za-z0-9] if you want include numbers
r = r.concat(s.substring(i, i+1));
return r;
}
(edit: although it's not so efficient)
/**
* Remove characters from a string other than ASCII
*
* */
private static StringBuffer goodBuffer = new StringBuffer();
// Static initializer for ACSII
static {
for (int c=1; c<128; c++) {
goodBuffer.append((char)c);
}
}
public String stripGarbage(String s) {
//String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
String good = goodBuffer.toString();
String result = "";
for (int i = 0; i < s.length(); i++) {
if (good.indexOf(s.charAt(i)) >= 0) {
result += s.charAt(i);
}
else
result += " ";
}
return result;
}

How many palindromes can be formed by selections of characters from a string?

I'm posting this on behalf of a friend since I believe this is pretty interesting:
Take the string "abb". By leaving out
any number of letters less than the
length of the string we end up with 7
strings.
a b b ab ab bb abb
Out of these 4 are palindromes.
Similarly for the string
"hihellolookhavealookatthispalindromexxqwertyuiopasdfghjklzxcvbnmmnbvcxzlkjhgfdsapoiuytrewqxxsoundsfamiliardoesit"
(a length 112 string) 2^112 - 1
strings can be formed.
Out of these how many are
palindromes??
Below there is his implementation (in C++, C is fine too though). It's pretty slow with very long words; he wants to know what's the fastest algorithm possible for this (and I'm curious too :D).
#include <iostream>
#include <cstring>
using namespace std;
void find_palindrome(const char* str, const char* max, long& count)
{
for(const char* begin = str; begin < max; begin++) {
count++;
const char* end = strchr(begin + 1, *begin);
while(end != NULL) {
count++;
find_palindrome(begin + 1, end, count);
end = strchr(end + 1, *begin);
}
}
}
int main(int argc, char *argv[])
{
const char* s = "hihellolookhavealookatthis";
long count = 0;
find_palindrome(s, strlen(s) + s, count);
cout << count << endl;
}
First of all, your friend's solution seems to have a bug since strchr can search past max. Even if you fix this, the solution is exponential in time.
For a faster solution, you can use dynamic programming to solve this in O(n^3) time. This will require O(n^2) additional memory. Note that for long strings, even 64-bit ints as I have used here will not be enough to hold the solution.
#define MAX_SIZE 1000
long long numFound[MAX_SIZE][MAX_SIZE]; //intermediate results, indexed by [startPosition][endPosition]
long long countPalindromes(const char *str) {
int len = strlen(str);
for (int startPos=0; startPos<=len; startPos++)
for (int endPos=0; endPos<=len; endPos++)
numFound[startPos][endPos] = 0;
for (int spanSize=1; spanSize<=len; spanSize++) {
for (int startPos=0; startPos<=len-spanSize; startPos++) {
int endPos = startPos + spanSize;
long long count = numFound[startPos+1][endPos]; //if str[startPos] is not in the palindrome, this will be the count
char ch = str[startPos];
//if str[startPos] is in the palindrome, choose a matching character for the palindrome end
for (int searchPos=startPos; searchPos<endPos; searchPos++) {
if (str[searchPos] == ch)
count += 1 + numFound[startPos+1][searchPos];
}
numFound[startPos][endPos] = count;
}
}
return numFound[0][len];
}
Explanation:
The array numFound[startPos][endPos] will hold the number of palindromes contained in the substring with indexes startPos to endPos.
We go over all pairs of indexes (startPos, endPos), starting from short spans and moving to longer ones. For each such pair, there are two options:
The character at str[startPos] is not in the palindrome. In that case, there are numFound[startPos+1][endPos] possible palindromes - a number that we have calculated already.
character at str[startPos] is in the palindrome (at its beginning). We scan through the string to find a matching character to put at the end of the palindrome. For each such character, we use the already-calculated results in numFound to find number of possibilities for the inner palindrome.
EDIT:
Clarification: when I say "number of palindromes contained in a string", this includes non-contiguous substrings. For example, the palindrome "aba" is contained in "abca".
It's possible to reduce memory usage to O(n) by taking advantage of the fact that calculation of numFound[startPos][x] only requires knowledge of numFound[startPos+1][y] for all y. I won't do this here since it complicates the code a bit.
Pregenerating lists of indices containing each letter can make the inner loop faster, but it will still be O(n^3) overall.
I have a way can do it in O(N^2) time and O(1) space, however I think there must be other better ways.
the basic idea was the long palindrome must contain small palindromes, so we only search for the minimal match, which means two kinds of situation: "aa", "aba". If we found either , then expand to see if it's a part of a long palindrome.
int count_palindromic_slices(const string &S) {
int count = 0;
for (int position=0; position<S.length(); position++) {
int offset = 0;
// Check the "aa" situation
while((position-offset>=0) && (position+offset+1)<S.length() && (S.at(position-offset))==(S.at(position+offset+1))) {
count ++;
offset ++;
}
offset = 1; // reset it for the odd length checking
// Check the string for "aba" situation
while((position-offset>=0) && position+offset<S.length() && (S.at(position-offset))==(S.at(position+offset))) {
count ++;
offset ++;
}
}
return count;
}
June 14th, 2012
After some investigation, I believe this is the best way to do it.
faster than the accepted answer.
Is there any mileage in making an initial traversal and building an index of all occurances of each character.
h = { 0, 2, 27}
i = { 1, 30 }
etc.
Now working from the left, h, only possible palidromes are at 3 and 17, does char[0 + 1] == char [3 -1] etc. got a palindrome. does char [0+1] == char [27 -1] no, No further analysis of char[0] needed.
Move on to char[1], only need to example char[30 -1] and inwards.
Then can probably get smart, when you've identified a palindrome running from position x->y, all inner subsets are known palindromes, hence we've dealt with some items, can eliminate those cases from later examination.
My solution using O(n) memory and O(n^2) time, where n is the string length:
palindrome.c:
#include <stdio.h>
#include <string.h>
typedef unsigned long long ull;
ull countPalindromesHelper (const char* str, const size_t len, const size_t begin, const size_t end, const ull count) {
if (begin <= 0 || end >= len) {
return count;
}
const char pred = str [begin - 1];
const char succ = str [end];
if (pred == succ) {
const ull newCount = count == 0 ? 1 : count * 2;
return countPalindromesHelper (str, len, begin - 1, end + 1, newCount);
}
return count;
}
ull countPalindromes (const char* str) {
ull count = 0;
size_t len = strlen (str);
size_t i;
for (i = 0; i < len; ++i) {
count += countPalindromesHelper (str, len, i, i, 0); // even length palindromes
count += countPalindromesHelper (str, len, i, i + 1, 1); // odd length palindromes
}
return count;
}
int main (int argc, char* argv[]) {
if (argc < 2) {
return 0;
}
const char* str = argv [1];
ull count = countPalindromes (str);
printf ("%llu\n", count);
return 0;
}
Usage:
$ gcc palindrome.c -o palindrome
$ ./palindrome myteststring
EDIT: I misread the problem as the contiguous substring version of the problem. Now given that one wants to find the palindrome count for the non-contiguous version, I strongly suspect that one could just use a math equation to solve it given the number of distinct characters and their respective character counts.
Hmmmmm, I think I would count up like this:
Each character is a palindrome on it's own (minus repeated characters).
Each pair of the same character.
Each pair of the same character, with all palindromes sandwiched in the middle that can be made from the string between repeats.
Apply recursively.
Which seems to be what you're doing, although I'm not sure you don't double-count the edge cases with repeated characters.
So, basically, I can't think of a better way.
EDIT:
Thinking some more,
It can be improved with caching, because you sometimes count the palindromes in the same sub-string more than once. So, I suppose this demonstrates that there is definitely a better way.
Here is a program for finding all the possible palindromes in a string written in both Java and C++.
int main()
{
string palindrome;
cout << "Enter a String to check if it is a Palindrome";
cin >> palindrome;
int length = palindrome.length();
cout << "the length of the string is " << length << endl;
int end = length - 1;
int start = 0;
int check=1;
while (end >= start) {
if (palindrome[start] != palindrome[end]) {
cout << "The string is not a palindrome";
check=0;
break;
}
else
{
start++;
end--;
}
}
if(check)
cout << "The string is a Palindrome" << endl;
}
public String[] findPalindromes(String source) {
Set<String> palindromes = new HashSet<String>();
int count = 0;
for(int i=0; i<source.length()-1; i++) {
for(int j= i+1; j<source.length(); j++) {
String palindromeCandidate = new String(source.substring(i, j+1));
if(isPalindrome(palindromeCandidate)) {
palindromes.add(palindromeCandidate);
}
}
}
return palindromes.toArray(new String[palindromes.size()]);
}
private boolean isPalindrome(String source) {
int i =0;
int k = source.length()-1;
for(i=0; i<source.length()/2; i++) {
if(source.charAt(i) != source.charAt(k)) {
return false;
}
k--;
}
return true;
}
I am not sure but you might try whit fourier. This problem remined me on this: O(nlogn) Algorithm - Find three evenly spaced ones within binary string
Just my 2cents