to lexicographically compare kth word from two strings - c++

I'm trying to write a C++ function that lexicographically compares the kth word of two strings. Here is my function:
bool kth_lexo ()
{
int k = 2 ;
str1 = "123 300 60009" ;
str2 = "1500 10002" ;
// to store the kth word of first string in ptr1
char *ptr1 = strtok( (char*)str1.c_str() ," ");
for(int i = 1; i<k; i++)
{
ptr1 = strtok(NULL," ");
}
// to store the kth word of second string in ptr2
char *ptr2 = strtok( (char*)str2.c_str() ," ");
for(int i = 1; i<k; i++)
{
ptr2 = strtok(NULL," ");
}
string st1 = ptr1 ;
string st2 = ptr2 ;
return st1 > st2 ;
}
In this function the lexicographical comparison works fine: it returns 1, because 300 (the 2nd word of str1) is lexicographically greater than 10002 (the 2nd word of str2).
My problem:
If I slightly modify the function by replacing its last line with return ptr1>ptr2 ;
the new function looks like this:
bool kth_lexo ()
{
int k = 2 ;
str1 = "123 300 60009" ;
str2 = "1500 10002" ;
// to store the kth word of first string in ptr1
char *ptr1 = strtok( (char*)str1.c_str() ," ");
for(int i = 1; i<k; i++)
{
ptr1 = strtok(NULL," ");
}
// to store the kth word of second string in ptr2
char *ptr2 = strtok( (char*)str2.c_str() ," ");
for(int i = 1; i<k; i++)
{
ptr2 = strtok(NULL," ");
}
// modified line compared to previous function
return ptr1 > ptr2 ;
}
With this modified function the output is consistently 0, no matter whether the kth word of str1 (stored in ptr1) is lexicographically greater or smaller than the kth word of str2 (stored in ptr2).
Changing the return statement to the following line does not help much either:
return (*ptr1)>(*ptr2) ;
So what is the problem with each of these two return statements in my modified function for comparing the kth word of the two strings:
return ptr1 > ptr2 ;
OR
return (*ptr1) > (*ptr2) ;

Your program is written in a very C-like style. Modern C++ makes this much simpler and easier to read, since it offers more expressive syntax:
#include <string_view>
#include <iostream>
#include <cassert>
auto find_kth_char(std::string_view to_search, char c, std::size_t k, std::size_t pos = 0) {
for (; pos < std::string_view::npos && k > 0; --k) {
pos = to_search.find(c, pos + 1);
}
return pos;
}
auto get_kth_word(std::string_view to_search, std::size_t k) {
// Words are counted starting at 1
assert(k > 0);
auto start = find_kth_char(to_search, ' ', k - 1);
if (start == std::string_view::npos) {
return std::string_view{};
}
auto end = find_kth_char(to_search, ' ', 1, start);
// For k > 1, start points at the separating space, so skip it
if (k > 1) ++start;
return to_search.substr(start, end - start);
}
auto compare_kth(std::string_view lhs, std::string_view rhs, std::size_t k) {
auto l_word = get_kth_word(lhs, k);
auto r_word = get_kth_word(rhs, k);
// returnvalue <=> 0 == lhs <=> rhs
return l_word.compare(r_word);
}
int main() {
auto str1 = "123 300 60009";
auto str2 = "1500 10002";
for (std::size_t k = 1; k < 4; ++k) {
std::cout << k << ":\t" << compare_kth(str1, str2, k) << '\n';
}
}
I am using C++17's std::string_view because we never modify the strings, and taking substrings of a view is very cheap. The find and compare member functions do the real work.
The return value of compare_kth is an int that tells us whether the left-hand side is smaller (negative result), equal (0), or greater (positive result) than the right-hand side.
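As a small illustration of how that int is meant to be read, here is a sketch that collapses the result of std::string_view::compare to a sign. The helper name sign_of_compare is hypothetical and not part of the code above.

```cpp
#include <string_view>

// Hypothetical helper: collapse the result of compare() to -1, 0 or +1.
// Only the sign of compare() is meaningful, not its magnitude.
int sign_of_compare(std::string_view lhs, std::string_view rhs) {
    const int r = lhs.compare(rhs);
    return (r > 0) - (r < 0);
}
```

With the words from the question, sign_of_compare("300", "10002") is positive because '3' sorts after '1', even though 300 is numerically smaller than 10002.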

If you stopped using C idioms and used C++ consistently, this problem would not occur.
You are mixing C++ std::string with char* or const char* here. For strings, std::string is so much better than old-style C char arrays or char* that you should use std::string from now on.
A char pointer is an address of some area in memory where your char data is stored. Dereferencing the pointer with * gives you the element stored at that address: exactly one character, not a string.
Comparing ptr1 > ptr2 does not compare strings. It compares the addresses at which the strings are stored in memory. ptr1 could be 0x578962574 and ptr2 could be 0x95324782, or anything else. We do not know the addresses; they are determined at run time by wherever each std::string keeps its buffer, and relationally comparing pointers into two unrelated objects gives an unspecified result anyway.
And if you compare (*ptr1)>(*ptr2), you compare only two single characters, the first character of each word, which can also give the wrong result.
On the other hand, comparing two std::string objects always works as expected.
So, simple answer: use std::string for all strings.
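To make the difference concrete, here is a minimal sketch of two comparisons that do look at the contents. The function names c_string_greater and std_string_greater are made up for this illustration.

```cpp
#include <cstring>
#include <string>

// Compare two C strings by content; strcmp returns <0, 0 or >0.
bool c_string_greater(const char* a, const char* b) {
    return std::strcmp(a, b) > 0;
}

// Same comparison via std::string, whose operator> compares contents.
bool std_string_greater(const std::string& a, const std::string& b) {
    return a > b;
}
```

If you have to keep char pointers, strcmp is the call that replaces ptr1 > ptr2. Note that *ptr1 > *ptr2 would treat "13" and "12" as equal, because both start with '1'.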

Related

Character array adds random characters at outputting

cin.get(a, 256);
for(int i = 0; i < strlen(a); i++){
if(strchr("aeiou", a[i])){
s = 0;
strcpy(substr, empty);
isubstr = 0;
}
else{
s++;
substr[isubstr++] = a[i];
if(s > maax || s == maax){
maax = s;
memset(show, 0, 256);
strcpy(show, substr);
}
}
}
cout << show;
This is the code. It is meant to find the longest substring that contains only consonants; if two or more have the same length, it outputs the last one (the one closest to the right).
Consider the following sequence:
jfoapwjfppawefjdsjkflwea
Split by vowels, it would look something like this:
jf |oa| pwjfpp |a| w |e| fjdsjkflw |ea|
Notice that "fjdsjkflw" is the longest substring without a vowel. The code outputs exactly that, followed by some random characters:
fjdsjkflwê²a
Why does this happen? Why is the terminating null placed 3 characters beyond where it is intended to be?
For starters, you should write a function that finds such a longest sequence of consonants.
You provided incomplete code, so it is difficult to analyze. For example, it is not shown where and how the variables substr and empty used in this call
strcpy(substr, empty);
are defined or what they mean. A likely cause of the stray characters, though, is that substr[isubstr++] = a[i]; never writes a terminating '\0' after the appended character, so strcpy(show, substr) keeps copying whatever bytes follow in substr.
There are also statements like this
memset(show, 0, 256);
that serve no purpose, because it is immediately followed by the statement
strcpy(show, substr);
so the memset is redundant.
It also seems that one of the variables s and isubstr is redundant.
I can suggest the following solution, implemented as a function.
#include <iostream>
#include <utility>
#include <cstring>
std::pair<const char *, size_t> max_consonant_seq( const char *s )
{
const char *vowels = "aeiouAEIOU";
std::pair<const char *, size_t> p( nullptr, 0 );
do
{
size_t n = std::strcspn( s, vowels );
if ( n != 0 && !( n < p.second ) )
{
p.first = s;
p.second = n;
}
s += n;
s += std::strspn( s, vowels );
} while ( *s );
return p;
}
int main()
{
const char *s = "jfoapwjfppawefjdsjkflwea";
auto p = max_consonant_seq( s );
if ( p.second ) std::cout.write( p.first, p.second ) << '\n';
return 0;
}
The program output is
fjdsjkflw
The function returns a pair of objects. The first is a pointer to the start of the longest sequence of consonants in the passed string, and the second is the length of that sequence.
All you need to understand how the function works is to read the descriptions of the two C string functions strspn and strcspn.
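As a quick sketch of what those two functions return, the wrappers below (leading_consonants and leading_vowels are hypothetical names) measure the run at the start of a string. Here "consonant" simply means any character that is not in the vowel list, which is the same simplification the function above makes.

```cpp
#include <cstddef>
#include <cstring>

// Length of the leading run of s that contains no vowel (strcspn:
// span of characters NOT in the given set).
std::size_t leading_consonants(const char* s) {
    return std::strcspn(s, "aeiouAEIOU");
}

// Length of the leading run of s that consists only of vowels (strspn:
// span of characters that ARE in the given set).
std::size_t leading_vowels(const char* s) {
    return std::strspn(s, "aeiouAEIOU");
}
```

Alternating the two calls and advancing the pointer by each result walks the string one consonant run and one vowel run at a time, which is what the do-while loop in max_consonant_seq does.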

Function to shift array elements not working

I have a null-terminated array of chars, also known as a C-string. I have written a function that shifts the elements at each index left (<----) by a given number of indexes. For example, when the char array "hello world" is passed to the function with a shiftBy value of 3, it should transform the char array into "lo worldhel".
Currently, this function works for all strings that have <= 11 elements. Anything over that and the last three spots in the array don't get shifted. Keep in mind, the very last index is holding the null terminator!
This is a tricky one and I have been stuck for hours. I also can't use any standard functions or vectors; I am stuck with these deprecated arrays and simple loops. So please don't troll with "Why don't you use blank function"... because trust me, if I could I wouldn't be here.
Here is the code, have at it:
void shiftLeft (char szString[], int size, int shiftBy)
{
if(shiftBy > size){
shiftBy = shiftBy - size;
}
if(size == 1){
//do nothing, do nothing, exit function with no change made to myarray
}
else{
char temp;
//for loop to print the array with indexes moved up (to the left) <-- by 2
for (int i = 0; i <= size-shiftBy; i++)//size = 11
{//EXAMPLE shift by 3 for a c-string of `hello world`
if(i < size-shiftBy){
temp = szString[shiftBy + i];//temp = h
szString[shiftBy + i] = szString[i];//d becomes l
szString[i] = temp;//h becomes l
}
else{//it will run once while i=8
temp = szString[i];//temp = l
szString[i] = szString[i+1];//8th element becomes h
szString[i+1] = szString[size-1];//9th element becomes e
szString[size-1] = temp;//last element becomes l
}
}
}
}
If the only purpose you're trying to accomplish is shifting chars in a terminated string left with rotation (and judging by your sample of "hello world" resulting in "lo worldhel" after a 3-shift, that seems to be the case), you're making this much harder than it needs to be.
The traditional algorithm to do this in O(N) time with no temporary space requirements is to reverse the left-side of the shift, then the entire sequence, then the right side of the shift, all based from the beginning of the sequence. For example, suppose we want to left-shift the following string 3 slots:
1234567890
First, reverse the first shiftBy slots
1234567890
^-^
3214567890
Second, reverse the entire sequence
3214567890
^--------^
0987654123
Finally, reverse the first (length-shiftBy) slots:
0987654123
^-----^
4567890123
Using the standard library would make this trivial, but apparently your prof considers that... cheating. Even without any library APIs, the above algorithm isn't very hard:
#include <cstddef>
#include <iostream>
void shiftLeft(char sz[], std::size_t shiftBy)
{
const char *p = sz;
while (*p) ++p;
std::size_t len = p - sz;
if (len > 1 && (shiftBy %= len))
{
char *ends[] = { sz+shiftBy, sz+len, sz+(len - shiftBy) };
for (std::size_t i=0; i<3; ++i)
{
char *start = sz, *end = ends[i];
while (start < --end)
{
char ch = *start;
*start++ = *end;
*end = ch;
}
}
}
}
int main()
{
char sz[] = "1234567890";
std::cout << sz << '\n';
shiftLeft(sz, 11);
std::cout << sz << '\n';
shiftLeft(sz, 4);
std::cout << sz << '\n';
shiftLeft(sz, 1);
std::cout << sz << '\n';
shiftLeft(sz, 20);
std::cout << sz << '\n';
}
Output
1234567890
2345678901
6789012345
7890123456
7890123456
If you're really set on doing this in temp space, so be it, but I cannot possibly fathom why you would do so.
Best of luck.
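For comparison only, and assuming the standard library were allowed, the three reversals described above map directly onto std::reverse, and std::rotate does the whole job in one call. This is a sketch on std::string rather than a raw char array; both function names are made up for the illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Left-rotate s by shiftBy positions using the three reversals described above.
void rotate_left_by_reversal(std::string& s, std::size_t shiftBy) {
    if (s.size() < 2) return;
    shiftBy %= s.size();
    if (shiftBy == 0) return;
    std::reverse(s.begin(), s.begin() + shiftBy);  // first shiftBy slots
    std::reverse(s.begin(), s.end());              // the entire sequence
    std::reverse(s.begin(), s.end() - shiftBy);    // first (length - shiftBy) slots
}

// The same rotation in a single standard-library call.
void rotate_left_std(std::string& s, std::size_t shiftBy) {
    if (s.empty()) return;
    std::rotate(s.begin(), s.begin() + shiftBy % s.size(), s.end());
}
```

Both turn "1234567890" into "4567890123" for a shift of 3, matching the walk-through above.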
From azillionmonkeys.com/qed/case8.html
void shiftLeft(char szString[], int size, int shiftBy) {
int c, v;
if (size <= 0) return;
if (shiftBy < 0 || shiftBy >= size) {
shiftBy %= size;
if (shiftBy < 0) shiftBy += size;
}
if (shiftBy == 0) return;
c = 0;
for (v = 0; c < size; v++) {
int t = v, tp = v + shiftBy;
char tmp = szString[v];
c++;
while (tp != v) {
szString[t] = szString[tp];
t = tp;
tp += shiftBy;
if (tp >= size) tp -= size;
c++;
}
szString[t] = tmp;
}
}

Finding common characters in two strings

I am working on a problem in which we have to count the number of common characters in two strings. The main part of the count goes like this:
for(i=0; i < strlen(s1); i++) {
for(j = 0; j < strlen(s2); j++) {
if(s1[i] == s2[j]) {
count++;
s2[j] = '*';
break;
}
}
}
This is O(n^2) logic. However, I could not think of a better solution than this. Can anyone help me code an O(n) solution?
This is very simple. Take two int arrays freq1 and freq2. Initialize all their elements to 0. Then read your strings and store the frequencies of the characters in these arrays. After that, compare the arrays freq1 and freq2 to find the common characters.
It can be done in O(n) time with constant space.
The pseudo code goes like this (assuming lowercase letters only):
int map1[26] = {0}, map2[26] = {0};
int common_chars = 0;
for c1 in string1:
map1[c1 - 'a']++;
for c2 in string2:
map2[c2 - 'a']++;
for i in 0 to 25:
common_chars += min(map1[i], map2[i]);
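The pseudo code above can be sketched in C++ roughly as follows. The function name count_common_lowercase is hypothetical, and the choice to ignore anything outside 'a'..'z' is an assumption made here to keep the array indices in range.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Count characters common to both strings (size of the multiset intersection).
// Assumption: only lowercase 'a'..'z' are counted; other characters are ignored.
int count_common_lowercase(const std::string& s1, const std::string& s2) {
    std::array<int, 26> freq1{};  // value-initialized to all zeros
    std::array<int, 26> freq2{};
    for (char c : s1) if (c >= 'a' && c <= 'z') ++freq1[c - 'a'];
    for (char c : s2) if (c >= 'a' && c <= 'z') ++freq2[c - 'a'];
    int common = 0;
    for (int i = 0; i < 26; ++i) common += std::min(freq1[i], freq2[i]);
    return common;
}
```

Each string is read once and the final loop is a fixed 26 steps, so the running time is O(n + m).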
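Your current code can be O(n^3), because strlen is itself O(n) and is re-evaluated in both loop conditions, and it overwrites s2 while counting. The s2[j] = '*' marking and the break are also essential: without them the loops return 4 on "aa", "aa" instead of 2.
This code counts letters in common (each occurrence being counted at most once) in O(n).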
int common(const char *a, const char *b) {
int table[256] = {0};
int result = 0;
for (; *a; a++)table[(unsigned char)*a]++;
for (; *b; b++)result += (table[(unsigned char)*b]-- > 0);
return result;
}
Depending on how you define "letters in common", you may have different logic. Here's some testcases for the definition I'm using (which is size of the multiset intersection).
int main(int argc, char *argv[]) {
struct { const char *a, *b; int want; } cases[] = {
{"a", "a", 1},
{"a", "b", 0},
{"a", "aa", 1},
{"aa", "a", 1},
{"ccc", "cccc", 3},
{"aaa", "aaa", 3},
{"abc", "cba", 3},
{"aasa", "asad", 3},
};
int fail = 0;
for (int i = 0; i < sizeof(cases) / sizeof(*cases); i++) {
int got = common(cases[i].a, cases[i].b);
if (got != cases[i].want) {
fail = 1;
printf("common(%s, %s) = %d, want %d\n",
cases[i].a, cases[i].b, got, cases[i].want);
}
}
return fail;
}
You can do it with 2n:
int i,j, len1 = strlen(s1), len2 = strlen(s2);
unsigned char allChars[256] = { 0 };
int count = 0;
for( i=0; i<len1; i++ )
{
allChars[ (unsigned char) s1[i] ] = 1;
}
for( i=0; i<len2; i++ )
{
if( allChars[ (unsigned char) s2[i] ] == 1 )
{
allChars[ (unsigned char) s2[i] ] = 2;
}
}
for( i=0; i<256; i++ )
{
if( allChars[i] == 2 )
{
cout << (char) i << endl;
count++;
}
}
The following code traverses each string only once, so the complexity is O(n). It assumes that upper and lower case are considered the same.
#include<stdio.h>
#include<ctype.h>
int main() {
char a[] = "Hello world";
char b[] = "woowrd";
int x[26] = {0};
int i;
int index;
for (i = 0; a[i] != '\0'; i++) {
// skip spaces, digits and punctuation
if (!isalpha((unsigned char)a[i])) continue;
// fold capital letters onto lower case
index = tolower((unsigned char)a[i]) - 'a';
x[index]++;
}
for (i = 0; b[i] != '\0'; i++) {
if (!isalpha((unsigned char)b[i])) continue;
index = tolower((unsigned char)b[i]) - 'a';
if (x[index] > 0)
x[index] = -1;
}
printf("Common characters in '%s' and '%s' are ", a, b);
for (i = 0; i < 26; i++) {
if (x[i] < 0)
printf("%c", 'a'+i);
}
printf("\n");
}
int count(string a, string b)
{
int i,c[26]={0},c1[26]={0};
for(i=0;i<a.length();i++)
{
if('a'<=a[i]&&a[i]<='z') // count lowercase letters only
c[a[i]-'a']++;
}
for(i=0;i<b.length();i++)
{
if('a'<=b[i]&&b[i]<='z')
c1[b[i]-'a']++;
}
int s=0;
for(i=0;i<26;i++)
{
// (x + y - |x - y|) / 2 is min(x, y)
s=s+(c[i]+c1[i]-abs(c[i]-c1[i]))/2;
}
return (s);
}
This is a simpler solution, though std::find inside the loop makes it O(n*m) rather than O(n):
for (std::vector<char>::iterator i = s1.begin(); i != s1.end(); ++i)
{
if (std::find(s2.begin(), s2.end(), *i) != s2.end())
{
dest.push_back(*i);
}
}
taken from here
C implementation that runs in O(n) time and constant space (it assumes both strings contain only the lowercase letters 'a' to 'z').
#define ALPHABETS_COUNT 26
int commonChars(char *s1, char *s2)
{
int c_count = 0, i;
int arr1[ALPHABETS_COUNT] = {0}, arr2[ALPHABETS_COUNT] = {0};
/* Compute the number of occurrences of each character */
while (*s1) arr1[*s1++-'a'] += 1;
while (*s2) arr2[*s2++-'a'] += 1;
/* Increment count based on match found */
for(i=0; i<ALPHABETS_COUNT; i++) {
if(arr1[i] == arr2[i]) c_count += arr1[i];
else if(arr1[i]>arr2[i] && arr2[i] != 0) c_count += arr2[i];
else if(arr2[i]>arr1[i] && arr1[i] != 0) c_count += arr1[i];
}
return c_count;
}
First, your code does not run in O(n^2), it runs in O(nm), where n and m are the length of each string.
You can do it in O(n+m), but not better, since you have to go through each string, at least once, to see if a character is in both.
An example in C++, assuming:
ASCII characters
All characters included (letters, numbers, special, spaces, etc...)
Case sensitive
std::vector<char> strIntersect(std::string const&s1, std::string const&s2){
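std::vector<bool> presents(256, false); //Assuming ASCII; indexed by unsigned char value
std::vector<char> intersection;
for (auto c : s1) {
presents[static_cast<unsigned char>(c)] = true;
}
for (auto c : s2) {
if (presents[static_cast<unsigned char>(c)]){
intersection.push_back(c);
presents[static_cast<unsigned char>(c)] = false;
}
}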
return intersection;
}
int main() {
std::vector<char> result;
std::string s1 = "El perro de San Roque no tiene rabo, porque Ramon Rodriguez se lo ha cortado";
std::string s2 = "Saint Roque's dog has no tail, because Ramon Rodriguez chopped it off";
//Expected: "S a i n t R o q u e s d g h l , b c m r z p"
result = strIntersect(s1, s2);
for (auto c : result) {
std::cout << c << " ";
}
std::cout << std::endl;
return 0;
}
There is a better version in C++:
C++ bitset and its application
A bitset is an array of bool, but the Boolean values are not stored separately; bitset packs them so that each bool takes only 1 bit, so a bitset<N> bs takes less space than bool bs[N] (vector<bool> bs(N) is usually packed in a similar way). However, a limitation of bitset is that N must be known at compile time, i.e. it must be a constant (this limitation does not apply to vector or a dynamic array).
Because bitset stores the same information in a compressed manner, whole-set operations such as & and count() are typically faster than the equivalent loops over an array or vector. We can access each bit of a bitset individually with the array indexing operator []; bs[3] is the bit at index 3 of bitset bs, just like a simple array. Remember that bitset indexes from the right: for 10110, the 0s are at the 0th and 3rd indices, whereas the 1s are at the 1st, 2nd and 4th indices.
We can construct a bitset from an integer as well as from a binary string via its constructors. The size of a bitset is fixed at compile time; it can't be changed at runtime.
For more information about bitset visit the site : https://www.geeksforgeeks.org/c-bitset-and-its-application
The code is as follows :
// considering the strings to be of lower case.
int main()
{
string s1,s2;
cin>>s1>>s2;
//Declaration for bitset type variables
bitset<26> b_s1,b_s2;
// setting the bits in b_s1 for the encountered characters of string s1
for(auto& i : s1)
{
if(!b_s1[i-'a'])
b_s1[i-'a'] = 1;
}
// setting the bits in b_s2 for the encountered characters of string s2
for(auto& i : s2)
{
if(!b_s2[i-'a'])
b_s2[i-'a'] = 1;
}
// counting the number of set bits by the "Logical AND" operation
// between b_s1 and b_s2
cout<<(b_s1&b_s2).count();
}
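The bitset behaviour described above can be checked with a short sketch. The names distinct_common_letters and bit_of_10110 are hypothetical, and skipping characters outside 'a'..'z' is an assumption added here so that the index never leaves the 26-bit range.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Number of distinct lowercase letters that occur in both strings.
// Assumption: characters outside 'a'..'z' are skipped rather than indexed.
std::size_t distinct_common_letters(const std::string& s1, const std::string& s2) {
    std::bitset<26> seen1, seen2;
    for (char c : s1) if (c >= 'a' && c <= 'z') seen1.set(c - 'a');
    for (char c : s2) if (c >= 'a' && c <= 'z') seen2.set(c - 'a');
    return (seen1 & seen2).count();
}

// Bit i of a bitset built from a binary string is counted from the right.
bool bit_of_10110(std::size_t i) {
    return std::bitset<5>(std::string("10110"))[i];
}
```

Note that this counts each common letter once, however often it appears, so it answers a slightly different question than the multiset count in some of the other answers.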
There is no need to initialize and keep an array of 26 elements (one number per letter of the alphabet). Just do the following:
Use a HashMap that stores a letter as the key and an integer count as the value.
Create a Set of characters.
Iterate through the characters of each string, adding each one to the Set from step 2. If the add() method returns false (meaning the same character already exists in the Set), add the character to the map and increment its value.
These steps are written with the Java programming language in mind.
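For readers following along in C++, the hash-based idea might be sketched as below. This is not a line-by-line translation of the Set-plus-HashMap steps; it is a simplified variant that keeps one std::unordered_map of remaining counts, and the name count_common_hashed is made up.

```cpp
#include <string>
#include <unordered_map>

// Count common characters (multiset intersection) using a hash map
// instead of a fixed 26-element array, so any char value is accepted.
int count_common_hashed(const std::string& s1, const std::string& s2) {
    std::unordered_map<char, int> remaining;
    for (char c : s1) ++remaining[c];  // occurrences available in s1
    int common = 0;
    for (char c : s2) {
        auto it = remaining.find(c);
        if (it != remaining.end() && it->second > 0) {
            --it->second;  // consume one occurrence from s1
            ++common;
        }
    }
    return common;
}
```

The trade-off against the array versions is a little hashing overhead in exchange for not having to restrict or normalize the input alphabet.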
Python Code:
>>> s1 = 'abbc'
>>> s2 = 'abde'
>>> p = list(set(s1).intersection(set(s2)))
>>> print(p)
['a', 'b']
Hope this helps you, Happy Coding!
can be easily done using the concept of "catching" which is a sub-algorithm of hashing.

Longest common substring from more than two strings - C++

I need to compute the longest common substrings from a set of filenames in C++.
Precisely, I have an std::list of std::strings (or the QT equivalent, also fine)
char const *x[] = {"FirstFileWord.xls", "SecondFileBlue.xls", "ThirdFileWhite.xls", "ForthFileGreen.xls"};
std::list<std::string> files(x, x + sizeof(x) / sizeof(*x));
I need to compute the n distinct longest common substrings of all strings, in this case e.g. for n=2
"File" and ".xls"
If I could compute the longest common substring, I could cut it out and run the algorithm again to get the second longest, so essentially this boils down to:
Is there a (reference?) implementation for computing the LCS of a std::list of std::strings?
This is not a good answer but a dirty solution that I have - brute force on a QList of QUrls from which only the part after the last "/" is taken. I'd love to replace this with "proper" code.
(I have discovered http://www.icir.org/christian/libstree/ - which would help greatly, but I can't get it to compile on my machine. Someone used this maybe?)
QString SubstringMatching::getMatchPattern(QList<QUrl> urls)
{
QString a;
int foundPosition = -1;
int foundLength = -1;
for (int i=urls.first().toString().lastIndexOf("/")+1; i<urls.first().toString().length(); i++)
{
bool hit=true;
int xj;
for (int j=0; j<urls.first().toString().length()-i+1; j++ ) // try to match from position i up to the end of the string :: test character at pos. (i+j)
{
if (!hit) break;
QString firstString = urls.first().toString().right( urls.first().toString().length()-i ).left( j ); // this needs to match all k strings
//qDebug() << "SEARCH " << firstString;
for (int k=1; k<urls.length(); k++) // test all other strings, k = test string number
{
if (!hit) break;
//qDebug() << " IN " << urls.at(k).toString().right(urls.at(k).toString().length() - urls.at(k).toString().lastIndexOf("/")+1);
//qDebug() << " RES " << urls.at(k).toString().indexOf(firstString, urls.at(k).toString().lastIndexOf("/")+1);
if (urls.at(k).toString().indexOf(firstString, urls.at(k).toString().lastIndexOf("/")+1)<0) {
xj = j;
//qDebug() << "HIT LENGTH " << xj-1 << " : " << firstString;
hit = false;
}
}
}
if (hit) xj = urls.first().toString().length()-i+1; // hit up to the end of the string
if ((xj-2)>foundLength) // have longer match than existing, j=1 is match length
{
foundPosition = i; // at the current position
foundLength = xj-1;
//qDebug() << "Found at " << i << " length " << foundLength;
}
}
a = urls.first().toString().right( urls.first().toString().length()-foundPosition ).left( foundLength );
//qDebug() << a;
return a;
}
If as you say suffix trees are too heavyweight or otherwise impractical, the following
fairly simple brute-force approach may be adequate for your application.
I assume distinct substrings shall be non-overlapping and are picked from
left to right.
Even with these assumptions, there need not be a unique set that comprises
"the N distinct longest common substrings" of a set of strings. Whatever N is,
there might be more than N distinct common substrings all of the same maximal
length and any choice of N from among them would be arbitrary. Accordingly
the solution finds the at-most N *sets* of the longest distinct common
substrings in which all those of the same length are one set.
The algorithm is as follows:
Q is the target quota of lengths.
Strings is the problem set of strings.
Results is an initially empty multimap that maps a length to a set of strings,
Results[l] being the set with length l
N, initially 0, is the number of distinct lengths represented in Results
If Q is 0 or Strings is empty return Results
Find any shortest member of Strings; keep a copy of it S and remove it
from Strings. We proceed by comparing the substrings of S with those
of Strings because all the common substrings of {Strings, S} must be
substrings of S.
Iteratively generate all the substrings of S, longest first, using the
obvious nested loop controlled by offset and length. For each substring ss of
S:
If ss is not a common substring of Strings, next.
Iterate over Results[l] for l >= the length of ss until end of
Results or until ss is found to be a substring of the examined
result. In the latter case, ss is not distinct from a result already
in hand, so next.
ss is common substring distinct from any already in hand. Iterate over
Results[l] for l < the length of ss, deleting each result that is a
substring of ss, because all those are shorter than ss and not distinct
from it. ss is now a common substring distinct from any already in hand and
all others that remain in hand are distinct from ss.
For l = the length of ss, check whether Results[l] exists, i.e. if
there are any results in hand the same length as ss. If not, call that
a NewLength condition.
Check also if N == Q, i.e. we have already reached the target quota of distinct
lengths. If NewLength obtains and also N == Q, call that a StickOrRaise condition.
If StickOrRaise obtains then compare the length of ss with l = the
length of the shortest results in hand. If ss is shorter than l
then it is too short for our quota, so next. If ss is longer than l
then all the shortest results in hand are to be ousted in favour of ss, so delete
Results[l] and decrement N.
Insert ss into Results keyed by its length.
If NewLength obtains, increment N.
Abandon the inner iteration over substrings of S that have the
same offset of ss but are shorter, because none of them are distinct
from ss.
Advance the offset in S for the outer iteration by the length of ss,
to the start of the next non-overlapping substring.
Return Results.
Here is a program that implements the solution and demonstrates it with
a list of strings:
#include <list>
#include <map>
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;
// Get a non-const iterator to the shortest string in a list
list<string>::iterator shortest_of(list<string> & strings)
{
auto where = strings.end();
size_t min_len = size_t(-1);
for (auto i = strings.begin(); i != strings.end(); ++i) {
if (i->size() < min_len) {
where = i;
min_len = i->size();
}
}
return where;
}
// Say whether a string is a common substring of a list of strings
bool
is_common_substring_of(
string const & candidate, list<string> const & strings)
{
for (string const & s : strings) {
if (s.find(candidate) == string::npos) {
return false;
}
}
return true;
}
/* Get a multimap whose keys are the at-most `quota` greatest
lengths of common substrings of the list of strings `strings`, each key
multi-mapped to the set of common substrings of that length.
*/
multimap<size_t,string>
n_longest_common_substring_sets(list<string> & strings, unsigned quota)
{
size_t nlengths = 0;
multimap<size_t,string> results;
if (quota == 0) {
return results;
}
auto shortest_i = shortest_of(strings);
if (shortest_i == strings.end()) {
return results;
}
string shortest = *shortest_i;
strings.erase(shortest_i);
for ( size_t start = 0; start < shortest.size();) {
size_t skip = 1;
for (size_t len = shortest.size(); len > 0; --len) {
string subs = shortest.substr(start,len);
if (!is_common_substring_of(subs,strings)) {
continue;
}
auto i = results.lower_bound(subs.size());
for ( ;i != results.end() &&
i->second.find(subs) == string::npos; ++i) {}
if (i != results.end()) {
continue;
}
for (i = results.begin();
i != results.end() && i->first < subs.size(); ) {
if (subs.find(i->second) != string::npos) {
i = results.erase(i);
} else {
++i;
}
}
auto hint = results.lower_bound(subs.size());
bool new_len = hint == results.end() || hint->first != subs.size();
if (new_len && nlengths == quota) {
size_t min_len = results.begin()->first;
if (min_len > subs.size()) {
continue;
}
results.erase(min_len);
--nlengths;
}
nlengths += new_len;
results.emplace_hint(hint,subs.size(),subs);
len = 1;
skip = subs.size();
}
start += skip;
}
return results;
}
// Testing ...
int main()
{
list<string> strings{
"OfBitWordFirstFileWordZ.xls",
"SecondZWordBitWordOfFileBlue.xls",
"ThirdFileZBitWordWhiteOfWord.xls",
"WordFourthWordFileBitGreenZOf.xls"};
auto results = n_longest_common_substring_sets(strings,4);
for (auto const & val : results) {
cout << "length: " << val.first
<< ", substring: " << val.second << endl;
}
return 0;
}
Output:
length: 1, substring: Z
length: 2, substring: Of
length: 3, substring: Bit
length: 4, substring: .xls
length: 4, substring: File
length: 4, substring: Word
(Built with gcc 4.8.1)
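If only the single longest common substring is needed, a much smaller brute-force sketch may be enough. The function below is an illustration, not a reference implementation: its name is made up, it takes candidates from the first string in the list, and it breaks ties in favour of the leftmost candidate.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Brute-force sketch: longest substring of the first string that occurs in
// every string of the list. Ties are resolved in favour of the leftmost match.
std::string longest_common_substring(const std::list<std::string>& strings) {
    if (strings.empty()) return {};
    const std::string& base = strings.front();
    for (std::size_t len = base.size(); len > 0; --len) {          // longest first
        for (std::size_t start = 0; start + len <= base.size(); ++start) {
            const std::string candidate = base.substr(start, len);
            bool in_all = true;
            for (const std::string& s : strings) {
                if (s.find(candidate) == std::string::npos) { in_all = false; break; }
            }
            if (in_all) return candidate;
        }
    }
    return {};
}
```

For the four file names in the question this returns "File"; ".xls" has the same length but starts further right in the first name.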

How to find string in a string

I need to find the longest run of one string inside another string: if string1 is "Alibaba" and string2 is "ba", the longest string will be "baba". I have the lengths of the strings, but what next?
char* fun(char* a, char& b)
{
int length1=0;
int length2=0;
int longer;
int shorter;
char end='\0';
int i=0;
while(a[i] != end)
{
i++;
length1++;
}
i=0;
while(b[i] != end)
{
i++;
length2++;
}
if(length1 > length2){
longer = length1;
shorter = length2;
}
else{
longer = length2;
shorter = length1;
}
//logics here
}
int main()
{
char name1[] = "Alibaba";
char name2[] = "ba";
char &oname = *name2;
cout << fun(name1, oname) << endl;
system("PAUSE");
return 0;
}
Wow lots of bad answers to this question. Here's what your code should do:
Find the first instance of "ba" using the standard string searching functions.
In a loop look past this "ba" to see how many of the next N characters are also "ba".
If this sequence is longer than the previously recorded longest sequence, save its length and position.
Find the next instance of "ba" after the last one.
Here's the code (not tested):
string FindLongestRepeatedSubstring(string longString, string shortString)
{
// The number of repetitions in our longest string.
int maxRepetitions = 0;
string::size_type n = shortString.length(); // For brevity.
if (n == 0)
return string(); // An empty pattern would never advance the search.
// Where we are currently looking.
string::size_type pos = 0;
while ((pos = longString.find(shortString, pos)) != string::npos)
{
// Ok we found the start of a repeated substring. See how many repetitions there are.
int repetitions = 1;
// This is a little bit complicated.
// First go past the "ba" we have already found (pos += n)
// Then see if there is still enough space in the string for there to be another "ba"
// (<=, so that a repetition ending exactly at the end of the string is counted)
// Finally see if it *is* "ba"
for (pos += n; pos+n <= longString.length() && longString.substr(pos, n) == shortString; pos += n)
++repetitions;
// See if this sequence is longer than our previous best.
if (repetitions > maxRepetitions)
maxRepetitions = repetitions;
}
// Construct the string to return. You really probably want to return its position, or maybe
// just maxRepetitions.
string ret;
while (maxRepetitions--)
ret += shortString;
return ret;
}
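The closing comment suggests returning the position or the repetition count instead of a rebuilt string. A sketch of that variant is below; longest_run is a hypothetical name, and unlike the loop above it restarts the search one character after each run's start, so a longer run that overlaps an earlier match is not skipped.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Returns {position, repetitions} of the longest run of back-to-back copies
// of `unit` inside `text`; {std::string::npos, 0} if `unit` never occurs.
std::pair<std::size_t, std::size_t> longest_run(const std::string& text,
                                                const std::string& unit) {
    std::pair<std::size_t, std::size_t> best{std::string::npos, 0};
    if (unit.empty()) return best;  // an empty unit would match everywhere
    const std::size_t n = unit.size();
    std::size_t pos = 0;
    while ((pos = text.find(unit, pos)) != std::string::npos) {
        const std::size_t start = pos;
        std::size_t reps = 0;
        // compare() clamps at the end of text, so a short tail simply fails to match
        while (pos < text.size() && text.compare(pos, n, unit) == 0) {
            ++reps;
            pos += n;
        }
        if (reps > best.second) best = {start, reps};
        pos = start + 1;  // resume just after this run's start
    }
    return best;
}
```

For "Alibaba" and "ba" this yields position 3 and 2 repetitions, i.e. "baba".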
What you want should look like this pseudo-code:
i = j = count = max = 0
while (i < length1 && c = name1[i++]) do
if (j < length2 && name2[j] == c) then
j++
else
max = (count > max) ? count : max
count = 0
j = 0
end
if (j == length2) then
count++
j = 0
end
done
max = (count > max) ? count : max
for i = 0 to max-1 do
print name2
done
The idea is here but I feel that there could be some cases in which this algorithm won't work (cases with complicated overlap that would require going back in name1). You may want to have a look at the Boyer-Moore algorithm and mix the two to have what you want.
The Algorithms Implementation Wikibook has an implementation of what you want in C++.
http://www.cplusplus.com/reference/string/string/find/
Maybe you did it on purpose, but you should use the std::string class and forget archaic things like the char* string representation.
It lets you use lots of optimized methods, such as string searching.
Why don't you use the strstr function provided by C?
const char * strstr ( const char * str1, const char * str2 );
char * strstr ( char * str1, const char * str2 );
Locate substring
Returns a pointer to the first occurrence of str2 in str1,
or a null pointer if str2 is not part of str1.
The matching process does not include the terminating null-characters.
Now use the lengths, write a loop over the original string, and find the longest run inside it.
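One way that loop could look, as a sketch that uses only strstr, strlen and strncmp (the function name longest_run_strstr is made up):

```cpp
#include <cstddef>
#include <cstring>

// Count how many times `needle` repeats back to back in the longest run
// found in `haystack`, using only strstr and strncmp.
std::size_t longest_run_strstr(const char* haystack, const char* needle) {
    const std::size_t n = std::strlen(needle);
    if (n == 0) return 0;
    std::size_t best = 0;
    for (const char* p = std::strstr(haystack, needle); p != nullptr;
         p = std::strstr(p + 1, needle)) {
        std::size_t reps = 0;
        // strncmp stops at the terminating null, so q never reads past the end
        for (const char* q = p; std::strncmp(q, needle, n) == 0; q += n) ++reps;
        if (reps > best) best = reps;
    }
    return best;
}
```

The result is the number of repetitions; printing needle that many times gives "baba" for "Alibaba" and "ba".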