comparing char one by one like function strcmp

I am trying to compare chars one by one. I am emulating the strcmp function for a class assignment. Here is what I came up with. Unfortunately I get 0 all the time, even though all the chars match until the last one. I assume it's only checking the first char and then stops. I added i++ to move to the next char, but I don't think it's working.
strComp("abc", "abcd");
int strComp(char a[], char b[]) {
int i = 0;
if (strLen(a) == strLen(b)) {
while (a[i] != NULL && b[i] != NULL) {
if (a[i] == b[i]) {
return 0;
} else if(a[i] > b[i]) {
return 1;
} else {
return -1;
}
}
i++;
} else if (strLen(a) > strLen(b)) {
return 1;
} else {
return -1;
}
}

Note that
NULL is different from '\0'
char[] really decays into a char*
in C/C++ anything that can be const shall be declared const
the use of strlen in a basic function like this is inefficient
Here is a very fast solution:
inline int compare(char const* const a, char const* const b)
{
/* Return -1 less than, 0 equal, 1 greater than */
if (!a[0] && b[0])
return -1;
else if (a[0] && !b[0])
return 1;
register int i = 0;
for (; a[i] && b[i]; i++) {
if (a[i] < b[i])
return -1;
if (a[i] > b[i])
return 1;
}
#if 1 /* this addition makes this code work like std::strcmp */
if (!a[i] && b[i])
return -1;
else if (a[i] && !b[i])
return 1;
#endif
return 0;
}
I coded this one more than 20 years ago as a prototype for a 386 assembler routine. For a case-insensitive string compare, #include <cctype> (which declares the single-argument std::toupper used here) and modify the for-loop:
.
.
for (; a[i] && b[i]; i++) {
if (std::toupper(a[i]) < std::toupper(b[i]))
return -1;
if (std::toupper(a[i]) > std::toupper(b[i]))
return 1;
}
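As a hedged sketch of the case-insensitive variant as a complete function: the name compare_nocase is made up for this example, and it assumes plain single-byte characters in the current C locale. The characters are converted to unsigned char before calling std::toupper, because passing a negative char value to it is undefined behaviour.

```cpp
#include <cctype>

// Hypothetical name; returns -1, 0 or 1 like compare() above,
// but ignores case according to the current C locale.
int compare_nocase(char const* a, char const* b)
{
    for (;; ++a, ++b) {
        // Go through unsigned char so negative char values are safe.
        int const ca = std::toupper(static_cast<unsigned char>(*a));
        int const cb = std::toupper(static_cast<unsigned char>(*b));
        if (ca < cb) return -1;   // also covers "a ended first"
        if (ca > cb) return 1;    // also covers "b ended first"
        if (ca == 0) return 0;    // both strings ended at the same position
    }
}
```

Because the terminator compares lower than any other character, the end-of-string cases need no separate checks after the loop.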

Put
++i;
inside the while loop
just 2 lines above...

You can detect a mismatch as soon as you see two characters that are different -- but you can't detect a match until you've reached the end of the string, and the characters are still identical.
At least in my opinion, most attempts at this get the basic idea sort of backwards, trying to compare characters immediately (with some special-casing for one or both strings being empty). Instead, it's usually best to start by just skipping non-zero bytes that are equal. Then, you're either at the end of (at least one) string, or else you've found a mismatch between bytes in the two strings. Either way, at that point you can sort out what's going on, and return the correct value.
int cmp_str(char const *a, char const *b) {
while (*a && *a == *b) {
++a;
++b;
}
if (*b < *a)
return 1;
if (*b > *a)
return -1;
return 0;
}
This keeps the loop very simple, with very few conditions, so it can execute quickly. All the more complex comparisons to figure out the actual ordering happen outside the loop where they happen only once, and have almost no effect on speed.
I should probably add one warning: this does not make any attempt at dealing with international characters correctly. To do that, you just about need to add collating tables that define the relative order of characters because (at least in many code pages) the values of characters don't correspond to the order in which the characters should be sorted.
For what it's worth, here's a quick test comparing the results and speed from this to Andreas's compare and the strcmp in the standard library:
int cmp_str(char const *a, char const *b) {
while (*a && *a == *b) {
++a;
++b;
}
if (*b < *a)
return 1;
if (*b > *a)
return -1;
return 0;
}
inline int compare(char const* const a, char const* const b)
{
/* Return -1 less than, 0 equal, 1 greater than */
if (!a[0] && b[0])
return -1;
else if (a[0] && !b[0])
return 1;
register int i = 0;
for (; a[i] && b[i]; i++) {
if (a[i] < b[i])
return -1;
if (a[i] > b[i])
return 1;
}
#if 1 /* this addition makes this code work like std::strcmp */
if (!a[i] && b[i])
return -1;
else if (a[i] && !b[i])
return 1;
#endif
return 0;
}
#ifdef TEST
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <stdlib.h>
int main(){
char const *s1 [] = { "", "a", "one", "two", "three", "one", "final" };
char const *s2 [] = { "x", "b", "uno", "deux", "three", "oneone", "" };
for (int i = 0; i < 7; i++) {
printf("%d\t", cmp_str(s1[i], s2[i]));
printf("%d\t", compare(s1[i], s2[i]));
printf("%d\n", strcmp(s1[i], s2[i]));
}
// Test a long string:
static const int size = 5 * 1024 * 1024;
static char s3[size];
for (int i = 0; i < size - 1; i++)
s3[i] = (rand() % 254) + 1;
s3[size - 1] = '\0';
static char s4[size];
strcpy(s4, s3);
s3[size - 5] = (s3[size - 5] + 4) % 255;
clock_t start = clock();
int val1 = cmp_str(s3, s4);
clock_t t1 = clock() - start;
start = clock();
int val2 = compare(s3, s4);
clock_t t2 = clock() - start;
start = clock();
int val3 = strcmp(s3, s4);
clock_t t3 = clock() - start;
double v1 = (double) t1 / CLOCKS_PER_SEC;
double v2 = (double) t2 / CLOCKS_PER_SEC;
double v3 = (double) t3 / CLOCKS_PER_SEC;
printf("Jerry: %d, %f\nAndreas: %d, %f\nstdlib: %d, %f\n", val1, v1, val2, v2, val3, v3);
}
#endif
Results:
-1 -1 -1
-1 -1 -1
-1 -1 -1
1 1 1
0 0 0
-1 -1 -1
1 1 1
Jerry: 1, 0.007000
Andreas: 1, 0.010000
stdlib: 1, 0.007000
Since Andreas has corrected his code, all three produce identical results for all the tests, but this version and the standard library do so about 30% faster than Andreas's version. That does vary somewhat with the compiler, though. With VC++, my code almost matches the code in the standard library (but if I use a huge string, like 200 megabytes, the version in the standard library is measurably better). With g++, the code in the standard library seems to be a little slower than the code in the VC++ standard library, but the code it generates for either Andreas's code or my code is quite a bit worse than what VC++ produces for them. On a 200 megabyte string, I get these results with VC++:
Jerry: 1, 0.288000
Andreas: 1, 0.463000
stdlib: 1, 0.256000
...but with g++ (MinGW), I get results like this:
Jerry: 1, 0.419000
Andreas: 1, 0.523000
stdlib: 1, 0.268000
Although the ranking remains the same either way, the difference in speed between the standard library and my code is much larger with g++ than with VC++.

I have to say two things about your code:
i++ needs to go inside the loop. You could also write ++i; depending on the compiler, i++ may be slower than ++i.
It is enough to write while (a[i]) instead of while (a[i] != NULL && b[i] != NULL), because the lengths of a and b are equal at that point.

It's just because you are returning too early. Once a return executes in a function, control goes back to where the function was invoked.
In this case you return inside the while loop on the very first iteration, which is a logic error. Take the case here: it first compares a[0] and b[0], and it returns in all three cases according to your code, i.e. a[0] == b[0] returns 0, a[0] > b[0] returns 1, and otherwise it returns -1.
You have to change the whole function. Here is your original function, followed by an edited version:
int strComp(char a[], char b[]) {
int i = 0;
if (strLen(a) == strLen(b)) {
while (a[i] != NULL && b[i] != NULL) {
if (a[i] == b[i]) {
return 0;
} else if(a[i] > b[i]) {
return 1;
} else {
return -1;
}
}
i++;
} else if (strLen(a) > strLen(b)) {
return 1;
} else {
return -1;
}
}
Edited code (P.S. please check it, I haven't tried it):
int strComp(char a[], char b[])
{
int i = 0;
while (a[i]!='\0'&&b[i]!='\0')
{
if(a[i] > b[i])
{
return 1;
}
else if (a[i] < b[i])
{
return -1;
}
i++; //place i++ here
}
if(a[i]==b[i])
return 0; //if string are equal
if(a[i]=='\0')
return -1;
else
return 1;
}

Related

Alphabetically partitioning indexes of substrings in C++ issues

For a while now I've been trying to get this code to work to partition (as if preparing for quicksort) the indexes of substring suffixes, and while it's close, I'm not getting what I'm expecting. I was wondering if a fresh set of eyes may help.
int partition(const string &S, vector<int> &indices, int low, int high, int pivotIndex)
{
int i = low;
int j = high - 1;
swap(indices[high], indices[pivotIndex]);
while (i <= j)
{
while (i < high && !lessThan(S, S[i], S[indices[high]]))
i++;
while (j >= low && lessThan(S, S[j], S[indices[high]]))
j--;
if (i < j)
{
int temp = indices[i];
indices[i] = indices[j];
indices[j] = temp;
i++;
j--;
}
}
swap(indices[high], indices[i]);
return i;
}
Indices is simply a vector of 0, 1, 2, ..., n of same size as string S.
And here's the program I wrote for lessThan just so you know what I'm working with:
bool lessThan(const string &S, int first, int second)
{
int counter = (int)S.length() - ((first <= second) ? second : first);
for (int i = 0; i <= counter; ++i)
{
if (S[first + i] != S[second + i])
{
if (S[first + i] < S[second + i])
{
return true;
}
else
{
return false;
}
}
}
if (first < second)
{
return false;
}
else
{
return true;
}
}
lessThan seems to work just fine when I test it separately, so I don't think it's the issue, but maybe it is.
Whenever I test, with say the string "abracadabra", and setting the pivotIndex to 4, I expect to get "0 1 8 3 10 5 7 4 2 9 6" as my output but I instead get "0 1 8 3 7 5 4 10 2 9 6". Close, but not quite. Can anyone spot my mistake?
(P.S. I know I could probably use substr() or some other solution to do lessThan easier, but I'm trying to do it without allocating extra memory, my focus is on the partition function)
edit: I figured it out. Complete error on my side. Check below for answer

I was an idiot, input to lessThan was supposed to be given two ints. I instead gave it two chars from S. Also swapped which lessThan call the ! was on. I think I was just up too late programming and blame this all on sleep deprivation.
Fixed Code:
int partition(const string &S, vector<int> &indices, int low, int high, int pivotIndex)
{
int i = low;
int j = high - 1;
swap(indices[high], indices[pivotIndex]);
while (i <= j)
{ //This right here
while (i < high && lessThan(S, indices[i], indices[high]))
i++;
while (j >= low && !lessThan(S, indices[j], indices[high]))
j--;
if (i < j)
{
swap(indices[i], indices[j]);
i++;
j--;
}
}
swap(indices[high], indices[i]);
return i;
}
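Regarding the P.S. in the question about comparing suffixes without allocating extra memory: a minimal sketch, assuming only a strict "less than" is needed, is to use the std::string::compare overload that takes positions in both strings. The name suffixLess is made up here, and unlike the lessThan above it returns false when both indices are equal.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if the suffix of S starting at `first`
// sorts strictly before the suffix starting at `second`.
// Both positions must be <= S.size(), otherwise compare() throws
// std::out_of_range.
bool suffixLess(const std::string& S, std::size_t first, std::size_t second)
{
    // Compares S[first..end) with S[second..end) in place, without copies.
    return S.compare(first, std::string::npos,
                     S, second, std::string::npos) < 0;
}
```

A suffix that is a proper prefix of the other one (such as "a" versus "abracadabra") sorts first, which is the usual convention for suffix ordering.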

Is there a way to ignore elements in an array without changing the array in a recursive function?

I'm trying to write a recursive function that checks if two arrays have the same elements even if they aren't sorted, but I can't change the arrays, I can't copy them or use a third or fourth array, and it has to be recursive. Lastly, I can't change the signature of the function.
So now I have to get rid of overwrite(A2, len, i); because that's destroying A2, but I don't see any way to do it and still have a working function... can I have a hint on how to do it? Maybe there's a way to save the elements of A2 by swapping them and then by the end of the recursion to restore them?
In short the algorithm below does a linear search of the last element of A1 in A2, if it's found, overwrite it and continue, this is done so the algorithm won't pick the same element twice, reaching the stopping condition means all the elements are there thus it will return true, otherwise will return false.
bool foo(int A1[], int A2[], int len){//both arrays are size len
int i;
bool found = false;
if (len == 0)return true;//stopping condition for recursion
else{
for (i = 0; i < len && !found; i++)//linear search
if (A1[len - 1] == A2[i]){
overwrite(A2, len, i);//this function shifts back the whole array
found = true;
}
if (found == false) return false;
else foo(A1, A2, len - 1);
}
}
Sample i/o:
A1: 3 2 1
A2: 1 2 3
True
A1: 3 2 3
A2: 1 2 3
False

A solution could be:
find the maximum value M in A1 and how many times it appears
check if it's the same for A2, including the count
find the maximum value M1 among all values smaller than M and how many times it appears in A1
check if it's the same for A2, including the count
find the maximum value M2 among all values smaller than M1 and how many times it appears in A1
check if it's the same for A2, including the count
repeat this way until the counter for A1 and A2 is zero or is different
in code:
bool checkSame(int *A1, int *A2, int len) {
struct Same {
static bool check(int *A1, int *A2, int len, int limit) {
int index1=-1, count1=0;
for (int i=0; i<len; i++) {
if (A1[i] <= limit) {
if (index1==-1 || A1[i] > A1[index1]) {
index1 = i;
count1 = 1;
} else if (A1[i] == A1[index1]) {
count1++;
}
}
}
int index2=-1, count2=0;
for (int i=0; i<len; i++) {
if (A2[i] <= limit) {
if (index2==-1 || A2[i] > A2[index2]) {
index2 = i;
count2 = 1;
} else if (A2[i] == A2[index2]) {
count2++;
}
}
}
if (index1 == -1 && index2 == -1) return true;
if (count1 != count2 || count1 == 0 ||
A1[index1] != A2[index2]) return false;
return check(A1, A2, len, A1[index1]-1);
}
};
return Same::check(A1, A2, len, INT_MAX);
}
This algorithm is O(n^2) in time (worst case: arrays are identical and all values unique) and requires constant space if the compiler supports tail call optimization.
The following is a chart for the time needed in ms from 0 to 3000 elements on my PC.
Note that however all this is not a decent solution for the problem but just an exercise in futility. A real solution of course would need more context as there are different criteria for optimality, but I'd probably go for a closed hash table... adding elements while processing A1 and removing elements processing A2 (the removal will fail at some point if and only if the arrays are different):
bool checkSame2(int *A1, int *A2, int len) {
std::vector<int> ht(len, -1), next(len, -1);
for (int i=0; i<len; i++) {
int k = (unsigned)A1[i]*69069 % len;
next[i] = ht[k]; ht[k] = i;
}
for (int i=0; i<len; i++) {
int k = (unsigned)A2[i]*69069 % len;
int prev=-1,p=ht[k];
while (p!=-1 && A1[p] != A2[i]) {
prev = p; p = next[p];
}
if (p == -1) return false;
if (prev == -1) ht[k] = next[p]; else next[prev] = next[p];
}
return true;
}
The execution time for this solution is the purple line touching the N axis in the previous chart (hard to tell with this scale but it's linear + noise, as expected).
Just out of curiosity I also tried what would be the solution if "optimal" means just getting something working that is not hideous:
bool checkSame3(int *A1, int *A2, int len) {
std::map<int, int> counts;
for (int i=0; i<len; i++) counts[A1[i]]++;
for (int i=0; i<len; i++) {
if (--counts[A2[i]] < 0) return false;
}
return true;
}
and this is, unsurprisingly, about 30-40 times slower than the hand-coded hash table version on my PC (but of course still much faster than the recursive version).

Here is a solution that works given all your requirements. It rearranges the arrays, and then un-rearranges them. It uses recursion, uses no additional arrays, and does not change the function signature.
bool foo(int A1[], int A2[], int len){
int i;
if (len == 0){
return true;
} else {
for (i = len - 1; i >= 0; i--){
if (A1[len - 1] == A2[i]){
A2[i] = A2[len - 1];
A2[len - 1] = A1[len - 1];
bool result = foo(A1, A2, len - 1);
A2[len - 1] = A2[i];
A2[i] = A1[len - 1];
return result;
}
}
return false;
}
}

If you are allowed to temporarily change the arrays, provided that you restore them before the last recursive call has returned, you can swap the matching element in A2 with the element at index len - 1 before the recursive call, and swap them back afterwards. Since the recursive call will only look at the index range 0 through len - 2, the matching element will not be considered.
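The swap-and-restore idea described above can be sketched as follows. This is only an illustration under the stated assumption that temporary modification of A2 is allowed; the name sameElements is made up, and std::swap comes from <utility>.

```cpp
#include <utility>

// Hypothetical name. Returns true if A1 and A2 (both of size len) hold the
// same elements with the same multiplicities. A2 is modified temporarily,
// but is back in its original order when the function returns.
bool sameElements(int A1[], int A2[], int len)
{
    if (len == 0) return true;                // nothing left to match
    for (int i = 0; i < len; ++i) {
        if (A1[len - 1] == A2[i]) {
            std::swap(A2[i], A2[len - 1]);    // park the match out of range
            bool result = sameElements(A1, A2, len - 1);
            std::swap(A2[i], A2[len - 1]);    // undo the swap
            return result;
        }
    }
    return false;                             // no partner for A1[len - 1]
}
```

Taking the first match is enough: any two candidates in A2 hold the same value, so which one gets parked does not change the outcome.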

Find if we can get palindrome

Given a string S, we need to tell whether we can make it a palindrome by removing exactly one letter from it.
I have an O(N^2) approach that modifies the edit distance method. Is there any better way?
My Approach :
int ModifiedEditDistance(const string& a, const string& b, int k) {
int i, j, n = a.size();
int dp[MAX][MAX];
memset(dp, 0x3f, sizeof dp);
for (i = 0 ; i < n; i++)
dp[i][0] = dp[0][i] = i;
for (i = 1; i <= n; i++) {
int from = max(1, i-k), to = min(i+k, n);
for (j = from; j <= to; j++) {
if (a[i-1] == b[j-1]) // same character
dp[i][j] = dp[i-1][j-1];
// note that we don't allow letter substitutions
dp[i][j] = min(dp[i][j], 1 + dp[i][j-1]); // delete character j
dp[i][j] = min(dp[i][j], 1 + dp[i-1][j]); // insert character i
}
}
return dp[n][n];
}
How can I improve the space complexity, as the maximum size of the string can go up to 10^5?
Please help.
Example: if the string is "abc" then the answer is "NO", and if the string is "abbcbba" then the answer is "YES".

The key observation is that if the first and last characters are the same then you needn't remove either of them; which is to say that xSTRINGx can be turned into a palindrome by removing a single letter if and only if STRING can (as long as STRING is at least one character long).
You want to define a method (excuse the Java syntax--I'm not a C++ coder):
boolean canMakePalindrome(String s, int startIndex, int endIndex, int toRemove);
which determines whether the part of the string from startIndex to endIndex-1 can be made into a palindrome by removing toRemove characters.
When you consider canMakePalindrome(s, i, j, r), then you can define it in terms of smaller problems like this:
If j-i is 1 then return true; if it's 0 then return true if and only if r is 0. The point here is that a 1-character string is a palindrome regardless of whether you remove a character; a 0-length string is a palindrome, but can't be made into one by removing a character (because there aren't any to remove).
If s[i] and s[j-1] are the same, then it's the same answer as canMakePalindrome(s, i+1, j-1, r).
If they're different, then either s[i] or s[j-1] needs removing. If toRemove is zero, then return false, because you haven't got any characters left to remove. If toRemove is 1, then return true if either canMakePalindrome(s, i+1, j, 0) or canMakePalindrome(s, i, j-1, 0). This is because you're now testing whether it's already a palindrome if you remove one of those two characters.
Now this can be coded up pretty easily, I think.
If you wanted to allow for removal of more than one character, you'd use the same idea, but using dynamic programming. With only one character to remove, dynamic programming will reduce the constant factor, but won't reduce the asymptotic time complexity (linear in the length of the string).

Pseudocode (something like this; I haven't tested it at all).
It is based on detecting the conditions under which you CAN remove a character, i.e.
There is exactly 1 wrong character
It is a palindrome (0 mismatches)
O(n) in time, O(1) in space.
bool foo(const std::string& s)
{
int i = 0;
int j = s.size()-1;
int mismatch_count = 0;
while (i < j)
{
if (s[i]==s[j])
{
i++; j--;
}
else
{
mismatch_count++;
if (mismatch_count > 1) break;
//override first preference if cannot find match for next character
if (s[i+1] == s[j] && ((i+2 >= j-1)||s[i+2]==s[j-1]))
{
i++;
}
else if (s[j-1]==s[i])
{
j--;
}
else
{
mismatch_count++; break;
}
}
}
//removing one character can only give a palindrome if there is exactly one mismatch
//or if the string already is a palindrome
return (mismatch_count == 1) || (mismatch_count == 0);
}

Here's a (slightly incomplete) solution which takes O(n) time and O(1) space.
// returns index to remove to make a palindrome; string::npos if not possible
size_t willYouBeMyPal(const string& str)
{
size_t toRemove = string::npos;
size_t len = str.length();
for (size_t c1 = 0, c2 = len - 1; c1 < c2; ++c1, --c2) {
if (str[c1] != str[c2]) {
if (toRemove != string::npos) {
return string::npos;
}
bool canRemove1 = str[c1 + 1] == str[c2];
bool canRemove2 = str[c1] == str[c2 - 1];
if (canRemove1 && canRemove2) {
abort(); // TODO: handle the case where both conditions are true
} else if (canRemove1) {
toRemove = c1++;
} else if (canRemove2) {
toRemove = c2--;
} else {
return string::npos;
}
}
}
// if str is a palindrome already, remove the middle char and it still is
if (toRemove == string::npos) {
toRemove = len / 2;
}
return toRemove;
}
Left as an exercise is what to do if you get this:
abxyxcxyba
The correct solution is:
ab_yxcxyba
But you might be led down a bad path:
abxyxcx_ba
So when you find the "next" character on both sides is a possible solution, you need to evaluate both possibilities.

I wrote a sample with O(n) complexity that works for the tests I threw at it. Not many though :D
The idea behind it is to ignore the first and last letters if they are the same, deleting one of them if they are not, and reasoning about what happens when the string is small enough. The same result could be achieved with a loop instead of the recursion, which would save some space (making it O(1)), but it's harder to understand and more error prone IMO.
bool palindrome_by_1(const string& word, int start, int end, bool removed = false) // Start includes, end excludes
{
if (end - start == 2){
if (!removed)
return true;
return word[start] == word[end - 1];
}
if (end - start == 1)
return true;
if (word[start] == word[end - 1])
return palindrome_by_1(word, start + 1, end - 1, removed);
// After this point we need to remove a letter
if (removed)
return false;
// When two letters don't match, try to eliminate one of them
return palindrome_by_1(word, start + 1, end, true) || palindrome_by_1(word, start, end - 1, true);
}

Checking if a single string is a palindrome is O(n). You can implement a similar algorithm that moves two pointers, one from the start and another from the end. Move each pointer as long as the chars are the same, and on the first mismatch try to work out which char you can skip, then keep moving both pointers as long as the remaining chars are the same. Keep track of the first mismatch. This is O(n).

I hope my algorithm will pass without providing code.
If a word a1a2....an can be made a palindrome by removing ak, we can search for k as following:
If a1 != an, then the only possible k would be 1 or n. Just check if a1a2....an-1 or a2a3....an is a palindrome.
If a1 == an, next step is solving the same problem for a2....an-1. So we have a recursion here.
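The two cases above can be sketched iteratively in C++. This is an unofficial illustration, not the poster's code: the names isPalindromeRange and canMakePalindromeByOneRemoval are made up, and the sketch assumes that a non-empty string which already is a palindrome counts as "yes", since its middle character can be removed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: is s[i..j] (both ends inclusive) a palindrome?
static bool isPalindromeRange(const std::string& s, std::size_t i, std::size_t j)
{
    while (i < j) {
        if (s[i] != s[j]) return false;
        ++i;
        --j;
    }
    return true;
}

// True if removing exactly one character can leave a palindrome.
bool canMakePalindromeByOneRemoval(const std::string& s)
{
    if (s.empty()) return false;              // nothing to remove
    std::size_t i = 0, j = s.size() - 1;
    while (i < j && s[i] == s[j]) {           // strip matching outer pairs
        ++i;
        --j;
    }
    if (i >= j) return true;                  // already a palindrome
    // First mismatch: the removed character must be s[i] or s[j].
    return isPalindromeRange(s, i + 1, j) || isPalindromeRange(s, i, j - 1);
}
```

Both candidates are checked at the first mismatch, so inputs such as abxyxcxyba, where both neighbours look promising, are handled. The whole thing is O(n) time and O(1) extra space.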

public static boolean pal(String s,int start,int end){
if(end-start==1||end==start)
return true;
if(s.charAt(start)==s.charAt(end))
return pal(s.substring(start+1, end),0,end-2);
else{
StringBuilder sb=new StringBuilder(s);
sb.deleteCharAt(start);
String x=new String(sb);
if(x.equals(sb.reverse().toString()))
return true;
StringBuilder sb2=new StringBuilder(s);
sb2.deleteCharAt(end);
String x2=new String(sb2);
if(x2.equals(sb2.reverse().toString()))
return true;
}
return false;
}

I tried the following; f and b are the indices at which the characters do not match.
int canwemakepal(char *str)//str input string
{
long int f,b,len,i,j;
int retval=0;
len=strlen(str);
f=0;b=len-1;
while(str[f]==str[b] && f<b)//continue matching till we dont get a mismatch
{
f++;b--;
}
if(f>=b)//if the index variable cross over each other, str is palindrome,answer is yes
{
retval=1;//true
}
else if(str[f+1]==str[b])//we get a mismatch,so check if removing character at str[f] will give us a palindrome
{
i=f+2;j=b-1;
while(str[i]==str[j] && i<j)
{
i++;j--;
}
if(i>=j)
retval=1;
else
retval=0;
}
else if(str[f]==str[b-1])//else check the same for str[b]
{
i=f+1;j=b-2;
while(str[i]==str[j] && i<j)
{
i++;j--;
}
if(i>=j)
retval=1;
else
retval=0;
}
else
retval=0;
return retval;
}

I created this solution. I tried it with various inputs and it gives correct results, but it is still not accepted as a correct solution. Check it and let me know if I'm doing anything wrong! Thanks in advance.
public static void main(String[] args)
{
Scanner s = new Scanner(System.in);
int t = s.nextInt();
String result[] = new String[t];
short i = 0;
while(i < t)
{
String str1 = s.next();
int length = str1.length();
String str2 = reverseString(str1);
if(str1.equals(str2))
{
result[i] = "Yes";
}
else
{
if(length == 2)
{
result[i] = "Yes";
}
else
{
int x = 0,y = length-1;
int counter = 0;
while(x<y)
{
if(str1.charAt(x) == str1.charAt(y))
{
x++;
y--;
}
else
{
counter ++;
if(str1.charAt(x) == str1.charAt(y-1))
{
y--;
}
else if(str1.charAt(x+1) == str1.charAt(y))
{
x++;
}
else
{
counter ++;
break;
}
}
}
if(counter >= 2)
{
result[i] = "No";
}
else
result[i]="Yes";
}
}
i++;
} // Loop over
for(int j=0; j<i;j++)
{
System.out.println(result[j]);
}
}
public static String reverseString(String original)
{
int length = original.length();
String reverse = "";
for ( int i = length - 1 ; i >= 0 ; i-- )
reverse = reverse + original.charAt(i);
return reverse;
}

Optimisation ideas - Longest common substring

I have this program which is supposed to find the Longest Common Substring of a number of strings. Which it does, but if the strings are very long (i.e. >8000 characters long), it works slowly (1.5 seconds).
Is there any way to optimise that?
The program is this:
//#include "stdafx.h"
#include <iostream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;
const unsigned short MAX_STRINGS = 10;
const unsigned int MAX_SIZE=10000;
vector<string> strings;
unsigned int len;
string GetLongestCommonSubstring( string string1, string string2 );
inline void readNumberSubstrings();
inline const string getMaxSubstring();
void readNumberSubstrings()
{
cin >> len;
assert(len > 1 && len <=MAX_STRINGS);
strings.resize(len);
for(register unsigned int i=0; i<len;i++)
strings[i]=string(MAX_SIZE,0);
for(register unsigned int i=0; i<len; i++)
cin>>strings[i];
}
const string getMaxSubstring()
{
string maxSubstring=strings[0];
for(register unsigned int i=1; i < len; i++)
maxSubstring=GetLongestCommonSubstring(maxSubstring, strings[i]);
return maxSubstring;
}
string GetLongestCommonSubstring( string string1, string string2 )
{
const int solution_size = string2.length()+ 1;
int *x=new int[solution_size]();
int *y= new int[solution_size]();
int **previous = &x;
int **current = &y;
int max_length = 0;
int result_index = 0;
int j;
int length;
int M=string2.length() - 1;
for(register int i = string1.length() - 1; i >= 0; i--)
{
for(register int j = M; j >= 0; j--)
{
if(string1[i] != string2[j])
(*current)[j] = 0;
else
{
length = 1 + (*previous)[j + 1];
if (length > max_length)
{
max_length = length;
result_index = i;
}
(*current)[j] = length;
}
}
swap(previous, current);
}
string1[max_length+result_index]='\0';
return &(string1[result_index]);
}
int main()
{
readNumberSubstrings();
cout << getMaxSubstring() << endl;
return 0;
}
Note: there is a reason why I didn't write code that would solve this problem with suffix trees (they're large).

Often when it comes to optimization, a different approach might be your only true option rather than trying to incrementally improve the current implementation. Here's my idea:
create a list of valid characters that might appear in the longest common substring. I.e., if a character doesn't appear in all strings, it can't be part of the longest common substring.
separate each string into multiple strings containing only valid characters
for every such string, create every possible substring and add it to the list as well
filter (as with the characters) all strings, that don't show up in all lists.
The complexity of this obviously depends largely on the number of invalid characters. if it's zero, this approach doesn't help at all.
Some remarks on your code: Don't try to be overly clever. The compiler will optimize so much that there's really no need for you to put register in your code. Second, you're allocating strings and then overwriting them (in readNumberSubstrings), which is totally unnecessary. Third, pass by const reference if you can. Fourth, don't use raw pointers, especially if you never delete [] your new []d objects. Use std::vector instead; it behaves well with exceptions (which you might encounter, since you're using strings a lot!).
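As a hedged illustration of the last two remarks (const references and std::vector instead of raw new[]), here is how the two-string routine might look. It keeps the same rolling two-row scheme as the code in the question, so the running time is still proportional to the product of the two lengths. The name longestCommonSubstring is made up for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rewrite of GetLongestCommonSubstring for two strings.
std::string longestCommonSubstring(const std::string& a, const std::string& b)
{
    // previous[j + 1] holds the common-prefix length of a[i+1..] and b[j+1..].
    std::vector<std::size_t> previous(b.size() + 1, 0);
    std::vector<std::size_t> current(b.size() + 1, 0);
    std::size_t maxLength = 0;
    std::size_t resultIndex = 0;

    for (std::size_t i = a.size(); i-- > 0; ) {
        for (std::size_t j = b.size(); j-- > 0; ) {
            if (a[i] != b[j]) {
                current[j] = 0;
            } else {
                current[j] = 1 + previous[j + 1];
                if (current[j] > maxLength) {
                    maxLength = current[j];
                    resultIndex = i;
                }
            }
        }
        std::swap(previous, current);   // swaps the buffers, no copying
    }
    return a.substr(resultIndex, maxLength);
}
```

The vectors release their memory automatically, and returning a.substr(...) avoids writing a terminator into the input string.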

You have to use a suffix tree. With this structure, the algorithm takes about 1 second for 10 strings of 10000 symbols each.

Give a suffix array a try. They take as much memory as your input strings (depending on your text encoding, though) and are built quickly in linear time.
http://en.wikipedia.org/wiki/Suffix_array
Here is my JavaScript code for this
function LCS(as, bs, A, B) {
var a = 0, b = 0, R = [], max = 1
while (a < A.length && b < B.length) {
var M = cmpAt(as, bs, A[a], B[b])
if (M.size > 0) {
if (M.ab < 0) {
var x = b; while (x < B.length) {
var C = cmpAt(as, bs, A[a], B[x])
if (C.size >= M.size) { if (C.size >= max) max = C.size, R.push([a, x, C.size]) } else break
x++
}
} else {
var x = a; while (x < A.length) {
var C = cmpAt(as, bs, A[x], B[b])
if (C.size >= M.size) { if (C.size >= max) max = C.size, R.push([x, b, C.size]) } else break
x++
}
}
}
if (M.ab < 0) a++; else b++
}
R = R.filter(function(a){ if (a[2] == max) return true })
return R
}
function cmpAt(a, b, x, y) {
var c = 0
while (true) {
if (x == a.length) {
if (y == b.length) return { size: c, ab: 0 }
return { size: c, ab: -1 }
}
if (y == b.length) return { size: c, ab: 1 }
if (a.charCodeAt(x) != b.charCodeAt(y)) {
var ab = 1;
if (a.charCodeAt(x) < b.charCodeAt(y)) ab = -1
return { size: c, ab: ab }
}
c++, x++, y++
}
}

Test if multiple variable values are either all positive or negative

Is there a good and fast way in C/C++ to test if multiple variables contain either all positive or all negative values?
Say there are 5 variables to test:
Variant 1
int test(int a[5]) {
if (a[0] < 0 && a[1] < 0 && a[2] < 0 && a[3] < 0 && a[4] < 0) {
return -1;
} else if (a[0] > 0 && a[1] > 0 && a[2] > 0 && a[3] > 0 && a[4] > 0) {
return 1;
} else {
return 0;
}
}
Variant 2
int test(int a[5]) {
unsigned int mask = 0;
mask |= (a[0] >> numeric_limits<int>::digits) << 1;
mask |= (a[1] >> numeric_limits<int>::digits) << 2;
mask |= (a[2] >> numeric_limits<int>::digits) << 3;
mask |= (a[3] >> numeric_limits<int>::digits) << 4;
mask |= (a[4] >> numeric_limits<int>::digits) << 5;
if (mask == 0) {
return 1;
} else if (mask == (1 << 5) - 1) {
return -1;
} else {
return 0;
}
}
Variant 2a
int test(int a[5]) {
unsigned int mask = 0;
for (int i = 0; i < 5; i++) {
mask <<= 1;
mask |= a[i] >> numeric_limits<int>::digits;
}
if (mask == 0) {
return 1;
} else if (mask == (1 << 5) - 1) {
return -1;
} else {
return 0;
}
}
Which version should I prefer? Is there any advantage to using variant 2/2a over 1? Or is there a better/faster/cleaner way?

I think your question and what you're looking for don't agree. You asked how to detect if they're signed or unsigned, but it looks like you mean how to test if they're positive or negative.
A quick test for all negative:
if ((a[0]&a[1]&a[2]&a[3]&a[4])<0)
and all non-negative (>=0):
if ((a[0]|a[1]|a[2]|a[3]|a[4])>=0)
I can't think of a good way to test that they're all strictly positive (not zero) right off, but there should be one.
Note that these tests are correct and portable for two's complement systems (anything in the real world you would care about), but they're slightly wrong for ones' complement or sign-magnitude. They can probably be fixed if you really care.
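The two bit tests generalize to any number of values. The sketch below is only an illustration: the function names are made up, it assumes a two's complement representation as noted above, and it treats an empty range as vacuously true in both functions.

```cpp
#include <cstddef>

// Hypothetical name: true if every element is negative. The sign bit of
// the AND survives only if it is set in every element.
bool allNegative(const int* a, std::size_t n)
{
    int acc = -1;                       // all bits set in two's complement
    for (std::size_t i = 0; i < n; ++i)
        acc &= a[i];
    return acc < 0;
}

// Hypothetical name: true if every element is >= 0. The sign bit of the
// OR is set as soon as any element is negative.
bool allNonNegative(const int* a, std::size_t n)
{
    int acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc |= a[i];
    return acc >= 0;
}
```

Neither loop has a data-dependent branch, which is the main attraction over a chain of comparisons. Whether that is actually faster is something to measure.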

I guess you mean negative/positive, (un)signed means whether a sign exists at all. This one works for any iterable (this assumes you count 0 as positive):
template <class T>
bool allpos(const T start, const T end) {
T it;
for (it = start; it != end; it++) {
if (*it < 0) return false;
}
return true;
}
// usage
int a[5] = {-5, 3, 1, 0, 4};
bool ispos = allpos(a, a + 5);
Note: This is a good and fast way
This may not be the absolutely extremely superduperfastest way to do it, but it certainly is readable and really fast. Optimizing this is just not worth it.

Variant 1 is the only readable one.
However, you could make it nicer using a loop:
int test(int *a, int n) {
int neg = 0;
for(int i = 0; i < n; i++) {
if(a[i] < 0) neg++;
}
if(neg == 0) return 1;
else if(neg == n) return -1;
else return 0;
}

I agree with previous posters that loops are simpler. The following solution combines Nightcracker's template and ThiefMaster's full solution, with early-outing if a sign change is detected while looping over the variables. And it works for floating point values.
template<typename T>
int testConsistentSigns(const T* i_begin, const T* i_end)
{
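// Sign of the first element; zero counts as positive here.
bool is_positive = !(*i_begin < 0);
for(const T* it = i_begin + 1; it < i_end; ++it)
{
// Bail out as soon as an element's sign differs from the first one.
if((*it < 0) == is_positive)
return 0;
}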
if(is_positive)
return 1;
return -1;
}

In terms of speed, I suggest you profile each of your example in turn to discover which is the fastest on your particular platform.
In terms of ease of understanding, I'd say that the first example is the most obvious, though that's just my opinion.
Of course, the first version is a maintenance nightmare if you have more than 5 variables. The second and third variants are better for this, but obviously have a limit of 32 variables. To make them fully flexible, I would suggest keeping counters of the number of positive and negative variables, in a loop. After the end of the loop, just check that one or other counter is zero.
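The counter idea from the last paragraph might look like the sketch below. The name testSigns is made up. As an assumption of this sketch, zero counts as neither positive nor negative, so each counter is compared with n rather than only checking that the other one is zero.

```cpp
#include <cstddef>

// Hypothetical name. Returns 1 if all values are strictly positive,
// -1 if all are strictly negative, and 0 otherwise (including when the
// input is empty or contains a zero).
int testSigns(const int* a, std::size_t n)
{
    std::size_t positives = 0;
    std::size_t negatives = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] > 0)
            ++positives;
        else if (a[i] < 0)
            ++negatives;
    }
    if (n == 0) return 0;
    if (positives == n) return 1;
    if (negatives == n) return -1;
    return 0;
}
```

This works for any number of variables, without the 32-variable limit of the bit-mask variants.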

First off, create a method/procedure. That'll boost readability a whole lot (no matter how you implement it, it'll be cleaner than all the options above).
Second, I think that the function:
bool isNeg(int x) { return x < 0;}
is cleaner than using bit masks, so I'll go with option 1, and when it comes to speed, let the compiler work that out for you in such low-level cases.
The final code should look something like:
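// isPos is assumed to be defined analogously to isNeg: return x > 0;
int test(int a[5]) {
bool allNeg = true;
bool allPos = true;
for (int i = 0; i < 5; i++){
if (!isNeg(a[i])) allNeg = false; // a non-negative value rules out "all negative"
if (!isPos(a[i])) allPos = false; // a non-positive value rules out "all positive"
}
if (allNeg) return -1;
if (allPos) return 1;
return 0;
}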

You could find maximum element, if it is negative then all elements are negative:
template<typename T>
bool all_negative( const T* first, const T* last )
{
const T* max_el = std::max_element( first, last );
if ( *max_el < T(0) ) return true;
else return false;
}
You could use boost::minmax_element to find if all elements are negative/positive in one loop:
template<typename T>
int positive_negative( const T* first, const T* last )
{
std::pair<const T*,const T*> min_max_el = boost::minmax_element( first, last );
if ( *min_max_el.second < T(0) ) return -1;
else if ( *min_max_el.first > T(0) ) return 1;
else return 0;
}
If the sequence is non-empty, the function minmax_element performs at most 3 * (last - first - 1) / 2 comparisons.
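Since C++11 the standard library has its own std::minmax_element in <algorithm>, so the same check can be written without Boost. The following is a sketch under that assumption. The name positiveNegativeStd is made up, and returning 0 for an empty range is a choice made here, not something the answer specifies.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical name. Returns -1 if all elements are negative, 1 if all are
// strictly positive, and 0 otherwise (including for an empty range).
template <typename T>
int positiveNegativeStd(const T* first, const T* last)
{
    if (first == last) return 0;   // avoid dereferencing an empty range
    std::pair<const T*, const T*> mm = std::minmax_element(first, last);
    if (*mm.second < T(0)) return -1;   // even the maximum is negative
    if (*mm.first > T(0)) return 1;     // even the minimum is positive
    return 0;
}
```

Usage is the same as for positive_negative above, for example positiveNegativeStd(a, a + 5) for an array a of five elements.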

If you only need to know less/greater than zero one at a time, or can be content with < and >= you can do it easily with find_if like this:
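// Note: std::bind2nd was removed in C++17; compile as C++14 or earlier.
#include <algorithm>
#include <functional>
#include <iostream>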
template <class Iter>
int all_neg(Iter begin, Iter end)
{
return std::find_if(begin, end, std::bind2nd(std::greater_equal<int>(), 0)) == end;
}
int main()
{
int a1[5] = { 1, 2, 3, 4, 5 };
int a2[5] = { -1, 2, 3, 4, 5 };
int a3[5] = { -1, -2, -3, -4, -5 };
int a4[5] = { 0 };
std::cout << all_neg(a1, a1 + 5) << ":"
<< all_neg(a2, a2 + 5) << ":"
<< all_neg(a3, a3 + 5) << ":"
<< all_neg(a4, a4 + 5) << std::endl;
}
You can also use a more complicated predicate that keeps track of any pos/neg to answer your original question if you really need that level of detail.
