I have the following problem to solve:
There are two strings of arbitrary length and arbitrary content. I need to find all ordered sequences of maximum length that appear in both strings.
Example 1:
input: "a1b2c3" "1a2b3c"
output: "123" "12c" "1b3" "1bc" "a23" "a2c" "ab3" "abc"
Example 2:
input: "cadb" "abcd"
output: "ab" "ad" "cd"
I wrote it the straightforward way: two loops and recursion, followed by removing duplicates and any results that are part of a larger result (for instance, the sequence "abc" contains "ab", "ac" and "bc", so I filter those out).
// "match" argument here used as temporary buffer
void match_recursive(set<string> &matches, string &match, const string &a_str1, const string &a_str2, size_t a_pos1, size_t a_pos2)
{
bool added = false;
for(size_t i = a_pos1; i < a_str1.length(); ++i)
{
for(size_t j = a_pos2; j < a_str2.length(); ++j)
{
if(a_str1[i] == a_str2[j])
{
match.push_back(a_str1[i]);
if(i < a_str1.length() - 1 && j < a_str2.length() - 1)
match_recursive(matches, match, a_str1, a_str2, i + 1, j + 1);
else
matches.emplace(match);
added = true;
match.pop_back();
}
}
}
if(!added)
matches.emplace(match);
}
This function solves the problem, but its complexity is unacceptable. For instance, solving "0q0e0t0c0a0d0a0d0i0e0o0p0z0" "0w0r0y0d0s0a0b0w0k0f0.0k0x0" takes 28 seconds on my machine (a debug build, but this is still extremely slow). I think there should be a simple algorithm for this problem, but somehow I can't find one on the net.
Can you point me in the right direction?
Look up the "longest common subsequence (LCS)" problem, e.g. http://en.wikipedia.org/wiki/Longest_common_subsequence_problem, and see how the dynamic programming solution finds an LCS of two sequences: it starts with the trivial LCS for the first character of each sequence and then builds up the LCS for longer and longer pairs of prefixes of the two sequences. The only modification you need is this: when you derive the LCS for the current prefix pair from the previously computed solutions for earlier prefix pairs, you must have stored ALL LCS strings for those earlier prefix pairs, and you combine these sets of LCS strings (possibly with an added character) into the set of LCS strings you store for the current prefix pair. This will solve your problem efficiently.
You can make it a bit more efficient still: first compute just the LCS length, then find the earlier prefix pairs that lie on computational paths reaching that length, and repeat the dynamic programming iterations only for those prefix pairs, this time keeping track of all possible LCS strings as described above.
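The second variant described above (compute the length table first, then collect strings only along the paths that reach the full length) could look roughly like the sketch below. It is an illustration rather than a tuned implementation; `all_lcs`, `collect`, `Cell`, `Table` and `Memo` are names made up for this example, and the number of distinct LCS strings can still grow exponentially with the input length.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Cell = std::pair<std::size_t, std::size_t>;
using Table = std::vector<std::vector<int>>;
using Memo = std::map<Cell, std::set<std::string>>;

// Returns every LCS string of the prefixes a[0..i) and b[0..j).
// Results are memoized per prefix pair, so each pair is expanded once.
const std::set<std::string>& collect(std::size_t i, std::size_t j,
                                     const std::string& a, const std::string& b,
                                     const Table& len, Memo& memo) {
    const Cell key(i, j);
    const auto found = memo.find(key);
    if (found != memo.end()) return found->second;

    std::set<std::string> out;
    if (i == 0 || j == 0) {
        out.insert(std::string());  // an empty prefix has only the empty LCS
    } else if (a[i - 1] == b[j - 1]) {
        // Matching characters: extend every LCS of the diagonal neighbour.
        for (const std::string& s : collect(i - 1, j - 1, a, b, len, memo))
            out.insert(s + a[i - 1]);
    } else {
        // No match: follow only neighbours that keep the full length.
        if (len[i - 1][j] == len[i][j]) {
            const std::set<std::string>& up = collect(i - 1, j, a, b, len, memo);
            out.insert(up.begin(), up.end());
        }
        if (len[i][j - 1] == len[i][j]) {
            const std::set<std::string>& left = collect(i, j - 1, a, b, len, memo);
            out.insert(left.begin(), left.end());
        }
    }
    return memo.emplace(key, std::move(out)).first->second;
}

// Illustrative entry point: all distinct longest common subsequences.
std::set<std::string> all_lcs(const std::string& a, const std::string& b) {
    // len[i][j] = LCS length of the prefixes a[0..i) and b[0..j).
    Table len(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            len[i][j] = (a[i - 1] == b[j - 1])
                            ? len[i - 1][j - 1] + 1
                            : std::max(len[i - 1][j], len[i][j - 1]);
    Memo memo;
    return collect(a.size(), b.size(), a, b, len, memo);
}
```

Calling all_lcs("cadb", "abcd") should give the set from Example 2 of the question.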
Here is code for the dynamic programming solution. I tested it with the examples you gave. I have solved the LCS problem before, but this is the first time I have printed all of the subsequences.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <set>
using namespace std;
#define MAX_LENGTH 100
int lcs(const char* a, const char* b)
{
int row = strlen(a)+ 1;
int column = strlen(b) + 1;
//The table lowers the function's time cost in exchange for memory.
int **matrix = (int**)malloc(sizeof(int*) * row);
int i, j;
for(i = 0; i < row; ++i)
matrix[i] = (int*)calloc(column, sizeof(int));
typedef set<string> lcs_set;
lcs_set s_matrix[MAX_LENGTH][MAX_LENGTH];
//initiate
for(i = 0; i < MAX_LENGTH ; ++i)
s_matrix[0][i].insert("");
for(i = 0; i < MAX_LENGTH ; ++i)
s_matrix[i][0].insert("");
//Bottom up calculation
for(i = 1; i < row; ++i)
{
for(j = 1; j < column; ++j)
{
if(a[i - 1] == b[j - 1])
{
matrix[i][j] = matrix[i -1][j - 1] + 1;
// if your compiler supports C++11, you can simplify these loops with range-based for.
for(lcs_set::iterator it = s_matrix[i - 1][j - 1].begin(); it != s_matrix[i - 1][j - 1].end(); ++it)
s_matrix[i][j].insert(*it + a[i - 1]);
}
else
{
if(matrix[i][j - 1] > matrix[i - 1][j])
{
matrix[i][j] = matrix[i][j - 1];
for(lcs_set::iterator it = s_matrix[i][j - 1].begin(); it != s_matrix[i][j - 1].end(); ++it)
s_matrix[i][j].insert(*it);
}
else if(matrix[i][j - 1] == matrix[i - 1][j])
{
matrix[i][j] = matrix[i][j - 1];
for(lcs_set::iterator it = s_matrix[i][j - 1].begin(); it != s_matrix[i][j - 1].end(); ++it)
s_matrix[i][j].insert(*it);
for(lcs_set::iterator it = s_matrix[i - 1][j].begin(); it != s_matrix[i - 1][j].end(); ++it)
s_matrix[i][j].insert(*it);
}
else
{
matrix[i][j] = matrix[i - 1][j];
for(lcs_set::iterator it = s_matrix[i - 1][j].begin(); it != s_matrix[i - 1][j].end(); ++it)
s_matrix[i][j].insert(*it);
}
}
}
}
int lcs_length = matrix[row - 1][column -1];
// all ordered sequences with maximum length are here.
lcs_set result_set;
int m, n;
for(m = 1; m < row; ++m)
{
for(n = 1; n < column; ++n)
{
if(matrix[m][n] == lcs_length)
{
for(lcs_set::iterator it = s_matrix[m][n].begin(); it != s_matrix[m][n].end(); ++it)
result_set.insert(*it);
}
}
}
//comment it
for(lcs_set::iterator it = result_set.begin(); it != result_set.end(); ++it)
printf("%s\t", it->c_str());
printf("\n");
for(i = 0; i < row; ++i)
free(matrix[i]);
free(matrix);
return lcs_length;
}
int main()
{
char buf1[MAX_LENGTH], buf2[MAX_LENGTH];
// limit each field to 99 characters so the input cannot overflow the buffers
while(scanf("%99s %99s", buf1, buf2) == 2)
{
printf("length is: %d\n", lcs(buf1, buf2) );
}
return 0;
}
It sounds like you are trying to measure the similarity between two strings. I found this code somewhere on the web many years ago and modified it slightly (sorry, I can no longer cite the source), and I use it often. It works very quickly (for strings, anyway). You may need to adapt it for your purpose. Sorry that it's in VB.
Private Shared piScore As Integer
''' <summary>
''' Compares two not-empty strings regardless of case.
''' Returns a numeric indication of their similarity
''' (0 = not at all similar, 100 = identical)
''' </summary>
''' <param name="psStr1">String to compare</param>
''' <param name="psStr2">String to compare</param>
''' <returns>0-100 (0 = not at all similar, 100 = identical)</returns>
''' <remarks></remarks>
Public Shared Function Similar(ByVal psStr1 As String, ByVal psStr2 As String) As Integer
If psStr1 Is Nothing Or psStr2 Is Nothing Then Return 0
' Convert each string to simplest form (letters
' and digits only, all upper case)
psStr1 = ReplaceSpecial(psStr1.ToUpper)
psStr2 = ReplaceSpecial(psStr2.ToUpper)
If psStr1.Trim = "" Or psStr2.Trim = "" Then
' One or both of the strings is now empty
Return 0
End If
If psStr1 = psStr2 Then
' Strings are identical
Return 100
End If
' Initialize cumulative score (this will be the
' total length of all the common substrings)
piScore = 0
' Find all common sub-strings
FindCommon(psStr1, psStr2)
' We now have the cumulative score. Return this
' as a percent of the maximum score. The maximum
' score is the average length of the two strings.
Return piScore * 200 / (Len(psStr1) + Len(psStr2))
End Function
''' <summary>USED BY SIMILAR FUNCTION</summary>
Private Shared Sub FindCommon(ByVal psS1 As String, ByVal psS2 As String)
' Finds longest common substring (other than single
' characters) in psS1 and psS2, then recursively
' finds longest common substring in left-hand
' portion and right-hand portion. Updates the
' cumulative score.
Dim iLongest As Integer = 0, iStartPos1 As Integer = 0, iStartPos2 As Integer = 0, iJ As Integer = 0
Dim sHoldStr As String = "", sTestStr As String = "", sLeftStr1 As String = "", sLeftStr2 As String = ""
Dim sRightStr1 As String = "", sRightStr2 As String = ""
sHoldStr = psS2
Do While Len(sHoldStr) > iLongest
sTestStr = sHoldStr
Do While Len(sTestStr) > 1
iJ = InStr(psS1, sTestStr)
If iJ > 0 Then
' Test string is sub-set of the other string
If Len(sTestStr) > iLongest Then
' Test string is longer than previous
' longest. Store its length and position.
iLongest = Len(sTestStr)
iStartPos1 = iJ
iStartPos2 = InStr(psS2, sTestStr)
End If
' No point in going further with this string
Exit Do
Else
' Test string is not a sub-set of the other
' string. Discard final character of test
' string and try again.
sTestStr = Left(sTestStr, Len(sTestStr) - 1)
End If
Loop
' Now discard first char of test string and
' repeat the process.
sHoldStr = Right(sHoldStr, Len(sHoldStr) - 1)
Loop
' Update the cumulative score with the length of
' the common sub-string.
piScore = piScore + iLongest
' We now have the longest common sub-string, so we
' can isolate the sub-strings to the left and right
' of it.
If iStartPos1 > 3 And iStartPos2 > 3 Then
sLeftStr1 = Left(psS1, iStartPos1 - 1)
sLeftStr2 = Left(psS2, iStartPos2 - 1)
If sLeftStr1.Trim <> "" And sLeftStr2.Trim <> "" Then
' Get longest common substring from left strings
FindCommon(sLeftStr1, sLeftStr2)
End If
Else
sLeftStr1 = ""
sLeftStr2 = ""
End If
If iLongest > 0 Then
sRightStr1 = Mid(psS1, iStartPos1 + iLongest)
sRightStr2 = Mid(psS2, iStartPos2 + iLongest)
If sRightStr1.Trim <> "" And sRightStr2.Trim <> "" Then
' Get longest common substring from right strings
FindCommon(sRightStr1, sRightStr2)
End If
Else
sRightStr1 = ""
sRightStr2 = ""
End If
End Sub
''' <summary>USED BY SIMILAR FUNCTION</summary>
Private Shared Function ReplaceSpecial(ByVal sString As String) As String
Dim iPos As Integer
Dim sReturn As String = ""
Dim iAsc As Integer
For iPos = 1 To sString.Length
iAsc = Asc(Mid(sString, iPos, 1))
If (iAsc >= 48 And iAsc <= 57) Or (iAsc >= 65 And iAsc <= 90) Then
sReturn &= Chr(iAsc)
End If
Next
Return sReturn
End Function
Just call the Similar function and you get a result between 0 and 100.
Hope this helps
To start with, we have an array of strings. I have to print it so that each output line holds either one word (up to the next space) or the first 12 characters of a longer word.
For example, say we have the string "hello world qwerty------asd"; it must be printed as:
hello
world
qwerty------ (12 characters without a space)
asd
Without the 12-character condition this would be easy (just the strtok function, I guess), but with it I don't know what to do. I have an idea, but it works for only about 50% of inputs. Here it is; it is quite long and clumsy. I know this is about string functions, but I can't work out the algorithm. Thank you:
int counter = 0;// words counter
int k1 = 0;// since I also need to print addresses of letters of third word, I have to know where 3rd word is
int jbegin=0,// beginning and end of 3rd word
jend=0;
for (int k = 0; k < i; k++) {
int lastspace = 0;//last index of new string( space or 12 characters)
for (int j = 0; j < strlen(*(arr + k)); j++) {
if (*(*(arr + k) + j) == ' ' ) { //if space
printf("\n");
lastspace = j;
counter++;
if ( counter == 3 ) { // its only for addreses, doesnt change anything
k1 = k;
jbegin = j + 1;
jend = jbegin;
}
}
if (j % 12 == 0 && (j-lastspace>11 || lastspace==0) ) { // if 12 characters without space - make a new string
printf(" \n");
counter++;
lastspace = j;
}
if (counter==3 ) {
jend++;
}
printf("%c", *(*(arr+k) + j)); // printing by char
}
printf("\n ");
}
if ( jend!=0 && jbegin!=0 ) {
printf("\n Addreses of third word are :\n");
for (int j = jbegin; j < jend; j++) {
printf("%p \n", arr + k1 + j);
printf("%c \n", *(*(arr + k1) + j));
}
}
I tried to understand your code, but to be honest I have no idea what it is doing. If you print character by character, you only need to add a line break when you encounter a space, and you need to keep track of how many characters you have already printed on the current line.
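#include <iostream>
int main() {
    char x[] = "hello world qwerty------asd";
    int chars_on_same_line = 0;
    const int max_chars_on_same_line = 12;
    for (auto c : x) {
        if (c == '\0') break;  // the array also holds the terminating null
        if (c == ' ') {
            // A space ends the current line; the space itself is not printed.
            if (chars_on_same_line > 0) std::cout << "\n";
            chars_on_same_line = 0;
            continue;
        }
        std::cout << c;
        if (++chars_on_same_line == max_chars_on_same_line) {
            std::cout << "\n";
            chars_on_same_line = 0;
        }
    }
    if (chars_on_same_line > 0) std::cout << "\n";
}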
If for some reason you cannot use auto and range-based for loops, then you need to get the length of the string and use an index, as in
size_t len = std::strlen(x); // needs #include <cstring>
for (size_t i = 0; i < len; ++i) {
char c = x[i];
...
}
printf( "%.12s\n", wordStart);
limits the printed characters to 12.
Beyond that, there are two independent things to track: word starts and the line limit.
Word starts: track each transition from whitespace to a word character.
Whenever a word is completed (a transition from a word character to whitespace):
if it is 12 characters or fewer since the word start, print the whole word plus a newline;
if it is more than 12 characters, print 12 characters and drop the rest.
Whitespace followed by whitespace: ignore.
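The word-tracking steps listed above could be sketched in C++ roughly as follows. `truncated_words` and its `limit` parameter are names invented for this illustration; note that, as described in the list, anything past the 12th character of a word is dropped rather than printed on a line of its own.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: one output line per word, each cut to `limit` characters.
std::vector<std::string> truncated_words(const std::string& text,
                                         std::size_t limit = 12) {
    std::vector<std::string> lines;
    std::size_t i = 0;
    while (i < text.size()) {
        // Whitespace followed by whitespace is simply skipped.
        while (i < text.size() &&
               std::isspace(static_cast<unsigned char>(text[i])))
            ++i;
        const std::size_t start = i;  // transition from whitespace to a word
        while (i < text.size() &&
               !std::isspace(static_cast<unsigned char>(text[i])))
            ++i;                      // transition from a word back to whitespace
        if (i > start)
            lines.push_back(text.substr(start, std::min(limit, i - start)));
    }
    return lines;
}
```

Each element of the returned vector would then be printed followed by a newline.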
Given a string, what is the most efficient way to find the maximum number of equal substrings it is composed of? For example, "aaaa" is composed of four equal substrings "a", and "abab" is composed of two "ab"s. But for something like "abcd" there is no substring other than "abcd" itself that, when repeated, makes up "abcd".
Checking all possible substrings isn't an option, since the input can be a string of length 1 million.
Since no condition is given for the substrings, the maximum number of equal substrings is obtained by counting the shortest possible strings: single letters. Create a map, count the letters of the string, and find the letter with the highest count. That is your solution.
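A minimal sketch of the counting idea, assuming plain `char` letters; `most_frequent_letter` is a hypothetical name used only for this example, and on a tie the smallest character wins because std::map iterates in key order.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: returns the most frequent character and its count.
// For an empty string it returns {'\0', 0}.
std::pair<char, std::size_t> most_frequent_letter(const std::string& s) {
    std::map<char, std::size_t> counts;
    for (char c : s) ++counts[c];  // count every letter once

    std::pair<char, std::size_t> best('\0', 0);
    for (const auto& entry : counts)
        if (entry.second > best.second)
            best = std::make_pair(entry.first, entry.second);
    return best;
}
```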
EDIT:
If the string must consist only of repetitions of the substring, then the following code computes a solution:
#include <iostream>
#include <string>
using ull = unsigned long long;
int main() {
std::string str = "abab";
ull length = str.length();
for (ull i = 1; (2 * i) <= str.length(); ++i) {
// the head length must divide the string length; skip it otherwise instead of ending the loop
if (str.length() % i != 0) continue;
bool found = true;
for (ull j = 1; found && (j * i) < str.length(); ++j) {
for (ull k = 0; k < i; ++k) {
if(str[k] != str[k + j * i]) {
found = false;
break;
}
}
}
if(found) {
length = i;
break;
}
}
std::cout << "Maximal number: " << str.length() / length << std::endl;
return 0;
}
This algorithm checks whether the head of the string is repeated and whether the string consists only of repetitions of the head.
The i-loop iterates over the candidate lengths of the head,
the j-loop iterates over each repetition,
the k-loop iterates over each character in the substring.
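For inputs around a million characters, one alternative worth considering is the prefix function known from the Knuth-Morris-Pratt algorithm. This is a different technique from the loops above and is shown only as a sketch; `max_repetitions` is a made-up name. The idea: if the longest proper prefix of the string that is also a suffix has length p, then n - p is the smallest period candidate, and it counts only if it divides n.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: how many times the shortest repeating block occurs.
// Returns 1 when the string is not a repetition, 0 for an empty string.
std::size_t max_repetitions(const std::string& s) {
    const std::size_t n = s.size();
    if (n == 0) return 0;

    // pi[i] = length of the longest proper prefix of s[0..i]
    // that is also a suffix of s[0..i].
    std::vector<std::size_t> pi(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        std::size_t k = pi[i - 1];
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }

    const std::size_t period = n - pi[n - 1];
    return (n % period == 0) ? n / period : 1;
}
```

The prefix function is computed in time linear in the length of the string, so the whole check stays linear as well.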
The word I have to match is "FFM_L_REEF_30" (the length of the word is not fixed).
The list of words to match against is
"FFM_H_REEF_40"
"FFM_H_REEF_50"
"FFM_L_REEF_20"
"FFM_L_RAEF_30"
Is it possible to write a regular expression that matches words differing by only one character, like
"FFM_L_REEF_20" - here the digit 3 is changed to 2
"FFM_L_QEEF_30" - here the character R is changed to Q
QUESTION EDITED
Try the regex below.
^(?:(.)\1*(?:(?!\1).)\1*|(.)(?:(?!\2)(.))\3*$)$
Let me know of any case where the above regex fails.
Here's a JS example of how you can solve it using the Levenshtein distance.
(Note that the getEditDistance function is copied from Wikibooks, which has implementations in many programming languages.)
var testWord = "FFM_L_REEF_30",
wordList = [
"FFM_H_REEF_40",
"FFM_H_REEF_50",
"FFM_L_REEF_20",
"FFM_L_RAEF_30",
"A rendom sentence",
testWord,
"FFM_L_REEF_3"
],
w;
function getEditDistance(a, b) {
// Code from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
if (a.length === 0) return b.length;
if (b.length === 0) return a.length;
var matrix = [];
// increment along the first column of each row
var i;
for (i = 0; i <= b.length; i++) {
matrix[i] = [i];
}
// increment each column in the first row
var j;
for (j = 0; j <= a.length; j++) {
matrix[0][j] = j;
}
// Fill in the rest of the matrix
for (i = 1; i <= b.length; i++) {
for (j = 1; j <= a.length; j++) {
if (b.charAt(i - 1) == a.charAt(j - 1)) {
matrix[i][j] = matrix[i - 1][j - 1];
} else {
matrix[i][j] = Math.min(matrix[i - 1][j - 1] + 1, // substitution
Math.min(matrix[i][j - 1] + 1, // insertion
matrix[i - 1][j] + 1)); // deletion
}
}
}
return matrix[b.length][a.length];
};
wordList.forEach(function (w) {
var ld = getEditDistance(w, testWord);
document.write('<p style="color:' + (ld>1?'red':'green') + ';">"' + w + '" gives levenshtein distance ' + ld + '</p>');
});
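If "one character difference" means that exactly one position is substituted (same length, no insertions or deletions), then a full Levenshtein computation is not required. A minimal C++ sketch under that assumption; `differs_by_one_char` is a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when a and b have the same length and
// differ in exactly one position.
bool differs_by_one_char(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    std::size_t diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i] && ++diff > 1) return false;  // stop at the second mismatch
    }
    return diff == 1;
}
```

Unlike the edit-distance version, this rejects "FFM_L_REEF_3", because a missing character is a deletion rather than a substitution.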
Yes, sure. You can use the following regex:
a*.a*
It means that your string contains any number of 'a', then any one character, and then again any number of 'a'.
'.' matches any character except '\n'. If you also want to allow '\n' in that position, you can use this regex instead:
a*[\s\S]a*
Below is example code that does not work the way I want.
#include <iostream>
using namespace std;
int main()
{
char testArray[] = "1 test";
int numReplace = 2;
testArray[0] = (int)numReplace;
cout<< testArray<<endl; //output is "? test" I wanted it 2, not a '?' there
//I was trying different things and hoping (int) helped
testArray[0] = '2';
cout<<testArray<<endl;//"2 test" which is what I want, but it was hardcoded in
//Is there a way to do it based on a variable?
return 0;
}
In a string with characters and integers, how do you go about replacing numbers? And is the implementation different between C and C++?
If numReplace is in the range [0,9], you can do:
testArray[0] = numReplace + '0';
If numReplace is outside [0,9], you need to
a) convert numReplace into its string equivalent
b) write a function that replaces part of the string with the string produced in (a)
Ref: "Best way to replace a part of string by another in c" and other relevant posts on SO
Also, since this is C++ code, you might consider using std::string, where replacement, number-to-string conversion, etc. are much simpler.
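Following the std::string suggestion above, a minimal sketch might look like this. `replace_at` is a hypothetical helper; it relies on std::to_string and std::string::replace, so the number may have any number of digits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces `count` characters starting at `pos`
// with the decimal text of `value`. std::string::replace throws
// std::out_of_range if pos is past the end of the string.
std::string replace_at(std::string text, std::size_t pos, std::size_t count,
                       int value) {
    text.replace(pos, count, std::to_string(value));
    return text;
}
```

For the example in the question, replace_at("1 test", 0, 1, numReplace) with numReplace equal to 2 would produce "2 test".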
You should look over the ASCII table here: http://www.asciitable.com/
It's very handy - always look at the Decimal column for the ASCII value you're using.
In the line testArray[0] = (int)numReplace; you actually put the character with the decimal ASCII value 2 into the first position. numReplace + '0' would do the trick :)
As for the C/C++ question: it is the same in both. As for mixing characters and integers...
You should find where your number starts and ends.
You could write a loop that looks like this:
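#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char str[] = "asd 12983 asd"; /* the number in it will be incremented by 1 */
    size_t len = strlen(str), from = 0, to, numberLen;
    char *nstr;
    long num;

    /* find the first digit */
    while (from < len && !isdigit((unsigned char)str[from]))
        from++;
    if (from == len)
        return 0; /* no number in the string */

    /* find the position one past the last digit of that number */
    to = from;
    while (to < len && isdigit((unsigned char)str[to]))
        to++;
    numberLen = to - from;

    /* copy the digits into their own null-terminated string;
       +2 leaves room for one extra digit and the terminator */
    nstr = malloc(numberLen + 2);
    if (nstr == NULL)
        return 1;
    memcpy(nstr, str + from, numberLen);
    nstr[numberLen] = '\0';
    /* nstr now contains the number */

    num = strtol(nstr, NULL, 10);
    num++; /* we wanted the number + 1 in the string */
    snprintf(nstr, numberLen + 2, "%ld", num); /* itoa is not standard C */

    /* write back only if the digit count is unchanged
       (e.g. 999 -> 1000 would not fit in place) */
    if (strlen(nstr) == numberLen)
        memcpy(str + from, nstr, numberLen);
    free(nstr);

    /* Now the string contains "asd 12984 asd" */
    printf("%s\n", str);
    return 0;
}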
By the way, the most efficient way would probably be to just find the last digit and add 1 to its value (ASCII again), since the digits are consecutive in ASCII: '0'=48, '1'=49 and so on. Note that this only works directly when the last digit is not '9'; otherwise a carry has to be propagated to the digits on its left. But I just showed you how to treat them as numbers and work with them as integers. Hope it helped :)
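The last-digit idea can be extended so that a trailing '9' carries into the digits on its left. The following C++ sketch assumes the first run of decimal digits in the string is the number to increment; `increment_first_number` is a name made up for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: adds 1 to the first run of decimal digits in s,
// working directly on the characters. Returns s unchanged if it has no digit.
std::string increment_first_number(std::string s) {
    std::size_t from = 0;
    while (from < s.size() && !std::isdigit(static_cast<unsigned char>(s[from])))
        ++from;
    if (from == s.size()) return s;

    std::size_t to = from;  // one past the last digit of the number
    while (to < s.size() && std::isdigit(static_cast<unsigned char>(s[to])))
        ++to;

    // Walk from the last digit to the first, propagating the carry.
    for (std::size_t i = to; i > from;) {
        --i;
        if (s[i] != '9') {
            ++s[i];  // digits are consecutive character codes
            return s;
        }
        s[i] = '0';  // 9 + 1 leaves 0 here and carries to the left
    }
    s.insert(from, 1, '1');  // every digit was 9, so the number grows by one digit
    return s;
}
```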
I hope this is the right place for this kind of question.
I have the following problem (I think it is more complex than it appears).
I'm using a double-ended queue (deque) of strings.
deque < string > extractions;
The deque contains only N different strings, each repeated M times in random order, so the length of the deque is N*M. For example, suppose M=4, N=2, string1="A", string2="B":
extractions[1] = "A"
extractions[2] = "A"
extractions[3] = "B"
extractions[4] = "B"
extractions[5] = "A"
extractions[6] = "B"
extractions[7] = "B"
extractions[8] = "A"
I'm looking for an algorithm that finds an interesting configuration in which no two consecutive elements are equal. In this case there should be only two solutions: "A","B","A","B","A","B","A","B" and "B","A","B","A","B","A","B","A".
By an "interesting" configuration I mean one that is not simply produced by N nested loops.
A very naive solution I've implemented is to randomly shuffle the deque with std::random_shuffle until no two consecutive elements are equal, but this is both crude and slow; it is more like a bogosort...
Clearly, maximizing the edit distance between the strings should work better.
Any hints?
Start with a trivial configuration, e.g. for N=4 and M=4, start from
A B C D A B C D A B C D A B C D
and then run a standard shuffling algorithm, but observe the constraint that you never bring two equal elements next to each other, i.e.
for i = 0 .. N*M - 2
let j = random(N*M - 2 - i) + i + 1
if ((i == 0 || array[i - 1] != array[j])
&& (array[i + 1] != array[j])
&& (array[i] != array[j - 1])
&& (j == N*M - 1 || array[i] != array[j + 1]))
swap (array[i], array[j])
This should leave you very quickly with a random configuration that fulfills your requirement of not having two consecutive equal elements.
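int total = m * n;
for (int i = 1; i < total; i++)
{
    if (queue[i] != queue[i - 1])
        continue; // no conflict at this position
    // search from the end for an element that differs from the previous one
    int j = total - 1;
    while ((j > i) && (queue[j] == queue[i - 1]))
        j--;
    if (j > i)
    {
        // swap the conflicting element with the one found
        String aux = queue[i];
        queue[i] = queue[j];
        queue[j] = aux;
    }
}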
This code is not tested, but you get the idea.
I would do it with recursion.
The example is in C#; I find it more expressive than nested loops:
public List<String> GetPermutations(string s, IEnumerable<String> parts, string lastPart, int targetLength)
{
List<String> possibilities = new List<string>();
if (s.Length >= targetLength)
{
possibilities.Add(s);
return possibilities;
}
foreach (String part in parts)
{
if (!part.Equals(lastPart))
{
possibilities.AddRange(
GetPermutations(s + part, parts, part, targetLength));
}
}
return possibilities;
}
usage:
List<String> parts = new List<String>() { "A", "B", "C"};
int numOccurences = 4;
List<String> results =
GetPermutations("", parts, "", numOccurences * parts.Count );
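The recursion above never checks how often each part has been used, so it also yields strings in which one part occurs more than M times. A C++ sketch that additionally tracks the remaining copies of every part could look like this (`arrangements` and `extend` are invented names; the number of results grows very quickly with N and M):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recursive step: appends every part that differs from the
// previous one and still has copies left.
void extend(const std::vector<std::string>& parts,
            std::vector<std::size_t>& remaining,
            std::size_t last, std::size_t left,
            std::string& current, std::vector<std::string>& out) {
    if (left == 0) {
        out.push_back(current);  // every copy has been placed
        return;
    }
    for (std::size_t p = 0; p < parts.size(); ++p) {
        if (p == last || remaining[p] == 0) continue;
        --remaining[p];
        const std::size_t old_size = current.size();
        current += parts[p];
        extend(parts, remaining, p, left - 1, current, out);
        current.resize(old_size);  // undo the choice (backtrack)
        ++remaining[p];
    }
}

// Hypothetical entry point: all arrangements in which each part occurs
// exactly m times and no part is immediately repeated.
std::vector<std::string> arrangements(const std::vector<std::string>& parts,
                                      std::size_t m) {
    std::vector<std::size_t> remaining(parts.size(), m);
    std::vector<std::string> out;
    std::string current;
    // parts.size() is used as "no previous part" for the first position.
    extend(parts, remaining, parts.size(), parts.size() * m, current, out);
    return out;
}
```

For the parts "A" and "B" with M=4 this should produce exactly the two alternating sequences mentioned in the question.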
But if you want just a single possible solution (which is of course much faster to compute),
this will create a random, non-trivial solution like CACBCBCABABACAB (for A, B, C):
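// Shared generator: creating a new Random on every iteration can repeat seeds.
private static readonly Random Rng = new Random();
public String GetRandomValidPermutation(
string s,
List<String> parts,
string lastPart,
int targetLength)
{
if (s.Length >= targetLength)
{
return s;
}
String next = String.Empty;
// Draw until the part differs from the previous one (needs at least two distinct parts).
while(
(next = parts[Rng.Next(0, parts.Count)])
.Equals(lastPart)
){}
return GetRandomValidPermutation(s + next, parts, next, targetLength);
}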
call:
String validString =
GetRandomValidPermutation("", parts, "", numOccurences * parts.Count);