Minimum edit distance of two documents - c++

I know how to get the minimum edit distance (MED) of two short strings like 'abcde' and 'abfde', but how do I compute the MED of two documents or essays that contain spaces, tabs and newlines, or even of two pieces of source code?
For example:
text1:
The computer learns from a huge database of four million videos from volunteers and paid-for market
researchers in various emotional states and the algorithms are constantly updated and tested against real-world
scenarios.
The next stage is to integrate voice analysis and other measures of physical wellbeing such as heart rate and
hand gestures.
and
text2:A computer model has been developed that can predict what word you are thinking of. The model may help to
resolve questions about how the brain processes words and language, and might even lead to techniques for
decoding people’s thoughts.
Researchers led by Tom Mitchell of Carnegie Mellon University in Pittsburgh, Pennsylvania, 'trained' a computer
model to recognize the patterns of brain activity associated with 60 images, each of which represented a
different noun, such as 'celery' or 'aeroplane'.
This is the code I wrote to find the MED of two strings of at most 20 characters.
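```
#include <algorithm>  // std::min
#include <cstring>    // strlen

// Edit distance for strings of at most 20 characters
// (insert/delete cost 1, substitution cost 2).
int med_lev(char S[], char T[]) {
    int dis_lev[21][21];  // indices 0..20 are used, so one more than the maximum length
    int n = strlen(S);
    int m = strlen(T);
    for (int i = 0; i <= n; i++) {
        dis_lev[i][0] = i;
    }
    for (int j = 0; j <= m; j++) {
        dis_lev[0][j] = j;
    }
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            if (S[i - 1] == T[j - 1]) {
                dis_lev[i][j] = dis_lev[i - 1][j - 1];
            }
            else {
                dis_lev[i][j] = std::min(dis_lev[i - 1][j - 1] + 2, std::min(dis_lev[i - 1][j] + 1, dis_lev[i][j - 1] + 1));
            }
        }
    }
    return dis_lev[n][m];
}
```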
One method I've thought of is to delete all spaces, tabs and newlines and put every word into a single string, but the problem is that the string might be too long. Is there a better way?
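One possible direction, sketched here under assumptions rather than taken from the thread: keep the same recurrence (insert/delete cost 1, substitution cost 2) but store the texts in `std::string` and keep only two rows of the table, so memory grows with the length of one document instead of the product of both. Whitespace characters are compared like any other character, so nothing has to be stripped. The function name and signature are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Edit distance with insert/delete cost 1 and substitution cost 2,
// using two rolling rows instead of a full (n+1) x (m+1) table.
std::size_t med_lev(const std::string& s, const std::string& t) {
    std::vector<std::size_t> prev(t.size() + 1), cur(t.size() + 1);
    for (std::size_t j = 0; j <= t.size(); ++j) {
        prev[j] = j;  // turning an empty prefix of s into t[0..j) takes j inserts
    }
    for (std::size_t i = 1; i <= s.size(); ++i) {
        cur[0] = i;   // turning s[0..i) into an empty string takes i deletes
        for (std::size_t j = 1; j <= t.size(); ++j) {
            if (s[i - 1] == t[j - 1]) {
                cur[j] = prev[j - 1];
            } else {
                cur[j] = std::min({prev[j - 1] + 2, prev[j] + 1, cur[j - 1] + 1});
            }
        }
        std::swap(prev, cur);  // the row just filled becomes the previous row
    }
    return prev[t.size()];
}
```

The running time is still proportional to the product of the two lengths, so for very long documents it may be worth comparing sequences of words or lines instead of characters; the same recurrence applies to any sequence type.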

Related

Bit-wise shift for Matrix iteration?

Ok some background
I have been working on this project, which I started back in college (I'm no longer in school but want to expand on it to improve my understanding of C++). I digress... The problem is to find the best path through a matrix. I generate a matrix filled with a set integer value, let's say 9. I then create a path along the outer edge (Row 0, Col length-1) so that all values along it are 1.
The goal is that my program will run through all the possible paths and determine the best path. To simplify the problem I decided to just calculate the path SUM and then compare it to the SUM computed by the application.
(The title is misleading: S = single-thread, P = multi-thread.)
OK, so to my question.
In one section the algorithm does some simple bit-wise shifts to come up with the bounds for iteration. My question is: how exactly do these shifts work so that the entire matrix (or MxN array) is completely traversed?
void AltitudeMapPath::bestPath(unsigned int threadCount, unsigned int threadIndex) {
unsigned int tempPathCode;
unsigned int toPathSum, toRow, toCol;
unsigned int fromPathSum, fromRow, fromCol;
Coordinates startCoord, endCoord, toCoord, fromCoord;
// To and From split matrix in half along the diagonal
unsigned int currentPathCode = threadIndex;
unsigned int maxPathCode = ((unsigned int)1 << (numRows - 1));
while (currentPathCode < maxPathCode) {
tempPathCode = currentPathCode;
// Setup to path iteration
startCoord = pathedMap(0, 0);
toPathSum = startCoord.z;
toRow = 0;
toCol = 0;
// Setup from path iteration
endCoord = pathedMap(numRows - 1, numCols - 1);
fromPathSum = endCoord.z;
fromRow = numRows - 1;
fromCol = numCols - 1;
for (unsigned int index = 0; index < numRows - 1; index++) {
if (tempPathCode % 2 == 0) {
toCol++;
fromCol--;
}
else {
toRow++;
fromRow--;
}
toCoord = pathedMap(toRow, toCol);
toPathSum += toCoord.z;
fromCoord = pathedMap(fromRow, fromCol);
fromPathSum += fromCoord.z;
tempPathCode = tempPathCode >> 1;
}
if (toPathSum < bestToPathSum[threadIndex][toRow]) {
bestToPathSum[threadIndex][toRow] = toPathSum;
bestToPathCode[threadIndex][toRow] = currentPathCode;
}
if (fromPathSum < bestFromPathSum[threadIndex][fromRow]) {
bestFromPathSum[threadIndex][fromRow] = fromPathSum;
bestFromPathCode[threadIndex][fromRow] = currentPathCode;
}
currentPathCode += threadCount;
}
}
I simplified the code since all the extra stuff just detracts from the question. Also, if people are wondering, I wrote most of the application, but the idea of using the bit-wise operators was given to me by my past instructor.
Edit:
I added the entire algorithm that each thread executes. The entire project is still a work in progress, but here is the source code for the whole thing if anyone is interested [GITHUB]
A right bit shift of an unsigned value is equivalent to integer division by 2 to the power of the number of bits shifted, e.g. 8 >> 2 = 8 / (2 ^ 2) = 8 / 4 = 2. The result is truncated, so 1 >> 2 is 0, not 1/4.
A left bit shift is equivalent to multiplying by 2 to the power of the number of bits shifted, e.g. 1 << 2 = 1 * 2 ^ 2 = 1 * 4 = 4.
I'm not entirely sure what that algorithm does and why it needs to multiply by 2^ (num rows - 1) and then progressively divide by 2.
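One way to read the loop in the question, offered as an interpretation rather than a confirmed description of the project: each path code is treated as a string of bits, one bit per step, where an even value (lowest bit 0) moves one column and an odd value (lowest bit 1) moves one row. Shifting right by one discards the bit just used, and `1 << steps` is the number of distinct bit patterns, which is why it serves as the loop bound. The helper names below are hypothetical.

```cpp
#include <string>

// Decode a path code into moves. Bit 0 is the first step:
// 0 means "one column to the right" ('R'), 1 means "one row down" ('D').
std::string decodePath(unsigned int pathCode, unsigned int steps) {
    std::string moves;
    for (unsigned int i = 0; i < steps; ++i) {
        moves += (pathCode % 2 == 0) ? 'R' : 'D';  // inspect the lowest bit
        pathCode >>= 1;                            // drop it, exposing the next step
    }
    return moves;
}

// Number of distinct codes for a given number of steps: 2^steps.
unsigned int pathCodeCount(unsigned int steps) {
    return 1u << steps;
}
```

Iterating `pathCode` from 0 up to `pathCodeCount(steps) - 1` therefore visits every possible sequence of right/down moves of that length exactly once.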

Sort Function Deleting portion of array, C

I have this very weird issue going on. The function you will see in a moment is supposed to ensure that all elements (chars) in a 2D array are at their topmost position, that is, there is no empty space above any of the characters. For instance, a board could look like this:
1 X * * X ^
2 * X ^ *
3 o o * X ^
4 o ^ X X X
5 ^ * X * ^
1 2 3 4 5
And there is an issue at (2,1) because there is an empty space above a non empty space.
My function does the sorting correctly but it deletes any character in the bottom row that has an empty space above it. I cannot, for the life of me, figure out why. Here is my sort function.
int bb_float_one_step(BBoard board){
int i,j;
for (i = 0; i < board->rows; i++){
for (j = 0; j < board->cols; j++){
if (board->boardDim[i][j] == None && (board->boardDim[i + 1][j] != None && i + 1 <= board->rows)){
char tmp = board->boardDim[i + 1][j];
board->boardDim[i + 1][j] = board->boardDim[i][j];
board->boardDim[i][j] = tmp;
}
}
}
for (i = 0; i < board->rows; i++){
for (j = 0; j < board->cols; j++){
printf("%c",board->boardDim[i][j]);
}
printf("\n");
}
}
Below is a picture of the full sequence. The program prints a board. The user is asked to select a region to 'pop.' A function then replaces all the connected characters with blank spaces. Then, in the last portion of the picture, you can see how the characters are deleted. The board that doesn't have a border is there because I was using it to check whether the characters actually were deleted or not.
Thank you in advance for 1, reading this whole post, and 2, any help you can give.
Since you are comparing the current row with the next row, you should use for(i = 0; i < board->rows-1; i++)
Then in your complex if statement, get rid of && i + 1 <= board->rows. That should have been a less-than anyway, not less-than-or-equals, and because it comes after the array access it is only evaluated once boardDim[i + 1][j] has already been read. You're going out of bounds and getting garbage in your array.
You are checking the row beyond the maximum number of rows.
(board->boardDim[i + 1][j] != None && i + 1 <= board->rows)
That memory is not guaranteed to be 0. If it is not 0, your function will swap it in. If it is not human readable, printf won't print anything for it thereby shifting the | to the left.
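A minimal sketch of the fix the answers describe, written against a hypothetical board type because the question does not show the definition of `BBoard`: here the board is a vector of equal-length strings and `kNone` (assumed to be a space) marks an empty cell. The outer loop stops one row early, so row `i + 1` always exists.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

const char kNone = ' ';  // assumption: an empty cell is stored as a space

// Move every character up by one row wherever the cell above it is empty.
// Assumes all rows have the same length.
void floatOneStep(std::vector<std::string>& board) {
    for (std::size_t i = 0; i + 1 < board.size(); ++i) {  // last row has no row below it
        for (std::size_t j = 0; j < board[i].size(); ++j) {
            if (board[i][j] == kNone && board[i + 1][j] != kNone) {
                std::swap(board[i][j], board[i + 1][j]);
            }
        }
    }
}
```

Writing the bound as `i + 1 < board.size()` also keeps the loop safe for an empty board, where `board.size() - 1` would wrap around for an unsigned type.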

Discrete-event Simulation Algorithm 1.2.1 in C++

I'm currently trying to work with and extend the algorithm given on page 15 of the textbook "Discrete-event Simulation". My C++ knowledge is limited. It's not a homework problem; I just want to understand how to approach this problem in C++ and understand what is going on.
I want to be able to compute 12 delays in a single-server FIFO service node.
Algorithm in the book is as follow:
Co = 0.0; //assumes that a0=0.0
i = 0;
while (more jobs to process) {
i++;
a_i = GetArrival ();
if (a_i < c_{i-1})
d_i = c_{i-1} - a_i; //calculate delay for job i
else
d_i = 0.0; // job i has no delay
s_i = GetService ();
c_i = a_i + d_i + s_i; // calculate departure time for job i
}
n = i;
return d_1, d_2,..., d_n
The GetArrival and GetService procedures read the next arrival and service time from a file.
Just looking at the pseudo-code, it seems that you just need one a, which is a at step i, one c, which is c at step i-1, and an array of ds to store the delays. I'm assuming the first line in your pseudo-code is c_0 = 0 and not Co = 0, otherwise the code doesn't make a lot of sense.
Now here is a C++-ized version of the pseudo-code:
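std::vector<double> d; // delays d_1 ... d_n
double c = 0.0; // departure time of the previous job (c_0 = 0)
double a, s;
// Read in the loop condition: testing eof() before reading would
// process one bogus job after the last successful read.
while (arrivalFile >> a && serviceFile >> s)
{
double delay = 0.0;
if (a < c)
delay = c - a;
d.push_back(delay);
c = a + delay + s;
}
return d;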
If I understand the code right, d_1, d_2, ..., d_n are the delays, and the number of delays equals the number of jobs processed by while (more jobs to process).
Thus if you have 12 jobs you will have 12 delays.
In general, if the arrival time is less than the previous departure time, then the delay is the previous departure time minus the current arrival time:
if (a_i < c_{i-1})
d_i = c_{i-1} - a_i;
The first departure time (c_0) is set to zero.
If something is not clear, let me know.

Party schedule - classical knapsack

The input is an available budget, the number of parties, the ticket price of each party and the amount of fun at each party. The task is to output the maximum possible fun within the available budget, together with the budget used. If you can choose between two selections with the same fun, choose the cheaper one. (It is a SPOJ problem.)
I created two arrays:
m[i][j] is the maximum fun obtainable from parties 1..i with budget j
p[i][j] is the minimum price to pay to get that maximum fun from parties 1..i with budget j
Then, for each i up to #parties and for each j up to budget I calculated the value of m[i][j] and p[i][j] like this:
for(T i = 1; i <= parties; i++) {
for(T j = 0; j <= budget; j++) {
//We get more fun by attending party i
if(price[i] <= j && m[i-1][j-price[i]] + fun[i] > m[i-1][j]) {
m[i][j] = m[i-1][j-price[i]] + fun[i];
p[i][j] = p[i-1][j-price[i]] + price[i];
//We get same fun by attending i, but more cheaply
} else if(price[i] <= j && m[i-1][j-price[i]] + fun[i] == m[i-1][j] && p[i-1][j-price[i]] + price[i] < p[i-1][j]) {
m[i][j] = m[i-1][j-price[i]] + fun[i];
p[i][j] = p[i-1][j-price[i]] + price[i];
//We can't visit the party
} else {
m[i][j] = m[i-1][j];
p[i][j] = p[i-1][j];
}
}
}
For any test case I found (I may share some if needed), this algorithm outputs the same answer as the algorithms approved by the online judge. However, this one is not approved.
What is wrong with the algorithm?
Here is the complete program.
I checked your complete code without going through your logic, but there were some points that must be wrong:
You declared your price array inside the function as price[parties], which only allows price[0..parties - 1], but you used indices up to price[parties]; the same goes for fun[];
The condition for your while is while(scanf("%u %u",&budget,&parties), budget != 0 && parties != 0); however, budget can be 0 even in a valid input, so your program may terminate earlier than expected;
You declared your m[][] and p[][] inside the function but didn't initialize them, so they would be filled with rubbish values;
You print the answer using printf("%u %u"), but the problem requires a new line after each output, so this should be printf("%u %u\n").
After I changed these 4 "bugs" in your program, it got accepted :) So your algorithm logic is approved, but some "irrelevant" things prevent you from getting accepted. Don't look down on these "details", they DO count!
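As an illustration of the array-size and initialization points above (not the asker's actual program, which is not shown here), the same recurrence can be written with `std::vector` tables that are sized `parties + 1` by `budget + 1` and zero-initialized. The names `PartyPlan` and `bestPlan` are hypothetical, and the inputs are ordinary 0-based vectors.

```cpp
#include <cstddef>
#include <vector>

struct PartyPlan {
    unsigned cost;  // budget actually spent
    unsigned fun;   // total fun obtained
};

// price[k] and fun[k] describe party k (0-based input vectors of equal size).
PartyPlan bestPlan(unsigned budget,
                   const std::vector<unsigned>& price,
                   const std::vector<unsigned>& fun) {
    const std::size_t parties = price.size();
    // Row 0 means "no parties considered"; every entry starts at 0.
    std::vector<std::vector<unsigned>> m(parties + 1, std::vector<unsigned>(budget + 1, 0));
    std::vector<std::vector<unsigned>> p(parties + 1, std::vector<unsigned>(budget + 1, 0));
    for (std::size_t i = 1; i <= parties; ++i) {
        const unsigned cost = price[i - 1];
        const unsigned gain = fun[i - 1];
        for (unsigned j = 0; j <= budget; ++j) {
            // Default: skip party i.
            m[i][j] = m[i - 1][j];
            p[i][j] = p[i - 1][j];
            if (cost <= j) {
                const unsigned candFun = m[i - 1][j - cost] + gain;
                const unsigned candCost = p[i - 1][j - cost] + cost;
                // Attend if it gives more fun, or the same fun more cheaply.
                if (candFun > m[i][j] || (candFun == m[i][j] && candCost < p[i][j])) {
                    m[i][j] = candFun;
                    p[i][j] = candCost;
                }
            }
        }
    }
    return PartyPlan{p[parties][budget], m[parties][budget]};
}
```

For a budget of 10 and parties priced 6, 4 and 5 with 5 fun each, any two parties give 10 fun, and the cheapest affordable pair costs 4 + 5 = 9.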

League fixture algorithm explanation

This works for me, but I don't understand how it works at all. Could anyone explain?
for(int round = 0; round < rounds_count; round++)
{
for(int match = 0; match < matches_per_round; match++)
{
int home = (round + match) % (teams_count - 1);
int away = (teams_count - 1 - match + round) % (teams_count - 1);
if(match == 0)
away = teams_count - 1;
matches.push_back(Match(&teams[home], &teams[away], round));
}
}
What's the trick with modulo?
I'm not sure why this would be using teams_count-1 instead of teams_count, but in general, the modulus is making it "wrap around" so that if round+match is greater than the last team number, it will wrap back around to one of the first teams instead of going past the last team.
The way away is handled is a bit special. The % operator doesn't wrap around the way you want when you have negative numbers. For example, -1 % 5 gives you -1 instead of 4. A trick to get around this problem is to add your divisor: (-1+5)%5 gives you 4.
Let's rework the code a little to make it clearer. First I'll use another variable n to represent the number of teams (again I'm not sure why teams_count-1 is used for this in your code):
int n = teams_count-1;
int home = (round + match) % n;
int away = (n - match + round) % n;
Then I'll reorganize the away calculation a little:
int n = teams_count-1;
int home = (round + match) % n;
int away = (round - match + n) % n;
It should now be clearer that the home team is starting with the current round and then adding the match, while the away team is starting with the current round and subtracting the match. The % n makes it wrap around, and the + n for away makes it wrap around properly with negative numbers
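Running the formula on a small league may make the pattern easier to see. The sketch below returns index pairs instead of `Match` objects and assumes an even `teams_count`, `teams_count - 1` rounds and `teams_count / 2` matches per round; those values are not shown in the question, so treat them as assumptions. Under that reading, the last team is paired with a different opponent in match 0 of every round, while the other `teams_count - 1` teams rotate, which would explain the `teams_count - 1` modulus.

```cpp
#include <utility>
#include <vector>

// Each round is a list of (home, away) team indices.
std::vector<std::vector<std::pair<int, int>>> roundRobin(int teams_count) {
    const int n = teams_count - 1;                   // teams that rotate
    const int matches_per_round = teams_count / 2;   // assumption: even team count
    std::vector<std::vector<std::pair<int, int>>> rounds;
    for (int round = 0; round < n; ++round) {
        std::vector<std::pair<int, int>> matches;
        for (int match = 0; match < matches_per_round; ++match) {
            const int home = (round + match) % n;
            int away = (round - match + n) % n;      // + n keeps the operand non-negative
            if (match == 0) {
                away = teams_count - 1;              // the fixed last team
            }
            matches.emplace_back(home, away);
        }
        rounds.push_back(matches);
    }
    return rounds;
}
```

For four teams this yields (0,3) (1,2) in round 0, (1,3) (2,0) in round 1 and (2,3) (0,1) in round 2, so every pair of teams meets exactly once.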