I'm looping through a text file, reading each paragraph into a string. I want to process any paragraphs that contain a year, but if no year is found then I want to continue looping through the file. When a year is found, I want to know the index where that year was found.
I'm trying to avoid any boost or regex code for simplicity. I also assume that the only years of interest will be in the 1900s and 2000s, for simplicity. I tried the following code, but the wildcard characters were not working for some reason. Is is because wildcard characters do not work for numbers?
string sParagraph = "Aramal et al. (2011), Title";
int iIndex;
if (sParagraph.find("19??")!=string::npos)
iIndex = sParagraph.find("19??");
else if (sParagraph.find("20??")!=string::npos)
iIndex = sParagraph.find("20??");
else
continue;
Without using regex or boost code you might be making your code more readable, but it will not be more simple.
A "simple" one-pass pseudo algorithm:
map<int, std::vector<int>> years;
String par = " ... "
//inefficient but didn't want to have to add more complicated code
//in the while loop. Just want to solution to be clear
int par_index = par.find_first_of("19");
if(par_index == string::npos)
par_index = par.find_first_of("20");
if(par_index == string::npos)
//skip //No years in this paragraph
while(par_index < par.size()) {
string year(par, par_index, 4);
int year = atoi(year.c_str()); //or use C++11 stoi
if(2100 < year && year >= 1900)
years.at(year).push_back(par_index);
par_index += 4;
}
This will create a map where the key is the year and the value is a vector of ints that represent the index that the year landed on.
EDIT: I just reread the question and noticed that this answer might be too irrelevant. Sorry if it is.
I was looking for something similar a couple days ago. My approach might be very (very very) inefficient: I looped through the entire string and used 'atoi()' to see whether each group of four characters was a year.
for (int i = 0; i < txt.length() - 3; i++)
{
string t = txt.substr(i, 4); //Take a group of four characters.
int year = atoi((char*)t.c_str());
if (year > 1800 && year < 3000)
{
break;
}
else year = 0;
}
At the end, 'year' is either zero or an actual year.
So of course you can do this. But it will not be simpler, it will be more complex.
Anyway, this is probably your best non-regex solution. It uses a string::iterator not a position:
string sParagraph = "Aramal et al. (2011), Title";
auto iIndex = adjacent_find(sParagraph.begin(), sParagraph.end(), [](char i, char j){return i == '1' && j == '9' || i == '2' && j == '0'; });
const auto end = next(sParagraph.end(), -3);
while (iIndex < end && (isdigit(static_cast<int>(*next(iIndex, 2))) == false || isdigit(static_cast<int>(*next(iIndex, 3))) == false)){
iIndex = adjacent_find(next(iIndex, 4), sParagraph.end(), [](char i, char j){return i == '1' && j == '9' || i == '2' && j == '0'; });
}
To use this you'll need to check if you have iterated to end:
if(iIndex < end){
continue;
}
Just for purpose of comparison, you can use a regex_search to determine if the year exists:
string sParagraph = "Aramal et al. (2011), Title";
smatch iIndex;
if (!regex_search(sParagraph, iIndex, regex("(?:19|20)\\d{2}"))){
continue;
}
An smatch contains far more information and just a position, but if you want the index of the start of the year you can do: iIndex.position()
A common pitfall for people who are not familiar with the features of C++11 is to say: "I don't understand how to use this stuff, it must be more complicated than what I already know." And then go back to what they already know. Don't make that mistake, use the regex_search.
Related
This Leetcode problem is about how to match a pattern string against a text string as efficiently as possible. The pattern string can consists of letters, dots, and stars, where a letter only matches itself, a dot matches any individual character, and a star matches any number of copies of the preceding character. For example, the pattern
ab*c.
would match ace and abbbbcc. I know that it's possible to solve this original problem using dynamic programming.
My question is whether it's possible to see whether two patterns match one another. For example, the pattern
bdbaa.*
can match
bdb.*daa
Is there a nice algorithm for solving this pattern-on-pattern matching problem?
Here's one approach that works in polynomial time. It's slightly heavyweight and there may be a more efficient solution, though.
The first observation that I think helps here is to reframe the problem. Rather than asking whether these patterns match each other, let's ask this equivalent question:
Given patterns P1 and P2, is there a string w where P1 and P2 each match w?
In other words, rather than trying to get the two patterns to match one another, we'll search for a string that each pattern matches.
You may have noticed that the sorts of patterns you're allowed to work with are a subset of the regular expressions. This is helpful, since there's a pretty elaborate theory of what you can do with regular expressions and their properties. So rather than taking aim at your original problem, let's solve this even more general one:
Given two regular expressions R1 and R2, is there a string w that both R1 and R2 match?
The reason for solving this more general problem is that it enables us to use the theory that's been developed around regular expressions. For example, in formal language theory we can talk about the language of a regular expression, which is the set of all strings that the regex matches. We can denote this L(R). If there's a string that's matched by two regexes R1 and R2, then that string belongs to both L(R1) and L(R2), so our question is equivalent to
Given two regexes R1 and R2, is there a string w in L(R1) ∩ L(R2)?
So far all we've done is reframe the problem we want to solve. Now let's go solve it.
The key step here is that it's possible to convert any regular expression into an NFA (a nondeterministic finite automaton) so that every string matched by the regex is accepted by the NFA and vice-versa. Even better, the resulting NFA can be constructed in polynomial time. So let's begin by constructing NFAs for each input regex.
Now that we have those NFAs, we want to answer this question: is there a string that both NFAs accept? And fortunately, there's a quick way to answer this. There's a common construction on NFAs called the product construction that, given two NFAs N1 and N2, constructs a new NFA N' that accepts all the strings accepted by both N1 and N2 and no other strings. Again, this construction runs in polynomial time.
Once we have N', we're basically done! All we have to do is run a breadth-first or depth-first search through the states of N' to see if we find an accepting state. If so, great! That means there's a string accepted by N', which means that there's a string accepted by both N1 and N2, which means that there's a string matched by both R1 and R2, so the answer to the original question is "yes!" And conversely, if we can't reach an accepting state, then the answer is "no, it's not possible."
I'm certain that there's a way to do all of this implicitly by doing some sort of implicit BFS over the automaton N' without actually constructing it, and it should be possible to do this in something like time O(n2). If I have some more time, I'll revisit this answer and expand on how to do that.
I have worked on my idea of DP and came out with the below implementation of the above problem. Please feel free to edit the code in case someone finds any test cases failed. From my side, I tried few test cases and passed all of them, which I will be mentioning below as well.
Please note that I have extended the idea which is used to solve the regex pattern matching with a string using DP. To refer to that idea, please refer to the LeetCode link provided in the OP and look out for discussion part. They have given the explanation for regex matching and the string.
The idea is to create a dynamic memoization table, entries of which will follow the below rules:
If pattern1[i] == pattern2[j], dp[i][j] = dp[i-1][j-1]
If pattern1[i] == '.' or pattern2[j] == '.', then dp[i][j] = dp[i-1][j-1]
The trick lies here: If pattern1[i] = '*', then if dp[i-2][j] exists, then
dp[i][j] = dp[i-2][j] || dp[i][j-1] else dp[i][j] = dp[i][j-1].
If pattern2[j] == '*', then if pattern1[i] == pattern2[j-1], then
dp[i][j] = dp[i][j-2] || dp[i-1][j]
else dp[i][j] = dp[i][j-2]
pattern1 goes row-wise and pattern2 goes column-wise. Also, please note that this code should also work for normal regex pattern matching with any given string. I have verified it by running it on LeetCode and it passed all the available test cases there!
Below is the complete working implementation of the above logic:
boolean matchRegex(String pattern1, String pattern2){
boolean dp[][] = new boolean[pattern1.length()+1][pattern2.length()+1];
dp[0][0] = true;
//fill up for the starting row
for(int j=1;j<=pattern2.length();j++){
if(pattern2.charAt(j-1) == '*')
dp[0][j] = dp[0][j-2];
}
//fill up for the starting column
for(int j=1;j<=pattern1.length();j++){
if(pattern1.charAt(j-1) == '*')
dp[j][0] = dp[j-2][0];
}
//fill for rest table
for(int i=1;i<=pattern1.length();i++){
for(int j=1;j<=pattern2.length();j++){
//if second character of pattern1 is *, it will be equal to
//value in top row of current cell
if(pattern1.charAt(i-1) == '*'){
dp[i][j] = dp[i-2][j] || dp[i][j-1];
}
else if(pattern1.charAt(i-1)!='*' && pattern2.charAt(j-1)!='*'
&& (pattern1.charAt(i-1) == pattern2.charAt(j-1)
|| pattern1.charAt(i-1)=='.' || pattern2.charAt(j-1)=='.'))
dp[i][j] = dp[i-1][j-1];
else if(pattern2.charAt(j-1) == '*'){
boolean temp = false;
if(pattern2.charAt(j-2) == pattern1.charAt(i-1)
|| pattern1.charAt(i-1)=='.'
|| pattern1.charAt(i-1)=='*'
|| pattern2.charAt(j-2)=='.')
temp = dp[i-1][j];
dp[i][j] = dp[i][j-2] || temp;
}
}
}
//comment this portion if you don't want to see entire dp table
for(int i=0;i<=pattern1.length();i++){
for(int j=0;j<=pattern2.length();j++)
System.out.print(dp[i][j]+" ");
System.out.println("");
}
return dp[pattern1.length()][pattern2.length()];
}
Driver method:
System.out.println(e.matchRegex("bdbaa.*", "bdb.*daa"));
Input1: bdbaa.* and bdb.*daa
Output1: true
Input2: .*acd and .*bce
Output2: false
Input3: acd.* and .*bce
Output3: true
Time complexity: O(mn) where m and n are lengths of two regex patterns given. Same will be the space complexity.
You can use a dynamic approach tailored to this subset of a Thompson NFA style regex implementing only . and *:
You can do that either with dynamic programming (here in Ruby):
def is_match(s, p)
return true if s==p
len_s, len_p=s.length, p.length
dp=Array.new(len_s+1) { |row| [false] * (len_p+1) }
dp[0][0]=true
(2..len_p).each { |j| dp[0][j]=dp[0][j-2] && p[j-1]=='*' }
(1..len_s).each do |i|
(1..len_p).each do |j|
if p[j-1]=='*'
a=dp[i][j - 2]
b=[s[i - 1], '.'].include?(p[j-2])
c=dp[i - 1][j]
dp[i][j]= a || (b && c)
else
a=dp[i - 1][j - 1]
b=['.', s[i - 1]].include?(p[j - 1])
dp[i][j]=a && b
end
end
end
dp[len_s][len_p]
end
# 139 ms on Leetcode
Or recursively:
def is_match(s,p,memo={["",""]=>true})
if p=="" && s!="" then return false end
if s=="" && p!="" then return p.scan(/.(.)/).uniq==[['*']] && p.length.even? end
if memo[[s,p]]!=nil then return memo[[s,p]] end
ch, exp, prev=s[-1],p[-1], p.length<2 ? 0 : p[-2]
a=(exp=='*' && (
([ch,'.'].include?(prev) && is_match(s[0...-1], p, memo) ||
is_match(s, p[0...-2], memo))))
b=([ch,'.'].include?(exp) && is_match(s[0...-1], p[0...-1], memo))
memo[[s,p]]=(a || b)
end
# 92 ms on Leetcode
In each case:
The operative starting point in the string and pattern is at the second character looking for * and matches one character back for as long as s matches the character in p prior to the *
The meta character . is being used as a fill in for the actual character. This allows any character in s to match . in p
You can solve this with backtracking too, not very efficiently (because the match of the same substrings may be recalculated many times, which could be improved by introducing a lookup table where all non-matching pairs of strings are saved and the calculation only happens when they cannot be found in the lookup table), but seems to work (js, the algorithm assumes the simple regex are valid, which means not beginning with * and no two adjacent * [try it yourself]):
function canBeEmpty(s) {
if (s.length % 2 == 1)
return false;
for (let i = 1; i < s.length; i += 2)
if (s[i] != "*")
return false;
return true;
}
function match(a, b) {
if (a.length == 0 || b.length == 0)
return canBeEmpty(a) && canBeEmpty(b);
let x = 0, y = 0;
// process characters up to the next star
while ((x + 1 == a.length || a[x + 1] != "*") &&
(y + 1 == b.length || b[y + 1] != "*")) {
if (a[x] != b[y] && a[x] != "." && b[y] != ".")
return false;
x++; y++;
if (x == a.length || y == b.length)
return canBeEmpty(a.substr(x)) && canBeEmpty(b.substr(y));
}
if (x + 1 < a.length && y + 1 < b.length && a[x + 1] == "*" && b[y + 1] == "*")
// star coming in both strings
return match(a.substr(x + 2), b.substr(y)) || // try skip in a
match(a.substr(x), b.substr(y + 2)); // try skip in b
else if (x + 1 < a.length && a[x + 1] == "*") // star coming in a, but not in b
return match(a.substr(x + 2), b.substr(y)) || // try skip * in a
((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching
match(a.substr(x), b.substr(y + 1))); // try skip char in b
else // star coming in b, but not in a
return match(a.substr(x), b.substr(y + 2)) || // try skip * in b
((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching
match(a.substr(x + 1), b.substr(y))); // try skip char in a
}
For a little optimization you could normalize the strings first:
function normalize(s) {
while (/([^*])\*\1([^*]|$)/.test(s) || /([^*])\*\1\*/.test(s)) {
s = s.replace(/([^*])\*\1([^*]|$)/, "$1$1*$2"); // move stars right
s = s.replace(/([^*])\*\1\*/, "$1*"); // reduce
}
return s;
}
// example: normalize("aa*aa*aa*bb*b*cc*cd*dd") => "aaaa*bb*ccc*ddd*"
There is a further reduction of the input possible: x*.* and .*x* can both be replaced by .*, so to get the maximal reduction you would have to try to move as many stars as possible next to .* (so moving some stars to the left can be better than moving all to the right).
IIUC, you are asking: "Can a regex pattern match another regex pattern?"
Yes, it can. Specifically, . matches "any character" which of course includes . and *. So if you have a string like this:
bdbaa.*
How could you match it? Well, you could match it like this:
bdbaa..
Or like this:
b.*
Or like:
.*ba*.*
I am currently finishing up an assignment I have to complete for my OOP class and I am struggling with 1 part in particular. Keep in mind I am still a beginner. The question is as followed:
If the string contains 13 characters, all of characters are digits and the check digit is modulo 10, this function returns true; false otherwise.
This is in regards to a EAN. I basically have to separate every second digit from the rest digits. for example 9780003194876 I need to do calculations with 7,0,0,1,4,7. I have no clue about doing this.
Any help would be greatly appreciated!
bool isValid(const char* str){
if (atoi(str) == 13){
}
return false;
}
You can start with a for loop which increments itself by 2 for each execution:
for (int i = 1, len = strlen(str); i < len; i += 2)
{
int digit = str[i] - '0';
// do something with digit
}
The above is just an example though...
Since the question was tagged as C++ (Not C, so I suggest other answerers to not solve this using C libraries, please. Let us getting OP's C++ knoweledge in the right way since the beggining), and is an OOP class I'm going to solve this with the C++ way: Use the std::string class:
bool is_valid( const std::string& str )
{
if( str.size() == 13 )
{
for( std::size_t i = 0 ; i < 13 ; i += 2 )
{
int digit = str[i] - '0';
//Do what you wan't with the digit
}
}
else
return false;
}
First, if it's EAN, you have to process every digit, not just
every other one. In fact, all you need to do is a weighted sum
of the digits; for EAN-13, the weigths alternate between 1 and
3, starting with three. The simplest solution is probably to
put them in a table (i.e. int weigt[] = { 1, 3, 1, 3... };,
and iterate over the string (in this case, using an index rather
than iterators, since you want to be able to index into
weight as well), converting each digit into a numerical value
(str[i] - '0', if isdigit(static_cast<unsigned char>(str[i])
is true; if it's false, you haven't got a digi.), then
multiplying it by the running total. When you're finished, if
the total, modulo 10, is 0, it's correct. Otherwise, it isn't.
You certainly don't want to use atoi, since you don't want the
numerical value of the string; you want to treat each digit
separately.
Just for the record, professionally, I'd write something like:
bool
isValidEAN13( std::string const& value )
{
return value.size() == 13
&& std::find_if(
value.begin(),
value.end(),
[]( unsigned char ch ){ return !isdigit( ch ); } )
== value.end()
&& calculateEAN13( value ) == value.back() - '0';
}
where calculateEAN13 does the actual calculations (and can be
used for both generation and checking). I suspect that this
goes beyond the goal of the assignment, however, and that all
your teacher is looking for is the calculateEAN13 function,
with the last check (which is why I'm not giving it in full).
This question already has answers here:
Given a word and a text, we need to return the occurrences of anagrams
(6 answers)
Closed 9 years ago.
For eg. word is for and the text is forxxorfxdofr, anagrams of for will be ofr, orf, fro, etc. So the answer would be 3 for this particular example.
Here is what I came up with.
#include<iostream>
#include<cstring>
using namespace std;
int countAnagram (char *pattern, char *text)
{
int patternLength = strlen(pattern);
int textLength = strlen(text);
int dp1[256] = {0}, dp2[256] = {0}, i, j;
for (i = 0; i < patternLength; i++)
{
dp1[pattern[i]]++;
dp2[text[i]]++;
}
int found = 0, temp = 0;
for (i = 0; i < 256; i++)
{
if (dp1[i]!=dp2[i])
{
temp = 1;
break;
}
}
if (temp == 0)
found++;
for (i = 0; i < textLength - patternLength; i++)
{
temp = 0;
dp2[text[i]]--;
dp2[text[i+patternLength]]++;
for (j = 0; j < 256; j++)
{
if (dp1[j]!=dp2[j])
{
temp = 1;
break;
}
}
if (temp == 0)
found++;
}
return found;
}
int main()
{
char pattern[] = "for";
char text[] = "ofrghofrof";
cout << countAnagram(pattern, text);
}
Does there exist a faster algorithm for the said problem?
Most of the time will be spent searching, so to make the algorithm more time efficient, the objective is to reduce the quantities of searches or optimize the search.
Method 1: A table of search starting positions.
Create a vector of lists, one vector slot for each letter of the alphabet. This can be space-optimized later.
Each slot will contain a list of indices into the text.
Example text: forxxorfxdofr
Slot List
'f' 0 --> 7 --> 11
'o' 1 --> 5 --> 10
'r' 2 --> 6 --> 12
For each word, look up the letter in the vector to get a list of indexes into the text. For each index in the list, compare the text string position from the list item to the word.
So with the above table and the word "ofr", the first compare occurs at index 1, second compare at index 5 and last compare at index 10.
You could eliminate near-end of text indices where (index + word length > text length).
You can use the commutativity of multiplication, along with uniqueness of primal decomposition. This relies on my previous answer here
Create a mapping from each character into a list of prime numbers (as small as possible). For e.g. a-->2, b-->3, c-->5, etc.. This can be kept in a simple array.
Now, convert the given word into the multiplication of the primes matching each of its characters. This results will be equal to a similar multiplication of any anagram of that word.
Now sweep over the array, and at any given step, maintain the multiplication of the primes matching the last L characters (where L is the length of your word). So every time you advance you do
mul = mul * char2prime(text[i]) / char2prime(text[i-L])
Whenever this multiplication equals that of your word - increment the overall counter, and you're done
Note that this method would work well on short words, but the primes multiplication can overflow a 64b var pretty fast (by ~9-10 letters), so you'll have to use a large number math library to support longer words.
This algorithm is reasonably efficient if the pattern to be anagrammed is so short that the best way to search it is to simply scan it. To allow longer patterns, the scans represented here by the 'for jj' and 'for mm' loops could be replaced by more sophisticated search techniques.
// sLine -- string to be searched
// sWord -- pattern to be anagrammed
// (in this pseudo-language, the index of the first character in a string is 0)
// iAnagrams -- count of anagrams found
iLineLim = length(sLine)-1
iWordLim = length(sWord)-1
// we need a 'deleted' marker char that will never appear in the input strings
chNil = chr(0)
iAnagrams = 0 // well we haven't found any yet have we
// examine every posn in sLine where an anagram could possibly start
for ii from 0 to iLineLim-iWordLim do {
chK = sLine[ii]
// does the char at this position in sLine also appear in sWord
for jj from 0 to iWordLim do {
if sWord[jj]=chK then {
// yes -- we have a candidate starting posn in sLine
// is there an anagram of sWord at this position in sLine
sCopy = sWord // make a temp copy that we will delete one char at a time
sCopy[jj] = chNil // delete the char we already found in sLine
// the rest of the anagram would have to be in the next iWordLim positions
for kk from ii+1 to ii+iWordLim do {
chK = sLine[kk]
cc = false
for mm from 0 to iWordLim do { // look for anagram char
if sCopy[mm]=chK then { // found one
cc = true
sCopy[mm] = chNil // delete it from copy
break // out of 'for mm'
}
}
if not cc then break // out of 'for kk' -- no anagram char here
}
if cc then { iAnagrams = iAnagrams+1 }
break // out of 'for jj'
}
}
}
-Al.
I have to write 2 functions. One that takes in a date as a string and checks if its in mm/dd/yy format; if its not in the correct format, it should be edited to make it so. The other function should convert the validated date to the format "Month dd, 20yy".
I'm pretty sure I can take care of the second function, but I am having trouble with the first one. I just have no idea how to check if its in that format... any ideas?
I thought that this would work, but it doesn't seem to...
Updated code:
bool dateValidation(string shipDate)
{
string temp;
if(shipDate.length() == 8 )
{
if(shipDate[2] == '/' && shipDate[5] =='/')
{
int tempDay, tempMonth, tempYear;
//Gather month
temp = shipDate[0];
temp += shipDate[1];
//convert string to int
tempMonth = temp.atoi;
temp = "";
//Gather day
temp = shipDate[3];
temp += shipDate[4];
//convert string to int
tempDay = temp.atoi;
temp = "";
//Gather year
temp = shipDate[6];
temp += shipDate[7];
//convert string to int
tempYear = temp.atoi;
temp = "";
if(tempMonth > 0 && tempMonth <= 12)
{
if(tempMonth == 9 ||
tempMonth == 4 ||
tempMonth == 6 ||
tempMonth == 11 ||)
{
if(tempDay > 0 && tempDay <= 30)
{
if 30 days
}
}
else if(tempMonth == 2)
{
if(tempDay > 0 && tempDay <= 28)
{
if 28 days
}
}
else
{
if(tempDay > 0 && tempDay <= 31)
{
if 31 days
}
}
}
}
}
}
There are 4 things you want to check:
Is there 8 characters ? If not, then don't even bother checking anything else. It's not in the proper format.
Are the third and fifth characters '/'. If not, then you still don't have the proper format.
Check each pair for its valid values. A month has days between 1 and
31 at most, there are no more than 12 months and months range from 01
to 12. A year can be any combination of any 2 digits.
This should take care of the format, but if you want to make sure that the date is valid:
Check for valid number of days in each month (january 31, february
28-29...) and indeed check for those leap years.
This looks a lot like a project I am about to grade.... You should verify that it is Gregorian Calendar compliant if it is the project I am about to grade. 1/1/2012 is definitely valid though so what you may want to do and what I would hope you consider is creating a switch statement that examines for formats like 1/12/2012 and 10/2/2012 because these are valid. Then parse out the month day and year from these. Then verify that they are within the limit of the Gregorian calendar. If it is for a class which I would guess that it is, you should consider writing the verification as a separate function from the parsing function.
So first ask whether the date is too long if not, is it too short, if not which version is it, then pass the d m y to the verification function. This kind of modularity will simplify your code and reduce instructions.
something like
bool dateValidation(string shipDate)
{
string temp;
switch(shipDate.length())
{
case(10):
// do what your doing
verify(m,d,y);
break;
case(8):
//dealing with single digits
// verify 1 and 3 are '/' and the rest are numbers
verifiy(m,d,y);
break;
case(9):
//a little more heavy lifting here
// but its good thinking for a new programmer
verifiy(m,d,y);
break;
default:
//fail message
break;
}
I am trying to solve the following code jam question,ive made some progress but for few cases my code give wrong outputs..
Welcome to Code jam
So i stumbled on a solution by dev "rem" from russia.
I've no idea how his/her solution is working correctly.. the code...
const string target = "welcome to code jam";
char buf[1<<20];
int main() {
freopen("input.txt", "rt", stdin);
freopen("output.txt", "wt", stdout);
gets(buf);
FOR(test, 1, atoi(buf)) {
gets(buf);
string s(buf);
int n = size(s);
int k = size(target);
vector<vector<int> > dp(n+1, vector<int>(k+1));
dp[0][0] = 1;
const int mod = 10000;
assert(k == 19);
REP(i, n) REP(j, k+1) {// Whats happening here
dp[i+1][j] = (dp[i+1][j]+dp[i][j])%mod;
if (j < k && s[i] == target[j])
dp[i+1][j+1] = (dp[i+1][j+1]+dp[i][j])%mod;
}
printf("Case #%d: %04d\n", test, dp[n][k]);
}
exit(0);
}//credit rem
Can somebody explain whats happening in the two loops?
Thanks.
What he is doing: dynamic programming, this far you can see too.
He has 2D array and you need to understand what is its semantics.
The fact is that dp[i][j] counts the number of ways he can get a subsequence of the first j letters of welcome to code jam using all the letters in the input string upto the ith index. Both indexes are 1 -based to allow for the case of not taking any letters from the strings.
For example if the input is:
welcome to code jjam
The values of dp in different situations are going to be:
dp[1][1] = 1; // first letter is w. perfect just the goal
dp[1][2] = 0; // no way to have two letters in just one-letter string
dp[2][2] = 1; // again: perfect
dp[1][2] = 1; // here we ignore the e. We just need the w.
dp[7][2] = 2; // two ways to construct we: [we]lcome and [w]elcom[e].
The loop you are specifically asking about calculates new dynamic values based on the already calculated ones.
Whoa, I was practicing this problem few days ago and and stumbled across this question.
I suspect that saying "he's doing dynamic programming" won't not explain too much if you did not study DP.
I can give clearer implementation and easier explanation:
string phrase = "welcome to code jam"; // S
string text; getline(cin, text); // T
vector<int> ob(text.size(), 1);
int ans = 0;
for (int p = 0; p < phrase.size(); ++p) {
ans = 0;
for (int i = 0; i < text.size(); ++i) {
if (text[i] == phrase[p]) ans = (ans + ob[i]) % 10000;
ob[i] = ans;
}
}
cout << setfill('0') << setw(4) << ans << endl;
To solve the problem if S had only one character S[0] we could just count number of its occurrences.
If it had only two characters S[0..1] we see that each occurrence T[i]==S[1] increases answer by the number of occurrences of S[0] before index i.
For three characters S[0..2] each occurrence T[i]==S[2] similarly increases answer by number of occurrences of S[0..1] before index i. This number is the same as the answer value at the moment the previous paragraph had processed T[i].
If there were four characters, the answer would be increasing by number of occurrences of the previous three before each index at which fourth character is found, and so on.
As every other step uses values from the previous ones, this can be solved incrementally. On each step p we need to know number of occurrences of previous substring S[0..p-1] before any index i, which can be kept in array of integers ob of the same length as T. Then the answer goes up by ob[i] whenever we encounter S[p] at i. And to prepare ob for the next step, we also update each ob[i] to be the number of occurrences of S[0..p] instead — i.e. to the current answer value.
By the end the latest answer value (and the last element of ob) contain the number of occurrences of whole S in whole T, and that is the final answer.
Notice that it starts with ob filled with ones. The first step is different from the rest; but counting number of occurrences of S[0] means increasing answer by 1 on each occurrence, which is what all other steps do, except that they increase by ob[i]. So when every ob[i] is initially 1, the first step will run just like all others, using the same code.