Can you zip all pairs in Chapel? - chapel

Suppose, just for a minute, then get back to your day, that I have two string domains in Chapel and I want to get a string domain of all pairs.
var dd: domain(string), sd: domain(string);
dd += "Monday"; dd+="Tuesday"; dd+="Wednesday";
sd += "Rainy"; sd+= "Sunny";
var crossDomain = // 6 strings like "Monday-Rainy", "Monday-Sunny"
I can do nested for loops, but I'm wondering if there is a more succinct way.

I think nested for loops are a fine bet:
for day in dd do
  for cond in sd do
    crossDomain += (day + "-" + cond);
An alternative would be to write an iterator to factor the loops away from the code:
for (day,cond) in allpairs(dd, sd) do
  crossDomain += (day + "-" + cond);
iter allpairs(d1, d2) {
  for i in d1 do
    for j in d2 do
      yield (i,j);
}
This requires slightly more code, but has the advantage that if you were to do this all-pairs pattern multiple times in your code, you could re-use the iterator multiple times. Note that the iterator could also yield the concatenated strings directly if you preferred.

Related

Elm - Executing multiple lines per if branch

For example, in one branch, I want to see how many times a number is divisible by 1000, then pass the starting number less that amount into the function recursively. This is what I have written:
if num // 1000 > 0 then
repeat (num // 1000) (String.fromChar 'M')
convertToRom (num % 1000)
However, I get the following error in the REPL when testing:
> getRomNums 3500
-- TYPE MISMATCH ----------------------------------------- .\.\RomanNumerals.elm
Function `repeat` is expecting 2 arguments, but was given 4.
34| repeat (num // 1000) (String.fromChar 'M')
35|> convertToRom (num % 1000)
Maybe you forgot some parentheses? Or a comma?
How can I write multiple lines of code for a single if branch?
Unrelated side note: The format system makes the double slash a comment, but in Elm the double slash is integer division. Not sure how to fix that.
In Elm (and other functional languages like Haskell), you don't write code in iterative steps like you do in imperative languages. Every function has to return a value, and every branch of logic has to return a value. There is no single answer around how to "do multiple things" in Elm but with Elm's type system, tuples, and recursion, you'll find that the lack of imperative doesn't really hold you back from anything. It's just a paradigm shift from writing code in an imperative style.
For your purposes of writing a roman numeral conversion function, I think an immediate answer lies in using explicit recursion and string concatenation on the result:
convertToRom : Int -> String
convertToRom num =
    if num // 1000 > 0 then
        String.repeat (num // 1000) (String.fromChar 'M') ++ convertToRom (num % 1000)
    else if ...
    else
        ""
As you grow your functional programming toolset, you'll find yourself explicitly using recursion less and less and relying on higher levels of abstraction like folds and maps.

Huge time difference between STL string compare method and manually written one

The task of the program is to check if a string s2 is a substring of another string (s1 + s1) given s1 and s2 which are equal in length.
For example: [s1, s2] = ["abc", "bca"] should return true while [s1, s2] = ["abc", "bac"] should return false.
The length of both strings is at most 10^5.
Using (s1+s1).find(s2) == string::npos took about 0.1 seconds to finish.
I implemented it with a naive approach of complexity O(n*m) and it took 30 seconds!
s1 = s1 + s1;
bool f = 0;
for(int i = 0;i < s1.size() - s2.size() + 1;i++){
    f = 1;
    for(int j = 0;j < s2.size();j++){
        if(s1[i+j] != s2[j]){
            f = 0;
            break;
        }
    }
    if(f)
        break;
}
//the answer is f
So I thought that C++ used an algorithm like KMP, but then I found out that it is implementation-defined and that GNU GCC uses only the naive approach with some improvements.
But that's not even the biggest problem.
When I replaced the inner loop with s1.compare(i, s2.size(), s2), it took about the same time as the STL find method: 0.1 seconds.
bool f = 0;
for(int i = 0;i < s1.size() - s2.size() + 1;i++){
    if(s1.compare(i, s2.size(), s2) == 0){
        f = 1;
        break;
    }
}
So do C++ compilers implement compare in a different way?
And what are the improvements in the method used by C++ compilers that let it beat the naive approach by a factor of 300 while having the same complexity?
NOTE:
The test I used for the previous codes is
s1 = "ab"*50000 + "ca"
s2 = "ab"*50000 + "ac"
As answered in the comments above.
The program was run in a non-optimized debug build and the time reduced to only 3 seconds after switching to release mode.
The remaining difference might be because the runtime library uses some method like memcmp which is heavily optimized compared to looping and comparing characters one by one.
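To make the memcmp idea concrete, here is a minimal sketch of the rotation check that compares whole blocks with std::memcmp instead of looping over characters. The function name is_rotation is hypothetical, and whether a given standard library really implements compare or find this way is an assumption, not something confirmed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true if s2 is a rotation of s1,
// i.e. s2 occurs in s1 + s1 and both have the same length.
bool is_rotation(const std::string& s1, const std::string& s2) {
    if (s1.size() != s2.size()) return false;
    const std::size_t n = s2.size();
    if (n == 0) return true;  // two empty strings are trivially rotations
    const std::string doubled = s1 + s1;
    assert(doubled.size() == 2 * n);
    // Every rotation of s1 starts at one of the offsets 0..n-1 in doubled.
    for (std::size_t i = 0; i < n; ++i) {
        // Block comparison; still O(n*m) in the worst case.
        if (std::memcmp(doubled.data() + i, s2.data(), n) == 0) return true;
    }
    return false;
}
```

With the examples from the question, is_rotation("abc", "bca") is true and is_rotation("abc", "bac") is false. The complexity is unchanged; only the constant factor of the inner comparison differs.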

For Loops for a Time Class

I am trying to write a for loop for a time class: if the minutes entered are 60 or more, 60 is subtracted from the total minutes and hours is incremented by 1, until the minutes left are less than 60. I was doing if statements like
if (m > 59){
m = m - 60;
h++;
if (m > 59)... etc..
but that doesn't cover every case and I feel like I should know how to do this for loop but I can't figure it out. Any help would be appreciated, thanks
Well if it doesn't have to be implemented using loops, you could do simply
h = m / 60;
m = m % 60;
It is the fastest and cleanest way to do that, I suppose.
Not really sure whether you want to do anything else inside the loops. If so, this won't help you very much.
Edit:
Here is some explanation of how it works.
What m / 60 does is called integer division. For non-negative operands it returns the floor of the quotient. So, for example, if m = 131 then m / 60 = 2.
The second expression uses the modulo operator. Basically, it finds the remainder after division. Back to our example, m % 60 = 11, since m can be written as m = 60 * 2 + 11 = 131. For further information please refer to wiki.
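The division and modulo steps can be wrapped in a small function. This is only a sketch: the Time struct and the name normalize are made up for illustration, and it adds the carried hours to any existing hours rather than overwriting them, which is an assumption about what the asker wants.

```cpp
#include <cassert>

// Hypothetical value type for illustration only.
struct Time {
    int h;
    int m;
};

// Carries whole hours out of the minutes without a loop.
// Assumes minutes is non-negative.
Time normalize(int hours, int minutes) {
    assert(minutes >= 0);
    return Time{hours + minutes / 60,  // integer division: whole hours contained in minutes
                minutes % 60};         // remainder: minutes left over, always in 0..59
}
```

For example, normalize(0, 131) gives 2 hours and 11 minutes, matching the worked example above.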
@Jendas has a good, simple answer to the overall problem, but if you want to keep this format and fix your issue with loops, you could put the whole thing in a while loop instead of individual if statements:
while(m >59)
{
    m = m - 60;
    h++;
    // do anything else you need to take care of
}
// finishing statements
h = 0;
while (m >= 60)
{
    m = m - 60;
    h++;
}
You probably want to use >= 60 instead of 59.
Also, as Jendas rightly suggested you might want to research a little about the modulus operator '%'

Wildcard String Search Algorithm

In my program I need to search in a quite big string (~1 mb) for a relatively small substring (< 1 kb).
The problem is that the string contains simple wildcards in the sense of "a?c", which means I want to search for strings like "abc" or "apc", and so on (I am only interested in the first occurrence).
Until now I use the trivial approach (here in pseudocode)
algorithm "search", input: haystack(string), needle(string)
for(i = 0, i <= length(haystack)-length(needle), ++i)
    if(!CompareMemory(haystack+i,needle,length(needle)))
        return i;
return -1; (Not found)
Where "CompareMemory" returns 0 iff the first and second argument are identical (also concerning wildcards) only regarding the amount of bytes the third argument gives.
My question is now if there is a fast algorithm for this (you don't have to give it, but if you do I would prefer c++, c or pseudocode). I started here
but I think most of the fast algorithms don't allow wildcards (by the way they exploit the nature of strings).
I hope the format of the question is ok because I am new here, thank you in advance!
A fast way, which is much the same thing as using a regexp (which I would recommend anyway), is to find a character in needle that is fixed, such as "a" rather than "?", search for it, and then check whether you have a complete match. Note that a hit at position i means the candidate match starts at i-j.
j = firstNonWildcardPos(needle)
for(i = j, i <= length(haystack)-length(needle)+j, ++i)
    if(haystack[i] == needle[j])
        if(!CompareMemory(haystack+i-j,needle,length(needle)))
            return i-j;
return -1; (Not found)
A regexp would generate code similar to this (I believe).
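As a rough C++ sketch of the same idea: scan for the first non-wildcard character of the needle, then verify the full pattern at each hit. The names find_wild and matches_at are hypothetical, and '?' is assumed to be the only wildcard, matching exactly one character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if needle matches haystack starting at pos; '?' matches any one character.
// The caller guarantees pos + needle.size() <= haystack.size().
bool matches_at(const std::string& haystack, std::size_t pos, const std::string& needle) {
    for (std::size_t k = 0; k < needle.size(); ++k)
        if (needle[k] != '?' && haystack[pos + k] != needle[k]) return false;
    return true;
}

// Returns the index of the first match, or -1 if there is none.
long find_wild(const std::string& haystack, const std::string& needle) {
    if (needle.size() > haystack.size()) return -1;
    const std::size_t j = needle.find_first_not_of('?');
    if (j == std::string::npos) return 0;  // empty or all-wildcard needle matches at 0
    assert(j < needle.size());
    const std::size_t last_start = haystack.size() - needle.size();
    // Jump between occurrences of the fixed character needle[j].
    std::size_t i = haystack.find(needle[j], j);
    while (i != std::string::npos && i - j <= last_start) {
        if (matches_at(haystack, i - j, needle)) return static_cast<long>(i - j);
        i = haystack.find(needle[j], i + 1);
    }
    return -1;
}
```

The gain over the trivial loop depends on how rare the chosen fixed character is in the haystack; picking the rarest non-wildcard character of the needle instead of the first would be a natural refinement.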
Among strings over an alphabet of c characters, let S have length s and let T_1 ... T_k have average length b. S will be searched for each of the k target strings. (The problem statement doesn't mention multiple searches of a given string; I mention it below because in that paradigm my program does well.)
The program uses O(s+c) time and space for setup, and (if S and the T_i are random strings) O(k*u*s/c) + O(k*b + k*b*s/c^u) total time for searching, with u=3 in program as shown. For longer targets, u should be increased, and rare, widely-separated key characters chosen.
In step 1, the program creates an array L of s+TsizMax integers (in program, TsizMax = allowed target length) and uses it for c lists of locations of next occurrences of characters, with list heads in H[] and tails in T[]. This is the O(s+c) time and space step.
In step 2, the program repeatedly reads and processes target strings. Step 2A chooses u = 3 different non-wild key characters (in current target). As shown, the program just uses the first three such characters; with a tiny bit more work, it could instead use the rarest characters in the target, to improve performance. Note, it doesn't cope with targets with fewer than three such characters.
The line "L[T[r]] = L[g+i] = g+i;" within Step 2A sets up a guard cell in L with proper delta offset so that Step 2G will automatically execute at end of search, without needing any extra testing during the search. T[r] indexes the tail cell of the list for character r, so cell L[g+i] becomes a new, self-referencing, end-of-list for character r. (This technique allows the loops to run with a minimum of extraneous condition testing.)
Step 2B sets vars a,b,c to head-of-list locations, and sets deltas dab, dac, and dbc corresponding to distances between the chosen key characters in target.
Step 2C checks if key characters appear in S. This step is necessary because otherwise a while loop in Step 2E will hang. We don't want more checks within those while loops because they are the inner loops of search.
Step 2D does steps 2E to 2I until var c points to after end of S, at which point it is impossible to make any more matches.
Step 2E consists of u = 3 while loops, that "enforce delta distances", that is, crawl indexes a,b,c along over each other as long as they are not pattern-compatible. The while loops are fairly fast, each being in essence (with ++si instrumentation removed) "while (v+d < w) v = L[v]" for various v, d, w. Replicating the three while loops a few times may increase performance a little and will not change net results.
In Step 2G, we know that the u key characters match, so we do a complete compare of target to match point, with wild-character handling. Step 2H reports result of compare. Program as given also reports non-matches in this section; remove that in production.
Step 2I advances all the key-character indexes, because none of the currently-indexed characters can be the key part of another match.
You can run the program to see a few operation-count statistics. For example, the output
Target 5=<de?ga>
012345678901234567890123456789012345678901
abc1efgabc2efgabcde3gabcdefg4bcdefgabc5efg
# 17, de?ga and de3ga match
# 24, de?ga and defg4 differ
# 31, de?ga and defga match
Advances: 'd' 0+3 'e' 3+3 'g' 3+3 = 6+9 = 15
shows that Step 2G was entered 3 times (ie, the key characters matched 3 times); the full compare succeeded twice; step 2E while loops advanced indexes 6 times; step 2I advanced indexes 9 times; there were 15 advances in all, to search the 42-character string for the de?ga target.
/* jiw
$Id: stringsearch.c,v 1.2 2011/08/19 08:53:44 j-waldby Exp j-waldby $
Re: Concept-code for searching a long string for short targets,
where targets may contain wildcard characters.
The user can enter any number of targets as command line parameters.
This code has 2 long strings available for testing; if the first
character of the first parameter is '1' the jay[42] string is used,
else kay[321].
Eg, for tests with *hay = jay use command like
./stringsearch 1e?g a?cd bc?e?g c?efg de?ga ddee? ddee?f
or with *hay = kay,
./stringsearch bc?e? jih? pa?j ?av??j
to exercise program.
Copyright 2011 James Waldby. Offered without warranty
under GPL v3 terms as at http://www.gnu.org/licenses/gpl.html
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
//================================================
int main(int argc, char *argv[]) {
char jay[]="abc1efgabc2efgabcde3gabcdefg4bcdefgabc5efg";
char kay[]="ludehkhtdiokihtmaihitoia1htkjkkchajajavpajkihtijkhijhipaja"
"etpajamhkajajacpajihiatokajavtoia2pkjpajjhiifakacpajjhiatkpajfojii"
"etkajamhpajajakpajihiatoiakavtoia3pakpajjhiifakacpajjhkatvpajfojii"
"ihiifojjjjhijpjkhtfdoiajadijpkoia4jihtfjavpapakjhiifjpajihiifkjach"
"ihikfkjjjjhijpjkhtfdoiajakijptoik4jihtfjakpapajjkiifjpajkhiifajkch";
char *hay = (argc>1 && argv[1][0]=='1')? jay:kay;
enum { chars=1<<CHAR_BIT, TsizMax=40, Lsiz=TsizMax+sizeof kay, L1, L2 };
int L[L2], H[chars], T[chars], g, k, par;
// Step 1. Make arrays L, H, T.
for (k=0; k<chars; ++k) H[k] = T[k] = L1; // Init H and T
for (g=0; hay[g]; ++g) { // Make linked character lists for hay.
k = hay[g]; // In same loop, could count char freqs.
if (T[k]==L1) H[k] = T[k] = g;
T[k] = L[T[k]] = g;
}
// Step 2. Read and process target strings.
for (par=1; par<argc; ++par) {
int alpha[3], at[3], a=g, b=g, c=g, da, dab, dbc, dac, i, j, r;
char * targ = argv[par];
enum { wild = '?' };
int sa=0, sb=0, sc=0, ta=0, tb=0, tc=0;
printf ("Target %d=<%s>\n", par, targ);
// Step 2A. Choose 3 non-wild characters to follow.
// As is, chooses first 3 non-wilds for a,b,c.
// Could instead choose 3 rarest characters.
for (j=0; j<3; ++j) alpha[j] = -j;
for (i=j=0; targ[i] && j<3; ++i)
if (targ[i] != wild) {
r = alpha[j] = targ[i];
if (alpha[0]==alpha[1] || alpha[1]==alpha[2]
|| alpha[0]==alpha[2]) continue;
at[j] = i;
L[T[r]] = L[g+i] = g+i;
++j;
}
if (j != 3) {
printf (" Too few target chars\n");
continue;
}
// Step 2B. Set a,b,c to head-of-list locations, set deltas.
da = at[0];
a = H[alpha[0]]; dab = at[1]-at[0];
b = H[alpha[1]]; dbc = at[2]-at[1];
c = H[alpha[2]]; dac = at[2]-at[0];
// Step 2C. See if key characters appear in haystack
if (a >= g || b >= g || c >= g) {
printf (" No match on some character\n");
continue;
}
for (g=0; hay[g]; ++g) printf ("%d", g%10);
printf ("\n%s\n", hay); // Show haystack, for user aid
// Step 2D. Search for match
while (c < g) {
// Step 2E. Enforce delta distances
while (a+dab < b) {a = L[a]; ++sa; } // Replicate these
while (b+dbc < c) {b = L[b]; ++sb; } // 3 abc lines as many
while (a+dac > c) {c = L[c]; ++sc; } // times as you like.
while (a+dab < b) {a = L[a]; ++sa; } // Replicate these
while (b+dbc < c) {b = L[b]; ++sb; } // 3 abc lines as many
while (a+dac > c) {c = L[c]; ++sc; } // times as you like.
// Step 2F. See if delta distances were met
if (a+dab==b && b+dbc==c && c<g) {
// Step 2G. Yes, so we have 3-letter-match and need to test whole match.
r = a-da;
for (k=0; targ[k]; ++k)
if ((hay[r+k] != targ[k]) && (targ[k] != wild))
break;
printf ("# %3d, %s and ", r, targ);
for (i=0; targ[i]; ++i) putchar(hay[r++]);
// Step 2H. Report match, if found
puts (targ[k]? " differ" : " match");
// Step 2I. Advance all of a,b,c, to go on looking
a = L[a]; ++ta;
b = L[b]; ++tb;
c = L[c]; ++tc;
}
}
printf ("Advances: '%c' %d+%d '%c' %d+%d '%c' %d+%d = %d+%d = %d\n",
alpha[0], sa,ta, alpha[1], sb,tb, alpha[2], sc,tc,
sa+sb+sc, ta+tb+tc, sa+sb+sc+ta+tb+tc);
}
return 0;
}
Note, if you like this answer better than current preferred answer, unmark that one and mark this one. :)
Regular expressions usually use a finite-state-automaton-based search, I think. Try implementing that.

Reversed offset tokenizer

I have a string to tokenize. Its form is HHmmssff where H, m, s, f are digits.
It's supposed to be tokenized into four 2-digit numbers, but I need it to also accept short-hand forms, like sff so it interprets it as 00000sff.
I wanted to use boost::tokenizer's offset_separator but it seems to work only with positive offsets and I'd like to have it work sort of backwards.
Ok, one idea is to pad the string with zeroes from the left, but maybe the community comes up with something uber-smart. ;)
Edit: Additional requirements have just come into play.
The basic need for a smarter solution was to handle all cases, like f, ssff, mssff, etc. but also accept a more complete time notation, like HH:mm:ss:ff with its short-hand forms, e.g. s:ff or even s: (this one's supposed to be interpreted as s:00).
In the case where the string ends with : I can obviously pad it with two zeroes as well, then strip out all separators leaving just the digits and parse the resulting string with spirit.
But it seems like it would be a bit simpler if there was a way to make the offset tokenizer going back from the end of string (offsets -2, -4, -6, -8) and lexically cast the numbers to ints.
I keep preaching BNF notation. If you can write down the grammar that defines your problem, you can easily convert it into a Boost.Spirit parser, which will do it for you.
TimeString := LongNotation | ShortNotation
LongNotation := Hours Minutes Seconds Fraction
Hours := digit digit
Minutes := digit digit
Seconds := digit digit
Fraction := digit digit
ShortNotation := ShortSeconds Fraction
ShortSeconds := digit
Edit: additional constraint
VerboseNotation := [ [ [ Hours ':' ] Minutes ':' ] Seconds ':' ] Fraction
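For illustration, the verbose notation can also be handled with a small hand-written parser. This sketch is not Boost.Spirit; it is plain standard C++, and the name parse_verbose is hypothetical. It assumes fields of at most two digits and treats an empty field, such as the trailing one in "s:", as 0, which is slightly more lenient than the grammar above.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Parses colon-separated fields and right-aligns them into {HH, mm, ss, ff}.
std::array<int, 4> parse_verbose(const std::string& s) {
    std::vector<int> fields;
    int value = 0;
    std::size_t digits = 0;
    for (unsigned char ch : s) {
        if (ch == ':') {
            fields.push_back(value);  // close the current field
            value = 0;
            digits = 0;
        } else if (std::isdigit(ch) && digits < 2) {
            value = value * 10 + (ch - '0');
            ++digits;
        } else {
            throw std::invalid_argument("bad time string");
        }
    }
    fields.push_back(value);  // last field; an empty one counts as 0
    assert(!fields.empty());
    if (fields.size() > 4) throw std::invalid_argument("too many fields");
    std::array<int, 4> out{};  // missing leading fields stay 0
    const std::size_t shift = 4 - fields.size();
    for (std::size_t i = 0; i < fields.size(); ++i) out[shift + i] = fields[i];
    return out;
}
```

Under these assumptions "5:07" yields {0, 0, 5, 7} and "5:" yields {0, 0, 5, 0}.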
In response to the comment "Don't mean to be a performance freak, but this solution involves some string copying (input is a const & std::string)".
If you really care about performance so much that you can't use a big old library like regex, won't risk a BNF parser, don't want to assume that std::string::substr will avoid a copy with allocation (and hence can't use STL string functions), and can't even copy the string chars into a buffer and left-pad with '0' characters:
void parse(const string &s) {
string::const_iterator current = s.begin();
int HH = 0;
int mm = 0;
int ss = 0;
int ff = 0;
switch(s.size()) {
case 8:
HH = (*(current++) - '0') * 10;
case 7:
HH += (*(current++) - '0');
case 6:
mm = (*(current++) - '0') * 10;
// ... you get the idea.
case 1:
ff += (*current - '0');
case 0: break;
default: throw logic_error("invalid date");
// except that this code goes so badly wrong if the input isn't
// valid that there's not much point objecting to the length...
}
}
But fundamentally, just 0-initialising those int variables is almost as much work as copying the string into a char buffer with padding, so I wouldn't expect to see any significant performance difference. I therefore don't actually recommend this solution in real life, just as an exercise in premature optimisation.
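For comparison, the left-padding approach mentioned above might look like the following sketch. The name parse_timecode is hypothetical; it accepts only the colon-free form of up to eight digits and copies the input into a padded string.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Left-pads the digits with '0' to eight characters, then reads {HH, mm, ss, ff}.
std::array<int, 4> parse_timecode(const std::string& s) {
    if (s.size() > 8) throw std::invalid_argument("too many digits");
    for (unsigned char ch : s)
        if (!std::isdigit(ch)) throw std::invalid_argument("non-digit character");
    std::string padded(8 - s.size(), '0');  // the missing leading zeros
    padded += s;
    assert(padded.size() == 8);
    std::array<int, 4> out{};
    for (int i = 0; i < 4; ++i)
        out[i] = (padded[2 * i] - '0') * 10 + (padded[2 * i + 1] - '0');
    return out;
}
```

With this, the short form "123" is read as 00000123, that is {0, 0, 1, 23}.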
Regular expressions come to mind. Something like "^0*?(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)$" with boost::regex. Submatches will provide you with the digit values. It shouldn't be difficult to adapt to your other format with colons between numbers (see sep61.myopenid.com's answer). boost::regex is among the fastest regex parsers out there.