I have
([[AA ww me bl qw 100] [AA ee rr aa aa 100] [AA qq rr aa aa 90]] [[CC ww me bl qw 100] [CC ee rr aa aa 67]])
and I need to remove the top nest so it becomes:
([AA ww me bl qw 100] [AA ee rr aa aa 100] [AA qq rr aa aa 90] [CC ww me bl qw 100] [CC ee rr aa aa 67])
Using flatten returns a list with just the inner elements which is not what I need. Thanks for helping..
You can use concat to remove a single level of nesting. Since you have a sequence of sequences you need to use apply:
(apply concat
([[AA ww me bl qw 100] [AA ee rr aa aa 100] [AA qq rr aa aa 90]] [[CC ww me bl qw 100] [CC ee rr aa aa 67]]))
Related
Suppose you have the following multi-line string:
C1 10
C2 20
C3 30
C2 40
C4 50
C3 60
And you want to match only those lines which have the same leading word, so as to build the following result:
C1 10
C2 20 40
C3 30 60
C4 50
I am trying to figure out a solution with pure Regex, but I am stuck. Any help?
I did try what the regex that follows, but it didn't work...
Regex: /(^\w+\b)(.*$)([\s\S]*?\n)(\1)(.*$)/gm
Substitution:$1$2$5$3
Result:
C1 10
C2 20 40
C3 30
C4 50
C3 60
As you can see, it only works with the first occurrence, despite the fact that I have used a lazy quantifier in the third capturing group.
Any help?
You could also accomplish this using reduce()
const data = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;
const result = data.split("\n").reduce((acc, val) => {
const vals = val.split(" ");
if (!acc[vals[0]]) acc[vals[0]] = vals[1];
else acc[vals[0]] += ` ${vals[1]}`;
return acc;
}, {});
console.log(result);
I have two group of arrays
a1 a2 a3 a4 a5 a6 a7 a8 <= name it as key1
b1 b2 b3 b4 b5 b6 b7 b8 <= val1
c1 c2 c3 c4 c5 c6 c7 c8
and
d1 d2 d3 d4 d5 d6 d7 d8 <= key2
e1 e2 e3 e4 e5 e6 e7 e8 <= val2
f1 f2 f3 f4 f5 f6 f7 f8
The arrays a1,...,an and d1,...,dn are sorted and might be repeated. i.e. their values might be something like 1 1 2 3 4 6 7 7 7 ... I want to check if for each Tuple di,ei check if it is equal to any of ai,bi. If it is (di==ai,bi==ei) then I have to combine fi and ci using some function e.g. add and store in fi.
Firstly, is it possible to do this using zip iterators and transformation in thurst library to solve this efficiently?
Secondly, the simplest method that I can imagine is to count occurance of number of each keys (ai) do prefix sum and use both to get start and end index of each keys and then for each di use above counting to iterate through those indices and check if ei==di. and perform the transformation.
i.e. If I have
1 1 2 3 5 6 7
2 3 4 5 2 4 6
2 4 5 6 7 8 5
as first array, I count the occurance of 1,2,3,4,5,6,7,...:
2 1 1 0 1 1 1 <=name it as count
and then do prefix sum to get:
2 3 4 4 5 6 7 <= name it as cumsum
and use this to do:
for each element di,
for i in (cumsum[di] -count[di]) to cumsum[di]:
if ei==val1[i] then performAddition;
What I fear is that since not all threads are equal, this will lead to warp divergence, and I may not have efficient performance.
You could treat your data as two key-value tables.Table1: (a,b) -> c and Table2: (d,e)->f, where pair (a,b) and (d,e) are keys, and c, f are values.
Then your problem simplifies to
foreach key in Table2
if key in Table1
Table2[key] += Table1[key]
Suppose a and b have limited ranges and are positive, such as unsigned char, a simple way to combine a and b into one key is
unsigned short key = (unsigned short)(a) * 256 + b;
If the range of key is still not too large as in the above example, you could create your Table1 as
int Table1[65536];
Checking if key in Table1 becomes
if (Table1[key] != INVALID_VALUE)
....
With all these restrictions, implementation with thrust should be very simple.
Similar combining method could still be used if a and b have larger range like int.
But if the range of key is too large, you have to go to the method suggested by Robert Crovella.
When I run the following code, I get the ensuing error:
(run 3 [q]
(fresh [a0 a1 a2
b0 b1 b2
c0 c1 c2]
(== q [[a0 a1 a2] [b0 b1 b2] [c0 c1 c2]])
(fd/in a0 a1 a2 b0 b1 b2 c0 c1 c2 (fd/interval 1 9))
(fd/distinct [a0 a1 a2 b0 b1 b2 c0 c1 c2])
(fd/eq
(= a0 4)
(= 22 (- (* a0 a1) a2))
(= -1 (-> b0 (* b1) (- b2)))
)))
error:
2. Unhandled clojure.lang.Compiler$CompilerException
Error compiling src/euler/core.clj at (1392:5)
1. Caused by java.lang.IllegalArgumentException
Can't call nil
core.clj: 3081 clojure.core/eval
main.clj: 240 clojure.main/repl/read-eval-print/fn
main.clj: 240 clojure.main/repl/read-eval-print
main.clj: 258 clojure.main/repl/fn
main.clj: 258 clojure.main/repl
RestFn.java: 1523 clojure.lang.RestFn/invoke
interruptible_eval.clj: 87 clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
AFn.java: 152 clojure.lang.AFn/applyToHelper
AFn.java: 144 clojure.lang.AFn/applyTo
core.clj: 630 clojure.core/apply
core.clj: 1868 clojure.core/with-bindings*
RestFn.java: 425 clojure.lang.RestFn/invoke
...
Notice the line with the threading macro ->, in CIDER I'll macro expand it and everything looks fine, but in the end the code crashes. I'm assuming this is the fault of macros, but I'm not sure why. Any ideas?
You should look at the source of clojure.core.logic.fd. The error occurs in the macro eq, which process all of its forms before macroexpansion can occur.
For a quick solution, I've created a version of eq which calls macroexpand-all on all of its forms before doing anything else. It seems to work for your code, although I have not tested it in other contexts:
(defmacro mac-eq [& forms]
(let [exp-forms (map clojure.walk/macroexpand-all forms)]
`(fd/eq ~#exp-forms)))
Let's try it!
stack-prj.logicThreadMacro> (run 3 [q]
(fresh [a0 a1 a2
b0 b1 b2
c0 c1 c2]
(== q [[a0 a1 a2] [b0 b1 b2] [c0 c1 c2]])
(fd/in a0 a1 a2 b0 b1 b2 c0 c1 c2 (fd/interval 1 9))
(fd/distinct [a0 a1 a2 b0 b1 b2 c0 c1 c2])
(mac-eq
(= a0 4)
(= 22 (- (* a0 a1) a2))
(= -1 (-> b0 (* b1) (- b2))))))
()
I have two data frames:
df1 =
Id ColA ColB ColC
1 aa bb cc
3 11 ww 55
5 11 bb cc
df2 =
Id ColD ColE ColF
1 ff ee rr
2 ww rr 55
3 hh 11 22
4 11 11 cc
5 cc bb aa
I need to merge these two data frames to get the following result:
result =
Id ColA ColB ColC ColD ColE ColF
1 aa bb cc ff ee rr
2 NaN NaN NaN ww rr 55
3 11 ww 55 hh 11 22
4 NaN NaN NaN 11 11 cc
5 11 bb cc cc bb aa
I do the merging this way:
import pandas as pd
result = pd.merge(df1,df2,on='Id')
However my result looks as follows instead of the expected above-shown result:
result =
Id ColA ColB ColC ColD ColE ColF
1 aa bb cc ff ee rr
3 11 ww 55 hh 11 22
5 11 bb cc cc bb aa
According to the documentation of merge, you need to specify the 'how' parameter as outer (the default is inner, which is consistent with what you're getting):
outer: use union of keys from both frames (SQL: full outer join)
inner: use intersection of keys from both frames (SQL: inner join)
I have a SAS data set, which I have sorted according to my needs. I want to split it into BY groups and, for each group, output each observation until the first occurrence of a particular value in a particular column.
ID No C1 Year2 C3 Date (DD/MM/YYYY)
---------------------------------------------------------
AB123 4 B4 2008E OC 09/04/2008
AB123 3 B4 2008E EL 09/04/2008
AB123 2 B4 2008E ZZ 09/04/2008
AB123 1 B4 2008E OC 09/04/2008
AB123 0 B4 2008E ZZ 09/04/2008
AB123 1 B4 2008E OC 06/02/2008
AB123 0 B4 2008E ZZ 06/02/2008
This is one BY group: the data set is grouped by ID, C1, Year2 and sorted by ID, C1, Year2, Date(desc), No(desc). Further instances of each of ID, C1 and Year2 could occur anywhere in the data set, but the 3 variables define each BY group.
I want to output all observations per BY group up to and including the first occurrence of ZZ in C3. So above I would want the first 3 observations output (or flagged) and then move on to the next BY group.
Any help would be greatly appreciated. Please let me know if you need any more details of the problem. Thanks.
Here's one way that should work.
data have;
input ID $ No C1 $ Year2 $ C3 $ Date :DDMMYY10.;
format date DDMMYY10.;
cards;
AB123 4 B4 2008E OC 09/04/2008
AB123 3 B4 2008E EL 09/04/2008
AB123 2 B4 2008E ZZ 09/04/2008
AB123 1 B4 2008E OC 09/04/2008
AB123 0 B4 2008E ZZ 09/04/2008
AB123 1 B4 2008E OC 06/02/2008
AB123 0 B4 2008E ZZ 06/02/2008
;
run;
data want (drop=stopflag);
set have;
by id c1 year2;
retain stopflag;
if max(first.id,first.c1,first.year2)=1 then stopflag=0;
if c3='ZZ' and stopflag=0 then do;
output;
stopflag=1;
end;
if stopflag=0 then output;
run;