Hi, I am pretty new to Elixir and I am not able to create a map out of two lists.
I have two lists and want to transform the data into a map as shown below.
rows_array = ['R1', 'R2', 'R3', 'R4']
data_arrays = [
[c1, v11, v12, v13, v14],
[c2, v21, v22, v23, v24],
[c3, v31, v32, v33, v34],
[c4, v41, v42, v43, v44],
[c5, v51, v52, v53, v54]
]
I want to create a map like the one given below
%{
"R1": %{c1: v11, c2: v21, c3: v31, c4: v41, c5: c51},
"R2": %{c1: v12, c2: v22, c3: v32, c4: v42, c5: c52},
"R3": %{c1: v13, c2: v23, c3: v33, c4: v43, c5: c53},
"R4": %{c1: v14, c2: v24, c3: v34, c4: v44, c5: c54},
}
Thanks
The following should work.
Note that Enum.zip_with/3 was only added in Elixir 1.12; you can accomplish the same thing with Enum.zip/2 followed by Enum.map/2.
# one empty map per row to start from
empty_rows = Enum.map(rows_array, fn _ -> %{} end)

# fold each [column_name | values] list into the row maps
rows =
  Enum.reduce(data_arrays, empty_rows, fn [column_name | values], rows_acc ->
    Enum.zip_with(values, rows_acc, fn value, row ->
      Map.put(row, column_name, value)
    end)
  end)
result = Enum.zip(rows_array, rows) |> Map.new()
Also, I kept your variable names, but note that these are all linked lists, not arrays. Erlang ships with an :array module, but it is rarely used in practice.
I am trying to combine two queries in Power BI to get an output of all unique combinations.
For instance, one list (or column): A, B, C
and another list (or column): 1, 2, 3, 4, 5, 6, 7
The output should be: A1, A2, A3, A4, A5, A6, A7, B1, B2, B3, B4, B5, B6, B7, C1, C2, C3, C4, C5, C6, C7
Is there a way to accomplish this? (Yes, my lists are not equal in length.)
I just don't know the best or right approach for this. (I tried Combine with a helper column and hit a dead end because duplicates get created, unless I did that wrong.)
This is essentially a Cartesian product (a.k.a. cross product) of two lists.
If you just have two text lists, you can do a one-liner like this:
List.Combine(List.Transform(List1, (L1) => List.Transform(List2, (L2) => L1 & L2)))
This says: for each item X in the first list, create a copy of the second list with X prepended to each element. That gives a list of lists, which List.Combine flattens into a single list.
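If it helps to see the shape of that computation outside Power Query, here is the same product-then-flatten in Python (purely illustrative; the lists mirror the example above):

from itertools import product

list1 = ["A", "B", "C"]
list2 = [str(n) for n in range(1, 8)]

# Cartesian product, concatenating each pair just like L1 & L2 above
combos = [a + b for a, b in product(list1, list2)]
print(combos)  # ['A1', 'A2', ..., 'B7', 'C7'] (21 items)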
It's not uncommon to want to do this with tables too, though. In that case, the analogous idea is to define a new column on one table where each row contains the entire other table, and then expand that new column.
Assuming we want the cross product of Table1[Column1] and Table2[Column2]:
let
    Table1 = <Table1 Source>,
    AddCustom = Table.AddColumn(Table1, "Custom", each Table2),
    Expand = Table.ExpandTableColumn(AddCustom, "Custom", {"Column2"}, {"Column2"}),
    Concatenate = Table.AddColumn(Expand, "Concatenate", each [Column1] & [Column2])
in
    Concatenate
Edit:
You can do the concatenation before the expand too:
let
    Table1 = <Table1 Source>,
    AddCustom = Table.AddColumn(Table1, "Custom",
        (T1) => List.Transform(Table2[Column2], each T1[Column1] & _)),
    Expanded = Table.ExpandListColumn(AddCustom, "Custom")
in
    Expanded
References with more detail:
Cartesian Product Joins
Cartesian Join of two queries...
I have a dataset that consists of three columns: subject, predicate, and object.
subject predicate object
c1 B V3
c1 A V3
c1 T V2
c2 A V2
c2 A V3
c2 T V1
c2 B V3
c3 B V3
c3 A V3
c3 T V1
c4 A V3
c4 T V1
c5 B V3
c5 T V2
c6 B V3
c6 T V1
I want to apply association rule mining to this data using SQL-style queries.
I took this idea from the paper Association Rule Mining on Semantic Data by Using SPARQL (the SAG algorithm).
First, the user has to specify T (the target predicate) and a minimum support, then query whether T is frequent:
SELECT ?pt ?ot (COUNT(*) AS ?Yent)
WHERE { ?s ?pt ?ot .
        FILTER (regex(str(?pt), 'T', 'i')). }
GROUP BY ?pt ?ot
HAVING (?Yent >= 2)
I tried the following PySpark code and got the same result:
q = mtcars1.select('s', 'p', 'o').where(mtcars1['p'] == 'T')
q1 = q.groupBy('p', 'o').count()
q1.filter(q1['count'] >= 2).show()
result
+---+---+-----+
| p| o|count|
+---+---+-----+
| T| V2| 2|
| T| V1| 4|
+---+---+-----+
The second query checks which other predicate/object pairs are frequent:
q2 = mtcars1.select('s', 'p', 'o').where(mtcars1['p'] != 'T')
q3 = q2.groupBy('p', 'o').count()
q3.filter(q3['count'] >= 2).show()
result
+---+---+-----+
| p| o|count|
+---+---+-----+
| A| V3| 4|
| B| V3| 5|
+---+---+-----+
In order to find rules between the two queries above, we scan the dataset again and check whether the pairs occur together at least as many times as the minimum support:
SELECT ?pe ?oe ?pt ?ot (COUNT(*) AS ?supCNT)
WHERE { ?s ?pt ?ot .
        FILTER (regex(str(?pt), 'T', 'i')).
        ?s ?pe ?oe .
        FILTER (!regex(str(?pe), 'T', 'i')). }
GROUP BY ?pe ?oe ?pt ?ot
HAVING (?supCNT >= 2)
ORDER BY ?pt ?ot
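For reference, the query above is a plain self-join on the subject; a minimal PySpark sketch, assuming the mtcars1 DataFrame used elsewhere in this question (the alias names pt/ot/pe/oe are mine):

import pyspark.sql.functions as f

# Split the triples into the target predicate T and all other items,
# then join on the subject so co-occurring items line up per transaction.
t = mtcars1.where(f.col('p') == 'T').select('s', f.col('p').alias('pt'), f.col('o').alias('ot'))
e = mtcars1.where(f.col('p') != 'T').select('s', f.col('p').alias('pe'), f.col('o').alias('oe'))

pairs = (t.join(e, 's')
          .groupBy('pe', 'oe', 'pt', 'ot')
          .agg(f.count('*').alias('supCNT'))
          .where(f.col('supCNT') >= 2)
          .orderBy('pt', 'ot'))
pairs.show()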
I tried to store the subjects in a list and then join between the items, but this took a long time, and it will take very long if the data is large.
from pyspark.sql.functions import collect_list

w = mtcars1.select('s', 'p', 'o').where(mtcars1['p'] == 'T')
w1 = w.groupBy('p', 'o').agg(collect_list('s'))
w1.show()
result
+---+---+----------------+
| p| o| collect_list(s)|
+---+---+----------------+
| T| V2| [c1, c5]|
| T| V1|[c2, c3, c4, c6]|
+---+---+----------------+
w2 = mtcars1.select('s', 'p', 'o').where(mtcars1['p'] != 'T')
w3 = w2.groupBy('p', 'o').agg(collect_list('s'))
w3.show()
result
+---+---+--------------------+
| p| o| collect_list(s)|
+---+---+--------------------+
| A| V3| [c1, c2, c3, c4]|
| B| V3|[c1, c2, c3, c5, c6]|
| A| V2| [c2]|
+---+---+--------------------+
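The join code below uses two helper UDFs, intersection_udf and intersection_length_udf, that are not defined in the question. A minimal sketch of what they might look like (the lambda bodies are my assumption):

import pyspark.sql.functions as f
from pyspark.sql.types import ArrayType, StringType, IntegerType

# Assumed helpers: intersect two subject lists and count the overlap.
intersection_udf = f.udf(
    lambda xs, ys: list(set(xs) & set(ys)), ArrayType(StringType()))
intersection_length_udf = f.udf(
    lambda xs, ys: len(set(xs) & set(ys)), IntegerType())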
join code
import pyspark.sql.functions as f

w44 = w1.alias("l") \
    .crossJoin(w3.alias("r")) \
    .select(
        f.col('l.p').alias('lp'),
        f.col('l.o').alias('lo'),
        f.col('r.p').alias('rp'),
        f.col('r.o').alias('ro'),
        intersection_udf(f.col('l.collect_list(s)'), f.col('r.collect_list(s)')).alias('TID'),
        intersection_length_udf(f.col('l.collect_list(s)'), f.col('r.collect_list(s)')).alias('len')
    ) \
    .where(f.col('len') > 1) \
    .select(
        f.struct(f.struct('lp', 'lo'), f.struct('rp', 'ro')).alias('2-Itemset'),
        'TID'
    ) \
    .show()
result
+---------------+------------+
| 2-Itemset| TID|
+---------------+------------+
|[[T,V2],[B,V3]]| [c1, c5]|
|[[T,V1],[A,V3]]|[c3, c2, c4]|
|[[T,V1],[B,V3]]|[c3, c2, c6]|
+---------------+------------+
So I have to re-scan the dataset to find association rules between items, and then re-scan again for further rules.
The following query is used to construct the 3-itemset:
SELECT ?pe1 ?oe1 ?pe2 ?oe2 ?pt ?ot (COUNT(*) AS ?supCNT)
WHERE { ?s ?pt ?ot .
        FILTER (regex(str(?pt), 'T', 'i')).
        ?s ?pe1 ?oe1 .
        FILTER (!regex(str(?pe1), 'T', 'i')).
        ?s ?pe2 ?oe2 .
        FILTER (!regex(str(?pe2), 'T', 'i') && !regex(str(?pe2), str(?pe1), 'i')). }
GROUP BY ?pe1 ?oe1 ?pe2 ?oe2 ?pt ?ot
HAVING (?supCNT >= 2)
ORDER BY ?pt ?ot
The result for this query should be
{[(A, V3) (B, V3) (T, V1), 2]}
and we repeat the queries until no further rules between items are found.
Can anyone help me build these association rules with SQL-style queries, where the subject is used as the ID and predicate + object = item?
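For what it's worth, building on the hypothetical t and e DataFrames from the self-join sketch earlier, the 3-itemset query might translate roughly like this (again a sketch under those assumptions, not the paper's algorithm verbatim):

import pyspark.sql.functions as f

# Pair up two distinct non-target items per subject; the ordering filter
# keeps each unordered item pair only once.
e1 = e.select('s', f.col('pe').alias('pe1'), f.col('oe').alias('oe1'))
e2 = e.select('s', f.col('pe').alias('pe2'), f.col('oe').alias('oe2'))

triples = (t.join(e1, 's').join(e2, 's')
            .where(f.concat_ws(':', 'pe1', 'oe1') < f.concat_ws(':', 'pe2', 'oe2'))
            .groupBy('pe1', 'oe1', 'pe2', 'oe2', 'pt', 'ot')
            .agg(f.count('*').alias('supCNT'))
            .where(f.col('supCNT') >= 2))
triples.show()  # on the sample data this yields (A,V3) (B,V3) (T,V1) with supCNT 2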
I have a csv file that contains "pivot-like" data that I would like to store in a pandas DataFrame. The original data file uses different numbers of leading whitespaces to differentiate between the levels in the pivot data, like so:
Text that I do not want to include,,
,Text that I do not want to include,Text that I do not want to include
,header A,header B
Total,100,100
A,,2.15
a1,,2.15
B,,0.22
b1,,0.22
" slightly longer name"...,,0.22
b3,,0.22
C,71.08,91.01
c1,57.34,73.31
c2,5.34,6.76
c3,1.33,1.67
x1,0.26,0.33
x2,0.26,0.34
x3,0.48,0.58
x4,0.33,0.42
c4,3.52,4.33
x5,0.27,0.35
x6,0.21,0.27
x7,0.49,0.56
x8,0.44,0.47
x9,0.15,0.19
x10,,0.11
x11,0.18,0.23
x12,0.18,0.23
x13,0.67,0.85
x14,0.24,0.2
x15,0.68,0.87
c5,0.48,0.76
x16,,0.15
x17,0.3,0.38
x18,0.18,0.23
d2,6.75,8.68
d3,0.81,1.06
x19,0.3,0.38
x20,0.51,0.68
Others,24.23,0
N/A,,
"Text that I do not want to include(""at all"") ",,
(It looks awful, but you should be able to paste it into e.g. Notepad to see it a bit more clearly.)
Basically, there are only two columns, a and b, but the rows are indented using 0, 3, 6, 9, ... whitespaces to differentiate between the levels. So, for instance:
at the zero level, the main group A has 0 spaces,
at the first level, a1 has 3 spaces,
at the second level, a2 has 6 spaces,
at the third level, a3 has 9 spaces, and
the fourth and final level has 12 spaces, with the corresponding values for columns a and b respectively.
I would now like to be able to read and group this data on these levels in order to create a new summarizing DataFrame, with columns corresponding to these different levels, looking like:
Level 4 Diff(a,b) Level 0 Level 1 Level 2 Level 3
x7 525 C c1 c2 c3
x5 -0.03 A a1 a22 NaN
x4 -0.04 A a1 a22 NaN
x8 -0.08 C c1 c2 c3
…
Any clue on how to do this?
Thanks
The easiest approach is to split this into different functions:
read the file
parse the lines
generate the 'tree'
construct the DataFrame
Parse the lines
def parse_file(file):
    import ast
    import re
    pat = re.compile(r'^( *)(\w+),([\d.]+),([\d.]+)$')
    for line in file:
        r = pat.match(line)
        if r:
            spaces, label, a, b = r.groups()
            diff = ast.literal_eval(a) - ast.literal_eval(b)
            yield len(spaces) // 3, label, diff
This reads each line and, using a regular expression, yields the level, the label, and the diff. I use ast.literal_eval to convert the strings to int or float.
Generate the tree
def parse_lines(lines):
    previous_label = list(range(5))
    for level, label, diff in lines:
        previous_label[level] = label
        if level == 4:
            yield tuple(previous_label), diff
This initiates a list of length 5, then overwrites the label at the level of the current node.
Construct the DataFrame
from io import StringIO
import pandas as pd

with StringIO(file_content) as file:
    lines = parse_file(file)
    index, data = zip(*parse_lines(lines))
idx = pd.MultiIndex.from_tuples(index, names=[f'level_{i}' for i in range(len(index[0]))])
df = pd.DataFrame(data={'Diff(a,b)': list(data)}, index=idx)
This opens the file, constructs the index, and generates the DataFrame with the different levels in the index. If you don't want that, you can add a .reset_index() or construct the DataFrame slightly differently, as shown after the output below.
df
level_0 level_1 level_2 level_3 level_4 Diff(a,b)
A a1 a2 a3 x1 -0.07
A a1 a2 a3 x2 -0.08000000000000002
A a1 a22 a3 x3 -0.04999999999999999
A a1 a22 a3 x4 -0.04000000000000001
A a1 a22 a3 x5 -0.03
A a1 a22 a3 x6 -0.06999999999999998
C c1 c2 c3 x7 525.0
C c1 c2 c3 x8 -0.08000000000000002
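If you would rather have the levels as ordinary columns instead of a MultiIndex, as mentioned above:

# Flatten the MultiIndex into regular level_0 ... level_4 columns
flat = df.reset_index()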
Alternative for missing levels
def parse_lines(lines):
    labels = [None] * 5
    previous_level = None
    for level, label, diff in lines:
        labels[level] = label
        if level == 4:
            if previous_level < 3:
                # blank out the deeper labels that this branch never set
                labels = labels[:previous_level + 1] + [None] * (4 - previous_level)
                labels[level] = label
            yield tuple(labels), diff
        previous_level = level
The items under a22 don't seem to have a level_3, so the first version copies it from the previous branch. If that is unwanted, you can use this variation instead.
df
level_0 level_1 level_2 level_3 level_4 Diff(a,b)
C c1 c2 c3 x1 -0.07
C c1 c2 c3 x2 -0.08000000000000002
C c1 c2 c3 x3 -0.09999999999999998
C c1 c2 c3 x4 -0.08999999999999997
C c1 c2 c4 x5 -0.07999999999999996
C c1 c2 c4 x6 -0.060000000000000026
C c1 c2 c4 x7 -0.07000000000000006
C c1 c2 c4 x8 -0.02999999999999997
C c1 c2 c4 x9 -0.04000000000000001
C c1 c2 c4 x11 -0.05000000000000002
C c1 c2 c4 x12 -0.05000000000000002
C c1 c2 c4 x13 -0.17999999999999994
C c1 c2 c4 x14 0.03999999999999998
C c1 c2 c4 x15 -0.18999999999999995
C c1 c2 c5 x17 -0.08000000000000002
C c1 c2 c5 x18 -0.05000000000000002
C c1 d2 d3 x19 -0.08000000000000002
C c1 d2 d3 x20 -0.17000000000000004
Python Pandas: How to replace values in a DataFrame based on another array, conditionally
For the same problem as the question above, I want to make the results into lists, using the following code:
z = d1.stack().map(d2.set_index('member_ID')['Label']).unstack()
rstlist = z.groupby('member_ID')['Label'].apply(list)
print(rstlist)
I can display the list on the screen as below:
member_ID
a1 [a3, b4, b5]
a3 [a1, b2, b5]
b2 [a3, b4]
b4 [a1, b2]
b5 [a1, a3]
Name: Label, dtype: object
I need to write those lists to a .txt file, and I tried the following code:
np.savetxt('tst_net.dat', rstlist.values, fmt='%s', delimiter="\t",
header="member_ID\t\tConnectedTo")
Although it writes the file, the format is as below:
# member_ID ConnectedTo
['a3', 'b4', 'b5']
['a1', 'b2', 'b5']
['a3', 'b4']
['a1', 'b2']
['a1', 'a3']
But I need to write it to the .txt file as it is displayed on the screen:
member_ID Connected_to
a1 [a3, b4, b5]
a3 [a1, b2, b5]
b2 [a3, b4]
b4 [a1, b2]
b5 [a1, a3]
I think this might help you:
with open('test.txt', 'w') as thefile:
    thefile.write("member_ID\tConnected_to\n")
    for key, value in rstlist.items():
        thefile.write("%s\t%s\n" % (key, value))
The with block makes sure the file is closed, and the header line matches your desired output.
Currently I have the Cards list. Now I want to show all the possible pairs of cards in another list.
For example: [(Card Club R2, Card Heart R3), (Card Club R2, Card Heart R4), (Card Club R2, Card Heart R5), (Card Club R2, Card Heart R6).........].
The total result should be 1326 different pairs.
Just do
[ (c1, c2) | c1 <- allCards, c2 <- allCards, c1 /= c2 ]
But this will return 2652 pairs, as mentioned.
To restrict this to 1326 pairs, either do as Zeta suggested or add Ord to Card:
[ (c1, c2) | c1 <- allCards, c2 <- allCards, c1 < c2 ]