How can I split a variable by line in TCL? - list

I have a variable named "results" with this value:
{0 0 0 0 0 0 0 0 0 0 0 3054 11013}
{0 0 0 0 0 0 0 0 0 0 0 5 13 15}
{0.000 3272.744 12702.352 30868.696}
I'd like to store each line (values between the '{}') in a separate variable and then, compare each of the elements of each line with a threshold (this threshold will be different for each line, that's why I need to split them).
I've tried
set result [split $results \n]
But it doesn't really give me a neat list of elements. Is there any way to get 3 lists from the variable "results"?

If I understand correctly, and the representation of your example data is accurate, then you do not have to process the data held by results yourself (with [split]); you can leave that to Tcl's list parser. In other words, the input is already a valid string representation of a Tcl list, eligible for further processing. Watch:
set results {
{0 0 0 0 1}
{2 2 3 3 3}
{1 1 2 3 4}
};
set thresholds {
3
2
1
}
lmap values $results threshold $thresholds {
    lmap v $values {expr {$v >= $threshold}}
}
This will produce:
{0 0 0 0 0} {1 1 1 1 1} {1 1 1 1 1}
Background: when $results is worked on by [lmap], it will be turned into a list automatically.
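For readers who think more easily in Python, the same idea (pair each row with its own threshold, then test every element of that row) can be sketched as follows. This is only an illustration of the nested mapping, not a translation of Tcl's list parsing; the data is the example from the answer above, and the name flags is made up for this sketch.

```python
# Example data mirroring the Tcl snippet above.
results = [
    [0, 0, 0, 0, 1],
    [2, 2, 3, 3, 3],
    [1, 1, 2, 3, 4],
]
thresholds = [3, 2, 1]

# Pair each row with its own threshold, then test every value in that row.
# int() turns True/False into 1/0, like Tcl's expr does.
flags = [
    [int(v >= threshold) for v in values]
    for values, threshold in zip(results, thresholds)
]
print(flags)
```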

I think it's better to split on the newline character and then apply regexp to fetch the data. Here is some sample code:
set results "{0 0 0 0 1}
{2 2 3 3 3}
{1 1 2 3 4}";
set result [split $results \n];
foreach line $result {
    if {[regexp {^\s*\{(.+)\}\s*} $line Complete_Match Content]} {
        puts "$Content\n";
    }
}
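As a rough cross-check of the split-then-regexp approach, here is the same logic sketched in Python with the same pattern. The variable names are only illustrative, and the sketch assumes each line holds exactly one brace-delimited group.

```python
import re

results = "{0 0 0 0 1}\n{2 2 3 3 3}\n{1 1 2 3 4}"

rows = []
for line in results.split("\n"):
    # Same pattern as the Tcl regexp: capture everything between the braces.
    match = re.match(r"^\s*\{(.+)\}\s*", line)
    if match:
        # Split the captured text on whitespace to get the individual elements.
        rows.append(match.group(1).split())
print(rows)
```

Note that the elements come out as strings here, so they would need converting (for example with float) before comparing against a threshold.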

Related

tcl split list by n character

I have a list like below.
{2 1 0 2 2 0 2 3 0 2 4 0}
I would like to add a comma after every 3 elements using Tcl, like this:
{2 1 0,2 2 0,2 3 0,2 4 0}
I am looking for your help.
Regards
If it is always definitely three elements, it is easy to use lmap and join:
set theList {2 1 0 2 2 0 2 3 0 2 4 0}
set joined [join [lmap {a b c} $theList {list $a $b $c}] ","]
One way:
Append n elements at a time from your list to a string using a loop, but first append a comma if it's not the first time through.
#!/usr/bin/env tclsh
proc insert_commas {n lst} {
    set res ""
    set len [llength $lst]
    for {set i 0} {$i < $len} {incr i $n} {
        if {$i > 0} {
            append res ,
        }
        append res [lrange $lst $i [expr {$i + $n - 1}]]
    }
    return $res;
}
set lst {2 1 0 2 2 0 2 3 0 2 4 0}
puts [insert_commas 3 $lst] ;# 2 1 0,2 2 0,2 3 0,2 4 0
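The chunk-and-join idea is not specific to Tcl. A minimal Python sketch of the same thing, reusing the name insert_commas purely for illustration, slices the list into groups of n and joins the groups with commas:

```python
def insert_commas(n, items):
    """Join items with spaces, placing a comma between every group of n."""
    # Slice the list into consecutive chunks of at most n elements.
    chunks = [items[i:i + n] for i in range(0, len(items), n)]
    # Space-join each chunk, then comma-join the chunks.
    return ",".join(" ".join(str(x) for x in chunk) for chunk in chunks)

lst = [2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 4, 0]
joined = insert_commas(3, lst)
print(joined)
```

As with the lrange version, a final group shorter than n is simply emitted as-is.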

Create duplicate rows in SAS and change values of variables

I have been confused about how to implement this in SAS. I am trying to create duplicate rows if the value "2" occurs more than once among the variables member1-member4. For example, if a row has the value 2 in member2, member3, and member4, then I will create 2 duplicate rows: the initial row serves for member2, and the duplicate rows are for member3 and member4. On the duplicate row for member3, for example, member2 and member4 will be set to missing if their value is 2. Basically, the value "2" can only occur once per row.
Let's assume sa1 to sa4 are other variables corresponding to member1 to member4 respectively. When we create a duplicate row for a member, the sa variables of the other members should be set to missing if they have a value of "1". For example, if the duplicate row is for member3, then values equal to "1" in sa1, sa2 and sa4 should be set to missing.
There are other variables in the dataset that will have the same values on all duplicate rows as on the initial row. Duplicate rows will also have a suffix on the ID to indicate the parent row.
This is an example of the data I have
id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 0 2 2 0 0 1 1 0
2 2 2 0 5 . 1 0 0
3 2 2 3 2 1 1 0 1
Then this is the output I am trying to achieve
id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 0 2 . 0 0 1 . 0
1_1 0 . 2 0 0 . 1 0
2 2 . 0 5 . . 0 0
2_1 . 2 0 5 . 1 0 0
3 2 . 3 . 1 . 0 .
3_1 . 2 3 . . 1 0 .
3_2 . . 3 2 . . 0 1
Will appreciate any help. Thank you!
You need to count the number of '2's. You also need to remember where they used to be. "I had the spots removed for good luck, but I remember where the spots formerly were."
data have ;
input id :$10. member1 member2 member3 member4 sa1 sa2 sa3 sa4 ;
cards;
1 0 2 2 0 0 1 1 0
2 2 2 0 5 . 1 0 0
3 2 2 3 2 1 1 0 1
4 2 0 0 0 . . . .
5 0 0 0 0 . . . .
;
data want ;
  set have ;
  array m member1-member4 ;
  array x [4] _temporary_;
  do index=1 to dim(m);
    x[index]=m[index]=2;
  end;
  n2 = sum(of x[*]);
  if n2<2 then output;
  else do counter=1 to n2;
    id=scan(id,1,'_');
    if counter > 1 then id=catx('_',id,counter-1);
    counter2=0;
    do index=1 to dim(m);
      if x[index] then do;
        counter2+1;
        if counter = counter2 then m[index]=2;
        else m[index]=.;
      end;
    end;
    output;
  end;
  drop index n2 counter counter2;
run;
Results
Obs id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 1 0 2 . 0 0 1 1 0
2 1_1 0 . 2 0 0 1 1 0
3 2 2 . 0 5 . 1 0 0
4 2_1 . 2 0 5 . 1 0 0
5 3 2 . 3 . 1 1 0 1
6 3_1 . 2 3 . 1 1 0 1
7 3_2 . . 3 2 1 1 0 1
8 4 2 0 0 0 . . . .
9 5 0 0 0 0 . . . .
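Note that in the results above sa1-sa4 are left untouched, whereas the question also wants them blanked. To make the full rule concrete, here is a small sketch in plain Python rather than SAS. It assumes one reading of the question: on each output row, every other 2 becomes missing, and every sa value equal to 1 that belongs to a different member than the kept one becomes missing. The function name expand_row and the use of None for missing are inventions of this sketch.

```python
def expand_row(row_id, members, sa):
    """Yield (id, members, sa), one output row per occurrence of the value 2."""
    positions = [i for i, m in enumerate(members) if m == 2]
    if len(positions) < 2:
        # Fewer than two 2s: the row passes through unchanged.
        yield row_id, list(members), list(sa)
        return
    for counter, keep in enumerate(positions):
        # The first row keeps the parent id; later rows get the suffix _1, _2, ...
        new_id = row_id if counter == 0 else f"{row_id}_{counter}"
        # Blank every 2 except the one this row is for.
        new_members = [None if (m == 2 and i != keep) else m
                       for i, m in enumerate(members)]
        # Assumed rule: blank sa values equal to 1 for all other members.
        new_sa = [None if (s == 1 and i != keep) else s
                  for i, s in enumerate(sa)]
        yield new_id, new_members, new_sa

# Row with id 3 from the question: member1-4 = 2 2 3 2, sa1-4 = 1 1 0 1.
rows = list(expand_row("3", [2, 2, 3, 2], [1, 1, 0, 1]))
for row in rows:
    print(row)
```

For the three example rows in the question, this reproduces the desired output shown there. Whether that reading of the sa rule holds for other data is something only the asker can confirm.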
I think you're expecting us to code the whole thing for you... I don't get your explanation of the logic you want, but to start off with:
1. Create a new dataset.
2. Rename all the variables on the way in, prefixing them with O_ (Original).
3. Code however you like to see how many values contain 2 (HOWMANYTWOS).
4. do ROW = 1 to HOWMANYTWOS
   4.1 Again go through the values of the O_ variables you have.
   4.2 If the 2 corresponds to your increasing counter ROW, it is the 2 you wish to keep, so you don't touch it; if the 2 does not correspond to your ROW, make it missing (.).
   4.3 Output the record with a new (if required) ID.
a start for you:
data NEW;
  set ORIG (rename=(MEMBER1-MEMBER4=O_MEMBER1-O_MEMBER4 ID=O_ID /* etc.. */));
  HOWMANYTWOS = sum(O_MEMBER1=2,O_MEMBER2=2,O_MEMBER3=2,O_MEMBER4=2);
  do ROW = 1 to HOWMANYTWOS;
    /* This is stepping through and creating the new rows - you need to step
       through the variables to see if you want to make them null before
       outputting... NOTE: do not change the O_ variables; only create/update
       the variables going to the output dataset (the O_ version is for
       checking against only) */
    /* ROW - 1 so that the first duplicate gets the suffix _1, as in the question */
    ID = ifc(ROW = 1, O_ID, catx("_", O_ID, ROW - 1));
    /* create a counter */
    output;
  end;
run;
Sorry, I haven't got SAS here and it's been a little while.

Create boolean dataframe showing existence of each element in a dictionary of lists

I have a dictionary of lists and I have constructed a dataframe where the index is the dictionary keys and the columns are the set of possible values contained within the lists. The dataframe values represent the existence of each column value in each list contained in the dictionary. What is the most efficient way to construct this? Below is the way I have done it so far using for loops, but I am sure there is a more efficient way using either vectorization or concatenation.
import pandas as pd
data = {0:[1,2,3,4],1:[2,3,4],2:[3,4,5,6]}
cols = sorted(list(set([x for y in data.values() for x in y])))
df = pd.DataFrame(0,index=data.keys(),columns=cols)
for row in df.iterrows():
    for col in cols:
        if col in data[row[0]]:
            df.loc[row[0],col] = 1
        else:
            df.loc[row[0],col] = 0
print(df)
Output:
1 2 3 4 5 6
0 1 1 1 1 0 0
1 0 1 1 1 0 0
2 0 0 1 1 1 1
Use MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(data.values()),
                  columns=mlb.classes_,
                  index=data.keys())
print (df)
1 2 3 4 5 6
0 1 1 1 1 0 0
1 0 1 1 1 0 0
2 0 0 1 1 1 1
Pure pandas, but much slower, solution with str.get_dummies:
df = pd.Series(data).astype(str).str.strip('[]').str.get_dummies(', ')
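To make clear what both answers compute, here is a dependency-free sketch of the indicator matrix in plain Python. It is not how MultiLabelBinarizer or get_dummies are implemented internally, just the same result built by hand; the name matrix is made up for this sketch.

```python
data = {0: [1, 2, 3, 4], 1: [2, 3, 4], 2: [3, 4, 5, 6]}

# Column labels: every distinct value across all lists, in sorted order.
cols = sorted({x for values in data.values() for x in values})

# One row per key: 1 if the column value occurs in that key's list, else 0.
matrix = {}
for key, values in data.items():
    present = set(values)  # set membership avoids rescanning the list per column
    matrix[key] = [int(col in present) for col in cols]

print(cols)
for key, row in matrix.items():
    print(key, row)
```

If a DataFrame is needed, passing matrix to pd.DataFrame.from_dict with orient="index" and columns=cols should give the same table as in the question.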

Element-wise multiplication of two lists in Tcl

I have two lists of the same length and I want to multiply them element-wise (like Cartesian product in sets). How do I do it? For example, if I write
set a {1 2 3 4 5}
set b {1 2 3 4 5}
then the desired output is:
{1 4 9 16 25}
A two-list lmap is perfect for this:
set a {1 2 3 4 5}
set b {1 2 3 4 5}
set result [lmap x $a y $b {expr {$x * $y}}]
If you're on Tcl 8.5 (or older) use this instead:
set a {1 2 3 4 5}
set b {1 2 3 4 5}
set result {}
foreach x $a y $b {
lappend result [expr {$x * $y}]
}
The multi-list form of foreach has been supported for a very long time indeed.
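For comparison, the multi-list iteration in foreach and lmap corresponds closely to zip in Python. A minimal sketch with the same data:

```python
a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 5]

# zip walks both lists in lockstep, like the two-list form of lmap/foreach.
result = [x * y for x, y in zip(a, b)]
print(result)
```

One difference to keep in mind: zip stops at the shorter list, whereas Tcl's foreach and lmap pad the shorter list with empty strings.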

Convert this Word DataFrame into Zero One Matrix Format DataFrame in Python Pandas

I want to convert a DataFrame of user_Id and skills into a zero-one matrix DataFrame of users and their corresponding skills.
Input DataFrame
user_Id skills
0 user1 [java, hdfs, hadoop]
1 user2 [python, c++, c]
2 user3 [hadoop, java, hdfs]
3 user4 [html, java, php]
4 user5 [hadoop, php, hdfs]
Desired Output DataFrame
user_Id java c c++ hadoop hdfs python html php
user1 1 0 0 1 1 0 0 0
user2 0 1 1 0 0 1 0 0
user3 1 0 0 1 1 0 0 0
user4 1 0 0 0 0 0 1 1
user5 0 0 0 1 1 0 0 1
You can join a new DataFrame created with astype (needed only if you have to convert the lists to str; otherwise omit it), then remove the [] with strip and use get_dummies:
df = df[['user_Id']].join(df['skills'].astype(str).str.strip('[]').str.get_dummies(', '))
print (df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0
df1 = df['skills'].astype(str).str.strip('[]').str.get_dummies(', ')
#if necessary remove ' from columns names
df1.columns = df1.columns.str.strip("'")
df = pd.concat([df['user_Id'], df1], axis=1)
print (df)
user_Id c c++ hadoop hdfs html java php python
0 user1 0 0 1 1 0 1 0 0
1 user2 1 1 0 0 0 0 0 1
2 user3 0 0 1 1 0 1 0 0
3 user4 0 0 0 0 1 1 1 0
4 user5 0 0 1 1 0 0 1 0