Specify Column and Row of a String Search

Because I'm working with a very complex table with nasty repeated values in variable places, I'd like to do a string search between specific rows and columns.
For example:
table={{"header1", "header2", "header3",
"header4"}, {"falsepositive", "falsepositive", "name1",
"falsepositive"}, {"falsepositive", "falsepositive", "name2",
"falsepositive"}, {"falsepositive", "falsepositive",
"falsepositive", "falsepositive"}}
%//TableForm=
header1 header2 header3 header4
falsepositive falsepositive name1 falsepositive
falsepositive falsepositive name2 falsepositive
falsepositive falsepositive falsepositive falsepositive
How do I look for a string, for example, in column three, rows one through two?
I'd like to use Which to assign values based on a string's location in the table.
E.g.,
Which[string matched in location one, value, matched in location two, value2]

As I understand it you want a test whether or not a given string is in a certain subsection of a matrix. You can pick these subsections using Part ([[...]]) and Span (;;), with which you can indicate ranges or subsamples of ranges. Testing whether or not this subsection contains your pattern can be done by MemberQ, like this:
MemberQ[table[[1 ;; 2, 3]], "name2"]
(* ==> False *)
MemberQ[table[[1 ;; 2, 3]], "header3"]
(* ==> True *)
In this way, your Which statement could look like this:
myVar =
Which[
MemberQ[table[[1 ;; 2, 3]], "name2"], 5,
MemberQ[table[[2 ;; 3, 4]], "falsepositive"], 6,
...
True, 20
]

Length[Cases[Position[table, "name1"], {1 | 2, 3}]] >= 1
Output -> True
Or
Cases[Position[table, "name1"], {1 | 2, 3}]
Output -> {{2, 3}}

Perhaps, if I understand you:
f[table_, value_, rowmin_, rowmax_, colmin_, colmax_] :=
Select[Position[table, value],
rowmin <= First@# <= rowmax && colmin <= Last@# <= colmax &]
f[table, "name1", 1, 10, 1, 10]
(*
-> {{2, 3}}
*)
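For comparison outside Mathematica, the same Position-then-filter idea can be sketched in Python. This is only an illustration under stated assumptions: the function name positions_in_region is hypothetical, the table is the one from the question, and positions are reported 1-based so they line up with the Mathematica output above.

```python
# The table from the question, as a list of rows.
table = [
    ["header1", "header2", "header3", "header4"],
    ["falsepositive", "falsepositive", "name1", "falsepositive"],
    ["falsepositive", "falsepositive", "name2", "falsepositive"],
    ["falsepositive", "falsepositive", "falsepositive", "falsepositive"],
]


def positions_in_region(table, value, row_min, row_max, col_min, col_max):
    """Return 1-based (row, column) positions of value inside the given
    inclusive row and column bounds (hypothetical helper)."""
    return [
        (r, c)
        for r, row in enumerate(table, start=1)
        for c, cell in enumerate(row, start=1)
        if cell == value
        and row_min <= r <= row_max
        and col_min <= c <= col_max
    ]


# Column three, rows one through two: "header3" is there, "name2" is not.
in_region = positions_in_region(table, "header3", 1, 2, 3, 3)
not_in_region = positions_in_region(table, "name2", 1, 2, 3, 3)
```

An empty result plays the role of MemberQ returning False, so the returned list can be used directly as a condition.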


In Raku, how does one write the equivalent of Haskell's span function?

In Haskell, given a predicate and a list, one can split the list into two parts:
the longest prefix of elements satisfying the predicate
the remainder of the list
For example, the Haskell expression …
span (< 10) [2, 2, 2, 5, 5, 7, 13, 9, 6, 2, 20, 4]
… evaluates to …
([2,2,2,5,5,7],[13,9,6,2,20,4])
How does one write the Raku equivalent of Haskell's span function?
Update 1
Based on the answer of @chenyf, I developed the following span subroutine (additional later update reflects negated predicate within span required to remain faithful to the positive logic of Haskell's span function) …
sub span( &predicate, @numberList )
{
    my &negatedPredicate = { ! &predicate($^x) } ;
    my $idx = @numberList.first(&negatedPredicate):k ;
    my @lst is Array[List] = @numberList[0..$idx-1], @numberList[$idx..*] ;
    @lst ;
} # end sub span
sub MAIN()
{
    my &myPredicate = { $_ <= 10 } ;
    my @myNumberList is Array[Int] = [2, 2, 2, 5, 5, 7, 13, 9, 6, 2, 20, 4] ;
    my @result is Array[List] = span( &myPredicate, @myNumberList ) ;
    say '@result is ...' ;
    say @result ;
    say '@result[0] is ...' ;
    say @result[0] ;
    say @result[0].WHAT ;
    say '@result[1] is ...' ;
    say @result[1] ;
    say @result[1].WHAT ;
} # end sub MAIN
Program output is …
@result is ...
[(2 2 2 5 5 7) (13 9 6 2 20 4)]
@result[0] is ...
(2 2 2 5 5 7)
(List)
@result[1] is ...
(13 9 6 2 20 4)
(List)
Update 2
Utilizing information posted to StackOverflow concerning Raku's Nil, the following updated draft of subroutine span is …
sub span( &predicate, @numberList )
{
    my &negatedPredicate = { ! &predicate($^x) } ;
    my $idx = @numberList.first( &negatedPredicate ):k ;
    if Nil ~~ any($idx) { $idx = @numberList.elems ; }
    my List $returnList = (@numberList[0..$idx-1], @numberList[$idx..*]) ;
    $returnList ;
} # end sub span
sub MAIN()
{
say span( { $_ == 0 }, [2, 2, 5, 7, 4, 0] ) ; # (() (2 2 5 7 4 0))
say span( { $_ < 6 }, (2, 2, 5, 7, 4, 0) ) ; # ((2 2 5) (7 4 0))
say span( { $_ != 9 }, [2, 2, 5, 7, 4, 0] ) ; # ((2 2 5 7 4 0) ())
} # end sub MAIN
I use the first method with the :k adverb, like this:
my @num = [2, 2, 2, 5, 5, 7, 13, 9, 6, 2, 20, 4];
my $idx = @num.first(* > 10):k;
@num[0..$idx-1], @num[$idx..*];
A completely naive take on this:
sub split_on(@arr, &pred) {
    my @arr1;
    my @arr2 = @arr;
    loop {
        if not &pred(@arr2.first) {
            last;
        }
        push @arr1: @arr2.shift
    }
    (@arr1, @arr2);
}
Create a new @arr1 and copy the array into @arr2. Loop, and if the predicate is not met for the first element in @arr2, exit the loop. Otherwise, shift the first element off @arr2 and push it onto @arr1.
When testing this:
my @a = [2, 2, 2, 5, 5, 7, 13, 9, 6, 2, 20, 4];
my @b = split_on @a, -> $x { $x < 10 };
say @b;
The output is:
[[2 2 2 5 5 7] [13 9 6 2 20 4]]
Only problem here is... what if the predicate never fails? Then @arr2 runs empty and there is nothing left to test. Well, let's terminate the loop if the list is empty or the predicate isn't met.
sub split_on(@arr, &pred) {
    my @arr1;
    my @arr2 = @arr;
    loop {
        if !@arr2 || not &pred(@arr2.first) {
            last;
        }
        push @arr1: @arr2.shift;
    }
    (@arr1, @arr2);
}
So I figured I'd throw my version in because I thought that classify could be helpful :
sub span( &predicate, @list ) {
    @list
        .classify({
            state $f = True;
            $f &&= &predicate($_);
            $f.Int;
        }){1,0}
        .map( {$_ // []} )
}
The map at the end is to handle the situation where either the predicate is never or always true.
In his presentation 105 C++ Algorithms in 1 line* of Raku (*each) Daniel Sockwell discusses a function that almost answers your question. I've refactored it a bit to fit your question, but the changes are minor.
#| Return the index at which the list splits given a predicate.
sub partition-point(&p, @xs) {
    my \zt = @xs.&{ $_ Z .skip };
    my \mm = zt.map({ &p(.[0]) and !&p(.[1]) });
    my \nn = mm <<&&>> @xs.keys;
    return nn.first(?*)
}
#| Given a predicate p and a list xs, returns a tuple where first element is
#| longest prefix (possibly empty) of xs of elements that satisfy p and second
#| element is the remainder of the list.
sub span(&p, @xs) {
    my \idx = partition-point &p, @xs;
    idx.defined ?? (@xs[0..idx], @xs[idx^..*]) !! ([], @xs)
}
my @a = 2, 2, 2, 5, 5, 7, 13, 9, 6, 2, 20, 4;
say span { $_ < 10 }, @a; #=> ((2 2 2 5 5 7) (13 9 6 2 20 4))
say span { $_ < 5 }, [6, 7, 8, 1, 2, 3]; #=> ([] [6 7 8 1 2 3])
Version 6.e of Raku will sport the new 'snip' function:
use v6.e;
dd (^10).snip( * < 5 );
#«((0, 1, 2, 3, 4), (5, 6, 7, 8, 9)).Seq␤»
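
For readers comparing semantics across languages, here is a minimal Python sketch of the behaviour the question asks for, checked against the three edge cases from Update 2. It is not Raku and not a drop-in answer; the name span is reused only for illustration, and itertools.takewhile from the standard library does the work.

```python
from itertools import takewhile


def span(predicate, items):
    """Split items into (longest prefix satisfying predicate, remainder)."""
    items = list(items)
    # takewhile stops at the first element that fails the predicate,
    # so an always-true or never-true predicate needs no special case.
    prefix = list(takewhile(predicate, items))
    return prefix, items[len(prefix):]


result = span(lambda x: x < 10, [2, 2, 2, 5, 5, 7, 13, 9, 6, 2, 20, 4])
```

Because the remainder is taken by slicing at the prefix length, elements after the first failure (such as the 9 and 6 above) stay in the second part even though they satisfy the predicate.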

Pandas unstack but only create multi index for certain columns

I have a data frame that is production data for a factory. The factory is organised into lines. The structure of the data is such that one of the columns contains repeating values that properly thought of are headers. I need to reshape the data. So in the following DataFrame the 'Quality' column contains 4 measures, that are then measured for each hour. Clearly this gives us four observations per line.
The goal here is to transpose this data, but such that some of the columns are single index and some are multi index. The row index should remain ['Date', 'ID']. The single index columns should be 'line_no', 'floor', 'buyer' and the multi index columns should be the hourly measures for each of the quality measures.
I know that this is possible because I accidentally stumbled across the way to do it. Basically, as my code will show, I put everything in the index except the hourly data and then unstacked the quality column from the index. Then, by chance, I tried to reset the index and it created this amazing dataframe where some columns were single index and some multi. Of course it's highly impractical to have loads of columns in the index, because we might want to do stuff with them, like change them. My question is how to achieve this type of thing without having to go through this (what I feel is a) workaround.
import random
import pandas as pd
# range replaces Python 2's xrange so the example runs on Python 3
d = {'ID' : [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3] * 2,
     'Date' : ['2013-05-04' for x in range(12)] +
              ['2013-05-06' for x in range(12)],
     'line_no' : [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3] * 2,
     'floor' : [5, 5, 5, 5, 6, 6, 6, 6, 5, 5, 5, 5] * 2,
     'buyer' : ['buyer1', 'buyer1', 'buyer1', 'buyer1',
                'buyer2', 'buyer2', 'buyer2', 'buyer2',
                'buyer1', 'buyer1', 'buyer1', 'buyer1'] * 2,
     'Quality' : ['no_checked', 'good', 'alter', 'rejected'] * 6,
     'Hour1' : [random.randint(1000, 15000) for x in range(24)],
     'Hour2' : [random.randint(1000, 15000) for x in range(24)],
     'Hour3' : [random.randint(1000, 15000) for x in range(24)],
     'Hour4' : [random.randint(1000, 15000) for x in range(24)],
     'Hour5' : [random.randint(1000, 15000) for x in range(24)],
     'Hour6' : [random.randint(1000, 15000) for x in range(24)]}
DF = pd.DataFrame(d, columns = ['ID', 'Date', 'line_no', 'floor', 'buyer',
                                'Quality', 'Hour1', 'Hour2', 'Hour3', 'Hour4',
                                'Hour5', 'Hour6'])
# set_index returns a new frame, so assign the result back
DF = DF.set_index(['Date', 'ID'])
So this is how I achieved what I wanted, but there must be a way to do this without having to go through all these steps. Help please...
# Reset the index
DF.reset_index(inplace = True)
# Put everything in the index
DF.set_index(['Date', 'ID', 'line_no', 'floor', 'buyer', 'Quality'], inplace = True)
# Unstack Quality
DFS = DF.unstack('Quality')
#Now this was the accidental workaround - gives exactly the result I want
DFS.reset_index(inplace = True)
DFS.set_index(['Date', 'ID'], inplace = True)
All help appreciated. Sorry for the long question, but at least there is some data riiiight!
In general inplace operations are not faster and IMHO less readable.
In [18]: df.set_index(['Date','ID','Quality']).unstack('Quality')
Out[18]:
line_no floor buyer Hour1 Hour2 Hour3 Hour4 Hour5 Hour6
Quality alter good no_checked rejected alter good no_checked rejected alter good no_checked rejected alter good no_checked rejected alter good no_checked rejected alter good no_checked rejected
Date ID
2013-05-04 1 1 5 buyer1 6920 8681 9317 14631 5739 2112 4211 12026 13577 1855 13884 12710 7250 2540 1948 7116 9874 7302 10961 8251 3070 2793 14293 10895
2 2 6 buyer2 7943 7501 13725 1648 7178 9670 6278 6888 9969 11766 9968 4722 7242 4049 6704 2225 6546 8688 11513 14550 2140 11941 1142 6683
3 3 5 buyer1 5155 2449 13648 2183 14184 7309 1185 10454 11742 14102 2242 14297 6185 5554 12505 13312 3062 7426 4421 5693 12342 11622 10431 13375
2013-05-06 1 1 5 buyer1 14563 1343 14419 3350 8526 1185 5244 14777 2238 3640 6717 1109 7777 13136 1732 8681 14454 1059 10606 6942 9349 4524 13931 11799
2 2 6 buyer2 14837 9524 8453 6074 11516 12356 9651 10650 15000 11374 4690 10914 1857 3231 14627 6590 6503 9268 13108 8581 8448 12013 14175 10783
3 3 5 buyer1 9032 12959 4613 6793 7918 2827 6027 13002 11771 13370 12767 11080 12624 13269 11740 10543 8609 14709 11921 12484 8670 12706 8001 8991
[6 rows x 27 columns]
is a quite reasonable idiom for what you are doing
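
To make the chained, non-inplace style concrete, here is a small sketch on a tiny made-up frame (the values and the reduced column set are assumptions for illustration, not the asker's data). Keeping line_no in the index during the unstack and moving it back afterwards is one possible way to leave it as a single column; it is essentially the question's workaround written as one chain, not necessarily what this answer intended.

```python
import pandas as pd

# Hypothetical miniature of the production data: two lines, two quality
# measures, two hourly columns.
df = pd.DataFrame({
    "Date": ["2013-05-04"] * 4,
    "ID": [1, 1, 2, 2],
    "line_no": [1, 1, 2, 2],
    "Quality": ["good", "rejected", "good", "rejected"],
    "Hour1": [10, 20, 30, 40],
    "Hour2": [11, 21, 31, 41],
})

wide = (
    df.set_index(["Date", "ID", "line_no", "Quality"])  # descriptive columns ride along in the index
      .unstack("Quality")                               # only the hourly columns get a Quality level
      .reset_index("line_no")                           # line_no returns as ('line_no', '')
)
```

After this, wide is indexed by Date and ID, line_no has an empty second column level, and each hourly column is split by Quality.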

Sorting number of lists according to indexes and priority

I have a collection of lists with each containing around 6 to 7 values. Like,
list1 = 2,4,7,4,9,5
list2 = 4,3,7.3,9,8,1.2
list3 = 2,2.4,7,9,8,5
list4 = 9,1.6,4,3,4,1
list5 = 2,5,7,9,1,4
list6 = 6,8,7,2,1,5
list7 = 4,2,5,2,1,3
Now I want to sort these with index1 as primary and index3 as secondary and index2 as tertiary and so on. That is, the output should be like:
2,2.4,7,9,8,5
2,4,7,4,9,5
2,5,7,9,1,4
4,2,5,2,1,3
4,3,7.3,9,8,1.2
6,8,7,2,1,5
9,1.6,4,3,4,1
I want the lists sorted on index1 first; if the values are the same for index1, then sorting is done on index3, and if those are also the same, then on index2. Here the number of lists is small, but it can increase to 20, and the indexes can grow up to 20 as well.
The algorithm I want is the same as that of iTunes song sorting, in which songs are sorted first by album, then by artist, then by rank, and then by name. That is, if the album names are the same, then sorting is done on the artist; if those are the same, then by rank, and so on. The code can be in C/C++/Tcl/shell.
sort -t ',' -k1,1n -k3,3n -k2,2n
Feed the lists as individual lines into it.
To do this in Tcl, assuming there's not huge amounts of data (a few MB wouldn't be “huge”) the easiest way would be:
# Read the values in from stdin, break into lists of lists
foreach line [split [read stdin] "\n"] {
lappend records [split $line ","]
}
# Sort twice, first by secondary key then by primary (lsort is _stable_)
set records [lsort -index 1 -real $records]
set records [lsort -index 0 -real $records]
# Write the values back out to stdout
foreach record $records {
puts [join $record ","]
}
If you're using anything more complex than simple numbers, consider using the csv package in Tcllib for parsing and formatting, as it will deal with many syntactic issues that crop up in Real Data. If you're dealing with a lot of data (where “lot” depends on how much memory you deploy with) then consider using a more stream-oriented method for handling the data (and there are a few other optimizations in the memory handling) and you might also want to use the -command option to lsort to supply a custom comparator so you can sort only once; the performance hit of a custom comparator is quite high, alas, but for many records the reduced number of comparisons will win out. Or shove the data into a database like SQLite or Postgres.
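The trade-off described above (several stable passes versus one pass with a composite comparison) can be sketched in Python, whose built-in sort is also stable. This is an illustration only: the helper names sort_by_passes and sort_by_key are hypothetical, the rows are the lists from the question, and key_order uses zero-based column indexes, so (0, 2, 1) means index1, then index3, then index2.

```python
rows = [
    [2, 4, 7, 4, 9, 5],
    [4, 3, 7.3, 9, 8, 1.2],
    [2, 2.4, 7, 9, 8, 5],
    [9, 1.6, 4, 3, 4, 1],
    [2, 5, 7, 9, 1, 4],
    [6, 8, 7, 2, 1, 5],
    [4, 2, 5, 2, 1, 3],
]
key_order = (0, 2, 1)


def sort_by_passes(rows, key_order):
    """One stable sort per key, least significant key first."""
    result = list(rows)
    for column in reversed(key_order):
        result.sort(key=lambda row: row[column])
    return result


def sort_by_key(rows, key_order):
    """A single sort on a tuple built from the key columns."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in key_order))


by_passes = sort_by_passes(rows, key_order)
by_key = sort_by_key(rows, key_order)
```

Both approaches give the same order; the single-pass version does less work when there are many keys, which mirrors the point about sorting only once.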
You can use STL's sort, and then all you have to do is to write a comparison function that does what you want (the example in the link should be good enough).
Since you asked for a Tcl solution:
set lol {
{2 4 7 4 9 5}
{4 3 7.3 9 8 1.2}
{2 2.4 7 9 8 5}
{9 1.6 4 3 4 1}
{2 5 7 9 1 4}
{6 8 7 2 1 5}
{4 2 5 2 1 3}
}
set ::EPS 10e-6
proc compareLists {ixo e1 e2} {
foreach ix $ixo {
set d [expr {[lindex $e1 $ix] - [lindex $e2 $ix]}]
if {abs($d) > $::EPS} {
return [expr {($d>0)-($d<0)}]
}
}
return 0
}
foreach li [lsort -command [list compareLists {0 2 1}] $lol] {
puts $li
}
Hope that helps.
Here is a C++ solution:
#include <iostream>
#include <vector>
#include <algorithm>
template <typename Array, typename CompareOrderIndex>
struct arrayCompare
{
private:
size_t
size ;
CompareOrderIndex
index ;
public:
arrayCompare( CompareOrderIndex idx ) : size( idx.size() ), index(idx) { }
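bool helper( const Array &a1, const Array &a2, unsigned int num ) const
{
// Compare on the current key; only fall through to the next key on a tie.
if( a1[ index[size-num] ] < a2[ index[size-num] ] )
{
return true ;
}
if( a2[ index[size-num] ] < a1[ index[size-num] ] )
{
return false ;
}
if( 1 != num )
{
return helper( a1, a2, num-1 ) ;
}
// All keys are equal: a1 is not less than a2 (std::sort needs a strict weak ordering).
return false ;
}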
bool operator()( const Array &a1, const Array &a2 ) const
{
return helper( a1, a2, size ) ;
}
} ;
int main()
{
std::vector< std::vector<float> > lists = { { 2, 4, 7, 4, 9, 5},
{ 4, 3, 7.3, 9, 8, 1.2 },
{ 2, 2.4, 7, 9, 8, 5 },
{ 4, 2, 5, 2, 1, 3 },
{ 9, 1.6, 4, 3, 4, 1 },
{ 2, 5, 7, 9, 1, 4 },
{ 6, 8, 7, 2, 1, 5 },
{ 4, 2, 5, 2, 1, 1 },
};
//
// Specify the column indexes to compare and the order to compare.
// In this case it will first compare column 1 then 3 and finally 2.
//
//std::vector<int> indexOrder = { 0, 2, 1, 3, 4 ,5 } ;
std::vector<int> indexOrder = { 0, 2, 1 } ;
arrayCompare< std::vector<float>, std::vector<int>> compV( indexOrder ) ;
std::sort( lists.begin(), lists.end(), arrayCompare< std::vector<float>, std::vector<int>>( indexOrder ) ) ;
for(auto p: lists)
{
for( unsigned int i = 0; i < p.size(); ++i )
{
unsigned int idx = ( i > (indexOrder.size() -1) ? i : indexOrder[i] ) ;
std::cout << p[idx] << ", " ;
}
std::cout << std::endl ;
}
}

Subtracting mean from calculation puts answer in list?

I have a function that cycles through two separate lists and combines them into one as follows:
spread = Table[{gld[[i, 1]], (gld[[i, 2]] - gdx[[i, 2]]) },
{i, 1, Length[gld], 1}]
This works fine, and generates answers in the form:
{{2009, 6, 1}, 52.72}
But when I add a subtraction, as follows:
spread = Table[{gld[[i, 1]], (gld[[i, 2]] - gdx[[i, 2]]) - meanspread },
{i, 1, Length[gld], 1}]
I get answers in the format:
{{2009, 6, 1}, {-20.2896}}
This causes issues when I want to use DateListPlot (all the data is in the extreme right of the graph, and the graph is not usable).
Can anyone suggest what might be happening here and how I may avoid it?
Thanks!
Most likely meanspread is not a number, but a single-item list, such as {1.1}. It's impossible to tell without knowing more details and having a sample of all data/variables you're using.
I don't get this, recreating your inputs as best I can. It really depends on how you're computing meanspread.
(*In[2]:= *)
gld = FinancialData["NYSE:GLD", "Close", {"June 1, 2009", DateString[], "Day"}];
gdx = FinancialData["NYSE:GDX", "Close", {"June 1, 2009", DateString[], "Day"}];
(*In[5]:= *)
First[spread = Table[{gld[[i, 1]], (gld[[i, 2]] - gdx[[i, 2]])}, {i, 1, Length[gld], 1}]]
(*Out[5]= *)
{{2009, 6, 1}, 52.72}
(*In[8]:= *)
meanspread = Mean[spread[[All, 2]]]
(*Out[8]= *)
74.0373
(*In[9]:= *)
First[Table[{gld[[i, 1]], (gld[[i, 2]] - gdx[[i, 2]]) - meanspread}, {i, 1, Length[gld], 1}]]
(*Out[9]= *)
{{2009, 6, 1}, -21.3173}
I think you would benefit from a simpler construction.
spread = {gld[[All, 1]], gld[[All, 2]] - gdx[[All, 2]] - meanspread}\[Transpose]
As already said, if meanspread is a single numerical value, and not a list, the output should be correct.

Mathematica - StringMatch Elements Within a List?

I have a function that returns cases from a table that match specific strings.
Once I get all the cases that match those strings, I need to search each case (which is its own list) for specific strings and do a Which command. But all I know how to do is turn the whole big list of lists into one string, and then I only get one result (when I need a result for each case).
UC@EncodeTable;
EncodeTable[id_?PersonnelQ, f___] :=
Cases[#,
x_List /;
MemberQ[x,
s_String /;
StringMatchQ[
s, ("*ah*" | "*bh*" | "*gh*" | "*kf*" |
"*mn*"), IgnoreCase -> True]], {1}] &@
Cases[MemoizeTable["PersonnelTable.txt"], {_, id, __}]
That function is returning cases from the table
Which[(StringMatchQ[
ToString@
EncodeTable[11282], ("*bh*" | "*ah*" |
"*gh*" ), IgnoreCase -> True]) == True, 1,
(StringMatchQ[
ToString@
EncodeTable[11282], ("*bh*" | "*ah*" |
"*gh*" ), IgnoreCase -> True]) == False, 0]
That function is SUPPOSED to return a 1 or 0 for each case returned by the first function, but I don't know how to search within lists without making them all one string and return a result for each list.
Well, you probably want Map, but it's hard to say without seeing what the structure of the data to be operated upon is. Perhaps you can provide an example.
EDIT: In the comment, an example result was given as
dat = {{204424, 11111, SQLDateTime[{1989, 4, 4, 0, 0, 0.}], Null,
"Parthom, Mary, MP", Null, 4147,
"T-00010 AH BH UI", {"T-00010 AH BH UI", "M-14007 LL GG",
"F-Y3710 AH LL UI GG"}, "REMOVED."}, {2040, 11111,
SQLDateTime[{1989, 4, 13, 0, 1, 0.}], Null, "KEVIN, Stevens, STK",
Null, 81238,
"T-00010 ah gh mn", {"T-00010 mn", "M-00100 dd", "P-02320 sd",
"M-14003 ed", "T-Y8800 kf", "kj"}}};
(actually the example had a syntax error so I fixed it in what I hope is the right way).
Now, if I define a function
func = Which[(StringMatchQ[#[[8]], ("*bh*" | "*ah*" | "*gh*"),
IgnoreCase -> True]) == True, 1, True, 0] &;
(note the second condition to be matched may be written as True, see the documentation of Which) which does this
func[dat[[1]]]
(*
-> 1
*)
(note that I've slightly changed func from what you have, in order for it to do what I assume you wanted it to actually do). This can then be applied to dat, of which the elements have the form you gave, as follows:
Map[func, dat]
(*
-> {1, 1}
*)
I'm not sure if this is what you want, I did my best guessing.
EDIT2: In response to the comment about the position of the element to be matched being variable, here is one way:
ClearAll[funcel]
funcel[p_String] :=
Which[StringMatchQ[p, ("*bh*" | "*ah*" | "*gh*"),
IgnoreCase -> True], 1, True, 0];
funcel[___] := 0;
ClearAll[func];
func[lst_List] := Which[MemberQ[Map[funcel, lst], 1], 1, True, 0]
so that
Map[func, dat]
gives {1,1}
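
The same map-a-test-over-each-record shape can be sketched in Python for comparison. Everything here is an assumption made for illustration: the records are shortened stand-ins for dat, and the helper names element_matches and record_flag are hypothetical. As with funcel, only top-level string elements are tested; nested lists and other non-strings count as no match.

```python
PATTERNS = ("bh", "ah", "gh")


def element_matches(element):
    """True if element is a string containing any pattern, ignoring case."""
    if not isinstance(element, str):
        return False
    lowered = element.lower()
    return any(pattern in lowered for pattern in PATTERNS)


def record_flag(record):
    """1 if any element of the record matches, otherwise 0."""
    return 1 if any(element_matches(e) for e in record) else 0


# Shortened, made-up stand-ins for the records in dat.
records = [
    [204424, 11111, None, "Parthom, Mary, MP", "T-00010 AH BH UI"],
    [2040, 11111, None, "KEVIN, Stevens, STK", "T-00010 ah gh mn"],
    [1, 2, None, "no codes here"],
]
flags = [record_flag(record) for record in records]
```

The result has one entry per record, which is the point of mapping the test instead of converting the whole table to a single string.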