Mathematica - StringMatch Elements Within a List? - list

I have a function that returns cases from a table that match specific strings.
Once I get all the cases that match those strings, I need to search each case (which is its own list) for specific strings and do a Which command. But all I know how to do is turn the whole big list of lists into one string, and then I only get one result (when I need a result for each case).
UC@EncodeTable;
EncodeTable[id_?PersonnelQ, f___] :=
Cases[#,
x_List /;
MemberQ[x,
s_String /;
StringMatchQ[
s, ("*ah*" | "*bh*" | "*gh*" | "*kf*" |
"*mn*"), IgnoreCase -> True]], {1}] &@
Cases[MemoizeTable["PersonnelTable.txt"], {_, id, __}]
That function is returning cases from the table
Which[(StringMatchQ[
ToString@
EncodeTable[11282], ("*bh*" | "*ah*" |
"*gh*" ), IgnoreCase -> True]) == True, 1,
(StringMatchQ[
ToString@
EncodeTable[11282], ("*bh*" | "*ah*" |
"*gh*" ), IgnoreCase -> True]) == False, 0]
That function is SUPPOSED to return a 1 or 0 for each case returned by the first function, but I don't know how to search within lists without making them all one string and return a result for each list.

Well, you probably want Map, but it's hard to say without seeing the structure of the data to be operated upon. Perhaps you can provide an example.
EDIT: In the comment, an example result was given as
dat = {{204424, 11111, SQLDateTime[{1989, 4, 4, 0, 0, 0.}], Null,
"Parthom, Mary, MP", Null, 4147,
"T-00010 AH BH UI", {"T-00010 AH BH UI", "M-14007 LL GG",
"F-Y3710 AH LL UI GG"}, "REMOVED."}, {2040, 11111,
SQLDateTime[{1989, 4, 13, 0, 1, 0.}], Null, "KEVIN, Stevens, STK",
Null, 81238,
"T-00010 ah gh mn", {"T-00010 mn", "M-00100 dd", "P-02320 sd",
"M-14003 ed", "T-Y8800 kf", "kj"}}};
(actually the example had a syntax error so I fixed it in what I hope is the right way).
Now, if I define a function
func = Which[(StringMatchQ[#[[8]], ("*bh*" | "*ah*" | "*gh*"),
IgnoreCase -> True]) == True, 1, True, 0] &;
(note the second condition to be matched may be written as True, see the documentation of Which) which does this
func[dat[[1]]]
(*
-> 1
*)
(note that I've slightly changed func from what you have, in order for it to do what I assume you wanted it to actually do). This can then be applied to dat, of which the elements have the form you gave, as follows:
Map[func, dat]
(*
-> {1, 1}
*)
I'm not sure if this is what you want, I did my best guessing.
EDIT2: In response to the comment about the position of the element to be matched being variable, here is one way:
ClearAll[funcel]
funcel[p_String] :=
Which[StringMatchQ[p, ("*bh*" | "*ah*" | "*gh*"),
IgnoreCase -> True], 1, True, 0];
funcel[___] := 0;
ClearAll[func];
func[lst_List] := Which[MemberQ[Map[funcel, lst], 1], 1, True, 0]
so that
Map[func, dat]
gives {1,1}
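For readers who want to see the same per-row idea outside Mathematica, here is a minimal sketch in Python. The names `PATTERN` and `row_matches` are hypothetical, and the rows are trimmed stand-ins for `dat`, not the full records. Like `func` above, it only looks at top-level string elements and ignores nested lists.

```python
import re

# Hypothetical pattern mirroring "*bh*" | "*ah*" | "*gh*" with IgnoreCase -> True.
PATTERN = re.compile(r"bh|ah|gh", re.IGNORECASE)

def row_matches(row):
    """Return 1 if any top-level string in the row contains bh, ah or gh, else 0."""
    return int(any(isinstance(x, str) and PATTERN.search(x) for x in row))

# Trimmed stand-ins for rows of dat (assumed data, for illustration only).
rows = [
    [204424, 11111, None, "Parthom, Mary, MP", 4147, "T-00010 AH BH UI"],
    [2040, 11111, None, "KEVIN, Stevens, STK", 81238, "T-00010 ah gh mn"],
    [1, 2, None, "no codes here", ["ah nested"]],
]

# One result per row, the equivalent of Map[func, dat].
flags = [row_matches(r) for r in rows]
print(flags)
```

The third row yields 0 because its only match sits inside a nested list, which this sketch deliberately skips.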


Let a variable equal multiple values in an if-statement [duplicate]

I am doing data clean-up in Stata and I need to recode a variable to equal 1 if a whole set of other variables are equal to 1, 6, or 7.
I can do this using the code below:
replace anyadl = 1 if diffdress==1 | diffdress==6 | diffdress==7 | ///
diffwalk==1 | diffwalk==6 | diffwalk==7 | ///
diffbath==1 | diffbath==6 | diffbath==7 | ///
diffeat==1 | diffeat==6 | diffeat==7 | ///
diffbed==1 | diffbed==6 | diffbed==7 | ///
difftoi==1 | difftoi==6 | difftoi==7
However, this is very inefficient to type out and it is easy to make errors.
Is there a simpler way to do this?
For example, something along the following lines:
replace anyadl = 1 if diff* == (1 | 6 | 7)
Your fantasy syntax wouldn't do what you want even if it were legal, as for example 1|6|7 would be evaluated as 1. That is, in Stata 1 OR 6 OR 7 is in effect true OR true OR true, so true, and thus 1, given the rules that non-zero is true as input and true is 1 as output. The expression 1|6|7 is legal; it's the wildcard in an equality or inequality that isn't.
Stepping back, your code is producing an indicator (some people say dummy) variable with values 1 or missing. In practice such a variable is much more useful if created with values 0 and 1 (and in some instances missing too).
generate anyad1 = 0
foreach v in dress walk bath eat bed toi {
replace anyad1 = 1 if inlist(diff`v', 1, 6, 7)
}
is one approach. In general, note both inlist(foo, 1, 6, 7) and inlist(1, foo, bar, bazz) as useful constructs.
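The logic of that loop (set the indicator to 1 if any of the listed variables takes one of the listed codes, otherwise leave it at 0) can be sketched outside Stata as well. The following Python version is only an illustration: the records are made up, and `any_adl`, `VARS` and `CODES` are hypothetical names.

```python
# Variable names taken from the question; the records below are invented.
VARS = ["diffdress", "diffwalk", "diffbath", "diffeat", "diffbed", "difftoi"]
CODES = {1, 6, 7}

def any_adl(record):
    """Return 1 if any diff* value is 1, 6 or 7, else 0 (never missing)."""
    return int(any(record.get(v) in CODES for v in VARS))

records = [
    {"diffdress": 2, "diffwalk": 6, "diffbath": 2, "diffeat": 2, "diffbed": 2, "difftoi": 2},
    {"diffdress": 2, "diffwalk": 2, "diffbath": 2, "diffeat": 2, "diffbed": 2, "difftoi": 2},
]

indicators = [any_adl(r) for r in records]
print(indicators)
```

As in the Stata answer, the result is a 0/1 indicator; how missing values should be treated is left open here.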
Reading:
This paper on generating indicators
This one on useful functions
This one on inlist() and inrange()
FAQ on true and false in Stata

Keep only numbers in a string / remove all non-numbers

I have a string column where I only need the numbers from each string, e.g.
A-123 -> 123
456 -> 456
7-X89 -> 789
How can this be done in PowerQuery?
Add column. In custom column formula type this one-liner:
= Text.Select( [Column], {"0".."9"} )
where [Column] is a string column with a mix of digits and other characters. It extracts numbers only. The new column is still a text column, so you have to change the type.
Edit. If there are dots and minus characters:
= Text.Select( [Column1], {"0".."9", "-", "."} )
Alternatively, you can transform the existing column:
= Table.TransformColumns( #"PreviousStepName" , {{"Column", each Text.Select( _ , {"0".."9","-","."} ) }} )
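To make the behaviour of the character filter concrete, here is a small sketch in Python of the same idea (keep only the characters from an allowed set, in order). It is not Power Query code, and `digits_only` is a hypothetical helper name; the sample strings are the ones from the question.

```python
def digits_only(text):
    """Keep only the characters '0'..'9', in their original order."""
    return "".join(ch for ch in text if ch in "0123456789")

samples = ["A-123", "456", "7-X89"]
cleaned = [digits_only(s) for s in samples]
print(cleaned)
```

As with `Text.Select`, the result is still text, so a separate conversion step is needed if a numeric type is wanted.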
An alternative solution is to split the values on each number, and remove blanks from the resulting list.
This result can be used as a new list of delimiters to be used with function Splitter.SplitTextByEachDelimiter to split the original text again and combine the resulting list to the final result.
Explanation: Splitter.SplitTextByEachDelimiter first splits on the first delimiter in the list, then on the second and so on. Note that this function creates a function that must be called with the original string as parameter, so like S.S(delimiters)(string).
Example code:
let
Source = Table1,
NumbersOnly = Table.TransformColumns(Source,{{"String", (string) => Text.Combine(Splitter.SplitTextByEachDelimiter(List.Select(Text.SplitAny(string,"0123456789"), each _ <> ""))(string))}})
in
NumbersOnly
First, create a custom function in PowerQuery using New Query - From Other Sources -> Blank Query. Open the Advanced Editor and paste the following code:
(source) =>
let
NumbersOnly = (char) => if Character.ToNumber(char) >=48 and Character.ToNumber(char) < 58 then char else "",
Len = Text.Length(source),
Acc = List.Accumulate(
List.Generate( () => 0, each _ < Len, each _ + 1),
"",
(acc, index) => acc& NumbersOnly(Text.At(source, index))
),
AsNumber = Number.FromText(Acc)
in
AsNumber
Name this query NumbersOnly.
Now in your main query, add another calculated column where you call this NumbersOnly function with the source column, e.g.:
let
Source = Table.FromRecords({[text="A-123"], [text="456"], [text="7-X89"]}),
Result = Table.AddColumn(Source, "Values", each NumbersOnly([text]), Int64.Type)
in
Result

How to populate missing values for string variable in a column based on fixed criteria

To populate missing data with a fixed range of values
I would like to check how to populate column aktype with a range of values (the range of values for the same pidlink are always fixed at 11 types of values listed below) for those cells with missing values. I have about 17,000+ observations that are missing.
The range of values are as follows:
A
B
C
D
E
G
H
I
J
K
L
I have tried the following command but it does not work:-
foreach x of varlist aktype=1/11 {
replace aktype = "A" in 1 if aktype==""
replace aktype = "B" in 2 if aktype==""
replace aktype = "C" in 3 if aktype==""
replace aktype = "D" in 4 if aktype==""
replace aktype = "E" in 5 if aktype==""
replace aktype = "G" in 6 if aktype==""
replace aktype = "H" in 7 if aktype==""
replace aktype = "I" in 8 if aktype==""
replace aktype = "J" in 9 if aktype==""
replace aktype = "K" in 10 if aktype==""
replace aktype = "L" in 11 if aktype==""
}
Would appreciate it if you could advise on the right command to use. Many thanks!
I would generate a variable AK that has letters A-K in positions 1-11 (and 12-22, and 23-33, and so on). Then replace missing values with the value of this variable AK.
* generate data
clear
set obs 20
generate aktype = ""
replace aktype = "foo" in 1/1
replace aktype = "bar" in 10/12
* generate variable with letters A-K
generate AK = char(65 + mod(_n - 1, 11))
* fill missing values
replace aktype = AK if missing(aktype)
list
This yields the following.
. list
+-------------+
| aktype AK |
|-------------|
1. | foo A |
2. | B B |
3. | C C |
4. | D D |
5. | E E |
|-------------|
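Note that `char(65 + mod(_n - 1, 11))` produces the contiguous letters A through K, whereas the list in the question skips F and ends at L. A minimal sketch of the same position-based fill using an explicit list of codes is given below in Python. It assumes, as the answer above does, that the code to assign depends only on the row position; `fill_missing` and the sample column are hypothetical.

```python
# The 11 codes listed in the question (F is absent, L is included).
CODES = ["A", "B", "C", "D", "E", "G", "H", "I", "J", "K", "L"]

def fill_missing(values):
    """Replace empty strings with the code for that row position, cycling every 11 rows."""
    return [v if v != "" else CODES[i % len(CODES)] for i, v in enumerate(values)]

# Invented column: "" marks a missing value.
column = ["foo", "", "", "", "", "", "", "", "", "bar", "bar", "bar", "", ""]
filled = fill_missing(column)
print(filled)
```

Whether row position is the right rule for the real data cannot be determined from the question, so treat this purely as an illustration of the mechanism.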
This first addresses the comment "it does not work".
Generally, in this kind of forum you should always be specific and say exactly what happens, namely where the code breaks down and what the result is (e.g. what error message you get). If necessary, add why that is not what is wanted.
Specifically, in this case Stata would get no further than
foreach x of varlist aktype=1/11
which is illegal (as well as unclear to Stata programmers).
You can loop over a varlist. In this case looping over a single variable aktype is legal. (It is usually pointless, but that's style, not syntax.) So this is legal:
foreach x of varlist aktype
By the way, you define x as the loop argument, but never refer to it inside the loop. That isn't illegal, but it is unusual.
You can also loop over a numlist, e.g.
foreach x of numlist 1/11
although
forval x = 1/11
is a more direct way of doing that. All this follows from the syntax diagrams for the commands concerned, where whatever is not explicitly allowed is forbidden.
On occasions when you need to loop over a varlist and a numlist you will need to use different syntax, but what is best depends on the precise problem.
Now second to the question: I can't see any kind of rule in the question for which values get assigned A through L, so can't advise positively.

How to remove duplicate values from list using scala?

I have following list
List(List
(43673,38448,512,36398,1500,**BpEwv+EcDv3z**,58f39535-03b7-4e05-a2d8-3f5b424c8938),
List(302750,759,512,759,3796,**BpEwv+EcDv3v**,069865df-30c3-48c3-bf02-79f2fcff7213),
List(616278,1600,512,107418,15255,**BpEwv+EcDv3v**,b373b731-6f38-4559-808e-1c05fc06af00),
List(0,0,512,0,0,**BpEwv+EcDv3z**,24894b9f-9e30-4073-a538-186a312c670e)
)
I want to remove rows whose value marked in bold (the 6th element of each inner list, i.e. index 5) duplicates that of an earlier row. The sequence of elements is fixed.
Expected output:
List(
List(43673,38448,512,36398,1500,BpEwv+EcDv3z,58f39535-03b7-4e05-a2d8-3f5b424c8938),
List(302750,759,512,759,3796,BpEwv+EcDv3v,069865df-30c3-48c3-bf02-79f2fcff7213))
How do I remove duplicate values from a list using Scala?
If you want to remove all occurrences of a specific value in all Lists you can use the following code:
val lss = List(List(1,2,2), List(1,2,3,4,2))
lss map (_.filter(_ != 2)) // List(List(1), List(1, 3, 4))
which removes all occurrences of 2 in all Lists.
If you want to get a single List in return you can use flatMap:
lss flatMap (_.filter(_ != 2)) // List(1, 1, 3, 4)
Based on your expected output, you can do something like
scala> val a = List(List
| (43673,38448,512,36398,1500,"BpEwv+EcDv3z","58f39535-03b7-4e05-a2d8-3f5b424c8938"),
| List(302750,759,512,759,3796,"BpEwv+EcDv3v","069865df-30c3-48c3-bf02-79f2fcff7213"),
| List(616278,1600,512,107418,15255,"BpEwv+EcDv3v","b373b731-6f38-4559-808e-1c05fc06af00"),
| List(0,0,512,0,0,"BpEwv+EcDv3z","24894b9f-9e30-4073-a538-186a312c670e")
| )
a: List[List[Any]] = List(List(43673, 38448, 512, 36398, 1500, BpEwv+EcDv3z, 58f39535-03b7-4e05-a2d8-3f5b424c8938), List(302750, 759, 512, 759, 3796, BpEwv+EcDv3v, 069865df-30c3-48c3-bf02-79f2fcff7213), List(616278, 1600, 512, 107418, 15255, BpEwv+EcDv3v, b373b731-6f38-4559-808e-1c05fc06af00), List(0, 0, 512, 0, 0, BpEwv+EcDv3z, 24894b9f-9e30-4073-a538-186a312c670e))
scala> a.groupBy(_(5)).mapValues(_(0)).map(_._2)
res0: scala.collection.immutable.Iterable[List[Any]] = List(List(302750, 759, 512, 759, 3796, BpEwv+EcDv3v, 069865df-30c3-48c3-bf02-79f2fcff7213), List(43673, 38448, 512, 36398, 1500, BpEwv+EcDv3z, 58f39535-03b7-4e05-a2d8-3f5b424c8938))
You can also do the following, which reads a little better:
scala> a.groupBy(_(5)).mapValues(_(0)).values.toList
res6: List[List[Any]] = List(List(302750, 759, 512, 759, 3796, BpEwv+EcDv3v, 069865df-30c3-48c3-bf02-79f2fcff7213), List(43673, 38448, 512, 36398, 1500, BpEwv+EcDv3z, 58f39535-03b7-4e05-a2d8-3f5b424c8938))
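One thing to watch with the groupBy approach: the results above come back with the BpEwv+EcDv3v row first, while the expected output in the question has the BpEwv+EcDv3z row first, because grouping into a map does not keep the input order. A minimal sketch of an order-preserving alternative (keep the first row seen for each key) is shown below in Python rather than Scala; `dedupe_by_index` is a hypothetical helper name.

```python
def dedupe_by_index(rows, key_index=5):
    """Keep the first row for each distinct value at key_index, preserving input order."""
    seen = set()
    result = []
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

# The four rows from the question.
rows = [
    [43673, 38448, 512, 36398, 1500, "BpEwv+EcDv3z", "58f39535-03b7-4e05-a2d8-3f5b424c8938"],
    [302750, 759, 512, 759, 3796, "BpEwv+EcDv3v", "069865df-30c3-48c3-bf02-79f2fcff7213"],
    [616278, 1600, 512, 107418, 15255, "BpEwv+EcDv3v", "b373b731-6f38-4559-808e-1c05fc06af00"],
    [0, 0, 512, 0, 0, "BpEwv+EcDv3z", "24894b9f-9e30-4073-a538-186a312c670e"],
]

unique_rows = dedupe_by_index(rows)
for row in unique_rows:
    print(row)
```

The same first-seen-wins idea could be written in Scala with a fold over the list and a set of seen keys; that variant is not shown here.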

Fast list-product sign for PackedArray?

As a continuation of my previous question, Simon's method to find the list product of a PackedArray is fast, but it does not work with negative values.
This can be "fixed" by Abs with minimal time penalty, but the sign is lost, so I will need to find the product sign separately.
The fastest method that I tried is EvenQ @ Total @ UnitStep[-lst]
lst = RandomReal[{-2, 2}, 5000000];
Do[
EvenQ@Total@UnitStep[-lst],
{30}
] // Timing
Out[]= {3.062, Null}
Is there a faster way?
This is a little over two times faster than your solution and, apart from the nonsense of using Rule@@@ to extract the relevant term, I find it clearer: it simply counts the number of elements with each sign.
EvenQ[-1 /. Rule@@@Tally@Sign[lst]]
To compare timings (and outputs)
In[1]:= lst=RandomReal[{-2,2},5000000];
s=t={};
Do[AppendTo[s,EvenQ@Total@UnitStep[-lst]],{10}];//Timing
Do[AppendTo[t,EvenQ[-1/.Rule@@@Tally@Sign[lst]]],{10}];//Timing
s==t
Out[3]= {2.11,Null}
Out[4]= {0.96,Null}
Out[5]= True
A bit late-to-the-party post: if you are ultimately interested in speed, Compile with the C compilation target seems to be about twice as fast as the fastest solution posted so far (Tally - Sign based):
fn = Compile[{{l, _Real, 1}},
Module[{sumneg = 0},
Do[If[i < 0, sumneg++], {i, l}];
EvenQ[sumneg]], CompilationTarget -> "C",
RuntimeOptions -> "Speed"];
Here are the timings on my machine:
In[85]:= lst = RandomReal[{-2, 2}, 5000000];
s = t = q = {};
Do[AppendTo[s, EvenQ@Total@UnitStep[-lst]], {10}]; // Timing
Do[AppendTo[t, EvenQ[-1 /. Rule @@@ Tally@Sign[lst]]], {10}]; // Timing
Do[AppendTo[q, fn [lst]], {10}]; // Timing
s == t == q
Out[87]= {0.813, Null}
Out[88]= {0.515, Null}
Out[89]= {0.266, Null}
Out[90]= True
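All three approaches rest on the same observation: the product is positive exactly when the number of negative factors is even (assuming no factor is zero). A minimal sketch of that parity check, written in Python for illustration and making no claim about speed, could look like this; `has_even_negative_count` is a hypothetical name.

```python
def has_even_negative_count(values):
    """True when an even number of elements are negative.

    If no element is zero, this means the product of the values is positive.
    """
    negatives = sum(1 for v in values if v < 0)
    return negatives % 2 == 0

sample = [-1.5, 2.0, -0.5, 0.25]
print(has_even_negative_count(sample))
```

Zeros are not counted as negative here, so a list containing a zero needs separate handling if the sign of the product matters.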