So I have a string that is treated as a list when I split them in the \n and it looks like this:
{aaaa2,bbbb2,cccccc,ddddd\n aaaa3,bbbb4,cccccc,ddddd\n aaaa4,bbbb1,cccccc,ddddd\n aaaa5,bbbb3,cccccc,ddddd}
I need to reorder this list based on B column, in a way that the lesser value comes first and equal values stay together. Meaning it should be something like this:
{aaaa4,bbbb1,cccccc,ddddd\n aaaa2,bbbb2,cccccc,ddddd\n aaaa5,bbbb3,cccccc,ddddd\n aaaa3,bbbb4,cccccc,ddddd}
How can I do it?
If you split each element of your list into a list itself, you can use lsort's -index option:
set lst {aaaa2,bbbb2,cccccc,ddddd
aaaa3,bbbb4,cccccc,ddddd
aaaa4,bbbb1,cccccc,ddddd
aaaa5,bbbb3,cccccc,ddddd}
set lst [lmap item $lst { split $item , }]
set lst [lsort -index 1 $lst]
puts [join [lmap sublist $lst { join $sublist , }] \n]
will output
aaaa4,bbbb1,cccccc,ddddd
aaaa2,bbbb2,cccccc,ddddd
aaaa5,bbbb3,cccccc,ddddd
aaaa3,bbbb4,cccccc,ddddd
I'm trying to do something fairly simple (I think) but I can't get my head round it. I'm trying to write a loop that checks if a character variable in a data frame contains any of a certain list of substrings, and to assign a corresponding value to a dummy variable.
so, imagine a data.frame, n=2000, with a variable data.frame$text. Furthermore, I have a character vector containing all the substrings I want to text data.frame$text for. Let's call it hillary_exists :
hillary_exists <- c("Hilary Clinton", "hilary clinton","hilaryclinton", "hillaryclinton", "HilaryClinton",
"HillaryClinton","Hillary Clinton", "Hillary Rodham Clinton", "Hillary", "Hilary", "#Hillary2016", "#ImWithHer",
"Hillary2016", "hillary", "hilary", "Clinton 2016", "Clinton", "Secretary of State Clinton",
"Senator Clinton", "Hilary Rodham", "Hilary Rodham Clinton", "Hilary Rodham-Clinton", "Hillary Rodham-Clinton")
Now, I want my loop to test every row of data.frame$text for the existence of every element of hillary_exists, and if any of them is TRUE, to generate a new value of 1 for the variable data.frame$hillary_mention . This is what I tried:
for(i in hillary_exists){
if(grepl(hillary_exists[i], data.frame$text)){
data.frame$hillary_mention <- 1
} else {
data.frame$hillary_mention <- 0 }
}
But obviously I'm missing the i component for the data.frame$text element, but I don't know how to address it.
Any help would be greatly appreciated! Thanks
One approach we can use to get this to work is to turn hillary_exists into a regex: hillary_regex <- paste(hillary_exists, collapse = "|"). Essentially, this just takes all of your terms and turns it into a big OR statement. This takes care of one of the loops for us automatically. Next, we just loop over our text column, data.frame$text, using sapply.
data.frame$hillary_mention <- sapply(data.frame$text, function(s) grepl(hillary_regex, s, ignore.case = TRUE))
It's good to use ignore.case = TRUE here because there may be mentions in the text that aren't accounted for in hillary_exists, such as "hIllary cLinTon".
Say I have a TCL list, and I have append some elements to my list. Now I want to check if I have appended 6 or 7 elements.
In order to check if list element exists in the place specified by an index I have used:
if { [info exists [lindex $myList 6]] } {
#if I am here then I have appended 7 elems, otherwise it should be at least 6
}
But seams this does not work. How I should do that? properly? It is OK to check if { [lindex $myList 6]] eq "" }
I found this question because I wanted to check if a list contains a specific item, rather than just checking the list's length.
To see if an element exists within a list, use the lsearch function:
if {[lsearch -exact $myList 4] >= 0} {
puts "Found 4 in myList!"
}
The lsearch function returns the index of the first found element or -1 if the given element wasn't found. Through the -exact, -glob (which is the default) or -regexp options, the type of pattern search can be specified.
Why don't you use llength to check the length of your list:
if {[llength $myList] == 6} {
# do something
}
Of course, if you want to check the element at a specific index, then then use lindex to retrieve that element and check that. e.g. if {[lindex $myList 6] == "something"}
Your code using the info exists is not working, because the info exists command checks if a variable exists. So you are basically checking if there is a variable whose name equals the value returned by [lindex $myList 6].
Another way to check for existence in a list in TCL is to simply use 'in', for instance:
if {"4" in $myList} {
puts "Found 4 in my list"
}
It's slightly cleaner/more readable!
Not really getting the point of the map function. Can anyone explain with examples its use?
Are there any performance benefits to using this instead of a loop or is it just sugar?
Any time you want to generate a list based another list:
# Double all elements of a list
my #double = map { $_ * 2 } (1,2,3,4,5);
# #double = (2,4,6,8,10);
Since lists are easily converted pairwise into hashes, if you want a hash table for objects based on a particular attribute:
# #user_objects is a list of objects having a unique_id() method
my %users = map { $_->unique_id() => $_ } #user_objects;
# %users = ( $id => $obj, $id => $obj, ...);
It's a really general purpose tool, you have to just start using it to find good uses in your applications.
Some might prefer verbose looping code for readability purposes, but personally, I find map more readable.
First of all, it's a simple way of transforming an array: rather than saying e.g.
my #raw_values = (...);
my #derived_values;
for my $value (#raw_values) {
push (#derived_values, _derived_value($value));
}
you can say
my #raw_values = (...);
my #derived_values = map { _derived_value($_) } #raw_values;
It's also useful for building up a quick lookup table: rather than e.g.
my $sentence = "...";
my #stopwords = (...);
my #foundstopwords;
for my $word (split(/\s+/, $sentence)) {
for my $stopword (#stopwords) {
if ($word eq $stopword) {
push (#foundstopwords, $word);
}
}
}
you could say
my $sentence = "...";
my #stopwords = (...);
my %is_stopword = map { $_ => 1 } #stopwords;
my #foundstopwords = grep { $is_stopword{$_} } split(/\s+/, $sentence);
It's also useful if you want to derive one list from another, but don't particularly need to have a temporary variable cluttering up the place, e.g. rather than
my %params = ( username => '...', password => '...', action => $action );
my #parampairs;
for my $param (keys %params) {
push (#parampairs, $param . '=' . CGI::escape($params{$param}));
}
my $url = $ENV{SCRIPT_NAME} . '?' . join('&', #parampairs);
you say the much simpler
my %params = ( username => '...', password => '...', action => $action );
my $url = $ENV{SCRIPT_NAME} . '?'
. join('&', map { $_ . '=' . CGI::escape($params{$_}) } keys %params);
(Edit: fixed the missing "keys %params" in that last line)
The map function is used to transform lists. It's basically syntactic sugar for replacing certain types of for[each] loops. Once you wrap your head around it, you'll see uses for it everywhere:
my #uppercase = map { uc } #lowercase;
my #hex = map { sprintf "0x%x", $_ } #decimal;
my %hash = map { $_ => 1 } #array;
sub join_csv { join ',', map {'"' . $_ . '"' } #_ }
See also the Schwartzian transform for advanced usage of map.
It's also handy for making lookup hashes:
my %is_boolean = map { $_ => 1 } qw(true false);
is equivalent to
my %is_boolean = ( true => 1, false => 1 );
There's not much savings there, but suppose you wanted to define %is_US_state?
map is used to create a list by transforming the elements of another list.
grep is used to create a list by filtering elements of another list.
sort is used to create a list by sorting the elements of another list.
Each of these operators receives a code block (or an expression) which is used to transform, filter or compare elements of the list.
For map, the result of the block becomes one (or more) element(s) in the new list. The current element is aliased to $_.
For grep, the boolean result of the block decides if the element of the original list will be copied into the new list. The current element is aliased to $_.
For sort, the block receives two elements (aliased to $a and $b) and is expected to return one of -1, 0 or 1, indicating whether $a is greater, equal or less than $b.
The Schwartzian Transform uses these operators to efficiently cache values (properties) to be used in sorting a list, especially when computing these properties has a non-trivial cost.
It works by creating an intermediate array which has as elements array references with the original element and the computed value by which we want to sort. This array is passed to sort, which compares the already computed values, creating another intermediate array (this one is sorted) which in turn is passed to another map which throws away the cached values, thus restoring the array to its initial list elements (but in the desired order now).
Example (creates a list of files in the current directory sorted by the time of their last modification):
#file_list = glob('*');
#file_modify_times = map { [ $_, (stat($_))[8] ] } #file_list;
#files_sorted_by_mtime = sort { $a->[1] <=> $b->[1] } #file_modify_times;
#sorted_files = map { $_->[0] } #files_sorted_by_mtime;
By chaining the operators together, no declaration of variables is needed for the intermediate arrays;
#sorted_files = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, (stat($_))[8] ] } glob('*');
You can also filter the list before sorting by inserting a grep (if you want to filter on the same cached value):
Example (a list of the files modified in the last 24 hours sorted the last modification time):
#sorted_files = map { $_->[0] } sort { $a->[1] <=> $b->[1] } grep { $_->[1] > (time - 24 * 3600 } map { [ $_, (stat($_))[8] ] } glob('*');
The map function is an idea from the functional programming paradigm. In functional programming, functions are first-class objects, meaning that they can be passed as arguments to other functions. Map is a simple but a very useful example of this. It takes as its arguments a function (lets call it f) and a list l. f has to be a function taking one argument, and map simply applies f to every element of the list l. f can do whatever you need done to every element: add one to every element, square every element, write every element to a database, or open a web browser window for every element, which happens to be a valid URL.
The advantage of using map is that it nicely encapsulates iterating over the elements of the list. All you have to do is say "do f to every element, and it is up to map to decide how best to do that. For example map may be implemented to split up its work among multiple threads, and it would be totally transparent to the caller.
Note, that map is not at all specific to Perl. It is a standard technique used by functional languages. It can even be implemented in C using function pointers, or in C++ using "function objects".
The map function runs an expression on each element of a list, and returns the list results. Lets say I had the following list
#names = ("andrew", "bob", "carol" );
and I wanted to capitalize the first letter of each of these names. I could loop through them and call ucfirst of each element, or I could just do the following
#names = map (ucfirst, #names);
"Just sugar" is harsh. Remember, a loop is just sugar -- if's and goto can do everything loop constructs do and more.
Map is a high enough level function that it helps you hold much more complex operations in your head, so you can code and debug bigger problems.
To paraphrase "Effective Perl Programming" by Hall & Schwartz,
map can be abused, but I think that it's best used to create a new list from an existing list.
Create a list of the squares of 3,2, & 1:
#numbers = (3,2,1);
#squares = map { $_ ** 2 } #numbers;
Generate password:
$ perl -E'say map {chr(32 + 95 * rand)} 1..16'
# -> j'k=$^o7\l'yi28G
You use map to transform a list and assign the results to another list, grep to filter a list and assign the results to another list. The "other" list can be the same variable as the list you are transforming/filtering.
my #array = ( 1..5 );
#array = map { $_+5 } #array;
print "#array\n";
#array = grep { $_ < 7 } #array;
print "#array\n";
It allows you to transform a list as an expression rather than in statements. Imagine a hash of soldiers defined like so:
{ name => 'John Smith'
, rank => 'Lieutenant'
, serial_number => '382-293937-20'
};
then you can operate on the list of names separately.
For example,
map { $_->{name} } values %soldiers
is an expression. It can go anywhere an expression is allowed--except you can't assign to it.
${[ sort map { $_->{name} } values %soldiers ]}[-1]
indexes the array, taking the max.
my %soldiers_by_sn = map { $->{serial_number} => $_ } values %soldiers;
I find that one of the advantages of operational expressions is that it cuts down on the bugs that come from temporary variables.
If Mr. McCoy wants to filter out all the Hatfields for consideration, you can add that check with minimal coding.
my %soldiers_by_sn
= map { $->{serial_number}, $_ }
grep { $_->{name} !~ m/Hatfield$/ }
values %soldiers
;
I can continue chaining these expression so that if my interaction with this data has to reach deep for a particular purpose, I don't have to write a lot of code that pretends I'm going to do a lot more.
It's used anytime you would like to create a new list from an existing list.
For instance you could map a parsing function on a list of strings to convert them to integers.
As others have said, map creates lists from lists. Think of "mapping" the contents of one list into another. Here's some code from a CGI program to take a list of patent numbers and print hyperlinks to the patent applications:
my #patents = ('7,120,721', '6,809,505', '7,194,673');
print join(", ", map { "$_" } #patents);
As others have said, map is most useful for transforming a list. What hasn't been mentioned is the difference between map and an "equivalent" for loop.
One difference is that for doesn't work well for an expression that modifies the list its iterating over. One of these terminates, and the other doesn't:
perl -e '#x=("x"); map { push #x, $_ } #x'
perl -e '#x=("x"); push #x, $_ for #x'
Another small difference is that the context inside the map block is a list context, but the for loop imparts a void context.