This question already has answers here:
What is the purpose of the single underscore "_" variable in Python?
(5 answers)
Closed 6 years ago.
I am using Python to learn data science. Everything was fine, but recently I found the code below in a book, and I can't understand what the '_' is being used for.
from collections import Counter
def raw_majority_vote(labels):
    votes = Counter(labels)
    winner, _ = votes.most_common(1)[0]
    return winner
In the code you posted, _ is just a variable name.
You can assign values to _.
For example:
>>> _ = "test"
>>> print(_)
Output:
test
If you look at the Counter.most_common() docs, you'll see this:
Return a list of the n most common elements and their counts from the
most common to the least. If n is omitted or None, most_common()
returns all elements in the counter. Elements with equal counts are
ordered arbitrarily:
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
So, in your code, winner, _ = votes.most_common(1)[0]
The variable winner gets the first value of the first tuple contained in this most_common list.
And the variable, _, gets the second value of the first tuple in this list.
For the 'abracadabra' example above, that gives:
winner = 'a'
_ = 5
It's a throwaway variable. votes.most_common(1)[0] is a (value, count) tuple that unpacks into two values, and the author of that script is only interested in the first one.
It is usually used when you don't care about a returned value and want to discard it, while still unpacking the right number of values so that no ValueError is raised.
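A minimal runnable sketch of the book's function, fed a hypothetical list of string labels, shows what ends up in winner and what is thrown away in _:

```python
from collections import Counter

def raw_majority_vote(labels):
    votes = Counter(labels)
    # most_common(1) returns a one-element list such as [('a', 3)];
    # [0] takes that (label, count) tuple and _ receives the unused count
    winner, _ = votes.most_common(1)[0]
    return winner

labels = ["a", "b", "a", "c", "a"]  # hypothetical example labels
print(raw_majority_vote(labels))     # a

pair = Counter(labels).most_common(1)[0]
print(pair)                          # ('a', 3)

# Unpacking needs exactly one name per element: three names for a 2-tuple fails
try:
    x, y, z = pair
except ValueError as exc:
    print("ValueError:", exc)
```

Writing winner, _ instead of winner, count simply signals to readers that the count is deliberately ignored; Python treats _ like any other name.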
This question already has answers here:
What does `:_*` (colon underscore star) do in Scala?
(4 answers)
Closed 2 years ago.
When reading some co-workers' Scala Spark code, I sometimes see them use lists to filter DataFrames, as in this example:
val myList: List[String] = List("0661", "0239", "0949", "0380", "0279", "0311")
df.filter(col("col1").isin(myList: _*))
The code above works perfectly; this one, however, does not:
df.filter(col("col1").isin(myList))
What I don't understand is, what is that "colon underscore star" :_* exactly doing?
Thanks in advance!
It means "pass the list as separate parameters". It works for methods that have a varargs parameter, like "any number of strings", rather than one that takes a List[String].
Spark's isin function has the signature isin(list: Any*): Column, where Any* means "any number of arguments of type Any". That's not very descriptive, but it means you can pass either any number of strings or any number of columns.
With the :_* syntax, you're telling the compiler "expand my list into varargs"; it's equivalent to writing .isin("0661", "0239", ...).
Also, since Spark 2.4.0 there's a function isInCollection that takes an Iterable, so you can pass the List to it directly.
This is sometimes called the splat operator. It is used to adapt a sequence (Array, List, Seq, Vector, etc.) so it can be passed as the argument for a varargs method parameter:
def printAll(strings: String*): Unit = {
  strings.foreach(println)
}
val fruits = List("apple", "banana", "cherry")
printAll(fruits: _*)
If a method has a repeated parameter and you want to pass a sequence (such as a List) to it, you use :_* to convert the sequence into the repeated parameter:
def x(y: Int*): Seq[Int] = { // y: Int* is a repeated parameter
  y
}
x(List(1, 2, 3, 4): _*) // passing a List into the repeated parameter
This question already has answers here:
How to get the n next values of a generator into a list
(5 answers)
Fetch first 10 results from a list in Python
(4 answers)
Closed 9 days ago.
With LINQ I would write:
var top5 = array.Take(5);
How do I do this in Python?
Slicing a list
top5 = array[:5]
To slice a list, there's a simple syntax: array[start:stop:step]
You can omit any parameter. These are all valid: array[start:], array[:stop], array[::step]
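A quick illustration of the three slice parameters on a small list (the values are arbitrary):

```python
array = [10, 20, 30, 40, 50, 60, 70]

print(array[:5])     # [10, 20, 30, 40, 50]  first five, like LINQ's Take(5)
print(array[2:])     # [30, 40, 50, 60, 70]  from index 2 to the end
print(array[::2])    # [10, 30, 50, 70]      every second element
print(array[1:6:2])  # [20, 40, 60]          start=1, stop=6 (exclusive), step=2
print(array[:100])   # slicing past the end is not an error: returns the whole list
```

Like Take(5), array[:5] simply returns fewer elements when the list is shorter than 5.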
Slicing a generator
import itertools
top5 = itertools.islice(my_list, 5) # grab the first five elements
You can't slice a generator directly in Python. itertools.islice() wraps an iterable in a new iterator that yields only the requested slice; it is called as itertools.islice(iterable, stop) or itertools.islice(iterable, start, stop[, step]).
Remember that slicing a generator partially consumes it. If you need to keep all of its values, turn it into a tuple or list first, e.g. result = tuple(generator).
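A short sketch of that behavior, using a small example generator: islice takes the first five values, the generator continues from where islice stopped, and materializing into a tuple keeps every value available.

```python
import itertools

def numbers():
    # small example generator: yields 0..9
    yield from range(10)

gen = numbers()
top5 = list(itertools.islice(gen, 5))
print(top5)   # [0, 1, 2, 3, 4]
rest = list(gen)
print(rest)   # [5, 6, 7, 8, 9]  (islice already consumed the first five)

# islice(iterable, start, stop, step) mirrors a slice's start:stop:step
print(list(itertools.islice(numbers(), 2, 8, 3)))  # [2, 5]

# To reuse the values, materialize them first
saved = tuple(numbers())
print(saved[:5])  # (0, 1, 2, 3, 4)
print(saved[5:])  # (5, 6, 7, 8, 9)
```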
import itertools
top5 = itertools.islice(array, 5)
Shaikovsky's answer is excellent, but I wanted to clarify a couple of points.
[next(generator) for _ in range(n)]
This is the simplest approach, but it raises StopIteration if the generator is exhausted before n items have been taken.
On the other hand, the following approaches return up to n items, which is preferable in many circumstances:
List:
[x for _, x in zip(range(n), records)]
Generator:
(x for _, x in zip(range(n), records))
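To see the difference, here is a small sketch with a hypothetical generator that yields only three items while five are requested:

```python
def short_gen():
    # hypothetical generator that yields only 3 items
    yield from "abc"

n = 5

# zip() stops at the shorter input, so this returns up to n items
records = short_gen()
first_n = [x for _, x in zip(range(n), records)]
print(first_n)  # ['a', 'b', 'c']

# next() has no such guard and raises once the generator runs dry
records = short_gen()
try:
    print([next(records) for _ in range(n)])
except StopIteration:
    print("StopIteration: the generator had fewer than n items")
```

Putting range(n) first in zip() matters: zip pulls from its arguments left to right, so once the range is exhausted it stops without taking an extra item out of the generator.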
To my taste, it's also very concise to combine zip() with xrange(n) (or range(n) in Python 3); this works nicely on generators as well and seems more flexible to change in general.
# Option #1: taking the first n elements as a list
[x for _, x in zip(range(n), generator)]
# Option #2: using next() (raises StopIteration if there are fewer than n elements)
[next(generator) for _ in range(n)]
# Option #3: taking the first n elements as a new generator
(x for _, x in zip(range(n), generator))
# Option #4: yielding them from a generator function
def top_n(n, generator):
    for _ in range(n):
        try:
            yield next(generator)
        except StopIteration:
            # since Python 3.7 (PEP 479) an uncaught StopIteration here becomes a RuntimeError
            return
The answer for how to do this can be found here
>>> generator = (i for i in xrange(10))
>>> list(next(generator) for _ in range(4))
[0, 1, 2, 3]
>>> list(next(generator) for _ in range(4))
[4, 5, 6, 7]
>>> list(next(generator) for _ in range(4))
[8, 9]
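Notice that the last call asks for the next 4 when only 2 are remaining. Using list() instead of [] is what makes the comprehension stop on the StopIteration exception raised by next(). This relies on Python 2 behavior (note the xrange): since Python 3.7 (PEP 479), a StopIteration raised inside a generator expression is turned into a RuntimeError, so the last call fails instead of returning [8, 9].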
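A Python 3 safe way to get the same chunks is itertools.islice, which stops quietly when the source runs out. The helper below mirrors the take() recipe in the itertools documentation:

```python
import itertools

def take(n, iterable):
    # Return up to n items as a list; islice never lets StopIteration escape
    return list(itertools.islice(iterable, n))

generator = (i for i in range(10))
first = take(4, generator)   # [0, 1, 2, 3]
second = take(4, generator)  # [4, 5, 6, 7]
third = take(4, generator)   # [8, 9]  only two were left, and no exception
print(first, second, third)
```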
Do you mean the first N items, or the N largest items?
If you want the first:
top5 = sequence[:5]
This also works for the largest N items, assuming that your sequence is sorted in descending order. (Your LINQ example seems to assume this as well.)
If you want the largest, and it isn't sorted, the most obvious solution is to sort it first:
l = list(sequence)
l.sort(reverse=True)
top5 = l[:5]
For a more performant solution, use a min-heap (thanks Thijs):
import heapq
top5 = heapq.nlargest(5, sequence)
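A small comparison on made-up data shows that both approaches give the same result; heapq.nlargest also accepts a key function, like sorted(). The heapq documentation notes that nlargest() performs best for small n, while for larger n sorting is usually more efficient.

```python
import heapq

sequence = [7, 2, 9, 4, 9, 1, 5, 8]  # arbitrary example data

by_sorting = sorted(sequence, reverse=True)[:5]
by_heap = heapq.nlargest(5, sequence)
print(by_sorting)  # [9, 9, 8, 7, 5]
print(by_heap)     # [9, 9, 8, 7, 5]

# key works as in sorted(): here, the two longest words
words = ["fig", "banana", "plum", "cherries"]
print(heapq.nlargest(2, words, key=len))  # ['cherries', 'banana']
```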
With itertools you get another iterator back, so in most cases you need one more step to take the first n elements. There are at least two simpler solutions (a little less efficient, but very handy) for getting the elements ready to use from a generator:
Using a list comprehension:
first_n_elements = [next(generator) for i in range(n)]
Otherwise:
first_n_elements = list(generator)[:n]
Where n is the number of elements you want to take (e.g. n=5 for the first five elements). Note that the second version consumes the entire generator first, so it never finishes on an infinite generator.
This should work
top5 = array[:5]
I am stuck trying to understand the mechanics behind this combination of input(), a loop and a list comprehension, from Codegaming's "MarsRover" puzzle. The sequence creates a 2D line representing a cut-out of the topology in an area 6999 units wide (x-axis).
Understandably, my original question was put on hold as too broad, so I am trying to shorten and narrow it: I understand the basics of list comprehensions, and I'm reasonably experienced with for-loops.
For example, a list comprehension:
k = 5
land_y = [int(j) for j in range(k)]  # [0, 1, 2, 3, 4]
And a for-loop:
ab = []
for i in range(4):
    a = 2 * i       # 0, 2, 4, 6 on successive passes
    ab.append(a)    # ab ends up as [0, 2, 4, 6]
But here it just doesn't add up (in my head):
6999 points are created along the x-axis from 6 (x, y) points.
surface_n = int(input())
for i in range(surface_n):
    land_x, land_y = [int(j) for j in input().split()]
I do not understand where "i" makes a difference.
I do not understand how the data is "packaged" inside the input. I have split strings of integers in another task with almost exactly the same code, and I could easily create new lists and work with them, because I understood the structure I was unpacking (quite simple: one datatype with one purpose).
The fact that the following line sits inside the game's while-loop confuses me even more, since it updates dynamically as the state of the game changes:
x, y, h_speed, v_speed, fuel, rotate, power = [int(i) for i in input().split()]
Could someone give an example of how this could be written in JavaScript, Haskell or C#? It doesn't need to be syntactically correct; I'm just struggling with the concept here.
input() reads one line from standard input and returns it as a string, so it's essentially reading some value into your program.
The way that code is written, it makes very strong assumptions about the format of the input strings, to the point that it gets confusing (and difficult to verify).
Let’s take a look at this line first:
land_x, land_y = [int(j) for j in input().split()]
You said you already understand list comprehensions, so this is essentially equivalent to:
inputs = input().split()
results = []
for j in inputs:
    results.append(int(j))
land_x, land_y = results
This is a combination of multiple things that happen here. input() reads a line of text into the program, split() separates that string into multiple parts, splitting it whenever a white space character appears. So a string 'foo bar' is split into ['foo', 'bar'].
Then the list comprehension happens, which simply iterates over every item in that split input and converts each item into an integer using int(j). So an input of '2 3' is first converted into ['2', '3'] (a list of strings), and then into [2, 3] (a list of ints).
Finally, the line land_x, land_y = results is evaluated. This is called iterable unpacking and essentially assumes that the iterable on the right has exactly as many items as there are variables on the left. If that’s the case then it’s just a nice way to write the following:
land_x = results[0]
land_y = results[1]
So basically, the whole list comprehension assumes that there is an input of two numbers separated by whitespace, it then splits those into separate strings, converts those into numbers and then assigns each number to a separate variable land_x and land_y.
Exactly the same thing happens again later with the following line:
x, y, h_speed, v_speed, fuel, rotate, power = [int(i) for i in input().split()]
It’s just that this time, it expects the input to have seven numbers instead of just two. But then it’s exactly the same.
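As for where i makes a difference: it doesn't. Each call to input() just returns the next line, so the loop's only job is to read surface_n lines. Below is a minimal sketch, assuming the input is a count line followed by that many "x y" lines (as the code implies). read_surface() is a hypothetical helper that takes a list of lines instead of reading stdin and, unlike the original loop (which overwrites land_x and land_y on every pass), keeps every point. The sample data is made up.

```python
def read_surface(lines):
    # Hypothetical helper: reads from an iterator of text lines instead of input()
    lines = iter(lines)
    surface_n = int(next(lines))            # first line: how many points follow
    points = []
    for i in range(surface_n):              # i is never used; the loop just runs surface_n times
        land_x, land_y = [int(j) for j in next(lines).split()]  # one "x y" line per pass
        points.append((land_x, land_y))
    return points

sample = ["3", "0 100", "1000 500", "6999 800"]  # made-up input, not real puzzle data
print(read_surface(sample))  # [(0, 100), (1000, 500), (6999, 800)]
```

The seven-number line inside the game's while-loop works the same way: each turn, input() returns the next line, which holds that turn's values.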
Or maybe I should say, ways to skip having to initialize at all.
I really hate that every time I want a simple count variable, I have to say, "hey Python, this variable starts at 0." I want to be able to write count += 1 and have it automatically start from 0 on the first iteration of the loop. Maybe there's some sort of function I can design to accommodate this? A count(1) that adds 1 to a self-created internal count variable that persists between iterations of the loop.
I have the same dislike for building a new string/list out of existing strings/lists
(initializing new_string = "" / new_list = [] before the loop).
I think list comprehensions may work for some lists.
Does anyone have some pointers for how to solve this problem? I am fairly new, I've only been programming off and on for half a year.
Disclaimer: I do not think that this will make initialization any cleaner. Also, in case you have a typo in some uses of your counter variable, you will not get a NameError but instead it will just silently create and increment a second counter. Remember the Zen of Python:
Explicit is better than implicit.
Having said that, you could create a special class that will automatically add missing attributes and use this class to create and auto-initialize all sorts of counters:
class Counter:
    def __init__(self, default_func=int):
        self.default = default_func
    def __getattr__(self, name):
        if name not in self.__dict__:
            self.__dict__[name] = self.default()
        return self.__dict__[name]
Now you can create a single instance of that class to create an arbitrary number of counters of the same type. Example usage:
>>> c = Counter()
>>> c.foo
0
>>> c.bar += 1
>>> c.bar += 2
>>> c.bar
3
>>> l = Counter(list)
>>> l.blub += [1,2,3]
>>> l.blub
[1, 2, 3]
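In fact, this is similar to what collections.defaultdict does, except that you can use dot-notation for accessing the counters, i.e. c.foo instead of c['foo']. Come to think of it, you could even extend defaultdict, making the whole thing much simpler (you then have to pass the default factory explicitly, e.g. c = Counter(int)):
import collections
class Counter(collections.defaultdict):
    def __getattr__(self, name):  # called only when normal lookup fails: c.foo -> c['foo']
        return self[name]
    __setattr__ = collections.defaultdict.__setitem__  # c.foo = v -> c['foo'] = v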
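For comparison, plain collections.defaultdict already gives the same auto-initialization, just with item access instead of dot-notation. This sketch mirrors the counter examples above:

```python
from collections import defaultdict

counts = defaultdict(int)   # a missing key starts at int() == 0
counts["bar"] += 1
counts["bar"] += 2
print(counts["bar"])        # 3
print(counts["foo"])        # 0 (reading a missing key also inserts it)

lists = defaultdict(list)   # a missing key starts at list() == []
lists["blub"] += [1, 2, 3]
print(lists["blub"])        # [1, 2, 3]
```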
If you are using a counter in a for loop, you can use enumerate:
for counter, item in enumerate(my_list):
    print(counter, item)
The counter is the first variable in the statement; it starts at 0 (or at whatever you pass as start=) and goes up by 1 on each iteration of the loop. The second variable is the current element of the list. I hope this answers your first question; as for your second, the following code might help:
list_a = ["this", "is"]
list_b = ["a", "test"]
list_a += list_b
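print(list_a)
['this', 'is', 'a', 'test']
+= works for strings as well, since strings, like lists, support concatenation with +. Unlike lists, though, strings are immutable, so += builds a new string instead of extending the existing one in place. Hope this helps!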
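A small check of that difference with made-up values: a second name bound to the same list sees the in-place change, while a second name bound to the original string does not. For building a string from many pieces, a generator expression passed to str.join() avoids the new_string = "" initialization altogether.

```python
list_a = ["this", "is"]
alias_a = list_a
list_a += ["a", "test"]      # list += extends the same list object in place
print(alias_a)               # ['this', 'is', 'a', 'test']

text = "this is"
alias_t = text
text += " a test"            # strings are immutable: text now refers to a new string
print(text)                  # this is a test
print(alias_t)               # this is  (the original string is unchanged)

# No empty-string initialization needed: join the pieces in one step
joined = "".join(word.upper() for word in ["a", "b", "c"])
print(joined)                # ABC
```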
I am using the following regexp to match all occurrences of a special kind of number:
^([0-57-9]|E)[12][0-9]{3}[A-Z]?[A-Z]([0-9]{3}|[0-9]{4})
Let's assume that this regex matches the following five numbers:
31971R0974
11957E075
31971R0974-A01P2
31971R0974-A05
51992PC0405
These matches are then printed using the following code. This prints each item in the list and if the item contains a dash, everything after the dash is discarded.
def number_function():
    for x in range(0, 10):
        print("Number", number_variable[x].split('-', 1)[0])
However, this would print five lines where lines 1, 3 and 4 would be the same.
I need your help to write a script which compares each item with all previous items and only prints the item if it does not already exist.
So, the desired output would be the following three lines:
31971R0974
11957E075
51992PC0405
EDIT 2:
I solved it! I just needed to do some moving around. Here's the finished product:
def instrument_function():
    desired = set()
    for x in range(0, 50):
        try:
            instruments_celex[x]
        except IndexError:
            pass
        else:
            before_dash = instruments_celex[x].split('-', 1)[0]
            desired.add(before_dash)
    for x in desired:
        print("Cited instrument", x)
I've done practically no Python until now, but this might do what you're after:
def number_function():
    desired = set()
    for x in range(0, 10):
        before_hyphen = number_variable[x].split('-', 1)[0]
        desired.add(before_hyphen)
    for x in desired:
        print("Number", x)
Here is a version of your "finished" function that is more reasonable.
# Don't use instruments_celex as a global variable; that's terrible.
# Pass it in to the function instead:
def instrument_function(instruments_celex):
    unique = set()
    # In Python you don't need an integer loop variable. This is not Java.
    # Just loop over the list:
    for entry in instruments_celex:
        unique.add(entry.split('-', 1)[0])
    for entry in unique:
        print("Cited instrument", entry)
You can also use a generator expression to make this shorter:
def instrument_function(instruments_celex):
    unique = set(entry.split('-', 1)[0] for entry in instruments_celex)
    for entry in unique:
        print("Cited instrument", entry)
That's it. It's so simple in fact that I wouldn't make a separate function of it unless I do it at least two times in the program.
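The set-based versions above print the unique numbers in no guaranteed order, while the desired output in the question follows order of first appearance. If order matters, one option is dict.fromkeys(), since dicts preserve insertion order from Python 3.7 on. A sketch, where unique_prefixes is a hypothetical helper name and the input is the five example numbers from the question:

```python
def unique_prefixes(entries):
    # Hypothetical helper. dict keys keep insertion order (guaranteed since Python 3.7),
    # so dict.fromkeys() drops duplicates while keeping first-seen order
    return list(dict.fromkeys(entry.split("-", 1)[0] for entry in entries))

numbers = ["31971R0974", "11957E075", "31971R0974-A01P2",
           "31971R0974-A05", "51992PC0405"]
for entry in unique_prefixes(numbers):
    print("Number", entry)
```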