Python 2.7 current row index on 2d array iteration - python-2.7

When iterating on a 2d array, how can I get the current row index? For example:
x = [[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 0. 3. 6.]]
Something like:
for rows in x:
print x current index (for example, when iterating on [ 5. 6. 7. 8.], return 1)

enumerate is a built-in Python function. Its usefulness cannot be summarized in a single line, yet many newcomers and even some experienced programmers are unaware of it. It lets us loop over something and get an automatic counter. Here is an example:
for counter, value in enumerate(some_list):
    print(counter, value)
And there is more! enumerate also accepts an optional argument which makes it even more useful.
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):
    print(c, value)
# Output:
# 1 apple
# 2 banana
# 3 grapes
# 4 pear
The optional argument tells enumerate where to start the index. You can also build a list of (index, item) tuples by passing the enumerate object to list(). Here is an example:
my_list = ['apple', 'banana', 'grapes', 'pear']
counter_list = list(enumerate(my_list, 1))
print(counter_list)
# Output: [(1, 'apple'), (2, 'banana'), (3, 'grapes'), (4, 'pear')]

Use enumerate:
In [42]: x = [[ 1, 2, 3, 4],
    ...:      [ 5, 6, 7, 8],
    ...:      [ 9, 0, 3, 6]]
In [43]: for index, rows in enumerate(x):
    ...:     print('current index {}'.format(index))
    ...:     print('current row {}'.format(rows))
    ...:
current index 0
current row [1, 2, 3, 4]
current index 1
current row [5, 6, 7, 8]
current index 2
current row [9, 0, 3, 6]
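If the column position is needed as well, the same idea nests naturally. A minimal sketch, assuming x is a plain list of lists as in the session above (the names row_index, col_index and positions are illustrative and do not come from the question):

```python
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 0, 3, 6]]

# Collect (row_index, col_index, value) for every cell of the 2d list.
positions = []
for row_index, row in enumerate(x):
    for col_index, value in enumerate(row):
        positions.append((row_index, col_index, value))

print(positions[4])  # (1, 0, 5): the first cell of the second row
```

The outer enumerate yields the row index asked about in the question; the inner one is only needed when the position within the row matters too.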

Related

Best way to shift a list in Python?

I have a list of numbers, let's say :
my_list = [2, 4, 3, 8, 1, 1]
From this list, I want to obtain a new list that starts at the maximum value and runs to the end, followed by the first part (from the beginning up to just before the maximum), like this:
my_new_list = [8, 1, 1, 2, 4, 3]
(basically it corresponds to a horizontal graph shift...)
Is there a simple way to do so ? :)
Apply either operation as many times as you want.
To the left:
my_list.append(my_list.pop(0))
To the right:
my_list.insert(0, my_list.pop())
How about something like this:
max_idx = my_list.index(max(my_list))
my_new_list = my_list[max_idx:] + my_list[0:max_idx]
Alternatively you can do something like the following,
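import itertools
def shift(l, n):
    # Cycle through the list endlessly and take len(l) items starting at offset n.
    return itertools.islice(itertools.cycle(l), n, n + len(l))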
my_list = [2, 4, 3, 8, 1, 1]
list(shift(my_list, 3))
Elaborating on Yasc's solution for moving the order of the list values, here's a way to shift the list to start with the maximum value:
# Find the max value:
max_value = max(my_list)
# Move the last value from the end to the beginning,
# until the max value is the first value:
while my_list[0] != max_value:
    my_list.insert(0, my_list.pop())
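The rotations above can also be written with collections.deque and its rotate method. This is a different technique from the ones in the answers, shown only as a sketch; the helper name rotate_to_max is made up for illustration:

```python
from collections import deque

def rotate_to_max(values):
    """Return a new list rotated so that the first maximum comes first."""
    if not values:
        return []
    d = deque(values)
    # A negative argument rotates to the left by that many positions.
    d.rotate(-values.index(max(values)))
    return list(d)

my_list = [2, 4, 3, 8, 1, 1]
my_new_list = rotate_to_max(my_list)
print(my_new_list)  # [8, 1, 1, 2, 4, 3]
```

Unlike the while loop, this leaves my_list untouched and does the whole shift in a single call.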

Making a dictionary? from 2 lists / columns

I have a large database with several columns, and I need data from 2 of them.
The end result is to have 2 drop-down menus, where the first one selects from "names" and the second one shows the "numbers" values that have been merged into that name. I just need the data available so I can input it into another program.
So I need a list or dictionary that contains the unique values of the "names" list, with the numbers from the "numbers" list appended to them.
# Just a list of random names and numbers for testing
names = [
"Cindi Brookins",
"Cumberband Hamberdund",
"Roger Ramsden",
"Cumberband Hamberdund",
"Lorean Dibble",
"Lorean Dibble",
"Coleen Snider",
"Rey Bains",
"Maxine Rader",
"Cindi Brookins",
"Catharine Vena",
"Lanny Mckennon",
"Berta Urban",
"Rey Bains",
"Roger Ramsden",
"Lanny Mckennon",
"Catharine Vena",
"Berta Urban",
"Maxine Rader",
"Coleen Snider"
]
numbers = [
6,
5,
7,
10,
3,
9,
1,
1,
2,
7,
4,
2,
8,
3,
8,
10,
4,
9,
6,
5
]
So in the above example "Berta Urban" would appear once, but still have the numbers 8 and 9 assigned, "Rey Bains" would have 1 and 3.
I have tried with
mergedlist = dict(zip(names, numbers))
But that only assigns the last of the numbers to the name.
I am not sure if I can make a dictionary with unique "names" that holds multiple "numbers".
You only get the last number associated with each name because dictionary keys are unique (otherwise they wouldn't be much use). So if you do
mergedlist["Berta Urban"] = 8
and after that
mergedlist["Berta Urban"] = 9
the result will be
{'Berta Urban': 9}
Just as if you did:
berta_urban = 8
berta_urban = 9
In that case you would expect the value of berta_urban to be 9 and not [8,9].
So, as you can see, you need an append not an assignment to your dict entry.
from collections import defaultdict
mergedlist = defaultdict(list)
for name, number in zip(names, numbers):
    mergedlist[name].append(number)
This gives:
{'Coleen Snider': [1, 5],
'Cindi Brookins': [6, 7],
'Cumberband Hamberdund': [5, 10],
'Roger Ramsden': [7, 8],
'Lorean Dibble': [3, 9],
'Rey Bains': [1, 3],
'Maxine Rader': [2, 6],
'Catharine Vena': [4, 4],
'Lanny Mckennon': [2, 10],
'Berta Urban': [8, 9]
}
which is what I think you want. Note that you will get duplicates, as in 'Catharine Vena': [4, 4] and you will also get a list of numbers for each name, even if the list has only one number in it.
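If the repeated numbers are not wanted, one option is to collect into a set instead of a list. A minimal sketch on a shortened version of the test data (the variable names unique_numbers and merged are illustrative):

```python
from collections import defaultdict

names = ["Catharine Vena", "Berta Urban", "Catharine Vena", "Berta Urban"]
numbers = [4, 8, 4, 9]

# A set silently ignores a number that is already stored for that name.
unique_numbers = defaultdict(set)
for name, number in zip(names, numbers):
    unique_numbers[name].add(number)

# Convert each set to a sorted list so the result is easy to display.
merged = {name: sorted(values) for name, values in unique_numbers.items()}
print(merged["Catharine Vena"])  # [4]
print(merged["Berta Urban"])     # [8, 9]
```

Sets do not keep insertion order, which is why the values are sorted before use.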
You cannot have multiple keys of the same name in a dict, but your dict keys can be unique while holding a list of matching numbers. Something like:
mergedlist = {}
for i, v in enumerate(names):
    mergedlist[v] = mergedlist.get(v, []) + [numbers[i]]
print(mergedlist["Berta Urban"]) # prints [8, 9]
This is not terribly efficient, though. Depending on the database you're using, chances are that it can return the results in the form you want faster than post-processing and reconstructing the data in Python.
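As a rough illustration of letting the database do the grouping, here is a sketch using an in-memory SQLite database. The question does not say which database is involved, so SQLite, the table name people, and its columns are all assumptions made for the example:

```python
import sqlite3

# Hypothetical table standing in for the two columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Berta Urban", 8), ("Rey Bains", 1), ("Berta Urban", 9), ("Rey Bains", 3)],
)

# GROUP_CONCAT joins the numbers of each name into one comma-separated string.
rows = conn.execute(
    "SELECT name, GROUP_CONCAT(number) FROM people GROUP BY name"
).fetchall()
conn.close()

# Split the joined strings back into sorted lists of integers.
merged = {name: sorted(int(n) for n in joined.split(",")) for name, joined in rows}
print(merged["Berta Urban"])  # [8, 9]
```

Other databases offer similar aggregate functions under different names, so the query would need adapting.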

Python Dask - vertical concatenation of 2 DataFrames

I am trying to vertically concatenate two Dask DataFrames.
I have the following Dask DataFrame:
import pandas as pd
import dask.dataframe as dd
d = [
['A','B','C','D','E','F'],
[1, 4, 8, 1, 3, 5],
[6, 6, 2, 2, 0, 0],
[9, 4, 5, 0, 6, 35],
[0, 1, 7, 10, 9, 4],
[0, 7, 2, 6, 1, 2]
]
df = pd.DataFrame(d[1:], columns=d[0])
ddf = dd.from_pandas(df, npartitions=5)
Here is the data as a Pandas DataFrame
A B C D E F
0 1 4 8 1 3 5
1 6 6 2 2 0 0
2 9 4 5 0 6 35
3 0 1 7 10 9 4
4 0 7 2 6 1 2
Here is the Dask DataFrame
Dask DataFrame Structure:
A B C D E F
npartitions=4
0 int64 int64 int64 int64 int64 int64
1 ... ... ... ... ... ...
2 ... ... ... ... ... ...
3 ... ... ... ... ... ...
4 ... ... ... ... ... ...
Dask Name: from_pandas, 4 tasks
I am trying to concatenate 2 Dask DataFrames vertically:
ddf_i = ddf + 11.5
dd.concat([ddf,ddf_i],axis=0)
but I get this error:
Traceback (most recent call last):
...
File "...", line 572, in concat
raise ValueError('All inputs have known divisions which cannot '
ValueError: All inputs have known divisions which cannot be concatenated
in order. Specify interleave_partitions=True to ignore order
However, if I try:
dd.concat([ddf,ddf_i],axis=0,interleave_partitions=True)
then it appears to work. Is there a problem with setting this to True (in terms of performance or speed)? Or is there another way to vertically concatenate 2 Dask DataFrames?
If you inspect the divisions of the dataframe (ddf.divisions), you will find, assuming one partition, that it holds the edges of the index: (0, 4). This is useful to dask: when you do some operation on the data, it knows not to use a partition that does not include the required index values. This is also why some dask operations are much faster when the index is appropriate for the job.
When you concatenate, the second dataframe has the same index as the first. Concatenation would work without interleaving if the values of the index had different ranges in the two partitions.
mdurant's answer is correct; this answer elaborates with MCVE code snippets using Dask v2021.08.1. Examples make it easier to understand divisions and interleaving.
Vertically concatenating DataFrames
Create two DataFrames, concatenate them, and view the results.
df = pd.DataFrame(
{"nums": [1, 2, 3, 4, 5, 6], "letters": ["a", "b", "c", "d", "e", "f"]}
)
ddf1 = dd.from_pandas(df, npartitions=2)
df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"]})
ddf2 = dd.from_pandas(df, npartitions=1)
ddf3 = dd.concat([ddf1, ddf2])
print(ddf3.compute())
nums letters
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
5 6 f
0 88 xx
1 99 yy
Divisions metadata when vertically concatenating
Create two DataFrames, concatenate them, and illustrate that sometimes this operation will cause divisions metadata to be lost.
def print_partitions(ddf):
    for i in range(ddf.npartitions):
        print(ddf.partitions[i].compute())
df = pd.DataFrame(
{"nums": [1, 2, 3, 4, 5, 6], "letters": ["a", "b", "c", "d", "e", "f"]}
)
ddf1 = dd.from_pandas(df, npartitions=2)
ddf1.divisions # (0, 3, 5)
df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"]})
ddf2 = dd.from_pandas(df, npartitions=1)
ddf2.divisions # (0, 1)
ddf3 = dd.concat([ddf1, ddf2])
ddf3.divisions # (None, None, None, None)
Set interleave_partitions=True to avoid losing the divisions metadata.
ddf3_interleave = dd.concat([ddf1, ddf2], interleave_partitions=True)
ddf3_interleave.divisions # (0, 1, 3, 5)
When interleaving isn't necessary
Create two DataFrames without overlapping divisions, concatenate them, and confirm that the divisions metadata is not lost:
df = pd.DataFrame(
{"nums": [1, 2, 3, 4], "letters": ["a", "b", "c", "d"], "some_index": [4, 5, 6, 7]}
)
ddf1 = dd.from_pandas(df, npartitions=2)
ddf1 = ddf1.set_index("some_index")
df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"], "some_index": [10, 20]})
ddf2 = dd.from_pandas(df, npartitions=1)
ddf2 = ddf2.set_index("some_index")
ddf3 = dd.concat([ddf1, ddf2])
ddf3.divisions # (4, 6, 10, 20)
I wrote a blog post to explain this in more detail. Let me know if you'd like the link.

How to turn a column of numbers into a list of strings?

I don't know why I can't figure this out, but I have a column of numbers that I would like to turn into a list of strings. I should have mentioned this when I initially posted: this isn't a DataFrame, nor did it come from a file; it is the result of some code. Sorry, I wasn't trying to waste anybody's time, I just didn't want to add a bunch of clutter. This is exactly how it prints out.
Here is my column of numbers.
3,1,3
3,1,3
3,1,3
3,3,3
3,1,1
And I would like them to look like this.
['3,1,3', '3,1,3', '3,1,3', '3,3,3', '3,1,1']
I'm trying to find a way that is not dependent on how many numbers are in each row or how many sets of numbers are in the column.
Thanks, really appreciate it.
Assume you start with a DataFrame
df = pd.DataFrame([[3, 1, 3], [3, 1, 3], [3, 1, 3], [3, 3, 3], [3, 1, 1]])
df.astype(str).apply(lambda x: ','.join(x.values), axis=1).values.tolist()
Looks like:
['3,1,3', '3,1,3', '3,1,3', '3,3,3', '3,1,1']
def foo():
    l = []
    with open("file.asd", "r") as f:
        for line in f:
            # Drop the trailing newline so each entry looks like '3,1,3'.
            l.append(line.rstrip("\n"))
    return l
To turn your dataframe into strings, use the astype function:
df = pd.DataFrame([[3, 1, 3], [3, 1, 3], [3, 1, 3], [3, 3, 3], [3, 1, 1]])
df = df.astype('str')
Then manipulating your columns becomes easy, you can for instance create a new column:
In [29]:
df['temp'] = df[0] + ',' + df[1] + ',' + df[2]
df
Out[29]:
0 1 2 temp
0 3 1 3 3,1,3
1 3 1 3 3,1,3
2 3 1 3 3,1,3
3 3 3 3 3,3,3
4 3 1 1 3,1,1
And then compact it into a list:
In [30]:
list(df['temp'])
Out[30]:
['3,1,3', '3,1,3', '3,1,3', '3,3,3', '3,1,1']
# Done in Jupyter notebook
# add three quotes on each side of your column.
# The advantage to dataframe is the minimal number of operations for
# reformatting your column of numbers or column of text strings into
# a single string
a = """3,1,3
3,1,3
3,1,3
3,3,3
3,1,1"""
b = f'"{a}"'
print('String created with triple quotes:')
print(b)
c = a.split('\n')
print ("Use split() function on the string. Split on newline character:")
print(c)
print ("Use splitlines() function on the string:")
print(a.splitlines())
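Since the question says the data is neither a DataFrame nor read from a file, another possibility is that it already exists in memory as rows of numbers. A minimal sketch under that assumption (rows and as_strings are illustrative names); it does not depend on how many numbers are in each row or how many rows there are:

```python
# Assumed shape of the data: a list of rows, each row a list of numbers.
rows = [[3, 1, 3], [3, 1, 3], [3, 1, 3], [3, 3, 3], [3, 1, 1]]

# Convert every number to a string and join each row with commas.
as_strings = [",".join(str(n) for n in row) for row in rows]
print(as_strings)  # ['3,1,3', '3,1,3', '3,1,3', '3,3,3', '3,1,1']
```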

Pandas: What is the best way to 'crop' a large dataframe to only the previous 1000 days?

I have a dataframe where the index is made up of datetimes. I also have an anchor date and I know that I only want the second dataframe to contain the 1000 days previous to the anchor date. What is the best way to do this?
Don't know if it's the best way, but it should work
Create example DataFrame:
>>> dates = [pd.datetime(2012, 5, 4), pd.datetime(2012, 5, 5), pd.datetime(2012, 5, 6), pd.datetime(2012, 5, 1), pd.datetime(2012, 5, 2), pd.datetime(2012, 5, 3)]
>>> values = [1, 2, 3, 4, 5, 6]
>>> df = pd.DataFrame(values, dates)
>>> df
0
2012-05-04 1
2012-05-05 2
2012-05-06 3
2012-05-01 4
2012-05-02 5
2012-05-03 6
Suppose we want 2 days back from 2012-05-04:
>>> date_end = pd.datetime(2012, 5, 4)
>>> date_start = date_end - pd.DateOffset(days=2)
>>> date_start, date_end
(datetime.datetime(2012, 5, 2, 0, 0), datetime.datetime(2012, 5, 4, 0, 0))
Now let's try to get rows by label indexing:
>>> df.loc[date_start:date_end]
Empty DataFrame
Columns: [0]
Index: []
That's because our index is not sorted, so let's fix it:
>>> df.sort_index(inplace=True)
>>> df.loc[date_start:date_end]
0
2012-05-02 5
2012-05-03 6
2012-05-04 1
It's also possible to get rows by datetime indexing:
>>> df[date_start:date_end]
0
2012-05-02 5
2012-05-03 6
2012-05-04 1
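Scaled up to the 1000 days from the question, the same idea might look like the sketch below. The daily example data, the column name value, and the choice to exclude the anchor date itself are assumptions; a boolean mask is used here instead of label slicing, so it also works when the index is not sorted:

```python
import pandas as pd

# Example data: one row per day, indexed by datetime (made up for illustration).
index = pd.date_range("2010-01-01", "2015-12-31", freq="D")
df = pd.DataFrame({"value": range(len(index))}, index=index)

anchor = pd.Timestamp("2015-06-30")
start = anchor - pd.Timedelta(days=1000)

# Keep rows from `start` up to, but not including, the anchor date.
mask = (df.index >= start) & (df.index < anchor)
cropped = df.loc[mask].sort_index()

print(len(cropped))  # 1000
```

If the anchor date should be part of the result, changing the second comparison to <= is enough.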
Keep in mind that I'm still not an expert in Pandas, but I like it for Data Analysis very much.
Hope it helps.