Suppose I have the following dataframe:
import pandas as pd

df1 = {'Column_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
       'x': ['0', '1', '2', '3', '0', '1', '2', '3']}
df1 = pd.DataFrame(df1, columns=['Column_1', 'x'])
df1
I want to create a new column called 'x!'. Each value is calculated by taking the value of 'x' in that row and multiplying it by the value of 'x!' in the previous row. The value in the first row of 'x!' is 1. I need the calculation to reset whenever the value in 'Column_1' changes. The desired output would be the following:
df2 = {'Column_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'x': ['0', '1', '2', '3', '0', '1', '2', '3'],
'x!': ['1', '1', '2', '6', '1', '1', '2', '6']}
df2 = pd.DataFrame (df2, columns = ['Column_1','x', 'x!'])
df2
Where 'x' is 3, 'x!' is 6 because 3 x 2 (x! row-1 = 2) is equal to 6.
How would I do this?
Thanks
Try this approach (the loop is written in JavaScript):
let df1 = {'Column_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],'x': ['0', '1', '2', '3', '0', '1', '2', '3']}
let newX = ['1'];
for (let i = 1; i < df1.x.length; i++) {
  let x = parseInt(df1.x[i], 10)
  // Restart at 1 when x is 0 or when the value in Column_1 changes
  if (x === 0 || df1.Column_1[i] !== df1.Column_1[i - 1]) {
    newX.push('1')
  } else {
    // Multiply x by the previous 'x!' value (not by the previous 'x')
    newX.push((x * parseInt(newX[i - 1], 10)).toString())
  }
}
console.log(df1.x) //INITIAL ['0', '1', '2', '3', '0', '1', '2', '3']
console.log(newX) //RESULT ['1', '1', '2', '6', '1', '1', '2', '6']
After that you can assign your new x! array:
df1['x!'] = newX
Then, back in Python, build the DataFrame from the dict that now has the extra 'x!' key:
df2 = pd.DataFrame(df1, columns=list(df1))
I hope this solves your case. :)
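Since the question is about pandas, the same running product can also be sketched directly in Python. This is a minimal sketch, not the answer author's method: it assumes 'x' holds digit strings as in the question, treats 0 as a factor of 1 so that each group starts at 1, and uses `groupby(...).cumprod()` to restart the product whenever 'Column_1' changes. The name `factors` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Column_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'x': ['0', '1', '2', '3', '0', '1', '2', '3'],
})

# 'x' is stored as strings, so convert to integers first.
# Replace 0 with 1 so the product starts at 1 instead of collapsing to 0.
factors = df1['x'].astype(int).replace(0, 1)

# Cumulative product within each 'Column_1' group; it restarts per group.
df1['x!'] = factors.groupby(df1['Column_1']).cumprod()

print(df1)
```

The new column holds integers here; append `.astype(str)` if the string form from the desired output is needed.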
with open('askhsh11.txt', 'r') as f:
    raw_list = f.read().split('\n')
    for i in range(len(raw_list)):
        print(raw_list[i].split(','))
    for i in range(len(raw_list)):
        raw_list[i]=int(i)
        print(raw_list)
The result is:
['1', '2', '3', '4']
['5', '6', '7', '8']
[0, '5,6,7,8']
[0, 1]
But I want the result to be:
['1', '2', '3', '4']
['5', '6', '7', '8']
[1, 2, 3, 4]
[5, 6, 7, 8]
How do I convert a list of strings into a list of integers?
You can use the following, which splits every line on commas and flattens everything into a single list:
result = [int(c) for s in raw_list for c in s.split(',')]
Output:
[1, 2, 3, 4, 5, 6, 7, 8]
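The question asks for one list per line rather than a single flat list. A nested comprehension is one way to get that shape. This is a small sketch that assumes `raw_list` already holds the two lines from the file; the literal below is a stand-in for the file contents and `nested` is an illustrative name.

```python
# Assumed stand-in for the lines read from askhsh11.txt
raw_list = ['1,2,3,4', '5,6,7,8']

# Outer comprehension: one entry per line.
# Inner comprehension: split the line on commas and convert each piece.
nested = [[int(c) for c in s.split(',')] for s in raw_list]

for row in nested:
    print(row)
```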
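If the file itself contains "'" characters, you can strip them with replace, then split each line on commas before converting:
with open('askhsh11.txt', 'r') as f:
    raw_list = f.read().replace("'","").split('\n')
# skip empty lines (for example after a trailing newline)
numbers = [[int(num) for num in line.split(',')] for line in raw_list if line]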
To be honest, I'm a beginner in this language, but I can tell you that you can use the first two outputs to generate the second two.
What I mean is that it's easy to convert
this:
['1', '2', '3', '4']
to this:
[1, 2, 3, 4]
Just apply the function that converts a string like '1' to the integer 1 to every element.
Sorry, I forgot its name, but you can find it in the Python documentation.
If anyone knows the function's name, please leave a comment. Thank you so much.
I have a set of indices:
indices = (['1', '1.2', '2', '2.2', '3', '4'])
and a dataset, where the first element identifies a person, the second a round, and the third is the index from the indices set:
dataset = [['A', '1', '1'], ['A', '1', '1.2'], ['B', '1', '2'], ['C', '2', '3']]
I would like to form a binary vector, where for each person and for each individual round, the indices are marked either present (with a 1) or not (with a 0).
The desired output would be something like so, where for A, the vector represents the presence of the indices 1 and 1.2, for B, the index 2, and for C, the index 3. Note that for A, there is only one record, but 2 indices are present.
['A', '1', '1, 1, 0, 0, 0, 0']
['B', '1', '0, 0, 1, 0, 0, 0']
['C', '2', '0, 0, 0, 0, 1, 0']
I'm having a bit of trouble getting my head around looping the indices over the dataset. My idea was to loop through the indices set once for each list in the dataset, but I don't think this is the most efficient way. Any help would be appreciated!
I'd do it something like this:
from itertools import groupby
for k, g in groupby(dataset, lambda x: x[:2]):
    vals = [x[2] for x in g]
    print(k + [", ".join("1" if x in vals else "0" for x in indices)])
Output
['A', '1', '1, 1, 0, 0, 0, 0']
['B', '1', '0, 0, 1, 0, 0, 0']
['C', '2', '0, 0, 0, 0, 1, 0']
Is this what you were looking for?
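One thing to keep in mind is that `itertools.groupby` only groups consecutive items, so the loop above relies on the records for each person and round sitting next to each other. If that is not guaranteed, a dictionary keyed on (person, round) is a possible alternative. This is a sketch under that assumption; the names `seen` and `rows` are illustrative, and the dataset below is deliberately reordered to show the difference.

```python
from collections import defaultdict

indices = ['1', '1.2', '2', '2.2', '3', '4']
# Same records as in the question, but with A's two rows separated
dataset = [['A', '1', '1'], ['B', '1', '2'], ['A', '1', '1.2'], ['C', '2', '3']]

# Collect the set of indices seen for each (person, round) pair.
seen = defaultdict(set)
for person, rnd, idx in dataset:
    seen[(person, rnd)].add(idx)

# Build one row per pair, marking each known index as "1" or "0".
rows = [
    [person, rnd, ", ".join("1" if i in present else "0" for i in indices)]
    for (person, rnd), present in seen.items()
]

for row in rows:
    print(row)
```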
Here's a solution without loops
import pandas as pd
indlist=['1', '1.2', '2', '2.2', '3', '4']
dataset = [['A', '1', '1'], ['A', '1', '1.2'], ['B', '1', '2'], ['C', '2', '3']]
df=pd.DataFrame(dataset,columns=['player','round','ind']).set_index('ind').reindex(indlist)
# recent pandas versions require keyword arguments for pivot
ans=df.reset_index().pivot(index='player',columns='ind',values='round').fillna(0)[1:]
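For comparison, here is another loop-free pandas sketch that produces 0/1 flags per player and round. It is not the method used in the answer above: it swaps in `pd.crosstab` to count occurrences, then reindexes the columns against the full index list so that indices nobody used still appear as 0. The name `flags` is illustrative.

```python
import pandas as pd

indlist = ['1', '1.2', '2', '2.2', '3', '4']
dataset = [['A', '1', '1'], ['A', '1', '1.2'], ['B', '1', '2'], ['C', '2', '3']]

df = pd.DataFrame(dataset, columns=['player', 'round', 'ind'])

# Count how often each index occurs for every (player, round) pair,
# add the unused indices as all-zero columns, and cap the counts at 1.
flags = (
    pd.crosstab([df['player'], df['round']], df['ind'])
    .reindex(columns=indlist, fill_value=0)
    .clip(upper=1)
)

print(flags)
```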
I have the following text file-
http://www.ncbi.nlm.nih.gov/Class/FieldGuide/BLOSUM62.txt
I need Python code that gives me specific entries of the matrix. I'm using multidimensional lists and would prefer to do it without the numpy library. My intent is to build a list of lists, where the outer (main) list contains the rows of the matrix and each inner list contains the cells of one row.
I'm using the following code-
handle=open(fname)
li=[]
matrix=[]
for line in handle:
    if not line.startswith('#'):
        a=line.split()
        for i in a:
            li.append(i)
        matrix.append(li)
print(matrix)
However, this just returns a one dimensional list with each element being one cell of the matrix. I'm lost regarding how to fix this. The output should be something of this form-
[['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'B', 'Z', 'X', '*'],
['A', '4', '-1', '-2', '-2', '0', '-1', '-1', '0', '-2', '-1', '-1', '-1', '-1', '-2', '-1', '1', '0', '-3', '-2', '0', '-2', '-1', '0', '-4']]
I think you want to produce a matrix where, for example, matrix[0][1] refers to a single value, right? See the following code.
handle=open(fname)
matrix=[]
col={}
idx=0
row={}
idr=0
# get 1st line as column
first_line=0
for line in handle:
    if not line.startswith('#'):
        if first_line == 0:
            first_line=1
            # get column header
            for i in line.split():
                col[i]=idx
                idx=idx+1
        else:
            a = line.split()
            x = a.pop(0)
            # get row name
            row[x]=idr
            matrix.append(a)
            idr=idr+1
print(matrix)
# the outer list holds rows, so index by row first, then by column
print(matrix[row['A']][col['A']])
See if this is what you want.
You aren't getting the results you want because you're putting all the values into the same li list. The simplest fix for the issue is simply to move the place you create li into the loop:
handle=open(fname)
matrix=[]
for line in handle:
    if not line.startswith('#'):
        li=[] # move this line down!
        a=line.split()
        for i in a:
            li.append(i)
        matrix.append(li)
print(matrix)
The inner loop there is a bit silly though. You're adding all the values from one list (a) to another list (li), then throwing away the first list. You should just use the list returned by str.split directly:
handle=open(fname)
matrix=[]
for line in handle:
    if not line.startswith('#'):
        matrix.append(line.split())
print(matrix)
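Once the rows are split like this, looking up a specific entry by letter pair is a short step further. The sketch below is an assumption-laden illustration rather than part of the answer: it reads from an in-memory excerpt laid out like the BLOSUM62 file (a comment line, a header row, then labelled rows) instead of the real file, and the names `sample`, `header`, `body`, and `scores` are made up for the example.

```python
import io

# Assumed stand-in for open(fname): a three-letter excerpt in the same layout
sample = io.StringIO(
    "# comment line\n"
    "   A  R  N\n"
    "A  4 -1 -2\n"
    "R -1  5  0\n"
    "N -2  0  6\n"
)

# One inner list per non-comment, non-empty line.
rows = [line.split() for line in sample
        if line.strip() and not line.startswith('#')]

# First row is the column header; every other row starts with its label.
header, body = rows[0], rows[1:]

# Map row label -> {column label -> integer score}.
scores = {r[0]: dict(zip(header, map(int, r[1:]))) for r in body}

print(scores['A']['R'])
```

With the real file, `sample` would be replaced by the open file handle; the nested dictionary avoids having to remember which position each letter occupies.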
Is there a way to look up the values in a Counter by how many times they occur?
Example:
Let's say I have a list:
list = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd']
When I do the Counter:
Counterlist = Counter(list)
I'll get:
Counter({'b': 3, 'a': 3, 'c': 3, 'd': 2})
Then I can select let's say a and get 3:
Counterlist['a'] = 3
But how can I select by the number of occurrences, 3?
Something like:
Counterlist[3] = ['b', 'a', 'c']
Is that possible?
You can write the following:
import collections
my_data = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd']
result = collections.defaultdict(list)
for k, v in collections.Counter(my_data).items():
    result[v].append(k)
and then access result[3] to obtain the characters with that count.
I would like to find overlapping lists in two lists of lists.
ListLeft = [['A', 'B', 'C'], ['1', '2', '3', '4'], ['x', 'y'], ['one', 'two', 'three']]
ListRight = [['h', 'i', 'j'], ['A', 'B', 'C'], ['1', '2', '3', '4'], ['5', '6', '7'], ['x', 'y']]
Does anyone have a way to find and print the lists that appear in both, as well as the lists that appear in only one of them?
Ideally this would be possible without importing any modules.
This can be achieved with a simple nested loop:
overlap = []
for ll in ListLeft:
    for lr in ListRight:
        if ll == lr:
            overlap.append(ll)
            break
print(overlap)
>>> [['A', 'B', 'C'], ['1', '2', '3', '4'], ['x', 'y']]
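The question also asks for the lists that are not in both. The same comparison can be written with the `in` operator, which covers both cases without imports. This is a sketch using the question's data; the names `only_left` and `only_right` are illustrative.

```python
ListLeft = [['A', 'B', 'C'], ['1', '2', '3', '4'], ['x', 'y'], ['one', 'two', 'three']]
ListRight = [['h', 'i', 'j'], ['A', 'B', 'C'], ['1', '2', '3', '4'], ['5', '6', '7'], ['x', 'y']]

# "in" compares the inner lists element by element, like the == in the loop.
overlap = [l for l in ListLeft if l in ListRight]
only_left = [l for l in ListLeft if l not in ListRight]
only_right = [l for l in ListRight if l not in ListLeft]

print(overlap)
print(only_left)
print(only_right)
```

Each membership test scans the other list, which is fine for small inputs like these.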