I want to sort the following list, which contains positive and negative elements, in descending order. It sorts partially, i.e. it orders the positive elements correctly, but the order of the negative elements seems wrong.
>>> sorted([u'-10.44%', u'-9.35%', u'-4.20%', u'-1.23%', u'-15.37%', u'-13.51%', u'-11.94%', u'10.07%', u'0.32%', u'-4.02%', u'-12.69%', u'-17.28%'],reverse=True)
[u'10.07%', u'0.32%', u'-9.35%', u'-4.20%', u'-4.02%', u'-17.28%', u'-15.37%', u'-13.51%', u'-12.69%', u'-11.94%', u'-10.44%', u'-1.23%']
I was hoping that it would print in this (descending) order:
[u'10.07%', u'0.32%', u'-1.23%', u'-4.02%', u'-4.20%', u'-9.35%', u'-10.44%', u'-11.94%', u'-12.69%', u'-13.51%', u'-15.37%', u'-17.28%' ]
Can someone explain why this is happening?
Thanks in advance!
The list consists of unicode strings, so it is sorted as text, character by character. If you want the items to sort numerically, they need to be sorted according to the float value of each string.
Let's define your list:
>>> mylist = [u'-10.44%', u'-9.35%', u'-4.20%', u'-1.23%', u'-15.37%', u'-13.51%', u'-11.94%', u'10.07%', u'0.32%', u'-4.02%', u'-12.69%', u'-17.28%']
Now, let's sort it keyed to the float value of each string:
>>> sorted(mylist, key=lambda x: float(x.rstrip('%')), reverse=True)
['10.07%', '0.32%', '-1.23%', '-4.02%', '-4.20%', '-9.35%', '-10.44%', '-11.94%', '-12.69%', '-13.51%', '-15.37%', '-17.28%']
Or, equivalently:
>>> sorted(mylist, key=lambda x: -float(x.rstrip('%')))
['10.07%', '0.32%', '-1.23%', '-4.02%', '-4.20%', '-9.35%', '-10.44%', '-11.94%', '-12.69%', '-13.51%', '-15.37%', '-17.28%']
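To see why the plain sort puts the negative values in that order, here is a small sketch comparing text order with numeric order. The three-item list and the helper name percent_to_float are made up for illustration; they are not part of the original question.

```python
# Strings are compared character by character, not by numeric value.
# After the shared '-', '9' sorts above '1', and '7' sorts above '.'.
values = ['-9.35%', '-17.28%', '-1.23%']

text_order = sorted(values, reverse=True)


def percent_to_float(text):
    """Hypothetical helper: strip the trailing '%' and convert to float."""
    return float(text.rstrip('%'))


numeric_order = sorted(values, key=percent_to_float, reverse=True)

print(text_order)     # ['-9.35%', '-17.28%', '-1.23%']
print(numeric_order)  # ['-1.23%', '-9.35%', '-17.28%']
```

The text order matches the pattern in the question's output: '-9.35%' lands ahead of '-17.28%', and '-1.23%' comes last.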
Is there an easy way to get the first N keys with the largest values from the lists in a dict {key: list}?
def main():
    for x in range(len(sale10k)):
        timelist.append(sale10k[x][3])
        pricesList.append(sale10k[x][4])
        if sale10k[x][0] in salesByCategory.keys():
            salesByCategory[sale10k[x][0]].append(float(sale10k[x][4]))
        else:
            salesByCategory[sale10k[x][0]]=[]
            salesByCategory[sale10k[x][0]].append(float(sale10k[x][4]))
    salesByCategory1={}
    for key,value in salesByCategory.items():
        salesByCategory1[key]=sum(salesByCategory.get(key))
    #fiveLarges=heapq.nlargest(5,salesByCategory1,key=salesByCategory1.get)
    salesBycatalog={}
    for y in range(len(catalog)):
        salesBycatalog[catalog[y][0]]=catalog[y][5]
    totalByGroup={}
    for key, value in salesBycatalog.items():
        if value in totalByGroup.keys():
            totalByGroup[value].append(salesByCategory1.get(key))
        else:
            totalByGroup[value]=[]
            totalByGroup[value].append(salesByCategory1.get(key))
    print(totalByGroup)
if __name__ == "__main__":
    main()
I have 2 Excel .csv files.
My output so far is this:
{'POLO SHIRTS': [2609.76, 13339.109999999991, 15622.410000000007], 'APPAREL ACCESSORIES': [22596.24999999999, 20901.099999999995, 31007.8], 'PANTS': [8031.729999999998, 11179.949999999999, 5405.839999999997, 9023.949999999999, 21523.819999999996, 26030.800000000017], 'FOOTWEAR ACCESSORIES': [8686.369999999999], 'GLIDING SP.EQUIPMENT': [22136.399999999987, 27678.920000000006, 14222.21999999999, 30013.37000000001], 'SHOES': [1903.66, 25443.21999999999, 22152.530000000006, 11585.410000000002, 38504.679999999986, 7787.670000000004, 10256.860000000002, 1377.1199999999997, 15459.799999999992, 20919.56000000001, 6299.769999999996, 1555.4499999999998, 17470.460000000006, 29361.220000000034, 4070.9000000000033, 27045.450000000004, 20721.829999999994, 780.55, 24671.590000000015, 13189.570000000002, 6442.700000000001, 6105.390000000005, 12701.659999999998, 29418.89000000001, 7295.620000000001, 26344.420000000002, 3262.12, 11710.460000000006, 3272.2999999999993, 17055.989999999994, 9019.77, 12722.570000000003, 20020.150000000005, 30164.860000000026, 17513.14, 3168.6200000000003, 27008.24, 14585.679999999988, 15273.48, 24172.329999999998, 33968.96000000003, 35480.790000000015, 25150.459999999992, 24207.679999999997, 26909.090000000007, 17692.079999999998, 27844.97999999999, 33847.389999999985, 13266.239999999994, 11757.349999999997, 24469.410000000018, 8214.879999999997, 3966.6899999999964, 5336.910000000003, 27766.659999999978, 24636.97000000002, 21330.829999999994, 10331.680000000004, 19769.529999999995, 20764.439999999984, 2873.509999999999, 23263.23, 15127.240000000003, 13282.320000000003, 32917.03000000001, 17657.12, 9959.55, 21052.779999999995, 16015.79, 2667.2699999999995, 16041.830000000004, 2309.9000000000005, 8095.450000000001, 23628.889999999985, 3846.259999999999, 6795.61, 14608.109999999995, 6422.360000000001, 3241.279999999999, 19220.27999999999, 20836.899999999994, 28446.07000000001, 13984.979999999992, 10006.460000000003, 14417.309999999998, 9069.470000000001, 
8081.38, 1766.8899999999999, 19041.750000000004, 3310.279999999999, 3649.49, 11089.069999999994, 10946.420000000002, 16297.91, 3788.1000000000004, 27356.640000000007, 14024.480000000001, 29409.03], 'SUITS': [28587.990000000016, 14337.800000000001], 'BALLS': [25855.07, 15207.729999999992, 25567.809999999987, 8428.509999999998, 15119.609999999995, 26069.969999999983, 29843.490000000023], 'TOPS': [1673.2000000000005, 8673.400000000001, 23610.79999999999, 2090.380000000001], 'HEADWEAR': [2075.3000000000015, 18891.799999999996, 39717.93, 33657.65, 9965.720000000005, 12030.020000000006, 670.9999999999999, 12694.720000000007, 24846.22000000001, 1606.1799999999994, 9993.330000000002, 10154.900000000005], 'HARDWARE ACCESSORIES': [14619.109999999997], 'OTHER SHIRTS': [18013.450000000004], 'PROTECTION GEAR': [26454.929999999997], 'JERSEYS': [23741.06, 38425.269999999975], 'SANDALS/SLIPPERS': [9103.83, 21025.040000000005, 12702.349999999999, 26766.439999999984, 29818.339999999993], 'SHORTS': [14817.77, 29540.92999999998, 9415.059999999996, 14582.480000000001], 'JACKETS': [30096.11000000001, 13372.469999999998, 31145.73000000001, 6011.17, 12225.300000000003, 23485.399999999998, 13889.96], 'SWIMWEAR': [14035.140000000001, 20232.629999999997, 5142.340000000001, 2945.349999999998, 23495.320000000003, 8207.920000000004, 11972.729999999994], 'T-SHIRTS': [11130.700000000004, 8315.83, 8346.719999999998, 27847.550000000007, 22704.759999999995, 7828.200000000002, 17823.379999999997, 2248.46, 9012.14, 7774.72, 12030.049999999996, 4207.649999999999, 21293.16, 3159.4700000000007, 13385.12, 30507.87], 'UNDERWEAR': [10419.31, 31017.909999999993, 2794.590000000002, 18625.990000000005, 21829.879999999994], 'SWEATSHIRTS': [4317.6799999999985, 23453.049999999985, 28176.49000000001], 'TIGHTS': [23823.43999999999, 11180.129999999996], 'BAGS': [13980.240000000007, 18509.50999999999, 20064.309999999998, 22317.360000000004, 17641.04]}
I need this:
SHOES: 1519077.15 €
T-SHIRTS: 207615.78 €
HEADWEAR: 176304.77 €
BALLS: 146092.19 €
JACKETS: 130226.14 €
I have data stored in the dict orderBygroup {key: list of float values} and need to take the first 5 keys with the largest totals.
My second question: the dict salesByCategory1 is built by looping over salesByCategory and summing all the values to get the total for each article number.
Can I get those totals in some smarter way?
Is there an easy way to produce that output?
totalByGroup1={}
for key,value in totalByGroup.items():
    totalByGroup1[key]=sum(totalByGroup.get(key))
This creates a new dictionary with the summed values, at the cost of more resources.
sorted5=sorted(totalByGroup1, key=totalByGroup1.get, reverse=True)[:5]
print(sorted5)
This sorts the keys and takes the first 5 elements.
The output is: ['SHOES', 'T-SHIRTS', 'HEADWEAR', 'BALLS', 'JACKETS']
It costs more time and more resources.
for key in sorted5:
    print(key,': ','{0:.2f}'.format(totalByGroup1.get(key)))
And now the result:
SHOES : 1519077.15
T-SHIRTS : 207615.78
HEADWEAR : 176304.77
BALLS : 146092.19
JACKETS : 130226.14
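On the second question (getting the totals in a smarter way), one option is to accumulate the totals directly while reading the rows, without first building a dict of lists. This is only a sketch: the rows and the names sales_rows and totals are hypothetical, and the real data has the article number at index 0 and the price at index 4.

```python
from collections import defaultdict

# Hypothetical rows: (article_number, price_as_text).
sales_rows = [('A1', '10.50'), ('B2', '3.00'), ('A1', '4.50')]

# defaultdict(float) starts every missing key at 0.0,
# so no "if key in dict" check is needed.
totals = defaultdict(float)
for article, price in sales_rows:
    totals[article] += float(price)

print(dict(totals))  # {'A1': 15.0, 'B2': 3.0}
```

This keeps a single running number per key, which avoids storing every individual price.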
Now let me ask again. Suppose the dict has 4 records with this data:
a:[1,2,3],b:[6,7,8],c:[4,5,6],d:[9,10,11]
How do I get the first 2 key,value pairs sorted by the largest sum? -->>
d:30,b:21
And if the dict has 1000 records, how do I get the first N keys sorted by the largest sum of each list?
Example: go:1563,do:1560,bo:1490,ro:1480, etc.
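For the four-record example, one possible approach is heapq.nlargest, which is already hinted at in the commented-out line of the original code. The function name top_n_by_sum is made up for this sketch.

```python
import heapq


def top_n_by_sum(groups, n):
    """Return the n (key, total) pairs with the largest list sums, biggest first."""
    totals = {key: sum(values) for key, values in groups.items()}
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


data = {'a': [1, 2, 3], 'b': [6, 7, 8], 'c': [4, 5, 6], 'd': [9, 10, 11]}
top_two = top_n_by_sum(data, 2)
print(top_two)  # [('d', 30), ('b', 21)]
```

With 1000 records and a small N, nlargest avoids sorting the whole dict; for N close to the dict size, sorted(...)[:N] is likely just as good.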
I used the split() function to convert a string to a list (time = time.split()), and this is what my output looks like:
[u'1472120400.107']
[u'1472120399.999']
[u'1472120399.334']
[u'1472120397.633']
[u'1472120397.261']
[u'1472120394.328']
[u'1472120393.762']
[u'1472120393.737']
Then I tried accessing the contents of the list using print time[1], which gives an index out of range error (because only a single value is stored in each list). I checked questions posted by other people and used print len(time). This is the output for that:
1
[u'1472120400.107']
1
[u'1472120399.999']
1
[u'1472120399.334']
1
[u'1472120397.633']
1
[u'1472120397.261']
1
[u'1472120394.328']
1
[u'1472120393.762']
1
[u'1472120393.737']
I do this entire thing inside a for loop because I get logs dynamically and have to extract out just the time.
This is part of my code:
line_collect = lines.collect() #spark function
for line in line_collect :
    a = re.search(rx1,line)
    time = a.group()
    time = time.split()
    #print time[1] #index out of range error which is why I wrote another for below
    for k in time :
        time1 = time[k]#trying to put those individual list values into one variable but get type error
        print len(time1)
I get the following error :
time1 = time[k]
TypeError: list indices must be integers, not unicode
Can someone tell me how to read each of those single-element lists into one list, so I can access each value with a single index? I'm new to Python.
My required output:
time =['1472120400.107','1472120399.999','1472120399.334','1472120397.633','1472120397.261','1472120394.328','1472120393.762','1472120393.737']
so that I can use time[1] to get 1472120399.999 as the result.
Update: I misunderstood what you wanted. You already have the correct output, and it is a string. The u before the string means it is a unicode string; in Python 2, the u prefix distinguishes it from a normal (byte) string. Printing it to the screen will give you the plain string. Use it as you would any other string.
time = [u'1472120400.107'] # One element just to show
for k in time:
    print(k)
Looping over a list using a for loop will give you one value at a time, not the index itself. Consider using enumerate:
for k, value in enumerate(time):
    time1 = value # Or time1 = time[k]
    print(time1)
Or just getting the value itself:
for k in time:
    time1 = k
    print(time1)
--
Also, Python uses zero-based indexing, so to get the first element of a list you probably want time[0].
Thanks for your help. I finally got the code right:
newlst = []
for line in line_collect :
    a = re.search(rx1,line)
    time = a.group()
    newlst.append(float(time))
print newlst
This collects all the values into a single list.
Output:
[1472120400.107, 1472120399.999, 1472120399.334, 1472120397.633,
1472120397.261, 1472120394.328, 1472120393.762, 1472120393.737]
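The same collecting loop can be sketched so that it keeps the timestamps as strings (as in the required output) and skips lines where the pattern does not match, since re.search returns None in that case. The log lines and the pattern below are hypothetical stand-ins; the real rx1 and the Spark lines are not shown in the thread.

```python
import re

# Hypothetical log lines and pattern, for illustration only.
log_lines = [
    'ts=1472120400.107 event=a',
    'ts=1472120399.999 event=b',
    'no timestamp here',
]
timestamp_pattern = re.compile(r'\d{10}\.\d{3}')

times = []
for line in log_lines:
    match = timestamp_pattern.search(line)
    if match is None:
        continue  # no timestamp on this line, so skip it
    times.append(match.group())

print(times[1])  # 1472120399.999
```

Calling .group() on a failed match would raise AttributeError, which is why the None check comes first.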
I am working with a list of points in Python 2.7 and running some interpolations on the data. My list has over 5000 points, and some "x" values are repeated. These repeated "x" values have different corresponding "y" values. I want to get rid of the repeated points so that my interpolation function will work; with repeated "x" values that have different "y" values it raises an error, because the data no longer satisfies the definition of a function. Here is a simple example of what I am trying to do:
Input:
x = [1,1,3,4,5]
y = [10,20,30,40,50]
Output:
xy = [(1,10),(3,30),(4,40),(5,50)]
The interpolation function I am using is InterpolatedUnivariateSpline(x, y)
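Keep a variable that stores the previous x value; if it is the same as the current x value, skip the current point.
For example (a sketch that assumes x is sorted, so repeated values sit next to each other):
from scipy.interpolate import InterpolatedUnivariateSpline
previous_x = None
unique_x, unique_y = [], []
for xi, yi in zip(x, y):
    if xi == previous_x:
        continue  # skip a repeated x value
    unique_x.append(xi)
    unique_y.append(yi)
    previous_x = xi  # store the x value that will be "previous" in the next iteration
# build the spline once, after the duplicates have been dropped
spline = InterpolatedUnivariateSpline(unique_x, unique_y)
I am assuming you are already iterating over the points, so this should slot into your existing loop.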
A bit late but if anyone is interested, here's a solution with numpy and pandas:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x = [1,1,3,4,5]
y = [10,20,30,40,50]
# convert lists into numpy arrays:
array_x, array_y = np.array(x), np.array(y)
# sort x and y by x value:
order = np.argsort(array_x)
xsort, ysort = array_x[order], array_y[order]
# create a dataframe with one column each for the x and y data:
df = pd.DataFrame({'xsort': xsort, 'ysort': ysort})
# group by x so each x appears once, with the mean of its y values:
mean = df.groupby('xsort').mean()
df_x = mean.index
df_y = mean['ysort']
# polyfit returns the coefficients; poly1d turns them into a callable polynomial.
# Degree 14 presumably suits a larger dataset; the 4 unique points above need degree 3 or less.
trend = np.polyfit(df_x, df_y, 14)
trendpoly = np.poly1d(trend)
# plot polyfit line (colour and [name of figure] are placeholders to fill in):
plt.plot(df_x, trendpoly(df_x), linestyle=':', dashes=(6, 5), linewidth=0.8,
         color=colour, zorder=9, figure=[name of figure])
Also, if you just use argsort() to put the values in order of x, the polyfit should work even without deleting the duplicate x values. Trying it on my own dataset:
polyfit on its own
sorting data in order of x first, then polyfit
sorting data, delete duplicates, then polyfit
... I get the same result twice
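If the goal is the exact output from the question, where the first y is kept for each repeated x instead of the mean, a plain-Python sketch could look like this. The function name drop_duplicate_x is made up for illustration, and keeping the first occurrence is an assumption read off the question's expected output.

```python
def drop_duplicate_x(x, y):
    """Keep the first (x, y) pair for each x value and return the pairs sorted by x."""
    seen = set()
    points = []
    for xi, yi in zip(x, y):
        if xi in seen:
            continue  # a later point with an x we already kept
        seen.add(xi)
        points.append((xi, yi))
    return sorted(points)


x = [1, 1, 3, 4, 5]
y = [10, 20, 30, 40, 50]
xy = drop_duplicate_x(x, y)
print(xy)  # [(1, 10), (3, 30), (4, 40), (5, 50)]
```

Unlike the previous-value check, this does not need the input to be sorted, because it remembers every x it has seen.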