How to split list based on values in Python?

I have a list that needs to be copied multiple times, based on the number at an index.
Here is an example to help explain.
The original list:
[AAA, Bus, Apple, 5, 3, 1, Car, 22, 30]
What it needs to become:
aList = [
[AAA, Bus, Apple, 1, 1, 1, Car, 22, 30]
[AAA, Bus, Apple, 1, 1, 0, Car, 22, 30]
[AAA, Bus, Apple, 1, 1, 0, Car, 22, 30]
[AAA, Bus, Apple, 1, 0, 0, Car, 22, 30]
[AAA, Bus, Apple, 1, 0, 0, Car, 22, 30]
]
All the lists are in the same order so I can view the values by index.
Thanks

I hope I've understood your question right. This example copies the list as many times as the largest integer found in it:
lst = ["AAA", "Bus", "Apple", 5, 3, 1, "Car"]
n = max((v for v in lst if isinstance(v, int)), default=0)
out = [lst]
for _ in range(n - 1):
    out.append([(v - 1) if isinstance(v, int) else v for v in out[-1]])
out = [[int(v > 0) if isinstance(v, int) else v for v in l] for l in out]
print(*out, sep="\n")
Prints:
['AAA', 'Bus', 'Apple', 1, 1, 1, 'Car']
['AAA', 'Bus', 'Apple', 1, 1, 0, 'Car']
['AAA', 'Bus', 'Apple', 1, 1, 0, 'Car']
['AAA', 'Bus', 'Apple', 1, 0, 0, 'Car']
['AAA', 'Bus', 'Apple', 1, 0, 0, 'Car']
EDIT: To change only specified indices:
lst = ["AAA", "Bus", "Apple", 5, 3, 1, "Car"]
n = max(v for v in lst[3:6])
out = [lst[3:6]]
for _ in range(n - 1):
    out.append([v - 1 for v in out[-1]])
out = [lst[:3] + [int(v > 0) for v in l] + lst[6:] for l in out]
print(*out, sep="\n")
Prints:
['AAA', 'Bus', 'Apple', 1, 1, 1, 'Car']
['AAA', 'Bus', 'Apple', 1, 1, 0, 'Car']
['AAA', 'Bus', 'Apple', 1, 1, 0, 'Car']
['AAA', 'Bus', 'Apple', 1, 0, 0, 'Car']
['AAA', 'Bus', 'Apple', 1, 0, 0, 'Car']
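The decrement loop can also be collapsed into a single comparison: row k keeps a 1 wherever the original count is greater than k. A minimal sketch, assuming the counts sit at indices 3 to 5 as in the question; the function name expand_counts and its start/stop parameters are made up for illustration:

```python
def expand_counts(row, start, stop):
    """Copy `row` once per unit of its largest count, turning counts into 0/1 flags."""
    counts = row[start:stop]
    n = max(counts, default=0)
    # Row k (0-based) has a 1 for every count that is still greater than k.
    return [
        row[:start] + [int(c > k) for c in counts] + row[stop:]
        for k in range(n)
    ]


lst = ["AAA", "Bus", "Apple", 5, 3, 1, "Car", 22, 30]
out = expand_counts(lst, 3, 6)
print(*out, sep="\n")
```

Unlike the loop version, this returns an empty list when every count is 0, and values outside the slice (such as 22 and 30) are left untouched.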

Django ORM queryset equivalent to group by year-month?

I have a Django app that needs some data visualization, and I am stuck on the ORM.
I have a model Orders with a field created_at, and I want to present the data as a bar chart (number per year-month) in a dashboard template.
So I need to aggregate/annotate data from my model, but I did not find a complete solution.
I found a partial answer with TruncMonth and read about serializers, but I wonder if there is a simpler solution using the Django ORM...
In Postgresql it would be:
SELECT date_trunc('month',created_at), count(order_id) FROM "Orders" GROUP BY date_trunc('month',created_at) ORDER BY date_trunc('month',created_at);
"2021-01-01 00:00:00+01" "2"
"2021-02-01 00:00:00+01" "3"
"2021-03-01 00:00:00+01" "3"
...
Example data:
1 "2021-01-04 07:42:03+01"
2 "2021-01-24 13:59:44+01"
3 "2021-02-06 03:29:11+01"
4 "2021-02-06 08:21:15+01"
5 "2021-02-13 10:38:36+01"
6 "2021-03-01 12:52:22+01"
7 "2021-03-06 08:04:28+01"
8 "2021-03-11 16:58:56+01"
9 "2022-03-25 21:40:10+01"
10 "2022-04-04 02:12:29+02"
11 "2022-04-13 08:24:23+02"
12 "2022-05-08 06:48:25+02"
13 "2022-05-19 15:40:12+02"
14 "2022-06-01 11:29:36+02"
15 "2022-06-05 02:15:05+02"
16 "2022-06-05 03:08:22+02"
Expected result:
[
{
"year-month": "2021-01",
"number" : 2
},
{
"year-month": "2021-03",
"number" : 3
},
{
"year-month": "2021-03",
"number" : 3
},
{
"year-month": "2021-03",
"number" : 1
},
{
"year-month": "2021-04",
"number" : 2
},
{
"year-month": "2021-05",
"number" : 3
},
{
"year-month": "2021-06",
"number" : 3
},
]
I have done this but I am not able to order by date:
Orders.objects.annotate(month=TruncMonth('created_at')).values('month').annotate(number=Count('order_id')).values('month', 'number').order_by()
<SafeDeleteQueryset [
{'month': datetime.datetime(2022, 3, 1, 0, 0, tzinfo=<UTC>), 'number': 4},
{'month': datetime.datetime(2022, 6, 1, 0, 0, tzinfo=<UTC>), 'number': 2},
{'month': datetime.datetime(2022, 5, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2022, 1, 1, 0, 0, tzinfo=<UTC>), 'number': 5},
{'month': datetime.datetime(2021, 12, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2022, 7, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2021, 9, 1, 0, 0, tzinfo=<UTC>), 'number': 2},
'...(remaining elements truncated)...'
]>
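Try ordering by the truncated month instead of leaving order_by() empty. Ordering by the original created_at field would add it to the GROUP BY and break the monthly grouping, and number is the count being computed here rather than a model field, so use Count instead of Sum:
from django.db.models import Count
from django.db.models.functions import TruncMonth
Orders.objects.values(month=TruncMonth('created_at')).annotate(number=Count('order_id')).order_by('month')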
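The queryset yields datetime objects, while the expected result uses "year-month" strings. A minimal sketch of that last step in plain Python, assuming rows shaped like the .values() output shown in the question; the helper name to_year_month is hypothetical and the sample rows are illustrative stand-ins, not real query results:

```python
from datetime import datetime, timezone


def to_year_month(rows):
    """Sort month/number rows chronologically and format the month as YYYY-MM."""
    return [
        {"year-month": r["month"].strftime("%Y-%m"), "number": r["number"]}
        for r in sorted(rows, key=lambda r: r["month"])
    ]


# Stand-in for list(queryset); the real rows would come from the ORM query.
rows = [
    {"month": datetime(2022, 3, 1, tzinfo=timezone.utc), "number": 4},
    {"month": datetime(2021, 12, 1, tzinfo=timezone.utc), "number": 1},
    {"month": datetime(2022, 1, 1, tzinfo=timezone.utc), "number": 5},
]
result = to_year_month(rows)
print(result)
```

Sorting here is only a safety net; if the query itself is ordered by month, the sorted() call leaves the order unchanged.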

Access Pandas MultiIndex column by name

I have a spreadsheet imported with pandas like this:
df = pd.read_excel('my_spreadsheet.xlsx',header = [0,1],index_col=0,sheetname='Sheet1')
The output of df.columns is:
MultiIndex(levels=[[u'MR 1', u'MR 10', u'MR 11', u'MR 12', u'MR 13', u'MR 14', u'MR 15', u'MR 16', u'MR 17', u'MR 18', u'MR 19', u'MR 2', u'MR 20', u'MR 21', u'MR 22', u'MR 3', u'MR 4', u'MR 5', u'MR 6', u'MR 7', u'MR 8', u'MR 9'], [u'BIRADS', u'ExamDesc', u'completedDTTM']],
labels=[[0, 0, 0, 11, 11, 11, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 12, 12, 12, 13, 13, 13, 14, 14, 14], [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]],
names=[None, u'De-Identified MRN'])
I have been trying to access the values of column named 'De-Identified MRN', but can't seem to find the way to do this.
What I have tried (based on similar posts):
[in] df.index.get_level_values('De-Identified MRN')
[out] KeyError: 'Level De-Identified MRN must be same as name (None)'
and
[in] df.index.unique(level='De-Identified MRN')
[out] KeyError: 'Level De-Identified MRN must be same as name (None)'
UPDATE:
The following did the trick for some reason. I really do not understand the format of the MultiIndex Pandas Dataframe:
pd.Series(df.index)
By using your data
s="MultiIndex(levels=[[u'MR 1', u'MR 10', u'MR 11', u'MR 12', u'MR 13', u'MR 14', u'MR 15', u'MR 16', u'MR 17', u'MR 18', u'MR 19', u'MR 2', u'MR 20', u'MR 21', u'MR 22', u'MR 3', u'MR 4', u'MR 5', u'MR 6', u'MR 7', u'MR 8', u'MR 9'], [u'BIRADS', u'ExamDesc', u'completedDTTM']],labels=[[0, 0, 0, 11, 11, 11, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 12, 12, 12, 13, 13, 13, 14, 14, 14], [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]],names=[None, u'De-Identified MRN'])"
idx=eval(s, {}, {'MultiIndex': pd.MultiIndex})
df=pd.DataFrame(index=idx)
df.index.get_level_values(level=1) # df.index.get_level_values('De-Identified MRN')
Out[336]:
Index(['ExamDesc', 'completedDTTM', 'BIRADS', 'ExamDesc', 'completedDTTM',
'BIRADS', 'ExamDesc', 'completedDTTM', 'BIRADS', 'ExamDesc',...
If none of the above works, try:
df.reset_index()['De-Identified MRN']
Try the following:
midx = pd.MultiIndex(
levels=[[u'MR 1', u'MR 10', u'MR 11', u'MR 12', u'MR 13', u'MR 14', u'MR 15', u'MR 16', u'MR 17', u'MR 18', u'MR 19', u'MR 2', u'MR 20', u'MR 21', u'MR 22', u'MR 3', u'MR 4', u'MR 5', u'MR 6', u'MR 7', u'MR 8', u'MR 9'], [u'BIRADS', u'ExamDesc', u'completedDTTM']],
labels=[[0, 0, 0, 11, 11, 11, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 12, 12, 12, 13, 13, 13, 14, 14, 14], [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]],
names=[None, u'De-Identified MRN']
)
midx.levels[1] # returns the following
Index(['BIRADS', 'ExamDesc', 'completedDTTM'], dtype='object', name='De-Identified MRN')
midx.levels[1].values # returns the following
array(['BIRADS', 'ExamDesc', 'completedDTTM'], dtype=object)
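Note that the MultiIndex printed in the question is df.columns, not df.index, which would explain why the lookups on df.index raise a KeyError. A small sketch of querying the named level on the columns instead; the two-column-group frame below is a made-up miniature of the spreadsheet, not the real data:

```python
import pandas as pd

# Hypothetical miniature of the spreadsheet: two header rows become a column MultiIndex.
columns = pd.MultiIndex.from_product(
    [["MR 1", "MR 2"], ["ExamDesc", "completedDTTM", "BIRADS"]],
    names=[None, "De-Identified MRN"],
)
df = pd.DataFrame([[None] * 6], columns=columns)

# The named level lives on df.columns, so query it there rather than on df.index.
level_values = df.columns.get_level_values("De-Identified MRN")
print(list(level_values))

# Select every 'BIRADS' column across the first level.
birads = df.xs("BIRADS", axis=1, level="De-Identified MRN")
print(list(birads.columns))
```

The same level can also be addressed by position, as in df.columns.get_level_values(1).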

How to efficiently multiply each element of vector with matrix getting enlarged matrix

I would like to achieve the following in an efficient way in numpy. Suppose I have a matrix
A = np.asarray([[1, 2], [3, 4]])
and
B = np.asarray([1, 10, 100])
I would like to multiply each element in A with the first element of B, then each element in A with the second element of B, and so on. The result should be a matrix of shape (A.shape[0]*B.shape[0], A.shape[1]):
np.asarray([[1, 2], [3, 4], [10, 20], [30, 40], [100, 200], [300, 400]])
Out[216]:
array([[ 1, 2],
[ 3, 4],
[ 10, 20],
[ 30, 40],
[100, 200],
[300, 400]])
Reshape with numpy broadcasting:
# option 1
(A * B[:,None,None]).reshape(-1, A.shape[1])
#array([[ 1, 2],
# [ 3, 4],
# [ 10, 20],
# [ 30, 40],
# [100, 200],
# [300, 400]])
# option 2
(A.ravel() * B[:,None]).reshape(-1, A.shape[1])
#array([[ 1, 2],
# [ 3, 4],
# [ 10, 20],
# [ 30, 40],
# [100, 200],
# [300, 400]])
Or use np.einsum:
np.einsum('ij,k->kij', A, B).reshape(-1, A.shape[1])
#array([[ 1, 2],
# [ 3, 4],
# [ 10, 20],
# [ 30, 40],
# [100, 200],
# [300, 400]])
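Another way to express the same result, offered as a sketch rather than a benchmarked recommendation, is the Kronecker product: treating B as a column vector makes np.kron stack one scaled copy of A per element of B.

```python
import numpy as np

A = np.asarray([[1, 2], [3, 4]])
B = np.asarray([1, 10, 100])

# B[:, None] has shape (3, 1), so each B[k] is replaced by the block B[k] * A
# and the blocks are stacked vertically into a (6, 2) array.
result = np.kron(B[:, None], A)
print(result)
```

Whether this is faster than the broadcasting versions depends on the array sizes, so it is worth timing on real data before choosing.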

Wrong position of annotations in a stacked bar chart (Google Chart API)

I have a stacked bar chart with annotations that sum the values. The annotations are normally at the end of the bar, but when there is no value for the last data column (I), the annotation is placed at the beginning, and I don't know how to fix it.
var dataArray = [
["Date", "A", "B", "C", "D", "E", "F", "G", "H", "I", {role: 'annotation'}],
["7.08.2015", 0, 0, 0, 3, 6, 1, 0, 0, 0, 10],
["6.08.2015", 0, 0, 0, 0, 4, 6, 1, 0, 7, 18],
["5.08.2015", 0, 0, 0, 2, 4, 0, 0, 0, 5, 11]
];
Demo and code at JSFiddle
Found a workaround: I added a new data column named Total, which has the same value as the annotation:
var dataArray = [
["Date", "A", "B", "C", "D", "E", "F", "G", "H", "I", "Total", {role: 'annotation'}],
["7.08.2015", 0, 0, 0, 3, 6, 1, 0, 0, 0, 10, 10],
["6.08.2015", 0, 0, 0, 0, 4, 6, 1, 0, 7, 18, 18],
["5.08.2015", 0, 0, 0, 2, 4, 0, 0, 0, 5, 11, 11]
];
And added this to the options:
var options = {
    ...
    series: {
        9: {
            color: 'transparent',
            type: "bar",
            targetAxisIndex: 1,
            visibleInLegend: false
        }
    }
};
Demo and code at JSFiddle
This makes the Total bar transparent, hides it in the legend, and lets it start from the zero point.
Dynamic version, which uses the last data column for the annotations:
var series = {};
series[data.getNumberOfColumns() - 3] = {
    color: 'transparent',
    type: "bar",
    targetAxisIndex: 1,
    visibleInLegend: false
};
options["series"] = series;
Demo and code at JSFiddle

R style subset/replace in Rcpp

In R I have a loop which runs through a data frame of people arriving at and leaving a site and assigns them to a parking lot until they leave, at which point it releases the spot.
activity = structure(list(ID = c(1, 2, 3, 3, 1, 2, 3, 2, 3, 1, 2, 1),
Lot = structure(c(1L, 3L, 6L, 6L, 1L, 3L, 2L, 5L, 2L, 4L, 5L, 4L),
.Label = c("a", "A", "C", "d", "f", "z"), class = "factor"),
time = c(1, 3, 6, 7, 8, 9, 10, 12, 13, 14, 15, 21),
dir = c(1, 1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1),
Here = c("no","no","no","no","no","no","no","no","no","no","no","no")),
class = "data.frame",
row.names = c(NA, -12L), .Names = c("ID", "Lot", "time", "dir","Here"))
lots = structure(c(30, 175, 170, 160, 300, 300, 35, 160, 85, 400, 200,
110, 60, 130, 100, 80, 1000, 320, 330, 350, 320, 250, 250, 370,
20, 40, 0, 140, 185, 200, 185, 120, 55, 105, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), .Dim = c(34L, 2L),
.Dimnames = list(c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
"k", "l", "m", "n", "o", "p", "q", "r", "s", "t",
"u", "v", "x", "y", "z", "A", "B", "C", "D", "E",
"F", "G", "H", "I"), c("Capacity", "Utilization")))
for(i in 1:nrow(activity)){
  id = activity$ID[i]
  l = activity$Lot[i]
  d = activity$dir[i]
  t = activity$time[i]
  lots[l,"Utilization"] = lots[l,"Utilization"] + d
  if(d == -1) activity$Here[activity$ID == id & activity$time <= t] = "no"
}
What I want to determine is whether there is a way in Rcpp to do the R-style subset and replace, e.g., lots[l,"Utilization"] = ... and activity$Here[activity$ID == id & activity$time <= t] = .... Specifically, I want to be able to load three matrices: the master record table, the matrix that manages lots, and the matrix that manages current locations (in this example I know the lot already; in practice I need to determine where they are likely to park). I have functioning code, but it takes ~160 seconds to run, and I'm trying to learn how to leverage Rcpp to make this much faster.