How to apply sum on a list of columns

I have a list, var aggList: List[String] = List(), that contains the names of the columns on which the aggregation has to be applied.
I generate the DataFrame as below:
var df = sc.parallelize(Seq[(Int, Int, String, Int, Int, Int)](
(1234, 1234, "PRM", 2, 1, 1),
(1235, 1234, "PRM", 1239, 2, 10),
(1246, 1234, "PRM", 1234, 5, 15),
(1247, 1234, "PRM", 1254, 20, 12),
(1246, 1234, "PRM", 1234, 5, 13),
(1246, 1234, "SEC", 1234, 7, 15),
(1249, 1234, "SEC", 1234, 20, 1),
(1248, 1234, "SEC", 1234, 2, 2))
).toDF("col1", "col2", "col3", "col4", "col5", "col6")
I need to do something like df.groupBy("col1").agg(sum(aggList)), i.e. sum every column named in aggList for each value of col1.
How do I achieve this?

Related

Django ORM queryset equivalent to group by year-month?

I have a Django app that needs some data visualization, and I am stuck on the ORM.
I have a model Orders with a field created_at, and I want to present the data as a bar chart (number of orders per year-month) in a dashboard template.
So I need to aggregate/annotate data from my model, but I did not find a complete solution.
I found a partial answer with TruncMonth and read about serializers, but I wonder whether there is a simpler solution using the Django ORM.
In PostgreSQL it would be:
SELECT date_trunc('month',created_at), count(order_id) FROM "Orders" GROUP BY date_trunc('month',created_at) ORDER BY date_trunc('month',created_at);
"2021-01-01 00:00:00+01" "2"
"2021-02-01 00:00:00+01" "3"
"2021-03-01 00:00:00+01" "3"
...
example
1 "2021-01-04 07:42:03+01"
2 "2021-01-24 13:59:44+01"
3 "2021-02-06 03:29:11+01"
4 "2021-02-06 08:21:15+01"
5 "2021-02-13 10:38:36+01"
6 "2021-03-01 12:52:22+01"
7 "2021-03-06 08:04:28+01"
8 "2021-03-11 16:58:56+01"
9 "2022-03-25 21:40:10+01"
10 "2022-04-04 02:12:29+02"
11 "2022-04-13 08:24:23+02"
12 "2022-05-08 06:48:25+02"
13 "2022-05-19 15:40:12+02"
14 "2022-06-01 11:29:36+02"
15 "2022-06-05 02:15:05+02"
16 "2022-06-05 03:08:22+02"
expected result
[
{
"year-month": "2021-01",
"number" : 2
},
{
"year-month": "2021-03",
"number" : 3
},
{
"year-month": "2021-03",
"number" : 3
},
{
"year-month": "2021-03",
"number" : 1
},
{
"year-month": "2021-04",
"number" : 2
},
{
"year-month": "2021-05",
"number" : 3
},
{
"year-month": "2021-06",
"number" : 3
},
]
I have done this but I am not able to order by date:
Orders.objects.annotate(month=TruncMonth('created_at')).values('month').annotate(number=Count('order_id')).values('month', 'number').order_by()
<SafeDeleteQueryset [
{'month': datetime.datetime(2022, 3, 1, 0, 0, tzinfo=<UTC>), 'number': 4},
{'month': datetime.datetime(2022, 6, 1, 0, 0, tzinfo=<UTC>), 'number': 2},
{'month': datetime.datetime(2022, 5, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2022, 1, 1, 0, 0, tzinfo=<UTC>), 'number': 5},
{'month': datetime.datetime(2021, 12, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2022, 7, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2021, 9, 1, 0, 0, tzinfo=<UTC>), 'number': 2},
'...(remaining elements truncated)...'
]>
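Order by the truncated month rather than by the original field. An empty order_by() only clears the ordering, and ordering by created_at would add that column to the GROUP BY and break the monthly grouping.
from django.db.models import Count
from django.db.models.functions import TruncMonth
Orders.objects.values(month=TruncMonth('created_at')).annotate(number=Count('order_id')).order_by('month')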
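The expected result in the question uses "year-month" strings, while the queryset yields datetime values. A minimal sketch of that last conversion step, assuming rows shaped like the queryset output shown in the question; the helper name to_chart_rows is hypothetical, and the sort is only a fallback in case the rows arrive unordered:

```python
from datetime import datetime, timezone


def to_chart_rows(rows):
    """Sort month buckets chronologically and format each month as 'YYYY-MM'."""
    ordered = sorted(rows, key=lambda row: row["month"])
    return [
        {"year-month": row["month"].strftime("%Y-%m"), "number": row["number"]}
        for row in ordered
    ]


# Three rows copied from the unordered queryset output in the question.
sample = [
    {"month": datetime(2022, 3, 1, tzinfo=timezone.utc), "number": 4},
    {"month": datetime(2022, 6, 1, tzinfo=timezone.utc), "number": 2},
    {"month": datetime(2021, 12, 1, tzinfo=timezone.utc), "number": 1},
]
chart_rows = to_chart_rows(sample)
print(chart_rows)
```

Sorting on the datetime itself (not on a month number) keeps multi-year data in chronological order.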

How to return two lists to chartjs from Django View

I am using Chart.js to render a bar chart. For this I need to pass two lists in a format like [1, 2, 3] and [3, 2, 1]. I am making an AJAX call to Django, which returns the two lists (I have not added the code to get the data from the database yet). The graph works fine with one list, but I am not sure how to pass the second one.
I tried passing the two lists as JSON and using each of them in the success function of the AJAX call, but the graph does not render properly.
Below is the code for the Chart.js AJAX call:
$.ajax({
async: pasys,
type: "GET",
url: purl,
data: pdata,
contentType: "application/json; charset=utf-8",
dataType: "json",
success: function(ldata) {
var barData = {
labels: ["Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec", "Jan", "Feb", "Mar"],
datasets: [
{
label: "DL1",
backgroundColor: 'rgba(220, 220, 220, 0.5)',
pointBorderColor: "#fff",
data: ldata.data1
},
{
label: "Non-DL1",
backgroundColor: 'rgba(100, 200, 300, 0.5)',
pointBorderColor: "#aaa",
data: ldata.data2
}
]
};
var barOptions = {
responsive: true
};
var ctx2 = document.getElementById("opendemandtrend").getContext("2d");
new Chart(ctx2, {type: 'bar', data: barData, options: barOptions});
}
});
Below is the code for the Django view:
def gldh_productivitymetric_opendemandtrend_get(request):
    lcompanyid = request.GET.get("pcompanyid")
    lpmid = request.GET.get("ppmid")
    data = json.dumps({"data1": "[12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]",
                       "data2": "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]"})
    return HttpResponse(data, content_type="application/json")
You're passing in strings as the values, instead of lists. Don't do that.
data = json.dumps({"data1": [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
"data2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})
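To see why the quotes matter, here is a small sketch using only the standard json module. It round-trips both payload shapes the way the browser would parse them; the variable names are illustrative and the lists are shortened:

```python
import json

# Shape sent by the view in the question: each value is a string.
as_strings = json.loads(json.dumps({"data1": "[12, 11, 10]", "data2": "[1, 2, 3]"}))

# Shape the chart needs: each value is a real list of numbers.
as_lists = json.loads(json.dumps({"data1": [12, 11, 10], "data2": [1, 2, 3]}))

print(type(as_strings["data1"]).__name__)  # str
print(type(as_lists["data1"]).__name__)    # list
```

With the string form, ldata.data1 in the success callback is a single string rather than an array of numbers. In a Django view, django.http.JsonResponse can also serialize such a dict directly instead of calling json.dumps by hand.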

Django count number of records per day

I'm using Django 2.0
I am preparing data to show on a graph in a template. I want to fetch the number of records per day.
This is what I'm doing:
qs = self.get_queryset().\
extra({'date_created': "date(created)"}).\
values('date_created').\
annotate(item_count=Count('id'))
but the output is:
[
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1}
]
Here the data is not grouped: the same date is returned repeatedly with a count of 1.
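Try using the TruncDate function from django.db.models.functions instead of extra(): annotate date_created=TruncDate('created'), then call values('date_created') and annotate the count. If the model defines a default ordering, also add an explicit order_by('date_created'), since the default ordering fields can otherwise end up in the GROUP BY and split the groups.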

Group list by given occurrence in Scala

I have a list of strings that I'm trying to split into separate lists, grouping every 4th element together, i.e. this list:
val data = List("1", "2", "3", "4", "5", "6", "7", "8")
should be grouped as
val list1 = List("1", "5")
val list2 = List("2", "6")
val list3 = List("3", "7")
val list4 = List("4", "8")
I'm not sure if I am overcomplicating this, but the only way I can think of is to first group the elements using sliding, e.g.:
data.sliding(4,4).toList
results in
List(List(1, 2, 3, 4), List(5, 6, 7, 8))
and then to implement my own unzip method that would group the above as my desired output.
Please can someone let me know if there is an easier way of doing this?
You can use .transpose on the list .sliding generates:
scala> val data = List("1", "2", "3", "4", "5", "6", "7", "8")
data: List[String] = List(1, 2, 3, 4, 5, 6, 7, 8)
scala> data.sliding(4, 4).toList
res1: List[List[String]] = List(List(1, 2, 3, 4), List(5, 6, 7, 8))
scala> data.sliding(4, 4).toList.transpose
res2: List[List[String]] = List(List(1, 5), List(2, 6), List(3, 7), List(4, 8))
A version which will work for every list length:
def groupNth[A](n: Int, list: List[A]): List[List[A]] = {
val (firstN, rest) = list.splitAt(n)
val groupedRest = if (rest.nonEmpty) groupNth(n, rest) else Nil
// null.asInstanceOf[A] is of course cheating, but the value is never used
firstN.zipAll(groupedRest, null.asInstanceOf[A], Nil).map {
case (h, t) => h :: t
}
}
println(groupNth(4, Nil))
// List()
println(groupNth(4, List(1, 2, 3)))
// List(List(1), List(2), List(3))
println(groupNth(4, List(1, 2, 3, 4, 5, 6, 7, 8)))
// List(List(1, 5), List(2, 6), List(3, 7), List(4, 8))
println(groupNth(4, List(1, 2, 3, 4, 5, 6, 7, 8, 9)))
// List(List(1, 5, 9), List(2, 6), List(3, 7), List(4, 8))
println(groupNth(4, List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)))
// List(List(1, 5, 9), List(2, 6, 10), List(3, 7, 11), List(4, 8, 12))
You can also zip after sliding:
scala> val data = List("1", "2", "3", "4", "5", "6", "7", "8")
data: List[String] = List("1", "2", "3", "4", "5", "6", "7", "8")
scala> val result = data.sliding(4, 4).toList
result: List[List[String]] = List(List("1", "2", "3", "4"), List("5", "6", "7", "8"))
scala> result(0) zip result(1)
res7: List[(String, String)] = List(("1", "5"), ("2", "6"), ("3", "7"), ("4", "8"))
If tuples would do as output, it's fairly neat:
val tuples = data zip data.drop(4)
//> tuples : List[(String, String)] = List((1,5), (2,6), (3,7), (4,8))
Then turn them into Lists:
tuples.map{case(a,b) => List(a, b)}
//> List[List[String]] = List(List(1, 5), List(2, 6), List(3, 7), List(4, 8))
EDIT: Showing that the comment about only working with 8 is incorrect
def pairs[A](xs:List[A], n:Int) =
(xs zip xs.drop(n)).map{case(a,b) => List(a, b)}
pairs(List("1","2", "3", "4", "5", "6", "7", "8"), 4)
// List(List(1, 5), List(2, 6), List(3, 7), List(4, 8))
pairs(List("1","2", "3", "4", "5", "6", "7", "8", "9"), 4)
// List(List(1, 5), List(2, 6), List(3, 7), List(4, 8), List(5, 9))
pairs(List("1","2", "3", "4", "5", "6", "7", "8", "9", "10"), 4)
// List(List(1, 5), List(2, 6), List(3, 7), List(4, 8), List(5, 9), List(6, 10))
pairs(List("1","2", "3", "4"), 4)
// List()
pairs(List("1","2", "3"), 4)
// List()

Query Array value of a record

I have a record with one of the values stored as an array, like this:
#<Health id: 12, district: 43, county: 89, sub_county: 480, name_of_institution: "May Medical Center", money_received: #<BigDecimal:b6ca9318,'0.6E6',9(18)>, date_received: "2013-10-23", use_of_money: "Money used o construct operation theater.", grade_of_health_center: 4, opening_time: "2000-01-01 07:00:00", closing_time: "2000-01-01 18:30:00", **service_offered: ["2", "3", "4", ""]**, other_service_offered: "HIV treatement", male_patients: 54, female_patients: 78, brick_and_wattle: 9, mad_and_wattle: 2, other_structures: 9, source_of_power: [""], other_source_of_power: "Generator", toilet_facilities: true, alternative_disposal: "", separate_toilets: true, running_water: true, alternative_water: "", state_of_water: "Functional", duration_non_functional: "", placenta_pit: true, placental_disposal: "", waste_pit: true, waste_disposal: "", storage_expired_drugs: false, expired_drugs_storage: "They are burnt to ash", pregnant_mother: 15, number_of_beds: 5, delivery_beds: 1, ambulance: true, status_of_ambulance: "Functional", keep_records: true, number_of_staff: 14, medical_staff: 6, resident_medical_staff: 3, created_at: "2014-10-23 17:00:48", updated_at: "2014-10-23 21:23:36">
How can I check the value service_offered: ["2", "3", "4", ""] for the existence of '2'?
I have tried this:
Health.where("grade_of_health_center = '2'").includes(service_offered: "4").count
I want to retrieve the records whose service_offered array contains the value 2.