I have an issue with the default ordering of Cyrillic CharField values in Django. Is there a way to order Cyrillic words alphabetically?
Taxonomy.objects.filter(type=tax_type).order_by('name')
This ordering returns the data shown below: for English words the ordering works as expected, but not for Cyrillic ones. The shell output is exactly the same:
In [3]: Taxonomy.objects.filter(type="COUNTRY").order_by('name').values_list('id', 'name')
Out[3]: [(30, 'Abkhazija'), (31, 'Armenia'), (33, 'Gruzia'), (53, 'Kipr'), (59, 'Nepal'), (56, 'Thailand'), (46, 'Turkey'), (52, 'USA')]
In [4]: Taxonomy.objects.filter(type="PLACE").order_by('name').values_list('id', 'name')
Out[4]: [(42, 'Дон'), (49, 'Крым'), (73, 'Алтай'), (4, 'Архыз'), (71, 'Плато Путорана'), (44, 'Адыгея'), (75, 'Байкал'), (64, 'Домбай'), (11, 'Кавказ'), (69, 'Хибины'), (35, 'Карелия'), (54, 'Эверест'), (32, 'Эльбрус'), (34, 'Камчатка'), (51, 'Псковская область'), (19, 'Заграничный'), (50, 'Подмосковье'), (65, 'Приэльбрусье'), (60, 'Ленинградская область')]
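The ORDER BY result here is driven by the collation of the database column, so one workaround is to collate on the Python side with the standard locale module. A minimal sketch, assuming a Russian UTF-8 locale such as ru_RU.UTF-8 is installed on the host:

import locale

# Assumption: the ru_RU.UTF-8 locale is available on the machine running Django.
locale.setlocale(locale.LC_COLLATE, 'ru_RU.UTF-8')

places = Taxonomy.objects.filter(type="PLACE")
# Sort in Python with locale-aware collation keys instead of relying on the DB ordering.
ordered = sorted(places, key=lambda t: locale.strxfrm(t.name))

On PostgreSQL with Django 3.2+ you could instead keep the ordering in the database by wrapping the field in django.db.models.functions.Collate with a Cyrillic-aware collation, if your server provides one.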
Context: I get different values for a datetime field depending on how I access it. I am sure there is some UTC edge-case magic going on here.
(Pdb++) Foo.objects.all().values_list('gated_out__occurred__date')[0][0]
datetime.date(2021, 9, 9)
(Pdb++) Foo.objects.all()[0].gated_out.occurred.date()
datetime.date(2021, 9, 10)
Edit: They have the same PK
Foo.objects.all().order_by("pk")[0].gated_out.occurred.date()
datetime.date(2021, 9, 10)
(Pdb++) Foo.objects.all().order_by("pk").values_list('gated_out__occurred__date')[0][0]
datetime.date(2021, 9, 9)
How do I fix/figure out what is happening?
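A likely culprit, with USE_TZ = True, is that the __date lookup converts the stored datetime to the current time zone inside the SQL query, while .occurred.date() in Python is taken from the UTC-aware value as-is, so the two can disagree near midnight. A diagnostic sketch, assuming occurred is an aware DateTimeField stored in UTC:

from django.utils import timezone

obj = Foo.objects.order_by("pk").first()
occurred = obj.gated_out.occurred            # aware datetime in UTC

print(occurred.date())                       # date of the UTC value
print(timezone.localtime(occurred).date())   # date in the active time zone,
                                             # roughly what the __date lookup computes in SQL

If the two printed dates differ, the mismatch is just time zone conversion rather than different rows.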
I'm finding that while nulls_last=True works, nulls_last=False doesn't. The example below is from a Django shell.
In [10]: [x.date for x in Model.objects.all().order_by(F('date').asc(nulls_last=True))]
Out[10]:
[datetime.datetime(2020, 3, 10, 16, 58, 7, 768401, tzinfo=<UTC>),
datetime.datetime(2020, 3, 10, 17, 4, 51, 601980, tzinfo=<UTC>),
None,
]
In [11]: [x.date for x in Model.objects.all().order_by(F('date').asc(nulls_last=False))]
Out[11]:
[datetime.datetime(2020, 3, 10, 16, 58, 7, 768401, tzinfo=<UTC>),
datetime.datetime(2020, 3, 10, 17, 4, 51, 601980, tzinfo=<UTC>),
None,
]
I've tried this with both desc() and asc().
The mistake is assuming that the opposite of nulls_last=True is nulls_last=False. It isn't.
nulls_last=True does the following to the query:
SELECT ... ORDER BY ... ASC NULLS LAST
Whereas nulls_last=False just means use the DB default:
SELECT ... ORDER BY ... ASC
What you want instead is to pass nulls_first=True or nulls_last=True to make the ordering explicit.
This is mentioned in the docs, but perhaps not as explicitly as it could be:
Using F() to sort null values
Use F() and the nulls_first or nulls_last keyword argument to Expression.asc() or desc() to control the ordering of a field's null values. By default, the ordering depends on your database.
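Applied to the queryset from the question, the two explicit forms look like this (Model and date are the placeholder names used above):

from django.db.models import F

# NULL dates first, then non-NULL dates in ascending order:
Model.objects.order_by(F('date').asc(nulls_first=True))

# Non-NULL dates in ascending order, NULL dates at the end:
Model.objects.order_by(F('date').asc(nulls_last=True))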
There are two models: a Price model and a Service model. I'm trying to find the Services that have the most Prices. All of this is initially filtered by a user input query (called entry_query). Line (1) gets the Price objects that the user queries (this works). Line (2) returns a QuerySet with each service code and the count of Prices that service has. Line (3) then raises an error (see below).
(1) Price_objs = Price_filter.filter(entry_query)
(2) objs_filter=Price_objs.values_list('service__code').annotate(service_count=Count('service__code')).order_by('-service_count')
(3) serv_obj = Service.objects.filter(price__in = objs_filter).distinct()
Here is what line (2) outputs:
<QuerySet [('36430', 62), ('86003', 28), ('87149', 28), ('83516', 23), ('86317', 20), ('94640', 19), ('73502', 18), ('86658', 14), ('73721', 13), ('87070', 13), ('76942', 12), ('87081', 12), ('73560', 11), ('87798', 11), ('36415', 10), ('74177', 10), ('99211', 10), ('73100', 9), ('73221', 9), ('74176', 9), '...(remaining elements truncated)...']>
Here is the error that I get with line (3):
django.db.utils.OperationalError: sub-select returns 2 columns - expected 1
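The two columns come from line (2): values_list('service__code') followed by annotate() keeps both the code and the count in the SELECT, which is exactly what the tuples above show. A sketch of one way around it, assuming Service has a code field that matches Price.service__code, is to expose only a single column to the outer query:

from django.db.models import Count

top_codes = (
    Price_objs
    .values('service__code')
    .annotate(service_count=Count('service__code'))
    .order_by('-service_count')
    .values_list('service__code', flat=True)   # only one column reaches the sub-select
)
serv_obj = Service.objects.filter(code__in=top_codes).distinct()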
I am trying to understand how the *[] allows me to pass parameters to this aggregate in PySpark. The code runs, but I am trying to reuse it in another example and was hoping someone could point me to the appropriate documentation so that I know what is going on here. I like that it can pass the columns in the list as a parameter, and I was hoping someone could explain what *[] is doing here. How does it know to append a column to the DataFrame for each expression, rather than just iterating through the list and executing once for each element of testdata?
import pyspark.sql.functions as fn
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
testdata = spark.createDataFrame([
    (1, 144.5, 5.9, 33, 'M'),
    (2, 167.2, None, 45, 'M'),
    (3, 124.1, 5.2, 23, 'F'),
    (4, None, 5.9, None, 'M'),
    (5, 133.2, 5.7, 54, 'F'),
    (3, 124.1, None, None, 'F'),
    (5, 129.2, 5.3, None, 'M'),
], ['id', 'weight', 'height', 'age', 'gender'])

# Fraction of missing (null) values per column, computed over the male rows only.
testdata.where(
    fn.col("gender") == 'M'
).select(
    '*'
).agg(*[
    (1 - (fn.count(c) / fn.count('*'))).alias(c + '_missing')
    for c in testdata.columns
]).toPandas()
output:
+----------+--------------+--------------+-----------+--------------+
|id_missing|weight_missing|height_missing|age_missing|gender_missing|
+----------+--------------+--------------+-----------+--------------+
| 0.0| 0.25| 0.25| 0.5| 0.0|
+----------+--------------+--------------+-----------+--------------+
Using * in front of a list expands out the members as individual arguments. So, the following two function calls will be equivalent:
my_function(*[1, 2, 3])
my_function(1, 2, 3)
Obviously, the first one is not very useful if you already know the precise number of arguments. It becomes more useful with a comprehension like the one you are using, where it is not clear how many items will be in the list.
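Tying that back to the snippet in the question: the comprehension builds one aliased aggregate Column per column of testdata, and the * unpacks them so agg() receives them as separate arguments. Because agg() aggregates the whole DataFrame, it returns a single row with one output column per expression, rather than iterating and running once per element. A sketch against the same testdata:

import pyspark.sql.functions as fn

# One aliased aggregate expression per column of testdata.
exprs = [
    (1 - (fn.count(c) / fn.count('*'))).alias(c + '_missing')
    for c in testdata.columns
]

# These two calls are equivalent; * just unpacks the list into positional arguments.
testdata.agg(*exprs)
testdata.agg(exprs[0], exprs[1], exprs[2], exprs[3], exprs[4])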
I have a dataset like this:
abelia,fl,nc
esculentus,ct,dc,fl,il,ky,la,md,mi,ms,nc,sc,va,pr,vi
abelmoschus moschatus,hi,pr*
dataset link:
My dataset doesn't have any attribute declaration. I want to apply association rules to my dataset. I want it to look like this:
plant fl nc ct dc .....
abelia 1 1 0 0
.....
ELKI contains a parser that can read the input as is. Maybe Rapidminer does so, too - or you should write a parser for this format! With the ELKI parameters
-dbc.in /tmp/plants.data
-dbc.parser SimpleTransactionParser -parser.colsep ,
-algorithm itemsetmining.associationrules.AssociationRuleGeneration
-itemsetmining.minsupp 0.10
-associationrules.interestingness Lift
-associationrules.minmeasure 7.0
-resulthandler ResultWriter -out /tmp/rules
we can find all association rules with support >= 10%, Lift >= 7.0, and write them to the folder /tmp/rules (there is currently no visualization of association rules in ELKI):
For example, this finds the rules
sc, va, ga: 3882 --> nc, al: 3529 : 7.065536626573297
va, nj: 4036 --> md, pa: 3528 : 7.206260507764794
So plants that occur in South Carolina, Virginia, and Georgia will also occur in North Carolina and Alabama. NC is not much of a surprise, given that it is in between SC and VA, but Alabama is interesting.
The second rule is that Virginia and New Jersey imply Maryland (in between the two) and Pennsylvania. Also a very plausible rule, supported by 3528 cases.
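For readers new to the measure: lift is the rule's confidence divided by the expected confidence if the two sides were independent, so values far above 1 indicate a strong association. A worked sketch with made-up numbers (none of these counts come from the plants dataset):

# All numbers below are hypothetical, for illustrating the formula only.
n_total = 10000          # total transactions (plants)
n_antecedent = 1000      # plants containing {sc, va, ga}
n_consequent = 1300      # plants containing {nc, al}
n_both = 900             # plants containing all five states

confidence = n_both / n_antecedent        # P(consequent | antecedent) = 0.9
expected = n_consequent / n_total         # P(consequent) = 0.13
lift = confidence / expected              # about 6.9, well above 1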
I did my work with this Python script:
import csv
abbrs = ['states', 'ab', 'ak', 'ar', 'az', 'ca', 'co', 'ct',
'de', 'dc', 'of', 'fl', 'ga', 'hi', 'id', 'il', 'in',
'ia', 'ks', 'ky', 'la', 'me', 'md', 'ma', 'mi', 'mn',
'ms', 'mo', 'mt', 'ne', 'nv', 'nh', 'nj', 'nm', 'ny',
'nc', 'nd', 'oh', 'ok', 'or', 'pa', 'pr', 'ri', 'sc',
'sd', 'tn', 'tx', 'ut', 'vt', 'va', 'vi', 'wa', 'wv',
'wi', 'wy', 'al', 'bc', 'mb', 'nb', 'lb', 'nf', 'nt',
'ns', 'nu', 'on', 'qc', 'sk', 'yt']
with open("plants.data.txt", encoding = "ISO-8859-1") as f1, open("plants.data.csv", "a") as f2:
csv_f2 = csv.writer(f2, delimiter=',')
csv_f2.writerow(abbrs)
csv_f1 = csv.reader(f1)
for row in csv_f1:
new_row = [row[0]]
for abbr in abbrs:
if abbr in row:
new_row.append(1)
else:
new_row.append(0)
csv_f2.writerow(new_row)
If all of the values are single words, you can use the text mining extension in Rapidminer to transform them into variables and then run association rule mining methods on them.