Django ORM: select only duplicated fields from the DB

I have a table in the DB like this:
MyTableWithValues
id | user (FK to Users) | value (FK to Values) | text         | something1 | something2 | ...
1  | userobject1        | valueobject1         | asdasdasdasd | 123        | 12321
2  | userobject2        | valueobject50        | QWQWQWQWQWQW | 515        | 5555455
3  | userobject1        | valueobject1         | asdasdasdasd | 12345      | 123213
I need to delete all objects whose user, value and text fields are repeated, but keep one of each group. In this example, the 3rd record would be deleted.
How can I do this using the Django ORM?
PS: I tried this:
from django.db.models import Count, Max

recs = (
    MyTableWithValues.objects
    .order_by()
    .annotate(max_id=Max('id'), count_id=Count('user__id'))
    # .filter(count_id__gt=1)
    .annotate(count_values=Count('value'))
    # .filter(count_values__gt=1)
)
...
...
for r in recs:
    print(r.id, r.count_id, r.count_values)
It prints something like this:
1 1 1
2 1 1
3 1 1
...
This is despite the fact that there are duplicated values in the database. I can't understand why the Count function does not work.
Can anybody help me?

You should first be aware of how Count works together with annotate().
Without a .values() call, annotate() groups by each individual object: Django puts the primary key (or, depending on the database, all of the model's columns) into the GROUP BY clause.
Because id is unique, every group contains exactly one row, so count_id and count_values are always 1 in your query, even for rows whose user, value and text are identical.
To count rows that share the same user, value and text, group by only those fields by calling .values() before .annotate().
Query:
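from django.db.models import Count, Max

recs = (
    MyTableWithValues.objects
    .values('user', 'value', 'text')  # GROUP BY these three fields only
    .order_by()  # clear ordering so it cannot add extra columns to the GROUP BY
    .annotate(max_id=Max('id'), count_id=Count('user__id'))
    .annotate(count_values=Count('value'))
)
It returns a QuerySet of dictionaries, one per distinct (user, value, text) combination:
print(recs)
Output:
<QuerySet [{'user': 1, 'value': 1, 'text': 'asdasdasdasd', 'max_id': 3, 'count_id': 2, 'count_values': 2}, {'user': 2, 'value': 2, 'text': 'QWQWQWQWQWQW', 'max_id': 2, 'count_id': 1, 'count_values': 1}]>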
Using this queryset you can check how many rows share each combination of user, value and text: every group with count_id > 1 contains duplicates, and max_id is the highest id in that group.

Would a Python loop work for you?
import collections

d = collections.defaultdict(list)
# group all objects by the key; order by id so the first object kept is the oldest
for e in MyTableWithValues.objects.order_by('id'):
    k = (e.user_id, e.value_id, e.text)
    d[k].append(e)

for k, obj_list in d.items():
    if len(obj_list) > 1:
        # except the first one, delete all objects
        for e in obj_list[1:]:
            e.delete()

Related

Django race condition aggregate(Max) in F() expression

Imagine the following model:
class Item(models.Model):
    # natural PK
    increment = models.PositiveIntegerField(_('Increment'), null=True, blank=True, default=None)
    # other fields
When an item is created, I want the increment field to automatically acquire the maximum value it has across the whole table, plus 1. For example:
item table:
| id | increment |
|----|-----------|
| 1  | 1         |
| 2  | 2         |
| 4  | 3         |  -> id 3 was deleted at some stage
| 5  | 4         |
| 6  | 5         |
... etc.
When a new Item() comes in and is saved, how can I make sure, in one pass and in a way that avoids race conditions, that it gets increment 6 and not 7, even if another process does exactly the same thing at the same time?
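I have tried:
from django.db import transaction
from django.db.models import Max

with transaction.atomic():
    i = Item()
    highest_increment = Item.objects.all().aggregate(Max('increment'))
    # the aggregate is None on an empty table, hence `or 0`
    i.increment = (highest_increment['increment__max'] or 0) + 1
    i.save()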
I would like to be able to create it in a way similar to the following, but that obviously does not work (I have checked places like https://docs.djangoproject.com/en/3.2/ref/models/expressions/#avoiding-race-conditions-using-f):
from django.db.models import Max, F
i = Item(
    increment=F(Max(increment))
)
Many thanks

How to find a match between any partial value in column A and any value column B? And extract?

I have two columns in a pandas DataFrame, both of which also contain a lot of null values. Some values in Column B exist partially in a field (or multiple fields) in Column A. I want to check whether each value of B exists in A, and if so, separate this value out and add it as a new row in Column A.
Example:
Column A | Column B
black bear | null
black box | null
red fox | null
red fire | null
green tree | null
null | red
null | yellow
null | black
null | red
null | green
And I want the following:
Column A
black
bear
box
red
fire
fox
yellow
green
Does anyone have any tips on how to get this result? I have tried using regex (re.match), but I am struggling with the fact that I do not have a fixed pattern but a variable (namely, any value in Column B). This is my effort:
import re

list_A = df['Column A'].values.tolist()
list_B = df['Column B'].values.tolist()
for i in list_A:
    for j in list_B:
        if isinstance(i, str) and isinstance(j, str):  # skip None/NaN
            if re.match(f'{re.escape(j)}.+', i):
                ...
Note: the columns are over 2500 rows long.
If I understand your question correctly, you want to split the value of B out of the value in A whenever B is found in A, and then store the separated values separately. If so, how about trying the following?
import re

list_A = df['Column A'].values.tolist()
list_B = df['Column B'].values.tolist()
list_of_separated_values = []
for a in list_A:
    for b in list_B:
        # skip nulls (None/NaN) in either column
        if isinstance(a, str) and isinstance(b, str) and b in a:
            parts = re.split('({})'.format(re.escape(b)), a)
            # keep the non-empty pieces, trimmed of surrounding spaces
            list_of_separated_values.extend(p.strip() for p in parts if p.strip())
This is not a regex question. Your data is in a DataFrame, so use the DataFrame functionality to fix it.
Assuming data_frame is your pandas DataFrame.
# select the rows with `null` in Column A
mask = data_frame["Column A"].isnull()
# in those rows, copy Column B into Column A
# (assign through .loc on the original frame, not on a filtered copy, so the change sticks)
data_frame.loc[mask, "Column A"] = data_frame.loc[mask, "Column B"]
# set Column B to null/None (I'm assuming you want this, otherwise this step can be skipped)
data_frame.loc[mask, "Column B"] = None
print(data_frame)
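A pandas-only sketch, assuming the goal is every distinct word from Column A plus every distinct value from Column B, in first-seen order. Under that reading "green tree" also contributes "tree", which the example output in the question omits, so the result needs an extra filter if only words that also occur in Column B should be split out. The collect_words name is hypothetical.

```python
import pandas as pd


def collect_words(df):
    """Distinct words of Column A plus distinct values of Column B, first-seen order."""
    # "black bear" -> ["black", "bear"] -> one row per word; nulls are dropped first.
    words_a = df["Column A"].dropna().str.split().explode()
    words_b = df["Column B"].dropna()
    combined = pd.concat([words_a, words_b], ignore_index=True)
    return pd.DataFrame({"Column A": combined.drop_duplicates().reset_index(drop=True)})


df = pd.DataFrame({
    "Column A": ["black bear", "black box", "red fox", "red fire", "green tree"] + [None] * 5,
    "Column B": [None] * 5 + ["red", "yellow", "black", "red", "green"],
})
result = collect_words(df)
print(result["Column A"].tolist())
# ['black', 'bear', 'box', 'red', 'fox', 'fire', 'green', 'tree', 'yellow']
```

Vectorised string methods avoid the nested Python loops, which matters once the columns grow past a few thousand rows.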

Compare fields within relationship on Django ORM

I have two models, route and stop.
A route can have several stops; each stop has a name and a number. On the same route, stop numbers are unique.
The problem:
I need to find which routes have two given stops where one stop's number is less than the other stop's number.
Consider the following models:
class Route(models.Model):
    name = models.CharField(max_length=20)

class Stop(models.Model):
    route = models.ForeignKey(Route, on_delete=models.CASCADE)
    number = models.PositiveSmallIntegerField()
    location = models.CharField(max_length=45)
And the following data:
Stop table
| id | route_id | number | location |
|----|----------|--------|----------|
| 1 | 1 | 1 | 'A' |
| 2 | 1 | 2 | 'B' |
| 3 | 1 | 3 | 'C' |
| 4 | 2 | 1 | 'C' |
| 5 | 2 | 2 | 'B' |
| 6 | 2 | 3 | 'A' |
For example:
Given two locations 'A' and 'B', find which routes have both locations and where A.number is less than B.number.
With the previous data, it should match route id 1 and not route id 2.
In raw SQL, this works with a single query:
SELECT
`route`.id
FROM
`route`
LEFT JOIN `stop` stop_from ON stop_from.`route_id` = `route`.`id`
LEFT JOIN `stop` stop_to ON stop_to.`route_id` = `route`.`id`
WHERE
stop_from.`stop_location_id` = 'A'
AND stop_to.`stop_location_id` = 'B'
AND stop_from.stop_number < stop_to.stop_number
Is it possible to do this with a single query in the Django ORM as well?
Generally, ORM frameworks like the Django ORM, SQLAlchemy and even Hibernate are not designed to autogenerate the most efficient query. There is a way to write this query using only model objects; however, since I had a similar issue, I would suggest using a raw query for more complex queries. Here is the link to the Django raw query documentation:
[https://docs.djangoproject.com/en/1.11/topics/db/sql/]
You can write your query in many ways, but something like the following could help.
from django.db import connection

def my_custom_sql():
    with connection.cursor() as cursor:
        # a triple-quoted string is needed for a multi-line SQL statement
        cursor.execute("""
            SELECT `route`.id
            FROM `route`
            LEFT JOIN `stop` stop_from ON stop_from.`route_id` = `route`.`id`
            LEFT JOIN `stop` stop_to ON stop_to.`route_id` = `route`.`id`
            WHERE stop_from.`stop_location_id` = %s
              AND stop_to.`stop_location_id` = %s
              AND stop_from.stop_number < stop_to.stop_number
        """, ['A', 'B'])
        # several routes can match, so fetch all rows rather than only the first
        rows = cursor.fetchall()
    return rows
Hope this helps.

Is there a "proper" way to convert django.db.models.query.ValuesListQuerySet to a pure List?

I'm attempting to delete records from a number of models in a single method, where I have the following schema:
picture 1:1 picture_foreign_picture *:1 picture_foreign
I'm deleting them given a list of picture_foreign objects:
picture_foreign_pictures = PictureForeignPicture.objects.filter(picture_foreign__in=picture_foreigns)
picture_ids = picture_foreign_pictures.values_list('picture_id', flat=True)
logger.warning('PICTURES REMOVE: %s' % picture_ids)
picture_foreign_pictures.delete()
logger.warning('PICTURES REMOVE: %s' % picture_ids)
The 2 loggers lines output the following:
WARNING 2013-01-02 03:40:10,974 PICTURES REMOVE: [86L]
WARNING 2013-01-02 03:40:11,045 PICTURES REMOVE: []
Despite this however, the picture 86 still exists:
mysql> select id from picture where id = 86;
+----+
| id |
+----+
| 86 |
+----+
1 row in set (0.00 sec)
I guess I could get around this by simply converting picture_ids into a pure integer list, but I'm wondering whether there's a more Django-like way to do this. I would have thought flat=True would already handle this, but it seems to return more than just a pure list.
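Well, I'm not sure it's proper, but it's ridiculously simple to use list() to accomplish this. values_list(..., flat=True) returns a lazy QuerySet rather than a list: its SQL runs again every time it is evaluated (your logger call evaluates it through repr()), so after the delete it finds nothing. list() runs the query once and keeps the result: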
picture_ids = list(picture_foreign_pictures.values_list('picture_id', flat=True))
The above solution did not work for me. You can also convert to a pure list like this:
p_ids = PictureForeignPicture.objects.filter(picture_foreign__in=picture_foreigns).values_list('picture_id', flat=True)
new_list = []
new_list.extend(p_ids)

Using Django ListView with custom query

I have a set of django models that are set out as follows:
class Foo(models.Model):
    ...

class FooVersion(models.Model):
    name = models.CharField(max_length=100)
    parent = models.ForeignKey(Foo, on_delete=models.CASCADE)
    version = models.FloatField()
    ...
I'm trying to create a Django ListView that displays all Foos, in alphabetical order by the name of their highest version. For example, if I have a data set that looks like:
version_id | id | version_name | version
-----------+----+-----------------------------------+---------
1 | 1 | Test 1 | 1.0
2 | 1 | Test 2 | 2.0
3 | 1 | Test 2 | 3.0
4 | 2 | Test 1 | 1.0
5 | 1 | Test 3 | 2.5
6 | 3 | Test 3 | 1.0
I want the query to return:
version_id | id | version_name | version
-----------+----+-----------------------------------+---------
4 | 2 | Test 1 | 1.0
3 | 1 | Test 2 | 3.0
6 | 3 | Test 3 | 1.0
The raw sql I would use to generate this is:
SELECT version_class.id as version_id, someapp_foo.id, version_class.name as version_name, version_class.version
FROM someapp_foo
INNER JOIN(
SELECT someapp_fooversion.name, someapp_fooversion.version, someapp_fooversion.parent_id, someapp_fooversion.id
FROM someapp_fooversion
INNER JOIN(
SELECT parent_id, max(version) AS version
FROM someapp_fooversion GROUP BY parent_id)
AS current_version ON current_version.parent_id = someapp_fooversion.parent_id
AND current_version.version = someapp_fooversion.version)
AS version_class ON version_class.parent_id = someapp_foo.id
ORDER BY version_name;
But I'm having trouble using a raw query because the RawQuerySet object doesn't have a 'count' method, which is called by ListView for pagination. I've looked into the 'extra' feature of Django querysets, but I'm having trouble formulating a query that will work with that.
How would I formulate a query for 'extra' that would get me what I'm looking for? Or is there a way to convert a RawQuerySet into a regular QuerySet? Any other possible solutions to get the results I'm looking for?
There may be a better way to do this, but for now I'm trying a custom solution that seems to work:
from django.db import models
from django.db.models.query import RawQuerySet

class CountableRawQuerySet(RawQuerySet):
    def count(self):
        # RawQuerySet has no count(), so iterate over the results and count them
        return sum(1 for _ in self)

class FooManager(models.Manager):
    def raw(self, raw_query, params=(), translations=None):
        return CountableRawQuerySet(raw_query, model=self.model, params=params,
                                    translations=translations, using=self._db)

class Foo(models.Model):
    objects = FooManager()
Then my queryset is:
Foo.objects.raw(sql)
Suggestions on how to improve this?
First of all, your solution is wrong and very inefficient with large amounts of data: count() fetches and iterates over every row of the raw query.
I believe you just need something like:
from django.db.models import Max

Foo.objects.annotate(max_version=Max('fooversion__version'))
You can now refer to the max_version attribute of each result as a normal attribute.
Please see https://docs.djangoproject.com/en/dev/topics/db/aggregation/ for details.
One other point to add is that RawQuerySet works fine with a ListView as long as you don't use pagination, i.e. you can just leave out the paginate_by = NN attribute from your ListView subclass.