Is there a cleaner way to check whether all providers for an item exist in the pivot table?
E.g. I have a few items; if one of them has all of the given providers, the method should return True, otherwise False.
for item in items:
    exists_count = 0
    for provider in providers:
        if ItemProviderConn.objects.filter(
            item_id=item.pk,
            provider_id=provider.pk,
        ):
            exists_count += 1
        else:
            break
    if exists_count == len(providers):
        return True
return False
Of course you can do it better.
I assume your items and providers are QuerySets.
If I understand correctly, you want to check whether any item has all of the given providers.
In that case:
from django.db.models import Count

provider_ids = providers.values_list('pk', flat=True)
length_of_providers = len(provider_ids)
# 1. Filter items down to rows matching the given providers
#    (assuming the default lowercase related query name for ItemProviderConn)
queryset = items.filter(itemproviderconn__provider_id__in=provider_ids)
# 2. Count the matching provider rows per item
queryset = queryset.annotate(providercount=Count('itemproviderconn__provider_id'))
# 3. Keep only items whose count >= length_of_providers
queryset = queryset.filter(providercount__gte=length_of_providers)
# 4. Return True if any item has all the providers
return queryset.exists()
By the way, all of that queryset work can be done in a single chained expression.
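For completeness, a sketch of that chained form (same assumed lookups as above; distinct=True guards against duplicate pivot rows inflating the count):

provider_ids = providers.values_list('pk', flat=True)
return items.filter(
    itemproviderconn__provider_id__in=provider_ids,
).annotate(
    providercount=Count('itemproviderconn__provider_id', distinct=True),
).filter(
    providercount__gte=len(provider_ids),
).exists()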
I want to do a filter in Django using a form.
If the user types a value, it should query the dataset with that value; if it is left blank, it should return all the elements.
How can I do that?
I am new to Django.
if request.GET.get('Var'):
    Var = request.GET.get('Var')
else:
    Var = # WHAT SHOULD I PUT HERE TO FILTER ALL THE ELEMENTS IN THE CODE BELOW
models.objects.filter(Var=Var)
It's not a great idea from a security standpoint to allow users to input data directly into search terms (and it should DEFINITELY not be done for raw SQL queries, if you're using any of those).
With that note in mind, you can take advantage of more dynamic filter creation using a dictionary syntax, or revise the queryset as it goes along:
Option 1: Dictionary Syntax
def my_view(request):
    query = {}
    if request.GET.get('Var'):
        query['Var'] = request.GET.get('Var')
    if request.GET.get('OtherVar'):
        query['OtherVar'] = request.GET.get('OtherVar')
    if request.GET.get('thirdVar'):
        # Say you wanted to add in some further processing
        thirdVar = request.GET.get('thirdVar')
        if int(thirdVar) > 10:
            query['thirdVar'] = 10
        else:
            query['thirdVar'] = int(thirdVar)
    if request.GET.get('lessthan'):
        lessthan = request.GET.get('lessthan')
        query['fieldname__lte'] = int(lessthan)
    results = MyModel.objects.filter(**query)
If nothing has been added to the query dictionary and it's empty, that'll be the equivalent of doing MyModel.objects.all()
My security note from above applies if you wanted to try to do something like this (which would be a bad idea):
MyModel.objects.filter(**request.GET)
Django has a good security track record, but this is less safe than anticipating the types of queries that your users will have. This could also be a huge issue if your schema is known to a malicious site user who could adapt their query syntax to make a heavy query along non-indexed fields.
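A hedged sketch of one way to constrain that: whitelist the filterable parameter names before building the query dictionary (the ALLOWED_FILTERS set and field names are illustrative):

ALLOWED_FILTERS = {'Var', 'OtherVar'}  # hypothetical set of filterable fields

query = {
    key: value
    for key, value in request.GET.items()
    if key in ALLOWED_FILTERS and value
}
results = MyModel.objects.filter(**query)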
Option 2: Revising the Queryset
Alternatively, you can start off with a queryset for everything and then filter it down accordingly:
def my_view(request):
    results = MyModel.objects.all()
    if request.GET.get('Var'):
        results = results.filter(Var=request.GET.get('Var'))
    if request.GET.get('OtherVar'):
        results = results.filter(OtherVar=request.GET.get('OtherVar'))
    return results
A simpler and more explicit way of doing this would be:
if request.GET.get('Var'):
    data = models.objects.filter(Var=request.GET.get('Var'))
else:
    data = models.objects.all()
I need to select entries which match a specific pattern, count them, and then pick the latest. How can I do this via the Django ORM?
Tried:
Entry.objects.filter(A=B).annotate(Count("something")).latest("date")
This counts only 1 item for each B. If I remove latest("date"), it counts correctly but returns only the count and nothing else. How do I perform this task correctly?
UPD: Actual code
def render_entries(request):
    ids = Entry.objects.values("entry_token").distinct()
    entries = [Entry.objects.filter(entry_token=x["entry_token"]).annotate(count=Count("id")).latest("date_time") for x in ids]
    return render(request, "entries_list.html", {'entries': entries})
Updated: probably not the most optimal, but this should work.
from django.db.models import Count

def render_entries(request):
    # get all entry_tokens with their counts
    tokens = Entry.objects.values('entry_token').annotate(count=Count('entry_token'))
    # create list of entries
    entries = []
    for token in tokens:
        entry = Entry.objects.filter(entry_token=token['entry_token']).latest('date_time')
        entry.count = token['count']
        entries.append(entry)
    return render(request, "entries_list.html", {'entries': entries})
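If there are many distinct tokens, the loop above still issues one latest() query per token. A hedged sketch of a single-query alternative using itertools.groupby (it loads every row into memory, so it only makes sense for moderate table sizes):

from itertools import groupby

def render_entries(request):
    # One query, ordered so the first row of each token group is the latest.
    ordered = Entry.objects.order_by('entry_token', '-date_time')
    entries = []
    for token, group in groupby(ordered, key=lambda e: e.entry_token):
        entry = next(group)                      # latest entry for this token
        entry.count = 1 + sum(1 for _ in group)  # plus the remaining rows
        entries.append(entry)
    return render(request, "entries_list.html", {'entries': entries})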
I am using bulk_create to load thousands of rows into a PostgreSQL DB. Unfortunately some of the rows are causing an IntegrityError and stopping the bulk_create process. I was wondering if there was a way to tell Django to ignore such rows and save as much of the batch as possible?
This is now possible in Django 2.2.
Django 2.2 adds a new ignore_conflicts option to the bulk_create method, from the documentation:
On databases that support it (all except PostgreSQL < 9.5 and Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values. Enabling this parameter disables setting the primary key on each model instance (if the database normally supports it).
Example:
Entry.objects.bulk_create([
    Entry(headline='This is a test'),
    Entry(headline='This is only a test'),
], ignore_conflicts=True)
One quick-and-dirty workaround for this that doesn't involve manual SQL and temporary tables is to just attempt to bulk insert the data. If it fails, revert to serial insertion.
objs = [Event(...), Event(...), Event(...)]  # unsaved model instances
try:
    Event.objects.bulk_create(objs)
except IntegrityError:
    for obj in objs:
        try:
            obj.save()
        except IntegrityError:
            continue
If you have lots and lots of errors this may not be so efficient (you'll spend more time serially inserting than doing so in bulk), but I'm working through a high-cardinality dataset with few duplicates so this solves most of my problems.
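One caveat, assuming you are on PostgreSQL inside an open transaction (see the next answer for why): after the first IntegrityError the whole transaction is aborted, so each fallback save may need its own savepoint. A sketch using Django's nested atomic blocks:

from django.db import IntegrityError, transaction

try:
    with transaction.atomic():
        Event.objects.bulk_create(objs)
except IntegrityError:
    for obj in objs:
        try:
            with transaction.atomic():  # creates a savepoint per row
                obj.save()
        except IntegrityError:
            continue  # skip the conflicting row, keep the rest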
(Note: I don't use Django, so there may be more suitable framework-specific answers)
It is not possible for Django to do this by simply ignoring INSERT failures because PostgreSQL aborts the whole transaction on the first error.
Django would need one of these approaches:
1. INSERT each row in a separate transaction and ignore errors (very slow);
2. create a SAVEPOINT before each insert (this can have scaling problems);
3. use a procedure or query to insert only if the row doesn't already exist (complicated and slow); or
4. bulk-insert or (better) COPY the data into a TEMPORARY table, then merge that into the main table server-side.
The upsert-like approach (3) seems like a good idea, but upsert and insert-if-not-exists are surprisingly complicated.
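(For readers on PostgreSQL 9.5+: insert-if-not-exists has since become built in via ON CONFLICT, which is essentially what Django 2.2's ignore_conflicts option uses. Table and column names below are illustrative.)

-- PostgreSQL 9.5+: skip rows that would violate the unique constraint on id
INSERT INTO realtable (id, payload)
VALUES (1, 'x')
ON CONFLICT (id) DO NOTHING;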
Personally, I'd take (4): I'd bulk-insert into a new separate table, probably UNLOGGED or TEMPORARY, then I'd run some manual SQL to:
LOCK TABLE realtable IN EXCLUSIVE MODE;

INSERT INTO realtable
SELECT * FROM temptable WHERE NOT EXISTS (
    SELECT 1 FROM realtable WHERE temptable.id = realtable.id
);
The LOCK TABLE ... IN EXCLUSIVE MODE prevents a concurrent insert from creating a row that would conflict with the insert done by the above statement and make it fail. It does not block concurrent SELECTs, only SELECT ... FOR UPDATE, INSERT, UPDATE and DELETE, so reads from the table carry on as normal.
If you can't afford to block concurrent writes for too long you could instead use a writable CTE to copy ranges of rows from temptable into realtable, retrying each block if it failed.
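A sketch of that chunked idea, assuming an integer id key; the DELETE ... RETURNING is the writable CTE, moving one block out of temptable per statement:

WITH moved AS (
    DELETE FROM temptable
    WHERE id BETWEEN 1 AND 1000   -- one block per statement
    RETURNING *
)
INSERT INTO realtable
SELECT m.* FROM moved m
WHERE NOT EXISTS (
    SELECT 1 FROM realtable r WHERE r.id = m.id
);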
Or 5. Divide and conquer
I didn't test or benchmark this thoroughly, but it performs pretty well for me. YMMV, depending in particular on how many errors you expect to get in a bulk operation.
def psql_copy(records):
    count = len(records)
    if count < 1:
        return True
    try:
        pg.copy_bin_values(records)  # your bulk COPY helper
        return True
    except IntegrityError:
        if count == 1:
            # found culprit!
            msg = "Integrity error copying record:\n%r"
            logger.error(msg % records[0], exc_info=True)
            return False
    finally:
        connection.commit()
    # There was an integrity error but we had more than one record.
    # Divide and conquer.
    mid = count // 2
    return psql_copy(records[:mid]) and psql_copy(records[mid:])
    # or just return False
Even in Django 1.11 there is no way to do this. I found a better option than using raw SQL: django-query-builder, which has an upsert method.
from querybuilder.query import Query
q = Query().from_table(YourModel)
# replace with your real objects
rows = [YourModel() for i in range(10)]
q.upsert(rows, ['unique_fld1', 'unique_fld2'], ['fld1_to_update', 'fld2_to_update'])
Note: the library only supports PostgreSQL.
Here is a gist that I use for bulk insert that supports ignoring IntegrityErrors and returns the records inserted.
Late answer, for pre-Django 2.2 projects:
I ran into this situation recently and found my way out with a second list to check for uniqueness.
In my case, the model has a unique-together constraint, and bulk_create was throwing an IntegrityError because the list passed to bulk_create contained duplicate data.
So I decided to build a checklist alongside the bulk_create objects list. Here is the sample code; the unique keys are owner and brand, where owner is a User instance and brand is a string:
create_list = []
create_list_check = []
for brand in brands:
    if (owner.id, brand) not in create_list_check:
        create_list_check.append((owner.id, brand))
        create_list.append(ProductBrand(owner=owner, name=brand))
if create_list:
    ProductBrand.objects.bulk_create(create_list)
This works for me. I use this function in a thread; my CSV file contains 120,907 rows.
def products_create():
    full_path = os.path.join(settings.MEDIA_ROOT, 'productcsv')
    filename = os.listdir(full_path)[0]
    logger.debug(filename)
    logger.debug(len(Product.objects.all()))
    if len(Product.objects.all()) > 0:
        logger.debug("Products Data Erasing")
        Product.objects.all().delete()
        logger.debug("Products Erasing Done")
    csvfile = os.path.join(full_path, filename)
    csv_df = pd.read_csv(csvfile, sep=',')
    csv_df['HSN Code'] = csv_df['HSN Code'].fillna(0)
    row_iter = csv_df.iterrows()
    logger.debug(row_iter)
    logger.debug("New Products Creating")
    for index, row in row_iter:
        Product.objects.create(
            part_number=row[0],
            part_description=row[1],
            mrp=row[2],
            hsn_code=row[3],
            gst=row[4],
        )
    # products_list = [
    #     Product(
    #         part_number=row[0],
    #         part_description=row[1],
    #         mrp=row[2],
    #         hsn_code=row[3],
    #         gst=row[4],
    #     )
    #     for index, row in row_iter
    # ]
    # logger.debug(products_list)
    # Product.objects.bulk_create(products_list)
    logger.debug("Products uploading done")
Is there a better way to query so that I don't have to loop and append the users to a list? When the number of items is very large, say 10,000, it has to query 10,000 times. Is there a way to avoid that?
user_list = []
items = Items.objects.all()
for item in items:
    userobjects = Users.objects.filter(item=item.id)
    user = userobjects[0].user
    user_list.append(user)
Update
As pointed out by Sandip Agarwal, if you only want the first user per item there is no easy SQL for it AFAIK; for performance I would do it like this:
user_list = []
cur = None
# If there is customized ordering,
# you could put the customization after 'item' in the following code
for user in Users.objects.filter(item__isnull=False).order_by('item').iterator():
    if cur != user.item_id:
        user_list.append(user)
        cur = user.item_id
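If you happen to be on PostgreSQL, the database can do the same first-user-per-item pass with DISTINCT ON (a sketch, using the same guessed field names):

user_list = list(
    Users.objects.filter(item__isnull=False)
    .order_by('item', 'pk')  # the secondary key decides which user is "first"
    .distinct('item')        # PostgreSQL-only: keep one row per item
)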
You haven't provided the code of the Item and User models, so I'm simply guessing here:
User.objects.filter(item__isnull=False)
# or
User.objects.filter(item__in=Items.objects.all())
You can use a MySQL JOIN to join the tables and get the expected result without iterating.
Try:
user_list = User.objects.filter(users__items__in=Items.objects.all())
However, since you're not filtering the items, you're really just looking for Users that have any Item, so you can actually do:
user_list = User.objects.filter(users__items__isnull=False)
You can achieve this by getting all the Item ids, then using them to filter the Users with an IN clause, with a distinct filter on the returned users. I can't test directly at the moment, but something like the following should do this for you:
item_ids = Item.objects.all().values_list('id', flat=True)  # Get the Item id values, since you don't need the whole Item object
user_list = User.objects.filter(items__id__in=item_ids).distinct('id')  # Distinct on user id, filtered on those users matching an item id
user_list = list(user_list)  # Convert to a list
Note: you can use Item.objects.all() directly instead of the values_list; this may result in a bigger/slower query, so try profiling it.
If this doesn't resolve the question, you might need to add a clearer description of exactly what you want to see in the output. You're doing a select and taking the first value that comes out (usually this will be the first one in the database). You may be able to emulate that with the above by sorting before distinct:
user_list = User.objects.filter(items__id__in=item_ids).order_by('id').distinct('id')
If this is something you are going to do a lot, a faster solution may be to add another field to the Item record called first_user, set when the first user is added via a save hook.
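A sketch of that denormalization; the pivot model and field names are hypothetical since the original models weren't shown:

from django.db import models

class Item(models.Model):
    # Denormalized: the first user ever linked to this item.
    first_user = models.ForeignKey(
        'User', null=True, blank=True,
        on_delete=models.SET_NULL, related_name='+',
    )

class UserItemLink(models.Model):  # hypothetical pivot between User and Item
    user = models.ForeignKey('User', on_delete=models.CASCADE)
    item = models.ForeignKey(Item, on_delete=models.CASCADE)

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        # The save hook: record the first user attached to the item.
        if self.item.first_user_id is None:
            self.item.first_user = self.user
            self.item.save(update_fields=['first_user'])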
I have a form that has a group of 13 checkboxes that together make up my search criteria... except that I also added a pair of radio buttons for ALL or ANY.
I was hoping to get away with something elegant like:
priority_ids = request.GET.getlist("priority")  # checkboxes
collection = request.GET.get("collection")      # radio buttons

priorities = []
for priority_id in priority_ids:
    priorities.append(Q(focus__priority=priority_id))

if (collection == "any"): qset = any(priorities)
elif (collection == "all"): qset = all(priorities)
However, any() and all() return a boolean, not a queryset that I can use in a filter. I want an "any" or "all" that does the equivalent of "Q(...) | Q(...) | Q(...)" or "Q(...) & Q(...) & Q(...)" for anywhere from 1 to 13 criteria.
There's nothing Django needs to do about that. You just need to combine your Q objects with & and | respectively, either in a simple loop or more compactly with reduce.
Regarding terminology: it seems to me that you are calling a Q a queryset, but it's not; it's a filter on a queryset.
Something like the below should work:
import operator
from functools import reduce

from django.db.models import Q

priority_ids = request.GET.getlist("priority")
collection = request.GET.get("collection")

priority_filters = []
for priority_id in priority_ids:
    priority_filters.append(Q(focus__priority=priority_id))

base_qs = SomeModel.objects.all()
if collection == "any":
    filtered_qset = base_qs.filter(reduce(operator.or_, priority_filters))
elif collection == "all":
    filtered_qset = base_qs.filter(reduce(operator.and_, priority_filters))
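One caveat: if priority_filters is empty (no checkboxes ticked), reduce without an initializer raises a TypeError. Passing an empty Q() as the initial value keeps both branches safe, since Q() is the identity for & and |:

filtered_qset = base_qs.filter(reduce(operator.or_, priority_filters, Q()))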
You can query dynamically with Q() as below...
from django.db.models import Q
from priority_app.models import Priority  # get your model imported

# (in whatever API view you had this)
priority_id = request.GET.get("priority")  # a comma-separated string, not a proper list
collection = request.GET.get("collection")  # use request.query_params in DRF

if priority_id:
    priority_id_list = list(map(str.strip, priority_id.split(",")))
    dynamic_query_filters = Q()  # an empty Q (not an empty queryset), so |= and &= work
    for a_item in priority_id_list:
        if collection == "any":
            dynamic_query_filters |= Q(id=a_item)  # OR query
        else:
            dynamic_query_filters &= Q(id=a_item)  # AND query
    queryset = Priority.objects.filter(dynamic_query_filters)
    # You will notice it executes as a single query; you could also build the
    # Q objects in separate variables and chain them accordingly.