Django - creating and saving multiple objects in a loop, with ForeignKeys

I am having trouble creating and saving objects in Django. I am very new to Django so I'm sure I'm missing something very obvious!
I am building a price comparison app, and I have a Search model:
Search - all searches carried out, recording best price, worst price, product searched for, time of search etc. I have successfully managed to save these searches to a DB and am happy with this model.
The two new models I am working with are:
Result - this is intended to record all search results returned for each search carried out, e.g. Seller 1 £100, Seller 2 £200, Seller 3 £300. (One Search has many Results.)
Agent - a simple table of the Agents that I compare prices at. (One Agent can have many Results.)
    class Agent(models.Model):
        agent_id = models.AutoField(primary_key=True)
        agent_name = models.CharField(max_length=30)
    class Result(models.Model):
        search_id = models.ForeignKey(Search, on_delete=models.CASCADE)  # Foreign Key of Search table
        agent_id = models.ForeignKey(Agent, on_delete=models.CASCADE)  # Foreign Key of Agent table
        price = models.FloatField()
        search_position = models.IntegerField()
My code that is creating and saving the objects is here:
    def update_search_table(listed, product):
        if len(listed) > 0:
            search = Search(product=product,
                            no_of_agents=len(listed),
                            valid_search=1,
                            best_price=listed[0]['cost'],
                            worst_price=listed[-1]['cost'])
            search.save()
            for i in range(len(listed)):
                agent = Agent.objects.get(agent_name = listed[i]['company'])
                # print(agent.agent_id) # Prints expected value
                # print(search.search_id) # Prints expected value
                # print(listed[i]['cost']) # Prints expected value
                # print(i + 1) # Prints expected value
                result = Result(search_id = search,
                                agent_id = agent,
                                price = listed[i]['cost'],
                                position = i + 1)
                search.result_set.add(result)
                agent.result_set.add(result)
                result.save()
Up to search.save() is working as expected.
The first line of the for loop is also correctly retrieving the relevant Agent.
The rest of it is going wrong (i.e. not saving any Result objects to the Result table). What I want to achieve is, if there are 10 different agent results returned, create 10 Result objects and save each one. Link each of those 10 objects to the Search that triggered the results, and link each of those 10 objects to the relevant Agent.
I have tried quite a few iterations but I'm not sure where I'm going wrong.
Thanks

Related

Django Query - Get list that isnt in FK of another model

I am working on a Django web app that manages payroll based on reports completed, and then payroll generated. There are 3 models, as follows (I've tried to limit them to the data needed for the question).
    class PayRecord(models.Model):
        rate = models.FloatField()
        user = models.ForeignKey(User)
    class Payroll(models.Model):
        company = models.ForeignKey(Company)
        name = models.CharField()
    class PayrollItem(models.Model):
        payroll = models.ForeignKey(Payroll)
        record = models.OneToOneField(PayRecord, unique=True)
What is the most efficient way to get all the PayRecords that aren't also in a PayrollItem, so I can select them to create a payroll item?
There are 100k records, and my initial attempt takes minutes. The attempt is below (this is far from feasible).
    records_completed_in_payrolls = [
        p.record.id for p in PayrollItem.objects.select_related(
            'record',
            'payroll'
        )
    ]

Because PayrollItem has the related field record, you can reach into that model while you filter PayRecord. Using the __isnull lookup should give you what you want.
    PayRecord.objects.filter(payrollitem__isnull=True)
This translates to a SQL statement like:
    SELECT payroll_payrecord.id,
           payroll_payrecord.rate,
           payroll_payrecord.user_id
    FROM payroll_payrecord
    LEFT OUTER JOIN payroll_payrollitem
        ON payroll_payrecord.id = payroll_payrollitem.record_id
    WHERE payroll_payrollitem.id IS NULL
Depending on your intentions, you may want to chain on a .select_related (https://docs.djangoproject.com/en/3.1/ref/models/querysets/#select-related):
    PayRecord.objects.filter(payrollitem__isnull=True).select_related('user')
which translates to something like:
    SELECT payroll_payrecord.id,
           payroll_payrecord.rate,
           payroll_payrecord.user_id,
           payroll_user.id,
           payroll_user.name
    FROM payroll_payrecord
    LEFT OUTER JOIN payroll_payrollitem
        ON (payroll_payrecord.id = payroll_payrollitem.record_id)
    INNER JOIN payroll_user
        ON (payroll_payrecord.user_id = payroll_user.id)
    WHERE payroll_payrollitem.id IS NULL
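The LEFT OUTER JOIN ... IS NULL pattern in the SQL above can be tried out without Django. What follows is a minimal sketch using Python's built-in sqlite3 module. The payrecord and payrollitem tables are simplified stand-ins for the real ones, and the helper names and sample rows are made up for illustration.

```python
import sqlite3


def build_payroll_db():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE payrecord (id INTEGER PRIMARY KEY, rate REAL);
        CREATE TABLE payrollitem (
            id INTEGER PRIMARY KEY,
            record_id INTEGER UNIQUE REFERENCES payrecord(id)
        );
        INSERT INTO payrecord (id, rate)
            VALUES (1, 10.0), (2, 12.5), (3, 9.0), (4, 11.0);
        INSERT INTO payrollitem (id, record_id) VALUES (1, 2), (2, 4);
        """
    )
    return conn


def unassigned_record_ids(conn):
    """Return ids of payrecord rows that no payrollitem row points at."""
    rows = conn.execute(
        """
        SELECT r.id
        FROM payrecord AS r
        LEFT OUTER JOIN payrollitem AS i ON r.id = i.record_id
        WHERE i.id IS NULL
        ORDER BY r.id
        """
    ).fetchall()
    return [row[0] for row in rows]


# Records 2 and 4 already have a payroll item, so only 1 and 3 remain.
print(unassigned_record_ids(build_payroll_db()))
```

The join keeps every payrecord row, and the IS NULL test then keeps only those for which no payrollitem matched, all in a single query.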

Django ORM - LEFT JOIN with WHERE clause

I have made a previous post related to this problem here but because this is a related but new problem I thought it would be best to make another post for it.
I'm using Django 1.8
I have a User model and a UserAction model. A user has a type. UserAction has a time, which indicates how long the action took as well as a start_time which indicates when the action began. They look like this:
    class User(models.Model):
        user_type = models.IntegerField()
    class UserAction(models.Model):
        user = models.ForeignKey(User)
        time = models.IntegerField()
        start_time = models.DateTimeField()
Now what I want to do is get all users of a given type and the sum of time of their actions, optionally filtered by the start_time.
What I am doing is something like this:
    import datetime
    from django.db.models import Sum
    from django.db.models.functions import Coalesce
    # stubbing in a start time to filter by
    start_time = datetime.datetime.now() - datetime.timedelta(days=2)
    # stubbing in a type
    type = 2
    # this gives me the users and the sum of the time of their actions, or 0 if no
    # actions exist
    q = User.objects.filter(user_type=type).values('id').annotate(total_time=Coalesce(Sum('useraction__time'), 0))
    # now I try to add the filter for start_time of the actions to be greater than or
    # equal to start_time
    q = q.filter(useraction__start_time__gte=start_time)
Of course, what this does is an INNER JOIN on UserAction, which removes all the users without actions. What I really want is the equivalent of a LEFT JOIN with a WHERE clause, but for the life of me I can't find how to do that. I've looked at the docs and at the source but am not finding an answer. I'm (pretty) sure this can be done; I'm just not seeing how. Could anyone point me in the right direction? Any help would be very much appreciated. Thanks much!

I'm having the same kind of problem as you. I haven't found any proper way of solving the problem yet, but I've found a few fixes.
One way would be looping through all the users:
    q = User.objects.filter(user_type=type)
    for u in q:
        u.time_sum = UserAction.objects.filter(user=u, start_time__gte=start_time).aggregate(time_sum=Sum('time'))['time_sum']
This method, however, issues one database query per user. It might do the trick if you don't have many users, but it can get very time-consuming with a large database.
Another way of solving the problem would be using the extra method of the QuerySet API. This is a method that is detailed in this blog post by Timmy O'Mahony.
    valid_actions = UserAction.objects.filter(start_time__gte=start_time)
    q = User.objects.filter(user_type=type).extra(select={
        "time_sum": """
            SELECT SUM(time)
            FROM userAction
            WHERE userAction.user_id = user.id
            AND userAction.id IN (%s)
        """ % ",".join([str(uAction.id) for uAction in valid_actions.all()])
    })
This method, however, relies on raw SQL table names, which is very un-Django: if you change the db_table of one of your models or the db_column of one of their fields, this code will no longer work. It does only require 2 queries, though: the first to get the list of valid UserAction ids and the second to sum them for each matching user.
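For reference, the result the question asks for (every user of the given type, with 0 for users that have no matching actions) corresponds in plain SQL to putting the start_time condition in the JOIN's ON clause instead of in WHERE. The following is a minimal sketch with Python's built-in sqlite3 module, not Django ORM code. The users and useraction tables are simplified stand-ins, and the helper names and sample rows are made up for illustration.

```python
import sqlite3


def build_actions_db():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, user_type INTEGER);
        CREATE TABLE useraction (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            time INTEGER,
            start_time TEXT
        );
        INSERT INTO users (id, user_type) VALUES (1, 2), (2, 2), (3, 1);
        INSERT INTO useraction (user_id, time, start_time) VALUES
            (1, 10, '2024-01-01'),
            (1, 30, '2024-01-10'),
            (2, 5,  '2024-01-02'),
            (3, 7,  '2024-01-10');
        """
    )
    return conn


def total_time_per_user(conn, user_type, start_time):
    """Sum action time per user, keeping users with no matching actions."""
    return conn.execute(
        """
        SELECT u.id, COALESCE(SUM(a.time), 0) AS total_time
        FROM users AS u
        LEFT OUTER JOIN useraction AS a
            ON a.user_id = u.id AND a.start_time >= ?
        WHERE u.user_type = ?
        GROUP BY u.id
        ORDER BY u.id
        """,
        (start_time, user_type),
    ).fetchall()


# User 2 has no action on or after the cutoff but still appears, with 0.
print(total_time_per_user(build_actions_db(), 2, "2024-01-05"))
```

Because the date condition is part of the join, a user whose actions all fall before the cutoff is still returned, with the sum coalesced to 0. Moving the same condition into WHERE would drop that user.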

Django query get common items based on attribute

I have a model as follows:
    class Item(models.Model):
        VENDOR_CHOICES = (
            ('a', 'A'),
            ('b', 'B')
        )
        vendor = models.CharField(max_length=16, choices=VENDOR_CHOICES)
        name = models.CharField(max_length=255)
        price = models.DecimalField(max_digits=6, decimal_places=2)
Now I have 2 data sources, so I get items from vendor A and items from vendor B.
In some cases vendor A may not have the same items as vendor B: say vendor A has 30 items and vendor B has 442 items, of which only 6 are common. Common items are defined as items that have exactly the same name.
I also need to find the difference in price for the items common to vendor A and vendor B, meaning the items that have the same name at both vendors. I have a large number of items, possibly up to 10k per vendor, so an efficient way of doing this is required.

I think that something like this should work:
    vendor_a_items = Item.objects.filter(vendor='a')
    vendor_b_items = Item.objects.filter(vendor='b')
    common_items = vendor_a_items.filter(
        name__in=vendor_b_items.values_list('name', flat=True))
UPDATE: To find the price difference you can just loop over the found common items:
    for a_item in common_items:
        b_item = vendor_b_items.get(name=a_item.name)
        print(u'%s: %s' % (a_item.name, a_item.price - b_item.price))
This adds a db hit for each common item, but if the number of common items is small this solution will work just fine. For a larger intersection you can load all the prices from vendor_b_items in one query. Use this code instead of the previous snippet:
    common_items_d = {item.name: item for item in common_items}
    for b_item in vendor_b_items.filter(name__in=common_items_d.keys()):
        print(u'%s: %s' % (b_item.name,
                           common_items_d[b_item.name].price - b_item.price))

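Starting from Django 1.11, this can be resolved using the built-in intersection and difference methods. The querysets have to be reduced to the name column first: compared as whole rows, items from different vendors never match, because their id and vendor columns differ.
    vendor_a_names = Item.objects.filter(vendor='a').values_list('name', flat=True)
    vendor_b_names = Item.objects.filter(vendor='b').values_list('name', flat=True)
    common_names = vendor_a_names.intersection(vendor_b_names)
    vendor_a_exclusive_names = vendor_a_names.difference(vendor_b_names)
    vendor_b_exclusive_names = vendor_b_names.difference(vendor_a_names)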
See my blog post on this for more detailed use cases.
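Another way to look at the problem is as a self-join of the item table on name, which yields the common items and their price differences in a single query. This is a different technique from the ORM calls in the answers above, sketched here in plain SQL through Python's built-in sqlite3 module. The item table is a simplified stand-in for the Item model, prices are stored as floats only to keep the demo short, the sample rows are made up, and names are assumed to be unique per vendor.

```python
import sqlite3


def build_item_db():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (
            id INTEGER PRIMARY KEY,
            vendor TEXT,
            name TEXT,
            price REAL
        );
        INSERT INTO item (vendor, name, price) VALUES
            ('a', 'apple', 10.5), ('a', 'pear', 3.0), ('a', 'plum', 2.0),
            ('b', 'apple', 9.25), ('b', 'pear', 4.5), ('b', 'kiwi', 1.0);
        """
    )
    return conn


def common_price_differences(conn):
    """Return (name, price_a - price_b) for names sold by both vendors."""
    return conn.execute(
        """
        SELECT a.name, a.price - b.price
        FROM item AS a
        JOIN item AS b ON a.name = b.name
        WHERE a.vendor = 'a' AND b.vendor = 'b'
        ORDER BY a.name
        """
    ).fetchall()


# plum and kiwi are each sold by one vendor only, so they are left out.
print(common_price_differences(build_item_db()))
```

With an index on (vendor, name), a join like this should stay cheap at 10k items per vendor, though that is an expectation rather than something measured here.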

django paginate with non-model object

I'm working on a side project using Python and Django. It's a website that tracks the prices of some products from some website, then shows the full price history of each product.
So, I have this class in Django:
    class Product(models.Model):
        price = models.FloatField()
        date = models.DateTimeField(auto_now = True)
        name = models.CharField()
Then, in my views.py, because I want to display products in a table, like so:
+----------+--------+--------+--------+--------+....
| Name | Date 1 | Date 2 | Date 3 |... |....
+----------+--------+--------+--------+--------+....
| Product1 | 100.0 | 120.0 | 70.0 | ... |....
+----------+--------+--------+--------+--------+....
...
I'm using the following class for rendering:
    class ProductView(object):
        name = ""
        price_history = {}
So that in my template, I can easily convert each product_view object into one table row. I'm also passing through context a sorted list of all available dates, for the purpose of constructing the head of the table, and getting the price of each product on that date.
Then I have logic in views that converts one or more products into this ProductView object. The logic looks something like this:
    def conversion():
        result_dict = {}
        all_products = Product.objects.all()
        for product in all_products:
            if product.name in result_dict:
                result_dict[product.name].append(product)
            else:
                result_dict[product.name] = [product]
        # So result_dict will be like
        # {"Product1":[product, product], "Product2":[product],...}
        product_views = []
        for products in result_dict.values():
            # Logic that converts list of Product into ProductView, which is simple.
        # Then I'm returning the product_views, sorted based on the price on the
        # latest date, None if not available.
        return sorted(product_views,
                      key = lambda x: get_latest_price(latest_date, x),
                      reverse = True)
As per Daniel Roseman and zymud, adding get_latest_price:
    def get_latest_price(date, product_view):
        if date in product_view.price_history:
            return product_view.price_history[date]
        else:
            return None
I omitted the logic to get the latest date in conversion. I have a separate table that only records each date I run my price-collecting script that adds new data to the table. So the logic of getting latest date is essentially get the date in OpenDate table with highest ID.
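The grouping and sorting steps described above can be sketched in plain Python, independent of the database. This is only an illustration under assumptions: ProductRow is a hypothetical stand-in for a Product row, build_product_views is a made-up helper name, and the sort key is one possible way to handle products with no price on the latest date (under Python 3, sorting keys that mix None and numbers raises a TypeError).

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a Product row; not the Django model itself.
ProductRow = namedtuple("ProductRow", ["name", "date", "price"])


class ProductView(object):
    def __init__(self, name):
        self.name = name
        self.price_history = {}  # per-instance dict: date -> price


def build_product_views(rows, latest_date):
    """Group rows by product name and sort by price on latest_date."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.name].append(row)

    views = []
    for name, group in grouped.items():
        view = ProductView(name)
        for row in group:
            view.price_history[row.date] = row.price
        views.append(view)

    def sort_key(view):
        # Tuple key: priced products outrank unpriced ones, then by price.
        price = view.price_history.get(latest_date)
        return (price is not None, price if price is not None else 0.0)

    # reverse=True puts the highest latest price first and unpriced last.
    return sorted(views, key=sort_key, reverse=True)


rows = [
    ProductRow("A", "2024-01-01", 100.0),
    ProductRow("A", "2024-01-02", 70.0),
    ProductRow("B", "2024-01-01", 50.0),
    ProductRow("C", "2024-01-02", 120.0),
]
print([view.name for view in build_product_views(rows, "2024-01-02")])
```

Giving each ProductView its own price_history dict in __init__ avoids every instance sharing the single class-level dictionary.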
So, the question is: when the number of products grows huge, how do I paginate that product_views list? E.g. if I want to see 10 products in my web application, how do I tell Django to get only those rows out of the DB?
I can't (or don't know how to) use django.core.paginator.Paginator, because to create the 10 rows I want, Django needs to select all rows related to those 10 product names. But to figure out which 10 names to select, it first needs to get all objects, then figure out which ones have the highest price on the latest date.
It seems to me the only solution would be to add something between Django and the DB, like a cache, to store the ProductView objects. But other than that, is there a way to directly paginate the product_views list?

I'm wondering if this makes sense:
The basic idea is, since I'll need to sort all product_views by the price on the "latest" date, I'll do that bit in DB first, and only get the list of product names to make it "paginatable". Then, I'll do a second DB query, to get all the products that have those product names, then construct that many product_views. Does it make sense?
To clear it a little bit, here comes the code:
So instead of
    # def conversion():
    all_products = Product.objects.all()
I'm doing this:
    # def conversion():
    # This would get me the latest available date
    latest_date = OpenDate.objects.order_by('-id')[:1]
    top_ten_priced_product_names = (Product.objects
        .filter(date__in = latest_date)
        .order_by('-price')
        .values_list('name', flat = True)[:10])
    all_products_that_i_need = (Product.objects
        .filter(name__in = top_ten_priced_product_names))
    # then I can construct that list of product_views using
    # all_products_that_i_need
Then for pages after the first, I can change that [:10] to, say, [10:20] or [20:30].
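Python slices, including slices on Django querysets, are written [start:stop], so page n with 10 items per page runs from (n - 1) * 10 up to n * 10. A small sketch of that arithmetic follows; page_bounds is a hypothetical helper name and the list of names is made-up sample data.

```python
def page_bounds(page, per_page=10):
    """Return (start, stop) slice bounds for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * per_page
    return start, start + per_page


names = ["product%d" % n for n in range(1, 36)]  # made-up sample data
start, stop = page_bounds(2)
print(names[start:stop])  # second page: product11 through product20
```

The same bounds could be applied to the values_list(...) queryset in place of the hard-coded [:10].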
This makes the pagination code easier, and by pulling the appropriate code into a separate function it's also possible to do Ajax and all that fancy stuff.
But here comes a problem: this solution needs three DB calls for every single query. Right now I'm running everything on the same box, but I still want to reduce this overhead to two (one for OpenDate, the other for Product).
Is there a better solution that solves the pagination problem with only two DB calls?

Django, accessing PostgreSQL sequence

In a Django application I need to create an order number which looks like: yyyymmddnnnn in which yyyy=year, mm=month, dd=day and nnnn is a number between 1 and 9999.
I thought I could use a PostgreSQL sequence since the generated numbers are atomic, so I can be sure when the process gets a number that number is unique.
So I created a PostgreSQL sequence:
    CREATE SEQUENCE order_number_seq
        INCREMENT 1
        MINVALUE 1
        MAXVALUE 9999
        START 1
        CACHE 1
        CYCLE;
This sequence can be accessed as a table having one row. So in the file checkout.py I created a Django model to access this sequence.
    class OrderNumberSeq(models.Model):
        """
        This class maps to OrderNumberSeq which is a PostgreSQL sequence.
        This sequence runs from 1 to 9999 after which it restarts (cycles) at 1.
        A sequence is basically a special single row table.
        """
        sequence_name = models.CharField(max_length=128, primary_key=True)
        last_value = models.IntegerField()
        increment_by = models.IntegerField()
        max_value = models.IntegerField()
        min_value = models.IntegerField()
        cache_value = models.IntegerField()
        log_cnt = models.IntegerField()
        is_cycled = models.BooleanField()
        is_called = models.BooleanField()
        class Meta:
            db_table = u'order_number_seq'
I set the sequence_name as primary key as Django insists on having a primary key in a table.
Then I created a file get_order_number.py with the contents:
    def get_new_order_number():
        order_number = OrderNumberSeq.objects.raw("select sequence_name, nextval('order_number_seq') from order_number_seq")[0]
        today = datetime.date.today()
        year = u'%4s' % today.year
        month = u'%02i' % today.month
        day = u'%02i' % today.day
        new_number = u'%04i' % order_number.nextval
        return year+month+day+new_number
Now when I call get_new_order_number() from the Django interactive shell it behaves as expected.
    >>> checkout.order_number.get_new_order_number()
    u'201007310047'
    >>> checkout.order_number.get_new_order_number()
    u'201007310048'
    >>> checkout.order_number.get_new_order_number()
    u'201007310049'
You see the numbers nicely incrementing by one every time the function is called. You can start multiple interactive django sessions and the numbers increment nicely with no identical numbers appearing in the different sessions.
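The formatting part of get_new_order_number can be separated from the database access, which makes it easy to check on its own. This is a minimal sketch, assuming the sequence value has already been fetched; format_order_number is a hypothetical helper name, not part of the original code.

```python
import datetime


def format_order_number(day, sequence_value):
    """Build a yyyymmddnnnn order number from a date and a sequence value."""
    if not 1 <= sequence_value <= 9999:
        raise ValueError("sequence value must be between 1 and 9999")
    return "%s%04d" % (day.strftime("%Y%m%d"), sequence_value)


print(format_order_number(datetime.date(2010, 7, 31), 47))  # 201007310047
```

With the formatting isolated like this, any unexpected jump in the numbers can only come from how often nextval is called, not from the string building.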
Now I try to call this function from a view as follows:
    import get_order_number
    order_number = get_order_number.get_new_order_number()
and it gives me a number. However next time I access the view, it increments the number by 2. I have no idea where the problem is.

The best solution I can come up with is: don't worry if your order numbers are sparse. It should not matter if an order number is missing: there is no way to ensure that order numbers are contiguous that will not be subject to a race condition at some point.
Your biggest problem is likely to be convincing the pointy-haired ones that having 'missing' order numbers is not a problem.
For more details, see the Pseudokey Neat-Freak entry in SQL Antipatterns. (Note: this is a link to a book, the full text of which is not available for free.)

Take a look at this question/answer Custom auto-increment field in postgresql (Invoice/Order No.)
You can create stored procedures using a RunSQL migration.
