Modeling a user for distance queries - Django

We want to be able to query local sellers based on each seller's preferred distance from a geolocation set by a customer, using PostgreSQL and Django models.
Our seller model is given below:
class Seller(AbstractModel):
    user = models.OneToOneField('users.User')
    distance = models.IntegerField()  # preferred travel radius, in miles
    geoposition = GeopositionField()
For example:
Seller John can travel 50 miles. If the buyer's location (lat, long) is 49 miles from John's geoposition, we get John.

Alternatively, you can write a solution in plain PostgreSQL.
Try filtering the query using a distance calculation between the buyer's and the seller's locations, and compare that to the seller's preferred distance.
It appears you want an arithmetic query, correct? I think the above example will help you ask me and the community more specific questions.
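If the distance check ends up in Python rather than SQL, a plain haversine great-circle calculation is enough for this kind of radius filter. A minimal sketch (the seller tuple and coordinates below are made-up illustrations, not part of the model above):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Keep only sellers whose preferred radius covers the buyer's location:
sellers = [("John", 40.0, -75.0, 50)]  # (name, lat, lon, preferred distance in miles)
buyer_lat, buyer_lon = 40.5, -75.2
matches = [name for name, lat, lon, max_miles in sellers
           if haversine_miles(lat, lon, buyer_lat, buyer_lon) <= max_miles]
```

In PostgreSQL the same formula can be pushed into the query with the earthdistance extension or PostGIS, which also lets the database use an index instead of scanning every seller.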

Related

Hierarchical (three levels deep) Bayesian pymc3 model

I am working on a Bayesian Hierarchical linear model in Pymc3.
The model consists of three input variables at a daily level: number of users, product category and product SKU, and the output variable is revenue. In total the data consists of roughly 73,000 records with 180 categories and 12,000 SKUs. Moreover, some categories/SKUs are well represented while others are sparse. An example of the data is shown in the link:
Preview of the data
As the data at SKU level is very sparse, a hierarchical model has been chosen, with the intent that SKUs with little data shrink towards the category-level mean, and if a category is scarce, the category-level mean shrinks towards the overall mean.
In the final model the categories are label encoded and the continuous variables users and revenue are min-max scaled.
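For concreteness, those two preprocessing steps can be sketched in plain Python (toy values only; in practice sklearn's LabelEncoder/MinMaxScaler do the same):

```python
def min_max_scale(values):
    """Scale a sequence of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def label_encode(labels):
    """Map each distinct label to an integer, in order of first appearance."""
    mapping = {}
    return [mapping.setdefault(label, len(mapping)) for label in labels]

print(min_max_scale([10, 20, 30]))    # [0.0, 0.5, 1.0]
print(label_encode(["a", "b", "a"]))  # [0, 1, 0]
```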
At this point the model is formalized as follows:
with pm.Model() as model:
    # Note: HalfNormal has no mu parameter; its scale parameter is sigma
    sigma_overall = pm.HalfNormal("sigma_overall", sigma=50)
    sigma_category = pm.HalfNormal("sigma_category", sigma=sigma_overall)
    sigma_sku = pm.HalfNormal("sigma_sku", sigma=sigma_category, shape=n_sku)
    beta = pm.HalfNormal("beta", sigma=sigma_sku, shape=n_sku)
    epsilon = pm.HalfCauchy("epsilon", 1)
    y = pm.Deterministic('y', beta[category_idx][sku_idx] * df['users'].values)
    y_likelihood = pm.Normal("y_likelihood", mu=y, sigma=epsilon, observed=df['revenue'].values)
    trace = pm.sample(2000)
The main hurdle is that the model is very slow. It takes hours, sometimes a day before the model completes. Metropolis- or NUTS sampling with find_MAP() did not make a difference. Furthermore, I doubt whether the model is formalized correctly as I am pretty new to Pymc3.
A review of the model and advice to speed it up is very welcome.
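One common speed-up for hierarchical models in PyMC3 is a non-centered parameterization, which avoids the funnel-shaped posteriors NUTS struggles with. The sketch below also makes the intended overall-to-category-to-SKU shrinkage explicit. It is a variant of the model rather than a drop-in fix: `n_category`, `sku_to_category` (an array mapping each SKU to its category), `sku_idx`, `users` and `revenue` are assumed to exist, and the HalfNormal priors on beta are replaced with a Normal hierarchy:

```python
import pymc3 as pm

with pm.Model() as model:
    mu_overall = pm.Normal("mu_overall", 0.0, 1.0)        # grand mean effect
    sigma_category = pm.HalfNormal("sigma_category", 1.0)
    sigma_sku = pm.HalfNormal("sigma_sku", 1.0)

    # Non-centered: category means as scaled offsets from the grand mean
    z_category = pm.Normal("z_category", 0.0, 1.0, shape=n_category)
    mu_category = pm.Deterministic("mu_category", mu_overall + z_category * sigma_category)

    # Non-centered: per-SKU slopes as offsets from their category's mean
    z_sku = pm.Normal("z_sku", 0.0, 1.0, shape=n_sku)
    beta = pm.Deterministic("beta", mu_category[sku_to_category] + z_sku * sigma_sku)

    epsilon = pm.HalfCauchy("epsilon", 1.0)
    pm.Normal("y_likelihood", mu=beta[sku_idx] * users, sigma=epsilon, observed=revenue)
    trace = pm.sample(2000, target_accept=0.9)
```

Note also that beta[category_idx][sku_idx] in your version indexes the result of the first fancy index a second time, which is almost certainly not what you intended; one index per record (beta[sku_idx] above) is enough once each SKU knows its category.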

Inventory design management - FIFO / Weighted Average design which also needs historical inventory value

I have a database with the following details:
Product
- Name
- SKU
- UOM (there is a UOM master, so all purchases and sales are converted to the base UOM and stored in the DB)
- Some other details
- has_attribute
- has_batch

Attributes
- Name
- Details/Remarks

Product-Attribute
- Product (FK)
- Attribute (FK)
- Value of attribute

Inventory Details
# A row is added for every product lot bought; Quantity_available is updated after every sale
- Product (FK)
- Warehouse (FK to warehouse model)
- Purchase Date
- Purchase Price
- MRP
- Tentative sales price
- Quantity_bought
- Quantity_available
- Other batch details if applicable (batch_id, manufactured_date, expiry_date)

Inventory Ledger
# This table records all in & out movement of inventory
- Product
- Warehouse (FK to warehouse model)
- Transaction Type (Purchase/Sales)
- Quantity_transacted (i.e. quantity purchased/sold)
- Inventory_Purchase_cost (so as to calculate inventory valuation)
Now, my problem is:
I need to find out the historical inventory cost. For example, say I need the value of inventory on 10th Feb 2017. What I would do with the current tables is not very efficient: find the current inventory and walk back through the ledger for all 1000-1500 SKUs, at about 100 transactions daily (for each SKU), for more than 120 days, to arrive at a value. That's about 1500*100*120 rows. It's huge. Is there a better DB design to handle this case?
Firstly, have you tested it? 1500*100*120 is only 18 million rows, which is not that huge. The performance may be acceptable and there may be no problem to solve!
I'm not 100% clear how you compute the value. Do you sum up the InventoryLedger rows for each Product in each Warehouse? If so, it's easy to put a value on the purchases, but how do you value the sales? I'm going to assume that you value the sales using the Inventory_Purchase_cost (so it should maybe be called TransactionValue instead).
If you must optimise it, you could populate a record each day with the valuation of each product in each warehouse. The following StockValuation table could be populated daily, and this would allow quick computation of the valuations for any historical day.
Diagram made using QuickDBD, where I work.
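The derivation such a daily snapshot job would run can be sketched in plain Python. The field names below are illustrative, not your actual schema; the idea is that each day you roll the ledger forward once and store the result, so historical queries read one snapshot row instead of re-scanning the ledger:

```python
from datetime import date

def valuation_as_of(ledger, as_of):
    """Inventory value per (product, warehouse) as of a given date.
    Purchases add value; sales remove it at their recorded purchase cost."""
    totals = {}
    for row in ledger:
        if row["date"] > as_of:
            continue  # ignore movements after the valuation date
        key = (row["product"], row["warehouse"])
        sign = 1 if row["type"] == "purchase" else -1
        totals[key] = totals.get(key, 0) + sign * row["qty"] * row["unit_cost"]
    return totals

ledger = [
    {"date": date(2017, 2, 1), "product": "SKU1", "warehouse": "W1",
     "type": "purchase", "qty": 10, "unit_cost": 5},
    {"date": date(2017, 2, 5), "product": "SKU1", "warehouse": "W1",
     "type": "sale", "qty": 4, "unit_cost": 5},
    {"date": date(2017, 2, 12), "product": "SKU1", "warehouse": "W1",
     "type": "sale", "qty": 2, "unit_cost": 5},
]
print(valuation_as_of(ledger, date(2017, 2, 10)))  # {('SKU1', 'W1'): 30}
```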

How to generate Sum (and other aggregates) in Django where aggregate depends on values from related tables

My model consists of a Portfolio, a Holding, and a Company. Each Portfolio has many Holdings, and each Holding is of a single Company (a Company may be connected to many Holdings).
Portfolio -< Holding >- Company
I'd like the Portfolio query to return the sum of the product of the number of Holdings in the Portfolio, and the value of the Company.
Simplified model:
class Portfolio(model):
    ...  # some fields
class Company(model):
    closing = models.DecimalField(max_digits=10, decimal_places=2)
class Holding(model):
    portfolio = models.ForeignKey(Portfolio)
    company = models.ForeignKey(Company)
    num_shares = models.IntegerField(default=0)
I'd like to be able to query:
Portfolio.objects.some_function()
and have each row annotated with the value of the Portfolio, where the value is equal to the sum of the products of the related Company.closing and Holding.num_shares, i.e. something like:
annotate(value=Sum('holding__num_shares * company__closing'))
I'd also like to obtain a summary row, which contains the sum of the values of all of a user's Portfolios, and a count of the number of holdings, i.e. something like:
aggregate(Sum('holding__num_shares * company__closing'), Count('holding__num_shares'))
I would also like a similar summary row for a single Portfolio: the sum of the values of each holding, and a count of the total number of holdings in the portfolio.
I managed to get part of the way there using extra:
return self.extra(
    select={
        'value': 'select sum(h.num_shares * c.closing) from portfolio_holding h '
                 'inner join portfolio_company as c on h.company_id = c.id '
                 'where h.portfolio_id = portfolio_portfolio.id'
    }).annotate(Count('holding'))
but this is pretty ugly, and extra seems to be frowned upon, for obvious reasons.
My question is: is there a more Djangoistic way to summarise and annotate queries based on multiple fields, and across related tables?
These two options seem to move in the right direction:
Portfolio.objects.annotate(Sum('holding__company__closing'))
(ie this demonstrates annotation/aggregation over a field in a related table)
Holding.objects.annotate(Sum('id', field='num_shares * id'))
(this demonstrates annotation/aggregation over the product of two fields)
but if I attempt to combine them: eg
Portfolio.objects.annotate(Sum('id', field='holding__company__closing * holding__num_shares'))
I get an error: "No such column 'holding__company__closing'".
So far I've looked at the following related questions, but none of them seem to capture this precise problem:
Annotating django QuerySet with values from related table
Product of two fields annotation
Do I just need to bite the bullet and use raw / extra? I'm hoping the Django ORM will prove the exception to the rule that ORMs only really work as designed for simple queries/models, and that anything beyond the most basic ones requires either seriously gnarly tap-dancing or stepping out of the abstraction, which somewhat defeats the purpose...
Thanks in advance!
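For reference, newer Django versions (1.8+) can express this without extra(), by multiplying F() expressions inside an aggregate. A sketch, assuming the models above (the user field and some_user are hypothetical, since Portfolio's fields aren't shown):

```python
from django.db.models import Count, DecimalField, F, Sum

value_expr = Sum(
    F("holding__num_shares") * F("holding__company__closing"),
    output_field=DecimalField(),
)

# Per-portfolio value:
portfolios = Portfolio.objects.annotate(value=value_expr)

# Summary row across a user's portfolios:
totals = Portfolio.objects.filter(user=some_user).aggregate(
    total_value=value_expr,
    num_holdings=Count("holding"),
)
```

The output_field is needed because the product mixes an IntegerField with a DecimalField, so the ORM can't infer the result type on its own.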

How do I filter for prices using Satchmo/django?

I'd like to find the minimum and maximum prices for a defined category of products.
I'd also like to be able to do the reverse, i.e, find all products given a defined price range.
The problem is that Satchmo does not have price in its product model. How can I solve this problem?
Min/max prices for a category:
Product.objects.filter(category=some_category).aggregate(Min('price'), Max('price'))
Products filtered by price range:
Product.objects.filter(price__price__range=(5,10))

Query for a ManyToMany Field with Through in Django

I have models in Django that are something like this:
class Classification(models.Model):
    name = models.CharField(max_length=100, choices=class_choices)
    ...
class Activity(models.Model):
    name = models.CharField(max_length=300)
    fee = models.ManyToManyField(Classification, through='Fee')
    ...
class Fee(models.Model):
    activity = models.ForeignKey(Activity)
    classification = models.ForeignKey(Classification)  # 'class' is a reserved word in Python
    early_fee = models.DecimalField(decimal_places=2, max_digits=10)
    regular_fee = models.DecimalField(decimal_places=2, max_digits=10)
The idea being that there will be a set of fees associated with each Activity and Classification pair. Classification is like Student, Staff, etc.
I know that part works right.
Then in my application, I query for a set of Activities with:
activities = Activity.objects.filter(...)
Which returns a list of activities. I need to display in my template that list of Activities with their Fees. Something like this:
Activity Name
Student Early Price - $4
Student Regular Price - $5
Staff Early Price - $6
Staff Regular Price - $8
But I don't know of an easy way to get this info without a specific get query of the Fees object for each activity/class pair.
I hoped this would work:
activity.fee.all()
But that just returns the Classification Object. Is there a way to get the Fee Object Data for the Pair via the Activities I already queried?
Or am I doing this completely wrong?
Considering michuk's tip to rename "fee" to "classification":
Default name for Fee objects on Activity model will be fee_set. So in order to get your prices, do this:
for a in Activity.objects.all():
    a.fee_set.all()  # gets you all fees for the activity
One thing though: as you can see, you'll end up doing one SELECT per activity object to fetch its fees. There are some apps that can help with that; for example, django-batch-select does only 2 queries in this case.
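The ORM itself can also collapse this to two queries with prefetch_related (available since Django 1.4). A sketch, assuming the default fee_set reverse name and the Fee field renamed to classification as suggested above:

```python
# Two queries total: one for the activities, one for all their fees
activities = Activity.objects.all().prefetch_related("fee_set")
for a in activities:
    for fee in a.fee_set.all():  # served from the prefetch cache, no extra query
        print(fee.classification, fee.early_fee, fee.regular_fee)
```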
First of all I think you named your field wrong. This:
fee = models.ManyToManyField(Classification, through='Fee')
should be rather that:
classifications = models.ManyToManyField(Classification, through='Fee')
as ManyToManyField refers to a list of related objects.
In general, ManyToManyField is, AFAIK, only a Django shortcut to enable easy fetching of all related objects (Classification in your case), with the association table being transparent to the model. What you want is for the association table (Fee in your case) not to be transparent.
So what I would do is remove the ManyToManyField from Activity and simply fetch all the fees related to the activity. And then, if you need a Classification for each fee, get the Classification separately.