Django, accessing PostgreSQL sequence

In a Django application I need to create an order number which looks like: yyyymmddnnnn in which yyyy=year, mm=month, dd=day and nnnn is a number between 1 and 9999.
I thought I could use a PostgreSQL sequence since the generated numbers are atomic, so I can be sure when the process gets a number that number is unique.
So I created a PostgreSQL sequence:
CREATE SEQUENCE order_number_seq
INCREMENT 1
MINVALUE 1
MAXVALUE 9999
START 1
CACHE 1
CYCLE;
This sequence can be accessed as a table having one row. So in the file checkout.py I created a Django model to access this sequence.
class OrderNumberSeq(models.Model):
    """
    This class maps to OrderNumberSeq which is a PostgreSQL sequence.
    This sequence runs from 1 to 9999 after which it restarts (cycles) at 1.
    A sequence is basically a special single row table.
    """
    sequence_name = models.CharField(max_length=128, primary_key=True)
    last_value = models.IntegerField()
    increment_by = models.IntegerField()
    max_value = models.IntegerField()
    min_value = models.IntegerField()
    cache_value = models.IntegerField()
    log_cnt = models.IntegerField()
    is_cycled = models.BooleanField()
    is_called = models.BooleanField()

    class Meta:
        db_table = u'order_number_seq'
I set the sequence_name as primary key as Django insists on having a primary key in a table.
Then I created a file get_order_number.py with the contents:
def get_new_order_number():
    order_number = OrderNumberSeq.objects.raw("select sequence_name, nextval('order_number_seq') from order_number_seq")[0]
    today = datetime.date.today()
    year = u'%4s' % today.year
    month = u'%02i' % today.month
    day = u'%02i' % today.day
    new_number = u'%04i' % order_number.nextval
    return year+month+day+new_number
Now when I call get_new_order_number() from the Django interactive shell it behaves as expected.
>>> checkout.order_number.get_new_order_number()
u'201007310047'
>>> checkout.order_number.get_new_order_number()
u'201007310048'
>>> checkout.order_number.get_new_order_number()
u'201007310049'
You see the numbers nicely incrementing by one every time the function is called. You can start multiple interactive Django sessions and the numbers increment nicely, with no identical numbers appearing in the different sessions.
Now I try to call this function from a view as follows:
import get_order_number
order_number = get_order_number.get_new_order_number()
and it gives me a number. However next time I access the view, it increments the number by 2. I have no idea where the problem is.

The best solution I can come up with is: don't worry if your order numbers are sparse. It should not matter if an order number is missing: there is no way to ensure that order numbers are contiguous that will not be subject to a race condition at some point.
Your biggest problem is likely to be convincing the pointy-haired ones that having 'missing' order numbers is not a problem.
For more details, see the Pseudokey Neat-Freak entry in SQL Antipatterns. (Note: this is a link to a book whose full text is not available for free.)

Take a look at this question/answer: Custom auto-increment field in postgresql (Invoice/Order No.)
You can create the stored procedures in a migration using the RunSQL operation.
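For the function in the question, a minimal sketch that asks the sequence for one value and formats it, without mapping the sequence to a model. It assumes a DB-API style cursor is passed in (for example the one returned by django.db.connection.cursor()); the cursor and today parameters are additions for illustration and are not part of the original code. It does not explain the double increment in the view; it only keeps the number generation down to a single nextval call.

```python
import datetime


def get_new_order_number(cursor, today=None):
    """Return an order number of the form yyyymmddnnnn.

    `cursor` is assumed to be a DB-API cursor connected to the database
    that holds the order_number_seq sequence from the question.
    """
    # One statement, one nextval call: the sequence advances exactly once.
    cursor.execute("SELECT nextval('order_number_seq')")
    (sequence_value,) = cursor.fetchone()

    # `today` can be injected for testing; it defaults to the current date.
    today = today or datetime.date.today()
    return "%s%04d" % (today.strftime("%Y%m%d"), sequence_value)
```

Because each call to nextval consumes a value, any code path that calls this function twice per request will still skip numbers.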

Related

How to add weeks to a datetime column, depending on a django model/dictionary?

Context
There is a dataframe of customer invoices and their due dates (identified by customer code).
Week(s) need to be added depending on the customer code.
A model is created to persist the list of customers and the week(s) to be added.
What is done so far:
Models.py
class BpShift(models.Model):
    bp_name = models.CharField(max_length=50, default='')
    bp_code = models.CharField(max_length=15, primary_key=True, default='')
    weeks = models.IntegerField(default=0)
helper.py
from .models import BpShift

# used in views later
def week_shift(self, df):
    df['DueDateRange'] = df['DueDate'] + datetime.timedelta(
        weeks=BpShift.objects.get(pk=df['BpCode']).weeks)
I realised my understanding of DataFrames is seriously flawed.
df['A'] and df['B'] would return Series. Of course, timedelta wouldn't work like this: weeks=BpShift.objects.get(pk=df['BpCode']).weeks
Dataframe
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d)
Customer List csv
BP Name,BP Code,Week(s)
Customer1,CA0023MY,1
Customer2,CA0064SG,1
Error
BpShift matching query does not exist.
Commentary
I used these methods in the hope that I would be able to change the dataframe at once, instead of using df.iterrows(). I have recently been avoiding for loops like the plague and am wondering if this is the "correct" mentality. Is there any recommended way of doing this? Thanks in advance for any guidance!
This question Python & Pandas: series to timedelta will help to take you from Series to timedelta. And although
pandas.Series(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('weeks', flat=True)
)
will give you a Series of integers, I doubt the order is guaranteed to match df['BpCode'], because it depends on the Django model and the database backend.
So you might be better off explicitly creating a DataFrame with pk and weeks columns, rather than a Series, so you can use df.join. Something like this
pandas.DataFrame(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('pk', 'weeks'),
    columns=['BpCode', 'weeks'],
)
should give you a DataFrame that you can join with.
So combined this should be the gist of your code:
django_response = [('customer1', 1), ('customer2', '2')]
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d).set_index('BpCode').join(
    pd.DataFrame(django_response, columns=['BpCode', 'weeks']).set_index('BpCode')
)
df['DueDate'] = pd.to_datetime(df['DueDate'])
df['weeks'] = pd.to_numeric(df['weeks'])
df['new_duedate'] = df['DueDate'] + df['weeks'] * pd.Timedelta('1W')
print(df)
             DueDate  weeks new_duedate
BpCode
customer1 2020-05-30      1  2020-06-06
customer2 2020-04-30      2  2020-05-14
You were right to want to avoid looping. This approach gets all the data from your Django model in one SQL query by using filter, then does a left join with the DataFrame you already have. It casts the dates and weeks to the right types and then computes a new due date on the whole columns instead of looping over them.
NB the left join will give NaN and NaT for customers that don't exist in your Django database. You can either avoid those rows by passing how='inner' to df.join or handle them in whatever way you like.

Django - creating and saving multiple object in a loop, with ForeignKeys

I am having trouble creating and saving objects in Django. I am very new to Django so I'm sure I'm missing something very obvious!
I am building a price comparison app, and I have a Search model:
Search - all searches carried out, recording best price, worst price, product searched for, time of search etc. I have successfully managed to save these searches to a DB and am happy with this model.
The two new models I am working with are:
Result - this is intended to record all search results returned, for each search carried out, e.g. Seller 1 £100, Seller 2 £200, Seller 3 £300. (One Search has many Results.)
Agent - a simple table of Agents that I compare prices at. (One Agent can have many search Results.)
class Agent(models.Model):
    agent_id = models.AutoField(primary_key=True)
    agent_name = models.CharField(max_length=30)

class Result(models.Model):
    search_id = models.ForeignKey(Search, on_delete=models.CASCADE)  # Foreign Key of Search table
    agent_id = models.ForeignKey(Agent, on_delete=models.CASCADE)  # Foreign Key of Agent table
    price = models.FloatField()
    search_position = models.IntegerField()
My code that is creating and saving the objects is here:
def update_search_table(listed, product):
    if len(listed) > 0:
        search = Search(product=product,
                        no_of_agents=len(listed),
                        valid_search=1,
                        best_price=listed[0]['cost'],
                        worst_price=listed[-1]['cost'])
        search.save()
        for i in range(len(listed)):
            agent = Agent.objects.get(agent_name = listed[i]['company'])
            # print(agent.agent_id) # Prints expected value
            # print(search.search_id) # Prints expected value
            # print(listed[i]['cost']) # Prints expected value
            # print(i + 1) # Prints expected value
            result = Result(search_id = search,
                            agent_id = agent,
                            price = listed[i]['cost'],
                            position = i + 1)
            search.result_set.add(result)
            agent.result_set.add(result)
            result.save()
Up to search.save() is working as expected.
The first line of the for loop is also correctly retrieving the relevant Agent.
The rest of it is going wrong (i.e. not saving any Result objects to the Result table). What I want to achieve is, if there are 10 different agent results returned, create 10 Result objects and save each one. Link each of those 10 objects to the Search that triggered the results, and link each of those 10 objects to the relevant Agent.
Have tried quite a few iterations but not sure where I'm going wrong.
Thanks

Django SUM from INNER JOIN

I have a DB like this:
class MyCPU(models.Model):
    cpu_name = models.CharField(max_length=100)
    cpu_count = models.IntegerField()

class MyMachine(models.Model):
    hostname = models.CharField(max_length=50)
    ip = models.CharField(max_length=50)
    cpu = models.ForeignKey(MyCPU, on_delete=models.CASCADE)
How can I achieve the result of the following raw SQL command in Django?
select sum(cpu_count) as sum_cpu from my_machine inner join my_cpu on my_machine.cpu_id=my_cpu.id
I basically want the total number of CPUs across all machines.
I have tried this solution but it did not work:
MyMachine.objects.annotate(total_cpu=Sum('cpu__cpu_count'))
Since you are using a foreign key, you can do
MyMachine.objects.values('hostname', 'ip', 'cpu__cpu_count')
This gives the CPU count for each machine.
If you need the total number of CPUs over all CPU objects:
MyCPU.objects.aggregate(total_cpu=Sum('cpu_count'))['total_cpu']
If there are CPU objects that are not connected to any machine, you can do the following to get the sum over all machines:
MyMachine.objects.aggregate(total_cpu=Sum('cpu__cpu_count'))['total_cpu']
I think the last one is what you are looking for, since the same CPU object may be used by several machines.
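To see why the last two queries can differ, here is a small sketch in raw SQL using the standard-library sqlite3 module instead of Django. The table names follow the raw SQL in the question; the rows are made up for illustration. One CPU row is shared by two machines and one CPU row is unused, so summing over the CPU table and summing over the join give different totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_cpu (
    id INTEGER PRIMARY KEY, cpu_name TEXT, cpu_count INTEGER);
CREATE TABLE my_machine (
    id INTEGER PRIMARY KEY, hostname TEXT, ip TEXT,
    cpu_id INTEGER REFERENCES my_cpu(id));

-- Illustrative data: cpu 1 is used by two machines, cpu 3 by none.
INSERT INTO my_cpu VALUES (1, 'cpu-a', 4), (2, 'cpu-b', 8), (3, 'cpu-unused', 16);
INSERT INTO my_machine VALUES
    (1, 'host1', '10.0.0.1', 1),
    (2, 'host2', '10.0.0.2', 1),
    (3, 'host3', '10.0.0.3', 2);
""")

# Sum over the CPU table: counts every CPU row once, used or not.
sum_over_cpus = conn.execute(
    "SELECT SUM(cpu_count) FROM my_cpu"
).fetchone()[0]

# Sum over the join: counts a CPU row once per machine that references it.
sum_over_machines = conn.execute(
    "SELECT SUM(c.cpu_count) FROM my_machine m "
    "INNER JOIN my_cpu c ON m.cpu_id = c.id"
).fetchone()[0]

print(sum_over_cpus, sum_over_machines)
```

The second number corresponds to the join in the question, which is what the aggregate on MyMachine is meant to produce.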

Creating an object with a field based on the last one in django

Consider the following Model:
class Tickets(models.Model):
    number = models.PositiveIntegerField(default=0)
    created_at = models.DateTimeField(auto_now_add=True)
and the following method that creates a ticket:
@transaction.atomic()
def create_ticket(self):
    last = Tickets.objects.order_by('created_at').last()
    next_number = 1 if last is None else last.number + 1
    new_ticket = Tickets.objects.create(number=next_number)
    return new_ticket
This fails if create_ticket is called concurrently, as the transaction won't have a reason to abort (no same record has been updated by multiple calls) but last will return repeated results on some runs.
I'm aware of auto incrementing fields and they're not an option in this case, as there is extra logic in place that may cause next_number to reset to 0. I'm leaning towards finding a way to acquire a table lock but was hoping to find an easier way.
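A rough sketch of the lock-then-read idea, written against the standard-library sqlite3 module rather than Django or the asker's actual database. In SQLite, BEGIN IMMEDIATE takes the write lock before the last ticket is read, so a second writer has to wait until the first one commits and cannot read the same "last" row. The table layout and ordering by id are assumptions made for the demo, and the reset logic from the question is left out.

```python
import sqlite3


def create_ticket(conn):
    """Insert a ticket numbered one above the most recent one."""
    # Take the write lock *before* reading, so the read-then-insert
    # sequence cannot interleave with another writer.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT number FROM tickets ORDER BY id DESC LIMIT 1"
        ).fetchone()
        next_number = 1 if row is None else row[0] + 1
        conn.execute("INSERT INTO tickets (number) VALUES (?)", (next_number,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return next_number


# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE tickets ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, number INTEGER NOT NULL)"
)
numbers = [create_ticket(conn) for _ in range(3)]
print(numbers)
```

Other databases expose different locking statements, so this shows the shape of the approach and is not a drop-in solution.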

Django ORM - LEFT JOIN with WHERE clause

I have made a previous post related to this problem here but because this is a related but new problem I thought it would be best to make another post for it.
I'm using Django 1.8
I have a User model and a UserAction model. A user has a type. UserAction has a time, which indicates how long the action took as well as a start_time which indicates when the action began. They look like this:
class User(models.Model):
    user_type = models.IntegerField()

class UserAction(models.Model):
    user = models.ForeignKey(User)
    time = models.IntegerField()
    start_time = models.DateTimeField()
Now what I want to do is get all users of a given type and the sum of time of their actions, optionally filtered by the start_time.
What I am doing is something like this:
# stubbing in a start time to filter by
start_time = datetime.datetime.now() - datetime.timedelta(days=2)
# stubbing in a type
type = 2
# this gives me the users and the sum of the time of their actions, or 0 if no
# actions exist
q = User.objects.filter(user_type=type).values('id').annotate(total_time=Coalesce(Sum('useraction__time'), 0))
# now I try to add the filter for start_time of the actions to be greater than or
# equal to start_time
q = q.filter(useraction__start_time__gte=start_time)
Now what this does, of course, is an INNER JOIN on UserAction, which removes all the users without actions. What I really want is the equivalent of a LEFT JOIN with a WHERE clause, but for the life of me I can't find how to do that. I've looked at the docs and at the source but am not finding an answer. I'm (pretty) sure this is something that can be done, I'm just not seeing how. Could anyone point me in the right direction? Any help would be very much appreciated. Thanks much!
I'm having the same kind of problem as you. I haven't found any proper way of solving the problem yet, but I've found a few fixes.
One way would be looping through all the users:
q = User.objects.filter(user_type=type)
for u in q:
    u.time_sum = UserAction.objects.filter(user=u, start_time__gte=start_time).aggregate(time_sum=Sum('time'))['time_sum']
This method, however, runs one database query per user. It might do the trick if you don't have many users, but it can get very time-consuming if you have a large database.
Another way of solving the problem would be using the extra method of the QuerySet API. This is a method that is detailed in this blog post by Timmy O'Mahony.
valid_actions = UserAction.objects.filter(start_time__gte=start_time)
q = User.objects.filter(user_type=type).extra(select={
    "time_sum": """
        SELECT SUM(time)
        FROM userAction
        WHERE userAction.user_id = user.id
        AND userAction.id IN (%s)
    """ % ",".join([str(uAction.id) for uAction in valid_actions.all()])
})
This method, however, relies on calling the database with the SQL table names, which is very un-Django: if you change the db_table of one of your models or the db_column of one of their fields, this code will no longer work. It does, though, require only 2 queries: the first to get the list of valid userAction rows and the second to sum them for the matching user.
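For reference, in plain SQL the result the question asks for comes from putting the start_time condition in the LEFT JOIN's ON clause instead of in WHERE, so users with no matching actions are kept with a total of 0. The sketch below shows this with the standard-library sqlite3 module. The table names app_user and app_useraction and the sample rows are made up for illustration, and this is raw SQL, not a Django ORM call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, user_type INTEGER);
CREATE TABLE app_useraction (
    id INTEGER PRIMARY KEY, user_id INTEGER, time INTEGER, start_time TEXT);

-- Users 1 and 2 have type 2; user 3 has another type.
INSERT INTO app_user VALUES (1, 2), (2, 2), (3, 1);
-- User 1 has one recent and one old action; user 2 has only an old one.
INSERT INTO app_useraction VALUES
    (1, 1, 10, '2020-01-05'),
    (2, 1, 5,  '2020-01-01'),
    (3, 2, 7,  '2020-01-01');
""")

query = """
SELECT u.id, COALESCE(SUM(a.time), 0) AS total_time
FROM app_user AS u
LEFT JOIN app_useraction AS a
       ON a.user_id = u.id
      AND a.start_time >= ?      -- filter lives in ON, not WHERE
WHERE u.user_type = ?
GROUP BY u.id
ORDER BY u.id
"""
rows = conn.execute(query, ("2020-01-03", 2)).fetchall()
print(rows)
```

Moving the same condition into WHERE would drop user 2 from the result, which is the behaviour described in the question.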