Get the id of the next object in a Django model

I have a function that is passed the id of an object in my Image model. I need the id of the next object in the model. Currently I am doing it in the least efficient way possible, since I fetch all the objects just to find the next one. My current implementation is:
def get_next_id(curr_id):
    result = []
    Image_list = Image.objects.all()
    total = Image.objects.all().count()
    for i in range(len(Image_list)):
        result.append(Image_list[i].id)
    index_curr = result.index(curr_id)
    if index_curr == total - 1:  # wrap around after the last object
        new_index = 0
    else:
        new_index = index_curr + 1
    return Image_list[new_index].id
I would be grateful if someone could provide a better way, or make this one more efficient. Thank you.

I would suggest something like this:
from django.db.models import Min

def get_next_id(curr_id):
    try:
        ret = Image.objects.filter(id__gt=curr_id).order_by("id")[0:1].get().id
    except Image.DoesNotExist:
        ret = Image.objects.aggregate(Min("id"))['id__min']
    return ret
This does not take care of the special case where the table is empty, but then you should not have a valid curr_id in the first place if the table is empty. It also does not protect against passing nonsensical values as curr_id.
What this does is fetch the first id greater than the current one. The [0:1] slice limits the data returned from the database to a single record: in effect the database performs the slice rather than Python. If there is no id greater than the current one, it falls back to the lowest id.
In response to your comment about how to do it in reverse:
from django.db.models import Max

def get_prev_id(curr_id):
    try:
        ret = Image.objects.filter(id__lt=curr_id).order_by("-id")[0:1].get().id
    except Image.DoesNotExist:
        ret = Image.objects.aggregate(Max("id"))['id__max']
    return ret
The changes are:
Use id__lt, and order by -id.
Use Max rather than Min for the aggregate, and use the id__max key rather than id__min to get the value.
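On Django 1.6 or newer, the same logic reads a little more cleanly with QuerySet.first(), which returns None instead of raising DoesNotExist. A minimal sketch of the forward case:
from django.db.models import Min

def get_next_id(curr_id):
    next_obj = Image.objects.filter(id__gt=curr_id).order_by("id").first()
    if next_obj is not None:
        return next_obj.id
    # wrap around to the lowest id when curr_id is the last one
    return Image.objects.aggregate(Min("id"))["id__min"]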

Related

Efficiently updating a large number of records based on a field in that record using Django

I have about a million Comment records that I want to update based on that comment's body field. I'm trying to figure out how to do this efficiently. Right now, my approach looks like this:
update_list = []
qs = Comments.objects.filter(word_count=0)
for comment in qs:
    model_obj = Comments.objects.get(id=comment.id)
    model_obj.word_count = len(model_obj.body.split())
    update_list.append(model_obj)
Comments.objects.bulk_update(update_list, ['word_count'])
However, this hangs and seems to time out in my migration process. Does anybody have suggestions on how I can accomplish this?
It's not easy to determine the memory footprint of a Django object, but an absolute minimum is the amount of space needed to store all of its data. My guess is that you may be running out of memory and page-thrashing.
You probably want to work in batches of, say, 1000 objects at a time. Use Queryset slicing, which returns another queryset. Try something like
BATCH_SIZE = 1000
start = 0
base_qs = Comments.objects.filter(word_count=0)
while True:
    batch_qs = base_qs[start:start + BATCH_SIZE]
    start += BATCH_SIZE
    if not batch_qs.exists():
        break
    update_list = []
    for comment in batch_qs:
        model_obj = Comments.objects.get(id=comment.id)
        model_obj.word_count = len(model_obj.body.split())
        update_list.append(model_obj)
    Comments.objects.bulk_update(update_list, ['word_count'])
    print(f'Processed batch starting at {start}')
Each trip around the loop will free the space occupied by the previous trip when it replaces batch_qs and update_list. The print statement will allow you to watch it progress at a hopefully acceptable, regular rate!
Warning - I have never tried this. I'm also wondering whether slicing and filtering will play nice with each other or whether one should use
base_qs = Comments.objects.all()
...
while True:
    batch_qs = base_qs[start:start + BATCH_SIZE]
    ...
    for comment in batch_qs.filter(word_count=0):
so you are slicing your way through rows in the entire DB table and retrieving the subset of each slice that needs updating. This feels "safer". Anybody know for sure?
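For what it's worth, a common way to sidestep the slice/filter question entirely is to batch by primary key rather than by offset: each pass fetches the next chunk of ids strictly above the last one processed, so rows dropping out of the word_count=0 filter can't shift the window. A minimal sketch, assuming the default integer primary key:
BATCH_SIZE = 1000
last_pk = 0
while True:
    # next chunk of unprocessed rows, in pk order
    batch = list(
        Comments.objects.filter(word_count=0, pk__gt=last_pk)
                        .order_by('pk')[:BATCH_SIZE]
    )
    if not batch:
        break
    for comment in batch:
        comment.word_count = len(comment.body.split())
    Comments.objects.bulk_update(batch, ['word_count'])
    last_pk = batch[-1].pk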

How to "create or update" using F object?

I need to create a LeaderboardEntry if it does not exist. If it exists, it should be updated with the new value (current + new). I want to achieve this with a single query. How can I do that?
My current code looks like this (2 queries):
reward_amount = 50
LeaderboardEntry.objects.get_or_create(player=player)
LeaderboardEntry.objects.filter(player=player).update(golds=F('golds') + reward_amount)
PS: Default value of "golds" is 0.
You can save one query hit by using defaults:
reward_amount = 50
leader_board, created = LeaderboardEntry.objects.get_or_create(
    player=player,
    defaults={
        "golds": reward_amount,
    },
)
if not created:
    leader_board.golds += reward_amount
    leader_board.save(update_fields=["golds"])
I think your problem is that the get_or_create() method returns a tuple of two values, (object, created), so you have to receive them in your code as follows:
reward_amount = 50
entry, __ = LeaderboardEntry.objects.get_or_create(player=player)
entry.golds += reward_amount
entry.save()
This works better than your current code, although it does not get you down to a single query: the save() method will hit your database again.
You can solve this with update_or_create:
LeaderboardEntry.objects.update_or_create(
    player=player,
    defaults={
        'golds': F('golds') + reward_amount,
    },
)
EDIT:
Sorry, F expressions in update_or_create are not yet supported.
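If a true single-query upsert is required, one option is to drop to raw SQL. A minimal sketch for PostgreSQL; the table name app_leaderboardentry and the unique constraint on player_id are assumptions, so check them against your actual schema:
from django.db import connection

reward_amount = 50
with connection.cursor() as cursor:
    cursor.execute(
        """
        INSERT INTO app_leaderboardentry (player_id, golds)
        VALUES (%s, %s)
        ON CONFLICT (player_id)
        DO UPDATE SET golds = app_leaderboardentry.golds + EXCLUDED.golds
        """,
        [player.id, reward_amount],
    )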

Python: How to create a function that uses its own output and an array of randomly generated numbers

Disclaimer: I am quite new to Python and programming as a whole.
I have been trying to create a function to generate random stock prices using the following:
New stock price = previous price + (previous price*(return + (volatility * random number)))
The return and volatility numbers are fixed. Also, I have generated the random numbers N times.
The problem is how to create a function whose output is fed back into itself as the previous price input. Basically, I want an array of NEW stock prices generated from this formula, where the previous price variable is the function's own earlier output.
I have been trying to do this for a couple of days and I am sure I am not fully equipped to do it (given that I am a newbie), but ANY HELP would really be more than appreciated!
import math
import random

initial_price = 10
return_daily = 0.12 / 252
vol_daily = 0.30 / math.sqrt(252)

random_numbers = []
for i in range(5):
    random_numbers.append(random.gauss(0, 1))

def stock_prices(random_numbers):
    prices = []
    for i in range(0, len(random_numbers)):
        calc = initial_price + (initial_price * (return_daily + (vol_daily * random_numbers[i])))
        prices.append(calc)
    return prices
You can't really use recursion here, because you don't have a break condition that ends the recursion. You could construct one by passing an additional counter parameter that specifies how many more levels to recurse, but in my opinion that would not be optimal.
Instead, I recommend using a for loop that repeats a fixed number of times. This way you can append one new price value to the list per loop iteration and access the previous one to calculate it:
first_price = 100
list_length = 20

def price_formula(previous_price):
    return previous_price * 1.2  # you would replace this with your actual calculation

prices = [first_price]  # create list with initial item
for i in range(list_length):  # repeats exactly 'list_length' times, turn number is 'i'
    prices.append(price_formula(prices[-1]))  # append new price to list
    # prices[-1] always returns the last element of the list,
    # i.e. the previously added one.

print("\n".join(map(str, prices)))
My optimization of your code snippet:
import math
import random

initial_price = 10
return_daily = 0.12 / 252
vol_daily = 0.30 / math.sqrt(252)

def stock_prices(number_of_prices):
    prices = [initial_price]
    for i in range(0, number_of_prices):
        prices.append(prices[-1] + (prices[-1] * (return_daily + (vol_daily * random.gauss(0, 1)))))
    return prices
This is the classic Markov process: the present value depends upon its previous value, and only its previous value. A good fit here is an iterator, which can be built to generate the sequence one Markov step at a time.
Learn about how iterators work here: http://anandology.com/python-practice-book/iterators.html
Once you have some understanding of how iterators work, you can create your own for this problem. You need a class that implements the __iter__() method and the __next__() method.
Something like this:
import random
from math import sqrt

class Abc:
    def __init__(self, initPrice):
        self.v = initPrice  # the current (initially the starting) price
        self.dailyRet = 0.12 / 252
        self.dailyVol = 0.3 / sqrt(252)

    def __iter__(self):
        return self

    def __next__(self):
        # one Markov step: the new price depends only on the current one
        self.v += self.v * (self.dailyRet + self.dailyVol * random.gauss(0, 1))
        return self.v

if __name__ == '__main__':
    initPrice = 10
    temp = Abc(initPrice)
    for i in range(10):
        print(next(temp))
This will give the output:
> python test.py
10.3035353791
10.3321905359
10.3963790497
10.5354048937
10.6345509793
10.2598381299
10.3336476153
10.6495914319
10.7915999185
10.6669136891
Note that this never raises StopIteration, so if you use it incorrectly you may get into trouble. However, that is not difficult to add, and I hope you try to implement it.
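For completeness, a generator function gives the same endless Markov stream with less boilerplate. A minimal sketch (the names price_stream, daily_ret, and daily_vol are mine):
import random
from itertools import islice
from math import sqrt

def price_stream(init_price, daily_ret=0.12 / 252, daily_vol=0.3 / sqrt(252)):
    """Yield an endless stream of simulated prices, one Markov step at a time."""
    price = init_price
    while True:
        price += price * (daily_ret + daily_vol * random.gauss(0, 1))
        yield price

# usage: take the first 10 prices from the stream
for p in islice(price_stream(10), 10):
    print(p)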

Python: is there an easier way to add values to a default key?

The program I am working on does the following:
Grabs stdout from a .perl program
Builds a nested dict from the output
I'm using the AutoVivification approach found here to build a default nested dictionary. I'm using this method of defaultdict because it's easier for me to follow as a new programmer.
I'd like to add one key value to a declared key per pass of the for loop in the code below. Is there an easier way to add values to a key beyond building a [list] of values and then adding them as a group?
import os
import pprint
import re
import subprocess

class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

reg = 'NtUser'
od = Vividict()
od[reg]  # touching the key creates the nested dict

def run_rip():
    os.chdir('/Users/ME/PycharmProjects/RegRipper2.8')  # Path to regripper dir
    for k in ntDict:  # ntDict: ordered dict of plugin names, defined elsewhere
        run_command = "".join(["./rip.pl",
                               " -r /Users/ME/Desktop/Reg/NTUSER.DAT -p ",
                               str(k)])
        process = subprocess.Popen(run_command,
                                   shell=True,
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()  # wait for the process to terminate
        parse(out)
        # errcode = process.returncode // used in future for errorcode checking
        ntDict.popitem(last=False)

def parse(data):
    pattern = re.compile(r'lastwrite|(\d{2}:\d{2}:\d{2})|alert|trust|Value')
    grouping = re.compile(r'(?P<first>.+?)(\n)(?P<second>.+?)([\n]{2})(?P<rest>.+[\n])',
                          re.MULTILINE | re.DOTALL)
    if pattern.findall(data):
        match = re.search(grouping, data)
        global first
        first = re.sub(r"\s\s+", " ", match.group('first'))
        od[reg][first]
        second = re.sub(r"\s\s+", " ", match.group('second'))
        parse_sec(second)

def parse_sec(data):
    pattern = re.compile(r'^(\(.*?\)) (.*)$')
    date = re.compile(r'(.*?\s)(.*\d{2}:\d{2}:\d{2}.*)$')
    try:
        if pattern.match(data):
            result = pattern.match(data)
            hive = result.group(1)
            od[reg][first]['Hive'] = hive
            desc = result.group(2)
            od[reg][first]['Description'] = desc
        elif date.match(data):
            result = date.match(data)
            hive = result.group(1)
            od[reg][first]['Hive'] = hive
            time = result.group(2)
            od[reg][first]['Timestamp'] = time
        else:
            od[reg][first]['Finding'] = data
    except IndexError:
        print('error w/pattern match')

run_rip()
pprint.pprint(od)
Sample Input:
bitbucket_user v.20091020
(NTUSER.DAT) TEST - Get user BitBucket values
Software\Microsoft\Windows\CurrentVersion\Explorer\BitBucket
LastWrite Time Sat Nov 28 03:06:35 2015 (UTC)
Software\Microsoft\Windows\CurrentVersion\Explorer\BitBucket\Volume
LastWrite Time = Sat Nov 28 16:00:16 2015 (UTC)
If I understand your question correctly, you want to change the lines where you're actually adding values to your dictionary (e.g. the od[reg][first]['Hive'] = hive line and the similar one for desc and time) to create a list for each reg and first value and then extend that list with each item being added. Your dictionary subclass takes care of creating the nested dictionaries for you, but it won't build a list at the end.
I think the best way to do this is to use the setdefault method on the inner dictionary:
od[reg][first].setdefault("Hive", []).append(hive)
The setdefault will add the second argument (the "default", here an empty list) to the dictionary if the first argument doesn't exist as a key. It preempts the dictionary's __missing__ method creating the item, which is good, since we want the value to be a list rather than another layer of dictionary. The method returns the value for the key in all cases (whether it added a new value or one was there already), so we can chain it with append to add our new hive value to the list.
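To illustrate with the Vividict from the question (the keys here are made up):
class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

od = Vividict()
od['NtUser']['finding one'].setdefault('Hive', []).append('(NTUSER.DAT)')
od['NtUser']['finding one'].setdefault('Hive', []).append('(SYSTEM)')
# the second append extends the existing list instead of overwriting it:
# {'NtUser': {'finding one': {'Hive': ['(NTUSER.DAT)', '(SYSTEM)']}}}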

Iterating over a large unicode list taking a long time?

I'm working with the program Autodesk Maya.
I've made a naming convention script that names each item in the scene according to a certain convention. However, I have it list every item in the scene, check whether the chosen name matches any current name in the scene, rename the item, and then recheck the scene once more for a duplicate.
However, when I run the code, it can take 30 seconds to a minute or more to run through it all. At first I had no idea what was making my code slow, as it worked fine in a relatively small scene. But when I put print statements in the scene-checking code, I saw that it was taking a long time to check through all the items in the scene for duplicates.
The ls() command provides a unicode list of all the items in the scene. This list can be relatively large, a thousand entries or more if the scene has even a moderate amount of items; a normal scene would be several times larger than my testing scene (which has about 794 items in this list).
Is this supposed to take this long? Is the method I'm using to compare things inefficient? I'm not sure what to do here; the code is taking an excessive amount of time. I'm also wondering if it could be anything else in the code, but this seems like it might be it.
Here is some code below.
import collections
from string import Template
from maya.cmds import ls  # imports elided in the original; maya.cmds assumed

class Name(object):
    """A naming convention class that runs passed arguments through a user
    dictionary, and returns a formatted string of the user's input naming
    convention.
    """
    def __init__(self, user_conv):
        self.user_conv = user_conv
        # an example of a user convention is '${prefix}_${name}_${side}_${objtype}'

    @staticmethod
    def abbrev_lib(word):
        # a dictionary of abbreviated words is here; takes in a string and
        # returns an abbreviated string, or the given string if not found
        pass

    @staticmethod
    def check_scene(name):
        """Checks entire scene for the same name. Returns '' if a duplicate exists.

        Keyword Arguments:
        name -- (string) name of object to be checked
        """
        scene = ls()
        match = [x for x in scene if isinstance(x, collections.Iterable)
                 and (name in x)]
        if not match:
            return name
        else:
            return ''

    def convert(self, prefix, name, side, objtype):
        """Converts given information about an object into the user's convention.

        Keyword Arguments:
        prefix -- what is prefixed before the name
        name -- name of the object or node
        side -- what side the object is on, example 'left' or 'right'
        objtype -- the type of the object, example 'joint' or 'multiplyDivide'
        """
        prefix = self.abbrev_lib(prefix)
        name = self.abbrev_lib(name)
        side = ''.join([self.abbrev_lib(x) for x in side])
        objtype = self.abbrev_lib(objtype)
        i = 2
        checked = ''
        subs = {'prefix': prefix, 'name': name, 'side': side,
                'objtype': objtype}
        while checked == '':
            newname = Template(self.user_conv.lower())
            newname = newname.safe_substitute(**subs)
            newname = newname.strip('_')
            newname = newname.replace('__', '_')
            checked = self.check_scene(newname)
            if checked == '' and i < 100:
                subs['objtype'] = '%s%s' % (objtype, i)
                i += 1
            else:
                break
        return checked
Are you running this many times? You are potentially trolling a list of several hundred or a few thousand items for each iteration inside while checked == '', which would be a likely culprit. FWIW, prints are also very slow in Maya, especially if you're printing a long list - so doing that many times will definitely be slow no matter what.
I'd try a couple of things to speed this up:
limit your searches to one type at a time - why troll through hundreds of random nodes if you only care about MultiplyDivide right now?
Use a set or a dictionary to search, rather than a list - sets and dictionaries are hash-based and much faster for lookups
If you're worried about maintaining a naming convention, definitely design it to be resistant to Maya's default behavior, which is to append numeric suffixes to keep names unique. Any naming convention which doesn't support this will be a pain in the butt for all time, because you can't prevent Maya from doing this in the ordinary course of business. On the other hand, if you use that for differentiating instances you don't need to do any uniquification at all - just use rename() on the object and capture the result. The weakness there is that Maya won't rename for global uniqueness, only local - so if you want to make unique node names for things that are not siblings you have to do it yourself.
Here's some cheapie code for finding unique node names:
from maya import cmds  # import elided in the original

def get_unique_scene_names(*nodeTypes):
    if not nodeTypes:
        nodeTypes = ('transform',)
    results = {}
    for longname in cmds.ls(type=nodeTypes, l=True):
        shortname = longname.rpartition("|")[-1]
        if not shortname in results:
            results[shortname] = set()
        results[shortname].add(longname)
    return results

def add_unique_name(item, node_dict):
    shortname = item.rpartition("|")[-1]
    if shortname in node_dict:
        node_dict[shortname].add(item)
    else:
        node_dict[shortname] = set([item])

def remove_unique_name(item, node_dict):
    shortname = item.rpartition("|")[-1]
    existing = node_dict.get(shortname, [])
    if item in existing:
        existing.remove(item)

def apply_convention(node, new_name, node_dict):
    if not new_name in node_dict:
        renamed_item = cmds.ls(cmds.rename(node, new_name), l=True)[0]
        remove_unique_name(node, node_dict)
        add_unique_name(renamed_item, node_dict)
        return renamed_item
    else:
        for n in range(99999):
            possible_name = new_name + str(n + 1)
            if not possible_name in node_dict:
                renamed_item = cmds.ls(cmds.rename(node, possible_name), l=True)[0]
                add_unique_name(renamed_item, node_dict)
                return renamed_item
        raise RuntimeError("Too many duplicate names")
To use it on a particular node type, you just supply the right would-be name when calling apply_convention(). This would rename all the joints in the scene (naively!) to 'jnt_X' while keeping the suffixes unique. You'd do something smarter than that, like your original code did - this just makes sure that leaves are unique:
joint_names = get_unique_scene_names('joint')
existing = cmds.ls(type='joint', l=True)
existing.sort()
existing.reverse()
# do this to make sure it works from leaves backwards!
for item in existing:
    apply_convention(item, 'jnt_', joint_names)

# check the uniqueness constraint by looking at how many items share a short name in the dict:
for d in joint_names:
    print(d, len(joint_names[d]))
But, like I said, plan for those damn numeric suffixes; Maya makes them all the time without asking for permission, so you can't fight 'em :(
Instead of running ls for each and every name, you should run it once and store the result in a set (which has much faster membership tests than a list). Then check against that when you run check_scene:
def check_scene(self, name):
    """Checks entire scene for the same name. Returns '' if a duplicate exists.

    Keyword Arguments:
    name -- (string) name of object to be checked
    """
    if not hasattr(self, 'scene'):
        self.scene = set(ls())
    if name not in self.scene:
        return name
    else:
        return ''
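One caveat with caching the set: it goes stale as soon as anything is renamed, so names created after the cache was built won't be seen by later checks. A minimal sketch of keeping it in sync (rename_and_track is a hypothetical helper on the same class; cmds.rename returns the resulting name):
from maya import cmds

def rename_and_track(self, obj, new_name):
    # hypothetical helper: keep the cached name set in sync after a rename
    result = cmds.rename(obj, new_name)  # rename returns the new name
    self.scene.discard(obj)              # the old name is no longer taken
    self.scene.add(result)               # the new name now is
    return result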