Why does django-sphinx only output 20 results? How can I get the rest? - django

Doing a search using django-sphinx gives me a result set whose _sphinx metadata says there were 68 matches, but when I iterate over it, I can only get at the first 20 of them.
I'm SURE there's a way around this, and that this is by design, but it's officially stumping the heck out of me. Does anybody know how to get the complete queryset?

I figured this out finally.
Apparently, the queryset only fetches the first 20 hits unless you slice it explicitly. Or something like that.
So, if you want to iterate over the whole thing, you have to do:
for result in results[0:results.count()]:
    print result
Or something to that effect, which will query the entire thing explicitly. Ugh. This should be clearly documented... but it's not.

After hacking through the source, I set the _limit variable explicitly. It does the job and issues an actual LIMIT:
qs = MyEntity.search.query(query_string)
qs._limit = limit
for result in qs:
    print result

This worked for me:
In the sphinx config file:
max_matches = 5000
In the Django code:
desc_obj = Dictionary.search.query(search_desc)
desc_obj._maxmatches = 5000
Or in settings:
SPHINX_MAX_MATCHES = 5000
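Putting those pieces together, here is a minimal sketch of the whole pattern (it reuses the MyEntity model and the _maxmatches attribute from the answers above; verify the attribute names against your django-sphinx version before relying on them):

# settings.py -- raise the cap django-sphinx applies to queries
SPHINX_MAX_MATCHES = 5000

# in a view or management command
qs = MyEntity.search.query(query_string)
qs._maxmatches = 5000            # per-query override, as in the answer above
total = qs.count()               # number of matches Sphinx reports
for result in qs[0:total]:       # slicing forces the full result set to be fetched
    print result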

Related

Efficiently updating a large number of records based on a field in that record using Django

I have about a million Comment records that I want to update based on that comment's body field. I'm trying to figure out how to do this efficiently. Right now, my approach looks like this:
update_list = []
qs = Comment.objects.filter(word_count=0)
for comment in qs:
    model_obj = Comment.objects.get(id=comment.id)
    model_obj.word_count = len(model_obj.body.split())
    update_list.append(model_obj)
Comment.objects.bulk_update(update_list, ['word_count'])
However, this hangs and seems to time out in my migration process. Does anybody have suggestions on how I can accomplish this?
It's not easy to determine the memory footprint of a Django object, but an absolute minimum is the amount of space needed to store all of its data. My guess is that you may be running out of memory and page-thrashing.
You probably want to work in batches of, say, 1000 objects at a time. Use Queryset slicing, which returns another queryset. Try something like
BATCH_SIZE = 1000
start = 0
base_qs = Comment.objects.filter(word_count=0)
while True:
    batch_qs = base_qs[start:start + BATCH_SIZE]
    start += BATCH_SIZE
    if not batch_qs.exists():
        break
    update_list = []
    for comment in batch_qs:
        model_obj = Comment.objects.get(id=comment.id)
        model_obj.word_count = len(model_obj.body.split())
        update_list.append(model_obj)
    Comment.objects.bulk_update(update_list, ['word_count'])
    print(f'Processed batch starting at {start}')
Each trip around the loop will free the space occupied by the previous trip when it replaces batch_qs and update_list. The print statement will allow you to watch it progress at a hopefully acceptable, regular rate!
Warning - I have never tried this. I'm also wondering whether slicing and filtering will play nice with each other or whether one should use
base_qs = Comment.objects.all()
...
while True:
    batch_qs = base_qs[start:start + BATCH_SIZE]
    ...
    for comment in batch_qs.filter(word_count=0):
so you are slicing your way through rows in the entire DB table and retrieving the subset of each slice that needs updating. This feels "safer". Anybody know for sure?
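For what it's worth, Django refuses to call filter() on a queryset that has already been sliced, so that second variant would have to do the word_count check in Python. One pattern that sidesteps both that and the risk of skipping rows (updated rows drop out of the word_count=0 filter while start keeps advancing) is to walk the table with a primary-key cursor. This is only a sketch, not from the original answer, and it assumes Comment has an auto-incrementing integer primary key:

BATCH_SIZE = 1000
last_pk = 0
while True:
    # Take the next batch of unprocessed rows in pk order so updated rows
    # are never revisited, even if their recomputed word_count is still 0.
    batch = list(
        Comment.objects.filter(word_count=0, pk__gt=last_pk)
        .order_by('pk')[:BATCH_SIZE]
    )
    if not batch:
        break
    for comment in batch:
        comment.word_count = len(comment.body.split())
    Comment.objects.bulk_update(batch, ['word_count'])
    last_pk = batch[-1].pk
    print(f'Processed up to pk {last_pk}')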

Trying to import dictionary to work with from a url; 'unicode' object not callable

I'm new to coding and have searched as best I can to find out how to solve this before asking.
I'm trying to pull information from the poloniex.com REST API, which I believe is in JSON format. I can import the data and work with it a little bit, but when I try to call and use the elements in the contained dictionaries, I get "'unicode' object is not callable". How can I use this information? The end goal with this data is to pull the "BTC" (volume) value for each coin pair, test whether it is < 100, and if not, append the pair to a new list.
The data is presented like this or you can see yourself at https://poloniex.com/public?command=return24hVolume:
{"BTC_LTC":{"BTC":"2.23248854","LTC":"87.10381314"},"BTC_NXT":{"BTC":"0.981616","NXT":"14145"}, ... "totalBTC":"81.89657704","totalLTC":"78.52083806"}
And the code I've been trying to get to work currently looks like this (I've tried to iterate over the information I want a million different ways, so I don't know what example to give for that part, but this is how I am importing the data):
returnvolume = urllib2.urlopen(urllib2.Request('https://poloniex.com/public?command=return24hVolume'))
coinvolume = json.loads(returnvolume.read())
coinvolume = dict(coinvolume)
No matter how I try to use the data I've pulled, I get an error stating:
"unicode' object not callable."
I'd really appreciate a little help, I'm concerned I may be approaching this the wrong way as I haven't been able to get anything to work, or maybe I'm just missing something rudimentary, I'm not sure.
Thank you very much for your time!
Thanks to another user, downshift, I have discovered the answer!
d = {}
for k, v in coinvolume.items():
    try:
        if float(v['BTC']) > 100:
            d[k] = v
    except KeyError:
        d[k] = v
    except TypeError:
        if v > 100:
            d[k] = v
This creates a new dictionary, d, and adds every coin pair with a 'BTC' volume greater than 100 to it.
Thanks again downshift, and I hope this helps others as well!
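As an aside, the "'unicode' object is not callable" error usually means a string value was invoked with parentheses as if it were a function; that is easy to do with this API because the volumes come back as strings. A quick illustration (hypothetical, not from the original post):

volume = coinvolume['BTC_LTC']['BTC']  # a unicode string such as u'2.23248854'
volume('BTC')                          # wrong: "calls" the string -> TypeError: 'unicode' object is not callable
float(volume) < 100                    # right: convert the string to a number before comparing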

Joining many Callisto images

c1 = CallistoSpectrogram.read('BIR_20110922_101500_01.fit')
c2 = CallistoSpectrogram.read('BIR_20110922_103000_01.fit')
d = CallistoSpectrogram.join_many([c1, c2])
If I try to join approximately 40 files like this, it throws the following error:
ValueError: Too large gap.
Is there a limit on the number of files?
This error is an internal error of the sunpy package that you are using. Really your question is not about Python but about that package, so you should tag it accordingly.
But we can see what's going on by looking at the source, e.g. here. It shows that the ValueError is thrown when two adjacent spectra are separated by more than the maxgap parameter, which defaults to zero.
So one fix might be simply to pass in maxgap=None:
d = CallistoSpectrogram.join_many([c1, c2], maxgap=None)
That assumes you don't mind the gaps, of course.
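To join all ~40 files in one call, one approach is to glob for them and pass the whole list in. A sketch (it assumes the files live in the working directory and follow the BIR_*.fit naming shown above; the import path for CallistoSpectrogram differs between sunpy/radiospectra versions, so adjust it to yours):

import glob

# Adjust to your installation: older sunpy exposes CallistoSpectrogram under
# sunpy.spectra.sources, newer versions moved it to the radiospectra package.
from radiospectra.sources import CallistoSpectrogram

files = sorted(glob.glob('BIR_20110922_*.fit'))            # read the files in time order
spectrograms = [CallistoSpectrogram.read(f) for f in files]

# maxgap=None tolerates the gaps between consecutive recordings
joined = CallistoSpectrogram.join_many(spectrograms, maxgap=None)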

Django schedule: Difference between event.get_occurrence() and event.get_occurrences()?

Based on django-schedule. I can't find the guy who made it.
Pardon me if I'm missing something, but I've been trying to get an event's occurrences, preferably for a given day.
When I use event.get_occurrence(date), it always returns nothing. But when I use event.get_occurrences(before_date, after_date), suddenly the occurrences on the previously attempted date show up.
Why won't this work with just one datetime object?
This difference is probably down to the actual design of these two methods. Frankly, get_occurrence is rather flawed in general. A method like this should always return something, even if it's just None, but there are scenarios where it has no explicit return at all. Namely, if your event doesn't have an rrule, and the date you passed to get_occurrence isn't the same as your event's start, then no value is returned.
There's not really anything that can be done about that. It's just flawed code.
Based on the above comment, especially for the case where the event doesn't return an occurrence, the snippet below can force the retrieval of an occurrence when you are sure that it exists:
from dateutil.relativedelta import relativedelta

def custom_get_occurrence(event, start_date):
    occurrence = event.get_occurrence(start_date)
    if occurrence is None:
        occurrences = event.get_occurrences(start_date, start_date + relativedelta(months=3))
        matches = [o for o in occurrences if o.start == start_date]
        if matches:
            occurrence = matches[0]
    return occurrence
The above code resolves the issue that occurs when the default get_occurrence doesn't return a result.

Django Query in a loop fails for no good reason

I have this code:
msgs = int(post['time_in_weeks'])
for i in range(msgs):
    tip_msg = Tip.objects.get(week_number=i)
It always results in an error saying that no values could be found.
week_number is an integer field. When I input the value of i directly,
the query works.
When I print out the value of i, I get the expected values.
Any input at all would be seriously appreciated.
Thanks.
The range function will give you a zero-based list of numbers up to and excluding msgs. I guess there is no Tip with week_number=0.
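If the weeks stored in the database really do start at 1, a minimal fix (keeping the original loop, and assuming every week from 1 to msgs has a Tip row) is to shift the range:

msgs = int(post['time_in_weeks'])
for i in range(1, msgs + 1):
    tip_msg = Tip.objects.get(week_number=i)  # raises Tip.DoesNotExist if a week is missing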
Also, to limit the number of queries you could do this:
for tip in Tip.objects.filter(week_number__lt=msgs):
    # do something with tip
or, if you want specific weeks:
weeks = (1, 3, 5)
for tip in Tip.objects.filter(week_number__in=weeks):
    # do something with tip