Django 3.2.3 Pagination isn't working properly - django

I have a class-based view that isn't working properly (duplicating some objects and dropping others). I tested it in the shell:
from django.core.paginator import Paginator
from report.models import Grade, Exam
f = Exam.objects.all().filter(id=7)[0].full_mark
all = Grade.objects.all().filter(exam_id=7, value__gte=(f*0.95)).order_by('-value')
p = Paginator(all, 12)
for i in p.page(1).object_list:
    print(i.id)
2826
2617
2591
2912
2796
2865
2408
2501
2466
2681
2616
2563
for i in p.page(2).object_list:
    print(i.id)
2558
2466
2563
2920
2681
2824
2498
2854
2546
2606
2598
2614

Making an order_by call on a non-unique field before passing the queryset all to the Paginator is the root of the problem, and it is well explained here. All you need is to call distinct() or add another field to order_by (e.g. order_by('-value', 'id')) to break ties between rows with the same value.
Below is the code that should work. You also don't need to use all() in your queries; filter by default applies to all of the model's objects.
from django.core.paginator import Paginator
from report.models import Grade, Exam
f = Exam.objects.filter(id=7).first().full_mark
all = Grade.objects.filter(exam_id=7, value__gte=(f*0.95)).order_by('-value').distinct()
p = Paginator(all, 12)
for i in p.page(1).object_list:
    print(i.id)
By the way, your code will crash if no exam object with id=7 is found. You should assign the full_mark value to your f variable conditionally.
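For example, here is a minimal sketch of that conditional assignment (falling back to an empty queryset is my assumption; adapt it to whatever your view should do when the exam is missing):
exam = Exam.objects.filter(id=7).first()  # first() returns None instead of raising IndexError
if exam is None:
    grades = Grade.objects.none()  # assumed fallback: an empty queryset
else:
    f = exam.full_mark
    grades = Grade.objects.filter(exam_id=7, value__gte=f * 0.95).order_by('-value', 'id')
p = Paginator(grades, 12)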

Related

PVLIB: How can I add module and inverter specifications which are not present in CEC and SAM library?

I am working on a PV system installed in Amsterdam. The PVSystem code is as follows. I am getting good results with the inverter and the modules specified in the code, which are obtained with retrieve_sam.
import pvlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pvlib.temperature import TEMPERATURE_MODEL_PARAMETERS
from pandas.plotting import register_matplotlib_converters
from pvlib.modelchain import ModelChain
# Define location for the Netherlands
location = pvlib.location.Location(latitude=52.53, longitude=5.15, tz='UTC', altitude=50, name='amsterdam')
#import the database
module_database = pvlib.pvsystem.retrieve_sam(name='SandiaMod')
inverter_database = pvlib.pvsystem.retrieve_sam(name='cecinverter')
module = module_database.Canadian_Solar_CS5P_220M___2009_
# module = module_database.DMEGC_Solar_320_M6_120BB_ (I want to add this module)
inverter = inverter_database.ABB__PVI_3_0_OUTD_S_US__208V_
temperature_model_parameters = pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass']
modules_per_string = 10
strings_per_inverter = 1  # pvlib's keyword is strings_per_inverter, not inverter_per_string
# Define the PV system characteristics
surface_tilt = 12.5
surface_azimuth = 180
system = pvlib.pvsystem.PVSystem(surface_tilt=surface_tilt, surface_azimuth=surface_azimuth,
                                 albedo=0.25, module=module, module_parameters=module,
                                 temperature_model_parameters=temperature_model_parameters,
                                 modules_per_string=modules_per_string,
                                 strings_per_inverter=strings_per_inverter,
                                 inverter=inverter, inverter_parameters=inverter,
                                 racking_model='open_rack')
# Define a weather file
def importPSMData():
    df = pd.read_csv('/Users/laxmikantradkar/Desktop/PVLIB/solcast_data1.csv', delimiter=';')
    # Rename the columns for input to pvlib
    df.rename(columns={'Dhi': 'dhi', 'Dni': 'dni', 'Ghi': 'ghi',
                       'AirTemp': 'temp_air', 'WindSpeed10m': 'wind_speed'},
              inplace=True)
    df.rename(columns={'Year': 'year', 'Month': 'month', 'Day': 'day',
                       'Hour': 'hour', 'Minute': 'minute'}, inplace=True)
    df['dt'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])
    df.set_index(df['dt'], inplace=True)
    # df.rename(columns={'PeriodEnd': 'period_end'}, inplace=True)
    # Drop the columns pvlib does not need
    df = df.drop(columns=['PeriodStart', 'Period', 'Azimuth', 'CloudOpacity',
                          'DewpointTemp', 'Ebh', 'PrecipitableWater', 'SnowDepth',
                          'SurfacePressure', 'WindDirection10m', 'Zenith'])
    return df
mc = ModelChain(system=system, location=location)
weatherData = importPSMData()
mc.run_model(weather=weatherData)
ac_energy = mc.ac
# ac_energy.to_csv('/Users/laxmikantradkar/Desktop/ac_energy_netherlands.csv')
plt.plot(ac_energy)
plt.show()
Now I want to use a module and an inverter which are not present in the library. Could anyone please tell me how to do this?
Is it possible to access the library and manually add the row/column of inverter and module? If yes, where is the library located?
Is it ../Desktop/PVLIB/venv/lib/python3.8/site-packages/pvlib/data/sam-library-sandia-modules-2015-6-30.csv?
When I try to change the module/inverter parameters from the above path, I receive an error: 'DataFrame' object has no attribute 'Module name'.
I started working on PVLIB_python 2 days ago, so I am new to the language. I really appreciate your help. Feel free to correct me at any point.
I started working on PVLIB_python 2 days ago, so I am new to the language. I really appreciate your help. Feel free to correct me at any point.
Welcome to the community! If you haven't already, I encourage you to dig through the pvlib-python documentation and continue to learn Python basics by playing with the examples in the documentation. I also encourage you to check out the pandas tutorials and any other highly rated pandas learning material you can find, to get yourself up and running with data science in Python.
When I try to change the module/inverter parameters from the above path, I receive an error: 'DataFrame' object has no attribute 'Module name'.
This is because you're asking for a column that isn't in the DataFrame table. No worries, you can make your own module.
Now I want to use a module and an inverter which are not present in the library. Could anyone please tell me how to do this?
Is it possible to access the library and manually add the row/column of inverter and module? If yes, where is the library located?
It isn't necessary to change the library. You can construct a module yourself, since a module is just a pandas Series. Here's an example showing how you can copy an existing module, change a couple of parameters, and create your own module.
my_new_module = module.copy() # create your own copy of the module
print("Before:", my_new_module, sep="\n") # show module before
my_new_module["Notes"] = "This is how to change a field in the module. Do this for every field in the module."
my_new_module.name = "DMEGC_Solar_320_M6_120BB_" # rename the Series appropriately
print("\nAfter:", my_new_module, sep="\n") # show module after
Then you can just insert "my_new_module" into PVSystem:
system = pvlib.pvsystem.PVSystem(
    surface_tilt=surface_tilt,
    surface_azimuth=surface_azimuth,
    albedo=0.25,
    module=my_new_module,             # HERE'S THE NEW MODULE!
    module_parameters=my_new_module,  # ModelChain reads the coefficients from module_parameters
    temperature_model_parameters=temperature_model_parameters,
    modules_per_string=modules_per_string,
    strings_per_inverter=strings_per_inverter,
    inverter=inverter,
    inverter_parameters=inverter,
    racking_model='open_rack')
The hard part here is having the right coefficients that you can trust. You may have an easier time using module_database = pvlib.pvsystem.retrieve_sam(name='CECMod') and replacing those parameters, since they can be substituted more easily with data from the module's spec sheet.
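For illustration, a minimal sketch of that approach; the template module and the overridden values below are placeholders, so take the real numbers from the DMEGC spec sheet:
cec_database = pvlib.pvsystem.retrieve_sam(name='CECMod')
my_cec_module = cec_database.iloc[:, 0].copy()  # start from any CEC module as a template
my_cec_module['STC'] = 320.0      # placeholder: nameplate power from the datasheet
my_cec_module['I_sc_ref'] = 10.3  # placeholder: short-circuit current from the datasheet
my_cec_module.name = 'DMEGC_Solar_320_M6_120BB_'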
This should work identically for inverters as well.
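A minimal sketch of the same pattern for an inverter (the field value and the new name are placeholders, not real specs):
my_new_inverter = inverter.copy()  # copy an existing CEC inverter Series
my_new_inverter['Paco'] = 3000.0   # placeholder: rated AC output power from the datasheet
my_new_inverter.name = 'My_Hypothetical_Inverter'
# then pass inverter=my_new_inverter, inverter_parameters=my_new_inverter to PVSystem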

Duplicate elements in Django Paginate after `order_by` call

I'm using Django 1.7.7.
I'm wondering if anyone has experienced this. This is my query:
events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).filter(type__in=[...]).order_by('date')
When I try to then paginate it
p = Paginator(events, 10)
p.count  # Gives 91
event_ids = []
for i in xrange(1, p.count / 10 + 2):
    event_ids += [e.id for e in p.page(i)]
print len(event_ids)  # Still 91
print len(set(event_ids))  # 75
I noticed that if I remove the .order_by, I don't get any duplicates. I then tried just .order_by with Event.objects.all().order_by('date'), which gave no duplicates.
Finally, I tried this:
events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).order_by('date')
p = Paginator(events, 10)
events.count()  # Gives 131
p.count  # Gives 131
event_ids = []
for i in xrange(1, p.count / 10 + 2):
    event_ids += [e.id for e in p.page(i)]
len(event_ids)  # Gives 131
len(set(event_ids))  # Gives 118
... and there are duplicates. Can anyone explain what's going on?
I dug into the Django source (https://github.com/django/django/blob/master/django/core/paginator.py#L46-L55) and it seems to be something to do with how Django slices the object_list.
Any help is appreciated. Thanks.
Edit: distinct() has no effect on the duplicates. There aren't any duplicates in the database, and I don't think the query introduces any ([e for e in events.iterator()] doesn't produce any duplicates). It's just when the Paginator is slicing.
Edit2: Here's a more complete example
In [1]: from django.core.paginator import Paginator
In [2]: from datetime import datetime, timedelta
In [3]: my_date = timezone.now()
In [4]: events = Event.objects.filter(
   ...:     Q(date__gt=my_date) | Q(date__isnull=True)
   ...: ).order_by('date')
In [5]: events.count()
Out[5]: 134
In [6]: p = Paginator(events, 10)
In [7]: p.count
Out[7]: 134
In [8]: event_ids = []
In [9]: for i in xrange(1, p.num_pages + 1):
   ...:     event_ids += [j.id for j in p.page(i)]
In [10]: len(event_ids)
Out[10]: 134
In [11]: len(set(event_ids))
Out[11]: 115
A shot in the dark, but I think I might know what it is. I wasn't able to reproduce it in SQLite, but I could with MySQL. I think that when MySQL sorts on a column where many rows share the same value, it can return those rows in a different order for each slice.
The pagination slicing basically issues an SQL statement of:
SELECT ... FROM ... WHERE (date > D OR date IS NULL) ORDER BY date ASC LIMIT X OFFSET Y
But when date is NULL, I'm not sure how MySQL sorts it. When I tried the two queries LIMIT 10 and LIMIT 10 OFFSET 10, they returned sets that shared some rows, while a single LIMIT 20 produced a unique set.
You can try updating your order_by to order_by('id', 'date') so that it sorts by a unique field first; that may fix it.
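A minimal sketch of that fix, using the queryset and names from the question:
# Adding the unique primary key to order_by makes the ordering deterministic,
# so LIMIT/OFFSET slices no longer overlap.
events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).order_by('id', 'date')
p = Paginator(events, 10)
event_ids = [e.id for page in p.page_range for e in p.page(page)]
assert len(event_ids) == len(set(event_ids))  # no duplicates across pages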
Try to use .distinct() on your query before passing it to Paginator.

Django date string translation to english

I am parsing an Excel file with dates in it, and the date format changes throughout the document. One of the formats is '19 Mart 1912'; 'Mart' is the month name in Turkish.
I want to translate this string into English (using Django's builtin translations) as '19 March 1912'.
I tried:
# views.py
from dataparsers import *

def getEnglishDate(request):
    return translateDateTimeStr('19 Mart 1912')

# dataparsers.py
from django.utils import translation
from django.utils.translation import gettext as _  # needed for _() below

def translateDateTimeStr(datestr):
    translation.activate('en')
    translatedDateStr = _(datestr)
    translation.deactivate()
    return translatedDateStr
But nothing changes and I get the same string...
It appears Django's i18n tools have a hard time translating a string that contains number characters. I was able to work around that with the following method:
def translateDateTimeStr(s):
    t = []
    for word in s.split():
        translation.activate('de')
        t.append(_(word))
        translation.deactivate()
    return ' '.join(t)

print translateDateTimeStr('12 March 1912')
>>> 12 März 1912
# Does not work
def translateDateTimeStr(s):
    translation.activate('de')
    t = _(s)
    translation.deactivate()
    return t

print translateDateTimeStr('12 March 1912')
>>> 12 March 1912
You'll also want to make sure you have USE_I18N = True in your settings.py and that your LOCALE_PATHS setting is configured.
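For reference, a minimal sketch of those settings (the locale directory is an assumption; point LOCALE_PATHS at wherever your compiled .po/.mo translation files live):
# settings.py
import os

USE_I18N = True
LOCALE_PATHS = [
    os.path.join(BASE_DIR, 'locale'),  # assumed location of your translation files
]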

Unhashable type error with sklearn and importing a CSV

I'm trying to execute the code below and I don't understand what I'm doing wrong. The purpose of the code is to use sklearn's train_test_split function to partition the data into training and testing chunks.
The data (downloadable here) is cost of rent data for various houses/condos, along with each house/condo's properties. Ultimately I'm trying to use predictive modeling to predict rent prices (so rent prices are the target). Here's the code:
import pandas as pd
rentdata = pd.read_csv('6000_clean.csv')
import sklearn as sk
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cross_validation import train_test_split
# trying to make a = all rows of the first column and b = all rows of columns 2-46,
# i.e., a will be only the target data (rent prices) and b will be the feature data
a, b = rentdata[ : ,0], rentdata[ : ,1:46]
What results is the following error:
TypeError Traceback (most recent call last)
<ipython-input-24-789fb8e8c2f6> in <module>()
8 from sklearn.cross_validation import train_test_split
9
---> 10 a, b = rentdata[ : ,0], rentdata[ : ,1:46]
11
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
2001 # get column
2002 if self.columns.is_unique:
-> 2003 return self._get_item_cache(key)
2004
2005 # duplicate columns
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
665 return cache[item]
666 except Exception:
--> 667 values = self._data.get(item)
668 res = self._box_item_values(item, values)
669 cache[item] = res
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in get(self, item)
1653 def get(self, item):
1654 if self.items.is_unique:
-> 1655 _, block = self._find_block(item)
1656 return block.get(item)
1657 else:
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in _find_block(self, item)
1933
1934 def _find_block(self, item):
-> 1935 self._check_have(item)
1936 for i, block in enumerate(self.blocks):
1937 if item in block:
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\internals.pyc in _check_have(self, item)
1939
1940 def _check_have(self, item):
-> 1941 if item not in self.items:
1942 raise KeyError('no item named %s' % com.pprint_thing(item))
1943
C:\Users\Nick\Anaconda\lib\site-packages\pandas\core\index.pyc in __contains__(self, key)
317
318 def __contains__(self, key):
--> 319 hash(key)
320 # work around some kind of odd cython bug
321 try:
TypeError: unhashable type
You can download the CSV to get a look at the data here: http://wikisend.com/download/776790/6000_clean.csv
I downloaded your data and modified your problem line to this:
a, b = rentdata.iloc[0], rentdata.iloc[1:46]
iloc selects by integer position; see the docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position
This now selects the first row and rows 2-46 (remember that slicing is half-open: it includes the beginning of the range but not the end).
Note you can also select the first row using head (head(1), since head(0) would return an empty frame):
a, b = rentdata.head(1), rentdata.iloc[1:46]
would also work
In [5]:
a
Out[5]:
Monthly $ rent 1150
Location alameda
# of bedrooms 1
# of bathrooms 1
# of square feet NaN
Latitude 37.77054
Longitude -122.2509
Street address 1500-1598 Lincoln Lane
# more rows so trimmed for brevity here
.......
In [9]: b
Out[9]:
# too large to paste here
.....
45 rows × 46 columns
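For completeness, since the stated goal was a column-wise split (rent prices as the target, the remaining columns as features), here is a minimal sketch of that split; note that in current scikit-learn, train_test_split lives in sklearn.model_selection rather than sklearn.cross_validation:
import pandas as pd
from sklearn.model_selection import train_test_split

rentdata = pd.read_csv('6000_clean.csv')
a = rentdata.iloc[:, 0]     # target: all rows of the first column (rent prices)
b = rentdata.iloc[:, 1:46]  # features: all rows of columns 2-46
b_train, b_test, a_train, a_test = train_test_split(b, a, test_size=0.25)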

The queryset's `count` is wrong after `extra`

When I use extra in a certain way on a Django queryset (call it qs), the result of qs.count() is different from len(qs.all()). To reproduce:
Make an empty Django project and app, then add a trivial model:
class Baz(models.Model):
    pass
Now make a few objects:
>>> Baz(id=1).save()
>>> Baz(id=2).save()
>>> Baz(id=3).save()
>>> Baz(id=4).save()
Using the extra method to select only some of them produces the expected count:
>>> Baz.objects.extra(where=['id > 2']).count()
2
>>> Baz.objects.extra(where=['-id < -2']).count()
2
But add a select clause to the extra and refer to it in the where clause, and the count is suddenly wrong, even though the result of all() is correct:
>>> Baz.objects.extra(select={'negid': '0 - id'}, where=['"negid" < -2']).all()
[<Baz: Baz object>, <Baz: Baz object>] # As expected
>>> Baz.objects.extra(select={'negid': '0 - id'}, where=['"negid" < -2']).count()
0 # Should be 2
I think the problem has to do with django.db.models.sql.query.BaseQuery.get_count(). It checks whether the BaseQuery's select or aggregate_select attributes have been set; if so, it uses a subquery. But django.db.models.sql.query.BaseQuery.add_extra adds only to the BaseQuery's extra attribute, not select or aggregate_select.
How can I fix the problem? I know I could just use len(qs.all()), but it would be nice to be able to pass the extra'ed queryset to other parts of the code, and those parts may call count() without knowing that it's broken.
Redefining get_count() and monkeypatching appears to fix the problem:
def get_count(self):
    """
    Performs a COUNT() query using the current filter constraints.
    """
    obj = self.clone()
    if len(self.select) > 1 or self.aggregate_select or self.extra:
        # If a select clause exists, then the query has already started to
        # specify the columns that are to be returned.
        # In this case, we need to use a subquery to evaluate the count.
        from django.db.models.sql.subqueries import AggregateQuery
        subquery = obj
        subquery.clear_ordering(True)
        subquery.clear_limits()
        obj = AggregateQuery(obj.model, obj.connection)
        obj.add_subquery(subquery)
    obj.add_count_column()
    number = obj.get_aggregation()[None]
    # Apply offset and limit constraints manually, since using LIMIT/OFFSET
    # in SQL (in variants that provide them) doesn't change the COUNT
    # output.
    number = max(0, number - self.low_mark)
    if self.high_mark is not None:
        number = min(number, self.high_mark - self.low_mark)
    return number

django.db.models.sql.query.BaseQuery.get_count = quuux.get_count
Testing:
>>> Baz.objects.extra(select={'negid': '0 - id'}, where=['"negid" < -2']).count()
2
Updated to work with Django 1.2.1:
def basequery_get_count(self, using):
    """
    Performs a COUNT() query using the current filter constraints.
    """
    obj = self.clone()
    if len(self.select) > 1 or self.aggregate_select or self.extra:
        # If a select clause exists, then the query has already started to
        # specify the columns that are to be returned.
        # In this case, we need to use a subquery to evaluate the count.
        from django.db.models.sql.subqueries import AggregateQuery
        subquery = obj
        subquery.clear_ordering(True)
        subquery.clear_limits()
        obj = AggregateQuery(obj.model)
        obj.add_subquery(subquery, using=using)
    obj.add_count_column()
    number = obj.get_aggregation(using=using)[None]
    # Apply offset and limit constraints manually, since using LIMIT/OFFSET
    # in SQL (in variants that provide them) doesn't change the COUNT
    # output.
    number = max(0, number - self.low_mark)
    if self.high_mark is not None:
        number = min(number, self.high_mark - self.low_mark)
    return number

models.sql.query.Query.get_count = basequery_get_count
I'm not sure if this fix will have other unintended consequences, however.