Severe memory leak with Django - django

I am facing a severe memory leak on a server running a Django (1.8) app behind Apache or Nginx (the issue happens with both).
When I visit certain pages (for example, the request below), the server's RAM usage climbs to 16 GB within a few seconds (from a single request) and the server freezes.
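from datetime import timedelta

from django.shortcuts import render
from django.utils import timezone

from .models import Records  # assumed location of the Records model


def records(request):
    """Return the page listing the records of the last 14 days."""
    values = []
    time = timezone.now() - timedelta(days=14)
    record = Records.objects.filter(time__gte=time)
    return render(request,
                  'record_app/records_newests.html',
                  {
                      'active_nav_tab': ["active", "", "", ""],
                      'record': record,
                  })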
When I git checkout an older version, from before the problem appeared, the issue persists.
I ran a memory check with Guppy on the faulty request; here is the result:
>>> hp.heap()
Partition of a set of 7042 objects. Total size = 8588675016 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 1107 16 8587374512 100 8587374512 100 unicode
1 1014 14 258256 0 8587632768 100 django.utils.safestring.SafeText
2 45 1 150840 0 8587783608 100 dict of 0x390f0c0
3 281 4 78680 0 8587862288 100 dict of django.db.models.base.ModelState
4 326 5 75824 0 8587938112 100 list
5 47 1 49256 0 8587987368 100 dict of 0x38caad0
6 47 1 49256 0 8588036624 100 dict of 0x39ae590
7 46 1 48208 0 8588084832 100 dict of 0x3858ab0
8 46 1 48208 0 8588133040 100 dict of 0x38b8450
9 46 1 48208 0 8588181248 100 dict of 0x3973fe0
<164 more rows. Type e.g. '_.more' to view.>

After a day of searching I found my answer.
While investigating, I checked the statistics on my DB and saw that one table was 800 MB in size but had only 900 rows. This table contains a TextField with no maximum length. Somehow one text field had a huge amount of data inserted into it, and this row was slowing down every page that uses this model.
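The kind of check that exposes such a row can be sketched as a query that ranks rows by the size of their text column. This is only an illustration using Python's built-in sqlite3 module; the table name `records` and the column name `body` are hypothetical stand-ins for the real model's table and TextField, and other databases may need a different length function.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for the real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO records (body) VALUES (?)",
    [("short note",), ("x" * 1000000,), ("another note",)],
)

# Rank rows by the length of the text column; an outlier stands out at the top
largest = conn.execute(
    "SELECT id, length(body) AS size FROM records ORDER BY size DESC LIMIT 5"
).fetchall()

for row_id, size in largest:
    print(row_id, size)
```

Once the oversized row is found, it can be cleaned up, or the field can be left out of list pages with QuerySet.defer() so it is not loaded for every row.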

Related

Power BI Report with dynamic columns according to slicer selection

I have a table with the columns shown below.
Each row shows the allocation of a person to a project. The month columns can extend to Dec,20 and continue from Jan,21 onward in the same pattern.
One staff member can be tagged to any number of projects in a given month.
Now I want to prepare a Power BI report on this in the format shown below.
Staff ID, Project ID and End Date are the slicers to be present.
For the End Date slicer, options are selected in the format MMM,YY (e.g. Jan,23). Based on this slicer, I want to show the preceding 6 months of data, as portrayed by the sample image above.
I have tried using parameters, but those have to be specified for each combination, so they are not usable for this data, which is going to grow over time.
Is there any way to do this, or am I missing something simple?
Any help on this will be highly appreciated.
Sample data:
Staff ID  Project ID  Jan,20  Feb,20  Mar,20  Apr,20  May,20  Jun,20  Jul,20
1         20          0       0       0       100     80      10      0
1         30          0       0       0       0       20      90      100
2         20          100     100     100     0       0       0       0
3         50          80      100     0       0       0       0       0
3         60          15      0       0       0       20      0       0
3         70          5       0       100     100     80      0       0

Power BI - get the graph out of the data set

It seems very simple, but I cannot get the graph to show the data I want.
I have a lot of IDs, each with a duration between its start and end dates (LENGTH) and a number of open items (OPEN). Each day has an availability (AVAIL), and nothing is used (USED) on day 1.
ID LENGTH OPEN USED AVAIL
1A 6 100 0 2400
I need to create a NEW_DAY column that counts from 1 up to LENGTH. In this case the result would be:
ID LENGTH NEW_DAY OPEN USED AVAIL
1A 6 1 100 0 2400
1A 6 2 100 0 2400
1A 6 3 100 0 2400
1A 6 4 100 0 2400
1A 6 5 100 0 2400
1A 6 6 100 0 2400
Note that I have hundreds of IDs, so I cannot hard-code 1A; the solution needs to be dynamic.
I am not sure, but maybe this will help you.
If you add a blank query and enter this expression:
= List.Repeat({1, 2}, 3)
you get the list in the first argument, {1, 2}, repeated three times: {1, 2, 1, 2, 1, 2}.
If you put your ID in a new column and pass that column as the first argument (and do the same for the second argument, the repeat count), it might work.
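For comparison, the same row expansion can be sketched outside Power Query in Python with pandas. This is not the Power Query approach described above, only an illustration of the intended result; the second ID, `2B`, and its values are made up to show that nothing is hard-coded per ID.

```python
import pandas as pd

# Sample rows; "1A" comes from the question, "2B" is a hypothetical extra ID
df = pd.DataFrame({
    "ID": ["1A", "2B"],
    "LENGTH": [6, 2],
    "OPEN": [100, 50],
    "USED": [0, 0],
    "AVAIL": [2400, 1200],
})

# Repeat each row LENGTH times by repeating its index label
expanded = df.loc[df.index.repeat(df["LENGTH"])].copy()

# Number the copies of each original row 1..LENGTH
expanded["NEW_DAY"] = expanded.groupby(level=0).cumcount() + 1
expanded = expanded.reset_index(drop=True)

print(expanded[["ID", "LENGTH", "NEW_DAY", "OPEN", "USED", "AVAIL"]])
```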

Scikit-learn labelencoder: how to preserve mappings between batches?

I have 185 million samples, at about 3.8 MB per sample. To prepare my dataset, I need to one-hot encode many of the features, after which I end up with over 15,000 features.
But I need to prepare the dataset in batches, since the memory footprint exceeds 100 GB for the features alone when one-hot encoding only 3 million samples.
The question is how to preserve the encodings/mappings/labels between batches?
The batches are not going to have all the levels of a category necessarily. That is, batch #1 may have: Paris, Tokyo, Rome.
Batch #2 may have Paris, London.
But in the end I need to have Paris, Tokyo, Rome, London all mapped to one encoding all at once.
Assuming that I can not determine the levels of my Cities column of 185 million all at once since it won't fit in RAM, what should I do?
If I apply the same LabelEncoder instance to different batches, will the mappings remain the same?
I will also need to one-hot encode in batches after this, with either scikit-learn or Keras' np_utils.to_categorical. So the same question applies: how can I use those three methods in batches, or apply them all at once to a file format stored on disk?
I suggest using Pandas' get_dummies() for this, since sklearn's OneHotEncoder() needs to see all possible categorical values at .fit() time; otherwise, by default, it throws an error when it encounters a new value during .transform().
import pandas as pd

# Create a toy dataset and split it into batches
data_column = pd.Series(['Paris', 'Tokyo', 'Rome', 'London', 'Chicago', 'Paris'])
batch_1 = data_column[:3]
batch_2 = data_column[3:]
# Convert categorical feature column to matrix of dummy variables
batch_1_encoded = pd.get_dummies(batch_1, prefix='City')
batch_2_encoded = pd.get_dummies(batch_2, prefix='City')
# Row-bind (append) Encoded Data Back Together
final_encoded = pd.concat([batch_1_encoded, batch_2_encoded], axis=0)
# Final wrap-up. Replace nans with 0, and convert flags from float to int
final_encoded = final_encoded.fillna(0)
final_encoded[final_encoded.columns] = final_encoded[final_encoded.columns].astype(int)
final_encoded
Output:
   City_Chicago  City_London  City_Paris  City_Rome  City_Tokyo
0             0            0           1          0           0
1             0            0           0          0           1
2             0            0           0          1           0
3             0            1           0          0           0
4             1            0           0          0           0
5             0            0           1          0           0
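Concatenating the encoded batches works when they all fit in memory together. A possible variation, assuming the distinct levels can be collected in a cheap first pass over the batches (only the unique values are kept, not the rows), is to give every batch the same categorical dtype so that get_dummies() emits identical columns for each batch on its own. The two toy batches below are the ones from the question; this is a sketch, not a tested recipe for 185 million rows.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Toy batches from the question; in practice these would be read from disk
batches = [
    pd.Series(["Paris", "Tokyo", "Rome"]),
    pd.Series(["Paris", "London"]),
]

# First pass: collect every level seen in any batch
levels = set()
for batch in batches:
    levels.update(batch.unique())
city_dtype = CategoricalDtype(categories=sorted(levels))

# Second pass: encode each batch independently with the shared dtype,
# so unobserved levels still get an all-zero column
encoded = [
    pd.get_dummies(batch.astype(city_dtype), prefix="City", dtype=int)
    for batch in batches
]

for frame in encoded:
    print(frame)
```

Because the column set and order are fixed by the shared dtype, each encoded batch can be written out or fed to a model without ever holding the others in memory.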

Data frames pandas python

I have a data frame that looks like this:
id age sallary
1 16 500
2 21 1000
3 25 3000
4 30 6000
5 40 25000
and a list of ids that I would like to ignore: [1,3,5].
How can I get a data frame that contains all the remaining rows (ids 2 and 4)?
Big thanks to everyone.
Call isin and negate the result using ~:
In [42]:
ignore_ids=[1,3,5]
df[~df.id.isin(ignore_ids)]
Out[42]:
id age sallary
1 2 21 1000
3 4 30 6000
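For reference, a self-contained version of the same filter that can be run as a script; the frame is rebuilt from the table in the question, including its `sallary` column name.

```python
import pandas as pd

# Data frame rebuilt from the table in the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "age": [16, 21, 25, 30, 40],
    "sallary": [500, 1000, 3000, 6000, 25000],
})

ignore_ids = [1, 3, 5]

# isin() marks the rows to ignore; ~ inverts the boolean mask
remaining = df[~df["id"].isin(ignore_ids)]
print(remaining)
```

Bracket access (`df["id"]`) is used here instead of `df.id` only to avoid any clash with DataFrame attribute names; both select the same column.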

Django query aggregation

Imagine a number guessing game where one person thinks of a number and another person has to guess it. The game is over if the correct number was guessed.
The models might look like this
from django.db import models

class SecretNumber(models.Model):
    number = models.IntegerField()

class Guess(models.Model):
    # Django 2.0+ additionally requires an on_delete argument here
    secretnumber = models.ForeignKey(SecretNumber)
    guess = models.IntegerField()
After having played four times, the database might look like this:
id number
==========
1 10
2 54
3 68
4 25
id secretnumber_id guess
=============================
1 1 50
2 1 30
3 1 10
4 2 99
5 2 60
6 2 54
7 3 1
8 3 68
9 4 73
10 4 34
11 4 86
12 4 51
13 4 25
As you can see, the guesser was very lucky: it took him 3, 3, 2 and 5 guesses. But that's just to keep this example short.
Now I need to come up with a query that lets me display the following data:
Nb. guesses   Count
=====================
2             1
3             2
5             1
A manual SQL statement would look something like this:
SELECT inner_count AS 'Nb. guesses', count(inner_count) AS 'Count' FROM (
    SELECT secretnumber_id, count(id) AS inner_count FROM guess GROUP BY secretnumber_id
) AS per_secretnumber GROUP BY inner_count
I thought about annotating an annotation, but this seems not to be possible.
Any ideas?
If you're using Django models, you can use the QuerySet aggregation functions,
e.g.
from django.db.models import Count
guesses = Guess.objects.values('secretnumber').annotate(Count('secretnumber'))
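This gives you a queryset of dicts, each with a secretnumber key and a count under the default key secretnumber__count. It covers the inner GROUP BY only; the counts still have to be grouped a second time to get the table you want.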
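The queryset above yields one count per secret number, which corresponds to the inner query of the SQL in the question. One way to get the outer grouping is to tally those counts in Python. In the sketch below, plain dicts stand in for the queryset rows so that it runs without a database; the key `secretnumber__count` is the default name Django gives to the `Count('secretnumber')` annotation, and the counts are taken from the guess table in the question, where secret number 4 has five rows.

```python
from collections import Counter

# Stand-in for:
#   Guess.objects.values('secretnumber').annotate(Count('secretnumber'))
rows = [
    {"secretnumber": 1, "secretnumber__count": 3},
    {"secretnumber": 2, "secretnumber__count": 3},
    {"secretnumber": 3, "secretnumber__count": 2},
    {"secretnumber": 4, "secretnumber__count": 5},
]

# Second-level grouping: how many secret numbers needed N guesses
histogram = Counter(row["secretnumber__count"] for row in rows)

for nb_guesses, count in sorted(histogram.items()):
    print(nb_guesses, count)
```

This does the second grouping in application code rather than in the database, which is fine for a small number of games but moves more rows over the wire than the nested SQL would.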