Import CSV to PostgreSQL - Django

I am working on a Django-based web application.
I want to import a CSV file with over 100,000 lines into a PostgreSQL database and use it as the database for the Django application.
Here I have run into two problems.
The field names include special characters, like this:
%oil, %gas, up/down, CAPEX/Cash-flow, D&C Cape,...
First, how should I define the field names of the PostgreSQL database to import the CSV?
Second, after the import I am going to read the data through a Django model. How can I define Django model variable names that include special characters?
Of course it would be possible if I changed the CSV column names that include special characters, but I don't want to change them. I want to import the original CSV without any changes.
Is there any solution to this problem?

There are no special characters in your example, at least none that would be problematic from the Python or database point of view.
First off, avoid ambiguous field names, especially in finance. %oil can mean oil share, oil margin, or something else. Define a model with meaningful names, like:
class FinancialPerformanceData(models.Model):
    oil_share = models.DecimalField(max_digits=5, decimal_places=2)
    gas_share = models.DecimalField(max_digits=5, decimal_places=2)
    growth = models.DecimalField(max_digits=10, decimal_places=2)
    capex_to_cf = models.DecimalField(max_digits=7, decimal_places=2)
    # ... etc.
Then you use COPY to import the data from the CSV, as @Hambone suggested. You don't need headers in the CSV files.
from contextlib import closing

from django.db import connections
from django.http import HttpResponse

def import_csv(request):
    path = './path/to/file'
    with open(path, 'r') as csvfile:
        with closing(connections['database_name_from_settings'].cursor()) as cursor:
            cursor.copy_from(
                file=csvfile,
                table='yourapp_financialperformancedata',  # <-- table name from db
                sep='|',  # <-- delimiter
                columns=(
                    'oil_share',
                    'gas_share',
                    'growth',
                    'capex_to_cf',
                    # ... etc.
                ),
            )
    return HttpResponse('Done!')
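Note that copy_from is a psycopg2 cursor method rather than part of Django's public API, so this assumes a PostgreSQL backend; Django's cursor wrapper should pass the call through to the underlying psycopg2 cursor.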

Related

Django Model - special characters in field name

I'm creating a model for my app. Unfortunately, I'm working with measurement units like km/h, kg CO2/ton, and heat content (HHV): 30 different units in all. I don't know how to store them properly in a Django model, or maybe in a serializer, so that the proper unit name, including "/", " ", and "(", is displayed in REST responses. I will also be importing data through the django-import-export module, so it should recognize Excel columns named like the actual unit names.
For example:
class Units(models.Model):
    km_h = models.FloatField(default=-1, null=True)
    kg_co2ton = models.FloatField(default=-1, null=True)
and I would like to have this data available in the following form:
class Units(models.Model):
    km/h = models.FloatField(default=-1, null=True)
    kg co2/ton = models.FloatField(default=-1, null=True)
How to write model and/or serializer to make it work and look good?
For django-import-export, you can use the column_name argument of the Field class to make the declared column name match your Excel import spreadsheet:
from import_export import fields, resources

class UnitsResource(resources.ModelResource):
    km_h = fields.Field(attribute='km_h', column_name='km/h')
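A fuller sketch, assuming the Units model from the question; the Meta class and the second field here are illustrative additions, not from the original answer:

from import_export import fields, resources

from myapp.models import Units  # hypothetical app path

class UnitsResource(resources.ModelResource):
    km_h = fields.Field(attribute='km_h', column_name='km/h')
    kg_co2ton = fields.Field(attribute='kg_co2ton', column_name='kg co2/ton')

    class Meta:
        model = Units
        fields = ('km_h', 'kg_co2ton')

With this resource, an import spreadsheet whose header row reads km/h and kg co2/ton maps straight onto the km_h and kg_co2ton model fields.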

Using Django ArrayField to Store Dates and Value

I'm currently having a hard time wrapping my head around Django's Array Field. What i'm hoping to do is have an array that looks something like this:
Price(close=[
    [1/1/2018, 3.00],
    [1/2/2018, 1.00],
])
It's basically an array that stores a date followed by a corresponding value tied to that date. However, thus far my model looks like this:
class Price(models.Model):
    close = ArrayField(
        models.DecimalField(max_digits=10, decimal_places=4),
        size=365,
    )
I am not certain how to create an array with two different types of fields, one DateTime, the other decimal. Any help would be much appreciated.
You can't mix the types stored in an ArrayField [1]. I recommend you change the model schema instead (see database normalization [2]).
This is my suggestion:
from django.db import models

class Price(models.Model):
    pass

class PriceItem(models.Model):
    datetime = models.DateTimeField()
    amount = models.DecimalField(max_digits=10, decimal_places=4)
    price = models.ForeignKey(Price, on_delete=models.CASCADE)
[1] https://stackoverflow.com/a/8168017/752142
[2] https://en.wikipedia.org/wiki/Database_normalization
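A minimal usage sketch for this normalized layout (the dates and amounts come from the question; the rest is standard ORM usage):

from datetime import datetime, timezone
from decimal import Decimal

price = Price.objects.create()
PriceItem.objects.create(
    price=price,
    datetime=datetime(2018, 1, 1, tzinfo=timezone.utc),
    amount=Decimal('3.00'),
)
PriceItem.objects.create(
    price=price,
    datetime=datetime(2018, 1, 2, tzinfo=timezone.utc),
    amount=Decimal('1.00'),
)

# All (date, value) pairs for one Price, in chronological order:
pairs = price.priceitem_set.order_by('datetime').values_list('datetime', 'amount')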
It depends on how important this is to the model.
PostgreSQL provides composite types, and psycopg2 (Django's PostgreSQL driver) supports them.
Define this type in PostgreSQL:
CREATE TYPE date_price AS (
    start date,
    price float8
);
and use the methods described here to implement a CompositeField:
from django.db import connection, models
from django.contrib.postgres.fields import ArrayField
from psycopg2.extras import register_composite

# register the composite type with psycopg2
register_composite('date_price', connection.cursor().cursor)

# CompositeField implementation here . . . . . .

class DatePriceField(CompositeField):
    '''
    DatePriceField specifics
    '''
    pass

class Price(models.Model):
    close = ArrayField(base_field=DatePriceField(), size=365)
I am going to follow this route and update soon.

Bin a queryset using Django?

Let's say we have the following simplistic models:
class Category(models.Model):
    name = models.CharField(max_length=264)

    def __str__(self):
        return self.name

    class Meta:
        verbose_name_plural = "categories"

class Status(models.Model):
    name = models.CharField(max_length=264)

    def __str__(self):
        return self.name

    class Meta:
        verbose_name_plural = "status"

class Product(models.Model):
    title = models.CharField(max_length=264)
    description = models.CharField(max_length=264)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    price = models.DecimalField(max_digits=10, decimal_places=2)  # decimal_places is required alongside max_digits
    status = models.ForeignKey(Status, on_delete=models.CASCADE)
My aim is to get some statistics, like total products, total sales, average sales, etc., based on which price bin each product belongs to.
So the price bins could be something like 0-100, 100-500, 500-1000, etc.
I know how to do something like that with pandas:
Binning column with python pandas
I am searching for a way to do this with the Django ORM.
One of my thoughts is to convert the queryset into a list, apply a function to get the appropriate price bin for each product, and then compute the statistics.
Another thought, which I am not sure how to implement, is the same as above but applying the bin function only to the queryset field I am interested in.
There are three pathways I can see.
First is composing the SQL you want to use directly and sending it to your database through your model's manager class: .objects.raw("[sql goes here]"). This answer shows how to define groups with a simple function on the values; something like that could work:
SELECT FLOOR(grade/5.00)*5 AS grade,
       COUNT(*) AS grade_count
FROM TableName
GROUP BY FLOOR(grade/5.00)*5
ORDER BY 1
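A sketch of how that grouping might be run from Django, using a raw cursor because aggregate-only queries do not map cleanly onto .objects.raw(); the table name and bin width are assumptions:

from django.db import connection

def price_bin_counts(bin_width=100):
    # Count products per price bin directly in the database.
    with connection.cursor() as cursor:
        cursor.execute(
            """
            SELECT FLOOR(price / %s) * %s AS bin_start,
                   COUNT(*) AS product_count
            FROM myapp_product  -- hypothetical table name
            GROUP BY 1
            ORDER BY 1
            """,
            [bin_width, bin_width],
        )
        return cursor.fetchall()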
Second, there is no reason you can't move the queryset (with .values() or .values_list()) into a pandas dataframe or similar and then bin it, as you mentioned. There is probably some efficiency loss in getting the queryset into a dataframe and then processing it, but I am not sure it would always be bad. If it's easier to compose and maintain, that might be fine.
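A minimal sketch of that pandas route (the bin edges are assumptions):

import pandas as pd

# Pull only the column we need out of the ORM.
df = pd.DataFrame(list(Product.objects.values('price')))
df['price'] = df['price'].astype(float)  # Decimal -> float for numeric ops

# Bin prices and compute per-bin statistics.
bins = [0, 100, 500, 1000, float('inf')]
df['price_bin'] = pd.cut(df['price'], bins=bins)
print(df.groupby('price_bin')['price'].agg(['count', 'mean']))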
The third way I would try (which I think is what you really want) is chaining .annotate() to label each row with the bin it belongs in, plus the Count aggregate to count how many rows fall in each bin. This is more advanced ORM work than I've done, but I'd start with the docs section on conditional aggregation. I've adapted that slightly to create the price_class column first, with annotate:
from django.db.models import Count, F, Q
from django.db.models.functions import Floor

Product.objects.annotate(price_class=Floor(F('price') / 100)).aggregate(
    class_zero=Count('pk', filter=Q(price_class=0)),
    class_one=Count('pk', filter=Q(price_class=1)),
    class_two=Count('pk', filter=Q(price_class=2)),  # etc. etc.
)
I'm not sure that Floor will work on every backend, and you may need ExpressionWrapper to push price_class into the right type of output_field. All the best.
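A variant that avoids enumerating every bin by hand, grouping on the annotated value instead; a sketch, assuming Django 2.0+ for the Floor database function:

from django.db.models import Count, F
from django.db.models.functions import Floor

bin_counts = (
    Product.objects
    .annotate(price_class=Floor(F('price') / 100))
    .values('price_class')        # group by bin
    .annotate(total=Count('pk'))  # count per bin
    .order_by('price_class')
)
# e.g. [{'price_class': 0, 'total': 12}, {'price_class': 1, 'total': 7}, ...]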

Error Importing .csv file into pgsql

I am attempting to upload data into my Postgres database using an Excel file that I have converted into a .csv file. My .csv file is a simple test file containing only one row of data, all of whose cells are formatted as text, with titles that match the columns in my data model.
The data model I am attempting to upload data to looks like:
class Publication(models.Model):
    title = models.CharField(max_length=200)
    journalists = models.ManyToManyField(Journalist, blank=True)
    email = models.EmailField(blank=True)
    tags = models.ManyToManyField(Tag, blank=True, related_name="publications")
    url = models.URLField(blank=True)
    notes = models.CharField(max_length=500, blank=True)
    image_url = models.URLField(blank=True)
    media_kit_url = models.URLField(blank=True)
When I go into psql and enter the command:
\copy apricot_app_publication from '~/Desktop/sampleDBPubs.csv';
I get back the following error:
ERROR: invalid input syntax for integer: "title,url,email,media_kit_url,notes,tags,image_url,journalists"
CONTEXT: COPY apricot_app_publication, line 1, column id: "title,url,email,media_kit_url,notes,tags,image_url,journalists"
I looked at this question, Importing csv file into pgsql, which addresses the same issue. The answer there was that the error means "you're trying to input something into an integer field which is not an integer...", but my data model does not have any integer fields, so I do not know how to solve the issue.
Can anyone suggest what might be causing the issue?
I just answered my own question. Django automatically generates an integer id column behind the scenes for every model, so the database expects an id value from the .csv file, but my .csv file does not have an id column and I do not want to add one, as I want the ids to continue to be auto-generated.
To get around this, I just have to specify which columns my file provides data for, in parentheses after the table name:
EX:
\copy apricot_app_tag(title) FROM '~/Desktop/Sample_Database_Files/tags.csv' with csv header
Where 'title' is the only column in the tag table I want to update.
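For the Publication model above, the equivalent command would presumably look like the following; note that journalists and tags are many-to-many fields stored in separate join tables, not as columns on apricot_app_publication, so they cannot appear in the column list and would have to be dropped from the CSV first:
\copy apricot_app_publication(title,url,email,media_kit_url,notes,image_url) from '~/Desktop/sampleDBPubs.csv' with csv header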

Flexible field list names in django models class

Instead of dynamically altering a models file by adding fields (very bad, I've been told), I'm supposed to maintain a kind of flexibility by having variable field list names (I think).
Thus, when an attribute is added to the database, this attribute can be accessed without the models file being altered.
I can't figure out how to create variable field list names in my models class, though.
I'm having trouble sifting through reading materials to find a solution to my problem, and trial and error is at 15 hours and counting.
Could someone point me in the right direction?
New Edit
Here's what I'm trying to achieve.
When an attribute is added, I add it to the table like this:
c = 'newattributename'
conn = mdb.connect('localhost', 'jamie', '########', 'website')
cursor = conn.cursor()
# 'a' is an existing column name defined elsewhere
cursor.execute(
    "alter table mysite_weightsprofile add column %s integer not null; "
    "SET @rank=0; "
    "UPDATE mysite_weightsprofile SET %s = @rank:=@rank+1 order by %s DESC;" % (c, c, a)
)
cursor.close()
conn.close()
Now, in my models class I have:
class WeightsProfile(models.Model):
    1attributes = models.IntegerField()
    2attributes = models.IntegerField()
    3attributes = models.IntegerField()

class UserProfile(WeightsProfile):
    user = models.ForeignKey(User, unique=True)
    aattributes = models.CharField()
    battributes = models.CharField()
    cattributes = models.CharField()
Now all I want is to get access to the new attribute that was added to the table but not to the models file.
Does sberry2A have the right answer? I hope so; it seems the simplest.
I might not be following what you are asking, but assuming you have some model, like Person, which starts out with some defined fields but may have several more added in the future...
class Person(models.Model):
    fname = models.CharField(max_length=255)
    lname = models.CharField(max_length=255)
    age = models.IntegerField()
    # more fields to come
Then you could use a PersonAttribute model...
class PersonAttribute(models.Model):
    name = models.CharField(max_length=32)
    value = models.CharField(max_length=255)
Then you could just add a ManyToMany relationship field to your Person...
    attributes = models.ManyToManyField(PersonAttribute)
Or something similar.
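A quick usage sketch of this attribute-bag pattern (the attribute name is invented for illustration):

person = Person.objects.create(fname='Jamie', lname='Smith', age=30)
person.attributes.create(name='newattributename', value='42')

# Read a dynamic attribute back without touching the models file:
value = person.attributes.get(name='newattributename').value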
I don't really understand what it is you're trying to do, but South is a good system for handling changes to models. It generates migrations, so it understands the changes you've made and knows how to apply them to the database, in a way you can use for both development and production sites.
I don't understand what you're after either, JT, but I really doubt South (see @Dougal) is going to help you if what you want boils down to "look at the relevant DB table to know what fields the model should have at read time, but not at write time". South is brilliant for evolving schemas/models, but not at runtime, and not inconsistently across rows/instances of models. And hacking models at runtime is definitely a world of hurt.
Indeed, Django's ORM isn't built for dynamic fields (at least for now); it was built to abstract away writing SQL and speed up development against an RDBMS, not for schemaless/NoSQL work.
Speaking of which, if someone handed me a spec that effectively said "we don't know what fields the model will have to store", I'd suggest trying MongoDB for that data (alongside Postgres for traditional relational data), probably via MongoEngine.
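A minimal sketch of what that looks like in MongoEngine, whose DynamicDocument persists fields that were never declared on the class (the names here are invented):

from mongoengine import DynamicDocument, StringField, connect

connect('website')  # hypothetical database name

class WeightsProfile(DynamicDocument):
    user = StringField(required=True)

profile = WeightsProfile(user='jamie')
profile.newattributename = 42  # never declared on the class
profile.save()

# Dynamic fields are queryable like regular ones:
WeightsProfile.objects(newattributename=42).first()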