Error Importing .csv file into pgsql - django

I am attempting to upload data into my Postgres database using an Excel file that I have converted into a .csv file. The .csv file is a simple test file: it contains only one row of data, every cell is formatted as text, and the column titles match the fields in my data model.
The data model I am attempting to upload data to looks like:
from django.db import models

# Journalist and Tag are models defined elsewhere in the app.
class Publication(models.Model):
    title = models.CharField(max_length=200)
    journalists = models.ManyToManyField(Journalist, blank=True)
    email = models.EmailField(blank=True)
    tags = models.ManyToManyField(Tag, blank=True, related_name="publications")
    url = models.URLField(blank=True)
    notes = models.CharField(max_length=500, blank=True)
    image_url = models.URLField(blank=True)
    media_kit_url = models.URLField(blank=True)
When I go into psql and enter the command:
\copy apricot_app_publication from '~/Desktop/sampleDBPubs.csv';
I get back the following error:
ERROR: invalid input syntax for integer: "title,url,email,media_kit_url,notes,tags,image_url,journalists"
CONTEXT: COPY apricot_app_publication, line 1, column id: "title,url,email,media_kit_url,notes,tags,image_url,journalists"
I looked at the question "Importing csv file into pgsql", which addresses the same issue, but the answer given there was that the error means "you're trying to input something into an integer field which is not an integer...". My data model does not have any integer fields, so I do not know how to solve the issue.
Can anyone suggest what might be causing the issue?

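I just answered my own question. Django automatically adds an auto-incrementing integer id column (the primary key) to every model that doesn't declare its own primary key, so the table has an id column even though my model doesn't list one. The database therefore expects the first value on each line of the .csv file to be an integer id, but my file has no id column, and I don't want to add one because I want the ids to continue to be auto-generated. (Without the csv and header options, \copy also treats the file as tab-delimited text, so the entire header line was read as the value of that id column, which is exactly what the error message shows.) The many-to-many fields (journalists, tags) are not columns of this table at all; Django stores them in separate join tables, so they have to be loaded separately.
To get around this, I just have to specify which columns my file provides data for, in parentheses after the table name: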
EX:
\copy apricot_app_tag(title) FROM '~/Desktop/Sample_Database_Files/tags.csv' with csv header
Where 'title' is the only column in the tag table I want to update.
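If you would rather run the same import from Python, here is a minimal sketch, assuming psycopg2 (whose cursors have a copy_expert(sql, file) method) and the CSV header from the question. It drops the many-to-many columns, which have no column in apricot_app_publication because Django keeps them in join tables, and lists the remaining columns explicitly so that id is filled in by its sequence. filter_csv and load_publications are hypothetical names.

```python
import csv
import io

# Many-to-many fields live in separate join tables, not in columns of
# apricot_app_publication, so they are dropped before running COPY.
M2M_FIELDS = {"journalists", "tags"}


def filter_csv(src, skip=M2M_FIELDS):
    """Return (columns, buffer): src rewritten without the skipped columns."""
    reader = csv.DictReader(src)
    columns = [name for name in reader.fieldnames if name not in skip]
    out = io.StringIO()
    # QUOTE_ALL writes empty values as "", which PostgreSQL's CSV format reads
    # as an empty string rather than NULL (these CharFields are NOT NULL).
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    out.seek(0)
    return columns, out


def load_publications(cursor, src, table="apricot_app_publication"):
    """COPY the filtered rows; "id" is not listed, so its sequence fills it."""
    columns, buffer = filter_csv(src)
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv, HEADER true)".format(
        table, ", ".join(columns))
    cursor.copy_expert(sql, buffer)
    return sql
```

From a Django shell you could pass connection.cursor() from django.db, since Django's cursor wrapper forwards copy_expert to the psycopg2 cursor; psycopg 3 uses a different COPY API (cursor.copy()).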

Related

ReportLab get_FOO_display() Not Working On A List

I've created a form which users fill out; it then uses ReportLab to create a PDF of their answers.
It works well except for a CharField (preferred_topics) which contains a list. The data is saved like this:
['ANI', 'EDU', 'ENV']
I think that might be a problem, as I'd expect it to save the data like this:
[['ANI'], ['EDU'], ['ENV']]
However, it works fine on the website.
To print human-readable data to the PDF I'm using get_FOO_display(), but this doesn't work for preferred_topics. If I call user.personalinformation.get_preferred_topics_display() I get:
AttributeError at /enrolment/final_question/
'PersonalInformation' object has no attribute 'get_preferred_topics_display'
Here is my other relevant code:
model.py
preferred_topics = models.CharField(max_length=200, default='')
utils.py
# generate pdf
def generate_pdf(request):
    # get user
    user = request.user
    # data that will be printed to the pdf
    page_contents = [
        ['Personal Information'],
        ['Name:', '%s %s' % (user.personalinformation.first_name, user.personalinformation.surname)],
        ['E-mail:', '%s' % (user.email)],
        ['Gender:', '%s' % (user.personalinformation.get_gender_display())],
        # this field is causing grief
        ['Preferred Topics:', '%s' % (user.personalinformation.preferred_topics)]
    ]
forms.py
TOPICS = (
    ('ANI', 'Animals'),
    ('ART', 'Art'),
    ('COM', 'Communication'),
    ('CRI', 'Crime'),
)
preferred_topics = forms.MultipleChoiceField(choices=TOPICS, required=False, widget=forms.CheckboxSelectMultiple())
I'm expecting to be told that the data is being saved wrongly in my db, but I don't know how to change it, and I wanted confirmation before I started changing previously working code, as I'm sure I will break currently working things in the process.
SUMMARY: I want to use user.personalinformation.get_preferred_topics_display(), but it's not working, and I suspect it's because the data is being saved wrongly in the db. I would like confirmation before I go wrecking stuff.
Thank you.
You are saving multiple choices as a single string, which is not a good idea: you would have a hard time filtering and working with this kind of data (use an ArrayField of choices instead).
There is no get_FOO_display() unless the model field itself has choices, so you would need to write your own converter:
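import ast

# create a dict mapping each stored code to its label
options = dict((y, x) for y, x in PersonalInformationForm.TOPICS)
# evaluate the stored string (e.g. "['ANI', 'EDU']") to a list; '' is the field default
selected_choices = ast.literal_eval(user.personalinformation.preferred_topics or '[]')
# find each selected code's label in the dict
selected_values = [options.get(key) for key in selected_choices]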
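The converter can be wrapped in a small function that plays the role of the missing get_preferred_topics_display() and returns one string ready for the ReportLab table. This is a sketch: preferred_topics_display and TOPIC_LABELS are hypothetical names, TOPICS is copied from the forms.py snippet (the real tuple may hold more entries, such as EDU and ENV), and unknown codes are shown as-is rather than as None.

```python
import ast

# Choices copied from forms.py in the question; the real tuple may be longer.
TOPICS = (
    ("ANI", "Animals"),
    ("ART", "Art"),
    ("COM", "Communication"),
    ("CRI", "Crime"),
)
TOPIC_LABELS = dict(TOPICS)


def preferred_topics_display(stored):
    """Hypothetical stand-in for get_preferred_topics_display().

    stored is the CharField value, e.g. "['ANI', 'ART']" or "" (the default).
    Codes missing from TOPICS are shown unchanged instead of as None.
    """
    codes = ast.literal_eval(stored) if stored else []
    return ", ".join(TOPIC_LABELS.get(code, code) for code in codes)
```

In generate_pdf, the last row would then become ['Preferred Topics:', preferred_topics_display(user.personalinformation.preferred_topics)].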

Effective way to use CSV of DNA information as database in Django project

I am thinking about the design of the following Django project.
In this project, I have a CSV file (4 columns, 500 rows) which I am not sure how to handle as the database.
The CSV contains 500 codes, where each code has 3 scores: f1, f2, f3.
The website's goals are:
1. To get the user's input on which feature columns they are interested in, and in which order, e.g. 2Xf2 1Xf1 (there are only 3 feature columns, f1, f2 and f3, plus the 'code' column).
2. To generate an output containing the highest-ranking codes for the requested features, in the requested order.
So for our input 2Xf2 1Xf1, the output will be the following string:
[#1 ranking code in the f2 column] [#2 ranking code in the f2 column] [#1 ranking code in the f1 column]
I was thinking about creating a database with 3 columns: f1, f2, f3 where in each column there are codes in descending order, so if the user wants 5 codes from f1 I will take the first 5.
My question is:
How to handle the database in a simple way for developing and maintaining it (not looking for efficiency) that will use Django tools properly?
My first direction was to use MySQL and Django models to map the data.
I would appreciate any thoughts or tips for learning Django, as I am using the official documentation, whose tutorial builds a "polls" website, which is not what I need.
Thanks!
Here's how I'd design the models as I understand your problem.
from django.db import models


class Field(models.Model):
    # This model represents each of the individual fX fields, so f1, f2, f3
    name = models.CharField(max_length=15, unique=True)


class Code(models.Model):
    # This model represents the values in the first column, code.
    name = models.CharField(max_length=255, unique=True)


class FieldData(models.Model):
    code = models.ForeignKey(Code, on_delete=models.CASCADE)
    field = models.ForeignKey(Field, on_delete=models.CASCADE)
    value = models.IntegerField()

    class Meta:
        unique_together = [('code', 'field')]
Then when you process a CSV, you'd:
Read the header row and use Field.objects.get_or_create for each of the columns following the code column.
Read the body to create a Code instance and FieldData instances for each field column in the row.
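The two steps above can be sketched without Django using the csv module. This minimal sketch assumes the layout from the question (a code column followed by integer score columns); parse_scores is a hypothetical name. In the Django version, each header name would go through Field.objects.get_or_create(name=...), which returns an (object, created) tuple, and each (code, field, value) tuple would become one FieldData row.

```python
import csv
import io


def parse_scores(csvfile):
    """Split a code,f1,f2,f3 CSV into field names and (code, field, value) rows.

    Step 1: the header row gives the Field names (every column after "code").
    Step 2: each body row gives one Code plus one value per Field.
    """
    reader = csv.reader(csvfile)
    header = next(reader)
    field_names = header[1:]
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        code, values = record[0], record[1:]
        for field, value in zip(field_names, values):
            rows.append((code, field, int(value)))
    return field_names, rows
```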
The model design would have to change if you need history for the FieldData instances.
Thank you for your answer!
Using this model might be a great way to map my csv data!
I have 3 questions:
1) What is an easy way to actually parse the CSV to go row by row and add those objects?
2) Why wouldn't the following model be simpler (knowing that the fields will not change):
class Code(models.Model):
    code = models.CharField(max_length=30, unique=True)
    f1_score = models.IntegerField()
    f2_score = models.IntegerField()
    f3_score = models.IntegerField()
3) How would you extract the highest-ranking codes from each field each time? e.g. the 5 best-ranking codes on f1
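For questions 2 and 3, with the flat Code model the ranking is one ordered query per field, e.g. Code.objects.order_by("-f1_score").values_list("code", flat=True)[:5] for the 5 best codes on f1. The sketch below does the same in plain Python on the dicts that Code.objects.values() would return, and also parses a request such as 2Xf2 1Xf1. top_codes and the request pattern are assumptions based on the question's example.

```python
import re

# Matches request tokens such as "2Xf2": a count, "X", then a field name.
REQUEST_TOKEN = re.compile(r"(\d+)X(f[123])")


def top_codes(rows, request):
    """Return the codes asked for by e.g. "2Xf2 1Xf1", best scores first.

    rows: dicts like {"code": "AAA", "f1_score": 3, "f2_score": 9, "f3_score": 1},
    i.e. what Code.objects.values() returns for the flat model.
    """
    output = []
    for count, field in REQUEST_TOKEN.findall(request):
        key = field + "_score"
        ranked = sorted(rows, key=lambda row: row[key], reverse=True)
        output.extend(row["code"] for row in ranked[: int(count)])
    return " ".join(output)
```

Sorting 500 rows per request is cheap, so storing the codes pre-sorted per column is not necessary; with the database doing the work, order_by() plus slicing gives the same result.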

Is there a way to avoid django to upload same name file again by altering its name

I'm trying to develop a simple model form in Django to upload PDF files. The form is based on a model. Every time a user uploads a file, a database table entry is created with the file path (including the filename), the uploading user's name, the time, etc.
When I upload the same file again, Django uploads it anyway under an altered name (poster-proposal.pdf -> poster-proposal_IomFZQM.pdf), and it also creates another entry in the database table.
I want Django to warn the user when they try to upload a file that already exists (for example, "A file with the same name already exists") and not upload the duplicate file.
I followed this post (post 1), but it says that it does not prevent Django from uploading the file.
I also looked at this method (post 2), but I'm new to Django and it seems complicated. I believe newer Django versions should have an easier way to address this issue.
I added unique=True to the FileField. It did not work.
models.py
from django.db import models

# Repository is another model defined elsewhere in the app.
class files(models.Model):
    repo_id = models.ForeignKey(Repository, on_delete=models.CASCADE)
    username = models.CharField(db_column='username', max_length=45)
    date = models.DateTimeField(auto_now_add=True, db_column='date')
    file = models.FileField(upload_to='documents/', db_column='file', unique=True)
    indicator_name = models.CharField(db_column='indicator_name', max_length=100)
Any idea would be highly appreciated. Thanks
The simplest way is to check whether a file with that name already exists before uploading it:
def file_name_taken(filename_you_want_to_upload):
    # Note that the stored file name depends on your upload_to path.
    # Either include it in the search, or use something like
    # filter(file__contains="filename"), which might return results that you don't want.
    filename = "documents/" + filename_you_want_to_upload
    return files.objects.filter(file=filename).exists()


if file_name_taken(filename_you_want_to_upload):
    # A file with that name exists.
    # Return some error or ...
    ...
else:
    # There is no file with that name.
    # Upload the file and save it to the database.
    ...
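Inside a ModelForm, the same check usually goes in a clean_file() method that raises forms.ValidationError, so the user sees the warning next to the field and nothing is saved. The framework-free sketch below shows only the check itself, assuming the stored name is exactly upload_to plus the uploaded file's base name (Django's storage may also normalize the name, e.g. replacing spaces with underscores). check_unique_upload and DuplicateFileError are hypothetical names.

```python
import posixpath


class DuplicateFileError(ValueError):
    """Raised when an upload would reuse an already stored file name."""


def check_unique_upload(filename, existing_names, upload_to="documents/"):
    """Return the name the file would be stored under, or raise if it is taken.

    existing_names stands in for the stored FileField values, e.g. the result
    of files.objects.values_list("file", flat=True).
    """
    base = posixpath.basename(filename)
    stored_name = posixpath.join(upload_to, base)
    if stored_name in set(existing_names):
        raise DuplicateFileError("A file named %r already exists." % base)
    return stored_name
```

In clean_file() you would call it with self.cleaned_data["file"].name and turn DuplicateFileError into forms.ValidationError.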

Import CSV to Postgresql

I am working on a Django based web application.
I am going to import a CSV file with over 100,000 lines into a PostgreSQL database and use it as the database for the Django application.
I've run into two problems.
The field names include special characters, like these:
%oil, %gas, up/down, CAPEX/Cash-flow, D&C Cape,...
First, how should I define the PostgreSQL column names so that I can import the CSV?
Second, after the import I am going to access the data through a Django model. How can I define a Django model field name that includes special characters?
Of course, it's possible to rename the CSV columns that include special characters, but I don't want to. I want to import the original CSV without any changes.
Is there any solution to solve this problem?
The characters in your example are not really a problem from the Python or database point of view, because you don't have to reuse the CSV headers as column names at all.
First of all, avoid dubious field names, especially in finance: %oil can mean oil share, oil margin or something else. Define a model with meaningful names, like:
from django.db import models


class FinancialPerformanceData(models.Model):
    oil_share = models.DecimalField(max_digits=5, decimal_places=2)
    gas_share = models.DecimalField(max_digits=5, decimal_places=2)
    growth = models.DecimalField(max_digits=10, decimal_places=2)
    capex_to_cf = models.DecimalField(max_digits=7, decimal_places=2)
    # ... etc.
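Then use COPY to import the data from the CSV, as @Hambone suggested. COPY maps the file's columns to the listed table columns by position, so the CSV headers don't have to match the field names. Note that psycopg2's copy_from() has no option to skip a header row, so the file must not contain one (or you have to skip it yourself before calling copy_from()).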
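from contextlib import closing

from django.db import connections
from django.http import HttpResponse


def import_csv(request):
    file = './path/to/file'
    with open(file, 'r') as csvfile:
        # next(csvfile)  # uncomment to skip a header row
        with closing(connections['database_name_from_settings'].cursor()) as cursor:
            # copy_from() uses PostgreSQL's text format: it splits on sep and
            # does not understand CSV quoting.
            cursor.copy_from(
                file=csvfile,
                table='yourapp_financialperformancedata',  # <-- table name from db
                sep='|',  # <-- delimiter
                columns=(
                    'oil_share',
                    'gas_share',
                    'growth',
                    'capex_to_cf',
                    # ... etc.
                ),
            )
    return HttpResponse('Done!')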
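To import the original CSV unchanged, headers included, one option is to map each original header to a model column and let COPY match columns by position. This is a minimal sketch, assuming psycopg2's cursor.copy_expert(sql, file); HEADER_TO_COLUMN and copy_original_csv are hypothetical names, and only the first four headers from the question are mapped. Unlike copy_from(), COPY's CSV format understands quoted fields.

```python
import csv
import io

# Hypothetical mapping from the original CSV headers to the model's columns;
# add the remaining headers from your file.
HEADER_TO_COLUMN = {
    "%oil": "oil_share",
    "%gas": "gas_share",
    "up/down": "growth",
    "CAPEX/Cash-flow": "capex_to_cf",
}


def copy_original_csv(cursor, csvfile, table="yourapp_financialperformancedata"):
    """COPY an unmodified CSV whose headers contain special characters."""
    # Read the header line ourselves; the file position is left after it.
    header = [name.strip() for name in next(csv.reader([csvfile.readline()]))]
    unknown = [name for name in header if name not in HEADER_TO_COLUMN]
    if unknown:
        raise ValueError("No column mapping for CSV headers: %s" % unknown)
    columns = [HEADER_TO_COLUMN[name] for name in header]
    # The header is already consumed, so no HEADER option is needed here.
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(
        table, ", ".join(columns))
    cursor.copy_expert(sql, csvfile)
    return columns
```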

Django model instance from foreign key

I am reading an Excel file using xlrd. One of the columns has the bank name, which is linked to the vehicle model via a foreign key. When xlrd finishes reading a row, it should save that record to the vehicle table. However, I am getting the actual pk value instead, and the error that Vehicles.bank must be a Banks instance.
After checking dozens of questions related to this issue, I found this one to be the most similar, but I am still not getting the expected result.
The relevant Vehicle model section is as follows:
class Vehicles(models.Model):
    stock = models.CharField(max_length=10, blank=False, db_index=True)
    vin = models.CharField(max_length=17, blank=False, db_index=True)
    sold = models.DateField(blank=True, null=True, db_index=True)
    origin = models.CharField(max_length=10, blank=False, db_index=True)
    bank = models.ForeignKey('banks.Banks', db_column='bank', null=True)
I am using Python 2.7, Django 1.5.4 and PostgreSQL 9.2.5. The dbshell utility does show that the vehicles table has a foreign key constraint referencing banks(id).
Since I am not using a form for this particular part, I think it does not matter whether I use a ModelForm or not.
Current scenario: the Excel file has FBANK as the cell value. There is an existing record in the banks table that contains FBANK in its name column, with id=2. The Python function is:
def bank(value):
    return Banks.objects.get(name=value).id
With the above line, error is:
Cannot assign "2": "Vehicles.bank" must be a "Banks" instance.
If I remove the ".id" at the end, error is then:
Banks matching query does not exist.
Appreciate your help.
Ricardo
When saving a Vehicle, you need to pass the Banks instance with the corresponding bank name. See the example below; I assume that you have all the data in cells 0 to 4, so replace those with your own cell numbers:
def get_bank_instance(bank_name):
    try:
        bank = Banks.objects.get(name=bank_name)
    except Banks.DoesNotExist:
        return None
    return bank

# reading the Excel file here; each row is a list of cell values
for row in rows:
    bank = get_bank_instance(row[4])
    if bank:
        # get the other cell values to be saved in Vehicles
        stock, vin, sold, origin = row[0], row[1], row[2], row[3]
        Vehicles.objects.create(bank=bank, stock=stock, vin=vin, sold=sold, origin=origin)
You can also create and save a Vehicles instance by passing the bank id directly:
b_id = Banks.objects.get(name=bank_name).id
Vehicles.objects.create(bank_id=b_id, stock=stock, vin=vin, sold=sold, origin=origin)
Update:
create() is a built-in manager method (Vehicles.objects.create()) that creates a model instance and saves it to the database. If you are asking about "Add a classmethod on the model class" in the Django docs, that is not the case here, because you are just using the built-in method. In some cases you can use a custom method for creating new instances, but I would only do so if I had to pass a lot of default attributes for the new instance.
Also, it's possible to create and save new model instance by using save():
bank_instance = Banks.objects.get(name=bank_name)
vehicle = Vehicles()
vehicle.bank = bank_instance
vehicle.stock = stock
vehicle.vin = vin
vehicle.sold = sold
vehicle.origin = origin
# without save() data will not be saved to db!
vehicle.save()
It's quite long, and you always need to remember to call .save(), so it's a good idea to use .create().
You should be returning a Banks instance when you want to assign it to a Vehicle model instance, so you should not have the .id part at the end of the return value of your bank() function.
Secondly, if it says that it isn't finding the Banks instance, then you should check the value of your value parameter to see what it is and try to manually do a Banks.objects.get from your database. If it can't be found then there is probably another reason for this other than using the Django ORM incorrectly.
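For that check, printing repr(value) before the lookup often reveals the cause: trailing spaces, or a number that xlrd returns as a float (xlrd returns all numeric cells as floats). A small normalizing helper could look like the sketch below; normalize_cell is a hypothetical name, and it is written for Python 3 (on the question's Python 2.7, use unicode() instead of str() for non-ASCII names).

```python
def normalize_cell(value):
    """Turn a raw xlrd cell value into the string used for Banks.name lookups."""
    if isinstance(value, float) and value.is_integer():
        # xlrd returns numeric cells as floats, e.g. 2.0 instead of 2
        value = int(value)
    # str() and strip() remove the type and whitespace differences that make
    # an exact name=... lookup fail even though the name looks right
    return str(value).strip()
```

You would then call Banks.objects.get(name=normalize_cell(row[4])), or use the name__iexact lookup if the case can differ as well.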
When you are assigning instances to other instances in Django, for example setting the Bank for a Vehicle it must be an instance of the model and not the id or pk value of model; this is stated in the other StackOverflow question that you reference in your question.