Django Unique Constraint - Upload BUT Skip Duplicates - django

I have a Django unique constraint, and I'm using the Django admin site to import .csv files. The constraint is working as expected, but I would like to skip over the duplicates and still add the valid records. Is there a way to get this behavior?
def data_upload(request):
    template = "data_upload.html"
    data = ScanData.objects.all()
    prompt = {
        'order': 'Order of the CSV should be CVE, CVSS, Risk, Host, Hostname, Project_Assigned, Component, Owner, Environment, Location, Notes, Protocol, Port, Name, Synopsis, Description, Solution, Plugin_Output',
        'scandata': data
    }
    if request.method == "GET":
        return render(request, template, prompt)
    csv_file = request.FILES['file']
    if not csv_file.name.endswith('.csv'):
        messages.error(request, 'THIS IS NOT A CSV FILE')
    data_set = csv_file.read().decode('UTF-8')
    io_string = io.StringIO(data_set)
    next(io_string)
    for column in csv.reader(io_string, delimiter=','):
        _, created = ScanData.objects.update_or_create(
            CVE=column[0],
            CVSS=column[1],
            Risk=column[2],
            Host=column[3],
            Hostname=column[4],
            Project_Assigned=column[5],
            Component=column[6],
            Owner=column[7],
            Environment=column[8],
            Location=column[9],
            Notes=column[10],
            Protocol=column[11],
            Port=column[12],
            Name=column[13],
            Synopsis=column[14],
            Description=column[15],
            Solution=column[16],
            Plugin_Output=column[17],
        )
    context = {}
    return render(request, template, context)

You can work with .bulk_create(…) [Django-doc] and set the ignore_conflicts=True parameter. As the documentation says:
On databases that support it (all but Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values. Enabling this parameter disables setting the primary key on each model instance (if the database normally supports it).
You thus make a list of your model objects (without saving them yet), and then call .bulk_create(list_of_objects, ignore_conflicts=True). This looks like:
m1 = MyModel(field1=value11, field2=value12)
m2 = MyModel(field1=value21, field2=value22)
m3 = MyModel(field1=value31, field2=value32)
MyModel.objects.bulk_create(
    [m1, m2, m3],
    ignore_conflicts=True
)
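
Applied to the view in the question, a minimal sketch (assuming the same ScanData model and CSV column order as above; the import path is a placeholder) would build the unsaved instances in the loop and hand them to a single bulk_create call:

import csv
import io

from django.shortcuts import render

from .models import ScanData  # adjust the import path to your app


def data_upload(request):
    # ... GET handling and the .csv extension check stay as in the original view ...
    csv_file = request.FILES['file']
    data_set = csv_file.read().decode('UTF-8')
    io_string = io.StringIO(data_set)
    next(io_string)  # skip the header row

    # Unsaved model instances, one per CSV row.
    objects = [
        ScanData(
            CVE=column[0], CVSS=column[1], Risk=column[2], Host=column[3],
            Hostname=column[4], Project_Assigned=column[5], Component=column[6],
            Owner=column[7], Environment=column[8], Location=column[9],
            Notes=column[10], Protocol=column[11], Port=column[12], Name=column[13],
            Synopsis=column[14], Description=column[15], Solution=column[16],
            Plugin_Output=column[17],
        )
        for column in csv.reader(io_string, delimiter=',')
    ]

    # Rows that hit the unique constraint are skipped; all other rows are inserted.
    ScanData.objects.bulk_create(objects, ignore_conflicts=True)
    return render(request, "data_upload.html", {'scandata': ScanData.objects.all()})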

Related

Is PostgreSQL (via ElephantSQL) a much slower database than Django's SQLite, and what to do about it?

I am building a Django webapp including a view which can upload data into the database through a CSV import. Each import contains around 2,000 rows and 9 columns with DecimalFields and CharFields. So far I've been using Django's SQLite database and each upload took me 1 min max. I switched to PostgreSQL (hosted via ElephantSQL) and now the upload takes at least 10 minutes. I've read in some posts that SQLite is faster than PostgreSQL, but I was not expecting anything of this magnitude. Is there a way to speed up the upload process in PostgreSQL? I thought one reason for the low speed might be that I am using ElephantSQL's free Tiny Turtle plan, but if I understand correctly the non-free plans differ only in the maximum size of the database, not its speed? See also https://www.elephantsql.com/plans.html
Might it be a solution to have PostgreSQL installed locally instead of using a cloud provider? Is there anything else I can optimize to speed up the process?
my model:
class Testdata3(models.Model):
    key = models.CharField(max_length=100, primary_key=True)
    mnemonic = models.CharField(max_length=50)
    assetclass = models.CharField(max_length=50)
    value = models.DecimalField(max_digits=255, decimal_places=25)
    performance = models.DecimalField(max_digits=255, decimal_places=25)
    performance_exccy = models.DecimalField(max_digits=255, decimal_places=25)
    performance_abs = models.DecimalField(max_digits=255, decimal_places=25)
    performance_abs_exccy = models.DecimalField(max_digits=255, decimal_places=25)
    date = models.DateField()

    def __str__(self):
        return self.key
my view:
def file_upload(request):
    template = "upload.html"
    prompt = {
        'order': 'Order of the CSV should be "placeholder_1", "placeholder_2", "placeholder_3" '
    }
    if request.method == "GET":
        return render(request, template, prompt)
    csv_file = request.FILES['file']
    if not csv_file.name.endswith('.csv'):
        messages.error(request, 'This is not a csv file')
    data_set = csv_file.read().decode('UTF-8')
    io_string = io.StringIO(data_set)
    # Ignores header row by jumping to next row
    next(io_string)
    for column in csv.reader(io_string, delimiter=';', quotechar="|"):
        # Check if csv-row is empty, if true jump to next iteration/row
        if all(elem == "" for elem in column):
            continue
        else:
            _, created = Testdata3.objects.update_or_create(
                key=column[0],
                defaults={
                    'key': column[0],
                    # Get everything after the date part in the primary key
                    'mnemonic': re.findall(r'AMCS#[0-9]*(.*)', column[0])[0],
                    # Create datetime object from a string
                    'date': datetime.datetime.strptime(column[6], '%d/%m/%Y'),
                    'assetclass': column[10],
                    'value': column[16],
                    'performance': column[19],
                    'performance_abs': column[20],
                    'performance_abs_exccy': column[30],
                    'performance_exccy': column[31],
                }
            )
    context = {}
    return render(request, template, context)
I don't think so. I suspect there is some problem with your service provider, or the CSV file you are importing is very large. I use AWS RDS with Postgres and it is fast enough. It is nothing related to SQLite vs PostgreSQL. It can also come down to your disk's I/O speed, which is much higher on SSDs and high-end machines.
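That said, if the slowdown comes from network round trips to the hosted database (each update_or_create issues its own queries and commits), one common mitigation, sketched here and not part of the original answer, is to wrap the import loop in a single transaction:

from django.db import transaction

# One commit for the whole import instead of one per row; each COMMIT
# against a remote PostgreSQL host otherwise costs a full network round trip.
with transaction.atomic():
    for column in csv.reader(io_string, delimiter=';', quotechar="|"):
        if all(elem == "" for elem in column):
            continue
        Testdata3.objects.update_or_create(
            key=column[0],
            defaults={
                # same defaults dict as in the view above
            },
        )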

"form.populate_by returns" ERROR:'list' object has no attribute

I am creating a view function to edit the database using a WTForm. I want to populate the form with information held in the database, supplied by a different form. My problem is the query that provides the details.
I have read the manual https://wtforms.readthedocs.io/en/stable/crash_course.html
and the following question Python Flask-WTF - use same form template for add and edit operations
but my query does not seem to supply the data in the correct format.
database model:
class Sensors(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sensorID = db.Column(db.String, unique=True)
    name = db.Column(db.String(30), unique=True)
form model:
class AddSensorForm(FlaskForm):
    sensorID = StringField('sensorID', validators=[DataRequired()])
    sensorName = StringField('sensorName', validators=[DataRequired()])
    submit = SubmitField('Register')
view function:
@bp.route('/sensors/editsensor/<int:id>', methods=('GET', 'POST'))
@login_required
def editsensor(id):
    edit = [(s.sensorID, s.sensorName) for s in db.session.
            query(Sensors).filter_by(id=id).all()]
    form = AddSensorForm(obj=edit)
    form.populate_obj(edit)
    if form.validate_on_submit():
        sensors = Sensors(sensorID=form.sensorID.data, sensorName=form.sensorName.data)
        db.session.add(sensors)
        db.session.commit()
shell code for query:
from homeHeating import db
from homeHeating import create_app

app = create_app()
app.app_context().push()

def editsensor(id):
    edit = [(s.sensorID, s.sensorName) for s in db.session.query(Sensors).filter_by(id=id).all()]
    print(edit)

editsensor(1)
[('28-0000045680fde', 'Boiler input')]
I expect that the two form fields will be populated with the information concerning the sensor identified by its 'id',
but instead I get this error:
File "/home/pi/heating/homeHeating/sensors/sensors.py", line 60, in
editsensor
form.populate_obj(edit)
File "/home/pi/heating/venv/lib/python3.7/site-
packages/wtforms/form.py", line 96, in populate_obj
Open an interactive python shell in this
framefield.populate_obj(obj, name)
File "/home/pi/heating/venv/lib/python3.7/site-
packages/wtforms/fields/core.py", line 330, in populate_obj
setattr(obj, name, self.data)
AttributeError: 'list' object has no attribute 'sensorID'
The error indicates that it wants two parts for each field, "field.populate_obj(obj, name)", but mine provides only one: the column data, not the column name "sensorID".
If I comment out the line "edit = ..." then there are no error messages and the form is returned, but the fields are empty. So I want the form to be returned pre-filled with the information from the database, so that I can modify the name or the sensorID and then update the database.
I hope that this is clear
Warm regards
paul.
PS: I have followed the instructions, so the ERROR statement is only the part after "field.populate_obj".
You are trying to pass a 1-item list to your form.
Typically, when you are selecting a single record based on the primary key of your model, use Query.get() instead of Query.filter(...).all()[0].
Furthermore, you need to pass the request data to your form to validate it on submit, and also to pre-fill the fields when the form reports errors.
Form.validate_on_submit will return True only if your request method is POST and your form passes validation; it is the step where your form tells you "the user provided syntactically correct information, now you may do more checks and I may populate an existing object with the data provided to me".
You also need to handle cases where the form is being displayed to the user for the first time.
@bp.route('/sensors/editsensor/<int:id>', methods=('GET', 'POST'))
@login_required
def editsensor(id):
    obj = Sensors.query.get(id) or Sensors()
    form = AddSensorForm(request.form, obj=obj)
    if form.validate_on_submit():
        form.populate_obj(obj)
        db.session.add(obj)
        db.session.commit()
        # return response or redirect here
        return redirect(...)
    else:
        # either the form has errors, or the user is displaying it for
        # the first time (GET)
        return render_template('sensors.html', form=form, obj=obj)
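
One detail worth flagging (an observation on the posted code, not part of the original answer): populate_obj() copies each form field onto an attribute of the same name on the object, and the obj= argument pre-fills fields from attributes of the same name. Since the Sensors model defines name rather than sensorName, the form field has to match the column for this round trip to work; a minimal adjustment, assuming the model stays as posted, could be:

class AddSensorForm(FlaskForm):
    sensorID = StringField('sensorID', validators=[DataRequired()])
    # Renamed from sensorName so form.populate_obj(obj) writes to Sensors.name
    # and AddSensorForm(obj=obj) can pre-fill this field on GET.
    name = StringField('sensorName', validators=[DataRequired()])
    submit = SubmitField('Register')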

django model and csv file zip feature lack

I fixed a lot with your help and I guess we have come to the last problem:
If a line from the CSV file is not in my Django model database, Django mixes everything up, so that the CSV lines and the database records no longer go in the correct order and everything gets mixed.
To prevent this problem I added an if queryset count check within the loop to raise an error message, and I also tried pushing another default value, but nothing worked. What would you suggest to prevent this sync problem?
for instance in RFP.objects.filter(FP_Item=query):
    if RFP.objects.filter(FP_Item=query).count() >= 1:
        instances.append(instance)
    else:
        messages.success(request, "ERROR")
For reference, the whole code:
with open(path, encoding='utf-8') as f:
    data = csv.reader(f, delimiter='|')
    for row in data:
        line = row[0]
        lines.append(line)
        query = line
        for instance in FP.objects.filter(FP_Item=query):
            if FP.objects.filter(FP_Item=query).count() <= 1:
                instances.append(instance)
            else:
                messages.success(request, "ERROR")
pair = zip(lines, instances)
context = {'pair': pair,
           }
return render(request, 'check_fp.html', context)
You shouldn't use a filter() queryset with for/if/else loops here.
You need to use get() with try/except to catch the missing instance.
try:
    instance = FP.objects.get(FP_Item=query)
    instances.append(instance)
except FP.DoesNotExist:
    instance = ["Check"]
    instances.append(instance)
pair = zip(lines, instances)
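
Put back into the loop from the question (same FP model and variable names, sketched rather than tested), this keeps lines and instances the same length, so zip() stays in sync:

lines = []
instances = []

with open(path, encoding='utf-8') as f:
    data = csv.reader(f, delimiter='|')
    for row in data:
        line = row[0]
        lines.append(line)
        try:
            # Exactly one matching record keeps both lists in step.
            instances.append(FP.objects.get(FP_Item=line))
        except FP.DoesNotExist:
            # Placeholder entry so this CSV line still has a partner in zip().
            instances.append(["Check"])

pair = zip(lines, instances)
context = {'pair': pair}
return render(request, 'check_fp.html', context)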

Django bulk create for non-repetitive entries

I want to insert data from an Excel file into the database, but I want to insert only the non-repetitive rows.
I wrote this code, but the if statement is always False!
def ca_import(request):
    uploadform = UploadFileForm(None)
    if request.method == 'POST':
        uploadform = UploadFileForm(request.POST, request.FILES)
        if uploadform.is_valid():
            file = uploadform.cleaned_data['docfile']
            workbook = openpyxl.load_workbook(filename=file, read_only=True)
            # Get name of the first sheet and then open sheet by name
            first_sheet = workbook.get_sheet_names()[0]
            worksheet = workbook.get_sheet_by_name(first_sheet)
            data = []
            try:
                for row in worksheet.iter_rows(row_offset=1):  # Offset for header
                    stockname = StocksName()
                    if StocksName.objects.filter(name=row[0].value).count() < 1:  # ???
                        stockname.name = row[0].value
                        data.append(stockname)
                StocksName.objects.bulk_create(data)
                messages.success(request, "Successful", extra_tags="saveexcel")
            except:
                messages.error(request, _('Error'), extra_tags="excelerror")
    return render(request, 'BallbearingSite/excelfile.html', {'uploadform': uploadform})
Any suggestions on how to solve this?
If your data has a unique id then you can use get_or_create() or update_or_create() instead of bulk_create().
Otherwise you will have to write the logic to check whether each line already exists in your model.
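
As a sketch of that logic (reusing the asker's StocksName model and openpyxl worksheet, not part of the original answer), you could fetch the existing names once and only build objects for rows that are new:

# Names already in the database, fetched once instead of one query per row.
existing = set(StocksName.objects.values_list('name', flat=True))

data = []
for row in worksheet.iter_rows(min_row=2):  # skip the header row
    name = row[0].value
    if name and name not in existing:
        data.append(StocksName(name=name))
        existing.add(name)  # also de-duplicates within the file itself

StocksName.objects.bulk_create(data)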

mongoengine know when to delete document

New to django. I'm doing my best to implement CRUD using Django, mongodb, and mongoengine. I'm able to query the database and render my page with the correct information from the database. I'm also able to change some document fields using javascript and do an Ajax POST back to the original Django View class with the correct csrf token.
The data payload I'm sending back and forth is a list of each Document Model (VirtualPageModel) serialized to json (each element contains ObjectId string along with the other specific fields from the Model.)
This is where it starts getting murky. In order to update the original document in my View Class post function I do an additional query using the object id and loop through the dictionary items, setting the respective fields each time. I then call save and any new data is pushed to the Mongo collection correctly.
I'm not sure if what I'm doing to update existing documents is correct or in the spirit of django's abstracted database operations. The deeper I get the more I feel like I'm not using some fundamental facility earlier on (provided by either django or mongoengine) and because of this I'm having to make things up further downstream.
The way my code is now I would not be able to create a new document (although that's easy enough to fix). However, what I'm really curious about is how I would know when to delete a document which existed in the initial query but was removed by the user/javascript code. Am I overthinking things, and should the contents of my POST contain a list of ObjectIds to delete (sounds like a security risk, although this would be an internal tool)?
I was assuming that my View Class might maintain either the original document objects (or simply ObjectIds) it queried and I could do my comparisons off of that set, but I can't seem to get that information to persist (as a class variable in VolumeSplitterView) from its inception to when I receive the POST at the end.
I would appreciate if anyone could take a look at my code. It really seems like the "ease of use" facilities of Django start to break when paired with Mongo and/or a sufficiently complex Model schema which needs to be directly available to javascript as opposed to simple Forms.
I was going to use this dev work to become django battle-hardened in order to tackle a future app which will be much more complicated and important. I can hack on this thing all day and make it functional, but what I'm really interested in is anyone's experience in using Django + MongoDB + MongoEngine to implement CRUD on a Database Schema which is not very Form-centric (think more nested metadata).
Thanks.
model.py: uses mongoengine Field types.
class MongoEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, VirtualPageModel):
            data_dict = (o.to_mongo()).to_dict()
            if isinstance(data_dict.get('_id'), ObjectId):
                data_dict.update({'_id': str(data_dict.get('_id'))})
            return data_dict
        else:
            return JSONEncoder.default(self, o)

class SubTypeModel(EmbeddedDocument):
    filename = StringField(max_length=200, required=True)
    page_num = IntField(required=True)

class VirtualPageModel(Document):
    volume = StringField(max_length=200, required=True)
    start_physical_page_num = IntField()
    physical_pages = ListField(EmbeddedDocumentField(SubTypeModel),
                               default=list)
    error_msg = ListField(StringField(),
                          default=list)

    def save(self, *args, **kwargs):
        print('In save: {}'.format(kwargs))
        for k, v in kwargs.items():
            if k == 'physical_pages':
                self.physical_pages = []
                for a_page in v:
                    tmp_pp = SubTypeModel()
                    for p_k, p_v in a_page.items():
                        setattr(tmp_pp, p_k, p_v)
                    self.physical_pages.append(tmp_pp)
            else:
                setattr(self, k, v)
        return super(VirtualPageModel, self).save(*args, **kwargs)
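
For illustration only (not from the original post; some_object_id and the file names are made up), this is how the overridden save() above is driven by the view's post() below: plain keys are set directly on the document, while the physical_pages list of dicts is converted into SubTypeModel instances before delegating to Document.save():

# Hypothetical call, mirroring original_doc.save(**vp_dict) in the view below.
doc = VirtualPageModel.objects.get(id=some_object_id)
doc.save(
    volume='vol-001',                 # plain field: handled by setattr(self, k, v)
    physical_pages=[                  # each dict becomes a SubTypeModel instance
        {'filename': 'page_0001.tif', 'page_num': 1},
        {'filename': 'page_0002.tif', 'page_num': 2},
    ],
)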
views.py: My attempt at a view
class VolumeSplitterView(View):
    #initial = {'key': 'value'}
    template_name = 'click_model/index.html'
    vol = None
    start = 0
    end = 20

    def get(self, request, *args, **kwargs):
        self.vol = self.kwargs.get('vol', None)
        records = self.get_records()
        records = records[self.start:self.end]
        vp_json_list = []
        img_filepaths = []
        for vp in records:
            vp_json = json.dumps(vp, cls=MongoEncoder)
            vp_json_list.append(vp_json)
            for pp in vp.physical_pages:
                filepath = get_file_path(vp, pp.filename)
                img_filepaths.append(filepath)
        data_dict = {
            'img_filepaths': img_filepaths,
            'vp_json_list': vp_json_list
        }
        return render_to_response(self.template_name,
                                  {'data_dict': data_dict},
                                  RequestContext(request))

    def get_records(self):
        return VirtualPageModel.objects(volume=self.vol)

    def post(self, request, *args, **kwargs):
        if request.is_ajax():
            vp_dict_list = json.loads(request.POST.get('data', []))
            for vp_dict in vp_dict_list:
                o_id = vp_dict.pop('_id')
                original_doc = VirtualPageModel.objects.get(id=o_id)
                try:
                    original_doc.save(**vp_dict)
                except Exception:
                    print(traceback.format_exc())