Django model and CSV file zip getting out of sync - Django

I fixed a lot with your help, and I think we have come to the last problem:
If a line from the CSV file is not in my Django model database, everything gets mixed up; the CSV lines and the database rows no longer stay in the same order.
To prevent this I added a queryset count() check inside the loop to raise an error message, and I also tried pushing a default value instead, but nothing worked. What would you suggest to prevent this sync problem?
for instance in RFP.objects.filter(FP_Item=query):
    if RFP.objects.filter(FP_Item=query).count() >= 1:
        instances.append(instance)
    else:
        messages.success(request, "ERROR")
For reference the whole code:
with open(path, encoding='utf-8') as f:
    data = csv.reader(f, delimiter='|')
    for row in data:
        line = row[0]
        lines.append(line)
        query = line
        for instance in FP.objects.filter(FP_Item=query):
            if FP.objects.filter(FP_Item=query).count() <= 1:
                instances.append(instance)
            else:
                messages.success(request, "ERROR")
pair = zip(lines, instances)
context = {'pair': pair}
return render(request, 'check_fp.html', context)

You can't keep the two lists in sync with a filter() queryset and for/if/else loops.
Use get() with try/except instead, so that you append exactly one entry per CSV line and catch the case where no instance exists.
try:
    instance = FP.objects.get(FP_Item=query)
    instances.append(instance)
except FP.DoesNotExist:
    instance = ["Check"]
    instances.append(instance)
pair = zip(lines, instances)
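Putting that together with the original loop, a minimal sketch (reusing the FP model and the lines/instances lists from the question; the "Check" placeholder is only an illustrative default) could look like:

with open(path, encoding='utf-8') as f:
    data = csv.reader(f, delimiter='|')
    for row in data:
        line = row[0]
        lines.append(line)
        try:
            # exactly one append per CSV line keeps both lists aligned
            instances.append(FP.objects.get(FP_Item=line))
        except FP.DoesNotExist:
            # placeholder for lines with no matching database row
            instances.append("Check")
pair = zip(lines, instances)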


How to stop scrapy from paginating the pages with repetitive records?

I crawled a website with pagination using Scrapy, and it worked fine. But the website gets updated and new posts are added, so I need to run my code every day, and each time I run it, it crawls all the pages again. Fortunately, I'm using Django, and in my Django model I used
unique=True
so there are no duplicate records in my database. But I want to stop the pagination crawling as soon as it finds a duplicate record. How should I do this?
Here is my spider code snippet:
class NewsSpider(scrapy.Spider):
    name = 'news'
    allowed_domains = ['....']
    start_urls = ['....']
    duplicate_record_flag = False

    def parse(self, response, **kwargs):
        next_page = response.xpath('//a[@class="next page-numbers"]/@href').get()
        news_links = response.xpath('//div[@class="content-column"]/div/article/div/div[1]/a/@href').getall()
        for link in news_links:
            if self.duplicate_record_flag:
                print("Closing Spider ...")
                raise CloseSpider('Duplicate records found')
            yield scrapy.Request(url=link, callback=self.parse_item)
        if next_page and not self.duplicate_record_flag:
            yield scrapy.Request(url=next_page, callback=self.parse)

    def parse_item(self, response):
        item = CryptocurrencyNewsItem()
        ...
        try:
            CryptocurrencyNews.objects.get(title=item['title'])
            self.duplicate_record_flag = True
            return
        except CryptocurrencyNews.DoesNotExist:
            item.save()
            return item
I used a class variable (duplicate_record_flag) so that I can access it in all methods and know when I am facing a duplicate record.
The problem is that the spider doesn't stop immediately when the first duplicate record is found. For clarification: in the for loop in the parse method, if we have 10 news_links and we find a duplicate record in the first iteration, the flag does not change at that moment; if we print the flag inside the loop, it prints "False" for all 10 iterations, while it should have changed to "True" after the first one.
In other words, the crawler crawls all the links on each page in every parse call.
How can I prevent this?
If you want to stop the spider after meeting certain criteria, you can raise the CloseSpider exception:
if some_logic_to_check_duplicates:
    raise CloseSpider('Duplicate records found')
    # This message shows up in the logs
If you just want to skip the duplicate item, you can raise a DropItem exception from the pipeline. Example code from Scrapy docs:
from itemadapter import ItemAdapter
from scrapy.exceptions import DropItem

class DuplicatesPipeline:
    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        if adapter['id'] in self.ids_seen:
            raise DropItem(f"Duplicate item found: {item!r}")
        else:
            self.ids_seen.add(adapter['id'])
            return item
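Applied to the spider in the question, a minimal sketch (assuming the same CryptocurrencyNews model and a Django-backed item whose save() persists it) could raise CloseSpider directly from parse_item the moment a known title shows up, instead of only setting a flag:

from scrapy.exceptions import CloseSpider

def parse_item(self, response):
    item = CryptocurrencyNewsItem()
    ...
    if CryptocurrencyNews.objects.filter(title=item['title']).exists():
        # a title we already stored means we have reached old posts
        raise CloseSpider('Duplicate records found')
    item.save()
    return item

Note that CloseSpider triggers a graceful shutdown, so requests that were already scheduled may still finish before the spider actually stops.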

Django Unique Constraint - Upload But Skip Duplicates

I have a Django unique constraint, and I'm using the Django admin site to import .csv files. The constraint works as expected, but I would like to skip over the duplicates and still add the valid records. Is there a way to get this behavior?
def data_upload(request):
    template = "data_upload.html"
    data = ScanData.objects.all()
    prompt = {
        'order': 'Order of the CSV should be CVE, CVSS, Risk, Host, Hostname, Project_Assigned, Component, Owner, Environment, Location, Notes, Protocol, Port, Name, Synopsis, Description, Solution, Plugin_Output',
        'scandata': data
    }
    if request.method == "GET":
        return render(request, template, prompt)
    csv_file = request.FILES['file']
    if not csv_file.name.endswith('.csv'):
        messages.error(request, 'THIS IS NOT A CSV FILE')
    data_set = csv_file.read().decode('UTF-8')
    io_string = io.StringIO(data_set)
    next(io_string)
    for column in csv.reader(io_string, delimiter=','):
        _, created = ScanData.objects.update_or_create(
            CVE=column[0],
            CVSS=column[1],
            Risk=column[2],
            Host=column[3],
            Hostname=column[4],
            Project_Assigned=column[5],
            Component=column[6],
            Owner=column[7],
            Environment=column[8],
            Location=column[9],
            Notes=column[10],
            Protocol=column[11],
            Port=column[12],
            Name=column[13],
            Synopsis=column[14],
            Description=column[15],
            Solution=column[16],
            Plugin_Output=column[17],
        )
    context = {}
    return render(request, template, context)
You can work with .bulk_create(…) [Django-doc] and set the ignore_conflicts=True parameter. As the documentation says:
On databases that support it (all but Oracle), setting the ignore_conflicts parameter to True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values. Enabling this parameter disables setting the primary key on each model instance (if the database normally supports it).
You thus make a list of your model objects (without saving them yet), and then call .bulk_create(list_of_objects, ignore_conflicts=True). This looks like:
m1 = MyModel(field1=value11, field2=value12)
m2 = MyModel(field1=value21, field2=value22)
m3 = MyModel(field1=value31, field2=value32)

MyModel.objects.bulk_create(
    [m1, m2, m3],
    ignore_conflicts=True
)
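Applied to the upload view in the question, a minimal sketch (keeping the ScanData model and the column order from the prompt; the FIELDS list is just a convenience introduced here) could replace the update_or_create() loop:

FIELDS = ['CVE', 'CVSS', 'Risk', 'Host', 'Hostname', 'Project_Assigned',
          'Component', 'Owner', 'Environment', 'Location', 'Notes',
          'Protocol', 'Port', 'Name', 'Synopsis', 'Description',
          'Solution', 'Plugin_Output']

# build unsaved instances from every CSV row, then insert them in one query,
# letting the database drop rows that violate the unique constraint
rows = [ScanData(**dict(zip(FIELDS, column)))
        for column in csv.reader(io_string, delimiter=',')]
ScanData.objects.bulk_create(rows, ignore_conflicts=True)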

"form.populate_by returns" ERROR:'list' object has no attribute

I am creating a view function to edit the database using a WTForm. I want to populate the form with information held in the database that was supplied by a different form. My problem is the query that provides the details.
I have read the manual https://wtforms.readthedocs.io/en/stable/crash_course.html
and the following question: Python Flask-WTF - use same form template for add and edit operations
but my query does not seem to supply the data in the correct format.
database model:
class Sensors(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sensorID = db.Column(db.String, unique=True)
    name = db.Column(db.String(30), unique=True)
form model:
class AddSensorForm(FlaskForm):
    sensorID = StringField('sensorID', validators=[DataRequired()])
    sensorName = StringField('sensorName', validators=[DataRequired()])
    submit = SubmitField('Register')
view function:
@bp.route('/sensors/editsensor/<int:id>', methods=('GET', 'POST'))
@login_required
def editsensor(id):
    edit = [(s.sensorID, s.sensorName) for s in db.session.
            query(Sensors).filter_by(id=id).all()]
    form = AddSensorForm(obj=edit)
    form.populate_obj(edit)
    if form.validate_on_submit():
        sensors = Sensors(sensorID=form.sensorID.data, sensorName=form.sensorName.data)
        db.session.add(sensors)
        db.session.commit()
shell code for query:
from homeHeating import db
from homeHeating import create_app

app = create_app()
app.app_context().push()

def editsensor(id):
    edit = [(s.sensorID, s.sensorName) for s in db.session.query(Sensors).filter_by(id=id).all()]
    print(edit)

editsensor(1)
[('28-0000045680fde', 'Boiler input')]
I expect the two form fields to be populated with the information concerning the sensor identified by its 'id',
but I get this error:
File "/home/pi/heating/homeHeating/sensors/sensors.py", line 60, in
editsensor
form.populate_obj(edit)
File "/home/pi/heating/venv/lib/python3.7/site-
packages/wtforms/form.py", line 96, in populate_obj
Open an interactive python shell in this
framefield.populate_obj(obj, name)
File "/home/pi/heating/venv/lib/python3.7/site-
packages/wtforms/fields/core.py", line 330, in populate_obj
setattr(obj, name, self.data)
AttributeError: 'list' object has no attribute 'sensorID'
The error indicates that it wants two parts for each field, "field.populate_obj(obj, name)"; mine provides only one, the column data, but not the column name, "sensorID".
If I comment out the "edit = ..." line, there are no error messages and the form is returned, but the fields are empty. I want the form to be returned with the information from the database filled in, so that I can modify the name or the sensorID and then update the database.
I hope that this is clear.
Warm regards
paul.
ps I have followed the instructions, so the ERROR statement is only the part after "field.populate_obj".
You are trying to pass a 1-item list to your form.
Typically, when you are selecting a single record based on the primary key of your model, use Query.get() instead of Query.filter(...).all()[0].
Furthermore, you need to pass the request data to your form to validate it on submit, and also to pre-fill the fields when the form reports errors.
Form.validate_on_submit will return True only if your request method is POST and your form passes validation; it is the step where your form tells you "the user provided syntactically correct information, now you may do more checks and I may populate an existing object with the data provided to me".
You also need to handle cases where the form is being displayed to the user for the first time.
@bp.route('/sensors/editsensor/<int:id>', methods=('GET', 'POST'))
@login_required
def editsensor(id):
    obj = Sensors.query.get(id) or Sensors()
    form = AddSensorForm(request.form, obj=obj)
    if form.validate_on_submit():
        form.populate_obj(obj)
        db.session.add(obj)
        db.session.commit()
        # return response or redirect here
        return redirect(...)
    else:
        # either the form has errors, or the user is displaying it for
        # the first time (GET)
        return render_template('sensors.html', form=form, obj=obj)

Django bulk create for non-repetitive entries

I want to insert data from an Excel file into the database, but I only want to insert the non-repetitive entries.
I wrote this code, but the if statement is always False!
def ca_import(request):
    uploadform = UploadFileForm(None)
    if request.method == 'POST':
        uploadform = UploadFileForm(request.POST, request.FILES)
        if uploadform.is_valid():
            file = uploadform.cleaned_data['docfile']
            workbook = openpyxl.load_workbook(filename=file, read_only=True)
            # Get name of the first sheet and then open sheet by name
            first_sheet = workbook.get_sheet_names()[0]
            worksheet = workbook.get_sheet_by_name(first_sheet)
            data = []
            try:
                for row in worksheet.iter_rows(row_offset=1):  # Offset for header
                    stockname = StocksName()
                    if (StocksName.objects.filter(name=row[0].value).count() < 1):  # ???
                        stockname.name = row[0].value
                        data.append(stockname)
                StocksName.objects.bulk_create(data)
                messages.success(request, "Successful", extra_tags="saveexcel")
            except:
                messages.error(request, _('Error'), extra_tags="excelerror")
    return render(request, 'BallbearingSite/excelfile.html', {'uploadform': uploadform})
Any suggestion to solve it?
If your data has a unique id, then you can use get_or_create() or update_or_create() instead of bulk_create(), as sketched below.
Otherwise you will have to write the logic to check whether each line already exists in your model.
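A minimal sketch of the first approach against the StocksName model from the question (min_row=2 skips the header row in current openpyxl versions):

for row in worksheet.iter_rows(min_row=2):
    name = row[0].value
    if name:
        # inserts the stock only if no row with this name exists yet
        StocksName.objects.get_or_create(name=name)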

Help understanding a Django view

I am trying to follow the code listed on https://github.com/alex/django-ajax-validation/blob/master/ajax_validation/views.py
I have been able to understand a small chunk of it. I have added comments stating my understanding of what is happening.
I would really appreciate some assistance on questions I listed in comments next to the lines I couldn't quite follow.
def validate(request, *args, **kwargs):
    # I thing it is some sort of initializations but I cannot really understand what's happening
    form_class = kwargs.pop('form_class')
    defaults = {
        'data': request.POST
    }
    extra_args_func = kwargs.pop('callback', lambda request, *args, **kwargs: {})
    kwargs = extra_args_func(request, *args, **kwargs)
    defaults.update(kwargs)
    form = form_class(**defaults)
    if form.is_valid():  # straightforward, if there is no error then the form is valid
        data = {
            'valid': True,
        }
    else:
        # if we're dealing with a FormSet then walk over .forms to populate errors and formfields
        if isinstance(form, BaseFormSet):  # I cannot really understand what is BaseFromSet
            errors = {}
            formfields = {}
            for f in form.forms:  # I am guessing that this is for when there are multiple form submitted for validation
                for field in f.fields.keys():  # I think he is looping over all fields and checking for error. what does add_prefix () return? and what is formfields[]?
                    formfields[f.add_prefix(field)] = f[field]
                for field, error in f.errors.iteritems():
                    errors[f.add_prefix(field)] = error
            if form.non_form_errors():
                errors['__all__'] = form.non_form_errors()  # what is the '__all__'?
        else:
            errors = form.errors
            formfields = dict([(fieldname, form[fieldname]) for fieldname in form.fields.keys()])

        # if fields have been specified then restrict the error list
        if request.POST.getlist('fields'):  # I am having a hard time understanding what this if statement does.
            fields = request.POST.getlist('fields') + ['__all__']
            errors = dict([(key, val) for key, val in errors.iteritems() if key in fields])

        final_errors = {}  # here the author of this code totally lost me.
        for key, val in errors.iteritems():
            if '__all__' in key:
                final_errors[key] = val
            elif not isinstance(formfields[key].field, forms.FileField):
                html_id = formfields[key].field.widget.attrs.get('id') or formfields[key].auto_id
                html_id = formfields[key].field.widget.id_for_label(html_id)
                final_errors[html_id] = val
        data = {
            'valid': False or not final_errors,
            'errors': final_errors,
        }
    json_serializer = LazyEncoder()  # Why does the result have to be returned in json?
    return HttpResponse(json_serializer.encode(data), mimetype='application/json')
validate = require_POST(validate)  # a decorator that requires a post to submit
LazyEncoder
class LazyEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Promise):
            return force_unicode(obj)
        return obj
form_class = kwargs.pop('form_class')
This is simply pulling the keyword argument, form_class, that was passed in via the URL conf.
(r'^SOME/URL/$', 'ajax_validation.views.validate',
    {'form_class': ContactForm},  # this keyword argument.
    'contact_form_validate')
BaseFormSet is simply the formset class doing the work behind the scenes. When you don't know, search the source! grep -ri "baseformset" . It's an invaluable tool.
Take a look at at django.forms.formsets to see how formset_factory produces new "formset" classes based on the BaseFormSet, hence the factory part!
I am guessing that this is for when there are multiple form submitted for validation
Yes, that's exactly what a formset is for (dealing with multiple forms)
I think he is looping over all fields and checking for error. what does add_prefix () return? and what is formfields[]?
Yes, that would be looping through the field names.
add_prefix() is for prefixing form field names with a specific form. Because a formset repeats form elements multiple times, each field needs a unique prefix, such as 0-field1, 1-field1, etc.
formfields is just an empty dictionary defined a few lines above.
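For example, each form in a formset carries a positional prefix, so add_prefix() returns names like 'form-0-email'. A small sketch:

from django import forms
from django.forms import formset_factory

class ContactForm(forms.Form):
    email = forms.EmailField()

ContactFormSet = formset_factory(ContactForm, extra=2)
formset = ContactFormSet()
print(formset.forms[0].add_prefix('email'))  # form-0-email
print(formset.forms[1].add_prefix('email'))  # form-1-email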
what is the 'all'?
__all__ is defined at the top of django.forms.forms
NON_FIELD_ERRORS = '__all__'
It's just what non field specific errors (such as constraints across 2 fields) are stored under in the errors dictionary as opposed to errors[fieldname].
I am having a hard time understanding what this if statement does.
The author has left a note:
# if fields have been specified then restrict the error list
if request.POST.getlist('fields'):
It's checking if you specified any specific fields to validate in your URLConf, this is not django but ajax_validation.
You can see that he's overwriting his errors dictionary based on only the fields specified, thus passing on the validation only for those fields.
errors = dict([(key, val) for key, val in errors.iteritems() if key in fields])
here the author of this code totally lost me.
The author has mapped a custom errors and fields dictionary to specific field names with prefixes, (as opposed to the usual FormSet with each form having its own errors dictionary, unaware of the formset itself) which he presumably uses in the AJAX response to validate all fields.
Normally, you can iterate over a formset and go through the errors on a form by form basis, but not so if you need to validate all of them through ajax.
The line pulling html_id should be straightforward most of the time, but it's there because form widgets CAN add interesting things to the end of the IDs, based on whether or not the widget is a radio select, for example.
From source comments :
# RadioSelect is represented by multiple <input type="radio"> fields,
# each of which has a distinct ID. The IDs are made distinct by a "_X"
# suffix, where X is the zero-based index of the radio field. Thus,
# the label for a RadioSelect should reference the first one ('_0').
Why does the result have to be returned in json?
Because it's an ajax request and javascript easily eats json.
2- could you go through these lines of code...
extra_args_func = kwargs.pop('callback', lambda request, *args, **kwargs: {})
Either return the keyword argument named 'callback' (which, if passed in, is supposed to be a function that accepts request and returns a dictionary), or, if it wasn't passed, return a lambda that only returns an empty dictionary.
I'm not sure what the specific use is for the extra context. You could use it to run arbitrary snippets of code without modifying or subclassing ajax_validation...
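For illustration only, here is a hypothetical URLconf entry in the same tuple style as above, where the callback injects extra keyword arguments into the form (the callback name and the initial data are made up):

def form_extra_kwargs(request, *args, **kwargs):
    # whatever this returns is merged into the form's keyword arguments
    return {'initial': {'email': request.user.email}}

(r'^SOME/URL/$', 'ajax_validation.views.validate',
    {'form_class': ContactForm, 'callback': form_extra_kwargs},
    'contact_form_validate')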
It might help you to run this code and put a debugger breakpoint somewhere, so you can step through and examine the variables and methods. You can do this by simply putting this line where you want to break:
import pdb; pdb.set_trace()
and you will be dumped into the debugger in the console.