Why am I able to insert invalid data into the database? [duplicate] - django

This question already has answers here:
Sqlite column is getting assigned wrong data type value
(1 answer)
Invalid sqlite datatypes
(1 answer)
Closed 2 days ago.
This is my models.py file:
from django.db import models

# Create your models here.
class FxRates(models.Model):
    business_date = models.DateField()
    country = models.CharField(max_length=1000)
    currency_code = models.CharField(max_length=3)
    exchange_rate = models.FloatField()

    class Meta:
        db_table = "FxRates"
Using the "migrate" command I am able to create the FxRates database table in the db.sqlite3.
Now if I use Pandas to insert data into this database, no errors are thrown if for this line in my data.csv file:
24/01/2023,Bangladesh,BDT,ERR!
Shouldn't this have been impossible to insert, considering ERR! is not a number?
The Pandas commands I use are:
import sqlite3
import pandas

df = pandas.read_csv("/home/data.csv")
database_path = "<path_to_db.sqlite3>"
connection = sqlite3.connect(database_path)
df.to_sql("FxRates", connection, if_exists="append", index=False)
This inserts all the data fine, but why does it let invalid data be inserted?
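The short answer: this is SQLite behaviour, not a Django or Pandas bug. SQLite uses flexible typing: unless a table is declared STRICT, a column's declared type is only an affinity, and a value of any type can be stored in it. Django's validation (e.g. full_clean() or a ModelForm) runs in Python and is bypassed entirely when df.to_sql() writes straight to the database. A minimal sketch demonstrating the affinity behaviour (the table name is illustrative):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fx (exchange_rate REAL)")
# SQLite stores the text value despite the REAL column affinity:
conn.execute("INSERT INTO fx VALUES (?)", ("ERR!",))
print(conn.execute("SELECT exchange_rate, typeof(exchange_rate) FROM fx").fetchone())
# -> ('ERR!', 'text')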

How to delete 200,000 records with Django? [duplicate]

This question already has answers here:
How to make Django QuerySet bulk delete() more efficient
(3 answers)
Closed 3 months ago.
Situation:
I have two models with a 1-1 relation, for example:
class User(models.Model):
    user_name = models.CharField(max_length=40)
    type = models.CharField(max_length=255)
    created_at = models.DateTimeField()
    ...

class Book(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
And I have around 200,000 records.
Language: Python
Framework: Django
Database: Postgres
Question:
How can I delete the 200,000 records above with minimal cost?
Solution I have tried:
from django.db import transaction

BATCH_SIZE = 1000
# Fetch 200,000 user ids.
user_ids = list(User.objects.filter(
    type='sample',
    created_at__gte='2022-11-15 08:00',
    created_at__lt='2022-11-15 08:30',
).values_list('id', flat=True)[:200000])

for start in range(0, len(user_ids), BATCH_SIZE):
    with transaction.atomic():
        _, deleted = User.objects.filter(id__in=user_ids[start:start + BATCH_SIZE]).delete()
With this solution, my server uses around:
CPU: 600MB
RAM: 300MB
and takes more than 15 minutes to finish the workload.
Does anyone have a better solution?
By first principles, nothing beats raw SQL (as opposed to a Django query) in terms of speed, because it operates closest to the database!
cursor.execute("DELETE FROM DB WHERE Column = %s", [value])
Or else you can do it by:
qs = Model.objects.filter(variable=variable)
if qs.exists():
    qs.delete()
Thanks, everyone. I have tried the solution with a raw query:
# Fetch 200,000 user ids per pass.
user_ids = User.objects.filter(
    type='sample',
    created_at__gte='2022-11-15 08:00',
    created_at__lt='2022-11-15 08:30',
).values_list('id', flat=True)[:200000]

for i in range(0, 3):
    user_ids_str = ",".join(str(user_id) for user_id in user_ids.iterator(chunk_size=5000))
    # Delete the dependent book rows first, then the users themselves.
    query = f"""
    DELETE FROM "book" WHERE "book"."user_id" IN ({user_ids_str});
    DELETE FROM "user" WHERE "user"."id" IN ({user_ids_str});
    """
    with transaction.atomic():
        with connection.cursor() as c:
            c.execute("SET statement_timeout = '10min';")
            c.execute(query)
This removes 600,000 records in around 10 minutes.
And the server used around:
CPU: 50MB
RAM: 200MB
If you are using straight SQL, why not do a join on the user table with the date criteria to delete the books, and then delete all the users using the created_at criteria? Let the database do all the work!
Even without writing the join,
DELETE FROM "book" WHERE "book"."user_id" IN (SELECT id FROM "user" WHERE created_at >= '2022-11-15 08:00' AND ...);
DELETE FROM "user" WHERE created_at >= '2022-11-15 08:00' AND ...;
would be better than what you have.
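As a sketch of that suggestion from the Django side (hedged: the quoted table names follow the raw SQL above, the type='sample' filter is omitted for brevity, and %s is the standard PostgreSQL parameter placeholder):
from django.db import connection, transaction

window = ['2022-11-15 08:00', '2022-11-15 08:30']
with transaction.atomic():
    with connection.cursor() as c:
        # Delete the dependent book rows first so the FK to user is never violated.
        c.execute(
            'DELETE FROM "book" WHERE "book"."user_id" IN '
            '(SELECT id FROM "user" WHERE created_at >= %s AND created_at < %s)',
            window,
        )
        c.execute('DELETE FROM "user" WHERE created_at >= %s AND created_at < %s', window)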

How can I convert a SQLAlchemy query into a list object? [duplicate]

This question already has answers here:
How to serialize SqlAlchemy result to JSON?
(37 answers)
Closed 1 year ago.
Hi guys, I am trying to convert my query into a list object but I am getting this error: "'User' object is not iterable".
Below is my code.
@app.route('/users')
def users():
    data = []
    rows = db.session.query(User).first()
    for row in rows:
        data.append(list(row))  # data.append([x for x in row])
    return jsonify(data)
The code you have for querying,
rows = db.session.query(User).first()
selects the first object found and returns it, otherwise returns None, as per the docs.
If there are multiple rows you are trying to query, use the .all() function, as per the docs:
data = []
rows = db.session.query(User).all()
for row in rows:
    data.append(row)
return jsonify(data)
This will fetch all the users and add them to the list.
I was able to do this by using flask-marshmallow:
ma = Marshmallow(app)
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(200), nullable=False)
    email = db.Column(db.String(200), nullable=False)
    password = db.Column(db.String(200), nullable=False)

class UserSchema(ma.Schema):
    class Meta:
        # Fields to expose
        fields = ("email", "password", "name")
        # Smart hyperlinking

user_schema = UserSchema()
users_schema = UserSchema(many=True)

@app.route("/users/")
def users():
    # rows = db.session.query(User)
    all_users = User.query.all()
    results = users_schema.dump(all_users)
    return jsonify(results)

@app.route("/users/<id>")
def user_detail(id):
    user = User.query.get(id)
    results = user_schema.dump(user)
    return jsonify(results)
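For completeness, a minimal alternative without Marshmallow (a sketch assuming the same User model; the route name is illustrative) is to build JSON-serializable dicts by hand:
@app.route('/users-plain')
def users_plain():
    rows = db.session.query(User).all()
    # jsonify cannot serialize model instances directly, so expose fields explicitly.
    data = [{"name": row.name, "email": row.email} for row in rows]
    return jsonify(data)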

Django hitcount order_by("hit_count_generic__hits") gives error on PostgreSQL database

I was using django-hitcount to count the views on my Post model. I am trying to get the most viewed posts in my ListView using the query objects.order_by('hit_count_generic__hits'), and it works fine on SQLite, but on PostgreSQL it gives me this error:
django.db.utils.ProgrammingError: operator does not exist: integer = text LINE 1: ...R JOIN "hitcount_hit_count" ON ("posts_post"."id" = "hitcoun....
models.py
class Post(models.Model, HitCountMixin):
    author = models.ForeignKey(User, related_name='authors', on_delete=models.CASCADE)
    title = models.CharField('Post Title', max_length=150)
    description = models.TextField('Description', max_length=1000, blank=True)
    date_posted = models.DateTimeField('Date posted', default=timezone.now)
    date_modified = models.DateTimeField('Date last modified', default=timezone.now)
    document = models.FileField(
        'Document of Post',
        upload_to='documents',
        validators=[FileExtensionValidator(allowed_extensions=['pdf', 'docx']), validate_document_size],
    )
    hit_count_generic = GenericRelation(
        HitCount,
        object_id_field='object_pk',
        related_query_name='hit_count_generic_relation',
    )
views.py
queryset = Post.objects.order_by('hit_count_generic__hits')
I found this issue on Github related to the problem, but I am still not able to figure out the mentioned workaround.
When comparing different types (here, integer and text), the equals operator throws this exception. To fix it, convert the HitCount model's object_pk field to an integer and you are good to go. To do that, you need to create and apply a migration; Django is a really good framework for handling this kind of operation. You just need to check that the values are not null and are "convertible" to integers. Just change the field type and run the two commands below.
python manage.py makemigrations
python manage.py migrate
Before updating your model, I highly recommend you take a backup in case of failure. This is not an easy operation, but you can follow these links to understand what is going on during the process:
migrations dump and restore initial data
If you don't care about the data in the table, just drop the table, create a brand-new migration file, and recreate the table.
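For illustration only, a hand-written version of that type-change migration might look like the sketch below (hypothetical: the app label, dependency, and model name are assumptions, since django-hitcount's actual migration history may differ):
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [
        ("hitcount", "0001_initial"),  # assumed dependency
    ]
    operations = [
        # On PostgreSQL Django emits a USING cast for the type change,
        # so every stored value must already be a numeric string.
        migrations.AlterField(
            model_name="hitcount",
            name="object_pk",
            field=models.PositiveIntegerField(),
        ),
    ]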

django update_or_create gets "duplicate key value violates unique constraint"

Maybe I misunderstand the purpose of Django's update_or_create Model method.
Here is my Model:
from django.db import models
import datetime
from vc.models import Cluster

class Vmt(models.Model):
    added = models.DateField(default=datetime.date.today, blank=True, null=True)
    creation_time = models.TextField(blank=True, null=True)
    current_pm_active = models.TextField(blank=True, null=True)
    current_pm_total = models.TextField(blank=True, null=True)
    # ... more simple fields ...
    cluster = models.ForeignKey(Cluster, null=True)

    class Meta:
        unique_together = (("cluster", "added"),)
Here is my test:
from django.test import TestCase
from .models import *
from vc.models import Cluster
from django.db import transaction

# Create your tests here.
class VmtModelTests(TestCase):
    def test_insert_into_VmtModel(self):
        count = Vmt.objects.count()
        self.assertEqual(count, 0)
        # create a Cluster
        c = Cluster.objects.create(name='test-cluster')
        Vmt.objects.create(
            cluster=c,
            creation_time='test creation time',
            current_pm_active=5,
            current_pm_total=5,
            # ... more simple fields ...
        )
        count = Vmt.objects.count()
        self.assertEqual(count, 1)
        self.assertEqual('5', c.vmt_set.all()[0].current_pm_active)
        # let's test that we cannot add that same record again
        try:
            with transaction.atomic():
                Vmt.objects.create(
                    cluster=c,
                    creation_time='test creation time',
                    current_pm_active=5,
                    current_pm_total=5,
                    # ... more simple fields ...
                )
            self.fail(msg="Should have violated the integrity constraint!")
        except Exception as ex:
            template = "An exception of type {0} occurred. Arguments:\n{1!r}"
            message = template.format(type(ex).__name__, ex.args)
            self.assertEqual("An exception of type IntegrityError occurred.", message[:45])
        Vmt.objects.update_or_create(
            cluster=c,
            creation_time='test creation time',
            # notice we are updating current_pm_active to 6
            current_pm_active=6,
            current_pm_total=5,
            # ... more simple fields ...
        )
        count = Vmt.objects.count()
        self.assertEqual(count, 1)
On the last update_or_create call I get this error:
IntegrityError: duplicate key value violates unique constraint "vmt_vmt_cluster_id_added_c2052322_uniq"
DETAIL: Key (cluster_id, added)=(1, 2018-06-18) already exists.
Why wasn't the model updated? Why did Django try to create a new record that violated the unique constraint?
The update_or_create(defaults=None, **kwargs) method has basically two parts:
the **kwargs, which specify the "filter" criteria used to determine whether such an object is already present; and
the defaults, a dictionary mapping fields to the values that should be used when we create a new row (in case the filtering fails to find a row), or that should be updated (in case we find such a row).
The problem here is that you make your filters too restrictive: you add several filters, and as a result the database does not find such a row. So what happens? Django then aims to create a row with these filter values (and since defaults is missing, no extra values are added). But it then turns out that the combination of cluster and added already exists, so the database refuses to add this row.
So this line:
Model.objects.update_or_create(
    field1=val1,
    field2=val2,
    defaults={
        'field3': val3,
        'field4': val4,
    },
)
is semantically approximately equal to:
try:
    item = Model.objects.get(field1=val1, field2=val2)
except Model.DoesNotExist:
    Model.objects.create(field1=val1, field2=val2, field3=val3, field4=val4)
else:
    Model.objects.filter(
        field1=val1,
        field2=val2,
    ).update(
        field3=val3,
        field4=val4,
    )
(but the original call is typically done in a single query).
You probably thus should write:
Vmt.objects.update_or_create(
    cluster=c,
    creation_time='test creation time',
    defaults={
        'current_pm_active': 6,
        'current_pm_total': 5,
    },
)
(or something similar)
You should separate your fields into:
fields that should be searched for, and
fields that should be updated.
For example, if I have the model:
class User(models.Model):
    username = models.CharField(max_length=200)
    nickname = models.CharField(max_length=200)
and I want to search for username = 'Nikolas' and update that instance's nickname to 'Nik' (creating the User if none with username 'Nikolas' exists), I should write this code:
User.objects.update_or_create(
    username='Nikolas',
    defaults={'nickname': 'Nik'},
)
See https://docs.djangoproject.com/en/3.1/ref/models/querysets/
This is already answered well above.
To be clearer: update_or_create() takes in **kwargs the parameters on which you want to check whether the data already exists in the DB, by filtering, much like:
select some_column from table_name where column1='' and column2='';
Filtering by **kwargs gives you objects. If you then wish to update any column of those filtered objects, pass the new values in the defaults param of update_or_create().
So if an object is found based on the filter, the values in defaults are picked up and applied as an update; if no matching object is found, an entry is created with both the filter values and the defaults passed.
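To make the behaviour concrete, a small sketch using the Vmt model from the question (the field values are illustrative):
import datetime

obj, created = Vmt.objects.update_or_create(
    cluster=c,
    added=datetime.date.today(),          # filter on the unique_together fields
    defaults={'current_pm_active': '6'},  # applied as an update on a match
)
# created is False when the existing (cluster, added) row was updated in place.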

Flask SQLAlchemy custom ORM? [duplicate]

This question already has an answer here:
SQLAlchemy: Convert column value back and forth between internal and database format
(1 answer)
Closed 4 years ago.
My model is like this:
class A(db.Model):
    __tablename__ = 'tablename'
    id = Column(Integer, primary_key=True)
    users = Column(String(128))
But most of the time I use the users field as a list.
In a Java ORM I can declare this field as a list just by telling the framework how to map the string to a list and the list back to a string.
So I wonder whether there is any way to do this in Flask.
You can create a custom type with TypeDecorator:
import sqlalchemy.types as types

class MyList(types.TypeDecorator):
    impl = types.String

    def process_bind_param(self, value, dialect):
        # Python list -> comma-separated string on the way into the database.
        return ','.join(value)

    def process_result_value(self, value, dialect):
        # Comma-separated string -> Python list on the way out.
        return value.split(',')

class A(db.Model):
    __tablename__ = 'tablename'
    id = Column(Integer, primary_key=True)
    users = Column(MyList)

a = A(users=['user1', 'user2'])
db.session.add(a)
db.session.commit()
A.query.first().users
>> [u'user1', u'user2']