I am trying to retrieve a single row from a table. This row contains fields that hold foreign keys into another table, which in turn is related to yet another table. I want just one row returned, but the problem is that it returns not only the row but ALL the objects that are related to that table as well. As I have to deal with a fairly large amount of data, the returned object is very cumbersome, since it contains all the related data as well. In some cases my script simply times out because there is just far too much data to grab.
My question is: is there a way to retrieve just a single record without the associated fluff? I am basically accessing the table via the entityManager from the repository, then trying to get my record using the ->find($id) method.
I am sure this is something stupidly simple, but I can't seem to figure it out. Thanks in advance for any help; it is much appreciated.
Doctrine 2 uses "lazy loading", which means that the associated objects are not actually retrieved from the database until you try to access them.
So find($id) is just fine.
I am a little confused about which is better for soft delete.
There are two ways to do a soft delete.
Create a separate table for deleted records. (In this approach we copy the record into the deleted-records table, then delete it from its own table.)
Create an extra column called deleted. (In this approach we only set this field to true, and when displaying records we filter on this extra field.)
Also, I want to store the changes made to records after every update, so I think creating an extra table is more suitable. What is your opinion?
I agree with @web-engineer, adding a nullable column holding the datetime at which the row was soft-deleted is the best approach. I used this resource to do it.
And to answer the second part of your question: yes, an extra table will be needed. There is a third-party app named django-simple-history which handles it for you.
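Usage is roughly like this (a minimal sketch; the model below is invented for the example, only the HistoricalRecords field comes from django-simple-history):

from django.db import models
from simple_history.models import HistoricalRecords

class Post(models.Model):
    title = models.CharField(max_length=200)
    history = HistoricalRecords()  # records every create/update/delete in a separate history table

Each save then writes a row to an auto-generated historical table, which covers the "store the changes after every update" requirement.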
The best option is the second one. In your first example it's not really a soft delete if you're deleting the row from the table - "soft" should mean modifying the data in a minimal way. Leaving the row in place is the whole point of a soft delete; it has the minimal effect on the data and retains all attributes such as the primary key index value and any internals you can't see that the database might use.
Your first option is far less succinct, as it means duplicating data structures. A common approach is to add a "deleted_at" column (defaulting to NULL), which positively identifies the record's state.
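As a rough illustration of that column approach in Django (a sketch only; the model, manager, and method names here are made up, not taken from the question):

from django.db import models
from django.utils import timezone

class ActiveManager(models.Manager):
    # Default manager that hides soft-deleted rows from normal queries.
    def get_queryset(self):
        return super().get_queryset().filter(deleted_at__isnull=True)

class Article(models.Model):
    title = models.CharField(max_length=200)
    deleted_at = models.DateTimeField(null=True, blank=True, default=None)

    objects = ActiveManager()        # Article.objects... skips deleted rows
    all_objects = models.Manager()   # includes soft-deleted rows when you need them

    def soft_delete(self):
        # Flag the row as deleted instead of removing it.
        self.deleted_at = timezone.now()
        self.save(update_fields=["deleted_at"])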
In a Django project, I'm refreshing tens of thousands of lines of data from an external API on a daily basis. The problem is that since I don't know if the data is new or just an update, I can't do a bulk_create operation.
Note: some, or perhaps many, of the rows do not actually change on a daily basis, but I don't know which ones, or how many, ahead of time.
So for now I do:
for row in csv_data:
    try:
        MyModel.objects.update_or_create(id=row['id'], defaults={'field1': row['value1'], ...})
    except:
        print 'error!'
And it takes.... forever! One or two lines a second at best, sometimes several seconds per line. Each model I'm refreshing has one or more other models connected to it through a foreign key, so I can't just delete them all and reinsert every day. I can't wrap my head around this one -- how can I significantly cut down the number of database operations so the refresh doesn't take hours and hours?
Thanks for any help.
The problem is that you are doing a database operation for each data row you grabbed from the API. You can avoid that by working out which of the rows are new (and doing a bulk insert for all of them), which rows actually need an update, and which didn't change.
To elaborate:
Grab all the relevant rows from the database (meaning all the rows that could possibly be updated):
old_data = MyModel.objects.all()  # if possible, narrow this down with MyModel.objects.filter(...)
Grab all the API data you need to insert or update:
api_data = [...]
For each row of data, determine whether it is new (and append it to an array for bulk creation) or whether it needs to update the DB:
new_rows_array = []
for row in api_data:
    if is_new_row(row, old_data):
        # bulk_create needs unsaved model instances (assuming the dict keys match the model fields)
        new_rows_array.append(MyModel(**row))
    else:
        if is_data_modified(row, old_data):
            ...
            # do the update
        else:
            continue
MyModel.objects.bulk_create(new_rows_array)
is_new_row - determines whether the row is new; if so, it is added to an array that will be bulk-created.
is_data_modified - looks the row up in the old data and checks whether its data has changed, so that you only update rows that actually changed.
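A rough sketch of what those two helpers might look like (an assumption-heavy example: it matches rows on an id key, first indexes old_data into a dict and passes that dict instead of the raw queryset, and uses placeholder field names):

# Index the existing rows by primary key once so the lookups below are cheap.
old_by_id = {obj.id: obj for obj in old_data}

def is_new_row(row, old_by_id):
    # A row is new if its id is not in the database yet.
    return row['id'] not in old_by_id

def is_data_modified(row, old_by_id):
    # Compare the incoming values with the stored object, field by field.
    obj = old_by_id[row['id']]
    return obj.field1 != row['value1']  # extend with whichever fields you care about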
If you look at the source code for update_or_create(), you'll see that it's hitting the database multiple times for each call (either a get() followed by a save(), or a get() followed by a create()). It does things this way to maximize internal consistency - for example, this ensures that your model's save() method is called in either case.
But you might well be able to do better, depending on your specific models and the nature of your data. For example, if you don't have a custom save() method, aren't relying on signals, and know that most of your incoming data maps to existing rows, you could instead try an update() followed by a bulk_create() if the row doesn't exist. Leaving aside related models, that would result in one query in most cases, and two queries at the most. Something like:
updated = MyModel.objects.filter(field1="stuff").update(field2="other")
if not updated:
    MyModel.objects.bulk_create([MyModel(field1="stuff", field2="other")])
(Note that this simplified example has a race condition; see the Django source for how to deal with it.)
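One hedged way to guard against that race (assuming field1 carries a unique constraint, which the thread doesn't state) is to catch the constraint error and retry the update:

from django.db import IntegrityError

updated = MyModel.objects.filter(field1="stuff").update(field2="other")
if not updated:
    try:
        MyModel.objects.bulk_create([MyModel(field1="stuff", field2="other")])
    except IntegrityError:
        # Another process inserted the row between our update and create; update it instead.
        MyModel.objects.filter(field1="stuff").update(field2="other")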
In the future there will probably be support for PostgreSQL's UPSERT functionality, but of course that won't help you now.
Finally, as mentioned in the comment above, the slowness might just be a function of your database structure and not anything Django-specific.
Just to add to the accepted answer: one way of recognizing whether the operation is an update or a create is to ask the API owner to include a last-updated timestamp with each row (if possible) and store it in your DB for each row. That way you only have to check the rows where this timestamp differs from the one in the API.
I faced exactly this issue where I was updating every existing row and creating new ones. It took a whole minute to update 8,000-odd rows. With selective updates, I cut my time down to just 10-15 seconds, depending on how many rows had actually changed.
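For illustration, the comparison might look roughly like this (a sketch that assumes the model stores the API timestamp in a last_updated field and each API row carries id and last_updated keys in the same format):

# One query to map the stored timestamps by id.
stored = dict(MyModel.objects.values_list('id', 'last_updated'))

# Only rows that are missing or whose timestamp changed need any database write.
changed_or_new = [
    row for row in api_data
    if row['id'] not in stored or row['last_updated'] != stored[row['id']]
]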
I think the two calls below, used together, can do the same thing as update_or_create:
MyModel.objects.filter(...).update()
MyModel.objects.get_or_create()
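Combined per incoming row, that might look roughly like this (field names are placeholders from the question, not a tested recipe):

updated = MyModel.objects.filter(id=row['id']).update(field1=row['value1'])
if not updated:
    MyModel.objects.get_or_create(id=row['id'], defaults={'field1': row['value1']})

Note that this still issues one or two queries per row; it mainly avoids the extra get() and save() that update_or_create performs.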
I am trying to write a program in C++ to read, manipulate, and update my database. I am having a problem inserting my data into Mongo. My workflow is: I get some type of request to update a document, I query the document and update the data, and then I try to do an update on the document.
I have a function that converts my class object to a BSONObj through a BSONObjBuilder. I seem to be having a problem with large arrays of sub-objects. For example, I have a field in my document called geo that looks like this:
geo: [{"postal": 10012},{"postal":10013},...,{"postal":90210}]
and is stored in C++ as:
std::vector<mongo::BSONObj> geo;
This field might have thousands of postal codes in it. When doing:
db.get()->update("db.collection", BSON("id" << id_), BSON("$set" << updateObj));
where updateObj is the object I got from my BSONObjBuilder, nothing is updated in Mongo. If I remove the geo field, everything is inserted.
I tried to just do
db.get()->update("db.collection", BSON("id" << id_), BSON("$set" << BSON("geo" << geo)));
thinking maybe it was necessary to do separate queries due to the size of the object, but this also resulted in no update.
I was wondering if somehow I was hitting some sort of BSON size limit in C++.
The only reason I believe it is a size limit is because while trying to debug this problem, I tried to call updateObj.toString() in order to print out the object I was trying to insert and it threw an exception: Element extends past end of object. I assume that this means I hit some type of max size of an object/element.
Any insight into this problem will be greatly appreciated.
Thank you
I seem to have figured it out. I retrieved the geo field in one function, stored it in a vector, and used it in another. I did not use .Obj().copy() when storing the object in the vector; I just stored the .Obj() from the query results, so when I went to insert, I guess the dangling pointers corrupted the BSONObj and caused the error.
I have a Qt QAbstractItemModel, and the underlying information is inside an sqlite database. I want to incrementally add rows from a database query to the model as they are needed. The database fetch may be slow, however. My problem is that beginInsertRows() needs to be called before the model is modified, but it needs to know how many rows will be added. I won't know that until after I do the query. This means I seem to have the following alternatives, all of which are unattractive.
Do two database queries: "SELECT COUNT(*) …" to get the number of result rows, then call beginInsertRows(), and do the real query. The downside, of course, is that a potentially expensive query has to be done twice.
Do my entire query, buffer the results, count the rows, then call beginInsertRows() and insert them into the model (a rough sketch of this approach appears after the question). The downside is all the extra buffering.
Call beginInsertRows()/endInsertRows() once for each row in the result set. This is going to cause a whole bunch of unnecessary view updates.
This seems like a general problem to me. Is there a general solution? For instance, is there a way to tell beginInsertRows() one thing and then change your mind?
Thanks.
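For what it's worth, the second alternative (buffer first, then announce the insert once the count is known) might look roughly like this; the sketch uses a Python Qt binding (PySide6) and an invented model and query just to show the call order:

import sqlite3
from PySide6.QtCore import QAbstractListModel, QModelIndex, Qt

class BufferedSqlModel(QAbstractListModel):
    def __init__(self, db_path, parent=None):
        super().__init__(parent)
        self._rows = []
        self._conn = sqlite3.connect(db_path)

    def rowCount(self, parent=QModelIndex()):
        return len(self._rows)

    def data(self, index, role=Qt.ItemDataRole.DisplayRole):
        if index.isValid() and role == Qt.ItemDataRole.DisplayRole:
            return str(self._rows[index.row()])
        return None

    def load_batch(self, query, params=()):
        fetched = self._conn.execute(query, params).fetchall()  # buffer the whole result first
        if not fetched:
            return
        first = len(self._rows)
        last = first + len(fetched) - 1
        self.beginInsertRows(QModelIndex(), first, last)  # the row count is known by now
        self._rows.extend(fetched)
        self.endInsertRows()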
I am inserting data into two tables; however, I cannot figure out (after hours of Googling) how to insert data into the second table after retrieving the new ID created by the first insert.
I'm using <CFINSERT>.
Use <CFQUERY result="result_name"> and the new ID will be available as result_name.generatedkey. <cfinsert> and <cfupdate>, while easy and fast for simple jobs, are pretty limited.
I have never used cfinsert myself, but this blog post from Ben Forta says you may not be able to use cfinsert if you need a generated key: http://www.forta.com/blog/index.cfm/2006/10/3/Use-CFINSERT-And-CFUPDATE
Yes, I realize that blog post is old, but it doesn't appear much has changed.
Why not use a traditional INSERT statement wrapped in a <cfquery> tag?