How to delete all database data with NHibernate? - unit-testing

Is it possible to delete all data in the database using NHibernate? I want to do that before each unit test starts. Currently I drop my database and recreate it, but that is not an acceptable solution for me.
==========================================================
OK, here are the results. I am testing this on a PostgreSQL database. I will test CreateSchema (1), DanP's solution (2) and apollodude217's solution (3). I ran the tests 5 times with each method and took the average time.
Round 1 - 10 tests
(1) - ~26 sec
(2) - 9.0 sec
(3) - 9.3 sec
Round 2 - 100 tests
(1) - Come on, I will not do that on my machine
(2) - 12.6 sec
(3) - 18.6 sec
I don't think it is necessary to test with larger batches than that.

I'm using the SchemaExport class and recreate the schema before each test. This is almost like dropping the database, but it only drops and recreates the tables. I assume that deleting all data from each table is not faster than this; it could even be slower.
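A minimal sketch of that setup, assuming a fully built NHibernate Configuration with mappings already added (the three booleans on this classic Execute overload mean, roughly: write script to stdout, execute against the DB, just drop):

using NHibernate.Cfg;
using NHibernate.Tool.hbm2ddl;

public static class SchemaHelper
{
    // Drops and re-creates every mapped table before a test runs.
    public static void RecreateSchema(Configuration configuration)
    {
        new SchemaExport(configuration).Execute(false, true, false);
    }
}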
Our unit tests usually run on SQLite in memory, which is very fast. This database exists only as long as the connection is open, so the whole database is recreated for each test. We switch to SQL Server by changing the build configuration.
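A sketch of that SQLite setup, assuming the classic NHibernate property keys (exact SchemaExport overloads vary a little between NHibernate versions). The crucial detail is that the schema must be exported on the very connection the session keeps open, because the in-memory database disappears when that connection closes:

using NHibernate;
using NHibernate.Cfg;
using NHibernate.Tool.hbm2ddl;

public static class InMemoryDatabase
{
    // Returns an ISession whose connection holds the in-memory DB open.
    public static ISession OpenInMemorySession(Configuration configuration)
    {
        configuration.SetProperty("dialect", "NHibernate.Dialect.SQLiteDialect");
        configuration.SetProperty("connection.driver_class", "NHibernate.Driver.SQLite20Driver");
        configuration.SetProperty("connection.connection_string", "Data Source=:memory:;Version=3;New=True;");

        var session = configuration.BuildSessionFactory().OpenSession();

        // Export the schema on the session's own connection; the database
        // vanishes as soon as this connection closes.
        new SchemaExport(configuration).Execute(false, true, false, session.Connection, null);
        return session;
    }
}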

Personally, I use a stored procedure to do this, but it may be possible with Executable HQL (see this post for more details: http://fabiomaulo.blogspot.com/2009/05/nh21-executable-hql.html )
Something along the lines of session.Delete("from object");
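A sketch of the executable-HQL variant, assuming an open ISession; the entity name is a placeholder for each of your mapped classes, and a bulk delete like this never loads the entities into memory:

using NHibernate;

public static class BulkDelete
{
    // Deletes every row for one mapped class without loading entities.
    // Call once per mapped class, children before parents, so foreign
    // keys don't block the deletes.
    public static void DeleteAll(ISession session, string entityName)
    {
        using (var tx = session.BeginTransaction())
        {
            session.CreateQuery("delete from " + entityName).ExecuteUpdate();
            tx.Commit();
        }
    }
}

Used as, for example, BulkDelete.DeleteAll(session, "MappedClass") for each mapped class.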

I do not claim this is faster, but you can do something like this for each mapped class:
// untested
var entities = MySession.CreateCriteria(typeof(MappedClass)).List<MappedClass>();
foreach(var entity in entities)
MySession.Delete(entity); // please optimize
This (alone) will not work in at least 2 cases:
When there is data that must be in your database when the app starts up.
When you have a type where the identity property's unsaved-value is "any".

A good alternative is having a backup of the initial DB state and restoring it when starting tests (this can be complex or not, depending on the DB)

Re-creating the database is a good choice, especially for unit testing. If the creation script is too slow you could take a backup of the database and use it to restore the DB to an initial state before each test.
The alternative would be to write a script that drops all foreign keys in the database and then deletes or truncates all tables. However, this would not reset any autogenerated IDs or sequences. This doesn't seem like an elegant solution, and it is definitely more time-consuming.
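For what it's worth, PostgreSQL (which the asker is using) can sidestep both problems in one statement: TRUNCATE ... RESTART IDENTITY CASCADE follows foreign keys and resets sequences. A sketch using Npgsql, with the table list discovered at runtime:

using System.Collections.Generic;
using Npgsql;

public static class PostgresCleaner
{
    // Truncates every table in the public schema of a test database.
    // CASCADE handles foreign keys; RESTART IDENTITY resets sequences.
    public static void TruncateAllTables(string connectionString)
    {
        using (var connection = new NpgsqlConnection(connectionString))
        {
            connection.Open();

            var tables = new List<string>();
            using (var list = new NpgsqlCommand(
                "SELECT tablename FROM pg_tables WHERE schemaname = 'public'",
                connection))
            using (var reader = list.ExecuteReader())
            {
                while (reader.Read())
                    tables.Add("\"" + reader.GetString(0) + "\"");
            }

            if (tables.Count == 0)
                return;

            var sql = "TRUNCATE TABLE " + string.Join(", ", tables)
                      + " RESTART IDENTITY CASCADE";
            using (var truncate = new NpgsqlCommand(sql, connection))
                truncate.ExecuteNonQuery();
        }
    }
}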
In any case, this is not something that should be done through an ORM, and that goes for any ORM, not just NHibernate.
Why do you reject the re-creation option? What are your requirements? Is the schema too complex? Does someone else design the database? Do you want to avoid file fragmentation?

Another solution might be to create a stored procedure that wipes the data, and run it first in your test setup or initialization method.
However, I am not sure whether this is quicker than any of the other methods, since we don't know the size of the database and the number of rows likely to be deleted. Also, I would not recommend deploying this stored procedure to the live server, for safety reasons!
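The setup call itself is small; a sketch with NUnit, assuming an open ISession and a hypothetical procedure name WipeTestData:

using System.Data;
using NUnit.Framework;

public abstract class WipedDatabaseTest
{
    protected NHibernate.ISession session; // assumed to be opened elsewhere

    [SetUp]
    public void WipeData()
    {
        // Runs the (hypothetical) data-wiping proc before each test.
        using (var command = session.Connection.CreateCommand())
        {
            command.CommandText = "WipeTestData";
            command.CommandType = CommandType.StoredProcedure;
            command.ExecuteNonQuery();
        }
    }
}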
HTH

Related

Database unit test

I am hoping to get some advice on a unit test I am writing to test some DB entries.
The function I am testing seeds the database if no records are found.
func Seed(db *gorm.DB) {
    var data []Data
    db.Find(&data)
    if len(data) == 0 {
        // do seed default data
    }
}
What I can't quite get working is the test for that if len(data) == 0 check. I am using a test DB, so I can nuke it whenever I want; it is not an issue if I just need to force an empty DB on the function.
The function itself works and I just want to make sure I get that covered.
Any advice would be great.
Thanks!
It really depends; there are many ways of addressing this, depending on your risk level and the amount of time you want to invest to mitigate those risks.
You could write a unit test that asserts you are able to detect and act on the seeding logic (i.e. seeding when empty and ignoring when full) without any database.
If you would like to test the logic as well as your program's ability to speak to MySQL correctly through the gorm library, you could:
Have a test where you call Seed with no records in the DB; after calling it, your test could select from the table and make sure the expected entries were created by the len(data) == 0 conditional.
Have a test where the test creates a single entry and calls Seed, afterwards asserting that no additional seed entries were created.
It can get more complicated. If Seed selects a subset of data, your test could insert two records, one that qualifies and one that doesn't, and make sure that no new records are seeded.

What's more efficient? Read and Write If... or always write to db?

I have a database table that has a column which is updated relatively frequently.
The question is:
Is it more efficient to avoid always writing to the database by reading the object first (SELECT ... WHERE) and comparing the values to determine whether an update is even necessary,
or to always just issue an update (UPDATE ... WHERE) without checking the current state?
I think that the first approach would be more hassle, as it consists of two DB operations instead of just one, but we could also avoid an unnecessary write.
I also question whether we should even think about this, as our DB will most likely not reach 100k records in this table anytime soon, so even if the update were more costly it wouldn't be an issue; but please correct me if I'm wrong.
The database is PostgreSQL 9.6
It will avoid I/O load on the database if you only perform the updates that are necessary.
You can include the test in the UPDATE itself, as in
UPDATE mytable
SET mycol = 'avalue'
WHERE id = 42
AND mycol <> 'avalue';
The only downside is that triggers will not be called unless the value really changes.

C++ Builder - Multithreaded database update

I use ADO components in C++ Builder and I need to add about 200,000 records to my MS Access database. If I add those records one by one, it takes a lot of time, so I wanted to use threads. Each thread would create a TADOTable, connect to the database and insert its own rows. But when running the application, it is even slower than using only one thread!
So, how should I do it? I need to add many records to my Access database but want to avoid one-by-one insertion. Code would be useful.
Thank you.
First of all, multithreading will not increase the speed of inserts; it will slow things down because of context switching and similar overhead. What you need is a way to do bulk inserts, that is, sending multiple rows in a single transaction.
Try searching for bulk inserts into Access tables. There is a lot of information out there.

Unit Testing & Primary Keys

I am new to Unit Testing and think I might have dug myself into a corner.
In your Unit Tests, what is the better way to handle primary keys?
Hopefully an example will paint some context. Suppose I create several instances of an object (let's say Person).
My unit test is to check that the correct relationships are being created.
My code creates Homer, his children Bart and Lisa, and also his friends Barney, Karl & Lenny.
I've separated my data layer with an interface. My preference is to keep the primary key simple, e.g. on save, Person.PersonID = new Random().Next(10000); instead of, say, Barney.PersonID = 9110, Homer.PersonID = 3243, etc.
It doesn't matter what the primary key is; it just needs to be unique.
Any thoughts???
EDIT:
Sorry I haven't made it clear. My project is set up to use Dependency Injection. The data layer is totally separate. The focus of my question is: what is practical?
I have a class called "Unique" which produces unique objects (strings, integers, etc). It makes sure they're unique per test by keeping an internal static counter. That counter value is incremented per key generated and included in the key somehow.
So when I'm setting up my test
var foo = new Foo { ID = Unique.Integer() };
I like this as it communicates that the value is not important for this test, just the uniqueness.
I have a similar class 'Some' that does not guarantee uniqueness. I use it when I need an arbitrary value for a test. It's useful for enums and entity objects.
None of these are threadsafe or anything like that; it's strictly test code.
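A minimal sketch of such a helper (names and details are guesses at the idea described, not the poster's actual code):

// Per-test-run unique value factory. Deliberately not thread-safe;
// strictly test code.
public static class Unique
{
    private static int counter;

    public static int Integer()
    {
        return ++counter;
    }

    public static string String(string prefix)
    {
        return prefix + "-" + (++counter);
    }
}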
There are several possible corners you may have dug yourself into that could ultimately lead to the question that you're asking.
Maybe you're worried about re-using primary keys and overwriting or incorrectly loading data that's already in the database (say, if you're testing against a dev database as opposed to a clean test database). In that case, I'd recommend you set up your unit tests to create their records' PKs using whatever sequence a normal application would or to test in a clean, dedicated testing database.
Maybe you're concerned about the efficacy of your code with PKs beyond a simple 1,2,3. Rest assured, this isn't something one would typically test for in a straightforward application, because most of it is outside the concern of your application: generating a number from a sequence is the DB vendor's problem, keeping track of a number in memory is the runtime/VM's problem.
Maybe you're just trying to learn what the best practice is for this sort of thing. I would suggest you set up the database by inserting records before executing your test cases, using the same facilities that your application itself will use to insert records; presumably your application code will rely on a database-vended sequence number for PKs, and if so, use that. Finally, after your test cases have executed, your tests should roll back any changes they made to the database to ensure the test is idempotent over multiple executions. This is my sorry attempt at describing a design pattern called test fixtures.
Consider using GUIDs. They're unique across space and time, meaning that even if two different computers generated one at the exact same instant, the results would differ; for practical purposes they can be treated as guaranteed unique. Random numbers are never a good choice here, as there is a considerable risk of collision.
You can generate a Guid using the static class and method:
Guid.NewGuid();
Assuming this is C#.
Edit:
Another thing, if you just want to generate a lot of test data without having to code it by hand or write a bunch of for loops, look into NBuilder. It might be a bit tough to get started with (Fluent methods with method chaining aren't always better for readability), but it's a great way to create a huge amount of test data.
Why use random numbers? Does the numeric value of the key matter? I would just use a sequence in the database and call nextval.
The essential problem with database unit testing is that primary keys do not get reused. Rather, the database creates a new key each time you create a new record, even if you delete the record with the original key.
There are two basic ways to deal with this:
Read the generated Primary Key, from the database and use it in your tests, or
Use a fresh copy of the database each time you test.
You could put each test in a transaction and roll the transaction back when the test completes, but rolling back transactions doesn't always work with Primary Keys; the database engine will still not reuse keys that have been generated once (in SQL Server anyway).
When a test executes against a database through another piece of code, it ceases to be a unit test. It is called an "integration test" because you are testing the interactions of different pieces of code and how they "integrate" together. Not that it really matters, but it's fun to know.
When you perform a test, the following things should occur:
Begin a db transaction
Insert known (possibly bogus) test items/entities
Call the (one and only one) function to be tested
Test the results
Rollback the transaction
These things should happen for each and every test. With NUnit, you can get away with writing step 1 and 5 just once in a base class and then inheriting from that in each test class. NUnit will execute Setup and Teardown decorated methods in a base class.
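A sketch of that base class, assuming NUnit and System.Transactions; disposing a TransactionScope that was never completed rolls everything back:

using System.Transactions;
using NUnit.Framework;

// Steps 1 and 5 written once. NUnit runs [SetUp] and [TearDown] methods
// declared on base classes for every derived test.
public abstract class TransactionalTestBase
{
    private TransactionScope scope;

    [SetUp]
    public void BeginTransaction()
    {
        scope = new TransactionScope(); // step 1
    }

    [TearDown]
    public void RollbackTransaction()
    {
        scope.Dispose(); // step 5: Complete() was never called, so roll back
    }
}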
In step 2, if you're using SQL, you'll have to write your queries such that they return the PK numbers back to your test code.
INSERT INTO Person(FirstName, LastName)
VALUES ('Fred', 'Flintstone');
SELECT SCOPE_IDENTITY(); --SQL Server example, other db vendors vary on this.
Then you can do this
DECLARE @wifeId INT;
-- @husbandId was captured from the first insert above

INSERT INTO Person(FirstName, LastName, SpouseId)
VALUES ('Wilma', 'Flintstone', @husbandId);
SET @wifeId = SCOPE_IDENTITY();
UPDATE Person SET SpouseId = @wifeId
WHERE Person.Id = @husbandId;
SELECT @wifeId;
or whatever else you need.
In step 4, if you use SQL, you have to re-SELECT your data and test the values returned.
Steps 2 and 4 are less complicated if you are lucky enough to be able to use a decent ORM like (N)Hibernate (or whatever).

Memcache db models to make search more efficient

I need to set up some kind of e-store with search functionality.
For every search request I have to query a structure like this:
product:
-name
-tags
--tag
-ingredients
--ingredient
---tags
----tag
---options
----option
-----option details
-variants
--variant
---tags
----tag
---options
----option measure
----value
---price
Now imagine the number of queries... The database is normalized (to 2nd normal form, I guess).
It seems to me that one obvious solution here is to store each fetched model result set (product set, ingredient set, attribute set, tag set, etc.) in memory for a very long time (products and their attributes are updated rarely, and only by an admin) and query from there.
So what do you think? Is there a better way to reduce the DB query count?
Another option I thought about is to use Sphinx, but I don't need full-text search at all, just exact matches on tag-like fields.
Thank you in advance!
On my Google App Engine app I normally move things from the datastore to memcache and work with them there, since querying for the data can take a lot of time. Memcache, in my case, returns the data with less CPU load than fetching it from the datastore, which can go through a number of queries until it gets what it is looking for.
I would recommend setting a long timeout on your memcache entries so that memcache doesn't flush them more often than you are expecting. I think the maximum timeout is up to 1 month, but normally setting it for a couple of days will suffice.
You can always add code to flush the memcache entry when the data for a product has been updated, so that you take the DB hit again, but only once.
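The pattern being described is plain cache-aside. An illustrative C# sketch, with MemoryCache standing in for memcache and LoadProductsFromDatabase as a hypothetical placeholder for the expensive query:

using System;
using System.Runtime.Caching;

public static class ProductCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static object GetProducts()
    {
        var products = Cache.Get("products");
        if (products == null)
        {
            products = LoadProductsFromDatabase(); // hypothetical placeholder
            // Long expiration: product data changes rarely, and only by admin.
            Cache.Set("products", products, DateTimeOffset.Now.AddDays(2));
        }
        return products;
    }

    // Call this from the admin update path so readers see fresh data after
    // at most one more DB hit.
    public static void Invalidate()
    {
        Cache.Remove("products");
    }

    private static object LoadProductsFromDatabase()
    {
        return new object(); // stands in for the expensive multi-join query
    }
}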