Change schema attribute cardinality based on another attribute - clojure

I have the following pseudo schemas:
A)
-- Cost-schedule: FRE494
-- Periodic: false
-- Type: Fixed
-- Value: 70.00
-- CCY: GBP
B)
-- Cost-schedule: GHK999
-- Periodic: true
-- Period start: 01/04/2015
-- Period end: 30/04/2015
-- Type: Promise
-- Filled: false
-- Value: 0.00
-- CCY: GBP
I am trying to avoid any kind of nasty hierarchy with a superclass "Cost-Schedule" and subclasses "Periodic" and "One-off". Firstly, I am using Clojure, which is not OO. I also don't want to fall into the Liskov Substitution trap.
So, as a newbie to Datomic: is there a way to dynamically change the schema so that an attribute's cardinality is modified based on another attribute's value? In this case, if Periodic is "false" we don't need Period-Start and Period-End; if Periodic is "true" then we need to enforce values for those attributes.
My gut says this is not possible. If not, how can I enforce this in the DB? It appears to me that if I have to explicitly validate the transaction before submitting it to the transactor, then I am really just defining a schema outside the constraints of Datomic, which doesn't seem wise given that many micro-systems will be writing to and reading from the DB, and coordinating humans to write 'correct' code is difficult!
Any help on how to overcome this challenge very gratefully received.

I see two sub-answers to your question.
The first is that Datomic does not define "objects". An entity is really closer to a plain map. Your entity B has 3 fields that entity A does not. That is fine, and is not controlled in any way by Datomic. Each attribute-value pair can be added to any entity independently of every other entity. The fact that one map has 4 entries tells you nothing about another map having 7 entries, even if every key in map A is also present in map B.
The second sub-answer is that your app must do all validation & integrity checking - Datomic won't. There is no analogue to SQL's "UNIQUE NOT NULL", etc. However, Datomic does support database functions, which get a chance to abort any transaction that fails a user-supplied test. So this is one way of enforcing data integrity checks.
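For example, such a check can be installed as a database function and invoked as a transaction function, so the transactor itself rejects bad data no matter which micro-system writes it. A minimal sketch, assuming the on-prem peer API; the :cost-schedule/* attributes and the :schedule/add function name are invented here:

import datomic.Connection;
import datomic.Peer;
import datomic.Util;
import java.util.List;

public class ScheduleGuard {
    public static void main(String[] args) throws Exception {
        String uri = "datomic:mem://cost-schedules"; // throwaway in-memory db
        Peer.createDatabase(uri);
        Connection conn = Peer.connect(uri);

        // Install a database function. Its body is Clojure code stored in
        // the database itself; throwing inside it aborts the transaction.
        conn.transact((List) Util.read(
            "[{:db/ident :schedule/add"
          + "  :db/fn #db/fn {:lang \"clojure\""
          + "                 :params [db sched]"
          + "                 :code (if (and (:cost-schedule/periodic sched)"
          + "                                (not (and (:cost-schedule/period-start sched)"
          + "                                          (:cost-schedule/period-end sched))))"
          + "                         (throw (ex-info \"periodic schedule needs period-start and period-end\" {}))"
          + "                         [sched])}}]")).get();

        // Once the :cost-schedule/* attributes are installed, writers route
        // every insert through the function, so the rule holds for all peers:
        //   conn.transact((List) Util.read(
        //       "[[:schedule/add {:cost-schedule/periodic true}]]")).get();
        //   // => throws, aborting the transaction
    }
}

This keeps the invariant next to the data rather than scattered across client codebases, which addresses the coordination worry above.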
Please also check out Tupelo Datomic, a library I wrote to make working with Datomic easier.

Related

When to use Datomic Upsert?

I'm referring to :db/unique :db.unique/identity
(as opposed to :db.unique/value)
Upsert, in my naivety, sounds a bit scary: if I try to insert a new record which has the same value (for a field declared unique) as an existing record, my feeling is that I want that to fail.
What I'd do if a field is declared unique is, when creating or updating another record, check whether that value is taken, and if it is, give the user feedback that it's taken.
What am I missing about upsert, and when/why is it useful?
A search revealed the following (not in relation to Datomic):
"upsert helps avoid the creation of duplicate records and can save you time (if inserting a group) because you don't have to determine which records exist first".
What I don't understand is that the Datomic docs sometimes suggest using this, but I don't see why it's better in any way. Saving time at the cost of allowing collisions?
e.g., if I have a user registration system, I definitely do not want to "upsert" on the email the user submits. I can't think of a case when it would be useful - other than the one quoted above, if you had a large collection of things and you "didn't have time" to check whether they existed first.
Datomic's :db/unique property can be used with :db.unique/identity or :db.unique/value.
If used with :db.unique/identity, it will upsert.
If used with :db.unique/value, it will conflict.
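A small end-to-end sketch of the difference, using the peer API against an in-memory database (the :user/* attribute names are invented for illustration):

import datomic.Connection;
import datomic.Peer;
import datomic.Util;
import java.util.List;

public class UpsertDemo {
    public static void main(String[] args) throws Exception {
        String uri = "datomic:mem://upsert-demo";
        Peer.createDatabase(uri);
        Connection conn = Peer.connect(uri);

        // Declare :user/email as a unique *identity* attribute.
        conn.transact((List) Util.read(
            "[{:db/ident :user/email :db/valueType :db.valueType/string"
          + "  :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}"
          + " {:db/ident :user/name :db/valueType :db.valueType/string"
          + "  :db/cardinality :db.cardinality/one}]")).get();

        conn.transact((List) Util.read(
            "[{:user/email \"homer@example.com\" :user/name \"Homer\"}]")).get();

        // Same email again: this upserts, i.e. it resolves to the existing
        // entity and updates its :user/name instead of creating a duplicate.
        conn.transact((List) Util.read(
            "[{:user/email \"homer@example.com\" :user/name \"Homer J.\"}]")).get();

        // With :db/unique :db.unique/value in the schema instead, the second
        // transact above would throw a unique-conflict exception.
    }
}

So for the registration example above, :db.unique/value gives exactly the fail-on-collision behaviour you want, while :db.unique/identity suits natural keys you deliberately use to address entities (e.g. import jobs that repeatedly assert the same external id).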

Oracle FK constraints are not enforced

I'm using an Oracle 12c database and want to test out one problem.
When carrying out a web service request, it returns an underlying ORA-02292 error naming the constraint (YYY.FK_L_TILSYNSOBJEKT_BEGRENSNING).
Here is SQL of the table with the constraint:
CONSTRAINT "FK_L_TILSYNSOBJEKT_BEGRENSNING" FOREIGN KEY ("BEGRENSNING")
REFERENCES "XXX"."BEGRENSNING" ("IDSTRING") DEFERRABLE INITIALLY DEFERRED ENABLE NOVALIDATE
The problem is that when I manually delete a row with a valid IDSTRING (present in both tables) from the parent table, it succeeds.
What cause it to behave this way? Is there any other info I should give?
Not sure if it helps someone, since it was a fairly stupid mistake, but I'll try to make it useful, since people demand answers.
The keywords DEFERRABLE INITIALLY DEFERRED mean that the constraint is enforced at commit time, not at statement time. INITIALLY IMMEDIATE, by contrast, does the check right after you issue each statement. Immediate checking makes bulk updates a bit slower (every statement in a transaction has to be checked against the constraint, whereas with initial deferral, if it turns out there is an issue, the whole bulk is rolled back at commit and no additional unnecessary checks are issued along the way), hence it is used less often than initial deferral.
Note that error ORA-02292 is raised only for DELETE statements; knowing that makes your statements a bit easier to debug.
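In other words, the DELETE itself is allowed through, and the check fires at COMMIT. A quick JDBC sketch of what that looks like from client code (connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class DeferredFkDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//db-host:1521/XXX", "user", "password")) {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // Succeeds immediately, even though child rows still reference it:
                st.executeUpdate(
                    "DELETE FROM BEGRENSNING WHERE IDSTRING = 'some-id'");
            }
            conn.commit(); // the deferred FK is checked here, and this is where it throws
        }
    }
}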

The differences between GeneratedValue strategies

In the Doctrine docs they mention that there exist a few different strategies for the @GeneratedValue annotation:
AUTO
SEQUENCE
TABLE
IDENTITY
UUID
CUSTOM
NONE
Would someone please explain the differences between all these strategies?
Check the latest Doctrine documentation.
Here is a summary of the possible generation strategies:
AUTO (default): Tells Doctrine to pick the strategy that is preferred by the database platform in use. The preferred strategies are IDENTITY for MySQL, SQLite and MSSQL, and SEQUENCE for Oracle and PostgreSQL. This strategy provides full portability.
SEQUENCE: Tells Doctrine to use a database sequence for ID generation. This strategy does not currently provide full portability. Sequences are supported by Oracle, PostgreSQL and SQL Anywhere.
IDENTITY: Tells Doctrine to use special identity columns in the database that generate a value on insertion of a row. This strategy does not currently provide full portability and is supported by the following platforms:
MySQL/SQLite/SQL Anywhere => AUTO_INCREMENT
MSSQL => IDENTITY
PostgreSQL => SERIAL
TABLE: Tells Doctrine to use a separate table for ID generation. This strategy would provide full portability, but it is not yet implemented!
NONE: Tells Doctrine that the identifiers are assigned (and thus generated) by your code. The assignment must take place before a new entity is passed to EntityManager#persist. NONE is the same as leaving off the @GeneratedValue entirely.
SINCE VERSION 2.3:
UUID: Tells Doctrine to use the built-in Universally Unique Identifier generator. This strategy provides full portability.
Of course the accepted answer is correct, but it needs a minor update, as follows:
According to Annotation section of the documentation:
This annotation is optional and only has meaning when used in conjunction with @Id.
If this annotation is not specified together with @Id, the NONE strategy is used as the default.
The strategy attribute is optional.
According to Basic Mapping section of the documentation:
SEQUENCE: Tells Doctrine to use a database sequence for ID generation. This strategy does not currently provide full portability. Sequences are supported by Oracle, PostgreSQL and SQL Anywhere.
IDENTITY: Tells Doctrine to use special identity columns in the database that generate a value on insertion of a row. This strategy does not currently provide full portability and is supported by the following platforms:
MySQL/SQLite/SQL Anywhere (AUTO_INCREMENT)
MSSQL (IDENTITY)
PostgreSQL (SERIAL).
Downvote
Regarding the downvote given by someone: it should be noted that SQL Anywhere has been added, and the accepted answer needs a minor update.
From the perspective of a programmer, they all achieve the same result: they provide a UNIQUE value for the primary key field. Strictly speaking, two further conditions are also met, namely that the key must be mandatory and not null.
The only differences lie in the internal implementations which provide the primary key value. In addition, there are performance and database-compatibility factors which also need to be considered. Different databases support different strategies.
The easiest one to understand is SEQUENCE, and it is generally also the one which yields the best performance. Here, the database maintains an internal sequence whose nextval is accessed by an additional SQL call, as illustrated below:
SELECT nextval ('hibernate_sequence')
The next value is allocated during insertion of each new row. Despite the additional SQL call, the performance impact is negligible. With SEQUENCE, it is possible to specify the initial value (default is 1) and the allocation size (default is 50) using the @SequenceGenerator annotation:
@SequenceGenerator(name="seq", initialValue=1, allocationSize=100)
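The generator is then referenced by name from @GeneratedValue on the ID field, roughly like so (a sketch continuing the example above):

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seq")
private Long id;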
The IDENTITY strategy relies on the database to generate the primary key by maintaining an additional column in the table whose next value is automatically generated whenever a new row is inserted. A separate identity generator is required for each type hierarchy.
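Selecting it follows the same pattern as the other snippets here (the entity name is just an example):

import javax.persistence.*;

@Entity
public class Book {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // ...
}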
The TABLE strategy relies on a separate table to store and update the sequence with each new row insertion. It uses pessimistic locks to maintain the sequence and as a result is the slowest of these strategies. It may be worth noting that a @TableGenerator annotation can be used to specify the generator name, table name and schema for this strategy:
@TableGenerator(name="book_generator", table="id_generator", schema="bookstore")
With the UUID option, the persistence provider (e.g. Hibernate) generates a universally unique ID of the form '8dd5f315-9788-4d00-87bb-10eed9eff566'. To select this option, simply apply the @GeneratedValue annotation above a field declaration whose data type is UUID, e.g.:
@Entity
public class UUIDDemo {

    @Id
    @GeneratedValue
    private UUID uuid;

    // ...
}
Finally, the AUTO strategy is the default and with this option, the persistence provider selects the optimal strategy for the database being used.

Designing a (recursive) log object for an event logging system

I'm designing a system that is supposed to store events. Each event has three basic properties:
1. timestamp (64bit)
2. key (what it is).
3. value (the actual value for the event).
Event keys are usually strings, event values are almost always numbers.
Simple so far, but here it gets a bit muddy. The event system is supposed to allow drilldown to a very deep level. What this means is best illustrated with an example:
NB: Leaving out timestamp for brevity.
key: hits // might be per hour, might be in the last second; the key is application specific - it's up to the user to figure out how often his application reports this event to us.
value: 12000
// and here the drilldown starts.
    key: US
    value: 5000
        key: State1
        value: 2000
            key: City1
            value: 500
    key: UK
    value: 5000
        key: StateN
        value: 20
// to an arbitrary level.
So, as you can see above, the value actually turns into a tree.
One might say: well, why not store each k/v pair independently and maintain a "parent key"? That would be inefficient due to the increased write load (and eventually, when events are looked up, read load). It would be much more efficient to write them out in one operation and read back the entire object in one go.
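To make the shape concrete, the object could look roughly like this (Java just for illustration; in practice it would be the serialization framework's generated class, and all names here are made up):

import java.util.ArrayList;
import java.util.List;

public class Event {
    public long timestamp;                           // 64-bit timestamp
    public String key;                               // e.g. "hits", "US", "State1"
    public double value;                             // almost always a number
    public List<Event> children = new ArrayList<>(); // the drilldown levels

    public Event(long timestamp, String key, double value) {
        this.timestamp = timestamp;
        this.key = key;
        this.value = value;
    }
}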
I'm wondering how best to design this. The objects are essentially a C++ class (although, for inter-operability, it's actually a serialization framework ala protocol buffers/thrift).
The event system is application agnostic, but I want a nice API to provide the clients that's intuitive.
Have you designed something like this before? Thoughts? What do you think is the best way to go about it?
Thank you in advance.
P.S: A few million events are expected per day, and we'll be building graphs based off the data.
Can you extend the log file definition to provide a "group" or "package" type log tag?
For example:
group: US
    key: State1
    value: 7000
    key: State2
    value: 65191
group: UK ...
That way you could handle parsing per group... if that's what you are looking for.
One idea that comes to mind is to also give your entries a fourth property: a parent log entry id. With an ORM like ActiveRecord you can then form a natural tree. For example:
class LogEntry < ActiveRecord::Base
  belongs_to :parent, class_name: 'LogEntry', optional: true
  has_many :children, class_name: 'LogEntry', foreign_key: 'parent_id'
end
(Adjust the association and column names to your schema, but you get the idea.)
There are various implementations of the ActiveRecord scheme in various languages, so that'd be pretty language (and DB) agnostic.

Unit Testing & Primary Keys

I am new to Unit Testing and think I might have dug myself into a corner.
In your Unit Tests, what is the better way to handle primary keys?
Hopefully an example will paint some context. Suppose I create several instances of an object (let's say Person).
My unit test is to check that the correct relationships are being created.
My code creates Homer and his children Bart and Lisa. He also has friends Barney, Karl & Lenny.
I've separated my data layer with an interface. My preference is to keep the primary key simple, e.g. on Save, Person.PersonID = new Random().Next(10000); rather than, say, Barney.PersonID = 9110, Homer.PersonID = 3243, etc.
It doesn't matter what the primary key is, it just needs to be unique.
Any thoughts???
EDIT:
Sorry, I haven't made it clear. My project is set up to use Dependency Injection. The data layer is totally separate. The focus of my question is: what is practical?
I have a class called "Unique" which produces unique objects (strings, integers, etc). It makes sure they're unique per test by keeping an internal static counter. That counter value is incremented per key generated, and included in the key somehow.
So when I'm setting up my test:
var foo = new Foo
{
    Id = Unique.Integer()
};
I like this as it communicates that the value is not important for this test, just the uniqueness.
I have a similar class, 'Some', that does not guarantee uniqueness. I use it when I need an arbitrary value for a test. It's useful for enums and entity objects.
None of these are threadsafe or anything like that; it's strictly test code.
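A sketch of such a helper (Java here, but the idea ports anywhere; the method names mirror the Unique.Integer() call above):

public final class Unique {
    private static long counter = 0; // internal static counter; not threadsafe, test code only

    // Mirrors Unique.Integer() above: a fresh number per call.
    public static long integer() {
        return ++counter;
    }

    // The counter value is embedded in the key to keep strings unique too.
    public static String string(String prefix) {
        return prefix + "-" + (++counter);
    }
}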
There are several possible corners you may have dug yourself into that could ultimately lead to the question that you're asking.
Maybe you're worried about re-using primary keys and overwriting or incorrectly loading data that's already in the database (say, if you're testing against a dev database as opposed to a clean test database). In that case, I'd recommend you set up your unit tests to create their records' PKs using whatever sequence a normal application would or to test in a clean, dedicated testing database.
Maybe you're concerned about the efficacy of your code with PKs beyond a simple 1,2,3. Rest assured, this isn't something one would typically test for in a straightforward application, because most of it is outside the concern of your application: generating a number from a sequence is the DB vendor's problem, keeping track of a number in memory is the runtime/VM's problem.
Maybe you're just trying to learn what the best practice is for this sort of thing. I would suggest you set up the database by inserting records before executing your test cases, using the same facilities that your application itself will use to insert records; presumably your application code will rely on a database-vended sequence number for PKs, and if so, use that. Finally, after your test cases have executed, your tests should roll back any changes they made to the database, to ensure the test is idempotent over multiple executions. This is my sorry attempt at describing the design pattern called test fixtures.
Consider using GUIDs. They're unique across space and time, meaning that even if two different computers generated them at the same exact instant in time, they would still be different. In other words, they're guaranteed to be unique. Random numbers are no good here; there is a considerable risk of collision.
You can generate a Guid using the static class and method:
Guid.NewGuid();
Assuming this is C#.
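(For anyone porting the idea to Java, the analogue is java.util.UUID:)

import java.util.UUID;

public class UuidDemo {
    public static void main(String[] args) {
        // Random (version 4) UUID; collisions are vanishingly unlikely.
        System.out.println(UUID.randomUUID());
    }
}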
Edit:
Another thing: if you just want to generate a lot of test data without having to code it by hand or write a bunch of for loops, look into NBuilder. It might be a bit tough to get started with (fluent methods with method chaining aren't always better for readability), but it's a great way to create a huge amount of test data.
Why use random numbers? Does the numeric value of the key matter? I would just use a sequence in the database and call nextval.
The essential problem with database unit testing is that primary keys do not get reused. Rather, the database creates a new key each time you create a new record, even if you delete the record with the original key.
There are two basic ways to deal with this:
Read the generated Primary Key from the database and use it in your tests, or
Use a fresh copy of the database each time you test.
You could put each test in a transaction and roll the transaction back when the test completes, but rolling back transactions doesn't always work with Primary Keys; the database engine will still not reuse keys that have been generated once (in SQL Server anyway).
When a test executes against a database through another piece of code, it ceases to be a unit test. It is called an "integration test", because you are testing the interactions of different pieces of code and how they "integrate" together. Not that it really matters, but it's fun to know.
When you perform a test, the following things should occur:
Begin a db transaction
Insert known (possibly bogus) test items/entities
Call the (one and only one) function to be tested
Test the results
Rollback the transaction
These things should happen for each and every test. With NUnit, you can get away with writing steps 1 and 5 just once in a base class and then inheriting from that in each test class. NUnit will execute SetUp- and TearDown-decorated methods in a base class.
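The same base-class pattern sketched with JUnit instead of NUnit (TestDatabase is a hypothetical helper that hands out connections):

import java.sql.Connection;
import org.junit.After;
import org.junit.Before;

public abstract class TransactionalTestBase {
    protected Connection conn;

    @Before
    public void beginTransaction() throws Exception {
        conn = TestDatabase.open(); // hypothetical connection factory
        conn.setAutoCommit(false);  // step 1: begin a db transaction
    }

    @After
    public void rollbackTransaction() throws Exception {
        conn.rollback();            // step 5: undo whatever the test changed
        conn.close();
    }
}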
In step 2, if you're using SQL, you'll have to write your queries such that they return the PK numbers back to your test code.
INSERT INTO Person(FirstName, LastName)
VALUES ('Fred', 'Flintstone');
SELECT SCOPE_IDENTITY(); -- SQL Server example; other db vendors vary on this.
Then you can do this:
INSERT INTO Person(FirstName, LastName, SpouseId)
VALUES ('Wilma', 'Flintstone', @husbandId); -- assumes @husbandId was captured from the first insert

DECLARE @wifeId INT = SCOPE_IDENTITY();

UPDATE Person SET SpouseId = @wifeId
WHERE Person.Id = @husbandId;

SELECT @wifeId;
or whatever else you need.
In step 4, if you use SQL, you have to re-SELECT your data and test the values returned.
Steps 2 and 4 are less complicated if you are lucky enough to be able to use a decent ORM like (N)Hibernate (or whatever).
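For instance, with Hibernate the insert-and-verify steps collapse to something like this (a sketch; it assumes a mapped Person entity with a Long key and a configured SessionFactory):

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class PersonRoundTrip {
    public static void insertAndReadBack(SessionFactory sessionFactory) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();

        Person fred = new Person("Fred", "Flintstone");
        Long id = (Long) session.save(fred); // step 2: the PK comes straight back

        Person reloaded = (Person) session.get(Person.class, id); // step 4: re-read and assert on it

        tx.rollback();  // step 5: leave the database untouched
        session.close();
    }
}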