I am new to Clojure. I want to fetch x records with their fields from a database, and I want to insert records into the database. Which one should I use in this scenario: defrecord or defschema?
Are those the same?
Neither defschema nor defrecord refers to a database schema (the "shape" of the database) or to database records (i.e., rows in relational DBs).
Schema is a library for describing the shape of your data and validating whether some data conforms to that shape. It is similar to the more recent clojure.spec. Clojure records are custom datatypes, which look a bit like Java classes.
It is easy to be tempted to write "Object Oriented" DB communication with records for each entity. However, all a database contains is data, which is just lists, maps, sets, and some basic data types. I suggest you keep your data in built-in Clojure data structures, ready at hand, and don't hide it in unnecessary abstractions. (Side note: your DB component, as opposed to a DB entity, may very well be a Clojure record. For example, lifecycle management with Component uses records.)
A good place to start would be Honey SQL, which allows you to build SQL queries as Clojure data structures. You get back data and can operate on that data with the full might of Clojure.
Then, when you are comfortable with "laying all your data open (without encapsulation)", go and describe the shape of your data, what is valid and what is not. clojure.spec is a powerful tool for that.
Related
I need to generate numbers that are unique and in sequence (for easy traceability). Also, they should be as gapless as possible (for decent non-repudiation).
SQL SEQUENCEs are the obvious answer. What's the cleanest way to use them with Django 1.8 and MySQL? There doesn't seem to be a good way to use SEQUENCEs with Django.
Since MySQL doesn't have separate sequences, the only clear way is to create a new table with auto-incrementing IDs (a sketch of this follows below).
Thinking about the design in a super relational way, I am also able to store other related data in the new table.
The other data is otherwise stored redundantly in other tables (redundantly, since some of the other tables are for logging). However, storing the data in a more organised way will result in better traceability and a smaller database.
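A minimal sketch of this approach with the Django ORM (the model and function names here are made up) could look like the following; note that MySQL still discards auto-increment values consumed by a rolled-back transaction, so small gaps remain possible:

    from django.db import models, transaction

    class InvoiceNumber(models.Model):
        # The AUTO_INCREMENT primary key acts as the sequence value.
        id = models.AutoField(primary_key=True)
        created_at = models.DateTimeField(auto_now_add=True)

    def next_invoice_number():
        # Each INSERT draws the next value; do it inside the caller's
        # transaction so the number is only kept when the work commits.
        with transaction.atomic():
            return InvoiceNumber.objects.create().id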
I'm learning SQLAlchemy and using a database where multiple table lookups are needed to find a single piece of data.
I'm trying to find the best (most efficient and Pythonic) way to map the multiple lookups to a single SQLAlchemy object or reusable python method.
Ultimately, there will be dozens if not hundreds of mapped objects such as these, so something like a .map file might be handy.
For example (in pseudocode): if I want to find the data 'Status' starting from 'Patient Name', I have to use three tables.
Instead of writing a function for every potential 'this' to 'that' data request, is there an SQLAlchemy or Pythonic way to make the mappings?
I CAN make new, temporary SQLAlchemy Tables to store data. I am NOT at liberty to change the database I'm reading from. I'm hoping to reduce the number of individual calls to the database, because it is remote and slow.
I'm not sure a data join will work, because the primary keys, foreign keys, and column names are inconsistent in the database. But I don't really know how to make select-joins in SQLAlchemy.
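For concreteness, here is the kind of explicit join I have in mind, as a rough SQLAlchemy Core sketch (all table and column names below are invented; since the real key names are inconsistent, each join condition is spelled out by hand):

    from sqlalchemy import MetaData, Table, create_engine, select

    engine = create_engine("mysql+pymysql://user:pw@remote-host/clinic")
    meta = MetaData()

    # Reflect the existing tables; the source database is not altered.
    patients = Table("patients", meta, autoload_with=engine)
    visits = Table("visit_log", meta, autoload_with=engine)
    statuses = Table("status_codes", meta, autoload_with=engine)

    # Spell out each join condition, since the key names don't line up.
    stmt = (
        select(statuses.c.status_text)
        .select_from(
            patients
            .join(visits, patients.c.patient_id == visits.c.pat_ref)
            .join(statuses, visits.c.status_id == statuses.c.code)
        )
        .where(patients.c.name == "Patient Name")
    )

    with engine.connect() as conn:
        rows = conn.execute(stmt).fetchall()  # one round trip, not three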
Perhaps I need to create a new table, with relationships to those three previous tables? But I'm not understanding the relationships well.
Can these tables be auto-generated from a map.ini file?
EDIT:
I might add that some of these relationships could be one-to-many, i.e. a patient may be associated with more than one statusID... such as...
Let's say I have a (MySQL) DB. I want to automate the update of this database via an application, that will:
1. Import from DB
2. Calculate updated data
3. Export back updated data
The timing is important: I don't want to import while calculating; in fact, I don't want any queries running then. I want to import the table(s) as a whole, then calculate. So my question is: if a row is represented by an instance of a class, what container do I put these objects into?
A vector? A set? What about ordered vs. unordered? Just use whatever seems best for my case according to big-O times? Any special traps to fall into here? Is this case no different from data "born in memory", so that the only things to consider besides size overhead are "do I want the lookup or the insertion to be faster"?
Probably the best route is to use some ORM, but let's say I don't want to.
I've seen some apps use boost::unordered_set, and I wondered, if there is a particular reason for its use...
I use a jdbc-like interface as the connector (libmysqlcpp).
I do not think the right container can be guessed from so little information. It mainly depends on the size and type of the data and on the algorithm you will run.
But my main concern with such a design is that it will quickly choke your network or your database. If you have a big table, you'll:
select all the data from the table
retrieve all the data over the network
process part of the data (some columns?) or all of it on your machine
push the data over the network
update your rows (or erase/replace maybe)
Why not consider working directly on the MySQL server? You can create a stored procedure or user-defined function that works on the data directly, saving the network traffic and taking advantage of the fact that MySQL is built to handle gigantic amounts of data, quantities that an in-memory container is not built to handle.
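As a rough illustration (in Python for brevity, with made-up table and column names; the same idea applies from libmysqlcpp), the whole import/calculate/export cycle can often collapse into one set-based statement that runs on the server:

    import mysql.connector

    conn = mysql.connector.connect(host="db.example.com", user="app",
                                   password="secret", database="mydb")
    cur = conn.cursor()
    # One server-side statement replaces select-all, compute, write-back.
    cur.execute(
        "UPDATE measurements SET value = value * 1.1 WHERE batch = %s",
        (42,),
    )
    conn.commit()
    conn.close()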
Can Django support Oracle nested tables or varrays or collections in some manner? Asking just for completeness as our project is reworking the data model, attempting to move away from EAV organization, but I don't like creating a bucket load of dependent supporting tables for each main entity.
e.g.
(not the proper Oracle syntax, but gets the idea across)
Events
eventid
report_id
result_tuple (result_type_id, result_value)
anomaly_tuple(anomaly_type_id, anomaly_value)
contributing_factors_tuple(cf_type_id, cf_value)
etc.,
where there can be multiple rows of the tuples for one eventid.
Each of these tuples can, of course, exist as a separate table, but this seems more concise. If it's something Django can't do, or something I can't easily get by modifying the model classes, then perhaps just having Django create the extra tables is the way to go.
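For concreteness, this is roughly the set of models I imagine Django creating (the field types are guessed): each tuple becomes a child model with a ForeignKey back to the event, which allows multiple tuple rows per eventid:

    from django.db import models

    class Event(models.Model):
        report_id = models.IntegerField()

    class Result(models.Model):
        # Many results per event, i.e. multiple tuple rows per eventid.
        event = models.ForeignKey(Event, related_name="results",
                                  on_delete=models.CASCADE)
        result_type_id = models.IntegerField()
        result_value = models.CharField(max_length=255)

    class Anomaly(models.Model):
        event = models.ForeignKey(Event, related_name="anomalies",
                                  on_delete=models.CASCADE)
        anomaly_type_id = models.IntegerField()
        anomaly_value = models.CharField(max_length=255)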
--edit--
I note that django-hstore is doing something very similar to what I want to do, but using postgresql's hstore capability. Maybe I can branch off of that for an Oracle nested table implementation. I dunno...I'm pretty new to python and django, so my reach may exceed my grasp in this case.
Querying a table with a nested-table column gives you a cursor to traverse the rows, one member of which is itself another cursor, which you can traverse to get the rows of the nested table.
I have a set of statistical data (about 100M size), which is organized in key-value pairs, some of the values are just numbers (e.g. like person's age or weight) and some are hierarchical (e.g. like person's employments - it can have a set of employment records, each again containing key/value pairs, etc.). The real data is not exactly these but the structure is similar.
I need to query this data with arbitrary sets of criteria. For example, I may want to ask things like "where did the 20 oldest people work 3 years ago", or "what is the sum of all salaries for all people that ever worked at company X for more than a year", or "give me everything you know about people that found a new job recently", etc.
I can program each individual query pretty easily, but since there can be many of them and they vary all the time, it becomes tedious to program each one anew. So the question is whether there's an existing tool that would make such queries easier (a nice GUI would be a bonus :). Something like SQL wouldn't work well, because the data fields aren't really fixed, and making the hierarchy work in SQL would be too much trouble IMHO. Is there a tool I could use with relative ease for this task (i.e. without learning a whole new language; otherwise I'd rather stay with hand-coding the queries)?
You may want to look at MongoDB. It is a JSON data store, so it essentially works with key/value pairs, and you can nest JSON within JSON. It uses JavaScript as the query language. Of course, you'd need to convert your data to JSON, but this is not difficult.
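For example, with the Python bindings (PyMongo) and invented field names, your "sum of all salaries at company X" query might look roughly like this:

    from pymongo import MongoClient

    people = MongoClient("mongodb://localhost:27017").stats_db.people

    # Sum the salaries of employment records at company X over a year.
    pipeline = [
        {"$unwind": "$employments"},
        {"$match": {"employments.company": "X",
                    "employments.years": {"$gt": 1}}},
        {"$group": {"_id": None, "total": {"$sum": "$employments.salary"}}},
    ]
    print(list(people.aggregate(pipeline)))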
Another option may be a graph database like Neo4j. Each record is a node and you can define relationships between nodes (visualized as edges).
I do not think either of these has any kind of GUI, but they are pretty easy to query. MongoDB uses JS, with bindings you can use to call the DB from other languages. Neo4j uses Java, but there are bindings for other languages.
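As a similarly rough sketch with Neo4j's Python bindings (the node labels and properties here are invented), modelling people and companies as nodes and employment as a relationship:

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "secret"))
    with driver.session() as session:
        # Salaries of people who worked at company X for over a year.
        rows = session.run(
            "MATCH (p:Person)-[e:WORKED_AT]->(c:Company {name: $name}) "
            "WHERE e.years > 1 RETURN p.name AS name, e.salary AS salary",
            name="X",
        )
        for record in rows:
            print(record["name"], record["salary"])
    driver.close()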
SQL queries would be challenging, but they would work. I will also throw in PostgreSQL as an option, since it is somewhat object-oriented, but I am more familiar with the others.