How to call SQL functions / stored procedures when using the Repository pattern

What is the best way to call a SQL function / stored procedure when converting code to use the repository pattern? Specifically, I am interested in read/query capabilities.
Options:
1. Add an ExecuteSqlQuery method to IRepository
2. Add a new repository interface specific to the context (e.g. ILocationRepository) and add resource-specific methods
3. Add a special "repository" for all the random stored procedures until they are all converted
4. Don't. Just convert the stored procedures to code and place the logic in the service layer
Option #4 does seem to be the best long-term solution, but it's also going to take a lot more time, and I was hoping to push it off to a future phase.
Which option (above or otherwise) would be "best"?
NOTE: my architecture is based on ardalis/CleanArchitecture using ardalis/Specification, though I'm open to all suggestions.

https://github.com/ardalis/CleanArchitecture/issues/291
Add them to the repository if necessary, or create logically grouped Query services/classes for that purpose. It depends a bit on the functionality of the SPROC how I would do it. Repositories should be just simple CRUD, at most with a specification to help shape the result. More complex operations that span many entities and/or aggregates should not be added to repositories but modeled as separate Query objects or services. Makes it easier to follow SOLID that way, especially SRP and OCP (and ISP), since you're not constantly adding to your repo interfaces/implementations.

Don't treat stored procedures as second-class citizens. In general, avoid them, because they tend to pull your domain logic out of the code and hide it inside the database; but sometimes, for performance reasons, they are your only choice. In that case you should use option 2 and treat them the same as any simple database fetch.
Option 1 is really bad, because you will soon have tons of SQL in places you don't want it (the application service), and it will prevent portability to another storage medium.
Option 3 is unnecessary; stored procedures are no worse than plain Entity Framework Core database access.
Option 4 is the reason why you cannot always avoid stored procedures. Sometimes querying in application services/repositories will create serious performance issues. That's when, and only when, you should step in with stored procedures.
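For what option 2 could look like in practice, here is a minimal sketch of a resource-specific repository whose read method happens to be backed by a stored procedure. Every name here (ILocationRepository, LocationSummary, sp_GetLocationSummaries) is hypothetical, and the point is only the shape: callers cannot tell whether the data comes from a table, a view or a stored procedure.

#include <string>
#include <vector>

// Hypothetical read model produced by the stored procedure.
struct LocationSummary {
    int         id;
    std::string name;
    double      totalVolume;
};

// Option 2: a context-specific repository interface; the stored procedure is
// an implementation detail hidden behind an ordinary query method.
class ILocationRepository {
public:
    virtual ~ILocationRepository() = default;
    virtual std::vector<LocationSummary> GetLocationSummaries(int regionId) = 0;
};

class SqlLocationRepository : public ILocationRepository {
public:
    std::vector<LocationSummary> GetLocationSummaries(int regionId) override {
        std::vector<LocationSummary> result;
        // The implementation would run something like
        //   EXEC sp_GetLocationSummaries @regionId = ?
        // through whatever data-access layer the project already uses,
        // mapping each row into a LocationSummary.
        (void)regionId;
        return result;
    }
};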

Related

Is it possible to use policy based design together with automated testing?

I am developing a numerical simulations library which is centred around a single collection of data operated on by different computational algorithms. The algorithms are complex: they have different states involving multiple parameters, and they are interchangeable (under some semantic restrictions).
To avoid a bloated collection interface and to enable different implementations etc., I'm thinking about using policy-based design. This gives the collection a wide range of combinations of storage structures, algorithms, parameters and internal stuff.
If I imagine that I redesigned my existing generic / object-oriented design using policies, how can I choose the optimal algorithms and data structures? Conceptually, I need to define the set of policies and a set of verification test cases, and execute a parametric study.
This is easy when object-oriented programming is used, since I can determine all necessary types and their parameters at run time using, e.g., a string-based Abstract Factory with type names stored in the input file, which is then changed by an external script that executes the client application on a family of test cases.
How do I do that with policies, where a combination of N policies ends up being N different client applications?
How is automated testing done together with policy based design in a professional way?
If you're representing algorithms as policies, you /should/ have a pretty uniform interface already thought up. You could imagine an "AlgorithmPolicy" processing some data from your data store and returning some representation of the results.
"If I imagine that I redesigned my generic / object oriented existing design usign policies, how can I choose the optimal algorithms and data structures?"
If your object oriented design currently makes use of the strategy pattern (see also: the Gang of Four book), your policies will simply replace every place that you've used a strategy. Choosing "optimal algorithms" for the different policies you design will simply be a matter of nailing the right conceptual structure / interface for those policies. (If you're going to use many different data stores, make sure that the interface for adding / removing / getting data from them is uniform, for example. Here, it can be helpful to think of three examples and find commonalities... then think of another example and make sure it fits the schema. Iterate until things feel correct.)
You'll still have adequate type checking, it'll just feel a bit different, and you may run into some nasty compile errors occasionally. ;)
Testing will simply be a matter of writing some unit tests for each of the configurations / policy combinations you'd like to cover. You should probably already be writing these tests anyway; the primary difference is that you'll want to hit the interfaces you designate rather than targeting specifics.
You can validate different storage methods based on validations of your algorithm policies. (So, if I have some algorithm that can be stored in different ways, I can run the algorithm on some test data for each storage mechanism and expect the same results.) Assuming that you've spec'd out the interface correctly, you should only need to write a single test for each additional storage mechanism you add.
Again: It'd be nice to have more details about the structure of the program, what different parameters and such you'd need to pass in. (Is any of this code open source / going to be open sourced?)
From what you've said, in my mind, your complicated-policy process may have an interface like so:
FancyDataStore<AlgorithmPolicy, StoragePolicy, InternalStuffPolicy>().process()
For testing it, I'd write:
MockAlgorithmPolicy - A very simple algorithm that's trivial to validate.
MockInternalStuffPolicy - A very simple internal stuff policy that causes no integrations / reports nothing new.
MockStoragePolicy - A very simple storage policy that meets your interface for storage / doesn't cause many issues.
Write a test that validates the mocks put together...
For each StoragePolicy you create, write an automated test to validate it:
void testSomeStoragePolicy() {
    // has a call to:
    FancyDataStore<MockAlgorithmPolicy, SomeStoragePolicy, MockInternalStuff>().process();
    // validate...
}
That should prove that the SomeStoragePolicy works as expected.
Then, for your algorithms, you could write:
void testSomeAlgorithmPolicy() {
    FancyDataStore<SomeAlgorithmPolicy, MockStoragePolicy, MockInternalStuff>().process();
    // validate...
}
etc.
This way, you write basically one test per policy you end up writing, which seems feasible and not too ridiculous. Additionally, you can always add further unit tests to cover other subtle integrations that may spin up over time.
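To make the shape of this concrete, here is a minimal sketch of what such a policy-based class and its mock policies could look like. Every name here, and the trivial "sum" algorithm, is made up purely for illustration and not taken from the original design.

#include <vector>

// Trivial mock policies: each one is easy to validate in isolation.
struct MockStoragePolicy {
    using Data = std::vector<double>;
    Data load() const { return {1.0, 2.0, 3.0}; }    // trivial data, easy to check
};

struct MockAlgorithmPolicy {
    template <class Data>
    double run(const Data& d) const {                 // trivial "algorithm": a sum
        double sum = 0.0;
        for (double x : d) sum += x;
        return sum;
    }
};

struct MockInternalStuff {
    void report(double /*result*/) const {}           // reports nothing new
};

// The policies are template parameters, so every combination is a distinct type.
template <class AlgorithmPolicy, class StoragePolicy, class InternalStuffPolicy>
class FancyDataStore {
public:
    double process() {
        auto data   = storage_.load();
        auto result = algorithm_.run(data);
        internal_.report(result);
        return result;
    }
private:
    AlgorithmPolicy     algorithm_;
    StoragePolicy       storage_;
    InternalStuffPolicy internal_;
};

// Usage, mirroring the tests sketched above:
//   double r = FancyDataStore<MockAlgorithmPolicy, MockStoragePolicy, MockInternalStuff>().process();
//   assert(r == 6.0);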
If you're looking for good books on this subject, I'd suggest reading "Modern C++ Design" by Andrei Alexandrescu; it provides a great primer on policy-driven design in C++.

An alternative to hierarchical data model

Problem domain
I'm working on a rather big application, which uses a hierarchical data model. It takes images, extracts image features and creates analysis objects on top of these. So the basic model is like Object-(1:N)-Image_features-(1:1)-Image. But the same set of images may be used to create multiple analysis objects (with different options).
Then an object and image can have a lot of other connected objects, like the analysis object can be refined with additional data or complex conclusions (solutions) can be based on the analysis object and other data.
Current solution
This is a sketch of the solution. Stacks represent sets of objects, arrows represent pointers (i.e. image features link to their images, but not vice versa). Some parts (images, image features, additional data) may be included in multiple analysis objects, because the user wants to run analyses on different sets of objects, combined differently.
Images, features, additional data and analysis objects are stored in global storage (god-object). Solutions are stored inside analysis objects by means of composition (and contain solution features in turn).
All the entities (images, image features, analysis objects, solutions, additional data) are instances of corresponding classes (like IImage, ...). Almost all the parts are optional (i.e., we may want to discard images after we have a solution).
Current solution drawbacks
1. Navigating this structure is painful when you need connections like the dotted one in the sketch. If you have to display an image with a couple of solution features on top, you first have to iterate through the analysis objects to find which of them are based on this image, and then iterate through their solutions to display the features.
2. If, to solve 1, you choose to explicitly store the dotted links (i.e. the image class has pointers to the solution features related to it), you'll spend a lot of effort maintaining consistency of these pointers and constantly updating the links when something changes.
My idea
I'd like to build a more extensible (2) and flexible (1) data model. The first idea was to use a relational model, separating objects and their relations. And why not use an RDBMS here: SQLite seems an appropriate engine to me. Complex relations would then be accessible via simple (LEFT) JOINs on the database (pseudocode: "images JOIN images_to_image_features JOIN image_features JOIN image_features_to_objects JOIN objects JOIN solutions JOIN solution_features"), after which the actual C++ objects for the solution features are fetched from global storage by ID.
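As a rough illustration of that idea, this is what the dotted-link lookup could look like against SQLite through its C API; the table and column names are hypothetical and only mirror the pseudocode above.

#include <cstdio>
#include <vector>
#include <sqlite3.h>

// Returns the solution-feature IDs related to one image; the C++ objects
// themselves are then fetched from the in-memory storage by ID.
std::vector<int> solutionFeatureIdsForImage(sqlite3* db, int imageId) {
    const char* sql =
        "SELECT sf.id "
        "FROM images i "
        "JOIN images_to_image_features itf ON itf.image_id = i.id "
        "JOIN image_features f             ON f.id = itf.feature_id "
        "JOIN image_features_to_objects fo ON fo.feature_id = f.id "
        "JOIN objects o                    ON o.id = fo.object_id "
        "JOIN solutions s                  ON s.object_id = o.id "
        "JOIN solution_features sf         ON sf.solution_id = s.id "
        "WHERE i.id = ?;";

    std::vector<int> ids;
    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK) {
        std::fprintf(stderr, "prepare failed: %s\n", sqlite3_errmsg(db));
        return ids;
    }
    sqlite3_bind_int(stmt, 1, imageId);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        ids.push_back(sqlite3_column_int(stmt, 0));
    sqlite3_finalize(stmt);
    return ids;
}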
The question
So my primary question is
Is using an RDBMS an appropriate solution for the problems I described, or is it not worth it and are there better ways to organize the information in my app?
If an RDBMS is OK, I'd appreciate any advice on using an RDBMS and the relational approach to store C++ objects' relationships.
You may want to look at Semantic Web technologies, such as RDF, RDFS and OWL, which provide an alternative, extensible way of modeling the world. There are some open-source triple stores available, and some of the mainstream RDBMSs also have triple store capabilities.
In particular, take a look at Manchester University's Protege/OWL tutorial: http://owl.cs.manchester.ac.uk/tutorials/protegeowltutorial/
And if you decide this direction is worth looking at further, I can recommend "Semantic Web for the Working Ontologist".
Just based on the diagram, I would suggest that an RDBMS solution would indeed work. It has been years since I was a developer on an RDBMS (called RDM, of course!), but I was able to renew my knowledge and gain many valuable insights into data structure and layout, very similar to what you describe, by reading the fabulous book "The Art of SQL" by Stephane Faroult. His book will go a long way toward answering your questions.
I've included a link to it on Amazon, to ensure accuracy: http://www.amazon.com/The-Art-SQL-Stephane-Faroult/dp/0596008945
You will not go wrong by reading it, even if in the end it does not solve your problem fully, because the author does such a great job of breaking down a relation in clear terms and presenting elegant solutions. The book is not a manual for SQL, but an in-depth analysis of how to think about data and how it interrelates. Check it out!
Using an RDBMS to track the links between data can be an efficient way to store and think about the analysis you are seeking, and the links are "soft": they go away when the hard objects they link are deleted. This helps ensure data integrity, and M. Faroult can tell you what to do to ensure that remains true.
I don't recommend an RDBMS, given your requirement for an extensible and flexible model.
Whenever you change your data model, you will have to change the DB schema, and that can involve more work than a change in code.
Any problems with DB queries are discovered only at runtime, which can make a big difference to the cost of maintenance.
I strongly recommend using standard C++ OO programming with STL.
You can make use of encapsulation to ensure any data change is done properly, with updates to related objects and indexes.
You can use the STL to build highly efficient indexes on the data.
You can create facades to get at the information easily, rather than having to go through multiple objects/collections. This will be one-time work.
You can write unit test cases to ensure correctness (much less complicated than unit testing with databases).
You can make use of polymorphism to build different kinds of objects, different types of analysis, etc.
All very basic points, but I reckon your effort would be best spent improving the current solution rather than looking for a DB-based solution.
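As a sketch of the facade point above: all the types here are hypothetical stand-ins for the real classes, but the idea is that one class owns the painful navigation (image -> analyses -> solutions -> solution features), so the rest of the code never walks the hierarchy itself.

#include <vector>

struct Image { int id; };
struct SolutionFeature { int id; };
struct Solution { std::vector<SolutionFeature> features; };
struct AnalysisObject {
    std::vector<const Image*> images;     // images this analysis was built from
    std::vector<Solution>     solutions;  // solutions composed into the analysis
};
struct Storage {                          // the existing "god object" storage
    std::vector<AnalysisObject> analyses;
};

class AnalysisFacade {
public:
    explicit AnalysisFacade(const Storage& s) : storage_(s) {}

    // The dotted-link lookup from drawback 1, contained in one place.
    std::vector<const SolutionFeature*> solutionFeaturesFor(const Image& img) const {
        std::vector<const SolutionFeature*> out;
        for (const AnalysisObject& a : storage_.analyses) {
            bool usesImage = false;
            for (const Image* i : a.images)
                if (i == &img) { usesImage = true; break; }
            if (!usesImage) continue;
            for (const Solution& s : a.solutions)
                for (const SolutionFeature& f : s.features)
                    out.push_back(&f);
        }
        return out;
    }

private:
    const Storage& storage_;
};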
http://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/index.html
"you'll put very much effort maintaining consistency of these pointers
and constantly updating the links when something changes."
With the help of Boost.MultiIndex you can create almost every kind of index on a "table". I think the quoted problem is not so serious, so the original solution is manageable.
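For instance, a minimal Boost.MultiIndex sketch of such a link "table"; the FeatureLink record and its fields are hypothetical, but the container keeps both lookup directions in one place, so there is only one thing to update when something changes.

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>

// Hypothetical link record materialising the "dotted" image -> solution-feature relation.
struct FeatureLink {
    int imageId;
    int solutionFeatureId;
};

namespace bmi = boost::multi_index;

using FeatureLinkTable = bmi::multi_index_container<
    FeatureLink,
    bmi::indexed_by<
        bmi::ordered_non_unique<                      // look up features by image
            bmi::member<FeatureLink, int, &FeatureLink::imageId>>,
        bmi::ordered_non_unique<                      // look up images by feature
            bmi::member<FeatureLink, int, &FeatureLink::solutionFeatureId>>
    >
>;

// Usage sketch:
//   FeatureLinkTable links;
//   links.insert(FeatureLink{imageId, featureId});
//   auto range = links.get<0>().equal_range(imageId);  // all features on an image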

Where to store SQL code for a C++ application?

We have a C++ application that utilizes some basic APIs to send raw queries to a MS SQL Server. Scattered through the various translation units in our program, we have simple 1-2 line queries as C++ strings, and every now and then you'll see more complex queries that can be over 20 lines.
I can't help but think that the larger queries, specifically the 20+ line ones, should not be embedded in C++ code as constant strings. I want to propose pulling these out into separate text files that are loaded on-demand by the C++ application, however I'm not sure if this is the best approach.
What design choices are typical for situations like this? I definitely feel there needs to be improvement, I just don't know if moving the SQL queries out into data files (text files) is the best idea.
You could make a DAL (Data Access Layer).
It would be the API that the rest of the program talks to. Then you can mess around and try anything and everything (Stored procedures, caching, etc.) without disturbing the main program.
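For illustration, a minimal C++ sketch of what such a DAL interface could look like; the names and the string-based Row type are assumptions, not an existing API. The rest of the program only ever sees this interface, so stored procedures, caching or a different driver can be swapped in behind it later.

#include <map>
#include <string>
#include <vector>

using Row = std::map<std::string, std::string>;   // simplistic row representation

class IDataAccessLayer {
public:
    virtual ~IDataAccessLayer() = default;

    // Read-only query; parameters are bound by the implementation.
    virtual std::vector<Row> query(const std::string& name,
                                   const std::vector<std::string>& params) = 0;

    // Insert/update/delete; returns the number of affected rows.
    virtual int execute(const std::string& name,
                        const std::vector<std::string>& params) = 0;
};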
Move them into their own files, or even into their own stored procedures. Queries embedded in the application cannot be changed without a recompile, and depending on your release procedures, that could severely impair your ability to respond to emergencies or deploy hot fixes. You could alter your app to cache the file contents, if you go down that road, and even periodically check the files for updates.
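A sketch of the file-loading-plus-caching part of that suggestion; the directory layout and names are hypothetical. Because the SQL text lives outside the binary, a query can be edited and redeployed without recompiling the application.

#include <fstream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

class QueryCatalog {
public:
    explicit QueryCatalog(std::string directory) : directory_(std::move(directory)) {}

    // Loads <directory>/<name>.sql on first use and caches the text afterwards.
    const std::string& get(const std::string& name) {
        auto it = cache_.find(name);
        if (it != cache_.end()) return it->second;          // already loaded

        std::ifstream file(directory_ + "/" + name + ".sql");
        if (!file) throw std::runtime_error("missing query file: " + name);

        std::ostringstream text;
        text << file.rdbuf();                                // read the whole file
        return cache_.emplace(name, text.str()).first->second;
    }

private:
    std::string directory_;
    std::map<std::string, std::string> cache_;
};

// Usage sketch:
//   QueryCatalog queries("sql");
//   std::string sql = queries.get("monthly_report");   // loads sql/monthly_report.sql once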
the best "design choice" - for many different reasons - is to use MSSQL stored procedures whenever/wherever possible.
I've seen code that segregates SQL queries into a common module, but I don't think there's much benefit to a common "queries module" (or a standalone text file) over having the SQL queries spelled out as string literals in the module that's calling them.
Stored procedures, on the other hand, increase modularity, enhance security, and can vastly improve performance.
IMHO...
I would leave the SQL embedded in the C++ functions that use it: it will be easier to read and understand what the code does.
If you have SQL queries scattered around your code, I'd say there is some problem with the overall structure of the classes you are using: you should have a few (or even just one) 'low-level' classes that handle the interaction with the database, and the rest of the code should use these classes.
I personally don't like using stored procedures: if you have to support a different database server the porting will be a pain, I never saw that much of a performance improvement, and to understand what the code does you have to jump back and forth between the stored procedures and the C++.
It really depends; here are some notes:
1) If all your SQL code resides in the application, then your application is pretty much self-contained in terms of logic. This is good, as you have done in the current application. In terms of speed it can be a little slower, since the SQL needs to be parsed when you run these queries (it also depends on whether you used prepared statements, etc., which can speed things up).
2) The second approach is to put all the SQL logic in stored procedures on the server. This is very much the preferred approach even for small SQL queries, whether one line or not. You just build a DAL layer. In terms of performance this is very good; however, the logic now lives in two different systems, your C++ app and the SQL server. You will quite likely need to build a small utility application that can translate the stored procedures' inputs and outputs into template code (be it C++ or any other language) to make your life easier.
3) A mixed approach with the above two. I would not recommend this route.
You need to think about how these queries are likely to change over time, and compare it to how the related C++ code is likely to change. If the queries are relatively independent of the code, and have a higher likelihood of change, then I would either load them at runtime from separate files, or use stored procedures instead. That approach allows for changing the queries without recompiling the C++ code. On the other hand, if the queries are highly coupled to the C++ code, making a change in one likely to accompany a change in the other, I would keep the queries in the code. This approach makes a change more localized and less error prone.

Web Service ‘mandatory/optional’ fields: XSD Design time vs Runtime

We are currently building a pile of SOAP Web Services to front access to various backend systems.
While defining our Request/Response message XML, we see multiple services needing the ‘Account’ object with different ‘mandatory/optional’ fields.
How should we define and enforce the validation of these 'mandatory/optional' fields on the same message? I see these options:
1) Enforce validation with XSD by creating different 'Account' complex types.
Pros: design-time clarity.
Cons: proliferation of object types, less reuse of objects.
2) Enforce validation with XSD by extending + restricting a single base 'Account' type.
Pros: design-time clarity.
Cons: not sure of the support for the extend + restriction feature (Java, .NET).
3) Use a single 'Account' type and enforce validation at runtime (i.e. in the code).
Pros: simple.
Cons: no design-time validation; need to communicate field requirements via a specification doc.
What are your thoughts on that?
I would have to assume that: i) some of what you would call optional fields are actually fields that are not applicable (don't make sense) for all accounts, and ii) we're not talking about trivial scenarios (like two types of accounts with two fields each).
Firstly, I would say that unless you're really lucky from a requirements perspective, you're going to end up with some sort of "validation at runtime" no matter which option you go with. XML Schema can't express some common data validation requirements, such as cross-field validation; or simply because the data in your XML is not sufficient to feed the rules that validate the integrity of the message (the data in the message being a subset of what's available at the time the XML is being un/marshalled).
Secondly, I would avoid deriving new complex types through restriction; from an authoring perspective you don't achieve much in terms of reuse, and you might end up with problems in how that is interpreted by your XSD-to-code tooling. I like to think that the original intention of deriving through restriction was to provide a tool for xsd:redefine scenarios, for people who wouldn't want to fiddle with XML Schemas that were authored by someone else. If one owns (authors) the schema, one can work around the need to restrict by defining the "lesser" object first and extending from that.
As to the "proliferation of objects", you are kind of getting that with option #2 as well (when compared with #1); what I mean by that, all the tools I know will create a class for each named (global) complex type you have in your XSD; so if you have to have three type of accounts, you'll have three for scenario #1, and four, or so, if you choose to extend from one, or so, base classes; a worst case scenario for the later would be when you need three specializations (concrete if you wish); anyway, from my experience, the difference in real life scenarios is not something that would really tip the decision one way or the other.
Extending base types in XML Schema is good for reuse; however, reuse brings coupling; if you're analysing this from a forward/backward compatibility point of view, extending something in the base type could mess up some of the unmarshalling (deserialization) of the XML for clients of your service(s) that don't want to change their code base, yet you want to maintain only one Web Service endpoint for all; in this case, a forward-compatibility strategy that relies on an xsd:any at the end of a compositor (xsd:sequence) would be rendered useless in your first release that goes and extends your base type.
There is even more to it; because of this, I don't think there's a single correct answer, at least not for the criteria you seem to imply by setting out your pros/cons.
All of my preferred options below assume that you put high value on the requirement to ensure forward/backward compatibility of your services, and you want to minimize the cost of your clients having to deal with your services (because of XML Schema changes).
I would say that if all your domain (accounts in particular) can be fully modeled (assume no future change basically) and that there is enough commonality to justify reuse, then go with option #2. Otherwise, go with option #1 since I have yet to see things that don't change...
If the modeling of your domain can be done 80% or more (or some number that you think is high) and there is enough commonality to justify reuse, then I would still go with option #2, with the caveat that any future extensions for common attributes across accounts must be applied to each individual account (basically turning your option into a hybrid by doing #1).
For anything else, I would go #1. Whew, I can't believe I wrote all of this...

What are the benefits of using a database abstraction layer?

I've been using some code that implements the phpBB DBAL for some time. Recently I had to implement a fuller package around it and decided to use the DBAL throughout. In the main, it's been OK. But occasionally there are circumstances where I can't see the logic in using it. It seems to make the simple much more complicated.
What benefits does a DBAL offer over writing SQL statements directly?
From Wikipedia (http://en.wikipedia.org/wiki/Database_abstraction_layer):
API level abstraction
Libraries like OpenDBX unify access to databases by providing a single low-level programming interface to the application developer. Their advantages are most often speed and flexibility because they are not tied to a specific query language (subset) and only have to implement a thin layer to reach their goal. The application developer can choose from all language features but has to provide configurable statements for querying or changing tables. Otherwise his application would also be tied to one database.
When cooking a dish, you do not want several chefs having access to the pot. They could all be adding spices unaware that another chef had already added a spice. Ideally, you want a single chef that would serve as a single point of access to avoid spoiling the soup.
The same with databases. A single point of access can avoid problems of multiple services accessing the data in different ways.
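To make the "single point of access" benefit concrete, here is a small sketch of why a thin layer pays off even for simple queries: concerns like parameter escaping and query logging live in one place instead of next to every raw SQL string. The class and its naive escaping are purely illustrative; a real layer would rely on the driver's parameter binding.

#include <iostream>
#include <string>
#include <vector>

class DbLayer {
public:
    // Replaces each '?' with the escaped parameter, logs, then runs the query.
    void query(const std::string& sql, const std::vector<std::string>& params) {
        std::string finalSql;
        std::size_t p = 0;
        for (char c : sql)
            finalSql += (c == '?' && p < params.size()) ? escape(params[p++])
                                                        : std::string(1, c);
        std::clog << "[db] " << finalSql << '\n';   // one place to add logging/profiling
        // ... hand finalSql to the actual driver here ...
    }

private:
    // Illustrative only: doubles embedded quotes and wraps the value.
    static std::string escape(const std::string& v) {
        std::string out = "'";
        for (char c : v) {
            if (c == '\'') out += '\'';   // double any embedded quote
            out += c;
        }
        return out + "'";
    }
};

// Usage sketch:
//   DbLayer db;
//   db.query("SELECT * FROM users WHERE name = ?", {"O'Brien"});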