Web Service ‘mandatory/optional’ fields: XSD Design time vs Runtime

Web Service ‘mandatory/optional’ fields: XSD Design time vs Runtime - web-services

We are currently building a pile of SOAP Web Service to front the access of various backend systems.
While defining our Request/Response message XML, we see multiple services needing the ‘Account’ object with different ‘mandatory/optional’ fields.
How should we define and enforce the validation of these ‘mandatory/optional’ fields on the same Message? I see these options
1) Enforce validation with XSD by creating different 'Account' Complexe Type
Pros : Design time clarity.
Cons : proliferation of Object Type, Less reuse of Object,
2) Enforce validation with XSD by Extending+Restriction a single base 'Account' type
Pros : Design time clarity.
Cons : Not sure of the support of the Extend+Restriction feature (java, .Net)
3) Using a single 'Account' type and enforcing validation in runtime (ie in the Code).
Pros: Simple
Cons: No design time validation. Need to communicate field requirements via a specification doc.
What are you’re thoughts on that?

I would have to assume that: i) some of what you would call optional fields are actually fields that are not applicable (don't make sense) to all accounts and ii) we're not talking trivial scenarios (like two type of accounts with 2 fields each-kind of thing).
Firstly, I would say that unless you're really lucky, from a requirements perspective, then you're going to end up with some sort of "validation in runtime" no matter what option you're going with. XML Schema can't express some common data validation requirements, such as cross field validation; or simply because the data in your XML is not sufficient to feed the rules to validate the integrity of the message (the data in the message being a subset on what's available at the time the XML is being un/marshalled).
Secondly, I would avoid deriving new complex types through restricton; from an authoring perspective you don't achieve much in terms of reuse, and you might end up with problems in how that is interpreted by your XSD to code tooling. I like to think that the original intention of deriving through restriction was to provide a tool for people to use in xsd:redefine scenarios; for people that wouldn't want to fiddle with XML Schemas that were authored by someone else. If one owns (authors) the schema, one can work around the need to restrict by defining the "lesser" object first and extend from that.
As to the "proliferation of objects", you are kind of getting that with option #2 as well (when compared with #1); what I mean by that, all the tools I know will create a class for each named (global) complex type you have in your XSD; so if you have to have three type of accounts, you'll have three for scenario #1, and four, or so, if you choose to extend from one, or so, base classes; a worst case scenario for the later would be when you need three specializations (concrete if you wish); anyway, from my experience, the difference in real life scenarios is not something that would really tip the decision one way or the other.
Extending base types in XML Schema is good for reuse; however, reuse brings coupling; if you're analysing this from a forward/backward compatibility point of view, extending something in the base type could mess up some of the unmarshalling (deserialization) of the XML for clients of your service(s) that don't want to change their code base, yet you want to maintain only one Web Service endpoint for all; in this case, a forward-compatibility strategy that relies on an xsd:any at the end of a compositor (xsd:sequence) would be rendered useless in your first release that goes and extends your base type.
There is even more; because of this, I don't think there's a correct answer, just for the criteria you seem to imply by setting your pro/cons.
All of my preferred options below assume that you put high value on the requirement to ensure forward/backward compatibility of your services, and you want to minimize the cost of your clients having to deal with your services (because of XML Schema changes).
I would say that if all your domain (accounts in particular) can be fully modeled (assume no future change basically) and that there is enough commonality to justify reuse, then go with option #2. Otherwise, go with option #1 since I have yet to see things that don't change...
If the modeling of your domain can be done 80% or more (or some number that you think is high) and that there is enough commonality to justify reuse, then I would still go with option #2, with the caveat that any future extensions for common attributes across accounts, must be applied for each individual account (basically turning your option into a hybrid, by doing #1).
For anything else, I would go #1. Whew, I can't believe I wrote all of this...

Related

Modeling frontend and backend in a use case diagram

I am trying to make a use case diagram for my project, the backend is going to be made using Django rest framework and the front end using react, my question is how can i model this situation in the right way, should i model the frontend and represent the backend as an actor or the opposite, since i am thinking of making a mobile application as a second front end?

The right answer here is the Standard Answer of the Business Analyst no 1: It depends.
The question is - what do you want to model and why. Then - what is the correct tool (diagram) to do it.
The goal of the Use Case diagram is to show what functionalities a system is going to offer. Now the system can be treated as a whole, in which case you show the functionalities without depicting how the system is internally organised (this is the most common scenario and most probable the best way to use Use Case diagram in your case - but it does not show the fact of having FE and BE, note that this type of diagram isn't really best suited to do so, so keep reading).
You may also tread e.g. BE as the system itself (it can make sense especially when you're preparing headless API and really separate BE from FE; even more so when your BE and FE teams are totally separate). In such case FE will become an actor (just like e.g. other system that can interact with your BE). Obviously FE can be treated in the same way (i.e. be considered the system with BE being an actor), however usually there's less reason to do so.
Now having said that, if you want to depict the distinction between BE and FE, you should consider other types of diagrams. Keep in mind that Use Case diagram is a dynamic diagram, and the internal structure of the system is static, so obviously it should be one of the static diagrams instead. One that is dedicated to show the internal structure of the system is the Component diagram and it would most likely serve best the purpose of indicating existence of FE and BE (potentially with further level of details, e.g. existing microservices).
If on the other hand you would like to show specific technology in use, Deployment diagram might be your best shot. It allows to show the actual runtime environments, artifacts and their technologies.
Keep in mind - tying to use one type of diagram, or even worse one diagram, to show everything is usually a bad idea and a mistake often made by newbies. Be smarter than that.

Use-case are about a set of behaviors with an observable result that is of value for the actors. They should not care about the internals of a system:
UseCases define the offered Behaviors of the subject without reference to its internal structure.
Therefore, you should in principle not care about the distinction between front-end and back-end, but focus on actor goals with the system.
The only situation where you'd care for the back-end in a use-case diagram, is the case where the front-end would be an independent application that is of value on its own, but can interact with actors that represent external independent systems. (More here)

How to call SQL functions / stored procedure when using the Repository pattern

What is the best way to call a SQL function / stored procedure when converting code to use the repository pattern? Specifically, I am interested in read/query capabilities.
Options
Add an ExecuteSqlQuery to IRepository
Add a new repository interface specific to the context (i.e. ILocationRepository) and add resource specific methods
Add a special "repository" for all the random stored procedures until they are all converted
Don't. Just convert the stored procedures to code and place the logic in the service layer
Option #4 does seem to be the best long term solution, but it's also going to take a lot more time and I was hoping to push this until a future phase.
Which option (above or otherwise) would be "best"?
NOTE: my architecture is based on ardalis/CleanArchitecture using ardalis/Specification, though I'm open to all suggestions.

https://github.com/ardalis/CleanArchitecture/issues/291
If necessary, or create logically grouped Query services/classes for
that purpose. It depends a bit on the functionality of the SPROC how I
would do it. Repositories should be just simple CRUD, at most with a
specification to help shape the result. More complex operations that
span many entities and/or aggregates should not be added to
repositories but modeled as separate Query objects or services. Makes
it easier to follow SOLID that way, especially SRP and OCP (and ISP)
since you're not constantly adding to your repo
interfaces/implementations.

Don't treat STORED PROCEDURES as 2nd order citizens. In general, avoid using them because they very often take away your domain code and hide it inside database, but sometimes due to performance reasons, they are your only choice. In this case, you should use option 2 and treat them same as some simple database fetch.
Option 1 is really bad because you will soon have tons of SQL in places you don't want (Application Service) and it will prevent portability to another storage media.
Option 3 is unnecessary, stored procedures are no worse than simple Entity Framework Core database access requests.
Option 4 is the reason why you cannot always avoid stored procedures. Sometimes trying to query stuff in application service/repositories will create very big performance issues. That's when, and only when, you should step in with stored procedures.

Need explanation for 'convention over configuration' [duplicate]

What are the benefits of the "Convention over Configuration" paradigm in web development? And are there cases where sticking with it don't make sense?
Thanks

Convention states that 90% of the time it will be a certain way. When you deviate from that convention then you can make changes...versus forcing each and every user to understand each and every configuration parameter. The idea is that if you need it to differ you will search it out at that point in time versus trying to wrap your head around all the configuration parameters when it often times has no real value.
IMHO it always makes sense. Making convention the priority over explicit configuration is ideal. Again if someone has a concern, they will force themselves to investigate the need.

I think the benefit is simple: No configuration necessary. You don't need to define locations for this-or-that type of resource, for example, for the app/framework to find them itself.
As for cases where it does not make sense: any situation where it will be fairly frequent that alternative configurations would be required, or where it makes sense that a developer/admin would need to 'opt-in' to some behavior explicitly (for example, to prevent unintended and unexpected side-effects that could have security implications).

The benefit of convention over configuration paradigm in web development the productivity since you won't be required to configured to set all the rules and there are less decision that a programmer has to make. This is evident when using the .NET Framework.

The most obvious benefit is that you will have to write lesser code. Let's take case of Java Persistence API. When you define a POJO having attributes and corresponding setters/getters, it's a simple class. But the moment you annotate it with #javax.persistence.Entity it becomes an entity object (table) which can get persisted in DB. Now this was achieved by just a simple annotation, no other config file.
Another plus point is, all your logic is at one place and in one language (i.e. you get rid of separate xml).

I think this wikipedia article has explained it very well:
Convention over configuration (also known as coding by convention) is
a software design paradigm used by software frameworks that attempts
to decrease the number of decisions that a developer using the
framework is required to make without necessarily losing flexibility.
The concept was introduced by David Heinemeier Hansson to describe the
philosophy of the Ruby on Rails web framework, but is related to
earlier ideas like the concept of "sensible defaults" and the
principle of least astonishment in user interface design.
The phrase essentially means a developer only needs to specify
unconventional aspects of the application. For example, if there is a
class Sales in the model, the corresponding table in the database is
called "sales" by default. It is only if one deviates from this
convention, such as the table "product sales", that one needs to write
code regarding these names.
When the convention implemented by the tool matches the desired
behavior, it behaves as expected without having to write configuration
files. Only when the desired behavior deviates from the implemented
convention is explicit configuration required.

Is it possible to use policy based design together with automated testing?

I am developing a numerical simulations library which is centred around a single collection of data operated on by different computational algorithms. The algorithms are complex, they have different states involving multiple parameters, and are interchangeable (under some semantic restrictions).
To avoid bloated interface of the collection and to enable different implementations etc, I'm thinking about using policy based design. This gives the collection a wide combination of choices between storage structures, algorithms, parameters, internal stuff.
If I imagine that I redesigned my generic / object oriented existing design usign policies, how can I choose the optimal algorithms and data structures? Conceptually I need to define the set of policies and a set of verification test cases and execute a parametric study.
This is easy when object oriented programming is used since I can determine all necessary types and their parameters during run-time using e.g. a string-based Abstract Factory with type names stored in the input file, that is then changed by an external script that executes the client application on a family of test cases.
How do I do that with policies, where a combination of N policies ends up in being N different client applications?
How is automated testing done together with policy based design in a professional way?

If you're representing algorithms as policies, you /should/ have a pretty uniform interface already thought up. You could imagine an "AlgorithmPolicy" processing some data from your data store and returning some representation of the results.
"If I imagine that I redesigned my generic / object oriented existing design usign policies, how can I choose the optimal algorithms and data structures?"
If your object oriented design currently makes use of the strategy pattern (see also: the Gang of Four book), your policies will simply replace every place that you've used a strategy. Choosing "optimal algorithms" for the different policies you design will simply be a matter of nailing the right conceptual structure / interface for those policies. (If you're going to use many different data stores, make sure that the interface for adding / removing / getting data from them is uniform, for example. Here, it can be helpful to think of three examples and find commonalities... then think of another exmaple and make sure it fits the schema. Iterate until things feel correct.)
You'll still have adequate type checking, it'll just feel a bit different (and you may run into some nasty compile errors occaisionally. ;)
Testing will simply be a matter of writing some unit tests for each of the configurations / policy combinations you'd like to cover. You probably should already be writing these tests anyways; the primary difference is that you'll want to try to hit the interfaces you designate rather than targetting specifics.
You can validate different storage methods based on validations of your algorithm policies. (So, if I have some algorithm that can be stored in different ways, I can run the algorithm on some test data for ecah storage mechanism and expect the same results.) Assuming that you've spec'd out the inteface correclty, you should only need to write a single test for each additional storage mechanism you add.
Again: It'd be nice to have more details about the structure of the program, what different parameters and such you'd need to pass in. (Is any of this code open source / going to be open sourced?)
From what you've said, in my mind, your complicated-policy process may have an interface like so:
FancyDataStore.Process()
For testing it, I'd write:
MockAlgorithmPolicy - A very simple algorithm that's trivial to validate.
MockInternalStuffPolicy - A very simple internal stuff policy that causes no integrations / reports nothing new.
MockStoragePolicy - A very simple storage policy that meets your interface for storage / doesn't cause many issues.
Write a test that validates the mocks put together...
For each StoragePolicy you create, write an automated test to validate it:
testSomeStoragePolicy{
// has a call to:
FancyDataStore.Process<MockAlgorithmPolicy, SomeStoragePolicy, MockInternalStuff>()
// validate...
}
That should prove that the SomeStoragePolicy works as expected.
Then, for your algorithms, you could write:
testSomeAlgorithmPolicy{
FancyDataStore.process<SomeAlgorithmPolicy, MockStoragePolicy, MockInternalStuff>();
///Validate.
}
etc.
This way, you write basically 1 test per each policy you end up writing (which seems feasible and not too ridiculous) Additionally, you can always add additional unit tests to cover other subtle integrations that may spin up over time.
If you're looking for good books on this subject, I'd suggest reading "Modern C++ Programming"; it provides a great primer on policy-driven design in C++.

Different REST resource content based on user viewing privileges

I want to provide different answers to the same question for different users, based on the access rights. I read this question:
Excluding private data in RESTful response
But I don't agree with the accepted answer, which states that you should provide both /people.xml and /unauthenticated/people.xml, since my understanding of REST is that a particular resource should live in a particular location, not several depending on how much of its information you're interested in.
The system I'm designing is even more complicated than that one. Let's say that a user has created a number of circles of friends, and assigned different access rights to them. For example, my "acquaintances" circle might have access to my birthday, and my "professional" circle might have access to my employment history, but not the other way around. In order to apply the answer from the question I mentioned, I need to have a way of getting all of the user's circles (which I might want to keep secret for security reasons), and then go through /circles/a/users/42, /circles/b/users/42, /circles/c/users/42 and so on, and then merge the results to display the maximum amount of information available. Obviously there's not necessarily a single circle that gets all the information that any of the other circles get. I believe this is tricky enough (note that I probably need to do this with several kinds of objects and that future versions might require a different procedure), but what if I want to impose security restrictions on a particular user despite the fact that he's also in some of my circles? Can that problem even be solved? Even if I refuse to respond to any of the above-mentioned queries and come up with a new one that could give me an answer, it'd still reveal the fact that this specific user is treated differently due to individual access restrictions.
What am I missing here? Is it even possible for me to develop a RESTful web service?
If the conclusion is that the behavior is not RESTful, would this still constitute as a situation where it'd be morally okay to break the REST contract? If so, what are the negative implications? Do I risk proxy caching issues for example?

According to Fielding's dissertation (it really is a great writing):
A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.
In other words, if you have a resource that is defined as "the requesting user's assigned projects" and representations thereof accessible by a URI of /projects, you do not violate any constraints of REST by returning one list of projects (i.e., representation) for user A and another (representation) for user B when they GET that same URI. In this way, the interface is uniform/consistent.
In addition to this, REST only prescribes that an explicit caching instruction be included with the response, whether that is 'cache for this long' or 'do not cache at all':
Cache constraints require that the data within a response to a request be implicitly or explicitly labeled as cacheable or non-cacheable.
How you choose to manage that is up to you.
Keeping that in mind,
You should feel comfortable returning a representation of a resource that varies depending on the user requesting a representation of a particular resource, as long as you are not violating the constraints of a uniform interface -- don't use a single resource identifier to return representations of different resources.
If it helps, consider that the server responds with varying representations of a resource as well -- XML or JSON, French or English, etc. The credentials sent with the request are just another factor the server is able to use in determining which representation to to send in response. That's what the header section is there for.

I agree that the other solution doesn't seem right. It makes the URL structure complicated and more difficult to find the resource. However, if you did REST properly, it shouldn't matter what the URL for the resource is as the server controls it (and is free to relocate it as it sees fit). If your client is really "REST", it would discover the resources it needed through prior negotiation with the server. So the path truly would not matter on the client. I don't like it because its confusing to use - not because of some violation of REST principles.
But that probably doesn't answer your question -
What you didn't mention is your security setup - presumably you are a passing a session token with the request as part of the request header. So your back-end processing should have the ability to tie it to particular set of security constraints. From there, you form the list with whatever business logic you need and return a limited resource based on the user's security tied to the session.
For the algorithm itself, one usually implements a least or most restrictive type algorithm that merges the allowable data into a response (very similar to java realms or Microsoft's user security model).
If the data is structured differently for the restricted/non-restricted case, you could create two different representations of the data and return which ever one the user was authorized to see. The client should be asking for the accepted mime response types anyway, and it would just provide different answers based on the session security in the request header. Alternatively, you could provide optional elements with the representations and fill out the appropriate one base on authorization (although this is a little hack-ee in my opinion).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js