Splitting an object or keeping it intact - web-services

We are receiving an object from a client's web service that contains two properties: Postcode and Storenumber.
On our side, we need this data frequently, so the object is stored in session and in a cookie. The problem that has arisen is that, in response to some web service calls to the third party, we will receive only an updated postcode on its own, and for others only an updated store number.
This would mean that updating the object on our side would involve:
Checking if the object exists in session
If it does - updating only the relevant property
Saving it back into the session state
I was thinking of separating the two properties so the incoming values could be used to overwrite the current values, but this feels like an odd approach, as I would be separating two values that logically belong together.
I was wondering what your opinion is?

You are considering adjusting your storage mechanism to make updates somewhat easier. The expense you incur is making the consumer's life harder: they would need to fetch two properties.
Generally, I would favour making the consumer's life easier at the expense of making "plumbing" code more complex.
I would always favour data models that most closely represent the business. If Postcode and StoreNumber are strongly related, keep them together.
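To make the plumbing concrete, here is a minimal Python sketch (the dict-like session, the StoreLocation name, and the field names are all assumptions, not your actual types): the object stays whole, and each update overwrites only the property the web service call supplied.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class StoreLocation:
    # The two values logically belong together, so they stay in one object.
    postcode: Optional[str] = None
    store_number: Optional[str] = None

def apply_update(session: dict, **changes) -> StoreLocation:
    # Check whether the object exists in session, overwrite only the
    # properties this web service response supplied, and save it back.
    current = session.get("store_location", StoreLocation())
    updated = replace(current, **changes)
    session["store_location"] = updated
    return updated

# A call that returns only a new postcode, then one with only a store number:
session = {}
apply_update(session, postcode="AB1 2CD")
apply_update(session, store_number="0042")
```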


Django Best Practices -- Migrating Data

I have a table with data that must be filled by users. Once this data is filled, the status changes to 'completed' (status is a field inside data).
My question is, is it good practice to create a table for data to be completed and another one with completed data? Or should I only make one table with both types of data, distinguished by the status?
Not just Django
This is actually a very good general question, not necessarily specific to Django. But Django, through its easy use of linked tables (ForeignKey, ManyToMany), is a good use case for one table.
One table, or group of tables
One table has some advantages:
No need to copy the data, just change the Status field.
If there are linked tables, then they don't need to be copied.
If you want to remove the original data (i.e., avoid keeping redundant data) then this avoids having to worry about deleting the linked data (and deleting it in the right sequence).
If the original add and the status change are potentially done by different processes then one table is much safer - i.e., marking the field "complete" twice is harmless but trying to delete/add a 2nd time can cause a lot of problems.
"or group of tables" is a key here. Django handles linked tables really well, so but doing all of this with two separate groups of linked tables gets messy, and easy to forget things when you change fields or data structures.
One table is the optimal way to approach this particular case. Two tables requires you to enforce data integrity and consistency within your application, rather than relying on the power of your database, which is generally a very bad idea.
You should aim to normalize your database (within reason) and utilize the database's built-in constraints as much as possible to avoid erroneous data, including duplicates, redundancies, and other inconsistencies.
Here's a good write-up on several common database implementation problems. Number 4 covers your 2-table option pretty well.
If you do insist on using two tables (please don't), then at least be sure to use an artificial primary key (i.e., a unique value that is NOT just the id) to help maintain integrity. There may be matching id integer values in each table, but there should only ever be one version of each artificial primary key value between the two tables. Again, though, this is not the recommended approach, and it adds complexity to your application that you don't otherwise need.

Alternative approach to sending a lot of parameter data on GET

I am creating a REST API and am encountering a problem where the client needs to get a calculation based on a lot of different parameters.
This GET operation might not be a part of any Save or Update operations (before the GET or after), and can happen in a stateless manner.
Due to this the GET URL can be very long (and even exceed the maximum allowed by the browser).
I have looked at other posts here on SO and elsewhere, and using a body in GET requests is discouraged. But what's most notable about all these posts is that none of them offer an alternative to this problem; they just state that something is flawed in the design, etc.
Well, nothing is wrong with the design here. It's a stateless calculation built on a lot of parameters.
I would like some alternatives. Thank you.
nothing is wrong with the design here
There is. From Wikipedia:
An important concept in REST is the existence of resources (sources of specific information), each of which is referenced with a global identifier (e.g., a URI in HTTP).
Your calculation parameters have nothing to do with the underlying resource identified by the URL you make the request to. You're not requesting an existing resource (as that's what GET is for, depending on how you're willing to interpret REST), but some calculations will be done based on some input. This is a Remote Procedure Call, not a REST call.
You can change your approach by modeling a Calculation, so you send a POST /Calculations/ request with all your parameters.
There's no requirement for a POST call to change server state (i.e. store the results):
httpbis-draft, POST (which is somewhat better worded and updated than RFC 2616):
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.
POST is used for (among others): providing a block of data, such as the fields entered into an HTML form, to a data-handling process;
So you can just return the calculation results along with a 200, or you can store them and return a 200, 201 or 204, containing or pointing to the calculation results, so you can retrieve them later, using GET /Calculations/$id.
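A minimal sketch of that approach using Flask (the endpoint path, the "values" parameter, and the summing logic are assumptions for illustration):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/calculations", methods=["POST"])
def create_calculation():
    # All parameters travel in the request body, so URL length
    # is no longer a constraint.
    params = request.get_json()
    # Hypothetical stateless computation; nothing is stored server-side,
    # and the result is returned directly with a 200.
    result = {"sum": sum(params.get("values", []))}
    return jsonify(result), 200
```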
As far as I can tell, the only solution you have left is to break the rules of REST and use a POST request. POST can have an arbitrary number of arguments, but it's meant for a "modification" operation in REST.
Like everything in software, the rules are there to help you avoid mistakes. But if the rules prevent you from solving your problems, you need to modify them a little bit (or modify them for a well defined part of your code).
Just make sure that everyone understands how you changed the rules, where the new rules apply (and where they don't). Otherwise, the next guy will "fix" your "broken" code with his simple test cases.
So you want a safe HTTP method that accepts a payload. Have a look at http://greenbytes.de/tech/webdav/draft-ietf-httpbis-method-registrations-14.html - both SEARCH and REPORT are theoretical candidates, if you can live with the WebDAV baggage they come with.
An alternative would be to start work on either generalizing these, or defining something new (but don't forget that definitions of new HTTP methods need IETF review; see http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p2-semantics-25.html#considerations.for.new.methods).

Why perform transformations in middleware?

A remote system sends a message via middleware (MQ) to my application.
In middleware a transformation (using XSLT) is applied to this message. It is just reformatted; there is no enrichment or validation. My system is the only consumer of this transformed message, and the XSLT is maintained by my team.
The original author of all of this has long gone, and I am wondering why he thought it was a good idea to do the transformation in middleware rather than in my app. I can't see the value in doing this in middleware; it makes the transformation less visible and harder to maintain.
Also, I would have thought that the XSLT would be maintained by the message producer, not the consumer.
Are there any guidelines for this sort of architecture? Has he done the right thing here?
It is a bad idea to modify a message body in the middleware. It negatively affects maintainability and performance.
The only reason for doing this is to connect two incompatible endpoints without modifying them, which requires transforming the source content into a form the destination endpoint can understand.
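For illustration, a minimal Python sketch of such a pure reformatting step using lxml (the message structure and the XSLT itself are hypothetical):

```python
from lxml import etree

# Hypothetical XSLT that only reshapes the producer's message into the
# structure the consumer expects (no enrichment, no validation).
xslt_root = etree.XML(b"""\
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/order">
    <message>
      <id><xsl:value-of select="@ref"/></id>
      <amount><xsl:value-of select="total"/></amount>
    </message>
  </xsl:template>
</xsl:stylesheet>
""")
transform = etree.XSLT(xslt_root)

source = etree.XML(b"<order ref='42'><total>19.99</total></order>")
print(etree.tostring(transform(source), pretty_print=True).decode())
```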
The motivation to delegate the transformation to middleware could also be political (endpoints are maintained by different teams, management is reluctant to touch the endpoint code, etc.).
If you are trying to create an application architecture where there is a need to serve data to different users in different formats, and perhaps receive data in different formats (think weather reports, or sports news), then creating a hub capable of doing the transformations between many different formats makes excellent sense. (Whether you call that "middleware" is up to you.) Perhaps your predecessor had this kind of architecture in mind, but it never grew big or complex enough to justify the design.
From an architectural point of view, it's a good idea to provide consumers with messages or content in a human-readable format, e.g. XML, unless there is a significant performance gain in using a binary format.
In the human-readable case, one simply has to look at the message to verify that it is correct. In the binary case, one would have to develop a utility to transform the binary message into a human-readable form. Different implementers of such a utility may not always interpret the binary form as intended, and it may turn into a finger-pointing exercise as to who or what is correct.
Also, if one is looking at what's in the queue, it is easier to make sense of it if the messages are in a human-readable format.
It doesn't hurt to start with a human-readable format and get the app working first. Then profile the app and see if, in the big picture, the transformation routines are significant sources of delay. If yes, then go to a binary format.
It would have been preferable to have the original message producer provide messages in the target format, but they must have had good reasons for doing what they did when they did it: e.g., potentially other consumers, XSLT didn't exist then, resource constraints, etc.
Read about the adapter design pattern and you will understand the intent of the current system architecture.

Different REST resource content based on user viewing privileges

I want to provide different answers to the same question for different users, based on their access rights. I read this question:
Excluding private data in RESTful response
But I don't agree with the accepted answer, which states that you should provide both /people.xml and /unauthenticated/people.xml, since my understanding of REST is that a particular resource should live in a particular location, not several depending on how much of its information you're interested in.
The system I'm designing is even more complicated than that one. Let's say that a user has created a number of circles of friends, and assigned different access rights to them. For example, my "acquaintances" circle might have access to my birthday, and my "professional" circle might have access to my employment history, but not the other way around. In order to apply the answer from the question I mentioned, I need to have a way of getting all of the user's circles (which I might want to keep secret for security reasons), and then go through /circles/a/users/42, /circles/b/users/42, /circles/c/users/42 and so on, and then merge the results to display the maximum amount of information available. Obviously there's not necessarily a single circle that gets all the information that any of the other circles get. I believe this is tricky enough (note that I probably need to do this with several kinds of objects and that future versions might require a different procedure), but what if I want to impose security restrictions on a particular user despite the fact that he's also in some of my circles? Can that problem even be solved? Even if I refuse to respond to any of the above-mentioned queries and come up with a new one that could give me an answer, it'd still reveal the fact that this specific user is treated differently due to individual access restrictions.
What am I missing here? Is it even possible for me to develop a RESTful web service?
If the conclusion is that the behavior is not RESTful, would this still constitute a situation where it'd be morally okay to break the REST contract? If so, what are the negative implications? Do I risk proxy caching issues, for example?
According to Fielding's dissertation (it really is a great read):
A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.
In other words, if you have a resource that is defined as "the requesting user's assigned projects" and representations thereof accessible by a URI of /projects, you do not violate any constraints of REST by returning one list of projects (i.e., representation) for user A and another (representation) for user B when they GET that same URI. In this way, the interface is uniform/consistent.
In addition to this, REST only prescribes that an explicit caching instruction be included with the response, whether that is 'cache for this long' or 'do not cache at all':
Cache constraints require that the data within a response to a request be implicitly or explicitly labeled as cacheable or non-cacheable.
How you choose to manage that is up to you.
Keeping that in mind,
You should feel comfortable returning a representation of a resource that varies depending on the user requesting it, as long as you are not violating the constraints of a uniform interface -- don't use a single resource identifier to return representations of different resources.
If it helps, consider that the server responds with varying representations of a resource in other ways as well -- XML or JSON, French or English, etc. The credentials sent with the request are just another factor the server is able to use in determining which representation to send in response. That's what the header section is there for.
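A minimal sketch of this using Flask (the tokens, circles, and fields are hypothetical stand-ins for real authentication and access rules): one URI, with the representation chosen from the request's credentials.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-ins for real authentication and per-circle rules.
TOKENS = {"token-a": "acquaintances", "token-b": "professional"}
PROFILE = {"name": "Alice", "birthday": "1990-01-01", "employment": "Acme"}
VISIBLE = {
    "acquaintances": {"name", "birthday"},
    "professional": {"name", "employment"},
}

@app.route("/people/42")
def person():
    # One URI for everyone; the representation varies with the
    # credentials carried in the request header.
    circle = TOKENS.get(request.headers.get("Authorization", ""))
    fields = VISIBLE.get(circle, {"name"})
    return jsonify({k: PROFILE[k] for k in fields})
```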
I agree that the other solution doesn't seem right. It makes the URL structure complicated and the resource more difficult to find. However, if you did REST properly, it shouldn't matter what the URL for the resource is, as the server controls it (and is free to relocate it as it sees fit). If your client is really "REST", it would discover the resources it needed through prior negotiation with the server. So the path truly would not matter on the client. I don't like it because it's confusing to use - not because of some violation of REST principles.
But that probably doesn't answer your question -
What you didn't mention is your security setup - presumably you are passing a session token with the request as part of the request header. So your back-end processing should have the ability to tie it to a particular set of security constraints. From there, you form the list with whatever business logic you need and return a limited resource based on the user's security tied to the session.
For the algorithm itself, one usually implements a least- or most-restrictive type of algorithm that merges the allowable data into a response (very similar to Java realms or Microsoft's user security model).
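A least-restrictive merge can be as simple as a union over the user's circles; a tiny sketch with hypothetical names:

```python
def visible_fields(user_circles, circle_fields):
    # Least-restrictive merge: the union of every field that any of the
    # requesting user's circles is allowed to see.
    allowed = set()
    for circle in user_circles:
        allowed |= circle_fields.get(circle, set())
    return allowed

# e.g. a user in both circles sees birthday AND employment history:
visible_fields({"acquaintances", "professional"},
               {"acquaintances": {"name", "birthday"},
                "professional": {"name", "employment"}})
```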
If the data is structured differently for the restricted and non-restricted cases, you could create two different representations of the data and return whichever one the user is authorized to see. The client should be asking for the accepted MIME response types anyway, and the server would just provide different answers based on the session security in the request header. Alternatively, you could provide optional elements within the representations and fill out the appropriate ones based on authorization (although this is a little hacky in my opinion).

Web Service ‘mandatory/optional’ fields: XSD Design time vs Runtime

We are currently building a pile of SOAP Web Services to front access to various backend systems.
While defining our Request/Response message XML, we see multiple services needing the ‘Account’ object with different ‘mandatory/optional’ fields.
How should we define and enforce the validation of these ‘mandatory/optional’ fields on the same message? I see these options:
1) Enforce validation with XSD by creating different 'Account' complex types
Pros: design-time clarity.
Cons: proliferation of object types, less reuse of objects.
2) Enforce validation with XSD by extension+restriction of a single base 'Account' type
Pros: design-time clarity.
Cons: unsure of the support for the extend+restrict feature (Java, .NET).
3) Use a single 'Account' type and enforce validation at runtime (i.e., in the code).
Pros: simple.
Cons: no design-time validation; need to communicate field requirements via a specification doc.
What are your thoughts on that?
I would have to assume that: i) some of what you call optional fields are actually fields that are not applicable (don't make sense) for all accounts, and ii) we're not talking trivial scenarios (like two types of accounts with two fields each, kind of thing).
Firstly, I would say that unless you're really lucky from a requirements perspective, you're going to end up with some sort of "validation in runtime" no matter which option you go with. XML Schema can't express some common data validation requirements, such as cross-field validation; and sometimes the data in your XML is simply not sufficient to feed the rules that validate the integrity of the message (the data in the message being a subset of what's available at the time the XML is being un/marshalled).
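To illustrate the split, a small Python sketch using lxml (the element names are assumed): the XSD handles the structural mandatory/optional rules, while a cross-field rule that XML Schema cannot express is enforced in code.

```python
from lxml import etree

# The XSD catches structural 'mandatory/optional' requirements.
schema = etree.XMLSchema(etree.XML(b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Account">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Id" type="xs:string"/>
        <xs:element name="Email" type="xs:string" minOccurs="0"/>
        <xs:element name="Phone" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

doc = etree.XML(b"<Account><Id>A-1</Id><Email>a@example.com</Email></Account>")
schema.assertValid(doc)  # structural validation against the XSD

# A cross-field rule the schema cannot enforce: at least one contact field.
if doc.find("Email") is None and doc.find("Phone") is None:
    raise ValueError("Account needs an Email or a Phone")
```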
Secondly, I would avoid deriving new complex types through restriction; from an authoring perspective you don't achieve much in terms of reuse, and you might end up with problems in how it is interpreted by your XSD-to-code tooling. I like to think that the original intention of deriving through restriction was to provide a tool for people to use in xsd:redefine scenarios; for people who wouldn't want to fiddle with XML Schemas that were authored by someone else. If one owns (authors) the schema, one can work around the need to restrict by defining the "lesser" object first and extending from that.
As to the "proliferation of objects", you are kind of getting that with option #2 as well (when compared with #1); what I mean by that, all the tools I know will create a class for each named (global) complex type you have in your XSD; so if you have to have three type of accounts, you'll have three for scenario #1, and four, or so, if you choose to extend from one, or so, base classes; a worst case scenario for the later would be when you need three specializations (concrete if you wish); anyway, from my experience, the difference in real life scenarios is not something that would really tip the decision one way or the other.
Extending base types in XML Schema is good for reuse; however, reuse brings coupling. If you're analysing this from a forward/backward-compatibility point of view, extending something in the base type could break some of the unmarshalling (deserialization) of the XML for clients of your service(s) that don't want to change their code base, while you want to maintain only one Web Service endpoint for all; in this case, a forward-compatibility strategy that relies on an xsd:any at the end of a compositor (xsd:sequence) would be rendered useless by your first release that extends your base type.
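To make the xsd:any point concrete, a hypothetical base type with a wildcard tail (all names are illustrative only):

```python
# Hypothetical base type relying on a wildcard tail for forward
# compatibility: old clients tolerate unknown trailing elements.
BASE_ACCOUNT_XSD = """\
<xs:complexType name="Account"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:sequence>
    <xs:element name="Id" type="xs:string"/>
    <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
  </xs:sequence>
</xs:complexType>
"""
# Deriving by extension appends the new elements *after* the base's
# content model, i.e. after the wildcard, which collides with it and
# defeats this compatibility strategy in exactly the way described above.
```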
There is even more; because of this, I don't think there's a correct answer given just the criteria you seem to imply by setting out your pros/cons.
All of my preferred options below assume that you put high value on the requirement to ensure forward/backward compatibility of your services, and you want to minimize the cost of your clients having to deal with your services (because of XML Schema changes).
I would say that if all of your domain (accounts in particular) can be fully modeled (assume no future change, basically) and there is enough commonality to justify reuse, then go with option #2. Otherwise, go with option #1, since I have yet to see things that don't change...
If the modeling of your domain can be done 80% or more (or some number that you think is high) and there is enough commonality to justify reuse, then I would still go with option #2, with the caveat that any future extensions for common attributes across accounts must be applied to each individual account (basically turning your option into a hybrid, by doing #1).
For anything else, I would go #1. Whew, I can't believe I wrote all of this...