REST API - Update of single resource changes multiple others

I'm looking for a way to deal with the following problem:
Imagine you modify a resource, and that modification subsequently causes updates of other resources.
E.g. you issue a PUT to, say, /api/orders/1234, which by definition changes the state of all other Orders of the given user. There may be UI clients that display the table of Orders, and they should know that not only a single item in the table was updated, but potentially others as well.
Now, is there any standard way to inform clients about such a situation?
So far I can only think of sending back the 205 Reset Content HTTP status code to inform the client that it should refresh its state, as more than a single thing was changed.

There are multiple solutions.
1. You can define specific resources as non-cacheable (Cache-Control: no-store), so the client does not cache them at all.
2. You can give a max-age of 0, so the client always has to re-validate those resources. In this case you might have to implement ETags and conditional GETs, but it will be easier on the server than option 1.
3. Some push mechanism, such as WebSockets.
If you really want to "notify" potentially multiple clients of a change, then it sounds like you need option 3.
However, correctly configured caching is normally good enough. For example, you could mark not-yet-executed orders as always requiring revalidation (max-age=0), but as soon as an order is executed, you can mark it as cacheable indefinitely, since it cannot change anymore (see the sketch below).
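Here is a minimal sketch of option 2 combined with the executed-order idea above, assuming Flask; the route, order store and field names are illustrative. max-age=0 plus an ETag makes revalidation cheap for mutable orders, while an executed order is cached aggressively.

from flask import Flask, jsonify, make_response, request

app = Flask(__name__)

# Hypothetical in-memory store; a real service would query its database.
ORDERS = {1234: {"status": "executed", "version": 7}}

@app.route("/api/orders/<int:order_id>")
def get_order(order_id):
    order = ORDERS[order_id]
    etag = f'"{order_id}-{order["version"]}"'

    # Conditional GET: if the client's cached copy is current,
    # answer 304 Not Modified instead of re-sending the body.
    if request.headers.get("If-None-Match") == etag:
        return "", 304

    resp = make_response(jsonify(order))
    resp.headers["ETag"] = etag
    if order["status"] == "executed":
        # An executed order can never change again: cache it for a year.
        resp.headers["Cache-Control"] = "max-age=31536000, immutable"
    else:
        # A pending order may be changed by writes to sibling resources:
        # force revalidation on every use.
        resp.headers["Cache-Control"] = "max-age=0, must-revalidate"
    return resp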

Related

How could I modify django-tracking2 so users can opt out of tracking

I'm making a website right now and need to use django-tracking2 for analytics. Everything works, but I would like to allow users to opt out, and I haven't seen any options for that. I was thinking modifying the middleware portion might work, but honestly I don't know how to go about that yet since I haven't written middleware before.
I tried writing a script to check a cookie called no_track: if it wasn't set, I would set it to false for default tracking, and if the user rejects tracking, it sets no_track to True. But I had no idea where to implement it (other than the middleware; when I tried that, the server told me to contact the administrator). I was thinking maybe I could use signals to prevent the user being tracked, but that would slow down the site, since it would have to prevent a new Visitor instance on every page (it would likely keep making new instances, because each request would look like a new user). Could I subclass the Visitor class and modify __init__ to check for the cookie and either let it save or not?
Thanks for any answers. If I find a solution I'll edit the post, or post and accept the answer, in case someone else needs this.
I made a function in my tools file (which holds all functions used throughout the project) to get and set a session key. Inside VisitorTrackingMiddleware I used the function _should_track() and placed a check that looks for the session key (after _should_track() checks that sessions are installed, and before all other checks) using the check_session() function from my tools file. If the key doesn't exist, the function creates it with a default of True (track the user until they accept or reject) and returns an HttpResponse (left over from trying the cookie method).
When I used the cookie method, the Firefox console said the cookie would expire, so I switched to sessions; another reason is that django-tracking2 runs on sessions anyway.
It seems to work very well and doesn't have a large impact on load times: the function runs on every request, my debug output tells me whether it's tracking me, and all the buttons work through AJAX. I want to run some tests to confirm this does indeed work, and if so, maybe I'll submit a pull request to django-tracking2 in case someone else wants to use it.
A big advantage is that you can let users change their minds later, or re-prompt at sign-up depending on whether they accepted. With the way check_session() is set up, I can use it in template tags and class methods as well.
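For reference, a minimal sketch of the approach described above, assuming Django sessions are enabled; check_session, the session key name, and the AJAX view are illustrative reconstructions, not django-tracking2's own API.

from django.http import JsonResponse

TRACK_KEY = "track_visitor"

def check_session(request):
    # Default of True: track the user until they explicitly opt out.
    if TRACK_KEY not in request.session:
        request.session[TRACK_KEY] = True
    return request.session[TRACK_KEY]

def set_tracking(request):
    # Hypothetical AJAX endpoint behind the accept/reject buttons.
    request.session[TRACK_KEY] = request.GET.get("allow") == "true"
    return JsonResponse({"tracking": request.session[TRACK_KEY]})

# Inside a subclassed VisitorTrackingMiddleware._should_track(), after its
# own check that sessions are installed and before its other checks:
#
#     if not check_session(request):
#         return False  # visitor opted out; skip tracking entirely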

Event Sourcing: concurrently creating conflicting events

I am trying to implement an Event Sourcing system using Kafka and have run into the following issue. During a new user sign-up I want to check if the username the user provided is already taken. However, consider the case where two users are trying to sign up at the same time providing the same username.
In my understanding of how ES works, the controller that processes the sign-up request will check if the request is valid, then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. a Postgres DB). The problem is that the validation of the request is done against the materialized view, but the actual persistence to it happens later. Because the two requests are processed in parallel (by different service instances), they might both pass validation, resulting in two NewUser messages. However, when the second controller tries to persist those two NewUser messages in the database, saving the second event will fail because of the violation of the uniqueness constraint on the username.
Any ideas on how to address this?
Thanks.
UPDATE:
In particular, I would like to verify whether the following are accepted approaches to the problem:
use the username as the userId (restrictive)
send an event to a topic partitioned by username, and when validation is done, send an event to another topic
Initial validation against the materialized view won't be enough in most scenarios where you have constraints: there can always be relevant events that haven't been materialized yet. There are two main concurrency-control approaches to ensure that correct results are generated:
1. Pessimistic approach:
If you want to validate constraints before you publish an event, you need to lock the relevant resources (entity, aggregate or data set). Locking means your services must not be able to publish events on these resources while the lock is held. After this point, to get the current state of your data:
You can wait until all events published before locking are materialized.
You can read current state from the database and apply events on it in a separate process.
2. Optimistic approach:
In this approach, you perform your validations after publishing events. To achieve this, you need to implement a feedback mechanism. The process which consumes events and performs validations should be able to publish validation results. You can perform the validations in-memory when possible. Otherwise, you can rely on your materialized data store.
Martin Kleppmann talks about a two-step solution to exactly this problem here and in his book. In this solution there are two topics: "claims" and "registrations". First you publish a claim to take the username, then you try to write it to the database, and finally you publish the result to the registrations topic. At a conceptual level it follows the same steps as the second approach you mentioned. In the validation step it avoids implementing validation logic and keeping secondary indexes in memory by relying on the database.
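Below is a minimal sketch of that claims/registrations flow, assuming kafka-python and SQLite; the topic names, schema, and message shapes are illustrative, not a prescribed design.

import json
import sqlite3

from kafka import KafkaConsumer, KafkaProducer

db = sqlite3.connect("users.db")
db.execute("CREATE TABLE IF NOT EXISTS usernames (name TEXT PRIMARY KEY)")

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)

def claim_username(username, user_id):
    # Step 1 (sign-up handler): publish a claim, keyed by username so all
    # claims for one name land in the same partition and are consumed in
    # order by a single validator.
    producer.send("username-claims", key=username.encode(),
                  value={"username": username, "userId": user_id})

# Step 2 (validator): consume claims serially, let the database's
# uniqueness constraint arbitrate, and publish the outcome.
consumer = KafkaConsumer("username-claims",
                         bootstrap_servers="localhost:9092",
                         group_id="username-validator")
for msg in consumer:
    claim = json.loads(msg.value)
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO usernames VALUES (?)",
                       (claim["username"],))
        outcome = "registered"
    except sqlite3.IntegrityError:
        outcome = "rejected"  # another claim won the race
    producer.send("username-registrations",
                  value=dict(claim, outcome=outcome))

Because the claims topic is keyed by username, the uniqueness decision is serialized per name even when many sign-up service instances publish concurrently.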
During a new user sign-up I want to check if the username the user provided is already taken.
You may want to review Greg Young's essay on Set Validation.
In my understanding of how ES works, the controller that processes the sign-up request will check if the request is valid, then send a new event (e.g. NewUser) to Kafka, and finally that event will be picked up by another controller which will persist it in a materialized view (e.g. a Postgres DB).
That's a little bit different from the usual arrangement. (You may also want to review Greg's talk on polyglot data.)
Suppose we begin with two writers; that's fine, but if there is going to be a single point of truth, then you are going to need synchronization somewhere.
The usual arrangement is to use a form of optimistic concurrency: when processing a request, you retain a copy of your original state, then you do your calculation, and finally you send the book of record a replace(originalState, newState).
So at this point, we have two writes racing toward the book of record
replace(red,green)
replace(red,blue)
At the book of record, the writes are processed in series.
[..., replace(red,blue), ..., replace(red,green)]
So when the book of record processes replace(red,blue), it performs a check that yes, the state is currently red, and swaps in blue. Later, when the book of record tries to process replace(red,green), the book of record performs the check, which fails because the state is no longer red.
So one of the writes has succeeded, and the other fails; the latter can propagate the failure outwards, or retry, and so on; precisely what happens depends on the specific mechanics in question. A retry should mean, of course, reloading the "original state", at which point the model would discover that some previous edit already claimed the username.
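A minimal sketch of that compare-and-swap check at the book of record (the class and error names are illustrative):

class ConcurrencyError(Exception):
    pass

class BookOfRecord:
    def __init__(self, state):
        self.state = state

    def replace(self, original, new):
        # Writes are processed in series; only the write whose "original"
        # still matches the current state wins.
        if self.state != original:
            raise ConcurrencyError(
                f"expected {original!r}, but state is {self.state!r}")
        self.state = new

book = BookOfRecord("red")
book.replace("red", "blue")   # succeeds: the state was still red
book.replace("red", "green")  # raises: the state is now blue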
Any ideas on how to address this?
A single writer per stream makes the rest of the problem pretty simple, by eliminating the ambiguity introduced by having multiple in-memory copies of the model.
Multiple writers using a synchronous write to the durable store is probably the most common design. It requires an event store that understands the idea of writing to a specific location in a stream -- aka "expected version".
You can also perform an asynchronous write, and then do other work until you get an acknowledgement that the write succeeded (or failed, or timed out, ...).
There's no magic -- if you want uniqueness (or any other sort of invariant enforcement, for that matter), then everybody needs to agree on a single authority, and anybody else who wants to propose a change won't know if it has been accepted without getting word back from the authority, and needs to be prepared for a rejected proposal.
(Note: this shouldn't be a surprise -- if you were using a traditional design with current state stored in a RDBMS, then your authority would be a user table in the database, with a uniqueness constraint on the username column, and the race would be between the two insert statements trying to finish their transaction first....)
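To make the "expected version" idea concrete, here is a minimal in-memory sketch (the names are illustrative); a real event store performs the same check durably.

class ExpectedVersionError(Exception):
    pass

class EventStream:
    def __init__(self):
        self.events = []

    def append(self, expected_version, event):
        # The writer states which stream position its decision was based
        # on; a concurrent write advances the stream past that position,
        # so the late writer must reload and retry (or give up).
        if len(self.events) != expected_version:
            raise ExpectedVersionError(
                f"expected version {expected_version}, "
                f"stream is at {len(self.events)}")
        self.events.append(event)

stream = EventStream()
stream.append(0, {"type": "NewUser", "username": "alice"})  # succeeds
stream.append(0, {"type": "NewUser", "username": "alice"})  # raises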

WSO2 ESB: Store state between sequence invocations

I was wondering about the proper way to store state between sequence invocations in WSO2 ESB. In other words, if I have a scheduled task that invokes sequence S, at the end of iteration 0 I want to store some String variable (let's call it ID), and then I want to read this ID at the start (or in the middle) of iteration 1, and so on.
To be more precise, I want to get a list of new SMS messages from an existing service, Twilio to be exact. However, Twilio only lets me get messages for selected days; there's no way for me to say "give me only new messages" (since I last checked / newer than a certain message ID). Therefore, I'd like to create a scheduled task that will query Twilio and pass only new messages via a REST call to my service. In order to do this, my sequence needs to query Twilio, go through the returned list of messages, and discard messages that were already reported in the previous invocation. To do this I need to store some state between task/sequence invocations, i.e. at the end of the sequence I need to store the ID of the newest message in the current batch. This ID can then be used in the subsequent invocation to determine which messages were already reported.
I could use the DBLookup and DBReport mediators, but that seems like overkill (using a database to store a single string) and not very performance-friendly. On the other hand, as far as I can see, Class mediators are instantiated as singletons, so I could create a custom Class mediator that would manage this state and filter the list of messages to be sent to my service. I am quite sure this will work, but I was wondering whether this is the way to go, or whether there is a more elegant solution I missed.
We can think of 3 options here.
1. Using the DBLookup/DBReport mediators, as you've suggested
2. Using the Carbon registry to store the values (this again uses a DB in the back end)
3. Using a custom Class mediator to hold the state and read/write it from/to properties
Out of these three, the third will obviously deliver the best performance, since everything stays in memory. It's also quite simple to implement; some time back I did something similar and wrote a blog post here.
On the other hand, the first two options can keep the state even when the server crashes, if that's a concern for your use case.
Since ESB 4.9.0 you can also persist and read properties from the registry using the property mediator.
https://docs.wso2.com/display/ESB490/Property+Mediator

How do you perform service-oriented parent-child transactions?

Example:
A SalesOrder is composed of a SalesOrderHeader and one or more SalesOrderItems. When editing an existing SalesOrder, the SalesOrderHeader can be modified and SalesOrderItems can be added, modified and deleted. All changes must be saved in a single transaction. Multiple users may edit the SalesOrder at the same time with optimistic concurrency.
I believe that the requirement to have the save done in a single transaction encourages us to communicate both the SalesOrderHeader and the SalesOrderItems in a single service call. The implication of packaging up the child data with its parent is that there will need to be some understanding as to whether the child data is added, modified or deleted.
Change tracking of the child entities can happen either on the server or on the client.
Change tracking on the server
The idea with this strategy is that the client can modify the SalesOrder at will, without tracking which SalesOrderItems are added, modified or deleted. The state of the SalesOrderItems will be determined on the server when the save service is called.
The server should remain stateless between service calls. This means that the server can't retain any information about the state of the SalesOrder between its retrieval and its eventual save. The only option left is for the server to determine the state of its entities by comparing the modified object graph to the database object graph.
With NHibernate, there is a merge function to accomplish this. With Entity Framework, the highest-voted feature request is to have this added. There's also an open-source implementation of this for EF called GraphDiff.
This sounds great in theory because it makes the services very easy to design and use. However, I see two major issues with this strategy. The first is performance: the entire object graph must be sent back on every save. Whether or not a SalesOrderItem was modified, it must be sent back, or the server will assume it's been deleted. The second problem is even more critical, and it has to do with concurrency. If User 1 adds a SalesOrderItem to a SalesOrder and User 2 makes a change to the same SalesOrder, then when User 2 saves, the server will assume that the SalesOrderItem added by User 1 should be deleted, because it was not included in User 2's object graph. I don't see a way this can be prevented in any implementation of server-side change tracking.
Change tracking on the client
The alternative is to have the client track changes to its entities and communicate that state when calling the save service. One benefit is that the client does not need to send its unchanged child entities, which helps with performance. A downside is that all entities will need an additional property, named something along the lines of "ObjectState", to track whether each is added, modified or deleted. This makes the entity models on the server quite messy and filled with concerns unrelated to the business domain. It also puts the onus on the different consumers of the service to maintain this state. Another problem is that it becomes difficult to deal with deleted entities. Should the SalesOrderHeader maintain a list of deleted SalesOrderItems? Or should the SalesOrderItems get assigned a state of deleted, which must be filtered out by the client UI?
I know that the Breeze JavaScript library has its own implementation of client-side entity tracking, but my concern is that its implementation requires both client-side and server-side components. Shouldn't the service layer isolate which technology we use on either side? What if non-JavaScript clients want to use my services?
Question
I would think this is a common scenario that should be addressed by the majority of service implementations. Have I made any incorrect assumptions, or am I doing anything out of the ordinary? What strategy have you implemented? Are there any reasonable alternatives?
Full disclosure: I work with Breeze, and I think change tracking on the client is the way to go. Change tracking on the client allows stateless servers, reduces traffic between the client and server, and allows offline use.
In Breeze, the "ObjectState" that you mention is called the EntityAspect, and each entity has one, but it is not part of the domain model. The server-side entities don't need an EntityAspect, but the server-side service has to know how to handle the entity state information that comes from the client.
Basically, the service needs to create, update, or delete entities based on the information coming from the client. There are existing server-side backends for Breeze that do all this already (in .NET (EF and NHibernate), Java, PHP, Node, and Ruby), but you can also write your own. Your server just needs to know how to talk to the client.
Let's say we've updated a SalesOrder and added a new SalesOrderItem. The Breeze client sends a save bundle that looks something like this:
{
  "entities": [
    {
      "Id": 123,
      "Title": "My Updated Title",
      "OrderDate": "2014-08-03T07:00:00.000Z",
      "entityAspect": {
        "entityTypeName": "SalesOrder:#My.DomainModel",
        "entityState": "Modified",
        "originalValuesMap": {
          "Title": "My Original Title"
        },
        "autoGeneratedKey": {
          "propertyName": "Id",
          "autoGeneratedKeyType": "Identity"
        }
      }
    },
    {
      "Id": -1,
      "SalesOrderId": 123,
      "ProductId": 456,
      "Quantity": 11,
      "entityAspect": {
        "entityTypeName": "SalesOrderItem:#My.DomainModel",
        "entityState": "Added",
        "originalValuesMap": {},
        "autoGeneratedKey": {
          "propertyName": "Id",
          "autoGeneratedKeyType": "Identity"
        }
      }
    }
  ]
}
Here, SalesOrder with Id# 123 has been modified (its Title has been changed). The entityAspect includes the originalValuesMap which shows what the previous Title was.
The server would need to update the existing SalesOrder with the new value. Whether the server needs to query the existing SalesOrder from the database before applying the changes is implementation-dependent.
A new SalesOrderItem has been added. A temporary Id, -1, was created for it on the client. The server needs to create and persist a new SalesOrderItem and generate a real Id for it.
The response from the server should contain the entities that were created and updated, plus KeyMapping information showing which server-generated keys correspond to the temporary client-side keys, so that the client can replace them.
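As a rough illustration of what a custom backend does with such a bundle, here is a sketch in Python; the db object and its insert/update/delete calls are hypothetical stand-ins for your persistence layer, and the response shape is simplified rather than Breeze's exact wire format.

def save_changes(bundle, db):
    key_mappings = []
    for entity in bundle["entities"]:
        aspect = entity.pop("entityAspect")
        entity_type = aspect["entityTypeName"]
        state = aspect["entityState"]
        if state == "Added":
            temp_id = entity["Id"]
            real_id = db.insert(entity_type, entity)  # hypothetical API
            entity["Id"] = real_id
            key_mappings.append({"entityTypeName": entity_type,
                                 "tempValue": temp_id,
                                 "realValue": real_id})
        elif state == "Modified":
            db.update(entity_type, entity)            # hypothetical API
        elif state == "Deleted":
            db.delete(entity_type, entity["Id"])      # hypothetical API
    # The client replaces its temporary keys using the key mappings.
    return {"entities": bundle["entities"], "keyMappings": key_mappings}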
Change tracking is not a simple problem, but Breeze tries to do the hard parts for you.
I'd like to piggyback on Steve's answer.
We should be clear: the onus for implementing the Order-graph (AKA "Order aggregate") transaction in a relational data model falls on the developer. BreezeJS (and the Breeze helpers for .NET servers) can facilitate this, but you have to make it work.
The key to making this work is including the root element of the aggregate - the Order - in all changes to any entity within the aggregate. If you add, delete, or modify an OrderItem, make sure you modify the Order at the same time.
How? By bumping the Order's concurrency property (e.g., the rowVersion) and making sure that Breeze KNOWS this is your concurrency property.
You must implement root entity optimistic concurrency if you want to ensure Order aggregate consistency.
Now you can detect if someone else has made a change to any part of the Order aggregate. That could be a change to the Order or an add/mod/delete of one of its OrderItems.
You do not have to include all OrderItems in the change-set when you save a changed Order aggregate. You only need to include the OrderItems that are added/modified/deleted.
Of course some other user may make a change to the Order aggregate before you save yours. When you try to save yours, the save will fail with an optimistic concurrency error.
Upon detecting an optimistic concurrency error for an Order, make sure the client removes the entire Order aggregate from cache - the Order and all of its OrderItems - and then re-fetches the whole aggregate. Don't just re-fetch the root Order entity and start messing with its items.
If everyone follows this protocol you'll be in fine shape on the server.
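A minimal sketch of this protocol (the names, the dict-based store, and the error type are illustrative): every child edit touches the root, so concurrent edits of the same aggregate always collide on the Order's rowVersion.

class ConcurrencyError(Exception):
    pass

def edit_item(order, item_id, **changes):
    order["items"][item_id].update(changes)
    order["rowVersion"] += 1  # touch the root on every child change

def save(store, order, original_row_version):
    # Optimistic check on the root only; child rows need no version column.
    if store[order["id"]]["rowVersion"] != original_row_version:
        raise ConcurrencyError("Order aggregate changed by someone else; "
                               "evict it from cache and re-fetch it")
    store[order["id"]] = order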

Sharing data across Sitecore pipelines

I'm trying to perform some actions in the httpRequestBegin pipeline only when necessary.
My processor is executed after Sitecore resolves the user (processor type="Sitecore.Pipelines.HttpRequest.UserResolver, Sitecore.Kernel"), as I'm resolving the user myself if Sitecore is not able to resolve it first.
Later, I want to add some rendering in the insertRenderings pipeline, but only if the actions in the previous pipeline were executed (if I resolved the user, show a message). So I'm trying to save some "flag" in the first step, to check in the second.
My question is: where can I store that flag? I'm trying to find some kind of "per request" cache...
So far, I've tried:
The session: wrong, it's too early; the session doesn't exist yet.
Items (HttpContext.Current.Items): it doesn't work either; my item is not there in the second step.
So far I'm using the application cache (HttpContext.Current.Cache) with a unique key, but I don't like this solution.
Does anybody know a better approach to sharing this "flag"?
You could add a flag to the request headers and then check for its existence in the later pipelines, e.g.
// in HttpRequest pipeline
HttpContext.Current.Request.Headers.Add("CustomUserResolve", "true");

// in InsertRenderings pipeline
var customUserResolve = HttpContext.Current.Request.Headers["CustomUserResolve"];
if (Sitecore.MainUtil.GetBool(customUserResolve, false))
{
    // custom logic goes here
}
This feels a little dirty; I think adding to Request.QueryString or Request.Params would have been nicer, but those are read-only. However, if you only need this as a one-time deal (i.e. only the first time the user is resolved) then it will work, since on the next request the headers are back to default, without your custom header added.
HttpContext.Current.Cache or HttpRuntime.Cache could be the fastest solution here, though this approach does not preserve data when the AppPool gets recycled.
If you add only a few keys to the cache and then maintain them, this solution might work for you. If each request puts a new entry into the cache, it may eventually overflow the memory used by the worker process in the long run.
As an alternative, you may try the Sitecore.Context.ClientData property. It uses the ClientDataStore, which employs a database (look for the clientDataStore section in the web.config file) to store data. These entries can survive an AppPool recycle.
However, if you use them a lot, this may become a bottleneck under load, when you need to write to and/or read from the entries.
If you know that a lot of entries could be created for sharing purposes, I'd create a scheduled task to clean obsolete entries out of the data store.
I know this is a very old question, but I just want to post the solution I worked out.
The line below will hold data on a per-HTTP-request basis:
HttpContext.Current.Items["ModuleInfo"] = "Custom Module Info";
We can store data in HttpContext in one Sitecore pipeline and retrieve it in another:
https://www.codeproject.com/Articles/146455/When-Can-We-Use-HttpContext-Current-Items-to-Store