In a web service that I am working on, a user's data needs to be updated in the background - for example pulling down and storing their tweets. As there may be multiple servers performing these updates, I want to ensure that only one can update any single user's data at one time. Therefore, (I believe) I need a method of doing an atomic read (is the user already being updated) and write (no? Then I am going to start updating). What I need to avoid is this:
1. Server 1 sends a request to see if the user is being updated.
2. Server 2 sends a request to see if the user is being updated.
3. Server 1 receives a response saying the user is not being updated.
4. Server 2 receives a response saying the user is not being updated.
5. Server 1 starts downloading tweets.
6. Server 2 starts downloading the same set of tweets.
Madness!!!
Steps 1 and 3 need to be combined into an atomic read+write operation so that Step 2 would have to wait until Step 3 had completed before a response was given. Is there a simple mechanism for effectively providing a "lock" around access to something across multiple servers, similar to the synchronized keyword in Java (but obviously distributed across all servers)?
Take a look at Dekker's algorithm; it might give you an idea.
http://en.wikipedia.org/wiki/Dekker%27s_algorithm
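To make the atomic read+write concrete: most shared data stores offer a primitive that checks and sets in a single step, so steps 1 and 3 can never interleave with another server's check. A minimal sketch using a database unique constraint as the atomic test-and-set (SQLite in-memory here only to keep it self-contained; in production every server would point at the same shared store, or you could use Redis's SET with NX/EX for the same effect — the table and function names are my own):

```python
import sqlite3

# In production this would be a database shared by all servers;
# ":memory:" is used here only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_locks (user_id TEXT PRIMARY KEY)")

def try_acquire(user_id: str) -> bool:
    """Atomic check-and-set: the INSERT succeeds for exactly one caller."""
    try:
        with conn:
            conn.execute("INSERT INTO user_locks (user_id) VALUES (?)", (user_id,))
        return True
    except sqlite3.IntegrityError:
        return False  # someone else already holds the lock

def release(user_id: str) -> None:
    with conn:
        conn.execute("DELETE FROM user_locks WHERE user_id = ?", (user_id,))

# Two "servers" race for the same user: only one wins.
print(try_acquire("user42"))  # True
print(try_acquire("user42"))  # False -- already being updated
release("user42")
print(try_acquire("user42"))  # True again
```

Because the uniqueness check and the insert happen inside the database, there is no window between "is the user being updated?" and "I am now updating" for another server to slip through.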
Related
I have an issue with concurrent requests to (any) one API endpoint (ASP.NET Web API 2). When a second request starts being processed before the first one has completed, there can be concurrency issues with database entities.
Example:
1: Request R1 starts to be processed, it reads entities E1 and E2 into memory
2: Request R2 starts to be processed, it reads entities E2 and E1 into memory
3: Request R1 updates object E1
4: Request R1 creates E3, if E1 and E2 have been updated
5: Request R2 updates object E2
6: Request R2 creates E3, if E1 and E2 have been updated
The problem is that both requests read entities E1 and E2 before the other request updated them. So both see outdated, non-updated versions, and E3 is never created.
All requests are processed in a single database transaction (isolation level "read committed", Entity Framework 6 on SQL Server 2017).
In my current understanding, I don't think the issue can be solved on the data access/data store level (e.g. stricter database isolation level or optimistic locking) but needs to be higher up in the code hierarchy (request level).
My proposal is to pessimistically lock all entities that could be updated when entering an API method. When another API request arrives that needs a write lock on an entity that is already locked, it has to wait until all previous locks are released (a queue). A request is normally processed in less than 250 ms.
This could be implemented with a custom attribute that decorates the API methods: [LockEntity(typeof(ExampleEntity), exampleEntityId)]. A single API method could be decorated with zero or more of these attributes. The locking mechanism would use async/await if possible, or just put the thread to sleep. The lock synchronization needs to work across multiple servers, so the easiest option I see is a representation in the application's data store (a single, global one).
This approach is complex and has some performance disadvantages (blocks request processing, performs DB queries during lock processing) so I am open to any suggestions or other ideas.
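The per-entity queueing described in the proposal can be sketched in-process; this is illustrated in Python rather than C# purely for brevity, and the names are my own. One lock per (entity type, id) gives exactly the "later requests wait in a queue" behaviour; as the question notes, the cross-server case would additionally need the lock state to live in the shared data store:

```python
import asyncio
from collections import defaultdict

# One lock per (entity type, id). asyncio.Lock queues waiters in FIFO
# order, which models "wait until all previous locks are released".
_entity_locks = defaultdict(asyncio.Lock)

def lock_entity(entity_type: str, entity_id: int) -> asyncio.Lock:
    return _entity_locks[(entity_type, entity_id)]

order = []

async def handle_request(name: str, delay: float):
    # Rough equivalent of [LockEntity(typeof(ExampleEntity), 1)] on a method.
    async with lock_entity("ExampleEntity", 1):
        order.append(f"{name} enter")
        await asyncio.sleep(delay)  # pretend to read/update E1, E2, create E3
        order.append(f"{name} exit")

async def main():
    # R2 starts while R1 is in flight, but must wait for R1's lock.
    await asyncio.gather(handle_request("R1", 0.05), handle_request("R2", 0.01))

asyncio.run(main())
print(order)  # ['R1 enter', 'R1 exit', 'R2 enter', 'R2 exit']
```

With the lock in place, R2 cannot read E1/E2 until R1 has finished updating them, which removes the lost-update interleaving from the example above.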
My questions:
1. Are there any terms that describe this problem? It seems like a very common thing but it was not known to me so far.
2. Are there any best practices on how to solve this?
3. If not, does my proposal make sense?
I wouldn't work around the statelessness of the web and REST by blocking requests. 250 ms is a quarter of a second, which is pretty slow IMO. E.g. 100 concurrent requests would block each other, and at least one of them would wait 25 seconds! This slows down the application a lot.
I've seen some apps (including Atlassian's Confluence) that notify the user that the current data set has been changed in the meantime by another user. (Sometimes the other user was mentioned as well.) I would do it the same way using WebSockets, e.g. via SignalR. Other apps, like ServiceNow, merge the remote changes into the currently open set of data.
If you don't want to use WebSockets, you could also try to merge or check on the server side and notify the user with a specific HTTP status code and a message. Tell the user what changed in the meantime, try to merge, and ask whether the merge is fine or not.
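The server-side check suggested here is usually implemented as optimistic concurrency: each entity carries a version, and an update is rejected (typically HTTP 409 Conflict) when the client's version is stale. A framework-free sketch, with names of my own invention:

```python
from dataclasses import dataclass

@dataclass
class Entity:
    value: str
    version: int = 0

def try_update(entity: Entity, new_value: str, expected_version: int):
    """Return (status, entity); 409 maps to 'reload, merge, retry' on the client."""
    if entity.version != expected_version:
        return 409, entity          # conflict: someone updated it in the meantime
    entity.value = new_value
    entity.version += 1
    return 200, entity

e1 = Entity("initial")
seen_by_r1 = e1.version             # request R1 reads E1 (version 0)
seen_by_r2 = e1.version             # request R2 reads E1 concurrently

status, _ = try_update(e1, "from R1", seen_by_r1)
print(status)                       # 200 -- first writer wins
status, _ = try_update(e1, "from R2", seen_by_r2)
print(status)                       # 409 -- R2 must reload/merge and retry
```

Unlike the pessimistic-lock proposal, nothing blocks: the losing request simply gets told its data is stale and can merge or retry, which is also what SQL Server's rowversion column or EF's concurrency tokens give you at the data layer.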
My Django REST app accepts requests to scrape multiple pages for prices and compare them (which takes time, ~5 seconds), then returns a list of the prices from each page as a JSON object.
I want to update the user on the current operation. For example, if I scrape 3 pages I want to update the interface like this:
Searching 1/3
Searching 2/3
Searching 3/3
How can I do this?
I am using Angular 2 for my front end but this shouldn't make a big difference as it's a backend issue.
This isn't the only way, but this is how I do this in Django.
Things you'll need
Asynchronous worker processes
This allows you to do work outside the context of the request-response cycle. The most common are either django-rq or Celery. I'd recommend django-rq for its simplicity, especially if all you're implementing is a progress indicator.
Caching layer (optional)
While you could use the database for persistence in this case, a temporary key-value cache store makes more sense here, as the progress information is ephemeral. The Memcached backend is built into Django; however, I'd recommend switching to Redis, as it's more fully featured and very fast, and since it sits behind Django's caching abstraction it does not add complexity. (It's also a requirement for the django-rq worker processes above.)
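For reference, pointing Django's cache at Redis is a small settings change. A sketch assuming the django-redis and django-rq packages are installed; the URL is a placeholder for your own Redis instance:

```python
# settings.py -- assumes the django-redis package is installed
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/0",
    }
}

# django-rq can share the same Redis instance for its queues
RQ_QUEUES = {"default": {"URL": "redis://127.0.0.1:6379/0"}}
```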
Implementation
Overview
Basically, we're going to send a request to the server to start the async worker, and poll a different progress-indicator endpoint which gives the current status of that worker's progress until it's finished (or failed).
Server side
Refactor the function you'd like to track the progress of into an async task function (using the @job decorator in the case of django-rq)
The initial POST endpoint should first generate a random unique ID to identify the request (e.g. with uuid). Then pass the POST data along with this unique ID to the async function (in django-rq this looks something like function_name.delay(payload, unique_id)). Since this is an async call, the interpreter does not wait for the task to finish and moves on immediately. Return an HttpResponse with a JSON payload that includes the unique ID.
Back in the async function, we need to set the progress using cache. At the very top of the function, we should add a cache.set(unique_id, 0) to show that there is zero progress so far. Using your own math implementation, as the progress approaches 100% completion, change this value to be closer to 1. If for some reason the operation fails, you can set this to -1.
Create a new endpoint to be polled by the browser to check the progress. This looks for a unique_id query parameter and uses this to look up the progress with cache.get(unique_id). Return a JSON object back with the progress amount.
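The server-side steps above can be sketched end to end. This is a framework-free simulation so it stays self-contained: a plain dict stands in for Django's cache and plain functions stand in for the two views; in a real project you'd swap in django.core.cache.cache, an @job-decorated task enqueued with .delay(), and JsonResponse. All names here are my own:

```python
import json
import uuid

cache = {}  # stand-in for django.core.cache.cache

def scrape_prices(pages, unique_id):
    """The async task: in django-rq this would be @job-decorated and enqueued."""
    cache[unique_id] = 0.0                   # zero progress so far
    results = []
    for i, page in enumerate(pages, start=1):
        results.append(f"price for {page}")  # pretend to scrape one page
        cache[unique_id] = i / len(pages)    # approaches 1.0 as we finish
    return results

def start_view(post_data):
    """The initial POST endpoint: generate an ID, enqueue, return it at once."""
    unique_id = str(uuid.uuid4())
    # Real code: scrape_prices.delay(post_data["pages"], unique_id)
    # returns immediately; here we run it inline to keep the sketch runnable.
    scrape_prices(post_data["pages"], unique_id)
    return json.dumps({"unique_id": unique_id})

def progress_view(unique_id):
    """The endpoint the browser polls with ?unique_id=... ; -1 means unknown/failed."""
    return json.dumps({"progress": cache.get(unique_id, -1)})

response = json.loads(start_view({"pages": ["a", "b", "c"]}))
print(json.loads(progress_view(response["unique_id"])))  # {'progress': 1.0}
```

The client only ever sees the unique ID and the progress number, so the "Searching 2/3" label is just the polled fraction rendered as `round(progress * total)` of total.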
Client side
After sending the POST request for the action and receiving a response, that response should include the unique_id. Immediately start polling the progress endpoint at a regular interval, setting the unique_id as a query parameter. The interval could be something like 1 second using setInterval(), with logic to prevent sending a new request if there is still a pending request.
When the progress received equals 1 (or -1 for failures), you know the process is finished and you can stop polling.
That's it! It's a bit of work just to get progress indicators, but once you've done it once it's much easier to re-use the pattern in other projects.
Another way to do this which I have not explored is via Webhooks / Channels. In this way, polling is not required, and the server simply sends the messages to the client directly.
We have a very simple AppFabric setup where there are two clients -- lets call them Server A and Server B. Server A is also the lead cache host, and both Server A and B have a local cache enabled. We'd like to be able to make an update to an item from server B and have that change propagate to the local cache of Server A within 30 seconds (for example).
As I understand it, there appears to be two different ways of getting changes propagated to the client:
Set a timeout on the client cache to evict items every X seconds. On next request for the item it will get the item from the host cache since the local cache doesn't have the item
Enable notifications and effectively subscribe to get updates from the cache host
If my requirement is to get updates to all clients within 30 seconds, then setting a timeout of less than 30 seconds on the local cache appears to be the only choice with option #1 above. Due to the size of the cache, this would be inefficient: it evicts the entire cache, 99.99% of which probably hasn't changed in the last 30 seconds.
I think what we need to implement is option #2 above, but I'm not sure I understand how this works. I've read all of the msdn documentation (http://msdn.microsoft.com/en-us/library/ee808091.aspx) and have looked at some examples but it is still unclear to me whether it is really necessary to write custom code or if this is only if you want to do extra handling.
So my question is: is it necessary to add code to your existing application if you want updates propagated to all local caches via notifications, or is the callback feature just a bonus way of adding extra handling if a notification is pushed down? Can I just enable notifications, set the appropriate polling interval on the client, and have things just work?
It seems like the default behavior (when Notifications are enabled) should be to pull down fresh items automatically at each polling interval.
I ran some tests and am happy to say that you do NOT need to write any code to keep all clients in sync, as long as notifications are enabled on the cache in the cluster configuration.
In the client config you need to set sync="NotificationBased" on the localCache element.
The clientNotification element in the client config tells the client how often it should check for new notifications on the server. With a poll interval of 15, every 15 seconds the client will check for notifications and pull down any items that have changed.
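For reference, the client-side settings described above sit in the dataCacheClient section of the client's configuration file. This is a sketch from memory of the MSDN schema, with placeholder host name, counts, and interval, so double-check the attribute names against your AppFabric version:

```xml
<dataCacheClient>
  <!-- Local cache refreshed by notifications rather than by timeout alone -->
  <localCache isEnabled="true"
              sync="NotificationBased"
              objectCount="100000"
              ttlValue="300" />
  <!-- Check the cache host for notifications every 15 seconds -->
  <clientNotification pollInterval="15" maxQueueLength="10000" />
  <hosts>
    <host name="ServerA" cachePort="22233" />
  </hosts>
</dataCacheClient>
```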
I'm guessing the callback logic that you can add to your app is just in case you want to add your own special logic (like emailing the president every time an item changes in the cache).
We have two systems, A and B. System B sends Write and Read requests to A, and A returns a response for every Read request using its existing engine E_current. Each Write request causes a modification in E_current.
Periodically, E_current is replaced by E_new. While the renewal process is running, E_new cannot be used yet. Some Read requests that arrive during the renewal process depend on Write requests that arrived after the renewal began. The new engine E_new should therefore also apply to itself every Write request that arrived during the renewal process and was already processed by the current engine.
After the renewal process completes, E_current is evicted and E_new becomes E_current.
Requirements:
Requests are completely concurrent. For example, a Write request can arrive while a Read request is being processed.
Multiple concurrent modifications of any engine E could cause an inconsistent state; state consistency must be preserved.
Diagrams:
https://dl.dropbox.com/u/3482709/stack1.jpg
https://dl.dropbox.com/u/3482709/stack2.jpg
I'm creating a web app for handling various surveys. An admin can create his own survey and ask users to fill it up. Users are defined by target groups assigned to the survey (so only user in survey's target group can fill the survey).
One of methods to define a target group is a "Token target group". An admin can decide to generate e.g. 25 tokens. After that, the survey can be accessed by anyone who uses a special link (containing the token of course).
So now to the main question:
Every token might have an e-mail address associated with it. How can I safely send e-mails containing the access link for the survey? I might need to send a few thousand e-mails (max 10,000, I believe). This is an extreme example, and such huge mailings would be needed only occasionally.
But I would also like to keep track of each e-mail message's status (was it sent, or was there an error?). I would also like to make sure that the SMTP server doesn't block this mailing. It would also be nice if the application remained responsive :) (the task should run in the background).
What is the best way to handle that problem?
As far as I can tell, the standard Django mailing feature won't be much help here. People report that setting up a connection and looping through messages calling send() on them takes forever. It wouldn't run in the background, so I believe this could have a negative impact on the application's responsiveness, right?
I read about django-mailer, but as far as I understood the docs, it doesn't let you keep track of the message status. Or does it?
What are my other options?
Not sure about the rest, but regardless, for backgrounding the task (no matter how you eventually do it) you'll want to look at Celery.
The key here is to reuse the connection and not open a new one for each email. Here is the documentation on the subject.
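The pattern looks the same regardless of backend: open one connection, loop over the messages, and record a per-message status (which also answers the status-tracking requirement above). A sketch with a stub connection so it stays self-contained; with Django you'd get the real connection from mail.get_connection() and send via connection.send_messages() or EmailMessage.send(), and with raw smtplib you'd reuse a single SMTP instance across sendmail() calls. All class and field names here are my own:

```python
class StubConnection:
    """Stand-in for smtplib.SMTP / a Django mail backend connection."""
    def open(self): pass
    def close(self): pass
    def send(self, message):
        if "@" not in message["to"]:
            raise ValueError("invalid recipient")

def send_batch(connection, messages):
    """Send many messages over ONE connection, recording each message's status."""
    statuses = []
    connection.open()                   # open once, not once per message
    try:
        for msg in messages:
            try:
                connection.send(msg)
                statuses.append((msg["to"], "sent", None))
            except Exception as exc:    # keep going; remember the error
                statuses.append((msg["to"], "error", str(exc)))
    finally:
        connection.close()
    return statuses

messages = [{"to": "a@example.com"}, {"to": "bad-address"}, {"to": "b@example.com"}]
for to, status, err in send_batch(StubConnection(), messages):
    print(to, status, err)
```

Run send_batch from a background worker (e.g. a Celery task) and persist the returned statuses, and the web request that triggered the mailing can return immediately while the user later sees per-message results.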