general Java design of consuming an async web service - web-services

I need to consume a batch web service where I send a unique ID to identify myself, and the service sends back a unique response ID that I am to use some minutes later to get the info I need.
In general, what is a good way to keep track of the response ID and call the service again at some later time to get the real response?

The easy solution is to stick the ID and timestamp in a map* or list, then have a loop in a separate thread that wakes up and processes all IDs older than a certain age. (Make sure that the map or list is thread-safe.) However, if your app goes down and gets relaunched, it will lose track of pending requests. If you must handle that case, use a database.
*
One specific solution is to use a SortedMap keyed by timestamp. Every key must be unique, so this assumes you won't put more than one element per millisecond into the map; collisions are handled by bumping the timestamp. To put an ID into the map, start with System.currentTimeMillis() as the timestamp, and while that timestamp is already a key in the map, increment it. Then put the (timestamp, ID) pair into the SortedMap. This solution is convenient because the loop thread can read elements of the SortedMap from the beginning until they are too new and then stop, since all the oldest elements are at the beginning of the map.
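For concreteness, here is a minimal Java sketch of that approach; the class and method names are illustrative, and the background loop that drains old entries is assumed to live in a separate thread.

import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the SortedMap-keyed-by-timestamp idea; names are illustrative.
public class PendingResponses {
    private final SortedMap<Long, String> pending = new TreeMap<>();

    // Record a response ID, nudging the timestamp forward until it is unique.
    public synchronized void add(String responseId) {
        long ts = System.currentTimeMillis();
        while (pending.containsKey(ts)) {
            ts++;
        }
        pending.put(ts, responseId);
    }

    // Remove and return every ID that has waited at least maxAgeMillis;
    // the oldest entries sit at the head of the map, so this is a prefix scan.
    public synchronized List<String> takeOlderThan(long maxAgeMillis) {
        long cutoff = System.currentTimeMillis() - maxAgeMillis;
        SortedMap<Long, String> ready = pending.headMap(cutoff);
        List<String> ids = new ArrayList<>(ready.values());
        ready.clear();   // headMap is a live view, so this removes them from 'pending'
        return ids;
    }
}

The loop thread would simply call takeOlderThan(...) on a fixed interval and call the service again for each ID it gets back.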

Related

What's the most efficient data structure for storing a huge number of sessions?

For every client, the server creates a session specific to that client. A session has an expiry time of one day, so this can end up with up to a billion sessions.
Suppose I use a hash map; then lookup will be fast when a client communicates with the server. However, I need to erase the expired sessions, for example once an hour. Because of the huge number of entries, the erase pass may take some time, during which the server cannot handle communication from clients.
So is there any high-performance solution for this? I.e. I don't want to lock the map while erasing the expired sessions.
A single in-memory data structure is probably too simple; if you have a very high number of sessions, you will need a slightly different approach.
Look into storing session data in Redis or another key value store. This would be more normal for servers with high load. Redis and most others offer persistence and don't have locking issues if you need to clear things out in the background.
I don't think a map is really the best collection here. With what you said in mind, I would go for a Set (an unordered one if you don't need ordering). Since you will never have the same Session twice, they will all be distinct, and you don't really need the key-value association a map offers, unless I have misunderstood your problem.
Simple solution: use a hash table. When you are searching a bucket for an entry, delete any expired sessions you come across. This is almost free, since you are searching the chain anyway. It doesn't guarantee that sessions will be deleted immediately on expiry, but it is highly probable that the chain containing an expired session will be searched not long afterwards.
You should presize the hashtable to a fixed number of buckets representing what you expect to be the capacity of the server. That avoids the need to rehash, and that means that each bucket chain can be independently locked. You don't need a lock for every chain, though; you can use the same lock for several -- even many -- chains. Choose a number of locks sufficient that your expected lock contention will be low under peak request pressure; you can compute a good number based on the number of simultaneously active handler threads you have. A chain search will take very little time if the chain is memory-resident, so it will almost always complete before a context-switch. So "simultaneously active" means that they are actually mapped to a CPU and running, not just mapped to a kernel process. So with even a small vector of locks, you should be able to reduce bucket chain contention to a very low level.
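A rough Java sketch of this design is below: presized bucket chains, a small vector of locks shared across many chains, and expired entries unlinked during the chain walk. The TTL constant and the generic value type V are illustrative assumptions.

import java.util.concurrent.locks.ReentrantLock;

// Rough sketch: presized bucket chains, a stripe of locks shared across many
// chains, and expired entries unlinked while a chain is being searched.
// The TTL constant and the generic value type V are illustrative assumptions.
public class LazyExpiringTable<V> {
    private static final long TTL_MILLIS = 24L * 60 * 60 * 1000;   // 1-day sessions

    private static final class Node<V> {
        final String key;
        final V value;
        long lastAccess;
        Node<V> next;
        Node(String key, V value, Node<V> next) {
            this.key = key;
            this.value = value;
            this.lastAccess = System.currentTimeMillis();
            this.next = next;
        }
    }

    private final Node<V>[] buckets;
    private final ReentrantLock[] locks;   // one lock guards many chains

    @SuppressWarnings("unchecked")
    public LazyExpiringTable(int bucketCount, int lockCount) {
        buckets = (Node<V>[]) new Node[bucketCount];
        locks = new ReentrantLock[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    public V get(String key) {
        int b = Math.floorMod(key.hashCode(), buckets.length);
        ReentrantLock lock = locks[b % locks.length];
        lock.lock();
        try {
            long now = System.currentTimeMillis();
            Node<V> prev = null, cur = buckets[b];
            V found = null;
            while (cur != null) {
                if (now - cur.lastAccess > TTL_MILLIS) {
                    // Expired: unlink it for free while walking the chain anyway.
                    if (prev == null) buckets[b] = cur.next; else prev.next = cur.next;
                } else {
                    if (cur.key.equals(key)) {
                        cur.lastAccess = now;
                        found = cur.value;
                    }
                    prev = cur;
                }
                cur = cur.next;
            }
            return found;
        } finally {
            lock.unlock();
        }
    }

    public void put(String key, V value) {
        int b = Math.floorMod(key.hashCode(), buckets.length);
        ReentrantLock lock = locks[b % locks.length];
        lock.lock();
        try {
            // A real version would replace an existing entry for the same key.
            buckets[b] = new Node<>(key, value, buckets[b]);
        } finally {
            lock.unlock();
        }
    }
}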
One way to handle this is to create a hash map to hold the sessions, plus an MRU (most recently used) list implemented as a doubly-linked list. Whenever a user accesses the site, their session is moved back to the front of the MRU list. Also, whenever a session is created, the system checks the last item in the MRU list to see whether the oldest session has expired, so that you can delete it.
Or, you could delete all of the expired sessions at the end of the list.
In addition, you'll want to have your lookup code delete an expired session if it hasn't already been deleted.
So, when you get a request, the sequence of events looks something like this:
session = get session info from user token
if no session
    create session
    add to front of MRU list
else if session has expired
    delete from MRU list
    remove from hash map
else // session has not expired
    move session to front of MRU list
end

// delete expired sessions
p = last item in MRU list
while p has expired
    prev = p->prev
    remove p from MRU list
    delete p from hash map
    p = prev
end
If you're worried that cleaning up expired sessions will lock your hash map for too long, set a limit on the number of expired sessions you'll remove at any one time. If you set it to only clean up two expired sessions when a new session is added, you'll minimize the amount of time your data structure is locked, and expired sessions won't linger too long.
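In Java, a LinkedHashMap in access order gives you the hash map and the MRU list in a single structure; below is a sketch of that combination, with the TTL and the per-call cleanup limit as illustrative assumptions.

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the "hash map + MRU list" idea using LinkedHashMap in access order;
// the TTL, the cleanup limit, and the generic value type are assumptions.
public class MruSessionStore<V> {
    private static final long TTL_MILLIS = 24L * 60 * 60 * 1000;
    private static final int MAX_EVICTIONS_PER_CALL = 2;   // bound time spent cleaning

    private static final class Entry<V> {
        final V session;
        long lastAccess;
        Entry(V session) { this.session = session; this.lastAccess = System.currentTimeMillis(); }
    }

    // accessOrder = true: iteration starts with the least recently used entry.
    private final LinkedHashMap<String, Entry<V>> sessions =
            new LinkedHashMap<>(16, 0.75f, true);

    public synchronized V get(String token) {
        evictSomeExpired();
        Entry<V> e = sessions.get(token);            // also moves it to the MRU end
        if (e == null) return null;
        if (System.currentTimeMillis() - e.lastAccess > TTL_MILLIS) {
            sessions.remove(token);                   // lookup deletes expired sessions
            return null;
        }
        e.lastAccess = System.currentTimeMillis();
        return e.session;
    }

    public synchronized void put(String token, V session) {
        evictSomeExpired();
        sessions.put(token, new Entry<>(session));
    }

    private void evictSomeExpired() {
        long now = System.currentTimeMillis();
        Iterator<Map.Entry<String, Entry<V>>> it = sessions.entrySet().iterator();
        for (int i = 0; i < MAX_EVICTIONS_PER_CALL && it.hasNext(); i++) {
            if (now - it.next().getValue().lastAccess > TTL_MILLIS) {
                it.remove();
            } else {
                break;   // entries come in access order, so the rest are newer
            }
        }
    }
}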

Reprocess batches of items over and over again - and the batch might change any time

I am just looking for ideas on how to solve one specific thing I'd like to build.
Say I have two sets of items, where each item is just a couple of lines of JSON. Any time an item is added to one set, I want to process it (immediately, or almost) against the full other set. So when an item is added to set A, process it against each item in set B, and vice versa.
Items come in through API Gateway + Lambda. Match processing in Lambda from a queue/stream.
What AWS technology would be a good fit? I have no idea, and there is no clear pattern for when or how often the sets change. Also, I want it to be as strongly consistent as possible. And of course, I want it to be as serverless and cost-effective as possible. :)
Options could be:
sets stored in Aurora, match processing for a new item in A would need to query the full set B from the database each time
sets stored in DynamoDB, maybe with DynamoDB stream in the background; match processing for a new item in A would need to query the full set B from Dynamo; but spiky load, not a good fit because of unclear read/write provisioning
have each set in its own "static" Kinesis stream where match processing reads through items but doesn't trim. Streams to be replaced with fresh sets regularly
My pain point is: While processing items from A there might be thousands of items in B to be matched. And I want to avoid having to load the full set B from some database every time I process an item from A. I was thinking about some caching of sets but then would need a good option to invalidate that cache whenever something changes.

Dynamically assign MFC command IDs during runtime

I have a menu-like MFC control which hosts a lot of menu-entries (with command IDs). The number of menu-entries as well as the structure changes dynamically during runtime. That means that I have to create controls and assign new IDs dynamically from time to time.
What I did so far is to reserve a large static range of IDs and assign them sequentially. Even though the range is pretty large I'm afraid I will end up at the point where there are no IDs left. I cannot start over at the beginning either because I do not know which of the previously assigned IDs have been released.
My first thought was to find the largest command ID in the current resource handle and start from there. But I don't know how to accomplish that.
Or is there a better way to manage this? I think I might not be the first person with this kind of problem.
Hmm. It is very unlikely that you will run out of IDs. You can start from WM_USER and increment the ID by 1 each time. But if you really think you can run out of IDs, you can use a stack or list that keeps the already-released IDs and reuse them the next time you need one, as sketched below. When you finish processing the message, add the ID to the stack with push(ID) (you can pass the ID via the LPARAM of the ON_MESSAGE macro in MFC). Then, when you need a new ID, first check whether the ID stack is empty; if not, take the top ID with pop(). Only if the ID stack is empty should you use the next available ID in the range.
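The recycle-released-IDs idea is not MFC-specific; here is a minimal sketch in Java (the bookkeeping is language-agnostic), with the reserved range bounds as placeholders for whatever block of IDs you reserved.

import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of "reuse released command IDs"; the range bounds are placeholders.
public class CommandIdAllocator {
    private final int lastId;
    private int nextFresh;
    private final Deque<Integer> released = new ArrayDeque<>();

    public CommandIdAllocator(int firstId, int lastId) {
        this.lastId = lastId;
        this.nextFresh = firstId;
    }

    // Prefer a recycled ID; only dip into the unused range when the stack is empty.
    public int allocate() {
        if (!released.isEmpty()) {
            return released.pop();
        }
        if (nextFresh > lastId) {
            throw new IllegalStateException("command ID range exhausted");
        }
        return nextFresh++;
    }

    // Call this when the menu entry owning the ID is destroyed.
    public void release(int id) {
        released.push(id);
    }
}

In MFC code the same bookkeeping would translate directly to, say, a std::stack<UINT> member on the control.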

Web Service to return unique auto incremented human readable id number

I'm looking to create a simple web service that, when polled, returns a unique ID. The ID has to be human-readable (i.e. not a GUID, probably in the form 000023) and is simply incremented by 1 each time it's called.
Now I need to consider that it may be called by two different applications at the same time and I don't want it to return the same number to each application.
Is there another option than using a database to store the current number?
Surely this has been done before; can anyone point me to some source code if it has?
Thanks,
Neil
Use a critical section to ensure that only one caller at a time passes through the ID-generating code. You can do this using the lock statement, or, if you want to be slightly more hardcore, by using a mutex directly. Doing this will ensure that you return a different number to each caller.
As for storing it, using a database just to return an auto-incrementing number is overkill. That said, SQL Server and Oracle (and most likely others, but I can't speak for them) both provide auto-incrementing key features, so you could have the web service generate a new entry in a database table, return the key, and the caller can use that number as a key back to that record (if you are saving more data after the initial call). This way you also let the database worry about generating unique numbers, so you don't have to deal with the details yourself - although this is not a good option if you don't already have a database.
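If you do go the database route, JDBC can hand the generated key straight back to you; here is a sketch assuming a table named request_ids with an identity/auto-increment primary key and a created_at column (both assumptions for illustration).

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Let the database hand out the number; the table "request_ids" and its
// identity column are assumptions for illustration.
public final class DbIdService {
    public static long nextId(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO request_ids (created_at) VALUES (CURRENT_TIMESTAMP)",
                Statement.RETURN_GENERATED_KEYS)) {
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                keys.next();
                return keys.getLong(1);
            }
        }
    }

    // The question asked for a human-readable, zero-padded form such as 000023.
    public static String format(long id) {
        return String.format("%06d", id);
    }
}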
The other option is to store it in a local file, although reading the file, incrementing the number, and writing it back out, all within a critical section, would be relatively expensive.
You can use a file.
Pseudocode:
if (locked('counter.txt'))
    wait
    startAgain
lock('counter.txt')
counter = read('counter.txt')   // read only after the lock is held
counter++
write('counter.txt', counter)
unlock('counter.txt')
print counter
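A Java version of the same idea is sketched below; the OS-level file lock serialises writers across processes, and the synchronized keyword covers threads within the same JVM (the file name is just an example).

import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// File-backed counter: the FileLock serialises concurrent processes, and
// 'synchronized' prevents overlapping lock attempts within this JVM.
public final class FileCounter {
    public static synchronized String next(String path) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw");
             FileLock lock = file.getChannel().lock()) {       // blocks until acquired
            String line = file.readLine();
            long counter = (line == null || line.isEmpty()) ? 0 : Long.parseLong(line.trim());
            counter++;
            file.seek(0);
            file.setLength(0);
            file.writeBytes(Long.toString(counter));
            return String.format("%06d", counter);             // human-readable form, e.g. 000023
        }
    }
}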

Auto-increment on Azure Table Storage

I am currently developing an application for Azure Table Storage. In that application I have a table which will have relatively few inserts (a couple of thousand per day), and the primary key of these entities will be used in another table, which will have billions of rows.
Therefore I am looking for a way to use an auto-incremented integer, instead of a GUID, as the primary key in the small table (since it will save lots of storage, and scalability of the inserts is not really an issue).
There've been some discussions on the topic, e.g. on http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/6b7d1ece-301b-44f1-85ab-eeb274349797.
However, since concurrency problems can be really hard to spot and debug, I am a bit uncomfortable implementing this on my own. My question is therefore: is there a well-tested implementation of this?
For everyone who finds this in search: there is a better solution. The minimal time for a table lock is 15 seconds - that's awful. Do not use it if you want to create a truly scalable solution. Use the ETag!
Create one entity in the table to hold the ID (you can even name it ID or whatever).
1) Read it.
2) Increment.
3) InsertOrUpdate WITH ETag specified (from the read query).
If the last operation (InsertOrUpdate) succeeds, you have a new, unique, auto-incremented ID. If it fails (an exception with HttpStatusCode == 412), it means some other client changed it, so repeat steps 1-3.
The usual time for Read + InsertOrUpdate is less than 200 ms. My test utility, with source, is on GitHub.
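In code the loop looks roughly like this; the two interfaces are hypothetical stand-ins for whatever table SDK calls you actually use (a read that returns the value plus its ETag, and a conditional replace that fails with 412 when the ETag no longer matches).

// Sketch of the ETag-based optimistic loop. The two interfaces below are
// hypothetical wrappers around the real table SDK calls.
public final class EtagCounter {
    public interface CounterEntity {
        long value();
        String etag();
    }

    public interface CounterTable {
        CounterEntity read();                               // read value + current ETag
        boolean tryReplace(long newValue, String etag);     // false on 412 Precondition Failed
    }

    public static long nextId(CounterTable table) {
        while (true) {
            CounterEntity current = table.read();                // 1) read
            long candidate = current.value() + 1;                // 2) increment
            if (table.tryReplace(candidate, current.etag())) {   // 3) conditional write
                return candidate;                                // our write won; ID is unique
            }
            // Another client updated the entity first; read and try again.
        }
    }
}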
See UniqueIdGenerator class by Josh Twist.
I haven't implemented this yet but am working on it ...
You could seed a queue with your next ids to use, then just pick them off the queue when you need them.
You need to keep a table containing the biggest number that has been added to the queue so far. If you know you won't be burning through a ton of integers, you could have a worker wake up every so often and make sure the queue still has integers in it. You could also keep a "used ints" queue the worker could check to keep an eye on usage.
You could also hook that worker up so that if the queue happened to be empty when your code needed an ID, it could interrupt the worker's nap to create more keys ASAP.
If that call failed, you would need a way to tell the worker you are going to do the work for them (take the lock), do the worker's job of getting the next ID, and then unlock:
lock
get the last key created from the table
increment and save
unlock
then use the new value.
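An in-memory analogue of that scheme, just to show its shape; in practice the queue and the high-water mark would live in durable storage (an Azure queue and table), and the names here are illustrative.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// In-memory analogue of "seed a queue with IDs, fall back to the counter if
// the queue runs dry"; durable storage would replace both fields in practice.
public class SeededIdPool {
    private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    private final AtomicLong highWaterMark = new AtomicLong(0);

    // The worker calls this periodically to keep the queue topped up.
    public void replenish(int batchSize) {
        for (int i = 0; i < batchSize; i++) {
            queue.offer(highWaterMark.incrementAndGet());
        }
    }

    // Request handlers call this; if the queue is empty, do the worker's job inline.
    public long nextId() {
        Long id = queue.poll();
        return (id != null) ? id : highWaterMark.incrementAndGet();
    }
}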
The solution I found that prevents duplicate IDs and lets you auto-increment is to
lock (lease) a blob and let that lease act as a logical gate.
Then read the value.
Write the incremented value.
Release the lease.
Use the value in your app/table.
If your worker role were to crash during that process, you would only end up with a missing ID in your store; IMHO that is better than duplicates.
Here is a code sample and more information on this approach from Steve Marx
If you really need to avoid GUIDs, have you considered using something based on date/time and then leveraging partition keys to minimize the concurrency risk?
Your partition key could be by user, year, month, day, hour, etc., and the row key could be the rest of the datetime at a small enough resolution to control concurrency.
Of course you have to ask yourself, given the price of data in Azure, whether avoiding a GUID is really worth all of this extra effort (assuming a GUID will just work).
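A small sketch of what such keys might look like; the formats and the per-user partition are assumptions, and a tie-breaking suffix may still be needed if two requests land in the same millisecond.

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Date/time-based key pair; the patterns and per-user partition are illustrative.
public final class TimeBasedKeys {
    private static final DateTimeFormatter PARTITION =
            DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter ROW =
            DateTimeFormatter.ofPattern("mmssSSS").withZone(ZoneOffset.UTC);

    public static String partitionKey(String userId, Instant now) {
        return userId + "-" + PARTITION.format(now);   // e.g. "user42-2024061513"
    }

    public static String rowKey(Instant now) {
        return ROW.format(now);                        // remainder of the timestamp
    }
}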