Storing and iterating lists in Windows Azure Cache

I am making an Azure web role service, in which I have a long list (thousands) of objects that I filter on different criteria. I need to cache the list, but I have a concern:
Suppose I have a number of role instances and the list is cached on one machine, while another machine wants to iterate the list. Will the list be copied into the memory of the requesting machine and then iterated there?

Windows Azure Caching stores items in serialized form - when you store an item in the cache it is serialized (using the .NET XmlSerializer by default, but you can change this), and when it is retrieved from the cache it is deserialized into a new object.
So yes - when you retrieve a list from the cache (even on the same role instance!) you will have a new list in memory to iterate over.
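A quick way to see the consequence (a language-agnostic sketch, using a JSON round trip to stand in for whatever serializer the cache is configured with):

```typescript
const original = [{ id: 1, name: "a" }, { id: 2, name: "b" }];

// Stand-in for "store to cache, then read back": serialize and deserialize.
const retrieved = JSON.parse(JSON.stringify(original)) as typeof original;

console.log(retrieved === original);        // false - a new list in memory
console.log(retrieved[0] === original[0]);  // false - the elements are copies too
```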

Related

Why do deleted items still appear in AWS DataStore

I have a backend in AWS Amplify where my data is stored. After querying a model and then deleting an item of that model directly from Content in Amplify Studio, I still get the same number of items. After checking their contents I found that the difference is that for existing items the property _deleted holds the value null, while the items that have actually been deleted hold the value undefined.
Why is that? And is there a way to delete items and make them completely disappear from the DataStore?
DataStore.query always operates on local data, not server data - it relies on the automatically managed subscriptions to keep the local store consistent.
If you want a deleted item to be reflected in your UI immediately, I've been using DataStore.observeQuery(), shown in the documentation linked below for real-time data subscriptions:
https://docs.amplify.aws/lib/datastore/real-time/q/platform/js/
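A minimal sketch of that pattern, where Post is just a placeholder for whatever generated model you are querying:

```typescript
import { DataStore } from "aws-amplify";
import { Post } from "./models"; // placeholder: your generated DataStore model

// Emits a new snapshot whenever items are created, updated, or deleted,
// so deletions disappear from `items` without re-running DataStore.query.
const subscription = DataStore.observeQuery(Post).subscribe(({ items, isSynced }) => {
  console.log(`items: ${items.length}, synced with backend: ${isSynced}`);
});

// Remember to clean up, e.g. when the component unmounts:
// subscription.unsubscribe();
```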

Store image with tag and prefix to query fast (S3, AWS)

I use Ionic to create a mobile app which can take a photo and upload the image from the phone to S3. I wonder how to make a prefix or tag for the uploaded image that helps me query it quickly and keep it unique. I am thinking about making a prefix and creating folders:
year/month/day/filename (e.g. 2018/11/27/image.png)
If there are a lot of images in the 2018/11/27/ folder, I think querying will be slow and sometimes the image filename will not be unique. Any suggestions? Thanks a lot.
Amazon S3 is an excellent storage service, but it is not a database.
You can store objects in Amazon S3 with whatever name you wish, but if you wish to list/sort/find objects quickly you should store the name of the object, together with its metadata, in a database. Then you can query the database to find the object of interest.
DynamoDB would be a good choice because it can be configured for guaranteed speed. You could also put DAX in front of DynamoDB for even greater performance.
With information about the objects stored in a database, you can quite frankly name each individual object anything you wish. Many people just use a UUID since it just needs to be a unique identifier. The object name itself does not need to convey any meaning - it is simply a Key to identify the object when it needs to be accessed later.
If, however, objects are typically processed in groups (such as having daily files grouped together into months for processing with Hadoop clusters), then locating objects in a particular path is useful. It allows the objects to be processed together without having to consult the database.
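As a rough sketch of that pattern using the AWS SDK for JavaScript v3 - the bucket name, table name, and attribute names below are placeholders, and the table is assumed to be keyed on an imageId attribute:

```typescript
import { randomUUID } from "crypto";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const s3 = new S3Client({});
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export async function uploadImage(body: Buffer, takenAt: Date): Promise<string> {
  // The object key is just a unique identifier; it carries no meaning itself.
  const imageId = randomUUID();

  // Store the image bytes in S3 under the UUID key (bucket name is a placeholder).
  await s3.send(new PutObjectCommand({
    Bucket: "my-image-bucket",
    Key: `${imageId}.png`,
    Body: body,
    ContentType: "image/png",
  }));

  // Store the searchable metadata in DynamoDB (table/attribute names are placeholders).
  await ddb.send(new PutCommand({
    TableName: "Images",
    Item: {
      imageId,
      s3Key: `${imageId}.png`,
      takenAt: takenAt.toISOString(),
    },
  }));

  return imageId;
}
```

Queries like "all images taken on 2018/11/27" are then answered by the database (for example via an index on the date attribute), not by listing S3 keys.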

Is Redis atomic when multiple clients attempt to read/write an item at the same time?

Let's say that I have several AWS Lambda functions that make up my API. One of the functions reads a specific value from a specific key on a single Redis node. The business logic goes as follows:
If the key exists:
    Serve the value of that key to the client
If the key does not exist:
    Get the most recent item from DynamoDB
    Insert that item as the value for that key, and set an expiration time
    Delete that item from DynamoDB, so that it only gets read into memory once
    Serve the value of that key to the client
The idea is that every time a client makes a request, they get the value they need. If the key has expired, then lambda needs to first get the item from the database and put it back into Redis.
But what happens if two clients make an API call to Lambda simultaneously? Will both Lambda invocations see that there is no key, and will both take an item from the database?
My goal is to implement a queue where a certain item lives in memory for only X amount of time, and as soon as that item expires, the next item should be pulled from the database, and when it is pulled, it should also be deleted so that it won't be pulled again.
I'm trying to see if there's a way to do this without having a separate EC2 process that's just keeping track of timing.
Is Redis + Lambda + DynamoDB a good setup for what I'm trying to accomplish, or are there better ways?
A Redis server will execute commands (or transactions, or scripts) atomically. But a sequence of operations involving separate services (e.g. Redis and DynamoDB) will not be atomic.
One approach is to make them atomic by adding some kind of lock around your business logic. This can be done with Redis, for example.
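For reference, a minimal single-node lock in Redis is just a SET with the NX and PX options; this sketch (node-redis v4, placeholder resource name) skips the safe-release step where you check the token before deleting the lock:

```typescript
import { randomUUID } from "crypto";
import { createClient } from "redis";

// Try to take a lock: SET succeeds only if the key does not exist (NX),
// and PX makes it expire automatically so a crashed holder cannot block forever.
async function tryLock(resource: string, ttlMs: number): Promise<string | null> {
  const redis = createClient({ url: process.env.REDIS_URL });
  await redis.connect();
  const token = randomUUID();
  const acquired = await redis.set(`lock:${resource}`, token, { NX: true, PX: ttlMs });
  await redis.quit();
  return acquired ? token : null; // null means another client holds the lock
}
```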
However, that's a costly and rather cumbersome solution, so if possible it's better to simply design your business logic to be resilient in the face of concurrent operations. To do that you have to look at the steps and imagine what can happen if multiple clients are running at the same time.
In your case, the flaw I can see is that two values can be read and deleted from DynamoDB, one overwriting the other in Redis. That can be avoided by using Redis's SETNX (SET if Not eXists) command. Something like this (a code sketch follows the steps):
1. GET the key from Redis.
2. If the value exists: serve the value to the client.
3. If the value does not exist:
   a. Get the most recent item from DynamoDB.
   b. Insert that item into Redis with SETNX.
   c. If the key already exists, go back to step 1.
   d. Set an expiration time with EXPIRE.
   e. Delete that item from DynamoDB.
   f. Serve the value to the client.
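Here is a rough sketch of that flow for a Node.js Lambda, assuming node-redis v4 and the AWS SDK for JavaScript v3. The cache key, table name, key attributes, TTL, and the way the "most recent" item is queried are all assumptions to adapt to your schema:

```typescript
import { createClient } from "redis";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand, DeleteCommand } from "@aws-sdk/lib-dynamodb";

const redis = createClient({ url: process.env.REDIS_URL });
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

const CACHE_KEY = "current-item"; // placeholder key name
const TTL_SECONDS = 60;           // placeholder expiration

export async function getCurrentItem(): Promise<string> {
  if (!redis.isOpen) await redis.connect();

  // The loop implements "go back to step 1" when we lose the SET NX race.
  for (;;) {
    // Step 1: GET the key from Redis.
    const cached = await redis.get(CACHE_KEY);
    if (cached !== null) return cached; // Step 2: serve the cached value.

    // Step 3a: get the "most recent" item from DynamoDB.
    // Table name, key attributes, and sort order are assumptions about your schema.
    const result = await ddb.send(new QueryCommand({
      TableName: "QueueItems",
      KeyConditionExpression: "pk = :pk",
      ExpressionAttributeValues: { ":pk": "queue" },
      ScanIndexForward: false, // newest first
      Limit: 1,
    }));
    const item = result.Items?.[0];
    if (!item) throw new Error("No items left in DynamoDB");

    // Steps 3b-3d: SET only if the key does not exist (NX), with an expiry (EX).
    // If another Lambda won the race, set() returns null and we loop back to step 1.
    const wasSet = await redis.set(CACHE_KEY, JSON.stringify(item), {
      NX: true,
      EX: TTL_SECONDS,
    });
    if (wasSet === null) continue;

    // Step 3e: delete the item from DynamoDB so it is only served once.
    await ddb.send(new DeleteCommand({
      TableName: "QueueItems",
      Key: { pk: "queue", sk: item.sk },
    }));

    // Step 3f: serve the value.
    return JSON.stringify(item);
  }
}
```

Note that the sketch folds the SETNX and EXPIRE steps into a single SET ... NX EX call, which keeps setting the value and its expiry atomic.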

DynamoDB local db limits - use for initial beta-go-live

Given DynamoDB's pricing, the thought came to mind to use DynamoDB Local on an EC2 instance for the go-live of our startup SaaS solution. I've been trying to find a data sheet for the local database specifying limits such as the number of tables or records, or the general size of the database file. Possibly we could even run a few local instances on dedicated EC2 servers, since we know at login which user needs to be connected to which database.
Does anybody have any information on the local database limits or on this approach? Also, does anybody know of any legal/licensing issues with using DynamoDB Local in that way?
Every item in DynamoDB Local will end up as a row in the SQLite database file. So the limits are based on SQLite's limitations.
The maximum number of rows in a table is 2^64, but the database file size limit (140 terabytes) will likely be reached first.
Note: because of the above, the number of items you can store in DynamoDB Local will be smaller with the preview version that has Streams support, because to support Streams the update records for items are also stored. For example, if you are only doing inserts, each item is effectively stored twice: once in a table containing the item data and once in a table containing the INSERT UpdateRecord for that item (more records are generated if the item is updated over time).
Be aware that DynamoDB Local was not designed for the same performance, availability, and durability as the production service.
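For completeness, pointing the AWS SDK at a DynamoDB Local instance is just a matter of overriding the endpoint (port 8000 is DynamoDB Local's default; the region and credentials are arbitrary but must be set):

```typescript
import { DynamoDBClient, ListTablesCommand } from "@aws-sdk/client-dynamodb";

// DynamoDB Local accepts any credentials and region, but the SDK requires them.
const local = new DynamoDBClient({
  endpoint: "http://localhost:8000",
  region: "local",
  credentials: { accessKeyId: "dummy", secretAccessKey: "dummy" },
});

local.send(new ListTablesCommand({})).then((out) => console.log(out.TableNames));
```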

How to handle different structures of the same Entity in the Hibernate L2 cache with Coherence for caching

I am using Hibernate L2 cache with Coherence for caching in two different web services.
Scenario
The first web service has an entity class Employee with 5 fields.
The second web service has the same entity class Employee but with only 3 fields.
Both point to the same table/schema, and the package hierarchy is also the same.
Now when a fresh request for employeeId=1 comes to the second web service, it fetches the value from the database and caches the 3 columns, keeping the other 2 as null.
When a request for employeeId=1 then comes to the first web service, it fetches directly from the cache, getting the 3 cached columns and returning the other 2 as null, even though in the database those 2 columns have non-null values.
Is there a way I can force it to get these columns from the database?
Approaches already tried
If I keep the columns in both web services the same, the problem goes away, but this is not an acceptable solution in my scenario.
I tried adding a different serialVersion, but it doesn't work.
Keeping the fully qualified names different works, but this forces us to take on the overhead of performing manual eviction.
You should be able to use the Evolvable interface for this, which will allow you to insert an object into the grid that is both forward and backward compatible. You just need to ensure that the second web service sets a lower version than the first.