I have been attempting to implement CloudKit specifically for its sharing capability. I need to share one set of data with a selected list of other users of the app. CloudKit may be more firepower than I need if iCloud's key-value storage can be similarly shared, but I have not been able to find any information about that. What I need to share can easily be represented as one key-value pair. The 1 MB limit may end up killing this idea, but not necessarily. Any information on this would be greatly appreciated.
iCloud KVS (Key Value Store) does not support sharing. Sharing is specifically a CloudKit feature.
Related
Some functions I am writing would need to store and share a set of cryptographic keys (<1kb) somewhere so that:
it is shared across functions and across instances of the same function
it is maintained after function deploys
The keys are modified (and written) every 4 hours or so, based on whether a key has expired or a new key needs to be created.
Right now, I am storing the keys as encrypted binary in a cloud bucket with access limited to that function. It works, except that it is fairly slow (~500 ms for the read/write required when updating the keys).
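Roughly, the storage layer does nothing more than this (a minimal sketch using the google-cloud-storage client; the bucket and object names are illustrative):

```python
# Sketch of the bucket-based storage described above. Assumes
# google-cloud-storage; "my-function-keys" and "keys.bin" are
# illustrative names, not the real ones.
from google.cloud import storage

bucket = storage.Client().bucket("my-function-keys")
blob = bucket.blob("keys.bin")

def read_keys() -> bytes:
    # One small-object GET; this round trip is part of the ~500 ms.
    return blob.download_as_bytes()

def write_keys(encrypted: bytes) -> None:
    # One small-object PUT; the blob holds the encrypted key set.
    blob.upload_from_string(encrypted, content_type="application/octet-stream")
```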
I have considered some other solutions:
Redis: fast, but overkill given the price ($40/month) it would cost to store a single value
Cloud SQL: the functions are already connected to a Cloud SQL instance, so it would not incur extra costs
Dropping everything and using a KMS. Unfortunately it would not meet the requirements I have.
The library I use in my functions is available here.
Is there a better way to store a single small blob of data for cloud functions (and possibly other tools like GKE)?
Edit
The solution I ended up with was a single table in a database that the app was already connected to. It is also about 5 times faster than using a bucket (<100 ms).
The moral of the story is to use whatever is already provisioned to store the keys. If storing a key is a problem, then the KMS + Cloud Functions rotation combo described below seems like a good option.
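For illustration, the table-based read/write boils down to something like this (a rough sketch assuming a Postgres Cloud SQL instance and psycopg2; the table and column names are made up):

```python
# Sketch of the single-table approach, assuming Postgres and
# psycopg2; "function_keys" and its columns are illustrative names.
import psycopg2

conn = psycopg2.connect("dbname=app user=fn")

def save_keys(encrypted: bytes) -> None:
    # Upsert the single row holding the encrypted key set.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO function_keys (id, blob) VALUES (1, %s) "
            "ON CONFLICT (id) DO UPDATE SET blob = EXCLUDED.blob",
            (psycopg2.Binary(encrypted),),
        )

def load_keys() -> bytes:
    with conn, conn.cursor() as cur:
        cur.execute("SELECT blob FROM function_keys WHERE id = 1")
        return bytes(cur.fetchone()[0])
```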
All the code + more details are available here.
A better approach would be to manage your keys with Cloud KMS. However, as you mentioned, Cloud KMS does not automatically delete old key version material; you would need to delete old versions manually, which I suspect is something you don't want to do.
Another possibility is to just keep the keys in Firestore. Since you don't have to provision any specific infrastructure for this, as you would with Redis on Memorystore or Postgres on Cloud SQL, it will be easier to manage and to scale in the long run.
The general idea would be to have a Cloud Function triggered by Cloud Scheduler every 4 hours; that function would rotate the keys in your Cloud Firestore.
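A minimal sketch of such a function, assuming the google-cloud-firestore client and a Pub/Sub trigger wired to Cloud Scheduler (the collection, document, and field names are illustrative, and generate_key() stands in for your own key-generation logic):

```python
# Sketch of a Cloud Scheduler-triggered rotation function.
# Assumes google-cloud-firestore; "config/signing-keys" and
# generate_key() are illustrative, not a real API.
import datetime
import secrets

from google.cloud import firestore

db = firestore.Client()
MAX_AGE = datetime.timedelta(hours=4)

def generate_key() -> str:
    # Placeholder for your actual key-generation logic.
    return secrets.token_hex(32)

def rotate_keys(event, context):
    # Entry point for the Pub/Sub-triggered Cloud Function.
    doc_ref = db.collection("config").document("signing-keys")
    data = doc_ref.get().to_dict() or {}
    now = datetime.datetime.now(datetime.timezone.utc)
    created = data.get("created_at")
    if created is None or now - created > MAX_AGE:
        doc_ref.set({"key": generate_key(), "created_at": now})
```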
How does this sound to you?
I am an entry-level developer at a startup. I am trying to deploy a text classifier on GCP. For storing inputs (training data) and outputs, I am struggling to find the right storage option.
My data isn't huge in terms of columns but is fairly huge in terms of instances. It could even be just key-value pairs. My use case is to retrieve each entity from one particular column in the DB, run some classification on it, store the result in the corresponding column, and update the DB. Our platform requires a DB that can handle a lot of small queries at once without much delay. Also, the data is completely non-relational.
I looked into GCP's article on choosing a storage option but couldn't narrow my options down to any specific answer. Would love some advice on this.
You should take a look at Google's "Choosing a Storage Option" guide: https://cloud.google.com/storage-options/
Your data is structured, your main goal is not analytics, your data isn't relational, and you don't primarily need mobile SDKs, so you should probably use Cloud Datastore. That's a great choice for durable key-value data.
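As a quick illustration, the per-entity read-classify-write cycle you describe maps naturally onto Datastore's API (a sketch using the google-cloud-datastore client; the kind and property names are made up):

```python
# Sketch of the read -> classify -> write-back cycle on Datastore.
# Assumes google-cloud-datastore; "Document", "text", and "label"
# are illustrative names.
from google.cloud import datastore

client = datastore.Client()

def classify_and_store(doc_id: int, classify) -> None:
    key = client.key("Document", doc_id)
    entity = client.get(key)                    # fetch one entity by key
    if entity is None:
        return
    entity["label"] = classify(entity["text"])  # run the classifier
    client.put(entity)                          # write the result back
```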
In brief, these are the storage options available (the list may grow or shrink in the future).
Depending on your requirements, you can select the storage option that is best suited.
SOURCE: Linux Academy
Is this the same as PHP $_SESSION?
Does it use PHP $_SESSION?
When should I use it?
What are some downsides of using it?
Can or should I use it to store user input and results of forms for later operations?
And is it secure?
The private temporary storage differs from session storage in very significant ways, and it is not intended as a replacement for it.
For logged-in users, sessions are completely irrelevant; the data stored in the temporary storage is shared among all sessions of a given user, current and future.
Only for an anonymous user are sessions relevant: should their session expire, the contents are not retrievable any more because they are tied to the session ID. But the data is not stored in the session storage for anonymous users either; only the session ID is relevant.
The data expires after a time set by the container parameter tempstore.expire, which has nothing to do with the session cookie lifetime (nor is the latter relevant for logged-in users).
There is metadata associated with each piece of data: the owner (either the logged-in user ID or a session ID) and the updated time.
The durability expectations differ completely. Sessions are fundamentally ephemeral: many places tie sessions to IP addresses, and a session is certainly tied to a browser and, as such, to a device. As a corollary, if clients can't expect sessions to last, there's no reason for the server to cling to them heavily: putting the session storage on fast but less durable storage (say, memcached) is a completely valid speedup strategy. The private temporary storage, however, is durable -- within tempstore.expire, of course. A typical thing to store in a session is a "flash" message -- the one you set with drupal_set_message. If you set one and then the session gets lost, oh well; informing the user would have been nice, but oh well. I certainly wouldn't expect a flash message to follow me across browsers and devices.
In theory, a typical thing to store in the private temp storage would be a shopping cart. In practice, this is not done because a) carts are valuable, not temporary, data -- if not for the end user then for the back office; b) when a user logs in, their session data is migrated but their private temp storage is not. Whether this is a bug is debatable; at the time of this writeup I can't find a core issue about it. This is a possible downside. So a complex edit workflow like the Views UI is one possible use case -- but note the Views UI itself uses the shared temporary storage facility, not the private one. In fact, the only usage I can find is node previews.
Here is a very good article about Storing Session Data with Drupal 8.
It covers all your questions and more!
Take a look at it; the author also gives you a lot of other links to help you.
Here is a short summary:
1. Is this the same as PHP $_SESSION?
Roughly equivalent. But (and it's an important but) using Drupal 8 services provides needed abstraction and structure for interacting with a global construct. It's part of an overall architecture that allows developers to build and extend complex applications sustainably.
2. When should I use it?
In past versions of Drupal, I might have just thrown the data in $_SESSION. In Drupal 8 there's a service for that; actually, two services: use user.private_tempstore and user.shared_tempstore for temporarily storing user-specific and non-user-specific data, respectively.
3. What are some down sides of using it?
You need to know OOP (object-oriented programming).
4. Can or should I use it to store user input in forms for later operations?
Should.
So I'm currently developing a messaging application to learn the process, and I'm using Redis as a cache together with websockets to push real-time messages.
And then this question popped into my mind:
Is it possible to use only Redis to run a whole service (like a messaging application, for example)?
NOTE: This implies removing any form of database (we're only keeping strings).
I know you can set up Redis to be persistent, but is that enough? Is it robust enough? Would it be a safe move, or totally insane?
What are your thoughts? I'd really like to know, and if you think it is possible, I'll give it a shot.
Thanks!
A few companies use Redis as their unique or primary database, so it is definitely not insane.
You can develop and run a full service using Redis as a backend, as long as you understand and accept the tradeoffs it implies.
By this I mean:
that you can use a Redis server as a high-performance database as long as your whole dataset can reside in memory. This may mean reducing the size of your data, or choosing not to store some of it, computing it in your app on read access or importing it from another source instead;
that if you can't store all of your data in the memory of a single server, you can use a Redis cluster, but it will limit the available Redis features (see the implemented subset);
that you have to think about potential data loss when a server crashes, and decide whether it is acceptable. It may be OK to lose some data if the process that produced it is robust and will create it again when the database restarts (for example, when the data stored in Redis comes from an import process, which will start again from the last imported item). You can also use several Redis instances with different persistence configurations: one that writes to disk each time a key is modified, avoiding potential data loss but with much lower performance, and another for non-critical data, which is written to disk every couple of seconds.
Redis may be used to store structured data, not only strings, using hashes. Each time you would create an index in a relational model, you create a data structure in Redis. For example, if you want to store Person objects, you create a HASH for each of them to store their properties, including a unique ID. If you want to be able to get people by city, you create a SET for each city, and you insert the ID of each newly created Person into the corresponding SET. That lets you retrieve the list of persons in a given city. It's just an example; you have to define the model and data structures to use according to your application.
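A minimal sketch of that Person/city example with redis-py (the key naming scheme is just a convention made up for illustration):

```python
# Sketch of the HASH-per-object + SET-per-index pattern described
# above, using redis-py; the key names are illustrative.
import redis

r = redis.Redis(decode_responses=True)

def create_person(person_id: int, name: str, city: str) -> None:
    # One HASH per Person stores its properties.
    r.hset(f"person:{person_id}", mapping={"name": name, "city": city})
    # One SET per city plays the role of a relational index.
    r.sadd(f"city:{city}:persons", person_id)

def persons_in_city(city: str) -> list:
    ids = r.smembers(f"city:{city}:persons")
    return [r.hgetall(f"person:{i}") for i in ids]

create_person(1, "Alice", "Paris")
create_person(2, "Bob", "Paris")
print(persons_in_city("Paris"))
```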
I am figuring out my options for storing hierarchical data (parent - child relationships).
Since a tree is a graph, and a forest (of trees) is also technically a graph, a graph database seems to fit the bill much better than an RDBMS, especially since I am concerned with optimizing both read and write operations.
Optimizing writes implies changes in hierarchy require minimal writes.
Optimizing reads implies materializing the full path to a particular node consumes minimal read operations.
My use case is:
A tree per user. Should I store and use one graph across the user space or one graph per user?
Path queries starting at any node and going back to the root of the tree for a user.
Child nodes store links to parent nodes
Since all of my resources are in AWS, being able to use the Titan DynamoDB backend seems ideal.
My real problem is in understanding how to scale and manage Titan though.
Do I need a Gremlin Server instance? In other words, do I need to stand up an EC2 instance with Gremlin Server in order to do anything with Titan? Or can I use the Titan Java API to work with graph data directly?
Do I need to explicitly shard the data? In other words, do I need to stand up more Gremlin Servers as usage, the amount of data, and the number of operations increase? When the number of servers scales out, do I need to consistent-hash across those servers from the client in order to perform operations?
Do I need to set up an Elasticsearch cluster to be able to start traversals from any node? Or is using vertices to represent objects and edges to represent parent relationships sufficient at this point? I can guarantee that vertex IDs are unique across the user space; I can also decorate each vertex with the unique user ID. In that case, do I need Elasticsearch? My hope is that Elasticsearch is for free-form or more complex search-type queries, not for exact queries!
As the number of front-ends increases, can each front-end open the graph (for a single graph across the user space)? If there is a graph per user, then since front-ends have no affinity, the same graph may be opened on each front-end; is that OK?
I wasn't able to find much documentation on any of this. Thank you!
I will try to answer your questions below:
Both solutions are possible; it depends highly on your application whether to choose Gremlin Server or a customized data access layer with customized queries through other secondary data stores. Although I would prefer a customized data access layer, it is possible to serve all Gremlin query requirements through Gremlin Server.
Gremlin Server is just an interface between your application and the data stores, and due to its caching mechanism it is memory-intensive. Data can be stored on different machines, for example a cluster of DynamoDB machines. It depends on the number of concurrent users, but I think vertical scaling is more than enough for most applications. If you are going to use Titan in a highly concurrent environment, beyond the resources of a single machine, you will probably have to run several Gremlin Servers on different machines and handle the load balancing yourself. The catch is that, for cache efficiency, you have to route requests so that similar queries hit the same Gremlin Server.
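For what it's worth, the Gremlin Server route looks roughly like this from client code, using gremlinpython (a sketch assuming a TinkerPop-compatible endpoint -- note that Titan 1.0 ships an older TinkerPop version than current gremlinpython targets -- with made-up label and property names):

```python
# Sketch of a client-side path query against a Gremlin Server.
# Assumes gremlinpython and a TinkerPop-compatible endpoint;
# "node", "nodeId", and the "parent" edge label are illustrative.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Walk "parent" edges from a child vertex up to the root of its tree.
path_to_root = (
    g.V().has("node", "nodeId", "child-42")
    .repeat(__.out("parent"))
    .until(__.not_(__.outE("parent")))
    .path()
    .next()
)
print(path_to_root)
conn.close()
```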
Yes: an indexing backend is only useful for queries more complicated than simple retrieval. A secondary index backend like Solr, Elasticsearch, or Lucene is useful if you want conditional search or text search by similarity, because an indexer like Lucene provides a reverse index structure that helps with similarity searches. If you are going to search for all parents/children having "foo" in their names, you have to use an indexing backend. If you are going to search for all parents/children with age less than 40, you have to use an indexing backend too.
More information about indexing backends can be found at these links:
http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
http://s3.thinkaurelius.com/docs/titan/1.0.0/index-parameters.html
It is highly recommended to limit the number of open graphs to one for the entire application. Titan uses caching mechanisms that encourage a single graph instance across the entire application for the sake of performance. Since uncommitted data is only visible within a single graph instance and transaction, it is suggested to use a single graph instance and a single transaction if you want a real-time application. However, using more than one graph instance in the application for read-only transactions is not wrong, just not efficient.
You can find lots of information about Titan graph database in the following links:
Main Titan documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/
An old but really useful document about how Titan works: https://github.com/elffersj/delftswa-aurelius-titan/tree/master/SA-doc