Where is iCloud KV data stored on the filesystem?

I need to do some debugging, and need to read and modify values stored in iCloud Key-Value storage for my app (using NSUbiquitousKeyValueStore). Does anybody know where this is stored on disk, and what format it is in (some sort of database, JSON, etc.)? I know containers are stored in the equivalently named folder in Library, and documents are stored in Mobile Documents, but what about iCloud KV data?

Related

Storing raw text data vs analytics

I’ve been working on a hobby project: a Django/React site that gives analytics and data visualization for texts. I’ll most likely host it on AWS. The user uploads a CSV of texts. The current logic is that they get stored in the db, and when the user calls the API it runs the analytics on them and sends the analytics back. I’m trying to decide whether to store the raw text data (what I have now) or run the analytics on the texts once when they're uploaded and then discard them, only storing the analytics.
My thoughts are:
Raw data:
pros:
changes to analytics won’t require re-uploading
probably simpler db schema (see the model sketch after this list)
cons:
more sensitive data (not sure how safe it is in a Django db on AWS, or what measures I could put in place to protect it better)
more data to store (not sure what it would cost to store a lot of rows of texts)
Analytics:
pros:
less sensitive, less space
cons:
if something goes wrong with the analytics on the first run (something that doesn’t throw an error), the results could be inaccurate and will stay that way, since the raw texts are gone
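To make the schema tradeoff concrete, here is a minimal sketch of what the two options might look like as Django models; the model and field names are hypothetical:

```python
# models.py (sketch) -- hypothetical models contrasting the two options.
from django.db import models


class RawText(models.Model):
    """Option 1: keep the uploaded texts and compute analytics on demand."""
    uploaded_at = models.DateTimeField(auto_now_add=True)
    body = models.TextField()  # the sensitive raw text itself


class TextAnalytics(models.Model):
    """Option 2: store only derived metrics and discard the raw text."""
    uploaded_at = models.DateTimeField(auto_now_add=True)
    word_count = models.PositiveIntegerField()
    sentiment = models.FloatField()            # example metric
    metrics = models.JSONField(default=dict)   # other derived values (Django 3.1+ JSONField)
```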

How much can request.session store?

I'm new to learning about Django sessions (and Django in general). It seems to me that request.session functions like a dictionary, but I'm not sure how much data I can save in it. Most of the examples I have looked at so far use request.session to store relatively small data, such as a short string or integer. So is there a limit to the amount of data I can save in request.session, or is it more related to what database I am using?
Part of the reason why I have this question is because I don't fully understand how the storage of request.session works. Does it work like another Model? If so, how can I access the keys/items on the admin page?
Thanks for any help in advance!
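For context, a minimal sketch of the dict-like usage the question describes; the view name and keys are made up:

```python
# views.py (sketch) -- request.session supports dict-style access;
# the view name and keys here are made up for illustration.
from django.http import JsonResponse


def visit_counter(request):
    # read with a default, write by assignment -- just like a dict
    count = request.session.get('visit_count', 0) + 1
    request.session['visit_count'] = count
    return JsonResponse({'visits': count})
```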
In short: it depends on the backend you use; you specify this with the SESSION_ENGINE setting [Django-doc]. The backends can be (but are not limited to):
'django.contrib.sessions.backends.db'
'django.contrib.sessions.backends.file'
'django.contrib.sessions.backends.cache'
'django.contrib.sessions.backends.cached_db'
'django.contrib.sessions.backends.signed_cookies'
Depending on how each backend is implemented, different maximums are applied.
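A minimal settings sketch, assuming the database backend is chosen (any of the dotted paths above can be substituted):

```python
# settings.py (sketch) -- choose the session backend; any of the dotted
# paths listed above can be substituted here.
SESSION_ENGINE = 'django.contrib.sessions.backends.db'

# The serializer (discussed below) is configured independently.
SESSION_SERIALIZER = 'django.contrib.sessions.serializers.JSONSerializer'
```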
Furthermore, the SESSION_SERIALIZER setting matters as well, since it determines how the data is encoded. There are two built-in serializers:
'django.contrib.sessions.serializers.JSONSerializer'; and
'django.contrib.sessions.serializers.PickleSerializer'.
Serializers
The serializer determines how the session data is converted to a byte stream, and thus affects the size of what ends up being stored.
The JSONSerializer makes a JSON dump that is then base64-encoded and signed with HMAC/SHA1. Base64 is an encoding, not compression: it adds roughly 33% overhead compared to the original JSON blob.
The PickleSerializer first pickles the object, then base64-encodes and signs it in the same way. Pickling tends to be less compact than JSON encoding, but it can serialize objects that are not plain dictionaries, lists, strings, and so on.
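A rough illustration of the size difference, using json, pickle and base64 directly rather than Django's internal signing code; the sample data is made up:

```python
# Size comparison sketch -- approximates what the serializers do, using
# json/pickle/base64 directly instead of Django's internal signing code.
import base64
import json
import pickle

data = {'user_id': 42, 'cart': ['sku-1', 'sku-2'], 'note': 'x' * 1000}

json_bytes = json.dumps(data).encode()
pickle_bytes = pickle.dumps(data)

print('JSON:           ', len(json_bytes), 'bytes')
print('JSON + base64:  ', len(base64.b64encode(json_bytes)), 'bytes')    # ~33% larger
print('pickle:         ', len(pickle_bytes), 'bytes')
print('pickle + base64:', len(base64.b64encode(pickle_bytes)), 'bytes')
```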
Backends
Once the data is serialized, the backend determines where it is stored. Some backends have limitations.
django.contrib.sessions.backends.db
Here Django uses a database model to store session data. If the database can store values up to 4 GiB in a single column (MySQL, for example), then after the ~33% base64 overhead it can probably hold roughly 3 GiB of serialized data per session. Note that of course there should also be sufficient disk space to store the table.
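Since this backend is an ordinary model (django.contrib.sessions.models.Session), a stored session can be inspected like any other row; the session key below is a placeholder:

```python
# Inspecting a stored session (sketch) -- the db backend is a normal model,
# so it can be queried like any other; the key below is a placeholder.
from django.contrib.sessions.models import Session

s = Session.objects.get(session_key='<some-session-key>')
print(len(s.session_data))   # size of the encoded, base64'd payload
print(s.get_decoded())       # the original dict stored in request.session
```

The model can also be registered with the Django admin like any other model, though what you will see there is the encoded blob unless you decode it with get_decoded().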
django.contrib.sessions.backends.file
Here the data is written to a file. There is no limit implemented in Django, but of course there should be sufficient disk space, and some filesystems can impose their own limits on file sizes or on the files in a directory.
django.contrib.sessions.backends.cache
Here it is stored in one of the caches you specified in the CACHES setting [Django-doc]; depending on the cache system you pick, certain limitations can apply.
django.contrib.sessions.backends.cached_db
Here you use a combination of cache and db: reads and writes go through the cache, but the data is also backed by the database, such that if the cache is invalidated, the database still contains the data. This means the limitations of both backends apply.
django.contrib.sessions.backends.signed_cookies
Here the session data is stored as signed cookies in the client's browser, so the limits are whatever the browser imposes on cookies.
RFC 2965 on HTTP State Management Mechanism specifies that a browser should normally be capable of storing at least 4096 bytes per cookie, but since signing adds its own overhead, the usable payload for session data is smaller still.
If you use the browser's cookies, you can thus only store a very limited amount of data.

Store image with tag and prefix to query fast (s3 aws)

I use Ionic to create a mobile app which can take photos and upload images from the phone to S3. I wonder how to add a prefix or tag to the uploaded image that would help me query it quickly and keep it unique. I am thinking about making a prefix and creating folders:
year/month/day/filename (e.g. 2018/11/27/image.png)
If there are a lot of images in the 2018/11/27/ folder, I think queries will be slow, and sometimes the image filename is not unique. Any suggestions? Thanks a lot.
Amazon S3 is an excellent storage service, but it is not a database.
You can store objects in Amazon S3 with whatever name you wish, but if you wish to list/sort/find objects quickly you should store the name of the object, together with its metadata, in a database. Then you can query the database to find the object of interest.
DynamoDB would be a good choice because it can be configured for guaranteed speed. You could also put DAX in front of DynamoDB for even greater performance.
With information about the objects stored in a database, you can quite frankly name each individual object anything you wish. Many people just use a UUID since it just needs to be a unique identifier. The object name itself does not need to convey any meaning - it is simply a Key to identify the object when it needs to be accessed later.
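A sketch of that pattern using Python and boto3 (bucket name, table name and attributes are made up; the same idea applies from any AWS SDK, including the JavaScript SDK an Ionic app would use):

```python
# Sketch: name the object with a UUID and index its metadata in DynamoDB.
# Bucket name, table name and attributes are made up for illustration.
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('image-index')          # hypothetical table


def store_image(image_bytes, user_id):
    key = str(uuid.uuid4()) + '.png'           # unique, meaning-free object name
    s3.put_object(Bucket='my-photo-bucket', Key=key, Body=image_bytes)
    table.put_item(Item={
        'image_id': key,                       # partition key of the table
        'user_id': user_id,
        'uploaded_at': datetime.now(timezone.utc).isoformat(),
    })
    return key
```

Queries such as "all images uploaded by a user on a given day" then go to the table (or a secondary index) rather than to S3 key listings.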
If, however, objects are typically processed in groups (such as having daily files grouped together into months for processing with Hadoop clusters), then locating objects in a particular path is useful. It allows the objects to be processed together without having to consult the database.

Retrieve JSON from Firebase Realtime Database Using C++

I am using the Firebase C++ SDK (on desktop) to retrieve data from the realtime database. I want to store some parts of the database on the client but I can only get a firebase::Variant. I would just like to store the json that belongs to the data path as a file.
I could iterate over the data and generate JSON from that, but that seems like overkill, as the data was retrieved from JSON in the first place.
Is there a way to retrieve the json itself (as text), instead of querying the values and children from firebase::database::DataSnapshot?
Kind regards,
Jeroen

How to split data when archiving from AWS database to S3

For a project we've inherited we have a large-ish set of legacy data, 600GB, that we would like to archive, but still have available if need be.
We're looking at using AWS Data Pipeline to move the data from the database into S3, following this tutorial:
https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-copyactivity.html
However, we would also like to be able to retrieve a 'row' of that data if we find the application is actually using a particular row.
Apparently that tutorial puts all of the data from a table into a single massive CSV file.
Is it possible to split the data up into separate files, with 100 rows of data in each file, and give each file a predictable file name, such as:
foo_data_10200_to_10299.csv
So that if we realise we need to retrieve row 10239, we can know which file to retrieve, and download just that, rather than all 600GB of the data.
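For what it's worth, a minimal sketch of the chunking the question describes, assuming the table has already been exported to one large CSV (file names and chunk size follow the example above):

```python
# Sketch: split one large CSV export into 100-row files with predictable
# names like foo_data_10200_to_10299.csv, as described in the question.
import csv

CHUNK = 100


def write_chunk(header, rows, start):
    # rows cover source rows start .. start + len(rows) - 1
    name = f'foo_data_{start}_to_{start + len(rows) - 1}.csv'
    with open(name, 'w', newline='') as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)


with open('foo_data.csv', newline='') as src:
    reader = csv.reader(src)
    header = next(reader)
    rows, start = [], 0
    for i, row in enumerate(reader):
        rows.append(row)
        if len(rows) == CHUNK:
            write_chunk(header, rows, start)
            rows, start = [], i + 1
    if rows:  # trailing partial chunk
        write_chunk(header, rows, start)
```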
If your data is stored in CSV format in Amazon S3, there are a couple of ways to easily retrieve selected data:
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
S3 Select (currently in preview) enables applications to retrieve only a subset of data from an object by using simple SQL expressions.
These work on compressed (gzip) files too, to save storage space.
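A sketch of the S3 Select route with boto3, assuming a gzipped CSV with a header row (the bucket, key and 'id' column are placeholders):

```python
# Sketch: pull one row out of a gzipped CSV in S3 with S3 Select.
# Bucket, key and the 'id' column are placeholders for illustration.
import boto3

s3 = boto3.client('s3')

response = s3.select_object_content(
    Bucket='my-archive-bucket',
    Key='foo_data.csv.gz',
    ExpressionType='SQL',
    Expression="SELECT * FROM s3object s WHERE s.id = '10239'",
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}, 'CompressionType': 'GZIP'},
    OutputSerialization={'CSV': {}},
)

# The response payload is an event stream; 'Records' events carry the matching rows.
for event in response['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode())
```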
See:
Welcome - Amazon Athena
S3 Select and Glacier Select – Retrieving Subsets of Objects