Retrieve JSON from Firebase Realtime Database Using C++ - c++

I am using the Firebase C++ SDK (on desktop) to retrieve data from the realtime database. I want to store some parts of the database on the client but I can only get a firebase::Variant. I would just like to store the json that belongs to the data path as a file.
I could iterate over the data and generate json from that, but that seems to be a lot of overkil as the data is retrieved from json in the first place.
Is there a way to retrieve the json itself (as text), instead of querying the values and children from firebase::database::DataSnapshot?
Kind regards,
Jeroen

Related

Building Google Cloud Platform Data Catalog on unstructured data

I have unstructured data in the form of document images. We are converting these documents to JSON files. I now want to have technical metadata captured for this. Can someone please give me some tips/best practices for building a data catalog on unstructured data in Google Cloud Platform?
This answer comes with the assumption that you are not using any tool to create schemas around your unstructured data and query your data, like BigQuery, Hive, Presto. And you simply want to catalog your files.
I had a similar use case, Google Data Catalog has an option to create custom entries.
Some tips on building a Data Catalog on unstructured files data:
Use meaningful file names on your JSON files. That way searching for them will become easier.
Since you are already using GCP, use their managed Data Catalog, and leverage their custom entries API to ingest the files metadata into it.
In case you also want to look for sensitive data in your JSON files, you could run DLP on them.
Use Data Catalog Tags to enrich the files metadata. The tutorial on the link shows how to do it on Big Query tables, but you can do the same on custom entries.
I would add some information about your ETL jobs that convert these documents in JSON files as Tags. Like execution time, data quality score, user, business owner, etc.
In case you are wondering how to do the step 2, I put together one script that automatically does that:
link for the GitHub. Another option is to work with Data Catalog Filesets.
So between using custom entries or filesets, I'd ask you this, do you need information about your files name?
If not then filesets might easier, since at the time of this writing it does not show any info about your files name, but are good to manage file patterns in GCS buckets: It is defined by one or more file patterns that specify a set of one or more Cloud Storage files.
The datatalog-util also has an option to enrich your filesets, in case you just want to have statistics about them, like average file size, types, etc.

I have trouble loading data from multiple csv files or python code in powerbi

I am using power bi for loading some data. The data is present in the data lake store, and python can load data and do some initial transforms on the data.
The problem is however that the data is located in multiple files as per ID (1 - 10000). It is not possible to load all the data in memory.
I am looking for some way to have a button or something similar that acts as a selector for these Ids and then loads the data for the selected ID using python script, hopefully by passing a parameter to the python script.

Pulling Instagram data into Google Big Query

I am new to development, so I am sorry if this is a really basic question. I am trying to access some of the data available from instagram's API as documented here. https://developers.facebook.com/docs/instagram-api/insights.
I would like some kind of data repository to pull the data into, so I am looking at Google Big Query to see if I can pull in the data. (The ultimate place will be PowerBi so I can publish online)
Looking at the Facebook request code - is it possible to put this into Google Big query to return the data?
I am replacing the 'instagram-business-user-id' with an ID I have generated already - but it feels like perhaps it needs more markup to let Big Query know what language it is in.
Any help would be much appreciated.
GET graph.facebook.com/{instagram-business-user-id}/insights
?metric=impressions,reach,profile_views
&period=day
Looking at the Facebook request code - is it possible to put this into Google Big query to return the data?
Yes it's absolutely possible using bigQuery API or bigQuery CLI
You can use this Psuedo workflow as an example (using BigQuery API):
Create a table in bigQuery with the desired schema for this you also have 2 options:
Save the result in 1 column with the full JSON, This means to the select you need you use JSON_EXTRACT to fetch specific data
Process the JSON in your code and save it in specific columns to simplify the select statement
Call instagram's API
Call bigQuery API or bigQuery CLI to insert the data, This link provides one option how to do this
Call bigQuery API or bigQuery CLI to fetch the data, This link provides one option how to do this

How to split data when archiving from AWS database to S3

For a project we've inherited we have a large-ish set of legacy data, 600GB, that we would like to archive, but still have available if need be.
We're looking at using the AWS data pipeline to move the data from the database to be in S3, according to this tutorial.
https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-copyactivity.html
However, we would also like to be able to retrieve a 'row' of that data if we find the application is actually using a particular row.
Apparently that tutorial puts all of the data from a table into a single massive CSV file.
Is it possible to split the data up into separate files, with 100 rows of data in each file, and giving each file a predictable file name, such as:
foo_data_10200_to_10299.csv
So that if we realise we need to retrieve row 10239, we can know which file to retrieve, and download just that, rather than all 600GB of the data.
If your data is stored in CSV format in Amazon S3, there are a couple of ways to easily retrieve selected data:
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
S3 Select (currently in preview) enables applications to retrieve only a subset of data from an object by using simple SQL expressions.
These work on compressed (gzip) files too, to save storage space.
See:
Welcome - Amazon Athena
S3 Select and Glacier Select – Retrieving Subsets of Objects

What is the quickest way to import a JSON object created on the client?

I have an app that allows the user to select an excel file from there computer and it breaks it down into a JSON object so that it can be imported into the store. is this.store.pushPayload('part',payload)the quickest and most efficient way to add around 300 records to the store without hitting the server?
Running the latest beta of ember-data and ember.js
push is the preferred method to put one normalized object in the store, pushMany does the same thing but with a collection of the same type:
http://emberjs.com/api/data/classes/DS.Store.html#method_pushMany
If you need your data to be normalized by the adapter, you should call pushPayload.