Question about the high-level architecture required to process and visualize fitness app data (from Apple Health, for example) using Google Cloud services? - google-cloud-platform

I'm working on a project where I am tasked with using Google Cloud services to process and visualize fitness data. For example, I have exported some Apple Health data from my watch, and it is in .xml format.

From a high level, I envision this .xml file starting off in object storage and being converted to .csv by a Cloud Function (triggered by the creation of the .xml object in storage), then stored again in object storage (a different bucket). These .csv files would then be processed by a Dataflow pipeline, which reformats the data into the template schema I would like the data organized under. The pipeline would output the resulting .csv to BigQuery, which would then serve as a data source for Data Studio. I would then configure Data Studio to produce some simple reports that compare the health data to recommended values. Ideally, this report would also be accessible as a .pdf in object storage.

Am I on the right track, or am I missing some key services to accomplish this?
Also, I'm new to posting on Stack Overflow, so if this question is against the rules or not welcome, please let me know.
Any feedback is greatly appreciated, as I have not been able to bounce these ideas off of other experienced cloud architects/developers.

This question is currently off-topic by the rules of Stack Overflow, as it does not contain a specific problem to resolve. See points 4-5.
As high-level advice: I do not see why it should not be possible with the services you mentioned, but you would need to implement it, try it on your side, and evaluate how the features of each service fit your workflow.
In terms of solution or architecture advice, that is generally a paid service, and you will most likely find little help here for it unless you have a specific problem to solve with said services. You may also find some help elsewhere on the internet, e.g. Cloud Solutions, Built it on GCP, etc.
You might also find this interesting to review, as it mimics your solution. Hope this helps.
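If it helps to make the first step concrete, here is a minimal sketch of the .xml-to-.csv conversion as a Storage-triggered Python Cloud Function. The bucket names, the output columns, and the reliance on the <Record> elements of an Apple Health export are illustrative assumptions, not a definitive implementation:

```python
# Minimal sketch of the .xml -> .csv step as a 1st-gen, Storage-triggered
# Python Cloud Function. Bucket names, output columns, and the use of Apple
# Health's <Record> elements are illustrative assumptions.
import csv
import io
import xml.etree.ElementTree as ET

from google.cloud import storage

OUTPUT_BUCKET = "fitness-data-csv"  # hypothetical destination bucket


def convert_health_export(event, context):
    """Fires on google.storage.object.finalize in the source bucket."""
    name = event["name"]
    if not name.endswith(".xml"):
        return  # ignore non-XML uploads

    client = storage.Client()
    xml_bytes = client.bucket(event["bucket"]).blob(name).download_as_bytes()

    # Each health sample in an Apple Health export is a <Record> element.
    root = ET.fromstring(xml_bytes)

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["type", "value", "unit", "startDate", "endDate"])
    for record in root.iter("Record"):
        writer.writerow([record.get(attr) for attr in
                         ("type", "value", "unit", "startDate", "endDate")])

    out_blob = client.bucket(OUTPUT_BUCKET).blob(name.replace(".xml", ".csv"))
    out_blob.upload_from_string(buffer.getvalue(), content_type="text/csv")
```

Deployed with a google.storage.object.finalize trigger on the source bucket, the resulting .csv files land where your Dataflow pipeline can pick them up. Note that a full Apple Health export can be quite large, so you may need a generous memory allocation or a streaming parse (ET.iterparse) for real exports.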

Related

Can we use a Google Cloud Function to convert an xls file to csv

I am new to Google Cloud Functions. My requirement is to trigger a Cloud Function on receiving a Gmail message and convert the xls attachment from the email to csv.
Can we do this using GCP?
Thanks in advance!
Put very briefly - that is possible, as far as I know.
But.
You might find that, in order to automate this task in a reliable, robust and self-healing way, it may be necessary to use half a dozen cloud functions, Pub/Sub topics, maybe a Cloud Storage bucket, maybe a Firestore collection, Secret Manager, a custom service account with the relevant IAM permissions, and so on; maybe a dozen or two dozen different GCP resources in total. And, obviously, the code for those cloud functions has to be developed. All together, that may not be very easy or quick to implement.
At the same time, I have personally seen (and contributed to the development of) a functional component, based on cloud functions, that together did exactly what you would like to achieve. And it was running in production.
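To illustrate just the conversion piece (the surrounding Gmail trigger, Pub/Sub plumbing and IAM are where the real effort goes), here is a rough Python sketch assuming the attachment bytes have already been extracted from the email; all names here are made up for illustration:

```python
# Sketch of only the xls -> csv conversion, assuming the attachment bytes have
# already been pulled out of the email (e.g. by a function subscribed to Gmail
# push notifications on a Pub/Sub topic). All names are illustrative.
import io

import pandas as pd
from google.cloud import storage


def xls_bytes_to_csv(xls_bytes: bytes, bucket_name: str, destination_name: str) -> None:
    """Convert an xls/xlsx payload to CSV and write it to Cloud Storage."""
    # pandas needs xlrd (.xls) or openpyxl (.xlsx) listed in the function's requirements.
    frame = pd.read_excel(io.BytesIO(xls_bytes))
    storage.Client().bucket(bucket_name).blob(destination_name).upload_from_string(
        frame.to_csv(index=False), content_type="text/csv"
    )
```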

Backup of Datastore/Firestore without gcloud import/export

Hello Google Cloud Platform users!
I am interested in a solution for a regular (let's say daily) backup of Datastore/Firestore databases. Typical use case: for some reason (a bad "manual" operation, a bug, whatever), a series of entities has been wrongly modified or destroyed, or the database is corrupted; in that case, the database version from the previous day would be restored.
I know this has been discussed in previous posts, but mostly in terms of gcloud datastore|firestore import|export through files hosted on Google Cloud Storage. The problem is that for large databases (typically professional applications with thousands and thousands of entities), this approach can take a huge amount of time and resources, even if launched as a batch job during the night (and it only gets worse as the database grows).
A solution I have thought about would be to copy to another Datastore/Firestore dataset on each upsert, but that seems like overkill, since the Datastore/Firestore services already guarantee replication anyway. Most of all, it does not address the issue of unwanted writes or deletions of entities if this second database is 100% synced with the original one...
Are there best practices to backup Datastore/Firestore entities for this use case?
Any (brilliant) idea is welcome!
Thanks.
You can have a look at this project: https://github.com/Zenika/alpine-firestore-backup
I'm a contributor to it; don't hesitate to reach out if you have questions or if you want new features.
At the moment that functionality is not available natively for Datastore/Firestore; there is a Feature Request to implement it:
https://issuetracker.google.com/133662510
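For a sense of the general document-level approach such tools take (this is not the linked project's exact code), here is a rough Python sketch that streams each top-level collection and writes a dated JSON snapshot to Cloud Storage; the bucket name and layout are assumptions, and a real implementation would paginate rather than hold a whole collection in memory:

```python
# Rough sketch of a document-level Firestore backup (not the linked project's
# code): stream each top-level collection and write a dated JSON snapshot to
# Cloud Storage. Bucket name and layout are assumptions; a real implementation
# should paginate instead of holding whole collections in memory.
import datetime
import json

from google.cloud import firestore, storage

BACKUP_BUCKET = "my-firestore-backups"  # hypothetical bucket


def backup_firestore() -> None:
    db = firestore.Client()
    bucket = storage.Client().bucket(BACKUP_BUCKET)
    stamp = datetime.date.today().isoformat()

    for collection in db.collections():
        documents = {doc.id: doc.to_dict() for doc in collection.stream()}
        bucket.blob(f"{stamp}/{collection.id}.json").upload_from_string(
            json.dumps(documents, default=str),
            content_type="application/json",
        )
```

Run on a schedule (for example Cloud Scheduler triggering a Cloud Function), this gives you a daily snapshot to restore from, and because each day lives under its own prefix it is not affected by later unwanted writes or deletions.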

Ship Google Cloud Text to Speech WAV Files in Product?

We used to have someone go into a recording studio to record speech for our software. Recently, the Google Cloud offering has become good enough that we are considering using the API instead. We went through the terms, and it seems that distributing the WAV results is not covered by them. Does anybody know if this is okay?
I'm curious what exactly "distributing the WAV results" means for you.
In GCP, all the data you upload or generate is owned by you according to the security policy. It is not accessible to anyone other than you unless you explicitly grant access. So, in this case, once you obtain the audio result from Text-to-Speech, you decide how to distribute it.

Getting data from a locally running Java app to a Google Cloud app and back

I wanted to dive into the world of distributed systems, cloud computing, IoT, etc., and I have to be honest, I imagined everything being a little more intuitive than it turned out to be.
I had a tiny test architecture in mind that I'd like to set up with Google Cloud and its services, but I am kind of stuck since I can't get my head around some concepts.
What I basically want to do (as a first step) is write a simple Java application that runs locally on my computer. This application should just generate random numbers and send those numbers somehow to the Google Cloud. In the cloud, I want another Java application that manipulates those random numbers in some way (what it does doesn't actually matter). Afterwards, the output should somehow get back to me, and at the moment I don't even care exactly how. It could come back to my local app (with some kind of listener; would that be possible?), it could simply be stored somewhere in the Google Cloud, or maybe uploaded to my Google Drive.
I guess you have already noticed that, at some points, I don't even know exactly what I want, since I'm not sure what is possible and what isn't.
Could you provide me some help to get this set up?
The most important questions for me right now are:
Do I need to use a pub/sub system to which my generated numbers are sent, and which then forwards them to the cloud app that transforms my data?
How do I get my data from the local app to the cloud services?
Would my data-transforming app run on Google Dataflow?
Above I wrote "as a first step"... because later I would also like to send config files (for example in JSON or XML format) to the cloud, and the cloud application should transform those config files. If I get the first scenario running, then I guess this would also be no problem, right?
Those are just a few of the questions that are on my mind currently. The most important ones I guess.
Any help would be greatly appreciated. Sorry if the questions are not very precise, but I really need some pointing in the right direction.
Thank you in advance!
I think it would be good to read up on some of the technologies you mention here:
Google Cloud Pub/Sub: Pub/Sub enables you to publish messages to a topic and consume them somewhere else in the (Google) Cloud. You can see some different examples of publishers and consumers in the link. In your case, you could for example write a Java application that writes random numbers to a Pub/Sub topic, where they will sit for up to 7 days (the default retention) until consumed by another component (for example, Google Cloud Dataflow). To get started developing, you can find the SDKs here (there is a Java SDK).
Google Cloud Dataflow is a managed service that runs Apache Beam pipelines to process your data at scale. You can learn about the different concepts here and get started designing your pipeline here. I suggest taking a look at some examples first, though, which will make it easier to grasp what is actually going on. Dataflow has a Pub/Sub connector, so in your pipeline you will be able to read from the topic you created before. In Dataflow you can, for example, multiply all your random numbers and write them to a certain sink (for example Google Cloud Storage, or even BigQuery or Pub/Sub again).
Google Cloud Storage is cloud storage where you can put files, for example the output of your Dataflow pipeline. You can manually download the files using the Cloud Console UI, or use one of the SDKs to download the output programmatically.
Hope this gives you an overview and some pointers to start. Whenever you are ready and have a more concrete use case in mind, you can start looking at some more components.
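As a very rough sketch of the round trip described above (local publisher, Pub/Sub topic, Beam pipeline transforming the numbers and writing them to a results topic your local app can subscribe to), here it is in Python for brevity; the Java Pub/Sub and Beam SDKs follow the same pattern, and the project, topic, and runner settings are placeholders:

```python
# Very rough sketch of the round trip, with placeholder project/topic names.
# Shown in Python; the Java Pub/Sub and Beam SDKs follow the same pattern.
import random

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"          # placeholder
NUMBERS_TOPIC = "random-numbers"   # the local app publishes here
RESULTS_TOPIC = "results"          # the pipeline publishes transformed values here


def publish_random_numbers(count: int = 10) -> None:
    """Runs locally: pushes random integers into Pub/Sub."""
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, NUMBERS_TOPIC)
    for _ in range(count):
        number = random.randint(0, 100)
        publisher.publish(topic_path, data=str(number).encode("utf-8")).result()


def run_pipeline() -> None:
    """Runs on Dataflow (add runner/region/temp_location options) or locally
    with the DirectRunner: read, transform, write back to a results topic."""
    options = PipelineOptions(streaming=True, project=PROJECT_ID)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadNumbers" >> beam.io.ReadFromPubSub(
                topic=f"projects/{PROJECT_ID}/topics/{NUMBERS_TOPIC}")
            | "Transform" >> beam.Map(lambda data: int(data.decode("utf-8")) * 2)
            | "Encode" >> beam.Map(lambda n: str(n).encode("utf-8"))
            | "WriteResults" >> beam.io.WriteToPubSub(
                topic=f"projects/{PROJECT_ID}/topics/{RESULTS_TOPIC}")
        )
```

Your local app can then attach a subscriber (the listener you mention) to the results topic to get the transformed numbers back, or you could swap the last step for a windowed write to Cloud Storage if you just want files.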

Amazon EC2 scaling and upload temporary folder

I have a PHP-based application on one Amazon EC2 instance for uploading and transcoding audio files. The application first uploads the file, then transcodes it, and finally puts it in an S3 bucket. At the moment, the application shows the progress of file uploading and transcoding through repeated AJAX requests that monitor the file size in a temporary folder.
I keep wondering: what if tomorrow users rush to my service and I need to scale it in AWS by any means possible?
A: What will happen to my upload and transcoding technique?
B: If I add more instances, does that mean I will have different files in different temporary conversion folders in different physical places?
C: If I want to get the file size by AJAX from http://www.example.com/filesize until the process finishes, do I need the real address of each EC2 instance (I mean IP/DNS), or all of the instances' folders (or a single folder)?
D: When we scale, what will happen to the temporary folder? Is it correct that all of the instances, apart from their LAMP stack, point to one root folder on the main instance?
I have some basic knowledge of scaling with other hosting approaches, but with Amazon these questions are on my mind.
Thanks for any advice.
It is difficult to answer your questions without knowing considerably more about your application architecture, but given that you're using temporary files, here's a guess:
A: Your ability to scale depends entirely on your architecture and, of course, on having a wallet deep enough to pay.
B: Yes. If you're generating temporary files on individual machines, they won't be stored in a shared place the way you currently describe it.
C: Yes. You need some way to know where the files are stored. You might be able to get around this with an ELB stickiness policy (i.e. traffic through the ELB gets routed to the same instances), but those are kind of a pain and won't necessarily solve your problem.
D: Not quite sure what the question is here.
As it sounds like you're in the early days of your application, give this tutorial and this tutorial a peek. The first one describes a thumbnailing service built on Amazon SQS, the second a video processing one. They'll help you design with best AWS practices in mind, and help you avoid many of the issues you're worried about now.
One way you could get around scaling and session stickiness is to have the transcoding process update a database with its current progress. A returning user then checks the database to see the progress of their upload. There is no need to keep track of where the transcoding is taking place, since the progress is stored in a single place.
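A rough sketch of that idea, using DynamoDB purely as one example of a store every instance can reach (any shared database works), and shown in Python for brevity; the same pattern applies from PHP with the AWS SDK, and the table and field names are made up:

```python
# Rough sketch of the shared-progress idea, using DynamoDB purely as one example
# of a store every instance can reach (any shared database works). Table and
# field names are made up; the real app would do this from PHP with the AWS SDK.
import boto3

progress_table = boto3.resource("dynamodb").Table("transcode_progress")  # hypothetical table


def report_progress(upload_id: str, percent_complete: int) -> None:
    """Called by whichever instance happens to be transcoding the file."""
    progress_table.update_item(
        Key={"upload_id": upload_id},
        UpdateExpression="SET percent_complete = :p",
        ExpressionAttributeValues={":p": percent_complete},
    )


def get_progress(upload_id: str) -> int:
    """Called by the web tier when the browser polls via AJAX."""
    item = progress_table.get_item(Key={"upload_id": upload_id}).get("Item") or {}
    return int(item.get("percent_complete", 0))
```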
However, like Christopher said, we don't really know anything about your application; any advice we give is really from the outside looking in, and we don't have a good idea of what would be easiest for you to do. This seems like a pretty simple solution, but I could be missing something because I don't know anything about your application or architecture.