Getting data from local running java app to google cloud app and back - google-cloud-platform

I wanted to dive into the world of distributed systems, cloud computing, IoT, etc., and I gotta be honest, I imagined everything being a little more intuitive than it finally turned out.
I had a tiny testing architecture in mind, that I'd like to set up with Google Clouds and their services, but I am kinda stuck since I can't get my head around some concepts.
What I basically wanted to do (as a first step) is writing a simple java application that would run locally on my computer. This application should just generate random numbers and send those numbers somehow to the google cloud. On the cloud I wanted to define another java application that would manipulate those random numbers in some kind of way (it doesn't matter actually). Afterwards, the output should somehow get back to me of course. And actually, at the moment, I don't even care about how exactly. It could be somehow back to my local app (with some kind of listener, would that be possible?). But it could also simply store the results somewhere on the google cloud? Or maybe upload them to my google drive?
I guess you already noticed that - at some points - I don't even know what i want exactly, since I'm not sure of what is possible, and what not.
Could you provide me some help to get this set up?
The most important questions for me right now are:
Do I need to use a pubsub system, where my generated numbers are sent
to, and which then forwards this to the cloud app, that transforms my
data?
How do I get my data from the local app to the cloud services?
Would my data transforming app run on Google Dataflow?
Above I wrote "as a first step"... because later I would also like to send config files (for example in json format, or xml) to the cloud, and the
cloud application should transform those config files... if I get the
first scenario running the I guess this woul also be no problem
right?
Those are just a few of the questions that are on my mind currently. The most important ones I guess.
It would be a big help. Sorry, if the questions are not very precise, but I really need some kind of pointing into the right direction.
Thank you in advance!

I think it would be good to read up on some of the technologies you mention here:
Google Cloud Pubsub: Pub/Sub enables you to publish messages to a topic, and consume them in another place in the (Google) Cloud. You can see some different examples of publishers and consumers in the link. In your case you could for example write a Java application that writes random numbers to the Pub/Sub queue, where they will sit for 7 days to be consumed by another component (for example, Google Cloud Dataflow). To get started developing, you can find the SDKs here (there is a Java SDK).
Google Cloud Dataflow is managed service running Apache Beam pipelines to process your data at scale. You can learn about the different concepts here and get started designing your pipeline here. I suggest taking a look at some examples first though, which will make it more easy to grasp what is actually going on. Dataflow has a PubSub connector, so in your application you will be able to read from the topic you created before. In Dataflow you can for example multiply all your random numbers and write them to a certain sink (for example Google Cloud Storage, or even BigQuery or PubSub again).
Google Cloud Storage: is a cloud storage where you can put files, for example the output of your Dataflow pipeline. You will be able to manually download the files using the Cloud Console UI, or you can use one of the SDKs to download the output programmatically.
Hope this gives you an overview and some pointers to start. Whenever you are ready and have a more concrete use case in mind, you can start looking at some more components.

Related

Pros and Cons of Google Dataflow VS Cloud Run while pulling data from HTTP endpoint

This is a design approach question where we are trying to pick the best option between Apache Beam / Google Dataflow and Cloud Run to pull data from HTTP endpoints (source) and put them down the stream to Google BigQuery (sink).
Traditionally we have implemented similar functionalities using Google Dataflow where the sources are files in the Google Storage bucket or messages in Google PubSub, etc. In those cases, the data arrived in a 'push' fashion so it makes much more sense to use a streaming Dataflow job.
However, in the new requirement, since the data is fetched periodically from an HTTP endpoint, it sounds reasonable to use a Cloud Run spinning up on schedule.
So I want to gather pros and cons of going with either of these approaches, so that we can make a sensible design for this.
I am not sure this question is appropriate for SO, as it opens a big discussion with different opinions, without clear context, scope, functional and non functional requirements, time and finance restrictions including CAPEX/OPEX, who and how is going to support the solution in BAU after commissioning, etc.
In my personal experience - I developed a few dozens of similar pipelines using various combinations of cloud functions, pubsub topics, cloud storage, firestore (for the pipeline process state managemet) and so on. Sometimes with the dataflow as well (embedded into the pipelieines); but never used the cloud run. But my knowledge and experience may be not relevant in your case.
The only thing I might suggest - try to priorities your requirements (in a whole solution lifecycle context) and then design the solution based on those priorities. I know - it is a trivial idea, sorry to disappoint you.

How to build an IoT system with mobile app using AWS

My project is to develop an IoT system that uses sensors to collect data which can be monitored by the user using a mobile app. I want to use AWS for this project but since I am a beginner, I am confused on where to start. Do you have some tips or some chronological steps that I have to learn and do to be able to create this project?
Not a criticism, but an observation. If you don't know the conceptual steps to achieve this, understand this before you jump into any technology. If you can't do this right now without any AWS technology, jumping in with AWS is going to make your life 100x harder.
Break it down into key challenges.
IoT Sensors. Start with a single sensor. Build a POC with a Raspberry PI that sends data from the device to www.example.com
Build a listening service on your localhost. i.e. HelloWorldIoTDevice simply responds with a 200 OK, don't worry about the data and payloads right now
Save this to a database that simple shows 200 OK messages every time a successful message is received from the IoT device
Build a HelloWorldWebApp that reads this data from the database
Build an API that reads the data from the database
Build a HelloWorldMobileApp that reads data from the API
Build it in full - Add payloads, add authentication and authorisation, get a POC published on the web so it works end to end
Productionise it
Then you need to get actual data flowing. Conceptually this is a relatively simple thing. But in reality this requires an extremely in-depth understanding of technology layers across the entire stack which is extremely challenging if you are a beginner as this is an enormous learning curve.
Take a look at Ngrok to help with building these types of POCs, https://www.contradodigital.com/2016/04/09/access-localhost-internet/
Hope that provides a bit of guidance. Take one step at a time.

Can we use Google cloud function to convert xls file to csv

I am new to google cloud functions. My requirement is to trigger cloud function on receiving a gmail and convert the xls attachment from the email to csv.
Can we do using GCP.
Thanks in advance !
Very shortly - that is possible as far as I know.
But.
You might found that in order to automate this task in a reliable, robust and self-healing way, it may be necessary to use half a dozen cloud functions, pubsub topics, maybe a cloud storage, maybe a firestore collection, security manager, customer service account with relevant IAM permissions, and so on. Maybe more than a dozen or two dozens of different GCP resources. And, obviously, those cloud functions are to be developed (I mean the code is to be developed). All together that may be not a very easy or quick to implement.
At the same time, I personally saw (and contributed to a development of) a functional component, based on cloud functions, which together did exactly what you would like to achieve. And that was in production.

Google API Speeds Slow in Cloud Run / Functions?

Bottom Line: Cloud Run and Cloud Functions seem to have bizarrely limited bandwidth to the Google Drive API endpoints. Looking for advice on how to work around, or, ideally, #Google support to fix the underlying issue(s) as I will not be the only like use case.
Background: I have what I think is a really simple use case. We're trying to automate private domain Google Drive users to take existing audio recordings and send them off to Speech API to generate a transcript on an ad hoc basis, and to dump the transcript back into the same Drive folder with email notification to the submitter. Easy, right? Only hard part is that Speech API will only read from Google Cloud Storage, so the 'hard part' should be moving the file over. 'Hard' doesn't really cover it...
Problem: Writing in nodejs and using the latest version of the official modules for Drive and GCS, the file copying was going extremely slow. When we broke things down, it became apparent that the GCS speed was acceptable (mostly -- honestly it didn't get a robust test, but was fast enough in limited testing); it was the Drive ingress which was causing the real problem. Using even the sample Google Drive Download app from the repo was slow as can be. Thinking the issue might be either my code or the library, though, I ran the same thing from the Cloud Console, and it was fast as lightning. Same with GCE. Same locally. But in Cloud Functions or Cloud Run, it's like molasses.
Request:
Has anyone in the community run into this or a like issue and found a workaround?
#Google -- Any chance that whatever the underlying performance bottleneck is, you can fix it? This is a quintessentially 'serverless' use case, and it's hard to believe that the folks who've been doing this the longest can't crack it.
Thank you all in advance!
Updated 1/4/19 -- GCS is also slow following more robust testing. Image base also makes no difference (tried nodejs10-alpine, nodejs12-slim, nodejs12-alpine without impact), and memory limits equally do not impact results locally or on GCP (256m works fine locally; 2Gi fails in GCP).
Google Issue at: https://issuetracker.google.com/147139116
Self-inflicted wound. Google-provided code seeks to be asynchronous and do work in the background. Cloud Run and Cloud Functions do not support that model (for now at least). Move to promise-chaining and all of a sudden it works like it should -- so long as the CPU keeps the attention it needs. Limits what we can do with CR / CF, but hopefully that too will evolve.

Picture Manipulation with AWS

I've been a mobile developer for a few years, and Im looking to expand to cloud integration with my apps. Im looking into AWS solutions to fill this need. I don't know a ton about servers or cloud capabilities, so I'm trying to get pointed in the right direction, and maybe be introduced to some good resources.
My goal is to be able to upload some images to AWS and manipulate these images in the cloud. I'm sure that I'll need S3 to store my images, but is an EC2 instance the correct thing to use to perform the manipulation? This is where my lack of knowledge of servers is holding me back.
I think that the best answer I could get would be a comment on whether my needs from AWS are what I listed above, and a point in the right direction towards articles to tutorials of how to get things up and running.
Thanks much for the help!
What I ended up doing was using AWS Lambda to accomplish what I needed. Running a node.js based lambda function with ffmpeg-like manipulation on the images/media that I was uploading worked out quite well.
Side note :: The processing that I was doing was fairly lightweight, so it worked well with lambda. If things scale up any further I might consider switching the processing to an EC2 instance.