Can I define the region Google Cloud Container Builder runs in? - google-container-registry

My Google Cloud Container Builder steps touch information that should not leave the European Union. Is it possible to restrict the region for a build so that it won't be executed in us-central1, for example?
I know I can define the region where the resulting images are stored, but that does not mean the processing itself happens within the EU, or am I wrong?

You are correct that you cannot at this time define the region where the processing happens. However, since all processing of your build is ephemeral, data lives in the arbitrarily-selected processing region only for the brief lifetime of your build.
If you believe you need more control over regionality than this ephemeral lifespan offers, please make this a formal feature request by sending details regarding your use case to gcr-contact@google.com.

Related

Is the content on disk in the cloud (Azure, AWS) zeroized prior to re-releasing it to other users?

I wanted to know whether cloud-based platforms such as Azure and Amazon zeroize the content on the hard disk whenever an 'instance' is 'deleted', before making it available to other users.
I've tried using the 'dd' command on an Amazon Lightsail instance, and it appears that the raw data is indeed zeroized. However, I was not sure whether that was by chance (I just tried a few random lengths) or whether they actually take care to do that.
The concern is that if I leave passwords in configuration files, someone who comes along later could (theoretically) read them. The same goes for data in a database.
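For reference, the spot check I did with dd can also be scripted. A rough Python sketch of such a check is below; the device path, offset, and sample size are assumptions that would need adjusting per instance:

    # Scan a region of the raw block device and count non-zero bytes.
    # Assumes the device path below is correct and the script runs with root privileges.
    DEVICE = "/dev/xvda"           # assumption: adjust to your instance's block device
    OFFSET = 10 * 1024**3          # assumption: start sampling 10 GiB into the device
    LENGTH = 256 * 1024**2         # sample 256 MiB in total
    CHUNK = 4 * 1024**2            # read 4 MiB at a time

    nonzero = 0
    with open(DEVICE, "rb", buffering=0) as dev:
        dev.seek(OFFSET)
        remaining = LENGTH
        while remaining > 0:
            block = dev.read(min(CHUNK, remaining))
            if not block:
                break
            nonzero += len(block) - block.count(0)
            remaining -= len(block)

    print(f"non-zero bytes in sampled region: {nonzero}")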
Generically, the way Azure typically addresses this concern is storage encryption.
Your data is encrypted by default at the platform level with a key specific to your subscription; when the data or resource is removed, whether or not the storage is zeroed, it is effectively inaccessible to a resource deployed on the same storage in another subscription.
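If you want to verify that for yourself (or show it to an auditor), you can read the encryption configuration back programmatically. Here is a rough Python sketch using the azure-mgmt-storage management client; the subscription ID, resource group, and account name are placeholders:

    # Read back the encryption settings of a storage account.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient

    SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"   # placeholder

    client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
    account = client.storage_accounts.get_properties("my-resource-group", "mystorageaccount")

    enc = account.encryption
    print("blob service encryption enabled:", enc.services.blob.enabled)
    print("key source:", enc.key_source)   # e.g. Microsoft.Storage (platform-managed keys)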

GCP cloud functions, how to trigger everything in a bucket?

I'm playing around with Google Cloud Functions. My first conclusion: they are really perfect! I created a function that is triggered by a modification to a document stored in a bucket (or by a new upload). This works fine.
But then I started to think: what if I want all files already inside the bucket to be run through a NEW function? The previous functions have already been run against all files, so I would prefer to run only the NEW function against all documents.
How do you guys do this? So basically my questions are:
How do you keep track of what functions are already applied to the files?
How do you trigger all files to re-apply all functions?
How do you trigger all files for just ONE (new) function?
How do you keep track of what functions are already applied to the files?
Cloud Functions trigger on events. Once an event fires, a Cloud Function is called (if set up to do so). Nothing within GCP keeps track of this except for Stackdriver. Your functions will need to keep track of their own actions, including which object they were triggered for.
How do you trigger all files to re-apply all functions?
There is no command or feature to trigger a function for all files. You will need to implement this feature yourself.
How do you trigger all files for just ONE (new) function?
There is no command or feature to trigger all existing files against just one new function either. You will need to implement this yourself.
Depending on the architecture you are trying to implement, most people use a database such as Cloud Datastore to track the objects in a bucket, the transformations that have been applied, and the results.
Using a database will allow you to accomplish your goals, but with some effort.
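A minimal sketch of that bookkeeping approach, assuming the Python clients for Cloud Storage and Cloud Datastore; the bucket name, Datastore kind, and function identifier are placeholders:

    # Track which processing function has handled each object, and back-fill
    # only the objects that the new function has not seen yet.
    from google.cloud import datastore, storage

    BUCKET = "my-documents-bucket"       # placeholder bucket name
    FUNCTION_NAME = "new-function"       # placeholder identifier of the new processing step

    ds = datastore.Client()
    gcs = storage.Client()

    def already_processed(blob_name: str) -> bool:
        key = ds.key("ProcessedObject", f"{FUNCTION_NAME}:{blob_name}")
        return ds.get(key) is not None

    def mark_processed(blob_name: str) -> None:
        key = ds.key("ProcessedObject", f"{FUNCTION_NAME}:{blob_name}")
        entity = datastore.Entity(key=key)
        entity["blob"] = blob_name
        entity["function"] = FUNCTION_NAME
        ds.put(entity)

    def process(blob) -> None:
        # Placeholder: call the new function's logic (or its HTTP endpoint) here.
        print(f"processing {blob.name}")

    def backfill() -> None:
        for blob in gcs.list_blobs(BUCKET):
            if already_processed(blob.name):
                continue
            process(blob)
            mark_processed(blob.name)

    if __name__ == "__main__":
        backfill()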
Keep in mind that a Cloud Function times out after running for 540 seconds (9 minutes). This means that if you have millions of files, you will need an overlapping strategy for processing that many objects.
For cases where I need to process millions of objects, I usually launch App Engine Flexible or Compute Engine instances to complete the large task and then shut them down once it is finished. The primary reason is the very high bandwidth to Cloud Storage and Datastore.

Does SNS retain my data?

I am evaluating push notification services and cannot use services on the cloud as laws prohibit customer identification data being stored off-premise.
Question
Is there any chance data will be stored off-premise if I use the AWS SNS API (not the console) to send push notifications to end-user devices from code hosted on-premise (using the AWS SDK)? In other words, will SNS retain my data, or will it forget it right after it sends the notification?
What have I tried so far?
I combed through the documentation as much as I could, but couldn't find anything that made me 100% sure.
Would appreciate any pointers on this. TIA.
I would pose this question directly to AWS, as it pertains to a legal requirement. I would clarify whether the laws you need to comply with relate to data at rest, data in transit, or both, and whether there are any circumstances in which one or both would be acceptable provided certain security requirements are met.
Knowing no real detail about your use case, I will say that AWS has a Region specifically for use by the US Government. If your solution is for the US Government, then you should be making use of this Region, as it ticks off a lot of compliance forms for you well in advance.
You can open a support ticket in the AWS console.
Again if there is a legal requirement for your data I thoroughly recommend that you ask AWS directly so that you may reference their answer in writing in the future.
Even if they didn't store it, how can you prove that to auditors?
Besides, what is the difference between storing something in memory (which they obviously have to do) and storing something on disk? One is volatile and the other isn't I guess. But from a compliance point of view, an admin on the box can get both, so who cares if the hardware with your data on it is a stick of RAM or a disk plugged into a SATA port?

In the WebJobs SDK, how to bind an additional CloudStorageAccount for Blob output?

The WebJobs SDK does a wonderful job of simplifying the amount of code you need to write to save blobs to storage, but only within ONE storage account, the default AzureJobsStorage.
Having everything (Queues, Blobs, Tables, and Heartbeats) in one storage account will throttle that account in a medium-load production environment.
Of course, I can write legacy WindowsAzure.Storage code to save blobs to the desired storage account, but then I lose the simplicity of the WebJobs SDK.
Appreciate any suggestions or advice.
Today, the WebJobs SDK supports only two Storage accounts per host:
AzureWebJobsStorage - used for your app's data
AzureWebJobsDashboard - used for logging (heartbeats, functions, etc) and dashboard indexing
The two accounts can be different if you want but that's all the separation you can do for now.
We have an item on the backlog to support multiple storage accounts for data but there is no ETA for it.
This is somewhat of a hack around the limitation, but let's say you want specific jobs associated with storage accounts (instead of one job accessing and writing to different storage accounts). You could open two different job hosts with different configs, but also create your own TypeLocator to filter which jobs are associated with specific hosts.

Amazon EC2 scaling and upload temporary folder

I have a PHP-based application on one Amazon instance for uploading and transcoding audio files. The application first uploads the file, then transcodes it, and finally puts it in an S3 bucket. At the moment the application shows the progress of file uploading and transcoding through repeated AJAX requests that monitor the file size in a temporary folder.
I keep wondering what happens if users rush to my service tomorrow and I need to scale it in AWS by whatever means possible.
A: What will happen to my upload and transcoding technique?
B: If I add more instances, does that mean I will have different files in different temporary conversion folders in different physical places?
C: If I want to get the file size by AJAX from http://www.example.com/filesize until the process finishes, do I need the real address of each EC2 instance (I mean IP/DNS), or of all the instances' folders (or a single folder)?
D: When we scale, what happens to the temporary folder? Is it correct that all of the instances, apart from their own LAMP stacks, point to one root folder on the main instance?
I have some basic knowledge of scaling with other hosting techniques, but with Amazon these questions are on my mind.
Thanks for any advice.
It is difficult to answer your questions without knowing considerably more about your application architecture, but given that you're using temporary files, here's a guess:
A: Your ability to scale depends entirely on your architecture and, of course, on having a wallet deep enough to pay for it.
B: Yes. If you're generating temporary files on individual machines, they won't be stored in a shared place the way you currently describe it.
C: Yes. You need some way to know where the files are stored. You might be able to get around this with an ELB stickiness policy (i.e. traffic through the ELB gets routed to the same instances), but they are kind of a pain and won't necessarily solve your problem.
D: Not quite sure what the question is here.
As it sounds like you're in the early days of your application, give this tutorial and this tutorial a peek. The first one describes a thumbnailing service built on Amazon SQS, the second a video processing one. They'll help you design with best AWS practices in mind, and help you avoid many of the issues you're worried about now.
One way you could get around scaling and session stickiness is to have the transcoding update a database with the current progress. Any user returning checks the database to see the progress of their upload. No need to keep track of where the transcoding is taking place since the progress gets stored in a single place.
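A rough sketch of that idea in Python, assuming a shared DynamoDB table keyed by an upload ID (the table and attribute names are placeholders, not anything your application already has):

    # Shared progress tracking: workers write progress, the web tier reads it,
    # so it no longer matters which instance is doing the transcoding.
    import boto3

    table = boto3.resource("dynamodb").Table("transcode-progress")  # placeholder table name

    def report_progress(upload_id: str, percent: int, stage: str) -> None:
        # Called by the instance that is uploading/transcoding the file.
        table.put_item(Item={
            "upload_id": upload_id,
            "percent": percent,
            "stage": stage,          # e.g. "uploading" or "transcoding"
        })

    def get_progress(upload_id: str) -> dict:
        # Called by whatever endpoint the browser polls via AJAX.
        response = table.get_item(Key={"upload_id": upload_id})
        return response.get("Item", {"upload_id": upload_id, "percent": 0, "stage": "unknown"})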
However, like Christopher said, we don't really know anything about your application; any advice we give is really looking from the outside in, and we don't have a good idea of what would be the easiest thing for you to do. This seems like a pretty simple solution, but I could be missing something because I don't know anything about your application or architecture.