I am new to Snowflake and I have two questions regarding Snowflake on AWS.
I registered for a free Snowflake account and it gave me a link to access its web UI. From the web UI I could then create a stage using my existing AWS S3 bucket. However, after loading the data I am not sure where Snowflake stores it. Can I access its file system? Can I change its file system to my existing AWS S3?
While registering for Snowflake on AWS, I went to the AWS Marketplace and subscribed to a Snowflake account, and it gave me a Snowflake web UI. Do I need to do anything else to deploy Snowflake on AWS?
The data you imported from S3 into Snowflake now resides in a logical database table. Snowflake stores that data in its own S3 buckets. The storage format is proprietary, and a single bucket in that abstract storage layer may contain data from multiple customers. The data is encrypted, and in the end Snowflake probably doesn't even know, for example, which disk your data is on; they are S3 users like everyone else.
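To make the distinction concrete, here is a minimal sketch (Python with the snowflake-connector-python package; the account, credentials, bucket, and table names are placeholders) of the pattern described in the question: the stage only points at your existing bucket, while COPY INTO copies the data into Snowflake's own managed storage.

```python
# The stage references YOUR bucket; the loaded table lives in Snowflake-managed
# storage. All identifiers and credentials below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",   # placeholder account locator
    user="MY_USER",
    password="MY_PASSWORD",
    warehouse="COMPUTE_WH",
    database="MY_DB",
    schema="PUBLIC",
)
cur = conn.cursor()

# External stage pointing at your existing S3 bucket (a pointer, not storage).
cur.execute("""
    CREATE STAGE IF NOT EXISTS my_s3_stage
    URL = 's3://my-existing-bucket/exports/'
    CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
""")

# Loading copies the files' contents into Snowflake's own storage layer.
cur.execute("COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = CSV)")
```

If you need the data to stay in your own bucket, Snowflake's external tables can query staged files in place, but any regular table you load into always lives in Snowflake-managed storage.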
You can do almost anything from the GUI, but the GUI doesn't provide a proper archive for code and object history etc. Snowflake has recently acquired a company with a development tool, so maybe something more than the GUI is on the way.
Currently, we use AWS IAM User permanent credentials to transfer customers' data from our company's internal AWS S3 buckets to customers' Google BigQuery tables, following the BigQuery Data Transfer Service documentation.
Using permanent credentials poses security risks for the data stored in AWS S3.
We would like to use AWS IAM Role temporary credentials, which require support for a session token on the BigQuery side in order to get authorised on the AWS side.
Is there a way that the BigQuery Data Transfer Service can use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
We considered the Omni framework (https://cloud.google.com/bigquery/docs/omni-aws-cross-cloud-transfer) to transfer data from S3 to BQ; however, we faced several concerns/limitations:
The Omni framework targets data-analysis use cases rather than data transfer from external services, which makes us concerned that its design may have drawbacks for data transfer at high scale.
The Omni framework currently supports only the AWS-US-EAST-1 region (we require support at least in AWS-US-WEST-2 and AWS-EU-CENTRAL-1 and the corresponding Google regions). This is not backward compatible with our current customers' setup for transferring data from internal S3 to customers' BQ.
Our current customers would need to sign up for the Omni service to properly migrate from the current transfer solution we use.
We considered a workaround of exporting data from S3 through staging in GCS (i.e. S3 -> GCS -> BQ), but this would also require a lot of effort from both the customers' and our company's sides to migrate to the new solution.
Is there a way that the BigQuery Data Transfer Service can use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
No, unfortunately.
The official Google BigQuery Data Transfer Service documentation only mentions AWS access keys throughout:
The access key ID and secret access key are used to access the Amazon S3 data on your behalf. As a best practice, create a unique access key ID and secret access key specifically for Amazon S3 transfers to give minimal access to the BigQuery Data Transfer Service. For information on managing your access keys, see the AWS general reference documentation.
The irony of the Google documentation is that while it refers to best practices and links to the official AWS docs, it doesn't actually endorse best practices and ignores what AWS says:
We recommend that you use temporary access keys over long term access keys, as mentioned in the previous section.
Important
Unless there is no other option, we strongly recommend that you don't create long-term access keys for your (root) user. If a malicious user gains access to your (root) user access keys, they can completely take over your account.
You have a few options:
hook into both sides manually (i.e. link up various SDKs and/or APIs; a rough sketch follows after this list)
find an alternative BigQuery-compatible service that supports this
accept the risk of long-term access keys.
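For the first option, here is a rough sketch, assuming Python with boto3 and google-cloud-bigquery: assume an IAM role to get short-lived credentials (including the session token the Data Transfer Service cannot accept), pull the object from S3, and load it into BigQuery yourself. The role ARN, bucket, file, and table names are placeholders.

```python
import boto3
from google.cloud import bigquery

# 1. Temporary credentials (access key + secret + session token) via STS.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/bq-transfer-role",
    RoleSessionName="bq-transfer",
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],  # the token DTS cannot take
)

# 2. Pull the object down, then push it into BigQuery with a load job.
s3.download_file("my-internal-bucket", "exports/data.csv", "/tmp/data.csv")

bq = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
)
with open("/tmp/data.csv", "rb") as f:
    bq.load_table_from_file(
        f, "my_project.my_dataset.my_table", job_config=job_config
    ).result()
```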
In conclusion, Google is at fault here for not following security best practices, and you - as a consumer - will have to bear the risk.
I am using a RESTful API; the API provider has images in an S3 bucket of more than 80 GB in size.
I need to download these images and upload them to my AWS S3 bucket, which is a time-consuming job.
Is there any way to copy the images from the API to my S3 bucket instead of downloading and re-uploading them?
I talked with the API support team; they said that I am given the image URLs, so it is up to me how I handle them.
I am using Laravel.
Is there a way to take the source image URLs and move the images directly to S3, instead of downloading them first and uploading them again?
Thanks
I think downloading and re-uploading to a different account would be inefficient and pricey for the API provider. Instead, I would talk to the respective API provider and try to replicate the images across accounts.
After replication, you can use Amazon S3 Inventory for various information related to the objects in the bucket.
Configuring replication when the source and destination buckets are owned by different accounts
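For illustration, a hedged boto3 sketch of what such a cross-account replication setup might look like on the provider's side (only the source-bucket owner can configure it; both buckets need versioning enabled, the destination account needs a bucket policy that trusts the replication role, and all names, account IDs, and ARNs below are placeholders):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="provider-images-bucket",  # the API provider's source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111111111111:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-images-to-customer",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "images/"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::my-destination-bucket",
                    "Account": "222222222222",
                    # Hand object ownership to the destination account.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    },
)
```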
You want "S3 Batch Operations". Search for "xcopy".
You do not say how many images you have, but 1000 at 80 GB each is 80 TB, and at that size you would not even want to download them file by file to a temporary EC2 instance in the same region, which might otherwise be a one- or two-day option; you will still pay for ingress/egress.
I am sure AWS will do this in an ad-hoc manner for a price, as they would do if you were migrating from the platform.
It may also be easier to allow access to the original bucket from the alternative account, but that is not the question.
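If the provider does grant your account read access to their bucket, an S3 Batch Operations copy job along those lines could be created with boto3 roughly as follows; the account IDs, role ARN, manifest location, and ETag are placeholders, and the manifest CSV listing the source objects has to be prepared beforehand.

```python
import boto3

s3control = boto3.client("s3control", region_name="us-east-1")

s3control.create_job(
    AccountId="222222222222",          # your AWS account ID
    ConfirmationRequired=False,
    RoleArn="arn:aws:iam::222222222222:role/batch-copy-role",
    Priority=10,
    Operation={
        "S3PutObjectCopy": {
            # Objects are copied server-side into this bucket.
            "TargetResource": "arn:aws:s3:::my-destination-bucket",
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::my-destination-bucket/manifest.csv",
            "ETag": "example-etag",    # ETag of the uploaded manifest object
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::my-destination-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
)
```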
I can't find some information about Amazon S3; I hope you can help me. When is a file available for the user to download after a POST upload? I mean a small JSON file that doesn't require much processing. Is it available for download immediately after uploading? Or does Amazon S3 work in some kind of sessions so that it always takes a few hours?
According to the doc,
Amazon S3 provides strong read-after-write consistency for PUTs and DELETEs of objects in your Amazon S3 bucket in all AWS Regions.
This means that your objects are available for download immediately after they are uploaded.
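A minimal boto3 illustration (bucket and key names are placeholders): the GET issued immediately after the PUT already returns the object.

```python
import json
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-example-bucket",
    Key="uploads/config.json",
    Body=json.dumps({"hello": "world"}).encode("utf-8"),
    ContentType="application/json",
)

# No waiting or "session" is needed; the object is readable right away.
obj = s3.get_object(Bucket="my-example-bucket", Key="uploads/config.json")
print(obj["Body"].read().decode("utf-8"))   # {"hello": "world"}
```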
An object that is uploaded to an Amazon S3 bucket is available right away. There is no time period that you have to wait. That means if you are writing a client app that uses these objects, you can access them as soon as they are uploaded.
In case anyone is wondering how to interact with objects located in an Amazon S3 bucket programmatically, here is an example of uploading and reading objects in an Amazon S3 bucket from a client web app:
Creating an example AWS photo analyzer application using the AWS SDK for Java
My question is about how I can set up my Cloud Storage bucket to retrieve data from my Campaign Manager account. I aim to process Campaign Manager report data in BigQuery, combining it with other data sources.
In the documentation, it seems that this is possible with the Data Transfer utility, but I first need to store the data files in a Cloud Storage bucket; then it will be possible to use the Data Transfer Service to get the data into BigQuery.
So how can I get Campaign Manager data into Google Cloud Storage?
Have you already tried following this documentation to set up the BigQuery Data Transfer Service for Campaign Manager? In the Before you begin section, you'll need to contact either your Campaign Manager reseller or Campaign Manager support to set up the Campaign Manager DTv2 files.
After completing this step, you will receive a Cloud Storage bucket name similar to the following: dcdt_-dcm_account123456
After doing this, you may now complete the rest of the documentation.
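For reference, once the bucket has been provisioned, the transfer configuration can also be created programmatically with the BigQuery Data Transfer Python client. The sketch below is hedged: the data source ID ("dcm_dt") and the parameter names ("bucket", "network_id") are my assumptions and should be verified against list_data_sources() for your project before use.

```python
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="campaign_manager_reports",  # your BigQuery dataset
    display_name="Campaign Manager transfer",
    data_source_id="dcm_dt",                 # assumed ID for Campaign Manager
    params={
        "bucket": "dcdt_-dcm_account123456",  # bucket name given by CM support
        "network_id": "123456",               # assumed: your CM account/network ID
    },
)

transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path("my-gcp-project"),
    transfer_config=transfer_config,
)
print(f"Created transfer config: {transfer_config.name}")
```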
I'm planning to create a small public AMI for converting text files to PDF (I find nothing satisfying on the store) and I have an issue.
I understand that an AMI is nothing more than a frozen copy of software that I've run successfully on another machine.
However, I have an issue: how do I create an S3 bucket when the AMI is launched?
Use case:
A user comes to the store, finds the idea of my service cool, and launches an instance.
The instance needs to create an S3 bucket to save the converted files (and maybe the source files as well), and it has to be one bucket per user, not one big bucket for all the files converted via the software.
I have a few questions about that:
Is it possible to achieve this (was it designed for this)?
How should I create the bucket: is there a point-and-click interface at AMI setup, or do I need to do it via the AWS SDK?
If I need to do it via the SDK, is there a way to access the user's credentials (or some random token) so that I can create a bucket successfully?
Am I wrong: should all the files be saved on EBS and made available via an nginx on the AMI (and not use S3 at all)?
Oh, and sorry if this question seems silly, but I'm very new to this cool AWS tech!
Thanks!
You can achieve this using AWS CloudFormation.
The steps would be to first create an AMI and then write a CloudFormation template that:
Creates an instance from that AMI
Creates the S3 bucket for that user
Maps the newly created EC2 instance to the S3 bucket (since S3 bucket names are globally unique, you might not get the name you want). A minimal sketch follows below.
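As a minimal sketch of that template, assuming the AMI ID below exists in your account and using boto3 to launch one stack per user (CloudFormation generates a unique bucket name to avoid collisions):

```python
import json
import boto3

# One stack per user: an EC2 instance from the AMI plus a dedicated bucket.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "ConverterInstance": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": "ami-0123456789abcdef0",  # placeholder AMI ID
                "InstanceType": "t3.micro",
            },
        },
        "ConverterBucket": {
            "Type": "AWS::S3::Bucket"  # name left to CloudFormation to avoid collisions
        },
    },
    "Outputs": {
        "BucketName": {"Value": {"Ref": "ConverterBucket"}}
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="text-to-pdf-user-42",   # one stack per user/tenant
    TemplateBody=json.dumps(template),
)
```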
However, note that services in the cloud are not installed for each customer like a traditional on-premise system.
Here you have the concept of tenants. Every new customer would be a tenant and should be served from the same infrastructure. Basically, when a new customer comes in, you onboard them as a tenant and possibly create a folder for them within the already created S3 bucket, where you store their artifacts. Or, if for some justified business reason you want a separate S3 bucket for each tenant, then even that new bucket should be created during tenant onboarding. Store the mapping of tenant to S3 folder/bucket somewhere so that you know where each tenant's artifacts are stored.
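A small sketch of that onboarding step under the shared-bucket, folder-per-tenant variant, assuming a hypothetical DynamoDB table named TenantMappings for the tenant-to-prefix mapping:

```python
import boto3

SHARED_BUCKET = "text-to-pdf-artifacts"        # placeholder shared bucket
s3 = boto3.client("s3")
tenants = boto3.resource("dynamodb").Table("TenantMappings")  # hypothetical table

def onboard_tenant(tenant_id: str) -> str:
    """Create the tenant's prefix in the shared bucket and record the mapping."""
    prefix = f"tenants/{tenant_id}/"
    # S3 has no real folders; a zero-byte marker object is enough to create the prefix.
    s3.put_object(Bucket=SHARED_BUCKET, Key=prefix)
    tenants.put_item(
        Item={"tenant_id": tenant_id, "bucket": SHARED_BUCKET, "prefix": prefix}
    )
    return prefix

# Later, converted files for that tenant go under their prefix, e.g.:
# s3.put_object(Bucket=SHARED_BUCKET, Key=prefix + "output.pdf", Body=pdf_bytes)
```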