Store data from online JSON into Amazon aws (rds or dynamo) - amazon-web-services

I have some JSON data and I want to record it in my Amazon DB. Is there any way to do it? I researched some options but nothing helped.

With PostgreSQL on RDS you can take advantage of JSON-specific data types and functions:
https://aws.amazon.com/rds/postgresql/
http://www.postgresql.org/docs/9.3/static/functions-json.html
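As a minimal sketch, a JSON document can be inserted into a `jsonb` column with `psycopg2`. The table name (`events`), column name (`payload`), and connection details below are assumptions for illustration, not part of the original answer:

```python
import json

# Hypothetical record fetched from an online JSON source
record = {"device": "sensor-1", "reading": 23.5, "tags": ["indoor", "celsius"]}

def to_jsonb_param(obj):
    """Serialize a Python object to the JSON string passed to a jsonb column."""
    return json.dumps(obj)

def store_record(conn, obj):
    """Insert one JSON document into an assumed 'events(payload jsonb)' table."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO events (payload) VALUES (%s::jsonb)",
            (to_jsonb_param(obj),),
        )
    conn.commit()

if __name__ == "__main__":
    import psycopg2  # third-party driver: pip install psycopg2-binary
    conn = psycopg2.connect(
        host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder RDS endpoint
        dbname="mydb", user="admin", password="secret",
    )
    store_record(conn, record)
```

Once stored, the document can be queried server-side with operators like `payload->>'device'` from the JSON functions page linked above.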

Related

Build s3 Datalake Using Dynamo DB data source

I'm a data engineer using AWS. We want to build a data pipeline in order to visualise our DynamoDB data in QuickSight. As you know, it's not possible to connect DynamoDB directly to QuickSight; you have to go through S3.
S3 will be our data lake. The issue is that the data updates frequently (for example, a column name can change, or a customer status can evolve).
So I'm looking for a batch solution that always gets the latest data from DynamoDB into my S3 data lake and visualises it in QuickSight.
Thank you.
You can access your tables in the DynamoDB console and export data to S3 under the Streams and Exports tab. This blog post from AWS explains just what you need.
You could also try this approach with Athena instead of S3.
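For a scheduled batch job, the same export can be triggered programmatically with boto3's `export_table_to_point_in_time` (point-in-time recovery must be enabled on the table first). The table ARN, bucket name, and prefix below are placeholders:

```python
def build_export_request(table_arn, bucket, prefix):
    """Parameters for a DynamoDB point-in-time export to S3 (names are placeholders)."""
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "ExportFormat": "DYNAMODB_JSON",
    }

if __name__ == "__main__":
    import boto3  # AWS SDK for Python
    dynamodb = boto3.client("dynamodb", region_name="us-east-1")
    # Requires PITR enabled on the table; the export runs asynchronously.
    response = dynamodb.export_table_to_point_in_time(
        **build_export_request(
            "arn:aws:dynamodb:us-east-1:123456789012:table/Customers",  # assumed ARN
            "my-datalake-bucket",   # assumed S3 bucket
            "dynamodb-exports/",
        )
    )
    print(response["ExportDescription"]["ExportStatus"])
```

Scheduling this in a daily Lambda (e.g. via an EventBridge rule) keeps the S3 data lake refreshed for QuickSight.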

AWS quicksight and influx db

Hi, I have InfluxDB installed on an AWS EC2 instance and want to show its data in AWS QuickSight. I don't see InfluxDB in QuickSight's predefined data source list. Is it possible to create a data source for InfluxDB and show its data in a QuickSight view? How can I define a custom data source for InfluxDB?
As far as I know, InfluxDB's compatibility with Grafana is good, so I'm not sure whether I will be able to get the data into a QuickSight view. Please let me know if you are aware of how I can achieve this.
Thanks.
Currently, AWS QuickSight doesn't support showing metrics from InfluxDB.
To visualize your metrics in AWS QuickSight, follow these steps:
Collect data from InfluxDB and store it in S3, the Glue Catalog, Athena, or a relational database.
Show your data in QuickSight from the stored data.
See the official documentation for more detailed information:
https://docs.aws.amazon.com/quicksight/latest/user/welcome.html
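A minimal batch sketch of those two steps, assuming the `influxdb-client` Python package and a Flux query; the InfluxDB endpoint, token, org, bucket names, and the S3 bucket are all placeholders:

```python
import csv
import io

def rows_to_csv(rows, header):
    """Render query results as CSV text, ready to upload to S3 for QuickSight."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    import boto3
    from influxdb_client import InfluxDBClient  # pip install influxdb-client
    client = InfluxDBClient(url="http://ec2-host:8086",  # assumed EC2 endpoint
                            token="my-token", org="my-org")
    tables = client.query_api().query('from(bucket:"metrics") |> range(start: -1d)')
    rows = [(r.get_time().isoformat(), r.get_field(), r.get_value())
            for t in tables for r in t.records]
    # Upload the CSV to S3, where QuickSight (directly or via Athena) can read it.
    boto3.client("s3").put_object(Bucket="my-quicksight-bucket",  # assumed bucket
                                  Key="influx/metrics.csv",
                                  Body=rows_to_csv(rows, ["time", "field", "value"]).encode())
```

Run on a schedule (cron on the EC2 instance, or a Lambda), this gives QuickSight a refreshable S3/Athena data source.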
Since it is not yet a supported database, it is worth using the Send feedback option in QuickSight to ask whether it is on the roadmap.

AWS Timestream DB - AWS IOT

I am building out a simple sensor which sends five telemetry values to AWS IoT Core. I am torn between AWS Timestream and Elasticsearch for storing this telemetry.
For now I am experimenting with Timestream and wanted to know: is this the right choice? Any expert suggestions?
Secondly, I want to store the DB records forever, as they will feed into my machine learning predictions in the future. Does Timestream delete records after a while, or is it possible to never delete them?
I will be creating a custom web page to show this telemetry per tenant. Any help with how I can do this? Should I query the Timestream DB directly over the API, or should I back it up in another DB like DynamoDB, etc.?
Your help will be greatly appreciated. Thank you.
For now I am experimenting with Timestream and wanted to know is this the right choice? Any expert suggestions.
I would not call myself an expert but Timestream DB looks like a sound solution for telemetry data. I think ElasticSearch would be overkill if each of your telemetry data is some numeric value. If your telemetry data is more complex (e.g. JSON objects with many keys) or you would benefit from full-text search, ElasticSearch would be the better choice. Timestream DB is probably also easier and cheaper to manage.
Secondly, I want to store the DB records forever, as they will feed into my machine learning predictions in the future. Does Timestream delete records after a while, or is it possible to never delete them?
It looks like retention is limited: the magnetic store allows up to 200 years at most. You might be able to increase that by contacting AWS support, but I doubt they will allow infinite retention.
We use Amazon Kinesis Data Firehose with AWS Glue to store our sensor data on AWS S3. When we need to access the data for analysis, we use AWS Athena to query the data on S3.
I will be creating a custom web page to show this telemetry per tenant. Any help with how I can do this? Should I query the Timestream DB directly over the API, or should I back it up in another DB like DynamoDB, etc.?
It depends on how dynamic and complex the queries you want to display are. I would start by querying Timestream directly and introduce DynamoDB where it makes sense to optimize cost.
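Querying Timestream directly from a web backend can be sketched with boto3's `timestream-query` client. The database, table, and column names in the query are assumptions for illustration:

```python
def flatten_rows(query_response):
    """Flatten Timestream's nested Rows/Data response into plain value lists."""
    return [[d.get("ScalarValue") for d in row["Data"]]
            for row in query_response.get("Rows", [])]

if __name__ == "__main__":
    import boto3
    ts = boto3.client("timestream-query", region_name="us-east-1")
    # Database, table, and tenant column names are placeholders.
    resp = ts.query(QueryString="""
        SELECT time, measure_name, measure_value::double
        FROM "telemetry_db"."sensor_table"
        WHERE tenant_id = 'tenant-a' AND time > ago(1h)
        ORDER BY time DESC
    """)
    for row in flatten_rows(resp):
        print(row)
```

Filtering by a tenant dimension in the `WHERE` clause, as sketched here, is one way to serve a per-tenant page before any DynamoDB caching layer is needed.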
Based on your approach, a "simple sensor which sends out 5 telemetry data to AWS IoT Core", Timestream is the way to go: a fairly simple and cheap solution for simple telemetry data.
The magnetic storage retention (up to 200 years) is more than you will ever need.

Storing data into AWS Aurora Mysql through Lambda?

I want to run a Lambda once daily which queries a third-party API; the returned data needs to be stored in an Aurora MySQL database. Is it even possible to save data in Aurora directly through Lambda? After reading the docs, the only way I found is via text files saved in S3, using the LOAD DATA FROM S3 statement, and not through an AWS SDK or API call. Is that so, or am I missing something? Is there any way I can achieve Lambda → Aurora through SDK or API calls?
Thanks in advance. Pardon me if I sound silly.
Take a look at RDS Proxy: https://aws.amazon.com/rds/proxy/. It pools and shares database connections, which lets a Lambda function connect to Aurora with a standard MySQL client without exhausting connections.
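A sketch of such a Lambda handler, assuming the `pymysql` driver is bundled in the deployment package and the endpoint (Aurora or RDS Proxy), credentials, API URL, and table schema are all placeholders:

```python
import os

def build_insert(table, columns):
    """Build a parameterized INSERT statement (table/columns are placeholders)."""
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

def lambda_handler(event, context):
    import json
    import urllib.request
    import pymysql  # bundle in the deployment package: pip install pymysql

    # Fetch the third-party API (URL is an assumption)
    with urllib.request.urlopen("https://api.example.com/daily") as resp:
        items = json.load(resp)

    conn = pymysql.connect(
        host=os.environ["DB_HOST"],  # Aurora cluster or RDS Proxy endpoint
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASS"],
        database="mydb",
    )
    with conn.cursor() as cur:
        sql = build_insert("api_results", ["name", "value"])  # assumed schema
        cur.executemany(sql, [(i["name"], i["value"]) for i in items])
    conn.commit()
```

The Lambda must run in (or have network access to) the database's VPC; a daily EventBridge schedule then triggers it. LOAD DATA FROM S3 is only needed for bulk file loads, not for SDK-driven inserts like this.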

Confusions related to Redshift about dataset (Structured, Unstructured, Semi-structured) and format to be used

Can anyone explain clearly what kind of data Redshift can handle (structured, unstructured, or which formats)?
How can I copy CloudFront logs into Amazon Redshift, even though the logs are unstructured, without going through Amazon EMR?
How can I find the size of a database created in Amazon Redshift?
Please explain all three questions mentioned above. It would be better if you could explain with an example, sample code, or a source; it would be very helpful for my project.
Amazon Redshift provides a standard SQL interface (based on PostgreSQL). Therefore, it is best suited for structured data that is stored in Tables, Rows and Columns.
It is also possible to store JSON records within a field and access them via JSON functions.
To load data into Amazon Redshift, it needs to be in a delimited file format, such as comma delimited, tab delimited, fixed-length fields or JSON format. Any data that is not in a suitable format will need to be pre-processed and converted to a suitable format. This could be done with tools such as Amazon Athena (Presto) or Amazon EMR (Hadoop).
Amazon CloudFront logs are in Tab-Delimited format and can be loaded directly into Amazon Redshift. For an example, see: Analyzing S3 and CloudFront Access Logs with AWS Redshift
Information about disk space consumed by tables can be obtained via the SVV_DISKUSAGE system view.
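Since Redshift speaks the PostgreSQL wire protocol, SVV_DISKUSAGE can be queried with `psycopg2`. Each row in SVV_DISKUSAGE represents one 1 MB disk block, so counting rows per table gives size in megabytes. The cluster endpoint and credentials below are placeholders:

```python
# Each SVV_DISKUSAGE row is a 1 MB block, so COUNT(*) per table is size in MB.
DISK_USAGE_SQL = """
    SELECT TRIM(name) AS table_name, COUNT(*) AS size_mb
    FROM svv_diskusage
    GROUP BY name
    ORDER BY size_mb DESC
"""

def total_mb(rows):
    """Sum per-table sizes (rows of (table_name, size_mb)) into a database total."""
    return sum(mb for _, mb in rows)

if __name__ == "__main__":
    import psycopg2  # Redshift is compatible with the PostgreSQL driver
    conn = psycopg2.connect(
        host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder
        port=5439, dbname="dev", user="admin", password="secret",
    )
    with conn.cursor() as cur:
        cur.execute(DISK_USAGE_SQL)
        rows = cur.fetchall()
    for name, mb in rows[:10]:
        print(f"{name}: {mb} MB")
    print("total:", total_mb(rows), "MB")
```

The same query works from the Redshift query editor; the Python wrapper just makes it scriptable.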