I am using a Memcached cluster in ElastiCache. In the AWS console, I do not see the data I am storing in the cache (I am storing a JSON response object). Is there a way I can visualize this JSON?
Why I want to see the data being cached:
I recently updated the JSON structure that needs to be stored and want to verify that the new structure is being stored in the cache.
AWS just manages the Memcached servers for you; it doesn't really concern itself with the data. You would need to connect to the Memcached cluster with a Memcached client to view the data.
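For example, here is a minimal sketch using the pymemcache Python client. The endpoint and key name are placeholders for your cluster's node endpoint and whatever key your application writes, and the client has to run somewhere that can reach the cluster (e.g. an EC2 instance in the same VPC):

```python
# Minimal sketch: inspect a cached JSON value with pymemcache.
# The endpoint and key name below are placeholders.
import json
from pymemcache.client.base import Client

# ElastiCache Memcached node endpoint; reachable only from inside the VPC.
client = Client(("my-cluster.xxxxxx.use1.cache.amazonaws.com", 11211))

raw = client.get("my-response-key")  # hypothetical key your app stores
if raw is not None:
    print(json.dumps(json.loads(raw), indent=2))  # pretty-print the stored JSON
else:
    print("Key not found in cache")
```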
We are building a customer-facing app. For this app, data is captured by IoT devices owned by a third party and transferred to us from their server via API calls. We store this data in our AWS DocumentDB cluster, and the user app is connected to this cluster with real-time data feed requirements. Note: the data is time-series data.
The thing is, for long-term data storage and for creating analytics dashboards to be shared with stakeholders, our data governance folks are asking us to replicate/copy the data daily from the AWS DocumentDB cluster to their Google Cloud Platform BigQuery instance. We can then run queries directly on BigQuery to perform analysis and send the data to a tool such as explorer or Tableau to create dashboards.
I couldn't find any straightforward solution for this. Any ideas, comments, or suggestions are welcome. How do I achieve or plan the above replication, and how do I make sure the data is copied efficiently in terms of memory and pricing? Also, I don't want to disturb the performance of the AWS DocumentDB cluster, since it supports our user-facing app.
This will need some custom implementation. You can use change streams and process the data changes at intervals to send them to BigQuery, which gives you a replication mechanism to run analytics on. One of the documented use cases for change streams is analytics with Redshift, so BigQuery should serve a similar purpose.
Using Change Streams with Amazon DocumentDB:
https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
That document also contains sample Python code for consuming change stream events.
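A rough sketch of what an interval-based consumer could look like, assuming pymongo and the google-cloud-bigquery client; the connection string, database, collection, and table names are placeholders:

```python
# Rough sketch (not production code) of an interval-based consumer:
# read pending DocumentDB change events and stream them into BigQuery.
# Connection string, database, collection and table names are placeholders.
from pymongo import MongoClient
from google.cloud import bigquery

docdb = MongoClient("mongodb://user:pass@my-docdb-cluster:27017/?tls=true")
collection = docdb["iot"]["readings"]        # hypothetical db/collection
bq = bigquery.Client()
TABLE_ID = "my-project.analytics.readings"   # hypothetical BigQuery table

def replicate_batch(resume_token=None):
    """Consume buffered change events and insert them into BigQuery."""
    rows = []
    with collection.watch(resume_after=resume_token,
                          full_document="updateLookup") as stream:
        while True:
            change = stream.try_next()  # None when nothing more is buffered
            if change is None:
                break
            if change["operationType"] in ("insert", "update", "replace"):
                doc = change["fullDocument"]
                doc["_id"] = str(doc["_id"])  # BigQuery cannot store an ObjectId
                rows.append(doc)              # other BSON types may need converting too
            resume_token = stream.resume_token
    if rows:
        errors = bq.insert_rows_json(TABLE_ID, rows)  # streaming insert
        if errors:
            raise RuntimeError(f"BigQuery insert errors: {errors}")
    return resume_token  # persist this (e.g. in S3/DynamoDB) for the next run
```

You could run something like this on a schedule (a Lambda or a small cron job), persisting the resume token between runs so no change events are missed. Running the consumer on a replica instance keeps the load off the primary that serves your app.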
I am building a simple sensor which sends five telemetry values to AWS IoT Core. I am confused between Amazon Timestream and Elasticsearch for storing this telemetry.
For now I am experimenting with Timestream and wanted to know: is this the right choice? Any expert suggestions?
Secondly, I want to store the DB records forever, as they will feed into my machine learning predictions in the future. Does Timestream delete records after a while, or is it possible to never delete them?
I will be creating a custom web page to show this telemetry per tenant; any help with how I can do this? Should I query the Timestream DB directly over the API, or should I copy the data into another DB such as DynamoDB?
Your help will be greatly appreciated. Thank you.
For now I am experimenting with Timestream and wanted to know: is this the right choice? Any expert suggestions?
I would not call myself an expert, but Timestream looks like a sound solution for telemetry data. I think Elasticsearch would be overkill if each of your telemetry values is a simple numeric value. If your telemetry data is more complex (e.g. JSON objects with many keys) or you would benefit from full-text search, Elasticsearch would be the better choice. Timestream is probably also easier and cheaper to manage.
Secondly, I want to store the DB records forever, as they will feed into my machine learning predictions in the future. Does Timestream delete records after a while, or is it possible to never delete them?
It looks like the retention is limited to 200 years. You can probably increase that by contacting AWS support, but I doubt they will allow infinite retention.
We use Amazon Kinesis Data Firehose with AWS Glue to store our sensor data on AWS S3. When we need to access the data for analysis, we use AWS Athena to query the data on S3.
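As an illustration of that pattern, here is a minimal boto3 sketch for running an Athena query against the S3 data; the region, database, table, and result bucket names are made up:

```python
# Minimal sketch: query S3 sensor data through Athena with boto3.
# Region, database, table, and output bucket names are made up.
import time
import boto3

athena = boto3.client("athena", region_name="eu-west-1")

query = athena.start_query_execution(
    QueryString="SELECT sensor_id, avg(temperature) AS avg_temp "
                "FROM sensor_readings GROUP BY sensor_id",
    QueryExecutionContext={"Database": "iot_analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
execution_id = query["QueryExecutionId"]

# Poll until the query finishes (fine for ad-hoc use, not for production).
while True:
    state = athena.get_query_execution(QueryExecutionId=execution_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=execution_id)
    for row in results["ResultSet"]["Rows"][1:]:  # skip the header row
        print([col.get("VarCharValue") for col in row["Data"]])
```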
I will be creating a custom web page to show this telemetry per tenant; any help with how I can do this? Should I query the Timestream DB directly over the API, or should I copy the data into another DB such as DynamoDB?
It depends on how dynamic and complex the queries you want to display are. I would start by querying Timestream directly and introduce DynamoDB where it makes sense to optimize cost.
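As a starting point, here is a small sketch of querying Timestream directly with boto3 for one tenant's recent telemetry; the database, table, and column names are hypothetical:

```python
# Minimal sketch: query Timestream directly for one tenant's recent telemetry.
# Database, table, and column names are hypothetical.
import boto3

query_client = boto3.client("timestream-query", region_name="us-east-1")

def latest_telemetry(tenant_id: str, hours: int = 24):
    # tenant_id is inlined here only to keep the sketch short;
    # validate/escape it properly in real code.
    query = (
        f"SELECT time, measure_name, measure_value::double "
        f'FROM "iot_db"."telemetry" '
        f"WHERE tenant_id = '{tenant_id}' AND time > ago({hours}h) "
        f"ORDER BY time DESC"
    )
    paginator = query_client.get_paginator("query")
    for page in paginator.paginate(QueryString=query):
        for row in page["Rows"]:
            yield [datum.get("ScalarValue") for datum in row["Data"]]

for row in latest_telemetry("tenant-123"):
    print(row)
```

Your web backend can expose an endpoint per tenant on top of a query like this; only add a DynamoDB copy later if the query cost or latency becomes a problem.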
Based on your description ("a simple sensor which sends five telemetry values to AWS IoT Core"), Timestream is the way to go; it is a fairly simple and cheap solution for simple telemetry data.
The magnetic store retention (up to 200 years) is far beyond what you will ever need.
I would like to understand how we can estimate the data transfer costs.
Let me explain the setup.
I have a REST endpoint for accessing data from our caches for multiple users in multiple regions on the cloud.
The setup consists of Cassandra and Hazelcast caches for data storage. The added complexity is that the data in Cassandra is sourced from components on on-premises servers.
Cassandra setup:
Cassandra nodes are spread across the AZs in two regions (UK and HK). Streaming services from US and ME on-premises servers access the data, but only when the data is not present in our Hazelcast caches. The UK Cassandra instance replicates data to the HK instance for data consistency.
Hazelcast setup:
Hazelcast caches are set up in 5 regions as local caches. These caches sync up using a bidirectional sync. When data is not found in the cache while serving a REST call, a service initiates a gRPC call to pull the missing data.
My method of estimating data transfer for the API is: payload size * number of requests in a day.
How do I estimate the data transfer for Cassandra replication (including gossip) and Hazelcast replication across regions?
For the Hazelcast part, if you enable diagnostics logging on the Hazelcast member, you can read the following metrics: bytesReceived and bytesSent.
Read more at: https://groups.google.com/g/hazelcast/c/IDIynkEG1YE
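To turn those numbers into an estimate, here is a quick back-of-the-envelope sketch combining your API formula with observed replication metrics; the example figures and the per-GB rate are placeholders, so check the current inter-region data transfer pricing for your regions:

```python
# Back-of-the-envelope sketch of the estimation approach described above.
# All figures and the per-GB rate are placeholder assumptions.

GB = 1024 ** 3

def api_transfer_gb(avg_payload_bytes: float, requests_per_day: int) -> float:
    """API transfer per day: payload size * number of requests."""
    return avg_payload_bytes * requests_per_day / GB

def replication_transfer_gb(bytes_sent_per_day: float) -> float:
    """Cross-region replication taken from observed metrics, e.g. Hazelcast
    diagnostics (bytesSent/bytesReceived) or Cassandra inter-DC network stats."""
    return bytes_sent_per_day / GB

daily_gb = api_transfer_gb(8_000, 2_000_000) + replication_transfer_gb(50 * GB)
monthly_cost = daily_gb * 30 * 0.02   # assumed $0.02/GB inter-region rate
print(f"~{daily_gb:.1f} GB/day, est. ${monthly_cost:,.2f}/month")
```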
I have some JSON data that I want to store in my Amazon DB. Is there any way to do it? I researched some info, but nothing helped.
With PostgreSQL on RDS you can take advantage of JSON-specific data types and functions.
https://aws.amazon.com/rds/postgresql/
http://www.postgresql.org/docs/9.3/static/functions-json.html
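For example, a minimal sketch using psycopg2 with a jsonb column (jsonb is available from PostgreSQL 9.4; on 9.3 you would use the plain json type). The host, credentials, and table name are illustrative only:

```python
# Minimal sketch: store and query JSON in a jsonb column on RDS PostgreSQL.
# Host, credentials, table and column names are illustrative only.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect(
    host="mydb.xxxxxx.us-east-1.rds.amazonaws.com",
    dbname="app", user="app_user", password="secret",
)

payload = {"device": "sensor-1", "temperature": 21.4}

with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, doc jsonb)")
    cur.execute("INSERT INTO events (doc) VALUES (%s)", (Json(payload),))
    # Use the ->> operator to filter on a field inside the JSON document.
    cur.execute("SELECT doc FROM events WHERE doc->>'device' = %s", ("sensor-1",))
    print(cur.fetchall())
```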
Has anyone ever exported S3 data from AWS into a local database using EMR? I want to write a custom MapReduce solution that would extract certain data and load it in parallel into a local network database instance. I have not seen anything on the Amazon website that states whether that is possible. There is a lot of mention of moving data within AWS instances.
When you say a "local network database", are you referring to a database on an EC2 instance or your local network?
Either way is possible. If you are using a non-EC2 or non-AWS database, just make sure to open up your security groups / firewall to allow the necessary network connections.
As for loading data from S3 into your local database:
You can crunch the data from S3 using EMR, convert it into CSV format in the mappers, and bulk import that into your database (a simple mapper sketch is shown below). This will likely be the fastest option, since a bulk CSV import lets the database load data very quickly.
You can use the EMR mappers to insert data directly into the database, but I don't recommend this approach: with multiple mappers writing to the database at once, you can easily overload it, causing stalls and failed jobs.
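For the first (CSV) approach, a Hadoop Streaming mapper can be as simple as the sketch below; the JSON field names are made up for illustration. Each mapper reads records from its S3 input split on stdin and emits CSV lines, which you then bulk-load with COPY (PostgreSQL) or LOAD DATA INFILE (MySQL):

```python
#!/usr/bin/env python
# Hadoop Streaming mapper sketch: read JSON records from stdin and emit CSV
# lines for a later bulk import. Field names are made up for illustration.
import csv
import json
import sys

writer = csv.writer(sys.stdout)

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        record = json.loads(line)
    except ValueError:
        continue  # skip malformed records
    writer.writerow([record.get("id"), record.get("timestamp"), record.get("value")])
```

You would run this on EMR with the hadoop-streaming jar, pointing -input at your S3 prefix, -output at an S3 results prefix, and -mapper at this script, then pull the resulting CSV files down for the bulk import.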