What is the difference between AWS Elasticsearch and AWS CloudSearch? - amazon-web-services

When should I use AWS Elasticsearch over AWS CloudSearch and vice versa?

Amazon Elasticsearch Service provides a fully managed implementation of Elasticsearch and Kibana. It is commonly used for near real-time visualization of log files, but it can handle many other use cases.
Amazon CloudSearch is based on Apache Solr. It requires data to be loaded as documents and is good for full-text search, with an understanding of languages and grammar (e.g. synonyms and words to ignore).
So, it really comes down to whether you want to use Elasticsearch or Solr.

Related

How to differentiate Glue and Athena to use Apache Spark in AWS?

In November 2022, Amazon Athena started supporting Apache Spark.
https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-athena-now-supports-apache-spark/?nc1=h_ls
It looks very similar to Glue.
How can we differentiate between Athena and Glue when using serverless Spark on AWS?
I looked for information comparing the two on the internet but could not find any.
I would like to decide which to use depending on the situation, especially for streaming processes.
Thanks in advance.

What is the drawback of AWS Glue Data Catalog?

What is the major drawback of the AWS Glue Data Catalog? I was asked this in an interview.
That could be answered in a number of ways depending on the wider context. For example:
It's an AWS managed service, so using it locks you into the AWS ecosystem (instead of using a standalone Hive metastore, for example).
It's limited to the data sources supported by the Glue Data Catalog.
It doesn't integrate with third-party authentication and authorisation tools like Apache Ranger (as far as I am aware).

What are the differences between Amazon Redshift and the new AWS Glue datawarehousing services?

I am confused about these two services. It looks like they offer the same service. Probably the only difference is that the Glue catalog can contain a wider range of data sources. Does this mean that AWS Glue can replace Redshift?
The comment is right: these two services are not the same. AWS Glue is an ETL service, while Amazon Redshift is a data warehousing service.
According to the AWS documentation:
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
According to the AWS documentation:
AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.
You can refer to the AWS documentation for details, but essentially these are completely different services.

AWS: best approach to process data from S3 to RDS

I'm trying to implement what I think is a very simple process, but I don't really know what the best approach is.
I want to read a big CSV file (around 30 GB) from S3, apply some transformations, and load it into RDS MySQL, and I want this process to be repeatable.
I thought that the best approach was AWS Data Pipeline, but I've found that this service is designed more for loading data from different sources into Redshift after several transformations.
I've also seen that the process of creating a pipeline is slow and a little bit messy.
Then I found Coursera's dataduct wrapper, but after some research, it seems that this project has been abandoned (the last commit was one year ago).
So I don't know if I should continue trying with AWS Data Pipeline or take another approach.
I've also read about AWS Simple Workflow and Step Functions, but I don't know if they are any simpler.
Then I saw a video about AWS Glue and it looks nice, but unfortunately it's not yet available and I don't know when Amazon will launch it.
As you can see, I'm a little bit confused. Can anyone enlighten me?
Thanks in advance.
If you are trying to get the data into RDS so you can query it, there are other options that do not require moving the data from S3 to RDS to run SQL-like queries.
You can now use Redshift Spectrum to read and query information from S3.
Using Amazon Redshift Spectrum, you can efficiently query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables.
Step 1: Create an IAM Role for Amazon Redshift
Step 2: Associate the IAM Role with Your Cluster
Step 3: Create an External Schema and an External Table
Step 4: Query Your Data in Amazon S3
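Steps 3 and 4 above come down to a few SQL statements run against the cluster. The following is a minimal sketch that only builds those statements as strings. The schema, database, table, column, bucket, and role names are all hypothetical placeholders, and the CSV layout is an assumption rather than anything from the question.

```python
from typing import Sequence, Tuple


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Step 3 (first half): map a Redshift schema onto a data catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(
    schema: str,
    table: str,
    columns: Sequence[Tuple[str, str]],
    s3_prefix: str,
    delimiter: str = ",",
) -> str:
    """Step 3 (second half): describe delimited text files stored under an S3 prefix."""
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_prefix}';"
    )


if __name__ == "__main__":
    # All identifiers below are made-up examples, not values from the question.
    print(external_schema_sql(
        "spectrum", "spectrumdb",
        "arn:aws:iam::123456789012:role/ExampleSpectrumRole"))
    print(external_table_sql(
        "spectrum", "sales",
        [("id", "INTEGER"), ("amount", "DECIMAL(8,2)")],
        "s3://example-bucket/sales/"))
    # Step 4 is then an ordinary query against the external table:
    print("SELECT COUNT(*) FROM spectrum.sales;")
```

The statements would be run with whatever SQL client is already connected to the cluster. The data itself stays in S3.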
Or you can use Athena to query the data in S3 if Redshift is too much horsepower for the job.
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL.
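As a rough illustration of the Athena route, the sketch below builds the request for boto3's `start_query_execution` call. The database name, table name, and results bucket are hypothetical, and it assumes a table over the CSV files has already been defined in the catalog.

```python
def athena_query_request(sql: str, database: str, output_s3: str) -> dict:
    """Build the keyword arguments for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results as files under this S3 location.
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


def run_query(request: dict) -> str:
    """Submit the query and return its execution ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names; replace them with your own database, table, and bucket.
    request = athena_query_request(
        "SELECT * FROM big_csv LIMIT 10",
        database="example_db",
        output_s3="s3://example-bucket/athena-results/",
    )
    print(request)
```

The query runs asynchronously, so the returned execution ID would then be polled (for example with `get_query_execution`) before reading the results.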
You could use an ETL tool to do the transformations on your CSV data and then load it into your RDS database. There are a number of open source tools that do not require large licensing costs. That way you can pull the data into the tool, do your transformations, and then the tool will load the data into your MySQL database. For example, there are Talend, Apache Kafka, and Scriptella. Here's some information on them for comparison.
I think Scriptella would be an option for this situation. It can use SQL scripts (or other scripting languages), and has JDBC/ODBC compliant drivers. With this you could create a script that would perform your transformations and then load the data into your MySQL database. And you would be using familiar SQL (I'm assuming you already can create SQL scripts) so there isn't a big learning curve.
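Whichever tool is chosen, the underlying pattern for a 30 GB file is the same: stream the CSV, transform each row, and write in batches so the whole file never sits in memory. Here is a minimal sketch of that pattern using only the Python standard library. The column names and the `transform` logic are invented for illustration, and `write_batch` is a stand-in for whatever actually writes to MySQL (for instance a DB-API cursor's `executemany`).

```python
import csv
import io
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TextIO


def transform(row: dict) -> tuple:
    """Hypothetical transformation: cast types and normalise the name column."""
    return (int(row["id"]), row["name"].strip().title(), float(row["amount"]))


def batches(rows: Iterable[tuple], size: int) -> Iterator[List[tuple]]:
    """Yield lists of at most `size` rows without materialising the whole input."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_csv(
    text_stream: TextIO,
    write_batch: Callable[[List[tuple]], None],
    batch_size: int = 1000,
) -> int:
    """Stream a CSV, transform each row, hand batches to write_batch, return the row count."""
    reader = csv.DictReader(text_stream)
    total = 0
    for batch in batches(map(transform, reader), batch_size):
        write_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    # In-memory sample standing in for the S3 object; the real stream would come from S3.
    sample = io.StringIO("id,name,amount\n1, alice ,10.5\n2,bob,3\n3,carol,7.25\n")
    written: List[tuple] = []
    count = load_csv(sample, written.extend, batch_size=2)
    print(count, written)
```

Because the loader only depends on a text stream and a callable, the same function can be exercised locally with a small file before pointing it at S3 and RDS.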

What does it mean to use "Amazon ES for content recommendations"?

Looking at the marketing literature for Amazon Kinesis Analytics, I'm going through their real-time log analytics flow, and it has a fourth step where the data is piped to Amazon ES for content suggestions.
What is Amazon ES? Is that their Elasticsearch service? If so, how are personalized recommendations generated from Elasticsearch?
I'm guessing you've already realized this by now, but yes, Amazon ES is Elasticsearch. You would need to do the work yourself to pull the Elasticsearch data and determine a recommendation. This marketing material appears to describe what you can do with the service rather than what it does out of the box.
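As one example of "doing the work yourself", Elasticsearch's `more_like_this` query can return documents whose text resembles an item the user just viewed. This is only one possible approach and not necessarily what the marketing flow has in mind. The index name, document ID, and field names below are hypothetical.

```python
from typing import Sequence


def more_like_this_query(
    index: str, doc_id: str, fields: Sequence[str], size: int = 5
) -> dict:
    """Build a search body that finds documents similar to an already indexed one."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # Reference the document the user just viewed by index and ID.
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,
                "max_query_terms": 12,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical index, document ID, and fields.
    body = more_like_this_query("articles", "42", ["title", "body"])
    print(body)
```

The body would be sent to the index's `_search` endpoint, and the hits treated as candidate recommendations to filter or rank further.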