How to understand an OLAP cube in S3 or Cassandra?

How to understand an OLAP cube in S3 or Cassandra? - amazon-web-services

In this repository, its author mentions that we can stage OLAP cubes in Cassandra or S3:
Once the data is in Redshift, our chief goal is for the BI apps to be
able to connect to Redshift cluster and do some analysis. The BI apps
can either directly connect to the Redshift cluster or go through an
intermediate stage where data is in the form of aggregations
represented by OLAP cubes.
How is it possible? How would that work? Am I missing any essential concept? As I understand OLAP cubes are a special data structure that exists in OLAP databases. Does he maybe mean specific pre-calculated combinations of dimensions and facts stored in a OLTP-oriented database, like Cassandra?

Key features of OLAP are:
pivoting
slicing
dicing
drilling
And Redshift can do this.
It's architecture is aimed to solve OLAP and BI tasks. See amazon-redshift-developer-guide
Amazon Redshift is specifically designed for online analytic processing (OLAP) and business intelligence (BI) applications, which require complex queries against large datasets. Because it addresses very different requirements, the specialized data storage schema and query execution engine that Amazon Redshift uses are completely different from the PostgreSQL implementation. For example, where online transaction processing (OLTP) applications typically store data in rows, Amazon Redshift stores data in columns, using specialized data compression encodings for optimum memory usage and disk I/O. Some PostgreSQL features that are suited to smaller-scale OLTP processing, such as secondary indexes and efficient single-row data manipulation operations, have been omitted to improve performance.
But the line between terms is very smooth.
As Diana Shealy said:
Stop Abusing OLTP as OLAP
There’s a lot of confusion in the market between OLTP and OLAP, and due to the high price of commercial OLAPs, startups and budget-constrained developers have gone on to abuse an OLTP database as an OLAP database. The abuse falls into two categories:
An often multi-shard MySQL database with application layer scripting to perform historical event data analysis. Although this setup is extremely common, it is one of the least productive ways to approach analytics. MySQL is not optimized in any way for reading large ranges of data and its support for analytic functions is weak. As there are multiple alternatives, avoid this “inexpensive” solution because you’ll be paying the price in other places eventually.
Using PostgreSQL as an OLAP layer. This is a more legitimate choice than above for starting an analytics platform because of Postgres’s solid analytic User Defined Functions (UDFs). Also, thanks to its c-store extension, PostgreSQL can be turned into a columnar database, making it an affordable alternative to commercial OLAPs.
Finally, if you are considering moving from OLTPs abused as OLAPs to “real” OLAPs like Redshift, I encourage you to learn how to use Redshift’s COPY Command so that you can start seeing your data inside Redshift.
As for your questions:
How is it possible?
It's possible due to Redshift architecture (column database) and analytical features such as:
Window functions
Data Warehouse System Architecture
Performance
Columnar Storage
Internal Architecture and System Operation
Workload Management
Aggregate functions
How would that work?
See System and Architecture Overview for a detailed explanation of the Amazon Redshift data warehouse system architecture.
(Some links are already mentioned before in this post)
Essential concept?
Am I missing any essential concept?
I'd suggest more rely on technical details of specific solution instead of marketing terms. In the end, practical tasks are not solved by software naming or marketing, but with it's real functionality.
What's really important in DB landscape - is to consider two theorems:
CAP theorem
According to Iron triangle of CAP theorem, you can choose two points of three DB architecture components:
* consistency
* availability
* persistence
PIE theorem
Rick Houlihan of Amazon had a speech on choosing the DB archotecture. In addition to the CAP theorem, he also presented PIE theorem:
The PIE theorem posits that you can choose two out of three desirable features in a data system:
Pattern Flexibility
Efficiency
Infinite Scale
And Redshift is on PI dimension of the PIE triangle
Data structure
As I understand OLAP cubes are an special data structure that exists in OLAP databases. Does he maybe mean specific pre-calculated combinations of dimensions and facts stored in a OLTP-oriented database, like Cassandra?
Both OLAP aggregated data structures and Redshift distribution styles aimed one goal: make queries faster.
Column DB, distribution, parallel queries and other features are good for analytical tasks.
UPD
In comments you asked if Cassandra can work as OLAP service.
Cassandra and S3 can be used as a storage for pre-calculated aggregated data of dimensions.

Related

Google Professional Cloud Architect Exam - BigQuery vs Cloud Spanner

Recently I cleared the google cloud PCA exam but want to clarify one question which I have doubt.
" You are tasked with building online analytical processing (OLAP) marketing analytics and reporting tools. This requires a relational database that can operate on hundreds of terabytes of data. What is the Google-recommended tool for such applications?"
What is the answer? Is it Bigquery or cloud spanner? as there are 2 parts in question. If we consider it for OLAP then it is Bigquery and for 2nd part for RDBMS it should be Cloud Spanner.
Appreciate it if I can have some clarification.
Thanks

For Online Analytical Processing (OLAP) databases, consider using BigQuery.
When performing OLAP operations on normalized tables, multiple tables have to be JOINed to perform the required aggregations. JOINs are possible with BigQuery and sometimes recommended on small tables.
You can check this documentation for further information.
BigQuery for OLAP and Google Cloud Spanner for OLTP.
Please check this other page for more information about it.

I agree that the question is confusing.
But according to the official documentation :
Other storage and database options
If you need interactive querying in an online analytical processing
(OLAP) system, consider BigQuery.
However BigQuery is not considered relational database.

The big query does not provide you relationship between tables but you can join them freely.
If your performance falls cluster then partition on the joining fields.
Is it possible to create relationships between tables?
Some more literature if some want to go into the details.
By using MapReduce, enterprises can cost-effectively apply parallel
data processing on their Big Data in a highly scalable manner, without
bearing the burden of designing a large distributed computing cluster
from scratch or purchasing expensive high-end relational database
solutions or appliances.
https://cloud.google.com/files/BigQueryTechnicalWP.pdf
Hence Bigquery

Price aside, why ever choose Google Cloud Bigtable over Google Cloud Datastore?

If I have a use case for both huge data storage and searchability, why would I ever choose Google Cloud Bigtable over Google Cloud Datastore?
I've seen a few questions on SO and other sides "comparing" Bigtable and Datastore, but it seems to boil down to the same non-specific answers.
Here's my current knowledge and my thoughts:
Datastore is more expensive.
In the context of this question, let's forget entirely about pricing.
Bigtable is good for huge datasets.
It seems like Datastore is, too? I'm not seeing what specifically makes Bigtable objectively superior here.
Bigtable is better than Datastore for analytics.
How? Why? It seems like I can do analytics in Datastore as well, no problem. Why is Bigtable seemingly the unanimous decision industry-wide for analytics? What value do GMail, eBay, etc. get from Bigtable that Datastore can't provide?
Bigtable is integrated with Hadoop, Spark, etc.
Is Datastore not as well, considering it's built on Bigtable?
From this question, this statement was made in an answer:
Bigtable and Datastore are extremely different. Yes, the datastore is build on top of Bigtable, but that does not make it anything like it. That is kind of like saying a car is build on top of [car] wheels, and so a car is not much different from wheels.
However, this seems analogy seems nonsensical, since the car (including the wheels) intrinsically provides more value than just the wheels of a car by themselves.
It seems at first glance that Bigtable is strictly worse than Datastore, only providing a single index and limiting quick searchability. What am I missing?

Bigtable and Datastore are optimized for slightly different use-cases, and offer different tradeoffs. The main ones are:
Data model:
Bigtable is a wide-column database -- think HBase and Cassandra
Datastore is a document database -- think MongoDB
Note that both of these can be used for key-value use cases
Cost model:
Bigtable charges per provisioned nodes
Datastore is serverless and charges per operation
In general, Bigtable is a good choice if you need:
Fast point-reads and range scans (especially at scale). Bigtable will offer lower latency for key-value lookups, as well as fast scans of contiguous rows - a powerful tool since rows are stored in lexicographic order. If you have simple, predictable query patterns and design your schema well, reading from Bigtable can be incredibly efficient.
High throughput writes (again, especially at scale). This is possible in part because Bigtable is eventually consistent - in exchange you can see big wins in price/performance.
Example use-cases that are great for Bigtable include time series data (for IoT, monitoring, and more - think extremely write heavy workloads and massive amounts of data generated over x units of time), analytics (think fraud detection, personalization, recommendations), and ad-serving (every microsecond counts).
Datastore (or Firestore) is a good choice if you need:
Query flexibility: Datastore offers document support and secondary indexes.
Strong consistency and/or transactions: Bigtable has eventually consistent replication and does not support multi-row transactions.
Mobile SDKs: Datastore and Firestore are incredibly well-integrated with firebase ecosystem.
Example use-cases include mobile and web applications, game state, user profiles, and product catalogs.
To answer a few of your questions explicitly:
Why is Bigtable used for analytics? It's mostly about performance: analytics use-cases are more likely to have large datasets and require high write throughput. It's a lot easier to run into the limits of a database if you're storing clickstream data, as opposed to something like user account information. Fast scans are also important for analytics use-cases: Bigtable allows you to retrieve all of the information you need about a user or a device extremely quickly, which you can process in a batch job or use to create recommendations and analysis on the fly.
Is Bigtable strictly worse than Datastore? Datastore definitely provides more built-in functionality like secondary indexes and document support, and if you need those features, Datastore is a fantastic choice. But that functionality comes with tradeoffs. Bigtable provides perhaps lower-level, but incredibly performant APIs that allow users to make those tradeoffs for themselves: If a user values, say, write performance over secondary indexes, Bigtable is an excellent option. You can think of it as an extremely versatile and powerful infrastructural building block. I actually like the wheel/car analogy: sometimes you don't want the car -- if what you really need is a dirt bike, a set of solid wheels is much more useful :)

Aggregate tables vs real-time analytics [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I've been researching different approaches to streaming data to a real-time dashboard. One way that I have done in the past is using a star schema/dimension and fact tables. This would be an implementation of aggregate tables. For example, the dashboard would contain multiple charts, one being the total sales for the day, total sales by product, total sales by manufacturer, etc. etc.
But what if this needed to be real-time? What if the data needs to stream to these charts and do the analytical processing real-time?
I've been looking into solutions like Kinesis streams and Kafka, but I may be missing something obvious. For example, consider the following example. A company runs a website where they sell pies. The company has a backend dashboard where they keep track of all data and analytics related to sales, users, orders, etc.
Custom places order through website
The relational (mysql) database receives this new order
The charts and analytical data updates real-time on the backend, for example total sales for the day, or total sales for the year by user.
If the scenario is that this data needs to be streamed, what is the best approach to this? Aggregate tables seem like the obvious but it seems that would be periodic and not real-time. Kinesis/Kafka feels like it would fit somewhere in here. The other option would be something like Redshift but it's pretty pricey and still may not be the best way to address the issue and scale.
Here is an example of a chart that would need to be updated in real-time that could suffer by just doing place aggregate SQL queries when there are tons and tons of rows to parse.

In case of "always up-to-date" reports like this (sales, users, orders etc) that don't need live updates with near-zero-latency streaming processing might be overkill, and ROLAP-like approach seems to be more optimal in meaning of efforts/result.
You mentioned Redshift, and if you already ready to mirror your data for analytics purposes and only problem is a price you can consider another free open-source alternatives that could be used for handling OLAP (aggregate) queries in the real-time (like Yandex ClickHouse, or maybe MongoDb in some cases).
A lot of depends on the dataset size; unless you have really big data that need to be aggregated (hundreds of GB) you can try to keep using mysql and use some tricks:
use separate slave mysql server with high IOPS for analytics and replicate only tables needed to build your reports; possibly use another table engine, more suitable for analytical queries. Setup indexes specially for these queries, to avoid table full scan if you need to get numbers only for last weeks.
pre-calculate metrics for previous periods (with materialized view-like approach) and refresh them on schedule (say, daily), and then combine pre-calculated aggregates with on-the-fly aggregates only for last period to get actual report data without need to scan whole facts table each time.
use data visualization backend that can efficiently cache reports data in-memory to prevent SQL DB overload because of many similar queries (and if the same report or dashboard is displayed for 100 users SQL DB load will be the same as for 1). BTW, I develop solution like that (cannot adv it here as it is commercial product).

This is a typical trade-off for most the architects. Amazon Redshift offers exemplary read optimisations but AWS stack comes for a price. You may try using Cassandra, but it comes with its own set of challenges. When it comes to analytics, I never recommend going real time for the reasons elaborated below.
Doing analytics at real time is not desired, specially using MySQL
The solution for above comes by seggregating transactional and analytical infra. This involves cost but will make sure you don't have to spend time in housekeeping once you scale. MySQL is a row based RDBMS mostly used for storing transactional data. Being row based, it optimises writes i.e. the writes are almost real time and thus, it compromises on reads. When I say this, I refer to a typical analytics dataset running into millions of records/day. If your dataset is not that voluminous, you might still be able to render a graph showing transactional status. But since you're referring to Kafka, I assume the dataset is very large.
A real-time dashboard with visualisations gives a bad customer experience
Considering the above point, even if you go for a warehouse / a read optimised infra, you need to understand how the visualisations work. If 100 people access the dashboard at the same time, 100 connections will be made to the database, all fetching the same data, putting them in memory, applying calculations, parameters and filters defined in your dashboard, adjust the refined dataset in the visualisation and then render the dashboard. Till this time, the dashboard will simply freeze. A poorly constructed query, inefficient use of indexes etc will further make the matter worse.
The above problems will amplify more and more with the increase in your dataset. Good practices to achieve what you need would be:
To have almost realtime (delay of 1hr, 30 mins, 15 mins etc) rather than an absolute real time system. This will help you to create a flat file with the data already fetched in the memory. Your dashboard will simply read this data and will be extremely fast in terms of responses to filters etc. Also, multiple connections to databases will be avoided.
Have a data structure, database/warehouse optimised for reads.

For these types of operational analytics use-cases where the real-time nature of the data is critical, you're completely correct that most "traditional" methods can be quite clumsy, especially as your data size increases. A quick overview of your options:
Historical Approach (TLDR– Meh)
Up until about 5 years ago, the de facto way to do this looked something like
Set up a primary OLTP database that will handle the data in its raw form and have stricter guarantees on performance or ACID properties. Usually this is something SQL-esque, i.e. MySQL, PostgreSQL.
Set up a secondary OLAP database that is meant for serving offline (aka non user-facing) queries. This could also be a SQL-esque db but its schema would be drastically different because it stores the data in enriched form.
Set up some mechanism by which you can keep these 2 in sync. This pretty much boils down to either a) changing your application to always write to both databases and performing the necessary data enrichment or b) building a stand-alone application that reads from your OLTP database, performs the necessary transformations and enrichment and writes to your OLAP database
Plug your dashboard into your OLAP database which will have a schema and indexes optimized for the kind of queries you want.
Using your example about the pie store, the OLTP database would be used to store the purchases of all the pies and reference things like customer ids, billing information, delivery information, etc. In contrast, the OLAP database might just maintain a table with a schema
purchase_totals(day: Date, weekNumber: int, dayOfWeek: int, year: int, total: float)
While the weekNumber, dayOfWeek, and year and technically redundant they make your queries faster! With the proper indexes on these fields, your dashboard has turned into 5 simple (and fast!) aggregation queries with a group by and sum, and then the differences week-over-week or year-over-year can be computed on the client-side. As long as your dashboard refreshes every minute or so you have near-real-time data at your fingertips.
Current Approach (TLDR– Ok)
The recent trends in computing, database technologies, and data science/analytics have led to improvements to the above process, namely by replacing certain components of it. The changes include
Making the OLTP db, the OLAP db, or both a NoSQL database (Mongo usually being the most popular). The pro here is that you have a more flexible schema which won't break if something upstream changes (say, you start selling cakes in addition to pies).
Keeping the SQL db but shifting to cloud provider solution like AWS RDS or Google Cloud SQL. This fundamentally doesn't change anything about the architecture, but it does significantly reduce your operational burden.
Using hard-to-maintain ETL pipelines on top of streaming platforms like Kafka or AWS Kinesis to act as the middle layer between OLAP and OLTP.
Using dedicated tools for data cleaning and transformation as you plan out how to do your ETL
Using dedicated visualization tools on top of your OLAP db (think Tableau)
Using a pull-based approach for getting data out of your OLTP db or your application directly instead of waiting for it to eventually reach your OLAP db. This is helpful for online services because it actually gives you both the data you want AND confirmation that the service is alive and running well (because it just served your request for data). Systems like Prometheus are quite popular for this now.

AWS Redshift vs Snowflake use cases

I was wondering if anyone has used both AWS Redshift and Snowflake and use cases where one is better . I have used Redshift but recently someone suggested Snowflake as a good alternative . My use case is basically retail marketing data that will be used by handful of analysts who are not terribly SQL savvy and will most likely have reporting tool on top

Redshift is a good product, but it is hard to think of a use case where it is better than Snowflake. Here are some reasons why Snowflake is better:
The admin console is brilliant, Redshift has none.
Scale-up/down happens in seconds to minutes, Redshift takes minutes to hours.
The documentation for both products is good, but Snowflake is better laid
out and more accessible.
You need to know less "secret sauce" to make Snowflake work well. On Redshift you need to know and understand the performance impacts of things like distribution keys and sort keys, at a minimum.
The load processes for Snowflake are more elegant than Redshift. Redshift assumes that your data is in S3 already. Snowflake supports S3, but has extensions to JDBC, ODBC and dbAPI that really simplify and secure the ingestion process.
Snowflake has great support for in-database JSON, and is rapidly enhancing its XML. Redshift has a more complex approach to JSON, and recommends against it for all but smaller use cases, and does not support XML.
I can only think of two cases which Redshift wins hands-down. One is geographic availability, as Redshift is available in far more locations than Snowflake, which can make a difference in data transfer and statement submission times. The other is the ability to submit a batch of multiple statements. Snowflake can only accept one statement at a time, and that can slow down your batches if they comprise many statements, especially if you are on another continent to your server.
At Ajilius our developers use Redshift, Snowflake and Azure SQL Data Warehouse on a daily basis; and we have customers on all three platforms. Even with that choice, every developer prefers Snowflake as their go-to cloud DW.

I evaluated both Redshift(Redshfit spectrum with S3) and SnowFlake.
In my poc, snowFlake is way way better than Redshift. SnowFlake integrates well with Relational/NOSQL data. No upfront index or partition key required. It works amazing without worrying about what way to access the day.
Redshift is very limited and no json support. Its hard to understand the partition. You have to do lot of work to get something done. No json support. You can use redshift specturm as a bandaid to access S3. Good luck with partioning upfront. Once you created partition in S3 bucket, you are done with that and no way to change until unless you redo process all data again to new structue. You will end up sending time to fix these issues instead of working on fixing real business problems.
Its like comparing Smartphone vs Morse code mechine. Redshift is like morse code kind of implementation and its not for mordern development

We recently switched from Redshift to Snowflake for the following reasons:
Real-time data syncing
Handling of concurrent queries
Minimizing of database administration
Providing different amounts of computing power to different Looker users
A more in-depth writeup can be found on our data blog.

I evaluated Redshift and Snowflake, and a little bit of Athena and Spectrum as well. The latter two were non-starters in cases where we had big joins, as they would run out of memory. For Redshift, I could actually get a better price to performance ratio for a couple reasons:
allows me to choose a distribution key which is huge for co-located joins
allows for extreme discounts on three year reserved pricing, so much so that you can really upsize your compute at a reasonable cost
I could get better performance in most cases with Redshift, but it requires good MPP knowledge to setup the physical schema properly. The cost of expertise and complexity offsets some of the product cost.
Redshift stores JSON in a VARCHAR column. That can cause problems (OOM) when querying a subset of JSON elements across large tables, where the VARCHAR column is sized too big. In our case we had to define the VARCHAR as extremely large to accommodate a few records that had very large JSON documents.
Snowflake functionality is amazing, including:
ability to clone objects
deep functionality in handling JSON data
snowpipe for low maintenance loading, auto scaling loads, trickle updates
streams & tasks for home grown ETL
ability to scale storage and compute separately
ability to scale compute within a minute, requiring no data migration
and many more
One thing that I would caution about Snowflake is that one might be tempted to hire less skilled developers/DBAs to run the system. Performance in a bad schema design can be worked around using a huge compute cluster, but that may not be the best bang for the buck. Regardless, the functionality in Snowflake is amazing.

What is the different between AWS Elasticsearch and AWS Redshift

I read the document that both for data analysis and in cluster structure but I don't understand what use case different.
Amazon Elasticsearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analytics.Amazon Elasticsearch
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. Amazon Redshift

Amazon Redshift is a hosted data warehouse product, while Amazon Elasticsearch is a hosted ElasticSearch cluster.
Redshift is based on PostgreSQL and (afaik) mostly used for BI purpuses and other compute-intensive jobs, the Amazon Elasticsearch is an out-of-the-box ElasticSearch managed cluster (which you cannot use to run SQL queries, since ES is a NoSQL database).
Both Amazon Redshift and Amazon ES are managed services, which means you don't need to do anything in order to manage your servers (this is what you pay for). Using the AWS Console you can add new cluster and you don't need to run any commands on order to install any software - you just need to choose which server to run your cluster on (number of nodes, disk, ram, etc).
If you are not familiar with ElasticSearch you should check their website.
Edit: It is now possible to write SQL queries on ElasticSearch: SQL Support for AWS ElasticSearch

I agree with #IMSoP's assertions above...
To compare the two is like comparing an elephant and a tiger - you're not really asking the right question quite yet.
What you should really be asking is - what are my requirements for my use cases to best fulfill my stakeholder / customer needs, first, and then which data storage technology best aligns with my requirements second...
To be clear - Whether speaking of AWS ElasticSearch Service, or FOSS / Enterprise ElasticSearch (which have signifficant differences, between, even) - ElasticSearch is NOT a Relational Database (RDBMS), nor is it quite a NoSQL (Document Store) Database, either...
ElasticSearch is a Search Engine / Index. It does some things very well, for very specific use cases, however unlike RDBMS data models most signifficantly, ElasticSearch or NoSQL are not going to provide you with FULL ACID Compliance, or Transactional Statement Processing, so if your use case prioritizes data integrity, constrainability, reliability, audit ability, regulatory compliance, recover ability (to Point in Time, even), and normalization of data model for performance and least repetition of data while providing deep cardinality and enforcing model constraints for optimal integrity, "NoSQL and Elastic are not the Droids you're looking for..." and you should be implementing a RDBMS solution. As already mentioned, the AWS Redshift Service is based on PostgreSQL - which is one of the most popular OpenSource RDBMS flavors out there, just offered by AWS as a fully managed solution / service for their customers.
Elastic falls between RDBMS and NoSQL categories, as it is a Search Engine / Index that works most optimally with "single index" type use cases, where A LOT of content is indexed all at once and those documents aren't updated very frequently after the initial bulk indexing,but perhaps the most important thing I could stress is that in my experience it typically does not scale very cost effectively (even managed cluster services) if you want your clusters to perform well, not degrade over time, retain large historical datasets, and remain highly available for your consumers - and for most will likely become cost PROHIBITIVE VERY fast. That said, Elastic Search DOES still have very optimal use cases, so is always worth evaluating against your unique requirements - just keep scalability and cost in mind while doing so.
Lastly let's call NoSQL what it is, a Document Store that stores collections of documents (most often in JSON format) and while they also do indexing, offer some semblance of an Authentication and Authorization model, provide CRUD operability (or even SQL support nowadays, which makes the career Enterprise Data Engineer in me giggle, that SQL is now the preferred means of querying data from their NoSQL instances! :D )- Still NOT a traditional database, likely won't provide you with much control over your data's integrity - BUT that is precisely what "NoSQL" Document Stores were designed to work best for - UNSTRUCTURED DATA - where you may not always know what your data model is going to look like from the start, or your use case prioritizes data model flexibility over enforcing data integrity in general (non mission critical data). Last - while most modern NoSQL Document Stores may have SOME features that appear on the surface to resemble RDBMS, I am not aware of ANY in that category at current that could claim to offer all that a relational database does, with Oracle MySQL's DocumentStore being probably the best of both worlds in my opinion (and not just because I've worked with it every day for the last decade, either...).
So - I hope Developers with similar questions come across this thread, and after reading are much better informed to make the most optimal design decisions for their use cases - because if we're all being honest with ourselves - everything we do in our profession is about data - either generating it, transporting it, rendering it, transforming it....it all starts and ends with data, and making the most optimal data storage decisions for your applications will literally define the rest of your project!
Cheers!

This strikes me as like asking "What is the difference between apples and oranges? I've heard they're both types of fruit."
AWS has an overview of the analytics products they offer, which at the time of writing lists 21 different services. They also have a list of database products which includes Redshift and 10 others. There's no particularly obvious reason why these two should be compared, and the others on both pages ignored.
There is inevitably a lot of overlap between the capabilities of these tools, so there is no way to write an exhaustive list of use cases for each. Their strengths and weaknesses, and the other tools they integrate easily with, will change over time, and some differences are a matter of "taste" or "style".
Regarding the two picked out in the question:
Elasticsearch is a product built by elastic.co, which AWS can manage the installation and configuration for. As its name suggests, its core functionality is based around search - it can be used to build a flexible but fast product search for an e-commerce site, for instance. It's also commonly used along with other tools to search and aggregate logs and monitoring data.
Redshift is a database system built by AWS, based on PostgreSQL but optimised for extremely large data sets. It is designed for "data warehouse" applications, where you want to write complex logical queries against the data, like "how many people in each city bought both a toothbrush and toothpaste, this year compared to last year".
Rather than trying to make an abstract comparison of all the different services available, it makes more sense to start from the use case which you actually have, and see which tool best fits that need.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js