I have an instance of Elasticsearch running in AWS OpenSearch. From the documentation I've found online, AWS has something called OpenSearch Dashboards that is essentially their own forked version of Kibana. This can be connected to my Elasticsearch instance to visualize data.
For OpenSearch Dashboards, all of the guides I've found online deal with how we can visualize patterns in the underlying data (e.g. what the most popular keywords in an index column are) or how we can visualize data about how the Elasticsearch service is running (e.g. the CPU usage rate, the indexing rate).
Is there any way for me to get statistics on, and visualizations for, what's being searched and how often? For example, I would like data on which unique search terms users have typed into our search bar in the past week, the number of times each of those terms has been searched, and the number of results each of those searches returned.
Related
We are building a customer-facing app. For this app, data is captured by IoT devices owned by a third party and transferred to us from their server via API calls. We store this data in our AWS DocumentDB cluster. The user app is connected to this cluster and has real-time data feed requirements. Note: the data is time series data.
For long-term data storage and for creating analytic dashboards to share with stakeholders, our data governance folks are asking us to replicate/copy the data daily from the AWS DocumentDB cluster to their Google Cloud Platform project, into BigQuery. We could then run queries directly on BigQuery to perform analysis and send the data to maybe explorer or Tableau to create dashboards.
I couldn't find any straightforward solutions for this. Any ideas, comments or suggestions are welcome. How do I achieve or plan the above replication? And how do I make sure the data is copied efficiently in terms of memory and pricing? Also, I don't want to disturb the performance of AWS DocumentDB, since it supports our user-facing app.
This solution would need some custom implementation. You can use Change Streams and process the data changes at intervals to send them to BigQuery, which gives you a data replication mechanism to run analytics against. One of the documented use cases for Change Streams is analytics with Redshift, so BigQuery should serve a similar purpose.
Using Change Streams with Amazon DocumentDB:
https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html
This document also contains sample Python code for consuming change stream events.
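To make the "process the changes in intervals" idea concrete, here is a minimal sketch of the batching step only. It assumes events shaped like standard change stream documents (`_id` as the resume token, `operationType`, `documentKey`, optional `fullDocument`). The names `event_to_row` and `batch_events` are hypothetical, and both the source of the events (for example a pymongo `collection.watch(...)` cursor) and the actual BigQuery load are left out.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def event_to_row(event: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one change stream event into a row for an analytics table."""
    return {
        "operation": event["operationType"],
        "doc_id": str(event["documentKey"]["_id"]),
        # fullDocument is not present on delete events, so default to None.
        "document": event.get("fullDocument"),
    }


def batch_events(
    events: Iterable[Dict[str, Any]], batch_size: int
) -> Iterator[Tuple[List[Dict[str, Any]], Any]]:
    """Yield (rows, resume_token) pairs of at most batch_size rows each.

    The resume token is the _id of the last event in the batch. Persisting
    it only after the batch has been loaded lets a later run continue from
    that point instead of rereading the collection.
    """
    rows: List[Dict[str, Any]] = []
    token: Any = None
    for event in events:
        rows.append(event_to_row(event))
        token = event["_id"]
        if len(rows) >= batch_size:
            yield rows, token
            rows = []
    if rows:
        yield rows, token
```

Each yielded batch could be handed to whatever loader you choose. Reading only the changes, rather than running full scans, is what keeps the extra load on the DocumentDB cluster small.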
Is it possible to have some metrics about how many search requests were processed over a certain time on ElasticSearch at AWS? Something like the cloudwatch monitoring for Cloudsearch that you can check the number of successful requests per minute (RPM):
I just found the _stats endpoint, which lets you retrieve some interesting metrics. Basically, you divide indices.search.query_time_in_millis by indices.search.query_total to get the average time per query.
I still don't know a good way to get real-time data to plot a monitoring graph.
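As a small illustration of that calculation, the sketch below takes an already-parsed `_stats` response and returns the average query time. It assumes the usual response layout with the counters under `_all.total.search`. The function name `average_query_millis` is hypothetical, and fetching the JSON from the domain endpoint is not shown.

```python
from typing import Any, Dict, Optional


def average_query_millis(stats: Dict[str, Any]) -> Optional[float]:
    """Average time per query in milliseconds, or None if nothing has run yet."""
    search = stats["_all"]["total"]["search"]
    query_total = search["query_total"]
    if query_total == 0:
        # Avoid dividing by zero on a cluster that has served no queries.
        return None
    # Total time spent in queries divided by the number of queries.
    return search["query_time_in_millis"] / query_total
```

Both counters are cumulative, so this is an average over everything counted so far, not over a recent time window.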
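One possible approach, sketched here under the assumption that you poll `_stats` on a timer yourself, is to turn two successive `query_total` readings into a rate that can be plotted. The function name `queries_per_minute` and the idea of storing timestamps alongside the counter are assumptions, not something the endpoint provides.

```python
from typing import Optional


def queries_per_minute(
    prev_total: int, prev_ts: float, curr_total: int, curr_ts: float
) -> Optional[float]:
    """Query rate between two snapshots of the cumulative query_total counter.

    Timestamps are in seconds. Returns None when the counter went backwards,
    which would suggest it was reset between the two readings.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    delta = curr_total - prev_total
    if delta < 0:
        return None
    return delta * 60.0 / elapsed
```

For example, readings of 1000 and 1300 taken 30 seconds apart work out to 600 queries per minute.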
I'm working on a project to manage documents (e.g. create, read, maintain different versions) and my plan is to use the following AWS architecture.
When a document is created or updated, it will be saved to a versioning-enabled S3 bucket via an API Gateway S3 proxy. The S3 put event will trigger a Lambda that gets the latest version and all version IDs and saves them to DynamoDB. Once the record is saved in a DynamoDB table, it will be indexed in Elasticsearch via a DynamoDB stream.
My plan is to use Elasticsearch for all search queries, and to load the latest documents from DynamoDB. Since each record has the S3 version IDs, I can query old versions from S3 as well.
Since my architecture relies heavily on eventual consistency (S3 to DynamoDB and DynamoDB to Elasticsearch), I'm worried that I won't get the latest document data when I query Elasticsearch or DynamoDB right after I create a document.
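The first step of that Lambda could look roughly like the sketch below, which only pulls the bucket, key and version ID out of the S3 event notification. The function name `version_records` and the output field names are hypothetical. Listing all earlier versions and writing the item to DynamoDB are left out.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def version_records(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract one entry per object from an S3 event notification payload."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        obj = s3["object"]
        items.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in event notifications.
            "key": unquote_plus(obj["key"]),
            # versionId is only present when bucket versioning is enabled.
            "version_id": obj.get("versionId"),
            "event_time": record.get("eventTime"),
        })
    return items
```

Storing the version ID from the event itself, rather than looking up "the latest version" afterwards, avoids recording the wrong version when two uploads of the same key arrive close together.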
Any suggestions for improvements will be much appreciated.
Thanks!
As you said, your application architecture has multiple points where eventual consistency is used.
If your business case absolutely requires that every query returns the latest version, then this architecture is a poor fit and you should consider, for example, using a relational database on RDS for persistence instead.
If not, then you just design the rest of your system keeping in mind that getting a completed PUT does not guarantee that queries immediately return the data. Giving instructions on how to do this vastly depends on your application and cannot feasibly be generalized.
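As one illustration only (the right approach depends on the application), a caller that has just written a document and knows its version ID could poll the read side for a short time before giving up. Everything here is an assumption: the function name `wait_for_version`, the `version_id` field, and the `fetch` callable standing in for a DynamoDB or Elasticsearch lookup.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_version(
    fetch: Callable[[], Optional[Dict[str, Any]]],
    expected_version: str,
    attempts: int = 5,
    delay_seconds: float = 0.2,
) -> Optional[Dict[str, Any]]:
    """Poll fetch() until it returns the expected version or attempts run out.

    Returns the document once it is visible, or None if it never showed up,
    in which case the caller decides whether to show stale data or an error.
    """
    for attempt in range(attempts):
        doc = fetch()
        if doc is not None and doc.get("version_id") == expected_version:
            return doc
        if attempt < attempts - 1:
            # Wait before the next read, but not after the final attempt.
            time.sleep(delay_seconds)
    return None
```

Bounding the number of attempts keeps the request from hanging if a downstream step (the stream or the indexer) is delayed for longer than usual.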
Since you use a DynamoDB stream, your DynamoDB insert will reach your Elasticsearch server, but with a delay. In case of a write failure, it's up to the client to issue a retry.
You also have to keep in mind the time it takes to trigger the DynamoDB stream and the time Elasticsearch takes to index the document (plus the S3 event).
So your problem has more to do with the time it takes for the data to reach the Elasticsearch server.
If you want something more consistent that reflects the current state without any delay (since that is the problem you will end up with), you need to change the tools.
When should I use AWS Elasticsearch over AWS CloudSearch and vice versa?
Amazon Elasticsearch Service provides a fully managed implementation of Elasticsearch and Kibana. It is commonly used for near real-time visualization of log files (but can handle many use cases).
Amazon CloudSearch is based on Apache Solr. It requires data to be loaded as documents and is good for full-text search, with an understanding of languages and grammar (e.g. synonyms, words to ignore).
So, it really comes down to whether you want to use Elasticsearch or Solr.
Is there some way to purchase a product on Amazon via the API?
Currently I'm buying several products on a daily basis, where each product can be delivered to a different address, and each time I have to go through the checkout phase on Amazon (many clicks).
According to my searches (for example "Programmatically make Amazon purchase?"), it seems that there is no way to purchase a product via the API, and I understand the reasons for that.
However, I wonder if there is some other way to automate the process of ordering multiple products on Amazon.
Another way to approach it would be to automate the browser with Selenium. Of course, this would require updating the code every time the Amazon website changes.