Our setup is this: AWS services produce and publish logs to CloudWatch. From there we use the standard Lambda function to push the logs to the AWS Elasticsearch Service.
The Lambda function pushes the logs to ES using the index name format cloudwatch-logs-<date>, which creates a new index every day.
We have an issue with the mapping of the data. For example, when a service (e.g. Aurora DB) publishes its first set of logs and the CPU field value is 0, ES maps that field as a long. When the same service publishes a second set of logs with CPU set to 10.5, ES rejects that data with the error mapper cannot change type [long] to [float].
We have a lot of services publishing logs with a lot of data sets. Is the best way to resolve this to have the Lambda push the logs under the single index name cloudwatch-logs, so only one index is created, and then manually fix the mapping for that index? Or is there a better way to resolve this?
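One way to go down the "fix the mapping" route without giving up daily indices is an index template, so every new cloudwatch-logs-* index inherits an explicit mapping for the conflicting fields. A minimal Python sketch, assuming a hypothetical domain endpoint and that CPU is the problem field; it also assumes the domain's access policy allows requests from where you run it (otherwise the request must be SigV4-signed), and on ES versions that still use mapping types the properties would need to be nested under the document type:

```python
import requests

# Hypothetical endpoint; replace with your AWS Elasticsearch domain.
ES_ENDPOINT = "https://my-es-domain.us-east-1.es.amazonaws.com"

template = {
    # Applied to every daily index the Lambda creates.
    "index_patterns": ["cloudwatch-logs-*"],  # on ES 5.x this key is "template"
    "mappings": {
        "properties": {
            # Pin the numeric field to a floating-point type so an initial
            # value of 0 does not lock it to "long".
            "CPU": {"type": "double"}
        }
    },
}

resp = requests.put(f"{ES_ENDPOINT}/_template/cloudwatch-logs", json=template, timeout=10)
resp.raise_for_status()
print(resp.json())
```

Existing indices keep their old mapping; the template only takes effect for indices created after it is installed.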
I have several Lambda functions deployed on AWS that I want to monitor directly for errors, so that I can update a PostgreSQL table with them.
I have created a Lambda to parse the streamed log data and update the DB. I want to set up subscription filters between this Lambda and my other functions' logs.
There are 6 log streams I want to monitor, and the AWS Console limits subscription filters to 2 per log group.
Is there a workaround or a better way to implement this kind of monitoring?
Thanks
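For reference, this is the API being discussed: a minimal boto3 sketch that attaches one filter per monitored log group, all pointing at the same parsing Lambda. The group names, destination ARN and filter pattern below are hypothetical, and the destination Lambda also needs a resource-based permission allowing logs.amazonaws.com to invoke it.

```python
import boto3

logs = boto3.client("logs")

# Hypothetical names: the six log groups to monitor and the ARN of the Lambda
# that parses the streamed events and updates PostgreSQL.
LOG_GROUPS = [f"/aws/lambda/my-function-{i}" for i in range(1, 7)]
DESTINATION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:log-parser"

for group in LOG_GROUPS:
    logs.put_subscription_filter(
        logGroupName=group,
        filterName="errors-to-parser",
        filterPattern="?ERROR ?Error ?error",  # forward only error-looking events
        destinationArn=DESTINATION_ARN,
    )
```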
I am using Google IoT Core and Pub/Sub services for my IoT devices. I am publishing data via Pub/Sub to the database, but I think it's quite expensive to store every reading in the database. I have some data, like whether the device is on or off, and a configuration file with some parameters which I need to process my IoT payload. Now I am not able to understand whether the configuration and state topics in IoT Core are expensive or not. How long is the data stored in the config topic, and is it feasible to publish to the config topic whenever a parameter changes in the config file? And what if I publish the state of a device (whether it is online or not) every 3 seconds or more into the state topic?
You are mixing different things. There is Cloud IoT, where you have a device registry with metadata, configurations and states. You also have a Pub/Sub topic in which you can publish messages about the IoT payload, which can contain configuration data (I assume that is what you mean in this sentence: "it publish that data into config topic").
In short, it's simple.
All the management operations on Cloud IoT are free (device registration, configuration, metadata, ...). There is no limitation and no duration limit. The only limits that exist are the quotas on rate and configuration size.
The inbound and outbound traffic to and from the IoT devices is billed as described here.
If you use Pub/Sub for pushing your messages, Cloud Functions (or Cloud Run, or another compute option), and a database (Cloud SQL or Datastore/Firestore), all these services are billed as usual; there is no relation to the Cloud IoT service and its billing. The constraints of each service apply as in regular usage. For example, a Pub/Sub message lives for up to 7 days (by default) in a subscription, or until it is acknowledged.
EDIT
OK, got it; I took some time to understand what you wanted to achieve.
The state is designed for getting the internal representation of the devices, but the current limitations don't allow you to update it automatically when you receive a message.
You have 2 solutions:
Either update your devices so they send a state message only when the state actually changes (it's for this kind of use case that the feature is designed!).
Or let the device publish the messages every 3 seconds, but to the event Pub/Sub topic. Get the events in a function that gets the state list, takes the first one (the most recent) and compares its value with the Pub/Sub message. If they differ, update the state. This workflow also works with an external database like Datastore or Firestore (see the sketch below).
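A minimal sketch of the second option using an external store (Firestore) instead of the Cloud IoT state list, written as a Pub/Sub-triggered Cloud Function; the collection name, payload shape and use of the deviceId attribute are assumptions:

```python
import base64
import json

from google.cloud import firestore

db = firestore.Client()


def on_device_event(event, context):
    """Triggered by the event Pub/Sub topic the devices publish to every 3 seconds."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    device_id = event["attributes"]["deviceId"]   # attribute added by Cloud IoT on events
    reported_state = payload.get("state")          # e.g. "online" / "offline" (assumed field)

    doc_ref = db.collection("device-states").document(device_id)
    snapshot = doc_ref.get()
    current = snapshot.to_dict().get("state") if snapshot.exists else None

    # Only write when the state actually changed, so a 3-second heartbeat
    # does not turn into a database write every 3 seconds.
    if current != reported_state:
        doc_ref.set({"state": reported_state}, merge=True)
```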
I have a bit of a mysterious issue: I have a Lambda function which transports data from an S3 bucket to an AWS ES cluster.
My Lambda function runs correctly and reports the following:
All 6 log records added to ES
However, the added documents do not appear in the AWS Elasticsearch index:
/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open logs 3N2O9CqhSwCP6sj1QK5EQw 5 1 0 0 1.2kb 1.2kb
I'm using this Lambda function: https://github.com/aws-samples/amazon-elasticsearch-lambda-samples/blob/master/src/s3_lambda_es.js
The Lambda function's role has full permissions to the ES cluster and the S3 bucket. It can access the S3 bucket, because I can print out its contents to the Lambda's console log.
Any ideas for further debugging are much appreciated!
Cheers
There can be many reasons for this. Since you are asking for debugging ideas, here are a couple of them:
Add a console.log in the postDocumentToES method of the Lambda that shows where exactly it connects.
Try to extract the code from the Lambda and run it locally, just to make sure it succeeds in sending to Elasticsearch (so that at least the code is correct); a rough sketch follows below.
Make sure that there are no "special restrictions" on the index (like a TTL of a couple of minutes or something), or maybe something else that doesn't allow inserting into the index.
How many ES servers do you have? Maybe there is a cluster of them and replication is not configured correctly, so when you check the state of the index on one ES node it doesn't actually have the documents, but another node could have them.
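For the second point, a rough local test of the insert path, assuming a hypothetical endpoint and that the domain's access policy allows requests from your IP (otherwise the request has to be SigV4-signed); on older ES versions the path may need an explicit type such as /logs/lambda-type instead of /logs/_doc:

```python
import requests

# Hypothetical endpoint; replace with your AWS Elasticsearch domain.
ES_ENDPOINT = "https://my-es-domain.us-east-1.es.amazonaws.com"

doc = {"message": "test record", "@timestamp": "2018-03-20T12:00:00Z"}

resp = requests.post(f"{ES_ENDPOINT}/logs/_doc", json=doc, timeout=10)
# A 2xx response with "result": "created" means the write reached the index;
# anything else points at permissions, the endpoint, or the index itself.
print(resp.status_code, resp.text)
```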
So, I created the CloudWatch index and am streaming the CloudWatch logs to Elasticsearch, and I am seeing data; however, I am only seeing the current date's data. I don't see the old logs in Elasticsearch which are in the same log group in CloudWatch. I changed the date filter in Elasticsearch, but don't see any change. Any idea why?
The index name created is cwl-2018.03.20.
That's the expected behavior. The streaming of logs from CloudWatch to Elasticsearch relies on a feature called subscription filters, which only forwards new data to the destination.
How do I integrate AWS RDS with the AWS Elasticsearch service? Is there any AWS service I can use to stream data from AWS RDS to AWS Elasticsearch for indexing?
I'm not seeing a ready-made way to do this for RDS, the way there is for DynamoDB.
I can think of three ways.
Set up your RDS instance to log all transactions, and set up Logstash to parse any inserts and updates and push them to ES.
Create a special log file that your app uses to record the inserts and updates. It's less work to set up Logstash this way.
Make your app send all inserts and updates through SNS. From there, distribute them to an ES SQS queue and an RDS SQS queue, and have workers (or Lambdas) for each queue do the inserts into their respective stores (a sketch follows).
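A minimal sketch of the publishing side of the third option, assuming a hypothetical topic ARN and payload shape; the ES and RDS SQS queues would be subscribed to this topic, and each worker consumes its own queue:

```python
import json

import boto3

sns = boto3.client("sns")

# Hypothetical topic; both the ES queue and the RDS queue subscribe to it.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-changes"


def publish_change(table, operation, row):
    """Called by the application for every insert/update it performs."""
    sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps({"table": table, "op": operation, "row": row}),
    )


publish_change("users", "insert", {"id": 42, "name": "alice"})
```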