log producer docker container - amazon-web-services

I am trying to do performance testing with AWS FireLens. I have a JSON file with 10 sample log messages, and I want to produce Docker logs at a set rate, e.g. 10,000 log messages/sec, from a Docker container whose output will be consumed by the AWS FireLens log collector.
Are there any open source projects that already do this? Can anyone help with creating this container?

You can try this: https://github.com/mehiX/log-generator
I made it for this purpose, so if there is anything you can't do, let me know and I can hopefully fix it.
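If you would rather roll your own, here is a minimal Python sketch of such a generator. It assumes the 10 samples live in a file called samples.json (a JSON array of objects, both names are placeholders) and simply replays them to stdout at a fixed rate, which is what the Docker logging driver, and therefore FireLens, picks up; the rate-keeping is approximate.

    import itertools
    import json
    import sys
    import time

    RATE = 10_000              # target messages per second
    SAMPLES = "samples.json"   # hypothetical file holding the 10 sample messages

    with open(SAMPLES) as f:
        messages = json.load(f)

    cycle = itertools.cycle(messages)
    while True:
        start = time.monotonic()
        for _ in range(RATE):
            sys.stdout.write(json.dumps(next(cycle)) + "\n")
        sys.stdout.flush()
        # sleep out whatever is left of the second to hold the rate roughly steady
        time.sleep(max(0.0, 1.0 - (time.monotonic() - start)))

Packaged in a small image with this script as the entrypoint, you can point the task's log driver at awsfirelens and adjust RATE per test run.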

Related

How can I push text log files into Cloud Logging?

I have an application (Automation Anywhere A360) that logs everything I want to log into a txt/csv file. I run an Automation Anywhere process concurrently on 10 bot runners (Windows VMs), so each bot runner logs what is going on locally.
My intention is that instead of having separate log files for each bot runner, I'd like to have a centralized place where I store all the logs (i.e. Cloud Logging).
I know that this can be accomplished using Python, Java, etc. However, if I invoke a Python script every time I need to log something into Cloud Logging, it does the job, but each call takes around 2-3 seconds, most of which is spent creating the GCP client and authenticating. I think this is a bit slow.
How would you tackle this?
The solution I was looking for is something like this: it is named BindPlane, and it can collect log data from on-premises and hybrid infrastructure and send it to the GCP monitoring/logging stack.
To whom it may (still) concern: you could use fluentd to forward logs to Pub/Sub and from there to a Cloud Logging bucket.
https://flugel.it/infrastructure-as-code/how-to-setup-fluentd-to-retrieve-logs-send-them-to-gcp-pub-sub-to-finally-push-them-to-elasticsearch/
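If the per-invocation startup cost is the main concern, another option (a sketch only, not a drop-in solution) is a single long-running Python process per bot runner that tails the local log file and forwards new lines with one persistent google-cloud-logging client, so the 2-3 second client setup is paid only once per runner. The file path and log name below are placeholders.

    import time
    from google.cloud import logging as gcp_logging

    LOG_FILE = "C:/bots/output.log"     # hypothetical path of the bot's local log file
    client = gcp_logging.Client()       # created once; authentication happens here, not per message
    logger = client.logger("a360-bot-runner")

    with open(LOG_FILE, "r") as f:
        f.seek(0, 2)                    # start tailing from the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)           # wait for the bot to write more
                continue
            logger.log_text(line.rstrip(), severity="INFO")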

Google Cloud Run service deployment, is it the best direction in my situation?

I have some experience with Google Cloud Functions (CF). I tried to deploy a function recently with a Python app, but it uses an NLP model, so the 8GB memory limit is exceeded when the model runs. The function is triggered when a JSON file is uploaded to a bucket.
So, I plan to try Google Cloud Run but I have no experience with it. Also, I am not completely sure if it is the best course of action.
If it is, what is the best way of implementing it, given that the Cloud Run service should be triggered by a file uploaded to a bucket? In Cloud Functions you can select the triggering event; in Cloud Run I didn't see anything like that. I could use some starting points, as I couldn't find my case in the GCP documentation.
Any help will be appreciated.
You can use at least these two things:
The legacy one: create a GCS notification to Pub/Sub, then create a push subscription and set the Cloud Run URL as the HTTP push destination.
A more recent way is to use Eventarc to invoke a Cloud Run endpoint directly from an event (it roughly creates the same thing, a Pub/Sub topic and push subscription, but it's fully configured for you).
EDIT 1
When you use push notifications, you will receive a standard Pub/Sub message. The format is described in the documentation, both for the attributes and for the body content; keep in mind that the raw content is base64 encoded and you have to decode it to get the final format.
I personally have a Cloud Run service that logs the contents of any request, so I can see in the logs all the data I need for development. When I have a new message format, I configure the push to that Cloud Run endpoint and automatically get the format.
For Eventarc, the format will be added to the UI soon (I saw that feature in preview, but it's not yet available). For now, the best solution is to log the content so you know what you receive and what to do with it!
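To illustrate the decoding step, here is a rough sketch of a Cloud Run handler (Flask, since Cloud Run just needs an HTTP server) that unwraps a Pub/Sub push message carrying a GCS notification. The attribute and field names follow the GCS-to-Pub/Sub notification format; everything else is a placeholder for your own processing.

    import base64
    import json
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/", methods=["POST"])
    def handle_push():
        envelope = request.get_json()
        msg = envelope.get("message", {})
        # GCS notifications carry the bucket/object both as attributes and as a
        # base64-encoded JSON body (the object metadata).
        attrs = msg.get("attributes", {})
        payload = {}
        if msg.get("data"):
            payload = json.loads(base64.b64decode(msg["data"]).decode("utf-8"))
        bucket = attrs.get("bucketId") or payload.get("bucket")
        name = attrs.get("objectId") or payload.get("name")
        print(f"New object: gs://{bucket}/{name}")
        # ...download the JSON file and run the NLP model here...
        return ("", 204)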

AWS service for doing jobs

I have the following need - the code needs to call some APIs, get some data, and store them in a database (flat file will do for our purpose). As the APIs give access to a huge number of records, we want to split it into 30 parts, each part scraping a certain section of the data from the APIs. We want these 30 scrapers to run in 30 different machines - and for that, we have got a Python program that does the following:
Call the API, get the data, based on parameters (which part of the API to call)
Dump it to the local flatfile.
And then later, we will merge the output from the 30 files into one giant DB.
The question is: which AWS tool should we use for this? We could use EC2 instances, but then we would have to keep an EC2 console open on our desktop for each machine we connect to in order to run the Python program, and it is not feasible to keep 30 connections open on my laptop. It is also very complicated to get remote desktop on those machines, so logging in, starting the job and then disconnecting is not feasible either.
What we want is this - start the tasks (one each on 30 machines), let them run and finish by themselves, and if possible notify me (or I can myself check for health periodically).
Can anyone guide me which AWS tool suits our purpose, and how?
"We can use EC2 instance, but we have to keep the EC2 console open on
our desktop where we connect to it to run the Python program"
That just means you are running the script wrong, and you need to look into running it as a service.
In general you need to look into queueing up these tasks in SQS and then triggering either EC2 auto-scaling or Lambda functions, depending on whether your script can run within the Lambda runtime restrictions.
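For the SQS route, a rough sketch of what each worker instance could run: it long-polls a queue whose messages carry a part number, processes that part, and then deletes the message. The queue URL, message shape, and run_part body are assumptions for illustration, not a prescribed design.

    import json
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/scrape-tasks"  # hypothetical queue

    def run_part(part):
        # Call the API for this slice of the data and dump it to a local flat file.
        print(f"scraping part {part}")

    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            part = json.loads(msg["Body"])["part"]
            run_part(part)
            # Only acknowledge the task once it has finished successfully.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])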
This seems like a good application for Step Functions. Step Functions allow you to orchestrate multiple Lambda functions, Glue jobs, and other services into a business process. You could write Lambda functions that call the API endpoints and store the results in S3. Once all the data is gathered, your step function could trigger a Lambda function, Glue job, or something else that processes the data into your database. Step Functions help with error handling and retries and allow easy monitoring of your process.
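If your script fits within Lambda's limits, the per-part worker could look roughly like the sketch below, invoked once per item by a Step Functions Map state (so the 30 parts run in parallel). The API URL, bucket name, and event shape are placeholders.

    import urllib.request
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "scrape-results"                       # hypothetical results bucket
    API_URL = "https://api.example.com/records"     # hypothetical API endpoint

    def handler(event, context):
        # A Step Functions Map state passes one item per invocation, e.g. {"part": 7}.
        part = event["part"]
        with urllib.request.urlopen(f"{API_URL}?part={part}") as resp:
            data = resp.read()
        key = f"parts/part-{part}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=data)
        # The returned keys can feed a final merge step into the database.
        return {"part": part, "s3_key": key}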

Newman/postman results to prometheus for automated tests

I've been tasked with automating Postman smoke tests to run every x minutes in a Kubernetes cluster and pushing the results into Prometheus, later visualized by Grafana, with alerts pushed to a Mattermost channel.
I've created a custom Docker image based on Alpine with newman and other packages (I didn't use the newman Docker image because I can't add whatever I want to it), copied all my collections and environment files into the image as well, and packed the newman run command into the Dockerfile (otherwise it did not work when invoked from the pod definition YAML in Kubernetes). All that is necessary is to run the container, and it creates a report in the /newman folder inside the container.
I've created a Kubernetes CronJob to run the container; it runs and reaches the Completed state. If I keep the container alive with a loop command, I can log in and verify that the results are there (and they are). Since this job is ephemeral and Prometheus won't have time to scrape it, I'm trying to push the results into the Prometheus pushgateway (which I've deployed for this reason). I'm trying to curl the results into it (the command is also defined in the Dockerfile), something like cat myreport.xml | curl --data-binary @- push-prometheus-pushgateway:9091/metrics/job/newman
However, here is the problem: I cannot get the results into a format that the Prometheus pushgateway will accept in any meaningful way. I did not find any custom 'reporter' that would suit this purpose either. Currently I'm using the junit reporter, but I did not manage to sed/awk the output into something the pushgateway can digest that also makes practical sense...
Has anyone done something similar in the past and had some success with it?
Many thanks in advance!
You may try Postman exporter for Prometheus: https://github.com/hudeldudel/postman-exporter
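If you want to stay with the pushgateway approach, one workable pattern is to convert the JUnit report into Prometheus text exposition format yourself and POST it, instead of fighting with sed/awk. A minimal sketch (assuming the report is written to /newman/report.xml and the pushgateway is reachable under the service name from the question):

    import urllib.request
    import xml.etree.ElementTree as ET

    REPORT = "/newman/report.xml"   # assumed path of the junit report inside the container
    GATEWAY = "http://push-prometheus-pushgateway:9091/metrics/job/newman"

    root = ET.parse(REPORT).getroot()
    # newman's junit reporter nests <testsuite> elements under <testsuites>.
    suites = list(root) if root.tag == "testsuites" else [root]
    totals = {k: sum(int(s.get(k, 0)) for s in suites)
              for k in ("tests", "failures", "errors")}

    # Plain exposition format: one untyped metric per counter, newline-terminated.
    body = "".join(f"newman_{k}_total {v}\n" for k, v in totals.items())
    req = urllib.request.Request(GATEWAY, data=body.encode(), method="POST")
    urllib.request.urlopen(req)

Running something like this as the last step of the container command, after newman has produced the report, gives the pushgateway metrics that Prometheus can scrape and Grafana can alert on.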

How to retrieve only the logs I care about from AWS Beanstalk

My Issue
I just deployed my first application to AWS Elastic Beanstalk. I have logging in my application using Logback. When I download all logs from AWS, I get a huge bundle of files.
Not only that, but it is pretty annoying to log in, navigate to my instance, download a big zip file, extract it, navigate to my log, open it, then parse for the info I want.
The Question
I really only care about a single one of the log files on AWS - the one I set up my application to create.
What is the easiest way to view only the log file I care about? The best solution would display only that one log file in a web console somewhere, but I don't know if that is possible in AWS. If not, what is the closest I can get?
You can use the EB console to display logs, or the eb logs command-line tool. By default, each will only show the last 100 lines of each log file. You could also script ssh or scp to just retrieve a single log file.
However, the best solution is probably to publish your application log file to a service like Papertrail or Loggly. If and when you move to a clustered environment, retrieving and searching log files across multiple machines will be a headache unless you're aggregating your logs somehow.
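If you would like to script the retrieval without ssh, one option (a sketch, not the only mechanism) is the Elastic Beanstalk tail-log API via boto3: you request the tail, then retrieve a pre-signed URL to the bundled last-100-lines snapshot, and filter for the file you care about. The environment name below is a placeholder.

    import time
    import urllib.request
    import boto3

    eb = boto3.client("elasticbeanstalk")
    ENV = "my-env"   # hypothetical environment name

    eb.request_environment_info(EnvironmentName=ENV, InfoType="tail")
    time.sleep(15)   # give the instances a moment to publish their tails
    info = eb.retrieve_environment_info(EnvironmentName=ENV, InfoType="tail")
    for item in info["EnvironmentInfo"]:
        # Message is a pre-signed S3 URL pointing at that instance's bundled tail logs.
        tail = urllib.request.urlopen(item["Message"]).read().decode()
        print(tail)

From there you can grep down to just your application's Logback file; for anything more convenient, the log-aggregation services mentioned above remain the better long-term answer.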