Find strings in a log file for Zabbix monitoring - regex

I need to find strings in a log file with a regex and later send the output to a Zabbix monitoring server to fire triggers if needed.
For example, here is part of the log file:
===== Backup Failures =====
Description: Checks number of studies that their backup failed
Status: OK , Check Time: Sun Oct 30 07:31:13 2022
Details: [OK] 0 total backup commands failed during the last day.
===== Oracle queues =====
Description: Count Oracle queues sizes. The queues are used to pass information between the applications
Status: OK , Check Time: Sun Oct 30 07:31:04 2022
Details: [OK] All queues have less than 15 elements.
===== Zombie Services =====
Description: Checks for zombie services
Status: Error , Check Time: Sun Oct 30 07:31:30 2022, Script: <check_mvs_services.pl>
Details: [CRITICAL] 1 missing process(es) found. Failed killing 1 process(es)
===== IIS Application Pools Memory Usage =====
Description: Checks the memory usage of the application pools that run under IIS (w3wp.exe)
Status: OK , Check Time: Sun Oct 30 07:32:30 2022
Details: [OK] All processes of type w3wp.exe don't exceed memory limits
===== IIS Web Response =====
Description: Checks that the web site responds properly
Status: OK , Check Time: Sun Oct 30 07:32:34 2022
Details: [OK] All addresses returned 200
I need to find all the monitored items and their results.
If a result is not OK, a Zabbix trigger should send an alarm.
I found that Zabbix can handle log file monitoring with a command similar to the one below, but first I need to find the strings in the log file:
log[/path/to/the/file,"regex expression",,,,]
In this example, I believe Zabbix should find these items:
===== Backup Failures =====
Details: [OK] 0 total backup commands failed during the last day.
===== Oracle queues =====
Details: [OK] All queues have less than 15 elements.
===== Zombie Services =====
Details: [CRITICAL] 1 missing process(es) found. Failed killing 1 process(es)
===== IIS Application Pools Memory Usage =====
Details: [OK] All processes of type w3wp.exe don't exceed memory limits
===== IIS Web Response =====
Details: [OK] All addresses returned 200
Can you advise how to achieve this?
Any help would be greatly appreciated. Thanks in advance.
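A sketch of the matching in Python (for illustration only; the Zabbix `log[]` key itself only needs the regex). Each section header and its Details line can be captured in one multiline pattern; the sample below reuses two of the five sections shown above:

```python
import re

# Sample of the monitoring log from the question (two of the five sections)
log_text = """\
===== Zombie Services =====
Description: Checks for zombie services
Status: Error , Check Time: Sun Oct 30 07:31:30 2022, Script: <check_mvs_services.pl>
Details: [CRITICAL] 1 missing process(es) found. Failed killing 1 process(es)
===== IIS Web Response =====
Description: Checks that the web site responds properly
Status: OK , Check Time: Sun Oct 30 07:32:34 2022
Details: [OK] All addresses returned 200
"""

# Capture each "===== name =====" header together with its Details line
pattern = re.compile(
    r"^===== (?P<item>.+?) =====.*?"
    r"^Details: \[(?P<status>\w+)\] (?P<details>[^\n]*)",
    re.MULTILINE | re.DOTALL,
)

def parse_checks(text):
    """Return (item, status, details) for every section in the log."""
    return [(m["item"], m["status"], m["details"]) for m in pattern.finditer(text)]

for item, status, details in parse_checks(log_text):
    print(f"{item}: [{status}] {details}")
```

For the Zabbix item itself, something like `log[/path/to/the/file,"^Details: \[(CRITICAL|WARNING|Error)\]",,,skip]` would match only the non-OK Details lines; the `skip` mode and the exact list of bad statuses are assumptions you would adapt to your checks.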

Related

Can't upload mikrotik-chr image on Google Cloud

I started creating a mikrotik-chr image from my bucket, but it always fails with an error, and I don't know how to fix it:
[inflate.import-virtual-disk]: 2021-08-16T05:39:39Z CreateInstances: Creating instance "inst-importer-inflate-6t2qt".
[inflate]: 2021-08-16T05:39:46Z Error running workflow: step "import-virtual-disk" run error: operation failed &{ClientOperationId: CreationTimestamp: Description: EndTime:2021-08-15T22:39:46.802-07:00 Error:0xc00007b770 HttpErrorMessage:SERVICE UNAVAILABLE HttpErrorStatusCode:503 Id:1873370325760361715 InsertTime:2021-08-15T22:39:40.692-07:00 Kind:compute#operation Name:operation-1629092379433-5c9a6a095186f-620afe4b-ba26ba50 OperationGroupId: OperationType:insert Progress:100 Region: SelfLink:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a/operations/operation-1629092379433-5c9a6a095186f-620afe4b-ba26ba50 StartTime:2021-08-15T22:39:40.692-07:00 Status:DONE StatusMessage: TargetId:6947401086746772724 TargetLink:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a/instances/inst-importer-inflate-6t2qt User:606260965808#cloudbuild.gserviceaccount.com Warnings:[] Zone:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a ServerResponse:{HTTPStatusCode:200 Header:map[Cache-Control:[private] Content-Type:[application/json; charset=UTF-8] Date:[Mon, 16 Aug 2021 05:39:46 GMT] Server:[ESF] Vary:[Origin X-Origin Referer] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[0]]} ForceSendFields:[] NullFields:[]}:
Code: ZONE_RESOURCE_POOL_EXHAUSTED
Message: The zone 'projects/circular-jet-322614/zones/asia-southeast2-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
[inflate]: 2021-08-16T05:39:46Z Workflow "inflate" cleaning up (this may take up to 2 minutes).
[inflate]: 2021-08-16T05:39:48Z Workflow "inflate" finished cleanup.
[import-image]: 2021-08-16T05:39:48Z Finished creating Google Compute Engine disk
[import-image]: 2021-08-16T05:39:49Z step "import-virtual-disk" run error: operation failed &{ClientOperationId: CreationTimestamp: Description: EndTime:2021-08-15T22:39:46.802-07:00 Error:0xc00007b770 HttpErrorMessage:SERVICE UNAVAILABLE HttpErrorStatusCode:503 Id:1873370325760361715 InsertTime:2021-08-15T22:39:40.692-07:00 Kind:compute#operation Name:operation-1629092379433-5c9a6a095186f-620afe4b-ba26ba50 OperationGroupId: OperationType:insert Progress:100 Region: SelfLink:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a/operations/operation-1629092379433-5c9a6a095186f-620afe4b-ba26ba50 StartTime:2021-08-15T22:39:40.692-07:00 Status:DONE StatusMessage: TargetId:6947401086746772724 TargetLink:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a/instances/inst-importer-inflate-6t2qt User:606260965808#cloudbuild.gserviceaccount.com Warnings:[] Zone:https://www.googleapis.com/compute/v1/projects/circular-jet-322614/zones/asia-southeast2-a ServerResponse:{HTTPStatusCode:200 Header:map[Cache-Control:[private] Content-Type:[application/json; charset=UTF-8] Date:[Mon, 16 Aug 2021 05:39:46 GMT] Server:[ESF] Vary:[Origin X-Origin Referer] X-Content-Type-Options:[nosniff] X-Frame-Options:[SAMEORIGIN] X-Xss-Protection:[0]]} ForceSendFields:[] NullFields:[]}: Code: ZONE_RESOURCE_POOL_EXHAUSTED; Message: The zone 'projects/circular-jet-322614/zones/asia-southeast2-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
ERROR
ERROR: build step 0 "gcr.io/compute-image-tools/gce_vm_image_import:release" failed: step exited with non-zero status: 1
You will need to check whether you have enough CPU and other resource quota in 'projects/circular-jet-322614/zones/asia-southeast2-a'. The resource requirements can be found in the deployment specs of the workload.
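As the ZONE_RESOURCE_POOL_EXHAUSTED message suggests, retrying in a different zone is the usual workaround. A minimal retry-loop sketch (the zone list and the `run_import` callable are assumptions; in practice `run_import` would wrap whatever gcloud/API call launches the import):

```python
# Fallback zones to try in order; adjust to zones your project can use
FALLBACK_ZONES = ["asia-southeast2-a", "asia-southeast2-b", "asia-southeast1-a"]

class ZoneExhausted(Exception):
    """Raised when a zone has no capacity (ZONE_RESOURCE_POOL_EXHAUSTED)."""

def import_with_fallback(run_import, zones=FALLBACK_ZONES):
    """Try the import in each zone until one has capacity."""
    last_error = None
    for zone in zones:
        try:
            return run_import(zone)   # success: stop here
        except ZoneExhausted as exc:
            last_error = exc          # this zone is full; try the next one
    raise RuntimeError(f"all zones exhausted: {last_error}")
```

If you use `gcloud compute images import`, its `--zone` flag should control where the temporary import instances run, so each retry would simply pass a different value.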

spring-cloud-dataflow stream oracle2hdfs does not deploy properly (Ambari)

Thanks in advance for your help.
I deployed spring-cloud-dataflow-server-yarn with Ambari, but when I start a stream composed of jdbc-source-kafka and hdfs-sink-kafka, the stream does not run.
The stream's config:
jdbc-source-kafka --max-rows-per-poll=10
--query='select t.* from clear_order_report t'
--password=******
--driver-class-name=oracle.jdbc.OracleDriver
--username=****** --url=jdbc:oracle:thin:@10.48.171.21:1521:******
| hdfs-sink-kafka
--fs-uri=hdfs://master.99wuxian.com:8020
--file-name=clear_order_report
--directory=/dataflows/apps/top
I also repackaged jdbc-source-kafka-10-1.1.1.RELEASE.jar to add the Oracle JDBC driver.
The YARN log is below:
Application Overview
User: scdf
Name: scdstream:app:oracle2hdfs
Application Type: DATAFLOW
Application Tags:
Application Priority: 0 (Higher Integer value indicates higher priority)
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Queue: default
FinalStatus Reported by AM: Application has not completed yet.
Started: Thursday, February 09 17:38:33 +0800 2017
Elapsed: 21hrs, 18mins, 27sec
Tracking URL: ApplicationMaster
Log Aggregation Status NOT_START
Diagnostics: [Thursday, February 09 17:38:34 +0800 2017] Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:1024, vCores:1>; Queue Resource Limit for AM = <memory:1280, vCores:1>; User AM Resource Limit of the queue = <memory:1280, vCores:1>; Queue AM Resource Usage = <memory:1024, vCores:1>;
Unmanaged Application: false
Application Node Label expression: <Not set>
AM container Node Label expression: <DEFAULT_PARTITION>
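The diagnostics above explain the hang: the queue's AM resource limit is 1280 MB, of which 1024 MB is already in use, so the new 1024 MB ApplicationMaster cannot be activated. One way to give AMs more headroom under the Capacity Scheduler is to raise `yarn.scheduler.capacity.maximum-am-resource-percent` in capacity-scheduler.xml (the value 0.5 here is only an example; the right fraction depends on your cluster):

```xml
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <!-- fraction of cluster resources that ApplicationMasters may use -->
  <value>0.5</value>
</property>
```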

ALTER DATABASE - Cannot process request. Not enough resources to process request.

I am working to automate some of my performance tests on Azure SQL Data Warehouse. I had been scaling the databases up and down using the Azure portal. I read at https://msdn.microsoft.com/en-us/library/mt204042.aspx that it is possible to accomplish this via T-SQL:
ALTER DATABASE ...
My first attempt using T-SQL failed:
RunScript:INFO: Mon Feb 6 20:11:06 UTC 2017 : Connecting to host "logicalserver.database.windows.net" database "master" as "myuser"
RunScript:INFO: stdout from sqlcmd will follow...
ALTER DATABASE my_db MODIFY ( SERVICE_OBJECTIVE = 'DW1000' ) ;
Msg 49918, Level 16, State 1, Server logicalserver, Line 1
Cannot process request. Not enough resources to process request. Please retry your request later.
RunScript:INFO: Mon Feb 6 20:11:17 UTC 2017 : Return Code = "1" from host "logicalserver.database.windows.net" database "master" as "myuser"
RunScript:INFO: stdout from sqlcmd has ended ^.
I immediately went to the Azure portal, requested a scale, and it worked (taking 10 minutes to complete).
Is there any explanation?
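Error 49918 is transient: capacity for the scale operation was momentarily unavailable, and the later portal request simply landed at a better time. A hedged sketch of a retry loop you could wrap around the sqlcmd call (`submit_scale`, the attempt count, and the timings are all assumptions):

```python
import time

# Error number Azure returns for "Not enough resources to process request"
TRANSIENT_ERROR = 49918

def scale_with_retry(submit_scale, attempts=5, delay=60, sleep=time.sleep):
    """Retry the scale request on error 49918 with a growing wait.

    submit_scale() returns 0 on success, or the SQL error number otherwise.
    Returns the attempt number that succeeded.
    """
    for attempt in range(1, attempts + 1):
        code = submit_scale()
        if code == 0:
            return attempt
        if code != TRANSIENT_ERROR or attempt == attempts:
            raise RuntimeError(f"scale failed with error {code}")
        sleep(delay * attempt)   # wait longer before each retry
```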

OpenShift HAproxy issues

I am using openshift-django17 to bootstrap my application on OpenShift. Before I moved to Django 1.7, I was using the author's previous repository, openshift-django16, and I did not have the problem I will describe next. After running successfully for approximately 6 hours, I get the following error:
Service Temporarily Unavailable The server is temporarily unable to
service your request due to maintenance downtime or capacity problems.
Please try again later.
After I restart the application, it works without any problem for some hours, and then I get this error again. The gears should never enter idle mode, as I am posting data every 5 minutes through a RESTful POST API from outside the app. I have run the rhc tail command, and I think the error lies in HAproxy:
==> app-root/logs/haproxy.log <==
[WARNING] 081/155915 (497777) : config : log format ignored for proxy 'express' since it has no log address.
[WARNING] 081/155915 (497777) : Server express/local-gear is DOWN, reason: Layer 4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 081/155915 (497777) : proxy 'express' has no server available!
[WARNING] 081/155948 (497777) : Server express/local-gear is UP, reason: Layer7 check passed, code: 200, info: "HTTP status check returned code 200", check duration: 11ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 081/170359 (127633) : config : log format ignored for proxy 'stats' since it has no log address.
[WARNING] 081/170359 (127633) : config : log format ignored for proxy 'express' since it has no log address.
[WARNING] 081/170359 (497777) : Stopping proxy stats in 0 ms.
[WARNING] 081/170359 (497777) : Stopping proxy express in 0 ms.
[WARNING] 081/170359 (497777) : Proxy stats stopped (FE: 1 conns, BE: 0 conns).
[WARNING] 081/170359 (497777) : Proxy express stopped (FE: 206 conns, BE: 312 co
I also run a CRON job once a day, but I am 99% sure it has nothing to do with this. It looks like a problem on the OpenShift side, right? I posted this issue on the GitHub repository of the author, and he suggested I try Stack Overflow.
It turned out this was due to a bug in openshift-django17 that set DEBUG in settings.py to True even though it was specified as False in the environment variables (pull request for the fix here). The reason 503 Service Temporarily Unavailable appeared was OpenShift memory limit violations caused by DEBUG being turned on, as stated in the Django settings documentation for DEBUG:
It is also important to remember that when running with DEBUG turned on, Django will remember every SQL query it executes. This is useful when you’re debugging, but it’ll rapidly consume memory on a production server.
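This bug class is worth guarding against in any settings.py: environment variables are strings, so the string 'False' is truthy if used directly. A minimal sketch of safe parsing (the helper name is my own; adapt the accepted values to taste):

```python
import os

def env_flag(name, default="False"):
    """Treat only 'true'/'1'/'yes' (case-insensitive) as on."""
    return os.environ.get(name, default).strip().lower() in ("true", "1", "yes")

# DEBUG stays False unless the environment explicitly turns it on
DEBUG = env_flag("DEBUG")
```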

Simple query takes minutes to execute on a killed/inactive session

I'm trying to add simple failover functionality to my application, which talks to an Oracle database (version details in the update below). To test that my session is up, I issue a simple query (select 1 from dual).
Now, when I simulate a network outage by killing my Oracle session with "alter system kill session 'sid,serial#';" and then execute this test query, it takes up to 5 minutes for the application to process it and return an error from the Execute method (I'm using the OCI API, C++):
Tue Feb 21 21:22:47 HKT 2012: Checking connection with test query...
Tue Feb 21 21:28:13 HKT 2012: Warning - OCI_SUCCESS_WITH_INFO: 3113: ORA-03113: end-of-file on communication channel
Tue Feb 21 21:28:13 HKT 2012: Test connection has failed, attempting to re-establish connection...
If I kill the session with the 'immediate' keyword at the end of the statement, the test query returns an error instantly.
Question 1: Why does it take 5 minutes to execute my query? Are there any Oracle/PMON logs that can shed some light on what is happening during this delay?
Question 2: Is 'alter system kill session' a good choice for simulating a network failure? How close are the outcomes of this statement to a real-world network failure between the application and the Oracle DB?
Update:
Oracle version:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
There is a good chance that the program is waiting for rollback to complete.
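Independently of the root cause, a failover health check should not wait minutes for the client stack to notice a dead session. A sketch of bounding the probe with a timeout (Python for illustration; `run_probe` stands in for executing `select 1 from dual` over the real connection):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as ProbeTimeout

# One dedicated worker thread for connectivity probes
_probe_pool = ThreadPoolExecutor(max_workers=1)

def connection_alive(run_probe, timeout_s=5.0):
    """Run the probe but give up after timeout_s seconds."""
    future = _probe_pool.submit(run_probe)
    try:
        future.result(timeout=timeout_s)
        return True
    except ProbeTimeout:
        return False   # probe still hanging: assume the connection is dead
    except Exception:
        return False   # probe failed outright (e.g. an ORA- error surfaced)
```

Note the single shared worker means a truly hung probe will block later probes, so this is only a sketch; on the Oracle side you would also look at server-side dead connection detection (SQLNET.EXPIRE_TIME) or client-side network timeouts.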