Pentaho Kettle: run Check DB connections without stopping the job

I have read blogs and one question close to mine, but have not found a solution to my problem. I have a transformation job set up to extract three tables from 84 DBs to generate one report. My problem is that when a DB connection is not available, the whole job stops.
I would like to check the DB connections before starting the job, log errors for inaccessible DBs, and build a dynamic list of the connections that passed, which I will then run my job against. I have used the Check DB connections step, but it still stalls when a connection fails. How can I process my list of DBs, running through to the end, without aborting the job?

First of all, you have used the correct step to check the DB connections. I will try to answer your question in parts (I hope I have this right):
Case I: "My problem is when a DB connection is not available, the whole job stops"
This is expected behavior. Whenever a step hits an error, it throws an exception and stops the execution of the entire job.
But does that mean the "Check Db connections" step stops checking as soon as one connection fails? No. The step tests all the connections even if one in the middle fails. Look at the logs carefully: they end with a consolidated list of all the checked DB connections (the original post included a screenshot of this log).
I tested with 4 DB connections and got one error and 3 successes.
Now for the "whole job stops" part: since stopping on error is the default behavior (as mentioned above), you can route the flow with an error hop, so that when the step fails the job follows the error hop instead of aborting (the original post illustrated this with a screenshot).
There I used two hops, one for success and one for error. If the step fails, the job takes the error path (the red hop); otherwise it takes the success path (the green hop).
Case II: "log errors for inaccessible DBs and create a new dynamic list of successful tests"
You can log the errors to a separate log file or table (depending on your requirements) and then read that log back to generate the list of DB connections (shown in a screenshot in the original post).
The output is a list of connections, each with an error flag:
Y : failure connecting to the database
N : successful connection
Note: I used a Text file input step because I logged the previous step to a text file rather than to a database. You can adapt this to your requirements.
I have placed sample code in a gist for reference.
Hope it helps :)
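The Y/N flag list described above can also be split into reachable and unreachable connections outside of Kettle. This is a minimal Python sketch, not the code from the gist. It assumes, hypothetically, that the logged text file has one `connection_name;error_flag` pair per line. The delimiter, the sample names and the function name are all illustrative.

```python
def split_connections(lines, delimiter=";"):
    """Split 'name;flag' records into (reachable, unreachable) name lists.

    Flag convention from the answer above: Y = connection failed, N = connection OK.
    The 'name;flag' line layout is an assumption made for this sketch.
    """
    reachable, unreachable = [], []
    for raw in lines:
        name, sep, flag = raw.strip().partition(delimiter)
        if not sep:
            continue  # skip blank or malformed lines that have no delimiter
        if flag.strip().upper() == "N":
            reachable.append(name.strip())
        else:
            unreachable.append(name.strip())
    return reachable, unreachable


# Hypothetical sample: four connections, one of which failed.
sample = ["db_01;N", "db_02;Y", "db_03;N", "db_04;N"]
ok, failed = split_connections(sample)
print("run job against:", ok)
print("log as inaccessible:", failed)
```

The reachable list can then drive the extraction, while the unreachable list goes to the error log.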

mysql lost connection error

I am working on a project that integrates MySQL with an IOCP server to collect sensor data and to verify the collected data from the client.
However, the MySQL connection is sometimes lost.
The queries are simple: they either insert a single row or get the average value over a date interval.
Data from every sensor flows into the DB at the same time, every 5 seconds. The connection drops when sensor messages occasionally arrive together or overlap with a message from the client, with this error:
Lost connection to MySQL server during query
In response to this error, I have changed the values of max_allowed_packet and of these timeouts:
interactive_timeout, net_read_timeout, net_write_timeout, wait_timeout
It seems that the error occurs when queries overlap.
Please let me know if you know a solution.
I had a similar issue on a MySQL server running very simple queries, where the number of concurrent queries was high. I had to disable the query cache to solve it. You could try disabling the query cache with the following statements.
SET GLOBAL query_cache_size = 0;
SET GLOBAL query_cache_type = 0;
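Please note that SET GLOBAL changes are lost on a server restart, so the query cache may be enabled again. Put these settings in the MySQL configuration file if you need them to persist. (These variables exist only up to MySQL 5.7; the query cache was removed in MySQL 8.0.)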
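As a sketch of what the persistent form could look like, the same two settings can go in the `[mysqld]` section of the server's option file. The file name and location (for example my.cnf or my.ini) depend on the installation and are not given in the answer, and this applies only to MySQL versions that still ship the query cache.

```ini
[mysqld]
# Keep the query cache disabled across restarts (MySQL 5.7 and earlier).
query_cache_type = 0
query_cache_size = 0
```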
Can you run the command below and check the current timeouts?
SHOW VARIABLES LIKE '%timeout';
You can change a timeout if needed:
SET GLOBAL <timeout_variable>=<value>;

JMeter: load testing an EC2 instance, only 50% of requests are successful

I am trying to load test Nginx installed on an EC2 instance with JMeter. Every time I run the test, only 50% of the requests are successful.
For example:
If I try with 10 users, only 5 responses are OK
If I try with 100 users, only 50 responses are OK
If I try with 500 users, only 250 responses are OK
Any idea what causes this strange behavior?
This sounds weird. I would recommend the following troubleshooting steps:
First of all, always check the jmeter.log file; it should contain enough information to get to the bottom of your test failure(s).
If the JMeter log file doesn't contain any suspicious entries, the next step is to check the response messages using, for example, the View Results in Table and/or View Results Tree listener. This should give you some high-level information and trends, e.g. you will be able to see whether some particular sampler(s) always fail.
If the steps above don't give you enough clues to resolve the issue, you can temporarily enable saving of request and response data to see what is wrong with the failing sampler(s). Add the following lines to the user.properties file (located in JMeter's "bin" folder):
jmeter.save.saveservice.output_format=xml
jmeter.save.saveservice.response_data=true
jmeter.save.saveservice.samplerData=true
jmeter.save.saveservice.requestHeaders=true
jmeter.save.saveservice.responseHeaders=true
jmeter.save.saveservice.url=true
The next time you run the JMeter test, the .jtl results file will contain all the relevant data, which can be analyzed using the aforementioned View Results Tree listener. Don't forget to revert the change once you fix the script: JMeter listeners are very resource intensive per se, and the settings above greatly increase disk I/O, which may ruin your test.
If none of the above helps, check the logs on the side of the application under test; most probably you will find something there.
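With the XML output format enabled as above, the .jtl file can also be summarized by a small script, which makes a pattern such as "exactly half of the samples fail with the same response code" easy to spot. This is a minimal Python sketch. It relies on the attribute names JMeter writes in its XML results (`lb` for label, `rc` for response code, `s` for success). The function name and the sample data are made up for illustration and are not taken from the question.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_jtl(xml_text):
    """Count samples per (label, response code, success flag) in JTL XML text."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():
        # JMeter writes HTTP samplers as <httpSample> and other samplers as <sample>.
        if element.tag not in ("httpSample", "sample"):
            continue
        succeeded = element.get("s") == "true"
        counts[(element.get("lb"), element.get("rc"), succeeded)] += 1
    return counts


# Made-up results for illustration only; not data from the question.
SAMPLE_JTL = """<testResults version="1.2">
  <httpSample lb="GET /" rc="200" rm="OK" s="true" t="12"/>
  <httpSample lb="GET /" rc="503" rm="Service Unavailable" s="false" t="3"/>
  <httpSample lb="GET /" rc="200" rm="OK" s="true" t="11"/>
  <httpSample lb="GET /" rc="503" rm="Service Unavailable" s="false" t="2"/>
</testResults>"""

counts = summarize_jtl(SAMPLE_JTL)
for (label, code, succeeded), n in sorted(counts.items()):
    print(label, code, "ok" if succeeded else "FAILED", n)
```

For a real run, read the file contents and pass them to `summarize_jtl` in place of the sample string.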

"Zombie Requests" CFQUERY tags get stuck and are unkillable

Coldfusion 2016
Microsoft Server 2012
Oracle 12
ODBC connection
I turned on profiling and monitoring, and now I can see that there are requests that are stuck and cannot be terminated by the CF monitor. Some have been running for over 200k seconds.
I know I can increase the number of simultaneous requests but I want to solve the underlying problem. As I read the stack traces of these “zombie requests” they are getting stuck on and some are in but some are not. I ran the query in my oracle client and they resolve instantly.
Is there a way to terminate these requests or prevent this from happening at all?
EDIT: The server monitor does not treat these requests as slow or hung; the alerts are not triggering for any of them. Honestly, they should have been going off constantly considering how many of these there are.
Also, the execution time is a mere .003 seconds, so what happened? Why doesn't ColdFusion know this?
An example of a "zombie"
The active query that is stuck
We have a similar situation with a different database engine, Red Brick, which runs on a Unix server. We solved it as follows.
We set up a cron job on the database server to run every 5 minutes. This job uses a combination of Unix and awk commands.
The job queries the system table for queries that have been running for more than 120 seconds under the database account used by ColdFusion. It writes one command per matching record to a file, something like this:
print "alter system cancel user command userName process " $1 ";"
$1 comes from the query and is the process ID we want to stop.
Then we run the file, which executes all those alter system commands.
With a different database engine, and possibly a different OS for the database server, the details would differ, but the approach should work.
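The command-generation step above can be sketched in Python instead of awk. This is an illustration of the approach, not the author's script. It assumes the long-running-query lookup has already produced `(process_id, elapsed_seconds)` pairs. The function name, the row layout and the sample values are hypothetical, and `userName` stays a placeholder exactly as in the awk line.

```python
def build_cancel_commands(rows, user_name="userName", max_seconds=120):
    """Return one cancel command per query that has run longer than max_seconds.

    rows: iterable of (process_id, elapsed_seconds) pairs (assumed layout).
    The command text mirrors the awk print statement quoted above.
    """
    commands = []
    for process_id, elapsed_seconds in rows:
        if elapsed_seconds > max_seconds:
            commands.append(
                f"alter system cancel user command {user_name} process {process_id};"
            )
    return commands


# Hypothetical rows: only processes 102 and 103 exceed the 120-second limit.
rows = [(101, 45), (102, 300), (103, 121)]
commands = build_cancel_commands(rows)
print("\n".join(commands))
```

Writing the commands to a file and executing that file, as the answer describes, keeps the generation and the execution steps separately inspectable.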
Edit Starts Here
To prevent recurrence, look at the pages that call the ones with the long-running queries. If impatient users are able to click something repeatedly because nothing appears to be happening, do something about that. You can use JavaScript to make the link or button go away. Alternatively, you can go to an intermediate page that shows the user a message and then carries them through to the real page.

Django/Postgres performance worsening after repeatedly processing the same query

I am running Django on Apache. I have several client computers that call urllib2.urlopen() to send data, which my server processes before immediately sending back a reply. While testing this I found a very tricky issue. I have one client repeatedly send the same data to be processed. The first time it takes about 20 seconds, the second time about 40 seconds, and the third time I get a 504 (gateway timeout) error. If I keep sending data, more 504 errors pop up at random. I am pretty sure this is an issue with Postgres, as the function that processes the information makes many database calls; however, I do not know why Postgres performance would decline so much. I have tried several database optimization tricks, including this one: (http://stackoverflow.com/questions/1125504/django-persistent-database-connection), to no avail.
Thanks in advance.
Edit: The requests are not coming concurrently. They are coming in back to back and each query involves a lot of SELECTs and JOINs, and there are a few INSERTs and UPDATEs as well. The apache error logs show that it is just a simple timeout, where the function to process the client posted data takes over 90 seconds.
If it's really Postgres, then you should turn on logging of slow statements in the Postgres configuration to find out exactly which statement is taking so much time.
This can be done by setting the configuration property log_min_duration_statement.
Details are in the manual:
http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT
You say the function makes "many database calls" so I'd start with a very low number, or even 0 to log the duration of all statements, then you might be able to identify the slow ones.
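Once statement durations are being logged, the slow ones can be picked out of the server log with a short script. This is a minimal Python sketch. It assumes single-line log entries of the form `duration: <ms> ms  statement: <sql>`, which is how Postgres reports statements logged through this setting. The function name, the log prefix and the sample statements are made up for illustration.

```python
import re

# Matches the "duration: 12.345 ms  statement: ..." part of a Postgres log line.
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms\s+statement: (.*)")


def slowest_statements(log_lines, top=3):
    """Return the `top` slowest (milliseconds, statement) pairs, slowest first."""
    found = []
    for line in log_lines:
        match = DURATION_RE.search(line)
        if match:
            found.append((float(match.group(1)), match.group(2).strip()))
    return sorted(found, key=lambda item: item[0], reverse=True)[:top]


# Made-up log lines for illustration; table and column names are hypothetical.
sample_log = [
    "2024-01-01 10:00:00.000 UTC [101] LOG:  duration: 12.500 ms  statement: SELECT 1",
    "2024-01-01 10:00:02.000 UTC [101] LOG:  duration: 1830.250 ms  statement: UPDATE readings SET v = 0",
    "2024-01-01 10:00:03.000 UTC [102] LOG:  connection received: host=127.0.0.1",
]
slowest = slowest_statements(sample_log, top=1)
print(slowest)
```

Statements that span several log lines would only have their first line captured by this pattern, so treat the output as a pointer into the log rather than the full query text.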
It could also be a locking issue. Maybe the first call does not end its transaction properly, and subsequent calls run into a timeout while waiting for a resource.
You can verify this by checking the system view pg_locks after the first call.
Have you checked the Apache error logs? Have you set Django's DEBUG = True or ADMINS = ('email#addr.com',) so that you get a detailed error report about the actual cause of the issue? If so, how about pasting some information here.
Why are you certain that it's postgres? Have you done diagnostics to come to that conclusion? If so, please let us know.
Are you running apache with mod_wsgi? How many processes and threads have you allocated to your django application?
Also, 20 seconds to process the first transaction is a huge amount of time. Perhaps you could show us the view code that is causing the time out. We may be able to help there.
I sincerely doubt that it's going to be postgres alone that is causing the issue. It probably has something to do with application code, or server configuration.

buildforge problem

When I try to run the job, I get this error:
No server could be found matching all conditions
Can anyone please help me with this?
In buildforge, the job is assigned to a selector.
The selector can be thought of as a pointer to an object that can represent either the name of a server, or a set of servers that matches a set of criteria.
When the job executes, the selector for that step attempts to find the server based on the criteria defined in the selector conditions. If it can't find a server that matches the selection conditions, you get the posted error message.
In real life, this error usually indicates the following:
1. The agent on the server is dead, or the server is down. Run bfservertest (or the connection test in the buildforge web UI) to see if the agent is functioning. Visit the machine remotely or in person to verify that the server is up. Try restarting the agent service if the machine is up and the connection test fails.
2. The selector is pointing to a non-existent server because you misspelled the server name.
3. You have conditions defined in the selector that unintentionally exclude all servers from being used.