I'm running into an unusual situation that I can't get to the bottom of.
We have a Rails 4.1 application running on JRuby 1.7.12 inside TorqueBox 3.1.0. One of the API endpoints retrieves a list of objects.
Currently the database has just under 7,000 records. When an API request is made, these are queried and rendered as JSON using the ActiveModel::Serializers gem. This all works as we'd expect, and doing this in a Rails console, it works perfectly.
The problem arises when making an actual API request. It seems to work as expected, and the Rails log shows the output
Completed 200 OK in 6354ms (Views: 5479.0ms | ActiveRecord: 867.0ms)
At this point I expect to see the data returned from the server, however it takes a good 2.7 minutes to actually get a response. I've tried making the request from Chrome, Safari, and even with curl just to make sure it's not a weird browser issue, but I'm not having any luck.
I've implemented some caching within the serialisers as described here. I'm pretty sure this isn't the issue, however, as it works as expected in a console, so I'm really confused.
What else could be going on that is causing the 2+ minute delay? During this time I'm seeing around 100% CPU usage for the Java process, so something is definitely going on.
Well, it turns out it was the Bullet gem taking up all the time. It seems that with that many records, detecting whether there is an N+1 query is quite time-consuming. Simply disabling Bullet brings the response time back down to what the Rails log reports.
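For anyone else who hits this: switching Bullet off can be as simple as not enabling it in the environment that serves these requests, or disabling it explicitly. A minimal sketch, assuming Bullet is configured per environment as in its README (the choice of environment file is ours, not anything Bullet mandates):

# config/environments/production.rb (or whichever environment serves the large responses)
config.after_initialize do
  Bullet.enable = false   # skip Bullet's per-record N+1 detection for this environment
end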
We are seeing a random error that seems to be caused by two requests' data getting mixed up. We receive a request for quoting shipping costs on an Order, but the request fails because the requested Order is not accessible by the requesting account. I'm looking for anyone who can provide an inkling of what might be happening here; I haven't found anything on Google, the official Flask help channels, or SO that looks like what we're experiencing.
We're deployed on AWS, with Apache, mod_wsgi, 1 process, 15 threads, and about 10 instances.
Here's the code that sends the email:
msg = f"Order ID {self.shipping.order.id} is not valid for this Account {self.user.account_id}"
body = f"Error:<br/>{msg}<br/>Request Data:<br/>{request.data}<br/>Headers:<br/>{request.headers}"
send_email(msg, body, "devops#*******.com")
request_data = None
The problem is that in that scenario we email ourselves the error and the request data, and the request data we're getting, in many cases, would never have landed in that particular piece of code. It can be, for example, a request from the frontend to get the current user's settings, one that makes no reference to any orders, never mind trying to get a shipping quote for one.
Comparing the application logs with Apache's access_log, we see that in all cases we got two requests on the same instance: one requesting the quote, and another which is the request that actually gets logged. We don't know whether these two requests are processed by the same thread in rapid succession or by different threads, but they come so close together that I think the latter is much more probable. So far we have no way of unambiguously tying the access_log entries to the application logging, so we don't know which of the two requests is logging the error, but the fact is that we're getting routed to a view that does not correspond to the request's content (i.e., we're not sure whether the quoting request is getting the wrong request object, or the other one is getting routed to the wrong view).
Another fact of interest is that we use GraphQL, so part of the routing is done after Flask/Werkzeug do theirs, but the body we get from flask.request at the moment the error shows up does not correspond to the GraphQL function/mutation that gets executed. However, this also happens in views mapped directly through Flask. The user is looked up by the Flask-Login workflow at the very beginning, and it corresponds to the "bad" request (i.e., the one not for quoting).
The actual issue was a bug in one of python-graphql's libraries (promise), not in Flask, Werkzeug, or Apache. It was not the request data that was "moving" to a different thread, but a different thread trying to resolve the promise for a query that was supposed to be handled elsewhere.
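To illustrate the mechanism (a hand-rolled sketch, not the promise library's actual code, and the route name is made up): flask.request is a context-local proxy, so code that runs on a different thread without the original request context sees either no request at all, or whatever request that thread happens to be serving.

# Illustrative only: accessing flask.request from a thread that lacks the
# original request context is the kind of cross-thread confusion described above.
import threading
from flask import Flask, request

app = Flask(__name__)

@app.route("/quote", methods=["POST"])
def quote():
    seen = {}

    def resolver():
        try:
            seen["body"] = request.get_data(as_text=True)
        except RuntimeError as exc:   # "Working outside of request context"
            seen["body"] = f"no request context: {exc}"

    worker = threading.Thread(target=resolver)
    worker.start()
    worker.join()
    return {"resolver_saw": seen["body"]}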
I'm running Postman 7.21.1 (the latest version) on Windows 10. I imported the example collection featured in their documentation - https://learning.postman.com/docs/postman/collection-runs/building-workflows/. I run Request 1 in the collection through the Runner and Request 4 does not trigger as it seems it should. It is set up in the Tests tab and seems correct based on their documentation. I have my own collection I was trying this with, but when it was not working I tried this sample and realized there was something else amiss.
Any assistance would be helpful! I can provide more information if needed.
This is what the example collection has in Request 1 under Tests: postman.setNextRequest('Request 4');
Thanks!
I figured this out. I didn't understand how the operation works or how the Runner's flow works. All requests that could be called next need to be selected in the Runner. The operation works like a goto: the run jumps to the named request, and every request after it in the collection also runs. Closing this out.
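For anyone else hitting this, a small sketch of what the Tests tabs can look like (the request names come from the sample collection; passing null is Postman's documented way to stop the run instead of continuing with the requests that follow):

// Tests tab of Request 1: jump to Request 4 (it must be selected in the Runner).
postman.setNextRequest('Request 4');

// Tests tab of Request 4: stop the collection run here instead of
// continuing with whatever requests come after it in the collection.
postman.setNextRequest(null);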
I've been evaluating moving our mapping and routing apps to use HERE's REST API. I've been testing some scenarios to prove it out, and one I can't seem to get working correctly is batch geocoding.
The submission of the data to geocode works fine and I do get a valid RequestID back, but when I poll for the status of the batch job, the status always says "accepted" and never seems to change.
I am using a developer account that has a 90 day trial. Could there be a limitation due to the type of account?
Looks like it's a queue issue, except mine has been going on for nearly a week.
HERE API never runs batch job, always returns accepted status
I have a Rails 4 application with some API methods that take a lot of time for computation and generate huge JSON responses for clients. The problem is that these requests block the entire app: only one user (request) can be served at a time. While a long-running request is generating its JSON response, any new requests the application receives fail. How do I solve this? Unfortunately Rails doesn't handle this automatically.
I have gone through similar threads on SO, but was not able to find a solution for a Rails 4 application. Please share your experience and point me in the right direction to solve this issue.
Thanks!
The best practice for such long-running API calls is to make them asynchronous, so the request thread is not blocked (a sketch follows the list below).
Two popular gems for running background jobs are:
1) https://github.com/collectiveidea/delayed_job
2) https://github.com/resque/resque
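As a rough sketch of the pattern with delayed_job (the controller, the Report model, and generate_json are made-up names for illustration, not anything from the question): enqueue the slow JSON generation and respond immediately, letting the client poll for the result.

# Sketch only: assumes delayed_job is installed and a Report model exists.
class ReportsController < ApplicationController
  def create
    report = Report.create!(status: "queued")
    report.delay.generate_json                      # runs later in a worker process
    render json: { id: report.id, status: report.status }, status: :accepted
  end
end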
I am running Django on Apache. I have several client computers which call urllib2.urlopen() and send over some data, which my server processes and immediately replies to. However, while testing this I found a very tricky issue. I have one client repeatedly send the same data to be processed. The first time it takes around ~20 seconds, the second time it takes about 40 seconds, and the third time I get a 504 (gateway timeout) error. If I try to send more data, further 504 errors randomly pop up. I am pretty sure this is an issue with Postgres, as the function that processes the information makes many database calls, but I do not know why the performance of Postgres would decline so much. I have tried several database optimization tricks, including this one (http://stackoverflow.com/questions/1125504/django-persistent-database-connection), to no avail.
Thanks in advance.
Edit: The requests are not coming in concurrently. They come in back to back, and each query involves a lot of SELECTs and JOINs, plus a few INSERTs and UPDATEs. The Apache error logs show that it is just a simple timeout, where the function that processes the client-posted data takes over 90 seconds.
If it's really Postgres, then you should turn on the logging of slow statements in the Postgres configuration to find out which statement exactly is taking so long.
This can be done by setting the configuration property log_min_duration_statement.
Details are in the manual:
http://www.postgresql.org/docs/current/static/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT
You say the function makes "many database calls", so I'd start with a very low number, or even 0 to log the duration of all statements; then you should be able to identify the slow ones.
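For example, in postgresql.conf (0 logs every statement, so use it only while diagnosing, and reload Postgres afterwards):

# postgresql.conf
log_min_duration_statement = 0   # log the duration of every completed statement (value is in ms)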
It could also be a locking issue. Maybe the first call does not end its transaction properly and subsequent calls run into a timeout while waiting for a resource.
You can verify this by checking the system view pg_locks after the first call.
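For example, from the Django side (a sketch; the same query can be run directly in psql), right after the first call and while a later one is hanging:

# Sketch: list Postgres locks that are still waiting to be granted.
from django.db import connection

def ungranted_locks():
    with connection.cursor() as cur:
        cur.execute(
            "SELECT locktype, relation::regclass, pid, mode, granted "
            "FROM pg_locks WHERE NOT granted"
        )
        return cur.fetchall()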
Have you checked the Apache error logs? Have you set Django's DEBUG = True, or ADMINS = ('email#addr.com',), so you can get a detailed error report about what the actual cause of the issue is? If so, how about pasting some of that information here?
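For reference, that might look like this in settings.py (example values only; Django emails ADMINS an error report only when DEBUG is False and email is configured):

# settings.py -- example values only
DEBUG = True                               # shows full tracebacks while diagnosing; never leave on in production
ADMINS = [("Ops", "ops@example.com")]      # with DEBUG = False, Django emails these addresses on 500 errors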
Why are you certain that it's Postgres? Have you done diagnostics to come to that conclusion? If so, please let us know.
Are you running Apache with mod_wsgi? How many processes and threads have you allocated to your Django application?
Also, 20 seconds to process the first transaction is a huge amount of time. Perhaps you could show us the view code that is causing the timeout. We may be able to help there.
I sincerely doubt that Postgres alone is causing the issue. It probably has something to do with the application code or the server configuration.