Gatling: polling a webservice, and failing the scenario on incorrect response-messages - web-services

Hard to write a good title for this question. I am developing a performance test in Gatling for a SOAP Webservice. I'm not very experienced with Gatling so I'm learning things as I go, but this conundrum has me entirely stumped.
One of the scenarios I am implementing a test for is an order-process consisting of several unique consecutive calls to the webservice, one of which is a polling call that returns the current status of the ordering process. Simplified, this call gets a SOAP Response with a status that can be of three types:
PROCESSING - Signifying the order is still processing.
ORDER_OK - Order completed without errors.
EVERYTHING_ELSE - A group of varying error-statuses and other results.
What I want to do, is have Gatling continuously poll the webservice until the processing-status changes - and then check that the status says it completed successfully. Polling continuously is easily implemented, but performing the check after it completes is turning out to be a far greater challenge than it has any business being.
So far, this is what I've done to solve the polling:
exec { session => session.set("status", "PROCESSING") }
  .asLongAs(session => session("status").as[String].equals("PROCESSING")) {
    exec(http("Poll order")
      .post("/MyWebService")
      .body(ELFileBody("bodies/ws/pollOrder.xml"))
      .check(
        status.is(200),
        regex("soapFault").notExists,
        regex("pollResponse").exists,
        xpath("//*[local-name(.)='result']").exists.saveAs("status")
      )
    ).exitHereIfFailed.pause(5 seconds)
  }
This snippet appears to perform the polling correctly: it keeps polling until the order status changes from PROCESSING to something else. However, I still need to check what the status changed to, because only one of the many possible results should let the scenario continue for that user.
A potential fix would be to add more checks in that call that go something like this:
.check(regex("EVERYTHING_ELSE_XYZ").notExists)
The service can return a LOT of different "not a happy day" messages however and I'm only really interested in the two other ones, so it would be preferable for me to be able to do a check only for the two valid happy-day responses. Checking if one exact thing exists seems far more sensible than checking that dozens of things don't.
What I thought I would be able to do was perform a check on the status variable in the user's session when the step exits the asLongAs loop, and continue or exit the scenario for that user. As it's a session variable I could probably do this in the next step of the total scenario and break the run for that user there, but that would also mean the error is reported in the wrong place, and the next call's fault-% would be polluted by errors from the previous call.
Using pseudocode, being able to do something like this immediately after it exits the asLongAs loop would have been perfect:
if (session("status").as[String].equals("ORDER_OK")) ? continueTheScenario : failTheScenario
but I've not been able to do anything similar to that inside a gatling-chain. It's almost starting to appear impossible to do something like that, but can anyone see a solution to this that I'm not seeing?

Instead of "exists", use "in" to check that the result is one of the 2 valid values.
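To illustrate the logic outside Gatling, here is a minimal Python sketch of the same idea: every poll is checked against the set of accepted values, and the loop ends as soon as the status is no longer PROCESSING. The names poll_until_done, scripted and VALID_STATUSES are hypothetical and are not part of Gatling or the web service; a real poller would also pause between attempts.

```python
from typing import Callable, Iterable

# The two happy-day values from the question (assumed to be exact strings).
VALID_STATUSES = {"PROCESSING", "ORDER_OK"}


def poll_until_done(fetch_status: Callable[[], str], max_polls: int = 100) -> str:
    """Poll until the status is no longer PROCESSING; fail on any other status."""
    for _ in range(max_polls):
        status = fetch_status()
        # One positive membership test instead of dozens of negative checks.
        if status not in VALID_STATUSES:
            raise ValueError(f"unexpected order status: {status}")
        if status != "PROCESSING":
            return status
    raise TimeoutError("order still processing after max_polls attempts")


def scripted(statuses: Iterable[str]) -> Callable[[], str]:
    """Build a fake status source that replays the given statuses in order."""
    it = iter(statuses)
    return lambda: next(it)
```

Because the failure is raised by the poll itself, the error is attributed to the polling step and not to whatever call comes next.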

Django: for loop through parallel process and store values and return after it finishes

I have a for loop in django. It will loop through a list and get the corresponding data from database and then do some calculation based on the database value and then append it another list
def getArrayList(request):
    list_loop = [...]   # set of values to loop through
    store_array = []    # values from the for loop are stored here
    for a in list_loop:
        val_db = SomeModel.objects.filter(somefield=a).first()
        result = perform_calculation(val_db)  # perform calculation on val_db
        store_array.append(result)
The list has 10,000 entries. A user who makes this request is prepared to wait and will be informed that it will take time.
I have tried joblib with backend=threading; it does not save much time compared with the normal loop.
But when I try backend=multiprocessing, it says "Apps aren't loaded yet".
I read that multiprocessing is not possible in module-based files.
So I am looking at Celery now, but I am not sure how this can be done in Celery.
Can anyone advise how to speed up the for-loop calculation using the available multiprocessing techniques?
You're very likely looking for the wrong solution. But then again - this is pseudo code so we can't be sure.
In either case, your pseudo code is a self-fulfilling prophecy, since you run queries in a for loop. That means network latency, result set fetching, tying up database resources etc etc. This is never a good pattern, at best it's a last resort.
The simple solution is to get all values in one query:
list_values = [ ... ]
results = []
db_values = SomeModel.objects.filter(field__in=list_values)
for value in db_values:
    results.append(calc(value))
If for some reason you need to loop, then to do this in Celery, you would mark the function as a task (plenty of examples to find). But it won't speed up anything: it will be run in the background, so you render a "please wait" message and somehow you need to notify the user again that the job is done.
I'm saying somehow, because there isn't a really good integration package that I'm aware of that ties in all the components. There's django-notifications-hq, but if this is your only background task, it's a lot of extra baggage just for that, so you may want to change the notification part to "we will send you an email when the job is done", because that's easy to achieve inside your function.
And thirdly, if this is simply creating a report, that doesn't need things like automatic retries on failure, then you can simply opt to use Django Channels and a browser-native websocket to start and report on the job (which also allows you to send email).
You could try concurrent.futures.ProcessPoolExecutor, which is a high-level API for CPU-bound tasks:
import concurrent.futures

def perform_calculation(item):
    pass  # the per-item calculation goes here

# tasks is the iterable of items to process
# specify number of workers (default: number of processors on your machine)
with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
    res = executor.map(perform_calculation, tasks)
EDIT
In case of IO bound operation, you could make use of ThreadPoolExecutor to open a few connections in parallel, you can wrap the pool in a contextmanager which handles the cleanup work for you(close idle connections). Here is one example but handles the connection closing manually.
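Independently of the linked example, a minimal sketch of the thread-pool variant might look like the following. The function fetch_and_calculate is a hypothetical stand-in for the IO-bound step (a database or network call), and the sleep only simulates waiting on IO.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_and_calculate(item: int) -> int:
    """Hypothetical stand-in for an IO-bound step such as a database lookup."""
    time.sleep(0.01)  # simulate waiting on IO
    return item * 2   # placeholder calculation


def run_in_threads(items, max_workers: int = 8):
    # Leaving the with-block waits for outstanding work and shuts the pool down.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as the inputs.
        return list(executor.map(fetch_and_calculate, items))
```

Threads help here only because the workers spend most of their time waiting; for CPU-bound calculations the process pool above is the better fit.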

Always get Response back APIs?

I forget, is there ever a situation where you may not get an http response back? Let's say you send a request to some API, and it bombs on their side. They're supposed to set a status code if that happens but I assume there have to be times where there could be other variables that could fail in which you might not get a response back.
I'm trying to setup some of my TDD. I think testing whether I get a non-null response back is a good first 'simplest as possible' test to start out with.
Well, I would suggest that having a test for checking only that response is not null is almost worthless. TDD is not about writing infinite little tests to develop something (like testing that constructor actually creates an object etc.), but that is another topic altogether.
Back on the topic, there could be a situation where the network fails, so you wouldn't get a response at all.
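As a rough sketch of how a client can tell "the server answered with an error" apart from "no response arrived at all", the Python function below uses the standard library's urllib. The name fetch_status and the injectable opener parameter are assumptions made for illustration, so that the failure paths can be exercised without a real network.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str, opener: Callable = urllib.request.urlopen,
                 timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if no response arrived at all."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server did answer, just with an error status (4xx/5xx).
        return err.code
    except OSError:
        # URLError, timeouts and connection resets are all OSError subclasses:
        # in these cases there is no HTTP response to inspect.
        return None
```

A more useful first test than "response is not null" might therefore be one that feeds the code a failing opener and checks that the failure is handled.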

Saving the same UIManagedDocument on two different devices generates error

Should the same UIManagedDocument be open on both of my devices, and I save (using the following code):
[self.documentDatabase.managedObjectContext performBlockAndWait:^{
    STNoteLabelCell *cell = (STNoteLabelCell *)[self.noteTableView cellForRowAtIndexPath:indexPath];
    [cell setNote:newNote animated:YES];
}];
I am told that the UIManagedDocument's documentState has changed to UIDocumentStateSavingError, and then I get this error:
CoreData: error: (1) I/O error for database at /var/mobile/Applications/some-long-id/Documents/Read.dox/StoreContent.nosync/persistentStore. SQLite error code:1, 'cannot rollback - no transaction is active'
2013-05-14 16:30:09.062 myApp[11711:4d23] -[_PFUbiquityRecordImportOperation main](312): CoreData: Ubiquity: Threw trying to get the knowledge vector from the store: <NSSQLCore: 0x1e9e2680> (URL: file://localhost/var/mobile/Applications/some-long-id/Documents/Read.dox/StoreContent.nosync/persistentStore)
Does anybody know why this error happens?
A couple of things...
I think saving the same document on two different devices is a red herring, as each device will actually be working on its own local copy of your database; only the transaction logs get uploaded to iCloud.
Secondly, I may be confused, but I don't see anything in the above code snippet that indicates you are performing a save (unless it is triggered by one of those calls, or autosave happens).
What that code snippet does seem to be doing is:
On your database document's child MOC thread, run the following block of code
And that block of code is doing purely UI related stuff, nothing to actually do with the database? The only thing that might be going out to the DB is cellForRowAtIndexPath, and this usually would be expected to only be doing read operations, not something that needs to save.
The only other thing.... if the above code does trigger a save - you might have an issue with performing that as a performBlockAndWait. The UIManagedDocument save routines do stuff asynchronously - but they need the run loop to execute before they actually get a chance to run the async part... So by blocking before continuing you may actually be preventing the actual save from being executed or something.
I'm guessing wildly here, but have seen enough with saving managed documents to know to be really careful with which thread things are actually being called from, and that after a save request has been made, to allow the run loop to have a chance to actually do it. To be fair, this has only ever been an issue with calling saveToURL: repeatedly within one method, or in a loop, in which case all the async parts of the saves get queued up and executed at the end, usually to great comical effect.

How do I detect an aborted connection in Django?

I have a Django view that does some pretty heavy processing and takes around 20-30 seconds to return a result.
Sometimes the user will end up closing the browser window (terminating the connection) before the request completes -- in that case, I'd like to be able to detect this and stop working. The work I do is read-only on the database so there isn't any issue with transactions.
In PHP the connection_aborted function does exactly this. Is this functionality available in Django?
Here's example code I'd like to write:
def myview(request):
    while not connection_aborted():
        # do another bit of work...
        if work_complete:
            return HttpResponse('results go here')
Thanks.
I don't think Django provides it because it basically can't. More than Django itself, this depends on the way Django interfaces with your web server. All this depends on your software stack (which you have not specified). I don't think it's even part of the FastCGI and WSGI protocols!
Edit: I'm also pretty sure that Django does not start sending any data to the client until your view finishes execution, so it can't possibly know if the connection is dead. The underlying socket won't trigger an error unless the server tries to send some data back to the user.
That connection_aborted method in PHP doesn't do what you think it does. It will tell you if the client disconnected, but only if the buffer has been flushed, i.e. some sort of response is sent from the server back to the client. The PHP version wouldn't even work as you've written it above. You'd have to add a call to something like flush within your loop to have the server attempt to send data.
HTTP is a stateless protocol. It's designed to not have either the client or the server dependent on each other. As a result the state of either is only known when a connection is created, and that only occurs when there's some data to send one way or another.
Your best bet is to do as @MattH suggested and do this through a bit of AJAX, and if you'd like you can integrate something like Node.js to make client "check-ins" during processing. How to set that up properly is beyond my area of expertise, though.
So you have an AJAX view that runs a query that takes 20-30 seconds to process requested in the background of a rendered page and you're concerned about wasted resources for when someone cancels the page load.
I see that you've got options in three broad categories:
1. Live with it. Improve the situation by caching the results in case the user comes back.
2. Make it faster. Throw more space at a time/space trade-off. Maintain intermediate tables. Precalculate the entire thing, etc.
3. Do something clever with the browser fast-polling a "is it ready yet?" query and the server cancelling the query if it doesn't receive a nag within interval * 2 or similar. If you're really clever, you could return progress / ETA to the nags. However, this might not have particularly useful behaviour when the system is under load or your site is being accessed over limited bandwidth.
I don't think you should go for option 3 because it's increasing complexity and resource usage for not much gain.
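For completeness, the "nag" idea from the third option could be sketched roughly as follows. NagWatchdog and run_job are hypothetical names, the clock is injectable purely so the behaviour can be tested, and the work is assumed to be splittable into small steps between which the server can check whether the client is still there.

```python
import time
from typing import Callable


class NagWatchdog:
    """Tracks client check-ins and reports when the client has gone quiet."""

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_nag = clock()

    def nag(self) -> None:
        """Record an 'is it ready yet?' request from the browser."""
        self.last_nag = self.clock()

    def client_gone(self) -> bool:
        """True once no nag has arrived within interval * 2."""
        return self.clock() - self.last_nag > self.interval * 2


def run_job(steps, watchdog: NagWatchdog):
    """Run work step by step, abandoning it if the client stops nagging."""
    results = []
    for step in steps:
        if watchdog.client_gone():
            return None  # cancelled: nobody is waiting for the result
        results.append(step())
    return results
```

In a real deployment the nag and the job would run in different requests or processes, so the last-nag timestamp would have to live in shared storage; that extra machinery is part of the complexity the answer warns about.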

Approach for REST request with long execution time?

We are building a REST service that will take about 5 minutes to execute. It will be only called a few times a day by an internal app. Is there an issue using a REST (ie: HTTP) request that takes 5 minutes to complete?
Do we have to worry about timeouts? Should we be starting the request in a separate thread on the server and have the client poll for the status?
This is one approach.
Create a new request to perform ProcessXYZ
POST /ProcessXYZRequests
201-Created
Location: /ProcessXYZRequest/987
If you want to see the current status of the request:
GET /ProcessXYZRequest/987
<ProcessXYZRequest Id="987">
    <Status>In progress</Status>
    <Cancel method="DELETE" href="/ProcessXYZRequest/987"/>
</ProcessXYZRequest>
When the request is finished you would see something like
GET /ProcessXYZRequest/987
<ProcessXYZRequest>
    <Status>Completed</Status>
    <Results href="/ProcessXYZRequest/Results"/>
</ProcessXYZRequest>
Using this approach you can easily imagine what the following requests would give
GET /ProcessXYZRequests/Pending
GET /ProcessXYZRequests/Completed
GET /ProcessXYZRequests/Failed
GET /ProcessXYZRequests/Today
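The server-side bookkeeping behind such a request resource could be sketched like this. It is an in-memory illustration only: Job and JobStore are hypothetical names, and a real service would persist the jobs and expose these methods through whatever HTTP framework is in use.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    id: int
    status: str = "In progress"
    result: Optional[str] = None


class JobStore:
    """In-memory stand-in for the collection of process requests."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._jobs: Dict[int, Job] = {}

    def create(self) -> Job:
        """What a POST to the collection would do: register a new job."""
        job = Job(id=next(self._ids))
        self._jobs[job.id] = job
        return job

    def get(self, job_id: int) -> Optional[Job]:
        """What a GET on a single job would return (None would map to 404)."""
        return self._jobs.get(job_id)

    def complete(self, job_id: int, result: str) -> None:
        """Called by the worker when the long-running process finishes."""
        job = self._jobs[job_id]
        job.status, job.result = "Completed", result

    def with_status(self, status: str) -> List[Job]:
        """Backs filtered views such as the Completed listing."""
        return [j for j in self._jobs.values() if j.status == status]
```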
Assuming that you can configure HTTP timeouts using whatever framework you choose, then you could request via a GET and just hang for 5 mins.
However, it may be more flexible to initiate an execution via a POST, get a receipt (a number, an id, whatever), and then perform a GET using that receipt 5 minutes later (and perhaps retry, given that your procedure won't take exactly 5 minutes every time). If the request is still ongoing then return an appropriate HTTP error code (404 perhaps, but then what would you return for a GET with a non-existent receipt?), or return the results if available.
As Brian Agnew points out, 5 minutes is entirely manageable, if somewhat wasteful of resources, if one can control timeout settings. Otherwise, at least two requests must be made: The first to get the result-producing process rolling, and the second (and third, fourth, etc., if the result takes longer than expected to compile) to poll for the result.
Brian Agnew and Darrel Miller both suggest similar approaches for the two(+)-step approach: POST a request to a factory endpoint, starting a job on the server, and later GET the result from the returned result endpoint.
While the above is a very common solution, and indeed adheres to the letter of the REST constraints, it smells very much of RPC. That is, rather than saying, "provide me a representation of this resource", it says "run this job" (RPC) and then "provide me a representation of the resource that is the result of running the job" (REST). EDIT: I'm speaking very loosely here. To be clear, none of this explicitly defies the REST constraints, but it does very much resemble dressing up a non-RESTful approach in REST's clothing, losing out on its benefits (e.g. caching, idempotency) in the process.
As such, I would rather suggest that when the client first attempts to GET the resource, the server should respond with 202 "Accepted" (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.3), perhaps with "try back in 5 minutes" somewhere in the response entity. Thereafter, the client can poll the same endpoint to GET the result, if available (otherwise return another 202, and try again later).
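A minimal sketch of that single-endpoint behaviour, assuming the work can run in a background thread: the first GET starts the computation and answers 202, and later GETs answer 202 until the result exists, then 200. SlowResource and its get method are hypothetical names returning a (status code, body) pair, not a real framework API, and error handling for a failed computation is left out.

```python
import threading
from typing import Callable, Optional, Tuple


class SlowResource:
    """GET returns 202 until the result exists, then 200 with the result."""

    def __init__(self, compute: Callable[[], str]) -> None:
        self._compute = compute
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self._result: Optional[str] = None

    def _work(self) -> None:
        self._result = self._compute()

    def get(self) -> Tuple[int, Optional[str]]:
        with self._lock:
            if self._thread is None:
                # First GET: start the computation in the background.
                self._thread = threading.Thread(target=self._work)
                self._thread.start()
            if self._thread.is_alive():
                return 202, "try back later"
            return 200, self._result
```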
Some additional benefits of this approach are that single-use resources (such as jobs) are not unnecessarily created, two separate endpoints need not be queried (factory and result), and likewise the second endpoint need not be determined from parsing the response from the first, thus simpler. Moreover, results can be cached, "for free" (code-wise). Set the cache expiration time in the result header according to how long the results are "valid", in some sense, for your problem domain.
I wish I could call this a textbook example of a "resource-oriented" approach, but, perhaps ironically, Chapter 8 of "RESTful Web Services" suggests the two-endpoint, factory approach. Go figure.
If you control both ends, then you can do whatever you want. E.g. browsers tend to launch HTTP requests with "connection close" headers so you are left with fewer options ;-)
Bear in mind that if there are NATs or firewalls in between, connections may be dropped if they are inactive for some time.
Could I suggest registering a "callback" procedure? The client issues the request with a "callback end-point" to the server, gets a "ticket". Once the server finishes, it "callbacks" the client... or the client can check the request's status through the ticket identifier.
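The ticket-plus-callback idea could be sketched as below. CallbackService is a hypothetical name, and the callback is modelled as a plain Python callable; over HTTP it would presumably be a request sent to the client's callback end-point.

```python
import uuid
from typing import Callable, Dict


class CallbackService:
    """Hands out tickets and calls the client back when a job finishes."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Callable[[str, str], None]] = {}
        self._status: Dict[str, str] = {}

    def submit(self, callback: Callable[[str, str], None]) -> str:
        """Register a job with its callback and return the ticket."""
        ticket = uuid.uuid4().hex
        self._callbacks[ticket] = callback
        self._status[ticket] = "pending"
        return ticket

    def status(self, ticket: str) -> str:
        """Lets the client check on the job through the ticket identifier."""
        return self._status[ticket]

    def finish(self, ticket: str, result: str) -> None:
        """Called when the work is done: update status, then call back once."""
        self._status[ticket] = "done"
        self._callbacks.pop(ticket)(ticket, result)
```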