What is the best way to retrieve data from a remote server using concurrent calls?

I'm working on retrieving data such as Products and Orders from eCommerce platforms like BigCommerce, Shopify, etc., and saving it in our own databases. To speed up data retrieval from their APIs, we're planning to use the Bluebird library.
Previously, the retrieval logic fetched one page at a time. Since we're planning to make concurrent calls, "n" pages will be retrieved concurrently.
For example, BigCommerce allows up to 3 concurrent calls at a time. So we need to make the concurrent calls in a way that never retrieves the same page more than once, and if a request fails, the request for that page is resent.
What's the best way to implement this? One idea that comes to mind:
One possible solution: keep an index of ongoing requests in the database and update it when each API call completes, so we know which ones were unsuccessful.
Is there a better way of doing this? Any suggestions or ideas would be highly appreciated.
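In Bluebird itself, the usual tool for a concurrency cap is Promise.map(pages, fetchPage, { concurrency: 3 }). The same idea is sketched below in Python with asyncio instead (a swapped-in technique, not the asker's stack): a semaphore caps in-flight requests at 3, every page number is scheduled exactly once, and a failed request is retried in place for the same page, so no page is fetched again after it succeeds. fetch_page is a hypothetical coroutine that requests one page, and total_pages is assumed to be known up front (e.g. from the first response's metadata). If the job must survive a process restart, you would still persist completed page numbers, as the question suggests.

```python
import asyncio

async def fetch_all_pages(fetch_page, total_pages, max_concurrency=3,
                          max_retries=3, base_delay=0.5):
    """Fetch pages 1..total_pages with at most `max_concurrency` requests in flight.

    `fetch_page` is a hypothetical coroutine: page number -> page data.
    Each page is scheduled once; failures are retried for that same page.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    results = {}

    async def fetch_with_retry(page):
        for attempt in range(max_retries + 1):
            async with semaphore:  # occupy one of the concurrency slots
                try:
                    results[page] = await fetch_page(page)
                    return
                except Exception:
                    if attempt == max_retries:
                        raise  # give up on this page after the last retry
            # Back off outside the semaphore so other pages can use the slot.
            await asyncio.sleep(base_delay * 2 ** attempt)

    await asyncio.gather(*(fetch_with_retry(p) for p in range(1, total_pages + 1)))
    return results
```

Holding the semaphore only around the request itself, and sleeping outside it, keeps the rate limit honest while a failed page waits to be retried.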

Related

Where should I sort and filter: back-end or front-end?

I am running into a conceptual conflict on whether I should sort and filter on frontend or backend.
Some suggested that the logic should be on the backend and that only a limited number of records, i.e. 10-100 results at a time, should be sent to the client if you have millions of records, to reduce page load time. What confuses me is: what if many clients (say 100 users) are sorting and filtering concurrently? Then the server would have to sort and filter millions of records 100 times, constantly, which I think would slow it down.
If I assume my data set is around 10,000 to 100,000 records, 10-50 users use the app concurrently, and initial load time doesn't matter much because it is a private enterprise app (i.e. an ERP), where should the filter and sort logic live?
There is no single right answer to your question; here are the pros and cons.
Sorting at the server end:
Server overhead from sorting the data.
Less data transfer to the user, since you are sending only the filtered data.
User experience might suffer if sorting takes a long time; the user would be waiting at a blank screen.
Sorting at the user end:
Increased network usage for both the server and the user, and hence a longer-running request on the server.
Might give a slightly better user experience: it may increase the overall page load time, but the user would have some data on screen.
It is best to use the best of both worlds rather than sticking to one.
You can also use caching at the server level, which may improve your application's performance.
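A minimal sketch of the server-side option combined with caching, which also addresses the worry about 100 users re-sorting the same data: each distinct (filter, sort) query is sorted once, later requests are served from the cache, and only one page is sent to the client. The in-memory RECORDS table, its field names, and the page size are hypothetical. In a real app the filter and sort would usually be pushed into the database query (WHERE / ORDER BY / LIMIT), and the cache must be cleared whenever the underlying data changes.

```python
from functools import lru_cache

# Hypothetical in-memory table standing in for the ERP data (assumption).
RECORDS = tuple(
    {"id": i, "status": "open" if i % 3 else "closed", "amount": (i * 37) % 1000}
    for i in range(1, 10_001)
)

@lru_cache(maxsize=128)
def sorted_filtered_ids(status, sort_field, descending):
    """Filter and sort once per distinct query; repeat queries hit the cache."""
    rows = [r for r in RECORDS if r["status"] == status]
    # Tie-break on id so the order (and therefore paging) is stable.
    rows.sort(key=lambda r: (r[sort_field], r["id"]), reverse=descending)
    return tuple(r["id"] for r in rows)

def get_page(status, sort_field, descending=False, page=1, page_size=50):
    """Return only the ids for one page, so little data goes over the wire."""
    ids = sorted_filtered_ids(status, sort_field, descending)
    start = (page - 1) * page_size
    return list(ids[start:start + page_size])
```

Paging through the same sorted result (page 1, page 2, ...) costs one sort in total, not one per request.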

Pattern for sharing a large amount of data between the web application and a backend service in a Service Oriented Application

I have a web application which performs CRUD operations on a database. At times, I have to run a backend job to do a fair amount of number crunching/analytics on this data. This backend job will be written as a different service in a concurrent language, which will be independent of the main web application.
But sharing the DB between the two applications is probably not best practice, as it leads to tight coupling. What is the right pattern to use here? Since this data might amount to millions of DB rows, I'm not sure a message queue or REST APIs would be the best way to go.
This is perhaps a very common scenario and many companies/devs have already solved this problem. Any pointers will be helpful.
From the question, it seems that the background job does not modify the state of the database.
The simplest way to avoid a performance hit on the main application while the background job runs is to take a database dump and perform the analysis on that dump.
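A minimal sketch of the dump-then-analyze approach, using SQLite's online backup API from Python's sqlite3 module as a stand-in for whatever database you actually run (with PostgreSQL, for example, you would use pg_dump and restore into a separate instance). The orders table and the aggregation query are hypothetical.

```python
import sqlite3

def snapshot(live, path=":memory:"):
    """Copy the live database into a separate snapshot for the analytics job."""
    snap = sqlite3.connect(path)
    live.backup(snap)  # online copy; the web app can keep using `live`
    return snap

def total_by_customer(snap):
    """Example number crunching that runs only against the snapshot."""
    rows = snap.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    )
    return dict(rows.fetchall())
```

The analytics service only ever talks to the snapshot, so heavy queries never compete with the web application's traffic, and writes made after the snapshot do not affect the analysis.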

Java web application for multiple users

I need to design and implement a Java web application that can be used by multiple users at the same time. The data handled by this application is going to be huge, and it may take about 5 minutes for a page to display the results (database records).
I designed this application using HTML, Servlets and JSP. But when two users tried to get the records at the same time, only one user could view the results while the other got an error.
I always thought a web application would take care of handling multiple users but this is not the case.
Any insights on this would be highly appreciated.
Thanks.
I always thought a web application would take care of handling multiple users but this is not the case.
They do if they're written correctly. Obviously yours is not. That's all we can tell you unless you give more information, most importantly details of the error shown to the second user.
One possibility is that everything is OK on the web layer but your DB access for the first user causes an exclusive lock so that the second user cannot access the data at the same time. This could be fixed by using non-exclusive read locks. How to do that depends mainly on what DB you're using.
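The idea of non-exclusive read locks can be illustrated in-process with a small readers-writer lock: any number of readers may hold it at once, while a writer needs it exclusively. This is only a conceptual sketch in Python (the ReadWriteLock class is hypothetical), not how a database implements locking; in a real DB you get this behavior through its isolation level or locking options, which depend on the product.

```python
import threading

class ReadWriteLock:
    """Many concurrent readers OR one exclusive writer (writers may starve)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            # Readers only wait for an active writer, never for other readers.
            self._cond.wait_for(lambda: not self._writer)
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may proceed

    def acquire_write(self, timeout=None):
        """Return True if the exclusive lock was acquired within `timeout`."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout)
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

With an exclusive lock on reads, the second user's request would wait (or time out and error) until the first finished; with a shared read lock, both can read at the same time.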
Getting concurrency right requires you to choose the correct tools and use them correctly. It doesn't just happen magically because it's a web app.
What are you using to develop this web application? If you are building it your own way from scratch, you are reinventing a wheel that has already been built and refined by very solid frameworks.
I suggest you analyze your requirements thoroughly and study some of the available frameworks. Let them handle things like multithreading and other concerns in the best possible manner.
Handling multiple requests at a time is the container's job; as application developers, we have to concentrate on how we handle and process the requests the container forwards to us.
I suggest getting some insight into how web applications work and how the request-response cycle happens.
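To illustrate the point that concurrency is the container's job, here is a Python stand-in for a servlet container: the standard library's ThreadingHTTPServer runs each request in its own thread, so the application code only implements the handler and keeps per-request data in local variables. The handler, the 0.5 s delay, and the helper functions are hypothetical; a Java app relies on its servlet container (e.g. Tomcat) in the same way.

```python
import time
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

class SlowReportHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; all per-request data lives in local variables."""
    delay = 0.5  # stand-in for a slow database query

    def do_GET(self):
        time.sleep(self.delay)
        body = f"report for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging

def start_server():
    """The 'container': one thread per request, on a free local port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowReportHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def _get(url):
    with urlopen(url) as response:
        return response.read().decode()

def timed_parallel_get(urls):
    """Send all requests at once, like several users clicking together."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        bodies = list(pool.map(_get, urls))
    return bodies, time.monotonic() - start
```

With two simultaneous requests, the total time is close to one delay rather than two, because neither request waits for the other.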

Is there a nice way to exchange django objects between 2 servers?

I have two Django servers, each with its own database, and I want to exchange some specific objects between them over HTTP.
I planned to create some views that generate XML output on one side, to be imported on the other side. Is there a nicer way?
Is there a reason this needs to happen through http?
If you just want to read data from one server to be used on the other, you could create a simple API that returns a representation of the object you queried for (in xml/json or whatever other format you wanted).
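In Django, django.core.serializers.serialize("json", queryset) and serializers.deserialize("json", payload) produce and consume a list of {"model", "pk", "fields"} records, which is a convenient payload for such an API. The sketch below mimics that shape with a plain dataclass so it runs without a Django project; the Product class and the "shop.product" label are hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Product:
    """Hypothetical stand-in for a Django model that exists on both servers."""
    pk: int
    name: str
    price_cents: int

def export_json(objects):
    """Server A: body an API view could return (Django-serializer-like shape)."""
    return json.dumps([
        {"model": "shop.product", "pk": o.pk,
         "fields": {k: v for k, v in asdict(o).items() if k != "pk"}}
        for o in objects
    ])

def import_json(payload):
    """Server B: rebuild objects from the payload fetched over HTTP."""
    return [Product(pk=item["pk"], **item["fields"]) for item in json.loads(payload)]
```

JSON is typically easier to produce and parse than hand-rolled XML, and keeping the primary key in the payload lets the receiving side decide whether to update or insert.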
If there is going to be a decent amount of processing going on, or slow communication, and you don't need it to happen real time (in the request/response cycle), you could look at a message queue. Something like RabbitMQ for instance.
If you want both servers to have direct access to both databases, you could try to take advantage of Django's multiple database support.
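For the multiple-database option, a minimal sketch of server A's settings.py: server B's database is listed under a second alias, and the ORM then picks the database per query with QuerySet.using() or per write with save(using=...). The alias "server_b", the database names and hosts, and the Product model are hypothetical; credentials are omitted, and both databases must share the same schema for the model.

```python
# settings.py on server A (hypothetical names and hosts; credentials omitted)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "server_a",
        "HOST": "localhost",
    },
    # Server B's database, reachable from server A under the "server_b" alias.
    "server_b": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "server_b",
        "HOST": "db.server-b.example",
    },
}

# Elsewhere in server A's code (Product is a hypothetical model):
#   Product.objects.using("server_b").filter(name="Mug")  # read from B
#   product.save(using="server_b")                         # write to B
```

This removes the HTTP layer entirely, at the cost of server A needing network access and credentials for server B's database.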
If it's more of a one-off copy of data, just write a small (non-Django) script to do it.

How can I scale a web app with long response times that currently uses Django?

I am writing a web application with Django on the server side. It takes ~4 seconds for the server to generate a response to the user. It uses a weather API, and my application has to make ~50 queries to that API for each user request.
The server side uses Python's urllib to call the weather API. I used Python's threading to speed up the process because urllib is synchronous. I am using WSGI with Apache. The problem is that the WSGI stack is fully synchronous, and when many users use my application, they have to wait for one another's requests to finish. Since each request takes ~4 seconds, this is unacceptable.
I am kind of stuck, what can I do?
Thanks
If you are using mod_wsgi in a multithreaded configuration, or even a multi process configuration, one request should not block another from being able to do something. They should be able to run concurrently. If using a multithreaded configuration, are you sure that you aren't using some locking mechanism on some resource within your own application which precludes requests running through the same section of code? Another possibility is that you have configured Apache MPM and/or mod_wsgi daemon mode poorly so as to preclude concurrent requests.
Anyway, as mentioned in another answer, you are much better off looking at caching strategies to avoid the weather lookups in the first place, or offloading to client.
50 queries to an outside resource per request is probably a bad place to be, and probably not necessary at all.
The weather doesn't change all that quickly, so you can probably benefit enormously by simply caching results for a while. Then it doesn't matter how many requests you get; you don't need to make more than a few queries per day.
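A minimal sketch of caching the weather lookups in-process, assuming a hypothetical fetch(location) function that wraps the current urllib call and a 30-minute TTL. In a Django project, the built-in cache framework (django.core.cache.cache.get / cache.set with a timeout) serves the same purpose and can be shared across processes, which this per-process dict cannot.

```python
import time

class TTLCache:
    """Remember each lookup for `ttl` seconds, then fetch it again."""

    def __init__(self, fetch, ttl=30 * 60, clock=time.monotonic):
        self._fetch = fetch      # hypothetical: location -> weather data
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # location -> (expires_at, value)

    def get(self, location):
        now = self._clock()
        hit = self._entries.get(location)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh: no API call
        # Two threads missing at the same moment may both fetch; harmless here.
        value = self._fetch(location)
        self._entries[location] = (now + self._ttl, value)
        return value
```

Once the cache is warm, most user requests never touch the weather API, so the ~4 seconds of upstream latency is paid only occasionally.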
If that's not your situation, you might be able to get the client to do the work for you. Refactor the code so that the weather api aggregation happens on the client in javascript, rather than funneling it all through the server.
Edit: based on the comments you've posted, what you are asking for probably cannot be optimized within the constraints of the API you are using. The problem is that the service does a good job of abstracting away the differences between the many sources of weather information it aggregates into a nearest-location query. After all, weather stations provide only point data.
If you talk directly to the technical support people who provide the API, you might find that they are willing to support more complex queries (e.g. a bounding box) and will tell you how. More likely, though, they abstract that away because they don't want to reveal the actual resolution their API provides, or because something in the way they model their data or perform their calculations would make such queries too difficult to support.
Without that or caching, you are just out of luck.