A REST API service has a request limit (say a maximum of 100 requests per minute). In Django, I am trying to let users access such an API and retrieve data in real time to update SQL tables. The problem is that if multiple users access the API at the same time, the request limit is likely to be exceeded.
Here is a code snippet showing how I currently perform requests: each user adds a list of objects they want to request and runs request_engine().start(object_list) to access the API. I use multithreading to speed up the requests, and I allow failed API requests to be retried by capping the number of attempts for each request object with upper_limit.
As I understand it, there should be some kind of queue for API requests. I expect there is a more elegant solution, but I could not find any similar examples. How can one implement/rewrite this for multi-user usage with Django?
import requests
from multiprocessing.dummy import Pool as ThreadPool

N = 50           # number of threads
upper_limit = 1  # limit on the number of requests for a single object

class request_engine():
    def __init__(self):
        pass

    def start(self, objs):
        self.objs = {obj: {'status': 0, 'data': None} for obj in objs}
        done = False
        while not done:
            self.parallel_requests()
            done = all(_['status'] > upper_limit or _['status'] == -1 for obj, _ in self.objs.items())
        return dict(self.objs)

    def single_request(self, request_obj):
        URL = f"https://reqres.in/api/users?page={request_obj}"
        r = requests.get(url=URL)
        if r.ok:
            res = r.json()
            self.objs[request_obj]['status'] = -1
            self.objs[request_obj]['data'] = res
        else:
            self.objs[request_obj]['status'] += 1

    def parallel_requests(self):
        objs = [obj for obj, _ in self.objs.items() if _['status'] != -1 and _['status'] <= upper_limit]
        pool = ThreadPool(N)
        pool.map(self.single_request, objs)
        pool.close()
        pool.join()

objs = [1, 2, 3, 4, 5, 6, 7, 7, 8, 234, 124, 24, 535, 6, 234, 24, 4, 1, 3, 4, 5, 4, 3, 5, 3, 1, 5, 2, 3, 5, 3]

result = request_engine().start(objs)
print([_['status'] for obj, _ in result.items()])
# status corresponds to the number of unsuccessful requests
# status=-1 implies success of the request
Thanks in advance.
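For illustration only, here is a minimal sketch of the kind of shared request queue mentioned above: a single worker thread drains a process-wide queue.Queue and paces itself to the provider's limit, so concurrent users enqueue work instead of calling the API directly. The names MAX_PER_MINUTE, api_queue and enqueue_request are hypothetical, and a multi-process Django deployment would need an external queue (e.g. Celery or Redis) rather than this in-process one.

import queue
import threading
import time

import requests

MAX_PER_MINUTE = 100       # provider's limit (hypothetical constant)
api_queue = queue.Queue()  # shared, process-wide queue of pending requests

def enqueue_request(page, results, done_event):
    # called from any user's thread; the worker performs the actual HTTP call
    api_queue.put((page, results, done_event))

def _worker():
    interval = 60.0 / MAX_PER_MINUTE  # pacing: at most one request per interval
    while True:
        page, results, done_event = api_queue.get()
        r = requests.get(f"https://reqres.in/api/users?page={page}")
        results[page] = r.json() if r.ok else None
        done_event.set()          # let the enqueuing thread know its result is ready
        api_queue.task_done()
        time.sleep(interval)

threading.Thread(target=_worker, daemon=True).start()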
Related
I am trying to set up a Flask API rate limiter for each user. The following code limits an IP address to 3 requests per minute.
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

limiter = Limiter(
    key_func=get_remote_address,  # limit by IP address
    storage_uri="redis://localhost:6379",
    strategy="moving-window"
)

@api.route('/api/submit-code')
@limiter.limit('3 per minute')
def submit_code():
    user_id = session.get("user_id")
    if not user_id:
        return jsonify({"error": "Unauthorized"}), 401
How can I change this to limit by user instead of by IP address? I am using server-side sessions, so I'm not sure how to include user_id in the limiter decorator.
I ended up modifying the decorator to the following:
@limiter.limit('3 per minute', key_func=lambda: session.get("user_id"))
The limiter will not apply to users who are not logged in. You can simply add a check for this in the function, as the OP does.
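Putting the pieces together, a minimal sketch of the per-user limit might look like this (assuming the same api blueprint, session setup and limiter configuration as above):

from flask import jsonify, session

@api.route('/api/submit-code')
@limiter.limit('3 per minute', key_func=lambda: session.get("user_id"))
def submit_code():
    user_id = session.get("user_id")
    if not user_id:
        # anonymous users are rejected here rather than rate limited
        return jsonify({"error": "Unauthorized"}), 401
    # ... handle the submission for user_id ...
    return jsonify({"status": "ok"}), 200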
I have two (Docker) containers running the application, and I'm trying to redirect a request from one container to the other. The container I'm redirecting from has the code below, where 172.17.0.3 is the IP of the second container (I have verified it can be pinged). The other container has the same code but without the else part and the if condition check. When I send a curl request to the first container from another client container on the same network, curl http://172.17.0.2:3333?count=100, it should redirect, but I get Internal Server Error as the response. However, when I log in to container 2 and run curl there, I get the "redirected to ..." response.
from flask import Flask, request, redirect
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)

class Greeting(Resource):
    def get(self):
        offload = True
        if offload == False:
            count = request.args.get('count')
            count = int(count)
            for i in range(count):
                continue
            return count
        else:
            count = request.args.get('count')
            redirect_str = "http://172.17.0.3:3333?count=" + count
            return redirect(redirect_str, code=302)

api.add_resource(Greeting, '/')  # Route_1

if __name__ == '__main__':
    app.run('0.0.0.0', 3333)
I want to wait for the response from the server at 172.17.0.3 and, once I receive it, send that response back to the client. Can anyone tell me how this can be done?
You need to send a request to the other container instead of redirecting to it if you want to wait for its response. Using, e.g., the requests library, this would look something like:
import requests
resp = requests.get('http://172.17.0.3:3333?count=' + count)
return resp.text
Check the requests Quickstart guide for more info: https://requests.readthedocs.io/en/master/user/quickstart.
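For example, a minimal sketch of the offloading branch rewritten to proxy the call instead of redirecting (reusing the Greeting resource and the 172.17.0.3 address from the question) might look like this:

import requests
from flask import Flask, request
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)

class Greeting(Resource):
    def get(self):
        count = request.args.get('count', '0')
        # forward the request to the second container and wait for its answer
        resp = requests.get('http://172.17.0.3:3333', params={'count': count})
        # relay the body and status code back to the original client
        return resp.json(), resp.status_code

api.add_resource(Greeting, '/')

if __name__ == '__main__':
    app.run('0.0.0.0', 3333)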
I'll first explain the architecture of my system and then move to the question:
I have a REST API which is used as my API gateway. This server is built using Flask. I also have a RabbitMQ cluster, and a client I wrote that listens to a specific queue and executes the tasks it receives.
Until now, all of my requests were asynchronous: each request to the API gateway included a callback_uri field with a URL to POST the results to, the API gateway was only responsible for sending the task to RabbitMQ, the worker processed the task, and at the end it POSTed the results back to the callback URL.
My question is:
I want a new endpoint to be synchronous, in the sense that the processing will still be done by the same worker as before, but the results should come back to the API gateway so it can return them to the user and only then release the connection.
My current solution:
I'm sending a unique callback_uri as part of the request to the worker as before, but now that endpoint is implemented by my API gateway and allows both POST and GET methods, so the worker can POST the results once it has finished, and my API gateway keeps polling the callback URL until a result is available and then returns it to the client.
Is there a better option than having a busy-waiting HTTP worker poll its own endpoint for the results, while still being synchronous, i.e. releasing the connection only when the results become available?
Code for illustration only:
@app.route('/long_task', methods=['POST'])
@sync_request
def long_task():
    try:
        if request.get_json() is None:
            return ERROR_MSG_NO_JSON, 400
        create_and_send_request_to_rabbitmq()
        return '', 200
    except Exception as ex:
        return ERROR_MSG_NO_DATA, 400

def sync_request(func):
    def call(*args, **kwargs):
        create_callback_uri()
        result = func(*args, **kwargs)
        status_code = result[1]
        if status_code == 200:
            result = get_callback_result()
        return result
    return call

def get_callback_result():
    callback_uri = request.get_json()['callback_uri']
    has_answer = False
    headers = {'content-type': 'application/json'}
    empty_response = {}
    content = json.dumps(empty_response)
    try:
        with Timeout(seconds=SYNC_REQUEST_TIMEOUT_SECONDS):
            while not has_answer:
                response = requests.get(callback_uri, headers=headers)
                if response.status_code == 200:
                    has_answer = True
                    content = response.content
                else:
                    time.sleep(0.2)
    except TimeoutException:
        log.debug('Timed out on sync request for request %s ' % request)
    return content, 200
So, if I understand you correctly, you want your backend to wait for the response from a worker (via RabbitMQ). You can achieve that by implementing RPC over RabbitMQ; the key idea is to use a correlation ID.
But the most efficient way would definitely be to serve the client over WebSockets (or a raw TCP socket if it is not a browser) and notify it directly when the job is done. That way you don't lock resources (the client connection, RabbitMQ queues) and you avoid the performance hit of RPC.
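For reference, here is a condensed sketch of the RPC pattern with pika (adapted from the standard RabbitMQ RPC tutorial; the 'task_queue' and 'localhost' names are placeholders), showing how the correlation ID ties a reply back to the request the gateway is blocking on:

import uuid

import pika

class RpcClient:
    def __init__(self):
        self.connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        self.channel = self.connection.channel()
        # exclusive, auto-named queue that the worker will publish replies to
        result = self.channel.queue_declare(queue='', exclusive=True)
        self.callback_queue = result.method.queue
        self.channel.basic_consume(queue=self.callback_queue,
                                   on_message_callback=self.on_response,
                                   auto_ack=True)
        self.response = None
        self.corr_id = None

    def on_response(self, ch, method, props, body):
        # only accept the reply that matches the request we are waiting for
        if props.correlation_id == self.corr_id:
            self.response = body

    def call(self, payload):
        self.response = None
        self.corr_id = str(uuid.uuid4())
        self.channel.basic_publish(
            exchange='',
            routing_key='task_queue',  # hypothetical worker queue name
            properties=pika.BasicProperties(reply_to=self.callback_queue,
                                            correlation_id=self.corr_id),
            body=payload)
        # block until the matching reply arrives
        while self.response is None:
            self.connection.process_data_events(time_limit=1)
        return self.response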
I have a long-running Celery task which iterates over an array of items and performs some actions.
The task should somehow report back which item it is currently processing, so the end user is aware of the task's progress.
At the moment my Django app and Celery sit together on one server, so I am able to use Django's models to report the status, but I am planning to add more workers on machines away from Django, so they can't reach the DB.
Right now I see a few solutions:
Store intermediate results manually in some storage, like Redis or MongoDB, making them available over the network. This worries me a little bit because, for example, if I use Redis then I have to keep the Django-side code that reads the status in sync with the Celery task that writes it, so they use the same keys.
Report the status back to Django from Celery using REST calls, like PUT http://django.com/api/task/123/items_processed.
Maybe use Celery's event system and create events like Item processed, on which Django updates the counter.
Create a separate worker which runs on the server with Django and holds a task which only increases the items-processed count, so when the main task is done with an item it issues increase_messages_proceeded_count.delay(task_id).
Are there any other solutions, or hidden problems with the ones I mentioned?
There are probably many ways to achieve your goal, but here is how I would do it.
Inside your long-running Celery task, set the progress using Django's caching framework:
from django.core.cache import cache

@app.task(bind=True)  # bind=True so the task instance is available as self
def long_running_task(self, *args, **kwargs):
    key = "my_task: %s" % self.request.id
    ...
    # do whatever you need to do and set the progress
    # using cache:
    cache.set(key, progress, timeout="whatever works for you")
    ...
Then all you have to do is make a recurring AJAX GET request with that key and retrieve the progress from cache. Something along those lines:
def task_progress_view(request, *args, **kwargs):
    key = request.GET.get('task_key')
    progress = cache.get(key)
    return HttpResponse(content=json.dumps({'progress': progress}),
                        content_type="application/json; charset=utf-8")
One caveat though: if you are running your server as multiple processes, make sure you are using something like Memcached, because Django's default local-memory cache is per-process and will be inconsistent among them. Also, I probably wouldn't use Celery's task_id as the key, but it is sufficient for demonstration purposes.
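For example, a minimal sketch of a shared cache backend in settings.py (assuming Memcached running locally; on Django 3.2+ the backend class is PyMemcacheCache, older versions use MemcachedCache instead):

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",  # placeholder address of the shared Memcached instance
    }
}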
Take a look at flower - a real-time monitor and web admin for Celery distributed task queue:
https://github.com/mher/flower#api
http://flower.readthedocs.org/en/latest/api.html#get--api-tasks
You need it for presentation, right? Flower works with websockets.
For instance - receive task completion events in real-time (taken from official docs):
var ws = new WebSocket('ws://localhost:5555/api/task/events/task-succeeded/');
ws.onmessage = function (event) {
    console.log(event.data);
}
You would likely need to work with tasks ('ws://localhost:5555/api/tasks/').
I hope this helps.
Simplest:
Your tasks and your Django app already share access to one or two data stores: the broker and the results backend (if you're using one that is different from the broker).
You can simply put some data into one or the other of these data stores indicating which item the task is currently processing.
E.g. if you're using Redis, simply have a key 'task-currently-processing' and store the data relevant to the item currently being processed there.
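A minimal sketch of that idea, assuming redis-py and a local Redis instance (task_id and item stand in for whatever the task is looping over):

import json

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# inside the Celery task, for each item being processed:
r.set('task-currently-processing', json.dumps({'task_id': task_id, 'item': item}))

# on the Django side, when reporting progress:
raw = r.get('task-currently-processing')
current = json.loads(raw) if raw else None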
You can use something like SwampDragon to reach the user from the Celery instance (you have to be able to reach it from the client though, and take care not to run afoul of CORS). It can be latched onto the counter rather than the model itself.
lehins' solution looks good if you don't mind your clients repeatedly polling your backend. That may be fine but it gets expensive as the number of clients grows.
Artur Barseghyan's solution is suitable if you only need the task lifecycle events generated by Celery's internal machinery.
Alternatively, you can use Django Channels and WebSockets to push updates to clients in real time. Setup is pretty straightforward.
Add channels to your INSTALLED_APPS and set up a channel layer. E.g., using a Redis backend:
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("redis", 6379)]
        }
    }
}
Create an event consumer. This will receive events from Channels and push them via Websockets to the client. For instance:
import json

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer

class TaskConsumer(WebsocketConsumer):
    def connect(self):
        self.task_id = self.scope['url_route']['kwargs']['task_id']  # your task's identifier
        async_to_sync(self.channel_layer.group_add)(f"tasks-{self.task_id}", self.channel_name)
        self.accept()

    def disconnect(self, code):
        async_to_sync(self.channel_layer.group_discard)(f"tasks-{self.task_id}", self.channel_name)

    def item_processed(self, event):
        item = event['item']
        self.send(text_data=json.dumps(item))
Push events from your Celery tasks like this:
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer
...
async_to_sync(get_channel_layer().group_send)(f"tasks-{task.task_id}", {
    'type': 'item_processed',
    'item': item,
})
You can also write an async consumer and/or invoke group_send asynchronously. In either case you no longer need the async_to_sync wrapper.
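For instance, a sketch of the same consumer written asynchronously (the group naming and event type mirror the synchronous version above) could look like:

import json

from channels.generic.websocket import AsyncWebsocketConsumer

class AsyncTaskConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self.task_id = self.scope['url_route']['kwargs']['task_id']
        await self.channel_layer.group_add(f"tasks-{self.task_id}", self.channel_name)
        await self.accept()

    async def disconnect(self, code):
        await self.channel_layer.group_discard(f"tasks-{self.task_id}", self.channel_name)

    async def item_processed(self, event):
        # same event type as before, but the send is awaited directly
        await self.send(text_data=json.dumps(event['item']))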
Add websocket_urlpatterns to your urls.py:
from django.urls import path

websocket_urlpatterns = [
    path('ws/tasks/<task_id>/', TaskConsumer.as_asgi()),
]
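Note that the websocket_urlpatterns still need to be wired into the ASGI application; a minimal sketch of asgi.py (module names such as myproject and myapp are placeholders) might look like:

# asgi.py
import os

from django.core.asgi import get_asgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')
django_asgi_app = get_asgi_application()

from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter

from myapp.routing import websocket_urlpatterns  # wherever the patterns live

application = ProtocolTypeRouter({
    # regular HTTP requests keep going through Django's ASGI handler
    "http": django_asgi_app,
    # WebSocket connections are routed to the consumers defined above
    "websocket": AuthMiddlewareStack(URLRouter(websocket_urlpatterns)),
})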
Finally, to consume events from JavaScript in your client, you can do something like this:
let task_id = 123;
let protocol = location.protocol === 'https:' ? 'wss://' : 'ws://';
let socket = new WebSocket(`${protocol}${window.location.host}/ws/tasks/${task_id}/`);
socket.onmessage = function(event) {
    let data = JSON.parse(event.data);
    let item = data.item;
    // do something with the item (e.g., push it into your state container)
}
This is the code I'm using. Is there any way to make it run faster?
src_uri = boto.storage_uri(bucket, google_storage)
for obj in src_uri.get_bucket():
    f.write('%s\n' % (obj.name))
This is an example where it pays to use the underlying Google Cloud Storage API more directly, via the Google API Client Library for Python, which consumes the RESTful HTTP API. With this approach it is possible to use request batching to retrieve the names of all objects in a single HTTP request (reducing the per-request HTTP overhead), and to use field projection with the objects.get operation (by setting &fields=name) to obtain a partial response, so that you aren't sending all the other fields and data over the network (or waiting for the backend to retrieve unnecessary data).
Code for this would look like:
def get_credentials():
    # Your code goes here... check out the oauth2client documentation:
    # http://google-api-python-client.googlecode.com/hg/docs/epy/oauth2client-module.html
    # Or look at some of the existing samples for how to do this

def get_cloud_storage_service(credentials):
    return discovery.build('storage', 'v1', credentials=credentials)

def get_objects(cloud_storage, bucket_name, autopaginate=False):
    result = []
    # Actually, it turns out that request batching isn't needed in this
    # example, because the objects.list() operation returns not just
    # the URL for the object, but also its name, as well. If it had returned
    # just the URL, then that would be a case where we'd need such batching.
    projection = 'nextPageToken,items(name,selfLink)'
    request = cloud_storage.objects().list(bucket=bucket_name, fields=projection)
    while request is not None:
        response = request.execute()
        result.extend(response.get('items', []))
        if autopaginate:
            request = cloud_storage.objects().list_next(request, response)
        else:
            request = None
    return result

def main():
    credentials = get_credentials()
    cloud_storage = get_cloud_storage_service(credentials)
    bucket = # ... your bucket name ...
    for obj in get_objects(cloud_storage, bucket, autopaginate=True):
        print('name=%s, selfLink=%s' % (obj['name'], obj['selfLink']))
You may find the Google Cloud Storage Python Example and the other API Client Library examples helpful in figuring out how to do this. There are also a number of videos on the Google Developers YouTube channel, such as Accessing Google APIs: Common code walkthrough, that walk through this.