I have this:
eta_date = date.today()
eta = datetime.combine(eta_date, time.max)
scheduled_task.apply_async(eta=eta)
scheduled_tasks:
@task
def scheduled_task():
    for obj in ModelData.objects.all():
        send_data(obj)
The send_data function sends an object to another server as JSON. I use Celery. I want to start the task at the end of the day, but in such a way that the objects are sent one per second. How can I do it?
allcaps already told you the answer in the comment section, but it's what I would have answered anyway: just add a sleep after send_data to wait X seconds.
import time

@task
def scheduled_task():
    for obj in ModelData.objects.all():
        send_data(obj)
        time.sleep(1)  # you can also use a float here if 1 second is too long
Another option is to spawn a task per object in ModelData and set a rate limit of one per second on it.
@task
def scheduled_task():
    for obj in ModelData.objects.all():
        send_data_task.delay(obj)

@task(rate_limit='1/s')
def send_data_task(obj):
    send_data(obj)
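To tie this back to the end-of-day requirement from the question, the fan-out task can still be queued with the same ETA; the rate limit then paces the per-object sends. A minimal sketch, reusing the question's own scheduling call (time is aliased to avoid clashing with the stdlib time module imported above):

from datetime import date, datetime, time as dtime

eta = datetime.combine(date.today(), dtime.max)  # just before midnight tonight
scheduled_task.apply_async(eta=eta)

Keep in mind that Celery's rate_limit is enforced per worker instance, not globally across all workers.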
For a scenario with sales orders, I need to execute a task with a given delay.
To accomplish this, I added a task in my tasks.py file like so:
from huey import crontab
from huey.contrib.djhuey import db_task

@db_task(delay=3600)
def do_something_delayed(instance):
    print("Do something delayed...by 3600 seconds")
However, this delay setting doesn't seem to delay anything. The task is just scheduled and executed immediately.
What am I doing wrong?
Thanks to coleifer on the GitHub repo:
https://github.com/coleifer/huey/issues/678#issuecomment-1184540964
The task() decorators do not accept a delay parameter, see https://huey.readthedocs.io/en/latest/api.html#Huey.task
I assume you've already read the docs on scheduling/delaying invocations of tasks: https://huey.readthedocs.io/en/latest/guide.html#scheduling-tasks -- but this applies to individual invocations.
If you want your task to always be delayed by 1 hour, the better way would be:
@db_task()
def do_something(instance):
    print("Do something")

def do_something_delayed(instance):
    return do_something.schedule((instance,), delay=3600)
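A minimal usage sketch, assuming both functions live in the tasks.py module shown above:

from tasks import do_something, do_something_delayed

do_something(instance)          # enqueued to run as soon as a worker picks it up
do_something_delayed(instance)  # enqueued to run roughly one hour from now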
I have a Django service that registers a lot of clients and returns a payload containing a timer (let's say 800 s), after which the client should be suspended by the service (its status changed from REGISTERED to SUSPENDED in MongoDB).
I'm running Celery with RabbitMQ as the broker, as follows:
celery/tasks.py
@app.task(bind=True, name='suspend_nf')
def suspend_nf(pk):
    collection.update_one({'instanceId': str(pk)},
                          {'$set': {'nfStatus': 'SUSPENDED'}})
and calling the task inside a Django view like:
api/views.py
def put(self, request, pk):
    now = datetime.datetime.now(tz=pytz.timezone(TIME_ZONE))
    timer = now + datetime.timedelta(seconds=response_data["heartBeatTimer"])
    suspend_nf.apply_async(eta=timer)

    response = Response(data=response_data, status=status.HTTP_202_ACCEPTED)
    response['Location'] = str(request.build_absolute_uri())
What am I missing here?
Are you saying that your view blocks completely, or that the view waits for the ETA before finishing execution?
Did you receive any error?
Try using the countdown parameter instead of eta.
In your case it's better because you don't need to manipulate dates.
Like this: suspend_nf.apply_async(countdown=response_data["heartBeatTimer"])
Let's see if your view behaves differently.
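As a sketch, the relevant part of the view from the question then becomes as follows. The args keyword is added here so the task actually receives pk (the original call didn't pass it), and response_data is assumed to come from earlier in the view, as in the question:

def put(self, request, pk):
    # countdown takes a number of seconds, so no timezone or date arithmetic is needed
    suspend_nf.apply_async(args=(pk,), countdown=response_data["heartBeatTimer"])

    response = Response(data=response_data, status=status.HTTP_202_ACCEPTED)
    response['Location'] = str(request.build_absolute_uri())
    return response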
I finally found a workaround: since this is a small project, I don't really need Celery + RabbitMQ; a simple thread does the job.
The task looks like this:
import time

def suspend_nf(pk, timer):
    time.sleep(timer)
    collection.update_one({'instanceId': str(pk)},
                          {'$set': {'nfStatus': 'SUSPENDED'}})
And it is called inside the view like this:
import threading

timer = int(response_data["heartBeatTimer"])
thread = threading.Thread(target=suspend_nf, args=(pk, timer), kwargs={})
thread.daemon = True  # setDaemon() is deprecated in newer Python versions
thread.start()
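A slightly tidier variant of the same workaround is threading.Timer, which handles the sleep for you. Here suspend_nf_update is a hypothetical helper: the update_one call from suspend_nf without the time.sleep. Either way, any pending suspension is lost if the server process exits, since daemon threads die with it:

import threading

timer = int(response_data["heartBeatTimer"])
t = threading.Timer(timer, suspend_nf_update, args=(pk,))  # suspend_nf_update: hypothetical, the Mongo update only
t.daemon = True
t.start()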
I have a problem. I have an echo server which accepts clients, processes their requests, and returns the result to the client.
Suppose I have two clients: client 1's request takes 10 seconds to process and client 2's takes 1 second.
When both clients connect to the server at the same time, how do I run both clients' tasks in parallel and return the response to whichever client finishes first?
I have read that this can be achieved with Python Twisted. I have tried my luck, but I'm unable to do it.
Please help me out with this issue.
Your code (https://trinket.io/python/87fd18ca9e) has many mistakes in terms of async design patterns, but I will only address the most blatant one. There are a few calls to time.sleep(); this is blocking code and causes your program to stop until the sleep function is done running. The number one rule of async programming is: do not use blocking functions! Don't worry, this is a very common mistake, and the Twisted and Python async communities are there to help you :) I'll give you a naive solution for your server:
from twisted.internet.protocol import Factory
from twisted.internet import reactor, protocol, defer, task

def sleep(n):
    return task.deferLater(reactor, n, lambda: None)

class QuoteProtocol(protocol.Protocol):
    def __init__(self, factory):
        self.factory = factory

    def connectionMade(self):
        self.factory.numConnections += 1

    @defer.inlineCallbacks
    def recur_factorial(self, n):
        fact = 1
        print(n)
        for i in range(1, int(n) + 1):
            fact = fact * i
            yield sleep(5)  # async sleep
        defer.returnValue(str(fact).encode())  # bytes, since transport.write expects bytes

    def dataReceived(self, data):
        try:
            number = int(data.decode().strip())  # data arrives as bytes; validate it is an int
        except (UnicodeDecodeError, ValueError):
            self.transport.write(b'Invalid input!')
            return  # "exit" otherwise
        # use Deferreds to write to the client after the calculation is finished
        deferred_factorial = self.recur_factorial(number)
        deferred_factorial.addCallback(self.transport.write)

    def connectionLost(self, reason):
        self.factory.numConnections -= 1

class QuoteFactory(Factory):
    numConnections = 0

    def buildProtocol(self, addr):
        return QuoteProtocol(self)

reactor.listenTCP(8000, QuoteFactory())
reactor.run()
The main differences are in recur_factorial() and dataReceived(). recur_factorial() now uses a Deferred (look up how inlineCallbacks or coroutines work), which allows functions to execute after the result is available. So when data is received, the factorial is calculated and then written back to the client. Finally, there's the new sleep() function, which provides an asynchronous sleep. I hope this helps. Keep reading the Krondo blog.
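To try it out, a plain blocking client is enough, since only the server needs to be asynchronous. A minimal sketch (port 8000 matches the listenTCP call above; for input 5 the reply takes roughly 25 seconds because of the five async sleeps):

import socket

with socket.create_connection(("127.0.0.1", 8000)) as conn:
    conn.sendall(b"5")
    print(conn.recv(1024))  # b'120', once the server finishes its async sleeps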
I use Scrapy for scraping this site.
I want to save all the sub-categories in an array, then get the corresponding pages (pagination).
As a first step I have:
def start_requests(self):
    yield Request(start_urls[i], callback=self.get_sous_cat)
get_sous_cat is a function which gets all the sub-categories of a site, then asynchronously starts jobs to explore the sub-sub-categories recursively.
def get_sous_cat(self, response):
    # Put all the categories in an array
    catList = response.css('div.categoryRefinementsSection')
    if catList:
        for category in catList.css('a::attr(href)').extract():
            category = 'https://www.amazon.fr' + category
            print(category)
            self.arrayCategories.append(category)
            yield Request(category, callback=self.get_sous_cat)
When all the respective requests have been sent, I need to call this termination function:
def pagination(self, response):
    for i in range(0, len(self.arrayCategories)):
        # Do something with each sub-category
I tried this:
def start_requests(self):
    yield Request(start_urls[i], callback=self.get_sous_cat)
    for subCat in range(0, len(self.arrayCategories)):
        yield Request(self.arrayCategories[subCat], callback=self.pagination)
Well done, this is a good question! Two small things:
a) Use a set instead of an array. This way you won't have duplicates.
b) The site structure will change once a month or year, and you will likely crawl more frequently. Break the spider into two: 1. one that creates the list of category URLs and runs monthly, and 2. one that takes the file generated by the first as its start_urls.
Now, if you really want to do it the way you do it now, hook the spider_idle signal (see here: Scrapy: How to manually insert a request from a spider_idle event callback?). This gets called when there are no further URLs to crawl and allows you to inject more. Set a flag or reset your list at that point so that the second time the spider goes idle (after it has crawled everything), it doesn't re-inject the same category URLs forever.
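A rough sketch of that idea, assuming the arrayCategories list and pagination callback from your question (the spider name is hypothetical, and the exact engine.crawl signature differs slightly between Scrapy versions):

import scrapy
from scrapy import signals
from scrapy.exceptions import DontCloseSpider

class CategorySpider(scrapy.Spider):
    name = 'categories'
    injected = False

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.on_idle, signal=signals.spider_idle)
        return spider

    def on_idle(self, spider):
        if not self.injected:
            self.injected = True  # flag so the second idle event doesn't re-inject the same URLs
            for url in self.arrayCategories:
                # older Scrapy: engine.crawl(request, spider); newer versions take only the request
                self.crawler.engine.crawl(scrapy.Request(url, callback=self.pagination), spider)
            raise DontCloseSpider  # keep the spider open so the injected requests get processed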
If, as it seems in your case, you don't want to do any fancy processing on the URLs but just crawl categories before other URLs, this is what the Request priority property is for (http://doc.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-subclasses). Just set it to e.g. 1 for your category URLs, and Scrapy will follow those links before it processes any non-category links. This is more efficient, since it won't load the category pages twice as your current implementation does.
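For the priority route, it is a one-argument change to the request from your get_sous_cat (1 is an arbitrary example value; requests with a higher priority value are scheduled earlier):

yield Request(category, callback=self.get_sous_cat, priority=1)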
This is not "recursion", it's asynchronous jobs. What you need is a global counter (protected by a Lock) that triggers your completion callback when it reaches 0:
from threading import Lock

class JobCounter(object):
    def __init__(self, completion_callback, *args, **kwargs):
        self.c = 0
        self.l = Lock()
        self.completion = (completion_callback, args, kwargs)

    def __iadd__(self, n):
        b = False
        with self.l:
            self.c += n
            if self.c <= 0:
                b = True
        if b:
            f, args, kwargs = self.completion
            f(*args, **kwargs)
        return self  # __iadd__ must return the object so `counter += 1` keeps working

    def __isub__(self, n):
        return self.__iadd__(-n)
Each time you launch a job, do counter += 1.
Each time a job finishes, do counter -= 1.
NOTE: this runs the completion in the thread of the last finishing job. If you want to run it in a particular thread, use a Condition instead of a Lock, and call notify() instead of invoking the callback directly.
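A minimal usage sketch (the job-launching machinery here is illustrative, not from the question):

counter = JobCounter(lambda: print("all jobs done"))

def launch(job):
    global counter
    counter += 1          # bump the count when a job starts
    start_async(job)      # start_async: placeholder for whatever actually launches the job

def on_job_done(job):
    global counter
    counter -= 1          # when the count falls back to 0, the completion callback fires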
Okay, so I'm working on a scheduler, and I was thinking of something like timeOut(3, print, 'hello') that would print hello every three seconds. I have tried some methods but all failed. Using time.sleep for this wouldn't quite work, because I need to run other tasks as well, not just one.
Edit:
I found out how to do what I needed. Sorry for being confusing, but this did the trick for what I needed. Thanks for answering, everyone.
import time

class test:
    def __init__(self):
        self.objectives = set()

    class Objective:
        pass

    def interval(self, timeout, function, *data):
        newObjective = self.Objective()
        newObjective.Class = self
        newObjective.timeout = time.time() + timeout
        newObjective.timer = timeout
        newObjective.function = function
        newObjective.repeate = True
        newObjective.data = data
        self.objectives.add(newObjective)
        return True

    def runObjectives(self):
        timeNow = time.time()
        for objective in list(self.objectives):  # iterate over a copy: removing from the set while iterating raises RuntimeError
            timeout = objective.timer
            if objective.timeout <= timeNow:
                objective.function(*objective.data)
                if objective.repeate:
                    objective.timeout = timeNow + timeout
                else:
                    self.objectives.remove(objective)
                    print('removed')

    def main(self):
        while True:
            self.runObjectives()
The standard library includes a module called sched for scheduling. It can be adapted to work in a variety of environments using the delayfunc constructor parameter. Using it, your code would likely read:
def event():
    scheduler.enter(3, 0, event, ())  # reschedule
    print('hello')
Now it depends on how you run the other tasks. Are you using an event loop? It probably has a similar scheduling mechanism (at least Twisted has callLater and GObject has timeout_add). If all else fails, you can spawn a new thread and run a sched.scheduler with time.sleep there.
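For reference, a self-contained version of that idea (the 3-second interval and the 'hello' payload mirror the question; scheduler.run() blocks, so in a larger program it would live in its own thread as suggested above):

import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def event():
    scheduler.enter(3, 0, event, ())  # reschedule itself in 3 seconds
    print('hello')

scheduler.enter(3, 0, event, ())      # prime the first run
scheduler.run()                       # blocks, running events as they come due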