I am building an Akka Streams application that goes through a couple of steps. One particular step produces 0 or more results; it is not known in advance how many there will be. Each of the results has to be processed asynchronously (by the same kind of component), and finally all the results have to be merged.
How should I model this in Akka Streams? I noticed that the GraphDSL has a Broadcast element that lets you model a fan-out, but this seems to be possible only when the number of outlets is known in advance.
Is there a way in Akka Streams to have something like Broadcast, but one that fans out to a dynamic number of outlets?
Check out Hubs on this page: https://doc.akka.io/docs/akka/current/stream/stream-dynamic.html?language=scala
There are many cases when consumers or producers of a certain service (represented as a Sink, Source, or possibly Flow) are dynamic and not known in advance. The Graph DSL does not allow to represent this, all connections of the graph must be known in advance and must be connected upfront. To allow dynamic fan-in and fan-out streaming, the Hubs should be used.
It turns out that mapConcat does what I want. Here's a POC:
package streams

import scala.concurrent._
import akka._
import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import scala.util.Random

object StreamsTest extends App {
  implicit val system = ActorSystem("TestSystem")
  implicit val materializer = ActorMaterializer()
  import system.dispatcher

  case class SplitRequest(s: String)

  // the step that produces 0..n results per request
  def requestHandlerWithMultipleResults(request: SplitRequest): List[String] =
    request.s.split(" ").toList

  def slowProcessingTask(s: String) = {
    Thread.sleep(Random.nextInt(5000))
    s.toUpperCase
  }

  val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
    import GraphDSL.Implicits._

    val source: Source[String, NotUsed] = Source(List(SplitRequest("january february march april may")))
      .mapConcat(requestHandlerWithMultipleResults)             // fan out to a dynamic number of elements
      .mapAsyncUnordered(5)(s => Future(slowProcessingTask(s))) // process each element asynchronously, 5 in parallel

    val sink = Sink.foreach(println)

    source ~> sink

    ClosedShape
  })

  g.run()
}
Output, e.g.:
MAY
JANUARY
FEBRUARY
MARCH
APRIL
Related
How to publish multiple messages to Pub/Sub fast? Without multiprocessing and multithreading, because the code is already running in a thread.
The code below is publishing 40 messages per second:
publisher = pubsub.PublisherClient(
    credentials=credentials,
    batch_settings=types.BatchSettings(
        max_messages=1000,  # default is 100
        max_bytes=1 * 1000 * 1000,  # 1 MiB
        max_latency=0.1,  # default is 10 ms
    )
)

topic_name = 'projects/{project_id}/topics/{topic}'.format(
    project_id=PROJECT_ID,
    topic=TOPIC_PUBSUB,
)

for data in results:
    bytes_json_data = str.encode(json.dumps(data))
    future = publisher.publish(topic_name, bytes_json_data)
    future.result()
Instead of publishing the messages one at a time and waiting on each future, you should publish them all at once and then wait on the published futures at the end. It will look something like this:
from concurrent import futures
...
publish_futures = []

for data in results:
    bytes_json_data = str.encode(json.dumps(data))
    future = publisher.publish(topic_name, bytes_json_data)
    publish_futures.append(future)
...
futures.wait(publish_futures, return_when=futures.ALL_COMPLETED)
There's a detailed example in the docs with sample code.
Take out:
future.result()
and leave just:
for data in results:
    bytes_json_data = str.encode(json.dumps(data))
    future = publisher.publish(topic_name, bytes_json_data)
It should then take less than a second to publish 10k messages.
I have a Raspberry Pi, connected over Wi-Fi, that runs a people-counting model and sends the processed image to my server over a ZeroMQ socket. On that server I built a web server using Flask. The first person to access the website sees the stream without errors, but every access after the first one fails with:
zmq.error.ZMQError: Address in use
What should I do so that more people can access my server and see the video stream?
The server code:
from flask import Flask, render_template, Response, request, jsonify
from flask_restful import Api
import numpy as np
from api.configs import configs
from flask_cors import CORS
import zmq
import cv2
import base64

app = Flask(__name__, static_url_path='/static')
api_restful = Api(app)
cors = CORS(app)

def gen():
    context = zmq.Context()
    footage_socket = context.socket(zmq.SUB)
    footage_socket.bind('tcp://*:{}'.format(configs.SOCKET_PORT))
    footage_socket.setsockopt_string(zmq.SUBSCRIBE, np.unicode(''))
    while True:
        frame = footage_socket.recv()
        npimg = np.fromstring(frame, dtype=np.uint8)
        source = cv2.imdecode(npimg, 1)
        s = cv2.imencode('.jpg', source)[1].tobytes()
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + s + b'\r\n')

@app.route('/video_feed')
def video_feed():
    return Response(gen(),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(debug=True,
            threaded=True,
            port=configs.PORT,
            host=configs.HOST)
The Raspberry Pi client code:
import time
import socket
import logging as log
import numpy as np
from api.configs import configs
import zmq
import cv2

context = zmq.Context()
footage_socket = context.socket(zmq.PUB)
footage_socket.connect('tcp://{}:{}'.format(configs.SERVER_HOST,
                                            configs.SOCKET_PORT))
time.sleep(1)

cap = cv2.VideoCapture(0)
while True:
    while cap.isOpened():
        ret, frame = cap.read()
        frame = cv2.resize(frame, (640, 480))
        encoded, buffer_ = cv2.imencode('.jpg', frame)
        footage_socket.send(buffer_)
I omitted some code to make it easier to read.
I cannot speak about the details hidden inside the Flask() web-served video re-wrapping/delivery, yet the ZeroMQ part seems to be the trouble, so:
a) it seems that every web watcher spawns another @app.route('/video_feed')-decorated Flask() handler (check the mode: whether it is thread-based or process-based matters for whether a central Context() instance can or cannot be shared and efficiently re-used).
b) the code residing inside the hooked Response() feeder attempts to .bind() for each subsequently served user, which for obvious reasons collides: the resource (the address:port) has already been acquired for exclusive use by the first-come, first-served visitor, so every later one must crash, and does, as documented above.
PROFESSIONAL SOLUTION ( or just a quick-but-dirty-fix ? ):
For indeed small-scale use cases, it is enough to reverse the .bind()/.connect() roles, i.e. the PUB will .bind() (as it does not replicate itself and can serve any small count of further SUBs, arriving via .connect()).
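A minimal sketch of that reversal, keeping the rest of the question's code unchanged; only the lines that move are shown, and configs.RPI_HOST is a hypothetical name for the Raspberry Pi's address (it is not present in the question's configs):
import zmq
from api.configs import configs

context = zmq.Context()

# --- Raspberry Pi side: the PUB now owns the address and .bind()s once ---
footage_socket = context.socket(zmq.PUB)
footage_socket.bind('tcp://*:{}'.format(configs.SOCKET_PORT))
# ... the capture loop stays as it was: footage_socket.send(buffer_)

# --- Flask side, inside gen(): every watcher only .connect()s, so nothing collides ---
footage_socket = context.socket(zmq.SUB)
footage_socket.setsockopt_string(zmq.SUBSCRIBE, '')
footage_socket.connect('tcp://{}:{}'.format(configs.RPI_HOST,      # hypothetical: the Pi's address
                                            configs.SOCKET_PORT))
# ... the recv()/yield loop stays as it was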
However, this quick fix is a bit dirty and should never be used in a production-grade system; the workload envelope will soon grow above reasonable levels. A proper architecture should work in a different manner:
- instantiate one central zmq.Context( configs.nIOthreads ) instance (it can be shared and re-used inside the called methods), which also gives you one central place to scale nIOthreads for performance reasons;
- instantiate one central SUB instance that communicates with the Raspberry Pi and collects the video frames, so the image feed is never duplicated;
- configure both sides to use .setsockopt( zmq.CONFLATE ), to principally avoid any re-broadcasting of a video "History of No Value";
- make all the @app-decorated Flask() tools re-use the central Context() instance and connect internally to a central video re-PUB-lisher, using the SUB side of another PUB/SUB archetype, here over an efficient memory-mapped inproc:// transport, again with .setsockopt( zmq.CONFLATE );
- configure all resources for higher robustness, at least with an explicit .setsockopt( zmq.LINGER, 0 );
- detect and handle all error states, as the ZeroMQ API documents them well enough to help you diagnose and manage any failed state professionally and safely.
A sketch of such a layout is shown below.
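The sketch below is only an illustration of that layout, under these assumptions: Flask runs threaded inside a single process (so an inproc:// transport is usable), frames arrive as single-part JPEG buffers as in the question, and configs.SOCKET_PORT comes from the question, while configs.nIOthreads is an assumed, optional setting taken from the list above:
import threading
import zmq
from flask import Flask, Response
from api.configs import configs

app = Flask(__name__, static_url_path='/static')

# one central Context, shared by the collector and every request handler
context = zmq.Context(getattr(configs, 'nIOthreads', 2))   # nIOthreads is an assumed, optional setting

INPROC_ADDR = 'inproc://frames'   # internal re-publishing channel

def collector():
    # the one central SUB that talks to the Raspberry Pi
    sub = context.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b'')
    sub.setsockopt(zmq.CONFLATE, 1)   # keep only the freshest frame, drop stale history
    sub.setsockopt(zmq.LINGER, 0)
    sub.bind('tcp://*:{}'.format(configs.SOCKET_PORT))

    # the central re-PUB-lisher all web watchers subscribe to
    pub = context.socket(zmq.PUB)
    pub.setsockopt(zmq.LINGER, 0)
    pub.bind(INPROC_ADDR)

    while True:
        pub.send(sub.recv())   # single-part frames pass straight through

threading.Thread(target=collector, daemon=True).start()

def gen():
    # each watcher gets its own lightweight SUB over inproc://, so no tcp .bind() collisions
    sub = context.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b'')
    sub.setsockopt(zmq.CONFLATE, 1)
    sub.setsockopt(zmq.LINGER, 0)
    sub.connect(INPROC_ADDR)
    while True:
        frame = sub.recv()
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

@app.route('/video_feed')
def video_feed():
    return Response(gen(), mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(threaded=True, port=configs.PORT, host=configs.HOST)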
I wrote two types of greenlets. MyGreenletPUB publishes messages via ZMQ with message type 1 and message type 2.
MyGreenletSUB instances subscribe to the ZMQ PUB, based on a parameter ("1" or "2").
The problem is that when I start my greenlets, the run method in MyGreenletSUB stops on message = sock.recv() and never returns run-time execution to the other greenlets.
My question is: how can I avoid this and start my greenlets asynchronously with a while True loop, without using gevent.sleep() in the loops to switch execution between greenlets?
from gevent.monkey import patch_all
patch_all()

import zmq
import time
import gevent
from gevent import Greenlet


class MyGreenletPUB(Greenlet):
    def _run(self):
        # ZeroMQ Context
        context = zmq.Context()

        # Define the socket using the "Context"
        sock = context.socket(zmq.PUB)
        sock.bind("tcp://127.0.0.1:5680")

        id = 0
        while True:
            gevent.sleep(1)
            id, now = id + 1, time.ctime()

            # Message [prefix][message]
            message = "1#{id} {time}".format(id=id, time=now)
            sock.send(message)

            # Message [prefix][message]
            message = "2#{id} {time}".format(id=id, time=now)
            sock.send(message)

            id += 1


class MyGreenletSUB(Greenlet):
    def __init__(self, b):
        Greenlet.__init__(self)
        self.b = b

    def _run(self):
        context = zmq.Context()

        # Define the socket using the "Context"
        sock = context.socket(zmq.SUB)

        # Define subscription and messages with prefix to accept.
        sock.setsockopt(zmq.SUBSCRIBE, self.b)
        sock.connect("tcp://127.0.0.1:5680")

        while True:
            message = sock.recv()
            print message


g = MyGreenletPUB.spawn()
g2 = MyGreenletSUB.spawn("1")
g3 = MyGreenletSUB.spawn("2")

try:
    gevent.joinall([g, g2, g3])
except KeyboardInterrupt:
    print "Exiting"
The default modus operandi of the ZeroMQ .recv() method is to block until something has arrived that can be handed over to the .recv() caller.
For genuinely smart, non-blocking agents, always prefer the .poll() instance method and .recv( zmq.NOBLOCK ).
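A minimal sketch of that pattern, applied to the subscriber loop from the question (plain pyzmq calls only; note that with vanilla pyzmq under gevent you still need some explicit yield, here a zero-length sleep, unless you switch to the zmq.green compatibility layer):
import zmq
import gevent

def sub_loop(prefix, endpoint="tcp://127.0.0.1:5680"):
    context = zmq.Context()
    sock = context.socket(zmq.SUB)
    sock.setsockopt(zmq.SUBSCRIBE, prefix)   # prefix as bytes, e.g. b"1"
    sock.connect(endpoint)

    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)

    while True:
        # poll() only reports what is ready within the timeout instead of
        # parking the whole greenlet inside a blocking .recv()
        events = dict(poller.poll(timeout=10))   # timeout in milliseconds
        if events.get(sock) == zmq.POLLIN:
            message = sock.recv(zmq.NOBLOCK)     # guaranteed not to block here
            print(message)
        else:
            gevent.sleep(0)                      # hand control back to other greenlets
You would spawn it e.g. as gevent.spawn(sub_loop, b"1").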
Beware that ZeroMQ subscriptions are based on topic-filter matching from the left, and you may run into issues if unicode and non-unicode strings are being distributed / collected at the same time.
Also, mixing several event loops might become a bit tricky, depending on your control needs. I personally always prefer non-blocking systems, even at the cost of more complex design efforts.
I am trying to build an Akka-based system which will periodically (every 15 seconds) send a REST request, do some filtering, some data cleansing and validation on the received data, and save it into HDFS.
Below is the code that I wrote.
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse, StatusCodes}
import akka.actor.Props
import akka.event.Logging
import akka.actor.Actor
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Try
import akka.http.scaladsl.client.RequestBuilding._

import java.util.concurrent.TimeUnit              // used by Source.tick below
import scala.concurrent.duration.FiniteDuration   // used by Source.tick below

/**
 * Created by rabbanerjee on 4/6/2017.
 */
class MyActor extends Actor {
  val log = Logging(context.system, this)
  import scala.concurrent.ExecutionContext.Implicits.global

  def receive = {
    case j: HttpResponse => log.info("received" + j)
    case k: AnyRef       => log.info("received unknown message" + k)
  }
}

object STest extends App {
  implicit val system = ActorSystem("Sys")
  import system.dispatcher
  implicit val materializer = ActorMaterializer()

  val ss = system.actorOf(Props[MyActor])

  val httpClient = Http().outgoingConnection(host = "rest_server.com", port = 8080)

  val filterSuccess = Flow[HttpResponse].filter(_.status.isSuccess())

  val runnnn = Source.tick(
      FiniteDuration(1, TimeUnit.SECONDS),
      FiniteDuration(15, TimeUnit.SECONDS),
      Get("/"))
    .via(httpClient)
    .via(filterSuccess)
    .to(Sink.actorRef(ss, onCompleteMessage = "done"))

  runnnn.run()
}
The problems I am currently facing are:
Even though I used a repeating tick source, I see the result only once; it is not firing the request repeatedly.
I am also trying to group the results of, say, 50 such requests, because I will be writing them to Hadoop and I can't write every single response, as that would flood HDFS with many small files.
You are not consuming the responses you get back from the HTTP call. It is compulsory to consume the entity bytes returned by Akka HTTP, even if you are not interested in them.
More about this can be found in the docs.
In your example, since you are not using the response entity, you can simply discard its bytes. See the example below:
val runnnn = Source.tick(
    FiniteDuration(1, TimeUnit.SECONDS),
    FiniteDuration(15, TimeUnit.SECONDS),
    Get("/"))
  .via(httpClient)
  .map { resp => resp.discardEntityBytes(); resp }
  .via(filterSuccess)
  .to(Sink.actorRef(ss, onCompleteMessage = "done"))
I run ownCloud on my webspace for a shared calendar. Now I'm looking for a suitable Python library to get read-only access to the calendar, so I can put some of its information on an intranet website.
I have tried http://trac.calendarserver.org/wiki/CalDAVClientLibrary, but it always returns a NotImplementedError on the query command, so my guess is that query doesn't work well with that library.
What library could I use instead?
I recommend the caldav library.
Read-only access works really well with it and looks straightforward to me. It does the whole job of getting calendars and reading events, returning them in the iCalendar format. More information about the caldav library can be found in its documentation.
import caldav

client = caldav.DAVClient(<caldav-url>, username=<username>,
                          password=<password>)
principal = client.principal()
for calendar in principal.calendars():
    for event in calendar.events():
        ical_text = event.data
From there you can use the icalendar library to read specific fields such as the type (e.g. event, todo, alarm), name, times, etc.; a good starting point may be this question.
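For instance, a minimal sketch of that second step (assuming the icalendar package is installed; ical_text is the string from the loop above):
from icalendar import Calendar

cal = Calendar.from_ical(ical_text)
for component in cal.walk():
    if component.name == "VEVENT":            # component type: VEVENT, VTODO, VALARM, ...
        print(component.get("summary"))       # event name
        print(component.get("dtstart").dt)    # start as a datetime/date object
        print(component.get("dtend").dt)      # end as a datetime/date object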
I wrote this code a few months ago to fetch data from CalDAV and present it on my website.
I converted the data into JSON format, but you can do whatever you want with it.
I have added some prints so you can see the output; you can remove them in production.
from datetime import datetime
import json
from pytz import UTC # timezone
import caldav
from icalendar import Calendar, Event

# CalDAV info
url = "YOUR CALDAV URL"
userN = "YOUR CALDAV USERNAME"
passW = "YOUR CALDAV PASSWORD"

client = caldav.DAVClient(url=url, username=userN, password=passW)
principal = client.principal()
calendars = principal.calendars()

if len(calendars) > 0:
    calendar = calendars[0]
    print("Using calendar", calendar)

    results = calendar.events()

    eventSummary = []
    eventDescription = []
    eventDateStart = []
    eventdateEnd = []
    eventTimeStart = []
    eventTimeEnd = []

    for eventraw in results:
        event = Calendar.from_ical(eventraw._data)
        for component in event.walk():
            if component.name == "VEVENT":
                print(component.get('summary'))
                eventSummary.append(component.get('summary'))

                print(component.get('description'))
                eventDescription.append(component.get('description'))

                startDate = component.get('dtstart')
                print(startDate.dt.strftime('%m/%d/%Y %H:%M'))
                eventDateStart.append(startDate.dt.strftime('%m/%d/%Y'))
                eventTimeStart.append(startDate.dt.strftime('%H:%M'))

                endDate = component.get('dtend')
                print(endDate.dt.strftime('%m/%d/%Y %H:%M'))
                eventdateEnd.append(endDate.dt.strftime('%m/%d/%Y'))
                eventTimeEnd.append(endDate.dt.strftime('%H:%M'))

                dateStamp = component.get('dtstamp')
                print(dateStamp.dt.strftime('%m/%d/%Y %H:%M'))
                print('')

# Modify or change these values based on your CalDAV
# Converting to JSON
data = [{'Events Summary': eventSummary[0],
         'Event Description': eventDescription[0],
         'Event Start date': eventDateStart[0],
         'Event End date': eventdateEnd[0],
         'At:': eventTimeStart[0],
         'Until': eventTimeEnd[0]}]
data_string = json.dumps(data)
print('JSON:', data_string)
pyOwnCloud could be the right thing for you. I haven't tried it, but it should provide a command-line tool/API for reading the calendars.
You probably want to provide more details about how you are actually making use of the API, but in case the query command is indeed not implemented, there is a list of other Python libraries on the CalConnect website (archived version; the original link is dead now).