Simulate a real UDP application and measure traffic load on OMNeT++ - c++

So I am experimenting with OMNeT++: I have created a data center with a Fat Tree topology, and now I want to observe how a UDP application behaves under conditions that are as realistic as possible. I used the INET Framework and implemented the VideoStream client-server application.
So my question is this:
My network seems to work perfectly, and that is exactly the problem. I want to measure the datarate received by the client and compare it to the datarate broadcast by the server. But even though I put a lot of traffic into the network (several UDP and TCP applications), the datarate received is exactly the same as the datarate broadcast, whereas in real conditions I would expect the traffic load to vary in such a highly dynamic environment. So how can I set up the right conditions in OMNeT++ to simulate a real communication (maybe with packet losses, queuing time, etc.) so I can measure these traffic loads?
Here is the .ini file used:
[Config UDPStreamMultiple]
**.Pod[2].racks[1].servers[1].vms[2].numUdpApps = 1
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].typename = "UDPVideoStreamSvr"
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].localPort = 1000
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].sendInterval = 1s
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].packetLen = 20480B
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].videoSize = 512000B
**.Pod[3].racks[0].servers[0].vms[0].numUdpApps = 1
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].typename = "UDPVideoStreamSvr"
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].localPort = 1000
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].sendInterval = 1s
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].packetLen = 2048B
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].videoSize = 51200B
**.Pod[0].racks[0].servers[0].vms[0].numUdpApps = 1
**.Pod[0].racks[0].servers[0].vms[0].udpApp[0].typename = "UDPVideoStreamCli"
**.Pod[0].racks[0].servers[0].vms[0].udpApp[0].serverAddress = "20.0.0.47"
**.Pod[0].racks[0].servers[0].vms[0].udpApp[0].serverPort = 1000
**.Pod[1].racks[0].servers[0].vms[1].numUdpApps = 1
**.Pod[1].racks[0].servers[0].vms[1].udpApp[0].typename = "UDPVideoStreamCli"
**.Pod[1].racks[0].servers[0].vms[1].udpApp[0].serverAddress = "20.0.0.49"
**.Pod[1].racks[0].servers[0].vms[1].udpApp[0].serverPort = 1000
**.Pod[2].racks[0].servers[0].vms[1].numUdpApps = 1
**.Pod[2].racks[0].servers[0].vms[1].udpApp[0].typename = "UDPVideoStreamCli"
**.Pod[2].racks[0].servers[0].vms[1].udpApp[0].serverAddress = "20.0.0.49"
**.Pod[2].racks[0].servers[0].vms[1].udpApp[0].serverPort = 1000
**.Pod[2].racks[1].servers[0].vms[1].numUdpApps = 1
**.Pod[2].racks[1].servers[0].vms[1].udpApp[0].typename = "UDPVideoStreamCli"
**.Pod[2].racks[1].servers[0].vms[1].udpApp[0].serverAddress = "20.0.0.49"
**.Pod[2].racks[1].servers[0].vms[1].udpApp[0].serverPort = 1000
Thanks in advance.

So after a lot of research, thinking and tests I managed to achieve my desired goal. What I did is this:
Because the whole topology of the data center is built to reduce the load on the routers and balance the traffic, I created a smaller network in order to take the necessary measurements: just three simple nodes (StandardHost) from the INET Framework, and a simple UDP application that goes from node A to node B through midHost (i.e. nodeA ---> midHost ---> nodeB). In the .ini file there should be a couple of lines like:
**.ppp[*].queueType = "DropTailQueue"
**.ppp[*].queue.frameCapacity = 50
**.ppp[*].numOutputHooks = 1
**.ppp[*].outputHook[*].typename = "ThruputMeter"
These lines configure the links between the nodes and can be adapted to one's needs (adjust the frame capacity or the queue type, for example). By building such a small network you can easily customize it and take the metrics you need. I hope this helps anyone who wants to do the same thing get a feel for how to do it.
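To actually see queuing delay and drops, it also helps to push more traffic through the bottleneck than the queue can absorb. Below is a sketch only: the module path (bgHost), addresses, ports, and rates are placeholders, and the parameter names assume the classic INET UDPBasicApp, so check them against your INET version:

```ini
# hypothetical background traffic source flooding the path through midHost
**.bgHost.numUdpApps = 1
**.bgHost.udpApp[0].typename = "UDPBasicApp"
**.bgHost.udpApp[0].localPort = 2000
**.bgHost.udpApp[0].destAddresses = "nodeB"
**.bgHost.udpApp[0].destPort = 2000
**.bgHost.udpApp[0].messageLength = 1000B
**.bgHost.udpApp[0].sendInterval = exponential(0.2ms)

# bounded queues so congestion shows up as drops and queuing time
**.ppp[*].queueType = "DropTailQueue"
**.ppp[*].queue.frameCapacity = 50
```

With the queue capacity bounded, the ThruputMeter hooks should then report a received datarate that diverges from the sending rate once the bottleneck saturates.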

Related

rtsp-sink (RTSPStreamer) DirectShow.Net filter-based RTSP server causes high latency in the whole network and even slows the network speed by 90%

Ref: I have an RTSP media server that sends video via TCP streaming. I am using an rtsp-sink (RTSPStreamer) DirectShow.Net filter-based RTSP server developed in C++, whereas the wrapper app is developed in C#.
The problem I am facing is that the moment the RTSP server starts streaming, it affects the system-level internet connection and drops the internet connection speed by 90 percent.
I wanted to get your input on how this would be possible (if at all), because it impacts the system-level internet connection, not just the app itself.
For example: my normal internet connection speed is 25 Mbps. It suddenly drops to 2 Mbps whenever RTSP streaming is started in the app's server tab.
Sometimes it even disables the internet connection on the system (computer) where the app is running.
I'm asking you because I consider you an expert, so please bear with me on this (maybe wild) question, and thanks ahead.
Code snippet from RTSPSender.cpp:
//////////////////////////////////////////////////////
// CStreamingServer
//////////////////////////////////////////////////////
UsageEnvironment* CStreamingServer::s_pUsageEnvironment = NULL;
CHAR CStreamingServer::s_szDefaultBroadCastIP[] = "239.255.42.42";
//////////////////////////////////////////////////////
CStreamingServer::CStreamingServer(HANDLE hQuit)
    : BasicTaskScheduler(10000)
    , m_hQuit(hQuit)
    , m_Streams(NAME("Streams"))
    , m_bStarting(FALSE)
    , m_bSessionReady(FALSE)
    , m_pszURL(NULL)
    , m_rtBufferingTime(UNITS * 2)
{
    s_pUsageEnvironment = BasicUsageEnvironment::createNew(*this);
    rtspServer = NULL;
    strcpy_s(m_szAddress, "");
    strcpy_s(m_szStreamName, "stream");
    strcpy_s(m_szInfo, "media");
    strcpy_s(m_szDescription, "Session streamed by \"RTSP Streamer DirectShow Filter\"");
    m_nRTPPort = 6666;
    m_nRTSPPort = 8554;
    m_nTTL = 1;
    m_bIsSSM = FALSE;
}
Edit: Wireshark logs captured at the time RTSP streaming started.

Jetty's HTTP/2 client slow?

Doing simple GET requests over a high-speed internet connection (within an AWS EC2 instance), the throughput seems to be really low. I've tried this with multiple servers.
Here's the code that I'm using:
HTTP2Client http2Client = new HTTP2Client();
http2Client.start();
SslContextFactory ssl = new SslContextFactory(true);
HttpClient client = new HttpClient(new HttpClientTransportOverHTTP2(http2Client), ssl);
client.start();
long start = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
    System.out.println("i" + i);
    ContentResponse response = client.GET("https://http2.golang.org/");
    System.out.println(response.getStatus());
}
System.out.println(System.currentTimeMillis() - start);
The throughput I get is about 8 requests per second.
This seems to be pretty low (as compared to curl on the command line).
Is there anything that I'm missing? Any way to turbo-charge this?
EDIT: How do I get Jetty to use multiple streams?
That is not the right way to measure throughput.
Depending on where you are geographically, you will be dominated by the latency between your client and the server. If the latency is 125 ms, you can only make 8 requests/s with a single request in flight.
For example, from my location, the ping to http2.golang.org is 136 ms.
Even if your latency is less, there is a good chance that the server is throttling you: I don't think http2.golang.org will be happy to see you making 10k requests in a tight loop.
I'd be curious to know what the curl or nghttp latency is in the same test, but I guess it won't be much different (or probably worse, if they close the connection after each request).
This test in the Jetty test suite, which is not a proper benchmark either, shows around 500 requests/s on my laptop; without TLS, it goes to around 2500 requests/s.
I don't know exactly what you're trying to do, but your test does not tell you anything about the performance of Jetty's HttpClient.
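The arithmetic behind that answer is worth spelling out: with exactly one request in flight on one connection, throughput is capped at 1/RTT, and HTTP/2 multiplexing raises that cap roughly in proportion to the number of concurrent streams. A minimal sketch (the 125 ms figure is the example from above; the 64-stream count is just an illustration, not anything from the original question):

```java
public class ThroughputMath {

    // Max requests/s with one request in flight: each request costs
    // at least one round trip, so throughput is capped at 1/RTT.
    static double serialRps(double rttSeconds) {
        return 1.0 / rttSeconds;
    }

    // With N requests multiplexed on the connection (HTTP/2 streams),
    // the cap scales up to N-fold, bandwidth and server permitting.
    static double concurrentRps(double rttSeconds, int streams) {
        return streams / rttSeconds;
    }

    public static void main(String[] args) {
        System.out.println(serialRps(0.125));          // 8.0 requests/s at 125 ms RTT
        System.out.println(concurrentRps(0.125, 64));  // 512.0 requests/s with 64 streams
    }
}
```

This is why the serial loop in the question measures latency, not the client's throughput ceiling.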

Improving Akka Remote Throughput

We're thinking about using Akka for client-server communications, and trying to benchmark data transfer. Currently we're trying to send a million messages, where each message is a case class with 8 string fields.
At this point we're struggling to get acceptable performance. We see about a 600 KB/s transfer rate and idle CPUs on client and server, so something is going wrong. Maybe it's our Netty configuration.
This is our Akka config:
Server {
  akka {
    extensions = ["akka.contrib.pattern.ClusterReceptionistExtension"]
    loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = "DEBUG"
    logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
    log-dead-letters = 10
    log-dead-letters-during-shutdown = on
    actor {
      provider = "akka.cluster.ClusterActorRefProvider"
    }
    remote {
      enabled-transports = ["akka.remote.netty.tcp"]
      netty.tcp {
        hostname = "instance.myserver.com"
        port = 2553
        maximum-frame-size = 1000 MiB
        send-buffer-size = 2000 MiB
        receive-buffer-size = 2000 MiB
      }
    }
    cluster {
      seed-nodes = ["akka.tcp://server#instance.myserver.com:2553"]
      roles = [master]
    }
    contrib.cluster.receptionist {
      name = receptionist
      role = "master"
      number-of-contacts = 3
      response-tunnel-receive-timeout = 300s
    }
  }
}
Client {
  akka {
    loggers = ["akka.event.slf4j.Slf4jLogger"]
    loglevel = "DEBUG"
    logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
    actor {
      provider = "akka.remote.RemoteActorRefProvider"
    }
    remote {
      enabled-transports = ["akka.remote.netty.tcp"]
      netty.tcp {
        hostname = "127.0.0.1"
        port = 0
        maximum-frame-size = 1000 MiB
        send-buffer-size = 2000 MiB
        receive-buffer-size = 2000 MiB
      }
    }
    cluster-client {
      initial-contacts = ["akka.tcp://server#instance.myserver.com:2553/user/receptionist"]
      establishing-get-contacts-interval = "10s"
      refresh-contacts-interval = "10s"
    }
  }
}
UPDATE:
In the end, notwithstanding the discussion about serialisation (see below), we just converted our payloads to byte arrays so that serialisation would not affect the tests. We found that on a Core i7 with 24 GB of RAM, using JeroMQ (i.e. ZeroMQ re-implemented in Java, so still not the fastest) we consistently saw about 200k msgs/sec, or about 20 MB/sec, whilst with raw Akka (i.e. without the ZeroMQ plugin) we saw about 10k msgs/sec, or just under 1 MB/sec. Trying Akka + ZeroMQ made the performance worse.
To make it easy to get started with remoting, Akka uses Java serialization for messages by default. This is not what you'd typically use in production, because it's not very fast and does not handle message versioning well.
What you should do is use Kryo or Protobuf for serialization; then you should be able to get much better numbers.
You can read about how to do this here (there are also a few links to available serializers at the bottom of the page): http://doc.akka.io/docs/akka/current/scala/serialization.html
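As an illustration only (the serializer and message class names below are hypothetical, not from the original setup), wiring a custom serializer into the Akka config looks roughly like this:

```hocon
akka {
  actor {
    serializers {
      # hypothetical Kryo-based serializer implementation class
      kryo = "com.example.serialization.KryoSerializer"
    }
    serialization-bindings {
      # hypothetical message type, bound to the serializer named above
      "com.example.messages.DataMessage" = kryo
    }
  }
}
```

Every message type that goes over the wire needs a binding (a binding on a common supertype covers its subtypes).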
OK, here's what we discovered:
Using our client-server model we were able to get fewer than about 3000 msg/sec without doing anything, which was not really acceptable to us, but we were really confused by what was happening because we weren't able to max out the CPU.
Therefore we went back to the Akka source code and found a benchmarking sample there:
sample.remote.benchmark.Receiver
sample.remote.benchmark.Sender
These are two classes that use Akka remote to ping-pong a bunch of messages between two JVMs on the same machine. Using this benchmark we were able to get around 10-15k msg/sec on a Core i7 with 24 GB, using about 50% CPU. We found that adjusting the dispatchers and the threads allocated to Netty made a difference, but only marginally. Using asks instead of tells made it a bit slower, but not by much. Using a balancing pool of many actors to send and receive made it considerably slower.
For comparison, we did a similar test using JeroMQ and managed to get about 100k msg/sec using 100% CPU (same payload, same machine).
We also hacked the Sender and Receiver to use the Akka ZeroMQ extension to pipe messages directly from one JVM to another, bypassing Akka remote altogether. In this test we found that we could get fast sends and receives to start with, approx. 100k msg/sec, but the performance quickly degraded. On further inspection, the ZeroMQ threads and the Akka actor switching do not play nicely together. We probably could have tuned the performance somewhat by being smarter about how ZeroMQ and Akka interacted with each other, but at that point we decided it would be better to go with raw ZeroMQ.
Conclusion: Don't use akka if you care about fast serialisation of lots of data over the wire.

Akka durable mailbox results in dead letters

First, I am not an Akka newbie (I've been using it for 2+ years). I have a high-throughput (many millions of msgs/min/node) application that does heavy network I/O. An initial actor (backed by a RandomRouter) receives messages and distributes them to the appropriate child actor for processing:
private val distributionRouter = system.actorOf(Props(new DistributionActor)
  .withDispatcher("distributor-dispatcher")
  .withRouter(RandomRouter(distrib.distributionActors)), "distributionActor")
The application has been highly tuned and has excellent performance. I'd like to make it more fault-tolerant by using a durable mailbox in front of DistributionActor. Here's the relevant config (the only change is the addition of the file-based mailbox):
akka.actor.mailbox.file-based {
  directory-path = "./.akka_mb"
  max-items = 2147483647
  # attempting to add an item after the queue reaches this size (in bytes) will fail.
  max-size = 2147483647 bytes
  # attempting to add an item larger than this size (in bytes) will fail.
  max-item-size = 2147483647 bytes
  # maximum expiration time for this queue (seconds).
  max-age = 3600s
  # maximum journal size before the journal should be rotated.
  max-journal-size = 16 MiB
  # maximum size of a queue before it drops into read-behind mode.
  max-memory-size = 128 MiB
  # maximum overflow (multiplier) of a journal file before we re-create it.
  max-journal-overflow = 10
  # absolute maximum size of a journal file until we rebuild it, no matter what.
  max-journal-size-absolute = 9223372036854775807 bytes
  # whether to drop older items (instead of newer) when the queue is full
  discard-old-when-full = on
  # whether to keep a journal file at all
  keep-journal = on
  # whether to sync the journal after each transaction
  sync-journal = off
  # circuit breaker configuration
  circuit-breaker {
    # maximum number of failures before opening breaker
    max-failures = 3
    # duration of time beyond which a call is assumed to be timed out and considered a failure
    call-timeout = 3 seconds
    # duration of time to wait until attempting to reset the breaker during which all calls fail-fast
    reset-timeout = 30 seconds
  }
}
distributor-dispatcher {
  executor = "thread-pool-executor"
  type = Dispatcher
  thread-pool-executor {
    core-pool-size-min = 20
    core-pool-size-max = 20
    max-pool-size-min = 20
  }
  throughput = 100
  mailbox-type = akka.actor.mailbox.FileBasedMailboxType
}
As soon as I introduced this, I noticed lots of dropped messages. When I profile it via the Typesafe Console, I see a bunch of dead letters (around 100k per 1M). My mailbox files are only 12 MB per actor, so they're not even close to the limit. I also set up a dead-letter listener to count the dead letters so I could run it outside the profiler (maybe an instrumentation issue, I thought?). Same results.
Any idea what may be causing the dead letters?
I am using Akka 2.0.4 on Scala 2.9.2.
Update:
I have noticed that the dead letters seem to have been bound for a couple of child actors owned by DistributionActor. I don't understand why changing the parent's mailbox has any effect on this, but this is definitely the behavior.

pgpool2 load balancing not working

I have a Python/Django app that will require database load balancing at some point in the near future. In the meantime I'm trying to learn to implement pgpool on a local virtual machine setup.
I have 4 Ubuntu 12.04 VMs:
192.168.1.80 <- pool, pgpool2 installed and accessible
192.168.1.81 <- db1 master
192.168.1.82 <- db2 slave
192.168.1.83 <- db3 slave
I have pgpool-II version 3.1.1, and my database servers are running PostgreSQL 9.1.
I have my app's db connection pointed to 192.168.1.80:9999 and it works fine.
The problem is that when I use Apache ab to throw some load at it, none of the SELECT queries appear to be balanced; all the load goes to my db1 master. Also quite concerning is the load on the pool server itself: it is really high compared to db1, maybe an average of 8-10 times higher. Meanwhile my db2 and db3 servers have a load of nearly zero; they appear to only be replicating from db1, which isn't very load-intensive for my tests with ab.
ab -n 300 -c 4 -C 'sessionid=80a5fd3b6bb59051515e734326735f80' http://192.168.1.17:8000/contacts/
That drives the load on my pool server up to about 2.3. Load on db1 is about 0.4, and load on db2 and db3 is nearly zero.
Can someone take a look at my config and see what I'm doing wrong?
backend_hostname0 = '192.168.1.81'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.1/main'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.1.82'
backend_port1 = 5433
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.1/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = '192.168.1.83'
backend_port2 = 5434
backend_weight2 = 1
backend_data_directory2 = '/var/lib/postgresql/9.1/main'
backend_flag2 = 'ALLOW_TO_FAILOVER'
load_balance_mode = on
My entire config is here:
http://pastebin.com/raw.php?i=wzBc0aSp
I needed
replication_mode = off
master_slave_mode = on
Thanks to Tatsuo Ishii:
http://www.pgpool.net/pipermail/pgpool-general/2013-January/001309.html
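Put together, the load-balancing-related part of pgpool.conf for this kind of setup ends up looking roughly like the sketch below. Note that master_slave_sub_mode = 'stream' is my assumption for a PostgreSQL 9.1 streaming-replication cluster, not something stated above; verify it against your pgpool version:

```ini
load_balance_mode = on
# pgpool must not do its own statement replication here...
replication_mode = off
# ...instead it should treat the backends as a master plus read-only standbys
master_slave_mode = on
# tell pgpool the standbys are kept in sync by PostgreSQL streaming replication
master_slave_sub_mode = 'stream'
```

With replication_mode on, pgpool sends writes to every backend itself, which is why SELECT load balancing across a streaming-replication cluster requires master/slave mode instead.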