gatling stress testing on AWS, threads staying in active - amazon-web-services

I have set up two AMI(redhat) instances, one with an ActiveMQ 5.11 broker, the other with some stress testing tools (gatling w/ Mqtt, mqtt-stres, etc..). I stopped iptables service (firewall) on both instances. AWS security groups allow port 1883 for mqtt messaging.
Running mqtt-stress.py succeeds, it makes the round trip, and I get a message back from the broker. When I spin up a Gatling simulation, all the threads move to the 'Active' state and never make it to 'Done'. I even setup a broker on the same machine as Gatlin and pointed the Simulation to 'localhost', same result. Nothing appears in the simulation.log. tcpdump on the original server was showing that no incoming connections were being made. Here is the simulation:
import io.gatling.core.Predef._
import org.fusesource.mqtt.client.QoS
import scala.concurrent.duration._
import com.github.mnogu.gatling.mqtt.Predef._
class BasicMqttSimulation extends Simulation {
val mqttConf = mqtt
// MQTT broker
//.host("tcp://127.0.0.1:1883")
.host("tcp://175.10.130.10:1883")
val scn = scenario("MQTT Test")
.exec(mqtt("request")
// topic: "foo"
// payload: "Hello"
// QoS: AT_LEAST_ONCE
// retain: false
.publish("foo", "Hello", QoS.AT_LEAST_ONCE, retain = false))
setUp(
scn
.inject(constantUsersPerSec(10) during(10 seconds)))
.protocols(mqttConf)
}
Any tips on how to trouble shoot this further?

Rebuilt the instances and tested step by step,
sudo setenforce 0 # did not survive a reboot
Need to add rules for SElinux.

Related

Kafka Multi broker setup with ec2 machine: Timed out waiting for a node assignment. Call: createTopics

I am trying to setup kafka with 3 broker nodes and 1 zookeeper node in AWS EC2 instances. I have following server.properties for every broker:
kafka-1:
broker.id=0
listeners=PLAINTEXT_1://ec2-**-***-**-17.eu-central-1.compute.amazonaws.com:9092
advertised.listeners=PLAINTEXT_1://ec2-**-***-**-17.eu-central-1.compute.amazonaws.com:9092
listener.security.protocol.map=,PLAINTEXT_1:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_1
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
kafka-2:
broker.id=1
listeners=PLAINTEXT_2://ec2-**-***-**-43.eu-central-1.compute.amazonaws.com:9093
advertised.listeners=PLAINTEXT_2://ec2-**-***-**-43.eu-central-1.compute.amazonaws.com:9093
listener.security.protocol.map=,PLAINTEXT_2:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_2
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
kafka-3:
broker.id=2
listeners=PLAINTEXT_3://ec2-**-***-**-27.eu-central-1.compute.amazonaws.com:9094
advertised.listeners=PLAINTEXT_3://ec2-**-***-**-27.eu-central-1.compute.amazonaws.com:9094
listener.security.protocol.map=,PLAINTEXT_3:PLAINTEXT
inter.broker.listener.name=PLAINTEXT_3
zookeeper.connect=ec2-**-***-**-105.eu-central-1.compute.amazonaws.com:2181
zookeeper:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
When I ran following command in zookeeper I see that they are connected
I also telnetted from any broker to other ones with broker port they are all connected
However, when I try to create topic with 2 replication factor I get Timed out waiting for a node assignment
I cannot understand what is incorrect with my setup, I see 3 nodes running in zookeeper, but having problems when creating topic. BTW, when I make replication factor 1 I get the same error. How can I make sure that everything is alright with my cluster?
It's good that telnet checks the port is open, but it doesn't verify the Kafka protocol works. You could use kcat utility for that, but the fix includes
listeners are set to either PLAINTEXT://:9092 or PLAINTEXT://0.0.0.0:9092 for every broker, which means using the same port
Removing the number from the listener mapping and advertised listeners property such that each broker is the same
I'd also recommend looking at using Ansible/Terraform/Cloudformation to ensure you consistently modify the cluster rather than edit individual settings manually

Greengrass_HelloWorld lambda doesn't publish to Amazon IoT console

I have been following the documentation in every step, and I didn't face any errors. Configured, deployed and made a subscription to hello/world topic just as the documentation detailed. However, when I arrived at the testing step here: https://docs.aws.amazon.com/greengrass/latest/developerguide/lambda-check.html
No messages were showing up on the IoT console (subscription view hello/world)! I am using Greengrass core daemon which runs on my Ubuntu machine, it is active and listens to port 8000. I don't think there is anything wrong with my local device because the group was deployed successfully and because I see the communications going both ways on Wireshark.
I have these logs on my machine: /home/##/Desktop/greengrass/ggc/var/log/system/runtime.log:
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Version: 1.9.3-RC3
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Root: /home/##/Desktop/greengrass
[2019-09-28T06:57:42.492-07:00][INFO]-Greengrass Write Directory: /home/##/Desktop/greengrass/ggc
[2019-09-28T06:57:42.492-07:00][INFO]-Group File Directory: /home/##/Desktop/greengrass/ggc/deployment/group
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda UID: 122
[2019-09-28T06:57:42.492-07:00][INFO]-Default Lambda GID: 127
[2019-09-28T06:57:42.492-07:00][INFO]-===========================================
[2019-09-28T06:57:42.492-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.492-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.493-07:00][INFO]-Will persist worker process info. {"dir": "/home/##/Desktop/greengrass/ggc/ggc/core/var/worker/processes"}
[2019-09-28T06:57:42.494-07:00][INFO]-No proxy URL found.
[2019-09-28T06:57:42.495-07:00][INFO]-Started Deployment Agent to listen for updates. [2019-09-28T06:57:42.495-07:00][INFO]-Connecting with MQTT. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.497-07:00][INFO]-The current core is using the AWS IoT certificates with fingerprint. {"fingerprint": "90##4d"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection successful. {"attemptId": "GVko", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection established. {"endpoint": "a6##ws-ats.iot.us-east-2.amazonaws.com:8883", "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-MQTT connection connected. Start subscribing. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Deployment agent connected to cloud.
[2019-09-28T06:57:42.685-07:00][INFO]-Start subscribing. {"numOfTopics": 2, "clientId": "simulators_gg_Core"}
[2019-09-28T06:57:42.685-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/update/delta
[2019-09-28T06:57:42.727-07:00][INFO]-Trying to subscribe to topic $aws/things/simulators_gg_Core-gda/shadow/get/accepted
[2019-09-28T06:57:42.814-07:00][INFO]-All topics subscribed. {"clientId": "simulators_gg_Core"}
[2019-09-28T06:58:57.888-07:00][INFO]-Daemon received signal: terminated. [2019-09-28T06:58:57.888-07:00][INFO]-Shutting down daemon.
[2019-09-28T06:58:57.888-07:00][INFO]-Stopping all workers.
[2019-09-28T06:58:57.888-07:00][INFO]-Lifecycle manager is stopped.
[2019-09-28T06:58:57.888-07:00][INFO]-IPC server stopped.
/home/##/Desktop/greengrass/ggc/var/log/system/localwatch/localwatch.log:
[2019-09-28T06:57:42.491-07:00][DEBUG]-will keep the log files for the following lambdas {"readingPath": "/home/##/Desktop/greengrass/ggc/var/log/user", "lambdas": "map[]"}
[2019-09-28T06:57:42.492-07:00][WARN]-failed to list the user log directory {"path": "/home/##/Desktop/greengrass/ggc/var/log/user"}
Thanks in advance.
I had a similar issue on another platform (Jetson Nano). I could not get a response after going through the AWS instructions for setting up a simple Lambda using IOT Greengrass. In my search for answers I discovered that AWS has a qualification test script for any device you connect.
It goes through an automated process of deploying and testing a lambda function(as well as other functionality) and reports results for each step and docs provide troubleshooting info for failures.
By going through those tests I was able to narrow down the issues with my setup, installation, and configuration. The testing docs give pointers to troubleshoot test results. Here is a link to the test: https://docs.aws.amazon.com/greengrass/latest/developerguide/device-tester-for-greengrass-ug.html
If you follow the 'Next Topic' links, it will take you through the complete test. Let me warn you that its extensive, and will take some time, but for me it gave a lot of detailed insight that a hello world does not.

How do I poll a web service from a GAE service in short intervals?

I'm developing a client app that relies on a GAE service. This service needs to get updates by polling a remote web service on a less than 1 minute interval so cron jobs are probably not the way to go here.
From the GAE service I need to poll the web service in intervals of a couple of seconds and then update the client app. So to break it down:
GAE service polls the remote web service in 5 sec intervals.
If a change is made, update the client app instantly.
Step 2 is solved already, but I'm struggling to find a good way on a polling of this sort. I have no control over the remote web service so I can't make any changes on that end.
I've looked at the Task queue API but the documentation specifically says that it is unsuitable for interactive applications where a user is waiting for the result
How would be the best way to solve this issue?
Use cron to schedule a bunch of taskqueue tasks with staggered etas
def cron_job(): # scheduled to run every 5 minutes
for i in xrange(0, 60*5, 5):
deferred.defer(poll_web_service, _countdown=i)
def poll_web_service():
# do stuff
Alternatively, with this level of frequency, you might as well have a dedicated instance on this. You can do this with manual-scaling microservice and you can have the request handler for /_ah/start/ never return, which will let it run forever (besides having periodic restarts). See this: https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed#instance_scaling
def on_change_detected(params):
queue = taskqueue.Queue('default')
task = taskqueue.Task(
url='/some-url-on-your-default-service/',
countdown=0,
target='default',
params={'params': params})
queue.add(task)
class Start(webapp2.RequestHandler):
def get(self):
while True:
time.sleep(5)
if change_detected: # YOUR LOGIC TO DETECT A CHANGE GOES HERE
on_change_detected()
_routes = [
RedirectRoute('/_ah/start', Start, name='start'),
]
for r in _routes:
app.router.add(r)

Akka: How to reconnect to restarted slave?

I have two docker containers running localy, one is master, the second is slave, communicating over akka remote. Slave can go OOM from time to time for certain messages, in which case docker gracefully restarts it..
The code looks a little bit like this:
object Master {
def main() {
...
val slave =
typedActorOf(TypedProps[Slave], resolveRemoteAtor(..))
val dispatcher =
typedActorOf(TypedProps(classOf[Dispatcher], new DispatcherImpl(slave)))
val httpServer =
typedActorOf(TypedProps(classOf[HTTPServer], new HTTPServerImpl(dispatcher)))
}
}
class Slave() { def compute() = ... }
class Dispatcher(s: Slave) { def compute() = s.compute() }
The problem is, that the master shutdowns the connection with the slave, once it becomes unavailable due to OOM, and it never renews it:
[ERROR] from a.r.EndpointWriter - AssociationError akka.tcp://MasterSystem#localhost:0] -> [akka.tcp://SlaveSystem#localhost:1]: Error [Shut down address: akka.tcp://SlaveSystem#localhost:1] [akka.remote.ShutDownAssociation: Shut down address: akka.tcp://SlaveSystem#localhost:1 Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down. ]
[INFO] from a.r.RemoteActorRef - Message [akka.actor.TypedActor$MethodCall] from Actor[akka://MasterSystem/temp/$c] to Actor[akka.tcp://SlaveSystem#localhost:1/user/Slave#1817887555] was not delivered. [1] dead letters encountered.
So my question is, how can I force the master to reconnect with the slave once the slave restarts and send all the pending messages, that were not possible to deliver during the time it was down?
I'd recommend using Akka Cluster over remoting directly, for this and in general as well, cluster will allow you to listen for membership events so that you can react on a node leaving and reappearing.
Making guarantees around delivery of messages requires some extra thought though. This section of the docs is good to read for better understanding the issues around it.

boto.sqs connect to non-aws endpoint

I'm currently in need of connecting to a fake_sqs server for dev purposes but I can't find an easy way to specify endpoint to the boto.sqs connection. Currently in java and node.js there are ways to specify the queue endpoint and by passing something like 'localhst:someport' I can connect to my own sqs-like instance. I've tried the following with boto:
fake_region = regioninfo.SQSRegionInfo(name=name, endpoint=endpoint)
conn = fake_region.connect(aws_access_key_id="TEST", aws_secret_access_key="TEST", port=9324, is_secure=False);
and then:
queue = connAmazon.get_queue('some_queue')
but it fails to retrieve the queue object,it returns None. Has anyone achieved to connect to an own sqs instance ?
Here's how to create an SQS connection that connects to fake_sqs:
region = boto.sqs.regioninfo.SQSRegionInfo(
connection=None,
name='fake_sqs',
endpoint='localhost', # or wherever fake_sqs is running
connection_cls=boto.sqs.connection.SQSConnection,
)
conn = boto.sqs.connection.SQSConnection(
aws_access_key_id='fake_key',
aws_secret_access_key='fake_secret',
is_secure=False,
port=4568, # or wherever fake_sqs is running
region=region,
)
region.connection = conn
# you can now work with conn
# conn.create_queue('test_queue')
Be aware that, at the time of this writing, the fake_sqs library does not respond correctly to GET requests, which is how boto makes many of its requests. You can install a fork that has patched this functionality here: https://github.com/adammck/fake_sqs