Access Kafka Cluster Outside GCP - google-cloud-platform

I'm trying to access a Kafka cluster (Bitnami) running on GCP from my local machine. I have exposed the required host and ports in server.properties and added a firewall rule to allow port 9092, but the client still doesn't connect.
I'm running a configuration with 2 brokers and 1 ZooKeeper node.
Expected Output: Producer.bootstrap_connected() should return True.
Actual Output: False
server.properties
listeners=SASL_PLAINTEXT://:9092
advertised.listeners=SASL_PLAINTEXT://gcp-cluster-name:9092
sasl.mechanism.inter.broker.protocol=PLAIN
sasl.enabled.mechanisms=PLAIN
security.inter.broker.protocol=SASL_PLAINTEXT
Consumer.py
from kafka import KafkaConsumer
import json
import ssl
sasl_mechanism = 'PLAIN'
security_protocol = 'SASL_PLAINTEXT'
# Create a new context using system defaults, disable all but TLS1.2
# (OP_NO_* flags are switched on with |=; &= would clear the other options)
context = ssl.create_default_context()
context.options |= ssl.OP_NO_TLSv1
context.options |= ssl.OP_NO_TLSv1_1
consumer = KafkaConsumer('organic-sense',
                         bootstrap_servers='<server-ip>:9092',
                         value_deserializer=lambda x: json.loads(x.decode('utf-8')),
                         ssl_context=context,
                         sasl_plain_username='user',
                         sasl_plain_password='<password>',
                         sasl_mechanism=sasl_mechanism,
                         security_protocol=security_protocol,
                         )
print(consumer.bootstrap_connected())
for data in consumer:
    print(data)
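One thing worth checking, offered as an assumption rather than a confirmed diagnosis: a Kafka client uses the bootstrap address only for its first connection and then connects to whatever the brokers advertise, so the host in advertised.listeners has to be resolvable and reachable from the local machine. The sketch below parses that line and tests name resolution; the function names are hypothetical helpers, not part of kafka-python.

```python
import socket


def advertised_hosts(properties_text):
    """Return (listener, host, port) tuples from an advertised.listeners line."""
    result = []
    for line in properties_text.splitlines():
        line = line.strip()
        if not line.startswith("advertised.listeners="):
            continue
        for entry in line.split("=", 1)[1].split(","):
            listener, _, address = entry.strip().partition("://")
            host, _, port = address.rpartition(":")
            result.append((listener, host, int(port)))
    return result


def resolves(host):
    """True if this machine can resolve the host name to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


config = """listeners=SASL_PLAINTEXT://:9092
advertised.listeners=SASL_PLAINTEXT://gcp-cluster-name:9092
"""
for listener, host, port in advertised_hosts(config):
    print(listener, host, port, "resolvable:", resolves(host))
```

If the advertised name does not resolve from outside GCP, the client can reach the bootstrap port and still fail on the follow-up connections.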

Related

AWS Glue - Kafka Connection using SASL/SCRAM

I am trying to create an AWS Glue Streaming job that reads from Kafka (MSK) clusters using SASL/SCRAM client authentication for the connection, per
https://aws.amazon.com/about-aws/whats-new/2022/05/aws-glue-supports-sasl-authentication-apache-kafka/
The connection configuration has the following properties (plus adequate subnet and security groups):
"ConnectionProperties": {
    "KAFKA_SASL_SCRAM_PASSWORD": "apassword",
    "KAFKA_BOOTSTRAP_SERVERS": "theserver:9096",
    "KAFKA_SASL_MECHANISM": "SCRAM-SHA-512",
    "KAFKA_SASL_SCRAM_USERNAME": "auser",
    "KAFKA_SSL_ENABLED": "false"
}
And the actual API method call is:
df = glue_context.create_data_frame.from_options(
    connection_type="kafka",
    connection_options={
        "connectionName": "kafka-glue-connector",
        "security.protocol": "SASL_SSL",
        "classification": "json",
        "startingOffsets": "latest",
        "topicName": "atopic",
        "inferSchema": "true",
        "typeOfData": "kafka",
        "numRetries": 1,
    }
)
When the job runs, the logs show that the client is attempting to connect to the brokers using Kerberos (GSSAPI), and it runs into the following error:
22/10/19 18:45:54 INFO ConsumerConfig: ConsumerConfig values:
sasl.mechanism = GSSAPI
security.protocol = SASL_SSL
security.providers = null
send.buffer.bytes = 131072
...
org.apache.kafka.common.errors.SaslAuthenticationException: Failed to configure SaslClientAuthenticator
Caused by: org.apache.kafka.common.KafkaException: Principal could not be determined from Subject, this may be a transient failure due to Kerberos re-login
How can I authenticate the AWS Glue job using SASL/SCRAM? What properties do I need to set in the connection and in the method call?
Thank you

Django set a connection proxy

I have been looking for a long time for a way to set a proxy for my Django application.
First, I am using Django==2.0, running on Windows Server 2016 in a local network that connects through a proxy at 10.37.235.99, port 80.
I'm deploying the application using nginx-1.20.1.
I have to scrape some data, like this:
import os
import socket
import requests
http_proxy = "10.37.235.99:80"
https_proxy = "10.37.235.99:80"
ftp_proxy = "10.37.235.99:80"
proxyDict = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy
}
if socket.gethostname() == "localhost":
    os.environ["PROXIES"] = proxyDict
else:
    os.environ["PROXIES"] = {}
URL = 'my_site.com'
page = requests.get(URL)
print(page)
I tried many solutions from the internet, but none of them worked, including:
Working with django : Proxy setup
When I remove the proxy configuration and use Psiphon3 (with proxy), everything works perfectly.
Is there any solution?
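A minimal sketch of one way to express the proxy setup, assuming the goal is for HTTP clients in the process to go through 10.37.235.99:80. os.environ only accepts string values, so assigning a dict to it raises a TypeError; the conventional variables are HTTP_PROXY and HTTPS_PROXY, each holding a full proxy URL. The helper names here are hypothetical.

```python
import os


def proxy_environment(host, port):
    """Build proxy environment variables; every value must be a plain string."""
    proxy_url = "http://{}:{}".format(host, port)
    return {"HTTP_PROXY": proxy_url, "HTTPS_PROXY": proxy_url}


def apply_proxy(host, port):
    """Export the proxy settings so HTTP clients that honour them pick them up."""
    settings = proxy_environment(host, port)
    os.environ.update(settings)
    return settings


apply_proxy("10.37.235.99", 80)
print(os.environ["HTTP_PROXY"])
```

With requests, the per-call alternative is to pass the mapping directly, for example requests.get(URL, proxies={"http": proxy_url, "https": proxy_url}). Note also that requests expects the URL to include a scheme such as http://.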

Django channels redis channel layer opens a lot of connections

We recently ported part of our application to Django Channels, using the Redis channel layer backend. Part of our setup still runs on Python 2 in a Docker container, which is why we use Redis pub/sub to send messages back to the client. A global listener (inspired by this thread) catches all messages and distributes them into the Django Channels system. It all works fine so far, but I see a lot of debug messages saying Creating tcp connection... passing by. The output posted below corresponds to one event. Both the listener and the consumer seem to be creating two Redis connections. I don't know enough about the underlying mechanism to tell whether this is the expected behavior, hence my question: is this to be expected?
The Listener uses a global channel layer instance:
# maps the publish type to a method name of the django channel consumer
PUBLISH_TYPE_EVENT_MAP = {
    'state_change': 'update_client_state',
    'message': 'notify_client',
}
channel_layer = layers.get_channel_layer()
class Command(BaseCommand):
    help = u'Opens a connection to Redis and listens for messages, ' \
           u'and then whenever it gets one, sends the message onto a channel ' \
           u'in the Django channel system'
    ...
    def broadcast_message(self, msg_body):
        group_name = msg_body['subgrid_id'].replace(':', '_')
        try:
            event_name = PUBLISH_TYPE_EVENT_MAP[msg_body['publish_type']]
        # backwards compatibility
        except KeyError:
            event_name = PUBLISH_TYPE_EVENT_MAP[msg_body['type']]
        async_to_sync(channel_layer.group_send)(
            group_name, {
                "type": event_name,
                "kwargs": msg_body.get('kwargs'),
            })
The consumer is a JsonWebsocketConsumer that is initialized like this
class SimulationConsumer(JsonWebsocketConsumer):
    def connect(self):
        """
        Establishes the connection with the websocket.
        """
        logger.debug('Incoming connection...')
        # subgrid_id can be set dynamically, see property subgrid_id
        self._subgrid_id = self.scope['url_route']['kwargs']['subgrid_id']
        async_to_sync(self.channel_layer.group_add)(
            self.group_name,
            self.channel_name
        )
        self.accept()
And the method that is called from the listener:
def update_client_state(self, event):
    """
    Public facing method that pushes the state of a simulation back
    to the client(s). Has to be called through django channels
    ```async_to_sync(channel_layer.group_send)...``` method
    """
    logger.debug('update_client_state event %s', event)
    current_state = self.redis_controller.fetch_state()
    data = {'sim_state': {
        'sender_sessid': self.session_id,
        'state_data': current_state}
    }
    self.send_json(content=data)
A single event gives me this output
listener_1 | DEBUG !! data {'publish_type': 'state_change', 'subgrid_id': 'subgrid:6d1624b07e1346d5907bbd72869c00e8'}
listener_1 | DEBUG !! event_name update_client_state
listener_1 | DEBUG !! kwargs None
listener_1 | DEBUG !! group_name subgrid_6d1624b07e1346d5907bbd72869c00e8
listener_1 | DEBUG Using selector: EpollSelector
listener_1 | DEBUG Parsing Redis URI 'redis://:@redis-threedi-server:6379/13'
listener_1 | DEBUG Creating tcp connection to ('redis-threedi-server', 6379)
listener_1 | DEBUG Parsing Redis URI 'redis://:@redis-threedi-server:6379/13'
listener_1 | DEBUG Creating tcp connection to ('redis-threedi-server', 6379)
threedi-server_1 | DEBUG Parsing Redis URI 'redis://:@redis-threedi-server:6379/13'
threedi-server_1 | DEBUG Creating tcp connection to ('redis-threedi-server', 6379)
threedi-server_1 | DEBUG update_client_state event {'type': 'update_client_state', 'kwargs': None}
threedi-server_1 | DEBUG Parsing Redis URI 'redis://:@redis-threedi-server:6379/13'
threedi-server_1 | DEBUG Creating tcp connection to ('redis-threedi-server', 6379)

Email on failure using AWS SES in Apache Airflow DAG

I am trying to have Airflow email me using AWS SES whenever a task in my DAG fails to run or retries to run. I am using my AWS SES credentials rather than my general AWS credentials too.
My current airflow.cfg
[email]
email_backend = airflow.utils.email.send_email_smtp
[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an
# smtp server here
smtp_host = emailsmtpserver.region.amazonaws.com
smtp_starttls = True
smtp_ssl = False
# Uncomment and set the user/pass settings if you want to use SMTP AUTH
smtp_user = REMOVEDAWSACCESSKEY
smtp_password = REMOVEDAWSSECRETACCESSKEY
smtp_port = 25
smtp_mail_from = myemail@myjob.com
Current task in my DAG that is designed to intentionally fail and retry:
testfaildag_library_install_jar_jdbc = PythonOperator(
    task_id='library_install_jar',
    retries=3,
    retry_delay=timedelta(seconds=15),
    python_callable=add_library_to_cluster,
    params={'_task_id': 'cluster_create', '_cluster_name': CLUSTER_NAME, '_library_path': 's3000://fakepath.jar'},
    dag=dag,
    email_on_failure=True,
    email_on_retry=True,
    email='myname@myjob.com',
    provide_context=True
)
Everything works as designed as the task retries the set number of times and ultimately fails, except no emails are being sent. I have checked the logs in the task mentioned above too, and smtp is never mentioned.
I've looked at the similar question here, but the only solution there did not work for me. Additionally, Airflow's documentation such as their example here does not seem to work for me either.
Does SES work with Airflow's email_on_failure and email_on_retry functions?
What I am currently thinking of doing is using the on_failure_callback function to call a python script provided by AWS here to send an email on failure, but that is not the preferable route at this point.
Thank you, appreciate any help.
--updated 6/8 with working SES
here's my write up on how we got it all working. There is a small summary at the bottom of this answer.
Couple of big points:
We initially decided not to use Amazon SES and to use sendmail instead. Update: we now have SES up and working.
It is the airflow worker that services the email_on_failure and email_on_retry features. You can run journalctl -u airflow-worker -f to monitor it during a DAG run. On your production server, you do NOT need to restart your airflow-worker after changing the smtp settings in airflow.cfg; they should be picked up automatically. No need to worry about messing up currently running DAGs.
Here is the technical write-up on how to use sendmail:
Since we changed from ses to sendmail on localhost, we had to change our smtp settings in the airflow.cfg.
The new config is:
[email]
email_backend = airflow.utils.email.send_email_smtp
[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an
# smtp server here
smtp_host = localhost
smtp_starttls = False
smtp_ssl = False
# Uncomment and set the user/pass settings if you want to use SMTP AUTH
#smtp_user = not used
#smtp_password = not used
smtp_port = 25
smtp_mail_from = myjob@mywork.com
This works in both production and local airflow instances.
Some common errors one might receive if their config is not like mine above:
socket.error: [Errno 111] Connection refused -- you must change your smtp_host line in airflow.cfg to localhost
smtplib.SMTPException: STARTTLS extension not supported by server. -- you must change your smtp_starttls in airflow.cfg to False
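Both errors can be reproduced outside Airflow, which makes it easier to tell a config problem from an Airflow problem. Below is a minimal sketch using only the standard smtplib module; check_smtp is a hypothetical helper, and its arguments are meant to mirror smtp_host, smtp_port, smtp_starttls, smtp_user and smtp_password from airflow.cfg.

```python
import smtplib


def check_smtp(host, port, use_starttls=False, user=None, password=None, timeout=10):
    """Try the steps an SMTP client performs and report the first failure."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            server.ehlo()
            if use_starttls:
                # Raises an SMTPException subclass if the server lacks STARTTLS.
                server.starttls()
                server.ehlo()
            if user and password:
                server.login(user, password)
        return True, "ok"
    except (smtplib.SMTPException, OSError) as exc:
        # Connection refused and timeouts arrive as OSError subclasses.
        return False, "{}: {}".format(type(exc).__name__, exc)


if __name__ == "__main__":
    print(check_smtp("localhost", 25, use_starttls=False))
```

The function returns a (success, detail) pair instead of raising, so the same call can be repeated with different settings until the handshake succeeds.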
In my local testing, I tried to simply force airflow to show a log of what was going on when it tried to send an email – I created a fake dag as follows:
# Airflow imports
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator
# General imports
from datetime import datetime, timedelta
def throwerror():
    raise ValueError("Failure")
SPARK_V_2_2_1 = '3.5.x-scala2.11'
args = {
    'owner': 'me',
    'email': ['me@myjob'],
    'depends_on_past': False,
    'start_date': datetime(2018, 5, 24),
    'end_date': datetime(2018, 6, 28)
}
dag = DAG(
    dag_id='testemaildag',
    default_args=args,
    catchup=False,
    schedule_interval="* 18 * * *"
)
t1 = DummyOperator(
    task_id='extract_data',
    dag=dag
)
t2 = PythonOperator(
    task_id='fail_task',
    dag=dag,
    python_callable=throwerror
)
t2.set_upstream(t1)
If you do the journalctl -u airflow-worker -f, you can see that the worker says that it has sent an alert email on the failure to the email in your DAG, but we were still not receiving the email. We then decided to look into the mail logs of sendmail by doing cat /var/log/maillog. We saw a log like this:
Jun 5 14:10:25 production-server-ip-range postfix/smtpd[port]: connect from localhost[127.0.0.1]
Jun 5 14:10:25 production-server-ip-range postfix/smtpd[port]: ID: client=localhost[127.0.0.1]
Jun 5 14:10:25 production-server-ip-range postfix/cleanup[port]: ID: message-id=<randomMessageID@production-server-ip-range-ec2-instance>
Jun 5 14:10:25 production-server-ip-range postfix/smtpd[port]: disconnect from localhost[127.0.0.1]
Jun 5 14:10:25 production-server-ip-range postfix/qmgr[port]: MESSAGEID: from=<myjob@mycompany.com>, size=1297, nrcpt=1 (queue active)
Jun 5 14:10:55 production-server-ip-range postfix/smtp[port]: connect to aspmx.l.google.com[smtp-ip-range]:25: Connection timed out
Jun 5 14:11:25 production-server-ip-range postfix/smtp[port]: connect to alt1.aspmx.l.google.com[smtp-ip-range]:25: Connection timed out
So this is probably the biggest "Oh duh" moment. Here we are able to see what is actually going on in our smtp service. We used telnet to confirm that we were not able to connect to the targeted IP ranges from gmail.
We determined that the email was attempting to be sent, but that the sendmail service was unable to connect to the ip ranges successfully.
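The telnet check can also be scripted. This is a small sketch with the standard socket module; can_connect is a hypothetical helper name, and the host in the usage line is the one from the mail log above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    print(can_connect("aspmx.l.google.com", 25))
```

A False result from the production server combined with True from another network would point at an outbound firewall or security group rule rather than at the mail service itself.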
We decided to allow all outbound traffic on port 25 in AWS (as our airflow production environment is an EC2 instance), and it now works successfully. We are now able to receive emails on failures and retries. Tip: email_on_failure and email_on_retry default to True (see the DAG API Reference), so you do not need to put them into your args if you do not want to, but it is still good practice to state True or False explicitly.
SES now works. Here is the airflow config:
[email]
email_backend = airflow.utils.email.send_email_smtp
[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an
# smtp server here
smtp_host = emailsmtpserver.region.amazonaws.com
smtp_starttls = True
smtp_ssl = False
# Uncomment and set the user/pass settings if you want to use SMTP AUTH
smtp_user = REMOVEDAWSACCESSKEY
smtp_password = REMOVEDAWSSECRETACCESSKEY
smtp_port = 587
smtp_mail_from = myemail@myjob.com (Verified SES email)
Thanks!
I have a similar case here. I tried to follow the same debugging process but got no log output. Also, the outbound rule for my airflow EC2 instance is open to all ports and IPs, so there must be some other cause.
I noticed that when you create the SMTP credentials from SES, it also creates an IAM user. I am not sure how airflow is running in your case (bare metal on an EC2 instance or wrapped in containers), or how that user's access is set up.

APNS issue with django

I'm using the following project for enabling APNS in my project:
https://github.com/stephenmuss/django-ios-notifications
I'm able to send and receive push notifications on my production app fine, but the sandbox APNS is having strange issues which I'm not able to solve. It consistently fails to connect to the push service. When I manually call _connect() on the APNService or FeedbackService classes, I get the following error:
File "/Users/MyUser/git/prod/django/ios_notifications/models.py", line 56, in _connect
self.connection.do_handshake()
Error: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert handshake failure')]
I tried recreating the APN certificate a number of times and always get the same error. Is there anything else I'm missing?
I'm using the endpoints gateway.push.apple.com and gateway.sandbox.push.apple.com for connecting to the service. Is there anything else I should look into for this? I have read the following:
Apns php error "Failed to connect to APNS: 110 Connection timed out."
Converting PKCS#12 certificate into PEM using OpenSSL
Error Using PHP for iPhone APNS
Turns out Apple changed ssl context from SSL3 to TLSv1 in development. They will do this in Production eventually (not sure when). The following link shows my pull request which was accepted into the above project:
https://github.com/stephenmuss/django-ios-notifications/commit/879d589c032b935ab2921b099fd3286440bc174e
Basically, use OpenSSL.SSL.TLSv1_METHOD if you're using python or something similar in other languages.
Although OpenSSL.SSL.SSLv3_METHOD works in production, it may not work in the near future. OpenSSL.SSL.TLSv1_METHOD works in production and development.
UPDATE
Apple will remove SSL 3.0 support in production on October 29th, 2014 due to the POODLE vulnerability.
https://developer.apple.com/news/?id=10222014a
I have worked on APNs using Python/Django. For this you need three things: the URL, the port, and the certificate provided by Apple for authentication.
views.py
import socket, ssl, json, struct
theCertfile = '/tmp/abc.cert'  ## absolute path where certificate file is placed.
ios_url = 'gateway.push.apple.com'
ios_port = 2195
deviceToken = '3234t54tgwg34g'  ## ios device token to which you want to send notification
def ios_push(msg, theCertfile, ios_url, ios_port, deviceToken):
    thePayLoad = {
        'aps': {
            'alert': msg,
            'sound': 'default',
            'badge': 0,
        },
    }
    theHost = (ios_url, ios_port)
    data = json.dumps(thePayLoad)
    deviceToken = deviceToken.replace(' ', '')
    byteToken = deviceToken.decode('hex')  # Python 2
    theFormat = '!BH32sH%ds' % len(data)
    theNotification = struct.pack(theFormat, 0, 32, byteToken, len(data), data)
    # Create our connection using the certfile saved locally
    ssl_sock = ssl.wrap_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM), certfile=theCertfile)
    ssl_sock.connect(theHost)
    # Write out our data
    ssl_sock.write(theNotification)
    # Close the connection -- apple would prefer that we keep
    # a connection open and push data as needed.
    ssl_sock.close()
Hopefully this would work for you.
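The packing step above relies on Python 2's str.decode('hex'). As a sketch only, the same frame layout (command byte 0, token length, 32-byte token, payload length, payload) could be built in Python 3 as follows; build_legacy_frame is a hypothetical name, and the code covers only the byte packing, not the connection.

```python
import json
import struct


def build_legacy_frame(device_token_hex, payload):
    """Pack one notification in the same layout as the struct.pack call above."""
    token = bytes.fromhex(device_token_hex.replace(" ", ""))
    if len(token) != 32:
        raise ValueError("expected a 32-byte device token")
    data = json.dumps(payload).encode("utf-8")
    # command 0, token length, token, payload length, payload
    return struct.pack("!BH32sH%ds" % len(data), 0, 32, token, len(data), data)


# Placeholder token made of 64 hex characters, for illustration only.
frame = build_legacy_frame("ab" * 32, {"aps": {"alert": "hello"}})
print(len(frame))
```

Validating the token length up front avoids silently padding or truncating it, which struct's 32s format would otherwise do.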