Using AWS Elasticsearch with a VPC endpoint and django-haystack - django

I want to use the AWS Elasticsearch service with my Django application, which runs on an EC2 instance.
For that I use these settings:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch5_backend.Elasticsearch5SearchEngine',
        'URL': 'https://vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com:9200/',
        'INDEX_NAME': 'haystack',
        'INCLUDE_SPELLING': True,
    },
}
I am not even able to establish the connection. I get this error:
raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError((, 'Connection to vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com timed out. (connect timeout=10)')) caused by: ConnectTimeoutError((, 'Connection to vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com timed out. (connect timeout=10)'))
I have updated the access policy to allow the user to edit and list, and added a TCP rule for port 9200 to the security group. How do I connect EC2 to Elasticsearch through the VPC?

The VPC endpoint serves HTTPS on port 443, not 9200. Use
'URL': 'https://vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com:443/',
and open port 443 in the security group.
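A connect timeout like the one in the question usually means the TCP handshake never completed, for example because nothing listens on that port or a security group drops the traffic. Below is a minimal sketch for checking this from the EC2 instance with the standard library only. The helper name `can_connect` is an assumption made for illustration and is not part of Haystack or elasticsearch-py.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False
```

Calling `can_connect('vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com', 443)` and then the same with `9200` from the instance should show which port is reachable (the hostname is the placeholder from the question).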

Related

Connecting to AWS Elasticsearch within VPC from Lambda over HTTPS

I have a Lambda written in Python which writes some data to Elasticsearch hosted on AWS. The ES service is within a VPC, so I'm trying to use the internal DNS of the ES to connect to it. This is my code:
es_client = Elasticsearch(
    hosts=[{'host': es_host, 'port': 443}],
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
However, I get this exception:
ssl.CertificateError: hostname 'x.y.internal' doesn't match '*.us-west-2.es.amazonaws.com'
I don't want to use the public hostname because it is going to keep changing. How do I connect to the ES service using its internal DNS name?
====== UPDATE =======
I'm able to connect to the ES domain using HTTP with the below code:
es_client = Elasticsearch(
    hosts=[{'host': es_host, 'port': 80}]
)
But how do I connect over HTTPS?
I ran into a similar issue using AWS.HttpClient. It happens when you connect to the generated VPC endpoint of ES over HTTPS. You have to disable certificate verification:
es_client = Elasticsearch(
    hosts=[{'host': es_host, 'port': 443}],
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=False,
    connection_class=RequestsHttpConnection
)
If you are using AWS.HttpClient as I was, you have to disable it like this:
const AWS = require('aws-sdk');
const https = require('https');
AWS.NodeHttpClient.sslAgent = new https.Agent({ rejectUnauthorized: false });
const httpClient = new AWS.HttpClient();
You need to use the host ending in .us-west-2.es.amazonaws.com, because that is the domain in the SSL certificate that Elasticsearch sends. If the internal DNS hostname is different, the connection will fail because the hostname does not match the certificate.
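To see why the internal name is rejected, here is a simplified sketch of wildcard hostname matching. It is an illustration only: the function `hostname_matches` is hypothetical, the `vpc-example` hostname is made up, and real TLS verification applies more rules than this.

```python
def hostname_matches(hostname, pattern):
    """Simplified check of a hostname against a certificate name pattern.

    Only a whole left-most "*" label is treated as a wildcard, and it
    stands for exactly one label. Real TLS verification applies more rules.
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard covers the first label; the rest must match exactly.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


cert_name = "*.us-west-2.es.amazonaws.com"
print(hostname_matches("x.y.internal", cert_name))                            # False
print(hostname_matches("vpc-example.us-west-2.es.amazonaws.com", cert_name))  # True
```

The name used to open the connection is the one compared against the certificate, so an internal alias fails even when it points at the same endpoint.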

AWS Kubernetes RDS connection

I'm having some trouble with my AWS Kubernetes instance.
I'm trying to get my Django instances to connect to the RDS service via the DB endpoint.
DATABASES = {
    'default': {
        'ENGINE': 'django.contrib.gis.db.backends.postgis',
        'NAME': os.environ['NAME'],
        'USER': os.environ['USER'],
        'PASSWORD': os.environ['PASSWORD'],
        'HOST': os.environ['HOST'],
        'PORT': os.environ['PORT']
    }
}
The host string resembles service.key.region.rds.amazonaws.com and is passed to the container via env in deploy.yml:
containers:
  - name: service
    env:
      - name: HOST
        value: service.key.region.rds.amazonaws.com
This setup works locally in Kubernetes but not in the cluster I have on AWS. There it returns the following error instead:
django.db.utils.OperationalError: could not translate host name
Any suggestions, or am I missing something about how AWS handles things?
Assuming your AWS deployment is now in the same VPC as your RDS instance, you will need to change your host to use the private IP.
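"could not translate host name" is a name resolution failure, raised before any connection is attempted. A minimal sketch for checking, from a Python shell inside the container, whether the endpoint resolves at all; the helper name `resolve_host` is an assumption for illustration.

```python
import socket


def resolve_host(host):
    """Return the sorted, de-duplicated IP addresses that host resolves to.

    Returns an empty list when the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Running `resolve_host(os.environ['HOST'])` in the pod should return a non-empty list if DNS works there. An empty list points at the cluster's DNS setup or at the value of HOST rather than at Django.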

Django on EC2 security group for postgresql on RDS

I edited the DATABASES entry in my settings.py to be:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'Limbo',
        'USER': '<username>',
        'PASSWORD': '<password>',
        'HOST': '<dbname>.<gibberish>.us-west-2.rds.amazonaws.com',
        'PORT': '5432',
    }
}
Now when I run ./manage.py runserver 0.0.0.0:800, it says:
Performing system checks...
System check identified no issues (0 silenced).
Then, after a minute or two:
File "/home/ec2-user///local/lib64/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
conn = _connect(dsn, connection_factory=connection_factory, async=async)
django.db.utils.OperationalError: could not connect to server: Connection timed out
Is the server running on host "..us-west-2.rds.amazonaws.com" (172.rest.of.ip) and accepting
TCP/IP connections on port 5432?
I have made sure to allow incoming access on port 5432 from my EC2 instance's IP (verified with curl ifconfig.co) and from the IP listed in the error message (starting with 172 above). Perhaps I need to use a larger subnet (than /32) in the 172 source?
EDIT: same error when I run python limbo/manage.py migrate
EDIT2: if I allow 5432 connections from any IP in the security group it works, but as stated above, I am already allowing my EC2's IP (according to the return value of curl ifconfig.co). What other IP should I be including?
Enter the following on the Linux command line:
cd /path/to/djangoproj
screen
source <env>/bin/activate
The prompt should then read something like:
(<env>)[ec2-user@ip-172-31-26-243 djangoproj]$
See the section in the middle, ip-172-31-26-243: that is the instance's private IP. Use it in the security group settings. In this case, I used 172.31.26.243/32 as the allowed incoming IP.
Then try runserver again.
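On the /32 question: a /32 rule matches exactly one address, so it only helps if it names the address the database actually sees. A short sketch with the standard `ipaddress` module, assuming traffic from EC2 reaches RDS over the VPC's private network (as the 172.x address in the error message suggests). The public address below is a hypothetical value from a documentation range, and `rule_allows` is a name made up for this example.

```python
import ipaddress


def rule_allows(source_cidr, client_ip):
    """Return True if client_ip falls inside the rule's source CIDR range."""
    network = ipaddress.ip_network(source_cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


private_ip = "172.31.26.243"  # private address taken from the shell prompt
public_ip = "203.0.113.10"    # hypothetical public address, as ifconfig.co might report

print(rule_allows(private_ip + "/32", private_ip))  # True: rule names the private IP
print(rule_allows(public_ip + "/32", private_ip))   # False: rule names the public IP
print(rule_allows("172.31.0.0/16", private_ip))     # True: wider private range
```

A rule for the public address never matches the private source address, which is consistent with the timeout going away once any IP was allowed.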

Django-Haystack using Amazon Elasticsearch hosting with IAM credentials

I am hoping to use Amazon's Elasticsearch service to power a search of long text fields in a Django database. However, I also don't want to expose this search to those who don't have a login, and I don't want to rely on security through obscurity or some IP restriction tactic (unless it would work well with an existing Heroku app, where the Django app is deployed).
Haystack seems to go a long way toward this, but there doesn't seem to be an easy way to configure it to use Amazon's IAM credentials to access the Elasticsearch service. This functionality does exist in elasticsearch-py, which it uses.
https://elasticsearch-py.readthedocs.org/en/master/#running-with-aws-elasticsearch-service
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
host = 'YOURHOST.us-east-1.es.amazonaws.com'
awsauth = AWS4Auth(YOUR_ACCESS_KEY, YOUR_SECRET_KEY, REGION, 'es')
es = Elasticsearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
print(es.info())
Regarding using HTTP authorization, I found this under issues at https://github.com/django-haystack/django-haystack/issues/1046
from urlparse import urlparse
parsed = urlparse('https://user:pass@host:port')
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': parsed.hostname,
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': parsed.port,
            'http_auth': (parsed.username, parsed.password),
            'use_ssl': True,
        }
    }
}
I am wondering if there is a way to combine these two, something like the following (which, as expected, gives an error since it's more than just a user name and password):
from requests_aws4auth import AWS4Auth
awsauth = AWS4Auth([ACCESS_KEY],[SECRET_KEY],[REGION],'es')
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': [AWSHOST],
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': 443,
            'http_auth': awsauth,
            'use_ssl': True,
            'verify_certs': True
        }
    },
}
The error here:
TypeError at /admin/
must be convertible to a buffer, not AWS4Auth
Request Method: GET
Request URL: http://127.0.0.1:8000/admin/
Django Version: 1.7.7
Exception Type: TypeError
Exception Value:
must be convertible to a buffer, not AWS4Auth
Exception Location: /usr/lib/python2.7/base64.py in b64encode, line 53
Any ideas on how to accomplish this?
You are one step from success: add connection_class to KWARGS and everything should work as expected.
import elasticsearch
# awsauth is the AWS4Auth object built in the question
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': [AWSHOST],
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': 443,
            'http_auth': awsauth,
            'use_ssl': True,
            'verify_certs': True,
            # Use the requests-based transport so the auth object is applied per request
            'connection_class': elasticsearch.RequestsHttpConnection,
        }
    },
}
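The traceback in the question ends in base64.b64encode, which suggests that without connection_class the http_auth value is treated as username/password credentials to be encoded. The sketch below reproduces that failure mode with standard-library code only. It is not the library's actual implementation: `basic_auth_header` and `FakeSigner` are hypothetical names, and on Python 3 the error message differs from the Python 2.7 one quoted above.

```python
import base64


class FakeSigner:
    """Hypothetical stand-in for a request-signing auth object such as AWS4Auth."""


def basic_auth_header(http_auth):
    """Build a Basic auth header the way a username/password pair would be handled."""
    if isinstance(http_auth, (tuple, list)):
        http_auth = ":".join(http_auth).encode("utf-8")
    # b64encode only accepts bytes-like input, so an auth object fails here.
    return "Basic " + base64.b64encode(http_auth).decode("ascii")


print(basic_auth_header(("user", "pass")))

try:
    basic_auth_header(FakeSigner())
except TypeError as exc:
    print("TypeError:", exc)
```

A signing object has to be invoked on each outgoing request instead of being encoded once, which is what a requests-based connection class allows.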
AWS Identity and Access Management (IAM) allows you to manage users and user permissions for AWS services, to control which AWS resources users of AWS itself can access.
You cannot use IAM credentials to authorize users at the application level via http_auth, as it appears you are trying to do via Haystack here. They are different authentication schemes for different services. They are not compatible.
In your security use case, you have stated the need to 1) restrict access to your application, and 2) to secure the Elasticsearch service port from open access. These two requirements can be met using the following methods:
Restrict access to your application
I also don't want to expose this search to those who don't have a log in
For the front-end search app, you want to use a server-level Basic access authentication (HTTP auth) configuration on the web server. This is where you control user login access to your app, via a standard http_auth username and password (again, not IAM). This will secure your app at the application level.
Secure the Elasticsearch service port
don't want to rely on security through obscurity or some IP restriction tactic (unless it would work well with an existing heroku app, where the Django app is deployed).
IP restriction is exactly what would work here, and it is consistent with AWS security best practices. You want to use security groups and security group rules as a firewall to control traffic for your EC2 instances.
Given a Haystack configuration of:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
you will want to implement an IP restriction at the security group and/or ACL level for that IP and port (127.0.0.1 and 9200 here), so that only your Django host or other authorized hosts can connect. This will secure it from unauthorized access at the service level.
In your implementation, the URL will likely resolve to a public or private IP, depending on your network architecture.

Django Haystack connection error using http authentication

I'm using Haystack to connect and interact with an installation of elasticsearch. Elasticsearch is installed on a different box to the main webserver.
I have set up HTTP authentication on the elasticsearch box using nginx. This is to stop unauthorised access to elasticsearch.
The Haystack config looks like this:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://USERNAME:PASSWORD@DOMAIN:PORT/',
        'INDEX_NAME': 'haystack',
    },
}
With this set up I get a connection error:
elasticsearch.exceptions.ConnectionError: ConnectionError(('Connection aborted.',
gaierror(-2, 'Name or service not known'))) caused by: ProtocolError(('Connection
aborted.', gaierror(-2, 'Name or service not known')))
If I turn off HTTP authentication and update the URL correspondingly to http://DOMAIN:PORT/ it connects without a problem.
Could it be that Haystack (or elasticsearch-py, http://www.elasticsearch.org/guide/en/elasticsearch/client/python-api/current/) doesn't allow HTTP authentication to be used in the URL? I notice this is a problem with Solr, see "Solr authentication (using Django Haystack)".
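For others with the same problem, add KWARGS to the config and pass the credentials there instead of in the URL:
# parsed = urlparse('http://USERNAME:PASSWORD@DOMAIN:PORT/')
'KWARGS': {
    'port': parsed.port,
    'http_auth': (parsed.username, parsed.password),
}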
See: https://github.com/toastdriven/django-haystack/issues/1046
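Put together, the approach from that issue can be sketched as a small helper that splits the credentials out of the URL. This is an illustration under assumptions: `haystack_connection` is a hypothetical name, the example URL and credentials are made up, and it uses Python 3's `urllib.parse` where the snippets above use Python 2's `urlparse` module.

```python
from urllib.parse import urlparse


def haystack_connection(url, index_name="haystack"):
    """Split credentials out of url into a Haystack connection dict."""
    parsed = urlparse(url)
    return {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        # Only the bare hostname goes into URL; port and credentials move to KWARGS.
        "URL": parsed.hostname,
        "INDEX_NAME": index_name,
        "KWARGS": {
            "port": parsed.port,
            "http_auth": (parsed.username, parsed.password),
        },
    }


# Hypothetical credentials and host, for illustration only.
HAYSTACK_CONNECTIONS = {
    "default": haystack_connection("http://user:secret@search.example.com:9200/"),
}
```

Keeping the full URL in one setting (for example an environment variable) and deriving the dict from it avoids repeating the host and port in two places.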