I am hoping to use Amazon's Elasticsearch Service to power a search of long-text fields in a Django database. However, I don't want to expose this search to anyone without a login, and I don't want to rely on security through obscurity or some IP restriction tactic (unless it would work well with an existing Heroku app, where the Django app is deployed).
Haystack seems to go a long way toward this, but there doesn't seem to be an easy way to configure it to use Amazon's IAM credentials to access the Elasticsearch service. This functionality does exist in elasticsearch-py, which it uses.
https://elasticsearch-py.readthedocs.org/en/master/#running-with-aws-elasticsearch-service
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
host = 'YOURHOST.us-east-1.es.amazonaws.com'
awsauth = AWS4Auth(YOUR_ACCESS_KEY, YOUR_SECRET_KEY, REGION, 'es')
es = Elasticsearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
print(es.info())
Regarding using HTTP authorization, I found this under issues at https://github.com/django-haystack/django-haystack/issues/1046
from urlparse import urlparse  # Python 2; on Python 3 use: from urllib.parse import urlparse

parsed = urlparse('https://user:pass@host:port')

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': parsed.hostname,
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': parsed.port,
            'http_auth': (parsed.username, parsed.password),
            'use_ssl': True,
        }
    }
}
I am wondering if there is a way to combine these two, something like the following (which, as expected, gives an error since it's more than just a user name and password):
from requests_aws4auth import AWS4Auth
awsauth = AWS4Auth([ACCESS_KEY], [SECRET_KEY], [REGION], 'es')

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': [AWSHOST],
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': 443,
            'http_auth': awsauth,
            'use_ssl': True,
            'verify_certs': True
        }
    },
}
The error here:
TypeError at /admin/
must be convertible to a buffer, not AWS4Auth
Request Method: GET
Request URL: http://127.0.0.1:8000/admin/
Django Version: 1.7.7
Exception Type: TypeError
Exception Value:
must be convertible to a buffer, not AWS4Auth
Exception Location: /usr/lib/python2.7/base64.py in b64encode, line 53
Any ideas on how to accomplish this?
You are one step from success. Add connection_class to KWARGS and everything should work as expected.
import elasticsearch

# awsauth is the AWS4Auth object built in the question's snippet.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': [AWSHOST],
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': 443,
            'http_auth': awsauth,
            'use_ssl': True,
            'verify_certs': True,
            # Use the requests-based transport so http_auth can be an auth object.
            'connection_class': elasticsearch.RequestsHttpConnection,
        }
    },
}
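A minimal sketch of why the extra key matters, assuming (as the base64.py frame in the traceback suggests) that the default transport treats http_auth as basic-auth credentials and base64-encodes them. FakeSigner is a hypothetical stand-in for AWS4Auth and encode_basic_auth is an illustrative helper, not part of elasticsearch-py; nothing here talks to Elasticsearch.

```python
import base64


class FakeSigner(object):
    """Hypothetical stand-in for AWS4Auth: a callable request signer, not a string."""

    def __call__(self, request):
        return request


def encode_basic_auth(http_auth):
    """Mimic a transport that only understands 'user:password' credentials."""
    if isinstance(http_auth, (tuple, list)):
        http_auth = ":".join(http_auth)
    if isinstance(http_auth, str):
        http_auth = http_auth.encode("utf-8")
    # Anything that is not bytes by now cannot be base64-encoded.
    return base64.b64encode(http_auth).decode("ascii")


# A (user, password) pair encodes without trouble.
print(encode_basic_auth(("user", "pass")))

# A signer object does not, which mirrors the TypeError in the question.
try:
    encode_basic_auth(FakeSigner())
except TypeError as exc:
    print("TypeError:", exc)
```

RequestsHttpConnection instead passes http_auth through to requests, which accepts callable auth objects, so the signer gets a chance to sign each request.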
AWS Identity and Access Management (IAM) allows you to manage users and user permissions for AWS services, to control which AWS resources users of AWS itself can access.
You cannot use IAM credentials to authorize users at the application level via http_auth, as it appears you are trying to do via Haystack here. They are different authentication schemes for different services. They are not compatible.
In your security use case, you have stated the need to 1) restrict access to your application, and 2) to secure the Elasticsearch service port from open access. These two requirements can be met using the following methods:
Restrict access to your application
I also don't want to expose this search to those who don't have a log in
For the front-end search app, you want to use a server level Basic access authentication (HTTP auth) configuration on the web server. This is where you want to control user login access to your app, via a standard http_auth username and password (again, not IAM). This will secure your app at the application level.
Secure the Elasticsearch service port
don't want to rely on security through obscurity or some
IP restriction tactic (unless it would work well with an existing
heroku app, where the Django app is deployed).
IP restriction is exactly what would work here, and it is consistent with AWS security best practices. You want to use security groups and security group rules as a firewall to control traffic for your EC2 instances.
Given a Haystack configuration of:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
you will want to implement an IP restriction at the security group and/or ACL level on that IP and port (here 127.0.0.1:9200), so that only your Django host or other authorized hosts can reach it. This will secure it from any unauthorized access at the service level.
In your implementation, the URL will likely resolve to a public or private IP, depending on your network architecture.
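Conceptually, an inbound security group rule is an allow-list keyed on source address range and port. The sketch below only illustrates that idea in plain Python; the addresses and the is_allowed helper are hypothetical, and real rules are configured in AWS, not in application code.

```python
import ipaddress


def is_allowed(source_ip, port, rules):
    """Return True if any (cidr, port) rule admits this source address and port."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(cidr) and port == rule_port
        for cidr, rule_port in rules
    )


# Hypothetical rule set: only the Django host may reach Elasticsearch on 9200.
RULES = [("10.0.1.15/32", 9200)]

print(is_allowed("10.0.1.15", 9200, RULES))    # the Django host
print(is_allowed("203.0.113.7", 9200, RULES))  # any other host
```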
My Django project runs normally on localhost and also on Heroku, but when I deployed it to Google Cloud Platform I got this error:
could not connect to server: Connection refused
Is the server running locally and accepting connections on Unix domain socket "/cloudsql/<connection_name>/.s.PGSQL.5432"?
The connection to the database in settings.py looks like this:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'database_name',
        'USER': 'postgres',
        'PASSWORD': "password",
        # production
        'HOST': '/cloudsql/connection_name',
        'PORT': '5432',
    }
}
Additionally, my app.yaml looks like
runtime: python37
handlers:
- url: /static
  static_dir: static/
- url: /.*
  script: auto
env_variables:
  DJANGO_SETTINGS_MODULE: fes_app.settings
requirements.txt looks like this (among other entries):
sqlparse==0.4.2
toml==0.10.2
uritemplate==4.1.1
urllib3==1.25.11
whitenoise==5.2.0
twilio==6.9.0
I have tried using the binary version of psycopg, and I also granted the Cloud SQL Client role to the service account.
NOTE: I am using the App Engine standard environment.
Likely need more information to help you out, but did you follow the tutorial below in the official Google Docs? That's usually how I get started and then I make modifications from there.
I would compare how Google is deploying a Django app to your own settings and see what's missing. For example, your requirements.txt file does not look complete (unless you only pasted part of it) so I would start there.
https://cloud.google.com/python/django/appengine
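One pattern worth comparing against, sketched here under assumptions: pick the Cloud SQL unix socket only when running on App Engine, and a TCP host otherwise. GAE_APPLICATION is set by the App Engine runtime; DB_PASSWORD and CLOUD_SQL_CONNECTION_NAME are hypothetical variable names you would define yourself, and the connection name is expected in the project:region:instance form.

```python
import os


def database_settings(environ=os.environ):
    """Build the 'default' DATABASES entry for the current environment."""
    settings = {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "database_name",
        "USER": "postgres",
        # Hypothetical variable name; keep secrets out of settings.py.
        "PASSWORD": environ.get("DB_PASSWORD", ""),
    }
    if environ.get("GAE_APPLICATION"):
        # On App Engine, connect through the Cloud SQL unix socket directory.
        settings["HOST"] = "/cloudsql/" + environ["CLOUD_SQL_CONNECTION_NAME"]
    else:
        # Locally, connect over TCP (for example to a local Postgres or a proxy).
        settings["HOST"] = "127.0.0.1"
        settings["PORT"] = "5432"
    return settings


# In settings.py this would be used as:
# DATABASES = {"default": database_settings()}
```

If the socket path in the error still shows a literal placeholder rather than a real project:region:instance value, that is the first thing to check.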
Screenshot of ES Role Selection console
Trying to put a document to AWS ES cluster. Code:
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3
host = 'search-dev-operations-2-XXXXXXXX.us-east-2.es.amazonaws.com' # For example, my-test-domain.us-east-1.es.amazonaws.com
region = 'us-east-2' # e.g. us-west-1
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
es = Elasticsearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)
document = {
    "title": "Moneyball",
    "director": "Bennett Miller",
    "year": "2011"
}
es.index(index="dev-operations-2", doc_type="_doc", id="5", body=document)
print(es.get(index="dev-operations-2", doc_type="_doc", id="5"))
Getting this error message:
elasticsearch.exceptions.AuthorizationException: AuthorizationException(403, '{"Message":"User: arn:aws:iam::XXXXXX:user/andrey.tantsuyev@XXXtechnology.com is not authorized to perform: es:ESHttpPut with an explicit deny"}')
I set up arn:aws:iam::XXXXXX:user/andrey.tantsuyev@XXXtechnology.com as an IAM master user through fine-grained access control. This is my AWS user.
Could anybody help me, please? I have no idea why I'm not authorized.
Screenshot of ES Cluster details
This is not a problem in Elasticsearch; the request is being blocked by the policies associated with your IAM user.
Go to the IAM service console and look up the permissions for the andrey.tantsuyev@XXXtechnology.com user. It appears that there is a "Deny" statement associated with one of the groups/policies attached to the user that matches the es:ESHttpPut action.
The problem was that andrey.tantsuyev@XXXtechnology.com had MFA restrictions. Once I implemented assumeRole with MFA credentials, everything started working fine.
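The poster did not share the final code, so the following is only a sketch of what assuming a role with MFA and signing with the temporary credentials could look like. The role ARN, MFA device serial, and session name are placeholders; build_assume_role_params and awsauth_with_mfa are hypothetical helper names. The boto3 and requests_aws4auth imports are deferred so the parameter builder can be read and checked on its own.

```python
def build_assume_role_params(role_arn, mfa_serial, token_code,
                             session_name="es-indexing"):
    """Collect the keyword arguments for an STS AssumeRole call with MFA."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,  # placeholder session name
        "SerialNumber": mfa_serial,       # ARN or serial of the MFA device
        "TokenCode": token_code,          # current code from the MFA device
    }


def awsauth_with_mfa(role_arn, mfa_serial, token_code, region, service="es"):
    """Assume the role with MFA and return an AWS4Auth built from the result."""
    # Imported here so the helper above works without these packages installed.
    import boto3
    from requests_aws4auth import AWS4Auth

    sts = boto3.client("sts")
    response = sts.assume_role(
        **build_assume_role_params(role_arn, mfa_serial, token_code)
    )
    creds = response["Credentials"]
    return AWS4Auth(
        creds["AccessKeyId"],
        creds["SecretAccessKey"],
        region,
        service,
        session_token=creds["SessionToken"],
    )
```

The returned object would replace awsauth in the Elasticsearch(...) call from the question; the temporary credentials expire, so a long-running process needs to refresh them.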
I want to use the AWS Elasticsearch service with my Django application, which is running on an EC2 instance.
For that I use these settings:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch5_backend.Elasticsearch5SearchEngine',
        'URL': 'https://vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com:9200/',
        'INDEX_NAME': 'haystack',
        'INCLUDE_SPELLING': True,
    },
}
I am not even able to establish the connection. I get this error:
raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError((, 'Connection to vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com timed out. (connect timeout=10)')) caused by: ConnectTimeoutError((, 'Connection to vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com timed out. (connect timeout=10)'))
I have updated the access policy to allow the user to edit and list, and I have added a TCP rule for port 9200 to the security group. How do I connect EC2 to Elasticsearch inside a VPC?
It works on port 443. Use
'URL': 'https://vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com:443/',
and open port 443 in the security group.
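To tell a blocked port from a wrong one before touching the Haystack settings, a plain TCP probe from the EC2 instance can help. This is a generic sketch using only the standard library; can_connect is a hypothetical helper name.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False


# Example, run from the EC2 instance:
# for port in (443, 9200):
#     print(port, can_connect("vpc-ES-CLUSTER.ap-south-1.es.amazonaws.com", port))
```

A timeout on one port and success on the other points at the port choice or the security group rule rather than at Haystack.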
I'm having some trouble with my AWS Kubernetes instance.
I'm trying to get my Django instances to connect to the RDS service via the DB endpoint.
import os

DATABASES = {
    'default': {
        'ENGINE': 'django.contrib.gis.db.backends.postgis',
        'NAME': os.environ['NAME'],
        'USER': os.environ['USER'],
        'PASSWORD': os.environ['PASSWORD'],
        'HOST': os.environ['HOST'],
        'PORT': os.environ['PORT']
    }
}
The host string resembles service.key.region.rds.amazonaws.com and is passed to the container via env in deploy.yml:
containers:
- name: service
  env:
  - name: HOST
    value: service.key.region.rds.amazonaws.com
This setup works locally in Kubernetes, but not when I run it in the cluster I have on AWS. It returns the following error instead:
django.db.utils.OperationalError: could not translate host name
Any suggestions or am I missing something in how AWS likes handling things?
Assuming your AWS deployment is now in the same VPC as your RDS, then you will need to change your host to use the private IP.
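"could not translate host name" indicates a failed DNS lookup, so it may be worth checking what the hostname resolves to from inside the pod before changing the setting. A small standard-library sketch; resolve is a hypothetical helper name.

```python
import socket


def resolve(hostname, port=5432):
    """Return the sorted IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The same class of failure the database driver reports.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example, run inside the container:
# import os
# print(resolve(os.environ["HOST"]))
```

An empty list from inside the cluster, but not from your workstation, suggests the cluster's DNS configuration rather than Django.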
I'm using Haystack to connect and interact with an installation of elasticsearch. Elasticsearch is installed on a different box to the main webserver.
I have set up HTTP authentication on the elasticsearch box using nginx. This is to stop unauthorised access to elasticsearch.
The Haystack config looks like this:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://USERNAME:PASSWORD@DOMAIN:PORT/',
        'INDEX_NAME': 'haystack',
    },
}
With this setup I get a connection error:
elasticsearch.exceptions.ConnectionError: ConnectionError(('Connection aborted.',
gaierror(-2, 'Name or service not known'))) caused by: ProtocolError(('Connection
aborted.', gaierror(-2, 'Name or service not known')))
If I turn off HTTP authentication and update the URL correspondingly to http://DOMAIN:PORT/ it connects without a problem.
Could it be that Haystack (or elasticsearch-py, http://www.elasticsearch.org/guide/en/elasticsearch/client/python-api/current/) doesn't allow HTTP authentication to be used in the URL? I notice this is a problem with Solr: see "Solr authentication (using Django Haystack)".
For others with the same problem, add kwargs to config:
'KWARGS': {
    'port': parsed.port,
    'http_auth': (parsed.username, parsed.password),
}
See: https://github.com/toastdriven/django-haystack/issues/1046
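Putting the fragment in context, here is a sketch of a helper that splits a credentialed URL into the pieces the workaround passes separately. It uses Python 3's urllib.parse (the Python 2 module is urlparse); haystack_connection is a hypothetical name and the example URL is made up.

```python
from urllib.parse import urlparse


def haystack_connection(url, index_name="haystack"):
    """Turn scheme://user:password@host:port/ into a Haystack connection dict."""
    parsed = urlparse(url)
    connection = {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": parsed.hostname,
        "INDEX_NAME": index_name,
        "KWARGS": {"port": parsed.port},
    }
    if parsed.username:
        # Credentials go in KWARGS instead of staying embedded in the URL.
        connection["KWARGS"]["http_auth"] = (parsed.username, parsed.password)
    if parsed.scheme == "https":
        connection["KWARGS"]["use_ssl"] = True
    return connection


# In settings.py:
# HAYSTACK_CONNECTIONS = {
#     "default": haystack_connection("http://user:secret@search.example.com:9200/"),
# }
```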