I have set up an Elasticsearch server using the AWS Elasticsearch Service (not EC2). It gave me an endpoint https://xxx-xxxxxxxx.us-west-2.es.amazonaws.com/ and if I open this endpoint in a browser (note that there is no port specified) I get the expected
{
  "status": 200,
  "name": "Mastermind",
  "cluster_name": "xxxx",
  "version": {
    "number": "1.5.2",
    "build_hash": "yyyyyy",
    "build_timestamp": "2015-04-27T09:21:06Z",
    "build_snapshot": false,
    "lucene_version": "4.10.4"
  },
  "tagline": "You Know, for Search"
}
The question is: how do I connect to this through the Elasticsearch Java client without a port number? The sample code I have is
Client client = TransportClient.builder().build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300));
If I use this code and just replace "host1" with my endpoint, I get a "NoNodeAvailableException".
PS: The Java client version I'm using is 2.0.0.
Edit
I finally decided to go with Jest, a third-party REST client. But what Brooks answered below is also very helpful - AWS does use port 80 for HTTP and 443 for HTTPS. The blocker for me was the firewall, I guess.
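For reference, a minimal Jest setup looks roughly like this (a sketch; the endpoint is the AWS one from the question, while the index name and query are placeholders):

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

// Build a Jest client that talks plain HTTPS to the AWS endpoint (no port needed).
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(
        new HttpClientConfig.Builder("https://xxx-xxxxxxxx.us-west-2.es.amazonaws.com")
                .multiThreaded(true)
                .build());
JestClient client = factory.getObject();

// Run a simple match_all query against a placeholder index.
String query = "{ \"query\": { \"match_all\": {} } }";
SearchResult result = client.execute(new Search.Builder(query)
        .addIndex("myIndex")
        .build());

Jest simply speaks the Elasticsearch REST API over HTTP(S), which is why it works against the AWS endpoint where the TCP transport client does not.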
Edit 2
The AWS ES service documentation explicitly says:
The service supports HTTP on port 80, but does not support TCP transport.
Believe it or not, AWS doesn't expose Elasticsearch on ports 9200 and 9300. It's served over plain old port 80.
So, to demonstrate, try this...
curl -XPOST "http://xxx-xxxxxxxx.us-west-2.es.amazonaws.com:80/myIndex/myType" -d '{"name":"Edmond"}'
Or
curl -XPOST "https://xxx-xxxxxxxx.us-west-2.es.amazonaws.com:443/myIndex/myType" -d '{"name":"Edmond"}'
It should respond with:
{"_index":"myIndex","_type":"myType","_id":"SOME_ID_#","_version":1,"created":true}
Check in Kibana and you'll see it's there.
So, then in your code, it should be:
Client client = TransportClient.builder().build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("xxx-xxxxxxxx.us-west-2.es.amazonaws.com"), 80));
Unfortunately, I don't know off-hand how to transmit encrypted via SSL/HTTPS using the transport client. You could try making regular REST calls instead, using Jersey.
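For example, a minimal JAX-RS (Jersey) client call over HTTPS might look like this (a sketch; the index/type names are the placeholders from the curl examples above):

import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.Entity;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// HTTPS POST to the AWS endpoint on port 443, equivalent to the curl call above.
Response response = ClientBuilder.newClient()
        .target("https://xxx-xxxxxxxx.us-west-2.es.amazonaws.com")
        .path("myIndex/myType")
        .request(MediaType.APPLICATION_JSON)
        .post(Entity.json("{\"name\":\"Edmond\"}"));
System.out.println(response.readEntity(String.class));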
Finally, make sure your Elasticsearch access policy is configured properly. Something along the lines of:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:yyyyyyy:domain/myDomain/*"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:yyyyyyyyy:domain/myDomain"
    }
  ]
}
NOTE: The above access policy is completely wide open and is not recommended for anything remotely close to production. Just so you know....
The managed Elasticsearch service in AWS still does not provide a port for the transport protocol.
This question has been answered here:
Elastic Transport client on AWS Managed ElasticSearch
There is also a discussion in the AWS forum regarding the transport protocol. Here is the link:
What is the port for the transport protocol?
After a lot of searching I found an example that used a GET request, so I made minor changes to it to allow POST requests, so that complex queries can be submitted via the POST body. The implementation is available at https://github.com/dy10/aws-elasticsearch-query-java
Apart from properly configuring access to your AWS ES (i.e. don't open it to the public), make sure to use HTTPS (the above code uses HTTP; just replace http with https in the code and it will work).
Another useful-looking but partial implementation is at https://github.com/aws/aws-sdk-java/issues/861
I'm really new with AWS and IoT, and my goal is to:
Use the Java SDK v.2 from my serverless application to create/get/update/attach/... certificates and things.
Create a client-side MQTT demo application that connects, publishes, and subscribes to messages using the new certificates and thing created in phase 1.
Publish/subscribe to messages on the server side in order to talk to my things/clients.
1 & 2 I've managed to do perfectly.
But I don't understand how should I do the 3rd one.
Should I use the IoT device SDK as well in the server side ? If so with what credentials do I connect ?
Is there some objects in the SDK that I've missed?
In order to connect to IoT Core from the server I first configure my SSO connection using the AWSCLI and in the code I simply use my profile name and region to connect.
Your serverless Java application needs to be configured as a "Thing" in the same account/region as your IoT devices. In the console, go to
AWS IoT -> Manage -> Things
and create a thing for your app. In this case you shouldn't need a "Device Shadow", and you can select "Auto Generate Certificates".
For the IoT Policy, you will need the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iot:Connect",
      "Resource": "arn:aws:iot:us-east-1:YOUR_AWS_ACCOUNT_ID:client/*"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Subscribe",
      "Resource": "arn:aws:iot:us-east-1:YOUR_AWS_ACCOUNT_ID:topicfilter/*"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Receive",
      "Resource": "arn:aws:iot:us-east-1:YOUR_AWS_ACCOUNT_ID:topic/*"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Publish",
      "Resource": "arn:aws:iot:us-east-1:YOUR_AWS_ACCOUNT_ID:topic/*"
    }
  ]
}
Your application will communicate with IoT Core using the endpoint shown on the Settings screen in the IoT Core console for the region where you created your thing. Your application will authenticate using the key/cert you downloaded when creating the thing (username/password auth is not allowed).
Once your application connects to the endpoint, you will want to "subscribe" to the same topic your devices use to send messages. You can also publish to one or more topics.
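As a rough sketch of what that looks like with the AWSIotMqttClient class linked below (the endpoint, client ID, and topic names are placeholder assumptions, the key store loading is left out, and exception handling is omitted):

import java.security.KeyStore;

import com.amazonaws.services.iot.client.AWSIotMessage;
import com.amazonaws.services.iot.client.AWSIotMqttClient;
import com.amazonaws.services.iot.client.AWSIotQos;
import com.amazonaws.services.iot.client.AWSIotTopic;

// Endpoint from the IoT Core Settings screen; the key store holds the
// certificate/private key downloaded when the thing was created.
KeyStore keyStore = /* load the downloaded cert + private key */ null;
String keyPassword = "...";
AWSIotMqttClient client = new AWSIotMqttClient(
        "xxxxxxxx-ats.iot.us-east-1.amazonaws.com", "myServerApp", keyStore, keyPassword);
client.connect();

// Subscribe to the topic your devices publish on (hypothetical topic name).
client.subscribe(new AWSIotTopic("devices/+/telemetry", AWSIotQos.QOS0) {
    @Override
    public void onMessage(AWSIotMessage message) {
        System.out.println("Received: " + message.getStringPayload());
    }
});

// Publish a command back to a specific device.
client.publish("devices/device1/commands", "{\"led\":\"on\"}");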
In order to debug communications, you can use the MQTT client in the AWS IoT Core console; just note the console needs to be refreshed periodically when communication times out. I recommend marking your topics as favorites so they are easy to re-subscribe to after a refresh.
As for coding in Java, you should be able to leverage examples from the AWS IoT Device SDK here:
https://github.com/aws/aws-iot-device-sdk-java-v2/tree/main/samples
Here's a link to the MQTT client class:
http://aws-iot-device-sdk-java-docs.s3-website-us-east-1.amazonaws.com/com/amazonaws/services/iot/client/AWSIotMqttClient.html
Please note that your app will not have access to messages sent while it was not connected. There are a few strategies to deal with message persistence, but that's outside the scope of your question, so I won't cover it here.
Hopefully this gets you pointed in the right direction.
I have a POST endpoint in API Gateway with an HTTP request integration. It's not a proxy endpoint. The URL it points at is the search URL for my AWS Elasticsearch domain. I'm taking the body of the POST request and mapping it to a template to query the ES domain with.
Right now I'm getting this response back from ES: {"Message":"User: anonymous is not authorized to perform: es:ESHttpPost"}.
I don't have fine-grained access control enabled on the ES domain, and it's not in a VPC. My JSON access policy looks like this:
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "*"
},
"Action": "es:ESHttp*",
"Resource": "arn:aws:es:xx-xxxx-x:999999999999:domain/XXXXX/*",
"Condition": {
"IpAddress": {
"aws:SourceIp": [
"X.X.X.X/X",
"X.X.X.X/X"
]
}
}
}
]
}
With this policy, I'm able to query the domain directly and navigate to the Kibana dashboard while on the IPs in my allow list. However, I'm assuming the query request coming from my gateway endpoint has a different IP address, as that's where I get the 403 response mentioned above.
I know I can find the origin IP address in the request's x-forwarded-for header, but I don't think I'm able to add that as a condition to the ES access policy. I've tried a few other policy conditions as well, such as a StringLike on the principal ARN and matching the method request ARN, but these don't work.
Ideally, I would like to be able to just override the source IP with the actual origin rather than the gateway's IP address, but I don't know if this is possible. I also know I could just use a Lambda function here, but I would like to avoid needing a function, to avoid having code / another service to maintain.
Basically, I'm hoping to still allow anonymous user access when requests are coming from certain IPs, while creating and abstracting certain frequently used queries behind a few gateway endpoints. Is there a way I can add the API endpoint to the allow list, or do I have to enable fine-grained access control?
I have an ES domain and I want to access Kibana locally from my browser. Reading the documentation, it said that I could use Amazon Cognito to do this with authentication for the users. I set the whole thing up as per the following AWS documentation Link
The problem is, whenever I try to access Kibana from the browser using the link, which looks like this:
https://vpc-something1-something2.us-east-1.es.amazonaws.com/_plugin/kibana/
the request times out. I'll post my access policy for the ES cluster here:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::Account_ID:role/Cognito_Something_Auth_Role"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:Account_ID:domain/domain_name/*"
    }
  ]
}
I followed the procedure in the above link exactly and created a user group, identity group, etc. But the link does not seem to load. Any help would be much appreciated.
PS: I'm new to AWS.
All this is assuming that I can directly access Kibana through my browser if I have Cognito set up correctly.
If your ES cluster is created in a VPC, then you need network access to it. I would recommend creating a cluster with 'Public access' instead, which is still subject to your access policy.
If you want a VPC cluster, and you want to access it (either ES directly, or Kibana) from outside that VPC, then you will need to VPN into the VPC, or do some routing that enables it to be exposed. The latter might be a bit tricky when the instances running your cluster aren't directly available to you, but you should be able to do it with some combination of Internet gateways, NAT gateways, security groups, routing tables, etc.
This might help: Connecting to a VPC
I have an API Gateway API that I want to limit access to. I have a subdomain set up in AWS Route 53 that points to a CloudFront distribution where my app lives. This app makes a POST request to the API.
I have looked into adding a resource policy for my API based on the 'AWS API Whitelist' example, but I can't seem to get the syntax correct; I constantly get errors.
I also tried creating an IAM user and locking down the API with AWS_IAM auth, but then I need to create a signed request, which seems like a lot of work for something that should be much easier via resource policies.
This is an example of the resource policy I tried to attach to my API:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {{CloudFrontID}}"
      },
      "Action": "execute-api:Invoke",
      "Resource": [
        "execute-api:/*/*/*"
      ]
    }
  ]
}
This returns the following error:
Invalid policy document. Please check the policy syntax and ensure that Principals are valid.
"AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {{CloudFrontID}}"
The problem with this concept is that this is a public HTTP request. Unless it's a signed request, AWS will not know about any IAM or ARN resources; it just sees a standard HTTP request. If you make the request with a curl -v command, you will see the request parameters look something like this:
GET /test/mappedcokerheaders HTTP/2
Host: APIID.execute-api.REGION.amazonaws.com
User-Agent: curl/7.61.1
Accept: */*
It's possible you could filter on the User-Agent, as I do see that condition defined here.
I would check all of the values coming in the request from CloudFront vs. the request from your curl directly to the API, by trapping the API Gateway request ID in the response headers and looking for those in your API Gateway access logs. You'll have to enable access logs first, and define what parameters you want logged, which you can see how to do here.
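For instance, an access-log format along these lines (the left-hand field names are arbitrary; the $context variables are standard API Gateway logging variables) captures enough to correlate the two requests:

{
  "requestId": "$context.requestId",
  "sourceIp": "$context.identity.sourceIp",
  "userAgent": "$context.identity.userAgent",
  "status": "$context.status"
}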
The problem is that an OAI cannot be used with a custom origin. If you are not forwarding User-Agent to the API Gateway custom origin, then the simplest approach is to add a resource policy in API Gateway which only allows aws:UserAgent: "Amazon CloudFront".
Be careful: User-Agent can very easily be spoofed. This approach is designed only to prevent "normal access", like a random bot on the web trying to scrape your site.
The User-Agent header is guaranteed to be Amazon CloudFront. See the quote from Request and Response Behavior for Custom Origins.
Behavior If You Don't Configure CloudFront to Cache Based on Header Values: CloudFront replaces the value of this header field with Amazon CloudFront. If you want CloudFront to cache your content based on the device the user is using, see Configuring Caching Based on the Device Type.
Here is what the full resource policy looks like:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:us-west-2:123456789012:abcdefghij/*/*/*",
      "Condition": {
        "StringEquals": {
          "aws:UserAgent": "Amazon CloudFront"
        }
      }
    }
  ]
}
Here is how to configure it in serverless.yml:
provider:
  resourcePolicy:
    - Effect: Allow
      Principal: "*"
      Action: execute-api:Invoke
      Resource:
        - execute-api:/*/*/*
      Condition:
        StringEquals:
          aws:UserAgent: "Amazon CloudFront"
I have a subdomain setup in AWS Route 53 that points to a CloudFront distribution where my app lives. This app makes a POST request to the API.
What I understand is that you have a public service that can be called from the web browser (https://your-service.com).
You want the service to respond only when the client's browser is at https://your-site.com. The service will not respond when the browser is, for example, on https://another-site.com.
If that is the case, you will need to read more about CORS.
This will not prevent a random guy / web client from calling your service directly at https://your-service.com, however. To protect the service from that, you need a proper authentication system.
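As a minimal illustration of what CORS looks like on the wire (generic HTTP, not specific to this setup):

Request header, added automatically by the browser:
Origin: https://your-site.com

Response header the service must return, or the browser blocks the page from reading the result:
Access-Control-Allow-Origin: https://your-site.com

Note that CORS is enforced by the browser, not the server, which is exactly why it cannot stop direct calls from curl or other clients.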
I want to launch a CodeBuild project to run my integration tests. My application uses the AWS Elasticsearch Service as Hibernate Search index storage.
I have added a policy to my ES domain which allows private EC2 instances to access ES through a NAT gateway. Unfortunately, I can't figure out the correct policy to allow CodeBuild to access ES. When I run the CodeBuild project, I get a 403 error when Hibernate tries to check whether an index exists.
Caused by: org.hibernate.search.exception.SearchException: HSEARCH400007: Elasticsearch request failed.
Request:
Operation: IndicesExists
URI:com.mycompany.myproject.model.tenant
Data:
null
Response:
=========
Status: 403
Error message: 403 Forbidden
Cluster name: null
Cluster status: null
When I configure the ES access policy to allow open access to the domain ("AWS": "*"), the tests run OK.
This is the ES access policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::AWS_ACCOUNT_ID:role/CodeBuildRole-XXXXXXXX"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:eu-west-1:AWS_ACCOUNT_ID:domain/elastic-search-domain/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:eu-west-1:AWS_ACCOUNT_ID:domain/elastic-search-domain/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "NAT_GW_IP"
        }
      }
    }
  ]
}
As the principal, I've also tried the following:
"arn:aws:sts::AWS_ACCOUNT_ID:assumed-role/CodeBuildRole-XXXXXXXXX/*"
"arn:aws:iam::AWS_ACCOUNT_ID:role/CodeBuildRole-XXXXXXXXX"
"arn:aws:iam::AWS_ACCOUNT_ID:root"
"arn:aws:iam::AWS_ACCOUNT_ID:user/MI_USER_ADMIN"
Any help will be much appreciated.
Thanks
I would like to extend VME's answer to be more precise.
To access Elasticsearch using a role, the request must indeed be signed.
That solution is generally correct, but in my particular case it is not suitable, since the requests to AWS ES are generated by Hibernate Search Elasticsearch. (Might we find another solution using AOP?)
I finally figured out a workaround for this problem. In the CodeBuild buildspec I added the following steps:
Configure the AWS CLI using a user with a policy that allows it to read and update the ES domain.
Read and store the current ES domain access policy.
Get the CodeBuild EC2 instance's IP.
Update the ES domain access policy to allow access from the CodeBuild IP.
Wait until the change is applied (approx. 15 minutes).
Run the tests.
Restore the previous configuration.
I don't like this solution very much because the domain policy update takes too long. This step is part of a CodePipeline for continuous integration, and executions should not take more than 15 or 20 minutes.
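For reference, the read/update steps could also be scripted with the AWS SDK for Java instead of the CLI; a rough sketch (assuming the v1 aws-java-sdk-elasticsearch module, with a placeholder domain name and policy string):

import com.amazonaws.services.elasticsearch.AWSElasticsearch;
import com.amazonaws.services.elasticsearch.AWSElasticsearchClientBuilder;
import com.amazonaws.services.elasticsearch.model.DescribeElasticsearchDomainConfigRequest;
import com.amazonaws.services.elasticsearch.model.UpdateElasticsearchDomainConfigRequest;

AWSElasticsearch es = AWSElasticsearchClientBuilder.defaultClient();

// Read and store the current access policy so it can be restored later.
String currentPolicy = es.describeElasticsearchDomainConfig(
        new DescribeElasticsearchDomainConfigRequest().withDomainName("elastic-search-domain"))
        .getDomainConfig().getAccessPolicies().getOptions();

// Push an updated policy that also allows the CodeBuild IP (built elsewhere).
String updatedPolicy = "...";
es.updateElasticsearchDomainConfig(new UpdateElasticsearchDomainConfigRequest()
        .withDomainName("elastic-search-domain")
        .withAccessPolicies(updatedPolicy));

The update still takes the same ~15 minutes to apply, so this doesn't remove the main drawback.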
Any ideas on how to improve this?
Possibly you need to sign your ES requests.
I am not familiar with CodeBuild, but generally the rule is: when using IAM roles to access Elasticsearch, your requests need to be signed with that IAM role.
E.g. for Python you would use a tool like this: https://github.com/DavidMuller/aws-requests-auth
More info: http://docs.aws.amazon.com/general/latest/gr/signing_aws_api_requests.html
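In Java, a rough equivalent (a sketch assuming the v1 AWS SDK's AWS4Signer; the endpoint, region, and resource path are placeholders) is to SigV4-sign the request before sending it:

import java.net.URI;

import com.amazonaws.DefaultRequest;
import com.amazonaws.auth.AWS4Signer;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.http.HttpMethodName;

// Describe the call to the ES domain.
DefaultRequest<Void> request = new DefaultRequest<>("es"); // "es" is the signing service name
request.setEndpoint(URI.create("https://search-mydomain.eu-west-1.es.amazonaws.com"));
request.setHttpMethod(HttpMethodName.GET);
request.setResourcePath("/_cluster/health");

// Sign with the IAM role credentials (e.g. the ones CodeBuild provides).
AWS4Signer signer = new AWS4Signer();
signer.setServiceName("es");
signer.setRegionName("eu-west-1");
signer.sign(request, DefaultAWSCredentialsProviderChain.getInstance().getCredentials());

// request.getHeaders() now carries the Authorization and X-Amz-Date headers
// that must be copied onto the actual HTTP call.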