Running OrientDB in distributed mode on AWS does not work

I have 3 OrientDB (2.2.7) nodes setup on AWS. They are running in distributed mode.
Whenever I connect to the server on port 2424, the connection locks up in pyorient.
I'm aware of some issues with running OrientDB in distributed mode, as per this question:
Creating a database in Orientdb in distributed mode
In order to avoid any issues, I'm running permanent instances as suggested by the documentation.
I also configured the EC2 instances as "c3.4xlarge" instances, as suggested by the Hazelcast EC2 whitepaper (Amazon_EC2_Deployment_Guide_v0.3_web.pdf).
I had my hazelcast.xml configured to use the tcp-ip and aws discovery strategies, and both delivered the same results. The servers can be seen connecting to one another via Hazelcast, so the discovery is working fine.
I have the following policy attached to my user:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stm7747196888759",
            "Action": [
                "ec2:DescribeInstances"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
Each node has its hazelcast.xml configured like so:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.7.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <group>
        <name>orientdb</name>
        <password>xxxxxxxxx</password>
    </group>
    <properties>
        <property name="hazelcast.local.localAddress">{{LOCAL_IP}}</property>
        <property name="hazelcast.icmp.enabled">true</property>
    </properties>
    <network>
        <public-address>{{PUBLIC_IP}}</public-address>
        <port auto-increment="true">2434</port>
        <join>
            <multicast enabled="false">
                <multicast-group>235.1.1.1</multicast-group>
                <multicast-port>2434</multicast-port>
            </multicast>
            <tcp-ip enabled="true">
                <member>57.xx.xx.165</member>
                <member>57.xx.xx.236</member>
                <member>57.xx.xx.133</member>
            </tcp-ip>
            <aws enabled="false">
                <access-key>xxxx</access-key>
                <secret-key>xxxx</secret-key>
                <host-header>ec2.amazonaws.com</host-header>
                <region>eu-west-1</region>
            </aws>
        </join>
        <interfaces enabled="false">
            <interface>{{LOCAL_IP}}</interface>
        </interfaces>
    </network>
    <executor-service>
        <pool-size>16</pool-size>
    </executor-service>
</hazelcast>
As can be seen from my hazelcast.xml, I also tried upgrading Hazelcast to version 3.7. It doesn't matter which version of Hazelcast I use; the results are the same.
As soon as I connect to the server, the connection locks up. The server still works fine over port 2480: you can still use the front-end in the browser, but you can't open a connection via pyorient.
We have a large DB and collect around 2.5 million vertices and about 5 million edges each month. It's vital for us to run in distributed mode because a single server won't be able to scale beyond that capacity. As things stand, it seems like OrientDB has the capability to run as a distributed database, but that functionality doesn't seem to work.
We were running the Docker images but switched to the binaries in order to upgrade to Hazelcast 3.7.
Has anyone been able to get OrientDB running distributed in production, and what are we missing?

This does not seem to be an issue with Hazelcast or AWS.
There were two issues with my setup.
The first issue has to do with OrientDB not refreshing or replacing my distributed-config.json with settings from
default-distributed-db-config.json. The result was that every node that had ever connected to my DB was appended to that file, and none of my default-distributed-db-config.json settings were reflected in that config.
I added a start-up script to delete that distributed-config.json every time my server starts, in order to refresh the list of nodes and pick up my settings.
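For reference, a minimal sketch of that start-up step (the database path is an assumption for a default OrientDB layout; adjust it to your install):

# Hypothetical start-up helper: delete the cached distributed config so OrientDB
# rebuilds it from default-distributed-db-config.json on the next start.
# The path below is an assumption for a default install; adjust as needed.
import os

DISTRIBUTED_CONFIG = "/opt/orientdb/databases/mydb/distributed-config.json"

if os.path.exists(DISTRIBUTED_CONFIG):
    os.remove(DISTRIBUTED_CONFIG)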
The second issue has to do with pyorient: it has a bug where it can't parse the messages returned by OrientDB in distributed mode, which causes the connection to go into an infinite loop.
There is currently a development branch on pyorient that implements the missing binary serialiser (OrientSerialization.Binary). I have another branch that has some fixes merged into it.
Install it with:
pip install https://github.com/anber500/pyorient/tarball/17f5e42e83859a661c6483f7fa812226194694dd#egg=pyorient
Set your serialiser as follows:
client = pyorient.OrientDB("localhost", 2424, serialization_type=pyorient.OrientSerialization.Binary)
You will also need an updated version of pyorient_native. The first release had a memory leak, so use the version from the master branch:
pip install https://github.com/nikulukani/pyorient_native/tarball/master#egg=pyorient_native
This works perfectly on AWS in distributed mode and is much faster than the CSV serializer.
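For completeness, a minimal connection sketch using that branch (host, credentials and database name are placeholders):

import pyorient

# The binary serialiser is what makes the distributed responses parseable.
client = pyorient.OrientDB("localhost", 2424,
                           serialization_type=pyorient.OrientSerialization.Binary)
client.connect("root", "root_password")   # placeholder credentials
client.db_open("mydb", "admin", "admin")  # placeholder database and user
for record in client.query("SELECT FROM V LIMIT 5"):
    print(record)
client.db_close()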
Hope it helps.

You are using an EC2 public IP address and not the EC2 private IP address. Public IP addresses often start with 57 or 54; private IP addresses often start with 10.
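If you are not sure which address is which, the EC2 API that the ec2:DescribeInstances policy above already allows will show both; a quick sketch with boto3 (region and filter values are placeholders):

# List private vs. public IPs of running instances, so the <member> entries
# in hazelcast.xml can point at the private addresses.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
resp = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)
for reservation in resp["Reservations"]:
    for inst in reservation["Instances"]:
        print(inst["InstanceId"],
              "private:", inst.get("PrivateIpAddress"),
              "public:", inst.get("PublicIpAddress"))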

Related

"Host header is specified and is not an IP address or localhost" message when using chromedp headless-shell

I'm trying to deploy chromedp/headless-shell to Cloud Run.
Here is my Dockerfile:
FROM chromedp/headless-shell
ENTRYPOINT [ "/headless-shell/headless-shell", "--remote-debugging-address=0.0.0.0", "--remote-debugging-port=9222", "--disable-gpu", "--headless", "--no-sandbox" ]
The command I used to deploy to Cloud Run is
gcloud run deploy chromedp-headless-shell --source . --port 9222
Problem
When I go to the path /json/list, I expect to see something like this:
[{
    "description": "",
    "devtoolsFrontendUrl": "/devtools/inspector.html?ws=localhost:9222/devtools/page/B06F36A73E5F33A515E87C6AE4E2284E",
    "id": "B06F36A73E5F33A515E87C6AE4E2284E",
    "title": "about:blank",
    "type": "page",
    "url": "about:blank",
    "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/B06F36A73E5F33A515E87C6AE4E2284E"
}]
but instead, I get this error:
Host header is specified and is not an IP address or localhost.
Is there something wrong with my configuration or is Cloud Run not the ideal choice for deploying this?
This specific issue is not unique to Cloud Run. It originates from a change in the Chrome DevTools Protocol which generates this error when it is accessed remotely. It could be attributed to security measures against some types of attacks. You can see the related Chromium pull request here.
I deployed a chromedp/headless-shell container to Cloud Run using your configuration and also received the same error. Now, there is a useful comment in a GitHub issue showing a workaround for this problem: passing a Host: localhost header. While this does work when I tested it locally, it does not work on Cloud Run (it returns a 404 error). This 404 error could be due to how Cloud Run also uses the Host header to route requests to the correct service.
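For reference, the workaround looks roughly like this against a directly reachable headless-shell instance (a sketch; the address is a placeholder, and as noted it still fails behind Cloud Run's Host-based routing):

# Query the DevTools endpoint while overriding the Host header.
import requests

resp = requests.get(
    "http://127.0.0.1:9222/json/list",  # placeholder address
    headers={"Host": "localhost"},      # satisfies the DevTools host check
)
print(resp.json())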
Unfortunately this answer is not a solution, but it sheds some light on what you are seeing and why. I would go for a different GCP service, such as GCE, which gives you plain virtual machines and is less managed.

AWS Airflow v2.0.2 doesn't show Google Cloud connection type

I want to load data from Google Storage to S3.
To do this I want to use GoogleCloudStorageToS3Operator, which requires gcp_conn_id.
So, I need to set up the Google Cloud connection type.
To do this, I added
apache-airflow[google]==2.0.2
to requirements.txt
but the Google Cloud connection type is still not in the dropdown list of connections in MWAA.
The same approach works well with the MWAA local runner:
https://github.com/aws/aws-mwaa-local-runner
I guess it does not work in MWAA because of the security reasons discussed here:
https://lists.apache.org/thread.html/r67dca5845c48cec4c0b3c34c3584f7c759a0b010172b94d75b3188a3%40%3Cdev.airflow.apache.org%3E
But still, is there any workaround to add the Google Cloud connection type in MWAA?
Connections can be created and managed using either the UI or environment variables.
To my understanding, the limitation MWAA has on installing some provider packages applies only to the web-server machine, which is why the connections are not listed in the UI. This doesn't mean you can't create the connection at all; it just means you can't do it from the UI.
You can define it from CLI:
airflow connections add [-h] [--conn-description CONN_DESCRIPTION]
                        [--conn-extra CONN_EXTRA] [--conn-host CONN_HOST]
                        [--conn-login CONN_LOGIN]
                        [--conn-password CONN_PASSWORD]
                        [--conn-port CONN_PORT] [--conn-schema CONN_SCHEMA]
                        [--conn-type CONN_TYPE] [--conn-uri CONN_URI]
                        conn_id
You can also generate a connection URI to make it easier to set.
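For example, a small sketch of generating such a URI with Airflow's Connection model (the extra values are placeholders in the same style as the example below):

# Build a connection URI that can be exported as AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT.
import json
from airflow.models.connection import Connection

conn = Connection(
    conn_id="google_cloud_default",
    conn_type="google_cloud_platform",
    extra=json.dumps({
        "extra__google_cloud_platform__key_path": "/keys/key.json",
        "extra__google_cloud_platform__project": "airflow",
        "extra__google_cloud_platform__num_retries": 5,
    }),
)
print(conn.get_uri())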
Connections can also be set as environment variables. Example:
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://?extra__google_cloud_platform__key_path=%2Fkeys%2Fkey.json&extra__google_cloud_platform__scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform&extra__google_cloud_platform__project=airflow&extra__google_cloud_platform__num_retries=5'
If needed you can check the google provider package docs to review the configuration options of the connection.
For MWAA there are two options to set the connection:
1. Setting an environment variable, using the pattern AIRFLOW_CONN_YOUR_CONNECTION_NAME, where e.g. YOUR_CONNECTION_NAME = GOOGLE_CLOUD_DEFAULT. That can be done using a custom plugin (see the sketch after this list):
https://docs.aws.amazon.com/mwaa/latest/userguide/samples-env-variables.html
2. Using Secrets Manager:
https://docs.aws.amazon.com/mwaa/latest/userguide/connections-secrets-manager.html
Tested for the Google Cloud connection; both are working.
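A minimal sketch of the env-var plugin approach, following the pattern from the AWS sample above (the connection URI and key path are placeholders):

# plugins/env_var_plugin.py -- custom MWAA plugin that injects the connection
# as an environment variable; the URI below is a placeholder.
import os
from airflow.plugins_manager import AirflowPlugin

os.environ["AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT"] = (
    "google-cloud-platform://?extra__google_cloud_platform__key_path=%2Fkeys%2Fkey.json"
    "&extra__google_cloud_platform__project=airflow"
    "&extra__google_cloud_platform__num_retries=5"
)

class EnvVarPlugin(AirflowPlugin):
    name = "env_var_plugin"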
I asked AWS support about this issue. Looks like they are working on it.
They told me a way to configure the Google Cloud Platform connection by passing a JSON object in the Extra field with the Conn Type set to HTTP, and it works.
I have validated editing google_cloud_default (Airflow > Admin > Connections)
Conn Type: HTTP
Extra:
{
    "extra__google_cloud_platform__project":"<YOUR_VALUE>",
    "extra__google_cloud_platform__key_path":"",
    "extra__google_cloud_platform__keyfile_dict":"{"type": "service_account","project_id": "<YOUR_VALUE>","private_key_id": "<YOUR_VALUE>", "private_key": "-----BEGIN PRIVATE KEY-----\n<YOUR_VALUE>\n-----END PRIVATE KEY-----\n", "client_email": "<YOUR_VALUE>", "client_id": "<YOUR_VALUE>", "auth_uri": "https://<YOUR_VALUE>", "token_uri": "https://<YOUR_VALUE>", "auth_provider_x509_cert_url": "https://<YOUR_VALUE>", "client_x509_cert_url": "https://<YOUR_VALUE>"}",
    "extra__google_cloud_platform__scope":"",
    "extra__google_cloud_platform__num_retries":"5"
}
!! You must escape the " and \n in extra__google_cloud_platform__keyfile_dict !!
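One way to get that escaping right is to let json.dumps do it (a sketch; the key-file path is a placeholder):

# Produce a correctly escaped value for the Extra field from a service-account key file.
import json

with open("/path/to/service_account_key.json") as f:  # placeholder path
    keyfile = json.load(f)

extra = {
    "extra__google_cloud_platform__project": keyfile["project_id"],
    "extra__google_cloud_platform__key_path": "",
    # json.dumps escapes the inner quotes and newlines of the key file.
    "extra__google_cloud_platform__keyfile_dict": json.dumps(keyfile),
    "extra__google_cloud_platform__scope": "",
    "extra__google_cloud_platform__num_retries": "5",
}
print(json.dumps(extra))  # paste this into the connection's Extra field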
In requirements.txt I used:
apache-airflow[gcp]==2.0.2
(I believe apache-airflow[google]==2.0.2 should work as well)

SSL connection from AWS lambda to AWS Redshift

I am trying to connect to an AWS Redshift database from a Lambda function using C#, .NET Core 2.0, and Npgsql. I am having difficulty with SSL.
I have created two non-publicly-accessible Redshift databases in a dedicated VPC. The lambda executes in the same VPC. The two databases are identical in every way except that one has the "force SSL" parameter set to true.
Using the following code snippet, I can access the non-SSL database just fine:
using (var conn = new NpgsqlConnection("Host=x; Port=5439; Username=x; Password=x; Database=xxx"))
{
    Console.WriteLine("Redshift pre-Open!");
    conn.Open();
    Console.WriteLine("Redshift: post-Open!");
    ...
}
When I access the SSL database, I get the "missing hba.conf" error message - seems standard, I've seen it before ...
When I append to the connection string: "ssl Mode=Require;Server Compatibility Mode=Redshift;Trust Server Certificate=true"
the conn.Open() call hangs, and the second write statement never shows up in CloudWatch.
And yet ... this connection string works when accessing the same database through a REST API and a C#/.NET Core 2 Web API (same runtime environment), with an EC2 instance and load balancer.
A Python lambda connecting to the SSL database, in the same environment - subnets, security groups, lambda triggers, lambda parameters, ... is working just fine.
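For comparison, the kind of Python connection that works looks roughly like this (a sketch assuming psycopg2 is packaged with the Lambda; endpoint and credentials are placeholders):

# Minimal SSL connection to Redshift from Python.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="xxx",
    user="x",
    password="x",
    sslmode="require",
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
conn.close()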
The csproj references Amazon.Lambda.Core 1.0.0, Amazon.Lambda.Serialization.Json 1.1.0, and
Npgsql.EntityFrameworkCore.PostgreSQL 2.0.1.
I'd try Wireshark, maybe, in another environment - but running as a Lambda, I'm not sure how best to debug. I've tried many permutations and combinations, and I wouldn't put it past myself to be missing something blindingly obvious,
but I absolutely do not see why it hangs. Thank you.

How to configure and enable Azure Service Fabric Reverse Proxy for an existing on-premises cluster?

Is the Azure Service Fabric Reverse Proxy available in an on-premises cluster? If so, how can I enable it for an existing cluster?
The Service Fabric Reverse Proxy is described here. It allows clients external to the cluster to access application services by name with a special URL, without needing to know the exact host:port on which an instance of the service is running (which may change as services are automatically moved around).
By default the Service Fabric Reverse Proxy does not appear to be enabled for my on-prem cluster with two instances of a stateless service. I tried using the documented port 19008 but could not reach the service using the recommended URI syntax.
To wit, this works:
http://fqdn:20001/api/odata/v1/$metadata
but this does not:
http://fqdn:19008/MyApp/MyService/api/odata/v1/$metadata
In the NodeTypes section of the ClusterConfig JSON used to set up my on-prem cluster, there is a property "httpGatewayEndpointPort": "19080", but that port does not appear to work as a reverse proxy (it is the Service Fabric Explorer web-app endpoint). I am guessing that the needed configuration is specified somehow in the cluster config JSON. There are instructions in the referenced article that explain how to configure the reverse proxy in the cloud, but not on-premises.
What I am looking for are instructions on how to set up the Service Fabric reverse proxy in an on-premises multi-machine cluster or dev cluster.
Yes, the reverse proxy is available on-premises.
To get it working for an existing cluster, it must be configured and enabled in the cluster config XML and then the new config must be deployed, as described below.
For a new cluster, set it up in the cluster config JSON before creating the cluster, as described by #Scott Weldon.
#Senj provided the clue (thanks!) that led me to the answer. I had recently updated my Service Fabric bits on my dev box to 5.1.163.9590. When I looked in C:\SfDevCluster\Data\FabricHostSettings.xml, I noticed the following:
<Section Name="FabricNode">
    ...
    <Parameter Name="NodeVersion" Value="5.1.163.9590:1.0:0" />
    ...
    <Parameter Name="HttpApplicationGatewayListenAddress" Value="19081" />
    <Parameter Name="HttpApplicationGatewayProtocol" Value="http" />
    ...
</Section>
Interesting! With the dev cluster fired up, I browsed to:
http://localhost:19081/MyApp/MyService/api/odata/v1/$metadata
and voila! My API returned the expected data. So #Senj was correct that it has to do with the HttpApplicationGateway settings. I am guessing that in the latest SDK version it is pre-configured and enabled by default. (What threw me off is all the docs refer to port 19008, but the actual configured port was 19081!)
In order to get the reverse proxy to work on the 'real' multi-machine (VM) cluster, I did the following (Note: I don't think upgrading the cluster codepackage was necessary, but since I had nothing in my image store for the cluster upgrade, and the cluster upgrade process requires a code package, I used the latest version):
Copy the existing cluster manifest (from the Manifest tab in Service Fabric Explorer), paste into a new XML file, bump the version number and modify as follows:
To the NodeType Endpoints section, add:
<NodeTypes>
    <NodeType Name="NodeType0">
        <Endpoints>
            <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
            ...
        </Endpoints>
    </NodeType>
</NodeTypes>
and under <FabricSettings>, add the following section:
<Section Name="ApplicationGateway/Http">
    <Parameter Name="IsEnabled" Value="true" />
</Section>
Using Service Fabric PowerShell commands:
Copy the new cluster config (the previously copied manifest.xml) to the fabric image store
Register the new cluster config
Copy the Service Fabric Runtime cluster codepackage (available here - see the release notes for the link to the MSI) to the image store
Register the cluster codepackage
Start and complete cluster upgrade (I used unmonitored manual mode, which does one VM at a time and requires a manual Resume command after each node is complete)
After the cluster upgrade was complete, I was able to query my service API using the reverse proxy endpoint and appname/servicename URL syntax:
http://fqdn:19081/MyApp/MyService/api/odata/v1/$metadata
I enabled this in the standalone installer version (5.1.156) by adding the following line to the JSON configuration file under the nodeTypes element (I used ClusterConfig.Unsecure.MultiMachine.json but I assume any of the JSON files would work):
"httpApplicationGatewayEndpointPort": "19081"
So the final nodeTypes looked like this:
"nodeTypes": [
{
"name": "NodeType0",
"clientConnectionEndpointPort": "19000",
"clusterConnectionEndpoint": "19001",
"httpGatewayEndpointPort": "19080",
"httpApplicationGatewayEndpointPort": "19081",
"applicationPorts": {
"startPort": "20001",
"endPort": "20031"
},
"ephemeralPorts": {
"startPort": "20032",
"endPort": "20062"
},
"isPrimary": true
}
]
I think it has something to do with the HttpApplicationGatewayEndpoint property; see also my question at
https://github.com/Azure/service-fabric-issues/issues/5
But it doesn't work for me.
Also notice that
<Section Name="ApplicationGateway/Http">
    <Parameter Name="IsEnabled" Value="true" />
</Section>
is true for me.
Edit:
I noticed that on my Windows-only installation, HttpApplicationGatewayListenAddress has the value 0 in FabricHostSettings.xml:
<Parameter Name="HttpGatewayListenAddress" Value="19080" />
<Parameter Name="HttpGatewayProtocol" Value="http" />
<Parameter Name="HttpApplicationGatewayListenAddress" Value="0" />
<Parameter Name="HttpApplicationGatewayProtocol" Value="" />

Enabling HA namenodes on a secure cluster in Cloudera Manager fails

I am running a CDH4.1.2 secure cluster and it works fine with the single namenode+secondarynamenode configuration, but when I try to enable High Availability (quorum based) from the Cloudera Manager interface it dies at step 10 of 16, "Starting the NameNode that will be transitioned to active mode namenode ([my namenode's hostname])".
Digging into the role log file gives the following fatal error:
Exception in namenode join
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: [my namenode's fqhn]:[my namenode's fqhn]:0
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:206)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:158)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:147)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:143)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:547)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:480)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:443)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
How can I resolve this?
It looks like you have two problems:
The NameNode's IP address is resolving to "my namenode's fqhn" instead of a regular hostname. Check your /etc/hosts file to fix this.
You need to configure dfs.https.port. With Cloudera Manager free edition, you must have had to add the appropriate configs to the safety valves to enable security. As part of that, you need to configure the dfs.https.port.
Given that this code path is traversed even in the non-HA mode, I'm surprised that you were able to get your secure NameNode to start up correctly before enabling HA. In case you haven't already, I recommend that you first enable security, test that all HDFS roles start up correctly and then enable HA.