Setting up an AppFabric Cluster - need some clarification

There's something that's not very clear to me and does not show up in the documentation:
After successfully setting up a Cache Cluster with 2 hosts, what is the address for connecting to this cluster? I know this sounds like setting up a Windows Cluster all over again, but I just wanted to make sure there wasn't anything I'm missing.

Not sure I understood your question. The default port to connect to on one of your cluster nodes is 22233. Here is an example: you can build up the connection within code or by putting the cache host configuration into your Web.config file.
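For instance, a minimal sketch of the in-code approach (the host names below are placeholders, and this assumes the Microsoft.ApplicationServer.Caching client assemblies are referenced):

using System.Collections.Generic;
using Microsoft.ApplicationServer.Caching;

// Configure the cache client programmatically instead of via Web.config.
// Host names and port below are placeholders for your own cache hosts.
var servers = new List<DataCacheServerEndpoint>
{
    new DataCacheServerEndpoint("server1.mydomain.local", 22233),
    new DataCacheServerEndpoint("server2.mydomain.local", 22233)
};
var configuration = new DataCacheFactoryConfiguration { Servers = servers };
var factory = new DataCacheFactory(configuration);
DataCache cache = factory.GetDefaultCache();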

There is no 'address' as such, as you might expect in a load-balancing scenario, but you specify host(s) in the hosts section of the dataCacheClient config element:
<dataCacheClient … >
<hosts>
<host name="server1.mydomain.local" cachePort="22233" />
<host name="server2.mydomain.local" cachePort="22233" />
</hosts>
</dataCacheClient>

Related

Running OrientDB in distributed mode on AWS does not work

I have 3 OrientDB (2.2.7) nodes set up on AWS. They are running in distributed mode.
Whenever I connect to the server on port 2424, the connection locks up in pyorient.
I'm aware of some issues in regards to running OrientDB in distributed mode as per this question:
Creating a database in Orientdb in distributed mode
In order to avoid any issues, I'm running permanent instances as suggested by the documentation.
I also configured the EC2 instances to be "c3.4xlarge" instances as suggested by the Hazelcast EC2 whitepaper (Amazon_EC2_Deployment_Guide_v0.3_web.pdf).
I had my hazelcast.xml configured to use the tcp-ip and aws discovery strategies and both delivered the same results. The servers can be seen connecting to one another via Hazelcast, so the discovery is working fine.
I have the following policies attached to my user.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stm7747196888759",
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
Each node has hazelcast.xml configured like so:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.7.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <group>
    <name>orientdb</name>
    <password>xxxxxxxxx</password>
  </group>
  <properties>
    <property name="hazelcast.local.localAddress">{{LOCAL_IP}}</property>
    <property name="hazelcast.icmp.enabled">true</property>
  </properties>
  <network>
    <public-address>{{PUBLIC_IP}}</public-address>
    <port auto-increment="true">2434</port>
    <join>
      <multicast enabled="false">
        <multicast-group>235.1.1.1</multicast-group>
        <multicast-port>2434</multicast-port>
      </multicast>
      <tcp-ip enabled="true">
        <member>57.xx.xx.165</member>
        <member>57.xx.xx.236</member>
        <member>57.xx.xx.133</member>
      </tcp-ip>
      <aws enabled="false">
        <access-key>xxxx</access-key>
        <secret-key>xxxx</secret-key>
        <host-header>ec2.amazonaws.com</host-header>
        <region>eu-west-1</region>
      </aws>
    </join>
    <interfaces enabled="false">
      <interface>{{LOCAL_IP}}</interface>
    </interfaces>
  </network>
  <executor-service>
    <pool-size>16</pool-size>
  </executor-service>
</hazelcast>
As can be seen from my hazelcast.xml, I also tried upgrading Hazelcast to version 3.7. It doesn't matter which version of Hazelcast I use; the results are the same.
As soon as I connect to the server, the connection locks up. The server still works fine over port 2480. You can still use the front-end in the browser but can't open a connection via pyorient.
We have a large DB and collect around 2.5 million vertices and about 5 million edges each month. It's vital for us to run in distributed mode because a single server won't be able to scale beyond that capacity. As things stand at the moment, it seems like OrientDB has the capability to run as a distributed database, but that functionality doesn't seem to work.
We were running the Docker containers but switched to the binaries in order to upgrade to Hazelcast 3.7.
Has anyone been able to get OrientDB working in production as distributed and what are we missing?
This does not seem to be an issue with Hazelcast or AWS.
There were two issues with my setup.
The first issue has to do with OrientDB not refreshing or replacing my distributed-config.json with settings from default-distributed-db-config.json. The result was that every node that had ever connected to my DB was appended to that file, and none of my default-distributed-db-config.json settings were reflected in that config.
I added a start-up script to delete that distributed-config.json every time my server starts up, in order to refresh the list of nodes and update my settings.
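A rough sketch of such a start-up script, assuming a typical binary install layout (the install path, and starting via dserver.sh, are assumptions to adapt to your own environment):

#!/bin/bash
# Assumed install location -- change to wherever OrientDB lives on your hosts.
ORIENTDB_HOME=/opt/orientdb

# Remove the cached distributed config so it is rebuilt from
# default-distributed-db-config.json on the next start.
find "$ORIENTDB_HOME" -name "distributed-config.json" -delete

# Start the server in distributed mode as usual.
"$ORIENTDB_HOME/bin/dserver.sh"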
The second issue has to do with Pyorient. Pyorient has a bug in that it can't parse the messages returned from OrientDB when in distributed mode. This causes the connection to go into an infinite loop.
There is currently a development branch on pyorient that implements the missing binary serialiser (OrientSerialization.Binary). I have another branch that has some fixes merged into it.
Install it with:
pip install https://github.com/anber500/pyorient/tarball/17f5e42e83859a661c6483f7fa812226194694dd#egg=pyorient
Set your serialiser as follows:
client = pyorient.OrientDB("localhost", 2424, serialization_type=pyorient.OrientSerialization.Binary)
You will also need an updated version of pyorient_native. The first release had a memory leak so use the version from the master branch:
pip install https://github.com/nikulukani/pyorient_native/tarball/master#egg=pyorient_native
This works perfectly on AWS in distributed mode and is much faster than the CSV serializer.
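Putting it together, a minimal connection sketch (the host, database name and credentials here are placeholders, not values from the setup above):

import pyorient

# Placeholder host and credentials -- substitute your own.
client = pyorient.OrientDB("orientdb-node1.example.com", 2424,
                           serialization_type=pyorient.OrientSerialization.Binary)
client.connect("root", "root_password")
client.db_open("mydb", "admin", "admin")

# Quick sanity check that queries no longer hang in distributed mode.
print(client.query("SELECT count(*) FROM V"))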
Hope it helps.
You are using an EC2 public IP address and not the EC2 private IP address. Public IP addresses often start with 57 or 54; private IP addresses often start with 10.

How to configure and enable Azure Service Fabric Reverse Proxy for an existing on-premises cluster?

Is the Azure Service Fabric Reverse Proxy available in an on-premises cluster? If so, how can I enable it for an existing cluster?
The Service Fabric Reverse Proxy is described here. It allows clients external to the cluster to access application services by name with a special URL, without needing to know the exact host:port on which an instance of the service is running (which may change as services are automatically moved around).
By default the Service Fabric Reverse Proxy does not appear to be enabled for my on-prem cluster with two instances of a stateless service. I tried using the documented port 19008 but could not reach the service using the recommended URI syntax.
To wit, this works:
http://fqdn:20001/api/odata/v1/$metadata
but this does not:
http://fqdn:19008/MyApp/MyService/api/odata/v1/$metadata
In the NodeTypes section of the ClusterConfig JSON used to set up my on-prem cluster, there is a property "httpGatewayEndpointPort": "19080", but that port does not appear to work as a reverse proxy (it is the Service Fabric Explorer web-app endpoint). I am guessing that the needed configuration is specified somehow in the cluster config JSON. There are instructions in the referenced article that explain how to configure the reverse proxy in the cloud, but not on-premises.
What I am looking for are instructions on how to set up the Service Fabric reverse proxy in an on-premises multi-machine cluster or dev cluster.
Yes, the reverse proxy is available on-premises.
To get it working for an existing cluster, it must be configured and enabled in the cluster config XML and then the new config must be deployed, as described below.
For a new cluster, set it up in the cluster config JSON before creating the cluster, as described by @Scott Weldon.
@Senj provided the clue (thanks!) that led me to the answer. I had recently updated my Service Fabric bits on my dev box to 5.1.163.9590. When I looked in C:\SfDevCluster\Data\FabricHostSettings.xml, I noticed the following:
<Section Name="FabricNode">
...
<Parameter Name="NodeVersion" Value="5.1.163.9590:1.0:0" />
...
<Parameter Name="HttpApplicationGatewayListenAddress" Value="19081" />
<Parameter Name="HttpApplicationGatewayProtocol" Value="http" />
...
</Section>
Interesting! With the dev cluster fired up, I browsed to:
http://localhost:19081/MyApp/MyService/api/odata/v1/$metadata
and voila! My API returned the expected data. So @Senj was correct that it has to do with the HttpApplicationGateway settings. I am guessing that in the latest SDK version it is pre-configured and enabled by default. (What threw me off is that all the docs refer to port 19008, but the actual configured port was 19081!)
In order to get the reverse proxy to work on the 'real' multi-machine (VM) cluster, I did the following (Note: I don't think upgrading the cluster codepackage was necessary, but since I had nothing in my image store for the cluster upgrade, and the cluster upgrade process requires a code package, I used the latest version):
Copy the existing cluster manifest (from the Manifest tab in Service Fabric Explorer), paste into a new XML file, bump the version number and modify as follows:
To the NodeType Endpoints section, add:
<NodeTypes>
  <NodeType Name="NodeType0">
    <Endpoints>
      <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
      ...
    </Endpoints>
  </NodeType>
</NodeTypes>
and under <FabricSettings>, add the following section:
<Section Name="ApplicationGateway/Http">
<Parameter Name="IsEnabled" Value="true" />
</Section>
Using Service Fabric PowerShell commands (a rough sketch of these steps follows the list):
Copy the new cluster config (the previously copied manifest.xml) to the fabric image store
Register the new cluster config
Copy the Service Fabric Runtime cluster codepackage (available here - see the release notes for the link to the MSI) to the image store
Register the cluster codepackage
Start and complete cluster upgrade (I used unmonitored manual mode, which does one VM at a time and requires a manual Resume command after each node is complete)
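Roughly, the PowerShell looked like the following. This is a sketch only: the file names, versions, and image-store string are placeholders, and the exact parameter names may differ between SDK versions, so check Get-Help for each cmdlet before running it.

# Sketch only -- endpoint, paths, versions and image store string are placeholders.
Connect-ServiceFabricCluster -ConnectionEndpoint "fqdn:19000"

# Copy the modified manifest and the runtime code package to the image store
Copy-ServiceFabricClusterPackage -Config -ClusterManifestPath .\ClusterManifest_v2.xml `
    -ImageStoreConnectionString "fabric:ImageStore" -ClusterManifestPathInImageStore "ClusterManifest_v2.xml"
Copy-ServiceFabricClusterPackage -Code -CodePackagePath .\MicrosoftAzureServiceFabric.5.1.163.9590.cab `
    -ImageStoreConnectionString "fabric:ImageStore" -CodePackagePathInImageStore "MicrosoftAzureServiceFabric.5.1.163.9590.cab"

# Register both packages
Register-ServiceFabricClusterPackage -Config -ClusterManifestPath "ClusterManifest_v2.xml"
Register-ServiceFabricClusterPackage -Code -CodePackagePath "MicrosoftAzureServiceFabric.5.1.163.9590.cab"

# Start the upgrade in unmonitored manual mode, then resume each upgrade domain as it completes
Start-ServiceFabricClusterUpgrade -Code -Config -CodePackageVersion "5.1.163.9590" `
    -ClusterManifestVersion "2.0" -UnmonitoredManual
Resume-ServiceFabricClusterUpgrade -UpgradeDomainName "UD1"
Get-ServiceFabricClusterUpgrade   # check progress between Resume calls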
After the cluster upgrade was complete, I was able to query my service API using the reverse proxy endpoint and appname/servicename URL syntax:
http://fqdn:19081/MyApp/MyService/api/odata/v1/$metadata
I enabled this in the standalone installer version (5.1.156) by adding the following line to the JSON configuration file under the nodeTypes element (I used ClusterConfig.Unsecure.MultiMachine.json but I assume any of the JSON files would work):
"httpApplicationGatewayEndpointPort": "19081"
So the final nodeTypes looked like this:
"nodeTypes": [
{
"name": "NodeType0",
"clientConnectionEndpointPort": "19000",
"clusterConnectionEndpoint": "19001",
"httpGatewayEndpointPort": "19080",
"httpApplicationGatewayEndpointPort": "19081",
"applicationPorts": {
"startPort": "20001",
"endPort": "20031"
},
"ephemeralPorts": {
"startPort": "20032",
"endPort": "20062"
},
"isPrimary": true
}
]
I think it has something to do with the HttpApplicationGatewayEndpoint property; see also my question at https://github.com/Azure/service-fabric-issues/issues/5
But it doesn't work for me.
Also notice that
<Section Name="ApplicationGateway/Http">
<Parameter Name="IsEnabled" Value="true" />
</Section>
is true for me.
Edit:
I noticed that on my Windows-only installation, HttpApplicationGatewayListenAddress has the value 0 in FabricHostSettings.xml:
<Parameter Name="HttpGatewayListenAddress" Value="19080" />
<Parameter Name="HttpGatewayProtocol" Value="http" />
<Parameter Name="HttpApplicationGatewayListenAddress" Value="0" />
<Parameter Name="HttpApplicationGatewayProtocol" Value="" />

Forward EC2 traffic from 1 instance to another?

I set up Countly analytics on the free tier AWS EC2, but stupidly did not set up an Elastic IP with it. Now the traffic is so great that I can't even log into the analytics, as the CPU is constantly running at 100%.
I am in the process of issuing app updates to change the analytics address to a private domain that forwards to the EC2 instance, so I can change the forwarding in the future.
In the meantime, is it possible for me to set up a 2nd instance and forward all the traffic from the current one to the new one?
I found this: http://lastzactionhero.wordpress.com/2012/10/26/remote-port-forwarding-from-ec2/ Will this work from one EC2 instance to another?
Thanks
EDIT ---
Countly log
/home/ubuntu/countlyinstall/countly/api/node_modules/mongoskin/node_modules/mongodb/lib/mongodb/connection/server.js:529
throw err;
^ ReferenceError: liveApi is not defined
at processUserSession (/home/ubuntu/countlyinstall/countly/api/parts/data/usage.js:203:17)
at /home/ubuntu/countlyinstall/countly/api/parts/data/usage.js:32:13
at /home/ubuntu/countlyinstall/countly/api/node_modules/mongoskin/node_modules/mongodb/lib/mongodb/collection.js:1010:5
at Cursor.nextObject (/home/ubuntu/countlyinstall/countly/api/node_modules/mongoskin/node_modules/mongodb/lib/mongodb/cursor.js:653:5)
at commandHandler (/home/ubuntu/countlyinstall/countly/api/node_modules/mongoskin/node_modules/mongodb/lib/mongodb/cursor.js:635:14)
at null.<anonymous> (/home/ubuntu/countlyinstall/countly/api/node_modules/mongoskin/node_modules/mongodb/lib/mongodb/db.js:1709:18)
at g (events.js:175:14)
at EventEmitter.emit (events.js:106:17)
at Server.Base._callHandler (/home/ubuntu/countlyinstall/countly/api/node_modules/mongoskin/node_modules/mongodb/lib/mongodb/connection/base.js:130:25)
at /home/ubuntu/countlyinstall/countly/api/node_modules/mongoskin/node_modules/mongodb/lib/mongodb/connection/server.js:522:20
You can follow the steps described in the blog post to do the port forwarding. Just make sure not to forward it to localhost :)
Also, about the 100% CPU: it is probably caused by MongoDB. Did you have a chance to check the process? If it is mongod, issue the mongotop command to see the most time-consuming collection accesses. We can go from there.
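For example (the sampling interval here is just an arbitrary choice):

# Confirm which process is consuming the CPU, then sample MongoDB's
# busiest collections every 5 seconds.
top
mongotop 5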
Yes, it is possible. I use nginx with a Node.js app. I wanted to redirect traffic from one instance to another. The instance was in a different region and not configured in the same VPC, as mentioned in the AWS documentation.
Step 1: Go to /etc/nginx/sites-enabled and open the default.conf file. Your configuration might be in a different file.
Step 2: Change proxy_pass to your chosen IP/domain/sub-domain:
server {
    listen 80;
    server_name your_domain.com;

    location / {
        ...
        proxy_pass http://your_ip;  # You can put a domain or sub-domain here, with protocol (http/https)
    }
}
Step 3: Then restart nginx:
sudo systemctl restart nginx
This is possible for any external instance, including instances in a different VPC.

Enabling HA namenodes on a secure cluster in Cloudera Manager fails

I am running a CDH4.1.2 secure cluster and it works fine with the single namenode+secondarynamenode configuration, but when I try to enable High Availability (quorum based) from the Cloudera Manager interface it dies at step 10 of 16, "Starting the NameNode that will be transitioned to active mode namenode ([my namenode's hostname])".
Digging into the role log file gives the following fatal error:
Exception in namenode join
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: [my namenode's fqhn]:[my namenode's fqhn]:0
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:206)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:158)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:147)
at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:143)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:547)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:480)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:443)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
How can I resolve this?
It looks like you have two problems:
The NameNode's IP address is resolving to "my namenode's fqhn" instead of a regular hostname. Check your /etc/hosts file to fix this.
You need to configure dfs.https.port. With Cloudera Manager free edition, you must have had to add the appropriate configs to the safety valves to enable security. As part of that, you also need to set dfs.https.port, for example as shown below.
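A minimal sketch of what that safety-valve snippet for hdfs-site.xml might look like (the hostname is a placeholder and 50470 is just the conventional NameNode HTTPS port; adjust both for your cluster):

<!-- Example values only: adjust the host and port for your environment. -->
<property>
  <name>dfs.https.port</name>
  <value>50470</value>
</property>
<property>
  <name>dfs.https.address</name>
  <value>namenode.example.com:50470</value>
</property>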
Given that this code path is traversed even in the non-HA mode, I'm surprised that you were able to get your secure NameNode to start up correctly before enabling HA. In case you haven't already, I recommend that you first enable security, test that all HDFS roles start up correctly and then enable HA.

AppFabric Error ERRCA0017 SubStatus ES0006

Just installed Windows Server AppFabric 1.1 on my Windows 7 box and I'm trying to write some code in a console application against the cache client API as part of an evaluation of AppFabric. My app.config looks like the following:
<configSections>
<section name="dataCacheClient" type="Microsoft.ApplicationServer.Caching.DataCacheClientSection, Microsoft.ApplicationServer.Caching.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" allowLocation="true" allowDefinition="Everywhere" />
</configSections>
<dataCacheClient>
<hosts>
<host name="pa-chouse2" cachePort="22233" />
</hosts>
</dataCacheClient>
I created a new cache and added my domain user account as an allowed client account using the PowerShell cmdlet Grant-CacheAllowedClientAccount. I'm creating a new DataCache instance like so:
using (DataCacheFactory cacheFactory = new DataCacheFactory())
{
this.cache = cacheFactory.GetDefaultCache();
}
When I call DataCache.Get, I end up with the following exception:
ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.)
I'd be very grateful if anyone could point out what I'm missing to get this working.
Finally figured out what my problem was after nearly ripping out what little hair I have left. Initially, I was creating my DataCache like so:
using (DataCacheFactory cacheFactory = new DataCacheFactory())
{
this.cache = cacheFactory.GetDefaultCache();
}
It turns out that the DataCache didn't like having the DataCacheFactory that created it disposed of. Once I refactored the code so that my DataCacheFactory stayed in scope for as long as I needed my DataCache, it worked as expected.
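For anyone hitting the same issue, here is a rough sketch of the shape the fixed code takes (the class and member names are illustrative, not from the original project):

using Microsoft.ApplicationServer.Caching;

public class CacheClient
{
    // Keep the factory alive for as long as the cache is used;
    // disposing the factory invalidates the DataCache it handed out.
    private readonly DataCacheFactory _cacheFactory = new DataCacheFactory();
    private readonly DataCache _cache;

    public CacheClient()
    {
        _cache = _cacheFactory.GetDefaultCache();
    }

    public object Get(string key)
    {
        return _cache.Get(key);
    }
}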