Neo4J online backup error on AWS - Failed to run a backup using the available strategies

I'm testing Neo4j Enterprise 3.3.3 on AWS and trying to run an online backup of a database located on a different server.
On my AWS instance I run:
neo4j-admin backup --backup-dir=~/backup --name=graph.db-backup --from=0.0.0.0:4444
where 0.0.0.0 stands in for the public IP of the external Neo4j server and 4444 for my port.
But then I get this error:
Failed to load private key: /var/lib/neo4j/certificates/neo4j.key
UPDATE
I fixed that by running the command with sudo (on Amazon AWS).
However, now I'm getting another error:
Failed to run a backup using the available strategies.
The backup documentation says you only need to uncomment some settings in neo4j.conf, which I've done on both the server being backed up and the one actually running the backup.
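For reference, the online-backup settings in a stock 3.3 neo4j.conf look roughly like this once uncommented (6362 is the default backup port; my actual address and port differ, so treat these values as illustrative):
dbms.backup.enabled=true
dbms.backup.address=0.0.0.0:6362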
Could it be that the issue is that on AWS you have to run commands with
systemctl
And if so, how do I run neo4j-admin with it?
It doesn't work if I use
systemctl neo4j-admin ...
Somebody from Neo4j, can you please help? Backup is one of the main reasons to get the Enterprise version, but there is not enough documentation on how to use it.

Related

AWS Docker Interpreter with Pycharm

I'm having difficulty setting this up correctly, and I'm burning through AWS server time while I try to make it work. I have segmentation code that is heavily memory-intensive, and I'd like to temporarily spin up an AWS server with 192 GB of RAM. I understand that this is possible using Docker, but the PyCharm instructions are non-existent with respect to the Docker setup necessary to tie it together (they reference existing code as opposed to showing how to assemble it from scratch). What would the docker run command on the server look like to enable a connection on port 2375?
EDIT: I am using PyCharm Professional.
UPD: Checking PyCharm's options, I found that there is an option to use Docker Machine. This seems to be exactly what you need. With Docker Machine you can have Docker spin up an EC2 instance for you with proper security out of the box. Read the official getting-started documentation, and the AWS driver options to learn how to set the EC2 instance type, AMI, and other settings.
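For instance, a create command with the amazonec2 driver would look roughly like this (the region, instance type, and machine name are illustrative; m5.12xlarge is one instance type with 192 GB of RAM):
docker-machine create --driver amazonec2 --amazonec2-region us-east-1 --amazonec2-instance-type m5.12xlarge pycharm-docker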
Original post:
To enable this feature you have to run the Docker daemon with the '-H' option:
sudo dockerd -H tcp://0.0.0.0:2375
You may read more on that in the Docker docs: https://docs.docker.com/engine/reference/commandline/dockerd/.
Beware though: for EC2 you may also need to open that port in the instance's security group: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/authorizing-access-to-an-instance.html.
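Once the port is open, you can check that the daemon is reachable by pointing a Docker client at it (the host below is a placeholder for your instance's public address):
docker -H tcp://YOUR_EC2_PUBLIC_IP:2375 version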
I also want to add that what you want to achieve isn't good from a security perspective. Exposing the Docker socket like that is an invitation for bad guys to throw a party on your EC2 instance. But since you mentioned that this is temporary...

Problem loading a Neo4J database from a dump on AWS

I made a dump of my local Neo4j database, then launched an AWS instance with Neo4j, and I'm trying to run that same database there. Both are version 3.5.14.
When I launch the Neo4j AWS instance right after it's set up, it launches fine and is accessible via Bolt.
However, I then stop that instance using systemctl stop neo4j and attempt to load my dumped database onto AWS using the command
neo4j-admin load ....
(which works)
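For completeness, a typical 3.5 load invocation looks like this (the dump path is a placeholder, since I elided my exact command above):
sudo neo4j-admin load --from=/path/to/graph.db.dump --database=graph.db --force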
However, when I launch the database again with systemctl start neo4j, it just stops and restarts and cannot launch.
The datastore should not need to be upgraded, as it's the same version, so I don't see what the problem may be.
Just in case, I uncommented the datastore upgrade setting in
/etc/neo4j/neo4j.template
But that doesn't help either.
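For reference, the line I uncommented is the 3.5 allow-upgrade setting, quoted here from memory, so treat it as an assumption:
dbms.allow_upgrade=true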
Could you please advise me what else I could do?

Google Cloud virtual instance: Chrome remote desktop indicates remote computer is offline, however Google Cloud Platform shows instance is running

I am running a virtual machine in Google Cloud. I have installed the default Debian OS, and configured the desktop environment for remote connection, as explained here: https://cloud.google.com/solutions/chrome-desktop-remote-on-compute-engine
I have been able to connect to the instance via Chrome Remote Desktop. However, periodically I have the problem that Chrome Remote Desktop lists the VM instance as online, but when I try to connect to it I get a message that the remote computer is offline.
Looking at the Google Cloud console, the instance is clearly running. Normally restarting the instance solves the problem, but I have processes running on the instance that I do not want to stop.
UPDATE:
Following the advice from Serhii Rohoza I ran
sudo systemctl status chrome-remote-desktop
The status looked normal, listing:
Active: active (exited) since...
I then ran
sudo systemctl restart chrome-remote-desktop
and this solved the problem: I could log into the remote desktop again. But it seemed the VM instance had restarted, which is a big problem since I am running processes on it that should not shut down. I guess this is a problem to send to Google Cloud support.
UPDATE 2:
I'm still running into this problem. I normally have a Jupyter Notebook running on the VM, and this Notebook must keep running. When I saw the message saying that the remote computer is offline, I logged in via SSH and checked whether the Jupyter Notebook was running:
jupyter notebook list
This returned:
http://localhost:8888/?token=9110bf40789971b5e252a272e9497039b4f3b45e506348df :: /home/qgenixtech
So the Notebook was running. I then ran:
sudo systemctl restart chrome-remote-desktop
and after that again:
jupyter notebook list
and then it showed no Notebooks running. So the restart command shut down the Notebook (and also all other open windows on the desktop).
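A possible workaround (not something suggested in this thread, just a standard trick) would be to start the Notebook detached from the desktop session, so a remote-desktop restart can't take it down:
nohup jupyter notebook --no-browser --port=8888 > ~/jupyter.log 2>&1 &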
UPDATE 3:
I spoke to a support technician at Google. The problem is on the Remote Desktop side, not the virtual machine. According to the technician this is a known problem, but he didn't have a solution for it. He referred me to these two links from Google Support:
https://support.google.com/chrome/thread/10213547?hl=en
https://support.google.com/chrome/thread/3333421?hl=en
The next option for me is to look at something like X2Go.
To solve your issue you should follow the Troubleshooting documentation and check the status of the Chrome Remote Desktop service with the command:
sudo systemctl status chrome-remote-desktop
and check log messages at /tmp/chrome_remote_desktop_DATE_TIME_*.
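For example, to view the tail of the most recent of those log files (assuming at least one exists):
ls -t /tmp/chrome_remote_desktop_* | head -n 1 | xargs tail -n 50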
To investigate why your VM instance was restarted, you should look for clues in the logs:
Go to Compute Engine -> VM instances -> click on NAME_OF_YOUR_VM -> find the Logs section -> click on Stackdriver Logging. You can find more information in the documentation Viewing logs (Classic).
Go to Compute Engine -> VM instances -> click on NAME_OF_YOUR_VM -> find the Logs section -> click on Serial port 1 (console). More information is in the documentation Viewing Serial Port Output.
You can contact Google Cloud Support as well.
In addition, have a look at the documentation Setting instance availability policies.
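For example, the availability policy of an instance can be adjusted with gcloud (the instance name is a placeholder):
gcloud compute instances set-scheduling NAME_OF_YOUR_VM --maintenance-policy=MIGRATE --restart-on-failure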
Same issue. When checking the logs I see:
2021-01-05 14:29:38,319:INFO:Starting Xvfb on display :20
xdpyinfo: unable to open display ":20".
2021-01-05 14:29:40,837:INFO:X server is active.
Restarting the service or even the VM doesn't work.
I need to delete the connection on the client and re-authenticate with the /headless link.

Puppet agents aren't applying changes from PuppetMaster

We have a deployment in AWS with a single PuppetMaster box that services hundreds of other servers within the AWS ecosystem. Starting yesterday, we noticed that Puppet changes were not being applied to the agents. At first we thought it affected only newly provisioned boxes, but now we see that we simply aren't getting any error messages on any of the machines where the Puppet agent runs.
# puppet agent --test --verbose
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for blarg-follower-0e5385bace7e84fe2
Info: Applying configuration version '1529498155'
Notice: Finished catalog run in 0.24 seconds
I have access to the PuppetMaster and have validated that the code there is up to date. I need help figuring out how to get better logging out of this and debugging what is wrong between the agent and the PuppetMaster.
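One thing I can try for more output is running the agent with its standard --debug flag and capturing the log, e.g.:
puppet agent --test --debug 2>&1 | tee /tmp/puppet-debug.log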
In this case the issue was that our Puppet Master's /etc/puppet/puppet.conf file had been modified, and indeed the agents weren't getting the full catalog from Puppet Master. We found a backup copy of the file, restored it, and we were back in business.

WebHCat on Amazon's EMR?

Is it possible or advisable to run WebHCat on an Amazon Elastic MapReduce cluster?
I'm new to this technology and I was wondering if it's possible to use WebHCat as a REST interface to run Hive queries. The cluster in question is running Hive.
I wasn't able to get it working, but WebHCat is actually installed by default on Amazon's EMR instances.
To get it running you have to do the following:
chmod u+x /home/hadoop/hive/hcatalog/bin/hcat
chmod u+x /home/hadoop/hive/hcatalog/sbin/webhcat_server.sh
export TEMPLETON_HOME=/home/hadoop/.versions/hive-0.11.0/hcatalog/
export HCAT_PREFIX=/home/hadoop/.versions/hive-0.11.0/hcatalog/
/home/hadoop/hive/hcatalog/sbin/webhcat_server.sh start
You can then confirm that it's running on port 50111 using curl:
curl -i http://localhost:50111/templeton/v1/status
To hit port 50111 from other machines you have to open the port up in the EC2 EMR security group.
You then have to configure the users you're going to "proxy" when you run queries in HCatalog. I didn't actually save this configuration, but it is outlined in the WebHCat documentation. I wish they had some concrete examples there, but basically I ended up configuring the local 'hadoop' user as the one that runs the queries; not the most secure thing to do, I'm sure, but I was just trying to get it up and running.
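For what it's worth, submitting a Hive query through the Templeton endpoint then looks roughly like this (the proxy user and the statusdir, an HDFS output path, are assumptions from my setup):
curl -s -d user.name=hadoop -d execute="show tables;" -d statusdir="webhcat_out" http://localhost:50111/templeton/v1/hive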
Attempting a query then gave me this error:
{"error":"Server IPC version 9 cannot communicate with client version 4"}
The workaround was to switch from the latest EMR image (3.0.4 with Hadoop 2.2.0) to a Hadoop 1.0 image (2.4.2 with Hadoop 1.0.3).
I then hit another issue where it couldn't find the Hive jar properly. After struggling with the configuration some more, I decided I had sunk enough time into trying to get this to work and chose to communicate with Hive directly (using RBHive for Ruby and JDBC on the JVM).
To answer my own question: it is possible to run WebHCat on EMR, but it's not documented at all (Googling led me nowhere, which is why I created this question in the first place; it's currently the first hit when you search "WebHCat EMR"), and the WebHCat documentation leaves a lot to be desired. Getting it to work seems like a pain, though my hope is that by writing up the initial steps, someone will come along, take it the rest of the way, and post a complete answer.
I did not test it, but it should be doable.
EMR lets you customise bootstrap actions, i.e. the scripts that run when the nodes are started. You can use bootstrap actions to install additional software and to change the configuration of applications on the cluster.
See more details at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html.
I would create a shell script to install WebHCat and test it on a regular EC2 instance first (outside the context of EMR, just to ensure the script is OK).
You can use EC2's user-data property to test your script, typically:
#!/bin/bash
curl http://path_to_your_install_script.sh | sh
Then, once you know the script is working, make it available to the cluster in an S3 bucket and follow these instructions to include it as a custom bootstrap action of your cluster.
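For example, with the AWS CLI the bootstrap action could be attached roughly like this (an untested sketch; the cluster options and bucket path are placeholders):
aws emr create-cluster --name "webhcat-test" --ami-version 2.4.2 --instance-type m1.large --instance-count 3 --applications Name=Hive --bootstrap-actions Path=s3://your-bucket/install-webhcat.sh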
--Seb