ideal aws instance type for neo4j database - amazon-web-services

Neo4j 3.2 is installed on an AWS Linux EC2 instance, and I am trying to find the ideal instance type.
Several projects are persisted in the Neo4j graph. Each project can have around 50k nodes, a million relationships, and around 100k properties. A single save operation can involve up to 25k nodes, 75k relationships, and 10k properties. I tried the largest instance types of the i2, d2, and r families. Every "save" operation takes almost a minute or more. On my Windows machine, it takes 20 seconds at most.
What would be the ideal instance type in this scenario?
I also suspect it has something to do with the neo4j.conf file. Many JVM parameters are set in neo4j.conf on the Linux AWS instance, and none of them are specified on my Windows system.
#*****************************************************************
# Neo4j configuration
#
# For more details and a complete list of settings, please see
# https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/
#*****************************************************************
# The name of the database to mount
#dbms.active_database=graph.db
# Paths of directories in the installation.
dbms.directories.data=/var/lib/neo4j/data
dbms.directories.plugins=/var/lib/neo4j/plugins
dbms.directories.certificates=/var/lib/neo4j/certificates
dbms.directories.logs=/var/log/neo4j
dbms.directories.lib=/usr/share/neo4j/lib
dbms.directories.run=/var/run/neo4j
# This setting constrains all `LOAD CSV` import files to be under the `import` directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# `LOAD CSV` section of the manual for details.
dbms.directories.import=/var/lib/neo4j/import
# Whether requests to Neo4j are authenticated.
# To disable authentication, uncomment this line
#dbms.security.auth_enabled=false
# Enable this to be able to upgrade a store from an older version.
#dbms.allow_upgrade=true
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size.
# dbms.memory.heap.initial_size=512m
#dbms.memory.heap.max_size=512m
# The amount of memory to use for mapping the store files, in bytes (or
# kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
# If Neo4j is running on a dedicated server, then it is generally recommended
# to leave about 2-4 gigabytes for the operating system, give the JVM enough
# heap to hold all your transaction state and query context, and then leave the
# rest for the page cache.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the max Java heap size.
#dbms.memory.pagecache.size=10g
#*****************************************************************
# Network connector configuration
#*****************************************************************
# With default configuration Neo4j only accepts local connections.
# To accept non-local connections, uncomment this line:
dbms.connectors.default_listen_address=0.0.0.0
# You can also choose a specific network interface, and configure a non-default
# port for each connector, by setting their individual listen_address.
# The address at which this server can be reached by its clients. This may be the server's IP address or DNS name, or
# it may be the address of a reverse proxy which sits in front of the server. This setting may be overridden for
# individual connectors below.
#dbms.connectors.default_advertised_address=localhost
# You can also choose a specific advertised hostname or IP address, and
# configure an advertised port for each connector, by setting their
# individual advertised_address.
# Bolt connector
dbms.connector.bolt.enabled=true
#dbms.connector.bolt.tls_level=OPTIONAL
#dbms.connector.bolt.listen_address=:7687
# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
#dbms.connector.http.listen_address=:7474
# HTTPS Connector. There can be zero or one HTTPS connectors.
dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=:7473
# Number of Neo4j worker threads.
#dbms.threads.worker_count=
#*****************************************************************
# SSL system configuration
#*****************************************************************
# Names of the SSL policies to be used for the respective components.
# The legacy policy is a special policy which is not defined in
# the policy configuration section, but rather derives from
# dbms.directories.certificates and associated files
# (by default: neo4j.key and neo4j.cert). Its use will be deprecated.
# The policies to be used for connectors.
#
# N.B: Note that a connector must be configured to support/require
# SSL/TLS for the policy to actually be utilized.
#
# see: dbms.connector.*.tls_level
#bolt.ssl_policy=legacy
#https.ssl_policy=legacy
#*****************************************************************
# SSL policy configuration
#*****************************************************************
# Each policy is configured under a separate namespace, e.g.
# dbms.ssl.policy.<policyname>.*
#
# The example settings below are for a new policy named 'default'.
# The base directory for cryptographic objects. Each policy will by
# default look for its associated objects (keys, certificates, ...)
# under the base directory.
#
# Every such setting can be overriden using a full path to
# the respective object, but every policy will by default look
# for cryptographic objects in its base location.
#
# Mandatory setting
#dbms.ssl.policy.default.base_directory=certificates/default
# Allows the generation of a fresh private key and a self-signed
# certificate if none are found in the expected locations. It is
# recommended to turn this off again after keys have been generated.
#
# Keys should in general be generated and distributed offline
# by a trusted certificate authority (CA) and not by utilizing
# this mode.
#dbms.ssl.policy.default.allow_key_generation=false
# Enabling this makes it so that this policy ignores the contents
# of the trusted_dir and simply resorts to trusting everything.
#
# Use of this mode is discouraged. It would offer encryption but no security.
#dbms.ssl.policy.default.trust_all=false
# The private key for the default SSL policy. By default a file
# named private.key is expected under the base directory of the policy.
# It is mandatory that a key can be found or generated.
#dbms.ssl.policy.default.private_key=
# The private key for the default SSL policy. By default a file
# named public.crt is expected under the base directory of the policy.
# It is mandatory that a certificate can be found or generated.
#dbms.ssl.policy.default.public_certificate=
# The certificates of trusted parties. By default a directory named
# 'trusted' is expected under the base directory of the policy. It is
# mandatory to create the directory so that it exists, because it cannot
# be auto-created (for security purposes).
#
# To enforce client authentication client_auth must be set to 'require'!
#dbms.ssl.policy.default.trusted_dir=
# Client authentication setting. Values: none, optional, require
# The default is to require client authentication.
#
# Servers are always authenticated unless explicitly overridden
# using the trust_all setting. In a mutual authentication setup this
# should be kept at the default of require and trusted certificates
# must be installed in the trusted_dir.
#dbms.ssl.policy.default.client_auth=require
# A comma-separated list of allowed TLS versions.
# By default TLSv1, TLSv1.1 and TLSv1.2 are allowed.
#dbms.ssl.policy.default.tls_versions=
# A comma-separated list of allowed ciphers.
# The default ciphers are the defaults of the JVM platform.
#dbms.ssl.policy.default.ciphers=
#*****************************************************************
# Logging configuration
#*****************************************************************
# To enable HTTP logging, uncomment this line
#dbms.logs.http.enabled=true
# Number of HTTP logs to keep.
#dbms.logs.http.rotation.keep_number=5
# Size of each HTTP log that is kept.
#dbms.logs.http.rotation.size=20m
# To enable GC Logging, uncomment this line
#dbms.logs.gc.enabled=true
# GC Logging Options
# see http://docs.oracle.com/cd/E19957-01/819-0084-10/pt_tuningjava.html#wp57013 for more information.
#dbms.logs.gc.options=-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution
# Number of GC logs to keep.
#dbms.logs.gc.rotation.keep_number=5
# Size of each GC log that is kept.
#dbms.logs.gc.rotation.size=20m
# Size threshold for rotation of the debug log. If set to zero then no rotation will occur. Accepts a binary suffix "k",
# "m" or "g".
#dbms.logs.debug.rotation.size=20m
# Maximum number of history files for the internal log.
#dbms.logs.debug.rotation.keep_number=7
#*****************************************************************
# Miscellaneous configuration
#*****************************************************************
# Enable this to specify a parser other than the default one.
#cypher.default_language_version=3.0
# Determines if Cypher will allow using file URLs when loading data using
# `LOAD CSV`. Setting this value to `false` will cause Neo4j to fail `LOAD CSV`
# clauses that load data from the file system.
#dbms.security.allow_csv_import_from_file_urls=true
# Retention policy for transaction logs needed to perform recovery and backups.
dbms.tx_log.rotation.retention_policy=1 days
# Enable a remote shell server which Neo4j Shell clients can log in to.
#dbms.shell.enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces).
#dbms.shell.host=127.0.0.1
# The port the shell will listen on, default is 1337.
#dbms.shell.port=1337
# Only allow read operations from this Neo4j instance. This mode still requires
# write access to the directory for lock purposes.
#dbms.read_only=false
# Comma separated list of JAX-RS packages containing JAX-RS resources, one
# package name for each mountpoint. The listed package names will be loaded
# under the mountpoints specified. Uncomment this line to mount the
# org.neo4j.examples.server.unmanaged.HelloWorldResource.java from
# neo4j-server-examples under /examples/unmanaged, resulting in a final URL of
# http://localhost:7474/examples/unmanaged/helloworld/{nodeId}
#dbms.unmanaged_extension_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged
#********************************************************************
# JVM Parameters
#********************************************************************
# G1GC generally strikes a good balance between throughput and tail
# latency, without too much tuning.
dbms.jvm.additional=-XX:+UseG1GC
# Have common exceptions keep producing stack traces, so they can be
# debugged regardless of how often logs are rotated.
#dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow
# Make sure that `initmemory` is not only allocated, but committed to
# the process, before starting the database. This reduces memory
# fragmentation, increasing the effectiveness of transparent huge
# pages. It also reduces the possibility of seeing performance drop
# due to heap-growing GC events, where a decrease in available page
# cache leads to an increase in mean IO response time.
# Try reducing the heap memory, if this flag degrades performance.
dbms.jvm.additional=-XX:+AlwaysPreTouch
# Trust that non-static final fields are really final.
# This allows more optimizations and improves overall performance.
# NOTE: Disable this if you use embedded mode, or have extensions or dependencies that may use reflection or
# serialization to change the value of final fields!
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields
# Disable explicit garbage collection, which is occasionally invoked by the JDK itself.
dbms.jvm.additional=-XX:+DisableExplicitGC
# Remote JMX monitoring, uncomment and adjust the following lines as needed. Absolute paths to jmx.access and
# jmx.password files are required.
# Also make sure to update the jmx.access and jmx.password files with appropriate permission roles and passwords,
# the shipped configuration contains only a read only role called 'monitor' with password 'Neo4j'.
# For more details, see: http://download.oracle.com/javase/8/docs/technotes/guides/management/agent.html
# On Unix based systems the jmx.password file needs to be owned by the user that will run the server,
# and have permissions set to 0600.
# For details on setting these file permissions on Windows see:
# http://docs.oracle.com/javase/8/docs/technotes/guides/management/security-windows.html
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.port=3637
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.authenticate=true
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.ssl=false
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.password.file=/absolute/path/to/conf/jmx.password
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.access.file=/absolute/path/to/conf/jmx.access
# Some systems cannot discover host name automatically, and need this line configured:
#dbms.jvm.additional=-Djava.rmi.server.hostname=$THE_NEO4J_SERVER_HOSTNAME
# Expand Diffie Hellman (DH) key size from default 1024 to 2048 for DH-RSA cipher suites used in server TLS handshakes.
# This is to protect the server from any potential passive eavesdropping.
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048
#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
# using this configuration file has been installed as a service.
# Please uninstall the service before modifying this section. The
# service can then be reinstalled.
# Name of the service
dbms.windows_service_name=neo4j
#********************************************************************
# Other Neo4j system properties
#********************************************************************
dbms.jvm.additional=-Dunsupported.dbms.udc.source=rpm

Based on the configuration you describe and the recommended system requirements in the official Neo4j documentation, I would suggest these instances:
1) If your workload is bursty, I would recommend burstable performance instances such as t2.xlarge (16 GB RAM, Intel Xeon family processor, moderate networking performance) or t2.2xlarge (32 GB RAM, Intel Xeon family processor, moderate networking performance).
2) If your workload is general purpose and needs a larger configuration, I would recommend:
If cost is a concern: m3.xlarge (4 vCPU, 15 GB RAM, Intel Xeon E5-2670 v2, high networking performance, 2 x 40 GB SSD) or m3.2xlarge (8 vCPU, 30 GB RAM, Intel Xeon E5-2670 v2, high networking performance, 2 x 80 GB SSD)
or
If you want the latest generation of general purpose instances: m4.xlarge (4 vCPU, 16 GB RAM, Intel Xeon E5-2676 v3, high networking performance, EBS-only storage) or m4.2xlarge (8 vCPU, 32 GB RAM, Intel Xeon E5-2676 v3, high networking performance, EBS-only storage).
For more information on instance types you can refer here.

Finally figured it out.
For the Neo4j graph DB, they prefer Ubuntu as the OS. Regarding instance type,
one with SSD storage is preferable, so we can go with m3.2xlarge.
Remember to configure IOPS, which will increase the throughput. Set the heap and page cache memory if required.
Expand EBS Volume
Modify EBS volume
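To make the "set heap & cache memory" step concrete, here is a minimal sketch of the sizing rule described in the neo4j.conf comments above: reserve a few gigabytes for the operating system, pick a heap size, and give the rest to the page cache. The helper name, the 4 GB OS reserve, and the 8 GB heap are assumptions for illustration, not measured recommendations. The setting names are the ones that appear (commented out) in the posted neo4j.conf.

```python
def neo4j_memory_settings(total_ram_gb, heap_gb, os_reserve_gb=4):
    """Split RAM between the OS, the JVM heap, and the page cache (whole GB)."""
    pagecache_gb = total_ram_gb - os_reserve_gb - heap_gb
    if pagecache_gb <= 0:
        raise ValueError("heap plus OS reserve leaves no room for the page cache")
    return {
        # Same initial and max heap, so the heap does not grow at runtime.
        "dbms.memory.heap.initial_size": f"{heap_gb}g",
        "dbms.memory.heap.max_size": f"{heap_gb}g",
        "dbms.memory.pagecache.size": f"{pagecache_gb}g",
    }


# Example: m3.2xlarge has 30 GB RAM; the 8 GB heap is an assumed starting point.
settings = neo4j_memory_settings(total_ram_gb=30, heap_gb=8)
for key, value in settings.items():
    print(f"{key}={value}")
```

The printed lines can be pasted into neo4j.conf in place of the commented-out defaults, then adjusted after watching GC logs and page cache hit rates under a real "save" workload.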

Related

DMS task fails for EC2 Oracle as source

Last failure message
Last Error Endpoint initialization failed. Task error notification received from subtask 0, thread 0 [reptask/replicationtask.c:2859] [1020401] Cannot retrieve Oracle archived Redo log destination ids; Failed to set stream position on context 'now'; Error executing command; Stream component failed at subtask 0, component
Is your backup on? If your backup retention is set to 0, then your backup is off, which disables your archive logs. Please check whether archive logging is enabled at the target.
I have the same error on an EC2 instance. Archive log mode is on, and the retention policy is:
using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name MYDB are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE RMAN OUTPUT TO KEEP FOR 7 DAYS; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/oracle/db/1900/dbs/snapcf_MYDB.f'; # default

Should django health-check endpoint /ht/ be accessible from everybody?

From the documentation reported here I read
This project checks for various conditions and provides reports when
anomalous behavior is detected. The following health checks are bundled
with this project: cache, database, storage, disk and memory
utilization (via psutil), AWS S3 storage, Celery task queue, Celery
ping, RabbitMQ, Migrations
and from use case section
The primary intended use case is to monitor conditions via HTTP(S),
with responses available in HTML and JSON formats. When you get back a
response that includes one or more problems, you can then decide the
appropriate course of action, which could include generating
notifications and/or automating the replacement of a failing node with
a new one
And then
The /ht/ endpoint will respond a HTTP 200 if all checks passed and a HTTP
500 if any of the tests failed.
From a security point of view: should this URL (https://example.com/ht) be reachable by everybody? It seems to give away various pieces of information.

Prometheus target: invalid URL syntax

I'm trying to add a target that I configured in an AWS EC2 container
GNU nano 2.5.3 File: prometheus.yml
# my global config
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, scrape targets every 15 seconds.
  # scrape_timeout is set to the global default (10s).
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'sit'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:8000','localhost:9100','localhost:9125', 'localhost:9102', 'localhost:8125', 'http://test-elasticloa-y0avx674hv7dr7x-1495584279.us-west-2.elb.amazonaws.com/prometheus']
However, I'm getting an error message saying that my URL is not valid. What is the correct syntax for my URL?
ERRO[0000] Error loading config: couldn't load configuration (-config.file=prometheus.yml): "http://test-elasticloa-y0avx674hv7dr7x-1495584279.us-west-2.elb.amazonaws.com/prometheus" is not a valid hostname source=main.go:149
Looking at the code, it appears you can't have forward slashes in a target. Try removing the path.
What you want here is to set the metrics_path to /prometheus for this target, though it'd be better if it served on the standard /metrics in the first place.
To give a bit of a history, it used to be the case that all addresses were full URLs. About two years ago this was changed so that addresses are just host:port to keep things cleaner. This error comes from aiding that transition.
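As a rough illustration of the split described above, the sketch below takes a full URL and separates it into the pieces Prometheus expects: a scheme, a host:port target, and a metrics_path. The function name and the returned dictionary layout are hypothetical, chosen only to mirror the config keys. Because metrics_path is set per job, the ELB target would likely need its own job rather than sharing one with the localhost targets.

```python
from urllib.parse import urlsplit


def split_scrape_url(url):
    """Break a full URL into the scheme, host:port target, and metrics path."""
    parts = urlsplit(url)
    # Fall back to the scheme's usual port when the URL does not name one.
    default_port = 443 if parts.scheme == "https" else 80
    target = f"{parts.hostname}:{parts.port or default_port}"
    return {
        "scheme": parts.scheme,
        "target": target,
        # Prometheus itself defaults to /metrics when no path is configured.
        "metrics_path": parts.path or "/metrics",
    }


url = "http://test-elasticloa-y0avx674hv7dr7x-1495584279.us-west-2.elb.amazonaws.com/prometheus"
job = split_scrape_url(url)
print(job["scheme"])        # value for the job's scheme
print(job["target"])        # entry for static_configs targets
print(job["metrics_path"])  # value for the job's metrics_path
```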

Enabling HA namenodes on a secure cluster in Cloudera Manager fails

I am running a CDH4.1.2 secure cluster and it works fine with the single namenode+secondarynamenode configuration, but when I try to enable High Availability (quorum based) from the Cloudera Manager interface it dies at step 10 of 16, "Starting the NameNode that will be transitioned to active mode namenode ([my namenode's hostname])".
Digging into the role log file gives the following fatal error:
Exception in namenode join
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: [my namenode's fqhn]:[my namenode's fqhn]:0
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:206)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:158)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:147)
at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:143)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:547)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:480)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:443)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
How can I resolve this?
It looks like you have two problems:
1) The NameNode's IP address is resolving to "my namenode's fqhn" instead of a regular hostname. Check your /etc/hosts file to fix this.
2) You need to configure dfs.https.port. With Cloudera Manager free edition, you must have had to add the appropriate configs to the safety valves to enable security. As part of that, you need to configure the dfs.https.port.
Given that this code path is traversed even in the non-HA mode, I'm surprised that you were able to get your secure NameNode to start up correctly before enabling HA. In case you haven't already, I recommend that you first enable security, test that all HDFS roles start up correctly and then enable HA.

Setting Tomcat 7 sessionid and value to be identified via Hardware Load Balancing for session affinity

Although easily done from my perspective with IIS, I'm a total noob to Tomcat and have no idea how to set static values for cookie contents. Yes I've read the security implications and eventually will access via SSL so I'm not concerned. Plus I've read the Servlet 3.0 spec about not changing the value and I accept that.
In IIS I would simply set a HTTP Header named Set-Cookie with an arbitrary setting of WebServerSID and a value of 1001.
Then, in the load balancer VIP containing this group of real servers, I would set the cookie name WebServerSID at the VIP level and assign a cookie value of 1001 to the first web server, 1002 to server 2, 1003 to server 3, and so on for the remaining machines.
This achieves session affinity via cookies until the client closes the browser.
How can this be done with Tomcat 7.0.22?
I see a great deal of configuration changes have occurred between Tomcat 6.x and 7.x with regard to cookies and how they're set up. I've tried the following after extensive research over the last week.
In web.xml: (this will disable URL rewriting under Tomcat 7.x)
<tracking-mode>COOKIE</tracking-mode> under the default session element
In context.xml: (cookies is true by default but I was explicit as I can't get it working)
cookies=true
sessionCookiePath=/
sessionCookieName=WebServerSID
sessionCookieName=1001
I have 2 entries in context.xml for sessionCookieName because the equivalent commands from Tomcat 6.x look like they've been merged into 1.
See http://tomcat.apache.org/migration-7.html#Tomcat_7.0.x_configuration_file_differences
Extract:
org.apache.catalina.SESSION_COOKIE_NAME system property: This has been removed. An equivalent effect can be obtained by configuring the sessionCookieName attribute for the global context.xml (in CATALINA_BASE/conf/context.xml).
org.apache.catalina.SESSION_PARAMETER_NAME system property: This has been removed. An equivalent effect can be obtained by configuring the sessionCookieName attribute for the global context.xml (in CATALINA_BASE/conf/context.xml).
If this is not right then I simply do not understand the syntax that is required and I cannot find anywhere that will simply spell it out in plain black and white.
Under Tomcat 6.x, I would have used Java Options in the config like:
-Dorg.apache.catalina.SESSION_COOKIE_NAME=WebServerSID
-Dorg.apache.catalina.SESSION_PARAMETER_NAME=1001
The application I'm using does not have any of these values set elsewhere so it's not the application.
All these settings are in context/web/server.xml files at the Catalina base
At the end of the day what I need to see in the response headers under Set-Cookies: (as seen using Fiddler) is:
WebServerSID=1001
NOT
JSESSIONID=as8sd9787ksjds9d8sdjks89s898
thanks in advance
regards
The best you can do purely with configuration is to set the jvmRoute attribute of the Engine which will add the constant value to the end of the session ID. Most load-balancers can handle that. It would look like:
JSESSIONID=as8sd9787ksjds9d8sdjks89s898.route1
If that isn't good enough and you need WebServerSID=1001, you'll have to write a servlet Filter and configure it to add the header on every response.
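For the jvmRoute option, the following sketch shows roughly what a load balancer does with the suffix: it takes everything after the last dot in the session ID and maps it to a backend. The function name, the route names other than route1, and the backend addresses are all hypothetical. Real load balancers implement this in their own configuration language, not in Python.

```python
def route_for_session(session_id, backends):
    """Return the backend mapped to the jvmRoute suffix of a session ID, or None."""
    _, dot, route = session_id.rpartition(".")
    if not dot:
        # No suffix yet (for example a brand-new client): any server may take it.
        return None
    return backends.get(route)


# Hypothetical mapping of jvmRoute values to real servers.
backends = {"route1": "10.0.0.1:8080", "route2": "10.0.0.2:8080"}

sticky = route_for_session("as8sd9787ksjds9d8sdjks89s898.route1", backends)
print(sticky)
```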