I need your pro help to figure out what my issue is with ES and Logstash.
I'm using Elasticsearch 1.1.0 and Logstash 1.4.0 to push logs to ES and Kibana.
My servers are located in AWS (master: 4 vCores, 8 ECU, 15 GB RAM).
The data node has the same specs.
My Logstash setup:
I download the log files from S3 and put them locally on the server; Logstash then picks them up and pushes them to the ES cluster.
Logstash, Kibana, and the ES master are all located on one server.
The files are each about 12 MB in size, and I have more than 20,000 of them.
My ES configuration (master):
cluster.name: MY-CLUSTER-NAME
node.name: MY-NODE-NAME
node.master: true
node.data: true
path.data: /PATH_TO_DATA/data
path.logs: /PATH_TO_LOGS/logs
My ES configuration (data node):
cluster.name: MY-CLUSTER-NAME   # same name as on the master
node.name: MY-NODE-NAME         # a different node name
node.master: false              # this is a data node, not a master
node.data: true
path.data: /PATH_TO_DATA/data
path.logs: /PATH_TO_LOGS/logs
To check the cluster status:
http://MASTER_IP:9200/_cluster/health
This is the result:
{
"cluster_name": "es-cluster-onetagv2",
"status": "green",
"timed_out": false,
"number_of_nodes": 2,
"number_of_data_nodes": 2,
"active_primary_shards": 5,
"active_shards": 10,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0
}
My Java version (I don't know if this is important):
java version "1.7.0_51"
OpenJDK Runtime Environment (amzn-2.4.4.1.36.amzn1-x86_64 u51-b02)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
My issue is that I'm trying to push more than 400 million hits per day, but in 24 hours I can only push about 60 million, so I'm always falling behind.
I can also see that ES is at 100% CPU usage, but I don't know whether that is the problem.
Maybe you can guide me on what I'm doing wrong and how I can push large volumes of logs to ES quickly.
Split the input file and give each piece a different name so that multiple inputs can work on it; this helps Logstash read multiple files in parallel on different nodes.
Also, increase the cluster to 4 ES data nodes and 2 ES masters. This can be achieved by running two Logstash instances (whose embedded ES nodes provide 2 masters and 2 data nodes) plus 2 dedicated Elasticsearch nodes (which provide 2 more data nodes).
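The splitting can be done with the standard split utility; a minimal sketch (the filename and chunk size below are made-up examples, not taken from the question):

```shell
# Create a sample log file (stand-in for one of the 12 MB files pulled from S3)
seq 1 100000 > app.log
# Split it into 25000-line chunks named app.log.part-aa, app.log.part-ab, ...
# so that several Logstash file inputs (or instances) can each pick up a
# distinct subset of files.
split -l 25000 app.log app.log.part-
ls app.log.part-*
```

Each chunk gets a unique name, so multiple file inputs never fight over the same file.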
I am trying out various options for setting Spark driver memory in YARN.
Use case:
I have a Spark cluster with 1 master and 2 slaves:
master: r5d.xlarge (8 vCores, 32 GB)
slave: r5d.xlarge (8 vCores, 32 GB)
I am using Apache Zeppelin to run queries on the Spark cluster. The Spark interpreter is configured with the properties provided by Zeppelin. I am using Spark 2.3.1 running on YARN. I want to create 4 interpreters so that 4 users can use this cluster in parallel.
Config 1:
spark.submit.deployMode client
spark.driver.cores 7
spark.driver.memory 24G
spark.driver.memoryOverhead 3072M
spark.executor.cores 1
spark.executor.memory 3G
spark.executor.memoryOverhead 512M
spark.yarn.am.cores 1
spark.yarn.am.memory 3G
spark.yarn.am.memoryOverhead 512M
Below is the Spark executor UI (screenshot omitted):
Config 2:
spark.submit.deployMode client
spark.driver.cores 7
spark.driver.memory 12G
spark.driver.memoryOverhead 3072M
spark.executor.cores 1
spark.executor.memory 3G
spark.executor.memoryOverhead 512M
spark.yarn.am.cores 1
spark.yarn.am.memory 3G
spark.yarn.am.memoryOverhead 512M
Below is the Spark executor UI (screenshot omitted):
Questions:
Why is the container size of the driver 0?
Is spark.memory.fraction calculated as (spark.driver.memory - 300) * 0.6? If so, why is it not exact? (14.22 GB and 7.02 GB respectively)
Why is the container size of the executor 3.8 GB? According to my configuration it should be 3G + 512M = 3.5 GB. This issue was not there with Spark 2.1.
The number of vCores available to YARN is 8 per node. How is this possible, since AWS advertises vCPUs for its instances? According to AWS I should only be getting 4 vCores.
https://aws.amazon.com/ec2/instance-types/r5/
If I want to use 4 interpreters, should I distribute the master's 32 GB equally among all the interpreters?
Driver:
spark.driver.cores 2
spark.driver.memory 7G
spark.driver.memoryOverhead 1024M
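On the second question: treating the 300 MB reservation as 0.3 GB, the formula does reproduce the numbers shown. A quick check (a sketch assuming Spark 2.x's unified memory model with the default spark.memory.fraction of 0.6):

```shell
# Unified memory ≈ (spark.driver.memory - 300 MB reserved) * spark.memory.fraction
awk 'BEGIN {
  printf "24G driver: %.2f GB\n", (24 - 0.3) * 0.6   # Config 1
  printf "12G driver: %.2f GB\n", (12 - 0.3) * 0.6   # Config 2
}'
# prints 14.22 and 7.02, matching the values in the executor UI
```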
Has anyone faced this issue with docker pull? We recently upgraded Docker to 18.03.1-ce, and since then we have been seeing the issue. We are not sure this is actually related to Docker, but we want to know if anyone else has faced this problem.
We have done some troubleshooting using tcpdump: the DNS queries being made were under the permissible EC2 limit of 1024 packets. We also tried working around the issue by modifying /etc/resolv.conf to use higher retry/timeout values, but that didn't seem to help.
We went through a packet capture line by line and found something: some of the responses are negative. If you use Wireshark, you can use 'udp.stream eq 12' as a filter to view one of the negative answers; the resolver sends the answer "No such name". All the requests that get a negative response use the following name:
354XXXXX.dkr.ecr.us-east-1.amazonaws.com.ec2.internal
Would anyone happen to know why ec2.internal is being appended to the end of the DNS name? If I run a dig against this name, it fails. So it appears that a wrong name is being sent to the server, which responds with 'no such host'. Is Docker sending a wrong DNS name for resolution?
We see this issue happening intermittently. Looking forward to your help. Thanks in advance.
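One plausible mechanism (an assumption, not confirmed from the capture): EC2's DHCP setup places a search domain in /etc/resolv.conf, and when a lookup fails or times out the resolver retries the name with that suffix appended, producing exactly the ...ec2.internal names seen above. A typical resolv.conf on such a host might look like:

```
# /etc/resolv.conf (illustrative values, not taken from the affected host)
search ec2.internal
nameserver 10.5.0.2
options timeout:2 attempts:5
```

Appending a trailing dot to the registry name (making it fully qualified) bypasses the search list, which can help confirm whether this suffixing is the culprit.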
Expected behaviour
5.0.25_61: Pulling from rrg
Digest: sha256:50bbce4af6749e9a976f0533c3b50a0badb54855b73d8a3743473f1487fd223e
Status: Downloaded newer image for XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/rrg:5.0.25_61
Actual behaviour
docker-compose up -d rrg-node-1
Creating rrg-node-1
ERROR: for rrg-node-1 Cannot create container for service rrg-node-1: Error response from daemon: Get https://XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/v2/: dial tcp: lookup XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com on 10.5.0.2:53: no such host
Steps to reproduce the issue
docker pull XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/rrg:5.0.25_61
Output of docker version:
Docker version 18.03.1-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
Output of docker info:
[ec2-user@ip-10-5-3-45 ~]$ docker info
Containers: 37
Running: 36
Paused: 0
Stopped: 1
Images: 60
Server Version: swarm/1.2.5
Role: replica
Primary: 10.5.4.172:3375
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 12
Plugins:
Volume:
Network:
Log:
Swarm:
NodeID:
Is Manager: false
Node Address:
Kernel Version: 4.14.51-60.38.amzn1.x86_64
Operating System: linux
Architecture: amd64
CPUs: 22
Total Memory: 80.85GiB
Name: mgr1
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Live Restore Enabled: false
WARNING: No kernel memory limit support
I recently upgraded to H2O-3.11.0.3820, and my web-based Flow UI is no longer working.
When I go to the link, I get a light-blue screen with all the options I used to see missing.
I was unable to find anything relevant on Stack Overflow. Is anyone else facing a similar issue? Any help will be much appreciated!
Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
OpenJDK 64-Bit Server VM (Zulu 8.17.0.3-win64) (build 25.102-b14, mixed mode)
Starting server from C:\Users\shekh\Anaconda2\lib\site-packages\h2o\backend\bin\h2o.jar
Ice root: c:\users\shekh\appdata\local\temp\tmpvrxqvd
JVM stdout: c:\users\shekh\appdata\local\temp\tmpvrxqvd\h2o_shekh_started_from_python.out
JVM stderr: c:\users\shekh\appdata\local\temp\tmpvrxqvd\h2o_shekh_started_from_python.err
Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.
-------------------------- ------------------------------
H2O cluster uptime: 03 secs
H2O cluster version: 3.11.0.3820
H2O cluster version age: 5 days
H2O cluster name: H2O_from_python_shekh_vdbwfl
H2O cluster total nodes: 1
H2O cluster free memory: 51.56 Gb
H2O cluster total cores: 0
H2O cluster allowed cores: 0
H2O cluster status: accepting new members, healthy
H2O connection url: http://127.0.0.1:54321
H2O connection proxy:
H2O internal security: False
Python version: 2.7.12 final
-------------------------- ------------------------------
As you discovered, install a stable release of H2O-3.
Nightly builds are numbered 3.ODD.y.z.
Stable builds are numbered 3.EVEN.y.z.
(So 3.11.0.3820 is a bleeding-edge nightly build, not a stable build.)
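The even/odd rule above can be checked mechanically; a small sketch (the second version string is just an example of a stable line):

```shell
# Second version component even => stable release line; odd => nightly build
minor_of() { echo "$1" | cut -d. -f2; }
for ver in 3.11.0.3820 3.10.0.1; do
  if [ $(( $(minor_of "$ver") % 2 )) -eq 0 ]; then
    echo "$ver: stable"
  else
    echo "$ver: nightly"
  fi
done
# prints "3.11.0.3820: nightly" and "3.10.0.1: stable"
```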
I am trying to create a Kafka topic on an EC2 instance.
I am following this documentation: https://aws.amazon.com/blogs/big-data/real-time-stream-processing-using-apache-spark-streaming-and-apache-kafka-on-aws/
but I am getting the following error. Please help.
[ec2-user@ip-10-100-53-218 bin]$ ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Error while executing topic command : replication factor: 1 larger than available brokers: 0
[2017-03-20 12:25:30,045] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: replication factor: 1 larger than available brokers: 0
(kafka.admin.TopicCommand$)
The Kafka broker is not running. SSH into the Kafka broker instance and check whether kafka-server-start.sh is running:
ps -ef | grep kafka-server-start
If it is not running, start it:
nohup /app/kafka/kafka_2.9.2-0.8.2.1/bin/kafka-server-start.sh /app/kafka/kafka_2.9.2-0.8.2.1/config/server.properties &
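Combined into one snippet (a sketch; the install path is the one from the tutorial and may differ on your instance). The bracketed grep pattern keeps the grep process itself out of the match:

```shell
KAFKA_HOME=/app/kafka/kafka_2.9.2-0.8.2.1
# '[k]afka-server-start' matches the broker process but not this grep itself
if ps -ef | grep '[k]afka-server-start' > /dev/null; then
  echo "broker already running"
else
  nohup "$KAFKA_HOME/bin/kafka-server-start.sh" \
        "$KAFKA_HOME/config/server.properties" > /dev/null 2>&1 &
  echo "broker started"
fi
```

Once the broker is up, re-run the kafka-topics.sh command above.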
I would like to do some real-world data testing on Micro Cloud Foundry, but the capacity of the Postgres database is limited to 256 MB, which is not sufficient for my testing. Is there a way to increase the DB capacity temporarily, in offline mode, for testing?
If not, can somebody point me to the latest instructions for setting up a private Cloud Foundry server on Ubuntu Server 12.04?
You can SSH into the instance, change the configuration for MySQL, and restart the service.
SSH to the MCF instance:
$ ssh vcap@api.<your-mcf-instance-name-here>.cloudfoundry.me
Note: if you can't remember the password for the vcap user, you can change it via the VM console menu by selecting option 3.
Edit the mysql_node configuration file:
$ vi /var/vcap/jobs/mysql_node/config/mysql_node.yml
The file should look something like (or exactly like) this:
---
local_db: sqlite3:/var/vcap/store/mysql_node.db
base_dir: /var/vcap/store/mysql
mbus: nats://nats:f5dc63f74be5e38f@127.0.0.1:4222
index: 0
logging:
  level: debug
  file: /var/vcap/sys/log/mysql_node/mysql_node.log
pid: /var/vcap/sys/run/mysql_node/mysql_node.pid
available_storage: 2048
node_id: mysql_node_1
max_db_size: 256
max_long_query: 3
mysql:
  host: localhost
  port: 3306
  socket: /var/vcap/sys/run/mysqld/mysqld.sock
  user: root
  pass: dc64fad710976ea5
migration_nfs: /var/vcap/services_migration
max_long_tx: 0
max_user_conns: 20
mysqldump_bin: /var/vcap/packages/mysql/bin/mysqldump
mysql_bin: /var/vcap/packages/mysql/bin/mysql
gzip_bin: /bin/gzip
ip_route: 127.0.0.1
z_interval: 30
max_nats_payload: 1048576
The two lines you are interested in are:
available_storage: 2048
and
max_db_size: 256
The first is the maximum amount of disk storage made available to MySQL; the second is the maximum size per MySQL DB instance. Set these to your desired values; obviously available_storage has to be larger than max_db_size, and also a multiple of that value.
Save the file and then restart the VM (shut it down via the menu in the VM console, or do it via SSH) and you should be good to go!
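The two edits can also be scripted; a minimal sketch (the new values 4096 and 1024 are examples; on the MCF VM the real file is the mysql_node.yml path shown above, while the demo below works on a local copy):

```shell
CONF=mysql_node.yml
# Stand-in for the real config; only the two lines we change matter here
printf 'available_storage: 2048\nmax_db_size: 256\n' > "$CONF"
# Raise total storage to 4096 MB and the per-instance cap to 1024 MB;
# available_storage stays larger than (and a multiple of) max_db_size.
sed -i 's/^available_storage: .*/available_storage: 4096/' "$CONF"
sed -i 's/^max_db_size: .*/max_db_size: 1024/' "$CONF"
cat "$CONF"
# available_storage: 4096
# max_db_size: 1024
```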