BOSH Director Installation fails on vSphere - cloud-foundry
This is my first BOSH installation for PKS.
Environment:
- vSphere 6.5 with VCSA 6.5u2
- Ops Manager 2.2 build 296
- BOSH stemcell vsphere-ubuntu-trusty build 3586.25
Using a flat 100.x network, no routing/firewall involved.
Summary - After deploying the Ops Manager OVF template, I'm configuring and installing the BOSH Director.
However, it fails at "Waiting for Agent" in the dashboard.
A look at the 'current' log on the Ops Manager VM shows that the agent keeps trying to read settings from /dev/sr0, because agent.json specifies the settings Source as CDROM.
It cannot find any CDROM, so it fails.
A few questions:
How do I log in to the VM that BOSH creates when I change the setting to "default BOSH password" for all VMs in Ops Mgr?
There is no bosh.yml under /var/tempest/workspaces/default/deployments, although some docs point to it, so I don't know what settings it's applying. Is the location wrong?
Is there a way to change the stemcell used by the OpsMgr VM? Maybe I can try using the previous build?
How is the agent.json actually populated?
Any suggestions on troubleshooting this?
All logs/jsons below:
the GUI dashboard log:
===== 2018-07-30 08:20:52 UTC Running "/usr/local/bin/bosh --no-color --non-interactive --tty create-env /var/tempest/workspaces/default/deployments/bosh.yml"
Deployment manifest: '/var/tempest/workspaces/default/deployments/bosh.yml'
Deployment state: '/var/tempest/workspaces/default/deployments/bosh-state.json'
Started validating
Validating release 'bosh'... Finished (00:00:00)
Validating release 'bosh-vsphere-cpi'... Finished (00:00:00)
Validating release 'uaa'... Finished (00:00:00)
Validating release 'credhub'... Finished (00:00:01)
Validating release 'bosh-system-metrics-server'... Finished (00:00:01)
Validating release 'os-conf'... Finished (00:00:00)
Validating release 'backup-and-restore-sdk'... Finished (00:00:04)
Validating release 'bpm'... Finished (00:00:02)
Validating cpi release... Finished (00:00:00)
Validating deployment manifest... Finished (00:00:00)
Validating stemcell... Finished (00:00:14)
Finished validating (00:00:26)
Started installing CPI
Compiling package 'ruby-2.4-r4/0cdc60ed7fdb326e605479e9275346200af30a25'... Finished (00:00:00)
Compiling package 'vsphere_cpi/e1a84e5bd82eb1abfe9088a2d547e2cecf6cf315'... Finished (00:00:00)
Compiling package 'iso9660wrap/82cd03afdce1985db8c9d7dba5e5200bcc6b5aa8'... Finished (00:00:00)
Installing packages... Finished (00:00:15)
Rendering job templates... Finished (00:00:06)
Installing job 'vsphere_cpi'... Finished (00:00:00)
Finished installing CPI (00:00:23)
Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-trusty-go_agent/3586.25'... Skipped [Stemcell already uploaded] (00:00:00)
Started deploying
Waiting for the agent on VM 'vm-87b3299a-a994-4544-8043-032ce89d685b'... Failed (00:00:11)
Deleting VM 'vm-87b3299a-a994-4544-8043-032ce89d685b'... Finished (00:00:10)
Creating VM for instance 'bosh/0' from stemcell 'sc-536fea79-cfa6-46a9-a53e-9de19505216f'... Finished (00:00:12)
Waiting for the agent on VM 'vm-fb90eee8-f3ac-45b7-95d3-4e8483c91a5c' to be ready... Failed (00:09:59)
Failed deploying (00:10:38)
Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)
Deploying:
Creating instance 'bosh/0':
Waiting until instance is ready:
Post https://vcap:<redacted>@192.168.100.201:6868/agent: dial tcp 192.168.100.201:6868: connect: no route to host
Exit code 1
===== 2018-07-30 08:32:20 UTC Finished "/usr/local/bin/bosh --no-color --non-interactive --tty create-env /var/tempest/workspaces/default/deployments/bosh.yml"; Duration: 688s; Exit Status: 1
Exited with 1.
The bosh-state.json:
ubuntu@opsmanager-2-2:~$ sudo cat /var/tempest/workspaces/default/deployments/bosh-state.json
{
"director_id": "851f70ef-7c4b-4c65-73ed-d382ad3df1b7",
"installation_id": "f29df8af-7141-4aff-5e52-2d109a84cd84",
"current_vm_cid": "vm-87b3299a-a994-4544-8043-032ce89d685b",
"current_stemcell_id": "dcca340c-d612-4098-7c90-479193fa9090",
"current_disk_id": "",
"current_release_ids": [],
"current_manifest_sha": "",
"disks": null,
"stemcells": [
{
"id": "dcca340c-d612-4098-7c90-479193fa9090",
"name": "bosh-vsphere-esxi-ubuntu-trusty-go_agent",
"version": "3586.25",
"cid": "sc-536fea79-cfa6-46a9-a53e-9de19505216f"
}
],
"releases": []
}
The agent.json
ubuntu@opsmanager-2-2:~$ sudo cat /var/vcap/bosh/agent.json
{
"Platform": {
"Linux": {
"DevicePathResolutionType": "scsi"
}
},
"Infrastructure": {
"Settings": {
"Sources": [
{
"Type": "CDROM",
"FileName": "env"
}
]
}
}
}
ubuntu@opsmanager-2-2:~$
Finally, the current BOSH log
/var/vcap/bosh/log/current
2018-07-30_08:42:22.69934 [main] 2018/07/30 08:42:22 DEBUG - Starting agent
2018-07-30_08:42:22.69936 [File System] 2018/07/30 08:42:22 DEBUG - Reading file /var/vcap/bosh/agent.json
2018-07-30_08:42:22.69937 [File System] 2018/07/30 08:42:22 DEBUG - Read content
2018-07-30_08:42:22.69937 ********************
2018-07-30_08:42:22.69938 {
2018-07-30_08:42:22.69938 "Platform": {
2018-07-30_08:42:22.69939 "Linux": {
2018-07-30_08:42:22.69939
2018-07-30_08:42:22.69939 "DevicePathResolutionType": "scsi"
2018-07-30_08:42:22.69939 }
2018-07-30_08:42:22.69939 },
2018-07-30_08:42:22.69939 "Infrastructure": {
2018-07-30_08:42:22.69940 "Settings": {
2018-07-30_08:42:22.69940 "Sources": [
2018-07-30_08:42:22.69940 {
2018-07-30_08:42:22.69940 "Type": "CDROM",
2018-07-30_08:42:22.69940 "FileName": "env"
2018-07-30_08:42:22.69940 }
2018-07-30_08:42:22.69941 ]
2018-07-30_08:42:22.69941 }
2018-07-30_08:42:22.69941 }
2018-07-30_08:42:22.69941 }
2018-07-30_08:42:22.69941
2018-07-30_08:42:22.69941 ********************
2018-07-30_08:42:22.69943 [File System] 2018/07/30 08:42:22 DEBUG - Reading file /var/vcap/bosh/etc/stemcell_version
2018-07-30_08:42:22.69944 [File System] 2018/07/30 08:42:22 DEBUG - Read content
2018-07-30_08:42:22.69944 ********************
2018-07-30_08:42:22.69944 3586.25
2018-07-30_08:42:22.69944 ********************
2018-07-30_08:42:22.69945 [File System] 2018/07/30 08:42:22 DEBUG - Reading file /var/vcap/bosh/etc/stemcell_git_sha1
2018-07-30_08:42:22.69946 [File System] 2018/07/30 08:42:22 DEBUG - Read content
2018-07-30_08:42:22.69946 ********************
2018-07-30_08:42:22.69946 dbbb73800373356315a4c16ee40d2db3189bf2db
2018-07-30_08:42:22.69947 ********************
2018-07-30_08:42:22.69948 [App] 2018/07/30 08:42:22 INFO - Running on stemcell version '3586.25' (git: dbbb73800373356315a4c16ee40d2db3189bf2db)
2018-07-30_08:42:22.69949 [File System] 2018/07/30 08:42:22 DEBUG - Checking if file exists /var/vcap/bosh/agent_state.json
2018-07-30_08:42:22.69950 [File System] 2018/07/30 08:42:22 DEBUG - Stat '/var/vcap/bosh/agent_state.json'
2018-07-30_08:42:22.69951 [Cmd Runner] 2018/07/30 08:42:22 DEBUG - Running command 'bosh-agent-rc'
2018-07-30_08:42:22.70116 [unlimitedRetryStrategy] 2018/07/30 08:42:22 DEBUG - Making attempt #0
2018-07-30_08:42:22.70117 [DelayedAuditLogger] 2018/07/30 08:42:22 DEBUG - Starting logging to syslog...
2018-07-30_08:42:22.70181 [Cmd Runner] 2018/07/30 08:42:22 DEBUG - Stdout:
2018-07-30_08:42:22.70182 [Cmd Runner] 2018/07/30 08:42:22 DEBUG - Stderr:
2018-07-30_08:42:22.70183 [Cmd Runner] 2018/07/30 08:42:22 DEBUG - Successful: true (0)
2018-07-30_08:42:22.70184 [settingsService] 2018/07/30 08:42:22 DEBUG - Loading settings from fetcher
2018-07-30_08:42:22.70185 [ConcreteUdevDevice] 2018/07/30 08:42:22 DEBUG - Kicking device, attempt 0 of 5
2018-07-30_08:42:22.70187 [ConcreteUdevDevice] 2018/07/30 08:42:22 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:23.20204 [ConcreteUdevDevice] 2018/07/30 08:42:23 DEBUG - Kicking device, attempt 1 of 5
2018-07-30_08:42:23.20206 [ConcreteUdevDevice] 2018/07/30 08:42:23 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:23.70217 [ConcreteUdevDevice] 2018/07/30 08:42:23 DEBUG - Kicking device, attempt 2 of 5
2018-07-30_08:42:23.70220 [ConcreteUdevDevice] 2018/07/30 08:42:23 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:24.20229 [ConcreteUdevDevice] 2018/07/30 08:42:24 DEBUG - Kicking device, attempt 3 of 5
2018-07-30_08:42:24.20294 [ConcreteUdevDevice] 2018/07/30 08:42:24 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:24.70249 [ConcreteUdevDevice] 2018/07/30 08:42:24 DEBUG - Kicking device, attempt 4 of 5
2018-07-30_08:42:24.70253 [ConcreteUdevDevice] 2018/07/30 08:42:24 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:25.20317 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:25.20320 [ConcreteUdevDevice] 2018/07/30 08:42:25 ERROR - Failed to red byte from device: open /dev/sr0: no such file or directory
2018-07-30_08:42:25.20321 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Settling UdevDevice
2018-07-30_08:42:25.20322 [Cmd Runner] 2018/07/30 08:42:25 DEBUG - Running command 'udevadm settle'
2018-07-30_08:42:25.20458 [Cmd Runner] 2018/07/30 08:42:25 DEBUG - Stdout:
2018-07-30_08:42:25.20460 [Cmd Runner] 2018/07/30 08:42:25 DEBUG - Stderr:
2018-07-30_08:42:25.20461 [Cmd Runner] 2018/07/30 08:42:25 DEBUG - Successful: true (0)
2018-07-30_08:42:25.20462 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Ensuring Device Readable, Attempt 0 out of 5
2018-07-30_08:42:25.20463 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:25.20464 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:25.70473 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Ensuring Device Readable, Attempt 1 out of 5
2018-07-30_08:42:25.70476 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:25.70477 [ConcreteUdevDevice] 2018/07/30 08:42:25 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:26.20492 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - Ensuring Device Readable, Attempt 2 out of 5
2018-07-30_08:42:26.20496 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:26.20497 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:26.70509 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - Ensuring Device Readable, Attempt 3 out of 5
2018-07-30_08:42:26.70512 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:26.70513 [ConcreteUdevDevice] 2018/07/30 08:42:26 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.20530 [ConcreteUdevDevice] 2018/07/30 08:42:27 DEBUG - Ensuring Device Readable, Attempt 4 out of 5
2018-07-30_08:42:27.20533 [ConcreteUdevDevice] 2018/07/30 08:42:27 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:27.20534 [ConcreteUdevDevice] 2018/07/30 08:42:27 DEBUG - Ignorable error from readByte: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.70554 [ConcreteUdevDevice] 2018/07/30 08:42:27 DEBUG - readBytes from file: /dev/sr0
2018-07-30_08:42:27.70557 [settingsService] 2018/07/30 08:42:27 ERROR - Failed loading settings via fetcher: Getting settings from all sources: Reading files from CDROM: Waiting for CDROM to be ready: Reading udev device: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.70559 [settingsService] 2018/07/30 08:42:27 ERROR - Failed reading settings from file Opening file /var/vcap/bosh/settings.json: open /var/vcap/bosh/settings.json: no such file or directory
2018-07-30_08:42:27.70560 [main] 2018/07/30 08:42:27 ERROR - App setup Running bootstrap: Fetching settings: Invoking settings fetcher: Getting settings from all sources: Reading files from CDROM: Waiting for CDROM to be ready: Reading udev device: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.70561 [main] 2018/07/30 08:42:27 ERROR - Agent exited with error: Running bootstrap: Fetching settings: Invoking settings fetcher: Getting settings from all sources: Reading files from CDROM: Waiting for CDROM to be ready: Reading udev device: open /dev/sr0: no such file or directory
2018-07-30_08:42:27.71258 [main] 2018/07/30 08:42:27 DEBUG - Starting agent
<and this whole block just keeps repeating>
How do I log in to the VM that BOSH creates when I change the setting to "default BOSH password" for all VMs in Ops Mgr?
That's not a good idea. The default password is well-known and you should almost always use randomly generated passwords. I'm not honestly sure why that's even an option. The only thing that comes to mind might be some extremely rare troubleshooting scenario.
That said, you can securely obtain the randomly generated password through Ops Manager, if you need to access the VM manually. You can also securely access VMs via bosh ssh, and credentials are handled automatically. Even for troubleshooting, you don't usually need that option.
There is no bosh.yml under /var/tempest/workspaces/default/deployments. Some docs point to it, so I don't know what settings it's applying. Is the location wrong?
The location is correct but the file contains sensitive information so Ops Manager deletes it immediately after it's done being used.
If you want to see the contents of the file, the easy way is to navigate to https://ops-man-fqdn/debug/files, where you can see all of the configuration files, including your bosh.yml. The hard way is to watch that folder while a deploy is running; the file exists for a short period of time, and you can make a copy during that window. The only advantage of the hard way is that you get the actual file, whereas the debug endpoint redacts sensitive info.
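If you go the hard way, a small watch loop can grab the manifest during the brief window it exists. This is only a sketch: the function name, polling interval, and attempt count are mine, not anything Ops Manager provides.

```shell
#!/bin/bash
# Copy a file to a safe location as soon as it appears.
# Usage: capture_manifest <path-to-watch> <destination> [max-attempts]
capture_manifest() {
  local src="$1" dst="$2" attempts="${3:-600}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ -f "$src" ]; then
      cp "$src" "$dst"
      echo "captured $src -> $dst"
      return 0
    fi
    i=$((i + 1))
    sleep 0.5
  done
  echo "gave up waiting for $src" >&2
  return 1
}

# Run this in a second shell, then kick off the deploy:
# capture_manifest /var/tempest/workspaces/default/deployments/bosh.yml /tmp/bosh.yml
```

Remember that the captured copy contains unredacted credentials, so delete it when you're done.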
Is there a way to change the stemcell used by the OpsMgr VM? Maybe I can try using the previous build?
I don't think this is an issue with the stemcell. There are lots of people using those and not having this issue. If a larger issue like this were found with a stemcell, you would see a notice up on Pivotal Network and Pivotal would publish a new, fixed stemcell.
The problem seems to be with how the VM is receiving its initial bootstrap configuration. I'd suggest looking into that before messing with the stemcells. See below.
How is the agent.json actually populated?
Believe it or not, for vSphere environments, that file is read from a fake CD-ROM that's attached to the VM. There's not a lot documented, but it's mentioned briefly in the BOSH docs here.
https://bosh.io/docs/cpi-api-v1-method/create-vm/#agent-settings
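For context, the `env` file on that CD-ROM is a JSON document of agent settings. The exact contents vary by deployment; the snippet below just prints an illustrative shape with made-up placeholder values, so treat every field value (and the selection of fields) as an assumption rather than a real example.

```shell
# Illustrative (made-up) example of the agent settings JSON that the BOSH
# agent reads from the 'env' file on the attached CD-ROM. All values below
# are placeholders, not from a real deployment.
settings_json='{
  "agent_id": "agent-xxxxxxxx",
  "vm": { "name": "vm-xxxxxxxx" },
  "mbus": "https://vcap:REDACTED@0.0.0.0:6868",
  "networks": {
    "default": {
      "type": "manual",
      "ip": "192.168.100.201",
      "netmask": "255.255.255.0",
      "gateway": "192.168.100.1"
    }
  },
  "ntp": ["0.pool.ntp.org"],
  "disks": { "system": "0", "ephemeral": "1" }
}'
printf '%s\n' "$settings_json"
```

The mbus entry matters here: it's the agent endpoint on port 6868 that the `create-env` log above was failing to reach.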
Any suggestions on troubleshooting this?
Start by understanding why the CD-ROM can't be mounted. BOSH needs it to get its bootstrap configuration, so you need to make that work. If something in your vSphere environment is preventing the CD-ROM from being attached or mounted, you'll need to change it.
If there's nothing on the vSphere side, I think the next step would be to check the standard system logs under /var/log and dmesg output to see if there are any errors or clues as to why the CD-ROM can't be loaded/read from.
Lastly, try doing some manual tests to mount & read from the CD-ROM. Start by looking at one of the BOSH deployed VMs in the vSphere client, look at the hardware settings and make sure there is a CD-ROM attached. It should point to a file called env.iso in the same folder as the VM on your datastore. If that's attached & connected, start up the VM and try to mount the CD-ROM. You should be able to see the BOSH config files on that drive.
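Those manual checks can be scripted along these lines (a sketch; the function name is mine, the device and mount point are parameters, and mounting requires root):

```shell
#!/bin/bash
# Probe whether a CD-ROM block device exists, then mount it read-only
# and list the files it carries (for a BOSH VM, expect the 'env' file).
# Usage (as root): probe_cdrom <device> <mountpoint>
probe_cdrom() {
  local dev="$1" mnt="$2"
  if [ ! -b "$dev" ]; then
    echo "no such block device: $dev" >&2
    return 1
  fi
  mkdir -p "$mnt"
  if mount -o ro -t iso9660 "$dev" "$mnt"; then
    ls -l "$mnt"      # the BOSH config files should appear here
    umount "$mnt"
  else
    echo "could not mount $dev" >&2
    return 1
  fi
}

# e.g. (as root, on the failing VM): probe_cdrom /dev/sr0 /mnt/bosh-env
```

If the device node doesn't exist at all, that points back at vSphere (CD-ROM not attached/connected) rather than at the guest.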
Hope that helps!
Old thread, but maybe it will help someone: there's a firewall in vCenter that can prevent the agent from talking to the BOSH Director.
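If you suspect the network path, a quick TCP reachability check against the agent's mbus port (6868 in the log above) can confirm it. This is a sketch using bash's /dev/tcp redirection with a timeout; the function name is mine.

```shell
#!/bin/bash
# Check TCP reachability of the BOSH agent's mbus endpoint.
# Usage: check_agent_port <host> <port> [timeout-seconds]
check_agent_port() {
  local host="$1" port="$2" t="${3:-3}"
  # /dev/tcp/<host>/<port> is a bash builtin pseudo-path; opening it
  # attempts a TCP connection. 'timeout' bounds the attempt.
  if timeout "$t" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable: $host:$port"
    return 0
  fi
  echo "NOT reachable: $host:$port (firewall? wrong network?)" >&2
  return 1
}

# e.g. from the Ops Manager VM, against the failed director VM's IP:
# check_agent_port 192.168.100.201 6868
```

A "no route to host" error like the one in the deploy log usually means an ICMP rejection from a firewall or router in the path, rather than a plain timeout.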