I am building an .ova file using a base image (in .vmx format) as the source.
The base image is built from an Ubuntu 16.04 Server ISO using the vmware-iso builder.
Here is my builder configuration:
"builders": [
{
"type": "vmware-vmx",
"vmx_data": {
"memsize": "8192",
"numvcpus": "4"
},
"source_path": "path/to/base.vmx",
The first provisioner that will run is the following:
"provisioners": [
{
"type": "shell",
"inline": [
"sudo apt-get update -y",
"sudo apt-get upgrade -y",
...
However, although I have run this process many times before without problems, it now suddenly breaks with the following error:
==> vmware-vmx: Cloning source VM...
==> vmware-vmx: Starting HTTP server on port 8031
==> vmware-vmx: Starting virtual machine...
==> vmware-vmx: Waiting 10s for boot...
==> vmware-vmx: Connecting to VM via VNC (127.0.0.1:5924)
==> vmware-vmx: Typing the boot command over VNC...
==> vmware-vmx: Waiting for SSH to become available...
==> vmware-vmx: Connected to SSH!
==> vmware-vmx: Provisioning with shell script: /tmp/packer-shell747369685
vmware-vmx: Reading package lists...
vmware-vmx: E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
vmware-vmx: E: Unable to lock directory /var/lib/apt/lists/
==> vmware-vmx: Stopping virtual machine...
==> vmware-vmx: Deleting output directory...
Build 'vmware-vmx' errored: Script exited with non-zero exit status: 100
See: Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?
The lock is placed while an apt process is running and is removed when that process completes. If there is a lock but no apparent process running, the process that took it may have gotten stuck for some reason.
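If the lock is being taken by apt-daily / unattended-upgrades kicking in right after boot (a common cause on freshly booted Ubuntu 16.04 systems, though that is an assumption here), one workaround is to have the first shell provisioner wait until the lock is free before calling apt-get. A minimal sketch (fuser comes from psmisc):
# Wait until nothing is holding the apt/dpkg locks
while sudo fuser /var/lib/apt/lists/lock /var/lib/dpkg/lock >/dev/null 2>&1; do
  echo "apt is locked by another process, waiting..."
  sleep 5
done
sudo apt-get update -y
sudo apt-get upgrade -y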
I am trying to install the AWS Replication Agent on a CentOS 8.3 machine, and it always returns an error during the replication agent installation process (python3 aws-replication-installer-init.py ......).
The output of the process shows:
The installation of the AWS Replication Agent has started.
Identifying volumes for replication.
Identified volume for replication: /dev/sdb of size 7 GiB
Identified volume for replication: /dev/sda of size 11 GiB
All volumes for replication were successfully identified.
Downloading the AWS Replication Agent onto the source server... Finished.
Installing the AWS Replication Agent onto the source server...
Error: Failed Installing the AWS Replication Agent
Installation failed.
If I check aws_replication_agent_installer.log, I can see messages like the following:
make -C /lib/modules/4.18.0-348.2.1.el8_5.x86_64/build M=/tmp/tmp8mdbz3st/AgentDriver modules
.....................
retcode: 0
Build essentials returned with code None
--- Building software
running: 'which zypper'
retcode: 256
running: 'make'
retcode: 0
running: 'chmod 0770 ./aws-replication-driver-commander'
retcode: 0
running: '/sbin/rmmod aws-replication-driver'
retcode: 256
running: '/sbin/insmod ./aws-replication-driver.ko'
retcode: 256
running: '/sbin/rmmod aws-replication-driver'
retcode: 256
Cannot insert module. Try 0.
running: '/sbin/rmmod aws-replication-driver'
retcode: 256
running: '/sbin/insmod ./aws-replication-driver.ko'
retcode: 256
running: '/sbin/rmmod aws-replication-driver'
retcode: 256
Cannot insert module. Try 1.
............
Cannot insert module. Try 9.
Installation returned with code 2
Installation failed due to unspecified error:
stderr: sh: /var/lib/aws-replication-agent/stopAgent.sh: No such file or directory
which: no zypper in (/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin)
which: no apt-get in (/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin)
which: no zypper in (/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin)
rmmod: ERROR: Module aws_replication_driver is not currently loaded
insmod: ERROR: could not insert module ./aws-replication-driver.ko: Required key not available
rmmod: ERROR: Module aws_replication_driver is not currently loaded
Any idea what is causing the error?
Running the command:
mokutil --disable-validation
will allow unsigned kernel modules to be loaded (the next boot will ask you to confirm it by entering characters of the password that you set when running mokutil).
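For reference, a rough version of that sequence (the "Required key not available" error usually means Secure Boot is rejecting the unsigned aws-replication-driver module; the commands below are standard mokutil usage, not something specific to the AWS installer):
mokutil --sb-state                  # check whether Secure Boot is currently enabled
sudo mokutil --disable-validation   # you will be asked to create a temporary password
sudo reboot                         # the MOK manager screen will ask for characters of that password
# After the reboot, re-run the replication agent installer.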
I regularly create an image of a running instance without stopping it first. That has worked for years without any issues. Tonight, I created another image of the instance (without any changes to the virtual server settings except for a "sudo yum update -y") and noticed my ssh session was closed. It looked like it was rebooted after the image was created. Then the web console showed 1/2 status checks passed. I rebooted it a few times and the status remained the same. The log showed:
Setting hostname localhost.localdomain: [ OK ]
Setting up Logical Volume Management: [ 3.756261] random: lvm: uninitialized urandom read (4 bytes read)
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
[ OK ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/xvda1
/: clean, 437670/1048576 files, 3117833/4193787 blocks
[/sbin/fsck.xfs (1) -- /mnt/pgsql-data] fsck.xfs -a /dev/xvdf
[/sbin/fsck.ext2 (2) -- /mnt/couchbase] fsck.ext2 -a /dev/xvdg
/sbin/fsck.xfs: XFS file system.
fsck.ext2: Bad magic number in super-block while trying to open /dev/xvdg
/dev/xvdg:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
[ 3.811304] random: crng init done
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
[FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
/dev/fd/9: line 2: plymouth: command not found
Give root password for maintenance
(or type Control-D to continue):
It looked like /dev/xvdg failed the disk check. I detached the volume from the instance and rebooted. I still couldn't SSH in. I re-attached it and rebooted. Now it says status check 2/2 passed, but I still can't SSH back in, and the log still shows the same issues with /dev/xvdg as above.
Any help would be appreciated. Thank you!
Thomas
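One way to narrow this down, assuming you can attach the volume to a healthy instance (the /dev/xvdg device name below is just the one from the log), is to check what filesystem the device actually contains before trying to repair it; the boot log ran fsck.ext2 on it, which would report a bad superblock if the volume is really XFS or LVM:
sudo file -s /dev/xvdg    # shows whether the device holds ext2/3/4, XFS, LVM metadata, or nothing recognizable
sudo blkid /dev/xvdg      # prints the filesystem type and UUID if one is detected
# Only if it really is ext2/3/4, try the alternate superblock suggested by the boot log:
sudo e2fsck -b 32768 /dev/xvdg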
I installed the Cloudera QuickStart VM and started trying some basic stuff. First I just wanted to ls the HDFS directories, so I issued the command below.
[cloudera@quickstart ~]$ hadoop fs -ls /
ls: Failed on local exception: java.net.SocketException: Network is unreachable; Host Details : local host is: "quickstart.cloudera/10.0.2.15"; destination host is: "quickstart.cloudera":8020;
Even though ps -fu hdfs says both the namenode and the datanode are running, I checked the status using the service command:
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Thinking all the problems would be resolved if I restarted all the services, I executed the command below.
[cloudera@quickstart conf]$ sudo /home/cloudera/cloudera-manager --express --force
[QuickStart] Shutting down CDH services via init scripts...
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager daemons...
[QuickStart] Waiting for Cloudera Manager API...
[QuickStart] Configuring deployment...
Submitted jobs: 92
[QuickStart] Deploying client configuration...
Submitted jobs: 93
[QuickStart] Starting Cloudera Management Service...
Submitted jobs: 101
[QuickStart] Enabling Cloudera Manager daemons on boot...
Now I thought all services would be up, so I checked the status of the namenode service again. Again it came back as failed.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Then I decided to manually stop and start the namenode service. Again, not much use.
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode stop
no namenode to stop
Stopped Hadoop namenode: [ OK ]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
[cloudera@quickstart ~]$ sudo service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Failed to start Hadoop namenode. Return value: 1 [FAILED]
I checked the file /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out. It just said the following:
log4j:ERROR Could not find value for key log4j.appender.RFA
log4j:ERROR Could not instantiate appender named "RFA".
I also checked /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-quickstart.cloudera.log.out and found the following when I searched for errors. Can anyone please suggest the best way to get the services back on track? Unfortunately, I am not able to access Cloudera Manager from a browser. Is there anything I can do from the command line?
2016-02-24 21:02:48,105 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[NAMENODE], CATEGORY=[LOG_MESSAGE], ROLE=[hdfs-NAMENODE], SEVERITY=[IMPORTANT], SERVICE=[hdfs], HOST_IDS=[quickstart.cloudera], SERVICE_TYPE=[HDFS], LOG_LEVEL=[WARN], HOSTS=[quickstart.cloudera], EVENTCODE=[EV_LOG_EVENT]}, content=Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!, timestamp=1456295437905} - 1 of 17 failure(s) in last 79302s
java.io.IOException: Error connecting to quickstart.cloudera/10.0.2.15:7184
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Network is unreachable
You can try this:
Check which process is using the namenode port 7184 (e.g. with the netstat Linux command), kill it, and then restart.
Or:
Change your namenode port in the configuration and restart Hadoop.
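A minimal sketch of the first suggestion, assuming netstat is available on the QuickStart VM (the port number comes from the stack trace above):
sudo netstat -tlnp | grep 7184    # shows the PID and name of whatever is listening on 7184
sudo kill <PID>                   # <PID> is a placeholder for the process id printed above
sudo service hadoop-hdfs-namenode start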
I've been playing around with Ansible a little bit and I'm able to run my playbook against a Vagrant virtual machine, but problems arise when I try to do the same against an EC2 instance. I just can't sudo any task:
# ansible -i inventory/staging dbservers --sudo -a "apt-get update"
ubuntu@staging-ansible | FAILED | rc=100 >>
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
E: Unable to lock directory /var/lib/apt/lists/
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
Any clue? I can sudo with ssh:
# ssh ubuntu@staging-ansible "sudo apt-get update"
sudo: unable to resolve host ip-10-0-0-61
Get:1 http://eu-west-1.ec2.archive.ubuntu.com precise Release.gpg [198 B]
Get:2 http://eu-west-1.ec2.archive.ubuntu.com precise-updates Release.gpg [198 B]
Get:3 http://security.ubuntu.com precise-security Release.gpg [198 B]
...
I eventually found the solution:
My inventory file was like:
[dbservers]
ubuntu@mydomain
But if I set the SSH user this other way instead, it runs fine:
[dbservers]
mydomain ansible_ssh_user=ubuntu
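With the user coming from the inventory like that, the same ad-hoc call should be able to escalate (the apt-module variant on the second command is just an alternative to the raw apt-get call, not part of the original fix):
ansible -i inventory/staging dbservers --sudo -a "apt-get update"
# or, using the apt module instead of a raw command:
ansible -i inventory/staging dbservers --sudo -m apt -a "update_cache=yes"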
I ran a test-kitchen instance and all was fine, but at the end, when I tried to destroy it with:
roberto@pc:~$ kitchen destroy
VirtualBox gave me this error:
-----> Starting Kitchen (v1.1.1)
-----> Destroying <default-ubuntu-1204>...
[default] Destroying VM and associated drives...
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: Failed to complete #destroy action: [Expected process to exit with [0], but received '1'
---- Begin output of vagrant destroy -f ----
STDOUT: [default] Destroying VM and associated drives...
STDERR: There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.
Command: ["unregistervm", "2507bc77-3734-429b-a573-d92fadb80e95", "--delete"]
Stderr: VBoxManage: error: Cannot unregister the machine 'default-ubuntu-1204_default_1391521776' while it is locked
VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component Machine, interface IMachine, callee nsISupports
VBoxManage: error: Context: "Unregister(CleanupMode_DetachAllReturnHardDisksOnly, ComSafeArrayAsOutParam(aMedia))" at line 158 of file VBoxManageMisc.cpp
---- End output of vagrant destroy -f ----
Ran vagrant destroy -f returned 1]
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
I opened VirtualBox and I could not remove the instance there either, because it was also locked and the Close option was disabled.
Anyone else with this problem?
VBoxManage: error: Cannot unregister the machine 'X' while it is locked.
It's locked because it's in use, so basically you need to shut it down first, e.g.:
VBoxManage controlvm VMNAME poweroff
Change VMNAME to your machine name, e.g. default-ubuntu-1204_default_1391521776.
Then you can unregister via:
VBoxManage unregistervm VMNAME --delete
Specifying --delete will delete your VM. If you don't want to delete it, you can make a backup from ~/"VirtualBox VMs"/VMNAME first.
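If you are not sure of the exact VMNAME (or the UUID Vagrant used), VirtualBox can list them for you; this is just a lookup step, not part of the fix itself:
VBoxManage list vms           # all registered VMs with their names and UUIDs
VBoxManage list runningvms    # only the ones currently running (and therefore locked)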
There are two things that can cause this issue. Please ensure that:
you restart your computer after installing/updating VirtualBox, and
you manually open VirtualBox at least once before using the driver.
If the machine appears in the list of virtual machines in the VirtualBox GUI, try the command again. If it is not listed in the VirtualBox GUI, remove the .kitchen directory and try again.