/System/Volumes/Data is taking too much space on Mac OS Catalina

One of my hosts runs on Mac OS Catalina, and it constantly runs out of disk space...
I have scheduled tasks running there and every day it uploads files into /Users/labuser/myfolder and removes older files from that folder.
After digging through folders I found that /System/Volumes/Data/Users/labuser/myfolder takes 90% of occupied space on my host.
Is there a way to disable this feature on Catalina and stop it from growing /System/Volumes/Data/... ?

/Users/labuser/myfolder is the same folder as /System/Volumes/Data/Users/labuser/myfolder. macOS 10.15 Catalina added firmlinks, but from a practical perspective (to the user) the two paths are one and the same.
Thus, your problem has nothing to do with a "feature" on Catalina; rather it has to do with the amount of data you're storing and backing up from /Users/labuser/myfolder.
Whether you use ncdu or another disk usage analyzer, it will solve your problem of finding out why you're consuming all of your disk space.
One other relevant point is that because these are "symlinked" (called firmlinks by Apple), some disk inventory apps don't know how to handle this and end up in a recursion scenario when trying to understand total disk usage. I've seen this behavior with ncdu also. That being said, if you run the disk inventory on a subfolder of /System/Volumes/Data/, e.g.:
cd /Users
ncdu
It should avoid these issues.
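If no disk usage tool is at hand, a rough check can also be scripted. The following is a minimal sketch, not a replacement for ncdu: the helper names dir_size and same_directory are made up for illustration, and skipping symlinks and counting each inode only once are assumptions about how you want the total computed.

```python
import os
import stat

def dir_size(root):
    """Return total bytes of regular files under root, counting each inode once."""
    seen = set()
    total = 0
    # followlinks=False keeps the walk from wandering through symlinked folders
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total

def same_directory(a, b):
    """True if both paths resolve to the same directory on disk."""
    return os.path.samefile(a, b)

if __name__ == "__main__":
    folder = "/Users/labuser/myfolder"  # path from the question
    data_path = "/System/Volumes/Data" + folder
    if os.path.isdir(folder) and os.path.isdir(data_path):
        print("same directory:", same_directory(folder, data_path))
        print("bytes used:", dir_size(folder))
```

If same_directory reports True, the space is only used once, and the number printed for the folder is what the scheduled uploads are actually occupying.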

Related

Reduce size of tlog files produced by compiler

Since the build on our build server keeps getting slower, I tried to find out what could be the cause. It seems to be mostly stuck in disk IO operations on the .tlog files, since there is no CPU load and the build still hangs. Even with a project containing only 10 cpp files, it generates ~5500 rows in the CL.read.1.tlog file.
The suspicious thing is that the file contains the same headers over and over, especially boost headers which take up like 90% of the file.
Is this really the expected behavior that those files are so big and have redundant content or is maybe a problem triggered from our source code? Are there maybe cyclic includes or too many header includes that can cause this problem?
Update 1
After all the comments I'll try to clarify some more details here.
We are only using boost by including the headers and linking the already compiled libs, we are not compiling boost itself
Yes, SSD is always a nice improvement, but the build server is hosted by our IT and we do not have SSDs available there. (see points below)
I checked some perfcounters, especially via perfmon, during the compilation. Whereas the CPU and memory load are negligible most of the time, the disk IO counters and also queue sizes are quite high all the time. Disk Activity - Highest Active Time is constantly at 100%, and if I sort by Total (B/sec) the list is full of tlog files which read/write a lot of data to the disk.
Even if 5500 lines of tlog seem okay in some cases, I wonder why the exact same boost headers are contained over and over. Here a logfile where I censored our own headers.
There is no Antivirus influencing. I stopped it for my investigations since we know that it influences our compilation even more.
On my local developer machine with SSD it takes ~16min to build our whole solution, whereas on our build server with a "slower" disk it takes ~2hrs. CPU and memory are comparable. The 5500 line file just was an example from a single project within a solution of 20-30 projects. We have a project where we have ~30MB tlog files with ~60.000 lines in it, only this project takes half of the compilation duration.
Of course there is some basic CPU load on the machine during compilation. But it is not comparable to other developer machines with SSDs.
Our .net solution with 45 projects is finished in 12min (including setup project with WiX)
Since on developer machines with SSDs we see at least a reduction from 2hrs to 16mins with a comparable CPU/memory configuration, my assumption for the bottleneck was always the hard disk. Checking for disk related operations led me to the tlog files, since they caused the highest disk activity according to perfmon.
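One way to put a number on the repetition is to count how often each header path occurs in a tlog file. The sketch below rests on two assumptions that should be checked against a real file: that the file is UTF-16 encoded, and that a line starting with "^" opens the entry for one source file, with the files it read listed on the following lines. The function name summarize_tlog is invented for this example.

```python
from collections import Counter

def summarize_tlog(lines):
    """Return (number of source entries, Counter of dependency paths)."""
    counts = Counter()
    sources = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("^"):
            sources += 1  # assumed marker for a new source-file entry
            continue
        counts[line.upper()] += 1  # Windows paths are case-insensitive
    return sources, counts

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        # encoding is an assumption; adjust if the file opens as garbage
        with open(sys.argv[1], encoding="utf-16") as fh:
            sources, counts = summarize_tlog(fh)
        total = sum(counts.values())
        print(f"{sources} source entries, {total} dependency lines, "
              f"{len(counts)} unique paths")
        for path, n in counts.most_common(10):
            print(n, path)
```

If the layout assumption holds, a header that every cpp file pulls in would be listed once per source entry, so a count close to the number of source entries would point to per-file tracking rather than cyclic includes.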

Virtualbox - Automatically return to snapshot

I've been trying for quite a while now to get the following thing to work - without success. I know that my approach is a bit of a dirty hack, so I'm always open for suggestions about how to do this in a better way.
We're running VirtualBox on Linux machines in a school environment. There are a couple of applications that absolutely need to be run under Windows, which is why there's no way around VirtualBox (forget wine!). So we put a Virtual Machine image on every local hard drive which the students can run if they need to. This image really ought to be "read only" in one way or another. Obviously, even running the machine will make changes to that image, so we need a mechanism that automatically reverts those changes. We're somewhat flexible about when this is supposed to happen. To me, doing this at each reboot seems to be the best approach but I wouldn't mind it being done at, say, logoff.
Right now we're using rsync on the image every time the computer boots which works as long as the virtual machine isn't started until that process is finished. Despite the fact that it works, it's a pain for the administrator as it may lead to various kinds of errors that are hard to reproduce, so there must be a better way.
My idea was to use a snapshot which I automatically revert to. Since simply deleting the snapshot VDIs won't work, I wanted to create a "template snapshot vdi" (around 180MB) which is copied to a world writable location (e.g. /tmp) at boot time. I know there's still a race condition, but copying 180MB on a local hard drive should be significantly faster and more predictable than rsyncing some 15GB. I then configure the virtual machine to use that snapshot.
Doing this, VirtualBox produces the error:
Parent UUID {00000000-0000-0000-0000-000000000000} doesn't match UUID {12345678-1234-1234-1234-123456789012} in the xml file.
A possible cause is this issue with VirtualBox, though I'm not sure about this.
What came to my mind right now while writing those lines is using a script to copy/rsync the snapshot VDI template to the "right" path within the home directory just before the machine is being started by that script.
Are there any other suggestions?
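The wrapper-script idea from the last paragraph could look roughly like the sketch below. It only covers the copy-then-start sequence and does nothing about the parent UUID error; the function names and the three command-line arguments are hypothetical, and the only VirtualBox command used is VBoxManage startvm.

```python
import os
import shutil
import subprocess

def reset_disk(template, target):
    """Replace target with a fresh copy of template."""
    tmp = target + ".new"
    shutil.copyfile(template, tmp)
    # os.replace swaps the file in one step when tmp and target
    # are on the same filesystem, so a half-copied image is never used
    os.replace(tmp, target)

def start_vm(vm_name):
    """Start the named VM through VirtualBox's command-line front end."""
    subprocess.run(["VBoxManage", "startvm", vm_name], check=True)

if __name__ == "__main__":
    import sys
    if len(sys.argv) == 4:
        template, target, vm_name = sys.argv[1:]
        reset_disk(template, target)  # finish the copy first ...
        start_vm(vm_name)             # ... then launch, which avoids the race
```

Because the copy and the launch happen in one script, the machine cannot be started before the image is back in its clean state, which is the weak point of the boot-time rsync.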

Running out of file descriptors for mmaped files despite high limits in multithreaded web-app

I have an application that mmaps a large number of files. 3000+ or so. It also uses about 75 worker threads. The application is written in a mix of Java and C++, with the Java server code calling out to C++ via JNI.
It frequently, though not predictably, runs out of file descriptors. I have upped the limits in /etc/security/limits.conf to:
* hard nofile 131072
/proc/sys/fs/file-max is 101752. The system is a Linode VPS running Ubuntu 8.04 LTS with kernel 2.6.35.4.
Opens fail from both the Java and C++ bits of the code after a certain point. Netstat doesn't show a large number of open sockets ("netstat -n | wc -l" is under 500). The number of open files in either lsof or /proc/{pid}/fd is about the expected 2000-5000.
This has had me grasping at straws for a few weeks (not constantly, but in flashes of fear and loathing every time I start getting notifications of things going boom).
There are a couple other loose threads that have me wondering if they offer any insight:
Since the process has about 75 threads, if the mmaped files were somehow taking up one file descriptor per thread, then the numbers add up. That said, doing a recursive count on the things in /proc/{pid}/task/*/fd currently lists 215575 fds, so it would seem that it should already be hitting the limits and it's not, so that seems unlikely.
Apache + Passenger are also running on the same box, and come in second for the largest number of file descriptors, but even with children none of those processes weigh in at over 10k descriptors.
I'm unsure where to go from there. Obviously something's making the app hit its limits, but I'm completely blank for what to check next. Any thoughts?
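A note on the recursive count: threads of one process normally share a single descriptor table, so every directory under /proc/{pid}/task/*/fd lists the same descriptors again. 215575 divided by roughly 75 threads is about 2874, which falls inside the expected 2000-5000 range. A minimal sketch for comparing the two counts is below; count_fds and its proc_root parameter are names chosen for this example, and it assumes a Linux /proc layout.

```python
import os

def count_fds(pid, proc_root="/proc"):
    """Return (fds of the process, thread count, sum of fds over all threads)."""
    base = os.path.join(proc_root, str(pid))
    process_fds = len(os.listdir(os.path.join(base, "fd")))
    task_dir = os.path.join(base, "task")
    threads = 0
    per_thread_sum = 0
    for tid in os.listdir(task_dir):
        try:
            per_thread_sum += len(os.listdir(os.path.join(task_dir, tid, "fd")))
            threads += 1
        except OSError:
            continue  # thread exited while we were looking
    return process_fds, threads, per_thread_sum

if __name__ == "__main__":
    if os.path.isdir("/proc/self/task"):
        import resource
        fds, threads, summed = count_fds("self")
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        print(f"process fds: {fds}, threads: {threads}, summed over threads: {summed}")
        print(f"nofile limit: soft={soft} hard={hard}")
```

If the summed figure is close to the process count times the number of threads, the large number is an artifact of counting, and the per-process count is the one to compare against the nofile limit.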

So, from all I can tell, this appears to have been an issue specific to Ubuntu 8.04. In the month since upgrading to 10.04, there hasn't been a single instance of this problem. The configuration didn't change, so I'm led to believe that this must have been a kernel bug.

Your setup uses a huge chunk of code that may be guilty of leaking too: the JVM. Maybe you can switch between the Sun and the open source JVMs as a way to check whether that code is by chance guilty. Also, there are different garbage collector strategies available for the JVM. Using a different one, or different sizes, will cause more or fewer garbage collections (which in Java can include the closing of a descriptor).
I know it's kinda far-fetched, but it seems like you've already followed all the other options ;)

Increasing shared memory on OSX to properly install PostgreSQL

This is my first stackoverflow post. I am trying to set up PostgreSQL to use with Django. Very new to all of this (took one course in Python in college, now trying to teach myself a little web development).
The installation guide for PostgreSQL says:
"Before running the installation, please ensure that your system is
configured to allow the use of larger amounts of shared memory. Note that
this does not 'reserve' any memory so it is safe to configure much higher
values than you might initially need. You can do this by editting the
file /etc/sysctl.conf - e.g.
% sudo vi /etc/sysctl.conf
On a MacBook Pro with 2GB of RAM, the author's sysctl.conf contains:
kern.sysv.shmmax=1610612736
kern.sysv.shmall=393216
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.maxprocperuid=512
kern.maxproc=2048
Note that (kern.sysv.shmall * 4096) should be greater than or equal to
kern.sysv.shmmax. kern.sysv.shmmax must also be a multiple of 4096.
Once you have edited (or created) the file, reboot before continuing with
the installation. If you wish to check the settings currently being used by
the kernel, you can use the sysctl utility:
% sysctl -a
The database server can now be installed."
I am running a fresh-out-of-the-box MBA with 4GB of RAM. How do I set this up properly? Thanks in advance.
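The two rules in the guide's note can be checked with a few lines of arithmetic. This is only a sketch of that check, with check_shm as a made-up helper name; it uses the 4096 from the guide's formula and does not suggest values for a 4GB machine.

```python
PAGE_SIZE = 4096  # the factor used in the guide's formula

def check_shm(shmmax, shmall, page_size=PAGE_SIZE):
    """Return a list of rule violations; an empty list means both rules hold."""
    problems = []
    if shmmax % page_size != 0:
        problems.append("shmmax is not a multiple of the page size")
    if shmall * page_size < shmmax:
        problems.append("shmall * page size is smaller than shmmax")
    return problems

if __name__ == "__main__":
    # Values quoted in the guide: 393216 * 4096 == 1610612736 (1.5 GiB)
    print(check_shm(shmmax=1610612736, shmall=393216))  # prints []
```

Any pair of values you choose for a larger machine can be passed through the same check before rebooting.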

Just download the installer and click "ok" to get started. When everything is running, you can always increase memory settings and edit postgresql.conf to get better performance.

VMWare Image Modification

If I already have an image that exists, can I create an image based on the existing one, and then make changes to the existing one (mainly configuration)?

I do this all the time. I actually keep each of my VMs in a separate directory and duplicate the entire directory to make a copy. All references within the VMX file (configuration) are relative to the current directory.
One thing you need to watch out for. The VMX file has a line with the MAC address of the virtual network card:
ethernet0.generatedAddress = "00:0c:29:ff:1f:c7"
You'll need to change that if you want to run both VMs at the same time - I usually just bump the final digit up by 1 (to c8).
I also change the displayName in that file so I can tell the difference between them when they're running.
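Bumping the final digit by hand is easy to get wrong when there are many copies, so it can be scripted. The sketch below is one possible way to do it; bump_mac and bump_vmx_line are hypothetical helper names, and wrapping from ff back to 00 is an assumption about what should happen at the end of the range.

```python
import re

# Matches the VMX line shown above and captures the MAC address
MAC_LINE = re.compile(
    r'^(ethernet0\.generatedAddress\s*=\s*")([0-9a-fA-F:]{17})(")\s*$'
)

def bump_mac(mac):
    """Increase the last octet of a MAC address by one (ff wraps to 00)."""
    octets = mac.split(":")
    octets[-1] = format((int(octets[-1], 16) + 1) % 256, "02x")
    return ":".join(octets)

def bump_vmx_line(line):
    """Return the line with its MAC bumped, or unchanged if it is another setting."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending
    m = MAC_LINE.match(body)
    if not m:
        return line
    return m.group(1) + bump_mac(m.group(2)) + m.group(3) + ending

if __name__ == "__main__":
    print(bump_vmx_line('ethernet0.generatedAddress = "00:0c:29:ff:1f:c7"'))
    # prints: ethernet0.generatedAddress = "00:0c:29:ff:1f:c8"
```

Running every line of the copied VMX file through bump_vmx_line leaves all other settings untouched and changes only the address line.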

Yes, you can just copy the image off to external storage. Just find the image file(s) on your drive and do the copy when the image is not running. You can then change the original all you want. Is this what you are after?

What I do is create a base "clean" VM which I then run Sysprep on before cloning. You can run into a few problems when you don't reset the 'unique' elements of a windows installation and you're trying to run them simultaneously.
I'm running ~20 VMs at the moment and if any one gets seriously messed up (they're used for testing) I've got clean base images of Windows 2000, Windows XP, Vista and Server 2003 at the ready so I can be back up and running in 20mins or less.

Depending on what your needs are, you might try the (free) VMWare Converter. It lets you change drive sizes and other image parameters.

As others have said, this is exactly how you implement full backups for your VMs.
When the VM is not running, merely copy the virtual disks into a different location, then restart the VM.
