The filesystem 'P4ROOT' has only 1.9G free, but the server configuration requires at least 2G available - digital-ocean

I'm working on a Perforce server hosted on Digital Ocean, Linux Ubuntu 18.04.3.
I've run out of file space on the server (according to the perforce configuration)
The catch is that I can't change the configuration while it's below this 2G threshold; the server continually rejects p4 configure commands as well.
Digital Ocean can't downsize after you add more server space, and I feel like I might run into the same issue again so I'd prefer to learn how to actually fix it.
I'm an artist by trade, and while I can mostly follow documentation, specificity is required.
The p4 configure commands don't work and return the same error. I can't check anything out, and I can't obliterate any files to free up space.
Not sure what to do, my whole project is frozen.

The temporary fix to lower the disk space thresholds is to use p4d -c to modify the appropriate filesys.*.min configurables:
C:\Perforce\test>p4d -h
Usage:
p4d [ options ]
p4d [ options ] -j? [ -z | -Z ] [ args ... ]
...
Configuration options:
-cshow
'-cset [server#]variable=value'
'-cunset [server#]variable'
This syntax provides a limited subset of the functionality of the
'p4 configure' command, and is useful for accessing the configuration
when the server is down. The '-cshow' flag displays the contents of
the db.config table, similar to 'p4 configure show allservers'.
The '-cset' and '-cunset' flags set or unset a configurable. When
using set or unset, enclose the entire expression in quotation marks,
and on Windows, use double quotation marks, not single ones.
C:\Perforce\test>p4 help configurables
Perforce server configurables
...
filesys.depot.min 250M Minimum space for depot filesystem
filesys.extendlowmark 32K Minimum filesize before preallocation(NT)
filesys.P4ROOT.min 250M Minimum space for P4ROOT filesystem
filesys.P4JOURNAL.min 250M Minimum space for P4JOURNAL filesystem
filesys.P4LOG.min 250M Minimum space for P4LOG filesystem
filesys.TEMP.min 250M Minimum space for TEMP filesystem
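For example, on Linux, with the server stopped, something like this would drop the P4ROOT threshold below the 1.9G you currently have free (the server root path here is only an example; substitute your own, and run the command as the user that owns the server files):
p4d -r /opt/perforce/root '-cset filesys.P4ROOT.min=1G'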
This will allow you to run commands like p4 obliterate that will free up a bit of space. Once you're done, you should return the thresholds to their previous values, because it's pretty easy to blow through that buffer and actually run out of disk space (which is much more difficult to recover from than hitting a soft limit).
The long-term solution is to increase the amount of available disk space. I've talked to many many Perforce admins who run out of disk space, say "no problem, I'll obliterate some stuff" (or compress the db via checkpoint restoration, or whatever else turns up on a "how to free up space in a Perforce server" google search), and they're inevitably back in the same position within a few months, except that they've already deleted/obliterated all the "easy" stuff and now have to start cutting closer to the bone, so now it takes a lot longer to figure out what they can afford to free up. And then the next time it's even harder. It's much easier (and in almost all cases much cheaper if you consider your own time to have any value at all) to just increase the amount of disk space and forget about it.

Related

Out of space - SAS Studio engine V9 - Mac

I was trying to practice with a dataset having more than 100K rows, and my SAS UE shows an "out of space" error when I try statistical analysis. After some Google searching I found some solutions, like extending the disk space in the VM and cleaning the work libraries (I did clean the work library using "proc datasets library=WORK kill; run; quit;" but the issue remains the same), but I am not sure how to increase the disk space or redirect the work library to local storage on my Mac. I have not seen/understood any proper guidelines from my searching. Please help.
You can modify the VM to have 2 cores and increase the RAM in the Oracle VirtualBox settings. You cannot increase the disk size of the VM, and 100K rows should not be problematic unless you're not cleaning up processes.
Yes, SAS UE does have a tendency not to clean up after crashes, so if you've crashed it multiple times you'll eventually have to reinstall to clean up. You can get around this by reassigning the work library. A quick way to do this is, in projects that will be affected, to set the USER library to your myfolders or another space on your computer.
libname user '/folders/myfolders/tempWSpace';
Make sure you first create the folder under myfolders. Then any single level data set (no libname) will automatically be stored in user library and you should be ok to run your code.

Coldfusion cfprint and UPS labels

I am trying to use Coldfusion CFPRINT to print UPS labels to a network printer. The starting labels (png files) are great and I can print them locally to the zebra printer and they print and work wonderfully. The barcodes produced by CFPRINT however are of such poor quality that a barcode scanner cannot read them. My research shows that Coldfusion uses the jpedal java library which resizes the images to 72 dpi - which is just not crisp enough for a scanner.
I read about using a jpedal setting: org.jpedal.upscale=2 but I have no clue as to where you would utilize this.
Any suggestions on how to fix this CFPRINT resolution issue using Coldfusion?
(Just to add a bit more detail to the comments)
That is a JVM argument. There are several ways to apply it:
Add the setting to your jvm.config file manually. Backup the file first. Then add -Dorg.jpedal.upscale=2 to the end of the java.args section. Save the changes and restart the CF Server. Do not skip the backup step! Errors in the jvm.config file can prevent the server from starting. So it is important to have a good copy you can restore if needed.
Open the CF Administrator and select Server Settings > Java and JVM > JVM Arguments. Add -Dorg.jpedal.upscale=2 to the end of the arguments. Save the settings and restart the CF server.
Again, I would strongly recommend making a backup of the jvm.config file first. As @Mark noted in the comments, some versions of CF have been known to mangle the jvm.config file, which could prevent the server from starting. But as long as you have a good backup, simply restore it and you are good to go.
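Purely as an illustration (the existing arguments vary by CF version and install, so everything before the new flag here is just a placeholder), the edited java.args line would end up looking something like:
java.args=-server -Xmx512m -Dcoldfusion.home={application.home} -Dorg.jpedal.upscale=2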
IIRC, you could also set the property at runtime, via code. However, timing will be more of a factor. Their API states system properties must be set before accessing JPedal. The docs are not clear on exactly what that means. However, the implication is the system property is only read once, so if you set it too late, it will have no effect.
// untested - must run before anything touches JPedal
sys = createObject("java", "java.lang.System");
// getProperties() returns the live Properties object,
// so setting a key on it takes effect immediately
prop = sys.getProperties();
prop.setProperty("org.jpedal.upscale", "2");
sys.setProperties(prop);
Side note, I was not familiar with that setting, but a quick search turned up the CF8 Update 1 Release Notes which mention this setting "improves sharpness, but it also doubles the image size" and also increases memory. Just something to keep in mind.

Inotify-like feature in a distributed file system

As the title says, I want to trigger a notification when certain events happen.
An event can be user-defined, such as specified files being updated within one minute.
If the files were stored locally, I could easily do this with the inotify system call, but in my case the files are located on a distributed file system such as mfs.
How can I do this? I'd like to know if there are solutions or open-source projects that solve this problem. Thanks.
If you have only black-box access (e.g. the NFS protocol) to the remote system(s), you don't have many options unless the protocol supports what you need. So I'll assume you have control over the remote systems.
The "trivial" approach is running a local inotify/fanotify listener on each computer that would forward the notification over the network. FAM can do this over NFS.
A problem with all notification-based systems is the risk of lost notifications in various edge cases. This becomes much more acute over a network - e.g. a client confirms receipt of a notification, then immediately crashes. There are reliable message queues you can build on, but IMHO this way lies madness...
A saner approach is a stateless, hash-based scan.
I like to call the following design "hnotify" but that's not an established term. The ideas are widely used by many version control and backup systems, dating back to Plan 9.
The core idea is if you know cryptographic hashes for files, you can compose a single hash that represents a directory of files - it changes if any of the files changed - and you can build these bottom-up to represent the whole filesystem's state.
(Git stores things this way and is very efficient at it.)
Why are hash trees cool? If you have 2 hash trees — one representing the filesystem state you saw at point in the past, one representing the current state — you can easily find out what changed between them:
1. Start at the roots. If they are different, read the 2 root directories and compare hashes for subdirectories.
2. If a subdirectory has the same hash in both trees, then nothing under it changed. No point going there.
3. If a subdirectory's hash changed, compare its contents recursively — call step (1).
4. If one tree has a subdirectory the other doesn't, well, that's a change. With some global table you can also detect moves/renames.
Note that if few files changed, you only read a small portion of the current state. So the remote system doesn't have to send you the whole tree of hashes, it can be an interactive ping-pong of "give me hashes for this directory; ok now for this...".
(This is akin to how Git's dumb http protocol worked; there is a newer protocol with less round trips.)
This is as robust and bug-proof as polling the whole filesystem for changes — you can't miss anything — but reasonably efficient!
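To make the core idea concrete, here is a minimal sketch of the bottom-up tree hashing in C (POSIX APIs assumed; FNV-1a stands in for a real cryptographic hash, per-entry hashes are XOR-combined so the result doesn't depend on readdir() order, and symlink cycles aren't handled):
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <dirent.h>
#include <sys/types.h>
#include <sys/stat.h>

#define FNV_BASIS 1469598103934665603ULL
#define FNV_PRIME 1099511628211ULL

static uint64_t fnv1a(const void *data, size_t len, uint64_t h) {
    const unsigned char *p = data;
    while (len--) { h ^= *p++; h *= FNV_PRIME; }
    return h;
}

static uint64_t hash_path(const char *path);

/* Directory hash: XOR of hash(name, child hash) over all entries, so it
   changes whenever anything anywhere below the directory changes. */
static uint64_t hash_dir(const char *path) {
    uint64_t h = 0;
    DIR *d = opendir(path);
    struct dirent *e;
    if (!d) return 0;
    while ((e = readdir(d)) != NULL) {
        char child[4096];
        uint64_t entry, ch;
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, "..")) continue;
        snprintf(child, sizeof child, "%s/%s", path, e->d_name);
        ch = hash_path(child);
        entry = fnv1a(e->d_name, strlen(e->d_name), FNV_BASIS);
        entry = fnv1a(&ch, sizeof ch, entry);
        h ^= entry;
    }
    closedir(d);
    return h;
}

/* File hash: hash of the file's contents. */
static uint64_t hash_path(const char *path) {
    struct stat st;
    unsigned char buf[8192];
    uint64_t h = FNV_BASIS;
    size_t n;
    FILE *f;
    if (stat(path, &st) != 0) return 0;
    if (S_ISDIR(st.st_mode)) return hash_dir(path);
    if (!(f = fopen(path, "rb"))) return 0;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        h = fnv1a(buf, n, h);
    fclose(f);
    return h;
}

int main(int argc, char **argv) {
    printf("%016llx\n",
           (unsigned long long)hash_path(argc > 1 ? argv[1] : "."));
    return 0;
}
To detect changes, you'd persist the per-directory hashes from a previous run and recurse only into directories whose stored hash differs from the current one, exactly as in the steps above.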
But how does the server track current hashes?
Unfortunately, fully hashing all disk writes is too expensive for most people. You may get it for free if you're lucky enough to be running a deduplicating filesystem, e.g. ZFS or Btrfs.
Otherwise you're stuck with re-reading all changed files (which is even more expensive than doing it in the filesystem layer) or using fake file hashes: upon any change to a file, invent a new random "hash" to invalidate it (and try to keep the fake hashes on moves). Still compute real hashes up the tree. Now you may have false positives — you "detect a change" when the content is the same — but never false negatives.
Anyway, the point is that whatever stateful hacks you do (e.g. inotify with periodic scans to be sure), you only do them locally on the server. Across the network, you only ever send hashes that represent snapshots of current state (or its subtrees)! This way you can have a distributed system with many servers and clients, intermittent connectivity, and still keep your sanity.
P.S. Btrfs can efficiently find differences from an older snapshot. But this is a snapshot taken on the server (and causing all data to be preserved!), less flexible than a client-side lightweight tree-of-hashes.
P.S. One of your tags is HadoopFS. I'm not really familiar with it, but I suspect a lot of its files are write-once-then-immutable, and it might be able to natively give you some kind of file/chunk ids that can serve as fake hashes?
Existing tools
The first tool that springs to my mind is bup index. bup is a very clever deduplicating backup tool built on git (but, unlike git itself, scalable to huge data), so it sits on the foundation described above. In theory, indexing data in bup on the server and doing git fetch over the network would even implement the hash-walking comparison of what's new that I described above — unfortunately the git repositories that bup produces are too big for git itself to cope with. Also you probably don't want bup to read and store all your data. But bup index is a separate subsystem that quickly scans a filesystem for potential changes, without yet reading the changed files.
Currently bup doesn't use inotify but it's been discussed in depth.
Oh, and bup uses Bloom filters, which are a nearly optimal way to represent sets with false positives. I'm almost certain Bloom filters have a role to play in optimizing stateless notification protocols ("here is a compressed bitmap of all I have; you should be able to narrow your queries with it" or "here is a compressed bitmap of what I want to be notified about"). Not sure if the way bup uses them is directly useful to you, but this data structure should definitely be in your toolbelt.
Another tool is git annex. It's also based on Git (are you noticing a trend?) but is designed to keep the data itself out of Git repos (so git fetch should just work!) and has a "WORM" option that uses fake hashes for faster performance.
Alternative design: compressed replayable journal
I used to think the above is the only sane stateless approach for clients to check what's changed. But I just read http://arstechnica.com/apple/2007/10/mac-os-x-10-5/7/ about OS X's FSEvents framework, which has a perhaps simpler design:
ALL changes are logged to a file. It's kept forever.
Clients can ask "replay for me everything since event 51348".
The magic trick is the log has coarse granularity ("something in this directory changed, go re-scan it to find out what", repeated changes within 30 seconds are combined) so this journal file is very compact.
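As a toy sketch of that interface (all names invented for illustration, bounds checks omitted), the server side amounts to an append-only log with coalescing, and the client API is just "give me everything after id N":
#include <stdio.h>
#include <string.h>
#include <time.h>

/* One coarse entry per directory; repeated changes to the same
   directory within 30 seconds are coalesced, as FSEvents does. */
struct event { long long id; time_t when; char dir[256]; };

static struct event journal[100000];
static long long next_id = 1;
static size_t used = 0;

void record_change(const char *dir) {
    if (used > 0 && strcmp(journal[used - 1].dir, dir) == 0 &&
        time(NULL) - journal[used - 1].when < 30)
        return;                      /* coalesced into the previous entry */
    journal[used].id = next_id++;
    journal[used].when = time(NULL);
    snprintf(journal[used].dir, sizeof journal[used].dir, "%s", dir);
    used++;
}

/* "Replay for me everything since event N" */
void replay_since(long long n) {
    for (size_t i = 0; i < used; i++)
        if (journal[i].id > n)
            printf("re-scan directory: %s\n", journal[i].dir);
}

int main(void) {
    record_change("/exports/projects");
    record_change("/exports/projects");  /* coalesced away */
    replay_since(0);
    return 0;
}
(The real FSEvents log coalesces more aggressively and lives on disk, but the shape of the interface is the same.)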
At the low level you might resort to similar techniques — e.g. hashes — but the top-level interface is different: instead of snapshots you deal with a timeline of events. It may be an easier fit for some applications.

Detecting mounted drives on Linux and Mac OS X

I'm using QDir::drives() to get the list of drives. It works great on Windows, but on Linux and Mac it only returns a single item, "/", i.e. the root. This is expected behavior, but how can I get a list of drives on Mac and Linux?
Non-Qt, native API solutions are also welcome.
Clarification on the "drive" definition: I'd like to get a list of mount points that are visible as "drives" in Finder or a Linux built-in file manager.
As far as the filesystem is concerned, there is no concept of drives in Unix/Linux (I can't vouch for MacOSX but I'd say it's the same). The closest thing would probably be mount points, but a normal application shouldn't bother about them since all is already available under the filesystem root / (hence the behaviour of QDir::drives() that you observe).
If you really want to see which mount points are in use, you could parse the output of the mount command (without any arguments) or, at least on Linux, the contents of the /etc/mtab file. Beware though, mount points can get pretty hairy real quick (loop devices, FUSE filesystems, network shares, ...) so, again, I wouldn't recommend making use of them unless your application is designed to administer them.
Keep in mind that on Unix-y OSes, mount points are normally a matter for system administrators, not end-users, unless we're speaking of removable media or transient network shares.
Edit: Following your clarifications in the comments, on Linux you should use getmntent or getmntent_r to parse the contents of the /etc/mtab file and thus get a list of all mount points and the corresponding devices.
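A minimal sketch of the getmntent approach (glibc assumed; real code would check errors and consider getmntent_r for thread safety):
#include <stdio.h>
#include <mntent.h>

int main(void) {
    /* /etc/mtab lists the currently mounted filesystems */
    FILE *f = setmntent("/etc/mtab", "r");
    struct mntent *m;
    if (!f) return 1;
    while ((m = getmntent(f)) != NULL)
        printf("%s on %s type %s\n", m->mnt_fsname, m->mnt_dir, m->mnt_type);
    endmntent(f);
    return 0;
}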
The trick after that is to determine which ones you want to display (removable? network share?). I know that /sys/block/... can help with that, but I don't know all the details so you'll have to dig a bit more.
For example, to check whether /dev/sdd1 (a USB key) mounted on /media/usb0/ is a removable device, you could do (note how I use the device name sdd, not the partition name sdd1):
$ cat /sys/block/sdd/removable
1
As opposed to my main hard drive:
$ cat /sys/block/sda/removable
0
Hope this puts you on the right track.
For OS X, the Disk Arbitration framework can be used to list and monitor drives and mount points.
Scraping the output of the mount shell command is certainly one option on either platform - although, what is your definition of a drive here? Physical media, removable drives, network volumes? You'll need to do a lot of filtering.
On Mac OS X, the mount point for removable media, network volumes, and secondary hard drives is always under /Volumes/, so simply enumerating the items in this directory will do the trick if your definition of a drive is broad. This ought to be fairly safe, as they're all automounted.
On Linux, there are a variety of locations depending on the particular distro in use. /mnt/ is the traditional, but there are others.
In Linux, the way to get information about currently mounted drives is to parse the mtab file. glibc provides a macro, _PATH_MNTTAB, to locate this file. See http://www.gnu.org/software/libc/manual/html_node/Mount-Information.html#Mount-Information
If you know the filesystem type of the drive(s) in question, you can use the df command to output the list of drives from the console, or programmatically as a system command. For example, to find all the ext4 drives:
df -t ext4
You can simply add additional formats onto the same command if you are interested in more than one type:
df -t ext4 -t tmpfs
This will return the physical location of the drive, the amount of space it has, the amount of space used, the amount of space free, the use% and where it is mounted on the filesystem.
df will show you all of the drives mounted on the system, but some are going to be things that aren't really what you are looking for like temporary file systems, etc.
Not sure if this will work on OSX or not, but it does work on my Ubuntu 12.04 distribution.
Another way is to check for "Volumes"
df -H | grep "/Volumes"
I know that this is old, but no one mentioned getfsstat, which I ended up using on macOS. You can get a list of mounts (which will include most disks) using getfsstat. See man 2 getfsstat for further information.
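A minimal sketch of that (macOS assumed; error handling kept to a minimum):
#include <stdio.h>
#include <sys/param.h>
#include <sys/ucred.h>
#include <sys/mount.h>

int main(void) {
    /* First call sizes the list, second call fills it */
    int n = getfsstat(NULL, 0, MNT_NOWAIT);
    if (n <= 0) return 1;
    struct statfs buf[n];
    n = getfsstat(buf, (int)sizeof buf, MNT_NOWAIT);
    for (int i = 0; i < n; i++)
        printf("%s on %s (%s)\n",
               buf[i].f_mntfromname, buf[i].f_mntonname, buf[i].f_fstypename);
    return 0;
}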

How to measure the amount of data transmitted by my MPI program?

I'm experimenting with my distributed clustering algorithm (implemented with MPI) on 24 computers that I set up as a cluster using BCCD (Bootable Cluster CD), which can be downloaded at http://bccd.net/.
I've written a batch program to run my experiment, which consists of running my algorithm several times while varying the number of nodes and the size of the input data.
I want to know the amount of data used in the MPI communications for each run of my algorithm, so I can see how the amount of data changes when varying the previously mentioned parameters. And I want to do all this automatically using a batch program.
Someone told me to use tcpdump, but I ran into some difficulties with this approach.
First, I don't know how to call tcpdump from my batch program (which is written in C++ and uses the system function for making calls) before each run of my algorithm, since tcpdump requires another terminal to run in parallel with my application. And I can't run tcpdump on another computer, since the network uses a switch. So I need to run it on the master node.
Second, I watched the traffic with tcpdump while my experiment was running, and I couldn't figure out which port was used by MPI. It seems to use many ports. I wanted to know that in order to filter the packets.
Third, I tried capturing whole packets and saving them to a file using tcpdump, and within a few seconds the file was 3.5MB. But my whole experiment takes 2 days. So the final log file would be huge if I followed this approach.
The ideal approach would be to capture just the size field in the header of each packet and sum these up to obtain the total amount of data transmitted. That way the log file would be much smaller than if I captured whole packets. But I don't know how to do it.
Another restriction is that I don't have access to the computers' discs. So I only have the RAM and my 4GB USB flash drive. So I can't have huge log files.
I have already thought about using an MPI tracing or profiling tool such as those mentioned at http://www.open-mpi.org/faq/?category=perftools. I have only tested Sun Performance Analyzer so far. The problem is that I guess it will be difficult, maybe even impossible, to install those tools on BCCD. In addition, such a tool will make my experiment take longer to finish, since it adds overhead. But if someone is familiar with BCCD and thinks one of those tools is a good choice, please let me know.
I hope someone has a solution.
Approaches like tcpdump won't work anyway if there are multi-core nodes that use shared memory to communicate.
Using something like MPE is almost certainly the way to go. Those tools add very little overhead, and some overhead is always going to be necessary if you want to count messages. You can use mpitrace to write out every MPI call and parse the resulting text file yourself. By the way, note that MPE is explicitly discussed on the BCCD website. MPICH2 comes with MPE built in, but it can be compiled for any implementation. I've only found a very modest overhead for MPE.
IPM is another nice tool that counts messages and sizes; you should be able either to parse the XML output, or to use the postprocessing tools and just manually integrate the graphs (say, either bytes_rx/bytes_tx by rank, or the message buffer size/count graph). The overhead for IPM is even less than for MPE, and mostly comes after the program has finished running, to do the file I/O.
If you were really super worried about the overhead with either of these approaches, you could always write your own MPI wrappers using the profiling interface that wrapped MPI_Send, MPI_Recv, etc., just counted the number of bytes sent and received for each process, and output only that total at the end.
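A minimal sketch of that profiling-interface approach, wrapping only MPI_Send for brevity (a real version would also wrap MPI_Isend, MPI_Recv, the collectives, and so on; the MPI-3 const signature is assumed):
#include <stdio.h>
#include <mpi.h>

static long long bytes_sent = 0;

/* The linker resolves the application's MPI_Send to this wrapper;
   the real implementation is still reachable as PMPI_Send. */
int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm) {
    int size;
    PMPI_Type_size(type, &size);             /* bytes per element */
    bytes_sent += (long long)count * size;
    return PMPI_Send(buf, count, type, dest, tag, comm);
}

/* Report the per-process total just before shutdown */
int MPI_Finalize(void) {
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d sent %lld bytes\n", rank, bytes_sent);
    return PMPI_Finalize();
}
Just linking this file into the application is enough for the interception to happen; none of the algorithm's code needs to change.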