Retrieve RAM info on a Mac? - c++

I need to retrieve the total amount of RAM present in a system and the total RAM currently being used, so I can calculate a percentage. This is similar to: Retrieve system information on MacOS X?
However, in that question the best answer suggests how to get RAM by reading from:
/usr/bin/vm_stat
Due to the nature of my program, I found out that I am not cannot read from that file - I require a method that will provide me RAM info without simply opening a file and reading from it. I am looking for something to do with function calls. Something like this preferably : getTotalRam() and getRamInUse().
I obviously do not expect it to be that simple but I was looking for a solution other than reading from a file.
I am running Mac OS X Snow Leopard, but would preferably get a solution that would work across all current Mac OS X Platforms (i.e. Lion).
Solutions can be in C++, C or Obj-C, however C++ would the best possible solution in my case so if possible please try to provide it in C++.

Getting the machine's physical memory is simple with sysctl:
int mib [] = { CTL_HW, HW_MEMSIZE };
int64_t value = 0;
size_t length = sizeof(value);
if(-1 == sysctl(mib, 2, &value, &length, NULL, 0))
// An error occurred
// Physical memory is now in value
VM stats are only slightly trickier:
mach_msg_type_number_t count = HOST_VM_INFO_COUNT;
vm_statistics_data_t vmstat;
if(KERN_SUCCESS != host_statistics(mach_host_self(), HOST_VM_INFO, (host_info_t)&vmstat, &count))
// An error occurred
You can then use the data in vmstat to get the information you'd like:
double total = vmstat.wire_count + vmstat.active_count + vmstat.inactive_count + vmstat.free_count;
double wired = vmstat.wire_count / total;
double active = vmstat.active_count / total;
double inactive = vmstat.inactive_count / total;
double free = vmstat.free_count / total;
There is also a 64-bit version of the interface.

You're not supposed to read from /usr/bin/vm_stat, rather you're supposed to run it; it is a program. Look at the first four lines of output
Pages free: 1880145.
Pages active: 49962.
Pages inactive: 43609.
Pages wired down: 123353.
Add the numbers in the right column and multiple by the system page size (as returned by getpagesize()) and you get the total amount of physical memory in the system in bytes.
vm_stat isn't setuid on Mac OS, so I assume there is a non-privileged API somewhere to access this information and that vm_stat is using it. But I don't know what that interface is.

You can figure out the answer to this question by looking at the source of the top command. You can download the source from http://opensource.apple.com/. The 10.7.2 source is available as an archive here or in browsable form here. I recommend downloading the archive and opening top.xcodeproj so you can use Xcode to find definitions (command-clicking in Xcode is very useful).
The top command displays physical memory (RAM) numbers after the label "PhysMem". Searching the project for that string, we find it in the function update_physmem in globalstats.c. It computes the used and free memory numbers from the vm_stat member of struct libtop_tsamp_t.
You can command-click on "vm_stat" to find its declaration as a membor of libtop_tsamp_t in libtop.h. It is declared as type vm_statistics_data_t. Command-clicking that jumps to its definition in /usr/include/mach/vm_statistics.h.
Searching the project for "vm_stat", we find that it is filled in by function libtop_tsamp_update_vm_stats in libtop.c:
mach_msg_type_number_t count = sizeof(tsamp->vm_stat) / sizeof(natural_t);
kr = host_statistics(libtop_port, HOST_VM_INFO, (host_info_t)&tsamp->vm_stat, &count);
if (kr != KERN_SUCCESS) {
return kr;
}
You will need to figure out how libtop_port is set if you want to call host_statistics. I'm sure you can figure that out for yourself.

It's been 4 years but I just wanted to add some extra info on calculating total RAM.
To get the total RAM, we should also consider Pages occupied by compressor and Pages speculative in addition to Kyle Jones answer.
You can check out this post for where the problem occurs.

Related

Parallel Writes to NFS-backed File

UPDATE: I had each node write to a separate file, and when the separate files were concatenated together the result was correct. I also updated the code to attempt a channel flush and file sync after each write of a single record, but there are still issues between nodes 0 and 1, now. If I make Node 0 sleep for a few seconds before it starts its iteration of the coforall loop, the records come out correct. If not, the last few hundred bytes of Node 0's records seem to be reliably overwritten with NULL bytes, up to the start of Node 1's records. The issues between Node 1 and Node 2, and Node 2 and Node 3, seem to not show up anymore.
Additionally, if I suppress either Node 0 or Node 1 from writing, I see the fully-formed records from the un-suppressed node written correctly to the file. In the case that Node 1 is suppressed, I see 9,997 100B records (or 999,700) correct bytes followed by NULL bytes in the file where Node 1's suppressed records would go. In the case that Node 0 is suppressed, I see exactly 999,700 NULL bytes in the file, after which Node 1's records begin.
Original Post:
I'm trying to troubleshoot an issue with parallel writes from different nodes to a shared NFS-backed file on disk. At the moment, I suspect that something is wrong with the way writes to the disk happen on the NFS server.
I'm working on adapting MPI+C code that uses pwrite to write to coordinated chunks of a file. If I try to have the equivalent locales in Chapel write to the file inside of a coforall loop, I end up with the bits of the file around the node boundaries messed up - usually the final few hundred bytes of each node's data are garbled. However, if I have just one locale iterate through the data on all locales and write it, the data comes out correctly. That is, I use the same data structures to calculate the offsets, but only Locale 0 seeks to that offset and performs the writes.
I've verified that the offsets into the file that each locale runs do not overlap, and I'm using a single channel per task, defined from within the on loc do block, so that tasks don't share a single channel.
Are there known issues with writing to a file from different locales? A lot of the documentation makes it seem like this is known to be safe, but an unsubstantiated guess seems to indicate that there are issues with caching of file contents; when examining the incorrect data, the bits that are incorrect seem to be the original data from the file in that location at the beginning of the program.
I've included the relevant routine below, in case you easily spot something I missed. To make this serial, I convert the coforall loc in Locales and on loc do block into a for j in 0..numLocales-1 loop, and replace here.id with j. Please let me know what else would help get to the bottom of this. Thanks!
proc write_share_of_data(data_filename: string, ref record_blocks) throws {
coforall loc in Locales {
on loc do {
var data_file: file = open(data_filename, iomode.cwr);
var data_writer = data_file.writer(kind=ionative, locking=false);
var line: [1..100] uint(8);
const indices = record_blocks[here.id].D;
var local_record_offset = + reduce record_blocks[0..here.id-1].D.size;
writeln("Loc ", here.id, ": record offset is ", local_record_offset);
var local_bytes_offset = terarec.terarec_width_disk * local_record_offset;
data_writer.seek(start=local_bytes_offset);
for i in indices {
var write_rec: terarec_t = record_blocks[here.id].records[i];
line[1..10] = write_rec.key;
line[11..98] = write_rec.value;
line[99] = 13; // \r
line[100] = 10; // \n
data_writer.write(line);
lines_written += 1;
}
data_file.fsync();
data_writer.close();
data_file.close();
}
}
return;
}
Adding an answer here that solved my particular problem, though it doesn't explain the behavior seen. I ended up changing the outer loop from coforall loc in Locales to for loc in Locales. This isn't too big of an issue since it's all writing to one file anyway - I doubt that multiple locales can actually make much headway in all attempting to write concurrently to a single file on an NFS server. As a result, the change still allows nodes to write the data they have locally to NFS, rather than forcing Node 0 to collect and then write the data on behalf of all locales. This amounts to only adding idle time to the write operation commensurate with the time it takes Locale 0 to start the remote task on other nodes when the previous node has finished writing, which for the application at hand is not a concern.
Have you tried specifying start/end in file.writer instead of using seek? Does that change anything? What about specifying the end offset for the channel.seek call? Does it matter if the file is created and has the appropriate size before you start?
Other than that, I wonder if this issue would appear for both NFS and Lustre. If it appears for both it might well be a Chapel bug. It sounds from your description that the C program was using this pattern, which points to it being a bug. But, have you run C code doing this on your setup? If it being a Chapel bug seems most likely after further investigation, we would appreciate a bug report issue with a reproducer.
I know that NFS does not always do what one would like, in terms of data consistency. It's my understanding that it has "close to open" semantics but it's unclear to me what that means in the context of opening a file and writing to a particular region within it, in parallel from different locales.
From Why NFS Sucks by Olaf Kirch:
An NFS client is permitted to cache changes locally and send them to
the server whenever it sees fit. This sort of lazy write-back greatly
helps write performance, but the flip side is that everyone else will
be blissfully unaware of these change before they hit the server. To
make things just a little harder, there is also no requirement for a
client to transmit its cached write in any particular fashion, so
dirty pages can (and often will be) written out in random order.
I read two implications from this paragraph that are relevant to your situation here:
The writes you do on different locales can be observed by the NFS server in an arbitrary order. (However as I understand it, the data should be sent to the server by the time your fsync call returns).
These writes are done at an OS page granularity (usually 4k). (Note that this is more a hypothesis I am making than it is a fact. It should be tested or further investigated).
It would be interesting to check if 2. is a plausible explanation for the behavior you are seeing. For example, you could explore having each locale operate on a multiple of 4096 records (or potentially try writing records of 4096 bytes each) and see if that changes the behavior. If 2 is indeed the explanation, it should be possible to create a C program that demonstrates the behavior as well.

Can a process be limited on how much physical memory it uses?

I'm working on an application, which has the tendency to use excessive amounts of memory, so I'd like to reduce this.
I know this is possible for a Java program, by adding a Maximum heap size parameter during startup of the Java program (e.g. java.exe ... -Xmx4g), but here I'm dealing with an executable on a Windows-10 system, so this is not applicable.
The title of this post refers to this URL, which mentions a way to do this, but which also states:
Maximum Working Set. Indicates the maximum amount of working set assigned to the process. However, this number is ignored by Windows unless a hard limit has been configured for the process by a resource management application.
Meanwhile I can confirm that the following lines of code indeed don't have any impact on the memory usage of my program:
HANDLE h_jobObject = CreateJobObject(NULL, L"Jobobject");
if (!AssignProcessToJobObject(h_jobObject, OpenProcess(PROCESS_ALL_ACCESS, FALSE, GetCurrentProcessId())))
{
throw "COULD NOT ASSIGN SELF TO JOB OBJECT!:";
}
JOBOBJECT_EXTENDED_LIMIT_INFORMATION tagJobBase = { 0 };
tagJobBase.BasicLimitInformation.MaximumWorkingSetSize = 1; // far too small, just to see what happens
BOOL bSuc = SetInformationJobObject(h_jobObject, JobObjectExtendedLimitInformation, (LPVOID)&tagJobBase, sizeof(tagJobBase));
=> bSuc is true, or is there anything else I should expect?
In top of this, the mentioned tools (resource managed applications, like Hyper-V) seem not to work on my Windows-10 system.
Next to this, there seems to be another post about this subject "Is there any way to force the WorkingSet of a process to be 1GB in C++?", but here the results seem to be negative too.
For a good understanding: I'm working in C++, so the solution, proposed in this URL are not applicable.
So now I'm stuck with the simple question: is there a way, implementable in C++, to limit the memory usage of the current process, running on Windows-10?
Does anybody have an idea?
Thanks in advance

How to correctly determine fastest CDN, mirror, download server in C++

The question that I'm struggling with is how to determine in c++ that which is the server with fastest connection for the client do make git clone from or download tarball. So basically I want to choose from collection of known mirrors which one will be used for downloading content from.
Following code I wrote demonstrates that what I am trying to achieve more clearly perhaps, but I believe that's not something one should use in production :).
So lets say I have two known source mirrors git-1.exmple.com and git-2.example.com and I want to download tag-x.tar.gz from one which client has best connectivity to.
CDN.h
#include <iostream>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <netdb.h>
#include <arpa/inet.h>
#include <sys/time.h>
using namespace std;
class CDN {
public:
long int dl_time;
string host;
string proto;
string path;
string dl_speed;
double kbs;
double mbs;
double sec;
long int ms;
CDN(string, string, string);
void get_download_speed();
bool operator < (const CDN&);
};
#endif
CDN.cpp
#include "CND.h"
CDN::CDN(string protocol, string hostname, string downloadpath)
{
proto = protocol;
host = hostname;
path = downloadpath;
dl_time = ms = sec = mbs = kbs = 0;
get_download_speed();
}
void CDN::get_download_speed()
{
struct timeval dl_started;
gettimeofday(&dl_started, NULL);
long int download_start = ((unsigned long long) dl_started.tv_sec * 1000000) + dl_started.tv_usec;
char buffer[256];
char cmd_output[32];
sprintf(buffer,"wget -O /dev/null --tries=1 --timeout=2 --no-dns-cache --no-cache %s://%s/%s 2>&1 | grep -o --color=never \"[0-9.]\\+ [KM]*B/s\"",proto.c_str(),host.c_str(),path.c_str());
fflush(stdout);
FILE *p = popen(buffer,"r");
fgets(cmd_output, sizeof(buffer), p);
cmd_output[strcspn(cmd_output, "\n")] = 0;
pclose(p);
dl_speed = string(cmd_output);
struct timeval download_ended;
gettimeofday(&download_ended, NULL);
long int download_end = ((unsigned long long)download_ended.tv_sec * 1000000) + download_ended.tv_usec;
size_t output_type_k = dl_speed.find("KB/s");
size_t output_type_m = dl_speed.find("MB/s");
if(output_type_k!=string::npos) {
string dl_bytes = dl_speed.substr(0,output_type_k-1);
double dl_mb = atof(dl_bytes.c_str()) / 1000;
kbs = atof(dl_bytes.c_str());
mbs = dl_mb;
} else if(output_type_m!=string::npos) {
string dl_bytes = dl_speed.substr(0,output_type_m-1);
double dl_kb = atof(dl_bytes.c_str()) * 1000;
kbs = dl_kb;
mbs = atof(dl_bytes.c_str());
} else {
cout << "Should catch the errors..." << endl;
}
ms = download_end-download_start;
sec = ((float)ms)/CLOCKS_PER_SEC;
}
bool CDN::operator < (const CDN& other)
{
if (dl_time < other.dl_time)
return true;
else
return false;
}
main.cpp
#include "CDN.h"
int main()
{
cout << "Checking CDN's" << endl;
char msg[128];
CDN cdn_1 = CDN("http","git-1.example.com","test.txt");
CDN cdn_2 = CDN("http","git-2.example.com","test.txt");
if(cdn_2 > cdn_1)
{
sprintf(msg,"Downloading tag-x.tar.gz from %s %s since it's faster than %s %s",
cdn_1.host.c_str(),cdn_1.dl_speed.c_str(),cdn_2.host.c_str(),cdn_2.dl_speed.c_str());
cout << msg << endl;
}
else
{
sprintf(msg,"Downloading tag-x.tar.gz from %s %s since it's faster than %s %s",
cdn_2.host.c_str(),cdn_2.dl_speed.c_str(),cdn_1.host.c_str(),cdn_1.dl_speed.c_str());
cout << msg << endl;
}
return 0;
}
So what are your thoughts and how would you approach this. What are the alternatives to replace this wget and achieve same clean way in c++
EDIT:
As #molbdnilo pointed correctly
ping measures latency, but you're interested in throughput.
So therefore I edited the demonstrating code to reflect that, however question remains same
For starters, trying to determine "fastest CDN mirror" is an inexact science. There is no universally accepted definition of what "fastest" means. The most one can hope for, here, is to choose a reasonable heuristic for what "fastest" means, and then measure this heuristic as precisely as can be under the circumstances.
In the code example here, the chosen heuristic seems to be how long it takes to download a sample file from each mirror via HTTP.
That's not such a bad choice to make, actually. You could reasonably make an argument that some other heuristic might be slightly better, but the basic test of how long it takes to transfer a sample file, from each candidate mirror, I would think is a very reasonable heuristic.
The big, big problem here I see here is the actual implementation of this heuristic. The way that this attempt -- to time the sample download -- is made, here, does not appear to be very reliable, and it will end up measuring a whole bunch of unrelated factors that have nothing do with network bandwidth.
I see at least several opportunities here where external factors completely unrelated to network throughput will muck up the measured timings, and make them less reliable than they should be.
So, let's take a look at the code, and see how it attempts to measure network latency. Here's the meat of it:
sprintf(buffer,"wget -O /dev/null --tries=1 --timeout=2 --no-dns-cache --no-cache %s://%s/%s 2>&1 | grep -o --color=never \"[0-9.]\\+ [KM]*B/s\"",proto.c_str(),host.c_str(),path.c_str());
fflush(stdout);
FILE *p = popen(buffer,"r");
fgets(cmd_output, sizeof(buffer), p);
cmd_output[strcspn(cmd_output, "\n")] = 0;
pclose(p);
... and gettimeofday() gets used to sample the system clock before and after, to figure out how long this took. Ok, that's great. But what would this actually measure?
It helps a lot here, to take a blank piece of paper, and just write down everything that happens here as part of the popen() call, step by step:
1) A new child process is fork()ed. The operating system kernel creates a new child process.
2) The new child process exec()s /bin/bash, or your default system shell, passing in a long string that starts with "wget", followed by a bunch of other parameters that you see above.
3) The operating system kernel loads "/bin/bash" as the new child process. The kernel loads and opens any and all shared libraries that the system shell normally needs to run.
4) The system shell process initializes. It reads the $HOME/.bashrc file and executes it, most likely, together with any standard shell initialization files and scripts that your system shell normally does. That itself can create a bunch of new processes, that have to be initialized and executed, before the new system shell process actually gets around to...
5) ...parsing the "wget" command it originally received as an argument, and exec()uting it.
6) The operating system kernel now loads "wget" as the new child process. The kernel loads and open any and all shared libraries that the wget process needs. Looking at my Linux box, "wget" loads no less than 25 separate shared libraries, including kerberos, and ssl libraries. Each one of those shared libraries get initialized.
7) The wget command performs a DNS lookup on the host, to obtain the IP address of the web server to connect to. If the local DNS server doesn't have the CDN mirror's hostname's IP address cached, it often takes several seconds to look up the CDN mirrors's DNS zone's authoritative DNS servers, then query them for the IP address, hopping this way and that way, across the intertubes.
Now, one moment... I seem have forgotten what we were trying to do here... Oh, I remember: which CDN mirror is "fastest", by downloading a sample file from each mirror, right? Yeah, that must be it!
Now, what does all of work done above, all of that work, have to do with determining which content mirror is the fastest???
Err... Not much, from the way it looks to me. Now, none of the above should really be such shocking news. After all, all of that is described in popen()'s manual page. If you read popen's manual page, it tells you that's ...what it does. Starts a new child process. Then executes the system shell, in order to execute the requested command. Etc, etc, etc...
Now, we're not talking about measuring time intervals that last many seconds, or minutes. If we're trying to measure something that takes a long time to execute, the relative overhead of popen()'s approach would be negligible, and not much to worry about. But the expected time to download the sample file, for the purpose of figuring out how fast each content mirror is -- I would expect that the actual download time would be relatively short. But it seems to me that the overhead to doing it this way, of forking an entirely new process, and executing first the system shell, then the wget command, with its massive list of dependencies, is going to be statistically significant.
And as I mentioned in the beginning, given that this is trying to determine the vaguely nebulous concept of "fastest mirror", which is already an inexact science -- it seems to me that you'd really want to get rid of as much utterly irrelevant overhead here -- as much as possible, in order to get as accurate of a result.
So, it seems to me that you don't really want to measure here anything other than what you're trying to measure: network bandwidth. And you certainly don't want to measure any of what transpires before any network activity takes place.
I still think that trying to time a sample download is a reasonable proposition. What's not reasonable here is all the popen and wget bloat. So, forget all of that. Throw it out the window. You want to measure how long it takes to download a sample file over HTTP, from each candidate mirror? Well, why don't you do just that?
1) Create a new socket().
2) Use getaddrinfo() to perform a DNS lookup, and obtain the candidate mirror's IP address.
3) connect() to the mirror's HTTP port.
4) Format the appropriate HTTP GET request, and send it to the server.
The above does pretty much what the popen/wget does, up to this point.
And only now I would start the clock running by grabbing the current gettimeofday(), then wait until I read the entire sample file from the socket, then grab the current gettimeofday() once more, to get the ending time of the transmission, and then calculate the actual time it took to receive the file from the mirror.
Only then, will I have some reasonable confidence that I'll be actually measuring the time it takes to receive a sample file from a CDN mirror, and completely ignoring the time it takes to execute a bunch of completely unrelated processes; and then by taking the same sample from multiple CDN mirrors, have any hope of picking one, using as much of a sensible heuristic, as possible.

Visual C++6 MFC MapViewOfFile returns error code 8

I have a program that is creating a map file, its able to do that call just fine, m_hMap = CreateFileMapping(m_hFile,0,dwProtect,0,m_dwMapSize,NULL);, but when the subsequent function call to MapViewOfFile(m_hMap,dwViewAccess,0,0,0), I get an error code of 8, which is ERROR_NOT_ENOUGH_MEMORY, or error string "error Not enough storage is available to process this command".
So I'm not totally understanding what the MapViewOfFile does for me, and how to fix the situation.
some numbers...
m_dwMapSize = 453427200
dwProtect = PAGE_READWRITE;
dwViewAccess = FILE_MAP_ALL_ACCESS;
I think my page size is 65536
In case of very large file and to read it, it is recommended to read it in small pieces and then process each piece. And MapViewOfFile function is used to map a piece in memory.
Look at http://msdn.microsoft.com/en-us/library/windows/desktop/aa366761(v=vs.85).aspx need offset to do its job properly i.e. in case you want to read a very large file in pieces. Mostly due to fragmentation and related reason very large memory request fails.
If you are working on a 64 bit processor then the system will allocate a total of 4GB memory with bit set LargeaddressAware.
go to Configuration properties->linker->system. in Enable largeaddressware: check
Yes /LARGEADDRESSAWARE and check.

WxTextCtrl unable to load large texts

I've read about the solutuon written here on a post a year ago
wx.TextCtrl.LoadFile()
Now I have a windows application that will generate color frequency statistics that are saved in 3D arrays. Here is a part of my code as you will see on the code below the printing of the statistics is dependent on a slider which specifies the threshold.
void Project1Frm::WxButton2Click(wxCommandEvent& event)
{
char stat[32] ="";
int ***report = pGLCanvas->GetPixel();
float max = pGLCanvas->GetMaxval();
float dist = WxSlider5->GetValue();
WxRichTextCtrl1->Clear();
WxRichTextCtrl1->SetMaxLength(100);
if(dist>0)
{
WxRichTextCtrl1->AppendText(wxT("Statistics\nR\tG\tB\t\n"));
for(int m=0; m<256; m++){
for(int n=0; n<256; n++){
for(int o=0; o<256; o++){
if((report[m][n][o]/max)>=(dist/100.0))
{
sprintf(stat,"%d\t%d\t%d\t%3.6f%%\n",m,n,o,report[m][n][o]/max*100.0);
WxRichTextCtrl1->AppendText(wxT(stat));
}
}
}
}
}
else if(dist==0) WxRichTextCtrl1->LoadFile("histodata.txt");
}
The solution I've tried so far is that when I am to print all the statistics I'll get it from a text file rather than going through the 3D array... I would like to ask if the Python implementation of the segmenting can be ported to C++ or are there better ways to deal with this problem. Thank you.
EDIT:
Another reason why I used a text file instead is that I observed that whenever I do sprintf only [with the line WxRichTextCtrl1->AppendText(wxT(stat)); was commented out] the computer starts to slow down.
-Ric
Disclaimer: My answer is more of an alternative than a solution.
I don't believe that there's any situation in which a user of this application is going to find it useful to have a scrolled text window containing ~16 million lines of numbers. It would be impossible to scroll to one specific location in the list that the user might need to see easily. This is all assuming that every single number you output here has some significance to the user of course (you are showing them on the screen for a reason). Providing the user with controls to look up specific, fixed (reasonable) ranges of those numbers would be a better solution, not only in regards to a better user experience, but also in helping to resolve your issue here.
On the other hand, if you still insist on one single window containing all 64 million numbers, you seem to have a very rigid data structure here, which means you can (and should) take advantage of using a virtual grid control (wxGrid), which is intended to work smoothly even with incredibly large data sets like this. The user will likely find this control easier to read and find the section of data they are looking for.