Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 4 years ago.
Improve this question
In some video games, I find that everytime a new character is created a factory method is used create the new one like this
class CharacterEngine
{
public:
static Character* CreateCharacter(string Name, Weapons InitialWeapons)
{
return new Character(Name, InitialWeapons);
}
};
//...
Now that if I have 100000000 characters (very many, e.g like simulated particles), heap allocation like this may fail to work on computers with small RAM. What is your solution to this problem?
Edit
What other methods or designs do you know can change or replace the factory method/class?
Do you actually have 100K characters? And are you actually in an environment in which you are memory constrained and allocation fails? Even if Character is a whopping 1KB in side, you'd be looking at 100MBs consumed, which isn't that much, even for feature phones.
But perhaps you're worried that you might actually have memory to spare, but fragmentation is so high you can't use it. That's a fairer concern, and one usually relevant to games. Perhaps take a look at the object pool pattern. Also, taking into consideration the large number of characters you're speaking of, perhaps flighweight might also help!
Finally, running out of memory isn't like other program errors like losing a TCP connection or facing a disk error. If you need to allocate the 100001th character and there's no more memory for it you can't not allocate it, show an error to the user or try again later. You can't go on without them as it were. So don't - just bail the program and perhaps do whatever cleaning up is required to not lose too much game state etc. Have a read for malloc never fails as well.
The heap memory is obviously limited, but the limit is in practice not that small (at least gigabytes on current PCs).
And memory consumption is not the biggest problem in a game. If you have many characters, you might need to deal with interactions between them, and that could be more difficult (e.g. determining the set of characters close to a given one could be more challenging).
You should read more about memory management, virtual address space, smart pointers, reference counting, RAII, circular references, weak references, hash consing.
Notice that the heap is global to your program & process (it is not the property of some particular class or code chunk, but of your entire program).
The heap allocation routines (related to new & delete) are generally implemented about some operating system primitives (often system calls) to grow the virtual address space. On Linux, see mmap(2). The operating system could provide some mean to query your virtual address space (on Linux, see proc(5) and for a process of pid 1234, the /proc/1234/maps pseudo-file).
I recommend reading a good book on garbage collection, such as the GC handbook. It teaches you concepts and terminology which are relevant for C++ programming (notably in games). In some sense, you may want to implement your own GC for your game.
C++ has some allocator concept and standard containers know about that.
Read also some Introduction to Algorithms.
heap allocation like this may fail to work on computers with small RAM.
Then either improve your program to use less memory, or get a bigger computer. Perhaps consider some distributed computing approach (e.g. cloud computing), like in MMORG.
What other methods or designs do you know can change or replace the factory method/class?
They won't change much the consumed memory, because in your design every character is represented by its unique C++ object. So that does not matter much.
Assuming you have enough local disk space to store the information of all characters, you can mmap one or more files that store all the character data, and create a character object from data in the file(s) only when needed.
If you have neither enough memory nor disk space to store data of all characters locally, then it becomes a much more difficult problem -- you might need to assign to every character an URI and load it from the network...
EDIT: Of course, after updating the character data, you'd need to write it back to the corresponding file. And for performance sake, you might want to implement some caching mechanism so frequently used characters don't need to be read & written back every time they are used.
Related
I'm architecting a small software engine and I'd like to make expensive use of the stack for rapid iterations of large number sets. But then it occurred to me that this might be a bad idea since the stack isn't as large a memory store as the heap. But I am attracted to the stack's speed and lack of dynamic allocation coding practices.
Is there a way to find out how far I can push the stack on a given platform? I am looking mainly at mobile devices but the issue could come up on any platform.
On *nix, use getrlimit:
RLIMIT_STACK
The maximum size of the process stack, in bytes. Upon
reaching this limit, a SIGSEGV signal is generated. To handle
this signal, a process must employ an alternate signal stack
(sigaltstack(2)).
On Windows, use VirtualQuery:
For the first call, pass it the address of any value on the stack to
get the base address and size, in bytes, of the committed stack space.
On an x86 machine where the stack grows downwards, subtract the size
from the base address and VirtualQuery again: this will give you the
size of the space reserved for the stack (assuming you're not
precisely on the limit of stack size at the time). Summing the two
naturally gives you the total stack size.
There is no platform-independent method since stack size is left to the implementation and host system logically - on an embedded mini-SOC there are less resources to distribute than on a 128GB RAM server. You can however influence the stack size of a specific thread on all OS'es as well with API-specific calls.
A possible portable solution is to write an allocator yourself.
You do not have to make use of the process stack, just simulate it in the heap.
Allocate a large amount of memory in the beginning, and write a stack allocator on top of it to use it while allocating.
Google 'Allocator Requirements' for information on how to achieve it in C++.
I'm not sure if the term 'Stack Allocator' is canonical, but I mean that you have to put stack like restrictions on where the allocation or deallocation has to happen.
Since you said that your algorithm is suited to this pattern, I think it'd be easy.
In standard C++, definitely not. In a portable way, probably not. In a particular OS, sometimes. If nothing else, you could open your own executable size and inspect the headers of the executable file to see it's stacksize. [The next problem is of course "how much of the stack was used before this bit of code" - which can be difficult to determine].
If you run the code in a separate thread, many of the (low level) thread interfaces allow you to specify a stack (or stacksize), E.g Posix threads pthread_set_stacksize or MS _beginthread. Again, you don't know EXACTLY how much space has been used up before it gets to the actual thread code - but it's probably not a huge amount.
Of course, in an embedded system (e.g. mobile phone), the stacksize is typically quite small, 4K, 12K or 64KB is very much normal - sometimes even a lot smaller than that in some systems.
Another potential problem is that you can't really know how much space is ACTUALLY used on the stack - you can measure after the fact in a compiled system, and of course, if you have a stack local array of int array[25];, we can know it takes up at least 25 * sizeof(int) - but there may be padding, the compiler saves registers on the stack, etc, etc.
Edit, as an afterthought:
I also don't really see much benefit in having two code-paths:
if (enough_stack_space_for_something)
use_stack_based_algorithm();
else
use_heap_based_algorithm();
This would add a fair amount of extra overhead, and more code is generally not a good plan in an embedded/mobile system.
Edit2: Also, if allocating memory is a major part of the runtime, perhaps looking at why that is, for example block-creation of objects would help?
To expand on the answers already given about why there is no portable way to do this, the entire concept of an actual stack is not part of the standard. You could write a C or C++ runtime that doesn't use a stack at all other than the function call records (which might internally be a linked list or something else).
The stack is an implementation detail of a particular machine/OS/compiler. Hence any technique to access stack metrics will be specific to machine/OS/compiler.
While not an actual answer to your specific question (Niels covered that quite well) but as advice to your problem domain: just allocate a large chunk of memory in the heap. There's no reason aside from convenience that the "real" stack is any different. Highly recursive (non-tail-recursive) algorithms often need to do this to ensure that they have a virtually unbounded "stack." Scripting languages that want to ensure they give a runtime error/exception rather than crashing the host application also often do this. To be efficient about things, you can either implement a "split stack" (like a std::deque would give you) or you can just be sure to preallocate a stack big enough for your needs.
There's no standard way to do it from within the language. I'm not even aware of a documented extension that is able to query.
However some compilers have options to set the stack size. And platform may specify what it does when launching a process, and/or provide ways to set stack size of a new thread, maybe even manipulate existing one.
For small platforms it's usual to know the whole memory size, have all the data segments on one end, a set size arena for the heap (may be 0), and the rest is stack, approaching from the other side.
This question already has answers here:
How to determine CPU and memory consumption from inside a process
(10 answers)
Closed 6 years ago.
See, I wanted to measure memory usage of my C++ program. From inside the program, without profilers or process viewers, etc.
Why from inside the program?
Measurements will be done thousands of times—must be automated; therefore, having an eye on Task Manager, top, whatever, will not do
Measurements are to be done during production runs—performance degradation, which may be caused by profilers, is not acceptable since the run times are non-negligible already (several hours for large problem instances)
note. Why measure at all? The only reason to measure used memory (as reported by the OS) as opposed to calculating “expected” usage in advance is the fact that I can not directly, analytically “sizeof” how much does my principal data structure use. The structure itself is
unordered_map<bitset, map<uint16_t, int64_t> >
these are packed into a vector for all I care (a list would actually suffice as well, I only ever need to access the “neighbouring” structures; without details on memory usage, I can hardly decide which to choose)
vector< unordered_map<bitset, map<uint16_t, int64_t> > >
so if anybody knows how to “sizeof” the memory occupied by such a structure, that would also solve the issue (though I'd probably have to fork the question or something).
Environment: It may be assumed that the program runs all alone on the given machine (along with the OS, etc. of course; either a PC or a supercomputer's node); it is certain to be the only one program requiring large (say > 512 MiB) amounts of memory—computational experiment environment. The program is either run on my home PC (16GiB RAM; Windows 7 or Linux Mint 18.1) or the institution supercomputer's node (circa 100GiB RAM, CentOS 7), and the program may want to consume all that RAM. Note that the supercomputer effectively prohibits disk swapping of user processes, and my home PC has a smallish page file.
Memory usage pattern. The program can be said to sequentially fill a sort of table, each row wherein is the vector<...> as specified above. Say the prime data structure is called supp. Then, for each integer k, to fill supp[k], the data from supp[k-1] is required. As supp[k] is filled it is used to initialize supp[k+1]. Thus, at each time, this, prev, and next “table rows” must be readily accessible. After the table is filled, the program does a relatively quick (compared with “initializing” and filling the table), non-exhaustive search in the table, through which a solution is obtained. Note that the memory is only allocated through the STL containers, I never explicitly new() or malloc() myself.
Questions. Wishful thinking.
What is the appropriate way to measure total memory usage (including swapped to disk) of a process from inside its source code (one for Windows, one for Linux)?
Should probably be another question, or rather a good googling session, but still---what is the proper (or just easy) way to explicitly control (say encourage or discourage) swapping to disk? A pointer to an authoritative book on the subject would be very welcome. Again, forgive my ignorance, I'd like a means to say something on the lines of “NEVER swap supp” or
“swap supp[10]”; then, when I need it, “unswap supp[10]”—all from the program's code. I thought I'd have to resolve to serialize the data structures and explicitly store them as a binary file, then reverse the transformation.
On Linux, it appeared the easiest to just catch the heap pointers through sbrk(0), cast them as 64-bit unsigned integers, and compute the difference after the memory gets allocated, and this approach produced plausible results (did not do more rigorous tests yet).
edit 5. Removed reference to HeapAlloc wrangling—irrelevant.
edit 4. Windows solution
This bit of code reports the working set that matches the one in Task Manager; that's about all I wanted—tested on Windows 10 x64 (tested by allocations like new uint8_t[1024*1024], or rather, new uint8_t[1ULL << howMuch], not in my “production” yet ).
On Linux, I'd try getrusage or something to get the equivalent.
The principal element is GetProcessMemoryInfo, as suggested by #IInspectable and #conio
#include<Windows.h>
#include<Psapi.h>
//get the handle to this process
auto myHandle = GetCurrentProcess();
//to fill in the process' memory usage details
PROCESS_MEMORY_COUNTERS pmc;
//return the usage (bytes), if I may
if (GetProcessMemoryInfo(myHandle, &pmc, sizeof(pmc)))
return(pmc.WorkingSetSize);
else
return 0;
edit 5. Removed reference to GetProcessWorkingSetSize as irrelevant. Thanks #conio.
On Windows, the GlobalMemoryStatusEx function gives you useful information both about your process and the whole system.
Based on this table you might want to look at MEMORYSTATUSEX.ullAvailPhys to answer "Am I getting close to hitting swapping overhead?" and changes in (MEMORYSTATUSEX.ullTotalVirtual – MEMORYSTATUSEX.ullAvailVirtual) to answer "How much RAM is my process allocating?"
To know how much physical memory your process takes you need to query the process working set or, more likely, the private working set. The working set is (more or less) the amount of physical pages in RAM your process uses. Private working set excludes shared memory.
See
What is private bytes, virtual bytes, working set?
How to interpret Windows Task Manager?
https://blogs.msdn.microsoft.com/tims/2010/10/29/pdc10-mysteries-of-windows-memory-management-revealed-part-two/
for terminology and a little bit more details.
There are performance counters for both metrics.
(You can also use QueryWorkingSet(Ex) and calculate that on your own, but that's just nasty in my opinion. You can get the (non-private) working set with GetProcessMemoryInfo.)
But the more interesting question is whether or not this helps your program to make useful decisions. If nobody's asking for memory or using it, the mere fact that you're using most of the physical memory is of no interest. Or are you worried about your program alone using too much memory?
You haven't said anything about the algorithms it employs or its memory usage patterns. If it uses lots of memory, but does this mostly sequentially, and comes back to old memory relatively rarely it might not be a problem. Windows writes "old" pages to disk eagerly, before paging out resident pages is completely necessary to supply demand for physical memory. If everything goes well, reusing these already written to disk pages for something else is really cheap.
If your real concern is memory thrashing ("virtual memory will be of no use due to swapping overhead"), then this is what you should be looking for, rather than trying to infer (or guess...) that from the amount of physical memory used. A more useful metric would be page faults per unit of time. It just so happens that there are performance counters for this too. See, for example Evaluating Memory and Cache Usage.
I suspect this to be a better metric to base your decision on.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Wants to create an application storing data in memory. But i dont want the data to be lost even if my app crashes.
What concept should i use?
Should I use a shared memory, or is there some other concept that suits my requirement better.
You are asking for persistence (or even orthogonal persistence) and/or for application checkpointing.
This is not possible (at least thru portable C++ code) in the general case for some arbitrary existing C++ code, e.g. because of ASLR, because of pointers on -or to- the local call stack, because of multi-threading, and because of external resources (sockets, opened files, ...), because the current continuation cannot be accessed, restored and handled in standard C++.
However, you might design your application with persistence in mind. This is a strong architectural requirement. You could for instance have every class contain some dumping method and its load factory function. Beware of shared pointers, and take into account that you could have cyclic references. Study garbage collection algorithms (e.g. in the Gc HandBook) which are similar to those needed for persistence (a copying GC is quite similar to a checkpointing algorithm).
Look also in serialization libraries (like libs11n). You might also consider persisting into textual format (e.g. JSON), perhaps inside some Sqlite database (or some real database like PostGreSQL or MongoDb....). I am doing this (in C) in my monimelt software.
You might also consider checkpointing libraries like BLCR
The important thing is to think about persistence & checkpointing very early at design time. Thinking of your application as some specialized bytecode interpreter or VM might help (notably if you want to persist continuations, or some form of "call stack").
You could fork your process (assuming you are on Linux or Posix) before persistence. Hence, persistence time does not matter that much (e.g. if you persist every hour or every ten minutes).
Some language implementations are able to persist their entire state (notably their heap), e.g. SBCL (a good Common Lisp implementation) with its save-lisp-and-die, or Poly/ML -an ML dialect- with its SaveState, or Squeak (a Smalltalk implementation).
See also this answer & that one. J.Pitrat's blog has a related entry: CAIA as a sleeping beauty.
Persistency of data with code (e.g. vtables of objects, function pointers) might be technically difficult. dladdr(3) -with dlsym- might help (and, if you are able to code machine-specific things, consider the old getcontext(3), but I don't recommend that). Avoid name mangling (for dlsym) by declaring extern "C" all code related to persistence. If you want to persist some data and be able to restart from it with a slightly modified program (e.g. a small bugfix) things are much more complex.
More pragmatically, you could have a class representing your entire persistable state, and implement methods to persist (and reload it). You would then persist only at certain steps of your algorithm (e.g. if you have a main loop or an event loop, at start of that loop). You probably don't want to persist too often (e.g. because of the time and disk space required to persist), e.g. perhaps every ten minutes. You might perhaps consider some transaction log if it fits in the overall picture of your application.
Use memory mapped files - mmap (https://en.wikipedia.org/wiki/Mmap) And allocate all your structures inside mapped memory region. System will properly save mapped file to disk when your app crashes.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I know this is a really dumb question but I feel I have been going about learning C++ wrong, ignoring memory too much. I always hear about memory management in C++ and C but what is it's importance to something like a video game, or some office program?
C and C++ are languages which are by most people considered low level (or with low level parts), this allows you to write hardware specific code. And as hardware usually have a lot of preconditions about their input, you'll have to manually manage memory, in terms of memory layout, allocation, padding alike.
This is to be expected when interacting with hardware, and is actually required in order to deal with hardware. However when implementing non hardware specific code, the same utilities and language features still apply. I.e. if you want a piece of dynamic memory, you'll have to explicitly request it, and explicitly release it (the hard part). The way around this, in C++ is to use classes, which helps you handle memory management, by either abstracting the memory management away all together, or by providing garbage collection (usually via reference counting).
The consequence of not cleaning up your garbage, I.e. returning resources to the system, also known as leaking, is that the system will EVENTUALLY run out of resources (as resources are generally limited, although sometimes immense). If your program is small, and has a limited span of executing, this may not be an issue, but nevertheless you should handle your resources, as the hosting environment is actually NOT required to do so, for you, after program termination (although most system will do so, atleast in terms of memory).
Also please notice, that you should have focus on managing resources. Rather than just memory. There are a lot of resources, which are all limited, and hence all needs to be managed. Other resources could be; files, IP sockets, handles, hardware devices, ...
For games in specific, you'll have to expect a high resource usage, in terms of memory and file access, also your game is likely to run, for quite a while (assuming its good), and hence handling resource management becomes critical!
My best piece if advice is keeping away from raw pointers and manual memory management (new/free), and instead using the standard containers (std::vector, alike), value semantics (I.e. pass arguments around by value instead of by pointers.), reference semantics, and if you really have to use pointers, make use of std::unique_ptr, and std::shared_ptr. (This is assuming your writing non hardware code, like a game or a text processor).
Sean Paul did a talk about avoiding pointers at the going native conference in 2013, and it's really worth watching. I can't remember the name of the talk, but it's live on the channel9 webpage for free. The other talks from going native are also recommendable!
You probably already know the answer, but considering that your computer only has a finite amount of memory, it's only natural that applications must manage this memory in a conservative manner.
If, for example, a game poorly manages memory and uses up a large chunk of your 8 gigs of installed ram, other applications that demand a modest amount of memory will essentially start fighting over it. This will usually result in your OS to start swapping memory with other storage mediums and ultimately degrade the performance of your computer until more memory is available.
I'm architecting a small software engine and I'd like to make expensive use of the stack for rapid iterations of large number sets. But then it occurred to me that this might be a bad idea since the stack isn't as large a memory store as the heap. But I am attracted to the stack's speed and lack of dynamic allocation coding practices.
Is there a way to find out how far I can push the stack on a given platform? I am looking mainly at mobile devices but the issue could come up on any platform.
On *nix, use getrlimit:
RLIMIT_STACK
The maximum size of the process stack, in bytes. Upon
reaching this limit, a SIGSEGV signal is generated. To handle
this signal, a process must employ an alternate signal stack
(sigaltstack(2)).
On Windows, use VirtualQuery:
For the first call, pass it the address of any value on the stack to
get the base address and size, in bytes, of the committed stack space.
On an x86 machine where the stack grows downwards, subtract the size
from the base address and VirtualQuery again: this will give you the
size of the space reserved for the stack (assuming you're not
precisely on the limit of stack size at the time). Summing the two
naturally gives you the total stack size.
There is no platform-independent method since stack size is left to the implementation and host system logically - on an embedded mini-SOC there are less resources to distribute than on a 128GB RAM server. You can however influence the stack size of a specific thread on all OS'es as well with API-specific calls.
A possible portable solution is to write an allocator yourself.
You do not have to make use of the process stack, just simulate it in the heap.
Allocate a large amount of memory in the beginning, and write a stack allocator on top of it to use it while allocating.
Google 'Allocator Requirements' for information on how to achieve it in C++.
I'm not sure if the term 'Stack Allocator' is canonical, but I mean that you have to put stack like restrictions on where the allocation or deallocation has to happen.
Since you said that your algorithm is suited to this pattern, I think it'd be easy.
In standard C++, definitely not. In a portable way, probably not. In a particular OS, sometimes. If nothing else, you could open your own executable size and inspect the headers of the executable file to see it's stacksize. [The next problem is of course "how much of the stack was used before this bit of code" - which can be difficult to determine].
If you run the code in a separate thread, many of the (low level) thread interfaces allow you to specify a stack (or stacksize), E.g Posix threads pthread_set_stacksize or MS _beginthread. Again, you don't know EXACTLY how much space has been used up before it gets to the actual thread code - but it's probably not a huge amount.
Of course, in an embedded system (e.g. mobile phone), the stacksize is typically quite small, 4K, 12K or 64KB is very much normal - sometimes even a lot smaller than that in some systems.
Another potential problem is that you can't really know how much space is ACTUALLY used on the stack - you can measure after the fact in a compiled system, and of course, if you have a stack local array of int array[25];, we can know it takes up at least 25 * sizeof(int) - but there may be padding, the compiler saves registers on the stack, etc, etc.
Edit, as an afterthought:
I also don't really see much benefit in having two code-paths:
if (enough_stack_space_for_something)
use_stack_based_algorithm();
else
use_heap_based_algorithm();
This would add a fair amount of extra overhead, and more code is generally not a good plan in an embedded/mobile system.
Edit2: Also, if allocating memory is a major part of the runtime, perhaps looking at why that is, for example block-creation of objects would help?
To expand on the answers already given about why there is no portable way to do this, the entire concept of an actual stack is not part of the standard. You could write a C or C++ runtime that doesn't use a stack at all other than the function call records (which might internally be a linked list or something else).
The stack is an implementation detail of a particular machine/OS/compiler. Hence any technique to access stack metrics will be specific to machine/OS/compiler.
While not an actual answer to your specific question (Niels covered that quite well) but as advice to your problem domain: just allocate a large chunk of memory in the heap. There's no reason aside from convenience that the "real" stack is any different. Highly recursive (non-tail-recursive) algorithms often need to do this to ensure that they have a virtually unbounded "stack." Scripting languages that want to ensure they give a runtime error/exception rather than crashing the host application also often do this. To be efficient about things, you can either implement a "split stack" (like a std::deque would give you) or you can just be sure to preallocate a stack big enough for your needs.
There's no standard way to do it from within the language. I'm not even aware of a documented extension that is able to query.
However some compilers have options to set the stack size. And platform may specify what it does when launching a process, and/or provide ways to set stack size of a new thread, maybe even manipulate existing one.
For small platforms it's usual to know the whole memory size, have all the data segments on one end, a set size arena for the heap (may be 0), and the rest is stack, approaching from the other side.