Why does word2bits use as much RAM as word2vec?

I am using word2bits on large datasets, and I observed that word2bits occupies about as much RAM as word2vec. To see what happens with a small dataset, I ran the word2bits example mentioned in the README (https://github.com/agnusmaximus/Word2Bits) and watched the RAM usage in htop.
When I ran the same text8 dataset through word2vec, it occupied the same amount of RAM.

word2bits does use RAM like word2vec. This question was resolved in the project's GitHub issues (github.com/agnusmaximus/Word2Bits/issues/7)

Related

memory leak disappears when using valgrind

We have a complex algorithm which processes OpenCV images, allocating and deallocating several GB of memory in the process, mostly cv::Mat instances of about 10MB each. If we run this iteratively under valgrind (either with --tool=massif or --tool=memcheck), the memory footprint returns to the same value (+-1MB) after each iteration, and no significant memory leak is found. Watching from outside via ps or pmap, or from inside via /proc/self/status, also shows a maximum footprint of 2.3GB that does not increase.
If we run the same software without valgrind, however, the memory footprint (checked the same ways) increases by several hundred MB with every iteration, soon reaching 5GB after a few iterations.
So we have something that looks like a memory leak, but valgrind is of no help in finding the cause.
What could this be?
(This is C++ under Ubuntu.)
Thanks to the comment from @phd I found a solution to my problem: using tcmalloc reduces the memory footprint dramatically (2.5GB instead of 6GB). See the attached graphic:
RSS memory using different malloc libraries
There still seems to be a slight increase in memory usage with tcmalloc or jemalloc, but at least it's acceptable for the iteration counts we usually have.
I still wonder how malloc can waste so many resources. I tried to find out with malloc_info(), but with no success. I suspect memory fragmentation and/or multithreading plays a role here.

Glowcode memory usage number is different from Window Task Manager

I'm trying to use glowcode to track a C++ mem-leak issue in our app.
But first of all, glowcode cannot track all the memory allocated by our app. It only tracks 300MB of memory usage when Task Manager shows we're using 700MB.
I suspect the missing memory is allocated by some 3rd-party dll(s), but it's hard to find out.
Do you have any similar issues or any ideas about that? Thanks in advance.
EDIT:
WOW! VMMap is a great tool; it is quite suited for the job, and the timeline functionality is really a killer feature. I have sorted out our memleak problem with it.
Thanks #sergmat!
As for Glowcode, it looks like it only tracks heap allocations, not VirtualAlloc-ed memory (which is shown in VMMap as "Private Data"), so that's why the figures did not match Task Manager's.
For tracking stack traces, VMMap takes quite a lot of memory for itself compared with Glowcode. But on an 8GB machine, that won't be a big issue.

How many bytes does a Xeon bring into the cache per memory access?

I am working on a system, written in C++, running on a Xeon on Linux, that needs to run as fast as possible. There is a large data structure (basically an array of structs) held in RAM, over 10 GB, and elements of it need to be accessed periodically. I want to revise the data structure to work with the system's caching mechanism as much as possible.
Currently, accesses are done mostly randomly across the structure, and each time 1-4 32-bit ints are read. It is a long time before another read occurs in the same place, so there is no benefit from the cache.
Now I know that when you read a byte from a random location in RAM, more than just that byte is brought into the cache. My question is how many bytes are brought in? Is it 16, 32, 64, 4096? Is this called a cache line?
I am looking to redesign the data structure to minimize random RAM accesses and work with the cache instead of against it. Knowing how many bytes are pulled into the cache on a random access will inform the design choices I make.
Update (October 2014):
Shortly after I posed the question above, the project was put on hold. It has since resumed, and based on suggestions in the answers below, I performed some experiments around RAM access, because it seemed likely that TLB thrashing was happening. I revised the program to run with huge pages (2MB instead of the standard 4KB) and observed a small speedup, about 2.5%. I found great information about setting up huge pages here and here.
Today’s CPUs fetch memory in chunks of (typically) 64 bytes, called cache lines. When you read a particular memory location, the entire cache line is fetched from the main memory into the cache.
More here : http://igoro.com/archive/gallery-of-processor-cache-effects/
A cache line for any current Xeon processor is 64 bytes. One other thing that you might want to think about is the TLB. If you are really doing random accesses across 10GB of memory, then you are likely to have a lot of TLB misses, which can potentially be as costly as cache misses. You can work around this with large pages, but it's something to keep in mind.
Here is an old SO question with some info that might be of use to you (in particular the first answer, which explains where to look for Linux CPU info; the responder doesn't mention the line size itself, but it appears in the 'other info' alongside associativity etc.). The question is about x86, but the answers are more general. Worth a look:
Where is the L1 memory cache of Intel x86 processors documented?
You might want to head over to http://agner.org/optimize/ and grab the optimization PDFs available there - there's a lot of good (low-level) information in there. Pretty focused on assembly language level, but there's lessons to be learned for C/C++ programmers as well.
Volume 3, "The microarchitecture of Intel, AMD and VIA CPUs" should be of interest :-)
Good (long) article about organizing data structures to take cache and RAM hierarchy into account from GNU's libc maintainer: https://lwn.net/Articles/250967/ (full PDF here: http://www.akkadia.org/drepper/cpumemory.pdf)

memory footprint issue

I was just curious: I have a binary executable file on Unix that is around 9MB. Is that considered a large memory footprint? The client will be calling this to generate some values and subsequently queue messages elsewhere. How is one supposed to know when a program's memory footprint is too big, and when one should provide a static library instead of an executable?
Everything is relative. It's a large footprint if the app is running on a machine with 8MB of RAM. It's not large if the app is running on a machine with 64GB of RAM. Then again, it might be large even on a 64GB machine if most of the RAM has been gobbled up by some huge Oracle instance (for example).
You should also take into account that only a part of those 9MB is actually loaded into RAM -- the readelf or objdump utilities can show you exactly how much.
It all really depends on how much value you get for those 9MB. For example, if all the program did was add a few numbers, it would definitely be too large. However, 9MB isn't really all that much these days, when a server can easily have over 8GB of memory.
For comparison just starting some VM's can use over 50MB of memory.
Back in 1996 I was routinely creating statically linked executables for a top-tier CAD/CAM vendor that ran between 32 and 50MB in size, depending on the platform. Yes, back when memory was $40 per MB, we managed to fill up a system with our program. In 1996. In light of that, everything I see today with regard to software bloat pisses me off, because that program did more on startup than most do in their whole day. :-)
So no, unless it's "helloworld.exe", 9MB isn't much these days.
That sounds pretty normal. You could hit 9MB quickly by statically linking to various libs, or by enabling debugging symbols.
I have 1.5TB of hard drive space. I can fit 174762 copies of that binary on my disk.
As for RAM, perhaps if there were 9MB of code in that binary (which I highly doubt), then maybe it would all be kept in RAM.
Otherwise, the executable loader will probably only load the parts it needs; if there are resources or unmapped parts of the binary, they don't need to be kept in RAM for the whole runtime. In any case, you probably have at least 1-4GB of RAM, so it's clearly not a problem...

How to optimize paging for large in memory database

I have an application where the entire database is implemented in memory using a stl-map for each table in the database.
Each item in the stl-map is a complex object with references to other items in the other stl-maps.
The application works with a large amount of data, so it uses more than 500 MByte of RAM. Clients are able to contact the application and get a filtered version of the entire database. This is done by running through the entire database and finding the items relevant for the client.
When the application has been running for an hour or so, Windows 2003 SP2 starts to page out parts of the application's RAM (even though there is 16 GByte of RAM on the machine).
After the application has been partly paged out, a client logon takes a long time (10 minutes) because it now generates a page fault for each pointer lookup in the stl-maps. Running the client logon a second time right afterwards is fast (a few seconds) because all the memory is back in RAM.
I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory.
I guess a poor man's solution would be to loop through the entire in-memory database, thus telling Windows we are still interested in keeping the data model in RAM.
I guess another poor man's solution would be to disable the pagefile completely on Windows.
I guess the expensive solution would be an SQL database, and then rewriting the entire application to use a database layer. Then, hopefully, the database system will have implemented means for fast access.
Are there other more elegant solutions ?
This sounds like either a memory leak or a serious fragmentation problem. It seems to me that the first step would be to figure out what's causing 500 MB of data to use up 16 GB of RAM and still want more.
Edit: Windows has a working set trimmer that actively attempts to page out idle data. The basic idea is that it goes through and marks pages as being available, but leaves the data in them (and the virtual memory manager knows what data is in them). If, however, you attempt to access that memory before it's allocated to other purposes, it'll be marked as being in use again, which will normally prevent it from being paged out.
If you really think this is the source of your problem, you can indirectly control the working set trimmer by calling SetProcessWorkingSetSize. At least in my experience, this is only rarely of much use, but you may be in one of those unusual situations where it's really helpful.
As #Jerry Coffin said, it really sounds like your actual problem is a memory leak. Fix that.
But for the record, none of your "poor mans solutions" would work. At all.
Windows pages out some of your data because there's no room for it in RAM.
Looping through the entire memory database would load in every byte of the data model, yes... which would cause other parts of it to be paged out. You'd generate a lot of page faults, and the only difference in the end would be which parts of the data structure are paged out.
Disabling the page file? Yes, if you think a hard crash is better than low performance. Windows doesn't page data out because it's fun. It does that to handle situations where it would otherwise run out of memory. If you disable the pagefile, the app will just crash when it would otherwise page out data.
If your dataset really is so big it doesn't fit in memory, then I don't see why an SQL database would be especially "expensive". Unlike your current solution, databases are optimized for this purpose. They're meant to handle datasets too large to fit in memory, and to do this efficiently.
It sounds like you have a memory leak. Fixing that would be the elegant, efficient and correct solution.
If you can't do that, then either
throw more RAM at the problem (the app ends up using 16GB? Throw 32 or 64GB at it then), or
switch to a format that's optimized for efficient disk access (A SQL database probably)
We had a similar problem, and the solution we chose was to allocate everything in a shared memory block. AFAIK, Windows doesn't page this out. However, using an stl-map here is not for the faint of heart either, and it was beyond what we required.
We are using Boost Shared Memory to implement this for us, and it works well. Follow the examples closely and you will be up and running quickly. Boost also has Boost.MultiIndex, which will do a lot of what you want.
For a no-cost SQL solution, have you looked at SQLite? It has an option to run as an in-memory database.
Good luck, sounds like an interesting application.
"I have an application where the entire database is implemented in memory using a stl-map for each table in the database."
That's the beginning of the end: STL's std::map is extremely memory-inefficient. The same applies to std::list. Every element is allocated separately, causing rather serious memory waste. I often use std::vector + sort() + find() instead of std::map in applications where that is possible (more searches than modifications) and where I know in advance that memory usage might become an issue.
"When the application has been running for an hour or so, then Windows 2003 SP2 starts to page out parts of the RAM for the application (even though there is 16 GByte RAM on the machine)."
Hard to tell without knowing how your application is written. Windows has a feature to page out whatever memory of idle applications can be unloaded, but that normally affects memory-mapped files and the like.
Otherwise, I would strongly suggest reading up on the Windows memory management documentation. It is not very easy to understand, but Windows makes all sorts and types of memory available to applications. I never had luck with it, but perhaps using a custom std::allocator would work in your application.
I can believe it is the fault of flawed pagefile behaviour; I've run my laptops mostly with the pagefile turned off since NT 4.0. In my experience, at least up to XP Pro, Windows intrusively swaps pages out just to provide the dubious benefit of a really-really-slow extension to the maximum working set space.
Ask yourself what benefit swapping to the hard disk achieves with 16 Gigabytes of real RAM available. If your working set is so big as to need more than 10+ GB of virtual memory, then once swapping is actually required, processes will take anything from a bit longer to thousands of times longer to complete. On Windows, the untameable file system cache seems to antagonise these relationships.
Now when I (very) occasionally run out of working set on my XP laptops, there is no traffic jam; the guilty app just crashes. A utility to suspend memory-hogging processes before that point and raise an alert would be nice, but there is no such thing: just an access violation, a crash, and sometimes explorer.exe goes down too.
Pagefiles - who needs 'em?
---- Edit
Given snakefoot's explanation, the problem is that Windows swaps out memory that has not been used for a longer period of time, and thus the data is not in memory when it is needed. This is the same as this question:
Can I tell Windows not to swap out a particular processes’ memory?
and the VirtualLock function should do the job:
http://msdn.microsoft.com/en-us/library/aa366895(VS.85).aspx
---- Previous answer
First of all, you need to distinguish between a memory leak and a memory need problem.
If you have a memory leak, then converting the entire application to SQL would be a bigger effort than debugging the application.
SQL cannot be faster than a well-designed, domain-specific in-memory database, and if you have bugs, chances are you will have different ones in an SQL version as well.
If this is a memory need problem, then you will need to switch to SQL anyway, and this sounds like a good moment.