Reduce size of tlog files produced by compiler - c++

Since our build on the build server is getting slower and slower, I tried to find out what the cause could be. It seems to spend most of its time in disk IO on the .tlog files, since there is no CPU load and still the build hangs. Even with a project containing only 10 cpp files, it generates ~5500 rows in the CL.read.1.tlog file.
The suspicious thing is that the file contains the same headers over and over, especially boost headers, which take up about 90% of the file.
Is it really expected behavior that those files are so big and contain redundant content, or is this maybe a problem triggered by our source code? Could cyclic includes or too many header includes cause this problem?
Update 1
After all the comments I'll try to clarify some more details here.
We only use boost by including the headers and linking the already compiled libs; we are not compiling boost itself.
Yes, SSD is always a nice improvement, but the build server is hosted by our IT and we do not have SSDs available there. (see points below)
I checked some perfcounters, especially via perfmon, during the compilation. Whereas the CPU and memory load are negligible most of the time, the disk IO counters and queue sizes are quite high all the time. Disk Activity - Highest Active Time is constantly at 100%, and if I sort by Total (B/sec) the list is full of tlog files which read/write a lot of data to the disk.
Even if 5500 lines of tlog seem okay in some cases, I wonder why the exact same boost headers are listed over and over. Here is a logfile where I censored our own headers.
There is no antivirus interfering. I stopped it for my investigations since we know that it slows down our compilation even more.
On my local developer machine with an SSD it takes ~16min to build our whole solution, whereas on our build server with a "slower" disk it takes ~2hrs. CPU and memory are comparable. The 5500-line file was just an example from a single project within a solution of 20-30 projects. We have a project with ~30MB tlog files containing ~60,000 lines; this project alone takes half of the compilation time.
Of course there is some basic CPU load on the machine during compilation. But it is not comparable to other developer machines with SSDs.
Our .NET solution with 45 projects finishes in 12min (including a setup project with WiX).
Since developer machines with SSDs and a comparable CPU/memory configuration see a reduction from 2hrs to 16mins, my assumption for the bottleneck has always been the hard disk. Checking for disk-related operations led me to the tlog files, since they caused the highest disk activity according to perfmon.
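To see how much of the tlog content is redundant, a quick check is to count how often each entry repeats. Below is a minimal sketch in C++ that does this counting; it assumes the tlog has first been converted to plain UTF-8/ASCII text (the real files are typically UTF-16), and the file name is just a placeholder.

#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main()
{
    // Placeholder path; convert the UTF-16 tlog to UTF-8 first
    // (e.g. in an editor) so std::getline can read it line by line.
    std::ifstream in("CL.read.1.tlog");
    std::map<std::string, int> counts;
    std::string line;
    while (std::getline(in, line))
        ++counts[line];

    std::size_t total = 0;
    for (const auto& entry : counts)
    {
        total += entry.second;
        if (entry.second > 100)            // print the worst offenders
            std::cout << entry.second << "x " << entry.first << "\n";
    }
    std::cout << counts.size() << " unique entries out of " << total << " lines\n";
    return 0;
}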

Related

Same Qt-based app uses much more resources on file load on Windows 10 now vs. early 2022

I am working with an open-source application with a Qt UI that processes large (often 500MB+) XML files. It is in general poorly written from a memory perspective, as it stores the entirety of the data parsed from all files in memory rather than processing and then closing them. I suspect it was written this way to be more responsive (we didn't write it), but it's always been a "RAM hog". However, this past April 2022 it worked quite passably on a Windows 10 workstation.
Now, in Oct 2022, the very same .exe file uses so much RAM on the same machine with the same size files that it slows to a crawl and is virtually unusable. So I suspect a change in Windows and/or the machine that somehow changes how Qt handles file opens. In particular, looking at the memory usage, it looks suspiciously like when the user selects multiple files, it's trying to invoke the file handler function on them all concurrently rather than one at a time. This would be helpful if the parsing were CPU limited, but a disaster in our case, where RAM is by far the limiting factor.
Each file parse requires building a DOM tree that's somewhat larger than the size of the file, but then the code extracts the necessary data and populates a data structure that is smaller than the file (maybe 0.75x the size). The scope of the DOM tree is limited to the function called on file open, so back when we first compiled this app, if you selected 10 files, it would build the first DOM tree and then populate the corresponding data structure, after which the memory for the DOM tree would be released and only the data structure would "live on". Then the next DOM tree would be built, leading to a "sawtooth" pattern of RAM use with a drop when each file finished parsing, and with the peak usage never more than one DOM tree plus the data structures already populated. Now, the same .exe uses about 2x more RAM than the sum of ALL the files put together before even the first parse finishes.
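For reference, the sequential, scope-limited pattern described above would look roughly like the sketch below, assuming the app uses Qt's QDomDocument for the DOM (the post doesn't say; extractData and the result type are made-up names). The key point is that the DOM lives only inside the loop body, so its memory can be released before the next file is parsed.

#include <QDomDocument>
#include <QFile>
#include <QList>
#include <QStringList>

struct ExtractedData { /* the ~0.75x data kept after parsing */ };
ExtractedData extractData(const QDomDocument& dom);   // hypothetical helper

void parseFiles(const QStringList& paths, QList<ExtractedData>& results)
{
    for (const QString& path : paths)
    {
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly))
            continue;

        QDomDocument dom;                  // DOM scoped to this iteration only
        if (dom.setContent(&file))
            results.append(extractData(dom));
        // dom and file go out of scope here, giving the "sawtooth" pattern:
        // peak usage is one DOM tree plus the data structures built so far.
    }
}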
As I said, it's the same .exe, which was compiled on a Windows 7 machine in early 2022 but worked on this Windows 10 desktop as late as April 2022 without such exorbitant RAM usage. In fact, other tasks invoked from the GUI are also slower now, I expect for the same fundamental reason. On the Windows 7 machine where it was originally compiled, it seems to run the same as it always did. Is there any good explanation for this? How would it be fixed within the application code?

Visual Studio profiler uses huge amount of RAM

I'm trying to do an Instrumentation Profiling of a quite big project (around 40'000 source files in the whole solution, but the project under profiling has around 200 source files), written in C++.
Each time I run the profiling, it creates a huge report of around 34GB, and then, when it's going to analyze it, it's trying (I think) to load the whole file into RAM.
Obviously, it renders the computer unusable, and I have to stop the analyzer before it completes.
Any suggestions?
Hi there, hope this response isn't too late. This is Andre Hamilton from the Visual Studio profiler team. Analyzing such a large report file does take some time. Instrumentation produces that much data because all of your functions are instrumented. By instrumenting a few functions or a specific binary you might be able to speed things up, if you don't mind profiling via the command line. This will produce a .vsp file which you can then open in VS and use as normal. Let's say that your project requires n binaries to run, and that of these binaries you are interested in the performance of binary ni.
Open up a VisualStudio command prompt
1) Do vsinstr ni.dll to instrument the entire binary, or use the /include or /exclude options of vsinstr to further restrict which functions are instrumented. N.B. if your binary was signed, you will need to re-sign it after instrumenting.
2) Start the profiler in instrumentation mode via the following command:
vsperf /start:trace /output:myinstrumentedtrace.vsp
3) Launch your application.
4) When you are ready to stop profiling:
vsperf /shutdown
Hope this helps
(Notice, I assume you have a licensed copy of VS to both collect and analyze the data).
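For example, to restrict instrumentation to a single binary and a subset of its functions, the whole sequence looks roughly like this (the binary, application and function names are placeholders, and the exact funcspec syntax should be checked with vsinstr /? for your VS version):

vsinstr MyEngine.dll /include:MyNamespace::*
vsperf /start:trace /output:myinstrumentedtrace.vsp
MyApp.exe
vsperf /shutdown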
This is a general problem when profiling large or "dense" programs. You need to restrict the profiler to collect data only from certain units of your code base. In Microsoft's profilers this is done by using Include/Exclude switches either at command line or in the IDE.
There is a bug in VS: most of the profiling work is done on the UI thread, which makes VS unusable, as mentioned in http://channel9.msdn.com/Forums/TechOff/260091-Visual-Studio-Performance-Analysis-in-10-minutes
You may give VS 2012 a try to see if the problem is resolved, but there is no doubt that loading a 34 GB file is not a simple task, and that is also what renders the system unusable. So, as John suggested above in the comment section, break your code into smaller components and then do the profiling. Hope it helps!

How do I debug lower level File access exceptions/crashes in C++ unmanaged code?

I'm currently working on trying to resolve a crash/exception on an unmanaged C++ application.
The application crashes with some predictability. The program basically processes a high volume of files combined with running a bunch of queries through the Access DB.
It's definitely occurring during a file access. The error message is:
"failed reading. Network name is no longer available."
It always seems to be crashing in the same lower-level file access code. It's doing a lower-level library Seek(), then a Read(). The exception occurs during the read.
To further complicate things, we can only get the errors to occur when we're running a disk balancing utility. The utility essentially examines file access history and moves more frequently/recently used files to faster storage, while files that are used less frequently are moved to a slower retrieval area. I don't fully understand the architecture of this particular storage device, but essentially it has an area for "fast" retrieval and one for "archived/slower".
The issues are more easily/predictably reproducible when the utility app is started and stopped several times. According to the disk manufacturer, we should be able to run the utility in the background without affecting the behaviour of the client's main application.
Any suggestions on how to proceed here? There are theories floating around here that it's somehow related to latency on the storage device. Is there a way to prove/disprove that? We've written a small sample app that basically goes out and accesses/reads a whole mess of files on the drive. We've (so far) been unable to reproduce the issue even running with SmartPools. My thought for pushing the latency theory is to have multiple apps basically reading volumes of files from disk while running the utility application.
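To push the latency theory, one option is a small stress tool along these lines (a minimal sketch; the directory, thread count and 2-second threshold are made up) that hammers the drive from several threads while the balancing utility runs and logs any read that fails or takes unusually long:

#include <chrono>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

void readAll(const std::vector<fs::path>& files)
{
    std::vector<char> buffer(1 << 20);
    for (const auto& path : files)
    {
        auto start = std::chrono::steady_clock::now();
        std::ifstream in(path, std::ios::binary);
        while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()))) { }
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        if (!in.eof() || ms > 2000)        // read error or suspiciously slow
            std::cerr << path << " : " << ms << " ms, eof=" << in.eof() << "\n";
    }
}

int main()
{
    std::vector<fs::path> files;
    for (const auto& entry : fs::recursive_directory_iterator("D:\\testdata"))   // placeholder directory
        if (entry.is_regular_file())
            files.push_back(entry.path());

    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i)            // several concurrent readers
        workers.emplace_back(readAll, std::cref(files));
    for (auto& t : workers)
        t.join();
    return 0;
}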
The memory usage and CPU usage do not look out of line in the Task Manager.
Thoughts? This is turning into a bit of a hairball.
Thanks,
JohnB
Grab your debug binaries.
Set up Application Verifier and add your application to its list.
Then wait (hopefully) for a crash dump.
Put that through WinDBG.
Try command: !avrf
See what you get....

What is your strategy to write logs in your software to deal with possible HUGE amount of log messages?

Thanks for your time and sorry for this long message!
My work environment
Linux C/C++(but I'm new to Linux platform)
My question in brief
In the software I'm working on we write a LOT of log messages to local files, which makes the file size grow fast and eventually use up all the disk space (ouch!). We want these log messages for troubleshooting purposes, especially after the software is released to the customer site. I believe it's of course unacceptable to take up all the disk space of the customer's computer, but I have no good idea how to handle this. So I'm wondering if somebody has any good ideas here. More info goes below.
What I am NOT asking
1). I'm NOT asking for a recommended C++ log library. We wrote a logger ourselves.
2). I'm NOT asking about what details (such as timestamp, thread ID, function name, etc.) should be written in a log message. Some suggestions can be found here.
What I have done in my software
I separate the log messages into 3 categories:
SYSTEM: Only log the important steps in my software. Example: an outer invocation to the interface method of my software. The idea behind is from these messages we could see what is generally happening in the software. There aren't many such messages.
ERROR: Only log the error situations, such as an ID is not found. There usually aren't many such messages.
INFO: Log the detailed steps running inside my software. For example, when an interface method is called, a SYSTEM log message is written as mentioned above, and the entire calling routine into the internal modules within the interface method will be recorded with INFO messages. The idea behind is these messages could help us identify the detailed call stack for trouble-shooting or debugging. This is the source of the use-up-disk-space issue: There are always SO MANY INFO messages when the software is running normally.
My tries and thoughts
1). I tried to not record any INFO log messages. This resolves the disk space issue but I also lose a lot of information for debugging. Think about this: My customer is in a different city and it's expensive to go there often. Besides, they use an intranet that is 100% inaccessible from outside. Therefore: we can't always send engineers on-site as soon as they meet problems; we can't start a remote debug session. Thus log files, I think, are the only way we could make use to figure out the root of the trouble.
2). Maybe I could make the logging strategy configurable at run-time (currently it's configured before the software runs), that is: at normal run-time, the software only records SYSTEM and ERROR logs; when a problem arises, somebody could change the logging configuration so the INFO messages get logged (see the sketch after this list). But still: who could change the configuration at run-time? Maybe we should educate the software admin?
3). Maybe I could always turn the INFO message logging on but pack the log files into a compressed package periodically? Hmm...
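Regarding 2): a common way to make this switchable at run-time without a restart is to keep the current level in an atomic variable and periodically re-read a small config file that an admin can edit. A minimal sketch with made-up names (loglevel.cfg, app.log) and the three categories from above:

#include <atomic>
#include <fstream>
#include <string>

enum Level { SYSTEM = 0, ERROR = 1, INFO = 2 };

std::atomic<int> g_maxLevel{ERROR};        // default: SYSTEM + ERROR only

// Called periodically (e.g. once a minute from a housekeeping thread)
// to pick up changes to a small config file the admin can edit.
void reloadLogLevel()
{
    std::ifstream cfg("loglevel.cfg");     // placeholder file name
    std::string value;
    if (cfg >> value)
    {
        if (value == "INFO")        g_maxLevel = INFO;
        else if (value == "ERROR")  g_maxLevel = ERROR;
        else if (value == "SYSTEM") g_maxLevel = SYSTEM;
    }
}

void log(Level level, const std::string& message)
{
    if (level > g_maxLevel)                // INFO is skipped unless enabled
        return;
    std::ofstream out("app.log", std::ios::app);   // placeholder file name
    out << message << '\n';
}

This way nobody needs to restart the software: when a problem arises, the admin switches the file to INFO, reproduces the issue, and switches it back.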
Finally...
What is your experience in your projects/work? Any thoughts/ideas/comments are welcome!
EDIT
THANKS for all your effort!!! Here is a summary of the key points from all the replies below (and I'll give them a try):
1). Do not use large log files. Use relatively small ones.
2). Deal with the oldest ones periodically (either delete them, or zip them and put them on larger storage).
3). Implement run-time configurable logging strategy.
There are two important things to take note of:
Extremely large files are unwieldy. They are hard to transmit, hard to investigate, ...
Log files are mostly text, and text is compressible
In my experience, a simple way to deal with this is:
Only write small files: start a new file for a new session or when the current file grows past a preset limit (I have found 50 MB to be quite effective). To help locate the file in which the logs have been written, make the date and time of creation part of the file name.
Compress the logs, either offline (once the file is finished) or online (on the fly).
Put a cleaning routine in place: delete all files older than X days, or whenever you reach more than 10, 20 or 50 files, delete the oldest.
If you wish to keep the System and Error logs longer, you might duplicate them in a specific rotating file that only tracks them.
Put all together, this gives the following log folder:
Log/
info.120229.081643.log.gz // <-- older file (to be purged soon)
info.120306.080423.log // <-- complete (50 MB) file started at log in (to be compressed soon)
info.120306.131743.log // <-- current file
mon.120102.080417.log.gz // <-- older mon file
mon.120229.081643.log.gz // <-- older mon file
mon.120306.080423.log // <-- current mon file (System + Error only)
Depending on whether you can schedule (cron) the cleanup task, you may simply spin up a thread for cleanup within your application. Whether you go with a purge date or a number of files limit is a choice you have to make, either is effective.
Note: from experience, a 50 MB file ends up weighing around 10 MB when compressed on the fly and less than 5 MB when compressed offline (on-the-fly compression is less efficient).
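A minimal sketch of the writing side of this scheme (the 50 MB limit and the info.<date>.<time>.log naming come from this answer; compression and cleanup would run separately, e.g. from a cron job or a housekeeping thread):

#include <ctime>
#include <fstream>
#include <string>

class RotatingLog
{
public:
    explicit RotatingLog(std::size_t maxBytes = 50 * 1024 * 1024)
        : maxBytes_(maxBytes) { openNewFile(); }

    void write(const std::string& line)
    {
        if (written_ + line.size() + 1 > maxBytes_)
            openNewFile();                 // past the limit: start a new file
        out_ << line << '\n';
        written_ += line.size() + 1;
    }

private:
    void openNewFile()
    {
        // Encode date and time of creation in the name, e.g. info.120306.131743.log
        char name[64];
        std::time_t now = std::time(nullptr);
        std::strftime(name, sizeof(name), "info.%y%m%d.%H%M%S.log", std::localtime(&now));
        out_.close();
        out_.open(name, std::ios::app);
        written_ = 0;
    }

    std::ofstream out_;
    std::size_t maxBytes_;
    std::size_t written_ = 0;
};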
Your (3) is standard practice in the world of UNIX system logging.
When a log file reaches a certain age or maximum size, start a new one
Zip or otherwise compress the old one
Throw away the nth-oldest compressed log
One way to deal with it is to rotate log files.
Start logging into a new file once you reach a certain size, and keep the last couple of log files before you start overwriting the first one.
You will not have all possible info, but you will at least have the stuff leading up to the issue.
The logging strategy sounds unusual but you have your reasons.
I would
a) Make the level of detail in the log messages configurable at run time.
b) Create a new log file for each day. You can then get cron to either compress them and/or delete them, or perhaps transfer them to off-line storage.
My answer is to write long logs and then pull out the info you want.
Compress them on a daily basis - but keep them for a week
I like to log a lot. In some programs I've kept the last n lines in memory and written them to disk in case of an error or when the user requests support.
In one program it would keep the last 400 lines in memory and save them to a logging database upon an error. A separate service monitored this database and sent an HTTP request containing summary information to a service at our office, which added it to a database there.
We had a program on each of our desktop machines that showed a list (updated by F5) of issues, which we could assign to ourselves and mark as processed. But now I'm getting carried away :)
This worked very well to help us support many users at several customers. If an error occurred on a PDA somewhere running our software then within a minute or so we'd get a new item on our screens. We'd often phone a user before they realised they had a problem.
We had a filtering mechanism to automatically process or assign issues that we knew we'd fixed or didn't care much about.
In other programs I've had hourly or daily files which are deleted after n days either by the program itself or by a dedicated log cleaning service.
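The "keep the last n lines in memory and only write them out on an error" idea mentioned above can be as simple as a fixed-size ring buffer; a rough sketch (the 400-line size comes from the post, the class and file names are made up):

#include <deque>
#include <fstream>
#include <string>

class CrashLogBuffer
{
public:
    explicit CrashLogBuffer(std::size_t maxLines = 400) : maxLines_(maxLines) {}

    void add(const std::string& line)
    {
        lines_.push_back(line);
        if (lines_.size() > maxLines_)
            lines_.pop_front();            // drop the oldest line
    }

    // Called when an error occurs or the user requests support.
    void dump(const std::string& path) const
    {
        std::ofstream out(path, std::ios::app);
        for (const auto& line : lines_)
            out << line << '\n';
    }

private:
    std::deque<std::string> lines_;
    std::size_t maxLines_;
};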

Running out of file descriptors for mmaped files despite high limits in multithreaded web-app

I have an application that mmaps a large number of files. 3000+ or so. It also uses about 75 worker threads. The application is written in a mix of Java and C++, with the Java server code calling out to C++ via JNI.
It frequently, though not predictably, runs out of file descriptors. I have upped the limits in /etc/security/limits.conf to:
* hard nofile 131072
/proc/sys/fs/file-max is 101752. The system is a Linode VPS running Ubuntu 8.04 LTS with kernel 2.6.35.4.
Opens fail from both the Java and C++ bits of the code after a certain point. Netstat doesn't show a large number of open sockets ("netstat -n | wc -l" is under 500). The number of open files in either lsof or /proc/{pid}/fd is about the expected 2000-5000.
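One sanity check that can be done from inside the process is to compare the limit the process actually ended up with (the soft RLIMIT_NOFILE, which is what open() runs against) with the number of entries in /proc/self/fd; if, for example, the process was started from an environment where the limits.conf change had not yet taken effect, the effective limit could be much lower than expected. A minimal sketch:

#include <sys/resource.h>
#include <dirent.h>
#include <cstdio>

int main()
{
    rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        std::printf("RLIMIT_NOFILE: soft=%llu hard=%llu\n",
                    (unsigned long long)rl.rlim_cur,
                    (unsigned long long)rl.rlim_max);

    // Count currently open descriptors for this process.
    int count = 0;
    if (DIR* dir = opendir("/proc/self/fd"))
    {
        while (readdir(dir) != nullptr)
            ++count;
        closedir(dir);
        count -= 2;                        // ignore "." and ".."
    }
    std::printf("open descriptors: %d\n", count);
    return 0;
}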
This has had me grasping at straws for a few weeks (not constantly, but in flashes of fear and loathing every time I start getting notifications of things going boom).
There are a couple other loose threads that have me wondering if they offer any insight:
Since the process has about 75 threads, if the mmaped files were somehow taking up one file descriptor per thread, then the numbers would add up. That said, doing a recursive count on the things in /proc/{pid}/task/*/fd currently lists 215575 fds, so it would seem that it should already be hitting the limits and it's not, so that seems unlikely.
Apache + Passenger are also running on the same box, and come in second for the largest number of file descriptors, but even with children none of those processes weigh in at over 10k descriptors.
I'm unsure where to go from there. Obviously something's making the app hit its limits, but I'm completely blank for what to check next. Any thoughts?
So, from all I can tell, this appears to have been an issue specific to Ubuntu 8.04. After upgrading to 10.04, there hasn't been a single instance of this problem in a month. The configuration didn't change, so I'm led to believe that this must have been a kernel bug.
Your setup uses a huge chunk of code that may be guilty of leaking too: the JVM. Maybe you can switch between the Sun and the open-source JVMs as a way to check whether that code is by chance guilty. Also, there are different garbage collector strategies available for the JVM. Using a different one, or different sizes, will cause more or fewer garbage collections (which in Java include the closing of descriptors).
I know it's kind of far-fetched, but it seems like you've already followed all the other options ;)