We have some code that wants to call localtime very often from multiple threads. (Relevant background: it's a server where one of the things you can ask it for is the local time as a string, and it wants to be able to serve 100Ks of requests per second.)
We have discovered that on Ubuntu Linux 12.04, the glibc function localtime_r ("reentrant localtime") calls __tz_convert, which still takes a global lock!
(Also, it looks like FreeBSD makes localtime_r call tzset on every single invocation, because they're paranoid that the program might have done a setenv("TZ") and/or the user might have downloaded a new version of /etc/localtime between now and the last time localtime_r was called. This is the opposite of the situation described here; it seems that glibc calls tzset on every invocation of localtime but not localtime_r, just to be confusing.)
Obviously, this is terrible for performance. For our purposes, we would like to basically "snapshot" the rules for our current timezone when the server starts running, and then use that snapshot forever afterward. So we'd continue to respect Daylight Saving Time rules (because the rules for when to switch to DST would be part of the snapshot), but we would never go back to the disk, take mutexes, or do anything else that would cause threads to block. (We are fine with not respecting downloaded updates to tzinfo and not respecting changes to /etc/localtime; we don't expect the server to physically change timezones while it's running.)
However, I can't find any information online about how to deal with timezone rules — whether there's a userspace API for working with them, or whether we'll be forced to reimplement a few hundred lines of glibc code to read the timezone data ourselves.
Do we have to reimplement everything downstream of __tz_convert — including tzfile_read, since it doesn't seem to be exposed to users? Or is there some POSIX interface and/or third-party library that we could use for working with timezone rules?
(I've seen http://www.iana.org/time-zones/repository/tz-link.html but I'm not sure it's helpful.)
Use https://github.com/google/cctz
It's fast and should do everything you want with a very simple API.
In particular, for a cctz equivalent to localtime, use the cctz::BreakTime() function. For example, https://github.com/google/cctz/blob/master/examples/example3.cc
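As a rough sketch of what that looks like (hedged: the exact spelling has shifted across cctz versions; current releases expose cctz::convert() and cctz::format() rather than BreakTime(), so check the headers you actually have):

#include <chrono>
#include <iostream>
#include "cctz/time_zone.h"

int main()
{
    // Load the zone once; cctz loads and caches the zone data, which fits
    // the "read the rules at startup, never touch the disk again" goal.
    cctz::time_zone tz;
    if (!cctz::load_time_zone("America/New_York", &tz)) return 1;

    const auto now = std::chrono::system_clock::now();

    // Break the absolute time down into civil (wall-clock) fields,
    // like localtime would:
    const cctz::civil_second cs = cctz::convert(now, tz);
    std::cout << cs.year() << '-' << cs.month() << '-' << cs.day() << ' '
              << cs.hour() << ':' << cs.minute() << ':' << cs.second() << '\n';

    // Or format a string directly:
    std::cout << cctz::format("%Y-%m-%d %H:%M:%S %z", now, tz) << '\n';
}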
Perhaps this free, open-source timezone library, Howard Hinnant's tz library (https://github.com/HowardHinnant/date), would fit the need.
It has a configuration flag called LAZY_INIT, which is fully documented here. By default it is on, and it causes a call to std::call_once the first time each individual timezone is accessed. However, you can compile with:
-DLAZY_INIT=0
and then the calls to std::call_once go away. Every single timezone is read from disk and fully initialized on first access (via a function local static). From then on, things are stable and lock-free, with no disk access. Naturally this increases initialization time up front, but decreases the "first-access" time for each timezone.
This library requires C++11/14, and so may not be suitable for that reason. It is based on (and heavily uses) the C++11 <chrono> library. Here is sample code that prints out the current local time:
#include "tz.h"
#include <iostream>
int
main()
{
using namespace date;
auto local = make_zoned(current_zone(), std::chrono::system_clock::now());
std::cout << local << '\n';
}
which just output for me:
2016-04-12 10:13:14.585945 EDT
The library is a modern, high-performance thread safe design. It is also very flexible and fully documented. Its capabilities go far beyond a simple replacement for C's localtime. Unlike the C API, you can specify any IANA timezone you need, for example:
auto local = make_zoned("Europe/London", std::chrono::system_clock::now());
This gives the current time in London:
2016-04-12 15:19:59.035533 BST
Note that by default the precision of the timestamp is that of std::chrono::system_clock. If you prefer a different precision in the output, that is easily accomplished:
using namespace date;
using namespace std::chrono;
auto local = make_zoned("Europe/London", std::chrono::system_clock::now());
std::cout << format("%F %H:%M %Z", local) << '\n';
2016-04-12 15:22 BST
See the docs for more details.
Related
I'm fairly new to C++ and am really interested in learning more; I have been reading quite a bit. Recently I discovered the init/fini ELF sections.
I started to wonder if and how one would use the init section to prepopulate objects that would be used at runtime. Say, for example, you wanted
to add performance measurements to your code, recording the time, filename, line number, and maybe some ID (a monotonically increasing int, for example) or name.
You would place, for example:
PROBE(0,"EventProcessing",__FILE__,__LINE__)
...... //process event
PROBE(1,"EventProcessing",__FILE__,__LINE__)
......//different processing on same event
PROBE(2,"EventProcessing",__FILE__,__LINE__)
The PROBE could be some macro that populates a struct containing this data (maybe on an array/list, etc using the id as an indexer).
Would it be possible to have code in the init section that could prepopulate all of this data for each PROBE (except for the time of course), so only the time would need to be retrieved/copied at runtime?
As far as I know, __attribute__((constructor)) cannot be applied to member functions.
My initial idea was to create some kind of linked list, with each node pointing to each probe, which code in the init section could iterate, populating the id, file, line, etc. But
that idea assumed I could use a member function that could run in the "init" section, and that does not seem possible. Any tips appreciated!
As far as I understand it, you do not actually need an ELF constructor here. Instead, you could emit descriptors for your probes using extended asm statements (using data, instead of code). This also involves switching to a dedicated ELF section for the probe descriptors, say __probes.
The linker will concatenate all the probes into an array and generate special symbols __start___probes and __stop___probes, which you can use from your program to access these probes. See the last paragraph in Input Section Example.
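For illustration, here is a hedged sketch of the same layout using GCC's section attribute on plain data rather than literal asm (names like probe_desc and dump_probes are mine, not from the answer; details vary by toolchain):

// One descriptor per probe, all placed in a dedicated "__probes" section.
struct probe_desc {
    int         id;
    const char* name;
    const char* file;
    int         line;
};

// A constant initializer means no run-once guard is emitted; "used" keeps
// the toolchain from discarding descriptors nobody references directly.
#define PROBE(id, name)                                   \
    static const probe_desc probe_##id                    \
        __attribute__((section("__probes"), used)) =      \
            { id, name, __FILE__, __LINE__ }

// The linker defines these bracketing symbols for the section.
extern const probe_desc __start___probes[];
extern const probe_desc __stop___probes[];

void dump_probes()
{
    for (const probe_desc* p = __start___probes; p != __stop___probes; ++p)
    {
        // every probe emitted anywhere in the program is visible here
    }
}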
Systemtap implements something quite similar for its userspace probes:
User Space Probe Implementation
Adding User Space Probing to an Application (heapsort example)
Similar constructs are also used within the Linux kernel for its self-patching mechanism.
There's a pretty simple way to have code run at module load time: use the constructor of a global variable:
struct RunMeSomeCode
{
    RunMeSomeCode()
    {
        // your code goes here
    }
} do_it;
The .init/.fini sections basically exist to implement global constructors/destructors as part of the ABI on some platforms. Other platforms may use different mechanisms, such as _start and _init functions or .init_array/.fini_array and .preinit_array. There are lots of subtle differences between all these methods, and which one to use for what is a question that can really only be answered by the documentation of your target platform. Not all platforms use ELF to begin with…
The main point to understand is that things like the .init/.fini sections in an ELF binary happen way below the level of C++ as a language. A C++ compiler may use these things to implement certain behavior on a certain target platform. On a different platform, a C++ compiler will probably have to use different mechanisms to implement that same behavior. Many compilers will give you tools in the form of language extensions like __attribute__ or #pragmas to control such platform-specific details. But those generally only make sense and will only work with that particular compiler on that particular platform.
You don't need a member function (which gets a this pointer passed as an arg); instead you can simply create constructor-like functions that reference a global array, like
#define PROBE(id, stuff, more_stuff) \
__attribute__((constructor)) void \
probeinit##id(){ probes[id] = {id, stuff, 0/*to be written later*/, more_stuff}; }
The trick is having this macro work in the middle of another function. GNU C allows nested functions (GNU C++ does not), but I don't know if you can make them constructors.
You don't want to declare a static int dummy##id = something, because then you're adding overhead to the function you profile. (gcc has to emit a thread-safe run-once locking mechanism.)
Really what you'd like is some kind of separate pass over the source that identifies all the PROBE macros and collects up their args to declare
struct probe global_probes[] = {
    {0, "EventName", 0 /*placeholder*/, filename, linenum},
    {1, "EventName", 0 /*placeholder*/, filename, linenum},
    ...
};
I'm not confident you can make that happen with CPP macros; I don't think it's possible to #define PROBE such that every time it expands, it redefines another macro to tack on more stuff.
But you could easily do that with an awk/perl/python / your fave scripting language program that scans your program and constructs a .c that declares an array with static storage.
Or better (for a single-threaded program): keep the runtime timestamps in one array, and the names and other metadata in a separate array, so the cache footprint of the probes is smaller. For a multi-threaded program, stores to the same cache line from different threads cause false sharing and cache-line ping-pong.
So you'd have #define PROBE(id, evname, blah blah) do { probe_times[id] = now(); }while(0)
and leave the handling of the later args to your separate preprocessing.
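A sketch of that split layout (all names here, including now_ns(), are hypothetical; the cold metadata array is what the scripted pass would generate):

#include <cstdint>

// Cold data: generated once by the source-scanning script, never written
// at runtime, so it stays out of the hot cache lines.
struct probe_meta {
    int         id;
    const char* event_name;
    const char* file;
    int         line;
};

extern const probe_meta probe_metadata[];   // emitted by the generator
extern std::uint64_t    probe_times[];      // hot: the only runtime store

std::uint64_t now_ns();                     // your clock source, e.g. clock_gettime

#define PROBE(id, evname) do { probe_times[id] = now_ns(); } while (0)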
Here is some C++ code illustrating my problem with a minimal example:
// uncomment the next line to make it hang up:
//#define BOOST_DATE_TIME_POSIX_TIME_STD_CONFIG // needed for nanosecond support of boost
#include <boost/thread.hpp>
#include <iostream> // for std::cout / std::cerr

void foo()
{
    while (true);
}

int main(int noParameters, char **parameterArray)
{
    boost::thread MyThread(&foo);

    if (MyThread.timed_join(boost::posix_time::seconds(1)))
    {
        std::cout << "\nDone!\n";
    }
    else
    {
        std::cerr << "\nTimed out!\n";
    }
}
As long as I don't turn on the nanosecond support, everything works as expected, but as soon as I uncomment the #define needed for the nanosecond support in boost::posix_time, the program doesn't get past the if-statement any more, just as if I had called join() instead of timed_join().
Now, I've already figured out that this happens because BOOST_DATE_TIME_POSIX_TIME_STD_CONFIG changes the actual data representation of the timestamps from a single 64-bit integer to 64+32 bits. A lot of Boost stuff is implemented completely inside the headers, but the thread methods are not, and because of that they cannot adapt to the new data format without being compiled again with the appropriate options. Since the code is meant to run on an external server, compiling my own version of Boost is not an option, and neither is turning off the nanosecond support.
Therefore, my question is as follows: Is there a way to pass a value (on the order of seconds) to timed_join() without using the incompatible 96-bit posix_time methods and without modifying the standard Boost packages?
I'm running Ubuntu 12.04 with Boost 1.46.1.
Unfortunately I don't think your problem can be cleanly solved as written. Since the library you're linking against was compiled without nanosecond support, by definition you violate the one-definition rule if you happen to enable nanosecond support for any piece that's already compiled into the library binary. In this case, you're enabling it across the function calls to timed_join.
The obvious solution is to decide which is less painful to give up: Building your own boost, or removing nanosecond times.
The less obvious "hack" that may or may not totally work is to write your own timed_join wrapper that takes a thread object and an int representing seconds or ms or whatever. Then this function is implemented in a source file with nothing else and that does not enable nanosecond times for the specific purpose of calling into the compiled boost binary. Again I want to stress that if at any point you fail to completely segregate such usages you'll violate the one definition rule and run into undefined behavior.
Having used gprof and callgrind many times, I have reached the (obvious) conclusion that I cannot use them efficiently when dealing with large programs (as in a CAD program that loads a whole car). I was thinking that maybe I could use some C/C++ macro magic and somehow build a simple (but nice) logging mechanism. For example, one can call a function using the following macro:
#define CALL_FUN(fun_name, ...) \
fun_name (__VA_ARGS__);
We could add some clocking/timing stuff before and after the function call, so that every function called with CALL_FUN gets timed, e.g.:

#define CALL_FUN(fun_name, ...) \
    time(&t0);                  \
    fun_name (__VA_ARGS__);     \
    time(&t1);
The variables t0, t1 could be found in a global logging object. That logging object can also hold the calling graph for each function called through CALL_FUN. Afterwards, that object can be written in a (specifically formatted) file, and be parsed from some other program.
So here comes my (first) question: Do you find this approach tractable? If yes, how can it be enhanced, and if not, can you propose a better way to measure time and log call graphs?
A colleague proposed another approach to deal with this problem: annotating with a specific comment each function (that we care to log). Then, during the make process, a special preprocessor must run, parse each source file, add logging logic for each function we care to log, create a new source file with the newly added code, and build that code instead. I guess that reading CALL_FUN... macros (my proposal) all over the place is not the best approach, and his approach would solve this problem. So what is your opinion about this approach?
PS: I am not well versed in the pitfalls of C/C++ macros, so if this can be done using another approach, please say so.
Thank you.
Well, you could do some C++ magic to embed a logging object. Something like:

class CDebug
{
public:
    CDebug() { /* ... log somehow ... */ }
    ~CDebug() { /* ... log somehow ... */ }
};

In your functions you then simply write:

void foo()
{
    CDebug dbg;
    ...
    // you could add some debug info
    dbg.heythishappened();
    ...
} // now the dtor is called, even if the function returns early or is exited from elsewhere
I am a bit late, but here is what I am doing for this:
On Windows there is a /Gh compiler switch which makes the compiler insert a hidden _penter function call at the start of each function. There is also a /GH switch for getting a _pexit call at the end of each function.
You can utilize this to get callbacks on each function call. Here is an article with more details and sample source code:
http://www.johnpanzer.com/aci_cuj/index.html
I am using this approach in my custom logging system for storing the last few thousand function calls in a ring buffer. This turned out to be useful for crash debugging (in combination with MiniDumps).
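To illustrate the ring-buffer part only (a hedged sketch; the real _penter stub must be written as described in the article, preserving registers and recovering the instrumented function's address):

#include <atomic>
#include <cstddef>

namespace {
    const std::size_t kRingSize = 4096;          // power of two, so wrap is a mask
    void* g_ring[kRingSize];
    std::atomic<std::size_t> g_next(0);
}

// Called from the _penter stub with the instrumented function's address.
extern "C" void penter_hook(void* fn_address)
{
    const std::size_t slot = g_next.fetch_add(1, std::memory_order_relaxed);
    g_ring[slot & (kRingSize - 1)] = fn_address; // keeps the last 4096 calls
}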
Some notes on this:
The performance impact very much depends on your callback code. You need to keep it as simple as possible.
You just need to store the function address and module base address in the log file. You can then later use the Debug Interface Access SDK to get the function name from the address (via the PDB file).
All this works surprisingly well for me.
Many nice industrial libraries have functions' declarations and definitions wrapped into void macros, just in case. If your code is already like that, go ahead and debug your performance problems with some simple asynchronous trace logger. If not, post-insertion of such macros can be unacceptably time-consuming.
I can understand the pain of running a 1Mx1M matrix solver under valgrind, so I would suggest starting with the so-called "Monte Carlo profiling method": start the process and in parallel run pstack repeatedly, say every second. As a result you will have N stack dumps (N can be quite significant). Then the mathematical approach would be to count the relative frequencies of each stack and draw a conclusion about the most frequent ones. In practice you either immediately see the bottleneck or, if not, you switch to bisection, gprof, and finally to valgrind's toolset.
Let me assume the reason you are doing this is that you want to locate any performance problems (bottlenecks) so you can fix them to get higher performance, as opposed to measuring speed or getting coverage info.
It seems you're thinking the way to do this is to log the history of function calls and measure how long each call takes. There's a different approach. It's based on the idea that, mainly, the program walks a big call tree. If time is being wasted, it is because the call tree is more bushy than necessary, and during the time that's being wasted, the code that's doing the wasting is visible on the stack. It can be terminal instructions, but more likely function calls, at almost any level of the stack. Simply pausing the program under a debugger a few times will eventually display it. Anything you see it doing on more than one stack sample, if you can improve it, will speed up the program. It works whether the time is being spent in CPU, I/O, or anything else that consumes wall clock time.
What it doesn't show you is tons of stuff you don't need to know. The only way it can fail to show you bottlenecks is if they are very small, in which case the code is pretty near optimal. Here's more of an explanation.
Although I think it will be hard to do anything better than gprof, you can create a special class LOG, for instance, and instantiate it at the beginning of each function you want to log.

class LOG {
public:
    LOG(const char* func, const char* file) {
        // save the time_t of the beginning of the call
    }
    ~LOG() {
        // calculate the total time spent, by difference between
        // the current time and that saved in the constructor
    }
};

void somefunction() {
    LOG log(__FUNCTION__, __FILE__);
    // .. do other things
}
Now you can integrate this approach with the preprocessing one you mentioned. Just add something like this in the beginning of each function you want to log:
// ### LOG
and then you replace the string automatically in debug builds (shouldn't be hard).
Maybe you should use a profiler. AQTime is a relatively good one for Visual Studio. (If you have VS2010 Ultimate, you already have a profiler.)
I'm a novice at programming although I've been teaching myself Python for about a year and I studied C# some time ago.
This month I started C++ programming courses at my university and I just have to ask; "why is the C++ code so complicated?"
Writing "Hello world." in Python is as simple as "print 'Hello world.'" but in C++ it's:
#include <iostream>
using namespace std;

int main()
{
    cout << "Hello world.";
    return 0;
}
I know there is probably a good reason for all of this but, why...
... do you have to include the <iostream> every time? Do you ever not need it?
... same question for the standard library, when do you not need std::*?
... is the "main" part a function? Do you ever call the main function? Why is it an integer? Why does C++ need to have a main function but Python doesn't?
... do you need "std::cout << "? Isn't that needlessly long and complicated compared to Python?
... do you need to return 0 even when you are never going to use it?
This is probably because I'm learning such basic C++, but every program I've made so far looks like this, so I have to retype the same code over and over again. Isn't that redundant? Couldn't the compiler just insert this code itself, since it's always the same (i.e., as far as I know you always include <iostream>, std, int main, and return 0)?
C++ is a lower-level language that executes without the context of an interpreter. As such, it makes many design choices different from Python's, because C++ has no environment it can rely on to manage information like types and memory. C++ can be used to write an operating system kernel, where there is no code running on the machine except the program itself, which means the language (some library facilities are not available for so-called freestanding implementations) must be self-contained. This is why C++ has no equivalent to Python's eval, nor a means of determining the members of a class at runtime, nor other features that require an execution environment (or a massive overhead in the program itself in place of such an environment).
For your individual questions:
do you have to include the <iostream> every time? Do you ever not need it?
#include <iostream> is the directive that imports the <iostream> header into your program. <iostream> contains the standard input/output objects - in particular, cout. If you aren't using standard I/O objects (for instance, you use only file I/O, or your program uses a GUI library, or you are writing an operating system kernel), you do not need <iostream>.
same question for the standard library, when do you not need std::*?
std is the namespace containing all of the standard library. using namespace std; is sort of like from std import *, whereas a #include directive is (in this regard) more like a bare-bones import std statement. (In actual fact, the mechanism is rather different, because C++ does not use using namespace std; to automatically look up objects in std; the using-directive only imports the names into the global namespace.)
I'll note here that using-directives (using namespace) are frequently frowned upon in C++ code, as they import a lot of names and can cause name clashes. using-declarations (using std::cout;) are preferred when possible, as is limiting the scope of a using-directive (for instance, to one function or to one source file). Don't ever put using namespace in a header without good reason.
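For example:

#include <iostream>

using std::cout;                // using-declaration: imports just this one name

int main()
{
    cout << "Hello\n";          // OK: cout was imported above
    std::cerr << "world\n";     // other std names still need the prefix
}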
is the "main" part a function? Do you ever call the main function? Why is it an integer? Why does C++ need to have a main function but Python doesn't?
main is the entry point to the program - where execution starts. In Python, the __main__ module serves the same purpose. C++ does not execute code outside a defined function like Python does, so its entry point is a function rather than a module.
do you need "std::cout << "? Isn't that needlessly long and complicated compared to Python?
std::cout is only needed if you don't import the cout name into the global namespace, either by a using-directive (using namespace std;) or by a using-declaration (using std::cout). In this regard, it is once again much like the distinction between Python's import std and from std import * or from std import cout.
The << is an overloaded operator for standard stream objects. cout << value calls cout's function to output value. Python needs no such extra code because print is built into the language; this does not make sense for C++, where there may not even be an operating system, much less an I/O library.
do you need to return 0 even when you are never going to use it?
No. main (and no other function) has an implicit return 0; at the end. The return value of main (or, if the exit function is called, the value passed to it) is passed back to the operating system as the exit code. 0 indicates the program successfully executed - that it encountered no errors, etc. If an error is encountered, a non-zero value should be returned (or passed to exit).
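A minimal illustration of the convention:

#include <cstdlib>

int main(int argc, char**)
{
    if (argc < 2)
        return EXIT_FAILURE;    // non-zero exit code: the shell sees failure
    // ... do work ...
}                               // falling off the end of main is `return 0;`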
In response to your questions at the end of the post, it can be summed up with the philosophy of C++:
You don't pay for what you don't use.
You don't always need to use stdin or stdout (Windows/GUI apps?), nor will you always be using the STL, nor will everything you write necessarily use the standard main (winAPI) etc. As a previous poster said, C++ is lower level than Python. You will be exposed to more of the details, which offers you more control over what you're doing.
... do you have to include the <iostream> every time? Do you ever not need it?
You don't need it if you're not going to use iostreams in that module. In larger programs, few modules do any actual IO directly, and so few actually need to use iostreams.
Turning the question around: in python you need to import sys and/or os in most non-trivial programs. Why?
... same question for the standard library, when do you not need std::*?
You can have the using line or you can use the std:: prefix. This is very similar to the choice python gives you of either saying "from sys import *" or "import sys" and then having to prefix things with "sys.". In python you have to say "sys.stdout". Is "std::cout" really any worse?
... is the "main" part a function? Do
you ever call the main function? Why
is it an integer? Why does C++ need to
have a main function but Python
doesn't?
Yes, main is a function. Typically you wouldn't call main yourself. The name "main" is reserved for the entry-point of your program. It returns an integer because the value returned is used as the status code of your program. In Python you can use sys.exit if you want to return a non-zero status code.
Python doesn't have the same convention because with Python you can have code in a module not in a function. This code is executed when you load the module. Interestingly, many people feel it is bad style to have code at the top-level of a module and will instead create a main function by doing something like this:
import sys

def main(argv):
    # program goes here
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv))
Also, in Python you tell the interpreter which module is the "main" module when you run it, e.g.: python foo.py. In C, the "main" module is (effectively) the one with a function called main. (If there are multiple modules with a main function, it's a linker error.)
... do you need "std::cout << "? Isn't
that needlessly long and complicated
compared to Python?
The equivalent in Python is actually "sys.stdout.write(...)". Python's print statement is a special-case short-hand.
That said, many people do feel the iostreams convention of using bit-shifting operators for IO was a bad idea. Ironically, Python seems to have been "inspired" by this syntax. If you want to use print to write to somewhere other than stdout you can say:
print >>file, "Hello"
... do you need to return 0 even when you are never going to use it?
You aren't going to use it, but your program will. As mentioned earlier, the value you return is the status code of your program.
Aside: I actually do feel that C++ is overcomplicated, but not because of any of the points you mention. All of the differences you mention go away (in the sense that you need just as much complexity in Python) once you start writing non-trivial programs that have multiple modules and do more than just writing to stdout.
You include <iostream> when you want to output things to the console. Since printing "Hello world" involves console output, you need iostream.
The main function is called by the operating system, basically. It gets called with the command-line arguments passed to the program. It returns an integer because the program must return an error code to the operating system (this is the standard way for determining if the last command was successful).
You can always use printf("hello world"); instead of std::cout << "hello world"; if you want to go C style. It's a bit quicker to write and lets you do formatted output.
You return 0 from main to indicate that the program executed successfully.
The compiler does not automatically include all the standard libraries and use namespace std because sometimes name collisions can result between your code and library code that you may not actually need at all. You don't always need all the libraries. Likewise, sometimes you are using a different main routine (Windows development comes to mind with its own, different WinMain starting function). The compiler also does not automatically return 0 because sometimes the program needs to indicate that it completed unsuccessfully.
There are good reasons for all these things. C++ is a very broad language; it is used for everything from small embedded systems to giant applications built by hundreds of programmers. The use case of a guy building a small program to run on a desktop is by no means the only one. So sometimes you are building library components. In that case, no main(). Sometimes you are working on a tiny system with no standard library. In that case, no std. Sometimes you want to build a Unix tool that works with other Unix text tools and signals its completion status with an int returned from main().
In other words, the things you complain about are boilerplate to you. But they are vital details that vary for other users of the language.
This reminds me of The Evolution of a Programmer. Some of the languages and technologies demonstrated are a bit dated now, but you should get the general idea. :)
One of the reasons C++ is rather complicated is because it was designed to address problems that crop up in large programs. At the time C++ was created at AT&T, their biggest C program was about 10 million lines of code. At that scale, C doesn't function very well. C++ addresses many of the problems you get with that kind of program.
With that said, it's also possible to answer the original questions:
You would include <iostream> where it's needed. If you've got 10,000 C++ files, it's quite common that fewer than 1,000, sometimes fewer than 100, will produce user-visible output.
A statement like print "Hello, world" assumes that there is a default output, but makes it hard to generalize. The cout << "Hello, world" form makes it explicit where the output goes, but the same form also allows cerr << "Goodbye, world" and MyTmpFile << "Starting phase #" << i
The standard library is in the std:: namespace. My 10,000 files will be in an additional 25 namespaces.
main is an oddity in many ways, being the startup function.
Baldur:
You don't always need <iostream>. The only things that you will always need are:
A main function (or a WinMain, if you're writing Win32 apps).
Variables, functions, operators, language constructs (if, while, etc.).
The ability to include functionality from libraries into your program.
Everything else is application-specific.
As other posters say, the return value of the main function is an error code [1]. If main returns 0, be happy: everything worked OK!
[1] This is useful when you write programs that "communicate" with other programs. The simplest way that a program can "tell" another whether it executed properly is using an error code.
As people have said, the simple answer is that they're different languages, with different goals. To answer your specific questions...
... do you have to include the <iostream> every time? Do you ever not need it?
<iostream> is one of the header files for iostreams, the part of the C++ standard library responsible for input/output; in this instance, you need it to gain access to std::cout. If you're not doing I/O operations in a source file, you don't need to include it -- for example, most files containing class definitions probably won't need <iostream>.
... same question for the standard library, when do you not need std::*?
std is the name of the namespace containing the classes of the standard library; it's there to avoid name collisions. Python has packages and modules to do this.
You can use the using statement to bring items from another namespace into your current scope, see this FAQ entry for an example (and an explanation of why it's bad to blindly bring all of std into scope!).
... why is the "main" part a function? Do you ever call the main function? Why is it an integer? Why does C++ need to have a main function but Python doesn't?
Executable statements in C++ have to be contained within a function, and the main function is defined as where execution begins. In Python, executable statements can be placed at the top level of a file, and execution is defined to start at the top of the main module.
You can call main() if you wish -- it's just a function, after all -- but there's not often a reason to do this. Behind the scenes, most implementations of C++ call main() for you once some startup housekeeping has been done by the runtime library.
The return value of main() is returned back to the operating system. This stems from C and UNIX, in which application programs are required to provide a 1-byte exit status code, and returning that value from main() is a clear way of expressing this.
... why do you need "std::cout << "? Isn't that needlessly long and complicated compared to Python?
This is just a design difference. iostreams is a fairly complex beast with lots of features, and one of the side-effects of this is that the syntax is a bit ugly for simple tasks at times.
... why do you need to return 0 even when you are never going to use it?
You do use it; this is the value returned to the operating system as the exit status of the program.
Python is a high-level language. C++ is a middle-level language.
While troubleshooting some performance problems in our apps, I found out that C's stdio.h functions (and, at least for our vendor, C++'s fstream classes) are threadsafe. As a result, every time I do something as simple as fgetc, the RTL has to acquire a lock, read a byte, and release the lock.
This is not good for performance.
What's the best way to get non-threadsafe file I/O in C and C++, so that I can manage locking myself and get better performance?
MSVC provides _fputc_nolock, and GCC provides unlocked_stdio and flockfile, but I can't find any similar functions in my compiler (CodeGear C++Builder).
I could use the raw Windows API, but that's not portable and I assume would be slower than an unlocked fgetc for character-at-a-time I/O.
I could switch to something like the Apache Portable Runtime, but that could potentially be a lot of work.
How do others approach this?
Edit: Since a few people wondered, I had tested this before posting. fgetc doesn't do system calls if it can satisfy reads from its buffer, but it does still do locking, so locking ends up taking an enormous percentage of time (hundreds of locks to acquire and release for a single block of data read from disk). Not doing character-at-a-time I/O would be a solution, but C++Builder's fstream classes unfortunately use fgetc (so if I want to use iostream classes, I'm stuck with it), and I have a lot of legacy code that uses fgetc and friends to read fields out of record-style files (which would be reasonable if it weren't for locking issues).
I'd simply not do I/O a char at a time if performance matters.
fgetc is almost certainly not reading a byte each time you call it (where by 'reading' I mean invoking a system call to perform I/O). Look somewhere else for your performance bottleneck, as this is probably not the problem, and using unsafe functions is certainly not the solution. Any lock handling you do will probably be less efficient than the handling done by the standard routines.
The easiest way would be to read the entire file in memory, and then provide your own fgetc-like interface to that buffer.
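A sketch of that idea (the names are mine; error handling abbreviated):

#include <cstddef>
#include <cstdio>
#include <vector>

struct FileBuf {
    std::vector<char> data;
    std::size_t pos;
};

// Read the whole file into memory once; any locking cost is paid only here.
bool slurp(const char* path, FileBuf& fb)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    const long len = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    fb.data.resize(len > 0 ? static_cast<std::size_t>(len) : 0);
    if (!fb.data.empty())
        std::fread(&fb.data[0], 1, fb.data.size(), f);
    std::fclose(f);
    fb.pos = 0;
    return true;
}

// fgetc-like accessor: no locks, no system calls.
inline int buf_getc(FileBuf& fb)
{
    return fb.pos < fb.data.size()
        ? static_cast<unsigned char>(fb.data[fb.pos++])
        : EOF;
}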
Why not just memory map the file? Memory mapping is portable (except in Windows Vista, which requires you to jump through hoops to use it now, the dumbasses). Anyhow, map your file into memory, and do your own locking/not-locking on the resulting memory location.
The OS handles all the locking required to actually read from the disk - you'll NEVER be able to eliminate this overhead. But your processing overhead, on the other hand, won't be affected by extraneous locking other than that which you do yourself.
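For the POSIX side, the mapping step looks roughly like this (Windows would use CreateFileMapping/MapViewOfFile instead; error handling abbreviated):

#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a whole file read-only; returns nullptr on failure.
const char* map_file(const char* path, std::size_t& len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    len_out = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len_out, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<const char*>(p);
}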
The multi-platform approach is pretty simple: avoid functions or operators where the standard specifies that they should use a sentry. sentry is an inner class in the iostream classes which ensures stream consistency for every output character; in a multi-threaded environment it locks the stream-related mutex for each character being output. This avoids race conditions at a low level, but still makes the output unreadable, since strings from two threads might be output concurrently, as the following example shows:
thread 1 should write: abc
thread 2 should write: def
The output might look like: adebcf instead of abcdef or defabc. This is because sentry is implemented to lock and unlock per character.
The standard defines it for all functions and operators dealing with istream or ostream. The only way to avoid this is to use stream buffers and your own locking (per string for example).
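A sketch of per-string locking through the stream buffer (one possible scheme; the mutex and function names are mine):

#include <iostream>
#include <mutex>
#include <string>

std::mutex g_cout_mutex;  // one mutex per stream you share between threads

// Format the whole line first, then push it through the streambuf in one go
// under our own lock, so characters from different threads never interleave.
void write_line(const std::string& s)
{
    std::lock_guard<std::mutex> lock(g_cout_mutex);
    std::cout.rdbuf()->sputn(s.data(), static_cast<std::streamsize>(s.size()));
    std::cout.rdbuf()->sputc('\n');
}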
I have written an app which outputs some data to a file and measures the speed. If you add here a function which outputs using the fstream directly, without using the buffer and flush, you will see the speed difference. It uses Boost, but I hope that is not a problem for you. Try to remove all the stream buffers and see the difference with and without them. In my case, the performance drawback was a factor of 2-3 or so.
The following article by N. Myers explains how locales and sentry in C++ IOStreams work. And you should certainly look up in the ISO C++ Standard document which functions use sentry.
Good Luck,
Ovanes
#include <vector>
#include <fstream>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <cassert>
#include <cstdlib>

#include <boost/progress.hpp>
#include <boost/shared_ptr.hpp>

double do_copy_via_streambuf()
{
    const size_t len = 1024 * 2048;
    const size_t factor = 5;
    std::vector<char> data(len, 1);
    std::vector<char> buffer(len * factor, 0);

    std::ofstream ofs("test.dat", std::ios_base::binary | std::ios_base::out);
    noskipws(ofs);
    std::streambuf* rdbuf = ofs.rdbuf()->pubsetbuf(&buffer[0], buffer.size());
    std::ostreambuf_iterator<char> oi(rdbuf);

    boost::progress_timer pt;
    for (size_t i = 1; i <= 250; ++i)
    {
        std::copy(data.begin(), data.end(), oi);
        if (0 == i % factor)
            rdbuf->pubsync();
    }
    ofs.flush();

    double rate = 500 / pt.elapsed();
    std::cout << rate << std::endl;
    return rate;
}

void count_avarage(const char* op_name, double (*fct)())
{
    double av_rate = 0;
    const size_t repeat = 1;
    std::cout << "doing " << op_name << std::endl;
    for (size_t i = 0; i < repeat; ++i)
        av_rate += fct();
    std::cout << "average rate for " << op_name << ": " << av_rate / repeat
              << "\n\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n"
              << std::endl;
}

int main()
{
    count_avarage("copy via streambuf iterator", do_copy_via_streambuf);
    return 0;
}
One thing to consider is to build a custom runtime. Most compilers provide the source to the runtime library (I'd be surprised if it weren't in the C++ Builder package).
This could end up being a lot of work, but maybe they've localized the thread support to make something like this easy. For example, with the embedded system compiler I'm using, it's designed for this - they have documented hooks to add the lock routines. However, it's possible that this could be a maintenance headache, even if it turns out to be relatively easy initially.
Another similar route would be to talk to someone like Dinkumware about using a 3rd party runtime that provides the capabilities you need.