The best approaches for logging localization using C++

I am working on a multinational project where the target audience for the logs may come from two nationalities, so it is becoming important to log in more than one language. I am thinking about writing to two different log folders, one per language, every time I log something, but I am also wondering whether logging frameworks like log4cpp provide any out-of-the-box functionality for this.

As other commenters have mentioned, it sounds like you are going down the wrong track by looking to do multilingual logging.
My recommendation would be to use English (which is the standard for technical information, and which I guess is the language you know best) and to make sure that the language you use is clear, grammatically correct and unambiguous. Then if one of the technicians cannot understand it, they can very easily and efficiently run it through a machine translation engine such as Google Translate. Or indeed they could process the logs and run everything through Google Translate to append translated text, particularly if you annotate the logs to mark the language content.
Assuming that the input language is well-written, machine translation usually gives a good result which the end user can understand. If the message isn't clear, has typos or abbreviations, then that's where machine translation fails spectacularly.

Writing logs naturally slows execution because of the file open, seek and write operations involved.
This is one primary reason why many developers and architects suggest logging at different levels, increasing the depth of log entries as the level increases so that problems can be traced more precisely. At higher levels, you will notice that your process speed drops because more log entries are generated.
I would rather suggest using a service that can translate from one language to another.
I'm sure there are libraries, free or paid, that do this translation. You can create a small utility program that runs in the background and does the conversion during process idle time.

One suggestion is to use a separate process or thread that listens for your log messages and writes them out from there.
This reduces the time your main process or thread spends on logging I/O, and you can make all changes related to the logging language in that one place.
For multilingual support I think you can try writing with wide-character strings, though I am not sure.

the best approaches for logging localization using c++
Install Qt 4 and use QObject::tr() / tr() for strings. Write strings in whatever language you want. Hire or find a translator to localize the strings using Qt Linguist.
Please note that perfect translation is impossible, so there will be many "amusing" misunderstandings, even if your translator is a genius. So it might be a better idea to select one main language for the programming team.
--EDIT--
Didn't notice this part before:
in more than one language
One way to approach it is to implement a log reader. Instead of writing plaintext messages, you could dump message ids (generated by some kind of macros) and string arguments if strings are formatted. The "log reader" will allow the user to select the desired language while viewing the log file, and will translate messages based on their ids/arguments using a mechanism similar to QTranslator. The good thing about this approach is that you'll be able to add more languages later, so it'll be possible to retranslate old logs. The bad thing is that you'll need to write a log viewer and that this format will be harder for a "normal human" to read, although you can add plaintext messages in addition to message ids and arguments.
Qt 4 has most of this framework implemented (there are routines for dumping variants into text/data streams, and so on) along with translation tool. See QTranslator documentation and Linguist manual for more info.
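To make the message-id idea concrete, here is a small sketch in plain C++ that does not use Qt. Everything in it is an assumption made for illustration: the `id|arg|arg` record format, the `MessageId` values, and the `%1`, `%2` placeholder style (borrowed from the way Qt formats arguments). The writer stores only the id and arguments; the reader picks a catalog for the desired language and renders the text.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical message ids; a real project might generate these with macros.
enum MessageId { MSG_CONNECTED = 1, MSG_DISK_FULL = 2 };

using Catalog = std::map<int, std::string>;  // id -> template with %1, %2, ...

// Writer side: one record per line, "id|arg|arg".
// Assumes arguments contain neither '|' nor newlines.
std::string encode_record(int id, const std::vector<std::string>& args) {
    std::string line = std::to_string(id);
    for (const std::string& arg : args) line += '|' + arg;
    return line;
}

// Reader side: look the id up in the chosen catalog and substitute arguments.
// Assumes the record starts with a valid integer id and has at most 9 arguments.
std::string render_record(const std::string& line, const Catalog& catalog) {
    std::istringstream in(line);
    std::string field;
    std::getline(in, field, '|');
    const int id = std::stoi(field);
    std::vector<std::string> args;
    while (std::getline(in, field, '|')) args.push_back(field);

    const auto it = catalog.find(id);
    if (it == catalog.end()) return line;  // unknown id: show the raw record

    std::string text = it->second;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string placeholder = "%" + std::to_string(i + 1);
        const std::size_t pos = text.find(placeholder);
        if (pos != std::string::npos) text.replace(pos, placeholder.size(), args[i]);
    }
    return text;
}
```

With an English catalog `{MSG_CONNECTED, "Connected to %1 on port %2"}` and a second catalog in another language, the same stored record `1|db01|5432` can be rendered in either language, including for logs written before the second catalog existed.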

Related

Logging Etiquette

I have a server program that I am writing. In this program, I log a lot. Is it customary in logging (for a server) to overwrite the log of previous runs, append to the file with some sort of new run header, or to create a new log file (it won't be restarted too often)?
Which of these solutions is the way of doing things under Linux/Unix/MacOS?
Also, can anyone suggest a logging library for C++/C? I need one, regardless of the answer to the above question.
Take a look in /var/log/...you'll see that files are structured like
serverlog
serverlog.1
serverlog.2
This is done by logrotate which is called in a cronjob. But everything is simply in chronological order within the files. So you should just append to the same log file each time, and let logrotate split it up if needed.
You can also add a configuration file to /etc/logrotate.d/ to control how a particular log is rotated. Depending on how big your logfiles are, it might be a good idea to add a configuration for your own log there. You can take a look at other files in this directory to see the syntax.
This is a rather complex issue. I don't think that there is a silver bullet that will kill all your concerns in one go.
The first step in deciding what policy to follow would be to set your requirements. Why is each entry logged? What is its purpose? In most cases this will result in some rather concrete facts, such as:
You need to be able to compare the current log with past logs. Even when an error message is self-evident, the process that led to it can be determined much faster by playing spot-the-difference, rather than puzzling through the server execution flow diagram - or, worse, its source code. This means that you need at least one log from a past run - overwriting blindly is a definite No.
You need to be able to find and parse the logs without going out of your way. That means using whatever facilities and policies are already established. On Linux it would mean using the syslog facility for important messages, to allow them to appear in the usual places.
There is also some good advice to heed:
Time is important. Not only because there's never enough of it, but also because log files without proper timestamps for each entry are practically useless. Make sure that each entry has a timestamp - most system-wide logging facilities will do that for you. Also make sure that the clocks on all your computers are as accurate as possible - using NTP is a good way to do that.
Log entries should be as self-contained as possible, with minimal cruft. You don't need to have a special header with colors, bells and whistles to announce that your server is starting - a simple MyServer (PID=XXX) starting at port YYYYY would be enough for grep (or the search function of any decent log viewer) to find.
You need to determine the granularity of each logging channel. Sending several GB of debugging log data to the system logging daemon is not a good idea. A good approach might be to use separate log files for each logging level and facility, so that e.g. user activity is not mixed up with low-level data that is only useful when debugging the code.
Make sure your log files are in one place, preferably separated from other applications. A directory with the name of your application is a good start.
Stay within the norm. Sure you may have devised a new nifty logfile naming scheme, but if it breaks the conventions in your system it could easily confuse even the most experienced operators. Most people will have to look through your more detailed logs in a critical situation - don't make it harder for them.
Use the system log handling facilities. E.g. on Linux that would mean appending to the same file and letting an external tool like logrotate handle the log files. Not only would it be less work for you, it would also automatically maintain any general logging policies as a whole.
Finally: Always copy important log data to the system log as well. Operators watch the system logs. Please, please, please don't make them have to look at other places, just to find out that your application is about to launch the ICBMs...
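As a rough illustration of the advice above, the sketch below sends one self-contained startup line to the system log using the POSIX `syslog` API. The identifier `"myserver"`, the `LOG_DAEMON` facility and both helper function names are illustrative choices, not something the original answer prescribes.

```cpp
#include <string>
#include <syslog.h>

// Builds the single self-contained startup line suggested above.
std::string startup_message(long pid, int port) {
    return "MyServer (PID=" + std::to_string(pid) + ") starting at port " +
           std::to_string(port);
}

// Sends the line to the system log; the logging daemon adds the timestamp.
// "myserver" and LOG_DAEMON are illustrative choices.
void log_startup(long pid, int port) {
    openlog("myserver", LOG_PID, LOG_DAEMON);
    // Pass the text through "%s" so it is never interpreted as a format string.
    syslog(LOG_INFO, "%s", startup_message(pid, port).c_str());
    closelog();
}
```

Where the line ends up (for example a file under /var/log or the journal) depends on how the system's logging daemon is configured.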
https://stackoverflow.com/questions/696321/best-logging-framework-for-native-c
For the logging, I would suggest creating a new log file and cleaning it up at a regular interval to keep it from growing too large. Overwriting the logs of previous runs is usually a bad idea.

Scan for changed files

I'm looking for a good efficient method for scanning a directory structure for changed files in Windows XP+. Something like how git does it is exactly what I'm looking for, when running a git status it displays all modified files, all new (untracked) files and deleted files very quickly which is exactly what I would like to do.
I have a basic model up and running which performs an initial scan and stores all filenames, size, dates and attributes.
On a subsequent scan it checks if the size, attributes or date have changed and marks as a changed file.
My issue now comes in detecting moved and deleted files. Is there a tried and tested method for this sort of thing? I'm struggling to come up with a good method.
I should mention that it will eventually use ReadDirectoryChangesW to monitor files and alert the user when something changes so a full scan is really a last resort after the initial scan.
Thanks,
J
EDIT: I think I may have described the problem badly. The issue I'm facing is not so much detecting the changes - I have ReadDirectoryChangesW() using IOCP on multiple threads to detect when a change happens. The issue is more what to do with the information. For example, a moved file is reported as a delete followed by a create, and a rename comes in two parts: the old name, followed by the new name. So what I'm asking is how to differentiate between the delete that is part of a move and an actual delete. I'm guessing buffering the changes and processing batches would be an option, but it feels messy.
In native code FileSystemWatcher is replaced by ReadDirectoryChangesW. Using this properly is not simple, there is a good baseline to build off here.
I have used this code in a previous job and it worked pretty well. The Win32 API itself (and FileSystemWatcher) are prone to problems that are described in the docs and also discussed in various places online, but the impact of those will depend on your use case.
EDIT: the exact change is indicated in the FILE_NOTIFY_INFORMATION structure that you get back - adds, removals, rename data including old and new name.
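One possible way to handle the "delete followed by create" case from the question is to buffer a batch of notifications and pair them up afterwards. The sketch below is a portable illustration of that idea only. The `Action` enum and `Change` struct are hypothetical stand-ins for the data in FILE_NOTIFY_INFORMATION, and matching on the file name alone is a heuristic assumption; a real implementation would probably also compare size or timestamps and limit how far apart the two events may be.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the actions reported by the change notifications.
enum class Action { Added, Removed, Moved };

struct Change {
    Action action;
    std::string path;      // for Moved: the new path
    std::string old_path;  // only set for Moved
};

// Returns the part of the path after the last '\\' or '/'.
std::string base_name(const std::string& path) {
    const std::size_t pos = path.find_last_of("\\/");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Collapses a Removed that is followed later in the same batch by an Added with
// the same file name (but a different path) into a single Moved entry.
// Everything else is passed through unchanged, in its original order.
std::vector<Change> coalesce_moves(const std::vector<Change>& batch) {
    std::vector<Change> result;
    std::vector<bool> used(batch.size(), false);
    for (std::size_t i = 0; i < batch.size(); ++i) {
        if (used[i]) continue;
        if (batch[i].action == Action::Removed) {
            bool paired = false;
            for (std::size_t j = i + 1; j < batch.size(); ++j) {
                if (!used[j] && batch[j].action == Action::Added &&
                    batch[j].path != batch[i].path &&
                    base_name(batch[j].path) == base_name(batch[i].path)) {
                    result.push_back({Action::Moved, batch[j].path, batch[i].path});
                    used[j] = true;
                    paired = true;
                    break;
                }
            }
            if (paired) continue;
        }
        result.push_back(batch[i]);
    }
    return result;
}
```

A Removed entry that finds no partner in its batch stays a plain delete, which is the distinction the question asks about.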
I voted Liviu M. up. However, another option if you don't want to use the .NET framework for some reason, would be to use the basic Win32 API call FindFirstChangeNotification.
You can use USN journaling if you are up to it, that is pretty low level (NTFS level) stuff.
Here you can find detailed information and source code included. It is written in C# but most of it is PInvoking C/C++ functions.

I want to show off my C++ projects through a website

The problem is that, well, it's C++. The way I've created them makes it such that they've always been run via a terminal/console window and wait for user input or else simply take a sample input and run with that. The output has also always been to the terminal screen or sometimes to a file. I'm not quite sure how I could take all of that and integrate it with a website while leaving the source code as it is, if that's at all possible. I guess what I'm trying to aim for is to have whatever website I use behave like a terminal window that will accept user input and then send it off to run the C++ program in question and return with the output (whatever it may be), all with minimal modification to the source code. Either that or else set up a more automated kind of page where a user can just click 'Go' and the program will run using a sample input.
When it comes to web I consider myself intermediate with HTML, CSS, PHP & MySQL, and a beginner with Javascript, so if this can be accomplished using those languages, that would be fantastic. If not, don't be afraid to show me something new though.
The easiest interaction model to bring to the web is an application that takes its input up front and produces its output on stdout. In this situation, as the unknown poster mentioned, you could use CGI. But due to the nature of CGI, this will only work (in the simplest sense) if all the information is collected from the user in one page, sent to the application and the results returned in one page. This is because each invocation of a page using CGI spawns a new independent process to serve the request. (There are other more efficient solutions now, such as FastCGI which keeps a pool of processes around.) If your application is interactive, in that it collects some information, presents some results, prints some options, collects some more user input, then produces more results, it will need to be adapted.
Here is about the simplest possible CGI program in C++:
#include <iostream>
int main(int argc, char* argv[])
{
    std::cout << "Content-type: text/plain\n" << std::endl;
    std::cout << "Hello, CGI World!" << std::endl;
}
All it does is return the content type followed by a blank line, then the actual content with the usual boring greeting.
To accept user input, you would write a form in HTML whose action is your application. If the form uses the GET method, the application will be passed a string containing the parameters of the request, in the usual HTTP style:
foo.cgi?QTY=123&N=41&DESC=Simple+Junk
You would then need to parse the query string (which is passed to the program via the QUERY_STRING environment variable; the body of a POST request arrives on standard input instead) to gather the input fields from the form to pass to your application. Beware, as parsing parameter strings is the source of a great number of security exploits. It would definitely be worthwhile finding a CGI library for C++ (a Google search reveals many) that does the parsing for you. The query data can be obtained with:
const char* data = getenv("QUERY_STRING");
So at a minimum, you would need to change your application to accept its input from a query string of name=value pairs. You don't even need to generate HTML if you don't want to; simply return the content type as text/plain to begin with. Then you can improve it later with HTML (and change the content type accordingly).
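For illustration only, here is a minimal sketch of splitting a query string into name=value pairs. The function names are made up for this example, it handles only `+` and `%XX` decoding, and it is not a substitute for a well-tested CGI library, which remains the safer choice for the security reasons mentioned above.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

// Decodes '+' as space and %XX hex escapes; malformed escapes are kept literally.
std::string url_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size() &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Splits "a=1&b=2" into a map; a name without '=' gets an empty value.
std::map<std::string, std::string> parse_query(const std::string& query) {
    std::map<std::string, std::string> fields;
    std::istringstream in(query);
    std::string pair;
    while (std::getline(in, pair, '&')) {
        if (pair.empty()) continue;
        const std::size_t eq = pair.find('=');
        const std::string name = url_decode(pair.substr(0, eq));
        const std::string value =
            eq == std::string::npos ? std::string() : url_decode(pair.substr(eq + 1));
        fields[name] = value;
    }
    return fields;
}

// Reads QUERY_STRING as set by the web server; empty if the variable is unset.
std::map<std::string, std::string> query_from_environment() {
    const char* data = std::getenv("QUERY_STRING");
    return parse_query(data ? data : "");
}
```

For the request shown earlier, `parse_query("QTY=123&N=41&DESC=Simple+Junk")` yields three fields, with `DESC` decoded to `Simple Junk`. The existing program can then read its inputs from this map instead of from the terminal.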
There are other more sophisticated solutions, including entire web frameworks such as Wt. But that would involve considerable changes to your apps, which you said you wished to avoid.
Almost off-topic, but you might want to take a look at Wt.
Have you considered using CGI? It's 1990s technology which lets a web server execute programs written in C/C++ and return the output they generate.
I do not know much about it ... but I used it for some school projects
Show it all off with Screencasts. I use Camtasia Studio, but there are a ton of them out there: http://en.wikipedia.org/wiki/Screencast
Camtasia will even generate all of the HTML and Flash you need to upload to your web server. Buy a nice USB microphone, and write a script of what you're going to say and show.
What is the purpose of showing off your projects? Do you wish to impress your friends or employers?
It doesn't seem feasible to emulate or port your C++ console apps through a web interface.
I suppose you could write a bridge between a server side script and your C++ binary which passes the user input through to your app, then returns the result through the web interface. Bear in mind this would be a huge task for you to undertake.
Ruby have a compiler on their website which demonstrates this can be done.
However no one on the web would expect to run your C++ apps in a web browser. Also I think that anyone who is interested in running a C++ app would be totally comfortable with downloading a C++ binary that you made and running it (apart from the security risk) but when you think about it we download apps and run them all the time, whilst trusting the source.
I have a portfolio website which I created for the purpose of letting employers see my work. Take a look, it will give you an idea of another way you can do things.
Basically I provide the binaries for download, videos, screenshots and links. Things that the user can use to see my work quickly if they don't have time (or an appropriate computer) to run my projects on.
Good luck
I have no experience with this (other than hearing a guy on BART talk about implementing his server-side code all in C), but you might consider taking a look at SWIG (http://www.swig.org/). It allows you to wrap C++ so that you can access C++ code when using languages such as PHP.

How do I extract the network protocol from the source code of the server?

I'm trying to write a chat client for a popular network. The original client is proprietary, and is about 15 GB larger than I would like. (To be fair, others call it a game.)
There is absolutely no documentation available for the protocol on the internet, and most search results only come back with the client's scripting interface. I can understand that, since used in the wrong way, it could lead to ruining other people's experience.
I've downloaded the source code of a couple of alternative servers, including the one I want to connect to, but those
contain no documentation other than install instructions
are poorly commented (I did a superficial browsing)
are HUGE (the src folder of the target server contains 12 MB worth of .cpp and .h files), and grep didn't find anything related
I've also tried searching their forums and contacting the maintainers of the server, but so far, no luck.
Packet sniffing isn't likely to help, as the protocol relies heavily on encryption.
At this point, all my hope is my ability to chew through an ungodly amount of code. How do I start?
Edit: A related question.
If your original code is encrypted with some well-known library like OpenSSL or Crypto++, it might be useful to write your own wrappers for the main entry points of these libraries, which then delegate the call to the actual library. If you make such a substitution and build the project successfully, you will be able to trace everything that goes out in plain text.
If your project is not using third-party encryption libs, hopefully it is still possible to substitute the encryption routines with some wrappers which trace their input and then delegate encryption to the actual code.
Your bet is that encryption is usually implemented in a separate, relatively small number of source files, so it should be easier for you to track input/output in these files.
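The wrapper idea can be sketched generically as below. Nothing here is an actual OpenSSL or Crypto++ API: `EncryptFn` is a hypothetical signature standing in for whatever encryption routine the server really calls, and the wrapper simply dumps the plaintext as hex before delegating to it.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;
// Hypothetical signature; the real server's encryption routine will differ.
using EncryptFn = std::function<Bytes(const Bytes&)>;

// Formats a buffer as lowercase hex, e.g. {0x68, 0x69} -> "6869".
std::string to_hex(const Bytes& data) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char byte : data) {
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}

// Wraps an encryption routine so the plaintext is traced before delegating.
EncryptFn make_tracing_encrypt(EncryptFn real_encrypt, std::FILE* trace) {
    return [real_encrypt, trace](const Bytes& plaintext) {
        std::fprintf(trace, "plaintext: %s\n", to_hex(plaintext).c_str());
        return real_encrypt(plaintext);
    };
}
```

In a real code base the same effect is usually achieved by renaming the original routine and putting a function with the old name in front of it, so that no call sites need to change.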
Good luck!
I'd say
find the command that is used to send data through the socket (the call depends on the network library)
find references of this command and unroll from there. If you can modify-recompile the server code, it might help.
On the way, you will be able to log decrypted (or, more likely, not yet encrypted) network activity.
IMO, the best answer is to read the source code of the alternative server. Try using a good C++ IDE to help you. It will make a lot of difference.
It is likely that the protocol related material you need to understand will be limited to a subset of the files. These will contain references to network sockets and things. Start from there and work outwards as far as you need to.
A viable approach is to tackle this as a crypto challenge. That makes it easy, because you control so much.
For instance, you can use a current client to send a known message to the server, and then check server memory for that string. Once you've found out in which object the string ends, it also becomes possible to trace its ancestry through the code. Set a breakpoint on any non-const method of the object, and find the stacktraces. This gives you a live view of how messages arrive at the server, and a list of core functions essential to message processing. You can next find related functions (caller/callee of the functions on your list).

Do you know of a good program for editing/translating resource (.rc) files?

I'm building a C++/MFC program in a multilingual environment. I have one main (national) language and three international languages. Every time I add a feature to the program I have to keep the international languages up-to-date with the national one. The resource editor in Visual Studio is not very helpful because I frequently end up leaving a string, dialog box, etc., untranslated.
I wonder if you guys know of a program that can edit resource (.rc) files and
Build a file that includes only the strings to be translated and their respective IDs and accepts the same (or similar) file in another language (this would be helpful since usually the translation is done by someone else), or
Handle the translations itself, allowing to view the same string in different languages at the same time.
In my experience, internationalization requires a little more than translating strings. Many strings when translated, require more space on a dialog. Because of this it's useful to be able to customize the dialogs for each language. Otherwise you have to create dialogs with extra space for the translated strings which then looks less than optimal when displayed in English.
Quite a while back I was using a translation tool for an MFC application but the company that produced the software stopped selling it. When I tried to find a reasonably priced replacement I did not find one.
Check out Lingobit Localizer. Expensive, but well worth it.
Here's a script I use to generate resource files for testing in different languages. It just parses a response from babelfish so clearly the translation will be about as high quality as that done by a drunken monkey, but it's useful for testing and such
for i in $trfile
do
key=`echo $i | sed 's/^\(.*\)=\(.*\)$/\1/g'`
value=`echo $i | sed 's/^\(.*\)=\(.*\)$/\2/g'`
url="http://babelfish.altavista.com/tr?doit=done&intl=1&tt=urltext&lp=$langs&btnTrTxt=Translate&trtext=$value"
wget -O foo.html -U "$agent" "$url" > /dev/null 2>&1
tx=`grep "<td bgcolor=white class=s><div style=padding:10px;>" foo.html`
tx=`echo $tx | iconv -f latin1 -t utf-8 | sed 's/<td bgcolor=white class=s><div style=padding:10px;>\(.*\)<\/div><\/td>/\1/g'`
echo $key=$tx
done
rm foo.html
Check out appTranslator; it's relatively cheap and works rather well. The guy developing it is really responsive to enhancement requests and bug reports, so you get really good support.
You might take a look at Sisulizer http://www.sisulizer.com. Expensive though. We're evaluating it for use at my company to manage the headache of ongoing translation. I read on their About page that the company was founded by people who left Multilizer and other similar companies.
If there isn't one, it would be pretty easy to loop through all the strings in a resource and compare them to the international resources. You could probably do this with a simple grid.
In the end we ended up building our own external tools to manage this. Our devs work in the English string table, and every automated build sends the strings that have been added, changed or deleted to our translation manager. He can also run a report at any time from an old build to determine what is required for translation.
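A rough sketch of that comparison is shown below, assuming the string tables have already been extracted from the .rc files into id-to-text maps (the parsing itself is not shown, and `StringTable` and `untranslated_ids` are names invented for this example).

```cpp
#include <map>
#include <string>
#include <vector>

using StringTable = std::map<int, std::string>;  // resource id -> text

// Returns, in ascending order, the ids from the main table that are missing
// from the translation or whose translated text is identical to the main text.
std::vector<int> untranslated_ids(const StringTable& main_table,
                                  const StringTable& translated) {
    std::vector<int> ids;
    for (const auto& entry : main_table) {
        const auto it = translated.find(entry.first);
        if (it == translated.end() || it->second == entry.second) {
            ids.push_back(entry.first);
        }
    }
    return ids;
}
```

Treating identical text as untranslated is a heuristic: strings that are legitimately the same in both languages will be flagged too, so the result is best used as a review list rather than a hard build error.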
Check out RC-WinTrans. It's a commercial tool that my company uses. It basically imports our .RC files (or .resx files) into a database which we send to a different office for translation. The tool can then export a translated .RC file (or .resx file) for each language from the database. It even has a basic dialog box editor so the translator can adjust the size of various controls in the dialog box to be sure the translated text fits.
It also accepts a number of command line arguments and has a COM automation interface so you can integrate it into a build process more easily. It works quite well for us and we literally have thousands and thousands of strings and dialog boxes, etc.
(We currently have version 7 so what I've said might be a little bit different than their latest version 8.)
Also try AppTranslator: http://www.apptranslator.com/. It has a built-in resource editor so that translators can, for example, enlarge a text box when need be. It has separate versions for developers and translators and much more.
We are using Multilizer (http://www.multilizer.com/) and although sometimes it's a bit tricky to use, in the end, with a bit of patience, it works pretty well.
We even have a translation web site where translators can download our projects and then upload the translations using Multilizer command-line features.
Managing localization and translations using .rc files and Visual Studio is not a good idea. It's much smarter (though counter-intuitive) to start localization through the exe. Read here why: http://www.apptranslator.com/misconceptions.html
I've written this recently, which integrates into VS:
https://github.com/ekkis/Powershell/blob/master/MT.ps1
largely because I was unsatisfied with the solutions out there. you'll need to get a client id from M$ (but they give you 2M words/month translation free - not bad)
ResxCrunch will be out soon, it will edit multiple resource files in multiple languages in one single table.