Better way to parse a config File in C++ [closed] - c++

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I am currently working on a parser project to get some file xml format 1 to another xml format.
I finished my project, and there is quiet a lot of parameters to know, most of them are paths to some files, lists of parameters, ...
I would like it to be easy to use so I create a settings.txt : it contains lines like
someparameter = defaultvalue
This file is easy to modify.I consider case where people want to parse files with same parameters, so my application is like :
Do you want to change parameters? (Y/N)
if(yes){
Do you want to change paramater 1? //If no value, [] stays
[pathbydefault]
}
...
else{ load from settings.txt}
To implement this, I use getline(), splitting on first '=' and put them in a std::map.
I think this is a bad choice, I searched and I found a lot of architecture for c++: list (really heavy for 10 parameters), table (not easy to read, the code is dark for other people).
What solution would you advise me to use?
Of course, i dont ask for implementation, just for some solution to consider.
Some info: I dont think if it matters a lot, but I work on Unix based system (Mac OS or Linux) so I cant use windows libraries. I saw a windows solution but I did not go further into this.

The rule of thumb with this stuff (IMHO) is to always use a good library to parse a standard data format, at least in any situation where your needs might continue to scale. For example, I tend to really be fond of JSON because it's a simple, easy to humanly read/write format, with quite a few good quality choices in C++.
This avoids having to deal with any bug prone parsing logic yourself. It also makes it very easy to e.g. write python scripts that generate or verify your configuration files (since python has very easy access to json as well, as about every language does).
In your code it may also be a good idea to cleanly separate the data format, and the collection into a map or whatever data structure you're using. That way, the portion of the code that will change if you decide to change which data format you use is contained.

One future consideration would be using XML transformation language to transition between an old XML file format and a new XML file format. I'm not sure if you're already doing this or not, so if you are, please disregard. If you're unfamiliar, the most common of these languages is XSLT (eXtensible Stylesheet Language Transformations).
Rather than having the end-user enter a myriad of options/paths/values to an INI-styled configuration file, the end-user would ideally only supply XML (which can easily be edited by hand) / XSLT template files to the XSLT processor. You could update your C++ application to focus less on obtaining every input from the end-user and more on creating/choosing the right XML/XSLT template files.
As you mention working on Unix-based systems, you might want to consider this processor: http://xml.apache.org/xalan-c/

Related

C++/Objective-C - how to analyse a big project (Unix way)? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
Normally, to analyse big C projects, I prefer grep/GNU command line tools, lint, simple Python scripts. Saying "to analyse" C project I mean to collect code statistics, to understand project's structure, its data structures and flow of execution - what function calls what, entry points in different modules, static members, threads, etc. But it works not so good with an object-oriented code.
Whenever I have a big C++ (or Objective-C) project, containing large number of source files and several directories, I would like to see it's class diagram, data fields, methods, messages, instances, etc.
I am looking for a most Unix way solution. Can you help me?
Doxygen is the closest i could find, when i was searching last time. It is not unix way, but it is available free for linux/windows/mac. It generated descent graphs for me. Hope it helps.
http://www.doxygen.nl/
http://en.wikipedia.org/wiki/Doxygen
With message passing and dynamic dispatch going around you are pretty much screwed. It doesn't even depend on language, message is as well used in C++ world. There is no tool that can analyze the code and tell what the application flow will look like. In those cases, the whole data/execution flow may depend on configuration files, how you hook up producers/consumers together etc.. and change significantly. If you are lucky, there would be some high-level documentation, maybe with pictures and description of overall ideas etc. Otherwise, the only option here is to run it under debugger for any given configuration and see what is going on in there, step by step. Isn't that a true UNIX way?
Your request is for a variety of views, some text-based, some structure based.
You might consider Understand for C++ which does a mixture of these. Don't know if it does ObjectiveC.
Our Source Code Search Engine (SCSE) is rather more limited, but provides a much faster way to "grep" than grep does. For large code bases this matters. It will handle multiple languages and dialects. We don't have an Objective C dialect, but I think our C or C++ front ends would actually work pretty well for this, since Objective C uses pretty much the same lexical syntax.

Code for identifying programming language in a text file [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
i'm supposed to write code which when given a text file (source code) as input will output which programming language is it. This is the most basic definition of the problem. More constraints follow:
I must write this in C++.
A wide variety of languages should be recognized - html, php, perl, ruby, C, C++, Java, C#...
Amount of false positives (wrong recognition) should be low - better to output "unknown" than a wrong result. (it will be in the list of probabilities for example as unknown: 100%, see below)
The output should be a list of probabilities for each language the code knows, so if it knows C, Java and Perl, the output should be for example: C: 70%, Java: 50%, Perl: 30% (note there is no need to have the probabilities sum up to 100%)
It should have a good ratio of accuracy/speed (speed is a bit more favored)
It would be very nice if the code could be written in a way that adding new languages for recognition will be fairly easy and involve just adding "settings/data" for that particular language. I can use anything available - a heuristic, a neural network, black magic. Anything. I'am even allowed to use existing solutions, but: the solution must be free, opensource and allow commercial usage. It must come in form of easily integrable source code or as a static library - no DLL. However i prefer writing my own code or just using fragments of another solution, i'm fed up with integrating code of others. Last note: maybe some of you will suggest FANN (fast artificial neural network library) - this is the only thing i cannot use, since this is the thing we use ALREADY and we want to replace that.
Now the question is: how would you handle such a task, what would you do? Any suggestions how to implement this or what to use?
EDIT: based on the comments and answers i must emphasize some things i forgot: speed is very crucial, since this will get thousands of files and is supposed to answer fast, so looking at a thousand files should produce answers for all of them in a few seconds at most (the size of files will be small of course, a few kB each one). So trying to compile each one is out of question. The thing is, that i really want probabilities for each language - so i rather want to know that the file is likely to be C or C++ but that the chance it is a bash script is very low. Due to code obfuscation, comments etc. i think that looking for a 100% accurate code is a bad idea and in fact is not the goal of this.
You have a problem of document classification. I suggest you read about naive bayes classifiers and support vector machines. In the articles there are links to libraries which implement these algorithms and many of them have C++ interfaces.
One simple solution I could think of is that you could just identify the keywords used in different languages. Each identified word would have score +1. Then calculate ratio = identified_words / total_words. The language that gets most score is the winner. Off course there are problems like usage of comments e.t.c. But I think that is a very simple solution that should work in most cases.
If you know that the source files will conform to standards, file extensions are unique to just about every language. I assume that you've already considered this and ruled it out based on some other information.
If you can't use file extensions, the best way would be to find the things between languages that are most different and use those to determine filetype. For example, for loop statement syntax won't vary much between languages, but package include statements should. If you have a file including java.util.*, then you know it's a java file.
I'm sorry but if you have to parse thousands of files, then your best bet is to look at the file extension. Don't over engineer a simple problem, or put burdensome requirements on a simply task.
It sounds like you have thousands of files of source code and you have no idea what programming language they were written in. What kind of programming environment do you work in? (Ruling out the possibility of an artificial homework requirement) I mean one of the basics of software engineering that I can always rely on are that c++ code files have .cpp extension, that java code files have the .java extension, that c code files have the .c extension etc... Is your company playing fast and loose with these standards? If so I would be really worried.
As dmckee suggested, you might want to have a look at the Unix file program, whose source is available. The heuristics used by this utility might be a great source of inspiration. Since it is written in C, I guess that it qualifies for C++. :) You do not get confidence percentages directly, though; maybe are they used internally?
Take a look at nedit. It has a syntax highlighting recognition system, under Syntax Highlighting->Recognition Patterns. You can browse sample recognition patterns here, or download the program and check out the standard ones.
Here's a description of the highlighting system.
Since the list of languages is known upfront you know the syntax/grammar for each of them.
Hence you can, as an example, to write a function to extract reserved words from the provided source code.
Build a binary tree that will have all reserved words for all languages that you support. And then just walk that tree with the extracted reserved words from the previous step.
If in the end you only have 1 possibility left - this is your language.
If you reach the end of the program too soon - then (from where you stopped) - you can analyse your position on a tree to work out which languages are still the possibitilies.
You can maybe try to think about languages differences and model these with a binary tree, like "is feature X found ? " if yes, proceed in one direction, if not, proceed in another direction.
By constructing this search tree efficiently you could end with a rather fast code.
This one is not fast and may not satisfy your requirements, but just an idea. It should be easy to implement and should give 100% result.
You could try to compile/execute the input text with different compilers/interpreters (opensource or free) and check for errors behind the scene.
The Sequitur algorithm infers context-free grammars from sequences of terminal symbols. Perhaps you could use that to compare against a set of known production rules for each language.

C++ serialization library that supports partial serialization? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Improve this question
Are there any good existing C++ serialization libraries that support partial serialization?
By "partial serialization" I mean that I might want to save the values of 3 specific members, and later be able to apply that saved copy to a different instance. I'd only update those 3 members and leave the others intact.
This would be useful for synchronizing data over a network. Say I have some object on a client and a server, and when a member changes on the server I want to send the client a message containing the updated value for that member and that member only. I don't want to send a copy of the whole object over the wire.
boost::serialization at a glance looks like it only supports all or nothing.
Edit: 3 years after originally writing this I look back at it and say to myself, 'wut?' boost::serialization lets you define what members you want saved or not, so it would support 'partial serialization' as I seem to have described it. Further, since C++ lacks reflection serialization libraries require you to explicitly specify each member you're saving anyway unless they come with some sort of external tooling to parse the source files or have a separate input file format that is used to generate C++ code (e.g. what Protocol Buffers does). I think I must have been conceptually confused when I wrote this.
You're clearly not looking for serialization here.
Serialization is about saving an object and then recreating it from the stream of bytes. Think video games saves or the session context for a webserver.
Here what you need is messaging. Google's FlatBuffers is nice for that. Specify a message that will contain every single field as optional, upon reception of the message, update your object with the fields that do exist and leave the others untouched.
The great thing with FlatBuffers is that it handles forward and backward compatibility nicely, as well as text and binary encoding (text being great for debugging and binary being better for pure performance), on top of a zero-cost parsing step.
And you can even decode the messages with another language (say python or ruby) if you save them somewhere and want to throw together a html gui to inspect it!
Although I'm not familiar with them, you could also check out Google's Protocol Buffers
.

Grails: Templates vs TagLibs. [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
In Grails, there are two mechanisms for modularity in the view layers: Template and TagLib.
While I am writing my own Grails app, I am often facing the same question when I need to write an UI component: do I need to use a template or a TagLib?
After searching the web, I didn't find a lot of best practices or rules of thumb concerning this design decision, so can you help me and tell me:
What is the main difference between the two mechanisms?
In which scenarios, do you use a TagLib instead of a Template (and vice-versa) ?
There is definitely some overlap, but below are several things to think about. One way to think about it is that Template is like a method-level reuse, while TagLibs are more convenient for API-level reuse.
Templates are great for when you have to format something specific for display. For example, if you wan to display a domain object in a specific way, typically it's easier to do it in a template, since you are basically just writing HTML with some . It's reusable, but I think its reusability in a bit limited. I.e. if you have a template, you'd use it in several pages, not in hundreds of pages.
On the other hand, taglibs is a smaller unit of functionality, but one you are more likely to use in many places. In it you are likely to concatenate strings, so if you are looking to create a hundred lines of HTML, they are less convenient. A key feature taglibs allow is ability to inject / interact with services. For example, if you need a piece of code that calls up an authentication service and displays the current user, you can only do that in a TagLib. You don't have to worry about passing anything to the taglib in this case - taglib will go and figure it out from the service. You are also likely to use that in many pages, so it's more convenient to have a taglib that doesn't need parameters.
There are also several kinds of
taglibs, including ones that allow
you to iterate over something in the
body, have conditional, etc - that's
not really possible with templates.
As I said above, a well-crafted
taglib library can be used to create
a re-usable API that makes your GSP
code more readable. Inside the same *taglib.groovy you can have multiple tag definitions, so that's another difference - you can group them all in once place, and call from one taglib into another.
Also, keep in mind that you can call up a template from inside a taglib, or you can call taglibs withing templates, so you can mix and match as needed.
Hope this clears it up for you a bit, though really a lot of this is what construct is more convenient to code and how often it will be reused.
As for us...
A coder is supposed to see specific object presentation logic in template, not anywhere else.
We use taglibs only for isolated page elements, not related to business logic at all. Actually, we try to minimize their usage: it's too easy to write business logic in a taglib.
Templates are the conventional way to go; for instance, they support Layouts (btw, they can be named a third mechanism)

C/C++ Header file documentation [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
What do you think is best practice when creating public header files in C++?
Should header files contain no, brief or massive documentation? I've seen everything from almost no documentation (relying on some external documentation) to large specifications of invariants, valid parameters, return values etc. I'm not sure exactly what I prefer, large documentation is nice since you've always access to it from your editor, on the other hand a header file with very brief documentation can often show a complete interface on one or two pages of text giving a much better overview of what's possible to do with a class.
Let's say I go with something like brief or massive documentation. I want something similar to javadoc where I document return values, parameters etc. What's the best convention for that in c++? As far as I can remember doxygen does good stuff with java doc-style documentation, but are there any other conventions and tools for this I should be aware of before going for javadoc style documentation?
Usually I put documentation for the interface (parameters, return value, what the function does) in the interface file (.h), and the documentation for the implementation (how the function does) in the implementation file (.c, .cpp, .m).
I write an overview of the class just before its declaration, so the reader has immediate basic information.
The tool I use is Doxygen.
I would definetely have some documentation in the header files themselves. It greatly improves debugging to have the information next to the code, and not in separate documents. As a rule of thumb, I would document the API (return values, argument, state changes, etc) next to the code, and high-level architectural overviews in separate documents (to give a broader view of how everything is put together; it's hard to place this together with the code, since it usually references several classes at once).
Doxygen is fine from my experience.
I believe Doxygen is the most common platform for generating documentation, and as far as I know, it's more or less able to cover JavaDoc-notation (not limited to of course). I've used doxygen for C, with OK results, I do think it's more suitable for C++ though. You might want to look into robodoc as well, although I think Occam's Razor will tell you to rather go for Doxygen.
Regarding how much documentation, I've never been a documentation-fan myself, but whether I like it or not, having more documentation always beats having no documentation. I'd put the API-documentation in the header file, and the implementation documentation in the implementation (stands to reason, doesn't it?). :) That way, IDEs have the chance to pick it up and show it during autocompletion (Eclipse does this for JavaDocs, for example, perhaps also for doxygen?), which shouldn't be underestimated. JavaDoc has this additional quirk that it uses the first sentence (i.e. up to the first full stop) as a brief description, don't know if Doxygen does this though, but if so, it should be taken into consideration when writing the documentation.
Having a lot of documentation runs the risk of being out of date, however, keeping the documentation close to the code will give people a chance to keep it up to date, so you should definately keep it in the source/header files. What shouldn't be forgotten though is the production of documentation. Yes, some people will use the documentation directly (through the IDE or whatever, or just reading the header file), but some people prefer other ways, so you should probably consider putting your (regularly updated) API documentation online, all nice and browsable, as well as perhaps producing man-files if you're targeting *nix-based developers.
That's my two cents.
Put enough into the code that it can stand alone. Nearly every project I've been in where the documentation was separate, it got out of date or wasn't done, partly that if it's a separate document it becomes a separate task, partly as management didn't allow for it as a task in budgetting. Documenting inline as part of the flow works much better in my experience.
Write the documentation in a form which most editors recognise is a block; for C++ this seems to be /* rather than //. That way you can fold it and just see the declarations.
Maybe you would be interested in gtk-doc. It can be "a bit awkward to setup and use" but you can get a nice API documentation from source code, looking like this:
String Utility Functions