How to use *.mbconfig files with mlnet CLI - ml.net

I am looking to automate more of the auto-training that can be done via the Visual Studio GUI. The mlnet command line tool is useful, but doesn't allow specification of column types, and seems to default many of my numerical fields to "strings" rather than "single" when loading data from a CSV file (especially values such as '0.05663258').
Is there a way to pass a .mbconfig file to the mlnet command line tool (since these are just JSON files with a great deal more flexibility)? It looks like this might be a pending feature request, but the tool's documentation is a little inconsistent from source to source...
Alternatively, is there a way to specify column types (or default column types) in the CLI? I do see the command options to ignore columns, but nothing to control either default column datatypes, or datatypes for individual columns.

If you install the new ML.NET CLI (version 16.13 or later), it includes a train command, which you use like this:
mlnet train --training-config <mbconfig-name>
Note that the training data that was used to generate the "mbconfig" file will also need to be in the same directory.

Related

Convert xlf to html using okapi

I have implemented a local service that converts multiple formats (HTML, DOCX, XLSX, TMX, ...) to XLIFF. After performing a specific process on the generated XLF file, I want to convert it back to its original format. I use the Okapi libraries for this purpose and everything works properly.
I would like to know if Okapi implements a mechanism to convert XLF back to its original file format, especially XLF to HTML (this format is mandatory for me).
Is there any suitable approach?
Thanks in advance
Yes, this is generally possible. Okapi calls it merging, and it requires that the source HTML (or other format) file is available in addition to the translated XLIFF.
A common method for doing this is to use a pair of Rainbow pipelines. The first ("extraction") pipeline looks like this:
Raw Document to Filter Events
[Other steps, such as segmentation, are optional here]
Rainbow Translation Kit Creation (select "Generic XLIFF" as the type)
This will generate a "translation kit" containing the source file, an extracted XLIFF, and some metadata in a file called manifest.rkm. You can then modify the XLIFF to perform the translation, etc. Then, use another pipeline to perform the merge:
Raw Document to Filter Events
Rainbow Translation Kit Merging
Sort of confusingly, the source file for this merge pipeline should be the manifest.rkm file for the translation kit, not the XLIFF or the source file. Okapi will parse the manifest and figure out where everything else is, then merge the translations from the XLIFF back into a new output copy of the HTML.
This process can fail if you do sufficiently gruesome things to the XLIFF that Okapi can't figure out how to map the translated segments back to the original document any more.
A quick-and-dirty way to do this same thing, without the kit, is to use the tikal command-line tool that is bundled with Okapi. First, use this to extract test.html to test.html.xlf:
tikal.sh -fc okf_html -x test.html
Then, merge the translated test.html.xlf to an output test.out.html:
tikal.sh -fc okf_html -m test.html.xlf
I do not understand your question: can you convert files back or not? I assume not, and that's what this answer is about.
The Okapi doc at http://www.opentag.com/okapi/wiki/index.php?title=Rainbow says:
There are filters for many formats, for example: OpenOffice, XML, HTML, Properties, DTD, MS Office, tables, etc.
To convert XLIFF files back to their original format you have to add the Filter Events to Raw Document Step to your command pipeline. There are two filter configurations available for HTML, and one for HTML 5.

How to set the sqlite parameters using C++

I need to set some parameters in sqlite, such as turning the headers on (.headers ON) and setting the output mode to CSV (.mode csv), and I need to do this from C++ instead of the sqlite command line tool.
Is this possible, and if so, how can I achieve it (with an example)?
Thanks
The dot commands are conveniences of the sqlite command line tool. They are not available using the API. CSV is quite easy to build yourself, though.
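To illustrate the "build it yourself" part, here is a minimal sketch of the CSV formatting only. The names csvField and csvRow are made up for this example and are not part of SQLite. It assumes you have already pulled the values out as strings, for instance with sqlite3_column_name() for the header line (what .headers ON would print) and sqlite3_column_text() for each column while looping over sqlite3_step().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Quote a single CSV field: wrap it in double quotes when it contains a
// comma, a double quote, or a line break, and double any embedded quotes.
std::string csvField(const std::string& value) {
    if (value.find_first_of(",\"\r\n") == std::string::npos) {
        return value;  // nothing special, emit as-is
    }
    std::string quoted = "\"";
    for (char c : value) {
        if (c == '"') quoted += '"';  // escape " as ""
        quoted += c;
    }
    quoted += '"';
    return quoted;
}

// Join already-extracted column values into one CSV line
// (without a trailing newline).
std::string csvRow(const std::vector<std::string>& fields) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i != 0) line += ',';
        line += csvField(fields[i]);
    }
    return line;
}
```

You would call csvRow once with the column names and then once per result row, writing each returned line followed by a newline. Keep in mind that sqlite3_column_text() returns a null pointer for SQL NULL values, so decide how you want to represent those before constructing a std::string from the result.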

How to create extended (custom) file property in Windows?

We have a proprietary file format which has embedded in it a product-code.
I am just starting down the path of "enabling the end-user to sort / filter by product-code when opening a file".
The simplest approach for us might be to simply have another drop-down in our customized Open File Dialog in which to choose a product-code to filter by.
However, I think it might be more useful to the end-user if we could present this information as a column in the details view for this file type - just as name, date-modified, type, size, etc., are also detail properties of a file-type (or perhaps generic to all files).
My vague understanding is that XP and prior Windows OSes embedded some sort of metadata like this in an alternate data stream in NTFS. However, starting with Vista, Microsoft stopped using alternate data streams because of their dependence on NTFS and the resulting fragility (i.e. the data can't be sent via file attachment, can't be moved to a FAT-formatted thumb drive, etc.).
Things I need to know but haven't figured out yet:
Is it possible / Is it practicable / how to create a custom extended file property for our file type that expresses the product-code to the Windows shell so that it can be seen in Windows Explorer (and hence File dialogs)?
If that is doable, then how to configure things so that the product-code column is displayed by default for folders containing our file type.
Can anyone point me to a good starting point on the above? We certainly don't have to accomplish this by publishing a custom extended file property - but that seems like a sensible approach, in absence of any way to measure the costs of going this route.
If you have sensible alternative approaches to the problem, I'd be interested in those as well!
Just found: http://www.codeproject.com/Articles/830/The-Complete-Idiot-s-Guide-to-Writing-Shell-Extens
CRAP! It seems I'm very late to the banquet, and MS has already removed this functionality from their shell: http://xpwasmyidea.blogspot.com/2009/10/evil-conspiracy-behind-customizable.html
By far the easiest approach to developing a shell extension is to use a library made for the purpose.
I can recommend EZShellExtension because I have used it in the past to add columns and thumbnails/preview for a custom file format for our company.

generate C/C++ command line argument parsing code from XML (or similar)

Is there a tool that generates C/C++ source code from XML (or something similar) to create command line argument parsing functionality?
Now a longer explanation of the question:
Until now I have used gengetopt for command line argument parsing. It is a nice tool that generates C source code from its own configuration format (a text file). For instance, the gengetopt configuration line
option "max-threads" m "max number of threads" int default="1" optional
among other things generates a variable
int max_threads_arg;
that I later can use.
But gengetopt doesn't provide me with this functionality:
A way to generate Unix man pages from the gengetopt configuration format
A way to generate DocBook or HTML documentation from the gengetopt configuration format
A way to reuse C/C++ source code and to reuse gengetopt configuration lines when I have multiple programs that share some common command line options
Of course gengetopt can provide me with a documentation text by running
command --help
but I am searching for marked up documentation (e.g. HTML, DocBook, Unix man pages).
Do you know if there is any C/C++ command line argument tool/library with a liberal open source license that would suit my needs?
I guess that such a tool would use XML to specify the command line arguments. That would make it easy to generate documentation in different formats (e.g. man pages). The XML file should only be needed at build time to generate the C/C++ source code.
I know it is possible to use some other command line argument parsing library to read a configuration file in XML at runtime but I am looking for a tool that generate C/C++ source code from XML (or something similar) at build time.
Update 1
I would like to do as much of the computation as possible at compile time and as little as possible at run time. So I would like to avoid libraries that give you a map of the command line options, such as boost::program_options::variables_map (tutorial).
In other words, I prefer args_info.iterations_arg to vm["iterations"].as<int>()
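A rough sketch of the difference, written by hand rather than produced by any tool: the struct ArgsInfo and the function parseArgs are hypothetical names and are not actual gengetopt output. The point is only that each option becomes a typed field, so a misspelled option name fails at compile time instead of at a string lookup at run time.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for generated code: every option is a typed
// member with its default value fixed at build time.
struct ArgsInfo {
    int iterations_arg = 1;
    int max_threads_arg = 1;
};

// Minimal parser for "--iterations N" and "--max-threads N".
// Returns false on an unknown option or a missing value.
bool parseArgs(int argc, const char* const argv[], ArgsInfo& out) {
    for (int i = 1; i < argc; ++i) {
        int* target = nullptr;
        if (std::strcmp(argv[i], "--iterations") == 0) {
            target = &out.iterations_arg;
        } else if (std::strcmp(argv[i], "--max-threads") == 0) {
            target = &out.max_threads_arg;
        }
        if (target == nullptr || i + 1 >= argc) return false;
        *target = std::atoi(argv[++i]);  // consume the option's value
    }
    return true;
}
```

After a successful parseArgs call, the program reads info.iterations_arg directly as an int, with no map lookup or type conversion at the point of use.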
User tsug303 suggested the library TCLAP. It looks quite nice. It would fit my need to divide the options into groups so that I could reuse code when multiple programs share some common options. Although it doesn't generate the source code from an XML configuration file format, I almost marked that answer as accepted.
But none of the suggested libraries fulfilled all of my requirements, so I started thinking about writing my own library. A sketch: a new tool that would take a custom XML format as input and generate both C++ code and an XML schema. Some other C++ code is generated from the XML schema with the tool CodeSynthesis XSD. The two chunks of C++ code are combined into a library. One extra benefit is that we get an XML Schema for the command line options and a way to serialize all of them into a binary format (in CDR format generated from CodeSynthesis XSD). I will see if I get the time to write such a library. Better, of course, is to find a library that has already been implemented.
Today I read about user Nore's suggested alternative. It looks promising and I will be eager to try it out when the planned C++ code generation has been implemented. The suggestion from Nore looks to be the closest thing to what I have been looking for.
Maybe this TCLAP library would fit your needs ?
May I suggest you look at this project. It is something I am currently working on: A XSD Schema to describe command line arguments in XML. I made XSLT transformations to create bash and Python code, XUL frontend interface and HTML documentation.
Unfortunately, I do not generate C/C++ code yet (it is planned).
Edit: a first working version of the C parser is now available. Hope it helps
I will add yet another project called protoargs. It generates C++ argument parser code from a protobuf .proto file, using cxxopts.
Unfortunately, it does not satisfy all of the author's needs: no documentation is generated, and there is no compile-time computation. However, someone may find it useful.
UPD: As mentioned in the comments, I must specify that this is my own project.

How to store the Visual C++ debug settings?

The debug settings are stored in a .user file which should not be added to source control. However, this file does contain useful information, and now I need to set it up again each time I build a fresh checkout.
Is there some workaround to make this less cumbersome?
Edit: It contains the debug launch parameters. This is often not really a per-user setting. The default is $(TargetPath), but I often set it to something like $(SolutionDir)TestApp\test.exe with a few command line arguments. So it isn't a local machine setting per se.
Well, I believe this file is human readable (XML format, I think?), so you could create a template that is put into source control and that everyone would check out, for instance settings.user.template. Each developer would then copy this to settings.user, or whatever the name is, and modify the contents to be what they need.
It's been a while since I've looked at that file, but I've done similar things numerous times.
Set the debug launch parameters in a batch file and add the batch file to source control. Then set the startup path in VS to startup.bat $(TargetPath).