Let's say I have a list of about 210,000 English words.
I need to use all of these 210,000 words as test cases.
I need to make sure every word in that list is covered every time I run my tests.
The question is: what is the best practice for storing these words in my tests?
Should I save all these words in a slice (would the slice be too large?), or should I save them in an external file (like words.txt) and load the file line by line when needed?
Test data is usually stored in a directory named testdata to keep it separate from the other source code and data files (see the output of the command go help test). The go tool ignores anything inside that directory.
210,000 words should take up only single-digit megabytes of RAM anyway, which isn't much. Just have a helper function that reads the words from the file before each test (perhaps caching them), or define a TestMain() function that reads them once and stores them in a global variable for the tests that run afterwards.
Edit: Regarding best practices, it's sometimes nicer to store test data in testdata even if the data isn't large. For example, I sometimes need to use multiple short JSON snippets in test cases, and perhaps use them more than once. Storing them in appropriately named files under a subdirectory of testdata can be more readable than littering Go code with a bunch of JSON snippets.
The slight loss of performance is generally not an issue in tests. Whichever method makes the code easier to understand could be the 'best practice'.
I'm finding myself writing a lot of boilerplate for my unit tests. I could cut down on that boilerplate significantly if I stored my unit test inputs along with the expected outputs in a CSV file and directed my test suite to read the inputs from that file, pass them to the function being tested, and then compare its output with the values in the file's expected-output column.
Is this considered good practice?
Instead of storing this in a separate file, I would recommend storing it in some kind of table (probably an array) inside your test code and iterating over that table. Most testing frameworks have specific support for this: in JUnit the feature is called parameterized tests. Then you don't even have to implement the iteration over that set of inputs and expected outputs yourself.
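As a rough illustration of the in-code table, here is a sketch in Go rather than JUnit. The function reverse and the rows in cases are hypothetical stand-ins for your own function and data; in a real test file the loop would live in a TestXxx function and report through t.Errorf.

```go
package main

import "fmt"

// reverse is a hypothetical function under test.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// cases is the table: each row pairs an input with its expected output.
var cases = []struct {
	name, in, want string
}{
	{"empty", "", ""},
	{"single", "a", "a"},
	{"word", "abc", "cba"},
}

// failures runs every row and returns a description of each mismatch.
func failures() []string {
	var out []string
	for _, c := range cases {
		if got := reverse(c.in); got != c.want {
			out = append(out, fmt.Sprintf("%s: reverse(%q) = %q, want %q", c.name, c.in, got, c.want))
		}
	}
	return out
}

func main() {
	f := failures()
	for _, msg := range f {
		fmt.Println("FAIL", msg)
	}
	if len(f) == 0 {
		fmt.Println("all", len(cases), "cases passed")
	}
}
```

Adding a test case is then a one-line change to the table, which is most of the boilerplate saving the CSV file was meant to provide.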
Well, a lot of questions have been asked about parsing XML in C++ and so on...
But, instead of a generic problem, mine is very specific.
I am asking for a very efficient XML parser for C++. In particular I have a VERY VERY BIG XML file to parse.
My application must open this file and retrieve data. It must also insert new nodes and save the final result in the file again.
At first I used rapidxml for this, but it requires me to open the file, parse all of it (the library has no way to access the file without loading the entire tree first), then edit the tree and write the final tree back by overwriting the file. It consumes too many resources.
Is there an XML parser that does not require me to load the entire file, but that I can use to insert, quickly, new nodes and retrieve data? Can you please indicate solutions for this problem of mine?
You want a streaming XML parser rather than what is called a DOM parser.
There are two types of streaming parsers: pull and push. A pull parser is good for quickly writing XML parsers that load data into program memory. A push parser is good for writing a program to translate one document to another (which is what you are trying to accomplish). I think, therefore, that a push parser would be best for your problem.
In order to use a push parser, you need to write what is essentially an event handler for parsing events. By "parsing event", I mean events like "start tag reached", "end tag reached", "text found", "attribute parsed", etc.
I suggest that as you read in the document, you write out the transformed document to a separate, temporary file. Thus, your XML parsing event handlers will need to be written so that they are stateful and write out the XML of the translated document incrementally.
Three excellent push parser libraries for C++ include Expat, Xerces-C++, and libxml2.
Search for "SAX parser". They are mostly tokenizers, i.e. they emit tag by tag without building a tree.
SAX parsers are typically faster than DOM parsers and use far less memory, because a DOM parser builds an in-memory representation of the entire XML document, whereas a SAX parser behaves like an event listener and reports each element as it reads the file, without building a tree.
As you mentioned, Xerces is a good C++ SAX parser.
I would recommend looking into ways of breaking the XML document into smaller XML documents as that seems to be part of your problem.
Okay, here is one off the beaten track: asmxml. I looked at it but haven't really used it myself. Its authors claim performance bar none; the downside is that you need x86 assembler.
If you really want a high-performance XML stream parser, then libhpxml is likely the right thing for you.
I’m convinced that no XML library exists that allows you to modify a file without loading it first. This simply isn’t possible because files don’t work that way: you cannot insert (or remove) data in the middle of a file. You can only overwrite a block with one of identical size, or append at the end. But your request would require inserting or removing data in the middle of the file.
Reading only parts of an XML file may be possible. But writing … no way.
Go for template libraries as much as possible, like Boost::property_tree, Boost::XMLParser, or POCO::XML; Folly also has an XML parser in it.
Avoid old C libraries; they all use old code designs.
Some people say the QtXML module performs well on huge XML files.
What is the best way to unit test code that generates images? Say, for example, a class that generates a plot or a chart?
If this class uses a third-party library to generate plots/charts (say matplotlib), then you can write tests for the methods that generate input for the library. This will be fairly easy.
If the output is an image and you are interested in verifying its properties then you will have to dig deeper. External image attributes (size, height, format etc.) can be easily verified but others such as the actual contents of the image would be quite hard. IMHO that wouldn't be worth the trouble.
If the output is non-binary (say SVG) then you can easily write tests to ensure that the output XML contains what you are looking for.
A method that I've used is to generate a "known-good" file, store it in your source tree, and then do a binary compare against it as part of the test. If the file contents match, the output hasn't changed.
This doesn't allow you to test all possible combinations of input that would generate the image, but is useful for basic regression tests.
I have a schema (xsd), and I want to create xml files that conform to it.
I've found code generators that generate classes which can be loaded from an xml file (CodeSynthesis). But I'm looking to go the other direction.
I want to generate code that will let me build an object which can easily be written out as an xml file. In C++. I might be able to use Java for this, but C++ would be preferable. I'm on Solaris, so a Visual Studio plugin (such as xsd2code) won't help me.
Is there a code generator that lets me do this?
To close this out: I did wind up using CodeSynthesis. It worked very well, as long as I used a single xsd as its source. Since I actually had two xsds (one imported the other), I had to manually merge them (they did some weird inheritance that needed manual massaging).
But yes, CodeSynthesis was the way to go.