Which .xml dump to use for wikidata? - wiki

I'm looking at the Wikidata .xml dumps and am getting a little lost.
I'm looking for (ideally) a complete dump in the format of the add/delete daily .xml dumps found at https://dumps.wikimedia.org/other/incr/wikidatawiki/ as it is the edit history that I'm interested in.
Thanks!

Related

Best way to parse a complex log file?

I need to parse a log file that consists of many screenshots of real-time OS stdout.
In particular, every section of my log_file.txt is a text version of what appears on screen. This machine has no monitor, so the stdout is written to a downloadable log_file.txt.
The aim is to create a .csv from this file for data mining purposes, but I'm still wondering what the best method to process this file would be.
I would like the first line of the csv file to hold the description (string) of each value, and every line from the second onward to hold the respective values (int).
I was thinking about a parser generator (JavaCC, ANTLR, etc.), but before starting with them I would like to get some opinions.
Thank you.
P.S.
I put a short version of my log at the following link: pastebin.com/r9t3PEgb

Beginner - data storage through XML or text files

I am a beginner in Visual Studio and have only coded C and C++ in command-line settings.
Currently, I am taking a module (software development) which requires me to come up with an expense tracker: a program that helps a user track his/her daily expenses. Therefore, at the end of each day, or after a user finishes using the program, we have to store all the info in one place, and then load it again the next time the program is used.
My constraints include not using any relational database (although I have no idea what that is :( ). Data storage must be done using XML or text files. Following this, I have several questions regarding data storage:
1) If the data is stored successfully, do we load it every time we start the program? And every time the user closes the program, do we overwrite the existing data file and store the data again?
2) I have heard from some people that using a text file may be easier. Searching on the internet and in the library only gives me information about XML and not text. Would anyone be able to help me with it, for example with links to tutorials?
Thank you very much!
File writing/handling works much like every other stream in C++.
You can enable file handling by including the fstream header. You can create a file, write to it, and overwrite it every time the program is run, or you can create the file the first time the program is run and then append to it on every subsequent run.
I've only ever done text files and never tried XML, but I'm guessing they're similar.
http://www.cplusplus.com/doc/tutorial/files/ should give you everything you need to know.
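To make the overwrite-versus-append choice concrete, here is a minimal sketch using std::ofstream. The helper names writeLine and countLines are made up for illustration and are not part of any library; the only thing that changes between the two behaviours is the open mode (std::ios::trunc or std::ios::app).

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes one line to `path`.
// append == false: the file is truncated first (previous contents are lost).
// append == true:  the line is added after the existing contents.
// Returns false if the file could not be opened or written.
bool writeLine(const std::string& path, const std::string& line, bool append) {
    const std::ios::openmode mode = append ? std::ios::app : std::ios::trunc;
    std::ofstream out(path, std::ios::out | mode);
    if (!out) {
        return false;
    }
    out << line << '\n';
    return static_cast<bool>(out);
}

// Hypothetical helper: counts the lines currently stored in `path`.
// A missing file simply yields 0.
int countLines(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    int count = 0;
    while (std::getline(in, line)) {
        ++count;
    }
    return count;
}
```

For example, calling writeLine("expenses.txt", "coffee 2.30", false) starts the file fresh, and a later writeLine("expenses.txt", "sandwich 5.00", true) adds a second line to it. The file name here is only a placeholder.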
Your choice of XML vs plain text depends on the kind of data that you'll be storing.
The reason you'll mostly find XML libraries on the internet is that XML is a lot more complicated than plain text, whereas plain text needs nothing beyond the standard file streams. If you don't know what XML is, or if the data that you're storing isn't very complex, then I would suggest going with plain text.
For example, to track expenses, you might store a file like this:
sandwich 5.00
coffee 2.30
soft drink 1.50
...
It's very easy to read/write lines like this to/from a file in C++.
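As a rough sketch of what that could look like, the code below reads and writes lines in the format shown above. It assumes that the amount is always the last space-separated token on the line, so that names like "soft drink" can contain spaces. The Expense struct and the function names are invented for this example.

```cpp
#include <cstddef>
#include <exception>
#include <fstream>
#include <iomanip>
#include <string>
#include <vector>

// Hypothetical record type for one line of the expense file.
struct Expense {
    std::string name;
    double amount;
};

// Parses "name amount", splitting at the LAST space so the name may
// itself contain spaces. Returns false for lines that do not match.
bool parseExpense(const std::string& line, Expense& out) {
    const std::size_t pos = line.find_last_of(' ');
    if (pos == std::string::npos || pos == 0 || pos + 1 >= line.size()) {
        return false;
    }
    const std::string amountText = line.substr(pos + 1);
    try {
        std::size_t used = 0;
        const double amount = std::stod(amountText, &used);
        if (used != amountText.size()) {
            return false;  // trailing junk after the number
        }
        out.name = line.substr(0, pos);
        out.amount = amount;
        return true;
    } catch (const std::exception&) {
        return false;  // not a number at all
    }
}

// Overwrites `path` with one "name amount" line per expense.
bool saveExpenses(const std::string& path, const std::vector<Expense>& items) {
    std::ofstream out(path, std::ios::out | std::ios::trunc);
    if (!out) {
        return false;
    }
    out << std::fixed << std::setprecision(2);
    for (const Expense& e : items) {
        out << e.name << ' ' << e.amount << '\n';
    }
    return static_cast<bool>(out);
}

// Reads every well-formed line from `path`; malformed lines are skipped.
std::vector<Expense> loadExpenses(const std::string& path) {
    std::vector<Expense> items;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        Expense e;
        if (parseExpense(line, e)) {
            items.push_back(e);
        }
    }
    return items;
}
```

A program built on this would call loadExpenses once at startup and saveExpenses once before exiting. Skipping malformed lines silently is just one possible choice; reporting them to the user may be preferable in a real tracker.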

Parsing XML file while it is being written in Qt

I have a process that is writing an XML file. I am writing an application that wants to parse the XML that is being written. The constraint here is that I want to parse the XML as it is being written. The XML is not written entirely all at once, and will be written gradually. How can I accomplish this using Qt?
You can accomplish this with QXmlStreamReader. It will report a QXmlStreamReader::PrematureEndOfDocumentError if it runs out of data while you're parsing, but that error is recoverable: once the writer has produced more of the file, you can pass the new data to the reader with addData() and continue parsing from where it stopped.
The QXmlStreamReader documentation also contains information about incremental parsing.

how to get xsl from existing pdf?

Is it possible to get the .xsl file from an existing .pdf file?
I know that with Apache FOP you can get a .pdf file from a .xml and .xsl but I would like to go in the other direction. Any idea?
In short: Apache FOP gives XML+XSL->PDF, but is PDF->XSL somehow possible?
The reason why I would like to do that is because I want to open a PDF that has a form inside, edit it by adding some information to the form, and then save it again as a PDF.
I already have the edited form as .xml and I'm trying to generate the PDF, but then I need a .xsl file for the layout, so I thought that maybe I could reuse the layout from the original PDF, as they will be the same. Is there any better approach? I would like to avoid creating a specific XSL file for every form.
Thanks
Definitely not the XSLT file, since that's not even part of what FOP does. FOP only works with FO documents; the fact that it allows you to use XML+XSLT to get the FO source is just a nice usability feature. Once it has the FO file, it doesn't know how that file was obtained, so it has no way to embed the XSLT file in the PDF.
You could post-process the PDF file using another tool, like PDFBox, to embed any metadata you want.

Difference in file size of an Excel file when downloading directly as opposed to open and saving it

Maybe the title of my question is really awful, but I couldn't figure out a better way to frame it. The problem is that I have a Silverlight web app that does some processing and generates an Excel file as output. The Excel generation code uses the OpenXML format to create the various XML parts and packages, and I compress the generated file using System.Packaging.CompressionOptions. Now, when the browser (IE 9) shows the download options box, if I click Open to open the file in Excel and then do a Save As, the file is saved with a further reduced size. If I hit Save directly on the download box instead, the file is saved with whatever size it was created with.
Any ideas why these two ways of saving the same file result in different sizes?
Cheers
Depending on how you used the OpenXML library, there might be some inefficiencies or errors in the generated file. Resaving the file in Excel will fix any duplicate formatting, update the metadata (possibly reducing it), and fix any validation errors. I recommend getting the Open XML SDK 2.0 Productivity Tool, provided with the OpenXML SDK, to check for validation errors and to better understand where more inefficiencies might lie. It is also possible to resave the file automatically through Excel by using Interop (from C#, at least).