I want to convert all the .odt .doc .xls .pdf files to .txt files.
I'd like to do this with a shell script or a Perl script.
There's a program for odt files and the like:
odt2txt - available in the repos.
$ unoconv --format=txt document1.odt
Should produce document1.txt.
OpenOffice has a built-in document converter capable of handling a bunch of formats. Take a look at unoconv: http://dag.wieers.com/home-made/unoconv/
That being said, I have had some trouble getting it to work in the past. If you run into problems too, take a look at similar programs for AbiWord (another open source word processor).
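To loop over a whole directory, something along these lines might work. It's only a sketch: the unoconv --format=txt call is taken from the answer above, but using pdftotext (from poppler-utils) for PDFs and the csv format for spreadsheets are my own assumptions, and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path

# Extension -> command prefix.
# "unoconv --format=txt" comes from the thread; the csv format for .xls and
# the pdftotext tool for .pdf are assumptions, adjust them to your setup.
CONVERTERS = {
    ".odt": ["unoconv", "--format=txt"],
    ".doc": ["unoconv", "--format=txt"],
    ".xls": ["unoconv", "--format=csv"],
    ".pdf": ["pdftotext"],
}


def build_command(path):
    """Return the argv list for converting `path`, or None if unsupported."""
    prefix = CONVERTERS.get(Path(path).suffix.lower())
    if prefix is None:
        return None
    return prefix + [str(path)]


def convert_tree(root, dry_run=True):
    """Walk `root`, convert every supported file, return the commands used."""
    commands = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        cmd = build_command(path)
        if cmd is None:
            continue
        commands.append(cmd)
        if not dry_run:
            # Raises CalledProcessError if a converter fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling convert_tree(".") only lists what would be run; pass dry_run=False to actually invoke the converters.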
For Word documents, you can try antiword, at least on Linux. It's a command-line utility that takes a Word document as an argument and writes the text from that document (as best as it can figure) to standard output. Maybe you can specify an output file too; I can't remember the details of how it works, as I haven't used it in a while. Not sure if it can handle OO documents.
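Since the tool prints to standard output, redirecting that output into a file is enough. A small sketch of the idea, where save_stdout is a hypothetical helper name and antiword is assumed to be installed and on the PATH:

```python
import subprocess


def save_stdout(argv, out_path):
    """Run `argv` and write whatever it prints on stdout to `out_path`."""
    with open(out_path, "wb") as out:
        result = subprocess.run(argv, stdout=out)
    # A non-zero return code means the tool reported a problem.
    return result.returncode


# Intended use (assumes antiword is installed):
#   save_stdout(["antiword", "letter.doc"], "letter.txt")
```

The helper does not know anything about antiword itself, so it works the same for any converter that writes to standard output.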
It's certainly possible to do this, though there is something strange and impenetrable about the OO project and its documentation that makes things like this hard to research and follow. However, OO has the capability to convert all of those types, not just the OO native ones, and it can do it via two different forms of automatic control.
These are the two general approaches.
You can start OO and tell it to execute a macro which does this job for you for a given file. You then just have to write the macro and a script to loop over your files. The syntax is something like
$ oowriter -headless filename macro://dir/Standard.Module1.sMySub
The other thing OO has is a network API. This is based on something called UNO.
$ oowriter -accept=accept-string
Notifies the OpenOffice.org software that upon the creation of
"UNO Acceptor Threads", a "UNO Accept String" will be used.
You will need some sort of client library. I think they have one for Python at least. Using this technology a Python program or some other scripting language with an OO client library could drive the program and convert all the files. Since OO reads MSO, it should be able to do all of them.
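For what it's worth, a Python client could look roughly like the sketch below. It assumes the office process was started with an accept string of socket,host=localhost,port=2002;urp; (the port is arbitrary) and that the script runs under the Python that ships with the office suite, since the uno module comes from there. The "Text" filter name is the one for word processor documents; spreadsheets would need a different filter.

```python
from pathlib import Path

# Assumed connection details; must match the -accept string used at startup.
ACCEPT = "socket,host=localhost,port=2002;urp;"


def file_url(path):
    """Turn a filesystem path into the file:// URL form that UNO expects."""
    return Path(path).resolve().as_uri()


def convert_to_text(src, dst):
    """Ask a running office instance to save `src` as plain text at `dst`."""
    import uno  # provided by the office suite's Python, not by PyPI
    from com.sun.star.beans import PropertyValue

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve("uno:" + ACCEPT + "StarOffice.ComponentContext")
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    hidden = PropertyValue()
    hidden.Name, hidden.Value = "Hidden", True
    doc = desktop.loadComponentFromURL(file_url(src), "_blank", 0, (hidden,))
    try:
        # "Text" is the plain-text export filter for text documents (assumption:
        # the input is a word processor document, not a spreadsheet).
        text_filter = PropertyValue()
        text_filter.Name, text_filter.Value = "FilterName", "Text"
        doc.storeToURL(file_url(dst), (text_filter,))
    finally:
        doc.close(True)
```

A script would then call convert_to_text once per file while the office process keeps running in the background.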
Open the file in LibreOffice. Click "File", then "Save As", and scroll down to find the text option. Click that and it will be saved as a text file.
FYI, I had an *.ODT file that was 339.2 KB in size. When I saved it as text, the file shrank to ONLY 5.0 KB. Another reason for saving your files as text files.
For the Microsoft formats, look into the wvWare tools.
Open the .ods file normally in LibreOffice
Highlight text to be converted
Open a terminal
Run vi
Press "i" to get insert mode
Press ctrl-shift-v
Done!
Need some formatting?
Save the file as
Get out of vi
Run:
$cat | column >filename2
This worked in openSUSE running KDE
Substitute "kwrite" for "vi", if you want
I can't seem to find the answer: how do you create a new file in OCaml? Do you edit your file in the terminal? Where does the source code appear?
I think you're asking how to write code in OCaml, i.e., how to create an OCaml source file. (This isn't completely clear. You could be asking how to write OCaml code that creates a file.)
The details of creating OCaml source depend on your development environment, not on the language itself. So there is no one answer.
The general answer is that you can use any tool you like that knows how to create a text file. If you like working from the command line (as I do) you can work in a terminal environment and run some kind of vintage text editor from the last millennium (as I do). If you like a GUI environment, you can run some kind of "programmer's editor" from the current millennium, or really any kind of editor that creates basic UTF-8 files (or even ASCII files).
Generally the editor will have to be told where to store the files that you edit. You would probably want to make some kind of folder for the project and make sure you store the text files in there.
I hope this helps! If you have any programmers nearby, they can probably get you started a lot faster than asking on StackOverflow.
So I would like to make a program like this: we choose .txt file to open and it makes a plot based on it. The format of data is:
12:52:11 30.2
12:53:52 31.2
etc.
I'm thinking about doing it in C++, but first I want to check whether there's an easier option than Gnuplot (because I'm using Windows XP and can't use Linux terminal commands). I've seen DatPlot, but it's too inconvenient for use with multiple files. What do you recommend? I would be very grateful for any help. Cheers. :)
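Whatever ends up drawing the plot, the file has to be parsed first. A minimal sketch of that step, assuming every data line is "HH:MM:SS value" separated by whitespace as in the sample above (parse_readings is a made-up name):

```python
from datetime import datetime


def parse_readings(lines):
    """Turn 'HH:MM:SS value' lines into (time, float) pairs."""
    readings = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        stamp = datetime.strptime(parts[0], "%H:%M:%S").time()
        readings.append((stamp, float(parts[1])))
    return readings
```

The resulting list of pairs can be handed to whichever plotting tool is chosen, and the same function can be called in a loop when there are multiple files.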
I'm writing a C++ program that works with files, and I need to take input from existing files such as DOC and PDF files. How do I program that in C++? And after getting the input, how can I write those details into a new DOC or PDF file? Can anyone explain this with an example?
C++ as a language doesn't equip you with features such as "write to DOC file" or "read from PDF file". The only thing available to you as a programmer is raw byte-by-byte reading and writing. To make your brand-new file PDF/DOC/etc. compatible, you have to conform to the chosen file format. The same goes for reading: you need to understand which portions of the raw byte array are responsible for what.
In general, this task is called "parsing" or "serialization". It's a good idea to use one of the existing parsers for the particular file format instead of reinventing the wheel. Moreover, some file formats may be covered by patents, so you may not be allowed to work with them without purchasing a license.
Some clues so far:
PDF parsing in C++ (PoDoFo)
Microsoft word Text Parser in "C"
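To give an idea of what "understanding the raw bytes" means, here is a tiny sketch that only looks at the first few bytes of a file. It is in Python rather than C++ to keep it short, and the function names are made up. PDF files begin with %PDF-, and legacy .doc/.xls files are OLE compound files that begin with the bytes D0 CF 11 E0 A1 B1 1A E1. This is just format detection; a real parser has to go much further.

```python
PDF_MAGIC = b"%PDF-"
# Signature of the OLE compound file container used by legacy .doc/.xls files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_format(data):
    """Guess the container format from the leading bytes of a file."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(OLE_MAGIC):
        return "ole-compound"
    return "unknown"


def sniff_file(path):
    """Read just enough of `path` to check its signature."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Everything after the signature (object tables, streams, text encoding) is where the existing libraries earn their keep.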
There are some libraries available on the web now (the question is from 2013; maybe there weren't many at that time).
Apart from the links in the selected answer, you can try PDFTron. It also supports newer features, e.g. linearization.
Here is one of their samples:
https://www.pdftron.com/documentation/samples/cpp/TextExtractTest
(That program itself contains 4 if blocks, with slightly different features of the library/SDK, to try)
There should be more, search on the web for PDF parsing libraries.
What's a cross-platform way for getting a user-friendly description of a file?
Examples:
foo.pdf -> "Portable Document Format (PDF)"
bar.doc -> "Microsoft Word Document"
Pointers to libraries or appropriate system APIs would be highly appreciated.
A Qt/C++ solution is preferred but anything is fine.
Target platforms are Windows and Mac OS X. I'd prefer the descriptions to match what would be found in Explorer or Finder if possible (rather than maintaining a map of extensions -> descriptions myself).
The file command is built in on Linux and OS X, and there is a version available for Windows (http://gnuwin32.sourceforge.net/packages/file.htm).
File tests each argument in an attempt to classify it. There are three
sets of tests, performed in this order: filesystem tests, magic number
tests, and language tests. The first test that succeeds causes the
file type to be printed. The type printed will usually contain one of
the words text (the file contains only printing characters and a few
common control characters and is probably safe to read on an ASCII
terminal), executable (the file contains the result of compiling a
program in a form understandable to some UNIX kernel or another), or
data meaning anything else (data is usually `binary' or
non-printable). Exceptions are well-known file formats (core files,
tar archives) that are known to contain binary data.
You could invoke the file command using QProcess and display the returned info.
Output looks like :
$ file document.pdf
document.pdf: PDF document, version 1.5
$ file test.txt
test.txt: ASCII text, with CRLF, CR, LF line terminators
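The same idea (start the process, read its output, keep the description) sketched in Python instead of QProcess. The function names are hypothetical, and the --brief option, which makes file print only the description, is assumed to be supported by the installed version:

```python
import shutil
import subprocess


def parse_file_line(line):
    """Split a 'name: description' line as printed by file into two parts."""
    name, _, description = line.rstrip("\n").partition(": ")
    return name, description


def describe(path):
    """Return file's description of `path`, or None if file is unavailable."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", path], capture_output=True, text=True)
    return result.stdout.strip() or None
```

Note that parse_file_line splits at the first ": ", so a file name containing that sequence would be cut short; using --brief as in describe avoids the problem.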
The closest that I think you can get out of Qt is QFileInfo.
Windows keeps track of the mapping through the registry, which can be accessed through Qt's QSettings. But from brief research, it sounds like it might be kind of tricky to mimic Explorer's mapping.
You can also launch the file with the default handler using QDesktopServices::openUrl().
I haven't researched how or where OSX keeps track of the file type description information.
Hope that helps.
I'm trying to write a simple GUI for Wget. I'm looking for advice on how to read information from the command line output that Wget generates during a run. I'd like to update that download information in real time in a list box or some equivalent. The GUI will be in Visual Basic. I know programs like WinWget do this, and their source code is available, but I don't know the language it's written in well enough to find what I'm looking for.
tl;dr: I need to update a list box real time with command line output.
There are two ways to use the output of one console application as the input of another:
The first way is to use the | operator; for example:
dir | more
The second way is to write the data into a file and process it later.
dir > data.txt
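For real-time updates, a third possibility is to start the program from your own code and read its output through a pipe as it is produced. The sketch below shows the idea in Python rather than Visual Basic; stream_lines and the on_line callback are made-up names. Standard error is merged into the pipe because Wget writes its progress messages there.

```python
import subprocess


def stream_lines(argv, on_line):
    """Run `argv` and call `on_line` for each output line as it arrives."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # Wget reports progress on stderr
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )
    for line in proc.stdout:
        on_line(line.rstrip("\n"))
    return proc.wait()
```

In a GUI, on_line would be the function that appends a row to the list box, and stream_lines would normally run on a background thread so the window stays responsive.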