I've written some code in Python that opens and reads a file saved in .hdf5 format. If I don't have an HDF viewer installed on my computer, can this code still run?
I'm not using it to open the file so I can look at it; I'm trying to read the file and extract data before manipulating it in Python. Probably a silly question, but I'm very new to this coding thing. Would my code be unable to open a file type that cannot be opened on my computer?
Cheers,
Claire
An hdf5 viewer is a program that knows how to interpret the contents of an hdf5 file — much like the program you are trying to write. So your program would at least need to include code (most likely in the form of a module) that knows how to do that.
I'm not very familiar with HDF5. Since the file is binary, it isn't human readable, so you can't work with its contents directly; you need something, such as a viewer or a library, to "decode" the binary data.
According to this though, I think the answer is you don't have to have the hdf5 viewer to run your code.
So the python code has h5py but the code itself does not need the computer I'm working on to have the hdfviewer software.
It is still capable of running the code without opening the file with the hdfviewer.
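To illustrate the point, here is a minimal sketch of what reading such a file from Python might look like. It assumes the third-party h5py package mentioned above is installed; the function names and the dataset name passed in are hypothetical. The first helper uses only the standard library and checks the 8-byte HDF5 signature at the start of the file, which shows that Python can open the file as plain bytes whether or not a viewer exists on the machine.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of a typical HDF5 file


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature.

    Only offset 0 is checked; files with a user block keep the
    signature at a later offset and are not detected by this sketch.
    """
    with open(path, "rb") as fh:
        return fh.read(8) == HDF5_SIGNATURE


def read_dataset(path, name):
    """Read one whole dataset into memory with h5py (assumed installed)."""
    import h5py  # third-party; imported here so the check above works without it

    with h5py.File(path, "r") as f:
        return f[name][()]  # [()] copies the full dataset out of the file
```

Neither function needs a viewer program; the library does the decoding.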
I am able to read the raw data from the corrupted file system of a USB drive.
Is there any simple way for me to recover only text and docx files from this raw data? (Programming Language: C++)
It might be possible to do it, but it won't be simple.
First of all, you will need to parse the file system (I assume it's FAT32 from the tags). In fact, you will need to parse the File Allocation Table (if it's corrupted and a mirror copy of the FAT was enabled on your drive, you can try that copy instead). Depending on the corruption, it might be possible to extract some files. Read this article for more info about the FAT32 structure; you can use this Microsoft specification as a stricter guide. A good way to understand the file system is to make a small USB or logical drive with a sample file and parse it manually using a hex editor (the free wxHexEditor or the proprietary WinHex, for example).
You can try to search for sequences of ASCII characters in your raw image, but then you will need to sort them manually.
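The ASCII search could look roughly like the sketch below. It is written in Python rather than the C++ the question asks about, purely to keep the idea short; the function name and the minimum run length are assumptions, and the raw image is assumed to fit in memory as a bytes object.

```python
import re


def ascii_runs(raw, min_len=8):
    """Return (offset, text) pairs for runs of printable ASCII in raw bytes."""
    # Printable characters plus tab, carriage return and newline.
    pattern = re.compile(rb"[\x20-\x7e\t\r\n]{" + str(min_len).encode() + rb",}")
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(raw)]
```

The offsets tell you where each fragment sits in the image, which helps when sorting the pieces by hand afterwards.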
As for docx, internally this format is a collection of XML files and resources compressed into a ZIP archive, so restoring it from a raw hex image is a far more complicated task.
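Because a docx is a ZIP archive, one thing that can be tried is scanning the raw image for ZIP signatures and carving out candidates. The following Python sketch assumes the file was stored contiguously (fragmented files will not survive this) and that the image fits in memory; the function names are hypothetical. It looks for the local file header signature `PK\x03\x04`, cuts up to the next end-of-central-directory record, and keeps the slice only if it opens as a ZIP containing `word/document.xml`.

```python
import io
import zipfile

LOCAL_HEADER = b"PK\x03\x04"  # ZIP local file header signature
END_OF_DIR = b"PK\x05\x06"    # ZIP end-of-central-directory signature


def find_local_headers(raw):
    """Return every offset where a ZIP local file header signature occurs."""
    offsets = []
    pos = raw.find(LOCAL_HEADER)
    while pos != -1:
        offsets.append(pos)
        pos = raw.find(LOCAL_HEADER, pos + 1)
    return offsets


def carve_docx(raw, start):
    """Try to carve a contiguous docx beginning at start; return bytes or None."""
    eocd = raw.find(END_OF_DIR, start)
    if eocd == -1 or eocd + 22 > len(raw):
        return None
    # The fixed part of the record is 22 bytes; the last 2 give the comment length.
    comment_len = int.from_bytes(raw[eocd + 20:eocd + 22], "little")
    candidate = raw[start:eocd + 22 + comment_len]
    try:
        with zipfile.ZipFile(io.BytesIO(candidate)) as zf:
            names = zf.namelist()
    except zipfile.BadZipFile:
        return None
    return candidate if "word/document.xml" in names else None
```

An archive contains one local header per member, so only the first header of each archive is a sensible starting point; later ones belong to the same file.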
I want to split a big file into smaller ones without copying parts of the file, and without using filestream or functions which use it (if that is possible).
Imagine we have a big file which consists of 3 files:
[[File1bytes][File2bytes][File3bytes]]
In my opinion we can do this with these steps:
Use SetEndOfFile function to truncate the bytes of the last file ([File3bytes] in our example)
Somehow force our file system to recognize those truncated bytes ([File3bytes]) as a real file (maybe by adding some info to MFT table, or doing something with NTFS if it is possible, or using some function or method which can do all mentioned for us).
Any suggestions?
How about creating a file system nested over the existing file system where the very large file actually resides, and defining some IOCTL commands for splitting? Check this link:
How can I write my own 'filesystem' within Windows?
Currently I'm just saving the file as MS-DOS CSV with excel. I'm looking for the fastest way (in terms of writing the code) of doing it automatically.
I strongly prefer C++, but any simple executable program I can call from a C++ app would do.
Unzip the xlsx file with e.g. WinZip and have a look at the resulting files. This may help.
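The same inspection can be done programmatically, since an xlsx file is a ZIP archive of XML parts. This is a small Python sketch (not C++, and the function name is hypothetical) that only lists what is inside; turning the sheet XML into CSV would still require parsing those parts.

```python
import zipfile


def list_xlsx_parts(source):
    """Return the names of the parts inside an xlsx file.

    source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()
```

In a typical workbook the worksheets show up under names such as `xl/worksheets/sheet1.xml`, though the exact layout depends on the program that wrote the file.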
I want to convert all the .odt .doc .xls .pdf files to .txt files.
I want to convert these files to text files using a shell script or a perl script
There's a program for odt files and the like:
odt2txt - available in repos.
$ unoconv --format=txt document1.odt
Should produce document1.txt.
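For .odt files specifically, a rough idea of what such a converter does can be sketched with the Python standard library alone (Python rather than shell or Perl, and not the actual implementation of any tool named here). An .odt file is a ZIP archive whose body text lives in `content.xml`; the sketch collects the text of paragraph and heading elements. The function name is hypothetical, and special elements such as tabs or repeated spaces are ignored.

```python
import xml.etree.ElementTree as ET
import zipfile

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {"{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS}


def odt_to_text(source):
    """Extract paragraph and heading text from an .odt file (path or file object)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    lines = []
    for elem in root.iter():
        if elem.tag in PARAGRAPH_TAGS:
            # itertext() also picks up text nested in spans and similar children.
            lines.append("".join(elem.itertext()))
    return "\n".join(lines)
```

Formatting, images, and tables are lost, which is usually acceptable when plain text is the goal.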
OpenOffice has a built-in document converter capable of handling a bunch of formats. Take a look at unoconv: http://dag.wieers.com/home-made/unoconv/
That being said, I have had some trouble getting that to work in the past. If you're having trouble, take a look at similar programs for AbiWord (another open source word processor).
For Word documents, you can try antiword, at least on Linux. It's a command-line utility that takes a Word document as an argument and spits out the text from that document (as best as it can figure) to standard output. Maybe you can specify an output file too. I can't remember the details of how it works; I haven't used it in a while. Not sure if it can handle OO documents.
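The command-line tools mentioned so far could be driven from a small script that picks a tool by file extension. The sketch below is in Python rather than shell or Perl, the function names are hypothetical, and it assumes unoconv and antiword are installed and behave as described in this thread (unoconv writes the .txt itself, antiword prints to standard output). Other extensions such as .xls and .pdf are deliberately left unmapped.

```python
import shutil
import subprocess
from pathlib import Path


def plan_conversion(path):
    """Return (argv, capture_stdout) for a file, or None if no tool is mapped."""
    ext = path.suffix.lower()
    if ext == ".odt":
        # Same invocation as shown earlier; unoconv writes document.txt itself.
        return ["unoconv", "--format=txt", str(path)], False
    if ext == ".doc":
        # antiword prints the text, so the caller has to save stdout.
        return ["antiword", str(path)], True
    return None


def convert(path):
    """Convert one file; return False if it is unmapped or the tool is missing."""
    plan = plan_conversion(path)
    if plan is None or shutil.which(plan[0][0]) is None:
        return False
    argv, capture = plan
    if capture:
        result = subprocess.run(argv, capture_output=True, check=True)
        path.with_suffix(".txt").write_bytes(result.stdout)
    else:
        subprocess.run(argv, check=True)
    return True
```

Looping over a directory is then a matter of calling `convert(p)` for each `p` in `Path(".").iterdir()`.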
It's certainly possible to do this, though there is something strange and impenetrable about the OO project and its documentation that makes things like this hard to research and follow. However, OO has the capability to convert all of those types, not just the OO native ones, and it can do it via two different forms of automatic control.
These are the two general approaches.
You can start OO and tell it to execute a macro which does this job for you for a given file. You then just have to write the macro and a script to loop over your files. The syntax is something like
$ oowriter -headless filename macro://dir/Standard.Module1.sMySub
The other thing OO has is a network API. This is based on something called UNO.
$ oowriter -accept=accept-string
Notifies the OpenOffice.org software that upon the creation of
"UNO Acceptor Threads", a "UNO Accept String" will be used.
You will need some sort of client library. I think they have one for Python at least. Using this technology a Python program or some other scripting language with an OO client library could drive the program and convert all the files. Since OO reads MSO, it should be able to do all of them.
Open the file in LibreOffice. Click on "File", then "Save As", and scroll down to find the text option. Click that and it will be saved as a text file.
FYI, I had an *.ODT file that was 339.2 KB in size. When I save-as text the size of the file shrunk to ONLY 5.0 KB. Another reason for saving your files as text files.
For the Microsoft formats, look into the wvWare tools.
Open the .ods file normally in LibreOffice
Highlight text to be converted
Open a terminal
Run vi
Press "i" to get insert mode
Press ctrl-shift-v
Done!
Need some formatting?
Save the file as
Get out of vi
Run:
$cat | column >filename2
This worked in opensuse running KDE
Substitute "kwrite" for "vi", if you want