I am trying to make a program that automatically scans the images or text documents on a user's desktop and converts them to .txt files for text analysis.
So far I have found source code to convert PDF and HTML into .txt. However, I would like my program to scan the desktop screen automatically at certain time intervals rather than manually inputting the source, such as:
$ pdf2txt.py samples/simple1.pdf
I don't know where to start, so any suggestions would be appreciated.
First of all, the desktop is just a location in the file system, like:
C:\Users\Kirsteen\Desktop
So the next step would be to search through this directory for the types of files you are interested in. You'd be aiming to generate a list of valid file names that need to be converted. This Q/A might help you.
Once the files have been found, run the conversion scripts you already have. To repeat this automatically, put all of this in a loop and add a delay so that it runs once an hour/week.
To tidy things up, think about running this process in the background and making sure the program doesn't re-convert files that haven't changed.
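A minimal sketch of that loop, in C++17 (the desktop path comes from the answer above; treat it and the pdf2txt.py invocation as placeholders for your own setup):

#include <chrono>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <thread>

namespace fs = std::filesystem;

int main() {
    // Placeholder path -- point this at the actual desktop directory.
    const fs::path desktop = "C:/Users/Kirsteen/Desktop";

    while (true) {
        // Convert every PDF currently sitting on the desktop.
        for (const auto& entry : fs::directory_iterator(desktop)) {
            if (entry.path().extension() == ".pdf") {
                const std::string cmd =
                    "pdf2txt.py \"" + entry.path().string() + "\"";
                std::system(cmd.c_str());  // shell out to the converter
            }
        }
        // Wait before scanning again (once an hour here).
        std::this_thread::sleep_for(std::chrono::hours(1));
    }
}

To avoid converting the same file twice, you could record each file's fs::last_write_time on every pass and skip entries that haven't changed.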
There is a program that accepts images containing license plates as arguments.
This program extracts the license plate itself, plus an image for each letter and digit, from the image it receives as an argument.
The program is used as follows:
C:\Users\gksql\Desktop\PlateRec>PlateRecog_saveDigits_Location someimage.jpg
This will create one image_Plate.jpg file that contains the license plate itself, and several image_Digits_(number).jpg files that contain the individual numbers or characters.
The problem is that the execution command itself is quite long, and there are at least a few thousand images.
That's why I want to run this PlateRecog_saveDigits_Location.exe file against every image containing a license plate in a given folder.
As a matter of fact, I cannot change the source of PlateRecog_saveDigits_Location at this time.
I want to know how to iterate through all the images in a folder on Windows.
Thank you for your wisdom.
The post is tagged c++, bash, shell...
It looks like you want to loop through a list of files, calling the same executable on each one, if I understand correctly. Here are some good posts to get you going (a short sketch follows the list):
Loop through files in a directory (C++):
How do you iterate through every file/directory recursively in standard C++?
Launching an executable from within a C++ project:
How do I open an .exe from another C++ .exe?
Iterating through files in a directory (Bash):
How to iterate over files in a directory with Bash?
Iterating through files in a directory (PowerShell):
Loop through files in a directory using PowerShell
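Since the post is tagged c++, here is a minimal C++17 sketch tying those two pieces together; the image folder is my own guess, and it assumes the .exe is on the PATH or in the working directory:

#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

int main() {
    // Hypothetical image folder -- adjust to wherever the plate images live.
    const fs::path dir = "C:/Users/gksql/Desktop/PlateRec/images";

    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.path().extension() == ".jpg") {  // note: comparison is case-sensitive
            const std::string cmd =
                "PlateRecog_saveDigits_Location \"" + entry.path().string() + "\"";
            std::system(cmd.c_str());  // run the unmodified executable once per image
        }
    }
}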
I've been scouring the web for hours looking for an approach to this problem, and I just can't find one. Hopefully someone can fast-track me. I'd like to achieve the following behaviour:
When running ember s and a file of a certain extension is changed, I'd like to analyze the contents of that file and write to several other files in the same directory.
To give a specific example, let's assume I have a file called app/dashboard/dashboard.ember. dashboard.ember consists of 3 concatenated files: app/dashboard/controller.js, .../route.js, and .../template.hbs, with a reasonable delimiter between the files. When dashboard.ember is saved, I'd like to call a function (inside an addon, I assume) that reads the file, splits it at the delimiters, and writes the corresponding split files. ember-cli should then pick up the changed source (.js, .hbs, etc.) files that it knows how to handle, ignoring the .ember file.
I could write this as a standalone application, of course, but I feel like it should be integrated with the ember-cli build environment; I just can't figure out what concoction of hooks and tools to use to achieve this.
I am a beginner with Visual Studio and have only coded C and C++ in command-line settings.
Currently, I am taking a module (software development) which requires me to come up with an expense tracker - a program which helps a user track his/her daily expenses. Therefore, at the end of each day, or after the user finishes with the program, we have to store all the info in one place so that we can load it again during the next usage.
My constraints include not using any relational database (although I have no idea what that is :( ). Data storage must be done using XML or text files. Following this, I have several questions regarding data storage:
1) If the data is stored successfully, do we load it every time we start the program? And every time the user closes the program, do we overwrite the existing data file and store the updated data?
2) I have heard from some people that using a text file may be easier. Searching on the internet and in the library only provides me with information regarding XML and not text. Would anyone be able to help me with it, e.g. with links to tutorials?
Thank you very much!
File writing/handling works much like any other stream/buffer in C++.
You get file handling through the <fstream> header. You can create a file and overwrite it every time the program is run, or you can create the file the first time the program runs and then append to it on every subsequent run.
I've only ever done text files, never tried XML, but I'm guessing they're similar.
http://www.cplusplus.com/doc/tutorial/files/ should give you everything you need to know.
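For example, the overwrite-vs-append behaviour described above looks like this (a minimal sketch; the filenames are made up):

#include <fstream>

int main() {
    // The default open mode truncates: data.txt is rewritten on every run.
    std::ofstream overwrite("data.txt");
    overwrite << "fresh contents\n";

    // std::ios::app creates log.txt if it doesn't exist yet,
    // then appends to it on every later run.
    std::ofstream append("log.txt", std::ios::app);
    append << "one more line\n";
}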
Your choice of XML vs plain text depends on the kind of data that you'll be storing.
The reason you'll only find XML libraries on the internet is that XML is a lot more complicated than plain text. If you don't know what XML is, or if the data you're storing isn't very complex, then I would suggest going with plain text.
For example, to track expenses, you might store a file like this:
sandwich 5.00
coffee 2.30
soft drink 1.50
...
It's very easy to read/write lines like this to/from a file in C++.
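For instance, here is a sketch that appends one record and then re-reads the whole file to total the prices (the filename expenses.txt is my own invention):

#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Append a record; the file is created automatically on the first run.
    std::ofstream out("expenses.txt", std::ios::app);
    out << "soft drink 1.50\n";
    out.close();

    // Read it back: each line is an item name followed by a price.
    // Splitting at the last space keeps multi-word items like "soft drink" intact.
    std::ifstream in("expenses.txt");
    std::string line;
    double total = 0.0;
    while (std::getline(in, line)) {
        const std::string::size_type pos = line.rfind(' ');
        if (pos == std::string::npos) continue;  // skip malformed lines
        total += std::stod(line.substr(pos + 1));
    }
    std::cout << "Total spent: " << total << '\n';
}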
An application of our company uses pdfimages (from xpdf) to check whether certain pages in a PDF file, on which we know there is no text, consist of one image.
For this we run pdfimages on that page and count whether zero, one, or two or more output files have been created (they could be JPG, PPM, PGM or PBM).
The problem is that for some PDF files, we get millions of 14-byte PPM images, and the process has to be killed manually.
We know that by assigning the process to a job we can restrict how much time the process will run for, but it would probably be better if we could ensure that the process creates at most two new files during its execution.
Do you have any clue for doing that?
Thank you.
One approach is to monitor the directory for file creations: http://msdn.microsoft.com/en-us/library/aa365261(v=vs.85).aspx - the monitoring app could then terminate the PDF image extraction process.
Another would be to use a simple ramdisk which limited the number of files that could be created: you might modify something like http://support.microsoft.com/kb/257405.
If you can set up a FAT16 filesystem, I think there's a limit of 128 files in the root directory (512 in other dirs?) - with such small files that limit would be reached quickly.
Also, aside from my 'joke' comment, you might want to check out _setmaxstdio and see if that helps ( http://msdn.microsoft.com/en-us/library/6e3b887c(VS.71).aspx ).
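A rough sketch of the first approach, using the directory-change notification API from that MSDN link (Win32-only; the output directory and the pdfimages command line are made up, and error handling is mostly elided):

#include <windows.h>
#include <filesystem>

namespace fs = std::filesystem;

static std::size_t countFiles(const fs::path& dir) {
    std::size_t n = 0;
    for (const auto& e : fs::directory_iterator(dir))
        if (e.is_regular_file()) ++n;
    return n;
}

int main() {
    const fs::path outDir = "C:/temp/pdfimages-out";  // must already exist
    const std::size_t before = countFiles(outDir);

    // Launch pdfimages; this command line is illustrative only.
    STARTUPINFOA si{};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi{};
    char cmd[] = "pdfimages -f 3 -l 3 input.pdf C:\\temp\\pdfimages-out\\img";
    if (!CreateProcessA(nullptr, cmd, nullptr, nullptr, FALSE, 0,
                        nullptr, nullptr, &si, &pi))
        return 1;

    HANDLE change = FindFirstChangeNotificationA(
        outDir.string().c_str(), FALSE, FILE_NOTIFY_CHANGE_FILE_NAME);
    if (change == INVALID_HANDLE_VALUE)
        return 1;

    // Wake up when either the process exits or the directory changes.
    HANDLE handles[2] = { pi.hProcess, change };
    for (;;) {
        const DWORD w = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
        if (w == WAIT_OBJECT_0) break;          // pdfimages finished on its own
        if (countFiles(outDir) - before > 2) {  // runaway output detected
            TerminateProcess(pi.hProcess, 1);
            break;
        }
        FindNextChangeNotification(change);     // re-arm the watcher
    }
    FindCloseChangeNotification(change);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
}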
I want to convert all .odt, .doc, .xls and .pdf files to .txt files.
I want to convert these files to text files using a shell script or a Perl script.
There's a program for odt files and the like: odt2txt, available in the repos. Alternatively, with unoconv:
$ unoconv --format=txt document1.odt
Should produce document1.txt.
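To batch this over a whole directory (the question asks for shell or Perl, but the same loop is easy in any language), here is a C++17 sketch that shells out to unoconv for each matching file; the directory path is made up, and unoconv is assumed to be on the PATH:

#include <cstdlib>
#include <filesystem>
#include <set>
#include <string>

namespace fs = std::filesystem;

int main() {
    const std::set<std::string> wanted = {".odt", ".doc", ".xls", ".pdf"};
    const fs::path dir = "/home/user/documents";  // hypothetical directory

    for (const auto& entry : fs::directory_iterator(dir)) {
        if (wanted.count(entry.path().extension().string()) != 0) {
            // unoconv writes document.txt alongside each source document.
            const std::string cmd =
                "unoconv --format=txt \"" + entry.path().string() + "\"";
            std::system(cmd.c_str());
        }
    }
}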
OpenOffice has a built-in document converter capable of handling a bunch of formats - take a look at unoconv: http://dag.wieers.com/home-made/unoconv/
That being said, I have had some trouble getting it to work in the past. If you're having trouble, take a look at similar programs for AbiWord (another open source word processor).
For Word documents, you can try antiword, at least on Linux. It's a command-line utility that takes a Word document as an argument and spits out the text from that document (as best it can figure out) to standard output. Maybe you can specify an output file too; I can't remember the details of how it works, as I haven't used it in a while. I'm not sure whether it can handle OO documents.
It's certainly possible to do this, though there is something strange and impenetrable about the OO project and its documentation that makes things like this hard to research and follow. However, OO has the capability to convert all of those types, not just the OO native ones, and it can do it via two different forms of automatic control.
These are the two general approaches.
You can start OO and tell it to execute a macro which does this job for you for a given file. You then just have to write the macro and a script to loop over your files. The syntax is something like
$ oowriter -headless filename macro://dir/Standard.Module1.sMySub
The other thing OO has is a network API. This is based on something called UNO.
$ oowriter -accept=accept-string
Notifies the OpenOffice.org software that upon the creation of
"UNO Acceptor Threads", a "UNO Accept String" will be used.
You will need some sort of client library; I think they have one for Python at least. Using this technology, a Python program (or some other scripting language with an OO client library) could drive the program and convert all the files. Since OO reads MSO formats, it should be able to do all of them.
Open the file in LibreOffice. Click "File", then "Save As", and scroll down to find the text option. Click that and it will be saved as a text file.
FYI, I had an *.ODT file that was 339.2 KB in size. When I saved it as text, the file shrank to only 5.0 KB. Another reason for saving your files as text files.
For the Microsoft formats, look into the wvWare tools.
Open the .ods file normally in LibreOffice.
Highlight the text to be converted.
Open a terminal.
Run vi.
Press "i" to get insert mode.
Press ctrl-shift-v to paste.
Done!
Need some formatting?
Save the file (as, say, filename), get out of vi, and run:
$ cat filename | column > filename2
This worked in openSUSE running KDE.
Substitute "kwrite" for "vi" if you want.