Folder with 1300 png files into html images list - c++

I've got a folder with about 1300 PNG icons. What I need is an HTML file with all of them inside, like:
<img src="path-to-image.png" alt="file name without .png" id="file-name-without-.png" class="icon"/>
It's easy as hell, but with that number of files it's a pure waste of time to do it manually. Do you have any ideas how to automate it?

If you need it just once, do a "dir" or "ls" and redirect the output to a file. Then use an editor that can record macros, like Notepad++: record yourself modifying a single line the way you want, then play the macro back for the rest of the file. If the list is dynamic, use PHP.

I would not use C++ to do this. I would use vi, honestly, because running regular expressions repeatedly is all that is needed for this.
But you can do this in C++. I would start with a plain text file containing all the file names, generated by dir or ls at the command prompt.
Then write code that takes a line of input and turns it into a line formatted the way you want. Test this and get it working on a single line first.
The regex engine of C++ is probably overkill (and is not all that well supported in compilers), but substr and basic find and replace are all you need. Is there a string library you are familiar with? std::string would do.
To generate the file name without .png, check that the name is longer than four characters and that the last four are .png (if not, report an error). Then strip them. To turn dashes into spaces (for the alt text), copy the characters to a new string, writing a space whenever you read a dash. Everything else is just string concatenation.

Related

AppleScript to extract the Digital Object Identifier (DOI) from a PDF file

I looked for an AppleScript to extract the DOI from a PDF file, but could not find one. There is enough information available on the actual format of the DOI (i.e. the regular expression), but how could I use this to get the identifier from the PDF file?
(It would be no problem if some external program were used, such as Hazel.)
If you're ok with using an app, I'd recommend Skim. Good AppleScript support. I'd probably structure it like this (especially if the document might be large):
set DOIFound to false
tell application "Skim"
set pp to pages of document 1
repeat with p in pp
set t to text of p
--look for DOI and set DOIFound to true
if DOIFound then exit repeat--if it's not found then use url?
end repeat
end tell
I'm assuming a DOI would always be on a single page (not split across two). It looks like they are invariably (?) on the first page of an article, which would of course make this quick, even with a large document.
[edit]
Another way would be to get the Xpdf OSX binaries from http://www.foolabs.com/xpdf/download.html and use pdftotext in the command line (just tested this; it works well) and parse the text using AppleScript. If you want to stay in AppleScript, you can do something like:
do shell script "path/to/pdftotext 'path/to/pdf/file.pdf'"
which would output a file with a .txt extension in the same directory; you then parse that for the DOI.
Have you tried pdfgrep? It works really well on the command line:
pdfgrep -n --max-count 1 --include "*.pdf" "DOI"
I have no idea how to build an AppleScript for this, but I would be interested in one too, so that if I drop a PDF into that folder it automatically extracts the DOI and renames the file with the DOI in the filename.

Very basic image renaming with regex

I spent most of yesterday putting together a collection of regular expressions to convert all my image names and paths to lower case. Today, I processed a folder full of files and was surprised to discover that many image names are still capitalized.
So I decided to try it one step at a time, first renaming .jpg's, then .gif's, .png's, etc.
I'm working on a Mac, using Dreamweaver and TextWrangler as my text editors. The following regex works perfectly for jpg's, with one major flaw - it deletes the extension...
([\w/-]+)\.jpe?g
\L\1
In other words, it changes South-America.jpg to south-america.
How can I change it so that it retains the file extension? I assume I can then just change it to...
([\w/-]+)\.png
\L\1
...to process png's, etc.
([\w\/-]+)(\.jpe?g)
and replace with \L\1\2
It's deleting your extension because you never save it in a match group.
You could perhaps capture the extension too?
([\w/-]+)(\.jpe?g)
\L\1\2
And I think you should be able to use something like this for all the files:
([\w/-]+)(\.[^.]+$)
\L\1\2
Or if you specifically want to convert those jpegs, pngs and gifs:
([\w/-]+)(\.(?:jpe?g|gif|png))
\L\1\2
If it's okay for the extension to become lowercase as well, you could just do
^(.*)$
\L\1
This is safe as long as you're certain that every line contains a file name.
If you want to process only certain file formats, use
^(.*\.(jpe?g|png|gif))$
\L\1

Find, move, replace and parse strings simultaneously while building an .xml playlist file

I get many videos and I need to compile functioning .xml playlist files where they are all listed, including snapshot jpg's. Videos and snapshot images are named automatically. So I end up with lots of files like this:
hxxp://site.com/video/_5712.480p.flv
hxxp://site.com/video/_5712.480p.jpg
hxxp://site.com/video/_5713.480p.flv
hxxp://site.com/video/_5713.480p.jpg
So with these files I need to produce an .xml file looking something like this:
....
<track>
<title>5712.480p</title>
<creator>Whatever_5712.480p</creator>
<info>hxxp://site.com/video/_5712.480p.jpg</info>
<annotation>Playlist marked_480p</annotation>
<location>hxxp://site.com/video/_5712.480p.flv</location>
<image>hxxp://site.com/video/_5712.480p.jpg</image>
</track>
<track>
<title>5713.480p</title>
<creator>Whatever_5713.480p</creator>
<info>hxxp://site.com/video/_5713.480p.jpg</info>
<annotation>Playlist marked_480p</annotation>
<location>hxxp://site.com/video/_5713.480p.flv</location>
<image>hxxp://site.com/video/_5713.480p.jpg</image>
</track>
So I guess I might be looking at some advanced sed/awk procedure to copy, move and place the right strings inside the correct brackets, and to compile one whole file? I really appreciate all the help I can get on this one. Thx
With that input, you can do something like:
awk 'NR%2==1 && /\.jpg$/ {JPGFILE=$0}
NR%2==0 { print "whateverXMLtags" JPGFILE "whatanotherXMLtags" $0 "someotherXMLtags" }' INPUTFILELIST
So this assumes that the jpg files are on odd-numbered lines: on those lines it saves the name, and on every even line it prints the desired output. Note that the space between e.g. JPGFILE and "whatanotherXMLtags" concatenates the strings.

Find Lines with N occurrences of a char

I have a txt file that I'm trying to import as a flat file into SQL Server 2008 that looks like this:
"123456","some text"
"543210","some more text"
"111223","other text"
etc…
The file has more than 300,000 rows and the text is long (usually 200-500 characters), so scanning the file by hand is very time-consuming and error-prone. Other similar (and even more complex) files were imported successfully.
The problem with this one is that some lines contain quotes in the text. (It came from an export from an old SuperBase DB that didn't let you specify a text qualifier; there's nothing I can do with the file other than clean it up and try to import it.)
So the “offending” lines look like this:
"123456","this text "contains" a quote"
"543210","And the "above" text is bad"
etc…
You can see the problem here.
Now, 300,000 lines would not be too much if I could find the offending lines with a regex search in a text editor; I'd then remove the quotes from each one by hand. The problem is not the number of offending lines but the impossibility of finding them with a simple search. I'm sure there are fewer than 500, but spread those across a 300,000-line text file and you know what I mean.
Based upon that, what would be the best regex I could use to identify these lines?
My first thought is: tell me which lines contain more than 4 quotes (").
But I couldn’t come up with anything (I’m not good at Regex beyond the basics).
This pattern, ^("[^"]+){4,}, will match lines containing more than 4 quotes.
You can experiment with replacing 4 with 5 or more, depending on your data.
I think that you can be more direct with a Regex than you're planning to be. Depending on your dialect of Regex, something like this should do it:
^"\d+",".*".*"
You could also use a regex to remove the outside quotes and use a better delimiter instead. For example, search for ^"([0-9]+)","(.*)"$ and replace it with \1+++++DELIM+++++\2.
Of course, this doesn't directly answer your question, but it might solve the problem.

Incorporating text files in applications?

Is there any way I can incorporate a fairly large text file (about 700 KB) into the program itself, so I don't have to ship the text file in the application directory? This is the first time I'm trying to do something like this, and I have no idea where to start.
Help is greatly appreciated (:
Depending on the platform that you are on, you will more than likely be able to embed the file in a resource container of some kind.
If you are programming on the Windows platform, then you might want to look into resource files. You can find a basic intro here:
http://msdn.microsoft.com/en-us/library/y3sk7e6b.aspx
With more detailed information here:
http://msdn.microsoft.com/en-us/library/zabda143.aspx
Have a look at the xxd command and its -include option. You will get a buffer and a length variable in a C formatted file.
If you can figure out how to use a resource file, that would be the preferred method.
It wouldn't be hard to turn a text file into a file that your compiler can compile directly. This might only work for small files: your compiler might have a limit on the size of a single string. If so, a tiny syntax change would make it an array of smaller strings, which would work just fine.
You need to convert your file by adding a line at the top, enclosing each line within quotes, putting a newline character at the end of each line, escaping any quotes or backslashes in the text, and adding a semicolon at the end. You can write a program to do this, or it can easily be done in most editors.
This is my example document:
"Four score and seven years ago,"
can be found in the file c:\quotes\GettysburgAddress.txt
Convert it to:
static const char Text[] =
"This is my example document:\n"
"\"Four score and seven years ago,\"\n"
"can be found in the file c:\\quotes\\GettysburgAddress.txt\n"
;
This produces a variable Text which contains a single string with the entire contents of your file. It works because consecutive strings with nothing but whitespace between get concatenated into a single string.