How to colorize plain text files in browser based on multiple regular expressions? - regex

I do often have to look at long plain text log file in browser and I find hard to spot the error message in that amount of text.
Most error messages are easily identifiable using regular expressions. I have being using this trick to highlight important text in iTerm2 for years.
Now the interesting challenge is how to bring the same feature to the browser?
I mention that I need to be able enable this feature per domain as it may have undersirable side effects on others.
A cross browser solution would be ideal but I am ready to switch browser (Safari, Chrome, Firefox).

I know of GreaseMonkey for Firefox which allows you to inject javascript into a webpage. I'm not familiar with who you create scripts to use with it but perhaps you could match the messages using regex. Not sure how you'll highlight a plain text file using it though.

You can easily develop an Extension which will inject a CSS file on whichever webpage you want. Now, in this CSS file you will need a pseudo selector(Do all the text elements have same class or other attribute?). You need to figure this out yourself. Once you have hold of all these elements you can add a CSS rule to change their color to whatever you like.

Related

Remove alt attributes from HTML

I have a huge HTML file which I'm trying to format to be able to import the content into a different application. The one thing that remains is that I need to remove all alt attributes from the HTML entirely. They all have different values and there are around 5000 of them, so clearly a straight find & replace isn't an option. Perhaps there's a way to find and replace with regex in Visual Web developer?
The tools/skills I have available are: HTML, Javascript, ASP (Classic), a little bit of .NET, Visual Web Developer Express 2010, but the only similar things I can find are PHP-based and they don't explain fully enough for me to set up a solution and feed the HTML to it.
I've found things like this: Regular expression to replace several html attributes, which give suggestions of regex functions which do similar things, but I'm not even sure how to run a regex function on a straight HTML file (my browser is struggling with the size of the HTML file as it is, so I don't think javascript is going to cut it).
Can anyone suggest the best way to accomplish this?
Thanks folks...
Since you use Visual Studio, you can try the Regex search & replace option, though the implementation of regexes in Visual Studio is pretty different from other regex engines.
Here's a short article about it:
http://www.codinghorror.com/blog/2006/07/the-visual-studio-ide-and-regular-expressions.html
As it says in the article, the builtin regex engine isn't ideal. They mention a plugin with implements standard regexes though:
http://www.codeproject.com/Articles/9125/Standard-Regular-Expression-Searcher-Addin-For-VS

Converting HTML file to PDF using Win32/MFC

As part of my application, my client has requested that I include an automated e-mailing system. As part of this system, I generate HTML code and use automation to send it via. Outlook.
However, they also require a PDF copy of the HTML document to be sent as an attachment. My initial attempts involved using libHaru, which proved difficult to use efficiently, as I was required to create the PDF document from scratch, which required computation of the position of each of the lines in a table, and positioning of all the text, etc.
I was wondering if there would be a way to programmatically convert HTML code (or an HTML file if need be) into a PDF document either by using Win32/MFC itself or an external library.
Thanks in advance!
EDIT: Just to clarify, I am looking for solutions which minimize external dependencies.
You should evaluate this utility wkhtmltopdf:
http://code.google.com/p/wkhtmltopdf/
You can call it from the command line without the need to run a setup.
I use it generating my output documents as html then cal a ShellExecute(...) to convert it to PDF. It's great!
Inside uses webkit + qt. So compability with modern HTML is OK.
Hope it helps.
I'd take a look at PDF Creator, which can be used as a COM object (that acts pretty much like a printer). I haven't used it to print HTML, so I'm not sure, but my guess is that you'll probably end up having to instantiate a web browser control to render the HTML, and then feed it from there to the PDF control.
Some possible answers are in this thread:
C++ Library to Convert HTML to PDF?
Not sure if they will satisfy your particular requirements, but these might at least get you started.
Edit:
Some other possible options here.
Not MFC but you can try QtWebKit. It can render and export HTML to PDF, PNG, JPEG

Extracting key words from HTML to C++ under linux

I am working on a simple client-server project. Client is written in Java, it sends key words to C++ server written under Linux and recives a list of URLs with best ranks ( depending on number of occurrences of key words ). Server's job is to go through some URLs in search of key words and return best-fitting URLs. And now the problem is that I have to parse HTML sites to find occurrences of key words, plus I need to extract links from visited page to search on them as well. And my question is what library can I use to do that? Remember only C++ linux libraries are suitable for me. There were some similar topics, so I tried to go through most of them, but some of libraries parse only html files and I don't want to download every site I visit, but parse it on the fly and just store it's rank and url. Some of them look a bit complicated to me - for instance firstly parsing HTML to XML or something else and then finally work on the results with C++. Is there something simple and sufficient to do what I need it to do? Any advise will be appreciated.
I don't think regular expressions are appropriate for HTML parsing. I'm using libxml2, and I enjoy it very much - easy to use, portable and lightning fast.
To get URLs from the web using C/C++ you could use the libcurl library. To parse URLs and other not too easy stuff from the site you can use a regex library.
Separating the HTML tags from the real content can also be done without the use of a library.
For more advanced stuff one could use Qt which offers classes such as QWebPage (which uses WebKit) that allows one to access the DOM-Model of the page and extract individual HTML objects (e.g. single cells of a table) rather easyly.
You can try xerces-c. It's a powerful library for xml parsing. It support xml reading on the fly, dom and sax parsing.

Custom client app - need ability to control where documents are saved

Okay SO. I need some guidance. I apologize for the length of this post, but I need to provide some details:
I've got someone who is interested in me to do a small project for them. The application in general is a fairly straightforward employee record keeping / documentation app, but it makes pretty heavy use templated Word and Lotus documents. The idea is you select the employee “event” such as commendation, promotion, discipline, etc., and it loads the appropriate template doc and you fill it in from there, and later you can select an employee, view all the “events,” and view the individual documents associated with each one.
Thus, the app must know where the .docs are saved when the user is done.
The client actually has a v1 of this app (it doesn’t do any management of the files or anything, just launches Word/Lotus with the document you wanted to view in a new instance, presumably via a system() call.) We’ve not gotten into a detailed requirements phase, but the client and I agree that for this to really work, some kind of control over where the user saves the .doc’s to is going to be critical , because otherwise the app provides them with the new copy of the template doc, they "Save as" somewhere else, and the app is pointing to the blank copy it provided them with.
Obviously, I can’t think of a way to achieve “Save as” restriction/control in any way via just launching a new instance of Word. The client has the idea of an embedded Word/Lotus instance in the app with the template doc when you choose one, but I’ve few reservations with that:
I’ve dug around online and I’ve read that whichever version of Word I borrow MSWORD.OLB from will be the one the end user would require?
I’ve tried to do the MSDN example of embedding a Word doc from here, but as I’ve come to get used to, the MSDN example doesn’t even compile.
Even if I CAN figure out how to embed a .doc file into their application, I don’t know that I could control the use of “Save as…”
All of this STILL hasn’t touched on Lotus (!)
So… instinctively, I feel the embedded Word/Lotus thing has to be more work than it’s worth in the end.
So I’ve had a few other ideas brewing around.
One is looking into using Office XML (and if there’s a lotus equivalent), and get the user’s “inputs” separately and generate the document on the fly each time. I’m not particularly thrilled with that idea, but I think it COULD work, provided I just use old features to try and stay far backwards compatible.
Get user’s “inputs” separately and generate a document in HTML. Meh. Works, very cross platform and easily parsed and understood, but not good if you want to be able to email it to someone (who emails a .html? Works, yes, very unconventional which to the average user will throw them off) and even worse if you need to email it to someone for revisions…
Perhaps some kind of editable PDF? I know there are PDF libraries out there, and the more I stew on it, the more this sounds like the best option, though I’ve not done much work with PDFs and I don’t know how easily embeddable they are / what options one has when creating them. I know they can be save-disabled, I’ve had that with my bloody state taxes before.
I need some input here. Here’s the TLDR questions:
Is launching a new instance of Word for each .doc as bad as I feel, given user can “Save as” document wherever and then application is left pointing to a blank document?
Is trying to support embedded Word as big of a trouble as I feel like it is / more work than it’s worth / likely to cause problems with supporting multiple versions of Word? (Forward compatibility as well as currently released versions?)
What are thoughts on the PDF plan?
Any other good ideas?
Word does allow for programming some "Save" and "Save As" control via its object model. Any subroutines coded in VBA and placed into your Word template will be copied into all documents generated from that template. Additionally, most menu and Ribbon commands can be intercepted by creating a module containing subroutines named for the intercepted commands. So, for example, if a module contains a sub named FileSaveAs(), any code in that sub will be executed instead of the standard File|Save As command. Lastly, this code will replace Save As commands executed via keystroke, toolbar, menu, or Ribbon.
The code below will launch a dialog box to a predetermined path whenever a "Save" or "Save As" command is executed:
Sub FileSave()
ControlSaveLocation
End Sub
Sub FileSaveAs()
ControlSaveLocation
End Sub
Sub ControlSaveLocation()
Dim Directory As String
Directory = "C:\Documents\"
With Application.Dialogs(wdDialogFileSaveAs)
.Name = Directory
.Show
End With
End Sub
Hope this helps.

Everything inside < > lost, not seen in html?

I have many source/text file, say file.cpp or file.txt . Now, I want to see all my code/text in browser, so that it will be easy for me to navigate many files.
My main motive for doing all this is, I am learning C++ myself, so whenever I learn something new, I create some sample code and then compile and run it. Also, along these codes, there are comments/tips for me to be aware of. And then I create links for each file for easy navigation purpose. Since, there are many such files, I thought it would be easy to navigate it if I use this html method. I am not sure if it is OK or good approach, I would like to have some feedback.
What I did was save file.cpp/file.txt into file.html and then use pre and code html tag for formatting. And, also some more necessare html tags for viewing html files.
But when I use it, everything inside < > is lost
eg. #include <iostream> is just seen as #include, and <iostream> is lost.
Is there any way to see it, is there any tag or method that I can use ?
I can use regular HTML escape code < and > for this, to see < > but since I have many include files and changing it for all of them is bit time-consuming, so I want to know if there is any other idea ??
So is there any other solution than s/</< and s/>/>
I would also like to know if there any other ideas/tips than just converting cpp file into html.
What I want to have is,
in my main page something like this,
tip1 Do this
tip2 Do that
When I click tip1, it will open tip1.html which has my codes for that tip. And also there is back link in tip1.html, which will take me back to main page on clicking it. Everything is OK just that everything inside < > is lost,not seen.
Thanks.
You might want to take a look at online tools such as CodeHtmler, which allows you to copy into the browser, select the appropriate language, and it'll convert to HTML for you, together with keyword colourisation etc.
Or, do like many other people and put your documentation in Doxygen format (/** */) with code samples in #verbatim/#endverbatim tags. Doxygen is good stuff.
A few ideas:
If you serve the files as mimetype text/plain, the browser should display the text for you.
You could also possibly configure your browser to assume .cpp is text/plain.
Instead of opening the files directly in the browser, you could serve them with a web server than can change the characters for you.
You could also use SyntaxHighlighter to display the code on the client side using JavaScript.
It is pretty much essential that somewhere along the line you use a program to prevent the characters '<>&' from being (mis-)interpreted by your browser (and expand significant repeated blanks into '` '). You have a couple of options for when/how to do that. You could use static HTML, simply converting each file once before putting it into the web server document hierarchy. This has the least conversion overhead if the files are looked at more often than they are modified. Alternatively, you can configure your web server to server the pages via a filter program (CGI, or something more sophisticated) and serve the output of that in lieu of the file. The advantage is that files are only converted when needed; the disadvantage is that the files are converted each time they are needed. You could get fancy and consider a caching solution - convert the file on first demand but retain the converted file for future use. The main downside there is that the web server needs to be able to write to where the converted file is cached - not necessarily a good idea for security reasons. (A minimalist approach to security requires the document hierarchy to be owned by and only writable by one user, say webmaster, and the web server runs as another user, say webserver. Now the web server cannot do any damage because it cannot write anywhere in the document hierarchy. Simple; effective; restrictive.)
The program can be a simple Perl script or a simple C program (the C source for webcode 1.3 is available here).