File system regular expression search tool - regex

What is the best tool for making complex (multi-line) regular expression searches over file contents, with good reporting capabilities?
I need to produce a report over a large Java/JSP code base, and I have to make some charts afterward.
Eclipse is rather good at searches, but it does not provide a good report of what is found. It just shows the tree of files, but I would like to see a table with columns corresponding to the full match, each group, file name, file path, file date, maybe some version control information, etc. Then I can transfer this table to Excel and make the graphs I want.
Is there some generic file system search tool with such capabilities? Or maybe there is some Eclipse plugin that can give better reports (note that I'm stuck on Eclipse 3.1.2)?

Agent Ransack, TextPad, and UltraEdit allow you to perform regular expression searches against the file system. My favorite is Agent Ransack as you can specify regular expressions for the file names and for the content.

PowerGREP (on Windows) can be used to do (most of) that. You can define the format of your search results quite freely. I haven't tried yet to also add file meta information to the search results, but that should work. Not sure if you can add version control information (where would that come from?) - perhaps if you could be a bit more specific, I could check.
Other than that, why not write a small Python/Ruby/Perl script like JasonTrue suggested?

For searches over code bases with queries that understand the language structure, look at SD Search Engine. This tool indexes large source bases to provide very fast query responses.
Queries are stated in terms of language elements (identifiers, operators, strings, ...) with constraints over the language elements (including wildcards and regexps on identifiers, strings and comments, as well as range constraints on numbers). Language whitespace and linebreaks (and comments, unless you insist) are ignored.
If you want to do a plain regexp search on file character content, you can do that too, but you don't get the speed advantage of the index; it runs more like regular grep.
Interactive query results are shown in a hit window alongside the other hits; by clicking a hit, you can go to a window containing its full source code.
In logging mode, all hits found are written to a log file with N lines of context, where you configure N. That's probably the report you want.

um... grep -r ?
Or ruby/perl/python, if you want to have more control over the final output; it sounds like what you're after would only be a few lines.
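To illustrate, here is a minimal sketch in Python; the pattern, the root directory, and the file extensions are placeholders to adapt. It walks the tree, applies a multi-line regex, and emits one CSV row per match (full match, group, file name, path, date) that loads straight into Excel:

    import csv
    import os
    import re
    import sys
    import time

    # Placeholder multi-line pattern; replace with the expression you need.
    PATTERN = re.compile(r"public\s+class\s+(\w+)", re.MULTILINE | re.DOTALL)

    writer = csv.writer(sys.stdout)
    writer.writerow(["full_match", "group_1", "file_name", "file_path", "file_date"])

    for root, _dirs, files in os.walk("src"):  # "src" is a placeholder root
        for name in files:
            if not name.endswith((".java", ".jsp")):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            date = time.strftime("%Y-%m-%d", time.localtime(os.path.getmtime(path)))
            for m in PATTERN.finditer(text):
                writer.writerow([m.group(0), m.group(1), name, path, date])

Version control metadata could be added per file by shelling out to the VCS, though how depends on which system is in use.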

Related

How to do an efficient search for dynamically defined regexes in Elasticsearch?

I am working on a file system project (like Dropbox). For the file system, I have data indexed for full-text search in Elasticsearch. I have lots of large documents, and searching works really well. But now my requirement is to use this data to query for some regexes. We have an admin panel for the customer, and the regexes will be defined dynamically by the customer in the admin panel.
I know I can do regex searches in Elasticsearch, but the problem here is the tokenizer. For instance, let's assume the user wants to create a regex pattern that searches for 3 letters, '-' and 2 digits, such as "ABC-12" or "ASD-34". The problem is my tokenizer: it omits the '-' character and indexes "ABC" and "12" separately. You may say not to omit the '-' character. But the user may also want to search for a pattern with 3 letters, whitespace and 2 digits, to retrieve data like "ABC 12"; here the whitespace is the problem. Somehow I have to use a tokenizer, and no tokenizer can cover all dynamic regexes. So searching the index does not solve my problem.
Actually, for this type of search, I have another option, which is to query all data with match_all. With the search scroll API, I can fetch all the original documents in batches. After each response from the scroll API, I can run my regex finder in a separate thread, so that I can prepare the desired data as the scrolling operation proceeds. Do you think this option is workable for big data? I think I will need serious CPU power and RAM. I know it is not an elegant solution, but I cannot find any more effective approach for my requirement. I am open to better solutions. Thanks.
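Roughly, this is what I have in mind (a sketch; "files" and "content" are placeholder index and field names, and the pattern stands in for one defined in the admin panel):

    import re
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    es = Elasticsearch()
    pattern = re.compile(r"[A-Z]{3}-\d{2}")  # e.g. a customer-defined pattern

    # Scroll over every document and run the regex client-side.
    matches = []
    for hit in scan(es, index="files", query={"query": {"match_all": {}}}):
        text = hit["_source"].get("content", "")
        matches.extend((hit["_id"], m.group(0)) for m in pattern.finditer(text))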
I believe ES allows you to analyse the same field multiple times. The documentation states that new analysers can be added to existing fields later:
New multi-fields can be added to existing fields using the PUT mapping API.
This opens up the possibility of dynamically adding new analysers (and tokenisers, for that matter) as you find out what sort of regexes your users are after. I am not sure how trivial it will be for your particular use case, but this seems like an avenue to explore.
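As a sketch of the idea (the "files" index and "content" field are assumptions; note that keyword terms have size limits, so this suits shorter fields, and existing documents need reindexing before the sub-field is populated):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    # Add an unanalysed keyword sub-field next to the existing analysed field,
    # so a regexp query sees the raw string including '-' and whitespace.
    es.indices.put_mapping(index="files", body={
        "properties": {
            "content": {
                "type": "text",
                "fields": {
                    "raw": {"type": "keyword"}
                }
            }
        }
    })
    # ES regexps must match the whole term, hence the leading/trailing .*
    hits = es.search(index="files", body={
        "query": {"regexp": {"content.raw": ".*[A-Z]{3}-[0-9]{2}.*"}}
    })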

Sublime Text: interactive confirm for replace?

I need to do a lot of search/replace across 50+ files, and am using Sublime Text 3.
Is there a way to step through and interactively confirm each change? I don't want a blanket Replace All action that just performs all replacements.
I am thinking way back to vi/vim with its %s/old/new/gc functionality.
Neither the Find/Replace nor the Find in Files/Replace command natively supports prompting you before each replacement. Regular in-buffer find/replace just replaces directly, and the only confirmation you can get is with Find in Files, where Sublime prompts you to confirm the replacement after telling you how many replacements will be made.
As such, the only way to get something like this is to look to an external plugin/package that does its own find and replace, so that you can be asked to confirm the changes.
I'm not personally aware of any packages that would do this, but a search in Package Control turns up the RegReplace package, which lists among its features:
Create commands that highlight results and require confirmation before replacing.
That said, I've never used the package myself, and from briefly looking at the documentation site it seems it's only capable of searching the current document, not across files.
A potential workaround would be to use the native Find in Files to find all files with matches, then manually open them and use RegReplace to perform the same operation again.
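If you do want to roll your own, here is a rough, untested sketch of what such a plugin could look like (save it as a .py file under Packages/User; the pattern and replacement here are just example arguments):

    import sublime
    import sublime_plugin

    class ConfirmReplaceCommand(sublime_plugin.TextCommand):
        """Step through regex matches in the current buffer, confirming each."""
        def run(self, edit, pattern="old", replacement="new"):
            # Visit matches from last to first so that replacements don't
            # shift the offsets of the regions still to be processed.
            for region in reversed(self.view.find_all(pattern)):
                self.view.show(region)
                answer = sublime.yes_no_cancel_dialog(
                    "Replace '{}'?".format(self.view.substr(region)))
                if answer == sublime.DIALOG_CANCEL:
                    break
                if answer == sublime.DIALOG_YES:
                    self.view.replace(edit, region, replacement)

You could then invoke it from the console or a key binding with view.run_command("confirm_replace", {"pattern": "old", "replacement": "new"}).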

Autoclose xml tags in C/C++ file in vim

I have some documentation strings embedded within the source code (C/C++ files) as XML tags, and I'd like to know the most minimal solution to make vim auto-close the tags (i.e. close the closest matching open tag).
I've found closetag.vim, but is there a way to do this neatly without modifying anything but the .vimrc file?
Vim has no built-in support for that, so the closetag.vim plugin is the proper and easiest solution. (I use it myself, too!) Of course, you could develop your own simple mappings (that search backwards for an open tag, grab it, drop the attributes, add the slash, and insert the result), but:
it will either be very simplistic and therefore often wrong,
or it will end up with as much complexity as closetag, becoming a reimplementation of that plugin.
If some rather strange restrictions (e.g. a custom primitive sync across systems) only allow you to manipulate the ~/.vimrc itself, you could just append the entire plugin's code to it (though I'd recommend against such an ugly hack).

Is there an editor to autoindent / add spaces in existing code?

I have some JavaScript code which I want to make more readable. However, due to the amount of code, I want a tool which does this automatically for me.
Are there such tools already available or do I have to manually perform some "find/replace-alls"?
The code which I want to convert is written on a single line without spaces.
A quick search found http://jsbeautifier.org/, which seems to do what you are looking for, online.
Search terms: javascript beautifier
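If you'd rather run it locally than paste code into a website, the same project also ships as a Python module (pip install jsbeautifier), along the lines of:

    import jsbeautifier

    minified = "function max(a,b){if(a>b){return a;}return b;}"
    opts = jsbeautifier.default_options()
    opts.indent_size = 2  # two-space indentation
    print(jsbeautifier.beautify(minified, opts))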

C++ logger - should I use an ordinary XML parser?

I'm working on a logging system for my 2D engine, and I'm confused on how I should go about creating/editing the file, and how I should output that file.
I've learned that XML is more of a data carrier than a data displayer, like HTML is. I've read that I can use XML-to-HTML converters. One method I've thought about is writing the log file as HTML directly.
Clarity on these matters is what I ask of you, Stack Overflow.
Creating an XML (or HTML) file doesn't need any special library. Straightforward string concatenation is usually good enough; you may just have to encode some special characters (e.g. > into &gt;).
But as Owen says, plain text is a lot more common for log files. One reasonable compromise is comma-separated values in a text file; this gives you a little bit of structure without much overhead. For example, the Windows web server (IIS) uses this format by default, and if each line carries fields such as a timestamp or a source file name and line number, this makes it easy to separate those out again.
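To make that concrete, here is a small sketch of the line layout meant, in Python for brevity (the field choice is illustrative; the same layout is easy to emit from C++ with fprintf):

    import csv
    import sys
    import time

    def log(level, source_file, line_no, message):
        # csv.writer takes care of quoting if the message contains commas
        csv.writer(sys.stderr).writerow(
            [time.strftime("%Y-%m-%d %H:%M:%S"), level, source_file, line_no, message])

    log("INFO", "renderer.cpp", 128, "swap chain created")
    # emits e.g.: 2024-05-01 09:30:00,INFO,renderer.cpp,128,swap chain created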
Just about every log I've ever worked with has been pure text delimited by newlines. If you're going to depart from that, you may want to ask yourself what it is about your logging needs that you want to accomplish with markup.
If you must go the way of markup, I would suggest an XML format that contains a minimal set of markup that would be useful in your situation. You could use XML to capture structure in your log entries (timestamp, severity, and operational code, for example) that would be inconvenient to code for in HTML.
Note that you could also go hybrid and embed some XHTML tags in an XML element whose purpose is to capture displayable text, if you want.
The problem with XML or HTML files is that you cannot simply append to them at any time: you have to properly close the final (document) tag when you finish writing.
Therefore, it's not a popular format for logging.
For logging, I suggest using one of the existing log engines, such as the Apache logger or John Torjo's boost log candidate. They will support log levels, runtime configuration, etc.
If you are considering writing logs as XML files, please stop.
Log files should be simple plain text files; XML-izing them introduces needless complexity. They are not structured data; they are meant to be read by people, not automated tools.
It all starts with XML logs, and then it goes downhill from there.