How do I retrieve a specific split from a file in hdfs? - hdfs

I'm attempting to debug a Hadoop Streaming Job. I can see that a given mapper is failing when trying to process hdfs://filepath/filename:364+28. How can I determine what line / lines in the file match up with 364+28?

You could try reading that particular split and seeing what's in it. For example, if you are using C#, you can call System.Environment.GetEnvironmentVariable("map_input_start") to get the start offset of the split and then inspect what is wrong with that part of the input.
Or, if you are using Python, you can use os.environ["map_input_start"]. I don't know of any more direct way to achieve this.
HTH
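If it helps, here is a minimal Python sketch of pulling that range out by hand. It assumes the 364+28 suffix means "start byte offset 364, length 28 bytes", and that you have first copied the file out of HDFS to local disk (for example with hdfs dfs -get). The function name read_split is just an illustration, not part of Hadoop.

```python
def read_split(path, start, length, block_size=1 << 20):
    """Return (first_line_number, raw_bytes) for the byte range [start, start + length).

    Assumption: `path` is a local copy of the HDFS file, and the split is
    described by a byte offset and a byte length.
    """
    newlines = 0
    with open(path, "rb") as f:
        # Count newlines before the split without loading it all at once.
        remaining = start
        while remaining > 0:
            block = f.read(min(block_size, remaining))
            if not block:
                break  # file is shorter than `start`
            newlines += block.count(b"\n")
            remaining -= len(block)
        chunk = f.read(length)  # the bytes that belong to the split
    # 1-based number of the line that contains byte offset `start`
    return newlines + 1, chunk
```

For the case in the question you would call read_split("filename", 364, 28) and look at the returned line number and bytes. As far as I know, line-oriented readers skip a partial first line and read past the end of the split to finish the last one, so the failing record may start slightly after offset 364 or end slightly after offset 392.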

Related

Why doesn't Find-String work on the output from Powershell commands?

I don't know if this problem affects a larger set of PowerShell library commands, but I'm using the SharePoint Online library right now and can't get this to work:
Get-SPODeletedSite | Select-String "100060-24"
I can see that this string exists in the output if I just type "Get-SPODeletedSite", but when I try and filter for just the line that contains this text, it comes up empty. What's going on here? This is a basic piping operation.
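PowerShell is an object-oriented language. As such, it is designed to work with objects (structures). PS has built-in instructions for how to present objects on the screen, but that formatting is for display purposes only: what travels down the pipeline to Select-String is the objects themselves, not the text you see. Converting the output to text first (for example with Out-String -Stream) or filtering on a property with Where-Object should therefore behave more like you expect. That's the best explanation I can provide. Maybe someone with a better overall understanding of PS could offer a better explanation.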

Sublime Text: interactive confirm for replace?

I need to do a lot of search/replace across 50+ files, and am using Sublime Text 3.
Is there a way to step through and interactively confirm each change? I don't want a blanket Replace All action that just performs all replacements.
I am thinking way back to vi/vim with its %s/old/new/gc functionality.
Both the Find/Replace and Find in Files/Replace commands don't natively support prompting you if the replacement should happen. Regular in-buffer find/replace just replaces directly and the only confirmation that you can get is when you do a Find in Files and Sublime prompts you to confirm the replacement after telling you how many replacements will be made.
As such, the only way to get something like this is to look to an external plugin/package that performs its own find and replace operation, so that you can be asked to confirm the changes.
I'm not personally aware of any packages that would do this, but a search in Package Control turns up the RegReplace package, which lists among its features:
Create commands that highlight results and requiring confirmation before replacing.
That said I've never used the package myself, and from briefly looking at the documentation site it seems like it's only capable of searching in the current document and not across files.
A potential workaround would be to use the native Find in Files to find all files with matches, then manually open them and use RegReplace to perform the same operation again.

Is there an editor to autoindent / add spaces in existing code?

I have some JavaScript code which I want to make more readable. However, due to the amount of code I want to use a tool which does this automatically for me.
Are there such tools already available or do I have to manually perform some "find/replace-alls"?
The code which I want to convert is written on a single line without spaces.
Quick search found http://jsbeautifier.org/ which seems to do what you are looking for online.
Search terms: javascript beautifier

Get a particular text from website

If you know where on a page the text is located (say, under a particular category), how would you connect to the website, search for that text, and read it?
What steps do I need to follow to learn about that?
You could use libcurl/cURL for your HTML retrieval.
You're probably looking for a web crawler.
Here's an example of a simple crawler written in C++.
Moreover, you might want to have a look at wget, a tool for retrieving files via HTTP, HTTPS and FTP.
If you are looking at a specific web page, you could try retrieving the page and parsing it to get to the exact location you want, e.g. a specific div.
Since you are using C++, you could try reading up on using libcurl to retrieve the information you need from the URL.
You can download an HTML file with WinHTTP (working example) and then search the file. The std::string class has some find functions you can use for searching if your needs are relatively basic.
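To make the "retrieve the page, then parse down to a specific div" idea concrete, here is a rough sketch. It is in Python rather than C++ purely to keep it short, and uses only the standard library. The element id "news" and the helper names fetch and extract_text are made up for illustration. It assumes the text you want sits inside an element with a known id.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class TextById(HTMLParser):
    """Collect the text found inside the element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

def extract_text(html, target_id):
    parser = TextById(target_id)
    parser.feed(html)
    return "".join(parser.parts).strip()

def fetch(url):
    # Download the page; assumes UTF-8 and replaces undecodable bytes.
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would look like extract_text(fetch(some_url), "news"). The same two steps (download, then walk the markup to the location you care about) carry over to libcurl or WinHTTP plus an HTML parser in C++.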

File system regular expression search tool

What is the best tool to make complex (multi-line) regular expression file contents searches with good reporting capabilities?
I need to make a report over large Java/JSP code base and I have to make some charts afterward.
Eclipse is rather good at searches, but it does not provide a good report of what is found. It just shows the tree of files, but I would like to see a table with columns corresponding to the full match, each group, file name, file path, file date, maybe some version control information, etc. Then I can transfer this table to Excel and make the graphs that I want.
Is there some generic file system search tool that has such capabilities? Or maybe there is some Eclipse plugin that can give better reports (note that I'm stuck on Eclipse 3.1.2)?
Agent Ransack, TextPad, and UltraEdit allow you to perform regular expression searches against the file system. My favorite is Agent Ransack as you can specify regular expressions for the file names and for the content.
PowerGREP (on Windows) can be used to do (most of) that. You can define the format of your search results quite freely. I haven't tried yet to also add file meta information to the search results, but that should work. Not sure if you can add version control information (where would that come from?) - perhaps if you could be a bit more specific, I could check.
Other than that, why not write a small Python/Ruby/Perl script like JasonTrue suggested?
For searches over code bases with queries that understand the language structure, look at SD Search Engine. This tool indexes large source bases to provide very fast query response.
Queries are stated in terms of language elements (identifiers, operators, strings, ...) with constraints over the language elements (including wildcards and regexps on identifiers, strings and comments, as well as range constraints on numbers). Language whitespace and linebreaks (and comments unless you insist) are ignored.
If you want to do a plain regexp search on file character content, you can do that too, but you don't get the speed advantage of the index; it runs more like regular grep.
The interactive query result is shown in a hit window with other hits; by clicking, you can go to a window containing the full source code of a hit.
In logging mode, all hits found are written to a log file with N lines of context, where you configure N. That's probably the report you want.
um... grep -r ?
Or ruby/perl/python, if you want to have more control over the final output; it sounds like what you're after would only be a few lines.
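For what it's worth, a Python version of that "few lines" script might look roughly like this. It is only a sketch: the function names, the default extensions and the column order are my own choices, and it reads each file fully into memory so that the regex can match across line breaks.

```python
import csv
import os
import re
from datetime import datetime

def search_tree(root, pattern, extensions=(".java", ".jsp")):
    """Yield one row per match: file name, path, modified time, full match, groups."""
    regex = re.compile(pattern, re.MULTILINE | re.DOTALL)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Read the whole file so multi-line patterns can match.
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            modified = datetime.fromtimestamp(
                os.path.getmtime(path)).isoformat(timespec="seconds")
            for m in regex.finditer(text):
                yield [name, path, modified, m.group(0), *m.groups()]

def write_report(rows, out_path):
    """Write the rows as CSV, which Excel can open directly."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Something like write_report(search_tree("src", r"class\s+(\w+)\s*\{"), "report.csv") would then give you a table to chart. Version control information is not covered here; you would have to add a column by querying your VCS for each path.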