How to Set a Custom Delimiter in Pig - mapreduce

What is the correct syntax to set a custom TextInputFormat delimiter in Pig? I've tried several variations on the following, but it's treating the delimiter as literal characters instead of a carriage return and line feed.
set textinputformat.record.delimiter '\r\n';
Pig Version is 0.12.0-cdh5.9.0 and Hadoop Version is 2.6.0-cdh5.9.0

Not ideal, but here is a workaround:
Create a properties file, e.g. myprops.properties, containing the following line: textinputformat.record.delimiter=\r\n
Then run your script like this: pig -P ~/myprops.properties -f path/to/pigscript.pig
This looks like a known issue, as mentioned in the fourth paragraph of the fourth comment on PIG-4572.
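To illustrate the symptom from the question, here is a minimal Python sketch (not Pig; the sample data is made up) of the difference between a delimiter made of the two control characters CR and LF and one made of the four literal characters \, r, \, n. If the delimiter ends up as the literal text, no record boundary is ever found and the whole input becomes a single record.

```python
# Hypothetical sample input with Windows-style CRLF line endings.
data = "id1,alpha\r\nid2,beta\r\nid3,gamma"

crlf = "\r\n"        # two characters: carriage return + line feed
literal = "\\r\\n"   # four characters: backslash, 'r', backslash, 'n'

print(len(crlf), len(literal))  # 2 4

# Splitting on the real CRLF yields one record per line.
print(data.split(crlf))         # ['id1,alpha', 'id2,beta', 'id3,gamma']

# Splitting on the literal text never matches: a single record comes back.
print(data.split(literal))
```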

Here is the syntax:
SET textinputformat.record.delimiter '<delimiter>';
This works for me.

Related

What regex can I use to match and replace full stops in multiple filenames?

I am looking to replace full stops in a filename; however, I need to remove some and replace others.
The file names are structured like so:
A.A M12345678 SOMEWORD 20.08.2019.pdf
A.A M12345678 SOMEWORD1 SOMEWORD2 20.08.2019.pdf
I want the format to be the following:
AA M12345678 SOMEWORD 20-08-2019.pdf
AA M12345678 SOMEWORD1 SOMEWORD2 20-08-2019.pdf
So the first full stop should be removed but the full stop encountered in the date should be a hyphen (-).
I have been using the Command Prompt, but I am running into some issues as I am fairly new to regular expressions.
I have tried approaching the problem one step at a time, starting by focusing only on replacing the date format.
I've tested my regex using https://regexr.com/ and it matches correctly.
[0-9]\K[.]
My understanding is that the expression above should match the full stops in the date.
However when I run the following command:
ren *[0-9]\K[.].pdf -
It fails to find the file.
Expected Result
AA M12345678 SOMEWORD 20-08-2019.pdf
Actual Result
The expression I use just returns this error when I use the REN command.
"The filename, directory name, or volume label syntax is incorrect."
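Pretty sure you could do this on the command line with sed (in a Unix-like shell). Note that sed -i would edit the contents of a file, not its name, so to rename files, pass each filename through sed and hand the result to mv:
for f in *.pdf; do mv -- "$f" "$(printf '%s\n' "$f" | sed -E 's/([0-9]{2})\.([0-9]{2})\.([0-9]{4})/\1-\2-\3/; s/^(.)\.(.*)/\1\2/')"; done
Here, sed runs two separate search-and-replace commands on each name: the first matches a date and reformats it with hyphens, and the second matches your A.A prefix and removes its full stop. The -E flag enables extended regular expressions, so the parentheses form capture groups without backslashes, and [0-9] is used because sed has no \d shorthand.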
More about capture groups here.
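If Python is available, the same two replacements can be done without sed or REN. This is a sketch under assumptions: the files are .pdf files in one folder, the date is always DD.MM.YYYY, and the full stop to remove sits in the first space-separated token (the A.A prefix). The names new_name and rename_pdfs are hypothetical.

```python
import re
from pathlib import Path

# DD.MM.YYYY -> DD-MM-YYYY, capturing the three date parts.
DATE = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")
# A full stop inside the first space-separated token (e.g. "A.A"); the
# lookahead requires a space later, so "report.pdf" keeps its extension dot.
PREFIX_DOT = re.compile(r"^([^ .]+)\.(?=[^ ]* )")


def new_name(name: str) -> str:
    """Hyphenate the date, then drop the full stop in the prefix."""
    name = DATE.sub(r"\1-\2-\3", name)
    return PREFIX_DOT.sub(r"\1", name, count=1)


def rename_pdfs(folder: str = ".", dry_run: bool = True) -> None:
    """Rename every *.pdf in folder; with dry_run=True only print the plan."""
    for path in Path(folder).glob("*.pdf"):
        target = path.with_name(new_name(path.name))
        if target == path:
            continue  # name already in the desired format
        print(f"{path.name} -> {target.name}")
        if not dry_run:
            path.rename(target)
```

Calling rename_pdfs(r"C:\path\to\folder") first previews the changes; passing dry_run=False then performs them.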

Create Vim Command for a Regex search

I have a few regular expressions that I use with xVim for Xcode. Rather than repeatedly typing them out in the command bar with \<Regex>, I'd like to be able to invoke them with a custom command, like :Regex1. So I've added command Regex1 “/-\s*\(“ to my xvimrc file and restarted Xcode. When I run :Regex1, however, nothing happens.
Your command wouldn't even work in original Vim. I don't know xVim, but try something along these lines:
" With cursor moving to match.
command Foo /foo/
" Just updating the search pattern (but less likely to be portable to xVim).
command Foo let @/ = 'foo'
If none of that works, try defining a mapping instead. As this is just translating keys, it has the highest chance of being supported.
I would suggest using this Perl regex plugin, since it already does what you want.
https://github.com/othree/eregex.vim
Abbreviations ...
I understand you often use the same regex. You can use abbreviations instead of a command to do a search.
ab re -\s*(
Then type /, re and a space, and re should expand to your long regex (here just -\s*( ).
... Not user-defined commands
User-defined commands are not available in ed, in vi, or in a Vim built without the +eval feature (see :h user-commands and scroll one line up).
For a list of ex commands: http://www.csb.yale.edu/userguides/wordprocess/vi-summary.html
For a list of ed commands: http://pubs.opengroup.org/onlinepubs/7908799/xcu/ed.html

How to use vi to remove occurrences of a character on lines matching a regex?

I'm trying to change the case of method names for some functions from lowercase_with_underscores to lowerCamelCase for lines that begin with public function get_method_name(). I'm struggling to get this done in a single step.
So far I have used the following
:%s/\(get\)\([a-zA-Z]*\)_\(\w\)/\1\2\u\3/g
However, this only replaces one _ character at a time. What I would like is a search and replace that does something like the following:
Identify all lines containing the string public function [gs]et.
On these lines, perform the following search and replace :s/_\(\w\)/\u\1/g
EDIT:
Suppose I have lines get_method_name() and set_method_name($variable_name) and I only want to change the case of the method name and not the variable name. How might I do that? The get_method_name() case is simpler, of course, but I'd like a solution that works for both in a single command. I've been able to use :%g/public function [gs]et/ . . . as per the solution listed below to solve the get_method_name() case, but unfortunately not the set_method_name($variable_name) case.
If I've understood you correctly, I don't know why the things you've tried haven't worked, but you can use :g to run a command on every line matching a pattern.
Your example would be something like:
:%g/public function [gs]et/:s/_\(\w\)/\u\1/g
Update:
To match only the method names, we can use the fact that, as this looks like PHP, only method names appear before the first $.
To do that, we can use a negative lookbehind, \@<!:
:%g/public function [gs]et/:s/\(\$.\+\)\@<!_\(\w\)/\u\2/g
This looks behind (\@<!) for a $ followed by one or more characters and only matches _\(\w\) if no such $ is found.
Bonus points(?):
To do this for multiple buffers, put bufdo in front of the %g.
You want to use a substitute with an expression (:h sub-replace-expression).
Match the complete string you want to process, then pass that string to the substitute() function to actually change it:
:%s/\(get\|set\)\zs_\w\+/\=substitute(submatch(0), '_\([A-Za-z]\)', '\U\1', 'g')
Running the above on
get_method_name($variable_name)
set_method_name($variable_name)
returns
getMethodName($variable_name)
setMethodName($variable_name)
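For comparison, here is the same "match the whole name, then transform it" idea as a small Python sketch, e.g. for batch-processing PHP files outside Vim. It assumes, as the question does, that accessors are declared as public function get_... or public function set_...; the function name camel_case_accessors is hypothetical. Only the name right after public function is touched, so $variable_name keeps its underscores.

```python
import re

# "public function get"/"set" followed by one or more _segments of the name.
ACCESSOR = re.compile(r"(public function [gs]et)((?:_[A-Za-z0-9]+)+)")


def camel_case_accessors(line: str) -> str:
    """Rewrite get_method_name / set_method_name to lowerCamelCase on one line."""
    def repl(m: re.Match) -> str:
        # Like substitute(..., '_\([A-Za-z]\)', '\U\1', 'g'): drop the "_"
        # and uppercase the letter that follows it.
        tail = re.sub(r"_([A-Za-z])", lambda c: c.group(1).upper(), m.group(2))
        return m.group(1) + tail

    return ACCESSOR.sub(repl, line)


for src in ("    public function get_method_name()",
            "    public function set_method_name($variable_name)"):
    print(camel_case_accessors(src))
```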
To have vi replace sad with happy on all lines in a file:
:1,$s/sad/happy/g
(It is the 1,$ range before the s command that tells vi to run the command on every line in the file; :%s/sad/happy/g is equivalent.)

Delete all lines up to some regex match

I want to delete everything from the start of the document up to some regex match, such as _tmm. I wrote the following custom command:
command! FilterTmm exe 'g/^_tmm\\>/,/^$/mo$' | norm /_tmm<CR> | :0,-1 d
This doesn't work as expected. But when I execute these commands directly using the command line, they work.
Do you have any alternative suggestions to accomplish this job using custom commands?
It seems that you want to remove from beginning to the line above the matched line.
A /pattern search can take an offset, like /pattern/{offset} (see :h search-offset for details). For your needs, you could do this (no matter where your cursor is):
ggd/_tmm/-1<cr>
EDIT
I read your question again; it seems that you want to do it in a single command line.
Your script has a problem: :normal does not treat | as a command separator (it is taken as part of the keys), so :normal must be the last command on the line.
Try this line and see if it works for you:
exe 'norm gg'|/_tmm/-1|0,.d
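Outside Vim, the same "delete from the top of the file down to the line above the first match" operation is short to express. This is a minimal Python sketch; the function name drop_until is hypothetical, and like the failed Vim search it leaves the text untouched when nothing matches.

```python
import re
from typing import List


def drop_until(lines: List[str], pattern: str) -> List[str]:
    """Drop every line above the first line matching pattern; keep the match and the rest."""
    rx = re.compile(pattern)
    for i, line in enumerate(lines):
        if rx.search(line):
            return lines[i:]
    return lines  # no match: nothing is deleted


text = ["header", "notes", "_tmm first", "body"]
print(drop_until(text, r"^_tmm"))  # ['_tmm first', 'body']
```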

Trying to remove the first column of a document.

I'm using this command below to remove the first column of a document:
%s/^[^\t]*\zs\t[^\t]*\ze//g
but it says command not found. Any idea?
Here's the quickest way to remove the first column:
Press gg to go to the first character in the document.
Hit Ctrl+V to enter visual block mode.
Hit G (that is, shift-g) to go to the end of the document
Hit x to delete the first column.
I like the block selection solution by @Peter, but if you want to use substitution, you need this command:
:%s/^.//
Let's analyze why this works:
:%s executes a substitution on the whole document
/^./ selects the first character of each line
/ and replaces it with... nothing.
If I understand you correctly, this should do the job:
:%s/^[^\t]*//
The command removes all leading characters that are not a tab.
Alternatively, if you're editing a tab-separated values document and want to remove the first "column" together with the tab that follows it, then this should do it for you:
:%s/^[^\t]*\t//
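The substitution :%s/^[^\t]*\t// can be mirrored in Python, e.g. to check what it does on a sample before running it in the editor. This is a minimal sketch; the function name drop_first_column is hypothetical. Lines without a tab are left unchanged, just as the Vim substitution would not match them.

```python
import re

# Same pattern as :%s/^[^\t]*\t//: everything up to and including the first tab.
FIRST_FIELD = re.compile(r"^[^\t]*\t")


def drop_first_column(text: str) -> str:
    """Remove the first tab-separated field (and its tab) from every line."""
    return "\n".join(FIRST_FIELD.sub("", line, count=1) for line in text.split("\n"))


print(drop_first_column("id\tname\tcity\n1\tAda\tLondon"))
```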
The below command worked for me:
:%s/^\w*//