Bash rename using regex array substitution

I have a very similar question to this post.
I would like to know how to rename occurrences within a filename with designated substitutions. For example, if the original file is called 'the quick brown quick brown fox.avi', I would like to rename it to 'the slow red slow red fox.avi'.
I tried this:
new="(quick=>'slow',brown=>'red')"
regex="quick|brown"
rename -v "s/($regex)/$new{$1}/g" *
but no love :(
I also tried with
regex="qr/quick|brown/"
but this just gives errors. Any idea what I'm doing wrong?

Based on your example, I think you want multiple substitutions (not just converting "quick brown" to "slow red", but converting a list of words to a list of new words). You can separate the substitutions with a semicolon. Here's a solution that works for your example:
rename -v 's/quick/slow/g;s/brown/red/g' *
And if you're really bent on using a hash to map the old strings to the new strings, you can cram even more Perl into the argument to rename (though at some point you might as well write a stand-alone Perl script):
rename -v '%::new=(quick=>"slow",brown=>"red");s/(quick|brown)/$::new{$1}/g' *

Related

Issues while processing zeroes found in CSV input file with Perl

Friends:
I have to process a CSV file using Perl and produce an Excel file as output, using the Excel::Writer::XLSX module. This is not homework but a real-life problem, where I cannot install an arbitrary Perl version (I have to use Perl 5.6) or arbitrary Perl modules (I have a limited set of them). My OS is UNIX. I can also use ksh and csh embedded in Perl (with some limitations, as I have found so far). Please limit your answers to the tools I have available. Thanks in advance!
Even though I am not a Perl developer (I come from other languages), I have already done most of the work. However, the customer is asking for extra processing, and that's where I am getting stuck.
1) The stones in the road come from two sides: Perl's and Excel's particular ways of processing data. I already found a workaround for the Excel side: I am using the leading-apostrophe ('0) notation, which is the final way Excel seems to represent the data when the # formatting style is used. But, as mentioned in the subject, I still have difficulties processing zeroes found in the CSV input file.
2) Scenario:
I need to catch standalone zeroes which might be present in whichever line / column / cell of the CSV input file and put them as such (as zeroes) in the Excel output file.
I will go directly to the point of my question to avoid losing your valuable time; I am providing more details after the question.
Research and question:
I tried to use Perl regex to find standalone "0" and replace them by whichever string, planning to replace them back to "0" at the end of processing.
perl -p -i -e 's/\b0\b/string/g' myfile.csv
and
perl -i -ple 's/\b0\b/string/g' myfile.csv
Both work, but only from the command line. They don't work when I call them from a Perl script as follows:
system("perl -i -ple 's/\b0\b/string/g' myfile.csv")
I don't know why... I have already tried using exec and eval instead of system, with the same results.
Note that I have a ton of regex that work perfectly with the same structure, such as the following:
system("perl -i -ple 's/input/output/g' myfile.csv")
I have also tried using backticks and qx//, without success. Note that qx// and backticks don't behave the same here, since qx// complains about the \b boundaries because of the forward slashes.
I have tried using sed -i, but my system rejects -i as an invalid flag (I don't know if this happens on all UNIX systems, but it does on the one at work; perl -i, however, is accepted).
I have tried embedding awk (which works from the command line), this way:
system(q{awk -F ',' -v OFS=',' '$1 == "0" { $1 = "string" }1' myfile.csv > myfile_copy.csv});
But this works only for the first column (from the command line) and, besides the disadvantage of creating an extra copy of the file, Perl complains about the > redirection, taking it as "greater than"...
system(q#awk 'BEGIN{FS=OFS=",";split("1 2 3 4 5",A," ") } { for(i in A)sub(0,"string",$A[i] ) }1' myfile.csv#);
This awk works from the command line, but only for 5 columns, and it does not work from Perl using the q# quoting.
All the combinations of exec and eval have also been tested without success.
I have also tried passing each of the awk components to system as separate arguments, separated by commas, but did not find any valid way to pass the redirection operator (>), since Perl rejects it for the reason mentioned above.
Using another approach, I noticed that the "standalone zeroes" seem to be "swallowed" by the Text::CSV module, so I got rid of it and went back to traditional line-by-line looping over the CSV, splitting on commas, which preserves the zeroes.
However, I then ran into the "mystery" of isdual in Perl, and because of my module limitations I cannot use Dumper. I also explored the guts of Perl's internals and tried $x ^ $x, which was deprecated in version 5.22 but is valid up to that version (mine, as I said, is 5.6). This is useful to distinguish numbers from strings. However, while if( $x ^ $x ) returns TRUE for strings, if( !( $x ^ $x ) ) did not return TRUE when $x = 0. [UPDATE: I tried this in a dedicated Perl script, written just for this purpose, and it works. I believe my probably wrong conclusion ("not returning TRUE") was reached before I realized that Text::CSV was swallowing my zeroes. Doing new tests...]
I will appreciate very much your help!
MORE DETAILS ON MY REQUIREMENTS:
1) This is a dynamic report coming from a database; it is handed over to me and I pick it up programmatically from a folder. Dynamic means it may have any number of tables, any number of columns in each table, any column header names, and any number of rows in each table.
2) I do not know, and cannot know, the column names, because they vary from report to report. So, I cannot be guided by column names.
A sample input:
Alfa,Alfa1,Beta,Gamma,Delta,Delta1,Epsilon,Dseta,Heta,Zeta,Iota,Kappa
0,J5,alfa,0,111.33,124.45,0,0,456.85,234.56,798.43,330000.00
M1,0,X888,ZZ,222.44,111.33,12.24,45.67,0,234.56,0,975.33
3) Input Explanation
a) This is an example of a random report with 12 columns and 3 rows. The first row is the header.
b) I call "standalone zeroes" those "clean" zeroes in the CSV file, from the second row onwards, between commas: like 0, when in the first position of the row, or like ,0, in subsequent positions.
c) In the second row of the example you can read, from the beginning of the row, 0,J5,alfa,0, which in this particular case are "words" or "strings": 4 names (note that two of them are zeroes, which need to be treated as strings). Thus we have a 4 name-columns example (Alfa,Alfa1,Beta,Gamma are the headers for those columns, but only in this scenario). From that point onwards in the second row you can see floating point (*.00) numbers and, among them, 2 zeroes, which are numbers. Finally, in the third row you can read M1,0,X888,ZZ, which are the names for the first 4 columns. Note, please, that the 4th column in the second row has 0 as its name, while the 4th column in the third row has ZZ as its name.
Summary: as a general picture, I have a table-report divided in 2 parts, from left to right: 4 columns for names, and 8 columns for numbers.
Always the first M columns are names and the last N columns are numbers.
- M is unknown: I cannot know in advance how many columns will hold words / strings.
- N is unknown: I cannot know in advance how many columns will hold numbers.
- It IS known that the N number columns always start right after the M name columns end, and this is constant across all rows.
I have done some quick research on Perl's regex word boundary ( \b ) and have not found any relevant information on whether it is supported in Perl 5.6.
However, since you are using an old Perl version, try the traditional UNIX / Linux style (I mean, what Perl inherits from the shell), like this:
system("perl -i -ple 's/^0/string/g' myfile.csv");
The previous regex should do the job, making the change at the start of each line in your CSV file, where it matches.
Or, maybe better (if you have those "standalone" zeroes and want to avoid any unwanted change to a string with "leading zeroes"):
system("perl -i -ple 's/^0,/string,/g' myfile.csv");
[Note that I have added a comma after the zero and, of course, after the string.]
Note that the first regex should work; the second one is just a precaution, to be extra cautious.

Replace list of value in a text file

I have the following problem:
I have a huge file and I have to replace several values.
For example, I have to replace:
DOG with RED
CAT with BLUE
FISH with GREEN
...
...
n with N
Do you know of any software that can take a list of values as input and replace all the values of the list in the text in one hit?
EDIT:
My text file is really big, like a book or similar.
In this book I have many words that I have to replace with other words.
You can use sed to substitute matching expressions, for example
sed -e 's/DOG/RED/g;s/CAT/BLUE/g' < inputFile > outputFile
You haven't specified whether you want this changed in place or not. You could simply delete the old version afterwards if you were happy with the results.
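If the list of pairs is long, you don't have to type the s commands by hand: keep the pairs in a file and generate the sed script from it. This is a sketch; pairs.txt and its "OLD NEW" format are my own assumptions, not from the question:

```shell
# pairs.txt: one "OLD NEW" pair per line (hypothetical format)
printf '%s\n' 'DOG RED' 'CAT BLUE' 'FISH GREEN' > pairs.txt
# turn each pair into a sed s-command, then apply them all in one pass
awk '{ print "s/" $1 "/" $2 "/g" }' pairs.txt > rules.sed
sed -f rules.sed < inputFile > outputFile
```

This keeps a single pass over the big file no matter how many replacements are in the list (note it does no escaping, so the pairs must not contain sed metacharacters like / or &).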
If you are on Windows, some other answers will give you an equivalent or suggest tools such as Cygwin: e.g. here

Bash script to match segments in lines of source code

I'm trying to learn a new programming language, and it's big: thousands of new terms to learn. I know programming, but I don't know the name used for a given procedure or constant in this language. I have a script file I put together that helps tremendously by searching through a large selection of source files, as long as I get a group of the characters right.
But now I want to use && to match up multiple segments in the same line, and I want to pass this whole expression to the script file as one argument, so I might pass it this with a read command:
moo && cow
And it would match this:
Moonlight over Moscow
But not this:
I heard a cow mooing.
If I wanted it either way I would pass it this:
moo && cow || cow && moo
It's tricky, and probably outside what you can normally do with the available syntax. But then I'm no expert, so I don't really know.
I'm flexible on what gets passed to the script, like single &s and |s, the use of brackets, and so on. I just need to understand the rules involved and which utility can do it for me. Or set of utilities if it comes to that.
If you only want to check for the two elements in order, simply match anything between them with .*:
my_str="moonlight over moscow"
if [[ $my_str =~ moo.*cow ]]; then
echo "match"
fi
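To also cover the moo && cow || cow && moo case from the question (match in either order), you can alternate the two orderings. A small sketch, with a made-up helper name, using grep -i so that "Moonlight over Moscow" matches despite the capital letters:

```shell
# match if both "moo" and "cow" occur, in either order (case-insensitive)
matches_both() {
  printf '%s\n' "$1" | grep -qiE 'moo.*cow|cow.*moo'
}

matches_both "Moonlight over Moscow" && echo "match"   # moo...cow
matches_both "I heard a cow mooing." && echo "match"   # cow...moo
matches_both "just mooing" || echo "no match"
```

A full parser for arbitrary &&/||/bracket expressions would need more machinery, but for a fixed pair of segments this alternation is enough.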

Using regex with `rename` version from `util-linux`

I’m using a GNU/Linux distribution where the utility rename comes from util-linux and I want to make full use of regular (Perl or POSIX) expressions with it.
There are two versions of rename:
The “Perl” version, with syntax rename 's/^fgh/jkl/' fgh*
The util-linux version, with syntax rename fgh jkl fgh*
The use of regexes seems pretty obvious with the first one, but I have no easy access to it. I'm confused about the second one: I could not find any relevant documentation or examples on whether regular expressions can be used with it at all and, if so, in what format.
Let’s take, to make a simple example, a directory containing:
foo_a1.ext
foo_a32.ext
foo_c18.ext
foo_h12.ext
I want to use a syntax like one of these two lines:
rename "foo_[a-z]([0-9]{1,2}).ext" "foo_\1.ext" *
rename "foo_[:alpha:]([:digit:]{1,2}).ext" "foo_\1.ext" *
for which the expected output would be:
foo_1.ext
foo_32.ext
foo_18.ext
foo_12.ext
Of course this does not work! Either I'm missing something obvious, or there is no implemented way to use actual regular expressions with this tool.
(Please note that I am aware of the other possibilities for renaming files with regular expressions in a shell interpreter; this question aims at a specific version of the rename tool.)
Here is the manual page: http://linux.die.net/man/1/rename. It is pretty straightforward:
rename from to file...
rename will rename the specified files by replacing the first
occurrence of from in their name by to.
I believe there are no regexes, it is just plain substring match.
The following command gives the expected result with your input files, but using the Perl version:
rename 's/foo_\D+(\d+)/foo_$1/' *.ext
You can test the command first using the -n option to rename.
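If only the util-linux rename is available, a plain shell loop gives the same result as the Perl one-liner. This is a sketch (it assumes a sed that accepts -E, as GNU and BSD sed do, and is meant to be run in the directory containing the files):

```shell
# strip the letter between "foo_" and the digits, e.g. foo_a1.ext -> foo_1.ext
for f in foo_*.ext; do
  new=$(printf '%s\n' "$f" | sed -E 's/^foo_[a-z]([0-9]{1,2})\.ext$/foo_\1.ext/')
  if [ "$f" != "$new" ]; then
    mv -- "$f" "$new"
  fi
done
```

The [ "$f" != "$new" ] guard skips files the pattern didn't change, so nothing is moved onto itself.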

Text editor that searches within search results

Does anyone know of a text editor that searches within search results using regex?
I would like to perform a regex search on several text files and get a list of matches and then apply another regex search on the search results to further narrow down results. I would prefer a Windows GUI editor rather than a specialized editor with a steeper learning curve like Vim or Emacs.
You might want to look at PowerGrep. It's not exactly a text editor, but you can open files containing your search results within its built-in text editor, and edit stuff there.
The main thing, though, is that it allows you to search using a regex (or a list of regexes), then apply an additional regex to each search result before returning a 'final' result, which I believe is what you are asking for. It's kind of hard to explain, but maybe you get the idea.
The only problem with PowerGrep is that its UI is not very good. To say it takes some getting used to is an understatement. But once you figure it out, you can do a lot of powerful stuff (search/replace, data collection, etc on multiple files whose file names can also be regexes).
The companion product EditPadPro by the same company is also a great editor that has a really good regex engine built-in (probably the same one as in PowerGrep), but it doesn't allow you to do the 'regex-applied-to-a-regex-result' that I think you are asking for.
Do you want a list of files whose text matches both regexps, or a list of lines?
In the first case you can do :
{ grep -l -R 'pattern1' * ; grep -l -R 'pattern2' * ; } | sort | uniq -d
Note that with Windows you can get those binaries from GnuWin32 and use nearly the same syntax in a batch file:
( grep -l -R "pattern1" *
grep -l -R "pattern2" *
) | sort | uniq -d
In the second case, with Vim, you can use my answer about narrowing quickfix results with a regexp.
Of course you can also copy your search results to a buffer and do some linewise filtering.
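Outside any editor, the line-wise case is just chained greps. A sketch (note that when grep prints a file-name or line-number prefix, the second grep sees that prefix too, which can cause false positives if the second pattern is numeric):

```shell
# lines that match pattern1, narrowed to those that also match pattern2
grep -n 'pattern1' somefile.txt | grep 'pattern2'
```

Each additional pipe stage narrows the results further, which is the same "search within search results" idea as in the PowerGrep answer.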