Unix shell scripting — find and replace - regex
About 1000 files need this change, so doing it manually is not feasible. Can anyone suggest how to go about it with Unix/Linux shell scripting?
Requirement:
Wherever l2cache[25-48] appears in CONF_CATALOG_MULTI_SERVER_HOST_SECONDARY and CONF_SEARCH_MULTI_SERVER_HOST_SECONDARY, I need to change it to [13-24].
For example, edc-v1-l2cache25 changes to edc-v1-l2cache13, edc-v1-l2cache26 changes to edc-v1-l2cache14, and so on until 36 changes to 24; the remaining entries 37-48 are removed.
Wherever l2cache[1-24] appears in CONF_SEARCH_MULTI_SERVER_HOST and CONF_CATALOG_MULTI_SERVER_HOST, I need to change it to [1-12].
That is, l2cache[13-24] must be removed, so that only edc-v1-l2cache[1-12] remains in CONF_CATALOG_MULTI_SERVER_HOST and CONF_SEARCH_MULTI_SERVER_HOST.
Example/INPUT data:
CONF_CATALOG_MULTI_SERVER_HOST_SECONDARY=edc-v1-l2cache25 ,edc-v1-l2cache26 ,edc-v1-l2cache27 ,edc-v1-l2cache28 ,edc-v1-l2cache29 ,edc-v1-l2cache30 ,edc-v1-l2cache31 ,edc-v1-l2cache32 ,edc-v1-l2cache33 ,edc-v1-l2cache34 ,edc-v1-l2cache35 ,edc-v1-l2cache36 ,edc-v1-l2cache37 ,edc-v1-l2cache38 ,edc-v1-l2cache39 ,edc-v1-l2cache40 ,edc-v1-l2cache41 ,edc-v1-l2cache42 ,edc-v1-l2cache43 ,edc-v1-l2cache44 ,edc-v1-l2cache45 ,edc-v1-l2cache46 ,edc-v1-l2cache47 ,edc-v1-l2cache48
CONF_SEARCH_MULTI_SERVER_HOST_SECONDARY=edc-v1-l2cache25 ,edc-v1-l2cache26 ,edc-v1-l2cache27 ,edc-v1-l2cache28 ,edc-v1-l2cache29 ,edc-v1-l2cache30 ,edc-v1-l2cache31 ,edc-v1-l2cache32 ,edc-v1-l2cache33 ,edc-v1-l2cache34 ,edc-v1-l2cache35 ,edc-v1-l2cache36 ,edc-v1-l2cache37 ,edc-v1-l2cache38 ,edc-v1-l2cache39 ,edc-v1-l2cache40 ,edc-v1-l2cache41 ,edc-v1-l2cache42 ,edc-v1-l2cache43 ,edc-v1-l2cache44 ,edc-v1-l2cache45 ,edc-v1-l2cache46 ,edc-v1-l2cache47 ,edc-v1-l2cache48
CONF_CATALOG_MULTI_SERVER_HOST=edc-v1-l2cache1 ,edc-v1-l2cache2 ,edc-v1-l2cache3 ,edc-v1-l2cache4 ,edc-v1-l2cache5 ,edc-v1-l2cache6 ,edc-v1-l2cache7 ,edc-v1-l2cache8 ,edc-v1-l2cache9 ,edc-v1-l2cache10 ,edc-v1-l2cache11 ,edc-v1-l2cache12 ,edc-v1-l2cache13 ,edc-v1-l2cache14 ,edc-v1-l2cache15 ,edc-v1-l2cache16 ,edc-v1-l2cache17 ,edc-v1-l2cache18 ,edc-v1-l2cache19 ,edc-v1-l2cache20 ,edc-v1-l2cache21 ,edc-v1-l2cache22 ,edc-v1-l2cache23 ,edc-v1-l2cache24
CONF_SEARCH_MULTI_SERVER_HOST=edc-v1-l2cache1 ,edc-v1-l2cache2 ,edc-v1-l2cache3 ,edc-v1-l2cache4 ,edc-v1-l2cache5 ,edc-v1-l2cache6 ,edc-v1-l2cache7 ,edc-v1-l2cache8 ,edc-v1-l2cache9 ,edc-v1-l2cache10 ,edc-v1-l2cache11 ,edc-v1-l2cache12 ,edc-v1-l2cache13 ,edc-v1-l2cache14 ,edc-v1-l2cache15 ,edc-v1-l2cache16 ,edc-v1-l2cache17 ,edc-v1-l2cache18 ,edc-v1-l2cache19 ,edc-v1-l2cache20 ,edc-v1-l2cache21 ,edc-v1-l2cache22 ,edc-v1-l2cache23 ,edc-v1-l2cache24
OUTPUT data:
CONF_CATALOG_MULTI_SERVER_HOST_SECONDARY=edc-v1-l2cache13 ,edc-v1-l2cache14 ,edc-v1-l2cache15 ,edc-v1-l2cache16 ,edc-v1-l2cache17 ,edc-v1-l2cache18 ,edc-v1-l2cache19 ,edc-v1-l2cache20 ,edc-v1-l2cache21 ,edc-v1-l2cache22 ,edc-v1-l2cache23 ,edc-v1-l2cache24
CONF_SEARCH_MULTI_SERVER_HOST_SECONDARY=edc-v1-l2cache13 ,edc-v1-l2cache14 ,edc-v1-l2cache15 ,edc-v1-l2cache16 ,edc-v1-l2cache17 ,edc-v1-l2cache18 ,edc-v1-l2cache19 ,edc-v1-l2cache20 ,edc-v1-l2cache21 ,edc-v1-l2cache22 ,edc-v1-l2cache23 ,edc-v1-l2cache24
CONF_CATALOG_MULTI_SERVER_HOST=edc-v1-l2cache1 ,edc-v1-l2cache2 ,edc-v1-l2cache3 ,edc-v1-l2cache4 ,edc-v1-l2cache5 ,edc-v1-l2cache6 ,edc-v1-l2cache7 ,edc-v1-l2cache8 ,edc-v1-l2cache9 ,edc-v1-l2cache10 ,edc-v1-l2cache11 ,edc-v1-l2cache12
CONF_SEARCH_MULTI_SERVER_HOST=edc-v1-l2cache1 ,edc-v1-l2cache2 ,edc-v1-l2cache3 ,edc-v1-l2cache4 ,edc-v1-l2cache5 ,edc-v1-l2cache6 ,edc-v1-l2cache7 ,edc-v1-l2cache8 ,edc-v1-l2cache9 ,edc-v1-l2cache10 ,edc-v1-l2cache11 ,edc-v1-l2cache12
So, those names are diabolically long!
Judging from the output, for the CSMSHS and CCMSHS entries, the 24 hosts numbered 25-48 must be reduced to 12 hosts numbered 13-24, and the other 12 entries deleted.
Similarly, for the CSMSH and CCMSH entries, you want to delete the entries with values 13-24.
Presumably you don't have to worry about erratic entries in the files; they are all consistent at the moment, and should all be consistent afterwards.
Frankly, the simplest thing is to create the replacement string and use a relatively simple search to identify the lines that need to be changed (ensuring that the changes are idempotent; reapplying the script to a converted file won't change the file a second time). I find the space-comma notation off-putting; in the circles I work in, that should be comma-space. However, we can leave that alone.
I'd use Perl, but Awk could be used if you wanted to, and Python likewise would do the job handily.
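```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base = "edc-v1-l2cache";

# Build the replacement list for the _SECONDARY keys: hosts 13-24.
my $secondary = "";
my $pad = "";
for (my $i = 13; $i <= 24; $i++)
{
    $secondary .= $pad . $base . $i;
    $pad = " ,";    # ", " would be nicer, but match the existing files
}

# Build the replacement list for the primary keys: hosts 1-12.
my $primary = "";
$pad = "";
for (my $i = 1; $i <= 12; $i++)
{
    $primary .= $pad . $base . $i;
    $pad = " ,";    # ", " would be nicer, but match the existing files
}

# Only lines that still carry the old, full lists are rewritten, so running
# the script on an already converted file changes nothing. The empty pattern
# in s/// reuses the last successfully matched pattern, i.e. the m// in the
# if clause, and $1 holds the key name captured there.
while (<>)
{
    s//$1$secondary/
        if (m/(CONF_(?:CATALOG|SEARCH)_MULTI_SERVER_HOST_SECONDARY=)${base}25 ,.*${base}48$/);
    s//$1$primary/
        if (m/(CONF_(?:CATALOG|SEARCH)_MULTI_SERVER_HOST=)${base}1 ,.*${base}24$/);
    print;
}
```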
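To apply this to all ~1000 files at once, you can run the script above with Perl's -i switch (for example perl -i.bak script.pl followed by the file names). If you'd rather keep everything inside one script, here is a minimal sketch that performs the same two substitutions with in-place editing set up from Perl itself via $^I, keeping a .bak copy of every file. Like the script above, it assumes the files follow exactly the layout shown in the question; fix_files and the path in the usage comment are hypothetical. Building the lists with join/map is just a more compact way of writing the two for loops.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch, assuming every file uses exactly the layout shown in the question.
my $base      = "edc-v1-l2cache";
my $secondary = join ' ,', map { "$base$_" } 13 .. 24;   # hosts 13-24
my $primary   = join ' ,', map { "$base$_" } 1 .. 12;    # hosts 1-12

# Hypothetical helper: rewrite the given files in place.
sub fix_files {
    my @files = @_;
    local $^I   = '.bak';    # enable in-place editing, backup suffix .bak
    local @ARGV = @files;    # <> now reads these files one after another
    while (my $line = <>) {
        # Only full, unconverted lists match, so a second run changes nothing.
        $line =~ s/(CONF_(?:CATALOG|SEARCH)_MULTI_SERVER_HOST_SECONDARY=)\Q$base\E25 ,.*\Q$base\E48$/$1$secondary/;
        $line =~ s/(CONF_(?:CATALOG|SEARCH)_MULTI_SERVER_HOST=)\Q$base\E1 ,.*\Q$base\E24$/$1$primary/;
        print $line;         # goes to the file being rewritten, not STDOUT
    }
}

# Hypothetical usage: perl fix_hosts.pl /path/to/configs/*.conf
fix_files(@ARGV) if @ARGV;
```

Creating one .bak per file doubles the file count, but it makes a bad run easy to undo; set $^I to '' if you don't want backups.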
If some of the entries might be missing and that needs to be treated specially, you have to work a lot harder.
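One way to do that extra work, sketched under assumptions: instead of matching the whole line, split the value on commas and renumber or drop each host individually. The rules below are my reading of the question (secondary keys: 25-36 become 13-24 and 37-48 are dropped; primary keys: only 1-12 survive), and fix_line is a hypothetical helper. Secondary entries that are already 13-24 are kept as they are, so running it twice is still harmless, and duplicates are dropped.

```perl
use strict;
use warnings;

# Hypothetical sketch: fix the host lists entry by entry, so lines with
# missing or extra hosts are still handled. Rules assumed from the question:
#   *_SECONDARY keys: 25-36 become 13-24, 37-48 are dropped,
#                     13-24 are kept (so a second run changes nothing)
#   primary keys:     1-12 are kept, everything else is dropped
sub fix_line {
    my ($line) = @_;
    my ($key, $value, $eol) = $line =~
        /^(CONF_(?:CATALOG|SEARCH)_MULTI_SERVER_HOST(?:_SECONDARY)?)=(.*?)(\r?\n?)\z/s
        or return $line;                          # some other line: unchanged
    my $is_secondary = $key =~ /_SECONDARY\z/;
    my (@kept, %seen);
    for my $host (split /\s*,\s*/, $value) {
        my ($prefix, $n) = $host =~ /^(.*?)(\d+)\s*\z/
            or do { push @kept, $host; next };    # no trailing number: keep
        if ($is_secondary) {
            $n -= 12 if $n >= 25 && $n <= 36;     # 25-36 -> 13-24
            next unless $n >= 13 && $n <= 24;     # drop 37-48 and the rest
        }
        else {
            next unless $n >= 1 && $n <= 12;      # drop 13-24
        }
        my $new = "$prefix$n";
        push @kept, $new unless $seen{$new}++;    # skip duplicates
    }
    return "$key=" . join(' ,', @kept) . $eol;
}

# Hypothetical usage: perl fix_hosts_loose.pl /path/to/configs/*.conf
if (@ARGV) {
    local $^I = '.bak';                           # edit in place, keep .bak
    while (my $line = <>) {
        my $fixed = fix_line($line);
        print $fixed;
    }
}
```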
Related
How can I elegantly handle state when parsing line oriented files using regex?
I have a Perl script that I use to extract data from a raw data/log file, and I need help making the script dynamic. Here is the relevant part of the Perl script:

```perl
if ( /Catalyst tester (\S+)\S+/ ) {
    $DETAILS{tester_name} = $1;
}
if ( /(CATALYST_TH\s*1)/ ) {
    $FOUND_CAT = 1;
    $DETAILS{test_head} = $1;
    $TEST_HEAD = $1;
}
if ($FOUND_CAT) {
    if ( /(BACKPLANE\s*A)/ ) {
        $FRAME = $TEST_HEAD . ' ' . $1;
        $FOUND_BACKPLANE_A = 1;
    }
    if ( /(BACKPLANE\s*B)/ ) {
        $FRAME = $TEST_HEAD . ' ' . $1;
        $FOUND_BACKPLANE_B = 1;
    }
}
if ( /END/ ) {
    $FOUND_CAT = 0;
    $FOUND_BACKPLANE_A = 0;
    $FOUND_BACKPLANE_B = 0;
    $FOUND_PRECISION_1 = 0;
    $FOUND_PRECISION_2 = 0;
    $FOUND_UB_SPS = 0;
    $FOUND_HSD100_1 = 0;
    $FOUND_HSD100_2 = 0;
    $FOUND_HSD100_3 = 0;
    $FOUND_TSY = 0;
    $FOUND_TIME_SUB = 0;
}
if ($FOUND_BACKPLANE_A) {
    if ( /(\d+)\s+(\S+)\s+(\w+)\s+\w+\s+\d*\s+\#\s+\S+\s+(?:\d+\s+){2}((?!.*EMPTY\b).+)$/ ) {
        push @{$DETAILS{frame}}, $FRAME;
        push @{$DETAILS{slot}}, $1;
        push @{$DETAILS{part_no}}, $2;
        push @{$DETAILS{serial_no}}, $3;
        push @{$DETAILS{board_name}}, $4;
    }
}
if ($FOUND_BACKPLANE_B) {
    if ( /(\d+)\s+(\S+)\s+(\w+)\s+\w+\s+\d*\s+\#\s+\S+\s+((?!.*EMPTY\b).+)$/ ) {
        push @{$DETAILS{frame}}, $FRAME;
        push @{$DETAILS{slot}}, $1;
        push @{$DETAILS{part_no}}, $2;
        push @{$DETAILS{serial_no}}, $3;
        push @{$DETAILS{board_name}}, $4;
    }
}
if ( /(PRECISION\_AC\s*1)/ ) {
    $FOUND_PRECISION_1 = 1;
    $FRAME = $1;
}
if ($FOUND_PRECISION_1) {
    if ( /(\d+)\s+(\S+)\s+(\w+)\s+\w+\s+\d*\s+\#\s+\S+\s+((?!.*EMPTY\b).+)/ ) {
        push @{$DETAILS{frame}}, $FRAME;
        push @{$DETAILS{slot}}, $1;
        push @{$DETAILS{part_no}}, $2;
        push @{$DETAILS{serial_no}}, $3;
        push @{$DETAILS{board_name}}, $4;
    }
}
## And the rest of the script follows the same format
```

My logic is: if the line/word/header (as I prefer to call it) is found, set a variable to true (1). Then, in another if statement, if that variable is 1, search for the data I need with a regex and store it in a hash.
My main problem is that the script is not dynamic. I wrote an if statement for every header, and the flag variable set to 1 is different for every header; for the CATALYST_TH header, for example, it is $FOUND_CAT = 1;. Some things to note: under the header CATALYST_TH 1 there will always be a BACKPLANE A, and there may also be a BACKPLANE B. For BACKPLANE B I have to write another if statement and push everything into the hash again. This is tedious, because other log files may go up to C or D, which I don't know about yet, and that makes the script hard to maintain. Other headers, like PRECISION_AC 1, need only one line. Only CATALYST_TH 1 always has a backplane; I mention this in case it affects the answers.
So, can anyone help? Is there any way to reduce the number of variables, or the number of if statements? I've tried, but then the other data wasn't pushed into the hash when a flag wasn't true. Suggestions would be greatly appreciated.
P.S. Ignore the comments with a single '#'; those are part of the log file. The ones with two, like '##', are comments I have added.
Since your parsing depends heavily on state (what your program has already seen), I would switch from plain regexes to Parse::RecDescent, which can handle all that state nicely. It has a steep learning curve at first, though. There's a tutorial on it here, and an older, simpler tutorial here.
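If you don't want to take on Parse::RecDescent yet, a lighter option in plain Perl (a different technique from the grammar approach, and only a sketch) is to keep a single piece of state, the name of the current frame, and drive the headers from a list of patterns. BACKPLANE\s*[A-Z] then covers A, B, C, D and so on without extra flags. The header spellings in @frame_headers, the simplified board-line regex, the anchored END check and parse_log itself are assumptions for illustration; the real log may need the original, more specific board patterns.

```perl
use strict;
use warnings;

# Hypothetical sketch: one state variable ($frame) instead of one $FOUND_*
# flag per header. Header spellings and the board-line layout are assumed.
my @frame_headers = (
    qr/(PRECISION_AC\s*\d+)/,
    qr/(HSD100\s*\d+)/,        # assumed spelling; add more headers here
);
# Assumed board line: slot part serial ... # <word> board-name
my $board_re = qr/^\s*(\d+)\s+(\S+)\s+(\w+)\s.*#\s+\S+\s+(.+?)\s*$/;

sub parse_log {
    my ($fh) = @_;
    my (%details, $test_head, $frame);
    LINE: while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /Catalyst tester (\S+)/) { $details{tester_name} = $1; next }
        if ($line =~ /(CATALYST_TH\s*\d+)/)   { $test_head = $1; $frame = undef; next }
        if ($line =~ /(BACKPLANE\s*[A-Z])/) {    # A, B, C, D, ... alike
            $frame = defined $test_head ? "$test_head $1" : $1;
            next;
        }
        for my $re (@frame_headers) {
            if ($line =~ $re) { $frame = $1; next LINE }
        }
        if ($line =~ /^\s*END\b/) { ($test_head, $frame) = (); next }
        next unless defined $frame;              # not inside any section
        my ($slot, $part, $serial, $board) = $line =~ $board_re or next;
        next if $board =~ /\bEMPTY\b/;
        my %row = (frame => $frame, slot => $slot, part_no => $part,
                   serial_no => $serial, board_name => $board);
        push @{ $details{$_} }, $row{$_} for keys %row;
    }
    return \%details;
}
```

Adding a new single-line header then means adding one pattern to @frame_headers rather than a new flag plus two if statements.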
Cakephp Files and Folder class with regex to search for file
I am creating a shell for removing backup files. I create a backup of an original file using the keyword 'bkp', which can be anywhere in the file name, for example ["abc.bkp.ctp", "abc-bkp.ctp", "abc_bkp.ctp", "bkp_abc.ctp"]. I want to remove all such files using the shell. I am using CakePHP's File and Folder classes (http://book.cakephp.org/2.0/en/core-utility-libraries/file-folder.html). How would I write a regex to search for these files? Here is my shell logic:

```php
public function removeBkp() {
    $path = BASE_PATH . DS . APP_DIR . DS;
    $count = 0;
    $this->out("", 2);
    $this->out("-------------------------------------", 1);
    $this->out("Path : " . $path, 1);
    $this->out("FILES DETAILS", 1);
    $this->out("-------------------------------------", 2);
    $dir = new Folder($path);
    // Need to search bkp files here
    $defaultFiles = $dir->findRecursive("(bkp)");
    $results = $defaultFiles;
    if ($results == null || empty($results)) {
        $this->out("No file to delete.", 3);
    } else {
        foreach ($results as $file) {
            // unlink($file);
            $this->out("File removed - " . $file, 1);
            $count++;
        }
        $this->out("Total files: " . count($results), 1);
    }
}
```
You can match all these file names with this regex:
([\w._-]*bkp[\S]*?\.ctp)
This assumes that only ., _ and - are used to separate the parts of the file name; you may need to add more characters to that character class. It also assumes the file name always ends in .ctp. Demo: https://regex101.com/r/tB9nM9/1
EDIT: If you want to match any extension, you can generalise the extension with \w{3}, where 3 is the length. You can allow some variance there if needed (for example \w{2,4}), but more specific is usually better.
([\w._-]*bkp[\S]*\.\w{3})
DEMO: https://regex101.com/r/tB9nM9/2
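To check the two patterns outside CakePHP, here is a small Perl sketch that runs them, anchored to the whole name, against the file names from the question plus three made-up counterexamples. It also shows the limit of \w{3}: a four-character extension such as .html is not matched. Keep in mind that bkp anywhere in the name matches, so a name like abkpz.ctp would be caught too. bkp_files is a hypothetical helper.

```perl
use strict;
use warnings;

# The answer's two patterns; the sketch anchors them so the whole name must match.
my $ctp_only = qr/([\w._-]*bkp[\S]*?\.ctp)/;     # .ctp files only
my $any_ext  = qr/([\w._-]*bkp[\S]*\.\w{3})/;    # any 3-character extension

# Hypothetical helper: keep the names that match the pattern completely.
sub bkp_files {
    my ($re, @names) = @_;
    return grep { /\A$re\z/ } @names;
}

# The first four names come from the question; the rest are made up.
my @names = qw(abc.bkp.ctp abc-bkp.ctp abc_bkp.ctp bkp_abc.ctp
               abc.ctp bkp_abc.php bkp_index.html);

# prints the four .ctp backups
print "ctp only: @{[ bkp_files($ctp_only, @names) ]}\n";
# prints the four .ctp backups plus bkp_abc.php (not bkp_index.html)
print "any ext:  @{[ bkp_files($any_ext, @names) ]}\n";
```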
regex validating telephone number, but chops white space using perl
I have an HTML field in a form that takes a phone number. It validates correctly when I use (), / or -. However, if I enter, say, 555 123 4567, it returns 555. As always, your help is greatly appreciated. Here is my code:

```perl
my $userName=param("userName");
my $password=param("password");
my $phoneNumber=param("phoneNumber");
my $email=param("email");
my $onLoad=param("onLoad");
my $userNameReg = "[a-zA-Z0-9_]+";
my $passwordReg = "([a-zA-Z]*)([A-Z]+)([0-9]+)";
my $phoneNumberReg = "((\(?)([2-9]{1}[0-9]{2})(\/|-|\)|\s)?([2-9]{1}[0-9]{2})(\/|-|\s)?([0-9]{4}))";
my $emailReg = "([a-zA-Z0-9_]{2,})(@)([a-zA-Z0-9_]{2,})(.)(com|COM)";
if ($onLoad !=1)
{
    @controlValue = ($userName, $password, $phoneNumber, $email);
    @regex = ($userNameReg, $passwordReg, $phoneNumberReg, $emailReg);
    @validated;
    for ($i=0; $i<4; $i++)
    {
        $retVal= validatecontrols ($controlValue[$i], $regex[$i]);
        if ($retVal)
        {
            $count++;
        }
        if (!$retVal)
        {
            $validated[$i]="*"
        }
    }
    sub validatecontrols
    {
        my $ctrlVal = shift();
        my $regexVal = shift();
        if ($ctrlVal =~ /^$regexVal$/)
        {
            return 1;
        }
        return 0;
    }
}
```
*html code is here*
I realize that this is part of an assignment, so you may be working under specific constraints. However, your attempt to abstract out your data validation is honestly just making things messy and harder to follow. It also ties you to regex tests specifically, which may not be the best approach. As has already been said, email validation should be done via a module. Also, for this phone validation, an easier solution is to strip out anything that isn't a digit and then run your validation test. The code below demonstrates what I mean:

```perl
my $userName    = param("userName");
my $password    = param("password");
my $phoneNumber = param("phoneNumber");
my $email       = param("email");
my $onLoad      = param("onLoad");
my $error       = 0;

if ($onLoad != 1) {
    if ($userName !~ /^[a-zA-Z0-9_]+$/) {
        $userName = '*';
        $error++;
    }
    if ($password !~ /^[a-zA-Z]*[A-Z]+[0-9]+$/) {
        $password = '*';
        $error++;
    }
    # Keep only the digits, then check: optional leading 1, then an area
    # code and an exchange that each start with 2-9, then 4 more digits.
    (my $phoneNumOnly = $phoneNumber) =~ s/\D//g;
    if ($phoneNumOnly !~ /^1?[2-9]\d{2}[2-9]\d{6}$/) {
        $phoneNumber = '*';
        $error++;
    }
    if ($email !~ /^\w{2,}\@\w{2,}\.com$/i) {
        $email = '*';
        $error++;
    }
}
```
*html code is here*
The regex you're using looks overly complicated. You have a lot of capturing groups in there, but I get the feeling you're mostly using them to build "OR" alternatives with the vertical bar. If you're only choosing between single characters, it's usually much easier to use a character class in square brackets. Also, it's not a good idea to use \s when you mean a plain space, since \s matches any whitespace character (including tabs and newlines). Maybe try something like this:
(?:\(?[2-9]\d{2}\)?[-\/ ]?)?[2-9]\d{2}[-\/ ]?\d{4}
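A sketch of the suggested pattern in use, with two hedged observations. First, storing a regex in a double-quoted string, as the question does, loses backslashes such as \s, which Perl turns into a plain s (use warnings reports it as an unrecognized escape); building the pattern with qr// avoids that. Second, because the exchange must start with [2-9], the question's own example 555 123 4567 is rejected by this pattern, so test with something like 555 234 5678. valid_phone is a hypothetical helper.

```perl
use strict;
use warnings;

# The suggested pattern, kept in qr// so backslash escapes survive intact.
my $phone_re = qr/(?:\(?[2-9]\d{2}\)?[-\/ ]?)?[2-9]\d{2}[-\/ ]?\d{4}/;

# Hypothetical helper: 1 if the whole input is a phone number, else 0.
sub valid_phone {
    my ($input) = @_;
    return $input =~ /\A$phone_re\z/ ? 1 : 0;    # anchored at both ends
}

# prints 1, 1 and 0: the exchange 123 starts with 1, which [2-9] rejects
print "$_ => ", valid_phone($_), "\n"
    for '555 234 5678', '(555) 234-5678', '555 123 4567';
```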
perl regex match and store specific character in scalar variable
Suppose I have this line in a file: my %address = ( or any similar line in which I define a hash. I want to find the character "(" in the line and store "address" in, say, $hash_name. How do I do it? The basic idea is to capture the names of the hashes defined in the files. What I am trying is: foreach $line <MYFILE> { if($line =~ /($/ { How do I proceed further?
Not sure if I understood your problem, but how about:

```perl
my %hash;
while (my $line = <MYFILE>) {
    if ($line =~ /\%(\w+)\s*=\s*\($/) {
        $hash{$1} = 1;
    }
}
```
```perl
open (F1, "inputfile.txt") or die("unable to open inputfile.txt");
my $hash_name;
while (<F1>) {
    if (/%(\w+) *= *\(/) {
        $hash_name = $1;
        print $hash_name;
    }
}
```
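For reference, a self-contained sketch of the same idea that returns every hash name it finds, using a lexical filehandle and three-argument open rather than the bareword F1. It assumes each declaration starts on a single line, like my %address = (; hash_names and the file name in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: names of hashes declared as "%name = (" on one line.
sub hash_names {
    my ($fh) = @_;
    my @names;
    while (my $line = <$fh>) {
        push @names, $1 if $line =~ /%(\w+)\s*=\s*\(/;
    }
    return @names;
}

# Hypothetical usage:
#   open my $fh, '<', 'inputfile.txt' or die "unable to open inputfile.txt: $!";
#   print "$_\n" for hash_names($fh);
```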
How can I read a custom defined pattern from a file in Perl?
Advance New Year wishes to all. I have an error log file whose contents follow a pattern of parameter, result and stderr (stderr can span multiple lines):

```
$ cat error_log
<parameter>:test_tot_count
<result>:1
<stderr>:Expected "test_tot_count=2" and the actual value is 3
test_tot_count = 3
<parameter>:test_one_count
<result>:0
<stderr>:Expected "test_one_count=2" and the actual value is 0
test_one_count = 0
<parameter>:test_two_count
<result>:4
<stderr>:Expected "test_two_count=2" and the actual value is 4
test_two_count = 4
...
```

I need to write a Perl function to store each parameter, result and stderr in an array or hash table. This is our own internally defined structure. I wrote the Perl function like this. Is there a better way of doing this using regular expressions themselves?

```perl
my $err_msg = "";
while (<ERR_LOG>) {
    if (/<parameter>:/) {
        s/<parameter>://;
        push @parameter, $_;
    } elsif (/<result>:/) {
        s/<result>://;
        push @result, $_;
    } elsif (/<stderr>:/) {
        if (length($err_msg) > 0) {
            push @stderr, $err_msg;
        }
        s/<stderr>://;
        $err_msg = $_;
    } else {
        $err_msg .= $_;
    }
}
if (length($err_msg) > 0) {
    push @stderr, $err_msg;
}
```
If you're using Perl 5.10, you can do something very similar to what you have now, but with a much nicer layout, by using the given/when structure:

```perl
use 5.010;
while (<ERR_LOG>) {
    chomp;
    given ($_) {
        when ( m{^<parameter>: (.*)}x ) { push @parameter, $1 }
        when ( m{^<result>: (.*)}x )    { push @result, $1 }
        when ( m{^<stderr>: (.*)}x )    { push @stderr, $1 }
        default { $stderr[-1] .= "\n$_" }
    }
}
```

It's worth noting that for the default case here, rather than keeping a separate $err_msg variable, I'm simply pushing onto @stderr when I see a stderr tag, and appending to the last item of the @stderr array when I see a continuation line. I add a newline for each continuation line, since I assume you want them preserved.

Despite the above code looking quite elegant, I'm not really all that fond of keeping three separate arrays, since it will presumably cause you headaches if things get out of sync, and because if you want to add more fields in the future you'll end up with lots and lots of variables floating around that you'll need to keep track of. I'd suggest storing each record inside a hash, and then keeping an array of records:

```perl
use 5.010;
my @records;
my $prev_key;
while (<ERR_LOG>) {
    chomp;
    given ($_) {
        when ( m{^<parameter> }x ) {
            push(@records, {});
            continue;
        }
        when ( m{^<(\w+)>: (.*)}x ) {
            $records[-1]{$1} = $2;
            $prev_key = $1;
        }
        default {
            $records[-1]{$prev_key} .= "\n$_";
        }
    }
}
```

Here we push a new record onto the array when we see a parameter field, add an entry to the record's hash whenever we see a key/value pair, and append to the last field we added if we see a continuation line.
The end result in @records looks like this:

```perl
(
    {
        parameter => 'test_tot_count',
        result    => 1,
        stderr    => qq{Expected "test_tot_count=2" and the actual value is 3\ntest_tot_count = 3},
    },
    {
        parameter => 'test_one_count',
        result    => 0,
        stderr    => qq{Expected "test_one_count=2" and the actual value is 0\ntest_one_count = 0},
    },
    {
        parameter => 'test_two_count',
        result    => 4,
        stderr    => qq{Expected "test_two_count=2" and the actual value is 4\ntest_two_count = 4},
    },
)
```

Now you can pass around a single data structure that contains all of your records, and you can add more fields in the future (even multi-line ones) and they'll be handled correctly. If you're not using Perl 5.10, this may be a good excuse to upgrade. If not, you can translate the given/when structures into more traditional if/elsif/else structures, but they lose much of their beauty in the conversion.
Paul
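If you'd rather not depend on given/when (it has been marked experimental since Perl 5.18, so newer perls warn when you use it), here is a sketch of the same array-of-records idea translated into plain if/elsif, as suggested above. It assumes each record starts with a <parameter> line; parse_error_log is a hypothetical name, and the small guard for a log that begins with another tag is my addition.

```perl
use strict;
use warnings;

# Sketch: one hash per record, collected in an array, using if/elsif only.
sub parse_error_log {
    my ($fh) = @_;
    my (@records, $prev_key);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^<(\w+)>:(.*)/) {
            my ($key, $value) = ($1, $2);
            # A <parameter> line starts a new record (the !@records guard
            # covers a log that begins with some other tag).
            push @records, {} if $key eq 'parameter' || !@records;
            $records[-1]{$key} = $value;
            $prev_key = $key;
        }
        elsif (defined $prev_key) {
            # Continuation line: append it to the field seen last.
            $records[-1]{$prev_key} .= "\n$line";
        }
    }
    return @records;
}

# Hypothetical usage:
#   open my $log, '<', 'error_log' or die "error_log: $!";
#   my @records = parse_error_log($log);
```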
The main thing that jumps out for refactoring is the repetition in the matching, stripping, and storing. Something like this (untested) code is more concise:

```perl
my ( $err_msg, %data );
while (<ERR_LOG>) {
    # s/// returns the number of substitutions, so take the tag from $1
    if ( s/^<(parameter|result|stderr)>:// ) {
        my $key = $1;
        if ( $key eq 'stderr' ) {
            push @{ $data{$key} }, $err_msg if $err_msg;
            $err_msg = $_;
        } else {
            push @{ $data{$key} }, $_;
        }
    } else {
        $err_msg .= $_;
    }
}
# grab the last err_msg out of the hopper
push @{ $data{stderr} }, $err_msg if $err_msg;
```

... but it may be harder to understand six months from now... 8^)
Looks nice. =) One improvement is probably to anchor those tags at the beginning of the line, e.g. if (/^<parameter>:/). It'll make the script a bit more robust. You can also avoid stripping the tag if you capture what's after it and use only that part: if (/^<parameter>:(.*)/s) { push @parameter, $1; }