how to change pattern in file's line - regex

I have file with one line:
22:50133-MM:MM1,52-MM:MM2;23:254940-MM:MM1,63-MM:MM2;24:15574-MM:MM1,65-MM:MM2;
I need to find this part of the line, 24:15574-MM, and then replace the number 15574 with another one. The number can be any length.
I want to use bash for it, but I have no idea how to do it.
How can I do it? Please help.

Since you asked to use bash for it, here is an attempt using only its native operators, relying on the regex feature of the =~ operator (supported from bash 3.0 onwards).
Assuming your file has only a single line in it, you can follow the steps below.
The commands can be run directly on the command line, or
wrapped up in a shell script with the bash shebang (#!/bin/bash).
First capture the file contents for the regex match using $(<file), which stores the entire file contents in the variable.
fileContent=$(<file)
[[ $fileContent =~ .*24:([[:digit:]]+)-MM.* ]] && replacement="${BASH_REMATCH[1]}"
replaceValue=5555
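# Anchor the substitution on 24: and -MM so an identical number elsewhere in the line is left alone
printf "%s\n" "${fileContent/24:$replacement-MM/24:$replaceValue-MM}"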
For your input file, the commands produce a result
22:50133-MM:MM1,52-MM:MM2;23:254940-MM:MM1,63-MM:MM2;24:5555-MM:MM1,65-MM:MM2;
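A variation on the same idea, as a rough sketch only: instead of capturing the number, capture the text around it and rebuild the line. The variable name updated is made up for this example, and the sample line is the one from the question.

```shell
# Sample line copied from the question
fileContent='22:50133-MM:MM1,52-MM:MM2;23:254940-MM:MM1,63-MM:MM2;24:15574-MM:MM1,65-MM:MM2;'
replaceValue=5555

# Default to the unchanged line in case the pattern does not match
updated=$fileContent

# Group 1 is everything up to and including "24:", group 2 is "-MM" and the rest
if [[ $fileContent =~ ^(.*24:)[[:digit:]]+(-MM.*)$ ]]; then
    updated="${BASH_REMATCH[1]}${replaceValue}${BASH_REMATCH[2]}"
fi

printf '%s\n' "$updated"
```

Because the digits themselves are never captured, it does not matter how long the number is or whether the same digits occur earlier in the line.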

It can be easily achieved using sed command with -i option:
new_number=11111
sed -i "s/24:\(15574\)-MM/24:$new_number-MM/" /tmp/test.txt
/tmp/test.txt - replace with your current filepath
new_number - is a variable for replacement number
Since the number can be any length, match it with a regexp pattern instead of hardcoding it, using the -E option (extended regular expressions mode):
sed -i -E "s/24:[0-9]+-MM/24:$new_number-MM/" /tmp/test.txt
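To try the idea without touching a real file, here is a small sketch that works on a temporary copy and prints the result instead of editing in place. The temp file and the variable name result are assumptions for the example; the sample line comes from the question.

```shell
# Throwaway file holding the sample line from the question
tmp=$(mktemp)
printf '%s\n' '22:50133-MM:MM1,52-MM:MM2;23:254940-MM:MM1,63-MM:MM2;24:15574-MM:MM1,65-MM:MM2;' > "$tmp"

new_number=11111

# [0-9]+ matches a number of any length; without -i the file itself is not modified
result=$(sed -E "s/24:[0-9]+-MM/24:${new_number}-MM/" "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Once the output looks right, adding -i applies the same substitution to the file itself.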

Related

Match multiline pattern in bash using Perl on macOS

On macOS, using built-in bash, I need to match two words on two consecutive lines within a file, say myKey and myValue.
Example file:
<dict>
<key>myKey</key>
<string>myValue</string>
</dict>
I already have a working command for substituting a value in such a pair using perl:
perl -i -p0e 's/(<key>myKey<\/key>\s*\n\s*<string>).+(<\/string>)/$1newValue$2/' -- "$filepath"
Question is, how do I simply find whether the file contains that key/value pair, without substituting anything, or, more to the point, just get to know, whether any substitution was made?
EDIT:
Within replacement pattern: \1 -> $1.
Added clarification to the question.
For the basic question you only need to change the substitution operator to the match operator, and print conditionally on whether it matches or not. This can also be done with substitution.
However, since this is in a bash script you can also exit from the perl program (one-liner) with a code that indicates whether there was a match/substitution; then the script can check $?.
To only check whether a pattern is in a file
perl -0777 -nE'say "yes" if /pattern/' -- "$file"
The -0777, that "slurps" the whole file (into $_), is safer than -0 which uses the null byte as records separator. Also, here you don't want -i (change file in place) and want -n (loop over records) instead of -p (also prints each). I use -E instead of -e to enable (all) features, for say. See all this in perlrun.
Inside a shell script you can use the truthy/falsy return of the match operator in exit
perl -0777 -nE'exit(/pattern/)' -- "$file"
# now check $? in shell
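where you can now programmatically check whether the pattern was found in the file. Note that a match gives exit code 1 and no match gives 0, the reverse of the usual shell convention.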
Finally, to run the original substitution and be able to check whether any were made
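perl -i -0777 -pe'$n += s/pattern/replacement/g; END { $? = $n }' -- "$file"
# now check $? in shell
where now the exit code, so $? in the shell, is the number of substitutions made (modulo 256, since an exit code is a single byte). The count is stored and handed over in an END block because calling exit directly inside the -p loop would skip the implicit print and, with -i, leave the file empty.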
Keep in mind that this does abuse the basic success/failure logic of return codes.
See perlretut for a regex tutorial.

Extracting a match from a string with sed and a regular expression in bash

In bash, I want to get the name of the last folder in a folder path.
For instance, given ../parent/child/, I want "child" as the output.
In a language other than bash, the regex .*\/(.*)\/$ works.
Here's one of my attempts in bash:
echo "../parent/child/" | sed "s_.*/\(.*?\)/$_\1_p"
This gives me the error:
sed: -e expression #1, char 17: unterminated `s' command
What have I failed to understand?
One problem with your script is that inside the "s_.*/\(.*?\)/$_\1_p" the $_ is interpreted by the shell as a variable name.
You could either replace the double-quotes with single-quotes or escape the $.
Once that's fixed, the .*? may or may not work with your implementation of sed. It will be more robust to write something roughly equivalent that's more widely supported, for example:
sed -e 's_.*/\([^/]*\)/$_\1_'
Note that I dropped the p flag of sed to avoid printing the result twice.
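As a quick check of the two quoting fixes mentioned above, this sketch runs the same expression once in single quotes and once in double quotes with the $ escaped. The variable names single and escaped are made up for the example.

```shell
# Single quotes: the shell leaves $_ alone, so sed sees the $ anchor
single=$(printf '%s\n' '../parent/child/' | sed -e 's_.*/\([^/]*\)/$_\1_')

# Double quotes: the $ has to be escaped to stop the shell expanding $_
escaped=$(printf '%s\n' '../parent/child/' | sed -e "s_.*/\([^/]*\)/\$_\1_")

printf '%s\n' "$single" "$escaped"
```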
Finally, a much simpler solution will be to use the basename command.
$ basename ../parent/child/
child
Finally, a native Bash solution is also possible using parameter expansion:
path=../parent/child/
path=${path%/}
path=${path##*/}
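The two expansions can be wrapped in a small helper for reuse. This is just a sketch; the function name last_dir is hypothetical, and basename is called only to compare results.

```shell
# Hypothetical helper: print the last component of a directory path
last_dir() {
    local p=$1
    p=${p%/}      # drop one trailing slash, if present
    p=${p##*/}    # drop everything up to and including the last remaining slash
    printf '%s\n' "$p"
}

last=$(last_dir ../parent/child/)
via_basename=$(basename ../parent/child/)
printf '%s\n' "$last" "$via_basename"
```

The parameter expansion version avoids starting an external process, which can matter inside a loop over many paths.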
You can use cut too
echo '../parent/child/' | cut -d/ -f3

Linux script to parse each line, check the regex and modify the line

I'm trying to write a linux bash script that takes in input a csv file with lines written in the following format (something can be blank):
something,something,,number,something,something,something,something,something,something,,,
something,something.something,,number,something,something,something,something,something,something,,,
and i have to have as output the following format (if the lines contains . it has to separate the two substring in substring1,substring2 and remove one , character, else do nothing)
something,something,,number,something,something,something,something,something,something,,,
something,something,something,number,something,something,something,something,something,something,,,
I tried to parse each line of the file and check whether it matches a regex, but the command starts a never-ending loop (I don't know why), and moreover I don't know how to split the substring to get substring1,substring2 as output.
for f in /filepath/filename.csv
do
while read p; do
if [[$p == .\..]] ; then echo $p; fi
done <$f
done
Thanks in advance!
I can't provide you with a working code at the moment but a piece of quick advice:
1. Try with tool called sed
2. Learn about "capture groups" for regex to get info on how to divide the text based on expressions.
To separate strings AWK will be useful
echo "Hello.world" | awk -F"." '{print "STR1="$1", STR2="$2 }'
Hope it will help.
As your task is more about transforming unrelated lines of text than of parsing fields of csv formatted files, sed is indeed the tool to go.
Learning to use sed properly, even for the most basic tasks, is synonymous with learning regular expressions. The following invocation of the sed command transforms your input sample into your expected output:
sed 's/\.\([^,]*\),/,\1/g' input.csv >output.csv
In the above example, s/// is the replacement command.
From the manpage:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. [...]
Explaining the regexp and replacement of the above command is probably out of the scope for the question, so I'll finish my answer here... Hope it helps!
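For anyone who wants to see the substitution at work, here is a small sketch on shortened, made-up sample lines (the real rows have more fields). It matches a dot, captures everything up to the next comma, and swallows that comma, so the empty field after it disappears.

```shell
# Shortened, hypothetical sample rows: the second one has a dot in field 2
input='a,b,,1,x
a,b.c,,1,x'

# \.        the literal dot
# \([^,]*\) the text after the dot, up to the next comma (kept as \1)
# ,         the comma that gets dropped
output=$(printf '%s\n' "$input" | sed 's/\.\([^,]*\),/,\1/g')
printf '%s\n' "$output"
```

Lines without a dot pass through unchanged because the pattern never matches them.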
Ok, i managed to use regexp, but the following command seems not working again:
sed '\([^,]*\),\([^,]*\)\.\([^,]*\),,\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,\11,\12,'
sed: -e expression #1, char 125: unknown command: `\'
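One possible reading of that error: the expression is missing the leading s/ of the substitute command, and sed back-references only go from \1 to \9, so \10 to \12 would not work either. A shorter sketch, assuming the dot only ever appears in the second field (the sample line is made up), captures just the parts it needs and leaves the rest of the line alone:

```shell
# Hypothetical row with a dot in the second field followed by an empty field
line='a,b.c,,1,x,y'

# \1 = first field, \2 = text before the dot, \3 = text after the dot
fixed=$(printf '%s\n' "$line" | sed 's/^\([^,]*\),\([^,.]*\)\.\([^,]*\),,/\1,\2,\3,/')
printf '%s\n' "$fixed"
```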

Converting Regex to Sed

I have the following regex.
/http:\/\/([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+\/[a-zA-Z]+\.[a-zA-Z]+/g
It identifies matching URLs (https://regex101.com/r/sG9zR7/1). I need to modify it so that I can use it on the command line and have it print the results, so I changed it to the following:
sed -n 's/.*\(http:\/\/\([a-zA-Z0-9\-]+\.\)+[a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+\/[a-zA-Z]+\.[a-zA-Z]+\).*/\1/p' filename
(I was trying to bold the characters I added but could not.)
They were as follows:
sed -n 's/.*( (in the beginning )
\ (For the inner parenthesis)
).*/\1/p' filename (at the end)
However, i get no results when i execute it.
Make it a habit to use a delimiter other than / when dealing with URLs. It makes the pattern easier to read.
sed -r -n 's~.*(http://([a-z0-9-]+\.)+[a-z0-9-]+:[a-z0-9-]+/[a-z]+\.[a-z]+).*~\1~ip' file
Note that I use i modifier for ignorecase.
As hwnd comments, you should put -r flag to sed command as well since your pattern requires + to be treated in a special manner.
sed -rn 's~.*(http://([a-z0-9\-]+.)*[a-z0-9\-]+:[0-9]+\/[a-z0-9]+.[a-z]+).*~\1~ip' Filename is the working command. With the assistance of the sample supplied (thank you hjpotler92) I was able to figure out that the escape character did not need to be applied to certain characters. I will have to find out when and how it is applied when using the -r option.
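To make the escaping rule concrete: in extended mode, plain ( ) group and + repeats, while \. is still needed for a literal dot. The sketch below uses -E (accepted by both GNU and BSD sed) and a made-up input line; it leaves out the i flag, which is a GNU extension, so the sample is all lowercase.

```shell
# Hypothetical input line containing one URL with a port
line='see http://sub.example.com:8080/index.html for details'

# ~ as delimiter avoids escaping the slashes in the URL
url=$(printf '%s\n' "$line" |
    sed -nE 's~.*(http://([a-z0-9-]+\.)+[a-z0-9-]+:[0-9]+/[a-z0-9]+\.[a-z]+).*~\1~p')
printf '%s\n' "$url"
```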
You can achieve the same with an xpath query via xidel:
xidel file.html -e '//a/@href[fn:matches(.,"http://[^/]*:")]/fn:substring-after(.,"=")'

Regular Expressions for file name matching

In Bash, how does one match a regular expression with multiple criteria against a file name?
For example, I'd like to match against all the files with .txt or .log endings.
I know how to match one type of criteria:
for file in *.log
do
echo "${file}"
done
What's the syntax for a logical or to match two or more types of criteria?
Bash does not support regular expressions per se when globbing (filename matching). Its globbing syntax, however, can be quite versatile. For example:
for i in A*B.{log,txt,r[a-z][0-9],c*} Z[0-5].c; do
...
done
will apply the loop contents on all files that start with A and end in a B, then a dot and any of the following extensions:
log
txt
r followed by a lowercase letter followed by a single digit
c followed by pretty much anything
It will also apply the loop commands to any file starting with Z, followed by a digit in the 0-5 range and then by the .c extension.
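A quick way to see which names such a pattern picks up is to try it in a scratch directory. This is only a sketch: the file names are made up, and nullglob is switched on so that any sub-pattern without matches simply disappears.

```shell
# Scratch directory with made-up file names
dir=$(mktemp -d)
touch "$dir/AxB.log" "$dir/AyB.txt" "$dir/AzB.rq7" "$dir/AB.cpp" "$dir/Z3.c" "$dir/other.log"

shopt -s nullglob
count=0
# Brace expansion happens first, then each resulting pattern is globbed
for i in "$dir"/A*B.{log,txt,r[a-z][0-9],c*} "$dir"/Z[0-5].c; do
    echo "${i##*/}"
    count=$((count + 1))
done
shopt -u nullglob

rm -rf "$dir"
```

Here other.log is skipped because it does not start with A and end in B before the dot.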
If you really want/need to, you can enable extended globbing with the shopt builtin:
shopt -s extglob
which then allows significantly more features while matching filenames, such as sub-patterns etc.
See the Bash manual for more information on supported expressions:
http://www.gnu.org/software/bash/manual/bash.html#Pattern-Matching
EDIT:
If an expression does not match a filename, bash by default will substitute the expression itself (e.g. it will echo *.txt) rather than an empty string. You can change this behaviour by setting the nullglob shell option:
shopt -s nullglob
This will replace a *.txt that has no matching files with an empty string.
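The difference is easy to observe by looping over a pattern in an empty directory with the option off and then on. A minimal sketch, with the counter names without and with made up for the example:

```shell
# Empty scratch directory, so *.txt cannot match anything
dir=$(mktemp -d)

# Default behaviour: the unmatched pattern is passed through literally
shopt -u nullglob
without=0
for f in "$dir"/*.txt; do without=$((without + 1)); done

# With nullglob: the unmatched pattern expands to nothing
shopt -s nullglob
with=0
for f in "$dir"/*.txt; do with=$((with + 1)); done
shopt -u nullglob

echo "iterations without nullglob: $without, with nullglob: $with"
rmdir "$dir"
```

Without nullglob the loop body runs once on the literal pattern, which is why scripts that skip the option usually need an existence test inside the loop.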
EDIT 2:
I suggest that you also check out the shopt builtin and its options, since quite a few of them affect filename pattern matching, as well as other aspects of the shell:
http://www.gnu.org/software/bash/manual/bash.html#The-Shopt-Builtin
Do it the same way you'd invoke ls. You can specify multiple wildcards one after the other:
for file in *.log *.txt
for file in *.{log,txt} ..
for f in $(find . -regex ".*\.log")
do
echo "$f"
done
You simply add the other conditions to the end:
for VARIABLE in 1 2 3 4 5 .. N
do
command1
command2
commandN
done
So in your case:
for file in *.log *.txt
do
echo "${file}"
done
You can also do this:
shopt -s extglob
for file in *.+(log|txt)
which could be easily extended to more alternatives:
for file in *.+(log|txt|mp3|gif|foo)
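Here is a self-contained sketch of the extglob form in a scratch directory; the file names and the matched array are made up for the example. Note that shopt -s extglob has to run before bash parses the line containing the pattern, so it sits on its own line ahead of the loop.

```shell
# Scratch directory with made-up file names
dir=$(mktemp -d)
touch "$dir/a.log" "$dir/b.txt" "$dir/c.mp3" "$dir/d.md"

shopt -s extglob nullglob

matched=()
# +(log|txt) matches one or more repetitions of either alternative
for file in "$dir"/*.+(log|txt); do
    matched+=("${file##*/}")
done
shopt -u nullglob

printf '%s\n' "${matched[@]}"
rm -rf "$dir"
```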