On macOS, using built-in bash, I need to match two words on two consecutive lines within a file, say myKey and myValue.
Example file:
<dict>
<key>myKey</key>
<string>myValue</string>
</dict>
I already have a working command for substituting a value in such a pair using perl:
perl -i -p0e 's/(<key>myKey<\/key>\s*\n\s*<string>).+(<\/string>)/$1newValue$2/' -- "$filepath"
The question is: how do I simply find out whether the file contains that key/value pair without substituting anything, or, more to the point, just get to know whether any substitution was made?
EDIT:
Within replacement pattern: \1 -> $1.
Added clarification to the question.
For the basic question you only need to change the substitution operator to the match operator, and print conditionally on whether it matches or not. This can also be done with substitution.
However, since this is in a bash script you can also exit from the perl program (one-liner) with a code that indicates whether there was a match/substitution; then the script can check $?.
To only check whether a pattern is in a file
perl -0777 -nE'say "yes" if /pattern/' -- "$file"
The -0777, which "slurps" the whole file (into $_), is safer than -0, which uses the null byte as the record separator. Also, here you don't want -i (change the file in place), and you want -n (loop over records) instead of -p (which also prints each record). I use -E instead of -e to enable (all) features, for say. See all of this in perlrun.
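In a script one might, for example, capture that output and test it (a sketch using the key/value pair from the question):
found=$(perl -0777 -nE'say "yes" if /<key>myKey<\/key>\s*\n\s*<string>myValue<\/string>/' -- "$filepath")
if [ -n "$found" ]; then
    echo "key/value pair is present"
fi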
Inside a shell script you can use the truthy/falsy return of the match operator in exit
perl -0777 -nE'exit(/pattern/)' -- "$file"
# now check $? in shell
where you can now programmatically check whether the pattern was found in the file.
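For instance, with the question's pair filled in (note that with this approach a nonzero exit status means the pattern was found):
perl -0777 -nE'exit(/<key>myKey<\/key>\s*\n\s*<string>myValue<\/string>/)' -- "$filepath"
if [ "$?" -ne 0 ]; then
    echo "key/value pair found"
else
    echo "key/value pair not found"
fi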
Finally, to run the original substitution and be able to check whether any were made
perl -i -0777 -pe'$n = s/pattern/replacement/; END { exit $n }' -- "$file"
# now check $? in shell
where now the exit code, so $? in the shell, is the number of substitutions made. (The exit is deferred to an END block because calling exit directly inside the -p loop would skip the implicit print and leave the in-place-edited file empty.)
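Using the question's own substitution, the shell side might look like this (again, a nonzero status means a change was made):
perl -i -0777 -pe'$n = s/(<key>myKey<\/key>\s*\n\s*<string>).+(<\/string>)/$1newValue$2/; END { exit $n }' -- "$filepath"
if [ "$?" -ne 0 ]; then
    echo "value replaced"
else
    echo "no substitution made"
fi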
Keep in mind that this does abuse the basic success/failure logic of return codes.
See perlretut for a regex tutorial.
In bash, I want to get the name of the last folder in a folder path.
For instance, given ../parent/child/, I want "child" as the output.
In a language other than bash, this regex works: .*\/(.*)\/$
Here's one of my attempts in bash:
echo "../parent/child/" | sed "s_.*/\(.*?\)/$_\1_p"
This gives me the error:
sed: -e expression #1, char 17: unterminated `s' command
What have I failed to understand?
One problem with your script is that inside the "s_.*/\(.*?\)/$_\1_p" the $_ is interpreted by the shell as a variable name.
You could either replace the double-quotes with single-quotes or escape the $.
Once that's fixed, the .*? may or may not work with your implementation of sed. It will be more robust to write something roughly equivalent that's more widely supported, for example:
sed -e 's_.*/\([^/]*\)/$_\1_'
Note that I dropped the p flag of sed to avoid printing the result twice.
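For example, with the path from the question:
$ echo "../parent/child/" | sed -e 's_.*/\([^/]*\)/$_\1_'
child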
A much simpler solution, though, is to use the basename command.
$ basename ../parent/child/
child
Finally, a native Bash solution is also possible using parameter expansion:
path=../parent/child/
path=${path%/}
path=${path##*/}
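For example, at a prompt:
$ path=../parent/child/
$ path=${path%/}       # remove the trailing slash
$ echo "${path##*/}"   # print everything after the last remaining slash
child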
You can use cut too
echo '../parent/child/' | cut -d/ -f3
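Note, though, that the field number is tied to the depth of that particular path; with a path of a different depth the same command picks a different component:
$ echo '/data/projects/parent/child/' | cut -d/ -f3
projects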
I'm trying to write a Linux bash script that takes as input a csv file with lines written in the following format (any of the fields can be blank):
something,something,,number,something,something,something,something,something,something,,,
something,something.something,,number,something,something,something,something,something,something,,,
and I need to produce output in the following format (if a line contains a ., the field has to be split into the two substrings substring1,substring2 and one , character removed; otherwise do nothing):
something,something,,number,something,something,something,something,something,something,,,
something,something,something,number,something,something,something,something,something,something,,,
I tried to parse each line of the file and check whether it matches a regex, but the command starts a never-ending loop (I don't know why), and moreover I don't know how to split the substrings to produce substring1,substring2 as output:
for f in /filepath/filename.csv
do
while read p; do
if [[$p == .\..]] ; then echo $p; fi
done <$f
done
Thanks in advance!
I can't provide you with working code at the moment, but here is some quick advice:
1. Try the sed tool.
2. Learn about regex "capture groups" to see how to split text based on expressions (see the small sketch below).
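For instance, a tiny capture-group sketch in sed (the sample text is made up; it just shows the \(...\) grouping and the \1, \2 back-references):
$ echo 'foo.bar' | sed 's/\([^.]*\)\.\([^.]*\)/\1,\2/'
foo,bar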
To separate the strings, awk will be useful:
echo "Hello.world" | awk -F"." '{print "STR1="$1", STR2="$2 }'
Hope it will help.
As your task is more about transforming individual lines of text than about parsing fields of CSV-formatted files, sed is indeed the tool to go with.
Learning to use sed properly, even for the most basic tasks, goes hand in hand with learning regular expressions. The following invocation of the sed command transforms your input sample into your expected output:
sed 's/\.\([^,]*\),/,\1/g' input.csv >output.csv
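Run against the second sample line from the question, it produces the expected result:
$ echo 'something,something.something,,number,something,something,something,something,something,something,,,' | sed 's/\.\([^,]*\),/,\1/g'
something,something,something,number,something,something,something,something,something,something,,,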
In the above example, s/// is the replacement command.
From the manpage:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. [...]
Explaining the regexp and replacement of the above command is probably out of the scope for the question, so I'll finish my answer here... Hope it helps!
OK, I managed to use a regexp, but the following command still doesn't seem to work:
sed '\([^,]*\),\([^,]*\)\.\([^,]*\),,\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,\11,\12,'
sed: -e expression #1, char 125: unknown command: `\'
I have the following regex.
/http:\/\/([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+\/[a-zA-Z]+\.[a-zA-Z]+/g
It identifies matching URLs (https://regex101.com/r/sG9zR7/1). I need to modify it so that I can use it on the command line and have it print out the results, so I modified it to the following:
sed -n 's/.*\(http:\/\/\([a-zA-Z0-9\-]+\.\)+[a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+\/[a-zA-Z]+\.[a-zA-Z]+\).*/\1/p' filename
The characters I added were as follows (I was trying to show them in bold but could not):
sed -n 's/.*\( (at the beginning)
\ (before each inner parenthesis)
\).*/\1/p' filename (at the end)
However, I get no results when I execute it.
Make it a habit to use a delimiter other than / when dealing with URLs. It makes the pattern easier to read.
sed -r -n 's~.*(http://([a-z0-9-]+\.)+[a-z0-9-]+:[a-z0-9-]+/[a-z]+\.[a-z]+).*~\1~ip' file
Note that I use i modifier for ignorecase.
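For instance, on a made-up input line:
$ echo 'see HTTP://Example-Site.com:8080/index.html for details' | sed -r -n 's~.*(http://([a-z0-9-]+\.)+[a-z0-9-]+:[a-z0-9-]+/[a-z]+\.[a-z]+).*~\1~ip'
HTTP://Example-Site.com:8080/index.html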
As hwnd comments, you should add the -r flag to your sed command as well, since your pattern requires + to be treated in a special manner.
sed -rn 's~.*(http://([a-z0-9\-]+.)*[a-z0-9\-]+:[0-9]+\/[a-z0-9]+.[a-z]+).*~\1~ip' Filename is the working command. With the assistance of the sample supplied (thank you hjpotler92) I was able to figure out that the escape character did not need to be applied to certain characters. I will have to find out when and how it is applied when using the -r option.
You can achieve the same with an xpath query via xidel:
xidel file.html -e '//a/@href[fn:matches(.,"http://[^/]*:")]/fn:substring-after(.,"=")'
In Bash, how does one match a regular expression with multiple criteria against a file name?
For example, I'd like to match against all the files with .txt or .log endings.
I know how to match one type of criteria:
for file in *.log
do
echo "${file}"
done
What's the syntax for a logical or to match two or more types of criteria?
Bash does not support regular expressions per se when globbing (filename matching). Its globbing syntax, however, can be quite versatile. For example:
for i in A*B.{log,txt,r[a-z][0-9],c*} Z[0-5].c; do
...
done
will apply the loop contents to all files whose names start with A and end in B, then a dot and any of the following extensions:
log
txt
r followed by a lowercase letter followed by a single digit
c followed by pretty much anything
It will also apply the loop commands to any file starting with Z, followed by a digit in the 0-5 range and then the .c extension.
If you really want/need to, you can enable extended globbing with the shopt builtin:
shopt -s extglob
which then allows significantly more features while matching filenames, such as sub-patterns etc.
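For instance, a small sketch with made-up file names:
shopt -s extglob
# @(log|txt) matches exactly one of the alternatives;
# !(bak|tmp) matches anything except "bak" or "tmp"
for f in report.@(log|txt) data.!(bak|tmp)
do
echo "$f"
done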
See the Bash manual for more information on supported expressions:
http://www.gnu.org/software/bash/manual/bash.html#Pattern-Matching
EDIT:
If an expression does not match a filename, bash by default will substitute the expression itself (e.g. it will echo *.txt) rather than an empty string. You can change this behaviour by setting the nullglob shell option:
shopt -s nullglob
This will replace a *.txt that has no matching files with an empty string.
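A quick sketch of the difference, assuming no .txt files exist in the current directory:
shopt -u nullglob
for file in *.txt; do echo "${file}"; done    # prints the literal *.txt
shopt -s nullglob
for file in *.txt; do echo "${file}"; done    # prints nothing; the loop body never runs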
EDIT 2:
I suggest that you also check out the shopt builtin and its options, since quite a few of them affect filename pattern matching, as well as other aspects of the shell:
http://www.gnu.org/software/bash/manual/bash.html#The-Shopt-Builtin
Do it the same way you'd invoke ls. You can specify multiple wildcards one after the other:
for file in *.log *.txt
for file in *.{log,txt} ..
for f in $(find . -regex ".*\.log")
do
echo $f
done
You simply add the other conditions to the end:
for VARIABLE in 1 2 3 4 5 .. N
do
command1
command2
commandN
done
So in your case:
for file in *.log *.txt
do
echo "${file}"
done
You can also do this:
shopt -s extglob
for file in *.+(log|txt)
which could be easily extended to more alternatives:
for file in *.+(log|txt|mp3|gif|foo)
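Put together with the question's loop body, that might look like:
shopt -s extglob
for file in *.+(log|txt)
do
echo "${file}"
done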