I'm trying to process my text.
For example, I have:
asdf asdf get.this random random get.that
get.it this.no also.this.no
My desired output is:
get.this get.that
get.it
So the regexp should match only this pattern (get.\w), and it has to be applied repeatedly because a line can contain several occurrences, so the easiest way with sed,
sed 's/.*(REGEX).*/\1/'
does not work (it shows only one occurrence per line).
Probably the best way is to use grep -o, but I have an old version of grep in which the -o flag is not available.
This grep may give what you need:
grep -o "get[^ ]*" file
Try awk:
awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
You might need to tweak the regex between the slashes for your specific issue. Sample output:
$ awk '{for(i=1;i<=NF;i++){if($i~/get\.\w+/){print $i}}}' file.txt
get.this
get.that
get.it
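Note that \w is a GNU awk extension, and that the command above prints every match on its own line. A minimal sketch, assuming any POSIX awk and using a hypothetical helper name extract_get, that relies on a plain bracket expression and keeps the matches from one input line together on one output line, as in the desired output:

```shell
# extract_get: hypothetical helper, reads the files given as arguments (or stdin)
extract_get() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      # keep only whole words of the form get.<word characters>
      if ($i ~ /^get\.[A-Za-z0-9_]+$/)
        out = (out == "" ? $i : out " " $i)
    # skip input lines that contain no match at all
    if (out != "") print out
  }' "$@"
}
```

Run as extract_get file.txt, or pipe text into it.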
With awk:
awk -v patt="^get" '{
for (i=1; i<=NF; i++)
if ($i ~ patt)
printf "%s%s", $i, OFS;
print ""
}' <<< "$text"
bash
while read -a words; do
for word in "${words[@]}"; do
if [[ $word == get* ]]; then
echo -n "$word "
fi
done
echo
done <<< "$text"
perl
perl -lane 'print join " ", grep {$_ =~ /^get/} @F' <<< "$text"
This might work for you (GNU sed):
sed -r '/\bget\.\S+/{s//\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1 /g;s/ $//}' file
or if you want one per line:
sed -r '/\n/!s/\bget\.\S+/\n&\n/g;/^get/P;D' file
Given the following files:
input_file:
if_line1
if_line2
template_file_1:
temp_file_line1
temp_file_line2
##regex_match## <= must be replaced by input_file
temp_file_line3
template_file_2:
temp_file_line1
temp_file_line2
{my_file.global} <= must be replaced by input_file
temp_file_line3
output_file:
temp_file_line1
temp_file_line2
if_line1
if_line2
temp_file_line3
For template_file_1 the following sed command works:
sed -n -e '/##regex_match##/{r input_file' -e 'b' -e '}; p' template_file_1 > output_file
However, for template_file_2 the analog sed command fails:
sed -r -n -e '/(?<={).+\.global(?=})/{r input_file' -e 'b' -e '}; p' template_file_2 > output_file
sed complains that the regular expression is invalid.
The given regex is at least PCRE valid, for example grep -oP '(?<={).+\.global(?=})' template_file_2 works. Any idea how to deal with that?
perl one-liners:
perl -pe 'do {local $/; open $f, "<input_file"; $_ = <$f>; close $f} if /\{.+?\.global\}/' template_file_2
or perhaps this one, not "pure" perl
perl -ne 'if (/\{.+?\.global\}/) {system("cat","input_file")} else {print}' template_file_2
Using CPAN modules can make this really tidy:
perl -MPath::Tiny -pe '$_ = path("input_file")->slurp if /\{.+?\.global\}/' template_file_2
idk exactly what that PCRE is intended to do but taking a guess at it, this will work using any awk in any shell on every UNIX box:
$ awk 'NR==FNR{new=new s $0; s=ORS; next} /##regex_match##/{$0=new} 1' input_file template_file_1
temp_file_line1
temp_file_line2
if_line1
if_line2
temp_file_line3
$ awk 'NR==FNR{new=new s $0; s=ORS; next} /\{[^.{}]+\.global}/{$0=new} 1' input_file template_file_2
temp_file_line1
temp_file_line2
if_line1
if_line2
temp_file_line3
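As for why sed rejects the original expression: lookbehind and lookahead are Perl/PCRE features, and sed's basic and extended regular expressions do not have them. Since the address only has to select the line, the braces can simply be matched literally. A sketch, using a hypothetical helper name replace_marker_line and assuming the whole marker line should be replaced:

```shell
# replace_marker_line: hypothetical helper
#   $1 = template file, $2 = file whose contents replace the marker line
replace_marker_line() {
  template=$1
  insert=$2
  # In a basic regular expression { and } are literal characters.
  # r queues the file for output, d then drops the marker line itself.
  sed -e '/{[^{}]*\.global}/{' -e "r $insert" -e 'd' -e '}' "$template"
}
```

Used as replace_marker_line template_file_2 input_file > output_file. The r command reads the file name up to the end of its -e fragment, which is why it gets a fragment of its own.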
I'm trying to refine my code by getting rid of unnecessary white spaces, empty lines, and having parentheses balanced with a space in between them, so:
int a = 4;
if ((a==4) || (b==5))
a++ ;
should change to:
int a = 4;
if ( (a==4) || (b==5) )
a++ ;
It does work for the brackets and empty lines. However, it forgets to reduce the multiple spaces to one space:
int a = 4;
if ( (a==4) || (b==5) )
a++ ;
Here is my script:
#!/bin/bash
# Script to refine code
#
filename=read.txt
sed 's/((/( (/g' $filename > new.txt
mv new.txt $filename
sed 's/))/) )/g' $filename > new.txt
mv new.txt $filename
sed 's/ +/ /g' $filename > new.txt
mv new.txt $filename
sed '/^$/d' $filename > new.txt
mv new.txt $filename
Also, is there a way to make this script more concise, e.g. removing or reducing the number of commands?
If you are using GNU sed then you need to use sed -r which forces sed to use extended regular expressions, including the wanted behavior of +. See man sed:
-r, --regexp-extended
use extended regular expressions in the script.
The same holds if you are using OS X sed, but then you need to use sed -E:
-E Interpret regular expressions as extended (modern) regular expressions
rather than basic regular expressions (BRE's).
You have to precede + with a \ (a GNU extension to basic regular expressions); otherwise sed tries to match the character + itself.
To make the script "smarter", you can accumulate all the expressions in one sed:
sed -e 's/((/( (/g' -e 's/))/) )/g' -e 's/ \+/ /g' -e '/^$/d' $filename > new.txt
Some implementations of sed even support the -i option that enables changing the file in place.
Sometimes, -r and -e won't work.
I'm using sed version 4.2.1 and they aren't working for me at all.
A quick hack is to use the * operator instead.
So let's say we want to replace all redundant space characters with a single space:
We'd like to do:
sed 's/ +/ /'
But we can use this instead:
sed 's/  */ /'
(note the double-space)
May not be the cleanest solution. But if you want to avoid -E and -r to remain compatible with both versions of sed, you can do a repeat character cc* - that's 1 c then 0 or more c's == 1 or more c's.
Or just use the BRE interval syntax, as suggested by @cdarke, to match a specific number of repetitions: c\{1,\}. The second number after the comma is left out to mean 1 or more.
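To compare the spellings side by side, here is a small sketch with three hypothetical helper functions that each squeeze runs of spaces read from stdin. The first two use only basic regular expressions; the third assumes a sed that accepts -E (recent GNU sed and BSD/OS X sed do):

```shell
# All three helpers are illustrative names, not part of any tool.
squeeze_star()     { sed 's/  */ /g'; }       # one space followed by zero or more spaces
squeeze_interval() { sed 's/ \{1,\}/ /g'; }   # BRE interval: one or more spaces
squeeze_ere()      { sed -E 's/ +/ /g'; }     # ERE: + means one or more
```

For example, printf 'a   b  c\n' | squeeze_star gives the same result as the other two.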
This might work for you:
sed -e '/^$/d' -e ':a' -e 's/\([()]\)\1/\1 \1/g' -e 'ta' -e 's/  */ /g' $filename >new.txt
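On the question of making the script more concise, the four sed passes and four mv calls can collapse into one pass and one mv. A sketch, using a hypothetical function name refine_file and a temporary file instead of -i so that it does not depend on a particular sed implementation:

```shell
# refine_file: hypothetical helper, rewrites the file named in $1
refine_file() {
  file=$1
  tmp=$(mktemp) || return 1
  # 1. delete empty lines
  # 2. loop: put a space between doubled ( or ) until none are left
  # 3. squeeze every run of spaces (the pattern is two spaces then *) to one
  sed -e '/^$/d' \
      -e ':a' -e 's/\([()]\)\1/\1 \1/g' -e 'ta' \
      -e 's/  */ /g' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Note that the last expression also squeezes leading indentation down to a single space, which may or may not be wanted for source code.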
On the bash front:
First I made a script test.sh
cat test.sh
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
SRC=`echo $line | awk '{print $1}'`
DEST=`echo $line | awk '{print $2}'`
echo "moving $SRC to $DEST"
mv "$SRC" "$DEST" || { echo "move $SRC to $DEST failed"; exit 1; }
done < "$1"
then we make a data file and a test file aaa.txt
cat aaa.txt
<tag1>19</tag1>
<tag2>2</tag2>
<tag3>-12</tag3>
<tag4>37</tag4>
<tag5>-41</tag5>
then test and show results.
bash test.sh list.txt
Text read from file: aaa.txt bbb.txt
moving aaa.txt to bbb.txt
I have some basic knowledge on using regular expressions with grep (bash).
But I want to use regular expressions the other way around.
For example I have a file containing the following entries:
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
Now I want to use bash to figure out to which line a particular number matches.
For example:
grep 8 file
should return:
line_three=[7-9]
Note: I am aware that the example of "grep 8 file" doesn't make sense, but I hope it helps to understand what I am trying to achieve.
Thanks for your help,
Marcel
As others have pointed out, awk is the right tool for this:
awk -F'=' '8~$2{print $0;}' file
... and if you want this tool to feel more like grep, a quick bash wrapper:
#!/bin/bash
awk -F'=' -v seek_value="$1" 'seek_value~$2{print $0;}' "$2"
Which would run like:
./not_exactly_grep.sh 8 file
line_three=[7-9]
My first impression is that this is not a task for grep, maybe for awk.
Trying to do things with grep I only see this:
for line in $(cat file); do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done
Using while for file reading (following comments):
while IFS= read -r line; do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done < file
This can be done in native bash using the syntax [[ $value =~ $regex ]] to test:
find_regex_matching() {
local value=$1
while IFS= read -r line; do # read from input line-by-line
[[ $line = *=* ]] || continue # skip lines not containing an =
regex=${line#*=} # prune everything before the = for the regex
if [[ $value =~ $regex ]]; then # test whether we match...
printf '%s\n' "$line" # ...and print if we do.
fi
done
}
...used as:
find_regex_matching 8 <file
...or, to test it with your sample input inline:
find_regex_matching 8 <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF
...which properly emits:
line_three=[7-9]
You could replace printf '%s\n' "$line" with printf '%s\n' "${line%%=*}" to print only the key (contents before the =), if so inclined. See the bash-hackers page on parameter expansion for a rundown on the syntax involved.
This is not built-in functionality of grep, but it's easy to do with awk, with a change in syntax:
/[0-3]/ { print "line one" }
/[4-6]/ { print "line two" }
/[7-9]/ { print "line three" }
If you really need to, you could programmatically change your input file to this syntax, if it doesn't contain any characters that need escaping (mainly / in the regex or " in the string):
sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#'
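The generated rules can be handed straight to awk as its program text. A sketch, using a hypothetical helper name match_rules and assuming, as stated above, that the rules file contains no / or " characters:

```shell
# match_rules: hypothetical helper
#   $1 = value to test, $2 = rules file with lines like name=regex
match_rules() {
  value=$1
  rules=$2
  # sed turns each name=regex line into an awk rule: /regex/ { print "name" }
  printf '%s\n' "$value" |
    awk "$(sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#' "$rules")"
}
```

With the sample file, match_rules 8 file prints the name of every rule whose regex matches 8.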
As I understand it, you are looking for a range that includes some value.
You can do this in gawk:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
$ awk -v n=8 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[7-9]
Since the digits are being treated as numbers (vs a regex) it supports larger ranges:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[75-95]
line_four=[55-105]
$ awk -v n=92 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[75-95]
line_four=[55-105]
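The three-argument form of match() is specific to gawk. A sketch of the same numeric range test for any POSIX awk, using split() instead and a hypothetical helper name find_in_range; it assumes every line looks like name=[low-high] and treats both bounds as inclusive:

```shell
# find_in_range: hypothetical helper
#   $1 = number to look for, remaining arguments = files (or stdin)
find_in_range() {
  n=$1
  shift
  awk -F= -v n="$n" '{
    # "[75-95]" split on runs of non-digits gives "", "75", "95", ""
    split($2, r, /[^0-9]+/)
    # adding 0 forces a numeric rather than a string comparison
    if (r[2] + 0 <= n + 0 && n + 0 <= r[3] + 0) print
  }' "$@"
}
```

Used as find_in_range 92 /tmp/file.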
If you are just looking to interpret the right hand side of the = as a regex, you can do:
$ awk -F= -v tgt=8 'tgt~$2' /tmp/file
You would like to do something like
grep -Ef <(cut -d= -f2 file) <(echo 8)
This will match what you want but will not display where.
With sed you can show a message:
echo "8" | sed -n '/[7-9]/ s/.*/Found it in line_three/p'
Now you would like to transfer your regexp file into such commands:
sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file
Store these commands in a virtual command file and you will have
echo "8" | sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file)
How can I use a (.*)-like match in sed to search for a pattern (e.g. STRING.*) and append "*" to the end of the string that matches?
Below is the example:
cat file1.txt
MAC BOOK
MODERN MACHINE
MECHANICS
MOUNT
DISK
DATA INFORMATICS
cat file2.txt
MAC
DATA
for line in $(cat file2.txt|uniq)
do
sed -i "/$line.*/s/$line.*/$line.**/" file1.txt
done
Expected output:
cat file1.txt
MAC* BOOK
MODERN MACHINE*
MECHANICS
MOUNT
DISK
DATA* INFORMATICS
A one-liner:
$ sed -r '/'"$(paste -sd'|' file2.txt)"'/s/$/*/' file1.txt
MAC*
MACHINE*
MECHANICS
MOUNT
DISK
DATA*
The paste command creates a regular expression from file2:
$ paste -sd'|' file2.txt
MAC|DATA
Then the sed command looks for lines matching this regex, and replaces the end-of-line with an asterisk.
Add -i to the sed command to complete the task.
Update for your new input:
awk -v patt="$(paste -sd'|' file2.txt)" '{
for (i=1; i<=NF; i++)
if ($i ~ patt)
$i = $i "*"
print
}' file1.txt
MAC* BOOK
MODERN MACHINE*
MECHANICS
MOUNT
DISK
DATA* INFORMATICS
and to save the output back into the file:
tmp=$(mktemp)
awk ... file1.txt > "$tmp" && mv "$tmp" file1.txt
Or, with GNU awk 4.1.0 or later:
gawk -i inplace -v patt="$(paste -sd'|' file2.txt)" '{
for (i=1; i<=NF; i++)
if ($i ~ patt)
$i = $i "*"
print
}' file1.txt
You can just replace the "end of line" with * when you match like:
for line in $(uniq file2.txt); do
sed -i "/$line/s/\$/*/" file1.txt
done
though this will only work with GNU sed, and it will match $line anywhere in the line, so hopefully that's what you expect
awk is better suited for this:
awk 'FNR==NR{a[$1];next} {for (i in a) if (index($1, i)) $0 = $0 "*"}1' file2.txt file1.txt
MAC*
MACHINE*
MECHANICS
MOUNT
DISK
DATA*
I think what you are looking for is this:
for line in $(cat file2.txt|uniq)
do
sed -i "s/\(${line}.*\)/\1\*/" file1.txt
done
You can use \( \) in the sed search to capture the match and \1 to reuse it in the replacement.
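The sed answers here append the asterisk at the end of the line, while the expected output puts it at the end of the matching word. A sed-only sketch for the word-level behaviour, using a hypothetical helper name star_words and assuming words are separated by single spaces and the patterns contain no regex metacharacters or slashes:

```shell
# star_words: hypothetical helper
#   $1 = file with one pattern per line, $2 = text file
star_words() {
  patterns=$1
  text=$2
  # Turn each pattern P into the command  s/[^ ]*P[^ ]*/&*/g
  # which appends * to every space-delimited word containing P.
  script=$(sed 's#.*#s/[^ ]*&[^ ]*/\&*/g#' "$patterns")
  sed "$script" "$text"
}
```

A word that contains two different patterns would receive two asterisks with this approach.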
For example, we have a file like:
{abc}...{def}
How to append 123 at the end of every string inside the {} and meanwhile, remove the {}? The above would be changed to:
abc123...def123
Use this sed:
echo "$s"|sed 's/{\([^}]*\)}/\1123/g'
abc123...def123
Or using awk:
awk -v x=123 -F '[{}]' '{for(i=1; i<=NF; i++) printf "%s%s", $i, (i%2 ? "" : x); print ""}'
abc123...def123
You could use a sed capturing group
echo '{abc}...{def}' | sed 's/{\([^}]*\)}/\1123/g'
abc123...def123
sed 's/{//g; s/}/123/g'
test:
kent$ echo "{abc}...{def}"|sed 's/{//g; s/}/123/g'
abc123...def123
Using gnu awk
echo '{abc}...{def}' | awk '{print gensub(/{([^}]*)}/,"\\1123","g")}'
abc123...def123
This might work for you (GNU sed):
sed ':a;s/{\([^{}]*\)}/\1123/g;ta' file
or:
sed -e ':a' -e 's/{\([^{}]*\)}/\1123/g' -e 'ta' file
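What the loop adds over the single substitution is handling of nested braces: each pass rewrites only the innermost pairs, and ta repeats until nothing changes. A sketch wrapping this in a hypothetical helper append_in_braces, assuming the suffix contains no /, & or backslash:

```shell
# append_in_braces: hypothetical helper, reads stdin
#   $1 = suffix to append to each brace group
append_in_braces() {
  suffix=$1
  # [^{}]* cannot cross another brace, so only innermost groups match;
  # ta jumps back to label a while a substitution was made.
  sed -e ':a' -e "s/{\([^{}]*\)}/\1$suffix/g" -e 'ta'
}
```

For the sample input this behaves like the plain substitution; for nested input such as {a{b}} the inner group is rewritten first, then the outer one.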