put regular expression in variable

output=`grep -R -l "${images}" *`
new_output=`regex "slide[0-9]" $output`
Basically $output is a string like this:
slides/_rels/slide9.xml.rels
The number in $output will change. I want to grab "slide9" and put that in a variable. I was hoping new_output would do that but I get a command not found for using regex. Any other options? I'm using a bash shell script.

Well, regex is not a program like grep. ;)
But you can use
grep -Eo "(slide[0-9]+)"
as a simple approach. -o means: show only the matching part, -E means: extended regex (allows more sophisticated patterns).
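To put the match into a variable, the grep call can be wrapped in a command substitution. This is a minimal sketch that uses the sample path from the question as a stand-in for the real grep -R -l output:

```shell
# Sample value from the question; in the real script this comes from grep -R -l.
output='slides/_rels/slide9.xml.rels'

# -E: extended regex, -o: print only the part of the line that matched.
new_output=$(printf '%s\n' "$output" | grep -Eo 'slide[0-9]+')

echo "$new_output"   # slide9
```

If $output holds several paths, grep -o prints one match per line, so new_output would then contain several lines.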

Reading "I want to grab "slide9" and put that in a variable", I assume you want only the part that matches your regexp to end up in $new_output. If so, you can change that to:
new_output=`grep -E -R -l "${images}" * | sed 's/.*\(slide[0-9]\{1,\}\).*/\1/'`
Note that setting output= is not required (unless you use it for something else).
If you need $output elsewhere, use this instead:
output=`grep -R -l "${images}" *`
new_output=`echo "${output}" | sed 's/.*\(slide[0-9]\{1,\}\).*/\1/'`
sed's s/// command is similar to Perl's s/// operator and has an equivalent in most languages.
Here I'm matching zero or more characters (.*) before and after your slide[0-9]\{1,\} and capturing the match with \( ... \) so that it can be back-referenced. In basic regular expressions the parentheses and the interval braces must be escaped, and a bare + is not portable; with sed -E they are written unescaped, as (slide[0-9]+). The whole match (i.e. the whole line) is then replaced with \1, which expands to the first captured group, in this case your slide match.
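A quick way to check the capture-and-replace idea is to run it on a fixed string. The sketch below uses a made-up two-digit slide number and shows the basic-regex and extended-regex spellings side by side; the variable names bre and ere are only for illustration:

```shell
# Hypothetical sample path with a two-digit slide number.
output='slides/_rels/slide10.xml.rels'

# Basic regular expressions: escape the group and the interval.
bre=$(printf '%s\n' "$output" | sed 's/.*\(slide[0-9]\{1,\}\).*/\1/')

# Extended regular expressions (-E): no escaping, + is available.
ere=$(printf '%s\n' "$output" | sed -E 's/.*(slide[0-9]+).*/\1/')

echo "$bre $ere"   # slide10 slide10
```

The leading .* is greedy, but "slide" followed by a digit occurs only once in the path, so the capture still starts at the right place.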

In situations like this, awk can be a better fit:
output="`grep -R -l "main" codes`"
echo $output
tout=`echo $output | awk -F. '{for(i=1;i<=NF;i++){if(index($i,"/")>0){n=split($i,ar,"/");print ar[n];}}}'`
echo $tout
This prints each file name without its directory and extension. If you want to grab only slide9, then use the solutions provided by others.
Sample output:
A#A-laptop ~ $ bash try.sh
codes/quicksort_iterative.cpp codes/graham_scan.cpp codes/a.out
quicksort_iterative graham_scan a
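The same result can probably be had without awk, using shell parameter expansion. This is a sketch that assumes the paths contain no spaces and reuses the sample output above as input; the loop variable names are illustrative:

```shell
# Sample list taken from the output shown above.
output='codes/quicksort_iterative.cpp codes/graham_scan.cpp codes/a.out'

tout=''
for path in $output; do          # unquoted on purpose: split the list on spaces
  file=${path##*/}               # strip everything up to the last /
  tout="$tout${tout:+ }${file%.*}"   # strip the extension, join with spaces
done

echo "$tout"   # quicksort_iterative graham_scan a
```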

Related

Why does this regex work in grep but not sed?

I have two regular expressions:
$ grep -E '\-\- .*$' *.sql
$ sed -E '\-\- .*$' *.sql
(I am trying to grep lines in sql files that have comments and remove lines in sql files that have comments)
The grep command works using this regex; however, the sed returns the following error:
sed: -e expression #1, char 7: unterminated address regex
What am I doing incorrectly with sed?
(The space after the two hyphens is required for sql comments if you are unfamiliar with MySql comments of this type)

You're trying to use:
sed -E '\-\- .*$' *.sql
This sed command is not correct because it never tells sed what to do with the lines it matches.
It should be:
sed -n '/-- /p' *.sql
and equivalent grep would be:
grep -- '-- ' *.sql
or even better with a fixed string search:
grep -F -- '-- ' *.sql
The standalone -- marks the end of grep's options, so the pattern '-- ', which itself starts with a hyphen, is not mistaken for an option.
There is no need to escape - in a regex if it is outside bracket expression (or character class) i.e. [...].
Based on comments below it seems OP's intent is to remove commented section in all *.sql files that start with 2 hyphens.
You may use this sed for that:
sed -i 's/-- .*//g' *.sql
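To see what each of these commands does before running sed -i on real files, they can be tried on a throwaway file. This sketch creates a hypothetical three-line SQL file with mktemp; the file contents are invented for the demonstration:

```shell
tmp=$(mktemp)
printf '%s\n' 'SELECT 1; -- first comment' 'SELECT 2;' '-- whole-line comment' > "$tmp"

# Print only the lines that contain a comment marker.
matched=$(sed -n '/-- /p' "$tmp")

# The grep equivalent; the standalone -- ends option parsing.
grepped=$(grep -F -- '-- ' "$tmp")

# Remove the comment text but keep the rest of each line (no -i, so the file is untouched).
stripped=$(sed 's/-- .*//' "$tmp")

printf '%s\n' "$stripped"
rm -f "$tmp"
```

Note that the substitution leaves an empty line where a whole-line comment used to be; deleting such lines needs the d command instead.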

The problem here is not the regex, the problem is that sed requires a command. The equivalent of your grep would be:
sed -n '/\-\- .*$/p'
You suppress output for non-matching lines -n ... you search (wrap your regex in slashes) and you print p (after the last slash).
P.S.: As Anub pointed out, escaping the hyphens - inside the regex is unnecessary.

You are using sed's \cregexpc address syntax: by starting the expression with \- you tell sed that the delimiter character is a dash (-), but you never terminate the regex with a closing -. Terminate it, and add the d command to delete the matching lines:
sed '\-\-\-.*$-d' infile
See man sed on this:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
With the default / delimiter none of this is needed:
sed '/--.*$/d' infile
or simply:
sed '/^--/d' infile
and more accurately:
sed '/^[[:blank:]]*--/d' infile
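The default and custom delimiter forms can be compared on a few sample lines. In this sketch the input is invented, and | is chosen as the custom delimiter because, unlike a dash, it does not appear in the pattern:

```shell
input=$(printf '%s\n' 'SELECT 1;' '-- comment' '  -- indented comment' 'SELECT 2; -- trailing')

# Default delimiter: /regex/ followed by the d command.
a=$(printf '%s\n' "$input" | sed '/^[[:blank:]]*--/d')

# Custom delimiter: \| opens the regex and the next | closes it.
b=$(printf '%s\n' "$input" | sed '\|^[[:blank:]]*--|d')

printf '%s\n' "$a"
```

Both commands delete the two comment-only lines and keep the line whose comment follows a statement.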

Replace the separator between pairs of numbers

I want to replace all strings like [0-9][0-9]-[0-9][0-9] with [0-9][0-9]/[0-9][0-9] using sed.
In other words, I want to replace - with /.
If I have somewhere in my text:
09-36
32-43
54-65
I want this change:
09/36
32/43
54/65

Using GNU sed:
$ echo '09-36 32-43 54-65' | sed -r 's|\<([0-9]{2})-([0-9]{2})\>|\1/\2|g'
09/36 32/43 54/65
-r turns on extended regular expressions, so the ( ) { } characters need no \-escaping.
\< and \> (GNU extensions, available with or without -r) make the expression match only at word boundaries (if the expression should only match full lines, use ^ and $ instead, and omit the g option).
| is used as an alternative regex delimiter so that / can be used without \-escaping.
A BSD/macOS sed solution would look slightly different:
echo '09-36 32-43 54-65' | sed -E 's|[[:<:]]([0-9]{2})-([0-9]{2})[[:>:]]|\1/\2|g'

sed -e 's/\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1\/\2/g'
Might not be the most elegant version, but works for me. The gazillion backslashes make this rather unreadable in my opinion. You might improve the readability by not using / to separate the pattern and the replacement maybe?
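Switching the delimiter is one way to drop at least the escaped slash. This sketch keeps basic regular expressions, so it should not depend on -r or -E, and uses | as the delimiter:

```shell
# With | as the delimiter, the / in the replacement needs no backslash.
result=$(echo '09-36 32-43 54-65' | sed 's|\([0-9]\{2\}\)-\([0-9]\{2\}\)|\1/\2|g')

echo "$result"   # 09/36 32/43 54/65
```

Like the command above, this has no word-boundary check, so a longer run such as 444-44 would also be rewritten.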

perl -C -npe 's/(?<!\d)(\d\d)-(\d\d)(?!\d)/\1\/\2/g' file
Input
维基 1-11 22-33 444-44 55-555 66-66百科
77-77
8 88-88
Output
维基 1-11 22/33 444-44 55-555 66/66百科
77/77
8 88/88
In the command above
-C enables Unicode;
-n causes Perl to run the script once for each input line, without printing;
-p does the same and also prints each line after the script has run (when both are given, -p takes precedence, so -n is redundant here);
-e accepts a Perl expression (here, a substitution).
In this mode, Perl works just like sed. The script replaces each pair of two-digit numbers separated by - with the same pair separated by a slash.
(?<!\d) and (?!\d) are negative lookaround expressions.
To edit the file in place use -i option: perl -C -i.backup -npe ....
If the input is not a file, you can pass the input to Perl via pipe, e.g.:
echo '维基 1-11 22-33 444-44 55-555 66-66百科' | \
perl -C -npe 's/(?<!\d)(\d\d)-(\d\d)(?!\d)/\1\/\2/g'

How to use sed to grab regular expression

I'd like to grab the digits in a string like so :
"sample_2341-43-11.txt" to 2341-43-11
And so I tried the following command:
echo "sample_2341-43-11.txt" | sed -n -r 's|[0-9]{4}\-[0-9]{2}\-[0-9]{2}|\1|p'
I saw the answer to "Use sed to grab a string", which is where I got the idea, but it doesn't work on my machine:
it gives an error "illegal option -r".
it doesn't like the \1, either.
I'm using sed on MacOSX yosemite.
Is this the easiest way to extract that information from the file name?

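You need to define a capture group and also match the rest of the line, so that the substitution replaces the whole line with the group. Also, the - does not need to be escaped. The -n option suppresses sed's automatic printing, so with -n only lines on which the substitution succeeds are printed (via the p flag); it is dropped here. On macOS, extended regular expressions are enabled with -E rather than -r:
echo "sample_2341-43-11.txt" | sed -E 's/^.*([0-9]{4}-[0-9]{2}-[0-9]{2}).*$/\1/'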

The Mac (BSD) version of sed does not accept -r; extended regular expressions are enabled with -E there.
You can also use grep instead:
echo "sample_2341-43-11.txt" | grep -Eo '[0-9]+(-[0-9]+)+'
OUTPUT
2341-43-11

echo "one1sample_2341-43-11.txt" \
| sed 's/[^[:digit:]-]\{1,\}/ /g;s/ \{1,\}/ /g;s/^ //;s/ $//'
1 2341-43-11
This extracts every run of digits and dashes (so a stray --12 would also get through, but that is easy to handle).
It is POSIX compliant.
All numbers found on a line are printed on the same line, separated by a space (this could be changed to a newline if wanted).

You can also try it this way:
sed 's/[^_]\{1,\}_\([^.]\{1,\}\).*/\1/' <<< sample_2341-43-11.txt
Output:
2341-43-11
Explanation:
[^_]\{1,\}_ - Match everything up to and including the first _ (sample_)
\([^.]\{1,\}\) - Match the content up to the . and capture it (2341-43-11)
.* - Discard the remaining characters (.txt)
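The split described above (drop everything through the _, then drop everything from the .) can also be sketched with plain shell parameter expansion, with no sed at all. This assumes the name has a prefix ending in an underscore and that the wanted part contains no dot; the variable names are illustrative:

```shell
name='sample_2341-43-11.txt'

rest=${name#*_}      # remove the shortest prefix ending in _  (2341-43-11.txt)
digits=${rest%%.*}   # remove everything from the first .     (2341-43-11)

echo "$digits"   # 2341-43-11
```

Unlike the regex versions, this does not check that the extracted part actually consists of digits and dashes.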

You can go with what the answer above said. Alternatively, the pattern "\d+-\d+-\d+" would match what you are looking for in regex flavors that support \d (such as PCRE). See the demo here:
https://regex101.com/r/kO2cZ1/3

Retrieve value of attribute in bash

I have a list of lines:
<some_random_text="someval" my_val_="0.4" some_random_text_1="someval_">
<some_random_text="someval" my_val_="0.8" some_random_text_1="someval_">
<some_random_text="someval" my_val_="1.2" some_random_text_1="someval_">
and so on.
From each line, I want to return the numeric value given after my_val_. How can I do this in bash?

Within this very rigid structure, what you want to do is quite easy using sed:
sed 's/.*my_val_="\([0-9.]\{1,\}\)".*/\1/' file
or using extended regular expressions:
sed -r 's/.*my_val_="([0-9.]+)".*/\1/' file
This captures the part you're interested in (the digits and dots between the quotes) and uses them to replace the contents of the line.
As mentioned in the comments (thanks), the switch to enable extended regular expressions differs between versions of sed. Out of habit, I tend to use -r but some implementations (such as BSD sed on OSX) work with -E instead. Others work with either -r or -E but neither option is defined by the standard.
This could also be done in native bash (although I wouldn't recommend it...):
re='my_val_="([0-9.]+)"'
while read -r line; do
[[ $line =~ $re ]] && echo "${BASH_REMATCH[1]}"
done < file
=~ is the regex match operator. The captured digits and dots are stored in element 1 of the special array BASH_REMATCH.
The sed and bash approaches are subtly different, as the sed version will print all lines in the file, even if they don't match the pattern. If this is a problem, you can add the -n switch and a p at the end of the command to print matching lines:
sed -nr 's/.*my_val_="([0-9.]+)".*/\1/p' file
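The difference between printing every line and printing only the matches is easy to check on inline data. This sketch uses the basic-regex form so that it does not depend on -r or -E, and adds one invented non-matching line to the sample input:

```shell
input='<some_random_text="someval" my_val_="0.4" some_random_text_1="someval_">
no attribute on this line
<some_random_text="someval" my_val_="1.2" some_random_text_1="someval_">'

# -n plus the p flag: print a line only if the substitution succeeded on it.
values=$(printf '%s\n' "$input" | sed -n 's/.*my_val_="\([0-9.]\{1,\}\)".*/\1/p')

printf '%s\n' "$values"
```

Without -n and p, the middle line would be passed through unchanged.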

With grep:
grep -oP 'my_val_="\K[^"]*' filename
-o so that grep prints only the match, -P so that Perl-compatible regexes are used (a GNU grep option).
The \K in the regex removes from the match everything that was matched by the part of the regex that came before it; this has the effect of a lookbehind: only non-quote characters that come directly after my_val_=" are matched.

How to remove quotation marks and spaces around numbers in a CSV file using sed?

I have some numbers in a CSV file, and I'm trying to remove the quotation marks and spaces around them.
Input:
1," 23","45","67 ",89
Expected output: 1,23,45,67,89
I'm trying to remove with:
sed -r -e 's#\"[ ]*\([0-9]+\)[ ]*\"#\1#g' file.csv
But I'm getting the error "sed: -e expression #1, char 38: invalid reference \1 on `s' command's RHS". If I remove the -r option, I don't get the error, but it does not work either.

Tom Fenech provided the crucial pointer in a comment:
The only problem with the OP's command is a minor syntax problem:
Since sed is used with -r in order to activate extended regular expressions, ( and ) - for defining capture groups - must NOT be \-escaped.
(By contrast, when sed is used without -r, basic regular expressions must be used, where such escaping is needed.)
The correct form is therefore (\ before ( and ) removed):
sed -r 's#\"[ ]*([0-9]+)[ ]*\"#\1#g' file.csv
If you want the command to work on OSX also, use -E instead of -r.
Alternatively, for maximum portability (POSIX compliance) you could just use \{1,\} instead of + and do away with the -r switch entirely:
sed 's#\"[ ]*\([0-9]\{1,\}\)[ ]*\"#\1#g' file.csv
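The portable form can be checked against the input line from the question. In this sketch the line is fed through a pipe instead of file.csv, and the backslashes before the double quotes are dropped, since a double quote needs no escaping inside a single-quoted sed script:

```shell
line='1," 23","45","67 ",89'

# Basic regex: \( \) capture the digits, \{1,\} stands in for +.
cleaned=$(printf '%s\n' "$line" | sed 's#"[ ]*\([0-9]\{1,\}\)[ ]*"#\1#g')

echo "$cleaned"   # 1,23,45,67,89
```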

You could try the Perl command below:
$ echo '1," 23","45","67 ",89, "foo" , "bar" ' | perl -pe 's/[" ]+(\d+)[ "]+/\1/g'
1,23,45,67,89, "foo" , "bar"
