Given the contents of test.txt as follows:
Hello 10 love 20 haha 30
Hello Hello 11 love love 21 haha 31
41 Hello Hello 42 love love 43 haha 44
I want some kind of grep expression so that after saying:
$ cat test.txt | grep ???
I get this output:
20
21
42
How to implement this function?
It seems you're trying to get the second number on each line:
grep -oP '^\D*\d+\D*\K\d+' file
Or use sed:
sed 's/^[^[:digit:]]*[[:digit:]]\+[^[:digit:]]*\([[:digit:]]\+\).*/\1/' file
An alternative you might like to consider, using awk:
awk -F'[^[:digit:]]+' '{ print /^[[:digit:]]/ ? $2 : $3 }' file
This sets the field separator to one or more non-digit characters, which means that the field you're interested in is either the second or the third field, depending on whether the line starts with a digit or not.
For brevity you may prefer to use the range [0-9] instead of [[:digit:]]:
awk -F'[^0-9]+' '{ print /^[0-9]/ ? $2 : $3 }' file
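To see why the field of interest is sometimes $2 and sometimes $3, it can help to print every field awk produces with this separator. A small sketch (show_fields is a hypothetical helper name, not part of the answer above):

```shell
# show_fields: print each field of every input line in [brackets],
# using one or more non-digit characters as the field separator.
show_fields() {
    awk -F'[^0-9]+' '{
        for (i = 1; i <= NF; i++) printf "[%s]", $i
        print ""
    }'
}

printf '%s\n' 'Hello 10 love 20 haha 30' \
              '41 Hello Hello 42 love love 43 haha 44' | show_fields
# [][10][20][30]
# [41][42][43][44]
```

A line that starts with text begins with a separator, so its first field is empty and the numbers start at $2; a line that starts with a digit has its first number in $1.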
Or you could use perl to capture the part of the line you're interested in:
perl -lne 'print $1 if /\d\D+(\d+)/' file
\d matches digits and \D matches non-digits, so this captures the second set of digits found on the line. If a second set of digits isn't found, nothing is printed (this differs from the behaviour of the awk script, which prints an empty line).
I don't use regex on a daily basis and I'm still new to this.
For example, I have these strings and this is the format of the strings:
APPLE20B50A
APPLE30A60B
APPLE12B5B
APPLE360A360B
APPLE56B
Basically, I want to get the last letter (A or B) and the digits before it. Those digits may follow an earlier letter (also A or B), and there is also a format like APPLE56B that has no digit+letter group in the middle.
Expected Output:
50A
60B
5B
360B
56B
I tried grep -o '.\{2\}$' but it only outputs the last 2 characters:
0A
0B
5B
0B
6B
and obviously, it's not dynamic for the digits. Any help would be appreciated.
grep -o would indeed work with the correct pattern:
grep -oP '[0-9]+[AB]$'
With Perl,
perl -nle'print $& if /[0-9]+[AB]$/'
perl -nle'print for /([0-9]+[AB])$/'
In all cases, you can provide the input via STDIN or by passing a file name to read as an argument.
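As a sketch of both input styles: the pattern needs no Perl-specific syntax, so grep -E is enough here (last_token is a hypothetical wrapper name):

```shell
# last_token: print the trailing digits+letter token of each line.
# With no arguments it reads stdin; otherwise it reads the named files.
last_token() { grep -oE '[0-9]+[AB]$' "$@"; }

printf '%s\n' APPLE20B50A APPLE56B | last_token
# 50A
# 56B
```

Calling last_token input-file instead reads from the file.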
Try this:
cat input-file | perl -ne 'print "$1\n" if (m/([0-9]+[AB])$/)'
This might work for you (GNU grep):
grep -o '\(360\|3[0-5][0-9]\|[1-2][0-9][0-9]\|[1-9][0-9]\|[1-9]\)[AB]\>' file
This will print each value on a separate line from 1A/1B to 360A/360B.
To space separate these values use:
grep -o '\(360\|3[0-5][0-9]\|[1-2][0-9][0-9]\|[1-9][0-9]\|[1-9]\)[AB]\>' file |
paste -sd' '
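If the 1 to 360 limit matters, an alternative to spelling the range out as a regex is to extract the token first and compare the number in awk. This is a sketch using a different technique from the answer above; tokens_in_range is a hypothetical name, and it assumes only the last token on each line is wanted and that out-of-range values should be dropped entirely:

```shell
# tokens_in_range: keep trailing digits+letter tokens whose number is 1..360,
# then join them with single spaces.
tokens_in_range() {
    grep -oE '[0-9]+[AB]$' "$@" |
        awk '{ n = $0; sub(/[AB]$/, "", n); if (n + 0 >= 1 && n + 0 <= 360) print }' |
        paste -sd' ' -
}

printf '%s\n' APPLE20B50A APPLE361A APPLE12B5B APPLE360A360B | tokens_in_range
# 50A 5B 360B
```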
I have a text file containing :
A 25 27 50
B 35 75
C 75 78
D 99 88 76
I want to delete every line that does not have a fourth field (the third pair of digits).
Expected output :
A 25 27 50
D 99 88 76
I know that awk would be the best option for such a task, but I'm wondering what's wrong with my sed command, since I expected it to work:
sed -E '/^[ABCD] ([0-9][0-9]) \1$/d' text.txt
Using POSIX ERE with back-referencing (\1) to refer to the previous pattern surrounded with parenthesis.
I have tried this command instead :
sed -E '/^[ABCD] ([0-9][0-9]) [0-9][0-9]$/d' text.txt
But it seems to delete only the first occurrence of what I want.
I would appreciate further explanation of:
Why the back-reference doesn't work as expected.
What's the matter with the first occurrence in the second attempt. Should I include a global option? If so, how? I already tried adding it at the end alongside /d (for delete), but it didn't work.
Much much easier with awk:
awk 'NF == 4' file
A 25 27 50
D 99 88 76
This awk command uses default field separator of space or tab and checks a condition NF == 4 to make sure we print lines with 4 fields only.
With sed it would be (assuming no leading+trailing spaces in each line):
sed -nE '/^[^[:blank:]]+([[:blank:]]+[^[:blank:]]+){3}$/p' file
A 25 27 50
D 99 88 76
With your shown samples, you could try the following sed program (written and tested in GNU sed).
sed -nE '/^([^[:space:]]+[[:space:]]+){3}[^[:space:]]+$/p' Input_file
Explanation: -n stops sed from printing lines by default, and -E enables ERE. The regex matches, from the start of the line, one or more non-space characters followed by one or more spaces, three times over (the first three fields), followed by one or more non-space characters up to the end of the line (the fourth field). If the regex matches, the line is printed.
This might work for you (GNU sed):
sed -En 's/\S+/&/4p' file
Turn off implicit printing with -n and turn on extended regexps with -E.
Substitute the 4th field with itself and print the result; lines with fewer than four fields are not printed because the substitution fails.
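On the back-reference part of the question: \1 matches the same text that the group captured, not "another string matching the same pattern", so B 35 75 is not matched. A sketch that shows this, assuming GNU sed (back-references in -E patterns are an extension) and using hypothetical function names:

```shell
# First attempt: deletes only lines whose two numbers are identical.
drop_repeated_pair() { sed -E '/^[ABCD] ([0-9][0-9]) \1$/d'; }
# Second attempt: deletes every "letter, two pairs of digits" line.
drop_three_fields()  { sed -E '/^[ABCD] [0-9][0-9] [0-9][0-9]$/d'; }

printf '%s\n' 'B 35 75' 'C 75 75' | drop_repeated_pair
# B 35 75
printf '%s\n' 'A 25 27 50' 'B 35 75' 'C 75 78' 'D 99 88 76' | drop_three_fields
# A 25 27 50
# D 99 88 76
```

On the sample exactly as shown, the second attempt removes both three-field lines. If it removes only one in a real file, trailing whitespace or carriage returns before the end of a line would be worth checking, since $ would then no longer match; that is a guess, not something the question confirms. No global option applies here: the d command is applied to every line the address matches.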
Question
Let's say I have one line of text with a number placed somewhere (it could be at the beginning, in the middle or at the end of the line).
How to match and keep the first number found in a line using sed?
Minimal example
Here is my attempt (following this page of a tutorial on regular expressions) and the output for different positions of the number:
$ echo "SomeText 123SomeText" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
$ echo "123SomeText" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
$ echo "SomeText 123" | sed 's:.*\([0-9][0-9]*\).*:\1:'
3
As you can see, only the last digit is kept in the process, whereas the desired output is 123.
Using sed:
echo "SomeText 123SomeText 456" | sed -r 's/^[^0-9]*([0-9]+).*$/\1/'
123
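The attempt in the question keeps only the last digit because the leading .* is greedy: it takes as much of the line as it can and leaves just one digit for the group. A sketch that makes this visible (show_greedy and first_number are hypothetical names; -E is used as the more portable spelling of -r):

```shell
# show_greedy: print what the leading .* and the digit group each matched.
show_greedy()  { sed 's:\(.*\)\([0-9][0-9]*\).*:[\1][\2]:'; }
# first_number: only non-digits may precede the group, so it starts at the
# first digit; -n with the p flag prints only lines where a number was found.
first_number() { sed -nE 's/^[^0-9]*([0-9]+).*$/\1/p'; }

echo 'SomeText 123SomeText' | show_greedy
# [SomeText 12][3]
echo 'SomeText 123SomeText 456' | first_number
# 123
```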
You can also do this in gnu awk:
echo "SomeText 123SomeText 456" | awk '{print gensub(/^[^0-9]*([0-9]+).*$/, "\\1", 1)}'
123
To complement the sed solutions, here's an awk alternative (assuming that the goal is to extract the 1st number on each line, if any (i.e., ignore lines without any numbers)):
awk -F'[^0-9]*' '/[0-9]/ { print ($1 != "" ? $1 : $2) }'
-F'[^0-9]*' defines any sequence of non-digit chars. (including the empty string) as the field separator; awk automatically breaks each input line into fields based on that separator, with $1 representing the first field, $2 the second, and so on.
/[0-9]/ is a pattern (condition) that ensures that output is only produced for lines that contain at least one digit, via its associated action (the {...} block) - in other words: lines containing NO number at all are ignored.
{ print ($1!="" ? $1 : $2) } prints the 1st field if it is nonempty, otherwise the 2nd one. Rationale: if the line starts with a number, the 1st field contains the 1st number on the line (because the line starts with a field rather than a separator); otherwise, it is the 2nd field that contains the 1st number (because the line starts with a separator).
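A variant that avoids field splitting altogether, as a sketch: awk's match() sets RSTART and RLENGTH for the leftmost match, so the first number can be cut out directly. This is a different technique from the answer above, and first_number_awk is a hypothetical name:

```shell
# first_number_awk: print the first run of digits on each line that has one.
first_number_awk() {
    awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
}

printf '%s\n' 'SomeText 123SomeText 456' 'no number' '123SomeText' | first_number_awk
# 123
# 123
```

Lines without any digits produce no output because match() returns 0 for them.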
You can also use grep, which is ideally suited to this task. sed is a Stream EDitor, which is only going to indirectly give you what you want. With grep, you only have to specify the part of the line you want.
$ cat file.txt
SomeText 123SomeText
123SomeText
SomeText 123
$ grep -o '[0-9]\+' file.txt
123
123
123
grep -o prints only the matching parts of a line, each on a separate line. The pattern is simple: one or more digits.
If your version of grep is compatible with the -P switch, you can use Perl-style regular expressions and make the command even shorter:
$ grep -Po '\d+' file.txt
123
123
123
Again, this matches one or more digits.
Using grep is a lot simpler and has the advantage that if the line doesn't match, nothing is printed:
$ echo "no number" | grep -Po '\d+' # no output
$ echo "yes 123number" | grep -Po '\d+'
123
edit
As pointed out in the comments, one possible problem is that this won't only print the first matching number on the line. If the line contains more than one number, they will all be printed. As far as I'm aware, this can't be done using grep -o.
In that case, I'd go with perl:
perl -lne 'print $1 if /.*?(\d+).*/'
This uses lazy matching (the question mark), so the .*? at the start of the pattern consumes only the characters before the first number. $1 holds the captured group, like \1 in sed. If there is more than one number on the line, this prints only the first. If there aren't any at all, it doesn't print anything:
$ echo "no number" | perl -ne 'print "$1\n" if /.*?(\d+).*/'
$ echo "yes123number456" | perl -lne 'print $1 if /.*?(\d+).*/'
123
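If grep is still preferred, one workaround is to feed it one line at a time and keep only the first match. A sketch, likely slow on large files since it starts a pipeline per line (first_number_grep is a hypothetical name):

```shell
# first_number_grep: for each input line, print only the first number found.
first_number_grep() {
    while IFS= read -r line; do
        printf '%s\n' "$line" | grep -oE '[0-9]+' | head -n 1
    done
}

printf '%s\n' 'yes123number456' 'no number' '7 and 8' | first_number_grep
# 123
# 7
```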
If for some reason you still really want to use sed, you can do this:
sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*$/\1/p'
Unlike the other answers, this is compatible with all versions of sed and will only print lines that contain a match.
Try this sed command,
$ echo "SomeText 123SomeText" | sed -r '/[^0-9]*([0-9][0-9]*)[^0-9]*/ s//\1 /g'
123
Another example,
$ echo "SomeText 123SomeText 456" | sed -r '/[^0-9]*([0-9][0-9]*)[^0-9]*/ s//\1 /g'
123 456
It prints all the numbers in a file and the captured numbers are separated by spaces while printing.
Say for the string:
test.1234.mp4
I would like to extract the numbers
1234
without extracting the 4 in mp4
What would the regex be for this?
The numbers aren't always in the second position and can be in different positions and might not always be four digits. I would like to extract the number without extracting the 4 in mp4 essentially.
More examples:
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
Essentially, only the numbers would be extracted. Hence, for the last example, 666 from e666 would not be extracted, only 123.
To extract I have been using
echo "example.123.mp4" | grep -o "REGEX"
Edit: test456 was meant to be test.456
The accepted answer will fail on "test.e666.123.mp4" (print 666).
This should work
$ cat | perl -ne 'print "$1\n" if /\.(\d+)\./'
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
1234
456
111
123
Note that this will only print the first group of numbers: for test.123.456.mp4, only 123 will be printed.
The idea is to match a dot followed by numbers which we are interested in (parentheses to save the match), followed by another dot. This means that it will fail on 123.mp4.
To fix this you could have:
$ cat | perl -ne 'print "$2\n" if /(^|\.)(\d+)\./'
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
781.test.mp4
1234
456
111
123
781
First match is either beginning of line (^) or a dot, followed by numbers and a dot. We use $2 here since $1 is either beginning of a line or a dot.
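An alternative sketch that sidesteps the first-group limitation: split each name on dots and print every component that consists only of digits, ignoring the final extension. This is a different technique from the perl above; numeric_parts is a hypothetical name, and treating the last field as the extension is an assumption:

```shell
# numeric_parts: print each dot-separated component that is all digits.
numeric_parts() {
    awk -F'[.]' '{
        for (i = 1; i < NF; i++)          # i < NF skips the extension field
            if ($i ~ /^[0-9]+$/) print $i
    }'
}

printf '%s\n' test.e666.123.mp4 781.test.mp4 test.123.456.mp4 | numeric_parts
# 123
# 781
# 123
# 456
```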
cut can make it:
$ echo "test.1234.mp4" | cut -d. -f2
1234
where, in cut -d'.' -f2, the -d'.' option sets the delimiter to a dot and -f2 selects the 2nd field.
If you provide more examples we can improve the output. With the current code you would extract any something in blablabla.something.blablabla.
Update: from your question update we can do this:
grep -o '\.[0-9]*\.' | sed 's/\.//g'
test:
$ echo "test.abc.1234.mp4
test456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4" | grep -o '\.[0-9]*\.' | sed 's/\.//g'
1234
111
123
grep -Po "(?<=\.)\d+(?=\.)"
echo "test.1234.mp4" | perl -lpe 's/[^.\d]+\d*//g;s/\D*(\d+).*/$1/'
or:
echo "1321.test.mp4" | perl -lpe 's/.*(?:^|\.)(\d+)\..*/$1/'
-p prints each line after the code has run, so we don't need an explicit print.
-e says we have an expression on the command line, not a script file.
-l strips the newline on input and puts it back on output.
These will also work if you have a number at the first part of the name.
perl -F'\.' -lane 'print "$F[scalar(@F)-2]" if(/\d+\.mp4$/)' your_file
tested:
> perl -F'\.' -lane 'print "$F[scalar(@F)-2]" if(/\d+\.mp4$/)' temp
1234
111
123
$ cat file
test.abc.1234.mp4
test.456.abc.mp4
test.aaa.bbb.c.111.mp4
test.e666.123.mp4
$ sed 's/.*\.\([0-9][0-9]*\)\..*/\1/' file
1234
456
111
123
Can I increase some numbers in txt files with grep/sed?
I want to find all numbers in a file and increase them by 5. Is that possible with grep and sed, or do I need to write an app for that?
EDIT:
The file has n lines, each beginning with number - number and then some text, like a movie title.
example line:
34 - 36 : Some text.
You can use perl as:
perl -i -pe 's/(\d+)/$1+5/eg' filename
Probably awk. Change the record separator to whitespace (assuming this is what you want to do); then, if a record matches the regex ^[0-9]+$, convert it to a number, add 5 and print it; otherwise print it unchanged.
This is a pretty complete solution but "left as exercise" to code up.
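One way to code up an awk version, as a sketch. Instead of changing the record separator (which would lose the original spacing in portable awk), this walks each line with match() and rebuilds it around the incremented numbers, so it is a swapped-in technique rather than the exact recipe above. add_to_numbers is a hypothetical name, and the increment defaults to 5:

```shell
# add_to_numbers [INC]: add INC (default 5) to every run of digits on stdin.
add_to_numbers() {
    awk -v inc="${1:-5}" '{
        out = ""; s = $0
        while (match(s, /[0-9]+/)) {
            # copy the text before the number, then the incremented number
            out = out substr(s, 1, RSTART - 1) (substr(s, RSTART, RLENGTH) + inc)
            s = substr(s, RSTART + RLENGTH)
        }
        print out s
    }'
}

echo '34 - 36 : Some text.' | add_to_numbers
# 39 - 41 : Some text.
```

Unlike the whitespace-separated approach, this also changes digits embedded in words (abc12 becomes abc17), which may or may not be wanted.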
I believe you should use awk (see "Changing the Contents of a Field"):
> cat 1.txt
34 - 36 : Some text.
> cat 1.txt | awk '{ $1=$1+5; $3=$3+5; print $0; }'
39 - 41 : Some text.
This might work for you (GNU sed & Bash):
sed 's/[0-9]\+/$((&+5))/g;s/.*/echo "&"/e' file