how to exclude output with sed - regex

I'm trying to remove amazon and downloadAll.sh from output.
Any thoughts on what I'm doing wrong?
❯ ls | sed 's/[^0-9]{1,4}//'
downloadAll.sh
1041
973
295
127
273
221
1010
1152
227
937
994
210
572
1091
323
1328
472
1710
1192
1629
957
1167
1120
1628
1597
amazon

You can use find with a regex:
find . -regextype posix-egrep -regex '.*/[0-9]{1,4}'
Details:
. - search in the current directory
-regextype posix-egrep - the regex engine is set to egrep, POSIX ERE (we can use {min,max} quantifier then with no extra escaping)
-regex '.*/[0-9]{1,4}' - only paths that match the regex in full are returned. The .*/[0-9]{1,4} pattern matches any text, then a /, then one to four digits up to the end of the string.
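A minimal sketch of the find approach, assuming GNU find (for -regextype) and a scratch directory with made-up names; 12345 is a hypothetical five-digit name added to show that the {1,4} limit is enforced:

```shell
# Scratch directory with hypothetical names; 12345 has five digits on purpose.
tmpdir=$(mktemp -d)
touch "$tmpdir/1041" "$tmpdir/973" "$tmpdir/12345" "$tmpdir/amazon" "$tmpdir/downloadAll.sh"

# -regex must match the whole path, so only ./<1 to 4 digits> is printed.
# sort is only there to make the order predictable.
matches=$(cd "$tmpdir" && find . -regextype posix-egrep -regex '.*/[0-9]{1,4}' | sort)
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Note that find descends into subdirectories by default, so nested paths that end in one to four digits would be listed as well.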

To list files/directories ending with a digit use:
printf '%s\n' *[0-9]
To list files/directories NOT ending with a digit use:
printf '%s\n' *[!0-9]
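A small sketch of the two globs, run in a scratch directory with made-up names. The name backup2 is hypothetical and is there to show a limitation: the glob only tests the last character, so it also matches names that are not purely numeric.

```shell
# Scratch directory; backup2 is a hypothetical name that merely ends in a digit.
tmpdir=$(mktemp -d)
touch "$tmpdir/1041" "$tmpdir/973" "$tmpdir/backup2" "$tmpdir/amazon" "$tmpdir/downloadAll.sh"

# The globs are expanded after the cd, inside the subshell.
ending_in_digit=$(cd "$tmpdir" && printf '%s\n' *[0-9])
not_ending_in_digit=$(cd "$tmpdir" && printf '%s\n' *[!0-9])

printf 'ends in a digit:\n%s\n' "$ending_in_digit"
printf 'does not end in a digit:\n%s\n' "$not_ending_in_digit"

rm -rf "$tmpdir"
```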

# digits.sh
find \
./ ` # search in current directory` \
-maxdepth 1 ` # don't search recursively` \
| sed '/[^0-9]$/d' ` # delete any line that doesn't end in a digit`
❯ bash digits.sh
./1328
./1091
./957
./1010
./210
./937
./295
./1597
./1629
./973
./1041
./323
./1192
./1167
./1710
./221
./42
./572
./127
./1628
./472
./1120
./227
./1152
./994
./273
./46

You are using Extended Regular Expression (ERE) syntax, but sed interprets the script as a Basic Regular Expression (BRE) by default; the GNU sed manual describes the difference. Either convert the pattern to BRE or add the -E switch to the sed command for ERE interpretation.
However, for this simple task, grep is sufficient:
grep -x '[0-9]\+'
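The BRE/ERE difference can be seen on a short stand-in for the ls output. This is a sketch using a few names from the question, not the full listing:

```shell
# A few names from the question, standing in for the output of ls.
listing=$(printf '%s\n' downloadAll.sh 1041 973 amazon)

# BRE: unescaped {1,4} is literal text, so nothing matches and
# every line passes through unchanged.
bre=$(printf '%s\n' "$listing" | sed 's/[^0-9]{1,4}//')

# BRE with escaped braces: print only lines made of 1 to 4 digits.
bre_fixed=$(printf '%s\n' "$listing" | sed -n '/^[0-9]\{1,4\}$/p')

# grep -x requires the whole line to match the pattern.
grep_only=$(printf '%s\n' "$listing" | grep -x '[0-9]\{1,4\}')

printf '%s\n' "$bre_fixed"
```

The first command reproduces the symptom from the question: the output is identical to the input.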

This might work for you (GNU sed):
sed -E '/^[0-9]{1,4}$/!s/.*//' file
If a line does not consist solely of 1 to 4 digits, replace its contents with nothing.
N.B. This will return a blank line for any line that does not meet the required criteria.
Perhaps what you really want is to delete such lines?
sed -E '/^[0-9]{1,4}$/!d' file
Of course, files with 1 to 4 digit names can also be listed with shell globs alone:
ls [0-9] [0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]
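To make the difference between blanking and deleting concrete, here is a sketch on a short sample; 12345 and 42 are hypothetical extra lines, and -E assumes a sed that supports it (GNU and BSD sed do):

```shell
# Sample input; 12345 is too long and must not survive either command.
input=$(printf '%s\n' downloadAll.sh 1041 12345 amazon 42)

# !s/.*// empties non-matching lines but still prints them as blank lines.
blanked=$(printf '%s\n' "$input" | sed -E '/^[0-9]{1,4}$/!s/.*//')

# !d removes non-matching lines entirely.
deleted=$(printf '%s\n' "$input" | sed -E '/^[0-9]{1,4}$/!d')

printf 'blanked: %s lines\n' "$(printf '%s\n' "$blanked" | wc -l | tr -d ' ')"
printf 'deleted:\n%s\n' "$deleted"
```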

Related

Regex doesn't work with sed command as expected

I have a text file containing :
A 25 27 50
B 35 75
C 75 78
D 99 88 76
I want to delete every line that does not have a fourth field (a third pair of digits after the letter).
Expected output :
A 25 27 50
D 99 88 76
I know that awk would be the best option for such a task, but I'm wondering what the problem is with my sed command, since I think it should work as written:
sed -E '/^[ABCD] ([0-9][0-9]) \1$/d' text.txt
It uses POSIX ERE with a back-reference (\1) to refer to the previous pattern enclosed in parentheses.
I have also tried this command instead:
sed -E '/^[ABCD] ([0-9][0-9]) [0-9][0-9]$/d' text.txt
But it seems to delete only the first occurrence of what I want.
I would appreciate further explanation of:
- why the back-reference doesn't work as expected;
- what the matter is with the first occurrence in the second attempt. Should I include the global option? If yes, then how? I already tried adding it at the end alongside /d (for delete), but it didn't work.
Much much easier with awk:
awk 'NF == 4' file
A 25 27 50
D 99 88 76
This awk command uses default field separator of space or tab and checks a condition NF == 4 to make sure we print lines with 4 fields only.
With sed it would be (assuming no leading+trailing spaces in each line):
sed -nE '/^[^[:blank:]]+([[:blank:]]+[^[:blank:]]+){3}$/p' file
A 25 27 50
D 99 88 76
With the samples you have shown, you could try the following in sed. Written and tested in GNU sed.
sed -nE '/^([^[:space:]]+[[:space:]]+){3}[^[:space:]]+$/p' Input_file
Explanation: The -n option stops sed from printing lines by default, and -E enables ERE. The regex matches, from the start of the line, one or more non-space characters followed by one or more spaces, three times over (the first 3 fields), followed by one or more non-space characters up to the end of the line (the 4th field). If the regex matches, the line is printed.
This might work for you (GNU sed):
sed -En 's/\S+/&/4p' file
Turn off implicit printing -n and on extended regexp -E.
Substitute the 4th field with itself and print the result. Lines with fewer than four fields have no 4th match, so they are not printed.
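On the back-reference question itself: \1 matches the exact text captured by the first group, not the group's pattern again, and d is a per-line command that takes no g flag. A sketch using the sample data plus one hypothetical line, E 42 42, added to show the only kind of line the original command can match:

```shell
# Sample data from the question; 'E 42 42' is a hypothetical extra line.
sample=$(printf '%s\n' 'A 25 27 50' 'B 35 75' 'C 75 78' 'D 99 88 76' 'E 42 42')

# \1 repeats the captured text, so only the line whose two numbers are
# identical ('E 42 42') is deleted.
backref=$(printf '%s\n' "$sample" | sed -E '/^[A-Z] ([0-9][0-9]) \1$/d')

# Repeating the sub-pattern with {2} deletes every three-field line.
repeated=$(printf '%s\n' "$sample" | sed -E '/^[A-Z]( [0-9][0-9]){2}$/d')

printf 'with \\1:\n%s\n' "$backref"
printf 'with {2}:\n%s\n' "$repeated"
```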

Sed not pattern matching

I have a directory with these files in it:
abc12345abc
abc1234567abc
abc123456789abc
I want to grab the file that has 7 numerals in it. I need to do this using sed via the pipe. I thought this would work:
ls -l | sed -n '/[0-9]\{7\}/p'
It returns:
abc1234567abc
abc123456789abc
Regular expressions aren't anchored. abc123456789abc has a string of exactly 7 digits, 3 of them in fact: 1234567, 2345678, and 3456789. If you want the file names that don't have any longer matches, you need to check for non-digits before and after.
sed -n '/[^0-9][0-9]\{7\}[^0-9]/p'
You want to match seven digits that are not enclosed with another digit.
You may use
sed -En '/(^|[^0-9])[0-9]{7}($|[^0-9])/p'
Details
-E - enables POSIX ERE syntax (now, there is no need to escape {x} interval quantifiers)
(^|[^0-9]) - start of string or a non-digit char
[0-9]{7} - seven digits
($|[^0-9]) - end of string or a non-digit char.
First rule of scripting: don't parse ls.
If you are trying to match files in a directory, use find; that's what it's meant for.
find dir/ -regextype posix-extended -type f \
-regex ".*[^[:digit:]][[:digit:]]{7}[^[:digit:]].*"
This might work for you (GNU sed):
sed '/[0-9]\{7\}/!d;/[0-9]\{8\}/d' file
Delete the line if it has no run of 7 consecutive digits, or if it has a run of 8 or more.
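The approaches above can be compared side by side. This is a sketch on the three file names from the question plus one hypothetical name, 1234567, that has no surrounding letters:

```shell
# Names from the question, plus a hypothetical all-digit name.
names=$(printf '%s\n' abc12345abc abc1234567abc abc123456789abc 1234567)

# Unanchored: any run of at least 7 digits matches, including the 9-digit name.
unanchored=$(printf '%s\n' "$names" | sed -n '/[0-9]\{7\}/p')

# Exactly 7: the run must be bounded by a non-digit or by a line edge.
exact=$(printf '%s\n' "$names" | sed -En '/(^|[^0-9])[0-9]{7}($|[^0-9])/p')

# Two-step: keep lines with 7 digits in a row, then drop those with 8 or more.
two_step=$(printf '%s\n' "$names" | sed '/[0-9]\{7\}/!d;/[0-9]\{8\}/d')

printf '%s\n' "$exact"
```

A pattern that requires a non-digit on both sides, without the ^ and $ alternatives, would miss a name such as 1234567.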

Regex to return last 3 characters of matching pattern

I am using grep to search through text files containing 88-character-long MRZs (machine readable zones). Within the text file they are preceded by a semicolon.
I only want to get the substring of characters 3-5 from the string.
This is my pattern:
egrep --include *.txt -or . -e ";[A-Z][A-Z0-9<][A-Z<]{3}"
This is a textfile:
text is here;P<RUSIVAN<<DEL<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<F64D123456RUS7404124F131009734P41234<<<<<<<8 ;2019-02-08
This is my output:
;P<RUS
This is my desired output:
RUS
The semicolon introduces the MRZ. It starts with an uppercase letter, followed by either an uppercase letter, a digit or a filler character <. Then follows the 3-character country code, which can contain uppercase letters or filler characters <.
This pattern works fine, but what I want returned is only the last 3 characters I am quantifying. Is there a way to get only the last 3 characters of a matching pattern?
In the sample text file the desired output would be RUS.
Thank you!
If you can use GNU grep, you can use \K, which drops everything matched so far from the reported match, and then match your character class 3 times:
grep -roP --include='*.txt' ";[A-Z][A-Z0-9<]\K[A-Z<]{3}"
Is this all you're trying to do?
$ awk -F';' '{print substr($2,3,3)}' file
RUS
$ sed -E 's/[^;]*;..(.{3}).*/\1/' file
RUS
If not then edit your question to provide more truly representative sample input/output.
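A sketch of two sed variants on a shortened version of the sample line (the filler run is cut down for readability, so this is not a full 88-character MRZ). The first picks characters by position, as above; the second also checks the character classes from the question's pattern:

```shell
# Shortened stand-in for the sample line; not a complete MRZ.
line='text is here;P<RUSIVAN<<DEL<<<<;2019-02-08'

# By position: skip to the first semicolon, skip 2 characters, keep 3.
by_position=$(printf '%s\n' "$line" | sed -E 's/[^;]*;..(.{3}).*/\1/')

# By pattern: same idea, but only prints when the MRZ prefix really matches.
by_pattern=$(printf '%s\n' "$line" | sed -nE 's/^[^;]*;[A-Z][A-Z0-9<]([A-Z<]{3}).*/\1/p')

printf '%s\n' "$by_position" "$by_pattern"
```

With -n and the p flag, the second variant prints nothing for lines that contain no MRZ, whereas the first prints such lines unchanged.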
The UNIX command to find files is named find, btw, not grep. I know the GNU guys added a bunch of options for finding files to grep but just don't use them as they make your grep command unnecessarily complicated (and inconsistent with the other UNIX text processing tools) as it then needs arguments to find files as well as to g/re/p within the files. So your command line if you're using grep should be:
find . -name '*.txt' -exec grep 'stuff' {} +
not:
egrep --include *.txt -or . -e 'stuff'
and do the same for any other tool:
find . -name '*.txt' -exec grep 'stuff' {} +
find . -name '*.txt' -exec sed 'stuff' {} +
find . -name '*.txt' -exec awk 'stuff' {} +

Is it possible to substitute a number using sed matching multiple regexp?

I'm trying to replace a number in a file using sed. This number can be found using \b<NUMBER>\b. However, there are comments in the file I'm parsing that sometimes have the same number and I would like to leave them unchanged.
All the lines that need to be replaced are similar to:
some_text <1 4 35 314 359>
And the complete file could be something like:
# This is not to be replaced: 314
some_text <1 4 35 314 359>
So, if I wanted to replace 314, how could I do it with sed?
I can find it with the following grep:
grep -P "^[^#].*some_text <[ 0-9]*>" "<FILE>" | grep -e "\b314\b"
But I can't seem to figure out a way to do it with sed. The old line I had would replace all the entries for that number:
sed -i "s/\b *314\b//" <FILE>
Any clarifications or help would be most welcome!
Thank you for your help!
/G
You can use sed like this:
sed '/some_text/s/\b314\b/789/' file
# This is not to be replaced: 314
some_text <1 4 35 789 359>
You could use awk instead, skipping any lines that are comments:
awk '!/^#/{sub(/\y314\y/,789)}1' file
As you've used word boundaries in your example, I'm assuming that you have GNU awk installed and I've used \y, which is a word boundary.
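A sketch of two ways to restrict the substitution with a sed address, assuming GNU sed (for \b) and the two sample lines from the question; 789 is the replacement value used in the answer above:

```shell
# The two sample lines from the question.
config=$(printf '%s\n' '# This is not to be replaced: 314' 'some_text <1 4 35 314 359>')

# Only touch lines that contain some_text.
by_address=$(printf '%s\n' "$config" | sed '/some_text/s/\b314\b/789/')

# Or: touch every line except those that start with #.
skip_comments=$(printf '%s\n' "$config" | sed '/^#/!s/\b314\b/789/')

printf '%s\n' "$skip_comments"
```

The second form mirrors the awk answer: it does not depend on the literal text some_text, only on comments starting at the beginning of the line.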

How to replace space with comma using sed?

I would like to replace the empty space between each and every field with a comma delimiter. Could someone let me know how I can do this? I tried the command below but it doesn't work. Thanks.
My command:
:%s//,/
53 51097 310780 1
56 260 1925 1
68 51282 278770 1
77 46903 281485 1
82 475 2600 1
84 433 3395 1
96 212 1545 1
163 373819 1006375 1
204 36917 117195 1
If you are talking about sed, this works:
sed -e "s/ /,/g" < a.txt
In vim, use same regex to replace:
s/ /,/g
Inside vim, you want to type when in normal (command) mode:
:%s/ /,/g
On the terminal prompt, you can use sed to perform this on a file:
sed -i 's/\ /,/g' input_file
Note: the -i option to sed means "in-place edit", as in that it will modify the input file.
I know it's not exactly what you're asking, but, for replacing a comma with a newline, this works great:
tr , '\n' < file
Try the following command and it should work out for you.
sed "s/\s/,/g" orignalFive.csv > editedFinal.csv
If your data includes arbitrary runs of blank characters (tab, space) and you want to replace each run with a single comma, use the following (the + quantifier needs -r or -E; \t inside the brackets assumes GNU sed):
sed -r 's/[\t ]+/,/g' input_file
or
sed -r 's/[[:blank:]]+/,/g' input_file
If you want to replace runs of whitespace characters, a class which also includes characters such as carriage return and form feed, use the following:
sed -r 's/[[:space:]]+/,/g' input_file
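The difference between replacing each space and replacing each run of blanks shows up as soon as the input has double spaces or tabs. A sketch on one hypothetical row (values borrowed from the question, separators made irregular on purpose), using -E, which GNU and BSD sed both accept:

```shell
# One row with a double space and a tab as separators (hypothetical spacing).
row=$(printf '53  51097\t310780 1')

# Each single space becomes a comma: the double space yields an empty
# field and the tab is left untouched.
each_space=$(printf '%s\n' "$row" | sed 's/ /,/g')

# Each run of blanks (spaces or tabs) becomes exactly one comma.
runs=$(printf '%s\n' "$row" | sed -E 's/[[:blank:]]+/,/g')

printf '%s\n' "$each_space" "$runs"
```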
If you want the output on the terminal:
$ sed 's/ /,/g' filename.txt
But if you want to edit the file itself, i.e. replace the spaces with commas in the file:
$ sed -i 's/ /,/g' filename.txt
I just confirmed that:
cat file.txt | sed "s/\s/,/g"
successfully replaces spaces with commas in Cygwin terminals (mintty 2.9.0). None of the other samples worked for me.
On Linux use below to test (it would replace the whitespaces with comma)
sed 's/\s/,/g' /tmp/test.txt | head
later you can take the output into the file using below command:
sed 's/\s/,/g' /tmp/test.txt > /tmp/test_final.txt
PS: /tmp/test.txt is the file which you want to use