sed delete trailing pattern of digits - regex

I have a .txt file where the last column includes a number pattern after the text like 'Baker 2-13' or 'Charlie 03-144.' I would like to remove all the digits at the end of the line, and just be left with Baker and Charlie. I have tried piping the sed command at the end of my awk statement, with no success.
sed -E 's/[0-9]{1,2}"-"[0-9]{1,3}$//'
I've tried adding the space and carriage returns to my sed command, but still no luck.
sed -E 's/[0-9]{1,2}"-"[0-9]{1,3}\s\r$//'
I've also tried this, but it only works when I echo a text sample, it doesn't work on each line of my .txt file
echo "CHARLIE 02-157" | sed -E 's/[0-9]*([0-9])+\-[0-9]*([0-9])+$//'
Any ideas?

This should work:
sed -i.bak -E 's/[0-9]{1,2}-[0-9]{1,3}$//' file
cat file
Baker
Charlie
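You don't need to quote the hyphen in the pattern. Inside the single-quoted sed script the double quotes are literal characters, so "-" only matches a hyphen that is surrounded by actual quote marks.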
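If the pattern still fails on the real file, one possibility is that each line ends with a Windows-style carriage return, so the digits are not the last characters before $. This is an assumption; the question only hints at it by mentioning carriage returns. A minimal sketch that strips the CR first, using made-up sample data:

```shell
# Hypothetical sample: two lines with Windows (CRLF) line endings.
sample=$(printf 'Baker 2-13\r\nCharlie 03-144\r')
# tr drops the carriage returns; sed then removes the trailing
# digits-hyphen-digits group together with any spaces around it.
cleaned=$(printf '%s\n' "$sample" | tr -d '\r' | sed -E 's/ *[0-9]{1,2}-[0-9]{1,3} *$//')
printf '%s\n' "$cleaned"
```

If this is the cause, it would also explain why the echo test worked while the file did not.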

Simple sed solution
sed 's/[- 0-9]*$//'
This will delete trailing dashes, blanks and numbers!
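One thing to keep in mind with the character-class approach is that it removes any trailing run of digits, dashes and blanks, not only the N-NNN shape. A small comparison on made-up input (the name "Studio 54" is hypothetical):

```shell
# Made-up inputs: the second one ends in a number that is part of the name.
loose=$(printf 'Baker 2-13\nStudio 54\n' | sed 's/[- 0-9]*$//')
strict=$(printf 'Baker 2-13\nStudio 54\n' | sed -E 's/ *[0-9]{1,2}-[0-9]{1,3}$//')
printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

The loose form turns "Studio 54" into "Studio", while the strict form leaves it alone.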

Related

Deleting everything between two string matches in a file

I got this text in file.txt:
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=6990226111024612869; tt_webid=6990226111024612869; tt_csrf_token=VD5Nb_TQFH4RKhoJeSe2nzLB; R6kq3TV7=AHkh4PB6AQAA3LIS90nWf2ss0Q7ZTCQjUat4axctvhQY68DdUEz92RwpmVSX|1|0|e9d6917c2fe555827dcf5ee916ba9778079ab2a9; ttwid=1%7CAFodeNF0iZM2fyy-ZeiZ6HTpZoG_MSx6SmXHgGVQ-V4%7C1627538859%7C59ca1e4a56f9f537b55e655a6dabff88e44eb48502b164ed6b4199f5a5263cb0; passport_csrf_token_default=6f7653c3ce946a6ce5444723fb0c509b; passport_csrf_token=6f7653c3ce946a6ce5444723fb0c509b; sid_guard=0483b7d37f4e4bd20ab3046e29724798%7C1627538893%7C5184000%7CMon%2C+27-Sep-2021+06%3A08%3A13+GMT; uid_tt=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; uid_tt_ss=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; sid_tt=0483b7d37f4e4bd20ab3046e29724798; sessionid=0483b7d37f4e4bd20ab3046e29724798; sessionid_ss=0483b7d37f4e4bd20ab3046e29724798; store-idc=maliva; store-country-code=us; odin_tt=294845c8f7711db177f7c549a9f44edb1555031b27a2a485df809cd92c4e544ac0772bf462df5b7a100f6e488c45303cd62df3b6b950f0842520cd887850137b035d990f29cc8b752765e594560c977f; cmpl_token=AgQQAPNSF-RMpbE89z5HYF0_-2PcrxjXf4fZYP5_ZA
How can I delete everything from the string inside ( first & only instance ) from :tt_ to _ZA in file.txt keeping only Osmun.Prez#mail.com:c7lB2m6b#3.a.a using bash linux?
Thank you
Something like:
sed -i "s/:tt_.*//" file.txt
if you want to edit the file in place. If not, remove the -i switch.
The sed command means: on each line of file.txt, replace (s) the pattern :tt_ and all the characters that follow it (.*) with an empty string (//).
Or the command:
sed -i "s/:tt_.*_ZA//" file.txt
which follows what you asked for more closely, but produces the same output here.
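The two commands only differ when something follows _ZA on the line. A short sketch with a shortened, made-up stand-in for the long line in the question:

```shell
# Hypothetical line; the text after _ZA is added to make the difference visible.
line='user#mail.com:secret:tt_a=1; token=xyz_ZA trailing'
to_eol=$(printf '%s\n' "$line" | sed 's/:tt_.*//')
to_za=$(printf '%s\n' "$line" | sed 's/:tt_.*_ZA//')
printf '%s\n%s\n' "$to_eol" "$to_za"
```

The first form removes everything to the end of the line; the second stops at the last _ZA and keeps whatever comes after it.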
Use pattern substitution:
i=$(cat file.txt)
echo "${i/:tt*_ZA}"
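The ${var/pattern} form is a bash feature. If only the part before :tt_ is wanted, the POSIX suffix-removal expansion is an alternative. A minimal sketch on a made-up line (the variable names are illustrative):

```shell
# Made-up line in the same shape as the question's data.
line='user#mail.com:secret:tt_a=1; token=xyz_ZA'
# %% removes the longest suffix matching the glob pattern ":tt_*".
trimmed=${line%%:tt_*}
printf '%s\n' "$trimmed"
```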
Assuming the general requirement is to remove everything after the 2nd : ...
Sample data:
$ cat file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v ... to end of line
some.one#home.com:B52_m6b#9_az.more.stuff:delete from here ... to end of line
One sed idea:
$ sed -En 's/^([^:]*:[^:]*).*$/\1/p' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
some.one#home.com:B52_m6b#9_az.more.stuff
Using awk
awk 'BEGIN{FS=OFS=":"}{print $1,$2}'
Using : as the delimiter, it is easy to extract the columns before :tt
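For reference, here is the awk command applied to the two sample lines shown earlier, fed on stdin rather than from file.txt:

```shell
# FS splits on ":", OFS joins the printed fields with ":" again.
result=$(printf '%s\n' \
  'Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v ... to end of line' \
  'some.one#home.com:B52_m6b#9_az.more.stuff:delete from here ... to end of line' |
  awk 'BEGIN{FS=OFS=":"}{print $1,$2}')
printf '%s\n' "$result"
```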
This deletes all chars from ":tt_" to the last "_ZA", inclusive, in file.txt
Mac_3.2.57$cat file.txt | sed 's/\(\)[:]tt.*_ZA\(.*\)/\1\2/'
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
Mac_3.2.57$
Or, if it is always the first two values that are separated by a colon (as per your example):
cut -f1,2 -d':' file.txt

Using sed to delete

I need to Write a ‘sed’ command that would delete the first field from every line of a file (that is, everything up to and including the first spaces in the line.)
I think it should look something like this but I'm not quite sure:
sed '^[^:]*/d file
In sed /d means delete. Your code will delete lines that match the regex.
sed 's/^[^ ]* //g' file
This might work for you (GNU sed):
sed -r 's/^\S+\s+//' file
This removes the first non-space(s) followed by space(s).
d command in sed deletes the whole line.
You need to use s command like this:
sed -i.bak 's/^[^ ]* //' file
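To make the difference between d and s concrete, here is a small sketch on made-up input; the second line has no space, so it has no "first field" to strip:

```shell
input=$(printf 'alpha beta gamma\nsingle\n')
# d deletes every line that matches the address.
deleted=$(printf '%s\n' "$input" | sed '/^[^ ]* /d')
# s removes only the matched text and keeps the rest of the line.
stripped=$(printf '%s\n' "$input" | sed 's/^[^ ]* //')
printf 'd:\n%s\ns:\n%s\n' "$deleted" "$stripped"
```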
Assuming that by "spaces" you mean you want to remove the entire first block of whitespace and everything preceding it, do something like:
sed 's/^\w*[ \t]*//' file.txt
e.g.
$ printf "string1 \t string2\n\tstring3 string4\n"
string1 string2
string3 string4
$ printf "string1 \t string2\n\tstring3 string4\n" | sed 's/^\w*[ \t]*//'
string2
string3 string4
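\w and \t inside a bracket expression are GNU extensions. A possible portable variant uses POSIX character classes, assuming the first field is any run of non-whitespace characters (slightly broader than \w):

```shell
# Same sample as above: spaces and a tab between fields, a leading tab on line 2.
out=$(printf 'string1 \t string2\n\tstring3 string4\n' |
  sed 's/^[^[:space:]]*[[:space:]]*//')
printf '%s\n' "$out"
```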

how to select lines containing several words using sed?

I am learning using sed in unix.
I have a file with many lines and I wanna delete all lines except lines containing strings(e.g) alex, eva and tom.
I think I can use
sed '/alex|eva|tom/!d' filename
However I find it doesn't work, it cannot match the line. It just match "alex|eva|tom"...
Only
sed '/alex/!d' filename
works.
Anyone know how to select lines containing more than 1 words using sed?
plus, with parenthesis like "sed '/(alex)|(eva)|(tom)/!d' file" doesn't work, and I wanna the line containing all three words.
sed is an excellent tool for simple substitutions on a single line, for anything else just use awk:
awk '/alex/ && /eva/ && /tom/' file
delete all lines except lines containing strings(e.g) alex, eva and tom
As worded you're asking to preserve lines containing all those words but your samples preserve lines containing any. Just in case "all" wasn't a misspeak: Regular expressions can't express any-order searches, fortunately sed lets you run multiple matches:
sed -n '/alex/{/eva/{/tom/p}}'
or you could just delete them serially:
sed '/alex/!d; /eva/!d; /tom/!d'
The above works on GNU/anything systems, with BSD-based userlands you'll have to insert a bunch of newlines or pass them as separate expressions:
sed -n '/alex/ {
/eva/ {
/tom/ p
}
}'
or
sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d'
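The difference between "any of the words" and "all of the words" can be seen on a few made-up lines:

```shell
input=$(printf 'alex and eva and tom\nonly alex here\ntom met eva\nnobody\n')
# Alternation keeps lines containing at least one of the names.
any=$(printf '%s\n' "$input" | sed -E '/alex|eva|tom/!d')
# Serial deletes keep only lines containing every name, in any order.
all=$(printf '%s\n' "$input" | sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d')
printf 'any:\n%s\nall:\n%s\n' "$any" "$all"
```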
You can use:
sed -r '/alex|eva|tom/!d' filename
OR on Mac:
sed -E '/alex|eva|tom/!d' filename
Use -i.bak for in-place editing (with a backup), so:
sed -i.bak -r '/alex|eva|tom/!d' filename
You should be using \| instead of |.
Edit: Looks like this is true for some variants of sed but not others.
This might work for you (GNU sed):
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{3}//p' file
This method would allow a range of values to be present i.e. you wanted 2 or more of the list then use:
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p' file
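How this works: each address that matches runs G, which appends a newline plus the (empty) hold space, so the number of trailing newlines counts the matched words. A sketch on made-up lines, assuming GNU sed as in the answer (written with -E, which GNU sed accepts as an equivalent of -r):

```shell
input=$(printf 'alex eva\nalex\ntom eva alex\nbob\n')
# Print only lines that collected two or three newlines, with the newlines removed.
two_or_more=$(printf '%s\n' "$input" | sed -nE '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p')
printf '%s\n' "$two_or_more"
```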

Replace Strings Using Sed And Regex

I'm trying to uncomment file content using sed but with regex (for example: [0-9]{1,5})
# one two 12
# three four 34
# five six 56
The following is working:
sed -e 's/# one two 12/one two 12/g' /file
However, what I would like is to use regex pattern to replace all matches without entering numbers but keep the numbers in the result.
For the sample in the question, simply
sed 's/^# //' file
will suffice, but if you need to remove the comment only on lines containing a particular regex, you can use a conditional address:
sed '/regex/s/^# //' file
So every line containing regex will be uncommented (if the line begins with a #)
... where regex could be [0-9] as:
sed '/[0-9]/s/^# //' file
will remove # at the beginning of every line containing a number, or
sed '/[0-9]/s/^# \?//' file
to make the first space optional: #one two 12, or even
sed '/[0-9]$/s/^# //' file
will remove # at the beginning of lines whose last character is a number. Then
sed '/12$/s/^# //' file
will remove # at the beginning of lines ending in 12. Or
sed '/\b\(two\|three\)\b/s/^# //' file
will remove # at the beginning of lines containing the word two or three.
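A quick check of the address form on made-up input shows that lines without a match pass through unchanged:

```shell
input=$(printf '# one two 12\n# plain comment\n# five six 56\n')
# The address /[0-9]/ limits the substitution to lines containing a digit.
out=$(printf '%s\n' "$input" | sed '/[0-9]/s/^# //')
printf '%s\n' "$out"
```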
sed -e 's/^#\s*\(.*[0-9].*\)$/\1/g' filename
should do it.
If you only want those lines uncommented which contain numbers, you can use this:
sed -e 's/^#\s*\(.*[0-9]\+.*\)/\1/' file
Isn't the -i option necessary to make the replacement in the file itself? I remove the leading # with the following:
sed -i "s/^# \(.*\)/\1/g" file
In order to uncomment only those commented lines that end in a sequence of at least one digit, I'd use it like this:
sed -i "s/^# \(.*[[:digit:]]\+$\)/\1/g" file
This solution requires commented lines to begin with one space character (right behind the #), but that should be easy to adjust if not applicable.
The following sed command will uncomment lines containing numbers:
sed 's/^#\s*\(.*[0-9]\+.*$\)/\1/g' file
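Since the question mentions the interval [0-9]{1,5}, here is a sketch in extended-regex syntax (-E), where braces and parentheses need no backslashes. It uncomments only lines that end in digits; the sample lines are made up. Because the preceding .* can absorb extra digits, the upper bound of 5 does not actually limit how many digits the line may end with.

```shell
input=$(printf '# one two 12\n# no digits here\n#three four 34\n')
# The capture group keeps the text and the digits; only "#" and any whitespace after it are dropped.
out=$(printf '%s\n' "$input" | sed -E 's/^#[[:space:]]*(.*[0-9]{1,5})$/\1/')
printf '%s\n' "$out"
```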
I found it. Thanks to all of you.
echo "# one two 12" | grep "[0-9]" | sed 's/# //g'
or
cat file | grep "[0-9]" | sed 's/# //g'

sed misbehaving?

I have the following command:
$ xlscat -i $file
and I get:
Excel File Name.xlsx - 01: [ Sheet #1 ] 34 Cols, 433 Rows
Excel File Name.xlsx - 02: [ Sheet Number2 ] 23 Cols, 32 Rows
Excel File Name.xlsx - 03: [ Foo Factor! ] 14 Cols, 123 Rows
I want just the sheet name, so i do this:
$ xlscat -i $file 2>&1 | sed -e 's/.*\[ *\(.*\) *\].*/\1/' | while read file
> do
> echo "File: '$file'"
> done
And get this:
File: 'Sheet #1'
File: 'Sheet Number2'
File: 'Foo Factor!'
Great! Everything works beautifully. As you can see with the single quotes, I've removed the extra spaces at the end of the file name. Now convert all remaining spaces to underscores:
$ xlscat -i $file 2>&1 | sed -e 's/.*\[ *\(.*\) *\].*/\1/' | sed -e 's/ /_/g' | while read file
> do
> echo "File: '$file'"
> done
Now I get this:
File: 'Sheet_#1_____'
File: 'Sheet_Number2'
File: 'Foo_Factor!__'
Huh? The first one didn't show any trailing blanks, but the second one seems to be appending underscores on the end of the file. What am I not seeing?
The first sed command is not stripping the trailing whitespace, read is. Check your expression:
sed -e 's/.*\[ *\(.*\) *\].*/\1/'
It matches:
anything
a left bracket
0 or more spaces
anything, captured
0 or more spaces
a right bracket
anything
The regular expressions are greedy, meaning that they match as much as possible, and the earlier expressions will match before later ones do. So for example, the regular expression (.*)(.*) matches anything in two capturing groups, but there are any number of ways the data could be split between the two groups. So the regex implementation has to choose, and it will put as much as possible in the first, and nothing in the second. In your expression, that means the captured .* swallows the trailing spaces, and the " *" after it matches zero spaces.
Since you need to match filenames with spaces in them, you can't match "anything except a space"; your best bet is to trim the trailing whitespace as a separate step. Try this sed command instead:
sed -e 's/.*\[ *\(.*\) *\].*/\1/' -e 's/ *$//'
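The effect can be reproduced without xlscat on a single made-up line shaped like its output (five spaces before the closing bracket is an assumption about the real padding):

```shell
line='Excel File Name.xlsx - 01: [ Sheet #1     ] 34 Cols, 433 Rows'
# Without the trim step, the captured trailing spaces become underscores.
captured=$(printf '%s\n' "$line" | sed -e 's/.*\[ *\(.*\) *\].*/\1/' -e 's/ /_/g')
# With the extra 's/ *$//' step, they are removed before the conversion.
fixed=$(printf '%s\n' "$line" | sed -e 's/.*\[ *\(.*\) *\].*/\1/' -e 's/ *$//' -e 's/ /_/g')
printf '%s\n%s\n' "$captured" "$fixed"
```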
I think the read file is trimming the trailing whitespace for you. Try putting the
sed -e 's/ /_/g'
inside the while loop ... like:
echo "File: $(echo $file | sed -e 's/ /_/g')"
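The trimming done by read can be checked in isolation. With the default IFS, read strips leading and trailing whitespace; setting IFS to empty for the read keeps it. A minimal sketch (the variable name "name" is illustrative):

```shell
line='Sheet #1     '
# Default IFS: trailing blanks are dropped by read.
trimmed=$(printf '%s\n' "$line" | { read name; printf '[%s]' "$name"; })
# Empty IFS and -r: the line is stored exactly as read.
kept=$(printf '%s\n' "$line" | { IFS= read -r name; printf '[%s]' "$name"; })
printf '%s\n%s\n' "$trimmed" "$kept"
```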
Could it be echo that's stripping the trailing spaces? Although it does seem like they should show up inside the quotes. Anyway, try this:
sed -e 's/.*\[ *\([^] ]\+\( \+[^] ]\+\)*\).*/\1/'
Each word of the sheet name is matched by [^] ]\+ (i.e., one or more of any characters other than space or ]). When the final word of the name has been matched, the second .* consumes the rest of the line. There's no need to match the closing ], so the trailing spaces don't have to be included in the match.
I'm not a sed user, but this regex works correctly in RegexBuddy when I specify the GNU-BRE flavor, so it should work in sed.