How to keep just the first 300 characters of every line? [duplicate] - regex

This question already has answers here:
Using sed, how do you print the first 'N' characters of a line?
(6 answers)
Closed 5 years ago.
I want to keep just the first 300 characters of every line. The obvious solution:
sed -E 's/^(.{0,300}).*/\1/'
apparently exceeds some internal regex limit:
RE error: invalid repetition count(s)
Some experimentation shows the repetition count can only go up to 255, at least on my platform (macOS). Python can handle {0,300}, but I'd prefer to do this with normal shell tools, if possible. Any ideas?
PS: Yeah, I know, if I was doing it in Python, I'd do line[:300] and ditch the regex completely.

This doesn't error out for me with GNU sed. See if cut works for you:
$ perl -e 'print "a" x 350' | sed -E 's/^(.{0,300}).*/\1/' | wc -L
300
$ perl -e 'print "a" x 350' | cut -c1-300 | wc -L
300
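As a minimal sketch of the cut approach above, the following wraps it in a small function, with an awk substr() variant for comparison. The function names first_chars and first_chars_awk are made up for illustration; the default of 300 comes from the question. Neither variant relies on a regex repetition count, so the 255 limit mentioned in the question should not come into play.

```shell
# first_chars [N]: keep at most the first N characters of each line on stdin.
# Hypothetical helper name; N defaults to 300 as in the question.
first_chars() {
    n="${1:-300}"
    cut -c "1-$n"
}

# The same idea using awk's substr() instead of cut.
first_chars_awk() {
    awk -v n="${1:-300}" '{ print substr($0, 1, n) }'
}

# Short demonstration with N=4; lines shorter than N pass through unchanged.
printf 'abcdefghij\nxyz\n' | first_chars 4       # prints abcd, then xyz
printf 'abcdefghij\nxyz\n' | first_chars_awk 4   # prints abcd, then xyz
```

Usage would look like `first_chars < input.txt > output.txt`, where input.txt and output.txt are placeholder file names.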

Related

Extract word and digit from string using SED [duplicate]

This question already has answers here:
sed: print only matching group
(5 answers)
Closed 2 years ago.
How can I extract the TIC-9890 from a
branch name that looks like feature/TIC-9890/some-other-wording?
I am not a sed expert, but I managed to come up with:
echo "feature/TIC-000/random-description" |
sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/'
This seems to work fine if the TIC-\d+ string is in there,
but returns the entire string if that is missing...
However, I need it to return null or empty string if the match isn't present.
Add the p flag to the substitution so the result is printed. The -n option suppresses sed's automatic printing, so p is needed to print the line when a substitution succeeds; lines without a match then print nothing.
echo "feature/TIC-000/random-description" | sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/p'
From man sed page:
-n, --quiet, --silent suppress automatic printing of pattern space
p Print the current pattern space.
Or, as per @anubhava's comment, one could use grep with the -o and -E options:
echo "feature/TIC-000/random-description" | grep -oE 'TIC-[0-9]+'
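To illustrate the "empty when there is no match" behaviour the question asks for, here is a small sketch that wraps the sed -n ... /p command in a function. The name ticket_id is hypothetical; the pattern is the one from the answer.

```shell
# ticket_id BRANCH: print the TIC-<digits> part of a branch name,
# or nothing at all when the branch name contains no such ticket.
ticket_id() {
    printf '%s\n' "$1" | sed -n 's/.*\(TIC-[0-9]\{1,\}\).*/\1/p'
}

ticket_id 'feature/TIC-9890/some-other-wording'   # prints TIC-9890
ticket_id 'feature/no-ticket-here'                # prints nothing

# The result can be tested for emptiness in a script.
id=$(ticket_id 'feature/no-ticket-here')
if [ -z "$id" ]; then
    echo 'no ticket found'
fi
```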

Opposite letters with sed -r? [duplicate]

This question already has answers here:
How can I get sed to change all of the instances of each letter only once?
(3 answers)
Closed 3 years ago.
I have a text with random alphabetic characters and I want to change every character to its opposite.
For example, if I have the character z, I change it to a, b to y, etc.
I can't find a better way to do this than:
sed -r -e 's/a/z/' -e 's/b/y/' ... 's/z/a/'
Is there a way to do this in a more simple way?
I just want to use the -r option in sed.
Using the y command maybe?
tr is easier,
e.g. for lowercase chars
$ z_a=$(echo {z..a} | tr -d ' '); echo adfa alfja | tr a-z $z_a
zwuz zouqz
The detour to create z-a is required since tr can't handle "reverse collating sequence order".
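I know this isn't a sed solution, but this seems a simple, straightforward use of perl (the match is restricted to lowercase letters so that spaces and other characters pass through unchanged):
perl -ple 's/([a-z])/chr(25-ord($1)+ord("a")*2)/eg' input_file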
With sed, one way is like this:
sed -r 'y/'"$(echo {a..z})"'/'"$(echo {z..a})"'/' file
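For shells without brace expansion, the same a-to-z mapping can be sketched by spelling out both alphabets literally. The function names below are made up for illustration; the tr and sed y forms are meant to behave the same on lowercase input and leave every other character alone.

```shell
# reverse_alphabet: map a->z, b->y, ..., z->a on stdin using tr.
reverse_alphabet() {
    tr 'abcdefghijklmnopqrstuvwxyz' 'zyxwvutsrqponmlkjihgfedcba'
}

# The same mapping with sed's y (transliterate) command.
reverse_alphabet_sed() {
    sed 'y/abcdefghijklmnopqrstuvwxyz/zyxwvutsrqponmlkjihgfedcba/'
}

echo 'adfa alfja' | reverse_alphabet       # prints zwuz zouqz
echo 'adfa alfja' | reverse_alphabet_sed   # prints zwuz zouqz
```

Because the mapping is its own inverse, piping text through either function twice gives back the original.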

Use Variable in SED [duplicate]

This question already has answers here:
Escape a string for a sed replace pattern
(17 answers)
Closed 4 years ago.
I cannot expand this variable in sed. I've tried everything I can think of.
I am trying to put the md5sum of file1 in line 10 of file2
If I take $x out of the regex and put some literal text there, it works. It just will not accept the variable. Printing the variable with printf works fine.
#!/bin/bash
x=$(md5sum /etc/file1)
printf "$x \n"
sed -i 10"s/.*/$x/g" /usr/bin/file2
You may use this command that uses ~ as regex delimiter instead of / since output of md5sum contains /:
sed -i "10s~.*~$x~" /usr/bin/file2
It worked once I reduced the variable, which held the full md5sum output (hash plus file path), to just the hash by running $x through:
x=$(echo $x | head -n1 | awk '{print $1;}')
With only the MD5 left, the errors stopped.
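Putting the two answers together, a rough sketch might look like the following. The helper names hash_only and set_line are hypothetical, and the sample digest (the MD5 of empty input) stands in for real md5sum output so the sketch does not depend on md5sum being installed. It writes to a temporary file and moves it into place rather than using sed -i, since the -i syntax differs between GNU and BSD sed.

```shell
# hash_only: keep the first whitespace-separated field of md5sum-style input.
hash_only() {
    awk '{ print $1; exit }'
}

# set_line FILE N TEXT: replace line N of FILE with TEXT (hypothetical helper).
# Assumes TEXT contains no "~", "&", backslash or newline, which holds for a hex digest.
set_line() {
    sed "${2}s~.*~${3}~" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demonstration on a scratch file with 11 numbered lines.
demo=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 > "$demo"

# In a real script this would be: x=$(md5sum /etc/file1 | hash_only)
x=$(printf '%s\n' 'd41d8cd98f00b204e9800998ecf8427e  /etc/file1' | hash_only)

set_line "$demo" 10 "$x"
sed -n '10p' "$demo"   # prints d41d8cd98f00b204e9800998ecf8427e
rm -f "$demo"
```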

Regex extract substring after '=' sign [duplicate]

This question already has answers here:
How can I output only captured groups with sed?
(11 answers)
Closed 5 years ago.
I saw many examples, but for some reason it still does not work for me.
This is the command I'm executing:
NUMBER=$(docker logs vault | grep Token)
NUMBER=${NUMBER##*": "}
NUMBER=$(echo $NUMBER | sed 's/^token=(.*)$//g')
echo $NUMBER
I want to get the value after '=', which is a string basically.
I tried using grep and other regexes, but I either get nothing or just the original string.
Please advise.
To get the text after a delimiter, it is better to use cut instead of sed, as in this example:
echo 'token=dsa32e3' | cut -d= -f2
dsa32e3
-d= sets = as the delimiter for cut
-f2 makes cut print the second field
With sed you can simply remove the token= prefix:
NUMBER=$(echo token=dsa32e3 | sed 's/^token=//g')
echo $NUMBER
Other non-regexp based alternatives are possible, as other users pointed out.
Another fun possibility is a positive lookbehind, which sed does not support, so I used perl:
NUMBER=$(echo token=dsa32e3 | perl -pe 's/.*(?<=token=)([a-z0-9]*)/$1/g')
echo $NUMBER
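Two further variants, sketched here with made-up function names. The first uses plain shell parameter expansion and needs no external tool. The second shows what a working capture group looks like in sed: in the question's s/^token=(.*)$//g, the unescaped parentheses are literal characters in basic regex syntax and the replacement is empty, so the value can never come out.

```shell
# value_after_equals STRING: strip the shortest prefix ending in "=",
# so everything after the first "=" is kept (including any later "=").
value_after_equals() {
    printf '%s\n' "${1#*=}"
}

# token_value: read lines on stdin and print the part after a leading "token=".
# In basic regex syntax the group is written \( \) and referenced as \1.
token_value() {
    sed -n 's/^token=\(.*\)$/\1/p'
}

value_after_equals 'token=dsa32e3'    # prints dsa32e3
echo 'token=dsa32e3' | token_value    # prints dsa32e3
```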

Extract word after a known pattern in UNIX [duplicate]

This question already has answers here:
get the next word after grep matching [duplicate]
(3 answers)
Closed 7 years ago.
I have a file called in.txt which contains a whole bunch of code, however I need to extract a user ID which is guaranteed to be of the form 'EID:nmb685', potentially with content before and/or after the guaranteed format. I want to extract the 'nmb685' using a bash script. I've tried some combinations of grep and sed but nothing has worked.
If your grep doesn't support -P but supports -o, you can combine grep and awk:
grep -o 'EID:\w\+' file|awk -F':' '{print $2}'
It can be done with awk alone, but this is more straightforward.
If your grep supports -P, perl-regexp parameter, you may use this.
grep -oP 'EID:\K\w+' file
What is being output after the ID? Is there anything consistent that you can match against?
If you know the length of the userid you can use:
grep "EID:......" in.txt > out.txt
or, if you don't, maybe something like this (matches any run of letters and digits that is preceded by EID: and followed by a space):
grep "EID:[A-Za-z0-9]* " in.txt > out.txt
Not very elegant, but this works:
grep "EID:" in.txt | sed 's/\(.*EID:......\).*/\1/g' | sed 's/^.*EID://'
Select all lines with the substring "EID:"
Remove everything after "EID:" plus 6 characters
Remove everything before (and including) "EID:"
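The three steps above can also be collapsed into a single sed call that does not depend on the ID length or on grep -P. This is only a sketch: the function name extract_eid is hypothetical, and it assumes the ID consists of letters and digits, as in the 'EID:nmb685' example.

```shell
# extract_eid: for each input line containing EID:<alnum...>, print just the ID.
# Lines without a match produce no output because of -n and the p flag.
extract_eid() {
    sed -n 's/.*EID:\([[:alnum:]]\{1,\}\).*/\1/p'
}

printf 'foo EID:nmb685 bar\nno id here\n' | extract_eid   # prints nmb685
```

With the file from the question this would be run as `extract_eid < in.txt`.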