How to cut a string until the first numerical value appears using regex

I am trying to write a script that extracts the words from a string until the first number appears.
Example: I have a file named typed-list-4.1.3.Final.jar and I want the output to be typed-list.jar
All the files have different names, but they all end with a version number and the .jar extension, so I was trying to use sed to cut the part starting at the first number and then append .jar.
My files look like this:
log4j-slf4j-impl-2.8.2.jar, hibernate-core-5.0.12.Final.jar, etc.
I tried a sed command like this, but it's not working:
sed -i 's/-[0-9]*$//g' test1.sh
Here test1.sh contains the string "typed-list-4.1.3.Final.jar".

How about:
sed 's/-\([0-9]\+\.\)\+[0-9]\+.*\.jar/.jar/' Input_file
Results for the provided inputs:
typed-list.jar
log4j-slf4j-impl.jar
hibernate-core.jar
The regex matches a substring that:
starts with a dash -
continues with one or more repetitions of digit(s) followed by a dot, then digit(s)
may contain some other substring in between (such as .Final)
ends with the extension .jar
The sed command then replaces the matched substring with just the extension.
Hope this helps.
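As a hedged sketch, a similar substitution can be wrapped in a small shell function and tried on the three file names from the question. The function name strip_version is made up for illustration, and the pattern assumes the version starts at the first dash that is directly followed by a digit (a name such as foo-3d-1.0.jar would be cut too early).

```shell
# strip_version (hypothetical helper): remove everything from the first
# "-<digit>" up to the .jar suffix, then put the suffix back.
strip_version() {
  printf '%s\n' "$1" | sed -E 's/-[0-9]+(\.[0-9]+)*.*\.jar$/.jar/'
}

for name in typed-list-4.1.3.Final.jar log4j-slf4j-impl-2.8.2.jar hibernate-core-5.0.12.Final.jar; do
  strip_version "$name"
done
# prints:
# typed-list.jar
# log4j-slf4j-impl.jar
# hibernate-core.jar
```

Because the function only prints the new name, it can be checked safely before being used in an actual rename.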

Sed:
sed -E 's/(.*)-([[:digit:]]+\.){2}[[:digit:]]+.*(\.[^.]+)$/\1\3/' dat
log4j-slf4j-impl.jar
hibernate-core.jar
typed-list.jar

echo typed-list-4.1.3.Final.jar | awk 'sub(/-4.{10}/,"",$0)'
typed-list.jar

Related

regex in sed removing only the first occurrence from every line

I have the following file I would like to clean up
cat file.txt
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
My desired output is:
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
I would like to remove everything between ":" and the first occurrence of "or".
I tried sed 's/MNS:d*?or /MNS:/g', though it removes the second "or" as well.
I tried every option in https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/
to no avail. Should I create alias sed='perl -pe'? It seems that sed does not properly support regex.
Perl is more suitable here because we need lazy-match logic.
perl -pe 's|(:.*?or +)(.*)|:\2|' Input_file
By using .*?or we match only up to the nearest (first) occurrence of the string or in the line.
This might work for you (GNU sed):
sed '/:.*\<or\>/{s/\<or\> */\n/;s/:.*\n/:/}' file
If a line contains : followed by the word or, substitute the first occurrence of the word or (and any spaces after it) with a unique delimiter (e.g. \n), then remove everything between : and the unique delimiter, keeping the : itself.
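A minimal sketch of this marker technique on a single line, assuming GNU sed (\< and \> word boundaries and \n in the replacement are GNU extensions). The helper name remove_to_first_or and the input line are made up for illustration. This version keeps the : in the second substitution and also swallows the spaces after or.

```shell
# remove_to_first_or (hypothetical helper): filters stdin, GNU sed assumed.
remove_to_first_or() {
  # 1st s: replace the first standalone word "or" (plus trailing spaces)
  #        with a newline, which cannot otherwise occur inside a line.
  # 2nd s: delete from ":" through that newline, putting the ":" back.
  sed '/:.*\<or\>/{s/\<or\> */\n/;s/:.*\n/:/}'
}

# Hypothetical input: "or" also appears inside the word "orweqqwe".
printf '%s\n' 'MNS:2 orweqqwe or M+ GYPA*02 or GYPA*N' | remove_to_first_or
# prints: MNS:M+ GYPA*02 or GYPA*N
```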
Regarding "I would like to remove everything between ":" and the first occurrence of "or"": no, you wouldn't. The first occurrence of or in the 2nd line of the sample input is at the start of orweqqwe. The text immediately after : looks like it could be any set of characters, so couldn't it contain a standalone or, e.g. MNS:2 or eqqwe or M+ GYPA*02 or GYPA*N?
Given that, and the fact that it's apparently a fixed number of characters to be removed on every line, it seems like this is what you should really be using:
$ sed 's/:.\{14\}/:/' file
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
If or is sure to occur exactly twice per line, as in the provided example, please try:
sed 's/\(MNS:\).\+ or \(.\+ or .*\)/\1\2/' file.txt
Result:
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
Otherwise perl is the better solution, since it supports shortest (lazy) matching, as RavinderSingh13's answer shows.
ex supports lazy matching with \{-}:
ex -s '+%s/:\zs.\{-}or //g|wq' input_file
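The pattern :\zs.\{-}or (with its trailing space) matches the shortest run of characters after the first : up to and including the first or that is followed by a space. \zs starts the match after the :, so the colon itself is kept.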

sed regex match and replace any last digit

I have lots of files containing the following IP address settings, and I want to replace the last digit of the IP, but I am struggling to come up with the correct regex.
file1
IPADDR=10.30.2.26
NETMASK=255.255.0.0
GATEWAY=10.30.0.1
I want to replace 10.30.2.26 with 10.30.2.27 using sed, but somehow I am missing something.
I have many files in which I want to do this, and the last digit could be anything.
I have tried sed 's/[^IPADDR].$/7/g' file1
How do I match anything between ^IPADDR{anything}$ ?
In your regex, [^IPADDR] is a character class that matches any single character except those listed between the brackets. I'm not sure that's what you want.
You can use an address instead to select the lines starting with IPADDR (/^IPADDR/) and apply the substitution command only to them:
sed '/^IPADDR/s/[0-9]$/7/' file
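To make the difference concrete, here is a small sketch that first shows what the original attempt does to the IPADDR line and then wraps the address-based command in a function. The function name replace_last_digit and its digit argument are assumptions made for illustration.

```shell
line='IPADDR=10.30.2.26'

# [^IPADDR] matches any ONE character that is not I, P, A, D or R, so together
# with the following "." this replaces the last two characters ("26") with 7.
printf '%s\n' "$line" | sed 's/[^IPADDR].$/7/'
# prints: IPADDR=10.30.2.7

# replace_last_digit (hypothetical helper): $1 is the new digit; the
# /^IPADDR=/ address limits the substitution to the IPADDR line.
replace_last_digit() {
  sed "/^IPADDR=/s/[0-9]\$/$1/"
}

printf 'IPADDR=10.30.2.26\nNETMASK=255.255.0.0\nGATEWAY=10.30.0.1\n' | replace_last_digit 7
# prints:
# IPADDR=10.30.2.27
# NETMASK=255.255.0.0
# GATEWAY=10.30.0.1
```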
You may use the following command:
sed -r 's/(^IPADDR=[0-9.]+)([0-9]$)/\17/g' file
Prints:
IPADDR=10.30.2.27
NETMASK=255.255.0.0
GATEWAY=10.30.0.1

Select a single character in an alphanumeric string in bash

I have an issue with string manipulation in bash. I have a list of names, each name being composed of two parts, chars and numbers: for example
abcdef01234
I want to cut the last character before the numeric part starts, in this case
f
I think there is a regular expression to help me with this but just can't figure it out. AWK/sed solutions are accepted too. Hope someone can help.
Thank you.
In bash it can be done with parameter expansion, using substring removal and string indexes, e.g.,
a=abcdef01234 # your string
tmp=${a%%[0-9]*} # remove everything from the first digit to the end
echo "${tmp:(-1)}" # output the last of the remaining chars
Output: f
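The two steps above can also be sketched as a reusable function. This variant is an assumption-laden alternative rather than the answer's own code: it sticks to POSIX parameter expansion (no ${var:offset}), so it should also run in plain sh, and the function name is made up.

```shell
# last_letter_before_digits (hypothetical helper), POSIX sh compatible.
last_letter_before_digits() {
  lead=${1%%[0-9]*}                    # drop everything from the first digit onward
  printf '%s\n' "${lead#"${lead%?}"}"  # strip all but the last remaining character
}

last_letter_before_digits abcdef01234
# prints: f
```

If the string starts with a digit, the prefix is empty and the function prints an empty line.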
You can use a regexp like [a-zA-Z]+([a-zA-Z])[0-9]+. If you know how to use sed, it is pretty easy.
Check https://regex101.com/r/XCkKM5/1
The first capture group will be the letter you want.
^\w+([a-zA-Z])\d+$
As a sed command (on OSX) this will be:
echo "abcdef12345" | sed -E "s#^[a-zA-Z]+([a-zA-Z])[0-9]+\$#\1#"
You could also try the following:
echo "abcdef01234" | awk '{match($0,/[a-zA-Z]+/);print substr($0,RSTART+RLENGTH-1,1)}'
I assume "I have a list of names" means a file named file. Using grep's PCRE and (positive) lookahead:
$ grep -oP "[a-z](?=[^a-z])" file
f
It prints every (lowercase) letter that is immediately followed by a non-(lowercase)-letter; for the sample input that is only f.

sed command to delete text until match is found for each line of a csv

I have a csv file and I am trying to delete all characters from the beginning of the line till it finds the first occurrence of "2015". I want to do this for each line in the csv file.
My csv file structure is as follows:
Field1 , Field2 , Field3 , Field4
sometext1 , 2015-07-15 , sometext2, sometext3
sometext1 , 2015-07-14 , sometext2, sometext3
sometext1 , 2015-07-13 , sometext2, sometext3
I cannot use the cut command or sed for the first occurrence of a comma because the text in the Field1 sometimes has commas in them too, which is making it complicated for parsing. I figured if I search for the first occurrence of the text 2015 for each line and replace all the preceding characters with nothing, then that should work.
FYI, I want to do this for the FIRST occurrence of 2015 only. There is another text field with 2015 in it within another column, and I don't want any text prior to that to be affected.
For example, if my original line is:
sometext1,#015,2015-07-10,sometext2,2015,sometext3
I want it to return:
2015-07-10,sometext2,2015,sometext3
Does anyone know the sed command to do this?
Any help will be appreciated!
Thanks
Here is a way to do it with sed assuming "#####" never occurs in a line:
sed -e 's/2015/#####&/'|sed -e 's/.*#####//'
For example:
> echo sometext1,#015,2015-07-10,sometext2,2015,sometext3\
|sed -e 's/2015/#####&/'|sed -e 's/.*#####//'
2015-07-10,sometext2,2015,sometext3
The first sed command prefixes "#####" to the first occurrence of 2015, and the second sed command removes everything from the beginning of the line to the end of the "#####" prefix.
The basic reason for using this two stage method is that sed's regular expression matcher has only greedy wildcards that always pick the longest match and does not support lazy matching which picks the shortest match.
If "#####" may occur in a line a more unlikely string could be substituted for it such as "7z#dNjm_wG8a3!esu#Rhv=".
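A short sketch contrasting the greedy match with the marker approach on the example line from the question. The function names are made up for illustration, and the marker version uses one sed with two -e expressions instead of a pipe, which is an equivalent formulation rather than the command as originally posted.

```shell
line='sometext1,#015,2015-07-10,sometext2,2015,sometext3'

# Greedy: .* runs up to the LAST 2015, so too much is removed.
greedy_cut() { sed 's/.*2015/2015/'; }

# Marker: tag the first 2015, then delete everything through the tag.
marker_cut() { sed -e 's/2015/#####&/' -e 's/.*#####//'; }

printf '%s\n' "$line" | greedy_cut
# prints: 2015,sometext3
printf '%s\n' "$line" | marker_cut
# prints: 2015-07-10,sometext2,2015,sometext3
```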
To do this with sed without Perl-style non-greedy operators, you need to mark the first instance with something you know won't be in the line, as Tris describes. However, that solution requires knowledge of what won't be in the file. Fortunately, you can guarantee that a newline won't be in the line because that's what terminated the line. Thus you can do something like:
sed 's/2015/\n&/;s/.*\n//' input.txt > output.txt
NOTE: this won't modify the header row which you would have to treat specially.
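As an alternative sketch that avoids sed altogether: the shell's ${var#pattern} expansion removes the shortest matching prefix, which behaves like a lazy match. The function name below is made up, and it reads lines from standard input; lines without 2015 (such as the header row) are passed through unchanged.

```shell
# strip_before_first_2015 (hypothetical helper): filters stdin line by line.
strip_before_first_2015() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      # ${line#*2015} drops the shortest prefix ending in 2015; re-add 2015.
      *2015*) printf '2015%s\n' "${line#*2015}" ;;
      *)      printf '%s\n' "$line" ;;
    esac
  done
}

printf '%s\n' 'sometext1,#015,2015-07-10,sometext2,2015,sometext3' | strip_before_first_2015
# prints: 2015-07-10,sometext2,2015,sometext3
```

A shell read loop is slower than sed on large files, so this is mainly useful when the lines are already being processed in a script.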

How to find/extract a pattern from a file?

Here are the contents of my text file named 'temp.txt'
---start of file ---
HEROKU_POSTGRESQL_AQUA_URL (DATABASE_URL) ----backup---> b687
Capturing... done
Storing... done
---end of file ----
I want to write a bash script in which I need to capture the string 'b687' in a variable. It is really a pattern (the letter 'b' followed by n digits). I can do it the hard way by looping through the file and extracting the desired string (b687 in the example above). Is there an easy way to do so? Perhaps by using awk or sed?
Try using grep:
v=$(grep -oE '\bb[0-9]{3}\b' file)
This will search for a word consisting of b followed by exactly 3 digits.
Using sed:
v=$(sed -nr 's/.*\b(b[0-9]{3})\b.*/\1/p' file)
varname=$(awk '/HEROKU_POSTGRESQL_AQUA_URL/{print $4}' filename)
This reads the file and, when a line matches the pattern HEROKU_POSTGRESQL_AQUA_URL, prints the 4th whitespace-separated token, in this case b687.
Your other option is to use sed:
varname=$(sed -n 's/.* \(b[0-9][0-9]*\)/\1/p' filename)
In this case we are looking for the pattern you mentioned (b####...) and print only that. The -n tells sed not to print lines by default. The rest of the sed command is a substitution: .* matches any string at the beginning of the line and is followed by a space and a \(...\) group, which holds the regex that matches your b#####. The replacement part \1 says to keep only group 1 out of everything that matched, and the p at the end tells sed to print the result (since -n turned off the default printing).
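Since the question describes the pattern as 'b' followed by n digits, here is a hedged sketch that does not fix the digit count. The helper name extract_backup_id is made up, the sample file is recreated in a temporary file, and the -o and -w options are assumed to be available (they are in GNU and BSD grep).

```shell
# extract_backup_id (hypothetical helper): print the first whole word made of
# "b" followed by one or more digits found in the file given as $1.
extract_backup_id() {
  grep -owE 'b[0-9]+' "$1" | head -n 1
}

# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
printf '%s\n' \
  'HEROKU_POSTGRESQL_AQUA_URL (DATABASE_URL) ----backup---> b687' \
  'Capturing... done' \
  'Storing... done' > "$tmp"

v=$(extract_backup_id "$tmp")
echo "$v"
# prints: b687
rm -f "$tmp"
```

The -w option keeps the b inside words such as backup from matching, and head -n 1 guards against more than one id appearing in the file.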