I am searching for a regex that matches exactly 7 occurrences of .*: (7 colon-separated fields).
Unfortunately, what I came up with:
grep -E '(.*:){7}' ...
also prints the same lines when I decrease the number in {}.
How can I test for exactly 7 occurrences?
Your problem statement "match 7 colon separated fields" is a good fit for awk:
awk -F: 'NF == 7' file ...
Just in case you still want to use grep to solve the problem:
grep -E '^([^:]*:){7}[^:]*$' file
Details:
^ - start of line
([^:]*:){7} - exactly 7 occurrences of
[^:]* - zero or more chars other than :
: - a colon
[^:]* - zero or more chars other than :
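$ - end of line.
Note that this pattern requires exactly 7 colons, i.e. 8 fields (the last one may be empty), whereas NF == 7 in awk means 7 fields (6 colons). Use {6} instead of {7} if it is the fields, not the colons, that you are counting.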
See an online demo of the regex (do not rely on this service to check grep patterns validity!)
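To see how counting fields differs from counting colons, here is a small sketch on made-up sample lines (the data and variable names are illustrative, not from the question). It assumes a POSIX shell with awk and grep -E, and it uses {6} in the grep pattern so that it selects the same line as NF == 7.

```shell
# Made-up sample: lines with 6, 7 and 8 colon-separated fields.
sample=$(printf '%s\n' 'a:b:c:d:e:f' 'a:b:c:d:e:f:g' 'a:b:c:d:e:f:g:h')

# awk counts fields directly.
by_awk=$(printf '%s\n' "$sample" | awk -F: 'NF == 7')

# grep counts colons: 6 "field + colon" groups, then one last colon-free field.
by_grep=$(printf '%s\n' "$sample" | grep -E '^([^:]*:){6}[^:]*$')

# Without anchors and with .* the pattern only means "at least 6 colons",
# so the 8-field line is selected as well.
loose_count=$(printf '%s\n' "$sample" | grep -cE '(.*:){6}')

printf 'awk:   %s\n' "$by_awk"
printf 'grep:  %s\n' "$by_grep"
printf 'loose: %s lines\n' "$loose_count"
```

The unanchored count illustrates why lowering the number in the original attempt kept printing the same lines: .* can span colons, so the quantifier only sets a lower bound.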
I have a file containing lines like:
353451word2423157
anotherword
7412yetanother1
3262andherese123anotherline4359013
5342512354325324523andherese123anotherline45913
532453andherese123anotherline413
I'd like to strip most of the leading and trailing digits (0-9), while still leaving 2 leading and 2 trailing digits in place, if any...
To clarify, for the list above, the expected output would be:
51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41
Preferred tools would be sed or awk, but any other suggestions are welcome...
I've tried something like sed 's/[0-9]\+$//' | sed 's/^[0-9]\+//', but obviously this strips all leading and trailing numbers...
You may try this sed:
sed -E 's/^[0-9]+([0-9]{2})|([0-9]{2})[0-9]+$/\1\2/g' file
51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41
Command Details:
^[0-9]+([0-9]{2}): Match 1+ digits at start if that is followed by 2 digits (captured in a group) and replace with 2 digits in group #1.
([0-9]{2})[0-9]+$: Match 1+ digits at the end if that is preceded by 2 digits (captured in a group) and replace with 2 digits in group #2.
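As a variation, the two alternatives can also be written as two separate substitutions, which avoids relying on an unmatched group expanding to an empty string. This is only a sketch, assuming a sed that supports -E (GNU and BSD sed both do); the variable names are illustrative.

```shell
# Input lines taken from the question.
input=$(printf '%s\n' \
    '353451word2423157' \
    'anotherword' \
    '7412yetanother1' \
    '3262andherese123anotherline4359013' \
    '5342512354325324523andherese123anotherline45913' \
    '532453andherese123anotherline413')

# First expression: keep only the last 2 of the leading digits.
# Second expression: keep only the first 2 of the trailing digits.
result=$(printf '%s\n' "$input" |
    sed -E -e 's/^[0-9]+([0-9]{2})/\1/' -e 's/([0-9]{2})[0-9]+$/\1/')

printf '%s\n' "$result"
```

Each expression has its own group 1, so no /g flag is needed: every line gets at most one substitution per expression.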
Here is an awk that trims digits to a max of 2 on each side of a string:
awk '{ match($0, /^[0-9]*/); lh=RLENGTH
s=substr($0, lh>2 ? lh-1 : 1)
match(s, /[0-9]*$/); rh=RLENGTH
print substr(s, 1, rh>2 ? length(s)-rh+2 : length(s))
}' file
Prints:
51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41
I suggest using perl:
perl -pe 's/^\d+(?=\d{2})|(\d{2})\d+$/$1/g' file
See the online demo and the regex demo.
Regex details:
^ - start of string
\d+ - one or more digits
(?=\d{2}) - on the right, there must be two digits (not added to the match as the lookahead is a non-consuming pattern)
| - or
(\d{2}) - two digits captured into Group 1 ($1)
\d+ - one or more digits
$ - end of string.
Using GNU awk gensub() function with parentheses in the regexp to mark the components and then specifying them in the replacement (here "\\2\\3")
awk '{print gensub(/^([[:digit:]]*)([[:digit:]]{2})|([[:digit:]]{2})([[:digit:]]*)$/,"\\2\\3","g",$0)}' file
51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41
I would use GNU AWK in the following way. Let the content of file.txt be
353451word2423157
anotherword
7412yetanother1
3262andherese123anotherline4359013
5342512354325324523andherese123anotherline45913
532453andherese123anotherline413
then
awk 'BEGIN{FPAT="[0-9]+|[^0-9]+";OFS=""}$1~/[0-9]+/{$1=substr($1,length($1)-1)}$NF~/[0-9]+/{$NF=substr($NF,1,2)}{print}' file.txt
output
51word24
anotherword
12yetanother1
62andherese123anotherline43
23andherese123anotherline45
53andherese123anotherline41
Explanation: I instruct GNU AWK to split each line into fields that consist solely of digits or solely of non-digits, using FPAT. If the 1st field ($1) consists of digits, I slice it to get its last 2 characters. If the last field ($NF) consists solely of digits, I slice it to get its first 2 characters. Finally, the whole line is printed using an empty string as the output field separator (OFS).
(tested in gawk 4.2.1)
I am simply trying to print the 5- or 6-digit number present in each line.
cat file.txt
Random_something xyz ...64763
Random2 Some String abc-778986
Something something 676347
Random string without numbers
cat file.txt | sed 's/^.*\([0-9]\{5,6\}\+\).*$/\1/'
Current Output
64763
78986
76347
Random string without numbers
Expected Output
64763
778986
676347
The regex doesn't seem to work as intended with 6 digit numbers. It skips the first number of the 6 digit number for some reason and it prints the last line which I don't need as it doesn't contain any 5 or 6 digit number whatsoever
grep is a better fit for this, with the -o option that prints only the matched string:
grep -Eo '[0-9]{5,6}' file
64763
778986
676347
-E is for enabling extended regex mode.
If you really want a sed, this should work:
sed -En 's/(^|.*[^0-9])([0-9]{5,6}).*/\2/p' file
64763
778986
676347
Details:
-n: Suppress normal output
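(^|.*[^0-9]): Match either the start of the line or any text that ends with a non-digit, so the digits that follow cannot be the tail of a longer number
([0-9]{5,6}): Match 5 or 6 digits in capture group #2
.*: Match the remaining text
\2: The replacement, which puts back only the digits captured in group #2
/p: Print the line only if a substitution was made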
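The difference between the two sed commands can be reproduced on a single line from the question. This is a small sketch with illustrative variable names; the first command is the question's pattern without the redundant \+, the second is the one from this answer.

```shell
line='Random2 Some String abc-778986'

# Greedy .* grabs as much as it can and leaves only the minimum
# of 5 digits for the group, so the leading 7 is lost.
greedy=$(printf '%s\n' "$line" | sed 's/^.*\([0-9]\{5,6\}\).*$/\1/')

# Forcing a non-digit (or the line start) right before the group
# makes the group take the whole run of digits.
anchored=$(printf '%s\n' "$line" | sed -En 's/(^|.*[^0-9])([0-9]{5,6}).*/\2/p')

# With -n and /p, a line without such a number prints nothing.
none=$(printf '%s\n' 'Random string without numbers' |
    sed -En 's/(^|.*[^0-9])([0-9]{5,6}).*/\2/p')

printf 'greedy:   %s\n' "$greedy"
printf 'anchored: %s\n' "$anchored"
printf 'none:     [%s]\n' "$none"
```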
With awk, you could try the following. In short: use awk's match function with a regex that matches 5 to 6 digits; if a match is found in a line, print the matched part.
awk 'match($0,/[0-9]{5,6}/){print substr($0,RSTART,RLENGTH)}' Input_file
I have a file whose content is similar to the following:
ptrn: 435324kjlkj34523453
Note1: rtewqtiojdfgkasdktewitogaidfks
Note2: t4rwe3tewrkterqwotkjrekqtrtlltre
I am trying to get the characters after the space on the line that starts with "ptrn:". I am trying the command below:
>>> cat daily.txt | grep '^p.*$' > dailynew.txt
and I am getting the result in the new file:
ptrn: 435324kjlkj34523453
But I want only the characters after space, which are " 435324kjlkj34523453" to be written in the new file without "ptrn:" at the beginning.
The result should be like:
435324kjlkj34523453
How can I achieve this with an efficient regex?
You can use
grep -oP '^ptrn:\s*\K.*' daily.txt > dailynew.txt
awk '/^ptrn:/{print $2}' daily.txt > dailynew.txt
sed -n 's/^ptrn:[[:space:]]*\(.*\)/\1/p' daily.txt > dailynew.txt
See the online demo. All output 435324kjlkj34523453.
In the grep PCRE regex (enabled with -P option) the patterns match
^ - the start of string
ptrn: - a ptrn: substring
\s* - zero or more whitespaces
\K - match reset operator that clears the current match value
.* - the rest of the line.
In the awk command, ^ptrn: regex is used to find the line starting with ptrn: and then {print $2} prints the value after the first whitespace, from the second "column" (since the default field separator in awk is whitespace).
In sed, the command means
-n - suppresses the default line output
s - substitution command is used
^ptrn:[[:space:]]*\(.*\) - start of string, ptrn:, zero or more whitespace, and the rest of the line captured into Group 1
\1 - replaces the match with group 1 value
p - prints the result of the substitution.
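One practical difference between the awk and sed commands shows up when the value itself contains whitespace: {print $2} stops at the next space, while the sed capture keeps the rest of the line. The sketch below uses a made-up input line to show this, plus a hypothetical awk variant (not from the answer) that strips the prefix with sub() instead.

```shell
# Made-up input: the value after "ptrn:" contains a space.
input=$(printf '%s\n' 'ptrn: first second' 'Note1: ignored')

# awk prints only the second whitespace-separated field.
with_awk=$(printf '%s\n' "$input" | awk '/^ptrn:/{print $2}')

# sed keeps everything after the prefix and the whitespace.
with_sed=$(printf '%s\n' "$input" | sed -n 's/^ptrn:[[:space:]]*\(.*\)/\1/p')

# Illustrative awk variant: delete the prefix, then print what is left.
with_awk_sub=$(printf '%s\n' "$input" | awk '/^ptrn:/{sub(/^ptrn:[ \t]*/, ""); print}')

printf 'awk $2:  %s\n' "$with_awk"
printf 'sed:     %s\n' "$with_sed"
printf 'awk sub: %s\n' "$with_awk_sub"
```

For the file in the question the value has no spaces, so all variants give the same result.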
You can use this sed:
sed -nE 's/^ptrn: (.*)/\1/p' file > output_file.txt
I am looking for a way to delete all lines that do not follow a specific pattern (from a txt file).
Pattern which I need to keep the lines for:
x//x/x/x/5/x/
x could be any amount of characters, numbers or special characters.
5 is always a combination of alphanumeric - 5 characters - e.g Xf1Lh, always appears after the 5th forward slash.
/ are actual forward slashes.
Input:
abc//a/123/gds:/4AdFg/f3dsg34/
y35sdf//x/gd:df/j5je:/x/x/x
yh//x/x/x/5Fsaf/x/
45wuhrt//x/x/dsfhsdfs54uhb/
5ehys//srt/fd/ab/cde/fg/x/x
Desired output:
abc//a/123/gds:/4AdFg/f3dsg34/
yh//x/x/x/5Fsaf/x/
grep selects lines according to a regular expression and your x//x/x/x/5/x/ just needs minor changes to make it into a regular expression:
$ grep -E '.*//.*/.*/.*/[[:alnum:]]{5}/.*/' file
abc//a/123/gds:/4AdFg/f3dsg34/
yh//x/x/x/5Fsaf/x/
Explanation:
"x could be any amount of characters, numbers or special characters". In a regular expression that is .* where . means any character and * means zero or more of the preceding character (which in this case is .).
"5 is always a combination of alphanumeric - 5 characters". In POSIX regular expressions, [[:alnum:]] means any alphanumeric character. {5} means five of the preceding. [[:alnum:]] is unicode-safe.
Possible improvements
One issue is how x should be interpreted. In the above, x was allowed to be any character. As triplee points out, however, another reasonable interpretation is that x should be any character except /. In that case:
grep -E '[^/]*//[^/]*/[^/]*/[^/]*/[[:alnum:]]{5}/[^/]*/' file
Also, we might want this regex to match only complete lines. In that case, we can either surround the regex with ^ and $ or we can use grep's -x option:
grep -xE '[^/]*//[^/]*/[^/]*/[^/]*/[[:alnum:]]{5}/[^/]*/' file
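The effect of -x can be checked with a line that contains the pattern but continues after it. The sketch below uses one line from the question and a made-up variant with extra trailing text; the variable names are illustrative.

```shell
# One line from the question, plus a made-up line with text after the pattern.
lines=$(printf '%s\n' 'yh//x/x/x/5Fsaf/x/' 'yh//x/x/x/5Fsaf/x/trailing')

re='[^/]*//[^/]*/[^/]*/[^/]*/[[:alnum:]]{5}/[^/]*/'

# Without -x a match anywhere in the line is enough.
unanchored=$(printf '%s\n' "$lines" | grep -cE "$re")

# With -x the pattern has to cover the entire line.
whole_line=$(printf '%s\n' "$lines" | grep -cxE "$re")

printf 'unanchored matches: %s\n' "$unanchored"
printf 'whole-line matches: %s\n' "$whole_line"
```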
I was figuring out how to do it in awk at the same time as the other answer and came up with:
awk -F/ '$2=="" && $6~/^[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]$/ && NF==8' file
The awk I worked it out on didn't support the {5} regexp frob.
You can use the -P option for Perl-compatible regular expressions, like
grep -P '^(?:[^/]*/){5}[A-Za-z0-9]{5}(?:/|$)' input
Output
abc//a/123/gds:/4AdFg/f3dsg34/
yh//x/x/x/5Fsaf/x/
Regex Breakdown
^ #Start of line
(?: #Non capturing group
[^/]* #Match anything except /
/ #Match / literally
){5} #Repeat this 5 times
[A-Za-z0-9]{5} #Match alphanumerics. You can use \w if you want to allow _ along with [A-Za-z0-9]
(?: #Non capturing group
/ #Next character should be /
| #OR
$ #End of line
)
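Since -P is a GNU grep extension, a portable variant may be useful. The pattern above uses no PCRE-only feature apart from the non-capturing groups, so it can be sketched with grep -E by using ordinary groups instead. The input lines are the ones from the question; the variable names are illustrative.

```shell
# Input lines taken from the question.
input=$(printf '%s\n' \
    'abc//a/123/gds:/4AdFg/f3dsg34/' \
    'y35sdf//x/gd:df/j5je:/x/x/x' \
    'yh//x/x/x/5Fsaf/x/' \
    '45wuhrt//x/x/dsfhsdfs54uhb/' \
    '5ehys//srt/fd/ab/cde/fg/x/x')

# Five "anything-but-slash, then slash" groups, five alphanumerics,
# then either a slash or the end of the line.
kept=$(printf '%s\n' "$input" | grep -E '^([^/]*/){5}[A-Za-z0-9]{5}(/|$)')

printf '%s\n' "$kept"
```

Like the -P version, this only checks what sits after the fifth slash; it does not verify the rest of the x//x/x/x/5/x/ layout.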
Using sed and in place edit to delete all lines that do not follow a specific pattern (from a txt file):
$ sed -i.bak -n "/.*\/\/.*\/.*\/.*\/[a-zA-Z0-9]\{5\}\/.*\//p" test.in
$ cat test.in
abc//a/123/gds:/4AdFg/f3dsg34/
yh//x/x/x/5Fsaf/x/
-i.bak in place edit creating a test.in.bak backup file, -n quiet, do not print non-matches to output
and ".../p" print matches.
This code checks whether a character is an integer or not (I think). I'm trying to understand what each part of that line means by checking the grep man pages, but it's really difficult for me. I found it on the internet. Could anyone explain the grep part, i.e. what each element in it means?
echo $character | grep -Eq '^(\+|-)?[0-9]+$'
Thanks people!!!
Analyse this regex:
'^(\+|-)?[0-9]+$'
^ - Line Start
(\+|-)? - Optional + or - sign at start
[0-9]+ - One or more digits
$ - Line End
Overall it matches strings like +123 or -98765 or just 9
Here -E is for extended regex support and -q is for quiet in grep command.
PS: btw you don't need grep for this check and can do this directly in pure bash:
re='^(\+|-)?[0-9]+$'
[[ "$character" =~ $re ]] && echo "it's an integer"
I like this cheat sheet for regex:
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
It is very useful; with it you could easily analyze the
'^(\+|-)?[0-9]+$'
as
^: Line must begin with...
(): grouping
\: ESC character (because + means something ... see below)
+|-: plus OR minus signs
?: 0 or 1 repetition
[0-9]: range of numbers from 0-9
+: one or more repetitions
$: end of line (no more characters allowed)
so it accepts values like: -312353243 or +1243 or 5678
but does not accept: 3 456 or 6.789 or 56$ (with a dollar sign).
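The accepted and rejected values listed above can be checked with a small wrapper around the same grep call. This is a sketch for a POSIX shell; is_integer is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: succeeds if its argument is an optionally signed integer.
is_integer() {
    printf '%s\n' "$1" | grep -Eq '^(\+|-)?[0-9]+$'
}

# Try the values from the lists above.
for value in -312353243 +1243 5678 '3 456' 6.789 '56$'; do
    if is_integer "$value"; then
        echo "accept: $value"
    else
        echo "reject: $value"
    fi
done
```

printf is used instead of an unquoted echo so that values with spaces or a leading dash reach grep unchanged.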