How can I use regex to exclude lines with extra characters?

I have a bunch of email addresses:
abc#google.com
bdc#yahoo.com
\\ske#google.com
I'd like to delete the last line (shown in bold in the original post) because it contains characters other than #, . and letters. How do I do this?

Through awk,
$ awk '/^\w+#\w+/{print}' file
abc#google.com
bdc#yahoo.com
Awk searches for lines that start with one or more word characters, followed by a # symbol, followed by one or more word characters. If it finds one, it prints the whole line. Note that \w is a GNU awk extension.
The line \\ske#google.com does not start with a word character, so it is not printed.
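The awk pattern above only checks the start of each line, so stray characters later in the line would still pass. A minimal sketch of a stricter filter, assuming a valid line is letters, one #, then letters and dots (keep_clean is a hypothetical helper name, not part of any answer here):

```shell
# keep_clean: hypothetical helper that keeps only lines consisting of
# letters, a single '#', then letters and dots (anchored at both ends).
keep_clean() {
  grep -E '^[[:alpha:]]+#[[:alpha:].]+$'
}

# The third sample line starts with backslashes, so it is dropped.
printf '%s\n' 'abc#google.com' 'bdc#yahoo.com' '\\ske#google.com' | keep_clean
```

Anchoring with ^ and $ makes the whole line subject to the character whitelist, at the cost of also rejecting addresses that contain digits or hyphens.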

You can use this sed:
sed -i.bak -n '/^[[:alnum:]]*#/p' file

You can use vim to take care of it too:
vim -c 'v/^[[:alnum:]]*#/d' -c 'wq' file

You could also use a perl module:
perl -ne 'use Email::Valid; print if Email::Valid->address($_)'

Related

Adding blank line spaces before and after pattern 'string' match

I am trying to add 5 blank line spaces in a text file (text.txt) before and after string pattern matches. I used the following to get spaces after the 'string' match which worked for me-
sed '/string/{G;G;G;G;G;}' text.txt
I want to apply the same sed command to get 5 blank lines before 'string' as well. I don't want spaces, but blank lines before and after the match. Any suggestions?

sed -r 's/(^.*)(string)(.*$)/\1\n\n\n\n\n\2\n\n\n\n\n\3/' text.txt
Use -r or -E to enable extended regular expressions. Split the line into three sections, then replace the line with the first section, 5 newlines, the second section, 5 newlines, and finally the third section.

Use this Perl one-liner:
perl -pe 's/string/\n\n\n\n\n$&\n\n\n\n\n/' text.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s/PATTERN/REPLACEMENT/ : change PATTERN to REPLACEMENT.
$& : matched pattern.
\n : newline character.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

For a single string match:
$ sed -e '/string/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
For multiple strings, assuming same requirements:
$ sed -E '/(string1|string2|string3)/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt

This might work for you:
sed '/string/{G;s/\(string\)\(.*\)\(.\)/\3\3\3\3\3\1\3\3\3\3\3\2/}' file
Match on string, append an empty line (G adds a newline to the pattern space), then use that newline, captured as \3, to pad the match with 5 newlines on either side.

And an awk version:
awk '{if(/string1|string2|.../){printf "\n\n\n\n\n%s\n\n\n\n\n",$0}else{print}}' file
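Several of the sed answers rely on \n in the replacement text, which not every sed implementation interprets as a newline. A minimal awk-based sketch that avoids this, assuming the pattern is a regular expression and the padding count defaults to 5 (pad_matches is a hypothetical helper name):

```shell
# pad_matches PATTERN [N]: hypothetical helper that prints N blank lines
# (default 5) before and after every input line matching PATTERN.
pad_matches() {
  awk -v pat="$1" -v n="${2:-5}" '
    function blanks(k,  i) { for (i = 0; i < k; i++) print "" }
    $0 ~ pat { blanks(n); print; blanks(n); next }
    { print }
  '
}

# Demo with 2 blank lines of padding instead of 5.
printf '%s\n' 'before' 'a string here' 'after' | pad_matches string 2
```

Unlike the sed substitutions that split the line around the match, this pads whole lines, which is what the awk answer above does as well.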

how to trim trailing spaces after all delimiter in a text file

Need help to remove trailing spaces after all delimiter in a text file
I have Text file with below data.
eg.
ADDRESS_ID| COUNTRY_TP_CD| RESIDENCE_TP_CD| PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0| 76.0|||169 Park lane||Scottish||lane||KU|||||||2013-09-19 14:48:49.609000|
I want to remove the spaces between each delimiter and the first letter of the following word.
Is there a regex or Unix script that can do this? I am looking for output like the following:
ADDRESS_ID|COUNTRY_TP_CD|RESIDENCE_TP_CD|PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0|76.0|||169 Park lane||Scottish||lane||KU||||||2013-09-19 14:48:49.609000|
Any help will be appreciated.

awk 'BEGIN{FS=OFS="|"} {for (i=1;i<=NF;i++) gsub(/^[[:space:]]+|[[:space:]]+$/,"",$i)} 1' file

Using a perl one-liner to remove the spacing around every field. Assumes no embedded delimiters:
perl -i -lpe 's/\s*([^|]*?)\s*/$1/g' file.txt
Switches:
-i: Edit <> files in place (makes backup if extension supplied)
-l: Enable line ending processing
-p: Creates a while(<>){...; print} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.

The Perl code below removes spaces at the start of a line and spaces immediately after the delimiter |:
$ perl -pe 's/(?<=\|) +|^ +//g' file
ADDRESS_ID|COUNTRY_TP_CD|RESIDENCE_TP_CD|PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0|76.0|||169 Park lane||Scottish||lane||KU|||||||2013-09-19 14:48:49.609000|
To save the changes made to that file,
perl -i -pe 's/(?<=\|) +|^ +//g' file

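sed 's/\ //g' input.txt > output.txt
Note that this removes every space on the line, including spaces inside fields such as 169 Park lane.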

With sed:
sed -r -e 's/(^|\|)\s+/\1/g' -e 's/\s+$//' filename
In the first expression:
(^|\|) matches the beginning of the line or a | character, and saves this in capture group 1.
\s+ matches a sequence of whitespace characters after that.
The replacement \1 substitutes capture group 1, so this deletes the whitespace at the beginning of the line and after the delimiter.
The g modifier makes it operate on all the matches in the line.
In the second expression:
\s+ again matches a sequence of whitespace
$ matches the end of the line
The replacement is an empty string, thus removing trailing whitespace.

For POSIX sed (with GNU sed, add --posix):
sed 's/^[[:space:]]*//;s/|[[:space:]]*/|/g' YourFile
This uses 2 substitutions, since POSIX basic regular expressions in sed have no OR (|) operator.
Remove leading whitespace by replacing it (^[[:space:]]*) with nothing.
Replace each pipe followed by any whitespace (|[[:space:]]*) with a pipe.
[[:space:]] can be replaced by a single space character if the text only contains plain spaces (ASCII 32).
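The two substitutions described above can be sketched as a small function, with a third expression added to strip trailing whitespace. This is an illustration under the assumption that | never appears inside a field; trim_after_delims is a hypothetical name:

```shell
# trim_after_delims: hypothetical helper using only POSIX sed features.
#   1st expression: drop whitespace at the start of the line
#   2nd expression: drop whitespace that follows each '|'
#   3rd expression: drop whitespace at the end of the line
trim_after_delims() {
  sed -e 's/^[[:space:]]*//' -e 's/|[[:space:]]*/|/g' -e 's/[[:space:]]*$//'
}

printf '%s\n' 'ADDRESS_ID| COUNTRY_TP_CD| RESIDENCE_TP_CD|169 Park lane|' | trim_after_delims
```

Spaces inside a field, such as those in 169 Park lane, are left alone because only whitespace directly after a delimiter or at a line boundary is matched.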

Find a pattern and replace the whole line & find a pattern and insert after

Question 1:
Pattern:
test_$(whoami)
Variable:
var1=$(pwd)
I want to find the pattern and replace the whole line with var1
sed -i "s/.*test_$(whoami).*/$var1/" test.txt
It gives me sed: -e expression #1, char 28: unknown option to `s'
Question 2.
Pattern:
#####Insert here#####
Content to be insert: include $var1/file_$(whoami).txt
I want to find the line with the pattern(Fully match), and insert the content one line after
sed -i "s/#####Insert here#####/include $var1/file_$(whoami).txt" test.txt
Doesn't work either
Can someone help?

Re Question 1. Use a different regex delimiter:
sed -i.bak "s~^.*test_$(whoami).*$~$var1~" test.txt
since $var1 can contain /

Question 1.
It seems $var1 contains a character interpreted as a sed delimiter, namely: '/'.
In a substitute command, after the third delimiter, sed expects an occurrence number, and you may be providing text.
Example, if:
var1="~/myDirectory"
Then this produces a sed command with too many delimiters:
sed -i "s/.*test_$(whoami).*/~/myDirectory/"
You should use a different delimiter character such as ~, #, !, ?, &, | ... one which is not present in your regexp.
Sed will automatically recognize the delimiter character after the substitute command and enable you to use the '/' character in your regexp:
sed "s#~/toto#~/tata#"
If you have difficulties finding a character that is not present in your regexp, you can use a non-printable character which is unlikely to exist in your pattern. For example, if your shell is bash:
$ echo '/~#' | sed s$'\001''/~#'$'\001''!?\&'$'\001''g'
In this example, bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).
Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.
Question 2.
You may be looking for sed's append function ('a'):
sed -i "/#####Insert here#####/ a include $var1/file_$(whoami).txt" test.txt
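Changing the delimiter only helps while the chosen character never shows up in the value. Another option is to escape the value before interpolating it. A minimal sketch, assuming the value contains no newlines (escape_repl and the sample value of var1 are hypothetical):

```shell
# escape_repl TEXT: hypothetical helper that backslash-escapes '\', '/'
# and '&' so TEXT can be used as the replacement part of s/.../.../.
# The inner sed uses '|' as its own delimiter so '/' needs no escaping there.
escape_repl() {
  printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

var1='/home/user/a&b'   # hypothetical value containing both '/' and '&'
printf '%s\n' 'x test_me y' 'other' |
  sed "s/.*test_me.*/$(escape_repl "$var1")/"
```

Escaping & matters as well as /, because an unescaped & in a sed replacement stands for the whole matched text.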

Bash to find lines with exact one word?

I'm trying to write a bash script that takes a file name, and return lines that have one word. Here is sample text:
This has more than one word
There
is exactly one word in above line.
White-space
in the start of the above line doesn't matter.
Need-some-help.
Output:
There
White-space
Need-some-help.
I'm looking into using a combination of sed and regex.
Note: I cannot use anything else (it has to be a bash script, without custom modules), so suggesting other tools wouldn't help.

If words can contain any non-whitespace characters, then:
grep -E '^\s*\S+\s*$'
or
sed -E '/^\s*\S+\s*$/!d'
or
sed -n -E '/^\s*\S+\s*$/p'

If you have awk available:
awk 'NF==1'
With sed, delete any line containing a "non-space, space(s), non-space" sequence (-E is needed so that + means "one or more"):
sed -E '/[^ ] +[^ ]/d'

Well, you could just delete lines that contain a non-space character, a space, and another non-space character using sed.
#!/bin/bash
echo "This has more than one word
There
is exactly one word in above line.
White-space
in the start of the above line doesn't matter.
Need-some-help." | sed '/\S \S/d' -

^\s*\b[a-zA-Z.-]+\s*$
For the regex part and assuming you are searching the file line by line this regex will only match if there is exactly one word in the line.

Assuming you can use grep (one of the most common tools used in shell scripts):
#!/bin/bash
grep '^ *[^ ]\+ *$' "$@"
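The \s and \S shorthands and the \+ operator used in the answers above are GNU extensions. A minimal sketch of the same idea using only POSIX character classes with grep -E, assuming a "word" is any run of non-whitespace characters (one_word_lines is a hypothetical name):

```shell
# one_word_lines [FILE...]: hypothetical helper that prints lines holding
# exactly one run of non-whitespace characters, with optional whitespace
# on either side. Reads standard input when no file is given.
one_word_lines() {
  grep -E '^[[:space:]]*[^[:space:]]+[[:space:]]*$' "$@"
}

printf '%s\n' 'This has more than one word' 'There' '   White-space' 'Need-some-help.' |
  one_word_lines
```

Empty lines are not printed, since the pattern requires at least one non-whitespace character.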

perl -pe regex problem

I use perl to check some text input for a regex pattern, but one pattern doesn't work with perl -pe.
The following pattern doesn't work with the command call:
s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!
I use the Linux shell. I use the following call to test my regex:
cat test | perl -pe 's![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!'
File test:
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
Result:
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
Cache
How can I remove the first result?
Thanks for any advice.

That last slash after "Comp-(.*)" may be what's doing it. Your file content in the "Database" doesn't have a slash. Try replacing Comp-(.*)/.* with Comp-(.*)[/.].* so you can match either the subdirectory or the file extension.

$ cat input
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
$ perl -ne 'print if s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)/.*!$1!' input
Cache

The problem is the last slash character in the regex. It is a plain slash, which is missing from the first input line (Comp-Database.cpp), so the substitution does not match there and perl -p prints that line unchanged. Try this:
s![a-zA-Z]+ +(?:.*?)/(?:.*)Comp-(.*)[./].*!$1!
Edit: Updated to match the new input data and added another option.
Alternatively, the substitution could be replaced by a plain match, something like:
perl -ne 'print "$1\n" if /Comp-(.*?)[.\/]/'
Then there is no need to parse full line with whatever it contains.
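For comparison, a similar extraction can be sketched with sed instead of perl: -n suppresses the default output and the p flag prints only lines where the substitution succeeded, so unmatched lines disappear. This is an analogue under the assumption that the component name contains neither . nor /; comp_names is a hypothetical name:

```shell
# comp_names [FILE...]: hypothetical helper that prints the text between
# "Comp-" and the next '.' or '/', and nothing for lines without "Comp-".
comp_names() {
  sed -n 's!.*Comp-\([^./]*\)[./].*!\1!p' "$@"
}

printf '%s\n' \
  'A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp' \
  'A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h' |
  comp_names
```

Because the leading .* is greedy, this picks the last Comp- on a line, whereas the perl match above picks the first.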

\s matches whitespace (spaces, tabs, and line breaks) and + means one or more occurrences of the preceding item. So \s+ matches one or more whitespace characters.
cat test
A MaintanceGie?\195?\159mannFlock/System/Comp-Database.cpp
A MaintanceGie?\195?\159mannFlock/System/Comp-Cache/abc.h
perl -ne 'print "$1\n" if /\w+?\d+?\d+\w+\/\w+\/Comp-(\w+)[\/]/' test
