Including break line in sed replacement - regex

I've got the following sed replacement, which replaces an entire line with different text, if a certain string is found in the line.
sed "s/.*FOUNDSTRING.*/replacement \"text\" for line"/ test.txt
This works fine - but, for example I want to add a new line after 'for'. My initial thought was to try this:
sed "s/.*FOUNDSTRING.*/replacement \"text\" for \n line"/ test.txt
But this ends up replacing the line with the following:
replacement "text" for n line
Desired outcome:
replacement "text" for
line

It can be painful to work with newlines in sed. There are also some differences in the behaviour depending on which version you're using. For simplicity and portability, I'd recommend using awk:
awk '{print /FOUNDSTRING/ ? "replacement \"text\" for\nline" : $0}' file
This prints the replacement string, newline included, if the pattern matches, otherwise prints the line $0.
Testing it out:
$ cat file
blah
blah FOUNDSTRING blah
blah
$ awk '{print /FOUNDSTRING/ ? "replacement \"text\" for\nline" : $0}' file
blah
replacement "text" for
line
blah
Of course there is more than one way to skin the cat, as shown in the comments:
awk '/FOUNDSTRING/ { $0 = "replacement \"text\" for\nline" } 1' file
This replaces the line $0 with the new string when the pattern matches and uses the common shorthand 1 to print each line.
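If you would rather stay with sed, one option that should behave the same in GNU and BSD sed is to put a backslash followed by a literal newline in the replacement, instead of relying on \n. This is only a sketch; the temporary file and its contents are made up to mirror the test input above.

```shell
# Build a small sample file (hypothetical data mirroring the example above).
tmp=$(mktemp)
printf 'blah\nblah FOUNDSTRING blah\nblah\n' > "$tmp"

# A backslash followed by a real newline inserts a newline in the replacement.
result=$(sed 's/.*FOUNDSTRING.*/replacement "text" for\
line/' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Single quotes are used here so the shell does not interfere with the backslash or the double quotes in the replacement text.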

Related

Print the line matching 'pattern' string, excluding the 'pattern'

I have the following lines in a text file 'file.txt'
String1 ABCDEFGHIJKL
String2 DCEGIJKLQMAB
I want to print the characters corresponding to 'String1' in another text file 'text.txt' like this
ABCDEFGHIJKL
Here, I don't want to use any line numbers. Any suggestions using the 'sed' command? I tried matching between 'String1' and 'String2', but couldn't come up with a command that excludes 'String1'. The following code excludes only 'String2'.
sed -n '/^string1/,/^string2/{p;/^string2/q}' file.txt | sed '$d' > text.txt
awk '$1=="String1" { print $2 }' file.txt > text.txt
Where the first space delimited field equals "String1", print the second field. Redirect the output to text.txt.
Use GNU grep:
grep -Po 'String1\s+\K.*' in_file
Here, grep uses the following options and regex feature:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
\K : Cause the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. Specifically, ignore the preceding part of the regex when printing the match.
SEE ALSO:
grep manual
perlre - Perl regular expressions
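Since the question asked for sed specifically, here is a minimal sketch of the same idea: strip the leading 'String1' key and print only the lines where that substitution succeeded. The sample file is recreated from the question; the temporary file name is an assumption for illustration.

```shell
# Recreate the two sample lines from the question in a temporary file.
tmp=$(mktemp)
printf 'String1 ABCDEFGHIJKL\nString2 DCEGIJKLQMAB\n' > "$tmp"

# -n suppresses default output; the p flag prints only lines where the
# substitution (removing the key and the whitespace after it) took place.
result=$(sed -n 's/^String1[[:space:]]\{1,\}//p' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Redirecting the sed output with > text.txt would give the file the question asks for.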

Count number of line in txt file when new line is inside data

I have one txt file which has below data
Name mobile url message text
test11 1234567890 www.google.com "Data Test New
Date:27/02/2020
Items: 1
Total: 3
Regards
ABC DATa
Ph:091 : 123456789"
test12 1234567891 www.google.com "Data Test New one
Date:17/02/2020
Items: 26
Total: 5
Regards
user test
Ph:091 : 433333333"
As you can see, the last column contains newline characters, so when I use the command below
awk 'END{print NR}' file.txt
it reports 15 lines, but the actual number of records is 3. Please suggest a command for this.
Edited Part:
As per the answer given, the script below does not work if there is no newline at the end of the input file:
awk -v RS='"[^"]*"' '{gsub(/\n/, " ", RT); ORS=RT} END{print NR "\n"}' test.txt
Also, my file may have 3-4 million records, so converting the file to Unix format would take time and is not my preference.
Please suggest an efficient solution that works in both cases.
head 5.csv | cat -A
The above command gives me this output:
Name mobile url message text^M$
Using gnu-awk you can do this using a custom RS:
awk -v RS='"[^"]*"' '{gsub(/(\r?\n){2,}/, "\n"); n+=gsub(/\n/, "&")}
END {print n}' <(sed '$s/$//' file)
15001
Here:
-v RS='"[^"]*"': Uses this regex, which matches a double-quoted string, as the input record separator
n+=gsub(/\n/, "&"): Replaces each \n with itself (a dummy replacement) and adds the number of \n found to the variable n
END {print n}: Prints n in the end
sed '$s/$//' file: For last line adds a newline (in case it is missing)
With perl, assuming last line always ends with a newline character
$ perl -0777 -nE 'say s/"[^"]+"(*SKIP)(*F)|\n//g' ip.txt
3
-0777 to slurp entire input file as a single string, so this isn't suitable if the input file is very large
the s command returns number of substitutions made, which is used here to get the count of newlines
"[^"]+"(*SKIP)(*F) will cause newlines within double quotes to be ignored
You can use the below command if you want to count the last line even if it doesn't end with newline character.
perl -0777 -nE 'say scalar split /"[^"]+"(*SKIP)(*F)|\n/' ip.txt
Same as anubhava but with GNU sed:
<infile sed '/"/ { :a; N; /"$/!ba; s/\n/ /g; }' | wc -l
Output:
3
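Another way to look at it, sketched here with plain awk rather than any of the tools above: keep a running count of double quotes and treat a line as the end of a record only when that count is even. It reads line by line, so it should cope with large files, CRLF line endings, and a missing final newline. It assumes quotes are balanced and only ever delimit the message field; the sample data is a shortened, made-up version of the file in the question.

```shell
# Sample with CRLF line endings and no newline after the last line
# (hypothetical, shortened data).
tmp=$(mktemp)
printf 'Name mobile url message text\r\ntest11 1 www "Data\r\nDate\r\nPh"\r\ntest12 2 www "Data one\r\nPh"' > "$tmp"

# q accumulates the number of double quotes seen so far; an even total
# means we are outside a quoted field, so the current line ends a record.
count=$(awk '{ q += gsub(/"/, "&") } q % 2 == 0 { n++ } END { print n + 0 }' "$tmp")

printf '%s\n' "$count"
rm -f "$tmp"
```

The count includes the header line, which matches the expected answer of 3 for the file in the question.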

Grep pattern to output substring if line contains string

I'd like Grep (or awk or sed) to output word2 on a new line if word3 is 'nn1'. Each line in my tab delimited source text file is
<number> <word1> <word2> <word3> <lots of junk>
Or do I need to do this in two passes - one to isolate the line, and one to pull out word2?
Any help gratefully received!
You can use awk:
awk '$4 == "nn1"{print $3}' file
Note: The above awk command also works for a tab-delimited file, since both spaces and tabs are default delimiters.
However if you want fields to be split only on tabs and NOT on spaces then use:
awk -F'\t' '$4 == "nn1"{print $3}' file
Awk is the tool for the job:
awk '$4 == "nn1" { print $3 }' file
If the fourth column is nn1, print the third.
By default, awk splits the line on any number of white space characters (tabs or spaces). As you have said "word1", "word2", etc. I guess that there are no spaces within each field, so the default behaviour should be OK. However, if you want to be explicit, you can specify the field separator yourself:
awk -F'\t' '$4 == "nn1" { print $3 }' file
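To see when the explicit separator matters, here is a small sketch with made-up data where one field contains a space. With default splitting, the space shifts the field numbers on that line; with -F'\t' it does not.

```shell
# Hypothetical tab-delimited sample; the second line has a space inside word1.
tmp=$(mktemp)
printf '1\tfoo\tbar\tnn1\tjunk here\n2\tfoo x\tbaz\tnn1\tjunk\n3\tfoo\tqux\tvb\tjunk\n' > "$tmp"

# Default splitting: "foo x" becomes two fields, so $4 is "baz" on line 2.
default_split=$(awk '$4 == "nn1" { print $3 }' "$tmp")

# Tab-only splitting: "foo x" stays one field, so line 2 matches as well.
tab_split=$(awk -F'\t' '$4 == "nn1" { print $3 }' "$tmp")

printf 'default: %s\n' "$default_split"
printf 'tab-only:\n%s\n' "$tab_split"
rm -f "$tmp"
```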

Regexp to catch string between first and second comma, where there's alphabetical character in number

First, I must mention that my native language is French, so I may make mistakes in English!
I am trying to use sed to find and delete the lines where the second item in a CSV file contains characters other than digits.
Here is an example of an OK line:
2323421,9781550431209,,2012-07-24 13:30:57,False,2012-07-01 00:00:00,False,118,,1,246501
A line that must be deleted :
1901461,3002CAN,,2010-09-29 13:46:59,True,,True,,,,
or
2977837,9782/76132396,,2015-04-27 10:14:47,True,2015-04-26 00:00:00,True,,,,
etc...
I'm not sure this is possible to be honest!
Thank you !
Here it is using sed
sed -e '/^[^,]*,[^,]*[^0-9,]/d'
A breakdown of the pattern:
^ Start of line
[^,]*, Everything up to the first comma inclusive
[^,]* Everything which isn't a comma
[^0-9,] At least one character which isn't a number or comma
Using awk you can do this:
awk -F, '$2 ~ /^[[:digit:]]+$/' file
Or (thanks to @ghoti):
awk -F, '$2 !~ /[^[:digit:]]/' file
to get only those lines where the 2nd column is an integer.
Or using sed you can do:
sed -i.bak '/^[^,]*,[[:digit:]]*[^,[:digit:]]/d' file
Perl:
perl -F, -lane 'print if $F[1] =~ /^\d+$/' file
-a autosplits the line into the array @F; fields start at 0
-F, splits the line on commas
print the line only if field 1 contains only digits: /^\d+$/
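As a quick check, the sketch below runs the first sed command and an awk filter against the three sample lines from the question. The awk pattern uses [0-9] in place of [[:digit:]], which is an assumption made so it also runs on awk builds without POSIX character classes.

```shell
# The three sample lines from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
2323421,9781550431209,,2012-07-24 13:30:57,False,2012-07-01 00:00:00,False,118,,1,246501
1901461,3002CAN,,2010-09-29 13:46:59,True,,True,,,,
2977837,9782/76132396,,2015-04-27 10:14:47,True,2015-04-26 00:00:00,True,,,,
EOF

# sed: delete lines whose second field contains a non-digit character.
kept_sed=$(sed -e '/^[^,]*,[^,]*[^0-9,]/d' "$tmp")

# awk: keep lines whose second field consists of digits only.
kept_awk=$(awk -F, '$2 ~ /^[0-9]+$/' "$tmp")

printf '%s\n' "$kept_sed"
rm -f "$tmp"
```

Both keep only the first line here. They differ on an empty second field: the sed version keeps such a line, while the awk version with + drops it.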

extract all values for specific key from space delimited text file

I have a text file in the format
1=23 2=44 15=17:31:37.640 5=abc 15=17:31:37.641 4=23 15=17:31:37.643 15=17:31:37.643
I need a regex to extract all the values for key 15 for a multiline text file
output should be
17:31:37.640 17:31:37.641 17:31:37.643 17:31:37.643
Sorry, I should have stated that the values I'm trying to extract are timestamps in the form 17:31:37.643
You can use GNU grep to extract the substrings.
grep -Po '\b15=\K\S+' file | tr '\n' ' '
-P option interprets the pattern as a Perl regular expression.
-o option shows only the matching part that matches the pattern.
\K throws away everything that it has matched up to that point.
Output
17:31:37.640 17:31:37.641 17:31:37.643 17:31:37.643
You can use sed:
sed 's/15=\([^ ]*\)/\1/g;s/[0-9]\+[^ ]\+ //g' input.file
I gave that answer before the OP added the expected output. The following will work too, but puts each value on its own line:
If you have GNU grep, you can use a lookbehind assertion that comes with perl compatible regex mode:
grep -oP '(?<=15=)[^ ]*' <<< '1=23 2=44 15=xyz 5=abc 15=yyy 4=23 15=omnet 15=that'
Output:
xyz
yyy
omnet
that
Using awk:
awk -F'=' -v RS=' ' -v ORS=' ' '$1==15 { print $2 }' file
xyz yyy omnet that
Set the Input and Output Record Separator to space and Input Field Separator to =. Test the condition of column1 to be 15. If that is true, print the second column.
As suggested by Ed Morton in the comments, this would leave a trailing blank character and may omit the final newline. If that's a concern, you can use the following, which relies on GNU awk for the multi-character RS:
gawk -F'=' -v RS='[[:space:]]+' '$1==15{ printf "%s%s", (c++?OFS:""), $2 } END{print ""}' file
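If GNU awk is not available, a rough alternative is to leave the record separator alone and loop over the whitespace-separated fields of each line, printing the part after 15= for every field that starts with it. This is a sketch with sample data adapted from the question and split over two lines; the sep variable is just a helper introduced here.

```shell
# Sample data adapted from the question, split over two lines.
tmp=$(mktemp)
printf '1=23 2=44 15=17:31:37.640 5=abc 15=17:31:37.641\n4=23 15=17:31:37.643 15=17:31:37.643\n' > "$tmp"

# For each field beginning with "15=", print what follows the "=".
# sep is empty for the first value and a space afterwards, so there is
# no trailing blank; END adds the final newline.
result=$(awk '{
  for (i = 1; i <= NF; i++)
    if (index($i, "15=") == 1) { printf "%s%s", sep, substr($i, 4); sep = " " }
} END { print "" }' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Using index() rather than a regex avoids matching keys such as 115=, because the field has to start with 15= exactly.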