sed - get last value in file - regex

I am writing a script that collects a value from an external file, and I am having trouble getting the following sed command to return only a single line.
The command finds every line containing "value=" and prints the text that follows it, skipping lines that contain "#":
NUM=$(sed -n -e '/#/!s/^.*value=//p' $LOGFILE)
I found other variations of this command, but none of them let me skip lines containing a given word the way this one does.
Can anyone show how to capture only the last matching line while still ignoring lines that contain "#"?
Optional: can the command be adapted to capture only the number, ignoring the other words on the line?

Here are three ways:
If you need just sed:
sed -n '/value=/h; ${g; s/value=//p}' file
If you can use other tools:
tac file | sed -n '/value=/{s///p;q}'
Or, this is quite readable:
awk -F= '$1 == "value" {value = $2} END {print value}' file
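None of the three commands above explicitly skips lines containing "#" or trims the result down to digits, both of which the question asks for. A minimal sketch of one way to cover both, assuming the value is an unsigned integer directly after "value="; the function name last_value is a hypothetical helper, not something from the question:

```shell
# Hypothetical helper: print the digits after the last "value=" found on a
# line that does not contain "#". Assumes an unsigned integer value.
last_value() {
    awk '
        /#/ { next }                          # skip any line containing "#"
        match($0, /value=[0-9]+/) {           # "value=" followed by digits
            # "value=" is 6 characters long, so the digits start at RSTART + 6
            last = substr($0, RSTART + 6, RLENGTH - 6)
        }
        END { if (last != "") print last }    # print only the final match
    ' "$1"
}
```

In the original script this could be called as NUM=$(last_value "$LOGFILE").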

Related

rename specific lines in a text file with sed

I have a file that looks like this:
>alks|keep1|aoiuor|lskdjf
ldkfj
alksj
asdflkj
>jhoj_kl|keep2|kjghoij|adfjl
aldskj
alskj
alsdkj
I would like to edit just the lines starting with >, ideally in-place, to get a file:
>keep1
ldkfj
alksj
asdflkj
>keep2
aldskj
alskj
alsdkj
I know that in principle this is achievable with various combinations of sed/awk/cut, but I haven't been able to figure out the right combination. Ideally it should be fast - the file has many millions of lines, and many of the lines are also very long.
Key things about the lines I want to edit:
Always start with >
The bit I want to keep is always between the first and second pipe symbol | (hence thinking cut is going to help)
The bit I want to keep has alphanumeric symbols and sometimes underscores. The rest of the string on the same line can have any symbols
What I've tried that seems helpful
(Most of my sed attempts are pure garbage)
cut -d '|' -f 2 test.txt
Gets me the bit of the string that I want, and it keeps the other lines too. So it's close, but (of course) it doesn't preserve the initial > on the lines where cut applies, so it's missing a crucial part of the solution.
With sed:
sed -E '/^>/ s/^[^|]+\|([^|]+).*/>\1/'
/^>/ to select lines starting with >, not strictly necessary for given sample but sometimes this provides faster result than using s alone
^[^|]+\| this will match the non | characters from the start of the line, followed by the first |
([^|]+) capture the second field
.* rest of the line
>\1 replacement string where \1 will have the contents of ([^|]+)
If your input has only ASCII character, this would give you much faster results:
LC_ALL=C sed -E '/^>/ s/^[^|]+\|([^|]+).*/>\1/'
Timing
In timing tests on a huge file built from the given input sample, awk was much faster than sed, and mawk was faster still
However, OP reports that the sed solution is faster for the actual data
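Since the question asks for an in-place edit, the sed command can be wrapped so that it writes to a temporary file and then replaces the original. This is only a sketch: rename_headers is a hypothetical name, and LC_ALL=C assumes ASCII-only input as noted above.

```shell
# Hypothetical helper: rewrite ">a|b|c..." header lines to ">b" and replace
# the original file with the result.
rename_headers() {
    file=$1
    tmp=$(mktemp) || return 1
    # Lines not starting with ">" pass through unchanged.
    LC_ALL=C sed -E '/^>/ s/^[^|]+\|([^|]+).*/>\1/' "$file" > "$tmp" &&
        mv "$tmp" "$file"
}
```

Usage would be rename_headers test.txt. Note that mv leaves the file with the temporary file's permissions, which may differ from the original's.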
With your shown samples, you could try the following. The field separator is set to | for every line of Input_file; the main program then checks whether the line starts with > and, if so, prints > followed by the 2nd field; otherwise it prints the complete line.
awk -F'|' '/^>/{print ">"$2;next} 1' Input_file
Explanation: Adding detailed explanation for above.
awk -F'|' ' ##Starting awk program from here and setting field separator as | here.
/^>/{ ##Checking condition if line starts from > then do following.
print ">"$2 ##Printing 2nd field of current line here.
next ##next will skip all further statements from here.
}
1 ##Will print current line.
' Input_file ##mentioning Input_file name here.
You can also use the following awk command:
awk -F\| '/^>/{print ">"$2} !/^>/{print}' file
# Inplace replacement with gawk (GNU awk)
gawk -i inplace -F\| '/^>/{print ">"$2} !/^>/{print}' file
# "Inline-like" replacement with any awk
awk -F\| '/^>/{print ">"$2} !/^>/{print}' file > tmp && mv tmp file
Here,
-F\| - sets the field separator to a | char
/^>/ is the condition: if line starts with > (and !/^>/ means the opposite)
{print ">"$2} prints the Field 2 value with a > char prepended to it
{print} simply prints the full line.
Note that !/^>/{print} can be reduced to !/^>/ since print is the default action.
See an online demo:
s='>alks|keep1|aoiuor|lskdjf
ldkfj
alksj
asdflkj
>jhoj_kl|keep2|kjghoij|adfjl
aldskj
alskj
alsdkj'
awk -F\| '/^>/{print ">"$2} !/^>/{print}' <<< "$s"
Output:
>keep1
ldkfj
alksj
asdflkj
>keep2
aldskj
alskj
alsdkj

Removing rows that contains "(null)" value from a text file

I would like to remove any row within a .txt file that contains "(null)". The (null) value is always in the 3rd column. I would like to add this to a script that I already have.
Txt file example:
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
In this example I would like to remove the third row.
I'm guessing it's an awk -F, but I'm not sure where to go from there.
You are on the right track with using -F.
$ awk -F '|' '$3 != "(null)"' file.txt
39|1411|XXYZ
40|1416|XXX
You set the field separator to |, then print all lines where the third field is not equal to (null). This uses awk's default of "print the line" if there's no action associated with a pattern.
If you relax the requirement to specifically test the third field, and there is no other place for the "(null)" substring to occur, you can get the same result with
grep -vF '(null)' file.txt
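The difference between the field test and the substring test only shows up when "(null)" appears outside the third column. A small sketch on made-up data (the second row is an invented variation of the sample, not taken from the question):

```shell
sample='39|1411|XXYZ
40|(null)|XXX
41|1420|(null)'

# Field-aware: drops only rows whose third column is exactly "(null)".
by_field=$(printf '%s\n' "$sample" | awk -F'|' '$3 != "(null)"')

# Substring match: drops every row that contains "(null)" anywhere.
by_substring=$(printf '%s\n' "$sample" | grep -vF '(null)')

printf 'awk keeps:\n%s\n\ngrep keeps:\n%s\n' "$by_field" "$by_substring"
```

Here awk keeps the first two rows, while grep keeps only the first.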
With awk:
awk '-F|' '$3 != "(null)"' < input-file
Here is a sed:
$ sed '/(null)$/d' file
39|1411|XXYZ
40|1416|XXX
The $ ensures that the (null) is at the end of the line. If you want to ensure that (null) is the final column:
$ sed '/|(null)$/d' file
And if you want to be extra sure that it is the third column:
$ sed '/^[^|]*|[^|]*|(null)$/d' file
Or with grep:
$ grep -v '^[^|]*|[^|]*|(null)$' file
(But instead of this last one, just use awk...)
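A detail worth keeping in mind for patterns like these: in sed's default basic regular expressions a bare | is a literal pipe, whereas with -E (extended regular expressions) | means alternation, so a literal pipe has to be escaped. The sketch below, on a made-up sample with (null) in different columns, writes the third-column test both ways:

```shell
sample='39|1411|XXYZ
40|(null)|XXX
41|1420|(null)'

# Basic regex (the default): | and ( ) are ordinary characters.
bre=$(printf '%s\n' "$sample" | sed '/^[^|]*|[^|]*|(null)$/d')

# Extended regex (-E): | is alternation and ( ) group, so the literal
# pipe is written \| and the literal parentheses \( and \).
ere=$(printf '%s\n' "$sample" | sed -E '/^[^|]*\|[^|]*\|\(null\)$/d')

printf '%s\n---\n%s\n' "$bre" "$ere"
```

Both forms delete only the row whose third column is (null).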
Use grep:
grep -v '|.*|(null)' in_file
Here, grep uses option -v : print lines that do not match.
Or use Perl:
perl -F'[|]' -lane 'print if $F[2] ne "(null)";' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F'[|]' : Split into @F on literal |, rather than on whitespace.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
I would like to remove any row within a .txt file that contains "(null)"
If you wish to do that using AWK let file.txt content be
39|1411|XXYZ
40|1416|XXX
41|1420|(null)
then
awk '!index($0,"(null)")' file.txt
will output
39|1411|XXYZ
40|1416|XXX
Explanation: index returns the position of the first occurrence of the substring ((null) in this case), or 0 if it is not found. Negating the result gives true for 0 and false for anything else, and AWK prints the lines for which the result is true.

How can I print 2 lines if the second line contains the same match as the first line?

Let's say I have a file with several million lines, organized like this:
#1:N:0:ABC
XYZ
#1:N:0:ABC
ABC
I am trying to write a one-line grep/sed/awk matching function that returns both lines if the string after the last colon on the first line (ABC in this example) is found in the second line.
When I try to use grep -A1 -P and pipe the matches with a match like '(?<=:)[A-Z]{3}', I get stuck. I think my creativity is failing me here.
With awk
$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
#1:N:0:ABC
ABC
-F: use : as delimiter, makes it easy to get last column
s=$NF; p=$0 save last column value and entire line for printing later
NF==1 if line doesn't contain :
$0 ~ s if line contains the last column data saved previously
if search data can contain regex meta characters, use index($0,s) instead to search literally
note that this code assumes input file having line containing : followed by line which doesn't have :
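The literal-match variant mentioned in the notes above could look like the following sketch. The function name matching_pairs is hypothetical, and the same input assumption applies (a line containing : followed by a line without one):

```shell
# Hypothetical helper: print each header/data pair where the data line
# contains the header's last ":"-separated field as a literal substring.
matching_pairs() {
    awk -F: '
        # index() compares plain text, so characters such as . or * in the
        # saved field are not treated as regex operators.
        NF == 1 && s != "" && index($0, s) { print p ORS $0 }
        { s = $NF; p = $0 }   # remember last field and whole line
    ' "$1"
}
```

With a header ending in A.C followed by the line ABC, the regex form $0 ~ s would report a match because . matches any character; the index form does not.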
With GNU sed (might work with other versions too, syntax might differ though)
$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
#1:N:0:ABC
ABC
/:/ if line contains :
N add next line to pattern space
/.*:(.*)\n.*\1/ capture string after last : and check if it is present in next line
again, this assumes input like shown in question.. this won't work for cases like
#1:N:0:ABC
#1:N:0:XYZ
XYZ
This might work for you (GNU sed):
sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file
Use grep-like option -n to explicitly print lines. Read two lines into the pattern space and print both if they meet the requirements. Always delete the first and repeat.
If each two-line pair in your actual Input_file is separated by a blank line (RS="" makes awk treat blank-line-separated blocks as records), the following may help too.
awk -v FS="[: \n]" -v RS="" '$(NF-1)==$NF' Input_file
EDIT: Adding 1 more solution as per Sundeep suggestion too here.
awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' Input_file

Sed replace string with multi-lines

I have simple sed command:
#!/bin/bash
COMMAND=$1
sed -e "s#COMMAND#$COMMAND#"
The value of COMMAND should contain one command per line, but I cannot figure out how to pass them to sed so that sed puts every command on a new line. What I have tried is:
./script 'ls\n date\n uname\n'
Regards!
If I'm understanding your question, you are looking to replace a representation of newlines within a string (i.e. a backslash character, followed by an 'n') as actual printed newlines.
The following script takes a single quoted string (the input as shown in your question) containing the literals '\n' and converts those into actual new lines.
#!/bin/bash
echo -n "$1" | sed -e 's#\\n#\n#g'
Example usage:
[user@localhost ~]$ ./simple_sed.sh 'ls\ndate\nuname\n'
ls
date
uname
The changes needed from your script are to
echo the argument, so that sed has input to work on (without a file or piped input, sed just waits on standard input);
match the \\n and replace it with a newline; and
add a 'g' to the end which will continue searching within a line after a replacement has occurred (read: multiple \n are substituted in a single line).
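The \n in the replacement part of the s command is understood by GNU sed, but it may not be understood by every sed implementation. As a hedged alternative, the same conversion can be done with awk's gsub; expand_newlines is a hypothetical function name used only for this sketch:

```shell
# Hypothetical helper: turn each literal backslash-n pair in the first
# argument into a real newline. Output always ends with a newline.
expand_newlines() {
    printf '%s' "$1" | awk '{ gsub(/\\n/, "\n"); print }'
}

expand_newlines 'ls\ndate\nuname'
```

printf '%s' is used instead of echo so that the argument is passed through unchanged regardless of how the shell's echo treats backslashes.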

Troubles with regular expressions

I would like some help with extended regular expressions.
I have been trying to figure this out, but in vain.
I have a file conflicts.txt which looks like this. Please note that this is only part of the file; there are many lines like these:
Server/core/wildSetting.json
Server/core
Client/arcade/src/assets
Client/arcade/src/assets/
Client/arcade/src/assets
Client/arcade/src/Game/
I am writing a shell script which goes through this file line by line:
if [ -s "$CONFLICTS" ] ; then
    count=0
    while read LINE
    do
        let count++
        echo -e "\n $LINE \n"
    done < $CONFLICTS
fi
The above prints the file line by line. What I am trying to do now is redirect the lines that contain a certain text into another file. For that, I have modified the echo line of the code to:
echo -e "\n $LINE \n" | grep -E "Server/game" > newfile.txt
My Query :
As we can see, there are many lines of the form Server/core...
I want to write a regular expression and use it in grep, which matches two kinds of lines:
1) lines containing ONLY the string "Server/core", preceded and succeeded by any number of spaces
2) all the lines containing the string "assets"
I have written a regular expression for this, but it doesn't work.
Here is my regex:
grep -E '[^' '*Server/core$] | [assets]'
can you please tell me what is the right way of doing it ?
Please note that there can be any number of spaces before and after "Server/core" as this file is a result of parsing a previous file.
Thanks !
Based on what's asked in the comments:
1) the lines containing the string "assets"
$ grep "assets" file
Client/arcade/src/assets
Client/arcade/src/assets/
Client/arcade/src/assets
2) lines that contain only the string "Server/core", preceded and succeeded by any amount of space
$ grep "^[ ]*Server/core[ ]*$" file
Server/core
sed (Stream EDitor) can solve your problem perfectly.
Try this command: sed -n '/^ *Server\/core *$\|assets/p' conflicts.txt (the \| alternation is a GNU sed extension).
There is something wrong with your grep -E '[^' '*Server/core$] | [assets]'.
A ^ at the start of a bracket expression negates it, so the expression matches any single character that is not listed inside the brackets.
If you want to perform in-place modification, add the -i option as a separate flag (writing -in would be read as -i with the backup suffix n):
sed -i -n '/^ *Server\/core *$\|assets/p' conflicts.txt
Your regex just needs to be this:
assets|^\s*Server/core\s*$
I think sed or awk would be a better tool than grep - you would need to escape the forward slash if you used one of these.
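With the two conditions combined into a single extended regular expression, the filter can run on the whole file at once, with no need for a read loop. A sketch, assuming the lowercase Server/core spelling from the sample file; filter_conflicts is a hypothetical name, and [[:space:]] is used as the POSIX spelling of \s:

```shell
# Hypothetical helper: print lines that contain "assets", or that consist of
# "Server/core" with optional surrounding whitespace and nothing else.
filter_conflicts() {
    grep -E 'assets|^[[:space:]]*Server/core[[:space:]]*$' "$1"
}
```

In the script from the question this could be used as filter_conflicts "$CONFLICTS" > newfile.txt. A line such as Server/core/wildSetting.json is not selected, because the second alternative is anchored at both ends.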