Troubles with regular expressions - regex

I wanted some help on extended regular expressions.
I have been trying to figure out but in vain
I have a file conflicts.txt which looks like this please note that it is only a part of this file , there are many lines like these
Server/core/wildSetting.json
Server/core
Client/arcade/src/assets
Client/arcade/src/assets/
Client/arcade/src/assets
Client/arcade/src/Game/
i am writing a shell script which goes thorugh this file line by line :
if [ -s "$CONFLICTS" ] ; then
count=0
while read LINE
do
let count++
echo -e "\n $LINE \n"
done < $CONFLICTS
fi
the above prints the file line by line what i am trying now is to redirect the lines which have a certain text into some other file for that i have modified echo line of the code to :
echo -e "\n $LINE \n" | grep -E "Server/game" > newfile.txt
My Query :
As we can see there are many lines of the form Server/Core...
I want to write a regular expression and use it in grep, which matches two kind of lines
1) line s containing the ONLY the string "Server/core" preceeded and suceeded by any number of spaces
2) all the lines containing the string "assets"
I have written a regular expression for the same but it doesn't work
here my regEx:
grep -E '[^' '*Server/core$] | [assets]'
can you please tell me what is the right way of doing it ?
Please note that there can be any number of spaces before and after "Server/core" as this file is a result of parsing a previous file.
Thanks !

Based on what's asked in the comments:
1) the lines containing the string "assets"
$ grep "assets" file
Client/arcade/src/assets
Client/arcade/src/assets/
Client/arcade/src/assets
2) lines that contain only the sting "Server/core" preceeded and succeed by any amount of space
$ grep "^[ ]*Server/core[ ]*$" file
Server/core

sed (Stream EDitor) can solve your problem perfectly.
Try this command sed -n '/^ *Server\/core\|assets/p' conflicts.txt.
There is something wrong with your grep -E '[^' '*Server/core$] | [assets]'.
The ^ in a squared brackets omits all the strings containing any of the subsequent characters in the brackets.
If you want to perform in-place modification, add the -i option to the sed command like
sed -in '/^ *Server\/core\|assets/p' conflicts.txt

Your regex just needs to be this:
assets|^\s*Server/Core\s*$
I think sed or awk would be a better tool than grep - you would need to escape the forward slash if you used one of these.

Related

Find multi-line text & replace it, using regex, in shell script

I am trying to find a pattern of two consecutive lines, where the first line is a fixed string and the second has a part substring I like to replace.
This is to be done in sh or bash on macOS.
If I had a regex tool at hand that would operate on the entire text, this would be easy for me. However, all I find is bash's simple text replacement - which doesn't work with regex, and sed, which is line oriented.
I suspect that I can use sed in a way where it first finds a matching first line, and only then looks to replace the following line if its pattern also matches, but I cannot figure this out.
Or are there other tools present on macOS that would let me do a regex-based search-and-replace over an entire file or a string? Maybe with Python (v2.7 and v3 is installed)?
Here's a sample text and how I like it modified:
keyA
value:474
keyB
value:474 <-- only this shall be replaced (follows "keyB")
keyC
value:474
keyB
value:474
Now, I want to find all occurances where the first line is "keyB" and the following one is "value:474", and then replace that second line with another value, e.g. "value:888".
As a regex that ignores line separators, I'd write this:
Search: (\bkeyB\n\s*value):474
Replace: $1:888
So, basically, I find the pattern before the 474, and then replace it with the same pattern plus the new number 888, thereby preserving the original indentation (which is variable).
You can use
sed -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
# Or, to replace the contents of the file inline in FreeBSD sed:
sed -i '' -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
Details:
/keyB$/ - finds all lines that end with keyB
n - empties the current pattern space and reads the next line into it
s/\(.*\):[0-9]*/\1:888/ - find any text up to the last : + zero or more digits capturing that text into Group 1, and replaces with the contents of the group and :888.
The {...} create a block that is executed only once the /keyB$/ condition is met.
See an online sed demo.
Use a perl one-liner with -0777 to scan over multiple lines:
$ # inline edit:
$ perl -0777 -i -pe 's/\bkeyB\s*value):\d*/$1:888/' file.txt
$ # to stdout:
$ cat file.txt | perl -0777 -pe 's/\bkeyB\s*value):\d*/$1:888/'
In plain bash:
#!/bin/bash
keypattern='^[[:blank:]]*keyB$'
valpattern='(.*):'
replacement=888
while read -r; do
printf '%s\n' "$REPLY"
if [[ $REPLY =~ $keypattern ]]; then
read -r
if [[ $REPLY =~ $valpattern ]]; then
printf '%s%s\n' "${BASH_REMATCH[0]}" "$replacement"
else
printf '%s\n' "$REPLY"
fi
fi
done < file

Adding blank line spaces before and after pattern 'string' match

I am trying to add 5 blank line spaces in a text file (text.txt) before and after string pattern matches. I used the following to get spaces after the 'string' match which worked for me-
sed '/string/{G;G;G;G;G;}' text.txt
I want to apply the same sed command to obtain 5 blank lines before the 'string' Here I don't want spaces, but rather blank lines before and after them. Any suggestions?
sed -r 's/(^.*)(string)(.*$)/\1\n\n\n\n\n\2\n\n\n\n\n\3/' text.txt
Use -r or -E to allow regular expressions, split likes into three sections and then substitute the line for the first section, 5 new lines, the second section, 5 new lines and then finally the third section.
Use this Perl one-liner:
perl -pe 's/string/\n\n\n\n\n$&\n\n\n\n\n/' text.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s/PATTERN/REPLACEMENT/ : change PATTERN to REPLACEMENT.
$& : matched pattern.
\n : newline character.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
For a single string match:
$ sed -e '/string/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
For multiple strings, assuming same requirements:
$ sed -E '/(string1|string2|string3)/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
This might work for you:
sed '/string/{G;s/\(string\)\(.*\)\(.\)/\3\3\3\3\3\1\3\3\3\3\3\2/}' file
Match on string, append an empty line, pattern match using the newline to separate the match by 5 lines either side.
And an awk version:
awk '{if(/string1|string2|.../){printf "\n\n\n\n\n%s\n\n\n\n\n",$0}else{print}}' file

Match multiple patterns in same line using sed [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d\ -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by space (-d\ - d = delimiter, \ = escaped space character, something like -d" " would have also worked) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to omit the match
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while read line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
## tells bash to delete the longest match of ": " in $line from the front.
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or if you wanted the key rather than the value, %% tells bash to delete the longest match of ": " in $line from the end.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The substring to split on is ":\ " because the space character must be escaped with the backslash.
You can find more like these at the linux documentation project.
Modern BASH has support for regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done
grep potato file | grep -o "[0-9].*"

pipe sed command to create multiple files

I need to get X to Y in the file with multiple occurrences, each time it matches an occurrence it will save to a file.
Here is an example file (demo.txt):
\x00START how are you? END\x00
\x00START good thanks END\x00
sometimes random things\x00\x00 inbetween it (ignore this text)
\x00START thats nice END\x00
And now after running a command each file (/folder/demo1.txt, /folder/demo2.txt, etc) should have the contents between \x00START and END\x00 (\x00 is null) in addition to 'START' but not 'END'.
/folder/demo1.txt should say "START how are you? ", /folder/demo2.txt should say "START good thanks".
So basicly it should pipe "how are you?" and using 'echo' I can prepend the 'START'.
It's worth keeping in mind that I am dealing with a very large binary file.
I am currently using
sed -n -e '/\x00START/,/END\x00/ p' demo.txt > demo1.txt
but that's not working as expected (it's getting lines before the '\x00START' and doesn't stop at the first 'END\x00').
If you have GNU awk, try:
awk -v RS='\0START|END\0' '
length($0) {printf "START%s\n", $0 > ("folder/demo"++i".txt")}
' demo.txt
RS='\0START|END\0' defines a regular expression acting as the [input] Record Separator which breaks the input file into records by strings (byte sequences) between \0START and END\0 (\0 represents NUL (null char.) here).
Using a multi-character, regex-based record separate is NOT POSIX-compliant; GNU awk supports it (as does mawk in general, but seemingly not with NUL chars.).
Pattern length($0) ensures that the associated action ({...}) is only executed if the records is nonempty.
{printf "START%s\n", $0 > ("folder/demo"++i)} outputs each nonempty record preceded by "START", into file folder/demo{n}.txt", where {n} represent a sequence number starting with 1.
You can use grep for that:
grep -Po "START\s+\K.*?(?=END)" file
how are you?
good thanks
thats nice
Explanation:
-P To allow Perl regex
-o To extract only matched pattern
-K Positive lookbehind
(?=something) Positive lookahead
EDIT: To match \00 as START and END may appear in between:
echo -e '\00START hi how are you END\00' | grep -aPo '\00START\K.*?(?=END\00)'
hi how are you
EDIT2: The solution using grep would only match single line, for multi-line it's better use perl instead. The syntax will be very similar:
echo -e '\00START hi \n how\n are\n you END\00' | perl -ne 'BEGIN{undef $/ } /\A.*?\00START\K((.|\n)*?)(?=END)/gm; print $1'
hi
how
are
you
What's new here:
undef $/ Undefine INPUT separator $/ which defaults to '\n'
(.|\n)* Dot matches almost any character, but it does not match
\n so we need to add it here.
/gm Modifiers, g for global m for multi-line
I would translate the nulls into newlines so that grep can find your wanted text on a clean line by itself:
tr '\000' '\n' < yourfile.bin | grep "^START"
from there you can take it into sed as before.

sed misbehaving?

I have the following command:
$ xlscat -i $file
and I get:
Excel File Name.xslx - 01: [ Sheet #1 ] 34 Cols, 433 Rows
Excel File Name.xlsx - 02: [ Sheet Number2 ] 23 Cols, 32 Rows
Excel File Name.xlsx - 03: [ Foo Factor! ] 14 Cols, 123 Rows
I want just the sheet name, so i do this:
$ xlscat -i $file 2>&1 | sed -e 's/.*\[ *\(.*\) *\].*/\1/' | while read file
> do
> echo "File: '$file'"
> done
And get this:
File: 'Sheet #1'
File: 'Sheet Number2'
File: 'Foo Factor!'
Great! Everything works beautifully. As you can see with the single quotes, I've removed the extra spaces at the end of the file name. Now convert all remaining spaces to underscores:
$ xlscat -i $file 2>&1 | sed -e 's/.*\[ *\(.*\) *\].*/\1/' | sed -e 's/ /_/g' | while read file
> do
> echo "File: '$file'"
> done
Now I get this:
File: 'Sheet_#1_____'
File: 'Sheet_Number2'
File: 'Foo_Factor!__'
Huh? The first one didn't show any trailing blanks, but the second one seems to be appending underscores on the end of the file. What am I not seeing?
The first sed command is not stripping the trailing whitespace, read is. Check your expression:
sed -e 's/.*\[ *\(.*\) *\].*/\1/'
It matches:
anything
a bracket
1 or more spaces
anything, captured
1 or more spaces
a right bracket
anything
The regular expressions are greedy, meaning that they match as much as possible, and the earlier expressions will match before later ones do. So for example, the regular expression (.*)(.*) matches anything in two capturing groups, but there are any number of ways the data could be split between the two groups. So the regex implementation has to choose, and it will put as much as possible in the first, and nothing in the second.
Since you need to match filenames with spaces in them, you can't match "anything except a space"; your best bet is to trim the trailing whitespace as a separate step. Try this sed command instead:
sed -e 's/.*\[ *\(.*\) *\].*/\1/' -e 's/ *$//'
I think the read file is trimming the trailing whitespace for you. Try putting the
sed -e 's/ /_/g'
inside the while loop ... like:
echo "File: $(echo $file | sed -e 's/ /_/g')"
Could it be echo that's stripping the trailing spaces? Although it does seem like they should show up inside the quotes. Anyway, try this:
sed -e 's/.*\[ *\([^] ]\+\( \+[^] ]\+\)*\).*/\1/'
Each word of the sheet name is matched by [^] ]\+ (i.e., one or more of any characters other than space or ]). When the final word of the name has been matched, the second .* consumes the rest of the line. There's no need to match the closing ], so the trailing spaces don't have to be included in the match.
I'm not a sed user, but this regex works correctly in RegexBuddy when I specify the GNU-BRE flavor, so it should work in sed.