sed -e: is it possible to match everything between two quotation marks?

Sed, is it possible to match everything between two chars?
In a script that I have to use there is a bug.
The script has to replace the value of
#define MAPPING,
The line containing the bug is the one below:
sed -i -e "s/#define MAPPING \"\"/#define MAPPING \"$string\"/1" file.hpp
Since in file.hpp MAPPING is defined as:
#define MAPPING ""
the script works, but if I try to call the script again and MAPPING was already redefined, now sed won't match #define MAPPING "" and thus not override anything.
I'm not a sed expert, and with a quick search couldn't find the way to let it match
#define MAPPING "<everything>".
Is it possible to achieve this?

This does what you want:
sed -Ei 's/(#define MAPPING ")[^"]*(")/\1'"$string\2/" file.hpp
[^"]* means zero or more non double quote characters.
I used back references instead of repeating the same text, it's up to you.
The 1 at the end of your example means replace the first occurrence. However, this is the default, so it can be removed.
Be aware: if $string contains sequences like &, \5, or \\, they won't be passed literally, and can even cause an error. Also, C escapes like \t for tab are expanded by many sed implementations (so you'll end up with a literal tab in the file, instead of \t).
For what it's worth, this sed does the same thing, but is more accommodating of varied whitespace:
sed -Ei 's/(^[[:space:]]*#[[:space:]]*define[[:space:]]+MAPPING[[:space:]]+")[^"]*(")/\1'"$string\2/" file.hpp
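The caveat about special characters in $string can be worked around by escaping the value before it is spliced into the replacement. The following is a minimal sketch, not part of the original script: the temporary file and the sample value a&b/c\d are made up for illustration, and values containing newlines are not handled.

```shell
# Hypothetical demo file and value; neither comes from the original script.
file=$(mktemp)
printf '%s\n' '#define MAPPING "old"' > "$file"
string='a&b/c\d'

# Put a backslash in front of every \, / and & so that sed treats them
# literally in the replacement part of the s command.
escaped=$(printf '%s\n' "$string" | sed 's|[\\/&]|\\&|g')

# Same substitution as above, but using the escaped value.
result=$(sed -E 's/(#define MAPPING ")[^"]*(")/\1'"$escaped"'\2/' "$file")
printf '%s\n' "$result"
rm -f "$file"
```

The escaping step uses | as its delimiter so that / can appear inside the bracket expression without any extra escaping.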

You can also try:
sed -i -e "s/#define MAPPING \".*\"/#define MAPPING \"$string\"/1" file.hpp
The dot matches any character and the star means zero or more times, so .* accepts any sequence of characters, including an empty string. Because .* is greedy, it matches up to the last double quote on the line.
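One thing to watch with .* is how far it reaches when a line holds more than one pair of quotes. The sketch below compares it with [^"]* on a made-up line that has a trailing comment; the line and the value of string are assumptions for illustration only.

```shell
# Hypothetical input line with a second pair of quotes in a comment.
line='#define MAPPING "old" // see "notes"'
string=new

# .* runs to the LAST double quote on the line, so the comment is swallowed.
greedy=$(printf '%s\n' "$line" |
  sed "s/#define MAPPING \".*\"/#define MAPPING \"$string\"/")

# [^\"]* stops at the first closing quote, so the comment survives.
bounded=$(printf '%s\n' "$line" |
  sed "s/#define MAPPING \"[^\"]*\"/#define MAPPING \"$string\"/")

printf '%s\n' "$greedy" "$bounded"
```

If the define line never carries anything after the closing quote, both forms behave the same.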

Related

what is '/[A-Z]/ s| |/|gp' meaning?

I am reading a sed tutorial at https://riptutorial.com/sed/example/13753/lines-matching-regular-expression-pattern.
Looks like
$ sed -n '/[A-Z]/ s| |/|gp' ip.txt
filters 'Add Sub Mul Div' out of the file and converts it to 'Add/Sub/Mul/Div'
I really don't understand the regex considering I just read https://www.tldp.org/LDP/abs/html/x23170.html.
It does not even match the print syntax which is:
[address-range]/p
and is the pipe sign '|' here alternation?
Could anyone explain:
'/[A-Z]/ s| |/|gp'
in English?
Edit
I also found that the extra empty space before 's' and after '/' is allowed and does not do anything. the correct syntax should be:
[address-range]/s/pattern1/pattern2/
the syntax check of sed pattern is not strict, and confusing
-n option turns off automatic printing
sed allows to qualify commands with an address filtering, which could be regex or line addresses
for example, /foo/ d will delete lines containing foo
and /foo/ s/baz/123/ will change baz to 123 only if the line also contains foo
/[A-Z]/ matches only lines containing at least one uppercase letter
if such a line is matched:
s| |/|gp perform this substitution and print
s command allows delimiter other than / too (see Using different delimiters in sed commands and range addresses)
in this case, using | allows you to use / as a normal character instead of having to escape it
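The points above can be tried out on a small input. This is only a sketch: the three input lines are made up, since the contents of ip.txt from the tutorial are not shown here.

```shell
# Hypothetical stand-in for ip.txt.
input=$(printf '%s\n' 'Add Sub Mul Div' 'lower case only' '1 2 3')

# -n       : do not print lines automatically
# /[A-Z]/  : address, only act on lines containing an uppercase letter
# s| |/|gp : replace every space with / and print the line if it changed
result=$(printf '%s\n' "$input" | sed -n '/[A-Z]/ s| |/|gp')
printf '%s\n' "$result"
```

Only the first line contains an uppercase letter, so it is the only one that is changed and printed.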

How do I reference a shell variable and arbitrary digits inside a grep regex?

I am looking to translate this regular expression into grep flavour:
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
Example of line that should match, assuming that VAR=285900
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
Current code:
VAR=285900
grep 'refs/changes/\d+/$VAR/' sample.txt
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
That would be
grep "refs/changes/[[:digit:]]\{1,\}/$VAR/"
or
grep -E "refs/changes/[[:digit:]]+/$VAR/"
Note that the \d+ notation is a perl thing. Some overfeatured greps might support it with an option, but I don't recommend it for portability reasons.
inside simple quotes I cannot use variable expansion
You can mix and match quotes:
foo=not; echo 'single quotes '"$foo"' here'
with double quotes it does not match anything.
It's not clear what you're doing, so we can't say why it doesn't work. It should work. There is no need to escape forward slashes for grep, they don't have any special meaning.
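Here is a small sketch of the two portable forms next to the single-quoted attempt. The first sample line is the one from the question; the second line and the temporary file are made up for illustration.

```shell
# Temporary sample file; the second entry is a hypothetical non-matching line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
0000000000000000000000000000000000000000 refs/changes/01/123401/2
EOF
VAR=285900

# Basic regex: \{1,\} means one or more; double quotes let $VAR expand.
bre=$(grep "refs/changes/[[:digit:]]\{1,\}/$VAR/" "$sample")

# Extended regex: + means one or more.
ere=$(grep -E "refs/changes/[[:digit:]]+/$VAR/" "$sample")

# Single quotes: $VAR is not expanded, so grep looks for the literal text.
single=$(grep 'refs/changes/[[:digit:]]\{1,\}/$VAR/' "$sample" || true)

printf 'bre: %s\nere: %s\nsingle: %s\n' "$bre" "$ere" "$single"
rm -f "$sample"
```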

Property File with Sed regex - Ignore first character for match

I have a test property file with this in it:
-config.test=false
config.test=false
I'm trying to, using sed, update the values of these properties whether they have the - in front of them or not. Originally I was using this, which worked:
sed -i -e "s/#*\(config.test\)\s*=\s*\(.*\)/\1=$(echo "true" | sed -e 's/[\/&]/\\&/g')/" $FILE_NAME
However, since I was basically ignoring all characters before the match, I found that when I had properties with keys that ended in the same value, it'd give me problems. Such as:
# The regex matches both of these
config.test=true
not.config.test=true
Is there a way to either ignore the first character for a match or ignore the initial - specifically?
EDIT:
Adding a little clarification in terms of what I'd want the regex to match:
config.test=false # Should match
-config.test=false # Should match
not.config.test=false # Should NOT match
sed -E 's/^(-?config\.test=).*/\1true/' file
? means zero or one repetition of the preceding item, so the - can be present or not when matching the regexp.
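A quick way to check this against the three cases from the question, as a sketch that reads the lines from a variable instead of a file:

```shell
# The three sample keys from the question.
input=$(printf '%s\n' 'config.test=false' '-config.test=false' 'not.config.test=false')

# ^ anchors the key at the start of the line, so not.config.test is skipped;
# -? allows an optional leading dash, which is kept via \1.
result=$(printf '%s\n' "$input" | sed -E 's/^(-?config\.test=).*/\1true/')
printf '%s\n' "$result"
```

The escaped dot in config\.test also keeps the pattern from matching keys such as configXtest.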
Instead of ignoring the first character, I found a solution with sed and awk that matches a key of a specific length. Sometimes the opposite approach gets the same result more easily.
If sed is your only option, I have two workarounds, depending on your file.
If your file looks like this
$ cat file
config.test=false
-config.test=false
not.config.test=false
you can use this one-liner
sed 's/^\(.\{11,12\}=\)\(.*$\)/\1true/' file
sed anchors at the beginning of each line (^) and captures, in a group \( ... \) for later back referencing, any character (.) repeated 11 or 12 times (\{11,12\}) followed by a =.
This first group is put back via the back reference \1.
The second group, \(.*$\), matches everything after the = up to the end of the line and is dropped; sed writes your desired string true in its place.
This also means that anything that followed the old value on the line, such as a trailing comment, is chopped off.
If you want to avoid this and your file looks like
$ cat file
config.test=false # Should match
-config.test=false # Should match
not.config.test=false # Should NOT match
you can use this one-liner
sed 's/^\(.\{11,12\}=\)\(false\)\(.*$\)/\1true\3/' file
This is like the example before but works with three groups for back referencing.
The content of the former group 2 is now in group 3. So no content after a change from false to true will be chopped.
The new second group \(false\) will be dropped and replaced by the string true.
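To see the difference between the two sed one-liners, here is a small sketch that runs both on the same input. The input lines are taken from the question's clarification; reading them from a variable rather than a file is just for brevity.

```shell
# One key of 11 characters with a trailing comment, one key of 15 characters.
input=$(printf '%s\n' 'config.test=false # Should match' \
                      'not.config.test=false # Should NOT match')

# Two groups: everything after the = is replaced, so the comment is lost.
chopped=$(printf '%s\n' "$input" | sed 's/^\(.\{11,12\}=\)\(.*$\)/\1true/')

# Three groups: only "false" is replaced, the rest of the line comes back via \3.
kept=$(printf '%s\n' "$input" | sed 's/^\(.\{11,12\}=\)\(false\)\(.*$\)/\1true\3/')

printf '%s\n' "$chopped" "$kept"
```

Keep in mind that the length check is tied to this particular key: any other key of 11 or 12 characters would be changed as well.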
If your file looks like the previous example and you are allowed to use awk, you can try this:
awk -F'=' 'length($1)<=12 {sub(/false/,"true")};{print}'
To me this looks much more self-explanatory, but the decision is up to you.
Both sed examples invoke sed only once, which is always good.
The first sed command takes 39 characters to type and the second 50.
The awk command takes 52 characters to type.
Please tell me if this works for you or if you need another solution.

GREP: Extracting all characters from inside double quote

What I did:
grep -E -o -e '"[^"]+"'
It can extract, for example: "Poland" and "New York" but can't extract "Marcos Juárez" due to the existence of 'á'...it cuts the output to "Marcos Ju" and "rez"
How can I prevent this?
I don't think this is a regex problem per se. It could be a Unicode or wide-character issue.
Your regex should be "[^"]+", where [^"] means any character that is NOT a double quote.
I don't know the Unix command line, but what delimits the "[^"]+" parameter?
Is it just spaces?
Try ".*?"; it should match. If not, it's a Unicode problem.
Try:
grep -Po '(?<=\")(.*?)(?=\")'
For me, it outputs all three correctly.
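If the cause is a mismatch between the file's encoding and the current locale (only one possible explanation, and not confirmed in this thread), forcing the C locale is a common workaround. The sketch below assumes a reasonably recent GNU grep, which in the C locale treats every byte as an ordinary character; other implementations may behave differently. The input is made up and written as explicit bytes.

```shell
# "Marcos Juárez" spelled out as UTF-8 bytes (\303\241 is the letter á).
input=$(printf '"Poland" "Marcos Ju\303\241rez"\n')

# LC_ALL=C : match byte by byte, independent of the user's locale
# -a       : never treat the input as binary
# -o       : print only the matched parts, one per line
result=$(printf '%s\n' "$input" | LC_ALL=C grep -a -o '"[^"]*"')
printf '%s\n' "$result"
```

Single quotes around the pattern keep the shell from interpreting the double quotes inside it.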

using sed to copy lines and delete characters from the duplicates

I have a file that looks like this:
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
I want it to look like this
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
I thought I could use sed to do this but I can't figure out how to store something in a buffer and then modify it.
Am I even using the right tool?
Thanks
You don't have to get tricky with regular expressions and replacement strings: use sed's p command to print the line intact, then modify the line and let it print implicitly
sed 'p; s/\.png//'
Glenn Jackman's response is OK, but it also doubles the rows which do not match the expression.
This one, instead, doubles only the rows which matched the expression:
sed -n 'p; s/\.png//p'
Here, -n stands for "print nothing unless explicitly printed", and the p in s/\.png//p forces the print if a substitution was done, but does not force it otherwise.
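The difference between the two commands only shows up on lines without .png. A small sketch, using one line from the question and one made-up line that has no extension:

```shell
# The second line is hypothetical; it has no .png to strip.
input=$(printf '%s\n' '#"Afghanistan.png",' '#"README",')

# Without -n every line is printed twice: once by p, once automatically.
all_doubled=$(printf '%s\n' "$input" | sed 'p; s/\.png//')

# With -n the second copy appears only when the substitution succeeded.
only_matches=$(printf '%s\n' "$input" | sed -n 'p; s/\.png//p')

printf '%s\n' "$all_doubled" '---' "$only_matches"
```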
That is pretty easy to do with sed, and you do not even need to use the hold space (the sed auxiliary buffer). Given the input file below:
$ cat input
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
you should use this command:
sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
The result:
$ sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
This command is just a substitution (s///). The pattern matches #" followed by a run of non-period characters ([^.]*) and then .png",. The non-period characters before .png", are wrapped in the group brackets \( and \), so we can reuse what the group matched. So, this is the to-be-replaced regular expression:
#"\([^.]*\)\.png",
Next comes the replacement part of the command. The & inserts everything that was matched by #"\([^.]*\)\.png", into the changed content. If it were the only element of the replacement part, nothing would be changed in the output. However, following the & there is a newline character, represented by the backslash \ followed by an actual newline, and on the new line we add the #" string followed by the content of the first group (\1) and then the string ",.
This is just a brief explanation of the command. Hope this helps. Also, note that you can use the \n string to represent newlines in some versions of sed (such as GNU sed). It would render a more concise and readable command:
sed 's/#"\([^.]*\)\.png",/&\n#"\1",/' input
I prefer this over Carles Sala and Glenn Jackman's:
sed '/\.png/p;s/\.png//'
Could just say it's personal preference.
Or one can combine both versions and apply the duplication only to lines matching the required pattern:
sed -e '/^#".*\.png",/{p;s/\.png//;}' input