Specify a "-" in a sed pattern - regex

I'm trying to find a '-' character in a line, but this character is used for specifying a range. Can I get an example of a sed pattern that will contain the '-' character?
Also, it would be quicker if I could use a pattern that includes all characters except a space and a tab.

'-' specifies a range only between square brackets.
For example, this:
sed -n '/-/p'
prints all lines containing a '-' character. If you want a '-' to represent itself between square brackets, put it immediately after the [ or before the ]. This:
sed -n '/[-x]/p'
prints all lines containing either a '-' or an 'x'.
This pattern:
[^ <tab>]
matches all characters other than a space and a tab (note that you need a literal tab character, not "<tab>").
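As a quick check of the bracket rules above, here is a minimal sketch that runs a few made-up sample lines through sed. The function names and the sample text are illustrative assumptions, not part of the original answer.

```shell
# Print only lines that contain a '-' or an 'x'.
# The '-' comes first inside the brackets, so it is literal, not a range.
has_dash_or_x() {
    sed -n '/[-x]/p'
}

# Same idea with the '-' placed last; a-z is still a range here.
has_lower_or_dash() {
    sed -n '/[a-z-]/p'
}

printf 'a-b\nBOX\nfox\n' | has_dash_or_x        # prints a-b and fox
printf 'A-B\n123\nabc\n' | has_lower_or_dash    # prints A-B and abc
```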

If you want to find a dash, specify it outside a character class:
/-/
If you want to include it in a character class, make it the first or last character:
/[-a-z]/
/[a-z-]/
If you want to find anything except a blank or tab (or newline), then:
/[^ \t]/
where, I hasten to add, the '\t' is a literal tab character and not backslash-t.
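Typing a literal tab on the command line can be awkward, so one possible workaround is to let printf produce it. The sketch below assumes a POSIX shell; the variable `tab` and the function names are made up for illustration. The second function uses the POSIX `[:blank:]` class (space and tab) as an alternative spelling.

```shell
# Hold a real tab character in a variable; printf '\t' is POSIX.
tab=$(printf '\t')

# Print lines that contain at least one character other than space or tab.
# Double quotes let the shell expand $tab inside the bracket expression.
non_blank_lines() {
    sed -n "/[^ $tab]/p"
}

# Alternative spelling with the POSIX [:blank:] class (space and tab).
non_blank_lines_posix() {
    sed -n '/[^[:blank:]]/p'
}

# The middle line holds only a space, a tab and a space, so it is dropped.
printf 'foo\n \t \nbar baz\n' | non_blank_lines
```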

find "-" in file
awk '/-/' file

What does "%%" mean in a sed regex?

In the code base, there is a line:
echo x/y/z | sed 's%/[^/]*$%%'
It will remove the /z from the input string.
I can't quite understand it.
s is substitution
%/ is quoting /
[^/]*$ matches any run of characters other than / at the end of the line
But what is %% here?
Here's info sed:
The '/' characters may be uniformly replaced by any other single
character within any given 's' command. The '/' character (or whatever
other character is used in its stead) can appear in the REGEXP or
REPLACEMENT only if it is preceded by a '\' character.
So the % is just an arbitrary delimiter. The canonical delimiter is /, but that would collide with your pattern, which itself contains /.
In other words, %/ isn't an escaped /. They're independent characters.
The expression breaks down like this:
s          Replace
%          Delimiter
/[^/]*$    Search pattern
%          Delimiter
(nothing)  Empty replacement string
%          Delimiter
Which is completely analogous to a simple s/foo/bar/:
s          Replace
/          Delimiter
foo        Search pattern
/          Delimiter
bar        Replacement string
/          Delimiter
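To see that the delimiter really is arbitrary, here is a small sketch with three spellings of the same substitution. The function names are hypothetical; the last variant keeps / as the delimiter, so every / in the pattern has to be escaped.

```shell
# Three spellings of the same substitution: strip the last /component.
strip_last_pct()   { sed 's%/[^/]*$%%'; }      # '%' as delimiter, '/' needs no escape
strip_last_pipe()  { sed 's|/[^/]*$||'; }      # another single character works too
strip_last_slash() { sed 's/\/[^\/]*$//'; }    # '/' as delimiter, every '/' escaped

echo x/y/z | strip_last_pct      # x/y
echo x/y/z | strip_last_pipe     # x/y
echo x/y/z | strip_last_slash    # x/y
```

The escaped version is harder to read, which is the usual reason for switching delimiters.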

Delete any non-word character but spaces and a single quote inside the word

I want to clean up some text: remove everything except \w and \s, but keep a single ' inside a word (e.g. keep it in words like don't).
I could do
perl -plE "s/[^\w\s']//g" <<< "'a:b/c d????ef' don't"
which keeps the ', but also keeps it at the beginning or end of a string, e.g. it prints
'abc def' don't
I can't work out how to implement something like (?<\w)'(?=\w), i.e. remove the ' unless it is between two word characters.
The wanted result:
abc def don't
How to do this?
You could do this:
s/[^\w\s']|(?<!\w)'|'(?!\w)//g
Delete everything that is either
a character that is not (a word character or a space or '), or
a ' that is not preceded by a word character, or
a ' that is not followed by a word character
The first clause will match (and remove) all characters that we obviously don't want to keep.
The second and third clause will remove all ' characters unless they're surrounded by word characters on both sides.
You can also use a global match instead of a replacement. That way you only have to describe what you want to keep, and the pattern becomes simpler:
perl -ne"print /[\w\s]|\b'\b/g" <<< "'a:b/c d????ef' don't"

Replace all Occurrences of '.' with '_' before '=' using sed

I have a properties file as below:
build.number=153013
db.create.tablespace=0
db.create.user=0
db.create.schema=0
upgrade.install=1
new.install=0
configure.jboss=0
configure.jbosseap=false
configure.weblogic=1
configure.websphere=0
I need to import these variables into a shell script. As you know, '.' is not a valid character in a shell variable name. How would I use sed to replace all occurrences of '.' before the '=' with '_'? I have managed to replace all occurrences of '.', but some properties have values that contain a '.', and I would like to leave those unmodified.
Any help is appreciated!
Thanks!
You can use
sed -e ':b; s/^\([^=]*\)\./\1_/; tb;'
It replaces stringWithoutEquals. with stringWithoutEquals_ for as long as the match succeeds. In effect, this replaces all the .'s before the = with _.
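Here is a sketch of that loop on two made-up lines, one of which has dots in its value. It is a variant rather than the exact command above: the script is split into separate -e fragments, which tends to be more portable across sed implementations, and the function name is hypothetical.

```shell
# Replace dots with underscores only in the key (the part before '=').
# [^=]* is greedy, so each pass rewrites the last remaining dot in the key;
# 't b' jumps back to label 'b' as long as a substitution succeeded.
dots_to_underscores() {
    sed -e ':b' -e 's/^\([^=]*\)\./\1_/' -e 't b'
}

printf 'db.create.user=0\nversion.string=1.2.3\n' | dots_to_underscores
# db_create_user=0
# version_string=1.2.3
```

Note that on a line with no '=' at all, every dot would be replaced, because nothing stops [^=]* from spanning the whole line.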
You can try this:
sed 's/\.\|\(=.*\)/_\1/g;s/_=/=/' file
The idea is to capture everything from the equals sign onward, so that the rest of the line (including any dots in the value) is consumed in one match. Each dot before the equals sign is replaced with an underscore plus an empty capture group, while the captured tail is written back with an underscore in front of it. The second substitution removes that extra underscore before the equals sign.
This might work for you (GNU sed):
sed 's/=/&\n/;h;y/./_/;G;s/\n.*\n//' file
Mark the line with a newline right after the =. Copy the line to the hold space. Replace all .'s with _'s. Append the original line, then remove the text between the two markers.
Here is an awk solution:
awk -F= '{gsub(/\./,"_",$1)}1' OFS== file
build_number=153013
db_create_tablespace=0
db_create_user=0
db_create_schema=0
upgrade_install=1
new_install=0
configure_jboss=0
configure_jbosseap=false
configure_weblogic=1
configure_websphere=0
It splits each line on =, then replaces every . in the first field with _.
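To check that dots in a value survive, here is a minimal sketch that wraps the same awk program in a function and feeds it a made-up property whose value contains a dot. The function name `normalize_keys` and the `server.host` line are assumptions for illustration.

```shell
# Split on '=', rewrite dots in the key only, and rejoin with '='.
# Assigning to $1 makes awk rebuild the record using OFS.
normalize_keys() {
    awk -F= -v OFS='=' '{ gsub(/\./, "_", $1) } 1'
}

printf 'build.number=153013\nserver.host=example.com\n' | normalize_keys
# build_number=153013
# server_host=example.com
```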

How to regexp match surrounding whitespace or beginning/end of line

I am trying to find lines in a file that contain a / (slash) character which is not part of a word, like this:
grep "\</\>" file
But no luck, even if the file contains the "/" alone, grep does not find it.
I want to be able to match lines such as
some text / pictures
/ text
text /
but not e.g.
/home
Why your approach does not work
\<, \> only match against the beginning (or end, respectively) of a word. That means that they can never match if put adjacent to / (which is not treated as a word-character) – because e.g. \</ basically says "match the beginning of a word directly followed by something other than a word (a 'slash', in this case)", which is impossible.
What will work
This will match / surrounded by whitespace (\s) or beginning/end of line:
egrep '(^|\s)/($|\s)' file
(egrep implies the -E option, which turns on processing of extended regular expressions.)
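A small sketch of the same pattern against the sample lines from the question. It uses grep -E and the POSIX class [[:space:]] in place of egrep and \s, on the assumption that this is a little more portable; the function name is made up.

```shell
# Print lines where '/' stands alone: bounded by whitespace or line start/end.
lone_slash_lines() {
    grep -E '(^|[[:space:]])/($|[[:space:]])'
}

printf 'some text / pictures\n/ text\ntext /\n/home\n' | lone_slash_lines
# prints the first three lines, but not /home
```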
What might also work
The following slightly simpler expression will work if a / is never adjacent to non-word characters (such as *, #, -, and characters outside the ASCII range); it might be of limited usefulness in OP's case:
grep '\B/\B' file
for str in 'some text / pictures' ' /home ' '/ text' ' text /'; do
echo "$str" | egrep '(^|\s)/($|\s)'
done
This will match /:
if the entire input string is /
if the input string starts with / and is followed by at least 1 whitespace
if the input string ends with / and is preceded by at least 1 whitespace
if / is inside the input string surrounded by at least 1 whitespace on either side.
As for why grep "\</\>" file did not work:
\< and \> match the left/right boundaries between words and non-words. However, / does not qualify as a word, because words are defined as a sequence of one or more instances of characters from the set [[:alnum:]_], i.e.: sequences of at least length 1 composed entirely of letters, digits, or _.
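The difference is easy to see by counting matches. This sketch assumes a grep that supports \< and \>, such as GNU grep; the helper `count_matches` is hypothetical.

```shell
# Count matching lines; grep exits non-zero when the count is 0,
# so '|| true' keeps the function usable under 'set -e'.
count_matches() {
    # $1 = pattern, $2 = text
    printf '%s\n' "$2" | grep -c -- "$1" || true
}

count_matches '\<text\>' '/ text'   # 1: the anchors surround a real word
count_matches '\</\>'    '/ text'   # 0: '/' is not a word character
```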
This seems to work for me.
grep -rni " / \| /\|/ " .

What's special about a "space" character in an "expr match" regexp?

In a bash shell, I set line like so:
line="total active bytes: 256"
Now, I just want to get the digits from that line so I do:
echo $(expr match "$line" '.*\([[:digit:]]*\)' )
and I don't get anything. But, if I add a space character before the first backslash in the regexp, then it works:
echo $(expr match "$line" '.* \([[:digit:]]*\)' )
Why?
The space isn't special at all. What's happening is that in the first case, the .* matches the entire string (i.e., it matches "greedily"), including the numbers, and since you've quantified the digits with * (as opposed to \+), that part of the regex is allowed to match 0 characters.
By putting a space before the digit match, the first part can only match up to but not including the last space in the string, leaving the digits to be matched by \([[:digit:]]*\).
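The greedy behaviour can be reproduced side by side. This sketch uses the portable `expr STRING : REGEX` form, which GNU expr treats the same as `expr match`. The variable names are made up, and the third pattern is a hypothetical alternative that avoids .* altogether.

```shell
line="total active bytes: 256"

# Greedy '.*' swallows the digits; the group matches the empty string.
# expr exits non-zero on an empty result, hence '|| true'.
greedy=$(expr "$line" : '.*\([[:digit:]]*\)' || true)

# The literal space stops '.*' at the last space, leaving the digits.
anchored=$(expr "$line" : '.* \([[:digit:]]*\)' || true)

# Alternative: skip the leading non-digits explicitly instead of using '.*'.
explicit=$(expr "$line" : '[^[:digit:]]*\([[:digit:]]*\)' || true)

printf 'greedy=[%s] anchored=[%s] explicit=[%s]\n' "$greedy" "$anchored" "$explicit"
# greedy=[] anchored=[256] explicit=[256]
```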