I would like to remove some strings from filename.
I want to remove every string in bracket but not if there is a string "remix" or "Remix" or "REMIX"
Now I have got
sed "s/\s*\(\s?[A-z0-9. ]*\)//g"
but how to exclude cases when there is remix in string?
You can use a capture group:
sed 's/\(\s*([^)]*remix[^)]*)\)\|\s*(\s\?[a-z0-9. ]*)/\1/gi'
When the "remix branch" doesn't match, the capture group is not defined and the matched part is replaced with an empty string.
When the "remix branch" succeeds, the matched part is replaced by the content of the capture group, so by itself.
Note: if that helps to avoid false positive, you can add word-boundaries around "remix": \bremix\b
pattern details:
\( # open the capture group 1
\s* # zero or more white-spaces
( # a literal parenthesis
[^)]* # zero or more characters that are not a closing parenthesis
remix
[^)]*
)
\) # close the capture group 1
\| # OR
# something else between parenthesis
\s* # note that it is essential that the two branches are able to
# start at the same position. If you remove \s* in the first
# branch, the second branch will always win when there's a space
# before the opening parenthesis.
(\s\?[a-z0-9. ]*)
\1 is the reference to the capture group 1
i makes the pattern case-insensitive
[EDIT]
If you want to do it in a POSIX compliant way, you must use a different approach because several Gnu features are not available, in particular the alternation \| (but also the i modifier, the \s character class, the optional quantifier \?).
This other approach consists to find all eventual characters that are not an opening parenthesis and all eventual substrings enclosed between parenthesis with "remix" inside, followed by eventual white-spaces and an eventual substring enclosed between parenthesis.
As you can see all is optional and the pattern can match an empty string, but it isn't a problem.
All before the parenthesis part to remove is captured in group 1.
sed 's/\(\([^(]*([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)[^ \t(]*\([ \t]\{1,\}[^ \t(]\{1,\}\)*\)*\)\([ \t]*([^)]*)\)\{0,1\}/\1/g;'
pattern details:
\( # open the capture group 1
\(
[^(]* # all that is not an opening parenthesis
# substring enclosed between parenthesis without "remix"
( [^)]* [Rr][Ee][Mm][Ii][Xx] [^)]* )
# Let's reach the next parenthesis without to match the white-spaces
# before it (otherwise the leading white-spaces are not removed)
[^ \t(]* # all that is not a white-space or an opening parenthesis
# eventual groups of white-spaces followed by characters that are
# not white-spaces nor opening parenthesis
\( [ \t]\{1,\} [^ \t(]\{1,\} \)*
\)*
\) # close the capture group 1
\(
[ \t]* # leading white-spaces
([^)]*) # parenthesis
\)\{0,1\} # makes this part optional (this avoid to remove a "remix" part
# alone at the end of the string)
The word boundaries in this mode aren't available too. So the only way to emulate them is to list the four possibilities:
([Rr][Ee][Mm][Ii][Xx]) # poss1
([Rr][Ee][Mm][Ii][Xx][^a-zA-Z][^)]*) # poss2
([^)]*[^a-zA-Z][Rr][Ee][Mm][Ii][Xx]) # poss3
([^)]*[^a-zA-Z][Rr][Ee][Mm][Ii][Xx][^a-zA-Z][^)]*) # poss4
and to replace ([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*) with:
\(poss1\)\{0,\}\(poss2\)\{0,\}\(poss3\)\{0,\}\(poss4\)\{0,\}
Just skip the lines matching "remix":
sed '/([^)]*[Rr][Ee][Mm][Ii][Xx][^)]*)/! s/([^)]*)//g'
where bracket are (US) :[]
sed '/remix\|REMIX\|Remix/ !s/\[[^]]*]//g'
where bracet (ROW): ()
sed '/remix\|REMIX\|Remix/ !s/([^)]*)//g'
assuming:
- there is no internal bracket
- Other form of remix are excluced (ReMix, ...), so line is deleted
- Remix could be any place in title (i love remix) [if needed specify which to take and remove]
Related
I am trying to find in the Notepad++ strings like this:
'',
And convert them into this:
'',
I've made a regular expression to crop the string beginning with cards/ and ending with </a>:
(cards/)([^\s]{1,50})(([\s\.\?\!\-\,])(\w{1,50}))+(\.mp3"></a>)
Or an alternative approach:
(cards/)([^\s]{1,50})([\s\.\?\!\-\,]{0,})([^\s]{1,50})
Both work fine for search, but I can't get the replacement.
The problem is that the number of words in a sentence may vary.
And I can't get the ID of sub-expressions in the double parentheses.
The following format of replacement: \1\2\3... doesn't work, as I can't get the correct ID of the sub-expressions in the double parentheses.
I tried to google the topic, but couldn't find anything. Any advice, link or best of all a full replacement expression will be very much appreciated.
This will replace all spaces after /cards/ with a hyphen and lowercase the filename.
Ctrl+H
Find what: (?:href="/mp3files/cards/|\G)\K(?!\.mp3)(\S+)(?:\h+|(\.mp3))
Replace with: \L$1(?2$2:-)
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
href="/mp3files/cards/ # literally
| # OR
\G # restart fro last match position
) # end group
(?!\.mp3) # negative lookahead, make sure we haven't ".mp3" after this position
\K # forget all we have seen until this position
(\S+) # group 1, 1 or more non spaces
(?: # non capture group
\h+ # 1 or more horizontal spaces
| # OR
(\.mp3) # group 2, literally ".mp3"
) # end group
Replacement:
\L$1 # lowercase content of group 1
(?2 # if group 2 exists (the extension .mp3)
$2 # use it
: # else
- # put a hyphen
) # endif
Screenshot (before):
Screenshot (after):
I'm trying to write a regex that matches strings as the following:
translate("some text here")
and
translate('some text here')
I've done that:
preg_match ('/translate\("(.*?)"\)*/', $line, $m)
but how to add if there are single quotes, not double. It should match as single, as double quotes.
You could go for:
translate\( # translate( literally
(['"]) # capture a single/double quote to group 1
.+? # match anything except a newline lazily
\1 # up to the formerly captured quote
\) # and a closing parenthesis
See a demo for this approach on regex101.com.
In PHP this would be:
<?php
$regex = '~
translate\( # translate( literally
([\'"]) # capture a single/double quote to group 1
.+? # match anything except a newline lazily
\1 # up to the formerly captured quote
\) # and a closing parenthesis
~x';
if (preg_match($regex, $string)) {
// do sth. here
}
?>
Note that you do not need to escape both of the quotes in square brackets ([]), I have only done it for the Stackoverflow prettifier.
Bear in mind though, that this is rather error-prone (what about whitespaces, escaped quotes ?).
In the comments the discussion came up that you cannot say anything BUT the first captured group. Well, yes, you can (thanks to Obama here), the technique is called a tempered greedy token which can be achieved via lookarounds. Consider the following code:
translate\(
(['"])
(?:(?!\1).)*
\1
\)
It opens a non-capturing group with a negative lookahead that makes sure not to match the formerly captured group (a quote in this example).
This eradicates matches like translate("a"b"c"d") (see a demo here).
The final expression to match all given examples is:
translate\(
(['"])
(?:
.*?(?=\1\))
)
\1
\)
#translate\(
([\'"]) # capture quote char
((?:
(?!\1). # not a quote
| # or
\\\1 # escaped one
)* #
[^\\\\]?)\1 # match unescaped last quote char
\)#gx
Fiddle:
ok: translate("some text here")
ok: translate('some text here')
ok: translate('"some text here..."')
ok: translate("a\"b\"c\"d")
ok: translate("")
no: translate("a\"b"c\"d")
You can alternate expression components using the pipe (|) like this:
preg_match ('/translate(\("(.*?)"\)|\(\'(.*?)\'\))/', $line, $m)
Edit: previous also matched translate("some text here'). This should work but you will have to escape the quotes in some languages.
I'm trying to write a regex that matches strings as the following:
translate("some text here")
and
translate('some text here')
I've done that:
preg_match ('/translate\("(.*?)"\)*/', $line, $m)
but how to add if there are single quotes, not double. It should match as single, as double quotes.
You could go for:
translate\( # translate( literally
(['"]) # capture a single/double quote to group 1
.+? # match anything except a newline lazily
\1 # up to the formerly captured quote
\) # and a closing parenthesis
See a demo for this approach on regex101.com.
In PHP this would be:
<?php
$regex = '~
translate\( # translate( literally
([\'"]) # capture a single/double quote to group 1
.+? # match anything except a newline lazily
\1 # up to the formerly captured quote
\) # and a closing parenthesis
~x';
if (preg_match($regex, $string)) {
// do sth. here
}
?>
Note that you do not need to escape both of the quotes in square brackets ([]), I have only done it for the Stackoverflow prettifier.
Bear in mind though, that this is rather error-prone (what about whitespaces, escaped quotes ?).
In the comments the discussion came up that you cannot say anything BUT the first captured group. Well, yes, you can (thanks to Obama here), the technique is called a tempered greedy token which can be achieved via lookarounds. Consider the following code:
translate\(
(['"])
(?:(?!\1).)*
\1
\)
It opens a non-capturing group with a negative lookahead that makes sure not to match the formerly captured group (a quote in this example).
This eradicates matches like translate("a"b"c"d") (see a demo here).
The final expression to match all given examples is:
translate\(
(['"])
(?:
.*?(?=\1\))
)
\1
\)
#translate\(
([\'"]) # capture quote char
((?:
(?!\1). # not a quote
| # or
\\\1 # escaped one
)* #
[^\\\\]?)\1 # match unescaped last quote char
\)#gx
Fiddle:
ok: translate("some text here")
ok: translate('some text here')
ok: translate('"some text here..."')
ok: translate("a\"b\"c\"d")
ok: translate("")
no: translate("a\"b"c\"d")
You can alternate expression components using the pipe (|) like this:
preg_match ('/translate(\("(.*?)"\)|\(\'(.*?)\'\))/', $line, $m)
Edit: previous also matched translate("some text here'). This should work but you will have to escape the quotes in some languages.
Please help create a regular expression that would be allocated "|" character everywhere except parentheses.
example|example (example(example))|example|example|example(example|example|example(example|example))|example
After making the selection should have 5 characters "|" are out of the equation. I want to note that the contents within the brackets should remain unchanged including the "|" character within them.
Considering you want to match pipes that are outside any set of parentheses, with nested sets, here's the pattern to achieve what you want:
Regex:
(?x) # Allow comments in regex (ignore whitespace)
(?: # Repeat *
[^(|)]*+ # Match every char except ( ) or |
( # 1. Group 1
\( # Opening paren
(?: # chars inside:
[^()]++ # a. everything inside parens except nested parens
| # or
(?1) # b. nested parens (recurse group 1)
) #
\) # Until closing paren.
)?+ # (end of group 1)
)*+ #
\K # Keep text out of match
\| # Match a pipe
regex101 Demo
One-liner:
(?:[^(|)]*+(\((?:[^()]++|(?1))\))?+)*+\K\|
regex101 Demo
This pattern uses some advanced features:
Possessive quantifiers
Recursion
Resetting the match start
I am having a lot of trouble interpreting this expression and I am getting really lost trying to read it. can someone help me?
^[^?](?:htaccess|access_log)(?:[.][^/?])?(?:[~])?(?:[?].*)?$
I know that ^ means to start at the beginning of the line, [^?] not matching a "?" i think, and then (?:) not sure what this does or how to interpret the rest of the line. Im thinking that htaccess|access_log means its an or statement so either htacces or access_log. [.][^/?] is a . followed by not a "?" but then what would the earlier [^?] mean...
What would an example of something this matches?
There are plenty of explainers that will breakdown a regular expression for you.
To be concise, the caret inside of a character class [^ ] is the negation operator, meaning match anything NOT in the character class. The ?: placed inside of an opening parentheses is a non-capturing group which specifies that the group is not to be captured, but to group expressions, and | is the alternation operator.
I would recommend taking a look at these sites for basic use of regular expressions.
Regular-Expressions.info
Rexegg (Regex Tutorial)
Regular Expression:
^ # the beginning of the string
[^?] # any character except: '?'
(?: # group, but do not capture:
htaccess # 'htaccess'
| # OR
access_log # 'access_log'
) # end of grouping
(?: # group, but do not capture (optional):
[.] # any character of: '.'
[^/?] # any character except: '/', '?'
)? # end of grouping
(?: # group, but do not capture (optional):
[~] # any character of: '~'
)? # end of grouping
(?: # group, but do not capture (optional):
[?] # any character of: '?'
.* # any character except \n (0 or more times)
)? # end of grouping
$ # before an optional \n, and the end of the string