Regular expression to replace spaces with dashes within a sub string. - regex

I've been struggling to find a way to replace spaces with dashes in a string but only spaces that are within a particular part of the string.
Source:
ABC *This is a sub string* DEF
My attempt at a regular expression:
/\s/g
If I use the regular expression to match spaces and replace I get the following result:
ABC-*This-is-a-sub-string*-DEF
But I only want to replace spaces within the text surrounded by the two asterisks.
Here is what I'm trying to achieve:
ABC *This-is-a-sub-string* DEF
I'm not sure which flavor of regular expressions I'm using, as I'm using find and replace in TextMate with the Regular Expressions option enabled.
It's important to note that the strings that I will be running this regular expression search and replace on will have different text but it's just the spaces within the asterisks that I want to match.
Any help will be appreciated.

To identify spaces that are surrounded by asterisks, the key observation is that, if asterisks appear only in pairs, the spaces you are looking for are always followed by an odd number of asterisks.
The regex
\ (?=[^*]*\*([^*]*\*[^*]*\*)*[^*]*$)
will match the ones that should be replaced. TextMate would have to support look-ahead assertions for this to work.
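As a rough illustration of the odd-number-of-asterisks idea, here is a minimal Python sketch. It assumes Python's `re` module rather than TextMate's engine, and the function name `dash_inside_asterisks` is made up for this example. The only change to the pattern is that the inner group is non-capturing.

```python
import re

# A space that is followed by an odd number of asterisks before the end
# of the string, i.e. a space that sits between an opening and a closing '*'.
INSIDE_ASTERISKS = re.compile(r" (?=[^*]*\*(?:[^*]*\*[^*]*\*)*[^*]*$)")


def dash_inside_asterisks(text: str) -> str:
    """Replace spaces with dashes, but only between paired asterisks."""
    return INSIDE_ASTERISKS.sub("-", text)


result = dash_inside_asterisks("ABC *This is a sub string* DEF")
print(result)  # ABC *This-is-a-sub-string* DEF
```

Like the original pattern, this relies on asterisks appearing only in pairs; a stray asterisk would shift which spaces count as "inside".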

s/(?<!\*)\s(?!\*)(?!$)/-/g
If TextMate supports Perl-style regex commands (I have no experience with it at all, sorry), this is a one-liner that should work.

try this one
/(?<=\*.*)\s(?=.*\*)/g
but it won't work in JavaScript engines without lookbehind support, since it also uses a lookbehind

Try this: \s(\*[^*]*\*)\s. It will match *This is a sub string* in group 1. Then replace with -$1-.

Use this regexp to get spaces from within asterisks
(.)(*(.(\ ).)*)(.)
Take 4th element of the array provided by regex {4} and replace it with dashes.
I find this site very good for creating regular expressions.

It depends on your programming language but in many of them you can use lambda functions with your regular expression replacement statements and thereby perform further replacement on substrings.
Here's an example in Python:
import re
string = "ABC *This is a sub string* DEF"
# For each *...* span, replace the spaces inside it and put the asterisks back
new_string = re.sub(r"\*(.*?)\*", lambda m: '*' + m.group(1).replace(" ", "-") + '*', string)
That should give you ABC *This-is-a-sub-string* DEF.

Related

Regular expression to find and replace wrong quotation marks

I have a document which has been copy/pasted from MS Word. All the quotations are copied as ''something'' which basically is creating a mess in my LaTeX document, hence they have to be ``something''.
Is it possible to make a regular expression that finds all these ''something'' where something can be anything (including symbols, numbers etc.), and a regular expression that replaces it with the correct quotation? I am using Sublime Text which is able to use RegEX directly in the editor.
The regex below matches every string wrapped in doubled single quotes and captures everything after the opening two single quotes (the content plus the closing quotes). Replacing each match with two backticks followed by the contents of group 1 will give you the desired result.
Regex:
''(.*?'')
Replacement string:
``$1
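The same search and replace can be tried outside the editor. This is a small sketch assuming Python's `re` module, where the group reference in the replacement is written `\1` instead of `$1`; the function name `fix_quotes` and the sample sentence are invented for illustration.

```python
import re

# Match ''...'' lazily; group 1 holds the content plus the closing quotes.
WORD_QUOTES = re.compile(r"''(.*?'')")


def fix_quotes(text: str) -> str:
    """Turn ''something'' into the LaTeX form ``something''."""
    return WORD_QUOTES.sub(r"``\1", text)


fixed = fix_quotes("He said ''hello'' and then ''bye''.")
print(fixed)  # He said ``hello'' and then ``bye''.
```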

Is there a smarter way to keep the indentation when replacing characters with a regex?

I want to replace the asterisks in a Markdown list with hyphens.
Example:
1.0
1.1
1.2
2
2.1
2.2
Currently I have a separate regex pattern for up to three levels of indentation set up in Keyboard Maestro for Mac:
I wonder whether there is a smarter way to do this that handles any level of indentation.
In many regular expression search and replace systems, you can refer to a parenthesized group in the regular expression in the replacement, using \1, \2, etc. to refer to each successive group. So for example, in sed you could do:
sed -e 's/\(^[\t ]*\)\*/\1-/'
I'm not sure if Keyboard Maestro gives you that option. It mentions that it uses ICU regular expressions; if it also uses their replacement options, then you can use $1, $2 etc. to refer to the captured groups.
If not, all is not lost. You can use a lookbehind assertion to match the sequence of whitespace before the asterisk without including that whitespace in the match; then just use a single dash as your replacement:
Search for: (?<=^[\t ]*)\*
Replace with: -
You can use submatching groups and reference them in the replacing string like this:
Regular expression matching your lines with list items: ([\t ]*)\*(.*)
The string used for replacement: \1-\2
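To see the capture-group approach work on a whole list at once, here is a minimal sketch assuming Python's `re` module (not Keyboard Maestro). The sample list and the name `asterisks_to_hyphens` are hypothetical. The `re.MULTILINE` flag makes `^` match at the start of every line.

```python
import re

# Group 1 captures the leading indentation so it can be written back unchanged.
BULLET = re.compile(r"^([\t ]*)\*", flags=re.MULTILINE)


def asterisks_to_hyphens(markdown: str) -> str:
    """Replace the leading '*' of each list item with '-', keeping indentation."""
    return BULLET.sub(r"\1-", markdown)


sample = "* one\n  * one-a\n\t* one-b\n* two"
converted = asterisks_to_hyphens(sample)
print(converted)
```

Because the indentation is captured rather than spelled out, one pattern covers every nesting depth.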

Regex detect if a matched comma(,) does not lie in a regex

I am trying to figure out a way to determine if my matched comma (,) does not lie inside a regex. Basically, I do not want to match my character if it lies in a regex.
The regex i have come up with is ,(?<!.+\/)(?!.+\/) but its not quite working.
Any ideas?
I want to skip /some,regex/ but match any other commas.
Edit:
Live example: http://rubular.com/r/WjrwSnmzyP
Here is the regex that will work for you:
,(?!\s)(?=(?:(?:[^/]*\/){2})*[^/]*$)
Live Demo: http://rubular.com/r/37buDdg1tW
Explanation: the pattern matches a comma that is followed by an EVEN number of forward slashes (/). Hence a comma between two slash characters will NOT be matched, while commas outside will be matched (since those are followed by an even number of / characters). The (?!\s) part additionally skips any comma that is followed by whitespace.
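The even-number-of-slashes check can be tried quickly in Python. This sketch assumes Python's `re` module rather than the Ruby engine behind Rubular, and the sample string is made up for illustration.

```python
import re

# A comma not followed by whitespace, with an even number of '/' after it.
OUTSIDE_COMMA = re.compile(r",(?!\s)(?=(?:(?:[^/]*/){2})*[^/]*$)")

sample = "a,b,/c,d/,e"
# Collect the index of every comma that lies outside the /.../ part.
positions = [m.start() for m in OUTSIDE_COMMA.finditer(sample)]
print(positions)  # [1, 3, 9]
```

The comma at index 6, inside `/c,d/`, is followed by a single slash and is therefore skipped.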
A curious thing about regular expressions is that if you want to use them to ignore "something" that is within "something else", you need to match that "something else", prefer matches of it, and then either silently discard or reproduce those matches.
For example, in order to remove all commas from a string unless they are in a regular expression literal—
In Perl:
my $s = "/foo,bar/,baz";
$s =~ s{(/(?:[^/\\]|\\.)+/)|,}{\1}g;
In ECMAScript:
var s = "/foo,bar/,baz";
s = s.replace(/(\/([^\/\\]|\\.)+\/)|,/g, "$1");
or
s = s.replace(new RegExp("(/([^/\\\\]|\\\\.)+/)|,", "g"), "$1");
Note that I am capturing the match for the regular expression literal in the string value, and reproducing it (\1 or $1) if it matched. (If the other part of the alternation – the standalone comma – matched, the empty string is captured, so this simple approach suffices here.)
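The same match-and-reproduce technique can be sketched in Python, assuming the `re` module. Here a replacement function returns group 1 when the regular expression literal matched and an empty string when the standalone comma matched; the name `strip_outside_commas` is hypothetical.

```python
import re

# First alternative: a /.../ literal (with backslash escapes), captured in group 1.
# Second alternative: a standalone comma.
LITERAL_OR_COMMA = re.compile(r"(/(?:[^/\\]|\\.)+/)|,")


def strip_outside_commas(text: str) -> str:
    """Remove commas unless they are inside a /regex/ literal."""
    # group(1) is None when only the comma alternative matched.
    return LITERAL_OR_COMMA.sub(lambda m: m.group(1) or "", text)


cleaned = strip_outside_commas("/foo,bar/,baz")
print(cleaned)  # /foo,bar/baz
```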
For further reading I recommend “Mastering Regular Expressions” by Jeffrey E. F. Friedl. Two rather enlightening example chapters, each from a different edition, are available for free online.

TCL regexp pattern search

I am trying to find a pattern match as below
abc(xxxx):efg(xxxx):xyz(xxxx) where xxxx - [0-9] digits
I used
set string "my string is abc(xxxx):efg(xxxx):xyz(xxxx)"
regexp abc(....):efg(....):xyz(....) $string result_str
it returns 0. Can anyone help?
The problem you've got is that ( and ) have special meaning to regular expressions in Tcl (and many other RE engines besides) in that they denote a capturing sub-RE. To make the characters “normal”, they have to be escaped with a backslash, and that means that it's best to put the regular expression in braces (because backslashes are general Tcl metacharacters).
Thus:
% set string "my string is abc(xxxx):efg(xxxx):xyz(xxxx)"
% regexp {abc\(....\):efg\(....\):xyz\(....\)} $string
1
If you want to also capture the contents of those parentheses, you need a slightly more complex RE:
regexp {abc\((....)\):efg\((....)\):xyz\((....)\)} $string \
all abc_bit efg_bit xyz_bit
Note that those .... sequences always match exactly four characters, but it's better to be more specific. To match any number of digits in each case:
regexp {abc\((\d+)\):efg\((\d+)\):xyz\((\d+)\)} $string -> abc efg xyz
When using regexp to extract bits of a string, it's pretty common to use -> as a (rather strange) variable name for the whole string match; it looks mnemonically like it's saying “send the pieces extracted to these variables”.
I haven't worked with Tcl, but it seems you need to escape the ( and ). Also, if you are sure that the x's will be digits, use \d{4} instead of the four dots (....). Based on this, the updated regex you could try is
abc\(\d{4}\):efg\(\d{4}\):xyz\(\d{4}\)
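The escaping rule is not specific to Tcl. As a cross-check, here is a minimal sketch assuming Python's `re` module, with invented digit values standing in for the xxxx placeholders. Escaped parentheses match literally, while the unescaped inner ones capture.

```python
import re

# \( and \) match literal parentheses; the plain ( ) around \d{4} are capture groups.
PATTERN = re.compile(r"abc\((\d{4})\):efg\((\d{4})\):xyz\((\d{4})\)")

# Hypothetical input: the digits below are sample values, not from the question.
text = "my string is abc(1234):efg(5678):xyz(9012)"
match = PATTERN.search(text)
parts = match.groups() if match else ()
print(parts)  # ('1234', '5678', '9012')
```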

What regular expression can remove duplicate items from a string?

Given a string of identifiers separated by :, is it possible to construct a regular expression to extract the unique identifiers into another string, also separated by :?
How is it possible to achieve this using a regular expression? I have tried s/(:[^:])(.*)\1/$1$2/g with no luck, because the (.*) is greedy and skips to the last match of $1.
Example: a:b:c:d:c:c:x:c:c:e:e:f should give a:b:c:d:x:e:f
Note: I am coding in perl, but I would very much appreciate using a regex for this.
In .NET which supports infinite repetition inside lookbehind, you could search for
(?<=\b\1:.*)\b(\w+):?
and replace all matches with the empty string.
Perl (at least Perl 5) only supports fixed-length lookbehinds, so you can try the following (using lookahead, with a subtly different result):
\b(\w+):(?=.*\b\1:?)
If you replace that with the empty string, all previous repetitions of a duplicate entry will be removed; the last one will remain. So instead of
a:b:c:d:x:e:f
you would get
a:b:d:x:c:e:f
If that is OK, you can use
$subject =~ s/\b(\w+):(?=.*\b\1:?)//g;
Explanation:
First regex:
(?<=\b\1:.*): Check if you can match the contents of backreference no. 1, followed by a colon, somewhere before in the string.
\b(\w+):?: Match an identifier (from a word boundary to the next :), optionally followed by a colon.
Second regex:
\b(\w+):: Match an identifier and a colon.
(?=.*\b\1:?): Then check whether you can match the same identifier, optionally followed by a colon, somewhere ahead in the string.
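The lookahead variant also runs unchanged in engines other than Perl. This is a small sketch assuming Python's `re` module; it shows the "last occurrence wins" ordering described above. The function name `drop_earlier_duplicates` is made up for the example.

```python
import re

# An identifier plus colon that occurs again somewhere later in the string.
EARLIER_DUPLICATE = re.compile(r"\b(\w+):(?=.*\b\1:?)")


def drop_earlier_duplicates(ids: str) -> str:
    """Remove every occurrence of an identifier except its last one."""
    return EARLIER_DUPLICATE.sub("", ids)


deduped = drop_earlier_duplicates("a:b:c:d:c:c:x:c:c:e:e:f")
print(deduped)  # a:b:d:x:c:e:f
```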
Check out: http://www.regular-expressions.info/duplicatelines.html
Always a useful site when thinking about any regular expression.
$str = q!a:b:c:d:c:c:x:c:c:e:e:f!;
1 while($str =~ s/(:[^:]+)(.*?)\1/$1$2/g);
say $str
output :
a:b:c:d:x:e:f
Here's an awk version; no regex needed.
$ echo "a:b:c:d:c:c:x:c:c:e:e:f" | awk -F":" '{for(i=1;i<=NF;i++)if($i in a){continue}else{a[$i];printf $i}}'
abcdxef
Split the fields on ":", go through the split fields, and store each element in an array. If an element already exists in the array, skip it; otherwise print it. You can easily translate this into Perl code.
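The split-and-remember approach can be sketched in Python as well. This version is an assumption-labeled variation: unlike the awk one-liner it joins the result with ":" again, and it relies on `dict` preserving insertion order (guaranteed since Python 3.7). The name `unique_ids` is hypothetical.

```python
def unique_ids(ids: str, sep: str = ":") -> str:
    """Keep the first occurrence of each identifier, preserving order."""
    # dict.fromkeys drops repeated keys while keeping first-seen order.
    return sep.join(dict.fromkeys(ids.split(sep)))


unique = unique_ids("a:b:c:d:c:c:x:c:c:e:e:f")
print(unique)  # a:b:c:d:x:e:f
```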
If the identifiers are sorted, you may be able to do it using lookahead/lookbehind. If they aren't, then this is beyond the computational power of a regex. Now, just because it's impossible with formal regex doesn't mean it's impossible if you use some perl specific regex feature, but if you want to keep your regexes portable you need to describe this string in a language that supports variables.