How can I replace a pattern in a list of strings in Boost.Build?
In GNU make, this could be done with a substitution reference to change a file extension, or with patsubst in general.
Here is an example using the rule "replace-list" from the built-in regex module:
SWIG_SOURCES = [ glob *.i ] ;
import regex ;
SWIG_GENERATED_CPP_FILES = [ regex.replace-list $(SWIG_SOURCES) : \\.i$ : _wrap.cpp ] ;
Suppose the file example_file.i is in the directory: glob adds its name to the list SWIG_SOURCES, and it becomes example_file_wrap.cpp in the list SWIG_GENERATED_CPP_FILES.
The \\ makes the . a literal dot; without it, . would match any character.
The $ matches the end of the string.
More information is available in the documentation of the built-in regex module.
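If it helps to see the same idea outside Boost.Build, here is a rough Python analogue using re.sub over a list. It only illustrates what replace-list does; it is not Boost.Build code, and the helper name and file names are hypothetical. The second call shows why the escape and the $ anchor matter.

```python
import re

def replace_list(strings, pattern, replacement):
    """Apply one regex substitution to every string in a list
    (a rough Python analogue of regex.replace-list)."""
    compiled = re.compile(pattern)
    return [compiled.sub(replacement, s) for s in strings]

# Hypothetical names standing in for the result of [ glob *.i ]
swig_sources = ["example_file.i", "other.i"]

# "\.i$": a literal dot, then "i", at the end of the string
print(replace_list(swig_sources, r"\.i$", "_wrap.cpp"))
# ['example_file_wrap.cpp', 'other_wrap.cpp']

# Unescaped and unanchored, "." matches any character, so the "pi"
# in "api.h" is replaced as well
print(replace_list(["api.h"], r".i", "_wrap.cpp"))
# ['a_wrap.cpp.h']
```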
I want to remove special characters like ! @ # $ % ^ * _ = + | \ } { [ ] : ; < > ? / from a string field.
I used the "Replace in String" step and enabled "use RegEx". However, I do not know the right syntax to put in "Search" to remove all of these characters from the string. If I put only one character in "Search", it is removed from the string. How can I remove all of them?
This is the picture of how I did it:
According to the documentation, the regex flavor is Java, so you can use
\p{Punct}
See the Java regex syntax reference:
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
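Python's re module does not support \p{Punct}, but a roughly equivalent sketch builds a character class from string.punctuation, which holds the same 32 ASCII punctuation characters. The helper name strip_punctuation is only for illustration; in the Replace in String step you would still just enter \p{Punct} as the search pattern.

```python
import re
import string

# string.punctuation is !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ , the same
# ASCII set that Java's \p{Punct} matches by default.
# re.escape makes characters such as ] \ ^ - safe inside [...].
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"

def strip_punctuation(text):
    """Remove every ASCII punctuation character from text."""
    return re.sub(PUNCT_CLASS, "", text)

print(strip_punctuation("a!b@c#d$e%f^g*h_i=j+k|l\\m}n{o[p]q:r;s<t>u?v/w"))
# abcdefghijklmnopqrstuvw
```

Spaces and letters are left alone; only characters in the punctuation set are removed.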
I am extracting some information about package body files, and now I need to get the package references (packages invoked) in the same file. How can I do this in Notepad++ with regex?
I understand that it's possible with regex by marking a search with
pac_\w*
And unmark lines, but I need only the package names, not the lines.
For example, if I have this code:
pac_test1.function1(...);
if pac_finally.f_result then
pac_execute.p_result;
v_load := pac_gui.f_show_result(pnum1, pnum2);
.
.
I expect to get this:
pac_test1
pac_finally
pac_execute
pac_gui
Or, preferably:
pac_test1, pac_finally, pac_execute, pac_gui
Notepad++ may not be the right tool for this job. The typical approach would be to search for something like pac_[^.]+, but Notepad++ only replaces the text it matches in place, so every line without a match would also have to be removed, and that is tricky.
So I recommend using a scripting language such as PHP. Here is a PHP script that finds all matches:
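<?php
// The code portion from the question, as one string
$script = "pac_test1.function1(...);
if pac_finally.f_result then
pac_execute.p_result;
v_load := pac_gui.f_show_result(pnum1, pnum2);";
// Match "pac_" followed by everything up to the next dot
preg_match_all("/pac_[^.]+/", $script, $matches);
// $matches[0] holds the full-pattern matches
print_r($matches[0]);
echo implode(",", $matches[0]);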
Output:
Array
(
[0] => pac_test1
[1] => pac_finally
[2] => pac_execute
[3] => pac_gui
)
pac_test1,pac_finally,pac_execute,pac_gui
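The same extraction in Python is essentially one call to re.findall. This is a sketch, not part of the original answer: the order-preserving de-duplication with dict.fromkeys is an extra step in case a package is referenced more than once, and the join produces the comma-separated form asked for in the question.

```python
import re

script = """pac_test1.function1(...);
if pac_finally.f_result then
pac_execute.p_result;
v_load := pac_gui.f_show_result(pnum1, pnum2);"""

# Same pattern as the PHP version: "pac_" followed by anything up to the next dot
packages = re.findall(r"pac_[^.]+", script)

# Keep the first occurrence of each package, in order of appearance
unique_packages = list(dict.fromkeys(packages))

print(", ".join(unique_packages))
# pac_test1, pac_finally, pac_execute, pac_gui
```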
Ctrl+H
Find what: (?:^|\G).*?(pac_\w+)(?:(?!pac_).)*(\R|\z)?
Replace with: "$1, " (without the quotes; note the space after the comma)
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?:^|\G) # beginning of line OR restart from last match position
.*? # 0 or more any character but newline, not greedy
(pac_\w+) # group 1, pac_ followed by 1 or more word characters, the package
(?:(?!pac_).)* # Tempered greedy token: consume characters as long as no other pac_ starts here
(\R|\z)? # optional group 2, any kind of linebreak or end of file
Replacement:
$1, # content of group 1 (the package name), then a comma and a space
Given:
pac_test1.function1(...); pac_test2
if pac_finally.f_result then
pac_execute.p_result;
v_load := pac_gui.f_show_result(pnum1, pnum2);
Result for the given example:
pac_test1, pac_test2, pac_finally, pac_execute, pac_gui,
Although I have consulted several threads, I cannot get my code to work. Maybe someone can help me find a solution here.
I would like to look for files in a directory whose names start with
start.contain <- "VP01_SPSG2015_Experimental" ## beginning of the file name
and ends with
stop.contain <- ".vmrk" ## the file extension
What pattern do I have to feed to
findfile <- list.files(path, pattern = ???)
to find my file?
You can use
start.contain <- "VP01_SPSG2015_Experimental" ## beginning of the file name
stop.contain <- "[.]vmrk" ## the file extension
findfile <- list.files(path, pattern = paste0("^", start.contain, ".*", stop.contain, "$"))
The ^ anchors the match at the beginning of the string, and $ anchors it at the end. .* matches zero or more of any character.
Note that in a regex, . must be escaped or used in a character class ([.]) to be treated as a literal. Thus, you should use "[.]vmrk" or "\\.vmrk".
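For comparison, here is a Python sketch of the same anchored filter. It is an analogue, not R's list.files: re.escape is used so that both the prefix and the extension are matched literally, and the helper names matching_names and find_files and the sample file names are hypothetical.

```python
import os
import re

def matching_names(names, start, ext):
    """Keep names that begin with `start` and end with `ext`,
    mirroring the R pattern ^start.*ext$ (dots matched literally)."""
    pattern = re.compile("^" + re.escape(start) + ".*" + re.escape(ext) + "$")
    return [name for name in names if pattern.search(name)]

def find_files(path, start, ext):
    """Apply matching_names to the entries of a directory."""
    return sorted(matching_names(os.listdir(path), start, ext))

# Hypothetical directory listing
names = [
    "VP01_SPSG2015_Experimental_run1.vmrk",
    "VP01_SPSG2015_Experimental_run1.vhdr",
    "VP01_SPSG2015_Experimental_runXvmrk",   # no literal dot before vmrk
    "old_VP01_SPSG2015_Experimental.vmrk",   # does not start with the prefix
]
print(matching_names(names, "VP01_SPSG2015_Experimental", ".vmrk"))
# ['VP01_SPSG2015_Experimental_run1.vmrk']
```

With a real directory you would call find_files(path, "VP01_SPSG2015_Experimental", ".vmrk") instead.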
I need to clip out all occurrences of the pattern '--' that are inside single quotes in a long string (leaving intact the ones that are outside single quotes).
Is there a RegEx way of doing this?
(using it with an iterator from the language is OK).
For example, starting with
"xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
I should end up with:
"xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb"
So I am looking for a regex that could be run from the following languages as shown:
+-------------+------------------------------------------+
| Language | RegEx |
+-------------+------------------------------------------+
| JavaScript | input.replace(/someregex/g, "") |
| PHP | preg_replace('/someregex/', "", input) |
| Python | re.sub(r'someregex', "", input) |
| Ruby | input.gsub(/someregex/, "") |
+-------------+------------------------------------------+
I found another way to do this in an answer by Greg Hewgill at Qn138522.
It is based on using this regex (adapted to contain the pattern I was looking for):
--(?=[^\']*'([^']|'[^']*')*$)
Greg explains:
"What this does is use the non-capturing match (?=...) to check that the character x is within a quoted string. It looks for some nonquote characters up to the next quote, then looks for a sequence of either single characters or quoted groups of characters, until the end of the string. This relies on your assumption that the quotes are always balanced. This is also not very efficient."
The usage examples would be:
JavaScript: input.replace(/--(?=[^']*'([^']|'[^']*')*$)/g, "")
PHP: preg_replace('/--(?=[^\']*\'([^\']|\'[^\']*\')*$)/', "", $input)
Python: re.sub(r"--(?=[^']*'([^']|'[^']*')*$)", "", input)
Ruby: input.gsub(/--(?=[^\']*'([^']|'[^']*')*$)/, "")
I have tested this for Ruby and it provides the desired result.
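A quick way to check the adapted pattern is to run it in Python against the example from the question. Like Greg's original, this assumes quotes are balanced and never escaped; the function name is made up for this sketch.

```python
import re

# A "--" is removed only if what follows it is: some non-quotes, one quote,
# then nothing but complete '...' pairs up to the end, i.e. an odd number
# of quotes follows, so the "--" sits inside a quoted region.
QUOTED_DASHES = re.compile(r"--(?=[^']*'([^']|'[^']*')*$)")

def remove_quoted_dashes(text):
    return QUOTED_DASHES.sub("", text)

sample = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(remove_quoted_dashes(sample))
# xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb
```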
This cannot be done with regular expressions, because you need to maintain state on whether you're inside single quotes or outside, and regex is inherently stateless. (Also, as far as I understand, single quotes can be escaped without terminating the "inside" region).
Your best bet is to iterate through the string character by character, keeping a boolean flag that records whether you're inside a quoted region, and remove the --'s that way.
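A minimal Python sketch of that character-by-character approach, assuming single quotes are never escaped (escaped quotes, as mentioned above, would need extra handling). The function name is hypothetical.

```python
def remove_dashes_in_quotes(text):
    """Remove every "--" that occurs inside single quotes.

    Walks the string once, flipping a flag at each single quote.
    Assumes quotes are balanced and never escaped.
    """
    out = []
    inside = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            inside = not inside          # entering or leaving a quoted region
            out.append(ch)
            i += 1
        elif inside and text.startswith("--", i):
            i += 2                       # drop the pair of dashes
        else:
            out.append(ch)
            i += 1
    return "".join(out)

sample = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(remove_dashes_in_quotes(sample))
# xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb
```

Because the flag is explicit state, this is easy to extend, for example to skip a quote preceded by a backslash.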
If bending the rules a little is allowed, this could work:
import re
p = re.compile(r"((?:^[^']*')?[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")
txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(re.sub(p, r'\1-', txt))
Output:
xxxx rt / $ 'dfdf-fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '-ggh-' vcbcvb
The regex:
( # Group 1
(?:^[^']*')? # Start of string, up till the first single quote
[^']*? # Inside the single quotes, as few characters as possible
(?:
'[^']*' # No double dashes inside these single quotes, jump to the next.
[^']*?
)*? # as few as possible
)
(-{2,}) # The dashes themselves (Group 2)
If there were different delimiters for the start and end, you could use something like this:
-{2,}(?=[^'`]*`)
Edit: I realized that if the string does not contain any quotes, it will match all double dashes in the string. One way of fixing it would be to change
(?:^[^']*')?
in the beginning to
(?:^[^']*'|(?!^))
Updated regex:
((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})
Hm. There might be a way in Python if there are no quoted apostrophes, given that there is the (?(id/name)yes-pattern|no-pattern) construct in regular expressions, but it goes way over my head currently.
Does this help?
def remove_double_dashes_in_apostrophes(text):
    return "'".join(
        part.replace("--", "") if (ix & 1) else part
        for ix, part in enumerate(text.split("'")))
It seems to work for me. It splits the input text into parts on apostrophes and replaces "--" only in odd-numbered parts, i.e. parts preceded by an odd number of apostrophes. Note that part numbering starts from zero!
You can use the following sed script, I believe:
:again
s/^\(\([^']*'[^']*'\)*[^']*'[^']*\)--/\1/
t again
Store that in a file (rmdashdash.sed) and use whatever exec mechanism your scripting language provides to run the equivalent of this shell command:
sed -f rmdashdash.sed < file_containing_your_input_data
What the script does is:
:again <-- just a label
s/^\(\([^']*'[^']*'\)*[^']*'[^']*\)--/\1/
substitute: starting at the beginning of the line, skip any number of complete '...' pairs, then an opening quote and the non-quote characters after it, and delete the -- that follows. Only a -- inside a quoted region can match, and one is removed per pass.
t again <-- if the substitution succeeded, branch back to the label and try again.
Note that this script will convert '----' into '', since it is a sequence of two --'s within quotes. However, '---' will be converted into '-'.
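The same substitute-and-repeat loop can be sketched in Python with re.sub inside a while loop. This sketch anchors the pattern at the start of the string and skips whole '...' pairs, so a -- between two quoted regions is never touched; it assumes balanced, unescaped quotes, and the names are illustrative.

```python
import re

# From the start: any number of complete '...' pairs, then an opening quote
# and non-quote characters, then "--". Only a "--" inside quotes can match.
QUOTED_DASHES = re.compile(r"^((?:[^']*'[^']*')*[^']*'[^']*)--")

def strip_quoted_dashes(text):
    # Like sed's "t again": repeat the substitution until nothing changes
    while True:
        new_text = QUOTED_DASHES.sub(r"\1", text, count=1)
        if new_text == text:
            return text
        text = new_text

sample = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(strip_quoted_dashes(sample))
# xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb
```

As with the sed version, '----' becomes '' and '---' becomes '-'.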
Ain't no school like old school.