Calling function during regular expression replacement

I need to decode a string coming from json. Special characters are encoded as hex unicode (e.g. the apostrophe is /u0027).
I'm trying to accomplish this with the following expression:
regexprep('Can/u0027t add the category','/u(\d{4})',native2unicode(hex2dec(strrep('$1','/u',''))))
but I get the following error:
Error using hex2dec (line 38)
Input string found with characters other than 0-9, a-f, or A-F.
because hex2dec receives '$1' as value and not the result of strrep('$1','/u','').
If I try
regexprep('Can/u0027t add the category','/u(\d{4})',strrep('$1','/u',''))
I get, correctly, 'Can0027t add the category'. If I try with
regexprep('Can/u0027t add the category','/u(\d{4})',native2unicode(hex2dec(strrep('/u0027','/u',''))))
I get the right result (but with a fixed decoding, obviously).
I don't understand why the result of strrep is not the input argument of hex2dec.

You're tricking yourself with your debugging experiments. The $1 expansion in the replacement string is performed by regexprep itself, on the string it receives. MATLAB does not expand it before calling the functions in the argument expression, so those functions just see the literal string '$1'. If the result of those functions still contains a $1, it gets passed into regexprep and expanded there. So, for example, your test case with the bare strrep replaces nothing (since its input is the string '$1') and passes the bare $1 string right back into regexprep.
You have two issues. One is easy: you don't need strrep at all, since the parentheses mark just the hex digits as the token. $1 expands with no /u. Test it:
regexprep('Can/u0027t add the category','/u(\d{4})','$1')
results in 'Can0027t add the category'.
Now for the harder one. As previously noted, you can't call normal functions on the $1 and have them do anything. However, MATLAB provides a special regexp syntax to call functions from inside the replacement string. Here is the documentation:
http://www.mathworks.com/help/matlab/matlab_prog/dynamic-regular-expressions.html
In summary, ${cmd($1)} expands to calling the MATLAB function cmd on the replacement token to generate the replacement string. So putting it all together:
regexprep('Can/u0027t add the category', '/u(\d{4})', '${native2unicode(hex2dec($1))}')
ans = Can't add the category
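For comparison, here is a minimal sketch of the same "call a function on the captured token" idea in Python, where re.sub accepts a callable as the replacement. It is only an illustration and not part of the MATLAB answer: the helper name decode_slash_u is hypothetical, and the pattern assumes four hex digits ([0-9a-fA-F]{4}) rather than \d{4}, so that codes such as /u00e9 are matched as well.

```python
import re


def decode_slash_u(text: str) -> str:
    """Replace each /uXXXX escape with the character it encodes."""

    def to_char(match: re.Match) -> str:
        # group(1) holds only the hex digits, without the leading "/u".
        return chr(int(match.group(1), 16))

    # Assumption: four hex digits, so letters a-f are accepted too.
    return re.sub(r"/u([0-9a-fA-F]{4})", to_char, text)


print(decode_slash_u("Can/u0027t add the category"))
```

As in the MATLAB version, the function runs once per match, after the regex engine has captured the token, which is exactly what the plain nested function call in the question could not do.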

Related

Ruby Regex on Active Directory String

I have a string that represents multiple DNs for Active Directory but has been separated by commas instead of ;
The String:
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
I am trying to write a regex that will match on both ou=App1 and not the ou=App2 but then also make the , after dc=internal become a ;
Is this possible?
The result would be:
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal;
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal;

Using #strip and #sub to Clean Up Your LDIF Data
Really, the "correct" answer would be to get valid LDIF in the first place, and then parse it as such with a gem like Net::LDAP. However, the changes you want to your existing file are fairly trivial. For example, we'll start by assigning the String data from your question to a variable named ldif using a here-document literal:
ldif = <<~'LDIF'
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
LDIF
You can now select and normalize the lines you want. Iterate over the String with String#each_line, keep the matching lines with #select and a Regexp lookahead assertion, then clean up each kept line with String#strip and String#sub, storing the results in a matching_apps Array.
This all sounds much more complicated than it is. Consider the following method chain, which is really just a one-liner wrapped for readability:
matching_apps =
ldif.each_line.select { _1.match? /ou=App1(?=[,;]?$?)/ }
.map { _1.strip.sub /[,;]$/, ";" }
#=>
["CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal;",
"CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal;"]
The use of String#strip and String#sub will help to ensure that all lines are normalized the way you want, including the trailing semicolons. However, those trailing semicolons are likely to cause problems in subsequent steps, so I'd probably recommend removing them as well.
Note: You can stop reading here if you just want to solve your immediate question as originally posted. The rest of the answer covers additional considerations related to data normalization, and provides some examples on how and why you might want to strip the semicolons as well.
Why and How to Normalize without Semicolons
You can change the replacement passed to #sub to an empty String (i.e. "") to remove the trailing separators instead of turning them into semicolons. Normalizing without the semicolons now may save you the trouble of having to clean up those lines again later when you iterate over the Array of results stored in matching_apps.
For example, if you need to rejoin lines with commas, interpolate the lines within other String objects in subsequent steps, or do anything else where those stored semicolons may come as a surprise, it's better to deal with them sooner rather than later. If you really need the trailing semicolons, it's very easy to use String#concat or String interpolation to add them back. Stray characters in a String are a common source of bugs, though, and are best avoided unless you're sure you'll always need that semicolon at the end.
Example 1: Output Where Semicolons Might be Unexpected
For example, suppose you want to use the results to format output for a command-line client where a trailing semicolon wouldn't be expected. The following works nicely because the semicolons are already stripped:
matching_apps =
ldif.each_line.select { _1.match? /ou=App1(?=[,;]?$?)/ }
.map { _1.strip.sub /[,;]$/, "" }
printf "Make the following calls:\n\n"
matching_apps.each_with_index do |dn, idx|
puts %(#{idx.succ}. ldapsearch -D '#{dn}' [opts])
end
This would print out:
Make the following calls:
1. ldapsearch -D 'CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal' [opts]
2. ldapsearch -D 'CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal' [opts]
without having to first strip any trailing semicolons that might not work with the printed command, tool, or other output.
Examples of Rejoining with Commas and Semicolons
On the other hand, you can just as easily rejoin the Array elements with a comma or semicolon if you want. Consider the following two examples:
matching_apps.join ", "
#=> "CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal, CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal"
p format("(%s)", matching_apps.join("; "))
#=> "(CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal; CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal)"
Keep Flexibility in Mind
If the String objects in your Array still had the trailing semicolons, you'd have to do something about them. So, unless you already know what you plan to do with each String, and whether or not the semicolons will be needed, it's probably best to keep them out of matching_apps in the first place to optimize for flexibility. That's just an opinion, to be sure, but definitely one worth considering.
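The same select-then-normalize approach can be sketched in Python, purely as a cross-language illustration of the technique above. The function name select_dns and its ou parameter are hypothetical, and the lookahead here is slightly stricter than the Ruby one: it requires a separator or end of line after the OU value, so that ou=App10 would not be picked up by a search for App1.

```python
import re

raw = """\
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
"""


def select_dns(text: str, ou: str = "App1") -> list[str]:
    """Return the DNs that contain ou=<ou>, without trailing separators."""
    # The lookahead demands ",", ";" or end of string after the OU value.
    pattern = re.compile(rf"(?:^|,)ou={re.escape(ou)}(?=[,;]|$)")
    selected = []
    for line in text.splitlines():
        dn = line.strip().rstrip(",;")  # drop whitespace and trailing , or ;
        if dn and pattern.search(dn):
            selected.append(dn)
    return selected


matching_apps = select_dns(raw)
# Add the separator only at the point where it is actually needed.
print(";\n".join(matching_apps) + ";")
```

Keeping the stored values free of separators and adding them at join time is the same design choice the answer argues for.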

Powershell: Replace all occurrences of different substrings starting with same Unicode char (Regex?)

I have a string:
[33m[TEST][90m [93ma wonderful testorius line[90m ([37mbite me[90m) which ends here.
You are not able to see it (as stackoverflow will remove it when I post it) but there is a special Unicode char before every [xxm where xx is a variable number and [ as well as m are fixed. You can find the special char here: https://gist.githubusercontent.com/mlocati/fdabcaeb8071d5c75a2d51712db24011/raw/b710612d6320df7e146508094e84b92b34c77d48/win10colors.cmd
So, it is like this (the special char is displayed here with a $):
$[33m[TEST]$[90m $[93ma wonderful testorius line$[90m ($[37mbite me$[90m) which ends here.
Now, I want to remove all $[xxm substrings in this line as it is only for colored monitor output but should not be saved to a log file.
So the expected outcome should be:
[TEST] a wonderful testorius line (bite me) which ends here.
I tried to use RegEx but I don't understand it (perhaps it is extra confusing due to the special char and the open bracket), and I am not able to use wildcards in a normal .Replace("this","with_that") operation.
How am I able to accomplish this?

In this simple case, the following -replace operation will do, but note that this is not sufficient to robustly remove all variations of ANSI / Virtual Terminal escape sequences:
# Sample input.
# Note: `e is used as a placeholder for ESC and is replaced with actual ESC chars ([char] 0x1b).
# In PowerShell (Core) 7+, "..." strings directly understand `e as ESC.
$formattedStr = '`e[33m[TEST]`e[90m `e[93ma wonderful testorius line`e[90m (`e[37mbite me`e[90m) which ends here.' -replace '`e', [char] 0x1b
# \x1b is a regex escape sequence that expands to an ESC char.
$formattedStr -replace '\x1b\[\d*m'
Generally speaking, it's advisable to look for options on programs producing such for-display-formatted strings to make them output plain-text strings instead, so that the need to strip escape sequences after the fact doesn't even arise.
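As a cross-language illustration of the same regex, here is a minimal Python sketch. The name strip_sgr is hypothetical, and the pattern is widened slightly to [0-9;]* so that multi-parameter color sequences such as ESC[1;31m are also removed; that widening is an assumption beyond the original answer. Like the -replace call above, it only handles color/style sequences ending in m, not every ANSI escape sequence.

```python
import re

ESC = "\x1b"

# ESC, "[", optional digits and semicolons, then "m" (color/style sequences).
SGR_PATTERN = re.compile(r"\x1b\[[0-9;]*m")


def strip_sgr(text: str) -> str:
    """Remove ANSI color/style sequences of the form ESC[...m."""
    return SGR_PATTERN.sub("", text)


colored = (
    f"{ESC}[33m[TEST]{ESC}[90m {ESC}[93ma wonderful testorius line"
    f"{ESC}[90m ({ESC}[37mbite me{ESC}[90m) which ends here."
)
print(strip_sgr(colored))
```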

Modifying a QString that contains a "\"

I'm trying to modify a QString. The QString that I'm trying to modify is
"\002"
However when I try to modify it, the string either gets entirely deleted or shows no change.
I've tried
String.split("\"");
String.remove("\"");
String.remove(QChar('\''));
for some reason Qt requires that I add an extra " or ' in order to compile and not produce errors
What I currently have is this
string = pointer->data.info.get_type();
which according to the debugger returns "\002"
string = string.remove(QChar('\''));
the remove functionality does nothing afterwards.
I'm expecting to remove the \ from the string, but either it gets entirely deleted or nothing happens. What could be the problem, and how do I modify the QString so that it contains just the numerical value?

You're currently asking Qt to remove " from your string, not \. To remove \, you'll have to escape it, just like you escaped ", i.e. remove("\\").

First of all, your string "\002" does not contain any backslash, quotes, or apostrophes.
Read about C++ string literals: \002 is an escape sequence.
Note that \nnn represents an arbitrary octal value!
So your literal contains only one character, with decimal value 2. This is the ASCII control code STX (start of text).
As a result, this code:
String.split("\"");
String.remove("\"");
String.remove(QChar('\''));
won't split or remove anything, since the string does not contain quote characters or apostrophes. It also does not try to split on or remove a backslash, since "\"" and '\'' are again escape sequences, just of a different kind.
Now remember that the debugger shows unprintable characters in escaped form so that you can see the actual content. In a live application the user will see nothing or some strange glyph.
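Python string literals use the same octal escape syntax, so the point can be checked quickly outside Qt. This is only an illustrative sketch in a different language, and the variable names are hypothetical; it shows that the literal is a single character with code 2, and that the numeric value comes from the character code rather than from removing a backslash.

```python
s = "\002"  # octal escape: one character with code point 2 (STX)

print(len(s))      # number of characters in the string
print(ord(s))      # numeric code of that single character
print("\\" in s)   # is there an actual backslash in the string?
print(repr(s))     # escaped form, similar to what a debugger displays

# To obtain the numeric value as text, convert the character code itself.
numeric_text = str(ord(s))
print(numeric_text)
```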

How to use VI to remove occurrence of character on lines matching regex?

I'm trying to change the case of method names for some functions from lowercase_with_underscores to lowerCamelCase for lines that begin with public function get_method_name(). I'm struggling to get this done in a single step.
So far I have used the following
:%s/\(get\)\([a-zA-Z]*\)_\(\w\)/\1\2\u\3/g
However, this only replaces one _ character at a time. What I would like is a search and replace that does something like the following:
Identify all lines containing the string public function [gs]et.
On these lines, perform the following search and replace :s/_\(\w\)/\u\1/g
EDIT:
Suppose I have lines get_method_name() and set_method_name($variable_name) and I only want to change the case of the method name and not the variable name, how might I do that? The get_method_name() is more simple of course, but I'd like a solution that works for both in a single command. I've been able to use :%g/public function [gs]et/ . . . as per the solution listed below to solve for the get_method_name() case, but unfortunately not the set_method_name($variable_name) case.

If I've understood you correctly (I don't know why the things you've tried haven't worked), you can use :g to run an Ex command on every line matching a pattern.
Your example would be something like:
:%g/public function [gs]et/:s/_\(\w\)/\u\1/g
Update:
To match only the method names, we can use the fact that there will only be method names before the first $, as this looks to be PHP.
To do that, we can use a negative lookbehind, \@<!:
:%g/public function [gs]et/:s/\(\$.\+\)\@<!_\(\w\)/\u\2/g
The \@<! looks behind for any $ followed by one or more characters, and _\(\w\) only matches if no such $ is found before it.
Bonus points(?):
To do this for multiple buffers stick a bufdo in front of the %g

You want to use a substitute with an expression (:h sub-replace-expression).
Match the complete string you want to process, then pass that string to a second substitute call that actually changes it:
:%s/\(get\|set\)\zs_\w\+/\=substitute(submatch(0), '_\([A-Za-z]\)', '\U\1', 'g')
Running the above on
get_method_name($variable_name)
set_method_name($variable_name)
returns
getMethodName($variable_name)
setMethodName($variable_name)
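The two-level idea (match the whole method name first, then rewrite only inside that match) can also be sketched in Python, where re.sub takes a function as the replacement. This is an illustration of the technique rather than a Vim command; the helper name camelize_accessors is hypothetical.

```python
import re


def camelize_accessors(line: str) -> str:
    """Convert get_foo_bar / set_foo_bar to getFooBar / setFooBar."""

    def fix_name(match: re.Match) -> str:
        prefix, tail = match.group(1), match.group(2)
        # Inner substitution: runs only on the matched method-name tail.
        return prefix + re.sub(r"_([A-Za-z])", lambda m: m.group(1).upper(), tail)

    # Outer match: "get" or "set" followed by the underscore-separated rest.
    return re.sub(r"\b(get|set)(_\w+)", fix_name, line)


print(camelize_accessors("public function set_method_name($variable_name)"))
```

Because the inner substitution only ever sees the text captured by the outer match, $variable_name is left untouched without needing a lookbehind.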

To have vi replace sad with happy on all lines in a file:
:1, $ s/sad/happy/g
(It is the :1, $ range before the substitute command that instructs vi to execute the command on every line in the file.)

Replace a pattern based off of the integer in the pattern in Vim

I'm trying to convert a bunch of .textile files into their equivalent .markdown files.
I would like a vim search/replace command to replace all h1., h2., h3., etc. patterns with the associated number of # characters. So, h1. would become #, h2. would become ## and so forth.
I think what I want to use is the \=repeat command, but I'm a bit lost as to what arguments to pass it.
Here is what I have so far. It replaces the correct matches, but it just deletes them and gives me errors:
:1,$s/h\d./\=repeat('#',submatch(0))
What are the proper arguments to pass to the \=repeat command?

This line may help you:
%s/\vh(\d)\./\=repeat('#',submatch(1))
You used submatch(0), which is the whole matched string: the h, the number, and any one character. (That is another problem: you should escape the period, since an unescaped . matches any character.) So repeat() did not receive just the number, and the command won't do what you were expecting.
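The same "repeat a character as many times as the captured digit says" logic, sketched in Python for comparison. This is an illustration, not a Vim command: the function name is hypothetical, and anchoring the pattern to the start of a line and limiting the digit to 1-6 are assumptions that go beyond the original substitute.

```python
import re


def textile_headings_to_markdown(text: str) -> str:
    """Turn leading 'hN.' markers into N '#' characters."""
    return re.sub(
        r"^h([1-6])\.",                       # group 1 is the digit only
        lambda m: "#" * int(m.group(1)),      # repeat '#' that many times
        text,
        flags=re.MULTILINE,
    )


print(textile_headings_to_markdown("h1. Title\nh3. Section"))
```

As in the corrected Vim command, the key point is that the count comes from the capture group holding the digit, not from the whole match.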
