Ruby Regex on Active Directory String - regex

I have a string that represents multiple DNs for Active Directory but has been separated by commas instead of ;
The String:
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
I am trying to write a regex that will match on both ou=App1 and not the ou=App2 but then also make the , after dc=internal become a ;
Is this possible?
The result would be:
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal;
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal;

Using #strip and #sub to Clean Up Your LDIF Data
Really, the "correct" answer would be to get valid LDIF in the first place, and then parse it as such with a gem like Net::LDAP. However, the changes you want to your existing file are fairly trivial. For example, we'll start by assigning the String data from your question to a variable named ldif using a here-document literal:
ldif = <<~'LDIF'
CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
LDIF
You can now modify and match the lines from the String that you want with String#each_line to iterate, and String#gsub and a Regexp lookahead assertion to find and collect the lines you want using Array#select on the output from #each_line, and storing the results into a matching_apps Array.
This all sounds much more complicated than it is. Consider the following method chain, which is really just a one-liner wrapped for readability:
matching_apps =
ldif.each_line.select { _1.match? /ou=App1(?=[,;]?$?)/ }
.map { _1.strip.sub /[,;]$/, ";" }
#=>
["CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal;",
"CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal;"]
The use of String#strip and String#sub will help to ensure that all lines are normalized the way you want, including the trailing semicolons. However, this is likely to cause problems in subsequent steps, so I'd probably recommend removing those trailing semicolons as well.
Note: You can stop reading here if you just want to solve your immediate question as originally posted. The rest of the answer covers additional considerations related to data normalization, and provides some examples on how and why you might want to strip the semicolons as well.
Why and How to Normalize without Semicolons
You can replace the final substitution from #sub with an empty String (e.g. "") to remove the trailing semicolons (if present). Normalizing without the semicolons now may save you the trouble of having to clean up those lines again later when you iterate over the Array of results stored in matching_apps from Array#select.
For example, if you need to rejoin lines with commas, interpolate the lines within other String objects in subsequent steps, or do anything where those stored semicolons may be an unexpected surprise it's better to deal with it sooner rather than later. If you really need the trailing semicolons, it's very easy to use String#concat or other forms of String interpolation to add them back, but having unexpected characters in a String can be a source of unexpected bugs that are best avoided unless you're sure you'll always need that semicolon at the end.
Example 1: Output Where Semicolons Might be Unexpected
For example, suppose you want to use the results to format output for a command-line client where a trailing semicolon wouldn't be expected. The following works nicely because the semicolons are already stripped:
matching_apps =
ldif.each_line.select { _1.match? /ou=App1(?=[,;]?$?)/ }
.map { _1.strip.sub /[,;]$/, "" }
printf "Make the following calls:\n\n"
matching_apps.each_with_index do |dn, idx|
puts %(#{idx.succ}. ldapsearch -D '#{dn}' [opts])
end
This would print out:
Make the following calls:
1. ldapsearch -D 'CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal' [opts]
2. ldapsearch -D 'CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal' [opts]
without having to first strip any trailing semicolons that might not work with the printed command, tool, or other output.
Examples of Rejoining with Commas and Semicolons
On the other hand, you can just as easily rejoin the Array elements with a comma or semicolon if you want. Consider the following two examples:
matching_apps.join ", "
#=> "CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal, CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal"
p format("(%s)", matching_apps.join("; "))
#=> "(CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal; CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal)"
Keep Flexibility in Mind
If the String objects in your Array still had the trailing semicolons, you'd have to do something about them. So, unless you already know what you plan to do with each String, and whether or not the semicolons will be needed, it's probably best to keep them out of matching_apps in the first place to optimize for flexibility. That's just an opinion, to be sure, but definitely one worth considering.

Related

complex search/delete/move/replace operation using sed?

after several hours of searching and experimenting, I'm hoping someone can either help me or rub my nose in a post I've missed which acctually would be helpful as well come to think of it...
Problem:
I've made a quick&dirty fix in several dozens of php scripts (we use to enhance smarty capabilities) with security checks.
Example of input(part1):
///// SMARTY AUTH /////
$auth['model'] = isset($params['model']) ? $params['model'] : null;
$auth['requiredLevel'] = isset($params['requiredlevel']) ? $params['requiredlevel'] : null;
$auth['baseAuthorizationLevel'] = isset($params['_authorizationlevel']) ? $params['_authorizationlevel'] : null;
$auth['defaultRequiredLevel'] = AuthorizationLevel::AULE_WRITE;
$auth['baseModel'] = $smarty->getTemplateVars('model');
///// SMARTY AUTH /////
...which i'd like to replace with a much cleaner solution we've come up with. Now here's the rub; in one section of the file there's a block of lines, luckily with very distinct delimiter lines, but in one of those lines is a piece of code that needs to be merged with a replacement string which replaces a second pattern in a line which follows the before-said block, with optionally a variable number of lines in between.
I'm having trouble figuring out how to piece this nested code together as the shorthand code of sed is quite confusing to me.
So far I've tried to assemble the code needed to capture the first block, but sed keeps giving me the same error each time; extra characters after command
here are some of the attempts I've made:
sed -n 'p/^\/\/\/\/\/ SMARTY AUTH \/\/\/\/\/\\n.*\\n.*\\n.*\\n.*AULE_\([A-Z_]*\);$^.*$^^\/\/\/\/\/ SMARTY AUTH \/\/\/\/\/$/' function.xls_form.php
sed -n 'p/\(^.*SMARTY AUTH.*$^.*$^.*$^.*$^.*AULE_\([A-Z_]*\);$^.*$^.*SMARTY AUTH.*$/' function.xls_form.php
the second part is relatively easy compared to the first;
sed -ei'.orig' 's/RoleContextAuthorizations::smartyAuth(\$auth)/$smarty->hasAccess(\$params,AuthorizationLevel::AULE_\1)/' *.php
where \1 would be the matched snippet from the first part...
Edit:
The first codeblock is an example of input part 1 which needs to be removed; part 2 is RoleContextAuthorizations::smartyAuth($auth) which needs to be replaced with $smarty->hasAccess($params, AuthorizationLevel::AULE_<snippet from part1>)
/edit
Hoping somebody can point me in the right direction, Many thanks in advance!!!
The hold space is going to be key to solving this. You can copy material from the pattern space (where sed normally works) into the hold space, and do various operations with the hold space, etc.
You need to find the AuthorizationLevel::AULE_WRITE type text within the block markers, and copy that to the hold space, and then delete the text within the block markers. And then separately find the other pattern and replace it with information from the hold space.
Given that the markers use slashes, it is also time to use a custom search marker which is introduced by a backslash. The following could be in a file script.sed, to be used as:
sed -f script.sed function.xls_form.php
When you're sure it's working, you can play with -i options to overwrite the original.
\%///// SMARTY AUTH /////%,\%///// SMARTY AUTH /////% {
/.*\(AuthorizationLevel::AULE_[A-Z]\{1,\}\).*/{
s//$smarty->hasAccess($params,\1);/
x
}
d
}
/RoleContextAuthorizations::smartyAuth($auth)/x
The first line searches for the start and end marker, using \% to change the delimiter to %. There's then a group of actions in braces. The second line searches for the authorization level and starts a second group of actions. The substitute command replaces the line with the desired output line. The x swaps the pattern space and the hold space, copying the desired output line to the hold space (and copying the empty hold space to the pattern space — it's x for eXchange pattern and hold spaces). This has saved the AuthorizationLevel information. The inner block ends; the outer block deletes the line and continues the execution. Note that there's no need to escape the $ symbol most of the time — it would matter if it was at the end of a pattern (there's a difference between /a\$/ and /a$/, but no difference between /b$c/ and /b\$c/).
The last line then looks for the RoleContextAuthorizations line and swaps it with the hold space. Everything else is just let through.
Given a data file containing:
Gibberish
Rhubarb
///// SMARTY AUTH /////
$auth['model'] = isset($params['model']) ? $params['model'] : null;
$auth['requiredLevel'] = isset($params['requiredlevel']) ? $params['requiredlevel'] : null;
$auth['baseAuthorizationLevel'] = isset($params['_authorizationlevel']) ? $params['_authorizationlevel'] : null;
$auth['defaultRequiredLevel'] = AuthorizationLevel::AULE_WRITE;
$auth['baseModel'] = $smarty->getTemplateVars('model');
///// SMARTY AUTH /////
More gibberish
More rhubarb - it is good with strawberries, especially in yoghurt
RoleContextAuthorizations::smartyAuth($auth);
Trailing gibbets — ugh; worse are trailing giblets
Finish - EOF
The output from sed -f script.sed data is:
$ sed -f script.sed data
Gibberish
Rhubarb
More gibberish
More rhubarb - it is good with strawberries, especially in yoghurt
$smarty->hasAccess($params,AuthorizationLevel::AULE_WRITE);
Trailing gibbets — ugh; worse are trailing giblets
Finish - EOF
$
I think that's what was wanted.
You can convert the file of sed script into a single line of gibberish, but that's left as an exercise for the reader — it isn't very hard, but GNU sed and BSD (macOS) sed have different rules for when you need semicolons as part of a single line command; you were warned. There are also differences in the rules for the -i option between the GNU and BSD variants of sed.
If you have to preserve some portions of the RoleContextAuthorizations::smartyAuth line, you have to work harder, but it can probably be done. For example, you can add the hold space to the current pattern space with the G command, and then edit the information into the right places. It is simplest if every place the line occurs needs to look the same apart from the AULE_XYZ string — that's what I've assumed here.
Also, note that using x rather than h or g is lazy — but doesn't matter if there's only one RoleContextAuthorizations::smartyAuth line. Using the alternatives would mean that if a file has multiple RoleContextAuthorizations::smartyAuth lines, then you'd be able to make the same substitution in each, unless there's another ///// SMARTY AUTH ///// in the file.

Escape white space in boost::fs::path

What it says on the tin. Is there a cleverer way to replace white spaces in a boost::fs::path that does not require a regex?
EDIT as an example:
_appBundlePath = boost::fs::path("/path/with spaces/here");
regex space(" ");
string sampleFilename = regex_replace((_appBundlePath/"audio/samples/C.wav").string(), space, "\\ ");
Question: is there a way that avoids using a regex? Seems like an overkill to me.
EDIT 2 My issue is when passing a string to Pure Data via libpd. PD will interpret a space as a separator, so my string will be chopped up into multiple symbols. Surrounding it with double quotes won't work, and I'm not even sure that escaping white space would, but it's worth a shot.
The cleverest way is not to do it.
For example, use execve instead of system (so you can pass arguments in an array, no need for shell escaping). See e.g. How can I escape variables sent to the 'system' command in C++?
Or, if you e.g. talk to a database server, do not concatenate your queries but bind parameters into a prepared statement. Again this precludes the need for any escaping.
Avoiding escaping avoids a whole slew of security issues (RCE, SQLi etc.)
If you must, probably just do
"'" + replace_all(path.string(), "'", "''") + "'"
This would be fine for e.g. bash shells
For anything else, find out which characters need escaping and use the existing library functions that suit the goal, e.g.
http://en.cppreference.com/w/cpp/io/manip/quoted
https://dev.mysql.com/doc/refman/5.7/en/mysql-real-escape-string.html
... etc.

How can I use Regex to parse irregular CSV and not select certain characters

I have to handle a weird CSV format, and I have been running into problems. The string I have been able to work out thus far is
(?:\s*(?:\"([^\"]*)\"|([^,]+))\s*?)+?
My files are often broken and irregular, since we have to deal with OCR'd text which is usually not checked by our users. Therefore, we tend to end up with lots of weird things, like a single " within a field, or even a newline character(which is why I am using Regex instead of my previous readLine()-based solution). I've gotten it to parse most everything correctly, except it captures [,] [,]. How can I get it to NOT select fields with only a single comma? When I try and have it not select commas, it turns "156,000" into [156] and [000]
The test string I've been using is
"156,000","",""i","parts","dog"","","Monthly "running" totals"
The ideal desire capture output is
[156,000],[],[i],[parts],[dog],[],[Monthly "running" totals]
I can do with or without the internal quotes, since I can always just strip them during processing.
Thank you all very much for your time.
Your CSV is indeed irregular and difficult to parse. I suggest you do 2 replacements first to your data.
// remove all invalid double ""
input = Regex.Replace(input, #"(?<!,|^)""(?=,|$)|(?<=,)""(?!,|$)", "\"");
// now escape all inner "
input = Regex.Replace(input, #"(?<!,|^)"(?!,|$)", #"\\\"");
// at this stage your have proper CSV data and I suggest using a good .NET csv parser
// to parse your data and get individual values
Replacement 1 demo
Replacement 2 demo

TCL: Backslash issue (regsub)

I have an issue while trying to read a member of a list like \\server\directory
The issue comes when I try to get this variable using the lindex command, that proceeds with TCL substitution, so the result is:
\serverdirectory
Then, I think I need to use a regsub command to avoid the backslash substitution, but I did not get the correct proceedure.
An example of what I want should be:
set mistring "\\server\directory"
regsub [appropriate regular expresion here]
puts "mistring: '$mistring'" ==> "mistring: '\\server\directory'"
I have checked some posts around this, and keep the \\ is ok, but I still have problems when trying to keep always a single \ followed by any other character that could come here.
UPDATE: specific example. What I am actually trying to keep is the initial format of an element in a list. The list is received by an outer application. The original code is something like this:
set mytable $__outer_list_received
puts "Table: '$mytable'"
for { set i 0 } { $i < [llength $mitabla] } { incr i } {
set row [lindex $mytable $i]
puts "Row: '$row'"
set elements [lindex $row 0]
puts "Elements: '$elements'"
}
The output of this, in this case is:
Table: '{{
address \\server\directory
filename foo.bar
}}'
Row: '{
address \\server\directory
filename foo.bar
}'
Elements: '
address \\server\directory
filename foo.bar
'
So I try to get the value of address (in this specific case, \\server\directory) in order to write it in a configuration file, keeping the original format and data.
I hope this clarify the problem.
If you don't want substitutions, put the problematic string inside curly braces.
% puts "\\server\directory"
\serverdirectory
and it's not what you want. But
% puts {\\server\directory}
\\server\directory
as you need.
Since this is fundamentally a problem on Windows (and Tcl always treats backslashes in double-quotes as instructions to perform escaping substitutions) you should consider a different approach (otherwise you've got the problem that the backslashes are gone by the time you can apply code to “fix” them). Luckily, you've got two alternatives. The first is to put the string in {braces} to disable substitutions, just like a C# verbatim string literal (but that uses #"this" instead). The second is perhaps more suitable:
set mistring [file nativename "//server/directory"]
That ensures that the platform native directory separator is used on Windows (and nowadays does nothing on other platforms; back when old MacOS9 was supported it was much more magical). Normally, you only need this sort of thing if you are displaying full pathnames to users (usually a bad idea, GUI-wise) or if you are passing the name to some API that doesn't like forward slashes (notably when going as an argument to a program via exec but there are other places where the details leak through, such as if you're using the dde, tcom or twapi packages).
A third, although ugly, option is to double the slashes. \\ instead of \, and \ instead of \, while using double quotes. When the substitution occurs it should give you what you want. Of course, this will not help much if you do the substitution a second time.

RegEx for VB.net

I have a txt file with content
$NETS
P3V3_AUX_LGATE; PQ6.8 PU37.2
U335_PIN1; R3328.1 U335.1
$END
need to be updated in this format, and save back to another txt file
$NETS
'P3V3_AUX_LGATE'; PQ6.8 PU37.2
'U335_PIN1'; R3328.1 U335.1
$END
NOTE: number of lines may go up to 10,000 lines
My current solution is to read the txt file line by line, detect the presence of the ";" and newline character and do the changes.
Right now i have a variable that holds ALL the lines, is there other way something like Replace via RegEx to do the changes without looping thru each line, this way i can readily print the result
and follow up question, which one is more efficient?
Try
ResultString = Regex.Replace(SubjectString, "^([^;\r\n]+);", "'$1';", RegexOptions.Multiline)
on your multiline string.
This will find any string (length one or more) at the start of a line up until the first semicolon if there is one and replace it with its quoted equivalent.
It should be more efficient than looping through the string line by line as you're doing now, but if you're in doubt, you'd have to profile it.
You could probably find all the matches using something like \w+; but I don't know how you'd be able to do a replace on that using Regex.Replace to add the 's but keep the original match.
However, if you already have it as one variable, you don't have to read the file again, either you could make your code find all ;s and then find the previous newline for each, or you could use a String.Split on newlines to split the variable you've already got into lines.
And if you want to get it back to one variable you can just use String.Join.
Personally I'd normally use the String.Split (and possibly the String.Join if needed) method, since I think that would make the code easy to read.
I would say Yes! this can be done with Regular expressions. Make sure you got the "multiline" option turned on and craft your regular expression using some capture groups to ease the work.
I can however say this will NOT be the optimal one. Since you mention the amount of lines you could be processing, it seems 'resource wise' smarter to use a streaming approach instead of the in memory approach.
Taking the Regex approach (and this took 15 mins so please don't think this is an optimal solution, just prove it would work)
private static Regex matcher = new Regex(#"^\$NETS\r\n(?<entrytitle>.[^;]*);\s*(?<entryrest>.*)\r\n(?<entrytitle2>.[^;]*);\s*(?<entryrest2>.*)\r\n\$END\r\n", RegexOptions.Compiled | RegexOptions.Multiline);
static void Main(string[] args)
{
string newString = matcher.Replace(ExampleFileContent, new MatchEvaluator(evaluator));
}
static string evaluator(Match m)
{
return String.Format("$NETS\r\n'{0}'; {1}\r\n'{2}'; {3}\r\n$END\r\n",
m.Groups["entrytitle"].Value,
m.Groups["entryrest"].Value,
m.Groups["entrytitle2"].Value,
m.Groups["entryrest2"].Value);
}
Hope this helps,