Regular expression to parse LDAP DN

I have the following string:
cn=abcd,cn=groups,dc=domain,dc=com
Can a regular expression be used here to extract the string after the first cn= and before the first ,? In the example above the answer should be abcd.

/cn=([^,]+),/
Most languages will expose the match as $1 or matches[1].
If for some reason you can't use subscripts, you can strip the prefix and the suffix instead:
$x =~ s/^cn=//;
$x =~ s/,.*$//;
That's a way to do it in two steps.
If you were parsing it out of a log with sed,
sed -n -r '/cn=/s/^cn=([^,]+),.*$/\1/p' < logfile > dumpfile
will get you what you want. (The extra commands are there to print only matching lines.)
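For comparison, here is a minimal Python sketch of the capture-group approach described above. The function name first_cn is made up for this illustration, and the sketch assumes the first value contains no escaped commas.

```python
import re

def first_cn(dn):
    """Return the value of a leading cn= component, or None if absent."""
    # re.match anchors at the start of the string; [^,]+ stops at the first comma.
    match = re.match(r"cn=([^,]+),", dn, flags=re.IGNORECASE)
    return match.group(1) if match else None

print(first_cn("cn=abcd,cn=groups,dc=domain,dc=com"))  # abcd
```

If the DN does not start with cn=, the function returns None rather than raising.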

/^cn=([^,]+),/

Also, look for a pre-built LDAP parser.

Yeah, using perl/java syntax cn=([^,]*),. You'd then get the 1st group.

I had to work that out in PHP.
Since an LDAP string can be lengthy and have many attributes, I thought I would share how I use it in a project.
I wanted to use:
CN=username,OU=UNITNAME,OU=Region,OU=Country,DC=subdomain,DC=domain,DC=com
And turn it into:
array (
[CN] => array( username )
[OU] => array( UNITNAME, Region, Country )
[DC] => array ( subdomain, domain, com )
)
Here is how I built my method.
/**
* Read an LDAP DN and return its components grouped by attribute.
*
* Takes care of character escaping and unescaping.
*
* Using:
* CN=username,OU=UNITNAME,OU=Region,OU=Country,DC=subdomain,DC=domain,DC=com
*
* ldap_explode_dn() would normally return:
* Array (
* [count] => 7
* [0] => CN=username
* [1] => OU=UNITNAME
* [2] => OU=Region
* [3] => OU=Country
* [4] => DC=subdomain
* [5] => DC=domain
* [6] => DC=com
* )
*
* Returns instead a manageable array:
* array (
* [CN] => array( username )
* [OU] => array( UNITNAME, Region, Country )
* [DC] => array ( subdomain, domain, com )
* )
*
*
* @author gabriel at hrz dot uni-marburg dot de 05-Aug-2003 02:27 (part of the character replacement)
* @author Renoir Boulanger
*
* @param string $dn The DN
* @return array
*/
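function parseLdapDn($dn)
{
    $parsr = ldap_explode_dn($dn, 0);
    //$parsr[] = 'EE=Sôme Krazï string';
    //$parsr[] = 'AndBogusOne';
    $out = array();
    if ($parsr === false) {
        // ldap_explode_dn() returns false when the DN cannot be parsed.
        return $out;
    }
    foreach ($parsr as $key => $value) {
        // Skip the 'count' entry and anything that is not attribute=value.
        if ($key === 'count' || strpos($value, '=') === false) {
            continue;
        }
        // Limit to 2 parts so values containing "=" stay intact.
        list($prefix, $data) = explode('=', $value, 2);
        // Decode \XX hex escapes. The /e modifier was removed in PHP 7,
        // so a callback is used instead.
        $data = preg_replace_callback(
            '/\\\\([0-9A-Fa-f]{2})/',
            function ($m) {
                return chr(hexdec($m[1]));
            },
            $data
        );
        $out[$prefix][] = $data;
    }
    return $out;
}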
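The same grouping idea can be sketched in Python without an LDAP library. This is only an illustration: parse_ldap_dn is a hypothetical name, the split on commas that are not preceded by a backslash is a simplification (it does not cover every escaping rule for DNs), and each \XX escape is decoded as a single code point, so multi-byte UTF-8 sequences are not reassembled.

```python
import re

def parse_ldap_dn(dn):
    """Group the attribute=value parts of a DN by attribute name."""
    out = {}
    # Assumption: components are separated by commas not preceded by a backslash.
    for rdn in re.split(r"(?<!\\),", dn):
        if "=" not in rdn:
            continue  # ignore anything that is not attribute=value
        prefix, data = rdn.split("=", 1)
        # Decode \XX hex escapes, one code point per escape.
        data = re.sub(r"\\([0-9A-Fa-f]{2})",
                      lambda m: chr(int(m.group(1), 16)), data)
        out.setdefault(prefix.strip(), []).append(data)
    return out

dn = "CN=username,OU=UNITNAME,OU=Region,OU=Country,DC=subdomain,DC=domain,DC=com"
print(parse_ldap_dn(dn))
```

For real directory data, a dedicated LDAP parser is still the safer choice, as suggested earlier in the thread.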

Related

pattern with at least 8 characters/uppercase/number

I'm currently using Symfony2 and I need a pattern with these conditions:
- min 8 characters, max 20 characters
- must contain at least 1 letter
- must contain at least 1 number
- may contain special characters like !@#$%^&*()_+
I tried this code but it doesn't work:
/**
* Encrypted password. Must be persisted.
* @Assert\Regex(
* pattern = "/^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[?!-/_/=:;§]).{8,20}+$/i",
* htmlPattern="/^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[?!-/_/=:;§]).{8,20}$/",
* match=true,
* message="message error ")
*
* @var string
*/
protected $password;
Did you try something like this?
/**
* @Assert\Regex(
* pattern = "/^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9!@#$%^&*()_]{8,20}$/",
*
* rest of options...
*
* )
*/
Thank you so much for your answer.
I had to override the regex for $plainPassword instead of $password because I'm using FOSUserBundle.
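The rules from the question can also be checked outside Symfony. The following Python sketch is an illustration only: the name is_valid_password is invented here, and the allowed special characters are taken from the list in the question (including +).

```python
import re

# Two lookaheads require a digit and a letter; the character class plus {8,20}
# restricts both the allowed characters and the overall length.
PASSWORD_RE = re.compile(
    r"(?=.*[0-9])(?=.*[A-Za-z])[A-Za-z0-9!@#$%^&*()_+]{8,20}"
)

def is_valid_password(candidate):
    # fullmatch makes explicit ^ and $ anchors unnecessary.
    return PASSWORD_RE.fullmatch(candidate) is not None

print(is_valid_password("abc12345"))  # True
print(is_valid_password("abcdefgh"))  # False (no digit)
```

Note that the quantifier is applied directly to the character class. Wrapping a `+`-quantified group in {8,20} would no longer enforce the 20-character maximum.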

Regex pattern to match groups starting with pattern

I am extracting data from a text stream that is structured as follows:
/1-<id>/<recType>-<data>..repeat n times../1-<id>/#-<data>..repeat n times..
In the above, the "/1" field starts each record and can be followed by any number of fields, each with a recType from 2 to 9. Every field starts with a "/".
For example:
/1-XXXX/2-YYYY/9-ZZZZ/1-AAAA/3-BBBB/5-CCCC/8=NNNN/9=DDDD/1-QQQQ/2-WWWW/3=PPPP/7-EEEE
So, there are three groups of data above
1=XXXX 2=YYYY 9=ZZZZ
1=AAAA 3=BBBB 5=CCCC 8=NNNN 9=DDDD
1=QQQQ 2=WWWW 3=PPPP 7=EEEE
The data is simplified here. I know for certain that it contains only [A-Z0-9. ], but it can be of variable length (not just 4 characters as in the example).
Now, the following expression sort of works, but it only captures the first 2 fields of each group and none of the remaining fields...
/1-(?'fld1'[A-Z]+)/((?'fldNo'[2-9])-(?'fldData'[A-Z0-9\. ]+))
I know I need some sort of quantifier in there somewhere, but I do not know what or where to place it.
You can use a regex to match these blocks using 2 .NET regex features: 1) capture collection and 2) multiple capturing groups with the same name in the pattern. Then, we'll need some Linq magic to combine the captured data into a list of lists:
(?<fldNo>1)-(?'fldData'[^/]+)(?:/(?<fldNo>[2-9])[-=](?'fldData'[^/]+))*
Details:
(?<fldNo>1) - Group fldNo matching 1
- - a hyphen
(?'fldData'[^/]+) - Group "fldData" capturing 1+ chars other than /
(?:/(?<fldNo>[2-9])[-=](?'fldData'[^/]+))* - zero or more sequences of:
/ - a literal /
(?<fldNo>[2-9]) - a digit from 2 to 9 (Group "fldNo")
[-=] - a - or =
(?'fldData'[^/]+)- 1+ chars other than / (Group "fldData")
See C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var str = "/1-XXXX/2-YYYY/9-ZZZZ/1-AAAA/3-BBBB/5-CCCC/8=NNNN/9=DDDD/1-QQQQ/2-WWWW/3=PPPP/7-EEEE";
var res = Regex.Matches(str, @"(?<fldNo>1)-(?'fldData'[^/]+)(?:/(?<fldNo>[2-9])[-=](?'fldData'[^/]+))*")
.Cast<Match>()
.Select(p => p.Groups["fldNo"].Captures.Cast<Capture>().Select(m => m.Value)
.Zip(p.Groups["fldData"].Captures.Cast<Capture>().Select(m => m.Value),
(first, second) => first + "=" + second))
.ToList();
foreach (var t in res)
Console.WriteLine(string.Join(" ", t));
}
}
I would suggest first splitting the string by /1, then using a pattern along these lines:
\/([1-9])[=-]([A-Z]+)
https://regex101.com/r/0nyzzZ/1
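The split-then-match suggestion can be sketched in Python as follows. The name split_records is hypothetical, the data class [A-Z0-9. ] comes from the question, and splitting on a zero-width lookahead assumes Python 3.7 or later.

```python
import re

# One field: a slash, a record type 1-9, "-" or "=", then the data.
FIELD_RE = re.compile(r"/([1-9])[-=]([A-Z0-9. ]+)")

def split_records(stream):
    """Split the stream before every /1 field, then parse each chunk."""
    # The lookahead matches the empty string just before "/1-" or "/1=".
    chunks = [c for c in re.split(r"(?=/1[-=])", stream) if c]
    return [[(int(num), data) for num, data in FIELD_RE.findall(chunk)]
            for chunk in chunks]

stream = ("/1-XXXX/2-YYYY/9-ZZZZ/1-AAAA/3-BBBB/5-CCCC/8=NNNN/9=DDDD"
          "/1-QQQQ/2-WWWW/3=PPPP/7-EEEE")
for record in split_records(stream):
    print(record)
```

Because the data never contains a slash, "/1-" or "/1=" can only occur at the start of a record, which is what makes the split safe.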
A single regex isn't the optimal tool for this (at least not used in this way). The main reason is that your stream has a variable number of entries, and a variable number of capture groups is not supported. I also noticed that some of the values are separated by "=" rather than a dash, which your current regex doesn't handle.
The problem comes when you add a quantifier to a capture group: in most regex flavors the group only remembers the last thing it captured (.NET's capture collections are an exception), so you end up with the first field and the last field and lose everything in between. So something like this won't work:
\/1-(?'fld1'[A-Z]+)(?:\/(?'fldNo'[2-9])[-=](?'fldData'[A-Z]+))+
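The effect is easy to reproduce. Here is a small Python sketch of it, with the named groups rewritten in Python's (?P<name>...) syntax and a made-up helper name, last_repeat_only:

```python
import re

PATTERN = re.compile(
    r"/1-(?P<fld1>[A-Z]+)(?:/(?P<fldNo>[2-9])[-=](?P<fldData>[A-Z]+))+"
)

def last_repeat_only(stream):
    """Show what the named groups hold after the repeated group has matched."""
    m = PATTERN.match(stream)
    return m.group("fld1"), m.group("fldNo"), m.group("fldData")

# Four fields follow /1-AAAA, but only the last repetition is kept.
print(last_repeat_only("/1-AAAA/3-BBBB/5-CCCC/8=NNNN/9=DDDD"))
# ('AAAA', '9', 'DDDD')
```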
If your streams were all the same length, a single regex could be used. Instead, you can use a foreach loop with a much simpler regex applied to each part of the stream, which also validates the stream as it goes.
I'm not sure what language you're working in, but here is a solution in PHP that I think delivers what you need.
function extractFromStream($str)
{
/*
* Get an array of [num]-[letters] with explode. This will make an array that
* contains [0] => 1-AAAA, [1] => 2-BBBB ... etc
*/
$arr = explode("/", substr($str, 1));
$sorted = array();
$key = 0;
/*
* Sort this data into key->values based on numeric ordering.
* If the next one has a lower or equal starting number than the one before it,
* a new entry will be created. i.e. 2-aaaa => 1-cccc will cause a new
* entry to be made, just in case the stream doesn't always start with 1.
*/
foreach ($arr as $value)
{
// This will get the number at the start, and has the added bonus of making sure
// each bit is in the right format.
if (preg_match("/^([0-9]+)[=-]([A-Z0-9. ]+)$/", $value, $matches)) {
$newKey = (int)$matches[1];
$match = $matches[2];
} else
throw new Exception("This is not a valid data stream!");
// This bit checks if we've got a lower starting number than last time.
if (isset($lastKey) && is_int($lastKey) && $newKey <= $lastKey)
$key += 1;
// Now sort them..
$sorted[$key][$newKey] = $match;
// This will be compared in the next iteration of the loop.
$lastKey = $newKey;
}
return $sorted;
}
Here's how you can use it...
$full = "/1-XXXX/2-YYYY/9-ZZZZ/1-AAAA/3-BBBB/5-CCCC/8=NNNN/9=DDDD/1-QQQQ/2-WWWW/3=PPPP/7-EEEE";
try {
$extracted = extractFromStream($full);
$stream1 = $extracted[0];
$stream2 = $extracted[1];
$stream3 = $extracted[2];
print "<pre>";
echo "Full extraction: \n";
print_r($extracted);
echo "\nFirst Stream:\n";
print_r($stream1);
echo "\nSecond Stream:\n";
print_r($stream2);
echo "\nThird Stream:\n";
print_r($stream3);
print "</pre>";
} catch (Exception $e) {
echo $e->getMessage();
}
This will print
Full extraction:
Array
(
[0] => Array
(
[1] => XXXX
[2] => YYYY
[9] => ZZZZ
)
[1] => Array
(
[1] => AAAA
[3] => BBBB
[5] => CCCC
[8] => NNNN
[9] => DDDD
)
[2] => Array
(
[1] => QQQQ
[2] => WWWW
[3] => PPPP
[7] => EEEE
)
)
First Stream:
Array
(
[1] => XXXX
[2] => YYYY
[9] => ZZZZ
)
Second Stream:
Array
(
[1] => AAAA
[3] => BBBB
[5] => CCCC
[8] => NNNN
[9] => DDDD
)
Third Stream:
Array
(
[1] => QQQQ
[2] => WWWW
[3] => PPPP
[7] => EEEE
)
So you can see you have the numbers as the array keys, and the values they correspond to, which are now readily accessible for further processing. I hope this helps you :)

Regex expression symbol without space

Here is my regex: {{[^\{\s}]+\}}
And my input is {{test1}}{{test2}}{{test3}}.
How can I get these 3 tests as an array using a regular expression?
I would use: ~\{\{([^}]+?)\}\}~
How you access the resulting array depends on your language.
[EDIT] Explanation:
~: delimiter
\{\{ and \}\}: match the braces literally; they should be escaped
[^}]: match any character inside {{ }} other than }
+: repeat the pattern one or more times (for multiple characters)
?: makes the quantifier lazy, so it matches as few times as possible
(): capture group
[EDIT] PHP code sample to illustrate the matching:
<?php
$string= "{{test1}}{{test2}}{{test3}}";
if (preg_match_all("~\{\{([^}]+?)\}\}~s", $string, $matches))
{
print_r(array($matches));
// Do what you want
}
?>
will output this:
Array
(
[0] => Array
(
[0] => Array
(
[0] => {{test1}}
[1] => {{test2}}
[2] => {{test3}}
)
[1] => Array
(
[0] => test1
[1] => test2
[2] => test3
)
)
)
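As a rough equivalent in Python (an illustration, not part of the original answer), re.findall returns only the captured group when the pattern has exactly one, so the names come back directly as a list. The helper name extract_names is made up.

```python
import re

def extract_names(text):
    """Return the contents of every {{...}} placeholder, in order."""
    # [^}]+? matches the shortest run of non-brace characters.
    return re.findall(r"\{\{([^}]+?)\}\}", text)

print(extract_names("{{test1}}{{test2}}{{test3}}"))
# ['test1', 'test2', 'test3']
```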
test[0-9]+
This matches all occurrences of testX where X is an integer of any size.
If you're trying to identify the braces instead, use this:
[{\}]
In C#, the Matches method returns a MatchCollection object.
Here is some code:
Regex r = new Regex(@"{{[^{\s}]+}}");
MatchCollection col = r.Matches("{{test1}}{{test2}}{{test3}}");
string[] arr = null;
if (col != null)
{
arr = new string[col.Count];
for (int i = 0; i < col.Count; i++)
{
arr[i] = col[i].Value;
}
}

Regex - Ignore some parts of string in match

Here's my string:
address='St Marks Church',notes='The North East\'s premier...'
The regex I'm using to grab the various parts using match_all is
'/(address|notes)='(.+?)'/i'
The results are:
address => St Marks Church notes => The North East\
How can I get it to ignore the \' character for the notes?
Not sure if you're wrapping your string with heredoc or double quotes, but a less greedy approach:
$str4 = 'address="St Marks Church",notes="The North East\'s premier..."';
preg_match_all('~(address|notes)="([^"]*)"~i',$str4,$matches);
print_r($matches);
Output
Array
(
[0] => Array
(
[0] => address="St Marks Church"
[1] => notes="The North East's premier..."
)
[1] => Array
(
[0] => address
[1] => notes
)
[2] => Array
(
[0] => St Marks Church
[1] => The North East's premier...
)
)
Another method with preg_split:
//split the string at the comma
//assumes no commas in text
$parts = preg_split('!,!', $string);
foreach($parts as $key=>$value){
//split the values at the = sign
$parts[$key]=preg_split('!=!',$value);
foreach($parts[$key] as $k2=>$v2){
//trim the quotes out and remove the slashes
$parts[$key][$k2]=stripslashes(trim($v2,"'"));
}
}
Output looks like:
Array
(
[0] => Array
(
[0] => address
[1] => St Marks Church
)
[1] => Array
(
[0] => notes
[1] => The North East's premier...
)
)
Super slow old-skool method:
$len = strlen($string);
$key = "";
$value = "";
$store = array();
$pos = 0;
$mode = 'key';
while($pos < $len){
switch($string[$pos]){
// Compare against the character itself; boolean case expressions misfire on a '0' character.
case '=':
$mode = 'value';
break;
case ',':
$store[$key]=trim($value,"'");
$key=$value='';
$mode = 'key';
break;
default:
$$mode .= $string[$pos];
}
$pos++;
}
$store[$key]=trim($value,"'");
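A character-by-character scanner can also track whether it is inside quotes, which lets it cope with escaped apostrophes and with commas inside values. The Python sketch below is one possible variation on the idea, not a translation of the code above. The name parse_pairs is invented, and it assumes values are wrapped in single quotes with backslash escapes.

```python
def parse_pairs(text):
    """Scan key='value' pairs, honouring quotes and backslash escapes."""
    pairs = {}
    key, value = [], []
    target = key          # list currently being filled
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and in_quotes and i + 1 < len(text):
            target.append(text[i + 1])   # keep the escaped character literally
            i += 2
            continue
        if ch == "'":
            in_quotes = not in_quotes    # quotes delimit, but are not stored
        elif ch == "=" and not in_quotes:
            target = value
        elif ch == "," and not in_quotes:
            pairs["".join(key)] = "".join(value)
            key, value = [], []
            target = key
        else:
            target.append(ch)
        i += 1
    if key:
        pairs["".join(key)] = "".join(value)
    return pairs

print(parse_pairs("address='St Marks Church',notes='The North East\\'s premier...'"))
```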
Because you have posted that you are using match_all and the top tags in your profile are php and wordpress, I think it is fair to assume you are using preg_match_all() with php.
The following patterns will match the substrings required to build your desired associative array:
Patterns that generate a fullstring match and 1 capture group:
/(address|notes)='\K(?:\\\'|[^'])*/ (166 steps)
/(address|notes)='\K.*?(?=(?<!\\)')/ (218 steps)
Patterns that generate 2 capture groups:
/(address|notes)='((?:\\\'|[^'])*)/ (168 steps)
/(address|notes)='(.*?(?<!\\))'/ (209 steps)
Code:
$string = "address='St Marks Church',notes='The North East\'s premier...'";
preg_match_all(
"/(address|notes)='\K(?:\\\'|[^'])*/",
$string,
$out
);
var_export(array_combine($out[1], $out[0]));
echo "\n---\n";
preg_match_all(
"/(address|notes)='((?:\\\'|[^'])*)/",
$string,
$out,
PREG_SET_ORDER
);
var_export(array_column($out, 2, 1));
Output:
array (
'address' => 'St Marks Church',
'notes' => 'The North East\\\'s premier...',
)
---
array (
'address' => 'St Marks Church',
'notes' => 'The North East\\\'s premier...',
)
Patterns #1 and #3 use alternation to allow either non-apostrophe characters or apostrophes that are preceded by a backslash.
Patterns #2 and #4 (which need an additional backslash when written in PHP) use lookarounds to ensure that an apostrophe preceded by a backslash doesn't end the match.
Some notes:
Using capture groups, alternatives, and lookarounds often costs pattern efficiency. Limiting the use of these components often improves performance. Using negated character classes with greedy quantifiers often improves performance.
Using \K (which restarts the fullstring match) is useful when trying to reduce capture groups and it reduces the size of the output array.
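You should match up to an end quote that isn't preceded by a backslash, thus:
(address|notes)='(.*?[^\\])'
The [^\\] forces the character immediately preceding the closing ' to be anything but a backslash. Keeping it inside the capture group means the last character of the value is not dropped from the result.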
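To make the escaped-apostrophe alternation from the patterns above concrete, here is a hedged Python sketch. The helper name parse_fields is made up, and the final unescaping step is an addition for readability rather than something the answers do.

```python
import re

# Inside the quotes, accept either a backslash-escaped apostrophe or any
# character that is not an apostrophe.
PAIR_RE = re.compile(r"(address|notes)='((?:\\'|[^'])*)'", re.IGNORECASE)

def parse_fields(text):
    """Return {key: value}, with \\' in the values turned back into '."""
    return {key: value.replace("\\'", "'")
            for key, value in PAIR_RE.findall(text)}

text = "address='St Marks Church',notes='The North East\\'s premier...'"
print(parse_fields(text))
# {'address': 'St Marks Church', 'notes': "The North East's premier..."}
```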

Processing a Comma Separated List Before Shunting-Yard

So I'm processing some math from XML strings using the Shunting-Yard algorithm. The trick is that I want to allow the generation of random values by using comma separated lists. For example...
( ( 3 + 4 ) * 12 ) * ( 2, 3, 4, 5 ) )
I've already got a basic Shunting-Yard processor working. But I want to pre-process the string to randomly pick one of the values from the list before processing the expression. Such that I might end up with:
( ( 3 + 4 ) * 12 ) * 4 )
The Shunting-Yard setup is already pretty complicated, as far as my understanding is concerned, so I'm hesitant to try to alter it to handle this. Handling that with error checking sounds like a nightmare. As such, I'm assuming it would make sense to look for that pattern beforehand? I was considering using a regular expression, but I'm not one of "those" people... though I wish that I was... and while I've found some examples, I'm not sure how I might modify them to check for the parenthesis first? I'm also not confident that this would be the best solution.
As a side note, if the solution is regex, it should be able to match strings (just characters, no symbols) in the comma list as well, as I'll be processing for specific strings for values in my Shunting-Yard implementation.
Thanks for your thoughts in advance.
This is easily solved using two regexes. The first regex, applied to the overall text, matches each parenthesized list of comma separated values. The second regex, applied to each of the previously matched lists, matches each of the values in the list. Here is a PHP script with a function that, given an input text having multiple lists, replaces each list with one of its values randomly chosen:
<?php // test.php 20110425_0900
function substitute_random_value($text) {
$re = '/
# Match parenthesized list of comma separated words.
\( # Opening delimiter.
\s* # Optional whitespace.
\w+ # required first value.
(?: # Group for additional values.
\s* , \s* # Values separated by a comma, ws
\w+ # Next value.
)+ # One or more additional values.
\s* # Optional whitespace.
\) # Closing delimiter.
/x';
// Match each parenthesized list and replace with one of the values.
$text = preg_replace_callback($re, '_srv_callback', $text);
return $text;
}
function _srv_callback($matches_paren) {
// Grab all word options in parenthesized list into $matches.
$count = preg_match_all('/\w+/', $matches_paren[0], $matches);
// Randomly pick one of the matches and return it.
return $matches[0][rand(0, $count - 1)];
}
// Read input text
$data_in = file_get_contents('testdata.txt');
// Process text multiple times to verify random replacements.
$data_out = "Run 1:\n". substitute_random_value($data_in);
$data_out .= "Run 2:\n". substitute_random_value($data_in);
$data_out .= "Run 3:\n". substitute_random_value($data_in);
// Write output text
file_put_contents('testdata_out.txt', $data_out);
?>
The substitute_random_value() function calls the PHP preg_replace_callback() function, which matches and replaces each list with one of the values in the list. It calls the _srv_callback() function which randomly picks out one of the values and returns it as the replacement value.
Given this input test data (testdata.txt):
( ( 3 + 4 ) * 12 ) * ( 2, 3, 4, 5 ) )
( ( 3 + 4 ) * 12 ) * ( 12, 13) )
( ( 3 + 4 ) * 12 ) * ( 22, 23, 24) )
( ( 3 + 4 ) * 12 ) * ( 32, 33, 34, 35 ) )
Here is the output from one example run of the script:
Run 1:
( ( 3 + 4 ) * 12 ) * 5 )
( ( 3 + 4 ) * 12 ) * 13 )
( ( 3 + 4 ) * 12 ) * 22 )
( ( 3 + 4 ) * 12 ) * 35 )
Run 2:
( ( 3 + 4 ) * 12 ) * 3 )
( ( 3 + 4 ) * 12 ) * 12 )
( ( 3 + 4 ) * 12 ) * 22 )
( ( 3 + 4 ) * 12 ) * 33 )
Run 3:
( ( 3 + 4 ) * 12 ) * 3 )
( ( 3 + 4 ) * 12 ) * 12 )
( ( 3 + 4 ) * 12 ) * 23 )
( ( 3 + 4 ) * 12 ) * 32 )
Note that this solution uses \w+ to match values consisting of "word" characters, i.e. [A-Za-z0-9_]. This can be easily changed if this does not meet your requirements.
Edit: Here is a Javascript version of the substitute_random_value() function:
function substitute_random_value(text) {
// Replace each parenthesized list with one of the values.
return text.replace(/\(\s*\w+(?:\s*,\s*\w+)+\s*\)/g,
function (m0) {
// Capture all word values in parenthesized list into values.
var values = m0.match(/\w+/g);
// Randomly pick one of the matches and return it.
return values[Math.floor(Math.random() * values.length)];
});
}
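For completeness, a Python sketch of the same two-regex idea follows. It mirrors the JavaScript version above. The optional rng parameter is an addition made here so that the random choice can be seeded in tests.

```python
import random
import re

# A parenthesized list of two or more comma-separated word values.
LIST_RE = re.compile(r"\(\s*\w+(?:\s*,\s*\w+)+\s*\)")

def substitute_random_value(text, rng=random):
    """Replace each parenthesized list with one randomly chosen value."""
    def pick(match):
        # Collect every word value inside the matched list.
        values = re.findall(r"\w+", match.group(0))
        return rng.choice(values)
    return LIST_RE.sub(pick, text)

print(substitute_random_value("( ( 3 + 4 ) * 12 ) * ( 2, 3, 4, 5 ) )"))
```

Sub-expressions such as ( 3 + 4 ) are left alone because the pattern requires at least one comma inside the parentheses.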