Augeas: How to match dash?

I want to write a lens for duply exclude files. Example:
+ /etc
- /
So my lens looks like this:
module DuplyExclude =
let nl = del /[\n]+/ "\n"
let entry = [ label "entry" . [ label "op" . store /(\+|-)/ ] . del /[ \t]+/ " " . [ label "path" . store /\/[^ \t\n\r]+/ ] ]
let lns = ( entry . nl )*
test lns get "+ /hello\n+ /etc\n- /" = ?
This results in an error. From experimenting a bit, I know that the regular expression /(\+|-)/ does not match the second line. Why does the dash seem to be unmatchable, even when escaped with \?

There are two reasons for this:
1. The test string is missing a trailing \n. This matters because lns is defined as an entry followed by a mandatory newline. Note that this only really affects string tests with augparse: when loading files via the library, Augeas adds a trailing \n to any file read in (since many lenses can't handle a missing EOL).
2. The path node is defined by store /\/[^ \t\n\r]+/, which matches a single / followed by at least one (+) other character. This won't match an entry that consists of / alone.
With these two changes (a trailing \n in the test string, and * instead of + in the path regexp), this lens works:
module DuplyExclude =
let nl = del /[\n]+/ "\n"
let entry = [ label "entry" . [ label "op" . store /(\+|-)/ ] . del /[ \t]+/ " " . [ label "path" . store /\/[^ \t\n\r]*/ ] ]
let lns = ( entry . nl )*
test lns get "+ /hello\n+ /etc\n- /\n" = ?
Test result: /tmp/duplyexclude.aug:6.2-.44:
{ "entry"
{ "op" = "+" }
{ "path" = "/hello" }
}
{ "entry"
{ "op" = "+" }
{ "path" = "/etc" }
}
{ "entry"
{ "op" = "-" }
{ "path" = "/" }
}
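Both causes can be reproduced outside Augeas. The following is a minimal Python sketch, not Augeas code: it approximates the lens ( entry . nl )* with an ordinary regular expression and requires the whole input to match, which is roughly what test lns get demands. The names lens_pattern, strict, relaxed and accepts are made up for this illustration.

```python
import re

# Building blocks mirroring the lens: operator, separator, newline run.
OP = r"(?:\+|-)"
SEP = r"[ \t]+"
NL = r"\n+"


def lens_pattern(path_regexp):
    """Approximate ( entry . nl )* for a given path regexp."""
    return re.compile("(?:" + OP + SEP + path_regexp + NL + ")*")


strict = lens_pattern(r"/[^ \t\n\r]+")   # original: '/' plus at least one character
relaxed = lens_pattern(r"/[^ \t\n\r]*")  # fixed: a bare '/' is allowed


def accepts(pattern, text):
    """True only if the whole text is consumed, as a lens 'get' requires."""
    return pattern.fullmatch(text) is not None


no_eol = "+ /hello\n+ /etc\n- /"
with_eol = no_eol + "\n"

print(accepts(strict, no_eol))     # False: bare '/' and missing newline
print(accepts(strict, with_eol))   # False: bare '/' is still rejected
print(accepts(relaxed, no_eol))    # False: trailing newline is still missing
print(accepts(relaxed, with_eol))  # True
```

The dash is matched in every variant; only the bare / and the missing final newline make the input fail.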

RegEx for computer name validation (cannot be more than 15 characters long, be entirely numeric, or contain the following characters...)

I have these requirements to follow:
Windows computer name cannot be more than 15 characters long, be
entirely numeric, or contain the following characters: ` ~ ! @ # $ % ^
& * ( ) = + _ [ ] { } \ | ; : . ' " , < > / ?.
I want to create a RegEx to validate a given computer name.
I can see that the only permitted non-alphanumeric character is -, and so far I have this:
/^[a-zA-Z0-9-]{1,15}$/
which matches almost all constraints except the "not entirely numeric" part.
How can I add this last constraint to my RegEx?
You could use a negative lookahead:
^(?![0-9]{1,15}$)[a-zA-Z0-9-]{1,15}$
Or simply use two regular expressions:
^[a-zA-Z0-9-]{1,15}$
AND NOT
^[0-9]{1,15}$
Here is a live example:
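// Single pattern: the negative lookahead rejects all-digit names.
var regex1 = /^(?![0-9]{1,15}$)[a-zA-Z0-9-]{1,15}$/;
// Two-pattern variant: a name must match regex2 AND NOT match regex3.
var regex2 = /^[a-zA-Z0-9-]{1,15}$/;
var regex3 = /^[0-9]{1,15}$/;
var text1 = "lklndlsdsvlk323"; // letters and digits
var text2 = "4214124";         // digits only
// test() returns a plain boolean, so no !! coercion is needed.
console.log(text1 + ":", regex1.test(text1));
console.log(text1 + ":", regex2.test(text1) && !regex3.test(text1));
console.log(text2 + ":", regex1.test(text2));
console.log(text2 + ":", regex2.test(text2) && !regex3.test(text2));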
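For comparison, here is a minimal Python sketch of the same lookahead idea. The function name is_valid_computer_name is hypothetical, and the pattern only covers the three constraints quoted in the question (length, character set, not entirely numeric); it is not a complete validator for Windows naming rules.

```python
import re

# fullmatch() anchors both ends, so ^ and $ are not needed here.
# The lookahead rejects names that consist of digits only.
COMPUTER_NAME = re.compile(r"(?![0-9]+\Z)[A-Za-z0-9-]{1,15}")


def is_valid_computer_name(name: str) -> bool:
    """Check length (1-15), allowed characters, and 'not entirely numeric'."""
    return COMPUTER_NAME.fullmatch(name) is not None


for candidate in ["lklndlsdsvlk323", "4214124", "pc-01", "my_pc", "a" * 16]:
    print(candidate, is_valid_computer_name(candidate))
```

Using [0-9] rather than \d keeps the check limited to ASCII digits, matching the patterns above.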

Regex Classic ASP

I've currently got a string which contains a URL, and I need to get the base URL.
The string I have is http://www.test.com/test-page/category.html
I am looking for a RegEx that will effectively remove any page/folder names at the end. The issue is that some people may enter the domain in the following formats:
http://www.test.com
www.test.co.uk/
www.test.info/test-page.html
www.test.gov/test-folder/test-page.html
It must return http://www.websitename.ext/ each time i.e. the domain name and extension (e.g. .info .com .co.uk etc) with a forward slash at the end.
Effectively it needs to return the base URL, without any page/folder names. Is there an easy way to do this with a Regular Expression?
Thanks.
My approach: Use a RegEx to extract the domain name. Then add http:// to the front and / to the end. Here's the RegEx:
^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))
Also see this answer to the question Extract root domain name from string. (It left me somewhat dissatisfied, although it pointed out the need to account for https, the port number, and user authentication info, which my RegEx does not do.)
Here is an implementation in VBScript. I put the RegEx in a constant and defined a function named GetDomainName(). You should be able to incorporate that function in your ASP page like this:
normalizedUrl = "http://" & GetDomainName(url) & "/"
You can also test my script from the command prompt by saving the code to a file named test.vbs and then passing it to cscript:
cscript test.vbs
Test Program
Option Explicit
Const REGEXPR = "^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))"
' The pattern is made of four parts:
'
' A  ^(?:http:\/\/)? - An optional 'http://' scheme at the start of the string
' B1 [\w_]+          - Followed by one or more word characters
'                      (letters, digits, underscore)
' B2 (?:\.[\w_]+)+   - Followed by one or more occurrences of a string
'                      that begins with a period that is followed by
'                      one or more word characters, and
' C  (?=(?:\/|$))    - Terminated by a slash or the end of the string.
Function GetDomainName(sUrl)
    Dim oRegex, oMatch, oMatches, oSubMatch
    Set oRegex = New RegExp
    oRegex.Pattern = REGEXPR
    oRegex.IgnoreCase = True
    oRegex.Global = False
    Set oMatches = oRegex.Execute(sUrl)
    If oMatches.Count > 0 Then
        GetDomainName = oMatches(0).SubMatches(0)
    Else
        GetDomainName = ""
    End If
End Function

Dim Data : Data = _
    Array( _
        "xhttp://www.test.com" _
      , "http://www..test.com" _
      , "http://www.test.com." _
      , "http://www.test.com" _
      , "www.test.co.uk/" _
      , "www.test.co.uk/?q=42" _
      , "www.test.info/test-page.html" _
      , "www.test.gov/test-folder/test-page.html" _
      , ".www.test.co.uk/" _
    )

Dim sUrl, sDomainName
For Each sUrl In Data
    sDomainName = GetDomainName(sUrl)
    If sDomainName = "" Then
        WScript.Echo "[ ] [" & sUrl & "]"
    Else
        WScript.Echo "[*] [" & sUrl & "] => [" & sDomainName & "]"
    End If
Next
Expected Output:
[ ] [xhttp://www.test.com]
[ ] [http://www..test.com]
[ ] [http://www.test.com.]
[*] [http://www.test.com] => [www.test.com]
[*] [www.test.co.uk/] => [www.test.co.uk]
[*] [www.test.co.uk/?q=42] => [www.test.co.uk]
[*] [www.test.info/test-page.html] => [www.test.info]
[*] [www.test.gov/test-folder/test-page.html] => [www.test.gov]
[ ] [.www.test.co.uk/]
I haven't coded Classic ASP in 12 years and this is totally untested.
result = "http://" & Split(Replace(url, "http://",""),"/")(0) & "/"
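To try the regular-expression approach without a Windows script host, the same pattern can be exercised in Python. This is only a sketch under stated assumptions: base_url is a hypothetical helper name, re.ASCII is used so that \w stays limited to letters, digits and underscore, and the limitations mentioned above (https, ports, user info) still apply.

```python
import re
from typing import Optional

# Same pattern as REGEXPR above; '/' needs no escaping in Python.
DOMAIN = re.compile(
    r"^(?:http://)?([\w_]+(?:\.[\w_]+)+)(?=(?:/|$))",
    re.IGNORECASE | re.ASCII,
)


def base_url(url: str) -> Optional[str]:
    """Return 'http://<domain>/' or None if no domain can be extracted."""
    match = DOMAIN.match(url)
    if match is None:
        return None
    return "http://" + match.group(1) + "/"


for sample in [
    "http://www.test.com/test-page/category.html",
    "www.test.co.uk/",
    "www.test.info/test-page.html",
    "http://www..test.com",
]:
    print(sample, "=>", base_url(sample))
```

Because \w does not include the hyphen, a host name such as www.my-site.com is rejected by this pattern; the character classes would have to be widened if such names are expected.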

Extracting quoted and unquoted values using regex

I'm trying to parse a string of type <tag>=<value> using regular expressions but have hit some issues adding support for quoted values. The idea is that any unquoted values should be trimmed of leading / trailing white space so that [ Hello ] becomes [Hello] (Please ignore the square brackets.)
However, when the value is quoted, I want anything up to and including the double quotes to be removed but no further, so [ " Hello World " ] would become [" Hello World "]
So far, I've come up with the following code with a pattern match for this (note that some of the characters have been escaped or doubly escaped to avoid them being interpreted as trigraphs or other C format characters.)
#include <iostream>
#include <string>
#include <boost/regex.hpp>

using std::cout;
using std::endl;
using std::string;

void getTagVal( const std::string& tagVal )
{
    boost::smatch what;
    static const boost::regex pp("^\\s*([a-zA-Z0-9_-]+)\\s*=\\s*\"\?\?([%:\\a-zA-Z0-9 /\\._]+?)\"\?\?\\s*$");
    if ( boost::regex_match( tagVal, what, pp ) )
    {
        const string tag = static_cast<const string&>( what[1] );
        const string val = static_cast<const string&>( what[2] );
        cout << "Tag = [" << tag << "] Val = [" << val << "]" << endl;
    }
}

int main( int argc, char* argv[] )
{
    getTagVal("Qs1= \" Hello World \" ");
    getTagVal("Qs2=\" Hello World \" ");
    getTagVal("Qs3= \" Hello World \"");
    getTagVal("Qs4=\" Hello World \"");
    getTagVal("Qs5=\"Hello World \"");
    getTagVal("Qs6=\" Hello World\"");
    getTagVal("Qs7=\"Hello World\"");
    return 0;
}
Taking out the double escaping, this breaks down as:
^ - Start of line.
\s* - an optional amount of whitespace.
([a-zA-Z0-9_-]+) - One or more alphanumerics or a dash or underscore. This is captured as the tag.
\s* - an optional amount of whitespace.
= - an "equal" symbol.
\s* - an optional amount of whitespace.
"?? - an optional double quote (non-greedy).
([%:\a-zA-Z0-9 /\._]+?) - One or more alphanumerics or a space, underscore, percent, colon, period, forward or back slash. This is captured as the value (non-greedy).
"?? - an optional double quote (non-greedy).
\s* - an optional amount of whitespace.
$ - End of line
For the example calls in main(), I would expect to get:
Tag = [Qs1] Val = [ Hello World ]
Tag = [Qs2] Val = [ Hello World ]
Tag = [Qs3] Val = [ Hello World ]
Tag = [Qs4] Val = [ Hello World ]
Tag = [Qs5] Val = [Hello World ]
Tag = [Qs6] Val = [ Hello World]
Tag = [Qs7] Val = [Hello World]
but what I actually get is:
Tag = [Qs1] Val = [" Hello World ]
Tag = [Qs2] Val = [" Hello World ]
Tag = [Qs3] Val = [" Hello World ]
Tag = [Qs4] Val = [" Hello World ]
Tag = [Qs5] Val = ["Hello World ]
Tag = [Qs6] Val = [" Hello World]
Tag = [Qs7] Val = ["Hello World]
So it's almost correct but for some reason the first quote is hanging around in the output value even though I specifically bracket the value section of the regex with the quote outside it.
I would change the part starting with the first quote to an alternative:
"([^"]+)"|([%:\a-zA-Z0-9 /\._]+)\s*
You would then have to handle the two possibilities of quoted or unquoted text ending up in the second or third capturing parenthesis pair in the host code around the regex.
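The alternation and the host-side handling of the two capture groups can be sketched in Python as follows. This is an illustration only: parse_tag_value is a hypothetical name, the tag part is copied from the question's pattern, and the unquoted branch is made non-greedy so that trailing whitespace is left to the final \s*.

```python
import re
from typing import Optional, Tuple

TAG_VALUE = re.compile(
    r'^\s*([a-zA-Z0-9_-]+)\s*=\s*'                # tag and '='
    r'(?:"([^"]+)"|([%:\\a-zA-Z0-9 /._]+?))\s*$'  # quoted OR unquoted value
)


def parse_tag_value(line: str) -> Optional[Tuple[str, str]]:
    """Return (tag, value); quoted values keep their inner whitespace."""
    match = TAG_VALUE.match(line)
    if match is None:
        return None
    tag, quoted, unquoted = match.groups()
    # Exactly one of the two value groups is set, depending on the branch taken.
    value = quoted if quoted is not None else unquoted
    return tag, value


print(parse_tag_value('Qs1= " Hello World " '))  # ('Qs1', ' Hello World ')
print(parse_tag_value('Name =   Hello  '))       # ('Name', 'Hello')
```

Putting the quoted branch first means a value that starts with a double quote is always treated as quoted.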
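I figured out what the problem was.
A \ has to be handled with care because it is processed twice: once by the C++ compiler inside the string literal, and again by the regex engine. If you're not careful, \\a in the source becomes \a in the pattern, which is absolutely not what you wanted: the regex engine treats \a as an escape (the bell character), so \a-z is read as a range starting at that character rather than as a backslash followed by a-z. That range happens to include the double quote, which would explain why the quote ended up in the captured value.
So, to say that I want a literal \ in the set of characters allowed in the value (which I do because, ironically, backslashes are used as escape sequences within a format string), I have to double-escape it, so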
static const boost::regex pp("^\\s*([a-zA-Z0-9_-]+)\\s*=\\s*\"\?\?([%:\\a-zA-Z0-9 /\\._]+?)\"\?\?\\s*$");
becomes:
static const boost::regex pp("^\\s*([a-zA-Z0-9_-]+)\\s*=\\s*\"\?\?([%:\\\\a-zA-Z0-9 /._]+?)\"\?\?\\s*$");
(i.e. a single literal backslash has to be written as \\\\ in the source)
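The effect can be reproduced in Python, whose re module also reads \a inside a character class as the bell character. This is a sketch under that assumption, not Boost code; raw strings are used so that the patterns below are what the regex engine sees after the C++ string escaping has been removed. The names buggy, fixed and value_of are made up for the illustration.

```python
import re

PREFIX = r'^\s*([a-zA-Z0-9_-]+)\s*=\s*"??('
SUFFIX = r'+?)"??\s*$'

BUGGY_CLASS = r'[%:\a-zA-Z0-9 /\._]'  # \a-z is a range from BEL (0x07) to 'z'
FIXED_CLASS = r'[%:\\a-zA-Z0-9 /._]'  # \\ is a literal backslash, then a-z

buggy = re.compile(PREFIX + BUGGY_CLASS + SUFFIX)
fixed = re.compile(PREFIX + FIXED_CLASS + SUFFIX)


def value_of(pattern, line):
    """Return the captured value (group 2), or None if the line does not match."""
    match = pattern.match(line)
    return match.group(2) if match else None


line = 'Qs7="Hello World"'
print(repr(value_of(buggy, line)))  # '"Hello World' (quote captured)
print(repr(value_of(fixed, line)))  # 'Hello World'
```

With the buggy class the lazy "?? is satisfied by matching nothing, because the value group itself is able to consume the opening quote.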

Parsing sectioned file with augeas

I am trying to create a module for parsing vim files which are sectioned in a specific manner. A sample file:
" My section {
set nocompatible " be iMproved
set encoding=utf-8
" }
" vim: set foldmarker={,} foldlevel=0 foldmethod=marker:
While writing the module, I've got stuck at this point:
module Vimrc =
autoload xfm
let section = del "\" " "\" " . key /[^\n]+/ . del "\n" "\n" . store /.*/ . del "\" " "\" "
let lns = [ section . del "\n" "\n" ] *
let filter = (incl "*.vim")
let xfm = transform lns filter
I'm aware that there are some other mistakes, but it complains about the regex key /[^\n]+/, saying:
/tmp/aug/vimrc.aug:3.36-.48:exception: The key regexp /[^ ]+/ matches
a '/'
I do not understand what the / character has got to do with this.
As the error says, your key regexp matches a slash, which is illegal since / is used as a level separator in the tree.
If your section names can contain slashes, you need to store them as a node value, not as a label, so instead of:
{ "My section"
{ "set" = "nocompatible" { "#comment" = "be iMproved" } } }
you'll have to do:
{ "section" = "My section"
{ "set" = "nocompatible" { "#comment" = "be iMproved" } } }
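The check behind the error message can be imitated with ordinary regular expressions. The Python sketch below is an analogy, not Augeas code: it shows that /[^\n]+/ accepts a label containing a slash. The second pattern, which excludes / from the key, is an assumption added for illustration; it would only be an option if section names never contain slashes.

```python
import re

ORIGINAL_KEY = re.compile(r"[^\n]+")   # key regexp from the question
NO_SLASH_KEY = re.compile(r"[^\n/]+")  # hypothetical variant that excludes '/'


def accepts_label(key_regexp, label):
    """True if the whole label is matched by the key regexp."""
    return key_regexp.fullmatch(label) is not None


print(accepts_label(ORIGINAL_KEY, "My section/part"))  # True: a '/' can end up in a label
print(accepts_label(NO_SLASH_KEY, "My section/part"))  # False
print(accepts_label(NO_SLASH_KEY, "My section"))       # True
```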

Convert punctuation to space

I have a bunch of strings with punctuation in them that I'd like to convert to spaces:
"This is a string. In addition, this is a string (with one more)."
would become:
"This is a string In addition this is a string with one more "
I can go through and do this manually with the stringr package (str_replace_all()) one punctuation symbol at a time (, / . / ! / ( / ) / etc. ), but I'm curious whether there's a faster way, presumably using a regex.
Any suggestions?
x <- "This is a string. In addition, this is a string (with one more)."
gsub("[[:punct:]]", " ", x)
[1] "This is a string  In addition  this is a string  with one more  "
See ?gsub for doing quick substitutions like this, and ?regex for details on the [[:punct:]] class, i.e.
‘[:punct:]’ Punctuation characters:
‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { |
} ~’.
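For readers working outside R, the same substitution can be sketched in Python. The assumption here is that ASCII punctuation is all that matters: string.punctuation contains the same characters as the [:punct:] list quoted above. The function names are hypothetical.

```python
import re
import string

# A character class built from the ASCII punctuation characters.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def punct_to_space(text: str) -> str:
    """Replace every punctuation character with a single space."""
    return PUNCT.sub(" ", text)


def punct_to_single_space(text: str) -> str:
    """Same, but collapse the resulting runs of whitespace to one space."""
    return re.sub(r"\s+", " ", punct_to_space(text))


x = "This is a string. In addition, this is a string (with one more)."
print(repr(punct_to_space(x)))
print(repr(punct_to_single_space(x)))
```

A one-for-one replacement leaves two spaces wherever punctuation was next to a space, so the second helper is the one that produces single-spaced output.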
Have a look at ?regex:
library(stringr)
str_replace_all(x, '[[:punct:]]',' ')
"This is a string  In addition  this is a string  with one more  "