Implementing a regex in AppleScript to match numbers of exactly 6 digits

I am new to regex and AppleScript and need a little support and guidance.
First, the user inputs a string. It could be anything, on one line or on multiple lines.
A regex should then be applied to the string to find the numbers that have exactly 6 digits (no more, no fewer) and separate them with a comma and a space.
The final string should look like: 867689, 867617, 866478, 866403, 866343.
Then this string will be converted into a list.
I am using this site to test my regexes: https://www.freeformatter.com/regex-tester.html
The regex that matches exactly 6 digits is:
(?<!\d)\d{6}(?!\d)
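To see what this pattern does independently of AppleScript, here is a minimal Python sketch (Python's re module supports the same lookbehind/lookahead syntax). The helper name six_digit_numbers and the shortened sample text are illustrative assumptions, not part of the original question.

```python
import re

# (?<!\d) : no digit immediately before the match
# \d{6}   : exactly six digits
# (?!\d)  : no digit immediately after the match
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")


def six_digit_numbers(text: str) -> str:
    """Return every standalone 6-digit number, joined by ', '."""
    return ", ".join(SIX_DIGITS.findall(text))


sample = "ARTIKEL\n5910020015\n867689\nPULL1/1\n866478\n"
print(six_digit_numbers(sample))  # 867689, 866478
```

The 10-digit article numbers are skipped because every 6-digit window inside them has a digit directly before or after it.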
I am aware that to use a regex from AppleScript I need to go through a shell script (do shell script). I also know that I should use sed, but unfortunately I don't fully understand what it is or how to use it.
Following a few guides and tests, I understood that sed does not support \d, so I should use [0-9] instead, that I should escape the parentheses like this: \(..\), and that the replacement $1, should be written as \1,. So far I have not been able to make it work.
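One likely reason the sed attempts fail is that sed's POSIX regular expressions have no lookbehind or lookahead at all, so (?<! and (?! cannot be expressed there however they are escaped. A lookaround-free way to get the same result is to take every maximal run of digits and keep only the runs that are exactly six characters long; in a shell this could be written as grep -oE '[0-9]+' piped into awk 'length($0) == 6'. The Python sketch below checks that idea against the lookaround pattern; the function name six_digit_runs is hypothetical.

```python
import re


def six_digit_runs(text: str) -> list[str]:
    """Lookaround-free: keep maximal digit runs whose length is exactly 6."""
    # [0-9]+ is greedy, so each match is a whole run of consecutive digits
    return [run for run in re.findall(r"[0-9]+", text) if len(run) == 6]


sample = "5910020015\n867689\nSHIRT1/2\n866403"
lookaround = re.findall(r"(?<![0-9])[0-9]{6}(?![0-9])", sample)
print(six_digit_runs(sample))                # ['867689', '866403']
print(six_digit_runs(sample) == lookaround)  # True
```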
My user input for tests is as follows:
MASTER
ARTIKEL
Artikel
5910020015
867689
PULL1/1
5910020022
867617
PULL1/1
Cappuccino
5910020017
866478
PULL1/1
Braun
5921020017
866403
SHIRT1/2
Kastanie-Multi
5910020016
866343
PULL1/1
And the AppleScript code itself:
use scripting additions
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
on list2string(theFoldersList, theDelimiter)
set theBackup to AppleScript's text item delimiters
set AppleScript's text item delimiters to theDelimiter
set theString to theFoldersList as string
set AppleScript's text item delimiters to theBackup
return theString
end list2string
on run {input}
display dialog "Please enter your string: " default answer ""
set stringOfNumbers to the text returned of the result
set num to do shell script "sed 's/\(\(?<![0-9]\)[0-9]{6}\(?![0-9]\)\)\1, /' <<< " & quoted form of stringOfNumbers
--(?<!\d)\d{6}(?!\d)
display dialog stringOfNumbers
set stringOfNumbers to current application's NSString's stringWithString:stringOfNumbers
set listOfArtNumbers to (stringOfNumbers's componentsSeparatedByString:", ") as list
display dialog list2string(listOfArtNumbers, ", ")
return input
end run
Unfortunately, wherever I escape characters with \ I get an error, so I had to remove all the backslashes. But when I run the script I receive "Syntax Error: sed: 1: "s/(?<![0-9])[0-9]{6}(?! ...": unterminated substitute pattern", and all my attempts have ended in a similar error.

AppleScript Objective-C allows us to use regular expressions via NSRegularExpression, starting with OS X 10.7 (Lion). The following handler returns the results of a regular expression search as a list:
use AppleScript version "2.4"
use framework "Foundation"
property NSRegularExpression : class "NSRegularExpression"
property NSString : class "NSString"
on findPattern:thePattern inString:theString
set theText to NSString's stringWithString:theString
set theRegEx to NSRegularExpression's regularExpressionWithPattern:thePattern ¬
options:0 |error|:(missing value)
set theResult to (theRegEx's matchesInString:theText ¬
options:0 ¬
range:{location:0, |length|:theText's |length|})'s valueForKey:("range")
set outputArray to {}
repeat with thisRange in theResult
copy (theText's substringWithRange:thisRange) as text to end of outputArray
end repeat
return outputArray
end findPattern:inString:
Note that the '¬' symbol is a line-continuation character (type Option-Return in the AppleScript editor). I've broken up lines to make the script more readable, but the breaks may not copy/paste correctly; if they don't, join each continued line back into a single line.
You use this handler as follows. Remember that the backslash is a special character in AppleScript, so it has to be escaped by preceding it with another backslash:
set foundList to my findPattern:"(?<!\\d)\\d{6}(?!\\d)" inString:"MASTER
ARTIKEL
Artikel
5910020015
867689
PULL1/1
5910020022
867617
PULL1/1
Cappuccino
5910020017
866478
PULL1/1
Braun
5921020017
866403
SHIRT1/2
Kastanie-Multi
5910020016
866343
PULL1/1"
-- Result: {"867689", "867617", "866478", "866403", "866343"}
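The handler's structure (collect the match ranges, then cut each substring out of the original text) can be mirrored in Python for comparison. In this sketch match.span() plays the role of the NSRange values and string slicing plays the role of substringWithRange:. The function name find_pattern is only an illustrative counterpart, and a Python raw string (r"...") avoids the doubled backslashes that AppleScript string literals need.

```python
import re


def find_pattern(pattern: str, text: str) -> list[str]:
    """Mirror of findPattern:inString: built on match ranges."""
    # Collect (start, end) ranges, like valueForKey:"range" on the matches
    ranges = [m.span() for m in re.finditer(pattern, text)]
    # Cut each range out of the original text, like substringWithRange:
    return [text[start:end] for start, end in ranges]


text = """MASTER
5910020015
867689
PULL1/1
5910020022
867617"""
print(find_pattern(r"(?<!\d)\d{6}(?!\d)", text))  # ['867689', '867617']
```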
EDIT
It seems Automator doesn't like the property ClassName : class "ClassName" form I used, so we have to switch to the other form: current application's ClassName's ... The revised Automator AppleScript looks like this (assuming that the text string is passed in as the input):
use AppleScript version "2.4"
use framework "Foundation"
on run {input, parameters}
set foundList to my findPattern:"(?<!\\d)\\d{6}(?!\\d)" inString:((item 1 of input) as text)
return foundList
end run
on findPattern:thePattern inString:theString
set theText to current application's NSString's stringWithString:theString
set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern ¬
options:0 |error|:(missing value)
set theResult to (theRegEx's matchesInString:theText ¬
options:0 ¬
range:{location:0, |length|:theText's |length|})'s valueForKey:("range")
set outputArray to {}
repeat with thisRange in theResult
copy (theText's substringWithRange:thisRange) as text to end of outputArray
end repeat
return outputArray
end findPattern:inString:

Related

TCL regex passing into variable

I am working with Tcl and trying to set up a regex to get the data within my XML string. The code below includes an example string of what I am dealing with; the regexp is meant to find the first closing bracket (>), keep the data up to the next opening bracket (<), and place it into the variable number. Unfortunately, the output I get is "<RouteLabel>Hurdman<" instead of the expected "Hurdman". Any help would be appreciated.
set direction(1) {<RouteLabel>Hurdman</RouteLabel>}
regexp {^.*>(.*)<} $direction(1) number
The issue here is not with the regex but with how you are using it.
The syntax you need is
regexp <PATTERN> <INPUT> <WHOLE_MATCH_VAR> <CAPTURE_1_VAR> ... <CAPTURE_n_VAR>
So, in your case, as you are not interested in the whole match, just put _ where the whole match is expected:
set direction(1) {<RouteLabel>Hurdman</RouteLabel>}
regexp {^.*>(.*)<} $direction(1) _ number
puts $number
This prints Hurdman.
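The distinction between the whole match and the first capture group is not specific to Tcl. In this Python sketch, used only as a stand-in for Tcl's regexp, group(0) corresponds to the first variable Tcl fills (which is what number received in the question) and group(1) to the variable after it.

```python
import re

direction = "<RouteLabel>Hurdman</RouteLabel>"
m = re.search(r"^.*>(.*)<", direction)
whole_match = m.group(0)  # what ended up in 'number' in the question
captured = m.group(1)     # what 'regexp ... _ number' stores in 'number'
print(whole_match)  # <RouteLabel>Hurdman<
print(captured)     # Hurdman
```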
Crash course in tDOM for this exact task:
Get tDOM (note different spelling in package name):
% package require tdom
0.8.3
Create an empty document with a root element called foobar:
% set doc [dom createDocument foobar]
domDoc02569130
Get a fix on the root:
% set root [$doc documentElement]
domNode025692E0
Setup one of your XML strings:
% set direction(1) {<RouteLabel>Hurdman</RouteLabel>}
<RouteLabel>Hurdman</RouteLabel>
Add it to the DOM tree at the root:
% $root appendXML $direction(1)
domNode025692E0
Get the string you want by XPath expression:
% $root selectNodes {string(//RouteLabel/text())}
Hurdman
Or by querying the root (this only works if a single text node is inserted at a time; otherwise you get all of them concatenated):
% $root asText
Hurdman
If you want to clear the DOM tree from the root to make it ready for appending new strings without the old ones interfering:
% foreach node [$root childNodes] {$node delete}
But if you use XPath expressions you should be able to append any number of XML strings and still retrieve their content.
Once again:
package require tdom
set doc [dom createDocument foobar]
set root [$doc documentElement]
set direction(1) {<RouteLabel>Hurdman</RouteLabel>}
$root appendXML $direction(1)
$root selectNodes {string(//RouteLabel/text())}
# => Hurdman
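For readers outside Tcl, the same parse-then-query idea can be sketched with Python's standard xml.etree.ElementTree: wrap the fragments in a dummy root (called foobar here, as in the tDOM example) and query it with ElementTree's limited XPath subset. This is an analogy to the tDOM approach, not a tDOM API, and the second fragment is a hypothetical example.

```python
import xml.etree.ElementTree as ET

fragments = [
    "<RouteLabel>Hurdman</RouteLabel>",
    "<RouteLabel>Blair</RouteLabel>",  # hypothetical second fragment
]
# Wrap the fragments in a dummy root so they form one well-formed document
root = ET.fromstring("<foobar>" + "".join(fragments) + "</foobar>")
# ElementTree supports a subset of XPath, including .//tag
labels = [node.text for node in root.findall(".//RouteLabel")]
print(labels)  # ['Hurdman', 'Blair']
```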
Documentation:
tdom (package)

Notepad++: stop replacing at a specific line

I've been trying to figure something out for a while now and I can't seem to understand. I've looked everywhere and I still can't find it.
I'm trying to make a dictionary for an autocorrector with AutoHotkey, and I need to insert "::" at the beginning of each line and another "::" somewhere in the middle of the line,
like so:
::togehter::together
I have around 20,000 of these lines with no "::" yet, and this is what I'm entering in the Replace dialog:
Replace: ^
With: ::
This works fine for the first line, BUT if I press Replace All (there's no way I'm going to click Replace 20,000 times), it replaces not only from where I am to the bottom but also from the beginning of the file. So every line, including the ones already done, gets a new "::" added.
So what I need is to be able to tell the replace at what line to stop instead of doing every single line.
Also if you could help me add the "::(word)" after the first ::(misspelled word) that would be a great help.
I have found that the regular expression replace-all of ^ with some text, i.e. to add some text at the start of every line, does not work in some versions of Notepad++. My workaround for this was to use the ^(.) as the search string and include \1 in the replacement. For your case the replacement would be ::\1. The effect here is to replace the first character of each line with :: plus the first character. In a quick test with Notepad++ v7.1, replacing ^ with :: worked as I would want.
Two things should be checked in the Replace dialogue before doing the replace-all: (1) that "Regular expression" is selected and (2) "In selection" is not selected.
The question does not make clear how the two words in the input are separated. Assuming one or more spaces or tabs, the search string to use is ^(\w+)\h+ and the replacement string is ::\1::.
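The same search/replace can be tried outside Notepad++ with Python's re module. Python does not support \h, so this sketch substitutes [ \t] (space or tab), matching the assumption about the separator stated above; re.MULTILINE makes ^ match at the start of every line, and the backreference is written \1 just as in Notepad++. The function name to_hotstrings is hypothetical.

```python
import re

# ^(\w+)\h+ in Notepad++; [ \t]+ stands in for \h+ (horizontal whitespace)
PATTERN = re.compile(r"^(\w+)[ \t]+", re.MULTILINE)


def to_hotstrings(text: str) -> str:
    """Turn 'misspelled correct' lines into '::misspelled::correct'."""
    return PATTERN.sub(r"::\1::", text)


print(to_hotstrings("togehter together\nteh\tthe"))
# ::togehter::together
# ::teh::the
```

Because ':' is not a word character, lines that already start with :: do not match ^(\w+), so running the replacement a second time leaves converted lines alone.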
This AutoHotkey script might do what you require.
It leaves lines that start with '::' unchanged, and prepends/replaces text in the others.
Copy the original text to the clipboard and run the script; the converted text is then put on the clipboard. (To create and run the script: paste it into a text editor and save it as myscriptname.ahk, or as myscriptname.txt and then drag and drop the file onto the AutoHotkey exe file. Alternatively, if you install AutoHotkey and save the file with the .ahk extension, you can double-click it to run it.)
vText := Clipboard
vOutput := ""
VarSetCapacity(vOutput, StrLen(vText)*2*2)
; normalise CRLF line endings to LF before parsing
StringReplace, vText, vText, `r`n, `n, All
Loop, Parse, vText, `n
{
    vTemp := A_LoopField
    ; keep empty lines as they are
    if (vTemp = "")
    {
        vOutput .= "`r`n"
        continue
    }
    ; keep lines that already start with ::
    if (SubStr(vTemp, 1, 2) = "::")
    {
        vOutput .= vTemp "`r`n"
        continue
    }
    ; turn every space into :: and prepend ::
    StringReplace, vTemp, vTemp, %A_Space%, ::, All
    vOutput .= "::" vTemp "`r`n"
}
Clipboard := vOutput
MsgBox done
Return
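The per-line logic of the AutoHotkey script can be restated as a small Python function, which may make it easier to check what happens to each kind of line. The clipboard handling is left out, and the function name convert_lines is hypothetical.

```python
def convert_lines(text: str) -> str:
    """Per-line logic of the AutoHotkey script (clipboard I/O omitted)."""
    out = []
    # Normalise CRLF to LF before splitting, as the script does
    for line in text.replace("\r\n", "\n").split("\n"):
        if line == "" or line.startswith("::"):
            out.append(line)  # empty or already-converted lines are kept
        else:
            # every space becomes '::', and '::' is prepended
            out.append("::" + line.replace(" ", "::"))
    # The script appends CRLF after every line, including the last one
    return "".join(item + "\r\n" for item in out)


print(repr(convert_lines("togehter together\r\n::teh::the")))
# '::togehter::together\r\n::teh::the\r\n'
```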

Looping over a text file with AppleScript

I have a text file that looks like this (screenshot below):
http://i.stack.imgur.com/AqKzS.png
Each item has this format:
ID<>Text
~~
ID<>Text
~~
I want to fetch the ID as an integer and the text as a string, both to be used later.
I tried looping over the file many times using the delimiters "<>" and "~~", but each attempt failed with a different script error.
At first I had difficulties because the "Text" contains many line breaks. The text also sometimes contains an English paragraph followed by an Arabic paragraph, as shown in the screenshot.
The ID as highlighted should be {9031} and the Text should be {N/M06"El Patio.......
......
....
....
....
Arabic Text.....}
Can someone help me with the correct script to loop over this text file and fetch each ID followed by its text to be used in a DataEntry process?
For this purpose I recommend installing the Satimage scripting addition (version 3.7.0).
Its benefit is that its find text command supports regular expressions,
so you can easily filter the text with find text:
set theText to read file "HD:Path:to:text.txt" as «class utf8» -- replace the HFS path with the actual path
set theResult to {}
set matches to find text "\\d{1,4}<>.*" in theText with regexp and all occurrences
repeat with aMatch in matches
tell aMatch's matchResult
set end of theResult to {text 1 thru 4, text 7 thru -1}
end tell
end repeat
find text returns a record:
matchLen: length of the match
matchPos: offset of the match (0 is the first character!)
matchResult: the matching string (possibly formatted according to the "using" parameter)
The result of the script in variable theResult is a list of lists containing the id and the text. The text starts after the <> but you might cut more characters.
Edit:
It seems that the regex can't parse this text (or my regex knowledge isn't good enough).
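A likely reason the regex fails is that '.' does not match line breaks by default, so \d{1,4}<>.* stops at the end of the first line of each multi-line text. The Python sketch below illustrates the idea under the assumption that records are separated by a line containing only ~~: a lazy (.*?) with re.DOTALL lets the text span lines. Whether Satimage's find text offers an equivalent option is not confirmed here, and the sample data is made up.

```python
import re

# (\d+)        : the numeric ID
# <>           : the separator
# (.*?)        : the text, lazily, across line breaks thanks to re.DOTALL
# (?=\n~~|\Z)  : stop before the next "~~" line or at the end of the input
RECORD = re.compile(r"(\d+)<>(.*?)(?=\n~~|\Z)", re.DOTALL)

data = "9031<>El Patio\nsecond line\n~~\n12<>Other text\n~~\n"
records = [(int(i), t) for i, t in RECORD.findall(data)]
print(records)  # [(9031, 'El Patio\nsecond line'), (12, 'Other text')]
```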
This is a pure AppleScript version without the Scripting Addition.
set theText to read file ((path to desktop as text) & "description.txt") as «class utf8» -- replace the HFS path with the actual path
set {TID, text item delimiters} to {text item delimiters, ("~~" & linefeed)}
set theMatches to text items of theText
set text item delimiters to TID
set theResult to {}
repeat with aMatch in theMatches
if length of aMatch > 1 then
tell aMatch
set end of theResult to {text 1 thru 4, text 7 thru -1}
end tell
end if
end repeat
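The delimiter-based version can also be sketched in Python. Unlike text 1 thru 4 / text 7 thru -1, this sketch splits each item at the first <>, so IDs of any length work, and it converts the ID to an integer as the question asks. Like the AppleScript above, it assumes items are separated by ~~ followed by a line feed; the function name parse_items and the sample data are hypothetical.

```python
def parse_items(text: str) -> list[tuple[int, str]]:
    """Split 'ID<>Text' items separated by '~~' + line feed into pairs."""
    result = []
    for item in text.split("~~\n"):
        if "<>" not in item:
            continue  # skip empty trailing pieces, like 'length > 1' above
        id_part, body = item.split("<>", 1)  # split at the first <> only
        # drop the line feed that precedes the next ~~
        result.append((int(id_part.strip()), body.rstrip("\n")))
    return result


sample = "9031<>El Patio\nArabic text\n~~\n7<>Short\n~~\n"
print(parse_items(sample))  # [(9031, 'El Patio\nArabic text'), (7, 'Short')]
```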

Using VBScript to find and replace all multi-line text between curly braces of a node within a JSON file

As part of a Windows logon script (hence the VBScript requirement), I would like to set values in a user's Google Chrome preferences (stored in a JSON file in the user profile) to apply download settings when they logon.
I'm trying to achieve the following:
Open a JSON file (%userprofile%\Local Settings\Application Data\Google\Chrome\User Data\Default\Preferences) and read the contents to a string;
Search for a particular node named "download", which by default is pre-populated with multi-line values that may vary between builds;
Replace the entire text between the braces with specified multi-line text; and
Write the updated string to the original file and save.
The full JSON file is fairly large, but as a sample to use as the input, this is copied from a typical Google Chrome prefs JSON file:
"bookmark_bar": {
"show_on_all_tabs": false
},
"download": {
"directory_upgrade": true,
"prompt_for_download": false
},
"sync": {
"suppress_start": true
},
I would like to programmatically search for the "download" node, and replace everything between the braces of just this node so that it reads:
"download": {
"default_directory": "C:\\Windows\\Temp",
"extensions_to_open": "pdf",
"prompt_for_download": false
},
...with the rest of the file's contents unchanged.
Given the whitespace and multiple lines in the section of the JSON to be replaced, as well as the wildcard requirement to include all/any text between the braces, I can't do this using the VBScript Replace function, but my RegEx knowledge is limited.
You can do the replacement with a regular expression:
prefsFile = "%userprofile%\Local Settings\...\Preferences"
prefsFile = CreateObject("WScript.Shell").ExpandEnvironmentStrings(prefsFile)
newPrefs = "..."
Set fso = CreateObject("Scripting.FileSystemObject")
json = fso.OpenTextFile(prefsFile).ReadAll
Set re = New RegExp
re.Pattern = """download"": {[\s\S]*?},"
json = re.Replace(json, """download"": {" & vbCrLf & newPrefs & vbCrLf & "},")
fso.OpenTextFile(prefsFile, 2).Write(json)
The pattern [\s\S] matches any whitespace or non-whitespace character, i.e. any character at all. You can't use . here, because that special character does not match newlines, and the expression has to span multiple lines. The quantifier * means "match any number of characters", and the ? after it makes the match lazy ("use the shortest match"). That way the expression matches everything from the opening brace after the "download": keyword up to the first }, that follows it (as long as the node contains no nested braces).
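The same lazy [\s\S]*? replacement can be sketched in Python to check which text it captures, using the sample JSON and the new settings from the question. The sketch passes a function as the replacement because Python's re.sub would otherwise try to interpret the backslashes in C:\\Windows\\Temp as escape sequences (VBScript's Replace only treats $ specially, so the VBScript code above does not need this).

```python
import re

json_text = '''"bookmark_bar": {
"show_on_all_tabs": false
},
"download": {
"directory_upgrade": true,
"prompt_for_download": false
},
"sync": {
"suppress_start": true
},'''

new_prefs = r'''"default_directory": "C:\\Windows\\Temp",
"extensions_to_open": "pdf",
"prompt_for_download": false'''

# [\s\S]*? : any characters, including newlines, as few as possible
pattern = re.compile(r'"download": \{[\s\S]*?\},')
replacement = '"download": {\n' + new_prefs + '\n},'
# A function replacement is inserted literally (no backslash processing)
updated = pattern.sub(lambda m: replacement, json_text, count=1)
print(updated)
```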

Vim Regex OR in File Name Pattern on Windows

I'm trying to formulate a regular expression that matches the names of a set of files I would like to batch process in Vim but am finding that I cannot seem to use \| (regex OR) as expected...
Specifically, I would like to create an argument list consisting of the following files in the current directory:
f0148.e, f0149.e, f0150.e ... f0159.e (i.e., 12 files total)
The vim command I entered goes as follows:
:arg f01\(\(4[89]\)\|\(5[0-9]\)\).e
Vim completes this command without any noticeable result -- there's no message and the output from :args remains unchanged (doesn't produce the desired list of file names).
If I split up the regular expression to:
:arg f01\(\(4[89]\)\).e (note: keeping the parentheses as in the full expression above)
...and...
:arg f01\(\(5[0-9]\)\).e
... then :args produces f0148.e f0149.e and f0150.e ... f0159.e respectively (as desired).
Also, if I enter the above mentioned list of file names in a text file and use the above mentioned regular expression as a search pattern (i.e., /f01\(\(4[89]\)\|\(5[0-9]\)\).e), it works just as desired.
Thus, I concluded that the alternation (\|) is somehow causing the expression to fail. Please note that I'm using Vim on Windows 7, in case this is relevant (both backslash and pipe have a meaning at the Windows command prompt).
A quick workaround would be to use:
:arg f014[89].e
:argadd f015[0-9].e
...but I would really like to figure out how to make the above regular expression work.
Thanks for your help!
I could suggest:
:let file_list = filter(split(globpath('.','**'),nr2char(10)), 'v:val =~ ''f01\(\(4[89]\)\|\(5[0-9]\)\)\.e'' ')
:execute 'args ' . join(map(file_list,'fnameescape(v:val)'),' ')
How this works:
globpath('.','**') returns all files in the current directory and all its subdirectories as one string. See :help globpath().
split(..., nr2char(10)) turns that string into a list, because the names are separated by line feeds.
filter(..., 'v:val =~ ''pattern'' ') keeps only the items matching pattern. See :help v:val. Doubling a single quote escapes it inside a single-quoted string.
map(..., 'fnameescape(v:val)') escapes spaces and other characters that are special in file name arguments.
join() adds spaces between file names
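The filtering step translates directly to other regex engines. In this Python sketch the Vim pattern f01\(\(4[89]\)\|\(5[0-9]\)\)\.e becomes f01(4[89]|5[0-9])\.e, because Python groups and alternates with bare ( ) and | where Vim's default magic mode needs \( \) and \|. re.search is used because Vim's =~ also matches anywhere in the string. The file names and the function name filter_names are made-up examples.

```python
import re

# Vim:    f01\(\(4[89]\)\|\(5[0-9]\)\)\.e   (magic mode)
# Python: groups and alternation need no backslashes
PATTERN = re.compile(r"f01(4[89]|5[0-9])\.e")


def filter_names(names):
    """Keep names containing a match, like filter(..., 'v:val =~ pat')."""
    return [n for n in names if PATTERN.search(n)]


names = [f"f0{i}.e" for i in range(140, 165)]
selected = filter_names(names)
print(len(selected), selected[0], selected[-1])  # 12 f0148.e f0159.e
```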
If you want to make it a function you can put this into your vimrc:
function! ArgsPattern(pat)
let file_list = filter(split(globpath('.','**'),nr2char(10)), 'v:val =~ ''' . substitute(a:pat,"'","''",'g') . '''')
execute 'args ' . join(map(file_list,'fnameescape(v:val)'),' ')
endfunction
command! -nargs=+ ArgsPattern call ArgsPattern(<q-args>)
And then you only have to do:
:ArgsPattern f01\(\(4[89]\)\|\(5[0-9]\)\)\.e
Note that if there is no match, the execute command inside the function evaluates to :args, so the list of your current arguments is printed.