I have a text file that looks like the one in this screenshot:
http://i.stack.imgur.com/AqKzS.png
Each item has this format:
ID<>Text
~~
ID<>Text
~~
I want to fetch the ID as an integer and the Text as a string, both to be used later.
I have looped over the file many times using the delimiters "<>" and "~~", but each attempt fails with a different script error.
At first I had difficulties because the file contains many newlines within the "Text". The text also sometimes contains an English paragraph followed by an Arabic paragraph, as shown in the screenshot.
The ID as highlighted should be {9031} and the Text should be {N/M06"El Patio.......
......
....
....
....
Arabic Text.....}
Can someone help me with the correct script to loop over this text file and fetch each ID followed by its text to be used in a DataEntry process?
For this purpose I recommend installing the Satimage scripting addition (Satimage.osax 3.7.0).
Its benefit is that it can find text with regular expressions.
You can then easily filter the text with its find text command:
set theText to read file "HD:Path:to:text.txt" as «class utf8» -- replace the HFS path with the actual path
set theResult to {}
set matches to find text "\\d{1,4}<>.*" in theText with regexp and all occurrences
repeat with aMatch in matches
tell aMatch's matchResult
set end of theResult to {text 1 thru 4, text 7 thru -1}
end tell
end repeat
find text returns a record:
matchLen: length of the match
matchPos: offset of the match (0 is the first character!)
matchResult: the matching string (possibly formatted according to the "using" parameter)
The result of the script, in the variable theResult, is a list of lists, each containing the ID and the text. The text starts after the <>, but you may need to cut more characters. Note that text 1 thru 4 and text 7 thru -1 assume a four-digit ID.
Edit:
It seems that the regex can't parse this text (or my regex knowledge is too bad).
This is a pure AppleScript version without the Scripting Addition.
set theText to read file ((path to desktop as text) & "description.txt") as «class utf8» -- replace the HFS path with the actual path
set {TID, text item delimiters} to {text item delimiters, ("~~" & linefeed)}
set theMatches to text items of theText
set text item delimiters to TID
set theResult to {}
repeat with aMatch in theMatches
if length of aMatch > 1 then
tell aMatch
set end of theResult to {text 1 thru 4, text 7 thru -1}
end tell
end if
end repeat
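For comparison, the same split-on-delimiters idea can be sketched in Python. This is only an illustration and not part of the AppleScript answer: the function name parse_records is made up, and it assumes that "~~" never occurs inside a text and that every ID is numeric.

```python
def parse_records(raw):
    """Split 'ID<>Text' records separated by '~~' into (int, str) pairs."""
    records = []
    for chunk in raw.split("~~"):
        chunk = chunk.strip()  # drop the newlines around each record
        if not chunk:
            continue  # skip the empty piece after the last '~~'
        id_part, sep, text = chunk.partition("<>")
        if not sep:
            continue  # no '<>' separator found, so this is not a record
        # int() converts the ID; the text keeps its inner newlines
        records.append((int(id_part), text))
    return records
```

Because the text is split on "~~" first and on the first "<>" second, newlines and mixed English/Arabic paragraphs inside the text are preserved as they are. The file would need to be read as UTF-8, as in the AppleScript version.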
I have multiple Pages documents in which I need to replace a particular set of characters. In our language we have one-character prepositions (e.g. v, s, k, u, a) that must not be orphaned at the end of a line, so I need to replace each preposition and the space after it with the preposition and a non-breaking space. I have been trying to use AppleScript (I am quite new to programming) with a script like this one:
set findList to {"v ", "s "}
set replaceList to {"v ", "s "}
set AppleScript's text item delimiters to ""
tell application "Pages"
activate
tell body text of front document
repeat with i from 1 to count of findList
set word of (words where it is (item i of findList)) to (item i of replaceList)
end repeat
end tell
end tell
return
This does not work as long as there are any spaces in the findList and replaceList parameters.
So I found that text item delimiters might help me. I was able to make this script:
set theText to "Some of my text with v in it"
set AppleScript's text item delimiters to "v "
set theTextItems to text items of theText
set AppleScript's text item delimiters to "v " --this is v with non-breakable space (alt+space)
set theText to theTextItems as string
set AppleScript's text item delimiters to {""}
theText
which works, but only with plain text set on the first line of the code (when I copy the result to Pages there is truly a non-breakable space).
But now I need to write a script that works on the whole text of a Pages document.
I have tried something like this:
tell application "Pages"
activate
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "v "
set textItems to body text of front document
set AppleScript's text item delimiters to "v " --again v with non-breakable space (alt+space)
tell textItems to set editedText to beginning & "v " & rest --again v with non-breakable space (alt+space)
set AppleScript's text item delimiters to astid
set text of document 1 to editedText
end tell
but I get the error
Can’t get beginning of "here is the whole text of the Pages document"." number -1728 from insertion point 1 of "and again the whole text of the document"
If I change the script to:
tell application "Pages"
activate
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "v "
set textItems to text items of body text of front document
set AppleScript's text item delimiters to "v "
tell textItems to set editedText to beginning & "v " & rest
set AppleScript's text item delimiters to astid
set text of document 1 to editedText
end tell
I get another error
Pages got an error: Can’t get every text item of body text of document 1." number -1728 from every text item of body text of document 1
Can anyone point me to the right direction how to properly script this?
Thanks.
I hope this helps. It uses AppleScript's text item delimiters to split the text and join it back together. It could be more compact, but this is a more explicit way to write it. Since you may use it often in your script, it is a good idea to put it in its own subroutine.
I build a list of {search, replace} pairs, which is easier to maintain in one place, and use a repeat loop to apply every pair. Don't forget the "my" keyword: Pages doesn't own strRepl(), and without it the call raises an error.
Unfortunately, extracting the text and putting it back into Pages loses any text attributes. So here it is:
set findReplaceList to {{"a", "A"}, {"b ", "B"}, {"this", "that"}}
tell application "Pages"
set bodyText to body text of front document -- get the content as text
repeat with thisFindReplaceValues in findReplaceList
copy thisFindReplaceValues to {findItem, replaceItem} -- put first and second item resp. in findItem and replaceItem
set bodyText to my strRepl(bodyText, findItem, replaceItem) -- search and replace text
end repeat
set body text of front document to bodyText -- put the new text back (this loses text attributes)
end tell
on strRepl(SourceStr, searchString, newString)
set saveDelim to AppleScript's text item delimiters
set AppleScript's text item delimiters to searchString -- change ATID : the search item
set temporaryList to every text item in SourceStr -- split the text in parts removing searched items
set AppleScript's text item delimiters to newString -- New ATID : the replace item
set SourceStr to temporaryList as text -- join the parts back into text with newString between them
set AppleScript's text item delimiters to saveDelim -- clean up ATIDs
return SourceStr
end strRepl
This script is a variation on @Chino22's. Given the consistent requirement here (always a single letter, replaced by itself), I've moved to a simple list of single elements and build the replacement when calling the handler.
-- List of prepositions to seek out (added the 'z' as it was prevalent in the article used for testing)
set chList to {"v", "s", "k", "u", "a", "z"}
tell application "Pages"
set bodyText to body text of front document
repeat with prep in chList
-- call replacement handler
set bodyText to my strRepl(bodyText, space & prep & space, space & prep & character id 160)
end repeat
set body text of front document to bodyText
end tell
on strRepl(srcStr, oldStr, newStr)
set AppleScript's text item delimiters to oldStr
considering case
set temporaryList to every text item in srcStr
end considering
set AppleScript's text item delimiters to newStr
set srcStr to temporaryList as text
return srcStr
end strRepl
NB My search and replace strings include a space both before and after the letter. This ensures that only single-letter words are affected. I added a considering case to further restrict the search to lower case letters. The 'character id 160' specifies the non-breaking space. Finally I left out the first and last delimiter commands to reduce clutter. Add them back at your discretion. A single letter followed by punctuation will not be processed.
Regarding some of the errors you were seeing… They are likely a result of Pages having issues with text item delimiters within its tell block. In general, you would need to split the script into three sections, along these lines:
tell application "Pages" to set bt to body text of front document
-- myriad delimiters stuff goes here, including 'set editedText to…'
tell application "Pages" to set body text of front document to editedText
Using the handler as Chino22 suggests circumvents this issue, because all of that work happens inside the handler (which is outside the tell block). Also, 'beginning' and 'rest' don't mean what you assume they do in AppleScript. Finally, I have read recommendations to work at the paragraph level rather than with the entire body text. That may not be an issue for you, but if you are working with very large documents and run into problems, it may be worth modifying the script accordingly.
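The replacement rule used in both handlers can also be written down as a short Python sketch. This is only an illustration of the logic, not something that drives Pages: the name bind_prepositions is hypothetical, and the text is assumed to be available as a plain string already.

```python
NBSP = "\u00a0"  # non-breaking space, same character as AppleScript's 'character id 160'


def bind_prepositions(text, prepositions):
    """Replace ' x ' with ' x<NBSP>' for every single-letter word x in the list."""
    for prep in prepositions:
        # str.replace is case-sensitive, like the 'considering case' block,
        # and the surrounding spaces restrict the match to standalone letters
        text = text.replace(f" {prep} ", f" {prep}{NBSP}")
    return text
```

As in the AppleScript version, a preposition at the very start of the text or one followed by punctuation is left untouched, because the pattern requires a space on both sides.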
I am new to regex and AppleScript and I need a little bit of support and guidance.
First, the user inputs a string. It could be anything, on one line or several.
A regex should then be applied to the string to find numbers with exactly 6 digits, no more and no less, and separate them with a comma and a space.
The final string should look like: 867689, 867617, 866478, 866403, 866343.
Then this string will be converted into a list.
I am using this site to test my Regexes : https://www.freeformatter.com/regex-tester.html
The Regex that matches exactly 6 digits is:
(?<!\d)\d{6}(?!\d)
I am aware that in order to use a regex from AppleScript I need to use a shell script. I am also aware that I should use sed, but unfortunately I do not fully understand what it is or how to use it.
Following a few guides and tests, I understood that sed does not work with \d and that I should use [0-9] instead, and that I should also escape the brackets like this: \(..\). Also, the replacement $1, should be written as \1,. So far I have not been able to make it work.
My user input for tests is as follows:
MASTER
ARTIKEL
Artikel
5910020015
867689
PULL1/1
5910020022
867617
PULL1/1
Cappuccino
5910020017
866478
PULL1/1
Braun
5921020017
866403
SHIRT1/2
Kastanie-Multi
5910020016
866343
PULL1/1
and the AppleScript Code itself:
use scripting additions
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
on list2string(theFoldersList, theDelimiter)
set theBackup to AppleScript's text item delimiters
set AppleScript's text item delimiters to theDelimiter
set theString to theFoldersList as string
set AppleScript's text item delimiters to theBackup
return theString
end list2string
on run {input}
display dialog "Please enter your string: " default answer ""
set stringOfNumbers to the text returned of the result
set num to do shell script "sed 's/\(\(?<![0-9]\)[0-9]{6}\(?![0-9]\)\)\1, /' <<< " & quoted form of stringOfNumbers
--(?<!\d)\d{6}(?!\d)
display dialog stringOfNumbers
set stringOfNumbers to current application's NSString's stringWithString:stringOfNumbers
set listOfArtNumbers to (stringOfNumbers's componentsSeparatedByString:", ") as list
display dialog list2string(listOfArtNumbers, ", ")
return input
end run
Unfortunately everywhere I escape characters by using \ I get an error. So I had to remove all \ but once I run the script I receive "Syntax Error: sed: 1: "s/(?<![0-9])[0-9]{6}(?! ...": unterminated substitute pattern" and all my effort resulted in a similar error.
AppleScriptObjC lets us use regular expressions through NSRegularExpression, which is available from OS X 10.7 (Lion) onward. The following handler returns the results of a regular expression search as a list:
use AppleScript version "2.4"
use framework "Foundation"
property NSRegularExpression : class "NSRegularExpression"
property NSString : class "NSString"
on findPattern:thePattern inString:theString
set theText to NSString's stringWithString:theString
set theRegEx to NSRegularExpression's regularExpressionWithPattern:thePattern ¬
options:0 |error|:(missing value)
set theResult to (theRegEx's matchesInString:theText ¬
options:0 ¬
range:{location:0, |length|:theText's |length|})'s valueForKey:("range")
set outputArray to {}
repeat with thisRange in theResult
copy (theText's substringWithRange:thisRange) as text to end of outputArray
end repeat
return outputArray
end findPattern:inString:
Note that the '¬' symbol is a line-continuation symbol (type option-return in the AppleScript editor). I've broken up lines to make the script more readable, but that may not copy/paste correctly, so be aware that those should be single, continuous lines.
You use this handler as follows. Remember that the backslash is a special character in AppleScript, so it has to be escaped by preceding it with another backslash:
set foundList to my findPattern:"(?<!\\d)\\d{6}(?!\\d)" inString:"MASTER
ARTIKEL
Artikel
5910020015
867689
PULL1/1
5910020022
867617
PULL1/1
Cappuccino
5910020017
866478
PULL1/1
Braun
5921020017
866403
SHIRT1/2
Kastanie-Multi
5910020016
866343
PULL1/1"
-- Result: {"867689", "867617", "866478", "866403", "866343"}
EDIT
It seems Automator doesn't like the property ClassName : class "ClassName" method I've used, so we have to switch to another form: using current application's ClassName's ... The revised Automator AppleScript looks like so (assuming that the text string is passed in as the input):
use AppleScript version "2.4"
use framework "Foundation"
on run {input, parameters}
set foundList to my findPattern:"(?<!\\d)\\d{6}(?!\\d)" inString:((item 1 of input) as text)
return foundList
end run
on findPattern:thePattern inString:theString
set theText to current application's NSString's stringWithString:theString
set theRegEx to current application's NSRegularExpression's regularExpressionWithPattern:thePattern ¬
options:0 |error|:(missing value)
set theResult to (theRegEx's matchesInString:theText ¬
options:0 ¬
range:{location:0, |length|:theText's |length|})'s valueForKey:("range")
set outputArray to {}
repeat with thisRange in theResult
copy (theText's substringWithRange:thisRange) as text to end of outputArray
end repeat
return outputArray
end findPattern:inString:
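To check the pattern itself outside of AppleScript, here is a small Python sketch that applies the same lookbehind/lookahead expression. The function name find_six_digit_numbers is made up for this example; the pattern is the one from the question.

```python
import re

# Exactly six digits, not preceded or followed by another digit
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")


def find_six_digit_numbers(text):
    """Return every standalone six-digit number in text, in order of appearance."""
    return SIX_DIGITS.findall(text)
```

Joining the returned list with ", " gives the comma-separated string described in the question. The ten-digit article numbers are skipped because every six-digit window inside them has a digit on at least one side.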
I've been trying to figure something out for a while now and I can't seem to understand. I've looked everywhere and I still can't find it.
I'm trying to make a dictionary for an auto corrector with AutoHotKey and I need to replace the beginning of each line with "::" and somewhere in between the line with another "::"
like so:
::togehter::together
Now I have around 20,000 of these to add with no "::" yet and what I'm doing is this in the replace textbox:
Replace: ^
With: ::
Now it works fine for the first line BUT if I press replace all cause no way am I going to click 20,000 times on replace, it replaces not only from where I am to the bottom but also the beginning too. So every line now has a new "::" added.
So what I need is to be able to tell the replace at what line to stop instead of doing every single line.
Also if you could help me add the "::(word)" after the first ::(misspelled word) that would be a great help.
I have found that the regular expression replace-all of ^ with some text, i.e. to add some text at the start of every line, does not work in some versions of Notepad++. My workaround for this was to use the ^(.) as the search string and include \1 in the replacement. For your case the replacement would be ::\1. The effect here is to replace the first character of each line with :: plus the first character. In a quick test with Notepad++ v7.1, replacing ^ with :: worked as I would want.
Two things should be checked in the Replace dialogue before doing the replace-all: (1) that "Regular expression" is selected and (2) "In selection" is not selected.
The question does not make clear how the two words on each input line are separated. Assuming they are separated by one or more spaces or tabs, the search string to use is ^(\w+)\h+ and the replace string is ::\1::.
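The effect of that search and replace pair can be tried out with a short Python sketch. Two things here are assumptions rather than part of the answer: the function name to_hotstrings is invented, and since Python's re module has no \h, the class [ \t] stands in for "horizontal whitespace".

```python
import re

# First word of each line, followed by one or more spaces or tabs
LINE_START_WORD = re.compile(r"^(\w+)[ \t]+", flags=re.MULTILINE)


def to_hotstrings(text):
    """Turn 'wrong right' lines into '::wrong::right' lines."""
    return LINE_START_WORD.sub(r"::\1::", text)
```

Lines that already start with "::" are left alone, because ":" is not a word character and so ^(\w+) does not match there. Running the replacement a second time therefore does not add another pair of colons.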
This AutoHotkey script might do what you require.
It leaves lines that start with '::' unchanged and prepends/replaces text in the others.
Copy the original text to the clipboard and run the script; the converted text is then placed on the clipboard. (To create and run the script: paste it into a text editor, save it as myscriptname.ahk or myscriptname.txt, and drag and drop the file onto the AutoHotkey exe file. Alternatively, if you save it as an .ahk file and have AutoHotkey installed, you can double-click it to run.)
vText := Clipboard
vOutput := ""
VarSetCapacity(vOutput, StrLen(vText)*2*2)
StringReplace, vText, vText, `r`n, `n, All
Loop, Parse, vText, `n
{
vTemp := A_LoopField
; keep blank lines as they are
if (vTemp = "")
{
vOutput .= "`r`n"
continue
}
; leave lines that already start with "::" unchanged
if (SubStr(vTemp, 1, 2) = "::")
{
vOutput .= vTemp "`r`n"
continue
}
StringReplace, vTemp, vTemp, %A_Space%, ::, All
vOutput .= "::" vTemp "`r`n"
}
Clipboard := vOutput
MsgBox done
Return
As part of a Windows logon script (hence the VBScript requirement), I would like to set values in a user's Google Chrome preferences (stored in a JSON file in the user profile) to apply download settings when they log on.
I'm trying to achieve the following:
Open a JSON file (%userprofile%\Local Settings\Application Data\Google\Chrome\User Data\Default\Preferences) and read the contents to a string;
Search for a particular node named "download", which by default is pre-populated with multi-line values that may vary between builds;
Replace the entire text between the braces with specified multi-line text; and
Write the updated string to the original file and save.
The full JSON file is fairly large, but as a sample to use as the input, this is copied from a typical Google Chrome prefs JSON file:
"bookmark_bar": {
"show_on_all_tabs": false
},
"download": {
"directory_upgrade": true,
"prompt_for_download": false
},
"sync": {
"suppress_start": true
},
I would like to programmatically search for the "download" node, and replace everything between the braces of just this node so that it reads:
"download": {
"default_directory": "C:\\Windows\\Temp",
"extensions_to_open": "pdf",
"prompt_for_download": false
},
...with the rest of the file's contents unchanged.
Given the whitespace and multiple lines in the section of the JSON to be replaced, as well as the wildcard requirement to include all/any text between the braces, I can't do this using the VBScript Replace function, but my RegEx knowledge is limited.
You can do the replacement with a regular expression:
prefsFile = "%userprofile%\Local Settings\...\Preferences"
prefsFile = CreateObject("WScript.Shell").ExpandEnvironmentStrings(prefsFile)
newPrefs = "..."
Set fso = CreateObject("Scripting.FileSystemObject")
json = fso.OpenTextFile(prefsFile).ReadAll
Set re = New RegExp
re.Pattern = """download"": {[\s\S]*?},"
json = re.Replace(json, """download"": {" & vbCrLf & newPrefs & vbCrLf & "},")
fso.OpenTextFile(prefsFile, 2).Write(json)
The character class [\s\S] matches any whitespace or non-whitespace character, i.e. any character at all. You can't use . here, because . does not match newlines and the expression has to span multiple lines. The quantifier * means "match any number of characters", and the ? after it makes the match lazy ("use the shortest match"). That way the expression matches everything from the "download": key up to the first closing brace followed by a comma. Note that if the "download" node ever contains a nested object, the lazy match will stop at that inner "}," instead.
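The behaviour of the lazy [\s\S]*? match can be checked with a small Python sketch that uses the same pattern. This is an illustration only, not a replacement for the VBScript: the function name replace_download_block and its parameters are hypothetical, and it assumes, like the answer, that the node is written as "download": { ... }, with no nested objects.

```python
import re

# Same idea as the VBScript pattern: lazily match up to the first '},'
DOWNLOAD_BLOCK = re.compile(r'"download": \{[\s\S]*?\},')


def replace_download_block(json_text, new_body):
    """Replace the contents of the first "download" node with new_body."""
    replacement = '"download": {\n' + new_body + '\n},'
    # A function is passed as the replacement so that backslashes in new_body
    # (e.g. in a Windows path) are inserted literally, not read as escapes.
    return DOWNLOAD_BLOCK.sub(lambda match: replacement, json_text, count=1)
```

Only the first match is replaced (count=1), which corresponds to the VBScript RegExp object's default of Global = False. Everything before and after the matched node is returned unchanged.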
I would like to find an easy way to format lists in Vim.
I tried par and Vim's default formatter.
For example:
1. this is my text this is my text this is my text
2. this is my text this is my text this is my text
3. this is my text this is my text this is my text
4. this is my text this is my text this is my text
and this
- this is my text this is my text this is my text
- this is my text this is my text this is my text
- this is my text this is my text this is my text
- this is my text this is my text this is my text
When I select the lines and format them to a width of 42 with par and with Vim, these are the results:
NUMBERED LIST
formatting with par:
par error:
(42) <= (0) + (50)
formatting with vim:
1. this is my text this is my text this is
my text
2. this is my text this is my text this is
my text
3. this is my text this is my text this is
my text
4. this is my text this is my text this is
my text
LIST with '-'
formatting with par:
4 lines filtered (no change)
formatting with vim:
- this is my text this is my text this is
my text
- this is my text this is my text this is
my text
- this is my text this is my text this is
my text
- this is my text this is my text this is
my text
Vim does a better job of formatting lists, but its result for a numbered list is still not correct.
par has a lot of trouble formatting lists, even when I use the prefix ("p") option like this:
'<,'>!par w42p4dh or '<,'>!par w42p3dh
Does anyone know a good way to format lists without these problems?
Try set fo+=n. From :help fo-table:
n When formatting text, recognize numbered lists. This actually uses
the 'formatlistpat' option, thus any kind of list can be used. The
indent of the text after the number is used for the next line. The
default is to find a number, optionally followed by '.', ':', ')',
']' or '}'. Note that 'autoindent' must be set too. Doesn't work
well together with "2".
Example:
1. the first item
   wraps
2. the second item
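To make the intended result concrete, here is a rough Python sketch of the hanging-indent wrapping that the n flag is meant to produce. It is not how Vim implements it: the function name format_list_item is invented, and the marker pattern is a simplified stand-in for 'formatlistpat' that also accepts "-" bullets, which is an assumption added for the second list in the question.

```python
import re
import textwrap

# A number followed by one of . : ) ] } or a '-' bullet, then whitespace
LIST_MARKER = re.compile(r"^\s*(?:\d+[\]:.)}]|-)\s+")


def format_list_item(line, width=42):
    """Wrap one list item, indenting continuation lines past the marker."""
    match = LIST_MARKER.match(line)
    # Continuation lines are indented by the width of the marker
    indent = " " * len(match.group(0)) if match else ""
    return textwrap.fill(line, width=width, subsequent_indent=indent)
```

Applied to each line of the two lists from the question, this keeps every wrapped "my text" aligned under the start of the item text instead of under the number or the dash.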