How to use AppleScript text item delimiters to find and replace text in Pages?

I have multiple Pages documents in which I need to replace a special set of characters. In our language we have one-character prepositions (e.g. v, s, k, u, a) that can't be left orphaned at the end of a line, so I need to replace each preposition and the following space with the preposition and a non-breaking space. I have been trying to use AppleScript (I am quite new to programming), like this:
set findList to {"v ", "s "}
set replaceList to {"v ", "s "} -- each letter here is followed by a non-breaking space (alt+space)
set AppleScript's text item delimiters to ""
tell application "Pages"
    activate
    tell body text of front document
        repeat with i from 1 to count of findList
            set word of (words where it is (item i of findList)) to (item i of replaceList)
        end repeat
    end tell
end tell
return
This does not work as soon as the items in findList and replaceList contain spaces.
Then I found that text item delimiters might help me, and I was able to write this script:
set theText to "Some of my text with v in it"
set AppleScript's text item delimiters to "v "
set theTextItems to text items of theText
set AppleScript's text item delimiters to "v " --this is v with non-breakable space (alt+space)
set theText to theTextItems as string
set AppleScript's text item delimiters to {""}
theText
It works, but only on the plain text set in the first line of the code (when I copy the result into Pages, there really is a non-breaking space).
But now I need a script that works on the whole text of a Pages document.
I have tried something like this:
tell application "Pages"
    activate
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to "v "
    set textItems to body text of front document
    set AppleScript's text item delimiters to "v " --again v with non-breakable space (alt+space)
    tell textItems to set editedText to beginning & "v " & rest --again v with non-breakable space (alt+space)
    set AppleScript's text item delimiters to astid
    set text of document 1 to editedText
end tell
but I get the error
Can’t get beginning of "here is the whole text of the Pages document"." number -1728 from insertion point 1 of "and again the whole text of the document"
If I change the script to:
tell application "Pages"
    activate
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to "v "
    set textItems to text items of body text of front document
    set AppleScript's text item delimiters to "v "
    tell textItems to set editedText to beginning & "v " & rest
    set AppleScript's text item delimiters to astid
    set text of document 1 to editedText
end tell
I get another error
Pages got an error: Can’t get every text item of body text of document 1." number -1728 from every text item of body text of document 1
Can anyone point me in the right direction on how to script this properly?
Thanks.

I hope this helps. It uses AppleScript's text item delimiters to split the text and join it back together. It could be more compact, but this is a more explicit way to write it. Since you will probably use it often in your script, it's a good idea to put it in its own subroutine (handler).
I build a list of {search, replace} pairs, which is easier to maintain in one place, and use a "repeat" loop to apply every pair of corrections. Don't forget the "my" keyword: the call sits inside a tell application "Pages" block, and since Pages doesn't own strRepl(), it will throw an error without it.
Unfortunately, extracting the text and putting it back into Pages will lose any text attributes (formatting). So here it is:
set findReplaceList to {{"a", "A"}, {"b ", "B"}, {"this", "that"}}
tell application "Pages"
    set bodyText to body text of front document -- get the content as text
    repeat with thisFindReplaceValues in findReplaceList
        copy thisFindReplaceValues to {findItem, replaceItem} -- put the first and second items in findItem and replaceItem respectively
        set bodyText to my strRepl(bodyText, findItem, replaceItem) -- search and replace text
    end repeat
    set body text of front document to bodyText -- put the new text back (text attributes are lost)
end tell
on strRepl(SourceStr, searchString, newString)
    set saveDelim to AppleScript's text item delimiters
    set AppleScript's text item delimiters to searchString -- change ATID: the search item
    set temporaryList to every text item in SourceStr -- split the text into parts, removing the searched items
    set AppleScript's text item delimiters to newString -- new ATID: the replace item
    set SourceStr to temporaryList as text -- join the parts back into text with newString between them
    set AppleScript's text item delimiters to saveDelim -- clean up ATIDs
    return SourceStr
end strRepl
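For comparison outside AppleScript, the split/join mechanism behind strRepl() can be sketched in Python (str_repl and apply_pairs are illustrative names, not from the answer). One difference worth keeping in mind: Python's split is always case-sensitive, whereas AppleScript's text item delimiters ignore case unless the split runs inside a considering case block.

```python
from typing import Iterable, Tuple


def str_repl(source: str, search: str, new: str) -> str:
    """Mirror of strRepl(): split on the search string, join with the replacement."""
    parts = source.split(search)  # like: every text item in SourceStr (ATID = search)
    return new.join(parts)        # like: temporaryList as text (ATID = new)


def apply_pairs(text: str, pairs: Iterable[Tuple[str, str]]) -> str:
    """Apply each (find, replace) pair in order, like the repeat loop above."""
    for find_item, replace_item in pairs:
        text = str_repl(text, find_item, replace_item)
    return text


if __name__ == "__main__":
    # prints: that is A Btest
    print(apply_pairs("this is a b test", [("a", "A"), ("b ", "B"), ("this", "that")]))
```

Note that the pairs are applied one after another, so a later pair also sees the output of the earlier ones.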

This script is a variation on Chino22's. Given the consistent requirement here (always a single letter, replaced by itself followed by a non-breaking space), I've switched to a simple list of single letters and build the search and replacement strings when calling the handler.
-- List of prepositions to seek out (added the 'z' as it was prevalent in the article used for testing)
set chList to {"v", "s", "k", "u", "a", "z"}
tell application "Pages"
    set bodyText to body text of front document
    repeat with prep in chList
        -- call replacement handler
        set bodyText to my strRepl(bodyText, space & prep & space, space & prep & character id 160)
    end repeat
    set body text of front document to bodyText
end tell
on strRepl(srcStr, oldStr, newStr)
    set AppleScript's text item delimiters to oldStr
    considering case
        set temporaryList to every text item in srcStr
    end considering
    set AppleScript's text item delimiters to newStr
    set srcStr to temporaryList as text
    return srcStr
end strRepl
NB: My search and replace strings include a space both before and after the letter. This ensures that only single-letter words are affected. I added a considering case block to further restrict the search to lowercase letters. The 'character id 160' is the non-breaking space. Finally, I left out the commands that save and restore the text item delimiters, to reduce clutter; add them back at your discretion. A single letter followed by punctuation will not be processed, and neither will one at the very start of the text or of a paragraph, since there is no space before it.
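Because the search string needs a space before the letter, a preposition at the very start of the text or of a paragraph is skipped, and one that directly follows an already-processed preposition can be missed too, depending on the order of chList (its leading space has become a non-breaking space). As a different technique (a regular expression rather than text item delimiters, and in Python, since plain AppleScript has no built-in regex command), here is a sketch of the same rule using a lookbehind; bind_prepositions is a hypothetical name:

```python
import re

NBSP = "\u00a0"          # the same character as AppleScript's character id 160
PREPOSITIONS = "vskuaz"  # lowercase only, like the considering case block above

# A one-letter preposition followed by a regular space, where the letter is at the
# start of the text or preceded by whitespace (a newline, a space, or an NBSP that
# was inserted for an earlier preposition). (?<!\S) means "not preceded by a non-space".
PATTERN = re.compile(r"(?<!\S)([" + PREPOSITIONS + r"]) ")


def bind_prepositions(text: str) -> str:
    """Replace the space after each one-letter preposition with a non-breaking space."""
    return PATTERN.sub(lambda m: m.group(1) + NBSP, text)
```

Adding uppercase letters to PREPOSITIONS would also cover a capitalized preposition at the start of a sentence.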
Regarding some of the errors you were seeing: they are likely the result of Pages having issues with text item delimiters inside its tell block. In general, you would need to split the script into three sections, along these lines:
tell application "Pages" to set bt to body text of front document
-- all the text item delimiters work goes here, including 'set editedText to …'
tell application "Pages" to set body text of front document to editedText
Using the handler as Chino22 suggests circumvents this issue by putting all that work in the handler (which is outside the tell block). Also, 'beginning' and 'rest' don't mean what you assume they do in AppleScript. Finally, I have read recommendations to work at the paragraph level rather than with the entire body text. It may not be an issue for you, but if you are working with very large documents and run into problems, it may be worth modifying the script accordingly.

Related

How to make QTextCursor::WordUnderCursor include more characters or larger words

I'm trying to show a tooltip when the cursor is over a keyword in a text editor using:
QTextCursor cursor = cursorForPosition(pos);
cursor.select(QTextCursor::WordUnderCursor);
This works well, but its definition of a word does not fit my needs. For example, the keyword \abcde is recognized as two words, \ and abcde. Similarly, the word a1:2 is recognized as three parts: a1, : and 2. Basically, I'd like to change the behavior so that a word is defined as a run of characters separated by spaces.
I tried QTextCursor::BlockUnderCursor, but it does the same as QTextCursor::LineUnderCursor and returns the entire line.
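One way to get whitespace-delimited words (not from the question; a sketch under assumptions) is to take the text of the current line and scan outward from the cursor position to the nearest whitespace on each side. In Qt, the inputs would presumably come from cursor.block().text() and cursor.positionInBlock(), and the resulting range could then be selected with QTextCursor::setPosition() and QTextCursor::KeepAnchor, offset by cursor.block().position(). The scan itself is shown here in Python; word_at is a hypothetical helper:

```python
from typing import Tuple


def word_at(text: str, pos: int) -> Tuple[int, int, str]:
    """Return (start, end, word) for the whitespace-delimited word touching pos.

    pos is a cursor position between characters (0..len(text)), like a
    QTextCursor position. The word is empty when whitespace surrounds pos.
    """
    if not 0 <= pos <= len(text):
        raise IndexError("cursor position out of range")
    start = pos
    while start > 0 and not text[start - 1].isspace():
        start -= 1  # walk left until whitespace or the start of the line
    end = pos
    while end < len(text) and not text[end].isspace():
        end += 1  # walk right until whitespace or the end of the line
    return start, end, text[start:end]
```

With this rule, \abcde and a1:2 each come back as a single word.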

Manage looping on txt file with AppleScript

I have a text file that looks like this (screenshot below):
http://i.stack.imgur.com/AqKzS.png
Each item has this format:
ID<>Text
~~
ID<>Text
~~
I want to fetch the ID as an integer and the Text as a string, both to be used later.
I have tried looping over the file many times using the delimiters "<>" and "~~", but each attempt fails with a different script error.
First, I had difficulties because the file contains a lot of newlines throughout the "Text". Also, the text sometimes contains an English paragraph followed by an Arabic paragraph, as shown in the screenshot.
The ID as highlighted should be {9031} and the Text should be {N/M06"El Patio.......
......
....
....
....
Arabic Text.....}
Can someone help me with the correct script to loop over this text file and fetch each ID followed by its text to be used in a DataEntry process?
For this purpose I recommend installing the Satimage osax 3.7.0.
Its benefit is that it can find text with regular expressions.
Then you can easily filter the text with find text:
set theText to read file "HD:Path:to:text.txt" as «class utf8» -- replace the HFS path with the actual path
set theResult to {}
set matches to find text "\\d{1,4}<>.*" in theText with regexp and all occurrences
repeat with aMatch in matches
    tell aMatch's matchResult
        set end of theResult to {text 1 thru 4, text 7 thru -1} -- assumes 4-digit IDs followed by "<>"
    end tell
end repeat
find text returns a record with these fields (with all occurrences, a list of such records):
matchLen: length of the match
matchPos: offset of the match (0 is the first character!)
matchResult: the matching string (possibly formatted according to the "using" parameter)
The result of the script, in the variable theResult, is a list of lists containing the ID and the text. The text starts right after the <>, but you might want to cut more characters.
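A plausible reason a pattern like \d{1,4}<>.* falls short here is that in many regex flavors "." does not match a line break, so only the first line of each multi-line Text would be captured. That is the default in Python; whether Satimage behaves the same way is an assumption. This Python sketch (parse_records is a hypothetical name, and LF line endings with "~~" on a line of its own are assumed) lets the text span lines and stops at the next "~~" line:

```python
import re
from typing import List, Tuple

# ID (1-4 digits) at the start of a line, "<>", then everything (newlines included,
# thanks to DOTALL) up to the next line that is exactly "~~", or the end of the text.
RECORD = re.compile(r"^(\d{1,4})<>(.*?)(?:^~~$|\Z)", re.MULTILINE | re.DOTALL)


def parse_records(text: str) -> List[Tuple[int, str]]:
    """Return (ID, Text) pairs; assumes LF line endings and "~~" on its own line."""
    return [(int(m.group(1)), m.group(2).rstrip("\n")) for m in RECORD.finditer(text)]
```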
Edit:
It seems that the regex can't parse this text (or my regex knowledge isn't good enough).
This is a pure AppleScript version without the Scripting Addition.
set theText to read file ((path to desktop as text) & "description.txt") as «class utf8» -- adjust the file name/location as needed
set {TID, text item delimiters} to {text item delimiters, ("~~" & linefeed)} -- split the file at each "~~" line
set theMatches to text items of theText
set text item delimiters to TID -- restore the previous delimiters
set theResult to {}
repeat with aMatch in theMatches
    if length of aMatch > 1 then -- skip empty leftovers, e.g. after the final "~~"
        tell aMatch
            -- assumes 4-digit IDs: characters 1-4 are the ID, 5-6 are "<>", the text starts at 7
            set end of theResult to {(text 1 thru 4) as integer, text 7 thru -1}
        end tell
    end if
end repeat
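For comparison, the same split-then-slice idea in Python (parse_by_split is a hypothetical name). Instead of the fixed text 1 thru 4 / text 7 thru -1 positions, it cuts each chunk at the first "<>", so it also copes with IDs that are not exactly four digits, and it converts the ID to an integer as the question asks:

```python
from typing import List, Tuple


def parse_by_split(text: str) -> List[Tuple[int, str]]:
    """Split on "~~" + linefeed like the AppleScript above, then cut each chunk at "<>"."""
    records: List[Tuple[int, str]] = []
    for chunk in text.split("~~\n"):  # like text item delimiters ("~~" & linefeed)
        if len(chunk) <= 1:           # skip empty leftovers, like length of aMatch > 1
            continue
        id_part, sep, body = chunk.partition("<>")
        if not sep:                   # no "<>" in this chunk: not a record
            continue
        records.append((int(id_part), body.rstrip("\n")))  # ID as an integer, any length
    return records
```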

applescript choose from list result

I have created a 'choose from list' in AppleScript where the choices are lines in a .txt file. It looks like this:
set listofUrls to {}
set Urls to paragraphs of (read urlList)
repeat with nextLine in Urls
    if length of nextLine is greater than 0 then
        copy nextLine to the end of listofUrls
    end if
end repeat
choose from list listofUrls with title "Refine URL list" with prompt "Please select the URLs that will comprise your corpus." with multiple selections allowed
This works very nicely, and if I 'return result', I get a list in the results window in the format "urlx", "urlb" etc.
The problem is that when I try to save this list to a text file with, for example:
write result to newList
the formatting of the file is bizarre:
listutxtÇhttp://url1.htmlutxtÇhttp://url2.htmlutxt~http://url3.htmlutxtzhttp:// ...
It seems that null characters have been inserted, too. So, does anybody know what's going on? Can anybody think of a way to either:
a) write results as clean (preferably newline delimited) txt?
b) clean this output so that it is back to normal?
Thanks for your time!
Daniel
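Without seeing exactly what you are writing to the file, I think you just need to convert the result to a string with one item per line. Writing the list itself most likely stores AppleScript's raw binary form of the list, which is where the "list" and "utxt" markers in your file come from.
Pseudocode: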
set listofUrls to {}
set urlList to ":Users:loaner:Documents:urllist.txt" as alias
set Urls to paragraphs of (read urlList)
repeat with nextLine in Urls
    if length of nextLine is greater than 0 then
        copy nextLine to the end of listofUrls
    end if
end repeat
choose from list listofUrls with title "Refine URL list" with prompt "Please select the URLs that will comprise your corpus." with multiple selections allowed
set choices to the result
if choices is false then return -- the user clicked Cancel
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed -- one URL per line
set list_2_string to choices as text
set AppleScript's text item delimiters to tid -- restore the previous delimiters
log list_2_string
write list_2_string to newList -- newList is the file you opened for writing in your script
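The same fix, sketched in Python for comparison (the helper name write_lines and the UTF-8 encoding are assumptions): join the selected items into one newline-separated string yourself, then write that string, so no list or type markers end up in the file.

```python
from pathlib import Path
from typing import Sequence


def write_lines(items: Sequence[str], path: Path) -> None:
    """Write the chosen items as plain UTF-8 text, one per line."""
    # Build the string first (like setting the delimiters and using `as text`),
    # so only the characters themselves reach the file.
    path.write_text("\n".join(items) + "\n", encoding="utf-8")
```

Calling write_lines(["http://url1.html", "http://url2.html"], Path("urls.txt")) produces a two-line text file.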

RichTextBox search'n'replace results are staggered

I am trying to color the results of a keyword search. My code displays a RichTextBox containing a text that was successfully matched by the search engine.
Now I want to highlight the keywords in the text by making them bold and red. I have my list of words in a string array, which I iterate over like this (rtb is my RichTextBox; plainText is the only Run in rtb and contains its entire text):
rtb.SelectAll();
string allText = rtb.Selection.Text;
string expression = "";
foreach (string word in words)
{
    expression = Regex.Escape(word);
    Regex regExp = new Regex(expression);
    foreach (Match match in regExp.Matches(allText))
    {
        TextPointer start = plainText.ContentStart.GetPositionAtOffset(match.Index, LogicalDirection.Forward);
        TextPointer end = plainText.ContentStart.GetPositionAtOffset(match.Index + match.Length, LogicalDirection.Forward);
        rtb.Selection.Select(start, end);
        rtb.Selection.ApplyPropertyValue(Run.FontWeightProperty, FontWeights.Bold);
        rtb.Selection.ApplyPropertyValue(Run.ForegroundProperty, "red");
    }
}
I thought this would do the trick, but only the first word gets highlighted correctly. The second highlight starts too early: the correct number of letters is highlighted, but a few characters before the actual word. The third starts even more characters earlier, and so on.
Have you got any idea what is causing this behavior?
EDIT (01/07/2013): I still can't figure out why these results are staggered. So far I have noticed that if I create a variable set to zero right before the inner foreach statement, add it to each TextPointer offset, and increment it by 4 (no idea why) at the end of each iteration, the results are colored correctly. However, if I search for two or more keywords (whether or not they are the same length), every occurrence of the first keyword gets colored correctly, but only the first occurrence of each of the other keywords does (the others are staggered again). Here's the edited code:
rtb.SelectAll();
string allText = rtb.Selection.Text;
string expression = "";
foreach (string word in words)
{
    expression = Regex.Escape(word);
    Regex regExp = new Regex(expression);
    int i = 0;
    foreach (Match match in regExp.Matches(allText))
    {
        TextPointer start = plainText.ContentStart.GetPositionAtOffset(match.Index + i, LogicalDirection.Forward);
        TextPointer end = plainText.ContentStart.GetPositionAtOffset(match.Index + match.Length + i, LogicalDirection.Forward);
        rtb.Selection.Select(start, end);
        rtb.Selection.ApplyPropertyValue(Run.FontWeightProperty, FontWeights.Bold);
        rtb.Selection.ApplyPropertyValue(Run.ForegroundProperty, "red");
        i += 4; // number found out from trials
    }
}
Alright! I learned from reading this question that every time I modify the style, it adds 4 symbols (not visible characters) to the text, which is what was throwing off my offsets.
To fix this, since I may have several keywords and they do not necessarily appear in the text in the order in which they were typed in the search box, I first scan the text to locate every occurrence of each keyword without modifying it. For each occurrence, I store the start position, end position and desired color in a custom list.
When this pass is done, I sort the occurrence list by start position. Now each occurrence my foreach loop visits is guaranteed to be the next one in the text, regardless of its content or length. And since I know which color each one should get, I can distinguish different keywords.
Finally, I iterate over the sorted list and modify the style of the text, knowing that each next word appears later in the text, so I add 4 to my offset at the end of each iteration.
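The two-pass idea above (collect every match first, sort by start position, then style in text order while carrying a running offset) can be sketched without WPF. In this Python toy model, which is an assumption and not the RichTextBox API, styling a span wraps it in two 2-character markers, so each styled match shifts everything after it by 4, the same number the author found by trial:

```python
import re
from typing import List, Tuple

SHIFT_PER_STYLE = 4  # toy model: "{{" before and "}}" after each styled span

Occurrence = Tuple[int, int, str]  # (start, end, color)


def find_occurrences(text: str, keywords: List[str], colors: List[str]) -> List[Occurrence]:
    """Pass 1: record every match of every keyword without touching the text."""
    found: List[Occurrence] = []
    for word, color in zip(keywords, colors):
        for m in re.finditer(re.escape(word), text):
            found.append((m.start(), m.end(), color))
    # Sort by start position so pass 2 walks the text from left to right,
    # whatever order the keywords were typed in.
    found.sort(key=lambda occ: occ[0])
    return found


def apply_styles(text: str, occurrences: List[Occurrence]) -> str:
    """Pass 2: style each match in text order, compensating for earlier insertions."""
    shift = 0
    for start, end, _color in occurrences:  # color is carried along but unused in this toy
        s, e = start + shift, end + shift
        text = text[:s] + "{{" + text[s:e] + "}}" + text[e:]
        shift += SHIFT_PER_STYLE
    return text
```

The running offset only works if matches do not overlap; overlapping keywords would need extra handling.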

Format lists in VIM

I would like to find a way to easy format lists in Vim.
I checked PAR and the default formatter of Vim.
For example:
1. this is my text this is my text this is my text
2. this is my text this is my text this is my text
3. this is my text this is my text this is my text
4. this is my text this is my text this is my text
and this
- this is my text this is my text this is my text
- this is my text this is my text this is my text
- this is my text this is my text this is my text
- this is my text this is my text this is my text
When I select the lines and format them to a width of 42 with par and with Vim's default formatter, these are the results:
NUMBERED LIST
formatting with par:
par error:
(42) <= (0) + (50)
formatting with vim:
1. this is my text this is my text this is
my text
2. this is my text this is my text this is
my text
3. this is my text this is my text this is
my text
4. this is my text this is my text this is
my text
LIST with '-'
formatting with par:
4 lines filtered (no change)
formatting with vim:
- this is my text this is my text this is
my text
- this is my text this is my text this is
my text
- this is my text this is my text this is
my text
- this is my text this is my text this is
my text
Vim does a better job of formatting lists, but its output is still not correct (the continuation lines are not indented), including for the numbered list.
Par has a lot of trouble formatting lists, even when I use the prefix ("p") option like this:
'<,'>!par w42p4dh or '<,'>!par w42p3dh
Does anyone know a good way to format lists without these problems?
Try set fo+=n. From :help fo-table:
n When formatting text, recognize numbered lists. This actually uses
the 'formatlistpat' option, thus any kind of list can be used. The
indent of the text after the number is used for the next line. The
default is to find a number, optionally followed by '.', ':', ')',
']' or '}'. Note that 'autoindent' must be set too. Doesn't work
well together with "2".
Example:
1. the first item
   wraps
2. the second item
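To illustrate what fo+=n does with 'formatlistpat', here is a rough Python analogue (not Vim's implementation; format_list_item is a hypothetical name): match the list marker with a pattern similar to Vim's default, then wrap with a hanging indent as wide as the marker.

```python
import re
import textwrap

# Approximation of Vim's default 'formatlistpat' (a number optionally followed by
# ] : . ) or }), plus a "-" bullet for the question's second list (an addition).
LIST_MARKER = re.compile(r"^(\s*(?:\d+[\]:.)}]?|-)\s+)")


def format_list_item(line: str, width: int) -> str:
    """Wrap one list item; continuation lines line up with the text after the marker."""
    m = LIST_MARKER.match(line)
    if m is None:
        return textwrap.fill(line, width=width)
    marker = m.group(1)
    return textwrap.fill(
        line[len(marker):],
        width=width,
        initial_indent=marker,
        subsequent_indent=" " * len(marker),
    )
```

With width 42, the question's first numbered line becomes "1. this is my text this is my text this is" followed by "   my text", the same shape as the help example.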