Cleverly replace values in a cell with a macro using regex? - regex

I've got the following macro:
Sub MacroClear()
Dim wbD As Workbook, _
wbC As Workbook, _
wsD As Worksheet, _
wsC As Worksheet, _
DicC() As Variant, _
Dic() As String, _
ValToReplace As String, _
IsInDic As Boolean
Set wbD = Workbooks.Open("D:\Users\me\Documents\macro\Dictionnary", ReadOnly:=True)
Set wbC = Workbooks("FileToTreat.xlsm")
Set wsD = wbD.Worksheets("Feuil1")
Set wsC = wbC.Worksheets("draft")
ReDim DicC(1, 0)
For i = 1 To wsD.Range("C" & wsD.Rows.Count).End(xlUp).Row
Dic = Split(wsD.Cells(i, 3), ";")
ValToReplace = Trim(wsD.Cells(i, 2))
For k = LBound(Dic) To UBound(Dic)
IsInDic = False
For l = LBound(DicC, 2) To UBound(DicC, 2)
If LCase(DicC(1, l)) <> Trim(LCase(Dic(k))) Then
'No match
Else
'Match
IsInDic = True
Exit For
End If
Next l
If IsInDic Then
'Don't add to DicC
Else
DicC(0, UBound(DicC, 2)) = Trim(Dic(k))
DicC(1, UBound(DicC, 2)) = ValToReplace
ReDim Preserve DicC(UBound(DicC, 1), UBound(DicC, 2) + 1)
End If
Next k
Next i
ReDim Preserve DicC(UBound(DicC, 1), UBound(DicC, 2) - 1)
wbD.Close
Erase Dic
For l = LBound(DicC, 2) To UBound(DicC, 2)
Cells.Replace What:="*" & Trim(DicC(0, l)) & "*", _
Replacement:=Trim(DicC(1, l)), _
LookAt:=xlPart, _
SearchOrder:=xlByRows, _
MatchCase:=False, _
SearchFormat:=False, _
ReplaceFormat:=False
Next l
Erase DicC
Set wbD = Nothing
Set wbC = Nothing
Set wsD = Nothing
Set wsC = Nothing
End Sub
I'll try to explain: it takes my "words to replace" from the dictionary (column C), all separated by ";", and my "primary words" (column B).
image http://img11.hostingpics.net/pics/403257dictionnary.png
Then it searches all the cells of my "file to treat" (via Cells.Replace); if it finds something from column C of my dictionary, it replaces it with what's in column B.
But now that I've got "SCE" in column C (for Sony Computer Entertainment, to be replaced by Sony in column B), it replaces the word with Sony even when SCE is inside a word (for example: ascend). I don't want to replace it if it's inside a word...
In Java, I'd have done it easily with p = Pattern.compile("[^a-zA-Z]"+keyword+"[^a-zA-Z]", Pattern.CASE_INSENSITIVE);
but I have no idea how to solve this problem in VBA. I tried a few things, but they didn't work (errors, etc.), so I went back to the start.

So I changed a few parameters in the Replace method and proposed a loop over all your cells; you'll just have to set the right column (in the second proposition, B = 2).
Parameters:
LookAt:=xlWhole 'To search for whole expression
SearchOrder:=xlByColumns 'Search in column
MatchCase:=True 'Match the exact letter case (case-sensitive search)
Try one of these:
For l = LBound(DicC, 2) To UBound(DicC, 2)
Cells.Replace What:="*" & Trim(DicC(0, l)) & "*", _
Replacement:=Trim(DicC(1, l)), _
LookAt:=xlWhole, _
SearchOrder:=xlByColumns, _
MatchCase:=True, _
SearchFormat:=False, _
ReplaceFormat:=False
Next l
Or with a loop over each cell:
For l = LBound(DicC, 2) To UBound(DicC, 2)
For k = 1 To wsC.Cells(wsC.Rows.Count, 2).End(xlUp).Row 'Last used row in column B
wsC.Cells(k, 2).Replace What:="*" & Trim(DicC(0, l)) & "*", _
Replacement:=Trim(DicC(1, l)), _
LookAt:=xlWhole, _
SearchOrder:=xlByColumns, _
MatchCase:=True, _
SearchFormat:=False, _
ReplaceFormat:=False
Next k
Next l
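The whole-word condition from the question can also be checked directly, without wildcards. The following is only a minimal sketch of that test, written in OCaml rather than VBA; the names is_letter and contains_whole_word are hypothetical, and it considers ASCII letters only. Unlike the Java pattern quoted in the question, it also accepts a keyword at the very start or end of the text.

```ocaml
(* ASCII letters only; accented letters are treated as separators. *)
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* True when [keyword] occurs in [text] (case-insensitively) with no
   letter immediately before or after it. Hypothetical helper. *)
let contains_whole_word ~keyword text =
  let n = String.length text and k = String.length keyword in
  let lower_text = String.lowercase_ascii text
  and lower_key = String.lowercase_ascii keyword in
  let rec go i =
    if k = 0 || i + k > n then false
    else if
      String.sub lower_text i k = lower_key
      && (i = 0 || not (is_letter text.[i - 1]))
      && (i + k = n || not (is_letter text.[i + k]))
    then true
    else go (i + 1)
  in
  go 0
```

With this test, "ascend" is rejected for the keyword "SCE", while "games by SCE" is accepted; a cell would only be replaced when the test returns true.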

Related

how to extract words from a string with special characters

I am currently trying to do a spellcheck, but am having some trouble dealing with certain cases.
For example, given the string: { else"--but, }, my spellcheck automatically reads this as an invalid word. However, since else and but are both correctly spelled, I don't want to mark this as incorrect.
Is there any way I can do this with regular expressions?
A more common case I am having trouble with is things like "waistcoat-pocket".
Rather than a regular expression, you should use Unicode word segmentation. With the uuseg and uucp libraries, you can extract words and filter out word boundaries with
let is_alphaword =
let alphachar = function
| `Malformed _ -> false
| `Uchar x ->
match Uucp.Break.word x with
| `LE | `Extend -> true
| _ -> false
in
Uutf.String.fold_utf_8 (fun acc _ x -> acc && alphachar x) true
(* Note that we are supposing strings to be utf-8 encoded *)
let words s =
let cons l x = if is_alphaword x then x :: l else l in
List.rev (Uuseg_string.fold_utf_8 `Word cons [] s)
This function splits the string word by word:
words "else\"--but";;
- : string list = ["else"; "but"]
words "waistcoat-pocket";;
- : string list = ["waistcoat"; "pocket"]
and works correctly in more general contexts
words "आ तवेता नि षीदतेन्द्रमभि पर गायत";;
- : string list =
["आ"; "तवेता"; "नि"; "षीदतेन्द्रमभि";
"पर"; "गायत"]
or
words "Étoile(de Barnard)";;
- : string list = ["Étoile"; "de"; "Barnard"]
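For comparison, here is a dependency-free sketch that uses only the standard library. It is an ASCII-only approximation, not Unicode word segmentation: the names is_ascii_letter and ascii_words are hypothetical, and inputs such as the Devanagari or "Étoile" examples above would not be split correctly, which is exactly the case where uuseg is needed.

```ocaml
(* ASCII-only: a word is a maximal run of the letters a-z / A-Z. *)
let is_ascii_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

let ascii_words (s : string) : string list =
  let buf = Buffer.create 16 in
  let acc = ref [] in
  (* Push the word collected so far, if any, onto the result list. *)
  let flush () =
    if Buffer.length buf > 0 then begin
      acc := Buffer.contents buf :: !acc;
      Buffer.clear buf
    end
  in
  String.iter
    (fun c -> if is_ascii_letter c then Buffer.add_char buf c else flush ())
    s;
  flush ();
  List.rev !acc
```

On the two inputs from the question, ascii_words "else\"--but" and ascii_words "waistcoat-pocket" give the same lists as the uuseg version.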

How do you split a string into lists unless it is inside quotation marks ("") in Ocaml?

I'm reading an input file of several lines. Each line has the following format:
Greeting "hello"
Greeting " Good morning"
Sit
Smile
Question "How are you?"
My current code can read each line into a string list. Then I process it using this function, which is supposed to break it into a string list list:
let rec process (l : string list) (acc : string list list) : string list list =
match l with
| [] -> acc
| hd :: tl -> String.split_on_char ' ' hd :: (process tl acc)
Unfortunately, this does not work, since it also splits on spaces inside quotation marks. Can anyone think of the right way to do this, possibly using map or fold_left, etc.? This would be my expected output:
[["Greeting"; "\"hello\""]; ["Greeting"; "\" Good morning\""]; ["Sit"]]
and so on. Thank you!
You want a real (but very simple) lexical analysis. IMHO this is beyond what you can do with simple string splitting.
A scanner takes a stream of characters and returns the next token it sees. You can make a string into a stream by having an index that traverses the string.
Here is a scanner that is roughly what you would want:
let rec scan s offset =
let slen = String.length s in
if offset >= slen then
None
else if s.[offset] = ' ' then
scan s (offset + 1)
else if s.[offset] = '"' then
let rec qlook loff =
if loff >= slen then
(* Unterminated quotation *)
let tok = String.sub s offset (slen - offset) in
Some (tok, slen)
else if s.[loff] = '"' then
let tok = String.sub s offset (loff - offset + 1) in
Some (tok, loff + 1)
else qlook (loff + 1)
in
qlook (offset + 1)
else
let rec wlook loff =
if loff >= slen then
let tok = String.sub s offset (slen - offset) in
Some (tok, slen)
else if s.[loff] = ' ' || s.[loff] = '"' then
let tok = String.sub s offset (loff - offset) in
Some (tok, loff)
else
wlook (loff + 1)
in
wlook (offset + 1)
It handles a few cases that you didn't specify: what to do with an unclosed quotation, and what to do with something like abc"def ghi".
The scanner returns None at the end of the string, or Some (token, offset), i.e., the next token and the offset to continue scanning.
A recursive function to break up a string would look something like this:
let split s =
let rec isplit accum offset =
match scan s offset with
| None -> List.rev accum
| Some (tok, offset') -> isplit (tok :: accum) offset'
in
isplit [] 0
This can be visualized as a state machine. You have 2 main states: looking for ' ' and looking for '"'. Processing strings is ugly and you can't pattern match on them, so the first thing I did was turn the string into a char list. Implementing the two states then becomes simple:
let split s =
let rec split_space acc word = function
| [] -> List.rev (List.rev word::acc)
| ' '::xs -> split_space (List.rev word::acc) [] xs
| '"'::xs -> find_quote acc ('"'::word) xs
| x::xs -> split_space acc (x::word) xs
and find_quote acc word = function
| [] -> List.rev (List.rev word::acc)
| '"'::xs -> split_space acc ('"'::word) xs
| x::xs -> find_quote acc (x::word) xs
in
split_space [] [] s
;;
# split ['a';'b';' ';'"';'c';' ';'d';'"';' ';'e'];;
- : char list list = [['a'; 'b']; ['"'; 'c'; ' '; 'd'; '"']; ['e']]
Now if you want to do it with strings, that's left to you. The idea would be the same. Or you can just turn the char list list into a string list at the end.
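The last suggestion can be sketched as follows. The split function is repeated from above so that the block is self-contained; explode, implode and split_string are hypothetical helper names, and String.to_seq / String.of_seq assume OCaml 4.07 or later.

```ocaml
(* The two-state splitter from above, unchanged. *)
let split s =
  let rec split_space acc word = function
    | [] -> List.rev (List.rev word :: acc)
    | ' ' :: xs -> split_space (List.rev word :: acc) [] xs
    | '"' :: xs -> find_quote acc ('"' :: word) xs
    | x :: xs -> split_space acc (x :: word) xs
  and find_quote acc word = function
    | [] -> List.rev (List.rev word :: acc)
    | '"' :: xs -> split_space acc ('"' :: word) xs
    | x :: xs -> find_quote acc (x :: word) xs
  in
  split_space [] [] s

(* Hypothetical helpers: string <-> char list conversions. *)
let explode (s : string) : char list = List.of_seq (String.to_seq s)
let implode (cs : char list) : string = String.of_seq (List.to_seq cs)

(* Split one line into tokens, keeping quoted sections together. *)
let split_string (line : string) : string list =
  List.map implode (split (explode line))
```

A whole file that has already been read into a string list can then be processed with List.map split_string lines. Note that consecutive spaces outside quotes produce empty tokens, as in the char list version.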

SML- how to look at a string and put letters a-z into a list (only once)

I have seen some similar questions, but nothing that really helped me. Basically the title says it all. Using SML I want to take a string that I have, and make a list containing each letter found in the string. Any help would be greatly appreciated.
One possibility is to use the basic logic of quicksort to sort the letters while removing duplicates at the same time. Something like:
fun distinctChars []:char list = []
| distinctChars (c::cs) =
let val smaller = List.filter (fn x => x < c) cs
val bigger = List.filter (fn x => x > c) cs
in distinctChars smaller @ [c] @ distinctChars bigger
end
If the < and > in the definitions of smaller and bigger were to be replaced by <= and >= then it would simply be an implementation of quicksort (although not the most efficient one since it makes two passes over cs when a suitably defined auxiliary function could split into smaller and bigger in just one pass). The strict inequalities have the effect of throwing away duplicates.
To get what you want from here, do something like explode the string into a list of chars, remove non-alphabetical characters from the resulting list, while simultaneously converting to lower case, then invoke the above function -- ideally first refined so that it uses a custom split function rather than List.filter twice.
On Edit: @ is an expensive operator and probably results in the naïve SML quicksort not being all that quick. You can use the above idea of a modified sort, but one that modifies mergesort instead of quicksort:
fun split ls =
let fun split' [] (xs,ys) = (xs,ys)
| split' (a::[]) (xs, ys) = (a::xs,ys)
| split' (a::b::cs) (xs, ys) = split' cs (a::xs, b::ys)
in split' ls ([],[])
end
fun mergeDistinct ([], ys) = ys:char list
| mergeDistinct (xs, []) = xs
| mergeDistinct (x::xs, y::ys) =
if x < y then x::mergeDistinct(xs,y::ys)
else if x > y then y::mergeDistinct(x::xs,ys)
else mergeDistinct(x::xs, ys)
fun distinctChars [] = []
| distinctChars [c] = [c]
| distinctChars chars =
let val (xs,ys) = split chars
in mergeDistinct (distinctChars xs, distinctChars ys)
end
You can get a list of all the letters in a few different ways:
val letters = [#"a",#"b",#"c",#"d",#"e",#"f",#"g",#"h",#"i",#"j",#"k",#"l",#"m",#"n",#"o",#"p",#"q",#"r",#"s",#"t",#"u",#"v",#"w",#"x",#"y",#"z"]
val letters = explode "abcdefghijklmnopqrstuvwxyz"
val letters = List.tabulate (26, fn i => chr (i + ord #"a"))
Update: Looking at your question and John's answer, I might have misunderstood your intention. An efficient way to iterate over a string and gather some result (e.g. a set of characters) could be to write a "foldr for strings":
fun string_foldr f acc0 s =
let val len = size s
fun loop i acc = if i < len then loop (i+1) (f (String.sub (s, i), acc)) else acc
in loop 0 acc0 end
Given an implementation of sets with at least setEmpty and setInsert, one could then write:
val setLetters = string_foldr (fn (c, ls) => setInsert ls c) setEmpty "some sentence"
The simplest solution I can think of:
To get the distinct elements of a list:
1. Take the head.
2. Remove that value from the tail and get the distinct elements of the result.
3. Put 1 and 2 together.
In code:
(* Return the distinct elements of a list *)
fun distinct [] = []
| distinct (x::xs) = x :: distinct (List.filter (fn c => x <> c) xs);
(* All the distinct letters, in lower case. *)
fun letters s = distinct (List.map Char.toLower (List.filter Char.isAlpha (explode s)));
(* Variation: "point-free" style *)
val letters' = distinct o (List.map Char.toLower) o (List.filter Char.isAlpha) o explode;
This is probably not the most efficient solution, but it's uncomplicated.
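For readers more familiar with OCaml, the same idea might look roughly like this. It is a sketch under the same assumptions as the SML version (quadratic time, ASCII letters only); is_alpha is a hypothetical stand-in for SML's Char.isAlpha, and String.to_seq assumes OCaml 4.07 or later.

```ocaml
(* Keep the first occurrence of each element, dropping later duplicates. *)
let rec distinct = function
  | [] -> []
  | x :: xs -> x :: distinct (List.filter (fun c -> c <> x) xs)

(* Hypothetical stand-in for Char.isAlpha, ASCII only. *)
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* All the distinct letters of [s], in lower case, in order of appearance. *)
let letters (s : string) : char list =
  String.to_seq s
  |> List.of_seq
  |> List.filter is_alpha
  |> List.map Char.lowercase_ascii
  |> distinct
```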

Trying to get first word from character list

I have a character list [#"h", #"i", #" ", #"h", #"i"] from which I want to get the first word (the first character sequence before a space).
I've written a function which gives me this warning:
stdIn:13.1-13.42 Warning: type vars not generalized because of value
restriction are instantiated to dummy types (X1,X2,...)
Here is my code:
fun next [] = ([], [])
| next (hd::tl) = if(not(ord(hd) >= 97 andalso ord(hd) <= 122)) then ([], (hd::tl))
else
let
fun getword [] = [] | getword (hd::tl) = if(ord(hd) >= 97 andalso ord(hd) <= 122) then [hd]@getword tl else [];
in
next (getword (hd::tl))
end;
EDIT:
Expected input and output
next [#"h", #"i", #" ", #"h", #"i"] => ([#"h", #"i"], [#" ", #"h", #"i"])
Can anybody help me with this solution? Thanks!
This functionality already exists within the standard library:
val nexts = String.tokens Char.isSpace
val nexts_test = nexts "hi hi hi" = ["hi", "hi", "hi"]
But if you were to build such a function anyway, it seems that you return ([], []) sometimes and a single list at other times. Normally in a recursive function, you can build the result by doing e.g. c :: recursive_f cs, but this is assuming your function returns a single list. If, instead, it returns a tuple, you suddenly have to unpack this tuple using e.g. pattern matching in a let-expression:
let val (x, y) = recursive_f cs
in (c :: x, y + ...) end
Or you could use an extra argument inside a helper function (since the extra argument would change the type of the function) to store the word you're extracting, instead. A consequence of doing that is that you end up with the word in reverse and have to reverse it back when you're done recursing.
fun isLegal c = ord c >= 97 andalso ord c <= 122 (* Only lowercase ASCII letters *)
(* But why not use one of the following:
fun isLegal c = Char.isAlpha c
fun isLegal c = not (Char.isSpace c) *)
fun next input =
let fun extract (c::cs) word =
if isLegal c
then extract cs (c::word)
else (rev word, c::cs)
| extract [] word = (rev word, [])
in extract input [] end
val next_test_1 =
let val (w, r) = next (explode "hello world")
in (implode w, implode r) = ("hello", " world")
end
val next_test_2 = next [] = ([], [])

Why does pattern with variable not match?

I'm writing code to find the median of a list. I cannot use rec and should use List.fold_left/right. I wrote the following code, which I think should work.
It finds the length of the list: if it's an odd number like 5, it sets len1, len2 to 2, 3; if it's an even number like 6, it also sets len1, len2 to 2, 3.
Then, for each member of the list, I match on the number of elements that are less than it.
However, the following pattern match always matches lessNum elmt against len1. Can someone tell me why?
let median (lst : int list) : float option =
let len = List.length lst in
if lst = [] then None
else
let len1, len2 = (len - 1) / 2, (len + 1) / 2 in
let lessNum a =
List.length (List.find_all (fun n -> n < a) lst) in
let answer = List.fold_left (fun accm elmt ->
match (lessNum elmt) with
| len1 -> accm + elmt
| len2 -> failwith "len2"
| _ -> failwith "other"
) 0 lst in
if len mod 2 = 0
then Some ((float_of_int answer) /. 2.0)
else Some (float_of_int answer)
An identifier appearing in a pattern always matches, and binds the corresponding value to the identifier. Any current value of the identifier doesn't matter at all: the pattern causes a new binding, i.e., it gives a new value to the identifier (just inside the match).
# let a = 3;;
val a : int = 3
# match 5 with a -> a;;
- : int = 5
# a;;
- : int = 3
#
So your match expression isn't doing what you think it is. You'll probably have to use an if for that part of your code.
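Besides a plain if, a when guard is another way to compare against existing values inside a match. The sketch below is only an illustration: classify and its string results are hypothetical and stand in for the three branches of the question's match.

```ocaml
(* Compare [n] against the existing values len1 and len2.
   The guard performs the comparison; the pattern x merely binds n. *)
let classify (len1 : int) (len2 : int) (n : int) : string =
  match n with
  | x when x = len1 -> "len1"
  | x when x = len2 -> "len2"
  | _ -> "other"
```

Here classify 2 3 2 selects the first branch, classify 2 3 3 the second, and any other count falls through to the last one.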
Update
Here's how to use an association list to approximate the function f in your followup question:
let f x = List.assoc x [(pat1, ans1); (pat2, ans2)]
This will raise a Not_found exception if x is not equal to pat1 or pat2.
(I think your Python code is missing return.)
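A small concrete instance of the association-list approach, with made-up keys and answers (1, 2, "one" and "two" are placeholders for pat1, pat2, ans1 and ans2). List.assoc_opt, available since OCaml 4.05, returns an option instead of raising Not_found.

```ocaml
(* Placeholder table standing in for [(pat1, ans1); (pat2, ans2)]. *)
let table = [ (1, "one"); (2, "two") ]

(* Raises Not_found when x is not a key of the table. *)
let f x = List.assoc x table

(* Same lookup, but a missing key yields None instead of an exception. *)
let f_opt x = List.assoc_opt x table
```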