Swift 3: iosMath label removing all spaces - replace

I'm trying to display text which may at times contain a math expression, so I am using MTMathUILabel from iosMath. I generate the labels dynamically and add them to a stack as I pull the strings from the db. The problem is that all text which is not math appears with no spaces, e.g.:
In db: Solve the following equation: (math here)
In label: Solvethefollowingequation: (math here)
Here is what I have tried so far:
for question in all_questions {
let finalString = question.question?.replacingOccurrences(of: " ", with: "\\space", options: .literal, range: nil)
let label = MTMathUILabel()
label.textColor = UIColor.black
label.latex = finalString
stack.addArrangedSubview(label)
}
But the problem is that it literally places two backslashes. And Xcode doesn't let me write just one \ because it is not escaped. However, if I just write
print("\\space")
Then it will print just one.
How can I fix this so I add only one \? If this cannot be done, how can I achieve what I want? Is there a better library out there?

After a quick look at MTMathUILabel's docs and LaTeX conventions, I believe you should replace your spaces with a tilde character "~". This will make them non-breaking spaces and avoid the backslash issue (which is probably due to \space not being understood by MTMathUILabel).
Systematically replacing all spaces may yield undesirable results if the formula itself has legitimate spaces in it.
For example, a quadratic equation would be expressed as:
x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}
You will end up replacing spaces inside curly braces, and that may or may not be what you want:
x~=~\frac{-b~\pm~\sqrt{b^2-4ac}}{2a}
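A minimal sketch of a middle ground, written in JavaScript rather than Swift (the loop translates directly): it swaps spaces for "~" only at brace depth 0, so spaces inside {...} arguments are left alone. The function name tildeOutsideBraces is made up for illustration, and the sketch assumes balanced braces and does not treat escaped braces (\{, \}) specially.

```javascript
// Replace spaces with "~" (a non-breaking space in LaTeX) only outside {...}.
// Assumes balanced braces; escaped braces \{ and \} are not handled.
function tildeOutsideBraces(latex) {
  let depth = 0; // current nesting level of curly braces
  let out = "";
  for (const ch of latex) {
    if (ch === "{") depth += 1;
    else if (ch === "}") depth = Math.max(0, depth - 1);
    out += ch === " " && depth === 0 ? "~" : ch;
  }
  return out;
}

const text = tildeOutsideBraces("Solve the following equation:");
// text: "Solve~the~following~equation:"
const formula = tildeOutsideBraces("x = \\frac{-b \\pm \\sqrt{b^2-4ac}}{2a}");
// formula: x~=~\frac{-b \pm \sqrt{b^2-4ac}}{2a}
```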

Related

How to remove/replace special characters from a 'dynamic' regex/string in Ruby?

So I had this code working for a few months already. Let's say I have a table called Categories, which has a string column called name. I receive a string and I want to know whether any category was mentioned (a mention occurs when the string contains the substring #name_of_a_category). The approach I followed was something like this:
categories.select { |category_i| content_received.downcase.match(/##{category_i.downcase}/)}
That worked pretty well until today, when I suddenly started to receive an "unmatched close parenthesis" exception. I realized that category names can contain special characters, so I decided not to consider special characters or spaces anymore (I don't want to add restrictions for the user, and at the same time I don't want to deal with those cases, so the policy is simply to ignore them).
So the question is: is there a clean way of removing these special characters (keeping the #) and matching the string (I don't want to modify the data, just ignore those characters while looking for mentions)?
You can also use
prep_content_received = content_received.gsub(/[^\w\s]|_/,'')
p categories.select { |c|
prep_content_received.match?(/\b#{c.gsub(/[^\w\s]|_/, '').strip()}\b/i)
}
Details:
The prep_content_received = content_received.gsub(/[^\w\s]|_/,'') line creates a copy of content_received without special characters and without _. Doing this only once reduces overhead when there are a lot of categories.
Then you iterate over the categories list and, each time, check whether prep_content_received matches \b (word boundary) + the category with all special characters, _ and leading/trailing whitespace stripped from it + \b, case-insensitively (see the /i flag; there is no need for .downcase).
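The same idea as a minimal JavaScript sketch, for comparison (the helper names stripSpecial and mentionedCategories are hypothetical). One addition that is not in the Ruby answer: a category that is empty after cleaning (e.g. ":)") is skipped, because \b\b would otherwise match almost any text. As in the Ruby version, the "#" is stripped from the text too, so a category counts as mentioned even without the hash.

```javascript
// Remove "special" characters: anything that is not a word character or
// whitespace, plus "_" (which \w would otherwise keep).
function stripSpecial(s) {
  return s.replace(/[^\w\s]|_/g, "");
}

function mentionedCategories(content, categories) {
  // Clean the text once, outside the loop.
  const prepared = stripSpecial(content);
  return categories.filter((category) => {
    const cleaned = stripSpecial(category).trim();
    // Skip categories that are empty after cleaning: /\b\b/ would match almost anything.
    if (cleaned === "") return false;
    // cleaned now holds only letters, digits and whitespace, so it is safe to
    // interpolate into a RegExp; "i" makes the match case-insensitive.
    return new RegExp(`\\b${cleaned}\\b`, "i").test(prepared);
  });
}

const found = mentionedCategories("pepe is watching a #comedy :)", ["comedy :)", "terror"]);
// found: ["comedy :)"]
```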
So after looking around I found some answers on the platform, but nothing with my specific requirements (maybe I missed something; if so, please let me know). This is how I fixed it for my case:
content_received = 'pepe is watching a #comedy :)'
categories = ['comedy :)', 'terror']
temp_content = content_received.downcase
categories.select { |category_i| temp_content.gsub(/[^\sa-zA-Z0-9]/, '#' => '#').match?(/##{category_i.downcase.gsub(/[^\sa-zA-Z0-9]/, '')}/) }
For the sake of the example, I reduced the categories to a simple array of strings. Basically, the first gsub removes every character that is not whitespace, a letter or a number (i.e. any special character) while keeping each # (with a hash as the replacement, a matched '#' is replaced by '#' and any other match by an empty string); the second gsub is a simpler version of the first one.
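A rough JavaScript equivalent of this approach (mentionedWithHash is a hypothetical name). Because the cleaned category contains only letters, digits and whitespace, a plain substring check with includes() is equivalent to the regex and needs no escaping. One deliberate change: the cleaned category is trimmed, so "comedy :)" becomes "comedy" instead of "comedy " and still matches text such as "#comedy:)". Like the Ruby regex, there is no word boundary, so "#comedy" would also match "#comedyshow".

```javascript
function mentionedWithHash(content, categories) {
  // Lowercase, then drop everything except whitespace, a-z, 0-9 and "#".
  const temp = content.toLowerCase().replace(/[^\sa-z0-9#]/g, "");
  return categories.filter((category) => {
    // Same cleaning for the category, without keeping "#"; trimmed (a change
    // from the Ruby code) so trailing junk like " :)" leaves no trailing space.
    const cleaned = category.toLowerCase().replace(/[^\sa-z0-9]/g, "").trim();
    // An empty name would turn the check into "does the text contain #?".
    return cleaned !== "" && temp.includes("#" + cleaned);
  });
}

const found = mentionedWithHash("pepe is watching a #comedy :)", ["comedy :)", "terror"]);
// found: ["comedy :)"]
```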

Regex Multiple rows [duplicate]

I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1), using a Google Sheets regex formula:
=REGEXEXTRACT(A1, "\d-")
My problem is that it only returns the first match... how can I get all matches?
Example text:
"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"
My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or concatenated text).
I know I could use a script or another function (like SPLIT) to achieve the desired result, but what I really want to know is how I could get a re2 regular expression to return such multiple matches in a "REGEX.*" Google Sheets formula.
Something like the "global - Don't return after first match" option on regex101.com
I've also tried removing the undesired text with REGEXREPLACE, with no success either (I couldn't get rid of other digits not preceding a hyphen).
Any help appreciated!
Thanks :)
You can actually do this in a single formula using regexreplace to surround all the values with a capture group instead of replacing the text:
=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
Basically, it surrounds every instance of \d- with a "capture group", so the cell text itself becomes a regex whose groups are exactly the matches; REGEXEXTRACT then neatly returns all the captures as an array. The outer JOIN packs them back into a single cell, if you want a single string.
You may create your own custom function in the Script Editor:
function ExtractAllRegex(input, pattern,groupId) {
return [Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId])];
}
Or, if you need to return all matches in a single cell joined with some separator:
function ExtractAllRegex(input, pattern,groupId,separator) {
return Array.from(input.matchAll(new RegExp(pattern,'g')), x=>x[groupId]).join(separator);
}
Then, just call it like =ExtractAllRegex(A1, "\d-", 0, ", ").
Description:
input - current cell value
pattern - regex pattern
groupId - Capturing group ID you want to extract
separator - text used to join the matched results.
Edit
I came up with a more general solution:
=regexreplace(A1,"(.)?(\d-)|(.)","$2")
It replaces any text except the second group match (\d-) with just the second group $2.
"(.)?(\d-)|(.)"
Group 1 is the first (.), group 2 is (\d-) and group 3 is the final (.); groups are the parts in ().
"$2" means: return group number 2.
Learn regular expressions: https://regexone.com
Try this formula:
=regexreplace(regexreplace(A1,"[^\-0-9]",""),"(\d-)|(.)","$1")
It will handle strings like this:
"A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"
with output:
1-2-2-2-3-
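The two passes can be checked outside Sheets with a JavaScript sketch (digitsBeforeHyphen is a hypothetical name). Google Sheets uses RE2, but for these simple patterns JavaScript's engine should behave the same way. In a JavaScript replacement string, "$1" expands to an empty string when group 1 did not take part in the match, which is what makes the "|(.)" branch delete characters.

```javascript
function digitsBeforeHyphen(text) {
  // Pass 1: delete everything except digits and hyphens.
  const digitsAndHyphens = text.replace(/[^\-0-9]/g, "");
  // Pass 2: keep each digit+hyphen pair (group 1); any other single character
  // matches the "(.)" branch and is replaced by "", because "$1" expands to an
  // empty string when group 1 did not participate.
  return digitsAndHyphens.replace(/(\d-)|(.)/g, "$1");
}

const out = digitsBeforeHyphen("A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q");
// out: "1-2-2-2-3-"
```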
I wasn't able to get the accepted answer to work for my case. I'd like to do it that way, but needed a quick solution and went with the following:
Input:
1111 days, 123 hours 1234 minutes and 121 seconds
Expected output:
1111 123 1234 121
Formula:
=split(REGEXREPLACE(C26,"[a-z,]"," ")," ")
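For comparison, a JavaScript sketch of the same formula (numbersFromDuration is a hypothetical name). Google Sheets' SPLIT removes empty text by default, while String.prototype.split keeps it, so the sketch filters the empty strings out explicitly.

```javascript
function numbersFromDuration(text) {
  return text
    .replace(/[a-z,]/g, " ") // lowercase letters and commas become spaces
    .split(" ")
    .filter((part) => part !== ""); // SPLIT in Sheets drops empty text; split() does not
}

const parts = numbersFromDuration("1111 days, 123 hours 1234 minutes and 121 seconds");
// parts: ["1111", "123", "1234", "121"]
```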
The shortest possible regex:
=regexreplace(A1,".?(\d-)|.", "$1")
Which returns 1-2-2-2-2-2-2-2-2-2-3-3- for "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq".
Explanation of regex:
.? -- optional character
(\d-) -- capture group 1 with a digit followed by a dash (use (\d+-) to allow multiple digits)
| -- logical or
. -- any character
The replacement "$1" uses just capture group 1 and discards everything else.
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
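Both single-pass formulas (this one and the "(.)?(\d-)|(.)" version above) can be tried in JavaScript, whose replace with the g flag plays the role of REGEXREPLACE here. This assumes RE2 and JavaScript agree on these simple patterns, which they should, since both prefer the leftmost alternative that matches.

```javascript
// The question's sample text.
const sample =
  "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;" +
  "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;" +
  "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq";

// ".?(\d-)|." with "$1": a digit+hyphen pair lands in group 1; any other
// character is matched by the second alternative and replaced by "" because
// group 1 did not participate in that match.
const shortest = sample.replace(/.?(\d-)|./g, "$1");

// "(.)?(\d-)|(.)" with "$2" from the earlier answer behaves the same way.
const general = sample.replace(/(.)?(\d-)|(.)/g, "$2");

// shortest and general: "1-2-2-2-2-2-2-2-2-2-3-3-"
```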
This seems to work and I have tried to verify it.
The logic is
(1) Replace letter followed by hyphen with nothing
(2) Replace any digit not followed by a hyphen with nothing
(3) Replace everything which is not a digit or hyphen with nothing
=regexreplace(A1,"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]","")
Result
1-2-2-2-2-2-2-2-2-2-3-3-
Analysis
I had to step through these procedurally to convince myself that this was correct. When there are alternatives separated by the pipe symbol, the regex engine tries them in order, left to right. The above formula doesn't work properly unless rule (1) comes first: otherwise rule (3) reduces the letter in front of a hyphen to nothing before rule (1) can come into play, and you get an extra hyphen from "Patho-jour".
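A JavaScript sketch to check the ordering argument: the same three alternatives in the answer's order, and then with rule (3) moved to the front. With rule (3) first, the "o" of "Patho-jour" is consumed on its own, rule (1) never fires, and the hyphen after it survives.

```javascript
const sample =
  "A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;" +
  "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;" +
  "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq";

// Answer's order: (1) letter + hyphen, (2) digit not followed by a hyphen
// (removed together with the next character), (3) letters and the listed junk.
const rulesInOrder = /[a-zA-Z]-|[0-9][^-]|[a-zA-Z;\/é]/g;

// Same alternatives with rule (3) first: every letter is consumed on its own,
// so rule (1) can never match and the hyphen in "Patho-jour" is kept.
const rule3First = /[a-zA-Z;\/é]|[a-zA-Z]-|[0-9][^-]/g;

const inOrder = sample.replace(rulesInOrder, "");  // "1-2-2-2-2-2-2-2-2-2-3-3-"
const reordered = sample.replace(rule3First, ""); // "1-2-2-2--2-2-2-2-2-2-3-3-"
```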
The solution of wrapping the matches in capture groups with REGEXREPLACE and then calling REGEXEXTRACT works here too, but there is a catch.
=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
If the cell you are trying to get the values from contains special characters such as parentheses "(" or a question mark "?", the solution provided won't work, because the cell text itself is used as the regex pattern and those characters change its meaning.
In my case, I was trying to list all the "variable texts" contained in the cell. Those variable texts were written like this: "{example_name}". But the full content of the cell had special characters that made the regex formula break. When I removed these special characters, I could list all the captured groups as in that solution.
There are two general ('Excel' / 'native' / non-Apps Script) solutions to return an array of regex matches in the style of REGEXEXTRACT:
Method 1)
insert a delimiter around matches, remove junk, and call SPLIT
Regexes work by iterating over the string from left to right, and 'consuming'. If we are careful to consume junk values, we can throw them away.
(This gets around the problem faced by the currently accepted solution, which is that as Carlos Eduardo Oliveira mentions, it will obviously fail if the corpus text contains special regex characters.)
First we pick a delimiter, which must not already exist in the text. The proper way to do this is to temporarily replace any existing occurrences of our delimiter: if we were going to use commas ",", we'd first replace all existing commas with something like "<<QUOTED-COMMA>>" and un-replace them later. BUT, for simplicity's sake, we'll just grab a character from the private-use Unicode blocks and use it as our special delimiter. In the formulas below it is written as UNICHAR(57344), i.e. U+E000, so that it stays visible (note that it takes more than one byte... Google Sheets might not count bytes and graphemes in a consistent way, which is why the formula measures it with LEN instead of assuming a length of 1).
=SPLIT(
LAMBDA(temp,
MID(temp, 1, LEN(temp)-LEN(UNICHAR(57344)))
)(
REGEXREPLACE(
"xyzSixSpaces:[ ]123ThreeSpaces:[ ]aaaa 12345",".*?( |$)",
"$1"&UNICHAR(57344)
)
),
UNICHAR(57344)
)
We just use a lambda to define temp = "match1<D>match2<D>match3<D>" (where <D> stands for the delimiter), use it to remove the last delimiter, giving "match1<D>match2<D>match3", and then SPLIT that on the delimiter.
Taking COLUMNS of the result will prove that the correct result is returned, i.e. {" ", " ", " "}.
This is a particularly good function to turn into a Named Function, and call it something like REGEXGLOBALEXTRACT(text,regex) or REGEXALLEXTRACT(text,regex), e.g.:
=SPLIT(
LAMBDA(temp,
MID(temp, 1, LEN(temp)-LEN(UNICHAR(57344)))
)(
REGEXREPLACE(
text,
".*?("&regex&"|$)",
"$1"&UNICHAR(57344)
)
),
UNICHAR(57344)
)
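The same delimiter technique as a JavaScript sketch, mainly to show the mechanics (in plain JavaScript, matchAll, as in the Apps Script answer, is the simpler tool). regexGlobalExtract and the U+E000 delimiter are assumptions for illustration; the delimiter must not occur in the input, and regex is a pattern string. JavaScript's replace also produces an empty match at the very end of the string, so instead of trimming exactly one trailing delimiter the sketch drops all empty pieces, which mirrors SPLIT's default of removing empty text.

```javascript
// Assumed delimiter: a private-use code point that must not occur in the input.
const DELIM = "\uE000";

function regexGlobalExtract(text, regex) {
  // ".*?(regex|$)" lazily consumes the junk before the next match (or the rest
  // of the string) and "$1" keeps only the match, followed by the delimiter.
  // The "s" flag lets "." cross newlines.
  const marked = text.replace(new RegExp(`.*?(${regex}|$)`, "gs"), "$1" + DELIM);
  // Empty pieces come from the final "$" match and the empty match JavaScript
  // also finds at the very end; drop them, as SPLIT does by default.
  return marked.split(DELIM).filter((piece) => piece !== "");
}

const matches = regexGlobalExtract("A1-Nutrition;A2-ActPhysiq;H3-Endocrino", "\\d-");
// matches: ["1-", "2-", "3-"]
```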
Method 2)
use recursion
With LAMBDA (which lets you define a function as in any other programming language), you can use some tricks from the well-studied lambda calculus and functional programming: you have access to recursion. Defining a recursive function is confusing because there's no easy way for it to refer to itself, so you have to use a trick/convention:
trick for recursive functions: to actually define a function f which needs to refer to itself, instead define a function that takes a parameter of itself and returns the function you actually want; pass in this 'convention' to the Y-combinator to turn it into an actual recursive function
The plumbing that makes such a function work is called the Y-combinator.
For example to get the result of 5! (5 factorial, i.e. implement our own FACT(5)), we could define:
Named Function Y(f)=(LAMBDA(x,x(x)))( LAMBDA(x, f(LAMBDA(y, x(x)(y)))) ) (this is the Y-combinator and is magic; you don't have to understand it to use it)
Named Function MY_FACTORIAL(n)=
Y(LAMBDA(self,
LAMBDA(n,
IF(n=0, 1, n*self(n-1))
)
))(n)
result of MY_FACTORIAL(5): 120
The Y-combinator makes writing recursive functions look relatively easy, like an introduction to programming class. I'm using Named Functions for clarity, but you could just dump it all together at the expense of sanity...
=LAMBDA(Y,
Y(LAMBDA(self, LAMBDA(n, IF(n=0,1,n*self(n-1))) ))(5)
)(
LAMBDA(f, (LAMBDA(x,x(x)))( LAMBDA(x, f(LAMBDA(y, x(x)(y)))) ) )
)
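The same Y-combinator written with JavaScript arrow functions, as a quick way to convince yourself that the trick works (myFactorial is a hypothetical name). The inner y => x(x)(y) delays the self-application, which is what keeps evaluation from looping forever in an eagerly evaluated language.

```javascript
// Y-combinator: takes a function that receives "itself" as `self` and returns
// the recursive function. y => x(x)(y) delays x(x) until it is actually called.
const Y = (f) => ((x) => x(x))((x) => f((y) => x(x)(y)));

const myFactorial = Y((self) => (n) => (n === 0 ? 1 : n * self(n - 1)));

const result = myFactorial(5); // 120
```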
How does this apply to the problem at hand? Well a recursive solution is as follows:
In the pseudocode below, I use 'function' instead of LAMBDA, but it's the same thing:
// code to get around the fact that you can't have 0-length arrays
function emptyList() {
return {"ignore this value"}
}
function listToArray(myList) {
return OFFSET(myList,0,1)
}
function allMatches(text, regex) {
return allMatchesHelper(emptyList(), text, regex)
}
function allMatchesHelper(resultsToReturn, text, regex) {
currentMatch = REGEXEXTRACT(text, regex)
if (currentMatch succeeds) {
textWithoutMatch = SUBSTITUTE(text, currentMatch, "", 1)
return allMatchesHelper(
{resultsToReturn,currentMatch},
textWithoutMatch,
regex
)
} else {
return listToArray(resultsToReturn)
}
}
Unfortunately, the recursive approach has a quadratic order of growth: it appends the results to themselves over and over, and recreates the giant search string with smaller and smaller bites taken out of it, so the total work is roughly 1+2+3+...+n, which grows like n^2 and can add up to a lot of time. It may therefore be slow if you have many, many matches. It's better to stay inside the regex engine for speed, since it's probably highly optimized.
You could of course avoid using Named Functions by doing temporary bindings with LAMBDA(varName, expr)(varValue) if you want to use varName in an expression. (You can define this pattern as a Named Function =cont(varValue) to invert the order of the parameters to keep code cleaner, or not.)
Wherever the pseudocode says varName = varValue, write such a binding instead.
To check whether a match succeeded, use ISNA(...): REGEXEXTRACT returns #N/A when nothing matches.
It would look something like:
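Named Function allMatches(text, regex):
UNTESTED (the helper takes two arguments, but the Y defined above forwards only one, so it needs a variant whose inner LAMBDA(y, x(x)(y)) is replaced by LAMBDA(a, b, x(x)(a, b))):
LAMBDA(helper,
OFFSET(
helper({"ignore"}, text),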
0,1)
)(
Y(LAMBDA(helperItself,
LAMBDA(results, partialText,
LAMBDA(currentMatch,
IF(ISNA(currentMatch),
results,
LAMBDA(textWithoutMatch,
helperItself({results,currentMatch}, textWithoutMatch)
)(
SUBSTITUTE(partialText, currentMatch, "", 1)
)
)
)(
REGEXEXTRACT(partialText, regex)
)
)
))
)
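A runnable JavaScript sketch of the same recursive idea (allMatches and Yn are hypothetical names). Since the helper takes two arguments, it uses a variant of the Y-combinator, Yn, that forwards all of them. JavaScript has real empty arrays, so no "ignore this value" placeholder is needed, and the sketch also stops on an empty match, which would otherwise recurse forever. Note a limitation of deleting matches from the text: removing one can join its neighbours into a new match, e.g. "12--" yields ["2-", "1-"], where a global regex search finds only "2-".

```javascript
// Variant of the Y-combinator that forwards any number of arguments.
const Yn = (f) => ((x) => x(x))((x) => f((...args) => x(x)(...args)));

// `regex` should be a RegExp without the "g" flag, so match() returns only the
// first match (like REGEXEXTRACT without capture groups).
function allMatches(text, regex) {
  const helper = Yn((self) => (results, partialText) => {
    const currentMatch = partialText.match(regex);
    // No match left (the ISNA case), or an empty match that would never shrink the text.
    if (currentMatch === null || currentMatch[0] === "") return results;
    // Like SUBSTITUTE(partialText, currentMatch, "", 1): a string pattern in
    // replace() is literal and only its first occurrence is removed.
    const textWithoutMatch = partialText.replace(currentMatch[0], "");
    return self([...results, currentMatch[0]], textWithoutMatch);
  });
  return helper([], text);
}

const found = allMatches("A1-Nutrition;A2-ActPhysiq;H3-Endocrino", /\d-/); // ["1-", "2-", "3-"]
```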

Regex to select text outside of underscores

I am looking for a regex to select the text which falls outside of underscore characters.
Sample text:
PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant
Basically, I need to be able to select the first keyword, which is always before the first underscore, and the last keyword, which is always after the last underscore. As an additional complexity, there can also be texts which have no underscore at all; these need to be selected completely as well.
The best I got yet was this expression:
^((?! *\_[^)]*\_ *).)*
which only yields the first part, not the second, and has no support for the no-underscore case at all yet.
This regex is used in a tool which monitors our http traffic, which means I can only 'select' the part I need but can't invoke functions or replace logic.
Thanks!
Use the JavaScript string method split(). See the example below.
var t = "PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant";
var arr = t.split('_');
console.log(arr);
//Access the required parts like this
console.log(arr[0] + ' ' + arr[arr.length - 1]);
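A small variation on the split approach that also covers the no-underscore case from the question (outsideUnderscores is a hypothetical name): with no underscore, split returns a one-element array, and the sketch returns that element once instead of twice.

```javascript
function outsideUnderscores(s) {
  const parts = s.split("_");
  // No underscore: split returns [s], so return the whole text once.
  if (parts.length === 1) return [parts[0]];
  // Otherwise: the text before the first and after the last underscore.
  return [parts[0], parts[parts.length - 1]];
}

const both = outsideUnderscores("PartIWant_partINeedIgnored_morePartsINeedIgnored_PartIwant");
// both: ["PartIWant", "PartIwant"]
const whole = outsideUnderscores("NoUnderscoreHere");
// whole: ["NoUnderscoreHere"]
```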
Perhaps something like this:
/(^[^_]+)|([^_]+$)/g
That is, match either:
^[^_]+ the beginning of the string followed by non-underscores, or
[^_]+$ non-underscores followed by the end of the string.
var regex = /(^[^_]+)|([^_]+$)/g
console.log("A_b_c_D".match(regex)) // ["A", "D"]
console.log("A_b_D".match(regex)) // ["A", "D"]
console.log("A_D".match(regex)) // ["A", "D"]
console.log("AD".match(regex)) // ["AD"]
I'm not sure if you should use a regex here. I think splitting the string at underscore, and using the first and last element of the resulting array might be faster, and less complicated.
Trivial with .replace:
str.replace(/_.*_/, '')
// "PartIWantPartIwant"
With matching, you'd need to be selecting and concatenating groups:
parts = str.match(/^([^_]*).*?([^_]*)$/)
parts[1] + parts[2]
// "PartIWantPartIwant"
EDIT
This regex is used in a tool which monitors our http traffic, which means I can only 'select' the part I need but can't invoke functions or replace logic.
This is not possible: a regular expression cannot match a discontinuous span.

Removing parentheses as unwanted text in R using gsub

I'm trying to clean up a column in my data frame where the rows look like this:
1234, text ()
and I need to keep just the number in all the rows. I used:
df$column = gsub(", text ()", "", df$column)
and got this:
1234()
I repeated the operation with only the parentheses, but they won't go away. I wasn't able to find an example that deals specifically with parentheses being eliminated as unwanted text. sub doesn't work either.
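Does anyone know why this isn't working?
Parentheses are special metacharacters in regex: in your pattern, () is an empty capture group, so gsub removed only ", text " and left the literal parentheses behind. You should escape them, either with \\ or [], or add fixed = TRUE. But in your case you just want to keep the number, so simply remove everything else using \\D: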
gsub("\\D", "", "1234, text ()")
## [1] "1234"
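For comparison, the same fixes in JavaScript (the language used for the other sketches here), including what the unescaped pattern actually does: () is an empty group, so only ", text " is removed. The variable names are illustrative only.

```javascript
const value = "1234, text ()";

// Unescaped, "()" is an empty group, so only ", text " is removed.
const unescaped = value.replace(/, text ()/g, ""); // "1234()"
// Escaping the parentheses makes them literal.
const escaped = value.replace(/, text \(\)/g, ""); // "1234"
// A string pattern in replace() is matched literally (first occurrence only),
// similar in spirit to fixed = TRUE in R.
const literal = value.replace(", text ()", ""); // "1234"
// Or keep only the digits, like gsub("\\D", "", x).
const digitsOnly = value.replace(/\D/g, ""); // "1234"
```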
If your column always looks like a format described above :
1234, text ()
Something like the following (C#) should work:
string extractedNumber = Regex.Match(INPUT_COLUMN, @"^\d{4,}").Value;
It reads as: from the start of the string, find four or more digits.

Regex Split: Split column into Name, percentage and solvent

Looking for a regex that can split expressions like:
A-6-b 10/%XYZ
into:
A-6-b
10%
/XYZ
Note that the first group can also contain spaces and numbers:
AQDF 100 56%/ABC
and percentage can be a float:
SFSDF 0.1%/ABC
I've come up with (^[A-Z\s\d-]*)(?!%)(\d+%)(.*$) but this does not match any percentages that are floats, and, more importantly, even simple examples like ABC 10%/XYZ fail because the first digit of the percentage is assigned to the first capturing group.
Any idea how I can achieve what I want? I'm not a regex expert...
EDIT: fixed errors in example
EDIT2:
The examples are not complete. Here is one more:
ABC Dwsd 0.01%/XYZ QST
First part can contain spaces
Last Part can contain spaces
number can be a float
Super simple:
/^(.*) ([1-9][0-9]*(?:\.[0-9]+)?%)(.*)$/
The most easily identifiable item is your percentage, so the ([1-9][0-9]*(?:\.[0-9]+)?%) part deals with finding that.
Then it's simply a case of getting everything before (excluding the final space) to get the name, and everything after to get the solvent.
Done.
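A JavaScript sketch of this pattern with one change: the number part is relaxed from [1-9][0-9]* to [0-9]+, because the original cannot match the 0.1% and 0.01% examples from the question. splitColumn is a hypothetical name. The A-6-b 10/%XYZ example still does not match, since a "/" sits between the number and the %, which is probably the typo another answer mentions.

```javascript
// Name, then a space, then a number (optionally with decimals) and "%", then
// the rest. The greedy (.*) backtracks from the end, so the split happens at
// the last space followed by such a percentage; names may contain spaces and numbers.
const pattern = /^(.*) ([0-9]+(?:\.[0-9]+)?%)(.*)$/;

function splitColumn(s) {
  const m = s.match(pattern);
  return m === null ? null : { name: m[1], percentage: m[2], solvent: m[3] };
}

const row = splitColumn("ABC Dwsd 0.01%/XYZ QST");
// row: { name: "ABC Dwsd", percentage: "0.01%", solvent: "/XYZ QST" }
```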
Don't overcomplicate this by using one unreadable regex.
Based on what you've said, your separators are well defined (the last space and the last %). In JavaScript, for example, you could use:
var str = "A-6-b 10/%XYZ";
var firstSeparator = str.lastIndexOf(' ');
var secondSeparator = str.lastIndexOf('%');
var name = str.substring(0, firstSeparator);
var percentage = str.substring(firstSeparator + 1, secondSeparator + 1); // we want to include the % separator in this one
var solvent = str.substring(secondSeparator + 1);
console.log(name, percentage, solvent);
Working JSFiddle: http://jsfiddle.net/rL5uymhm/
(There may be a typo in your question, as your examples differ on where the / symbol appears. So the code may need tweaking. My point still stands – don't use a regex for the sake of it when there is a more readable alternative.)
IF you really want to use a regex, /^(.+ )([^%]+%)(.*)$/ should work.
I tried this; let me know in the comments if you have any problems.
((?:(?!\s*[0-9]*\/%).)*)\s*([\d\/%]*)\s*(.*)
See demo: http://regex101.com/r/lL8oN4/1
This one works for me (using PCRE):
/^(.+) ([0-9.]+)[\/%]+([^\/]+)$/