Convert a regex to be Safari-compliant

I am using the following code to find every "\n" newline that doesn't have a space on both sides and add a space on both sides.
Scenarios
There is a space on both sides of the \n = Do nothing.
There is a space either before or after the \n = Add a space on both sides.
There are no spaces on either side = Add a space on both sides.
But why do you need this?
I need to space-separate the words in a paragraph without affecting the paragraph structure. If I split by \s, the structure is lost, so to preserve it I want to put a space on either side of each \n newline.
This looks OK, what's the problem?
This works in recent versions of Chrome, but not in older versions or in Safari (presumably because of the lookbehind (?<! )), and it needs to work across browsers.
Question:
How can I keep this logic in Dart without using regex features that Safari doesn't support?
Code Example
var regex = RegExp("\n(?! )|(?<! )\n");
if (text.contains(regex)) {
String newString = text.replaceAll(regex, " \n ");
updatedString = newString;
}

You can use
var regex = RegExp(" ?\n ?");
updatedString = text.replaceAll(regex, " \n ");
The " ?\n ?" pattern matches an optional space, then a newline, then an optional space, and the match is replaced with space + newline + space. This yields the expected result: if there are spaces on the left and right, they are kept; otherwise, a space is added.
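For comparison, here is the same substitution in Python, used only for illustration (the asker's code is Dart, where text.replaceAll(RegExp(" ?\n ?"), " \n ") is the equivalent call). It runs the three scenarios from the question through the pattern; the function name pad_newlines is hypothetical.

```python
import re

# An optional space, the newline, then an optional space. The pattern uses no
# lookbehind, which is why it avoids the Safari problem from the question.
NEWLINE_WITH_SPACES = re.compile(r" ?\n ?")


def pad_newlines(text: str) -> str:
    """Give each newline a space on both sides.

    A single space already next to the newline is consumed by the match and
    written back by the replacement, so it is not doubled.
    """
    return NEWLINE_WITH_SPACES.sub(" \n ", text)


if __name__ == "__main__":
    # No spaces, space before only, space after only, spaces on both sides.
    for sample in ["one\ntwo", "one \ntwo", "one\n two", "one \n two"]:
        print(repr(pad_newlines(sample)))
```

Splitting the padded text on single spaces then keeps the newline as its own token, which is what the question is after: pad_newlines("first line\nsecond line").split(" ") gives ['first', 'line', '\n', 'second', 'line'].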

Related

What is a regex equivalent of the 'Trim' function used in languages such as VB and Java?

I'm using Regex in a Microsoft Access 2007 database with a VBA project reference to Microsoft VBScript Regular Expressions 5.5.
All is well... mostly. I would like to know a regular expression that acts like the 'Trim' function (removes leading and trailing spaces).
I have this: ((?:.?)*), which is meant to "capture everything after the last match". But it always matches extra spaces, which I would like to remove.
Below is the relevant code. In the debugger, item 4 of the submatches contains " CAN". How do I remove the space with regex, so I don't have to use the Trim function?
pattern = "^(\d{1,2})(?:\/)(\d{1,2}(?:\.\d{1,3})?)(OZ)((?:.?)*)"
regex.pattern = pattern
Set matchCollection = regex.Execute(workstring)
If matchCollection.Count > 0 Then
matchSection = "LOOSE CASES"
itemtype = "CASE"
percase = matchCollection(0).SubMatches(0)
perpack = 1
unitsize = matchCollection(0).SubMatches(1)
uom = matchCollection(0).SubMatches(2)
other = VBA.Trim(matchCollection(0).SubMatches(3))
End If
...
Ok, I finally figured it out. To reiterate (and clarify): my original regex ((?:.?)*) is meant to "capture anything left after the last match". But it also captured leading & trailing spaces.
Removing the leading spaces was fairly easy, but every attempt to remove the trailing spaces was foiled by the * in the group. Then I read about \b and dropped one in and now it works.
This is what I have now: (?: ?)((?:.?)*)\b(?: *) which is "match anything left after the last match, except leading or trailing spaces".
And in context, this is the whole of it...
(\d{1,2})/(\d{1,2})PK-(\d{1,2}(?:\.\d{1,3})?)(OZ|ML)(?: ?)((?:.?)*)\b(?: *)
Which is meant to match on a string such as this...
2/12PK-11.125OZ CAN RET
...which describes cases of beer in our warehouse. =-)
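Another way to keep trailing spaces out of the last group, without relying on \b, is to make that group lazy and follow it with \s*$. The sketch below is in Python rather than VBScript and reuses the field layout of the final pattern above; VBScript 5.5 also supports lazy quantifiers, but I haven't tried this exact pattern there. The function name parse_case is hypothetical.

```python
import re

# Same field layout as the final VBA pattern above, except that the last group
# is lazy and is followed by \s*$, so trailing spaces end up outside it.
CASE_PATTERN = re.compile(
    r"(\d{1,2})/(\d{1,2})PK-(\d{1,2}(?:\.\d{1,3})?)(OZ|ML)\s*(.*?)\s*$"
)


def parse_case(line: str):
    """Return the five captured fields, or None if the line doesn't match."""
    m = CASE_PATTERN.match(line)
    return m.groups() if m else None


if __name__ == "__main__":
    # Trailing spaces are not part of the last field.
    print(parse_case("2/12PK-11.125OZ CAN RET   "))
```

The lazy (.*?) grows one character at a time until \s*$ can match, so it stops right before the run of trailing whitespace.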

Regex in Excel VBA Special Characters and Embedded Spaces

I have to parse a huge file, but one of the values is causing me a lot of grief.
It is a fixed length field of six characters. The description of the allowable values is:
Left justified; space filled. Cannot contain special characters or embedded spaces. If data is unavailable, space filled.
What I have attempted so far is to check:
If Code = Space(6) Then
MsgBox "Code is Space Filled."
End If
This will check if it is all space filled, which is ok.
Next, I check whether there are any special characters using the following function:
With ObjRegex
.Global = True
.Pattern = "[^a-zA-Z0-9\s]+"
StripNonAlpha = .Replace(Replace(TextToReplace, "-", Chr(32)),
End With
I can then compare the original code with the version stripped of special characters. If they don't match, the code contains a special character and is not valid.
It is the spaces that are causing me issues. I have to check that the value is left-aligned (no leading spaces before the characters) and has no embedded spaces; trailing spaces are OK.
I have tried a few variations of the above function but to no avail.
e.g. (wrong):
(^\sa-zA-Z0-9\sa-zA-Z0-9)+
I would appreciate any pointers. If there is a more 'all in one' regex that makes more sense, that would be great, and if regex is the wrong way to go, I'm more than happy to abandon it.
Partial answer:
Regex: (?=[a-zA-Z0-9\s]{6})[a-zA-Z0-9]*\s*
Drawback: it also matches input longer than 6 characters (but not shorter).
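An all-in-one check is possible by pinning the length with a lookahead and anchoring the rest. The Python sketch below assumes, as in the specification quoted above, that the field is exactly six characters and may contain only ASCII letters, digits and trailing spaces. The same idea written as ^(?=.{6}$)[A-Za-z0-9]* *$ should carry over to VBScript's RegExp, but I haven't tested it there; is_valid_code is a hypothetical name.

```python
import re

# Lookahead: exactly six characters in total. Then letters/digits (possibly
# none, for an all-space field) followed only by spaces, so leading spaces,
# embedded spaces and special characters all make the match fail.
FIELD_PATTERN = re.compile(r"(?=.{6}$)[A-Za-z0-9]* *")


def is_valid_code(field: str) -> bool:
    # fullmatch() requires the pattern to cover the whole string.
    return FIELD_PATTERN.fullmatch(field) is not None


if __name__ == "__main__":
    for field in ["AB12  ", "      ", "ABC123", " AB12 ", "AB 12 ", "AB-12 ", "AB12"]:
        print(repr(field), is_valid_code(field))
```

Because the letter/digit run is followed only by spaces, any space that is later followed by a letter or digit (a leading or embedded space) makes the match fail.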

Regular expression to find position of the last alpha character that is followed by a space?

I am using ColdFusion 10. I rarely need to use regular expression and really need some help.
I have some lengthy content (up to 8,000 characters) and want to create a teaser. After a certain length (which I will define elsewhere), I want to find the last alpha character that is followed by a space. I will remove everything after that character. I will then add the ellipsis (...)
MyString = "The lazy brown fox is not a dog."
In this case, I would delete everything after the "a" that precedes "dog".
MyString = "There are 123 boxes on up the hill, says that 612 guy."
In this case, I would delete everything after the "that" that precedes "612 ".
MyString = "I fell down the stairs on June 30th, 1962."
In this case, I would delete everything after the "June" that precedes "30th".
What regular expression would I use to find the position of the last alphabetic ([a-zA-Z]) character that is followed by a space?
MyReg = "";
LastPosition = reFindNoCase(MyReg, MyString);
I'm not sure about REFindNoCase, but I think you can try REReplaceNoCase instead. I hope that CF supports backreferences like most regex engines do:
REReplaceNoCase(MyString, "(.*\b[a-zA-Z]+\b)\s.*", "$1", "ALL");
EDIT: for the backreference, it appears that CF uses a backslash instead of the dollar sign:
REReplaceNoCase(MyString, "(.*\b[a-zA-Z]+\b)\s.*", "\1", "ALL");
.* matches anything besides a newline character, \b matches word boundaries, [a-zA-Z]+ are for alphabet characters and \s is for the space just after it.
The greediness of the leading .* is exploited here: it captures as much as possible, so the match ends at the last word followed by a space.
And I guess you can add the ellipsis after the \1 like so:
REReplaceNoCase(MyString, "(.*\b[a-zA-Z]+\b)\s.*", "\1 (...)", "ALL");
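To check the pattern against the three sample strings from the question, here is the same replacement in Python (an illustration, not CF code). It appends "..." directly instead of " (...)", and it adds re.DOTALL so that .* also crosses newlines, which matters for multi-line content; both choices are assumptions. If nothing matches, the text is returned unchanged. The function name make_teaser is hypothetical.

```python
import re

# Greedy .* pushes the match as far right as possible, so group 1 ends at the
# last purely alphabetic word that is followed by whitespace.
TEASER_PATTERN = re.compile(r"(.*\b[a-zA-Z]+\b)\s.*", re.DOTALL)


def make_teaser(text: str) -> str:
    # count=1: the trailing .* consumes the rest of the text in one match.
    return TEASER_PATTERN.sub(r"\1...", text, count=1)


if __name__ == "__main__":
    for s in [
        "The lazy brown fox is not a dog.",
        "There are 123 boxes on up the hill, says that 612 guy.",
        "I fell down the stairs on June 30th, 1962.",
    ]:
        print(make_teaser(s))
```

"30th" is skipped because there is no word boundary between "0" and "t", so [a-zA-Z]+ cannot start there.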
If you only want to use REFind(), you could maybe use this:
REFindNoCase("[A-Za-z](?:\s\d+|\w+,)*\s[^\s]+\.$", MyString);
Note that I haven't tested this against many other scenarios, but a few that don't work with the pattern above do work with this one:
REFindNoCase("[A-Za-z](?:\s\d+|\s?\w+[,.-]+)*\s[^\s]+[.\s]*$", MyString);
REFind will give you the position of the last alpha character. You can add 1 to get the position of the space in the original string.
If you're dealing with long strings, a regex would need to scan the whole string to get to the end, and it's likely more efficient to instead start at the end and work backwards.
Like this:
LastPos = len(String);
while( LastPos > 1 )
{
LastPos = String.lastIndexOf(' ',LastPos-1);
if ( mid(String,LastPos,1).matches('[a-zA-Z]') )
break;
}
NewString = left(String,LastPos);
The idea is to keep stepping backwards finding spaces, and break the loop when the previous character is a letter (or the start of the string is reached).
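The same backward scan can be sketched in Python with str.rfind in place of Java's lastIndexOf. Returning 0 when no letter is directly followed by a space is an assumption about how that case should be handled, and the function names are hypothetical. The returned value is the length of the prefix to keep, which is also the 1-based position of that last letter, matching what left(String,LastPos) uses above.

```python
import string


def last_alpha_before_space(text: str) -> int:
    """Length of the prefix ending at the last ASCII letter followed by a space.

    Returns 0 if no such letter exists.
    """
    pos = len(text)
    while True:
        # Last space strictly before the previous hit (or before the end).
        pos = text.rfind(" ", 0, pos)
        if pos <= 0:
            # No space found, or the space is the very first character.
            return 0
        if text[pos - 1] in string.ascii_letters:
            # text[:pos] ends with that letter.
            return pos


def teaser_from_scan(text: str) -> str:
    cut = last_alpha_before_space(text)
    return text[:cut] + "..." if cut else text


if __name__ == "__main__":
    print(teaser_from_scan("I fell down the stairs on June 30th, 1962."))
```

Each rfind call searches only to the left of the previous hit, so the loop examines at most the spaces near the end of a long string instead of the whole string.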
If you really want a regex solution, just do:
NewString = rematch('.*[a-zA-Z] ',MyString)[1];
To get the position, you do len(NewString).
(If newlines are involved, you'd need to put (?s) at the start of the expression so that the dot matches them.)

Removing whitespaces inside a string

I have a string " lots\t of\nwhitespace\r\n ", which I have simplified, but I still need to get rid of the remaining spaces in the string.
QString str = " lots\t of\nwhitespace\r\n ";
str = str.simplified();
In Boost I can do erase_all(str, " ");, but I want to stay within Qt.
str = str.simplified();
str.replace( " ", "" );
The first call trims whitespace from both ends and replaces each internal run of whitespace with a single ASCII 32 space; the second removes those spaces.
Try this:
str.replace(" ","");
Option 1:
Simplify the white space, then remove it
Per the docs
[QString::simplified] Returns a string that has whitespace removed from the start and the end, and that has each sequence of internal whitespace replaced with a single space.
Once the string is simplified, the white spaces can easily be removed.
str = str.simplified().remove(' ');
Option 2:
Use a QRegExp to capture all types of white space in remove.
QRegExp space("\\s");
str.remove(space);
Notes
The OP's string contains white space of different types (tab, carriage return, newline), all of which need to be removed. This is the tricky part.
QString::remove was introduced in Qt 5.6; prior to 5.6 removal can be achieved using QString::replace and replacing the white space with an empty string "".
You can omit the call to simplified() with a regex:
str.replace(QRegularExpression("\\s+"), QString());
I haven't measured which method is faster; I'd guess the regex performs worse.
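The expected result is easy to check without Qt. The Python analogue below (not Qt API; the function names are hypothetical) mirrors the two approaches. str.split() with no argument splits on runs of any whitespace and drops empty pieces at the ends, much like simplified() followed by removing the spaces, while re.sub(r"\s+", "", ...) corresponds to the QRegularExpression replace.

```python
import re


def remove_whitespace_split(s: str) -> str:
    # Split on runs of whitespace (tabs, newlines, CR, spaces) and rejoin.
    return "".join(s.split())


def remove_whitespace_regex(s: str) -> str:
    # Delete every run of whitespace in one pass.
    return re.sub(r"\s+", "", s)


if __name__ == "__main__":
    sample = " lots\t of\nwhitespace\r\n "
    print(remove_whitespace_split(sample))
    print(remove_whitespace_regex(sample))
```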

^[A-Za-z](\W|\w)* regular expression?

I'm using the regular expression ^[A-Za-z](\W|\w)*. The first character should not be a digit, and the remaining characters may be alphanumeric. The problem is when the user types white space as the first character: it should be trimmed automatically. How can I do that?
^\s*([A-Za-z]\w*)
Should do it. Just get group 1.
I'm not sure which language you are using; I'll assume C#, so here is a C# sample:
string testString = " myMatch123 not in the match";
Regex regexObj = new Regex("^\\s*([A-Za-z]\\w*)",
RegexOptions.IgnoreCase | RegexOptions.Multiline);
string result = regexObj.Match(testString).Groups[1].Value;
Console.WriteLine("-" + result + "-");
This will print
-myMatch123-
to the console window.
Is it possible to Trim() your input before giving it to your regex?
If you're looking for alphanumeric input that starts with a non-digit, you probably want:
\s*([A-Za-z][A-Za-z0-9]+)
If you allow one-character user names, change that plus to a star.
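If the whole input must be a single name, the pattern can also be anchored at both ends so that anything extra is rejected rather than ignored. That requirement is an assumption, not something the answers above state. The Python sketch below uses fullmatch with optional whitespace on both sides and [A-Za-z0-9] instead of \w, because Python's \w also matches underscores and non-ASCII letters. clean_name is a hypothetical name.

```python
import re
from typing import Optional

# Optional whitespace, a letter, then letters or digits (star, so one-character
# names are allowed), then optional whitespace. fullmatch() anchors both ends.
NAME_PATTERN = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*)\s*")


def clean_name(raw: str) -> Optional[str]:
    """Return the trimmed name, or None if the input is not a valid name."""
    m = NAME_PATTERN.fullmatch(raw)
    return m.group(1) if m else None


if __name__ == "__main__":
    for raw in ["  myMatch123", "a", "1abc", " my Match"]:
        print(repr(raw), clean_name(raw))
```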