Removing whitespaces inside a string - c++

I have a string " lots\t of\nwhitespace\r\n " which I have simplified, but I still need to get rid of the remaining spaces in the string.
QString str = " lots\t of\nwhitespace\r\n ";
str = str.simplified();
In Boost I could use erase_all(str, " "), but I want to stay within Qt.

str = str.simplified();
str.replace( " ", "" );
The first call trims whitespace from both ends and replaces each internal run of whitespace characters with a single ASCII 32 space; the second call removes those spaces.
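To make the two steps concrete outside Qt, here is a minimal sketch in standard C++. It assumes plain ASCII input; simplify_ascii and remove_spaces are hypothetical helpers that only approximate QString::simplified() (which also understands Unicode whitespace) and str.replace(" ", "").

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: an ASCII-only approximation of QString::simplified().
// It trims whitespace at both ends and collapses every internal run of
// whitespace (space, tab, newline, CR, VT, FF) into one ' ' (ASCII 32).
std::string simplify_ascii(const std::string& in) {
    std::string out;
    bool saw_space = false;  // whitespace seen since the last kept character
    for (unsigned char c : in) {
        if (std::isspace(c)) {
            saw_space = true;
            continue;
        }
        // Emit a single separator only between two non-whitespace runs,
        // so leading and trailing whitespace never reaches the output.
        if (saw_space && !out.empty()) out += ' ';
        out += static_cast<char>(c);
        saw_space = false;
    }
    return out;
}

// Second step, like str.replace(" ", ""): delete the remaining ASCII 32 spaces.
std::string remove_spaces(std::string s) {
    s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
    return s;
}
```

For the question's string, simplify_ascii gives "lots of whitespace", and remove_spaces applied to that gives "lotsofwhitespace".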

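Try this:
str.replace(" ","");
Note that this removes only ASCII spaces; tabs, newlines and carriage returns stay in the string unless it is simplified first.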

Option 1:
Simplify the white space, then remove it
Per the docs
[QString::simplified] Returns a string that has whitespace removed from the start and the end, and that has each sequence of internal whitespace replaced with a single space.
Once the string is simplified, the white spaces can easily be removed.
str = str.simplified().remove(' ');
Option 2:
Use a QRegExp that matches any whitespace character, and pass it to remove.
QRegExp space("\\s");
str.remove(space);
Notes
The OP's string contains white space of different types (tab, carriage return, newline), all of which need to be removed. This is the tricky part.
QString::remove was introduced in Qt 5.6; prior to 5.6 removal can be achieved using QString::replace and replacing the white space with an empty string "".
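For comparison, the same "remove every kind of whitespace in one go" idea can be sketched in standard C++ with the erase-remove idiom, without a regex. This is an assumption-labelled illustration, not Qt code: remove_all_whitespace is a hypothetical name, and std::isspace in the default "C" locale only covers ASCII whitespace, whereas Qt's \s also covers Unicode spaces.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: deletes every ASCII whitespace character
// (space, tab, newline, carriage return, vertical tab, form feed) in one pass,
// roughly what str.remove(QRegExp("\\s")) does for ASCII text.
std::string remove_all_whitespace(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}
```

Casting each character to unsigned char before calling std::isspace avoids undefined behaviour for negative char values.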

You can omit the call to simplified() with a regex:
str.replace(QRegularExpression("\\s+"), QString());
I haven't measured which method is faster; I would guess the regex performs worse.
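The same single-regex approach carries over to standard C++ via std::regex_replace. The sketch below assumes std::regex's default ECMAScript grammar, where \s matches ASCII whitespace; strip_whitespace_regex is a hypothetical name.

```cpp
#include <regex>
#include <string>

// std::regex counterpart of str.replace(QRegularExpression("\\s+"), QString()):
// every run of whitespace is replaced by nothing, so no separate
// simplification step is needed.
std::string strip_whitespace_regex(const std::string& s) {
    static const std::regex ws("\\s+");  // compiled once, reused on later calls
    return std::regex_replace(s, ws, "");
}
```

Making the pattern static avoids recompiling it on every call; whether this beats the simplify-then-remove route would still need measuring.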

Related

Convert a regex to be Safari-compliant

I am using the following code to find any instances of "\n" newline that doesn't have a space on both sides and add a space on both sides.
Scenarios
There is a space on both sides of the \n = Do nothing.
There is a space either before or after the \n = Add a space on both sides.
There are no spaces on either side = Add a space on both sides.
But why do you need this?
I need to space-separate the words in a paragraph without affecting the paragraph structure. If I split by \s, the structure is lost, so to maintain it I want to put a space on either side of each \n newline.
This looks OK, what's the problem?
This works in newer versions of Chrome, but not in older versions or in Safari, and it needs to be supported across browsers.
Question:
How can I keep this logic in Dart without using regex features that Safari doesn't support?
Code Example
var regex = RegExp("\n(?! )|(?<! )\n");
if (text.contains(regex)) {
String newString = text.replaceAll(regex, " \n ");
updatedString = newString;
}
You can use
var regex = RegExp(" ?\n ?");
updatedString = text.replaceAll(regex, " \n ");
The " ?\n ?" pattern matches an optional space, a newline, and another optional space, and each match is replaced with space + newline + space. This yields the expected result: if there are spaces on the left and right, they are kept; otherwise, a space is added.
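Dart's RegExp uses JavaScript regex semantics and std::regex defaults to the ECMAScript grammar, so the lookbehind-free pattern can be checked in C++ as well. This is a sketch for illustration only; pad_newlines is a hypothetical name and the snippet is not Dart code.

```cpp
#include <regex>
#include <string>

// Optional space, newline, optional space. Each match is rewritten as
// space + newline + space, so spaces already next to the newline are consumed
// and re-emitted instead of being doubled. No lookbehind is needed.
std::string pad_newlines(const std::string& text) {
    static const std::regex re(" ?\n ?");
    return std::regex_replace(text, re, " \n ");
}
```

All four variants ("a\nb", "a \nb", "a\n b", "a \n b") end up as "a \n b".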

MATLAB regular expression fails to remove spaces at the beginning of a string

Suppose that we have this string in MATLAB:
mm = [' 44412 (25.01%)'];
I want to remove only the leading space(s) in this string to get this output:
'44412 (25.01%)'
I'm using strrep(mm,'\^\s\s',''), but it doesn't work. What is the problem?
The issue with strrep is that it does not support regex patterns. The first part of your filter ('\^') also tries to match ^ literally, so it won't work on your string. If you remove the leading \, your filter works with regexprep, but it only handles strings with exactly two leading whitespace characters.
Try using this more generic filter instead with regexprep.
str = ' 44412 (25.01%)';
newstr = regexprep(str, '^\s+', '');
Which returns:
newstr =
44412 (25.01%)
This matches one or more whitespace characters at the beginning of the string. On a string without leading whitespace the pattern doesn't match, so the string is returned unchanged.
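The anchored pattern behaves the same way in other ECMAScript-style engines. As a sketch under that assumption, here is the equivalent of regexprep(str, '^\s+', '') with C++ std::regex_replace; strip_leading_ws is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Equivalent of regexprep(str, '^\s+', ''): ^ anchors at the start of the
// string, \s+ takes one or more whitespace characters. Input without leading
// whitespace produces no match and is returned unchanged.
std::string strip_leading_ws(const std::string& s) {
    static const std::regex leading("^\\s+");
    return std::regex_replace(s, leading, "");
}
```

Because ^ only matches at the very start here, spaces inside the string (such as the one before "(25.01%)") are left alone.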
Edit: Here are some built-in alternatives!
You could use strtrim, but it strips leading and trailing whitespace:
newstr = strtrim(str);
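You can also use strjust to left-justify your string; note that it moves the leading spaces to the end rather than deleting them, so the length stays the same: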
newstr = strjust(str, 'left');
If you want to be really creative, you could flip your array and use deblank, which strips trailing whitespace:
newstr = fliplr(deblank(fliplr(str)));
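For a non-regex route similar in spirit to these built-ins, a left-only trim can be sketched with std::string::find_first_not_of. The whitespace set listed below and the name ltrim are assumptions for the illustration.

```cpp
#include <string>

// Left trim without a regex: locate the first character that is not ASCII
// whitespace and keep everything from there on. Unlike strtrim, trailing
// whitespace is left untouched; an all-whitespace input becomes empty.
std::string ltrim(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t\n\v\f\r");
    return first == std::string::npos ? std::string() : s.substr(first);
}
```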

Regular expression to find position of the last alpha character that is followed by a space?

I am using ColdFusion 10. I rarely need to use regular expression and really need some help.
I have some lengthy content (up to 8,000 characters) and want to create a teaser. After a certain length (which I will define elsewhere), I want to find the last alpha character that is followed by a space. I will remove everything after that character. I will then add the ellipsis (...)
MyString = "The lazy brown fox is not a dog."
In this case, I would delete everything after the "a" that precedes "dog".
MyString = "There are 123 boxes on up the hill, says that 612 guy."
In this case, I would delete everything after the "that" that precedes "612 ".
MyString = "I fell down the stairs on June 30th, 1962."
In this case, I would delete everything after the "June" that precedes "30th".
What regular expression would I use to find the position of the last alpha [a-zA-Z] character that is followed by a space?
MyReg = "";
LastPosition = reFindNoCase(MyReg, MyString);
I'm not sure about REFindNoCase, but I think you can use REReplaceNoCase instead. I hope that CF supports backreferences like most regex engines do:
REReplaceNoCase(MyString, "(.*\b[a-zA-Z]+\b)\s.*", "$1", "ALL");
EDIT: for the backreference, it appears that you use the backslash instead of the dollar sign:
REReplaceNoCase(MyString, "(.*\b[a-zA-Z]+\b)\s.*", "\1", "ALL");
.* matches anything besides a newline character, \b matches word boundaries, [a-zA-Z]+ are for alphabet characters and \s is for the space just after it.
The greediness of the leading .* is exploited here: it captures as much as possible, so the group ends at the last word that is followed by a space.
And I guess you can add the ellipsis after the \1 like so:
REReplaceNoCase(MyString, "(.*\b[a-zA-Z]+\b)\s.*", "\1...", "ALL")
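To check the greedy-capture idea against the three example sentences, here is a sketch of the same pattern in C++ std::regex (ECMAScript grammar). It is not ColdFusion: std::regex writes the backreference as $1, and make_teaser is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Greedy .* runs as far right as it can, so group 1 ends at the last run of
// letters that starts on a word boundary and is followed by whitespace.
// The whole string is replaced by that prefix plus an ellipsis; if nothing
// matches, the input comes back unchanged.
std::string make_teaser(const std::string& s) {
    static const std::regex re("(.*\\b[a-zA-Z]+\\b)\\s.*");
    return std::regex_replace(s, re, "$1...");
}
```

The \b before [a-zA-Z]+ is what stops the match from starting at the "th" inside "30th", since there is no word boundary between "0" and "t".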
If you only want to use REFind(), you could maybe use this:
REFindNoCase("[A-Za-z](?:\s\d+|\w+,)*\s[^\s]+\.$", MyString);
Note that I haven't tested this against other possible scenarios, but I tried a few that don't work with the pattern above and do work with this one:
REFindNoCase("[A-Za-z](?:\s\d+|\s?\w+[,.-]+)*\s[^\s]+[.\s]*$", MyString);
REFind will give you the position of the last alpha character. You can add 1 to get the position of the space in the original string.
If you're dealing with long strings, a regex would need to scan the whole string to get to the end, and it's likely more efficient to instead start at the end and work backwards.
Like this:
LastPos = len(String);
while( LastPos > 1 )
{
LastPos = String.lastIndexOf(' ',LastPos-1);
if ( mid(String,LastPos,1).matches('[a-zA-Z]') )
break;
}
NewString = left(String,LastPos);
The idea is to keep stepping backwards finding spaces, and break the loop when the previous character is a letter (or the start of the string is reached).
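The same backwards scan can be sketched in C++ with std::string::rfind. This is a hypothetical translation, not the ColdFusion code: it uses 0-based indices throughout and returns an empty string when no letter is followed by a space.

```cpp
#include <cctype>
#include <string>

// Step left from one space to the previous one until the character just
// before a space is a letter, then keep everything up to and including that
// letter. Returns "" when no letter is followed by a space.
std::string cut_after_last_alpha_space(const std::string& s) {
    std::string::size_type pos = s.size();
    while (pos > 0) {
        const std::string::size_type space = s.rfind(' ', pos - 1);
        // No space left, or a space at index 0 with nothing before it.
        if (space == std::string::npos || space == 0) break;
        if (std::isalpha(static_cast<unsigned char>(s[space - 1]))) {
            return s.substr(0, space);  // drop the space and all that follows
        }
        pos = space;  // continue searching to the left of this space
    }
    return std::string();
}
```

Only the tail of the string is inspected until a qualifying space is found, which is the efficiency argument made above for long content.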
If you really want a regex solution, just do:
NewString = rematch('.*[a-zA-Z] ',MyString)[1];
To get the position, you do len(NewString).
(If newlines are involved, you'd need to put (?s) at the start of the expression so that the dot matches them.)

Find substr between delimiter characters in Qt with RegEx

I need to obtain a substring in a string in Qt, but with a few details:
the substring I need is delimited by [ and ]
the substring might have some unpredictable characters like /, ^, -. This substring basically describes a unit of measurement.
Also, besides obtaining the substring itself, I need to have a test to check if such a substring exists in the string or not.
I don't know anything about RegEx and I'm new to Qt as well. Most of the examples I found here don't relate to Qt and/or don't explicitly account for what I need.
QRegExp exp("\\[([^\\]]+)\\]");
QString s1 = "5 [sm^2]";
qDebug() << exp.indexIn(s1);
qDebug() << exp.capturedTexts();
Output:
2
("[sm^2]", "sm^2")
If no part of the string matches the regexp, indexIn returns -1. Otherwise it returns the position of the match (>= 0), and capturedTexts()[1] contains the text that was enclosed in the brackets.
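The same pattern and the same "position or -1" convention can be sketched in standard C++ for readers without Qt. This mirrors indexIn/capturedTexts only loosely; find_unit is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Pattern: a literal '[', then one or more characters other than ']'
// (captured in group 1), then a literal ']'.
// Returns the match position, or -1 when there is no bracketed part;
// on success the text inside the brackets is stored in unit.
long find_unit(const std::string& s, std::string& unit) {
    static const std::regex re("\\[([^\\]]+)\\]");
    std::smatch m;
    if (!std::regex_search(s, m, re)) return -1;
    unit = m[1].str();
    return static_cast<long>(m.position(0));
}
```

Since the class only excludes ']', characters such as /, ^ and - inside the unit are captured as-is.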

Regex - Remove all characters before and after

Is it possible to remove all characters up to and including the third ' and everything from the fourth ' onward, basically isolating the text between the 3rd and 4th '?
Example:
a, 'something', 'ineedthistext', 'moretexthere'
should result in
ineedthistext
Regex might not be the best tool to do this (split by comma/apostrophe might actually be a better way), but if you want regex...
Maybe instead of removing all the characters before and after ineedthistext, you can capture ineedthistext in a group.
I would use something like:
^.*?'.*?'.*?'(.*?)'
Tested with rubular.
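The lazy pattern also works in engines with the ECMAScript grammar. Here is a C++ sketch that reads group 1 instead of deleting text around it; between_third_and_fourth_quote is a hypothetical name, and an empty result is used (ambiguously) for "no match".

```cpp
#include <regex>
#include <string>

// Each lazy .*? stops at the next single quote, so the first three skip past
// quotes 1 to 3, and (.*?) captures the shortest run up to quote 4.
// Returns an empty string when there are fewer than four quotes.
std::string between_third_and_fourth_quote(const std::string& s) {
    static const std::regex re("^.*?'.*?'.*?'(.*?)'");
    std::smatch m;
    return std::regex_search(s, m, re) ? m[1].str() : std::string();
}
```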
Try
public String stringSplit(String input) {
String[] wordArray = input.split("'");
String requiredText = wordArray[3];
return requiredText;
}
This will work if you always want the bit between the 3rd and 4th '.
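The split approach translates directly to C++ with std::getline and a custom delimiter. This is a sketch; split_on_quote is a hypothetical name and it returns an empty string instead of throwing when there are too few pieces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split on the apostrophe and take piece 3 (0-based), like split("'")[3]:
// piece 0 is the text before the first quote, piece 1 the first quoted
// value, piece 2 the text between quotes 2 and 3, piece 3 the wanted text.
std::string split_on_quote(const std::string& input) {
    std::vector<std::string> parts;
    std::istringstream in(input);
    std::string piece;
    while (std::getline(in, piece, '\'')) parts.push_back(piece);
    return parts.size() > 3 ? parts[3] : std::string();
}
```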
Derived from this answer, a possible solution is:
Regex.Matches(yourString, @"'([^']*)'")[1].Groups[1].Value
The code finds every string enclosed between two single quotes and captures its content in a group; you need the second match.
To alter your string directly, effectively removing the unwanted characters, you could use:
yourString = Regex.Matches(yourString, @"'([^']*)'")[1].Groups[1].Value;
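The "collect every quoted string, keep the second one" idea can be sketched in C++ with std::sregex_iterator. This is an illustration under that interpretation of the answer, not the C# code; second_quoted is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Walk over every '...' pair and return the text inside the second one
// (match index 1). Returns an empty string if there are fewer than two pairs.
std::string second_quoted(const std::string& s) {
    static const std::regex re("'([^']*)'");
    std::sregex_iterator it(s.begin(), s.end(), re);
    const std::sregex_iterator end;
    for (int i = 0; it != end; ++it, ++i) {
        if (i == 1) return (*it)[1].str();
    }
    return std::string();
}
```

Because [^']* cannot cross a quote, each match is exactly one quoted value, and the text between pairs (", ") is skipped by the search.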