how to trim a string without any spaces - coldfusion

How do I remove spaces and other whitespace characters from inside a string? I don't want to remove the space just from the ends of the string, but throughout the entire string.

You can use a regular expression:
<cfset str = reReplace(str, "[[:space:]]", "", "ALL") />

You can also simply use ColdFusion's Replace() (if you don't want to use regular expressions for some reason), but don't forget the optional "ALL" parameter.
I've run into this in the past when trying to remove 5 spaces in the middle of a string. I would do something like:
<cfset str = Replace(str, " ", "")/>
Without "ALL", only the first occurrence is replaced, so I would end up with 4 spaces.
Be sure to use:
<cfset str = Replace(str, " ", "", "ALL")/>
to replace every space. Hope this helps!
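For comparison, here is a minimal sketch of the same first-versus-all distinction in Java, whose String methods are also reachable from CFML strings. The class and method names are made up for illustration, and Java's \s is assumed to be close enough to [[:space:]] for ordinary spaces, tabs and line breaks.

```java
final class SpaceRemoval {
    // Removes only the first space, like Replace(str, " ", "") without "ALL".
    static String removeFirstSpace(String s) {
        return s.replaceFirst(" ", "");
    }

    // Removes every literal space, like Replace(str, " ", "", "ALL").
    static String removeAllSpaces(String s) {
        return s.replace(" ", "");
    }

    // Removes every whitespace character (spaces, tabs, line breaks),
    // roughly what the [[:space:]] regex version does.
    static String removeAllWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        String sample = "a     b\tc"; // five spaces, then a tab
        System.out.println("[" + removeFirstSpace(sample) + "]");
        System.out.println("[" + removeAllSpaces(sample) + "]");
        System.out.println("[" + removeAllWhitespace(sample) + "]");
    }
}
```

Note that the literal-space version leaves tabs and line breaks alone, which is the main reason to prefer the regex when "whitespace" means more than the space character.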

Related

MATLAB regular expression denied to remove spaces at beginning of a string

Suppose that we have this string in MATLAB:
mm = [' 44412 (25.01%)'];
I want to remove only the leading space(s) in this string to get this output:
'44412 (25.01%)'
I'm using strrep(mm,'\^\s\s','') but it didn't work. What is the problem?
The issue with strrep is that it does not support regex patterns. The first part of your filter ('\^') also tries to match a literal ^, so it won't work on your string. If you remove the leading \, your filter works with regexprep, but it is limited to strings with exactly 2 leading whitespace characters.
Try using this more generic filter instead with regexprep.
str = ' 44412 (25.01%)';
newstr = regexprep(str, '^\s+', '');
Which returns:
newstr =
44412 (25.01%)
What I've done here is match 1 or more whitespace characters at the beginning of the string. This syntax also allows us to use it on strings without any leading whitespace and not have it make any modification.
Edit: Here are some built-in alternatives!
You could use strtrim, but it strips leading and trailing whitespace:
newstr = strtrim(str);
You can also use strjust to left-justify your string:
newstr = strjust(str, 'left');
If you want to be really creative, you could flip your array and use deblank, which strips trailing whitespace:
newstr = fliplr(deblank(fliplr(str)));
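The thread above is about MATLAB, but the same two patterns behave the same way in other regex engines. As a rough sketch in Java (the class and method names are hypothetical), the fixed-width pattern and the '^\s+' pattern differ like this:

```java
final class LeadingTrim {
    // Mirrors the corrected original filter: strips exactly two
    // leading whitespace characters, or nothing at all.
    static String stripTwoLeading(String s) {
        return s.replaceFirst("^\\s\\s", "");
    }

    // Mirrors '^\s+': strips any run of leading whitespace and
    // leaves strings without leading whitespace unchanged.
    static String stripLeadingWhitespace(String s) {
        return s.replaceFirst("^\\s+", "");
    }

    public static void main(String[] args) {
        String str = "   44412 (25.01%)"; // three leading spaces
        System.out.println("[" + stripTwoLeading(str) + "]");
        System.out.println("[" + stripLeadingWhitespace(str) + "]");
    }
}
```

Trailing whitespace is untouched by both methods, which is the difference from a strtrim-style trim.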

Coldfusion string replace with ReReplace

I'm trying to replace spaces in a string with underscores for slug creation using RegEx. It works fine when there is one space. But when there are two consecutive spaces, or a space followed by an underscore or vice versa (' _' or '_ '), it's replaced with __. How can I overcome this? That is, I want a single underscore instead of a double or triple one. Any help would be appreciated.
My code for replacing is similar to this.
rereplace(lCase('this is a sample _string'),'[ ]','_','all')
This seems to do the trick, based on your revised requirement:
original = "string with_mix _ of spaces__and_ _underscores__ __to_ _test with";
updated = reReplace(original, "[ _]+", "_", "all");
writeOutput(updated);
Results in:
string_with_mix_of_spaces_and_underscores_to_test_with
Is that to spec?
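For anyone calling Java from CFML, or just wanting to check the pattern outside ColdFusion, here is a small sketch of the same idea. The Slug class and toSlug name are assumptions for illustration, and the lower-casing step is borrowed from the question's lCase() call.

```java
import java.util.Locale;

final class Slug {
    // Lower-cases the input, then collapses every run of spaces
    // and/or underscores into a single underscore.
    static String toSlug(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[ _]+", "_");
    }

    public static void main(String[] args) {
        System.out.println(toSlug("this is a  sample _string"));
        System.out.println(toSlug("string with_mix _ of spaces__and_ _underscores"));
    }
}
```

The character class treats space and underscore as interchangeable, and the + is what turns a mixed run such as "_ _" into one replacement instead of three. Leading or trailing runs still become a single underscore, so trim first if that matters for the slug.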

Understanding these Regular Expressions

There is a variable being set as follows (through custom tag invocation)
<cfset str = Trim( THISTAG.GeneratedContent ) />
The contents of THISTAG.GeneratedContent looks like
FNAME|MNAME|LNAME Test|Test|Test
The code I am having trouble understanding is as follows:
<cfset str = str.ReplaceAll(
"(?m)^[\t ]+|[\t ]+$",
""
) />
<cfset arrRows = str.Split( "[\r\n]+" ) />
The above line of code should generate array with contents as
arrRows[1] = FNAME|MNAME|LNAME
arrRows[2] = Test|Test|Test
But dumping the array shows the following output:
FNAME|MNAME|LNAME Test|Test|Test
I do not understand what both regular expressions are trying to achieve.
This one...
<cfset str = str.ReplaceAll(
"(?m)^[\t ]+|[\t ]+$",
""
) />
..is removing any tabs/spaces that are at the beginning or end of lines. The (?m) turns on multiline mode which causes ^ to match "start of line" (as opposed to its usual "start of content"), and similarly $ means "end of line" (rather than "end of content") in this mode.
This one...
<cfset arrRows = str.Split( "[\r\n]+" ) />
...is converting lines to an array, by splitting on any combination of consecutive carriage returns and/or newline characters.
Bonus Info
You can actually combine these two regexes into a single one, like so:
<cfset arrRows = str.split( '\s*\n\s*' ) />
The \s will match any whitespace character - i.e. [\r\n\t ] and thus this combines the removal of spaces and tabs with turning it into an array.
(Note that since it works by looking for newlines, the trim on GeneratedContent is necessary for any preceding/trailing whitespace to be removed.)
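Since ReplaceAll and Split here are the underlying java.lang.String methods, the two approaches can be compared in plain Java. This is only a sketch: the RowSplitter class, its method names and the sample input are assumptions, and the input is taken to be already trimmed and to contain a real line break.

```java
import java.util.Arrays;

final class RowSplitter {
    // Two steps: strip tabs/spaces at the start and end of every line
    // (multiline mode), then split on runs of CR and/or LF.
    static String[] splitTwoStep(String s) {
        String trimmed = s.replaceAll("(?m)^[\\t ]+|[\\t ]+$", "");
        return trimmed.split("[\\r\\n]+");
    }

    // One step: split on a newline together with any whitespace around it.
    static String[] splitOneStep(String s) {
        return s.split("\\s*\\n\\s*");
    }

    public static void main(String[] args) {
        String content = "FNAME|MNAME|LNAME \r\n\t Test|Test|Test";
        System.out.println(Arrays.toString(splitTwoStep(content)));
        System.out.println(Arrays.toString(splitOneStep(content)));
    }
}
```

If the input contains no line break at all, both methods return a single-element array holding the whole string, so it is worth checking what the generated content actually contains before blaming the regexes.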

Regex to remove last letter of each word in list

I have a list that I created in ColdFusion. Let's use the following list as an example:
<cfset arguments.tags = "battlefieldx, testx, wonderful, ererex">
What I would like to do is remove the "x" from the words that have an x at the end and keep the words in the list. Order doesn't matter. A regex would be fine or looping with coldfusion would be okay too.
Removing x from end of each list element...
To remove all x characters that precede a comma or the end of the string, do:
rereplace( arguments.tags , "x(?=,|$)" , "" , "all" )
The (?= ) part here is a lookahead - it matches the position of its contents, but does not include them in what is replaced. The | is alternation - it'll try to match a literal , and if that fails it'll try to match the end of the string ($).
If you don't want to remove a lone x from, e.g. "x,marks,the,spot"...
If you want to make sure that x is at the end of a word (i.e. is not alone), you can use a non-word boundary check:
rereplace( arguments.tags , "\Bx(?=,|$)" , "" , "all" )
The \B will not match if there isn't a [a-zA-Z0-9_] before the x - for more complex/accurate rules on what constitutes "end of a word", you would need a lookbehind, which can't be done with rereplace, but is still easy enough by doing:
arguments.tags.replaceAll("(?<=\S)x(?=,|$)" , "" )
(That looks for a single non-whitespace character before the x to consider it part of a word, but you can put any limited-width expression within the lookbehind.)
Obviously, to do any letter, switch the x with [a-zA-Z] or whatever is appropriate.
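To see how the three patterns differ, here is a small Java sketch (the replaceAll call above is the Java String method anyway). The TrailingX class and its method names are hypothetical.

```java
final class TrailingX {
    // Plain lookahead: removes any x directly before a comma or the end.
    static String stripTrailingX(String list) {
        return list.replaceAll("x(?=,|$)", "");
    }

    // \B: the x must follow a word character, so a lone x is kept.
    static String stripTrailingXInWord(String list) {
        return list.replaceAll("\\Bx(?=,|$)", "");
    }

    // Lookbehind: the x must follow any non-whitespace character.
    static String stripTrailingXAfterNonSpace(String list) {
        return list.replaceAll("(?<=\\S)x(?=,|$)", "");
    }

    public static void main(String[] args) {
        String tags = "battlefieldx, testx, wonderful, ererex";
        System.out.println(stripTrailingX(tags));
        System.out.println(stripTrailingX("x,marks,the,spot"));
        System.out.println(stripTrailingXInWord("x,marks,the,spot"));
        System.out.println(stripTrailingXAfterNonSpace("-x,marks"));
    }
}
```

The \B and lookbehind variants only disagree when the character before the x is neither a word character nor whitespace, such as the hyphen in the last call.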
The regex to grab the 'x' from the end of a word is pretty straightforward. Supposing you have a given element as a string, the regex you need is simply:
REReplace(myString, "x$", "")
This matches an x at the end of the given string and replaces it with an empty string.
To do this for each element in a comma-delimited list, first strip the x before each comma, then the x at the end of the string:
myString = REReplace(myString, "x,", ",", "ALL")
myString = REReplace(myString, "x$", "")
The $ symbol anchors the match to the end of the string, so "x$" detects an 'x' at the very end. The empty quotes replace it with nothing, thus removing the 'x'.
This has already been answered, but thought I'd post a ColdFusion only solution since you said you could use either. (The RegEx is obviously much easier, but this will work too)
<cfset arguments.tags = "battlefieldx, testx, wonderful, ererex">
<cfset temparray = []>
<cfloop list="#arguments.tags#" index="i">
<cfif right(i,1) EQ 'X'>
<cfset arrayappend(temparray,left(i,len(i) - 1))>
<cfelse>
<cfset arrayappend(temparray,i)>
</cfif>
</cfloop>
<cfset arguments.tags = arraytolist(temparray)>
If you have ColdFusion 9+ or Railo, you can simplify the loop using a ternary operator:
<cfloop list="#arguments.tags#" index="i">
<cfset arrayappend(temparray, right(i,1) EQ 'X' ? left(i,len(i) - 1) : i)>
</cfloop>
You could also convert arguments.tags to an array and loop that way:
<cfloop array="#listtoarray(arguments.tags)#" index="i">
<cfset arrayappend(temparray, right(i,1) EQ 'X' ? left(i,len(i) - 1) : i)>
</cfloop>

Coldfusion ReReplace "&" but not htmlspecialchars

I need to replace all & with &amp; in a string like this:
Übung 1: Ü & Ä
or in HTML:
&Uuml;bung 1: &Uuml; & &Auml;
As you can see, there are HTML entities in the string (but the & itself is not encoded as &amp;), so I need to exclude them from my replace. I'm not so familiar with regular expressions. All I need is an expression that does the following:
Search for an & that is either followed by a space, or is not followed by a run of non-space characters ending with a ;, then replace it with &amp;.
I tried something like this:
<cfset data = ReReplace(data, "&[ ]|[^(?*^( ));]", "&amp;", "ALL") />
but that replaces every char with &amp;... ^^'
Sorry, I really don't get regular expressions.
Problem with existing attempt
The reason your attempted pattern &[ ]|[^(?*^( ));] is failing is primarily because you have a | but no bounding container. This means you are replacing &[ ] OR [^(?*^( ));], and the latter will match most things. You are also misunderstanding how character classes work.
Inside [..] (a character class) there are a few simple rules:
if it starts with a ^ it is negated, otherwise the ^ is literal.
if there is a hyphen it is treated as a range (e.g. a-z or 1-5 )
if there is a backslash, it either marks a shorthand class (e.g. \w), or escapes the following character (inside a char class this is only required for [ ] ^ - \).
you are only matching a single character (subject to any qualifiers); there is no ordering/sequence inside the class, and duplicates of the same character are ignored.
Also, you don't need to put a space inside a character class - a literal space works fine (unless you are in free-spacing comment mode, which needs to be explicitly enabled).
Hopefully that helps you understand what was going wrong?
As for actually solving your problem...
Solution
To match an ampersand that does not start a HTML entity, you can use:
&(?![a-z][a-z0-9]+;|#(?:\d+|x[\dA-F]+);)
That is, an ampersand, followed by a negative lookahead for either of:
a letter, then one or more letters or digits, then a semicolon - i.e. a named entity reference
a hash, then either a number, or an x followed by a hex number, and finally a semicolon - i.e. a numeric entity reference.
To use this in CFML, to replace & with &amp; would be:
<cfset data = rereplaceNoCase( data , '&(?![a-z][a-z0-9]+;|##(?:\d+|x[\dA-F]+);)' , '&amp;' , 'all' ) />
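The same negative lookahead can be tried outside ColdFusion. Below is a minimal Java sketch; the AmpersandEscaper class and method name are hypothetical, and Pattern.CASE_INSENSITIVE stands in for the NoCase variant of rereplace. Note that Java needs no doubled # the way a CFML string does.

```java
import java.util.regex.Pattern;

final class AmpersandEscaper {
    // An ampersand NOT followed by a named entity (letters/digits + ';')
    // or a numeric entity ('#' + decimal or 'x' + hex digits + ';').
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
        "&(?![a-z][a-z0-9]+;|#(?:\\d+|x[\\da-f]+);)",
        Pattern.CASE_INSENSITIVE);

    // Escapes only the ampersands that do not already start an entity.
    static String escapeBareAmpersands(String s) {
        return BARE_AMPERSAND.matcher(s).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("&Uuml;bung 1: &Uuml; & &Auml;"));
        System.out.println(escapeBareAmpersands("AT&T and &#38; and &#x26;"));
    }
}
```

Because already-encoded entities such as &amp; are skipped, running the method twice should give the same result as running it once.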
I think it would be easier to simply replace all occurrences of & with &amp;, and then undo the replacement wherever it broke an existing entity:
<cfset data = ReReplace(ReReplace(data, "&", "&amp;", "ALL"), "&amp;([^;&]*;)", "&\1", "ALL") />
I haven't tested this in ColdFusion (since I have no clue how to), but it should work, because in JavaScript, the regex itself works:
var s = "I we&nt out on 1 se&123;p 2012 and& it was be&tter & than 15 jan 2012"
console.log(s.replace(/&/g, '&amp;').replace(/&amp;([^;&]*;)/g, '&$1'));
//"I we&amp;nt out on 1 se&123;p 2012 and&amp; it was be&amp;tter &amp; than 15 jan 2012"
So I assume the regex will also do its trick in CF.
The other option you have is to not use regex at all. For the sample string you listed, you are simply trying to replace the bare ampersand ("&") without affecting the HTML entities.
This can be accomplished just using REPLACE.
Remember that inside an entity there are no spaces around the ampersand character, whereas an ampersand that needs converting to an HTML entity typically has a leading and a trailing space. REPLACE will find every case of " & " and update it, without affecting any of the "&Uuml;" strings (which have no leading and trailing space).
<cfset html = "&Uuml;bung 1: &Uuml; & &Auml;">
<cfset parsedHtml = REPLACE(html," & ", " &amp; ","All")>
For a fast, trouble-free approach, just refer to the ampersand by its decimal code point, like so...
<cfset html = Replace(html, Chr(38), "&amp;", "all")>