I have to write a text file using ColdFusion. In that text file I need 'n' number of white spaces between strings.
For example:
'This<44 spaces>is<60 spaces>a<120 spaces>sampleText.'
So for this I'm using the ljustify() function in the places of white spaces, like
'This'&#ljustify(" ",44)#&'is'&#ljustify(" ",60)#&'a'&#ljustify(" ",120)#&'sampleText.'
I'm thinking that this will not be a coding standard. So, is there any other way to do this?
It looks like what you want is RepeatString().
Creates a string that contains a specified number of repetitions of the specified string
Takes two parameters:
A string (or a variable that contains one)
Number of repeats
"This" & RepeatString(" ",44) & "is" & RepeatString(" ",60) & "a" & RepeatString(" ",120) & "sampleText."
Of course, you don't need to use a space. You can use RepeatString to repeat just about anything.
LJustify() is for "padding" a string with characters out to a set number of spaces.
Example:
[#LJustify("These",10)#]<br>
[#LJustify("are",10)#]<br>
[#LJustify("variable",10)#]<br>
[#LJustify("size",10)#]
would give you output like
[These ]
[are ]
[variable ]
[size ]
This is especially useful for creating fixed-length strings.
Related
I have a string, and I want to extract, using regular expressions, groups of characters that are between the character : and the other character /.
typically, here is a string example I'm getting:
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
and so, I want to retrieved, 45.72643,4.91203 and also hereanotherdata
As they are both between characters : and /.
I tried with this syntax in a easier string where there is only 1 time the pattern,
[tt]=regexp(str,':(\w.*)/','match')
tt = ':45.72643,4.91203/'
but it works only if the pattern happens once. If I use it in string containing multiples times the pattern, I get all the string between the first : and the last /.
How can I mention that the pattern will occur multiple time, and how can I retrieve it?
Use lookaround and a lazy quantifier:
regexp(str, '(?<=:).+?(?=/)', 'match')
Example (Matlab R2016b):
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = regexp(str, '(?<=:).+?(?=/)', 'match')
result =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
In most languages this is hard to do with a single regexp. Ultimately you'll only ever get back the one string, and you want to get back multiple strings.
I've never used Matlab, so it may be possible in that language, but based on other languages, this is how I'd approach it...
I can't give you the exact code, but a search indicates that in Matlab there is a function called strsplit, example...
C = strsplit(data,':')
That should will break your original string up into an array of strings, using the ":" as the break point. You can then ignore the first array index (as it contains text before a ":"), loop the rest of the array and regexp to extract everything that comes before a "/".
So for instance...
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
Breaks down into an array with parts...
1 - 'abcd'
2 - '45.72643,4.91203/Rou'
3 - 'hereanotherdata/defgh'
Then Ignore 1, and extract everything before the "/" in 2 and 3.
As John Mawer and Adriaan mentioned, strsplit is a good place to start with. You can use it for both ':' and '/', but then you will not be able to determine where each of them started. If you do it with strsplit twice, you can know where the ':' starts :
A='abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
B=cellfun(#(x) strsplit(x,'/'),strsplit(A,':'),'uniformoutput',0);
Now B has cells that start with ':', and has two cells in each cell that contain '/' also. You can extract it with checking where B has more than one cell, and take the first of each of them:
C=cellfun(#(x) x{1},B(cellfun('length',B)>1),'uniformoutput',0)
C =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
Starting in 16b you can use extractBetween:
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = extractBetween(str,':','/')
result =
2×1 cell array
{'45.72643,4.91203'}
{'hereanotherdata' }
If all your text elements have the same number of delimiters this can be vectorized too.
I'm using the CAT-Function to generate variable names:
variable=CAT(ID,ID2,Number)
The variable "number" is a number from the set {1,5,52,142,299}. As I want to have the same structure for all generated variables, I would like to add zeroes in front of the numbers, i.e. as {001,005,052,142,299}. How can I achieve that inside the CAT-function?
Best
The z format adds leading zeros to a specified number of digits, in your case it would be z3.. Use the put function to convert it to a character string in the required format (cat returns a character string by default, so would have to convert the number anyhow). You may want to consider cats as an alternative, which will strip any leading or trailing blanks.
variable=CAT(ID,ID2,put(Number,z3.))
I am trying to add a new calculated column that counts the number of semi colons in a string and adds one to it. So the column i have contains a bunch of aliases and I need to know how many for each row.
For example,
A; B; C; D
So basically this means there are 4 aliases (3 semi colons + 1)
Need to do this for over 2 million rows. Help please!
Basic idea is to subtract length of your string without ; characters from it's original length:
len([columnName])-len(Substitute([columnName],";",""))+1
Here it is with a regular expression:
Len(RXReplace([Column 1], "(?!;).", "", "gis"))+1
RXReplace takes as arguments:
The string you are wanting to work on (in this case it is on Column 1)
The regular expression you want to use (here it is (?!;). )
What you want to replace matches with (blank in this situation so
that everything that matches the regex is removed)
Finally a parameter saying how you want it to work (we are passing
in gis which means replace all matches not just the first, ignore case, replace newlines)
We wrap this in a Len which gives us the amount of semicolons since that is all that is left and finally we add 1 to it to get the final result.
You can read more about the regular expression here: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx but in a nutshell it says match everything that isn't a semi colon.
You can read more about RXReplace and Len here: https://docs.tibco.com/pub/spotfire/6.0.0-november-2013/userguide-webhelp/ncfe/ncfe_text_functions.htm
I've got this odd string:
firstName:Paul Henry,retired:true,message:A, B & more,title:mr
which needs to be split into its <key>:<value> pairs. Unfortunately, key/value pairs are separated by , which itself can be part of the value. Hence, a simple string-split at , would not produce the correct result.
Keys contain only word characters and values can contain :.
What I need (I think) is something like
\w*:match-anything-but-comma-unless-comma-is-followed-by-space
What kind of works is
\w*:[\w ?!&%,]*(?![^,])
but of course I wouldn't want to explicitly list all characters in the character class (just listed a few for this example).
If you want to split on a comma, unless the comma is followed by a space, why not just:
,(?=\S)
Not sure what language you are using, but in C# the line might look like:
splitArray = Regex.Split(subjectString, #",(?=\S)");
You are trying to do something complicated with a regular expression that would be simple (and easy to understand) with a little code. That's usually a mistake. Just write a little code.
In your case, you want to split the input on commas. If you get a chunk that doesn't contain a colon, you want to treat it as part of the previous chunk. So just write that. For example, in Python, I'd do it like this:
chunks = input.split(',')
associations = []
for chunk in chunks:
if ':' in chunk:
associations.append(chunk)
else:
associations[-1] += ',' + chunk
map = dict(association.split(':') for association in associations)
I need to store a string replacing its spaces with some character. When I retrieve it back I need to replace the character with spaces again. I have thought of this strategy while storing I will replace (space with _a) and (_a with _aa) and while retrieving will replace (_a with space) and (_aa with _a). i.e even if the user enters _a in the string it will be handled. But I dont think this is a good strategy. Please let me know if anyone has a better one?
Replacing spaces with something is a problem when something is already in the string. Why don't you simply encode the string - there are many ways to do that, one is to convert all characters to hexadecimal.
For instance
Hello world!
is encoded as
48656c6c6f20776f726c6421
The space is 0x20. Then you simply decode back (hex to ascii) the string.
This way there are no space in the encoded string.
-- Edit - optimization --
You replace all % and all spaces in the string with %xx where xx is the hex code of the character.
For instance
Wine having 12% alcohol
becomes
Wine%20having%2012%25%20alcohol
%20 is space
%25 is the % character
This way, neither % nor (space) are a problem anymore - Decoding is easy.
Encoding algorithm
- replace all `%` with `%25`
- replace all ` ` with `%20`
Decoding algorithm
- replace all `%xx` with the character having `xx` as hex code
(You may even optimize more since you need to encode only two characters: use %1 for % and %2 for , but I recommend the %xx solution as it is more portable - and may be utilized later on if you need to code more characters)
I'm not sure your solution will work. When reading, how would you
distinguish between strings that were orginally " a" and strings that
were originally "_a": if I understand correctly, both will end up
"_aa".
In general, given a situation were a specific set of characters cannot
appear as such, but must be encoded, the solution is to choose one of
allowed characters as an "escape" character, remove it from the set of
allowed characters, and encode all of the forbidden characters
(including the escape character) as a two (or more) character sequence
starting with the escape character. In C++, for example, a new line is
not allowed in a string or character literal. The escape character is
\; because of that, it must be encoded as an escape sequence as well.
So we have "\n" for a new line (the choice of n is arbitrary), and
"\\" for a \. (The choice of \ for the second character is also
arbitrary, but it is fairly usual to use the escape character, escaped,
to represent itself.) In your case, if you want to use _ as the
escape character, and "_a" to represent a space, the logical choice
would be "__" to represent a _ (but I'd suggest something a little
more visually suggestive—maybe ^ as the escape, with "^_" for
a space and "^^" for a ^). When reading, anytime you see the escape
character, the following character must be mapped (and if it isn't one
of the predefined mappings, the input text is in error). This is simple
to implement, and very reliable; about the only disadvantage is that in
an extreme case, it can double the size of your string.
You want to implement this using C/C++? I think you should split your string into multiple part, separated by space.
If your string is like this : "a__b" (multiple space continuous), it will be splited into:
sub[0] = "a";
sub[1] = "";
sub[2] = "b";
Hope this will help!
With a normal string, using X characters, you cannot write or encode a string with x-1 using only 1 character/input character.
You can use a combination of 2 chars to replace a given character (this is exactly what you are trying in your example).
To do this, loop through your string to count the appearances of a space combined with its length, make a new character array and replace these spaces with "//" this is just an example though. The problem with this approach is that you cannot have "//" in your input string.
Another approach would be to use a rarely used char, for example "^" to replace the spaces.
The last approach, popular in a combination of these two approaches. It is used in unix, and php to have syntax character as a literal in a string. If you want to have a " " ", you simply write it as \" etc.
Why don't you use Replace function
String* stringWithoutSpace= stringWithSpace->Replace(S" ", S"replacementCharOrText");
So now stringWithoutSpace contains no spaces. When you want to put those spaces back,
String* stringWithSpacesBack= stringWithoutSpace ->Replace(S"replacementCharOrText", S" ");
I think just coding to ascii hexadecimal is a neat idea, but of course doubles the amount of storage needed.
If you want to do this using less memory, then you will need two-letter sequences, and have to be careful that you can go back easily.
You could e.g. replace blank by _a, but you also need to take care of your escape character _. To do this, replace every _ by __ (two underscores). You need to scan through the string once and do both replacements simultaneously.
This way, in the resulting text all original underscores will be doubled, and the only other occurence of an underscore will be in the combination _a. You can safely translate this back. Whenever you see an underscore, you need a lookahed of 1 and see what follows. If an a follows, then this was a blank before. If _ follows, then it was an underscore before.
Note that the point is to replace your escape character (_) in the original string, and not the character sequence to which you map the blank. Your idea with replacing _a breaks. as you do not know if _aa was originally _a or a (blank followed by a).
I'm guessing that there is more to this question than appears; for example, that you the strings you are storing must not only be free of spaces, but they must also look like words or some such. You should be clear about your requirements (and you might consider satisfying the curiosity of the spectators by explaining why you need to do such things.)
Edit: As JamesKanze points out in a comment, the following won't work in the case where you can have more than one consecutive space. But I'll leave it here anyway, for historical reference. (I modified it to compress consecutive spaces, so it at least produces unambiguous output.)
std::string out;
char prev = 0;
for (char ch : in) {
if (ch == ' ') {
if (prev != ' ') out.push_back('_');
} else {
if (prev == '_' && ch != '_') out.push_back('_');
out.push_back(ch);
}
prev = ch;
}
if (prev == '_') out.push_back('_');