Convert CamelCase string to uppercase with underscore - regex

I have the string "CamelCase" and I use this regex:
string pattern = "(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])";
string[] substrings = Regex.Split("CamelCase", pattern);
In substrings I get Camel and Case, which is fine, but I'd like them in uppercase: CAMEL and CASE. Better still, I'd like to get a single string like CAMEL_CASE, but please do it ALL with regex.
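For comparison, here is a minimal Python sketch of the split-then-join idea from the question. Python's `re` module requires fixed-width lookbehind, so the question's `(?<!(^|[A-Z]))` cannot be reused as-is; this sketch assumes a simpler boundary (a lowercase letter or digit followed by an uppercase letter). The function name `camel_to_upper_snake` is hypothetical.

```python
import re


def camel_to_upper_snake(s: str) -> str:
    # Split at zero-width positions where a lowercase letter or digit is
    # followed by an uppercase letter (splitting on empty matches needs Python 3.7+).
    parts = re.split(r"(?<=[a-z0-9])(?=[A-Z])", s)
    # Join the pieces with underscores and upper-case the whole result.
    return "_".join(parts).upper()
```

With this boundary, "CamelCase" and "camelCase" both become "CAMEL_CASE". Acronym runs such as "HTTPServer" are not split, because there is no lowercase letter before the "S".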

Here is a JavaScript implementation.
function camelCaseToUpperCase(str) {
// g flag: insert an underscore at every lower-to-upper boundary, not just the first
return str.replace(/([a-z])([A-Z])/g, '$1_$2').toUpperCase();
}
Demo
printList([ 'CamelCase', 'camelCase' ],
function(value, idx, values) {
return value + ' -> '
+ camelCaseToUpperCase(value) + ' -> '
+ camelToTitle(value, '_');
}
);
// Case Conversion Functions
function camelCaseToUpperCase(str) {
// g flag: insert an underscore at every lower-to-upper boundary, not just the first
return str.replace(/([a-z])([A-Z])/g, '$1_$2').toUpperCase();
}
function camelToTitle(str, delimiter) {
return str.replace(/([A-Z][a-z]+)/g, ' $1') // Words beginning with UC
.replace(/([A-Z][A-Z]+)/g, ' $1') // "Words" of only UC
.replace(/([^A-Za-z ]+)/g, ' $1') // "Words" of non-letters
.trim() // Remove any leading/trailing spaces
.replace(/[ ]/g, delimiter || ' '); // Replace all spaces with the delim
}
// Utility Functions
function printList(items, conversionFn) {
var str = '<ul>';
[].forEach.call(items, function(item, index) {
str += '<li>' + conversionFn(item, index, items) + '</li>';
});
print(str + '</ul>');
}
function print() {
write.apply(undefined, arguments);
}
function println() {
write.apply(undefined, [].splice.call(arguments,0).concat('<br />'));
}
function write() {
document.getElementById('output').innerHTML += arguments.length > 1 ?
[].join.call(arguments, ' ') : arguments[0]
}
#output {
font-family: monospace;
}
<h1>Case Conversion Demo</h1>
<div id="output"></div>

In Perl you can do this:
$string = "CamelCase";
$string =~ s/((?<=[a-z])[A-Z][a-z]+)/_\U$1/g;
$string =~ s/(\b[A-Z][a-z]+)/\U$1/g;
print "$string\n";
The replacement uses \U to convert the found group to uppercase.
That can be compressed into a single regex by using Perl's /e modifier, which evaluates the replacement as a Perl expression:
$string = "CamelCase";
$string =~ s/(?:\b|(?<=([a-z])))([A-Z][a-z]+)/(defined($1) ? "_" : "") . uc($2)/eg;
print "$string\n";
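Python's `re` replacement strings have no `\U`, so a replacement function plays the role of Perl's /e here. The following is a minimal sketch that reuses the same pattern and the same "was group 1 captured?" test as the Perl one-liner; the names `PATTERN` and `camel_to_upper_snake` are hypothetical.

```python
import re

# Same pattern as the Perl /e version: a capitalized word either starts at a
# word boundary or directly follows a lowercase letter (captured in group 1).
PATTERN = re.compile(r"(?:\b|(?<=([a-z])))([A-Z][a-z]+)")


def camel_to_upper_snake(s: str) -> str:
    def repl(m):
        # Group 1 only participates when the lookbehind branch matched,
        # i.e. when the word is glued to a preceding lowercase letter.
        prefix = "_" if m.group(1) is not None else ""
        return prefix + m.group(2).upper()

    return PATTERN.sub(repl, s)
```

Like the Perl version, this only upper-cases capitalized words, so a leading lowercase word (as in "camelCase") is left in lowercase.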

Using the sed and tr Unix utilities (from your terminal):
echo "CamelCase" | sed -e 's/\([A-Z]\)/-\1/g' -e 's/^-//' | tr '-' '_' | tr '[:lower:]' '[:upper:]'
If you have CamelCase strings ending in "ID" and want to keep ID together, use this one:
echo "CamelCaseID" | sed -e 's/\([A-Z]\)/-\1/g' -e 's/^-//' | tr '-' '_' | tr '[:lower:]' '[:upper:]' | sed -e 's/I_D$/ID/g'
By extending the String class in Ruby:
class String
def camelcase_to_underscore
self.gsub(/::/, '/').
gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
gsub(/([a-z\d])([A-Z])/,'\1_\2').
tr("-", "_").
upcase
end
end
Now, you can execute the camelcase_to_underscore method on any string. Example:
>> "CamelCase".camelcase_to_underscore
=> "CAMEL_CASE"
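The same two substitution rules carry over directly to Python's `re.sub`. The second rule only splits after a lowercase letter or digit, so a trailing "ID" stays together without the extra sed step used above. This is a sketch that ports the Ruby method; the function name is hypothetical.

```python
import re


def camelcase_to_underscore(s: str) -> str:
    # The Ruby version also maps "::" to "/"; kept here for parity.
    s = s.replace("::", "/")
    # Separate an acronym from a following capitalized word: "HTTPServer" -> "HTTP_Server".
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    # Separate a lowercase letter or digit from a following capital: "CamelCase" -> "Camel_Case".
    s = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", s)
    # Hyphens become underscores, then everything is upper-cased.
    return s.replace("-", "_").upper()
```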

using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string input = "CamelCase";
string output = Regex.Replace(input,
@"(?:\b|(?<=([A-Za-z])))([A-Z][a-z]*)",
m => string.Format(@"{0}{1}",
(m.Groups[1].Value.Length > 0)? "_" : "", m.Groups[2].Value.ToUpper()));
Console.WriteLine(output);
}
}

Related

Could regex be used in this PowerShell script?

I have the following code, used to remove spaces and other characters from a string $m, and replace them with periods ('.'):
Function CleanupMessage([string]$m) {
$m = $m.Replace(' ', ".") # spaces to dot
$m = $m.Replace(",", ".") # commas to dot
$m = $m.Replace([char]10, ".") # linefeeds to dot
while ($m.Contains("..")) {
$m = $m.Replace("..",".") # multiple dots to dot
}
return $m
}
It works OK, but it seems like a lot of code that could be simplified. I've read that regex can work with patterns, but I'm not sure whether it would work in this case. Any hints?
Use a regex character class:
Function CleanupMessage([string]$m) {
return $m -replace '[ ,.\n]+', '.'
}
EXPLANATION
--------------------------------------------------------------------------------
[ ,.\n]+    any character of: ' ', ',', '.', '\n' (newline),
            one or more times (greedy: matching as many as possible)
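For readers outside PowerShell, the same character-class idea looks like this in Python's `re.sub`. It is a minimal sketch; `cleanup_message` simply mirrors the PowerShell function name.

```python
import re


def cleanup_message(m: str) -> str:
    # Any run of spaces, commas, periods or newlines collapses into a single ".".
    return re.sub(r"[ ,.\n]+", ".", m)
```

Because the class includes "." and is followed by "+", runs such as " , " or ".." become one period in a single pass. That removes the need for the `while` loop in the question.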
Solution for this case:
cls
$str = "qwe asd,zxc`nufc..omg"
Function CleanupMessage([String]$m)
{
$m -replace "( |,|`n|\.\.)", '.'
}
CleanupMessage $str
# qwe.asd.zxc.ufc.omg
A more general solution: just list everything you want to replace in $toReplace:
cls
$str = "qwe asd,zxc`nufc..omg+kfc*fox"
Function CleanupMessage([String]$m)
{
$toReplace = " ", ",", "`n", "..", "+", "fox"
.{
$d = New-Guid
$regex = [Regex]::Escape($toReplace-join$d).replace($d,"|")
$m -replace $regex, '.'
}
}
CleanupMessage $str
# qwe.asd.zxc.ufc.omg.kfc*.
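The same "escape each literal, then join with |" technique can be sketched in Python. Here `re.escape` stands in for `[Regex]::Escape`, and joining with "|" avoids the GUID placeholder trick. Sorting longer literals first is an added assumption, so that a longer entry wins when two entries could match at the same position. The function signature is hypothetical.

```python
import re
from typing import Iterable


def cleanup_message(m: str, to_replace: Iterable[str]) -> str:
    # Longest literals first, so e.g. ".." is preferred over a shorter overlapping entry.
    alternatives = sorted(to_replace, key=len, reverse=True)
    if not alternatives:
        return m  # an empty pattern would match between every character
    # Escape each literal so ".", "+", "*" etc. lose their regex meaning.
    pattern = "|".join(re.escape(a) for a in alternatives)
    return re.sub(pattern, ".", m)
```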

PowerShell regex for encoding API filter into a URL

I have a string that needs to be encoded for a URL, so I have to replace some instances of certain characters.
For example:
filter=prop1:"folder1/folder2",prop2>:"string & string",prop3~"d:d"
In this example, "filter=" and each property name and operator ("prop1:", "prop2>:", and "prop3~") need to stay. The forward slash, ampersand, space, and the last colon (in d:d) need to be replaced with %2F, %26, %20 and %3A respectively, giving:
filter=prop1:"folder1%2Ffolder2",prop2>:"string%20%26%20string",prop3~"d%3Ad"
With this application, the properties and values can be separated by:
>:
<:
>
<
!:
:
~
!~
The characters I want to replace in each pair's "value" section:
&
$
+
,
/
:
;
=
?
@
space
"
<
>
#
%
{
}
|
\
^
~
[
]
What is the most efficient way to do this?
Hopefully this will help get you somewhere.
$test = 'filter=prop1:"folder1/folder2",prop2>:"string & string",prop3~"d:d"'
$pattern = '(?:>:|<:|!:|!~|:|>|<|~)"(.*?)"'
$regex = [Regex]::new($pattern)
$regex.Matches($test) | ForEach-Object {
$test = $test -replace ([regex]::Escape($_.Groups[1].value)), ([uri]::EscapeDataString($_.Groups[1].value))
}
$test
# Output
filter=prop1:"folder1%2Ffolder2",prop2>:"string%20%26%20string",prop3~"d%3Ad"
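An alternative sketch in Python encodes each quoted value in place with a single callback substitution. Unlike replacing the value text across the whole string, this cannot touch an identical substring elsewhere. The sketch assumes values never contain an embedded `"`. `urllib.parse.quote` with `safe=""` also encodes "/", but it never encodes letters, digits or `_.-~`, so a "~" inside a value would stay as-is. The names `OPERATOR_VALUE` and `encode_filter_values` are hypothetical.

```python
import re
from urllib.parse import quote

# Operator followed by a quoted value; the two-character operators come first.
OPERATOR_VALUE = re.compile(r'((?:>:|<:|!:|!~|:|>|<|~)")(.*?)(")')


def encode_filter_values(filter_string: str) -> str:
    def repl(m):
        # Keep the operator and quotes, percent-encode only the value between them.
        return m.group(1) + quote(m.group(2), safe="") + m.group(3)

    return OPERATOR_VALUE.sub(repl, filter_string)
```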

How to use sed to extract numbers from a comma separated string?

I managed to extract the following response and turn it into a comma-separated string. I'm only interested in the values of the ACCOUNT_ID fields. How do I pattern match them using sed?
Input: ACCOUNT_ID,711111111119,ENVIRONMENT,dev,ACCOUNT_ID,111111111115,dev
Expected Output: 711111111119, 111111111115
My $input variable stores the input.
I tried the following, but it joins all the numbers together, and I would like to preserve the commas:
echo $input | sed -e "s/[^0-9]//g"
I think you're better served with awk:
awk -v FS=, '{sep=""; for(i=1;i<=NF;i++) if($i~/[0-9]/){printf "%s%s", sep, $i; sep=","}; print ""}'
If you really want sed (GNU sed, because of the \| alternation), you can use:
sed -e "s/[^0-9]/,/g" -e "s/,,*/,/g" -e "s/^,\|,$//g"
$ awk '
BEGIN {
FS = OFS = ","
}
{
c = 0
for (i = 1; i <= NF; i++) {
if ($i == "ACCOUNT_ID") {
printf "%s%s", (c++ ? OFS : ""), $(i + 1)
}
}
print ""
}' file
711111111119,111111111115
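The last awk answer's rule (take the field right after each ACCOUNT_ID) can also be expressed as one regex with a capture group. This Python sketch assumes the IDs are purely numeric; `account_ids` is a hypothetical name.

```python
import re


def account_ids(line: str) -> list:
    # Capture the digits of the field that immediately follows each ACCOUNT_ID field.
    return re.findall(r"(?:^|,)ACCOUNT_ID,(\d+)", line)
```

Joining the result with ", " gives the expected output format from the question.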

detect string case and apply to another one

How can I detect the case (lowercase, UPPERCASE, CamelCase [, maybe WhATevERcAse]) of a string to apply to another one?
I would like to do it as a one-liner with sed or any other tool.
This is used for a spell checker which proposes corrections.
Let's say I get something like string_to_fix:correction:
BEHAVIOUR:behavior => get BEHAVIOUR:BEHAVIOR
Behaviour:behavior => get Behaviour:Behavior
behaviour:behavior => remains behaviour:behavior
Extra case to be handled:
MySpecalCase:myspecialcase => MySpecalCase:MySpecialCase (so the characters, not their positions in the word, should be the point of reference)
With awk you can use the POSIX character classes to detect case:
$ cat case.awk
/^[[:lower:]]+$/ { print "lower"; next }
/^[[:upper:]]+$/ { print "upper"; next }
/^[[:upper:]][[:lower:]]+$/ { print "capitalized"; next }
/^[[:alpha:]]+$/ { print "mixed case"; next }
{ print "non alphabetic" }
Jims-MacBook-Air so $ echo chihuahua | awk -f case.awk
lower
Jims-MacBook-Air so $ echo WOLFHOUND | awk -f case.awk
upper
Jims-MacBook-Air so $ echo London | awk -f case.awk
capitalized
Jims-MacBook-Air so $ echo LaTeX | awk -f case.awk
mixed case
Jims-MacBook-Air so $ echo "Jaws 2" | awk -f case.awk
non alphabetic
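The same checks, in the same order as case.awk, can be sketched with Python's string predicates. One difference to note: `str.isalpha` and friends follow Unicode rather than the current locale, so results for non-ASCII letters may differ from awk's POSIX classes. `classify_case` is a hypothetical name.

```python
def classify_case(s: str) -> str:
    if not s.isalpha():
        return "non alphabetic"  # digits, spaces, punctuation, or the empty string
    if s.islower():
        return "lower"
    if s.isupper():
        return "upper"
    if len(s) > 1 and s[0].isupper() and s[1:].islower():
        return "capitalized"
    return "mixed case"
```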
Here's an example taking two strings and applying the case of the first to the second:
BEGIN { OFS = FS = ":" }
$1 ~ /^[[:lower:]]+$/ { print $1, tolower($2); next }
$1 ~ /^[[:upper:]]+$/ { print $1, toupper($2); next }
$1 ~ /^[[:upper:]][[:lower:]]+$/ { print $1, toupper(substr($2,1,1)) tolower(substr($2,2)); next }
$1 ~ /^[[:alpha:]]+$/ { print $1, $2; next }
{ print $1, $2 }
$ echo BEHAVIOUR:behavior | awk -f case.awk
BEHAVIOUR:BEHAVIOR
$ echo Behaviour:behavior | awk -f case.awk
Behaviour:Behavior
$ echo behaviour:behavior | awk -f case.awk
behaviour:behavior
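A Python sketch of the same "copy the first field's case onto the second" logic follows. It assumes the input is a single `string_to_fix:correction` pair; `apply_case` is a hypothetical name. As in the awk script, mixed-case and non-alphabetic sources leave the correction unchanged.

```python
def apply_case(line: str) -> str:
    source, _, target = line.partition(":")
    if source.isalpha() and source.islower():
        target = target.lower()
    elif source.isalpha() and source.isupper():
        target = target.upper()
    elif len(source) > 1 and source.isalpha() and source[0].isupper() and source[1:].islower():
        # Capitalized: upper-case the first letter, lower-case the rest.
        target = target[:1].upper() + target[1:].lower()
    return f"{source}:{target}"
```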
With GNU sed:
sed -r 's/([A-Z]+):(.*)/\1:\U\2/;s/([A-Z][a-z]+):([a-z])/\1:\U\2\L/' file
Explanations:
s/([A-Z]+):(.*)/\1:\U\2/: match a run of uppercase letters followed by : and, using a backreference and the uppercase modifier \U, convert everything after the : to uppercase
s/([A-Z][a-z]+):([a-z])/\1:\U\2\L/ : match a capitalized word followed by : and, if found, convert the first letter after the : to uppercase
awk -F ':' '
{
# read Pattern to reproduce
Pat = $1
printf("%s:", Pat)
# generic
if ( $1 ~ /^[[:upper:]]+$/) { print toupper( $2); next}
if ( $1 ~ /^[[:lower:]]+$/) { print tolower( $2); next}
# Specific
gsub( /[^[:upper:][:lower:]]/, "~:", Pat)
gsub( /[[:upper:]]/, "U:", Pat)
gsub( /[[:lower:]]/, "l:", Pat)
LengPat = split( Pat, aDir, /:/)
# print using the corresponding pattern
LenSec = length( $2)
for( i = 1; i <= LenSec; i++ ) {
ThisChar = substr( $2, i, 1)
Dir = aDir[ (( i - 1) % LengPat + 1)]
if ( Dir == "U" ) printf( "%s", toupper( ThisChar))
else if ( Dir == "l" ) printf( "%s", tolower( ThisChar))
else printf( "%s", ThisChar)
}
printf( "\n")
}' YourFile
This handles any case pattern (and uses the same idea as @Jas for the quick all-upper or all-lower check).
It works only for this structure (fields separated by :).
The second part (the text) can be longer than the first; the pattern is then reused cyclically.
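The cyclic per-character idea from the awk script can be sketched in Python as follows. Like the awk version, it maps case by position, not by matching characters. The guard for an empty first field is an added assumption (the awk script would divide by zero there). `apply_case_pattern` is a hypothetical name.

```python
def apply_case_pattern(source: str, target: str) -> str:
    # Quick paths, as in the awk script.
    if source and all(c.isupper() for c in source):
        return target.upper()
    if source and all(c.islower() for c in source):
        return target.lower()
    if not source:
        return target  # assumption: nothing to copy, keep the text unchanged
    out = []
    for i, ch in enumerate(target):
        ref = source[i % len(source)]  # reuse the pattern cyclically
        if ref.isupper():
            out.append(ch.upper())
        elif ref.islower():
            out.append(ch.lower())
        else:
            out.append(ch)  # non-letter in the pattern: keep the character as-is
    return "".join(out)
```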
This might work for you (GNU sed):
sed -r '/^([^:]*):\1$/Is//\1:\1/' file
This uses the I flag to do a caseless match and then replaces both sides with the first one, so it only changes lines where both sides are the same word apart from case.
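The same caseless-backreference trick works in Python: with `re.IGNORECASE`, the `\1` backreference also matches case-insensitively. This is a minimal sketch; `SAME_WORD` and `copy_if_same_word` are hypothetical names.

```python
import re

# Whole line: a field, a colon, then the same field again (ignoring case).
SAME_WORD = re.compile(r"([^:]*):\1", re.IGNORECASE)


def copy_if_same_word(line: str) -> str:
    m = SAME_WORD.fullmatch(line)
    # Only rewrite when both sides are the same word apart from case.
    return f"{m.group(1)}:{m.group(1)}" if m else line
```

As with the sed version, a pair whose spelling differs (BEHAVIOUR versus behavior) is left untouched.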

Powershell regex replacement expressions

I've followed the excellent solution in this article:
PowerShell multiple string replacement efficiency
to try and normalize telephone numbers imported from Active Directory. Here is an example:
$telephoneNumbers = @(
'+61 2 90237534',
'04 2356 3713'
'(02) 4275 7954'
'61 (0) 3 9635 7899'
'+65 6535 1943'
)
# Build hashtable of search and replace values.
$replacements = @{
' ' = ''
'(0)' = ''
'+61' = '0'
'(02)' = '02'
'+65' = '001165'
'61 (0)' = '0'
}
# Join all (escaped) keys from the hashtable into one regular expression.
[regex]$r = @($replacements.Keys | foreach { [regex]::Escape( $_ ) }) -join '|'
[scriptblock]$matchEval = { param( [Text.RegularExpressions.Match]$matchInfo )
# Return replacement value for each matched value.
$matchedValue = $matchInfo.Groups[0].Value
$replacements[$matchedValue]
}
# Perform replace over every line in the file and append to log.
$telephoneNumbers |
foreach {$r.Replace($_,$matchEval)}
I'm having problems with the formatting of the match expressions in the $replacements hashtable. For example, I would like to match all +61 numbers and replace with 0, and match all other + numbers and replace with 0011.
I've tried the following regular expressions but they don't seem to match:
'^+61'
'^+[^61]'
What am I doing wrong? I've tried using \ as an escape character.
I've rearranged this a bit. I'm not sure it works for your whole situation, but it gives the right results for the example.
I think the key is not to try and create one big regex from the hashtable, but rather to loop over it and check the values in it against the telephone numbers.
The only other change I made was moving the ' ' = '' replacement from the hashtable into the code that prints the replacement phone number, because you want it to run in every scenario.
Code is below:
$telephoneNumbers = @(
'+61 2 90237534',
'04 2356 3713'
'(02) 4275 7954'
'61 (0) 3 9635 7899'
'+65 6535 1943'
)
$replacements = @{
'(0)' = ''
'+61' = '0'
'(02)' = '02'
'+65' = '001165'
}
foreach ($t in $telephoneNumbers) {
$m = $false
foreach($r in $replacements.getEnumerator()) {
if ( $t -match [regex]::Escape($r.key) ) {
$m = $true
$t -replace [regex]::Escape($r.key), $r.value -replace ' ', '' | write-output
}
}
if (!$m) { $t -replace ' ', '' | write-output }
}
Gives:
0290237534
0423563713
0242757954
61396357899
00116565351943
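The patterns in the question fail for two reasons. First, '^+61' treats + as a quantifier, so it has to be written '^\+61'. Second, '[^61]' is a character class matching one character that is neither 6 nor 1, not "anything except 61". One regex-based alternative is an ordered list of (pattern, replacement) rules, where a later, broader rule such as '^\+' only sees numbers that the earlier '^\+61' rule left alone. The Python sketch below follows that idea. The rule list is an assumption modelled on the question's hashtable (including its '61 (0)' = '0' entry), not a complete numbering plan, and `RULES` and `normalize` are hypothetical names.

```python
import re

# Ordered rules: each one sees the output of the previous one.
RULES = [
    (r"\s+", ""),        # drop all whitespace first
    (r"^61\(0\)", "0"),  # "61 (0) 3 ..." -> "03..." (the question's '61 (0)' = '0')
    (r"\(0\)", ""),      # any other "(0)" trunk marker is dropped
    (r"[()]", ""),       # "(02)" -> "02"
    (r"^\+61", "0"),     # escaped \+ : Australian numbers get a leading 0
    (r"^\+", "0011"),    # any other + becomes the 0011 international prefix
]


def normalize(number: str) -> str:
    for pattern, replacement in RULES:
        number = re.sub(pattern, replacement, number)
    return number
```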