Insert a space between two consecutive uppercase letters - regex

I need to know how to insert a space between two consecutive uppercase letters.
I have a large list of customers with first name, middle name and last name. GaryACloud should be split as Gary A Cloud. I used (.)([A-Z]) and replaced it with \1 \2, but I have no clue what it means, so if anyone can explain I will be really grateful. The above gave me only partial output: I got Gary ACloud. How do I put a space before every uppercase letter? If you can also explain the solution, it will be very helpful.

You can match:
"([A-Z])(?=[A-Z])"
And replace with:
"\1 "

var input = "CategoryName";
var result = Regex.Replace(input, "([a-z])([A-Z])", @"$1 $2"); // Category Name
UPDATE (this treats a sequence of capital letters as one word):
var input = "SimpleHTTPRequest";
var result = Regex.Replace(input, "([a-z]|[A-Z]{2,})([A-Z])", @"$1 $2");
// Simple HTTP Request
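To make the difference between these patterns concrete, here is a minimal Python sketch (Python is an assumption; the thread itself uses C#). It runs the pattern from the question, the lookahead pattern, and the acronym-aware pattern. The `(?<!^)(?=[A-Z])` variant is not from the answers above; it is one possible way to put a space before every capital except the first.

```python
import re

name = "GaryACloud"

# (.) captures any one character and ([A-Z]) the capital after it.
# Both characters are consumed, so after "yA" is matched the "A" is no
# longer available as the character in front of "C".
partial = re.sub(r"(.)([A-Z])", r"\1 \2", name)
print(partial)  # Gary ACloud

# (?=[A-Z]) is a lookahead: it checks the next character without consuming it.
fixed = re.sub(r"([A-Z])(?=[A-Z])", r"\1 ", partial)
print(fixed)  # Gary A Cloud

# Assumed alternative: a zero-width match before every capital that is not
# at the start of the string. Nothing is consumed, so no capital is skipped.
every_capital = re.sub(r"(?<!^)(?=[A-Z])", " ", name)
print(every_capital)  # Gary A Cloud

# Acronym-aware pattern: a run of two or more capitals stays together.
acronym = re.sub(r"([a-z]|[A-Z]{2,})([A-Z])", r"\1 \2", "SimpleHTTPRequest")
print(acronym)  # Simple HTTP Request
```

Note that the zero-width variant would also split acronyms letter by letter, so which pattern fits depends on the data.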

Related

RegEx for matching a string after a string up to a comma

Here is a sample string.
"BLAH, blah, going to the store &^5, light Version 12.7(2)L6, anyway
plus other stuff Version 3.3.4.6. Then goes on an on for several lines..."
I want to capture only the first version number, without the word "Version" if possible, and without the periods and parentheses. The match should stop at the first comma. The result would be:
"1272L6"
I don't want it to include other instances of version in the text. Can this be done?
I've tried (?<=version)[^,]* but I know it neither removes the periods and parentheses nor excludes the subsequent versions.
This exact regex may not be the best solution, but it might help you get 1272L6:
([0-9]{2})\.([0-9]{1})\(([0-9]{1})\)([A-Z]{1}[0-9]{1})
It creates four groups (where $1$2$3$4 is your target 1272L6) and skips the ., ( and ) characters.
You can change {1} to another repetition count, such as {1,2}.
Assuming the version number has a fixed format but the specific digits and letters vary, you could do this:
String s = "this is a test 12.7(2)L6, 13.7(2)L6, 14.7(2)L6";
String reg = "(\\d\\d\\.\\d\\(\\d\\)[A-Z]\\d),";
Matcher m = Pattern.compile(reg).matcher(s);
if (m.find()) { // find() returns the first match only
    System.out.println(m.group(1).replaceAll("[.()]", ""));
}
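If the version format is not fixed, another option is to capture everything between the first "Version" and the next comma, then strip the punctuation in a second step. The following is a rough Python sketch of that idea; the function name `first_version` is hypothetical, and it assumes the word "Version" always precedes the number.

```python
import re

text = ("BLAH, blah, going to the store &^5, light Version 12.7(2)L6, anyway "
        "plus other stuff Version 3.3.4.6. Then goes on an on for several lines...")

def first_version(s):
    # re.search returns only the first match, so later "Version" entries are ignored.
    m = re.search(r"Version\s+([^,]*)", s, flags=re.IGNORECASE)
    if m is None:
        return None
    # Remove periods and parentheses from the captured text.
    return re.sub(r"[.()]", "", m.group(1))

print(first_version(text))  # 1272L6
```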

Regex Split: Split column into name, percentage and solvent

Looking for a regex that can split expressions like:
A-6-b 10/%XYZ
into:
A-6-b
10%
/XYZ
Note that the first group can also contain spaces and numbers:
AQDF 100 56%/ABC
and percentage can be a float:
SFSDF 0.1%/ABC
I've come up with (^[A-Z\s\d-]*)(?!%)(\d+%)(.*$) but this does not match any percentages that are floats and, more importantly, even simple examples like ABC 10%/XYZ fail because the first digit of the percentage is assigned to the first capturing group.
Any idea how I can achieve what I want? I'm not a regex expert...
EDIT: fixed errors in example
EDIT2:
The examples are not complete. Here one more:
ABC Dwsd 0.01%/XYZ QST
First part can contain spaces
Last Part can contain spaces
number can be a float
Super simple:
/^(.*) ([0-9]+(?:\.[0-9]+)?%)(.*)$/
The most easily identifiable item is your percentage, so the ([0-9]+(?:\.[0-9]+)?%) part deals with finding that.
Then it's simply a case of taking everything before it (excluding the final space) as the name, and everything after it as the solvent.
Done.
Don't overcomplicate this by using one unreadable regex.
Based on what you've said, your separators are well defined (the last space and the last %). In JavaScript, for example, you could use:
var str = "A-6-b 10/%XYZ";
var firstSeparator = str.lastIndexOf(' ');
var secondSeparator = str.lastIndexOf('%');
var name = str.substring(0, firstSeparator);
var percentage = str.substring(firstSeparator + 1, secondSeparator + 1); // we want to include the % separator in this one
var solvent = str.substring(secondSeparator + 1);
console.log(name, percentage, solvent);
Working JSFiddle: http://jsfiddle.net/rL5uymhm/
(There may be a typo in your question, as your examples differ on where the / symbol appears. So the code may need tweaking. My point still stands – don't use a regex for the sake of it when there is a more readable alternative.)
If you really want to use a regex, /^(.+ )([^%]+%)(.*)$/ should work.
I tried this; let me know in a comment if you have any problems.
((?:(?!\s*[0-9]*\/%).)*)\s*([\d\/%]*)\s*(.*)
SEE DEMO : http://regex101.com/r/lL8oN4/1
This one works for me (using PCRE):
/^(.+) ([0-9.]+)[\/%]+([^\/]+)$/
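As a rough check of the "anchor on the percentage" idea against all the examples in the question, here is a small Python sketch. It assumes the percent sign comes before the slash (as in "10%/XYZ"), and the helper name `split_entry` is made up for illustration.

```python
import re

# Group 1: name (may contain spaces and digits, ends in a non-space).
# Group 2: percentage, an integer or a float followed by %.
# Group 3: solvent, the rest of the line (may contain spaces).
PATTERN = re.compile(r"^(.*\S)\s+(\d+(?:\.\d+)?%)(.*)$")

def split_entry(entry):
    m = PATTERN.match(entry)
    return m.groups() if m else None

for sample in ["ABC 10%/XYZ", "AQDF 100 56%/ABC",
               "SFSDF 0.1%/ABC", "ABC Dwsd 0.01%/XYZ QST"]:
    print(split_entry(sample))
```

Because the percentage is the only part that must end in %, the digits in a name such as "AQDF 100" stay in the first group.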

Extract root, month letter-year and yellow key from a Bloomberg futures ticker

A Bloomberg futures ticker usually looks like:
MCDZ3 Curncy
where the root is MCD, the month letter and year is Z3 and the 'yellow key' is Curncy.
Note that the root can be of variable length, 2-4 letters or 1 letter and 1 whitespace (e.g. S H4 Comdty).
The letter-year part allows only the month letters listed in expr below, and the year can have one or two digits.
Finally the yellow key can be one of several security type strings but I am interested in (Curncy|Equity|Index|Comdty) only.
In Matlab I have the following regular expression
expr = '[FGHJKMNQUVXZ]\d{1,2} ';
[rootyk, monthyear] = regexpi(bbergtickers, expr,'split','match','once');
where
rootyk{:}
ans =
'mcd' 'curncy'
and
monthyear =
'z3 '
I don't want to match the ' ' (space) in monthyear. How can I do that?
Assuming there are no leading or trailing whitespaces and only uppercase letters in the root, this should work:
^([A-Z]{2,4}|[A-Z]\s)([FGHJKMNQUVXZ]\d{1,2}) (Curncy|Equity|Index|Comdty)$
You've got root in the first group, letter-year in the second, yellow key in the third.
I don't know Matlab, nor whether it supports Perl-compatible regex. If it fails, try e.g. a literal space instead of \s. Also, drop the ^...$ if you'd like to extract from a bigger source text.
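For readers who want to try the three-group pattern outside Matlab, here is a minimal Python sketch using the same expression. The function name `parse_ticker` is hypothetical, and stripping the root is an added assumption to handle one-letter roots such as "S ".

```python
import re

TICKER = re.compile(
    r"^([A-Z]{2,4}|[A-Z]\s)"          # root: 2-4 letters, or 1 letter + whitespace
    r"([FGHJKMNQUVXZ]\d{1,2}) "       # month letter and 1-2 digit year
    r"(Curncy|Equity|Index|Comdty)$"  # yellow key
)

def parse_ticker(ticker):
    m = TICKER.match(ticker)
    if m is None:
        return None
    root, month_year, yellow_key = m.groups()
    # The one-letter alternative captures a trailing space; remove it.
    return root.strip(), month_year, yellow_key

print(parse_ticker("MCDZ3 Curncy"))  # ('MCD', 'Z3', 'Curncy')
print(parse_ticker("S H4 Comdty"))   # ('S', 'H4', 'Comdty')
```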
The expression you're feeding regexpi with contains a space and is used as a pattern for 'match'. This is why the matched monthyear string also has a space [1].
If you want to keep it simple and let regexpi do the work for you (instead of postprocessing its output), try a different approach and capture tokens instead of matching, and ignore the intermediate space:
%// <$1><----------$2---------> <$3>
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2}) (.+)';
tickinfo = regexpi(bbergtickers, expr, 'tokens', 'once');
You can also simplify the expression to a more generic '(.+)(\w{1}\d{1,2})\s+(.+)', if you wish.
Example
bbergtickers = 'MCDZ3 Curncy';
expr = '(.+)([FGHJKMNQUVXZ]\d{1,2})\s+(.+)';
tickinfo = regexpi(bbergtickers, expr, 'tokens', 'once');
The result is:
tickinfo =
'MCD'
'Z3'
'Curncy'
[1] This expression is also used as a delimiter for 'split'. Removing the trailing space from it won't help, as it will reappear in the rootyk output instead.
Assuming you just want to get rid of the leading and/or trailing spaces, there is a very simple command for that:
monthyear = strtrim(monthyear)
For removing all spaces, you can do:
monthyear(isspace(monthyear))=[]
Here is a completely different approach: find the year digits and take the letter just before them:
s = 'MCDZ3 Curncy'
p = regexp(s,'\d')      % indices of the digits in s
s(min(p)-1:max(p))      % month letter plus the year digits

Capitalize first letter of words in a string

I'm having trouble figuring out how to transform a string into camel case in Groovy. Say I start out with a string that looks like "1-800 FOO.BAR". Ultimately, I want this to turn into "1800FooDotBar". I've been able to get 1800FOODotBar by doing the following:
String str = "1-800 FOO.BAR"
String tempStr = str.replaceAll(/(?i)\.com/, "DotCom")
String newStr = tempStr.replaceAll(/\\W/, "")
I'm just not sure how to get rid of those capital letters in the middle. I've come across some information about a capitalize() method that should be able to help, but I'm just not familiar enough with Groovy to know how to use it. I think I need to split the string into individual strings for each word and then capitalize the first letter of each of those strings, but then how do I build the end result back up? I know that similar questions have been asked, but I'm just not seeing how to take that information and make complete Groovy code from it. Thanks in advance!
Very roughly:
String str = "1-800 FOO.BAR"
println str.replaceAll(/\./, " Dot ").split(/[^\w]/).collect { it.toLowerCase().capitalize() }.join("")
=> 1800FooDotBar
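The same split, capitalize, join sequence can be sketched in Python for comparison (Python is an assumption here, and `to_camel` is a made-up name). Like Groovy's capitalize() combined with toLowerCase(), Python's `str.capitalize()` uppercases the first character and lowercases the rest.

```python
import re

def to_camel(text):
    # Spell out the dot as its own word before splitting.
    spelled = text.replace(".", " Dot ")
    # Split on runs of non-word characters and drop empty pieces.
    words = [w for w in re.split(r"\W+", spelled) if w]
    # Capitalize each word, then glue the words back together.
    return "".join(w.capitalize() for w in words)

print(to_camel("1-800 FOO.BAR"))  # 1800FooDotBar
```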

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that do not contain only digits? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the first four, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
This uses a negative lookahead to reject lines that consist entirely of digits, and then matches the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
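A quick way to convince yourself that a single non-digit search is enough is to run it over the sample strings. This is a minimal Python sketch; the helper name `has_non_digit` is hypothetical.

```python
import re

def has_non_digit(s):
    # re.search scans the whole string; one \D anywhere is enough.
    return re.search(r"\D", s) is not None

samples = ["abc", "a4c", "4bc", "ab4", "123"]
print([s for s in samples if has_non_digit(s)])  # ['abc', 'a4c', '4bc', 'ab4']
```

Note that this tests for the presence of a non-digit rather than capturing the whole string; if the full line is needed as the match, one of the anchored patterns is the better fit.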
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
@Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non letters (eg, punctuation) even though Svish's examples didn't include one. Svish's primary requirement was: not all be digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (invoking reference to alphanum and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
Optional digits at the beginning, then at least one letter, then letters or digits.
Try this:
/^.*\D+.*$/
It matches if the string contains any symbol that is not a digit. It should work in most regex flavors.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
If you are trying to match words that contain at least one letter and consist of digits and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we want to restrict valid characters so that string can be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
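To illustrate how the lookahead and the character class work together, here is a short Python sketch using the first of the two patterns. The name `is_valid` and the test strings are made up for the example.

```python
import re

# (?!^\d+$) rejects strings made only of digits;
# ^[a-zA-Z0-9_-]{3,}$ then requires at least 3 characters from the allowed set.
ALLOWED = re.compile(r"(?!^\d+$)^[a-zA-Z0-9_-]{3,}$")

def is_valid(s):
    return ALLOWED.match(s) is not None

for candidate in ["abc", "a4c", "123", "ab", "a b c", "12-"]:
    print(candidate, is_valid(candidate))
```

Here "123" fails the lookahead, "ab" is too short, and "a b c" contains a character outside the allowed set.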
/\w+/:
Matches any letter, digit or underscore, i.e. any word character.
.*[^0-9]{1,}.*
This works fine for us.
We wanted to use the accepted answer, but it does not work within a YANG model.
The one I provided here is clear and easy to understand:
the start and end can be any characters, but there must be at least one non-numeric character, which is what matters.
I am using /^[0-9]*$/gm in my JavaScript code to check whether a string contains only digits. If it does, the check fails; otherwise the string is returned.
Below is a working code snippet with test cases:
function isValidURL(string) {
    // match() returns null when the string is not made up only of digits
    var res = string.match(/^[0-9]*$/gm);
    if (res == null)
        return string;
    else
        return "fail";
};
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123";
console.log(isValidURL(testCase5)); // fail
I had to do something similar in MySQL and the following, whilst oversimplified, seems to have worked for me:
where fieldname regexp '^[a-zA-Z0-9]+$'
and fieldname NOT REGEXP '^[0-9]+$'
This shows all fields that are alphabetical and alphanumeric but any fields that are just numeric are hidden. This seems to work.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed