Capitalize first letter of words in a string - regex

I'm having trouble figuring out how to transform a string into camel case in Groovy. Say I start out with a string that looks like "1-800 FOO.BAR". Ultimately, I want this to turn into "1800FooDotBar". I've been able to get 1800FOODotBAR by doing the following:
String str = "1-800 FOO.BAR"
String tempStr = str.replaceAll(/\./, "Dot")
String newStr = tempStr.replaceAll(/\W/, "")
I'm just not sure how to get rid of those capital letters in the middle. I've come across some information about a capitalize() method that should be able to help, but I'm just not familiar enough with Groovy to know how to use it. I think I need to split the string into individual strings for each word and then capitalize the first letter of each of those strings, but then how do I build the end result back up? I know that similar questions have been asked, but I'm just not seeing how to take that information and make complete Groovy code from it. Thanks in advance!

Very roughly:
String str = "1-800 FOO.BAR"
println str.replaceAll(/\./, " Dot ").split(/[^\w]/).collect { it.toLowerCase().capitalize() }.join("")
=> 1800FooDotBar
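For comparison, here is a rough Python translation of the same pipeline (spell out ".", split on non-word characters, capitalize each piece, join). It is only a sketch of the technique, not Groovy code, and the function name to_camel_case is hypothetical. Python's str.capitalize() upper-cases the first character and lower-cases the rest, so one call covers both toLowerCase() and capitalize().

```python
import re

def to_camel_case(s: str) -> str:
    # Turn "." into its own word, like replaceAll(/\./, " Dot ").
    spaced = s.replace(".", " Dot ")
    # Split on every non-word character, like split(/[^\w]/).
    parts = re.split(r"[^\w]", spaced)
    # capitalize() upper-cases the first character and lower-cases the rest;
    # empty pieces from adjacent separators simply stay empty.
    return "".join(part.capitalize() for part in parts)

print(to_camel_case("1-800 FOO.BAR"))  # 1800FooDotBar
```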

Related

Extracting key-value pairs from a string using ruby & regex

I want to accomplish the following with Ruby and, if possible, a regex:
Input: "something {\"key\":\"value\",\"key2\":3}"
Output: [["\"key\"", "\"value\""], ["\"key2\"", "3"]]
My attempt so far:
s = "something {\"key\":\"value\",\"key2\":3}"
s.scan(/.* {(?:([^:]+):([^,}]+),?)+}$/)
# Output: [["\"key2\"", "3"]]
For some reason the regex above only matches the last key value pair. Does someone know how to retrieve all the pairs?
Just to be clear, "something" can be any kind of string. For this reason, solutions such as (1) splitting the text directly on the equal or (2) a regex as used in s.scan(/(?:([^:]+):([^,}]+),?)/) don't work for me.
I know there are similar questions on SO. Still, from what I saw, they mostly tend towards the solutions 1 & 2 or focus on a single key value pair.
Your string looks like a JSON data structure encoded as a string. You can use JSON.parse for this as long as you remove the leading "something " from the string:
require 'json'
string = "something {\"key\":\"value\",\"key2\":3}"
# drop everything before the first "{" (the prefix can be any string)
string = string[string.index("{")..-1]
x = JSON.parse(string)
puts x["key"]
puts x["key2"]
You can then convert the hash to an array if required, for example with x.to_a.
Alternatively, if you want to use regular expressions, try:
string.scan(/(?:"(\w+)":"?(\w+)"?)/)
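The "for some reason" in the question comes from how repeated groups capture: a capturing group inside (...)+ keeps only its last iteration, in Python's re just as in Ruby. The sketch below, written in Python rather than Ruby, shows that behaviour and then the two approaches from the answer. The helper name parse_pairs is hypothetical, and the regex variant assumes keys and values are plain word characters.

```python
import json
import re

text = 'something {"key":"value","key2":3}'

# A capture group inside a repeated group (...)+ keeps only its LAST
# iteration, which is why the question's pattern returns just key2.
last_only = re.search(r'\{(?:([^:]+):([^,}]+),?)+\}', text).groups()
print(last_only)  # ('"key2"', '3')

def parse_pairs(s: str) -> list:
    # Drop the arbitrary prefix: everything before the first "{".
    payload = s[s.index("{"):]
    # Parse the rest as JSON and turn the dict into [key, value] pairs.
    return [[k, v] for k, v in json.loads(payload).items()]

print(parse_pairs(text))  # [['key', 'value'], ['key2', 3]]

# Regex alternative: match each pair separately instead of repeating a group.
pairs_re = re.findall(r'"(\w+)":"?(\w+)"?', text)
print(pairs_re)  # [('key', 'value'), ('key2', '3')]
```

Note that the JSON route keeps 3 as a number, while the regex route returns every value as a string.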

Remove everything except numbers and alphabets from a string using google sheet or excel formulas

I have searched, but I only found Python and related solutions.
I have a string like
"Hello 'how' are % you?"
which I want to convert to the following by removing everything except numbers and letters:
Hello how are you
I am using REGEXREPLACE as follows, but I'm not sure what the replacement should be or whether it's the right approach:
=REGEXREPLACE(B2 , "([^A-Za-z0-9]+)")
The main things I want to remove from the string are characters like " and other strange symbols.
Can anyone help?
You can use:
=TRIM(REGEXREPLACE(B2,"[\W_]+"," "))
Or, include the space in your character class:
=REGEXREPLACE(B2,"[\W_ ]+"," ")
Here \W is short for [^A-Za-z0-9_], so to also match the underscore we added it to the character class.
You can also use:
=TRIM(REGEXREPLACE(A1, "'|%|""", ))
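To see what the first formula does step by step, here is a Python sketch of TRIM(REGEXREPLACE(B2,"[\W_]+"," ")); the function name keep_alnum_words is hypothetical. Google Sheets uses the RE2 engine, where \w is ASCII-only ([0-9A-Za-z_]); Python 3's \w also matches non-ASCII letters by default, so re.ASCII is passed to mimic RE2.

```python
import re

def keep_alnum_words(s: str) -> str:
    # [\W_]+ matches each run of characters that are not ASCII letters or digits
    # (\W leaves out "_", so the underscore is added explicitly).
    # Each run becomes a single space; strip() plays the role of TRIM.
    return re.sub(r"[\W_]+", " ", s, flags=re.ASCII).strip()

print(keep_alnum_words("Hello 'how' are % you?"))  # Hello how are you
```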

Remove all text from string after a sequence of words in Scala

I am trying to assemble a UDF in Scala that takes a column from a data frame and manipulates it to remove HTML and other useless pieces of text.
The column I need to modify is very messy: sometimes there is HTML, sometimes there is not. Searching SO, I found a regex solution to remove HTML.
What I'd like to accomplish now is a regex that finds a specific word in the text and deletes all the text after that word.
From this SO answer I understand that the regex should be something like \).* if you want to remove everything after ), so I am trying to adapt it to my case, so far unsuccessfully because of my limited knowledge of regex.
I have strings like:
I am interested to hear from you, thanks Sent from iPhone other stuff I want to delete....
I'd like to retain the first part of the string up to, but excluding, "Sent from", so a perfect output would be:
I am interested to hear from you, thanks
What I have so far is something like:
val toStringNoHTML = udf[String, String](_.toString
// code from SO as linked above
.replaceAll("""<(?!\/?a(?=>|\s.*>))\/?.*?>""", " ")
// delete all text after key word
.replaceAll("""'Sent from'.*""", "")
// remove all punctuation
.replaceAll("""[\p{Punct}\n]""", " ")
)
While the HTML gets removed, "Sent from" and all the text after it do not. Any hint on how to adjust the regex to make it work?
EDIT
As pointed out in the comments, a small typo prevented my code from working. Thanks for the help:
.replaceAll("""'Sent from'.*""", "")
should be
.replaceAll("""Sent from.*""", "")
Instead of doing multiple replaceAll(pattern, blank) calls, I'd be tempted to start with an extraction.
val msgRE = "(.*>)?(.*)Sent from.*".r
val result = udfStr match {
case msgRE(_, msg) => Some(msg.trim) // .replaceAll() can be added here
case _ => None
}
Here the result is an Option[String], but that really depends on how you want to handle non-matching input.
If more cleaning is needed after the extraction then replaceAll() can be added where indicated (or the extraction pattern can be better refined).
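The extraction idea translates directly to Python, sketched below with the same pattern; extract_message is a hypothetical name and None stands in for Scala's None. Scala's regex extractor only succeeds when the whole string matches, so fullmatch is used here. In both versions . does not cross line breaks, so multi-line messages would need a DOTALL flag ((?s) in Scala).

```python
import re
from typing import Optional

# Same pattern as msgRE: an optional prefix ending in ">",
# the message itself, then "Sent from" and everything after it.
MSG_RE = re.compile(r"(.*>)?(.*)Sent from.*")

def extract_message(text: str) -> Optional[str]:
    # fullmatch mirrors the Scala extractor, which must match the whole string.
    m = MSG_RE.fullmatch(text)
    # Non-matching input becomes None, like the Option[String] result.
    return m.group(2).strip() if m else None

print(extract_message("I am interested to hear from you, thanks Sent from iPhone other stuff"))
# I am interested to hear from you, thanks
print(extract_message("no signature here"))  # None
```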

Parsing a string of data but leaving out quotes

I need to use RegEx to run through a string of text but only return the parts that I need. For example, say the string is as follows:
1234,Weapon Types,100,Handgun,"This is the text, "and", that is all."""
\d*,Weapon Types,(\d*),(\w+), gets me most of the way, but the last part is where I'm having trouble. Is there a way for me to capture the rest of the string, i.e.
"This is the text, "and", that is all."""
without picking up the surrounding quotes? I've tried negating them, but then the match just stops at the first quote.
Please keep in mind that the text for this string is unknown so doing literal matches will not work.
You've given us something very difficult to solve. Nested commas inside your string are fine: once we come across a double-quote, we can ignore everything until the closing quote, which gobbles up the commas.
But how will your parser know that the next double-quote isn't ending the string? How does it know that it is a nested double-quote?
If I could slightly modify your input string to make it clear what is a nested quote, then parsing is easy...
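using System.Text.RegularExpressions;
using System.Windows.Forms;
// Nested quotes changed to single quotes so they can't be mistaken for the closing quote
var txt = "1234,Weapon Types,100,Handgun,\"This is the text, 'and', that is all.\",other stuff";
var m = Regex.Match(txt, @"^\d*,Weapon Types,(\d*),(\w+),""([^""]+)""");
MessageBox.Show(m.Groups[3].Value);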
But if your input string must have nested quotes like that, then we must come up with some other rule for detecting what is the real end of the string. How about this?
var txt = "1234,Weapon Types,100,Handgun,\"This is the text, \"and\", that is all.\",other stuff";
var m = Regex.Match(txt, @"^\d*,Weapon Types,(\d*),(\w+),""(.+)"",");
MessageBox.Show(m.Groups[3].Value);
The result is...
This is the text, "and", that is all.
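The second rule works because (.+) is greedy: it backtracks only as far as the last `",` on the line. A Python sketch of the same pattern (not C#) makes the difference from a lazy (.+?) visible; like the answer, it assumes the quoted field is always followed by `",`.

```python
import re

txt = '1234,Weapon Types,100,Handgun,"This is the text, "and", that is all.",other stuff'

# Greedy (.+) backtracks only as far as the LAST '",' on the line,
# so the nested "and" quotes stay inside the captured text.
greedy = re.match(r'^\d*,Weapon Types,(\d*),(\w+),"(.+)",', txt).group(3)

# Lazy (.+?) stops at the FIRST '",', which cuts the field short.
lazy = re.match(r'^\d*,Weapon Types,(\d*),(\w+),"(.+?)",', txt).group(3)

print(greedy)  # This is the text, "and", that is all.
print(lazy)    # This is the text, "and
```

If the field can be the last one on the line, with no `",` after it, the greedy pattern would need an alternative ending such as `"$`.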

How to get the count of spaces before the string starts

How do you get the number of spaces before a string starts in ColdFusion?
I mean, I have a string like this "   Hello World!"
I want to get the count of spaces (in this case 3) before the word "Hello" starts.
I'm not too familiar with ColdFusion, but based on its Len and LTrim string functions you should be able to get the result you want with:
Len(str) - Len(LTrim(str))
But maybe there is a better solution :)
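The length-difference trick is language-agnostic. Here is a Python sketch of it, plus an anchored-regex alternative; both function names are hypothetical. This version counts only the space character. Calling lstrip() with no argument would strip all leading whitespace instead.

```python
import re

def leading_spaces(s: str) -> int:
    # Same idea as Len(str) - Len(LTrim(str)): compare the length
    # before and after removing the leading spaces.
    return len(s) - len(s.lstrip(" "))

def leading_spaces_re(s: str) -> int:
    # Alternative: measure the run of spaces anchored at the start
    # (" *" always matches, possibly as an empty string).
    return len(re.match(r" *", s).group())

print(leading_spaces("   Hello World!"))     # 3
print(leading_spaces_re("   Hello World!"))  # 3
```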