"Dave&#39 s Market" to "Daves Market" - regex

I have some strings like "Dave&#39 s Market" or "C&#39 est la vie" I would like to convert to "Daves Market" and "Cest la vie" respectively. I know it is something like '[&#39]+' but I cannot get the optional " s" to be just "s".

The regex substitution s/&#39 //g should work, see this demo.

What you'd rather want is something like replacing all those escaped literals with what they should actually be, e.g.:
Dave's -> Dave's
Me & Her -> Me & Her
Then you'll have to use some kind of replacement code and regex.
An example(in JavaScript):
var m = new Map();
m.set("'", "'");
m.set("&", "&");
// and so on
m.forEach(function(value, key) {
// text contains your text
text = text.replace(new RegExp(key), value);
}

Related

Ruby regex replace substring with hash pattern

I have to replace strings in a hash. I have:
hash = {"{STAY_ID}"=>"30030303", "{USER_NAME}"=>"test"}
And I have to replace it here:
str = "www.domain.com?person={STAY_ID}&user={USER_NAME}"
#=> www.domain.com?person=30030303&user=test
Also, it should work when there is a string with at least one match:
str = "www.domain.com?person={STAY_ID}"
#=> www.domain.com?person=30030303
I need some method/solution that can handle any situation like above.
Something great about the gsub method is that it can actually take a hash of mappings as the second argument, which it then uses to replace matched values. Therefore, if you regex any text between curly braces, you can do something like this.
str = "www.domain.com?person={STAY_ID}&user={USER_NAME}"
hash = {
"{STAY_ID}"=>"30030303",
"{USER_NAME}"=>"test"
}
str.gsub(/{(.*?)}/, hash) #www.domain.com?person=30030303&user=test
And then ya done!
I think, regex is not readable solution. You can use simple gsub method:
str = "www.domain.com?person={STAY_ID}&user={USER_NAME}"
hash = {"{STAY_ID}"=>"30030303", "{USER_NAME}"=>"test"}
result_str = hash.inject(str.dup) do |acc, (key, value)|
acc = acc.gsub(key, value)
end
result_str # www.domain.com?person=30030303&user=test

Remove text between two tags

I'm trying to remove some text between two tags [ & ]
[13:00:00]
I want to remove 13:00:00 from [] tags.
This number is not the same any time.
Its always a time of the day so, only Integer and : symbols.
Someone can help me?
UPDATE:
I forgot to say something. The time (13:00:00) was picked from a log file. Looks like that:
[10:56:49] [Client thread/ERROR]: Item entity 26367127 has no item?!
[10:57:25] [Dbutant] misterflo13 : ils coute chere les enchent aura de feu et T2 du spawn??*
[10:57:35] [Amateur] firebow ?.SkyLegend.? : ouai 0
[10:57:38] [Novice] iPasteque : ils sont gratuit me
[10:57:41] [Novice] iPasteque : ils sont gratuit mec *
[10:57:46] [Dbutant] misterflo13 : on ma dit k'ils etait payent :o
[10:57:57] [Novice] iPasteque : on t'a mytho alors
Ignore the other text I juste want to remove the time between [ & ] (need to looks like []. The time between [ & ] is updated every second.
It looks like your log has specific format. And you seem want to get rid of the time and keep all other information. Ok - read in comments
I didn't test it but it should work
' Read log
Dim logLines() As String = File.ReadAllLines("File_path")
If logLines.Length = 0 Then Return
' prepare array to fill sliced data
Dim lines(logLines.Length - 1) As String
For i As Integer = 0 To logLines.Count - 1
' just cut off time part and add empty brackets for each line
lines(i) = "[]" & logLines(i).Substring(10)
Next
What you see above - if you know that your file comes in certain format, just use position in the string where to cut it off.
Note: Code above can be done in 1 line using LINQ
If you want to actually get the data out of it, use IndexOf. Since you looking for first occurrence of "[" or "]", just use start index "0"
' get position of open bracket in string
Dim openBracketPos As Integer = myString.IndexOf("[", 0, StringComparison.OrdinalIgnoreCase)
' get position of close bracket in string
Dim closeBracketPos As Integer = myString.IndexOf("]", 0, StringComparison.OrdinalIgnoreCase)
' get string between open and close bracket
Dim data As String = myString.Substring(openBracketPos + 1, closeBracketPos - 1)
This is another possibility using Regex:
Public Function ReplaceTime(ByVal Input As String) As String
Dim m As Match = Regex.Match(Input, "(\[)(\d{1,2}\:\d{1,2}(\:\d{1,2})?)(\])(.+)")
Return m.Groups(1).Value & m.Groups(4).Value & m.Groups(5).Value
End Function
It's more of a readability nightmare but it's efficient and it takes only the brackets containing a time value.
I also took the liberty of making it match for example 13:47 as well as 13:47:12.
Test: http://ideone.com/yogWfD
(EDIT) Multiline example:
You can combine this with File.ReadAllLines() (if that's what you prefer) and a For loop to get the replacement done.
Public Function ReplaceTimeMultiline(ByVal TextLines() As String) As String
For x = 0 To TextLines.Length - 1
TextLines(x) = ReplaceTime(TextLines(x))
Next
Return String.Join(Environment.NewLine, TextLines)
End Function
Above code usage:
Dim FinalT As String = ReplaceTimeMultiline(File.ReadAllLines(<file path here>))
Another multiline example:
Public Function ReplaceTimeMultiline(ByVal Input As String) As String
Dim ReturnString As String = ""
Dim Parts() As String = Input.Split(Environment.NewLine)
For x = 0 To Parts.Length - 1
ReturnString &= ReplaceTime(Parts(x)) & If(x < (Parts.Length - 1), Environment.NewLine, "")
Next
Return ReturnString
End Function
Multiline test: http://ideone.com/nKZQHm
If your problem is to remove numeric strings in the format of 99:99:99 that appear inside [], I would do:
//assuming you want to replace the [......] numeric string with an empty []. Should you want to completely remove the tag, just replace with string.Empty
Here's a demo (in C#, not VB, but you get the point (you need the regex, not the syntax anyway)
List<string> list = new List<string>
{
"[13:00:00]",
"[4:5:0]",
"[5d2hu2d]",
"[1:1:1000]",
"[1:00:00]",
"[512341]"
};
string s = string.Join("\n", list);
Console.WriteLine("Original input string:");
Console.WriteLine(s);
Regex r = new Regex(#"\[\d{1,2}?:\d{1,2}?:\d{1,2}?\]");
foreach (Match m in r.Matches(s))
{
Console.WriteLine("{0} is a match.", m.Value);
}
Console.WriteLine();
Console.WriteLine("String with occurrences replaced with an empty string:");
Console.WriteLine(r.Replace(s, string.Empty).Trim());

Google sheet : REGEXREPLACE match everything except a particular pattern

I would try to replace everything inside this string :
[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs
1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567,
1410315029
Except the following numbers :
1303222074
1403281851
1307239335
1403277567
1410315029
I have built a REGEX to match them :
1[0-9]{9}
But I have not figured it out to do the opposite that is everything except all matches ...
google spreadsheet use the Re2 regex engine and doesn't support many usefull features that can help you to do that. So a basic workaround can help you:
match what you want to preserve first and capture it:
pattern: [0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)
replacement: $1 (with a space after)
demo
So probably something like this:
=TRIM(REGEXREPLACE("[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs 1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567, 1410315029"; "[0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)"; "$1 "))
You can also do this with dynamic native functions:
=REGEXEXTRACT(A1,rept("(\d{10}).*",counta(split(regexreplace(A1,"\d{10}","#"),"#"))-1))
basically it is first split by the desired string, to figure out how many occurrences there are of it, then repeats the regex to dynamically create that number of capture groups, thus leaving you in the end with only those values.
First of all thank you Casimir for your help. It gave me an idea that will not be possible with a built-in functions and strong regex lol.
I found out that I can make a homemade function for my own purposes (yes I'm not very "up to date").
It's not very well coded and it returns doublons. But rather than fixing it properly, I use the built in UNIQUE() function on top of if to get rid of them; it's ugly and I'm lazy but it does the job, that is, a list of all matches of on specific regex (which is: 1[0-9]{9}). Here it is:
function ti_extract(input) {
var tab_tis = new Array();
var tab_strings = new Array();
tab_tis.push(input.match(/1[0-9]{9}/)); // get the TI and insert in tab_tis
var string_modif = input.replace(tab_tis[0], " "); // modify source string (remove everything except the TI)
tab_strings.push(string_modif); // insert this new string in the table
var v = 0;
var patt = new RegExp(/1[0-9]{9}/);
var fin = patt.test(tab_strings[v]);
var first_string = tab_strings[v];
do {
first_string = tab_strings[v]; // string 0, or the string with the first removed TI
tab_tis.push(first_string.match(/1[0-9]{9}/)); // analyze the string and get the new TI to put it in the table
var string_modif2 = first_string.replace(tab_tis[v], " "); // modify the string again to remove the new TI from the old string
tab_strings.push(string_modif2);
v += 1;
}
while(v < 15)
return tab_tis;
}

regex how can I split this word?

I have a list of several phrases in the following format
thisIsAnExampleSentance
hereIsAnotherExampleWithMoreWordsInIt
and I'm trying to end up with
This Is An Example Sentance
Here Is Another Example With More Words In It
Each phrase has the white space condensed and the first letter is forced to lowercase.
Can I use regex to add a space before each A-Z and have the first letter of the phrase be capitalized?
I thought of doing something like
([a-z]+)([A-Z])([a-z]+)([A-Z])([a-z]+) // etc
$1 $2$3 $4$5 // etc
but on 50 records of varying length, my idea is a poor solution. Is there a way to regex in a way that will be more dynamic? Thanks
A Java fragment I use looks like this (now revised):
result = source.replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
result = result.substring(0, 1).toUpperCase() + result.substring(1);
This, by the way, converts the string givenProductUPCSymbol into Given Product UPC Symbol - make sure this is fine with the way you use this type of thing
Finally, a single line version could be:
result = source.substring(0, 1).toUpperCase() + source(1).replaceAll("(?<=^|[a-z])([A-Z])|([A-Z])(?=[a-z])", " $1$2");
Also, in an Example similar to one given in the question comments, the string hiMyNameIsBobAndIWantAPuppy will be changed to Hi My Name Is Bob And I Want A Puppy
For the space problem it's easy if your language supports zero-width-look-behind
var result = Regex.Replace(#"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "(?<=[a-z])([A-Z])", " $1");
or even if it doesn't support them
var result2 = Regex.Replace(#"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "([a-z])([A-Z])", "$1 $2");
I'm using C#, but the regexes should be usable in any language that support the replace using the $1...$n .
But for the lower-to-upper case you can't do it directly in Regex. You can get the first character through a regex like: ^[a-z] but you can't convet it.
For example in C# you could do
var result4 = Regex.Replace(result, "^([a-z])", m =>
{
return m.ToString().ToUpperInvariant();
});
using a match evaluator to change the input string.
You could then even fuse the two together
var result4 = Regex.Replace(#"thisIsAnExampleSentanceHereIsAnotherExampleWithMoreWordsInIt", "^([a-z])|([a-z])([A-Z])", m =>
{
if (m.Groups[1].Success)
{
return m.ToString().ToUpperInvariant();
}
else
{
return m.Groups[2].ToString() + " " + m.Groups[3].ToString();
}
});
A Perl example with unicode character support:
s/\p{Lu}/ $&/g;
s/^./\U$&/;

Regular Expression in Actionscript 3

I need a AS3 regular expression that allows me to find/replace in strings like these:
var str1:String = "<value1 att="1"> some text</value1>";
var str2:String = "<value1 att="1" var="a"> some text and more</value1>";
var str3:String = "<value1 att="ok" var="b" def="12"> some text</value1>";
to this:
str1 = "<value1 att="1">*some text</value1>";
str2 = "<value1 att="1" var="a">**some text and more</value1>";
str3 = "<value1 att="ok" var="b" def="12">*****some text</value1>";
I want to be able to replace the spaces at the beginning (inside the > <) for other character. It shouldn't affect the number of character at the right of the spaces or the attributes in the value1 definition.
Assuming that there are no "* " sequences in the text blocks, this should work:
var s:String = "<value1 att='ok' var='b' def='12'> some text</value1>";
//find all spaces after a tag closing bracket and replace with a *
s = s.replace(/>\s/g, ">*");
//find all spaces after a * and replace it with a *
//keep doing this until no more can be found
while (s.match(/>\*+\s/g).length) {
s = s.replace(/\*\s/g, "**");
}
I can't think of a way to do it in one replace though.
I think the easiest way to accomplish what you need is to use a function in replace() expression.
var replaceMethod:Function = function (match:String, tagName:String, tagContent:String, spaces:String, targetText:String, index:int, whole:String) : String
{
trace("\t", "found", spaces.length,"spaces in tag '"+tagName+"'");
trace("\t", "matched string:", match);
// check tag name or whatever you may want
// do something with found spaces
var replacement:String = spaces.replace(" ", "*");
return "<"+tagName+" "+tagContent+">"+replacement+targetText;
}
var str1:String = '<value1 att="1"> some text</value1>';
var exp:RegExp = /<(\w+)([ >].*?)>(\s+)(some text)/gm;
trace("before:", str1);
str1 = str1.replace(exp, replaceMethod);
trace("after:", str1);
It's not performance-safe though; if you are using huge blocks of text and/or launching this routine very frequently, you may want to do something more comlicated, but optimized. One optimization technique is reducing the number of arguments of replaceMathod().
p.s. I think this can be done with one replace() expression and without using replaceMethod(). Look at positive lookaheads and noncapturing groups, may be you can figure it out. http://livedocs.adobe.com/flex/3/html/help.html?content=12_Using_Regular_Expressions_09.html