Regexp to match only n occurrences of a char in a string - regex

I have a string like 2005:10:29 12:23:53 and I wish to replace only the first two occurrences of : with -
Expected result 2005-10-29 12:23:53
EDIT:
I need this regexp in KDE's krename tool, where I can't edit or format the original [exifExif.Image.DateTime] token, which returns the unwanted 2005:10:29 12:23:53 format, but there is a Find and Replace option to post-process the string.
(?<=\d{4}):|:(?=\d{2}\s) does the job on rubular, but does not in KDE :(
I am sure there are more solutions.
EDIT:
:(?=\d{2}:\d{2}\s)|:(?=\d{2}\s) works even on KDE
I found this solution after reading the following:
You can use a full-fledged regular expression inside the lookahead.
Most regular expression engines only allow literal characters and
alternation inside lookbehind, since they cannot apply regular
expression backwards.
in Regex tutorial
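The lookahead-only pattern from the second edit can be checked outside krename. This is a minimal sketch in Python, assuming Python's re module treats this pattern the same way krename's engine does; the names DATE_COLON, stamp and fixed are only illustrative.

```python
import re

# Lookahead-only pattern from the question: a colon is replaced only when
# it is followed by "dd:dd<whitespace>" or "dd<whitespace>", which is true
# for the two colons in the date part and false for those in the time part.
DATE_COLON = re.compile(r":(?=\d{2}:\d{2}\s)|:(?=\d{2}\s)")

stamp = "2005:10:29 12:23:53"
fixed = DATE_COLON.sub("-", stamp)
print(fixed)  # 2005-10-29 12:23:53
```

No lookbehind is involved, which is presumably why it works in engines that restrict or lack lookbehind.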

In Ruby, as scibuff suggests, you're probably better off not using regexps.
require 'date'
date = DateTime.strptime("2005:10:29 12:23:53", "%Y:%m:%d %H:%M:%S")
date.strftime("%Y-%m-%d %H:%M:%S")
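The same parse-then-format idea can be sketched in Python with the standard datetime module. This only illustrates the approach and is not something krename itself can run; the variable names are illustrative.

```python
from datetime import datetime

raw = "2005:10:29 12:23:53"

# Parse using the colon-separated EXIF-style layout, then re-emit with dashes.
parsed = datetime.strptime(raw, "%Y:%m:%d %H:%M:%S")
formatted = parsed.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2005-10-29 12:23:53
```

Unlike a regex replacement, this rejects input that is not a valid date.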

JavaScript:
Version 1
str = str.split(' ')[0].replace(/\:/g,'-')+' '+str.split(' ')[1]
Version 2
str = str.replace(/(\d{4}):(\d{2}):(\d{2})(.*)/,"$1-$2-$3$4")

Once again, regular expressions are being used for something that can be achieved in a simpler, more elegant and more efficient way:
var date = new Date('2005:10:29 12:23:53');
then format the date accordingly (how the Date constructor parses non-ISO strings is implementation-dependent, so check that your engine accepts this format), e.g.
function formatDate( date ){
return date.getFullYear() + '-' + ( date.getMonth() + 1 ) + '-' + ... ;
}

Simply call replace() twice:
"2005:10:29 12:23:53".replace(/:/,'-').replace(/:/,'-')

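For comparison, "replace only the first n occurrences" needs no chaining in Python, because both str.replace and re.sub accept a count. A small sketch (variable names are illustrative):

```python
import re

stamp = "2005:10:29 12:23:53"

# str.replace takes an optional maximum number of replacements.
fixed_plain = stamp.replace(":", "-", 2)

# re.sub offers the same limit through its count argument.
fixed_regex = re.sub(r":", "-", stamp, count=2)

print(fixed_plain)  # 2005-10-29 12:23:53
print(fixed_regex)  # 2005-10-29 12:23:53
```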
Related

Regex match everything not between a pair of characters

Suppose a string (representing elapsed time in the format HH:MM:ss) like this:
"123:59:00"
I want to match everything except the numbers for the minutes, i.e. the regex should match the parts 123: and :00 (shown in bold in the original post) and not the number between the colons:
"123: 59 :00"
In the example, the 59 should be the only part unmatched.
Is there any way to accomplish this with a js regex?
EDIT: I'm asking explicitly for a regex, because I'm using the Notion Formula API and can only use JS regex here.
You don't necessarily need to use RegEx for this. Use split() instead.
const timeString = "12:59:00";
const [hours, _, seconds] = timeString.split(":");
console.log(hours, seconds);
If you want to use Regex you can use the following:
const timeString = "12:59:00";
const matches = timeString.match(/(?<hours>^\d{2}(?=:\d{2}:))|(?<seconds>(?<=:\d{2}:)\d{2}$)/g);
console.log(matches);
// if you want to include the colons use this
const matchesWithColons = timeString.match(/(?<hours>^\d{2}:(?=\d{2}:))|(?<seconds>(?<=:\d{2}):\d{2}$)/g);
console.log(matchesWithColons);
You can drop the named groups ?<hours> and ?<seconds>.
Using split() might be the most canonical way to go, but here is a regex approach using match():
var input = "123:59:00";
var parts = input.match(/^[^:]+|[^:]+$/g);
console.log(parts);
If you want to also capture the trailing/leading colons, then use this version:
var input = "123:59:00";
var parts = input.match(/^[^:]+:|:[^:]+$/g);
console.log(parts);
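The anchored-alternation idea above is not specific to JavaScript. Here is a sketch of the same two patterns in Python, using re.findall in place of match() with the g flag (variable names are illustrative):

```python
import re

elapsed = "123:59:00"

# Either the leading run of non-colon characters or the trailing one.
outer = re.findall(r"^[^:]+|[^:]+$", elapsed)

# Same idea, but keeping the colon next to each part.
outer_with_colons = re.findall(r"^[^:]+:|:[^:]+$", elapsed)

print(outer)              # ['123', '00']
print(outer_with_colons)  # ['123:', ':00']
```

Because the pattern uses [^:]+ rather than \d{2}, it does not care how many digits the hours field has.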
This could also work (the hours and seconds end up in capture groups 1 and 2):
/^([0-9]{2})\:[0-9]{2}\:([0-9]{2})$/mg

Match return substring between two substrings using regexp

I have a list of records that are character vectors. Here's an example:
'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'
From these names I would like to extract whatever's between the two substrings 1mil_ and _ks_drivers_sorted.csv.
So in this case the output would be:
0,1_1_1_lb200
0_1_lb100
1_1_lb2_100_100
1_1_lb100
I'm using MATLAB so I thought to use regexp to do this, but I can't understand what kind of regular expression would be correct.
Or are there some other ways to do this without using regexp?
Let the data be:
x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'};
You can use lookbehind and lookahead to find the two limiting substrings, and match everything in between:
result = cellfun(@(c) regexp(c, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match'), x);
Or, since the regular expression only produces one match, the following simpler alternative can be used (thanks @excaza for noticing):
result = regexp(x, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match', 'once');
In your example, either of the above gives
result =
4×1 cell array
'0,1_1_1_lb200'
'0_1_lb100'
'1_1_lb2_100_100'
'1_1_lb100'
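For readers without MATLAB, the same lookbehind/lookahead pattern can be tried in Python. This is a translation for illustration only, not the MATLAB answer itself; MIDDLE, names and middles are made-up names.

```python
import re

# File names copied from the question.
names = [
    "1mil_0,1_1_1_lb200_ks_drivers_sorted.csv",
    "1mil_0_1_lb100_ks_drivers_sorted.csv",
    "1mil_1_1_lb2_100_100_ks_drivers_sorted.csv",
    "1mil_1_1_lb100_ks_drivers_sorted.csv",
]

# Lookbehind asserts the prefix, lookahead asserts the suffix;
# only the text in between becomes part of the match.
MIDDLE = re.compile(r"(?<=1mil_).*(?=_ks_drivers_sorted\.csv)")

middles = [MIDDLE.search(name).group() for name in names]
for middle in middles:
    print(middle)
```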
For me, the easy way to do this is to replace the parts you don't need with an empty string; what remains is what you need.
If you have a list, you can use a loop to do this.
Example replacing "1mil_" and "_ks_drivers_sorted.csv" with "":
newChr = strrep(chr,'1mil_','')
newChr = strrep(newChr,'_ks_drivers_sorted.csv','')

Regex to select semicolons that are not enclosed in double quotes

I have string like
a;b;"aaa;;;bccc";deef
I want to split string based on delimiter ; only if ; is not inside double quotes. So after the split, it will be
a
b
"aaa;;;bccc"
deef
I tried using look-behind, but I'm not able to find a correct regular expression for splitting.
Regular expressions are probably not the right tool for this. If possible you should use a CSV library and specify ; as the delimiter and " as the quote character; this should give you the exact fields you are looking for.
That being said, here is one approach that works by ensuring that there is an even number of quotation marks between the ; we are considering splitting at and the end of the string.
;(?=(([^"]*"){2})*[^"]*$)
Example: http://www.rubular.com/r/RyLQyR8F19
This will break down if you can have escaped quotation marks within a string, for example a;"foo\"bar";c.
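A sketch of the even-number-of-quotes lookahead in Python. One assumption differs from the pattern as posted: the groups are made non-capturing, because Python's re.split would otherwise insert the captured text into the result list.

```python
import re

# Split on ';' only when an even number of double quotes follows it,
# i.e. when the semicolon is not inside a quoted field.
# (?:...) is used instead of (...) so re.split returns only the fields.
OUTSIDE_QUOTES = re.compile(r';(?=(?:(?:[^"]*"){2})*[^"]*$)')

line = 'a;b;"aaa;;;bccc";deef'
fields = OUTSIDE_QUOTES.split(line)
print(fields)  # ['a', 'b', '"aaa;;;bccc"', 'deef']
```

The lookahead rescans the rest of the string for every semicolon, so this can be slow on very long lines.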
Here is a much cleaner example using Python's csv module (Python 3 syntax):
import csv, io
reader = csv.reader(io.StringIO('a;b;"aaa;;;bccc";deef'),
                    delimiter=';', quotechar='"')
for row in reader:
    print('\n'.join(row))
A regular expression will only get messier and break on even minor changes. You are better off using a CSV parser in whatever scripting language you use. Perl has a built-in module called Text::ParseWords (so you don't need to download anything from CPAN if there are any restrictions) that lets you specify the delimiter, so you are not limited to a comma. Here is a sample snippet:
#!/usr/local/bin/perl
use strict;
use warnings;
use Text::ParseWords;
my $string = 'a;b;"aaa;;;bccc";deef';
my @ary = parse_line(q{;}, 0, $string);
print "$_\n" for @ary;
Output
a
b
aaa;;;bccc
deef
This is kind of ugly, but if you don't have \" inside your quoted strings (meaning you don't have strings that look like "foo bar \"badoo\" goo"), you can split on the " first and then assume that every second array element (the odd indices when counting from 0) is, in fact, a quoted string, and split the remaining elements into their component parts on the ; token.
If you do have \" in your strings, then you'll want to first convert those into some other temporary token that you'll convert back later, after you've performed your operation.
Here's a fiddle...
http://jsfiddle.net/VW9an/
var str = 'abc;def;ghi"some other dogs say \\"bow; wow; wow\\". yes they do!"and another; and a fifth';
var strCp = str.replace(/\\"/g, "--##--");
var parts = strCp.split(/"/);
var allPieces = new Array();
for (var i in parts) {
    if (i % 2 == 0) {
        var innerParts = parts[i].split(/\;/);
        for (var j in innerParts)
            allPieces.push(innerParts[j]);
    }
    else {
        allPieces.push('"' + parts[i] + '"');
    }
}
for (var a in allPieces) {
    allPieces[a] = allPieces[a].replace(/--##--/g, '\\"');
}
console.log(allPieces);
Match All instead of Splitting
Answering long after the battle because no one used the way that seems the simplest to me.
Once you understand that Match All and Split are Two Sides of the Same Coin, you can use this simple regex:
"[^"]*"|[^";]+
See the matches in the Regex Demo.
The left side of the alternation | matches full quoted strings
The right side matches any chars that are neither ; nor "
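The match-all approach can be tried with Python's re.findall, which returns every non-overlapping match. A small sketch (names are illustrative):

```python
import re

# Left alternative: a complete quoted string.
# Right alternative: a run of characters that are neither ';' nor '"'.
FIELD = re.compile(r'"[^"]*"|[^";]+')

line = 'a;b;"aaa;;;bccc";deef'
fields = FIELD.findall(line)
print(fields)  # ['a', 'b', '"aaa;;;bccc"', 'deef']
```

One difference from splitting: empty fields (two adjacent semicolons outside quotes) produce no match at all, so they are silently dropped.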

Regular expression any character with dynamic size

I want to use a regular expression that does the following (I extracted the part I'm having trouble with in order to simplify):
any character for the first 1 to 5 characters, then an underscore, then some digits, then an underscore, then some digits or dots.
With a restriction on the underscore it should give something like this:
^([^_]{1,5})_([\\d]{2,3})_([\\d\\.]*)$
But I want to allow "_" in the first 1 to 5 characters as long as the rest of the regular expression still matches, for example if I had something like:
to_to_123_12.56
I think this is linked to the eagerness of the regex engine; nevertheless, I tried some lazy quantifiers as explained here, but without success.
Any ideas?
I used the following regex and it appeared to work fine for your task. I've simply replaced your initial [^_] with a dot (.).
^.{1,5}_\d{2,3}_[\d\.]*$
It's probably best to replace your final * with + too, unless you allow nothing after the final '_'. And note your final part allows multiple '.' (I don't know if that's what you want or not).
For the record, here's a quick Python script I used to verify the regex:
import re
strs = [ "a_12_1",
         "abc_12_134",
         "abcd_123_1.",
         "abcde_12_1",
         "a_123_123.456.7890.",
         "a_12_1",
         "ab_de_12_1",
       ]
myre = r"^.{1,5}_\d{2,3}_[\d\.]+$"
for s in strs:
    m = re.match(myre, s)
    if m:
        print("Yes:", end=" ")
        # "ALL" means the match covers the whole string
        if m.group(0) == s:
            print("ALL", end=" ")
    else:
        print("No:", end=" ")
    print(s)
Output is:
Yes: ALL a_12_1
Yes: ALL abc_12_134
Yes: ALL abcd_123_1.
Yes: ALL abcde_12_1
Yes: ALL a_123_123.456.7890.
Yes: ALL a_12_1
Yes: ALL ab_de_12_1
^(.{1,5})_(\d{2,3})_([\d.]*)$
works for your example. The result doesn't change whether you use a lazy quantifier or not.
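The claim that greedy and lazy quantifiers give the same result here can be checked directly. In this sketch (names are illustrative), both versions are forced by the anchors and the fixed "_digits_" structure to settle on the same split, because the engine backtracks until the whole pattern fits.

```python
import re

sample = "to_to_123_12.56"

GREEDY = re.compile(r"^(.{1,5})_(\d{2,3})_([\d.]*)$")
LAZY = re.compile(r"^(.{1,5}?)_(\d{2,3})_([\d.]*)$")

greedy_groups = GREEDY.match(sample).groups()
lazy_groups = LAZY.match(sample).groups()

print(greedy_groups)  # ('to_to', '123', '12.56')
print(lazy_groups)    # ('to_to', '123', '12.56')
```

Greedy tries five characters first and succeeds immediately; lazy starts at one character and grows until the rest of the pattern matches, arriving at the same prefix.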
While answering the comment (writing the lazy expression), I saw that I had made a mistake: if I simply use the following classic regex, it works:
^(.{1,5})_([\\d]{2,3})_([\\d\\.]*)$
Thank you.

RegEx for a price in £

i have: \£\d+\.\d\d
should find: £6.95 £16.95 etc
+ is one or more
\. is the dot
\d is for a digit
am i wrong? :(
JavaScript for Greasemonkey
// ==UserScript==
// @name CurConvertor
// @namespace CurConvertor
// @description noam smadja
// @include http://www.zavvi.com/*
// ==/UserScript==
textNodes = document.evaluate(
"//text()",
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
var searchRE = /\£[0-9]\+.[0-9][0-9];
var replace = 'pling';
for (var i=0;i<textNodes.snapshotLength;i++) {
var node = textNodes.snapshotItem(i);
node.data = node.data.replace(searchRE, replace);
}
When I change the regex to /Free, for example, it finds and replaces the text, so I guess I am missing something!
Had this written up for your last question just before it was deleted.
Here are the problems you're having with your GM script.
You're checking absolutely every text node on the page for some reason. This isn't causing it to break, but it's unnecessary and slow. It would be better to look for text nodes inside .price nodes and .rrp .strike nodes instead.
When creating new regexp objects in this way, backslashes must be escaped, ex:
var searchRE = new RegExp('\\d\\d','gi');
not
var searchRE = new RegExp('\d\d','gi');
So you can add the backslashes, or create your regex like this:
var searchRE = /\d\d/gi;
Your actual regular expression is only checking for numbers like ##ANYCHARACTER##, and will ignore £5.00 and £128.24
Your replacement needs to be either a string or a callback function, not a regular expression object.
Putting it all together
textNodes = document.evaluate(
"//p[contains(@class,'price')]/text() | //p[contains(@class,'rrp')]/span[contains(@class,'strike')]/text()",
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
var searchRE = /£(\d+\.\d\d)/gi;
var replace = function(str,p1){return "₪" + ( (p1*5.67).toFixed(2) );}
for (var i=0,l=textNodes.snapshotLength;i<l;i++) {
var node = textNodes.snapshotItem(i);
node.data = node.data.replace(searchRE, replace);
}
Changes:
XPath now includes only p.price and p.rrp span.strike nodes
Search regular expression created with /regex/ instead of new RegExp
Search variable now includes target currency symbol
Replace variable is now a function that replaces the currency symbol with a new symbol and multiplies the first captured substring by 5.67
for loop sets a variable to the snapshot length at the beginning of the loop, instead of checking textNodes.snapshotLength at the beginning of every loop.
Hope that helps!
[edit]Some of these points don't apply, as the original question changed a few times, but the final script is relevant, and the points may still be of interest to you for why your script was failing originally.
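The "replacement as a callback" idea from the script above exists in most regex libraries. Here is a sketch in Python, where re.sub accepts a function that receives the match object; the 5.67 rate is the one used in the script, and convert_prices is a hypothetical helper name.

```python
import re

PRICE = re.compile(r"£(\d+\.\d\d)")
RATE = 5.67  # conversion rate taken from the script above


def convert_prices(text):
    """Replace every £ price with the converted ₪ amount (hypothetical helper)."""
    def to_shekels(match):
        # group(1) is the numeric part captured by the parentheses.
        return "₪" + format(float(match.group(1)) * RATE, ".2f")
    return PRICE.sub(to_shekels, text)


converted = convert_prices("Now £6.95, was £16.95")
print(converted)  # Now ₪39.41, was ₪96.11
```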
You are not wrong, but there are a few things to watch out for:
The £ sign is not a standard ASCII character so you may have encoding issue, or you may need to enable a unicode option on your regular expression.
The use of \d is not supported in all regular expression engines. [0-9] or [[:digit:]] are other possibilities.
To get a better answer, say which language you are using, and preferably also post your source code.
£[0-9]+(,[0-9]{3})*\.[0-9]{2}$
this will match anything from £dd.dd to £d[dd]*,ddd.dd. So it can fetch millions and hundreds as well.
The above regexp is not strict about comma placement. It also matches, for example, £1123213123.23
Now, if you want a stricter regexp, and you're 100% sure that the prices will follow the comma and period conventions, then use
£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$
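The difference between the two patterns can be seen by running both over a few sample prices. This is a sketch in Python; the sample values are made up for illustration.

```python
import re

LOOSE = re.compile(r"£[0-9]+(,[0-9]{3})*\.[0-9]{2}$")
STRICT = re.compile(r"£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$")

samples = ["£6.95", "£1,234,567.89", "£1123213123.23"]

# For each sample: (matched by the loose pattern, matched by the strict one).
results = {s: (bool(LOOSE.search(s)), bool(STRICT.search(s))) for s in samples}

for price, (loose_ok, strict_ok) in results.items():
    print(price, "loose:", loose_ok, "strict:", strict_ok)
```

Only the last sample separates the two: the loose pattern accepts any run of digits before the decimal point, while the strict one allows at most three digits before the first comma group.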
Try your regexps here to see what works for you and what not http://tools.netshiftmedia.com/regexlibrary/
It depends on what flavour of regex you are using - what is the programming language?
some older versions of regex require the + to be escaped - sed and vi for example.
Also some older versions of regex do not recognise \d as matching a digit.
Most modern regex follow the perl syntax and £\d+\.\d\d should do the trick, but it does also depend on how the £ is encoded - if the string you are matching encodes it differently from the regex then it will not match.
Here is an example in Python 2, where the £ character is represented differently in a regular (byte) string and a unicode string (prefixed with a u):
>>> "£"
'\xc2\xa3'
>>> u"£"
u'\xa3'
>>> import re
>>> print re.match("£", u"£")
None
>>> print re.match(u"£", "£")
None
>>> print re.match(u"£", u"£")
<_sre.SRE_Match object at 0x7ef34de8>
>>> print re.match("£", "£")
<_sre.SRE_Match object at 0x7ef34e90>
>>>
£ isn't an ascii character, so you need to work out encodings. Depending on the language, you will either need to escape the byte(s) of £ in the regex, or convert all the strings into Unicode before applying the regex.
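The same encoding pitfall can be sketched in Python 3, where ordinary strings are already Unicode and the mismatch only appears at the byte level. The choice of UTF-8 and Latin-1 here is just for illustration.

```python
import re

pound = "£"

# The same character becomes different byte sequences in different encodings.
utf8_bytes = pound.encode("utf-8")      # b'\xc2\xa3'
latin1_bytes = pound.encode("latin-1")  # b'\xa3'

# Text pattern against text: matches.
text_match = re.match("£", "£")

# Byte pattern against bytes in the same encoding: matches.
same_encoding = re.match(utf8_bytes, utf8_bytes)

# Byte pattern in one encoding against bytes in another: no match.
mixed_encoding = re.match(utf8_bytes, latin1_bytes)

print(text_match is not None, same_encoding is not None, mixed_encoding is None)
```

Decoding all input to text before applying the regex avoids the problem.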
In Ruby you could just write the following
/£\d+\.\d{2}/
Using the braces to specify number of digits after the point makes it slightly clearer