Parsing a string by using regex

I need to parse an input string that has the format of
AB~11111, AB~22222, AB~33333, AB~44444
into separate strings:
AB~11111, AB~22222, AB~33333, and AB~44444
Here is my attempted Regex:
range = "([^~,\n]+~[^,]+,)?";
non_delimiter = "[^,\n;]+";
range_regex = new RegExp(this.range + this.non_delimiter, 'g');
But somehow this regex would only parse the input string into
AB~11111, AB~22222 and AB~33333, AB~44444
instead of parsing the input string into individual strings.
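The pairing can be reproduced with a short sketch. It joins the two pattern pieces exactly as in the question; the variable names are adapted for a standalone script and are otherwise assumptions. The optional first group consumes one whole item plus its comma, and the second piece then consumes the next item, so every match spans two items.

```javascript
// The two pattern pieces from the question, joined the same way.
const range = "([^~,\n]+~[^,]+,)?";
const nonDelimiter = "[^,\n;]+";
const rangeRegex = new RegExp(range + nonDelimiter, "g");

const input = "AB~11111, AB~22222, AB~33333, AB~44444";

// The optional group matches "AB~11111," and the trailing piece then
// matches " AB~22222", so two items end up in a single match.
const paired = input.match(rangeRegex);
console.log(paired);
```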

Maybe this is missing the boat, but from your input what about something like:
AB~\d+
This should match each of the strings from the above: https://regex101.com/r/vVFDIG/1. And if there's variation (i.e., it can be other letters) then maybe something like:
[A-Z]{2}~\d+
Adjust the pattern to whatever your data needs, but a negated character class seems like quite a roundabout way of doing it. If you do want that approach, you could just do:
[^ ,]+
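As a rough sketch of how these patterns might be applied in JavaScript (the variable names here are illustrative, not from the question), String.prototype.match with the g flag returns every match as an array:

```javascript
const input = "AB~11111, AB~22222, AB~33333, AB~44444";

// Two uppercase letters, a tilde, then one or more digits.
const items = input.match(/[A-Z]{2}~\d+/g) || [];
console.log(items);

// The looser variant: runs of anything that is not a space or a comma.
const looseItems = input.match(/[^ ,]+/g) || [];
console.log(looseItems);
```

The `|| []` fallback is there because match returns null, not an empty array, when nothing matches.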

You should use a regex split here on ,\s*:
var input = "AB~11111, AB~22222, AB~33333, AB~44444";
var parts = input.split(/,\s*/);
console.log(parts);
If you also need to check that the input consists of a comma-separated list of terms shaped like AB~11111, then you may use test to assert that:
var input = "AB~11111, AB~22222, AB~33333, AB~44444";
console.log(/^[A-Z]{2}~\d{5}(?:,\s*[A-Z]{2}~\d{5})*$/.test(input));
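The two steps above could be combined into one helper, sketched here under the assumption that invalid input should yield null. The name parseItems and the null convention are hypothetical choices, not part of the answer.

```javascript
// Same validation pattern as above. No g flag, so test() keeps no state
// between calls.
const ITEM_LIST = /^[A-Z]{2}~\d{5}(?:,\s*[A-Z]{2}~\d{5})*$/;

// Returns the individual terms, or null when the input is not a valid list.
function parseItems(input) {
  if (!ITEM_LIST.test(input)) {
    return null;
  }
  return input.split(/,\s*/);
}

console.log(parseItems("AB~11111, AB~22222, AB~33333, AB~44444"));
console.log(parseItems("AB~11111; AB~22222"));
```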

Related

Regex match everything not between a pair of characters

Suppose a string (representing elapsed time in the format HH:MM:ss) like this:
"123:59:00"
I want to match everything except the numbers for the minutes, i.e.: the regex should match the bold parts and not the number between colons:
"123: 59 :00"
In the example, the 59 should be the only part unmatched.
Is there any way to accomplish this with a js regex?
EDIT: I'm asking explicitly for a regex, because I'm using the Notion Formula API and can only use JS regex here.
You don't necessarily need to use RegEx for this. Use split() instead.
const timeString = "12:59:00";
const [hours, _, seconds] = timeString.split(":");
console.log(hours, seconds);
If you want to use Regex you can use the following:
const timeString = "12:59:00";
const matches = timeString.match(/(?<hours>^\d{2}(?=:\d{2}:))|(?<seconds>(?<=:\d{2}:)\d{2}$)/g);
console.log(matches);
// if you want to include the colons use this
const matchesWithColons = timeString.match(/(?<hours>^\d{2}:(?=\d{2}:))|(?<seconds>(?<=:\d{2}):\d{2}$)/g);
console.log(matchesWithColons);
You can drop the named groups ?<hours> and ?<seconds>.
Using split() might be the most canonical way to go, but here is a regex approach using match():
var input = "123:59:00";
var parts = input.match(/^[^:]+|[^:]+$/g);
console.log(parts);
If you want to also capture the trailing/leading colons, then use this version:
var input = "123:59:00";
var parts = input.match(/^[^:]+:|:[^:]+$/g);
console.log(parts);
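This could also work, capturing the hours and seconds in groups (the hours group uses + so that three-digit values such as 123 match):
/^([0-9]+)\:[0-9]{2}\:([0-9]{2})$/mg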
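A minimal sketch of the capture-group idea, assuming a JavaScript engine that supports named groups. Whether the Notion Formula API does is not something this example establishes. The function name splitElapsed is hypothetical.

```javascript
// Captures the hours and seconds and skips the minutes between the colons.
function splitElapsed(timeString) {
  const match = /^(?<hours>\d+):\d{2}:(?<seconds>\d{2})$/.exec(timeString);
  if (match === null) {
    return null; // not in HH:MM:ss form
  }
  return { hours: match.groups.hours, seconds: match.groups.seconds };
}

console.log(splitElapsed("123:59:00"));
```

Because the minutes are matched but not captured, the result holds only the two parts on either side of them.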

Match return substring between two substrings using regexp

I have a list of records that are character vectors. Here's an example:
'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'
From these names I would like to extract whatever's between the two substrings 1mil_ and _ks_drivers_sorted.csv.
So in this case the output would be:
0,1_1_1_lb200
0_1_lb100
1_1_lb2_100_100
1_1_lb100
I'm using MATLAB so I thought to use regexp to do this, but I can't understand what kind of regular expression would be correct.
Or are there some other ways to do this without using regexp?
Let the data be:
x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'};
You can use lookbehind and lookahead to find the two limiting substrings, and match everything in between:
result = cellfun(@(c) regexp(c, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match'), x);
Or, since the regular expression only produces one match, the following simpler alternative can be used (thanks @excaza for noticing):
result = regexp(x, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match', 'once');
In your example, either of the above gives
result =
4×1 cell array
'0,1_1_1_lb200'
'0_1_lb100'
'1_1_lb2_100_100'
'1_1_lb100'
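For comparison, the same lookbehind/lookahead pattern can be sketched in JavaScript. This is not MATLAB code; it assumes an engine with lookbehind support (for example a recent Node.js), and the variable names are illustrative.

```javascript
const names = [
  "1mil_0,1_1_1_lb200_ks_drivers_sorted.csv",
  "1mil_0_1_lb100_ks_drivers_sorted.csv",
  "1mil_1_1_lb2_100_100_ks_drivers_sorted.csv",
  "1mil_1_1_lb100_ks_drivers_sorted.csv",
];

// The lookbehind and lookahead check the surrounding text without
// including it in the match.
const between = /(?<=1mil_).*(?=_ks_drivers_sorted\.csv)/;

const extracted = names.map((name) => {
  const match = name.match(between);
  return match === null ? null : match[0];
});
console.log(extracted);
```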
For me, the easy way to do this is to replace the parts of the string you don't need with an empty string (or a space); what remains is what you need.
If you have a list, you can use a loop to do this.
Example replacing "1mil_" with "" and "_ks_drivers_sorted.csv" with "":
newChr = strrep(chr,'1mil_','')
newChr = strrep(newChr,'_ks_drivers_sorted.csv','')

add datetime as string to a string after matching a pattern in vb.net

I have this string for example: "Example_string.xml"
and I would like to add _DateTime (the current date and time) before the "." so it will be like:
"Example_string_20151808185631.xml"
How can I achieve it? With regex?
Yes, you can achieve that through the use of a look ahead. For instance:
Dim result As String = Regex.Replace("Example_string.xml", "(?=\.)", "_20151808185631")
Since the pattern only matches a position in the string (the position just before the period), rather than matching a portion of the text, the replace method doesn't actually replace any of the input text. It effectively just inserts the replacement text into that position in the string.
Alternatively, if you find that confusing, you could just match the period and then just include the period in the replacement text:
Dim result As String = Regex.Replace("Example_string.xml", "\.", "_20151808185631.")
If you don't want to just look for any period, and you want to be safer about it (such as handling file names that contain multiple periods), then instead of \., you could use something like \.\w+$. However, if you need to make it that resilient, and it doesn't have to be done with RegEx, it would be better to use the Path.GetFileNameWithoutExtension and Path.GetExtension methods, as recommended by Crowcoder. For instance, you may also need to make it handle file names that have no extension, which complicates it even further.
or...
Path.GetFileNameWithoutExtension("Example_string.xml") + "_20151808185631" + Path.GetExtension("Example_string.xml")
How about:
Dim sFile As String = "Example_string.xml"
Dim sResult As String = sFile.Replace(".xml", "_" & Format(Now(), "yyyyMMddHHmmss") & ".xml")
MsgBox(sresult, , sFile)
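The zero-width lookahead from the first answer, combined with its \.\w+$ refinement, can be sketched in JavaScript rather than VB.NET. The function name addSuffixBeforeExtension is hypothetical, and the suffix is passed in as a literal, not generated from the clock, so the behaviour is easy to check.

```javascript
// Inserts suffix at the empty position just before the final ".ext".
// The lookahead consumes no characters, so nothing is actually replaced.
function addSuffixBeforeExtension(fileName, suffix) {
  // A function replacement avoids "$" in suffix being read as a group reference.
  return fileName.replace(/(?=\.\w+$)/, () => suffix);
}

console.log(addSuffixBeforeExtension("Example_string.xml", "_20151808185631"));
console.log(addSuffixBeforeExtension("archive.tar.gz", "_20151808185631"));
console.log(addSuffixBeforeExtension("README", "_20151808185631"));
```

With this pattern only the last extension counts, and a name without an extension is returned unchanged, which may or may not be the behaviour you want.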

Splitting a string based on positions with regex

I need to convert this (date) String "12112014" to "12.11.2014"
What I would like to do is:
Take the first 2 characters "12", add ".",
then take characters 3-4 to get "11", add ".",
and finally take the last 4 characters (5-8) to get "2014".
I already found out how to get the first 2 characters ( "^\d{2}" ), but I failed to get characters based on a position.
Whatever the programming language, you should extract the groups of digits from the string and then join them with a ".".
In Perl, it can be done as:
$_ = '12112014';
s/(\d{2})(\d{2})(\d{4})/$1.$2.$3/;
print "$_";
Without you specifying the language you're after, I've picked javascript:
var s = '12012011';
var s2 = s.replace(/(\d{2})(\d{2})(\d{4})/,'$1.$2.$3');
console.log(s2); // prints "12.01.2011"
The gist of it is that you use () to specify groups inside your regular expression and then can use the groups in your replace expression.
Same in Java:
String s = "12012011";
String s2 = s.replaceAll("(\\d{2})(\\d{2})(\\d{4})", "$1.$2.$3");
System.out.println(s2);
I don't think you can do that with split alone.
You could expand your expression to:
"(^(\d{2})(\d{2})(\d{4}))"
Then access the groups with the Regex language of your choice and build the string you want.
Note that - besides all regex learning - alternatively you could always parse the original string into strongly typed Date or DateTime variables and output the value using the appropriate locales.
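The parse-then-format alternative might look like the following JavaScript sketch. The helper names parseDdMmYyyy and formatDotted are hypothetical. The formatting is done by hand here; a locale-aware call such as toLocaleDateString could be used instead where the runtime's locale data is available.

```javascript
// Parses "ddMMyyyy" into a Date, or returns null if the text is not a real date.
function parseDdMmYyyy(text) {
  const match = /^(\d{2})(\d{2})(\d{4})$/.exec(text);
  if (match === null) {
    return null;
  }
  const [day, month, year] = match.slice(1).map(Number);
  const date = new Date(year, month - 1, day);
  // Date silently rolls over impossible dates (e.g. 31 February), so verify.
  const sameDay =
    date.getFullYear() === year &&
    date.getMonth() === month - 1 &&
    date.getDate() === day;
  return sameDay ? date : null;
}

// Formats a Date as dd.MM.yyyy with zero padding.
function formatDotted(date) {
  const dd = String(date.getDate()).padStart(2, "0");
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  return `${dd}.${mm}.${date.getFullYear()}`;
}

console.log(formatDotted(parseDdMmYyyy("12112014")));
```

Unlike the pure regex replacement, this route rejects strings that have eight digits but do not name a valid calendar date.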

How to print an integer with a thousands separator in Matlab?

I would like to turn a number into a string using a comma as a thousands separator. Something like:
x = 120501231.21;
str = sprintf('%0.0f', x);
but with the effect
str = '120,501,231.21'
If the built-in fprintf/sprintf can't do it, I imagine a cool solution could be made using regular expressions, perhaps by calling Java (which I assume has some locale-based formatter), or with a basic string-insertion operation. However, I'm not an expert in either Matlab regexps or calling Java from Matlab.
Related question: How can I print a float with thousands separators in Python?
Is there any established way to do this in Matlab?
One way to format numbers with thousands separators is to call the Java locale-aware formatter. The "formatting numbers" article at the "Undocumented Matlab" blog explains how to do this:
>> nf = java.text.DecimalFormat;
>> str = char(nf.format(1234567.890123))
str =
1,234,567.89
where the char(…) converts the Java string to a Matlab string.
voilà!
Here's the solution using regular expressions:
%# 1. create your formatted string
x = 12345678;
str = sprintf('%.4f',x)
str =
12345678.0000
%# 2. use regexprep to add commas
%# flip the string to start counting from the back
%# and make use of the fact that Matlab regexp matches don't overlap
%# The three parts of the regex are
%# (\d+\.)? - looks for any number of digits followed by a dot
%# before starting the match (or nothing at all)
%# (\d{3}) - a packet of three digits that we want to match
%# (?=\S+) - requires that there's at least one non-whitespace character
%# after the match to avoid results like ",123.00"
str = fliplr(regexprep(fliplr(str), '(\d+\.)?(\d{3})(?=\S+)', '$1$2,'))
str =
12,345,678.0000
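For comparison, here is a JavaScript sketch of regex-based grouping. It is not a port of the flip trick above: it swaps in a lookahead that finds every position followed by a multiple of three digits, and it handles the fractional part by splitting it off first. The function name withThousands is hypothetical.

```javascript
// Adds commas to the integer part of an already formatted number string.
function withThousands(numberText) {
  const [integerPart, fractionPart] = numberText.split(".");
  // \B keeps a comma from landing at the start or right after a minus sign;
  // the lookahead requires groups of exactly three digits up to the end.
  const grouped = integerPart.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
  return fractionPart === undefined ? grouped : `${grouped}.${fractionPart}`;
}

console.log(withThousands("12345678.0000"));
console.log(withThousands("120501231.21"));
```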