I have a large array of text (stored as a cell array) that I want to truncate in MATLAB, say to 5 characters. Truncating with regexprep is quite efficient, but now I would love to append a '...' at the end of every truncated match (and only there).
(How) can this be achieved within MATLAB's regexprep?
>> text = {'123456780','1','12'}; %<- small representative sample
>> regexprep(text,'(^.{0,5})(.*)','$1') %capture first 5 characters or less in first group (and replace the text with first group captures)
ans =
1×3 cell array
{'12345'} {'1'} {'12'}
it should read:
ans =
1×3 cell array
{'12345...'} {'1'} {'12'}
You need to use
regexprep(text,'^(.{5}).+','$1...')
See the regex demo.
The main point is that you only need to trigger the replacement if a string is longer than five chars (otherwise, you do not need to truncate the string at all).
Note that regexprep returns the input string as is if there was no regex match found, thus you do not need to worry about strings that are zero to five chars long.
Details:
^ - start of string
(.{5}) - Capturing group 1 ($1): any five chars
.+ - any one or more chars, as many as possible.
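For readers outside MATLAB, here is a minimal Python sketch of the same idea, assuming re.sub as a stand-in for regexprep. The helper name truncate_with_ellipsis is made up for illustration. Strings of five characters or fewer do not match the pattern and come back unchanged.

```python
import re

# Same pattern as the answer: five chars captured, then at least one more char.
PATTERN = re.compile(r'^(.{5}).+')


def truncate_with_ellipsis(strings):
    """Hypothetical helper: keep the first 5 chars, append '...' only when truncated."""
    # \1 is Python's spelling of MATLAB's $1 backreference.
    return [PATTERN.sub(r'\1...', s) for s in strings]


print(truncate_with_ellipsis(['123456780', '1', '12']))
```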
Note that the string 12345... is in fact 8 characters long. You don't want to make the mistake of truncating 1234567 to 12345..., as the truncated version is longer and therefore shouldn't be truncated in the first place.
A solution that takes this into account is:
regexprep(text,'^(.{5}).{3}.+','$1...')
which will only truncate if there are more than 8 characters and, if so, will display the first 5 with the trailing ellipsis.
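The length check can also be derived from the suffix instead of hard-coding the 3. The following Python sketch is only an illustration of that idea (truncate, keep and suffix are hypothetical names, and re.sub stands in for regexprep): it builds the same pattern from the two lengths.

```python
import re


def truncate(s, keep=5, suffix='...'):
    """Hypothetical helper: shorten s only if the result is strictly shorter than s."""
    # Builds '^(.{5}).{3}.+' for the defaults: keep chars, then len(suffix) chars,
    # then at least one more, so the input is longer than keep + len(suffix).
    pattern = rf'^(.{{{keep}}}).{{{len(suffix)}}}.+'
    # A function replacement avoids having to escape backslashes in suffix.
    return re.sub(pattern, lambda m: m.group(1) + suffix, s)


for sample in ['1234567', '12345678', '123456789']:
    print(sample, '->', truncate(sample))
```

With the defaults, only the nine-character sample is shortened; the seven- and eight-character samples are returned as they are.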
Take for example these two lines:
# ManyRandomCharacters 1 2 3
ManyRandomDifferentCharacters 4 5 6
I'd like a regex such that it finds the numbers at the end but only for the line that doesn't begin with #. I just want to match the numbers, not the whole line (i.e., I just want "4", "5" and "6", not "1", "2" or "3"). That's the tricky part, because everything I tried selects all the line up to the numbers. Is there a way to do this? Thanks!
^ matches the start of the string and, if the multiline flag is set (depends on the implementation, usually m), also the start of each line.
So, something like /^(?:[^#].*)(\d+ \d+ \d+)/gm would match only lines whose first character isn't #, with the numbers captured in group 1.
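As a rough illustration, here is how that pattern could be used from Python, assuming re.MULTILINE as the equivalent of the m flag. The individual numbers are obtained by splitting capture group 1 on whitespace, which is an extra step not spelled out in the answer.

```python
import re

text = "# ManyRandomCharacters 1 2 3\nManyRandomDifferentCharacters 4 5 6"

# The answer's pattern; re.MULTILINE plays the role of the /m flag.
# (findall already returns every match, so no /g equivalent is needed.)
pattern = re.compile(r'^(?:[^#].*)(\d+ \d+ \d+)', re.MULTILINE)

numbers = []
for group in pattern.findall(text):
    # group is the text of capture group 1, e.g. '4 5 6'
    numbers.extend(group.split())

print(numbers)
```

One caveat: because .* is greedy, a multi-digit first number such as 14 would be captured as 4. Requiring a space (or \s) right before the group avoids that.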
I have a C# project that requires me to capture a string value from a html stream.
The pattern I need to match is:
XXXX-abc
Where:
XXXX = a 4 character integer
followed by a -
abc = a 3 character alphanumeric.
I looked at txt2re.com and got
string re1="(\\d)"; // Any Single Digit 1
string re2="(\\d)"; // Any Single Digit 2
string re3="(\\d)"; // Any Single Digit 3
string re4="(\\d)"; // Any Single Digit 4
string re5="(-)"; // Any Single Character 1
string re6="((?:[a-z][a-z]*[0-9]+[a-z0-9]*))"; // Alphanum 1
The thing I am having difficulty with is combining it into one expression instead of 6.
I know I can do:
Regex r = new Regex(re1+re2+re3+re4+re5+re6,RegexOptions.IgnoreCase|RegexOptions.Singleline);
However, my OCD cringes at this method :)
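You can use the expression \d{4}-\w{3}: 4 digits, followed by -, followed by 3 alphanumeric characters. Note that \w also matches the underscore; use [a-zA-Z0-9]{3} if only letters and digits are allowed. Here is a good site to test and learn about regular expressions.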
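A quick way to try the expression outside C# is a small Python sketch like the one below. The HTML snippet is made-up sample input, and STRICT is an assumed stricter variant that excludes the underscore \w would otherwise accept.

```python
import re

# Four digits, a hyphen, then three "word" characters, as in the answer.
# Note: \w also accepts '_'; [A-Za-z0-9] is the stricter spelling.
LOOSE = re.compile(r'\d{4}-\w{3}')
STRICT = re.compile(r'\d{4}-[A-Za-z0-9]{3}')

html = '<td class="code">1234-a1b</td><td>9999-x_z</td>'  # made-up sample input

print(LOOSE.findall(html))
print(STRICT.findall(html))
```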
I'm currently using the pattern: \b\d+\b, testing it with these entries:
numb3r
2
3454
3.214
test
I only want it to catch 2 and 3454. It works great for catching number words, except that the word boundary (\b) treats "." as a word separator, so the 3 and the 214 in 3.214 are matched too. I tried excluding the period, but had trouble writing the pattern.
Basically I want to remove integer words, and just them alone.
All you want is the below regex:
^\d+$
Similar to manojlds's answer, but this also allows an optional negative/positive sign:
var regex = /^[-+]?\d+$/;
EDIT
If you don't want to allow zeros in the front (023 becomes invalid), you could write it this way:
var regex = /^[-+]?[1-9]\d*$/;
EDIT 2
As @DmitriyLezhnev pointed out, if you want to allow the number 0 to be valid by itself but still invalid when in front of other numbers (example: 0 is valid, but 023 is invalid), then you could use
var regex = /^([+-]?[1-9]\d*|0)$/
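To compare the three variants side by side, here is a small Python sketch. The constant and function names are invented for this example; the patterns themselves are the ones above.

```python
import re

# The three variants from this answer, written as Python raw strings.
ANY_INTEGER = r'^[-+]?\d+$'                  # leading zeros allowed
NO_LEADING_ZERO = r'^[-+]?[1-9]\d*$'         # rejects 023, but also rejects 0
ZERO_OR_NO_LEADING = r'^([+-]?[1-9]\d*|0)$'  # 0 alone is fine, 023 is not


def is_match(pattern, s):
    # fullmatch guards against Python's '$' also matching before a trailing newline.
    return re.fullmatch(pattern, s) is not None


for s in ['42', '-7', '0', '023', '3.214', 'numb3r']:
    print(s, is_match(ANY_INTEGER, s), is_match(NO_LEADING_ZERO, s),
          is_match(ZERO_OR_NO_LEADING, s))
```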
You could use lookarounds instead if the integers only need to be delimited by whitespace (or by the start/end of the string):
(?<=\s|^)\d+(?=\s|$)
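Not every engine accepts an alternation of different widths inside a lookbehind. Python's built-in re module, for one, requires fixed-width lookbehind, so the sketch below swaps in the negative lookarounds (?<!\S) and (?!\S), meaning "not preceded/followed by a non-whitespace character", which express the same condition. The variable names are made up for the example.

```python
import re

# Equivalent of (?<=\s|^)\d+(?=\s|$) using negative lookarounds:
# not preceded and not followed by a non-whitespace character.
STANDALONE_INT = re.compile(r'(?<!\S)\d+(?!\S)')

entries = "numb3r\n2\n3454\n3.214\ntest"  # the test entries from the question

print(STANDALONE_INT.findall(entries))
# Removing the integer words, as the question asks:
print(STANDALONE_INT.sub('', entries).split())
```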
This allows only positive integers (leading zeros are accepted).
^[0-9]*[1-9][0-9]*$
I would add this as a comment to the other good answers, but I need more reputation to do so. Be sure to allow for scientific notation if necessary, i.e. 3e4 = 30000. This is default behavior in many languages. I found the following regex to work:
/^[-+]?\d+([Ee][+-]?\d+)?$/;
//          ^^             If 'e' is present to denote exp notation, get it
//             ^^^^^       along with optional sign of exponent
//                  ^^^    and the exponent itself
//        ^            ^^  The entire exponent expression is optional
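Here is a short Python sketch for trying the pattern on a few inputs. The helper name looks_like_integer is hypothetical; the regex is the one above.

```python
import re

# The answer's pattern: optional sign, digits, optional exponent part.
SCI_INT = re.compile(r'^[-+]?\d+([Ee][+-]?\d+)?$')


def looks_like_integer(s):
    """Hypothetical helper: True if s is an integer literal, optionally with an exponent."""
    return SCI_INT.fullmatch(s) is not None


for s in ['30000', '3e4', '-12E+3', '3e', '3.2e4', 'e4']:
    print(s, looks_like_integer(s))
```

Note that the pattern also accepts negative exponents such as 3e-4, which is not an integer value; drop the - from the exponent's sign class if that matters.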
This solution matches integers:
Negative integers are matched (-1,-2,etc)
Single zeroes are matched (0)
Negative zeroes are not (-0, -01, -02)
Empty strings are not matched ('')
/^(0|-?[1-9]+[0-9]*)$/
^([+-]?[0-9]\d*|0)$
will accept numbers with a leading "+", a leading "-" and leading zeros
Try /^(?:-?[1-9]\d*$)|(?:^0)$/.
It matches positive and negative numbers as well as zero.
It doesn't match input like 00, -0, +0, -00, +00, 01.
Online testing available at http://rubular.com/r/FlnXVL6SOq
^[-+]?[1-9][0-9]*$
An optional leading - or + (zero or one time), then a non-zero digit (because there is no such thing as -0 or +0), followed by any number of digits from 0 to 9.
This worked in my case where I needed positive and negative integers that should NOT include zero-starting numbers like 01258 but should of course include 0
^(-?[1-9]+\d*)$|^0$
Example of valid values:
"3",
"-3",
"0",
"-555",
"945465464654"
Example of not valid values:
"0.0",
"1.0",
"0.7",
"690.7",
"0.0001",
"a",
"",
" ",
".",
"-",
"001",
"00.2",
"000.5",
".3",
"3.",
" -1",
"+100",
"--1",
"-.1",
"-0",
"00099",
"099"
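The two lists above can be checked mechanically. This is a minimal Python sketch that runs the pattern from this answer against both lists; is_integer is a made-up helper name.

```python
import re

# The answer's pattern, unchanged.
INTEGER = re.compile(r'^(-?[1-9]+\d*)$|^0$')

valid = ["3", "-3", "0", "-555", "945465464654"]
invalid = ["0.0", "1.0", "0.7", "690.7", "0.0001", "a", "", " ", ".", "-",
           "001", "00.2", "000.5", ".3", "3.", " -1", "+100", "--1", "-.1",
           "-0", "00099", "099"]


def is_integer(s):
    # fullmatch avoids '$' matching just before a trailing newline.
    return INTEGER.fullmatch(s) is not None


print(all(is_integer(s) for s in valid))        # does every valid example match?
print(not any(is_integer(s) for s in invalid))  # does no invalid example match?
```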
The text looks like this:
"Beginning. 1. The container is 1.5 meters long 2. It can hold up to 2lt of fluid. 3. It has 4 holes."
There may not be a dot at the end of each list element.
How can I split this text into a list as shown below?
"Beginning."
"The container is 1.5 meters long"
"It can hold up to 2lt of fluid."
"It has 4 holes."
In other words I need to match (\d+)\. such that all (\d+) are consecutive integers so that I can split and trim the text between them. Is it possible with regex? How far do I have to venture into the realm of computer science?
Use
\d+\.(?!\d)
as the splitting regex, e.g. in PHP:
$result = preg_split('/\d+\.(?!\d)/', $subject);
The negative lookahead (?!\d) ensures that no digit follows after the dot has been matched.
Or make the spaces mandatory - if that's an option:
$result = preg_split('/\s+\d+\.\s+/', $subject);
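Both variants carry over to other regex flavours. As a sketch, here is the same split in Python using re.split, with the sample sentence from the question (the variable names are made up). The first variant leaves the surrounding spaces in place, so the pieces are stripped afterwards.

```python
import re

text = ("Beginning. 1. The container is 1.5 meters long 2. It can hold up "
        "to 2lt of fluid. 3. It has 4 holes.")

# Variant 1: digits plus a dot that is not followed by another digit.
parts = [p.strip() for p in re.split(r'\d+\.(?!\d)', text)]

# Variant 2: require whitespace around the list marker.
parts_ws = re.split(r'\s+\d+\.\s+', text)

for p in parts:
    print(p)
```

Neither variant checks that the numbers are consecutive; they only avoid splitting inside decimals such as 1.5.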
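This is working C# code:
// Needs: using System; using System.Text.RegularExpressions;
string s = "Beginning. 1. The container is 1.5 meters long 2. It can hold up to 2lt of fluid. 3. It has 4 holes.";
// The @ prefix makes a verbatim string, so the backslashes reach the regex engine as-is.
string[] res = Regex.Split(s, @"\s*\d+\.\s+");
foreach (var r in res)
{
    Console.WriteLine(r);
}
Console.ReadLine();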
I split on \s*\d+\.\s+, which means optional whitespace, followed by at least one digit, followed by a dot, then at least one whitespace character.