matlab: truncate large text and append '...' - regex

I have a large array of text (text, stored as a cell array) that I want to truncate in MATLAB, say to 5 characters. Truncating with regexprep is quite efficient, but I would also like to append '...' to the end of every truncated string (and only there).
(How) can this be achieved within MATLAB's regexprep?
>> text = {'123456780','1','12'}; %<- small representative sample
>> regexprep(text,'(^.{0,5})(.*)','$1') %capture first 5 characters or less in first group (and replace the text with first group captures)
ans =
1×3 cell array
{'12345'} {'1'} {'12'}
it should read:
ans =
1×3 cell array
{'12345...'} {'1'} {'12'}

You need to use
regexprep(text,'^(.{5}).+','$1...')
The main point is that you only need to trigger the replacement if a string is longer than five chars (otherwise there is nothing to truncate).
Note that regexprep returns the input string as is if there was no regex match found, thus you do not need to worry about strings that are zero to five chars long.
Details:
^ - start of string
(.{5}) - Capturing group 1 ($1): any five chars
.+ - any one or more chars, as many as possible.
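The same pattern can be tried outside MATLAB. Below is a minimal sketch in Python, not MATLAB code: the helper name `truncate` and its `keep` parameter are made up for illustration, and Python writes the backreference as `\1` where regexprep uses `$1`. As with regexprep, `re.sub` returns the input unchanged when nothing matches.

```python
import re

def truncate(s, keep=5):
    # Hypothetical helper: the pattern only matches strings LONGER than
    # `keep` characters, so shorter strings come back untouched.
    return re.sub(r'^(.{%d}).+' % keep, r'\1...', s)

text = ['123456780', '1', '12']  # sample from the question
truncated = [truncate(s) for s in text]
print(truncated)  # ['12345...', '1', '12']
```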

Note that the string 12345... is in fact 8 characters long. You don't want to make the mistake of truncating 1234567 to 12345..., as the truncated version is longer and therefore shouldn't be truncated in the first place.
A solution that takes this into account is:
regexprep(text,'^(.{5}).{3}.+','$1...')
which will only truncate if there are more than 8 characters and, if so, will display the first 5 with the trailing ellipsis.
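To see where the cut-off falls, here is a small Python sketch of the same pattern (an illustration only; the function name `truncate_if_shorter` is hypothetical, and Python uses `\1` instead of `$1`). Strings of 7 or 8 characters are left alone because the ellipsis form would not be shorter.

```python
import re

def truncate_if_shorter(s):
    # 5 kept chars + 3 chars the ellipsis would occupy + at least 1 more,
    # so only strings of 9 or more characters are rewritten.
    return re.sub(r'^(.{5}).{3}.+', r'\1...', s)

samples = ['1234567', '12345678', '123456789']
results = [truncate_if_shorter(s) for s in samples]
print(results)  # ['1234567', '12345678', '12345...']
```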

Related

extract text data using regexp in MATLAB

I'm extracting visibility data from METAR (airport weather observation data).
Visibility is a 4-digit (0~9) value, and can also be expressed as 'CAVOK' when visibility is good.
But it's quite tricky to do with regexp. (METAR data have many variations.)
Data sample(MET_VIS) below:
201903072300 METAR RKPC 072300Z 17003KT 110V210 CAVOK 05/02 Q1026 NOSIG=
201903062000 METAR RKPC 062000Z 33018G29KT 4000 BR FEW012 SCT025 08/04 Q1018 WS R13 R31 NOSIG=
201903062200 METAR RKPC 062200Z 33015KT 290V350 9999 SCT030 07/03 Q1019 NOSIG=
201903080000 METAR RKPC 080000Z 29002KT CAVOK 08/02 Q1027 NOSIG=
I want to extract CAVOK, 4000, 9999, CAVOK on each line.
I tried but this code doesn't work with line 3 :( It returns blank.
regexp(MET_VIS(i),'((?<=KT\s)\d{4})|CAVOK','match')
In the third line, the value does not directly follow KT. What you might do is add a second positive lookbehind that checks for KT, a space, exactly 7 characters from A-Z0-9 (such as 290V350), and a whitespace char before the match.
Then you match either 4 digits or CAVOK using an alternation (?:\d{4}|CAVOK); alternatively, you could match CAVOK anywhere in the string.
Add a word boundary after it to prevent the match from being part of a larger word.
(?:(?<=KT\s)|(?<=KT [A-Z0-9]{7}\s))(?:\d{4}|CAVOK)\b
You could also make an assumption about the range of "words" from the end your target should be allowed to occur in. For example:
/\b(?:\d{4}|CAVOK)\b(?=(?: \S+){3,9}$)/gm
Here we're looking for a four-digit number or the word CAVOK, but only if it is followed by 3 to 9 space-separated non-space substrings of variable length until the end of the line.
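Both suggested patterns can be checked against the four sample lines. The sketch below uses Python's `re` module rather than MATLAB's regexp, so treat it as an illustration of the patterns only; the names `LOOKBEHIND`, `FROM_END` and `visibility` are made up here. Each lookbehind is fixed-width, which Python requires.

```python
import re

MET_VIS = [
    "201903072300 METAR RKPC 072300Z 17003KT 110V210 CAVOK 05/02 Q1026 NOSIG=",
    "201903062000 METAR RKPC 062000Z 33018G29KT 4000 BR FEW012 SCT025 08/04 Q1018 WS R13 R31 NOSIG=",
    "201903062200 METAR RKPC 062200Z 33015KT 290V350 9999 SCT030 07/03 Q1019 NOSIG=",
    "201903080000 METAR RKPC 080000Z 29002KT CAVOK 08/02 Q1027 NOSIG=",
]

# Pattern 1: value directly after "KT ", or after "KT " plus one 7-char group.
LOOKBEHIND = re.compile(r'(?:(?<=KT\s)|(?<=KT [A-Z0-9]{7}\s))(?:\d{4}|CAVOK)\b')
# Pattern 2: value followed by 3 to 9 space-separated tokens up to line end.
FROM_END = re.compile(r'\b(?:\d{4}|CAVOK)\b(?=(?: \S+){3,9}$)')

def visibility(line, pattern):
    # Return the first match, or None if the line has no visibility group.
    m = pattern.search(line)
    return m.group(0) if m else None

by_lookbehind = [visibility(line, LOOKBEHIND) for line in MET_VIS]
by_position = [visibility(line, FROM_END) for line in MET_VIS]
print(by_lookbehind)  # ['CAVOK', '4000', '9999', 'CAVOK']
```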

Postgres Regex match the first 11 char unless there is a space or dash

12345678-1
12345678 1356456456456
221345243545634563546
Using the above strings, I am trying to match the first 11 characters unless there is a dash or a space, in which case I want to grab everything (only the first 8) up until the space/dash...
I have tried ^(.*?)- which grabs the first 8 of the first string only (as expected), and ^(.*?) followed by a space, which rightfully grabs the first 8 of the second string. But ^(.*?)(-| ) doesn't work. Nor does ([0-9]{8,11}), as this just skips over the space and includes the extra bits...
How can I only pull the first 11 numbers unless there is a dash or space, then pull everything up to the dash/ space (fixed 8 chars)?
Add an anchor to start:
^[0-9]{8,11}
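The reason this works: the quantifier stops at the first non-digit, so a dash or space ends the match after 8 digits, while an unbroken run is capped at 11. A quick check in Python (used here only to exercise the pattern; in Postgres the same pattern could presumably be passed to something like `substring(col from '^[0-9]{8,11}')`, where `col` is a placeholder column name):

```python
import re

PREFIX = re.compile(r'^[0-9]{8,11}')

def leading_digits(s):
    # Return 8 to 11 leading digits, or None if the string does not
    # start with at least 8 digits.
    m = PREFIX.match(s)
    return m.group(0) if m else None

samples = ['12345678-1', '12345678 1356456456456', '221345243545634563546']
prefixes = [leading_digits(s) for s in samples]
print(prefixes)
```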

Trying to replace text string in notepad++ with wildcards

I am trying to replace text in a kicad program using notepad++. I am having trouble using wild cards.
This string I am trying to find is one similar to this...
(fp_text reference J2 (at -8.30084 1.4004 270)
J2 is a wild card, but will not be changed (it can be anywhere from 2 to 5 characters long)
-8.30084 can be any number that I want to change to zero
1.4004 can be any number that I want to change to zero
270 will not change, no matter what the number is.
In the end, I want the string to be
(fp_text reference J2 (at 0 0 270)
If I understand correctly, you're looking for a regex to match that and replace the first and second (but not the third) number with 0. Without knowing what the valid characters are for the token you have as J2, I'll assume that it's any non-space character.
You can reference a capture group within your replacement string, so you can capture the parts you want to preserve. (In the example below I also capture other unknown parts of the string, but that's not really necessary.)
The regex should be something like:
(\S)\s\(at ([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+)\)
And your replacement will be something like:
\1 (at 0 0 \4)
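As a sanity check before running the replacement on a real file, the same search and replace strings can be tried in Python. This is only a sketch of the regex behaviour, not of Notepad++ itself; the function name `zero_position` is hypothetical. Note that `(\S)` captures just the last character of the reference (the `2` of `J2`), which is enough because everything before the match is left untouched.

```python
import re

PATTERN = re.compile(
    r'(\S)\s\(at ([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+)\)'
)

def zero_position(line):
    # Keep group 1 (end of the reference) and group 4 (the third number);
    # the first two numbers are replaced with 0.
    return PATTERN.sub(r'\1 (at 0 0 \4)', line)

before = '(fp_text reference J2 (at -8.30084 1.4004 270)'
after = zero_position(before)
print(after)  # (fp_text reference J2 (at 0 0 270)
```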

Perl regex match based on no. of chars

I get input string like:
A BC Y
Or
A BCY
The point being, it is position based,
i.e I have to parse First 1 char as one string, next 7 as another string, next 1 as another string and the tricky part being last one string as another string (Which is optional in input)
i.e input line length can be 9 chars or 10 chars.
I am supposed to parse this and get 4 Strings.
later I will put these strings in Database and do further processing.
I am using regex like
s/(.{1})(.{7})(.{1})(.{1})/
And copying this values in 4 variables.
But the problem is it works only when the length of line is exactly 10 chars (When we have last char).
When length is 9 chars (last optional char Y is missing) Then the regex does not match the line and thus no parsing.
Long story short, How can I modify the regex to make the last 1 char optional for parsing.
Thanks in advance.
P.S: Question may sound very trivial to experts, But....
You could almost certainly have solved this for yourself by reading either the perlre or perlretut manual pages.
As others have pointed out, the ? marks a regex atom as being optional. You can also simplify your regex by omitting all of the {1} sequences.
/(.)(.{7})(.)(.)?/
Use ? for optional (0 or 1) match
/(.{1})(.{7})(.{1})(.{1})?/
Or more concisely
/(.)(.{7})(.)(.)?/
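To illustrate what the optional group yields, here is a small sketch in Python rather than Perl (the regex syntax is the same for this pattern). The two input lines are invented fixed-width samples of 10 and 9 characters, and `parse_line` is a hypothetical name. When the last character is absent, the fourth group is simply empty (`None` in Python, undefined in Perl).

```python
import re

FIELDS = re.compile(r'(.)(.{7})(.)(.)?')

def parse_line(line):
    # Returns a 4-tuple; the last element is None for 9-character lines.
    m = FIELDS.match(line)
    return m.groups() if m else None

with_last = parse_line('ABCDEFGHXY')    # 10 characters
without_last = parse_line('ABCDEFGHX')  # 9 characters
print(with_last)     # ('A', 'BCDEFGH', 'X', 'Y')
print(without_last)  # ('A', 'BCDEFGH', 'X', None)
```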

Need to capture single character, but ignore digit

I'm parsing out flight info.
Here's the sample data:
E0.777 7 3:09
E0.319 N 1:43
E0.735 8 1:45
E0.735 N 1:48
E0.M80 9 3:21
E0.733 1:48
I need to populate fields like this:
Equipment: 735
On Time: N
Duration: 1:48
Problem I'm having is capturing the Y or N character but ignoring the single digit, then capturing the duration.
This is the expression I have tried:
#"^.{3}(.{3})\s?([N|Y]?)?(?:[0-9]\s+)?(\w{4})"
Edit: I updated the sample data to clarify my question. Equipment is not always three digits, it could be a character and two digits. The data between the equipment and the duration could be a boolean N or Y, a single digit, or white space. Only the boolean should be captured.
Firstly, you mix up the concepts of alternation and character classes: [Y|N] would match 3 different characters: Y or | or N. Either use (...) or leave out the pipe.
Secondly, your double ? after the character class does not really do anything. Thirdly, at the end you only match consecutive spaces if a digit was found. But if there is no digit, the last ? will ignore the subpattern, thus not allowing spaces either.
Lastly, \w does not match :.
Try this:
#"^.{3}(\d{3})\s?(?:([NY])|\d)\s+(\d:\d\d)"
You should also think about restricting the repeated . at the beginning to a more precise character class (e.g. \w{2}\., but I don't know the possibilities there).
#"^..\.(\d{3})\s(?:([YN])|\d)\s*(\S{4})"
Changed .{3} to ..\. which is a bit more specific about there being a literal . for character 3.
(?:([YN])|\d) matches either Y/N or a digit, but only captures a Y or N. Notice that it's [YN] not [Y|N].
Changed \w{4} to \S{4} since \w doesn't match colons :.
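The two points above (the pipe inside a character class, and capturing only Y/N inside a non-capturing alternation) can be checked quickly. This is a sketch in Python rather than the asker's language; the names `pipe_matches`, `PATTERN` and `parsed` are made up, and the rows are taken from the sample data.

```python
import re

# Inside a character class the pipe is a literal, so [Y|N] also accepts '|'.
pipe_matches = re.fullmatch(r'[Y|N]', '|') is not None

PATTERN = re.compile(r'^..\.(\d{3})\s(?:([YN])|\d)\s*(\S{4})')
rows = ['E0.319 N 1:43', 'E0.735 8 1:45']

# Group 2 is filled only when the flag is Y or N; a digit is consumed
# by the alternation but not captured, leaving group 2 as None.
parsed = [PATTERN.match(row).groups() for row in rows]
print(parsed)  # [('319', 'N', '1:43'), ('735', None, '1:45')]
```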
This will do it...
^\w\d\.(\d{3})\s(?:([YN])|\d)\s*(\d:\d{2})$
I made some other changes to your regex because it was easier for me to just rewrite it based on your data than to try to modify what you had.
This will capture the Y or N or it won't capture anything in that group. I also tried to be more specific with your duration regex.
Update: This works with your new requirements...
^\w\d\.(\w{3})\s(?:([YN])|\d|\s)\s*(\d:\d{2})$
You can see it working on your data here... http://regexr.com?32j1b
(hover over each line to see the matched groups)
This captures all lines with Y or N and ignores everything else:
^...(\d{3})\s*([YN])\s*(\d+:\d+)