Trying to replace a text string in Notepad++ with wildcards - regex

I am trying to replace text in a KiCad program using Notepad++. I am having trouble using wildcards.
The string I am trying to find is similar to this:
(fp_text reference J2 (at -8.30084 1.4004 270)
J2 is a wildcard: it will not be changed, and it can be anywhere from 2 to 5 characters long.
-8.30084 can be any number; I want to change it to zero.
1.4004 can be any number; I want to change it to zero.
270 will not change, no matter what the number is.
In the end, I want the string to be:
(fp_text reference J2 (at 0 0 270)

If I understand correctly, you're looking for a regex to match that and replace the first and second (but not the third) number with 0. Without knowing which characters are valid for the token you have as J2, I'll assume that it's any non-space character.
You can reference a capture group within your replacement string, so you can capture the parts you want to preserve. (In the example below I also capture other parts of the string, but that's not really necessary.)
The regex should be something like:
(\S)\s\(at ([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+)\)
And your replacement will be something like:
\1 (at 0 0 \4)
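As a quick way to check the pattern outside Notepad++, here is a minimal Python sketch that applies the same regex and replacement with `re.sub`. The function name `zero_position` is hypothetical, and Python's `re` module is only standing in for Notepad++'s regex engine, which may differ in details.

```python
import re

# Pattern and replacement copied from the answer above.
PATTERN = re.compile(
    r"(\S)\s\(at ([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+) ([-+]?\d*\.?\d+)\)"
)
REPLACEMENT = r"\1 (at 0 0 \4)"


def zero_position(line: str) -> str:
    """Set the first two numbers after 'at' to 0 and keep the third."""
    return PATTERN.sub(REPLACEMENT, line)


print(zero_position("(fp_text reference J2 (at -8.30084 1.4004 270)"))
# prints: (fp_text reference J2 (at 0 0 270)
```

Because the pattern is not anchored, `(\S)` only needs to capture the last character of the reference token; the rest of the line is left untouched.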

matlab: truncate large text and append '...'

I have a large array of text (stored as a cell array) that I want to truncate in MATLAB, say to 5 characters. Truncating with regexprep is quite efficient, but now I would love to append a '...' at the end of every truncated match (and only there).
(How) can this be achieved within MATLAB's regexprep?
>> text = {'123456780','1','12'}; %<- small representative sample
>> regexprep(text,'(^.{0,5})(.*)','$1') %capture first 5 characters or less in first group (and replace the text with first group captures)
ans =
1×3 cell array
{'12345'} {'1'} {'12'}
It should read:
ans =
1×3 cell array
{'12345...'} {'1'} {'12'}
You need to use
regexprep(text,'^(.{5}).+','$1...')
See the regex demo.
The main point is that you need to trigger the replacement only if a string is longer than five chars (otherwise, you do not need to truncate the string at all).
Note that regexprep returns the input string as is if there was no regex match found, thus you do not need to worry about strings that are zero to five chars long.
Details:
^ - start of string
(.{5}) - Capturing group 1 ($1): any five chars
.+ - any one or more chars, as many as possible.
Note that the string 12345... is in fact 8 characters long. You don't want to make the mistake of truncating 1234567 to 12345..., as the truncated version is longer and therefore shouldn't be truncated in the first place.
A solution that takes this into account is:
regexprep(text,'^(.{5}).{3}.+','$1...')
which will only truncate if there are more than 8 characters and, if so, will display the first 5 with the trailing ellipsis.
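For readers who want to try the two patterns without MATLAB, here is a rough Python sketch of both variants. The function names `truncate` and `truncate_if_shorter` are made up for illustration, and `re.sub` is assumed to be a fair stand-in for `regexprep` here: like `regexprep`, it returns the input unchanged when nothing matches.

```python
import re


def truncate(s: str) -> str:
    """Keep the first 5 characters and append '...' if anything follows them."""
    return re.sub(r"^(.{5}).+", r"\1...", s)


def truncate_if_shorter(s: str) -> str:
    """Truncate only when the 8-character result is shorter than the input."""
    return re.sub(r"^(.{5}).{3}.+", r"\1...", s)


text = ["123456780", "1", "12"]
print([truncate(s) for s in text])
# prints: ['12345...', '1', '12']
print(truncate("1234567"), truncate_if_shorter("1234567"))
# prints: 12345... 1234567
```

The last line shows the difference between the two answers: the first pattern lengthens a 7-character string, the second leaves it alone.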

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break down the name into groups, which are separated by an underscore. I did it like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so good.
Now I need to extract characters from one of the groups. For example, in group 2 I need the first 3 and 8 decimals (keep in mind they could be characters too).
So I tried something like this:
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It will pull the PH into a group, but not the 38. So I'm lost at this point.
Any help would be great.
Try the regex below to match any three letters/digits followed by one digit:
(.*?)_([A-Z0-9]{3}[0-9]{1})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
Try the regex below to match any three letters/digits followed by one letter/digit:
(.*?)_([A-Z0-9]{3}[A-Z0-9]{1})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letters are a constant like "PH", then try the one below:
(.*?)_([PH]+[0-9A-Z]{2})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
I am assuming that you are trying to match group 2 starting with numbers. If that is the case, then you have to change the source string, for example to:
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't show how to do it in yours, but most languages provide a contains or indexOf method that can be used to determine whether a substring exists in a string.
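Since the question does not name a language, the following is only an illustration of the split-and-test idea in Python, where `in` and `str.find` play the role of contains and indexOf. The trailing period of the file name is assumed to be sentence punctuation and is left out.

```python
filename = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS"

# Split on the underscore and inspect the second element.
parts = filename.split("_")
second = parts[1]

print(second)             # prints: PH3843C5
print("38" in second)     # prints: True
print(second.find("38"))  # prints: 2
```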
Using regex alone, however, this can be accomplished using the following regular expression.
See regex in use here
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
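To see which group ends up holding what, here is a small Python sketch that runs the capturing pattern above against the sample file name. Python's `re` is an assumption on my part (the question names no language), and the trailing period of the file name is again treated as sentence punctuation.

```python
import re

filename = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS"

# Pattern copied from the answer: groups 2, 3 and 4 split the second part
# into the text before "38", the "38" itself, and the text after it.
pattern = re.compile(
    r"([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)"
    r"_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)"
)

m = pattern.match(filename)
print(m.group(2), m.group(3), m.group(4))
# prints: PH 38 43C5
```

Note that splitting the second part into three groups shifts the numbering of every later group by two compared with the original pattern.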

Regex allow sequence to end right in the middle

I'm trying to create a regex that would match any string containing only the '0' and '1' char as long as it does not contain the specific sequence "00" or "010".
My idea was something like this
1*(011+)*
But a problem shows up, if the string ended with a 0, like 1110, then it should be valid. With my regex, any 0 has to be followed by two or more ones, but I can't figure out how to make this specific exception.
I need a regex that would force any 0 to be followed by two or more ones OR to end. This would allow sequences ending in 0 and 01, in addition to the obvious "ends with 1" case where, as in "111111", the unauthorized sequences are not present.
How can I "cut off short" a condition in a regex, allowing it to either go on according to my rules, or to end right there?
You can fix your regex like this:
1*(011+)*(01?)?
Add an (optional) group at the end that matches incomplete zero groups, i.e. 0 and 01.
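One way to gain confidence in the fixed pattern is to compare it against a plain substring check on every short binary string. The sketch below does that in Python; the helper names `is_valid` and `brute_force_ok` are hypothetical, and `fullmatch` is used on the assumption that the regex is meant to match the whole string.

```python
import re
from itertools import product

PATTERN = re.compile(r"1*(011+)*(01?)?")


def is_valid(s: str) -> bool:
    """True if the whole string matches the fixed regex."""
    return PATTERN.fullmatch(s) is not None


def brute_force_ok(s: str) -> bool:
    """Reference check: neither forbidden sequence occurs."""
    return "00" not in s and "010" not in s


# Compare both checks on every string of 0s and 1s up to length 10.
mismatches = [
    "".join(bits)
    for n in range(11)
    for bits in product("01", repeat=n)
    if is_valid("".join(bits)) != brute_force_ok("".join(bits))
]

print(is_valid("1110"), is_valid("0110"), is_valid("010"))
# prints: True True False
print(mismatches)
# prints: []
```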

Regex - select all text that does not start with a specific number

I want to get all text that does not start with 1,2,12,34.
I wrote
^((?!1|2|12|34).)*$
(^ asserts position at start of a line)
as in:
https://regex101.com/r/gI6sN8/14
Problems
It also doesn't select text that has 1 or 2 in the middle ("AB 1 CD").
It also doesn't select 13 (because it starts with 1)
How can I restrict it?
Looks like you want this:
^(?!(1|2|12|34)\s).*
https://regex101.com/r/gI6sN8/16
As mentioned in the comments, you need a word boundary and the correct parenthesis position:
^(?!(?:1|2|12|34)\b)(.*)$
Regex Demo
You can also use \D
^(?!(?:1|2|12|34)\D)(.*)$
In your regex
^((?!1|2|12|34).)*$
the negative lookahead is tested at every position in the line, so the match fails as soon as any of the alternatives 1|2|12|34 appears anywhere, not just at the start. That's why it's not matching AB 1 CD.
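The difference between the two patterns can be seen by running both over a few sample lines. This is a small Python sketch; the sample lines are invented for illustration, and Python's `re` is assumed to behave like the regex101 flavour used in the demos for these constructs.

```python
import re

# The question's pattern: the lookahead is repeated before every character.
ORIGINAL = re.compile(r"^((?!1|2|12|34).)*$")
# The word-boundary fix: the lookahead is checked once, at the start only.
FIXED = re.compile(r"^(?!(?:1|2|12|34)\b)(.*)$")

lines = ["1 foo", "2 bar", "12 baz", "34 qux", "13 keep", "AB 1 CD", "hello"]

print([s for s in lines if ORIGINAL.match(s)])
# prints: ['hello']
print([s for s in lines if FIXED.match(s)])
# prints: ['13 keep', 'AB 1 CD', 'hello']
```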
This works
^(?!(?:12?|2|34)(?!\d)).+$
https://regex101.com/r/gI6sN8/19
A valid boundary between the numbers you don't want the line to start with and the character that follows appears to be any non-digit.

Regular Expressions in R

I found somewhat similar questions (R - Select string text between two values; regex for n characters or at least m characters), but I'm still having trouble.
Say I have a string in R:
testing_String <- "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
And I need to be able to pull anything between the first element in the string that contains 2 characters (AK) and PADK,ADK. PADK and ADK will change in character but will always be 4 and 3 characters in length respectively.
So I would need to pull
ADAK NAS
I came up with this, but it's picking up everything from AK to ADK:
^[A-Za-z0_9_]{2}(.*?) +[A-Za-z0_9_]{4}|[A-Za-z0_9_]{3,}
If I understood your question correctly, this should do the trick:
\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b
Demo
You'll have to switch on the perl = TRUE option (to use a decent regex engine).
\b means word boundary. So this pattern looks for a match starting with a 2-letter word and ending with a 4 letter word followed by a 3 letter word. Your value will be in the first group.
Alternatively, you can write the following to avoid using the capturing group:
\b[A-Z]{2}\s+\K.+?(?=\s+[A-Z]{4}\s+[A-Z]{3}\b)
But I'd prefer the first method because it's easier to read.
Lookbehind is supported for perl=TRUE, so this regex will do what you want:
(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})
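The capture-group and lookaround patterns can be checked against the sample string with a short Python sketch. Python's `re` module is only standing in for R's perl = TRUE engine here; it supports the lookbehind and lookahead used above but not \K, so that variant is left out.

```python
import re

testing_string = "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"

# Capture-group variant: the wanted text is in group 1.
grouped = re.search(
    r"\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b", testing_string
)
print(grouped.group(1))  # prints: ADAK NAS

# Lookaround variant: the whole match is the wanted text.
lookaround = re.search(
    r"(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})", testing_string
)
print(lookaround.group(0))  # prints: ADAK NAS
```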