matlab regexp exclude specific set of file extensions [duplicate] - regex

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 2 years ago.
I want to exclude a set of file extensions and otherwise list folder contents.
%get filenames in current directory
p = dir(pwd);
p = {p.name};
p = p(1:min(end,20))'
%construct regular expression
%exclude = {'ini','m'}; %just for your convenience
reg = '\.(^ini|m)$';
%actually print file names/paths of files without a certain extension
regexpi(p,reg,'match','once')
This, however, does not work. How can I list the files that do not have these extensions (the last X characters of the path)? I tried [^abc], but that excludes individual characters, which I don't want. Please use regexp or regexprep in your answer.

You wrote:
reg = '\.(^ini|m)$';
The ^ caret anchor comes after a . dot,
so it will never match start-of-string.
Remove it FTW:
reg = '\.(ini|m)$';
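Note that the corrected pattern selects the .ini and .m files rather than excluding them. One way to get the complement, sketched here in Python rather than MATLAB, is to keep only the names the pattern does not match. The file names and the helper `without_extensions` are made up for illustration.

```python
import re

# Same pattern as the corrected answer: a dot, then "ini" or "m", at the end of the name.
EXCLUDED = re.compile(r'\.(ini|m)$', re.IGNORECASE)

def without_extensions(names):
    """Return the names whose extension is NOT matched by EXCLUDED."""
    return [name for name in names if not EXCLUDED.search(name)]

# Hypothetical directory listing, for illustration only.
names = ['setup.ini', 'run.m', 'data.mat', 'notes.txt', 'README']
kept = without_extensions(names)
print(kept)
```

In MATLAB the same idea would presumably be a logical mask over the regexpi result, for example p(cellfun(@isempty, regexpi(p, reg, 'match', 'once'))), though that line is untested here.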

Related

Pattern matching using regex in grep at end of line [duplicate]

This question already has answers here:
RegEx to match full string
(4 answers)
Closed 3 years ago.
I have a file, let's say abc.txt, which contains below kind of data:
AB8PDSYU_DFRH
AB8PDSPO_RET
AB8PDSYT_DPRO
AB0PDSTR_GHRJT
AB0PDSQW_GTJY
My expected output is just to be in format A{either B0 or B8}PDS{exactly 2 char}_{exactly 4 char}, as per this rule, my output should be only:
AB8PDSYU_DFRH
AB8PDSYT_DPRO
AB0PDSQW_GTJY
I am using the below grep command:
grep -E '^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}' abc.txt
and getting output:
AB8PDSYU_DFRH
AB8PDSYT_DPRO
AB0PDSTR_GHRJT
AB0PDSQW_GTJY
I specified [[:alpha:]]{4}, which I expected to match exactly 4 letters. However, it does not behave that way: AB0PDSTR_GHRJT also appears in the output.
Please let me know what I am missing here.
You need a way to say that nothing more may follow the match; otherwise the pattern matches just the beginning of a longer line. Use $ to mark the end of the line, or [[:space:]] (equivalent to \s) to require whitespace after the match.
I'm no expert in grep, so depending on whether it treats the input as multiline, one of these should work:
^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}$
^A(B0|B8)PDS[[:alpha:]]{2}_[[:alpha:]]{4}($|[[:space:]])
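The effect of the end anchor can be checked outside grep. The following is a minimal Python sketch using the sample lines from the question; Python's re module has no POSIX [[:alpha:]] class, so [A-Za-z] is used as a stand-in, which is an assumption about the intended alphabet.

```python
import re

# Sample data copied from the question.
lines = ['AB8PDSYU_DFRH', 'AB8PDSPO_RET', 'AB8PDSYT_DPRO',
         'AB0PDSTR_GHRJT', 'AB0PDSQW_GTJY']

# Without "$" the pattern only has to match a prefix of the line.
unanchored = re.compile(r'^A(B0|B8)PDS[A-Za-z]{2}_[A-Za-z]{4}')
# With "$" nothing may follow the four letters.
anchored = re.compile(r'^A(B0|B8)PDS[A-Za-z]{2}_[A-Za-z]{4}$')

loose = [s for s in lines if unanchored.search(s)]
strict = [s for s in lines if anchored.search(s)]

print(loose)   # still includes AB0PDSTR_GHRJT
print(strict)  # only the lines with exactly four trailing letters
```

With grep itself, the -x (--line-regexp) option is another way to require that the whole line matches.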

How to replace this string with re.sub? [duplicate]

This question already has answers here:
python regular expression replacing part of a matched string
(5 answers)
Closed 4 years ago.
Assume two strings, foo.example.com/1.2.3/sdk-foo-bar.min.js and foo.example.com/1.2.3/sdk-foo-bar-dev.min.js.
By default, the first one is used in the HTML code, but depending on a parameter, I need to replace it with the second (i.e. add the -dev).
I already have a regex (foo\.example\.com/1\.2\.3/(sdk-foo-bar)\.min\.js) that finds the string and captures the group sdk-foo-bar, but how can I now replace this group with sdk-foo-bar-dev?
import re

inp = 'foo.example.com/1.2.3/sdk-foo-bar.min.js'
# Group 1: everything before the suffix; group 2: the literal ".min.js" suffix
m = re.search(r'^(.*)(\.min\.js)$', inp)
if m:
    print('%s-%s%s' % (m.group(1), 'dev', m.group(2)))
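Since the question asks about re.sub specifically, here is a minimal sketch of the same replacement done with it. It assumes the capture groups are moved so that the text to keep on either side of the insertion point is captured, which differs slightly from the grouping in the question's regex.

```python
import re

inp = 'foo.example.com/1.2.3/sdk-foo-bar.min.js'

# Group 1: everything up to and including "sdk-foo-bar"; group 2: the ".min.js" suffix.
pattern = re.compile(r'(foo\.example\.com/1\.2\.3/sdk-foo-bar)(\.min\.js)')

# \g<1> and \g<2> put the captured text back around the new "-dev" part.
out = pattern.sub(r'\g<1>-dev\g<2>', inp)
print(out)
```

The \g<1> form is used instead of \1 so that the group reference cannot be misread if the inserted text ever starts with a digit.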

I have two regex in .net preRegex = "(?<=(^|[^A-Za-z0-9]+))" postRegex = "(?=([^A-Za-z0-9]+)|$)" . What is the alternate of it in python? [duplicate]

This question already has answers here:
Python Regex Engine - "look-behind requires fixed-width pattern" Error
(3 answers)
Closed 5 years ago.
I have two regex strings in .net
preRegex = "(?<=(^|[^A-Za-z0-9]+))"
postRegex = "(?=([^A-Za-z0-9]+)|$)"
I want their equivalent in Python. Say I have a string
s="I am in aeroplane."
I need to replace "aeroplane" with "aeroPlain", so I want to build a regex like
newKey = preRegex + "aeroplane" + postRegex
pattern = re.compile(newKey,re.IGNORECASE)
The new regex string looks like
(?<=(^|[^A-Za-z0-9]+))aeroplane(?=([^A-Za-z0-9]+)|$)
but it gives me the error "look-behind requires fixed-width pattern".
I am new to Python. Help would be appreciated. Thanks
You can use the following regex:
(^|[^A-Za-z0-9]+)aeroplane([^A-Za-z0-9]+|$)
When you replace, use backreferences to the first and second groups to put back the text they matched.
The replacement string will be something like '\1aeroPlain\2'.
For more information on backreferences in Python: https://docs.python.org/3/howto/regex.html
Good luck!
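A minimal sketch of that suggestion in Python follows. The variable names are illustrative, and the second pattern is an alternative not mentioned in the answer: it uses fixed-width negative lookarounds, which Python's re module does accept.

```python
import re

# The answer's pattern: the delimiters are captured instead of looked around.
pattern = re.compile(r'(^|[^A-Za-z0-9]+)aeroplane([^A-Za-z0-9]+|$)', re.IGNORECASE)

s = "I am in aeroplane."
# \1 and \2 restore the delimiters that the match consumed.
result = pattern.sub(r'\1aeroPlain\2', s)
print(result)

# Alternative (an assumption, not from the answer): single-character negative
# lookarounds are fixed-width, so nothing around the word is consumed.
alt = re.compile(r'(?<![A-Za-z0-9])aeroplane(?![A-Za-z0-9])', re.IGNORECASE)
print(alt.sub('aeroPlain', 'aeroplane aeroplane'))
```

One difference worth knowing: because the capturing version consumes the delimiter after a match, two occurrences separated by a single delimiter are not both replaced, whereas the lookaround version handles that case.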

How to find a specific string followed by a number, with any number of characters between? [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 6 years ago.
I'm trying to write a regex for the following pattern:
[MyLiteralString][0 or more characters without restriction][at least 1 digit]
I thought this should do it:
(theColumnName)[\s\S]*[\d]+
As it looks for the literal string theColumnName, followed by any number of characters (whitespace or otherwise), and then at least one digit. But this matches more than I want, as you can see here:
https://www.regex101.com/r/HBsst1/1
(EDIT) Second set of more complex data - https://www.regex101.com/r/h7PCv7/1
Using the sample data in that link, I want the regex to identify the two occurrences of theColumnName] VARCHAR(10) and nothing more.
I have 300+ SQL scripts containing create statements for every type of database object: procedures, tables, triggers, indexes, functions -- everything. Because of that, I can't be too strict with my regex.
A stored procedure's file might include text like LEFT(theColumnName, 10) which I want to identify.
A create table statement would be like theColumnName VARCHAR(12).
So it needs to be very flexible as the number(s) isn't always the same. Sometimes it's 10, sometimes it's 12, sometimes it's 51 -- all kinds of different numbers.
Basically, I'm looking for the regex equivalent of this C# code:
//Get file data
string[] lines = File.ReadAllLines(filePath);
//Let's assume the first line contains 'theColumnName'
int theColumnNameIndex = lines[0].IndexOf("theColumnName");
if (theColumnNameIndex >= 0)
{
    //Get the text following 'theColumnName'
    string temp = lines[0].Remove(0, theColumnNameIndex + "theColumnName".Length);
    //Iterate over our substring
    foreach (char c in temp)
    {
        if (Char.IsDigit(c))
        {
            //do a thing
        }
    }
}
(theColumnName).*?[\d]+
That'll make it stop capturing after the first number it sees.
The difference between * and *? is about greediness vs. laziness. .*\d for example would match abcd12ad4 in abcd12ad4, whereas .*?\d would have its first match as abcd1. Check out this page for more info.
Btw, if you don't want to match newlines, use a . (period) instead of [\s\S], since by default . does not match a newline.
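The greedy versus lazy difference can be seen directly with a short Python sketch. The first two searches use the abcd12ad4 example from the answer; the SQL fragment is a made-up sample, not the data behind the regex101 links.

```python
import re

text = 'abcd12ad4'
greedy = re.search(r'.*\d', text).group()   # grabs as much as possible, then backs off to a digit
lazy = re.search(r'.*?\d', text).group()    # stops at the first digit it can reach
print(greedy, lazy)

# Hypothetical SQL fragment, for illustration only.
sql = "[theColumnName] VARCHAR(10) NOT NULL, [other] INT, x VARCHAR(99)"

# The question's pattern runs on to the last digit in the text.
too_much = re.search(r'(theColumnName)[\s\S]*[\d]+', sql).group()
# The answer's pattern stops after the first number.
just_enough = re.search(r'(theColumnName).*?[\d]+', sql).group()
print(too_much)
print(just_enough)
```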

Regex in R: match everything but not "some string" [duplicate]

This question already has answers here:
How can I remove all objects but one from the workspace in R?
(14 answers)
Remove all punctuation except apostrophes in R
(4 answers)
Closed 9 years ago.
The answers to another question explain how to match a string not containing a word.
The problem (for me) is that the solutions given don't work in R.
Often I create a data.frame() from existing vectors and want to clean up my workspace. So for example, if my workspace contains:
> ls()
[1] "A" "B" "dat" "V"
>
and I want to retain only dat, I'd have to clean it up with:
> rm(list=ls(pattern="A"))
> rm(list=ls(pattern="B"))
> rm(list=ls(pattern="V"))
> ls()
[1] "dat"
>
(where A, B, and V are just examples of a large number of complicated names like my.first.vector that are not easy to match with rm(list=ls(pattern="[ABV]"))).
It would be most convenient (for me) to tell rm() to remove everything except dat, but the problem is that the solution given in the linked Q&A does not work:
> rm(list=ls(pattern="^((?!dat).)*$"))
Error in grep(pattern, all.names, value = TRUE) :
invalid regular expression '^((?!dat).)*$', reason 'Invalid regexp'
>
So how can I match everything except dat in R?
This will remove all objects except dat. (Use the ls argument all.names = TRUE if you want to remove objects whose names begin with a dot as well.)
rm( list = setdiff( ls(), "dat" ) )
Replace "dat" with a vector of names, e.g. c("dat", "some.other.object"), if you want to retain several objects; or, if the several objects can all be readily matched by a regular expression try something like this which removes all objects whose names do not start with "dat":
rm( list = setdiff( ls(), ls( pattern = "^dat" ) ) )
Another approach is to save the data, save("dat", file = "dat.RData"), exit R, start a new R session and load the data, load("dat.RData"). Also note this method of restarting R.
Negative look-around requires the perl=TRUE argument in R, and ls(pattern = ...) offers no way to pass it, so you can't use that regular expression with ls() directly. Alternatively you can do:
rm(list = grep("^((?!dat).)*$", ls(), perl=TRUE, value=TRUE))
This is if you're looking for inexact matches. If you're looking for exact match, you should just do what Ferdinand has commented:
rm(list=ls()[ls() != "dat"])
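The difference between the inexact and the exact approach is easy to miss, so here is a small Python sketch of the two filters side by side. The list of names is hypothetical and stands in for the output of ls(); it is an illustration of the matching logic, not R code.

```python
import re

# Hypothetical workspace names, for illustration only.
names = ['A', 'B', 'dat', 'V', 'dat2', 'my.data']

# Inexact: the tempered negative lookahead rejects any name that
# contains "dat" anywhere, not just the name "dat" itself.
not_containing = [n for n in names if re.fullmatch(r'((?!dat).)*', n)]

# Exact: plain comparison keeps everything except the single name "dat".
not_equal = [n for n in names if n != 'dat']

print(not_containing)
print(not_equal)
```

With the regex version, objects such as dat2 or my.data would survive the cleanup as well, which may or may not be what is wanted.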