using regular expression in list.files of R function

I want to use list.files of R to list files containing this pattern "un[a digit]" such as filename_un1.txt, filename_un2.txt etc... Here is the general code:
list_files <- list.files(path="my_file_path", recursive = TRUE, pattern = "here I need help", full.names = TRUE)
I have tried putting un\d in the pattern argument, but it does not work.

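Bear in mind that R string literals process escape sequences, so a backslash inside the quotes is interpreted by R before the regex engine ever sees the pattern (R even rejects "un\d" because \d is not a recognized string escape). The regex engine needs to receive a literal \ to form shorthand character classes (like \d for digits) or to escape special chars (like \. to match a literal dot), so each such backslash must be doubled in the string literal: "\\d", "\\.".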
So, you need
pattern = "_un\\d+\\.txt$"
where
_un - matches a literal substring _un
\\d+ - matches 1 or more digits (as + is a one or more quantifier)
\\. - matches a literal dot
txt - matches a literal sequence of characters txt
$ - end of string.
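The same doubling rule applies in any language whose string literals process backslash escapes. As a rough illustration, here is a minimal Java sketch (not R) of that pattern applied to a few made-up file names; the class name UnDigitFilter, the helper keepMatching and the sample names are hypothetical and not part of the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class UnDigitFilter {
    // In Java source, "\\d" reaches the regex engine as \d,
    // just as "\\d" does in an R string literal.
    static final Pattern UN_DIGIT = Pattern.compile("_un\\d+\\.txt$");

    // Keep the names in which the pattern is found; find() searches
    // within the name instead of requiring the whole name to match.
    static List<String> keepMatching(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (UN_DIGIT.matcher(name).find()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "filename_un1.txt", "filename_un22.txt",
                "filename_unx.txt", "filename_un3.csv");
        System.out.println(keepMatching(names)); // [filename_un1.txt, filename_un22.txt]
    }
}
```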

list_files <- list.files(path="my_file_path", recursive = TRUE, pattern = "un[0-9]", full.names = TRUE)

Related

Regex FindAll not printing results Kotlin

I have a program that is using ML Kit to use Text recognition on a document and I am taking that data and only printing the prices. So I am taking the Text Recognition String and passing it through the regex below:
val reg = Regex("\$([0-9]*.[0-9]{2})")
val matches = reg.findAll(rec)
val prices = matches.map{it.groupValues[0]}.joinToString()
recogResult.text = prices
I have tested the regex on another website and it grabs all the right data. However, in my program it prints nothing: after reg.findAll(rec), matches is kotlin.sequences.GeneratorSequence@bd56ff3 and prices is "".

You can use
val reg = Regex("""\$[0-9]*\.[0-9]{2}""")
val matches = reg.findAll("Price: \$1234.56 and \$1.56")
val prices = matches.map{it.groupValues[0]}.joinToString()
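Notes:
"""...""" is a triple quoted (raw) string literal where backslashes are parsed as literal \ chars and are not used to form string escape sequences
\$ - in a triple quoted string literal, this reaches the regex engine as the \$ regex escape that matches a literal $ char. In a regular string literal, "\$" is a string escape that yields a bare $, which the regex engine treats as the end-of-string anchor, so the original pattern could never match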
[0-9]*\.[0-9]{2} matches zero or more digits, . and two digits.
Note that you may use \p{Sc} to match any currency chars, not just $.
If you want to make sure no other digit follows the two fractional digits, add (?![0-9]) at the end of your regex.
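To make the difference between the escaped and the unescaped $ concrete, here is a minimal Java sketch (Java uses the same regex engine as Kotlin on the JVM). The class name PriceFinder, the helper findPrices and the sample text are assumptions made for illustration; the pattern is the one from this answer with the optional (?![0-9]) lookahead appended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceFinder {
    // Regex \$[0-9]*\.[0-9]{2}(?![0-9]); Java string literals need doubled backslashes.
    static final Pattern PRICE = Pattern.compile("\\$[0-9]*\\.[0-9]{2}(?![0-9])");

    // What the regex engine effectively received in the question:
    // a bare $ is the end-of-input anchor, so nothing can follow it.
    static final Pattern UNESCAPED = Pattern.compile("$([0-9]*.[0-9]{2})");

    // Collect every price-like match in the text.
    static List<String> findPrices(String text) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(text);
        while (m.find()) {
            prices.add(m.group());
        }
        return prices;
    }

    public static void main(String[] args) {
        String text = "Price: $1234.56 and $1.56, not $7.891";
        System.out.println(findPrices(text));               // [$1234.56, $1.56]
        System.out.println(UNESCAPED.matcher(text).find()); // false
    }
}
```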

Regex to replace all non numbers but allow a '+' prefix

I want to delete all invalid letters from a string which should represent a phone number. Only a '+' prefix and numbers are allowed.
I tried in Kotlin with
"+1234abc567+".replace("[^+0-9]".toRegex(), "")
It works nearly perfect, but it does not replace the last '+'.
How can I modify the regex to only allow the first '+'?

You could do a regex replacement on the following pattern:
(?<=.)\+|[^0-9+]+
Sample script:
String input = "+1234abc567+";
String output = input.replaceAll("(?<=.)\\+|[^0-9+]+", "");
System.out.println(input); // +1234abc567+
System.out.println(output); // +1234567
Here is an explanation of the regex pattern:
(?<=.)\+ match a literal + which is NOT first (i.e. preceded by >= 1 character)
| OR
[^0-9+]+ match one or more non digit characters, excluding +

You can use
^(\+)|\D+
Replace with the backreference to the first group, $1.
Details:
^(\+) - a + at the start of string captured into Group 1
| - or
\D+ - one or more non-digit chars.
NOTE: a raw string literal delimited with """ allows the use of a single backslash to form regex escapes, such as \D, \d, etc. Using this type of string literals greatly simplifies regex definitions inside code.
In Kotlin:
val s = "+1234abc567+"
val regex = """^(\+)|\D+""".toRegex()
println(s.replace(regex, "$1"))
// => +1234567
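For comparison, the same ^(\+)|\D+ pattern can be sketched in Java, where the backslashes must be doubled because there is no raw string literal in use. The class name PhoneCleaner and the helper clean are hypothetical; the sketch relies on Java's replaceAll inserting nothing for $1 when Group 1 did not take part in the match.

```java
class PhoneCleaner {
    // Keep a leading + (captured in Group 1) and drop every run of non-digits.
    // When the \D+ alternative matches, Group 1 is unset and $1 expands to "".
    static String clean(String raw) {
        return raw.replaceAll("^(\\+)|\\D+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(clean("+1234abc567+")); // +1234567
        System.out.println(clean("12+34"));        // 1234
    }
}
```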

How do I get Groovy String.replace to remove a substring with unknown digits?

I have a Groovy String s that contains a String variable name and follows a pattern like this:
name = "{someInformation}"
s = "0/$name|2/{moreInformation}|3/$name"
I'm trying to remove \d+/{someInformation}|* (both cases of it) from the string, but this doesn't work because \d+/{someInformation}|* is not the right pattern to match.
The following does not work:
s = s.replace ("\\d+/$name|", "")
because s.contains("\\d+/$name|") returns false. But I have been fiddling with the regular expression for a while now and I don't seem to be able to get it to match.
What do I need to put into my String.replace regex so that it can find and remove the parts, such that all I have left is
s == "2/{moreInformation}"

String.replace treats its first argument as literal text, not as a regex, so use replaceAll with a regex based solution:
def name = "{someInformation}"
def s = "0/$name|2/{moreInformation}|3/$name"
println(s)
// => 0/{someInformation}|2/{moreInformation}|3/{someInformation}
def pat = "\\|*\\d+/${java.util.regex.Pattern.quote(name)}\\|*"
println(pat) // => \|*\d+/\Q{someInformation}\E\|*
println(s.replaceAll(pat, ""))
// => 2/{moreInformation}
Note that ${java.util.regex.Pattern.quote(name)} is used to escape all special regex metacharacters inside the name variable so that they are treated as literal symbols by the regex engine.
Details
\|* - 0+ | chars
\d+ - 1+ digits
/ - a / char
\Q{someInformation}\E - a literal {someInformation} substring
\|* - 0+ | chars
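The same idea can be sketched in plain Java, which Groovy calls into for replaceAll and Pattern.quote. The class name QuotedRemoval and the helper removeEntries are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

class QuotedRemoval {
    // Build the regex \|*\d+/\Q<name>\E\|* and remove every match.
    // Pattern.quote wraps name in \Q...\E, so its braces are matched
    // literally instead of being parsed as regex syntax.
    static String removeEntries(String s, String name) {
        String pat = "\\|*\\d+/" + Pattern.quote(name) + "\\|*";
        return s.replaceAll(pat, "");
    }

    public static void main(String[] args) {
        String name = "{someInformation}";
        String s = "0/" + name + "|2/{moreInformation}|3/" + name;
        System.out.println(removeEntries(s, name)); // 2/{moreInformation}
    }
}
```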

Regex excluding all numbers

Hello, I'm trying to find all matches of a regex in a file in VB.NET.
I have the function:
Dim written As MatchCollection = Regex.Matches(ToTreat, "\bGlobalIndexImage = \'(?![0-9])([A-Za-z])\w+\'")
For Each writ As Match In written
    For Each w As Capture In writ.Captures
        MsgBox(w.Value.ToString)
    Next
Next
I have this Regex now:
\bGlobalIndexImage = \'(?![0-9])([A-Za-z])\w+\'
I'm trying to match all occurrences under this form:
GlobalIndexImage = 'images'
GlobalIndexImage = 'Search'
But I also get values like this which I don't want to match:
GlobalIndexImage = 'Z0003_S16G2'
So I wanted in my Regex to simply exclude a match if it contains numbers.

The \w shorthand character class matches letters and digits and _. If you need only letters, just use [a-zA-Z]:
"\bGlobalIndexImage = '([A-Za-z]+)'"
Details:
\b - a leading word boundary
GlobalIndexImage = ' - a string of literal chars
([A-Za-z]+) - Group 1 capturing one or more (due to + quantifier) ASCII letters
' - a single quote.
If you need to match any Unicode letters, replace [a-zA-Z] with \p{L}.
VB.NET:
Dim text = "GlobalIndexImage = 'images' GlobalIndexImage = 'Search'"
Dim pattern As String = "\bGlobalIndexImage = '([A-Za-z]+)'"
Dim matches As List(Of String) = Regex.Matches(text, pattern) _
    .Cast(Of Match)() _
    .Select(Function(m) m.Groups(1).Value) _
    .ToList()
Console.WriteLine(String.Join(vbLf, matches))
Output:
images
Search

To match everything that is not a digit, use \D
So your regex will be something like
\bGlobalIndexImage = \'\D+\'
But this will also match words containing whitespace or punctuation. To get only letters, use [a-zA-Z]
\bGlobalIndexImage = \'[a-zA-Z]+\'
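As a cross-check of the letters-only pattern, here is a minimal sketch in Java rather than VB.NET; for this pattern the two regex flavors behave the same way. The class name LettersOnly, the helper names and the sample input are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LettersOnly {
    // Group 1 captures the quoted value only when it consists of ASCII letters.
    static final Pattern NAME =
            Pattern.compile("\\bGlobalIndexImage = '([A-Za-z]+)'");

    // Return the captured value of every match in the text.
    static List<String> names(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "GlobalIndexImage = 'images' "
                + "GlobalIndexImage = 'Z0003_S16G2' "
                + "GlobalIndexImage = 'Search'";
        System.out.println(names(text)); // [images, Search]
    }
}
```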

Scala regex match lines with special characters

I have a code segment that reads lines from a file and I want to filter certain lines out. Basically, I want to filter out every line that does not have exactly three tab-separated columns, where the first column is a number and the other two columns can contain any character except tab and newline (DOS and Unix).
I already checked my regex on http://www.regexr.com/ and there it works.
scala> val mystr = """123456\thttp://some.url/path/to/resource\t\x03U\x1D\x1F\x04D0B0#\xA0>\xA0<\x86:http://some.url/path/to/resource\x06\x08+\x06\x01\x05\x05\x07\x01\x01\x04C0A0?\n"""
scala> val myreg = "^[0-9]+(\t[^\t\r\n]+){2}(\n|\r\n)$"
scala> mystr.matches(myreg)
res2: Boolean = false
What I found out is that the problem is related to special characters. For example a simple example:
scala> val tabstr = """123456\t123456"""
scala> val tabreg = "^[0-9]+\t[0-9]+$"
scala> tabstr.matches(tabreg)
res3: Boolean = false
scala> val tabstr = "123456\t123456"
scala> val tabreg = "^[0-9]+\t[0-9]+$"
scala> tabstr.matches(tabreg)
res4: Boolean = true
It seems I mustn't use a raw string for my line (see mystr in the first code block). But if I don't use a raw string scala complains about
error: invalid escape character
So how can I deal with this messy input and still use my regex to filter out some lines?

You are using raw string literals. Inside a raw string literal, \ does not introduce escape sequences such as tab \t or newline \n: the \n in a raw string literal is just two characters, a backslash followed by n.
In a regex, to match a literal \, you need to use 2 backslashes in a raw-string literal based regex, and 4 backslashes in a regular string.
So, to match all your inputs, you need to use the following regexps:
val mystr = """23456\thttp://some.url/path/to/resource\t\x03U\x1D\x1F\x04D0B0#\xA0>\xA0<\x86:http://some.url/path/to/resource\x06\x08+\x06\x01\x05\x05\x07\x01\x01\x04C0A0?\n"""
val myreg = """[0-9]+(?:\\t(?:(?!\\[trn]).)*){2}(?:\\r)?(?:\\n)"""
println(mystr.matches(myreg)) // => true
val tabstr = """123456\t123456"""
println(tabstr.matches("""[0-9]+\\t[0-9]+""")) // => true
val tabstr2 = "123456\t123456"
println(tabstr2.matches("""^[0-9]+(?:\\t|\t)[0-9]+$""")) // => true
Since matches requires the whole input string to match, the ^ and $ anchors are not needed. The non-capturing groups are not essential either: capturing groups would work just as well for a plain match check, and non-capturing groups only give a cleaner result structure if you later extract matches or groups.
The last two regexps are simple: \\t matches a literal backslash followed by t, and (?:\\t|\t) matches either that two-character sequence or a real tab (\t alone matches just a tab).
The first one uses a tempered greedy token. This is a simplified regex; a more efficient one can be written with the unroll-the-loop technique: [0-9]+(?:\\t[^\\]*(?:\\(?![trn])[^\\]*)*){2}(?:\\r)?(?:\\n). Its parts:
[0-9]+ - 1 or more digits
(?:\\t(?:(?!\\[trn]).)*){2} - tempered greedy token: 2 occurrences of a literal \t followed by zero or more characters other than a newline, none of which starts a two-character \t, \r or \n sequence.
(?:\\r)? - 1 or 0 occurrences of \r
(?:\\n) - one occurrence of a literal combination of \ and n.
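The backslash counting can be checked with a small Java sketch. Java string literals process escapes like Scala's regular (non-raw) strings, so this shows the four-backslash form mentioned above. The class name TabVersusBackslashT and both helper names are hypothetical.

```java
class TabVersusBackslashT {
    // Regex [0-9]+\t[0-9]+ : the regex escape \t matches a real tab character.
    static boolean hasRealTab(String s) {
        return s.matches("[0-9]+\\t[0-9]+");
    }

    // Regex [0-9]+\\t[0-9]+ : \\ matches one literal backslash, then a plain t.
    // In a regular string literal this takes four backslashes.
    static boolean hasBackslashT(String s) {
        return s.matches("[0-9]+\\\\t[0-9]+");
    }

    public static void main(String[] args) {
        String realTab = "123456\t123456";   // contains one tab character
        String literal = "123456\\t123456";  // contains a backslash followed by t

        System.out.println(hasRealTab(realTab));    // true
        System.out.println(hasRealTab(literal));    // false
        System.out.println(hasBackslashT(literal)); // true
        System.out.println(hasBackslashT(realTab)); // false
    }
}
```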
