Regular expression to replace a pattern at runtime (C# 3.0)

I have a folder in which some of the file names look like
**EUDataFiles20100503.txt, MigrateFiles20101006.txt.**
These are the files I need to work on.
A config file specifies the file patterns as
EUDataFilesYYYYMMDD, MigrateFilesYYYYMMDD.
The idea is that the user configures the file pattern and, based on that pattern, I search the folder for the matching files.
That is, at runtime YYYYMMDD is replaced by year, month, and day values. It does not matter which dates appear (dates only, no time stamp).
The EUDataFiles and MigrateFiles prefixes are fixed.
For example, if the folder contains EUDataFile20100504.txt (year 2010, month 05, day 04), I should ignore it because its name is not EUDataFiles20100504.txt (note the plural: File(s), not File).
Similarly, if the pattern is EUDataFilesYYYYMMDD and a file is named in the form EUDataFilesYYYYDDMM, the system should also ignore it.
How can I solve this problem? Is it doable with a regular expression (replacing the pattern at runtime)?
If so, could anyone help me out?
I am using C# 3.0 and .NET Framework 3.5.
Thanks

You could construct a regex from your basic file name plus (depending on the pattern) sub-regexes.
The sub-regexes could be
yyyy = @"\d{4}"
(unless you want to restrict a certain year range)
mm = @"(1[0-2]|0[1-9])"
dd = @"(3[01]|[12][0-9]|0[1-9])"
Build your regex by adding them in the correct order:
re = @"\AEUDataFiles" + yyyy + mm + dd + @"\.txt\Z"
Then you can check whether the filename(s) you've found match the regex:
foundMatch = Regex.IsMatch(subjectString, re);
Of course, this isn't a validation for correct dates (20100231 would pass), but that's probably not a problem in this case.
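As a language-neutral illustration of the same idea, here is a minimal Python sketch (the question itself targets C#) that builds an anchored regex from a configured pattern by swapping each placeholder for a sub-regex. The names TOKEN_REGEX and pattern_to_regex and the default .txt extension are assumptions made for this example, not part of the original setup.

```python
import re

# Hypothetical token table: each placeholder in the config pattern maps to a sub-regex.
TOKEN_REGEX = {
    "YYYY": r"\d{4}",
    "MM": r"(?:1[0-2]|0[1-9])",
    "DD": r"(?:3[01]|[12][0-9]|0[1-9])",
}

def pattern_to_regex(config_pattern, extension=".txt"):
    """Turn e.g. 'EUDataFilesYYYYMMDD' into a compiled regex anchored at both ends."""
    token_re = re.compile("|".join(TOKEN_REGEX))  # YYYY|MM|DD
    parts = []
    pos = 0
    for m in token_re.finditer(config_pattern):
        # Literal text between placeholders is escaped so it matches itself only.
        parts.append(re.escape(config_pattern[pos:m.start()]))
        parts.append(TOKEN_REGEX[m.group()])
        pos = m.end()
    parts.append(re.escape(config_pattern[pos:]))
    return re.compile(r"\A" + "".join(parts) + re.escape(extension) + r"\Z")

rx = pattern_to_regex("EUDataFilesYYYYMMDD")
for name in ["EUDataFiles20100504.txt", "EUDataFile20100504.txt"]:
    print(name, bool(rx.match(name)))
```

One caveat with this sketch: a fixed prefix that itself contains an uppercase MM, DD, or YYYY would be treated as a placeholder, so a real config format might need explicit delimiters around the tokens.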

Related

Regex to insert space with certain characters but avoid date and time

I wrote a regex that inserts a space wherever any of the characters
-:\*_/;, is present. For example, JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c should become JET* AIRWAYS\ INDIA/ 858701/ IDBI 05/05/05; 05:05:05 a/c
The regex I used is (?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)
I added exceptions for some words such as a/c and w/d. The \D conditions are there to keep date/time values from being separated, but this created an issue: a number followed by one of the characters above never gets split.
My requirements are:
1. Insert a space after the characters -:\*_/;,
2. Date and time values, which may contain / or :, should not be split
3. Words like a/c and w/d need to be exceptions
The following is the full code:
Private Function formatColon(oldString As String) As String
    Dim reg As New RegExp: reg.Global = True: reg.Pattern = "(?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)" '"(\D:|\D/|\D-|^w/d)"
    Dim newString As String: newString = reg.Replace(oldString, "$1 ")
    formatColon = XtraspaceKill(newString)
End Function
I would use 3 replacements.
Replace all date and time special characters with a special macro that should never be found in your text, e.g. for 05/15/2018 4:06 PM, something based on your name:
05MANUMOHANSLASH15MANUMOHANSLASH2018 4MANUMOHANCOLON06 PM
You can encode exceptions too, like this:
aMANUMOHANSLASHc
Now run your original regex to replace all special characters.
Finally, unreplace the macros MANUMOHANSLASH and MANUMOHANCOLON.
Meanwhile, let me tell you why this is complicated in a single regex.
If trying to do this in a single regex, you have to ask, for each / or :, "Am I a part of a date or time?"
To answer that, you need to use lookahead and lookbehind assertions, the latter of which Microsoft has finally added support for.
But given a /, you don't know if you're between the first and second, or second and third parts of the date. Similar for time.
The number of cases you need to consider will render your regex unmaintainably complex.
So please just use a few separate replacements :-)
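The three passes can be sketched roughly as follows. This is a Python illustration rather than VBA, and it makes some assumptions of its own: the placeholder strings, the function name add_spaces, and the rule that a / or : directly between two digits belongs to a date or time.

```python
import re

# Hypothetical placeholders; any tokens that cannot occur in the input would do.
SLASH = "\x00SLASH\x00"
COLON = "\x00COLON\x00"

def add_spaces(text):
    # Pass 1: shield / and : that sit between two digits (assumed to be dates/times)...
    text = re.sub(r"(?<=\d)/(?=\d)", SLASH, text)
    text = re.sub(r"(?<=\d):(?=\d)", COLON, text)
    # ...and shield the exception words from the question.
    text = re.sub(r"\b(?:a/c|w/d|m/s|s/w|m/o)\b",
                  lambda m: m.group().replace("/", SLASH), text)
    # Pass 2: add a space after each remaining special character,
    # unless whitespace (or the end of the string) already follows it.
    text = re.sub(r"([-:*_\\/;,])(?=\S)", r"\1 ", text)
    # Pass 3: restore the shielded characters.
    return text.replace(SLASH, "/").replace(COLON, ":")

print(add_spaces(r"JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c"))
```

The digit-on-both-sides rule is deliberately crude: it keeps 05/05/05 and 05:05:05 intact while still splitting INDIA/858701, but it would also protect something like 12/34 that is not a date.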

Exact pattern match in R

I am reading files from a folder using list.files, but I want to read only specific files. I have files like the ones below.
D420000900100hour.1-4-2001.31-12-2001
D420000700600hour8.1-1-2001.31-12-2004
D420000500150hour.1-1-2001.31-12-2004
Notice that I have both "hour" and "hour8". I want to list only the files containing exactly "hour".
files <- list.files(pattern = "hour")
However, this code returns files with both "hour" and "hour8". I tried using ^ and $, but they don't seem to work with "pattern".
How do I do this?
Based on the example, we can change the pattern argument to hour followed by a literal .
list.files(pattern = "hour\\.")
Or 'hour' followed by any character that is not a number
list.files(pattern = "hour[^0-9]")
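To see why both patterns exclude the "hour8" file, the same two regexes can be tried against the file names from the question. The snippet below is a small Python check, not R; the list name and the filtering code are only for illustration.

```python
import re

# File names taken from the question.
names = [
    "D420000900100hour.1-4-2001.31-12-2001",
    "D420000700600hour8.1-1-2001.31-12-2004",
    "D420000500150hour.1-1-2001.31-12-2004",
]

# "hour" followed by a literal dot.
by_dot = [n for n in names if re.search(r"hour\.", n)]
# "hour" followed by any character that is not a digit.
by_non_digit = [n for n in names if re.search(r"hour[^0-9]", n)]

print(by_dot)
print(by_non_digit)
```

Both filters keep the two "hour." files. The second pattern needs some character after "hour", so it would miss a name that ends in "hour".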

Use RegEx to find dates and increment year by a value

I have a large number of files that contain dates. I would like to use a Regular Expression to find the dates and if possible increment the year of the date by 10.
The files can have multiple date formats ..
04/22/78
06-OCT-14
How would one write a regular expression that could find, increment, and replace the dates, or even just the year of the dates?
I plan to use a text editor like TextPad, UltraEdit, or Notepad++ to search the files.
Assuming the year is the last field of the date and ends the line (the field separator can be any character),
you can use a simple Perl one-liner to do this:
perl -ne 's/(\d+)$/($1+10)/e && print' filename
This will add 10 to the year, and print the date.
Output for this is:
04/22/88
06-OCT-24
I just wrote this Python snippet to get it done.
import re
def add_ten_years(date):
    # two digits, separator, month or day (digits or a name), separator, two-digit year
    reg = r"(\d{2})(.)(\w{2,4})(.)(\d{2})"
    mat = re.search(reg, date)
    if mat:
        parts = mat.groups()
        # keep everything before the year unchanged and add 10 to the year
        return ''.join(parts[:4]) + str(int(parts[4]) + 10)
print(add_ten_years("04/22/78"))
print(add_ten_years("06-OCT-14"))
You can adjust the regex pattern to generalize it further, and it can easily be translated to other languages. Hope it helps!

Regular expression: Delete everything between X and Y in filename

I have a lot of files with the beginning xxx-yy. Both xxx and yy may vary. Example:
356-01 Nielsen - Sovnen, Op. 18.mp3
Everything between "356-01" and ".mp3" must be deleted so the new filename is:
356-01.mp3
".mp3" also varies. The expression should cover ".flac" also.
Assuming xxx and yy are digits, you can do
s/(\d\d\d-\d\d).*(\..+$)/\1\2/
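The \. near the end is a literal period and .+$ means every character up to the end of the name. Because the .* before it is greedy, the \. lands on the last period, so the second group captures the extension.
The search pattern and the replacement are written between the slashes, and the replacement refers to the two capture groups as \1 and \2.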
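The behaviour of that substitution can be checked with the same pattern in Python's re module. This is only an illustration; the function name shorten and the second file name are made up for the example.

```python
import re

def shorten(filename):
    # Group 1: the xxx-yy prefix; group 2: the last period and the extension.
    return re.sub(r"(\d\d\d-\d\d).*(\..+$)", r"\1\2", filename)

print(shorten("356-01 Nielsen - Sovnen, Op. 18.mp3"))
print(shorten("356-02 Some Other Track.flac"))
```

The period inside "Op. 18" is skipped because the greedy .* only gives back as many characters as the final group needs.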
My original question assumed some preparation that I thought I had already done. It now turns out that this does not work, so I have to reformulate the task. I'm sorry.
I have a lot of files beginning with 01, 02, 03 and so on. The number will never exceed 99. Example:
01 Nielsen - Sovnen, Op. 18.mp3
A 3-digit number* must be added to the beginning of the new filename and everything between "01" and ".mp3" must be deleted so the new filename is:
356-01.mp3
".mp3" also varies. The expression should cover ".flac" also.
*) What 3-digit number you use in your response is not important as the command will be added as a line in a larger bash script that I edit manually before each use
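One way to express the reformulated rename is sketched below in Python rather than bash, purely to show the regex. The function name new_name and the default prefix "356" are placeholders, and the sketch only computes the new name without renaming anything on disk.

```python
import re

def new_name(old_name, prefix="356"):
    # Group 1: the leading two-digit number; group 2: the last period and the extension.
    return re.sub(
        r"\A(\d\d)\b.*(\.[^.]+)\Z",
        lambda m: f"{prefix}-{m.group(1)}{m.group(2)}",
        old_name,
    )

print(new_name("01 Nielsen - Sovnen, Op. 18.mp3"))
```

Names that do not start with exactly two digits, or that have no extension, are returned unchanged.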

Regex to remove footer using wildcards

Ok - this is well beyond my limited knowledge of regular expressions. We receive a report from a banking entity in a fixed-width text file format. Unfortunately, their system exports page headers with the data file, and these must be removed before processing on our end. The page headers start and end with the same text, but the content changes (dates and page numbers). A typical one looks like:
00007xxxxx LAST1,FIRST1 111111 20120930
ABCD EXPORT RPT 10/04/12 at 10/04/12 16:20 Seq 1501 Page 16
MRK014 Report Date: 10/04/12
Acct# Name SH. Balance QTR (YYYYMMDD)
----------------------------------------------------------------------------------------------------
00007xxxxx LAST2,FIRST2 222222 20120930
So each header starts with "ABCD" (actually the name of the bank, just removed here for privacy) and ends with the row of -------------------.
What I need to get it down to is the customer data on two rows (00007xxxxx - those account numbers change per person).
So I need to select from the " ABCD" to the end of the "---" to remove that block of text.
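Try this regex. This is Java code, but you can use the pattern in your own language.
str = str.replaceAll("ABCD(?:.*[\n\r]+)+?-+[\n\r]*", "");
Here str contains the data above, and I assume lines are separated by \n. The lazy +? stops at the first line that starts with dashes, so the customer rows after the header are kept.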
To make sure you remove only the correct part of the report, I would go with a more specific regex pattern.
Use the regex pattern
(?<=[\n\r])ABCD\s+EXPORT\s+RPT\s[^-]+[\n\r]\-+[\n\r]+
and replace each match with an empty string.
However, if your environment does not support regex lookbehind, you have to use the pattern
([\n\r])ABCD\s+EXPORT\s+RPT\s[^-]+[\n\r]\-+[\n\r]+
and replace each match with the first group.
For example, in JavaScript it would be:
str.replace(/([\n\r])ABCD\s+EXPORT\s+RPT\s[^-]+[\n\r]\-+[\n\r]+/g, "$1")
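A quick way to try the more specific pattern on the sample from the question is the Python sketch below. It swaps the lookbehind for a multiline ^ anchor, which is a different technique from the one in the answer, and it allows optional leading spaces before ABCD. The names HEADER and strip_headers and the shortened dash row are assumptions for the example.

```python
import re

# ^ with re.MULTILINE matches at the start of each line (after "\n"),
# which replaces the lookbehind used in the pattern above.
HEADER = re.compile(r"^[ \t]*ABCD\s+EXPORT\s+RPT\s[^-]+[\n\r]-+[\n\r]+", re.MULTILINE)

def strip_headers(report):
    return HEADER.sub("", report)

sample = (
    "00007xxxxx LAST1,FIRST1 111111 20120930\n"
    "ABCD EXPORT RPT 10/04/12 at 10/04/12 16:20 Seq 1501 Page 16\n"
    "MRK014 Report Date: 10/04/12\n"
    "Acct# Name SH. Balance QTR (YYYYMMDD)\n"
    "--------------------\n"
    "00007xxxxx LAST2,FIRST2 222222 20120930\n"
)
print(strip_headers(sample))
```

As with the original pattern, [^-]+ relies on the header text containing no hyphen before the row of dashes; a header with a hyphenated date would need a different middle section.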