Short story:
Is there a way to prevent Notepad++ from interpreting all parts of a string as regex?
The long story:
I have a list of German cities. In Germany some cities have the suffix a.d. (meaning close by) plus the name of a river to differentiate this city from others with the same name.
Unfortunately the suffix is written in various forms, for example:
Dillingen a. d. Donau
Dörnfeld a. d.Ilm
Eldena a.d.Elde
Limburg a d Lahn
To be able to join this list with other data I need a coherent form, for example:
Dillingen a.d. Donau
Dörnfeld a.d. Ilm
Eldena a.d. Elde
Limburg a.d. Lahn
I tried to search for
(a.d.)\b.+\b
but, of course, Notepad++ interprets a.d. as regex (. = any character), which also gives results such as
Fürstenwalde/Spree
Immenstaad am Bodensee
Jänschwalde Ost
making it impossible to search and replace all.
How can I realize this using regex?
I guess the answer is fairly easy but I found no hint in the forum or Notepad++ documentation.
Can someone help? Thanks a lot in advance!
Best,
David
\ba\s*\.?\s*d\b.+
You can use this. In a.d. an unescaped . matches any character, so escape it as \. to match a literal dot. See demo.
https://regex101.com/r/eX9gK2/7
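To make the search-and-replace step concrete, here is a minimal sketch in Python rather than Notepad++. The trailing `\.?\s*` part of the pattern, the replacement text "a.d. " and the `normalize` helper are assumptions added for illustration; they are not part of the answer above.

```python
import re

# Assumed pattern: a word-initial "a", an optional literal dot, then "d",
# an optional literal dot, with optional whitespace in between and after.
SUFFIX = re.compile(r"\ba\s*\.?\s*d\b\.?\s*")


def normalize(city: str) -> str:
    """Rewrite any spelling of the suffix to the single form 'a.d. '."""
    return SUFFIX.sub("a.d. ", city)


cities = [
    "Dillingen a. d. Donau",
    "Dörnfeld a. d.Ilm",
    "Eldena a.d.Elde",
    "Limburg a d Lahn",
    "Immenstaad am Bodensee",  # must stay untouched
]
for city in cities:
    print(normalize(city))
```

The same pattern and replacement text should also work in the Notepad++ Replace dialog with the "Regular expression" search mode selected, but that is untested here.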
I have this regex:
[0-9]+,[0-9]{2}
https://regexr.com/3joum
And I can enter
10,00
1,00,
100,00,
1000,00
But I also want to have this:
1.000,00
10.000,00
100.000,00
1000.000.00
and so on, to be valid as well. Any suggestions?
It's kind of hard to tell what rules exactly you're looking to follow with your regex. How about this?
([0-9]+[,.])+[0-9]{2}(,)?
This regex will allow you to match currency that uses either a , or a . and can even end in a , like in some of your examples.
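A quick way to see what this pattern accepts is to run it against the sample values. The sketch below uses Python's `re.fullmatch`. The `STRICT` pattern is an assumption about what the asker may have wanted (dots only as thousands separators, exactly one decimal comma, no trailing comma); it is not part of the answer.

```python
import re

# The pattern from the answer: very permissive about separators.
LOOSE = re.compile(r"([0-9]+[,.])+[0-9]{2}(,)?")

# Assumed stricter reading: either dot-grouped thousands or plain digits,
# followed by a comma and exactly two decimals.
STRICT = re.compile(r"(?:[0-9]{1,3}(?:\.[0-9]{3})+|[0-9]+),[0-9]{2}")


def is_loose(value: str) -> bool:
    return LOOSE.fullmatch(value) is not None


def is_strict(value: str) -> bool:
    return STRICT.fullmatch(value) is not None


for sample in ["10,00", "1,00,", "1.000,00", "1000.000.00", "1,2,3,45"]:
    print(sample, is_loose(sample), is_strict(sample))
```

The loose pattern also accepts strings such as 1,2,3,45, so the stricter variant may be preferable if the input has to be a well-formed amount.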
I have several thousand text files containing form information (one text file for each form), including the unique id of each form.
I have been trying to extract just the form id using regex (which I am not too familiar with) to match the string of characters found before and after the form id and extract only the form ID number in between them. Usually the text looks like this: "... 12 ID 12345678 INDEPENDENT BOARD..."
The bolded 8-digit number is the form ID that I need to extract.
The code I used can be seen below:
$id= ([regex]::Match($text_file, "12 ID (.+) INDEPENDENT").Groups[1].Value)
This works pretty well, but I soon noticed that there were some files for which this script did not work. After investigation, I found that there was another variation to the text containing the form ID used by some of the text files. This variation looks like this: "... 12 ID 12345678 (a.12(3)(b),45)..."
So my first challenge is to figure out how to change the script so that it will match the first or the second pattern. My second challenge is to escape all the special characters in "(a.12(3)(b),45)".
I know that the pipe | is used as an "or" in regex and two backslashes are used to escape special characters, however the code below gives me errors:
$id= ([regex]::Match($text_one_line, "34 PR (.+) INDEPENDENT"|"34 PR (.+) //(a//.12//(3//)//(b//)//,45//)").Groups[1].Value)
Where have I gone wrong here and how I can fix my code?
Thank you!
When you approach a regex pattern always look for fixed vs. variable parts.
In your case the ID seems to be fixed, and it is, therefore, useful as a reference point.
The following pattern applies this suggestion: (?:ID\s+)(\d{8})
$str = "... 12 ID 12345678 INDEPENDENT BOARD..."
# Group 1 captures the 8 digits that follow "ID" and whitespace.
$ret = [Regex]::Matches($str, "(?:ID\s+)(\d{8})")
for ($i = 0; $i -lt $ret.Count; $i++) {
    $ret[$i].Groups[1].Value
}
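On the two specific problems in the question: in PowerShell a | placed between two strings is the pipeline operator, so the alternation has to sit inside the pattern string itself, and regex special characters are escaped with a single backslash, not with slashes. The sketch below shows both ideas in Python instead of PowerShell. The names `TAILS`, `FORM_ID` and `extract_form_id` are hypothetical, and the two tails are copied from the examples in the question.

```python
import re

# The two known texts that can follow the form ID. re.escape adds the
# backslashes needed for the parentheses and the dot.
TAILS = ["INDEPENDENT", "(a.12(3)(b),45)"]
TAIL_ALTERNATION = "|".join(re.escape(tail) for tail in TAILS)

# The alternation lives inside the pattern, wrapped in a non-capturing group.
FORM_ID = re.compile(rf"12 ID (\d{{8}}) (?:{TAIL_ALTERNATION})")


def extract_form_id(text):
    """Return the 8-digit form ID, or None if neither variant is present."""
    match = FORM_ID.search(text)
    return match.group(1) if match else None


print(extract_form_id("... 12 ID 12345678 INDEPENDENT BOARD..."))
print(extract_form_id("... 12 ID 12345678 (a.12(3)(b),45)..."))
```

Matching on the fixed "ID" prefix and the 8 digits alone, as suggested above, avoids the escaping problem altogether; the alternation is only needed if the trailing text must be checked as well.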
Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference. It contains a treasure trove of useful information.
I have read the various answers on SO and also the help pages for neo4j. However, I can't get my wildcard match to work. For example, if I put in the cypher query
MATCH (author:Author )-[:WROTE]->(article:Article)
WHERE article.id =~ 'Art10526689' RETURN author, article.date
I get the correct answer. However, if I put in the query
MATCH (author:Author )-[:WROTE]->(article:Article)
WHERE article.id =~ "Art1052668*" RETURN author, article.date
I do not get anything returned. I have used '"' because it seems that the lucene might be sensitive, and the '=~' because it was suggested it was better than simply doing (article:Article {id:'Art1052668*'}), though that doesn't work either.
As usual, any help will be deeply appreciated!
Regards, Richard
Richard, you are close to an answer. I think what is happening is that you are confusing wildcards with the regular expression syntax supported by Neo4j. In your query the 8* actually means zero or more 8s, so the pattern cannot match the trailing 9. If you just want any single character in place of the 9 in the article id, use the . character. If you would like zero or more arbitrary characters after the 8, use Art1052668.*. You can add case insensitivity too with (?i), see the example below:
MATCH (author:Author )-[:WROTE]->(article:Article)
WHERE article.id =~ "(?i)Art1052668.*"
RETURN author, article.date
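The difference between 8* and .* can be checked outside Neo4j. This Python sketch assumes that =~ has to match the whole property value, which `re.fullmatch` imitates; the `matches` helper is hypothetical and Python's regex flavour is only an approximation of the Java flavour that Neo4j uses.

```python
import re


def matches(pattern: str, value: str) -> bool:
    """True if the pattern matches the entire value."""
    return re.fullmatch(pattern, value) is not None


article_id = "Art10526689"

# "8*" is a quantifier on the character 8, not a wildcard.
print(matches(r"Art1052668*", article_id))
# "." matches exactly one arbitrary character.
print(matches(r"Art1052668.", article_id))
# ".*" matches zero or more arbitrary characters.
print(matches(r"Art1052668.*", article_id))
# "(?i)" makes the match case-insensitive.
print(matches(r"(?i)art1052668.*", article_id))
```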
I am trying to find the correct regex (for use with Java and JavaScript) to validate an array of day-of-week and 24-hour time formats. I figured out the time format but am struggling to come up with the full solution.
The regex needs to validate patterns which include one or more of the following, separated by a comma.
{two-character day} HH:MM-HH:MM
Three examples of valid strings would be:
M 5:30-7:00
M 5:30-7:00, T 5:30-7:00, W 18:00-19:30
F 12:00-14:30, Sa 6:45-8:15, Su 6:45-8:15
This should validate a 24-hour time:
/^((M|T|W|Th|F|Sa|Su) ([01]?[0-9]|2[0-3]):[0-5][0-9]-([01]?[0-9]|2[0-3]):[0-5][0-9](, )?)+$/
Credit for the time bit goes to mkyong: http://www.mkyong.com/regular-expressions/how-to-validate-time-in-24-hours-format-with-regular-expression/
You can try this:
[A-Za-z]{1,2}[ ]\d+:\d+-\d+:\d+
You could try this: ([MTWFS][ouehra]?) ([0-9]|1[0-9]|2[0-3]):([0-5][0-9])-([0-9]|1[0-9]|2[0-3]):([0-5][0-9])
I'd go with this:
((M|T(u|h)|W|F|S(a|u)) ((1?\d)|(2[0-3])):[0-5]\d-((1?\d)|(2[0-3])):[0-5]\d(, )?)+
This should do the trick:
^(M|Tu|W|Th|F|Sa|Su) \d{1,2}:\d{2}-\d{1,2}:\d{2}(, (M|Tu|W|Th|F|Sa|Su) \d{1,2}:\d{2}-\d{1,2}:\d{2})*$
Note that you show T in your example above which is ambiguous. You might want to enforce Tu and Th as shown in my regex.
This will capture all sets in an array. The T in the short day of week list is debatable (Tuesday or Thursday?).
^((?:[MTWFS]|Tu|Th|Sa|Su)\s(?:[0-9]{1,2}:[0-9]{2})-(?:[0-9]{1,2}:[0-9]{2})(?:,\s)?)+$
The (?:) are non-capturing groups, so your actual matches will be (for example):
M 5:30-7:00
T 5:30-7:00
W 18:00-19:30
But the entire line will validate.
Added ^ and $ for line boundaries and an explicit time-time match because some regular expression parsers may not work with the previous way that I had it.
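The answers above differ mainly in how strictly they check the time range and the separator. The following Python sketch combines the stricter pieces: the 24-hour time check and an explicit ", entry" repetition so that a trailing comma is rejected. The day list, including the ambiguous single T from the question's examples, and the `is_valid_schedule` helper are assumptions.

```python
import re

# Assumed day abbreviations; "T" is kept because the question's examples use it.
DAY = r"(?:M|Tu|Th|T|W|F|Sa|Su)"
# 24-hour time: hours 0-23 (leading zero optional), minutes 00-59.
TIME = r"(?:[01]?[0-9]|2[0-3]):[0-5][0-9]"
ENTRY = rf"{DAY} {TIME}-{TIME}"
# One entry, followed by any number of ", entry" repetitions.
SCHEDULE = re.compile(rf"{ENTRY}(?:, {ENTRY})*")


def is_valid_schedule(text: str) -> bool:
    return SCHEDULE.fullmatch(text) is not None


print(is_valid_schedule("M 5:30-7:00, T 5:30-7:00, W 18:00-19:30"))
print(is_valid_schedule("M 25:00-7:00"))
```

Since the sketch uses only basic syntax (non-capturing groups, character classes and quantifiers), the same pattern should carry over to Java and JavaScript, with ^ and $ added where the API does not match the whole string by default.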
The original question that gave the idea behind this particular regex is Regex to find content not in quotes.
Let's just modify the original sample a little bit:
INSERT INTO Notifify (to_email, msg, date_log, from_email, ip_from)
VALUES
(
:to_email,
'test teste nonono',
:22,
:3mail,
:ip_from
)
I know that variables starting with numerals are not allowed in any programming language, but that doesn't mean we can't have scenarios where we need to match just :to_email or :3mail and :ip_from and not :22.
How do we proceed? My friend and I tried it (theoretically only) this way:
Store all string in a set
Subtract the set that contains only numbers
For online testing, I am using RegExr.
I don't know which programming language you use, but why can't you just check whether each line matches:
^\s*:[0-9]+,?\s*$
and just take the unmatched lines?
Lookaheads will work here:
\b(?=\d*[a-z])\w+\b
as will
\b\d*[a-z]\w*\b
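As a rough illustration of the lookahead idea, the Python sketch below pulls the placeholders out of the sample statement. Anchoring on the leading colon and allowing uppercase letters and underscores in the lookahead are assumptions added here, and the sketch makes no attempt to skip text inside quotes.

```python
import re

SQL = """INSERT INTO Notifify (to_email, msg, date_log, from_email, ip_from)
VALUES
(
:to_email,
'test teste nonono',
:22,
:3mail,
:ip_from
)"""

# A colon followed by a word that contains at least one letter or underscore
# after any leading digits, so purely numeric names such as :22 are skipped.
PLACEHOLDER = re.compile(r":(?=\d*[A-Za-z_])\w+")


def find_placeholders(sql: str) -> list:
    return PLACEHOLDER.findall(sql)


print(find_placeholders(SQL))
```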