12345678912345678T / 14750,47932 SS
Vis à 6PC 45H Din 913 M 8x20
Art. client: 294519
QTE: 200 Pce
I want to write a regex that finds multiline blocks like the one above in a long text file. Each block starts with an 18-character word made up of digits and uppercase letters, and ends with "Pce".
So far I have written the pattern below, but it only matches the first line and I don't know what to add next:
^[0-9A-Z]{18,18}.*
Any type of help will be appreciated.
In most engines . does not match newlines, which is why your match stops at the end of the line. You can either use the DOTALL flag if it is available, or work around it with a class that matches everything, for example [\s\S] (any character that is either whitespace or not whitespace).
With a lazy quantifier, you could use for example:
^[0-9A-Z]{18}[\s\S]*?Pce$
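As a rough illustration, this is how the lazy pattern above might be applied in JavaScript. The question does not name a language, so the language choice, the filler lines around the record, and the variable names are all assumptions made for this sketch.

```javascript
// Sample text: the record from the question, surrounded by made-up filler lines.
const sampleText = [
  "some unrelated header line",
  "12345678912345678T / 14750,47932 SS",
  "Vis à 6PC 45H Din 913 M 8x20",
  "Art. client: 294519",
  "QTE: 200 Pce",
  "some unrelated footer line",
].join("\n");

// m: ^ and $ match at line boundaries; g: find every record in the text.
// [\s\S]*? lazily matches any characters, newlines included, up to the first "Pce" at a line end.
const recordPattern = /^[0-9A-Z]{18}[\s\S]*?Pce$/gm;

const records = sampleText.match(recordPattern) || [];
console.log(records.length);
console.log(records[0]);
```

The m flag matters here: without it, ^ and $ would only match at the very start and end of the whole text.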
You didn't specify a programming language so something like this would work:
/^[\dA-Z]{18}[^\dA-Za-z].*?Pce$/gms
^[\dA-Z]{18} - start with 18 digits and/or capital letters
[^\dA-Za-z] - not a digit nor letter
.*? - anything, lazily
substitute with [\s\S]*? if single line modifier is not available to you
Pce$ - end with Pce
gms - global, multi line, and single line modifiers
https://regex101.com/r/RXaAT4/1
Introduction:
I have the following scenario in PostgreSQL whereby I want to perform some data validation on a .csv string prior to inserting it into a table (see the fiddle here).
I've managed to get a regex (in a CHECK constraint) which disallows spaces within numbers (e.g. "12 34") and also disallows leading zeros ("00343").
Now, the icing on the cake would be if I could use regular expressions to disallow strings which contain a repeat of an integer - i.e. if a sequence \d+ matched another \d+ within the same string.
Is this beyond the capacities of regular expressions?
My table is as follows:
CREATE TABLE test
(
data TEXT NOT NULL,
CONSTRAINT d_csv_only_ck
CHECK (data ~ '^([ ]*([1-9]\d*)+[ ]*)(,[ ]*([1-9]\d*)+[ ]*)*$')
);
And I can populate it as follows:
INSERT INTO test VALUES
('992,1005,1007,992,456,456,1008'), -- want to make this line unacceptable - repeats!
('44,1005,1110'),
('13, 44 , 1005, 10078 '), -- acceptable - spaces before and after integers
('11,1203,6666'),
('1,11,99,2222'),
('3435'),
(' 1234 '); -- acceptable
But:
INSERT INTO test VALUES ('23432, 3433 ,00343, 567'); -- leading 0 - unacceptable
fails (as it should), and also fails (again, as it should)
INSERT INTO test VALUES ('12 34'); -- spaces within numbers - unacceptable
The question:
However, if you look at the first string, it has repeats of 992 and 456.
I would like to be able to match these.
All of these rules do not have to be in the same regex - I can use a second CHECK constraint.
I would like to know if what I am asking is possible using Regular Expressions?
I did find this post which appears to go some (all?) of the way to solving my issue, but I'm afraid it's beyond my skillset to get it to work - I've included a small test at the bottom of the fiddle.
Please let me know should you require any further information.
p.s. as an aside, I'm not very experienced with regexes and I would welcome any input on my basic one above.
PostgreSQL regular expressions do not allow backreferences inside lookahead constraints, so you cannot apply this restriction with this approach: it needs a negative lookahead with a backreference in it.
Have a look at this PCRE regex:
^(?!.*\b(\d+)\b.*\b\1\b) *[1-9]\d* *(?:, *[1-9]\d* *)*$
See this regex demo.
Details:
^ - start of string
(?!.*\b(\d+)\b.*\b\1\b) - no same two numbers as whole word allowed anywhere in the string
" *" (a space followed by *) - zero or more spaces
[1-9]\d* - a non-zero digit and then any zero or more digits
" *" (a space followed by *) - zero or more spaces
(?:, *[1-9]\d* *)* - zero or more occurrences of
, * - comma and zero or more spaces
[1-9]\d* - a non-zero digit and then any zero or more digits
" *" (a space followed by *) - zero or more spaces
$ - end of string.
Even if you replace \b with \y (PostgreSQL regex word boundaries) in the PostgreSQL code, it won't work due to the drawback mentioned at the top of the answer.
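To see the PCRE pattern above behave as described, here is a small sketch that runs it in JavaScript, whose regex engine also accepts a backreference inside a lookahead. This only illustrates the pattern; it is not a PostgreSQL solution, and the names uniqueCsvPattern, csvSamples and csvResults are made up for the example.

```javascript
// The pattern from the answer, unchanged apart from the /.../ delimiters.
const uniqueCsvPattern = /^(?!.*\b(\d+)\b.*\b\1\b) *[1-9]\d* *(?:, *[1-9]\d* *)*$/;

// Sample rows taken from the question.
const csvSamples = [
  "992,1005,1007,992,456,456,1008", // repeated numbers
  "44,1005,1110",
  "13, 44 , 1005, 10078 ",          // spaces around numbers
  "1,11,99,2222",
  "23432, 3433 ,00343, 567",        // leading zero
  "12 34",                          // space inside a number
  " 1234 ",
];

// Map each sample to true (accepted) or false (rejected).
const csvResults = csvSamples.map((s) => uniqueCsvPattern.test(s));
csvSamples.forEach((s, i) => console.log(JSON.stringify(s), csvResults[i]));
```

Rows with a repeated number, a leading zero, or a space inside a number are rejected; the others are accepted.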
There are tons of examples to do the conversion from C-style line comment to 1-line block comment. But I need to do the opposite: find a regex to replace multi-line block comment with line comments.
From:
This text must not be touched
/*
This
is
random
text
*/
This text must not be touched
To
This text must not be touched
// This
// is
// random
// text
This text must not be touched
I was thinking if there's a way to represent "each line" concept in regex, then just add // in front of each line. Something like
\/\*\n(?:(.+)\n)+\*\/ -> // $1
But the greedy nature of the regex engine makes $1 match only the last line before */. I know Perl and other languages have some advanced regex features like recursion, but I need to do this in a standard engine. Is there any trick to accomplish this?
EDIT: To clarify, I'm looking for pure regex solution, not involving any programming language. Should be testable on sites like https://regex101.com/.
If you are interested in a single regex pass in a modern JavaScript engine (or any other regex engine that supports unbounded-length patterns in lookbehinds), you can use
/(?<=^(\/)\*(?:(?!^\/\*)[\s\S])*?\r?\n)(?=[\s\S]*?^\*\/)|(?:\r?\n)?(?:^\/\*|^\*\/)/gm
Replace with $1$1, see the regex demo.
Details
(?<=^(\/)\*(?:(?!^\/\*)[\s\S])*?\r?\n) - a positive lookbehind that matches a location that is immediately preceded with
^(\/)\* - /* substring at the start of a line (with / captured into Group 1)
(?:(?!^\/\*)[\s\S])*? - any char, zero or more occurrences, as few as possible, not starting a /* char sequence that appears at the start of a line
\r?\n - a CRLF or LF ending
(?=[\s\S]*?^\*\/) - a positive lookahead that requires any 0 or more chars as few as possible followed with */ at the start of a line, immediately to the right of the current location
| - or
(?:\r?\n)? - an optional CRLF or LF linebreak
(?:^\/\*|^\*\/) - and then either /* or */ at the start of a line.
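The single-pass replacement described above can be tried with a short JavaScript sketch. It assumes an engine with lookbehind support (for example a recent Node.js); the variable names are hypothetical.

```javascript
// Input from the question.
const blockSource = [
  "This text must not be touched",
  "/*",
  "This",
  "is",
  "random",
  "text",
  "*/",
  "This text must not be touched",
].join("\n");

// Alternative 1 matches the empty position at the start of each line inside the block
// (group 1 holds "/"); alternative 2 matches the /* and */ delimiter lines themselves.
const blockToLine =
  /(?<=^(\/)\*(?:(?!^\/\*)[\s\S])*?\r?\n)(?=[\s\S]*?^\*\/)|(?:\r?\n)?(?:^\/\*|^\*\/)/gm;

// "$1$1" becomes "//" for alternative 1 and "" for alternative 2,
// because an unmatched group is substituted as an empty string in JavaScript.
const lineCommented = blockSource.replace(blockToLine, "$1$1");
console.log(lineCommented);
```

Note that the inserted prefix is "//" with no trailing space. Also, with several comment blocks in one input, text between the end of one block and the start of the next appears to satisfy both lookarounds as well, so it is worth testing on real data before relying on this pattern.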
As usual in such cases, two regular expressions—the second applied to the matches of the first—can do what one cannot achieve.
const txt = `This text must not be touched
/*
This
is
random
text
*/
This text must not be touched`;
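// First pass: capture the body of each /* ... */ block (the s flag lets . span newlines).
// Second pass: prefix every line of that body with "// ".
const to1line = str => str.replace(
/\/\*\s*(.*?)\s*\*\//gs,
(_, comment) => comment.replace( /^/mg, '// ')
);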
console.log( to1line( txt ));
I am trying to work with regular expressions. I have a mainframe file which has several fields. I have a flat file parser which distinguishes several types of records based on the first three letters of every line. How do I write a regular expression that matches lines whose first three letters are 'CTR'?
Beginning of line or beginning of string?
Start and end of string
/^CTR.*$/
/ = delimiter
^ = start of string
CTR = literal CTR
$ = end of string
.* = zero or more of any character except newline
Start and end of line
/^CTR.*$/m
/ = delimiter
^ = start of line
CTR = literal CTR
$ = end of line
.* = zero or more of any character except newline
m = enables multi-line mode, this sets regex to treat every line as a string, so ^ and $ will match start and end of line
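The difference between the two modes can be checked with a short JavaScript sketch. The three-record sample text and the record prefixes HDR and TRL are invented for the illustration.

```javascript
// A made-up three-line "file" held in a single string.
const fileText = "HDR first record\nCTR second record\nTRL third record";

// Without m, ^ only matches at the very start of the string, which begins with HDR.
const matchesWholeString = /^CTR.*$/.test(fileText);

// With m, ^ and $ also match at every line boundary, so the second line is found.
const matchesSomeLine = /^CTR.*$/m.test(fileText);
const firstCtrLine = (fileText.match(/^CTR.*$/m) || [null])[0];

console.log(matchesWholeString, matchesSomeLine, firstCtrLine);
```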
While in multi-line mode you can still match the start and end of the string with \A\Z permanent anchors
/\ACTR.*\Z/m
\A = means start of string
CTR = literal CTR
.* = zero or more of any character except newline
\Z = end of string
m = enables multi-line mode
As such, another way to match the start of the line would be like this:
/(\A|\r|\n|\r\n)CTR.*/
or
/(^|\r|\n|\r\n)CTR.*/
\r = carriage return / old Mac OS newline
\n = line-feed / Unix/Mac OS X newline
\r\n = windows newline
Note, if you are going to use the backslash \ in some program string that supports escaping, like the php double quotation marks "" then you need to escape them first
so to run \r\nCTR.* you would use it as "\\r\\nCTR.*"
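The same escaping concern applies to JavaScript string literals passed to the RegExp constructor. The sketch below uses \d rather than \r\n to make the effect visible; the pattern CTR\d+ is only a hypothetical example and is not taken from the question.

```javascript
// "\\d" in a string literal yields the two characters \d that the regex engine needs.
const escapedPattern = new RegExp("^CTR\\d+");

// "\d" is not a recognized string escape, so the backslash is dropped and the regex sees a plain "d".
const unescapedPattern = new RegExp("^CTR\d+");

console.log(escapedPattern.source);
console.log(unescapedPattern.source);
console.log(escapedPattern.test("CTR123"), unescapedPattern.test("CTR123"));
```

A regex literal such as /^CTR\d+/ avoids the double escaping altogether, which is one reason to prefer literals when the pattern is fixed.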
^CTR
or
^CTR.*
edit:
To be more clear: ^CTR will match the start of a line followed by those characters. If all you want to do is test a line you already have, then that is all you really need. In that case, though, you may be better off using a built-in substr()-type function. I don't know what language you are using. But if you are trying to match and grab the whole line, you will need something like .* or .*$, depending on the language or regex function you are using.
Regex symbol to match at beginning of a line:
^
Add the string you're searching for (CTR) to the regex like this:
^CTR
That should be enough!
However, if you need to get the text from the whole line in your language of choice, add a "match anything" pattern .*:
^CTR.*
If you want to get crazy, use the end of line matcher
$
Add that to the growing regex pattern:
^CTR.*$
Note: Depending on how and where you're using regex, you might have to use a multi-line modifier to get it to match multiple lines. There could be a whole discussion on the best strategy for picking lines out of a file to process them, and some of the strategies would require this:
Multi-line flag m (this is specified in various ways in various languages/contexts)
/^CTR.*/gm
Example: we had to use m on regex101
Try ^CTR.*, which literally means start of line, CTR, anything.
This will be case-sensitive, and setting case-insensitivity will depend on your programming language; alternatively, use ^[Cc][Tt][Rr].* if cross-environment case-insensitivity matters.
^CTR.*$
matches a line starting with CTR.
Not sure how to apply that to your file on your server, but typically, the regex to match the beginning of a string would be :
^CTR
The ^ means beginning of string / line
There are ambiguities in the question.
What is your input string? Is it the entire file? Or is it 1 line at a time? Some of the answers are assuming the latter. I want to answer the former.
What would you like to return from your regular expression? Do you want a true / false on whether a match was made, or do you want to extract the entire line that begins with CTR? I'll assume you only want a true / false match.
To do this, we just need to determine if the CTR occurs at either the start of a file, or immediately following a new line.
/(?:^|\n)CTR/
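A minimal sketch of that true / false check in JavaScript, assuming the whole file has already been read into one string; the function name and the sample strings are made up.

```javascript
// True when CTR appears at the start of the text or right after a line feed.
// No m flag is needed because the newline is matched explicitly.
const hasCtrRecord = (text) => /(?:^|\n)CTR/.test(text);

const withCtrLine = hasCtrRecord("HDR a\nCTR b\nTRL c");  // CTR starts the second line
const ctrMidLine = hasCtrRecord("HDR CTR a\nTRL c");      // CTR appears only mid-line
console.log(withCtrLine, ctrMidLine);
```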
(?i)^[ \r\n]*CTR
(?i) -- case insensitive -- Remove if case sensitive.
[ \r\n] -- ignore space and new lines
* -- 0 or more times the same
CTR - your starts with string.
I use EditPad Pro text editor.
I need to read the quoted strings in the code, but ignore those on lines that start with "/*" or with a tab followed by "/*", for example:
/**
* Light up the dungeon using "claravoyance"
*
* memorizes all floor grids too.
**/
/** This function returns TRUE if a "line of sight" **/
#include "cave.h"
(tab here) /* Vertical "knights" */
if (g->multiple_objects) {
/* Get the "pile" feature instead */
k_ptr = &k_info[0];
}
put_str("Text inside", hgt - 1, (wid - COL_MAP) / 2);
/* More code*** */
I would like it to return:
"Text inside"
I have tried this (after reading "Regular expression for a string that does not start with a sequence"), but it does not work for me:
^(?! \*/\t).+".*"
Any help?
Edit: I used:
^(?!#| |(\t*/)|(/)).+".*"
And it returns:
put_str("Text inside"
I'm close to finding the solution.
EditPad supports variable-length lookbehind in Pro version 6 and Lite version 7, since its flavor is listed as "JGsoft" (the Just Great Software regular expression engine).
Knowing this and without the use of capture groups, you could combine two variable length lookbehinds:
(?<!^[ \t]*/?[*#][^"\n]*")(?<=^[^"\n]*")[^"]+
(?<!^[ \t]*/?[*#][^"\n]*") The negative lookbehind prevents the quoted part from being preceded by [ \t]*/?[*#], i.e. a comment or # marker that may itself be preceded by any number of spaces or tabs. The / is optional because a line inside a multi-line comment can also start with *.
(?<=^[^"\n]*") The positive lookbehind ensures that the match is preceded by any number of [^"\n] characters (neither quotes nor newlines) followed by one quote.
[^"]+ Assuming quotes are always balanced, this matches the non-quote characters after the first double quote (which is consumed inside the lookbehind).
If a single " may occur in any line (not balanced), change the end: [^"]+ to [^"\n]+(?=")
Possibly there are different solutions for the problem. Hope it helps :)
Here's one approach: ^(?!\t*/\*).*?"(.+?)"
Breakdown:
^(?!\t*/\*) This is a negative lookahead anchored to the beginning of the line,
to ensure that there is no `/*` at the beginning (with or
without tabs)
.*?" Next is any amount of characters, up to a double-quote. It's lazy
so it stops at the first quote
(.+?)" This is the capture group for everything between the quotes, again
lazy so it doesn't slurp other quotes
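The breakdown above can be tried in JavaScript as a rough sketch. The question is about EditPad, so the language and the variable names are assumptions, and the input is reduced to three lines from the question's sample.

```javascript
// Two comment lines (one preceded by a tab) and one code line from the question.
const cSource = [
  '/* Get the "pile" feature instead */',
  '\t/* Vertical "knights" */',
  'put_str("Text inside", hgt - 1, (wid - COL_MAP) / 2);',
].join("\n");

// m makes ^ match at each line start; g collects every match.
// Group 1 holds the text between the first pair of quotes on a non-comment line.
const quotedOutsideComment = /^(?!\t*\/\*).*?"(.+?)"/gm;

const quotedStrings = [...cSource.matchAll(quotedOutsideComment)].map((m) => m[1]);
console.log(quotedStrings);
```

The lookahead only rejects lines that begin with /* (optionally after tabs). Continuation lines of a block comment that begin with * and lines such as #include "cave.h" are not excluded by this pattern, so the full sample from the question would yield extra matches.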
You can use this regex:
/\*.*\*/(*SKIP)(*FAIL)|".*?"
Edit: if you use EditPad then you can use this regex:
"[\w\s]+"(?!.*\*/)
I have a document that has a range of numbers like this:
0300010000000394001001,27
0300010000000394001002,0
0300010000000394002001,182
0300010000000394002002,51
0300010000000394003001,156
0300010000000394003002,40
I need to find the new line character and replace with a number of spaces depending on the string length.
If it has 24 characters like this - 0300010000000394001002,0 then I need to replace the new line character at the end with 5 blank spaces.
If it has 25 characters like this - 0300010000000394002002,51 then I need to replace the new line character at the end with 4 blank spaces and so on.
In my text editor I can use find and replace. I search for the line length by ^(.|\s){24}$ for 24 characters - but this will obviously replace the whole line and I only need to replace the new line character at the end.
I want to specify a new line character AFTER ^(.|\s){24}$. Is this possible?
It sounds like you need two things.
Multi-line Mode (See "Using ^ and $ as Start of Line and...")
Backreferencing
Most editors that support regex support these naturally, but you'll have to let us know what editor you're using for us to be specific. Without knowing what editor you're using, all I can say is that you want to do some combination of the following:
regex           subst
-----           -----
^(.{24})\n      \1        <-- there are spaces here
^(.{24})^M      \1        <-- there are spaces here
^(.{24})\s        ^^^^^
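Outside a text editor, the same idea can be handled in one pass by capturing each line and padding it with a callback. This is a sketch in JavaScript; the total width of 29 is inferred from the question (24 + 5 and 25 + 4 both give 29) and is an assumption, as are the names.

```javascript
// Inferred from the question: 24 characters + 5 spaces = 25 characters + 4 spaces = 29.
const TOTAL_WIDTH = 29;

const numberLines = "0300010000000394001002,0\n0300010000000394002002,51\n";

// Capture each line without its line ending, then replace the line ending
// with however many spaces are needed to reach TOTAL_WIDTH.
const paddedLines = numberLines.replace(
  /^(.*)\r?\n/gm,
  (_, line) => line.padEnd(TOTAL_WIDTH)
);

console.log(JSON.stringify(paddedLines));
```

Lines already longer than TOTAL_WIDTH are left unpadded by padEnd, and their newline is still removed, so such input may need separate handling.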