Nested regex replacement - regex

I need to create the Laravel migrations, so I have converted my SQL script to the Laravel migration format with regex replacements across files in Sublime Text.
My problem is that I have to replace the '#' character in the following string with the table name, in about 70 tables:
Schema::table('tablename', function($table) {
$table->dropForeign('#_columnname_foreign');
});
Currently, I can do this with the following expression:
(Schema::table\('([a-z]+)',[\s]*function\(\$table\)[\s]*{[\s]*\$table->dropForeign\(')#(_[a-z_]+'\);)
And in the replace field:
$1$2$3
but I don't know how to do it when the table has more than one foreign key:
Schema::table('tablename1', function($table) {
$table->dropForeign('#_field1_foreign');
$table->dropForeign('#_field2_foreign');
$table->dropForeign('#_field3_foreign');
$table->dropForeign('#_field4_foreign');
$table->dropForeign('#_field5_foreign');
$table->dropForeign('#_field6_foreign');
});
I have been using RegExr to validate my regular expressions.

This is not an easy task for a regex in Sublime Text. The only way to do it with a single regex is to capture the function signature together with any number of already-processed $table->dropForeign lines (matched lazily), and replace the # on the line after them.
The regex below requires clicking Replace All multiple times: each click fills in one # per Schema::table block, so repeat until no matches are left.
(Schema::table\('([a-z0-9]+)',\s*function\(\$table\)\s*{(?:\s*\$table->dropForeign\('[a-z0-9]+_\w+'\);)*?\s*\$table->dropForeign\(')#(_\w+'\);)
The replacement is $1$2$3. In the regex demo, you can manually replace the # in the second block with the table name and see how the match moves on to the next line.
Details:
(Schema::table\('([a-z0-9]+)',\s*function\(\$table\)\s*{(?:\s*\$table->dropForeign\('[a-z0-9]+_\w+'\);)*?\s*\$table->dropForeign\(') - Group 1 capturing:
Schema::table\(' - literal Schema::table(' substring
([a-z0-9]+) - Group 2 capturing 1+ lowercase ASCII letters or digits (leave the Match Case option unchecked to also match uppercase ASCII letters)
',\s* - a ', a comma and 0+ whitespaces
function\(\$table\) - a literal text function($table)
\s* - 0+ whitespaces
{ - a literal { (in Sublime Text 2, it must be escaped as \{)
(?:\s*\$table->dropForeign\('[a-z0-9]+_\w+'\);)*? - 0+ sequences, but as few as possible, matching:
\s*\$table->dropForeign\(' - 0+ whitespaces and then the literal text $table->dropForeign('
[a-z0-9]+_\w+ - 1+ alphanumerics, _ and 1+ digits, letters or underscores (\w+)
'\); - a literal substring ');
\s* - 0+ whitespaces
\$table->dropForeign\(' - a literal text $table->dropForeign('
# - a literal # symbol, the character to be replaced
(_\w+'\);) - Group 3 capturing:
_ - an underscore
\w+ - 1 or more letters, digits or underscores
'\); - a literal substring ');
NOTE: In Sublime Text 2, an unescaped { makes the regex fail, so write it as \{ there. In Sublime Text 3, the { does not have to be escaped.
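Outside Sublime Text, the repeated Replace All can be scripted. The sketch below is a Python port of the regex above, assuming Python's re module: the replacement is written \1\2\3 instead of $1$2$3, and the { is escaped. The helper name fill_table_names and the sample migration are hypothetical. It reruns the substitution until nothing matches, which mimics clicking Replace All repeatedly: each pass fills in one # per Schema::table block.

```python
import re

# Port of the Sublime Text regex above to Python's re syntax.
# Group 1: from Schema::table(' up to the dropForeign(' just before the next #
# Group 2: the table name
# Group 3: the rest of that dropForeign line after the #
PATTERN = re.compile(
    r"(Schema::table\('([a-z0-9]+)',\s*function\(\$table\)\s*\{"
    r"(?:\s*\$table->dropForeign\('[a-z0-9]+_\w+'\);)*?"
    r"\s*\$table->dropForeign\(')#(_\w+'\);)"
)


def fill_table_names(text):
    """Hypothetical helper: apply the replacement until no match is left.

    Each pass replaces the first remaining # in every Schema::table block,
    like one click on Replace All. Returns the new text and the number of
    passes that changed something.
    """
    passes = 0
    while True:
        text, count = PATTERN.subn(r"\1\2\3", text)
        if count == 0:
            return text, passes
        passes += 1


migration = """Schema::table('users', function($table) {
$table->dropForeign('#_role_id_foreign');
$table->dropForeign('#_team_id_foreign');
});
Schema::table('posts', function($table) {
$table->dropForeign('#_user_id_foreign');
});"""

result, passes = fill_table_names(migration)
print(result)
print(passes)  # 2: the users block needs two passes, posts only one
```

The number of passes equals the largest number of foreign keys in any single block, which matches how many times Replace All has to be clicked in the editor.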

Related

Match string that contains punctuation, emojis, special characters, some Chinese characters and alphanumeric characters

I have a string which has the following format:
Foo/FooVersion some info
Foo can contain:
punctuation
special characters
emojis
alphanumeric characters
Chinese characters
I have this regex to capture the following pattern:
^[\+$-¨™®é!?_ó–:—🔥😘兼职,.&\w\s]+\/\d+[\+\w.-]*
It seems like quite an exhaustive character set, and I am not sure it covers all the characters. What I am looking for is a simplified regex that takes these characters into account and returns true if there is a match. I am using SQL.
FooVersion consists of:
a digit followed by a word that may include dots or hyphens
You could use a pattern like ([^\/]+)\/\1Version.+
Pattern explanation:
([^\/]+) - [^\/]+ matches one or more characters other than / (a negated character class); the parentheses form a capturing group, so the matched text is stored in the first capturing group
\/ - match / literally
\1 - back reference to match the same text as was matched by first capturing group
Version - match Version literally
.+ - match one or more of any characters (to match rest of a string - this is optional and can be removed)
Regex demo
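The question does not say which SQL dialect is used, so as a dialect-neutral check, here is a small sketch with Python's re module (the sample strings are made up). re.match anchors the pattern at the start of the string, and \1 forces the text after / to repeat the captured prefix; the / needs no escaping in Python.

```python
import re

# ([^/]+) captures everything before the first slash; \1 must repeat it.
pattern = re.compile(r"([^/]+)/\1Version.+")

m = pattern.match("Foo/FooVersion some info")
print(m.group(1))                                                  # Foo
print(pattern.match("Foo🔥兼职!/Foo🔥兼职!Version 1") is not None)  # True
print(pattern.match("Bar/FooVersion some info"))                   # None
```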
Update
To match the updated requirements, use ([^\/]+)\/\d[a-zA-Z\d.-]+
What's new is:
[a-zA-Z\d.-]+ - one or more characters from the set a-z (lowercase letters), A-Z (uppercase letters), \d (digits), . (dot) and - (hyphen)
Updated demo
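The updated pattern can be tried the same way in Python's re (again only a dialect-neutral sketch with a made-up sample string). One caveat: in Python 3, \d also matches non-ASCII decimal digits, whereas [0-9] would restrict it to ASCII.

```python
import re

# Anything but "/" for Foo, then a digit followed by letters, digits,
# dots or hyphens for FooVersion.
pattern = re.compile(r"([^/]+)/\d[a-zA-Z\d.-]+")

m = pattern.match("Foo🔥兼职!/1.2-beta some info")
print(m.group(0))  # Foo🔥兼职!/1.2-beta
print(m.group(1))  # Foo🔥兼职!
print(pattern.match("Foo/beta") is None)  # True: FooVersion must start with a digit
```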

Regex Pattern for a Java Log

I am trying to use the regexp parser plugin in Fluentd to index the logs of my application.
Here is a snippet of the log:
2020-05-06T22:34:50.860-0700 - WARN [main] o.s.b.GenericTypeAwarePropertyDescriptor: Invalid JavaBean property 'pipeline' being accessed! Ambiguous write methods found next to actually used [public void com.theoaal.module.pipeline.mbean.DynamicPhaseExecutionConfigurationMBeanBuilder.setPipeline(com.theplatform.module.pipeline.DynamicPipeline)]: [public void com.theplatform.module.pipeline.mbean.PhaseExecutionConfigurationMBeanBuilder.setPipeline(com.theoaal.module.pipeline.Pipeline)]
I tested my pattern on regex101.com, but I am not able to get a match:
^(?<date>\d{4}\-\d{2}\-\d{2})(?<timestamp>[A-Z][a-z]{1}\d{2}:\d{2}:\d{2}.\d{3}\-\d{4})\s\-\s(?<loglevel>\[\w\]{6})\s+(?<class>\[[A-Z][a-z]+\])\s(?<message>.*)$
Kindly help.
Thanks
You may use
^(?<date>\d{4}-\d{2}-\d{2})[A-Z](?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+(?<loglevel>\w+)\s+(?<class>\[\w+\])\s+(?<message>.*)
See the regex demo
Note that in your pattern, \[\w\]{6} matches only [, a single word char and six ] chars, while the log level WARN is not enclosed in brackets at all. In the timestamp pattern, [A-Z][a-z]{1} requires two letters, but there is only a single T. Your "class" pattern requires a capitalized word with [A-Z][a-z]+, but main is all lowercase. You also escape - outside character classes unnecessarily, and you did not escape the literal dot before the milliseconds.
Details
^ - start of string
(?<date>\d{4}-\d{2}-\d{2}) - date: 4 digits, -, 2 digits, -, 2 digits
[A-Z] - an uppercase ASCII letter
(?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4}) - 2 digits, :, 2 digits, :, 2 digits, ., 3 digits, - and 4 digits
\s+-\s+ - a - with 1+ whitespaces on each side
(?<loglevel>\w+) - 1+ word chars
\s+ - 1+ whitespaces
(?<class>\[\w+\]) - [, 1+ word chars, ]
\s+ - 1+ whitespaces
(?<message>.*) - the rest of the line.
Copy and paste this into fluent.conf or td-agent.conf:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /^(?<date>\d{4}-\d{2}-\d{2})[A-Z](?<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+(?<loglevel>\w+)\s+(?<class>\[\w+\])\s+(?<message>.*)/
</source>
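To sanity-check the expression outside Fluentd, it can be run with Python's re module. Python spells named groups (?P<name>...) rather than (?<name>...), so only the group syntax is adapted; the sample is the log line from the question, shortened after its first sentence.

```python
import re

# Same pattern as above, with (?<name>...) rewritten as (?P<name>...) for Python.
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})[A-Z]"
    r"(?P<timestamp>\d{2}:\d{2}:\d{2}\.\d{3}-\d{4})\s+-\s+"
    r"(?P<loglevel>\w+)\s+(?P<class>\[\w+\])\s+(?P<message>.*)"
)

line = ("2020-05-06T22:34:50.860-0700 - WARN [main] "
        "o.s.b.GenericTypeAwarePropertyDescriptor: "
        "Invalid JavaBean property 'pipeline' being accessed!")

fields = LOG_RE.match(line).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
```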

Regex exclude trailing text from company names

CURRENTLY
I am trying to match valid company names in strings, with 4 conditions:
the name can ONLY contain alphanumeric characters + spaces + hyphens
the name can contain a hyphen (inside the name)
the company suffixes Pty Ltd, Pty. Ltd., Limited and Ltd. should be excluded from the company name
If there are additional matches on the same line, these are to be excluded
What I am trying to achieve:
My regex so far:
(?:\s|^)([a-zA-Z0-9]+[a-zA-Z0-9\s-]*?[a-zA-Z0-9]+)(?: Pty Ltd| Ltd(\.){0,1}| Limited){0,1}(?:\s|$)
ISSUES
https://regex101.com/r/Gpbdln/4
It seems I am struggling with:
Excluding the suffixes to be ignored
Making the capture include the spaces in the company name (while still excluding the suffixes)
I have been stuck on this for over an hour and would appreciate some help.
You may use
^[a-zA-Z0-9]+(?:[\s-]+[a-zA-Z0-9]+)*?(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$)
See the regex demo
If matches must not span across lines, replace \s with \h or [\p{Zs}\t] where supported, or with [^\S\r\n], so that only horizontal whitespace is matched.
Details
^ - start of string
[a-zA-Z0-9]+ - 1+ ASCII alphanumeric chars
(?:[\s-]+[a-zA-Z0-9]+)*? - 0 or more (but as few as possible) occurrences of
[\s-]+ - 1+ whitespaces or hyphens
[a-zA-Z0-9]+ - 1+ ASCII alphanumeric chars
(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$) - immediately to the right, there must be
(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)? - an optional occurrence of a sequence of patterns:
\s+ - 1+ whitespaces
(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]) - any of
(?:Pty\.?\s+)?Ltd\.?| - an optional sequence of Pty, an optional dot and 1+ whitespaces, then Ltd and an optional . char, or
Limited| - Limited string, or
[a-zA-Z0-9]*[^a-zA-Z0-9\s] - 0 or more ASCII alphanumeric chars followed by a char that is neither whitespace nor an ASCII alphanumeric char
.* - the rest of the string
$ - end of string.
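A quick way to try the pattern is Python's re module, which supports the lazy quantifier and lookahead used here. The sample names are made up, and each string is matched on its own, so ^ and $ refer to the start and end of that string. Python's re has no \h, so [^\S\r\n] would be the horizontal-whitespace option there.

```python
import re

# The pattern from the answer, split over two lines for readability.
COMPANY_RE = re.compile(
    r"^[a-zA-Z0-9]+(?:[\s-]+[a-zA-Z0-9]+)*?"
    r"(?=(?:\s+(?:(?:Pty\.?\s+)?Ltd\.?|Limited|[a-zA-Z0-9]*[^a-zA-Z0-9\s]).*)?$)"
)

samples = [
    "Acme Pty Ltd",
    "Acme Pty. Ltd.",
    "Big-Bank Limited",
    "Foo Bar Ltd. extra text",
    "Hello World",
]
for s in samples:
    m = COMPANY_RE.match(s)
    print(repr(s), "->", m.group(0) if m else None)
```

Because the repetition is lazy, the name grows one word at a time and stops at the first position where the lookahead sees an allowed suffix (or the end of the string).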

C# Regex Expression to extract field name and values in SQL Condition

Consider the following 2 SQL conditions.
1.) AssetView.[PROPTYPE] NOT IN ('B15/30','SFD','SFA')
2.) AssetView.[FICO] IN (500,600,700)
I want to use a regex to split this SQL into 4 parts: table name, field name, function type and field values.
e.g.
Table Name - AssetView
Field Name - PROPTYPE
Function - NOT IN
Field Values (Together or separate): B15/30, SFD, SFA
Here is the regex I tried (https://rubular.com/r/WGiyz0oGrooyiA), but I am not able to capture the table name, field name and function type in their own groups.
(.*?)[^=]['(]+(.*?)[')]
In your pattern (.*?)[^=]['(]+(.*?)[')], the character classes ['(] and [')] each match any one of the listed characters, so the pattern can also match an opening ' and then a closing ) instead of a matching pair.
For your example data, you might use:
(\w+)\.\[(\w+)\] +(\w+(?: \w+)*) +\(([^)\n]+)\)
(\w+) Capture 1+ word chars in group 1
\. Match a dot
\[(\w+)\] + Capture 1+ word chars between square brackets in group 2, then match 1+ spaces
(\w+(?: \w+)*) + Capture 1+ word chars, followed by 0+ repetitions of a space and 1+ word chars, in group 3, then match 1+ spaces
\(([^)\n]+)\) Between parentheses, capture 1+ chars other than ) or a newline in group 4
Rubular regex | .NET regex (click on the Table tab)
If you want to allow more characters to match than \w you could extend that using a character class.
For example, if you also want to allow hyphens and spaces, use [\w -]+, or if you want to match everything between the brackets, you could use a negated character class such as \[([^\]]+)\]
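The answer targets .NET, but the pattern uses no .NET-specific syntax, so a minimal sketch with Python's re module behaves the same way. The helper name parse_condition is hypothetical; it also splits the captured value list on commas and strips the quotes, which assumes no value itself contains a comma or an escaped quote.

```python
import re

CONDITION_RE = re.compile(r"(\w+)\.\[(\w+)\] +(\w+(?: \w+)*) +\(([^)\n]+)\)")


def parse_condition(sql):
    """Hypothetical helper: return (table, field, function, values) or None."""
    m = CONDITION_RE.search(sql)
    if m is None:
        return None
    table, field, function, raw_values = m.groups()
    # Naive split: assumes no value contains a comma or an escaped quote.
    values = [v.strip().strip("'") for v in raw_values.split(",")]
    return table, field, function, values


print(parse_condition("AssetView.[PROPTYPE] NOT IN ('B15/30','SFD','SFA')"))
print(parse_condition("AssetView.[FICO] IN (500,600,700)"))
```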

How to write nested regex to find words below some string?

I am converting a PDF to text with xpdf and then finding some words with the help of regex and preg_match_all.
My words are separated by colons in the pdftotext output.
Below is my pdftotext output:
In respect of Shareholders
Name: xyx
Residential address: dublin
No of Shares: 2
Name: abc
Residential address: canada
No of Shares: 2
So I wrote a regex that returns the words after the colons in the text:
$regex = '/(?<=: ).+/';
preg_match_all($regex, $string, $matches);
But now I want a regex that returns all the data after In respect of Shareholders.
So I wrote $regex = '/(?<=In respect of Shareholders).*?(?=\s)';
But it only shows me:
Name: xyx
I want to first find all the data after In respect of Shareholders, and then use another regex to find the words after the colons.
You may use
if (preg_match_all('~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~', $string, $matches)) {
print_r($matches[0]);
}
See the regex demo
Details
(?:\G(?!\A)|In respect of Shareholders) - either the end of the previous successful match or In respect of Shareholders text
\s* - 0+ whitespaces
[^:\n\r]+ - 1 or more chars other than :, CR and LF
: - a colon
\h* - 0+ horizontal whitespaces
\K - match reset operator that discards all text matched so far
.* - the rest of the line (0 or more chars other than line break chars).
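Python's re module supports neither \G nor \K, so the single-regex approach above does not port directly. Here is a sketch of the two-step approach the question describes instead: first take everything after the marker, then pull out the text after each colon, line by line. The helper name values_after is hypothetical, and the extra "Company: Foo Ltd" line in the sample is made up to show that text before the marker is ignored. Unlike the \G version, which stops at the first line without a colon, this collects every label: value line after the marker.

```python
import re


def values_after(text, marker="In respect of Shareholders"):
    """Hypothetical helper: values after 'label:' on the lines following marker."""
    _, found, section = text.partition(marker)
    if not found:
        return []
    # With re.MULTILINE, ^ matches at each line start; [^:\r\n]+ is the label,
    # [ \t]* skips horizontal whitespace, and (.*) captures the rest of the line.
    return re.findall(r"^[^:\r\n]+:[ \t]*(.*)", section, re.MULTILINE)


pdf_text = """Company: Foo Ltd
In respect of Shareholders
Name: xyx
Residential address: dublin
No of Shares: 2
Name: abc
Residential address: canada
No of Shares: 2"""

print(values_after(pdf_text))
```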
Your regex (?<=: ).+ matches any character 1+ times after a colon and a space. To skip the spaces or tabs after the colon and capture what follows in a group, you could use :[\t ]+(.+)
Another way to match the texts using a capturing group could be:
^.*?:[ \t]+(\w+)
Explanation
^ Assert start of the string
.*?: Match any characters, as few as possible, followed by a :
[ \t]+ Match 1+ times a space or a tab
(\w+) Capture in a group 1+ word characters
Regex demo | Php demo
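The capturing-group version also works as-is in Python's re module: re.MULTILINE makes ^ match at every line start, and findall returns the group 1 values. Note that (\w+) keeps only the first word after the colon. The sample text is the pdftotext output from the question.

```python
import re

pdf_text = """In respect of Shareholders
Name: xyx
Residential address: dublin
No of Shares: 2
Name: abc
Residential address: canada
No of Shares: 2"""

# ^.*?: lazily skips to the first colon on each line, [ \t]+ skips the
# spaces or tabs after it, and (\w+) captures the next word.
values = re.findall(r"^.*?:[ \t]+(\w+)", pdf_text, re.MULTILINE)
print(values)
```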
Or, if your regex engine supports \K, use it to discard what was matched so far:
^.*?:\h*\K\w+
Regex demo