perl script to match comma - regex

I have a netlist generated from schematic. This netlist includes power pins. Iam trying to write a perl script to remove power pins from netlist.
As part of this i have to search for a string that matches the pattern shown below:
", );"
I have used the following code and it is not working
$line =~ s/,\s+\);//g
I have observed that pattern end with comma are matched but pattern starting with comma or pattern with comma in middle are not matched.
Any suggestions on how to get this work

You need to use this instead:
s/,\s*\);//
You should be defensive and be able to handle no whitespace between the , and the ). You have to escape the ). See perldoc perlre for more info.

Thank you every one. I have found the problem. The problem was that the pattern to be recognized is split in to two different lines. The "," is in one line followed by ");" in next line. At first, iam removing the new line character and assumed that the next line will get appended to the current line, which is not happening. Hence, the pattern matching did not work.
To resolve this, i have to read the file once again and then replace the pattern.

Related

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is the User row can also contain a |, to make it even worse.
I can use a macro to fix this, but the file is a few millions lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open for suggestions on how to do this with another program, command line utility, either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Repalce $1
Basically, Anchor to the end of the line, and remove columns [-1] and [-2] (I assume columns can't be empty. Replace + with * if they can)
If you need finer detail then that, I'd recommend writing a Java or Python script to manual parse and rewrite the file for you.
I've captured three groups and given them names. If you use a replace utility like sed or vimregex, you can replace remove with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
Demo
From Notepad++ hit ctrl + h then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex Explain:
\|\d+ : match 1st string that starts with | followed by number
\|\d+ : match 2nd string that starts with | followed by number
(\|[0-9a-z]+): match and capture the string after the 2nd number.
$ : This is will force regex search to match the end of the string.
Replacement:
$1 : replace the found string with whatever we have between the captured group which is whatever we have between the parentheses (\|[0-9a-z]+)

Regex Match Paragraph Pattern

I am trying to match a paragraph pattern and I am having trouble.
The pattern is:
[image.gif]
some words, usually a few lines
name
emailaddress<mailto:theemailaddress#mail.com>
I tried matching everything between the gif image and the <mailto: but this happens multiple times in the file meaning I get a bad result.
I tried it with this
(?<=\[image.gif\].*?(\[image.gif\])).*?(?=<mailto:)
Is there a way to use Regex to match the general layout of a paragraph?
"the general layout of a paragraph" needs a better definition. Given the lack of an input plus expected output, I'm having to guess what you want here. I'm also guessing that you will accept any language. Here's perl, almost certainly not a language you're familiar with.
Assumed input:
do not match this line
[image.gif]
some words, usually a few lines
Bobert McBobson
emailaddress<mailto:bobertmb#example.com>
don't match this line either
[image.gif]
another few words
on another few lines
Bobina Robertsdaughter
emailaddress<mailto:bobinard#example.info>
this line is also not for matching
Expected output:
[image.gif]
some words, usually a few lines
Bobert McBobson
emailaddress<mailto:bobertmb#example.com>
---
[image.gif]
another few words
on another few lines
Bobina Robertsdaughter
emailaddress<mailto:bobinard#example.info>
Solution using perl:
#!/usr/bin/perl -n007
my $sep = "";
while (/(\[image\.gif\].*?<mailto:[^>]*>(\r)?\n)/gms) {
print $sep . $1;
$sep = "---$2\n";
}
perl is the king of regex languages; many would say that's all it is good for. Here, we use the -n007 option to tell it to read the entire contents of each file and run the code on it as the default variable.
$sep starts blank because there's nothing to separate until the second match.
Then we loop over each block of text that matches the regex:
matches a literal [image.gif]
then matches as little content following that as possible
then matches a literal <mailto: and continues until the next >
then captures the line break (including optional support for DOS line endings)
(see full regex explanation and example at regex101)
We then print the match and finally set the separator to three dashes and a line break (DOS line endings added when needed).
Now you can run it:
$ perl answer.pl input.txt
[image.gif]
some words, usually a few lines
Bobert McBobson
emailaddress<mailto:bobertmb#example.com>
---
[image.gif]
another few words
on another few lines
Bobina Robertsdaughter
emailaddress<mailto:bobinard#example.info>

Regular expression is not matching new lines

I have the following reg ex:
"^((?!([\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*#((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3})))).)*$"
But it's not matching new lines as well:
https://regex101.com/r/nT6wK0/1
Any ideas how to make it match when there is a new line?
The . at the and actually means
All but a line break character. (source)
By replacing it with [\S\s], it means
All spacing characters and all non-spacing characters; so all characters.
Then it seems to work. You could have used other variants like [\W\w], [\D\d],...
So the "correct" regex (please don't take my word for it, first test this) is:
^((?!([\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*#((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))))[\S\s])*$
regex101 demo.
Assuming that you only want to match the first line, you can add the multiline option (/m) to include the newline.
If you want the second line to be included you'll need to read ahead an extra line. How you do that depends on the regex engine: N in sed; getline in awk; -n in perl; ...

find a single quote at the end of a line starting with "mySqlQueryToArray"

I'm trying to use regex to find single quotes (so I can turn them all into double quotes) anywhere in a line that starts with mySqlQueryToArray (a function that makes a query to a SQL DB). I'm doing the regex in Sublime Text 3 which I'm pretty sure uses Perl Regex. I would like to have my regex match with every single quote in a line so for example I might have the line:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name'");
I want the regex to match in that line both of the quotes around $name but no other characters in that line. I've been trying to use (?<=mySqlQueryToArray.*)' but it tells me that the look behind assertion is invalid. I also tried (?<=mySqlQueryToArray)(?<=.*)' but that's also invalid. Can someone guide me to a regex that will accomplish what I need?
To find any number of single quotes in a line starting with your keyword you can use the \G anchor ("end of last match") by replacing:
(^\h*mySqlQueryToArray|(?!^)\G)([^\n\r']*)'
With \1\2<replacement>: see demo here.
Explanation
( ^\h*mySqlQueryToArray # beginning of line: check the keyword is here
| (?!^)\G ) # if not at the BOL, check we did match sth on this line
( [^\n\r']* ) ' # capture everything until the next single quote
The general idea is to match everything until the next single quote with ([^\n\r']*)' in order to replace it with \2<replacement>, but do so only if this everything is:
right after the beginning keyword (^mySqlQueryToArray), or
after the end of the last match ((?!^)\G): in that case we know we have the keyword and are on a relevant line.
\h* accounts for any started indent, as suggested by Xælias (\h being shortcut for any kind of horizontal whitespace).
https://stackoverflow.com/a/25331428/3933728 is a better answer.
I'm not good enough with RegEx nor ST to do this in one step. But I can do it in two:
1/ Search for all mySqlQueryToArray strings
Open the search panel: ⌘F or Find->Find...
Make sure you have the Regex (.* ) button selected (bottom left) and the wrap selector (all other should be off)
Search for: ^\s*mySqlQueryToArray.*$
^ beginning of line
\s* any indentation
mySqlQueryToArray your call
.* whatever is behind
$ end of line
Click on Find All
This will select every occurrence of what you want to modify.
2/ Enter the replace mode
⌥⌘F or Find->Replace...
This time, make sure that wrap, Regex AND In selection are active .
Them search for '([^']*)' and replace with "\1".
' are your single quotes
(...) si the capturing block, referenced by \1 in the replace field
[^']* is for any character that is not a single quote, repeated
Then hit Replace All
I know this is a little more complex that the other answer, but this one tackles cases where your line would contain several single-quoted string. Like this:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name' and Value='1234'");
If this is too much, I guess something like find: (?<=mySqlQueryToArray)(.*?)'([^']*)'(.*?) and replace it with \1"\2"\3 will be enough.
You can use a regex like this:
(mySqlQueryToArray.*?)'(.*?)'(.*)
Working demo
Check the substitution section.
You can use \K, see this regex:
mySqlQueryToArray[^']*\K'(.*?)'
Here is a regex demo.

notepad++ regex replace word in first line

Im trying to use the following regex to search and replace in multiple files in notepad++
([^\n]*)(state="1")([^\n]*)*.
This searches and finds state="1" in the first line of each file and works fine.
However, when I try to replace state="1" using:
Replace with: $1 state="5"
it cuts off the rest of the line.
I thought that it might be possible to get the rest of the line using:
Replace with: $1 state="5" $2
However, $2 doesnt seem to exist as a variable.
Is there some way to attach the rest of the line into variable $2?
Cheers
Heres an image to show how
(?=\A[^\n]*)state="1"
is not working
Ive updated my version of notepad++ and everything
Each capture group, (…), is assigned a number, so $2 represents the second capture group, (state="1"). The remainder of the line is captured in $3.
Either remove the capture group around state="1":
([^\n]*)state="1"([^\n]*)*.
Or use $3:
Replace with: $1 state="5" $3
Also, given the simplicity of the task, I don't see why you couldn't just search for state="1" and replace with state="5". There doesn't seem to be any need for regular expressions here.
Update There's nothing in the pattern listed so far which limits the result to only matching strings on the first line. If you need that I'd recommend using a pattern like this:
(?=\A[^\n]*)state="1"
With these settings:
Update There seems to be some strange behavior with the \A (beginning of text) anchor inside the lookbehind. Removing from the lookbehind seems to work. Try this pattern:
\A([^\n]*)state="1"
And replace with:
$1state="5"
All the other settings should be fine.