I am trying to wrap quotes around a certain section of content in a CSV file. The current layout is something like this:
###element1,element2,element3,element4,element5,element6,element7,element8, "element9,
element9,""element9"",element9,
element9,element9,""element9",element10,
###
The ### symbols mark the start of a new line, and each line should begin with one. The problem is that I need to get all of element9 into one set of double quotes, but there are multiple double quotes within that area, which split the element into new fields and make my table expand beyond the fields I initially set. So I believe I need to remove all the " marks between the start and end of element9 and then reintroduce one set around the whole section.
I approached this first by trying to select everything after the 8th comma from the start and before the 2nd comma from the end:
^((?:[^,]+,){8})(.+)((?:,[^,]*){2})$
and replacing with
$1"$2"$3
I tried to target the starting ### and ending ### to select those two elements but with no success.
Any suggestions on how I can do this?
UPDATE
###BLAHBLAH,BLAHBLAH,BLAHBLAH,BLAHBLAH,BLAHBLAH,BLAHBLAH,BLAHBLAH,BLAHBLAH,BLAHBLAH,
BLAHBLAH,
BLAHBLAH,
BLAHBLAH, BLAHBLAH,
BLAHBLAH, BLAHBLAH,
BLAHBLAH,
"BLAHBLAH""",E,
###
The last field always seems to contain a capital letter. The fields before it vary in quotation placement, so to really target that whole section I need to work out how many commas along and how many back I need to go, remove the quotes, and then reinstate them in the correct positions.
###(?:[^,]*,){8}\K([\s\S]*?)(?=,[^,]*,[^,]*?###)
Try this. Replace with "\1" or "$1". See the demo:
https://regex101.com/r/tD0dU9/13
/^(?:[^,]*,){8}([^#]*),[^,]*,[^,]*$/s
https://regex101.com/r/hU8yO6/1
I think the regexp you had is about right, except for needing the /s modifier.
For Notepad++, get the s modifier by ticking ". matches newline":
^(?:[^,]*,){8}([^#]*),[^,]*,[^,]*$
This looks like a good reference: http://docs.notepad-plus-plus.org/index.php/Regular_Expressions
You'll probably want to add parentheses where needed to create capture groups as well.
^#+[^"]+"([^#]+),[^,]+,[^,]+###\s*$
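If the file can also be processed outside Notepad++, the same idea (skip 8 fields, grab everything up to the last two commas, strip the stray quotes, wrap once) can be sketched in Python. This is a minimal sketch, assuming every record starts with ###, element 9 always follows exactly 8 fields, and exactly two comma-separated pieces (element 10 and whatever follows its trailing comma) come after it; requote_element9 and requote_file are hypothetical helper names.

```python
import re

# Mirrors the answers' idea: skip 8 comma-terminated fields, then capture
# everything up to the last two commas. re.DOTALL plays the role of the
# /s flag (". matches newline") so element 9 may span several lines.
RECORD = re.compile(r"^((?:[^,]*,){8})(.*),([^,]*),([^,]*)$", re.DOTALL)


def requote_element9(record: str) -> str:
    """Strip every double quote inside element 9 and wrap it in one pair."""
    m = RECORD.match(record)
    if m is None:
        return record  # not shaped like the sample (e.g. empty); leave as-is
    head, middle, last, trailer = m.groups()
    middle = '"' + middle.replace('"', "").strip() + '"'
    return f"{head}{middle},{last},{trailer}"


def requote_file(text: str) -> str:
    """Apply the fix to every record; records are assumed to start with ###."""
    return "###".join(requote_element9(r) for r in text.split("###"))


sample = (
    "###element1,element2,element3,element4,element5,element6,element7,element8, \"element9,\n"
    "element9,\"\"element9\"\",element9,\n"
    "element9,element9,\"\"element9\",element10,\n"
    "###"
)
print(requote_file(sample))
```

Because it splits on ###, this sketch would misbehave if ### ever appeared inside a field.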
I have a database table that I have exported. I need to replace the spaces in the image file names with underscores, and I would like to use Notepad++ and regex to do so. I have:
'data/green tea powder.jpg'
'data/prod_img/lumina herbal shampoo.JPG'
'data/ALL GREEN HERBS.jpeg'
'data/prod_img/PSORIASIS KIT (640x530) (2).jpg'
and need to make them look like this:
'data/green_tea_powder.jpg'
'data/prod_img/lumina_herbal_shampoo.JPG'
'data/ALL_GREEN_HERBS.jpeg'
'data/prod_img/PSORIASIS_KIT_(640x530)_(2).jpg'
I just want to change the spaces between the quotes (I don't want to change the capitalization). To be more specific, I would like to replace any and all spaces between 'data/ and the closing ', because there are other spaces between quotes in the DB, for example:
'data/ REPLACE ANY SPACE HERE '
I found this:
\s(?!(?:[^']*'[^']*')*[^']*$)
but there are other places where there are spaces between quotes, so I'd like to search for data/ at the beginning and not just a single quote, but I can't figure out how. I tried \s(?!(?:[^'data\/]*'[^']*')*[^']*$) but it didn't work, and I am not familiar enough with regex to fix it.
An example of a full line from the database is:
(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),
I don't want to replace the spaces in the date and time stamp at the end of the line, just the ones in the image file names.
Thanks in advance for your help!
You have to use a \G based pattern to ensure that matches are contiguous.
search: (?:\G(?!^)|'data/)[^' ]*\K[ ]
replace: _
The first match uses the second branch of the alternation, then the next matches are contiguous and use the first branch.
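Python's standard re module supports neither \G nor \K, so if the export is cleaned with a script instead, a different technique is needed. This minimal sketch matches each whole 'data/...' literal and replaces the spaces inside it with a callback; it assumes file names never contain an escaped single quote, and underscore_data_paths is a hypothetical name.

```python
import re

# One whole single-quoted literal that starts with 'data/.
# Assumption: file names never contain an escaped single quote.
DATA_LITERAL = re.compile(r"'data/[^']*'")


def underscore_data_paths(line: str) -> str:
    """Replace spaces with underscores, but only inside 'data/...' literals."""
    return DATA_LITERAL.sub(lambda m: m.group(0).replace(" ", "_"), line)


row = "(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),"
print(underscore_data_paths(row))
# (712, 'GRTE-P', '', 'data/green_tea_powder.jpg', '2014-03-12 22:52:03'),
```

The date and time literal is left alone because it does not start with 'data/.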
I have text separated by whitespace and a search range of more than 1000 words.
Approximately 70% of the words follow the pattern foo-bar-...-N, where the number of hyphen-separated parts (N) is unknown. There is a blank space after each word (between each pair of words).
What I need is for the script to select everything after the foo-bar up until the blank space.
I know how to select the whole thing, but not how to solve my issue.
Here is an example of what I mean:
foo-bar foo-bar-thing foo-bar-stuff-my-gosh ... foo-bar-for-educational-purposes
And regex should select them like so:
[foo-bar] [foo-bar]-thing [foo-bar]-stuff-my-gosh ... [foo-bar]-for-educational-purposes
You want the regex to match a phrase and extract a substring from it.
To do that, you need a capture group.
So here is the pattern you want:
foo-bar([\w-]*)
There is a space at the end; don't forget it. You need to set the global flag, as you can see in the demo. And your string has to end with a space if you want to match the last word. If the text is multiline, don't forget the multiline flag too.
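As a rough illustration in Python (not the demo's engine), the same capture group can be applied with re.finditer, and re.sub can produce the bracketed form from the question. In this sketch the \b and the greedy [\w-]* already stop at each space, so the last word matches without a trailing space; the sample text drops the "..." from the question.

```python
import re

TEXT = "foo-bar foo-bar-thing foo-bar-stuff-my-gosh foo-bar-for-educational-purposes"

# Group 1 holds whatever follows "foo-bar" in the same hyphenated word.
# [\w-]* cannot cross a space, so no trailing space is required.
PATTERN = re.compile(r"\bfoo-bar([\w-]*)")

suffixes = [m.group(1) for m in PATTERN.finditer(TEXT)]
print(suffixes)  # ['', '-thing', '-stuff-my-gosh', '-for-educational-purposes']

# Reproduce the question's "[foo-bar]-rest" marking.
marked = PATTERN.sub(r"[foo-bar]\1", TEXT)
print(marked)
```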
Please see the Wikipedia article "List of countries by total health expenditure per capita".
The countries listed in the long table should all be links. That means the country names in the wikitext need double brackets around them, for example [[Australia]]. This is a common problem when creating country lists.
I pasted the wikitext into Notepad++. I know how to add brackets in front of the country names. There are some unique characters and line breaks that allow me to use basic find and replace (no need for regular expressions).
But I cannot figure out how to add brackets after the country names. There is a set of double bars after each country name, but unfortunately there are multiple sets of double bars in each line. Here is some of the wikitext:
|-
|Australia||3866||..||..
|-
|Austria||4528||4553||..
|-
|Belgium||4225||4256||..
So I need a way to only find the first set of double bars in each line, and then add brackets in front of them.
I forked anubhava's regex demo and created this regex, instead:
^.*?\|\h*\K(.*?)(?=\h*\|) replace with [[$1]]
You can use this regex to get first || in each line:
^.*?\K\|\|
\K resets the starting point of the reported match: any previously consumed characters are no longer included in the final match.
Make sure to use MULTILINE mode.
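For readers doing this in a script rather than in Notepad++, here is a hedged Python sketch of the same edit. Python's re has no \K or \h, so the leading "|" is matched and written back, and [ \t] stands in for \h. It assumes the country is the first cell of every row that starts with a single "|" and is followed by "||"; link_first_cell is a hypothetical name.

```python
import re

# Python's re lacks \K and \h: the leading "|" is consumed and re-emitted,
# and [ \t] replaces \h. Separator rows such as "|-" never match because
# they contain no "||".
FIRST_CELL = re.compile(r"^\|[ \t]*([^|\n]+?)[ \t]*(?=\|\|)", re.MULTILINE)


def link_first_cell(wikitext: str) -> str:
    """Wrap the first cell of each table row in [[...]] to make it a link."""
    return FIRST_CELL.sub(r"|[[\1]]", wikitext)


SAMPLE = "\n".join([
    "|-",
    "|Australia||3866||..||..",
    "|-",
    "|Austria||4528||4553||..",
    "|-",
    "|Belgium||4225||4256||..",
])
print(link_first_cell(SAMPLE))
```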
so I have a list that goes like this:
AudioQuest FLX-14/2
Abbey Road Cable Monitor Speaker Cable
In the first line I need to remove everything after the first word, and in the second one I need to remove everything after the first TWO words. I figured out how to remove everything after the first word; it's
.*?$
but I'm helpless with the second case. Please help me out so I can assign macro shortcuts to both actions and process the list semi-automatically (select a line and apply the macro).
From what I can see, it seems the data is aligned. And from the example, only the first 10 characters are needed, the rest should be removed.
Find what: (.{10}).*
Replace with: $1
I'd do it in two passes (with Match case unticked, so [a-z] also matches capital letters):
Find:
"^([a-z]+ [a-z]+) .*"
"^([a-z]+) .*"
Replace: "\1"
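Both answers can be checked quickly in Python. This sketch swaps [a-z]+ for \S+ so that capitals, digits and "/" match without relying on case-insensitive mode, and it also shows the fixed-width (.{10}).* alternative, which only works because both sample entries happen to keep exactly 10 characters.

```python
import re

# Capture the words to keep, let ".*" swallow the rest, write back only the capture.
FIRST_WORD = re.compile(r"^(\S+) .*")
FIRST_TWO_WORDS = re.compile(r"^(\S+ \S+) .*")
# Fixed-width alternative: keep the first 10 characters of the line.
FIRST_TEN_CHARS = re.compile(r"^(.{10}).*")

print(FIRST_WORD.sub(r"\1", "AudioQuest FLX-14/2"))                         # AudioQuest
print(FIRST_TWO_WORDS.sub(r"\1", "Abbey Road Cable Monitor Speaker Cable"))  # Abbey Road
print(FIRST_TEN_CHARS.sub(r"\1", "Abbey Road Cable Monitor Speaker Cable"))  # Abbey Road
```

Each pattern is meant for one kind of line, as with the per-line macros in the question: running the one-word pass over "Abbey Road" afterwards would cut it down to "Abbey".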
I have a string with a field like this: id="ID-120-1, ID-141-5, ID-92-5, N/A"
I'd like to capture only the "ID"s to a named capture group (i.e. without the "N/A" or other items that might creep in). I thought this might work, but no luck:
\bid=\"(?<id>(ID-\d+-\d+)+)
Any ideas?
The expression you are using returns only one match because it requires id=" directly in front of each ID value. The following adjustment, which accepts either =" or a comma and space before each ID, should fix that:
(?:(?:=\")|(?:,\s))(?<id>(?:ID-\d+-\d+))
Another option would be to just drop the id=" check altogether:
(?<id>(?:ID-\d+-\d+))
Or you could add a check for ", " or the closing quote after each ID, to make sure you are inside the attribute:
(?<id>(?:ID-\d+-\d+))(?:(?:,\s)|(?:"))
You would need to capture commas and spaces also, as they are repeated in your string:
\bid=\"(?<id>(ID-\d+-\d+, )+)
I believe what you are trying to do is not possible with pure regex, especially if IDs and 'N/A' can be intermixed. You will need a loop in your program. In Perl, you can run code in the replacement part of the regex (the /e modifier) to add the matches to an array; in PHP, use preg_replace_callback instead, since the /e modifier was removed in PHP 7.
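The "loop in your program" approach can be sketched in Python: first isolate the attribute value, then iterate over every ID token inside it. Note that Python spells a named group (?P<id>...) rather than (?<id>...). This is a minimal sketch assuming the attribute value contains no escaped quotes; extract_ids is a hypothetical name.

```python
import re

# Step 1: isolate the value of the id="..." attribute.
ATTR = re.compile(r'\bid="([^"]*)"')
# Step 2: every ID-<n>-<n> token in that value; "N/A" and other items never match.
ID_TOKEN = re.compile(r"(?P<id>\bID-\d+-\d+\b)")


def extract_ids(line: str) -> list[str]:
    """Return all ID-n-n values from the id attribute, in order."""
    attr = ATTR.search(line)
    if attr is None:
        return []
    return [m.group("id") for m in ID_TOKEN.finditer(attr.group(1))]


print(extract_ids('id="ID-120-1, ID-141-5, ID-92-5, N/A"'))
# ['ID-120-1', 'ID-141-5', 'ID-92-5']
```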