Convert a list of values to CSV values - regex

I have a list of words and I want to convert them into a CSV.
a
b
c
d
to a,b,c,d
I replaced \n with , and it worked, but that was my second attempt.
I first tried the regex ^([A-Za-z ]+)$\n with the replacement \1, . That regex only joins adjacent pairs of lines, like this:
a,b
c,d
What can I change in it to get it to work?
I am doing it in Eclipse, so I guess it is Java, but I don't have to take the \ escape into consideration; it is the same as in Edit+.

This regex:
^([A-Za-z ]+)$\n
matches the beginning of a line, letters and space, then the end of the line.
Once you perform your first replacement, the line contains a comma, so it would no longer match that pattern.
The regex is also a bit redundant. Because \n only comes at the end of a line anyway, you don't need both $ and \n in your pattern.
In order to fix it, you simply need to let your pattern match a comma:
^([A-Za-z ,]+)\n
Note: the specifics might vary based on your Eclipse version and/or file encoding. I needed \r\n to match a newline in mine.

From your example, you don't even need to use regular expressions. Simply replace each newline (\n) with a comma (,) and you're set.
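Outside the editor, the same newline-to-comma replacement can be sketched in Python. The function name lines_to_csv is made up for illustration, and the optional \r is an assumption that covers files with Windows line endings, as mentioned in the note about \r\n above.

```python
import re

def lines_to_csv(text):
    """Join the lines of `text` with commas (hypothetical helper)."""
    # strip() drops the trailing line break so the result has no trailing comma.
    # \r?\n accepts both Unix (\n) and Windows (\r\n) line endings.
    return re.sub(r"\r?\n", ",", text.strip())

print(lines_to_csv("a\nb\nc\nd\n"))  # a,b,c,d
```

This does no CSV quoting, so it is only suitable for values that contain no commas or quotes themselves.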

Related

RegEx for simple pattern

I have a logfile that contains lines like:
....
unit:
...
unit.integration:
....
I would like to run a RegEx search on the file using notepad++ that returns all lines that: start with a word with no blanks and ends with :\n. I have tried:
(.:\n)
but that gives 0 results. I have looked at:
http://www.aivosto.com/vbtips/regex.html
EDIT: Updated with more specific requirements to the starting word.
I would like to run a RegEx search ... that returns all lines that start with anything but ends with :\n.
The first part of the match is :, and because you're looking for line breaks, you'll want to match the end of the line, which is $.
You said you don't care about what comes before it, so don't even include it in the regex:
:$
Update to address edit:
I would like to run a RegEx search ... that returns all lines that: start with a word with no blanks and ends with :\n.
This differs from your original post. Now you want to match a "word with no blanks", which implies a single word that doesn't contain special characters.
It seems to me that you'd like to match unit.integration which is two words and a separating . character.
If you want what you asked for ("a word with no blanks") then just prepend ^\w+ to the regex:
^\w+:$
(matches unit:, but not unit.integration:)
If, instead, you want to match lines that don't contain spaces and end in :, then you should use ^\S+ instead:
^\S+:$
(matches unit: and unit.integration:, but also matches ##*()$&*(&:)
The details matter, so avoid assumptions and be as explicit as possible in what you want matched.
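To see the difference between \w and \S on the sample log, here is a small Python check. It is only a sketch: Notepad++ uses a different regex engine, and the LOG string is an assumed reconstruction of the lines in the question.

```python
import re

# Assumed sample, modelled on the log lines in the question.
LOG = "....\nunit:\n...\nunit.integration:\n....\n"

# re.M makes ^ and $ anchor at every line, not just at the whole string.
word_only = re.findall(r"^\w+:$", LOG, re.M)   # word characters, then ':'
no_spaces = re.findall(r"^\S+:$", LOG, re.M)   # any non-whitespace, then ':'

print(word_only)  # ['unit:']
print(no_spaces)  # ['unit:', 'unit.integration:']
```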
To get all lines that have no spaces and end in :, use
^\S+:\R
\S matches non-whitespace symbols only, and \R means any line break.
If you plan to match the last line, too, replace \R with $ (end of line or whole file metacharacter).

Extracting text between two keywords or a keyword and \n

I have a set of lines where most of them follow this format
STARTKEYWORD some text I want to extract ENDKEYWORD\n
I want to find these lines and extract information from them.
Note, that the text between keywords can contain a wide range of characters (latin and non-latin letters, numbers, spaces, special characters) except \n.
ENDKEYWORD is optional and sometimes can be omitted.
My attempts are revolving around this regex
STARTKEYWORD (.+)(?:\n| ENDKEYWORD)
However capturing group (.+) consumes as many characters as possible and takes ENDKEYWORD which I do not need.
Is there a way to get some text I want to extract solely with regular expressions?
You can make (.+) non-greedy by adding ? (by default it is greedy and consumes as much as it can). You can also use $ instead of \n, which is more efficient:
STARTKEYWORD (.+?)(?:$| ENDKEYWORD$)
If you specifically want \n you can use:
STARTKEYWORD (.+?)(?:\n| ENDKEYWORD\n)
You could use a lookahead-based regex. It is always better to use the $ end-of-line anchor, since the last line may not end with a newline character.
STARTKEYWORD (.+?)(?= ENDKEYWORD|$)
OR
STARTKEYWORD (.+?)(?: ENDKEYWORD|$)
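As a rough illustration, the lookahead variant can be tried in Python like this. The helper name extract and the sample input are assumptions; re.M is needed so that $ matches at the end of each line.

```python
import re

# Lazy capture, stopped by either " ENDKEYWORD" or the end of the line.
PATTERN = re.compile(r"STARTKEYWORD (.+?)(?= ENDKEYWORD|$)", re.M)

def extract(text):
    """Return the captured text of every STARTKEYWORD line (hypothetical helper)."""
    return PATTERN.findall(text)

sample = (
    "STARTKEYWORD some text I want to extract ENDKEYWORD\n"
    "STARTKEYWORD no end keyword here\n"
    "an unrelated line\n"
)
print(extract(sample))
# ['some text I want to extract', 'no end keyword here']
```

Because the capture is lazy, it stops at the first " ENDKEYWORD" on the line, so text that legitimately contains that string would be cut short.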

Regex to keep specific characters from string

I need a regex command that can be used to only keep 0-9,a-z,A-Z, "-" and ":".
How can I do this?
(Also, I would like to know if there are any good Regex GUI editors)
Use a character class, the following will match any one of the characters you listed:
[0-9a-zA-Z\-:]
And here is a regex that will match strings that contain only those characters:
^[0-9a-zA-Z\-:]*$
If you don't want to allow empty strings, change the * to +.
It wasn't exactly clear whether this is what you were trying to do. If you are actually trying to remove all characters except the listed ones, you can negate the character class by adding ^ to the beginning of it, like so:
[^0-9a-zA-Z\-:]
This will match all characters except the ones listed, so you should be able to replace matches of the above regex with an empty string to remove the unwanted characters.
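Both uses of the character class, validating a string and stripping unwanted characters, might look like this in Python. The function names are made up for illustration, and fullmatch is used here in place of the ^...$ anchors because in Python $ also matches just before a trailing newline.

```python
import re

ALLOWED_ONLY = re.compile(r"[0-9a-zA-Z\-:]*")   # zero or more allowed characters
DISALLOWED = re.compile(r"[^0-9a-zA-Z\-:]")     # any single character not in the list

def is_clean(s):
    """True if s consists only of 0-9, a-z, A-Z, '-' and ':' (hypothetical helper)."""
    return ALLOWED_ONLY.fullmatch(s) is not None

def keep_allowed(s):
    """Remove every character that is not in the allowed set (hypothetical helper)."""
    return DISALLOWED.sub("", s)

print(keep_allowed("ab c-1:2!@#"))  # abc-1:2
print(is_clean("abc-1:2"))          # True
print(is_clean("ab c"))             # False
```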

RegEx \D matches start and end of line as well

I need to find lines that consist of 3 digits followed by 3 other characters. I thought I could use the following RegEx:
^\d{3}\D{3}$
But take the following sample text file and run the RegEx above (the text must have the empty lines in it):
1
12
123xxx
123y
aaabb
The problem is that there are two matches: 123xxx (which is fine), but also 123y is matched!
I suspect the reason is that "y" + the end-of-line + the beginning-of-next-line are also matched.
How can I tell the regex engine to ignore line beginnings and endings with \D and match characters only, not positions?
The behavior of $ in UltraEdit changes depending on whether you have "Match Whole Word Only" checked or not. To get the behavior you want you need to make sure that that option is checked. Your regular expression doesn't need to change.
Maybe:
/^\d{3}\D{3}$/m
The m means
Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.
http://perldoc.perl.org/perlre.html
I don't know about UltraEdit exactly but I expect it will have something similar.
Try this:
^\d{3}[\S]{3}$
Match lines with 3 digits followed by three characters that are not blank characters.
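The effect described in the question can be reproduced in Python, where \D also matches line-break characters. This is only a sketch: UltraEdit's engine may behave differently, the blank lines in TEXT are assumed (the question says the sample needs them), and the class [^\d\r\n] is a swapped-in alternative to the \S suggestion above that still allows spaces.

```python
import re

# Assumed sample: the lines from the question plus two empty lines after "123y".
TEXT = "1\n12\n123xxx\n123y\n\n\naaabb\n"

# \D matches any non-digit, including "\n", so "y" plus two line breaks counts as 3.
loose = re.findall(r"^\d{3}\D{3}$", TEXT, re.M)

# Excluding line-break characters keeps the match on a single line.
strict = re.findall(r"^\d{3}[^\d\r\n]{3}$", TEXT, re.M)

print(loose)   # ['123xxx', '123y\n\n']
print(strict)  # ['123xxx']
```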

Matching a line without either of two words

I was wondering how to match a line without either of two words?
For example, I would like to match a line that contains neither Chapter nor Part. So neither of these two lines is a match:
("Chapter 2 The Economic Problem 31" "#74")
("Part 2 How Markets Work 51" "#94")
while this is a match
("Scatter Diagrams 21" "#64")
My Python-style regex looks like (?<!(Chapter|Part)).*?\n. I know it is not right and will appreciate your help.
Try this:
^(?!.*(Chapter|Part)).*
@MRAB's solution will work, but here's another option:
(?m)^(?:(?!\b(?:Chapter|Part)\b).)*$
The . matches one character at a time, after the lookahead checks that it's not the first character of Chapter or Part. The word boundaries (\b) make sure it doesn't incorrectly match part of a longer word, like Partition.
The ^ and $ are start- and end anchors; they ensure that you match a whole line. $ is better than \n because it also matches the end of the last line, which won't necessarily have a linefeed at the end. The (?m) at the beginning modifies the meaning of the anchors; without that, they only match at the beginning and end of the whole input, not of individual lines.
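A per-line variant of the negative-lookahead idea could be sketched in Python as follows. It combines the first answer's leading lookahead with the word boundaries from the second; the function name and the extra "Partition" test line are assumptions added for illustration.

```python
import re

# The lookahead scans the whole line once and rejects it if either word occurs.
NO_CHAPTER_OR_PART = re.compile(r"^(?!.*\b(?:Chapter|Part)\b).*$")

def lines_without_keywords(text):
    """Return the lines that contain neither 'Chapter' nor 'Part' as whole words."""
    return [line for line in text.splitlines() if NO_CHAPTER_OR_PART.match(line)]

sample = "\n".join([
    '("Chapter 2 The Economic Problem 31" "#74")',
    '("Part 2 How Markets Work 51" "#94")',
    '("Scatter Diagrams 21" "#64")',
    '("Partition Functions 7" "#12")',   # hypothetical line: kept thanks to \b
])
for line in lines_without_keywords(sample):
    print(line)
```

Splitting into lines first avoids the need for (?m) and sidesteps empty matches that a whole-text findall can produce at the end of the input.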