I have a text file and I would like to replace the last space with a comma to facilitate data import and processing.
My text file has the following sample lines:
Some text 123 here and then 44.99
more text 789 is 33.75
The result I'd like to obtain:
Some text 123 here and then,44.99
more text 789 is,33.75
You can use the following regex replacement:
Find what: \h+(\S+)$
Replace with: ,\1
Details
\h+ - one or more (+) horizontal whitespace characters (\h), such as spaces and tabs
(\S+) - Capturing group 1: any one or more chars other than whitespace (\S)
$ - end of a line.
The ,\1 replacement replaces the matched text with a comma and the contents of Group 1.
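If the file is processed by a script rather than in Notepad++, the same replacement can be sketched in Python. This is a minimal sketch under two assumptions. First, Python's re module has no \h, so [ \t] stands in for it. Second, the text uses \n line endings; with \r\n endings, $ would not match before the \r. The function name last_space_to_comma is hypothetical.

```python
import re

# [ \t]+ stands in for Notepad++'s \h+ (Python's re has no \h).
LAST_SPACE = re.compile(r"[ \t]+(\S+)$", re.MULTILINE)

def last_space_to_comma(text):
    """Replace the last run of spaces/tabs on each line with a comma."""
    # re.MULTILINE makes $ match at the end of every line, not only at the end of the text.
    return LAST_SPACE.sub(r",\1", text)

sample = "Some text 123 here and then 44.99\nmore text 789 is 33.75"
print(last_space_to_comma(sample))
```

Running it prints the two result lines shown in the question.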
I have more than a million lines of text in this format:
AAAA BBBBBBBBBBBBBBB CCCC
The fields are separated by \t.
I want to convert them to this format:
AAAA_CCCC BBBBBBBBBBBBBBB
But I cannot figure out how to do it using regular expressions in Notepad++.
You may try the following find and replace, in regex mode:
Find: ^(\S+)\t(\S+)\t(\S+)$
Replace: $1_$3 $2
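For a file this large, a script may be more convenient than the editor. The same find-and-replace can be sketched in Python under these assumptions: exactly three tab-separated fields per line and \n line endings. The name reorder is hypothetical.

```python
import re

# Three tab-separated fields per line: AAAA<TAB>BBBB<TAB>CCCC
FIELDS = re.compile(r"^(\S+)\t(\S+)\t(\S+)$", re.MULTILINE)

def reorder(text):
    """Turn 'A<TAB>B<TAB>C' into 'A_C B' on every line."""
    # Group 1 and group 3 joined by an underscore, then a space and group 2.
    return FIELDS.sub(r"\1_\3 \2", text)

print(reorder("AAAA\tBBBBBBBBBBBBBBB\tCCCC"))
```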
If the separator is a tab, you can use
^[^\r\n\t]+\K\t([^\r\n\t]+)\t([^\r\n\t]+)$
The pattern matches:
^ Start of line
[^\r\n\t]+ Match 1+ chars other than a tab or newline
\K\t Forget what was matched so far using \K, then match a tab
([^\r\n\t]+) Capture group 1, match 1+ chars other than a newline or tab
\t Match a tab
([^\r\n\t]+) Capture group 2, match 1+ chars other than a newline or tab
$ End of line
In the replacement, use an underscore, then group 2, a space and group 1. Because \K keeps the first field out of the match, it stays in place in front of the replacement:
_$2 $1
The result of the replacement:
AAAA_CCCC BBBBBBBBBBBBBBB
Probably a terrible title.
I am trying to take the following:
Joe Dane
Bob Sagget
Whitney Houston
Some
Other
Test
And trying to produce:
JOE_DANE("Joe Dane"),
BOB_SAGGET("Bob Sagget"),
WHITNEY_HOUSTON("Whitney Houston"),
SOME("Some"),
OTHER("Other"),
TEST("Test"),
I'm using Notepad++ and am close but not good enough at regex to figure out the remaining expression. So far, this is what I have:
Find what: (^.*)
Replace with: \1 \(\"\1\"\),
Produces: Joe Dane("Joe Dane"),
I've tried replacing with \U$1 \(\"\1\"\), but this also uppercases the second instance of \1. It also does not replace the whitespace with an underscore (_).
This can be done in a single step.
If you don't have more than 2 words in a line:
Ctrl+H
Find what: ^(\S+)(?: (\S+))?$
Replace with: \U$1(?2_$2)\E\("$0"\),
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
^ # beginning of line
(\S+) # group 1, 1 or more non space
(?: (\S+))? # non capture group, a space, group 2, 1 or more non space, optional
$ # end of line
Replacement:
\U # uppercased
$1 # group 1
(?2_$2) # if group 2 exists, add it preceded by an underscore
\E # end uppercase
\("$0"\), # the whole match with parens and quote
If you have more than 2 words (up to 5), use:
Find ^(\S+)(?: (\S+))?(?: (\S+))?(?: (\S+))?(?: (\S+))?
Replace: \U$1(?2_$2)(?3_$3)(?4_$4)(?5_$5)\E\("$0"\),
If you have more than five words, add as many (?: (\S+))? groups as needed, along with the corresponding conditionals in the replacement.
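Python's re.sub has no \U...\E case conversion and no (?N...) conditionals in the replacement string, so a Python sketch swaps them for a callable replacement. As an assumption beyond the answer above, it joins any number of space-separated words, so there is no five-word limit. The names to_constant and convert are hypothetical.

```python
import re

# One line = one or more words separated by single spaces.
LINE = re.compile(r"^\S+(?: \S+)*$", re.MULTILINE)

def to_constant(match):
    original = match.group(0)
    # Uppercase and join the words with underscores; this plays the role of
    # \U$1(?2_$2)...\E, but for any number of words.
    name = "_".join(original.upper().split(" "))
    # Append the untouched line in ("...") followed by a comma, like \("$0"\),
    return f'{name}("{original}"),'

def convert(text):
    return LINE.sub(to_constant, text)

print(convert("Joe Dane\nBob Sagget\nWhitney Houston\nSome\nOther\nTest"))
```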
You might do it in 2 steps. First, match any character 1 or more times from the start of the line.
Find what
^.+
For the first replacement, you can use \E to end the uppercasing started by \U, and then use the full match $0 again
Replace with
\U$0\E\("$0"\),
For the second step, to replace the spaces with underscores, you could skip over the text between the parentheses and match the spaces between uppercase chars.
Find what
\(".*?"\)(*SKIP)(*F)|[A-Z]+\K\h+(?=[A-Z])
\(".*?"\) Match from (" till ")
(*SKIP)(*F)| Skip this part of the match
[A-Z]+\K Match uppercase chars and use \K to clear the current match buffer (forget what was matched so far)
\h+(?=[A-Z]) Match 1+ horizontal whitespace chars and assert an uppercase char to the right
Replace with _
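The two-step approach can also be sketched in Python. Python's re module supports neither (*SKIP)(*F) nor \K, so this sketch substitutes two common workarounds: the ("...") part is captured and written back unchanged, and a lookbehind (?<=[A-Z]) replaces [A-Z]+\K. The names step1, step2 and STEP2 are hypothetical.

```python
import re

def step1(text):
    # Like ^.+ replaced with \U$0\E\("$0"\), : an uppercased copy of the
    # line followed by the original line wrapped in ("...") and a comma.
    return re.sub(r"^.+",
                  lambda m: f'{m.group(0).upper()}("{m.group(0)}"),',
                  text, flags=re.MULTILINE)

# Alternative 1 captures the ("...") part so it can be written back unchanged,
# standing in for PCRE's (*SKIP)(*F). Alternative 2 matches spaces/tabs between
# uppercase letters; the lookbehind stands in for [A-Z]+\K.
STEP2 = re.compile(r'(\(".*?"\))|(?<=[A-Z])[ \t]+(?=[A-Z])')

def step2(text):
    # Keep a skipped ("...") part as is; replace a matched gap with "_".
    return STEP2.sub(lambda m: m.group(1) or "_", text)

print(step2(step1("Joe Dane\nABC DEF")))
```

The all-caps line ABC DEF shows why the skip matters: without it, the space inside ("ABC DEF") would also become an underscore.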
I'm trying to clean up some assembly code and I'd like to convert the spaces between the instruction and argument to tabs. However, I'd like to avoid inadvertently converting the spaces between the words in the comments after the semicolon.
Here is an example of some lines of code:
label: bcf INTCON,2 ; comment comment and more comment.
btfss PORTA,2
The closest I've come is (?<=^).+(?=;). This not only matches EVERYTHING between the beginning of the line and the semicolon, it also includes all semicolons except the very last one. Imagine lines of code with comments that were commented out. It also doesn't take lines without comments into consideration.
How do I do this?
Maybe
^([^:\r\n]+:)\s*([^\r\n]+?)(?:$|\s{2,})(;.*)?$
with a replacement of
$1 $2 $3
might be OK to start with.
Ctrl+H
Find what: ^(\w+:)\h+|^\h+
Replace with: (?1$1\t:\t\t)
check Wrap around
check Regular expression
Replace all
Explanation:
^ # beginning of line
(\w+:) # group 1, 1 or more word characters followed by colon
\h+ # 1 or more horizontal spaces
| # OR
^ # beginning of line
\h+ # 1 or more horizontal spaces
Replacement:
(?1 # if group 1 exists, then
$1\t # content of group 1 and a tab
: # else
\t\t # 2 tabs
) # end conditional replace
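Python has no conditional replacement syntax like (?1...:...), but a callable replacement can make the same choice. This is a minimal sketch that assumes \n line endings and uses [ \t] in place of \h. The names LEADING and tabify are hypothetical.

```python
import re

# Alternative 1: a label such as "label:" followed by spaces/tabs (group 1).
# Alternative 2: leading spaces/tabs on a line without a label.
LEADING = re.compile(r"^(\w+:)[ \t]+|^[ \t]+", re.MULTILINE)

def tabify(text):
    # Emulates (?1$1\t:\t\t): the label plus one tab if group 1 matched,
    # otherwise two tabs.
    return LEADING.sub(lambda m: (m.group(1) + "\t") if m.group(1) else "\t\t", text)

print(tabify("label: bcf INTCON,2 ; comment comment\n    btfss PORTA,2"))
```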
If you want to change the space between bcf and INTCON,2 to 2 tabs, you might match the 2 "words" and make sure that they don't start with a semicolon (;).
^(?:\S+:)?\h+(?!;)\S+\K\h+(?=[^\s;])
^ Start of line
(?:\S+:)? Optionally match 1+ non whitespace chars and :
\h+(?!;) Match 1+ horizontal whitespace chars, then assert what is on the right is not a ;
\S+\K Match 1+ non whitespace chars, forget what was matched
\h+ Match 1+ horizontal whitespace chars (this match will be replaced)
(?=[^\s;]) Assert what is on the right is not a whitespace char or ;
In the replacement, use 2 tabs: \t\t
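A Python sketch of the same idea follows. Because Python's re has no \K, the part that \K would discard is captured as group 1 and written back before the two tabs. [ \t] stands in for \h, and the names OPERAND_GAP and tab_operands are hypothetical.

```python
import re

# Group 1 is what \K discards in the original: an optional "label:",
# the whitespace after it, and the mnemonic (e.g. bcf).
OPERAND_GAP = re.compile(r"^((?:\S+:)?[ \t]+(?!;)\S+)[ \t]+(?=[^\s;])", re.MULTILINE)

def tab_operands(text):
    # Write the captured prefix back, then two tabs instead of the spaces.
    # The ^ anchor limits this to one replacement per line, so spaces
    # inside comments are left alone.
    return OPERAND_GAP.sub(r"\1\t\t", text)

print(tab_operands("label: bcf INTCON,2 ; comment comment\n    btfss PORTA,2"))
```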
Edit
If you want to find the first space between non-whitespace chars, you might use
^.*?\S\K (?=\S)
I am converting a PDF to text with xpdf and then finding some words with the help of regex and preg_match_all.
I am separating my words with a colon in pdftotext.
Below is my pdftotext output:
In respect of Shareholders
Name: xyx
Residential address: dublin
No of Shares: 2
Name: abc
Residential address: canada
No of Shares: 2
So I wrote a regex that shows me the words after a colon in the text:
$regex = '/(?<=: ).+/';
preg_match_all($regex, $string, $matches);
But now I want a regex that displays all the data after In respect of Shareholders.
So I wrote $regex = '/(?<=In respect of Shareholders).*?(?=\s)';
But it only shows me:
Name: xyx
I want to first find all the data after In respect of Shareholders, and then use another regex to find the words after each colon.
You may use
if (preg_match_all('~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~', $string, $matches)) {
print_r($matches[0]);
}
Details
(?:\G(?!\A)|In respect of Shareholders) - either the end of the previous successful match or In respect of Shareholders text
\s* - 0+ whitespaces
[^:\r\n]+ - 1 or more chars other than :, CR and LF
: - a colon
\h* - 0+ horizontal whitespaces
\K - match reset operator that discards all text matched so far
.* - the rest of the line (0 or more chars other than line break chars).
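Python's re module has no \G, so chaining matches from the marker as above is not available there. The sketch below instead follows the two-step approach the question describes: cut the text after the marker, then collect the value after each colon. The names VALUE and values_after, and the default marker argument, are assumptions for illustration.

```python
import re

# One "key: value" line; group 1 is the value (the rest of the line).
VALUE = re.compile(r"^[^:\r\n]+:[ \t]*([^\r\n]*)", re.MULTILINE)

def values_after(text, marker="In respect of Shareholders"):
    # Step 1: keep only the text that follows the marker.
    start = text.find(marker)
    if start == -1:
        return []
    section = text[start + len(marker):]
    # Step 2: collect the value after the colon on every line.
    return VALUE.findall(section)

sample = ("Company: acme\nIn respect of Shareholders\nName: xyx\n"
          "Residential address: dublin\nNo of Shares: 2\n"
          "Name: abc\nResidential address: canada\nNo of Shares: 2")
print(values_after(sample))
```

The Company: acme line before the marker is ignored because only the text after the marker is searched.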
In your regex (?<=: ).+ you match any character 1+ times after a colon and a space. To capture everything that follows the spaces or tabs after the colon in a group, you could use (?<=:)[\t ]+(.+)
Another way to match the texts using a capturing group could be:
^.*?:[ \t]+(\w+)
Explanation
^ Assert the start of the string (or of each line, with the m modifier)
.*?: Match any characters non-greedily, followed by a :
[ \t]+ Match 1+ times a space or a tab
(\w+) Capture in a group 1+ word characters
Or use \K to forget what was matched if that is supported:
^.*?:\h*\K\w+
I have used a regex to successfully extract everything right after "Abc 123", but it doesn't extract anything from the next line.
Is there any way I can use regex to extract the following:
"Abc 123 def
ghi
jkl"
"Abc 123 def ghi jkl mno"
"Abc 123 def ghi jkl
mno"
I am using Regex in Talend.
I think you want to extract substrings that start at the beginning of a line with 1+ word chars, then a whitespace, then 1 or more digits, and that span across multiple lines up to the next occurrence of the same pattern.
You may use the following regex (note the flags and notation may differ depending on the language you are using):
/^(\w+)\s(\d+)(.*(?:\r?\n(?!\w+\s\d).*)*)/gm
Details:
^ - start of a line
(\w+) - Group 1: one or more word chars
\s - 1 whitespace
(\d+) - Group 2: one or more digits
(.*(?:\r?\n(?!\w+\s\d).*)*) - Group 3:
.* - any 0+ chars other than line break chars
(?:\r?\n(?!\w+\s\d).*)* - zero or more sequences of:
\r?\n - a line break...
(?!\w+\s\d) - that is not followed by 1+ word chars, a whitespace and a digit
.* - any 0+ chars other than line break chars
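The same regex works unchanged in Python's re module, which can be handy for testing it outside Talend. In this sketch re.MULTILINE plays the role of the m flag and finditer the role of the g flag. RECORD and records are hypothetical names, and the sample input simply concatenates the three examples from the question.

```python
import re

# Port of /^(\w+)\s(\d+)(.*(?:\r?\n(?!\w+\s\d).*)*)/gm
RECORD = re.compile(r"^(\w+)\s(\d+)(.*(?:\r?\n(?!\w+\s\d).*)*)", re.MULTILINE)

def records(text):
    # Each match runs from "Abc 123" up to the line before the next such start.
    return [m.group(0) for m in RECORD.finditer(text)]

sample = "Abc 123 def\nghi\njkl\nAbc 123 def ghi jkl mno\nAbc 123 def ghi jkl\nmno"
for rec in records(sample):
    print(repr(rec))
```

A continuation line that itself starts with a word, a space and a digit (for example item 5) would be treated as the start of a new record; that is a limitation of the lookahead, not of the port.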
(\w)+\s(\d+)((.|\R)+) should be what you want; escaped as a Java string literal, it becomes:
(\\w)+\\s(\\d+)((.|\\R)+)
\R is a linebreak matcher available in Java regex since Java 8; it matches any Unicode line break sequence, including both \r\n and \n.
If you only want to allow a single line break:
(\w)+\s(\d+)((.+)(\R.+){0,1})
I think you should specify your desired output more precisely, but this answer shows how to include either multiple lines or at most 2 lines.