How to write to a file a regular expression match - regex

I have been parsing a log file and working on it just using print. I've got it working but I can't figure out how to write it to a file instead of printing it to screen.
I've tried opening output file o for writing, then the following regex
matched = re.search(r"(http|https)://(.*?)./+", line)
o.write(matched)
It throws an error that it has to be a string object for the .write argument. I've also tried o.write(matched(1),line) but that only gets me http. I'm a newbie so I'm sorry if this is to simple a question. But I don't know enough about this to know where to start.

Here's the documentation for Match objects where one of the functions mentions what you want:
Match.group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). [...]
Here's a runnable example:
import re
line = "Some text with https://www.example.com/ in it"
matched = re.search(r"(http|https)://(.*?)./+", line)
with open("file.txt", "w") as o:
o.write(matched.group())
which results in:
$ python3 test.py; cat file.txt; echo
https://www.example.com/
$ python2 test.py; cat file.txt; echo
https://www.example.com/

Related

Add text to the end if not already added

I have the following lines:
source = "git::ssh://git#github.abc.com/test//bar"
source = "git::ssh://git#github.abc.com/test//foo?ref=tf12"
resource = "bar"
I want to update any lines that contain source and git words by adding ?ref:tf12 to the end of the line but inside ". If the line already contains ?ref=tf12, it should skip
source = "git::ssh://git#github.abc.com/test//bar?ref=tf12"
source = "git::ssh://git#github.abc.com/test//foo?ref=tf12"
resource = "bar"
I have the following expression using sed, but it outputs wrongly
sed 's#source.*git.*//.*#&?ref=tf12#' file.tf
source = "git::ssh://git#github.abc.com/test//bar"?ref=tf12
source = "git::ssh://git#github.abc.com/test//foo"?ref=tf12?ref=tf12
resource = "bar"
Using simple regular expressions for this is rather brittle; if at all possible, using a more robust configuration file parser would probably be a better idea. If that's not possible, you might want to tighten up the regular expressions to make sure you don't modify unrelated lines. But here is a really simple solution, at least as a starting point.
sed -e '/^ *source *= *"git/!b' -e '/?ref=tf12" *$/b' -e 's/" *$/?ref=tf12"/' file.tf
This consists of three commands. Remember that sed examines one line at a time.
/^ * source *= *"git/!b - if this line does not begin with source="git (with optional spaces between the tokens) leave it alone. (! means "does not match" and b means "branch (to the end of this script)" i.e. skip this line and fetch the next one.)
/?ref=tf12" *$/b similarly says to leave alone lines which match this regex. In other words, if the line ends with ?ref=tf12" (with optional spaces after) don't modify it.
s/"* $/?ref=tf12"/ says to change the last double quote to include ?ref=tf12 before it. This will only happen on lines which were not skipped by the two previous commands.
sed '/?ref=tf12"/!s#\(source.*git.*//.*\)"#\1?ref=tf12"#' file.tf
/?ref=tf12"/! Only run substitude command if this pattern (?ref=tf12") doesn't match
\(...\)", \1 Instead of appending to the entire line using &, only match the line until the last ". Use parentheses to match everything before that " into a group which I can then refer with \1 in the replacement. (Where we re-add the ", so that it doesn't get lost)

Regex: keep same pattern found multiple times in same line and replace line by appending single pattern in front

Is it possible with notepad++ (or maybe from linux bash shell) to create multiple lines from a pattern found , as many times as the pattern is found and also append single found pattern in the newly created line?
The multi pattern is val=[0-9]+
The single pattern is id=[a-zA-Z0-9]+
Example:
Input lines:
id=af2477,val=333,val=777
id=af3456,val=222,val=444,val=678
id=af3327,val=3234,val=123,val=701
Output lines:
id=af2477,val=333
id=af2477,val=777
id=af3456,val=222
id=af3456,val=444
id=af3456,val=678
id=af3327,val=3234
id=af3327,val=123
id=af3327,val=701
I have tried with 2 subgroups but it wont work. It will only replace the second group once:
find what:(id=[a-zA-Z0-9]+,)(val=[0-9]+,)*
replace:\n\1,\2
UPDATE: Both answers from Toto and Wiktor Stribiżew seem to do the job. Haven't tested them yet. I would still like to see how this can work with the use of Notepad++ (even if multiple steps are needed)
Since you also consider using Linux tools for this, an awk solution looks much more viable:
awk 'BEGIN{FS=OFS=","} /^id=[a-zA-Z0-9]+(,val=[0-9]+)*$/{
for(i=2; i<=NF; i++) {
print $1,$i
}; next;
}{print $0}' file > outfile
See the online demo.
Here, any line that matches ^id=[a-zA-Z0-9]+(,val=[0-9]+)*$ (i.e. matches the format of the lines you need to expand) is split the way you need with for(i=2; i<=NF; i++) {print $1,$i}; next;. Else, the line is written as is (print $0).
The BEGIN{FS=OFS=","} part sets the input and output field separator to a comma.
This perl one-liner does the job (output on STDOUT):
perl -anE '($id,$vals)=/(id=\w+),(.+)$/;say "$id,$_" for split/,/,$vals' file
id=af2477,val=333
id=af2477,val=777
id=af3456,val=222
id=af3456,val=444
id=af3456,val=678
id=af3327,val=3234
id=af3327,val=123
id=af3327,val=701
Explanation:
($id,$vals)=/(id=\w+),(.+)$/; # explode id and values for each line in input file
say "$id,$_" for split/,/,$vals # print id and each value
You can redirect the output to another file:
perl -anE '($id,$vals)=/(id=\w+),(.+)$/;say "$id,$_" for split/,/,$vals' file > outputfile
Or do the change in-place:
perl -i -anE '($id,$vals)=/(id=\w+),(.+)$/;say "$id,$_" for split/,/,$vals' file
It is possible, yet very complex to do that with one regular expression for which you are gonna have to use (?R) and conditional statements.
With multiple steps would be pretty simple. You can for instance do find and replace using the max number of val that you might have in the longest lines, such as, imagine 4 would be the largest number of val, then we'll have four of (,val=[^\r\n,]*) in our initial expression:
^(id=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)$
and replace that with four lines,
$1$2\n$1$3\n$1$4\n$1$5
---- ---- ---- ----
Demo for Step 1
For any additional step, we can simply remove one val and one line from the end of initial expression and replacement. For example, our expression would look like
^(id=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)$
in the second step, for which we'd replace it with:
$1$2\n$1$3\n$1$4
---- ---- ----
Demo for Step 2
In the third and final step, our expression has two vals,
^(id=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)$
and our replacement will have two lines:
$1$2\n$1$3
---- ----
Demo for Step 3
For the case exampled in the question, only two steps are required and the second and third expressions would likely work just fine.

Find and remove specific string from a line

I am hoping to receive some feedback on some code I have written in Python 3 - I am attempting to write a program that reads an input file which has page numbers in it. The page numbers are formatted as: "[13]" (this means you are on page 13). My code right now is:
pattern='\[\d\]'
for line in f:
if pattern in line:
re.sub('\[\d\]',' ')
re.compile(line)
output.write(line.replace('\[\d\]', ''))
I have also tried:
for line in f:
if pattern in line:
re.replace('\[\d\]','')
re.compile(line)
output_file.write(line)
When I run these programs, a blank file is created, rather than a file containing the original text minus the page numbers. Thank you in advance for any advice!
Your if statement won't work because not doing a regex match, it's looking for the literal string \[\d\] in line.
for line in f:
# determine if the pattern is found in the line
if re.match(r'\[\d\]', line):
subbed_line = re.sub(r'\[\d\]',' ')
output_file.writeline(subbed_line)
Additionally, you're using the re.compile() incorrectly. The purpose of it is to pre-compile your pattern into a function. This improves performance if you use the pattern a lot because you only evaluate the expression once, rather than re-evaluating each time you loop.
pattern = re.compile(r'\[\d\]')
if pattern.match(line):
# ...
Lastly, you're getting a blank file because you're using output_file.write() which writes a string as the entire file. Instead, you want to use output_file.writeline() to write lines to the file.
You don't write unmodified lines to your output.
Try something like this
if pattern in line:
#remove page number stuff
output_file.write(line) # note that it's not part of the if block above
That's why your output file is empty.

Display all lines matching a pattern in vim

I search for a particular string in a file in vim, and I want all the lines with matching string to be displayed, perhaps in another vim window.
Currently I do this:
Search for 'string'
/string
and move to next matching string
n or N
Bur, I want all the lines with matching string at one place.
For example:
1 Here is a string
2 Nothing here
3 Here is the same string
I want lines 1 and 3 to be displayed as below, highlighting string
1 Here is a string
3 Here is the same string
:g/pattern/#<CR>
lists all the lines matching pattern. You can then do :23<CR> to jump to line 23.
:ilist pattern<CR>
is an alternative that filters out comments and works across includes.
The command below:
:vimgrep pattern %|cwindow<CR>
will use Vim's built-in grep-like functionality to search for pattern in the current file (%) and display the results in the quickfix window.
:grep pattern %|cwindow<CR>
does the same but uses an external program. Note that :grep and :vimgrep work with files, not buffers.
Reference:
:help :g
:help include-search
:help :vimgrep
:help :grep
:help :cwindow
FWIW, my plugin vim-qlist combines the features of :ilist and the quickfix window.
From the comments I believe your file looks like this, i.e. the line numbers are not part of the text:
Here is a string
Nothing here
Here is the same string
You could copy all lines matching a pattern into a named register ("a" in the example below), then paste it into a new file:
:g/string/y A
:e newfile
:"ap
Which gets you:
Here is a string
Here is the same string
Alternatively, you can use the grep command and add -n to include line numbers:
:grep -n string %
1:~/tmp.txt [text] line: 3 of 3, col: 23 (All)
:!grep -nH -n string /home/christofer/tmp.txt 2>&1| tee /tmp/vHg7GcV/3
[No write since last change]
/home/christofer/tmp.txt:1:Here is a string
/home/christofer/tmp.txt:3:Here is the same string
(1 of 2): Here is a string
Press ENTER or type command to continue
By default you'll get the output in the "command buffer" down at the bottom (don't know its proper name), but you can store it in several different places, using :copen for example.
Following this answer over on the Vi StackExchange:
:v/mystring/d
This will remove all lines not containing mystring and will highlight mystring in the remaining lines.

Vim: How to delete repetition in a line

I am having a log file for analysis, in that few of the line will have repetition of it own, but not complete repetition, say
Alex is here and Alex is here and we went out
We bothWe both went out
I want to remove the first occurrence and get
Alex is here and we went out
We both went out
Please share a regex to do in Vim in Windows.
I don't recommend trying to use regex magic to solve this problem. Just write an external filter and use that.
Here's an external filter written in Python. You can use this to pre-process the log file, like so:
python prefix_chop.py logfile.txt > chopped.txt
But it also works by standard input:
cat logfile.txt | prefix_chop.py > chopped.txt
This means you can use it in vim with the ! command. Try these commands: goto line 1, then pipe from current line through the last line through the external program prefix_chop.py:
1G
!Gprefix_chop.py<Enter>
Or you can do it from ex mode:
:1,$!prefix_chop.py<Enter>
Here's the program:
#!/usr/bin/python
import sys
infile = sys.stdin if len(sys.argv) < 2 else open(sys.argv[1])
def repeated_prefix_chop(line):
"""
Check line for a repeated prefix string. If one is found,
return the line with that string removed, else return the
line unchanged.
"""
# Repeated string cannot be more than half of the line.
# So, start looking at mid-point of the line.
i = len(line) // 2 + 1
while True:
# Look for longest prefix that is found in the string after pos 0.
# The prefix starts at pos 0 and always matches itself, of course.
pos = line.rfind(line[:i])
if pos > 0:
return line[pos:]
i -= 1
# Stop testing before we hit a length-1 prefix, in case a line
# happens to start with a word like "oops" or a number like "77".
if i < 2:
return line
for line in infile:
sys.stdout.write(repeated_prefix_chop(line))
I put a #! comment on the first line, so this will work as a stand-alone program on Linux, Mac OS X, or on Windows if you are using Cygwin. If you are just using Windows without Cygwin, you might need to make a batch file to run this, or just type the whole command python prefix_chop.py. If you make a macro to run this you don't have to do the typing yourself.
EDIT: This program is pretty simple. Maybe it could be done in "vimscript" and run purely inside vim. But the external filter program can be used outside of vim... you can set things up so that the log file is run through the filter once per day every day, if you like.
Regex:\b(.*)\1\b
Replace with:\1 or $1
If you want to deal with more than two repeating sentences you can try this
\b(.+?\b)\1+\b
--
|->avoids matching individual characters in word like xxx
NOTE
Use \< and \> instead of \b
You could do it by matching as much as possible at the beginning of the line and then using a backreference to match the repeated bit.
For example, this command solves the problem you describe:
:%s/^\(.*\)\(\1.*\)/\2