I want to remove simple // comments in a string.
My String is called input
def input = '''test //kommentar
|
|//noch einer
|
|und noch //einer'''.stripMargin()
The regex is \s*\/\/.*$ and can be tested here http://regexr.com?37ks0
In my code I have input = input.replaceAll(/\s*\/\/.*$/ , '')
But it doesn't work. Can anybody help me?
At the very least, you need to make sure that the $ anchor is allowed to match the end of each line, not just the end of the entire string:
input = input.replaceAll(/(?m)\s*\/\/.*$/ , '')
But what if // occurs, say, in a quoted string? Or in any other circumstance where it does not mean "start of comment"?
And if you want to keep the //noch einer line as a blank line in your output, you could try:
input.replaceAll( '(?m)//.*$' , '' )
Of course, if the line above were in your input text, all of this regex munging would break the input code, as that line would become input.replaceAll('(?m)
As a general rule, this sort of regular expression parsing of code is never a good idea.
OK, I got the answer from a colleague.
If the input were a single line, the code in the description would work.
Because I have multiple lines, I have to use the following:
input = input.replaceAll(Pattern.compile(/\s*\/\/.*$/, Pattern.MULTILINE), '')
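The effect of the multiline flag can also be checked outside Groovy. The sketch below is a Python analogue, not the Groovy code from this thread: it uses Python's re.MULTILINE, which plays the same role for the $ anchor as (?m) or Pattern.MULTILINE, and the variable names are made up for illustration.

```python
import re

# Same sample text as in the question, written as a plain Python string.
text = "test //kommentar\n\n//noch einer\n\nund noch //einer"

# Without MULTILINE, $ only matches at the very end of the string,
# so only the comment on the last line is removed.
single = re.sub(r"\s*//.*$", "", text)

# With MULTILINE, $ also matches just before each newline,
# so every comment is removed. Note that \s* can also swallow
# the newlines in front of a comment-only line.
multi = re.sub(r"\s*//.*$", "", text, flags=re.MULTILINE)

# Dropping the leading \s* keeps comment-only lines as blank lines.
keep_blank = re.sub(r"//.*$", "", text, flags=re.MULTILINE)

print(repr(single))
print(repr(multi))
print(repr(keep_blank))
```

Printing the repr() of each result makes the surviving newlines and trailing spaces visible, which is where the three variants differ.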
I need a tip or suggestion, ideally with an example, on how to append a .txt extension directly after the last character of a variable's value.
For example:
set txt " ONLINE ENGLISH COURSE - LESSON 5 "
set result [concat "$txt" .txt]
Print:
Note that there are spaces at the start, in the middle, and at the end of the string stored in the variable (txt). The spaces at the start and in the middle must be kept, but the trailing space after the end of the sentence should be replaced by the extension [.txt].
Tcl's built-in concat command does not achieve the desired effect.
The expected result was something like this:
ONLINE ENGLISH COURSE - LESSON 5.txt
I know I could remove spaces with string map but I don't know how to remove just the last occurrence on the line.
And otherwise I don’t know how to remove the last space to add the text [.txt]
If anyone can point me to one or more solutions, thank you in advance.
set result "[string trimright $txt].txt"
or
set result [regsub {\s*$} $txt ".txt"]
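For comparison, here is a minimal Python sketch of the same two ideas (trim the right-hand side, or replace the trailing whitespace with a regex). It is an analogue for illustration, not Tcl; the variable names result and result_re are made up.

```python
import re

txt = " ONLINE ENGLISH COURSE - LESSON 5 "

# Counterpart of [string trimright $txt]: strip trailing whitespace only,
# leaving the leading and inner spaces untouched.
result = txt.rstrip() + ".txt"

# Counterpart of the regsub variant. count=1 limits the substitution to
# the first match, like regsub without -all.
result_re = re.sub(r"\s*$", ".txt", txt, count=1)

print(repr(result))
print(repr(result_re))
```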
I have a list of strings containing the names of actors in a movie that I want to extract. In some cases, the actor's character name is also included which must be ignored.
Here are a couple of examples:
# example 1
input = 'Levan Gelbakhiani as Merab\nAna Javakishvili as Mary\nAnano Makharadze'
expected_output = ['Levan Gelbakhiani', 'Ana Javakishvili', 'Anano Makharadze']
# example 2
input = 'Yoosuf Shafeeu\nAhmed Saeed\nMohamed Manik'
expected_output = ['Yoosuf Shafeeu', 'Ahmed Saeed', 'Mohamed Manik']
Here is what I've tried to no avail:
import re
output = re.findall(r'(?:\\n)?([\w ]+)(?= as )?', input)
output = re.findall(r'(?:\\n)?([\w ]+)(?: as )?', input)
output = re.findall(r'(?:\\n)?([\w ]+)(?:(?= as )|(?! as ))', input)
The \n in the input string are new line characters. We can make use of this fact in our regex.
Essentially, each line always begins with the actor's name. After the actor's name, there could be either the word as, or the end of the line.
Using this info, we can write the regex like this:
^(?:[\w ]+?)(?:(?= as )|$)
First, we assert that we must be at the start of the line ^. Then we match some word characters and spaces lazily [\w ]+?, until we see (?:(?= as )|$), either as or the end of the line.
In code,
output = re.findall(r'^(?:[\w ]+?)(?:(?= as )|$)', input, re.MULTILINE)
Remember to use the multiline option. That is what makes ^ and $ mean "start/end of line".
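As a quick check, the regex above can be run against both examples from the question. This is a small sketch; the helper name actor_names and the constant NAME are made up for illustration, and names containing characters outside [\w ] (hyphens, apostrophes) would need a wider character class.

```python
import re

# The pattern from the answer, compiled once with the multiline flag
# so that ^ and $ apply to every line.
NAME = re.compile(r"^(?:[\w ]+?)(?:(?= as )|$)", re.MULTILINE)

def actor_names(cast):
    """Return the actor name at the start of each line of `cast`."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return NAME.findall(cast)

example_1 = "Levan Gelbakhiani as Merab\nAna Javakishvili as Mary\nAnano Makharadze"
example_2 = "Yoosuf Shafeeu\nAhmed Saeed\nMohamed Manik"

print(actor_names(example_1))
print(actor_names(example_2))
```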
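You can do this without using a regular expression as well.
Here is the code (splitting on ' as ' with a space on both sides, so that a name containing " as" inside a word is not cut short):
output = [x.split(' as ')[0] for x in input.split('\n')]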
I guess you can combine the values obtained from two regex matches :
re.findall('(?:\\n)?(.+)(?:\W[a][s].*?)|(?:\\n)?(.+)$', input)
gives
[('Levan Gelbakhiani', ''), ('Ana Javakishvili', ''), ('', 'Anano Makharadze')]
from which you filter the empty strings out
output = list(map(lambda x : list(filter(len, x))[0], output))
gives
['Levan Gelbakhiani', 'Ana Javakishvili', 'Anano Makharadze']
The below code does not return True for the match. I am wondering why? Any help is appreciated.
Note:
id_list = ['YYY-100', 'YYYMM1640ASS20', 'Cruzer', 'SSDSC2BA20', 'BBBPEDMD40']
'drives.txt' contains lines like this (and does contain above IDs in some lines).
'RED SSDSC2BA200G4R 200 GB 2.5 SATA 6G Class E: 30,000-100,000 writes per second'
So I would assume that id 'SSDSC2BA20' will match the second word in above line, but below match does not return True.
For double-checking, I tried 'if match: print match.group()' but that returns nothing as well. What am I missing?
import re
with open('drives.txt', 'r') as fr:
    for id in id_list:
        for line in fr:
            match = re.search(r'%s' % id, line, re.I)
            if match:
                print 'True'
Note that instead of above regex, I tried the below also, but that did not work either.
my_regex = r".?" + re.escape(id) + r".?"
match = re.search(my_regex, line, re.I)
fr is a file object. With your current approach, you iterate over its lines multiple times, once for each ID. Don't do this. Every time you read a line, the file pointer advances, and by the end of the first pass of the outer loop it points to the end of the file. Every later pass finds nothing left to read, so the inner loop body never runs again.
One fix for this is to do fr.seek(0, 0) after each inner loop, which I don't recommend. The other fix is to reorder your loops. Iterate over your file once. Here's how you do that:
with open('drives.txt', 'r') as fr:
    for line in fr:
        for id in id_list:
            match = re.search(r'%s' % id, line, re.I)
            if match:
                print id, 'matches for line:', line
Also, I should mention that using id as a variable name shadows the builtin id() function, so I recommend you change it.
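The exhaustion of the file iterator is easy to see in isolation. The sketch below is written for Python 3 and uses io.StringIO as a stand-in for open('drives.txt'), so it is self-contained. The second drive line and the name drive_id are made up for illustration, and re.escape is used on the assumption that the IDs should be matched literally.

```python
import io
import re

id_list = ["YYY-100", "SSDSC2BA20", "Cruzer"]

# Stand-in for the real file; the second line is invented sample data.
drives = io.StringIO(
    "RED SSDSC2BA200G4R 200 GB 2.5 SATA 6G\n"
    "BLUE Cruzer Blade 16 GB USB\n"
)

first_pass = list(drives)   # consumes every line
second_pass = list(drives)  # the iterator is exhausted, nothing is left

drives.seek(0)  # rewind once so the loop below can read the lines again

matches = []
for line in drives:              # read the file once...
    for drive_id in id_list:     # ...and test every ID against each line
        if re.search(re.escape(drive_id), line, re.IGNORECASE):
            matches.append((drive_id, line.strip()))

print(len(first_pass), len(second_pass))
print(matches)
```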
I am hoping to receive some feedback on some code I have written in Python 3 - I am attempting to write a program that reads an input file which has page numbers in it. The page numbers are formatted as: "[13]" (this means you are on page 13). My code right now is:
pattern='\[\d\]'
for line in f:
    if pattern in line:
        re.sub('\[\d\]',' ')
        re.compile(line)
        output.write(line.replace('\[\d\]', ''))
I have also tried:
for line in f:
    if pattern in line:
        re.replace('\[\d\]','')
        re.compile(line)
        output_file.write(line)
When I run these programs, a blank file is created, rather than a file containing the original text minus the page numbers. Thank you in advance for any advice!
Your if statement won't work because it isn't doing a regex match; it's looking for the literal string \[\d\] in line. Use re.search() to test the pattern. Also note that \d matches a single digit, so a page number like [13] needs \d+.
for line in f:
    # determine if the pattern is found anywhere in the line
    if re.search(r'\[\d+\]', line):
        line = re.sub(r'\[\d+\]', ' ', line)
    # write every line, whether or not it was changed
    output_file.write(line)
Additionally, you're using re.compile() incorrectly. Its purpose is to pre-compile your pattern into a pattern object. This helps if you use the pattern a lot, because the expression is compiled once rather than being processed again on each pass through the loop.
pattern = re.compile(r'\[\d+\]')
if pattern.search(line):
    # ...
Lastly, you're getting a blank file because the if condition is never true, so nothing is ever written. Note also that re.sub() takes the string to work on as its third argument and returns the new string; that return value is what you pass to output_file.write().
You don't write unmodified lines to your output.
Try something like this
if pattern in line:
    #remove page number stuff
output_file.write(line) # note that it's not part of the if block above
That's why your output file is empty.
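Putting the points from both answers together, a minimal Python 3 sketch might look like the following. It assumes page numbers may have more than one digit (hence \d+) and uses io.StringIO objects in place of the real input and output files. The function name strip_page_numbers and the sample text are made up for illustration.

```python
import io
import re

# Compiled once; \d+ so that "[13]" is matched as well as "[5]".
PAGE_NUMBER = re.compile(r"\[\d+\]")

def strip_page_numbers(lines, out):
    """Copy every line to `out`, with page markers like [13] removed."""
    for line in lines:
        # sub() returns the line unchanged when there is no match,
        # so every line is written whether or not it had a marker.
        out.write(PAGE_NUMBER.sub("", line))

# Stand-ins for the real files, for demonstration only.
source = io.StringIO("Chapter one [13]\nNo marker here\n[14] next page\n")
output = io.StringIO()
strip_page_numbers(source, output)

print(output.getvalue())
```

With real files, the two StringIO objects would be replaced by the handles returned from open().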
So I'm new to Perl and writing a script that would read through rows in a CSV file, and rename a directory of files associated with a certain column in that CSV file.
my $filename_formatted = "$row->[3]"."_"."$row->[4]"."_"."$row->[2]\n";
my $resume_id = $row->[1];
if (-e $resume_id) {
    rename($resume_id, $filename_formatted);
}
Basically, how could I format $resume_id to accept only the contents up to the file extension? The $row->[1] variable contains something like "resume_1231.pdf" or "resume_1231.doc". I basically want everything up to the .
I understand I would probably need a regex, but, I've never utilized it in Perl.
$formatted_resume_id = /($row->[1])?!\..*$/
I don't know.
I suppose you would want everything up to the final dot in the file name (so you would get the full name even if the filename contained dots).
Something like this should do it:
if ( $row->[1] =~ /(.*)\./ ) {
    $formatted_resume_id = $1;
}
The $row->[1] variable contains something like "resume_1231.pdf" or "resume_1231.doc".
I basically want everything up to the .
Try a capturing group that takes everything up to the first dot:
^([^.]*)
Or use a lazy quantifier:
^(.*?)\.
Sample code:
$mystring = "resume_1231.pdf";
if ($mystring =~ m/^([^.]*)/) {
    print "The file name is $1";
}
So the answer was apparently this,
my $resume_file = "bogus_filename.doc";
my ($name) = $resume_file =~ /(.+?)(\.[^.]*$|$)/;
my($ext) = $resume_file =~ /(\.[^.]+)$/;
This would account for any extra periods, as it only accepts up to the very last period.
I'm still a bit unsure as to how this works, so if anyone can break down the first regex, that would be great. I understand (.+?) but I'm lost as to how the second part of that regex means to not include the extension.
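One way to see how the first regex works is to run it on a few names and look at both groups. The sketch below is a Python translation for illustration, not the Perl code itself; the function name split_name and the extra sample filenames are made up. The lazy (.+?) grows one character at a time and stops as soon as the rest of the string can be matched by the second group, which is either a final dot followed by non-dot characters or nothing at all.

```python
import re

def split_name(filename):
    """Split a non-empty filename into (name, extension)."""
    # (.+?)        as few characters as possible...
    # (\.[^.]*$|$) ...such that the remainder is a last ".ext" or empty
    m = re.match(r"(.+?)(\.[^.]*$|$)", filename)
    return m.group(1), m.group(2)

print(split_name("bogus_filename.doc"))
print(split_name("my.resume.v2.pdf"))
print(split_name("README"))
```

Because [^.]* cannot cross a dot and must reach the end of the string, the second group can only ever match the last extension, so earlier dots stay in the name. In Python, os.path.splitext from the standard library is an alternative to a hand-written regex for this task.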