Deleting comments from a file, using Regex

I want to write a program that deletes all the comments (starting with "//" until the end of the line) from a file.
I want to do it using regular expressions.
I tried this:
let mutable text = File.ReadAllText("C:\\a.txt")
let regexComment = new Regex("//.*\\r\\n$")
text <- regexComment.Replace(text, "")
File.WriteAllText("C:\\a.txt",text)
But it doesn't work...
Can you please explain why, and suggest something that does work (preferably using regex)?
Thanks :)

Rather than loading the whole file into memory and running a regex on it, a faster approach that will handle any size file without memory issues might look like this:
open System
open System.IO
open System.Text.RegularExpressions
// regex: beginning of line, followed by optional whitespace,
// followed by comment chars.
let reComment = Regex(@"^\s*//", RegexOptions.Compiled)
let stripComments infile outfile =
    File.ReadLines infile
    |> Seq.filter (reComment.IsMatch >> not)
    |> fun lines -> File.WriteAllLines(outfile, lines)
stripComments "input.txt" "output.txt"
The output file must be different from the input file, because we're writing to the output while we're still reading from the input. We use the regex to identify comment lines (with optional leading whitespace), and Seq.filter to make sure the comment lines don't get sent to the output file.
Because we never hold the entire input or output file in memory, this function will work on any size file, and it's likely faster than the "read entire file, regex everything, write entire file" approach.
Danger Ahead
This code will not strip out comments that appear after some code on the same line. However, a regular expression is not the right tool for that job, unless someone can come up with a regular expression that can tell the following two lines of code apart and avoid breaking the first one when you strip everything that matches the regex from the file:
let request = WebRequest.Create("http://foo.com")
let request = WebRequest.Create(inputUrl) // this used to be hard-coded
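To illustrate why a small scanner is a better fit than a regex here, the following is a minimal sketch in Python (not F#) that removes a trailing // comment only when the // is outside a double-quoted string. The function name strip_line_comment is hypothetical, and the sketch assumes ordinary "..." literals with backslash escapes; verbatim strings, triple-quoted strings, character literals and block comments are not handled.

```python
def strip_line_comment(line: str) -> str:
    """Remove a trailing // comment unless the // sits inside a "..." literal.

    Expects a single line without its line terminator.
    """
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 2  # skip the backslash and the character it escapes
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif line.startswith("//", i):
            # Comment starts here; drop it and any whitespace before it.
            return line[:i].rstrip()
        i += 1
    return line


url_line = 'let request = WebRequest.Create("http://foo.com")'
commented_line = "let request = WebRequest.Create(inputUrl) // this used to be hard-coded"
print(strip_line_comment(url_line))
print(strip_line_comment(commented_line))
```

The first line comes back unchanged because its // is inside the string literal; the second loses only the comment.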

let regexComment = new Regex(@"//.*$", RegexOptions.Multiline)
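As a rough illustration of why the multiline option matters, here is a minimal sketch in Python rather than F#. Python's re module treats $ and . much like .NET does for this case: without the multiline flag, $ only matches at the end of the whole input, and . matches \r but not \n. The sample text and variable names are made up for the demonstration.

```python
import re

text = "a = 1 // first\r\nb = 2 // second\r\n"

# Pattern from the question: without the multiline flag, $ only matches
# at the end of the input, so only the last comment is removed.
anchored = re.compile(r"//.*\r\n$")
only_last = anchored.sub("", text)

# With the multiline flag, $ matches before every \n. Note that .* also
# swallows the \r, so CRLF line endings become LF on the stripped lines.
multiline = re.compile(r"//.*$", re.MULTILINE)
every_line = multiline.sub("", text)

# Excluding \r and \n explicitly needs no anchor and keeps the line endings.
no_anchor = re.compile(r"//[^\r\n]*")
keep_endings = no_anchor.sub("", text)

print(repr(only_last))
print(repr(every_line))
print(repr(keep_endings))
```

All three variants still remove a // that appears inside a string literal such as a URL.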

Never mind, I figured it out. It should have been:
let regexComment = new Regex("//.*\\r\\n")

Your regex string seems to be wrong. "\\/\\/.*\\r\\n" worked for me.

Related

How to select specific lines regardless of the content in the middle

Let's start with the structure I have so that I can better explain what I want to do. Imagine that I have a text as follows:
write information1/info_a/content
read information/content
write information1/info_b/content
write information1/info_c/content
write information2/info_a/content
write information2/info_b/content
write information3/format/info_b/content
I want to highlight every line that starts with a specific path and that also contains another path, for example:
starts with 'write'
and contains 'info_b'
The desired output for the example above is then:
write information1/info_b/content
write information2/info_b/content
write information3/format/info_b/content
How can I do this with a regular expression?
Thanks to everybody in advance
I know that for selecting everything that starts with write with regex I can do:
^write
and that for saying until the end of the line I should use the key $

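You can select any line that starts with "write" and contains "info_b" with the regex:
^write.*info_b.*$
This translates to: start of line -> write -> any amount of anything -> info_b -> any amount of anything -> end of line. If the text is matched as one string rather than line by line, enable the multiline option so that ^ and $ anchor at every line.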
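A minimal sketch of applying this pattern to the sample text from the question, written in Python since the question does not name a language. The multiline flag is what lets ^ and $ anchor at each line of the single input string.

```python
import re

text = """write information1/info_a/content
read information/content
write information1/info_b/content
write information1/info_c/content
write information2/info_a/content
write information2/info_b/content
write information3/format/info_b/content"""

# ^ and $ anchor at every line because of re.MULTILINE; "." never
# crosses a newline, so each match stays within one line.
pattern = re.compile(r"^write.*info_b.*$", re.MULTILINE)
matches = pattern.findall(text)

for line in matches:
    print(line)
```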

Parsing links with regex

I can't figure out how to write a regular expression for the following problem. I load some text, and the part that interests me is the links that end with .m3u or .m3u8. For example, if I give my program this input:
Input - player = new Player({"player-id":"1","autoplay":"false","fullscreen":"false","debug":"true","content-volume":"85","ad-volume":"30","ad-load-timeout":"15000","div-id":"videoPlayer","default-quality-index":0,"title":"\u0428\u043f\u0438\u043e\u043d, \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u043c\u0435\u043d\u044f \u043a\u0438\u043d\u0443\u043b ","poster":"https://test/four/v1/video-file1/00/00/00/00/00/00/00/10/22/11/102211-480p.mp4/thumb-33000.jpg","content":{"mp4":[],"dash":"https://test/four/v1/video-file1/00/00/00/00/00/00/00/10/22/11/102211-,480,p.mp4.urlset/manifest.mpd","hls":"https://test/four/v1/video-file1/00/00/00/00/00/00/00/10/22/11/102211-,480,p.mp4.urlset/master.m3u8"},"about":"false","key":"4eeeb77181526bedc1025586d43a70fa","btn-play-pause":"true","btn-stop":"true","btn-fullscreen":"true","btn-prev-next":"false","btn-share":"true","btn-vk-share":"true","btn-twitter-share":"true","btn-facebook-share":"true","btn-google-share":"true","btn-linkedin-share":"true","quality":"true","volume":"true","timer":"true","timeline":"true","iframe-version":"true","max-hls-buffer-size":"10","time-from-cookie":"true","set-prerolls":["https://test/j/v.php?id=645"],"max-prerolls-impressions":1});
By using regex the output should be -
https://test/four/v1/video-file1/00/00/00/00/00/00/00/10/22/11/102211-,480,p.mp4.urlset/master.m3u8
I have tried writing a regular expression for this, but it matches all links and not just the ones I need. I only need the links that end with a specific extension.
Thank you for your answer in advance

I don't see why there are so many downvotes; maybe the question looked totally different originally.
Using regex only, my solution in ASP.NET would be to reverse the text first, then look for everything between "u3m" and the next occurrence of "ptth".
Play with it: http://refiddle.com/nwvu
Regex for m3u8 OR m3u:
(8u3m.+?ptth)|(u3m.+?ptth)
ASP String reversal (from https://forums.asp.net/t/1841367.aspx?Reverse+String+in+asp+net):
string input = TextBox1.Text;
char[] inputarray = input.ToCharArray();
Array.Reverse(inputarray);
string output = new string(inputarray);
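The reversal idea can be sketched end to end as follows. This is a minimal illustration in Python rather than ASP.NET, and the function name find_playlist_links and the shortened sample input are assumptions made for the example. The two alternatives of the regex above are folded into a single pattern with an optional 8.

```python
import re

# Reversed form of "http ... m3u" or "http ... m3u8": in the reversed text
# the extension comes first and the scheme ("ptth") comes last.
REVERSED_LINK = re.compile(r"8?u3m.+?ptth")


def find_playlist_links(text: str) -> list[str]:
    """Return links ending in .m3u or .m3u8, found by searching the reversed text."""
    reversed_text = text[::-1]
    # Each match is itself reversed, so flip it back before returning it.
    return [m.group(0)[::-1] for m in REVERSED_LINK.finditer(reversed_text)]


sample = (
    '{"poster":"https://test/a/thumb.jpg",'
    '"dash":"https://test/a/manifest.mpd",'
    '"hls":"https://test/a/master.m3u8"},"about":"false"'
)
links = find_playlist_links(sample)
print(links)
```

Because the search runs over the reversed text, links come back in reverse document order. The pattern also matches "m3u" anywhere in a URL, not only at its end, so inputs with such paths would need a stricter pattern.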

mIRC Search for multiple words in text file

I am trying to search a text file that will return a result if more than one word is found in that line. I don't see this explained in the documentation and I have tried various loops with no success.
What I would like to do is something similar to this:
$read(name.txt, s, word1|word2|word3)
or even something like this:
$read(name.txt, w, word1*|*word2*|*word3)
I don't know RegEx that well so I'm assuming this can be done with that but I don't know how to do that.

The documentation in the client itself is good, but I also recommend this site: http://en.wikichip.org/wiki/mirc. For your problem there is a nice article: http://en.wikichip.org/wiki/mirc/text_files
All the info is taken from there. So credits to wikichip.
alias testForString {
  while ($read(file.txt, nw, *test*, $calc($readn + 1))) {
    var %line = $v1
    ; you can add your own words in the regex, separate them with a pipe (|)
    ; the g modifier makes the regex collect every match, not just the first
    noop $regex(%line,/(word1|word2|word3|test)/g)
    echo -a Amount of results: $regml(0)
  }
}
$readn is an identifier that returns the number of the line that $read() last matched. It is used to resume the search for the pattern (in this case test) on the next line.
In the code above, $readn starts at 0, so $calc($readn + 1) starts the search at line 1. After every match, $read() continues searching from the following line. When there are no more matches after the specified line, $read() returns $null, which terminates the loop.
The w switch makes $read() treat the search text as a wildcard pattern.
The n switch prevents the text that is read from being evaluated as mSL code. In almost every case you should use the n switch, unless you really need the evaluation. Using $read() without the n switch can leave your script highly vulnerable.
The result is stored in a variable named %line so that you can use it later if you need it.
After that we use noop to run a regex that matches your words. You can then use $regml(0) to get the number of matches found by the regex. With an if-statement you can check whether there are two or more matches.
Hope you find this helpful, if there's anything unclear, I will try to explain it better.
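The counting logic described above (scan each line, count how many of the words occur, keep lines with two or more) can be sketched outside mSL as well. The following is a minimal illustration in Python; the function name lines_with_multiple_words and the sample lines are hypothetical. It counts distinct words rather than total matches, which is one possible reading of the question.

```python
import re

# Same alternation idea as the mSL regex: any one of the listed words.
WORDS = re.compile(r"word1|word2|word3")


def lines_with_multiple_words(lines, minimum=2):
    """Return the lines that contain at least `minimum` different listed words."""
    hits = []
    for line in lines:
        found = set(WORDS.findall(line))  # distinct words seen on this line
        if len(found) >= minimum:
            hits.append(line)
    return hits


sample_lines = ["word1 only", "word1 and word3", "word2 word2", "none"]
result = lines_with_multiple_words(sample_lines)
print(result)
```

To count repeated occurrences instead, compare len(WORDS.findall(line)) with the minimum directly.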
EDIT
@cp022
I can't comment, so I'll post my comment here, so how does that help in any way to read content from a text file?

Regular expression, get the rest of the text as well

I have a text file, a regular expression that looks in that file and gets the things I want. I also write this new information into a new file, however not everything is written to the new file! The file that my regex reads from looks like this:
"This is my text, it contains of 53 or so words file. That is a very
good number. However 80 is a better number. Hopefully I can write more
words soon enough. Hopefully very very soon "
What is written to the new text file is:
"This is my text, it contains of 53 or so words file. That is a very
good number. However 80 is a better number. Hopefully I can write more
words"
I want everything to be written. Any ideas?

Without the regex you were using, it's impossible to say.
I would hazard a guess though, that what you need to do is stick .*$ on the end of the capture group, in order to grab the rest of the text on the line.

^[\s\S]*$
should do it for you.
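Since the original regex was never posted, the following is only a guess at the kind of difference involved. It is a minimal Python sketch showing that .* stops at the first line break while [\s\S]* runs to the end of the text; the sample text is shortened from the question.

```python
import re

text = "That is a very\ngood number. Hopefully I can write more\nwords soon enough. Hopefully very very soon "

# "." does not match a newline by default, so .* stops at the end of line one.
first_line_only = re.match(r".*", text).group(0)

# [\s\S] matches any character including newlines, so the whole text is captured.
whole_text = re.match(r"^[\s\S]*$", text).group(0)

print(repr(first_line_only))
print(whole_text == text)
```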

RegEx for VB.net

I have a txt file with content
$NETS
P3V3_AUX_LGATE; PQ6.8 PU37.2
U335_PIN1; R3328.1 U335.1
$END
It needs to be updated to this format and saved to another txt file:
$NETS
'P3V3_AUX_LGATE'; PQ6.8 PU37.2
'U335_PIN1'; R3328.1 U335.1
$END
NOTE: number of lines may go up to 10,000 lines
My current solution is to read the txt file line by line, detect the presence of the ";" and newline character and do the changes.
Right now I have a variable that holds ALL the lines. Is there another way, something like a replace via RegEx, to make the changes without looping through each line, so that I can readily print the result?
As a follow-up question, which one is more efficient?

Try
ResultString = Regex.Replace(SubjectString, "^([^;\r\n]+);", "'$1';", RegexOptions.Multiline)
on your multiline string.
This will find any string (length one or more) at the start of a line up until the first semicolon if there is one and replace it with its quoted equivalent.
It should be more efficient than looping through the string line by line as you're doing now, but if you're in doubt, you'd have to profile it.
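For readers who want to try the pattern outside VB.NET, here is a minimal sketch of the same replacement in Python. The function name quote_net_names is hypothetical; the pattern is the one above, with the group reference written as \1 instead of $1.

```python
import re

# Start of a line, then everything up to the first semicolon on that line.
LEADING_NAME = re.compile(r"^([^;\r\n]+);", re.MULTILINE)


def quote_net_names(text: str) -> str:
    """Wrap the name before the first ';' on each line in single quotes."""
    return LEADING_NAME.sub(r"'\1';", text)


nets = "$NETS\nP3V3_AUX_LGATE; PQ6.8 PU37.2\nU335_PIN1; R3328.1 U335.1\n$END\n"
quoted = quote_net_names(nets)
print(quoted)
```

Lines without a semicolon, such as $NETS and $END, are left untouched because the character class cannot cross a line break to reach a semicolon on a later line.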

You could probably find all the matches using something like \w+; but I don't know how you'd be able to do a replace on that using Regex.Replace to add the 's but keep the original match.
However, if you already have it as one variable, you don't have to read the file again, either you could make your code find all ;s and then find the previous newline for each, or you could use a String.Split on newlines to split the variable you've already got into lines.
And if you want to get it back to one variable you can just use String.Join.
Personally I'd normally use the String.Split (and possibly the String.Join if needed) method, since I think that would make the code easy to read.
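The split-and-join approach described above might look like this. It is a minimal sketch in Python rather than VB.NET, with a hypothetical function name and the assumption that lines are separated by \n.

```python
def quote_net_names_by_line(text: str) -> str:
    """Split into lines, quote the part before the first ';', and join again."""
    out = []
    for line in text.split("\n"):
        name, sep, rest = line.partition(";")
        if sep and name:
            # The line has a semicolon with a name in front of it.
            out.append(f"'{name}';{rest}")
        else:
            # Header, footer or empty line: keep as-is.
            out.append(line)
    return "\n".join(out)


nets_text = "$NETS\nP3V3_AUX_LGATE; PQ6.8 PU37.2\nU335_PIN1; R3328.1 U335.1\n$END\n"
quoted_by_line = quote_net_names_by_line(nets_text)
print(quoted_by_line)
```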

I would say Yes! this can be done with Regular expressions. Make sure you got the "multiline" option turned on and craft your regular expression using some capture groups to ease the work.
I can however say this will NOT be the optimal one. Since you mention the amount of lines you could be processing, it seems 'resource wise' smarter to use a streaming approach instead of the in memory approach.
Taking the Regex approach (this took 15 minutes, so please don't think it is an optimal solution, just proof that it would work):
private static Regex matcher = new Regex(@"^\$NETS\r\n(?<entrytitle>.[^;]*);\s*(?<entryrest>.*)\r\n(?<entrytitle2>.[^;]*);\s*(?<entryrest2>.*)\r\n\$END\r\n", RegexOptions.Compiled | RegexOptions.Multiline);
static void Main(string[] args)
{
    string newString = matcher.Replace(ExampleFileContent, new MatchEvaluator(evaluator));
}
static string evaluator(Match m)
{
    // Rebuild the block with each entry title wrapped in single quotes.
    return String.Format("$NETS\r\n'{0}'; {1}\r\n'{2}'; {3}\r\n$END\r\n",
        m.Groups["entrytitle"].Value,
        m.Groups["entryrest"].Value,
        m.Groups["entrytitle2"].Value,
        m.Groups["entryrest2"].Value);
}
Hope this helps,
