Snippets VS Code Regex - regex

I need your help. I am building a VS Code snippet and need to transform the file's path, which looks like this:
D:\Project\test\src\EnsLib\File\aaa\bbb
First I need to reduce it to this:
EnsLib\File\aaa\bbb
That is, keep only what comes after "src", and then replace each \ with a dot.
Example: D:\Project\test\src\EnsLib\File\aaa\bbb
Result: EnsLib.File.aaa.bbb
The starting point is always whatever follows the src folder.
These are the regexes I have tried:
"${TM_DIRECTORY/(.*\\\\{4})/$1/}",
"${TM_DIRECTORY/.*src\\\\(.*)\\\\(.*)$/.$2/}.${TM_FILENAME_BASE}",
// "${TM_DIRECTORY/.*\\\\(.*)\\\\(.*)$/$1.$2/}.${TM_FILENAME_BASE}",
// "${RELATIVE_FILEPATH/\\D{4}(\\W)\\..+$/$1/g}",
// "${TM_DIRECTORY/(.*src\\\\)//g}.${TM_FILENAME_BASE}",
// "${RELATIVE_FILEPATH/(\\D{3})\\W|(\\..+$)/$1.$2/g}",
// "${RELATIVE_FILEPATH/\\W/./g}",

It seems you want
"${TM_DIRECTORY/^.*?\\\\src\\\\|(\\\\)/${1:+.}/g}"
The regex is ^.*?\\src\\|(\\), it matches
^ - start of string
.*? - any zero or more chars other than line break chars, as few as possible
\\src\\ - \src\ string
| - or
(\\) - Group 1 ($1): a \ char.
If Group 1 matches, the replacement is a ., else, the replacement is an empty string, i.e. the text from the start of string till \src\ is simply removed.
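To see the alternation-plus-conditional idea outside of VS Code, here is a minimal Python sketch. It is only an emulation: Python has no `${1:+.}` syntax, so a replacement function stands in for it, and the helper name `to_dotted` is made up for this illustration.

```python
import re

# Same alternation as the snippet regex: either the prefix up to and
# including "\src\" (no group set), or a single backslash (Group 1).
PATTERN = re.compile(r"^.*?\\src\\|(\\)")

def to_dotted(directory: str) -> str:
    """Drop everything through '\\src\\' and turn the remaining backslashes into dots."""
    # Mirrors ${1:+.}: emit "." only when Group 1 took part in the match,
    # otherwise replace the match with an empty string.
    return PATTERN.sub(lambda m: "." if m.group(1) else "", directory)

print(to_dotted(r"D:\Project\test\src\EnsLib\File\aaa\bbb"))
```

For the path from the question this prints EnsLib.File.aaa.bbb.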

Related

Remove duplicate lines containing same starting text

So I have a massive list of numbers where all lines contain the same format.
#976B4B|B|0|0
#970000|B|0|1
#974B00|B|0|2
#979700|B|0|3
#4B9700|B|0|4
#009700|B|0|5
#00974B|B|0|6
#009797|B|0|7
#004B97|B|0|8
#000097|B|0|9
#4B0097|B|0|10
#970097|B|0|11
#97004B|B|0|12
#970000|B|0|13
#974B00|B|0|14
#979700|B|0|15
#4B9700|B|0|16
#009700|B|0|17
#00974B|B|0|18
#009797|B|0|19
#004B97|B|0|20
#000097|B|0|21
#4B0097|B|0|22
#970097|B|0|23
#97004B|B|0|24
#2C2C2C|B|0|25
#979797|B|0|26
#676767|B|0|27
#97694A|B|0|28
#020202|B|0|29
#6894B4|B|0|30
#976B4B|B|0|31
#808080|B|1|0
#800000|B|1|1
#803F00|B|1|2
#808000|B|1|3
What I am trying to do is remove all duplicate lines that contain the same hex codes, regardless of the text after it.
Example, in the first line #976B4B|B|0|0 the hex #976B4B shows up in line 32 as #976B4B|B|0|31. I want all lines EXCEPT the first occurrence to be removed.
I have been trying to solve this with regex and found that replacing ^(.*)(\r?\n\1)+$ with $1 removes duplicate lines, but that is obviously not what I need. I am looking for some guidance and, ideally, a chance to learn from this.
You can use the following regex replacement. Make sure you click Replace All as many times as necessary, until no more matches are found:
Find What: ^((#[[:xdigit:]]+)\|.*(?:\R.+)*?)\R\2\|.*
Replace With: $1
Details:
^ - start of a line
((#[[:xdigit:]]+)\|.*(?:\R.+)*?) - Group 1 ($1, it will be kept):
(#[[:xdigit:]]+) - Group 2: # and one or more hex chars
\| - a | char
.* - the rest of the line
(?:\R.+)*? - any zero or more non-empty lines (if they can be empty, replace .+ with .*)
\R\2\|.* - a line break, Group 2 value, | and the rest of the line.
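If the list can be processed outside the editor, the same result (keep the first line per hex code, drop later ones) can be had in a single pass. Note that this is a different technique from the regex replacement above: a plain "seen" set keyed on the text before the first |. The function name `drop_repeated_keys` and the short sample are assumptions made for this sketch.

```python
def drop_repeated_keys(lines):
    """Keep only the first line for each key (the text before the first '|')."""
    seen = set()
    kept = []
    for line in lines:
        key = line.split("|", 1)[0]  # e.g. "#976B4B"
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

# A few lines taken from the question's list.
sample = [
    "#976B4B|B|0|0",
    "#970000|B|0|1",
    "#970000|B|0|13",
    "#976B4B|B|0|31",
    "#808080|B|1|0",
]
for line in drop_repeated_keys(sample):
    print(line)
```

Unlike the editor replacement, this needs only one pass, at the cost of keeping every distinct key in memory.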

Regular expression for specific file mask

I want two regex patterns that check file names against a specific file mask. What each one should do is described below.
Pattern 1:
check that the left side of _ has 7 digits.
check that the right side of _ is numeric.
check that the specified extension is present.
The input will look like this: 1234567_1.jpg
Pattern 2:
check that there are 10 digits to the left of a space character
check that there are 4 digits to the right of the space character
check that the right side of _ is numeric
check that the specified extension is present.
The input will look like this: 1234567891 1234_1.png
As stated above, this is to be used to check for a specific file mask.
My first tries were ideas like ^[0-9][0-9].jpg$
and ^[0-9] [0-9][0-9].jpg$.
I apologize for not providing my attempts earlier.
I suggest combining patterns with | (or):
// Needs: using System.IO; using System.Text.RegularExpressions;
string pattern = string.Join("|",
    @"(^[0-9]{7}_[0-9]+\.jpg$)",            // 1st possibility
    @"(^[0-9]{10} [0-9]{4}_[0-9]+\.png$)"); // 2nd one
....
string fileName = @"c:\myFiles\1234567_1.jpg";
// RegexOptions.IgnoreCase - let's accept ".JPG" or ".Jpg" files
if (Regex.IsMatch(Path.GetFileName(fileName), pattern, RegexOptions.IgnoreCase)) {
    ...
}
Let's explain the second pattern: (^[0-9]{10} [0-9]{4}_[0-9]+\.png$)
^ - anchor (string start)
[0-9]{10} - 10 digits - 0-9
" " - single space
[0-9]{4} - 4 digits
_ - single underscore
[0-9]+ - one or more digits
\.png - .png (. is escaped)
$ - anchor (string end)
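For quick experimentation, the two alternatives can also be tried in Python. This is only a sketch of the same combined pattern, not a port of the C# code: the name `matches_mask` is invented here, and `PureWindowsPath` is used on the assumption that the inputs are Windows-style paths.

```python
import re
from pathlib import PureWindowsPath

# The two alternatives joined with "|", as in the answer above.
MASK = re.compile(
    r"(^[0-9]{7}_[0-9]+\.jpg$)|(^[0-9]{10} [0-9]{4}_[0-9]+\.png$)",
    re.IGNORECASE,  # accept ".JPG" or ".Jpg" as well
)

def matches_mask(path: str) -> bool:
    """Check only the file name part of the path against the mask."""
    name = PureWindowsPath(path).name
    return MASK.search(name) is not None

for candidate in (
    r"c:\myFiles\1234567_1.jpg",
    "1234567891 1234_1.png",
    "123456_1.jpg",
):
    print(candidate, matches_mask(candidate))
```

The first two candidates match; the third has only six digits before the underscore and is rejected.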
This should work for the first pattern:
^\d{7}_\d+\.(jpg|png)$
This should work for the second pattern:
^\d{10}\s\d{4}_\d+\.(jpg|png)$
If you want to use them together, combine them like this:
^(\d{7}_\d+\.(jpg|png)|\d{10}\s\d{4}_\d+\.(jpg|png))$
In the group (jpg|png) you can add more extensions by separating them with | (or).
You can check if it works for you at: https://regex101.com/
Cheers!

Lua. Search string in a file and print second column

I am looking for a way to replace the following command with Lua:
grep "dhcp-range" /tmp/etc/dnsmasq.conf | awk -F "\"*,\"*" '{print $2}'
I tried
for line in file:lines() do
    if line:match("([^;]*),([^;]*),([^;]*),([^;]*),([^;]*)") then
        print(line[2])
    end
end
but it doesn't work.
/tmp/etc/dnsmasq.conf looks like this:
dhcp-leasefile=/tmp/dhcp.leases
resolv-file=/tmp/resolv.conf.auto
addn-hosts=/tmp/hosts
conf-dir=/tmp/dnsmasq.d
stop-dns-rebind
rebind-localhost-ok
dhcp-broadcast=tag:needs-broadcast
dhcp-range=lan,192.168.34.165,192.168.34.179,255.255.255.0,12h
no-dhcp-interface=eth0
Here is a function in Lua that will print the values you need if you pass the whole file contents to it:
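function getmatches(text)
    -- Walk the text line by line (runs of characters other than CR/LF)
    for line in string.gmatch(text, "[^\r\n]+") do
        -- m and n receive the two fields after the first comma on a dhcp-range line
        local m,n = string.match(line,"^dhcp%-range[^,]*,([^,]+),([^,]+)")
        if m ~= nil then
            print(m,n)
        end
    end
end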
With string.gmatch(text, "[^\r\n]+"), each file line is accessed (adjust as you see fit), and then the main part is m,n = string.match(line,"^dhcp%-range[^,]*,([^,]+),([^,]+)") that instantiates m with the first IP and n with the second IP found on a line that starts with dhcp-range.
Lua pattern details:
^ - start of string
dhcp%-range - a literal string dhcp-range (in Lua patterns, - is a quantifier matching 0 or more occurrences, as few as possible, so a literal - must be escaped; escapes are formed with %)
[^,]*, - 0+ chars other than , and then a ,
([^,]+) - Group 1 (m): one or more chars other than ,
, - a comma
([^,]+) - Group 2 (n): one or more chars other than ,.
Try this code:
for line in io.lines() do
    local a,b=line:match("^dhcp%-range=.-,(.-),(.-),")
    if a~=nil then
        print(a,b)
    end
end
The pattern reads: match dhcp-range= at the start of a line (note the need to escape - in Lua), skip everything until the next comma, and capture the next two fields between commas.
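For comparison with the Lua patterns, here is the same extraction sketched with a regular expression in Python. In regex syntax the - in dhcp-range needs no escaping, which is the main difference from the Lua version. The function name `dhcp_ranges` is an assumption for this illustration, and the sample is a shortened copy of the config shown in the question.

```python
import re

# Regex counterpart of the Lua pattern "^dhcp%-range[^,]*,([^,]+),([^,]+)".
DHCP_RANGE = re.compile(r"^dhcp-range[^,]*,([^,]+),([^,]+)")

def dhcp_ranges(config_text):
    """Return (second field, third field) for every dhcp-range line."""
    found = []
    for line in config_text.splitlines():
        m = DHCP_RANGE.match(line)
        if m:
            found.append((m.group(1), m.group(2)))
    return found

sample = """dhcp-leasefile=/tmp/dhcp.leases
dhcp-broadcast=tag:needs-broadcast
dhcp-range=lan,192.168.34.165,192.168.34.179,255.255.255.0,12h
no-dhcp-interface=eth0"""

print(dhcp_ranges(sample))
```

Only the dhcp-range line yields a pair; the other dhcp-* lines are skipped because the literal prefix does not match.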

Parsing out particular text in a big text column in a Dataframe - R

Suppose I have the following data,
data
text
abc/1234&
qwertyabc/5555&
a&sdfghabc/ppp&plksa&
z&xabc/lkjh&poiuw&
lkjqwefasrjabc/855698&plkjdhweb
I want to extract the text between the first occurrence of abc/ and the first occurrence of & that follows it. How do I do that?
My output should be as follows,
data
text                             parsed_out
abc/1234&                        1234
qwertyabc/5555&                  5555
a&sdfghabc/ppp&plksa&            ppp
z&xabc/lkjh&poiuw&               lkjh
lkjqwefasrjabc/855698&plkjdhweb  855698
This is what I have tried:
data1 = within(data, FOO<-data.frame(do.call('rbind', strsplit(as.character(text), 'abc/', fixed=TRUE))))
data2 = within(data1, FOO1<-data.frame(do.call('rbind', strsplit(as.character(FOO$X1), '&', fixed=TRUE))))
This uses too much memory, since the text file has 8 million rows, and data2 ends up with several columns because the text contains several '&' characters. Can anybody help me extract the text between these two markers into a single column, in a memory-efficient way?
x = "thesearepresentinthestartingwhichisnotneededhttp://google.com/needstobeparsedout&reoccurencenotneeded&"
Here, the function should look for http://google.com/ and extract everything up to the first &. The output should be needstobeparsedout.
new_x = "\"http://www.google.com/search?q=erykah+badu+with+hiatus+kaiyote,+august+3&""
Why is it not working with this link?
Thanks
I actually want to extract a few parts of the URL; for example, the text between "http:www.google.com/" and the first occurrence of "&".
Use
sub(".*?https?://(?:www\\.)?google\\.com/([^&]+).*", "\\1", x)
The pattern matches:
(optionally add a ^ in front to match the start of string position)
.*? - 0+ chars as few as possible from the start till the first
https?:// - either https:// or http:// followed with
(?:www\\.)? - 1 or 0 (optional) sequence www.
google\\.com/ - literal text google.com/
([^&]+) - 1 or more chars other than & (Capture group 1)
.* - any 0+ chars (up to the end of string).
In the replacement pattern, \1 refers to the subtext captured into Group 1.
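The same pattern can be tried outside R. Below is a small Python sketch of it, plus a generic variant for the abc/ ... & rows from the question. Both function names (`first_segment`, `between`) are hypothetical and exist only for this illustration.

```python
import re

# Same pattern as in the sub() call above, written as a Python raw string.
GOOGLE_PATH = re.compile(r".*?https?://(?:www\.)?google\.com/([^&]+).*")

def first_segment(text):
    """Replace the whole string with Group 1, like sub(..., "\\1", x) in R."""
    return GOOGLE_PATH.sub(r"\1", text)

def between(text, start="abc/", stop="&"):
    """Text between the first `start` and the first `stop` after it, or None."""
    m = re.search(re.escape(start) + "(.*?)" + re.escape(stop), text)
    return m.group(1) if m else None

x = ("thesearepresentinthestartingwhichisnotneeded"
     "http://google.com/needstobeparsedout&reoccurencenotneeded&")
print(first_segment(x))
print(between("a&sdfghabc/ppp&plksa&"))
```

This prints needstobeparsedout and ppp. As with sub in R, `first_segment` returns its input unchanged when the pattern does not match, so the search-based `between` is easier to use when a missing match has to be detected.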

Regex check if a file has any extension

I am looking for a regex to test whether a file has any extension. I define it as follows: a file has an extension if there are no slashes after the last ".". The slashes are always backslashes.
I started with this regex
.*\..*[^\\]
Which translates to
.* Any char, any number of repetitions
\. Literal .
.* Any char, any number of repetitions
[^\\] Any char that is NOT in a class of [single slash]
This is my test data (excluding the text after ##, which marks my comments)
\path\foo.txt ## I only want to capture this line
\pa.th\foo ## But my regex also captures this line <-- PROBLEM HERE
\path\foo ## This line is correctly filtered out
What would be a regex to do this?
Your solution is almost correct. Use this:
^.*\.[^\\]+$
Sample at rubular.
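A quick way to check the pattern against the three test paths from the question is a short Python sketch like the one below. The function name `has_extension` is an assumption for this example; raw strings keep the backslashes literal.

```python
import re

# A dot followed by one or more non-backslash characters up to the end of the string.
HAS_EXTENSION = re.compile(r"^.*\.[^\\]+$")

def has_extension(path):
    """True when no backslash appears after the last dot and something follows that dot."""
    return HAS_EXTENSION.match(path) is not None

for p in (r"\path\foo.txt", r"\pa.th\foo", r"\path\foo"):
    print(p, has_extension(p))
```

Only \path\foo.txt is reported as having an extension; \pa.th\foo fails because a backslash follows its only dot.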
I wouldn't use a regular expression here. I'd split on \ and . instead.
// Backslashes must be doubled inside JavaScript string literals
var path = '\\some\\path\\foo\\bar.htm',
    hasExtension = path.split('\\').pop().split('.').length > 1;
if (hasExtension) console.log('Weee!');
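Here is a simpler function to check it.
const hasExtension = path => {
  const lastDotIndex = path.lastIndexOf('.')
  // The last dot must sit past index 1, must not be the final character,
  // and must not be followed by a backslash (otherwise it belongs to a folder name)
  return lastDotIndex > 1 &&
    path.length - 1 > lastDotIndex &&
    path.indexOf('\\', lastDotIndex) === -1
}
if (hasExtension(path)) console.log('Sweet')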
You can also try an even simpler approach:
(\.[^\\]+)$
Details:
$ = anchors the match at the end of the string
[^\\]+ = Any character except path separator one or more time
\. = looks for <dot> character before extension