Lua. Search string in a file and print second column - regex

I'm looking for a way to replace the following command with Lua:
grep "dhcp-range" /tmp/etc/dnsmasq.conf | awk -F "\"*,\"*" '{print $2}'
I tried
for line in file:lines() do
  if line:match("([^;]*),([^;]*),([^;]*),([^;]*),([^;]*)") then
    print(line[2])
  end
end
but it doesn't work.
/tmp/etc/dnsmasq.conf looks like this:
dhcp-leasefile=/tmp/dhcp.leases
resolv-file=/tmp/resolv.conf.auto
addn-hosts=/tmp/hosts
conf-dir=/tmp/dnsmasq.d
stop-dns-rebind
rebind-localhost-ok
dhcp-broadcast=tag:needs-broadcast
dhcp-range=lan,192.168.34.165,192.168.34.179,255.255.255.0,12h
no-dhcp-interface=eth0

Here is a function in Lua that will print the values you need if you pass the whole file contents to it:
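function getmatches(text)
  -- Walk the text line by line (any run of non-newline characters)
  for line in string.gmatch(text, "[^\r\n]+") do
    -- local keeps m and n from leaking into the global environment
    local m,n = string.match(line,"^dhcp%-range[^,]*,([^,]+),([^,]+)")
    if m ~= nil then
      print(m,n)
    end
  end
end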
See Lua demo
With string.gmatch(text, "[^\r\n]+"), each file line is accessed (adjust as you see fit), and then the main part is m,n = string.match(line,"^dhcp%-range[^,]*,([^,]+),([^,]+)") that instantiates m with the first IP and n with the second IP found on a line that starts with dhcp-range.
Lua pattern details:
^ - start of string
dhcp%-range - a literal string dhcp-range (a - is a quantifier in Lua matching 0 or more occurrences, but as few as possible, and to match a literal -, it must be escaped. Regex escapes are formed with %.)
[^,]*, - 0+ chars other than , and then a ,
([^,]+) - Group 1 (m): one or more chars other than ,
, - a comma
([^,]+) - Group 2 (n): one or more chars other than ,.

Try this code:
for line in io.lines() do
  local a,b=line:match("^dhcp%-range=.-,(.-),(.-),")
  if a~=nil then
    print(a,b)
  end
end
The pattern reads: match dhcp-range= at the start of a line (note the need to escape - in Lua), skip everything until the next comma, and capture the next two fields between commas.
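For comparison, the same field extraction can be sketched in Python (not Lua) with the standard re module. The function name dhcp_ranges and the inline sample text are illustrative assumptions; the pattern mirrors the Lua ones above, except that - needs no escaping in a Python regex outside a character class.

```python
import re

# Same idea as the Lua patterns: anchor on "dhcp-range=", skip the first
# comma-separated field, then capture the next two fields.
DHCP_RANGE = re.compile(r"^dhcp-range=[^,]*,([^,]+),([^,]+)")

def dhcp_ranges(text):
    """Return (start, end) pairs for every dhcp-range line in text."""
    pairs = []
    for line in text.splitlines():
        match = DHCP_RANGE.match(line)
        if match:
            pairs.append(match.groups())
    return pairs

# Hypothetical excerpt of the config file shown in the question.
sample = """dhcp-broadcast=tag:needs-broadcast
dhcp-range=lan,192.168.34.165,192.168.34.179,255.255.255.0,12h
no-dhcp-interface=eth0"""

for start, end in dhcp_ranges(sample):
    print(start, end)
```

Lines that do not start with dhcp-range= are skipped, just as the if m ~= nil check does in the Lua versions.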

Related

Snippets VS Code Regex

I need your help. I am building a snippet, and I need to transform the file's path, which is this:
D:\Project\test\src\EnsLib\File\aaa\bbb
into this:
EnsLib\File\aaa\bbb
That is, keep only the part after "src" and replace each \ with a dot.
Example: D:\Project\test\src\EnsLib\File\aaa\bbb
Result: EnsLib.File.aaa.bbb
The part after the src folder is always the starting point.
My test regexes are these:
"${TM_DIRECTORY/(.*\\\\{4})/$1/}",
"${TM_DIRECTORY/.*src\\\\(.*)\\\\(.*)$/.$2/}.${TM_FILENAME_BASE}",
// "${TM_DIRECTORY/.*\\\\(.*)\\\\(.*)$/$1.$2/}.${TM_FILENAME_BASE}",
// "${RELATIVE_FILEPATH/\\D{4}(\\W)\\..+$/$1/g}",
// "${TM_DIRECTORY/(.*src\\\\)//g}.${TM_FILENAME_BASE}",
// "${RELATIVE_FILEPATH/(\\D{3})\\W|(\\..+$)/$1.$2/g}",
// "${RELATIVE_FILEPATH/\\W/./g}",
It seems you want
"${TM_DIRECTORY/^.*?\\\\src\\\\|(\\\\)/${1:+.}/g}"
The regex is ^.*?\\src\\|(\\), it matches
^ - start of string
.*? - any zero or more chars other than line break chars, as few as possible
\\src\\ - \src\ string
| - or
(\\) - Group 1 ($1): a \ char.
If Group 1 matches, the replacement is a ., else, the replacement is an empty string, i.e. the text from the start of string till \src\ is simply removed.
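The alternation-plus-conditional-replacement idea can be checked outside VS Code. Below is a minimal Python sketch, assuming the directory is available as a plain string; the function name path_to_dotted is hypothetical, and the lambda stands in for the snippet's ${1:+.} conditional.

```python
import re

def path_to_dotted(directory):
    # Alternative 1 removes everything up to and including "\src\";
    # alternative 2 captures each remaining backslash in group 1.
    pattern = re.compile(r"^.*?\\src\\|(\\)")
    # Emulates ${1:+.}: emit "." only when group 1 took part in the match.
    return pattern.sub(lambda m: "." if m.group(1) else "", directory)

print(path_to_dotted(r"D:\Project\test\src\EnsLib\File\aaa\bbb"))
# prints EnsLib.File.aaa.bbb
```

Because ^ can only match at the start of the string, the first alternative fires once, and every later match is a single backslash.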

How can I delete the rest of the line after the second pipe character "|" for every line with python?

I am using Notepad++ and I want to get rid of everything from the second pipe character onward (including that second pipe character) on every line of my txt file.
Basically, the txt file has the following format:
3.1_1.wav|I like apples.|I like apples|I like bananas
3.1_2.wav|Isn't today a lovely day?|Right now it is 1 in the afternoon.|....
The result should be:
3.1_1.wav|I like apples.
3.1_2.wav|Isn't today a lovely day?
I have tried using \|.* but then everything after the first pipe character is matched.
In Notepad++ do this:
Find what: ^([^\|]*\|[^\|]*).*
Replace with: $1
Check "Regular expression" and click "Replace All".
Explanation:
^ - anchor at start of line
( - start group, can be referenced as $1
[^\|]* - scan over any character other than |
\| - scan over |
[^\|]* - scan over any character other than |
) - end group
.* - scan over everything until end of line
In the replacement, reference the captured group with $1.
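Since the question title asks about Python, here is a minimal sketch of the same idea, assuming each line is already available as a string. Both function names are hypothetical; the first reuses the pattern from the answer above, and the second is a regex-free alternative based on str.split.

```python
import re

def keep_first_two_fields(line):
    # Same pattern as the Notepad++ answer: keep "field1|field2", drop the rest.
    return re.sub(r"^([^|]*\|[^|]*).*", r"\1", line)

def keep_first_two_fields_split(line):
    # Regex-free alternative: split on "|" and rejoin the first two fields.
    return "|".join(line.split("|")[:2])

lines = [
    "3.1_1.wav|I like apples.|I like apples|I like bananas",
    "3.1_2.wav|Isn't today a lovely day?|Right now it is 1 in the afternoon.|....",
]
for line in lines:
    print(keep_first_two_fields(line))
```

Lines with fewer than two pipes pass through unchanged in both versions.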
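I'm not sure if this is the best way to do it, but try this:
[^wav]\|.*
Note that [^wav] is a character class (any single character other than w, a, or v), not the literal text wav, and the match also consumes the character right before the second pipe, so the . or ? that ends the second field is removed as well.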

Notepad++ add Suffix after each 5 lines

I have a text file containing a list of usernames (100,000+ lines), and I'd like to add a suffix after every 5 lines.
Example:
Username1
Username2
Username3
Username4
Username5 SUFFIX HERE!
Username6
Username7
Username8
Username9
Username10 SUFFIX HERE!
Username11
Username12
Username13
Username14
Username15 SUFFIX here!
Username16
... etc.
I tried searching with the regex ^(.+)$ and replacing with \1 suffixtext!, but that changes every line, while I only need every fifth line changed.
I also want to add a random number after the suffix.
Thank you,
regards.
You may use
^.*(?:\R.*){4}
And replace with $& SUFFIX 0.
Details:
^ - start of a line
.* - any 0+ chars other than line break chars
(?:\R.*){4} - exactly 4 occurrences of a line break (any style, \R) followed with any 0+ chars other than line break chars (.*).
The replacement contains a backreference to the whole match ($&) and then a number.
To later increment the numbers after SUFFIX, use a Python Script
cnt = 0
def incrementnum(match):
    global cnt
    cnt = cnt + 1
    return "{0}{1}".format(match.group(1), str(int(match.group(2))+cnt))
editor.rereplace(r'(SUFFIX )(\d+)$', incrementnum)
Just follow these instructions to use it in your NPP.
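Outside Notepad++, both steps (grouping five lines and numbering the suffix) can be combined in one pass. This is a minimal sketch in plain Python, assuming the text uses \n line endings, since Python's re module has no \R; the function name add_suffix_every_five and the suffix text are illustrative.

```python
import itertools
import re

def add_suffix_every_five(text, suffix="SUFFIX"):
    # Running number appended after each suffix: 1, 2, 3, ...
    counter = itertools.count(1)

    def repl(match):
        # match.group(0) is a block of five consecutive lines.
        return "{} {} {}".format(match.group(0), suffix, next(counter))

    # One line followed by exactly four more "newline + line" pairs.
    return re.sub(r"^.*(?:\n.*){4}", repl, text, flags=re.MULTILINE)

text = "\n".join("Username{}".format(i) for i in range(1, 12))
print(add_suffix_every_five(text))
```

A trailing group of fewer than five lines is left untouched, which matches the behavior of the Notepad++ regex.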

Parsing out particular text in a big text column in a Dataframe - R

Suppose I have the following data,
data
text
abc/1234&
qwertyabc/5555&
a&sdfghabc/ppp&plksa&
z&xabc/lkjh&poiuw&
lkjqwefasrjabc/855698&plkjdhweb
For example, I want to extract only the text between the first occurrence of abc/ and the first occurrence of & that follows it. How do I do that?
My output should be as follows,
data
text                             parsed_out
abc/1234&                        1234
qwertyabc/5555&                  5555
a&sdfghabc/ppp&plksa&            ppp
z&xabc/lkjh&poiuw&               lkjh
lkjqwefasrjabc/855698&plkjdhweb  855698
The following is my attempt:
data1 = within(data, FOO<-data.frame(do.call('rbind', strsplit(as.character(text), 'abc/', fixed=TRUE))))
data2 = within(data1, FOO1<-data.frame(do.call('rbind', strsplit(as.character(FOO$X1), '&', fixed=TRUE))))
This uses too much memory, since the text file has 8 million rows, and data2 ends up with several columns because the text contains several '&' characters. Can anybody help me extract the text between these two markers into a single column in a memory-efficient way?
x = "thesearepresentinthestartingwhichisnotneededhttp://google.com/needstobeparsedout&reoccurencenotneeded&"
Here, the function should look for http://google.com/ and extract everything up to the first &. The output should be needstobeparsedout.
new_x = "\"http://www.google.com/search?q=erykah+badu+with+hiatus+kaiyote,+august+3&""
Why is it not working with this link?
Thanks
I actually want to parse out a few parts of the URL; for example, the text between "http://www.google.com/" and the first occurrence of "&".
Use
sub(".*?https?://(?:www\\.)?google\\.com/([^&]+).*", "\\1", x)
See the regex demo.
The pattern matches:
(optionally add a ^ in front to match the start of string position)
.*? - 0+ chars as few as possible from the start till the first
https?:// - either https:// or http:// followed with
(?:www\\.)? - 1 or 0 (optional) sequence www.
google\\.com/ - literal text google.com/
([^&]+) - 1 or more chars other than & (Capture group 1)
.* - any 0+ chars (up to the end of string).
In the replacement pattern, \1 refers to the subtext captured into Group 1.
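The same capture can be sketched in Python (not R) with re.search instead of a whole-string substitution. The function name first_segment is hypothetical. One difference worth noting: this version returns None when the URL prefix is absent, whereas sub returns the input unchanged when the pattern does not match.

```python
import re

# Optional "s" after http, optional "www.", then capture up to the first "&".
GOOGLE_PATH = re.compile(r"https?://(?:www\.)?google\.com/([^&]+)")

def first_segment(text):
    """Return the text between google.com/ and the next &, or None."""
    match = GOOGLE_PATH.search(text)
    return match.group(1) if match else None

x = ("thesearepresentinthestartingwhichisnotneeded"
     "http://google.com/needstobeparsedout&reoccurencenotneeded&")
print(first_segment(x))
# prints needstobeparsedout
```

Using search avoids the leading .*? and trailing .* that the substitution needs in order to consume the rest of the string.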

Regular expression to get only the first word from each line

I have a text file
#sp_id int,
#sp_name varchar(120),
#sp_gender varchar(10),
#sp_date_of_birth varchar(10),
#sp_address varchar(120),
#sp_is_active int,
#sp_role int
Here, I want to get only the first word from each line. How can I do this? The separator between words may be a space, a tab, etc.
Here is what I suggest:
Find what: ^([^ \t]+).*
Replace with: $1
Explanation: ^ matches the start of line, ([^ \t]+) matches 1 or more (due to +) characters other than space and tab (due to [^ \t]), and then any number of characters up to the end of the line with .*.
In case you might have leading whitespace, you might want to use
^\s*([^ \t]+).*
I did something similar with this:
import re
with open('handles.txt', 'r') as handles:
    handlelist = [line.rstrip('\n') for line in handles]
# The first run of word characters on each line is its first word
newlist = [re.findall(r"\w+", line)[0] for line in handlelist]
This reads all the lines of the document into a list,
then uses a regex to extract the first word of each line (ignoring whitespace).
My file (handles.txt) contained info like this:
JoIyke - personal twitter link;
newMan - another twitter handle;
yourlink - yet another one.
The code will return this list:
['JoIyke', 'newMan', 'yourlink']
Find What: ^(\S+).*$
Replace by : \1
You can simply use this to get the first word. Here we capture the first word in a group and replace the whole line with the captured group.
Find the first word of each line with /^\w+/gm.
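To compare the patterns suggested in this thread, here is a minimal Python sketch. The helper names and the three-line sample, including a line with leading whitespace, are assumptions made for illustration. Since \w does not match #, a ^\w+ search finds nothing on lines that begin with #, while the \S-based pattern keeps the # as part of the first word.

```python
import re

def first_words(text):
    # Skip leading spaces/tabs, then capture the first run of non-whitespace.
    return re.findall(r"^[ \t]*(\S+)", text, flags=re.MULTILINE)

def first_words_word_chars(text):
    # Stricter variant: only letters, digits, and underscore at line start.
    return re.findall(r"^\w+", text, flags=re.MULTILINE)

text = "#sp_id int,\n#sp_name\tvarchar(120),\n  #sp_role int"

print(first_words(text))             # ['#sp_id', '#sp_name', '#sp_role']
print(first_words_word_chars(text))  # []
```

Using [ \t]* rather than \s* for the leading whitespace keeps the match from running across blank lines in multiline mode.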