Remove one of some possible substrings from the middle of a string - regex

I have a for loop over some strings from which I would like to remove any of _win64|_win32|_x64|_x86, plus the .dll at the end, with everything lowercased:
from re import match, sub, compile, escape, IGNORECASE, MULTILINE
filename = filename_without_extension.lower()
print("filename {}".format(filename))
pattern = compile(r"^(?:.*)(_win64|_win32|_x64|_x86)\.dll$", IGNORECASE|MULTILINE)
for dll in p.memory_maps():
    file = dll.path.lower().rsplit('\\', 1)[1]
    if not file.endswith(".dll") and not file.endswith(".DLL"): continue # TODO: Remove
    print("file: {}".format(file))
    print("pattern: {}".format(pattern.sub("", file)))
    if pattern.sub("", file).endswith(filename): return AddonStatus.LOADED
Why does this feel so hard in regex?
EDIT: Some example strings that would be inside the file variable:
test.dll
test_win32.DLL
test_WIN64.dLL
test_X86.dll
test_x64.DLL

lst= ['test.dll',
'test_win32.DLL',
'test_WIN64.dLL',
'test_X86.dll',
'test_x64.DLL']
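import re
to_remove = ('_win64', '_win32', '_x64', '_x86')
# The suffix group is optional, so a plain "name.dll" is handled by the same pattern.
# (str.rstrip('.dll') strips a set of characters, not a suffix, so it is avoided here.)
r = re.compile(r'(?:{})?\.dll$'.format('|'.join(map(re.escape, to_remove))), flags=re.I)
for l in lst:
    print(r.sub('', l).lower())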
This will strip all the suffixes + .dll:
test
test
test
test
test

For now I am going to use:
file = dll.path.lower().rsplit('\\', 1)[1].replace("_win64","").replace("_win32","").replace("_x86","").replace("_x64","").replace(".DLL","").replace(".dll","")
If you have a better solution feel free to share it :)
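A regex-based alternative to the chained replace calls could look like the sketch below. The function name normalize_dll_name and the sample paths are hypothetical, and it assumes the suffix should only be removed when it sits directly before the .dll extension at the end of the name.

```python
import re

# Optional architecture suffix followed by ".dll" at the very end of the name
_SUFFIX = re.compile(r'(?:_win64|_win32|_x64|_x86)?\.dll$', re.IGNORECASE)

def normalize_dll_name(path):
    """Return the lowercase base name without architecture suffix or .dll."""
    name = path.rsplit('\\', 1)[-1]  # keep only the part after the last backslash
    return _SUFFIX.sub('', name).lower()

# Hypothetical sample inputs
for sample in ['test.dll', 'test_win32.DLL', r'C:\libs\test_WIN64.dLL', 'install.dll']:
    print(normalize_dll_name(sample))
```

Unlike the replace chain, this leaves a "_x86" that appears elsewhere in the name untouched, which may or may not be what you want.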

Related

Unable to capture required string from text file using Groovy - Jmeter JSR223

I need to parse a text file testresults.txt and capture serial number and then write the captured serial number onto separate text file called serialno.txt using groovy Jmeter JSR223 post processor.
The code below is not working: it never enters the while loop. Kindly help.
import java.util.regex.Pattern
import java.util.regex.Matcher
String filecontent = new File("C:/device/resources/testresults.txt").text
def regex = "SerialNumber\" value=\"(.+)\""
java.util.regex.Pattern p = java.util.regex.Pattern.compile(regex)
java.util.regex.Matcher m = p.matcher(filecontent)
File SN = new File("C:/device/resources/serialno.txt")
while(m.find()) {
    SN.write m.group(1)
}
If your code doesn't enter the loop, it means there are no matches, so you need to amend your regular expression. You can use, e.g., the Regex101 website for experiments.
Given the following content of the testresults.txt file:
SerialNumber" value="foo"
SerialNumber" value="bar"
SerialNumber" value="baz"
your code works fine.
For the time being I can only suggest using the find operator (=~) to make your code more "groovy":
def source = new File('C:/device/resources/testresults.txt').text
def matches = (source =~ 'SerialNumber" value="(.+?)"')
matches.each { match ->
    new File('C:/device/resources/serialno.txt') << match[1] << System.getProperty('line.separator')
}
More information: Apache Groovy - Why and How You Should Use It

Regex from Python to Kotlin

I have a question about regular expressions (regex), and I am really new to this. I found a tutorial with a regex written in Python that deletes labels from the data by replacing them with an empty string.
This is the code from Python:
import re
def extract_identity(data, context):
    """Background Cloud Function to be triggered by Pub/Sub.
    Args:
        data (dict): The dictionary with data specific to this type of event.
        context (google.cloud.functions.Context): The Cloud Functions event
        metadata.
    """
    import base64
    import json
    import urllib.parse
    import urllib.request
    if 'data' in data:
        strjson = base64.b64decode(data['data']).decode('utf-8')
        text = json.loads(strjson)
        text = text['data']['results'][0]['description']
        lines = text.split("\n")
        res = []
        for line in lines:
            line = re.sub('gol. darah|nik|kewarganegaraan|nama|status perkawinan|berlaku hingga|alamat|agama|tempat/tgl lahir|jenis kelamin|gol darah|rt/rw|kel|desa|kecamatan', '', line, flags=re.IGNORECASE)
            line = line.replace(":","").strip()
            if line != "":
                res.append(line)
        p = {
            "province": res[0],
            "city": res[1],
            "id": res[2],
            "name": res[3],
            "birthdate": res[4],
        }
        print('Information extracted:{}'.format(p))
In the above function, information extraction is done by removing all e-KTP labels with regular expressions.
This is the sample of e-KTP:
And this is the result after scanning that e-KTP using the python code:
Information extracted:{'province': 'PROVINSI JAWA TIMUR', 'city': 'KABUPATEN BANYUWANGI', 'id': '351024300b730004', 'name': 'TUHAN', 'birthdate': 'BANYUWANGI, 30-06-1973'}
This is the full tutorial from the above code.
My question is: can we use a regex in Kotlin to remove the labels from the e-KTP result, as the Python code does? The logic I tried, as far as I understand it, does not remove the e-KTP labels. My Kotlin code looks like this:
....
val lines = result.text.split("\n")
val res = mutableListOf<String>()
Log.e("TAG LIST STRING", lines.toString())
for (line in lines) {
    Log.e("TAG STRING", line)
    line.matches(Regex("gol. darah|nik|kewarganegaraan|nama|status perkawinan|berlaku hingga|alamat|agama|tempat/tgl lahir|jenis kelamin|gol darah|rt/rw|kel|desa|kecamatan"))
    line.replace(":","")
    if (line != "") {
        res.add(line)
    }
    Log.e("TAG RES", res.toString())
}
Log.e("TAG INSERT", res.toString())
tvProvinsi.text = res[0]
tvKota.text = res[1]
tvNIK.text = res[2]
tvNama.text = res[3]
tvTgl.text = res[4]
....
And this is the result of my code:
TAG LIST STRING: [PROVINSI JAWA BARAP, KABUPATEN TASIKMALAYA, NIK 320625XXXXXXXXXX, BRiEAFAUZEROMARA, Nama, TempatTgiLahir, Jenis keiamir, etc]
TAG INSERT: [PROVINSI JAWA BARAP, KABUPATEN TASIKMALAYA, NIK 320625XXXXXXXXXX, BRiEAFAUZEROMARA, Nama, TempatTgiLahir, Jenis keiamir, etc]
The labels are still there. Is it possible to remove a label using a regex or something similar in Kotlin, as in Python?
The point is to use kotlin.text.replace with a Regex as the search argument. For example:
text = text.replace(Regex("""<REGEX_PATTERN_HERE>"""), "<REPLACEMENT_STRING_HERE>")
You may use
line = line.replace(Regex("""(?i)gol\. darah|nik|kewarganegaraan|nama|status perkawinan|berlaku hingga|alamat|agama|tempat/tgl lahir|jenis kelamin|gol darah|rt/rw|kel|desa|kecamatan"""), "")
Note that (?i) at the start of the pattern is a quick way to make the whole pattern case insensitive.
Also, when you need to match a . with a regex you need to escape it. Since a backslash can be coded in several ways and people often fail to do it correctly, it is always recommended to define regex patterns within raw string literals, in Kotlin, you may use the triple-double-quoted string literals, i.e. """...""" where each \ is treated as a literal backslash that is used to form regex escapes.
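The escaping point can be checked quickly in Python, the language of the original tutorial. This is only a small sketch: the test strings are made up, and Python's r'...' raw strings play the role that triple-quoted strings play in Kotlin.

```python
import re

# Unescaped dot: "." matches any single character
unescaped = re.compile(r'gol. darah', re.IGNORECASE)
# Escaped dot: "\." matches only a literal period
escaped = re.compile(r'gol\. darah', re.IGNORECASE)

# Made-up test strings
print(bool(unescaped.search('Golx Darah')))  # the loose pattern also accepts "x"
print(bool(escaped.search('Golx Darah')))    # the strict pattern rejects it
print(bool(escaped.search('Gol. Darah')))    # and still accepts the real label
```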

Python Dictionary re

As you can see in the code, the script uses a regular expression to search for words in a text file and puts them into a dictionary like this:
set(["['card', 'port']"])
set(["['onu_id', 'remote_id']"])
set(["['card', 'port', 'onu_id']"])
set(["['card', 'port', 'onu_id']"])
set(["['remote_id']"])
set(["['remote_id']"])
set(["['card', 'port', 'onu_id']"])
The problem is that I need to enter values for them by hand and remove everything except the keys (card, port, onu_id, remote_id), i.e. strip the set(["[' wrapper, so that the result is clear and easy to read, like this:
dict{card:1, port:5, onu_id:3, remote_id:16568764}
Here is my code:
import re, string
with open("conf.txt","r") as f:
    text = f.readlines()
for line in text:
    match = re.findall(r'_\$(\w+)',line)
    if match:
        dict = {str(match)}
        print dict
part of input file:
interface gpon-olt_1/_$card/_$port
onu _$onu_id type ZTE-F660 pw _$remote_id vport-mode gemport
no lock
no shutdown
exit
interface gpon-onu_1/_$card/_$port:\_$onu_id
exit
interface gpon-onu_1/_$card/_$port:\_$onu_id
name ONU-_$remote_id 102211+1
description
vport-mode gemport def-map-type 1:1
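One possible way to get from the _$name placeholders to a readable dictionary is sketched below. The helper names collect_placeholders and fill_template are hypothetical, the values are the ones from the desired output above, and the sketch assumes the values are supplied as a dict rather than typed in interactively.

```python
import re

PLACEHOLDER = re.compile(r'_\$(\w+)')

def collect_placeholders(lines):
    """Return the placeholder names in order of first appearance, without duplicates."""
    keys = []
    for line in lines:
        for name in PLACEHOLDER.findall(line):
            if name not in keys:
                keys.append(name)
    return keys

def fill_template(lines, values):
    """Replace every _$name with the matching entry from values."""
    return [PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), line) for line in lines]

# Two lines taken from the input file above
template = [
    'interface gpon-olt_1/_$card/_$port',
    'onu _$onu_id type ZTE-F660 pw _$remote_id vport-mode gemport',
]
values = {'card': 1, 'port': 5, 'onu_id': 3, 'remote_id': 16568764}

print(collect_placeholders(template))
for line in fill_template(template, values):
    print(line)
```

Collecting the names first makes it easy to prompt for each value once, instead of once per line.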

How to remove a directory path from a text file

I have a text file with these lines.
1.inputlist
D:\Dolby_Harmanious_kit\DRY_run_kits\Dolby_Digital_Plus_Decoder_Imp\Test_Materials\Test_Signals\ITAF_Tests\seamless_switch\acmod21_I0D0CRC.ec3 -#tD:\Dolby_Harmanious_kit\DRY_run_kits\Dolby\m1_m28_switch.cfg
2.inputlist
D:\Dolby_Harmanious_kit\DRY_run_kits\Dolby_Digital_Plus_Decoder_Imp\Test_Materials\Test_Signals\ITAF_Tests\seamless_switch\acmod_2_252_ddp.ec3 -#tD:\Dolby_Harmanious_kit\DRY_run_kits\Digital\m1_m7_switch_every_3frames.cfg
Here I need to remove path names such as
"D:\Dolby_Harmanious_kit\DRY_run_kits\Dolby_Digital_Plus_Decoder_Imp\Test_Materials\Test_Signals\ITAF_Tests\seamless_switch\"
and "D:\Dolby_Harmanious_kit\DRY_run_kits\Digital\".
Note that every line has different path names. I have example code that removes a path name.
Code:
import re
b = open ('Filter_Lines.txt','w')
with open('Lines.txt') as f:
    for trim in f:
        repl = (re.sub('D:.*\\\\','',trim).rstrip('\n'))
        b.write(repl + '\n')
b.close()
But this removes everything from the start of the line through
"D:\Dolby_Harmanious_kit\DRY_run_kits\Dolby_Digital_Plus_Decoder_Imp\Test_Materials\Test_Signals\ITAF_Tests\seamless_switch\acmod21_I0D0CRC.ec3 -#tD:\Dolby_Harmanious_kit\DRY_run_kits\Dolby\"
I need to remove only the path names, keeping "acmod21_I0D0CRC.ec3" in that line.
Can you please guide me on this?
This is based on what I understood from your question.
You said the paths are not all the same, so a line might look like
a) D://a/b/c/file_name.cfg -#tD://a/b/c/d/e/file_name.cfg
Is that correct?
Two paths are present in a single line, and what they have in common is the -#t separator, so you can simply use the split method to break the line apart (the code below splits on 'D:').
Here is what I did, based on that understanding:
import re
with open('file_sam.txt') as f, open('file_sample.txt', 'w') as b:
    for line in f:
        # Split on the drive prefix, then drop everything up to the last backslash in each piece
        parts = [re.sub(r'.*\\', '', part).rstrip('\n') for part in line.split('D:')]
        b.write(''.join(parts) + '\n')
My inputs are:
'D:\Dolby_Harmanious_kit\DRY_run_kits\Dolby_Digital_Plus_Decoder_Imp\Test_Materials\Test_Signals\ITAF_Tests\seamless_switch\acmod21_I0D0CRC.ec3 -#tD:\Dolby_Harmanious_kit\DRY_run_kits\Dolby\m1_m28_switch.cfg'
'D:\Dolby_Harmanious_kit\DRY_run_kits\Dolby_Digital_Plus_Decoder_Imp\Test_Materials\Test_Signals\ITAF_Tests\seamless_switch\acmod_2_252_ddp.ec3 -#tD:\Dolby_Harmanious_kit\DRY_run_kits\Digital\m1_m7_switch_every_3frames.cfg'
It gives me:
'acmod21_I0D0CRC.ec3 -#tm1_m28_switch.cfg'
'acmod_2_252_ddp.ec3 -#tm1_m7_switch_every_3frames.cfg'
Is this what you want?
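A single substitution can also do the job without splitting. The sketch below is only one option: the name strip_directories and the shortened sample line are hypothetical, and it assumes directory names contain no spaces and that every path starts with a drive letter.

```python
import re

# Drive letter, colon, backslash, then any number of "component\" pieces.
# The final file name is kept because it is not followed by a backslash.
DIR_PREFIX = re.compile(r'[A-Za-z]:\\(?:[^\\\s]+\\)*')

def strip_directories(line):
    """Remove the directory part of every Windows path found in the line."""
    return DIR_PREFIX.sub('', line)

# Shortened, made-up variant of the lines shown in the question
sample = r'D:\kit\Test_Signals\acmod21_I0D0CRC.ec3 -#tD:\kit\Dolby\m1_m28_switch.cfg'
print(strip_directories(sample))
```

Because each directory component must be free of whitespace, the match cannot run across the space between the two paths, which is what went wrong with the greedy 'D:.*\\\\' pattern.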

Regex to parse querystring values to named groups

I have a HTML with the following content:
... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...
I would like to parse that and get a match with named groups:
match 1
group["user"]=123
group["section"]=2
match 2
group["user"]=678
group["section"]=5
I can do it if parameters always go in order, first User and then Section, but I don't know how to do it if the order is different.
Thank you!
In my case I had to parse a URL because the utility HttpUtility.ParseQueryString is not available in WP7. So, I created an extension method like this:
public static class UriExtensions
{
    private static readonly Regex queryStringRegex;
    static UriExtensions()
    {
        queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
    }
    public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
    {
        if (uri == null)
            throw new ArgumentException("uri");
        var matches = queryStringRegex.Matches(uri.OriginalString);
        for (int i = 0; i < matches.Count; i++)
        {
            var match = matches[i];
            yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
        }
    }
}
Then it's a matter of using it, for example:
var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
var userId = parameters["userId"];
var section = parameters["section"];
NOTE: I'm returning the IEnumerable instead of the dictionary directly just because I'm assuming that there might be duplicated parameter's name. If there are duplicated names, then the dictionary will throw an exception.
Why use regex to split it out?
You could first extract the query string, split the result on &, and then create a map by splitting each of those pieces on =.
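As a minimal sketch of that split-based approach (written in Python for brevity; the function name parse_query and the sample URLs are assumptions, not part of the question):

```python
from urllib.parse import urlsplit

def parse_query(url):
    """Split the query string on '&', then each piece on the first '='."""
    query = urlsplit(url).query
    pairs = {}
    for part in query.split('&'):
        if not part:
            continue  # skip empty pieces such as a trailing '&'
        key, _, value = part.partition('=')
        pairs[key] = value
    return pairs

# The parameter order does not matter
print(parse_query('file.aspx?userId=123&section=2'))
print(parse_query('file.aspx?section=5&userId=678'))
```

In real Python code the standard library's urllib.parse.parse_qs already does this, including percent-decoding and repeated keys; the manual version is shown only to make the two splits explicit.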
You didn't specify what language you are working in, but this should do the trick in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegexTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string subjectString = @"... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...";
            // Both alternatives use the same parameter names; only the order differs
            Regex regexObj =
                new Regex(@"<a href=""file.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&userId=(?<user>.+?)""))");
            Match matchResults = regexObj.Match(subjectString);
            while (matchResults.Success)
            {
                string user = matchResults.Groups["user"].Value;
                string section = matchResults.Groups["section"].Value;
                Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
                matchResults = matchResults.NextMatch();
            }
            Console.ReadKey();
        }
    }
}
Using regex to first find the key value pairs and then doing splits... doesn't seem right.
I'm interested in a complete regex solution.
Anyone?
Check this out
\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>
You can get pairs with something like Groups["key"].Captures[i] & Groups["value"].Captures[i]
Perhaps something like this (I am rusty on regex, and wasn't good at them in the first place anyway. Untested):
/href="[^?]*(?:[?&](?:userId=(?<user>\d+)|section=(?<section>\d+)))*"/
(By the way, the XHTML is malformed; & should be &amp; in the attributes.)
Another approach is to put the capturing groups inside lookaheads:
Regex r = new Regex(@"<a href=""file\.aspx\?" +
    @"(?=[^""<>]*?user=(?<user>\w+))" +
    @"(?=[^""<>]*?section=(?<section>\w+))");
If there are only two parameters, there's no reason to prefer this way over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need another lookahead just like the two existing ones.
By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex
userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)
This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
A simple Python implementation that overcomes the ordering problem:
In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')
In [3]: t = 'href="file.aspx?section=2&userId=123"'
In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]
In [5]: t = 'href="file.aspx?userId=123&section=2"'
In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]