Python 2.7 Regex not matching desired pattern - regex

I am parsing all the rows of a .m3u file containing my IPTV playlist data. I want to isolate and print the parts of the file that have this format:
tvg-logo="http://somelinkwithapicture.png"
...within a string that looks like:
#EXTINF:-1 catchup="default" catchup-source="http://someprovider.tv/play/dvr/${start}/2480.m3u8?token=%^%=&duration=3600" catchup-days=5 tvg-name="Sky Sports Action HD" tvg-id="SkySportsAction.uk" tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png" group-title="Sports",Sky Sports Action HD
http://someprovider.tv/play/2480.m3u8?token=465454=
My class looks like this:
import re
class iptv_cleanup():
    filepath = 'C:\\Users\\cg371\\Downloads\\vget.m3u'
    with open(filepath, "r") as text_file:
        a = text_file.read()
        b = re.search(r'tvg-logo="(.*?)"', a)
        c = b.group()
        print c
        text_file.close
iptv_cleanup()
All I am getting returned though is a string like this:
tvg-logo=""
I am a bit rusty with regexes, but I cannot see anything obviously wrong with this.
Can anyone assist?
Thanks

Check (?:tvg-logo=")[\w\W]*?(?<=\.png)
import re
# raw string so the backslashes reach the regex engine untouched;
# lazy *? stops at the first ".png", and \. matches a literal dot
reg = r'(?:tvg-logo=")[\w\W]*?(?<=\.png)'
string = '#EXTINF:-1 catchup="default" catchup-source="http://someprovider.tv/play/dvr/${start}/2480.m3u8?token=%^%=&duration=3600" catchup-days=5 tvg-name="Sky Sports Action HD" tvg-id="SkySportsAction.uk" tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png" group-title="Sports",Sky Sports Action HD http://someprovider.tv/play/2480.m3u8?token=465454='
print re.findall(reg, string, re.DOTALL)[0]
$ python main.py
tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png

This worked in the end:
import re
class iptv_cleanup():
    filepath = 'C:\\Users\\cg371\\Downloads\\vget.m3u'
    # the with block closes the file on exit, so no explicit close() is needed
    with open(filepath, "r") as text_file:
        a = text_file.read()
        # findall returns the captured group of every match, not just the first
        b = re.findall(r'tvg-logo="(.*?)"', a)
        for i in b:
            print i
iptv_cleanup()
Thank you all for your input.
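A likely explanation for the original output, assuming the first entry in the playlist has an empty tvg-logo attribute: re.search stops at the first match, and group() with no argument returns the whole match rather than the captured URL. The sketch below reproduces that on a made-up two-entry sample (the "No Logo" entry and the extract_logos helper are illustrative, not from the original file). It is written with print() so it runs on Python 3.

```python
import re

# Same pattern as in the question: lazily capture whatever sits between the quotes.
LOGO_RE = re.compile(r'tvg-logo="(.*?)"')


def extract_logos(playlist_text):
    """Return every non-empty tvg-logo URL found in the playlist text."""
    return [url for url in LOGO_RE.findall(playlist_text) if url]


# Hypothetical sample: the first entry has an empty logo, the second a real one.
sample = (
    '#EXTINF:-1 tvg-name="No Logo" tvg-logo="" group-title="Misc",No Logo\n'
    'http://someprovider.tv/play/1.m3u8\n'
    '#EXTINF:-1 tvg-name="Sky Sports Action HD" '
    'tvg-logo="http://someprovider.tv/logos/sky%20sports%20action%20hd.png" '
    'group-title="Sports",Sky Sports Action HD\n'
    'http://someprovider.tv/play/2480.m3u8\n'
)

first = LOGO_RE.search(sample)
print(first.group())    # whole match of the first hit only
print(first.group(1))   # captured group of the first hit (empty here)

for url in extract_logos(sample):
    print(url)
```

Filtering out empty strings is a design choice for this sketch; drop the `if url` condition to keep a placeholder for channels without a logo.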

Related

Unable to capture required string from text file using Groovy - Jmeter JSR223

I need to parse a text file, testresults.txt, capture the serial number, and then write the captured serial number to a separate text file called serialno.txt using a Groovy JMeter JSR223 PostProcessor.
The code below is not working: it never enters the while loop. Kindly help.
import java.util.regex.Pattern
import java.util.regex.Matcher
String filecontent = new File("C:/device/resources/testresults.txt").text
def regex = "SerialNumber\" value=\"(.+)\""
java.util.regex.Pattern p = java.util.regex.Pattern.compile(regex)
java.util.regex.Matcher m = p.matcher(filecontent)
File SN = new File("C:/device/resources/serialno.txt")
while(m.find()) {
    SN.write m.group(1)
}
If your code doesn't enter the loop, it means that there are no matches, so you need to amend your regular expression. You can use, for example, the Regex101 website for experiments.
Given the following content of the testresults.txt file:
SerialNumber" value="foo"
SerialNumber" value="bar"
SerialNumber" value="baz"
your code works fine.
For the time being I can only suggest using the find operator (=~) to make your code more "groovy":
def source = new File('C:/device/resources/testresults.txt').text
def matches = (source =~ 'SerialNumber" value="(.+?)"')
matches.each { match ->
    new File('C:/device/resources/serialno.txt') << match[1] << System.getProperty('line.separator')
}
More information: Apache Groovy - Why and How You Should Use It
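The suggested pattern also switches from (.+) to (.+?). As a rough illustration of why that can matter, here is the greedy-versus-lazy difference checked in Python, whose regex engine treats these quantifiers the same way as java.util.regex for this case. The input line with a second attribute (status="ok") is a made-up assumption, not taken from the real testresults.txt.

```python
import re

# Hypothetical line with a second quoted attribute after the serial number.
line = 'SerialNumber" value="foo" status="ok"'

# Greedy: .+ runs to the LAST double quote on the line.
greedy = re.search(r'SerialNumber" value="(.+)"', line).group(1)

# Lazy: .+? stops at the FIRST double quote after the value.
lazy = re.search(r'SerialNumber" value="(.+?)"', line).group(1)

print(greedy)
print(lazy)
```

With one attribute per line both patterns capture the same text, which is consistent with the original code working on the three-line sample above.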

Regex from Python to Kotlin

I have a question about regular expressions (regex), and I am really new to this. I found a tutorial that uses a regex in Python to delete labels from the data by replacing them with an empty string.
This is the Python code:
import re
def extract_identity(data, context):
    """Background Cloud Function to be triggered by Pub/Sub.
    Args:
        data (dict): The dictionary with data specific to this type of event.
        context (google.cloud.functions.Context): The Cloud Functions event
            metadata.
    """
    import base64
    import json
    import urllib.parse
    import urllib.request
    if 'data' in data:
        strjson = base64.b64decode(data['data']).decode('utf-8')
        text = json.loads(strjson)
        text = text['data']['results'][0]['description']
        lines = text.split("\n")
        res = []
        for line in lines:
            line = re.sub('gol. darah|nik|kewarganegaraan|nama|status perkawinan|berlaku hingga|alamat|agama|tempat/tgl lahir|jenis kelamin|gol darah|rt/rw|kel|desa|kecamatan', '', line, flags=re.IGNORECASE)
            line = line.replace(":","").strip()
            if line != "":
                res.append(line)
        p = {
            "province": res[0],
            "city": res[1],
            "id": res[2],
            "name": res[3],
            "birthdate": res[4],
        }
        print('Information extracted:{}'.format(p))
In the above function, information extraction is done by removing all e-KTP labels with regular expressions.
This is the sample of e-KTP:
And this is the result after scanning that e-KTP using the python code:
Information extracted:{'province': 'PROVINSI JAWA TIMUR', 'city': 'KABUPATEN BANYUWANGI', 'id': '351024300b730004', 'name': 'TUHAN', 'birthdate': 'BANYUWANGI, 30-06-1973'}
This is the full tutorial from the above code.
My question is: can we use Regex in Kotlin to remove the labels from the e-KTP result, as in the Python code? The logic I tried does not remove the e-KTP labels. My Kotlin code looks like this:
....
val lines = result.text.split("\n")
val res = mutableListOf<String>()
Log.e("TAG LIST STRING", lines.toString())
for (line in lines) {
    Log.e("TAG STRING", line)
    line.matches(Regex("gol. darah|nik|kewarganegaraan|nama|status perkawinan|berlaku hingga|alamat|agama|tempat/tgl lahir|jenis kelamin|gol darah|rt/rw|kel|desa|kecamatan"))
    line.replace(":","")
    if (line != "") {
        res.add(line)
    }
    Log.e("TAG RES", res.toString())
}
Log.e("TAG INSERT", res.toString())
tvProvinsi.text = res[0]
tvKota.text = res[1]
tvNIK.text = res[2]
tvNama.text = res[3]
tvTgl.text = res[4]
....
And this is the result of my code:
TAG LIST STRING: [PROVINSI JAWA BARAP, KABUPATEN TASIKMALAYA, NIK 320625XXXXXXXXXX, BRiEAFAUZEROMARA, Nama, TempatTgiLahir, Jenis keiamir, etc]
TAG INSERT: [PROVINSI JAWA BARAP, KABUPATEN TASIKMALAYA, NIK 320625XXXXXXXXXX, BRiEAFAUZEROMARA, Nama, TempatTgiLahir, Jenis keiamir, etc]
The labels are still there. Is it possible to remove a label using Regex or something similar in Kotlin, as in Python?
The point is to use kotlin.text.replace with a Regex as the search argument. For example:
text = text.replace(Regex("""<REGEX_PATTERN_HERE>"""), "<REPLACEMENT_STRING_HERE>")
You may use
line = line.replace(Regex("""(?i)gol\. darah|nik|kewarganegaraan|nama|status perkawinan|berlaku hingga|alamat|agama|tempat/tgl lahir|jenis kelamin|gol darah|rt/rw|kel|desa|kecamatan"""), "")
Note that (?i) at the start of the pattern is a quick way to make the whole pattern case insensitive.
Also, when you need to match a literal . with a regex, you need to escape it. Since a backslash can be written in several ways and people often get it wrong, it is recommended to define regex patterns within raw string literals. In Kotlin, you may use triple-double-quoted string literals, i.e. """...""", where each \ is treated as a literal backslash that is used to form regex escapes.
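For comparison, here is a minimal Python sketch of the same label-stripping step with the escaping handled by re.escape rather than by hand. The strip_labels helper and the two sample lines are illustrative assumptions; the label list is the one from the question.

```python
import re

# Labels from the question; re.escape takes care of the "." in "gol. darah".
LABELS = [
    "gol. darah", "nik", "kewarganegaraan", "nama", "status perkawinan",
    "berlaku hingga", "alamat", "agama", "tempat/tgl lahir", "jenis kelamin",
    "gol darah", "rt/rw", "kel", "desa", "kecamatan",
]
LABEL_RE = re.compile("|".join(re.escape(label) for label in LABELS), re.IGNORECASE)


def strip_labels(line):
    """Remove known labels and colons, then trim surrounding whitespace."""
    # Strings are immutable: the result of each call has to be kept,
    # which is the step the Kotlin loop in the question leaves out.
    cleaned = LABEL_RE.sub("", line)
    return cleaned.replace(":", "").strip()


print(strip_labels("NIK : 320625XXXXXXXXXX"))
print(strip_labels("Nama"))
```

Lines that consist only of a label come back as an empty string, so the caller can skip them the same way the original loop does with `if line != ""`.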

Python Dictionary re

As you can see in the code, the script uses a regular expression to search for words in a txt file and puts them into a dictionary like this:
set(["['card', 'port']"])
set(["['onu_id', 'remote_id']"])
set(["['card', 'port', 'onu_id']"])
set(["['card', 'port', 'onu_id']"])
set(["['remote_id']"])
set(["['remote_id']"])
set(["['card', 'port', 'onu_id']"])
The problem is that I need to enter values for them by hand and remove everything except the keys (card, port, onu_id, remote_id), i.e. strip the set(["[' wrapper, so that everything can be seen clearly:
dict{card:1, port:5, onu_id:3, remote_id:16568764}
It should look like this and be easy to read.
Here is my code:
import re, string
with open("conf.txt","r") as f:
    text = f.readlines()
    for line in text:
        match = re.findall(r'_\$(\w+)',line)
        if match:
            dict = {str(match)}
            print dict
part of input file:
interface gpon-olt_1/_$card/_$port
onu _$onu_id type ZTE-F660 pw _$remote_id vport-mode gemport
no lock
no shutdown
exit
interface gpon-onu_1/_$card/_$port:\_$onu_id
exit
interface gpon-onu_1/_$card/_$port:\_$onu_id
name ONU-_$remote_id 102211+1
description
vport-mode gemport def-map-type 1:1
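One observation on the code above: {str(match)} is a set literal holding the string form of the list, which is why each line prints as set(["[...]"]). A possible direction, sketched under assumptions, is to gather the unique placeholder names first and only then pair them with values. The collect_placeholders helper is hypothetical, the template lines are copied from the input excerpt, and the numbers are the example values from the question standing in for whatever is typed in by hand.

```python
import re

PLACEHOLDER_RE = re.compile(r'_\$(\w+)')


def collect_placeholders(lines):
    """Return placeholder names in order of first appearance, without duplicates."""
    keys = []
    for line in lines:
        for name in PLACEHOLDER_RE.findall(line):
            if name not in keys:
                keys.append(name)
    return keys


# A few lines from the input excerpt above.
template = [
    'interface gpon-olt_1/_$card/_$port',
    'onu _$onu_id type ZTE-F660 pw _$remote_id vport-mode gemport',
    'interface gpon-onu_1/_$card/_$port:\\_$onu_id',
    'name ONU-_$remote_id 102211+1',
]

keys = collect_placeholders(template)

# Example values from the question; in practice these could come from a prompt.
values = dict(zip(keys, [1, 5, 3, 16568764]))

print(keys)
print(values)
```

The name `values` is used instead of `dict` so the built-in type is not shadowed.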

how to replace a text on XML with lxml?

I'm trying to truncate several element.text values in an XML file. I managed to build two lists: the first one (long_name) holds the original element.text strings that are too long, and the second one (short_name) holds the same strings after truncation.
Now I want to replace the element.text values in my XML. I tried a few scripts but ended up falling back on readlines(); I would like to find a similar solution with lxml to this code:
txt = open('IF_Generic.arxml','r')
Lines = txt.readlines()
txt.close()
txt = open('IF_Genericnew.arxml','w')
for e in range(len(long_name)) :
    for i in range(len(Lines)) :
        if (long_name[e] in Lines[i]) == True :
            Lines[i] = Lines[i].replace(long_name[e],short_name[e])
for i in Lines :
    txt.write(i)
txt.close()
I tried this, but it doesn't work :
f = open('IF_Generic.arxml')
arxml = f.read()
f.close()
tree = etree.parse(StringIO(arxml))
for e,b in enumerate(long_name) :
    context = etree.iterparse(StringIO(arxml))
    for a,i in context:
        if not i.text:
            pass
        else:
            if (b in i.text) == True :
                i.text = short_name[e]
obj_arxml = etree.tostring(tree,pretty_print=True)
f = open('IF_Genericnew.arxml','w')
f.write(obj_arxml)
f.close()
Let's say the first element of the list long_name is RoutineServices_EngMGslLim_NVMID03:
<BALISE_A>
    <BALISE_B>
        <SHORT-NAME>RoutineServices_EngMGslLim_NVMID03</SHORT-NAME>
    </BALISE_B>
</BALISE_A>
<BALISE_C>
    <POSSIBLE-ERROR-REF DEST="APPLICATION-ERROR">/Interfaces/RoutineServices_EngMGslLim_NVMID03/E_NOT_OK</POSSIBLE-ERROR-REF>
    <SHORT-NAME>Blah_Bleh_Bluh</SHORT-NAME>
</BALISE_C>
The first element of the list short_name is RoutineServices_EngMGslLim_NV, so I want this:
<BALISE_A>
    <BALISE_B>
        <SHORT-NAME>RoutineServices_EngMGslLim_NV</SHORT-NAME>
    </BALISE_B>
</BALISE_A>
<BALISE_C>
    <POSSIBLE-ERROR-REF DEST="APPLICATION-ERROR">/Interfaces/RoutineServices_EngMGslLim_NV/E_NOT_OK</POSSIBLE-ERROR-REF>
    <SHORT-NAME>Blah_Bleh_Bluh</SHORT-NAME>
</BALISE_C>
P.S.: I use Python 2.7.9.
Thanks in advance, everyone!
Don't open XML files like text files. I have explained in this answer why this is a bad idea.
Simply let etree read and write the file. It's also less code to write.
from lxml import etree
# read the file and load it into a DOM tree
tree = etree.parse('IF_Generic.arxml')
for elem in tree.iterfind("//*"):
    # find elements that contain only text
    if len(elem) == 0 and elem.text and elem.text.strip():
        # do your replacements ...
        elem.text = "new text"
# serialize the DOM tree and write it to file
tree.write('IF_Genericnew.arxml', pretty_print=True)
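Instead of going over all elements, which is what "//*" does, you can use a more specific XPath to narrow down the elements you want to work on.
For example, something like "//SHORT-NAME | //POSSIBLE-ERROR-REF" would help to reduce the overall workload. Note that the union operator (|) is full XPath, so that expression has to go through tree.xpath() rather than iterfind(), which only understands the simpler ElementPath subset.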
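To make the "do your replacements" step concrete, here is a minimal sketch that swaps each long name for its short counterpart inside element text. Several things are assumptions for illustration: the shorten_names helper, the ROOT wrapper element (added only so the sample is a single well-formed document), and the fallback to the standard library's xml.etree.ElementTree when lxml is not installed. It works on an in-memory string rather than the real .arxml file.

```python
try:
    from lxml import etree  # preferred, as in the answer above
except ImportError:
    # Fallback so the sketch still runs without lxml; the calls used here
    # (fromstring, iter, tostring) exist in both libraries.
    import xml.etree.ElementTree as etree


def shorten_names(root, long_names, short_names):
    """Replace each long name with its short form in element text.

    Returns the number of elements whose text changed.
    """
    replacements = list(zip(long_names, short_names))
    changed = 0
    for elem in root.iter():
        # Skip comments / processing instructions and elements without text.
        if not isinstance(elem.tag, str) or not elem.text:
            continue
        new_text = elem.text
        for long_name, short_name in replacements:
            new_text = new_text.replace(long_name, short_name)
        if new_text != elem.text:
            elem.text = new_text
            changed += 1
    return changed


# Sample from the question, wrapped in a hypothetical ROOT element.
SAMPLE = """<ROOT>
<BALISE_A><BALISE_B>
<SHORT-NAME>RoutineServices_EngMGslLim_NVMID03</SHORT-NAME>
</BALISE_B></BALISE_A>
<BALISE_C>
<POSSIBLE-ERROR-REF DEST="APPLICATION-ERROR">/Interfaces/RoutineServices_EngMGslLim_NVMID03/E_NOT_OK</POSSIBLE-ERROR-REF>
<SHORT-NAME>Blah_Bleh_Bluh</SHORT-NAME>
</BALISE_C>
</ROOT>"""

root = etree.fromstring(SAMPLE)
count = shorten_names(
    root,
    ["RoutineServices_EngMGslLim_NVMID03"],
    ["RoutineServices_EngMGslLim_NV"],
)
result = etree.tostring(root).decode("utf-8")
print(count)
print(result)
```

Using str.replace on the text (instead of assigning short_name[e] outright, as the question's attempt does) keeps the surrounding path in values such as /Interfaces/.../E_NOT_OK intact.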

Using lxml in Python, I need to replace "RNA" with <mark>RNA</mark> in input xml file. Code below

My input XML file is:
<?xml version='1.0' encoding='UTF-8'?>
<try>
something somethingRNA and RNA in RNA.
</try>
My Python Code:
import lxml.etree as ET
import openpyxl
import re
url = 'output_15012015_test.xml'
tree = ET.parse(url)
lncrna = "RNA"
abstract = tree.xpath('//try')
string = abstract[0].text
if(abstract):
    anotherString = re.sub(r'\b'+lncrna.lower()+'\\b', '<mark>'+lncrna+'</mark>', string.lower())
    abstract[0].text = anotherString
    print abstract[0].text
tree.write('FalseRoller.xml', encoding='UTF-8', pretty_print=True)
Output
I get the following replaced text instead of <mark>RNA</mark>:
&lt;mark&gt;RNA&lt;/mark&gt;
I think it has to do with the tree.write() method. I'm also new to Python and the community. Please help me with this.
You are setting an XML mark in the element's .text, so when writing to XML it is interpreted as text, not markup, and the characters are escaped with &...;.
What you want to do is:
divide .text into three parts: before the new tag, in the new tag, after the new tag
add the new tag and set texts and tails
See code:
tree = ET.parse(url)
lncrna = "RNA"
abstract = tree.xpath('//try')
# the capturing group keeps the matched words in the split result,
# so odd indexes are matches and even indexes are the text around them
aList = re.split(r'(\b'+lncrna+r'\b)', abstract[0].text, flags=re.IGNORECASE)
abstract[0].text = aList[0]
for i in range(1,len(aList),2):
    anElement = ET.SubElement(abstract[0], 'mark')
    anElement.text = aList[i]
    anElement.tail = aList[i+1]
    # integer division, so the index stays an int on Python 3 as well
    abstract[0].insert( (i-1)//2, anElement )
print abstract[0].text
tree.write('FalseRoller.xml', encoding='UTF-8', pretty_print=True)
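The difference between the two approaches can be seen side by side in a small self-contained sketch. It parses the sample from a string instead of a file, the mark_word helper is a hypothetical name, and it falls back to the standard library's xml.etree.ElementTree if lxml is unavailable (both escape text and support .tail in the same way for this case).

```python
import re

try:
    import lxml.etree as ET
except ImportError:
    # Fallback so the sketch runs without lxml installed.
    import xml.etree.ElementTree as ET


def mark_word(elem, word):
    """Wrap whole-word occurrences of `word` in elem.text with <mark> children."""
    parts = re.split(r'(\b' + re.escape(word) + r'\b)', elem.text, flags=re.IGNORECASE)
    elem.text = parts[0]                    # text before the first match
    for i in range(1, len(parts), 2):
        mark = ET.SubElement(elem, 'mark')  # SubElement appends in order
        mark.text = parts[i]                # the matched word itself
        mark.tail = parts[i + 1]            # text up to the next match


# 1) Putting markup into .text: it is serialized as escaped text.
naive = ET.fromstring('<try>in RNA.</try>')
naive.text = 'in <mark>RNA</mark>.'
escaped = ET.tostring(naive).decode('utf-8')
print(escaped)

# 2) Building real child elements with text and tail.
root = ET.fromstring('<try>something somethingRNA and RNA in RNA.</try>')
mark_word(root, 'RNA')
marked = ET.tostring(root).decode('utf-8')
print(marked)
```

Because of the \b word boundaries, the "RNA" glued onto "somethingRNA" is left alone, matching the behaviour of the answer's pattern.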