How to extract information from systeminfo output after the colon with regex?

I am running the systeminfo command on the command line to get the system information.
I need only a few of the entries, not everything. How can I extract them with regex?
Edit: I am using LabVIEW, which uses Perl Compatible Regular Expressions (http://www.pcre.org/).
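I need only the following information (the German labels are what systeminfo prints on this machine; the English equivalents are given after "i.e."):
NameOfTheHost (Hostname:),
Microsoft Windows 8.1 Pro (Betriebssystemname:, i.e. OS Name),
07.12.2015, 07:54:09 (Systemstartzeit:, i.e. System Boot Time),
1 and [01]: Intel64 Family 6 Model 60 Stepping 3 GenuineIntel ~2501 MHz (Prozessor(en):, i.e. Processor(s)),
username (Registrierter Benutzer:, i.e. Registered Owner),
8.105 MB (Gesamter physischer Speicher:, i.e. Total Physical Memory),
3.315 MB (Verfügbarer physischer Speicher:, i.e. Available Physical Memory),
8.105 MB (Virtueller Arbeitsspeicher: Maximale Größe:, i.e. Virtual Memory: Max Size),
2.485 MB (Virtueller Arbeitsspeicher: Verfügbar:, i.e. Virtual Memory: Available),
5.620 MB (Virtueller Arbeitsspeicher: Zurzeit verwendet:, i.e. Virtual Memory: In Use)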
The systeminfo command gives the following output:
Hostname: NameOfTheHost
Betriebssystemname: Microsoft Windows 8.1 Pro
Betriebssystemversion: 6.3.9600 Nicht zutreffend Build 9600
Betriebssystemhersteller: Microsoft Corporation
Betriebssystemkonfiguration: Mitglied der Domäne/Arbeitsgruppe
Betriebssystem-Buildtyp: Multiprocessor Free
Registrierter Benutzer: username
Registrierte Organisation:
Produkt-ID: 0000-0000-0000
Ursprüngliches Installationsdatum: 01.01.2010, 13:41:25
Systemstartzeit: 07.12.2015, 07:54:09
Systemhersteller: Hewlett-Packard
Systemmodell: HP ProBook 650 G1
Systemtyp: x64-based PC
Prozessor(en): 1 Prozessor(en) installiert.
[01]: Intel64 Family 6 Model 60 Stepping 3 GenuineIntel ~2501 MHz
BIOS-Version: Hewlett-Packard L77 Ver. 01.05, 29.04.2014
Windows-Verzeichnis: C:\Windows
System-Verzeichnis: C:\Windows\system32
Startgerät: \Device\HarddiskVolume1
Systemgebietsschema: de-at;Deutsch (Österreich)
Eingabegebietsschema: de;Deutsch (Deutschland)
Zeitzone: (UTC+01:00) Amsterdam, Berlin, Bern, Rom, Stockholm, Wien
Gesamter physischer Speicher: 8.105 MB
Verfügbarer physischer Speicher: 3.315 MB
Virtueller Arbeitsspeicher: Maximale Größe: 8.105 MB
Virtueller Arbeitsspeicher: Verfügbar: 2.485 MB
Virtueller Arbeitsspeicher: Zurzeit verwendet: 5.620 MB
Auslagerungsdateipfad(e): Nicht zutreffend
Domäne: domainname.local
Anmeldeserver: \\loginserver
Hotfix(es): 148 Hotfix(e) installiert.
[01]: KB2899189_Microsoft-Windows-CameraCodec-Package
[02]: KB000000
[03]: KB000000
..... /* shortened */
[148]: KB000000
Netzwerkkarte(n): 3 Netzwerkadapter installiert.
[01]: Bluetooth-Gerät (PAN)
Verbindungsname: Bluetooth-Netzwerkverbindung 3
Status: Medien getrennt
[02]: Intel(R) Ethernet-Verbindung I217-V
Verbindungsname: Ethernet 2
DHCP aktiviert: Ja
DHCP-Server: 10.0.0.1
IP-Adresse(n)
[01]: 10.0.0.10
[02]: 0000::0000:0000:0000:0000
[03]: Broadcom BCM943228HMB 802.11abgn 2x2 Wi-Fi Adapter
Verbindungsname: WiFi 2
DHCP aktiviert: Ja
DHCP-Server: 10.0.0.10
IP-Adresse(n)
[01]: 10.0.0.11
[02]: 0000::0000:0000:0000:0000
Anforderungen für Hyper-V: Erweiterungen für den VM-Überwachungsmodus: Ja
Virtualisierung in Firmware aktiviert: Nein
Adressübersetzung der zweiten Ebene: Ja
Datenausführungsverhinderung verfügbar: Ja

You really need to give us more detail about the regex flavour or the language you want to use. Assuming you want to use it in C#, here is how you could loop over all key/value matches:
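using System.Text.RegularExpressions;

try {
    // Multiline: ^ matches at the start of every line. The key may not contain a
    // line break and the value stops at the end of its own line, so an empty entry
    // such as "Registrierte Organisation:" does not swallow the following line.
    Regex regexObj = new Regex(@"^([^:\r\n]+):[ \t]*([^\r\n]*)", RegexOptions.Multiline);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        // key: matchResults.Groups[1].Value
        // value: matchResults.Groups[2].Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}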
You could then check which ones you want. To be honest, though, I wouldn't use regex for this task. Since you only need a few known entries of the list, loop through the lines, check whether a line begins with e.g. "Hostname:", and if so take the rest of the string.
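The line-prefix approach can be sketched in Python as follows. SAMPLE_LINES is a short excerpt of the output above and pick_fields is a hypothetical helper name; in practice the lines would come from running systeminfo. Labels are matched in full (including inner colons such as "Virtueller Arbeitsspeicher: Maximale Größe"), which avoids the ambiguity of splitting on the first colon. The processor details sit on the line after "Prozessor(en):", so this sketch does not pick them up.

```python
# Excerpt of the systeminfo output from the question (German locale).
SAMPLE_LINES = [
    "Hostname: NameOfTheHost",
    "Betriebssystemname: Microsoft Windows 8.1 Pro",
    "Registrierter Benutzer: username",
    "Registrierte Organisation:",
    "Gesamter physischer Speicher: 8.105 MB",
    "Virtueller Arbeitsspeicher: Maximale Größe: 8.105 MB",
    "Virtueller Arbeitsspeicher: Zurzeit verwendet: 5.620 MB",
]

# Full labels (without the trailing colon) of the entries we want to keep.
WANTED = [
    "Hostname",
    "Registrierter Benutzer",
    "Gesamter physischer Speicher",
    "Virtueller Arbeitsspeicher: Maximale Größe",
    "Virtueller Arbeitsspeicher: Zurzeit verwendet",
]


def pick_fields(lines, labels):
    """Return {label: value} for every line that starts with 'label:'."""
    found = {}
    for line in lines:
        text = line.strip()
        for label in labels:
            prefix = label + ":"
            if text.startswith(prefix):
                # Everything after the label is the value, minus surrounding blanks.
                found[label] = text[len(prefix):].strip()
                break
    return found


print(pick_fields(SAMPLE_LINES, WANTED))
```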
Without further details we won't be able to give you a more precise answer.
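Since the question's edit says LabVIEW uses PCRE, another option is one anchored pattern per label, for example (?m)^Hostname:[ \t]*(.*?)[ \t]*\r?$. The Python sketch below builds such patterns with re.escape. Python's re accepts this syntax, and the same pattern text should also be valid PCRE, but that is an assumption not checked in LabVIEW. SAMPLE, LABELS and extract are illustrative names; the sample is an excerpt of the question's output.

```python
import re

# Excerpt of the systeminfo output from the question (German locale).
SAMPLE = """Hostname: NameOfTheHost
Betriebssystemname: Microsoft Windows 8.1 Pro
Registrierter Benutzer: username
Registrierte Organisation:
Gesamter physischer Speicher: 8.105 MB
Virtueller Arbeitsspeicher: Maximale Größe: 8.105 MB
Virtueller Arbeitsspeicher: Verfügbar: 2.485 MB
"""

LABELS = [
    "Hostname",
    "Betriebssystemname",
    "Registrierter Benutzer",
    "Gesamter physischer Speicher",
    "Virtueller Arbeitsspeicher: Maximale Größe",
    "Virtueller Arbeitsspeicher: Verfügbar",
]


def extract(text, labels):
    """Return {label: value}, searching for each label at the start of a line."""
    result = {}
    for label in labels:
        # (?m): ^ and $ match at line boundaries; [ \t]* skips blanks without
        # crossing into the next line; \r? tolerates Windows line endings.
        pattern = r"(?m)^" + re.escape(label) + r":[ \t]*(.*?)[ \t]*\r?$"
        m = re.search(pattern, text)
        if m:
            result[label] = m.group(1)
    return result


print(extract(SAMPLE, LABELS))
```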


Why can I only download the first episode of a bilibili series with youtube-dl?

I can download the first episode of a series.
youtube-dl "https://www.bilibili.com/video/av90163846?p=1"
Now I want to download all episodes of the series.
for i in $(seq 1 55)
do
youtube-dl "https://www.bilibili.com/video/av90163846?p=$i"
done
None of the episodes except the first can be downloaded; each attempt prints the same message:
[BiliBili] 90163846: Downloading webpage
[BiliBili] 90163846: Downloading video info page
[download] 【合集300集全】地道美音 美国中小学教学 自然科学 社会常识-90163846.flv has already been downloaded
Please try it and see what happens. How can I fix this?
@Christos Lytras, something strange happens with your code:
for i in $(seq 1 55)
do
youtube-dl "https://www.bilibili.com/video/av90163846?p=$i" -o "%(title)s-%(id)s-$i.%(ext)s"
done
It does download videos from bilibili, but although every downloaded file has a different name, they all have the same content as the first episode. Try it and you will see.
This happens because youtube-dl ignores the URI parameters after ? when building the filename, so each episode gets the same filename as the previous one and is skipped because a file with that name already exists. The solution is to use the -o/--output template (a filesystem option) to set a filename that includes an index, taken from the variable i.
Filesystem Options
-o, --output TEMPLATE Output filename template, see the "OUTPUT TEMPLATE" for all the info
OUTPUT TEMPLATE
The -o option allows users to indicate a template for the output file names.
The basic usage is not to set any template arguments when downloading a single file, like in youtube-dl -o funny_video.flv "https://some/video". However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to python string formatting operations. For example, %(NAME)s or %(NAME)05d. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations.
Allowed names along with sequence type are:
id (string): Video identifier
title (string): Video title
url (string): Video URL
ext (string): Video filename extension
...
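To make the %(NAME)s mechanism concrete, here is a small Python sketch of printf-style formatting with a mapping, which is what the quoted documentation describes. The metadata values and the "episode" key are hypothetical stand-ins for what the extractor would supply. In the shell loop, the per-episode suffix comes from the shell expanding $i before youtube-dl ever sees the template.

```python
# Hypothetical metadata; youtube-dl fills these fields from the extractor.
INFO = {"title": "Example title", "id": "90163846", "ext": "flv"}


def render(template, info):
    """Apply Python printf-style formatting with a mapping (%(NAME)s)."""
    return template % info


# The shell has already replaced $i, so each iteration passes a different literal suffix.
for i in range(1, 4):
    print(render("%(title)s-%(id)s-" + str(i) + ".%(ext)s", INFO))

# Formatting operations such as zero padding also work, e.g. %(NAME)05d.
print(render("%(episode)05d", {"episode": 7}))
```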
In your case, to include i in the output filename, you can use something like this:
for i in $(seq 1 55)
do
youtube-dl "https://www.bilibili.com/video/av90163846?p=$i" -o "%(title)s-%(id)s-$i.%(ext)s"
done
This uses the title, the id, the loop variable i as an index, and ext for the file extension.
You can check the Output Template variables for more options defining the filename.
UPDATE
Apparently, bilibili.com uses JavaScript to set up the video player and fetch the video files, so there is no way to extract the whole playlist with youtube-dl. I suggest using Lux, which supports Bilibili playlists out of the box. It has installers for all major operating systems, and you can download the whole playlist like this:
lux -p https://www.bilibili.com/video/av90163846
or, if you only want to download up to the 55th video, use the -end 55 CLI option:
lux -end 55 -p https://www.bilibili.com/video/av90163846
You can use the -start, -end or -items option to specify the download range of the list:
-start
Playlist video to start at (default 1)
-end
Playlist video to end at
-items
Playlist video items to download. Separated by commas like: 1,5,6,8-10
For bilibili playlists only:
-eto
File name of each bilibili episode doesn't include the playlist title
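To make the -items syntax concrete, here is a hypothetical Python helper (not part of Lux) that expands a spec like 1,5,6,8-10 into the video numbers it presumably selects. It assumes that an a-b range includes both ends, as the 8-10 example suggests.

```python
def expand_items(spec):
    """Expand an items spec such as "1,5,6,8-10" into a list of ints.

    Assumption: "a-b" ranges are inclusive on both ends.
    """
    result = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            result.extend(range(start, end + 1))
        else:
            result.append(int(part))
    return result


print(expand_items("1,5,6,8-10"))
```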
If you only want information about a playlist without downloading any files, use the -i command-line option like this:
lux -i -p https://www.bilibili.com/video/av90163846
which outputs something like this:
Site: 哔哩哔哩 bilibili.com
Title: 【合集300集全】地道美音 美国中小学教学 自然科学 社会常识 P1 【001】Parts of Plants
Type: video
Streams: # All available quality
[64] -------------------
Quality: 高清 720P
Size: 308.24 MiB (323215935 Bytes)
# download with: lux -f 64 ...
[32] -------------------
Quality: 清晰 480P
Size: 201.57 MiB (211361230 Bytes)
# download with: lux -f 32 ...
[16] -------------------
Quality: 流畅 360P
Size: 124.75 MiB (130809508 Bytes)
# download with: lux -f 16 ...
Site: 哔哩哔哩 bilibili.com
Title: 【合集300集全】地道美音 美国中小学教学 自然科学 社会常识 P2 【002】Life Cycle of a Plant
Type: video
Streams: # All available quality
[64] -------------------
Quality: 高清 720P
Size: 227.75 MiB (238809781 Bytes)
# download with: lux -f 64 ...
[32] -------------------
Quality: 清晰 480P
Size: 148.96 MiB (156191413 Bytes)
# download with: lux -f 32 ...
[16] -------------------
Quality: 流畅 360P
Size: 94.82 MiB (99425641 Bytes)
# download with: lux -f 16 ...

Regex specific match

I want to extract exactly FOC2345N1UG from the following input. I used the regex (S\/N:\s\s\s(\S+)) in Python, but it returns all the serial numbers instead of only the one I need.
NODE module 0/RSP0/CPU0 ASR 9001, Route Switch Processor with 8GB memory
MAIN: board type 0x100401
S/N: FOC21456NKN7
Top Assy. Number: 00-0000-00
PID: ASR9001-RP
HwRev (UDI_VID): V01
Chip HwRev: V1.0
New Deviation Number: 0
CLEI:
Board State : IOS XR RUN
PLD: Motherboard: N/A, Processor version: 0x8023 (rev: 3.0), Power: N/A
MONLIB: QNXFFS Monlib Version 3.3
ROMMON: Version 3.4(20160331:102636) [ASR9K ROMMON]
Board FPGA/CPLD/ASIC Hardware Revision:
IntCtrl : V0.0
USB0 : V17.0
ClkCtrl : V0.0
CPUCtrl : V0.0
MLANSwitch : V0.0
EOBCSwitch : V2.0
LIU : V0.0
YDTI : V0.0
PHY : V0.0
CBC (active partition) : v22.114
CBC (inactive partition) : v22.114
NODE fantray 0/FT0/SP ASR-9001 Fan Tray Ver 2
MAIN: board type 0x900409
S/N: FOC21456NTQF
Top Assy. Number: 68-5333-03
PID: ASR-9001-FAN-V2
HwRev (UDI_VID): V03
Chip HwRev: V1.0
New Deviation Number: 0
CLEI: IPUCBLBBAA
Vendor ID: 5
PLD: Motherboard: N/A, Processor version: N/A, Power: N/A
Board FPGA/CPLD/ASIC Hardware Revision:
CBC (active partition) : v24.115
CBC (inactive partition) : v24.115
NODE module 0/0/CPU0 ASR 9001, Modular Line Card
MAIN: board type 0xf10402
S/N: FOC2124NL345L
Top Assy. Number: 00-0000-00
PCA: 73-14312-08
PID: ASR9001-LC
HwRev (UDI_VID): V01
Chip HwRev: V1.0
New Deviation Number: 0
CLEI:
Board State : IOS XR RUN
PLD: Motherboard: N/A, Processor version: 0x8023 (rev: 3.0), Power: N/A
ROMMON: Version 3.4(20160331:133429) [ASR9K ROMMON]
Board FPGA/CPLD/ASIC Hardware Revision:
NP0 : V4.194
NP1 : V4.194
FIA0 : V0.2
FIA1 : V0.2
X-Bar : V1.5
CPUCtrl : V1.18
FabArbiter : V0.0
PortCtrl : V1.18
PHYCtrl : V1.18
ClkCtrl : V1.18
PHY0 : V0.4(HwRev) V8.0(FwRev) V8.0(SwRev)
DBCtrl : V2.10
Power Sequencer0 : V0.0
Power Sequencer1 : V0.0
Power Sequencer2 : V0.0
Modular Linecard Daughter board : V1.0
CBC (active partition) : v23.114
CBC (inactive partition) : v23.114
SPA 0/0/0 : ASR 9000 4-port 10GE Modular Port Adapter
MAIN: board type 0237
68-5885-01 rev B0
dev N/A
S/N FOC2346NHB7
PCA: 73-17858-01 rev N/A
PID: A9K-MPA-4X10GE
VID: V06
CLEI: IPUIBRDRAF
Board State : OK
FPD Software Revision:
SPA 0/0/1 : ASR 9000 4-port 10GE Modular Port Adapter
MAIN: board type 0237
68-5885-01 rev B0
dev N/A
S/N FOC22345NH71
PCA: 73-17858-01 rev N/A
PID: A9K-MPA-4X10GE
VID: V06
CLEI: IPUIBRDRAF
Board State : OK
FPD Software Revision:
NODE power-module 0/PS0/M0/SP ASR-9001 AC Power Supply
MAIN: board type 0xf00404
S/N: ART22784X093
Top Assy. Number: 341-0424-01
PID: A9K-750W-AC
HwRev (UDI_VID): V01
Chip HwRev: V0.0
New Deviation Number: 0
CLEI: IPUPAJAAAA
Board State : PRESENT
PLD: Motherboard: N/A, Processor version: N/A, Power: N/A
Board FPGA/CPLD/ASIC Hardware Revision:
NODE power-module 0/PS0/M1/SP ASR-9001 AC Power Supply
MAIN: board type 0xf00404
S/N: ART21274X095
Top Assy. Number: 341-0424-01
PID: A9K-750W-AC
HwRev (UDI_VID): V01
Chip HwRev: V0.0
New Deviation Number: 0
CLEI: IPUPAJAAAA
Board State : PRESENT
PLD: Motherboard: N/A, Processor version: N/A, Power: N/A
Board FPGA/CPLD/ASIC Hardware Revision:
Rack 0 - ASR-9001 Chassis
RACK NUM: 0
S/N: FOC2345N1UG
PID: ASR-9001
VID: V07
Desc: ASR-9001 Chassis
CLEI: IPMDX00BR
Use the re.S (DOTALL) flag along with this regex:
/ASR-9001 Chassis.+S\/N:\s+([A-Z\d]+)/
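This finds your target header, skips ahead to an S/N: after it, then grabs the following run of uppercase letters and digits. Because .+ is greedy, it actually jumps to the last S/N: in the input; that works here because the chassis record comes last, but .+? would stop at the first one.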
>>> import re
>>> re.search(r"ASR-9001 Chassis.+S\/N:\s+([A-Z\d]+)", data, re.S).group(1)
'FOC2345N1UG'
Some of your serial numbers lack the colon after S/N, and the data is loosely formatted in general, so you may need to make the colon optional with S\/N:? and make other tweaks depending on your use case.
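Both suggestions can be combined in a lazy variant. In this sketch, .+? stops at the first S/N after the header, and S/N:? also accepts the colon-less SPA entries. DATA is a trimmed excerpt of the question's input, and chassis_serial is a hypothetical helper name.

```python
import re

# Trimmed excerpt of the inventory output from the question.
DATA = """NODE module 0/0/CPU0 ASR 9001, Modular Line Card
MAIN: board type 0xf10402
S/N: FOC2124NL345L
SPA 0/0/0 : ASR 9000 4-port 10GE Modular Port Adapter
S/N FOC2346NHB7
NODE power-module 0/PS0/M1/SP ASR-9001 AC Power Supply
S/N: ART21274X095
Rack 0 - ASR-9001 Chassis
RACK NUM: 0
S/N: FOC2345N1UG
PID: ASR-9001
"""


def chassis_serial(text, header="ASR-9001 Chassis"):
    """Return the first serial number that follows `header`, or None."""
    # Lazy .+? (with re.S so it crosses lines) stops at the first S/N after
    # the header; the colon is optional because the SPA entries omit it.
    pattern = re.escape(header) + r".+?S/N:?\s+([A-Z\d]+)"
    m = re.search(pattern, text, re.S)
    return m.group(1) if m else None


print(chassis_serial(DATA))
```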
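I think you need the S/N that is directly followed by PID (in your data, only the chassis serial is). PID sits on the next line, so allow whitespace (including the newline) between them. Use re.search rather than re.match, which only matches at the start of the string, and take group(1), the captured serial, rather than group(0), the whole match:
m = re.search(r"S/N:\s+(\w+)\s+PID", content)
sn = m.group(1)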

Pandas pattern match and csv conversion of dataFrame

I use the code below to parse data from a text file that contains many fields, with hundreds of column names; I select the required columns when reading the file with pandas read_csv. It works fine, but only with encoding='cp1252'.
There are five key fields I'm looking for: ['Hostname', 'IP Address', 'Aux Site', 'OS Version', 'Network Name'].
In the pattern section, I use a variable patt to look for the keywords "AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora|\?", which I believe is not case-sensitive.
These are matched against the OS Version column. I use a literal \? to match a question mark, which works, but it also matches Windows 10 ???, whereas I only want rows whose OS Version field contains a lone ?.
Secondly, when writing with df2.to_csv, the columns are not delimited but all end up in one column, which I then have to split manually. How can we ensure that each field is correctly written as a CSV file?
#!/python/v3.6.1/bin/python3
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.expand_frame_repr', True)
##################### END OF THE Display Settings ###################
# Raw string so that \? reaches the regex engine without a Python escape warning
patt = r"AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora|\?"
col_names = ['Hostname', 'IP Address', 'Aux Site', 'CPU Model', 'CDN Version', 'OS Version', 'Kernel Version', 'LDAP Profile', 'Network Name']
df1 = pd.read_csv('/home/karn/plura/Test/Python_Panda/host.txt', delimiter="\t", usecols=col_names, encoding='cp1252', dtype='unicode')
# .copy() so that modifying df2 below does not raise SettingWithCopyWarning
df2 = df1[df1['OS Version'].str.contains(patt, na=False)][['Hostname', 'IP Address', 'Aux Site', 'OS Version', 'Network Name']].copy()
# regex=False: treat "*" as a literal character, not a regex quantifier
df2['Hostname'] = df2['Hostname'].str.replace("*", "", regex=False)
df2.to_csv("HostList_from_Surveys.csv", sep='\t', encoding='utf-8', index=False)
Below is the sample data in text format, so the issue can be reproduced:
Hostname IP Address Aux Site OS Version Network Name
host01 192.168.1.1 yoko RHEL 5.5 CISCO
host02 192.168.1.2 chelmsford AIX 6.1
host03 192.168.1.3 sanjose RHEL 5.5
host04 192.168.1.4 rosh CentOS 6.8 CISCO
host05 192.168.1.5 noida3 CentOS 5.10 CISCO
host06 192.168.1.6 rosh RHEL 6.5 CISCO
host07 192.168.1.7 noida3 RHEL 6.5 CISCO
host08 192.168.1.8 san jose RHEL 6.5 CISCO
host09 192.168.1.9 noida3 RHEL 5.5
host10 192.168.1.10 sophia RHEL 5.5 AVAYA
host11 192.168.1.11 sanjose RHEL 5.5 AVAYA
host12 192.168.1.12 sanjose RHEL 5.3 AVAYA
host13 192.168.1.13 sanjose RHEL 5.8 AVAYA
host14 192.168.1.14 sanjose Ubuntu 14.04.1
Any help will be much appreciated.
I suggest you use
patt = r"(?s)AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora|(?<!\?)\?(?!\?)"
This pattern matches
(?s) - the inline equivalent of re.DOTALL, which makes . match line-break characters (it has no effect here, since the pattern contains no .)
AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora - one of the substring alternatives
| - or
(?<!\?)\?(?!\?) - a question mark that is neither preceded nor followed by another question mark.
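A quick way to check the pattern is plain re.search, which is what Series.str.contains applies to each value when regex=True (the default). Note also that str.contains is case-sensitive unless you pass case=False. The sample values below are illustrative; the last one shows that the lookarounds only reject runs of question marks, so "Windows 10 ?" still matches.

```python
import re

patt = r"(?s)AIX|CentOS|RHEL|SunOS|SuSE|Ubuntu|Fedora|(?<!\?)\?(?!\?)"

# Illustrative OS Version values.
samples = ["RHEL 5.5", "?", "Windows 10 ???", "Windows 10", "Windows 10 ?"]


def matches(value):
    """Mimic Series.str.contains(patt) for a single value."""
    return re.search(patt, value) is not None


for value in samples:
    print(repr(value), matches(value))
```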

How to get specific content from the log file described below?

I have a log file generated by nmap that looks something like this:
Nmap scan report for gateway (10.0.0.1)
Host is up (0.0060s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.2
Host is up (0.055s latency).
MAC Address: 7C:78:7E:E8:1C:2A (Samsung Electronics)
Nmap scan report for 10.0.0.3
Host is up (0.059s latency).
MAC Address: 54:60:09:83:6E:B6 (Google)
Nmap scan report for 10.0.0.200
Host is up (-0.093s latency).
MAC Address: 5C:B9:01:02:5F:D8 (Hewlett Packard)
Nmap scan report for manoj-notebook (10.0.0.4)
Host is up.
Nmap done: 256 IP addresses (5 hosts up) scanned in 16.84 seconds
It keeps changing as new devices connect to or disconnect from the network. I want to extract the IP address (e.g. 10.0.0.1), the MAC address (e.g. 10:BE:F5:FC:9C:65) and the device name (e.g. D-Link International) into a single list, something like:
result = [['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.200', '10.0.0.4'], ['10:BE:F5:FC:9C:65', '7C:78:7E:E8:1C:2A', '54:60:09:83:6E:B6', '5C:B9:01:02:5F:D8'], ['D-Link International', 'Samsung Electronics', 'Google', 'Hewlett Packard']]
I tried the following regular expressions to match the IP address, MAC address and device name:
ipPattern = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', temp)
macPattern = re.findall(r'(?:.*?s: ){2}(.*)(?= \))', temp)
devicePattern = re.findall(r'(?:.*?\(){2}(.*)(?=\))', temp)
I'm able to match the IP addresses but not the MAC addresses and device names. How can I match those and store everything in a single list?
Also, a pattern to fetch the latency (e.g. 0.0060s) from the log file would be the cherry on top. Thank you.
You can use the following expressions:
ipPattern : \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
macPattern : (?:[0-9A-F]{2}:){2,}[0-9A-F]{2}\b
(?:[0-9A-F]{2}:){2,} Non-capturing group matching two or more pairs of hexadecimal digits, each followed by :.
[0-9A-F]{2}\b Final pair of hexadecimal digits, followed by a word boundary.
devicePattern : (?<=\()[^)0-9.]*(?=\))
(?<=\() Positive lookbehind for an opening parenthesis (.
[^)0-9.]* Negated character set: any run of characters other than ), . or digits.
(?=\)) Positive lookahead for a closing parenthesis ).
latency : -?\d+\.\d+s(?=\slatency)
-?\d+\.\d+s Optionally match -, then digits, a full stop, more digits and s.
(?=\slatency) Positive lookahead asserting that a whitespace character and latency follow.
Python snippet:
import re
import itertools
temp = """
b'\nStarting Nmap 7.60 ( https://nmap.org ) at 2018-08-03 19:44 IST\nNmap scan report for gateway (10.0.0.1)\nHost is up (0.0070s latency).\nMAC Address: 10:BE:F5:FC:9C:65 (D-Link International)\nNmap scan report for 10.0.0.3\nHost is up (0.11s latency).\nMAC Address: 54:60:09:83:6E:B6 (Google)\nNmap scan report for 10.0.0.5\nHost is up (0.11s latency).\nMAC Address: 7C:78:7E:A4:73:8C (Samsung Electronics)\nNmap scan report for 10.0.0.200\nHost is up (0.027s latency).\nMAC Address: 5C:B9:01:02:5F:D8
"""
ipPattern = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', temp)
macPattern= re.findall(r'(?:[0-9A-F]{2}:){2,}[0-9A-F]{2}\b',temp)
devicePattern = re.findall(r'(?<=\()[^)0-9.]*(?=\))',temp)
latency = re.findall(r'-?\d+\.\d+s(?=\slatency)',temp)
print(ipPattern)
print(macPattern)
print(devicePattern)
print(latency)
Prints:
['10.0.0.1', '10.0.0.3', '10.0.0.5', '10.0.0.200']
['10:BE:F5:FC:9C:65', '54:60:09:83:6E:B6', '7C:78:7E:A4:73:8C', '5C:B9:01:02:5F:D8']
['D-Link International', 'Google', 'Samsung Electronics']
['0.0070s', '0.11s', '0.11s', '0.027s']
To join them into a single list, use:
mylist = itertools.chain([ipPattern], [macPattern], [devicePattern], [latency])
print(list(mylist))
Prints:
[['10.0.0.1', '10.0.0.3', '10.0.0.5', '10.0.0.200'], ['10:BE:F5:FC:9C:65', '54:60:09:83:6E:B6', '7C:78:7E:A4:73:8C', '5C:B9:01:02:5F:D8'], ['D-Link International', 'Google', 'Samsung Electronics'], ['0.0070s', '0.11s', '0.11s', '0.027s']]
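One limitation of separate findall lists is that they drift out of alignment when a host has no MAC line, as with manoj-notebook (10.0.0.4) in the question's log. A sketch that splits the log into one block per "Nmap scan report for" line keeps each host's fields together and uses None where a field is missing. It assumes the line layout and uppercase MAC formatting shown in the question; parse_hosts is a hypothetical helper name.

```python
import re

# Excerpt of the nmap log from the question.
LOG = """Nmap scan report for gateway (10.0.0.1)
Host is up (0.0060s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.2
Host is up (0.055s latency).
MAC Address: 7C:78:7E:E8:1C:2A (Samsung Electronics)
Nmap scan report for 10.0.0.200
Host is up (-0.093s latency).
MAC Address: 5C:B9:01:02:5F:D8 (Hewlett Packard)
Nmap scan report for manoj-notebook (10.0.0.4)
Host is up.
Nmap done: 256 IP addresses (5 hosts up) scanned in 16.84 seconds
"""

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
MAC_RE = re.compile(r"MAC Address: ((?:[0-9A-F]{2}:){5}[0-9A-F]{2}) \(([^)]*)\)")
LAT_RE = re.compile(r"\((-?\d+(?:\.\d+)?s) latency\)")


def parse_hosts(text):
    """Return one dict per host with ip, mac, vendor and latency (or None)."""
    hosts = []
    # Each host block starts with "Nmap scan report for"; [1:] drops the text before the first one.
    for block in re.split(r"^Nmap scan report for ", text, flags=re.M)[1:]:
        ip = IP_RE.search(block.splitlines()[0])  # the IP is on the report line
        mac = MAC_RE.search(block)
        lat = LAT_RE.search(block)
        hosts.append({
            "ip": ip.group(0) if ip else None,
            "mac": mac.group(1) if mac else None,
            "vendor": mac.group(2) if mac else None,
            "latency": lat.group(1) if lat else None,
        })
    return hosts


hosts = parse_hosts(LOG)
for host in hosts:
    print(host)
```

If you still want the column-style result from the question, build it from the same records, e.g. [[h["ip"] for h in hosts], [h["mac"] for h in hosts], [h["vendor"] for h in hosts]], which keeps positions aligned.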

gi._glib.GError: no element "pocketsphinx" error on running livedemo

I am using Ubuntu 14.04.
I am trying to write a Python program that converts speech from the microphone to text.
For this, I have installed sphinxbase and pocketsphinx. pocketsphinx_continuous works.
thekindlyone@deepthought:.../lib$ pocketsphinx_continuous -inmic yes
INFO: cmd_ln.c(691): Parsing command line:
pocketsphinx_continuous \
-inmic yes
Current configuration:
[NAME] [DEFLT] [VALUE]
-adcdev
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-argfile
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm
-infile
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-time no no
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(691): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 56,-3,1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(294): 256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 137543 * 32 bytes (4298 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /usr/share/pocketsphinx/model/lm/en_US/cmu07a.dic
INFO: dict.c(211): Allocated 1010 KiB for strings, 1664 KiB for phones
INFO: dict.c(335): 133436 words read
INFO: dict.c(341): Reading filler dictionary: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 60400 bytes (58 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 60400 bytes (58 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=5001, 2=436879, 3=418286
INFO: ngram_model_dmp.c(242): 5001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288): 436879 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314): 418286 = LM.trigrams read
INFO: ngram_model_dmp.c(339): 37293 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359): 14370 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379): 36094 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407): 854 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463): 5001 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 13428
INFO: ngram_search_fwdtree.c(338): after: 457 root, 13300 non-root channels, 26 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(371): pocketsphinx_continuous COMPILED ON: Dec 22 2013, AT: 20:43:21
Then I ran livedemo.py from pocketsphinx/src/gst-plugin. This is the error I get:
thekindlyone@deepthought:~/.../gst-plugin$ python livedemo.py
Using pygtkcompat and Gst from gi
Traceback (most recent call last):
File "livedemo.py", line 102, in <module>
app = DemoApp()
File "livedemo.py", line 31, in __init__
self.init_gst()
File "livedemo.py", line 53, in init_gst
+ '! pocketsphinx configured=true ! fakesink')
gi._glib.GError: no element "pocketsphinx"
thekindlyone@deepthought:~/.../gst-plugin$
I found that I have to export a new path, as described in the CMUSphinx wiki, but /usr/local/lib/gstreamer-1.0 does not exist. What should I do next?
Output of gst-inspect-1.0 pocketsphinx:
No such element or plugin 'pocketsphinx'
Output of gst-inspect pocketsphinx:
Factory Details:
Long name: PocketSphinx
Class: Filter/Audio
Description: Convert speech to text
Author(s): David Huggins-Daines <dhuggins@cs.cmu.edu>
Rank: none (0)
Plugin Details:
Name: pocketsphinx
Description: PocketSphinx plugin
Filename: /usr/lib/gstreamer-0.10/libgstpocketsphinx.so
Version: 0.8
License: BSD
Source module: pocketsphinx
Binary package: PocketSphinx
Origin URL: http://cmusphinx.sourceforge.net/
GObject
+----GstObject
+----GstElement
+----GstPocketSphinx
Pad Templates:
SINK template: 'sink'
Availability: Always
Capabilities:
audio/x-raw-int
width: 16
depth: 16
signed: true
endianness: 1234
channels: 1
rate: 8000
SRC template: 'src'
Availability: Always
Capabilities:
text/plain
Element Flags:
no flags set
Element Implementation:
Has change_state() function: gst_element_change_state_func
Has custom save_thyself() function: gst_element_save_thyself
Has custom restore_thyself() function: gst_element_restore_thyself
Element has no clocking capabilities.
Element has no indexing capabilities.
Element has no URI handling capabilities.
Pads:
SRC: 'src'
Implementation:
Has custom eventfunc(): gst_pad_event_default
Has custom queryfunc(): gst_pad_query_default
Has custom iterintlinkfunc(): gst_pad_iterate_internal_links_default
Has getcapsfunc(): gst_pad_get_fixed_caps_func
Has acceptcapsfunc(): gst_pad_acceptcaps_default
Pad Template: 'src'
SINK: 'sink'
Implementation:
Has chainfunc(): 0x7f4e0c00c4f0
Has custom eventfunc(): 0x7f4e0c00c1b0
Has custom queryfunc(): gst_pad_query_default
Has custom iterintlinkfunc(): gst_pad_iterate_internal_links_default
Has getcapsfunc(): gst_pad_get_fixed_caps_func
Has acceptcapsfunc(): gst_pad_acceptcaps_default
Pad Template: 'sink'
Element Properties:
name : The name of the object
flags: readable, writable
String. Default: "pocketsphinx0"
hmm : Directory containing acoustic model parameters
flags: readable, writable
String. Default: null
lm : Language model file
flags: readable, writable
String. Default: null
lmctl : Language model control file (for class LMs)
flags: readable, writable
String. Default: null
lmname : Language model name (to select LMs from lmctl)
flags: readable, writable
String. Default: "default"
dict : Dictionary File
flags: readable, writable
String. Default: null
mllr : MLLR file
flags: readable, writable
String. Default: null
fsg : Finite state grammar file
flags: readable, writable
String. Default: null
fsg-model : Finite state grammar object (fsg_model_t *)
flags: writable
Pointer. Write only
fwdflat : Enable Flat Lexicon Search
flags: readable, writable
Boolean. Default: false
bestpath : Enable Graph Search
flags: readable, writable
Boolean. Default: false
maxhmmpf : Maximum number of HMMs searched per frame
flags: readable, writable
Integer. Range: 1 - 100000 Default: 2000
maxwpf : Maximum number of words searched per frame
flags: readable, writable
Integer. Range: 1 - 100000 Default: 20
beam : Beam width applied to every frame in Viterbi search
flags: readable, writable
Float. Range: -1 - 1 Default: 0
wbeam : Beam width applied to phone transitions
flags: readable, writable
Float. Range: -1 - 1 Default: 0
pbeam : Beam width applied to phone transitions
flags: readable, writable
Float. Range: -1 - 1 Default: 0
dsratio : Evaluate acoustic model every N frames
flags: readable, writable
Integer. Range: 1 - 10 Default: 1
latdir : Output Directory for Lattices
flags: readable, writable
String. Default: null
lattice : Word lattice object for most recent result
flags: readable
Boxed pointer of type "PSLattice"
nbest : N-best results
flags: readable
Array of GValues of type "gchararray"
nbest-size : Number of hypothesis in the N-best list
flags: readable, writable
Integer. Range: 1 - 1000 Default: 10
decoder : The underlying decoder
flags: readable
Boxed pointer of type "PSDecoder"
configured : Set this to finalize configuration
flags: readable, writable
Boolean. Default: false
Element Signals:
"partial-result" : void user_function (GstElement* object,
gchararray arg0,
gchararray arg1,
gpointer user_data);
"result" : void user_function (GstElement* object,
gchararray arg0,
gchararray arg1,
gpointer user_data);
UPDATES:
I downloaded fresh copies from GitHub and installed them, with no change:
sphinxbase build
sphinxbase install
pocketsphinx build
pocketsphinx install
The 5th attempt, on a clean install, worked: /usr/local/lib/gstreamer-1.0 was created, and adding it to GST_PLUGIN_PATH fixed the problem.
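The same fix can be applied from inside a Python script instead of the shell (export GST_PLUGIN_PATH=/usr/local/lib/gstreamer-1.0), provided the variable is set before GStreamer is initialised. To my knowledge, GStreamer reads it when it scans for plugins during Gst.init(). This is a minimal sketch: add_gst_plugin_dir is a hypothetical helper, and the path is the one reported in the update.

```python
import os


def add_gst_plugin_dir(path, env=os.environ):
    """Prepend `path` to GST_PLUGIN_PATH in `env`, avoiding duplicates."""
    current = [p for p in env.get("GST_PLUGIN_PATH", "").split(os.pathsep) if p]
    if path not in current:
        current.insert(0, path)
    env["GST_PLUGIN_PATH"] = os.pathsep.join(current)
    return env["GST_PLUGIN_PATH"]


if __name__ == "__main__":
    add_gst_plugin_dir("/usr/local/lib/gstreamer-1.0")
    # Only after this point import and initialise GStreamer, e.g.:
    #   from gi.repository import Gst
    #   Gst.init(None)
    print(os.environ["GST_PLUGIN_PATH"])
```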