Hello all data pipeline experts!
Currently, I'm about to set up data ingestion from an MQTT source. All my MQTT topics contain float values, except for a few from RFID scanners that contain UUIDs, which should be read in as strings. The RFID topics have "RFID" in their topic name; specifically, they are of the format "+/+/+/+/RFID".
I would like to convert all topics EXCEPT the RFID topics to float and store them in an InfluxDB measurement "mqtt_data". The RFID topics should be stored as strings in the measurement "mqtt_string".
Yesterday, I fiddled around a lot with Processors and got no results other than a headache. Today, I had a first success:
[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  organization = "xy"
  bucket = "bucket"
  token = "ExJWOb5lPdoYPrJnB8cPIUgSonQ9zutjwZ6W3zDRkx1pY0m40Q_TidPrqkKeBTt2D0_jTyHopM6LmMPJLmzAfg=="

[[inputs.mqtt_consumer]]
  servers = ["tcp://127.0.0.1:1883"]
  qos = 0
  connection_timeout = "30s"
  name_override = "mqtt_data"
  ## Topics to subscribe to
  topics = [
    "+",
    "+/+",
    "+/+/+",
    "+/+/+/+",
    "+/+/+/+/+/+",
    "+/+/+/+/+/+/+",
    "+/+/+/+/+/+/+/+",
    "+/+/+/+/+/+/+/+/+",
  ]
  data_format = "value"
  data_type = "float"

[[inputs.mqtt_consumer]]
  servers = ["tcp://127.0.0.1:1883"]
  qos = 0
  connection_timeout = "30s"
  name_override = "mqtt_string"
  topics = ["+/+/+/+/RFID"]
  data_format = "value"
  data_type = "string"
As you can see, in the first mqtt_consumer I left out the pattern for topics with exactly 5 levels of hierarchy, since it would otherwise also match the RFID topics. So that consumer misses any non-RFID topics with 5 levels, and listing every possible number of hierarchy levels isn't nice either.
My question would be:
Is there a way to formulate a regex that negates the second mqtt_consumer block, i.e. selects all topics that are NOT of the form "+/+/+/+/RFID"? Or is there a completely different, more elegant approach I'm not aware of?
Although I have worked with regexes before, I got stuck at this point. Thanks for any hints!
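To make the negation concrete, here is what I mean in Python regex terms (a sketch only; the topic names are made up, and I don't know whether Telegraf accepts a regex at this point, or whether its tagpass/tagdrop metric filtering on the topic tag would be the more elegant route):
import re

# Reject exactly the five-level ".../RFID" topics, accept everything else:
# a negative lookahead for four "<level>/" segments followed by "RFID".
not_rfid = re.compile(r'^(?!(?:[^/]+/){4}RFID$).+$')

print(bool(not_rfid.match('plant/line1/cell2/dev3/temperature')))  # True  (float topic)
print(bool(not_rfid.match('plant/line1/cell2/dev3/RFID')))         # False (RFID topic)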
I have written a piece of code to parse the action items from a troubleshooting doc.
I want to extract phrases that start with a verb and end with a noun.
It was working as expected earlier (a month ago), but when run against the same input as before, it misses some action items that it was catching previously.
I haven't changed the code. Has something changed on the nltk or punkt side that may be affecting my results?
Please help me figure out what needs to be changed to make it run as before.
import re
import nltk
from nltk.tokenize import PunktSentenceTokenizer
from nltk.tokenize import word_tokenize

# One-time downloads
#nltk.download('punkt')
#nltk.download('averaged_perceptron_tagger')
#nltk.download('wordnet')

custom_sent_tokenizer = PunktSentenceTokenizer()

def process_content(x):
    try:
        #sent_tag = []
        act_item = []
        for i in x:
            print('tokenized = ', i)
            words = nltk.word_tokenize(i)
            print(words)
            tagged = nltk.pos_tag(words)
            print('tagged = ', tagged)
            #sent_tag.append(tagged)
            #print('sent= ', sent_tag)
            # chunking
            chunkGram = r"""ActionItems: {<VB.>+<JJ.|CD|VB.|,|CC|NN.|IN|DT>*<NN|NN.>+}"""
            chunkParser = nltk.RegexpParser(chunkGram)
            chunked = chunkParser.parse(tagged)
            print(chunked)
            for subtree in chunked.subtrees(filter=lambda t: t.label() == 'ActionItems'):
                print('Filtered chunks= ', subtree)
                ActionItems = ' '.join([w for w, t in subtree.leaves()])
                act_item.append(ActionItems)
            chunked.draw()
        return act_item
    except Exception as e:
        #print(str(e))
        return str(e)
res = 'replaced rev 6 aeb with a rev 7 aeb. configured new board and regained activity. tuned, flooded and calibrated camera. scanned fi rst patient with no issues. made new backups. replaced aeb board and completed setup. however, det 2 st ill not showing any counts. performed all necessary tests and the y passed . worked with tech support to try and resolve the issue. we decided to order another board due to lower rev received. camera is st ill down.'
tokenized = custom_sent_tokenizer.tokenize(res)
tag = process_content(tokenized)
With the input shared in the code, the following action items were previously being parsed:
['replaced rev 6 aeb', 'configured new board', 'regained activity', 'tuned , flooded and calibrated camera', 'scanned fi rst patient', 'made new backups', 'replaced aeb board', 'completed setup', 'det 2 st ill', 'showing any counts', 'performed all necessary tests and the y', 'worked with tech support']
But now, only these are coming up:
['regained activity', 'tuned , flooded and calibrated camera', 'completed setup', 'det 2 st ill', 'showing any counts']
I finally resolved this by replacing JJ. with JJ|JJR|JJS.
So my chunk is now defined as:
chunkGram = r"""ActionItems: {<VB.>+<JJ|JJR|JJS|CD|NN.|CC|IN|VB.|,|DT>*<NN|NN.>+}"""
I don't understand this change in behavior.
Dot (.) was a really convenient way of matching all variants of a POS tag.
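For what it's worth, a minimal check seems to show what the dot actually matches (hand-tagged input, so only nltk itself is needed): inside a tag pattern, . stands for exactly one character, so <JJ.> matches three-character tags like JJR and JJS but not the two-character tag JJ itself:
import nltk

tagged = [('new', 'JJ'), ('board', 'NN')]
# <JJ.> requires a three-character tag, so plain JJ is not chunked here ...
print(nltk.RegexpParser(r"Chunk: {<JJ.><NN>}").parse(tagged))
# ... while the explicit alternation does chunk it.
print(nltk.RegexpParser(r"Chunk: {<JJ|JJR|JJS><NN>}").parse(tagged))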
My program sends almost 50 messages, all with different IDs, on a PCAN CAN bus, and then loops continuously, starting over with new data for the first ID.
I have been able to initialize the bus and send the single-ID message, but I'm not able to send any other ID on the bus. I am analyzing the bus signal with an oscilloscope, so I can see which messages are on the bus.
This is part of the code, showing how I'm trying to send 2 consecutive messages on the bus, but it only sends the id=100 message and not the next ones. I'm only importing the python-can library for this.
# bus_send and Message come from the python-can setup elsewhere in the class
for i in range(self.n_param):
    if self.headers[i] == 'StoreNo':  # ID 100, byte size = 3
        byte_size = 3
        # e.g. '0x0001F4': '0x' prefix followed by byte_size*2 hex digits
        hex_data = '0x{0:0{1}X}'.format(int(self.row_data[i], 10), byte_size * 2)
        # split into bytes at [2:4], [4:6], [6:8], skipping the '0x' prefix
        to_can_msg = [int(hex_data[2:4], 16), int(hex_data[4:6], 16), int(hex_data[6:8], 16)]
        bus_send.send(Message(arbitration_id=100, data=to_can_msg))
    elif self.headers[i] == 'Date':  # ID 101, byte size = 4
        byte_size = 4
        date_play = int(self.row_data[i].replace("/", ""), 10)
        hex_data = '0x{0:0{1}X}'.format(date_play, byte_size * 2)
        to_can_msg = message_array(hex_data)
        bus_send.send(Message(arbitration_id=101, data=to_can_msg))
And I'm closing each loop with bus_send.reset() to clear any outstanding messages from the queue and start afresh in the next loop.
Many thanks!
Turns out I missed an important detail of CAN communication: the ACK bit, which must be driven dominant by a receiving node. Since I'm only probing the CAN bus with a single node and nothing acknowledges the frame, that node keeps retransmitting the first message forever in the hope of receiving an ACK.
Loopback could have worked, but it appears PCAN doesn't support loopback functionality on Linux, so I'll have to use a second CAN node to receive the messages.
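If you just need to test the framing logic without hardware, a sketch of an alternative (assuming python-can; its built-in virtual interface connects bus objects within one process, so no ACK from a physical node is needed, though it never touches the real bus):
import can

# Two bus objects on the same in-process virtual channel see each other's frames.
bus_a = can.interface.Bus(bustype='virtual', channel='vtest')
bus_b = can.interface.Bus(bustype='virtual', channel='vtest')

bus_a.send(can.Message(arbitration_id=100, data=[0, 1, 244]))
print(bus_b.recv(timeout=1.0))  # frame arrives without any hardware ACK

bus_a.shutdown()
bus_b.shutdown()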
I wrote a program that reads daily gridded climate model data (6 variables) from a file and uses it in further calculations. When running the program for a relatively short period (e.g. 5 years) it works fine, but when I run it for the required 30-year period I get a "Segmentation fault".
System description: Lenovo ThinkPad with Core i7 vPro, running Windows 10 Pro.
The program runs in Fedora (64-bit) inside Oracle VM VirtualBox.
After commenting out everything and checking section by section, I found that:
everything works fine for 30 years as long as it reads only 4 variables
as soon as the 5th or 6th variable is added, the problem creeps in
alternatively, I can run it with all 6 variables, but then only for a shorter analysis period (e.g. 22 years)
So the problem might lie with:
the statement recl=AX*AY*4, which I borrowed from another program, yet changing the 4 doesn't fix it
the system I'm running the program on
I have tried the "ulimit -s unlimited" command suggested elsewhere, but I only get the response "cannot modify limit: Operation not permitted".
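One more data point, a back-of-envelope estimate in Python (assuming the default 4-byte reals): the six 5-dimensional arrays declared in query.f below are huge, which would fit the pattern of 4 variables working but not 6, and 22 years working but not 30:
AX, AY, AT = 162, 162, 30                  # values from par_query.h
per_array = AX * AY * 31 * 12 * AT * 4     # bytes in one real(AX,AY,31,12,AT) array
print(per_array / 2.0**30)                 # ~1.09 GiB per variable
print(6 * per_array / 2.0**30)             # ~6.5 GiB for all six variables
print(6 * per_array * 22 / 30 / 2.0**30)   # ~4.8 GiB for a 22-year run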
File = par_query.h
      integer AX,AY,startyr,endyr,AT
      character pperiod*9,GCM*4
      parameter(AX=162,AY=162)      ! dim of GCM array
      parameter(startyr=1961,endyr=1990,AT=endyr-startyr+1,
     &          pperiod="1961_1990")
      parameter(GCM='ukmo')
File = query.f
      program query
!# A FORTRAN program that reads global climate model (GCM) data to
!# be used in further calculations
!# uses parameter file: par_query.h
!# compile as: gfortran -c -mcmodel=large query.f
!#             gfortran query.o
!# then run:   ./a.out
! Declarations ***************************************************
      implicit none
      include 'par_query.h'         ! parameter file
      integer :: i,j,k,m,n,nn,leapa,leapb,leapc,leapn,rec1,rec2,rec3,
     &           rec4,rec5,rec6
      integer, dimension(12) :: mdays
      real :: ydays,nyears
      real, dimension(AX,AY,31,12,AT) :: tmax_d,tmin_d,rain_d,rhmax_d,
     &                                   rhmin_d,u10_d
      character :: ipath*43,fname1*5,fname2*3,nname*14,yyear*4,
     &             mmonth*2,ext1*4
! Data statements and defining characters ************************
      data mdays/31,28,31,30,31,30,31,31,30,31,30,31/ ! Days in month
      ydays=365.                    ! Days in year
      nyears=real(AT)               ! Analysis period (in years)
      ipath="/run/media/stephan/SS_Elements/CCAM_africa/" ! Path to
                                    ! input data directory
      fname1="ccam_"                ! Folder where data is located #1
      fname2="_b/"                  ! Folder where data is located #2
      nname="ccam_africa_b."        ! Input filename (generic part)
      ext1=".dat"
      leapa=0
      leapb=0
      leapc=0
      leapn=0
! Read daily data from GCM ***************************************
      do n=startyr,endyr            ! Start looping through years ----
        write(yyear,'(i4.4)')n
        nn=n-startyr+1
! Test for leap years
        leapa=mod(n,4)
        leapb=mod(n,100)
        leapc=mod(n,400)
        if (leapa==0) then
          if (leapb==0) then
            if (leapc==0) then
              leapn=1
            else
              leapn=0
            endif
          else
            leapn=1
          endif
        else
          leapn=0
        endif
        if (leapn==1) then
          mdays(2)=29
          ydays=366.
        else
          mdays(2)=28
          ydays=365.
        endif
        do m=1,12                   ! Start looping through months ---
          write(mmonth,'(i2.2)')m
! Reading daily data from file
          print*,"Reading data for ",n,mmonth
          open(101,file=ipath//fname1//GCM//fname2//nname//GCM//"."//
     &         yyear//mmonth//ext1,access='direct',recl=AX*AY*4)
          do k=1,mdays(m)           ! Start looping through days -----
            rec1=(k-1)*6+1
            rec2=(k-1)*6+2
            rec3=(k-1)*6+3
            rec4=(k-1)*6+4
            rec5=(k-1)*6+5
            rec6=(k-1)*6+6
            read(101,rec=rec1)((tmax_d(i,j,k,m,nn),i=1,AX),j=1,AY)
            read(101,rec=rec2)((tmin_d(i,j,k,m,nn),i=1,AX),j=1,AY)
            read(101,rec=rec3)((rain_d(i,j,k,m,nn),i=1,AX),j=1,AY)
            read(101,rec=rec4)((rhmax_d(i,j,k,m,nn),i=1,AX),j=1,AY)
            read(101,rec=rec5)((rhmin_d(i,j,k,m,nn),i=1,AX),j=1,AY)
            read(101,rec=rec6)((u10_d(i,j,k,m,nn),i=1,AX),j=1,AY)
          enddo                     ! k-loop (days) ends --------------
          close(101)
        enddo                       ! m-loop (months) ends ------------
      enddo                         ! n-loop (years) ends -------------
      end program query
We were successful in extracting the data from Twitter, but we couldn't save it to our system using Flume. Can you please explain?
You might have a problem in the channel or the sink; maybe that's why your data is not being stored in HDFS.
Try to understand this one:
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
# note: %m is the month; %M would be the minute
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://yourIP:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
And check with jps whether your DataNode and NameNode are running.
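A minimal sketch of the missing pieces, in case the channel itself is the problem (the source name Twitter is an assumption; use whatever your agent actually defines): the sink above references MemChannel, so the agent also needs that channel declared and wired to the source:
# hypothetical agent wiring; adjust names to your config
TwitterAgent.channels = MemChannel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
TwitterAgent.sources.Twitter.channels = MemChannel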