WS02 BAM - Analytics Framework

WS02 BAM - Analytics Framework - wso2

In Cassandra Cluster EVENT_KS Key Space , I have a bookTicket1 (stream) and it has columns
payload_provider,payload_totalNoTickets. When I tried to a new Analytics script as below ,
CREATE EXTERNAL TABLE IF NOT EXISTS BusTicketTable
(provider STRING, totalNoTickets STRING, version STRING)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
"cassandra.host" = "127.0.0.1" ,
"cassandra.port" = "9160" ,
"cassandra.ks.name" = "EVENT_KS" ,
"cassandra.ks.username" = "admin" ,
"cassandra.ks.password" = "admin" ,
"cassandra.cf.name" = "bookTicket1" ,
"cassandra.columns.mapping" = ":payload_provider,payload_totalNoTickets, Version" );
It returns the error:
ERROR: Error while executing Hive script.Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask "

Consider this line,
"cassandra.columns.mapping" = ":payload_provider,payload_totalNoTickets, Version"
There the key is not set in Cassandra. I am not sure but I think you may have to set the key as well because the row key is mandatory for Cassandra column family.
e.g.:
"cassandra.columns.mapping" = ":key, payload_provider,payload_totalNoTickets, Version"
You may need to set a unique field as the key.

Related

Access Data from Azure Data Lake Store using Polybase with Azure Data Warehouse

I get a error when create external table
https://exoticbaryon.anset.org/2017/06/26/access-data-from-azure-data-lake-store-using-polybase-with-azure-data-warehouse/#comment-157
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'xxxxx'
CREATE DATABASE SCOPED CREDENTIAL ADLUser
WITH IDENTITY = xxxxx#/https://login.microsoftonline.com/xxxxx/oauth2/v2.0/token',
SECRET = xxxxx' ;
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (TYPE = HADOOP,
CREDENTIAL = ADLUser,
LOCATION = N'adl://xxxxx.azuredatalakestore.net'
)
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR =',',
STRING_DELIMITER = '"',
USE_TYPE_DEFAULT = TRUE)
);
CREATE EXTERNAL TABLE [dbo].[xxxxx_external](
[EventMonth] [nvarchar](10) NULL,
[UserCount] [bigint] NULL,
[UserType] [nchar](8) NULL,
[StageType] [bigint] NULL,
[StageName] [nvarchar](9) NULL)
WITH
(
LOCATION=N'/test/xxxxx.csv',
DATA_SOURCE = AzureDataLakeStore ,
FILE_FORMAT = TextFileFormat
) ;
CREATE TABLE [dbo].[xxxxx]
WITH (DISTRIBUTION = HASH([EventMonth] ) )
AS SELECT * FROM
[dbo].[xxxxx_external] ;
When run CREATE EXTERNAL TABLE
Failed to execute query. Error: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: MalformedURLException: no protocol: /https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/v2.0/token'

You have to modify you external Data Source to the similar format
CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
( LOCATION = '<prefix>://<path>[:<port>]'
[, CONNECTION_OPTIONS = '<name_value_pairs>']
[, CREDENTIAL = <credential_name> ]
[, PUSHDOWN = ON | OFF]
[, TYPE = HADOOP | BLOB_STORAGE ]
[, RESOURCE_MANAGER_LOCATION = '<resource_manager>[:<port>]'
)
[;]
You can find more info in the following link :https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-ver15
As you are accessing Azure Data Lake you need to mention your prefix with 'wasbs'
For first time try uploading a single file in your folder container and donot mention any .csv file name and load into external tables.
Later you can mention your specific filename and test your code.

HP ALM - Unable to post the attachment to a defect via python

I get the error you don't have the required permissions to execute this action,
when I try to attach a file to the defect in HP-ALM.
code
bug.post()
attachfact=bug.Attachments;
attachObj = attachfact.AddItem(None)
attachObj.Description = "Failed Records"
attachObj.Filename = "LOGS\\"+test_case.Name+"_LOG.xlsx"
attachObj.Type=1
ExScoraqe - attachObj.AttachmentStorage
ExScoraqe.ClientPath - attachObj.Filename
ExStoraqe.Save(test_case.Name+"_LOG.xlsx",True)
attachObj.Post()
defect_ids = defect_ids + ” "+str(bug.Field("BG_BUG_ID"))
print "Defect ID : "+defect_ids
error

Server Errors While Writing With Python Cassandra Driver

code=1000 [Unavailable exception] message="Cannot achieve consistency level ONE" info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 'ONE'}
code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
I am inserting into Cassandra Cassandra 2.0.13(single node for testing) by python cassandra-driver version 2.6
The following are my keyspace and table definitions:
CREATE KEYSPACE test_keyspace WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '1' };
CREATE TABLE test_table (
key text PRIMARY KEY,
column1 text,
...,
column17 text
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
What I tried:
1) multiprocessing(protocol version set to 1)
each process has its own cluster, session(default_timeout set to 30.0)
def get_cassandra_session():
"""creates cluster and gets the session base on key space"""
# be aware that session cannot be shared between threads/processes
# or it will raise OperationTimedOut Exception
if CLUSTER_HOST2:
cluster = cassandra.cluster.Cluster([CLUSTER_HOST1, CLUSTER_HOST2])
else:
# if only one address is available, we have to use older protocol version
cluster = cassandra.cluster.Cluster([CLUSTER_HOST1], protocol_version=1)
session = cluster.connect(KEY_SPACE)
session.default_timeout = 30.0
return session
2) batch insert (protocol version set to 2 because BatchStatement is enabled on Cassandra 2.X)
def batch_insert(session, batch_queue, batch):
try:
insert_user = session.prepare("INSERT INTO " + db.TABLE + " (" + db.COLUMN1 + "," + db.COLUMN2 + "," + db.COLUMN3 +
"," + db.COLUMN4 + ") VALUES (?,?,?,?)")
while batch_queue.qsize() > 0:
'''batch queue size is 1000'''
row_tuple = batch_queue.get()
batch.add(insert_user, row_tuple)
session.execute(batch)
except Exception as e:
logger.error("batch insert fail.... %s", e)
the above function is invoked by:
batch = BatchStatement(consistency_level=ConsistencyLevel.ONE)
batch_insert(session, batch_queue, batch)
tuples are stored in batch_queue.
3) synchronizing execution
Several days ago I post another question Cassandra update fails , cassandra was complaining about TimeOut issue. I was using synchronize execution for updating.
Can anyone help, is this my code issue or python cassandra-driver issue or Cassandra itself ?
Thanks a million!

If your question is about those errors at the top, those are server-side error responses.
The first says that the coordinator you contacted cannot satisfy the request at CL.ONE, with the nodes it believes are alive. This can happen if all replicas are down (more likely with a low replication factor).
The other two errors are timeouts, where the coordinator didn't get responses from 'live' nodes in a time configured in the cassandra.yaml.
All of these indicate that the cluster you're connected to is not healthy. This could be because it is overwhelmed (high GC pauses), or experiencing network issues. Check the server logs for clues.

I got the following error, which looks very similar:
cassandra.Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level LOCAL_ONE" info={'consistency': 'LOCAL_ONE', 'alive_replicas': 0, 'required_replicas': 1}
When I added a sleep(0.5) in the code, it worked fine. I was trying to write too much too fast...

Windows Script to Parse names of WebSphere JVMs from a command output

I am writing a (batch file or VBScript) to nicely shutdown all the running WebSphere JVMs on a Windows server, but need help with some text handling. I want the script to run and parse the output of the "serverstatus" command to get the names of Application Servers on the box and store the matches (with carriage returns) in a variable for use in the rest of the script.
Sample command output:
C:\WebSphere\AppServer\bin>serverstatus -all
ADMU0116I: Tool information is being logged in file
C:\WebSphere\AppServer\profiles\MySrv01\logs\serverStatus.log
ADMU0128I: Starting tool with the MySrv01 profile
ADMU0503I: Retrieving server status for all servers
ADMU0505I: Servers found in configuration:
ADMU0506I: Server name: MyCluster_MySrv01
ADMU0506I: Server name: MyCluster_MySrv01_1
ADMU0506I: Server name: MyNextCluster_MySrv04
ADMU0506I: Server name: MyNextCluster_MySrv04_1
ADMU0506I: Server name: nodeagent
ADMU0508I: The Application Server "MyCluster_MySrv01" is STARTED
ADMU0508I: The Application Server "MyCluster_MySrv01_1" is STARTED
ADMU0508I: The Application Server "MyNextCluster_MySrv04" is STARTED
ADMU0509I: The Application Server "MyNextCluster_MySrv04_1" cannot be
reached. It appears to be stopped.
ADMU0508I: The Node Agent "nodeagent" is STARTED
*nodeagent should NOT match. The jury is still out on whether I want to target all app servers or just those with a status of "STARTED".

Here's an alternative to using Regex. It simply reads stdout and processes all started app servers - the app servers are stored in an array called AppServers. Tested on W2K3.
Edit: We have added a way to log output to a file by adding a log write function (don't forget to add the const ForAppending at the start of the script that we have just added to this answer). The log write function takes the format of:
Logwrite "some text to write - delete file if exists", "c:\Path\filename.txt", 1
Logwrite "some text to write - append to file, don't delete", "c:\path\filename.txt", 0
It is a crude function, but does what you ask. I hope that helps. :)
option explicit
Const ForAppending = 8
Dim objShell, objWshScriptExec, objStdOut
Dim objCmdString, strLine, appServers(), maxAppServers
Dim x
' File Path / Location to serverstatus.bat ----
objCmdString = "C:\WebSphere\AppServer\bin\serverstatus.bat -all"
Set objShell = CreateObject("WScript.Shell")
Set objWshScriptExec = objShell.Exec(objCmdString)
Set objStdOut = objWshScriptExec.StdOut
MaxAppServers = -1
' While we're looping through the response from the serverstatus command, look for started application servers
' and store them in an ever expanding array AppServers.
' The Variable MaxAppServers should always contain the highest number of AppServers (ie: ubound(AppServers))
While Not objStdOut.AtEndOfStream
strLine = objStdOut.ReadLine
If InStr(LCase(strLine), "admu0508i: the application server """) Then
MaxAppServers = MaxAppServers + 1
ReDim Preserve AppServers(MaxAppServers)
AppServers(MaxAppServers) = wedge(strLine, Chr(34))
End If
Wend
If MaxAppServers => 0 then
For x = 0 To ubound(AppServers) ' You could just use For x = 1 to MaxAppServers in this case.
' Add your instructions here.........
' ... We are simply echoing out the AppServer name below as an example to a log file as requested below.
Logwrite AppServers(x), "c:\Output.log", 0
Next
End If
Function Wedge(wStr, wOpr)
' This clunky function simply grabs a section of a string the is encapsulated by wOpr.
' NOTE: This function expects wOpr to be a single character (eg; for our purpose, it is pulling data between double quotes).
Dim wFlag, wCount, wFinish
wflag = False
wFinish = False
wCount = 1
Wedge = ""
Do Until wCount > Len(wStr) Or wFinish
If Mid(wStr, wCount, 1) = wOpr Then
If wFlag Then
wFinish = True
Else
wFlag = True
End If
Else
If wFlag Then Wedge = Wedge & Mid(wStr, wCount, 1)
End If
wCount = wCount + 1
Loop
End Function
Function logwrite (lstrtxt, lwLogfile, lwflag)
Dim lwObjFSO, lwObjFile, fstr, lwcounter, lwc
fstr = lstrtxt
Set lwObjFSO = CreateObject("Scripting.FileSystemObject")
If lwflag=1 And lwObjFSO.FileExists(lwLogFile) Then lwObjfso.deletefile(lwLogFile)
If lwObjFSO.FileExists(lwLogFile) then
On Error Resume next
Set lwObjFile = lwObjFSO.OpenTextFile(lwLOgFile, ForAppending)
lwCounter = 20000
Do While Err.number = 70 And lwCounter > 0
wscript.echo "ERROR: Retrying output - Permission denied; File may be in use!"
For lwc = 1 To 1000000
Next
Err.clear
Set lwObjFile = lwObjFSO.OpenTextFile(lwLogFile, ForAppending)
lwCounter = lwCounter-1
Loop
If Err.number <> 0 Then
wscript.echo "Error Number: "&Err.number
wscript.quit
End If
On Error goto 0
Else
Set lwObjFile = lwObjFSO.CreateTextFile(lwLogFile)
End If
wscript.echo (fstr)
lwObjFile.Write (fstr) & vbcrlf
lwObjFile.Close
Set lwObjFSO=Nothing
Set lwObjfile=Nothing
End Function

Use a RegExp that cuts quoted names from your input; add context - Server, Started - to fine tune the result set. In code:
Option Explicit
Function q(s) : q = "'" & s & "'" : End Function
Dim sInp : sInp = Join(Array( _
"ADMU0116I: Tool information is being logged in file C:\WebSphere\AppServer\profiles\MySrv01\logs\serverStatus.log" _
, "ADMU0128I: Starting tool with the MySrv01 profile" _
, "ADMU0503I: Retrieving server status for all servers" _
, "ADMU0505I: Servers found in configuration:" _
, "ADMU0506I: Server name: MyCluster_MySrv01" _
, "ADMU0506I: Server name: MyCluster_MySrv01_1" _
, "ADMU0506I: Server name: MyNextCluster_MySrv04" _
, "ADMU0506I: Server name: MyNextCluster_MySrv04_1" _
, "ADMU0506I: Server name: nodeagent" _
, "ADMU0508I: The Application Server ""MyCluster_MySrv01"" is STARTED" _
, "ADMU0508I: The Application Server ""MyCluster_MySrv01_1"" is STARTED" _
, "ADMU0508I: The Application Server ""MyNextCluster_MySrv04"" is STARTED" _
, "ADMU0509I: The Application Server ""MyNextCluster_MySrv04_1"" cannot be reached. It appears to be stopped." _
, "ADMU0508I: The Node Agent ""nodeagent"" is STARTED" _
), vbCrLf)
Dim aRes : aRes = Array( _
Array("all quoted names", """([^""]+)""") _
, Array("all quoted started servers", "Server ""([^""]+)"" is STARTED") _
)
Dim aRE
For Each aRe In aRes
WScript.Echo "----------------", q(aRe(0)), q(aRe(1))
Dim re : Set re = New RegExp
re.Global = True
re.Pattern = aRe(1)
Dim oMTS : Set oMTS = re.Execute(sInp)
ReDim a(oMTS.Count - 1)
Dim i
For i = 0 To UBound(a)
a(i) = q(oMTS(i).SubMatches(0))
Next
WScript.Echo " =>", Join(a)
Next
output:
cscript 20984738.vbs
---------------- 'all quoted names' '"([^"]+)"'
=> 'MyCluster_MySrv01' 'MyCluster_MySrv01_1' 'MyNextCluster_MySrv04' 'MyNextCluster_MySrv04_1' 'nodeagent'
---------------- 'all quoted started servers' 'Server "([^"]+)" is STARTED'
=> 'MyCluster_MySrv01' 'MyCluster_MySrv01_1' 'MyNextCluster_MySrv04'

HiveException (ClassCastException) using "group by" for aggregation in wso2 bam

...
using wso2bam (v. 2.3.0), the following query fails to execute :
insert overwrite table tab_summarized_parse_sessions
select parseId, count(*) from tab_session_info group by parseId;
log output:
TID: [0] [BAM] [2013-09-20 12:38:49,783] FATAL {ExecMapper} - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"mid":"1379599795585::192.168.2.17::9443::1","sessionid":null,"parseid":40}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:211)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.CassandraLazyLong cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyLong
...
TID: [0] [BAM] [2013-09-20 12:38:50,606] ERROR {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl} - Error while executing Hive script.
Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl}
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:355)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:250)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
script execution result:
ERROR: Error while executing Hive script.Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
seems that there's a problem using the "group by" section of the query; BTW, here's more info about the Hive tables, etc ...
CREATE EXTERNAL TABLE IF NOT EXISTS tab_session_info
(
mid STRING, sessionId STRING, parseId BIGINT
)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
"cassandra.host" = "127.0.0.1" ,
"cassandra.port" = "9160" ,
"cassandra.ks.name" = "EVENT_KS" ,
"cassandra.ks.username" = "admin" ,
"cassandra.ks.password" = "admin" ,
"cassandra.cf.name" = "session_main" ,
"cassandra.columns.mapping" = ":key,payload_sessionId, payload_parseId" );
CREATE EXTERNAL TABLE IF NOT EXISTS tab_summarized_parse_sessions (parseId BIGINT, sessionCount INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver' ,
'mapred.jdbc.url' = 'jdbc:mysql://localhost/MYSQL_DB_NAME' ,
'mapred.jdbc.username' = 'MYSQL_USER' ,
'mapred.jdbc.password' = 'MYSQL_PASS' ,
'hive.jdbc.update.on.duplicate' = 'true' ,
'hive.jdbc.primary.key.fields' = 'parseId' ,
'hive.jdbc.table.create.query' = 'CREATE TABLE IF NOT EXISTS summarized_parse_sessions
(parseId BIGINT NOT NULL PRIMARY KEY, sessionCount INT )');
thanks in advance :)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

WS02 BAM - Analytics Framework - wso2

Related

Access Data from Azure Data Lake Store using Polybase with Azure Data Warehouse

HP ALM - Unable to post the attachment to a defect via python

Server Errors While Writing With Python Cassandra Driver

Windows Script to Parse names of WebSphere JVMs from a command output

HiveException (ClassCastException) using "group by" for aggregation in wso2 bam

Categories

Resources