\x0D\x0A character is appended in hbase table data - mapreduce

I am creating a row key in HBase using a combination of three columns.
It is inserted successfully, and I can see my data in the log as well.
The data in the log is correct, without any junk characters.
But when I scan my table, I can see that \x0D\x0A is appended in the row key:
123|\x0D\x0A4295856150|404 column=cf:SegmentSequence, timestamp=1476249090712, value=2
123|\x0D\x0A4295856150|404 column=cf:SegmentSequence.segmentId, timestamp=1476249090712, value=15
123|\x0D\x0A4295856150|405 column=cf:FFAction, timestamp=1476249090712, value=I
This is how I form the row key:
String strKey = strFileName + "|" + strOrgId + "|" + strLineItemId;
Put put = new Put(Bytes.toBytes(strKey));
Also, this character is appended before strOrgId, i.e. right after strFileName.

It is clear from the data sample that strOrgId starts with a newline character (\r\n, whose byte representation is \x0D\x0A). You need to trim each string that may contain a newline before concatenating, or replace all newline characters with an empty string. Below is the trim for strOrgId only, as per the data sample:
String strKey = strFileName + "|" + strOrgId.trim() + "|" + strLineItemId;
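Since any of the three components could carry a stray newline, the same cleanup can be applied to every part. A minimal sketch of the idea in Python (the helper name and field order are illustrative, mirroring the question):

```python
def build_row_key(file_name, org_id, line_item_id):
    # Strip carriage returns and line feeds (\x0D\x0A) from every part,
    # not just strOrgId, before joining with "|".
    parts = (file_name, org_id, line_item_id)
    return "|".join(p.replace("\r", "").replace("\n", "") for p in parts)

build_row_key("123", "\r\n4295856150", "404")  # "123|4295856150|404"
```

In the Java code above, the equivalent per-component cleanup would be strOrgId.replaceAll("[\\r\\n]", "").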


Merge multiple lines into one line using Informatica

I have a .txt file that contains multiple lines separated by ~.
The input below is just an example - the actual file will have many lines which will vary every time.
abcdefgh~
asdfghjkliuy~
qwertyuiopasdfgh~
..........
Every line ends with ~, and I would like to merge all the lines into one.
Desired output:
abcdefgh~asdfghjkliuy~qwertyuiopasdfgh~..................................
How can I merge all the lines into one line using Informatica and write the result to a .txt file?
This is a concatenate-multiple-rows-into-one-column problem. If you have a key on which to group the concatenation, it makes life easier; otherwise the concatenated string will be very long.
Here are the steps.
Sort the data by the key. If you do not have one, skip this step.
Create an Expression transformation and create the ports below:
in_key
in_data
v_data = IIF( prev_key <> in_key,in_data, v_data || in_data)
prev_data = in_data
prev_key = in_key
out_key = in_key
out_data = v_data
If you do not have a key, use:
in_data
v_data =v_data || in_data
prev_data = in_data
out_data = v_data
Link out_key and out_data to the next transformation, an Aggregator. Please note that the out_data and v_data ports should be of data type string, with enough precision to handle the large concatenated string.
Attach an Aggregator after this Expression. Group by the key if you have one. Create one output port like below, where data is the port linked in from the expression's out_data:
out_data = MAX(data)
Link this field to the target.
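As a sanity check of the running-concat logic, here is a hypothetical Python sketch; the function and variable names are illustrative and do not map one-to-one onto Informatica ports:

```python
def concat_by_key(rows):
    # rows: (key, data) pairs, assumed sorted by key.
    # Mirrors the Expression's running v_data concat plus the
    # Aggregator's one-row-per-key step.
    result = {}
    for key, data in rows:
        result[key] = result.get(key, "") + data
    return result

rows = [(1, "abcdefgh~"), (1, "asdfghjkliuy~"), (1, "qwertyuiopasdfgh~")]
concat_by_key(rows)  # {1: "abcdefgh~asdfghjkliuy~qwertyuiopasdfgh~"}
```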

Regex searching rows in CSV for characters getting hung up on first match

I am new to scripting, so my code may be a bit mangled; I apologize in advance.
I am trying to iterate through a CSV and write it to an excel workbook using openpyxl. But before I write it, I am performing a few checks to determine which sheet to write the row to.
The row has content such as: "KB4462941", "kb/9191919", "kb -919", "sdfklKB91919".
I am trying to pull the first numbers following "KB", then stop reading characters once a non-numeric character is found. Once I find the number, I run a separate function that queries a DB. That function works.
The problem I am running into is that once it finds the first KB, KB4462941, it gets hung up and goes over that KB multiple times until the last time it appears in that row, then the program finishes.
Unfortunately, there is no default location for where the KB characters will be in the row, and there is no default character count between the KB and the first numbers.
My code:
with open('test.csv') as file:
    reader = csv.reader(file, delimiter = ',')
    for row in reader:
        if str(row).find("SSL") != -1:
            ws = book.get_sheet_by_name('SSL')
            ws.append(row)
        else:
            mylist = list(row)
            string = ''.join(mylist)
            tmplist = list()
            resultlist = list()
            pattern = 'KB.*[0-9]*'
            for i in mylist:
                tmplist += re.findall(pattern, i, re.IGNORECASE)
            for i in tmplist:
                resultlist += re.findall('[0-9]*', i)
            for i in resultlist:
                if len(i) > 4:
                    print i
                    if dbFunction(i) == 1:
                        ws = book.get_sheet_by_name('Found')
                        ws.append(row)
                    else:
                        ws = book.get_sheet_by_name('Nothing')
                        ws.append(row)
Output:
The 1st row is skipped.
The 2nd row is in the right place.
The 3rd and 4th rows are in the right place.
The 5th row is written for the next nine rows.
It never gets to the final 3 rows.
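For reference, the intended extraction (the first run of digits after "KB", ignoring case, skipping any non-digit characters in between) can be expressed with a single non-greedy pattern. This is a hedged sketch, not the poster's code; the helper name is made up:

```python
import re

KB_PATTERN = re.compile(r'KB\D*?(\d+)', re.IGNORECASE)

def first_kb_number(text):
    # Return the first digit run that follows "KB", or None if absent.
    match = KB_PATTERN.search(text)
    return match.group(1) if match else None

first_kb_number("kb -919")  # "919"
```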

Ruby - separating excel data contained in one column into individual columns [duplicate]

I'm trying to use Ruby to manipulate some Excel data, but the .csv files I'm given have all of the data in one column.
The data has the headers and values separated by commas, but they are contained within the first column. Also, some of the values within the first column have text surrounded by quotes, with commas inside the quotes.
Is there a way to separate the data within the first column into separate columns with Ruby?
I know you can do this in Excel, but I'd like to do it in Ruby so I don't have to correct every .csv file manually.
I've included an example of the .csv file below.
The desired output would be:
{:header 1 => integer,
 :header 2 => text,
 :header 3 => "this text, has a comma within the quote",
 :header 4 => integer}
I appreciate the help.
Here's one crude way to do it:
require 'csv'

result = []
csv = CSV.read('./file.csv')
headers = csv.shift
csv.each do |l|
  hash = {}
  hash[headers[0]] = l[0]
  hash[headers[1]] = l[1]
  hash[headers[2]] = l[2]
  hash[headers[3]] = l[3]
  result << hash
end
p result
[{"header 1"=>"integer",
"header 2"=>"text",
"header 3"=>"this text, has a comma within the quote",
"header 4"=>"integer"},
{"header 1"=>"integer",
"header 2"=>"text",
"header 3"=>"this text, has a comma within the quote",
"header 4"=>"integer"}]
This of course assumes that every row has 4 values.
Edit: Here is an example of actually writing the result to a file:
CSV.open('./output.csv', 'wb') do |csv|
  result.each do |hash|
    temp = []
    hash.each do |key, value|
      temp << "#{key} => #{value}"
    end
    csv << temp
  end
end
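The part doing the heavy lifting here is the CSV parser itself: it keeps a quoted comma inside a single field. A quick illustration of that behavior in Python (the sample data is made up to mirror the question):

```python
import csv
import io

# Hypothetical sample: four headers, one quoted field containing a comma.
data = ('header 1,header 2,header 3,header 4\n'
        '1,text,"this text, has a comma within the quote",2\n')
rows = list(csv.reader(io.StringIO(data)))
# rows[1] has exactly 4 fields; the quoted comma did not split the field.
```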

Having an issue with inserting an entry into a table using VB.net/Django/PostgreSQL

I am trying to insert an entry into a table, but the server says this:
column "Date" is of type date but expression is of type integer at character 131
This is the SQL statement. I can also show the VB.net, but it is a horrid mess.
INSERT INTO "Inventory_chemicalrecord"("Barcode","Action","Name_id","Building","Qty","Date") VALUES ('IEN0001','ADD',1,'Marcus',1,2013-07-10);
Here is the string that I am passing:
mySQLString = "INSERT INTO "&Chr(34)&"Inventory_chemicalrecord"&Chr(34)&"("&Chr(34)&"Barcode"&Chr(34)& ","&Chr(34)&"Action"&Chr(34)& ","&Chr(34)&"Name_id"&Chr(34)& ","&Chr(34)&"Building"&Chr(34)& "," &Chr(34)&"Qty"&Chr(34)& ","&Chr(34)&"Date"&Chr(34)& ") VALUES ("& code & "," &Chr(39)& Action &Chr(39) & "," & Name_id & "," & Building & ","& OriginalQty & "," & CurDate & ");"
Sorry, this is the only way I have found to do this; if this is the wrong way, please inform me.
I have tried
Chr(39)&CurDate&Chr(39)
"'"&CurDate&"'"
and even set
CurDate = Chr(39)&CurDate&CurDate(39)
I keep getting "EOF expected" and "Type & does not match String".
Is there a better way to do this?
The error message is very clear: the date needs to be wrapped in quotes, as '2013-07-10'. Unquoted, 2013-07-10 is parsed as integer arithmetic (2013 minus 07 minus 10), which is why the server reports that the expression is of type integer.
INSERT INTO "Inventory_chemicalrecord"("Barcode","Action","Name_id","Building","Qty","Date") VALUES ('IEN0001','ADD',1,'Marcus',1,2013-07-10);
should be
INSERT INTO "Inventory_chemicalrecord"("Barcode","Action","Name_id","Building","Qty","Date") VALUES ('IEN0001','ADD',1,'Marcus',1,'2013-07-10');
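To see why the unquoted form fails, note that 2013-07-10 without quotes is just integer subtraction. A quick check, in Python purely for illustration:

```python
unquoted = 2013 - 7 - 10   # what the server evaluates: the integer 1996
quoted = "'2013-07-10'"    # a quoted literal PostgreSQL can cast to a date
```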

Sanitize MS Access query using Regex

I want to sanitize (escape the special characters in) an MS Access query having these fields:
('''2', 'Last Motion', '', 'DooMotion Plugin', 1, '-', True, #12/30/2012 07:55:00#, #12/30/2012 07:55:00#, #01/1/2001 00:00:00#)
The special characters and how they are escaped are listed here: http://support.microsoft.com/kb/826763/en-us.
In short, the special characters are: ? # " ' % ~ ; [ ] { } ( ) and they can be escaped by putting them into brackets [].
My question is whether it is possible to sanitize a whole query using regex in one pass. If so, please show an example.
If it is not possible, I could break it down to the field level and sanitize each field separately. How would I do that with regex?
Regards,
Joost.
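The mechanical substitution described above (wrap each special character in brackets) can be sketched with a single regex pass. This is an illustration of the idea only, not a guarantee that bracket-escaping is sufficient in every query context; the parameterized approach covered later is the safer route:

```python
import re

def escape_access(value):
    # Wrap each special character listed in the question in [ ].
    return re.sub(r'([?#"\'%~;\[\]{}()])', r'[\1]', value)

escape_access("50% (approx)")  # "50[%] [(]approx[)]"
```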
To follow Anton's advice, how do I use query parameters in .NET? This is the code I am currently using:
objConn = CreateObject("ADODB.Connection")
objCatalog = CreateObject("ADOX.Catalog")
objConn.Open("Provider=Microsoft.Jet.OLEDB.4.0; Data Source=" & strDatabase)
objCatalog.activeConnection = objConn
sql = "INSERT INTO Devices (Code, Name) VALUES ('" & fCode & "','" & fName & "')"
objConn.execute(sql)
For those who want to know the solution for using parameters in .NET, here is the code:
Dim queryString As String = "INSERT INTO Devices (name, version) VALUES (#Name, #Version)"

' Open database connection.
Using connection As New OleDb.OleDbConnection(connectionString)
    Dim command As New OleDb.OleDbCommand(queryString)
    ' Strongly typed.
    command.Parameters.Add("#Name", OleDb.OleDbType.VarChar, 255).Value = Me.TextBox1.Text
    command.Parameters.Add("#Version", OleDb.OleDbType.Integer).Value = Me.TextBox2.Text
    command.Connection = connection ' Set the Connection to the new OleDbConnection

    ' Open the connection and execute the insert command.
    Try
        connection.Open()
        command.ExecuteNonQuery()
    Catch ex As Exception
        Console.WriteLine(ex.Message)
    End Try
End Using ' The connection is automatically closed when the code exits the Using block
Again, thanks Anton for pointing me to the correct solution.