When I try to load file.txt into Pig, I get the following error:
pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[\|-\|]'
A sample line from the file is:
text|-|text|-|text
I am using the following command:
bag = LOAD 'file.txt' USING PigStorage('\\|-\\|') AS (v1:chararray, v2:chararray, v3:chararray);
Is it the delimiter? My regex?
If you don't want to write a custom LOAD function: PigStorage generally expects a single-character delimiter (which is what the instantiation error points at), so you could load your records using '-' as the delimiter and then add another step that strips the remaining '|' characters from each field.
bag = LOAD 'file.txt' USING PigStorage('-') AS (v1:chararray, v2:chararray, v3:chararray);
bag_new = FOREACH bag GENERATE
    REPLACE(v1, '|', '') AS v1_new,
    REPLACE(v2, '|', '') AS v2_new,
    REPLACE(v3, '|', '') AS v3_new;
I am reading a file that contains JSON data interspersed with other text. I want to keep only the lines that start with a given prefix; how can I check that condition while reading the file?
import json

with open("inputfile.txt") as f:
    content = f.read().replace('}U', '},').replace(':[', ':').replace(']', '')
    content = content[::-1].replace(',', '', 1)[::-1]  # drop the last trailing comma
    content = '[{}]'.format(content)
    data = json.loads(content)
I want to check whether each line starts with a condition like this:
startswith("{"+"\"M\""+":")
I have tried reading line by line and checking whether the line starts with the condition, but for large files it is taking too long.
inputfile.txt
sometext
{"M":{"1":"data","2":"data2"}}U
asdklaasd
{"M":{"3":"555","5":"3333"}}U
I want to read only the lines that start with {"M":. The output I need looks like this:
[{"M":{"1":"data","2":"data2"}},{"M":{"3":"555","5":"3333"}}]
I am using Python 2.7.
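A line-by-line filter avoids the reversed-string gymnastics and works the same on Python 2.7 and 3: keep only lines starting with the {"M": prefix, strip the trailing U marker, and parse each record individually. This is a sketch; collect_m_records is an illustrative name, not something from the question.

```python
import json

M_PREFIX = '{"M":'

def collect_m_records(path):
    """Return parsed records from lines that start with {"M":."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(M_PREFIX):
                # Drop the trailing 'U' marker so the line is valid JSON
                records.append(json.loads(line.rstrip('U')))
    return records
```

json.dumps(collect_m_records('inputfile.txt')) then produces the [{"M":...},{"M":...}] array shown above. Streaming line by line keeps memory flat, and a startswith check per line is cheap even on large files.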
I want to create a script that scans a text file for a specific keyword phrase (like "want to test") and writes or replaces the string that follows it (b3). My script:
#!/usr/bin/python
import re
from os.path import abspath, exists

open_file = abspath("zzwrite.txt")
if exists(open_file):
    with open(open_file, "r+") as write1:
        for line in write1:
            matching = re.match(r'.* want to test (..)', line, re.I)
            if matching:
                print("Done matching")
                write1.write("Success")
                print >> write1, "HALELUJAH"
My input text file:
I just want to read 432
I just want to write 213
I just want to test b3 experiment
I just want to sleep for 4 hours
The matching itself works, since "Done matching" is printed, which shows the last 'if' branch executes, but no string is written or replaced inside the text file. The string "b3" is different in every input file, so I would rather not use the str.replace("b3", "xx") approach. Is there anything I am missing in the script?
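For what it's worth, the usual cause here is that writing to the same "r+" handle while iterating it does not edit lines in place; the write lands at the handle's current position (typically the end) rather than over the matched line. One common pattern is to read the whole file, substitute, and write it back. A sketch, assuming the goal is to swap whatever token follows "want to test" (the function name and replacement value are illustrative):

```python
import re

def replace_test_token(path, replacement):
    """Replace the token following 'want to test' with `replacement`."""
    with open(path) as f:
        text = f.read()
    # Capture the phrase, substitute only the token that follows it
    new_text = re.sub(r'(want to test )(\S+)', r'\g<1>' + replacement, text, flags=re.I)
    with open(path, 'w') as f:
        f.write(new_text)
```

replace_test_token('zzwrite.txt', 'xx') would then rewrite 'b3' (or whatever token happens to be there) without hard-coding it.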
I am trying to copy a text file from S3 to Redshift using the command below, but I get the following error:
Error:
Missing newline: Unexpected character 0xffffffe2 found at location 177
copy table from 's3://abc_def/txt_006'
credentials '1234567890'
DELIMITER '|'
NULL AS 'NULL'
NULL AS '' ;
The text file has No header and field delimiter is |.
I tried passing the ACCEPTINVCHARS parameter, but Redshift shows the same error:
1216 error code: invalid input line.
Can anyone provide how to resolve this issue?
Thanks in advance.
Is your file in UTF-8 format? If not, convert it and try reloading.
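0xE2 is the lead byte of a multi-byte UTF-8 sequence (curly quotes, en dashes, and similar), so a character outside the expected encoding is a likely culprit. A minimal conversion sketch; the file name and the LATIN1 source encoding here are assumptions, so inspect the real file's encoding first:

```shell
# Create a sample file containing a Latin-1 byte (octal 351 = é);
# in practice you would start from your real exported file instead
printf 'col1|caf\351|col3\n' > txt_006

# Heuristically inspect the current encoding
file txt_006 || true  # skip if 'file' is unavailable

# Convert to UTF-8 (the LATIN1 source encoding is an assumption)
iconv -f LATIN1 -t UTF-8 txt_006 > txt_006_utf8
```

Upload the converted file to S3 and run the COPY again.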
I am assuming the path to the text file is correct, and that you generated the text file with some tool and uploaded it to S3 manually.
I faced the same issue, and in my case it was caused by whitespace. I recommend generating the text file after nulling empty fields and trimming whitespace.
Your query should be SELECT RTRIM(LTRIM(NULLIF({columnname}, ''))), ... FROM {table}; generate the output of this query into the text file.
If you are using SQL Server, export the table using BCP.exe, passing the above query with all the columns and functions.
Then use the copy command below after uploading the txt file to S3:
copy {table}
from 's3://{path}.txt'
access_key_id '{value}'
secret_access_key '{value}' -- alternatively, use 'credentials' as mentioned above
delimiter '|'
COMPUPDATE ON
removequotes
acceptinvchars
emptyasnull
trimblanks
BLANKSASNULL
FILLRECORD
;
commit;
This solved my problem. Please let us know if you are facing anything else.
Hi all!
I am trying to solve the following issue using PowerShell.
Basically, I have setup a file with the needed properties. Let's call it "FileA.xlsx".
I have a text file which contains a list of names, i.e:
FileB.xlsx
DumpA.xlsx
EditC.xlsx
What I am trying to do is duplicate "FileA.xlsx" several times, using all the names from the text file, so in the end I should end up with 4 files (all of them copies of "FileA.xlsx"):
FileA.xlsx
FileB.xlsx
DumpA.xlsx
EditC.xlsx
Assuming that you have a file called files.txt with the following content:
bob2.txt
bob3.txt
bob4.txt
Then the following will do what you want:
$sourceFile = "FileA.xlsx"
gc .\files.txt | % {
Copy-Item $sourceFile $_
}
This will create copies of FileA.xlsx named bob2.txt, bob3.txt and bob4.txt.
I think that I found a bug in Weka 3.7. When I try to load a CSV file using weka.core.converters.CSVLoader with separator ";", I get the following error:
Exception in thread "main" java.io.IOException: number expected, read Token[1;2], line 1
at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:294)
at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:656)
at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:477)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:445)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:430)
at weka.core.converters.ArffLoader$ArffReader.&lt;init&gt;(ArffLoader.java:202)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:803)
at de.tuhh.thesis.repower.pcanalysis.BinningWindSpeed.from_CSV_to_ARFF(BinningWindSpeed.java:99)
at de.tuhh.thesis.repower.pcanalysis.Main.main(Main.java:49)
My csv file is:
a;b
1;2
My code is:
CSVLoader loader = new CSVLoader();
File inputFile = new File(csvFileName);
loader.setSource(inputFile);
loader.setFieldSeparator(";");
data = loader.getDataSet();
If I try the same code but change ";" to "," and use the following file, the program succeeds:
a,b
1,2
I really need to work with ";"
Thanks and regards
There is (at least by now) an option to set the field separator:
CSVLoader loader = new CSVLoader();
loader.setFieldSeparator(";");
Just in case someone else stumbles upon this question..