Azure Data Lake - Conditions - if-statement

I would like to use an if-else statement to decide which location to export data to.
My case is:
I extract several files from Azure Blob Storage (it's possible that there are no files!).
I calculate the count of records in the file set.
If the count of records is > 20, I export the files to a specific report location.
If there are no records in the file set, I have to output a dummy empty file to a different location, because I don't want to replace the existing report with an empty one.
The solution may be an IF..ELSE condition. The problem is that when I calculate the count of records I get a rowset variable, and I cannot compare it with a scalar value.
@RECORDS =
    SELECT COUNT(id) AS IdsCount
    FROM @final;
IF @RECORDS <= 20 THEN //generate dummy empty file
    OUTPUT @final_result
    TO @EMPTY_OUTPUT_FILE
    USING Outputters.Text(delimiter : '\t', quoting : true, encoding : Encoding.UTF8, outputHeader : true, dateTimeFormat : "s", nullEscape : "NULL");
ELSE
    OUTPUT @final_result
    TO @OUTPUT_FILE
    USING Outputters.Text(delimiter : '\t', quoting : true, encoding : Encoding.UTF8, outputHeader : true, dateTimeFormat : "s", nullEscape : "NULL");
END;

U-SQL's IF statement is currently evaluated only at compile time, so its condition cannot depend on a rowset that is computed while the job runs. You can do something like
IF FILE.EXISTS() THEN
But if you want to output different files depending on the number of records, you have to do it at the SDK/CLI level:
The first job writes a temporary output file (and maybe a status file that contains the number of rows). Then you check (for example in PowerShell) whether the file is empty (or whatever criterion you want to use). If it is not, you copy the result over; otherwise you create the empty output file.
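The post-job check described above could be sketched in Python rather than PowerShell. This is only an illustration under assumptions: the job's temporary output is assumed to be already downloaded to a local path, every record is assumed to occupy exactly one line, and the function name publish_report, its parameters and the 20-row threshold (taken from the question) are hypothetical.

```python
import shutil
from pathlib import Path


def publish_report(temp_file: Path, report_file: Path, empty_file: Path,
                   min_rows: int = 20, has_header: bool = True) -> Path:
    """Publish temp_file as the report only if it holds more than min_rows data rows.

    Otherwise an empty placeholder is written to empty_file and the existing
    report is left untouched. Returns the path that was written.
    """
    rows = 0
    if temp_file.exists():
        # Assumption: one record per line (no line breaks inside quoted fields).
        with temp_file.open(encoding="utf-8") as fh:
            rows = sum(1 for _ in fh)
        if has_header and rows > 0:
            rows -= 1  # do not count the header line as a record

    if rows > min_rows:
        shutil.copyfile(temp_file, report_file)
        return report_file

    empty_file.write_text("", encoding="utf-8")
    return empty_file
```

Against Azure storage the copy and the empty-file creation would go through the SDK or CLI instead of the local file system; only the decision logic is shown here.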

Related

PowerBI M: How to code in an exclusion to an IF statement

I'm working to programmatically clean up a field in my dataset by using a Helper column that I will later filter on to remove the 'junk' records. The junk records are IDs, and the valid records are full names (in the format of "Tom Jones"). Junk records do not contain a space, and valid names do, with one exception: the valid name value "University". The pseudo code would read
Set Helper_IsName? = True
WHERE ValueField CONTAINS " " unless ValueField = "University"
ELSE False
Here is the M code excerpt that is getting me 95% of the way there:
Helper_IsName? = Text.Contains([OldValue]," ")
All results are good, except when the value is "University": the formula sets the helper to FALSE, when I need it to equal TRUE.

I think you can just add that condition with an or:
Helper_IsName? = Text.Contains([OldValue]," ") or [OldValue] = "University"

Azure Data Warehouse PolyBase File format

We have a file that looks like this:
Col1,Col2,Col3,Col4,Col5
"Hello,",I,",am",some,data!
It therefore has the following 'properties':
Comma-separated
Double-quote column delimiter
Commas in some of the columns
Now, I am not sure if it's actually possible to ingest this with PolyBase, but I wondered if there is a way.
The error we are seeing at present is "Could not find a delimiter after quote", which I guess is because after the double quote it is hitting what it takes to be a delimiter.
Here is our current file format, for completeness:
CREATE EXTERNAL FILE FORMAT Comma
WITH (FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS(
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"'
    )
)

Specify it in hex instead.
STRING_DELIMITER = '0x22'
(Based on the problem that someone described at the end of https://msdn.microsoft.com/en-au/library/dn935026.aspx )

Sorted this out in the end by adding an intermediary step to convert the file from CSV to ORC format.
It's a bit clunky (as it leaves a mess of a copy behind), but PolyBase then does work with this file format:
CREATE EXTERNAL FILE FORMAT Orc
WITH (FORMAT_TYPE = ORC)
This works for now, until it is addressed by the product team: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/10600132-polybase-allow-field-row-terminators-within-strin

How to import files from name automatically in python

To keep it simple, let's assume that I have 10 files named "1", "2", ..., "10".
Right now I want to load those 10 files in a script, one at a time.
I am using the code below, which is written ten times in a row, with the mathematical operations I want to apply to the data in those files in between:
Tk().withdraw()
filename2 = askopenfilename()
with load(filename2) as data:
    ..."mathematical operations"...
Tk().withdraw()
filename3 = askopenfilename()
with load(filename3) as data:
etc, etc ...
This opens 10 dialog boxes, one after another, where I need to type the name of the file to load it (so I type "1", hit enter, then type "2" in the next box, hit enter, and so on).
I am looking for a way to open only one dialog box (or maybe you know something even smarter), and type the numbers once, in the right order, so that the script loads the files one at a time by itself.
In other words, I will soon have 300 files, and I just want to type 1,2,3,4,5,...,300 once and hit enter, rather than doing what I described earlier.
Or maybe there is a way to just type "300" so that the script knows it has to look for files starting at "1" and incrementing one by one.

The open function just takes a string, and you can create that string any way you want. You can concatenate the static parts of your filename with a changing number in a for loop:
s_pre = 'file'
s_ext = '.txt'
numFiles = int(raw_input("number of files: "))  # Python 2; use input() on Python 3
for i in range(1, numFiles + 1):
    filename = s_pre + str(i) + s_ext
    with open(filename) as data:
        ## input stuff
        ## math stuff
        pass  # placeholder so the block is valid until real code is added
I assume load is your own function; you can pass this filename to load inside the loop in the same way.
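The same loop can be wrapped in a small helper for Python 3. This is a minimal sketch, not the asker's code: process_numbered_files and its parameters are hypothetical names, the files are assumed to be named "1", "2", ... (optionally with a prefix and suffix), and open stands in for the asker's load.

```python
from pathlib import Path
from typing import Callable, List, TextIO, TypeVar

T = TypeVar("T")


def process_numbered_files(count: int,
                           handler: Callable[[TextIO], T],
                           directory: str = ".",
                           prefix: str = "",
                           suffix: str = "") -> List[T]:
    """Open files prefix+1+suffix ... prefix+count+suffix in order.

    handler is called once per opened file; its results are collected
    in the same order as the files.
    """
    results = []
    for i in range(1, count + 1):
        path = Path(directory) / f"{prefix}{i}{suffix}"
        with path.open() as data:
            results.append(handler(data))
    return results
```

With this, typing "300" once is enough: read the count with int(input("number of files: ")) and pass it in together with a function that performs the mathematical operations on one opened file.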

Format mask for number field items: trailing and 'leading' zero

I'm having some trouble displaying numbers in APEX, but only when I fill them in through code. When numbers are fetched through an automated row fetch, they're fine!
Leading Zero
For example, I have a report where a user can click a link, which runs a JavaScript function. There I get detailed values for that record through an application process. The returned values are in JSON. Several fields are number fields.
My response looks as follows (for example):
{"AVAILABLE_STOCK": "15818", "WEIGHT": ".001", "VOLUME": ".00009", "BASIC_PRICE": ".06", "COST_PRICE": ".01"}
The numbers here are already 'not correct': values less than one do not have a zero before the decimal point.
I kind of hoped that the format mask on the items would catch this. If I specify FM999G990D000 for the item weight, I'd expect it to show '0.001'.
But okay, I suppose it only works that way when the value comes through session state, and not when you set an item value through $("#").val()?
Where do I go wrong? Is my only option to change my select in the app process?
Now:
SELECT '"AVAILABLE_STOCK": "' || AVAILABLE_STOCK ||'", '||
'"WEIGHT": "' || WEIGHT ||'", '||
'"VOLUME": "' || VOLUME ||'", '||
'"BASIC_PRICE": "' || BASIC_PRICE ||'", '||
Do I need to wrap my number fields in a to_char with the format mask here (to_char(available_stock, 'FM999G990D000'))?
Right now I need to put my numbers between quotes of course, or I get invalid JSON when I parse it.
Trailing Zero
I have an application process on a page at the after-header point, right after an automated row fetch. Several fields (totals) are calculated here. The variables used are all declared as number(10, 2). All values are correct and rounded to 2 decimal places. The format masks on my items are also specified as FM999G999G990D00.
However, when one of the calculated values has only one significant digit after the decimal point, the trailing zero gets dropped. Instead of '987.50', it is displayed as '987.5'.
So, I have a number variable, and assign it like this: :P12_NDB_TOTAL_INCL := v_totI;
Would I need to convert my numbers here too, with a format mask?
What am I doing wrong, or what am I missing?

If you aren't doing math on it and are more concerned with formatting, I suggest treating it as a varchar/string instead of as a number wherever you can.

Delphi - User specified string manipulation

I have a problem in Delphi7. My application creates mpg video files according to a set naming convention i.e.
\000_A_Title_YYYY-MM-DD_HH-mm-ss_Index.mpg
In this filename the following rules are enforced:
The 000 is the video sequence. It is incremented whenever the user presses stop.
The A (or B,C,D) specifies the recording camera - so video files are linked with up to four video streams all played simultaneously.
Title is a variable length string. In my application it cannot contain a _.
The YYYY-MM-DD_HH-mm-ss is the starting time of the video sequence (not the single file)
The Index is the zero-based ordering index and is incremented within one video sequence. That is, video files are a maximum of 15 minutes long; once this is reached, a new video file is started with the same sequence number but the next index. Using this, we can calculate the actual start time of the file (filename-decoded time + 15 minutes × Index).
Using this method my application can extract the starting time that the video file started recording.
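The start-time calculation for the fixed naming convention can be sketched as follows. The sketch is in Python for illustration only (the application itself is Delphi 7); file_start_time and SEGMENT_MINUTES are hypothetical names, and the filename is assumed to follow the six-field convention exactly.

```python
from datetime import datetime, timedelta
from pathlib import PureWindowsPath

SEGMENT_MINUTES = 15  # maximum length of one video file, per the rules above


def file_start_time(filename: str) -> datetime:
    """Return the start time of one file named 000_A_Title_YYYY-MM-DD_HH-mm-ss_Index.mpg."""
    stem = PureWindowsPath(filename).stem  # drops a leading path and the extension
    # The title cannot contain "_", so the name always splits into six fields.
    sequence, camera, title, date_part, time_part, index = stem.split("_")
    sequence_start = datetime.strptime(f"{date_part}_{time_part}", "%Y-%m-%d_%H-%M-%S")
    return sequence_start + timedelta(minutes=SEGMENT_MINUTES * int(index))
```

For a file with index 2 in a sequence that started at 10:00:00, this yields 10:30:00 on the same day.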
Now we have a further requirement to handle arbitrarily named video files. The only thing I know for certain is that there will be a YYYY-MM-DD HH-mm-ss somewhere in the filename.
How can I allow the user to specify the filename convention for the files he is importing? Something like regular expressions? I understand there must be a pattern to the naming scheme.
So if the user inputs ?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg into a text box, how would I go about getting the start time? Is there a better solution? Or do I just have to handle every single possibility as we come across them?
(I know this is probably not the best way to handle such a problem, but we cannot change the issue: the new video files are recorded by another company.)

I'm not sure if you're trying to parse the user input '?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg' into components, but if you're just trying to grab the date and time, something like this will do it; the date is in group 1 and the time in group 2:
(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})
Otherwise, I'm not sure what you're trying to do.
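As an illustration of the two-group pattern, here is a minimal Python sketch (Python rather than Delphi 7, which has no built-in regex support). The name extract_timestamp is hypothetical, and the separator class [_ ] is an assumption added so that both the underscore form and the space form mentioned in the question match.

```python
import re
from datetime import datetime
from typing import Optional

# Group 1 is the date, group 2 the time; "_" or " " may separate them (assumption).
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2})[_ ](\d{2}-\d{2}-\d{2})")


def extract_timestamp(filename: str) -> Optional[datetime]:
    """Return the first YYYY-MM-DD HH-mm-ss timestamp found in filename, or None."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    date_part, time_part = match.groups()
    # strptime raises ValueError for digit runs that are not a real date or time.
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H-%M-%S")
```

Because the pattern is searched for anywhere in the name, this works without the user describing the rest of the naming scheme.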

Possibly you can use the underscores "_" as your positional indicator since you smartly don't allow them in the title.
In your example of a filename convention:
?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg
you can parse this user-specified string to see that the date YYYY-MM-DD is always between the 3rd and 4th underscore and the time HH-mm-ss is between the 4th and 5th.
Then it becomes a simple matter when getting the actual filenames following this convention, to find the 3rd underscore and know the date and time follow it.
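The positional idea can be sketched like this, again in Python for illustration only. The function and constant names are hypothetical, and the sketch assumes that the user's convention contains the literal tokens YYYY-MM-DD and HH-mm-ss, each as a complete underscore-separated field, and that filenames have the same number of fields as the convention.

```python
from datetime import datetime
from typing import List

DATE_TOKEN = "YYYY-MM-DD"
TIME_TOKEN = "HH-mm-ss"


def _fields(name: str) -> List[str]:
    """Strip the extension and split the rest on underscores."""
    return name.rsplit(".", 1)[0].split("_")


def start_time_from_convention(convention: str, filename: str) -> datetime:
    """Locate the date and time fields via the user-specified convention."""
    pattern_fields = _fields(convention)
    date_pos = pattern_fields.index(DATE_TOKEN)  # ValueError if the token is missing
    time_pos = pattern_fields.index(TIME_TOKEN)
    name_fields = _fields(filename)
    stamp = f"{name_fields[date_pos]} {name_fields[time_pos]}"
    return datetime.strptime(stamp, "%Y-%m-%d %H-%M-%S")
```

The convention is parsed once to find the field positions; each imported filename then only needs a split and two lookups.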

If you want phone calls 24/7, then you should go for the RegEx thing and let the user freely enter some cryptography in a TEdit.
If you want happy users and a good night's sleep, then be creative and drop the boring RegEx approach. Create your own filename decoder by using an Angry Birds approach.
Here's the idea:
Create some birds with different string manipulation personalities.
Let the user select and arrange these birds.
Execute the user generated string manipulation.
Sample code:
program AngryBirdFilenameDecoder;
{$APPTYPE CONSOLE}
uses
  SysUtils;
procedure PerformEatUntilDash(var aStr: String);
begin
  if Pos('-', aStr) > 0 then
    Delete(aStr, 1, Pos('-', aStr));
  WriteLn(':-{ > ' + aStr);
end;
procedure PerformEatUntilUnderscore(var aStr: String);
begin
  if Pos('_', aStr) > 0 then
    Delete(aStr, 1, Pos('_', aStr));
  WriteLn(':-/ > ' + aStr);
end;
function FetchDate(var aStr: String): String;
begin
  Result := Copy(aStr, 1, 10);
  Delete(aStr, 1, 10);
  WriteLn(':-) > ' + aStr);
end;
var
  i: Integer;
  FileName: String;
  TempFileName: String;
  SelectedBirds: String;
  MyDate: String;
begin
  Write('Enter a filename to decode (eg. ''01-ThisIsAText-Img_01-Date_2011-03-08.png''): ');
  ReadLn(FileName);
  if FileName = '' then
    FileName := '01-ThisIsAText-Img_01-Date_2011-03-08.png';
  repeat
    TempFileName := FileName;
    WriteLn('Now, select some birds:');
    WriteLn('Bird No.1 :-{ ==> I''ll eat letters until I find a dash (-)');
    WriteLn('Bird No.2 :-/ ==> I''ll eat letters until I find a underscore (_)');
    WriteLn('Bird No.3 :-) ==> I''ll remember the date before I eat it');
    WriteLn;
    Write('Chose your birds: (eg. 112123):');
    ReadLn(SelectedBirds);
    if SelectedBirds = '' then
      SelectedBirds := '112123';
    for i := 1 to Length(SelectedBirds) do
      case SelectedBirds[i] of
        '1': PerformEatUntilDash(TempFileName);
        '2': PerformEatUntilUnderscore(TempFileName);
        '3': MyDate := FetchDate(TempFileName);
      end;
    WriteLn('Bird No.3 found this date: ' + MyDate);
    WriteLn;
    WriteLn;
    Write('Check filename with some other birds? (Y/N): ');
    ReadLn(SelectedBirds);
  until (Length(SelectedBirds)=0) or (Uppercase(SelectedBirds[1])<>'Y');
end.
When you'll do this in Delphi with GUI, you'll add more birds and more checking of course. And find some nice bird glyphs.
Use two list boxes: one on the left with all possible birds, and one on the right with all the selected birds. Drag and drop birds from left to right. Rearrange (and remove) birds in the list on the right.
The user should be able to test the setup by entering a filename and see the result of the process. Internally you store the script by using enumerators etc.
