Can the Pub/Sub message ID be held as a number? - google-cloud-platform

Pub/Sub guarantees that messageId is always a unique number. Therefore, I use this ID as deviceId and hold this value in a BigQuery table. The Google documentation says this value is a string, but according to my experiments messageId returns a 15-digit number. Should I keep this value as a number in BigQuery? Would it cause any trouble?
Pubsub Message Format

The issue is the maximum length of an Integer (10 digits), not the fact that it contains only numeric values.
This is why you should keep the value as a String, as defined in the documentation, and not as an Integer.

Pub/Sub guarantees that messageId is always unique per topic, not that it is a number (ref).
The data type as stated in the docs is a String, so it can contain any Unicode character.
So, as others have said, although it is a 15-digit number now, if at some point in the future Google generates a non-numeric string, or a number greater than what your low-level code can store, then your app will fail.
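To make the storage limits concrete, here is a minimal Python sketch. The sample ID is made up, and fits is a hypothetical helper, not part of any Pub/Sub client library. It only checks whether an ID received as a string would survive conversion to a signed 32-bit or 64-bit integer (BigQuery's INT64 is the latter).

```python
INT32_MAX = 2**31 - 1  # 2147483647 (10 digits)
INT64_MAX = 2**63 - 1  # 9223372036854775807 (19 digits)


def fits(message_id: str, max_value: int) -> bool:
    """Return True only if message_id is all ASCII digits and <= max_value."""
    return (
        message_id.isascii()
        and message_id.isdigit()
        and int(message_id) <= max_value
    )


sample_id = "123456789012345"  # hypothetical 15-digit ID, like the ones in the question

print(fits(sample_id, INT32_MAX))              # False: too large for 32 bits
print(fits(sample_id, INT64_MAX))              # True: fits in 64 bits today
print(fits("9999999999999999999", INT64_MAX))  # False: 19 digits can still overflow
print(fits("abc-123", INT64_MAX))              # False: not numeric at all
```

Keeping the column as a string avoids all three failure cases, at the cost of slightly larger storage.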

Google Support says:
"MessageId consist of the maximum possible digits are 19. As long as an ID hasn't been used before (since they are unique), there can be up to 19 digits, but realistically that amount of digits may not be reached."

Related

powerquery: extra digits added to number when importing table

Glad to ask a question here again after more than 10 years (the last one was about Bash scripting; now that I'm in corporate, guess what... it's about Excel ;) ).
Here is my question/issue:
I am importing data with Power Query for further analysis.
I have discovered that the imported values contain extra digits not present in the original table.
I have googled for this problem but have not been able to find an explanation or a solution (a similar issue is this one, more than one year old, but with no feedback from Microsoft).
(columns are formatted as text in the screenshot but the issue is still present even if formatted as number)
The workaround I am using now, though I am not happy with it, is the following:
- I used "Increase Decimal" to make sure all my digits are captured (in my source the entries do not all have the same number of significant digits)
- saved as CSV
- imported the impacted columns as number
- converted the columns to text (for future text matching)
I am really annoyed by this unwanted and unpredictable behaviour of Excel.
I see a serious issue of data integrity: if we cannot rely on the Power Query/Power BI platform to maintain accurate queries, I wonder why we would use it.
Adding another screenshot to clarify that changing the source format to text does not solve the problem.
Another screenshot added following @David Bacci's comments:
I think I wrongly assumed my data was stored as text in the source; can you confirm?
If you are exporting and importing as text, then this will not happen. If you convert to number, you will lose precision. From the docs (my bold):
Represents a 64-bit (eight-byte) floating-point number. It's the most
common number type, and corresponds to numbers as you usually think of
them. Although designed to handle numbers with fractional values, it
also handles whole numbers. The Decimal Number type can handle
negative values from –1.79E +308 through –2.23E –308, 0, and positive
values from 2.23E –308 through 1.79E + 308. For example, numbers like
34, 34.01, and 34.000367063 are valid decimal numbers. The largest
precision that can be represented in a Decimal Number type is 15
digits long. The decimal separator can occur anywhere in the number.
The Decimal Number type corresponds to how Excel stores its numbers.
Note that a binary floating-point number can't represent all numbers
within its supported range with 100% accuracy. Thus, minor differences
in precision might occur when representing certain decimal numbers.
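The precision loss described in the quote can be reproduced outside Power Query. The following is a small Python sketch, assuming only that Decimal Number behaves like a standard 64-bit IEEE 754 double, which is also what a Python float is. The chosen value is arbitrary.

```python
exact = 2**53 + 1          # 9007199254740993, a 16-digit whole number
as_float = float(exact)    # converted to the nearest 64-bit double
round_trip = int(as_float)

print(exact)       # 9007199254740993
print(round_trip)  # 9007199254740992: the last digit changed

# Fractions are affected too: neither 0.1 nor 0.2 is stored exactly.
print(0.1 + 0.2 == 0.3)  # False

# Keeping the value as text never goes through the float conversion.
as_text = str(exact)
print(as_text)     # 9007199254740993
```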
BTW, you should probably accept some of the good answers from your previous questions from 10 years ago.

Meaning of 3F7.1 in Fortran data format

I am trying to create an MDM file using HLM 7 Student version, but since I don't have access to SPSS I am trying to import my data using ASCII input. As part of this process I am required to input the data format Fortran style. Try as I might I have not been able to understand this step. Could someone familiar with Fortran (or even better HLM itself) explain to me how this works? Here is my current understanding
From the example EG3.DAT they give
(A4,1X,3F7.1)
I think
A4 signifies that the ID is 4 characters long.
1X means skip a space.
F.1 means that it should read 1 decimal place.
I am very confused about what 3F7 might mean.
EG3.DAT
2020 380.0 40.3 12.5
2040 502.0 83.1 18.6
2180 777.0 96.6 44.4
Below are examples from the help documents.
Rules for format statement
Format statement example
EG1 data format
EG2 data format
EG3 data format
One similar question is Explaining Fortran Write Format. Unfortunately it does not explicitly treat the F descriptor.
3F7.1 means 3 floating-point numbers, each printed over 7 characters, each with one digit after the decimal point. Leading characters are blanks.
For reading you don't need the .1 info at all as long as each field contains an explicit decimal point: the floating-point number is simply read from those 7 characters.
You guessed the meaning of A4 (string of four characters) and 1X (one blank) correctly.
In Fortran, so-called data edit descriptors (which format the input or output of data) may have repeat specifications.
In the format (A4,1X,3F7.1) the data edit descriptors are A4 and F7.1. Only F7.1 has a repeat specification (the number before the F). This simply means that the format is as though the descriptor appeared repeated: like F7.1, F7.1, F7.1. With a repeat specification of 1, or not given, there is just the single appearance.
The format of the question, then, is like
(A4,1X,F7.1,F7.1,F7.1)
This format is one that is covered by the rules provided in one of the images of the question. In particular, the aspect of repeat specification is given in rule 2 with the corresponding example of rule 3.
Further, in Fortran proper, a repeat count may also be * as a special case: that's like an exceptionally large repeat count. *(F7.1) would be like F7.1, F7.1, F7.1, .... I see no indication that this is supported by HLM, but if this is needed a very large repeat count may be given instead.
In 1X the 1 isn't a repeat specification but an integral, and necessary, part of the position edit descriptor.
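As an illustration of how the expanded format maps onto columns, here is a minimal Python sketch that slices one record the way (A4,1X,3F7.1) describes. parse_record is a hypothetical helper, and the sample line assumes the EG3.DAT values are right-justified in their 7-character fields, which is the usual convention but is not confirmed by the question.

```python
def parse_record(line: str):
    """Split one fixed-width record laid out as (A4,1X,3F7.1)."""
    record_id = line[0:4]  # A4: columns 1-4 hold the ID
    start = 5              # 1X: column 5 is skipped
    # 3F7.1: three consecutive 7-character fields, each holding one real number
    values = [
        float(line[start + 7 * i : start + 7 * (i + 1)])
        for i in range(3)
    ]
    return record_id, values


# Assumed layout of the first EG3.DAT record: 4 + 1 + 3 * 7 = 26 characters
line = "2020" + " " + "  380.0" + "   40.3" + "   12.5"

print(len(line))           # 26
print(parse_record(line))  # ('2020', [380.0, 40.3, 12.5])
```

If a single column is shifted, every later field is cut in the wrong place, which is why the format has to match the file exactly.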
Procedure for making an MDM file from Excel for HLM:
- Make sure ALL the characters in ALL the columns line up
  Select a column, then right-click and select Format Cells
  Then click on 'Custom', go to the 'Type' box and enter the number of 0s you need to line everything up
- Remove all the tabs from the document and replace them with spaces
  Open the document in Word and use find and replace
- To save the document as .dat
  First save it as .txt
  Then open it in Notepad and save it as .dat
To enter the data format (FORTRAN-style):
The program wants to read the data file space by space, so you have to specify it perfectly so that it reads the whole set properly.
If something is off, even by a single space, then your descriptive stats will be wonky compared to what you get when you check them in another program.
Enclose the code in parentheses ()
Divide the entries with commas ,
- Need an ID column for all levels
  The ID column needs to be sorted so that it is in order from smallest to largest
  Use A# with # being the number of characters in the ID
  Use 1X to move from the ID to the next column
- Need to say how many characters are needed in each column
  Use F# (# = number), where the number after F is the number of characters needed for that column
  There need to be enough character spaces to provide one 'gap' space between each column
  There need to be enough character spaces to allow for the decimal point
  As part of the F you need to specify the number of decimal places
  You do this by adding a decimal point after the F number and then a number giving the decimal places you need: F#.#
  You can put a number in front of the F to 'repeat' it, though this is not necessary: #F#.#
All in all, it should look something like this:
(A4,1X,F4.0,F5.1)
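As an alternative to lining the columns up by hand, the same fixed-width layout can be produced by a short script. This is only a sketch: format_row is a hypothetical helper, the rows are made-up sample values, and the layout targeted is the example format with the skip descriptor written in its standard Fortran form, (A4,1X,F4.0,F5.1).

```python
def format_row(record_id: str, first: float, second: float) -> str:
    """Lay out one record for the format (A4,1X,F4.0,F5.1)."""
    # A4   -> ID left-justified in 4 columns
    # 1X   -> one blank column
    # F4.0 -> 4 columns, no decimals; F5.1 -> 5 columns, 1 decimal
    return f"{record_id:<4} {first:4.0f}{second:5.1f}"


rows = [("2020", 380.0, 40.3), ("2040", 502.0, 83.1)]  # hypothetical sample data
lines = [format_row(*row) for row in rows]

for text in lines:
    print(text)
# 2020  380 40.3
# 2040  502 83.1
```

Every line comes out exactly 14 characters wide (4 + 1 + 4 + 5), so the columns stay aligned no matter how many digits each value has, as long as the values fit in their fields.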
Helpful links:
https://books.google.de/books?id=VdmVtz6Wtc0C&pg=PA78&lpg=PA78&dq=data+format+fortran+style+hlm&source=bl&ots=kURJ6USN5e&sig=fdtsmTGSKFxn04wkxvRc2Vw1l5Q&hl=en&sa=X&ved=0ahUKEwi_yPurjYrYAhWIJuwKHa0uCuAQ6AEIPzAC#v=onepage&q&f=false
http://www.ssicentral.com/hlm/help6/error/Problems_creating_MDM_files.pdf
http://www.ssicentral.com/hlm/help7/faq/FAQ_Format_specifications_for_ASCII_data.pdf

Amazon Lex "slots" for alphanumeric values

I have a simple ask: how do I create an Amazon Lex slot for alphanumeric values?
So far I have tried:
AMAZON.NUMBER: only takes decimal numbers
AMAZON.PostalAddress: takes everything except numbers
Custom slot with no values: only numbers
Is there any way to create a slot which takes alphanumeric values?
Thanks
You can use a custom slot type.
Remember you don't need to enumerate all possible values, just provide enough training data so patterns match. Try giving it around 20-30 values and see if that's enough to train the slot type.
There is no particular data type that takes alphanumeric values.
AMAZON.NUMBER: accepts only numbers
AMAZON.US_FIRST_NAME: accepts only letters
As part of creating a chatbot in Amazon Lex, I used AMAZON.Movie to accept both letters and numbers (alphanumeric values), and it worked for me, since a movie name can contain alphanumeric values (e.g. The Incredibles2). I hope it works for you too.

Amazon DynamoDB: Storing integers as Numbers vs. Strings

I have an application using DynamoDB and I need to be able to store some numbers. At the moment these numbers will only ever be positive integers or occasionally positive numbers with 1 decimal place.
Since DynamoDB has only one Number data type, which I assume is somewhat equivalent to a float, is it safe to store integers as Numbers without having to worry about precision causing the returned value to be incorrect (i.e. 1.999999999999 instead of 2)? Or should I save them as strings and parse integers from the strings when I need them? I know that DynamoDB already stores Numbers as strings at some point in the background, but I'm unsure if there is a possibility of accuracy loss before that conversion.
As I said, I will only ever be using positive numbers with up to 1 decimal place.
An attribute of type Number. For example:
"N": "123.45"
Numbers are sent across the network to DynamoDB as strings, to
maximize compatibility across languages and libraries. However,
DynamoDB treats them as number type attributes for mathematical
operations.
Type: String Required: No
From the documentation, if you store a number as 1.999999 you will get it as 1.999999.
Also further documentation:
Number
A Number can have up to 38 digits of precision, and can be positive,
negative, or zero.
Positive range: 1E-130 to 9.9999999999999999999999999999999999999E+125
Negative range: -9.9999999999999999999999999999999999999E+125 to -1E-130
DynamoDB uses JSON strings to represent Number data in requests and replies. For more information, see DynamoDB Low-Level API.
If number precision is important, you should pass numbers to DynamoDB
using strings that you convert from a number type.
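The last point of the quote can be illustrated with Python's decimal module. This is a minimal sketch, not DynamoDB client code: the payload dictionary only mimics the "N" wire format shown in the quote, and the value 1.1 is an arbitrary example with one decimal place.

```python
from decimal import Decimal

measurement = 1.1  # a binary float, which cannot hold 1.1 exactly

via_string = Decimal(str(measurement))  # converted through its string form
direct = Decimal(measurement)           # exact value of the underlying binary float

print(via_string)                     # 1.1
print(direct == via_string)           # False
print(len(direct.as_tuple().digits))  # well above the 38 digits DynamoDB allows

# Shape of a Number attribute as shown in the quote: the number travels as a string.
payload = {"N": str(via_string)}
print(payload)                        # {'N': '1.1'}
```

Converting through a string keeps exactly the digits you intended, so a value with one decimal place comes back unchanged.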
Another advantage of storing numbers as the DynamoDB Number data type:
in the UpdateExpression, you can use + or - to add / subtract the values.
Example:
Add:
UpdateExpression: "SET total_val = total_val + :value",
ExpressionAttributeValues: {
    ':value': 2
},
Subtract:
UpdateExpression: "SET total_val = total_val - :value",
ExpressionAttributeValues: {
    ':value': 2
},
The above is not possible if you store the number as String data type in DynamoDB.

Convert alphanumeric string to 16 digit GCID

I'm building our inventory feed for Amazon Seller Central in OpenOffice Calc but can't work out how to convert our in-house product IDs to the GCID format required by Amazon.
The standard-product-id must have a specific number of characters according to type: GCID (16 alphanumeric characters), UPC (12 digit number), EAN (13 digit number) or GTIN(14 digit number).
Our product IDs vary by manufacturer, e.g.:
123456
AB123456
1234AB
Where the ID is numerical only I can format the cells with leading zeros, however this doesn't work if the cell contains letters.
My file has over 10,000 products so I'm wondering if there is a formula I can apply to all cells to instantly convert them to GCID?
It seems the question was asked under a misapprehension. However, the example (123456, AB123456, 1234AB) represents three different IDs, and padding to a specified length is quite a common requirement (e.g. see the String.PadLeft method), so a suggestion for OpenOffice might be of use to someone, one day.
The convention is to pad with 0s, but since some spreadsheets automatically strip these off the front of numbers (as in the first example) and databases tend to prefer that fields are of a consistent format, I suggest separating the padding from the ID with a hyphen, to aid identification of alphanumeric codes and to force text format:
=REPT(0;15-LEN(A1))&"-"&A1
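For anyone doing the same padding outside a spreadsheet, the formula can be mirrored in a few lines of Python. This is a sketch under the same assumptions as the formula: pad_id is a hypothetical helper, and IDs are assumed to be at most 15 characters long.

```python
def pad_id(product_id: str, width: int = 16) -> str:
    """Left-pad with zeros and a hyphen, like =REPT(0;15-LEN(A1))&"-"&A1."""
    # One position is reserved for the hyphen, so the zeros fill width - 1 - len.
    # For IDs longer than 15 characters no zeros are added and the result
    # exceeds the target width.
    zeros = "0" * (width - 1 - len(product_id))
    return f"{zeros}-{product_id}"


for product_id in ["123456", "AB123456", "1234AB"]:
    print(pad_id(product_id))
# 000000000-123456
# 0000000-AB123456
# 000000000-1234AB
```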