How do I check where Lot number of GS1 ends when scanning with Expo barcode scanner? - expo

The Lot number in the GS1 standard starts with 10 and has a length of UP TO 20 characters, and its end is marked by the FNC1 symbol, which is invisible, so I have no idea how to check whether it has really ended or not.
The code below would work if the fields were actually separated by whitespace rather than FNC1. Any ideas?
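function getCode(code, pos, len) {
    let str = ''
    const end = pos + len
    // Copy up to len characters, stopping early at a space separator
    for (; pos < end && pos < code.length; pos++) {
        if (code[pos] === ' ') {
            break
        }
        str += code[pos]
    }
    return str
}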

The transmission protocol for all GS1-supported barcode symbologies is that FNC1 non-data characters serving as AI separators in the barcode message are transmitted as Group Separator data characters (ASCII 29).
The leading FNC1 character in first position must also be indicated, e.g. via the modifier value of an AIM Symbology Identifier prefix or a similar proxy.
Any scanner that does not do this, or cannot be configured to do this, is seriously defective since it does not comply with the generic symbology standards.
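As a rough illustration of the point above, here is a minimal Python sketch that reads a lot number by looking for the Group Separator (ASCII 29). It assumes the scanner already delivers the separator as described and that any symbology identifier prefix has been stripped; `read_lot` is a hypothetical helper name, not part of any scanner API.

```python
GS = "\x1d"  # ASCII 29, Group Separator (stands in for FNC1 as AI separator)

def read_lot(data, pos=0):
    """Read AI (10) starting at pos. Returns (lot, position of the next AI)."""
    if data[pos:pos + 2] != "10":
        raise ValueError("expected AI 10 at position %d" % pos)
    start = pos + 2
    # The lot is at most 20 characters, so the separator can sit
    # no further than 20 characters after the start of the value.
    end = data.find(GS, start, start + 21)
    if end == -1:
        # No separator: the lot runs to the end of the data (or to 20 chars).
        end = min(start + 20, len(data))
        return data[start:end], end
    return data[start:end], end + 1  # skip the separator itself

lot, next_pos = read_lot("10ABC123" + GS + "17250101")
print(lot, next_pos)
```

The same check translates directly to JavaScript by comparing each character against `String.fromCharCode(29)` instead of a space.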

Related

Parsing log files containing NMEA sentences C++

I have multiple log files of NMEA sentences that contain geographical positions captured by a camera.
Example of one of the sentences: $GPRMC,100101.000,A,3723.1741,N,00559.5624,W,0.000,0.00,150914,,A*63
My question is, how do you reckon I can start on that? Just need someone to push me to the right direction, thanks.
I use this checksum function in my GPSReverse driver.
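#include <cstdio>
#include <string>
using std::string;

// Returns the NMEA checksum of data as two uppercase hex digits.
// Assumes data is a NMEA sentence that starts with '$' and does not
// yet have its "*hh" suffix (check this before calling).
string chk(const char* data)
{
    // Skip the leading '$'
    const char *datapointer = &data[1];
    unsigned char checksum = 0;
    // Loop through the entire string, XORing each character into the checksum
    while (*datapointer != '\0')
    {
        checksum ^= static_cast<unsigned char>(*datapointer);
        datapointer++;
    }
    // Format the checksum as two ASCII hex nybbles
    char x[3] = {0};
    std::snprintf(x, sizeof x, "%02X", static_cast<unsigned>(checksum));
    return x;
}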
After that, append the checksum to the NMEA sentence (say, GGA):
string re = chk(gga.c_str());
gga += "*";
gga += re;
gga += "\r\n";
So you can read up to the *, calculate the checksum, and see if it matches the string after the *.
Read more here.
Each sentence begins with a '$' and ends with a carriage return/line
feed sequence and can be no longer than 80 characters of visible text
(plus the line terminators). The data is contained within this single
line with data items separated by commas. The data itself is just
ascii text and may extend over multiple sentences in certain
specialized instances but is normally fully contained in one variable
length sentence. The data may vary in the amount of precision
contained in the message. For example time might be indicated to
decimal parts of a second or location may be show with 3 or even 4
digits after the decimal point. Programs that read the data should
only use the commas to determine the field boundaries and not depend
on column positions. There is a provision for a checksum at the end of
each sentence which may or may not be checked by the unit that reads
the data. The checksum field consists of a '*' and two hex digits
representing an 8 bit exclusive OR of all characters between, but not
including, the '$' and '*'. A checksum is required on some sentences.

ICU combining Thai vowel signs and when to ignore

I'm processing Thai keyboard input. Some of the keys are vowel signs and only allowed when combined with certain preceding characters.
Here 0x0E33 is the vowel sign
For example 0x0E1C + 0x0E33 is valid
but 0x0E44 + 0x0E33 is not valid and the 0x0E33 should be ignored.
I'm looking to find a way to know when I should ignore the vowel sign, or when it does not combine with the previous character.
Any ideas please?
Many Thai vowels (and Tone Marks, by the way) belong to the Non-Spacing Combining Marks category. Your goal is to use some library that would tell which category each character belongs to. Then you may decide whether to "ignore" it, whatever the "ignoring" means in your application context.
Check Unicode General Category Values
Your two points of interest are:
Lo | Other_Letter for normal character;
Mn | Nonspacing_Mark for zero-width non-spacing marks;
Further reading:
Unicode data for Thai script (scroll down till the first occurrence of "THAI CHARACTER")
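To get a feel for the general category approach, here is a small sketch using Python's standard `unicodedata` module rather than ICU (the helper names are only for illustration):

```python
import unicodedata

def category(code_point):
    """Unicode general category of a code point, e.g. 'Lo' or 'Mn'."""
    return unicodedata.category(chr(code_point))

def is_nonspacing_mark(code_point):
    return category(code_point) == "Mn"

for cp in (0x0E1C, 0x0E34, 0x0E48, 0x0E33):
    print(hex(cp), unicodedata.name(chr(cp)), category(cp))
```

Note that SARA AM (0x0E33) from the question is itself classified as Lo, not Mn, so the general category alone will not flag that particular key; a finer-grained property is needed for it.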
I know this thread is from a few years ago, but this is what I have come up with using the ICU library. I suspect it can be improved ...
UChar32 newChar;      // the character just typed
UChar32 previousChar; // the character before it
int32_t gcb = u_getIntPropertyValue(newChar, UCHAR_GRAPHEME_CLUSTER_BREAK);
if (gcb != U_GCB_OTHER)
{
    int32_t insc = u_getIntPropertyValue(newChar, UCHAR_INDIC_SYLLABIC_CATEGORY);
    if (insc == U_INSC_VOWEL_DEPENDENT || insc == U_INSC_TONE_MARK)
    {
        if (u_getIntPropertyValue(previousChar, UCHAR_INDIC_SYLLABIC_CATEGORY) != U_INSC_CONSONANT)
        {
            // invalid combination, ignore
        }
    }
}

How can I parse a char array with octal values in Python?

EDIT: I should note that I want a general case for any hex array, not just the google one I provided.
EDIT BACKGROUND: Background is networking: I'm parsing a DNS packet and trying to get its QNAME. I'm taking in the whole packet as a string, and every character represents a byte. Apparently this problem looks like a Pascal string problem, and using the struct module seems like the way to go.
I have a char array in Python 2.7 which includes octal values. For example, let's say I have an array
DNS = "\03www\06google\03com\0"
I want to get:
www.google.com
What's an efficient way to do this? My first thought would be iterating through the DNS char array and adding chars to my new array answer. Every time I see a '\' char, I would ignore the '\' and the two chars after it. Is there a way to get the resulting www.google.com without using a new array?
My disgusting implementation (my answer is an array of chars, which is not what I want; I want just the string www.google.com):
DNS = "\\03www\\06google\\03com\\0"
answer = []
i = 0
while i < len(DNS):
    if DNS[i] == '\\' and DNS[i+1] != 0:
        i += 3
    elif DNS[i] == '\\' and DNS[i+1] == 0:
        break
    else:
        answer.append(DNS[i])
        i += 1
Now that you've explained your real problem, none of the answers you've gotten so far will work. Why? Because they're all ways to remove sequences like \03 from a string. But you don't have sequences like \03, you have single control characters.
You could, of course, do something similar, just replacing any control character with a dot.
But what you're really trying to do is not replace control characters with dots, but parse DNS packets.
DNS is defined by RFC 1035. The QNAME in a DNS packet is:
a domain name represented as a sequence of labels, where each label consists of a length octet followed by that number of octets. The domain name terminates with the zero length octet for the null label of the root. Note that this field may be an odd number of octets; no padding is used.
So, let's parse that. If you understand how "labels consisting of a length octet followed by that number of octets" relates to "Pascal strings", there's a quicker way. Also, you could write this more cleanly and less verbosely as a generator. But let's do it the dead-simple way:
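import struct

def parse_qname(packet):
    components = []
    offset = 0
    while True:
        # Each label starts with a one-byte length
        length, = struct.unpack_from('B', packet, offset)
        offset += 1
        if not length:
            # A zero-length label terminates the name
            break
        # unpack_from returns a tuple, so unpack the single element
        component, = struct.unpack_from('{}s'.format(length), packet, offset)
        offset += length
        components.append(component)
    return components, offset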
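The generator variant mentioned above might look roughly like this. It is only a sketch: `iter_labels` and `qname_to_str` are names invented here, the packet is assumed to be a bytes object, and DNS message compression pointers are not handled.

```python
import struct

def iter_labels(packet, offset=0):
    """Yield each label of a QNAME that starts at offset."""
    while True:
        (length,) = struct.unpack_from("B", packet, offset)
        offset += 1
        if length == 0:
            return  # zero-length root label ends the name
        (label,) = struct.unpack_from("%ds" % length, packet, offset)
        offset += length
        yield label

def qname_to_str(packet, offset=0):
    """Join the labels with dots to get the usual textual form."""
    return b".".join(iter_labels(packet, offset)).decode("ascii")

print(qname_to_str(b"\x03www\x06google\x03com\x00"))
```

Because `struct.unpack_from` raises `struct.error` when the buffer is too short, a truncated name fails loudly instead of producing a partial result.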
import re
DNS = "\\03www\\06google\\03com\\0"
m = re.sub("\\\\([0-9,a-f]){2}", "", DNS)
print(m)
Maybe something like this?
#!/usr/bin/python3
import re

def convert(adorned_hostname):
    result1 = re.sub(r'^\\03', '', adorned_hostname)
    result2 = re.sub(r'\\0[36]', '.', result1)
    result3 = re.sub(r'\\0$', '', result2)
    return result3

def main():
    adorned_hostname = r"\03www\06google\03com\0"
    expected_result = 'www.google.com'
    actual_result = convert(adorned_hostname)
    print(actual_result, expected_result)
    assert actual_result == expected_result

main()
For the question as originally asked, replacing the backslash-hex sequences in strings like "\\03www\\06google\\03com\\0" with dots…
If you want to do this with a regular expression:
\\ matches a backslash.
[0-9A-Fa-f] matches any hex digit.
[0-9A-Fa-f]+ matches one or more hex digits.
\\[0-9A-Fa-f]+ matches a backslash followed by one or more hex digits.
You want to find each such sequence, and replace it with a dot, right? If you look through the re docs, you'll find a function called sub which is used for replacing a pattern with a replacement string:
re.sub(r'\\[0-9A-Fa-f]+', '.', DNS)
I suspect these may actually be octal, not hex, in which case you want [0-7] rather than [0-9A-Fa-f], but nothing else would change.
A different way to do this is to recognize that these are valid Python escape sequences. And, if we unescape them back to where they came from (e.g., with DNS.decode('string_escape')), this turns into a sequence of length-prefixed (aka "Pascal") strings, a standard format that you can parse in any number of ways, including the stdlib struct module. This has the advantage of validating the data as you read it, and not being thrown off by any false positives that could show up if one of the string components, say, had a backslash in the middle of it.
Of course that's presuming more about the data. It seems likely that the real meaning of this is "a sequence of length-prefixed strings, concatenated, then backslash-escaped", in which case you should parse it as such. But it could be just a coincidence that it looks like that, in which case it would be a very bad idea to parse it as such.
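If the data really is "length-prefixed strings, then backslash-escaped", the unescape-then-parse idea could be sketched as follows. This uses Python 3 spellings (the `unicode_escape` codec and integer indexing of bytes) in place of the Python 2 `string_escape` call mentioned above, and the function names are hypothetical.

```python
def unescape(text):
    """Turn literal sequences such as \\03 back into single bytes."""
    return text.encode("ascii").decode("unicode_escape").encode("latin-1")

def split_pascal(data):
    """Split concatenated length-prefixed strings, validating each length."""
    labels, i = [], 0
    while i < len(data) and data[i] != 0:
        n = data[i]
        label = data[i + 1:i + 1 + n]
        if len(label) != n:
            raise ValueError("truncated label at offset %d" % i)
        labels.append(label.decode("ascii"))
        i += 1 + n
    return labels

escaped = r"\03www\06google\03com\0"
print(".".join(split_pascal(unescape(escaped))))
```

One caveat with octal escapes: a label that begins with a digit in the range 0-7 directly after a short escape such as \03 would be absorbed into the escape, so this only works when the escaping was done unambiguously.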

REGEX - Insert space after every 4 characters, and a line break after every 40 characters

I have a huge string (22000+ characters) of encoded text. The code consists of digits [0-9] and lower case letters [a-z]. I need a regular expression to insert a space after every 4 characters, and one to insert a line break [\n] after every forty characters. Any ideas?
Many people would prefer to do this with a for loop and string concatenation, but I hate those substring calls. I am really against using regexes when they aren't the right tool for the job (parsing HTML), but I think it'd be pretty easy to work with in this case.
JSFiddle Example
Let's say you have the string
var str = "aaaabbbbccccddddeeeeffffgggghhhhiiiijjjjkkkkllllmmmmnnnnoooo";
And you want to insert a space after every four characters, and a newline after 40 characters, you could use the following code
str.replace(/.{4}/g, function (value, index) {
    // index is the offset of the match; the 10th group of a line starts at offset 36
    return value + (index % 40 == 36 ? '\n' : ' ');
});
Note that this wouldn't work if the newline(40) index wasn't a multiple of the space index(4)
I abstracted this in a project, here's a simple way to do it
/**
 * Adds padding and newlines into a string without whitespace
 * @param {string} str The string to be modified (any whitespace will be stripped)
 * @param {number} spaceEvery number of characters before inserting a space
 * @param {number} wrapEvery number of groups per line; the last group is followed by a newline instead of a space
 * @return {string} The replaced string
 */
function addPadding(str, spaceEvery, wrapEvery) {
    var regex = new RegExp(".{" + spaceEvery + "}", "g");
    // Add space every {spaceEvery} chars, newline after {wrapEvery} groups
    return str.replace(/[\n\s]/g, '').replace(regex, function (value, index) {
        // index is the offset at which the matched group starts
        var newlineIndex = spaceEvery * (wrapEvery - 1);
        return value + ((index % (spaceEvery * wrapEvery) === newlineIndex) ? '\n' : ' ');
    });
}
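For comparison, the same replace-with-callback idea can be sketched in Python with `re.sub`. This is just an illustration of the technique; `add_padding` and its defaults are made up here.

```python
import re

def add_padding(text, space_every=4, wrap_every=10):
    """Insert a space after each group and a newline after each wrap_every groups."""
    text = re.sub(r"\s", "", text)  # strip any existing whitespace first

    def separator(match):
        # match.start() is the offset of the group in the stripped text
        group_number = match.start() // space_every + 1
        sep = "\n" if group_number % wrap_every == 0 else " "
        return match.group(0) + sep

    return re.sub(".{%d}" % space_every, separator, text)

print(add_padding("abcdefgh" * 6))
```

As in the JavaScript version, every complete group is followed by a separator, so the result ends with a trailing space or newline that you may want to strip.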
Well, a regexp in itself doesn't insert a space, so I'll assume you have some command in whatever language you're using that inserts based on finding a regexp.
So, finding 4 characters and finding 40 characters: that's not pretty in general regular expressions (unless your particular implementation has nice ways to express numbers). For finding 4 characters, use
....
Because typical regexp finders match greedily (maximal munch) and then resume searching from the end of the previous match, that'll chunk your string into 4-character pieces. The ugly part is that in standard regular expressions, you'll have to use
........................................
to find chunks of 40 characters, although I'll note that if you run your 4-character one first, you'll have to run
..................................................
or
.... .... .... .... .... .... .... .... .... ....
to account for the spaces you've already put in.
The period matches any character, but given that you're only using [0-9a-z], you could use that class in place of each period if you need to ensure nothing else slipped in; I was just avoiding making it even more gross.
As you may be noting, regexp have some limitations. Take a look at the Chomsky hierarchy to really get into their theoretical limitations.

Regex - If contains '%', can only contain '%20'

I want to create a regular expression for the following scenario:
If a string contains the percentage character (%) then it can only contain the following: %20, and cannot be preceded by another '%'.
So if there was for instance, %25 it would be rejected. For instance, the following string would be valid:
http://www.test.com/?&Name=My%20Name%20Is%20Vader
But these would fail:
http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25
%%%25
Any help would be greatly appreciated,
Kyle
EDIT:
The scenario in a nutshell is that a link is written to an encoded state and then launched via JavaScript. No decoding works. I tried .net decoding and JS decoding, each having the same result - The results stay encoded when executed.
Doesn't require a %:
/^[^%]*(%20[^%]*)*$/
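A quick way to convince yourself that this pattern does what was asked is to run it against the strings from the question. The sketch below uses Python's `re.fullmatch` in place of the ^ and $ anchors; `is_valid` is just an illustrative name.

```python
import re

# Any run of non-% characters, optionally interleaved with %20
ONLY_PERCENT_20 = re.compile(r"[^%]*(?:%20[^%]*)*")

def is_valid(url):
    return ONLY_PERCENT_20.fullmatch(url) is not None

samples = [
    "http://www.test.com/?&Name=My%20Name%20Is%20Vader",
    "http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25",
    "%%%25",
]
for sample in samples:
    print(is_valid(sample), sample)
```

Strings without any % at all are accepted too, which matches the "doesn't require a %" remark.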
Which language are you using?
Most languages have a Uri Encoder / Decoder function or class.
I would suggest you decode the string first and then check for valid (or invalid) characters.
i.e. something like /[\w ]/ (empty is a space)
With a regex in the first place you need to respect that www.example.com/index.html?user=admin&pass=%%250 means that the pass really is "%250".
Another solution if look-arounds are not available:
^([^%]|%([013-9a-fA-F][0-9a-fA-F]|2[1-9a-fA-F]))*$
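Reject the string if it matches %(?!20), i.e. a % that is not immediately followed by 20. (The simpler-looking %[^2][^0] misses cases such as %25 and %30.)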
I think that would find what you need
/^([^%]|%%|%20)+$/
Edit: Added case where %% is valid string inside URI
Edit2: And fixed it for case where it should fail :-)
Edit3:
In case you need to use it in an editor (which would explain why you can't use a more programmatic way), then you have to correctly escape all special characters. For example, in Vim that regex should look like:
/^\([^%]\|%%\|%20\)\+$/
Maybe a better approach is to deal with that validation after you decode that string:
string name = HttpUtility.UrlDecode(Request.QueryString["Name"]);
/^([^%]|%20)*$/
This requires a test against the "bad" patterns. If we're allowing %20 - we don't need to make sure it exists.
As others have said before, %% is valid too... and %%25 would be %25
The below regex matches anything that doesn't fit into the above rules
/(?<![^%]%)%(?!(20|%))/
The first brackets check whether there is a % before the character (meaning that it's %%) and also checks that it's not %%%. it then checks for a %, and checks whether the item after doesn't match 20
This means that if anything is identified by the regex, then you should probably reject it.
I agree with dominic's comment on the question. Don't use Regex.
If you want to avoid scanning the string twice, you can just iteratively search for % and then check that it is being followed by 20 and nothing else. (Update: allow a % after to be interpreted as a literal %nnn sequence)
// pseudo code
pos = mystring.find('%', 0)
while (pos != -1)
{
    if mystring[pos+1] = "%" then
        pos = pos + 2 // ok, this is a literal, skip ahead
    else if mystring.substring(pos+1, 2) = "20" then
        pos = pos + 3 // ok, skip past the %20
    else
        return false; // string is invalid
    end if
    pos = mystring.find('%', pos) // look for the next %
}
return true;
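A runnable take on the scan-once idea, sketched in Python. The `allow_double` switch is an assumption added here so that both readings of the requirement (%% rejected, or %% accepted as a literal) can be tried.

```python
def only_percent20(text, allow_double=False):
    """Return True if every '%' in text is part of '%20' (or '%%' if allowed)."""
    pos = text.find("%")
    while pos != -1:
        if text.startswith("%20", pos):
            pos += 3  # valid escape, skip past it
        elif allow_double and text.startswith("%%", pos):
            pos += 2  # literal percent sign, skip past it
        else:
            return False  # a '%' followed by anything else is invalid
        pos = text.find("%", pos)
    return True

print(only_percent20("http://www.test.com/?&Name=My%20Name%20Is%20Vader"))
print(only_percent20("%%%25", allow_double=True))
```

Each character is visited at most once, and no regex engine or look-around support is needed.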