A Problem in Parsing CNAME with Libpcap: some CNAMEs seem to be missing the TLD


I am writing a DNS reply parser with libpcap and have found that the TLDs of some CNAMEs seem to be missing from the corresponding DNS packet payload. For example, in one packet's Wireshark dissection, Wireshark shows the actual CNAME as
prd-push-access-net5-175542503.us-east-1.elb.amazonaws.com
but I can only find
prd-push-access-net5-175542503.us-east-1.elb.amazonaws
(i.e. no ".com") in the corresponding part of the payload. How could one (and how does Wireshark) parse the full CNAME, including ".com", out of this payload?
(This CNAME also seems malformed: per RFC 1035, a QNAME in the question section "terminates with the zero length octet for the null label of the root", and I guess the same applies to a CNAME?)

DNS messages use name compression; see https://www.rfc-editor.org/rfc/rfc1035 section 4.1.4.
In many places where names appear, either the whole name or the labels at its end can be replaced by a pointer to an earlier place in the packet where they already appear, instead of being spelled out.
In your example, we can clearly see the com of myfoscam.com earlier in the packet.
So with the content 03656c6209616d617a6f6e617773c019c02e00 (using only the end, because it is tedious to extract data from an image; you should have copied things as text), we have to analyze it like this:
03: the following is a string of length 3
656c62: this is the string elb, length 3 as advertised
09: the following is a string of length 9
616d617a6f6e617773: this is the string amazonaws
c0: its first two bits are both 1 (its value is 192, i.e. at least 128+64), which means it is the first byte of a two-byte pointer whose remaining 14 bits are an offset. Hence c019 is a pointer here, to offset 25 in decimal (19 in hexadecimal) in the message.
So if you start from the beginning of the DNS message (the first byte of the DNS header, not of the captured frame) and go to offset 25, you should find the sequence 03636f6d, which is com (with the prefix of a length of 3).
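c02e00: these bytes are not part of this name. A pointer always ends a name (RFC 1035 allows a name to be a sequence of labels ending in a zero octet, a pointer, or a sequence of labels ending with a pointer), so the CNAME ends at c019, and the labels at offset 25 supply the rest of it, including its final null label (for example 03636f6d00). c02e is therefore the start of the next field: most likely the compressed owner name of the next resource record (a pointer to offset 46), with 00 being the first byte of that record's TYPE.
So the CNAME is not malformed: the null label (the root, the hidden . at the end of any name) that terminates it is reached through the pointer. See the example in the RFC (and/or provide the whole packet content as text in your question).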
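As a rough illustration of the rules above, here is a minimal C++ sketch of a compression-aware name decoder. It assumes `msg` holds only the DNS message, starting at the DNS header: with libpcap you first have to skip the link-layer, IP and UDP headers yourself, because pointer offsets are counted from the first byte of the DNS header. The function name `decode_dns_name` and the 64-pointer loop limit are hypothetical choices, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a possibly compressed domain name that starts at `offset` in `msg`.
// `msg` must start at the DNS header, because compression pointers are
// offsets from the first byte of the DNS message. On return, `next` is the
// offset just past the name at its original position (where the next field
// begins), even if the decoder followed pointers elsewhere.
std::string decode_dns_name(const std::vector<std::uint8_t>& msg,
                            std::size_t offset, std::size_t& next) {
    std::string name;
    std::size_t pos = offset;
    bool jumped = false;
    int hops = 0;  // arbitrary cap so a malicious pointer loop cannot spin forever

    for (;;) {
        if (pos >= msg.size()) throw std::runtime_error("name runs past end of message");
        const std::uint8_t len = msg[pos];

        if ((len & 0xC0) == 0xC0) {
            // Top two bits set: two-byte pointer; the low 14 bits are the target offset.
            if (pos + 1 >= msg.size()) throw std::runtime_error("truncated pointer");
            const std::size_t target =
                (static_cast<std::size_t>(len & 0x3F) << 8) | msg[pos + 1];
            if (!jumped) {
                next = pos + 2;  // a pointer always ends the name in place
                jumped = true;
            }
            if (++hops > 64) throw std::runtime_error("too many pointers (loop?)");
            pos = target;
            continue;
        }
        if ((len & 0xC0) != 0) throw std::runtime_error("reserved label type");

        if (len == 0) {  // null label: the root, end of the name
            if (!jumped) next = pos + 1;
            break;
        }
        if (pos + 1 + len > msg.size()) throw std::runtime_error("label runs past end");
        if (!name.empty()) name += '.';
        name.append(reinterpret_cast<const char*>(&msg[pos + 1]), len);
        pos += 1 + len;
    }
    return name;
}
```

Called on the offset of the CNAME's RDATA, it follows the `c019` pointer to offset 25 and appends the labels found there, while `next` ends up pointing at the `c02e` that follows the pointer, i.e. at the next field of the message.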

Related

regex - extract strings at specific positions

I have a huge fixed-width string that looks something like below:
B100000DA3F19C Android 600 AND 2011-08-29 15:03:21.537
352a0D21ffd800000a3a95911801700e iPad 600 iOS 2011-08-29 19:35:12.753
.
.
.
I need to extract the first part (id) and the fourth part (device type - "AND" or "iOS"). The first column starts at 0 and ends at the 51st position for all lines. The fourth part starts at 168 and ends at 171 for all lines. The length of each line is 244 characters. If this is complicated, the other option is to delete everything in this file except the id and device type. This single file has around 800K records measuring 180 MB, but Notepad++ seems to be handling it okay.
I tried doing a SQL Server import data but even though the Preview looks fine, when the data gets inserted into the table, it is not accurate.
I have the following so far which gives me the first 51 characters -
^(.{51}).*
It would be great if I could use one regex that keeps the id and device type and deletes the rest.

Well if you are certain it is always at that position a very simple way is this:
^(.{51}).{117}(.{3})
The parentheses are the captures (the results you are getting out), while the braces are the repetition counts.
EDIT: Use the following to explicitly discard the rest of the line:
^(.{51}).{117}(.{3}).*$
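If the extraction is done in a program rather than in Notepad++, the same pattern can drive a replace. Below is a hedged C++ sketch using `std::regex_replace` with the pattern from the answer; the function name and the tab separator are illustrative choices, not part of the question.

```cpp
#include <regex>
#include <string>

// Reduce one 244-character fixed-width record to "id<TAB>device type".
// If the line does not match (e.g. it is too short), it is returned unchanged.
std::string extract_id_and_device(const std::string& line) {
    static const std::regex re("^(.{51}).{117}(.{3}).*$");
    return std::regex_replace(line, re, "$1\t$2");
}
```

Because the positions are fixed, `line.substr(0, 51)` and `line.substr(168, 3)` would give the same two fields without a regex, which may be noticeably faster over 800K records.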

RegEx to match Bitcoin addresses?

I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:
A Bitcoin address, or simply address, is an identifier of 27-34
alphanumeric characters, beginning with the number 1 or 3 [...]
I figured it would look something like this
/^[13][a-zA-Z0-9]{27,34}/
Thing is, I'm not good with regular expressions and I haven't found a single source to confirm this would not create false negatives.
I've found one online, ^1[1-9A-Za-z][^OIl]{20,40}, but I don't even know what the [^OIl] part means, and it doesn't seem to match the 3 that a Bitcoin address can start with.

^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
will match a string that starts with either 1 or 3 and, after that, 25 to 34 characters of either a-z, A-Z, or 0-9, excluding l, I, O and 0 (not valid characters in a Bitcoin address).
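A minimal C++ sketch of this check with `std::regex` (default ECMAScript syntax); the function name is hypothetical. It tests only the shape of the string, not the Base58Check checksum that makes an address actually valid.

```cpp
#include <regex>
#include <string>

// Shape check only: "1" or "3" followed by 25-34 Base58 characters
// (digits 1-9 and letters except O, I and l). The Base58Check checksum
// is not verified, so a mistyped address can still pass.
bool looks_like_legacy_btc_address(const std::string& s) {
    static const std::regex re("^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$");
    // regex_match must consume the whole string, so ^ and $ are redundant
    // here but harmless; with regex_search they would matter.
    return std::regex_match(s, re);
}
```

For example, `looks_like_legacy_btc_address("1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4")` is true, while the same string with a trailing `0` instead of `4`, or with text before it, is rejected.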

^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
A Bitcoin address is:
- an identifier of 26-35 alphanumeric characters
- beginning with the number 1 or 3
- made up of random digits and uppercase and lowercase letters
- with the exception that the uppercase letter O, uppercase letter I, lowercase letter l, and the number 0 are never used, to prevent visual ambiguity.

[^OIl] matches any character that's not O, I or l. The problems in your regex are:
You don't have a $ at the end, so it would match any string that merely begins with a Bitcoin address.
You didn't count the first character in your {27,34} - that should be {26,33}
However, as mentioned in a comment, a regex is not a good way to validate a bitcoin address.

^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
Based on the new address type Bech32

Based on the answers of runeks and Erhard Dinhobl, I got this regex, which accepts both Bech32 and legacy addresses:
\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b
Including testnet addresses:
\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b
Only testnet:
\b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b

Based on the description here: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki I would say the regex for a Bech32 bitcoin address for Version 1 and Version 0 (only for mainnet) is:
\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b
Here are some other links where I found information:
https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
http://r6.ca/blog/20180106T164028Z.html
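None of these regexes checks the six-character checksum at the end of a Bech32 address, which BIP-173 defines. The following C++ sketch ports the checksum computation described in BIP-173 and also accepts the Bech32m constant from BIP-350, which replaced Bech32 for witness version 1 and later. It checks only case, character set and checksum; it does not check the `bc` prefix, the witness version or the program length, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Bech32Kind { Invalid, Bech32, Bech32m };

// BCH checksum polymod over GF(32), using the generator constants from BIP-173.
std::uint32_t bech32_polymod(const std::vector<std::uint8_t>& values) {
    static const std::uint32_t GEN[5] = {0x3b6a57b2, 0x26508e6d, 0x1ea119fa,
                                         0x3d4233dd, 0x2a1462b3};
    std::uint32_t chk = 1;
    for (std::uint8_t v : values) {
        const std::uint32_t top = chk >> 25;
        chk = ((chk & 0x1ffffff) << 5) ^ v;
        for (int i = 0; i < 5; ++i)
            if ((top >> i) & 1) chk ^= GEN[i];
    }
    return chk;
}

// Checks case, character set and checksum only; it does NOT check the "bc"
// prefix, the witness version or the witness program length.
Bech32Kind bech32_checksum_kind(const std::string& s) {
    static const std::string CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l";
    bool has_lower = false, has_upper = false;
    std::string str;
    for (char c : s) {
        if (c < 33 || c > 126) return Bech32Kind::Invalid;
        if (c >= 'a' && c <= 'z') has_lower = true;
        if (c >= 'A' && c <= 'Z') { has_upper = true; c = static_cast<char>(c - 'A' + 'a'); }
        str += c;
    }
    if (has_lower && has_upper) return Bech32Kind::Invalid;  // mixed case is not allowed

    // The separator is the LAST '1'; at least 6 checksum characters must follow it.
    const std::size_t sep = str.rfind('1');
    if (sep == std::string::npos || sep == 0 || sep + 7 > str.size() || str.size() > 90)
        return Bech32Kind::Invalid;

    // Expand the human-readable part: high 3 bits of each char, a 0, then low 5 bits.
    std::vector<std::uint8_t> values;
    for (std::size_t i = 0; i < sep; ++i)
        values.push_back(static_cast<std::uint8_t>(static_cast<unsigned char>(str[i]) >> 5));
    values.push_back(0);
    for (std::size_t i = 0; i < sep; ++i)
        values.push_back(static_cast<std::uint8_t>(static_cast<unsigned char>(str[i]) & 31));
    // Map each data character (including the checksum) to its 5-bit value.
    for (std::size_t i = sep + 1; i < str.size(); ++i) {
        const std::size_t d = CHARSET.find(str[i]);
        if (d == std::string::npos) return Bech32Kind::Invalid;
        values.push_back(static_cast<std::uint8_t>(d));
    }

    const std::uint32_t pm = bech32_polymod(values);
    if (pm == 1) return Bech32Kind::Bech32;            // BIP-173 constant
    if (pm == 0x2bc830a3) return Bech32Kind::Bech32m;  // BIP-350 constant
    return Bech32Kind::Invalid;
}
```

For example, `bech32_checksum_kind("a12uel5l")` (one of the BIP-173 checksum test vectors) returns `Bech32Kind::Bech32`, while changing its last character to `m` makes it `Invalid`, something no regex can detect.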

As the OP didn't provide a specific use case (only matching criteria), and I came across this while researching methods to detect Bitcoin addresses, I wanted to post back and share with the community.
The regexes provided here will find Bitcoin addresses either at the start and/or the end of a line. My use case was to find Bitcoin addresses in the body of an email, given the rise of blackmail/sextortion (Reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/), so these weren't effective solutions (as outlined later). The proposed regexes catch many false positives (FPs) in email, due to filenames and other identifiers within URLs. I am not knocking the solutions, as they work for certain use cases; they simply don't work for mine. One variation caught many spam emails within a short timeframe of passive alerting (examples follow).
Here are my test cases:
--------------------------------------------------------
BitCoin blackmail formats observed (my org and online):
--------------------------------------------------------
BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4
BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67
Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
--------------------------------------------------------
Other possible BitCoin test cases I added:
--------------------------------------------------------
- What if text comes before and/or after on same line? Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
- Standalone address:
1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72
--------------------------------------------------------
Redacted Body content generating FPs from spam emails:
--------------------------------------------------------
src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"
"cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah
src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg"
src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"
href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah
Some regex samples I tested (I won't list those I'd knock for greedy globbing with backtracking):
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
(Too narrow and misses BitCoin addresses within a paragraph)
(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
(Still misses text after BTC on same line and triples execution time)
\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
(Too broad and catches URL formats)
The regex I am currently evaluating, which catches all my known/crafted sample cases and eliminates the known FPs (specifically, it avoids matching up to an end-of-sentence period, which produced the URL filename FPs):
[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s
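In code, this regex would be used with a search iterator rather than a whole-string match. A minimal C++ sketch, assuming `std::regex` (ECMAScript syntax) and a hypothetical function name; wrapping the address in a capture group keeps the trailing whitespace out of the result.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every candidate address in a block of text (e.g. an email body).
// Group 1 is the address itself; the trailing \s is matched but not kept.
std::vector<std::string> find_btc_candidates(const std::string& text) {
    static const std::regex re("([13][a-km-zA-HJ-NP-Z1-9]{25,34})\\s");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        out.push_back((*it)[1].str());
    return out;
}
```

Because the pattern requires a whitespace character after the address, an address that is the very last thing in the text (with no trailing newline) is not reported.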
One reference point for execution times (shows cost in steps and time): https://regex101.com/
Please feel free to weigh in or provide suggestions on improvements (I am by no means a RegEx master). As I further vet it against email detection of Body content, I will update if other FP cases are observed or more efficient RegEx is derived.
Seth

For mainnet Bitcoin:
/^([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
If you don't need to understand the regex above, you can skip the details below.
Breaking it down:
For regular addresses
/[13]{1}/
The address starts with 1 or 3; {1} means exactly one character from the square brackets (which is already the default, so it could be omitted).
/[13]{1}[a-km-zA-HJ-NP-Z1-9]/
It cannot contain l (lowercase L), I (uppercase i), O (uppercase o) or 0 (zero).
/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/
It can be 27 to 34 characters long. We already checked that the first character is 1 or 3, so the rest of the address is 26 to 33 characters long.
For SegWit
/bc1/
It starts with bc1.
/bc1[a-z0-9]/
It can only contain lowercase letters and digits.
/bc1[a-z0-9]{39,59}/
It can be 42 to 62 characters long. We already checked that the first three characters are bc1, so the rest of the address is 39 to 59 characters long.

I am not into complicated solutions, and this regex served the purpose for the simplest validation, when you just don't want to receive complete nonsense.
\w{25,}

For matching legacy, nested SegWit, and native SegWit addresses:
/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
Source: Regex for Bitcoin Addresses.

convert IPv6 to decimal (ip number)

I've been trying to convert all IP addresses (both IPv4 and IPv6) to decimal format (IP number), store those numbers in the database, which already contains IP ranges, and get the country location based on the user's IP. Although this is easy for IPv4 addresses, I run into a stone wall when it comes to IPv6 ones.
Say the following IP should be converted to decimal:
2a03:29ff:ffff:ffff:ffff:ffff:ffff:ffff
I tested it through some online services (that convert IPv6 to decimal) simply to check the consistency, namely what my final result should look like.
https://www.ultratools.com/tools/decimalCalc
http://www.ipaddressguide.com/ipv6-to-decimal
Both returned the same number: 55844004574745424515003293805316145151
Now, in my ColdFusion code, I first removed the colons from the IP to get the hex string and then tried to convert it to decimal with this:
<cfset ipv6='2a0329ffffffffffffffffffffffffff'>
<cfoutput>#inputBaseN(ipv6, 16)#</cfoutput>
This results in an error message.
Is it possible to achieve this? What do you think about my approach to handling this sort of thing? Is there a better way to get the country location based on an IP? Note: I do not want to rely on any online service!

InputBaseN is trying to convert to an Integer, and that value is too big for the maximum Integer value, hence why the error is claiming it is not a valid number.
(The error is actually only thrown for hex values of 8000000000000000 and higher, i.e. 2^63 or higher, beyond the max for Long; between 2^31 and 2^63-1 the InputBaseN function doesn't tell you it has failed but incorrectly returns zero.)
The solution is to create a BigInteger, which doesn't have a max value, and convert from your base 16 string like so:
<cfset BigInt = createObject("java", "java.math.BigInteger").init(ipv6, 16).toString()>
<cfoutput>#BigInt#</cfoutput>

I think you won't be able to get ColdFusion to generate a decimal that large. You need to do it manually, as a string.
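C++ has no standard arbitrary-precision integer either, so the string approach looks roughly like the sketch below: it keeps the running value as decimal digits and applies value = value * 16 + digit for each hex digit. It assumes the address has already been expanded to all 32 hex digits (any :: shorthand filled in with zero groups) before the colons are removed; the function name is hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Convert an arbitrarily long hex string to decimal, without any big-integer
// library, by keeping the value as a list of decimal digits.
std::string hex_to_decimal(const std::string& hex) {
    std::vector<int> digits{0};  // least significant decimal digit first
    for (char c : hex) {
        int v;
        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else throw std::invalid_argument("not a hex digit");

        // value = value * 16 + v, carried through the decimal digits.
        int carry = v;
        for (int& d : digits) {
            const int x = d * 16 + carry;
            d = x % 10;
            carry = x / 10;
        }
        while (carry > 0) {
            digits.push_back(carry % 10);
            carry /= 10;
        }
    }
    std::string out;
    for (auto it = digits.rbegin(); it != digits.rend(); ++it)
        out += static_cast<char>('0' + *it);
    return out;
}
```

For the address in the question, `hex_to_decimal("2a0329ffffffffffffffffffffffffff")` returns 55844004574745424515003293805316145151, matching the online converters.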

How to set the ASN1 NumericString type to SubjectDN OID?

I have a working program which generates a CSR from a specified SubjectDN string (for example: 2.5.4.3=Name Surname, 1.2.300.38.22=12345678) using the MS Crypto API. I use the function CertStrToName() to encode it, and everything works fine except one thing: all OID values are encoded with the ASN.1 type PrintableString.
Is there any way to make OID 1.2.300.38.22 of type NumericString?

So, I've found two ways to fix that:
1. programmatically, using the function CryptEncodeObject()
2. my crypto provider supports some specific OIDs, so I could use CertStrToName() with them, without touching the code.

Microsoft's CertStrToName() method is not RFC 4514 compliant. Instead of treating #-encodings as the AttributeValue encodings, it treats them as values to be encoded in OctetStrings. This means that not all Distinguished Names can be generated with CertStrToName(); in particular, yours cannot be generated.
The string representation of the distinguished name is the one from RFC 4514: String Representation of Distinguished Names.
Here you can see that if the attribute-type is in the dotted-decimal form, you are actually supposed to encode the attribute-value as a # followed by a BER encoding in hexadecimal of the ASN.1 AttributeValue. I.e.:
2.5.4.3=Name Surname, 1.2.300.38.22=#12083132333435363738
You can also read in the documentation for CertStrToName() that:
A value that starts with a number sign (#) is treated as ASCII
hexadecimal and converted to a CERT_RDN_OCTET_STRING. Embedded white
space is ignored. For example, 1.2.3 = # AB CD 01 is the same as
1.2.3=#ABCD01.
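The `#12083132333435363738` value in the RFC 4514 string above is simply the BER TLV written in hex: tag 0x12 (universal tag 18, NumericString), length 0x08, then the ASCII codes of the eight digits. A minimal C++ sketch that builds such a value for short strings is shown below; the function name is hypothetical and only the one-byte short-form length is handled. As the answer explains, CertStrToName() would still treat the result as an octet string rather than a NumericString.

```cpp
#include <stdexcept>
#include <string>

// Build the RFC 4514 "#hex" form of a BER-encoded ASN.1 NumericString.
// Assumption: contents shorter than 128 bytes, so the length fits in the
// one-byte short form; longer values would need the long form.
std::string numeric_string_rfc4514_value(const std::string& digits) {
    if (digits.size() >= 128) throw std::length_error("long-form length not handled");
    for (char c : digits)
        if (c != ' ' && (c < '0' || c > '9'))
            throw std::invalid_argument("NumericString allows only digits and space");

    static const char HEX[] = "0123456789ABCDEF";
    std::string out = "#";
    auto put = [&out](unsigned char b) {
        out += HEX[b >> 4];
        out += HEX[b & 0x0F];
    };
    put(0x12);                                        // tag: universal 18, NumericString
    put(static_cast<unsigned char>(digits.size()));  // length, short form
    for (char c : digits) put(static_cast<unsigned char>(c));  // ASCII contents
    return out;
}
```

With the value from the question, `numeric_string_rfc4514_value("12345678")` yields `#12083132333435363738`.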

HTTP Response Status-Line maximum size

Quick question: is there a maximum size for the Status-Line of an HTTP response?
In the RFC I could not find this information, just something like this:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
According to this, I could assume:
HTTP-Version is usually 8 Bytes (e.g. HTTP/1.1)
Status-Code is 3 Bytes
2 Spaces + CRLF is 4 Bytes
Reason-Phrase -> The longest according to the RFC is Requested range not satisfiable so 31 Bytes
This would be a sum of 46 Bytes.
Is this assumption correct or did I miss anything?
UPDATE:
Due to the answer below, I just want to specify my problem a bit:
I am parsing a kind of log file containing TCP messages from a server. It contains some random data I don't care about and some HTTP messages I want to read. I parse all the data I get for a \r\n to find the Status-Line. Since I have to assume that the header may be split across several TCP segments, I just buffer all data and parse it.
If there is no maximum size for the Status-Line, I need to buffer all data until the next \r\n occurs. In the worst case, this means I store kilobytes upon kilobytes of random data, since it could (but most likely will not) be part of the Status-Line.
Or would it, in this case, be more appropriate to parse for the HTTP-Version string instead of the CRLF?

RFC 2616, 6.1.1:
The reason phrases listed here are only recommendations -- they MAY be
replaced by local equivalents without affecting the protocol.
Aside from this, the HTTP protocol is "allowed" to add more status codes (in a new RFC) without changing the HTTP version to 1.2, provided that the new codes don't introduce additional requirements on HTTP clients. Clients are supposed to treat an unknown status code as if it were x00 (where x is the first digit of the code they get, indicating the category of response), except that they shouldn't cache the response.
So the only limit is the max length of an HTTP header line or of the response headers in total. As far as I can see, the RFC doesn't define any limit, although specific servers impose their own.
What you can be sure of is that the user-agent may ignore the Reason Phrase entirely. So if it's big, you can read it in small pieces and throw them away one at a time until you reach CRLF. If you want to display a human-readable message, mostly you can use the recommended Reason Phrase for the status code that the server provides, regardless of what Reason Phrase the server sends.

I don't think there is any limit on the length of the Reason-Phrase. The W3C doc states it is a "short message", but that is not a requirement.
I would not assume the Version is 8 characters. A future version could have more digits, e.g. HTTP/10.1. The syntax specifies that the Version is delimited by a SPACE, so I would parse it by stopping at the first SPACE.
https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.
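The advice to stop at the first SPACE rather than assume a fixed-length version can be sketched in C++17 as follows. It assumes the caller has already isolated one line; the struct and function names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct StatusLine {
    std::string version;  // e.g. "HTTP/1.1"; no fixed length is assumed
    int code = 0;         // the three-digit Status-Code
    std::string reason;   // Reason-Phrase: any length, may contain spaces or be empty
};

// Parse one Status-Line (with or without its trailing CRLF).
std::optional<StatusLine> parse_status_line(std::string line) {
    if (line.size() >= 2 && line.compare(line.size() - 2, 2, "\r\n") == 0)
        line.erase(line.size() - 2);

    // HTTP-Version runs up to the first SP.
    const std::size_t sp = line.find(' ');
    if (sp == std::string::npos || line.compare(0, 5, "HTTP/") != 0) return std::nullopt;

    // Exactly three digits follow.
    if (line.size() < sp + 4) return std::nullopt;
    int code = 0;
    for (std::size_t i = sp + 1; i < sp + 4; ++i) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        if (!std::isdigit(c)) return std::nullopt;
        code = code * 10 + (c - '0');
    }

    // Everything after the second SP is the Reason-Phrase, whatever its length.
    StatusLine result;
    result.version = line.substr(0, sp);
    result.code = code;
    if (line.size() > sp + 4) {
        if (line[sp + 4] != ' ') return std::nullopt;
        result.reason = line.substr(sp + 5);
    }
    return result;
}
```

This only parses a line that has already been isolated; when scanning a raw TCP log, the caller still needs its own cap on how many bytes to buffer while waiting for CRLF, since the RFC sets none.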
