python google oauth authentication decode and verify id_token - django

I am trying to implement Google OAuth authentication in my Django project.
I am following the guide here:
https://developers.google.com/accounts/docs/OAuth2Login?hl=de-DE
I have received the response from exchanging the code: a JSON string that contains several fields such as access_token and id_token.
The id_token is a cryptographically signed JSON object encoded in base64.
I tried to decode the id_token with Python's base64 module, but it failed.
I also tried PyJWT, which failed too.
Is there any way to decode and verify it?

I know this is an old post, but I found it via Google, so I thought somebody else might drop in...
I ended up doing:
segments = response['id_token'].split('.')
if len(segments) != 3:
    raise Exception('Wrong number of segments in token: %s' % response['id_token'])
b64string = segments[1]
# Restore the padding that the token strips; -len % 4 adds 0 to 3 '=' characters
padded = b64string + '=' * (-len(b64string) % 4)
# The payload segment is base64url-encoded JSON
decoded = base64.urlsafe_b64decode(padded)

An ID token (a.k.a. JSON Web Signature (JWS)) has 3 parts separated by the . character:
Header.Payload.Signature
We can get each part by splitting the token:
parts = token.split(".")
These parts do not carry the base64 padding: the JWS format uses URL-safe base64 (base64url) with the trailing = characters omitted, whereas Python's base64 library requires them.
The padding character is =, and it must be appended to the base64 string until its length is a multiple of 4 characters. For example, if the string is 14 characters long, it needs the padding == at the end so that it is 16 characters in total.
So the formula to calculate the correct padding (0 when the length is already a multiple of 4) is this:
-len(base64_string) % 4
After we add the right padding and decode the string with the URL-safe decoder:
payload = parts[1]
padded = payload + '=' * (-len(payload) % 4)
base64.urlsafe_b64decode(padded)
what we get is a string representation of a JSON object, which we can parse into a Python dict with:
json.loads(base64.urlsafe_b64decode(padded))
Finally we can put everything in a convenience function:
import base64
import json

def parse_id_token(token: str) -> dict:
    # Decodes the payload only; this does not verify the signature
    parts = token.split(".")
    if len(parts) != 3:
        raise Exception("Incorrect id token format")
    payload = parts[1]
    # Restore the stripped padding so the length is a multiple of 4
    padded = payload + '=' * (-len(payload) % 4)
    # JWT segments use the URL-safe base64 alphabet ('-' and '_')
    decoded = base64.urlsafe_b64decode(padded)
    return json.loads(decoded)
To learn more about ID tokens, see the excellent article by Takahiko Kawasaki (founder of authlete.com).
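As a rough sketch of what "decode and check" can look like with only the standard library, the snippet below decodes the header and payload of a made-up token and checks the iss, aud, and exp claims. The helper names (decode_segment, check_claims, encode_segment), the client ID, and the sample token are all hypothetical, and the signature is deliberately not verified here; a real deployment would still need to verify it against Google's public keys.

```python
import base64
import json
import time

# Issuer values Google documents for ID tokens.
GOOGLE_ISSUERS = {"accounts.google.com", "https://accounts.google.com"}


def decode_segment(segment: str) -> dict:
    """Decode one base64url JWT segment (header or payload) into a dict."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(claims: dict, client_id: str, now=None) -> bool:
    """Check issuer, audience and expiry. Does NOT verify the signature."""
    now = time.time() if now is None else now
    return (
        claims.get("iss") in GOOGLE_ISSUERS
        and claims.get("aud") == client_id
        and claims.get("exp", 0) > now
    )


def encode_segment(data: dict) -> str:
    """Hypothetical helper, used only to build a sample token for the demo."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Made-up token with a fake signature, for illustration only.
sample_claims = {
    "iss": "https://accounts.google.com",
    "aud": "my-client-id",
    "exp": 2000000000,
    "sub": "1234567890",
}
sample_token = ".".join([
    encode_segment({"alg": "RS256", "typ": "JWT"}),
    encode_segment(sample_claims),
    "fake-signature",
])

header_b64, payload_b64, _signature = sample_token.split(".")
header = decode_segment(header_b64)
claims = decode_segment(payload_b64)
claims_ok = check_claims(claims, "my-client-id", now=1700000000)
```

The claim checks only make sense after the signature has been verified, since an unsigned payload can be forged by anyone.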

Well, I figured out why...
I used base64.b64decode(id_token) to decode it.
However, I should split id_token by '.' and decode the parts separately.
That way I can get the header, claims and signature from id_token.
I was just careless to ignore those little '.' characters in the string...

Related

Django query param get stripped if there is (+) sign

Whenever I try to get my query string parameter, everything works except that the + sign gets stripped.
Here is the URL file:
urlpatterns = [
    re_path(r'^forecast/(?P<city>[\w|\W]+)/$', weather_service_api_views.getCurrentWeather)]
Here is the view file:
@api_view(['GET'])
def getCurrentWeather(request, city):
    at = request.GET["at"]
    print(at)
    return JsonResponse({"status": "ok"}, status=200)
So if I hit the server with this URL:
http://192.168.0.5:8282/forecast/Bangladesh/?at=2018-10-14T14:34:40+0100
the output of at is like this:
2018-10-14T14:34:40 0100
Always + sign gets stripped. No other characters get stripped. I have used characters like !, = , - etc.
Since + is a special character (in a query string it is decoded as a space), you have to encode your value. Where to encode it depends on how you generate the value for at. Based on your URLs and endpoints, it looks like you are working on a weather app and the at value is generated by JavaScript. You can encode your values with encodeURIComponent:
let at = encodeURIComponent(<your_existing_logic>)
eg:
let at = encodeURIComponent('2018-10-14T14:34:40+0100')
it will return a result
'2018-10-14T14%3A34%3A40%2B0100'
then in your backend you can get that value with:
at = request.GET.get('at')
it will give you the desired value, 2018-10-14T14:34:40+0100 in this case.
If you are creating your at param in your backend, then there are multiple ways to achieve that. You can look into this solution:
How to percent-encode URL parameters in Python?
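If the at value is built in Python, a minimal sketch using the standard library's urllib.parse could look like the following. The variable names are illustrative only; parse_qs is used here just to mimic how a server decodes the query string.

```python
from urllib.parse import parse_qs, quote, urlencode

at = "2018-10-14T14:34:40+0100"

# Unencoded: the server-side parser turns '+' into a space.
broken = parse_qs("at=" + at)["at"][0]

# quote() with safe="" percent-encodes ':' and '+'.
encoded = quote(at, safe="")

# urlencode() builds the whole query string and encodes the value too.
query = urlencode({"at": at})

# Round trip: the encoded query decodes back to the original value.
restored = parse_qs(query)["at"][0]
```

Using urlencode for the whole parameter dict is usually less error-prone than quoting each value by hand.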

Django: rest api: Not receiving the complete json string when it's very big

I am sending a very long JSON string using
@api_view(['GET'])
def sendlargedata(request):
    ....
    return HttpResponse(json.dumps(all_graphs_data, default=str), status=200, content_type="application/json")
When I check the data in the Firefox response, it says
SyntaxError: JSON.parse: unterminated string at line 1 column 1048577 of the JSON data
So how can I overcome any size or length restrictions and send and receive the data?
Your default=str fallback doesn't work in all cases; I believe you have to escape the values too.

Decoding and encoding JSON in Django

I was following some Django REST framework tutorials and found some code I don't understand. This snippet is from the customised user model of a project that uses JWT for authentication.
As I commented in the snippet, I can't see why they first encode the data and then decode it again. I assume this pattern is not specific to this tutorial but is quite general. Could anyone explain it, please?
def _generate_jwt_token(self):
    """
    Generates a JSON Web Token that stores this user's ID and
    has an expiry date set to 60 days into the future.
    """
    dt = datetime.now() + timedelta(days=60)
    token = jwt.encode({  # first encode here
        'id': self.pk,
        'exp': int(dt.strftime('%s'))
    }, settings.SECRET_KEY, algorithm='HS256')
    return token.decode('utf-8')  # returns decoded object
"Encoding" usually refers to converting data to its binary representation (bytes).
JWT (JSON Web Token) encoding uses a specific data structure and cryptographic signing to allow secure, authenticated exchanges.
The steps to encode data as a JWT are as follows:
1. The payload is converted to JSON and encoded using base64 (the URL-safe variant).
2. A header, specifying the token type (e.g. JWT) and the signature algorithm to use (e.g. HS256), is encoded similarly.
3. A signature is derived from your secret key and the two previous values.
4. The result is obtained by joining header, payload and signature with dots. The output is a binary string (bytes).
More information here.
Decoding it using UTF-8 transforms this binary string into a Unicode string:
>>> encoded_bin = jwt.encode({'some': 'data'}, 'secret_sig', algorithm='HS256')
>>> type(encoded_bin)
<class 'bytes'>
>>> encoded_string = encoded_bin.decode('utf-8')
>>> type(encoded_string)
<class 'str'>
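Notes:
It is not always possible to decode bytes to a string. Base64-encoding your data allows you to store any bytes as a text representation, but the encoded form requires more space (+33%) than its raw representation.
A binary string is prefixed by a b in your Python interpreter (e.g. b"a binary string").
The session above reflects PyJWT versions before 2.0. From PyJWT 2.0 on, jwt.encode returns a str directly, so the .decode('utf-8') call is no longer needed (and fails, because str has no decode method).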
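The encoding steps listed above can be sketched with only the standard library. This is an illustration of the HS256 scheme, not PyJWT's actual implementation; the function names b64url and encode_hs256 are made up for this example, and real code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url(raw: bytes) -> bytes:
    """URL-safe base64 without the trailing '=' padding, as JWTs use."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=")


def encode_hs256(payload: dict, secret: str) -> bytes:
    """Build header.payload.signature as bytes, signed with HMAC-SHA256."""
    header = {"typ": "JWT", "alg": "HS256"}
    # Steps 1 and 2: JSON-serialise, then base64url-encode header and payload.
    header_b64 = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    payload_b64 = b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    signing_input = header_b64 + b"." + payload_b64
    # Step 3: the signature is an HMAC over "header.payload" with the secret.
    signature = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    # Step 4: join the three parts with dots.
    return signing_input + b"." + b64url(signature)


token_bytes = encode_hs256({"some": "data"}, "secret_sig")
token_str = token_bytes.decode("utf-8")  # bytes -> str, as in the tutorial
```

Because every segment only contains base64url characters and dots, decoding the bytes as UTF-8 (or ASCII) always succeeds here.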

Latin1/UTF-8 Encoding Problems in AngularJS

I have a Python 2.7 Django + AngularJS app. There's an input field that feeds into the data model and the data is sent to the server using Angular's $http. When the input field contains the character "é", Django doesn't like it. When I use "★é" Django has no problem with it. It seems to me that the star character being outside the latin1 charset forces the encoding to utf-8, while when the only non-latin character is "é", Angular sends the data as latin1, which confuses my python code.
The error message from Django is:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
Telling the simplejson.loads() function on the server to read the data using the ISO-8859-1 (latin1) encoding worked fine when my input string contained just the é in it and no star, so that proves that the data coming from the browser is latin1 unless forced to utf-8 by non-latin1 characters, like the star.
Is there a way to tell Angular to always send data using utf-8?
The Angular code that sends the data to the server:
$http({
    url: $scope.dataUrl,
    method: 'POST',
    data: JSON.stringify({recipe: recipe}),
    headers: {'Content-Type': 'application/json'}
}).success(...).error(...);
The Django code that reads the data:
recipe = simplejson.loads(request.raw_post_data)['recipe']
I found one way that works, using the transformRequest config parameter.
transformRequest: function (data, headersGetter) {
    return encode_utf8(JSON.stringify(data));
}
function encode_utf8(s) {
    return unescape(encodeURIComponent(s));
}
I'm using the encode function found and explained at http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html and the JSON library found at http://www.JSON.org/json2.js.
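The byte-level difference behind the UnicodeDecodeError above can be reproduced in a few lines. This is a Python 3 sketch (the app in the question runs Python 2.7), and try_utf8 is a hypothetical helper, not part of Django or simplejson.

```python
E_ACUTE = "\u00e9"  # the character é
STAR = "\u2605"     # the character ★

latin1_body = E_ACUTE.encode("latin-1")  # one byte: 0xe9
utf8_body = E_ACUTE.encode("utf-8")      # two bytes: 0xc3 0xa9


def try_utf8(raw: bytes):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


from_latin1 = try_utf8(latin1_body)  # fails: a lone 0xe9 is not valid UTF-8
from_utf8 = try_utf8(utf8_body)      # succeeds

# The star has no Latin-1 encoding, so a body containing it cannot be Latin-1.
try:
    STAR.encode("latin-1")
    star_fits_latin1 = True
except UnicodeEncodeError:
    star_fits_latin1 = False
```

This matches the observation in the question: é alone fits in Latin-1 and arrives as the single byte 0xe9, while a string containing ★ cannot be sent as Latin-1 at all.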

Removing invalid characters from amazon cloud search sdf

While trying to post the data extracted from a PDF file to an Amazon CloudSearch domain for indexing, the indexing failed due to invalid characters in the data.
How can I remove these invalid characters before posting to the search endpoint?
I tried escaping and replacing the characters, but that didn't work.
I was getting an error like this when uploading document to CloudSearch (using aws sdk / json):
Error with source for field content_stemmed: Validation error for field 'content_stemmed': Invalid codepoint B
The solution for me, as documented by AWS (reference below), was to remove invalid characters from the document prior to uploading:
For example this is what I did using javascript:
const cleaned = someFieldValue.replace(
    /[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]/g,
    ''
)
ref:
Both JSON and XML batches can only contain UTF-8 characters that are valid in XML. Valid characters are the control characters tab (0009), carriage return (000D), and line feed (000A), and the legal characters of Unicode and ISO/IEC 10646. FFFE, FFFF, and the surrogate blocks D800–DBFF and DC00–DFFF are invalid and will cause errors.
You can use the following regular expression to match invalid characters so you can remove them: /[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]/
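For a Python 3 backend, a sketch of the same character class might look like this. The function name strip_invalid_chars is made up for this example. Note that the pattern mirrors the quoted expression exactly, so it also drops characters above U+FFFF (for example emoji); whether CloudSearch accepts those is not something this sketch assumes.

```python
import re

# Same character class as the regular expression quoted above:
# keep tab, LF, CR, U+0020-U+D7FF and U+E000-U+FFFD; drop everything else.
INVALID_CHARS = re.compile(r"[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]")


def strip_invalid_chars(text: str) -> str:
    """Remove characters that the quoted pattern treats as invalid."""
    return INVALID_CHARS.sub("", text)


cleaned = strip_invalid_chars("text\u001a with\u000b control\tchars\ufffe")
```

Here the control characters U+001A and U+000B and the noncharacter U+FFFE are removed, while the tab is kept.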
I have fixed the problem using the solution available here:
# Python 2 code (unichr and u'' literals)
import re

# Control characters, U+FFFE/U+FFFF, and unpaired surrogates
RE_XML_ILLEGAL = u'([\u0000-\u0008\u000b-\u000c\u000e-\u001f\ufffe-\uffff])' + \
                 u'|' + \
                 u'([%s-%s][^%s-%s])|([^%s-%s][%s-%s])|([%s-%s]$)|(^[%s-%s])' % \
                 (unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff),
                  unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff),
                  unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff))
x = u"<foo>text\u001a</foo>"
# Replace each illegal character with "?"
x = re.sub(RE_XML_ILLEGAL, "?", x)