I would like to serve my visitors the best results possible when they use our search feature.
To achieve this I would like to interpret the search query.
For example, a user searches for 'red beds for kids 120cm'.
I would like to interpret it as follows:
Category filter is "beds" AND "children"
Color filter is red
Size filter is 120cm
Are there ready-to-go tools for this in Elasticsearch?
Will I need NLP in front of Elasticsearch?
Elasticsearch is pretty powerful on its own and is very much capable of returning the most relevant results to full-text search queries, provided that data is indexed and queried adequately.
Under the hood, it always performs text analysis for full-text searches (on fields of type text). A text analyzer consists of zero or more character filters, exactly one tokenizer, and zero or more token filters.
For instance, a synonym token filter can replace kids with children in the user query.
On top of that, search queries on modern websites are often facilitated via category selectors in the UI, which can easily be implemented by querying keyword fields in Elasticsearch.
It might be enough to model your data correctly and tune its indexing to implement the search you need; and if that is not enough, you can always add an extra layer of NLP-like logic on the client side, as 2ps suggested.
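As a rough idea of what such a client-side layer might look like, here is a hypothetical C++ sketch. The struct, the function name `interpret` and all vocabularies are assumptions for illustration only; a real shop would derive the vocabularies from its catalogue (for example, the distinct values of the keyword fields). It maps the example query onto category, color and size filters and keeps unknown words for full-text search.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of interpreting a free-text query.
struct InterpretedQuery {
    std::vector<std::string> categories;  // candidate Category filters
    std::vector<std::string> colors;      // candidate Color filters
    std::optional<int> lengthCm;          // candidate Size.LengthCM filter
    std::vector<std::string> unmatched;   // left over for full-text search
};

// Illustrative vocabularies (assumptions, not real catalogue data).
InterpretedQuery interpret(const std::string& query) {
    static const std::set<std::string> colors = {"red", "blue"};
    static const std::set<std::string> categories = {"beds", "couches", "children", "adult", "family"};
    static const std::set<std::string> stopwords = {"for", "a", "the"};
    static const std::map<std::string, std::string> synonyms = {{"kid", "children"}, {"kids", "children"}};
    static const std::regex size(R"((\d+)cm)");  // e.g. "120cm"

    InterpretedQuery result;
    std::istringstream words(query);
    std::string word;
    while (words >> word) {  // naive whitespace tokenization
        if (const auto it = synonyms.find(word); it != synonyms.end()) word = it->second;
        std::smatch match;
        if (std::regex_match(word, match, size)) {
            result.lengthCm = std::stoi(match[1].str());
        } else if (colors.count(word) != 0) {
            result.colors.push_back(word);
        } else if (categories.count(word) != 0) {
            result.categories.push_back(word);
        } else if (stopwords.count(word) == 0) {
            result.unmatched.push_back(word);
        }
    }
    return result;
}
```

For 'red beds for kids 120cm' this yields the categories beds and children, the color red and a length of 120, which could then be sent to Elasticsearch as term filters on Category, Color and Size.LengthCM.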
Now let me show a toy example of what you can achieve with a synonym token filter and the copy_to feature.
Let's define the mapping
Let's pretend that our products are characterized by the following properties: Category, Color, and Size.LengthCM.
The mapping will look something like:
PUT /my_index
{
"mappings": {
"properties": {
"Category": {
"type": "keyword",
"copy_to": "DescriptionAuto"
},
"Color": {
"type": "keyword",
"copy_to": "DescriptionAuto"
},
"Size": {
"properties": {
"LengthCM": {
"type": "integer",
"copy_to": "DescriptionAuto"
}
}
},
"DescriptionAuto": {
"type": "text",
"analyzer": "MySynonymAnalyzer"
}
}
},
"settings": {
"index": {
"analysis": {
"analyzer": {
"MySynonymAnalyzer": {
"tokenizer": "standard",
"filter": [
"MySynonymFilter"
]
}
},
"filter": {
"MySynonymFilter": {
"type": "synonym",
"lenient": true,
"synonyms": [
"kid, kids => children"
]
}
}
}
}
}
}
Notice that we selected type keyword for the fields Category and Color.
Now, what about copy_to and synonym?
What will copy_to do?
Every time we send a document for indexing into our index, the values of the fields Category, Color and Size.LengthCM will be copied to the full-text field DescriptionAuto. This is what copy_to does.
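Conceptually, copy_to gathers the values of several source fields into one extra field, which is then analyzed with that field's analyzer, while _source stays unchanged. The toy C++ model below is not how Elasticsearch implements it; the `Product` struct and the `descriptionAuto` function are hypothetical and only show which values end up in DescriptionAuto for one document.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory model of one product document.
struct Product {
    std::vector<std::string> category;  // keyword field "Category"
    std::string color;                  // keyword field "Color"
    int lengthCm;                       // integer field "Size.LengthCM"
};

// Collect the values that copy_to would route into "DescriptionAuto".
// The integer is copied as its textual value before analysis.
std::vector<std::string> descriptionAuto(const Product& p) {
    std::vector<std::string> values(p.category.begin(), p.category.end());
    values.push_back(p.color);
    values.push_back(std::to_string(p.lengthCm));
    return values;
}
```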
What will the synonym filter do?
To enable synonyms, we need to define a custom analyzer: see MySynonymAnalyzer, which we defined under "settings" above.
Roughly, it will replace every token that matches something on the left of => with the token on the right.
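To make the explicit-mapping semantics concrete, here is a small standalone C++ sketch (not Elasticsearch code) that parses a rule such as kid, kids => children into a lookup table and applies it token by token. It is deliberately simplified: single-word terms only, no escaping, one rule per call; `parseRule` and `applySynonyms` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Strip leading/trailing spaces.
std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(' ');
    if (b == std::string::npos) return {};
    const auto e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Parse "a, b => c" into {a -> c, b -> c}; returns an empty table if there is no "=>".
std::map<std::string, std::string> parseRule(const std::string& rule) {
    std::map<std::string, std::string> table;
    const auto arrow = rule.find("=>");
    if (arrow == std::string::npos) return table;
    const std::string target = trim(rule.substr(arrow + 2));
    std::istringstream lhs(rule.substr(0, arrow));
    std::string term;
    while (std::getline(lhs, term, ',')) table[trim(term)] = target;
    return table;
}

// Replace every token found on the left-hand side with the right-hand side.
std::vector<std::string> applySynonyms(const std::vector<std::string>& tokens,
                                       const std::map<std::string, std::string>& table) {
    std::vector<std::string> out;
    out.reserve(tokens.size());
    for (const auto& token : tokens) {
        const auto it = table.find(token);
        out.push_back(it != table.end() ? it->second : token);
    }
    return out;
}
```

Applied to the tokens red, bed, for, kids, 120cm, this sketch replaces only kids, mirroring the _analyze output shown further down.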
What will the documents look like?
Let's insert a few example documents:
POST /my_index/_doc
{
"Category": [
"beds",
"adult"
],
"Color": "red",
"Size": {
"LengthCM": 150
}
}
POST /my_index/_doc
{
"Category": [
"beds",
"children"
],
"Color": "red",
"Size": {
"LengthCM": 120
}
}
POST /my_index/_doc
{
"Category": [
"couches",
"adult",
"family"
],
"Color": "blue",
"Size": {
"LengthCM": 200
}
}
POST /my_index/_doc
{
"Category": [
"couches",
"adult",
"family"
],
"Color": "red",
"Size": {
"LengthCM": 200
}
}
As you can see, DescriptionAuto is not present in the original documents - though due to copy_to we will be able to query it.
Let's see how.
Performing the search!
Now we can try out our index with a simple query_string query:
POST /my_index/_search
{
"query": {
"query_string": {
"query": "red beds for kids 120cm",
"default_field": "DescriptionAuto"
}
}
}
The results will look something like the following:
"hits": {
...
"max_score": 2.3611186,
"hits": [
{
...
"_score": 2.3611186,
"_source": {
"Category": [
"beds",
"children"
],
"Color": "red",
"Size": {
"LengthCM": 120
}
}
},
{
...
"_score": 1.0998137,
"_source": {
"Category": [
"beds",
"adult"
],
"Color": "red",
"Size": {
"LengthCM": 150
}
}
},
{
...
"_score": 0.34116736,
"_source": {
"Category": [
"couches",
"adult",
"family"
],
"Color": "red",
"Size": {
"LengthCM": 200
}
}
}
]
}
The document with categories beds and children and color red is on top, and its relevance score is more than twice that of the runner-up!
How can I check how Elasticsearch interpreted the user's query?
It is easy to do via the _analyze API:
POST /my_index/_analyze
{
"text": "red bed for kids 120cm",
"analyzer": "MySynonymAnalyzer"
}
{
"tokens": [
{
"token": "red",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "bed",
"start_offset": 4,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "for",
"start_offset": 8,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "children",
"start_offset": 12,
"end_offset": 16,
"type": "SYNONYM",
"position": 3
},
{
"token": "120cm",
"start_offset": 17,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 4
}
]
}
As you can see, there is no token kids, but there is token children.
Note, though, that in this example Elasticsearch was not able to parse the size of the bed: the token 120cm doesn't match anything, since all sizes are indexed as integers like 120, 150, etc. Another layer of tweaking will be needed to extract 120 from the 120cm token.
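One possible extra layer is to split tokens where letters and digits meet, so that 120cm yields 120 and cm. Elasticsearch's word_delimiter_graph token filter can split on letter-number transitions (its split_on_numerics option); check its documentation before relying on it. The C++ function below is only a standalone illustration of that splitting rule, not Elasticsearch code, and the name `splitOnNumerics` is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// True if c is a decimal digit (the cast avoids UB for negative char values).
bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// Split a token wherever it switches between digits and non-digits,
// e.g. "120cm" -> {"120", "cm"}.
std::vector<std::string> splitOnNumerics(const std::string& token) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : token) {
        if (!current.empty() && isDigit(c) != isDigit(current.back())) {
            parts.push_back(current);
            current.clear();
        }
        current += c;
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}
```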
I hope this gives an idea of what can be achieved with Elasticsearch's built-in text analysis capabilities!
I'm attempting to use Boost to read a JSON file called sessionstore.js from my Firefox configuration folder, where information on the current/last Firefox session is saved for recovery purposes. I've written a program based on the XML tutorial from the Boost website, simply swapping out the XML parts for the JSON parts. It can be seen below:
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/foreach.hpp>
#include <iostream>
#include <string>
#include <set>
#include <exception>
using boost::property_tree::ptree;
using namespace std;
const string FILENAME = "sessionstore.js";
const string WINDOW_TAG = "windows";
struct session_settings
{
// Parameter renamed so it no longer shadows the global FILENAME.
void load (const string &filename);
};
void session_settings::load (const string &filename)
{
ptree pt;
// read_json throws an exception derived from std::exception if parsing fails.
read_json (filename, pt);
}
int main()
{
try
{
session_settings Settings;
Settings.load(FILENAME);
}
catch (const exception &e)
{
cout << "Error: " << e.what() << endl;
}
return 0;
}
The contents of the JSON file I'm trying to read are
{"windows":[{"tabs":[{"entries":[{"url":"about:home","title":"Mozilla Firefox Start Page","ID":5,"docshellID":11,"owner_b64":"NhAra3tiRRqhyKDUVsktxQAAAAAAAAAAwAAAAAAAAEYAAQAAAAAAAS8nfAAOr03buTZBMmukiq4HoizADOUR05MxABBLoP1AAAAAAAVhYm91dAAAAARob21l4NodcC97EdOM0ABgsPwUoweiLMAM5RHTkzEAEEug/UAAAAAADm1vei1zYWZlLWFib3V0AAAABGhvbWUAAAA=","docIdentifier":5},{"url":"http://www.google.co.uk/","title":"Google","ID":6,"docshellID":11,"docIdentifier":6,"children":[{"url":"about:blank","ID":7,"docshellID":12,"owner_b64":"NhAra3tiRRqhyKDUVsktxQAAAAAAAAAAwAAAAAAAAEYAAQAAAAAAAd6UctCANBHTk5kAEEug/UAHoizADOUR05MxABBLoP1AAAAAAv////8AAABQAQAAABhodHRwOi8vd3d3Lmdvb2dsZS5jby51ay8AAAAAAAAABAAAAAcAAAAQAAAAB/////8AAAAH/////wAAAAcAAAAQAAAAFwAAAAEAAAAXAAAAAQAAABcAAAABAAAAGAAAAAAAAAAY/////wAAABf/////AAAAF/////8AAAAX/////wEAAAAAAAAAAAABAAA=","docIdentifier":7,"scroll":"0,0"}],"formdata":{"#csi":"1","#hcache":"{\"BInSTfL-EtSt8QOl24nrCg\":[[69,{}],[14,{}],[60,{}],[81,{\"persisted\":true}],[42,{}],[43,{}],[83,{}],[95,{\"kfe\":{\"kfeHost\":\"clients1.google.co.uk\",\"kfeUrlPrefix\":\"/webpagethumbnail?r=2&f=2&s=300:585&query=&hl=en&gl=uk\",\"maxPrefetchConnections\":2,\"prefetch\":90,\"slowConnection\":false},\"logging\":{\"csiFraction\":0.05,\"gen204Fraction\":0.05},\"msgs\":{\"loading\":\"Still loading...\",\"mute\":\"Mute\",\"noPreview\":\"Preview not available\",\"sound\":\"Sound:\",\"soundOff\":\"off\",\"soundOn\":\"on\",\"unmute\":\"Unmute\"},\"pb\":{\"desiredHeight\":585,\"desiredWidth\":300,\"minHeight\":200,\"minWidth\":300},\"time\":{\"hoverClose\":300,\"hoverModeTimeout\":60,\"hoverOpen\":125,\"loading\":100,\"longHoverOpen\":725,\"prefetchOnLoad\":3000,\"timeout\":2500}}],[78,{}],[25,{\"m\":{\"bks\":true,\"blg\":true,\"dsc\":true,\"evn\":true,\"frm\":true,\"isch\":true,\"klg\":true,\"mbl\":true,\"nws\":true,\"plcs\":true,\"ppl\":true,\"prc\":true,\"pts\":true,\"rcp\":true,\"shop\":true,\"vid\":true},\"t\":null}],[64,{}],[105,{}],[22,{\"m_errors\":{\"32\":\"Sorry, no more results to 
show.\",\"default\":\"<font color=red>Error:</font> The server could not complete your request. Try again in 30 seconds.\"},\"m_tip\":\"Click for more information\"}],[77,{}],[84,{}],[99,{}],[29,{\"mcr\":5}],[92,{\"avgTtfc\":2000,\"fd\":1000,\"fl\":true,\"focus\":true,\"hpt\":250,\"kn\":true,\"mds\":\"clir,clue,dfn,evn,frim,klg,prc,rl,show,sp,sts,ww,mbl_he,mbl_hs,mbl_re,mbl_rs,mbl_sv,isch\",\"msg\":{\"dym\":\"Did you mean:\",\"gs\":\"Google Search\",\"kntt\":\"Use the up and down arrow keys to select each result. Press Enter to go to the selection.\",\"sif\":\"Search instead for\",\"srf\":\"Showing results for\"},\"odef\":true,\"ophe\":true,\"pq\":true,\"rpt\":41,\"tct\":\" ?\",\"tdur\":50}],[24,{}],[38,{}]]}"},"scroll":"0,0"}],"index":2,"hidden":false,"attributes":{"image":"http://www.google.co.uk/favicon.ico"},"storage":{"http://www.google.co.uk":{"web-v":"12_c9c918f0"}}}],"selected":1,"_closedTabs":[],"width":994,"height":688,"screenX":1650,"screenY":24,"sizemode":"normal","title":"Google"}],"selectedWindow":0,"_closedWindows":[{"tabs":[{"entries":[{"url":"about:home","title":"Mozilla Firefox Start Page","ID":0,"docshellID":5,"owner_b64":"NhAra3tiRRqhyKDUVsktxQAAAAAAAAAAwAAAAAAAAEYAAQAAAAAAAS8nfAAOr03buTZBMmukiq4HoizADOUR05MxABBLoP1AAAAAAAVhYm91dAAAAARob21l4NodcC97EdOM0ABgsPwUoweiLMAM5RHTkzEAEEug/UAAAAAADm1vei1zYWZlLWFib3V0AAAABGhvbWUAAAA="},{"url":"http://www.facebook.com/","title":"Welcome to Facebook - Log In, Sign Up or Learn 
More","ID":1,"docshellID":5,"docIdentifier":1,"formdata":{"//xhtml:div[#id='reg_form_box']/xhtml:table/xhtml:tbody/xhtml:tr[6]/xhtml:td[2]/xhtml:div/xhtml:div/xhtml:select":0,"//xhtml:div[#id='reg_form_box']/xhtml:table/xhtml:tbody/xhtml:tr[6]/xhtml:td[2]/xhtml:div/xhtml:div/xhtml:select[2]":0,"#sex":0,"#birthday_month":0,"#birthday_day":0,"#birthday_year":0},"scroll":"0,0"}],"index":2,"hidden":false,"attributes":{"image":"http://www.facebook.com/favicon.ico"}},{"entries":[{"url":"http://twitter.com/","title":"Twitter","ID":3,"docshellID":6,"docIdentifier":3,"children":[{"url":"http://api.twitter.com/receiver.html","ID":4,"docshellID":7,"referrer":"http://twitter.com/","docIdentifier":4,"scroll":"0,0"}],"formdata":{},"scroll":"0,0"}],"index":1,"hidden":false,"attributes":{"image":"http://twitter.com/phoenix/favicon.ico"}}],"selected":2,"_closedTabs":[],"width":994,"height":688,"screenX":1366,"screenY":307,"sizemode":"normal","cookies":[{"host":".facebook.com","value":"J4-69","path":"/","name":"lsd"},{"host":".facebook.com","value":"http%3A%2F%2Fwww.facebook.com%2F","path":"/","name":"reg_fb_gate"},{"host":".facebook.com","value":"http%3A%2F%2Fwww.facebook.com%2F","path":"/","name":"reg_fb_ref"},{"host":".facebook.com","value":"994x624","path":"/","name":"wd"},{"host":".twitter.com","value":"43838368","path":"/","name":"__utmc"},{"host":"twitter.com","value":"4bfz%2B%2BmebEkRkMWFCXm%2FCUOsvDoVeFTl","path":"/","name":"original_referer"},{"host":"scribe.twitter.com","value":"4bfz%2B%2BmebEkRkMWFCXm%2FCUOsvDoVeFTl","path":"/","name":"original_referer"},{"host":".twitter.com","value":"BAh7CToPY3JlYXRlZF9hdGwrCDoVZ%252F4vAToMY3NyZl9pZCIlODE2MGI1ZjJh%250AYmViNDMwODMxNDlkN2U5ZDg5Yjk4ZmU6B2lkIiU2N2I4YjdmNGExNWFkNzlk%250AODI0MDVjMGM1NmMzYjVhYSIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6%250ARmxhc2g6OkZsYXNoSGFzaHsABjoKQHVzZWR7AA%253D%253D--8b0d751e9774c5cfaa61fdec567cb782aa8757dd","path":"/","name":"_twitter_sess","httponly":true},{"host":".twitter.com","value":"43838368","path":"/
","name":"__utmc"},{"host":"twitter.com","value":"4bfz%2B%2BmebEkRkMWFCXm%2FCUOsvDoVeFTl","path":"/","name":"original_referer"},{"host":"scribe.twitter.com","value":"4bfz%2B%2BmebEkRkMWFCXm%2FCUOsvDoVeFTl","path":"/","name":"original_referer"},{"host":".twitter.com","value":"BAh7CToPY3JlYXRlZF9hdGwrCDoVZ%252F4vAToMY3NyZl9pZCIlODE2MGI1ZjJh%250AYmViNDMwODMxNDlkN2U5ZDg5Yjk4ZmU6B2lkIiU2N2I4YjdmNGExNWFkNzlk%250AODI0MDVjMGM1NmMzYjVhYSIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6%250ARmxhc2g6OkZsYXNoSGFzaHsABjoKQHVzZWR7AA%253D%253D--8b0d751e9774c5cfaa61fdec567cb782aa8757dd","path":"/","name":"_twitter_sess","httponly":true}],"title":"Twitter"}],"session":{"state":"stopped","lastUpdate":1305658398727}}
and when I tried to load that with my program I got the error
Error: sessionstore.js(1): expected value
Since the file is formatted all on a single line, the error could be anywhere in it, so I ran it through a JavaScript beautifier with the default options, pasted the result back into the original file and executed the program again.
The formatted JSON is
{
"windows": [{
"tabs": [{
"entries": [{
"url": "about:home",
"title": "Mozilla Firefox Start Page",
"ID": 5,
"docshellID": 11,
"owner_b64": "NhAra3tiRRqhyKDUVsktxQAAAAAAAAAAwAAAAAAAAEYAAQAAAAAAAS8nfAAOr03buTZBMmukiq4HoizADOUR05MxABBLoP1AAAAAAAVhYm91dAAAAARob21l4NodcC97EdOM0ABgsPwUoweiLMAM5RHTkzEAEEug/UAAAAAADm1vei1zYWZlLWFib3V0AAAABGhvbWUAAAA=",
"docIdentifier": 5
}, {
"url": "http://www.google.co.uk/",
"title": "Google",
"ID": 6,
"docshellID": 11,
"docIdentifier": 6,
"children": [{
"url": "about:blank",
"ID": 7,
"docshellID": 12,
"owner_b64": "NhAra3tiRRqhyKDUVsktxQAAAAAAAAAAwAAAAAAAAEYAAQAAAAAAAd6UctCANBHTk5kAEEug/UAHoizADOUR05MxABBLoP1AAAAAAv////8AAABQAQAAABhodHRwOi8vd3d3Lmdvb2dsZS5jby51ay8AAAAAAAAABAAAAAcAAAAQAAAAB/////8AAAAH/////wAAAAcAAAAQAAAAFwAAAAEAAAAXAAAAAQAAABcAAAABAAAAGAAAAAAAAAAY/////wAAABf/////AAAAF/////8AAAAX/////wEAAAAAAAAAAAABAAA=",
"docIdentifier": 7,
"scroll": "0,0"
}],
"formdata": {
"#csi": "1",
"#hcache": "{\"BInSTfL-EtSt8QOl24nrCg\":[[69,{}],[14,{}],[60,{}],[81,{\"persisted\":true}],[42,{}],[43,{}],[83,{}],[95,{\"kfe\":{\"kfeHost\":\"clients1.google.co.uk\",\"kfeUrlPrefix\":\"/webpagethumbnail?r=2&f=2&s=300:585&query=&hl=en&gl=uk\",\"maxPrefetchConnections\":2,\"prefetch\":90,\"slowConnection\":false},\"logging\":{\"csiFraction\":0.05,\"gen204Fraction\":0.05},\"msgs\":{\"loading\":\"Still loading...\",\"mute\":\"Mute\",\"noPreview\":\"Preview not available\",\"sound\":\"Sound:\",\"soundOff\":\"off\",\"soundOn\":\"on\",\"unmute\":\"Unmute\"},\"pb\":{\"desiredHeight\":585,\"desiredWidth\":300,\"minHeight\":200,\"minWidth\":300},\"time\":{\"hoverClose\":300,\"hoverModeTimeout\":60,\"hoverOpen\":125,\"loading\":100,\"longHoverOpen\":725,\"prefetchOnLoad\":3000,\"timeout\":2500}}],[78,{}],[25,{\"m\":{\"bks\":true,\"blg\":true,\"dsc\":true,\"evn\":true,\"frm\":true,\"isch\":true,\"klg\":true,\"mbl\":true,\"nws\":true,\"plcs\":true,\"ppl\":true,\"prc\":true,\"pts\":true,\"rcp\":true,\"shop\":true,\"vid\":true},\"t\":null}],[64,{}],[105,{}],[22,{\"m_errors\":{\"32\":\"Sorry, no more results to show.\",\"default\":\"<font color=red>Error:</font> The server could not complete your request. Try again in 30 seconds.\"},\"m_tip\":\"Click for more information\"}],[77,{}],[84,{}],[99,{}],[29,{\"mcr\":5}],[92,{\"avgTtfc\":2000,\"fd\":1000,\"fl\":true,\"focus\":true,\"hpt\":250,\"kn\":true,\"mds\":\"clir,clue,dfn,evn,frim,klg,prc,rl,show,sp,sts,ww,mbl_he,mbl_hs,mbl_re,mbl_rs,mbl_sv,isch\",\"msg\":{\"dym\":\"Did you mean:\",\"gs\":\"Google Search\",\"kntt\":\"Use the up and down arrow keys to select each result. Press Enter to go to the selection.\",\"sif\":\"Search instead for\",\"srf\":\"Showing results for\"},\"odef\":true,\"ophe\":true,\"pq\":true,\"rpt\":41,\"tct\":\" ?\",\"tdur\":50}],[24,{}],[38,{}]]}"
},
"scroll": "0,0"
}],
"index": 2,
"hidden": false,
"attributes": {
"image": "http://www.google.co.uk/favicon.ico"
},
"storage": {
"http://www.google.co.uk": {
"web-v": "12_c9c918f0"
}
}
}],
"selected": 1,
"_closedTabs": [],
"width": 994,
"height": 688,
"screenX": 1650,
"screenY": 24,
"sizemode": "normal",
"title": "Google"
}],
"selectedWindow": 0,
"_closedWindows": [{
"tabs": [{
"entries": [{
"url": "about:home",
"title": "Mozilla Firefox Start Page",
"ID": 0,
"docshellID": 5,
"owner_b64": "NhAra3tiRRqhyKDUVsktxQAAAAAAAAAAwAAAAAAAAEYAAQAAAAAAAS8nfAAOr03buTZBMmukiq4HoizADOUR05MxABBLoP1AAAAAAAVhYm91dAAAAARob21l4NodcC97EdOM0ABgsPwUoweiLMAM5RHTkzEAEEug/UAAAAAADm1vei1zYWZlLWFib3V0AAAABGhvbWUAAAA="
}, {
"url": "http://www.facebook.com/",
"title": "Welcome to Facebook - Log In, Sign Up or Learn More",
"ID": 1,
"docshellID": 5,
"docIdentifier": 1,
"formdata": {
"//xhtml:div[#id='reg_form_box']/xhtml:table/xhtml:tbody/xhtml:tr[6]/xhtml:td[2]/xhtml:div/xhtml:div/xhtml:select": 0,
"//xhtml:div[#id='reg_form_box']/xhtml:table/xhtml:tbody/xhtml:tr[6]/xhtml:td[2]/xhtml:div/xhtml:div/xhtml:select[2]": 0,
"#sex": 0,
"#birthday_month": 0,
"#birthday_day": 0,
"#birthday_year": 0
},
"scroll": "0,0"
}],
"index": 2,
"hidden": false,
"attributes": {
"image": "http://www.facebook.com/favicon.ico"
}
}, {
"entries": [{
"url": "http://twitter.com/",
"title": "Twitter",
"ID": 3,
"docshellID": 6,
"docIdentifier": 3,
"children": [{
"url": "http://api.twitter.com/receiver.html",
"ID": 4,
"docshellID": 7,
"referrer": "http://twitter.com/",
"docIdentifier": 4,
"scroll": "0,0"
}],
"formdata": {},
"scroll": "0,0"
}],
"index": 1,
"hidden": false,
"attributes": {
"image": "http://twitter.com/phoenix/favicon.ico"
}
}],
"selected": 2,
"_closedTabs": [],
"width": 994,
"height": 688,
"screenX": 1366,
"screenY": 307,
"sizemode": "normal",
"cookies": [{
"host": ".facebook.com",
"value": "J4-69",
"path": "/",
"name": "lsd"
}, {
"host": ".facebook.com",
"value": "http%3A%2F%2Fwww.facebook.com%2F",
"path": "/",
"name": "reg_fb_gate"
}, {
"host": ".facebook.com",
"value": "http%3A%2F%2Fwww.facebook.com%2F",
"path": "/",
"name": "reg_fb_ref"
}, {
"host": ".facebook.com",
"value": "994x624",
"path": "/",
"name": "wd"
}, {
"host": ".twitter.com",
"value": "43838368",
"path": "/",
"name": "__utmc"
}, {
"host": "twitter.com",
"value": "4bfz%2B%2BmebEkRkMWFCXm%2FCUOsvDoVeFTl",
"path": "/",
"name": "original_referer"
}, {
"host": "scribe.twitter.com",
"value": "4bfz%2B%2BmebEkRkMWFCXm%2FCUOsvDoVeFTl",
"path": "/",
"name": "original_referer"
}, {
"host": ".twitter.com",
"value": "BAh7CToPY3JlYXRlZF9hdGwrCDoVZ%252F4vAToMY3NyZl9pZCIlODE2MGI1ZjJh%250AYmViNDMwODMxNDlkN2U5ZDg5Yjk4ZmU6B2lkIiU2N2I4YjdmNGExNWFkNzlk%250AODI0MDVjMGM1NmMzYjVhYSIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6%250ARmxhc2g6OkZsYXNoSGFzaHsABjoKQHVzZWR7AA%253D%253D--8b0d751e9774c5cfaa61fdec567cb782aa8757dd",
"path": "/",
"name": "_twitter_sess",
"httponly": true
}, {
"host": ".twitter.com",
"value": "43838368",
"path": "/",
"name": "__utmc"
}, {
"host": "twitter.com",
"value": "4bfz%2B%2BmebEkRkMWFCXm%2FCUOsvDoVeFTl",
"path": "/",
"name": "original_referer"
}, {
"host": "scribe.twitter.com",
"value": "4bfz%2B%2BmebEkRkMWFCXm%2FCUOsvDoVeFTl",
"path": "/",
"name": "original_referer"
}, {
"host": ".twitter.com",
"value": "BAh7CToPY3JlYXRlZF9hdGwrCDoVZ%252F4vAToMY3NyZl9pZCIlODE2MGI1ZjJh%250AYmViNDMwODMxNDlkN2U5ZDg5Yjk4ZmU6B2lkIiU2N2I4YjdmNGExNWFkNzlk%250AODI0MDVjMGM1NmMzYjVhYSIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6%250ARmxhc2g6OkZsYXNoSGFzaHsABjoKQHVzZWR7AA%253D%253D--8b0d751e9774c5cfaa61fdec567cb782aa8757dd",
"path": "/",
"name": "_twitter_sess",
"httponly": true
}],
"title": "Twitter"
}],
"session": {
"state": "stopped",
"lastUpdate": 1305658398727
}
}
The error
Error: sessionstore.js(179): expected value
now identifies the fault as being on the third-last line, the one that reads "lastUpdate": 1305658398727. From what I've read about the JSON format, this sounds to me like a comma or bracket is missing from this line, but this file was produced by Mozilla to work with Firefox, and I don't believe they would make a mistake like that, so I am led to believe that there is a problem with the JSON parser in Boost. Can anyone please confirm whether this is the case, or whether I'm the one doing something wrong?
I think the problem is that this value is too big for the numeric type Boost's JSON parser uses. I don't know which data type it uses for reading numbers. To test this, just change the number to a string and parse it again. The standard does not limit the size of numbers, but an implementation has to select a data type to represent them, and maybe they selected one that is clearly not big enough for this number. I'll take a look to see if you can configure the type used for numbers.
EDIT:
Looking again at the implementation, the "number" rule is implemented using Spirit as follows:
number
= strict_real_p
| int_p
;
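Looking at Spirit, strict_real_p uses double as the underlying type but only matches numbers that contain a decimal point, while int_p actually uses an int. An integer literal such as 1305658398727 therefore skips strict_real_p and overflows int_p.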
The bad news is that, from what I can see in the code, this is not configurable, so you would have to change the JSON for it to be parsed.
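To make the overflow concrete, here is a small standalone check in plain C++ (no Boost). It assumes a 32-bit int, which is the common case; the constant names are illustrative. It shows that the timestamp does not fit in an int but is exactly representable as a double, so the range of double is not what rejects it.

```cpp
#include <cassert>
#include <limits>

// "lastUpdate" value from sessionstore.js.
constexpr long long kLastUpdate = 1305658398727LL;

// int_p accumulates into an int; with a 32-bit int this value overflows.
constexpr bool fitsInInt = kLastUpdate <= std::numeric_limits<int>::max();

// A double holds every integer up to 2^53 exactly, so this value fits.
constexpr bool exactInDouble = kLastUpdate <= (1LL << std::numeric_limits<double>::digits);

static_assert(!fitsInInt, "assumes a 32-bit int");
static_assert(exactInDouble, "value is exactly representable as a double");
```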
After receiving answers from Diego Sevilla and c-smile, I did a bit of Googling to figure out how to incorporate their suggestions into Boost, since changing the JSON file unfortunately isn't an option in my case. I came across this ticket on the Boost bug tracker that describes my exact problem. It has since been fixed, and the fix was released with Boost 1.45. I, however, am using version 1.42 from the Ubuntu repositories, so I will need to install the newer version manually.
As Diego said, that is because 1305658398727 matches neither the strict_real_p nor the int_p production.
I suspect you will need either a different JSON parser or to modify the Spirit definitions yourself.
Either like this:
number
= strict_real_p
| int_p
| int64_p
;
or just as:
number
= real_p;
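Why the first variant helps can be illustrated without Boost. The sketch below is a hypothetical emulation, not Spirit code: `classify` tries a strict real (requires a '.'), then an int, then a 64-bit integer, and the first alternative that accepts the input wins, just like an ordered choice. Spirit matches prefixes; requiring the whole string to match is a simplification here. The second variant, real_p, accepts numbers with or without a decimal point and parses into a double, which holds this particular value exactly.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <system_error>

// Which alternative of a "strict real | int | 64-bit int" rule matched.
enum class NumberKind { StrictReal, Int, Int64, NoMatch };

NumberKind classify(const std::string& text) {
    const char* first = text.c_str();
    const char* last = first + text.size();

    // 1. Strict real: like strict_real_p, insist on a decimal point.
    if (text.find('.') != std::string::npos) {
        char* end = nullptr;
        std::strtod(first, &end);
        if (end == last) return NumberKind::StrictReal;
    }
    // 2. int: std::from_chars reports result_out_of_range on overflow.
    int asInt = 0;
    if (auto [ptr, ec] = std::from_chars(first, last, asInt); ec == std::errc{} && ptr == last)
        return NumberKind::Int;
    // 3. 64-bit integer: wide enough for millisecond timestamps.
    std::int64_t asInt64 = 0;
    if (auto [ptr, ec] = std::from_chars(first, last, asInt64); ec == std::errc{} && ptr == last)
        return NumberKind::Int64;
    return NumberKind::NoMatch;
}
```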
Ideally, dates and times in JSON should be represented as strings in ISO format; then you would not have such problems. I suspect the value there is just the number of milliseconds since 1970-01-01 (JavaScript's Date.valueOf()).
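Assuming the value really is milliseconds since the Unix epoch, converting it to an ISO 8601 string takes only a few lines. A minimal C++ sketch using std::gmtime and std::strftime; `toIso8601` is a hypothetical helper, and it assumes a non-negative timestamp and a 64-bit time_t.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Convert milliseconds since 1970-01-01 UTC (as from Date.valueOf())
// to an ISO 8601 UTC string such as "1970-01-01T00:00:00.000Z".
std::string toIso8601(long long msSinceEpoch) {
    assert(msSinceEpoch >= 0);
    const std::time_t seconds = static_cast<std::time_t>(msSinceEpoch / 1000);
    const int millis = static_cast<int>(msSinceEpoch % 1000);
    const std::tm* utc = std::gmtime(&seconds);  // not thread-safe: static buffer
    assert(utc != nullptr);
    char date[32];
    std::strftime(date, sizeof date, "%Y-%m-%dT%H:%M:%S", utc);
    char result[40];
    std::snprintf(result, sizeof result, "%s.%03dZ", date, millis);
    return result;
}
```

With the value from sessionstore.js, toIso8601(1305658398727) returns "2011-05-17T18:53:18.727Z".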