I am using Amazon Textract to extract data from a scanned document. Now I want to convert the output to a PDF file. Below is a sample output of Textract:
[1] => Array
    (
        [BlockType] => LINE
        [Confidence] => 99.4744720459
        [Text] => Hello
        [Geometry] => Array
            (
                [BoundingBox] => Array
                    (
                        [Width] => 0.243866533041
                        [Height] => 0.0134594505653
                        [Left] => 0.176409825683
                        [Top] => 0.0463116429746
                    )
                [Polygon] => Array
                    (
                        [0] => Array
                            (
                                [X] => 0.176409825683
                                [Y] => 0.0463116429746
                            )
                        [1] => Array
                            (
                                [X] => 0.420276373625
                                [Y] => 0.0463116429746
                            )
                        [2] => Array
                            (
                                [X] => 0.420276373625
                                [Y] => 0.0597710944712
                            )
                        [3] => Array
                            (
                                [X] => 0.176409825683
                                [Y] => 0.0597710944712
                            )
                    )
            )
        [Id] => 75e8917d-701e-4e26-bade-f00bde9d87db
        [Relationships] => Array
            (
                [0] => Array
                    (
                        [Type] => CHILD
                        [Ids] => Array
                            (
                                [0] => 46f44500-4960-4405-99f3-fa43101bc2ca
                            )
                    )
            )
    )
As you can see, the output contains the text along with its height, width and X/Y coordinates. How can I place the text at the same coordinates in a PDF file?
Assuming you can convert the above to JSON, you can use jsPDF or PDFKit to create the PDF. The functionality maps pretty well onto the limited data you posted, but I have not seen the full structure of Textract's output, as it is still in beta and I didn't get an invite to the program. Both of these projects can run on Node for a server-side solution, but they also work in the browser.
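Whichever library you pick, the coordinates need converting first. Textract's BoundingBox values are ratios of the page width and height measured from the top-left corner, while raw PDF user space is measured in points from the bottom-left corner. The Ruby sketch below assumes a US Letter page (612 x 792 points); the helper names `textract_box_to_pdf` and `text_operators` are hypothetical, and deriving the font size from the box height is a rough assumption rather than something Textract specifies. It emits the PDF content-stream operators that would draw the line at that position, using a font resource assumed to be named /F1.

```ruby
# Sketch: convert a Textract LINE block's BoundingBox (ratios of the page,
# origin at top-left) into PDF user-space points (origin at bottom-left).
PAGE_WIDTH  = 612.0 # assumed US Letter width in points
PAGE_HEIGHT = 792.0 # assumed US Letter height in points

# Hypothetical helper: returns x, y (bottom edge of the box, used as the
# baseline) and a font size approximated from the box height.
def textract_box_to_pdf(box, page_w = PAGE_WIDTH, page_h = PAGE_HEIGHT)
  x = box["Left"] * page_w
  # PDF y grows upward, so measure from the bottom edge of the box.
  y = page_h - (box["Top"] + box["Height"]) * page_h
  font_size = box["Height"] * page_h
  [x, y, font_size]
end

# Escape the characters that are special inside a PDF literal string.
def pdf_escape(text)
  text.gsub(/[\\()]/) { |c| "\\#{c}" }
end

# Build the content-stream operators that draw one line of text with the
# (assumed) font resource /F1 at the converted position.
def text_operators(block)
  x, y, size = textract_box_to_pdf(block["Geometry"]["BoundingBox"])
  format("BT /F1 %.2f Tf %.2f %.2f Td (%s) Tj ET", size, x, y, pdf_escape(block["Text"]))
end

block = {
  "BlockType" => "LINE",
  "Text" => "Hello",
  "Geometry" => {
    "BoundingBox" => {
      "Width" => 0.243866533041, "Height" => 0.0134594505653,
      "Left" => 0.176409825683, "Top" => 0.0463116429746
    }
  }
}

puts text_operators(block) # BT /F1 10.66 Tf 107.96 744.66 Td (Hello) Tj ET
```

PDFKit and jsPDF measure y from the top of the page, so with those libraries the flip would not be needed; you would multiply Top by the page height directly.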
At the time of this writing, Google Cloud has an OCR component in its Vision API: the Document Text Detection feature. Unlike Textract, it approaches the task by simply reporting which visual elements the document has, creating a comprehensive (and large) data structure that describes what it "sees." Textract, according to Amazon, uses machine learning to organize the data in a more human-understandable form that seeks to differentiate the form itself from the data that constitutes the filled-out part of the form. If you are trying to create a relatively complete PDF, the Google product is well suited. Textract might be too, but I don't know yet.
This repository contains code examples (in Java) showing how you can generate a searchable PDF using AWS Textract. If you are not using Java, you may also deploy it as an AWS Lambda function and then invoke it via the AWS SDK or as a REST API call using AWS API Gateway.
There is a corresponding blog post also available here.
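A searchable PDF is usually the scanned page image with an invisible text layer placed over it, so the text can be selected and searched without being drawn. The sketch below is not the code from that repository; it is a minimal Ruby illustration of the idea, assuming the page image is already registered as an XObject named /Im1 and a font as /F1 (both placeholder names). It uses text rendering mode 3 (`3 Tr`), which the PDF specification defines as neither filling nor stroking the glyphs.

```ruby
# Sketch of a searchable-PDF page content stream: paint the scanned image,
# then place each OCR line as invisible text (rendering mode 3).
# /Im1 and /F1 are placeholder resource names assumed to be defined
# in the page's /Resources dictionary elsewhere.

# left, top, height are ratios of the page with a top-left origin (as in Textract).
Word = Struct.new(:text, :left, :top, :height)

def escape_pdf_string(s)
  s.gsub(/[\\()]/) { |c| "\\#{c}" }
end

def searchable_page_stream(words, page_w, page_h)
  ops = []
  # Scale the unit-square image to the full page with cm, then draw it.
  ops << "q #{page_w} 0 0 #{page_h} 0 0 cm /Im1 Do Q"
  ops << "BT"
  ops << "3 Tr" # invisible text: searchable and selectable, but not painted
  words.each do |w|
    size = (w.height * page_h).round(2)
    x = (w.left * page_w).round(2)
    y = (page_h - (w.top + w.height) * page_h).round(2) # flip to bottom-left origin
    # Tm sets an absolute text matrix, so each line is positioned independently.
    ops << "/F1 #{size} Tf 1 0 0 1 #{x} #{y} Tm (#{escape_pdf_string(w.text)}) Tj"
  end
  ops << "ET"
  ops.join("\n")
end

words = [Word.new("Hello", 0.176409825683, 0.0463116429746, 0.0134594505653)]
puts searchable_page_stream(words, 612, 792)
```

The stream still has to be wrapped in a page object with matching resources; a PDF library normally handles that part.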
I have a card created with this code:
try {
    $menu_items = array();
    $card = new \Google_Service_Mirror_TimelineItem();
    //$card->setText("Test");
    $card->setHtml('<img src="attachment:0"><img src="attachment:1">');
    $menu_item = new \Google_Service_Mirror_MenuItem();
    $menu_item->setAction("DELETE");
    array_push($menu_items, $menu_item);
    $card->setMenuItems($menu_items);
    $opt_params = array();
    $sr = $this->service->timeline->insert($card, $opt_params);
    error_log('Send Card');
    error_log(print_r($sr, true));
    //return $sr;
    $itemId = $sr->getId();
    // Upload the first image as attachment 0 (image/jpeg is the registered MIME type).
    $params = array(
        'data' => file_get_contents('https://XXXX.com/1.jpg'),
        'mimeType' => 'image/jpeg',
        'uploadType' => 'media'
    );
    $sr = $this->service->timeline_attachments->insert($itemId, $params);
    error_log('Send Card Attachment');
    error_log(print_r($sr, true));
    // Upload the second image as attachment 1.
    $params = array(
        'data' => file_get_contents('https://XXXX.com/2.jpg'),
        'mimeType' => 'image/jpeg',
        'uploadType' => 'media'
    );
    $sr = $this->service->timeline_attachments->insert($itemId, $params);
    error_log('Send Card Attachment');
    error_log(print_r($sr, true));
} catch (\Exception $e) {
    error_log('Error while sending card ' . $e->getMessage());
}
This works.
I get a card with two images.
The documentation states that I can use the attachment IDs. What is the logic behind that? Is it for updates/patches only?
Also, I am guessing that if I send a card and then push the files, I would need to set notification.deliveryTime to the near future to avoid showing a broken-looking card while the files are being uploaded?
It depends on the exact use case. Some of the frameworks allow the attachments to be uploaded at the same time as the HTML for the card, so you'll be sure of the order and be sure that everything is available at once.
If you're uploading the attachments separately, it makes sense to use the attachment id that is returned when you do the upload since you have the information.
Good thought, but I wouldn't rely on notification.deliveryTime, since it hasn't worked very well the times I've tried it. Instead, you might want to post the original card with some text such as "Loading..." and not send a notification at all. Then, once the attachments are uploaded, update the card to reference them and set the notification so that it plays the audio alert.
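The two-step approach could look roughly like the Ruby sketch below. It runs against a hypothetical in-memory stand-in (`FakeTimeline`, with made-up method names), not the real Mirror API client; what it shows is the order of operations: insert a placeholder card without a notification, upload the attachments, then update the card's HTML and only then add the notification. The `notification: { level: "DEFAULT" }` hash is meant to mirror the timeline item's notification.level field.

```ruby
# Hypothetical stand-in for the Mirror timeline service, used only to
# show the order of calls; method names are made up for this sketch.
class FakeTimeline
  attr_reader :items, :log

  def initialize
    @items = {}
    @attachments = Hash.new { |h, k| h[k] = [] }
    @log = []
    @next_id = 0
  end

  def insert(item)
    id = "item#{@next_id += 1}"
    @items[id] = item.dup
    @log << [:insert, id, item.key?(:notification)]
    id
  end

  def insert_attachment(item_id, data)
    @attachments[item_id] << data
    att_id = "att#{@attachments[item_id].size}"
    @log << [:attach, item_id, att_id]
    att_id
  end

  def update(item_id, item)
    @items[item_id] = item.dup
    @log << [:update, item_id, item.key?(:notification)]
  end
end

def send_card_with_images(timeline, images)
  # 1. Placeholder card with no notification, so nothing alerts the user yet.
  id = timeline.insert(html: "<p>Loading...</p>")
  # 2. Upload the attachments, keeping the IDs the service returns
  #    (useful for later updates).
  att_ids = images.map { |img| timeline.insert_attachment(id, img) }
  # 3. Reference the attachments (by index, as in the original card HTML)
  #    and request the notification only now.
  html = att_ids.each_index.map { |i| %(<img src="attachment:#{i}">) }.join
  timeline.update(id, html: html, notification: { level: "DEFAULT" })
  id
end

timeline = FakeTimeline.new
id = send_card_with_images(timeline, ["<jpeg bytes 1>", "<jpeg bytes 2>"])
p timeline.items[id]
```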
Update:
As you've noticed, you can't upload an attachment and attach it to multiple cards for the same reason you can't create a single timeline item and send it to multiple people - security. Attachments "belong" to a timeline item in the same way timeline items "belong" to a person. This is somewhat analogous to email and attachments - once you send the email out, each email has its own copy of the attachment.
I am using CakePHP to develop a website and am currently struggling with cookies.
The problem is that when I write cookies whose keys contain multiple dots, like this:
$this->Cookie->write("Figure.1.id", $figureId);
$this->Cookie->write("Figure.1.name", $figureName);
and then read them back, CakePHP doesn't return a nested array; instead it returns:
array(
    '1.id' => '82',
    '1.name' => '1'
)
I expected something like:
array(
    (int) 1 => array(
        'id' => '82',
        'name' => '1'
    )
)
Actually, the first time I read the values right after writing them, I didn't see this result, but from the second time on, the result looked like the above. Do you know what is going on?
I'm afraid it doesn't look as if multiple dots are supported. If you look at the read() method of the CookieComponent (http://api.cakephp.org/2.4/source-class-CookieComponent.html#256-289), you see this:
if (strpos($key, '.') !== false) {
    $names = explode('.', $key, 2);
    $key = $names[0];
}
That explode() call is told to split the name of your cookie into at most two parts, at the first dot, so "Figure.1.id" becomes the cookie "Figure" with the flat key "1.id" rather than a further nested level.
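To see why the keys come back flat, here is the same split reproduced with Ruby's String#split, whose limit argument behaves like explode()'s limit for these keys. It illustrates the splitting only and is not CakePHP code; `cookie_parts` is a hypothetical helper name.

```ruby
# PHP's explode('.', $key, 2) stops after the first dot; Ruby's
# String#split with a limit of 2 does the same for these keys.
def cookie_parts(key)
  key.split(".", 2)
end

["Figure.1.id", "Figure.1.name"].each do |key|
  name, rest = cookie_parts(key)
  puts "cookie #{name.inspect}, key #{rest.inspect}"
end
# prints: cookie "Figure", key "1.id"
#         cookie "Figure", key "1.name"

# Without a limit, every dot would produce its own level:
p "Figure.1.id".split(".") # => ["Figure", "1", "id"]
```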
You might be best off serializing the data you want to store before saving it and then deserializing it after reading, as shown here: http://abakalidis.blogspot.co.uk/2011/11/cakephp-storing-multi-dimentional.html
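A minimal sketch of that serialize-then-store idea, written in Ruby with the standard json library purely to show the round trip (in CakePHP itself you would reach for PHP's json_encode()/json_decode() instead). The whole nested structure is stored as one string under a single cookie name, so no dotted keys are involved:

```ruby
require "json"

# Nested data that dotted cookie keys could not represent.
figures = { 1 => { "id" => "82", "name" => "1" } }

# Store the whole structure under one cookie name as a single string...
cookie_value = JSON.generate(figures)
puts cookie_value # {"1":{"id":"82","name":"1"}}

# ...and rebuild it after reading. JSON object keys are always strings,
# so the integer key comes back as "1" unless converted.
restored = JSON.parse(cookie_value).transform_keys(&:to_i)

p restored == figures # => true
```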
I am creating Google charts to show Google Analytics data from the past 7 days. I have an issue with the x-axis labels stacking on top of each other with certain data (or at least, that's the only difference I can see).
I am generating the API call using this gem: https://github.com/mattetti/googlecharts and I've looked at what each part of the URL is doing and can't find the issue, but I'm sure I'm missing something.
Here is an example of two sites' data over the same time period; the first one shows the issue and the second one is a working example.
Here are the URLs. They use text encoding for readability, but the same issue occurs with simple or extended encoding:
BROKEN VERSION:
https://chart.apis.google.com/chart?chxl=0:|11-22|11-23|11-24|11-25|11-26|11-27&chxt=x&chco=58838C,BF996B,BF5841,A61C1C&chf=bg,s,ffffff&chd=t:979,807,681,653,580,509|822,724,602,562,519,455|540,409,381,375,336,301|307,156,173,176,155,133&chds=0,979&chdl=Visits|Visitors|New+Visits|Organic+Searches&chtt=Google+Analytics+-+Last+7+Days&cht=lc&chs=600x200&chxr=0,979,979|1,822,822|2,540,540|3,307,307
WORKING VERSION:
https://chart.apis.google.com/chart?chxl=0:|11-22|11-23|11-24|11-25|11-26|11-27&chxt=x&chco=58838C,BF996B,BF5841,A61C1C&chf=bg,s,ffffff&chd=t:1385,1395,981,947,863,731|1083,1222,832,715,690,546|580,566,427,413,387,329|247,151,151,171,162,135&chds=0,1395&chdl=Visits|Visitors|New+Visits|Organic+Searches&chtt=Google+Analytics+-+Last+7+Days&cht=lc&chs=600x200&chxr=0,1385,1395|1,1083,1222|2,580,580|3,247,247
The chxr values were incorrect. The gem was generating them for multiple axes when it should have only been generating them for one. I manually overrode the min, max and step in the gem and it worked.
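For reference, chxr takes one `axis_index,start_val,end_val[,step]` group per axis, separated by `|`, where the index refers to a position in the chxt list. Both URLs above declare a single axis (`chxt=x`, index 0) yet carry four chxr groups, and in the broken one axis 0 gets the range 979,979, with identical start and end values; since the working URL has the same extra groups, that empty range appears to be the decisive difference, though this is an inference. The hypothetical Ruby helper below (independent of the googlecharts gem) flags both kinds of problem; the URLs in it are the two from the question, trimmed to the chxt and chxr parameters.

```ruby
# Hypothetical debugging helper: check a Chart API URL's chxr groups against
# the axes declared in chxt. Groups look like "<axis_index>,<start>,<end>[,<step>]".
def chxr_problems(url)
  query = url.split("?", 2).last.to_s
  params = query.split("&").map { |pair|
    key, value = pair.split("=", 2)
    [key, value.to_s]
  }.to_h
  axis_count = params.fetch("chxt", "").split(",").size
  params.fetch("chxr", "").split("|").map { |group|
    index, start_val, end_val = group.split(",")
    if index.to_i >= axis_count
      "axis #{index}: not declared in chxt"
    elsif start_val.to_f == end_val.to_f
      "axis #{index}: empty range #{start_val}..#{end_val}"
    end
  }.compact
end

broken  = "https://chart.apis.google.com/chart?chxt=x&chxr=0,979,979|1,822,822|2,540,540|3,307,307"
working = "https://chart.apis.google.com/chart?chxt=x&chxr=0,1385,1395|1,1083,1222|2,580,580|3,247,247"

puts chxr_problems(broken)
puts chxr_problems(working)
```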
Here is my code using the gem, first getting the max value from all my data points:
@max_value = 0
[@visits, @visitors, @new_visits, @organic_searches].each do |hash|
  hash.values.each do |value|
    @max_value = value if value > @max_value
  end
end
# Chart it
chart = Gchart.line(
  :title => prop.to_s.upcase + ' Google Analytics - Past 7 Days',
  :size => '600x200',
  :bg => 'ffffff',
  :axis_with_labels => ['x'],
  :axis_labels => [@visits.keys],
  :legend => ['Visits', 'Visitors', 'New Visits', 'Organic Searches'],
  :line_colors => ['58838C', 'BF996B', 'BF5841', 'A61C1C'],
  :encoding => 'text',
  :data => [@visits.values, @visitors.values, @new_visits.values, @organic_searches.values],
  :max_value => @max_value,
  # Override the gem's generated axis range with explicit min, max and step.
  :axis_range => [nil, [0, @max_value, (@max_value / 10).to_i]],
  :format => 'image_tag')
I need to geocode (i.e., translate street addresses to latitude/longitude) about 8,000 street addresses. I am using both the Yahoo and Google geocoding engines at http://www.gpsvisualizer.com/geocoder/, and found that for a large number of addresses those engines (one of them or both) either could not geocode the address (i.e., returned latitude=0, longitude=0) or returned the wrong coordinates (including cases where Yahoo and Google gave different results).
What is the best way to handle this problem? Which engine is (usually) more accurate? I would appreciate any thoughts, suggestions, ideas from people who had previous experience with this kind of task.
When making a large number of requests to the Google geocoding service, you need to throttle them, because responses start failing once you send too many too quickly. To give you a better idea, here is a snippet from a Drupal (PHP) module that I wrote.
function gmap_api_geocode($address) {
    $delay = 0;
    $api_key = keys_get_key('gmap_api');
    while (TRUE) {
        $response = drupal_http_request('http://maps.google.com/maps/geo?q=' . drupal_urlencode($address) . '&output=csv&sensor=false&oe=utf8&key=' . $api_key);
        if ($response->code != 200) {
            // HTTP-level failure.
            return FALSE;
        }
        // CSV output is "status,accuracy,latitude,longitude"; the geocoder
        // status (200, 620, ...) is the first field, not the HTTP code.
        $data = explode(',', $response->data);
        switch ($data[0]) {
            case 200: // OK
                return array(
                    'latitude' => $data[2],
                    'longitude' => $data[3],
                );
            // Adapted from http://code.google.com/apis/maps/articles/phpsqlgeocode.html
            case 620: // Too many requests: wait another 0.1 s before retrying
                $delay += 100000;
                break;
            default:
                return FALSE;
        }
        usleep($delay);
    }
}
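The same pattern (retry with a delay that grows by 0.1 s after each "too many requests" status, and give up on any other failure) can be written independently of Drupal and of any particular geocoder. In this Ruby sketch the geocoding call is an injected callable returning `[status, lat, lng]`, the 200/620 status codes are taken from the PHP code above, and `max_attempts` is an added safeguard so a service that never recovers cannot loop forever. The sleep function is injectable so the behaviour can be checked without waiting; the fake responses in the usage example are made up.

```ruby
# Sketch: call a geocoder, backing off linearly on "too many requests".
# `lookup` is any callable returning [status, lat, lng]; the 200/620 codes
# follow the PHP example, and max_attempts is an added safeguard.
def geocode_with_backoff(address, lookup, step: 0.1, max_attempts: 10,
                         sleeper: ->(seconds) { sleep(seconds) })
  delay = 0.0
  max_attempts.times do
    status, lat, lng = lookup.call(address)
    return { latitude: lat, longitude: lng } if status == 200
    return nil unless status == 620 # any other status: give up on this address
    delay += step                   # too many requests: wait longer next time
    sleeper.call(delay)
  end
  nil # still throttled after max_attempts
end

# Usage with a fake geocoder (made-up responses): throttled twice, then OK.
responses = [[620, 0, 0], [620, 0, 0], [200, 12.5, -3.25]]
fake = ->(_address) { responses.shift }
sleeps = []
result = geocode_with_backoff("1 Example St", fake, sleeper: ->(s) { sleeps << s })
p result
p sleeps
```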
You can use
https://github.com/darkphnx/fetegeo/
for offline batch geocoding.