AWS Glue won't Classify my Data - amazon-web-services

I have a html file which structured like this:
<!doctype html public "-//w3c//dtd html 4.0transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Author" content="ERA">
<LINK REL=STYLESHEET TYPE="text/css" HREF="Style_Sheets/ERA_Internet_Printer.css">
</head>
<body>
<pre>
<font face="courier new" size=-4> 14V-IG-TEST-DATA - SERVC - EXEC# 4515
[11| Blubb,abcons, Port: 18 For: abcons
For period : GE 08/04/18 AND LE 11/04/18 OR GE 11/04/18 AND LE 11/05/18
01:45:40 11-04-18 - Page # 1
Serial#........................ 564561215
Make Desc...................... VW
Carline........................ MUX
Year........................... 2015
Cust# ........................ 512
License#....................... 78365HH
Open RO........................ R25625
EOR............................ EOR
Serial#........................ 2151512315
Make Desc...................... VOLKSWAGEN
Carline........................ VOLKSWAGEN
Year........................... 2017
Cust# ........................ 552
License#....................... DPA2151
Open RO........................ T52165
EOR............................ EOR
2 records listed.
</pre>
</body>
</html>
I want to get the Information out of the file like "Key.......... Value".
So I've created a custom classifier in AWS Glue with Grok to get the Info.
The classifier is configured like this:
Custom Classifier
So the Grok Pattern is configured as followed:
%{KEY:mykey}%{GREEDYDATA:myvalue}
with the custom Pattern:
KEY ([a-zA-Z# 1-9]+\.+ )
Every Grok online debugger (like https://grokdebug.herokuapp.com/) get the information out of the data structure with this configuration. But when I start the crawler in Glue with the custom classifier, it won't find any tables or structures.
What am I doing wrong?

I think you're running into the problem I answered here: https://github.com/aws-samples/aws-glue-samples/issues/4
There's a buried sentence in AWS documentation that states "To reclassify data to correct an incorrect classifier, create a new crawler with the updated classifier"
Simply updating the classifier and re-running the crawler will not use the updated classifier.

Related

A Livewire component was not found after creating a component in a custom path

As stated in the docs, you can create a component in a custom path, which is different from the default views/livewire/ and Http/Livewire. Just for the sake of better organization, I created subfolders:
$ php artisan make:livewire tutorial/counter
So I got my files in the following paths:
views/livewire/tutorial/counter.blade.php
Http/Livewire/Tutorial/Counter.php
To test the component in a view, I created a /livewire/tutorial/welcome.blade.php which has the following:
<!DOCTYPE html>
<html lang="{{ str_replace('_', '-', app()->getLocale()) }}">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Livewire tutorial: counter</title>
<!-- Fonts -->
<link href="https://fonts.googleapis.com/css2?family=Nunito:wght#400;600;700&display=swap" rel="stylesheet">
<!-- Styles -->
<style>
/*! normalize.css v8.0.1 | MIT License | github.com/necolas/normalize.css */html{line-height:1.15;-webkit-text-size-adjust:100%}body{margin:0}a{background-color:transparent}[hidden]{display:none}html{font-family:system-ui,-apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji;line-height:1.5}*,:after,:before{box-sizing:border-box;border:0 solid #e2e8f0}a{color:inherit;text-decoration:inherit}svg,video{display:block;vertical-align:middle}video{max-width:100%;height:auto}.bg-white{--bg-opacity:1;background-color:#fff;background-color:rgba(255,255,255,var(--bg-opacity))}.bg-gray-100{--bg-opacity:1;background-color:#f7fafc;background-color:rgba(247,250,252,var(--bg-opacity))}.border-gray-200{--border-opacity:1;border-color:#edf2f7;border-color:rgba(237,242,247,var(--border-opacity))}.border-t{border-top-width:1px}.flex{display:flex}.grid{display:grid}.hidden{display:none}.items-center{align-items:center}.justify-center{justify-content:center}.font-semibold{font-weight:600}.h-5{height:1.25rem}.h-8{height:2rem}.h-16{height:4rem}.text-sm{font-size:.875rem}.text-lg{font-size:1.125rem}.leading-7{line-height:1.75rem}.mx-auto{margin-left:auto;margin-right:auto}.ml-1{margin-left:.25rem}.mt-2{margin-top:.5rem}.mr-2{margin-right:.5rem}.ml-2{margin-left:.5rem}.mt-4{margin-top:1rem}.ml-4{margin-left:1rem}.mt-8{margin-top:2rem}.ml-12{margin-left:3rem}.-mt-px{margin-top:-1px}.max-w-6xl{max-width:72rem}.min-h-screen{min-height:100vh}.overflow-hidden{overflow:hidden}.p-6{padding:1.5rem}.py-4{padding-top:1rem;padding-bottom:1rem}.px-6{padding-left:1.5rem;padding-right:1.5rem}.pt-8{padding-top:2rem}.fixed{position:fixed}.relative{position:relative}.top-0{top:0}.right-0{right:0}.shadow{box-shadow:0 1px 3px 0 rgba(0,0,0,.1),0 1px 2px 0 rgba(0,0,0,.06)}.text-center{text-align:center}.text-gray-200{--text-opacity:1;color:#edf2f7;color:rgba(237,242,247,var(--text-opacity))}.text-gray-300{--text-opacity:1;color:#e2e8f0;color:rgba(226,232,240,var(--text-opacity))}.text-gray-400{--text-opacity:1;color:#cbd5e0;color:rgba(203,213,224,var(--text-opacity))}.text-gray-500{--text-opacity:1;color:#a0aec0;color:rgba(160,174,192,var(--text-opacity))}.text-gray-600{--text-opacity:1;color:#718096;color:rgba(113,128,150,var(--text-opacity))}.text-gray-700{--text-opacity:1;color:#4a5568;color:rgba(74,85,104,var(--text-opacity))}.text-gray-900{--text-opacity:1;color:#1a202c;color:rgba(26,32,44,var(--text-opacity))}.underline{text-decoration:underline}.antialiased{-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.w-5{width:1.25rem}.w-8{width:2rem}.w-auto{width:auto}.grid-cols-1{grid-template-columns:repeat(1,minmax(0,1fr))}#media (min-width:640px){.sm\:rounded-lg{border-radius:.5rem}.sm\:block{display:block}.sm\:items-center{align-items:center}.sm\:justify-start{justify-content:flex-start}.sm\:justify-between{justify-content:space-between}.sm\:h-20{height:5rem}.sm\:ml-0{margin-left:0}.sm\:px-6{padding-left:1.5rem;padding-right:1.5rem}.sm\:pt-0{padding-top:0}.sm\:text-left{text-align:left}.sm\:text-right{text-align:right}}#media (min-width:768px){.md\:border-t-0{border-top-width:0}.md\:border-l{border-left-width:1px}.md\:grid-cols-2{grid-template-columns:repeat(2,minmax(0,1fr))}}#media (min-width:1024px){.lg\:px-8{padding-left:2rem;padding-right:2rem}}#media (prefers-color-scheme:dark){.dark\:bg-gray-800{--bg-opacity:1;background-color:#2d3748;background-color:rgba(45,55,72,var(--bg-opacity))}.dark\:bg-gray-900{--bg-opacity:1;background-color:#1a202c;background-color:rgba(26,32,44,var(--bg-opacity))}.dark\:border-gray-700{--border-opacity:1;border-color:#4a5568;border-color:rgba(74,85,104,var(--border-opacity))}.dark\:text-white{--text-opacity:1;color:#fff;color:rgba(255,255,255,var(--text-opacity))}.dark\:text-gray-400{--text-opacity:1;color:#cbd5e0;color:rgba(203,213,224,var(--text-opacity))}}
</style>
<style>
body {
font-family: 'Nunito';
}
</style>
#livewireStyles
</head>
<body class="antialiased">
<livewire:counter />
#livewireScripts
</body>
</html>
When I hit the route, I get the following error:
Livewire\Exceptions\ComponentNotFoundException Unable to find
component: [counter]
I already have tried:
$ php artisan livewire:discover
But I still get that error.
What Am I missing?
Solved
Just like Najmus suggested, I tried:
<livewire:path.to.counter />
And that did the trick!
Try this in the welcome.blade.php file
<livewire:tutorial.counter />
instead of
<livewire:counter />
you can create components under subdirectory using php artisan make:livewire users.register User is your folder name and register is component...
and if you want to move existing component then simply you can run this command php artisan livewire:mv register users.register
ie : livewire:mv {your component name} {your folder name}.{component name}
instead of mv you can use move also
may this will help
-Check your class name propely(if any spelling mistake made).
-Check header in class defined or not.
-If component is not there then create one:
php artisan make:liveware component_name
In my case, i typed the extension along with the file name that why it was showing error.
<livewire:abandoned-property-case-list.blade.php/>
But this should be
<livewire:abandoned-property-case-list/>

Integrating Cloudflare player into angular 7 for Cloudflare stream

I am trying to get cloudflare stream to work in angular. I have tried the solution given here: Angular attribute for HTML stream.
However, it is always a hit or a miss:
Out of 10 reloads, one loads the player.
But anytime I make a
change to the <stream> tag, and angular re-compiles, the player is
loaded. After this if I refresh the browser, it is a blank screen
again.
The component which shows the video is deep in the tree and the component belongs to a module that is lazy loaded:
In the index.html file:
<!doctype html>
<html lang="en">
<head>
...............
</head>
<body>
<app-root></app-root>
</body>
<script src="https://embed.cloudflarestream.com/embed/r4xu.fla9.latest.js" id="video_embed" defer="" async=""></script>
</html>
In the videoFile.component.ts:
<stream src="5d5bc37ffcf54c9b82e996823bffbb81" height="480px" width="240px" controls></stream>
Found a solution here: https://github.com/angular/angular/issues/13965
So everytime the component loads, the script is removed and reattached using ngOninit, like so:
document.getElementById("video_embed").remove();
let testScript = document.createElement("script");
testScript.setAttribute("id", "video_embed");
testScript.setAttribute("src", "https://embed.cloudflarestream.com/embed/r4xu.fla9.latest.js");
document.body.appendChild(testScript);
If anyone has any other solutions, please do let me know.

Go language strange behavior by handling templates

gotemplates
Hello!
I'm learning Go language now and trying to port some simple WEB code (Laravel 4).
Everything was well, until I tried to reproduce Blade templates into text templates.
I found that Go can load my CSS and JavaScript files only from the catalog with a name "bootstrap" only.
Here is my catalog tree which I tried to use:
start-catalog
bootstrap (link to bootstrap-3.3.1)
bootstrap-3.3.1
css
bootstrap.min.css
js
bootstrap.min.js
jquery
jquery (link to jquery-2.1.1.min.js)
jsquery-2.1.1.min.js
go_prg.go
Here are my templates:
base_js.tmpl
{{define "base_js"}}
{{template "login_1"}}
<script src = "/bootstrap/js/jquery"></script>
<script src = "/bootstrap/js/bootstrap.min.js"></script>
{{end}}
base_header.tmpl
{{define "base_header"}}
<head>
<title>PAGE TITLE</title>
<meta name = "viewport" content = "width=device-width, initial-scale=1.0">
<meta charset="utf-8">
<link href = "/bootstrap/css/bootstrap.min.css" rel = "stylesheet">
</head>
{{end}}
If the catalog name differs from "bootstrap" Go language or Firefox can't load files from the templates above: bootstrap.min.css, bootstrap.min.js, jquery.
If I use not the link but the catalog name directly "bootstrap-3.3.1" than Go or Firefox can't load.
If all required files are moved under "bootstrap" I'm getting the results I expected (exactly the same as in Laravel 4).
To launch go language code the command go run go_prg.go was used.
Environment: Ubuntu 14.04, go-1.3.3, Firefox 31.
Who's wrong: Go language, Firefox or me?
Any help will be highly appreciated!
The problem described was caused by
http.Handle("/bootstrap/", http.StripPrefix("/bootstrap/", http.FileServer(http.Dir("bootstrap"))))
before any template was handled. It allowed access files under the directory 'bootstrap' only.
The problem was fixed by changing to
http.Handle( , http.StripPrefix(, http.FileServer(http.Dir("."))))
and adding to pathes for CSS and JavaScript files. Like so
/bootstrap/js/jquery">.

How to post opengraph objects

Need help understanding how opengraph works and how it relates to the "FB Like" button.
We do have opengraph meta tags deployed on all of the pages for our content. However
it looks like the only way to get "FB Like" button to work, is to run the URL thru the facebook linter.
If a user attempts to "Like" the page, that was never liked before, only the URL will be posted to the wall.
If the url is ran thru the linter, all the consecutive likes will work properly, by image, title and description will be pulled.
is it possible that the app_id is not linking properly with the pages?
having our FB Admin go and like all the content that is produced is not an option
http://www.nydailynews.com/life-style/travel/underwater-photos-amazing-shots-sea-gallery-1.1078782
<meta property="fb:app_id" content="366487756153">
<meta property="fb:admins" content="1594068001">
<meta property="og:site_name" content="NY Daily News">
<meta property="og:title" content="Mark Tipple's Underwater Project - Underwater photos: Amazing shots from under the sea">
<meta property="og:type" content="article">
<meta property="og:url" content="http://www.nydailynews.com/life-style/travel/underwater-photos-amazing-shots-sea-gallery-1.1078782">
<meta property="og:image" content="http://assets.nydailynews.com/polopoly_fs/1.1078770!/img/httpImage/image.jpg_gen/derivatives/searchthumbnail_75/image.jpg">
<meta property="og:description" content="Talented underwater photographer Mark Tipple, from Sydney, Australia, lies in wait for unsuspecting swimmers and surfers before snapping a perfect picture of them from beneath the waves.">
http://www.nydailynews.com/gossip/john-travolta-experienced-bed-passionate-hotel-romp-claims-masseur-luis-gonzalez-article-1.1079272
<meta property="fb:app_id" content="366487756153">
<meta property="fb:admins" content="1594068001">
<meta property="og:site_name" content="NY Daily News">
<meta property="og:title" content="John Travolta was 'a great kisser' and ‘very experienced’ in bed during passionate hotel romp, claims masseur Luis Gonzalez ">
<meta property="og:type" content="article">
<meta property="og:url" content="http://www.nydailynews.com/gossip/john-travolta-experienced-bed-passionate-hotel-romp-claims-masseur-luis-gonzalez-article-1.1079272">
<meta property="og:image" content="http://assets.nydailynews.com/polopoly_fs/1.1079279!/img/httpImage/image.jpg_gen/derivatives/searchthumbnail_75/image.jpg">
<meta property="og:description" content="Another hotel masseur is claiming sexual shenanigans on the part of John Travolta -- only this accuser says he welcomed the actor's horny horseplay and found him "very experienced" in bed.">
In order to get your page recognized as a custom open graph object you'll need to follow these steps.
Create a facebook app
Setup site domain, namespace etc...
Create a custom action and a custom object under app settings >> Open Graph: Getting Started (see this link for help)
After creating it, you'll see your object under object types in Open Graph Settings
Click on "Get Code" button to get the html tags
Update your page to show up the same html head & meta tags
Go to http://developers.facebook.com/tools/debug to test if you have setup everything right
P.S. Don't forget to hit correct answer if it works, or PM me for more help
Your URL isn't set to the application as noted in the linter,
Object at URL 'http://www.nydailynews.com/life-style/travel/underwater-photos-amazing-shots-sea-gallery-1.1078782' of type 'article' is invalid because the domain 'www.nydailynews.com' is not allowed for the specified application id '366487756153'. You can verify your configured 'App Domain' at https://developers.facebook.com/apps/366487756153.
Add the proper domain to your settings and it should work.

HTML parsing using pugixml or an actual HTML parser

I'm interested in using pugixml to parse HTML documents, but HTML has some optional closing tags. Here is an example: <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
Pugixml stops reading the HTML as soon as it encounters a tag that's not closed, but in HTML missing a closing tag does not necessarily mean that there is a start-end tag mismatch.
A simple test of parsing the HTML documentation of pugixml fails because the meta tag is the second line of the HTML document: http://pugixml.googlecode.com/svn/tags/latest/docs/quickstart.html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>pugixml 1.0</title>
<link rel="stylesheet" href="pugixml.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
<link rel="home" href="quickstart.html" title="pugixml 1.0">
</head>
<!--- etc... -->
A lot of HTML documents in the wild would fail if I try to parse them with pugixml. Is there a way to avoid that? If there is no way to "fix" that, then is there another HTML parsing tool that's as about as fast as pugixml?
Update
It would also be great if the HTML parser also supports XPATH.
I ended up taking pugixml, converting it into an HTML parser and I created a github project for it: https://github.com/rofldev/pugihtml
For now it's not fully compliant with the HTML specifications, but it does a decent enough job at parsing HTML that I can use it. I'm working on making it compliant with the HTML specifications.
One way to address this is to do some pre-processing that converts the HTML to XHTML, then it would "officially" be considered XML and usable with XML tools. If you want to go that route, see this question: How to convert HTML to XHTML?