Hi i have some script that i want to write, first i took from the html the image, and then i wanted to use tesseract to take the output txt from it.
i cant really figure out how to do it.
Here is the code:
#!/usr/bin/perl -X
##########
$user = ''; # Enter your username here
$pass = ''; # Enter your password here
###########
# Server settings (no need to modify)
$home = "http://37.48.90.31";
$url = "$home/c/test.cgi?u=$user&p=$pass";
# Get HTML code
$html = `GET "$url"`;
#### Add code here:
# Grab img from HTML code
if ($html =~ /\img[^>]* src=\"([^\"]*)\"[^>]*/) {
$takeImg = $1;
}
#dirs = split m!/!, $takeImg;
$img = $dirs[2];
#########
die "<img> not found\n" if (!$img);
# Download img to server (save as: ocr_me.img)
print "GET '$img' > ocr_me.img\n";
system "GET '$img' > ocr_me.img";
#### Add code here:
# Run OCR (using shell command tesseract) on img and save text as ocr_result.txt
system ("tesseract", "tesseract ocr_me.img ocr_result");
###########
die "ocr_result.txt not found\n" if (!-e "ocr_result.txt");
# Check OCR results:
$txt = `cat ocr_result.txt`;
I took the image right from the html or i need another Regex?
and how to display the 'ocr_result.txt'
Thanks for all who will help!
Related
I already have a GCloud bucket divided by label as follows:
gs://my_bucket/dataset/label1/
gs://my_bucket/dataset/label2/
...
Each label folder has photos inside. I would like to generate the required CSV – as explained here – but I don't know how to do it programmatically, considering that I have hundreds of photos in each folder. The CSV file should look like this:
gs://my_bucket/dataset/label1/photo1.jpg,label1
gs://my_bucket/dataset/label1/photo12.jpg,label1
gs://my_bucket/dataset/label2/photo7.jpg,label2
...
You need to list all files inside the dataset folder with their complete path and then parse it to obtain the name of the folder containing the file, as in your case this is the label you want to use. This can be done in several different ways. I will include two examples from which you can base your code on:
Gsutil has a method that lists bucket contents, then you can parse the string with a bash script:
# Create csv file and define bucket path
bucket_path="gs://buckbuckbuckbuck/dataset/"
filename="labels_csv_bash.csv"
touch $filename
IFS=$'\n' # Internal field separator variable has to be set to separate on new lines
# List of every .jpg file inside the buckets folder. ** searches for them recursively.
for i in `gsutil ls $bucket_path**.jpg`
do
# Cuts the address using the / limiter and gets the second item starting from the end.
label=$(echo $i | rev | cut -d'/' -f2 | rev)
echo "$i, $label" >> $filename
done
IFS=' ' # Reset to originnal value
gsutil cp $filename $bucket_path
It also can be done using the Google Cloud Client libraries provided for different languages. Here you have an example using python:
# Imports the Google Cloud client library
import os
from google.cloud import storage
# Instantiates a client
storage_client = storage.Client()
# The name for the new bucket
bucket_name = 'my_bucket'
path_in_bucket = 'dataset'
blobs = storage_client.list_blobs(bucket_name, prefix=path_in_bucket)
# Reading blobs, parsing information and creating the csv file
filename = 'labels_csv_python.csv'
with open(filename, 'w+') as f:
for blob in blobs:
if '.jpg' in blob.name:
bucket_path = 'gs://' + os.path.join(bucket_name, blob.name)
label = blob.name.split('/')[-2]
f.write(', '.join([bucket_path, label]))
f.write("\n")
# Uploading csv file to the bucket
bucket = storage_client.get_bucket(bucket_name)
destination_blob_name = os.path.join(path_in_bucket, filename)
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(filename)
For those, like me, who were looking for a way to create the .csv file for batch processing in googleAutoML, but don't need the label column :
# Create csv file and define bucket path
bucket_path="gs:YOUR_BUCKET/FOLDER"
filename="THE_FILENAME_YOU_WANT.csv"
touch $filename
IFS=$'\n' # Internal field separator variable has to be set to separate on new lines
# List of every [YOUREXTENSION] file inside the buckets folder - change in next line - ie **.png beceomes **.your_extension. ** searches for them recursively.
for i in `gsutil ls $bucket_path**.png`
do
echo "$i" >> $filename
done
IFS=' ' # Reset to originnal value
gsutil cp $filename $bucket_path
I created a module in Drupal 8 that needs to load a csv file from the module folder, but I was unable to do it, I have already tried:
$directory = drupal_get_path('module', 'my_module');
$file = 'source.csv';
$path = $directory . '/' . $file;
kint($path);
// open the CVS file
$handle = fopen($path, 'r');
if (!$handle) {
// ...
}
But I'm getting false when loading the file, so looks like it's not the correct way.
I found a way to got it using the following code:
$file = 'source.csv';
$path = __DIR__ . '/' . $file;
// open the CVS file
$handle = #fopen($path, 'r');
if (!$handle) {
// ...
}
If there is a better way just let me know.
basically:
$moduleDir = drupal_get_path('module','my_module');
is the right way
so if your source.csv file is located under modules/MY_MODULE/files/sources.csv
then you should be able to do something like the following in your my_module.module file or elsewhere:
$file = $moduleDir . DIRECTORY_SEPARATOR . 'files' . DIRECTORY_SEPARATOR . 'sources.csv;
if(file_exists($file)){
//do your stuff
}
I have a top level dir path and I want to convert all the relative paths to absolute paths existing in all files inside this directory recursively.
e.g. I have this dir structure:
$ tree
.
|-- DIR
| |-- inner_level.ext1
| `-- inner_level.ext2
|-- top_level.ext1
`-- top_level.ext2
Content of top_level.ext1:
../../a/b/c/filename_1.txt
../../a/d/e/filename_2.txt
Assume the top level dir path is /this/is/the/abs/dir/path/
Want to convert the content of top_level.ext1 to:
/this/is/the/abs/a/b/c/filename_1.txt
/this/is/the/abs/a/d/e/filename_2.txt
Content of top_level.ext2:
cc_include+=-I../../util1/src/module1/moduleController -I../../util/src/module1/module2Controller;
cc_include+=-I../../util2/src/module2/moduleUtility;
Want to convert the content of top_level.ext2 to:
cc_include+=-I/this/is/the/abs/util1/src/module1/moduleController -I/this/is/the/abs/util/src/module1/module2Controller;
cc_include+=-I/this/is/the/abs/util2/src/module2/moduleUtility;
Also, want to apply this same conversion over the files inside DIR.
e.g.
Content of DIR/inner_level.ext1:
../../../a/b/c/filename_1.txt
../../../a/d/e/filename_2.txt
Want to convert the content of DIR/inner_level.ext1 to:
/this/is/the/abs/a/b/c/filename_1.txt
/this/is/the/abs/a/d/e/filename_2.txt
Same for the DIR/inner_level.ext2 also.
Have written this two scripts.
Conversion of top_level.ext1 is working successfully.
file_manager.sh:
#!/usr/bin/bash
file='resolve_path.pl'
basedir='/this/is/the/abs/dir/path'
run_perl(){
echo -e "\n File getting modified: $1"
cp $1 tmp.in
perl $file
mv tmp.out $1
rm tmp.in
}
find $basedir -type f |while read inputfile
do
run_perl $inputfile
done
resolve_path.pl:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use 5.010;
use Switch;
#******************************************************
# Set-up Directory And Input/Output File Names
#******************************************************
our $in_file = glob('tmp.in');
my $out_file1 = 'tmp.out';
print "Input file: $in_file\n";
#************************************
# Local and Global Variables
#*************************************
my $current_path = "/this/is/the/abs/dir/path";
my $temp_path = $current_path;
#************************************
# Open Read and Write File
#************************************
open(READ, $in_file) || die "cannot open $in_file";
open(WRITE, ">$out_file1") || die "cannot open $out_file1";
#******************************************************
# Read The Input [*.out] File Line By Line
#******************************************************
while (<READ>) {
if(/^(\.\.\/){1,}(\w+\/)*(\w+).(\w+)/){
my $file_name = $3;
my $file_ext = $4;
my #count = ($_ =~ /\.\.\//g);
my $cnt = #count;
my #prev_dir = ($_ =~ /\w+\//g);
my $prev_dir_cnt = #prev_dir;
my $file_prev_dir = join('', #prev_dir);
$temp_path = $current_path;
for(my $i=0; $i<$cnt; $i++){
if($temp_path =~m/(\/.*)\/\w+/){
$temp_path = $1;
}
}
print WRITE "$temp_path"."\/"."$file_prev_dir"."$file_name"."\."."$file_ext"."\n";
} else {
print WRITE "$_";
}
}
Issues I am facing:
No conversion is applied over top_level.ext2 & DIR/inner_level.ext2
as my Perl script is not parsing properly for ../es (i.e.
cc_include+=-I is coming at the beginning).
conversion from relative path to absolute path is not working
properly for DIR/inner_level.ext1 and a wrong path is getting
appended.
It would be helpful if someone can suggest expected changes in my scripts to solve the above said two issues.
Why the 2 scripts? That's inefficient.
Perl is perfectly capable of retrieving the list of files and has modules which simplifies that process as well as modules to parse and alter the paths.
File::Find - Traverse a directory tree.
File::Find::Rule - Alternative interface to File::Find
File::Basename - Parse file paths into directory, filename and suffix.
File::Spec - portably perform operations on file names
I want to convert my WordPress website to a static site on GitHub using Jekyll.
I used a plugin that exports my 62 posts to GitHub as Markdown. I now have these posts with extra frontmatter at the beginning of each file. It looks like this:
---
ID: 51
post_title: Here's my post title
author: Frank Meeuwsen
post_date: 2014-07-03 22:10:11
post_excerpt: ""
layout: post
permalink: >
https://myurl.com/slug
published: true
sw_timestamp:
- "399956"
sw_open_thumbnail_url:
- >
https://myurl.com/wp-content/uploads/2014/08/Featured_image.jpg
sw_cache_timestamp:
- "408644"
swp_open_thumbnail_url:
- >
https://myurl.com/wp-content/uploads/2014/08/Featured_image.jpg
swp_open_graph_image_data:
- '["https://i0.wp.com/myurl.com/wp-content/uploads/2014/08/Featured_image.jpg?fit=800%2C400&ssl=1",800,400,false]'
swp_cache_timestamp:
- "410228"
---
This block isn't parsed right by Jekyll, plus I don't need all this frontmatter. I would like to have each file's frontmatter converted to
---
ID: 51
post_title: Here's my post title
author: Frank Meeuwsen
post_date: 2014-07-03 22:10:11
layout: post
published: true
---
I would like to do this with regular expressions. But my knowledge of regex is not that great. With the help of this forum and lots of Google searches I didn't get very far. I know how to find the complete piece of frontmatter but how do I replace it with a part of it as specified above?
I might have to do this in steps, but I can't wrap my head around how to do this.
I use Textwrangler as the editor to do the search and replace.
YAML (and other relatively free formats like HTML, JSON, XML) is best not transformed using regular expressions, it is easy to work for one example and break for the next that has extra whitespace, different indentation etc.
Using a YAML parser in this situation is not trivial, as many either expect a single YAML document in the file (and barf on the Markdown part as extraneous stuff) or expect multiple YAML documents in the file (and barf because the Markdown is not YAML). Moreover most YAML parser throw away useful things like comments and reorder mapping keys.
I have used a similar format (YAML header, followed by reStructuredText) for many years for my ToDo items, and use a small Python program to extract and update these files. Given input like this:
---
ID: 51 # one of the key/values to preserve
post_title: Here's my post title
author: Frank Meeuwsen
post_date: 2014-07-03 22:10:11
post_excerpt: ""
layout: post
permalink: >
https://myurl.com/slug
published: true
sw_timestamp:
- "399956"
sw_open_thumbnail_url:
- >
https://myurl.com/wp-content/uploads/2014/08/Featured_image.jpg
sw_cache_timestamp:
- "408644"
swp_open_thumbnail_url:
- >
https://myurl.com/wp-content/uploads/2014/08/Featured_image.jpg
swp_open_graph_image_data:
- '["https://i0.wp.com/myurl.com/wp-content/uploads/2014/08/Featured_image.jpg?fit=800%2C400&ssl=1",800,400,false]'
swp_cache_timestamp:
- "410228"
---
additional stuff that is not YAML
and more
and more
And this program ¹:
import sys
import ruamel.yaml
from pathlib import Path
def extract(file_name, position=0):
doc_nr = 0
if not isinstance(file_name, Path):
file_name = Path(file_name)
yaml_str = ""
with file_name.open() as fp:
for line_nr, line in enumerate(fp):
if line.startswith('---'):
if line_nr == 0: # don't count --- on first line as next document
continue
else:
doc_nr += 1
if position == doc_nr:
yaml_str += line
return ruamel.yaml.round_trip_load(yaml_str, preserve_quotes=True)
def reinsert(ofp, file_name, data, position=0):
doc_nr = 0
inserted = False
if not isinstance(file_name, Path):
file_name = Path(file_name)
with file_name.open() as fp:
for line_nr, line in enumerate(fp):
if line.startswith('---'):
if line_nr == 0:
ofp.write(line)
continue
else:
doc_nr += 1
if position == doc_nr:
if inserted:
continue
ruamel.yaml.round_trip_dump(data, ofp)
inserted = True
continue
ofp.write(line)
data = extract('input.yaml')
for k in list(data.keys()):
if k not in ['ID', 'post_title', 'author', 'post_date', 'layout', 'published']:
del data[k]
reinsert(sys.stdout, 'input.yaml', data)
You get this output:
---
ID: 51 # one of the key/values to preserve
post_title: Here's my post title
author: Frank Meeuwsen
post_date: 2014-07-03 22:10:11
layout: post
published: true
---
additional stuff that is not YAML
and more
and more
Please note that the comment on the ID line is properly preserved.
¹ This was done using ruamel.yaml a YAML 1.2 parser, which tries to preserve as much information as possible on round-trips, of which I am the author.
Editing my post because I misinterpreted the question the first time, I failed to understand that the actual post was in the same file, right after the ---
Using egrep and GNU sed, so not the bash built-in, it's relatively easy:
# create a working copy
mv file file.old
# get only the fields you need from the frontmatter and redirect that to a new file
egrep '(---|ID|post_title|author|post_date|layout|published)' file.old > file
# get everything from the old file, but discard the frontmatter
cat file.old |gsed '/---/,/---/ d' >> file
# remove working copy
rm file.old
And if you want it all in one go:
for i in `ls`; do mv $i $i.old; egrep '(---|ID|post_title|author|post_date|layout|published)' $i.old > $i; cat $.old |gsed '/---/,/---/ d' >> $i; rm $i.old; done
For good measure, here's what I wrote as my first response:
===========================================================
I think you're making this way too complicated.
A simple egrep will do what you want:
egrep '(---|ID|post_title|author|post_date|layout|published)' file
redirect to a new file:
egrep '(---|ID|post_title|author|post_date|layout|published)' file > newfile
a whole dir at once:
for i in `ls`; do egrep '(---|ID|post_title|author|post_date|layout|published)' $i > $i.new; done
In cases like yours it is better to use actual YAML parser and some scripting language. Cut off metadata from each file to standalone files (or strings), then use YAML library to load the metadata. Once the metadata are loaded, you can modify them safely with no trouble. Then use serialize method from the very same library to create a new metadata file and finally put the files back together.
Something like this:
<?php
list ($before, $metadata, $after) = preg_split("/\n----*\n/ms", file_get_contents($argv[1]));
$yaml = yaml_parse($metadata);
$yaml_copy = [];
foreach ($yaml as $k => $v) {
// copy the data you wish to preserve to $yaml_copy
if (...) {
$yaml_copy[$k] = $yaml[$k];
}
}
file_put_contents('new/'.$argv[1], $before."\n---\n".yaml_emit($yaml_copy)."\n---\n".$after);
(It is just an untested draft with no error checks.)
You could do it with gawk like this:
gawk 'BEGIN {RS="---"; FS="\000" } (FNR == 2) { print "---"; split($1, fm, "\n"); for (line in fm) { if ( fm[line] ~ /^(ID|post_title|author|post_date|layout|published):/) {print fm[line]} } print "---" } (FNR > 2) {print}' post1.html > post1_without_frontmatter_fields.html
You basically want to edit the file. That is what sed (stream editor) is for.
sed -e s/^ID:(*)$^post_title:()$^author:()$^postdate:()$^layout:()$^published:()$/ID:\1\npost_title:\2\nauthor:\3\npostdate:\4\nlayout:\5\npublished:\6/g
You also can use python-frontmatter:
import frontmatter
import io
from os.path import basename, splitext
import glob
# Where are the files to modify
path = "*.markdown"
# Loop through all files
for fname in glob.glob(path):
with io.open(fname, 'r') as f:
# Parse file's front matter
post = frontmatter.load(f)
for k in post.metadata:
if k not in ['ID', 'post_title', 'author', 'post_date', 'layout', 'published']:
del post[k]
# Save the modified file
newfile = io.open(fname, 'w', encoding='utf8')
frontmatter.dump(post, newfile)
newfile.close()
If you want to see more examples visit this page
Hope it helps.
I am running Net-Snmp (environment is a virtual machine running Linux Mint OS 11) and have configured it to send trap information to a text file that I have called trapd.txt.
If I reboot the VM, any trap that is generated is sent to the file no problem. However If I run a C++ program using ifstream to open it and then close it no trap information can be written to it again until I reboot.
When I generate a trap during this state I will sometimes even see the trapd.txt file flicker in the GUI as if it tried to write but failed. This situation happens if I do a clean reboot and run the following code and it alone:
ifstream file;
file.open("trapd.txt");
if(file)
cout<<"open"<<endl;
file.close();
file.open("nothing.txt");
file.close();
exit(0);
Clearly this code is not changing permissions or the SNMP configuration files. The only reason I can think that would prevent trap information from coming in afterwards is that the ifstream is not actually getting closed all the way.
If you have any ideas for a fix or a work around or any insight whatsoever I will be extremely grateful! This is a fairly important to me...
Here's my snmp.conf file:
oidOutputFormat 1
oidOutputFormat 5
logTimestamp yes
escapeQuotes yes
snmptrapd.conf:
authCommunity log,execute,net public
authCommunity log,execute,net private
outputOption auSs
logOption f /home/utd/Desktop/REPO/src/Manager/trapd.txt
snmpd.conf:
authtrapenable 1
master all
linkUpDownNotifications yes
defaultMonitors yes
trap2sink localhost public
rwcommunity private localhost
rocommunity public localhost
###############################################################################
#
# EXAMPLE.conf:
# An example configuration file for configuring the Net-SNMP agent ('snmpd')
# See the 'snmpd.conf(5)' man page for details
#
# Some entries are deliberately commented out, and will need to be explicitly activated
#
###############################################################################
#
# AGENT BEHAVIOUR
#
# Listen for connections from the local system only
agentAddress udp:127.0.0.1:161
# Listen for connections on all interfaces (both IPv4 *and* IPv6)
#agentAddress udp:161,udp6:[::1]:161
###############################################################################
#
# SNMPv3 AUTHENTICATION
#
# Note that these particular settings don't actually belong here.
# They should be copied to the file /var/lib/snmp/snmpd.conf
# and the passwords changed, before being uncommented in that file *only*.
# Then restart the agent
# createUser authOnlyUser MD5 "remember to change this password"
# createUser authPrivUser SHA "remember to change this one too" DES
# createUser internalUser MD5 "this is only ever used internally, but still change the password"
# If you also change the usernames (which might be sensible),
# then remember to update the other occurances in this example config file to match.
###############################################################################
#
# ACCESS CONTROL
#
# system + hrSystem groups only
view systemonly included .1.3.6.1.2.1.1
view systemonly included .1.3.6.1.2.1.25.1
# Full access from the local host
# Default access to basic system info
# Full access from an example network
# Adjust this network address to match your local
# settings, change the community string,
# and check the 'agentAddress' setting above
# Full read-only access for SNMPv3
rouser authOnlyUser
# Full write access for encrypted requests
# Remember to activate the 'createUser' lines above
#rwuser authPrivUser priv
# It's no longer typically necessary to use the full 'com2sec/group/access' configuration
# r[ou]user and r[ow]community, together with suitable views, should cover most requirements
###############################################################################
#
# SYSTEM INFORMATION
#
# Note that setting these values here, results in the corresponding MIB objects being 'read-only'
# See snmpd.conf(5) for more details
sysContact Me <me#example.org>
# Application + End-to-End layers
sysServices 72
#
# Process Monitoring
#
# At least one 'mountd' process
proc mountd
# No more than 4 'ntalkd' processes - 0 is OK
proc ntalkd 4
# At least one 'sendmail' process, but no more than 10
proc sendmail 10 1
# Walk the UCD-SNMP-MIB::prTable to see the resulting output
# Note that this table will be empty if there are no "proc" entries in the snmpd.conf file
#
# Disk Monitoring
#
# 10 MB required on root disk, 5% free on /var, 10% free on all other disks
disk / 10000
disk /var 5%
includeAllDisks 10%
# Walk the UCD-SNMP-MIB::dskTable to see the resulting output
# Note that this table will be empty if there are no "disk" entries in the snmpd.conf file
#
# System Load
#
# Unacceptable 1-, 5-, and 15-minute load averages
load 12 10 5
# Walk the UCD-SNMP-MIB::laTable to see the resulting output
# Note that this table *will* be populated, even without a "load" entry in the snmpd.conf file
###############################################################################
#
# ACTIVE MONITORING
#
# Send SNMPv1 traps
# Send SNMPv2c traps
# Send SNMPv2c INFORMs
# Note that you typically only want *one* of these three lines
# Uncommenting two (or all three) will result in multiple copies of each notification.
#
# Event MIB - automatically generate alerts
#
# Remember to activate the 'createUser' lines above
iquerySecName internalUser
rouser internalUser
# Generate traps on UCD error conditions
# Generate traps on linkUp/Down
###############################################################################
#
# EXTENDING THE AGENT
#
#
# Arbitrary extension commands
#
extend test1 /bin/echo Hello, world!
extend-sh test2 echo Hello, world! ; echo Hi there ; exit 35
#perl $debugging = \'1\';
#perl $verbose = \'1\';
#perl {$regat = \'.1.3.6.1.4.1.8072.999\'; $extenstion = \'1\'; $mibdata = \'/etc/passwd\'; $delimT=\'\'; $delimV=\':\'; do \'/etc/snmp/snmpagent.pl\';}
#perl print STDERR 'Test'
#perl $debugging = '1';
#perl $verbose = '1';
#perl $regat = '.1.3.6.1.4.8072.999';
#perl $extenstion = '1';
#perl $mibdata = '/etc/passwd';
#perl $delimT='';
#perl $delimV=':';
#perl do '/home/utd/snmpagent.pl';
#perl print STDERR 'Now loading Perl extensions...\n'
#perl $mibdata = "dick.txt";
#perl do '/home/utd/mymod.pl';
#extend-sh test3 /bin/sh /tmp/shtest
# Note that this last entry requires the script '/tmp/shtest' to be created first,
# containing the same three shell commands, before the line is uncommented
# Walk the NET-SNMP-EXTEND-MIB tables (nsExtendConfigTable, nsExtendOutput1Table
# and nsExtendOutput2Table) to see the resulting output
# Note that the "extend" directive supercedes the previous "exec" and "sh" directives
# However, walking the UCD-SNMP-MIB::extTable should still returns the same output,
# as well as the fuller results in the above tables.
#
# "Pass-through" MIB extension command
#
#pass .1.3.6.1.4.1.8072.2.255 /bin/sh PREFIX/local/passtest
#pass .1.3.6.1.4.1.8072.2.255 /usr/bin/perl PREFIX/local/passtest.pl
# Note that this requires one of the two 'passtest' scripts to be installed first,
# before the appropriate line is uncommented.
# These scripts can be found in the 'local' directory of the source distribution,
# and are not installed automatically.
# Walk the NET-SNMP-PASS-MIB::netSnmpPassExamples subtree to see the resulting output
#
# AgentX Sub-agents
#
# Run as an AgentX master agent
master agentx
# Listen for network connections (from localhost)
# rather than the default named socket /var/agentx/master
#agentXSocket tcp:localhost:705
perl $mibdata = "/etc/snmp/agenty.conf";
perl do "/etc/snmp/agenty.pl";
The problem's origin was actually from the editing and saving of the file itself by myself using gedit. While I still do not understand why this would cause the issue I can work around it by not editing the file. Thanks to everyone who replied.