I am trying to use the textcleaner script to clean up real-life images for OCR. The issue I am having is that the images sent to me are sometimes rather large (3.5 MB - 5 MB, 12 MP pictures). The command I run with textcleaner ( textcleaner -g -e none -f <int # 10 - 100> -o 5 result1.jpg out1.jpg ) takes about 10 seconds at -f 10 and minutes or more at -f 100.
To get around this I tried using ImageMagick to compress the image so it was much smaller. Using convert -strip -interlace Plane -gaussian-blur 0.05 -quality 50% main.jpg result1.jpg I was able to take a 3.5 MB file and convert it almost losslessly to ~400 KB. However, when I run textcleaner on this new file it STILL acts like it's a 3.5 MB file (the times are almost exactly the same). I have tested the same textcleaner settings against a file that was natively ~400 KB (never compressed), and it is almost instant, with -f 100 taking about 12 seconds.
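One thing I suspect but have not verified: -quality only shrinks the file on disk, while textcleaner works on the decoded pixels, so the compressed image still has exactly the same pixel dimensions as the original. Resizing would actually reduce the pixel count, though I have not tried whether the resized file keeps enough detail for OCR:

convert main.jpg -resize 50% result1.jpg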
I am about out of ideas. I would like to follow the example here, as I am in almost exactly the same situation. However, at the current speed of transformation an entire OCR run could take over 10 minutes, when I need it to be around 30 seconds.
I'm using a Raspberry Pi 4 and I'm trying to make it speak using gtts-cli. It works, but the first time I play something it skips the first second.
I run this command:
gtts-cli -l en 'Good morning my dear friend' | mpg321 -q -
It works, but the first time I run it, it misses the word "Good". If I run it again quickly after the first command finishes, it includes all the words and sounds OK. If I wait for a minute and try again, I get the same problem.
Then I tried creating an mp3 from the gtts-cli command with this:
gtts-cli -l en 'Good morning my dear friend' --output test.mp3
Then if I play it with mpg321 I have the same problem, so it's not gtts-cli.
I tried different players, like play from sox, but I get the same issue.
RESOLVED: check this out:
https://raspberrypi.stackexchange.com/questions/132715/skip-1second-in-play-mp3-the-first-time/132722?noredirect=1#comment225721_132722
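For anyone who cannot follow the link: a common cause of this symptom is the audio output being suspended between plays, so the first moment of sound is clipped while it wakes up. One workaround (an assumption on my part, not necessarily what the linked answer does) is to pad the start of the file with a short silence using sox (with mp3 support installed, e.g. libsox-fmt-mp3):

gtts-cli -l en 'Good morning my dear friend' --output test.mp3
sox test.mp3 padded.mp3 pad 0.5
mpg321 -q padded.mp3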
I have been working hard for 4 days now to get the Google Cloud Speech-to-Text API to work, but I still see no light at the end of the tunnel. I have searched the net a lot and read the documentation a lot, but I see no result.
Our site is bbsradio.com; we are trying to automatically extract transcripts from our mp3 files using the Google Speech-to-Text API. The code is written in PHP and is an almost exact copy of this: https://github.com/GoogleCloudPlatform/php-docs-samples/blob/master/speech/src/transcribe_async.php
I can see the process completes and reaches "$operation->pollUntilComplete();", but it never reports success at "if ($operation->operationSucceeded()) {", and it does not return any error at $operation->getError() either.
I am converting the mp3 to a raw file like this: ffmpeg -y -loglevel panic -i /public_html/sites/default/files/show-archives/audio-clips-9-23-2020/911freefall2020-05-24.mp3 -f s16le -acodec pcm_s16le -vn -ac 1 -ar 16000 -map_metadata -1 /home/mp3_to_raw/911freefall2020-05-24.raw
I tried the FLAC format as well; it did not work either. I tested the converted FLAC file using Windows Media Player and I can hear the conversation clearly. I checked the files: 16000 Hz, 1 channel, 16-bit. I can see the file is uploaded to Cloud Storage. I have checked these:
https://cloud.google.com/speech-to-text/docs/troubleshooting and
https://cloud.google.com/speech-to-text/docs/best-practices
There is a lot of discussion and documentation out there, but nothing seems helpful at this moment. If someone can really help me find the issue, it would be really, really great!
TLDR; convert from MP3 to a 1-channel FLAC file with the same sample rate as your MP3 file.
Long explanation:
Since you're using MP3 files as your process input, MP3 compression artifacts are probably hurting you when you resample to 16 kHz (you cannot hear them, but the algorithm will).
To confirm this theory:
Execute ffprobe -hide_banner filename.mp3 and it will output something like this:
Metadata:
...
Duration: 00:02:12.21, start: 0.025057, bitrate: 320 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Metadata:
encoder : LAME3.99r
In this case, the sample rate is fine for the Google Speech API. Just transcode the file without changing the sample rate (remove the -ar 16000 from your ffmpeg command).
You might get into trouble if the original MP3 bitrate is low. 320kb/s seems safe (unless the recording has a lot of noise).
Take into account that voice encoded at under 64 kb/s (ISDN line quality) may be understood only by humans if there is some noise.
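Putting the TLDR into a concrete command (a sketch; it assumes a 44100 Hz source like the ffprobe output above, and input.mp3/output.flac are placeholder names):

# Mono FLAC; the original sample rate is preserved because no -ar option is given.
ffmpeg -y -i input.mp3 -vn -ac 1 -c:a flac output.flac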
At last I found the solution and the reason for the issue. Actually, getting empty results is caused by a bug in the PHP API code. What you need to do:
Replace this:
$operation->pollUntilComplete();
with this:
while (!$operation->isDone()) {
    $operation->pollUntilComplete();
}
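For context, here is roughly how the fix sits in the linked transcribe_async.php sample (a sketch based on that sample; the surrounding method names come from the google/cloud-speech client used there):

// Poll until the long-running operation is really done, then read the result.
while (!$operation->isDone()) {
    $operation->pollUntilComplete();
}

if ($operation->operationSucceeded()) {
    $response = $operation->getResult();
    // Each result covers a consecutive portion of the audio.
    foreach ($response->getResults() as $result) {
        $alternative = $result->getAlternatives()[0];
        printf('Transcript: %s' . PHP_EOL, $alternative->getTranscript());
    }
} else {
    print_r($operation->getError());
}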
I need to extract the number of pages and their sizes in px/mm/cm/some-unit from PDF files using Python (sadly, 2.7, because it's a legacy project). The problem is that the files can be truly huge (hundreds of MiBs) because they'll contain large images.
I do not care for this content and I really want just a list of page sizes from the file, with as little consumption of RAM as possible.
I found quite a few libraries that can do that (including, but not limited to, the ones in the answers here), but none of them make any remarks on memory usage, and I suspect that most of them - if not all - read the whole file into memory before doing anything with it, which doesn't fit my purpose.
Are there any libraries that extract only structure and give me the data that I need without clogging my RAM?
pyvips can do this. It loads the file structure when you open the PDF and only renders each page when you ask for pixels.
For example:
#!/usr/bin/python

import sys
import pyvips

# Open each page in turn. pyvips only reads the page size here;
# pixels are not rendered until they are actually requested.
i = 0
while True:
    try:
        x = pyvips.Image.new_from_file(sys.argv[1], dpi=300, page=i)
    except pyvips.Error:
        # No more pages.
        break
    print("page =", i)
    print("width =", x.width)
    print("height =", x.height)
    i += 1
libvips 8.7, due in another week or so, adds a new metadata item called n-pages that you can use to get the length of the document. Until that is released, though, you need to just keep incrementing the page number until you get an error.
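Once 8.7 is out, the lookup should look something like this (a sketch, untested against the new release):

# n-pages comes from the document header; no pages are rendered.
x = pyvips.Image.new_from_file(sys.argv[1], dpi=300)
n_pages = x.get("n-pages")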
Using this PDF, when I run the program I see:
$ /usr/bin/time -f %M:%e ./sizes.py ~/pics/r8.pdf
page = 0
width = 2480
height = 2480
page = 1
width = 2480
height = 2480
page = 2
width = 4960
height = 4960
...
page = 49
width = 2480
height = 2480
55400:0.19
So it opened 50 pages in 0.2 s of real time, with a total peak memory use of 55 MB. That's with Python 3, but it works fine with Python 2 as well. The dimensions are in pixels at 300 DPI.
If you set page to -1, it'll load all the pages in the document as a single very tall image. All the pages need to be the same size for this though, sadly.
Inspired by the other answer, I found that libvips, which is suggested there, uses poppler (it can fall back to some other library if it cannot find poppler).
So, instead of using the super-powerful pyvips, which seems great for multiple types of documents, I went with just poppler, which has multiple Python bindings. I picked pdflib and came up with this solution:
from sys import argv

from pdflib import Document

doc = Document(argv[1])
for num, page in enumerate(doc, start=1):
    print(num, tuple(2.54 * x / 72 for x in page.size))
The 2.54 * x / 72 part converts from points (PDF page sizes are in points, 1/72 of an inch) to centimetres, nothing more. For example, an A4 width of 595 pt comes out as 595 × 2.54 / 72 ≈ 21.0 cm.
Speed and memory test on a 264MiB file with one huge image per page:
$ /usr/bin/time -f %M\ %e python t2.py big.pdf
1 (27.99926666666667, 20.997333333333337)
2 (27.99926666666667, 20.997333333333337)
...
56 (27.99926666666667, 20.997333333333337)
21856 0.09
Just for reference, if anyone is looking for a pure Python solution, I made a crude one, which is available here. It is not thoroughly tested and is much, much slower than this (some 30 s for the file above).
I'm working with an IP camera, and I have JPEG frames and audio data (PCM) from the camera.
Now I want to create a video file (with both audio and video) in .avi or .mp4 format from the above data.
I searched and learned that the ffmpeg library can do this, but I don't know how to use ffmpeg for it.
Can you suggest some sample code or the ffmpeg functions to do it?
If your objective is to write a C++ app to do this for you, please disregard this answer; I'll just leave it here for future reference. If not, here's how you can do it in bash:
First, make sure your images are in a nice format, easy to handle by ffmpeg. You can copy the images to a different directory:
mkdir tmp
# Copy the images to tmp/ under sequential, zero-padded names (img001.jpg, img002.jpg, ...).
x=1
for i in *.jpg; do
    counter=$(printf %03d $x)
    cp "$i" tmp/img"$counter".jpg
    x=$((x+1))
done
Copy your audio data to the tmp directory and encode the video. Let's say your camera took a picture every ten seconds:
cd tmp
ffmpeg -i audio.wav -f image2 -i img%03d.jpg -vcodec msmpeg4v2 -r 0.1 -intra out.avi
Where -r 0.1 indicates a framerate of 0.1 which is one frame every 10 seconds.
The possible issues here are:
Your audio/video might go slightly out of sync unless you calculate your desired framerate carefully in advance. You should be able to get the length of the audio (or video) using ffprobe (see the sketch after this list). Even so, sync might be an issue with longer clips.
If you have more than 999 images, the %03d format will not be enough; make sure to change the 3 to the desired length of the index.
The video will inherit its length from the longer of the two streams; you can restrict it using the -t switch:
-t duration - Restrict the transcoded/captured video sequence to the duration specified in seconds. "hh:mm:ss[.xxx]" syntax is also supported.
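A sketch for the duration lookup mentioned above (standard ffprobe options, no grep magic needed; audio.wav is a placeholder name):
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 audio.wav
The framerate to pass to -r is then the number of images divided by that duration in seconds.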
My RRD file is not updating. What could be the reason?
The graph shows the legend with: -nanv
I created the RRD file using this syntax:
rrdtool create ups.rrd --step 300
DS:input:GAUGE:600:0:360
DS:output:GAUGE:600:0:360
DS:temp:GAUGE:600:0:100
DS:load:GAUGE:600:0:100
DS:bcharge:GAUGE:600:0:100
DS:battv:GAUGE:600:0:100
RRA:AVERAGE:0.5:12:24
RRA:AVERAGE:0.5:288:31
Then I updated the file with this syntax:
rrdtool update ups.rrd N:$inputv:$outputv:$temp:$load:$bcharge:$battv
And graphed it with this:
rrdtool graph ups-day.png
-t "ups "
-s -1day
-h 120 -w 616
-a PNG
-cBACK#F9F9F9
-cSHADEA#DDDDDD
-cSHADEB#DDDDDD
-cGRID#D0D0D0
-cMGRID#D0D0D0
-cARROW#0033CC
DEF:input=ups.rrd:input:AVERAGE
DEF:output=ups.rrd:output:AVERAGE
DEF:temp=ups.rrd:temp:AVERAGE
DEF:load=ups.rrd:load:AVERAGE
DEF:bcharge=ups.rrd:bcharge:AVERAGE
DEF:battv=ups.rrd:battv:AVERAGE
LINE:input#336600
AREA:input#32CD3260:"Input Voltage"
GPRINT:input:MAX:" Max %lgv"
GPRINT:input:AVERAGE:" Avg %lgv"
GPRINT:input:LAST:"Current %lgv\n"
LINE:output#4169E1:"Output Voltage"
GPRINT:output:MAX:"Max %lgv"
GPRINT:output:AVERAGE:" Avg %lgv"
GPRINT:output:LAST:"Current %lgv\n"
LINE:load#FD570E:"Load"
GPRINT:load:MAX:" Max %lg%%"
GPRINT:load:AVERAGE:" Avg %lg%%"
GPRINT:load:LAST:" Current %lg%%\n"
LINE:temp#000ACE:"Temperature"
GPRINT:temp:MAX:" Max %lgc"
GPRINT:temp:AVERAGE:" Avg %lgc"
GPRINT:temp:LAST:" Current %lgc"
You will need at least 13 updates, each 5 minutes apart (i.e., 12 PDPs (primary data points)), before a single CDP (consolidated data point) gets written to your RRAs, enabling you to get a data point on the graph. This is because your smallest-resolution RRA has a count of 12, meaning you need 12 PDPs to make one CDP.
Until you have enough data to write a CDP, you have nothing to graph, and your graph will always show unknown data.
Alternatively, add a smaller-resolution RRA (maybe count 1) so that you do not need to collect data for so long before you have a full CDP, as in the line below.
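For example, adding this RRA to the create command keeps one day of raw 5-minute samples (one PDP per CDP; 288 rows × 300 s step = 24 h, and the row count is just a suggestion):

RRA:AVERAGE:0.5:1:288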
The update script needs to be run at exactly the same interval as defined in your database.
I see it has a step value of 300 so the database should be updated every 5 minutes.
Just place your update script in a cron job (you can also do this for your graph script).
For example,
sudo crontab -e
If you are running it for the first time, choose your favorite editor (I usually go with Vim) and add the full path of your script, scheduled to run every 5 minutes. So add this (don't forget to adjust the paths):
*/5 * * * * /usr/local/update_script > /dev/null && /usr/local/graph_script > /dev/null
Save it and wait a couple of minutes. I usually redirect the output to /dev/null because cron emails any output a script produces; without the redirect, every run that prints something will trigger a notification.