I'm trying to use the TagLib C++ API to read ID3v2 metadata from an arbitrary audio file. This file is not necessarily an .mp3 file and may be in one of the other common audio formats. I have the following:
std::string readId3v2Tag(std::string filePath, std::string tagName) {
    // read from file
    TagLib::FileRef f(filePath.c_str());
    if (!f.isNull() && f.file()) {
        // get tags from property map
        TagLib::PropertyMap tags = f.file()->properties();
        if (tags.find(tagName) != tags.end()) {
            return std::string(tags[tagName][0].toCString());
        }
    }
    // no such key, or the file could not be opened
    return std::string();
}
However, when I input an ID3v2 frame name, it doesn't return anything. I believe this is because the f.file()->properties() map contains TagLib's tag format. I must be able to access ID3v2 frames by name.
I have been told to use the ID3v2 class, however I don't see how to access this from a file, and am having trouble reading the API docs. Does anyone know how to do this?
Always read the manual: it tells you to not use the file() approach. Also properties() won't give you ID3v2 tag frames - you should iterate all of them to see their keys and values.
Instead, use MPEG::File (see manual), and from there go via ID3v2Tag() (see manual) to frameList() (see manual).
It's pretty straightforward once the terms are clear: a file can have zero to multiple tags, and an ID3v2 tag can have one to multiple frames. The file itself can also have properties that are not bound to any tag (e.g. audio duration, bit depth...), so it is no wonder that none of the frame names you search for show up in the file's properties.
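To make the tag/frame relationship concrete, here is a minimal sketch in Python (not TagLib, and not a replacement for it) that walks the frames of an ID3v2.3/2.4 tag at the start of a byte buffer and indexes them by frame ID. The function names are made up for illustration. It assumes no extended header, no unsynchronisation and no compressed or encrypted frames.

```python
# Text-frame encodings as defined by ID3v2 (first payload byte).
ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def synchsafe_to_int(data):
    """Decode a synchsafe integer: only the low 7 bits of each byte count."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def read_id3v2_frames(data):
    """Return {frame_id: [raw_payload, ...]} for a tag at the start of data.

    Hypothetical helper for illustration; handles ID3v2.3 and ID3v2.4 only.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return {}  # the file has no ID3v2 tag at all
    major = data[3]
    if major not in (3, 4):
        return {}  # ID3v2.2 uses 3-character frame IDs, not handled here
    if data[5] & 0x40:
        raise ValueError("extended header is not supported in this sketch")

    end = min(10 + synchsafe_to_int(data[6:10]), len(data))
    pos = 10
    frames = {}
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id[0] == 0:
            break  # reached the padding after the last frame
        raw_size = data[pos + 4:pos + 8]
        # v2.4 stores frame sizes synchsafe, v2.3 as a plain big-endian int.
        if major == 4:
            size = synchsafe_to_int(raw_size)
        else:
            size = int.from_bytes(raw_size, "big")
        payload = data[pos + 10:pos + 10 + size]
        frames.setdefault(frame_id.decode("latin-1"), []).append(payload)
        pos += 10 + size
    return frames


def decode_text_frame(payload):
    """Decode a text frame (T***): one encoding byte, then the text."""
    if not payload:
        return ""
    encoding = ENCODINGS.get(payload[0], "latin-1")
    return payload[1:].decode(encoding, errors="replace").rstrip("\x00")
```

A lookup such as `frames.get("TIT2")` then plays the role that searching the result of frameList() by frame ID plays in TagLib. A file without an ID3v2 tag simply yields an empty dictionary.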
I'm looking into using a rich text editor in my Django project. TinyMCE looks like the obvious solution, however I see that the output format is HTML (here). The goal is to store user input and then serve it inside a Word document using python-docx (which is not HTML).
Do you know of any solution for this? Either a feature of TinyMCE, an HTML-to-Word converter that keeps styles, or maybe another rich text editor similar to TinyMCE?
UPDATE:
This is another option which I found to be working fine. I'm still at the point of trying to convert HTML to Word without losing styles. A solution for this may be pywin32, as stated here, but it doesn't help me much and it's Windows only.
UPDATE 2:
After quite some digging I found pandoc and pypandoc, which appear to be able to convert to any of these output formats:
"asciidoc, beamer, commonmark, context, docbook, docbook4, docbook5, docx, dokuwiki, dzslides, epub, epub2, epub3, fb2, gfm, haddock, html, html4, html5, icml, jats, json, latex, man, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, ms, muse, native, odt, opendocument, opml, org, plain, pptx, revealjs, rst, rtf, s5, slideous, slidy, tei, texinfo, textile, zimwiki"
I haven't figured out how to feed such output into python-docx.
I had the same challenge. You'll want to use Python's Beautiful Soup library to iterate through the content from your HTML editor (I use Summernote, but any HTML editor should work) and then parse the HTML tags into a format that python-docx can use. Pandoc and pypandoc will convert files for you (e.g. you start with a LaTeX file and need to convert it to Word), but will not provide the tools you need to convert to and from XML/HTML.
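As a rough sketch of that idea, the snippet below walks the editor's HTML and turns it into paragraphs made of (text, bold, italic) runs. It uses the standard library's html.parser instead of Beautiful Soup so that it has no dependencies. The class and function names are made up for illustration, and only `<p>`, `<b>`/`<strong>` and `<i>`/`<em>` are handled.

```python
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Collect paragraphs as lists of (text, bold, italic) runs."""

    BOLD_TAGS = {"b", "strong"}
    ITALIC_TAGS = {"i", "em"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # one list of runs per paragraph
        self._current = None   # runs of the paragraph being read
        self._bold = 0         # nesting depth of bold tags
        self._italic = 0       # nesting depth of italic tags

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current = []
            self.paragraphs.append(self._current)
        elif tag in self.BOLD_TAGS:
            self._bold += 1
        elif tag in self.ITALIC_TAGS:
            self._italic += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self._current = None
        elif tag in self.BOLD_TAGS:
            self._bold = max(0, self._bold - 1)
        elif tag in self.ITALIC_TAGS:
            self._italic = max(0, self._italic - 1)

    def handle_data(self, data):
        if self._current is None:
            if not data.strip():
                return  # ignore whitespace between paragraphs
            # Text outside any <p>: start an implicit paragraph.
            self._current = []
            self.paragraphs.append(self._current)
        self._current.append((data, self._bold > 0, self._italic > 0))


def html_to_runs(html):
    """Parse an HTML fragment into a list of paragraphs of runs."""
    parser = RunCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

With python-docx, each paragraph would then presumably become one `document.add_paragraph()` call and each run one `paragraph.add_run(text)`, with `run.bold` and `run.italic` set from the flags. Headings, lists, links and inline styles would need extra cases of their own.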
Good luck!
I'm new to Django and I'm trying to achieve what the title mentions, but I could not find an AudioField or MediaField in Django models, something similar to ImageField.
To explain better what I want:
I want to give the user a form where they can fill in some information and upload a zip file containing mp3 files. Then, on the server, I want to take this zip file, unzip it, get all of the mp3 files inside, read some information about them (name, artist, duration, etc.) and save it in my model (Music).
Is there a tutorial explaining how to achieve that, or some links explaining how to work with zip files and mp3 files?
I think all you need here are the following two links:
Python's standard library (both 2.x and 3.x) contains a module for working with zip files: https://docs.python.org/3/library/zipfile.html
e.g.:
from zipfile import ZipFile

with ZipFile('music_files.zip') as zip_file:
    # get the list of files
    names = zip_file.namelist()
    # handle your files as you need. You can read each file with:
    for name in names:
        with zip_file.open(name) as f:
            music_file = f.read()
            # retrieve music_file metadata here
As for extracting metadata from mp3 files, there is a library: http://eyed3.nicfit.net
Hope it will help you.
The field you are looking for is FileField, which is agnostic to the type of file it references.
Python's standard library includes a module for working with zip archives: zipfile.
You can use the eyeD3 library to extract ID3 metadata from the MP3 files.
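Putting the zipfile part together, a minimal sketch could look like the following. The function name is made up, and the upload is assumed to be a path or a seekable binary file-like object (which is what an uploaded file in Django should behave like). Reading the metadata itself is left to the ID3 library.

```python
import zipfile


def read_mp3_members(uploaded):
    """Yield (name, content_bytes) for every .mp3 entry in a zip archive.

    `uploaded` is a path or a seekable binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(uploaded) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not an mp3.
            if info.is_dir() or not info.filename.lower().endswith(".mp3"):
                continue
            with archive.open(info) as member:
                yield info.filename, member.read()
```

Since ID3 libraries often expect a file path rather than bytes, you may need to write each member to a temporary file before reading its metadata. Filtering by extension is only a heuristic, so it is worth checking the content as well before trusting an upload.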
I have a few html pages, each with a number of posts that follow a given pattern and that contain a lot of different information, among others a well-identified url and an associated name and date. I would like to produce a table containing date + name + url in separate columns and ignore the rest of the text in the document (both data and html formatting).
I was thinking of using OpenOffice and its regex functions to do so, but I don’t see how I would do the actual extraction from HTML to a table (I am familiar with search and replace but am not sure there is a way to do extraction; Jan Dvorak’s third comment on the question How to extract file name from random image <img> tags in Open Office speaks against it).
Is there a good way to do this text extraction, in OpenOffice or with any other tool?
Since you're parsing HTML, it would be easier to use an HTML parsing engine. For example in PHP you could pull all the links or all the images from a page with a few simple lines.
// Requires the PHP Simple HTML DOM Parser library, which provides file_get_html()
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('path and file name');
// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}
// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
This could be further refined if you had some additional information about the values being pulled and how they are stored in the file.
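For the date + name + url table itself, the same parser-based approach can be sketched in Python with only the standard library. The markup is an assumption, since the question does not show it: each post is taken to contain a `<span class="date">` followed by an `<a class="post-link" href="...">` whose text is the name. The class names and function names are hypothetical and need adapting to the real pages.

```python
import csv
import io
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect (date, name, url) rows; the markup layout is assumed."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._date = ""         # most recent date seen
        self._in_date = False   # inside <span class="date">
        self._url = None        # href of the link being read, if any
        self._name_parts = []   # text pieces inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "date":
            self._in_date = True
            self._date = ""
        elif tag == "a" and attrs.get("class") == "post-link" and attrs.get("href"):
            self._url = attrs["href"]
            self._name_parts = []

    def handle_endtag(self, tag):
        if tag == "span" and self._in_date:
            self._in_date = False
        elif tag == "a" and self._url is not None:
            name = "".join(self._name_parts).strip()
            self.rows.append((self._date.strip(), name, self._url))
            self._url = None

    def handle_data(self, data):
        if self._in_date:
            self._date += data
        elif self._url is not None:
            self._name_parts.append(data)
        # All other text and markup is ignored.


def extract_rows(html):
    """Return a list of (date, name, url) tuples found in the HTML."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["date", "name", "url"])
    writer.writerows(rows)
    return out.getvalue()
```

The resulting CSV text can be saved to a file and opened as a spreadsheet, which gives the three separate columns while everything else in the page is dropped.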
I'm working on an image processing project (C++) and I need to write custom metadata to a JPEG file after the processing is complete. How can I accomplish this? Is there any library available to do it?
If you're talking about EXIF metadata, you may want to look at exiv2, which is a C++ library for processing EXIF metadata. There is a second library called libexif, which is written in C.
Exiv2 has a few examples on their website and the API is well documented.
UPDATE: If you want to add custom metadata you could use either the MakerNote or the UserComment tag.
Exif Standard: PDF see Section 4.6.5 EXIF IFD Attribute Information Table 7, Tags Relating to User Information.
MakerNote: Type UNDEFINED, Count Any
UserComment: Type UNDEFINED, Count Any
which means you're allowed to use those two tags for any data you want.