I can't shed much light on other media, but I've put serious effort into my ebook collection.
I had collected upwards of 3 or 4 thousand old-fashioned paper books by the early oughts, and I was running out of space.
Plus, I can stick about 50-60 thousand ebook files on a 1 TB thumb drive. Make that 2 TB if I'm remediating ebook flaws and saving the originals on the same media.
I have a hierarchical file system, based on subject, on my network file share. That may well outlive me, since it's implemented as a ZFS RAID-Z2 array in two volumes of 6 drives each. The component hard drives are the old Samsung HD204UI 2 TB units that you'd practically need a fire axe to kill. But that system also badly needs an upgrade: it's been in service for about a dozen years by this time.
As you'd expect from a math geek, the math ebook directory hierarchy is the most broadly elaborated one. OTOH, archaeology, anthropology, and history go into a common \History\ directory because that's how I use books on archaeology and anthropology.
The magic is in the format of the ebook file names themselves. It's a standard format, broken up into four space-separated sections:
Title (Author) (Mon Year) ISBN
Example:
Catastrophe Theory, 3rd Ed (Vladimir I. Arnold) (Jan 2004) 3540548114.pdf
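As a rough illustration of how that naming standard can be picked apart by a script, here is a minimal Python sketch. The `EbookName` tuple, the `parse_ebook_name` function, and the regular expression are all made up for this example and are not part of the workflow described here. The pattern assumes a three-letter month, a four-digit year, and an unhyphenated ISBN.

```python
import re
from typing import NamedTuple, Optional


class EbookName(NamedTuple):
    """Hypothetical container for the four sections plus the extension."""
    title: str
    author: str
    month: str
    year: int
    isbn: str
    ext: str


# Title (Author) (Mon Year) ISBN.ext
# Assumption: author names contain no parentheses of their own.
_NAME_PATTERN = re.compile(
    r"^(?P<title>.+) \((?P<author>[^()]+)\) "
    r"\((?P<month>[A-Z][a-z]{2}) (?P<year>[0-9]{4})\) "
    r"(?P<isbn>[0-9]{9}[0-9X]|[0-9]{13})\.(?P<ext>\w+)$"
)


def parse_ebook_name(filename: str) -> Optional[EbookName]:
    """Split a file name into its sections, or return None if it doesn't fit."""
    m = _NAME_PATTERN.match(filename)
    if m is None:
        return None
    return EbookName(
        title=m["title"],
        author=m["author"],
        month=m["month"],
        year=int(m["year"]),
        isbn=m["isbn"],
        ext=m["ext"],
    )
```

Fed the example file name above, `parse_ebook_name` yields the title `Catastrophe Theory, 3rd Ed`, the author `Vladimir I. Arnold`, the year `2004`, and the ISBN `3540548114`. A name that doesn't follow the standard comes back as `None`, which makes it easy to flag strays.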
The ISBN is ISBN-10 for publication dates through 2005, ISBN-13 for 2006 and later. I collect the raw ISBN from the file itself whenever it's possible (about 999 times out of 1000).
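The ISBN rule lends itself to a quick sanity check. The sketch below uses the standard ISBN-10 and ISBN-13 check-digit formulas plus the 2005/2006 cutoff described above. The function names are hypothetical, and the code assumes the ISBN has already been stripped of hyphens and spaces.

```python
import re


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, last character may be X (= 10), sum divisible by 11."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    values = [10 if c == "X" else int(c) for c in isbn]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: alternating weights 1, 3, 1, 3, ...; sum divisible by 10."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
    return total % 10 == 0


def isbn_fits_convention(isbn: str, pub_year: int) -> bool:
    """ISBN-10 through 2005, ISBN-13 from 2006 on, per the naming rule above."""
    if pub_year <= 2005:
        return is_valid_isbn10(isbn)
    return is_valid_isbn13(isbn)
```

For the example file, `isbn_fits_convention("3540548114", 2004)` returns `True`. The same ISBN paired with a 2006 date returns `False`, since the convention expects 13 digits by then.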
I deal with the colon, an NTFS reserved symbol that's present in about half the history books published since 1980, by replacing it with the Unicode character "：", a.k.a. "fullwidth colon", U+FF1A. I use a trick that's a bit more complex to replace forward slashes in titles. The intent is to generate something that reads to the Mark I eyeball as ":" or "/" even though it's really a Unicode and/or multi-character hack.
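The colon substitution is simple enough to sketch in a few lines of Python. The function names here are invented for illustration, and only the colon is handled, since the forward-slash trick isn't spelled out above.

```python
FULLWIDTH_COLON = "\uFF1A"  # U+FF1A, looks like ":" but is legal in NTFS file names


def ntfs_safe_title(title: str) -> str:
    """Swap every ASCII colon for its fullwidth look-alike."""
    return title.replace(":", FULLWIDTH_COLON)


def original_title(safe_title: str) -> str:
    """Reverse the substitution, e.g. when exporting titles to a catalog."""
    return safe_title.replace(FULLWIDTH_COLON, ":")
```

The round trip only holds if the original title contains no fullwidth colons of its own, which seems a safe bet for English-language titles.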
And every non-fiction ebook is listed in my catalog spreadsheet that indexes on the title string and contains ...
- legibility data (E, VG, G, F-G, F, etc),
- file size in bytes
- metadata type and breadth: how deep the ToC goes ("subchaptered" vs "chaptered" vs etc.), which metadata types are hyperlinked (if any), flaws in cover/virtual paging/body text legibility, etc.
- geometric data on PDFs (page size in postscript points, image DPI for covers, etc)
- page count if the file is a PDF or DJVU.
- Which ebook aspects, if any, need remediation, and which have already been remediated.
That last is a huge deal, since ebook publishing is still in diapers and essential navigational metadata is often incomplete and/or absent. Doubly so if the ebook in question is a bootleg, which a hell of a lot are liable to be, since the books were published before ebooks were even thought of as an actual form of book publishing.
For example, I have never seen anything that Vladimir I. Arnold wrote in his entire life that was actually published as an ebook at the same time it hit the bricks as a conventional book. He died in 2010.
But if you're seriously interested in qualitative behavior of dynamical systems, you're probably going to want to read Arnold's stuff. He practically founded the study of dynamical systems in Russia, decades before anybody in the West knew anything more than what conventional ODE theory could tell them.
So if you find one of Arnold's works as a PDF, you're probably going to have to build a bookmark table of contents yourself in order to navigate the thing. And you may have to fix the virtual paging too. It may also lack a cover.
These flaws are almost always fixable.
You will need at least one PDF editor. I use two: PDF-XChange and Nitro PDF, plus several freeware tools to do things like insert bookmark tables of contents, replace missing or substandard covers, correct virtual page tables (which are very often botched or simply not there), etc.
Don't even get me started about DJVUs. I won't stop raving and frothing at the mouth for a good quarter hour. And damnit, Einar Hille's lovely series of books on Applied and Computational Complex Analysis has never surfaced as a decent PDF. Only as DJVUs, and with a shitty low resolution at that.
Make that a half hour of screaming and raving if the subject is MOBIs.
Calibre is the must-have tool for dealing with flawed EPUBs. These days, I don't use Sigil at all.
Note: there are several formats you cannot remediate because no tool exists. Two outstanding villains are CHM, which is god-thankfully almost totally obsolete by now, and MOBI, which will, regrettably, probably be alive and kicking long after I'm fertilizing a lawn from below.
A practical ebook catalog needs to include detailed entries on both the original copy and its current progeny.
I'm still using a spreadsheet, but I really ought to rework that into a database sometime soon. The spreadsheet I'm using has about two dozen fields to handle all the data about file name, format, condition, legibility, navigational areas (table of contents, linked reference, & index if any), dimensional data for PDFs & DJVUs, remediations possible, required, and performed, if any, etc. And the god damned thing has more than 50,000 lines by this time. So updating it is slow as pouring half-frozen ketchup.
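A minimal sketch of what such a database rework might look like, assuming SQLite via Python's standard `sqlite3` module. The table name, the column names, and the helper functions are all hypothetical. Only a handful of the two dozen fields are shown, and the `original_id` column is just one possible way to tie a remediated copy back to its original.

```python
import sqlite3

# Hypothetical schema: one row per file, originals and remediated copies alike.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ebook (
    id           INTEGER PRIMARY KEY,
    title_string TEXT NOT NULL,   -- the title string the catalog indexes on
    format       TEXT NOT NULL,   -- 'pdf', 'djvu', 'epub', ...
    size_bytes   INTEGER,
    legibility   TEXT,            -- E, VG, G, F-G, F, ...
    page_count   INTEGER,         -- PDFs and DJVUs only
    needs_fix    TEXT,            -- aspects still needing remediation
    fixed        TEXT,            -- aspects already remediated
    original_id  INTEGER REFERENCES ebook(id)  -- NULL for an original copy
);
CREATE INDEX IF NOT EXISTS ebook_title ON ebook(title_string);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_ebook(conn, title_string, fmt, size_bytes=None,
              legibility=None, original_id=None) -> int:
    """Insert one catalog row and return its id."""
    cur = conn.execute(
        "INSERT INTO ebook (title_string, format, size_bytes, legibility, original_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (title_string, fmt, size_bytes, legibility, original_id),
    )
    conn.commit()
    return cur.lastrowid


def find_by_title(conn, fragment):
    """Return (id, title_string, format, original_id) rows whose title contains fragment."""
    return conn.execute(
        "SELECT id, title_string, format, original_id FROM ebook "
        "WHERE title_string LIKE ? ORDER BY id",
        (f"%{fragment}%",),
    ).fetchall()
```

With the title indexed, a lookup or a single-row update doesn't have to touch the other 50,000-odd entries, which is the main thing a spreadsheet of that size struggles with.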