How to organize collections


I have a serious collection of geological specimens amassed over many years that I intend to bequeath to a university. One thing universities don't like is a solely computerised catalogue system: technology changes too fast. For labelling, I put a dab of white paint on each specimen, write a unique number on it in pencil, and seal it with clear nail polish. For the cataloguing, I use LibreOffice Calc and print hard copies regularly.

Perhaps related: an East European guy came to read my electricity meter recently. He said 'You are professori?'
I said 'No. Why do you think that?'
'Ah. the specimens, the books'.
(This post was last modified: 14 Mar 2020, 22:12 by Culmor.)
(14 Mar 2020, 21:47 )Culmor Wrote: I have a serious collection of geological specimens amassed over many years
Same 😊 Not a "serious" one, but still a collection. Quite unorganized, I must say.

(14 Mar 2020, 21:47 )Culmor Wrote: A dab of white paint on each specimen, then a unique number in pencil and seal it with clear nail polish.
That's something I definitely do not like. I prefer to keep my rocks as "untouched" as possible out of both aesthetics and "respect for the stone". A plastic box with a label - ideal. But I'm too far from this as well. And I hate glue.

(14 Mar 2020, 21:47 )Culmor Wrote: the cataloguing, I use LibreOfficeCalc and print hard copies regularly.
Still have to do that for the rock collection...
(This post was last modified: 14 Mar 2020, 22:12 by Like Ra.)
Quote: That's something I definitely do not like.


I don't like doing it either but once your collection gets into three or four figures it's the only option. I'm as discreet as I can be but there's no other way I could keep track of stuff.
For music I use MusicBrainz Picard to tag and put the files in the correct folders according to my own sorting method. Picard is cross-platform. For listening I use Strawberry player. If you really want a nerd/geek/OCD dream, have a look at the terminal program beets: https://beets.io/ You can write your own Python plugins for it too. I would use it myself, but the initial setup and library importing would take me some time and I'd be constantly referencing the manual and tweaking the config file until I got it just right.

Photos all go in folders using this basic pattern YYYY/YYYY-MM-DD - Name_or_subject/photo_number_and_or_name.extension
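
The folder pattern above can be sketched in a few lines of Python. This is only an illustration: the function name `photo_path` and the choice to turn spaces in the subject into underscores are assumptions, not something the post specifies.

```python
from datetime import date
from pathlib import PurePosixPath


def photo_path(taken: date, subject: str, filename: str) -> PurePosixPath:
    """Build YYYY/YYYY-MM-DD - Name_or_subject/filename for one photo."""
    year_folder = f"{taken.year:04d}"
    # Assumption: spaces in the subject become underscores, as the pattern hints.
    day_folder = f"{taken.isoformat()} - {subject.strip().replace(' ', '_')}"
    return PurePosixPath(year_folder) / day_folder / filename


print(photo_path(date(2021, 5, 1), "Beach walk", "0001.jpg"))
```

Because the date comes first in ISO order, a plain alphabetical sort of the folders is also a chronological sort.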

For films/TV I use the Kodi naming conventions so it can look them up online. Films go in their own folder:
Films/Film Name (2021)/Film Name (2021).mp4
Films/Film Name (2021)/Subs/Eng.sub
Films/Film Name (2021)/Extras/Trailer.mp4
You need the Extras plugin and to change a setting by hand in a config file to use the Extras folder scheme.

And TV shows go in:
TV Shows/TV Show Name/S01.E01. Episode Name.mp4
TV Shows/TV Show Name/S01.E02. Episode Name.mp4
And so on.

They need to be in separate Films and TV Shows folders so that, when you add those folders to Kodi, the correct scraping method is used for each; otherwise you get really messed-up results.
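
If you rename files in bulk, the two layouts above are easy to generate. A minimal sketch, assuming the top-level folders are called "Films" and "TV Shows" as in the post; the helper names `film_path` and `episode_path` are made up for this example.

```python
from pathlib import PurePosixPath


def film_path(title: str, year: int, ext: str = "mp4") -> PurePosixPath:
    """Films/Film Name (2021)/Film Name (2021).mp4"""
    folder = f"{title} ({year})"
    return PurePosixPath("Films") / folder / f"{folder}.{ext}"


def episode_path(show: str, season: int, episode: int,
                 name: str, ext: str = "mp4") -> PurePosixPath:
    """TV Shows/TV Show Name/S01.E01. Episode Name.mp4"""
    # Season and episode numbers are zero-padded to two digits.
    filename = f"S{season:02d}.E{episode:02d}. {name}.{ext}"
    return PurePosixPath("TV Shows") / show / filename


print(film_path("Film Name", 2021))
print(episode_path("TV Show Name", 1, 2, "Episode Name"))
```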
(01 May 2021, 14:58 )cjtl Wrote: For music I use MusicBrainz Picard to tag
I'm disappointed with Picard, as it crashes on my collection if used on more than one album, does not support CUE, APE, or WV, and does not have the bit rate field in the GUI. Clementine and Quod Libet, for example, can do this.

(01 May 2021, 14:58 )cjtl Wrote: Photos all go in folders using this basic pattern YYYY/YYYY-MM-DD
Same until this point 😊 Sometimes (for travel photos) I add the location. For tagging I use Digikam, which can do a reasonable job at face detection and tagging. QNAP Photo Magic can even recognize objects (but very unreliably) and even supports the Google Coral TPU module, but I hate the interface.
(02 May 2021, 01:37 )Like Ra Wrote: I'm disappointed with Picard as it crashes on my collection if used on more than one album, does not support CUE, APE, WV

My needs are simple, so Picard is enough. I've not had it crash on multiple albums though. If I have a .cue and an APE, FLAC, or WV file, I use a splitter to process the .cue and spit out standalone tracks. I still keep the original files.
Beets sounded just the ticket for me, until I found out how involved it would be to set up initially. What I have works, and I don't have a lot of music, so it's easy to manage.
So I'm not motivated enough to go through all the hassle. The cost would far outweigh any benefit. I understand that some people with massive collections do find beets very useful though.

I wrote a simple Python script to copy photos off a camera using the path I mentioned, then hash the files and append the new files' hashes to a list.
I wrote it before I started using a fully checksumming filesystem, so that as I mirrored the files to different PCs I would always know if one became corrupt by checking the hashes. Sometimes corruption in photos is not obvious unless you overlay them and subtract the difference between them. And that's not counting metadata corruption. Having a list of hashes I can check them against does away with all that. My script isn't portable, so I doubt it would be of value to anyone else. You could also use tools from the md5deep and hashdeep collection: http://md5deep.sourceforge.net/

My Python script outputs hash files which are compatible with those tools, so to check the photos I just run a batch file or shell script which in turn runs those hash tools automatically for me. Writing that part as a shell script is preferable to trying to remember the switches as I seem to recall the order is important or something like that.
I still use that Python script as I still mirror my archive of photos on to a Windows PC, so having the hashes is still useful.
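
For anyone who wants to try the same idea, here is a rough sketch of the hashing half. It is not the script described above (that one was never posted); the function names, the SHA-256 choice, and the "hash, two spaces, path" manifest layout are all assumptions made for illustration, so check what your verification tool expects before relying on the format.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos never need to fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def append_hashes(files, manifest: Path) -> None:
    """Append one 'hash  path' line per file (assumed manifest layout)."""
    with manifest.open("a", encoding="utf-8") as out:
        for path in files:
            out.write(f"{file_hash(path)}  {path}\n")


def find_corrupt(manifest: Path) -> list[str]:
    """Return the paths whose current hash differs from the recorded one,
    or which have gone missing."""
    bad = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        recorded, _, name = line.partition("  ")
        target = Path(name)
        if not target.is_file() or file_hash(target) != recorded:
            bad.append(name)
    return bad
```

Run `append_hashes` once after each import, then run `find_corrupt` on every mirror; an empty list means every recorded file still matches.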
Documents? TV shows? Music? Porn? Hypno? Doesn't matter what it is, how do you organise it?

I have a lot of stuff on my PC and I passed the point of a folder structure being "good enough" tens of TB ago.

My normal media (TV shows, movies, music) is well taken care of by Plex, so that's a non-issue.

My books I have in Calibre-Web. Automatic book metadata (at least for what I read) is terrible, so I need to add all of that metadata manually, but there are many sources I can pull from and once it's added the experience is great.

My general data (code, projects, virtual machines, games) is taken care of in a "fine" way. Not super happy with it but it isn't causing me any severe issues other than a few extra clicks to find something if I don't remember where it is. It could absolutely be improved but there aren't any good solutions for adding metadata to general purpose files. I have the majority of stuff in a sane folder structure while my code repos are spread across both git and p4 depending on the contents.

My kinky media (hypno, porn, anything else) is definitely the worst off. It's not easily searchable, it has very inconsistent metadata or no metadata at all, and it requires cross-referencing multiple files to find out any extra information.
For hypno I am currently using Plex and treating hypno as music, which works "fine", but metadata even across different files from the same creator is usually inconsistent or sometimes completely missing. That means there is tons of manual work to add even a single creator (I don't think I have even one creator fully tagged and searchable with all of their files). The only hypno creator right now that makes it easy to add metadata to local files is shibby: all the information you could need is always laid out in the same consistent way.
For video media I'm currently experimenting. I think I have a good solution in the works (started playing with it today) with all the functionality I want, but it does obviously require the manual addition of metadata. Generally I have found the video media sources to be tagged much better than hypno, so the manual metadata stage isn't nearly as bad for this, but it's still not amazing.
For image media I'm just completely lost. I don't have a good way to organise, search or view images and images are almost never tagged the same way videos are which means I can't just copy the source tags and I need to write my own.
Will merge with this one: https://www.likera.com/forum/mybb/showth...p?tid=2548
I can't shed much light on other media, but I've put serious effort into my ebook collection.

I had collected more than 3 or 4 thousand old-fashioned paper books by the early oughts, and I was running out of space.

Plus, I can stick about 50-60 thousand ebook files on a 1 TB thumb drive.  2 TB if I'm remediating ebook flaws and saving originals on the same media.

I have a hierarchical file system, based on subject, on my network file share.  That may well outlive me, since it's implemented as a raidz2 array in two volumes of 6 drives each.  The component hard drives are the old Samsung HD204UI 2 TB units that you'd practically need a fire axe to kill.  But that system also badly needs an upgrade: it's been about a dozen years in service by this time.

As you'd expect from a math geek, the math ebook directory hierarchy is the most broadly elaborated one.  OTOH, archaeology, anthropology, and history go into a common \History\ directory because that's how I use books on archaeology and anthropology.

The magic is in the format of the ebook file names themselves.  It's a standard, broken up into four space-separated sections:

Title (Author) (Mon Year) ISBN

Example:

Catastrophe Theory, 3rd Ed (Vladimir I. Arnold) (Jan 2004) 3540548114.pdf

ISBN is ISBN-10 for publication dates through 2005, ISBN-13 for 2006 and later.  I collect the raw ISBN from the file itself whenever it's possible (about 999 times out of 1000).
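
Since the ISBN is typed or copied by hand, it can be worth checking its check digit before it goes into a file name. The sketch below uses the standard ISBN-10 and ISBN-13 check-digit rules; the function names and the `expected_isbn_length` helper (which just encodes the 2005/2006 cutoff described above) are invented for this example.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, total must be divisible by 11; 'X' means 10."""
    if len(isbn) != 10 or not isbn.isascii():
        return False
    if not isbn[:9].isdigit() or isbn[9] not in "0123456789Xx":
        return False
    digits = [int(c) for c in isbn[:9]]
    digits.append(10 if isbn[9] in "Xx" else int(isbn[9]))
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: alternating weights 1 and 3, total must be divisible by 10."""
    if len(isbn) != 13 or not (isbn.isascii() and isbn.isdigit()):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
    return total % 10 == 0


def expected_isbn_length(publication_year: int) -> int:
    """The naming rule from the post: ISBN-10 through 2005, ISBN-13 after."""
    return 10 if publication_year <= 2005 else 13


print(is_valid_isbn10("3540548114"))  # the ISBN from the example file name
```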

I deal with the colon, an NTFS reserved symbol that is present in about half the history books published since 1980, by replacing it with the Unicode character "：", a.k.a. "fullwidth colon", U+FF1A.  I use a trick that's a bit more complex to replace forward slashes in titles.  The intent is to generate something that reads to the Mark I eyeball as ":" or "/" even though it's really a Unicode and/or multi-character hack.
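
The naming standard and the colon substitution can be sketched as a pair of helpers. This is an illustration under assumptions: the function names and the regular expression are made up here, and the forward-slash trick is deliberately left out because the post does not say how it works.

```python
import re

FULLWIDTH_COLON = "\uFF1A"  # looks like ':' but is legal in NTFS file names

# Title (Author) (Mon Year) ISBN.ext
PATTERN = re.compile(
    r"(?P<title>.+) \((?P<author>[^()]+)\) "
    r"\((?P<month>[A-Z][a-z]{2}) (?P<year>\d{4})\) "
    r"(?P<isbn>\d{13}|\d{9}[\dXx])\.(?P<ext>\w+)"
)


def ebook_filename(title: str, author: str, month: str, year: int,
                   isbn: str, ext: str = "pdf") -> str:
    """Assemble the four space-separated sections into one file name."""
    safe_title = title.replace(":", FULLWIDTH_COLON)
    return f"{safe_title} ({author}) ({month} {year}) {isbn}.{ext}"


def parse_ebook_filename(name: str) -> dict:
    """Split a file name back into its sections, restoring plain colons."""
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not in 'Title (Author) (Mon Year) ISBN' form: {name!r}")
    fields = match.groupdict()
    fields["title"] = fields["title"].replace(FULLWIDTH_COLON, ":")
    return fields


print(ebook_filename("Catastrophe Theory, 3rd Ed", "Vladimir I. Arnold",
                     "Jan", 2004, "3540548114"))
```

Being able to parse the name back means the file name itself can serve as the key into a catalog.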

And every non-fiction ebook is listed in my catalog spreadsheet that indexes on the title string and contains ...
  • legibility data (E, VG, G, F-G, F, etc.),
  • file size in bytes,
  • metadata type and breadth (ToC "subchaptered" vs "chaptered", etc.), which metadata types are hyperlinked (if any), flaws in cover/virtual paging/body text legibility, etc.,
  • geometric data on PDFs (page size in PostScript points, image DPI for covers, etc.),
  • page count if the file is a PDF or DJVU,
  • which ebook aspects, if any, need remediation, and which have already been remediated.
That last is a huge deal, since ebook publishing is still in diapers and essential navigational metadata is often incomplete and/or absent.  Doubly so if the ebook in question is bootleg, which a hell of a lot are liable to be since they were published before ebooks were even thought of as an actual form of book publishing.

For example, I have never seen anything that Vladimir I. Arnold wrote in his entire life that was actually published as an ebook at the same time it hit the bricks as a conventional book.  He died in 2010.

But if you're seriously interested in qualitative behavior of dynamical systems, you're probably going to want to read Arnold's stuff.  He practically founded the study of dynamical systems in Russia, decades before anybody in the West knew anything more than what conventional ODE theory could tell them.

So if you find one of Arnold's works as a PDF, you're probably going to have to build a bookmark table of contents yourself in order to navigate the thing.  And you may have to fix the virtual paging too.  It may also lack a cover.  These flaws are almost always fixable.

You will need at least one PDF editor.  I use two, PDF-XChange and Nitro PDF, plus several freeware tools to do things like insert bookmark tables of contents, replace missing or substandard covers, correct virtual page tables (which are very often botched or simply not there), etc.

Don't even get me started about DJVUs.  I won't stop raving and frothing at the mouth for a good quarter hour.  And damnit, Einar Hille's lovely series of books on Applied and Computational Complex Analysis has never surfaced as a decent PDF.  Only as DJVUs, and with a shitty low resolution at that.

Make that a half hour of screaming and raving if the subject is MOBIs.

Calibre is the must-have tool for dealing with flawed EPUBs.  These days, I don't use Sigil at all.

Note: there are several formats you cannot remediate because no tool exists.  Two outstanding villains are CHM, which is god-thankfully almost totally obsolete by now, and MOBI, which will, regrettably, probably be alive and kicking long after I'm fertilizing a lawn from below.

A practical ebook catalog needs to include detailed entries on both the original copy and its current progeny.

I'm still using a spreadsheet, but I really ought to rework that into a database sometime soon.  The spreadsheet I'm using has about two dozen fields to handle all the data about file name, format, condition, legibility, navigational areas (table of contents, linked reference, & index if any), dimensional data for PDFs & DJVUs, remediations possible, required, and performed, if any, etc.  And the god damned thing has more than 50,000 lines by this time.  So updating it is slow as pouring half-frozen ketchup.
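
As a starting point for that rework, here is a minimal sketch using SQLite from the Python standard library. The table layout is an assumption: the column names are invented and cover only a handful of the two dozen fields mentioned above, so treat it as a shape to adapt rather than a finished schema.

```python
import sqlite3

# Hypothetical subset of the catalog fields, keyed on the title string.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ebooks (
    title_string      TEXT PRIMARY KEY,
    format            TEXT,
    legibility        TEXT,
    size_bytes        INTEGER,
    page_count        INTEGER,
    needs_remediation TEXT,
    remediated        TEXT
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog; the default is a throwaway in-memory DB."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert(conn: sqlite3.Connection, **fields) -> None:
    """Insert or replace one row. Replacing rewrites the whole row, so any
    column not passed in ends up NULL. Keyword names must be real column
    names, since they are placed directly into the SQL text."""
    columns = ", ".join(fields)
    placeholders = ", ".join(f":{name}" for name in fields)
    conn.execute(
        f"INSERT OR REPLACE INTO ebooks ({columns}) VALUES ({placeholders})",
        fields,
    )
    conn.commit()


def needing_work(conn: sqlite3.Connection) -> list[str]:
    """Titles that still have something listed under needs_remediation."""
    rows = conn.execute(
        "SELECT title_string FROM ebooks "
        "WHERE needs_remediation IS NOT NULL AND needs_remediation <> '' "
        "ORDER BY title_string"
    )
    return [row[0] for row in rows]
```

With the primary key indexed, updating one row out of 50,000 does not require rewriting or recalculating the rest, which is where a large spreadsheet tends to bog down.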
(03 Feb 2025, 14:58 )Like Ra Wrote: Will merge with this one: https://www.likera.com/forum/mybb/showth...p?tid=2548

oopsie I didn't see this one. My bad