During development of my Final Fantasy XIV Tracker(FFXIV), I made a mistake, that took me almost 15 months to fully remediate. Or rather 99% of it.
FFXIV has several types of groups in it: Free Companies (guilds), PvP teams and 2 types of Linkshells (chatgroups). The first 2 have crests, which are a combination of 3 components: emblem, background and a frame. On official site (Lodestone) they are presented as 3 PNG files overlayed over each other, but to make my tracker just a tiny bit unique (including having custom favicons for groups' pages), I decided to merge them, and name them using a hash function. This also allowed to avoid duplicates. On the database level I stored only that hash, which me some space.
And that was the mistake.
Or rather it was the bigger of 2 mistakes. First mistake was actually, that on filesystem I was saving all files in just 1 folder, and that can result not only in some performance degradation when there is a lot of files, but it actually has a limit. Yes, it is quite large, but at the time of writing there are 180926 Free Companies and 11335 PvP teams, which means potentially 192261 files, and that's assuming groups won't be changing those ever. If I was using FAT32 file system that would be way above the limit of 65535. I am not, though, so I am if I am using NTFS or EXT2 and newer. But if you have more than 10000 files per folder you can choke the performance in some workloads. So, yes, it was a mistake to store them like this, and I need to change that.
I decided to store them in a way which is quite common for storing cache on filesystem: use 2 first characters from the hash to create a subfolder, then use the next 2 characters from hash to create a subfolder in that one, and store the file in the last subfolder (some do 3 layers or even more). It means, that if I have a filename like bf2360d413807d355def4028e5a7119e871c41e4dd826fce6f5bda7107621b3e.png would be stored in folder like bf\23\bf2360d413807d355def4028e5a7119e871c41e4dd826fce6f5bda7107621b3e.png.
Easy, and it was not that difficult. But I also started thinking of the size, those files are taking, and since around the same time I learnt about WEBP format, I decided to switch to it. And I did not want to use the PNG files, that I already had, because as some tests showed saving them into WEBP resulted in having slightly larger files compared to creating it from original 3 files. At least when using PHP's GD2 library. And as I've mentioned above, I did not store links to original files - only hash.
Thus I had a problem: I had hash, but I had no idea what files were used to get that hash. I could obviously find them visually, but doing that for tens of thousands of files is not something I was willing to do. I was able to get all components from Lodestone (and shared them on GitHub, too), but what's next? I could just go merging them together same way I did before, and getting that hash, and from that derive links, put them into an UPDATE query and then run the query on the database. Sounds easy right?
Yeah... There are 63 frames, more than 1300 (it's growing) emblems, and... 16385 backgrounds (even though a portion of them are visual duplicates). This gives us 1067287500 possible combinations. Running all those combinations is what took me almost 15 months. Technically less, since my PC was not running 24/7, but that still a lot. And even then ~60 hashes remained unmatched somehow (luckily for groups that had no activity during their lifespans).
It probably could have been done much faster, but I was not able to find a good way to allow running it in multiple threads, while also keeping track of what has been checked already, so that I could resume from last position in case of interruption. Or rather, I did find a way, but it was getting slower with every new thread, and all threads besides 1 would randomly crash for no apparent reason (possibly some race condition).
So what's the point of this? Think extra hard what you decide to store in your database. Yes, you want to store only data that you need, but you should always consider potential future needs. Especially if you plan on storing derived information, and not original one. Please, learn from my mistake.
Oh, and if you're wondering, the banner for this article is made out of 45 most used crests (50 would be a bit too crowdy). As you can see simple ones without any backgrounds and frames are dominating, as well as those with heart motives.