Week 8: It's the tripwires you don't see...
Added 2025-02-23 15:28:44 +0000 UTC
That'll kill you...
I made a post on the r/skarchive subreddit that kind of ended up serving as last week's update. I had family stuff to deal with last weekend. The "Grand Plan" was to sort that out, return, probably spend Monday catching up with some house stuff and then over the next few days get stuck into getting some content on the site.
Grand Plans indeed. The sites were deployed and I started importing and uploading, and that's where it all went pear-shaped. Everything worked fine when it was local, and whilst I knew it would be slower I figured it would be bearable, if somewhat frustrating at times. We could work with that. You only had to upload stuff once, right?
I wish.
My eyes got too big for my belly, as they like to say. I got greedy. Wanting to do a bunch of things for the site meant more processing, and more processing was causing problems. They were predictable problems and I naively thought they would be manageable problems. I was wrong.
The thing with it is it COULD be really simple. Create story, create chapter, upload images, boom, done! And it really is that simple. However, uploading 30+ images takes time even when everything goes right, and if you add to that the fact that 1000 posts in the main subreddit only takes us back to Oct/Nov last year, you'll realise how much stuff is being posted.
It became pretty obvious that at least a certain amount of automation was required.
Exactly what to automate and in what way has been the question of the last week or so. I already had lots of things being automated but they were all small, disparate chunks. The question was how to take these tasks and glue them all together so they would work reliably.
In the end a large part of my goal was to store things in predictable places and name them in predictable ways. In doing this I could let the code do the hard work of finding and uploading the images I needed for Chapter 7 of The Greatest Story Never Told ('cause the upload failed on the old system).
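To give a rough idea of what "predictable places, predictable names" buys you, here is a minimal sketch. It is not the actual site code. The folder layout (archive root / story slug / chapter-NNN), the function names and the list of image extensions are all assumptions made up for the illustration.

```python
import re
from pathlib import Path

# Assumed set of image extensions; the real archive may use others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def slugify(title: str) -> str:
    """Lower-case a title and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def chapter_dir(root: Path, story_title: str, chapter: int) -> Path:
    """Build the (hypothetical) predictable folder for one chapter of one story."""
    return root / slugify(story_title) / f"chapter-{chapter:03d}"


def find_chapter_images(root: Path, story_title: str, chapter: int) -> list[Path]:
    """Return the chapter's images sorted by filename, or an empty list if the folder is missing."""
    folder = chapter_dir(root, story_title, chapter)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

With something like this, retrying a failed upload is just `find_chapter_images(root, "The Greatest Story Never Told", 7)` and feeding the result to whatever does the uploading. Sorting by filename only gives page order if the image names are zero-padded (001.jpg, 002.jpg and so on), which is part of the "name them in predictable ways" bit.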
The very good news is that in going through all these processes and working out a number of different ways to go about things I came to learn some things about what we were really doing.
I learned that some of my original ideas for how to go about things were far too influenced by the early days of my collecting and life pre-LFAS threads, trying to help people find stuff.
Back then any collections that existed were usually pretty basic and weren't all that big on information keeping. If you got the title of the story in the images you were very lucky. Authors weren't as "professional" as they are today, where everyone has cover/title cards with story names and author and often performer details.
So when I started collecting I went nuts about metadata, grabbing all I could and storing it in the filenames of the images. That technique is fantastic if the plan is to distribute things the old-fashioned way: just zip stuff up and let it loose on the internet.
A number of reddit old-timers will remember the archive that was posted on one of the subs in 2019, I think. It was the most comprehensive collection I'd seen to that point. But reddit being reddit, with its 1000-post limit, it essentially disappeared after a few months. (It was, and I think still is, there, if you know where and how to look, but for all intents and purposes, for most people it's gone.)
This was the downside of this "artform" getting its start on reddit. It meant there was stuff that was always harder to collect (at least in an automated way), things like performer names etc. In the early stuff they weren't usually included, and if they were, a lot of the time it was the author answering a reader in the comments when they asked.
This lack of permanence is a large part of what inspired me to start the archive. Ironically it wasn't THIS kind of story that did it, but it was a reddit story. It was a user on reddit who started telling their story in one of the subs. Their story was so well written and engaging. I didn't even really care if it was true or not. And then basically overnight they were gone. Account deleted and all trace of the story gone. There were all sorts of rumours etc, but it didn't matter. The story was gone.
I HATE that shit.
A little while after this happened I thought I'd deleted the archive of text stories that'd been posted. (It turned out later that I hadn't. I'd just renamed it, and when I went searching I was using the old name. Ironically, I renamed it so it'd be easier to find.) Whilst I'd been backing stuff up for a little while by then, I hadn't been overly serious about it, and it was more stuff that I liked rather than everything regardless.
I went looking to see if I could find the archive, and then later I realised I could do the same thing for the story. I eventually found both after about 3 solid months of looking. But what's far more important is that I learned an insane amount about how reddit works and stores its stuff.
I took that knowledge and started customising some software and scripts I'd been using before to properly back up everything. And that just kept evolving over time.
All of this is a long-winded way of saying that, because of how I started in this, I'd overcomplicated things, because back then a lot of what I needed or wanted was very, very complicated to get.
It isn't now. I've finally realised that and I've simplified things accordingly.
I spent the last two days moving over 150GB of files onto a server. (It should have been 1 day, but I used the wrong word in a command, it was 4am in my defence, and deleted 135GB of files, so I had to basically start from scratch again.)
These are the sorted image archives; organised is probably a better word. They are actually only a very small chunk of the complete archive, which last time I looked was taking up about 1.5TB on my drives. It's not QUITE that big, as there's still a lot of duplication there. But part of the reason that duplication is still there is that the simple and efficient storage solution we now have hadn't revealed itself to me yet.
This change in both how we store stuff and how we look at what we're doing gives us a very definitive plan now.
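On the duplication point, one common way to see how much of an archive is repeated is to group files by a hash of their contents. The sketch below is only an illustration of that general technique, not a description of the storage solution mentioned above. The function names are made up for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks so big files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to the files sharing it, keeping only hashes seen more than once."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

This only reports byte-for-byte identical files and never deletes anything, which feels like the right default at 4am. The same image re-saved at a different size or quality would not show up as a duplicate.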
Sadly it's not complete and there's still a bit more wiring up to do on the site to be able to make use of this new structure. But once that's done then I'll be able to put together a bunch of the work I'd done on the site previously and this new file setup and make that simple create, create, upload loop actually simple after all.
There are a few hiccups to the plan, the least of which is the recent banning/deletion of one of the subs. I have a way around that, but it'll require a little more work.
And the thing is, that work is fine, because this isn't just about OMG I want to read the stories ASAP. This is an archive. And the archiving won't stop until I stop. The easiest way to get content on the site at the moment is to use the current content. It's easily accessible in a predictable way. Once I have the core in place and work out exactly what it is I need (and that has definitely been a thing), then I'll be able to take the work I've done on the current stuff, add a few things to it, and then take all the old stuff in whatever form I have it and start feeding that into the site as well.
One day I'll tell the story of the chapter manager and the long and winding road it took from humble chapter manager to eventually becoming an Import god. And then the crushing realisation, after all that work, that we didn't actually need an import god after all.
I honestly should have known, because the backup scripts have gone the same way. One of the scripts, last time I looked, was on like version 7, and that's major version, where there's lots of changes, not the actual small changes that happen. The front end of the site hasn't changed since the end of the first week really, but the backend keeps changing as I learn more about what I really need rather than what I thought I needed.
In the end I hope the result is the same: somewhere for these stories to live and people to enjoy them until the heat death of the universe or until I get my backup scripts right... probably the former, because I'll still be working on the latter then.
For those of you who have become paid members, you have my greatest thanks and adoration. I've completed the scripts that will give you guys access, and to be honest I'm just waiting for there to be some content for you to have access to. As soon as I start uploading things the emails will be going out to your Patreon emails. All you'll need to do is reset your password (there will be a link) and you'll be good to go. As a bonus and a little bit of an apology for the delay, I THINK I'll be able to add a cool new feature (I wasn't sure if I was going to put it in before, but with this change it should be pretty easy) for you guys only. Shh, no spoilers!
So many thanks and many apologies, in pretty much equal parts.
Until Next Time,
Kosh.