In the late 1990s I spent a few days in Paris - these days it's unimaginable that I wouldn't take a few photos, post something, message friends/family and maybe even send an email from such an interesting destination - but back then, without a cell phone, mobile data or even a laptop, I managed to spend a week in Europe with no evidence of it other than some very (very!) hazy memories...
In mid-2023 I developed a renewed interest in creating good off-site backups for some of my personal data. After years of using a few different backup products at home and at work, I had a specific set of requirements:
- Storage in an S3 Compatible Service: I want to be able to get to my backups from a public computer, have many options for accessing them, and not have any issues if the program that made the backup isn't available.
- 'Flat' Backup: To make finding and accessing files in the backup as simple as possible I don't want versioned files/backups - just a simple mirror of my data. (For me only doing versioned/'Time Machine' style backups locally is a good compromise.)
- Data Files Only: No machine/system backup needed - my desktop setup is easy enough to recreate that it doesn't need to be backed up. I'm only interested in backing up data.
- No Integrated Synchronization or Restore Features: Many years ago I lost some files by putting in the wrong synchronization settings for a backup. That mistake has stuck with me and I would rather just eliminate the possibility of that error by not even having those options!
- Hash Based File Change Detection: Skip date/size comparisons and always compare file hashes to determine if a file has changed.
- Easy to Schedule/Time Limited Runs: Our Starlink setup has been fantastic but it has limited upload capacity - I want to be able to schedule backups to start late at night and set them to only run for a few hours.
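The hash-based change detection and time-limited runs above could be sketched roughly like this. The project itself is .NET, so this Python is only an illustration of the idea, not its actual code. The function names, the `remote_hashes` lookup and the `upload` callback are hypothetical: modified dates and sizes are ignored entirely, and the loop simply stops when the time budget runs out.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in 1 MiB chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_changed(
    root: Path,
    remote_hashes: dict[str, str],  # hypothetical: object key -> SHA-256 of the uploaded copy
    upload: Callable[[str, Path, str], None],  # hypothetical: (key, local path, hash)
    max_seconds: float,
) -> list[str]:
    """Upload files whose content hash differs from the remote copy, within a time budget."""
    deadline = time.monotonic() + max_seconds
    uploaded = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if time.monotonic() >= deadline:
            break  # out of time; a later scheduled run continues the work
        key = path.relative_to(root).as_posix()
        local_hash = file_sha256(path)
        if remote_hashes.get(key) == local_hash:
            continue  # identical content: skip, whatever the dates or sizes say
        upload(key, path, local_hash)
        uploaded.append(key)
    return uploaded
```

A late-night scheduled task might call something like `upload_changed(Path("D:/Photos"), hashes, upload_to_s3, max_seconds=4 * 3600)` (both `hashes` and `upload_to_s3` hypothetical). Assuming `remote_hashes` is refreshed between runs, files finished tonight are skipped tomorrow. The deadline is only checked between files, so an upload already in progress finishes before the run stops.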
When I looked for backup software with those requirements in mind I didn't find exactly what I wanted and didn't find anything that I was excited about. In many cases - and especially at work - I would have picked the best all-things-considered compromise and moved on, but I have some data that is valuable enough to me that I don't want to compromise on how it is taken care of. It is fair to say that to some extent everything in my Pointless Waymarks Project is about taking care of the data I care deeply about...
So while there is always some absurdity in choosing to reinvent the wheel, I decided to write my own backup program! Why? Because with a very specific set of requirements and no need to 'productize' the program, the scope is reasonable for the time I have to invest - and, maybe more importantly, I continue to find it incredibly fun, educational and satisfying to write and use my own software. After using, living with and working on this project for about a year, here are some of the things I did in addition to the requirements above:
- Console App + WPF GUI: Console App 'Runner' for easy scheduling and a GUI to make setting up and tracking the backups fast and easy.
- SQLite: Both to save the Backup Jobs and to cache information about S3 and local files. Initially I had hoped to largely avoid caching information about files - but listing S3 objects can be both slow and costly (depending on the S3 provider), so over time more caching made sense.
- Multiple S3 Provider Options: I started with only AWS S3 - I have buckets on S3 that are over a decade old and it is a great service. But AWS doesn't hide the fact that they charge you for every action - over time my bill has grown and more alternatives have come online, so I've added support for Cloudflare R2 and Wasabi.
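A minimal sketch of the kind of SQLite cache described above, using Python's built-in `sqlite3` module. The table layout, the column names and the idea of keeping a hash per object are assumptions for illustration, not the project's real schema. Uploads update the cache as they happen, and the slow full bucket listing only has to be done occasionally to rebuild it.

```python
import sqlite3

# Hypothetical schema: one row per object, keyed by bucket + object key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS s3_objects (
    bucket      TEXT NOT NULL,
    object_key  TEXT NOT NULL,
    file_hash   TEXT NOT NULL,
    size_bytes  INTEGER NOT NULL,
    PRIMARY KEY (bucket, object_key)
)
"""

INSERT = (
    "INSERT OR REPLACE INTO s3_objects (bucket, object_key, file_hash, size_bytes) "
    "VALUES (?, ?, ?, ?)"
)


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_upload(conn: sqlite3.Connection, bucket: str, key: str,
                  file_hash: str, size_bytes: int) -> None:
    """Update the cache right after a successful upload, so no re-listing is needed."""
    with conn:  # commits on success, rolls back on error
        conn.execute(INSERT, (bucket, key, file_hash, size_bytes))


def replace_bucket_listing(conn: sqlite3.Connection, bucket: str,
                           listing: list[tuple[str, str, int]]) -> None:
    """Rebuild one bucket's rows from a full (slow, possibly billed) listing."""
    with conn:
        conn.execute("DELETE FROM s3_objects WHERE bucket = ?", (bucket,))
        conn.executemany(INSERT, [(bucket, k, h, s) for k, h, s in listing])


def cached_hashes(conn: sqlite3.Connection, bucket: str) -> dict[str, str]:
    """Key -> hash lookup for change detection, read locally instead of from S3."""
    rows = conn.execute(
        "SELECT object_key, file_hash FROM s3_objects WHERE bucket = ?", (bucket,)
    )
    return dict(rows)
```

The trade-off is that the cache can drift from what is really in the bucket (for example if something is deleted outside the program), which is why a periodic full rebuild is part of the sketch.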
Surprises and interesting details:
- Listing 100k objects in an S3 Bucket and Retrieving Metadata takes longer than I expected... Between working with smaller buckets, writing programs that weren't performance sensitive and a general expectation of reasonable speed these days, it was a surprise how long it can take to list and retrieve metadata for hundreds of thousands of objects. With more infrastructure there are probably 'better ways' - but going straight through the API I never found a way to make listing 100k+ objects impressively fast, and eventually I ended up caching S3 file information in SQLite to help with this.
- API Compatibility Might Not Be As Expected... I think it's fair to say that Cloudflare and Wasabi's marketing makes you believe that you are going to seamlessly use Amazon's S3 API. I actually thought this would be the case since my use of the API is (imho) fairly simple, but for me it didn't work out. I didn't run into major problems, but issues like "Why doesn't CopyObject for CloudFlare R2 not work with the AWS SDK for .NET?" and Wasabi erroring with paginators both cost me time...
- TinyIpc: .NET inter process broadcast message bus - Using an inter-process communication library in desktop apps now seems like a very smart default choice. Having components broadcast and respond to messages from any source on your local computer is a great feature: the same process, a different instance of the application, related console programs, or another application altogether. Extending your messaging to all of those sources with a library like TinyIpc can be a big lift to the user experience without much, if any, additional cost in time, complexity or infrastructure.
Overall the Pointless Waymarks Cloud Backup is a fairly simple project and this write-up reflects that - my goal wasn't to architect a novel new backup paradigm, just to reliably back up files I care about to S3. The code is MIT Licensed and available under the Pointless Waymarks Project. I don't plan on doing any pre-built installers or public releases - it is a constant work in progress and probably only appropriate to use if you don't mind working on code and occasionally debugging issues - but it is open and available to share with friends and fellow devs in case it is useful, and there are scripts to build auto-updating installers if you do want to run the software.
Some days I appreciate that my first few decades of life have huge chunks that are only hazy memories - no photographs, no social media, not even text or email messages - but other days I wish I had some photographs from Paris, so it was probably partly nostalgia that drove me to pick Paris for my offsite photograph backups. If I can't have photographs from Paris I might as well have photographs in Paris!!!