Doing things right

A poor foundation affects everything built on it, and fixing it later costs far more than building a good one from the start.

Human nature is the same everywhere. People are drawn to the features they can see, use and show off. With the technologies used at JBLan, that would mean going for Nextcloud before something like the SQL read/write splitting provided by MaxScale.

As explained in this post about security, the way to achieve the best security at the lowest cost is to do as much as possible at the architecture level. Unfortunately, architecture is often invisible to users. Either because they do not know about components like R/W splitting or because they are more interested in selling a service to their clients, people in authority often skip or dismiss critical components.

Here, the answer is not to argue with them about the required architecture. Nor would it be any better to just do as requested and deliver a project that will crack everywhere in no time. The right way forward is well known and easy to do. Even so, my late godmother already had to teach it all around her decades ago, and that mission will never end. What is needed is a statement of needs.

What do you really need?

It's easy to just say that you wish to offer services like the ones provided by Nextcloud. Unfortunately, without a complete statement of needs, there is a significant risk of either over-developing things that were not required or, worse, forgetting things that would have been essential. Without the statement of needs, there is no way to ensure that whatever is delivered will be fit for purpose, useful or valuable.

Do you want to sell your clients a solution that will need to be interrupted weekly for maintenance? A solution that may be down for days should it need to be rebuilt after an incident? Clearly there is a need to ensure the service will be available, but up to what limits?

At JBLan, one need was defined as ensuring data availability before service availability. The service may go down, but it should always be possible to bring it back online. Not only that, it should come back with the data in a state as close as possible to the state it was in when the service went down.

That need translated into enforcing the 3 copies rule:

Data needs to exist in at least 3 copies to ensure it will survive any single incident. By definition, copy No 1 is always online and on site. As such, it is always exposed to all risks. Copy No 2 must be offsite to protect against any physical incident like fire or water on site No 1. Copy No 3 must be offline to protect against any logical incident like intrusion or software error.
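The rule above can be expressed as a small check. This is only an illustrative sketch: the `Copy` class, its `onsite` and `online` fields and the `check_three_copies` function are hypothetical names, and the sample layout assumes the offline copy is kept on site, which the rule itself does not require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical model, for illustration only)."""
    name: str
    onsite: bool   # True if stored at the main site
    online: bool   # True if reachable over the network


def check_three_copies(copies):
    """Return a list of rule violations; an empty list means the rule holds."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if not any(c.onsite and c.online for c in copies):
        problems.append("no online, on-site copy (copy No 1)")
    if not any(not c.onsite for c in copies):
        problems.append("no off-site copy (copy No 2)")
    if not any(not c.online for c in copies):
        problems.append("no offline copy (copy No 3)")
    return problems


# Assumed layout: the offline copy's location is an assumption for this example.
layout = [
    Copy("main server", onsite=True, online=True),
    Copy("remote server", onsite=False, online=True),
    Copy("offline server", onsite=True, online=False),
]
print(check_three_copies(layout))
```

Running the same check against a planned deployment before buying hardware is a cheap way to see which risk, physical or logical, is still uncovered.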

To build such a data hosting solution, I selected TrueNAS. I configured 3 servers with ZFS replication, and that is how I put that part of my foundation in place. The layout: main server at my place, copy No 2 at my father's place hundreds of kilometers away, and the offline copy in my old T-110. That was it for storing files, but Nextcloud also needs a database.
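For readers unfamiliar with ZFS replication, here is a rough sketch of what one incremental pass looks like underneath. TrueNAS normally drives this through its own replication tasks, so this is not the configuration used here. The function only builds the `zfs snapshot`, `zfs send -i` and `zfs receive` command lines and executes nothing. The function name is hypothetical, and the dataset, snapshot and host names in the usage line are made-up placeholders.

```python
import shlex


def replication_commands(dataset, previous_snap, new_snap, remote_host, remote_dataset):
    """Build the shell commands for one incremental replication pass.

    Nothing is executed here; the commands are returned as strings for review.
    """
    old = shlex.quote(f"{dataset}@{previous_snap}")
    new = shlex.quote(f"{dataset}@{new_snap}")
    # 1. Take a new snapshot of the source dataset.
    snapshot = f"zfs snapshot {new}"
    # 2. Send only the changes between the two snapshots to the remote side.
    send = f"zfs send -i {old} {new}"
    receive = f"ssh {shlex.quote(remote_host)} zfs receive {shlex.quote(remote_dataset)}"
    return [snapshot, f"{send} | {receive}"]


# Placeholder names, not the real JBLan datasets or hosts.
for command in replication_commands(
    "tank/files", "snap1", "snap2", "backup-host", "backup/files"
):
    print(command)
```

The point of the incremental send is that only the blocks changed since the previous snapshot travel to the remote site, which keeps a copy hundreds of kilometers away practical over an ordinary link.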

I started looking at database HA solutions and chose master/slave replication. It was yet another thing to learn and deploy before Nextcloud itself, but thanks to it, I knew I would have the quality I was looking for.

With the database in place, backups and recovery for it were essential in turn. Thanks to my first HA solution with ZFS, that was easy to do. I just had to put the backup file within that ZFS space and voilà. This illustrates the benefit of a solid foundation: once it is in place, you just need to re-use what exists to benefit from a high-quality solution.

It was only after I deployed these 2 services that I started to work on Nextcloud itself. And even then, it was not to put it into service right away.

Remember that my most critical need was to ensure I could recover the data and re-deploy the service to access it. So once Nextcloud was deployed, the first thing was to put dummy data in it and try to complete a backup and restore procedure. Once that succeeded, I documented the procedure and scheduled a yearly restore test. Good for me, because in my second year, I failed my restore test. I figured out that back in June I had touched the script that encrypts my backup before saving it to ZFS. For some reason, I was unable to decrypt it. I acknowledged that I had lost my last backups, fixed everything and managed to complete a restore.
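A restore test is only meaningful if the restored data is compared against the original. The sketch below shows one possible way to do that comparison with SHA-256 digests. It is not the script used at JBLan: the function names are hypothetical, and it assumes both the original dummy data and the restored copy are available as plain directories.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(original_dir, restored_dir):
    """Return a list of differences; an empty list means the restore matches."""
    original = tree_digests(original_dir)
    restored = tree_digests(restored_dir)
    problems = []
    for name in sorted(original.keys() | restored.keys()):
        if name not in restored:
            problems.append(f"missing after restore: {name}")
        elif name not in original:
            problems.append(f"unexpected after restore: {name}")
        elif original[name] != restored[name]:
            problems.append(f"content differs: {name}")
    return problems
```

A check like this, run right after any change to the backup or encryption script, would flag an undecryptable or corrupted backup the same day instead of at the next yearly test.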

Thanks to all of that, whenever I ended up in a situation where I needed to restore or redeploy my Nextcloud instance, it was easy and safe. Unfortunately, my experience showed me that not many people build from the ground up like I did; most start with the finishing touches instead.

The typical error to avoid

As soon as I deployed TrueNAS for myself, I joined the TrueNAS community forum and offered support to others. Countless times I saw people in despair after losing everything. Unfortunately, it was almost always because they had not considered their own needs before deploying their solution.

It was also true in Nextcloud's forum, where new users were hoping to free themselves from public providers like Google or Apple by hosting their own cloud. Beware here! If your need is availability, compatibility or features as advanced as these giants' solutions, better to stay with them! Even here at JBLan, I do not pretend I can do as well as these giants despite everything I put in place. Don't fool yourself: no, you will not do it with a Raspberry Pi and an external USB drive in a corner of your desk.

Start by building yourself a solid foundation. That will decrease every cost down the road: operations, risk and security costs will all be kept to a minimum. The only drawback is that it forces you to acknowledge the full cost of your solution from day 1. A defective foundation's extra cost may end up blurred into more generic costs, but blurred costs are still costs and you still pay them.