The Backup Problem Nobody Talks About
When teams evaluate Neo4j for their graph database needs, the Community Edition looks like an obvious starting point. It’s free, powerful, and more than capable of handling serious workloads. But somewhere between the initial setup and the first production deployment, a quiet realization sets in: Neo4j’s native online backup is an Enterprise-only feature.
This isn’t unique to Neo4j. Across the open-source ecosystem, backup and recovery tooling is one of the most common capabilities locked behind a paid tier. For teams running on Community Edition – whether by budget constraints, preference, or principle – this creates a real operational gap.
The standard workaround works: stop the service, then either copy the files manually or run neo4j-admin dump (neo4j-admin database dump in Neo4j 5). But it requires downtime for the full duration of the copy or export, scales poorly with data size, and gives you no fast rollback mechanism if something goes wrong after a deployment or data migration.
There’s a better approach. And it lives at the filesystem level.
Enter OpenZFS: Filesystem-Level Backup Without the Database Knowing
OpenZFS is the open-source continuation of ZFS, a combined filesystem and logical volume manager originally developed by Sun Microsystems. It is now maintained as a mature community project and is widely used in storage-heavy environments for its reliability, its data integrity guarantees, and, crucially for this use case, its snapshot capability.
A ZFS snapshot captures the exact state of a filesystem at a point in time. The snapshot is created almost instantaneously, consumes minimal space initially (it only grows as data changes), and can be used to restore the filesystem to that exact state or to transfer the data elsewhere.
The key insight here: ZFS doesn’t care what’s running on top of it. If an application’s data lives on a ZFS filesystem, you can snapshot it. No database-level backup API required.
How This Works in Practice
The setup involves placing Neo4j’s data directory on a ZFS-managed disk. Here’s the conceptual flow:
Infrastructure:
- A Hossted VM running Ubuntu 20.04 or later
- A secondary attached disk designated for ZFS storage
- OpenZFS installed and configured on the VM (your tech team or Hossted support can help with this – see requirements below)
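For orientation, the infrastructure above might be prepared roughly as follows. This is a minimal sketch, not Hossted's documented procedure: the pool name, dataset name, device path, mount point and neo4j user are all assumptions, and it presumes a fresh install with Neo4j stopped and an empty data directory (existing data would have to be copied onto the dataset first). Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is an assumption, not a required value.
set -euo pipefail

POOL="${POOL:-neo4jpool}"                        # hypothetical pool name
DATASET="${DATASET:-$POOL/neo4j}"                # hypothetical dataset name
DISK="${DISK:-/dev/sdb}"                         # assumed secondary disk; check with lsblk
MOUNTPOINT="${MOUNTPOINT:-/var/lib/neo4j/data}"  # assumed Neo4j data directory

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_zfs_for_neo4j() {
  # Ubuntu ships the OpenZFS userland tools in the zfsutils-linux package.
  run apt-get install -y zfsutils-linux
  # Create a single-disk pool on the secondary disk (this wipes the disk).
  run zpool create "$POOL" "$DISK"
  # Create a dataset and mount it where Neo4j expects its data.
  run zfs create -o mountpoint="$MOUNTPOINT" "$DATASET"
  # Assumes the service runs as user and group "neo4j".
  run chown -R neo4j:neo4j "$MOUNTPOINT"
}
```

Running `DRY_RUN=1 setup_zfs_for_neo4j` first is a cheap way to review the exact commands before executing them as root.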
The backup flow:
- Stop Neo4j briefly – This ensures application-level consistency by flushing all in-memory transactions to disk. The service only needs to be down for the few seconds it takes to perform a clean shutdown. Think of it like taking a photo: the subject needs to hold still for just a moment.
- Create the ZFS snapshot – This completes almost instantly, largely independent of database size, because ZFS is copy-on-write: a snapshot records a reference to the current on-disk block tree rather than copying data. The command is straightforward: zfs snapshot poolname/dataset@snapshot-name. As soon as the command returns, Neo4j can be started again.
- Resume normal operations – Your database is back online while you replicate, browse, or archive the frozen snapshot in the background. The snapshot itself adds no work to the live database, though a running zfs send reads from the same pool and competes for disk I/O.
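The three steps above can be sketched as a small script. The dataset name and the systemd unit name are assumptions; the script tries to restart Neo4j even if the snapshot command fails, so that a failed backup does not turn into an outage.

```shell
#!/usr/bin/env bash
# Sketch only: dataset and service names are assumptions.
set -euo pipefail

DATASET="${DATASET:-neo4jpool/neo4j}"   # hypothetical dataset holding Neo4j's data
SERVICE="${SERVICE:-neo4j}"             # assumed systemd unit name

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

backup_neo4j() {
  # Use the given snapshot name, or a UTC timestamp if none is given.
  local snap status=0
  snap="$DATASET@${1:-backup-$(date -u +%Y%m%dT%H%M%SZ)}"

  # 1. Clean shutdown; systemctl stop waits until the unit has stopped.
  run systemctl stop "$SERVICE"
  # 2. Take the snapshot, remembering a failure instead of aborting.
  run zfs snapshot "$snap" || status=$?
  # 3. Bring Neo4j back regardless of the snapshot result.
  run systemctl start "$SERVICE"

  if [ "$status" -eq 0 ]; then
    echo "created $snap"
  fi
  return "$status"
}
```

For example, `backup_neo4j pre-migration` before a schema change yields a snapshot named neo4jpool/neo4j@pre-migration under these assumed names.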
From there, you have options:
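- Roll back in place – if something goes wrong (bad migration, corrupted data, failed upgrade), you can stop Neo4j and revert the entire filesystem to the snapshot state almost instantly with zfs rollback. Note that rolling back past the most recent snapshot requires the -r flag, which destroys the newer snapshots
- Access snapshot contents as files – each snapshot is browsable read-only under the hidden .zfs/snapshot directory at the root of the dataset’s mount point, so you can cherry-pick specific files if needed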
- Transfer to another VM – ZFS supports streaming snapshots to remote systems using zfs send and zfs receive, including incremental transfers (zfs send -i) that only ship the changes since the last snapshot – making off-site replication efficient even at scale
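A hedged sketch of these three options follows. The dataset, mount point, SSH target and remote dataset are hypothetical names, and the remote user is assumed to have permission to run zfs receive. The rollback helper deliberately omits -r, so it only works for the most recent snapshot and never destroys newer ones.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is an assumption, not a required value.
set -euo pipefail

DATASET="${DATASET:-neo4jpool/neo4j}"                 # hypothetical dataset
MOUNTPOINT="${MOUNTPOINT:-/var/lib/neo4j/data}"       # assumed mount point
SERVICE="${SERVICE:-neo4j}"                           # assumed systemd unit
REMOTE="${REMOTE:-backup@backup-host}"                # hypothetical SSH target
REMOTE_DATASET="${REMOTE_DATASET:-backuppool/neo4j}"  # hypothetical remote dataset

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Roll back in place. If the rollback fails, set -e aborts here and
# Neo4j stays stopped so the state can be inspected.
rollback_neo4j() {
  run systemctl stop "$SERVICE"
  run zfs rollback "$DATASET@$1"
  run systemctl start "$SERVICE"
}

# Path where a snapshot's files can be browsed read-only.
snapshot_dir() {
  echo "$MOUNTPOINT/.zfs/snapshot/$1"
}

# Stream a full copy of one snapshot to another machine over SSH.
send_full() {
  local snap="$DATASET@$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ zfs send $snap | ssh $REMOTE zfs receive $REMOTE_DATASET"
  else
    zfs send "$snap" | ssh "$REMOTE" zfs receive "$REMOTE_DATASET"
  fi
}
```

With these helpers, `ls "$(snapshot_dir pre-migration)"` lists the files as they were when the snapshot was taken, without touching the live data directory.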
Why This Is Better Than neo4j-admin dump
The most obvious alternative for Neo4j Community Edition users is neo4j-admin dump – and it works, but it comes with real limitations.
First, it requires a full service stop for the entire duration of the export, which scales with data size. A small database might be done in seconds; a large one can take minutes. With ZFS, the service stop is only as long as a clean shutdown – the snapshot itself finishes in milliseconds regardless of how much data you have.
Rollback tells a similar story. Restoring from a dump means re-importing the entire dataset, which is slow and operationally awkward. Rolling back a ZFS snapshot is near-instant – you’re not moving data, you’re just pointing the filesystem back to a previous state.
Incremental transfers are another gap. neo4j-admin dump produces full exports every time, with no way to ship only what changed. ZFS handles this natively via zfs send -i, which streams only the delta between two snapshots – making off-site replication practical even as your dataset grows. Worth noting: incremental backup through neo4j-admin backup does exist, but it’s also an Enterprise-only feature.
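To make the incremental case concrete, here is a minimal sketch that picks the two most recent snapshots of a dataset and ships only the difference between them. All names are hypothetical, and it assumes the remote side already holds the older of the two snapshots from an earlier transfer, which zfs receive requires for an incremental stream.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is an assumption, not a required value.
set -euo pipefail

DATASET="${DATASET:-neo4jpool/neo4j}"                 # hypothetical dataset
REMOTE="${REMOTE:-backup@backup-host}"                # hypothetical SSH target
REMOTE_DATASET="${REMOTE_DATASET:-backuppool/neo4j}"  # hypothetical remote dataset

# Snapshot names of this dataset only, oldest first.
list_snapshots() {
  zfs list -H -d 1 -t snapshot -o name -s creation "$DATASET"
}

# Stream only the blocks that changed between two snapshots.
send_incremental() {
  local prev="$1" curr="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ zfs send -i $prev $curr | ssh $REMOTE zfs receive $REMOTE_DATASET"
  else
    zfs send -i "$prev" "$curr" | ssh "$REMOTE" zfs receive "$REMOTE_DATASET"
  fi
}

# Send the delta between the two most recent snapshots.
send_latest_increment() {
  local snaps prev curr
  snaps="$(list_snapshots | tail -n 2)"
  prev="$(printf '%s\n' "$snaps" | sed -n '1p')"
  curr="$(printf '%s\n' "$snaps" | sed -n '2p')"
  if [ -z "$curr" ]; then
    echo "need at least two snapshots for an incremental send" >&2
    return 1
  fi
  send_incremental "$prev" "$curr"
}
```

A scheduled job could call the snapshot step and then `send_latest_increment`, so each run transfers only what changed since the previous one.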
Finally, ZFS snapshots give you file-level access through the hidden .zfs/snapshot directory. You can browse the snapshot like any other folder and pull out specific files without touching the live database. A dump gives you a single opaque file with no way to access its contents without a full restore.
None of this requires an Enterprise license. That’s the point.
Requirements
- A Hossted VM running Ubuntu 20.04 or later.
- An additional attached disk for the ZFS pool – separate from the OS disk (standard practice: keeping the pool off the root drive keeps the database’s storage independent of the operating system install).
- Setup assistance – ZFS configuration, pool creation, and pointing Neo4j’s data directory to the ZFS filesystem require hands-on work. Your tech team can follow Hossted’s setup documentation, or you can request help directly through Hossted support.
A Few Honest Trade-offs
You still need a brief service stop. This isn’t a hot backup. Neo4j must be cleanly stopped before the snapshot is taken. The total downtime is just the length of a normal service restart cycle – typically a few seconds. The ZFS snapshot itself takes milliseconds.
Performance tuning may be necessary. ZFS has its own I/O characteristics, and in some configurations you may see a modest performance impact compared to a standard ext4 or xfs filesystem. This is usually addressable through ZFS tuning. The most important setting is recordsize: the ZFS default is 128 KiB, while Neo4j’s page cache works in 8 KiB pages, so a recordsize closer to Neo4j’s page size reduces write amplification and can significantly improve I/O performance. ARC (Adaptive Replacement Cache) sizing also matters: ZFS and Neo4j both want RAM, so cap the ARC and size Neo4j’s heap and page cache so that the two fit in memory together.
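As a rough illustration of the two knobs mentioned above, the sketch below sets recordsize on the dataset and computes the value for the zfs_arc_max module parameter, which caps the ARC in bytes. The dataset name and the 8K default are assumptions (8K mirrors Neo4j's 8 KiB page size); treat both settings as starting points to benchmark rather than recommended values. A changed recordsize applies only to data written afterwards.

```shell
#!/usr/bin/env bash
# Sketch only: names and values are assumptions to benchmark, not advice.
set -euo pipefail

DATASET="${DATASET:-neo4jpool/neo4j}"   # hypothetical dataset
RECORDSIZE="${RECORDSIZE:-8K}"          # assumption: mirror Neo4j's 8 KiB pages

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Set the record size used for blocks written from now on.
set_recordsize() {
  run zfs set recordsize="$RECORDSIZE" "$DATASET"
}

# Convert a whole number of GiB to bytes.
arc_max_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Line for /etc/modprobe.d/zfs.conf that caps the ARC at the given GiB.
arc_modprobe_line() {
  echo "options zfs zfs_arc_max=$(arc_max_bytes "$1")"
}
```

For instance, `arc_modprobe_line 4` produces a line that limits the ARC to 4 GiB once written to /etc/modprobe.d/zfs.conf and the module is reloaded; writing the byte value to /sys/module/zfs/parameters/zfs_arc_max applies it at runtime instead.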
It’s a filesystem-level backup, not a logical backup. You’re capturing the state of the data files, not a database-level export. The unit of restore is the store as a whole at the snapshot point: you can roll the filesystem back or copy files out of a snapshot, but you can’t restore a single graph or a subset of nodes the way you might with a logical export.
Who This Is For
This solution is a good fit if:
- You’re running Neo4j Community Edition and need a reliable backup strategy beyond manual file copies
- You can tolerate a brief service interruption during the backup window
- You want fast rollback capability for deployments, migrations, or upgrades
- You’re already on a Hossted VM or willing to use one
It’s not the right fit if you need true zero-downtime online backups or granular data-level restore. For those requirements, Neo4j Enterprise is the appropriate path – and Hossted can help you evaluate whether the upgrade makes sense for your situation.
The Broader Point
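Neo4j is just one example. The same OpenZFS approach applies to other open-source applications where backup functionality is gated behind a paid tier, provided the application can be stopped or quiesced briefly so that its files on disk are consistent. If your data lives on a filesystem, ZFS can snapshot it.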
At Hossted, we see this pattern frequently: teams running capable open-source software that covers 95% of their needs, hitting one limitation that seems to demand an expensive upgrade. Sometimes the upgrade is worth it. Sometimes there’s a practical workaround that closes the gap.
This is one of those cases.