
Large File Storage

Implementation details of large file storage solutions for the website

status: In Progress · certainty: certain · importance: 8/10

As of 4/28/2025, krisyotam.com has support for Large Files (>200MB). The post here from gwern.net details much of the necessity of this for a "long-now" focused site. For those who would rather not read it, I provide a synopsis below.

Why Large File Storage

Lots of people make use of services like OneDrive, Google Drive, iCloud Drive, and Mega for cloud storage. These platforms tend to be fairly straightforward and designed for ease of use. That does come with tons of drawbacks, however. With these platforms there is a lot of High-Level Abstraction (you interact with apps, not servers), Freemium Models (small free tiers that scale terribly in price), Shared Hosting (your files live with a million others), Integrated Ecosystems (locked into Apple, Microsoft, etc.), and severe Privacy Concerns. So how are Hetzner Storage Boxes different? A Storage Box is Infrastructure Focused (designed for developers and power users) and provides Low-Level Control (access via FTP, SFTP, SCP, rsync, etc.). It can also be mounted and used like a system hard drive (a feature other cloud providers rarely expose so directly). And there is no Ecosystem Lock-in: it works with any OS, tool, or client.
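To make that low-level control concrete, here is a minimal sketch of syncing a local folder up to a Storage Box with rsync. The u123456 account name and the library/ directory are placeholders; the port-23 SSH endpoint is the one Hetzner documents for Storage Boxes.

# Push a local folder to the Storage Box over SSH (Storage Boxes listen on port 23).
# u123456 is a placeholder for your actual Storage Box account name.
rsync -avz -e 'ssh -p 23' ./library/ u123456@u123456.your-storagebox.de:library/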

Need for Large-File Support

It's no secret that this site will have tons of expository work done over the years. That comes with thousands of references, resources, and citations, a number that easily multiplies itself given the ease of using tools like Perplexity to discover sources I would otherwise not have found. With sources, I find the more the better. Why lean toward a scarcity mindset? Would it not be more convincing to see 10, 20, or even 30 studies with reproducible results on a certain topic than, say, to be motivated to make an otherwise significant change in your life based off a one-off source? So this first reason is less a need to store singular "Large Files" than a need for elephantine storage sizes caused by the consistent accumulation of small PDFs, mp3s, mp4s, etc. See also my method of countering this via my automated URL-archiving script.

Beyond the simple things, my main need for this implementation is the preservation of sites, such as that of geologist and independent researcher Leuren Moret. Her site went down sometime after I last visited it in mid-2024. There is no telling when valuable stores of information may go down. I have more thoughts on the extended preservation of access in situations like these, such as retaining .eth domains like leurenmoret.eth and pinning her content via IPFS, or even using offshore hosting providers, which I have been deliberating over switching to before the content here starts to get more serious. For now, at least, using a viable LFS provider lets me take my time weighing my options while retaining offline access to such information. I am also still weighing the value of mp4s for long-term storage. They may be pleasing and add visual stimuli, but for most content with great information density it is just not necessary; I would rather sacrifice the visual fidelity of an mp4 for the extra storage gained by using mp3s. In such situations I am also thinking about transcribing videos through tools like OpenAI Whisper; a sketch of that idea follows. Maybe getting an extra Hetzner server for such purposes would not be a bad idea.
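As a sketch of that transcription idea (file names hypothetical): strip the video track with ffmpeg to reclaim the space, then feed the remaining audio to the open-source Whisper CLI.

# Keep only the audio track; the mp3 is a fraction of the mp4's size.
ffmpeg -i lecture.mp4 -vn -c:a libmp3lame -q:a 4 lecture.mp3

# Transcribe locally with OpenAI's Whisper CLI (pip install openai-whisper).
whisper lecture.mp3 --model small --output_format txt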

The opposite of courage in our society is not cowardice... it is conformity.
Rollo May

Drawbacks of Implementation

I have deliberated for a while over tons of alternative solutions to this problem, as mentioned above in (#Why Large File Storage). There were several drawbacks to consider, such as privacy, security, ease of access, pricing over the long term, and portability. A number of these factors outright eliminated the more traditional options such as OneDrive, Google Drive (G-Suite), iCloud Drive, and even Mega as a main source for LFS. That said, I am still particularly fond of Mega and retain my subscription, since I not infrequently receive massive amounts of data from people via the platform. The biggest drawback most of these platforms share is the inability to download via HTTPS. I was also drastically influenced by the massive pricing difference, which we will discuss next.

Competitor Price Comparison

Beyond the popular consumer cloud providers above, there are a number of viable alternatives to Hetzner: Wasabi, Scaleway, Storj, and my own system-backup provider Backblaze B2. As you can see below, however, there is simply no comparison when it comes to pricing models.

Hetzner Competitor Price Comparison
As of 4/28/2025, a comparison between Hetzner, Backblaze B2, Wasabi, Scaleway, and Storj.

Use Cases

The immediate use case that comes to mind for the newly integrated Large File Storage (LFS) is the completion of the formerly delayed archive page. It should come as no surprise that the future of this blog will discuss at length a variety of topics that must be heavily supported with documentation: the Panama Papers, Paradise Papers, Offshore Leaks, Silk Road archives, RaidForums dumps, and many more data sets. It will also include the storage of YouTube videos I think are in danger of being silenced. Things like these, as well as research papers, historical documents, etc., are to be stored for reference here on the site, with the archive page made available to people at my discretion.

More fun things to store include the GeoCities Archives, which represent an important piece of early internet history that would otherwise be lost to time.

Implementation

For now I’ve decided to keep Vercel as my main host. I already have both a Hetzner server and a Storage Box set up, but for large file hosting I went with something a little more elegant. I’m using two S3-compatible Object Storage buckets—doc and cdn—and routing traffic through a reverse proxy so they appear under custom domains: krisyotam.com/doc and cdn.krisyotam.com.

Instead of the typical CNAME setup, I went with a reverse proxy using Caddy. It gives me full control over routing, lets me handle TLS certs easily, and keeps everything on brand. Here’s how I did it.

Step 1 – Create the Object Storage Buckets

  • I created two public buckets on Hetzner’s Object Storage dashboard: one for documents, one for serving static files.
  • Then I generated S3 access keys so my reverse proxy could actually hit the buckets.
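Before wiring up the proxy, it is worth sanity-checking the keys; the standard AWS CLI works against any S3-compatible endpoint. The hel1 endpoint below matches the one used later in the Caddyfile, and the test file is hypothetical.

# Store the Hetzner-issued keys in the default AWS CLI profile.
aws configure set aws_access_key_id YOUR_ACCESS_KEY
aws configure set aws_secret_access_key YOUR_SECRET_KEY

# List the cdn bucket through Hetzner's S3-compatible endpoint.
aws s3 ls s3://cdn --endpoint-url https://hel1.your-objectstorage.com

# Upload a test object to the doc bucket.
aws s3 cp ./test.pdf s3://doc/test.pdf --endpoint-url https://hel1.your-objectstorage.com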

Step 2 – Set Up the Server
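My notes here are bare, so take this as a minimal sketch of what the step involves, assuming a fresh Debian/Ubuntu cloud server with DNS for both domains already pointed at it:

# Install Docker via the official convenience script.
apt update && apt install -y curl
curl -fsSL https://get.docker.com | sh

# Confirm Docker and the compose plugin are available.
docker --version && docker compose version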

Step 3 – Set Up Caddy as a Reverse Proxy

SSH into the box:

ssh root@<your-server-ip>

Create the directory structure:

mkdir -p /opt/caddy/data

Now create the Docker Compose file:

vim /opt/caddy/compose.yaml

Paste this in:

services:
  caddy:
    container_name: caddy
    image: caddy:latest
    restart: unless-stopped
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./data/Caddyfile:/etc/caddy/Caddyfile
      - ./data/certs:/certs
      - ./data/config:/config
      - ./data/data:/data
      - ./data/sites:/srv

Then the actual Caddyfile:

vim /opt/caddy/data/Caddyfile

Here’s what I put in there:

cdn.krisyotam.com {
    tls {
        issuer acme {
            dir https://acme-v02.api.letsencrypt.org/directory
        }
    }
    reverse_proxy https://cdn.hel1.your-objectstorage.com {
        header_up Host {http.reverse_proxy.upstream.hostport}
        header_up X-Forwarded-Host {host}
    }
}

# Caddy v2 does not allow paths in site addresses, so /doc is matched
# inside the krisyotam.com site block and the prefix is stripped.
krisyotam.com {
    tls {
        issuer acme {
            dir https://acme-v02.api.letsencrypt.org/directory
        }
    }
    handle_path /doc/* {
        reverse_proxy https://doc.hel1.your-objectstorage.com {
            header_up Host {http.reverse_proxy.upstream.hostport}
            header_up X-Forwarded-Host {host}
        }
    }
}

If your object storage provider uses the bucket name in the path instead of the subdomain, make sure to adjust the reverse_proxy URL accordingly (e.g. https://hel1.your-objectstorage.com/<bucket_name>).
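You can check which style your endpoint expects with a plain HTTPS request before editing the Caddyfile (the object name here is hypothetical):

# Virtual-hosted style: bucket as a subdomain.
curl -I https://doc.hel1.your-objectstorage.com/example.pdf

# Path-style: bucket in the path.
curl -I https://hel1.your-objectstorage.com/doc/example.pdf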

Step 4 – Run It

Start up the container:

cd /opt/caddy
docker compose up -d

Check that it’s running:

docker ps

At this point, both krisyotam.com/doc and cdn.krisyotam.com are serving public files cleanly through Caddy, with HTTPS certs, and no ugly S3 links in sight.
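As a final end-to-end check from any machine (again with a hypothetical object name), both routes should answer over HTTPS with a valid Let's Encrypt certificate:

# Expect a 200 response and a Let's Encrypt certificate on both routes.
curl -Iv https://cdn.krisyotam.com/example.pdf
curl -Iv https://krisyotam.com/doc/example.pdf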


Citation

Cited as:

Yotam, Kris. (Jun 2025). Large File Storage. krisyotam.com. https://krisyotam.com/notes/website/large-file-storage

Or

@article{yotam2025large-file-storage,
  title   = "Large File Storage",
  author  = "Yotam, Kris",
  journal = "krisyotam.com",
  year    = "2025",
  month   = "Jun",
  url     = "https://krisyotam.com/notes/website/large-file-storage"
}