Automated URL Archiving

how I set up automated archiving of URLs using the Internet Archive's Wayback Machine and other tools

status: Finished · certainty: certain · importance: 5/10

Without libraries, our history would be like a house without windows or doors.
Norman Cousins

Toward a Mindset of Profusion

Since circa 1945, we have been in the "Age of Information", and as with anything in abundance, its value has dwindled. People take for granted the information at their fingertips. They think it will always be there when they need it. I, too, once held such a mindset of scarcity. We are now moving into the recently coined "Age of Intelligence", a reference to the new AI wave, which has massive potential to change the way society at large thinks. People will no longer use search engines, since a chatbot is more convenient, faster, and an experience with less friction. This is innately dangerous. These megacorps lure you in under a false sense of comfort, and then pull the rug out from under you. At that point, however, you are far too lazy and complacent to do anything about it. The information bans, the modification, the censoring: it's all "ok," right? No. This mindset is both ignorant and one of scarcity. The average middle-wage American makes more than enough to afford a couple of 8TB HDDs. That is far more storage than most would need, and enough to give you the power back: to be proactive in preserving valuable information that is being taken down, censored, and hidden from the public.

The price of apathy towards public affairs is to be ruled by evil men.
Plato

The above sentiment echoes Proverbs 21:25-26 perfectly. Those who refuse to take an active approach in helping preserve valuable information will soon come to find a lack of it. A recent example is the controversy around Amazon's Kindle: Amazon has been banning users without explanation, silently tampering with books that were already purchased, and committing many other atrocities against digital ownership.

Automated URL Archiving

I have settled on a script that runs on my server once a day. It sifts through my .mdx files and collects every link, filters out any that belong to this site, and then checks whether each remaining link is still active.
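
The three steps described above (collect, filter, check) can be sketched roughly as follows. This is a minimal illustration in Python, not the actual script: the function names, the URL regex, and the HEAD-based liveness check are all assumptions made for the example, and the site host is taken from this page's citation URL.

```python
import http.client
import re
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Assumption: the site's own host, taken from the citation URL on this page.
SITE_HOST = "krisyotam.com"

# Simplified pattern: an http(s) URL runs until whitespace or a closing
# bracket/quote. Real MDX may call for a proper parser instead of a regex.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"'`]+")


def extract_links(text: str) -> set[str]:
    """Return every http(s) URL found in a block of MDX text."""
    # Trailing sentence punctuation is stripped from each match.
    return {match.rstrip(".,;:") for match in URL_PATTERN.findall(text)}


def is_external(url: str, site_host: str = SITE_HOST) -> bool:
    """True when the URL is not on site_host or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return not (host == site_host or host.endswith("." + site_host))


def collect_external_links(content_dir: Path, site_host: str = SITE_HOST) -> set[str]:
    """Walk content_dir for .mdx files and gather links to other sites."""
    links: set[str] = set()
    for path in content_dir.rglob("*.mdx"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        links |= {u for u in extract_links(text) if is_external(u, site_host)}
    return links


def is_alive(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and treat any status below 400 as alive."""
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors (4xx/5xx), DNS failures, timeouts, bad URLs.
        return False
```

A daily run would then loop over `collect_external_links(Path("content"))` (the `content` directory name is hypothetical) and call `is_alive` on each URL. Some servers reject HEAD requests, so a real checker would likely fall back to GET before declaring a link dead.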

Citation

Cited as:

Yotam, Kris. (Jun 2025). Automated URL Archiving. krisyotam.com. https://krisyotam.com/notes/internet-archiving/automated-url-archiving

Or

@article{yotam2025automated-url-archiving,
  title   = "Automated URL Archiving",
  author  = "Yotam, Kris",
  journal = "krisyotam.com",
  year    = "2025",
  month   = "Jun",
  url     = "https://krisyotam.com/notes/internet-archiving/automated-url-archiving"
}