ArchiveBox
Open-source self-hosted web archiving tool that saves snapshots of URLs in multiple redundant formats (HTML, PDF, WARC, screenshots, etc.) with a web UI, CLI, and API.
Self-host your own personal Internet Archive: save any URL as HTML, PDF, screenshots, and more.
Core Features
- Saves web pages as HTML, PDF, screenshots, WARC, plain text, and more
- Extracts media from YouTube, SoundCloud, GitHub, and social media
- Imports bookmarks from Pocket, Pinboard, browser history, RSS feeds
- Supports Docker deployment, web UI, CLI, and Python API
- Stores data in plain files/folders for long-term readability
Limitations and Notes
- First-time setup downloads Chromium (~200 MB); a stable internet connection is recommended
- Archiving media (e.g., YouTube videos) can consume significant disk space
- Docker installation is more reliable than pip, which may run into system dependency issues (such as Chrome)
- The default web UI uses port 8000; adjust your firewall accordingly
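Because media archiving can fill a disk quickly, it may help to check free space before a large import. The following is a minimal sketch, not part of ArchiveBox: `has_free_space` is a hypothetical helper name, and the 5 GB threshold in the usage comment is an arbitrary assumption.

```shell
# Hypothetical helper (not an ArchiveBox command): succeed only if the
# filesystem holding DIR has at least MIN_KB kilobytes available.
has_free_space() {
  dir="$1"
  min_kb="$2"
  # df -P prints POSIX-format output; on the second line, column 4 is the
  # available space in 1K blocks (because of -k).
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge "$min_kb" ]
}

# Example usage (threshold is an assumption, adjust to your needs):
#   has_free_space ~/archivebox/data 5242880 || echo "Less than 5 GB free"
```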
Use Cases
- Backup important web pages, bookmarks, and online content
- Legal evidence preservation against content removal
- Research archiving of academic papers and news articles
- Personal digital asset library for social media posts and videos
Detailed Introduction
ArchiveBox is a self-hosted application that preserves web content in a variety of durable, standardized formats. It saves the original HTML, CSS, and JS (via SingleFile), full-page screenshots (PNG), PDFs, WARC files, article text, and metadata, all stored in ordinary files and folders. It also extracts content from social media (posts, comments, images), YouTube and SoundCloud (MP3/MP4, subtitles), and GitHub (git clones). You can feed it URLs one at a time, import them from Pocket or Pinboard bookmarks, browser history, or RSS feeds, or send them from the browser extension. ArchiveBox runs as a Docker web app or through its CLI and Python API, and because everything is kept in plain files and standard formats, the archive is meant to stay readable for decades without the tool itself.
Getting Started
Step 1: Create a data directory and download the Docker Compose file:
    mkdir -p ~/archivebox/data && cd ~/archivebox
    curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
Step 2: Initialize ArchiveBox:
    docker compose run archivebox init --install
Step 3: Add a URL to archive:
    docker compose run archivebox add 'https://example.com'
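Once the steps above work for a single URL, the same `add` command can be repeated for a list of URLs. The sketch below assumes a plain text file with one URL per line and that it is run from the directory containing docker-compose.yml (~/archivebox in the steps above). `archive_urls_from_file` and the file name in the usage comment are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: run the "add" command from Step 3 once for every URL
# in a text file (one URL per line; blank lines and '#' comments are skipped).
archive_urls_from_file() {
  url_file="$1"
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;   # ignore empty lines and comment lines
    esac
    # </dev/null stops the container from reading the rest of the URL list
    # from the loop's standard input.
    docker compose run archivebox add "$url" </dev/null
  done < "$url_file"
}

# Example usage (file name is an assumption):
#   cd ~/archivebox && archive_urls_from_file urls.txt
```

Adding URLs one at a time is slower than a bulk import, but it relies only on the command form shown in the steps above.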
Checksum not available
This project has not published a SHA-256 checksum on its GitHub Release page
Download directly from GitHub Releases and verify file integrity yourself
All SHA-256 checksums on this platform are extracted from the project's official GitHub Release page, without any modification. You can independently verify them on the GitHub Releases page.
Uninstall Info
Remove the ArchiveBox data directory (default ~/archivebox). For Docker, run 'docker compose down -v' first to clean up containers and volumes.
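The removal order described above can be sketched as a small shell function. This is an illustration under stated assumptions rather than an official uninstaller: `remove_archivebox` is a hypothetical name, the directory defaults to ~/archivebox as mentioned above, and the function permanently deletes every archived snapshot in that directory.

```shell
# Hypothetical helper: stop the Docker Compose stack (if any), then delete
# the ArchiveBox directory. WARNING: this removes all archived data.
remove_archivebox() {
  ab_dir="${1:-$HOME/archivebox}"
  # Refuse to continue if the directory does not exist.
  if [ ! -d "$ab_dir" ]; then
    echo "not found: $ab_dir" >&2
    return 1
  fi
  # Clean up containers and volumes first, but only when a Compose file exists.
  if [ -f "$ab_dir/docker-compose.yml" ]; then
    (cd "$ab_dir" && docker compose down -v) || return 1
  fi
  # Finally remove the directory itself.
  rm -rf -- "$ab_dir"
}

# Example usage:
#   remove_archivebox            # uses ~/archivebox
#   remove_archivebox /srv/ab    # custom location (an assumption)
```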
Dependencies
Requires Docker (recommended) or a Python environment with pip. Chromium (~200 MB) is downloaded during first-time setup.
Similar Projects
Immich
High performance self-hosted photo and video management solution with automatic backup, AI search, facial recognition, and multi-user support.
Vaultwarden
A lightweight, self-hosted Bitwarden server alternative written in Rust, compatible with official clients.
Umami
Umami is a simple, fast, privacy-focused web analytics tool that gives you full control over your data.