OpenSource-Hub

ArchiveBox

27.4k stars · Privacy Protection

Open-source self-hosted web archiving tool that saves snapshots of URLs in multiple redundant formats (HTML, PDF, WARC, screenshots, etc.) with a web UI, CLI, and API.

No installer available yet — head to the source repository

Self-host your own personal Internet Archive: save any URL as HTML, PDF, screenshots, and more.

Core Features

  • Saves web pages as HTML, PDF, screenshots, WARC, plain text, and more
  • Extracts media from YouTube, SoundCloud, GitHub, and social media
  • Imports bookmarks from Pocket, Pinboard, browser history, RSS feeds
  • Supports Docker deployment, web UI, CLI, and Python API
  • Stores data in plain files/folders for long-term readability

What It Can't Do

  • First-time setup downloads Chromium (~200 MB); a stable internet connection is recommended
  • Archiving media (e.g., YouTube videos) can consume significant disk space
  • Docker installation is more reliable than pip, which may run into system dependency issues (e.g., Chrome)
  • The default web UI uses port 8000; adjust your firewall accordingly

Use Cases

  • Backup important web pages, bookmarks, and online content
  • Legal evidence preservation against content removal
  • Research archiving of academic papers and news articles
  • Personal digital asset library for social media posts and videos

Detailed Introduction

ArchiveBox is a self-hosted application that preserves web content in a variety of durable, standardized formats. It saves the original HTML, CSS, and JS (via SingleFile), full-page screenshots (PNG), PDFs, WARC files, article text, and metadata, all stored in ordinary files and folders. It also extracts content from social media (posts, comments, images), YouTube and SoundCloud (MP3/MP4, subtitles), and GitHub (git clones). You can feed it URLs one at a time, import bookmarks from Pocket or Pinboard, browser history, or RSS feeds, or use the browser extension. ArchiveBox runs as a Docker web app or via the CLI or Python API, and because the archive consists of plain files, it is designed to stay readable for decades without the tool itself.

Tags

web-archiving, self-hosted, bookmark, backup, docker, cli, WARC, digital-preservation

Install Guide
  1. Create the data directory and download the Docker Compose file: mkdir -p ~/archivebox/data && cd ~/archivebox && curl -fsSL 'https://docker-compose.archivebox.io' > docker-compose.yml
  2. Initialize ArchiveBox: docker compose run archivebox init --install
  3. Add a URL to archive: docker compose run archivebox add 'https://example.com'
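
The three steps above can be collected into one small script. This is a minimal sketch rather than an official installer: the function names (`compose`, `setup_archivebox`, `add_urls`) and the `ARCHIVEBOX_DIR` and `DRY_RUN` variables are hypothetical names introduced here for illustration, while the commands themselves are the ones listed in the guide. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch: the Install Guide steps as reusable POSIX sh functions.
# ARCHIVEBOX_DIR, DRY_RUN and all function names are illustrative assumptions.

ARCHIVEBOX_DIR="${ARCHIVEBOX_DIR:-$HOME/archivebox}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Run "docker compose ..." from the ArchiveBox directory, or print it in dry-run mode.
compose() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ docker compose %s\n' "$*"
  else
    (cd "$ARCHIVEBOX_DIR" && docker compose "$@")
  fi
}

# Steps 1 and 2: create the data directory, fetch the Compose file, initialize.
setup_archivebox() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ mkdir -p %s/data\n' "$ARCHIVEBOX_DIR"
    printf '+ curl -fsSL https://docker-compose.archivebox.io > %s/docker-compose.yml\n' "$ARCHIVEBOX_DIR"
  else
    mkdir -p "$ARCHIVEBOX_DIR/data" &&
      curl -fsSL 'https://docker-compose.archivebox.io' > "$ARCHIVEBOX_DIR/docker-compose.yml" ||
      return 1
  fi
  compose run archivebox init --install
}

# Step 3: archive every URL given as an argument, one "add" call per URL.
add_urls() {
  for url in "$@"; do
    compose run archivebox add "$url"
  done
}
```

After sourcing the file, calling `setup_archivebox` and then `add_urls 'https://example.com'` prints the planned commands; setting `DRY_RUN=0` beforehand executes them instead.
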
File Integrity

Checksum not available

This project has not published a SHA-256 checksum on its GitHub Release page

Download directly from GitHub Releases and verify file integrity yourself

All SHA-256 checksums on this platform are extracted from the project's official GitHub Release page, without any modification. You can independently verify them on the GitHub Releases page.
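
Because no official checksum is published here, verifying a download yourself means computing its SHA-256 digest and comparing it with a value obtained from a source you trust. The following is a minimal sketch, assuming `sha256sum` (GNU coreutils) or `shasum` (macOS) is installed; `sha256_of` and `verify_sha256` are hypothetical helper names, not part of ArchiveBox.

```shell
#!/bin/sh
# Sketch: compute and compare a SHA-256 digest in POSIX sh.
# sha256_of and verify_sha256 are illustrative helper names.

# Print the lowercase hex SHA-256 digest of the file given as $1.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Return 0 if the digest of file $1 equals the expected hex string $2, else 1.
verify_sha256() {
  actual="$(sha256_of "$1")"
  # Normalize the expected value so upper-case hex also matches.
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual)" >&2
    return 1
  fi
}
```

Called as `verify_sha256 <downloaded-file> <expected-digest>`, the function exits with a non-zero status on a mismatch, so it can gate later steps in a script.
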

Environment Guide

Uninstall Info

Remove the ArchiveBox data directory (default ~/archivebox). For Docker, run 'docker compose down -v' first to clean up containers and volumes.
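
The order matters: `docker compose down -v` needs the Compose file, so it has to run before the directory is deleted. A minimal sketch of that sequence follows, assuming the default `~/archivebox` location; `uninstall_archivebox`, `ARCHIVEBOX_DIR`, and `DRY_RUN` are hypothetical names, and the script only prints the commands unless `DRY_RUN=0` is set, since the removal is irreversible.

```shell
#!/bin/sh
# Sketch: uninstall sequence for a Docker-based setup (POSIX sh).
# ARCHIVEBOX_DIR, DRY_RUN and uninstall_archivebox are illustrative assumptions.

ARCHIVEBOX_DIR="${ARCHIVEBOX_DIR:-$HOME/archivebox}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

uninstall_archivebox() {
  if [ ! -d "$ARCHIVEBOX_DIR" ]; then
    echo "Nothing to remove: $ARCHIVEBOX_DIR does not exist" >&2
    return 1
  fi
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ (cd %s && docker compose down -v)\n' "$ARCHIVEBOX_DIR"
    printf '+ rm -rf %s\n' "$ARCHIVEBOX_DIR"
  else
    # Stop containers and remove volumes while the Compose file still exists.
    (cd "$ARCHIVEBOX_DIR" && docker compose down -v) || return 1
    # Delete the data directory, including all archived snapshots.
    rm -rf -- "$ARCHIVEBOX_DIR"
  fi
}
```

Running `uninstall_archivebox` in the default dry-run mode is a cheap way to confirm which directory would be deleted before committing to it.
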

No Extra Dependencies

With the Docker setup, no additional runtime is required beyond Docker itself; first-time setup still downloads Chromium (~200 MB).

Project Info
License: MIT
Last Updated: 2026-05-13 12:39:38