PDF Merging and Splitting Reference

Esta página aún no está disponible en tu idioma.

This reference covers merging and splitting operations. Read it when the user asks to combine PDFs, merge files, reorder pages, split a PDF into parts, or extract specific pages as a new PDF.

Basic merge

Combine multiple PDFs into one, in the order provided:

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
    combined.pdf \
    chapter1.pdf \
    chapter2.pdf \
    chapter3.pdf

The output file is the first argument. Input files follow in the desired order. The script handles any number of input files.

Page ranges within a merge

To include only specific pages from an input file, append a colon and a range:

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
    report.pdf \
    cover.pdf \
    "body.pdf:2-15" \
    appendix.pdf

Range formats:

Format	Meaning
`file.pdf:3`	Page 3 only
`file.pdf:2-8`	Pages 2 through 8 inclusive
`file.pdf:1,4,7`	Pages 1, 4, and 7
`file.pdf:5-`	Page 5 through the last page

Pages are 1-indexed.

Reordering pages within a single PDF

To reorder pages in an existing PDF, treat it as a merge with page range selections:

# Reverse all pages of a 10-page PDF
python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
    reversed.pdf \
    "original.pdf:10" \
    "original.pdf:9" \
    "original.pdf:8" \
    "original.pdf:7" \
    "original.pdf:6" \
    "original.pdf:5" \
    "original.pdf:4" \
    "original.pdf:3" \
    "original.pdf:2" \
    "original.pdf:1"

For large reordering operations, use pypdf directly:

import pypdf

reader = pypdf.PdfReader("original.pdf")
writer = pypdf.PdfWriter()

# Example: move the last page to the front
order = [len(reader.pages) - 1] + list(range(len(reader.pages) - 1))
for i in order:
    writer.add_page(reader.pages[i])

with open("reordered.pdf", "wb") as f:
    writer.write(f)

Adding bookmarks (outlines)

Bookmarks help readers navigate a merged document. After merging, add top-level bookmarks for each section:

import pypdf

reader = pypdf.PdfReader("merged.pdf")
writer = pypdf.PdfWriter()
writer.append(reader)

# Add bookmarks (page indices are 0-based here)
writer.add_outline_item("Introduction", 0)
writer.add_outline_item("Chapter 1", 2)
writer.add_outline_item("Chapter 2", 15)
writer.add_outline_item("Appendix", 42)

with open("merged-with-bookmarks.pdf", "wb") as f:
    writer.write(f)

Nested bookmarks (sub-sections) use the parent parameter:

ch1 = writer.add_outline_item("Chapter 1", 2)
writer.add_outline_item("1.1 Background", 3, parent=ch1)
writer.add_outline_item("1.2 Methods", 7, parent=ch1)

Splitting a PDF into individual pages

To extract every page as its own file:

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py --split original.pdf

This creates original-page-001.pdf, original-page-002.pdf, etc., in the current directory. The zero-padded numbering ensures correct alphabetical sort order for up to 999 pages.

For custom naming or a specific output directory, use pypdf directly:

import pypdf
from pathlib import Path

reader = pypdf.PdfReader("original.pdf")
out_dir = Path("pages")
out_dir.mkdir(exist_ok=True)

for i, page in enumerate(reader.pages, 1):
    writer = pypdf.PdfWriter()
    writer.add_page(page)
    out_path = out_dir / f"page-{i:03d}.pdf"
    with open(out_path, "wb") as f:
        writer.write(f)
    print(f"Wrote {out_path}")

Splitting at a page boundary

To split a 50-page PDF at page 20 (pages 1-20 in part A, 21-50 in part B):

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
    part-a.pdf \
    "document.pdf:1-20"

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
    part-b.pdf \
    "document.pdf:21-50"

Handling encrypted PDFs

Encrypted PDFs require the user password to open. pypdf needs it before merging:

import pypdf

reader = pypdf.PdfReader("locked.pdf", password="the-password")
writer = pypdf.PdfWriter()
writer.append(reader)

# The output will NOT be encrypted by default
with open("unlocked.pdf", "wb") as f:
    writer.write(f)

If you need to merge an encrypted PDF without stripping its encryption in the output, this is not reliably supported by pypdf alone — inform the user.

Preserving metadata

By default pypdf does not copy metadata from the input files to the merged output. To copy metadata from the first input file:

import pypdf

reader = pypdf.PdfReader("first.pdf")
writer = pypdf.PdfWriter()
# ... append pages ...
writer.add_metadata(reader.metadata)

with open("merged.pdf", "wb") as f:
    writer.write(f)

Common errors and fixes

Error	Cause	Fix
`ModuleNotFoundError: No module named 'pypdf'`	Library not installed	`pip install pypdf`
`FileNotFoundError`	Input file path wrong	Verify each path with `ls`
`pypdf.errors.FileNotDecryptedError`	Encrypted input	Provide password as shown above
Output PDF is larger than expected	Duplicate embedded fonts/images	Normal for merging unrelated PDFs; not fixable without re-distilling
Bookmarks missing in merged output	Source bookmarks not copied	Copy them explicitly with `writer.add_outline_item`