
PDF Merging and Splitting Reference


This reference covers merging and splitting operations. Read it when the user asks to combine PDFs, merge files, reorder pages, split a PDF into parts, or extract specific pages as a new PDF.

Basic merge

Combine multiple PDFs into one, in the order provided:

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
combined.pdf \
chapter1.pdf \
chapter2.pdf \
chapter3.pdf

The output file is the first argument. Input files follow in the desired order. The script handles any number of input files.

Page ranges within a merge

To include only specific pages from an input file, append a colon and a range:

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
report.pdf \
cover.pdf \
"body.pdf:2-15" \
appendix.pdf

Range formats:

| Format | Meaning |
| --- | --- |
| file.pdf:3 | Page 3 only |
| file.pdf:2-8 | Pages 2 through 8 inclusive |
| file.pdf:1,4,7 | Pages 1, 4, and 7 |
| file.pdf:5- | Page 5 through the last page |

Pages are 1-indexed.
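
The range formats above can be read as a small grammar. The following is a minimal sketch of how such a spec could be turned into 0-based page indices for use with pypdf. The name parse_page_range is a hypothetical helper, not necessarily how merge.py parses its arguments, and accepting mixed specs such as 1,4-6 is an assumption that the table does not confirm.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Convert a 1-indexed spec ("3", "2-8", "1,4,7", "5-") to 0-based indices."""
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An empty end ("5-") means "through the last page".
            end = int(end_text) if end_text else page_count
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"range {part!r} is outside 1..{page_count}")
        # Shift from 1-indexed inclusive pages to 0-based indices.
        indices.extend(range(start - 1, end))
    return indices
```

For a 10-page file, parse_page_range("2-8", 10) yields the indices 1 through 7, which can be used directly as reader.pages[i].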

Reordering pages within a single PDF

To reorder pages in an existing PDF, treat it as a merge with page range selections:

# Reverse all pages of a 10-page PDF
python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
reversed.pdf \
"original.pdf:10" \
"original.pdf:9" \
"original.pdf:8" \
"original.pdf:7" \
"original.pdf:6" \
"original.pdf:5" \
"original.pdf:4" \
"original.pdf:3" \
"original.pdf:2" \
"original.pdf:1"

For large reordering operations, use pypdf directly:

import pypdf
reader = pypdf.PdfReader("original.pdf")
writer = pypdf.PdfWriter()
# Example: move the last page to the front
order = [len(reader.pages) - 1] + list(range(len(reader.pages) - 1))
for i in order:
    writer.add_page(reader.pages[i])
with open("reordered.pdf", "wb") as f:
    writer.write(f)
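
The order list is the only part that changes between one reordering and another. As a sketch, the two helpers below build 0-based order lists that can be fed to the add_page loop. The names reversed_order and move_page are hypothetical and are not part of pypdf.

```python
def reversed_order(page_count: int) -> list[int]:
    """Return 0-based indices that list the pages last to first."""
    return list(range(page_count - 1, -1, -1))


def move_page(page_count: int, src: int, dst: int) -> list[int]:
    """Return 0-based indices with the page at `src` moved to position `dst`."""
    if not (0 <= src < page_count and 0 <= dst < page_count):
        raise IndexError("src and dst must be valid 0-based page indices")
    order = list(range(page_count))
    # Remove the page from its old slot, then re-insert it at the new one.
    order.insert(dst, order.pop(src))
    return order
```

Moving the last page of a 5-page file to the front is move_page(5, 4, 0), which gives the same order as the expression used in the example above.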

Adding bookmarks (outlines)

Bookmarks help readers navigate a merged document. After merging, add top-level bookmarks for each section:

import pypdf
reader = pypdf.PdfReader("merged.pdf")
writer = pypdf.PdfWriter()
writer.append(reader)
# Add bookmarks (page indices are 0-based here)
writer.add_outline_item("Introduction", 0)
writer.add_outline_item("Chapter 1", 2)
writer.add_outline_item("Chapter 2", 15)
writer.add_outline_item("Appendix", 42)
with open("merged-with-bookmarks.pdf", "wb") as f:
    writer.write(f)

Nested bookmarks (sub-sections) use the parent parameter:

ch1 = writer.add_outline_item("Chapter 1", 2)
writer.add_outline_item("1.1 Background", 3, parent=ch1)
writer.add_outline_item("1.2 Methods", 7, parent=ch1)
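
Bookmark targets in a merged file follow from the page counts of the inputs: each section starts where the previous sections end. A minimal sketch, assuming the page count of each input is already known (for example from len(reader.pages)). The name section_starts is a hypothetical helper.

```python
from itertools import accumulate


def section_starts(sections: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Map (title, page_count) pairs, in merge order, to (title, 0-based first page)."""
    titles = [title for title, _ in sections]
    counts = [count for _, count in sections]
    # Running totals give the index just past each section; shifting them
    # right by one (and starting at 0) gives each section's first page.
    starts = [0, *accumulate(counts)][:-1]
    return list(zip(titles, starts))
```

With illustrative page counts of 2, 13, 27, and 5 for four inputs, the result is the start indices 0, 2, 15, and 42. Each pair can then be passed to writer.add_outline_item(title, start).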

Splitting a PDF into individual pages

To extract every page as its own file:

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py --split original.pdf

This creates original-page-001.pdf, original-page-002.pdf, etc., in the current directory. The zero-padded numbering ensures correct alphabetical sort order for up to 999 pages.
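
Three digits stop sorting correctly once a document passes 999 pages. How the script behaves in that case is not documented here. As a sketch for custom splitting code, the hypothetical helper page_filename widens the padding to fit the total page count while keeping three digits as the minimum.

```python
def page_filename(stem: str, page_number: int, total_pages: int) -> str:
    """Build a zero-padded per-page file name that sorts in page order."""
    # Use at least three digits, and more when the page count needs them.
    width = max(3, len(str(total_pages)))
    return f"{stem}-page-{page_number:0{width}d}.pdf"
```

Because every name in one run shares the same width, an alphabetical listing of the output directory matches the page order.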

For custom naming or a specific output directory, use pypdf directly:

import pypdf
from pathlib import Path
reader = pypdf.PdfReader("original.pdf")
out_dir = Path("pages")
out_dir.mkdir(exist_ok=True)
for i, page in enumerate(reader.pages, 1):
    writer = pypdf.PdfWriter()
    writer.add_page(page)
    out_path = out_dir / f"page-{i:03d}.pdf"
    with open(out_path, "wb") as f:
        writer.write(f)
    print(f"Wrote {out_path}")

Splitting at a page boundary

To split a 50-page PDF at page 20 (pages 1-20 in part A, 21-50 in part B):

python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
part-a.pdf \
"document.pdf:1-20"
python3 ${CLAUDE_SKILL_DIR}/scripts/merge.py \
part-b.pdf \
"document.pdf:21-50"

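The same pattern extends to more than one boundary. As a sketch, the hypothetical helper split_specs turns a list of boundary pages into the file.pdf:start-end arguments used above, one per output part. It only builds the argument strings; running merge.py once per part is left to the caller.

```python
def split_specs(pdf_name: str, page_count: int, boundaries: list[int]) -> list[str]:
    """Return one "name:start-end" spec per part; each boundary is the last page of a part."""
    specs: list[str] = []
    start = 1
    for end in [*sorted(boundaries), page_count]:
        # Rejects boundaries that are out of range, repeated, or equal to
        # the last page (which would leave an empty final part).
        if not start <= end <= page_count:
            raise ValueError(f"boundary {end} does not fit pages {start}..{page_count}")
        specs.append(f"{pdf_name}:{start}-{end}")
        start = end + 1
    return specs
```

For the 50-page example, split_specs("document.pdf", 50, [20]) produces the two specs shown in the commands above.
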
Handling encrypted PDFs

Encrypted PDFs require the user password to open. pypdf needs it before merging:

import pypdf
reader = pypdf.PdfReader("locked.pdf", password="the-password")
writer = pypdf.PdfWriter()
writer.append(reader)
# The output will NOT be encrypted by default
with open("unlocked.pdf", "wb") as f:
    writer.write(f)

If the merged output must keep the input's encryption, pypdf alone does not reliably support that; inform the user.

Preserving metadata

By default pypdf does not copy metadata from the input files to the merged output. To copy metadata from the first input file:

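import pypdf
reader = pypdf.PdfReader("first.pdf")
writer = pypdf.PdfWriter()
# ... append pages ...
# reader.metadata is None when the input has no document information
if reader.metadata is not None:
    writer.add_metadata(reader.metadata)
with open("merged.pdf", "wb") as f:
    writer.write(f)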
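
The first file's title is often wrong for the merged document. As a sketch, the hypothetical helper merged_info builds the dictionary to pass to writer.add_metadata: it starts from the copied metadata (which may be None) and applies overrides keyed by PDF Info names such as /Title and /Author.

```python
from typing import Mapping, Optional


def merged_info(
    base: Optional[Mapping[str, object]],
    overrides: Mapping[str, str],
) -> dict[str, object]:
    """Combine copied metadata with overrides; keys are Info names like "/Title"."""
    for key in overrides:
        if not key.startswith("/"):
            raise ValueError(f"metadata key {key!r} must start with '/'")
    # Treat missing metadata as empty, then let overrides win.
    info = dict(base or {})
    info.update(overrides)
    return info
```

A call such as writer.add_metadata(merged_info(reader.metadata, {"/Title": "Annual Report"})) would then keep the original author while replacing the title; the title text here is only an example.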

Common errors and fixes

| Error | Cause | Fix |
| --- | --- | --- |
| ModuleNotFoundError: No module named 'pypdf' | Library not installed | pip install pypdf |
| FileNotFoundError | Input file path wrong | Verify each path with ls |
| pypdf.errors.FileNotDecryptedError | Encrypted input | Provide password as shown above |
| Output PDF is larger than expected | Duplicate embedded fonts/images | Normal for merging unrelated PDFs; not fixable without re-distilling |
| Bookmarks missing in merged output | Source bookmarks not copied | Copy them explicitly with writer.add_outline_item |
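
Path problems are cheaper to catch before a merge starts. As a sketch, the hypothetical helper check_inputs reports missing files and files that do not look like PDFs. It expects plain paths, so any :range suffix must be removed first, and the %PDF- check is a rough heuristic rather than full validation.

```python
from pathlib import Path


def check_inputs(paths) -> list[str]:
    """Return a list of human-readable problems; an empty list means all inputs look usable."""
    problems: list[str] = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file():
            problems.append(f"{path}: file not found")
            continue
        with path.open("rb") as f:
            header = f.read(1024)
        # PDF files carry a "%PDF-" marker at (or very near) the start.
        if b"%PDF-" not in header:
            problems.append(f"{path}: no %PDF- header in the first 1024 bytes")
    return problems
```

Printing the returned list, or stopping when it is non-empty, gives the user every bad path at once instead of one FileNotFoundError at a time.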