Engineering

Watermarking PDFs without breaking the reading experience

Invisible stamps, visible stamps, and the rendering edge cases that tripped us up before we rewrote the stamper engine.

Julian Park · 8 min read

PDF watermarking sounds simple. You have a PDF, you have some text or an image, and you stamp one onto the other. In practice, PDF is one of the messiest file formats ever specified, and stamping files reliably across every edge case is much harder than it looks.

The two watermark modes

Content Vault supports two types of watermarks: visible and invisible (forensic). They serve different purposes and have different implementation challenges.

Visible watermarks are for deterrence: a line such as 'Licensed to user@example.com' appears on the page, which discourages casual sharing. Forensic watermarks are embedded in the PDF's byte structure and survive most text extraction attempts, which makes them useful for tracing a file that surfaces somewhere it shouldn't.

The rendering engine problem

Our first watermarking implementation used a popular open-source PDF library. It worked well for simple PDFs — text-heavy documents, basic layouts. It failed badly on PDFs with:

  • Embedded Type 1 fonts or CIDFonts (common in older print-optimized PDFs)
  • PDF/A compliance mode (frequently used in academic publishing)
  • Transparency groups with complex blending modes
  • Interactive form fields that needed to remain functional
  • PDFs created by certain versions of InDesign with custom color profiles
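
Since these features broke the first engine, a cheap byte-level triage could flag risky files for closer handling before stamping. The sketch below is hypothetical (the `preflight` function and its pattern set are illustrative, not part of Content Vault) and only a heuristic: it searches raw bytes, so dictionaries stored inside compressed object streams (PDF 1.5 and later) go unseen.

```python
import re

# Heuristic byte patterns for the risky features listed above (hypothetical).
RISK_PATTERNS = {
    "type1_font": re.compile(rb"/Subtype\s*/Type1\b"),
    "cid_font": re.compile(rb"/Subtype\s*/CIDFontType[02]\b"),
    "pdfa": re.compile(rb"pdfaid:part"),  # PDF/A identification in XMP metadata
    "transparency_group": re.compile(rb"/S\s*/Transparency\b"),
    # Any named blend mode other than the default Normal/Compatible
    "blend_mode": re.compile(rb"/BM\s*/(?!Normal\b|Compatible\b)\w+"),
    "form_fields": re.compile(rb"/AcroForm\b"),
}


def preflight(data: bytes) -> set[str]:
    """Return the names of risky features whose markers appear in the raw bytes."""
    found = {name for name, pattern in RISK_PATTERNS.items() if pattern.search(data)}
    # InDesign output is only flagged when it also uses ICC-based color spaces.
    if b"Adobe InDesign" in data and re.search(rb"/ICCBased\b", data):
        found.add("indesign_icc")
    return found
```

A file that returns an empty set is not guaranteed to be safe; the check only decides which files deserve extra scrutiny.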

We spent three months chasing rendering bugs before deciding to rewrite the stamper engine from scratch using a lower-level PDF manipulation library that gave us direct access to the content stream.

The rewrite

The new engine treats watermark placement as a PDF transformation problem, not a rendering problem. Instead of rasterizing the PDF and stamping onto pixels, we inject the watermark directly into each page's content stream as a PDF graphics operator sequence.

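```python
# Simplified content stream injection for a visible watermark.
# Hypothetical resource names: the font and an ExtGState whose /ca entry sets
# the fill opacity must already be registered in the page's /Resources.
WATERMARK_FONT = "WmFont"
WATERMARK_GSTATE = "WmAlpha"
FONT_SIZE = 48

def escape_pdf_text(text: str) -> str:
    # Backslashes and parentheses must be escaped inside a PDF literal string
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def inject_watermark_to_page(page, text: str, gray: float = 0.5):
    content = b"q\n"  # save graphics state
    # Opacity comes from the ExtGState; `g` only sets the gray fill level
    content += f"/{WATERMARK_GSTATE} gs\n{gray} g\n".encode()
    # cm is not allowed inside BT/ET: translate, then rotate 45° (cos sin -sin cos)
    content += b"1 0 0 1 200 400 cm\n"
    content += b"0.7071 0.7071 -0.7071 0.7071 0 0 cm\n"
    # Assumes ASCII text in a simple (non-CID) font
    content += f"BT\n/{WATERMARK_FONT} {FONT_SIZE} Tf\n".encode()
    content += f"({escape_pdf_text(text)}) Tj\nET\n".encode()
    content += b"Q\n"  # restore graphics state
    # Hypothetical page API; prepending paints the mark beneath the page content
    page.prepend_to_content_stream(content)
```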

This approach means the watermark is preserved through printing, extraction, and re-saving. It also leaves the document's existing font embedding and color profiles untouched.
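
The simplified snippet hard-codes a translation of (200, 400) and a 45° rotation, which only suits one page size. Opacity in PDF also comes from an ExtGState dictionary's `/ca` (fill) and `/CA` (stroke) entries, selected with the `gs` operator, rather than from `g`, which sets a gray level. Below is a minimal sketch of both pieces. `centered_rotation_cm` and `ext_gstate_dict` are hypothetical helpers; the caller is assumed to measure `text_width` from the font's metrics and to register the dictionary in the page's `/Resources`.

```python
import math


def _num(x: float) -> str:
    # Format a PDF real with 4 decimals; adding 0.0 turns -0.0 into 0.0.
    return f"{x + 0.0:.4f}"


def centered_rotation_cm(media_box, angle_deg: float = 45.0, text_width: float = 0.0) -> str:
    """Return a single `cm` operator that rotates text and centers its baseline on the page.

    media_box: [x0, y0, x1, y1] taken from the page's /MediaBox.
    text_width: rendered width of the string in points, measured by the caller.
    """
    x0, y0, x1, y1 = media_box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    # Text runs from the origin along (c, s); step back half its width so the
    # midpoint of the baseline lands on the page center.
    tx, ty = cx - (text_width / 2) * c, cy - (text_width / 2) * s
    # Rotation followed by translation, folded into one matrix [a b c d e f].
    return " ".join(_num(v) for v in (c, s, -s, c, tx, ty)) + " cm"


def ext_gstate_dict(opacity: float) -> bytes:
    """Serialize an ExtGState that sets both fill (/ca) and stroke (/CA) alpha."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return f"<< /Type /ExtGState /ca {opacity:g} /CA {opacity:g} >>".encode()
```

For a US Letter page, `centered_rotation_cm([0, 0, 612, 792])` yields a 45° rotation anchored at (306, 396). Vertical centering on the glyphs would additionally need the font's ascent and descent.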

Lessons for merchants

If you're uploading PDFs to Content Vault for watermarking, a few things to know:

  • Color PDFs watermark more successfully than pure black-and-white PDFs, because of contrast issues.
  • Files under 50 MB are processed in real time.
  • Files over 50 MB are processed asynchronously, and the download link is emailed to the subscriber.
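
The size cutoff amounts to a simple routing rule. A minimal sketch, assuming 1 MB means 1024 × 1024 bytes and that a file of exactly 50 MB takes the asynchronous path (the post doesn't say either way); `processing_mode` and its return values are hypothetical names.

```python
# Hypothetical threshold: 50 MB interpreted as 50 * 1024 * 1024 bytes.
SYNC_LIMIT_BYTES = 50 * 1024 * 1024


def processing_mode(size_bytes: int) -> str:
    """Decide whether a watermark job runs inline or is queued with an emailed link."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < SYNC_LIMIT_BYTES:
        return "realtime"      # stamp and return the download immediately
    return "async_email"       # queue the job; email the download link when done
```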

“We tested 4,200 real PDFs from merchant uploads to validate the new engine. 99.1% passed without manual intervention.”

— Content Vault engineering team

Julian Park

Community Manager · Content Vault

Julian Park is Community Manager at Content Vault and writes about the operator perspective on digital subscriptions.
