Active October 2025

AI Data Scrubber

A lightweight privacy-focused tool that removes personal information from text documents before uploading them to LLMs.

Python Privacy NLP spaCy CLI

What it does

Uses regex and spaCy’s named entity recognition to scrub sensitive data from text, replacing it with labelled placeholders like [NAME] and [EMAIL]. Handles:

  • Names
  • Email addresses
  • Phone numbers
  • Physical addresses and ZIP codes
  • URLs
  • US license plates
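The regex-plus-NER approach described above can be sketched roughly as follows. This is not the project's implementation. The regex patterns, the helper names (`regex_spans`, `ner_spans`, `apply_placeholders`, `scrub_sketch`) and the spaCy model `en_core_web_sm` are illustrative assumptions. The sketch covers only emails, URLs, US phone numbers, ZIP codes and names (via spaCy's `PERSON` entities). It collects character-offset spans from both sources on the original text, drops overlapping matches, then substitutes placeholders in a single left-to-right pass, so one replacement never shifts the offsets of another.

```python
import re
from functools import lru_cache

# Hypothetical patterns for illustration only; the project's real patterns are not shown.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Final character excludes trailing punctuation such as a sentence-ending period.
    "URL": re.compile(r"\bhttps?://[^\s]*[^\s.,;:!?)]"),
    "PHONE": re.compile(r"(?<!\w)(?:\+1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}

Span = tuple[int, int, str]  # (start_char, end_char, placeholder label)


def regex_spans(text: str) -> list[Span]:
    """Find every regex match and record its offsets and label."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return spans


@lru_cache(maxsize=1)
def _load_nlp():
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy
    return spacy.load("en_core_web_sm")


def ner_spans(text: str) -> list[Span]:
    """Use spaCy NER to find person names, labelled NAME."""
    doc = _load_nlp()(text)
    return [(e.start_char, e.end_char, "NAME") for e in doc.ents if e.label_ == "PERSON"]


def apply_placeholders(text: str, spans: list[Span]) -> str:
    """Replace non-overlapping spans with [LABEL], preferring the longest match at each start."""
    kept: list[Span] = []
    for start, end, label in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if kept and start < kept[-1][1]:
            continue  # overlaps a span we already kept
        kept.append((start, end, label))
    out, pos = [], 0
    for start, end, label in kept:
        out.append(text[pos:start])
        out.append(f"[{label}]")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def scrub_sketch(text: str, use_ner: bool = True) -> str:
    """Hypothetical end-to-end scrubber: regex matches plus (optionally) NER names."""
    spans = regex_spans(text)
    if use_ner:
        spans += ner_spans(text)
    return apply_placeholders(text, spans)
```

Calling `scrub_sketch(text, use_ner=False)` runs only the regex stage, which needs no spaCy model. Simple patterns such as the five-digit ZIP rule will also match unrelated numbers, which is one reason manual review is still needed.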

Usage

Available as both a CLI tool and a Python API:

from ai_data_scrubber import scrub_text

clean = scrub_text("Send a message to Jane Smith at jane@example.com")
# "Send a message to [NAME] at [EMAIL]"

Note: No automated tool catches everything. Always review the scrubbed output manually before uploading sensitive documents to an LLM.