ai-sandbox

Goal

I'd like to increase my velocity by creating a space where Claude Code could run unfettered, but safely. Initially this is for coding projects, but in the future I could imagine this being used for other agent-based OpenClaw-style workflows, like sending messages, booking appointments, or monitoring things.

[Claude] "OpenClaw" — is this a typo or a specific tool? If you mean OpenHands (formerly OpenDevin), that's a reasonable reference. Worth clarifying the roadmap: "coding first, then agentic workflows" is smart sequencing, but the identity/account setup you're planning (Gmail, iCloud, GitHub) is more load-bearing once you add agent actions like sending messages. Get that right upfront.

[Gemini] Phasing this starting with coding is a great way to "burn in" the security model before giving the agent more social/financial agency. Regarding "OpenClaw," if you're looking for a framework to build those future agentic workflows, LangGraph or PydanticAI might be worth keeping on your radar for the "monitoring" and "booking" phase.

Access

CAN read my calendar

CAN read my github repos

I would like to share my Claude Code token budget. Gemini not so important.

CANNOT access 1Password – not installed

CANNOT read my email

CANNOT read my files in iCloud and Google drive

[Claude] The "CANNOT" list is a good instinct. Worth being explicit about write access too — can it push commits? Open PRs? Post comments? The read/write distinction matters more than read/no-read once this is live.

[Gemini] For the "CANNOT" list, I'd also recommend blocking local network discovery (mDNS/Bonjour) and access to other devices on your LAN. You might also want to explicitly define a "Quarantine Zone" for downloads/build artifacts so they don't accidentally leak out of the VM via shared clipboards or folders.

Context

I have a M1 Mac studio with 1TB disk and 64GB and two monitors that will lend itself nicely to this setup. It's my dev machine, running current Tahoe with xcode on it. Occasionally I work from a M3 Mac laptop. I am very comfortable with ssh and command line tools. I prefer console tools to IDE's, for example my standard setup is ghostty with Claude Code and neovim. Chrome is my browser of choice.

For more context on me you can look up https://sef.kloninger.com/, especially https://sef.kloninger.com/tools

[Gemini] Given your preference for Ghostty and CLI tools, you'll likely want to ensure the VM's SSH daemon is locked down to only accept keys from your host machine. This keeps the "unfettered" agent inside the box while giving you a seamless entry point.

Questions

Q: Mac or Linux? Generally I've been impressed with the MacOS native desktop and keyboard-video-mouse sharing, so likely that's the answer.

[Claude] Mac makes sense here — you already have Apple Silicon hardware and MacOS tooling (Xcode, Homebrew, Ghostty). The main tradeoff: macOS VMs on Apple Silicon require a macOS license per instance and are heavier (~30–40GB). Linux VMs are lighter, cheaper to snapshot/clone, and easier to script. If the primary use is headless coding with Claude Code, a Linux VM (Debian/Ubuntu arm64) would actually work fine — Claude Code runs great on Linux and most dev toolchains do too. Worth asking: do you actually need macOS-specific things (Keychain, macOS SDK) in the sandbox?

[Gemini] If you plan to use Xcode or test macOS-specific apps, stay with macOS. However, if this is 90% "headless" terminal work, a Linux VM is significantly easier to "hardened" at the kernel level (using namespaces/cgroups) to prevent an agent from getting too creative with system resources. If you go macOS, look into the Virtualization.framework limits.

Q: What virtualization tool? - Heard good things about both UTM and tart - While predisposed to use vmware fusion, recent poor experiences with them say no.

[Claude] If you go macOS VM: tart is the right call. It's purpose-built for Apple Silicon macOS VMs, has a clean CLI, supports fast snapshot/restore, and is designed for exactly this kind of automated use. UTM is more general-purpose and better for Linux or Windows guests. If you go Linux: UTM or just Lima (very lightweight, no GUI overhead, great for headless dev environments). Lima is worth a look — limactl is dead simple and already popular in the Docker-on-Mac crowd.

[Gemini] Tart is the winner for macOS guests due to its tart clone and tart run CLI-first workflow, which fits your terminal-heavy style perfectly. If you decide on Linux, OrbStack is also worth a mention—it's incredibly fast and has a much smaller memory footprint than a full UTM VM, though it's less of a "hard sandbox" than a traditional VM.

Q: What's a good user name? I'm thinking sefk-robot since that's what will show up on emails and commit messages, so I want to signal that this is my responsibility, but not me actually talking. Other ideas: - sefk-ai - sefk-delegate (less scary) - not-sef

[Claude] Of these, sefk-bot (not listed but worth considering) or sefk-agent reads as clearly automated and professional. sefk-robot is fine but a bit playful. not-sef is cute but ambiguous to outsiders. sefk-delegate undersells the AI-ness in a way that could confuse collaborators. My vote: sefk-agent — signals agentic AI, clearly yours, neutral tone.

[Gemini] sefk-agent is the most descriptive. However, sefk-bot is the industry standard for git commits and service accounts. One benefit of sefk-bot is that many CI/CD tools and GitHub actions have built-in logic to handle [bot] suffixes differently (e.g., excluding them from certain metrics or notifications).

Plan

Create @gmail.com and log into Chrome with that. Not part of the Google Family for file/cost sharing

Create @icloud.com and log into the VM with that. Not part of Apple Family account.

Create @github.com with it's own ssh keypair. Explicitly add that to ACL's of private repos, there aren't that many.

[Claude] Solid isolation approach. A few additions worth considering: (1) For GitHub, also set up a fine-grained Personal Access Token scoped to specific repos rather than relying solely on SSH keys — gives you repo-level permission control and easy revocation. (2) Consider a separate Anthropic account or sub-account for the Claude Code token budget so usage is tracked separately and you can cap spend independently. (3) If/when this does email or calendar writes, the Gmail account being outside your Google Family is critical — good call to note that explicitly.

[Gemini] This "Clean Room" identity approach is excellent. For the GitHub account, I strongly recommend enabling Commit Signoff or GPG signing for the bot account so you can cryptographically verify which commits were generated in the sandbox vs. your main machine. Also, ensure the GitHub account has 2FA enabled but uses a separate TOTP seed (stored in your 1Password, but not the VM's) to prevent the agent from being able to reset its own credentials.