My Claude Code Setup: Skills, CLIs, and MCPs
AI is crossing a threshold. It’s going from “helpful chat assistant” to “coworker that actually does things.”
This is happening because the key components are converging: reasoning, memory, tools, skills, permissioning, agents. Each has been improving separately for years. Together, they hit a tipping point.
I’m a product leader at HG Insights, a data/AI company. Over the past few months, I’ve built a Claude Code setup that hit that threshold. My work now splits into two modes: meetings, and working with Claude on everything else.
There’s a lot of noise in this space. People exaggerate what they’ve built because the hype makes it tempting. This post is the opposite: a real setup, warts and all. If you have your own, or tips on what I could do better, I’d love to hear it.
CLAUDE.md: The Rules File
CLAUDE.md loads into every Claude Code session automatically. Mine breaks down into four sections: philosophy, interaction design, behavioral corrections, and CLI reference.
Philosophy: Collaborator, Not Autonomous Agent
I work with Claude, not through it. It’s not an agent I send off to do things. It works step by step with me, presents options, and waits for my call. Here’s what that looks like in the file:
- Work step by step WITH Sam, not ahead of him. Don't go off on long solo explorations or try to solve everything at once. Take one step, check in, take the next.
### Decision-making: ASK before acting
When facing a design decision, a tradeoff, or when something doesn't work as expected:
- STOP and ask Sam before choosing an approach. Don't pick a direction and run with it.
- Present the situation clearly: what's the problem, what are the options, what are the tradeoffs.
- Never decide to remove functionality or downgrade the experience without asking. If something seems hard, that's a reason to discuss, not a reason to cut scope.
### Chat style
Don't be overly pleasing and apologetic. Be more like a friend and business partner. Not a subordinate that says sorry all the time.
A Claude that’s trying to please you will agree with bad ideas (sycophancy benchmarks measure exactly this: how often models accept nonsense instead of pushing back). A Claude that’s acting like a peer will push back.
Interaction Design: Small UX Choices That Compound
These are rules that make the back-and-forth faster:
### Asking questions
ALWAYS use numbered questions with lettered options. Makes it easy for Sam to respond like `1a 2b 3c`.
### Quick decision format
When presenting tasks/options for Sam to triage:
- Show ALL items at once in a table
- Unique number per item (no duplicates)
- Include suggested action
- Sam responds like: `1y 2y 3 delete` or `1. y 2. MadKudu 3. n`
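Part of why this shorthand works is that it is trivially machine-parseable. A minimal sketch of turning a response into decisions (a hypothetical helper, not part of the actual setup; decisions containing digits would need a smarter grammar):

```python
import re

def parse_decisions(response: str) -> dict[int, str]:
    """Turn shorthand like '1y 2y 3 delete' or '1. y 2. MadKudu 3. n'
    into {item_number: decision}."""
    decisions = {}
    # item number, optional dot, then the decision text up to the
    # next whitespace-separated item number (or the end of the string)
    for m in re.finditer(r"(\d+)\.?\s*(\w[\w ]*?)(?=\s+\d|$)", response.strip()):
        decisions[int(m.group(1))] = m.group(2).strip()
    return decisions
```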
### Collaborative editing
When we need to review/edit structured data together:
1. Create temp file at `/tmp/<descriptive-name>.md`
2. Use sectioned format with headers per item
3. Open in editor
4. Wait for Sam to edit and say "done"
5. Parse and execute changes
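The temp-file roundtrip above is simple to sketch. Assuming a one-header-per-item format (hypothetical helpers for illustration, not the actual implementation):

```python
from pathlib import Path

def write_review_file(items: dict[str, str], path: str) -> None:
    """Write items as a sectioned markdown file, one '##' header per item."""
    sections = [f"## {name}\n\n{body}\n" for name, body in items.items()]
    Path(path).write_text("\n".join(sections))

def read_review_file(path: str) -> dict[str, str]:
    """Parse the edited file back into {item: body}. Bodies that contain
    their own '## ' lines would need escaping; this sketch ignores that."""
    items, current, buf = {}, None, []
    for line in Path(path).read_text().splitlines():
        if line.startswith("## "):
            if current is not None:
                items[current] = "\n".join(buf).strip()
            current, buf = line[3:].strip(), []
        else:
            buf.append(line)
    if current is not None:
        items[current] = "\n".join(buf).strip()
    return items
```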
### Drafting messages on Sam's behalf
- Always write to a temp file and open for review first. Never post comments, emails, or messages directly.
### Writing guidelines
Don't use the "—" character. It shows the text has been generated by an AI.
Battle Scars: Behavioral Corrections
Rules I added after things went wrong. Each one came from a specific incident.
### Behavior
Claude has a pattern of making confident assumptions that turn out wrong. When dealing with destructive operations, file systems, or tools Claude hasn't verified recently, Claude MUST say 'I'm not certain about X, let me verify' rather than proceeding. If Claude catches itself about to say 'this will...' or 'this should...', that's a signal to stop and ask.
When something doesn't work as expected: STOP. Say "I don't know why." Ask Sam to help verify. Never theorize about a cause and then code a fix based on that theory. The pattern to break: unknown -> theory -> treat theory as fact -> code solution. The correct pattern: unknown -> "I don't know" -> ask Sam to help verify -> understand together -> then act.
Never suppress errors when exploring or debugging. No `2>/dev/null`, no `|| true` on commands you haven't verified work. Errors are information. Hiding them just delays diagnosis.
### Infrastructure and deployments
Never assume how deployments work. Don't say "X should be deploying now" unless you've verified the deployment mechanism.
I add a new one every few weeks. The file only grows, and that’s fine.
CLI Reference
The longest section: every CLI with its path, how to run it, and key commands. Here’s a sample:
### gmail-cli
Path: `~/Code/CLIs/gmail-cli`
Run: `uv run gmail-cli <command> --account hginsights`
gmail-cli list # recent emails
gmail-cli read <msg_id> # full thread
gmail-cli search "query" # search emails
gmail-cli draft create -t "to@email.com" -s "Subject" -b "Body"
### jira-cli
Path: `~/Code/CLIs/jira-cli`
Run: `uv run python jira_cli.py <project> <command>`
jira mitb create "Summary" -d "Description"
jira cpf view CPF-100
jira mitb status MITB-123 p1
Mine has 20+ CLIs documented like this. Claude can look up any tool mid-session without guessing.
Skills: Encoding How You Work
A skill is a markdown file that teaches Claude a specific workflow. It fires automatically when Claude matches the situation to the skill’s description, or manually via a slash command.
I’ve built 60+ over the past few months. Here’s a snapshot:
- Customer Intelligence
- Product & Engineering
- People & Planning
- Communication
- Data & Research
- Content & Presentations
- Daily Operations
- Knowledge Management
- Learning & Media
- Personal
You wouldn’t have these exact skills, but you’d have your own. A recruiter might have screen-resume, prep-debrief, draft-offer-letter. An engineer might have review-pr, write-migration, debug-production. The pattern: take a workflow you repeat, write it down, let Claude follow it.
Here’s what one looks like. This is my daily-planning skill:
---
name: daily-planning
description: Creates and manages daily plans with calendar, todos, and carry-over tracking. Use when Sam asks to plan his day, review todos, or start a new day.
---
# Daily Planning
## Creating a daily plan
This is a **collaborative** process. Don't generate a finished plan and present it. Work through it with Sam.
1. **Gather context** (do in parallel):
- Check previous day's plan for uncompleted todos
- Get calendar: `gcal-cli -v today`
- Get today's tasks + overdue from Todoist
2. **Present the raw ingredients** to Sam
3. **Discuss priorities together.** Ask what matters most today.
4. **Top 3**: Only set after discussion. Come from Sam, not the task list.
5. **Draft the plan**, let Sam review/edit
6. Only send to Remarkable after Sam approves
Frontmatter tells Claude when to trigger it. The rest is just steps.
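The split is mechanical: everything between the first two `---` lines is frontmatter, the rest is the skill body. A rough sketch of how such a file could be parsed (illustrative only, not how Claude Code does it internally):

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter fields, markdown body).
    Frontmatter is everything between the first two '---' lines."""
    _, front, body = text.split("---", 2)
    fields = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields, body.strip()
```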
The skill that keeps the system current: codify-learnings. None of these 60 skills were planned upfront. They came from doing real work with Claude, noticing a pattern worth keeping, and capturing it. At the end of a session, I say “codify what we learned.” Claude reviews the conversation, updates existing skills, or creates new ones:
---
name: codifying-learnings
description: Reviews the current conversation to identify learnings and update skills, CLAUDE.md, or project notes.
---
# Codifying Learnings
### 1. Analyze the Conversation
Look for:
- Corrections Sam made ("no, do it this way")
- Preferences expressed ("I prefer X", "always do Y")
- Patterns in how Sam works
- Mistakes I made that could be prevented
### 2. Categorize Learnings
**Skill updates** - New capabilities, refined workflows
**CLAUDE.md updates** - General preferences, behaviors, rules
**Project notes** - Project-specific context
### 3. Propose Changes
For each learning:
1. State what I learned
2. Show the proposed update
3. Ask Sam to confirm before writing
Example: I was debugging a deployment issue and discovered our CI pipeline doesn’t auto-deploy on merge. At the end of the session, codify-learnings added this to CLAUDE.md:
Never assume how deployments work. Don't say "X should be deploying now" unless you've verified the deployment mechanism.
Every future session knows this. The next time I do that work, Claude will handle more of it on its own. The skills don’t drift out of date because using them triggers their own maintenance.
Tools: CLIs and MCPs
CLIs and MCPs both give Claude access to external systems. CLIs are command-line tools Claude runs via bash. MCPs are live server connections with a standardized protocol. From Claude’s perspective, they’re both just “tools I can call.”
I’ve built 30+ CLIs and use a handful of MCPs. Here’s a snapshot:
- Communication
- Calendar & Tasks
- Documents & Content
- Data & Analytics
- Product & Project
- Knowledge & Memory
- Research
- Remarkable
- Personal
- Utilities
Why mostly CLIs instead of MCPs? Two reasons. First, a CLI’s output is shaped for the workflow. A generic Slack API returns raw JSON with user IDs, timestamps, and nested structures. My slack-cli returns formatted messages with usernames resolved, threads expanded, and output sized for context windows. Each CLI is built around how Claude actually needs to consume the data, which means fewer tokens wasted on parsing.
Second, CLIs are dramatically easier to build and iterate on. A CLI is a Python script. You run it, see the output, fix it, run it again. An MCP server means running a local server, testing with an MCP inspector, managing connections. Creating, debugging, and optimizing a CLI takes minutes. I can ask Claude to build one in a session and have it working immediately. All built with Python + uv, living in ~/Code/CLIs/.
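To make the output-shaping point concrete, here is the core of what a hypothetical message-formatting CLI might do. The function name and JSON shape are invented for illustration; the real scripts would wrap something like this in argparse and run under uv:

```python
def format_messages(raw: list[dict], users: dict[str, str], limit: int = 20) -> str:
    """Shape raw API-style messages for an LLM: resolve user IDs to
    names, flatten the nested structure into one line per message, and
    cap the output so it fits comfortably in a context window."""
    lines = []
    for msg in raw[:limit]:
        who = users.get(msg["user"], msg["user"])  # ID -> username, fall back to ID
        lines.append(f"[{msg['ts']}] {who}: {msg['text']}")
    return "\n".join(lines)
```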
This setup does affect tool choices. I moved from Bear to Obsidian specifically because Obsidian stores everything as plain markdown files that Claude can read and write directly, and Obsidian Sync gives me version history if something goes wrong. Bear stores notes in a SQLite database, which is harder for an LLM to work with and riskier when things break.
Memory: Persistent Context
You want the agent to have the right context to do a good job, but you don’t want memory polluted by things that are irrelevant or wrong. The problem with most AI memory is that it’s a black box: the model memorizes things that might or might not be true, and you can’t see or edit what it knows.
What works for me: a dedicated source of truth per memory type. Each type of information lives in a system I already use and can see, edit, and share.
| What Claude needs to remember | Where it lives | How Claude accesses it |
|---|---|---|
| ⚙️ Rules and preferences | CLAUDE.md | Loaded automatically every session |
| 🕐 What it worked on before | memsearch (daily summaries) | Searches past session history |
| 📁 Projects, notes, reference | PARA (Obsidian) | Semantic search across all files |
| 👥 People and relationships | Personal CRM | Looks up contacts, meeting history, context |
| ✅ Tasks and priorities | Todoist | Reads/writes tasks, checks deadlines |
| 📚 Books and reading | Bookshelf CLI | Library of read/in-progress/want-to-read |
The key: every one of these is a system I already use directly. I read and edit the same project notes, the same CRM, the same task list. When Claude writes a memory, I can review it, fix it, or delete it. There’s no hidden memory store I can’t inspect. This is what keeps the signal-to-noise ratio high.
PARA deserves a brief explanation. It’s a knowledge management structure (Projects, Areas, Resources, Archives) stored as plain markdown in Obsidian, synced across devices. Claude searches it with qmd, a semantic search tool that indexes every file. Project notes, meeting summaries, customer briefs, and personal reference material are all one search away.
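I can’t show qmd’s internals, but the general shape of semantic search over a markdown vault is: vectorize every file, vectorize the query, rank by cosine similarity. A dependency-free sketch that substitutes bag-of-words counts for real embedding vectors:

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Stand-in for an embedding model: crude word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, docs: dict[str, str], top_k: int = 3) -> list[str]:
    """Rank document names by similarity to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda name: cosine(q, vectorize(docs[name])), reverse=True)
    return ranked[:top_k]
```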
Agents: Delegation
An agent is a Claude Code subprocess with its own context window, tools, and instructions. Two reasons to use them:
Parallelizing big jobs. If I need to review 50 customer accounts before a QBR, I don’t do them one at a time. I launch a team of agents. Each agent takes one account, reads the brief from PARA, pulls Salesforce data, checks Slack and email activity, scores against renewal risk factors, and produces a summary. Ten agents run in parallel, each on its own account, until all 50 are done. The results get aggregated into a single CSV.
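The fan-out/aggregate pattern itself is simple. A sketch using a thread pool with a stubbed-out reviewer (everything here is illustrative; in the real setup each worker would launch a Claude Code subprocess rather than call a local function):

```python
import csv, io
from concurrent.futures import ThreadPoolExecutor

def review_account(account: str) -> dict:
    """Stand-in for one sub-agent. The real version would read the PARA
    brief, pull Salesforce data, check Slack/email, and score risk."""
    return {"account": account, "risk": "low", "summary": f"{account}: healthy usage"}

def review_all(accounts: list[str], workers: int = 10) -> str:
    """Fan out one worker per account, then aggregate results into CSV."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(review_account, accounts))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["account", "risk", "summary"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```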
Protecting context. Every token the main agent spends on research is a token it can’t spend on the actual work. If I want to explore a codebase, research a topic, or gather background on a customer, I launch a sub-agent to do that and return just the results I need. The main conversation stays clean and focused.
I have six custom agents defined: documentation-writer, fastapi-tester, playwright-user-tester, jira-issue-summarizer, project-dashboard-updater, and skill-builder.
In my experience, agent swarms aren’t ready for general productivity yet. The QBR account review is one of the few cases where they consistently work well. For most day-to-day work, a single agent with the right skills goes a long way.
Real Workflows
These show how the pieces chain together. Each one is specific to my work, but the pattern underneath is generic.
Tickets and PRDs from raw feedback
I paste a customer email, Slack thread, or support ticket. Claude uses a skill to break it down, checks Jira for existing tickets on the same topic, drafts a well-structured ticket with the right custom fields (priority, impacted area, account type), and if needed, expands it into a full PRD by pulling context from Confluence.
The pattern: unstructured input in, structured output out, with system lookups to avoid duplicates and add context.
Sprint planning and board grooming
“Let’s plan the sprint.” jira-cli pulls open tickets from our boards filtered to my impacted areas. Claude triages: what’s stale, what’s blocked, what’s highest priority. We walk through it together. Claude presents numbered items, I respond with quick decisions (y/n/move/defer). Tickets get updated, new ones get created, and the board reflects reality.
The pattern: pull current state from system, present for human decision, execute the changes. The human decides, Claude does the clicking.
Email triage
“Process my inbox.” gmail-cli pulls recent emails. For each one, email-context-cli gathers the full thread, CRM history on the sender, and a web search for context. Claude drafts responses, flags what needs my personal attention, and queues the rest. 20 emails triaged in one session.
The pattern: batch process with enrichment per item, human reviews before anything gets sent.
Morning planning
“Plan my day.” The daily-planning skill fires: gcal-cli pulls today’s calendar, todoist-cli pulls tasks and deadlines, yesterday’s unfinished items carry over with a flag. Claude produces a structured plan with time blocks, suggesting when to do deep work based on meeting gaps.
The pattern: aggregate multiple sources into one view, suggest prioritization, write the output somewhere persistent.
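The one genuinely algorithmic step here is suggesting deep-work slots from meeting gaps. A sketch of that piece, with invented helper names and a 9-to-5 default:

```python
from datetime import datetime, timedelta

def deep_work_blocks(meetings: list[tuple[str, str]],
                     day_start: str = "09:00", day_end: str = "17:00",
                     min_minutes: int = 60) -> list[tuple[str, str]]:
    """Suggest deep-work slots: gaps of at least min_minutes between
    meetings. Meetings are (start, end) 'HH:MM' pairs, assumed sorted."""
    fmt = "%H:%M"
    def t(s: str) -> datetime:
        return datetime.strptime(s, fmt)
    blocks, cursor = [], t(day_start)
    for start, end in meetings:
        if t(start) - cursor >= timedelta(minutes=min_minutes):
            blocks.append((cursor.strftime(fmt), start))
        cursor = max(cursor, t(end))  # handle overlapping meetings
    if t(day_end) - cursor >= timedelta(minutes=min_minutes):
        blocks.append((cursor.strftime(fmt), day_end))
    return blocks
```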
Where This Is Going
This setup wasn’t planned or built all at once. It grew over months as skills and CLIs got added one at a time, each one because I hit the same workflow again and thought “Claude should know how to do this by now.”
What I’ve learned:
Skills are the highest-leverage investment. Teach Claude a workflow once, benefit every time it comes up. Skills compound: as you build more, Claude can chain them together in ways you didn’t plan for. A skill that reads your calendar plus a skill that checks your CRM means Claude can prep for a meeting by pulling both automatically.
CLAUDE.md guardrails prevent more disasters than you’d expect. Without them, Claude is confident and wrong at inconvenient times. The behavioral corrections I’ve accumulated are battle scars. Each one represents a real mistake that won’t happen again.
CLIs beat copy-paste. The difference between “Claude tells you what to do” and “Claude does the thing” is enormous. When Claude can actually create the Jira ticket, draft the email, update the calendar event, the loop from decision to action shrinks from minutes to seconds.
The “stop and ask” pattern changed everything. I don’t want an autonomous agent that goes off and does things without checking. I want a collaborator that works step by step with me, presents options, and waits for my call. This is a CLAUDE.md rule, and it’s the most important one.
Pick tools that are AI-friendly. Strong APIs, everything programmable, version history for when things go wrong. This affects every tool choice: note-taking app, project management, docs platform. If your tool doesn’t have a good API, it’s invisible to Claude.
What’s still rough:
Mobile. This setup is desktop-only. I have a Mac Mini that’s always on, accessible via Happy Coder and tmux over Tailscale, but it’s not a great experience. Copy-pasting images is painful, and I haven’t found it productive.

I experimented with OpenClaw, which lets you control your computer through messaging apps like WhatsApp or Telegram. Two things stopped me. First, security: it has full access to your machine and consumes external content (emails, web pages, documents), which makes it a prompt injection target. This isn’t theoretical. Malicious skills have been found exfiltrating data silently, and multiple companies have published analyses of the attack surface. Second, permissioning: since the interaction happens through messages, there’s no proper permission flow. It can ask for permission, but approving actions one at a time through text messages is clunky compared to Claude Code’s terminal UI.
Scaling as a team. This is one person’s setup. The real frontier is making it work across a team and an organization. Skills are powerful because they’re personal, but what happens when you need shared team skills? Who owns them? How do you version them? CLIs multiply fast: how do you keep up with the pace at which they evolve, handle deployment, and manage the lifecycle? How do you keep the setup decentralized and personal while also building shared tools that everyone uses? Permissioning is another challenge: everyone needs to configure their own, and that takes time. These are open questions I’m actively working through.
The path forward:
The real unlock isn’t one person’s setup. It’s when this scales across an organization:
- Clean data capture. Every call recorded, every email thread captured, Jira and Confluence kept clean. AI can’t work with data it can’t access. I’m investing heavily here.
- Technical users build CLIs and integrations. Permissioned access to internal systems, composable building blocks that anyone can use.
- Business users get a simpler entry point. Not everyone needs Claude Code. I’m evaluating Claude Coworker as a middle ground: non-technical users can reuse the same skills and CLIs through a more accessible interface, without managing a terminal setup.
- Skills and CLIs get a real lifecycle. Shared vs. personal, versioning, deployment across an org. This is the unsolved piece.
What’s your setup? I’d genuinely like to know. What skills have you built? What tools are you connecting? What would you do differently? If anything here sparked an idea or you’ve solved a problem I’m still stuck on, I’d love to hear about it. Reach out on LinkedIn.