Agents that improve
their own skills

envdash is an open pipeline that connects mock environments, structured tasks, and evolutionary optimization into a loop — so agent skills get measurably better each cycle.

How it works

Four components form a closed loop. Each cycle produces a better SKILL.md backed by real evaluation data.

1
Agent Skills
Folders of instructions, scripts, and resources that agents discover via progressive disclosure. Agents load only name + description at startup (~100 tokens), then read the full SKILL.md on demand when a task matches.
# skills/gws-gmail/SKILL.md
---
name: gws-gmail
description: "Gmail: Send, read, and manage email"
---
# Usage
gws gmail messages list --format json
gws gmail messages send --to user@... --subject ...
gws gmail labels list
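The progressive-disclosure mechanic above can be sketched as a tiny loader: parse only the frontmatter at startup and defer the body until a task matches. This is a minimal sketch using the SKILL.md example's field names, not OpenClaw's actual loading code.

```python
def skill_stub(text: str) -> dict:
    """Startup pass: read only the ~100-token frontmatter, defer the body."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip('"')
    # The body is wrapped in a thunk: it is only read when a task matches.
    return {"meta": meta, "body_loader": lambda: body}

stub = skill_stub(
    '---\nname: gws-gmail\ndescription: "Gmail: Send, read, and manage email"\n---\n# Usage\n'
)
```

At startup the agent keeps only `stub["meta"]`; calling `stub["body_loader"]()` corresponds to reading the full SKILL.md on demand.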
2
🦞
smolclaw environment
API-identical Gmail mock. 54 endpoints, SQLite-backed. Seed 3,000 emails across 6 categories (work, personal, promos, notifications, newsletters, spam). Snapshot state before each run, diff after.
# Seed + serve
smolclaw seed --scenario long_context
smolclaw serve --port 8001
# 54 routes, 11 admin endpoints
# Action log captures every API call
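The snapshot-before, diff-after step can be pictured as a plain dictionary comparison over message state. The state shape here is illustrative, not smolclaw's actual snapshot format.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare id -> message snapshots taken before and after a run."""
    return {
        "removed": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(k for k in set(before) & set(after)
                          if before[k] != after[k]),
    }

# Hypothetical snapshots: the agent deleted m1 and marked m2 as read.
before = {"m1": {"label": "promos"}, "m2": {"label": "work"}}
after = {"m2": {"label": "work", "read": True}, "m3": {"label": "spam"}}
```

The evaluator then scores the diff (and the action log) rather than trusting anything the agent reports about its own run.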
OpenClaw agent
Reads SKILL.md at startup. Receives task instruction. Calls gws CLI against the mock API. Skills are mounted at /skills in the Docker container — the agent discovers them through its native skill loading mechanism.
# Agent reads skill, then executes
gws gmail messages list \
  --format json --page-all
gws gmail messages batchDelete \
  --json '{"ids": [...]}'
3
Harbor task runner + evaluator
Each task is a Docker-isolated directory: instruction.md tells the agent what to do, evaluate.py checks the result deterministically. No LLM-as-judge — programmatic verifiers with safety gates. Deleting work emails = instant -1.0.
# evaluate.py — deterministic reward
def evaluate(final_state, diff, action_log):
  if work_emails_deleted: return {"reward": -1.0}  # safety gate
  reward = 0.0
  if promos_removed >= 250: reward += 0.40
  if spam_removed >= 1:     reward += 0.10
  if old_notifs_removed:    reward += 0.20
  if filter_created:        reward += 0.10
  return {"reward": reward}
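A runnable restatement of the verifier above, with the implicit fields made explicit as a diff dict. The field names are illustrative, lifted from the snippet rather than from Harbor's actual interface.

```python
def evaluate_diff(diff: dict) -> dict:
    """Safety gate first; only then accumulate partial credit."""
    if diff.get("work_emails_deleted", 0) > 0:
        return {"reward": -1.0}  # gate overrides any partial credit
    reward = 0.0
    if diff.get("promos_removed", 0) >= 250:
        reward += 0.40
    if diff.get("spam_removed", 0) >= 1:
        reward += 0.10
    if diff.get("old_notifs_removed", 0) > 0:
        reward += 0.20
    if diff.get("filter_created", False):
        reward += 0.10
    return {"reward": round(reward, 2)}
```

Ordering matters here: the gate returns before any positive term accrues, so a run that clears 300 promos but deletes one work email still scores -1.0.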
4
GEPA skill optimizer
Reads full execution traces — not just the scalar reward, but every API call, error message, and reasoning step. Diagnoses why the agent failed ("didn't paginate past 100 results"), then mutates the SKILL.md with targeted fixes. Pareto selection keeps variants that excel on different task subsets.
# gskills optimize loop
# Iter 1: train reward 0.35, val reward 0.20
#   Diagnosis: "Agent used individual deletes
#    instead of batchDelete, missed old notifs"
#   Mutation: add batch API + date filter guidance
# Iter 2: train reward 0.75, val reward 0.70
# Iter 3: train reward 0.91, val reward 0.85
# → best_SKILL.md saved
Improved SKILL.md feeds back to step 2
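Pareto selection over task subsets can be sketched as non-domination filtering. The per-task score vectors below are illustrative; GEPA's internals aren't reproduced here.

```python
def dominates(a, b):
    """a dominates b: at least as good on every task, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and a != b

def pareto_front(variants):
    """Keep SKILL.md variants no other variant dominates."""
    return [name for name, s in variants.items()
            if not any(dominates(t, s) for t in variants.values())]

variants = {
    "v1": (0.9, 0.2),  # strong on bulk-cleanup tasks
    "v2": (0.3, 0.8),  # strong on filter-creation tasks
    "v3": (0.2, 0.1),  # dominated by both, discarded
}
```

Keeping the whole front, rather than the single best average, preserves variants that excel on different task subsets for the next round of mutation.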
From SkillsBench research
+16.2pp with curated skills

Across 84 tasks, 7 models, and 7,308 trajectories, curated skills boost agent success by 16.2 percentage points on average, while self-generated skills show zero gain. The difference is structured optimization, which is what envdash automates.

Read the SkillsBench paper

Get in touch

Questions about skill optimization, mock environments, or the pipeline are welcome.