Journal

Evaluating Multi-Modal Prompts: Image, Text, and Beyond
by Travis Kroon

Prompt engineering is no longer text-only. With GPT-4 Vision, Claude 3, and Gemini handling images, documents, charts—even audio—2025 demands a new discipline: multi-modal prompt evaluation. This post outlines how to evaluate image + text prompts systematically, measure performance, and build…

Prompt Refactoring Patterns for Complex Tasks
by Travis Kroon

As prompt engineering matures, brute-force trial and error no longer cuts it. Complex tasks—multi-step reasoning, document synthesis, agent orchestration—need structured prompt refactoring. In this post, we explore reusable refactoring patterns to improve clarity, reliability, and output quality when basic prompting…

Automating Prompt Iteration with LangChain + LangSmith
by Travis Kroon

Manually testing prompts is fine for hobby projects. But if you’re shipping LLM-powered apps, you need an upgrade. Enter: LangChain + LangSmith. This combo lets you track, evaluate, and iterate on prompts automatically—with structured workflows, detailed logging, and prompt version…

Scoring Prompts at Scale: Metrics That Matter
by Travis Kroon

You can’t improve what you don’t measure. And in prompt engineering, what you measure shapes how you build. By 2025, AI teams have moved beyond vibe checks and spot tests. If you’re deploying LLMs in production—or even prototyping seriously—you need…