
How I use AI

Last updated April 1, 2026

This page is a collection of tips I often share with clients, colleagues, and friends to help them achieve better results with AI.

They come from my personal experience with agentic coding (aka “vibe coding”), applied AI engineering, and using LLMs for knowledge work.

An introduction to Context Engineering

The quality of the input dictates the quality of the output.

Garbage in, garbage out. That principle sums up this entire guide: elevate the quality of your inputs to get the best outputs.

💩 → LLM → 💩

Context is essential:

  • too little = generic output
  • too much = lower prompt adherence (see context rot)

Agentic Coding

Read my context engineering tutorial for Cursor.

While most of the advice here is aimed at software development, it also applies to general-purpose tasks performed with an agent.

Enforce subagent usage in the plan

Subagents are useful for:

  • leveraging task-specific instructions, models, and toolsets
  • isolating work in a different context window

I mainly use subagents to isolate context, and keep the main agent's context window clean. Most tasks like running commands, editing files, exploring the codebase, or calling MCPs can be achieved by subagents.

Conversely, having a single agent do implementation + testing + linting + iterating on the results will bloat its context window. This results in your agent ending up in the “dumb zone” for the last parts of the plan. And you don't want it to hallucinate or start ignoring key instructions in the verification steps :)

By enforcing subagent usage in the plan, you get your model to spawn subagents for scoped subtasks and keep the main agent working as an orchestrator, with enough context real estate to keep the implementation faithful to the plan.
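The pattern is easy to sketch. Everything below is hypothetical (there is no standard `run_subagent` or `call_llm` API; your harness provides its own), but it shows why the orchestrator's context stays small: each subtask runs in a fresh context window, and only a short summary flows back.

```python
# Hypothetical sketch: `call_llm` stands in for whatever client your
# agent harness exposes; no real API is implied.

def call_llm(messages):
    # Placeholder for a real LLM call; returns a short completion.
    return "summary of: " + messages[-1]["content"][:40]

def run_subagent(task, instructions):
    # Each subagent gets a FRESH context window: only its own
    # instructions and task, none of the orchestrator's history.
    context = [
        {"role": "system", "content": instructions},
        {"role": "user", "content": task},
    ]
    # The subagent's full transcript (tool output, diffs, test logs)
    # is discarded; only a compact summary flows back.
    return call_llm(context)

def orchestrate(plan):
    # The main agent's context holds one short summary per subtask,
    # leaving plenty of room to keep tracking the plan.
    main_context = [{"role": "system", "content": "You are the orchestrator."}]
    for step in plan:
        summary = run_subagent(step["task"], step["instructions"])
        main_context.append({"role": "user", "content": summary})
    return main_context

plan = [
    {"task": "implement the feature", "instructions": "You write code."},
    {"task": "run the test suite", "instructions": "You run tests."},
    {"task": "fix lint errors", "instructions": "You run the linter."},
]
ctx = orchestrate(plan)
# One system message + one short summary per subtask.
assert len(ctx) == 4
```

The design choice worth noticing: the orchestrator never sees the subagents' raw tool output, only their summaries, so its context grows linearly in the number of subtasks rather than in the volume of work done.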

Progressive context disclosure

You want your LLM to have access to the right context while minimizing context pollution: only provide context relevant to its current task.

At the codebase level: progressive disclosure of architecture details

The best way to achieve this is to leverage AGENTS.md files. I suggest collocating them with the code, as showcased here. In TypeScript, you can create nested folders just to have a dedicated AGENTS.md file for the relevant code. Think of AGENTS.md as a “fish eye lens” on your folder: it has a high-level view of what's inside the folder, but it should also have peripheral vision of what's related to it.

Read more on Fish Eye by Amelia Wattenberger.
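To make the collocation idea concrete, here is a toy sketch of how a harness might resolve the closest AGENTS.md for a file it is editing. The repo layout and the resolution logic are illustrative assumptions, not any specific tool's documented behavior:

```python
from pathlib import PurePosixPath

# Illustrative repo layout: which directories carry an AGENTS.md.
AGENTS_FILES = {
    "AGENTS.md",                      # repo-wide overview
    "src/payments/AGENTS.md",         # payments-specific guidance
    "src/payments/stripe/AGENTS.md",  # narrower still
}

def nearest_agents_md(file_path):
    # Walk up from the edited file; the first AGENTS.md found is the
    # most specific guidance available for that part of the tree.
    for parent in PurePosixPath(file_path).parents:
        candidate = str(parent / "AGENTS.md")
        if candidate in AGENTS_FILES:
            return candidate
    return None

# Editing deep in the tree picks up the most specific file...
assert nearest_agents_md("src/payments/stripe/client.ts") == "src/payments/stripe/AGENTS.md"
# ...while unrelated code falls back to the repo-wide one.
assert nearest_agents_md("src/auth/login.ts") == "AGENTS.md"
```

The nesting is what makes disclosure progressive: the agent only pays the token cost of the guidance closest to the code it is touching.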

At the tools level: progressive disclosure of instructions

Use only agent skills that you have read. Add them when a need is identified. Make sure their description is relevant. Customize the description if your LLM/harness doesn't pick it up.
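Skill descriptions are themselves a form of progressive disclosure: the model only sees each skill's name and one-line description up front, and the full instructions enter the context once the skill is invoked. A toy sketch (the skill names, descriptions, and loading mechanism are made up for illustration):

```python
# Toy sketch -- skill names, descriptions, and the loading mechanism
# are invented for illustration, not any specific harness's format.
SKILLS = {
    "pdf-report": {
        "description": "Generate a PDF report from structured data.",
        "instructions": "Full multi-page instructions, loaded on demand.",
    },
    "db-migrate": {
        "description": "Write and review SQL database migrations.",
        "instructions": "Full multi-page instructions, loaded on demand.",
    },
}

def skills_preamble(skills):
    # All the model sees up front: one line per skill. If the
    # description is vague, the model never picks the skill.
    return "\n".join(
        f"- {name}: {meta['description']}" for name, meta in skills.items()
    )

def load_skill(name):
    # The full instructions only enter the context once invoked.
    return SKILLS[name]["instructions"]

preamble = skills_preamble(SKILLS)
assert "PDF report" in preamble
# The heavy instructions stay out of context until needed.
assert "loaded on demand" not in preamble
```

This is also why customizing the description matters: it is the only signal the model has when deciding whether to invoke the skill.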

Trim tools output

You want to minimize irrelevant tokens in your context, keep the model “focused” on relevant information, and save money by reducing token consumption.

The best way I know to scale this practice is to use tokf to filter out the output of your CLI commands to keep context clean.

For example, on a test command, you would only get the output of failing tests (instead of the complete list of test cases, coverage, etc.).
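I won't reproduce tokf's configuration here, but the underlying idea is simple to sketch: pipe the raw command output through a filter that keeps only what the agent needs. A minimal Python stand-in for a pytest-style run (the log lines are invented):

```python
# Invented pytest-style output, standing in for a real test run.
RAW = """\
tests/test_auth.py::test_login PASSED
tests/test_auth.py::test_logout PASSED
FAILED tests/test_pay.py::test_refund - AssertionError: refund not issued
==== 1 failed, 2 passed in 0.41s ===="""

def trim_test_output(raw):
    # Keep failure details and the final summary line; drop passing
    # tests, coverage tables, and other noise.
    keep = [
        line for line in raw.splitlines()
        if line.startswith("FAILED") or ("failed," in line and "passed" in line)
    ]
    return "\n".join(keep)

trimmed = trim_test_output(RAW)
assert "PASSED" not in trimmed         # passing tests filtered out
assert "refund not issued" in trimmed  # the failure survives
```

The agent still gets everything it needs to fix the failure, at a fraction of the token cost of the full log.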

NB: Part of the discourse against MCPs is that they bloat the context window; some poorly implemented MCPs also output shit tons of JSON (like an API would), resulting in poor performance in multi-turn scenarios.

Models recommendations

The field is moving fast. ChatGPT launched only a few years ago, and new models release every couple of months. You want to be able to leverage the best tools.

AI chat client

Don't restrict yourself to a single tool/LLM. Try everything; use LLM clients that support all models. My personal favorite is LibreChat because it allows interacting with any model through a single app.

It comes with minor quality-of-life compromises, but the added benefits are invaluable. By using different models, you learn what each is good at (and what it isn't), which lets you save costs and be more productive.

Preferred LLMs (April 2026 update)

Large language models can be smart, fast, or cheap. Choose two.

TL;DR:

  • OpenAI 5.x: has the best all-around model selection
  • Claude 4.5/4.6: excellent models, terrible price
  • Gemini 3: best price/cost compromise
  • Kimi 2: most affordable (little experience with 2.5)
Intelligence vs cost to run graph
Intelligence vs cost to run (source: artificialanalysis.ai)

Knowledge work

Most of the knowledge work I use AI for is researching and consolidating information. Typically, I would:

  • ask for executive summaries synthesizing information across Linear, Notion, and GitHub
  • ask questions about industry trends, software architecture, or dev tools from a business/engineering perspective

For most of these, I rely on gpt-5.1-high because it's reliable at tool calling and its verbosity can be configured. I rarely find the need to upgrade models for this type of research task, though I occasionally do for multi-hop research, in which case I use gemini-3.1-pro or gpt-5.4.

If I want the work done quicker, gemini-3-flash is a strong candidate.

Agentic coding: planning

My recommendations for reliable implementation plans:

  • gpt-5.4 (medium or higher) is the best at task planning
  • gpt-5.1-high comes in second; it's not as smart and it's slower, but much cheaper
  • kimi-2.5: cheap, good value, but requires more prompting

Why GPT 5.1 over GPT 5.2

GPT 5.1 (high) and GPT 5.2 (medium) have similar intelligence. But the former prices mostly on reasoning, and the latter on input tokens. I found GPT 5.2 to get expensive quickly as context grows (typical in coding tasks). If I need something smarter than gpt-5.2-medium, I'd rather just use gpt-5.4-*.
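The effect is easy to see with toy numbers. The per-token prices below are invented for illustration (not real GPT 5.1/5.2 pricing); the point is the shape: an input-priced model's cost scales with the growing context, while a reasoning-priced model's cost stays comparatively flat.

```python
# Illustrative only: these per-million-token prices are invented,
# not real GPT 5.1 / 5.2 pricing. Only the shape of the curves matters.

def cost_input_priced(context_tokens, price_per_m=3.0):
    # Prices mostly on input tokens: cost scales with context size.
    return context_tokens / 1_000_000 * price_per_m

def cost_reasoning_priced(reasoning_tokens, context_tokens,
                          reasoning_price_per_m=10.0, input_price_per_m=0.5):
    # Prices mostly on reasoning tokens; re-reading context is cheap.
    return (reasoning_tokens / 1_000_000 * reasoning_price_per_m
            + context_tokens / 1_000_000 * input_price_per_m)

# A coding session where the context grows turn by turn:
for ctx_tokens in (50_000, 200_000, 400_000):
    a = cost_input_priced(ctx_tokens)
    b = cost_reasoning_priced(20_000, ctx_tokens)
    print(f"context={ctx_tokens:>7}: input-priced=${a:.2f}  reasoning-priced=${b:.2f}")
```

With these numbers, the input-priced model starts out cheaper on small contexts but overtakes the reasoning-priced one well before the context hits 400k tokens, which matches the experience above.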

Cost to run graph
Cost to run (source: artificialanalysis.ai)

Agentic coding: Implementation

  • codex-5.3-medium (higher as needed) - my go-to model for writing code; the efficiency offsets the price per token
  • gemini-3-flash - best intelligence for the money, excellent at small-scope tasks and tool usage; that's the one I ask to interact with MCPs to figure out database shape, implementation requirements, etc.
  • gemini-3.1-pro - that's the one I trust when I'm lazy and want the work done; it's smart enough to figure things out on its own, and cheap enough that I don't mind if I have to do the task again lol
  • codex-5.1-mini-high - cheap, best value if you have clear instructions, but slow
  • codex-5.3-spark - a much faster, cheaper alternative to 5.1-mini-high, but much less autonomous

Notable mentions

Kimi K2.5 and GLM 5 are both capable models at a very cheap price. I haven't found ways to leverage them efficiently yet for agentic coding. (They don't work well with Cursor.)

For knowledge work, Kimi K2 was my go-to model for quick tasks, asking questions, and small-scoped research. It's an excellent model for conversation, but Kimi K2.5 currently suffers from tool-usage issues in my LLM client, which prevents me from using it more.

GLM 5 is probably the smartest coding model you can get in this price range, but in my experience it still needs a lot of handholding compared to e.g. OpenAI models. It seems better suited to users leveraging more “bruteforce” methodologies like ralph.

All Claude models are great, they're just expensive af.

Intelligence vs output tokens used graph
Intelligence vs output tokens used (source: artificialanalysis.ai)

How I use each

If you're curious, this is how I end up using each model:

  • Codex for coding
  • GPT for planning + research
  • Gemini for asking questions + summarizations
  • Kimi for non-work tasks

Get in touch

If you and your team need advice to help get more value out of AI for engineering and knowledge work, let's get in touch.