# Tokens, Costs & How to Pay for Gumloop
AI doesn’t process words the way you read them. Understanding tokens, the actual units AI models read and generate, helps you understand costs and optimize usage.
## Understanding Tokens
AI models don’t read words. They read tokens — small chunks of text that are usually part of a word, a whole word, or a punctuation mark. Think of tokens as the “syllables” that AI understands.
As a rough rule of thumb: 1 token is about 3/4 of a word. A 100-word paragraph is roughly 130 tokens. A short email might be 200 tokens. A long document could be tens of thousands.
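That rule of thumb is easy to turn into a quick estimator. This is only a sketch of the heuristic above; real tokenizers (each provider ships its own) will give somewhat different counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~3/4-words-per-token rule of thumb."""
    words = len(text.split())
    return round(words / 0.75)  # 1 token ≈ 3/4 word → tokens ≈ words / 0.75

paragraph = " ".join(["word"] * 100)  # a 100-word paragraph
print(estimate_tokens(paragraph))    # ≈ 133 tokens, close to the "roughly 130" above
```

Use this for budgeting, not billing: punctuation, code, and non-English text all tokenize differently.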
There are two types of tokens in every AI interaction:
| Token type | What it is | Cost |
|---|---|---|
| Input tokens | Everything you send to the model — your message, conversation history, tool definitions, tool results | Lower cost |
| Output tokens | Everything the model generates — its response, tool calls, reasoning | 2-4x more expensive |
Why the difference? Generating text is harder than reading it. The model has to predict each token one at a time, which requires more computation than processing input it has already been given.
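The asymmetry matters in practice because a small number of output tokens can dominate the bill. A minimal sketch, using hypothetical prices (not Gumloop's actual rates) with output at 4x input, the top of the 2-4x range above:

```python
# Illustrative only: prices are hypothetical, not any provider's actual rates.
PRICE_PER_1K_INPUT = 0.003   # hypothetical $ per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.012  # 4x input, matching the 2-4x range above

def interaction_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one AI call, charging output tokens at a premium."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + (
        output_tokens / 1000
    ) * PRICE_PER_1K_OUTPUT

# 2,000 tokens read, 500 generated: output is a fifth of the tokens
# but half of the total cost.
print(f"${interaction_cost(2000, 500):.4f}")  # $0.0120
```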
## How Gumloop Credits Work
Gumloop credits are calculated based on two things:
- Tokens analyzed: every input and output token the AI processes costs credits. Output tokens cost more than input tokens because generation is more compute-intensive.
- Tool calls: tools that connect to external data sources (like Firecrawl, Apollo, or web search) have their own credit cost. Tools that stay within Gumloop (like sending a Slack message or reading a Google Doc) are free. See the exact cost of each tool.
There’s a hidden cost to watch for: when a tool returns data, the model has to read the entire response to use it. A tool that returns a large JSON payload (company profiles, scraped web pages, database results) can generate thousands of input tokens just from the result. Sometimes analyzing the response costs more than the tool call itself.
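You can get a feel for this hidden cost by estimating tokens from the size of a tool's JSON result. The payload below is hypothetical, and the ~4-characters-per-token ratio is a common heuristic, not an exact figure:

```python
import json

def payload_token_estimate(payload: dict) -> int:
    """Estimate the input tokens an agent pays to read a tool's JSON result.
    Assumes roughly 4 characters per token (a heuristic, not exact)."""
    return len(json.dumps(payload)) // 4

# Hypothetical enrichment result: one large company record
company_profile = {
    "name": "Acme Corp",
    "description": "x" * 8000,  # stands in for a long scraped page
    "employees": [{"id": i, "title": "Engineer"} for i in range(50)],
}
print(payload_token_estimate(company_profile))  # thousands of tokens from one call
```

A single large result like this can cost more in input tokens than the tool call itself, which is why trimming tool responses (or requesting fewer fields) pays off.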
Here’s how all of this adds up in a real interaction. Say you ask an agent to check a deal in Salesforce and enrich the contact: your request is input tokens, the model’s decision to call each tool is output tokens, the tool results come back as more input tokens, any external tool call carries its own credit cost, and the final summary is output tokens again.
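With hypothetical numbers (these are not Gumloop's actual rates, and the token counts are invented for illustration), the credit math for that Salesforce-and-enrichment interaction might look like:

```python
# All numbers hypothetical; this shows how credits accumulate, not actual rates.
CREDITS_PER_1K_INPUT = 1
CREDITS_PER_1K_OUTPUT = 3  # output billed at a premium (in the 2-4x range)

steps = [
    # (description, input_tokens, output_tokens, tool_credits)
    ("your request + agent instructions",      400,   0, 0),
    ("model decides to call Salesforce",         0,  50, 0),
    ("Salesforce deal record (tool result)",  1500,   0, 0),  # internal tool: free
    ("model calls an enrichment tool",           0,  40, 2),  # external tool: credits
    ("enrichment result read back in",        3000,   0, 0),
    ("final summary to you",                     0, 300, 0),
]

total = sum(
    inp / 1000 * CREDITS_PER_1K_INPUT
    + out / 1000 * CREDITS_PER_1K_OUTPUT
    + tool
    for _, inp, out, tool in steps
)
print(f"{total:.2f} credits")  # 8.07 credits
```

Notice where the credits go: reading the two tool results (4,500 input tokens) costs more than the external tool call itself, exactly the hidden cost described above.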
## Paying for Gumloop
Gumloop offers two ways to handle AI costs:
### Gumloop Credits
The simplest option. You buy credits, and Gumloop handles the model provider relationship. Credits cover two things: inference (the tokens the AI reads and generates) and tool calls (actions like looking up data in Salesforce, sending an email, or scraping a website). This is the right choice for most teams. No API keys to manage, no billing complexity.
### Bring Your Own API Key (BYOK)
If you already have an account with Anthropic, OpenAI, or another provider, you can plug in your own API key. You pay the provider directly for model usage and Gumloop separately for platform features. This is useful if you have negotiated enterprise rates or need to keep billing centralized.
## Choosing the Right Model
Remember from Lesson 1: start with a capable model, then move down. The same principle applies to costs:
- Build your workflow with a strong model (e.g., Claude Sonnet) to get the quality right
- Once it works, try a faster, cheaper model (e.g., Claude Haiku) and compare results
- If the output quality is still good enough, you’ve just cut your costs significantly
Model choice is the single biggest factor in cost. The difference between a top-tier model and a mid-tier one can be 10x or more per token. Always test whether a cheaper model can handle your use case.
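To see why model choice dominates, compare the same workflow on two models with a 10x per-token price gap. The prices below are hypothetical placeholders, not the actual rates of any Claude model:

```python
# Hypothetical per-1K-token prices for a top-tier vs mid-tier model.
STRONG_MODEL = {"input": 0.003, "output": 0.015}
CHEAP_MODEL = {"input": 0.0003, "output": 0.0015}  # 10x cheaper per token

def monthly_cost(model: dict, runs: int, in_tok: int, out_tok: int) -> float:
    """Cost of running the same workflow `runs` times in a month."""
    per_run = in_tok / 1000 * model["input"] + out_tok / 1000 * model["output"]
    return runs * per_run

# Same workflow: 1,000 runs/month, 5K input + 1K output tokens per run
print(f"${monthly_cost(STRONG_MODEL, 1000, 5000, 1000):.2f}")  # $30.00
print(f"${monthly_cost(CHEAP_MODEL, 1000, 5000, 1000):.2f}")   # $3.00
```

If the cheaper model's output passes your quality check, that 10x gap goes straight to your bottom line, which is why "start strong, then step down" is the right order of operations.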
## Optimizing Token Usage
Since you pay per token, being smart about token usage directly affects cost:
- Caching — if you ask the same question repeatedly (or similar questions), some providers offer prompt caching that reuses previously processed input tokens at a discount
- Context management — every message in a conversation gets re-sent to the model. Long conversations accumulate tokens fast. Starting a new conversation for a new topic keeps costs down.
- Be intentional about prompt size — a 50-word instruction often works as well as a 500-word one. Shorter prompts mean fewer input tokens and lower costs.
- Choose the right model — a smaller, faster model costs less per token. If the task doesn’t need the most capable model, don’t pay for it.
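The context-management point is easy to underestimate because re-sent history compounds. A sketch with a hypothetical fixed message size shows the growth: doubling the number of turns roughly quadruples the total input tokens.

```python
def conversation_input_tokens(turns: int, tokens_per_message: int = 200) -> int:
    """Total input tokens over a conversation where the entire prior history
    is re-sent with each new turn (hypothetical fixed message size)."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # your new message joins the history
        total += history               # the whole history is sent as input
        history += tokens_per_message  # the model's reply joins it too
    return total

print(conversation_input_tokens(5))   # 5000
print(conversation_input_tokens(10))  # 20000: twice the turns, 4x the tokens
```

This is why starting a fresh conversation for a new topic is one of the cheapest optimizations available: it resets `history` to zero.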
