Tokens, Costs & How to Pay for Gumloop

AI doesn’t process words the way you read them. Understanding tokens — the units AI models actually process — helps you understand costs and optimize usage.

Understanding Tokens

AI models don’t read words. They read tokens — small chunks of text that are usually part of a word, a whole word, or a punctuation mark. Think of tokens as the “syllables” that AI understands.

As a rough rule of thumb: 1 token is about 3/4 of a word. A 100-word paragraph is roughly 130 tokens. A short email might be 200 tokens. A long document could be tens of thousands.
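The rule of thumb above can be sketched in a few lines. This is only an estimate, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4/3 tokens-per-word rule of thumb."""
    words = len(text.split())
    return round(words * 4 / 3)

paragraph = " ".join(["word"] * 100)  # a 100-word paragraph
print(estimate_tokens(paragraph))     # → 133, in line with "roughly 130 tokens"
```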

[Interactive tokenizer widget: type text to see how it splits into tokens. Example input: “Gumloop connects AI to the tools you already use and gets work done.”]

This is a simplified visualization. Real tokenizers (like GPT’s BPE) split text differently, but the idea is the same: AI reads chunks, not whole words.
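A toy version of that split can illustrate the idea. This sketch separates words and punctuation only; real BPE tokenizers also split *within* words:

```python
import re

def naive_tokenize(text: str) -> list[str]:
    """Toy tokenizer: keeps words whole and punctuation as separate
    tokens. Real tokenizers (like GPT's BPE) split within words too."""
    return re.findall(r"\w+|[^\w\s]", text)

print(naive_tokenize("Gumloop connects AI to the tools you already use."))
# → ['Gumloop', 'connects', 'AI', 'to', 'the', 'tools', 'you', 'already', 'use', '.']
```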

There are two types of tokens in every AI interaction:

Token type | What it is | Cost
Input tokens | Everything you send to the model — your message, conversation history, tool definitions, tool results | Lower cost
Output tokens | Everything the model generates — its response, tool calls, reasoning | 2-4x more expensive

Why the difference? Generating text is harder than reading it. The model has to predict each word one at a time, which requires more computation than simply processing input.
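The asymmetry matters in practice: a long prompt with a short answer can be cheaper than a short prompt that triggers a long generation. A minimal sketch, using hypothetical per-million-token prices (the 3x output multiplier and dollar figures are illustrative assumptions, not real provider rates):

```python
# Hypothetical prices (assumed for illustration — check your provider).
INPUT_PRICE_PER_M = 3.00   # $ per 1M input tokens
OUTPUT_PRICE_PER_M = 9.00  # $ per 1M output tokens (3x input here)

def dollar_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the assumed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

print(dollar_cost(10_000, 500))   # long prompt, short answer → 0.0345
print(dollar_cost(2_000, 4_000))  # short prompt, long answer → 0.042
```

Fewer total tokens, higher bill: output pricing dominates.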

How Gumloop Credits Work

Gumloop credits are calculated based on two things:

  • Tokens analyzed: every input and output token the AI processes costs credits. Output tokens cost more than input tokens because generation is more compute-intensive.
  • Tool calls: tools that connect to external data sources (like Firecrawl, Apollo, or web search) have their own credit cost. Tools that stay within Gumloop (like sending a Slack message or reading a Google Doc) are free. See the exact cost of each tool.
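The two-part credit model above can be sketched as follows. The per-tool costs here mirror the examples in this lesson (Salesforce and Slack free, Apollo 5 credits); the authoritative numbers live in Gumloop’s tool-cost page:

```python
# Tool costs from this lesson's examples; real values are in Gumloop's docs.
TOOL_CREDITS = {"salesforce_lookup": 0, "slack_send": 0, "apollo_enrich": 5}

def interaction_credits(token_credits: int, tools_used: list[str]) -> int:
    """Total credits = token credits + per-tool credits."""
    tool_credits = sum(TOOL_CREDITS.get(tool, 0) for tool in tools_used)
    return token_credits + tool_credits

# 13 token credits plus a free Salesforce lookup and an Apollo enrichment:
print(interaction_credits(13, ["salesforce_lookup", "apollo_enrich"]))  # → 18
```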

There’s a hidden cost to watch for: when a tool returns data, the model has to read the entire response to use it. A tool that returns a large JSON payload (company profiles, scraped web pages, database results) can generate thousands of input tokens just from the result. Sometimes analyzing the response costs more than the tool call itself.
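You can get a feel for that hidden cost with a rough characters-per-token heuristic (about 4 characters per token for English-like text — an illustrative approximation, not a real tokenizer count):

```python
import json

def estimate_payload_tokens(payload: dict) -> int:
    """Rough input-token estimate for a JSON tool result,
    using the ~4-characters-per-token heuristic."""
    return len(json.dumps(payload)) // 4

# Even a modest enrichment result costs real input tokens once the
# model has to read it back:
profile = {"name": "John Smith", "title": "VP Sales",
           "email": "john@acme.com", "employees": 250,
           "tech_stack": ["Salesforce", "Slack", "HubSpot"]}
print(estimate_payload_tokens(profile))
```

A full enrichment payload with funding history and company details can easily run to thousands of tokens by the same arithmetic.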

Here’s how all of this adds up in a real interaction:

You ask an agent to check a deal in Salesforce and enrich the contact.

You: “What’s the status of the Acme deal? Enrich their contact info.”
  Input tokens (your message + agent instructions + tool definitions): ~1,100 tokens → 2 credits

AI: I should look up “Acme” in Salesforce.
  Output tokens (AI reasoning + tool selection): ~115 tokens → 1 credit

Tool call: salesforce_lookup (Salesforce), query: “Acme”
  Result: Acme Corp — Stage: Negotiation, Value: $48K, Next step: Contract review Thursday
  Tool credit (Salesforce lookup): 0 credits
  Input tokens (tool result fed back to the model): ~500 tokens → 1 credit

AI: Now I’ll enrich the contact using Apollo.
  Output tokens (AI reasoning + tool selection): ~110 tokens → 1 credit

Tool call: apollo_enrich (Apollo), company: “Acme Corp”
  Result: John Smith, VP Sales, john@acme.com, LinkedIn, phone, company details, funding history, tech stack, employee count …
  Tool credit (Apollo enrichment): 5 credits
  Input tokens (full JSON response analyzed by the model): ~4,000 tokens → 7 credits

AI: “The Acme Corp deal is in Negotiation, valued at $48K with a contract review Thursday. The key contact is John Smith (VP Sales) at john@acme.com.”
  Output tokens (final response to you): ~110 tokens → 1 credit

Total for this interaction:
  Input tokens (~5,600): 10 credits
  Output tokens (~335): 3 credits
  Salesforce lookup: 0 credits
  Apollo enrichment: 5 credits
  Total: 18 credits
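Reconstructing those totals from the line items confirms the arithmetic:

```python
# (description, approx. tokens, credits) for each step of the example.
line_items = [
    ("input: message + instructions + tool defs", 1_100, 2),
    ("output: reasoning + Salesforce tool call",    115, 1),
    ("input: Salesforce result",                    500, 1),
    ("output: reasoning + Apollo tool call",        110, 1),
    ("input: Apollo JSON response",               4_000, 7),
    ("output: final answer",                        110, 1),
]
tool_credits = 0 + 5  # Salesforce lookup (free) + Apollo enrichment

input_tokens = sum(t for d, t, c in line_items if d.startswith("input"))
output_tokens = sum(t for d, t, c in line_items if d.startswith("output"))
token_credits = sum(c for d, t, c in line_items)

print(input_tokens, output_tokens)    # → 5600 335
print(token_credits + tool_credits)   # → 18
```

Note where the credits went: the Apollo response alone accounted for 7 of the 18 — more than the enrichment call itself.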

Paying for Gumloop

Gumloop offers two ways to handle AI costs:

Gumloop Credits

The simplest option. You buy credits, and Gumloop handles the model provider relationship. Credits cover two things: inference (the tokens the AI reads and generates) and tool calls (actions like looking up data in Salesforce, sending an email, or scraping a website). This is the right choice for most teams. No API keys to manage, no billing complexity.

Bring Your Own API Key (BYOK)

If you already have an account with Anthropic, OpenAI, or another provider, you can plug in your own API key. You pay the provider directly for model usage and Gumloop separately for platform features. This is useful if you have negotiated enterprise rates or need to keep billing centralized.

Choosing the Right Model

Remember from Lesson 1: start with a capable model, then move down. The same principle applies to costs:

  1. Build your workflow with a strong model (e.g., Claude Sonnet) to get the quality right
  2. Once it works, try a faster, cheaper model (e.g., Claude Haiku) and compare results
  3. If the output quality is still good enough, you’ve just cut your costs significantly

The biggest cost lever

Model choice is the single biggest factor in cost. The difference between a top-tier model and a mid-tier one can be 10x or more per token. Always test whether a cheaper model can handle your use case.
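To see why this lever dominates, compare two hypothetical price tiers. The dollar figures below are assumptions for illustration only (real prices vary by provider); the shape of the result is what matters:

```python
# Hypothetical (input $/1M, output $/1M) prices — illustrative only.
PRICES = {
    "top-tier": (15.00, 75.00),
    "mid-tier": (1.00, 5.00),   # ~15x cheaper per token here
}

def monthly_cost(model: str, runs: int, in_tok: int, out_tok: int) -> float:
    """Monthly dollar cost for a workflow at the assumed rates."""
    p_in, p_out = PRICES[model]
    return runs * (in_tok * p_in + out_tok * p_out) / 1_000_000

# 10,000 runs/month with the worked example's footprint
# (~5,600 input and ~335 output tokens per run):
for model in PRICES:
    print(model, round(monthly_cost(model, 10_000, 5_600, 335), 2))
# → top-tier 1091.25
# → mid-tier 72.75
```

Same workflow, same token counts — an order-of-magnitude difference from model choice alone.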

Optimizing Token Usage

Since you pay per token, being smart about token usage directly affects cost:

  • Caching — if you ask the same question repeatedly (or similar questions), some providers offer prompt caching that reuses previously processed input tokens at a discount
  • Context management — every message in a conversation gets re-sent to the model. Long conversations accumulate tokens fast. Starting a new conversation for a new topic keeps costs down.
  • Be intentional about prompt size — a 50-word instruction often works as well as a 500-word one. Shorter prompts mean fewer input tokens and lower costs.
  • Choose the right model — a smaller, faster model costs less per token. If the task doesn’t need the most capable model, don’t pay for it.
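The context-management point is worth making concrete: because each turn re-sends the full history, cumulative input tokens grow roughly quadratically with conversation length. A simplified simulation (assuming a flat 200 tokens per message, an illustrative figure):

```python
def conversation_input_tokens(turns: int, tokens_per_message: int = 200) -> int:
    """Total input tokens billed across a conversation, assuming every
    turn re-sends the entire history as input."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # your new message joins the history
        total += history               # the whole history is sent as input
        history += tokens_per_message  # the model's reply joins it too
    return total

print(conversation_input_tokens(5))   # → 5000
print(conversation_input_tokens(10))  # → 20000 — 4x the cost for 2x the turns
```

Splitting one 10-turn conversation into two 5-turn ones would halve the input-token bill in this model, which is why starting fresh for a new topic pays off.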

Quiz: Tokens & streaming

Which statement about streaming responses is correct?

Answer: Models generate one token at a time and stream each one as it’s ready. The “typing” effect isn’t cosmetic — it reflects how the model actually works: it is literally composing its response in real time. And remember, those output tokens cost 2-4x more than input tokens.