Astronox Docs

Rate Limits & Costs

Understanding API usage, quotas, and pricing.

Overview

Astronox uses cloud AI models, which have:

  • Rate limits (requests per minute/day)
  • Token usage (affects cost if on paid tier)
  • Context limits (max conversation size)

Google Gemini

Free Tier Limits

Flash & Flash Lite

Rate Limits:

  • 60 requests per minute
  • 1,500 requests per day
  • Resets: Minute limit resets every 60s, daily limit at midnight PST

What counts as a request:

  • Each message you send
  • Follow-ups count separately
  • Tool calls don't count extra

Typical usage examples:

Light user (10-20 messages/day):
→ Never hits limits ✅

Regular user (50-100 messages/day):
→ Well within limits ✅

Power user (200-500 messages/day):
→ May hit daily limit on busy days ⚠️

Heavy user (500+ messages/day):
→ Likely to hit the daily limit; consider the paid tier 🔴
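The per-minute quota can also be respected client-side with a sliding window of request timestamps. A minimal sketch (the class name and injectable clock are illustrative, not part of Astronox or the Gemini API):

```python
import time
from collections import deque

# Client-side sketch of a 60-requests-per-minute quota: keep a sliding
# 60-second window of request timestamps and report how long to wait.
class SlidingWindowLimiter:
    def __init__(self, max_per_minute=60, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock          # injectable so behavior is testable
        self.timestamps = deque()   # request times inside the window

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0 if free)."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_per_minute:
            return 0.0
        # The oldest request must leave the window before a new one fits
        return 60 - (now - self.timestamps[0])

    def record(self):
        """Call this after each request you send."""
        self.timestamps.append(self.clock())
```

Checking `wait_time()` before sending, and calling `record()` after, keeps a busy script under the per-minute cap without ever seeing a rate-limit error.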

Gemini Pro

Rate Limits:

  • 10 requests per minute
  • 500 requests per day

Same counting rules as Flash

Note: Lower limits due to higher model costs


What Happens When You Hit a Limit

Per-Minute Limit

Error message:

⚠️ Rate limit exceeded
Too many requests in the last minute.
Please wait 45 seconds before retrying.

[Retry in 45s]

What to do:

  • Wait for countdown
  • Or pause for 1 minute
  • Limit resets automatically

Prevention:

  • Batch related questions in one message
  • Avoid rapid-fire messages
  • Switch models if one is limited
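The wait-and-retry behavior behind the [Retry in 45s] button can be sketched as a backoff loop. Everything here is a hypothetical stand-in (`send`, `RateLimitError`, `retry_after`), not the actual Astronox or Gemini client API:

```python
import time

class RateLimitError(Exception):
    """Hypothetical error type; real clients surface 429 responses differently."""
    def __init__(self, retry_after=None):
        super().__init__("rate limit exceeded")
        self.retry_after = retry_after  # server-suggested wait in seconds

def send_with_retry(send, prompt, max_retries=3, base_wait=5):
    """Call a hypothetical send(prompt) function, waiting out rate limits.

    Prefers the server's retry-after hint; otherwise backs off
    exponentially (5s, 10s, 20s, ...).
    """
    for attempt in range(max_retries + 1):
        try:
            return send(prompt)
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            wait = e.retry_after if e.retry_after is not None else base_wait * 2 ** attempt
            time.sleep(wait)
```

Honoring the server's suggested wait (when provided) resets the per-minute window exactly once instead of burning retries against a still-full quota.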

Daily Limit

Error message:

⚠️ Daily quota exceeded
You've used 1,500 requests today.
Quota resets at midnight PST (8 hours).

Options:
• Wait for reset
• Enable billing for unlimited requests
• Switch to different model (separate quota)

[Enable Billing] [Learn More]

What to do:

  • Wait until midnight PST
  • Enable billing (see below)
  • Use MintAI if available

Enabling Paid Tier

When you need it:

  • Consistently hit daily limits
  • Need higher rate limits
  • Commercial/professional use
  • Don't want interruptions

How to enable:

  1. Visit Google Cloud Console
  2. Select your project (same as API key)
  3. Billing → Enable billing
  4. Add payment method

No changes needed in Astronox - same API key works


Pricing (Paid Tier)

As of December 2025:

Flash & Flash Lite

Input:

  • $0.075 per 1 million tokens
  • ~$0.0001 per typical message

Output:

  • $0.30 per 1 million tokens
  • ~$0.0003 per typical response

Total per request:

  • Simple query: ~$0.0005 (a twentieth of a cent)
  • Average request: ~$0.001 (a tenth of a cent)
  • Complex request: ~$0.003 (a third of a cent)
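These per-request figures follow directly from the token rates. A quick sketch of the arithmetic at the Flash prices quoted above (the "typical" token counts are assumptions, not measured values):

```python
# Per-request cost from token counts, at the Flash rates quoted above:
# $0.075 input / $0.30 output per million tokens.
FLASH_INPUT_PER_MILLION = 0.075
FLASH_OUTPUT_PER_MILLION = 0.30

def request_cost(input_tokens, output_tokens,
                 in_rate=FLASH_INPUT_PER_MILLION,
                 out_rate=FLASH_OUTPUT_PER_MILLION):
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# An assumed ~1,300 input tokens and ~1,000 output tokens lands near
# the simple-query figure above
simple = request_cost(1_300, 1_000)  # ≈ $0.0004
```

Swapping in the Pro rates ($1.25 / $5.00 per million) reproduces the Pro figures in the next section the same way.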

Gemini Pro

Input:

  • $1.25 per 1 million tokens
  • ~$0.0015 per typical message

Output:

  • $5.00 per 1 million tokens
  • ~$0.005 per typical response

Total per request:

  • Simple query: ~$0.002 (a fifth of a cent)
  • Average request: ~$0.005 (half a cent)
  • Complex request: ~$0.01 (one cent)

Cost Estimations

Light use (50 requests/day):

  • Flash: ~$1.50/month
  • Pro: ~$7.50/month

Regular use (200 requests/day):

  • Flash: ~$6/month
  • Pro: ~$30/month

Heavy use (500 requests/day):

  • Flash: ~$15/month
  • Pro: ~$75/month

Note: Actual costs vary based on:

  • Message complexity
  • Response length
  • Conversation context
  • Tool usage

Optimizing Costs

Use Appropriate Model

For simple tasks:

✅ "List files in Downloads" → Flash Lite (cheaper)
❌ Don't use Pro for simple queries

For complex tasks:

✅ "Analyze entire codebase" → Pro (worth the cost)
❌ Don't use Flash Lite for complex analysis

Batch Requests

Inefficient:

"Find PDFs"
"Now find images"
"Now find videos"

3 requests = 3× cost

Efficient:

"Find all PDFs, images, and videos"

1 request = 1× cost


Start Fresh Conversations

Long conversations use more tokens:

  • Every message includes previous context
  • 100-message conversation = huge context
  • Costs increase with conversation length

Solution:

  • Start new chat when switching topics
  • Refresh conversation every 50-100 messages
  • Use memory instead of long context

Use Memory Wisely

Instead of:

Every day: "Remember I prefer Python..."

Many requests = high cost

Do once:

"Remember I prefer Python"

Then AI uses memory (no repeated cost)


MintAI (Pro)

Subscription Model

How it works:

  • Fixed monthly/annual cost
  • No per-token billing
  • Unlimited requests (fair use policy)

Cost:

  • Check official MintAI website
  • Typically: $X/month or $Y/year

Fair Use Policy

Unlimited, but reasonable:

  • Normal usage: No restrictions
  • Excessive usage: May be throttled
  • Abuse prevention: Limits if detected

What's considered excessive:

  • Thousands of requests per hour
  • Obvious automation/botting
  • Sharing account

What's normal:

  • Hundreds of requests per day ✅
  • Heavy development work ✅
  • Regular personal/professional use ✅

When Subscription Makes Sense

Choose Pro if:

  • Heavy daily usage (>200 requests/day)
  • Prefer predictable costs
  • Want premium models
  • No per-use billing stress

Stick with Gemini free tier if:

  • Light usage (<100 requests/day)
  • Occasional use
  • No budget for subscription
  • Free tier sufficient

Cost comparison:

Gemini Paid (200 req/day):
~$6-30/month depending on model

Pro (unlimited):
Fixed price (check website)

Break-even depends on your usage

Token Limits

What Are Tokens?

Tokens ≈ words/characters:

  • 1 token ≈ 0.75 words (English)
  • 1,000 tokens ≈ 750 words
  • 100k tokens ≈ 75k words ≈ 150 pages
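The rule of thumb above can be turned into a quick estimator. This is a rough English-only heuristic; real tokenizers split text differently:

```python
# Rough token estimate from the "1 token ≈ 0.75 words" rule of thumb.
# Real tokenizers count punctuation, subwords, and whitespace
# differently, so treat this as an order-of-magnitude guide only.
def estimate_tokens(text):
    words = len(text.split())
    return round(words / 0.75)
```

For example, a 750-word document estimates to about 1,000 tokens, matching the table above.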

What counts toward limit:

  • Your messages
  • AI responses
  • Conversation history
  • System prompt (~2k tokens)
  • Attached file content

Model Context Limits

Flash & Flash Lite:

  • 128k tokens (~96k words)
  • About 200 pages of text
  • Conversation history + current message

Gemini Pro:

  • 2M tokens (~1.5M words)
  • Entire novels + large codebases
  • Massive conversation history

Devstral 2 (MintAI):

  • 128k tokens (~96k words)
  • Similar to Flash

What Happens at Limit

Automatic truncation:

Conversation gets too long
↓
Oldest messages removed from context
↓
AI still responds, but may forget early context
↓
Start new conversation if context important

You won't see errors - handled automatically
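The drop-oldest behavior described above can be sketched in a few lines. The `(role, text, tokens)` message format is hypothetical, not Astronox's internal representation:

```python
# Drop-oldest truncation: remove messages from the front of the
# conversation until the remaining history fits the token budget.
def truncate_history(messages, max_tokens):
    total = sum(tokens for _, _, tokens in messages)
    kept = list(messages)
    while kept and total > max_tokens:
        _, _, tokens = kept.pop(0)  # oldest message leaves the context
        total -= tokens
    return kept
```

Because truncation always removes the oldest messages first, anything you need the AI to retain long-term belongs in the memory system, not in the conversation history.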

Best practice:

  • Start new chat every 50-100 messages
  • Or when switching topics
  • Critical context → use memory system

Monitoring Usage

In Google AI Studio

View statistics:

  1. Visit Google AI Studio
  2. Select your project
  3. View dashboard:
    • Requests today
    • Requests this month
    • Token usage
    • Cost (if billing enabled)

In Astronox (Future)

Coming soon:

Settings → API Keys → View Usage

Today:
• Requests: 47 / 1,500 (3%)
• Estimated cost: $0.12

This month:
• Requests: 1,247
• Total cost: $3.42

Rate Limit Strategies

Heavy Usage Patterns

If you hit limits regularly:

Option 1: Enable billing

  • Removes daily limits
  • Pay per use
  • Simple solution

Option 2: Use multiple models

  • Flash has separate quota from Pro
  • Flash Lite has separate quota from Flash
  • Switch between them

Option 3: Optimize usage

  • Batch requests
  • Start fresh conversations
  • Use memory system

Option 4: Subscription

  • Pro for predictable cost
  • Unlimited Devstral 2 usage

Mixing Free & Paid

Strategy:

Free tier: Simple daily tasks (Flash Lite/Flash)
→ Stay within 1,500/day

Paid tier (when needed): Complex analysis (Pro)
→ Pay only for special tasks

Result: Minimize costs, maximize value

Billing Alerts

Set Up Notifications

In Google Cloud Console:

  1. Billing → Budgets & alerts
  2. Set monthly budget (e.g., $10)
  3. Alert thresholds:
    • 50% of budget
    • 90% of budget
    • 100% of budget

Receive emails when thresholds hit

Prevents surprise bills!


Cost Control Tips

✅ Use Flash by Default

Flash is the sweet spot:

  • Fast responses
  • Good quality
  • Low cost
  • High daily limits

Reserve Pro for when truly needed


✅ Clear Conversations

Weekly: Review and delete old conversations
Benefit: Reduces storage, keeps app fast

✅ Don't Over-Prompt

Wasteful:

"Please help me find files"
"I would appreciate it if you could..."

Efficient:

"Find files in Downloads"

Same result, fewer tokens


✅ Use File Paths

Instead of attaching:

"Analyze the image at ~/Desktop/screenshot.png"

Benefits:

  • No upload needed
  • Faster
  • Doesn't count toward attachment limit

❌ Avoid Long Conversations

Every message includes history:

Message 1: 100 tokens
Message 50: 5,000 tokens (includes all previous)
Message 100: 10,000 tokens

Per-message cost grows linearly, so total cost grows quadratically with conversation length!
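Assuming ~100 tokens per message (the same figure used above), a quick sketch shows how the running total grows when every message resends the full history:

```python
# Message k carries its k-1 predecessors plus itself, so message k
# costs about k * per_message tokens; the running total is therefore
# roughly quadratic in conversation length. The 100-tokens-per-message
# figure is an assumption for illustration.
def cumulative_tokens(n_messages, per_message=100):
    return sum(k * per_message for k in range(1, n_messages + 1))

one_long_chat = cumulative_tokens(100)       # 505,000 tokens
two_fresh_chats = 2 * cumulative_tokens(50)  # 255,000 tokens
```

Splitting one 100-message conversation into two fresh 50-message chats roughly halves the total tokens sent, which is why starting new conversations is one of the cheapest optimizations available.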


Fair Usage

What's Reasonable

Normal patterns:

  • Work hours: 20-50 requests/hour ✅
  • Peak usage: 100 requests/hour ✅
  • Daily total: 500-1000 requests ✅

Concerning patterns:

  • Constant automated requests 🔴
  • Identical repeated requests 🔴
  • Account sharing 🔴

API Terms of Service

Review Google's terms:

  • Fair use expectations
  • Prohibited uses
  • Rate limit policies
  • Billing terms

Link: https://ai.google.dev/terms


Summary

Free tier (Gemini):

  • 60 req/min, 1,500 req/day (Flash)
  • Generous for most users
  • Enable billing if needed

Paid tier:

  • ~$0.001 per request (Flash)
  • ~$0.005 per request (Pro)
  • Predictable, affordable

Pro:

  • Fixed subscription
  • Unlimited Devstral 2
  • Best for heavy users

Optimize:

  • Use appropriate model
  • Batch requests
  • Fresh conversations
  • Memory instead of context

Next: Understand Accuracy & Reliability expectations.