Astronox Docs

Rate Limits & Costs

Understanding API usage, quotas, and pricing.

Overview

Astronox uses cloud AI models, which have:

  • Rate limits (requests per minute/day)
  • Token usage (affects cost if on paid tier)
  • Context limits (max conversation size)

Google Gemini

Free Tier Limits

Flash & Flash Lite

Rate Limits:

  • 60 requests per minute
  • 1,500 requests per day
  • Resets: Minute limit resets every 60s, daily limit at midnight PST

What counts as a request:

  • Each message you send
  • Follow-ups count separately
  • Tool calls don't count extra

Typical usage examples:

Light user (10-20 messages/day):
→ Never hits limits ✅

Regular user (50-100 messages/day):
→ Well within limits ✅

Power user (200-500 messages/day):
→ May hit daily limit on busy days ⚠️

Heavy user (500+ messages/day):
→ Likely to hit the daily limit; consider the paid tier 🔴
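The per-minute quota can also be respected client-side with a sliding window of request timestamps. A minimal sketch (the class name and injectable clock are illustrative, not part of Astronox or the Gemini API):

```python
import time
from collections import deque

# Client-side sketch of a 60-requests-per-minute quota: keep a sliding
# 60-second window of request timestamps and report how long to wait.
class SlidingWindowLimiter:
    def __init__(self, max_per_minute=60, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock          # injectable so behavior is testable
        self.timestamps = deque()   # request times inside the window

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0 if free)."""
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_per_minute:
            return 0.0
        # The oldest request must leave the window before a new one fits
        return 60 - (now - self.timestamps[0])

    def record(self):
        """Call this after each request you send."""
        self.timestamps.append(self.clock())
```

Checking `wait_time()` before sending, and calling `record()` after, keeps a busy script under the per-minute cap without ever seeing a rate-limit error.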

Gemini Pro

Rate Limits:

  • 10 requests per minute
  • 500 requests per day

Same counting rules as Flash

Note: Lower limits due to higher model costs


What Happens When You Hit a Limit

Per-Minute Limit

Error message:

⚠️ Rate limit exceeded
Too many requests in the last minute.
Please wait 45 seconds before retrying.

[Retry in 45s]

What to do:

  • Wait for countdown
  • Or pause for 1 minute
  • Limit resets automatically

Prevention:

  • Batch related questions in one message
  • Avoid rapid-fire messages
  • Switch models if one is limited
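The wait-and-retry behavior behind the [Retry in 45s] button can be sketched as a backoff loop. Everything here is a hypothetical stand-in (`send`, `RateLimitError`, `retry_after`), not the actual Astronox or Gemini client API:

```python
import time

class RateLimitError(Exception):
    """Hypothetical error type; real clients surface 429 responses differently."""
    def __init__(self, retry_after=None):
        super().__init__("rate limit exceeded")
        self.retry_after = retry_after  # server-suggested wait in seconds

def send_with_retry(send, prompt, max_retries=3, base_wait=5):
    """Call a hypothetical send(prompt) function, waiting out rate limits.

    Prefers the server's retry-after hint; otherwise backs off
    exponentially (5s, 10s, 20s, ...).
    """
    for attempt in range(max_retries + 1):
        try:
            return send(prompt)
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            wait = e.retry_after if e.retry_after is not None else base_wait * 2 ** attempt
            time.sleep(wait)
```

Honoring the server's suggested wait (when provided) resets the per-minute window exactly once instead of burning retries against a still-full quota.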

Daily Limit

Error message:

⚠️ Daily quota exceeded
You've used 1,500 requests today.
Quota resets at midnight PST (8 hours).

Options:
• Wait for reset
• Enable billing for unlimited requests
• Switch to different model (separate quota)

[Enable Billing] [Learn More]

What to do:

  • Wait until midnight PST
  • Enable billing (see below)
  • Use MintAI if available

Enabling Paid Tier

When you need it:

  • Consistently hit daily limits
  • Need higher rate limits
  • Commercial/professional use
  • Don't want interruptions

How to enable:

  1. Visit Google Cloud Console
  2. Select your project (same as API key)
  3. Billing → Enable billing
  4. Add payment method

No changes needed in Astronox - same API key works


Pricing (Paid Tier)

As of December 2025:

Flash & Flash Lite

Input:

  • $0.075 per 1 million tokens
  • ~$0.0001 per typical message

Output:

  • $0.30 per 1 million tokens
  • ~$0.0003 per typical response

Total per request:

  • Simple query: ~$0.0005 (a twentieth of a cent)
  • Average request: ~$0.001 (a tenth of a cent)
  • Complex request: ~$0.003 (a third of a cent)
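These per-request figures follow directly from the token rates. A quick sketch of the arithmetic at the Flash prices quoted above (the "typical" token counts are assumptions, not measured values):

```python
# Per-request cost from token counts, at the Flash rates quoted above:
# $0.075 input / $0.30 output per million tokens.
FLASH_INPUT_PER_MILLION = 0.075
FLASH_OUTPUT_PER_MILLION = 0.30

def request_cost(input_tokens, output_tokens,
                 in_rate=FLASH_INPUT_PER_MILLION,
                 out_rate=FLASH_OUTPUT_PER_MILLION):
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# An assumed ~1,300 input tokens and ~1,000 output tokens lands near
# the simple-query figure above
simple = request_cost(1_300, 1_000)  # ≈ $0.0004
```

Swapping in the Pro rates ($1.25 / $5.00 per million) reproduces the Pro figures in the next section the same way.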

Gemini Pro

Input:

  • $1.25 per 1 million tokens
  • ~$0.0015 per typical message

Output:

  • $5.00 per 1 million tokens
  • ~$0.005 per typical response

Total per request:

  • Simple query: ~$0.002 (a fifth of a cent)
  • Average request: ~$0.005 (half a cent)
  • Complex request: ~$0.01 (one cent)

Cost Estimations

Light use (50 requests/day):

  • Flash: ~$1.50/month
  • Pro: ~$7.50/month

Regular use (200 requests/day):

  • Flash: ~$6/month
  • Pro: ~$30/month

Heavy use (500 requests/day):

  • Flash: ~$15/month
  • Pro: ~$75/month

Note: Actual costs vary based on:

  • Message complexity
  • Response length
  • Conversation context
  • Tool usage

Optimizing Costs

Use Appropriate Model

For simple tasks:

✅ "List files in Downloads" → Flash Lite (cheaper)
❌ Don't use Pro for simple queries

For complex tasks:

✅ "Analyze entire codebase" → Pro (worth the cost)
❌ Don't use Flash Lite for complex analysis

Batch Requests

Inefficient:

"Find PDFs"
"Now find images"
"Now find videos"

3 requests = 3× cost

Efficient:

"Find all PDFs, images, and videos"

1 request = 1× cost


Start Fresh Conversations

Long conversations use more tokens:

  • Every message includes previous context
  • 100-message conversation = huge context
  • Costs increase with conversation length

Solution:

  • Start new chat when switching topics
  • Refresh conversation every 50-100 messages
  • Use memory instead of long context

Use Memory Wisely

Instead of:

Every day: "Remember I prefer Python..."

Many requests = high cost

Do once:

"Remember I prefer Python"

Then AI uses memory (no repeated cost)


MintAI (Pro)

Subscription Model

How it works:

  • Fixed monthly/annual cost
  • No per-token billing
  • Unlimited requests (fair use policy)

Cost:

  • Check official MintAI website
  • Typically: $X/month or $Y/year

Fair Use Policy

Unlimited, but reasonable:

  • Normal usage: No restrictions
  • Excessive usage: May be throttled
  • Abuse prevention: Limits if detected

What's considered excessive:

  • Thousands of requests per hour
  • Obvious automation/botting
  • Sharing account

What's normal:

  • Hundreds of requests per day ✅
  • Heavy development work ✅
  • Regular personal/professional use ✅

When Subscription Makes Sense

Choose Pro if:

  • Heavy daily usage (>200 requests/day)
  • Prefer predictable costs
  • Want premium models
  • No per-use billing stress

Stick with Gemini free tier if:

  • Light usage (<100 requests/day)
  • Occasional use
  • No budget for subscription
  • Free tier sufficient

Cost comparison:

Gemini Paid (200 req/day):
~$6-30/month depending on model

Pro (unlimited):
Fixed price (check website)

Break-even depends on your usage

Token Limits

What Are Tokens?

Tokens ≈ words/characters:

  • 1 token ≈ 0.75 words (English)
  • 1,000 tokens ≈ 750 words
  • 100k tokens ≈ 75k words ≈ 150 pages
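The rule of thumb above can be turned into a quick estimator. This is a rough English-only heuristic; real tokenizers split text differently:

```python
# Rough token estimate from the "1 token ≈ 0.75 words" rule of thumb.
# Real tokenizers count punctuation, subwords, and whitespace
# differently, so treat this as an order-of-magnitude guide only.
def estimate_tokens(text):
    words = len(text.split())
    return round(words / 0.75)
```

For example, a 750-word document estimates to about 1,000 tokens, matching the table above.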

What counts toward limit:

  • Your messages
  • AI responses
  • Conversation history
  • System prompt (~2k tokens)
  • Attached file content

Model Context Limits

Flash & Flash Lite:

  • 128k tokens (~96k words)
  • About 200 pages of text
  • Conversation history + current message

Gemini Pro:

  • 2M tokens (~1.5M words)
  • Entire novels + large codebases
  • Massive conversation history

Devstral 2 (MintAI):

  • 128k tokens (~96k words)
  • Similar to Flash

What Happens at Limit

Automatic truncation:

Conversation gets too long
↓
Oldest messages removed from context
↓
AI still responds, but may forget early context
↓
Start new conversation if context important

You won't see errors - handled automatically
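The drop-oldest behavior described above can be sketched in a few lines. The `(role, text, tokens)` message format is hypothetical, not Astronox's internal representation:

```python
# Drop-oldest truncation: remove messages from the front of the
# conversation until the remaining history fits the token budget.
def truncate_history(messages, max_tokens):
    total = sum(tokens for _, _, tokens in messages)
    kept = list(messages)
    while kept and total > max_tokens:
        _, _, tokens = kept.pop(0)  # oldest message leaves the context
        total -= tokens
    return kept
```

Because truncation always removes the oldest messages first, anything you need the AI to retain long-term belongs in the memory system, not in the conversation history.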

Best practice:

  • Start new chat every 50-100 messages
  • Or when switching topics
  • Critical context → use memory system

Monitoring Usage

In Google AI Studio

View statistics:

  1. Visit Google AI Studio
  2. Select your project
  3. View dashboard:
    • Requests today
    • Requests this month
    • Token usage
    • Cost (if billing enabled)

In Astronox (Future)

Coming soon:

Settings → API Keys → View Usage

Today:
• Requests: 47 / 1,500 (3%)
• Estimated cost: $0.12

This month:
• Requests: 1,247
• Total cost: $3.42

Rate Limit Strategies

Heavy Usage Patterns

If you hit limits regularly:

Option 1: Enable billing

  • Removes daily limits
  • Pay per use
  • Simple solution

Option 2: Use multiple models

  • Flash has separate quota from Pro
  • Flash Lite has separate quota from Flash
  • Switch between them

Option 3: Optimize usage

  • Batch requests
  • Start fresh conversations
  • Use memory system

Option 4: Subscription

  • Pro for predictable cost
  • Unlimited Devstral 2 usage

Mixing Free & Paid

Strategy:

Free tier: Simple daily tasks (Flash Lite/Flash)
→ Stay within 1,500/day

Paid tier (when needed): Complex analysis (Pro)
→ Pay only for special tasks

Result: Minimize costs, maximize value

Billing Alerts

Set Up Notifications

In Google Cloud Console:

  1. Billing → Budgets & alerts
  2. Set monthly budget (e.g., $10)
  3. Alert thresholds:
    • 50% of budget
    • 90% of budget
    • 100% of budget

Receive emails when thresholds hit

Prevents surprise bills!


Cost Control Tips

✅ Use Flash by Default

Flash is the sweet spot:

  • Fast responses
  • Good quality
  • Low cost
  • High daily limits

Reserve Pro for when truly needed


✅ Clear Conversations

Weekly: Review and delete old conversations
Benefit: Reduces storage, keeps app fast

✅ Don't Over-Prompt

Wasteful:

"Please help me find files"
"I would appreciate it if you could..."

Efficient:

"Find files in Downloads"

Same result, fewer tokens


✅ Use File Paths

Instead of attaching:

"Analyze the image at ~/Desktop/screenshot.png"

Benefits:

  • No upload needed
  • Faster
  • Doesn't count toward attachment limit

❌ Avoid Long Conversations

Every message includes history:

Message 1: 100 tokens
Message 50: 5,000 tokens (includes all previous)
Message 100: 10,000 tokens

Per-message cost grows linearly, so total cost grows quadratically with conversation length!
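Assuming ~100 tokens per message (the same figure used above), a quick sketch shows how the running total grows when every message resends the full history:

```python
# Message k carries its k-1 predecessors plus itself, so message k
# costs about k * per_message tokens; the running total is therefore
# roughly quadratic in conversation length. The 100-tokens-per-message
# figure is an assumption for illustration.
def cumulative_tokens(n_messages, per_message=100):
    return sum(k * per_message for k in range(1, n_messages + 1))

one_long_chat = cumulative_tokens(100)       # 505,000 tokens
two_fresh_chats = 2 * cumulative_tokens(50)  # 255,000 tokens
```

Splitting one 100-message conversation into two fresh 50-message chats roughly halves the total tokens sent, which is why starting new conversations is one of the cheapest optimizations available.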


Fair Usage

What's Reasonable

Normal patterns:

  • Work hours: 20-50 requests/hour ✅
  • Peak usage: 100 requests/hour ✅
  • Daily total: 500-1000 requests ✅

Concerning patterns:

  • Constant automated requests 🔴
  • Identical repeated requests 🔴
  • Account sharing 🔴

API Terms of Service

Review Google's terms:

  • Fair use expectations
  • Prohibited uses
  • Rate limit policies
  • Billing terms

Link: https://ai.google.dev/terms


Summary

Free tier (Gemini):

  • 60 req/min, 1,500 req/day (Flash)
  • Generous for most users
  • Enable billing if needed

Paid tier:

  • ~$0.001 per request (Flash)
  • ~$0.005 per request (Pro)
  • Predictable, affordable

Pro:

  • Fixed subscription
  • Unlimited Devstral 2
  • Best for heavy users

Optimize:

  • Use appropriate model
  • Batch requests
  • Fresh conversations
  • Memory instead of context

Next: Understand Accuracy & Reliability expectations.