Rate Limits & Costs
Understanding API usage, quotas, and pricing.
Overview
Astronox uses cloud AI models, which have:
- Rate limits (requests per minute/day)
- Token usage (affects cost if on paid tier)
- Context limits (max conversation size)
Google Gemini
Free Tier Limits
Flash & Flash Lite
Rate Limits:
- 60 requests per minute
- 1,500 requests per day
- Resets: The per-minute limit resets every 60 seconds; the daily limit resets at midnight PST
What counts as a request:
- Each message you send
- Follow-ups count separately
- Tool calls don't count extra
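To make the counting rules concrete, here is a minimal client-side sketch that tallies requests against the Flash free-tier caps (60 per minute, 1,500 per day). The `RequestTracker` class and its method names are hypothetical, and the sliding 60-second window is an assumption; the actual quota is enforced server-side and may use a different window.

```python
from collections import deque


class RequestTracker:
    """Hypothetical client-side tally of per-minute and per-day request caps."""

    def __init__(self, per_minute=60, per_day=1500):
        self.per_minute = per_minute
        self.per_day = per_day
        self.recent = deque()  # timestamps (seconds) inside the last 60 s
        self.today = 0         # requests recorded since the last daily reset

    def allow(self, now):
        """Return True and record the request if both caps permit it."""
        # Drop timestamps that have aged out of the 60-second window.
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()
        if len(self.recent) >= self.per_minute or self.today >= self.per_day:
            return False
        self.recent.append(now)
        self.today += 1
        return True

    def reset_day(self):
        """Call this when the daily quota resets."""
        self.today = 0
```

Each message and each follow-up would call `allow()` once; tool calls would not, matching the rules above.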
Typical usage examples:
Light user (10-20 messages/day):
→ Never hits limits ✅
Regular user (50-100 messages/day):
→ Well within limits ✅
Power user (200-500 messages/day):
→ Under the 1,500/day cap, but bursts may trip the per-minute limit ⚠️
Heavy user (500+ messages/day):
→ Can reach the daily limit as usage nears 1,500; consider the paid tier 🔴
Gemini Pro
Rate Limits:
- 10 requests per minute
- 500 requests per day
Same counting rules as Flash
Note: Lower limits due to higher model costs
What Happens When You Hit Limit
Per-Minute Limit
Error message:
⚠️ Rate limit exceeded
Too many requests in the last minute.
Please wait 45 seconds before retrying.
[Retry in 45s]
What to do:
- Wait for countdown
- Or pause for 1 minute
- Limit resets automatically
Prevention:
- Batch related questions in one message
- Avoid rapid-fire messages
- Switch models if one is limited
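If you script against the API yourself, the "wait for countdown" advice can be sketched as a small retry helper. This is an illustration only: `RateLimitError`, its `retry_after` field, and `send_with_retry` are hypothetical names, not part of Astronox or any Google SDK.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error carrying the wait time reported with the limit."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after


def send_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send(); on a rate-limit error, wait the reported time and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(err.retry_after)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.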
Daily Limit
Error message:
⚠️ Daily quota exceeded
You've used 1,500 requests today.
Quota resets at midnight PST (8 hours).
Options:
• Wait for reset
• Enable billing for unlimited requests
• Switch to different model (separate quota)
[Enable Billing] [Learn More]
What to do:
- Wait until midnight PST
- Enable billing (see below)
- Use MintAI if available
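As a rough sketch, the time remaining until the daily reset can be computed like this. It assumes "PST" means a fixed UTC-8 offset; if the reset actually follows Pacific daylight time, the result would be off by an hour in summer. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset, ignoring daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")


def time_until_reset(now_utc):
    """Return the timedelta from now_utc until the next midnight PST."""
    local = now_utc.astimezone(PST)
    next_midnight = (local + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_midnight - local


# 16:00 PST is 00:00 UTC the next day, which leaves 8 hours until reset.
print(time_until_reset(datetime(2025, 12, 1, 0, 0, tzinfo=timezone.utc)))
```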
Enabling Paid Tier
When you need it:
- Consistently hit daily limits
- Need higher rate limits
- Commercial/professional use
- Don't want interruptions
How to enable:
- Visit Google Cloud Console
- Select your project (same as API key)
- Billing → Enable billing
- Add payment method
No changes needed in Astronox - same API key works
Pricing (Paid Tier)
As of December 2025:
Flash & Flash Lite
Input:
- $0.075 per 1 million tokens
- ~$0.0001 per typical message
Output:
- $0.30 per 1 million tokens
- ~$0.0003 per typical response
Total per request:
- Simple query: ~$0.0005 (a twentieth of a cent)
- Average request: ~$0.001 (a tenth of a cent)
- Complex request: ~$0.003 (about a third of a cent)
Gemini Pro
Input:
- $1.25 per 1 million tokens
- ~$0.0015 per typical message
Output:
- $5.00 per 1 million tokens
- ~$0.005 per typical response
Total per request:
- Simple query: ~$0.002
- Average request: ~$0.005 (half a cent)
- Complex request: ~$0.01 (one cent)
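The per-request figures follow from the per-million-token prices listed above. A minimal sketch of the arithmetic, where the token counts in the example are illustrative assumptions and `request_cost` is a hypothetical helper:

```python
# USD per 1 million tokens: (input, output), as listed above.
PRICES = {
    "flash": (0.075, 0.30),
    "pro": (1.25, 5.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Assumed example: 1,000 tokens in and 1,000 tokens out.
print(f"Flash: ${request_cost('flash', 1000, 1000):.6f}")
print(f"Pro:   ${request_cost('pro', 1000, 1000):.6f}")
```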
Cost Estimations
Light use (50 requests/day):
- Flash: ~$1.50/month
- Pro: ~$7.50/month
Regular use (200 requests/day):
- Flash: ~$6/month
- Pro: ~$30/month
Heavy use (500 requests/day):
- Flash: ~$15/month
- Pro: ~$75/month
Note: Actual costs vary based on:
- Message complexity
- Response length
- Conversation context
- Tool usage
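The monthly estimates above are requests per day, times the average cost per request, times 30 days. A short sketch, assuming the average costs quoted earlier (~$0.001 for Flash, ~$0.005 for Pro) and a hypothetical `monthly_cost` helper:

```python
def monthly_cost(requests_per_day, cost_per_request, days=30):
    """Return the estimated USD cost for a month of steady usage."""
    return requests_per_day * cost_per_request * days


for label, per_day in [("Light", 50), ("Regular", 200), ("Heavy", 500)]:
    flash = monthly_cost(per_day, 0.001)
    pro = monthly_cost(per_day, 0.005)
    print(f"{label}: Flash ~${flash:.2f}/month, Pro ~${pro:.2f}/month")
```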
Optimizing Costs
Use Appropriate Model
For simple tasks:
✅ "List files in Downloads" → Flash Lite (cheaper)
❌ Don't use Pro for simple queries
For complex tasks:
✅ "Analyze entire codebase" → Pro (worth the cost)
❌ Don't use Flash Lite for complex analysis
Batch Requests
Inefficient:
"Find PDFs"
"Now find images"
"Now find videos"
3 requests = 3× cost
Efficient:
"Find all PDFs, images, and videos"
1 request = 1× cost
Start Fresh Conversations
Long conversations use more tokens:
- Every message includes previous context
- 100-message conversation = huge context
- Costs increase with conversation length
Solution:
- Start new chat when switching topics
- Refresh conversation every 50-100 messages
- Use memory instead of long context
Use Memory Wisely
Instead of:
Every day: "Remember I prefer Python..."
Many requests = high cost
Do once:
"Remember I prefer Python"
Then AI uses memory (no repeated cost)
MintAI (Pro)
Subscription Model
How it works:
- Fixed monthly/annual cost
- No per-token billing
- Unlimited requests (fair use policy)
Cost:
- Check official MintAI website
- Typically: $X/month or $Y/year
Fair Use Policy
Unlimited, but reasonable:
- Normal usage: No restrictions
- Excessive usage: May be throttled
- Abuse prevention: Limits if detected
What's considered excessive:
- Thousands of requests per hour
- Obvious automation/botting
- Sharing account
What's normal:
- Hundreds of requests per day ✅
- Heavy development work ✅
- Regular personal/professional use ✅
When Subscription Makes Sense
Choose MintAI Pro if:
- Heavy daily usage (>200 requests/day)
- Prefer predictable costs
- Want premium models
- No per-use billing stress
Stick with Gemini free tier if:
- Light usage (<100 requests/day)
- Occasional use
- No budget for subscription
- Free tier sufficient
Cost comparison:
Gemini Paid (200 req/day):
~$6-30/month depending on model
MintAI Pro (unlimited):
Fixed price (check website)
Break-even depends on your usage
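One way to find your own break-even point is to divide the subscription price by the monthly cost of a single daily request. The subscription price used below is a made-up placeholder, not MintAI's actual price; check the official website for the real figure. `break_even_requests_per_day` is a hypothetical helper.

```python
def break_even_requests_per_day(subscription_per_month, cost_per_request, days=30):
    """Return the daily request count at which pay-per-use equals the subscription."""
    return subscription_per_month / (cost_per_request * days)


# Placeholder price of $15/month, compared against Flash at ~$0.001 per request.
print(round(break_even_requests_per_day(15.0, 0.001)))
```

Below the resulting figure, pay-per-use is cheaper; above it, the fixed price wins.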
Token Limits
What Are Tokens?
Tokens are small chunks of text, roughly word fragments:
- 1 token ≈ 0.75 words (English)
- 1,000 tokens ≈ 750 words
- 100k tokens ≈ 75k words ≈ 150 pages
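The 0.75 words-per-token rule of thumb can be turned into a quick estimator. This is only an approximation for English text; real tokenizers will give different counts, and `estimate_tokens` is a hypothetical helper.

```python
def estimate_tokens(text, words_per_token=0.75):
    """Roughly estimate the token count from whitespace-separated words."""
    words = len(text.split())
    return round(words / words_per_token)


print(estimate_tokens("Find all PDFs, images, and videos"))
```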
What counts toward limit:
- Your messages
- AI responses
- Conversation history
- System prompt (~2k tokens)
- Attached file content
Model Context Limits
Flash & Flash Lite:
- 128k tokens (~96k words)
- About 200 pages of text
- Conversation history + current message
Gemini Pro:
- 2M tokens (~1.5M words)
- Entire novels + large codebases
- Massive conversation history
Devstral 2 (MintAI):
- 128k tokens (~96k words)
- Similar to Flash
What Happens at Limit
Automatic truncation:
Conversation gets too long
↓
Oldest messages removed from context
↓
AI still responds, but may forget early context
↓
Start new conversation if context important
You won't see errors - handled automatically
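The truncation flow can be sketched as dropping the oldest messages until the history fits. This is an illustrative assumption about the behavior, not Astronox's actual implementation; `truncate_history` and the `count_tokens` callback are hypothetical.

```python
def truncate_history(messages, limit, count_tokens):
    """Drop the oldest messages until the total token count fits the limit."""
    kept = list(messages)
    total = sum(count_tokens(m) for m in kept)
    # Always keep at least the most recent message.
    while len(kept) > 1 and total > limit:
        total -= count_tokens(kept.pop(0))
    return kept


# Example with a crude word-count stand-in for a real tokenizer.
history = ["my project uses Python", "list files", "thanks"]
print(truncate_history(history, limit=3, count_tokens=lambda m: len(m.split())))
```

In the example, the first message is dropped, which is exactly the "may forget early context" effect described above.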
Best practice:
- Start new chat every 50-100 messages
- Or when switching topics
- Critical context → use memory system
Monitoring Usage
In Google AI Studio
View statistics:
- Visit Google AI Studio
- Select your project
- View dashboard:
- Requests today
- Requests this month
- Token usage
- Cost (if billing enabled)
In Astronox (Future)
Coming soon:
Settings → API Keys → View Usage
Today:
• Requests: 47 / 1,500 (3%)
• Estimated cost: $0.12
This month:
• Requests: 1,247
• Total cost: $3.42
Rate Limit Strategies
Heavy Usage Patterns
If you hit limits regularly:
Option 1: Enable billing
- Lifts the free-tier daily cap
- Pay per use
- Simple solution
Option 2: Use multiple models
- Flash has separate quota from Pro
- Flash Lite has separate quota from Flash
- Switch between them
Option 3: Optimize usage
- Batch requests
- Start fresh conversations
- Use memory system
Option 4: Subscription
- MintAI Pro for predictable cost
- Unlimited Devstral 2 usage
Mixing Free & Paid
Strategy:
Free tier: Simple daily tasks (Flash Lite/Flash)
→ Stay within 1,500/day
Paid tier (when needed): Complex analysis (Pro)
→ Pay only for special tasks
Result: Minimize costs, maximize value
Billing Alerts
Set Up Notifications
In Google Cloud Console:
- Billing → Budgets & alerts
- Set monthly budget (e.g., $10)
- Alert thresholds:
- 50% of budget
- 90% of budget
- 100% of budget
Receive emails when thresholds hit
Prevents surprise bills!
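The alert thresholds behave like a simple comparison of spend against fractions of the budget. The sketch below only illustrates that logic; it does not call Google Cloud, and `crossed_thresholds` is a hypothetical helper.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.9, 1.0)):
    """Return the budget fractions that current spend has reached."""
    return [t for t in thresholds if spend >= budget * t]


# With a $10 budget, $9.50 of spend has passed the 50% and 90% alerts.
print(crossed_thresholds(9.50, 10.0))
```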
Cost Control Tips
✅ Use Flash by Default
Flash is the sweet spot:
- Fast responses
- Good quality
- Low cost
- High daily limits
Reserve Pro for when truly needed
✅ Clear Conversations
Weekly: Review and delete old conversations
Benefit: Reduces storage, keeps app fast
✅ Don't Over-Prompt
Wasteful:
"Please help me find files"
"I would appreciate it if you could..."
Efficient:
"Find files in Downloads"
Same result, fewer tokens
✅ Use File Paths
Instead of attaching:
"Analyze the image at ~/Desktop/screenshot.png"
Benefits:
- No upload needed
- Faster
- Doesn't count toward attachment limit
❌ Avoid Long Conversations
Every message includes history:
Message 1: 100 tokens
Message 50: 5,000 tokens (includes all previous)
Message 100: 10,000 tokens
Per-message cost grows linearly with history, so the total cost of a conversation grows roughly quadratically!
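To see how the totals add up, assume every message adds about 100 tokens and each request resends the full history, as in the per-message figures above. The helper name and the flat 100-token figure are illustrative assumptions.

```python
def cumulative_input_tokens(num_messages, tokens_per_message=100):
    """Sum the input tokens sent over a conversation that resends all history."""
    # Message k carries k messages' worth of tokens.
    return sum(k * tokens_per_message for k in range(1, num_messages + 1))


for n in (1, 50, 100):
    print(n, cumulative_input_tokens(n))
```

Under these assumptions, a 100-message conversation sends about four times as many input tokens in total as a 50-message one, which is why splitting long chats helps.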
Fair Usage
What's Reasonable
Normal patterns:
- Work hours: 20-50 requests/hour ✅
- Peak usage: 100 requests/hour ✅
- Daily total: 500-1000 requests ✅
Concerning patterns:
- Constant automated requests 🔴
- Identical repeated requests 🔴
- Account sharing 🔴
API Terms of Service
Review Google's terms:
- Fair use expectations
- Prohibited uses
- Rate limit policies
- Billing terms
Link: https://ai.google.dev/terms
Summary
Free tier (Gemini):
- 60 req/min, 1,500 req/day (Flash)
- Generous for most users
- Enable billing if needed
Paid tier:
- ~$0.001 per request (Flash)
- ~$0.005 per request (Pro)
- Pay per use, affordable at typical volumes
MintAI Pro:
- Fixed subscription
- Unlimited Devstral 2
- Best for heavy users
Optimize:
- Use appropriate model
- Batch requests
- Fresh conversations
- Memory instead of context
Next: Understand Accuracy & Reliability expectations.