# Gemini workflows (April 2026)

**Last updated:** 2026-04-10

Single reference for **which models the repo uses**, **how to override them**, **cost levers**, and **where code lives**. Official SSOT for lifecycle and pricing: [Gemini API deprecations](https://ai.google.dev/gemini-api/docs/deprecations) and [Gemini API pricing](https://ai.google.dev/gemini-api/docs/pricing).

---

## Pricing estimates (order of magnitude)

**Always reconcile with the live page:** rates, tiers (Standard / Batch / Flex / Priority), and “thinking” output billing change. Figures below use **Paid tier → Standard** from [Gemini API pricing](https://ai.google.dev/gemini-api/docs/pricing) as scraped 2026-04-10; **your invoice is authoritative**.

### Reference rates (Standard, USD per 1M tokens unless noted)

| Model | Input (text / image / video) | Output (incl. thinking where applicable) | Image generation output |
|-------|-------------------------------|--------------------------------------------|-------------------------|
| `gemini-2.5-flash` | $0.30 / 1M | $2.50 / 1M | — |
| `gemini-2.5-flash-lite` | $0.10 / 1M | $0.40 / 1M | — |
| `gemini-2.5-flash-image` | $0.30 / 1M (text + image *input*) | Priced **per generated image** (not token-only) | **~$0.039 / image** (doc: 1024×1024 ≈ 1290 output tokens @ $30/1M image output tokens) |

**Batch API** (offline jobs): same page lists roughly **half** the Standard per-1M rates for several models — use for large backfills, not live FAQ flows.

**Rough cost formula (text + vision JSON, no grounding):**

`cost_usd ≈ (input_tokens / 1e6) × price_in + (output_tokens / 1e6) × price_out`

Token counts are **approximate** (prompt length, context excerpt, retries, and **thinking tokens** on 2.5 Flash all increase the billed output).
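
The formula above can be sketched as a small helper. The rates are the Standard per-1M figures from the table (scraped 2026-04-10) and may be stale; reconcile with the live pricing page before budgeting. The function name is illustrative, not a repo helper.

```python
# Sketch of the cost formula above. RATES_PER_1M values come from the
# reference-rates table in this doc and may be stale -- reconcile with
# the live pricing page before budgeting.

RATES_PER_1M = {  # (input_usd, output_usd) per 1M tokens, Standard tier
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate Standard-tier cost for one text call (no grounding)."""
    price_in, price_out = RATES_PER_1M[model]
    return (input_tokens / 1e6) * price_in + (output_tokens / 1e6) * price_out

# The worked FAQ example below this section: 1.5k in, 400 out on 2.5 Flash.
print(round(estimate_cost_usd("gemini-2.5-flash", 1500, 400), 5))  # -> 0.00145
```

Remember that thinking tokens count toward `output_tokens` on 2.5 Flash, so real bills trend above this estimate.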

### Per workflow (this repo)

Assumptions are **indicative** so you can compare processes; multiply by volume for budgets.

| Process / tool | Typical model | Ballpark tokens per call | Order-of-magnitude cost (Standard) |
|----------------|---------------|--------------------------|-------------------------------------|
| **Blog FAQ answer** (`generate-faq-answers-optimized.php`) | 2.5 Flash → fallback Lite | ~1–3k in, ~0.2–1k out (1024 cap then tiers); retries add calls | **~$0.001–0.004** per answer |
| **Product page FAQ answer** (`generate-product-faq-answers.php`) | 2.5 Flash | ~1–2k in, ~0.2–1k out | **~$0.001–0.003** per answer |
| **Template AI block** (`template-content-gemini-generator.php`) | 2.5 Flash | ~1.5–3k in, ~0.4–2k out (longer HTML) | **~$0.002–0.008** per block |
| **Product Updates SEO meta** (`produkt-updates-seo-generate.php`) | 2.5 Flash | ~0.5–2k in, ≤200 out | **~$0.0003–0.001** per request |
| **Firmennamen API** (`generate-company-names.php`) | Default **Flash-Lite** (`FIRMENNAMEN_MODEL`); rate limits + CORS + industry allowlist; **1–10 names** per request | ~0.5–1.5k in, **~2k–3.5k** out (scaled by `count` ≤10) | **~$0.002–0.015** on Flash; **~$0.001–0.002** on Flash-Lite (same tokens, lower rates) |
| **Screenshot / asset analysis** (nano, mobile, partner Python scripts) | 2.5 Flash | **Image + prompt:** often ~2–5k+ input-equivalent; **~0.5–2k** JSON out (see script `maxOutputTokens`) | **~$0.002–0.015** per image per pass |
| **Multi-pass Nano analysis** (`analyze-nano-screenshots.py`) | 2.5 Flash | **5 passes ×** per screen | **~5×** single-pass estimate per PNG |
| **Blog featured image** (`generate-blog-featured-image.py`) | 2.5 Flash Image | Prompt text negligible vs image | **~$0.04 / image** (+ input pennies) |
| **OG image (Gemini path)** (`generate-og-image-gemini.py`) | 2.5 Flash Image | Same | **~$0.02–0.04 / image** depending on tier (Batch row is lower; check page) |
| **Quota / smoke tests** (`check-gemini-quota.py`, `test-gemini-models.php`) | Flash / Lite | Tiny | **Under $0.001** per run |

**Examples (2.5 Flash text, no retries):** rates are **per 1M tokens**; below, `1.5` and `0.4` are **thousands** of tokens (1.5k in, 400 out).

- `(1.5 × 0.30 + 0.4 × 2.50) / 1000` ≈ **$0.00145** per call.
- 12 FAQ answers × ~$0.002 ≈ **$0.02–0.05** per post (order of magnitude; thinking + retries increase this).

---

## Model policy (defaults)

| Use case | Default model | Env override |
|----------|---------------|--------------|
| German FAQ, template blocks, product FAQ, SEO meta (text) | `gemini-2.5-flash` | `GEMINI_TEXT_MODEL` |
| Second attempt / cheaper text tier | `gemini-2.5-flash-lite` | `GEMINI_TEXT_MODEL_FALLBACK` |
| Vision + JSON (screenshots, partner assets, mobile) | `gemini-2.5-flash` | `GEMINI_VISION_MODEL` |
| Blog featured + OG image generation | `gemini-2.5-flash-image` | `GEMINI_IMAGE_MODEL` |

**PHP** reads the same env vars via `v2/config/gemini-models.php` (`ordio_gemini_text_model_primary()`, etc.). **Python** vision scripts use `GEMINI_VISION_MODEL` with default `gemini-2.5-flash`. **Do not** use `gemini-2.0-flash` in new code — Google deprecated it with shutdown **2026-06-01**; migrate to 2.5 Flash or Flash-Lite per deprecations page.
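
The override pattern described above (env var wins, otherwise the documented default) can be sketched as follows. `resolve_model` is an illustrative name, not the repo's actual helper; the defaults match the table.

```python
import os

# Sketch of the env-override pattern: the env var wins when set and
# non-empty, otherwise fall back to the documented default.
def resolve_model(env_var: str, default: str) -> str:
    return os.environ.get(env_var, "").strip() or default

text_model   = resolve_model("GEMINI_TEXT_MODEL", "gemini-2.5-flash")
vision_model = resolve_model("GEMINI_VISION_MODEL", "gemini-2.5-flash")
image_model  = resolve_model("GEMINI_IMAGE_MODEL", "gemini-2.5-flash-image")
```

Treating an empty string as "unset" avoids silently sending a blank model ID when an env var is exported but left empty.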

**Thinking tokens:** On 2.5/3.x Flash families, “thinking” may be billed as output; keep `maxOutputTokens` tight where answers are short (FAQ scripts use 1024 initial tier with escalation).
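
Keeping the cap tight looks like this in a `generateContent` request body; the tier values mirror the FAQ scripts' 1024-with-escalation pattern, while `build_request` itself is an illustrative sketch, not a repo function.

```python
# Sketch of a tight output cap, as recommended above. The payload shape
# follows the generateContent REST API; TIERS mirrors the FAQ scripts'
# escalation pattern (start low, raise only on truncation).

TIERS = [1024, 2048, 8192]  # escalate only when a response is truncated

def build_request(prompt: str, max_output_tokens: int) -> dict:
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }

req = build_request("Answer briefly: ...", TIERS[0])
```

Because thinking tokens bill as output, a loose cap raises the ceiling on what a single verbose call can cost, not just its length.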

---

## Central config

- `v2/config/gemini-models.php` — text primary/fallback helpers, vision/image defaults, `ordio_gemini_generate_content_url()`.
- **API keys (two environments):** `GEMINI_LOCAL_API_KEY` + legacy local `GEMINI_API_KEY` for dev; **`GEMINI_API_KEY` only** on production (`ordio_gemini_is_production_context()` in `v2/config/gemini-environment.php`). Docs: [`GEMINI_API_KEY_LOCAL.md`](GEMINI_API_KEY_LOCAL.md), [`GEMINI_API_KEY_PRODUCTION.md`](GEMINI_API_KEY_PRODUCTION.md). Resolver: `ordio_get_gemini_api_key()` in `v2/config/ai-faq-config.php`; Python: `v2/scripts/gemini_api_key.py`.
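
The two-environment key policy above can be sketched like this. The real resolvers are `ordio_get_gemini_api_key()` (PHP) and `v2/scripts/gemini_api_key.py`; the function below, and the assumption that local dev prefers `GEMINI_LOCAL_API_KEY` over the legacy name, are illustrative.

```python
import os

# Sketch of the two-environment key policy. Assumption: local dev
# prefers GEMINI_LOCAL_API_KEY and falls back to the legacy
# GEMINI_API_KEY; production reads GEMINI_API_KEY only.
def resolve_api_key(is_production: bool) -> str:
    if is_production:
        key = os.environ.get("GEMINI_API_KEY", "")
    else:
        key = os.environ.get("GEMINI_LOCAL_API_KEY") or os.environ.get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("No Gemini API key configured for this environment")
    return key
```

Failing loudly on a missing key beats letting a request go out unauthenticated and surfacing as a confusing 403.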

---

## Scripts (inventory)

| Area | Entry point | Notes |
|------|-------------|--------|
| Blog FAQ | `v2/scripts/blog/generate-faq-answers-optimized.php` | `curl_multi` concurrency; token tiers 1024→8192 on truncation |
| Product FAQ | `v2/scripts/product-pages/generate-product-faq-answers.php` | Uses `ordio_gemini_text_model_primary()` |
| Shared FAQ helper | `v2/helpers/faq-gemini-generator.php` | Same text defaults |
| Template AI blocks | `v2/helpers/template-content-gemini-generator.php`, `v2/config/ai-template-content-config.php` | Longer `maxOutputTokens` (2048) for 150–250 word blocks |
| Product Updates SEO | `v2/api/produkt-updates-seo-generate.php` | Short meta lines; `maxOutputTokens` 200 |
| Firmennamen API | `v2/api/generate-company-names.php` | `FIRMENNAMEN_MODEL` (default `flash-lite`); **1–10 names** per request; IP rate limits, CORS; legacy `flash-2.0` → Flash |
| Nano / mobile / partner vision | `v2/scripts/nano-ai/analyze-nano-screenshots.py`, `v2/scripts/mobile-app/analyze-screenshots-*.py`, `v2/scripts/partner/analyze-assets.py` | `--model` + `GEMINI_VISION_MODEL` |
| Quota check | `v2/scripts/nano-ai/check-gemini-quota.py` | `GEMINI_TEXT_MODEL` for probe call |
| Blog image | `v2/scripts/blog/generate-blog-featured-image.py` | `GEMINI_IMAGE_MODEL` |
| OG image | `v2/scripts/og-images/generate-og-image-gemini.py` | `GEMINI_IMAGE_MODEL` (see script docstring) |
| Model smoke test | `v2/scripts/tools/test-gemini-models.php` | Compares 2.5 Flash vs Flash-Lite only |

---

## Cost and quality levers

1. **Lower `maxOutputTokens` for short outputs** — already implemented for FAQ flows (1024 default with escalation). Template blocks keep higher limits.
2. **Concurrency (PHP)** — `generate-faq-answers-optimized.php --concurrency=N` (default 3); reduce if you see 429.
3. **Batch API** — ~50% vs standard on many models for **offline** bulk jobs; not for interactive single-post runs. See [Batch API](https://ai.google.dev/gemini-api/docs/batch-api).
4. **Context caching** — Useful when the **same long context** is sent many times (e.g. one post body × many FAQ rows). Larger implementation; see [Caching](https://ai.google.dev/gemini-api/docs/caching).
5. **Image models** — Priced per image as well as tokens on some tiers. Default `gemini-2.5-flash-image`; optional upgrade via `GEMINI_IMAGE_MODEL` (e.g. preview/pro image IDs) when marketing approves spend — verify current rows on the [pricing](https://ai.google.dev/gemini-api/docs/pricing) page.
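
Lever 1's escalation can be sketched as a retry loop that raises the cap only when the API reports truncation. `call_gemini` is a placeholder for the repo's actual HTTP helper; the `finishReason` value follows the `generateContent` response format.

```python
# Sketch of lever 1: retry with a higher maxOutputTokens only when the
# response was cut off (finishReason == "MAX_TOKENS"). call_gemini is a
# placeholder for the repo's actual HTTP helper.

def generate_with_escalation(prompt, call_gemini, tiers=(1024, 8192)):
    resp = None
    for cap in tiers:
        resp = call_gemini(prompt, max_output_tokens=cap)
        if resp.get("finishReason") != "MAX_TOKENS":
            return resp  # finished normally (or failed for another reason)
    return resp  # best effort at the top tier
```

Most short answers never leave the bottom tier, so the occasional double call is far cheaper than running every request at 8192.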

---

## Vision: Flash vs Flash-Lite

Default vision model is **`gemini-2.5-flash`** for structured JSON quality on screenshots. To experiment with **`gemini-2.5-flash-lite`**, set `GEMINI_VISION_MODEL` and run your usual validation on a small fixture set before batch jobs.
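
A small-fixture experiment might look like this; the script path and `--model` flag come from the inventory table above, and exact flags may vary per script.

```shell
# Point vision scripts at Flash-Lite for a fixture run before any batch job.
export GEMINI_VISION_MODEL=gemini-2.5-flash-lite
python3 v2/scripts/nano-ai/analyze-nano-screenshots.py --model "$GEMINI_VISION_MODEL"
```

Compare the JSON outputs against a known-good Flash run before switching any batch workflow.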

---

## Verification

```bash
python3 v2/scripts/nano-ai/check-gemini-quota.py
php v2/scripts/tools/test-gemini-models.php
```

After prompt or model changes: spot-check FAQs (`validate-faq-quality` / `validate-faq-answers.py` as applicable) and 1–2 vision outputs.

---

## References

- [Rate limits](https://ai.google.dev/gemini-api/docs/rate-limits)
- [Batch API](https://ai.google.dev/gemini-api/docs/batch-api)
- [Context caching](https://ai.google.dev/gemini-api/docs/caching)
