
# Benchmarking

> Load test and benchmark your TTS On-Premises deployment

This page describes a load testing tool for TTS On-Premises. It measures latency, throughput, and streaming characteristics across a range of QPS (queries per second) levels.

## Overview

The tool simulates realistic TTS workloads by sending requests at specified rates with configurable burstiness patterns. It measures:

* End-to-end latency
* Audio generation latency per second of generated audio
* Streaming metrics (first chunk, 4th chunk, average chunk latencies)
* Request success rates
* Server performance under different load conditions

## Quick start

```bash theme={"system"}
# Install the load test tool
pip install -e .

# Basic load test with streaming
python load-test.main \
    --host http://localhost:8081 \
    --stream \
    --min-qps 1.0 \
    --max-qps 7.0 \
    --qps-step 2.0 \
    --number-of-samples 300
```

## Parameters

### Required

| Parameter | Description                                                         | Example                 |
| --------- | ------------------------------------------------------------------- | ----------------------- |
| `--host`  | Base address of the On-Premises TTS server (endpoint auto-appended) | `http://localhost:8081` |

### Load configuration

| Parameter             | Default | Description                                                                   |
| --------------------- | ------: | ----------------------------------------------------------------------------- |
| `--min-qps`           |   `1.0` | Minimum requests per second to test                                           |
| `--max-qps`           |  `10.0` | Maximum requests per second to test                                           |
| `--qps-step`          |   `2.0` | Step size for QPS increments                                                  |
| `--number-of-samples` |     `1` | Total number of texts to synthesize per QPS level                             |
| `--burstiness`        |   `1.0` | Request timing pattern (`1.0` = Poisson, `< 1.0` = bursty, `> 1.0` = uniform) |
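
To make the sweep concrete, here is a minimal sketch of how the three QPS flags could expand into test levels. It assumes the upper bound is inclusive, which this page does not state. `qps_levels` is a hypothetical helper for illustration, not part of the tool.

```python
def qps_levels(min_qps: float, max_qps: float, step: float) -> list[float]:
    """Expand a QPS range into discrete levels (assumes max_qps is inclusive)."""
    if step <= 0:
        raise ValueError("step must be positive")
    levels = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while min_qps + i * step <= max_qps + 1e-9:
        levels.append(round(min_qps + i * step, 6))
        i += 1
    return levels


# Quick start values: --min-qps 1.0 --max-qps 7.0 --qps-step 2.0
print(qps_levels(1.0, 7.0, 2.0))  # [1.0, 3.0, 5.0, 7.0]
```

Under the same assumption, the quick start run with `--number-of-samples 300` would send 4 × 300 = 1,200 requests in total.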

### TTS configuration

| Parameter             |                                      Default | Description                                                                                |
| --------------------- | -------------------------------------------: | ------------------------------------------------------------------------------------------ |
| `--stream`            |                                      `False` | Use streaming synthesis (`/SynthesizeSpeechStream`) vs non-streaming (`/SynthesizeSpeech`) |
| `--max_tokens`        |                                        `400` | Maximum tokens to synthesize (\~8s audio at 50 tokens/s)                                   |
| `--voice-ids`         |                         `["Olivia", "Remy"]` | Voice IDs to use (can specify multiple)                                                    |
| `--model_id`          |                                       `None` | Model ID for TTS synthesis (optional)                                                      |
| `--text_samples_file` | `scripts/tts_load_testing/text_samples.json` | File containing text samples                                                               |

### Output and analysis

| Parameter          |        Default | Description                                              |
| ------------------ | -------------: | -------------------------------------------------------- |
| `--benchmark_name` | auto-generated | Name for the benchmark run (affects output files)        |
| `--plot_only`      |        `False` | Only generate plots from existing results (skip testing) |
| `--verbose`        |        `False` | Enable verbose output for debugging                      |

## Examples

### Streaming vs non-streaming comparison

```bash theme={"system"}
# Non-streaming test
python load-test.main \
    --host http://localhost:8081 \
    --min-qps 10.0 \
    --max-qps 50.0 \
    --qps-step 10.0 \
    --number-of-samples 500 \
    --benchmark_name non-streaming-test

# Streaming test
python load-test.main \
    --host http://localhost:8081 \
    --stream \
    --min-qps 10.0 \
    --max-qps 50.0 \
    --qps-step 10.0 \
    --number-of-samples 500 \
    --benchmark_name streaming-test
```

### Plot-only mode

Generate plots from existing results without re-running tests:

```bash theme={"system"}
./scripts/tts-load-test \
    --plot_only \
    --benchmark_name prod-stress-test
```

## Understanding results

For each QPS level, the tool reports the latency metrics and percentiles described in this section.

### Latency metrics

* **E2E Latency:** Time from sending the request to receiving the complete response
* **Audio Generation Latency:** Time per second of generated audio
* **First Chunk Latency:** Time to first audio chunk (streaming only)
* **4th Chunk Latency:** Time to 4th audio chunk (streaming only)
* **Average Chunk Latency:** Mean time between chunks (streaming only)
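
The metrics above can be sketched from per-chunk arrival timestamps. This is one plausible reading, not the tool's actual code: it assumes E2E latency ends at the last chunk and that audio generation latency is E2E latency divided by the audio duration. `StreamTimings` and `summarize_stream` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StreamTimings:
    first_chunk: float              # seconds from request start to chunk 1
    fourth_chunk: Optional[float]   # None if fewer than 4 chunks arrived
    avg_chunk_gap: Optional[float]  # mean time between consecutive chunks
    e2e: float                      # seconds from request start to last chunk
    per_audio_second: float         # e2e divided by audio duration (assumed)


def summarize_stream(start: float, chunk_times: Sequence[float],
                     audio_seconds: float) -> StreamTimings:
    """Derive latency metrics from chunk arrival times (same clock as start)."""
    if not chunk_times:
        raise ValueError("at least one chunk is required")
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    gaps = [b - a for a, b in zip(chunk_times, chunk_times[1:])]
    e2e = chunk_times[-1] - start
    return StreamTimings(
        first_chunk=chunk_times[0] - start,
        fourth_chunk=chunk_times[3] - start if len(chunk_times) >= 4 else None,
        avg_chunk_gap=sum(gaps) / len(gaps) if gaps else None,
        e2e=e2e,
        per_audio_second=e2e / audio_seconds,
    )
```

For a non-streaming request, only `e2e` and `per_audio_second` would apply, since the response arrives as a single payload.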

### Percentiles

Results include P50, P90, P95, and P99 percentiles for all latency metrics.
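
As a reminder of what these percentiles mean, the sketch below computes them from a list of latency samples using linear interpolation between ranks. The interpolation method is an assumption; the tool may use a different one, which can change the values slightly for small sample counts. `percentile` and `latency_summary` are hypothetical helpers.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def latency_summary(latencies: Sequence[float]) -> dict[str, float]:
    """Summarize samples at the percentiles reported on this page."""
    return {f"p{p}": percentile(latencies, p) for p in (50, 90, 95, 99)}
```

P99 is only meaningful with enough samples per QPS level: with fewer than about 100 requests it is effectively determined by the one or two slowest requests.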

### Output files

Results are saved in `benchmark_result/{benchmark_name}/`:

* `result.json` — Raw performance data
* `{benchmark_name}_*.png` — Performance charts
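
A small sketch for locating these files from a script, based only on the layout described above. `list_outputs` is a hypothetical helper, and the schema of `result.json` is not documented here, so the sketch does not parse it.

```python
from pathlib import Path


def list_outputs(benchmark_name: str, root: str = "benchmark_result") -> dict:
    """Return the expected result.json path and any chart PNGs for a run."""
    run_dir = Path(root) / benchmark_name
    return {
        "result_json": run_dir / "result.json",
        # Charts follow the {benchmark_name}_*.png naming pattern.
        "charts": sorted(run_dir.glob(f"{benchmark_name}_*.png")),
    }
```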

## Burstiness parameter

The burstiness parameter controls request timing distribution:

| Value   | Behavior                                |
| ------- | --------------------------------------- |
| `1.0`   | Poisson process (natural randomness)    |
| `< 1.0` | More bursty (requests come in clusters) |
| `> 1.0` | More uniform (evenly spaced requests)   |
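
One common way to get exactly this behavior is to draw inter-arrival times from a gamma distribution whose shape equals the burstiness value, a convention also used by vLLM's serving benchmark. Whether this tool does the same is an assumption; the sketch below only illustrates why `1.0` yields a Poisson process and why smaller values cluster requests. `inter_arrival_times` and `coefficient_of_variation` are hypothetical names.

```python
import random
import statistics
from typing import Optional


def inter_arrival_times(qps: float, burstiness: float, n: int,
                        seed: Optional[int] = None) -> list[float]:
    """Sample n gaps (seconds) between requests with a mean of 1 / qps.

    shape == 1.0 gives exponential gaps (a Poisson process),
    shape < 1.0 gives higher variance (bursty arrivals),
    shape > 1.0 gives lower variance (more evenly spaced arrivals).
    """
    if qps <= 0 or burstiness <= 0:
        raise ValueError("qps and burstiness must be positive")
    rng = random.Random(seed)
    shape = burstiness
    scale = 1.0 / (qps * burstiness)  # keeps the mean gap at 1 / qps
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def coefficient_of_variation(gaps: list[float]) -> float:
    """Standard deviation divided by mean; 1.0 for a Poisson process."""
    return statistics.pstdev(gaps) / statistics.fmean(gaps)
```

With this model the mean request rate stays at the target QPS for every burstiness value; only the spacing changes. The coefficient of variation of the gaps is `1 / sqrt(burstiness)`, so a value of `0.5` gives roughly 1.41 and a value of `4.0` gives 0.5.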

## Performance tips

1. **Start small** — Begin with low QPS and small sample sizes
2. **Use appropriate text samples** — Match your production text length distribution
3. **Monitor server resources** — Watch CPU, memory, and network during tests
4. **Consider burstiness** — Real-world traffic is often bursty (try 0.7–0.9)
5. **Test both modes** — Compare streaming vs non-streaming for your use case

## Troubleshooting

### Common issues

| Issue                 | Solution                                       |
| --------------------- | ---------------------------------------------- |
| Connection errors     | Verify server address and network connectivity |
| Authentication errors | Set `INWORLD_API_KEY` for external APIs        |
| High latency          | Check server load and network conditions       |
| Memory issues         | Reduce `--number-of-samples` for high QPS tests |

### Debug mode

Use the `--verbose` flag for detailed request/response logging:

```bash theme={"system"}
./scripts/tts-load-test --verbose --host ... # other params
```

## Architecture

The tool uses:

* **Async/await:** Efficient concurrent request handling
* **Pausable timers:** Accurate server-only timing measurements
* **Multiple protocols:** Support for gRPC and the HTTP REST API
* **Configurable clients:** Pluggable client architecture
* **Real-time progress:** Live progress bars and status updates
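
To illustrate the pausable timer idea, here is a minimal sketch of a stopwatch that can be paused while the client does its own work (for example, decoding a chunk) so that only time spent waiting on the server accumulates. This is an assumed design for illustration, not the tool's implementation; `PausableTimer` is a hypothetical class.

```python
import time
from typing import Callable, Optional


class PausableTimer:
    """Stopwatch that accumulates time only while running."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        self._elapsed = 0.0
        self._started_at: Optional[float] = None

    def start(self) -> None:
        """Start or resume the timer; a no-op if it is already running."""
        if self._started_at is None:
            self._started_at = self._clock()

    def pause(self) -> None:
        """Stop accumulating time; a no-op if the timer is not running."""
        if self._started_at is not None:
            self._elapsed += self._clock() - self._started_at
            self._started_at = None

    @property
    def elapsed(self) -> float:
        """Total running time in seconds, including the current interval."""
        if self._started_at is None:
            return self._elapsed
        return self._elapsed + (self._clock() - self._started_at)
```

A client would call `start()` before awaiting the next chunk and `pause()` as soon as it arrives, so local processing between chunks does not inflate the reported latency.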