
Test Data Generation

Triggers

Overview

Test data generation jobs are managed through a queue-based workflow. When a user requests a new generation from the UI, the system creates a corresponding database entry. The status of that entry indicates where the job is in the processing pipeline.

Job statuses

  • Queued: The job is waiting to be picked up for processing.
  • Processing: The job is currently running.
  • Successful: The job completed without errors.
  • Failed: The job ended with an error or exceeded the configured processing time limit.
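
The status lifecycle above can be sketched as a small state machine. This is an illustrative model only: the JobStatus enum, the ALLOWED_TRANSITIONS table, and the can_transition helper are hypothetical names, and the transition rules are an assumption, since the statuses are listed here without an explicit transition table.

```python
from enum import Enum


class JobStatus(Enum):
    """The four job statuses described above."""
    QUEUED = "Queued"
    PROCESSING = "Processing"
    SUCCESSFUL = "Successful"
    FAILED = "Failed"


# Assumed transitions: a queued job starts processing, and a processing job
# ends as Successful or Failed. Terminal statuses have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    JobStatus.QUEUED: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.SUCCESSFUL, JobStatus.FAILED},
    JobStatus.SUCCESSFUL: set(),
    JobStatus.FAILED: set(),
}


def can_transition(current: JobStatus, new: JobStatus) -> bool:
    """Return True if moving from `current` to `new` is allowed in this sketch."""
    return new in ALLOWED_TRANSITIONS[current]
```

Under these assumptions, Successful and Failed are terminal: a finished job is never re-queued, and a new generation request creates a new entry.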

Scheduling and execution

A Kubernetes scheduler periodically checks for queued jobs and starts processing them. By default, the scheduler runs every 5 minutes; to change the interval, edit datagen.cronjob.schedule in values.yaml. The maximum number of jobs that can run concurrently is set by TEST_DATA_GEN_MAX_REQUESTS_TO_PROCESS in the system configuration.
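
To make the concurrency limit concrete, here is a minimal sketch of what one scheduler pass might decide. It is not the product's implementation: the Job class, the select_jobs_to_start function, and the oldest-first ordering are assumptions made for illustration. The max_concurrent argument stands in for the value of TEST_DATA_GEN_MAX_REQUESTS_TO_PROCESS.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Job:
    """Hypothetical in-memory view of a job's database entry."""
    job_id: str
    status: str  # "Queued", "Processing", "Successful", or "Failed"
    created_at: datetime


def select_jobs_to_start(jobs: list[Job], max_concurrent: int) -> list[Job]:
    """Pick the queued jobs that one scheduler pass could start.

    Jobs already in Processing count against the limit, so the pass only
    fills the remaining free slots. Oldest-first ordering is an assumption.
    """
    running = sum(1 for job in jobs if job.status == "Processing")
    free_slots = max(0, max_concurrent - running)
    queued = sorted(
        (job for job in jobs if job.status == "Queued"),
        key=lambda job: job.created_at,
    )
    return queued[:free_slots]
```

For example, with a limit of 2, one job already in Processing, and three jobs Queued, this sketch starts only the oldest queued job; the other two wait for a later pass.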

Each job runs in its own pod. During execution, the pod does not connect to any external database; it generates data using only the metadata provided to it. The pod does connect to AWS S3, where it uploads the generated output files.

Timeouts

The scheduler also monitors running jobs to ensure they do not exceed the configured processing time limit. A job that runs longer than this limit is marked as Failed. To change the limit, set the TEST_DATA_GEN_TTL_SECONDS environment variable.
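
The timeout check can be pictured with the following sketch. It assumes the variable holds a whole number of seconds (as its name suggests) and that each job records when it started processing. The ttl_seconds and has_timed_out helpers and the fallback value of 3600 are hypothetical, not documented defaults.

```python
import os
from datetime import datetime, timedelta


def ttl_seconds(fallback: int = 3600) -> int:
    """Read the time limit from the environment.

    The fallback of 3600 is a placeholder for this sketch, not a documented default.
    """
    return int(os.environ.get("TEST_DATA_GEN_TTL_SECONDS", fallback))


def has_timed_out(started_at: datetime, now: datetime, ttl: int) -> bool:
    """Return True if a job has been processing for longer than `ttl` seconds."""
    return now - started_at > timedelta(seconds=ttl)
```

In this model, a monitoring pass would call has_timed_out for every job in Processing and mark the ones that return True as Failed.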