How long should your experiment run? This guide covers the factors that determine optimal experiment duration, how PressPlay's duration settings work, and strategies for timing your tests effectively.
PressPlay uses two duration parameters to control experiment length:
Minimum duration days: the minimum number of days an experiment must run before results are considered valid. This ensures sufficient data collection for statistical significance.
Default: 7-14 days for most asset types
Maximum duration days: the maximum number of days an experiment can run before automatically concluding. This prevents experiments from running indefinitely when results are inconclusive.
Default: 30-90 days depending on asset type and traffic levels
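As a rough illustration, the two parameters (plus the auto-adjust toggle described below) might be modeled like this; the field names are hypothetical, not PressPlay's actual schema:

```python
from dataclasses import dataclass

@dataclass
class DurationSettings:
    # Hypothetical field names for illustration; not PressPlay's real schema.
    minimum_duration_days: int = 7     # results not valid before this many days
    maximum_duration_days: int = 30    # experiment auto-concludes at this point
    auto_adjust_end_date: bool = True  # allow early conclusion at significance

# Example: a screenshot test on a medium-traffic app.
settings = DurationSettings(minimum_duration_days=14, maximum_duration_days=30)
```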
When an experiment starts, PressPlay tracks elapsed days and evaluates results against your configured settings:
Day 1 to minimum duration: the experiment runs regardless of results. No early termination.
Minimum to maximum duration: the system checks for statistical significance and can conclude early if a clear winner emerges or auto-kill conditions trigger.
Maximum duration day: the experiment automatically concludes, applying your configured winner-selection rules.
When Auto-adjust end date is enabled, PressPlay automatically moves the end date earlier if the experiment reaches statistical significance before the maximum duration. This optimizes your testing queue by starting the next experiment sooner.
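Putting the three phases and the auto-adjust toggle together, the day-by-day decision might look like the sketch below; the function and status names are illustrative, not PressPlay's implementation.

```python
def evaluate_experiment_day(day: int, min_days: int, max_days: int,
                            auto_adjust: bool, is_significant: bool,
                            auto_kill_triggered: bool) -> str:
    """Illustrative sketch of the three evaluation phases described above."""
    if day < min_days:
        return "RUNNING"  # phase 1: runs regardless of results
    if day < max_days:
        # Phase 2: early conclusion is possible.
        if auto_kill_triggered:
            return "STOPPED"    # auto-kill conditions met
        if is_significant and auto_adjust:
            return "CONCLUDED"  # end date auto-adjusted to today
        return "RUNNING"
    return "CONCLUDED"  # phase 3: max reached; winner-selection rules apply
```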
Higher traffic apps reach statistical significance faster:
High traffic (10,000+ daily impressions): 7-14 days minimum
Medium traffic (1,000-10,000 daily impressions): 14-21 days minimum
Low traffic (less than 1,000 daily impressions): 21-30 days minimum
Smaller differences between variants require longer tests to detect reliably; required sample size grows roughly with the inverse square of the effect size, so halving your MDE roughly quadruples the data you need. Your Minimum Detectable Effect (MDE) setting influences this:
Large effects (10%+ MDE): Shorter durations acceptable
Medium effects (5-10% MDE): Standard durations recommended
Small effects (less than 5% MDE): Longer durations necessary
Different assets have different impact timelines:
Icons: Immediate impact, shorter tests (7-14 days)
Screenshots: Quick evaluation, moderate tests (10-21 days)
Descriptions: Slower impact, longer tests (14-30 days)
Feature Graphics: Visual impact, moderate tests (10-21 days)
Ensure experiments run long enough to capture representative traffic patterns:
Include both weekdays and weekends (minimum 7 days)
Consider monthly cycles (paycheck timing, recurring events)
Account for irregular patterns in your specific category
Higher confidence requirements need longer tests:
90% confidence: Faster results, higher false positive risk
95% confidence: Standard balance of speed and reliability
99% confidence: Longer duration, very low false positive risk
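The cost of higher confidence is easy to quantify: holding everything else fixed, required sample size grows roughly with the square of the critical z-value. A quick check with Python's standard library:

```python
from statistics import NormalDist

# Two-sided critical values; sample size scales roughly with z squared.
for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    print(f"{confidence:.0%}: z = {z:.2f}, relative sample size ~ {z * z:.2f}")
# 99% confidence needs roughly 2.4x the sample of 90% (6.63 / 2.71).
```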
PressPlay applies sensible defaults based on your app's traffic and asset type. For most apps, these defaults work well without modification.
Adjust duration settings when creating or editing an experiment:
Navigate to experiment settings
Locate "Duration Settings" section
Set minimum duration days (1-90)
Set maximum duration days (must be greater than or equal to minimum)
Toggle "Auto-adjust end date" as desired
Save settings
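The bounds in steps 3 and 4 are worth validating before saving. A minimal sketch that mirrors the documented constraints; the helper itself is hypothetical, not part of PressPlay:

```python
def validate_duration_settings(minimum_days: int, maximum_days: int) -> None:
    """Mirror the documented constraints (hypothetical helper)."""
    if not 1 <= minimum_days <= 90:
        raise ValueError("minimum duration must be 1-90 days")
    if maximum_days < minimum_days:
        raise ValueError("maximum duration must be >= minimum duration")

validate_duration_settings(minimum_days=14, maximum_days=45)   # passes
# validate_duration_settings(minimum_days=30, maximum_days=21) # raises
```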
Consider adjusting defaults for:
High-stakes experiments: Increase minimum to ensure reliability
Rapid iteration needs: Decrease maximum for faster learning cycles
Low-traffic apps: Increase both minimum and maximum
Seasonal timing: Adjust to align with specific events or periods
Use PressPlay's default duration settings for your first few experiments. This establishes baseline understanding of your app's testing dynamics before customization.
Track how long experiments actually run before reaching significance. Use this data to refine your duration settings over time.
Shorter experiments increase testing velocity but risk false positives. Longer experiments improve reliability but slow learning. Find the right balance for your business needs.
Always set minimum duration to at least 7 days. This ensures your results include both weekend and weekday traffic, which often behave differently.
Longer experiments mean slower queue progression. If you have many queued experiments, consider slightly shorter maximum durations to maintain testing velocity.
PressPlay can automatically stop underperforming experiments early based on your auto-kill settings:
Early Kill Min Installs: the minimum number of installs a variant must accumulate before auto-kill can trigger
Early Kill CVR Decrease: the conversion-rate (CVR) decrease that triggers auto-kill
Early termination protects you from prolonged negative impact while still respecting minimum duration for statistical validity.
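A sketch of how the two settings could combine into a single check, assuming the CVR decrease is measured relative to the baseline variant (an assumption; the exact comparison is not specified here):

```python
def should_auto_kill(day: int, min_duration_days: int,
                     variant_installs: int, early_kill_min_installs: int,
                     baseline_cvr: float, variant_cvr: float,
                     early_kill_cvr_decrease: float) -> bool:
    """Illustrative auto-kill check; not PressPlay's actual implementation."""
    if day < min_duration_days:
        return False  # minimum duration is always respected
    if variant_installs < early_kill_min_installs:
        return False  # not enough data points to judge yet
    # Assumed: relative CVR drop versus the baseline variant.
    decrease = (baseline_cvr - variant_cvr) / baseline_cvr
    return decrease >= early_kill_cvr_decrease

# Example: variant CVR 2.4% vs baseline 3.0% is a 20% relative drop.
print(should_auto_kill(10, 7, 500, 300, 0.030, 0.024, 0.15))  # True
```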
You can manually stop an experiment at any time by changing its status to STOPPING. This is useful when:
External factors invalidate the test
A variant causes unexpected issues
Business priorities shift dramatically
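If you manage experiments programmatically, a manual stop might look like the hypothetical call below. PressPlay's actual endpoint, authentication, and payload are not documented here; everything in this sketch is an assumption apart from the STOPPING status itself.

```python
import requests

def stop_experiment(base_url: str, api_key: str, experiment_id: str) -> None:
    """Set an experiment's status to STOPPING (hypothetical REST endpoint)."""
    response = requests.patch(
        f"{base_url}/experiments/{experiment_id}",       # assumed route
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        json={"status": "STOPPING"},
    )
    response.raise_for_status()
```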
Duration is one of several factors in achieving statistical significance:
Longer duration = more impressions = larger sample size. Your required sample size depends on:
Traffic volume
Effect size (MDE)
Confidence level
Baseline conversion rate
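These four inputs plug into the standard two-proportion sample-size formula. A minimal sketch using the normal approximation, assuming an even split across two variants and 80% power (PressPlay's exact statistics are not documented here):

```python
from statistics import NormalDist

def days_to_significance(daily_impressions: int, baseline_cvr: float,
                         mde: float, confidence: float = 0.95,
                         power: float = 0.80) -> float:
    """Estimate days to significance via the two-proportion normal
    approximation. Sketch only; PressPlay's internals may differ."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - (1 - confidence) / 2)
    z_beta = nd.inv_cdf(power)
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + mde)  # MDE as a relative lift
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    n_per_variant = numerator / (p2 - p1) ** 2
    return 2 * n_per_variant / daily_impressions

# Example: 5,000 daily impressions, 3% baseline CVR, 10% relative MDE.
print(f"{days_to_significance(5000, 0.03, 0.10):.0f} days")  # ~21 days
```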
Experiments can conclude early if they achieve statistical significance after the minimum duration. Enable "Auto-adjust end date" to take advantage of this optimization.
If the maximum duration is reached without clear significance, PressPlay applies your configured winner-selection rules (typically keeping the baseline variant or applying the variant with the best performance metric).
Icons:
Minimum: 7 days
Maximum: 21 days
Rationale: Icons have immediate visual impact; 7 days captures weekly patterns; 21 days provides ample time for significance
Screenshots:
Minimum: 14 days
Maximum: 30 days
Rationale: Screenshots require evaluation time; longer minimum ensures users see changes; 30 days accommodates lower effect sizes
Descriptions:
Minimum: 14 days
Maximum: 45 days
Rationale: Descriptions influence consideration; longer evaluation period; extended maximum for subtle effects
Low-traffic apps:
Minimum: 21 days
Maximum: 60 days
Rationale: Longer duration needed to accumulate sufficient data; extended maximum accounts for slow significance achievement
Your average experiment duration directly determines testing velocity:
Average duration 14 days = ~26 experiments per year
Average duration 21 days = ~17 experiments per year
Average duration 30 days = ~12 experiments per year
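The arithmetic is simply 365 divided by the average duration. The sketch below also accounts for idle days between experiments, which the figures above ignore:

```python
def experiments_per_year(avg_duration_days: float, gap_days: float = 0.0) -> float:
    """Testing velocity, including idle time between consecutive experiments."""
    return 365 / (avg_duration_days + gap_days)

for avg in (14, 21, 30):
    print(f"{avg}-day average -> ~{experiments_per_year(avg):.0f} experiments/year")
# A 2-day gap between experiments at a 14-day average drops ~26/year to ~23.
print(f"~{experiments_per_year(14, gap_days=2):.0f} experiments/year with gaps")
```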
Strategies for maintaining rapid testing without sacrificing quality:
Enable auto-adjust end date
Use appropriate MDE settings (larger MDE = faster significance)
Implement auto-kill to stop clear losers early
Ensure experiments are READY before their queue position arrives
Test high-impact hypotheses that justify longer durations
Experiment Settings Guide - Configure MDE, confidence intervals, and other parameters
Understanding the Backlog - How duration affects your experiment queue
Interpreting Results - Analyze experiments that reach full duration
Auto-Kill Settings - Configure early termination rules