
Sampling Plan and Data Collection Strategy

1. Purpose and Scope

This article defines how sampling and data collection are designed and executed during process performance qualification (PPQ) to generate representative, decision-quality data.

Focus is on practical implementation, including:

  • how sampling points are selected
  • how sample size and frequency are determined
  • how data streams are aligned and integrated
  • how variability is intentionally challenged and detected

2. Role within PPQ

Sampling and data collection operationalize the PPQ strategy. They must:

  • generate data capable of confirming control of critical process parameters (CPPs)
  • demonstrate batch uniformity and consistency
  • support statistical evaluation and final conclusions

This is not data gathering for completeness. It is targeted data generation to confirm process performance.


3. Sampling Plan Design

Sampling design is based on identifying where and when variability is most likely to occur.

Figure: Flow diagram showing how process understanding and risk assessment define sampling locations, timing, sample size, and frequency.

3.1 Selection of Sampling Locations

Sampling locations are selected based on where variability is most likely to occur, not convenience. Typical selection logic includes:

  • equipment geometry
    dead zones, mixing boundaries, top vs bottom, inlet vs outlet
  • process sequence
    early vs late fill units, first vs last material processed
  • material flow behavior
    segregation risk, hold-up areas, transfer points
  • historical or known risk areas
    locations previously associated with variability

Sampling must cover both:

  • representative locations
  • worst-case locations

Figure: Process tank with sampling points at multiple vertical locations and at the outlet, selected based on equipment geometry and variability risk.

3.2 Selection of Sampling Timing

Sampling timing is aligned with process dynamics, not fixed intervals. Sampling times are selected at:

  • start of operation
    where system stabilization occurs
  • steady-state operation
    where process is expected to be consistent
  • end of operation
    where depletion or drift may occur
  • critical transitions
    changes in CPPs, phase changes, hold times

Timing must capture conditions where process behavior can change.


3.3 Determination of Sample Size

Sample size is based on the ability to detect variability, not arbitrary numbers. Considerations include:

  • batch size and number of units
  • expected variability
  • criticality of the affected critical quality attribute (CQA)
  • required confidence in conclusions

Typical approach:

  • increase sampling where variability risk is high
  • reduce sampling where process is well understood

Sampling must be sufficient to:

  • detect non-uniformity
  • support statistical evaluation
  • confirm consistency across the batch
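As one illustration of linking sample size to detection capability, the zero-failure binomial ("success-run") model gives the smallest sample that would contain at least one non-conforming unit, at a stated confidence, if the true non-conformance rate were at a given level. This is a minimal sketch for orientation only, not a substitute for a justified statistical rationale in the protocol; the rates and confidence level below are illustrative.

```python
import math

def min_sample_size(defect_rate: float, confidence: float) -> int:
    """Smallest n such that a random sample contains at least one
    non-conforming unit with the stated confidence, assuming the true
    non-conformance rate equals defect_rate (zero-failure binomial model)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - defect_rate))

# Example: detect a 1 % non-uniformity with 95 % confidence
print(min_sample_size(defect_rate=0.01, confidence=0.95))  # 299

# Relaxing the detectable rate to 5 % reduces the burden sharply
print(min_sample_size(defect_rate=0.05, confidence=0.95))  # 59
```

The steep dependence on the detectable rate is why high-risk CQAs, where small non-uniformities matter, drive sample sizes up.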

3.4 Sampling Frequency

Sampling frequency is driven by process speed and sensitivity.

  • fast or dynamic processes → higher frequency
  • stable processes → lower frequency

Frequency must ensure that transient deviations are not missed.
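A simple rule of thumb for "transients are not missed": the sampling interval must be shorter than the fastest transient of interest, so that each transient is observed more than once. The sketch below is an illustrative heuristic, not a prescribed method; the transient duration is a hypothetical input.

```python
def max_sampling_interval(shortest_transient_min: float,
                          points_per_transient: int = 2) -> float:
    """Longest sampling interval (minutes) that still observes a transient of
    the given duration at least `points_per_transient` times."""
    return shortest_transient_min / points_per_transient

# A process excursion lasting ~10 min requires sampling at least every 5 min
print(max_sampling_interval(10.0))  # 5.0
```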


4. Data Collection Strategy

4.1 Alignment of Data Streams

Data must be structured so that relationships between CPPs, in-process controls (IPCs), and CQAs can be evaluated. This requires:

  • time alignment of data points
  • linkage of samples to specific process conditions
  • consistent identification of batch, location, and stage

Each CQA result must be traceable to:

  • corresponding CPP values
  • process conditions at the time of sampling
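In practice, traceability from a CQA result back to CPP values often means matching each sample's timestamp to the most recent CPP record at or before it. The sketch below shows one way to do that lookup; the CPP log, times, and temperature values are hypothetical.

```python
from bisect import bisect_right

# Hypothetical CPP log: (elapsed minutes, temperature in °C), sorted by time
cpp_log = [(0, 24.8), (10, 25.1), (20, 25.4), (30, 25.0)]

def cpp_at(sample_time: float) -> tuple:
    """Return the most recent CPP record at or before the sample time."""
    times = [t for t, _ in cpp_log]
    i = bisect_right(times, sample_time) - 1
    if i < 0:
        raise ValueError("sample precedes first CPP record")
    return cpp_log[i]

# A CQA sample drawn at t = 22 min is linked to the CPP value logged at t = 20
print(cpp_at(22))  # (20, 25.4)
```

The same time-keyed lookup generalizes to any CPP stream; the essential point is that every sample carries a timestamp that resolves unambiguously against the process record.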

4.2 CPP Data Collection

CPP data must reflect actual process behavior, not snapshots.

Approach:

  • continuous monitoring where possible
  • defined recording intervals where continuous data is not available
  • capture of variability within the acceptable range

Data must allow assessment of:

  • stability
  • excursions
  • trends over time
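The three assessments above can be sketched in a few lines: excursions are readings outside the acceptable range, and a crude trend indicator is the difference between second-half and first-half means. This is a minimal illustration, not a validated trending method; the readings and range limits are hypothetical.

```python
def assess_cpp(values: list, low: float, high: float):
    """Flag excursions outside the acceptable range [low, high] and report a
    simple drift indicator (second-half mean minus first-half mean)."""
    excursions = [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]
    half = len(values) // 2
    drift = sum(values[half:]) / (len(values) - half) - sum(values[:half]) / half
    return excursions, round(drift, 3)

temps = [25.0, 25.1, 24.9, 25.2, 25.6, 26.2]   # hypothetical readings
exc, drift = assess_cpp(temps, low=24.5, high=26.0)
print(exc)    # [(5, 26.2)] — one excursion above range
print(drift)  # 0.667 — upward drift within the run
```

A real evaluation would use established trending tools (e.g. control charts), but even this sketch shows why snapshot values are insufficient: the final reading alone reveals the excursion but not the drift that preceded it.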

4.3 In-Process Control Data

IPC data is used to confirm that intermediate stages perform as expected.

Collection must:

  • align with critical process steps
  • reflect decision points in the process
  • detect early signs of deviation

IPC results should be evaluated in context of CPP behavior.


4.4 Final Product Data

CQA data confirms final product quality.

Collection must:

  • represent full batch distribution
  • align with sampling plan
  • allow comparison across batches

Final product data is the ultimate confirmation of process performance.


4.5 Supporting Data

Supporting data is collected where it can influence process performance or the interpretation of results.

Examples:

  • environmental conditions
  • utility performance
  • system status

Only relevant data should be included. Avoid unnecessary data collection.


5. Data Integration and Traceability

All data must be integrated into a single evaluable dataset.

Requirements:

  • linkage of CPP, IPC, and CQA data
  • identification of batch, time, and location
  • ability to reconstruct process conditions for any sample

Data must support:

  • cross-parameter analysis
  • identification of relationships
  • evaluation of consistency
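A minimal sketch of a "single evaluable dataset": each record carries batch, time, and location keys alongside the linked CPP and CQA values, so the process conditions for any sample can be reconstructed by key lookup. Field names and values below are illustrative, not a prescribed schema.

```python
# Illustrative integrated records keyed by (batch, location, time in minutes)
records = [
    {"batch": "B001", "time": 20, "location": "top",    "cpp_temp": 25.4, "cqa_assay": 99.1},
    {"batch": "B001", "time": 20, "location": "bottom", "cpp_temp": 25.4, "cqa_assay": 98.7},
    {"batch": "B002", "time": 20, "location": "top",    "cpp_temp": 25.1, "cqa_assay": 99.3},
]

def lookup(batch: str, location: str, time: int) -> dict:
    """Reconstruct the full process context for one sample."""
    return next(r for r in records
                if (r["batch"], r["location"], r["time"]) == (batch, location, time))

print(lookup("B001", "bottom", 20)["cpp_temp"])  # 25.4
```

With all streams in one keyed structure, cross-parameter analysis (e.g. assay vs temperature across batches and locations) reduces to filtering and grouping on the shared keys.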

6. Link to Risk and Control Strategy

Sampling and data collection must directly reflect:

  • identified CPPs
  • known variability sources
  • defined control strategy

High-risk parameters require:

  • more frequent sampling
  • broader coverage
  • tighter linkage to CQAs

7. Execution

Execution must follow the defined plan without ad hoc adjustment.

  • no reduction in sampling
  • no substitution of locations or timing
  • no omission of data points

Any deviation must be:

  • documented
  • justified
  • evaluated for impact

8. Summary

Sampling and data collection during PPQ are designed to actively detect variability and confirm process control.

They are based on:

  • targeted selection of locations and timing
  • sufficient sample size and frequency
  • structured data collection and integration

This ensures that PPQ conclusions are based on complete, representative, and traceable evidence of process performance.