dc.description.abstract | To provide convincing proof that a new method is better than the state of the art, computer graphics projects are often accompanied by user studies, in which a group of observers ranks or rates the results of several algorithms. Such user studies, known as subjective image quality assessment experiments, can be very time‐consuming and are not guaranteed to produce conclusive results. This paper is intended to help design efficient and rigorous quality assessment experiments and to emphasise the key aspects of results analysis. To promote good standards of data analysis, we review the major methods of analysis, such as establishing confidence intervals, statistical testing and retrospective power analysis. We explore two methods of visualising ranking results together with meaningful information about statistical and practical significance. Finally, we compare the four most prominent subjective quality assessment methods: single‐stimulus, double‐stimulus, forced‐choice pairwise comparison and similarity judgements. We conclude that the forced‐choice pairwise comparison method yields the smallest measurement variance and thus produces the most accurate results. This method is also the most time‐efficient, assuming a moderate number of compared conditions. | en_US