A Narrative Review of Patient-reported Outcomes in Overactive Bladder: What is the Way of the Future?

  • Christopher R. Chapple 1,
  • Con J. Kelleher 2,
  • Chris J. Evans 3,
  • Zoe Kopp 3,
  • Emad Siddiqui 4,
  • Nathan Johnson 3,
  • Morgan Mako 3
1 Department of Urology Research, University of Sheffield, Sheffield, England, UK
2 Guys and St. Thomas’ Hospitals, London, UK
3 Endpoint Outcomes, Boston, MA, USA
4 Astellas Pharma Europe Ltd, Chertsey, UK

Take home message

A brief overactive bladder symptom and health-related quality of life assessment with weekly recall has the potential to characterize disease burden more accurately than a diary alone, to improve and standardize efficacy detection in clinical trials, and to ease patient burden.

Publication: European Urology, Volume 70, Issue 5, November 2016, Pages 799-805

PII: S0302-2838(16)30143-9

DOI: 10.1016/j.eururo.2016.04.033

The International Continence Society defines overactive bladder (OAB) symptom complex as “urinary urgency, usually accompanied by frequency and nocturia, with or without urgency urinary incontinence (UUI), in the absence of urinary tract infection or other obvious pathology” [1]. This symptom-based definition is a useful starting point for diagnosing patients; however, for evaluating the impact of interventions, it fails to address what is most important to patients. Patients seek treatment because their symptoms affect their health-related quality of life (HRQoL) [2]. Given the heterogeneity of symptoms and the multifaceted impact of OAB, measurement of outcomes in clinical trials is complicated, and researchers are confronted with the problem of balancing basic assessment with obtaining a comprehensive picture of patient outcomes [3]. In a systematic review of OAB treatment endpoints, Goldman et al [4] highlighted the lack of formal guidance and the significant heterogeneity of both symptom-based and patient-reported outcome measures (PROMs)-based definitions of treatment response and nonresponse. For example, while most studies defined UUI treatment response as a 50–100% reduction in UUI episodes [4], others used a reduction of ≥2 episodes/wk [5], a ≥50% reduction in incontinence pad weight [6], an increase of ≥1 continent d/wk [5], or 3–7 consecutive dry d [7]. The symptoms of urgency and frequency have also been used as endpoints, with similar heterogeneity in the criteria used to define success.
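The practical consequence of this heterogeneity is that the same patient can count as a responder under one definition and a nonresponder under another. The following is a minimal sketch in Python, assuming a hypothetical `WeeklyDiary` record and invented counts; it applies three of the definitions quoted above and is not taken from any of the cited trials.

```python
from dataclasses import dataclass


@dataclass
class WeeklyDiary:
    """One week of diary data for one patient (hypothetical structure)."""
    uui_episodes: int    # urgency urinary incontinence episodes in the week
    continent_days: int  # days in the week without any incontinence episode


def percent_reduction(baseline: WeeklyDiary, followup: WeeklyDiary) -> float:
    """Percentage reduction in weekly UUI episodes relative to baseline."""
    if baseline.uui_episodes == 0:
        raise ValueError("baseline must contain at least one UUI episode")
    change = baseline.uui_episodes - followup.uui_episodes
    return 100.0 * change / baseline.uui_episodes


def classify(baseline: WeeklyDiary, followup: WeeklyDiary) -> dict:
    """Apply three of the response definitions quoted in the text."""
    return {
        # >=50% reduction in UUI episodes
        "reduction_50_percent": percent_reduction(baseline, followup) >= 50.0,
        # reduction of >=2 episodes/wk
        "reduction_2_per_week": baseline.uui_episodes - followup.uui_episodes >= 2,
        # increase of >=1 continent d/wk
        "gain_1_continent_day": followup.continent_days - baseline.continent_days >= 1,
    }


# Invented patient: 10 episodes/wk at baseline, 7 episodes/wk on treatment.
baseline = WeeklyDiary(uui_episodes=10, continent_days=1)
followup = WeeklyDiary(uui_episodes=7, continent_days=2)
result = classify(baseline, followup)
print(result)
```

With these invented numbers the patient has a 30% reduction, so the percentage-based definition classifies the patient as a nonresponder while the other two classify the same patient as a responder.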

As the above discussion shows, the bladder diary, which records frequency, volume, and number of incontinence episodes, is at the core of every OAB assessment and represents the gold standard investigation [8]. Additional information may include the number of pads used and quantity of fluid intake [9]. The diary is clearly a useful tool, not only in the initial patient evaluation, where it allows clinicians to diagnose appropriately and plan an intervention, but also in objectively defining response to therapy. See Figure 1 for an overview of recommended endpoints in OAB.
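To make concrete how diary entries become trial endpoints, the sketch below collapses a short diary into per-day summaries of the kind reported in OAB trials (micturitions/d, incontinence episodes/d, mean voided volume). The `DiaryEvent` layout, the `summarize` helper, and all values are assumptions made for illustration; they do not reproduce any published diary format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class DiaryEvent:
    """One diary row (hypothetical layout, not a published diary format)."""
    day: int                    # diary day, starting at 1
    kind: str                   # "void" or "leak" (incontinence episode)
    volume_ml: Optional[float]  # voided volume; None for leaks


def summarize(events: List[DiaryEvent], n_days: int) -> dict:
    """Collapse diary rows into per-day endpoints."""
    if n_days <= 0:
        raise ValueError("n_days must be positive")
    voids = [e for e in events if e.kind == "void"]
    leaks = [e for e in events if e.kind == "leak"]
    volumes = [e.volume_ml for e in voids if e.volume_ml is not None]
    return {
        "micturitions_per_day": len(voids) / n_days,
        "incontinence_episodes_per_day": len(leaks) / n_days,
        "mean_voided_volume_ml": mean(volumes) if volumes else None,
    }


# Invented 3-d diary, for illustration only.
voided = {1: [250, 180, 200], 2: [220, 150], 3: [300, 200, 150, 250]}
leaks_per_day = {1: 1, 2: 2, 3: 0}

events = [DiaryEvent(day, "void", float(v))
          for day, vols in voided.items() for v in vols]
events += [DiaryEvent(day, "leak", None)
           for day, n in leaks_per_day.items() for _ in range(n)]

summary = summarize(events, n_days=3)
print(summary)
```

Dividing by the number of diary days rather than by the number of days with entries is a deliberate choice in this sketch, so that a day with no recorded leaks counts as a dry day instead of being dropped.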

To capture the impact of symptoms on patients, several psychometrically validated PROMs exist [10]. These include the Overactive Bladder Symptom Score (OABSS) [11], the Overactive Bladder Questionnaire (OAB-q) [12], the King's Health Questionnaire [13], and the Patient Perception of Bladder Condition [14]. PROMs are routinely included as secondary endpoints in trials alongside diaries [15]. Some trials rely solely on primary PROM endpoints that are not based on a bladder diary [16]. Other frequently used PROMs include global assessments, satisfaction, and goal attainment scaling [17].

To understand, support, and inform the development of a new multidimensional PROM that could replace bladder diaries as a primary or key secondary endpoint in clinical trials, we conducted a review of literature published within the past 10 yr on OAB treatment-response assessments. In particular, we addressed three key issues: (1) whether the definition of treatment response/nonresponse should include a symptom assessment, (2) whether PROMs should provide information about whether a reduction in symptoms actually improves patients’ lives, and (3) the use of measures of treatment satisfaction and goal achievement. We believe that if a new multidimensional measure can be developed, then standardization of response definitions would allow for cross-trial comparisons and remove the confusion caused by individual symptom reporting while collecting data that are meaningful to both patients and practitioners.

We conducted a narrative review of OAB literature available in the PubMed database. If an article that satisfied the study inclusion criteria was identified, two members of the research team (Kopp and Evans) reviewed the article's abstract for inclusion. If the two authors agreed, the full-text article was retrieved for analysis. A full-text article was excluded if its focus was not related to OAB outcome measures. The two researchers had to agree before an article was excluded. The search targeted articles that examined bladder diary utility compared with other PROMs, the presence of placebo effects, patient burden in completing daily diaries, appropriate recall, recommendations for endpoints in OAB trials, and how other therapeutic areas utilize diaries and PROMs.

Articles were included if they (1) were published from January 1, 2004 to January 22, 2016, (2) were written in English, and (3) contained key search terms in the title or abstract. Key search terms included: overactive bladder, lower urinary tract dysfunction, lower urinary tract symptoms, urinary incontinence, urge urinary incontinence AND randomized controlled trial, bladder diary, voiding diary, urinary diary, patient-reported outcomes, patient satisfaction, global assessment scale, placebo-effect, treatment response, and quality of life. In addition, we examined literature in other chronic diseases in which treatment response has historically been determined by patient reporting via diaries. A systematic review of OAB literature was not completed, as we were specifically interested in the assessment of treatment response in clinical trials.

Figure 2 outlines the search results of the review. Ultimately, 80 articles were included in the review.

3.1. Placebo and training effects in OAB trials

Clinical trials for the treatment of OAB have noted a significant response in patients treated with placebo [18]. According to Mangera et al [19], bladder diaries may influence treatment outcomes in randomized controlled trials (RCTs) of antimuscarinic agents because of the unique contribution bladder diaries make to the placebo effect. One issue is experimental subordination, where a patient answers subjective questions in a way that is seen to please their physician [19]. Also, as OAB constitutes a complex of symptoms, behaviors, and behavior modifications, a bladder training effect is apparent when visual feedback of performance trains the patient to change their behavior [20]. This has been recognized in the American Urological Association/Society of Urodynamics, Female Pelvic Medicine, and Urogenital Reconstruction OAB Diagnosis and Treatment Guidelines [21], which note that a self-monitoring effect may occur as a daily diary makes patients aware of their voiding habits. A placebo response in clinical trials of OAB is evident from this survey, as seen in Table 1.

Table 1

Placebo and training effects in overactive bladder randomized controlled trials


Outcome | No. of studies | No. of patients given placebo | Mean change (SD) | p value
Incontinence episodes/d | 12 | 1847 | –1.12 (0.59) | <0.001
Micturition episodes/d | 11 | 1938 | –1.04 (0.8) | 0.0016
Urgency episodes/d | 3 | 928 | –1.15 (1.74) | 0.37
Mean micturition volume (ml) | 11 | 1854 | 10.61 (12.9) | 0.02
Maximum cystometric capacity (ml) | 6 | 208 | –16.87 (9.99) | 0.009

SD = standard deviation.

3.2. Correlations between PRO measures and bladder diary endpoints

Significant correlations between widely used PROMs and bladder diary endpoints exist within the OAB literature. The OABSS, for example, is the sum score of four symptom items: daytime frequency, nighttime frequency, urgency, and UUI [11]. In the original validation, the actual numbers of daytime and nighttime urinations were gathered, and urgency and UUI were assessed with a frequency scale. Each symptom score correlated positively with the OABSS (rs = 0.10–0.78). In a study comparing the OABSS with a 3-d bladder diary [22], statistically significant improvements were found in all OABSS and corresponding bladder diary variables (p < 0.001), with high correlations (rs ≥ 0.5) between score changes in nighttime frequency and UUI. Consequently, the OABSS is an alternative to a diary for assessment in clinical practice.

The OAB-q is a validated 33-item symptom bother and HRQoL questionnaire [12]. The coping and social interactions subscales correlate significantly with the number of urinations per day (r = –0.20 and –0.23, respectively; p = 0.02). The sleep subscale and number of urinations per night were highly correlated (r = –0.50, p < 0.0001). A validation study comparing the 1-wk and 4-wk versions of the OAB-q with a 3-d diary found moderate to strong correlations between the OAB-q subscales and nearly all diary variables [23].
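The rs values reported above are Spearman rank correlations, that is, Pearson correlations computed on ranks rather than raw values. The sketch below shows the calculation in plain Python for paired changes in a questionnaire item and the corresponding diary variable. The helper names and the six data pairs are invented for illustration and are not data from the cited studies.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented changes from baseline for six patients (illustration only).
item_change = [-2, -1, 0, -1, -3, 0]               # questionnaire item score
diary_change = [-1.5, -0.5, 0.0, -1.0, -2.0, 0.5]  # diary episodes per night

rho = spearman(item_change, diary_change)
print(round(rho, 3))
```

Ranking first makes the coefficient insensitive to the different scales of an ordinal questionnaire item and a diary count, which is one reason rank correlations are a natural choice for this kind of comparison.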

The Overactive Bladder Awareness Tool (OAB-V8) is a validated 8-item instrument [24]. In the validation of the OAB-V8, clinical variables of urgency, nocturia, and daytime frequency were collected with a bladder diary and compared with OAB-V8 scores; the OAB-V8 performed well with high sensitivity (0.96) and specificity (0.827).

The Questionnaire-Based Voiding Diary (QVD) is another validated instrument with a high correlation to a 48-h bladder diary [25] and [26]. The sensitivity, specificity, and positive likelihood ratio of the QVD for diagnosis of UUI were 0.82, 0.79, and 4.0, respectively. The authors conclude that the QVD is a useful alternative to the bladder diary. See Table 2 for a summary of correlations between PROMs and bladder diary endpoints.
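As a check on how these diagnostic figures relate, the positive likelihood ratio follows directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity). The sketch below applies this standard formula to the QVD values quoted above. The negative likelihood ratio is included only for completeness and is not reported in the source, and the function names are illustrative.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1], specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = (1 - sensitivity) / specificity."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1]")
    return (1.0 - sensitivity) / specificity


# QVD values for diagnosis of UUI as quoted in the text.
lr_pos = positive_likelihood_ratio(0.82, 0.79)
lr_neg = negative_likelihood_ratio(0.82, 0.79)  # not reported in the source
print(round(lr_pos, 2), round(lr_neg, 2))
```

The computed LR+ is about 3.9, close to the reported 4.0; the small difference is presumably because the published sensitivity and specificity are themselves rounded.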

Table 2

Correlations between patient-reported outcome measures and bladder diary endpoints


Measure | Correlations
OABSS [11] and [22] | • OABSS compared with a 3-d bladder diary
 | • Statistically significant improvements in all OABSS and corresponding bladder diary variables (p < 0.001 for all variables)
 | • High correlations (Spearman's rho ≥ 0.5) between score changes in nighttime frequency and urgency incontinence
 | • Urgency and daytime frequency correlation coefficients were r = 0.40 (p < 0.001) and r = 0.26 (p < 0.001), respectively, demonstrating low to moderate correlation with their corresponding bladder diary variables
OAB-q/V8 [23] and [24] | • OAB-q scores compared with urgency, daytime frequency, and nocturia from a 1-wk bladder diary and with urogynecologist diagnosis
 | • Coping and social interactions subscales were significantly correlated with the no. of urinations/d (r = –0.20 and –0.23, respectively, p = 0.02). The sleep subscale and no. of urinations per night were highly correlated (r = –0.50, p < 0.0001)
 | • OAB-V8 is an 8-item version of the OAB-q; OAB-V8 bothersomeness scores compared with bladder diary and clinician diagnosis
QVD [25] | • Four QVD subscales (type and amount of fluid intake, urinary output, urinary symptoms, and fluid intake behavior) demonstrated high correlations with a 48-h bladder diary
 | • Correlation between QVD fluid intake and bladder diary was high (r = 0.65–0.83, p < 0.01)
 | • High correlation between fluid intake behavior and urinary frequency (r = 0.82, p < 0.01), urgency (r = 0.77, p < 0.01), and urge incontinence (r = 0.71, p < 0.01)

OABSS = Overactive Bladder Symptom Score; OAB-q = Overactive Bladder Questionnaire; OAB-V8 = Overactive Bladder Awareness Tool; QVD = Questionnaire-Based Voiding Diary.

3.3. Burden, over/underestimation, and lack of validation

Several publications highlight issues regarding the burden of, lack of compliance with, and over- or underestimation of symptom frequency using bladder diaries. Diaries place a large inconvenience on patients [22] and [27]. In one study, compliance with diaries was found to be high in the office setting, yet 52% of patients demonstrated issues with adherence to instructions at home [28]. In another study, only 47% of women (p = 0.01) were found to accurately report daytime frequency using a diary [29]. In other studies, many patients overestimated or underreported nighttime frequency using a diary when compared with a medical chart [30] and [31].

Although bladder diaries are considered to be the gold standard for OAB diagnosis and remain useful in clinical practice and research, they lack validation and vary greatly in terms of content, format, and duration of recall period. In 2011, Bright et al [32] conducted a review of 81 studies using bladder diaries and concluded that, at that time, no validated urinary diary existed. See Table 3 for a summary of burden, over/underestimation, and lack of validation in bladder diaries.

Table 3

Burden, over/underestimation, recall, and lack of validation in bladder diaries


Burden | • Patients must keep the diary for several consecutive days
 | • In one study, 52% of patients had issues with adherence to instructions for proper use at home [28]
Over/underestimation | • In one study, only 47% of women were found to accurately report daytime urinary frequency using a bladder diary [29]
 | • Other studies suggest that both male and female patients may overestimate or underreport the frequency of nocturia using a bladder diary [30] and [31]
Recall period | • In general for PRO measures, shorter recall periods are considered better because rating variance increases with the delay between an event or experience and its reporting [34]
Lack of validation | • Diaries vary greatly in terms of content, format, and duration of recall period
 | • Only one bladder diary has been evaluated for criterion and construct validity, reliability, and responsiveness [32]

PRO = patient reported outcome.

3.4. Recall periods

In diagnosing OAB, completion of the diary for 2–3 d has been recommended [33]; other recommendations in the literature range from 24 h to 2 wk [9]. In clinical trials it is common to complete diaries for 3–7 d. In general, shorter recall periods are considered better than longer recall periods because rating variance increases with the delay between an event and its reporting [34]. However, researchers have found that 1-wk diaries are as reliable as 2-wk diaries, and a comparison of a 5-d diary with a 24-h diary found that the 24-h diary overestimated the maximum volume voided [35] and [36].

Recall periods in other chronic, symptomatic conditions were reviewed. In pain and fatigue assessments, when momentary reports were compared with recalled reports (over 1–28 d) substantial concordance was found between reports, suggesting that longer recall periods do not necessarily lead to substantially less accurate results [37]. Research in cancer pain confirms that 24-h recall and 7-d recall can be highly correlated [38]. Conversely, there is some evidence, in pain, that a 7-d window may more accurately characterize a patient's condition than the assessment of their current status [39]. See Table 3 for a summary of recall periods in bladder diaries.

The International Consultation on Incontinence Research Society highlighted the need for a standardized measure in all outcome evaluations to increase comparability and standardize the assessment between different treatment evaluations in different populations [3]. The society recommends that a comprehensive evaluation should encompass satisfaction, symptoms, HRQoL, and adverse events as the minimum elements of any outcome measurement. It is of note that OAB clinical trials have reported individual symptoms in isolation (eg, frequency) as primary outcomes; however, this approach may neither portray true therapeutic outcomes nor reflect what matters most to patients [2]. Instead, the use of composite endpoints may more accurately reflect the nature of OAB symptoms and correlate better with improved patient HRQoL, treatment satisfaction, and persistence, thereby harmonizing the reporting of trial data by removing confusion caused by individual symptom reporting.

3.6. Endpoints in similar syndrome-defined conditions

We also examined literature in relevant therapeutic areas and syndrome-defined chronic conditions (eg, restless legs syndrome [RLS]) that are patient identified and that have relied on diaries to gather symptom response. In interstitial cystitis/bladder pain syndrome, where investigators have historically relied on diaries to assess treatment, our review reveals a change in endpoints. In a 2014 phase 3 RCT for the treatment of interstitial cystitis, investigators used the O’Leary-Sant questionnaire as the primary outcome measure instead of a diary [40].

Trials in benign prostatic hyperplasia (BPH) rely on PROMs as a primary endpoint. In a recent RCT comparing monotherapy with combination therapy for OAB symptoms induced by BPH, the primary endpoint was the total change in OABSS [41]. Secondary endpoints included the change in both OABSS and total International Prostate Symptom Score. A systematic review of solifenacin/tamsulosin therapy for patients with BPH reveals widespread utilization of the International Prostate Symptom Score as a coprimary endpoint alongside diaries [42]. RCTs of treatments for RLS now routinely rely on PROMs to document treatment efficacy, tolerability, symptom severity, and improvement. Allen et al [43] compared treatments for RLS using PROMs instead of traditional diary outcomes. Similarly, other pharmacological trials have defined RLS treatment response in terms of PROM endpoints [44] and [45].

Tension headache and migraine have historically relied on the use of diaries for diagnosis and treatment. Clinical studies now incorporate PROMs as primary, coprimary, and secondary endpoints. Widely used PROMs with correlations to diaries include the Migraine Disability Assessment and Headache Impact Test [46].

This review emphasizes the limitations of the traditional use of bladder diaries as primary endpoints in OAB trials. While diaries play an important role in diagnosis, the results highlight that diaries allow for a unique bladder-training effect and contribute to the placebo effect seen in clinical trials. As there is a strong correlation between existing PROMs and diaries, the development of a new PROM as an alternative to existing measures and diaries for assessing treatment outcome will bring added value. Such a tool would provide better understanding of OAB treatment efficacy. We acknowledge, however, that issues with current instruments exist. The commonly used questionnaires were developed prior to current European Medicines Agency, US Food and Drug Administration, and International Society for Pharmacoeconomics and Outcomes Research guidelines for the development and validation of PRO measures [47], [48], and [49]. Also, there is no standard recommendation for the most appropriate recall period to use in any study, although the recall period used should match the purpose of the study. A new measure appropriately developed with a longer recall period could reduce patient burden and lead to better overall compliance with symptom recording.

Existing PROMs would serve as a starting point for the development of a new PROM that would correlate strongly with all aspects of a bladder diary, would quantify OAB symptoms, and incorporate evaluation of satisfaction and HRQoL.

A measure that incorporates key symptoms measured in a diary and assesses impact on the patient, such as HRQoL and satisfaction measures, would offer advantages over existing assessments. Firstly, if the recall period is extended from momentary assessment to weekly, the training effect could be reduced as the frequency of assessment is decreased. Secondly, the incorporation of an HRQoL assessment may reduce the placebo effect, as it may be more difficult to subconsciously change behavior to improve HRQoL outcomes. We recognize that this is theoretical, and the placebo effect will not completely disappear; however, a brief symptom and HRQoL assessment utilizing a weekly recall has the potential to characterize disease burden more accurately than a diary alone, improve efficacy detection in clinical trials, and provide a less burdensome method for patients to record their OAB complaints.

Author contributions: Christopher R. Chapple had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Chapple, Kelleher, Evans, Kopp, Siddiqui, Johnson, Mako.

Acquisition of data: Evans, Kopp, Johnson, Mako.

Analysis and interpretation of data: Siddiqui, Chapple, Kelleher, Evans, Kopp, Johnson, Mako.

Drafting of the manuscript: Chapple, Kelleher, Evans, Kopp, Siddiqui, Johnson, Mako.

Critical revision of the manuscript for important intellectual content: Chapple, Kelleher, Evans, Kopp, Siddiqui, Johnson, Mako.

Statistical analysis: Evans, Kopp, Johnson, Mako.

Obtaining funding: Siddiqui.

Administrative, technical, or material support: Siddiqui, Evans, Kopp, Johnson, Mako.

Supervision: Chapple, Kelleher, Evans, Kopp, Siddiqui, Johnson, Mako.

Other: None.

Financial disclosures: Christopher R. Chapple certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: Astellas.

Acknowledgments: Bladder Assessment Tool Advisory Committee: Pamela Brandt, Chris Chapple, Chris Evans, Zalmai Hakimi, Yukio Homma, Con Kelleher, Kathleen Kobashi, Zoe Kopp, Chris Payne, and Emad Siddiqui.

