OP20: Distribution of observation period length, stratified by gender

This query is used to provide summary statistics for the observation period length across all observation period records stratified by gender (gender_concept_id): the mean, the standard deviation, the minimum, the 25th percentile, the median, the 75th percentile, the maximum and the number of missing values. The length of an is defined as the difference between the start date and the end date. All existing gender_concept_id values are summarized.


Sample query run:
The following is a sample run of the query.
Oracle only
SELECT gender
     , count(*) as observation_periods_cnt
     , min( period_length ) AS min_period
     , max( period_length ) AS max_period
     , round( avg( period_length ), 2 ) AS avg_period
     , round( stdDev( period_length ), 1 ) AS stdDev_period
     , PERCENTILE_DISC(0.25) WITHIN GROUP( ORDER BY period_length ) 
                                     AS percentile_25
     , PERCENTILE_DISC(0.5)  WITHIN GROUP (ORDER BY period_length )
                                     AS median
     , PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY period_length )
                                     AS percentile_75
  FROM -- person, gender, period_length
     ( SELECT person_id, gender, period_length
         FROM /* person, gender */
            ( SELECT person_id, concept_name AS gender
                FROM -- person, first observation date
                   ( SELECT person_id
                          , min( observation_period_start_date ) AS first_observation_date
                       FROM observation_period
                      GROUP BY person_id
                JOIN person USING( person_id )
                JOIN vocabulary.concept ON concept_id = gender_concept_id
               WHERE year_of_birth IS NOT NULL
         JOIN -- person, length of observation periods
            ( SELECT person_id
                   , observation_period_end_date - observation_period_start_date + 1
                     AS period_length
                FROM observation_period
            ) USING( person_id )
 GROUP BY gender
Output field list:
 Field  Description
gender Gender concept name
observations_period_cnt Number of observation periods for specific gender
min_period Minimum duration of observation period in days
max_period Maximum duration of observation period in days
avg_period Average duration of observation in days
stdDev_period Standard deviation of observation
percentile_25 25th percentile of observation periods in days
median Median of observation periods in days
percentile_75 75th percentile of observation periods in days

Sample output record:
 Field  Description
gender  MALE
observations_period_cnt  1812743
min_period  1
max_period  2372
avg_period  653.77
stdDev_period  502.40
percentile_25  365
median  457
percentile_75  731