By Jo Cotton, Faculty of Education, University of Cambridge
The term ‘secondary’ refers to data that has been collected by another researcher and shared for use and analysis by others. This is in contrast to primary data, which is collected explicitly for use in a new study. At the start of my PhD, I did not intend to analyse secondary data. I wanted to recruit a new sample and test a social skills intervention for children with Attention Deficit Hyperactivity Disorder (ADHD) (so interesting, right?!). However, during my early reviews of the literature I was inexorably drawn to the lack of clarity about the role of other childhood factors in adult outcomes, i.e. what would make an ADHD intervention actually make a difference in the long term? To study this, I needed longitudinal data, which is impractical to collect within PhD research. So, my thesis in the end was: “How do childhood ADHD and stress relate to adult wellbeing and educational attainment? A data science investigation using the 1970 British Cohort Study”.
The 1970 British Cohort Study (BCS70) was based on 17,198 babies born in one week in April 1970, representing an estimated 95-98% of births in Great Britain that week. The full cohort was surveyed again at ages 5, 10, 16, 26, 30, 34, 38, 42, and 46, and the study is ongoing. BCS70 is managed by the Centre for Longitudinal Studies at the Institute of Education, and the data is accessible free of charge to academic researchers in the UK via the UK Data Service. There are numerous other data sources available on the site for perusal and use; it is a proverbial data nirvana. An account can be created with a Raven ID. If you want to know more about BCS70 and/or the UK Data Service, have a look at these websites:
Measuring ADHD symptomatology in children born in 1970 was tricky, since the diagnosis didn’t exist then as it is defined today. If you’re interested, I published a paper on the method (Cotton & Baker, 2018).
Once that was done, I was able to analyse ADHD and other psychosocial factors measured at ages 0, 5, and 10 and their relationship to wellbeing and educational attainment outcomes at ages 34 and 42. My findings are of course subject to caveats, but here are a few examples:
- About 5 percent of children at age 10 in 1980 in Great Britain met an approximation of today’s DSM-5 ADHD diagnostic criteria.
- At age 10 (N=11,426) there was a significant relationship between higher ADHD symptomatology, higher chronic and life event stressors, and lower self-esteem and locus of control, with a small but practically important effect size.
- In a sample matched on sex and socio-economic factors specifically relevant to ADHD, childhood ADHD symptomatology did not have a practically significant effect on adult wellbeing or educational attainment.
Most aspects of secondary data analysis come with both perks and potential pitfalls. For example, you don’t have to collect the data yourself. The perks are probably self-evident: no need for complicated ethics procedures, recruitment, or the logistics of engaging with human participants, and no need to deal with all the unpredictable ways they unintentionally circumvent the best-laid plans. An ethics form is required, but mine was very straightforward, and it may be necessary to engage with a few technical people to gain access to the data or ask questions about it. Otherwise, there is very little messy, time-consuming human contact.
The pitfall is the flip side of that: secondary data analysis is a solitary activity, and it can be incredibly challenging to manage your own time when there are no other humans expecting you to show up, call back, or even reply to an email. It also involves endless hours of nothing but working on a computer, which puts strain on eyes, hands, wrists, neck, and back, and can harm posture, general fitness, and mental health. All of these risks can be managed, for example by taking regular breaks, physical exercise, socialising, and other forms of self-care. It is easy to allow a PhD to become all-consuming, and self-care activities do take time and effort, so it is important to build them into the project plan.
Given there is no new data to collect, it’s not necessary to fully define the research questions in advance. This allows the questions to evolve over time, which can lead to better questions. However, it can also lead to ‘analysis paralysis’. In BCS70 for example, the amount of data collected was vast. In just one sweep from age 10, over 3,000 data items were captured for each of the 14,875 cohort member responses. The permutations are endless. This can be overwhelming, and it is easy to get distracted considering all the different data analyses that could be done.
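To put a rough number on "endless", here is a minimal sketch that counts how many distinct sets of variables could be picked from a single sweep. It assumes a round figure of 3,000 items (the text only says "over 3,000"), counts unordered combinations rather than true permutations, and uses a hypothetical helper name, `candidate_analyses`, chosen purely for illustration.

```python
from math import comb

# Assumption: 3,000 is a round lower bound for "over 3,000 data items".
N_ITEMS = 3_000


def candidate_analyses(n_items: int, k: int) -> int:
    """Number of distinct unordered sets of k variables drawn from n_items."""
    return comb(n_items, k)


if __name__ == "__main__":
    # Pairs (e.g. one predictor, one outcome) and triples (e.g. adding a covariate).
    for k in (2, 3):
        print(f"sets of {k} variables: {candidate_analyses(N_ITEMS, k):,}")
```

Even at this lower bound there are about 4.5 million possible pairs of variables and about 4.5 billion possible triples, which is why it helps to narrow the research questions before opening the data.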
I’ve outlined perks and pitfalls in more detail in a mind-map:
To sum up, secondary data analysis can be a great way to study relationships between constructs, particularly when large, rich, longitudinal samples are important. It supports learning more advanced quantitative methods and software, and it does not have to be used exclusively: it can also augment primary data. As with any method there are potential pitfalls, but they can be planned for and managed.
Overall, I’m glad I analysed secondary data and would certainly do it again. I had much greater statistical power and stronger findings than I could have possibly achieved with primary research, so in my case the benefits outweighed the difficulties.
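As a rough illustration of why sample size matters so much for statistical power, the sketch below approximates the power of a simple two-group comparison using a normal approximation. It is not the analysis used in the thesis: the function name `two_group_power`, the effect size, and the group sizes are all hypothetical values chosen only to contrast a small primary sample with a cohort-sized one.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is a standardised mean difference (Cohen's d).
    Uses the normal approximation, so it is only a rough guide.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                # critical value for the chosen alpha
    shift = effect_size * sqrt(n_per_group / 2)      # expected test statistic under the effect
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical numbers: a small effect (d = 0.2) at two very different sample sizes.
    for n in (50, 5_000):
        print(f"n per group = {n:>5}: power ~ {two_group_power(0.2, n):.3f}")
```

With a small effect, a study of 50 participants per group would usually miss it, whereas thousands per group make detection close to certain under these assumptions.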
Cotton, J., & Baker, S. T. (2018). A data mining and item response mixture modeling method to retrospectively measure Diagnostic and Statistical Manual of Mental Disorders‐5 attention deficit hyperactivity disorder in the 1970 British Cohort Study. International Journal of Methods in Psychiatric Research. https://doi.org/10.1002/mpr.1753
Jo Cotton is a doctoral candidate in the Faculty of Education at the University of Cambridge researching the relationship between stress and Attention Deficit Hyperactivity Disorder (ADHD). Jo was previously a technology and change management consultant, and left the corporate world to pursue ADHD advocacy. She hopes to use her research results to develop changes in schools that will lead to less stress and better outcomes for children and families affected by ADHD.