Posts Tagged ‘smoking’

This is a post inspired by a question I saw online: Which single public health intervention would be most effective in the UK?

I would like to share my own views on the question, although don’t expect anything comprehensive as I don’t have much experience of how an idea can be taken further to impact policy and public health practice.

‘Investigating addiction in the UK’ study. Source URL: http://www.raconteur.net

Something must be done – and fast!

Legend has it that a great chess player travelled to Manhattan to take part in a World Chess tournament. Looking around Central Park, he saw that a crowd had gathered around a street chess player who was offering money to those who could beat him. He decided to give it a go – and after a gruelling match, they shook hands on a draw. This dented his confidence and ultimately caused him to return to his homeland without taking part in the tournament.

Little did he know that the street chess player was a grand master who wanted to pass time before taking part in the same tournament.

What has this got to do with a public health intervention? I will come back to it…

Over the last 7-8 years I have studied several common diseases as a scientist: diabetes, on which £1 of every £10 of the NHS’s budget is spent; obesity, which is the major risk factor for heart attacks; and chronic obstructive pulmonary disease (COPD), currently the third leading killer in the world. From my observations, it is clear that cheap and effective treatments for these diseases are a long way away. This is not to say that there is no progress: tremendous research is being carried out on (i) understanding the molecular causes of these diseases (e.g. the genes and proteins that cause them) and (ii) developing new therapies. However, the ongoing economic costs of treating patients with current state-of-the-art therapies are reaching infeasible levels, with a significant proportion being wasted on patients who do not adhere to their prescriptions properly [1] and ‘top selling’ drugs being so inefficient that up to 25 patients need to be treated in order to prevent one adverse event such as a heart attack [2]. These diseases drain the NHS’s budget, cost hundreds of thousands of people their lives and healthy years, and cause emotional distress to patients and their loved ones. If something is not done now – and quickly – later generations may not have an NHS that is ‘free and accessible to all’ to rely on, as the system is already showing signs of failure in many parts of the country [3,4] despite costing around one-fifth of the government’s annual budget.

Parents need help!

What is also striking about these diseases is that up to 9 in 10 cases are thought to be preventable. Concentrating on prevention rather than ‘cure’ therefore makes most sense, as it is the only economically feasible solution. No single public health intervention is going to solve all the problems that the UK health system currently faces, but one thing that has always stared me in the face is how clueless and/or irresponsible most parents are, regardless of which socio-economic stratum they belong to – I am writing this sentence as I read an article on a teenager who died from obesity after his mother continually brought takeaway to his hospital bed [5]. The consequence is children living through many traumatic experiences, picking up bad habits and developing health problems due to a combination of ignorance, lack of guidance and toxic environments.

A wise man was once asked: “How do we educate our children?” and he is said to have replied “Educate yourself as they will imitate you”. As a new father, I have observed first-hand that my child is learning virtually everything in life from my wife and me. Thinking back, my parents never smoked, did not allow any visitors to smoke in the house, and kept me away from friends who smoked. Their actions were the main reason why my three siblings and I never started smoking – although there was pressure from my school friends. Research suggests that this is true across the general population: if parents do not smoke, their children are more likely to become adults who do not smoke either [6]; if parents prepare healthy food, their children will too; if parents do not drink or drink moderately, their children will too; if parents are educated, their children will be too [7]; and the list goes on… As the only economically feasible hope seems to be prevention, there is no better place to start than educating parents.

Since starting as a researcher at my current institute, I have been to a dozen or so ‘induction courses’, taking lessons on a variety of subjects from ‘equality and diversity’ to ‘fire safety’ to ‘unconscious biases’. Although most seemed a bit of a time waster at first, after enrolling on them I soon accepted that they were important, as I had not known how crucial they were in certain situations – situations that are more common than one would think. I would not have attended them if they were not mandatory.

However, arguably, none of the skills that I picked up in these induction courses is as important as being a good parent and helping my children achieve their potential physically, intellectually, psychologically, emotionally and socially. I think it is irresponsible that no mandatory training exists before people become parents. As parents, we are expected not just to keep our children alive by providing for them, but also to be good dieticians, sleep coaches, pedagogues, psychiatrists, life coaches, friends… Unsurprisingly, many parents are failing horribly, as we are not equipped with a solid foundation to guide our children properly. The result is: one-third of the population is obese, one-fourth drink above advised thresholds, one-fourth of students report having taken drugs, one-fifth smoke (noting that vaping is not included in this figure), one-fifth show symptoms of anxiety or depression and up to one-tenth may be game addicts.

To help parents in this long and extremely difficult journey of parenthood, I propose mandatory courses tailored for first-time parents – with exemptions & alternatives available. The specific syllabus and the length of the course should be shaped by pedagogy, public health, psychology, sociology, and epidemiology experts but also by the parents themselves.

In this course parents can:

  1. Be persuaded about the importance of such a course – just as I learned that spending time learning about fire safety was not a bad idea
  2. Be provided with links on where to easily find reliable information (e.g. NHS website)
  3. Learn about the mental and physical health aspects of smoking, drinking alcohol, exercising, eating high sugar content food, pollution, watching TV, reading books, cooking healthy food, mould, asthma triggers, excessive use of social media etc.
  4. Feed back any problems they have to a central panel and make suggestions as to how the course could be improved
  5. Hear about local activities (e.g. ‘Stop smoking’ events, English courses, even events such as Yoga classes)
  6. Receive information about who they can contact if they themselves have addiction problems (e.g. smoking, alcohol, drugs, gambling)
  7. Learn about what to look out for in their children (e.g. any obvious signs of physical and mental diseases, bullying)
  8. Be encouraged to support their children in achieving their potential – no matter what background they come from
  9. Be encouraged to offer help in local as well as national problems such as the organ donor shortage, climate change (recycling, carbon emissions), air pollution etc.
  10. Be reminded of the responsibility to provide future generations a sustainable world
  11. Be taught about the relevant laws (e.g. child seats, domestic abuse, not leaving children at home on their own).

I believe that if the course is designed with the help of experts but also of parents, it can be engaging and lead to more knowledgeable parents. This in turn will lead to positive changes in behaviour and a significant drop in the incidence of unhealthy diets/lifestyles, (at least heavy) smoking, substance use and binge drinking – major causes of the abovementioned common diseases. To ensure that parents engage and take part in the process, I think an exam should be administered, which individuals who fail should re-take. Parents who contribute to the process with feedback and suggestions can be rewarded with minor presents or a simple ‘thank you’ card from the government itself – a gesture that is bound to make parents feel part of a bigger process. Parents who are engaged in this process will also be encouraged to engage with their children’s education and help their teachers when the children start going to school. Parental participation, in turn, will positively affect academic achievement and the healthy development of children – a phenomenon shown by many studies [8,9]. Incentives such as additional child tax credit/benefit and/or paid parental leave for both parents should be considered to increase true participation rates.

These courses can then be accompanied by a number of optional courses where NGOs and volunteers from the local community can offer advice on matters such as ‘how to quit smoking?’, ‘how to find jobs?’, online parenting, English language courses (for non-speakers), and engaging children with local sports teams. I would certainly volunteer to give a session on the genetic causes of diabetes and obesity – and I know there are plenty of academics and professionals (e.g. experienced teachers, solicitors) out there who would happily offer free advice to those who are interested. There are NGOs providing information on almost all diseases and health-related skills (e.g. CPR, first-aid) and this course would offer a more targeted and cost-efficient platform for them to disseminate their brochures and information on their upcoming events.

Many upper-middle to upper class parents regularly attend similar courses and events – and making this available to every parent would represent another way to close ‘the gap’ [10]. Old problems persist but new ones are added on top such as online gaming, e-cigarettes, FOMO and betting addiction – and the courses can evolve with the times. A government which successfully implements such a course can leave a great legacy as social interventions have long lasting impact and even affect other countries.

One could argue that a course like this should be offered to every citizen at a few key stages in their lives (e.g. first parenthood, before first child reaches puberty) – and that would be the ultimate aim. But this option may initially be very costly and hard to organise, whereas focusing on parents ensures that not only the parents are educated but consequently the children are too – making the process more cost efficient. The first courses could be trialled in certain regions of the country before going nation-wide.

We are all in the same boat – whether we realise or not

I would like to digress a little to mention the potential sociological benefits of the proposed course. Tolstoy, in Anna Karenina, wrote “Happy families are all alike; every unhappy family is unhappy in its own way” – an aphorism that is also increasingly used in public health circles. However, I observe and believe that many of us are unhappy for similar reasons: we all want to be listened to, to be understood and to feel cared about. I believe the proposed course, accompanied by an honest feedback system, would be a great start in getting the ‘neglected masses’ involved in national issues.

I would like to finish by returning to the little story at the start. I believe that many parents, especially those from poorer backgrounds, give up trying for their children early on as they do not think that they or their children can compete against other ‘well-off’ individuals and therefore see no future for themselves. Their children and grandchildren also end up in this vicious cycle. But if they get to see first-hand in the proposed course that we all – rich and poor – start from not too dissimilar levels as parents and have the same anxieties about our children, this could motivate us all to push a little bit extra and hopefully close the massive gaps that exist between the different socio-economic strata in the UK [11] – and ultimately decrease the prevalence of the diseases that are crippling the NHS.

Further reading

  1. Schork, N. 2015. Personalized medicine: Time for one-person trials. Nature. 520(7549)
  2. Bluett et al., 2015. Impact of inadequate adherence on response to subcutaneously administered anti-tumour necrosis factor drugs: results from the Biologics in Rheumatoid Arthritis Genetics and Genomics Study Syndicate cohort. Rheumatology. 54(3):494-9
  3. NHS failure is inevitable – and it will shock those responsible into action. The Guardian. URL: https://www.theguardian.com/commentisfree/2018/apr/06/nhs-failure-health-service. Accessed on 30th October 2019
  4. The first step towards fixing the UK’s health care system is admitting it’s broken. Quartz. https://qz.com/1201096/by-deifying-the-nhs-the-uk-will-never-fix-its-broken-health-care-system/. Accessed on 30th October 2019
  5. Teenager Dies from Obesity After Mother Brought Takeaways to His Hospital Bed – Extra.ie. URL: https://extra.ie/2019/09/12/news/extraordinary/child-dies-obesity-mum-hospital. Accessed on 27th October 2019
  6. Mike Vuolo and Jeremy Staff. 2013. Parent and Child Cigarette Use: A Longitudinal, Multigenerational Study. Pediatrics. 132(3): 568–577
  7. Sutherland et al. 2008. Like Parent, Like Child. Child Food and Beverage Choices During Role Playing. Arch Pediatr Adolesc Med. 162(11): 1063–1069
  8. Sevcan Hakyemez-Paul, Paivi Pihlaja & Heikki Silvennoinen. 2018. Parental involvement in Finnish day care – what do early childhood educators say? European Early Childhood Education Research Journal, 26:2, 258-273
  9. Jennifer Christofferson & Bradford Strand. 2016. Mandatory Parent Education Programs Can Create Positive Youth Sport Experiences. A Journal for Physical and Sport Educators. 29:6, 8-12
  10. How Obesity Relates to Socioeconomic Status. Population Reference Bureau. URL: https://www.prb.org/obesity-socioeconomic-status/. Accessed: 18/12/19
  11. Nancy E. Adler, Katherine Newman. 2002. Socioeconomic Disparities In Health: Pathways And Policies. Health Affairs. 21:2, 60-76


A ‘Circos’ plot (with three concentric circular ‘Manhattan’ plots) presenting results from our latest genetic association study of smoking behaviour – showing some (not all) regions in our genome that are associated with smoking behaviour (Erzurumluoglu, Liu, Jackson et al, 2019). SI: Smoking initiation – whether they smoke or not; CPD: Cigarettes per day – how many cigarettes they smoke per day; SC: Smoking cessation – whether they’ve stopped smoking after starting. Labels in the outer circle show the name of the nearest gene to the identified variants. X-axis: Genomic positions of the variants in the human genome (chromosome numbers, 1-22, in the outer circle), Y-axis: Statistical significance of the genetic variants in this study – the higher the peak, the greater the significance. Red peaks are the newly identified regions in the genome, and the blue ones were identified by previous groups. Image source: Molecular Psychiatry

I believe that all scientists should be bloggers and that they should spare some thought and time to explain their research to interested non-scientists without using technical jargon. This is going to be my attempt at one; hopefully it’ll be a nice and short read.

We’ve just published a paper in one of the top molecular psychiatry journals (well, named Molecular Psychiatry 🙂 ) where we tried to identify genetic variants that (directly or indirectly) affect (i) whether a person starts smoking or not, and once initiated, (ii) whether they smoke more. The paper is titled: Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci. It is ‘open access’ so anyone with access to the internet can read the paper without paying a single penny.

If you can understand the paper, great! If not, I will now try my best to explain some of the key points of the paper:

Why is it important?

Smoking causes all sorts of diseases, including respiratory diseases such as chronic obstructive pulmonary disease (which causes 1 in 20 of all deaths globally; more stats here) and lung cancer – which causes ~1 in 5 of all cancer deaths (more stats here). Therefore understanding what causes individuals to smoke is very important. A deeper understanding can help us develop therapies/interventions that help smokers to stop and have a massive impact on reducing the financial, health and emotional burden of smoking-related diseases.

Genes and Smoking? What!?

Around fifty genetic variants have now been identified as associated with various smoking behaviours, and 40 of them were identified in our latest study, including two on the X chromosome, which is potentially very interesting. There are probably hundreds more to be found*. So, although it’s hard to comprehend, yes: our genes – given the environment – can affect whether we start smoking or not, and whether we smoke more heavily or not. This is not to say that our genes determine whether we smoke and that we can’t do anything about it.

There are three main take-home messages:

1- I have to start by re-iterating the “given the environment” comment above. If there was no such thing as cigarettes or tobacco in the world, there would be no smoking. If none of our friends or family members smoked, we would probably not smoke either, no matter what genetic variants we inherited. So the ‘environment’ you’re brought up in is by far the most important reason why you may start smoking.

2- I also have to underline the term “associated”. What we’re identifying are correlations, so we don’t know whether these genetic variants are directly or indirectly affecting the smoking behaviour of individuals – bearing in mind that some might be statistical artefacts. Some of the genetic variants are more obviously related to smoking than others though: for example, variants in genes coding for nicotine receptors cause them to function less efficiently so more nicotine is needed to induce ‘that happy feeling’ that smokers get. Other variants can directly or indirectly affect the educational attainment of an individual, which in turn can affect whether someone smokes or not. I’d highly recommend reading the ‘FAQ’ by the Social Science Genetic Association Consortium (link below) which fantastically explains the caveats that come with these types of genetic association studies.

3- Last but not least, there are many (I mean many!) non-smokers who have these genetic variants. I haven’t got any data on this but I’m almost 100% sure that all of us have at least one of these variants – but a large majority of people in the world (~80%) don’t smoke.

Closing remarks

To identify these genetic variants, we had to analyse the genetic data of over 620k people. To then identify which genes and biological pathways these variants may be affecting, we queried many genetic, biochemical and protein databases. We’ve been working on this study for over 2 years.

Finally, this study would not have been possible (i) without the participants of over 60 studies, especially of UK Biobank – who’ve contributed ~400k of the total 622k, and (ii) without a huge scientific collaboration. The study was led by groups located at the University of Leicester, University of Cambridge, University of Minnesota and Penn State University – with contributions from researchers at >100 different institutions.

It will be interesting to see what, if any, impact these findings will have. We hope that there will be at least one gene within our paper that turns out to be a target for an effective smoking cessation drug.

Further reading

1- FAQs about “Gene discovery and polygenic prediction from a 1.1-million-person GWAS of educational attainment” – a must read in my opinion

2- Smoking ‘is down to your genes’ – a useful commentary on the NHS website on an older study

3- 9 reasons why many people started smoking in the past – a nice read

4- Genetics and Smoking – an academic paper, so quite technical

5- Causal Diagrams: Draw Your Assumptions Before Your Conclusions – a fantastic course on ‘Cause and Effect’ by Prof. Miguel Hernan at Harvard University

6- Searching for “Breathtaking” genes – my earlier blog post on genetic association studies

Data access

The full results can be downloaded from here

*in fact we know that there is another paper in press that has identified a lot more associations than we have


Download a PDF version of the blog post from here:


After performing a genome-wide association study (GWAS), we’d then ideally want to link the identified associations/SNPs to (druggable) genes and biological pathways. Unearthing novel biology can inform drug target (in)validation but also lead to higher-impact publications (see ‘selected publications’ below). The latter point is especially important for early-career researchers who will be applying for fellowships and/or lectureships soon 🙂

Happy to help out with any of the below.

A slide from my Journal club on the October 2017 GTEx paper: Identifying the causal variants and genes, and the relevant tissues and pathways is the ultimate aim of GWASs. If the causal gene(s) turns out to be ‘druggable’, it can lead pharmaceutical companies to develop treatments for the disease of interest. See My Research page to download the full slides.

Methods and Software

Below are some of the post-GWAS ‘SNP follow-up’ steps and software that I have been using for the last 2-3 years:

1- Finemapping the identified signals:

This step refines each signal to a set of variants that are 99% likely to contain the underlying causal variant – assuming the causal variant has been analysed

• Wakefield method [1] – Output: 99% credible set (Tutorial and R code available here: Wakefield_method_finemapping)
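As a rough illustration of the fine-mapping step above, here is a minimal Python sketch of Wakefield-style approximate Bayes factors turned into a 99% credible set. It is not the R code from the linked tutorial: the function names, the toy SNPs and the prior standard deviation of 0.2 are all assumptions made for the example, and it assumes a single causal variant per signal with equal prior probability for every SNP.

```python
import math

def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor in favour of association for one SNP.

    prior_sd is the assumed prior standard deviation of the true effect
    (an illustrative choice, not a value taken from the post).
    """
    v = se ** 2              # sampling variance of the effect estimate
    w = prior_sd ** 2        # prior variance of the true effect
    r = w / (v + w)          # shrinkage factor
    z = beta / se
    return 0.5 * math.log(1 - r) + 0.5 * z * z * r

def credible_set(snps, coverage=0.99, prior_sd=0.2):
    """snps: list of (snp_id, beta, se). Returns [(snp_id, posterior), ...]."""
    logs = [(snp_id, log_abf(b, s, prior_sd)) for snp_id, b, s in snps]
    top = max(value for _, value in logs)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [(snp_id, math.exp(value - top)) for snp_id, value in logs]
    total = sum(weight for _, weight in weights)
    ranked = sorted(((snp_id, weight / total) for snp_id, weight in weights),
                    key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for snp_id, posterior in ranked:
        chosen.append((snp_id, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical summary statistics for four SNPs at one signal.
toy_signal = [("rs_a", 0.050, 0.01), ("rs_b", 0.048, 0.01),
              ("rs_c", 0.030, 0.01), ("rs_d", 0.005, 0.01)]
toy_set = credible_set(toy_signal)
```

Working in log space matters in practice because strongly associated SNPs can have Z-scores large enough to overflow a plain exponential.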

2- Query eQTL databases:

Rather than just assume that the gene nearest to the sentinel SNP is the causal gene, we can bring in other lines of evidence such as eQTL and pQTL analyses to check whether the SNP(s) is associated with the expression of a gene.

• GTEx v7 dataset (n up to 492; RNASeq) [2] – publicly available at [3] (see My Research page to download my Journal club slides on GTEx v6 paper)

• NESDA-NTR Blood eQTL dataset (n=4,896; microarray) [4] – publicly available at [5]

• Lung eQTL dataset (n=1,111; microarray) [6] – need to request lookups from Dr. Ma’en Obeidat

• BIOS (Biobank-Based Integrative Omics Study) Blood eQTL dataset (n=2,116; RNAseq) [7] – publicly available at [8]

• Westra et al Blood eQTL dataset (n=5,311 with replication in 2,775; microarray) [9] – publicly available at [10]

• There are other tissue/organ specific databases such as BRAINEAC (n=134) and Brain xQTL (n=up to 494)

3- eQTL-GWAS signal colocalisation:

• eCAVIAR [11] by Hormozdiari et al, 2016 [12] – Click for Powerpoint presentation (ecaviar_colocalisation_mesut_04_07_18) and methods (ecaviar methods_v3)

• It also helps to plot the Z-scores of the eQTL (separate plots for each gene near the signal) and GWAS SNPs on the same plot – maybe with the SNPs in the 99% credible set marked differently to other SNPs near the sentinel SNP. Of course, choosing the relevant tissue(s) is crucial!
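Before plotting eQTL and GWAS Z-scores together, the two datasets need to refer to the same effect allele, otherwise the direction of effect is meaningless. The sketch below shows one way this could be done; the dictionary layout and function names are hypothetical, and it simply skips SNPs whose alleles do not match in either orientation (e.g. strand problems) rather than trying to resolve them.

```python
def z_score(beta, se):
    """Z-score of an effect estimate."""
    return beta / se

def aligned_z_pairs(gwas, eqtl):
    """Pair GWAS and eQTL Z-scores on the GWAS effect allele.

    gwas / eqtl: dict of snp_id -> (effect_allele, other_allele, beta, se).
    Returns dict of snp_id -> (gwas_z, eqtl_z).
    """
    pairs = {}
    for snp_id, (g_ea, g_oa, g_beta, g_se) in gwas.items():
        if snp_id not in eqtl:
            continue
        e_ea, e_oa, e_beta, e_se = eqtl[snp_id]
        if (e_ea, e_oa) == (g_ea, g_oa):
            sign = 1.0       # same effect allele: keep the eQTL sign
        elif (e_ea, e_oa) == (g_oa, g_ea):
            sign = -1.0      # alleles swapped: flip the eQTL sign
        else:
            continue         # alleles do not match: leave this SNP out
        pairs[snp_id] = (z_score(g_beta, g_se), sign * z_score(e_beta, e_se))
    return pairs

# Hypothetical toy data: rs2 is reported on the opposite allele in the eQTL set.
toy_gwas = {"rs1": ("A", "G", 0.10, 0.02), "rs2": ("C", "T", -0.06, 0.02),
            "rs3": ("A", "C", 0.01, 0.02)}
toy_eqtl = {"rs1": ("A", "G", 0.30, 0.05), "rs2": ("T", "C", 0.20, 0.05),
            "rs3": ("G", "T", 0.10, 0.05)}
toy_pairs = aligned_z_pairs(toy_gwas, toy_eqtl)
```

The resulting pairs can be passed to any plotting library, with the credible-set SNPs given a different marker.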

4- Query pQTL databases:

• Sun et al, 2018 dataset [13] – need to request lookups from the authors (maybe Dr. Adam Butterworth)

5- Variant effect prediction:

Checking whether our sentinel SNP is in LD with a coding variant that is predicted to be functional provides another line of evidence for a putatively causal gene.

• DeepSEA – for noncoding SNPs [14] (see My Research page to download my Journal club slides on DeepSEA)

• SIFT, PolyPhen-2, and FATHMM via Ensembl VEP – for coding SNPs [15]

6- Enrichment of associations at DNase hypersensitivity sites:

Using your GWAS results to identify chromatin features relevant to your trait of interest can yield important information on the genetic aetiology of that trait (e.g. DNase hypersensitivity site enrichment in fetal lung would mean that developmental pathways in the lung are playing an important role)

• GARFIELD [16]

• FORGE [17] – very easy to use but superseded by GARFIELD

7- Pathway enrichment analysis:

• ConsensusPathDB [18] – as it queries more biological pathway and gene ontology databases than the alternatives. You can input all the genes that are implicated by eQTL/pQTL databases and variant effect prediction (e.g. genes that harbour a coding variant in the 99% credible set). Good idea to remove genes in the MHC region (e.g. HLA genes) to identify pathways other than the immune system-related ones. Methods can be found here: ConsensusPathDB_methods

• You can also do an additional check to see whether the ‘significant’ pathways (e.g. FDR<5%) are mainly due to the implicated genes – as identified by eQTL/pQTL and variant effect prediction (list 1) – or to the regions identified by the GWAS itself: extract all the genes within 500kb of the sentinel SNPs (list 2) and then make 100 lists (same size as list 1) with genes randomly selected from this set. Then input these to ConsensusPathDB and see how many times the pathways identified by list 1 appear in the output as ‘significant’.
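The random-list check described above can be sketched in Python as follows. The helper names and toy inputs are hypothetical, and the ConsensusPathDB step itself is not automated here: the sketch assumes you run each random list through ConsensusPathDB yourself and record the set of ‘significant’ pathways returned for each one.

```python
import random

def random_gene_lists(region_genes, list_size, n_lists=100, seed=1):
    """Draw n_lists random gene lists of list_size genes from list 2."""
    pool = sorted(set(region_genes))     # de-duplicate, fixed order
    if list_size > len(pool):
        raise ValueError("list 1 is larger than the pool of regional genes")
    rng = random.Random(seed)            # seeded so the lists are reproducible
    return [rng.sample(pool, list_size) for _ in range(n_lists)]

def pathway_recurrence(observed_pathways, random_results):
    """Fraction of random lists in which each list-1 pathway was significant.

    observed_pathways: set of pathways that were significant for list 1.
    random_results: list of sets, one per random list, holding the pathways
    reported as significant for that list.
    """
    n = len(random_results)
    return {pathway: sum(pathway in result for result in random_results) / n
            for pathway in observed_pathways}

# Hypothetical inputs: 50 regional genes and a list 1 of 5 genes.
toy_pool = ["GENE%d" % i for i in range(50)]
toy_lists = random_gene_lists(toy_pool, list_size=5)

# Hypothetical results for four random lists, for illustration only.
toy_recurrence = pathway_recurrence(
    {"pathway_1", "pathway_2"},
    [{"pathway_1"}, {"pathway_1", "pathway_3"}, set(), {"pathway_2"}],
)
```

A pathway that keeps turning up for the random lists is probably driven by the genomic regions rather than by the specific genes in list 1.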

8- LD score regression:

Bivariate LD score regression allows one to identify the genetic correlation between two traits which implies shared biology.

• LD Hub [19] – check the genetic correlation between your trait of interest and up to >600 traits (see My Research page to download my Journal club slides on LD Hub)

• Stratified LD score regression [20] – check if there’s significant enrichment of heritability at variants overlapping histone marks (e.g. H3K4me1, H3K4me3) that are specific to cell lines of interest (e.g. lung-related cell lines for a GWAS of a respiratory disease)

9- Single-variant and genetic risk-score PheWAS (phenome-wide association study):

• GeneAtlas [21] or the UK Biobank Engine [22] for single-variant PheWAS

• PRS Atlas [23] – for polygenic risk score PheWAS (see My Research page to download my Journal club slides on the PRS Atlas)

• Other automated and reliable software include PHESANT

10- Druggability analysis:

Once a list of potentially causal genes is created, one can then query drug/target databases to see whether the respective genes’ products (i.e. protein) are already targeted by certain compounds – or even better, in clinical trials (see ‘Approved Drugs and Clinical Candidates’ section for each protein in ChEMBL – if there is one).

• DGIdb – publicly available at [24]

• ChEMBL – publicly available at [25]

11- Protein-protein interactions:

If several proteins within your gene list are predicted/known to interact, this will provide a separate line of evidence for those genes – that is if they’re implicated by different signals/SNPs.

• STRING [26] – a score of >0.9 implies a ‘high-quality’ prediction

12- Literature review:

• A thorough literature review of the identified genes is always a good way to start a story. Download RefSeq_all_gene_summaries for extracted gene function summaries from RefSeq [27]

13- GWAS catalog lookup:

Checking to see if your associated SNPs are also associated with other traits can be important for (i) shared biology and (ii) specificity – can be important for drug target discovery.

• PhenoScanner [28]

• GWAS catalog – publicly available at [29]

14- Mouse Knockout studies:

• International Mouse Phenotyping Consortium (IMPC) [30] – see (i) if the genes of interest have been knocked out and (ii) what phenotypes were observed in the knockout mice

15- Mendelian randomization analysis:

Although over-hyped in my opinion, when carried out correctly it becomes a very useful tool to assess the causal relationship between an exposure and outcome. You can use your associated SNPs as a proxy for your trait (e.g. LDL cholesterol associated SNPs) and then check to see if your trait causes a disease (e.g. obesity)

• MR-Base [31] – carry out Mendelian randomization studies using your trait of interest as exposure or outcome
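To make the idea concrete, here is a minimal sketch of one common Mendelian randomization estimator, the fixed-effect inverse-variance weighted (IVW) method, computed from summary statistics. It is only an illustration with made-up numbers and a hypothetical function name, not a description of what MR-Base does internally, and it assumes independent SNPs that are valid instruments (no pleiotropy).

```python
import math

def ivw_estimate(instruments):
    """Fixed-effect IVW causal estimate from summary statistics.

    instruments: list of (beta_exposure, beta_outcome, se_outcome), one
    entry per SNP, with both betas given for the same effect allele.
    Returns (estimate, standard_error).
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in instruments)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in instruments)
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error

# Hypothetical SNPs whose outcome effects are exactly half their exposure effects.
toy_instruments = [(0.10, 0.050, 0.010), (0.20, 0.100, 0.020),
                   (0.15, 0.075, 0.015)]
toy_estimate, toy_se = ivw_estimate(toy_instruments)
```

Each SNP contributes a ratio of outcome effect to exposure effect, and the IVW estimate is their precision-weighted average, so the toy data above recover a causal estimate of 0.5.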

Selected Publications:

The methods above were used in the papers below:

1- Shrine, Guyatt, and Erzurumluoglu et al, 2018. New genetic signals for lung function highlight pathways and pleiotropy, and chronic obstructive pulmonary disease associations across multiple ancestries. Nature Genetics [32]

2- Wain et al, 2017. Genome-wide association analyses for lung function and chronic obstructive pulmonary disease identify new loci and potential druggable targets. Nature Genetics [33]

3- Allen et al, 2017. Genetic variants associated with susceptibility to idiopathic pulmonary fibrosis in people of European ancestry: a genome-wide association study. The Lancet Respiratory Medicine [34] – I like Figure 3 in this paper where they align and plot both the Lung eQTL and IPF GWAS results to visualise whether the causal variant in the eQTL study and GWAS are likely to be the same. However, as mentioned above at point 3 (i.e. eQTL-GWAS signal colocalisation), I would suggest using Z-scores rather than P-values to observe the direction of effects

4- Erzurumluoglu, Liu, and Jackson et al, 2018. Meta-analysis of up to 622,409 individuals identifies 40 novel smoking behaviour associated genetic loci. Molecular Psychiatry [35] – the Circos plot in this paper is brilliant! No competing interests declared 😉

Further reading

• Visscher et al, 2017. 10 Years of GWAS Discovery: Biology, Function, and Translation. AJHG

• Okada et al, 2014. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature – one of those inspirational papers; I really liked Figure 2 the first time I saw it

• Erzurumluoglu et al, 2015. Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process. BioMed Research International – I recommend this paper for PhD students who are looking for a comprehensive review comparing the ways Mendelian diseases and complex diseases are analysed. It is a little out of date in terms of the software/databases (e.g. The gnomAD database is not in there) that are in the tables but the main messages hold


Social Media
There’s a little thread under the below tweet, where Dr. Eric Fauman (Pfizer) states “The gene pointed at by an eQTL is actually less likely to be the causal gene”.



BBC news article published on the 18th March 2018. According to the article, men with low sperm counts are at a higher risk of disease/health problems. However, this is unlikely to be a causal relationship and more likely to be a spurious correlation. May even turn out to be the other way round due to “reverse causality”, a bias we encounter a lot in epidemiological studies. The following sounds more plausible (to me at least!): “Men with disease/health problems are likely to have low sperm counts” (likely cause: men with health problems tended to smoke more in general and this caused low sperm counts in those individuals).

As an enthusiastic genetic epidemiologist (keyword here: epidemiologist), I try to keep in touch with the latest developments in medicine and epidemiology. However, it is impossible to read all the articles that come out, as there are a lot of epidemiology and/or medicine papers published daily (in fact, too many!). For this reason, instead of reading the original academic papers (excluding papers in my specific field), I try to skim read reputable news outlets such as the BBC, The Guardian and Medscape (mostly via Twitter). However, health news even in these respectable media outlets is full of wrong and/or oversensationalised titles: they either oversensationalise what the scientist has said or take the word of the scientists they contact – who are not infallible and can sometimes believe in their own hypotheses too much.

It wouldn’t harm us too much if the message of an astrophysics-related publication were misinterpreted but we couldn’t say the same about health-related news. Many people take these news articles as gospel truth and make lifestyle changes accordingly. Probably the best example of this is the Andrew Wakefield scandal in 1998 – where he claimed that the MMR vaccine caused autism and gastro-intestinal disease but later investigations showed that he had undeclared conflicts of interest and had faked most of the results (click here for a detailed article on the scandal). Many “anti-vaccination” (aka anti-vax) groups used his paper to strengthen their arguments and – although now retracted – the paper’s influence can still be felt today as many people, including my friends, do not allow their children to be vaccinated as they falsely think they might succumb to diseases like autism because of it.

The first thing we’re taught in our epidemiology course is “correlation does not mean causation.” However, a great many epidemiology papers published today report correlations (aka associations) without bringing in other lines of evidence to support a causal relationship. Some of the “interesting ones” amongst these findings are then picked up by the media and we see a great many news articles with titles such as “coffee causes cancer” or “chocolate eaters are more successful in life”. There have been instances when I read the opposite in the same paper a couple of months later (example: wine drinking is protective/harmful for pregnant women). The problem isn’t caused only by a lack of scientific method training on the media side, but also by health scientists who are eager to make a name for themselves in the lay media without making sure that they have done everything they could to ensure that the message they’re giving is correct (e.g. triangulating using different methods). As a scientist who analyses a lot of genetic and phenotypic data, it is relatively easy for me to observe that the size of the data that we’re analysing has grown massively in the last 5-10 years. However, in general, we scientists haven’t been able to receive the computational and statistical training required to handle these ‘big data’. Today’s datasets are so massive that if we take the approach of “let’s analyse everything we got!”, we will find a tonne of correlations in our data whether they make sense or not.
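To give a feel for the “analyse everything” problem, here is a minimal sketch using only Python's standard library. All the numbers are assumptions chosen for illustration (100 people, 1,000 variables, a fixed random seed), and every variable is pure noise, so any “hit” is a chance finding. The cut-off of roughly 0.197 is the correlation size that corresponds to a two-sided P-value of about 0.05 when there are 100 people.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_chance_hits(n_people=100, n_variables=1000, r_cutoff=0.197, seed=1):
    """Correlate one random 'outcome' with many unrelated random variables.

    Returns how many variables pass the cut-off purely by chance.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    hits = 0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_people)]
        if abs(pearson_r(noise, outcome)) > r_cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    print("Chance 'findings' out of 1000 noise variables:", count_chance_hits())
```

With a 0.05 threshold we would expect something in the region of 5% of the noise variables to look “significant”, which is why a dataset with thousands of variables will always throw up publishable-looking correlations.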

To provide a simple example for illustrative purposes: let’s say that amongst the data we have in our hands, we also have each person’s coffee consumption and lung cancer diagnosis data. If we were to do a simple linear regression analysis between the two, we’d most probably find a positive correlation (i.e. increased coffee consumption means increased risk of lung cancer). Ten more scientists will identify the same correlation if they also get their hands on the same dataset; three of them will believe that the correlation is worthy of publication and submit a manuscript to a scientific journal; and one (the other two are rejected) will make it past the “peer review” stage of the journal – and this will probably be picked up by a newspaper. Result: “coffee drinking causes lung cancer!”

However, there’s no causal relationship between coffee consumption and lung cancer (not that I know of anyway :D). The reason we find a positive correlation is that there is a third (confounding) factor that is associated with both of them: smoking. Since coffee drinkers smoke more in general and smoking causes lung cancer, if we do not control for smoking in our statistical model, we will find a correlation between coffee drinking and lung cancer. Unfortunately, it is not very easy to eliminate such spurious correlations, so health scientists must make sure they use several different methods to support their claims – and not try to publish everything they find (see “publish or perish” for an unfortunate pressure to publish more in scientific circles).
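The coffee, smoking and lung cancer example can be simulated in a few lines. This is only a sketch under made-up assumptions: 30% of people smoke, smokers drink about 4 cups a day versus 2 for non-smokers, and lung cancer risk is 15% in smokers versus 1% in non-smokers. Crucially, coffee has no effect on cancer at all in this simulated world. Stratifying by smoking status is used here as the simplest way of “controlling for” the confounder.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_people=20000, seed=7):
    """Return (smoker, cups_of_coffee, has_cancer) tuples; all rates are assumed."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        smoker = rng.random() < 0.30
        # Smokers drink more coffee on average.
        coffee = rng.gauss(4.0 if smoker else 2.0, 1.0)
        # Cancer risk depends on smoking only, NOT on coffee.
        cancer = 1 if rng.random() < (0.15 if smoker else 0.01) else 0
        people.append((smoker, coffee, cancer))
    return people


def summarise(people):
    """Coffee-cancer correlation overall, then within smokers and non-smokers."""
    def r_for(rows):
        return pearson_r([coffee for _, coffee, _ in rows],
                         [cancer for _, _, cancer in rows])

    crude = r_for(people)
    in_smokers = r_for([p for p in people if p[0]])
    in_non_smokers = r_for([p for p in people if not p[0]])
    return crude, in_smokers, in_non_smokers


if __name__ == "__main__":
    crude, in_smokers, in_non_smokers = summarise(simulate())
    print("Ignoring smoking:    r =", round(crude, 3))
    print("Smokers only:        r =", round(in_smokers, 3))
    print("Non-smokers only:    r =", round(in_non_smokers, 3))
```

Ignoring smoking gives a clearly positive correlation between coffee and cancer, whereas within each smoking group the correlation hovers around zero. Real confounders are rarely this tidy or this well measured, which is why several independent lines of evidence are needed.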

A figure showing the incredible correlation between countries’ annual per capita chocolate consumption and the number of Nobel laureates per 10 million population. Should we then give out chocolate in schools to ensure that the UK wins more Nobel prizes? However, this is likely not a causal relationship as it makes more sense that there is a (confounding) factor that is related to both of them: (most likely) GDP per capita at purchasing power parity. To view even quirkier correlations, I’d recommend this website (by Tyler Vigen). Image source: http://www.nejm.org/doi/full/10.1056/NEJMon1211064.

As a general rule, I keep repeating to friends: the more ‘interesting’ a ‘discovery’ sounds, the more likely it is to be false.

Hard to explain why I think like this but I’ll try: for a result to sound ‘interesting’ to me, it should be an unexpected finding as a result of a radical idea. There are just so many brilliant scientists today that finding unexpected things is becoming less and less likely – as almost every conceivable idea arises and is being tested in several groups around the world, especially in well researched areas such as cancer research. For this reason, the idea of a ‘discovery’ has changed from the days of Newtons and Einsteins. Today, ‘big discoveries’ (e.g. Mendel’s pea experiments, Einstein’s general relativity, Newton’s laws of motion) have given way to incremental discoveries, which can be as valuable. So with each (well-designed) study, we’re getting closer and closer to cures/therapies or to a full understanding of the underlying biology of diseases. There are still big discoveries made (e.g. CRISPR-Cas9 gene editing technique), but if they weren’t discovered by that respective group, they probably would have been discovered within a short space of time by another group as the discoverers built their research on a lot of other previously published papers. Before, elite scientists such as Newton and Einstein were generations ahead of their time and did most things on their own, but today, even the top scientists are probably not too far ahead of a good postdoc as most science literature is out there for all to read in a timely manner (and more democratic compared to the not-so-distant past) and is advancing so fast that everyone is left behind – and we’re all dependent on each other to make discoveries. The days of lone wolves are virtually over as they will get left behind by those who work in groups.

To conclude, without carefully reading the scientific paper that the newspaper article is referring to – hopefully they’ve included a link/citation at the bottom of the page! – or seeking what an impartial epidemiologist is saying about it, it’d be wise to take any health-related finding we read in newspapers with a pinch of salt as there are many things that can go wrong when looking for causal relationships – even scientists struggle to make the distinction between correlations and causal relationships.

Amy Cuddy’s very famous ‘Power posing’ talk, which was the most watched video on the TED website for some time. In short, she states that if you give powerful/dominant looking poses, this will induce hormonal changes which will make you confident and relieve stress. However, subsequent studies showed that her ‘finding’ could not be replicated and that she did not analyse her data in the manner expected of a scientist. If a respectable scientist had found such a result, they would have tried to replicate their results; or at least would have followed it up with studies which bring other lines of concrete evidence. What does she do? Write a book about it by bringing in anecdotal evidence at best and give a TED talk as if it’s all proven – as becoming famous (by any means necessary) is the ultimate aim for many people; and many academics are no different. Details can be found here. TED talk URL: https://www.ted.com/talks/amy_cuddy_your_body_language_shapes_who_you_are

PS: For readers interested in reading a bit more, I’d like to add a few more sentences. We should apply the below four criteria – as much as we can – to any health news that we read:

(i) Is it evidence based? (e.g. supported by a clinical trial, different experiments) – homeopathy is a bad example in this regard as its remedies are not supported by clinical trials, hence the name “alternative medicine” (not saying they’re all ineffective and further research is always required but most are very likely to be);

(ii) Does it make sense epidemiologically? (e.g. the example mentioned above i.e. the correlation observed between coffee consumption and lung cancer due to smoking);

(iii) Does it make sense biologically? (e.g. if gene “X” causes eye cancer but the gene is only expressed in the pancreatic cells, then we’ve most probably found the wrong gene)

(iv) Does it make sense statistically? (e.g. was the correct data quality control protocol and statistical method used? See figure below for a data quality problem and how it can cause a spurious correlation in a simple linear regression analysis)

Wrong use of a statistical (linear regression) model. If we were to ignore the outlier data point at the top right of the plot, it becomes easy to see that there is no correlation between the two variables on the X and Y axes. However, since this outlier data point has been left in and a linear regression model has been used, the model identifies a positive correlation between the two variables – we would not have seen that this was a spurious correlation had we not visualised the data.
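The effect of a single outlier is easy to reproduce. The sketch below uses six hand-picked, hypothetical data points (not the data behind the figure): five points with exactly zero correlation plus one extreme point at the top right. The correlation is computed with and without the outlier.

```python
def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Five hypothetical points arranged so that x and y are completely uncorrelated.
clean_points = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 2)]
# One extreme data point, e.g. a data entry error that slipped past quality control.
outlier = (20, 20)


def correlation(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return pearson_r(xs, ys)


r_without_outlier = correlation(clean_points)
r_with_outlier = correlation(clean_points + [outlier])

if __name__ == "__main__":
    print("Without the outlier: r =", round(r_without_outlier, 3))
    print("With the outlier:    r =", round(r_with_outlier, 3))
```

Without the outlier the correlation is zero; with it, the correlation jumps to nearly 1. A regression model fitted to all six points would report a strong positive slope, which is why plotting the data before modelling matters so much.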

PPS: I’d recommend reading “Bad Science” by Ben Goldacre and/or “How to Read a Paper – The basics of evidence based medicine” by Trisha Greenhalgh – or if you’d like to read a much better article on this subject with a bit more technical jargon, have a look at this highly influential paper by Prof. John Ioannidis: Why Most Published Research Findings Are False.

References:

Wakefield et al, 1998. Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. The Lancet. URL: http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2897%2911096-0/abstract

Editorial, 2011. Wakefield’s article linking MMR vaccine and autism was fraudulent. BMJ. URL: http://www.bmj.com/content/342/bmj.c7452

Breathtaking genes: A ‘Circos’ plot depicting how chronic obstructive pulmonary disease (COPD) has become a global concern – the 3rd biggest killer, defined by poor lung function. Our work shows that many parts of our DNA play a role in our lung health. Peaks in red are newly discovered regions, and the blue ones were previously identified by other groups. Millions of genetic variants from tens of thousands of individuals were analysed in this study. The identified genes will help us understand why some of us have better lung function, and lead to the identification of drug targets of potential relevance to COPD.

A press release was issued by the University of Leicester Press Office on 6 February 2017 about a study that I was also heavily involved in (please click on links below for details):

“Breakthrough advance offers the potential to defuse a ‘ticking timebomb’ for serious lung disease, including for over 1 billion smokers worldwide (source: World lung health study allows scientists to predict your chance of developing deadly disease – University of Leicester)”

The study has received a lot of attention from the media, with articles appearing in large media outlets such as BBC News, The Independent and MSN News. If you’re interested in the details, please read the paper published in Nature Genetics.

If interested in reading about the area of Genetic Epidemiology itself, please have a look at my (previously published) blog post about the matter: Searching for “Breath taking” genes. Literally!

Details on Circos plot* (above): FEV1: Forced expiratory volume in 1 second; FVC: Forced vital capacity; FEV1/FVC: the ratio of the two measurements. Labels in the outer circle show the name of the nearest gene to the newly identified (red) variants. X-axis: Genomic position of variant in genome (chromosome number in the outer circle), Y-axis: Statistical significance of variant in this study (the higher the peak, the greater the significance).

*The figure is a more artistic version of Figure 1 (Manhattan plot) in the above mentioned academic paper. It did not make it into the final manuscript published in Nature Genetics (6th Feb 2017) as it was found to be “confusing” by one of the reviewers – and the editor agreed. 😦 However, the plot was shortlisted (title: Breathtaking genes) and displayed in the Images of Research exhibition (9th Feb 2017) organised by the University of Leicester. 😉

 

My role in the Wain et al paper mentioned above: I led the ‘functional follow-up’ of the identified associated variants (e.g. mining eQTL datasets, biological pathway analyses, identifying druggable genes, pleiotropy, protein-protein interactions) and appropriately visualised the GWAS results (various Manhattan and Circos plots). I was part of the core bioinformatics team of three in Leicester – alongside Dr. Nick Shrine and Dr. Maria Soler-Artigas.

 

References:

Wain LV et al., Published online 6th Feb 2017. Genome-wide association analyses for lung function and chronic obstructive pulmonary disease identify new loci and potential druggable targets. Nature Genetics. URL: https://www.nature.com/articles/ng.3787

We now know, through studies carried out by many natural scientists over decades, that smoking is a (considerable) risk factor for many cancers and respiratory diseases; but the public ignore these findings and keep smoking, which is where social scientists can help get the message across. This is just one example of where the social sciences can have a massive (positive) impact on society. Image taken from stopcancer.support

“Scientists focus relentlessly on the future. Once a fact is firmly established, the circuitous path that led to its discovery is seen as a distraction.” – Eric Lander in the Cell journal (Jan 2016)

 

As scientists in the ‘natural’ sciences (e.g. genetics, physics, chemistry, geology), we have to make observations in the real world and think of hypotheses and models to make sense of it all. To test our hypotheses, we then have to collect (sufficient amounts of) data and see if the data collected fit the results that our proposed model predicted. Our hypotheses could be described as our ‘prejudice’ towards the data. However, we then have to try and counteract (and hopefully eliminate) our biases towards the data by performing well-designed experiments. If the results back up our predictions, we of course become (very!) happy and try to (replicate and then) publish our results. Even then (i.e. after a paper has been submitted to a journal), there is a lot left to do as the publication process is a long-winded one with many rounds of ‘peer-reviewing’ (an important quality control mechanism), where we have to reply fully to all the questions, suggestions and concerns the reviewers throw at us about the importance of the results, reliability of the data, the methods used, and the language of the manuscript submitted (e.g. are the results presented in an easy-to-understand way, are we over-sensationalising the results?). If all goes well, the published results from the analyses can help us (as the research community) understand the mechanisms behind the phenomenon analysed (e.g. biological pathways relating to disease, underlying mechanism of a new technology) and provide a solid foundation for other scientists to take the work forward.

If the results are not what we expected, a true scientist also feels fortunate and becomes more driven as a new challenge has now been set, igniting the curious side of the scientist; and strives to understand whether anything may have gone wrong with the analysis or whether the hypothesis was wrong. A (natural) scientist who is conscious and aware of the evolution and history of science knows that many discoveries have been made through ‘happy accidents’ (e.g. penicillin, X-rays, microwave oven, post-it notes) since it is in the nature of science to be serendipitous; and that a wrong hypothesis and/or an unexpected result can also lead to a breakthrough. Hopefully without losing any of our excitement, we go back to square one and start off with a brand new hypothesis (NB: the research paradigm in some fields is also changing, with ‘hypothesis-free’ approaches having already been, and still being, developed). This process (i.e. from generating the hypothesis to data collection to analysis to publication of results) usually takes years, even with some of the brightest people collaborating and working full-time on a research question.

 

“The first time you do something, it’s science. The second time, it’s engineering. A third time, it’s just being a technician. I’m a scientist. Once I do something, I do something else.” – Cliff Stoll in his TED talk (Feb 2006)

 

Natural scientists take great pride in exploring nature (living and non-living) and the laws that govern it in a creative, objective and transparent way. One of the most important characteristics of publications in the natural sciences is repeatability of the methods and replication of the results. I do not want to paint a picture where everything is perfect with regards to the literature in the natural sciences, as there have always been, and will be, problems in the way some research questions have been tackled (e.g. due to poor use of statistical methods, over-sensationalisation of results in lay media, fraud, selective reporting, sad truth of ‘publish or perish’, unnecessary number of co-authors on papers). However, science evolves through mistakes, being open-minded about accepting new ideas and being transparent about the methods used. Natural scientists are especially blessed with regards to there being many respectable journals (with relatively high impact factors, 2 or more reviewers involved in the peer-reviewing process) in virtually all fields within the natural sciences, where a large number of great scientific papers are published; and these have clearly (positively) affected the quality of life of our species (e.g. increasing crop yield, facilitating understanding of diseases and preventive measures, curative drugs/therapies, underlying principles of modern technology).

I wrote all the above to come to the main point of this post: I believe the abovementioned ‘experiment-centric’ (well-designed, statistically well-powered), efficient (has real implications) and reliable (replicable and repeatable) characteristics of the studies carried out within the natural sciences should be made more use of in (and probably become a benchmark for) the social sciences. There should be a more stringent process before a paper/book is published, similar to the natural sciences, and a social scientist must work harder (than they are doing at present) to alleviate their own prejudices before starting to write up for publication (and not get away with papers which are full of speculation and sentences containing “may be due/related to”). I am not even going to delve into the technicalities of some of the horrendously implemented statistical methods and the bold inferences/claims made as a result of them (e.g. correlations/associations still being reported as ‘causation’, P-values of <0.05 used as 'proof').

Of course there are great social scientists out there who publish some policy-changing work and try to be as objective as a human being can possibly be; however, I have to say that (from my experience at least!) they seem to be a small minority in an ocean of bad sociologists. Social sciences seem (to me!) to be characterised by subjective, incoherent and inconsistent findings (e.g. due to diverse ideologies, region-specific effects, lack of collaboration, lack of replication); and a comprehensive quality control mechanism does not seem to be in place to prevent bad literature from being published. A sociologist friend once told me “you can find a reference for any idea in the social sciences”, which I think sums up the field's current state for me in one sentence.

 

“The scientist is not a person who gives the right answers, he’s one who asks the right questions.” – Claude Lévi-Strauss, an anthropologist (I would humbly update it as “The scientist is not necessarily a person who gives the right answers, but one who asks the right questions”)

 

Social sciences should not be the place where those who could not (get the grades and/or) be successful in the natural sciences go to get a (relatively) easier ride; publish tens of papers/books which go insufficiently peer-reviewed, unread and uncited for life; and yet get a lecturer post at a university much quicker than a natural scientist. Social scientists should not be any different from natural scientists with regards to the general aspects of research, so they should also spend years (just like most natural scientists) trying to develop their hypotheses and debunk their own prejudices; work in collaboration with other talented social scientists who will guide them in the right way; and be held accountable to a stringent peer-reviewing process before they can claim to have made a contribution (via books/papers) to their respective fields. Instead of publishing loads of bad papers, they should be encouraged to concentrate on publishing fewer but much better papers/books.

Social sciences have a lot to offer to society (see the above figure about smoking for an example), but unfortunately (in my opinion) the representatives have let the field down. I believe universities and maybe even the governments all around the world should make it their objective to develop great sociologists by not only engaging them with the techniques used in the social sciences (and its accompanying literature), but also by funding them to travel to other laboratories/research institutions and get a flavour of the way natural scientists work.

 

Addition to post: For an academically better (and much harsher!) criticism of the social sciences than mine, see Roberto Unger’s interview at the Social Science Bites website (click on link).

Moon landing – a momentous achievement of mankind, and the natural sciences (and engineering)

PS: I must state here that I have vastly generalised about the social sciences; and mostly cherry picked and pointed out the negative sides. However every sociologist knows within them whether they really are motivated to find out the truth about sociological phenomena; and are not just in it for the respect that being an academic brings, or for the titles (e.g. Dr., Prof.). I personally have many respectable sociologist friends/colleagues myself (including my father) who are driven to understand and dissect sociological problems/issues and look for ways to solve real-life problems. They give me hope in that sense…

PPS: I am not an expert in the natural sciences nor in the social sciences. Just sharing my (maybe not so!) humble opinions on the subject matter as I get increasingly frustrated with the lack of quality I observe throughout the social sciences. Many of my friends/colleagues in the social sciences would attest to some or all of the things I stated above (gathering from my personal communications). I value the social sciences a lot and want it to live up to its potential in making our communities better…

Difference between the lung of a COPD patient and an unaffected one. Image taken from the NHLBI website (one of the leading institutes in providing information on various diseases; click on image to access the source)

Many of us will either suffer from, or have a relative/friend who suffers from, a disease called Chronic Obstructive Pulmonary Disease (COPD, click on link for details), which is a progressive respiratory disease characterised by decreasing lung function (struggling to inhale/exhale air, irreversible airflow obstruction), very likely accompanied by chronic infections. COPD has a prevalence of over 2% in the UK population (corresponding to approx. 1 million in the UK, probably a lower bound estimate due to many undiagnosed cases; this figure is approx. 16 million in the USA) and is currently the third biggest killer in the world (only behind cancers and heart-related diseases) – costing the lives of millions (in the USA alone, the number of deaths attributed to COPD is over 100 thousand); and the health services, billions of pounds.

In contrast to well-known genetic disorders such as Cystic Fibrosis and Huntington’s disease, which are caused entirely by a person’s genetic makeup through mutations in a single gene, COPD is a (very!) complex disease with many genes and environmental factors (e.g. smoking, pollutants) contributing to the development/progression of the disease. This complexity makes it much harder to dissect the causes and find potential (genetic) targets for cures or therapies. However, we do know that smoking is by far the biggest risk factor, with up to 90% of those who go on to develop clinically significant COPD being smokers. But only a minority (<25%) of all smokers develop COPD, indicating the strong role genetics can play in the progression of this disorder. Also, not all COPD patients are smokers (up to 25% in some populations are not), indicating that – at least in some patients – genetics can play a rather determining role. I must stress that all the statistics I provide here can vary considerably from population to population due to different lifestyles and genetic backgrounds.

I – together with a large group of collaborators – search for genetic predictors of lung function, which helps us to identify which individuals are more likely to develop the disease and potentially understand the underlying biology/pathology of respiratory diseases such as COPD and asthma, and related traits such as smoking behaviour. To do this, we carry out what is called a genome-wide association study (GWAS, click on link for details), where we obtain the genetic data (millions of data points) from tens of thousands of COPD (or asthma) patients and ‘controls’ (people with normal lung function). To ensure that our results are not biased by different ethnicities, lifestyles and related individuals, we collect all the relevant information about the participants and make sure that we control for them in the statistical models that we use. GWASs have been extremely successful in the identification of successful targets for other diseases and have brought the field of Genetic Epidemiology (GE, click on link for details) to the fore of population-based medicine. GE requires extensive understanding of Statistics (needed to make sense of the very large datasets), Bioinformatics (application of computer software to the management of large biological data), Programming (needed to change data formats, manage very large data), Genetics (needed for interpretation of results) and Epidemiology (branch of medicine which deals with how often diseases occur in different groups of people, and why); and thus requires inter-disciplinary collaborations.

GWAS results are traditionally presented with a Manhattan plot (due to its resemblance of the city’s skyline) where the genetic variants corresponding to the dots above the top grey line (representing P-values less than 5e-8 i.e. 0.00000005) are usually followed up with additional studies to validate their plausibility. Image taken from Wikipedia (click on image to access source)
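As a rough illustration of where the 5e-8 line comes from: it is conventionally explained as a 0.05 significance level divided by about one million independent tests across the genome (a Bonferroni-style correction). The sketch below assumes that reasoning, assumes the Y-axis of the plot shows -log10 of the P-value, and uses entirely hypothetical variant names and P-values.

```python
import math


def genome_wide_threshold(alpha=0.05, n_independent_tests=1_000_000):
    """Bonferroni-style threshold: the overall alpha shared across all tests."""
    return alpha / n_independent_tests


def manhattan_height(p_value):
    """Height of a point on the plot, assuming the Y-axis is -log10(P)."""
    return -math.log10(p_value)


# Hypothetical variants and P-values, purely for illustration.
example_p_values = {
    "variant_A": 3e-9,
    "variant_B": 4e-6,
    "variant_C": 0.2,
}

threshold = genome_wide_threshold()
followed_up = [name for name, p in example_p_values.items() if p < threshold]

if __name__ == "__main__":
    print("Threshold:", threshold)
    print("Height of the threshold line:", round(manhattan_height(threshold), 2))
    print("Variants passing the threshold:", followed_up)
```

Only the first hypothetical variant clears the line. The second would look impressive in a single-test setting (P = 0.000004) but is not convincing once millions of variants have been tested.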

The inferences we make from these studies can shed light on which genes and biological pathways play key roles in causing COPD. We then follow up these newly identified genes and pathways to analyse whether there are molecules which could be used to target these and be potential drugs for treating COPD patients. Our results can be of immense help to pharmaceutical companies (and ultimately to patients), as many clinical trials initiated without a genetic line of evidence have failed, costing the public and these companies billions of pounds.

As smoking is the biggest risk factor for respiratory diseases like COPD, I am – also with the contribution of many collaborators – in the process of analysing whether some people are more likely to start smoking, stop after starting, and smoke more than usual when they start smoking. The results can have huge implications as many people struggle to stop smoking, and when they do, research suggests that up to 90% (figure differs between populations) of them start to smoke again within the first year after quitting. Smoking is not only a huge contributor to the risk of developing COPD, but also to lung (biggest killer amongst all cancers), mouth, throat, kidney, liver, pancreas, stomach and colon cancer (not an exhaustive list). In the UK alone, these cancers cause the slow and painful death of tens of thousands, alongside a huge psychological and financial burden on the families and public resources.

The “lung” and the short of it (stealing a phrase thought up by my colleagues at the University of Leicester, click on link to see who they are) is that COPD is a disease that is going to affect many of us, and any useful finding which leads to cures and/or therapies could increase the life years of COPD patients and affect the lives of thousands of people directly, and millions indirectly (e.g. families of COPD sufferers, cost to the NHS). Finding targets to help people stop smoking can potentially have even bigger implications as many continue to smoke, despite huge efforts and funding allocated to smoking prevention and cessation.

A nice TED talk about the world of Data science and Genetic Epidemiology

Addition to post (09/02/17): A Circos plot presenting results from our latest lung function GWAS (Wain et al, 2017; Nature Genetics) was shortlisted (title: Breathtaking genes) and displayed in the Images of Research exhibition (9th Feb 2017) organised by the University of Leicester
