Determining the familial risk distribution of colorectal cancer: a data mining approach

Rowena Chau, Mark A. Jenkins, Daniel D. Buchanan, Driss Ait Ouakrim, Graham G. Giles, Graham Casey, Steven Gallinger, Robert W. Haile, Loic Le Marchand, Polly A. Newcomb, Noralane Morey Lindor, John L. Hopper, Aung Ko Win

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95 % confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and 66 minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (1) 7 % of families (SIR = 7.11; 95 % CI 6.65–7.59) had a strong family history of colorectal cancer; (2) 13 % of families (SIR = 2.94; 95 % CI 2.78–3.10) had a moderate family history of colorectal cancer; (3) 11 % of families (SIR = 1.23; 95 % CI 1.12–1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (4) 9 % of families (SIR = 1.06; 95 % CI 0.96–1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (5) 60 % of families (SIR = 0.61; 95 % CI 0.57–0.65) had a weak family history of all cancers. There is wide variation in colorectal cancer risk that can be categorized by family history of cancer, with a strong gradient of colorectal cancer risk between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7 % of the population) was 12 times that for people in the lowest risk category (60 % of the population). Data mining proved to be an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer.
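The arithmetic behind the headline figures can be checked with a short sketch. It uses only the shares, SIRs, and confidence limits reported in the abstract; the category labels and variable names are illustrative, and treating the "12 times" figure as the ratio of the two SIR point estimates is an assumption, since the abstract does not say how that figure was derived.

```python
# Figures transcribed from the abstract:
# (illustrative label, share of families in %, SIR, 95 % CI lower, 95 % CI upper)
categories = [
    ("strong colorectal", 7, 7.11, 6.65, 7.59),
    ("moderate colorectal", 13, 2.94, 2.78, 3.10),
    ("strong breast, weak colorectal", 11, 1.23, 1.12, 1.36),
    ("strong prostate, weak colorectal", 9, 1.06, 0.96, 1.18),
    ("weak for all cancers", 60, 0.61, 0.57, 0.65),
]

# The five major categories should account for all families.
total_share = sum(share for _, share, *_ in categories)

# Assumption: the reported gradient is the ratio of SIR point estimates
# between the highest-risk and lowest-risk categories.
highest = max(categories, key=lambda c: c[2])
lowest = min(categories, key=lambda c: c[2])
risk_ratio = highest[2] / lowest[2]

# Categories whose 95 % CI contains 1, i.e. whose risk is not clearly
# different from that of the reference population.
includes_one = [label for label, _, _, lo, hi in categories if lo <= 1.0 <= hi]

print(f"total share of families: {total_share} %")
print(f"SIR ratio, {highest[0]} vs {lowest[0]}: {risk_ratio:.2f}")
print(f"categories with CI including 1: {includes_one}")
```

Under that assumption, the ratio of 7.11 to 0.61 is about 11.7, which is consistent with the rounded "12 times" in the abstract.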

Original language: English (US)
Pages (from-to): 1-11
Number of pages: 11
Journal: Familial Cancer
DOIs: https://doi.org/10.1007/s10689-015-9860-6
State: Accepted/In press - Dec 17 2015

Fingerprint

Data Mining
Colorectal Neoplasms
Confidence Intervals
Incidence
Neoplasms
Colonic Neoplasms
Registries
Population
Prostatic Neoplasms

Keywords

  • Colorectal cancer
  • Data mining
  • Familial aggregation
  • Familial risk

ASJC Scopus subject areas

  • Cancer Research
  • Genetics
  • Oncology
  • Genetics (clinical)

Cite this

Chau, R., Jenkins, M. A., Buchanan, D. D., Ait Ouakrim, D., Giles, G. G., Casey, G., ... Win, A. K. (Accepted/In press). Determining the familial risk distribution of colorectal cancer: a data mining approach. Familial Cancer, 1-11. https://doi.org/10.1007/s10689-015-9860-6

Determining the familial risk distribution of colorectal cancer: a data mining approach. / Chau, Rowena; Jenkins, Mark A.; Buchanan, Daniel D.; Ait Ouakrim, Driss; Giles, Graham G.; Casey, Graham; Gallinger, Steven; Haile, Robert W.; Le Marchand, Loic; Newcomb, Polly A.; Lindor, Noralane Morey; Hopper, John L.; Win, Aung Ko.

In: Familial Cancer, 17.12.2015, p. 1-11.

Chau, R, Jenkins, MA, Buchanan, DD, Ait Ouakrim, D, Giles, GG, Casey, G, Gallinger, S, Haile, RW, Le Marchand, L, Newcomb, PA, Lindor, NM, Hopper, JL & Win, AK 2015, 'Determining the familial risk distribution of colorectal cancer: a data mining approach', Familial Cancer, pp. 1-11. https://doi.org/10.1007/s10689-015-9860-6
Chau, Rowena; Jenkins, Mark A.; Buchanan, Daniel D.; Ait Ouakrim, Driss; Giles, Graham G.; Casey, Graham; Gallinger, Steven; Haile, Robert W.; Le Marchand, Loic; Newcomb, Polly A.; Lindor, Noralane Morey; Hopper, John L.; Win, Aung Ko. / Determining the familial risk distribution of colorectal cancer: a data mining approach. In: Familial Cancer. 2015; pp. 1-11.
@article{ac559d569349493cb81787b8702056d2,
title = "Determining the familial risk distribution of colorectal cancer: a data mining approach",
keywords = "Colorectal cancer, Data mining, Familial aggregation, Familial risk",
author = "Rowena Chau and Jenkins, {Mark A.} and Buchanan, {Daniel D.} and {Ait Ouakrim}, Driss and Giles, {Graham G.} and Graham Casey and Steven Gallinger and Haile, {Robert W.} and {Le Marchand}, Loic and Newcomb, {Polly A.} and Lindor, {Noralane Morey} and Hopper, {John L.} and Win, {Aung Ko}",
year = "2015",
month = "12",
day = "17",
doi = "10.1007/s10689-015-9860-6",
language = "English (US)",
pages = "1--11",
journal = "Familial Cancer",
issn = "1389-9600",
publisher = "Springer Netherlands",

}

TY - JOUR

T1 - Determining the familial risk distribution of colorectal cancer

T2 - a data mining approach

AU - Chau, Rowena

AU - Jenkins, Mark A.

AU - Buchanan, Daniel D.

AU - Ait Ouakrim, Driss

AU - Giles, Graham G.

AU - Casey, Graham

AU - Gallinger, Steven

AU - Haile, Robert W.

AU - Le Marchand, Loic

AU - Newcomb, Polly A.

AU - Lindor, Noralane Morey

AU - Hopper, John L.

AU - Win, Aung Ko

PY - 2015/12/17

Y1 - 2015/12/17

KW - Colorectal cancer

KW - Data mining

KW - Familial aggregation

KW - Familial risk

UR - http://www.scopus.com/inward/record.url?scp=84949939022&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949939022&partnerID=8YFLogxK

U2 - 10.1007/s10689-015-9860-6

DO - 10.1007/s10689-015-9860-6

M3 - Article

C2 - 26681340

AN - SCOPUS:84949939022

SP - 1

EP - 11

JO - Familial Cancer

JF - Familial Cancer

SN - 1389-9600

ER -