ever, once converted to percentages it is found that only 40%
of occurrences in the 65+ age range reported fever, compared with 42% in the 0-17 age range. Therefore, although the raw numbers are lower, the two age ranges have a similar likelihood of reporting fever as a symptom. The same normalization to percentages is applied whenever age ranges are compared on any symptom.
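The percentage comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code. It assumes each case is stored as an (age range, set of symptoms) pair. The function name `symptom_percentages` and the toy records are hypothetical, and the records are chosen so that the raw fever counts differ between age ranges while the percentages match.

```python
from collections import Counter


def symptom_percentages(records, symptom):
    """Percentage of cases in each age range that report `symptom`.

    `records` is a list of (age_range, symptoms) pairs, one per case,
    where `symptoms` is a set of symptom names.
    """
    totals = Counter()        # cases per age range
    with_symptom = Counter()  # cases per age range reporting the symptom
    for age_range, symptoms in records:
        totals[age_range] += 1
        if symptom in symptoms:
            with_symptom[age_range] += 1
    # Normalize by each range's own total so ranges of different sizes compare fairly.
    return {
        age_range: 100.0 * with_symptom[age_range] / count
        for age_range, count in totals.items()
    }


# Hypothetical toy records, not the study's data: 0-17 has twice as many
# fever reports as 65+, but the same percentage.
records = [
    ("65+", {"fever", "cough"}),
    ("65+", {"cough"}),
    ("0-17", {"fever"}),
    ("0-17", {"fever", "fatigue"}),
    ("0-17", {"cough"}),
    ("0-17", set()),
]
print(symptom_percentages(records, "fever"))  # {'65+': 50.0, '0-17': 50.0}
```

Dividing by each age range's own total is what makes ranges with very different numbers of cases comparable.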
Percent Infected by Age Range
Using polynomial regression, we modeled how each age range's share of total COVID-19 cases changes over time. Since April 2nd, most cases have been in the 18-49 age range. Although this is a wide range, our regression model predicts that its share of cases will decrease from 65% to only 51% by September 24th (175 days after April 2nd). Over the same period, the share of cases among people aged 0-17 is predicted to rise from less than 1% to 4%, and the share among people aged 65+ nearly doubles, from around 12% to 23%. The remaining age ranges show no significant change in their share of cases.
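The section does not state the polynomial degree, the software, or the exact data used for the regression, so the following is only a minimal pure-Python sketch of the general technique: fit a least-squares polynomial to one age range's share of cases against days since April 2nd, then extrapolate to day 175 (September 24th). The quadratic degree, the helper names `fit_polynomial` and `predicted_share`, and the toy series are all assumptions for illustration.

```python
from datetime import date


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit of ys against xs; returns an evaluator."""
    # Rescale x to keep the normal equations well conditioned.
    scale = max(abs(x) for x in xs) or 1.0
    ts = [x / scale for x in xs]
    n = degree + 1
    # Normal equations (X^T X) c = X^T y with X[i][j] = t_i ** j.
    a = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution gives coefficients, lowest power first.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return lambda x: sum(c * (x / scale) ** i for i, c in enumerate(coeffs))


def predicted_share(model, day):
    """Clip the extrapolated value to the valid 0-100% range."""
    return min(max(model(day), 0.0), 100.0)


START = date(2020, 4, 2)
horizon = (date(2020, 9, 24) - START).days  # 175 days, as in the text

# Hypothetical toy series: one age range's share of cases (%) every 10 days.
days = list(range(0, 101, 10))
shares = [10 + 0.05 * d + 0.0002 * d ** 2 for d in days]

model = fit_polynomial(days, shares, degree=2)
print(horizon, round(predicted_share(model, horizon), 3))  # 175 24.875
```

In practice one model would be fitted per age range. Because separately fitted curves are not constrained to sum to 100%, the extrapolated shares may need to be renormalized, and a polynomial extrapolated far beyond the observed days can diverge quickly, which is why the sketch clips predictions to 0-100%.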
These trends suggest that the age distribution of cases is shifting over time. Although the 18-49 age range is much wider than ranges such as 50-64, which covers only 15 years of age, its share of total cases has decreased since the start of the pandemic, whereas the share of the youngest age range (0-17) has been rising steadily. Young adults nevertheless still account for a majority of cases, which is consistent with them being infected more often than other age groups, although part of this share reflects the width of the range. These shifts in the percentages across age groups are notable because they open up new possibilities for subsequent research, which will ultimately help in developing a cure for the disease.
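Because the age ranges differ in width, a crude way to compare them is to divide each range's share of cases by the number of years it spans. The sketch below applies this to the predicted September 24th shares quoted in the text for 0-17 and 18-49 (65+ is open-ended, so it is left out). It assumes people are spread evenly across ages, which real populations are not; a proper per-capita comparison would need population counts for each range, which this section does not provide. The function names are hypothetical.

```python
def span_years(low, high):
    """Whole years of age covered by an inclusive range, e.g. 50-64 -> 15."""
    return high - low + 1


def share_per_year(share_pct, low, high):
    """Share of cases (%) divided by the number of years the range spans."""
    return share_pct / span_years(low, high)


# Predicted September 24th shares of cases quoted in the text.
predicted = {(0, 17): 4.0, (18, 49): 51.0}
for (low, high), share in predicted.items():
    print(f"{low}-{high}: {share_per_year(share, low, high):.2f}% of cases per year of age")
# 0-17: 0.22% of cases per year of age
# 18-49: 1.59% of cases per year of age
```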
Conclusion
The aim of this study was to determine how age relates to COVID-19 symptoms and infection rate. The results suggest that there is a relationship between age and COVID-19 symptoms. Three major findings were extracted from the data: older patients are likely to experience slower symptom progression than younger patients, the young adult age range (18-49) accounts for the largest share of COVID-19 infections, and the relative frequency of each symptom is generally similar across all age groups. These results may help estimate when life-threatening symptoms are likely to occur for certain age groups, so that hospitals can determine the most efficient way to treat patients. Additionally, knowing which age ranges are affected the most will give future researchers more data to work with when trying to develop a cure for the disease. More research is needed to establish whether these relationships are causal, but this preliminary analysis indicates that a relationship exists.
AUTHOR INFORMATION
Aditya Mittal contributed to the use of k-means clustering in the symptom progression period section and the use of data science libraries in the most recurring symptoms section. Henry Zhao contributed to the use of polynomial regression in the percent infected by age range section. Aadhi Kumaraswamy contributed data mining, extraction, and manipulation of the California and Kaggle datasets. Megan Jacob contributed to finding datasets and applying them in machine learning algorithms. Rohan Ayyagari contributed to finding datasets and creating the structure of the research paper.
ACKNOWLEDGEMENTS
Thanks to Keshav Rao for guiding the team through the research paper and teaching the various components of how to write it. Thanks to the Aspiring Scholars Directed Research Program for providing the resources for the team to complete the study.
342 | ASDRP Summer 2020 Effect of Age on COVID-19 Symptoms