Nathan Sima ’23


Operations Research and Financial Engineering

Project Title

Data Mining Methods and Research on Environmental Literature

Certificate(s): Applications of Computing, Finance, Statistics and Machine Learning

I worked on two projects: a text-mining analysis of wastewater textual data, and the use of soft sensors to predict monthly average river flow. My role in the first project consisted of data preprocessing and primary data analysis. I implemented a rigorous six-step keyword preprocessing process to address various challenges in deep text processing, such as stemming, acronyms and chemical expressions. While researching each keyword, I learned about many environmental engineering and wastewater research topics. After running our preprocessing code, I derived preliminary findings from the results using data visualizations and graphs developed in Python. This provided an eye-opening exploration of intercategory relationships and trend identification. For the second project, I conducted extensive data development and ran the enhanced, iterated stepwise multiple linear regression package on the resulting data. From this work, I learned about numerous statistical methods and their respective benefits and drawbacks. Overall, I gained a broad view of the processes involved in innovative scientific research. I hope to incorporate similar techniques into my work at Princeton and beyond.

Internship Year


Project Category

Climate and Environmental Science


Princeton WET (Water and Energy Technologies) Lab, Department of Civil and Environmental Engineering and the Andlinger Center for Energy and the Environment, Princeton University


Z. Jason Ren, Professor of Civil and Environmental Engineering and the Andlinger Center for Energy and the Environment; Junjie Zhu, Associate Research Scholar, Civil and Environmental Engineering