arXiv AI/ML Analysis

Data visualization and topic modeling analysis of 1.7M scholarly AI/ML articles from arXiv to identify research trends and industry patterns.

Research UC Berkeley

Skills

NLP, Data Visualization, Topic Modeling, Unsupervised Learning

Tools

Python, spaCy, scikit-learn, Plotly, Power BI

A data visualization project that examines scholarly articles on arXiv to surface research trends and industry patterns in artificial intelligence and machine learning.

Overview

Analyzed 1.7 million scholarly articles from arXiv’s open archive, published between 1999 and 2019, to identify trends in AI/ML research.
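
One simple way to look for a trend across publication years is to track the share of papers per year whose title mentions a given term. The sketch below is illustrative only: the `papers` records and the `term_share_by_year` helper are hypothetical, are not taken from the project, and use only the Python standard library.

```python
from collections import Counter, defaultdict

# Hypothetical (year, title) records standing in for arXiv metadata.
papers = [
    (2012, "Deep convolutional networks for image classification"),
    (2012, "Kernel methods for regression"),
    (2018, "Attention models for machine translation"),
    (2018, "Deep reinforcement learning with attention"),
    (2018, "Graph neural networks"),
]


def term_share_by_year(records, term):
    """Return {year: fraction of that year's titles containing `term`}."""
    totals = Counter()
    hits = defaultdict(int)
    needle = term.lower()
    for year, title in records:
        totals[year] += 1
        # Whole-word match on whitespace-separated, lowercased tokens.
        if needle in title.lower().split():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


share = term_share_by_year(papers, "attention")
for year, fraction in share.items():
    print(year, round(fraction, 2))
```

Normalizing by the number of papers per year matters here because raw counts would mostly reflect the growth of the archive rather than a change in topic popularity.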

Techniques

Applied natural language processing for text analysis, topic modeling for theme extraction, and unsupervised learning for pattern discovery. Built interactive visualizations with Bokeh, Plotly, Matplotlib, Seaborn, Altair, Power BI, and Tableau.

Technologies

Built with the Python ecosystem, including pandas, NumPy, spaCy for NLP, and scikit-learn for machine learning tasks.