Analyzing factors influencing viewer count of TED Talks through Text Analytics

Yasha Pastaria and Miriam McGaugh
Oklahoma State University


The objective of this research paper was to explore the TED Talks data and generate insights into the popularity trends of TED Talks over the years in terms of views, comments, and ratings. In addition, the project explored possible drivers of those trends, such as the occupation of the speaker, the duration of the talk, and the number of speakers. The analysis should help its audience understand where TED Talks are heading and, ultimately, design better talks while avoiding the mistakes of the worst ones. The data source for the analysis was a Kaggle dataset, TED Talk Data. The main dataset contained metadata about every TED Talk hosted on the website until September 21, 2017. There were 2,550 rows and 14 variables, where each row contained data for a particular TED Talk. SAS Viya and SAS Enterprise Miner were used to conduct the data preparation and cleaning, the text analytics, and the sentiment analysis, the last of which was conducted to determine how viewers felt about each talk. A descriptive analysis examined the factors that affect viewer count and the trends observed in TED Talks over the years.