Google BigQuery for Education

Bibliographic Details
Main Authors: Lopez, Glenn (Author), Seaton, Daniel T. (Author), Ang, Andrew (Author), Tingley, Dustin (Author), Chuang, Isaac (Author)
Format: Article
Language: English
Published: ACM, 2021-11-09T12:37:43Z.
Description
Summary: © 2017 ACM. The size and complexity of MOOC data present overwhelming challenges to many institutions. This paper details the functionality of edx2bigquery, an open source Python package developed by Harvard and MIT to ingest and report on hundreds of MITx and HarvardX course datasets from edX, making use of Google BigQuery to handle multiple terabytes of learner data. For this application, we find that Google BigQuery provides ease of use in loading the multi-faceted MOOC datasets and near real-time interactive querying of data, including large clickstream datasets; moreover, we are able to provide flexible research and reporting dashboards, visualizing and aggregating data, by interfacing with services associated with BigQuery. This framework makes it feasible for edx2bigquery to be open source, following standards that emphasize the importance of data products that transcend a particular data science platform and allow teams with diverse backgrounds to interact with data. edx2bigquery is being adopted by other institutions with an aim toward future collaboration.