Code Cloning Habits Of The Jupyter Notebook Community
Main Author: | Sigvardsson, Ulf |
---|---|
Format: | Others |
Language: | English |
Published: | Uppsala universitet, Institutionen för informationsteknologi (Uppsala University, Department of Information Technology), 2019 |
Subjects: | Engineering and Technology |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-415391 |
type |
Student thesis (bachelor's thesis) |
series |
IT ; 19032 |
description |
Code reuse saves time and resources, but it poses risks when copied code is adapted to a new purpose or when the copies themselves are buggy or otherwise faulty. In data science, the web application Jupyter Notebook is a popular tool for creating computational notebooks: documents that combine plain text and code snippets, many of which are publicly available on code-hosting sites such as GitHub. This thesis describes the acquisition of approximately 2.6 million computational notebooks and the analysis of this data set. The contents of every code snippet were hashed with the MD5 algorithm, and cloned snippets were identified as those producing identical hashes. Mapping the snippets back to their corresponding notebooks then made it possible to determine the relative originality of each notebook. The analysis shows that nearly 95% of the notebooks are written in some version of Python. Furthermore, nearly 54% of the notebooks in the data set consist of code blocks that also appear in other notebooks, and on average approximately 70% of the code in a given notebook is copied from elsewhere. |
author |
Sigvardsson, Ulf |
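The clone-detection approach summarized in the abstract (hash every code snippet with MD5, treat identical hashes as clones, then map snippets back to their notebooks) can be sketched in Python, the language of most notebooks in the data set. This is a minimal sketch under stated assumptions, not the thesis's implementation: the helper names (`load_notebook`, `code_cells`, `md5_hex`, `copied_fractions`) are hypothetical; snippets are hashed exactly as written, with no whitespace or comment normalization; notebooks are assumed to be nbformat 4 `.ipynb` JSON, where a code cell's `source` is either a string or a list of line strings; and the per-notebook "copied fraction" (share of a notebook's code cells whose hash also occurs in at least one other notebook) is only one plausible reading of "relative originality".

```python
import hashlib
import json
from collections import defaultdict


def load_notebook(path: str) -> dict:
    """Read an .ipynb file (plain JSON) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def code_cells(notebook: dict) -> list[str]:
    """Return the source text of every code cell in an nbformat-4 notebook."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # "source" may be one string or a list of line strings.
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def md5_hex(snippet: str) -> str:
    """MD5 digest of the snippet's exact text (assumption: no normalization)."""
    return hashlib.md5(snippet.encode("utf-8")).hexdigest()


def copied_fractions(notebooks: dict[str, list[str]]) -> dict[str, float]:
    """Hypothetical originality metric: for each notebook, the fraction of its
    code cells whose MD5 hash also occurs in at least one other notebook."""
    hashed = {nb: [md5_hex(s) for s in snippets] for nb, snippets in notebooks.items()}

    # Map each hash to the set of notebooks that contain a snippet with it.
    owners: dict[str, set[str]] = defaultdict(set)
    for nb, hashes in hashed.items():
        for h in hashes:
            owners[h].add(nb)

    fractions = {}
    for nb, hashes in hashed.items():
        if not hashes:
            continue  # notebooks without code cells have no defined fraction
        cloned = sum(1 for h in hashes if len(owners[h]) > 1)
        fractions[nb] = cloned / len(hashes)
    return fractions


if __name__ == "__main__":
    nb_a = {"cells": [
        {"cell_type": "markdown", "source": ["# Intro"]},
        {"cell_type": "code", "source": ["import pandas as pd\n"]},
        {"cell_type": "code", "source": "df = pd.read_csv('a.csv')"},
    ]}
    nb_b = {"cells": [
        {"cell_type": "code", "source": ["import pandas as pd\n"]},
        {"cell_type": "code", "source": ["print('hello')"]},
    ]}
    fractions = copied_fractions({"a.ipynb": code_cells(nb_a), "b.ipynb": code_cells(nb_b)})
    print(fractions)  # {'a.ipynb': 0.5, 'b.ipynb': 0.5}
    print(sum(fractions.values()) / len(fractions))  # 0.5
```

Because MD5 is computed over exact bytes, a single changed space or renamed variable yields a different hash, so this sketch only finds exact clones. Averaging the per-notebook fractions gives a data-set-wide figure of the kind reported in the abstract, although the thesis's exact definition may differ.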