Separation and Extraction of Valuable Information From Digital Receipts Using Google Cloud Vision OCR.

Bibliographic Details
Main Author: Johansson, Elias
Format: Others
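Language: English
Published: Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM) 2019
Subjects: optical character recognition, automatic text extraction, python, google cloud vision, string analysis, receipt, Computer Engineering, Datorteknik
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-88602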
id ndltd-UPSALLA1-oai-DiVA.org-lnu-88602
record_format oai_dc
type Student thesis (info:eu-repo/semantics/bachelorThesis); text; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
topic optical character recognition
automatic text extraction
python
google cloud vision
string analysis
receipt
Computer Engineering
Datorteknik
description Automation is a desirable feature in many business areas. Manually extracting information from a physical object such as a receipt is a task that can be automated to save resources for a company or a private person. This paper describes the process of combining an existing OCR engine with a Python script developed for this project to extract valuable information from a digital image of a receipt. Values such as VAT, VAT%, date, and total, gross, and net cost are considered valuable information. This feature has already been implemented in existing applications; however, the company I carried out this project for is interested in creating its own version. This project is an experiment to determine whether such an application, a program that can extract the information mentioned above, can be implemented with restricted resources. This paper guides you through the development of the program, including the mindset, the findings, and the steps taken to overcome the problems encountered along the way. The program achieved a success rate of 86.6% in extracting the most valuable information (total cost, VAT%, and date) from a set of 53 receipts originating from 34 separate establishments.
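The abstract describes the approach only at a high level: an existing OCR engine returns the text of a receipt image, and a Python script searches that text for values such as total cost, VAT% and date. The following is a minimal sketch of that division of labour, not the thesis's actual script. The keyword list (`TOTAL_KEYWORDS`), regular expressions, helper names and sample receipt are all hypothetical; the parsing assumes ISO `YYYY-MM-DD` dates, amounts with two decimals, and a printed total that includes VAT, so the VAT amount is total × rate / (100 + rate). The `ocr_text` helper assumes version 2.0 or later of the `google-cloud-vision` package and configured credentials; it is imported lazily so the parsing helpers run without it.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical keywords and patterns; real receipts vary by store and language.
TOTAL_KEYWORDS = ("total", "att betala")
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
PERCENT_PATTERN = re.compile(r"\b(\d{1,2}(?:[.,]\d{1,2})?)\s*%")
AMOUNT_PATTERN = re.compile(r"(\d+[.,]\d{2})\b")


def ocr_text(image_path: str) -> str:
    """Return the full OCR text of an image using Google Cloud Vision."""
    from google.cloud import vision  # needs google-cloud-vision + credentials

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as handle:
        image = vision.Image(content=handle.read())
    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    # The first annotation holds the full detected text; the rest are words.
    annotations = response.text_annotations
    return annotations[0].description if annotations else ""


def to_decimal(text: str) -> Decimal:
    # Accept both "44,80" and "44.80" as decimal notation.
    return Decimal(text.replace(",", "."))


def extract_date(text: str) -> Optional[date]:
    # Return the first YYYY-MM-DD match that is a valid calendar date.
    for match in DATE_PATTERN.finditer(text):
        year, month, day = (int(g) for g in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue
    return None


def extract_vat_percent(text: str) -> Optional[Decimal]:
    # Take the first number directly followed by a percent sign.
    match = PERCENT_PATTERN.search(text)
    return to_decimal(match.group(1)) if match else None


def extract_total(text: str) -> Optional[Decimal]:
    # Use the last amount on the last line that mentions a total keyword.
    total = None
    for line in text.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in TOTAL_KEYWORDS):
            amounts = AMOUNT_PATTERN.findall(line)
            if amounts:
                total = to_decimal(amounts[-1])
    return total


def split_vat(gross: Decimal, vat_percent: Decimal) -> Tuple[Decimal, Decimal]:
    # Assumes gross includes VAT; VAT is rounded to two decimals.
    vat = (gross * vat_percent / (100 + vat_percent)).quantize(Decimal("0.01"))
    return vat, gross - vat


# Hypothetical OCR output for a made-up receipt.
SAMPLE_RECEIPT = """Example Store
2019-05-14 12:31
Mjölk 1,5 l        14,90
Bröd               29,90
TOTALT             44,80
Moms 12%  Netto 40,00  Moms 4,80
"""

if __name__ == "__main__":
    total = extract_total(SAMPLE_RECEIPT)
    rate = extract_vat_percent(SAMPLE_RECEIPT)
    print(extract_date(SAMPLE_RECEIPT), total, rate)
    if total is not None and rate is not None:
        print(split_vat(total, rate))
```

Keeping OCR and parsing in separate functions means the string-analysis step can be tested on plain text without calling the cloud service; a keyword-and-regex approach like this is one simple option and will miss receipts whose layout or wording differs from the assumed patterns.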