Online Dimensionality Reduction

In this thesis, we investigate online dimensionality reduction methods, wherethe algorithms learn by sequentially acquiring data. We focus on two specificalgorithm design problems in (i) recommender systems and (ii) heterogeneousclustering from binary user feedback. (i) For recommender systems, we c...

Full description

Bibliographic Details
Main Author:	Ariu, Kaito
Format:	Others
Language:	English
Published:	KTH, Reglerteknik 2021
Subjects:	Computer Sciences Datavetenskap (datalogi)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290791 http://nbn-resolving.de/urn:isbn:978-91-7873-780-2

id	ndltd-UPSALLA1-oai-DiVA.org-kth-290791
record_format	oai_dc
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Computer Sciences Datavetenskap (datalogi)
spellingShingle	Computer Sciences Datavetenskap (datalogi) Ariu, Kaito Online Dimensionality Reduction
description	In this thesis, we investigate online dimensionality reduction methods, wherethe algorithms learn by sequentially acquiring data. We focus on two specificalgorithm design problems in (i) recommender systems and (ii) heterogeneousclustering from binary user feedback. (i) For recommender systems, we consider a system consisting of m users and n items. In each round, a user,selected uniformly at random, arrives to the system and requests a recommendation. The algorithm observes the user id and recommends an itemfrom the item set. A notable restriction here is that the same item cannotbe recommended to the same user more than once, a constraint referred toas a no-repetition constraint. We study this problem as a variant of themulti-armed bandit problem and analyze regret with the various structurespertaining to items and users. We devise fundamental limits of regret andalgorithms that can achieve the limits order-wise. The analysis explicitlyhighlights the importance of each component of regret. For example, we candistinguish the regret due to the no-repetition constraint, that generated tolearn the statistics of user’s preference for an item, and that generated tolearn the low-dimensional space of the users and items were shown. (ii) Inthe clustering with binary feedback problem, the objective is to classify itemssolely based on limited user feedback. More precisely, users are just askedsimple questions with binary answers. A notable difficulty stems from theheterogeneity in the difficulty in classifying the various items (some itemsrequire more feedback to be classified than others). For this problem, wederive fundamental limits of the cluster recovery rates for both offline andonline algorithms. For the offline setting, we devise a simple algorithm thatachieves the limit order-wise. For the online setting, we propose an algorithm inspired by the lower bound. For both of the problems, we evaluatethe proposed algorithms by inspecting their theoretical guarantees and usingnumerical experiments performed on the synthetic and non-synthetic dataset. === Denna avhandling studerar algoritmer för datareduktion som lär sig från sekventiellt inhämtad data. Vi fokuserar speciellt på frågeställningar som uppkommer i utvecklingen av rekommendationssystem och i identifieringen av heterogena grupper av användare från data. För rekommendationssystem betraktar vi ett system med m användare och n objekt. I varje runda observerar algoritmen en slumpmässigt vald användare och rekommenderar ett objekt. En viktig begränsning i vår problemformuleringar att rekommendationer inte får upprepas: samma objekt inte kan rekommenderas till samma användartermer än en gång. Vi betraktar problemet som en variant av det flerarmadebanditproblemet och analyserar systemprestanda i termer av "ånger” under olika antaganden.Vi härleder fundamentala gränser för ånger och föreslår algoritmer som är (ordningsmässigt) optimala. En intressant komponent av vår analys är att vi lyckas att karaktärisera hur vart och ett av våra antaganden påverkar systemprestandan. T.ex. kan vi kvantifiera prestandaförlusten i ånger på grund av att rekommendationer inte får upprepas, på grund avatt vi måste lära oss statistiken för vilka objekt en användare är intresserade av, och för kostnaden för att lära sig den lågdimensionella rymden för användare och objekt. För problemet med hur man bäst identifierar grupper av användare härleder vi fundamentala gränser för hur snabbt det går att identifiera kluster. Vi gör detta för algoritmer som har samtidig tillgång till all data och för algoritmer som måste lära sig genom sekventiell inhämtning av data. Med tillgång till all data kan vår algoritm uppnå den optimala prestandan ordningsmässigt. När data måste inhämtas sekventiellt föreslår vi en algoritm som är inspirerad av den nedre gränsen på vad som kan uppnås. För båda problemen utvärderar vi de föreslagna algoritmerna numeriskt och jämför den praktiska prestandan med de teoretiska garantierna. === <p>QC 20210223</p><p></p>
author	Ariu, Kaito
author_facet	Ariu, Kaito
author_sort	Ariu, Kaito
title	Online Dimensionality Reduction
title_short	Online Dimensionality Reduction
title_full	Online Dimensionality Reduction
title_fullStr	Online Dimensionality Reduction
title_full_unstemmed	Online Dimensionality Reduction
title_sort	online dimensionality reduction
publisher	KTH, Reglerteknik
publishDate	2021
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290791 http://nbn-resolving.de/urn:isbn:978-91-7873-780-2
work_keys_str_mv	AT ariukaito onlinedimensionalityreduction
_version_	1719378235243888640
spelling	ndltd-UPSALLA1-oai-DiVA.org-kth-2907912021-02-24T05:43:54ZOnline Dimensionality ReductionengAriu, KaitoKTH, Reglerteknik2021Computer SciencesDatavetenskap (datalogi)In this thesis, we investigate online dimensionality reduction methods, wherethe algorithms learn by sequentially acquiring data. We focus on two specificalgorithm design problems in (i) recommender systems and (ii) heterogeneousclustering from binary user feedback. (i) For recommender systems, we consider a system consisting of m users and n items. In each round, a user,selected uniformly at random, arrives to the system and requests a recommendation. The algorithm observes the user id and recommends an itemfrom the item set. A notable restriction here is that the same item cannotbe recommended to the same user more than once, a constraint referred toas a no-repetition constraint. We study this problem as a variant of themulti-armed bandit problem and analyze regret with the various structurespertaining to items and users. We devise fundamental limits of regret andalgorithms that can achieve the limits order-wise. The analysis explicitlyhighlights the importance of each component of regret. For example, we candistinguish the regret due to the no-repetition constraint, that generated tolearn the statistics of user’s preference for an item, and that generated tolearn the low-dimensional space of the users and items were shown. (ii) Inthe clustering with binary feedback problem, the objective is to classify itemssolely based on limited user feedback. More precisely, users are just askedsimple questions with binary answers. A notable difficulty stems from theheterogeneity in the difficulty in classifying the various items (some itemsrequire more feedback to be classified than others). For this problem, wederive fundamental limits of the cluster recovery rates for both offline andonline algorithms. For the offline setting, we devise a simple algorithm thatachieves the limit order-wise. For the online setting, we propose an algorithm inspired by the lower bound. For both of the problems, we evaluatethe proposed algorithms by inspecting their theoretical guarantees and usingnumerical experiments performed on the synthetic and non-synthetic dataset. Denna avhandling studerar algoritmer för datareduktion som lär sig från sekventiellt inhämtad data. Vi fokuserar speciellt på frågeställningar som uppkommer i utvecklingen av rekommendationssystem och i identifieringen av heterogena grupper av användare från data. För rekommendationssystem betraktar vi ett system med m användare och n objekt. I varje runda observerar algoritmen en slumpmässigt vald användare och rekommenderar ett objekt. En viktig begränsning i vår problemformuleringar att rekommendationer inte får upprepas: samma objekt inte kan rekommenderas till samma användartermer än en gång. Vi betraktar problemet som en variant av det flerarmadebanditproblemet och analyserar systemprestanda i termer av "ånger” under olika antaganden.Vi härleder fundamentala gränser för ånger och föreslår algoritmer som är (ordningsmässigt) optimala. En intressant komponent av vår analys är att vi lyckas att karaktärisera hur vart och ett av våra antaganden påverkar systemprestandan. T.ex. kan vi kvantifiera prestandaförlusten i ånger på grund av att rekommendationer inte får upprepas, på grund avatt vi måste lära oss statistiken för vilka objekt en användare är intresserade av, och för kostnaden för att lära sig den lågdimensionella rymden för användare och objekt. För problemet med hur man bäst identifierar grupper av användare härleder vi fundamentala gränser för hur snabbt det går att identifiera kluster. Vi gör detta för algoritmer som har samtidig tillgång till all data och för algoritmer som måste lära sig genom sekventiell inhämtning av data. Med tillgång till all data kan vår algoritm uppnå den optimala prestandan ordningsmässigt. När data måste inhämtas sekventiellt föreslår vi en algoritm som är inspirerad av den nedre gränsen på vad som kan uppnås. För båda problemen utvärderar vi de föreslagna algoritmerna numeriskt och jämför den praktiska prestandan med de teoretiska garantierna. <p>QC 20210223</p><p></p>Licentiate thesis, comprehensive summaryinfo:eu-repo/semantics/masterThesistexthttp://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290791urn:isbn:978-91-7873-780-2TRITA-EECS-AVL ; 2021:12application/pdfinfo:eu-repo/semantics/openAccess

Online Dimensionality Reduction

Similar Items