Towards a Unified Framework of Matrix Derivatives

The need to process and analyze massive statistics simultaneously requires the derivatives of matrix-to-scalar functions (scalar-valued functions of matrices) or matrix-to-matrix functions (matrix-valued functions of matrices). Although derivatives of a matrix-to-scalar function have already bee...

Bibliographic Details
Main Authors: Jianyu Xu, Guoqi Li, Changyun Wen, Kun Wu, Lei Deng
Format: Article
Language: English
Published: IEEE 2018-01-01
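Series: IEEE Access
Subjects: Matrix derivatives, index notation, Kronecker product, chain rule, matrix calculus, time complexity
Online Access: https://ieeexplore.ieee.org/document/8453264/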
id doaj-09ee6f3a10b145a0b3284d1ce25a8633
record_format Article
spelling doaj-09ee6f3a10b145a0b3284d1ce25a8633
2021-03-29T21:11:11Z
eng
IEEE
IEEE Access
2169-3536
2018-01-01
6
47922
47934
10.1109/ACCESS.2018.2867234
8453264
Towards a Unified Framework of Matrix Derivatives
Jianyu Xu, Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing, China
Guoqi Li (https://orcid.org/0000-0002-8994-431X), Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing, China
Changyun Wen, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Kun Wu, Department of Electronic Engineering, Tsinghua University, Beijing, China
Lei Deng, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA
Processing and analyzing massive statistics simultaneously requires derivatives of matrix-to-scalar functions (scalar-valued functions of matrices) or matrix-to-matrix functions (matrix-valued functions of matrices). Although derivatives of matrix-to-scalar functions have already been defined, how to express them algebraically is not as clear as for scalar-to-scalar functions (scalar-valued functions of scalars). Because there is no uniform way of applying the "chain rule" to matrix derivatives, we classify the approaches used in existing schemes into two kinds: the first relies on index notation for the matrices involved, with the indices eliminated when the matrices are multiplied; the second relies on vectorizing the matrices so that they can be handled in the same way as vector-to-vector functions (vector-valued functions of vectors), a case that is already settled. On the one hand, we find that the first approach generally has a much lower time complexity than the second. On the other hand, although most typical functions are known to be differentiable by the first approach, the second approach is in theory applicable to any use of the "chain rule." Nevertheless, under certain conditions the result of the second approach can be simplified to the same order of time complexity as the first. It is therefore important to establish these conditions. In this paper, we establish a sufficient condition under which not only can the first approach be applied, but the time complexity of the results obtained from the second approach can also be reduced. This condition is stated as two equivalent conditions, one corresponding to each approach. In addition, we generalize the methods and use the two approaches to compute derivatives under the two conditions respectively. This paper enables us to unify the framework of matrix derivatives, which would lead to various applications in science and engineering.
https://ieeexplore.ieee.org/document/8453264/
Matrix derivatives
index notation
Kronecker product
chain rule
matrix calculus
time complexity
collection DOAJ
language English
format Article
sources DOAJ
author Jianyu Xu
Guoqi Li
Changyun Wen
Kun Wu
Lei Deng
title Towards a Unified Framework of Matrix Derivatives
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description Processing and analyzing massive statistics simultaneously requires derivatives of matrix-to-scalar functions (scalar-valued functions of matrices) or matrix-to-matrix functions (matrix-valued functions of matrices). Although derivatives of matrix-to-scalar functions have already been defined, how to express them algebraically is not as clear as for scalar-to-scalar functions (scalar-valued functions of scalars). Because there is no uniform way of applying the "chain rule" to matrix derivatives, we classify the approaches used in existing schemes into two kinds: the first relies on index notation for the matrices involved, with the indices eliminated when the matrices are multiplied; the second relies on vectorizing the matrices so that they can be handled in the same way as vector-to-vector functions (vector-valued functions of vectors), a case that is already settled. On the one hand, we find that the first approach generally has a much lower time complexity than the second. On the other hand, although most typical functions are known to be differentiable by the first approach, the second approach is in theory applicable to any use of the "chain rule." Nevertheless, under certain conditions the result of the second approach can be simplified to the same order of time complexity as the first. It is therefore important to establish these conditions. In this paper, we establish a sufficient condition under which not only can the first approach be applied, but the time complexity of the results obtained from the second approach can also be reduced. This condition is stated as two equivalent conditions, one corresponding to each approach. In addition, we generalize the methods and use the two approaches to compute derivatives under the two conditions respectively. This paper enables us to unify the framework of matrix derivatives, which would lead to various applications in science and engineering.
topic Matrix derivatives
index notation
Kronecker product
chain rule
matrix calculus
time complexity
url https://ieeexplore.ieee.org/document/8453264/
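
The two approaches contrasted in the abstract can be illustrated on a small example. The sketch below is not taken from the paper: the test function f(X) = 0.5 * ||A X B - C||_F^2, the function names `grad_index_form` and `grad_vectorized`, and the matrix sizes are all assumptions chosen for illustration. It computes the same gradient once in closed matrix form (the kind of result index notation yields after the repeated indices are summed out) and once through vectorization with a Kronecker-product Jacobian, using the standard identity vec(A X B) = (B^T ⊗ A) vec(X) for column-major vec.

```python
import numpy as np


def grad_index_form(A, X, B, C):
    """Gradient of f(X) = 0.5 * ||A X B - C||_F^2 in closed matrix form.

    Summing over the repeated indices gives df/dX = A^T (A X B - C) B^T,
    so only ordinary matrix products are needed.
    """
    residual = A @ X @ B - C
    return A.T @ residual @ B.T


def grad_vectorized(A, X, B, C):
    """Same gradient via vectorization and an explicit Kronecker Jacobian.

    With column-major vec, vec(A X B) = (B^T kron A) vec(X), so the
    Jacobian of vec(A X B) with respect to vec(X) is B^T kron A.
    """
    residual = A @ X @ B - C
    jacobian = np.kron(B.T, A)  # shape (m*q, n*p): grows quickly with size
    grad_vec = jacobian.T @ residual.reshape(-1, order="F")
    return grad_vec.reshape(X.shape, order="F")


# Illustrative sizes (assumed): A is m x n, X is n x p, B is p x q, C is m x q.
rng = np.random.default_rng(0)
m, n, p, q = 3, 4, 5, 2
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((m, q))

g_index = grad_index_form(A, X, B, C)
g_vec = grad_vectorized(A, X, B, C)
print(np.allclose(g_index, g_vec))
```

In this example the vectorized route materializes a Jacobian with m*q*n*p entries, while the closed form never builds it. That gap is one concrete instance of the difference in time complexity the abstract describes, under the assumptions stated above.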