TFSWA-ResUNet: music source separation with time–frequency sequence and shifted window attention-based ResUNet