Continuous Sign Language Recognition With Multi-Scale Spatial-Temporal Feature Enhancement

Continuous Sign Language Recognition (CSLR) seeks to interpret the gestures used by deaf and hard-of-hearing individuals and translate them into natural language, thereby enhancing communication and interaction. A successful CSLR method relies on continuously tracking the signer's gestures and facial movements. Existing CSLR methods struggle to fully leverage fine-grained continuous frame information and often overlook the importance of multi-scale feature integration during decoding.

To address these issues, this paper proposes a spatial-temporal feature-enhanced network, called STNet, for the CSLR task. First, to better exploit continuous frame information, we propose a spatial resonance module based on the optimal transport algorithm, which extracts the global common spatial features of two adjacent frames along the frame sequence. Second, we design a frame-wise loss to preserve and enhance the specific features of each frame.
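Since the post gives only a high-level description, the following is a minimal PyTorch sketch of how such a module could look, assuming entropic optimal transport (Sinkhorn iterations) as the OT solver and a simple MSE form for the frame-wise loss; the function names, the residual fusion, and the loss form are illustrative guesses, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def sinkhorn(cost, n_iters=20, eps=0.1):
    """Entropic OT via Sinkhorn iterations: returns a transport plan
    for an (N x M) cost matrix with uniform marginals."""
    K = torch.exp(-cost / eps)                              # Gibbs kernel
    r = cost.new_full((cost.size(0),), 1.0 / cost.size(0))  # row marginal
    c = cost.new_full((cost.size(1),), 1.0 / cost.size(1))  # column marginal
    v = torch.ones_like(c)
    for _ in range(n_iters):
        u = r / (K @ v + 1e-8)          # row scaling
        v = c / (K.t() @ u + 1e-8)      # column scaling
    return torch.diag(u) @ K @ torch.diag(v)

def spatial_resonance(f_t, f_next):
    """f_t, f_next: (C, H, W) features of two adjacent frames.
    Transports f_next's spatial features onto f_t's grid and fuses
    the shared ('resonant') content back into f_t residually."""
    C, H, W = f_t.shape
    a = F.normalize(f_t.reshape(C, H * W).t(), dim=-1)      # (HW, C)
    b = F.normalize(f_next.reshape(C, H * W).t(), dim=-1)   # (HW, C)
    cost = 1.0 - a @ b.t()                                  # cosine-distance cost
    plan = sinkhorn(cost)                                   # (HW, HW) transport plan
    common = (plan @ b) * (H * W)       # rescale: plan rows sum to 1/(HW)
    return f_t + common.t().reshape(C, H, W)

def frame_wise_loss(enhanced, original):
    """A guess at the frame-wise loss: keep the enhanced features close to
    each frame's own features so shared content does not wash out
    frame-specific detail (the paper's exact form is not given here)."""
    return F.mse_loss(enhanced, original)
```

Applied along the sequence, each frame would be paired with its successor, with the frame-wise loss accumulated over frames.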

Lastly, to emphasize multi-scale feature fusion on the decoder side, we design a multi-temporal perception module that allows each frame to focus on a larger range of other frames and enhances information interaction across different scales. Extensive experiments on three benchmark datasets, PHOENIX14, PHOENIX14-T, and CSL-Daily, demonstrate that STNet consistently outperforms state-of-the-art methods, with a notable improvement of 2.9%, showcasing its effectiveness and generalizability.
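The module's internals are likewise not spelled out in this post. A common way to let each frame aggregate context over several temporal ranges is a bank of parallel depthwise 1D convolutions with different kernel sizes, fused by a pointwise convolution; the sketch below assumes exactly that, and the class name, kernel sizes, and fusion scheme are hypothetical.

```python
import torch
import torch.nn as nn

class MultiTemporalPerception(nn.Module):
    """Hypothetical multi-scale temporal mixer: each branch sees a different
    temporal receptive field, and a 1x1 conv fuses the scales."""
    def __init__(self, dim, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2, groups=dim)  # depthwise temporal conv
            for k in kernel_sizes
        )
        self.fuse = nn.Conv1d(dim * len(kernel_sizes), dim, kernel_size=1)

    def forward(self, x):
        """x: (B, T, C) frame-level decoder features; returns the same shape."""
        h = x.transpose(1, 2)                                    # (B, C, T)
        multi = torch.cat([branch(h) for branch in self.branches], dim=1)
        return x + self.fuse(multi).transpose(1, 2)              # residual fusion
```

Dilated convolutions or windowed attention would serve the same purpose; the point is that each frame mixes information from several temporal scales before decoding.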

Our approach provides a robust foundation for real-world applications such as sign language education and communication tools, while ablation and case studies highlight the impact of each module, paving the way for future research in CSLR.
