Multimodal AI / IEEE Publication
Project 01
TapToTab
A multimodal AI system that turns guitar performance videos into playable tabs by combining what the model sees with what it hears. The project sits at the intersection of computer vision, signal processing, and practical music understanding.
Why It Matters
Published in an IEEE venue as a working applied AI system, demonstrating how synchronized visual and audio signals can be transformed into structured musical output through a custom end-to-end pipeline.
Key Details
- Built data pipelines to process synchronized audio and visual signals so the model could reason across both modalities at once.
- Developed custom algorithms to detect hand positions and fret interactions, translating hand movement on the fretboard into structured note events.
- Designed the system as a full pipeline rather than a narrow model demo, connecting detection, feature extraction, and tab generation into one workflow.
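The fusion idea behind the pipeline can be sketched in a few lines: a vision model proposes (string, fret) candidates from hand position, audio analysis yields a pitch, and the two are cross-checked before a tab token is emitted. This is a minimal illustration under assumed names (`fuse_frame`, `candidates_for_pitch` are hypothetical, not the project's actual API), assuming standard tuning.

```python
# Hypothetical audio-visual fusion step for one video frame.
# Vision proposes (string, fret) pairs; audio supplies a detected MIDI pitch;
# we keep the vision candidate that is consistent with the pitch.
STANDARD_TUNING = [40, 45, 50, 55, 59, 64]  # MIDI of open strings, low E2 to high E4

def candidates_for_pitch(midi_pitch, max_fret=20):
    """All (string, fret) positions on a standard-tuned guitar producing this pitch."""
    out = []
    for string, open_midi in enumerate(STANDARD_TUNING):
        fret = midi_pitch - open_midi
        if 0 <= fret <= max_fret:
            out.append((string, fret))
    return out

def fuse_frame(vision_pairs, midi_pitch):
    """Pick the vision-detected position consistent with the audio pitch, if any."""
    audio_set = set(candidates_for_pitch(midi_pitch))
    for pair in vision_pairs:
        if pair in audio_set:
            return pair
    # Fall back to the lowest-fret audio candidate when vision disagrees
    audio = sorted(audio_set, key=lambda p: p[1])
    return audio[0] if audio else None

# Example: audio detects E4 (MIDI 64); vision saw fingers near string 3, fret 9
print(fuse_frame([(3, 9)], 64))  # → (3, 9): G string open (55) + 9 frets = 64
```

The design point this sketch illustrates is why the system treats the modalities jointly: audio alone cannot distinguish the multiple fretboard positions that produce the same pitch, while vision alone cannot confirm which touched fret actually sounded.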