georgian-io/Multimodal-Toolkit
A toolkit that extends HuggingFace Transformers to combine text embeddings with categorical and numerical tabular features for classification and regression.

This library builds multimodal models on top of pretrained transformer architectures (BERT, ALBERT, etc.) by adding fusion layers that combine the transformer's output with categorical and numerical tabular features. Training is end-to-end: the combining module and the transformer parameters are fine-tuned together on the supervised downstream task. The toolkit is built on PyTorch and integrates directly with HuggingFace Transformers.
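
To make the idea of a fusion layer concrete, here is a minimal PyTorch sketch of one possible combining strategy: concatenating a pooled text embedding with the tabular features and passing the result through a small MLP. It is not the toolkit's actual API. The names `ConcatFusionHead` and `demo`, the dimensions, and the random tensor standing in for the transformer output are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConcatFusionHead(nn.Module):
    """Hypothetical fusion head (not the toolkit's class).

    Concatenates a pooled text embedding with categorical and numerical
    features, then maps the combined vector to label logits with an MLP.
    """

    def __init__(self, text_dim, cat_dim, num_dim, hidden_dim, num_labels):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + cat_dim + num_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, text_emb, cat_feats, num_feats):
        # Fuse the three modalities along the feature dimension.
        combined = torch.cat([text_emb, cat_feats, num_feats], dim=-1)
        return self.mlp(combined)


def demo():
    torch.manual_seed(0)
    batch, text_dim, cat_dim, num_dim = 4, 768, 6, 3

    # Stand-in for a pooled transformer output (assumed 768-dim, as in
    # BERT-base). requires_grad=True lets us check that the loss gradient
    # reaches the text side, which is what end-to-end fine-tuning relies on.
    text_emb = torch.randn(batch, text_dim, requires_grad=True)
    # One-hot encoded categorical column with 6 assumed categories.
    cat_feats = F.one_hot(torch.tensor([0, 1, 2, 3]), num_classes=cat_dim).float()
    num_feats = torch.randn(batch, num_dim)
    labels = torch.tensor([0, 1, 0, 1])

    head = ConcatFusionHead(text_dim, cat_dim, num_dim, hidden_dim=32, num_labels=2)
    logits = head(text_emb, cat_feats, num_feats)

    loss = F.cross_entropy(logits, labels)
    loss.backward()
    return logits, text_emb.grad


if __name__ == "__main__":
    logits, text_grad = demo()
    print(logits.shape, text_grad.shape)
```

In a real model the random `text_emb` would be replaced by the output of a pretrained transformer, so the same backward pass would update both the fusion head and the transformer weights. Concatenation is only one way to combine the modalities and is used here because it is the simplest to read.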