Image · Video · Audio — the hottest AI repositories on heatdrop

Image · Video · Audio

heavyweights · gaining speed

lkuza2/java-speech-api

+0.1 ★/day→steady

A wrapper around Google's speech services that handles the tedious audio plumbing so you don't have to.

★ 545 Java Image · Video · Audio · explained

ikostrikov/TensorFlow-VAE-GAN-DRAW

+0.2 ★/day→steady

A single repo that lets you train DCGAN, VAE, or DRAW without wrestling three different codebases.

★ 592 Python Image · Video · Audio · explained

mlachmish/MusicGenreClassification

+0.2 ★/day→steady

A 2016 Tel Aviv University project that swaps Tao Feng's RBM for a TensorFlow CNN and scrapes 30-second previews to classify ten music genres.

★ 600 Python Image · Video · Audio · explained

jonbruner/generative-adversarial-networks

+0.2 ★/day→steady

A dead-simple TensorFlow implementation that trades bleeding-edge complexity for actual comprehension.

★ 539 Jupyter Notebook Learning · explained

carpedm20/simulated-unsupervised-tensorflow

+0.2 ★/day→steady

A 2017-era implementation that uses adversarial training to refine synthetic eye images so they fool a discriminator without losing their gaze labels.

★ 575 Python Image · Video · Audio · explained

Zardinality/WGAN-tensorflow

+0.2 ★/day→steady

A straightforward notebook implementation of Wasserstein GAN that still trains when you flip the loss signs, because duality is weird like that.

★ 579 Jupyter Notebook Image · Video · Audio · explained

zsdonghao/text-to-image

+0.2 ★/day→steady

A 2016 paper implementation that generates flower images from text descriptions, built when TensorFlow 1.x was fresh and "skip thought vectors" sounded futuristic.

★ 599 Python Image · Video · Audio · explained

HRLTY/TP-GAN

+0.2 ★/day→steady

Code for a 2017 ICCV paper that synthesizes frontal faces from extreme side angles using two perceptual paths at once.

★ 510 Python Image · Video · Audio · explained

stanfordnlp/mac-network

+0.2 ★/day→steady

Stanford's MAC cell breaks visual reasoning into explicit, inspectable computation steps, a rare honesty in a field that usually hides its work.

★ 513 Python Computer Vision · explained

evancohen/sonus

+0.2 ★/day→steady

Sonus gives Node.js projects offline hotword detection, then streams speech to cloud STT only after you get its attention.

★ 638 JavaScript Image · Video · Audio · explained

pathak22/pyflow

+0.2 ★/day→steady

A Python shim around Ce Liu's venerable C++ Coarse2Fine optical flow, minus the OpenCV dependency headache.

★ 661 C++ Computer Vision · explained

Grzego/handwriting-generation

+0.2 ★/day→steady

A clean TensorFlow implementation of Alex Graves' 2013 paper that generates plausible cursive from text, complete with attention windows and style knobs.

★ 594 Python Image · Video · Audio · explained

Justin-Tan/generative-compression

+0.2 ★/day→steady

A TensorFlow implementation of extreme learned image compression that trades exact reconstruction for tiny file sizes by letting a generator dream up the textures.

★ 533 Python Computer Vision · explained

sunshineatnoon/Paper-Implementations

+0.2 ★/day→steady

Before Hugging Face and Lightning, there was this: one developer's clean re-implementation of the papers that defined an era.

★ 621 Python ML Frameworks · explained

sergeytulyakov/mocogan

+0.2 ★/day→steady

MoCoGAN disentangles motion and content in video generation, letting you swap faces while keeping the expression, or vice versa.

★ 602 Python Image · Video · Audio · explained

jsn5/dancenet

+0.2 ★/day→steady

A Keras project that generates new dance sequences by compressing video frames into a latent space, then predicting the next pose with an LSTM and a Mixture Density Network.

★ 519 Python Image · Video · Audio · explained

shibing624/parrots

+0.2 ★/day→steady

Parrots wraps ASR and TTS into a pip-installable Python package with pre-trained voices and emotional fine-tuning.

★ 525 Python Image · Video · Audio · explained

rui1996/DeRaindrop

+0.2 ★/day→steady

Code for a 2018 CVPR spotlight paper that uses attention maps to stop generative networks from inventing plausible-looking but wrong background details behind raindrops.

★ 548 Python Computer Vision · explained

cvondrick/videogan

+0.2 ★/day→steady

A Torch7 implementation that generates short, plausible video clips by separating foreground motion from static backgrounds using adversarial training.

★ 706 Lua Image · Video · Audio · explained

alphacep/vosk

+0.2 ★/day→steady

VOSK skips neural network training in favor of storing every audio chunk it has ever seen, then fingerprint-matches new input against the hoard.

★ 500 C Image · Video · Audio · explained
