The 2015 paper that made billion-edge graphs fit on one machine
LINE is the original C++ implementation of a network embedding method that predates node2vec and most modern graph neural networks.

What it does
LINE turns nodes in a graph into dense vectors, preserving either first-order proximity (direct neighbors) or second-order proximity (shared neighbors). It handles directed, undirected, weighted, and binary edges. The output is standard embedding files you can feed into downstream classifiers or visualizations.
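To make the two proximity notions concrete, here is a small sketch on a toy graph. It is not code from the repository: the adjacency matrix is made up, and cosine similarity between neighbor-weight rows is only one assumed way to measure how much two neighborhoods overlap.

```python
import numpy as np

# Hypothetical toy graph as a symmetric adjacency matrix.
# Nodes 0 and 1 are not linked, but both connect to nodes 2 and 3.
adjacency = np.array([
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)


def first_order(adj, u, v):
    """First-order proximity: weight of the direct edge u-v (0 if absent)."""
    return float(adj[u, v])


def second_order(adj, u, v):
    """Second-order proximity, measured here as cosine similarity
    between the neighbor-weight rows of u and v (an assumed choice)."""
    row_u, row_v = adj[u], adj[v]
    denom = np.linalg.norm(row_u) * np.linalg.norm(row_v)
    return float(row_u @ row_v / denom) if denom > 0 else 0.0


print(first_order(adjacency, 0, 1))   # no direct edge between 0 and 1
print(second_order(adjacency, 0, 1))  # identical neighborhoods, so close to 1
```

Nodes 0 and 1 score zero on first-order proximity yet nearly one on second-order proximity, which is exactly the case the second objective is meant to capture.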
The interesting bit
The efficiency claim is the hook: “millions of vertices and billions of edges on a single machine within a few hours” in 2015, before GPUs were standard for graph work. The trick is to train by stochastic gradient descent on sampled edges, with negative sampling borrowed from word2vec’s playbook, rather than factorizing a full matrix. The authors later built GraphVite, which superseded this entirely.
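The sampling idea can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the repository's C++: the function name, hyperparameter defaults, and example graph are hypothetical, edges are drawn with `rng.choice` instead of a faster sampling table, and only the second-order (vertex/context) variant is shown.

```python
import numpy as np


def train_second_order(edges, num_nodes, dim=8, num_samples=2000,
                       num_negative=5, lr=0.025, seed=0):
    """Toy edge-sampling trainer with negative sampling (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    src = np.array([e[0] for e in edges])
    dst = np.array([e[1] for e in edges])
    weight = np.array([e[2] for e in edges], dtype=float)

    # Draw edges in proportion to their weight, so each update can treat
    # the sampled edge as if it were binary.
    edge_prob = weight / weight.sum()

    # Noise distribution for negatives: weighted out-degree to the 3/4 power.
    degree = np.zeros(num_nodes)
    np.add.at(degree, src, weight)
    noise_prob = degree ** 0.75
    noise_prob /= noise_prob.sum()

    vertex = (rng.random((num_nodes, dim)) - 0.5) / dim   # node embeddings
    context = np.zeros((num_nodes, dim))                  # context embeddings

    for _ in range(num_samples):
        e = rng.choice(len(edges), p=edge_prob)
        u, v = src[e], dst[e]
        negatives = rng.choice(num_nodes, size=num_negative, p=noise_prob)

        grad_u = np.zeros(dim)
        targets = [(v, 1.0)] + [(n, 0.0) for n in negatives if n != v]
        for target, label in targets:
            score = 1.0 / (1.0 + np.exp(-(vertex[u] @ context[target])))
            g = lr * (label - score)           # gradient of the log-sigmoid loss
            grad_u += g * context[target]
            context[target] += g * vertex[u]
        vertex[u] += grad_u

    return vertex


# Hypothetical 4-node graph; each undirected edge is listed in both directions.
toy_edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0),
             (2, 1, 1.0), (2, 3, 2.0), (3, 2, 2.0)]
embeddings = train_second_order(toy_edges, num_nodes=4)
print(embeddings.shape)
```

Each step touches one edge and a handful of negatives, so the cost per update does not depend on the size of the graph. That is where the scalability comes from.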
Key highlights
- Ships as raw C++ with no dependencies beyond Boost (Windows) or GSL (Linux)
- Includes helper tools: reconstruct.cpp for densifying sparse networks, normalize.cpp for L2 normalization, concatenate.cpp for merging 1st- and 2nd-order embeddings
- Example pipeline for YouTube social network data with node classification evaluation
- Configurable via CLI: embedding size, proximity order, negative samples, thread count
- Published at WWW 2015; its 1,053 stars suggest it was widely used before successors arrived
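For readers who only want the post-processing idea behind the normalize and concatenate helpers, here is a rough NumPy equivalent. It is a sketch, not a port of the C++ tools: it assumes the two embedding sets are already loaded as arrays with rows in the same node order, and that each set is normalized before merging. The function names are hypothetical.

```python
import numpy as np


def l2_normalize(embeddings):
    """Scale every row to unit L2 norm; all-zero rows are left unchanged."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return embeddings / norms


def concatenate_orders(first_order_emb, second_order_emb):
    """Normalize each embedding set, then join them column-wise per node."""
    return np.hstack([l2_normalize(first_order_emb),
                      l2_normalize(second_order_emb)])


# Hypothetical embeddings for 3 nodes, 2 dimensions per proximity order.
first = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]])
second = np.array([[0.0, 2.0], [5.0, 12.0], [1.0, 1.0]])
combined = concatenate_orders(first, second)
print(combined.shape)
```

Normalizing first keeps one embedding set from dominating the other purely because of its scale.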
Caveats
- Repository is explicitly unmaintained; authors redirect to GraphVite for any new work
- Undirected edges require manual duplication into two directed edges in the input file
- No Python bindings or modern packaging; you’re compiling C++ and parsing text files
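The edge duplication caveat is easy to script around. The sketch below assumes the input file holds one directed edge per line as source, target, and weight separated by tabs; check that layout against the repository's README before relying on it. The helper name is hypothetical.

```python
def to_directed_edge_lines(undirected_edges):
    """Expand each undirected (u, v, weight) edge into two directed lines."""
    lines = []
    for u, v, w in undirected_edges:
        lines.append(f"{u}\t{v}\t{w}")
        if u != v:  # a self-loop only needs to be written once
            lines.append(f"{v}\t{u}\t{w}")
    return lines


# Hypothetical undirected edge list.
edges = [("a", "b", 1), ("b", "c", 3)]
for line in to_directed_edge_lines(edges):
    print(line)
```

Writing the returned lines to a text file, one per line, gives an input where every undirected edge appears in both directions.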
Verdict
Worth a look if you’re reproducing 2015-era graph embedding baselines or studying how word2vec-style sampling migrated to networks. Skip it if you need active maintenance, GPU acceleration, or a Python API—GraphVite and PyTorch Geometric have you covered there.