Neural vector embeddings

Hyperdimensional Computing, Vector Symbolic Architectures, Holographic Reduced Representations

December 20, 2017 — August 29, 2024

approximation
feature construction
geometry
high d
language
linear algebra
machine learning
metrics
neural nets
NLP

Representations of complicated spaces by vectors which preserve semantic information.

Warning: this is not my current area, but it is a rapidly moving one.

Treat notes here with caution; many are outdated.


Feature construction for inconvenient data; made famous by word embeddings such as word2vec turning out to be surprisingly semantic. Note that word2vec has a complex relationship to its own documentation, and embeddings are now even more famous for their role in LLMs.

But what do these vectors mean? I am not sure. One interesting approach is to use sparse autoencoders to interpret them (Cunningham et al. 2023; O’Neill et al. 2024).
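To make the idea concrete, here is a toy sparse-autoencoder sketch in plain NumPy — not any particular paper’s recipe. An overcomplete ReLU encoder with an L1 penalty on the code is trained by gradient descent on synthetic “embeddings” built from a few sparse latent factors; all sizes and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "embeddings": dense 16-d vectors mixing a few sparse latent factors.
n, d, k = 512, 16, 32                        # samples, embedding dim, code dim
factors = rng.normal(size=(4, d))
codes = rng.random((n, 4)) * (rng.random((n, 4)) < 0.3)
X = codes @ factors

# Overcomplete autoencoder with an L1 penalty encouraging a sparse code.
W_enc = rng.normal(scale=0.1, size=(d, k))
b_enc = np.zeros(k)
W_dec = rng.normal(scale=0.1, size=(k, d))
lam, lr = 1e-3, 1e-2

def forward(X):
    h = np.maximum(X @ W_enc + b_enc, 0.0)   # sparse code (ReLU)
    return h, h @ W_dec                      # code, reconstruction

def loss(X):
    h, Xhat = forward(X)
    return np.mean(np.sum((X - Xhat) ** 2, axis=1)) + lam * np.abs(h).mean()

before = loss(X)
for _ in range(300):
    h, Xhat = forward(X)
    dXhat = 2.0 * (Xhat - X) / n             # grad of mean squared error
    dh = dXhat @ W_dec.T + lam * np.sign(h) / h.size
    dh[h <= 0.0] = 0.0                       # ReLU gradient mask
    W_dec -= lr * (h.T @ dXhat)
    W_enc -= lr * (X.T @ dh)
    b_enc -= lr * dh.sum(axis=0)
after = loss(X)
print(before, after)                         # loss should fall during training
```

The hope, per the interpretability literature, is that individual code units recover something like the underlying sparse factors, which the dense embedding superposes.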

1 Transformer embeddings

Technical survey: Kleyko et al. (2022) traces this literature back to around the year 2000.
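A minimal illustration of the core operations such surveys cover: binding by elementwise multiplication and bundling by majority vote, over random bipolar hypervectors (a MAP-style scheme; the dimension is illustrative). In high dimension, random vectors are nearly orthogonal, so a bound-and-bundled record can be queried by unbinding.

```python
import numpy as np

rng = np.random.default_rng(42)
D = 10_000  # "hyperdimensional": random vectors are nearly orthogonal

def rand_hv():
    return rng.choice([-1, 1], size=D)   # random bipolar hypervector

def bind(a, b):
    return a * b                         # elementwise multiply; self-inverse

def bundle(*vs):
    return np.sign(np.sum(vs, axis=0))   # elementwise majority vote

def sim(a, b):
    return a @ b / D                     # normalised similarity

# Encode the record {colour: red, shape: square} as a single vector.
colour, red = rand_hv(), rand_hv()
shape, square = rand_hv(), rand_hv()
record = bundle(bind(colour, red), bind(shape, square))

# Unbinding with the "colour" key yields a noisy vector closest to "red".
probe = bind(record, colour)
print(sim(probe, red), sim(probe, square))
```

The probe’s similarity to `red` is large (about 0.5 under this tie-to-zero majority rule) while its similarity to the unrelated `square` is near zero, which is the basic trick behind holographic/VSA memories.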

3 Embedding vector databases

Related: learnable indices. See vector databases.
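For intuition, the core operation these databases accelerate is top-k similarity search over stored embeddings. A brute-force sketch (real systems replace the linear scan with an approximate index such as HNSW or IVF; names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "vector database": unit-normalised embeddings plus an id list.
ids = ["doc_a", "doc_b", "doc_c", "doc_d"]
E = rng.normal(size=(4, 64))
E /= np.linalg.norm(E, axis=1, keepdims=True)

def search(query, k=2):
    """Exact top-k by cosine similarity (brute force)."""
    q = query / np.linalg.norm(query)
    scores = E @ q
    top = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in top]

# A query near doc_b's embedding should return doc_b first.
query = E[1] + 0.05 * rng.normal(size=64)
print(search(query))
```

Everything beyond this — quantisation, sharding, filtering, learnable index structures — is engineering around making that scan cheap at scale.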

4 Misc

Entity embeddings of categorical variables (code):

We map categorical variables in a function approximation problem into Euclidean spaces, which are the entity embeddings of the categorical variables. The mapping is learned by a neural network during the standard supervised training process. Entity embedding not only reduces memory usage and speeds up neural networks compared with one-hot encoding, but more importantly by mapping similar values close to each other in the embedding space it reveals the intrinsic properties of the categorical variables. We applied it successfully in a recent Kaggle competition and were able to reach the third position with relative simple features. We further demonstrate in this paper that entity embedding helps the neural network to generalize better when the data is sparse and statistics is unknown. Thus it is especially useful for datasets with lots of high cardinality features, where other methods tend to overfit. We also demonstrate that the embeddings obtained from the trained neural network boost the performance of all tested machine learning methods considerably when used as the input features instead. As entity embedding defines a distance measure for categorical variables it can be used for visualising categorical data and for data clustering.
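Mechanically, an entity embedding is just a learned lookup table indexed by category id — equivalent to multiplying a one-hot vector by a weight matrix, but without materialising the one-hot encoding. A sketch (sizes are illustrative, and the table here is random; in practice it is trained end to end as the abstract describes):

```python
import numpy as np

rng = np.random.default_rng(0)

# A high-cardinality categorical feature, e.g. store ids.
n_categories, emb_dim = 1_000, 8

# The embedding table is a weight matrix: one learned row per category.
embedding_table = rng.normal(scale=0.05, size=(n_categories, emb_dim))

def embed(category_ids):
    """Row lookup: equivalent to one_hot(ids) @ embedding_table, but cheap."""
    return embedding_table[category_ids]

batch = np.array([3, 17, 3, 999])
vecs = embed(batch)
print(vecs.shape)  # (4, 8)

# Same result as the wasteful explicit one-hot matrix product:
onehot = np.eye(n_categories)[batch]
assert np.allclose(onehot @ embedding_table, vecs)
```

The memory saving claimed in the abstract is simply `emb_dim` floats per row instead of `n_categories`, and the learned rows give the distance measure used for visualisation and clustering.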

5 Other Software

  • word2vec

    “This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.”

  • fastText

    fastText is a library for efficient learning of word representations and sentence classification.
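For intuition about the skip-gram architecture these tools implement: the model is trained on (centre, context) word pairs drawn from a sliding window over the corpus. A sketch of the pair extraction (the training objective itself is omitted):

```python
def skipgram_pairs(tokens, window=2):
    """(centre, context) pairs as used to train a skip-gram model."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# → [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

The continuous bag-of-words variant reverses the roles, predicting the centre word from the bundled context.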

6 References

Bengio, Ducharme, Vincent, et al. 2003. “A Neural Probabilistic Language Model.” Journal of Machine Learning Research.
Boykis. 2023. What Are Embeddings?
Cancho, and Solé. 2003. “Least Effort and the Origins of Scaling in Human Language.” Proceedings of the National Academy of Sciences.
Cao, Hripcsak, and Markatou. 2007. “A Statistical Methodology for Analyzing Co-Occurrence Data from a Large Sample.” Journal of Biomedical Informatics.
Cohn, Agarwal, Gupta, et al. 2023. “EELBERT: Tiny Models Through Dynamic Embeddings.” In.
Cunningham, Ewart, Riggs, et al. 2023. “Sparse Autoencoders Find Highly Interpretable Features in Language Models.”
Deerwester, Dumais, Furnas, et al. 1990. “Indexing by Latent Semantic Analysis.”
Gayler. 2004. “Vector Symbolic Architectures Answer Jackendoff’s Challenges for Cognitive Neuroscience.”
Guthrie, Allison, Liu, et al. 2006. “A Closer Look at Skip-Gram Modelling.” In.
Herremans, and Chuan. 2017. “Modeling Musical Context with Word2vec.” In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017.
Jiang, Aragam, and Veitch. 2023. “Uncovering Meanings of Embeddings via Partial Orthogonality.”
Kanerva. 2009. “Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors.” Cognitive Computation.
Kiros, Zhu, Salakhutdinov, et al. 2015. “Skip-Thought Vectors.” arXiv:1506.06726 [cs].
Kleyko, Rachkovskij, Osipov, et al. 2022. “A Survey on Hyperdimensional Computing Aka Vector Symbolic Architectures, Part I: Models and Data Transformations.” ACM Computing Surveys.
Lazaridou, Nguyen, Bernardi, et al. 2015. “Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation.” arXiv:1506.03500 [cs].
Le, and Mikolov. 2014. “Distributed Representations of Sentences and Documents.” In Proceedings of The 31st International Conference on Machine Learning.
Mikolov, Chen, Corrado, et al. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv:1301.3781 [cs].
Mikolov, Le, and Sutskever. 2013. “Exploiting Similarities Among Languages for Machine Translation.” arXiv:1309.4168 [cs].
Mikolov, Sutskever, Chen, et al. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In arXiv:1310.4546 [cs, stat].
Mikolov, Yih, and Zweig. 2013. “Linguistic Regularities in Continuous Space Word Representations.” In HLT-NAACL.
Mitra, and Craswell. 2017. “Neural Models for Information Retrieval.” arXiv:1705.01509 [cs].
Moran, Sridhar, Wang, et al. 2022. “Identifiable Deep Generative Models via Sparse Decoding.”
Narayanan, Chandramohan, Venkatesan, et al. 2017. “Graph2vec: Learning Distributed Representations of Graphs.” arXiv:1707.05005 [cs].
O’Neill, Ye, Iyer, et al. 2024. “Disentangling Dense Embeddings with Sparse Autoencoders.”
Park, Choe, and Veitch. 2024. “The Linear Representation Hypothesis and the Geometry of Large Language Models.”
Pennington, Socher, and Manning. 2014. “GloVe: Global Vectors for Word Representation.” Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2014).
Plate. 2000. “Analogy Retrieval and Processing with Distributed Vector Representations.” Expert Systems.
Saengkyongam, Rosenfeld, Ravikumar, et al. 2024. “Identifying Representations for Intervention Extrapolation.”