40 million embeddings to find who knows what on Hacker News
After the semantic map from the previous post, where I embedded 40 million posts and comments from Hacker News, I saw how the community not only supported the project with encouraging suggestions, but also how quickly a community spread across the world can shrink into real-life relationships. Robert (robg) reached out and we began discussing his work on the neural basis of semantic knowledge and how he built social semantic algorithms way back in 2008. Despite the intervening 16 years, we're amazed that social networks, even Hacker News, still don't compute and display the trusted voices across topics. Instead of prioritizing pages based on content, social networks could prioritize the people behind the content. The semantic map of the community, in effect, breaks down into computable regions based on the ways people use language to talk about their knowledge and interests.
In short, why is it so hard to discover and explore the people who best know about different topics, the extent of their knowledge, and how they relate to like minds? So Robert and I have been jamming for the last month toward some examples based on the semantics of Hacker News. For every user, based on their HN comments, I've computed their place in the complete semantic map of the entire community.
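To make "their place in the semantic map" a bit more concrete, here is a minimal sketch of one common way to turn per-comment embeddings into a single per-user vector: average them and normalize the result. This is an assumption for illustration, not necessarily how the app computes it; the function name `user_vectors` and the toy two-dimensional embeddings are hypothetical.

```python
import math
from collections import defaultdict


def user_vectors(comments):
    """Map (username, embedding) pairs to {username: unit-length mean vector}.

    Assumption: a user's position is the mean of their comment embeddings.
    The real pipeline may weight or aggregate comments differently.
    """
    sums = {}
    counts = defaultdict(int)
    for user, vec in comments:
        if user not in sums:
            sums[user] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[user][i] += x
        counts[user] += 1

    result = {}
    for user, total in sums.items():
        mean = [x / counts[user] for x in total]
        norm = math.sqrt(sum(x * x for x in mean))
        # Normalize so that users with many comments are comparable to users with few.
        result[user] = [x / norm for x in mean] if norm > 0 else mean
    return result


# Toy data: real embeddings would have hundreds of dimensions.
toy = [("alice", [1.0, 0.0]), ("alice", [0.0, 1.0]), ("bob", [0.0, 2.0])]
vectors = user_vectors(toy)
```

Averaging is the simplest choice; it blurs users who write about many unrelated topics, which is one reason a real system might keep several vectors per user instead.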
You can explore and interact with the new app at hn2.wilsonl.in.
Here are some cool things we discovered as we've looked at how to expand the Hackerverse:
We can organize your semantics on Hacker News. Starting from a single user within the semantics of the community, we show your contributions to HN semantically alongside similar users, along with your "unique identity": the words you use.
- For example, check out robg's new profile.
We can search the semantics of HN based on who knows what. The n-th order semantics shows how any search vector decomposes into the constituent people and how they use language in the community. Example queries:
We can map the community by who knows what and their relationships based on the semantics involved. Knowledge is not evenly distributed, so this topography of the community helps to highlight the people and what they know.
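As a rough illustration of the "who knows what" search idea, the sketch below ranks users by the cosine similarity between a query vector and each user's vector. It assumes one vector per user and a query already embedded into the same space; the names `cosine` and `who_knows` and the toy vectors are hypothetical, and the app's actual ranking may work differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def who_knows(query_vec, user_vecs, k=3):
    """Return the k users whose vectors are closest to the query, best first."""
    scored = [(cosine(query_vec, vec), user) for user, vec in user_vecs.items()]
    scored.sort(reverse=True)
    return [(user, score) for score, user in scored[:k]]


# Toy data: the first axis stands in for one topic, the second for another.
toy_users = {"alice": [0.9, 0.1], "bob": [0.1, 0.9], "carol": [0.7, 0.7]}
ranking = who_knows([1.0, 0.0], toy_users, k=2)
```

At 40 million embeddings a brute-force loop like this would be too slow; an approximate nearest-neighbor index would be the usual replacement, with the same ranking idea underneath.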
When you add up these three capabilities (organize your semantics, search the semantics, and map the community), we think the technology starts to show the people behind the words. Rather than organizing the world's information, what if we could organize the world's people? It has us thinking about a whole range of social knowledge challenges. Please join the waitlist if you'd like to explore further with us as we make more progress on this project. We're focused on the fun of finding and connecting with people, not pages, and we'd love to hear your thoughts.