https://zyros.dev/ausdevs_conversations/

TLDR

I split the conversations into chunks, then asked Gemini whether neighbouring chunks should be merged. Then I asked Gemini to describe each conversation in three ways (its sentiment, topic, and technical topic). The descriptions were vectorised using Gemini’s embedding model, and then I did dimensionality reduction so that they’re viewable on a 2D plot. See the FAQs at the bottom if you have any questions.

Background

AusDevs 2.0.0 is an Australian Discord server mostly used by students and recent graduates. Their wiki page is at https://ausdevs.com/, and it’s mostly supported by smish. I search this pretty often, and I’m pretty interested in natural language things, so I figured this would be a nice project.

Data

You can figure out how I collected this data. I’m not going to get into it.

Feature Engineering

Now that I have my data, I need to parse it. Raw Discord data is needlessly verbose for my purposes.

The message above is 126 lines of JSON metadata. It contains stuff like people’s roles, what emojis they used, what their username is, what the message’s unique ID is in Discord’s database, and so on.
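To give a feel for the trimming, here is a minimal sketch that cuts one raw message down to the few fields the later steps need. The field names (`author.username`, `timestamp`, `content`), the sample JSON, and the `SlimMessage` type are all assumptions for illustration, not the real export schema or the actual parsing code.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SlimMessage:
    """The handful of fields kept from a raw message (hypothetical type)."""
    author: str
    timestamp: datetime
    content: str


def slim_message(raw: dict) -> SlimMessage:
    # Field names are assumed; adjust them to whatever the export really uses.
    return SlimMessage(
        author=raw["author"]["username"],
        timestamp=datetime.fromisoformat(raw["timestamp"]),
        content=raw["content"].strip(),
    )


# A made-up, heavily shortened message; everything not read above is dropped.
raw_json = """
{
  "id": "1",
  "timestamp": "2024-03-01T10:15:00+00:00",
  "content": "  anyone used Gemini embeddings?  ",
  "author": {"username": "example_user", "roles": ["student"]},
  "reactions": [{"emoji": "thumbsup", "count": 2}]
}
"""
msg = slim_message(json.loads(raw_json))
print(msg.author, msg.timestamp.isoformat(), msg.content)
```

Roles, reactions and IDs never make it into `SlimMessage`, which keeps the later LLM prompts short.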

My feature engineering has three core classes: the chunker, the describer and the embedder. For chunking, I first do a semantic chunk (maximum of 50 messages, split if there’s over 2 hours between two messages). Then, inside my chunker, I ask the LLM whether or not two neighbouring chunks should be merged together.
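The two chunking passes could be sketched roughly like this. It is a sketch under assumptions, not the actual chunker: the `(timestamp, text)` message shape and the function names are made up, and the LLM call is replaced by a plain `should_merge` callback so the example runs without an API key.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Message = Tuple[datetime, str]  # (timestamp, text); hypothetical shape
Chunk = List[Message]

MAX_MESSAGES = 50
MAX_GAP = timedelta(hours=2)


def split_into_chunks(messages: List[Message]) -> List[Chunk]:
    """First pass: cut on a gap of over 2 hours, or once a chunk has 50 messages."""
    chunks: List[Chunk] = []
    current: Chunk = []
    for msg in messages:
        too_long = len(current) >= MAX_MESSAGES
        too_late = bool(current) and msg[0] - current[-1][0] > MAX_GAP
        if too_long or too_late:
            chunks.append(current)
            current = []
        current.append(msg)
    if current:
        chunks.append(current)
    return chunks


def merge_chunks(
    chunks: List[Chunk], should_merge: Callable[[Chunk, Chunk], bool]
) -> List[Chunk]:
    """Second pass: fold a chunk into the previous one when should_merge agrees."""
    merged: List[Chunk] = []
    for chunk in chunks:
        if merged and should_merge(merged[-1], chunk):
            merged[-1] = merged[-1] + chunk
        else:
            merged.append(chunk)
    return merged


start = datetime(2024, 3, 1, 9, 0)
messages = [
    (start, "morning"),
    (start + timedelta(minutes=5), "hey"),
    (start + timedelta(hours=3), "back from lunch"),
    (start + timedelta(hours=3, minutes=1), "same"),
]
chunks = split_into_chunks(messages)
# Stand-in for the LLM: never merge. A real callback would prompt the model.
merged = merge_chunks(chunks, lambda previous, following: False)
print([len(c) for c in chunks], [len(c) for c in merged])
```

In this sketch a merged chunk can grow past 50 messages; whether to cap that is a design choice the sketch leaves open.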

Now we have chunks! Let’s go describe them.