On finding relevant notes when you need them
Another use-case where LLMs are genuinely useful and not overhyped nonsense
After a while, note-taking systems tend to reach a point where you forget what you've got. Sometimes, though, we want that stuff back. There are a couple of ways to accomplish that, and LLMs are one of the newest.
But first, a reminder: forgetting is natural, normal, and healthy. There's a fancy term for it in child development: synaptic pruning[1]. But before you can prune, there must be something there. Something that was useful at least once, generally. The question is… when it might finally be useful, will you be able to find it again?
Knowledge bases grow and then shrink
A newborn baby's brain grows a ton; early brain development involves explosive growth, similar to the burst of enthusiasm many folks felt at the beginning of the pandemic, when personal knowledge management as a movement began to really take off. The contents of my Obsidian vault grew exponentially as I filled it with everything I needed to offload from my blurry mom-brain so I could still do knowledge work: mostly wrestling with the nerdy academic research I was digging into because I wanted to create a plausible and interesting fantasy world for the book I was writing while my son took naps.
Then he got older, I went back to work, and I took fewer and fewer notes, but the collection was still useful. I'm extremely glad I curated its contents; I rely on the information there daily, even though I don't take as many notes as I used to by any stretch of the imagination. There are a handful of topics and files that I refer back to almost constantly, and others I revisit only periodically, like my potty training notes, reflections on previous Thanksgiving meals, or idealistic plans for what I'd like my retirement to look like. And there are, of course, notes I made once and never really touched again, because my life changed; the odds of me teaching US history in an environment where I need to worry about structuring a classroom lesson with careful differentiation according to student reading levels are exceptionally low, for example.
When a child hits 2 or 3, the number of synapses in their brain peaks, and the brain starts to remove synapses it doesn't need anymore. It "forgets" skills and knowledge that haven't been used enough to justify the cost of keeping them.
There are methods to combat forgetting, of course; formal spaced repetition practices are the most popular, although I prefer to think of it as "structured serendipity" since I'm not usually trying to memorize. According to my Readwise database, my personal peak streak for flipping through flashcards of things I wanted to see again later is 121 days. I've been considering setting up a flashcard system for memorizing the Pokemon type chart, because my son is a lot more pleasant on hikes if there's the promise of spinning a Pokestop every half-mile or so… but he doesn't like to lose battles and he's a little too young to grasp the complexities himself yet.
Like my son, my notes collection is about 4 years old at this point, and I can attest that I've done more "pruning" than adding in the last year or so: I moved the finance stuff into a shared drive with my husband, my work stuff now lives in a shared Notion database with my work colleagues, I've deleted a bunch of PDFs that are no longer relevant to my interests, and I've archived old books I no longer have an interest in finishing, because I no longer aspire to the rarefied heights of "traditionally published author" or even the grind of "successful self-published author."
Notes are valuable resources
But as with the Pokemon type chart, I created most of my notes for a reason. Pulling from my own notes accesses vetted sources instead of the wild world of the internet; I know I liked it enough to read it, if nothing else. But that's certainly not the only value: the latent connections between my thoughts and my notes are easier to find if I only need to refresh my memory, not re-learn the thing from different sources that might not align with the overgrown pathways in my brain.
A curated notes collection saves time, and it's efficient, particularly in an era where internet search is badly broken and LLMs, even when they don't hallucinate like crazy, often lack context.
Taking notes is easy, especially if (as with my type chart) you don't bother to re-write everything into your own words. Using them isn't even that hard, really; even if you only vaguely remember what you're looking for, there are a bunch of ways to find stuff again. Most tools have some kind of search function, whether it uses semantic search, boolean operators, database queries, regular expressions, natural language search… or isn't called search at all but relies on up-front organization: stuff like naming conventions, backlinks, structured data classification systems, tagging schema, folder systems… it's generally straightforward to lay hands on an old note. As far back as twenty years ago, browsers kept searchable histories and helpfully changed the color of links that had already been clicked. I know people who still rely almost entirely on their browser search history as a "notes database." If a method ain't broke, why fix it?
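(To be concrete about how little machinery "old-fashioned" search needs, here's a minimal sketch of keyword search over a vault of Markdown notes; the folder path and search term are hypothetical, not from any particular tool.)

```python
# A minimal sketch of "old-fashioned" keyword search: case-insensitive
# regex over a folder of Markdown notes. Path and pattern are hypothetical.
import re
from pathlib import Path

def grep_notes(pattern: str, vault: str = "~/notes") -> None:
    regex = re.compile(pattern, re.IGNORECASE)
    for path in Path(vault).expanduser().rglob("*.md"):
        for number, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
            if regex.search(line):
                print(f"{path.name}:{number}: {line.strip()}")

grep_notes(r"infrastructure")  # only works if you remember the right keyword
```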
The hard part is finding useful things when you're not looking for them, and that's where the technology has really improved a lot in the last year or so. Little AIs that live in the corner of your screen, ready to helpfully pop up with a reminder that you already started writing an almost identical article two years ago. Services that get to know you and algorithmically serve you content they think you want, when you want it, memetically offering you coupons for diapers before you even know you're pregnant. Plugins[2] that index your notes, perform magic I barely understand, like vector embedding, and tell you what you've got that's similar to what you're currently looking at.
The frustrating part of this new technology is that it's not set up for the careful systems I created three years ago; vector embeddings are made from the contents of your notes, so if you embedded content that is only rendered by a particular program (to avoid redundancy, perhaps) you're kind of[3] out of luck. Worse is if you use consistent templates: if you're writing chapters of a novel, for example, vector embeddings will get you similarly structured chapters instead of notes relevant to the content of what you're writing, unless you're very careful about only indexing certain subsets of your notes. And that workaround fails if you, for example, want to see other chapters touching on similar themes alongside nonfiction notes about those themes, without weighting every chapter that involves particular characters. Vector embeddings are helpful, but they aren't as "smart" as all that, alas. A sketch of the basic idea (and the subset-indexing workaround) is below.
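For the curious, here's roughly what those plugins do under the hood: a minimal sketch, assuming a vault of Markdown files and using the sentence-transformers library. The model name is just a common default, and the folder names are hypothetical; this is the spirit of such plugins, not their actual code.

```python
# A minimal sketch of embedding-based "similar notes", in the spirit of
# plugins like Smart Connections (not their actual code). Assumes a vault
# of Markdown files; model choice and folder names are illustrative.
from pathlib import Path
from sentence_transformers import SentenceTransformer

VAULT = Path("~/notes").expanduser()  # hypothetical vault location
model = SentenceTransformer("all-MiniLM-L6-v2")

# The subset-indexing workaround: skip a heavily templated folder so that
# "same template" matches don't drown out "same topic" matches.
paths = [p for p in VAULT.rglob("*.md") if "chapters" not in p.parts]
texts = [p.read_text(encoding="utf-8") for p in paths]
embeddings = model.encode(texts, normalize_embeddings=True)

def similar_notes(current_text: str, top_k: int = 5) -> list[tuple[float, str]]:
    """Rank indexed notes by cosine similarity to the note you're editing."""
    query = model.encode([current_text], normalize_embeddings=True)[0]
    scores = embeddings @ query  # dot product = cosine, since vectors are normalized
    ranked = sorted(zip(scores.tolist(), (p.name for p in paths)), reverse=True)
    return ranked[:top_k]
```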
A well-trained LLM levels up search
Historically I haven't used Notion for anything non-collaborative. It's slow to load. I don't really like databases, and I don't like the way it "pushes" me to add icons and banners to everything. Most importantly, I find the way its block-based structure interferes with a simple cmd+a "select all" immensely frustrating. While I'm venting: concatenating documents and tracking word counts is all but impossible unless I do it manually? Ugh. Plus, the keybinds suck.
The AI, though, is incredible. Being able to feed my unrefined database of raw highlights and annotations to the workspace, do no setup beyond clicking a handful of consent buttons, then ask a question like "show me all of my notes about infrastructure that might pair well with this point I am making about how Fall of Angels is a fantasy novel that teaches a lot about infrastructure" (stay tuned for that forthcoming article sometime next spring 🤪) and get… useful results, neatly organized into a brief memo, with footnotes sourced to my own personal notes, complete with a carefully curated list of (in this case 14) additional related notes is… incredible.
Seriously, I am sufficiently blown away that I'm going to just share a screenshot of what that slowly growing note looks like, even though it's messy and disjointed and raw and ugly and probably gives too much of the game away.
The NotionAI is terrible at many things. If you try to use it to create a database property for automatically counting words in a document (which should be table stakes, imo), it hallucinates wildly and is wrong by orders of magnitude. If you ask it whether a particular chapter of a story has a coherent beginning, middle, and end, it cannot tell. It has most of the same painful conversational pitfalls as ChatGPT, which makes sense because, as far as I know, Notion's AI is essentially a wrapper around GPT-3.5.
But wow is it convenient, and whatever the Notion team has done to get it to answer questions like "find me notes about…" really, really works. It's always well-sourced, which is what matters… and it manages to find hidden gems I had completely forgotten about, although I absolutely cannot trust it to accurately report on the status of those things, and it definitely gets some nuance wrong. For instance, I asked for books, and of the three bullets it returned, only one is actually a book rather than an article. Plus, I already wrote that "Infrastructure in Ancient Civilizations" article it says I could develop (although I had completely forgotten about it); compare the actual content of my note to NotionAI's report on it.
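Notion hasn't published exactly how this works (as noted, it's reportedly a GPT-3.5 wrapper), but the general shape of "answer from my notes, with sources" is easy to sketch: retrieve the most relevant notes, then hand them to the model with strict instructions to cite them. Everything below, from the model name to the prompt wording, is illustrative, not Notion's actual pipeline.

```python
# A hedged sketch of the "sourced memo from my own notes" pattern
# (retrieval-augmented generation). Not Notion's actual pipeline; the
# model name and prompt wording are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def memo_from_notes(question: str, notes: dict[str, str]) -> str:
    """notes maps titles to text, e.g. the top hits from similar_notes() above."""
    context = "\n\n".join(f"## {title}\n{body}" for title, body in notes.items())
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": (
                "Answer using ONLY the notes provided. Cite note titles for "
                "every claim, and say so plainly if the notes don't cover it."
            )},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```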
Now, sure, if I search the raw term "infrastructure" with old-fashioned search, I get twice as many results. But they aren't as useful as even a quasi-accurate memo. Compare:
Of my top results, only one is in any way useful to the project at hand (that Chinese transport network thing, which I had definitely forgotten about entirely until now), and it showed up in the curated list of 14 the AI gave me. The snippet of text that's displayed isn't clearly connected to anything I'm thinking about, and it takes a lot more mental effort to sift through and remember why it might matter.
I ought to figure out how to do this in Obsidian too, but honestly… that is up there with training my own LLM in terms of "probably a good idea, don't really have time." I'm just glad that LLMs have leveled up search, because as Google's dominance in the last era of the internet demonstrates, search is key.
Midjourney makes beautiful images, and ElevenLabs seems to have nailed text to speech… but for me, search and summarization are the killer use cases for AI, which is why Elicit (which automates time-consuming research tasks like summarizing papers, extracting data, and synthesizing findings) has historically been one of the apps I point to when people ask me what AI is even good for, anyway.
Yeah, there's lots of hype, but I truly believe that figuring out how to leverage large language models for search is as key to success in the current era as learning boolean operators was 20 years ago.
[1] Here's the basic primer on synaptic pruning I used as a reference while writing this.
[2] The one I use for Obsidian is called Smart Connections, but as nice and helpful as the developer is, sometimes I feel too stupid to get the most out of it.
[3] Although Obsidian does have plugins like "easy bake", which will take embedded content and copy or move the text, and it's possible to write scripts that do the same thing in more custom ways. Even I've done it with JavaScript or Python and some kind help, and I'm no coder; tools like Copilot and ChatGPT's strawberry model make this even easier now. Thanks to ChatGPT, I managed to install Homebrew in the terminal without asking for help, after bouncing right off the documentation I found using Google…
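(For the curious, here's a rough sketch of what such a script boils down to. It is not the easy-bake plugin's code: it inlines Obsidian-style ![[embeds]] one level deep and ignores edge cases like aliases, headings, and block references; the vault path and file name are hypothetical.)

```python
# A rough sketch of "baking" a note: flatten Obsidian-style ![[embed]] links
# into plain text, one level deep. Not the easy-bake plugin's actual code;
# real vaults have edge cases (aliases, headings, block refs) this ignores.
import re
from pathlib import Path

VAULT = Path("~/notes").expanduser()  # hypothetical vault location
EMBED = re.compile(r"!\[\[([^\]|#]+)[^\]]*\]\]")

def bake(path: Path) -> str:
    def inline(match: re.Match) -> str:
        target = VAULT / f"{match.group(1).strip()}.md"
        return target.read_text(encoding="utf-8") if target.exists() else match.group(0)
    return EMBED.sub(inline, path.read_text(encoding="utf-8"))

print(bake(VAULT / "chapter-01.md"))  # hypothetical file
```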
Enjoyed and agreed with raves and rants. We're in that awkward space between new tech rollout and UIs that make the tech understandable, not to mention usable.
Every couple of months I install Smart Connections and end up cursing. I'm not stupid, we shouldn't be made to feel that way, and I refuse to use products with lofty pitches that do not deliver, like this one: "Spend less time linking, tagging and organizing because Smart Connections finds relevant notes so you don't have to!"
I am currently indexing all of my documents locally using GPT4All and Meta's Llama 3 8B Instruct model. If it works well, then I'll try out the Obsidian plugin Copilot for Obsidian, which can call the above model in GPT4All, keeping everything local. (Bad product name: it has nothing to do with Microsoft's Copilot.) Fingers crossed. If you're interested in some eye candy, here's the dev's latest video: https://youtu.be/1jSaGwuPiJs