Find Org Files

After some discussions with a friend, I wrote an essay with code for how to load a file in Emacs by first selecting a tag (that you would have added to your files), and then selecting from files that have that tag. Seemed like a good idea at the time, but it dawned on me that I could take advantage of the filtering capabilities with new extensions to completing-read.

In this essay, I explain how I can select Org files based on filename, title, or associated tags … or some combination of them. Once, we would have written a program to search all the files, gather the information (such tools index function names, but in this case, I’d index each file’s title and any associated tags), and write that into a TAGS file for Emacs to read, or at least extend GNU Global. New search programs, like ripgrep, are so fast that we can use them to search the files dynamically at run-time.

Calling my new org-find-file and typing eshell shows the following filtered file collection:

org-find-file-a.png

org-find-file

As a thin wrapper around find-file, my org-find-file function interactively filters the list of available files by calling the org-find-file--choose-file function:

(defun org-find-file (file)
  "Load org-specific file like `find-file'.

If called interactively, the list of files includes each file's
title as well as any headline tags."
  (interactive (list (org-find-file--choose-file)))
  (find-file file))

The job of org-find-file--choose-file is two-fold:

  1. Display a nice version of all org files
  2. When chosen, return the filename of the selection

The completing-read function takes a prompt as a string, and a list of possible choices from file-choices. Since most of us use something like Ivy, vertico, or Selectrum, we can filter the list until we narrow it down to our selection.

(completing-read "Choose a fruit: "
                 '("apple" "orange" "banana" "other"))

Shows this after typing an:

org-find-file-b.png

The completing-read function also accepts other collections, such as alists, hash tables, and obarrays.

(let ((alist '(("red"    . "apple")
               ("orange" . "orange")
               ("yellow" . "banana"))))
  (completing-read "Choose a fruit: " alist))

In this example, completing-read shows the colors, and when I select yellow, the function returns yellow. But, I could use that selection to lookup the other value:

(let* ((alist '(("red"    . "apple")
                ("orange" . "orange")
                ("yellow" . "banana")))
       (choice (completing-read "Choose a fruit: " alist)))
  (alist-get choice alist nil nil 'equal))

When running this code, selecting yellow returns banana.
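The equal argument matters in that lookup: by default, alist-get compares keys with eq, and the string that completing-read hands back is a different object than the key stored in the alist. Here is a minimal sketch of the difference. The helper my/fruit-for is hypothetical, and copy-sequence stands in for the string a live minibuffer would return:

```elisp
(defun my/fruit-for (color &optional testfn)
  "Look up COLOR in a small alist, comparing keys with TESTFN.
A hypothetical helper, used only to illustrate `alist-get'."
  (let ((alist '(("red"    . "apple")
                 ("orange" . "orange")
                 ("yellow" . "banana"))))
    (alist-get color alist nil nil testfn)))

;; `copy-sequence' yields a string with the same characters as the
;; alist key, but which is not the same object.
(let ((choice (copy-sequence "yellow")))
  (list (my/fruit-for choice)             ; keys compared with `eq'
        (my/fruit-for choice #'equal)))   ; keys compared with `equal'
;; => (nil "banana")
```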

So org-find-file--choose-file needs to create an association list, like:

(
 ;; ...
 ("eshell-why.org : Why use EShell?" . "eshell-why.org")
 ("eshell.org : Introduction to Emacs Shell :emacs :technical" . "eshell.org")
 ("eshell-fun.org : Eschewing Zshell for Emacs Shell :emacs :technical" . "eshell-fun.org")
 ("piper-presentation.org : Death to the Shell :emacs :presentation :eshell" . "piper-presentation.org")
 ("eshell-present.org : Presenting the Eshell :technical :shell :emacs :presentation" . "eshell-present.org")
 ("eshell-presentation.org : Presenting the EShell :technical :shell :emacs :presentation" . "eshell-presentation.org")
 ("eshell-present-and-notes.org : Presenting Eshell :technical :shell :emacs :presentation :noexport" . "eshell-present-and-notes.org")
 ;; ...
 )

Where to select the org files?

  • If given a directory, look for files there
  • If in a project, (the project-current returns non-nil), start from the top of that project
  • Otherwise, look in the current directory

With the default-directory set, we get a list of the files from org-find-file--file-choices, and assign it to a local variable, file-choices:

(defun org-find-file--choose-file (&optional directory)
  "Use `completing-read' to present Org files for selection.
Acquires the list of files (and their descriptive text) from
calling `org-find-file--file-choices' (which returns an alist)."
  (let* ((default-directory (cond
                             (directory directory)
                             ((project-current) (project-root (project-current)))
                             (t default-directory)))
         (file-choices (org-find-file--file-choices))
         (chosen-file  (completing-read "File: " file-choices)))
    (alist-get chosen-file file-choices nil nil 'equal)))

Creating the AList

The nicely displayed list of org files is a combination of filename, the title, and the tags, so I create two functions for this:

org-find-file--gather-titles
Returns a list of (filename title) entries.
org-find-file--gather-tags
Returns a hash-table where the key is the filename, and the value is a list of tags.

To smash them, er, format them, I call seq-map with a lambda that conses a pretty label (from org-find-file--file-format) onto the filename:

(defun org-find-file--file-choices ()
  "Return alist of file _labels_ and the file references."
  (let ((titles  (org-find-file--gather-titles))
        (tags    (org-find-file--gather-tags)))
    (seq-map (lambda (entry)
               (seq-let (file title) entry
                 (cons (org-find-file--file-format file title (gethash file tags))
                       file)))
             titles)))

The pretty (and descriptive) label comes from this function.

(defun org-find-file--file-format (file title tags)
  "Return a nicely formatted string containing the parameters."
  (let* ((title-color `(:foreground ,(face-attribute 'org-document-title :foreground)))
         (title-str    (string-trim title))
         (title-pretty (propertize title-str 'face title-color))
         (tag-str      (string-join tags " ")))  ; <-- Updated
    (format "%s : %s %s" file title-pretty tag-str)))

Note the use of propertize to distinguish the title from both the filename and the tags.
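Adding a face to part of the label does not interfere with the later alist lookup, because equal compares the characters of two strings and ignores their text properties. A small sketch of that, with a made-up color and a hypothetical function name (my/label-demo); substring-no-properties stands in for a selection that comes back from the minibuffer as plain text:

```elisp
(defun my/label-demo ()
  "Build one propertized label and look it up by its plain text.
A hypothetical demonstration; the color is made up."
  (let* ((title (propertize "Why use EShell?" 'face '(:foreground "orange")))
         ;; `format' keeps the text properties of its string arguments.
         (label (format "%s : %s" "eshell-why.org" title))
         (alist (list (cons label "eshell-why.org")))
         ;; Same characters as LABEL, but with every property stripped.
         (typed (substring-no-properties label)))
    (list (get-text-property 0 'face label)   ; filename part has no face
          (get-text-property (length "eshell-why.org : ") 'face label)
          (alist-get typed alist nil nil #'equal))))

(my/label-demo)
;; => (nil (:foreground "orange") "eshell-why.org")
```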

Gathering the Titles

At this point, I need to get the org-file’s titles and tags, and with quick grep-replacements, like ripgrep, I can get the titles of all files using a call like:

rg --ignore-case --no-heading --no-line-number "#\+title:"

Which can return something like:

...
Technical/Learning/index.org:#+TITLE:  Teaching Programming to Middle Schoolers
Technical/Learning/python.org:#+TITLE:  Programming with Python
Technical/Learning/Python/index.org:#+TITLE:  Learning Python
Technical/index.org:#+TITLE:  1
Technical/Python/new-project.org:#+TITLE:  New Projects in Python
Technical/Learning/java.org:#+TITLE:  Learning Java
Technical/OpenStack/using-heat-templates.org:#+TITLE:  Using Heat Templates
README.org:#+title:  My Website
...

The org-find-file--gather-titles function calls rg, splits each line on the : character, and returns a list of the filename and the title:

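(defun org-find-file--gather-titles ()
  "Return a list of (FILE TITLE) entries for Org files with a title.
Searches from `default-directory' with ripgrep.  Uses dash.el."
  (thread-last "rg --ignore-case --no-heading --no-line-number '^#\\+title:'"
               (shell-command-to-list)
               (--map (split-string it ":"))
               ;; Rejoin everything after the keyword, so that a title
               ;; containing a colon is not cut short.
               (--map (list (nth 0 it) (string-join (nthcdr 2 it) ":")))))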
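One way to make the line parsing easy to check on its own is to pull it into a pure function. The following is only a sketch: my/parse-title-line is a hypothetical name, and it assumes that filenames contain no colons. Matching with a regular expression, rather than splitting on every colon, also keeps any colons in the title itself:

```elisp
(require 'subr-x)   ; for `string-trim-right' on older Emacsen

(defun my/parse-title-line (line)
  "Parse an rg output LINE of the form FILE:#+title: TITLE.
Return (FILE TITLE), or nil if LINE does not look like a title line."
  (let ((case-fold-search t))           ; accept #+title: and #+TITLE:
    (when (string-match "\\`\\([^:]+\\):#\\+title:[ \t]*\\(.*\\)\\'" line)
      (list (match-string 1 line)
            (string-trim-right (match-string 2 line))))))

(my/parse-title-line "README.org:#+title:  My Website")
;; => ("README.org" "My Website")
(my/parse-title-line "Technical/emacs.org:#+TITLE: Emacs: A Guide")
;; => ("Technical/emacs.org" "Emacs: A Guide")
```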

I call rg using shell-command-to-string, but I need the results as a list of strings, split on newline characters:

(defun shell-command-to-list (command)
  "Return call to COMMAND as a list of strings for each line."
  (thread-first command
                shell-command-to-string
                (string-lines t)))
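An alternative sketch, assuming Emacs 28 or later: process-lines-ignore-status runs a program directly, without a shell, and already returns its output as a list of lines. That sidesteps shell quoting of the regular expression, and unlike process-lines, it does not signal an error when the program exits with a non-zero status (as rg does when nothing matches). The name my/command-lines is hypothetical, and the example uses printf only so that it runs without any Org files around:

```elisp
(defun my/command-lines (program &rest args)
  "Run PROGRAM with ARGS and return its output as a list of lines.
No shell is involved, so ARGS need no quoting."
  (apply #'process-lines-ignore-status program args))

;; Each argument is passed to the program verbatim.
(my/command-lines "printf" "a\nb\n")
;; => ("a" "b")
```

With this helper, the title search would pass each rg flag and the pattern as separate string arguments instead of building one command string.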

Gathering the Tags

Getting all the tags for the files is a bit more complicated.

Brief Review of Org Tags

Org surprised me with how easy it is to add tags, but how little searchability it provides. I mean, you can display the headlines that match a tag (using org-tags-view) and have the agenda limit its display of TODO items (with org-agenda-filter-by-tag).

You associate tags with a headline, like:

* A headline about Something    :foo:bar:bird:

This has three tags, foo, bar, and bird.

Like everything in Org, you can type those tags, or you can call the function org-set-tags-command (defaults to C-c C-q).

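Also, you can declare tags for the whole file by adding something like this in your buffer (Org’s #+FILETAGS: keyword is what makes every headline inherit a tag; my files use #+TAGS:, so that is what I search for):

#+TAGS: foo bar bird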

You’ll notice that on a headline, you surround tags with colons, but on the #+TAGS: line, you separate them with spaces. This becomes important later as we search for both types.

A feature of Org tags to consider is that headlines inherit tags from parent headlines. For instance:

* Top-level Headline     :foo:
** Sub-level Headline    :bar:
*** Interesting Headline :baz:

The Interesting Headline has three tags, foo, bar, and baz. Because of this, ripgrep (or any other line-oriented search tool) isn’t as effective at pinpointing a headline with a particular tag. Since my goal is to open a file based on a tag anywhere in the document, this won’t influence the design.

Regular Expression for Tags

While it wouldn’t take much to craft a regular expression to parse the tags from a headline, Org already supplies org-tag-line-re. But as an Emacs-oriented regexp (obviously), we need to convert it before passing it over to ripgrep. The pcre2el project can convert this, like:

(format "rg --no-heading '%s' %s"
        (rxt-elisp-to-pcre org-tag-line-re)
        project-dir)

The org-tag-line-re works for headline tags, but not for tags affecting all headlines in a file. I thought this could work:

(rx (or
     (regexp org-tag-line-re)
     (seq line-start "#+tags:" (one-or-more space)
          (group (one-or-more (any alnum "@" "_" space))))))

Since both our regular expression and org-tag-line-re start with line-start (i.e. ^), the or clause (i.e. |) fails to match. In other words, we need to expand this combination to create our own:

(defvar org-find-files-tag-line-re
  (rx line-start
      (or
       (seq (one-or-more "*") " " (+? any) ":"
            (group (one-or-more (any alnum "@_#%:"))) ":")
       (seq "#+tags:" (one-or-more space)
            (group (one-or-more (any alnum "@_#%" space)))))
      line-end)
  "Regular expression that matches either headline or global file tags.")

I’m glad to use the rx macro to make the regular expression more readable.
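Before converting the expression for ripgrep, it can be checked inside Emacs with string-match. The sketch below repeats the same rx form under a hypothetical name (my/tag-line-re), so that it stands alone. It binds case-fold-search to mirror the --ignore-case flag given to rg, and my/tag-portion is a hypothetical helper that returns whichever group matched:

```elisp
(require 'rx)

(defconst my/tag-line-re
  (rx line-start
      (or
       (seq (one-or-more "*") " " (+? any) ":"
            (group (one-or-more (any alnum "@_#%:"))) ":")
       (seq "#+tags:" (one-or-more space)
            (group (one-or-more (any alnum "@_#%" space)))))
      line-end)
  "A copy of the tag-line expression, for experimenting.")

(defun my/tag-portion (line)
  "Return the tag portion of LINE, or nil if LINE carries no tags."
  (let ((case-fold-search t))
    (when (string-match my/tag-line-re line)
      ;; Group 1 holds headline tags, group 2 holds #+tags: values.
      (or (match-string 1 line) (match-string 2 line)))))

(mapcar #'my/tag-portion
        '("* Technical Artifacts      :noexport:"
          "** Sub-level Headline :foo:bar:"
          "#+TAGS: emacs hamacs"
          "* A headline without tags"))
;; => ("noexport" "foo:bar" "emacs hamacs" nil)
```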

Code to Acquire the Tags

Using this regular expression to search for tags leads to a problem, as the following is possible:

focused-work.org:#+TAGS: emacs hamacs
focused-work.org:* Timers :noexport:
focused-work.org:* Technical Artifacts                                :noexport:
literate-database.org:#+tags:   emacs technical
...

Here, a single file could have more than one entry, and the same tag can repeat on different headlines.

The function, org-find-file--gather-tags, calls rg with a converted version of org-find-files-tag-line-re, and shows me all the filenames and tags, but I need to merge it. My solution was to use a hash table, where I could append the new tags (found on the current line) to any tags found earlier:

(defun org-find-file--gather-tags ()
  "Return hash-table of key as filename, and values are tags.
Note that the tags are _all_ tags in the file."
  (let ((results  (make-hash-table :test 'equal))
        (tag-list (thread-last (format "rg --ignore-case --no-heading --no-line-number '%s'"
                                       (rxt-elisp-to-pcre org-find-files-tag-line-re))
                               (shell-command-to-list)
                               (--map (split-string it ":")))))
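    (dolist (entry tag-list)
      ;; A headline such as `* Title :foo:bar:' splits into one field per
      ;; tag, so rejoin every field after the second before parsing.
      (let* ((file      (car entry))
             (tags      (string-join (nthcdr 2 entry) ":"))
             (prev-tags (gethash file results))
             (new-tags  (org-find-file--massage-tags tags)))
        (puthash file (seq-union prev-tags new-tags) results)))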
    results))
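The merging step can be tried without running rg at all. This is a minimal sketch with hand-written entries, assuming Emacs 28 or later for seq-union; my/merge-tag-entries is a hypothetical name, and each entry is already a filename followed by its parsed tags:

```elisp
(require 'seq)

(defun my/merge-tag-entries (entries)
  "Merge ENTRIES, a list of (FILE TAG...) lists, into a hash table.
Each file maps to the union of the tags from all of its entries."
  (let ((results (make-hash-table :test 'equal)))
    (dolist (entry entries results)
      (let ((file (car entry))
            (tags (cdr entry)))
        ;; `gethash' returns nil for a new file, and the union of nil
        ;; with TAGS is just TAGS without duplicates.
        (puthash file (seq-union (gethash file results) tags) results)))))

(let ((table (my/merge-tag-entries
              '(("focused-work.org" ":emacs" ":hamacs")
                ("focused-work.org" ":noexport")
                ("focused-work.org" ":noexport")
                ("literate-database.org" ":emacs" ":technical")))))
  ;; The repeated :noexport tag is stored only once.
  (length (gethash "focused-work.org" table)))
;; => 3
```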

This function uses the following helper function to convert the tag portion of each line of rg output into a list of tags:

(defun org-find-file--massage-tags (tag-string)
  "Return TAG-STRING as a list of tags, each prefixed with a colon.
For instance, the string: foo:bar -> '(\":foo\" \":bar\")"
  (let* ((tag-separators (rx (1+ (any space ":"))))
         (tag-list       (split-string tag-string tag-separators t)))
    (--map (concat ":" it) tag-list)))

Since this last function is purely functional, it is easy to test:

(ert-deftest org-find-file--massage-tags-test ()
  (should (equal (org-find-file--massage-tags "foo") '(":foo")))
  (should (equal (org-find-file--massage-tags "foo bar") '(":foo" ":bar")))
  (should (equal (org-find-file--massage-tags "foo:bar") '(":foo" ":bar")))
  (should (equal (org-find-file--massage-tags "  foo  ") '(":foo"))))

There we go. That is everything needed for a search function to list org files, allowing you to select by filename, words in its title, or even tags. If you find this idea interesting, grab the source code.