Finding Files based on Tags

I have a friend who has been collecting notes using a Wiki, and I’ve been wondering whether we could add any missing features to Org. As you know, Org’s built-in hyperlinks link one org file to another, letting you traverse your notes. The syntax resembles a Wiki’s: type [[ followed by the filename (or better yet, C-u C-c C-l to select the file). Add a back-link package, like org-roam, and you have a system as powerful as a Wiki for reading.

I saw on Mastodon a comment that a Wiki allows you to select a file (er, page) by first going to a tag, and then selecting a page from a list of all pages with that tag. Sounds quite useful. Let’s write that code, as I think if you are a judicious tagger, this could be more helpful than remembering the filename. To make this more focused, let’s limit this to a project (or a single directory tree). I’m also going to try to keep my code to standard (albeit modern) Emacs, and do my best to limit the need for outside projects.

Brief Review of Org Tags

Associate tags with a headline, as in:

* A headline about Something    :foo:bar:bird:

This has three tags, foo, bar, and bird.

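Like everything in Org, you can type those tags, or you can call the function org-set-tags-command (defaults to C-c C-q). Also, you can list tags for the whole file by adding something like this to your buffer (Org itself applies file-wide tags to every headline through #+FILETAGS:, but for finding files, a #+TAGS: line serves just as well):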

#+TAGS: foo bar bird

You’ll notice that on a headline, you surround tags with colons, but on the #+TAGS: line, you separate them with spaces. This becomes important later as we search for both types.

Searching Tags

Org surprised me with how easy it is to add tags, but how little searchability it provides. I mean, you can display the headlines that match a tag (using org-tags-view) and have the agenda limit its display of TODO items (with org-agenda-filter-by-tag), but not much else.

Wouldn’t it be nice to call find-file with a tag, and have the completion system list the files that match that tag? In the past, we would write a program to search all files, gather the tags, and write them into a TAGS file for Emacs to read, or at least extend GNU Global. Newer search programs, like ripgrep, are so fast that we can use them to search the files dynamically at run-time.

Tag Inheritance

A feature of Org tags to consider is that headlines inherit tags from parent headlines. For instance:

* Top-level Headline     :foo:
** Sub-level Headline    :bar:
*** Interesting Headline :baz:

The Interesting Headline has three tags: foo, bar, and baz. Because of this, ripgrep (like other line-oriented search tools) isn’t as effective at pinpointing a headline with a particular tag. Since my goal is to open a file based on a tag anywhere in the document, this won’t influence the design. The hierarchy will still work, although opening a file positions the cursor at the parent, not at some child.
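
To see that inheritance from Org’s side, here is a minimal sketch that parses the example in a temporary buffer. The helper name my/tags-at-last-headline is hypothetical; org-get-tags is Org’s own function, and the sketch assumes Org 9.2 or later with org-use-tag-inheritance left at its default.

```elisp
(require 'org)

(defun my/tags-at-last-headline (text &optional local)
  "Return the tags of the last headline in TEXT, an Org-formatted string.
With LOCAL non-nil, return only the tags written on that headline.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert text)
    (org-mode)
    (goto-char (point-max))
    ;; Move to the start of the last headline in the buffer.
    (org-back-to-heading t)
    ;; Without LOCAL, Org adds the tags of every parent headline.
    (org-get-tags nil local)))

(let ((text (concat "* Top-level Headline     :foo:\n"
                    "** Sub-level Headline    :bar:\n"
                    "*** Interesting Headline :baz:\n")))
  (list (my/tags-at-last-headline text t)   ; local tags only
        (my/tags-at-last-headline text)))   ; local plus inherited tags
```

A line-oriented search sees only the local tags, which is why searching for foo lands on the top-level headline rather than on the Interesting Headline.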

Regular Expression for Tags

While it wouldn’t take much to craft a regular expression to parse the tags from a headline, Org already supplies org-tag-line-re. But since it is an Emacs-oriented regexp (obviously), we need to convert it before passing it over to ripgrep. The pcre2el project can convert this, like:

(format "rg --no-heading '%s' %s"
        (rxt-elisp-to-pcre org-tag-line-re)
        project-dir)

The org-tag-line-re works for headline tags, but not for tags affecting all headlines in a file. This could work:

(rx (or
     (regexp org-tag-line-re)
     (seq line-start "#+tags:" (one-or-more space)
          (group (one-or-more (any alnum "@" "_" space))))))

Since both our regular expression and org-tag-line-re start with line-start (i.e. ^), the or clause (i.e. |) fails to match. In other words, we need to expand this combination to create our own:

(defvar org-find-files-tag-line-re
  (rx line-start
      (or
       (seq (one-or-more "*") " " (+? any) ":"
            (group (one-or-more (any alnum "@_#%:"))) ":")
       (seq "#+tags:" (one-or-more space)
            (group (one-or-more (any alnum "@_#%" space)))))
      line-end)
  "Regular expression that matches either headline or global file tags.")

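As a quick check of which group captures what, the following sketch runs the expression against a few sample lines. It repeats the expression under the hypothetical name my/tag-line-re so that the snippet runs on its own; my/tag-groups is likewise a made-up helper.

```elisp
(require 'rx)

;; Copy of the expression above under a hypothetical name.
(defvar my/tag-line-re
  (rx line-start
      (or
       (seq (one-or-more "*") " " (+? any) ":"
            (group (one-or-more (any alnum "@_#%:"))) ":")
       (seq "#+tags:" (one-or-more space)
            (group (one-or-more (any alnum "@_#%" space)))))
      line-end))

(defun my/tag-groups (line)
  "Return (HEADLINE-TAGS FILE-TAGS) captured from LINE, or nil without a match.
Hypothetical helper, for illustration only."
  (let ((case-fold-search t))     ; lets #+TAGS: match as well as #+tags:
    (when (string-match my/tag-line-re line)
      (list (match-string 1 line) (match-string 2 line)))))

(my/tag-groups "* A headline :foo:bar:")  ; first group holds "foo:bar"
(my/tag-groups "#+TAGS: foo bar")         ; second group holds "foo bar"
(my/tag-groups "Plain text: no tags")     ; no match at all
```

Only one of the two groups is ever set, so code that reads the tags has to check both.
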
Search for Tagged Lines

Let’s create an interactive function that displays the tagged headings of all files, allowing us to use grep-mode to select and jump to any file. This function runs ripgrep (rg), but does it through the Emacs grep function, which runs a command, displays the results, and creates links to them.

(require 'pcre2el)

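(defun org-grep-tags (project-dir)
  "Show `grep-mode' buffer of org files with tagged headlines.

Interactively, prompt for PROJECT-DIR, defaulting to the root of
the current project."
  (interactive (list (read-directory-name "Directory: "
                                          (if (project-current)
                                              (project-root (project-current))
                                            default-directory))))

  ;; --ignore-case lets the lowercase #+tags: in the regexp match #+TAGS:
  ;; too. Expanding and quoting the directory keeps `~' and spaces intact:
  (let ((command (format "rg --ignore-case --no-heading '%s' %s"
                         (rxt-elisp-to-pcre org-find-files-tag-line-re)
                         (shell-quote-argument (expand-file-name project-dir)))))
    (grep command)))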

Note that we could have used the following call to interactive:

(interactive "DDirectory: ")

But calling (list (read-directory-name …)) allows me to specify the default project (which is probably what I want since rg will search recursively).

Shell Command to List

In the rest of this essay, I call rg using shell-command-to-string, but treat the results as a list of strings, split on newline characters.

(defun shell-command-to-list (command)
  "Return call to COMMAND as a list of strings for each line."
  (thread-first command
                shell-command-to-string
                (string-lines t)))
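
As an aside, the shell can be skipped altogether. The sketch below is an alternative, not the approach used in this essay: it hands the arguments directly to the program with process-lines-ignore-status, so nothing needs quoting. It assumes Emacs 28 or later; the name my/program-lines is hypothetical, and the demonstration call assumes a Unix-like system with printf on the PATH.

```elisp
(defun my/program-lines (program &rest args)
  "Run PROGRAM with ARGS, bypassing the shell, and return its output lines.
A non-zero exit status does not signal an error, which matters because
ripgrep exits with status 1 when nothing matches.
Hypothetical helper, for illustration only."
  (apply #'process-lines-ignore-status program args))

;; Each argument reaches the program verbatim, so a directory with spaces
;; or a pattern containing quotes needs no escaping, e.g. (not run here):
;;   (my/program-lines "rg" "--no-heading" some-pattern some-directory)
(my/program-lines "printf" "one\\ntwo\\n")
```

The trade-off is that shell features such as pipes and globbing are no longer available in the command.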

What Tags do we have?

Let’s make a function that returns the available tags using ripgrep. This function stores every matching line from ripgrep in output, as a list of strings. We extract the tags from each line using seq-map, flatten the result, and send that to seq-uniq to give us a list of distinct tags in all files in a directory:

(defun org-tags-in-current (&optional dir)
  "Return a list of tags available in the directory tree DIR.
DIR defaults to the current project root, else `default-directory'."
  (unless dir
    (setq dir (if (project-current)
                  (project-root (project-current))
                default-directory)))
  ;; Expand and quote DIR so that `~' and spaces survive the shell:
  (let* ((command (format "rg --ignore-case --no-heading --no-line-number --no-filename '%s' %s"
                          (rxt-elisp-to-pcre org-find-files-tag-line-re)
                          (shell-quote-argument (expand-file-name dir))))
         (output  (shell-command-to-list command))
         (tags    (thread-last output
                               (seq-map #'org-tags-in-current--from-grep)
                               (flatten-list))))
    (seq-uniq tags #'string-equal)))

The function given to seq-map, org-tags-in-current--from-grep, should use the org-find-files-tag-line-re we created above, to grab either the first or second group for the tags. An initial approach would be:

(defun org-tags-in-current--from-grep (line)
  (when (string-match org-find-files-tag-line-re line)
    (or (match-string 1 line) (match-string 2 line))))

If a headline has two or more tags, e.g. :foo:bar:, this returns foo:bar, treating it as a single entry. The function we need should return a list of tags for each headline:

(defun org-tags-in-current--from-grep (line)
  "Return a list of tags from LINE that match `org-find-files-tag-line-re'."
  (let ((case-fold-search t))
    (when (string-match org-find-files-tag-line-re line)
      (let ((s1 (match-string 1 line))   ; headline tags, e.g. "foo:bar"
            (s2 (match-string 2 line)))  ; file tags, e.g. "foo bar"
        ;; The final `t' drops empty strings, e.g. from trailing spaces:
        (cond (s1 (split-string s1 ":" t))
              (s2 (split-string s2 (rx (1+ space)) t)))))))

Can I prove it to myself?

(ert-deftest org-tags-in-current--from-grep-test ()
  (should (equal (org-tags-in-current--from-grep "** Headline :foobar:") '("foobar")))
  (should (equal (org-tags-in-current--from-grep "** Headline :foo:bar:") '("foo" "bar")))
  (should (equal (org-tags-in-current--from-grep "#+tags: foobar") '("foobar")))
  (should (equal (org-tags-in-current--from-grep "#+tags: foo bar") '("foo" "bar"))))

For example, in my current project comprising all pages in my website, I have:

(org-tags-in-current)

Find File for Tags

The initial goal was to recreate a find-file that filters files that match a tag, like the project-find-file function.

(defun org-find-file-by-tags (file-tuple)
  "Load file from FILE-TUPLE like `find-file'.

If called interactively, first ask for a TAG and then limit the
files displayed based on if they have a headline that contains
that TAG."
  (interactive (list
                ;; Calling org-find-files-file-with-tag returns a hash table
                ;; where the key is a user-visible entry, and the value is a
                ;; list of the filename and the (first) line number:
                (let ((files (call-interactively 'org-find-files-file-with-tag)))
                  (gethash (completing-read "File: " files) files))))

  (seq-let (file line) file-tuple
    (find-file file)
    (when line
      ;; `goto-line' is meant for interactive use; from Lisp, move by lines:
      (goto-char (point-min))
      (forward-line (1- line)))))

The real worker in this function is org-find-files-file-with-tag that prompts for a tag, and then returns a hash table containing matching files. The key/value entries in this are:

  • What we show to the user (key). For instance: Technical/Emacs/org-find-files-tags.org :: Brief Review of Org Tags
  • What we need to load the file (value). This will be a list of both the full filename and the line number where the tag occurs.
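
A small sketch of that structure follows, using made-up entries. The variable name my/example-files and the /tmp/site paths are hypothetical; the point is that a hash table with string keys can be handed directly to completing-read as its collection, and the chosen key then retrieves the value.

```elisp
(defvar my/example-files (make-hash-table :test 'equal)
  "Hypothetical table mapping a display key onto (FILENAME LINE).")

(puthash "Technical/notes.org :: First Heading"
         '("/tmp/site/Technical/notes.org" 12)
         my/example-files)
(puthash "about.org :: Contact"
         '("/tmp/site/about.org" 3)
         my/example-files)

;; The completion machinery reads the keys of a hash table directly, so
;; (completing-read "File: " my/example-files) offers both keys.
;; `all-completions' shows the non-interactive equivalent:
(all-completions "Tech" my/example-files)

;; The selected key then leads to the filename and line number:
(gethash "Technical/notes.org :: First Heading" my/example-files)
```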

This supporting function calls our org-tags-in-current function we defined above to get all available tags, and then calls ripgrep to search files with that particular tag.

(defun org-find-files-file-with-tag (tag &optional dir)
  "Return hashtable of files in project with headlines containing TAG."
  (interactive (list (completing-read "Tag: " (org-tags-in-current) nil t)))
  (unless dir
    (setq dir (file-truename (if (project-current)
                                 (project-root (project-current))
                               default-directory))))

  (let* ((tags-re (rx (or (seq "#+tags:" (0+ any) space (literal tag) word-boundary)
                          (seq (1+ "*") space (1+ any) ":" (literal tag) ":"))))
         (command (format "rg --ignore-case --no-heading --line-number '%s' %s"
                          (rxt-elisp-to-pcre tags-re)
                          (shell-quote-argument (expand-file-name dir))))
         (output  (shell-command-to-list command))
         (reducer (org-find-files--add-file-with-tag dir))
         (results (make-hash-table :test 'equal)))
    ;; `seq-reduce' ships with Emacs; plain `reduce' needs the obsolete `cl' library:
    (seq-reduce reducer output results)
    results))
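
Before handing the expression to ripgrep, it can be checked inside Emacs. This sketch lifts the tags-re expression into a hypothetical function, my/tag-search-re, and tests sample lines with string-match-p; it assumes Emacs 27 or later for the rx literal form, and it says nothing about how the converted PCRE behaves in ripgrep.

```elisp
(require 'rx)

(defun my/tag-search-re (tag)
  "Return an Emacs regexp matching a line that carries TAG.
Hypothetical helper that mirrors the tags-re expression above."
  (rx (or (seq "#+tags:" (0+ any) space (literal tag) word-boundary)
          (seq (1+ "*") space (1+ any) ":" (literal tag) ":"))))

(defun my/line-has-tag-p (tag line)
  "Return t if LINE carries TAG as a complete tag, else nil."
  (let ((case-fold-search t))     ; same effect as rg's --ignore-case
    (and (string-match-p (my/tag-search-re tag) line) t)))

(my/line-has-tag-p "emacs" "** Heading :emacs:org:")  ; complete headline tag
(my/line-has-tag-p "org"   "#+TAGS: emacs org")       ; complete file tag
(my/line-has-tag-p "ema"   "** Heading :emacs:org:")  ; only part of a tag
```

The surrounding colons on a headline, and the space plus word boundary on a #+tags: line, are what keep a partial tag such as ema from matching.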

The call to ripgrep returns a list of files, the line number where the tag appears, as well as the headline. We need to convert that raw list to a hashtable with a user-visible key and filename/line number value. For that, we call good ol’ reduce, but the function for it takes two values, a hash-table we will use as an accumulator, and the entry. Since ripgrep returns the full filename, I’d like to trim the initial directory.

Instead of creating a function to give to reduce, we create a function that returns a function we can give to reduce that lexically stores our directory … a reducer function:

(defun org-find-files--add-file-with-tag (dir)
  "Return a reducer function with DIR available as lexical scope.
The function returned is a reducer, accepting a hashtable as an accumulator,
and an entry from ripgrep. Assumes the entry looks like:

/home/howard/website/Technical/Emacs/org-find-file-tags.org:58:** What Tags do we have? :tags:

And crafts a key for display to the user, e.g.

     Technical/Emacs/org-find-file-tags.org :: What Tags do we have?

And the value is a list with the full filename as well as the
linenumber. For instance:

    (\"/home/howard/website/Technical/Emacs/org-find-file-tags.org\" 58)"
  (lambda (acc-hash rg-file-entry)
    (let ((line-re (rx (group (one-or-more (not ":"))) ":"
                       (group (one-or-more digit))     ":"
                       (or "#+tags:"
                           (seq
                            (one-or-more "*") (one-or-more space)
                            (group (+? (not ":")))
                            (one-or-more space) ":")))))
      (when (string-match line-re rg-file-entry)
        ;; Extract the grouped parts of the entry into variables:
        (let* ((fullfile (match-string 1 rg-file-entry))
               (linestr  (match-string 2 rg-file-entry))
               (heading  (or (match-string 3 rg-file-entry) ""))
               (linenum  (string-to-number linestr))
               ;; Prepare to store the key/value in the hashtable:
               (key      (format "%s %s"
                                 (string-trim (string-remove-prefix dir fullfile))
                                 (if (string-blank-p heading)  ""
                                   (concat ":: " heading))))
               (value    (list fullfile linenum)))
          (puthash key value acc-hash)))
      acc-hash)))
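
A closure like this captures dir only under lexical binding, so the file holding it needs lexical-binding: t on its first line. As an alternative sketch that works under either binding mode, apply-partially can pin the directory as the first argument of an ordinary function. The names my/add-relative-path and my/index-paths, and the /tmp/site paths, are hypothetical.

```elisp
(require 'seq)
(require 'subr-x)   ; for `string-remove-prefix' on older Emacs versions

(defun my/add-relative-path (dir acc path)
  "Store PATH in hash table ACC under its name relative to DIR; return ACC."
  (puthash (string-remove-prefix dir path) path acc)
  acc)

(defun my/index-paths (dir paths)
  "Return a hash table mapping names relative to DIR onto the full PATHS."
  ;; `apply-partially' supplies DIR, leaving a function of (ACC PATH),
  ;; which is the argument order `seq-reduce' uses.
  (seq-reduce (apply-partially #'my/add-relative-path dir)
              paths
              (make-hash-table :test 'equal)))

(my/index-paths "/tmp/site/" '("/tmp/site/a.org" "/tmp/site/b/c.org"))
```

The closure version keeps everything in one function, while this one trades that for a helper that can be tested by itself.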

There we go … a few screenshots to see how calling org-find-files-file-with-tag first allows us to choose a tag:

org-find-file-tags-a.png

And when I select something, for instance, emacs, I’m given a list of files with that tag:

org-find-file-tags-b.png

Grab the resulting code from this essay at org-find-file-tags.el in my configuration files.