" nil t))))
(markdown-remove-newlines-in-paragraphs) ; once all the hyphenation is dealt with, remove the hard-newlines which are common in PDF copy-pastes. These hard newlines are a problem because they break many string matches, and they make `langcheck` highlight every line beginning/ending in red as an error.
)
(message "%s %s" begin end)
)
(flyspell-buffer)
(query-inflation-adjust)
(ispell) ; spellcheck
(message "Getting suggested links…")
(getLinkSuggestions "~/wiki/metadata/linkSuggestions.el")
(message "Checking grammar/language…")
(langtool-check)
(call-interactively #'langtool-correct-buffer) ; grammar
(message "Remember to collapse appendices, annotate links, add inflation-adjustments to all '$'/'₿'s, add margin notes, 'invert' images, and run `markdown-lint`")
nil
)))))
(defun pdf-fix-spaced-words (begin end)
"Query-collapse spaced-glyph-runs like “I n”, “W h e n”.
In a range of BEGIN and END, or the entire buffer if not.
Keys during prompt: y = replace, n = skip, ! = replace rest, q = quit."
(interactive
(if (use-region-p)
(list (region-beginning) (region-end))
(list (point-min) (point-max))))
(let* ((re "\\b[[:alpha:]]\\(?:[ \u00A0][[:alpha:]]\\)\\{1,40\\}\\b")
(end-marker (copy-marker end t))
(replace-all nil))
(goto-char begin)
(catch 'done
(while (re-search-forward re end-marker t)
(let* ((orig (match-string 0))
(fixed (replace-regexp-in-string "[ \u00A0]" "" orig)))
(unless replace-all
(let ((c (read-char-choice
(format "Replace “%s” → “%s”? (y, n, !, q) " orig fixed)
'(?y ?n ?! ?q))))
(cl-case c
(?y (replace-match fixed t t))
(?n nil)
(?! (setq replace-all t)
(replace-match fixed t t))
(?q (throw 'done nil)))))
(when replace-all
(replace-match fixed t t)))))))
(defun clean-pdf-text (&optional start end)
"Clean PDF-ish text in buffer/region using `/static/build/clean-pdf.py`.
If START and END are provided or a region is active, only region processed.
Otherwise, the entire buffer will be processed.
This function processes the text paragraph by paragraph, where paragraphs
are defined as text blocks separated by triple newlines (\\n\\n\\n). Each
paragraph that contains lines ending with '-' is sent to the external
`clean-pdf.py` script, which uses AI to correct common PDF extraction issues:
- Removing spurious hyphens at line breaks
- Joining words split across lines
- Fixing ligature and character encoding problems
The cleaned text replaces the original text in the buffer. All changes are
grouped as a single, atomic operation for undo purposes.
Note: This function assumes that `clean-pdf.py` is in Emacs's executable path."
(interactive)
(unless (executable-find "clean-pdf.py")
(error "Error: Python `clean-pdf.py` script not found in path"))
(let* ((region-is-active (region-active-p)) ; simplified this check
(process-start (cond (start start)
(region-is-active (region-beginning))
(t (point-min))))
(process-end (cond (end end)
(region-is-active (region-end))
(t (point-max)))))
(atomic-change-group
(save-excursion
(let ((case-fold-search nil)) ; Make search case-sensitive
(goto-char process-start)
(while (and (< (point) process-end)
(not (eobp)))
(let* ((current-point (point))
(found-boundary (save-excursion
(search-forward "\n\n\n" process-end t)))
(paragraph-end (if found-boundary
(match-beginning 0)
(min process-end (point-max))))
(paragraph (when (and (< current-point paragraph-end)
(<= paragraph-end (point-max)))
(buffer-substring-no-properties current-point paragraph-end)))
(needs-cleaning (and paragraph
(not (string-empty-p paragraph))
(string-match-p "-\n" paragraph)
(not (string-match-p "—\n" paragraph))
(not (string-match-p "---\n" paragraph)))))
(when needs-cleaning
(let ((cleaned-paragraph
(with-temp-buffer
(insert paragraph)
(condition-case err
(progn
(call-process-region (point-min) (point-max)
"clean-pdf.py"
t t nil)
(buffer-string))
(error
(message "Error in clean-pdf.py: %s" (error-message-string err))
paragraph)))))
(unless (string= paragraph cleaned-paragraph)
(when (and (<= current-point (point-max))
(<= paragraph-end (point-max)))
(delete-region current-point paragraph-end)
(insert cleaned-paragraph)))))
(if found-boundary
(goto-char (+ (match-end 0) 1))
(goto-char (min (1+ paragraph-end) (point-max))))))))
(message "PDF text cleaning completed for %s"
(if region-is-active "selected region" "entire buffer")))))
(defun query-inflation-adjust ()
"Interactively inflation-adjust all dollar amounts in the buffer.
This replaces all dollar amounts in the buffer with a Pandoc Markdown
Span element in Gwern.net inflation-adjuster format.
The function checks first if there are any dollar amounts in the buffer.
If there are, it prompts the user for a year, defaulting to the current year,
and then replaces all dollar amounts with the dollar amount wrapped in a
Pandoc Markdown Span element tagged with the year, following the format
`[$Y]($YEAR)`.
For example, `$20` would be replaced with `[$20]($2023)` if the year 2023
was input. If there are no dollar amounts in the buffer, the function exits
without prompting the user.
The function handles negative dollar amounts, and dollar amounts followed
immediately by punctuation. It does not handle dollar amounts in
scientific notation or with currency symbols, or dollar amounts with unusual
formatting following some European conventions.
The function checks if the year input by the user is a numeric string.
If it is not, the function outputs an error message and does not perform
the replacement.
The user can exit the function by appending `q` to the year input (e.g., `2024q`).
This is more convenient than accepting & quitting."
(interactive)
(let ((from "\\([-−]?\\$\\([0-9,.]+\\)\\([kmbt]?\\)\\)\\>"))
(if (not (string-match-p from (buffer-string)))
(message "query-inflation-adjust: No dollar amounts to adjust; exiting.")
(let* ((year-input (read-string "Year ('q' skips): " (format-time-string "%Y")))
(year (replace-regexp-in-string "q$" "" year-input)))
(cond
((string-match-p "q$" year-input)
(message "query-inflation-adjust: User chose to skip."))
((string-match-p "\\`[0-9]+\\'" year)
(let ((to (concat "[$\\2]($" year ")\\3")))
(query-replace-regexp from to nil (point-min) (point-max))
(message "Inflation adjustment completed for year %s." year)))
(t (message "Invalid year input. Please enter a valid year or append 'q' to skip.")))))))
;; (defun number-with-commas (num)
;; "Format a number `NUM` with commas."
;; (let ((num-str (number-to-string num))
;; (pos 0)
;; (result ""))
;; (dotimes (i (length num-str))
;; (when (and (> pos 0) (= 0 (mod pos 3)))
;; (setq result (concat "," result)))
;; (setq pos (1+ pos)
;; result (concat (substring num-str (- (length num-str) pos) (- (length num-str) (1- pos))) result)))
;; result))
(defun comma-format-number (num-str)
"Insert commas into a number string `NUM-STR`."
(let ((num (string-to-number num-str)))
(number-with-commas num)))
(defun number-with-commas (num)
"Format a number `NUM` with commas."
(let ((num-str (reverse (number-to-string num))) ; Reverse the string to process from the least significant digit
(result ""))
(dotimes (i (length num-str))
(when (and (> i 0) (= 0 (mod i 3)))
(setq result (concat result ","))) ; Insert comma before every group of 3 digits
(setq result (concat result (char-to-string (elt num-str i))))) ; Append current digit
(reverse result))) ; Reverse the result to correct the order
; implement a URL-skipping `query-replace-regexp`, named `my-markdown-or-html-query-replace-regexp`. Many rewrites need to skip URLs but there's no good way to do it hitherto. This function also takes arbitrary functions to do the rewrite, which can be useful too for more complex rewrites.
; GPT-4:
(require 'pulse)
(defcustom my-markdown-or-html-query-replace-skip-urls t
"Non-nil means `my-markdown-or-html-query-replace-regexp` should skip link URLs."
:type 'boolean
:group 'my-markdown-or-html)
(defun my-markdown-or-html-inside-link-p ()
"Return non-nil if point is inside a Markdown or HTML link."
(let ((faces (list (get-text-property (point) 'face)
; The point might be at the beginning or end of the URL, which might not have the expected face property.
; To fix this issue, we check for the face property not only at the point but also at the previous and next characters.
; This way, we ensure the function returns non-nil if the point is at the beginning or end of a URL.
(get-text-property (1- (point)) 'face)
(get-text-property (1+ (point)) 'face))))
(cl-some (lambda (face) (or (eq face 'markdown-url-face)
(eq face 'html-attr-value-face)))
faces)))
(defun my-markdown-or-html-query-replace-args ()
"Read the arguments for `my-markdown-or-html-query-replace-regexp`."
(query-replace-read-args (concat "Query replace"
(if current-prefix-arg " word" "")
" regexp"
(if (and transient-mark-mode mark-active) " in region" ""))
t))
(defun my-markdown-or-html-query-replace-confirm (from to)
"Prompt the user to confirm replacing `FROM` with `TO`.
Highlight the matched text during the query. Accept `y` for yes, `n` for no, and `q` to quit."
(pulse-momentary-highlight-region (match-beginning 0) (match-end 0))
(let ((response nil))
(while (not (member response '(?y ?n ?q)))
(setq response (read-char-choice (format "Replace `%s' with `%s'? (y/n/q) " from to) '(?y ?n ?q))))
(cond ((eq response ?y) t)
((eq response ?n) nil)
((eq response ?q) 'quit)
)
)
)
(defun my-markdown-or-html-query-replace-regexp (regexp replace-fn &optional start end)
"Interactively query replace `REGEXP` with the result of `REPLACE-FN`.
This version continues searching after a `q` quit signal but skips replacements.
This function skips over Markdown or HTML link URLs when performing the
replacements if `my-markdown-or-html-query-replace-skip-urls` is non-nil. It operates in the
region between `START` and `END` if both are provided; otherwise, it operates on
the entire buffer.
`REPLACE-FN` should be a function that accepts a string (the matched text) and
returns the replacement string.
WARNING: This function handles user input to `quit` (via `q`) not by stopping the search entirely,
but by continuing to search through the remainder of the document without performing
any further replacements. This hack works around a problem where quitting a replacement operation
would unexpectedly kill the calling command (`fmt`).
By setting a `skip-replacement` flag upon a `quit` signal, somehow it works?
\(fn REGEXP REPLACE-FN &optional START END)"
(interactive)
(let ((replacements 0)
(skip-replacement nil)) ; New flag to control skipping of replacements
(save-excursion
(goto-char (or start (point-min)))
(save-match-data
(while (re-search-forward regexp end t)
(unless (or skip-replacement (my-markdown-or-html-inside-link-p))
(let* ((from (match-string 0))
(to (funcall replace-fn from))
(confirmation (my-markdown-or-html-query-replace-confirm from to)))
(cond ((eq confirmation t)
(replace-match to t t)
(setq replacements (1+ replacements)))
((eq confirmation 'quit)
(setq skip-replacement t))))))) ; Continue searching but skip replacements
(message "Query replace finished, %d replacements made" replacements)))
)
; GPT-4
(require 'rx)
(defgroup markdown-newline-removal nil
"Options for removing newlines within paragraphs in Markdown text."
:group 'markdown)
(defcustom markdown-excluded-chars (rx (any ?- ?\n ?\d ?# ?* ?> ?. ?? ?|))
"Characters to exclude when removing newlines within paragraphs in Markdown text."
:type 'regexp
:group 'markdown-newline-removal)
(defun markdown-remove-newlines-in-paragraphs (&optional buffer use-region)
"Replace newlines with spaces within paragraphs of Markdown text in BUFFER.
It then adds a newline between each sentence.
If BUFFER is nil, use the current buffer. If USE-REGION is non-nil,
operate on the current region instead of the entire buffer.
This assumes you have already removed hyphenation (either by removing the
hyphen+newline for line-breaking-only hyphens, or just the newline when the
hyphen is in the original word regardless of line-breaking)."
(interactive "P")
(with-current-buffer (or buffer (current-buffer))
;; Save the current point position and restore it after the operation
(save-excursion
;; Determine the search range based on the use-region argument
(let ((start (if use-region (region-beginning) (point-min)))
(end (if use-region (region-end) (point-max))))
;; Move the point to the start of the search range
(goto-char start)
;; Define the regex pattern using the excluded characters custom variable
(let* ((excluded-chars (replace-regexp-in-string "^\\[\\|\\]$" "" markdown-excluded-chars))
(pattern (concat "\\([^" excluded-chars "]\\)\\(\n\\)\\([^" excluded-chars "]\\)")))
;; Search and replace newlines within paragraphs
(save-match-data
(while (re-search-forward pattern end t)
(replace-match "\\1 \\3" nil nil)))))
;; Insert a newline at the end of each sentence
(save-excursion
(goto-char (point-min))
(while (re-search-forward "\\([.!?]\\)\\([[:space:]]\\)" nil t)
(replace-match "\\1\n"))))
;; Inform the user when the operation is complete
(message "Newlines removed within paragraphs.")))
(defvar markdown-rewrites '())
(defun buffer-contains-substring (string)
(save-excursion
(save-match-data
(goto-char (point-min))
(search-forward string nil t))))
(defun getLinkSuggestions (rewriteFile)
"Query the user for a long list of possible search-and-replaces in a buffer.
The list is defined in a file argument REWRITEFILE and generated by an external utility parsing the gwern.net sources.
This works by: read a Lisp file with the `rewrites` variable; `rewrites` is a list of string pairs.
Each string is a possible anchor text like \"Barack Obama\" and its corresponding URL, \"https://en.wikipedia.org/wiki/Barack_Obama\".
Looping over this list of pairs, for each pair, do a `query-replace` query on the buffer,
converting it into a Markdown lint like `[Barack Obama](https://en.wikipedia.org/wiki/Barack_Obama)`.
This external file can be written by hand, but it is intended to be automatically generated by a Haskell tool
which parses the full Gwern.net HTML+Markdown sources for all possible links,
and turns the full set of links into a list of anchor text/link pairs, which it dumps in Elisp format into the file.
This tool is run automatically by a cron job.
So any link on Gwern.net will automatically become a search-and-replace query,
and it will be updated based on any manually-added links."
(progn
(load-file rewriteFile)
(let ((begin (if (region-active-p) (region-beginning) (point-min)))
(end (if (region-active-p) (region-end) (point-max)))
)
(save-excursion
(goto-char (point-min))
(dolist (pair markdown-rewrites)
(let ((original (first pair))
(replacement (second pair))
)
; skip if already done
(if (not (buffer-contains-substring replacement))
(let ((case-fold-search t) (search-upper-case nil) (case-replace nil))
(query-replace-once original (concat "[" original "](" replacement ")") t begin end)
))))))))
(defun markdown-annotation-compile ()
"Turn a Markdown buffer into a HTML5 snippet without newlines and with escaped quotes,
suitable for using as a GTX string inside annotated gwern.net links (see `full.gtx`)."
(interactive)
(call-interactively #'fmt)
(save-window-excursion
(defvar $pos 1)
(message "Preprocessing and compiling into HTML…")
; Pandoc converts the Markdown to HTML. Then the HTML goes through `preprocess-markdown` which runs additional typographic/formatting rewrites, runs LinkAuto to automatically linkify text, and then runs through GenerateSimilar to provide a list of relevant annotations to curate as the 'see-also' section at the bottom of annotations (if they are approved).
; NOTE: because `preprocess-markdown` is calling the OA API via the embedder, $OPENAI_API_KEY must be defined in the Emacs environment, either via `(setenv "OPENAI_API_KEY" "sk-xyz123456789")` or by putting it in `~/.bash_profile`. (Putting it in `.env` or `.bashrc` is not enough, because they won't apply to GUI/X Emacs)
(let ((markdown-command "preprocess-annotation.sh")) ; (visible-bell nil)
(markdown-kill-ring-save)
(setq $pos (point-max))
(goto-char $pos)
(insert "\n---\n")
(yank)
(goto-char $pos)
; (replace-all "\n" " ")
(let ( ; (begin (if (region-active-p) (region-beginning) (+ $pos 1)))
; (end (if (region-active-p) (region-end) (point-max)))
)
(replace-all "
" "
")
(replace-all "
" "")
; (replace-all "
" "
\n
")
; (replace-all "
" "
\n
")
(replace-all " id=\"cb1\">" "") ; the Pandoc syntax-highlighting IDs cause ID clashes when substituted into pages, so delete all
(replace-all " id=\"cb2\">" "")
(replace-all " id=\"cb3\">" "")
(replace-all " id=\"cb4\">" "")
(replace-all "
![]()
![]()
" "\" />")
(replace-all "’’" "’")
(replace-all "’s" "’s")
(replace-all "%3Csup%3Est%3C/sup%3E" "th")
(replace-all "%3Csup%3End%3C/sup%3E" "nd")
(replace-all "%3Csup%3Erd%3C/sup%3E" "rd")
(replace-all "" "")
; unnecessary in annotations for WP links because they will be regenerated by the single-source-of-truth:
(replace-all " class=\"id-not link-live\"" "")
(replace-all "class=\"id-not link-live\"" "")
(delete-trailing-whitespace)
(forward-line)
(html-mode)
(my-frame-urgent-hint-set) ; XMonad is set to use XMonad.Hooks.UrgencyHook.withUrgencyHook’s FocusHook to yank focus to X11 frames with urgent hint set
(ding)
(message "Done.")
)
)
)
)
(add-hook 'markdown-mode-hook
(lambda ()
(define-key markdown-mode-map "\C-c\ w" 'markdown-annotation-compile)))
(defvar html-mode-map) ; suppress reference-to-free-variable byte-compile warning
(add-hook 'html-mode-hook
(lambda ()
(define-key html-mode-map "\C-c\ w" 'markdown-annotation-compile)))
; for the `foo` buffer I do most of my annotation work in, on the first copy-paste of a block of text, detect if it has any paragraph breaks (ie. double newlines), and if it does not, then automatically run paragraphizer.py on it to try to break it up into logical paragraphs.
(defun markdown-paragraphize ()
"Automatically paragraphize single-paragraph abstracts.
Intended for Markdown mode with double-newlines for newlines;
may malfunction if run on other formats like HTML
\(where `
` pairs can come in many forms, not to mention other block elements like blockquotes\)."
(interactive)
(delete-trailing-whitespace)
(let ((double-newline-found nil))
(save-excursion
(goto-char (point-min))
(unless (search-forward-regexp "\n\n" nil t)
(message "Paragraphizing abstract…")
(let* ((paragraphizer-path (executable-find "paragraphizer.py"))
(original-text (buffer-substring-no-properties (point-min) (point-max)))
(original-length (length original-text)))
(if paragraphizer-path
(progn
; NOTE: we do *not* want to capture stderr output
(call-process-region (point-min) (point-max) paragraphizer-path t (list t nil) nil)
(let* ((new-length (- (point-max) (point-min)))
(has-double-newline (save-excursion
(goto-char (point-min))
(search-forward-regexp "\n\n" nil t))))
(if (and (>= new-length original-length)
has-double-newline)
(progn
(setq double-newline-found t)
(message "Paragraphizing abstract done. New text is valid (%d chars, has paragraphs)." new-length))
(progn
(delete-region (point-min) (point-max))
(insert original-text)
(message "Paragraphizing skipped: new text invalid [char-length change: %d → %d, has \\n\\n: %s]."
original-length new-length (if has-double-newline "True" "False"))))))
(error "Error: Python `paragraphizer.py` script not found in path")))))
(when double-newline-found
(goto-char (point-max)))))
(defun markdown-paragraphize-hook ()
"Hook function for `markdown-paragraphize`."
(when (and (equal (buffer-name) "foo")
(derived-mode-p 'markdown-mode)
(eq this-command 'yank)
(>= (buffer-size) 500)) ; ensure that there is enough in the buffer to plausibly be a full copy-pasted abstract, as opposed to a random snippet or line.
(markdown-paragraphize)))
(add-hook 'post-command-hook #'markdown-paragraphize-hook)
; https://emacs.stackexchange.com/a/56037
(defun my-frame-urgent-hint-set--for-x11 (frame arg &optional window-id)
"Set the x11-urgency hint for the FRAME to ARG (on WINDOW-ID) :
- If ARG is nil, unset the urgency.
- If ARG is any other value, set the urgency.
If you unset the urgency, you still have to visit the frame to reset it."
(let* ((wm-prop "WM_HINTS") ;; Constants.
(wm-flag-urgent #x100)
(wm-hints (append (x-window-property wm-prop frame wm-prop window-id nil t) nil))
(flags (car wm-hints)))
(setcar wm-hints
(if arg
(logior flags wm-flag-urgent)
(logand flags (lognot wm-flag-urgent))))
(x-change-window-property wm-prop wm-hints frame wm-prop 32 t)))
(defun my-frame-urgent-hint-set (&optional arg)
"Mark the current Emacs frame as requiring urgent attention.
With prefix argument ARG which is not boolean value nil, remove urgency
\(which might or might not change display, depending on the window manager\)."
(interactive "P")
(let*
(
(frame (selected-frame))
(win-system (window-system frame)))
(cond
((eq win-system 'x)
(my-frame-urgent-hint-set--for-x11 frame (not arg)))
;; only Linux X11 is supported:
(t
(message "Urgent hint for window system %S unsupported" win-system)))))
; add new-line / paragraph snippet
(add-hook 'html-mode-hook
(lambda ()
(define-key html-mode-map (kbd "") (lambda () (interactive)
(if (= ?\s (preceding-char)) (delete-char -1))
(insert "
\n
")
(if (= ?\s (following-char)) (delete-char 1)))
)
))
(add-hook 'markdown-mode-hook 'visual-fill-column-mode)
;; Markup editing shortcuts for HTML/Markdown/GTX annotation editing.
;; Functions to easily add italics, bold, Wikipedia links, smallcaps, & margin-note syntax.
(defun surround-region-or-word (start-tag end-tag)
"Surround region (or next word) with START-TAG and END-TAG.
If the previous command was this command, remove the end tag before point,
skip any leading whitespace, move forward one word (within the same line),
and insert only the end tag.
This allows repeating the keybinding to incrementally mark up a region
by bubbling the end-tag through the line."
(interactive)
(if (eq last-command 'surround-region-or-word)
;; Repeated invocation: extend the existing formatted region.
(let ((end-tag-len (length end-tag)))
;; Verify the text immediately before point is the end tag.
(if (and (>= (point) end-tag-len)
(string= (buffer-substring-no-properties (- (point) end-tag-len) (point))
end-tag))
(delete-region (- (point) end-tag-len) (point))
(error "Expected end tag not found"))
;; Skip any whitespace before moving to the next word, for better compatibility with word-level movement like `C-f`:
(skip-chars-forward " \t")
;; Move forward one word, but avoid crossing a newline.
(unless (or (eolp) (looking-at "\n"))
(forward-word))
(insert end-tag))
;; First invocation: wrap region (or next word) with start and end tags.
(let* ((beg (if (use-region-p)
(region-beginning)
(progn
(skip-chars-forward " \t")
(point))))
(end (if (use-region-p)
(region-end)
(progn (forward-word) (point)))))
(goto-char end)
(insert end-tag)
(goto-char beg)
(insert start-tag)
;; Position point after the inserted tags.
(goto-char (+ end (length start-tag) (length end-tag)))
(deactivate-mark)))
(setq this-command 'surround-region-or-word))
;; the wrappers:
(defun html-insert-emphasis ()
"Surround selected region (or word) with HTML tags for italics/emphasis (also Markdown, which supports `*FOO*`)."
(interactive)
(surround-region-or-word "" ""))
(defun markdown-insert-emphasis ()
"Surround selected region (or word) with Markdown asterisks for italics/emphasis.
Equivalent to `FOO` in HTML.
Gwern.net uses `*` for emphasis, and generally reserves `_` for italics such as book titles
(in keeping with Internet conventions predating Gruber's Markdown mistake of conflating `*`/`_`)."
(interactive)
(surround-region-or-word "*" "*"))
(defun html-insert-strong ()
"Surround selected region (or word) with bold tags (HTML, equivalent to `**` in Markdown).
Used in abstracts for topics, first-level list emphasis, etc."
(interactive)
(surround-region-or-word "" ""))
(defun markdown-insert-strong ()
"Surround selected region (or word) with `**` bold tags (Markdown).
Equivalent to `FOO` in HTML.
Used in abstracts for topics, first-level list emphasis, etc."
(interactive)
(surround-region-or-word "**" "**"))
(defun html-insert-smallcaps ()
"Surround selected region (or word) with smallcaps syntax.
Built-in CSS class in HTML & Pandoc Markdown, span syntax is equivalent to
`[FOO]{.smallcaps}`.
Smallcaps are used on Gwern.net for second-level emphasis after bold has been used."
(interactive)
(surround-region-or-word "" ""))
(defun markdown-insert-smallcaps ()
"Surround selected region (or word) with smallcaps syntax (Pandoc Markdown).
Built-in CSS class in HTML & Pandoc Markdown, equivalent to
`FOO`.
Smallcaps are used on Gwern.net for second-level emphasis after bold has been used."
(interactive)
(surround-region-or-word "[" "]{.smallcaps}"))
(defun html-insert-wp-link ()
"Surround selected region (or word) with custom Wikipedia link syntax in HTML.
Compiled by Interwiki.hs to the equivalent (usually) of `FOO`."
(interactive)
(surround-region-or-word "" ""))
(defun markdown-insert-wp-link ()
"Surround selected region (or word) with custom Wikipedia link syntax in Markdown."
(interactive)
(surround-region-or-word "[" "](!W)"))
(defun markdown-insert-margin-note ()
"Surround selected region FOO BAR (or word FOO) with a `margin-note`.
\(Implemented as a special `` class.\)
This creates marginal glosses (in the left margin) as counterparts to sidenotes.
These margin-notes are used as very abbreviated italicized summaries of the
paragraph \(like very small inlined section headers\)."
(interactive)
(surround-region-or-word "[" "]{.marginnote}"))
(defun html-insert-margin-note ()
"Surround selected region FOO BAR (or word FOO) with a `margin-note`.
\(Implemented as a special `` HTML class.\)
This creates marginal glosses (in the left margin) as counterparts to sidenotes.
These margin-notes are used as very abbreviated italicized summaries of the
paragraph \(like very small inlined section headers\).
When inserting margin-notes into HTML snippets, that usually means an annotation
and the margin-note is an editorial insertion, which are denoted by paired `[]` brackets.
To save typing effort, we add those as well if not present."
(interactive)
(let ((content (if (use-region-p)
(buffer-substring-no-properties (region-beginning) (region-end))
(thing-at-point 'word t))))
(if (and (string-prefix-p "[" content) (string-suffix-p "]" content))
(surround-region-or-word "" "")
(surround-region-or-word "[" "]"))))
(defun markdown-insert-editorial-note ()
"Surround selected region (or word) with editorial syntax.
Used on Gwern.net to denote ‘editorial’ insertions like commentary
or annotations. Markdown version"
(interactive)
(surround-region-or-word "[" "]{.editorial}"))
(defun html-insert-editorial-note ()
"Surround selected region FOO BAR (or word FOO) with `editorial note`.
\(Implemented as a special `` HTML class.\)
This is for editorial insertions like commentary.
When inserting editorial notes into HTML snippets,
that usually means an annotation and
the editorial-note is an editorial insertion,
which are denoted by paired `[]` brackets.
To save typing effort, we add those as well if not present.
See also margin-notes (‘html-insert-margin-note’, ‘markdown-insert-margin-note’)."
(interactive)
(let ((content (if (use-region-p)
(buffer-substring-no-properties (region-beginning) (region-end))
(thing-at-point 'word t))))
(if (and (string-prefix-p "[" content) (string-suffix-p "]" content))
(surround-region-or-word "" "")
(surround-region-or-word "[" "]"))))
;; keybindings:
;;; Markdown:
(add-hook 'markdown-mode-hook (lambda()(define-key markdown-mode-map "\C-c\ \C-e" 'markdown-insert-emphasis)))
(add-hook 'markdown-mode-hook (lambda()(define-key markdown-mode-map "\C-c\ \C-s" 'markdown-insert-strong)))
(add-hook 'markdown-mode-hook (lambda()(define-key markdown-mode-map "\C-c\ s" 'markdown-insert-smallcaps)))
(add-hook 'markdown-mode-hook (lambda()(define-key markdown-mode-map "\C-c\ \C-w" 'markdown-insert-wp-link)))
(add-hook 'markdown-mode-hook (lambda()(define-key markdown-mode-map "\C-c\ \C-m" 'markdown-insert-margin-note)))
(add-hook 'markdown-mode-hook (lambda()(define-key markdown-mode-map "\C-c\ e" 'markdown-insert-editorial-note)))
;;; HTML:
(add-hook 'html-mode-hook (lambda()(define-key html-mode-map "\C-c\ \C-e" 'html-insert-emphasis)))
(add-hook 'html-mode-hook (lambda()(define-key html-mode-map "\C-c\ \C-s" 'html-insert-strong)))
(add-hook 'html-mode-hook (lambda()(define-key html-mode-map "\C-c\ s" 'html-insert-smallcaps)))
(add-hook 'html-mode-hook (lambda()(define-key html-mode-map "\C-c\ \C-w" 'html-insert-wp-link)))
(add-hook 'html-mode-hook (lambda()(define-key html-mode-map "\C-c\ \C-m" 'html-insert-margin-note)))
(add-hook 'html-mode-hook (lambda()(define-key html-mode-map "\C-c\ e" 'html-insert-editorial-note)))
;sp
; (add-hook 'markdown-mode-hook 'flyspell)
;for toggling visibility of sections - makes big pages easier to work with
(add-hook 'markdown-mode-hook 'outline-minor-mode)
;In Markdown files, there are few excuses for unbalanced delimiters, and unbalance almost always indicates a link syntax error; in cases where quoted text must contain unbalanced delimiters (eg diffs, or neural-net-generated text or redirects fixing typos), a matching delimiter can be added in a comment like '' to make it add up.
(defun balance-parens () (when buffer-file-name
(add-hook 'after-save-hook
'check-parens
nil t)))
(add-hook 'markdown-mode-hook 'balance-parens)
(add-hook 'ledger-mode-hook 'balance-parens)
(add-hook 'emacs-lisp-mode-hook 'balance-parens)
(add-hook 'haskell-mode-hook 'balance-parens)
(add-hook 'css-mode-hook 'balance-parens)
(add-hook 'javascript-mode-hook 'balance-parens)
(add-hook 'html-mode-hook 'balance-parens)
(add-hook 'python-mode-hook 'balance-parens)
; Insert the secondary X clipboard at point (handles Unicode correctly); works better than `xclip -o`.
; Trims whitespace (spurious whitespace is often added by X GUI programs like Firefox eg. double-clicking HTML headers or page titles, requiring tedious manual deletion).
(global-set-key "\M-`" #'(lambda () (interactive)
(insert-for-yank
(gui-get-selection 'PRIMARY 'UTF8_STRING))))
; Trim spurious whitespace from other X GUI copy-pastes as well.
; (We do not attempt to hook `yank` and run this on *all* copy-paste-like behavior, because deleting whitespace could seriously interfere with document or programming modes.)
(defun my-trim-gui-selection (orig-fun &rest args)
"Trim whitespace from text selected via GUI before yanking into Emacs.
Runs ORIG-FUN on ARGS to create the selected text (ie. original `gui-get-selection` + args)."
(let ((selection (apply orig-fun args)))
(if (stringp selection)
(string-trim selection)
selection)))
(advice-add 'gui-get-selection :around #'my-trim-gui-selection)
; ispell: ignore code blocks in Pandoc Markdown
; TODO: add a fix for '#' not being handled in URLs. current hack borrowed from (also a good example of maintainers being lazy)
(add-to-list 'ispell-skip-region-alist
'("^~~~" . "^~~~"))
(add-to-list 'ispell-skip-region-alist '("#[a-zA-Z]+" forward-word))
;flycheck mode: meant for code errors, but useful for prose linting & Markdown syntax checking as well
;Supported languages:
;Currently used CLI: 'proselint'/'mdl' for text Markdown writing; 'hlint'/'ghc' for Haskell; 'tidy' for HTML, 'flake8' (`apt-get install python-flake8`) for Python; 'jshint' (`npm install --prefix ~/src/ jshint`, configured in ~/.jshintrc) for JavaScript
; NOTE: aside from the 'flycheck' package (from MELPA), you also need a CLI tool to do the actual checking, either (`mdl`, Ruby) or + (`markdown-lint-cli`, a Node.js clone/fork)
; (load "~/src/flycheck/flycheck.el") ; the MELPA package is out of date and does not have either Markdown or proselint support as of 18 Nov 2019, so we have to load from the Github repo
(require 'flycheck)
(defun my/flycheck-disable-for-gtx-files ()
"Disable flycheck for files ending with '.gtx'."
(let ((filename (buffer-file-name)))
(when (and filename (string-match "\\.gtx\\'" filename))
(flycheck-mode -1))))
(add-hook 'after-init-hook #'global-flycheck-mode)
(add-hook 'flycheck-before-syntax-check-hook #'my/flycheck-disable-for-gtx-files)
; syntax checkers must be whitelisted/enabled individually, so turn on proselint & mdl
(add-to-list 'flycheck-checkers 'proselint) ; configured in ~/.proselintrc
(add-to-list 'flycheck-checkers 'markdown-mdl) ; configured in ~/.mdlrc ; list & explanation of rules:
; 'langtool': a Java tool for grammar checking:
(when (require 'langtool nil 'noerror)
(setq langtool-language-tool-jar "~/bin/bin/LanguageTool-4.7/languagetool-commandline.jar")
(setq langtool-default-language "en-US")
; ; can look up in there or run a command like
; $ java -jar languagetool-commandline.jar -b --line-by-line --language en-US --disable "EN_QUOTES,MULTIPLICATION_SIGN,WORD_CONTAINS_UNDERSCORE,ET_AL,SENTENCE_WHITESPACE,DASH_RULE,MORFOLOGIK_RULE_EN_US,EN_UNPAIRED_BRACKETS,WHITESPACE_RULE" ~/wiki/research-criticism.md
; to disable specific rules
(setq langtool-user-arguments '("-b" "--line-by-line" "--disable" "EN_QUOTES,MULTIPLICATION_SIGN,WORD_CONTAINS_UNDERSCORE,ET_AL,SENTENCE_WHITESPACE,DASH_RULE,MORFOLOGIK_RULE_EN_US,EN_UNPAIRED_BRACKETS,WHITESPACE_RULE,UNIT_SPACE,TR"))
)
; mismatched quotes are no good either
;
(add-hook 'markdown-mode-hook (lambda () (modify-syntax-entry ?\" "$" markdown-mode-syntax-table)))
; We visually highlight '\[' in Markdown files to emphasize that they are part of editorial insertions (like '[sic]') and *not* the ubiquitous Markdown link syntax. Confusing them can cause problems.
(global-prettify-symbols-mode 1)
(add-hook 'markdown-mode-hook (lambda ()
(mapc (lambda (pair) (push pair prettify-symbols-alist))
'(
("\\[" . ?〖)
("\\]" . ?〗)
("–" . ?ˉ) ; highlight en-dashes because they look so much like regular hyphens
("—" . ?⸺) ; em-dashes
("−" . ?━) ; MINUS SIGN
))))
;; for line-by-line incrementing IDs; useful for popup links.
;; (defun add-html-spans (start end)
;; (interactive "r")
;; (goto-char start)
;; (let ((id 1))
;; (while (and (< (point) end)
;; (re-search-forward "^>.*$" nil t))
;; (let ((line (match-string 0)))
;; (unless (string-match "^>[ \t]*$" line)
;; (replace-match (format "> %s" id
;; (substring line 2))
;; t nil nil 0)
;; (setq id (1+ id)))))))
;;; region-gpg-inline.el --- Inline symmetric GPG tokens for regions (version 1) -*- lexical-binding: t; -*-
;;
;; Usage
;;
;; - Select a region and run: `M-x gpg-toggle-region`
;; → replaces region with a single-line token: `[GPG1:BASE64...]`
;;
;; - Put point anywhere inside a `[GPG1:...]` token (no region needed) and run:
;; `M-x gpg-toggle-region`
;; → replaces token with decrypted plaintext.
;;
;; - Put point inside an ASCII-armored PGP block (`-----BEGIN PGP MESSAGE-----`)
;; and run:
;; `M-x gpg-toggle-region`
;; → replaces the block with decrypted plaintext.
;;
;; Example usage:
;;
;; "baz quux xxyyzz baz Modern, ad-hoc, symmetric encryption for arbitrary text. lorem ipsum"
;; [Select → M-x gpg-toggle-region → password: "foo bar" → password confirmation: "foo bar" →]
;; "baz quux xxyyzz baz [GPG1:jA0ECQMKtkRJcAQru0b/0mkBVGAblrLxisvGKBk+gSDTobtTa7VaNEjEHeYeCyH2Zppau4eYtR8TBRNptwEqbqLNtghc4ohjREm4K8kPhrop6epU398LdH1NT6ItLdANcvby53YXVa0PTKqA0WD0MDNdx6IVGh7cYgg=] lorem ipsum"
;;
;; Requirements
;;
;; - gpg installed.
;; - For --pinentry-mode loopback to work reliably, set in ~/.gnupg/gpg-agent.conf:
;; > allow-loopback-pinentry
;; then reload:
;; $ gpgconf --reload gpg-agent
;;
;; Caveats
;;
;; - Decrypting in-place puts plaintext into the buffer/undo history and may
;; interact with autosave/backups.
;; - Tokens must remain on one physical line (hard-wrapping breaks them).
;; - Encryption-only; no authentication or signing or resistance to editing.
;; - Not authenticated (no signatures). Token tampering will usually cause decryption failure, but you can’t prove who wrote it.
;;
;; Background: Modern, ad-hoc, symmetric encryption for arbitrary text.
;;
;; Encrypt text notes in-place in a buffer using binary OpenPGP ciphertext from `gpg --symmetric`,
;; then Base64 encoding that binary so it can sit inline in Markdown or HTML.
;; (We do not use Base64-URL because the hyphens make it riskier to use in HTML comments, which is the most natural way to use it.)
;; Useful for inline secrets in shared documents (e.g., notes/org-mode).
;; Select text to censor/uncensor, run `M-x gpg-toggle-region`, and enter a password.
;; The secrets become opaque `[GPG1:…]` tokens, safe for untrusted storage.
;;
;; Conceptually analogous to "Column-Level Encryption" in GNU Recutils or SQL,
;; but applied ad-hoc to text regions. Uses randomized symmetric OpenPGP
;; (typically AES-256 with slower key derivation), prioritizing security over searchability/performance.
;;
;; NOTE: To support `--pinentry-mode loopback` reliably, your ~/.gnupg/gpg-agent.conf
;; may require:
;;
;; > allow-loopback-pinentry
;;
;; TODO: Possible alternative: age . Benefits: scrypt, shorter outputs, simpler invocation with no pinentry. Drawbacks: not installed by default on Linux, still relatively new... Note to self: revisit in a few years.
;;
;;; Code:
(require 'subr-x) ;; string-trim, string-trim-left, string-empty-p, when-let
(defgroup region-gpg-inline nil
"Inline encrypt/decrypt of regions using symmetric GPG."
:group 'tools)
(defcustom region-gpg-program "gpg"
"Path to the gpg executable."
:type 'string)
(defcustom region-gpg-common-args
'("--batch" "--yes" "--pinentry-mode" "loopback" "--passphrase-fd" "0"
"--compress-algo" "none"
"--cipher-algo" "AES256"
"--s2k-cipher-algo" "AES256" ;; belt+suspenders for passphrase-based symmetric cases
;; Hardening: max out OpenPGP S2K work factor (encoded count; slows key-derivation).
"--s2k-digest-algo" "SHA512"
"--s2k-count" "65011712")
"Arguments always passed to gpg.
This setup expects gpg-agent to allow loopback pinentry.
Compression is disabled since inline tokens are typically small.
S2K (string-to-key) is set to max iterations with SHA512 to resist
brute-force attacks on short passphrases."
:type '(repeat string))
(defconst region-gpg-inline-tag "GPG1"
"Inline ciphertext token prefix.
Tokens look like: [GPG1:BASE64...]")
(defconst region-gpg-inline-token-regexp
(concat "\\[" (regexp-quote region-gpg-inline-tag) ":\\([A-Za-z0-9+/=]+\\)\\]")
"Regexp matching a [GPG1:...] token on a single line.")
(defconst region-gpg-armor-begin-regexp "^[ \t]*-----BEGIN PGP MESSAGE-----[ \t]*$"
"Regexp matching the start of an ASCII-armored PGP message.")
(defconst region-gpg-armor-end-regexp "^[ \t]*-----END PGP MESSAGE-----[ \t]*$"
"Regexp matching the end of an ASCII-armored PGP message.")
(defun region-gpg--assert-gpg ()
"Signal `user-error' if `region-gpg-program' is not executable.
This checks `region-gpg-program' using function `executable-find'."
(unless (executable-find region-gpg-program)
(user-error "Cannot find %s via `executable-find' (check variable `exec-path')"
region-gpg-program)))
(defun region-gpg--replace (start end text)
"Replace region START..END with TEXT."
(save-excursion
(goto-char start)
(delete-region start end)
(insert text)))
(defun region-gpg--as-bytes (s)
"Return S as a unibyte string of bytes.
If S is multibyte, it is encoded with coding system `raw-text'."
(if (multibyte-string-p s)
(encode-coding-string s 'raw-text)
s))
(defun region-gpg--run (input passphrase args output-mode)
"Run gpg with ARGS, feeding PASSPHRASE then INPUT on stdin.
OUTPUT-MODE controls how stdout is returned:
- the symbol `binary' returns a unibyte string of raw bytes
- the symbol `text' returns a UTF-8 decoded string
On failure, signal `user-error' with stderr."
(region-gpg--assert-gpg)
(when (string-match-p "\n" passphrase)
(user-error "Passphrase must not contain a newline"))
(let ((coding-system-for-write 'binary)
(coding-system-for-read 'binary))
(with-temp-buffer
;; Work in bytes; build stdin as: PASS\n(PLAINTEXT-or-CIPHERTEXT BYTES)
(set-buffer-multibyte nil)
;; Passphrase is interpreted as a line of text by gpg; encode as UTF-8.
(insert (encode-coding-string passphrase 'utf-8-unix))
(insert "\n")
;; INPUT: plaintext (UTF-8) or raw bytes (unibyte).
(if (multibyte-string-p input)
(insert (encode-coding-string input 'utf-8-unix))
(insert (region-gpg--as-bytes input)))
(let* ((err-file (make-temp-file "region-gpg-inline-stderr-"))
(exit-code (apply #'call-process-region
(point-min) (point-max)
region-gpg-program
;; DELETE=t means replace this temp buffer with stdout
t (list t err-file) nil
(append region-gpg-common-args args))))
(unwind-protect
(if (and (integerp exit-code) (zerop exit-code))
(let ((out-bytes (region-gpg--as-bytes (buffer-string))))
(pcase output-mode
('binary out-bytes)
('text (decode-coding-string out-bytes 'utf-8-unix))
(_ (user-error "Invalid OUTPUT-MODE (expected `binary' or `text'): %S"
output-mode))))
(let ((err (with-temp-buffer
(when (file-exists-p err-file)
(insert-file-contents err-file))
(string-trim (buffer-string)))))
(user-error "GNU gpg failed (%s): %s"
(format "%s" exit-code)
(if (string-empty-p err) "(no stderr)" err))))
(when (file-exists-p err-file)
(delete-file err-file)))))))
(defun region-gpg-inline--wrap (b64)
"Wrap base64 B64 as a [GPG1:...] token."
(format "[%s:%s]" region-gpg-inline-tag b64))
(defun region-gpg-inline--unwrap (token)
"Extract base64 payload from TOKEN of the form [GPG1:...].
Signal `user-error' if TOKEN does not match exactly."
(let ((s (string-trim token)))
(if (string-match (concat "\\`\\[" (regexp-quote region-gpg-inline-tag)
":\\([A-Za-z0-9+/=]+\\)\\]\\'") s)
(match-string 1 s)
(user-error "Region is not exactly a [%s:...] token" region-gpg-inline-tag))))
(defun region-gpg-inline-encrypt-region (start end)
"Encrypt region START..END into a single-line [GPG1:...] token (symmetric)."
(interactive "r")
(when (and (called-interactively-p 'any) (not (use-region-p)))
(user-error "No active region"))
(let ((p1 (read-passwd "Encrypt passphrase: "))
(p2 (read-passwd "Confirm passphrase: ")))
(unless (string= p1 p2)
(user-error "Passphrases do not match"))
(let* ((plain (buffer-substring-no-properties start end))
(bin (region-gpg--run plain p1 '("--symmetric") 'binary))
(b64 (base64-encode-string (region-gpg--as-bytes bin) t))
(tok (region-gpg-inline--wrap b64)))
(region-gpg--replace start end tok))))
(defun region-gpg-inline-decrypt-region (start end)
"Decrypt region START..END containing a [GPG1:...] token (symmetric)."
(interactive "r")
(when (and (called-interactively-p 'any) (not (use-region-p)))
(user-error "No active region"))
(let* ((token (buffer-substring-no-properties start end))
(b64 (region-gpg-inline--unwrap token))
(bin (region-gpg--as-bytes (base64-decode-string b64)))
(p (read-passwd "Decrypt passphrase: "))
(plain (region-gpg--run bin p '("--decrypt") 'text)))
(region-gpg--replace start end plain)))
(defun region-gpg-armor-decrypt-region (start end)
"Decrypt an ASCII-armored PGP MESSAGE block in region START..END (symmetric)."
(interactive "r")
(when (and (called-interactively-p 'any) (not (use-region-p)))
(user-error "No active region"))
(let* ((cipher (buffer-substring-no-properties start end))
(p (read-passwd "Decrypt passphrase: "))
(plain (region-gpg--run cipher p '("--decrypt") 'text)))
(region-gpg--replace start end plain)))
(defun region-gpg-inline--bounds-of-inline-token-at-point ()
"Return (START . END) of the [GPG1:...] token containing point, or nil.
This searches only the current line (tokens are intended to be single-line)."
(save-excursion
(let ((pos (point)))
(goto-char (line-beginning-position))
(catch 'found
(while (re-search-forward region-gpg-inline-token-regexp
(line-end-position) t)
(when (<= (match-beginning 0) pos (match-end 0))
(throw 'found (cons (match-beginning 0) (match-end 0)))))
nil))))
(defun region-gpg-inline--bounds-of-armor-block-at-point ()
"Return (START . END) of the ASCII-armored PGP block containing point, or nil."
(save-excursion
(let ((pos (point)))
(when (re-search-backward region-gpg-armor-begin-regexp nil t)
(let ((beg (match-beginning 0)))
(when (re-search-forward region-gpg-armor-end-regexp nil t)
(let ((end (match-end 0)))
(when (<= beg pos end)
(cons beg end)))))))))
(defun region-gpg-inline--dwim-bounds ()
"Return (START END) bounds for DWIM toggling.
Priority:
1) Active region.
2) [GPG1:...] token at point.
3) ASCII-armored PGP block at point.
If none apply, signal `user-error'."
(cond
((use-region-p)
(list (region-beginning) (region-end)))
((when-let ((b (region-gpg-inline--bounds-of-inline-token-at-point)))
(list (car b) (cdr b))))
((when-let ((b (region-gpg-inline--bounds-of-armor-block-at-point)))
(list (car b) (cdr b))))
(t
(user-error "Select a region to encrypt, or put point inside a [%s:...] token / PGP block to decrypt"
region-gpg-inline-tag))))
(defun region-gpg-inline--classify (start end)
"Classify region START..END as the symbol `inline', `armor', or `plain'."
(let* ((prefix (buffer-substring-no-properties start (min end (+ start 64))))
(s (string-trim-left prefix))
(token-prefix (format "[%s:" region-gpg-inline-tag)))
(cond
((string-prefix-p token-prefix s) 'inline)
((string-prefix-p "-----BEGIN PGP MESSAGE-----" s) 'armor)
(t 'plain))))
;;;###autoload
(defun gpg-toggle-region (start end)
"DWIM: encrypt START to END region, or decrypt [GPG1:...].
If a region is active, operate on that region.
If no region is active, decrypt the token/block containing point."
(interactive (region-gpg-inline--dwim-bounds))
(pcase (region-gpg-inline--classify start end)
('inline (region-gpg-inline-decrypt-region start end))
('armor (region-gpg-armor-decrypt-region start end))
('plain (region-gpg-inline-encrypt-region start end))))
;; Optional aliases matching earlier drafts:
;;;###autoload
(defalias 'encrypt-region-to-password #'region-gpg-inline-encrypt-region)
;;;###autoload
(defalias 'decrypt-region-from-password #'region-gpg-inline-decrypt-region)
(provide 'region-gpg-inline)
;;; region-gpg-inline.el ends here
;;; markdown.el ends here