sparser

Simple parser — extract URLs from text files
git clone https://git.krisyotam.com/krisyotam/sparser.git

CLAUDE.md (4034B)


# sparser — CLAUDE.md

## Project

sparser (Simple Parser) is a suckless tool that extracts external URLs
from text-based files. It handles HTML, Markdown (MD/MDX), plain text,
and other text files. It can process a single file, read from stdin, or
recursively walk a directory tree. It writes one URL per line to stdout.

It is designed to pair with suploader in a pipeline:
  sparser -R /content | suploader -

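The recursive mode described above could be built on plain POSIX
`opendir`/`readdir`. The following is a minimal sketch under that
assumption, not the project's actual code: the name `walk` and its
signature are hypothetical, and it only prints the path of each regular
file it finds, where the real tool would hand the file to the extractor.

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/stat.h>

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical sketch: print every regular file under path, recursing
 * into subdirectories. Returns the number of regular files found, or
 * -1 if path itself cannot be inspected or opened.
 */
int
walk(const char *path, FILE *out)
{
	struct stat st;
	struct dirent *ent;
	DIR *dir;
	char sub[PATH_MAX];
	int n, total;

	/* lstat does not follow symlinks, which avoids link loops */
	if (lstat(path, &st) < 0)
		return -1;
	if (S_ISREG(st.st_mode)) {
		fprintf(out, "%s\n", path);
		return 1;
	}
	if (!S_ISDIR(st.st_mode))
		return 0;
	if (!(dir = opendir(path)))
		return -1;
	total = 0;
	while ((ent = readdir(dir))) {
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		n = snprintf(sub, sizeof(sub), "%s/%s", path, ent->d_name);
		if (n < 0 || (size_t)n >= sizeof(sub))
			continue; /* path too long: skip this entry */
		n = walk(sub, out);
		if (n > 0)
			total += n;
	}
	closedir(dir);
	return total;
}
```

Failures inside subdirectories are skipped rather than treated as fatal,
which matches the idea of a recoverable error; whether sparser behaves
this way is an assumption.
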
## Coding Standards — Suckless C Style

All code in this project MUST follow the suckless.org coding style:

### Language
- C99 (ISO/IEC 9899:1999), no extensions
- POSIX.1-2008 (`_POSIX_C_SOURCE 200809L`)

### Indentation & Whitespace
- Tabs for indentation (1 tab = 1 level)
- Spaces for alignment only, never for indentation
- No tabs except at the beginning of a line
- Maximum line length: 79 characters

### Comments
- Use `/* */` only, never `//`
- Comment fallthrough cases in switch statements

### Variables
- All declarations at the top of the block
- Pointer `*` adjacent to variable name: `char *p`, not `char* p`
- No C99 `bool`; use `int` (0/1)
- Global variables not used outside their translation unit must be `static`

### Functions
- Return type on its own line
- Function name at column 0 on next line (enables `grep ^funcname`)
- Opening `{` on its own line for functions
- Functions not used outside their file: `static`

```c
static void
usage(void)
{
	fprintf(stderr, "usage: sparser [-u] [-v] [-R] path\n");
	exit(1);
}
```

### Braces
- Opening `{` on same line for control flow (if, for, while, switch)
- Closing `}` on its own line unless continuing (else, do-while)
- Use braces even for single statements when sibling branches use them

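The brace rules can be illustrated with a small function. This is a
sketch only; `count_words` is a hypothetical helper and not part of
sparser. Note that the first branch holds a single statement but is
still braced, because its sibling `else if` branch needs braces.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count whitespace-separated words in s. */
size_t
count_words(const char *s)
{
	size_t n;
	int inword;

	n = 0;
	inword = 0;
	for (; *s != '\0'; s++) {
		if (isspace((unsigned char)*s)) {
			inword = 0;
		} else if (!inword) {
			inword = 1;
			n++;
		}
	}
	return n;
}
```
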
### Naming
- lowercase_with_underscores for functions and variables
- UPPERCASE for macros and constants
- CamelCase for typedef'd struct types
- No `_t` suffix (reserved by POSIX)
- Prefix module functions with module name

### Control Flow
- Space after `if`, `for`, `while`, `switch`
- No space after `(` or before `)`
- Use `goto` for cleanup/unwind, not nested ifs
- Return/exit early on failure
- Test against 0, not -1: `if (func() < 0)`

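The `goto` cleanup and early-return rules might look like this in
practice. The sketch below assumes a hypothetical helper named `slurp`
that reads a whole file into memory; it is an illustration of the
pattern, not code taken from the project.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read a whole file into a malloc'd,
 * NUL-terminated buffer. Returns 0 on success, -1 on failure.
 */
int
slurp(const char *path, char **out, size_t *len)
{
	FILE *fp;
	char *buf, *tmp;
	size_t cap, n, used;
	int ret;

	ret = -1;
	buf = NULL;
	used = 0;
	cap = 4096;
	if (!(fp = fopen(path, "r")))
		return -1; /* nothing to unwind yet: return early */
	if (!(buf = malloc(cap)))
		goto cleanup;
	/* always leave one byte free for the terminating NUL */
	while ((n = fread(buf + used, 1, cap - used - 1, fp)) > 0) {
		used += n;
		if (cap - used > 1)
			continue;
		if (!(tmp = realloc(buf, cap * 2)))
			goto cleanup;
		buf = tmp;
		cap *= 2;
	}
	if (ferror(fp))
		goto cleanup;
	buf[used] = '\0';
	*out = buf;
	*len = used;
	buf = NULL; /* ownership passed to the caller */
	ret = 0;
cleanup:
	free(buf);
	fclose(fp);
	return ret;
}
```

A caller tests the result against 0, as the last rule asks:
`if (slurp(path, &buf, &len) < 0)`.
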
### Error Handling
- All allocations checked; `goto` cleanup on failure
- `die()` for fatal errors (prints message, exits)
- `warn()` for recoverable errors (prints, continues)

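A minimal sketch of what `die()` and `warn()` could look like, assuming
printf-style variadic signatures and exit status 1 (the source states
neither). `xmalloc` stands in for the `x*` memory wrappers; its exact
name and behavior are assumptions.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Print a message to stderr and continue. */
void
warn(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
}

/* Print a message to stderr and exit with status 1 (assumed). */
void
die(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	exit(1);
}

/* Hypothetical x* wrapper: malloc that dies instead of returning NULL. */
void *
xmalloc(size_t n)
{
	void *p;

	/* malloc(0) may legally return NULL, so request at least 1 byte */
	if (!(p = malloc(n ? n : 1)))
		die("malloc: out of memory");
	return p;
}
```

With a wrapper like this, callers that cannot recover use `xmalloc`,
while code that can unwind checks plain `malloc` and jumps to its
cleanup label.
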
### File Organization Order
1. License header
2. System includes (alphabetical)
3. Local includes
4. Macros
5. Type definitions
6. Function declarations
7. Global variables
8. Function definitions (same order as declarations)

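As an illustration of this ordering, here is a skeleton of a small
source file. Everything in it is hypothetical (the `LEN` macro, the
`FileType` struct, the `filetypes` table and `filetype_known`); only
the order of the sections reflects the list above.

```c
/* 1. License header */
/* See LICENSE file for copyright and license details. */

/* 2. System includes (alphabetical) */
#include <stddef.h>
#include <string.h>

/* 3. Local includes would follow here, after a blank line */

/* 4. Macros */
#define LEN(x) (sizeof(x) / sizeof((x)[0]))

/* 5. Type definitions */
typedef struct {
	const char *ext;
	int is_markup;
} FileType;

/* 6. Function declarations */
static int filetype_known(const char *name);

/* 7. Global variables */
static const FileType filetypes[] = {
	{ ".html", 1 },
	{ ".md",   1 },
	{ ".mdx",  1 },
	{ ".txt",  0 },
};

/* 8. Function definitions (same order as declarations) */
static int
filetype_known(const char *name)
{
	const char *dot;
	size_t i;

	if (!(dot = strrchr(name, '.')))
		return 0;
	for (i = 0; i < LEN(filetypes); i++) {
		if (strcmp(dot, filetypes[i].ext) == 0)
			return 1;
	}
	return 0;
}
```
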
### Headers
- System headers first, alphabetical
- Local headers after blank line
- No cyclic dependencies
- Include only what is needed

## Architecture

### Module Layout

| Module | Prefix | File | Responsibility |
|--------|--------|------|----------------|
| Main | — | sparser.c | Entry point, directory walking, file dispatch |
| Extract | `extract_` | extract.c | URL extraction from text content |
| Utilities | `die`, `warn`, `x*` | util.c | Memory wrappers, string ops, error handling |
| Config | — | config.h | Compile-time constants |

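To make the Extract module's job concrete, here is a rough sketch of a
scanner that finds `http://` and `https://` URLs in a buffer and writes
one per line. The function name `extract_urls`, the helper names, and
the set of terminating characters are all assumptions; the real
extract.c may recognize more schemes and handle edge cases differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of the URL scheme at s, or 0. */
static size_t
extract_scheme_len(const char *s)
{
	if (strncmp(s, "https://", 8) == 0)
		return 8;
	if (strncmp(s, "http://", 7) == 0)
		return 7;
	return 0;
}

/* Hypothetical helper: does c end a URL? (assumed delimiter set) */
static int
extract_is_end(int c)
{
	return c == '\0' || strchr(" \t\r\n\"'<>()[]", c) != NULL;
}

/*
 * Write every URL found in s to out, one per line.
 * Returns the number of URLs written.
 */
int
extract_urls(const char *s, FILE *out)
{
	const char *end;
	size_t n;
	int count;

	count = 0;
	while (*s != '\0') {
		n = extract_scheme_len(s);
		if (n == 0) {
			s++;
			continue;
		}
		for (end = s + n; !extract_is_end((unsigned char)*end); end++)
			;
		/* a bare scheme with nothing after it is not a URL */
		if (end > s + n) {
			fwrite(s, 1, (size_t)(end - s), out);
			fputc('\n', out);
			count++;
		}
		s = end;
	}
	return count;
}
```

Stopping at `)` and `"` lets the same loop handle Markdown links and
HTML attributes, at the cost of truncating URLs that themselves contain
parentheses.
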
### Architecture Rules
- **Separate compilation.** Every .c file compiles independently.
- **No dynamic loading.** All features compiled in.
- **No external dependencies.** Pure C99 + POSIX.
- **Line-oriented output.** One URL per line to stdout.
- **Unix pipeline friendly.** Works with pipes, xargs, etc.

## Build

```sh
make            # build sparser binary
make clean      # remove build artifacts
make install    # install to /usr/local/bin
```

Dependencies: none (pure C99 + POSIX)

## Usage

```sh
# Extract URLs from a single file
sparser page.html

# Recursive directory scan
sparser -R /content

# Read from stdin
cat file.md | sparser -

# Verbose (show file names being processed)
sparser -v -R /content

# Deduplicate output
sparser -u -R /content

# Pipeline with suploader
sparser -u -R /content | suploader -
```

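The `-R`, `-u` and `-v` flags shown above could be parsed with POSIX
`getopt`. This is a sketch under that assumption; the project may use a
different mechanism, and the `Options` struct and `options_parse`
function are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L

#include <unistd.h>

/* Hypothetical option state; field names are illustrative. */
typedef struct {
	int recursive; /* -R */
	int unique;    /* -u */
	int verbose;   /* -v */
} Options;

/*
 * Parse -R, -u and -v with getopt(3). Returns the index of the single
 * operand (a path or "-"), or -1 on a usage error.
 */
int
options_parse(int argc, char *argv[], Options *o)
{
	int c;

	o->recursive = o->unique = o->verbose = 0;
	optind = 1; /* restart scanning so the function can be reused */
	while ((c = getopt(argc, argv, "Ruv")) != -1) {
		switch (c) {
		case 'R':
			o->recursive = 1;
			break;
		case 'u':
			o->unique = 1;
			break;
		case 'v':
			o->verbose = 1;
			break;
		default:
			return -1;
		}
	}
	/* exactly one operand is expected after the flags */
	if (optind != argc - 1)
		return -1;
	return optind;
}
```

`getopt` treats a lone `-` as an operand rather than an option, so the
stdin form `sparser -` falls out of the same code path as a file name.
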
## Git Conventions

- No `Co-Authored-By: Claude` lines
- Commit messages: imperative, <72 chars, no period
- One logical change per commit