# sparser: CLAUDE.md

## Project

sparser (Simple Parser) is a suckless tool that extracts external URLs
from text-based files. It handles HTML, Markdown (MD/MDX), plain text,
and other text files. It can process a single file, read from stdin, or
recursively walk a directory tree. Outputs one URL per line to stdout.

Designed to pair with suploader for a pipeline:
    sparser -R /content | suploader -

## Coding Standards: Suckless C Style

All code in this project MUST follow the suckless.org coding style:

### Language
- C99 (ISO/IEC 9899:1999), no extensions
- POSIX.1-2008 (`_POSIX_C_SOURCE 200809L`)

### Indentation & Whitespace
- Tabs for indentation (1 tab = 1 level)
- Spaces for alignment only, never for indentation
- No tabs except at the beginning of a line
- Maximum line length: 79 characters

### Comments
- Use `/* */` only, never `//`
- Comment fallthrough cases in switch statements

### Variables
- All declarations at the top of the block
- Pointer `*` adjacent to variable name: `char *p`, not `char* p`
- No C99 `bool`; use `int` (0/1)
- Global/static variables not used outside TU must be `static`

### Functions
- Return type on its own line
- Function name at column 0 on next line (enables `grep ^funcname`)
- Opening `{` on its own line for functions
- Functions not used outside their file: `static`

```c
static void
usage(void)
{
	fprintf(stderr, "usage: sparser [-v] [-R] path\n");
	exit(1);
}
```

### Braces
- Opening `{` on same line for control flow (if, for, while, switch)
- Closing `}` on its own line unless continuing (else, do-while)
- Use braces even for single statements when sibling branches use them

### Naming
- lowercase_with_underscores for functions and variables
- UPPERCASE for macros and constants
- CamelCase for typedef'd struct types
- No `_t` suffix (reserved by POSIX)
- Prefix module functions with module name

### Control Flow
- Space after `if`, `for`, `while`, `switch`
- No space after `(` or before `)`
- Use `goto` for cleanup/unwind, not nested ifs
- Return/exit early on failure
- Test against 0, not -1: `if (func() < 0)`

### Error Handling
- All allocation checked; goto cleanup on failure
- `die()` for fatal errors (prints message, exits)
- `warn()` for recoverable errors (prints, continues)

### File Organization Order
1. License header
2. System includes (alphabetical)
3. Local includes
4. Macros
5. Type definitions
6. Function declarations
7. Global variables
8. Function definitions (same order as declarations)

### Headers
- System headers first, alphabetical
- Local headers after blank line
- No cyclic dependencies
- Include only what is needed

## Architecture

### Module Layout

| Module | Prefix | File | Responsibility |
|--------|--------|------|----------------|
| Main | (none) | sparser.c | Entry point, directory walking, file dispatch |
| Extract | `extract_` | extract.c | URL extraction from text content |
| Utilities | `die`, `warn`, `x*` | util.c | Memory wrappers, string ops, error handling |
| Config | (none) | config.h | Compile-time constants |

### Architecture Rules
- **Separate compilation.** Every .c file compiles independently.
- **No dynamic loading.** All features compiled in.
- **No external dependencies.** Pure C99 + POSIX.
- **Line-oriented output.** One URL per line to stdout.
- **Unix pipeline friendly.** Works with pipes, xargs, etc.
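
As an illustration of how the style and architecture rules above combine, here is a minimal sketch of an `extract_`-prefixed function that writes one URL per line. The name `extract_urls`, the `URL_DELIMS` macro, and the choice to match only `http://` and `https://` are assumptions made for this example; they are not taken from the actual extract.c.

```c
#include <stdio.h>
#include <string.h>

/* assumption: characters that end a URL in this sketch */
#define URL_DELIMS " \t\r\n\"'<>()"

/*
 * Hypothetical extract_ function: write every http:// or https://
 * URL found in buf to out, one per line; return the number written.
 */
int
extract_urls(const char *buf, FILE *out)
{
	const char *p;
	size_t len;
	int n;

	n = 0;
	for (p = buf; (p = strstr(p, "http")) != NULL; p += len) {
		/* skip the 4 bytes of "http" unless a scheme follows */
		len = 4;
		if (strncmp(p, "http://", 7) != 0 &&
		    strncmp(p, "https://", 8) != 0)
			continue;
		/* the URL runs up to the next delimiter or end of buf */
		len = strcspn(p, URL_DELIMS);
		fprintf(out, "%.*s\n", (int)len, p);
		n++;
	}
	return n;
}
```

Called as `extract_urls(buf, stdout)`, the sketch prints each match on its own line, which keeps the output usable with pipes and xargs. Treating `)` and `"` as delimiters is a simplification that happens to suit Markdown links and HTML attributes; a real extractor may need finer rules.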

## Build

```sh
make            # build sparser binary
make clean      # remove build artifacts
make install    # install to /usr/local/bin
```

Dependencies: none (pure C99 + POSIX)

## Usage

```sh
# Extract URLs from a single file
sparser page.html

# Recursive directory scan
sparser -R /content

# Read from stdin
cat file.md | sparser -

# Verbose (show file names being processed)
sparser -v -R /content

# Deduplicate output
sparser -u -R /content

# Pipeline with suploader
sparser -u -R /content | suploader -
```

## Git Conventions

- No `Co-Authored-By: Claude` lines
- Commit messages: imperative, <72 chars, no period
- One logical change per commit