# Defining a Parser

A **parser definition** is the complete set of parameters used by a program, expressed as a dict, a dataclass, a function signature, or a pre-built `ArgumentParser`.

fargv supports four parser definition styles, each mapping naturally to a different development style and project size.

---

## 1 Plain dict with Python literals

**When to use**: prototypes, notebooks, and scripts where parameter names are self-documenting and you want the least possible boilerplate.

Pass a `dict` whose values are **plain Python literals** as the parser definition. fargv infers the parameter type from the default value's runtime type.

| Default value | Inferred type | CLI form |
|---|---|---|
| `int` | integer | `--epochs 50` |
| `float` | float | `--lr 0.001` |
| `bool` | boolean switch | `--verbose` / `--verbose false` |
| `str` | string | `--output_dir ./out` |
| `tuple` (≥ 3 items) | choice (first = default) | `--mode train` |
| `list` | positional list | leftover tokens |

```python
import fargv

# ── Machine Learning: train a classifier ─────────────────────────────────────
p, _ = fargv.parse({
    "data_dir": "/datasets/imagenet",
    "output_dir": "{data_dir}/checkpoints",  # {key} interpolation
    "model": ("resnet50", "vit_b16", "efficientnet_b0"),
    "epochs": 90,
    "lr": 0.1,
    "batch_size": 256,
    "amp": False,
    "workers": 8,
    "files": [],  # positional: extra paths, if any
})
print(f"Training {p.model} for {p.epochs} epochs lr={p.lr} amp={p.amp}")
```

```bash
python train.py --model=vit_b16 --epochs=30 --amp
python train.py --data_dir=/data/imagenet --lr=0.01 --batch_size=128
```

```python
# ── Data Analytics: run a report pipeline ────────────────────────────────────
p, _ = fargv.parse({
    "db_url": "postgresql://localhost/analytics",
    "format": ("parquet", "csv", "json"),
    "date_from": "2024-01-01",
    "date_to": "2024-12-31",
    "dry_run": False,
    "tables": [],
})
```

**Common mistakes**

```python
# ✗ set() works as positional in legacy API but is ambiguous; prefer list
"files": set()

# ✓ Use list
"files": []

# ✓ A two-element tuple of strings IS a two-choice enum, consistent with 3+
"mode": ("train", "eval")          # choice: train | eval
"mode": ("train", "eval", "test")  # choice: train | eval | test
```

**When to use**

- Prototypes and research scripts where speed of writing matters most
- Notebooks and one-off data-pipeline scripts
- Teaching / demos where the definition should be self-evident

**When to avoid**

- When `--help` quality matters for end users (no descriptions on params)
- When you need mandatory parameters, file-existence checks, or streams
- When you want IDE autocompletion on the result namespace

| ✅ Pros | ❌ Cons |
|---|---|
| Minimum boilerplate: a plain dict | No per-parameter `--help` descriptions |
| `{key}` string interpolation | No rich types (streams, paths, tuples) |
| Self-documenting defaults | No mandatory parameters |
| Full auto-params for free | Result is an untyped `SimpleNamespace` (no autocompletion for `p.lr`) |

**How do I…**

*Make a parameter a choice?* Use a tuple with ≥ 3 elements:

```python
{"mode": ("train", "eval", "test")}  # first element is default
```

*Add a description to a single parameter without switching styles?* Use the 2-element `(default, "description")` shorthand:

```python
{"lr": (0.01, "Initial learning rate")}
```

*Collect extra CLI tokens?* Use an empty list:

```python
{"files": []}
# python script.py a.txt b.txt  →  p.files == ["a.txt", "b.txt"]
```

*Switch to a richer style for just one parameter?* Mix a `Fargv*` object into the dict alongside plain literals:

```python
{"lr": 0.01, "weights": fargv.FargvStr(fargv.REQUIRED)}
```

---

## 2 Plain dict with `Fargv*` types

**When to use**: production scripts where `--help` quality matters, parameters that must exist on disk, streams, fixed-length tuples, or verbosity counters.

Replace bare Python literals with explicit `Fargv*` parameter objects. You can mix both styles freely in one dict.
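As a rough mental model of how bare literals and explicit parameter objects can coexist in one dict, the sketch below normalises a mixed definition. It is not fargv's implementation: `Param` and `normalise` are hypothetical stand-ins invented for this illustration, and only scalar literals and lists are handled.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Param:
    """Hypothetical stand-in for a Fargv* parameter object (not a fargv class)."""
    kind: str
    default: Any
    description: str = ""


def normalise(definition: dict) -> dict:
    """Wrap bare literals in Param objects; pass explicit Param objects through."""
    normalised = {}
    for name, value in definition.items():
        if isinstance(value, Param):
            normalised[name] = value  # rich style: already explicit
        elif isinstance(value, bool):
            # Checked before int because bool is a subclass of int in Python.
            normalised[name] = Param("bool", value)
        elif isinstance(value, int):
            normalised[name] = Param("int", value)
        elif isinstance(value, float):
            normalised[name] = Param("float", value)
        elif isinstance(value, str):
            normalised[name] = Param("str", value)
        elif isinstance(value, list):
            normalised[name] = Param("positional", value)
        else:
            raise TypeError(f"cannot infer a parameter type for {name!r}")
    return normalised


mixed = {
    "lr": 0.01,
    "amp": False,
    "epochs": Param("int", 100, description="Total training epochs"),
}
params = normalise(mixed)
print({name: p.kind for name, p in params.items()})
```

The point of the sketch is the ordering of the checks and the pass-through branch: an explicit object always wins, and a literal is classified by its runtime type. The fargv examples that follow use the real `Fargv*` classes.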
```python
import sys
import fargv

# ── Computer Vision: object-detection training ───────────────────────────────
p, _ = fargv.parse({
    "images_dir": fargv.FargvStr("/data/coco/images",
                                 description="Root directory of COCO-style images"),
    "annotations": fargv.FargvExistingFile("/data/coco/instances_train.json",
                                           description="COCO annotations JSON file (must exist)"),
    "output_dir": fargv.FargvStr("{images_dir}/../runs",
                                 description="Where to write checkpoints and logs"),
    "arch": fargv.FargvChoice(["yolov8n", "yolov8s", "yolov8m", "yolov8l"],
                              description="Model architecture"),
    "img_size": fargv.FargvTuple((int, int), default=(640, 640),
                                 description="Input resolution (width height)"),
    "epochs": fargv.FargvInt(100, description="Total training epochs"),
    "lr0": fargv.FargvFloat(0.01, description="Initial learning rate"),
    "augment": fargv.FargvBool(True, description="Enable mosaic / colour-jitter augmentation"),
    "log": fargv.FargvStream(sys.stderr,
                             description="Log destination (file path, stderr, or stdout)"),
    "weights": fargv.FargvStr(fargv.REQUIRED,
                              description="Path to pretrained weights (mandatory)"),
    "verbosity": fargv.FargvInt(0, short_name="v", is_count_switch=True,
                                description="Verbosity level (-vvv = 3)"),
    "checkpoints": fargv.FargvPositional(default=[],
                                         description="Checkpoint files to evaluate"),
})
print(f"Detecting with {p.arch} at {p.img_size} weights={p.weights}")
```

```bash
python detect.py --weights=yolov8n.pt --epochs=50 --img_size="(1280, 1280)"
python detect.py --weights=yolov8l.pt --log=train.log --augment false -vv
python detect.py --weights=yolov8l.pt a.pt b.pt c.pt  # positional checkpoints
```

```python
# ── NLP: fine-tune a language model ──────────────────────────────────────────
p, _ = fargv.parse({
    "model_name": fargv.FargvStr("bert-base-uncased",
                                 description="HuggingFace model identifier"),
    "train_file": fargv.FargvExistingFile(fargv.REQUIRED,
                                          description="Training data (jsonl, one example per line)"),
    "task": fargv.FargvChoice(["classification", "ner", "qa", "summarisation"],
                              description="Fine-tuning task type"),
    "max_length": fargv.FargvInt(512),
    "epochs": fargv.FargvInt(3),
    "lr": fargv.FargvFloat(2e-5),
    "fp16": fargv.FargvBool(False, description="Enable 16-bit mixed precision"),
})
```

**Common mistakes**

```python
import sys

# ✗ FargvStream / FargvOutputStream do NOT accept string keywords as defaults;
#   strings are only valid on the CLI, not at construction time
"log": fargv.FargvOutputStream("stderr")  # raises FargvError

# ✓ Pass the actual sys object
"log": fargv.FargvStream(sys.stderr)  # default = stderr
"out": fargv.FargvOutputStream()      # default = stdout
"inp": fargv.FargvInputStream()       # default = stdin

# ✗ Forgetting REQUIRED is a sentinel, not a string
"weights": fargv.FargvStr("REQUIRED")  # default is the literal string "REQUIRED"

# ✓
"weights": fargv.FargvStr(fargv.REQUIRED)
```

**When to use**

- Production scripts where `--help` quality and clear error messages matter
- When you need mandatory parameters, file-existence validation, or streams
- When mixing one or two rich parameters into an otherwise plain dict

**When to avoid**

- When you need IDE autocompletion on the result (use dataclass instead)
- For trivial scripts where the extra verbosity slows you down

| ✅ Pros | ❌ Cons |
|---|---|
| Per-parameter `--help` descriptions | More verbose than bare literals |
| Rich types (path, stream, tuple, count-switch) | `SimpleNamespace` result, so no IDE autocompletion |
| Explicit mandatory parameters (`REQUIRED`) | Requires importing `Fargv*` classes |
| Mix freely with plain literals | |

**How do I…**

*Make a parameter mandatory?*

```python
{"weights": fargv.FargvStr(fargv.REQUIRED, description="model weights")}
```

*Add a count-switch verbosity flag?*

```python
{"verbose": fargv.FargvInt(0, short_name="v", is_count_switch=True)}
# -vvv sets verbose=3
```

*Require a file to exist at parse time?*

```python
{"config": fargv.FargvExistingFile(fargv.REQUIRED)}
```

*Accept a stream (file, stdout, stderr)?*

```python
import sys
{"log": fargv.FargvStream(sys.stderr)}  # --log=out.txt or --log=stdout
```

---

## 3 Function signature

**When to use**: library functions you want to expose as scripts, `python -m fargv module.function` ad-hoc invocations, and situations where the function already documents itself.

Pass any **callable** as the definition. fargv introspects its signature using `inspect` and `typing.get_type_hints`, inferring one parameter per argument.

```python
import fargv

# ── NLP: tokeniser benchmark ─────────────────────────────────────────────────
def tokenise(
    corpus_path: str,
    tokeniser: str = "wordpiece",
    vocab_size: int = 30_000,
    lower_case: bool = True,
    output_dir: str = "./tokeniser_out",
) -> None:
    """Tokenise *corpus_path* and save the resulting vocabulary."""
    print(f"tokenising {corpus_path!r} vocab={vocab_size} lower={lower_case}")

p, _ = fargv.parse(tokenise)
tokenise(**vars(p))
```

```bash
python tok.py --corpus_path=/data/wiki.txt --vocab_size=50000
python tok.py --corpus_path=/data/cc.txt --tokeniser=bpe --lower_case false
```

```python
# ── Data Analytics: pandas pipeline ─────────────────────────────────────────
import fargv
import pandas as pd

def aggregate(
    input_csv: str,
    group_by: str = "region",
    metric: str = "revenue",
    output_csv: str = "aggregated.csv",
    dropna: bool = True,
) -> None:
    df = pd.read_csv(input_csv)
    if dropna:
        df = df.dropna(subset=[metric])
    df.groupby(group_by)[metric].sum().reset_index().to_csv(output_csv, index=False)

p, _ = fargv.parse(aggregate, non_defaults_are_mandatory=True)
aggregate(**vars(p))
```

```python
# ── Expose any callable directly ─────────────────────────────────────────────
import fargv
p, _ = fargv.parse(sorted, given_parameters=["prog", "--reverse"])
# Or: python -m fargv numpy.linspace -s 0 -S 6.283 --num 8
```

**Common mistakes**

```python
# ✗ *args / **kwargs cause an error unless you opt in
def bad(x: int, *args, **kwargs): ...
p, _ = fargv.parse(bad)  # raises FargvError

# ✓ Opt in explicitly
p, _ = fargv.parse(bad, fn_def_tolerate_wildcards=True)

# ✗ Unannotated parameter with None default is silently skipped
def ambiguous(x: int, device=None): ...  # 'device' has no annotation + None default
p, _ = fargv.parse(ambiguous)  # 'device' does not appear in help or result

# ✓ Add an annotation
def clear(x: int, device: str = "cpu"): ...

# ✗ Rich fargv types cannot be expressed as function defaults; they appear as
#   their plain Python value on the CLI (no description, no special behaviour)
def model(paths=fargv.FargvPositional([])):  # fargv sees a FargvPositional object,
    ...                                      # not recognised; treated as FargvStr
```

**When to use**

- Library functions you want to expose as CLI tools with no wrapper
- Ad-hoc invocation via `python -m fargv module.callable`
- When the function already documents itself via docstring and annotations

**When to avoid**

- When you need IDE autocompletion on the parsed result (use dataclass)
- When parameters need rich types like streams or path validation
- When the function has unannotated `None` defaults you depend on

| ✅ Pros | ❌ Cons |
|---|---|
| Zero duplication: the signature *is* the definition | Type annotations required for accurate coercion |
| Works on any callable (stdlib, third-party) | `*args`/`**kwargs` need `fn_def_tolerate_wildcards=True` |
| Docstring appears in `--help` | `None` defaults without annotations are silently skipped |
| Natural fit for `python -m fargv` | Rich `Fargv*` types not usable as function defaults |
| | No `{key}` interpolation |

**How do I…**

*Make parameters without defaults mandatory?*

```python
def run(host: str, port: int = 8080): ...
p, _ = fargv.parse(run, non_defaults_are_mandatory=True)
# python run.py --host=localhost
```

*Parse and call in one step?*

```python
fargv.parse_and_launch(train)
```

*Call from inside the function itself?*

```python
def train(lr: float = 0.01, epochs: int = 10):
    p, _ = fargv.parse_here()  # resolves own signature
    print(p.lr, p.epochs)
```

*Handle `*args` or `**kwargs`?*

```python
p, _ = fargv.parse(fn, fn_def_tolerate_wildcards=True)
```

---

## 4 Dataclass

**When to use**: any project where IDE autocompletion, type safety, and re-using the config object across modules matter. The return value is an instance of your class, so `cfg.lr` and `cfg.arch` autocomplete in your IDE.

Decorate a class with `@dataclass` and pass the **class** (not an instance) to `fargv.parse`.

### 4a Plain field defaults

For parameters expressible as plain Python literals, annotate the field and set the default directly. fargv infers the `Fargv*` type from the annotation and the default value.

```python
from dataclasses import dataclass
import fargv

# ── Machine Learning: distributed training ───────────────────────────────────
@dataclass
class TrainConfig:
    data_root: str = "/datasets/imagenet"
    num_workers: int = 8
    arch: str = "resnet50"
    pretrained: bool = True
    epochs: int = 90
    lr: float = 0.1
    weight_decay: float = 1e-4
    amp: bool = False
    gpus: int = 1
    output_dir: str = "./runs"

cfg, _ = fargv.parse(TrainConfig)
# cfg is a TrainConfig; the IDE autocompletes cfg.lr, cfg.arch, etc.
print(f"Training {cfg.arch} for {cfg.epochs} epochs on {cfg.gpus} GPU(s)")
```

```bash
python train.py --arch=vit_b16 --epochs=30 --amp --gpus=4
```

```python
# ── CV: mandatory fields (no default) ────────────────────────────────────────
@dataclass
class InferConfig:
    checkpoint: str  # mandatory: no default
    images_dir: str = "/data/val"
    threshold: float = 0.5
    half: bool = False

cfg, _ = fargv.parse(InferConfig, non_defaults_are_mandatory=True)
```

### 4b `Fargv*` parameter instances as defaults

When you need a rich type that has no plain Python literal equivalent (`FargvPositional`, `FargvStream`, `FargvInt(is_count_switch=True)`, …), assign a `Fargv*` instance as the field default. fargv detects it and uses it directly, ignoring the annotation (which exists only to satisfy the type checker and `@dataclass`).

```python
import sys
from dataclasses import dataclass
import fargv

# ── ML checkpoint tool ────────────────────────────────────────────────────────
@dataclass
class CheckpointConfig:
    # Plain defaults: inferred from annotation + value
    output_dir: str = "./runs"
    dry_run: bool = False

    # Rich types: the FargvParameter instance IS the parameter definition
    checkpoints: list = fargv.FargvPositional(default=[],
                                              description="Checkpoint .pt files to inspect")
    verbosity: int = fargv.FargvInt(0, short_name="v", is_count_switch=True,
                                    description="Verbosity level (-vvv = 3)")
    log: object = fargv.FargvStream(sys.stderr,
                                    description="Log stream (path, stderr, or stdout)")

cfg, _ = fargv.parse(CheckpointConfig)
# cfg.checkpoints is a list, cfg.verbosity is an int, cfg.log is a file-like
```

```bash
python tool.py a.pt b.pt --output_dir=./out -vv --log=run.log
```

### 4c Field docstrings

Per-field descriptions appear in `--help` output and can be attached directly in the parser definition as **attribute docstrings**: a bare string literal immediately after the field.
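At the syntax level, a bare string after a field is just an expression statement that follows the annotated assignment, whether it sits on the next line or after a `;`. The sketch below shows one way such strings could be collected with the standard `ast` module. It is an illustration only, not fargv's implementation, and `attribute_docstrings` is a hypothetical helper name.

```python
import ast
import textwrap


def attribute_docstrings(class_source: str) -> dict:
    """Map field name -> docstring for annotated fields of the first class found."""
    tree = ast.parse(textwrap.dedent(class_source))
    class_node = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    docs = {}
    body = class_node.body
    # Look at each statement together with the one that follows it.
    for stmt, following in zip(body, body[1:]):
        is_field = isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
        is_bare_string = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        if is_field and is_bare_string:
            docs[stmt.target.id] = following.value.value.strip()
    return docs


example = '''
class TrainConfig:
    output_dir: str = "./runs"
    epochs: int = 90; "Total number of training epochs."
    lr: float = 0.1
    "Initial learning rate."
'''
docs = attribute_docstrings(example)
print(docs)
```

Because `;` only separates statements, both placements produce the same statement sequence in the parsed class body, which is why a single pass over adjacent statements covers them.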
Two placement styles are recognised:

| Style | Syntax | Recognised by |
|---|---|---|
| Next-line | string on the following line | fargv, Sphinx, PyCharm, Pylance |
| Same-line | string after `;` on the same line | fargv only |

```python
from dataclasses import dataclass
import fargv

@dataclass
class TrainConfig:
    # ── No description ────────────────────────────────────────────────────────
    output_dir: str = "./runs"

    # ── Same-line (compact; recognised by fargv, not by Sphinx/IDEs) ─────────
    epochs: int = 90; "Total number of training epochs."

    # ── Next-line (conventional; recognised by fargv, Sphinx, and IDEs) ──────
    lr: float = 0.1
    "Initial learning rate."

    arch: str = "resnet50"
    """
    Model architecture identifier.
    Any torchvision-compatible name is accepted.
    """

cfg, _ = fargv.parse(TrainConfig)
```

Field docstrings are a fallback: if the field default is a `Fargv*` instance with an explicit `description=`, that takes precedence and the docstring is ignored.

```python
@dataclass
class MixedConfig:
    # description= wins over the docstring
    lr: float = fargv.FargvFloat(0.1, description="Learning rate (explicit wins)")
    "This docstring is ignored."

    # no description= → docstring is used
    epochs: int = fargv.FargvInt(90, short_name="e")
    "Total training epochs."
```

**Common mistakes**

```python
# ✗ Missing type annotation: @dataclass ignores class attributes without one,
#   so fargv never sees this field
@dataclass
class Bad:
    paths = fargv.FargvPositional(default=[])  # NOT a dataclass field; ignored

# ✓ Add a type annotation
@dataclass
class Good:
    paths: list = fargv.FargvPositional(default=[])

# ✗ Trailing comma: makes the value a one-element tuple, not a FargvParameter
@dataclass
class Bad2:
    paths: list = fargv.FargvPositional(default=[]),  # <- comma!
    # paths is now (FargvPositional(...),); fargv sees a tuple, infers FargvChoice

# ✓ No trailing comma
@dataclass
class Good2:
    paths: list = fargv.FargvPositional(default=[])

# ✗ FargvStream default must be a sys object, not a string keyword
@dataclass
class Bad3:
    log: object = fargv.FargvStream("stderr")  # raises FargvError at import time

# ✓
import sys

@dataclass
class Good3:
    log: object = fargv.FargvStream(sys.stderr)

# ✗ Passing a dataclass instance instead of the class itself
cfg = TrainConfig()
p, _ = fargv.parse(cfg)  # raises TypeError; pass the class, not an instance

# ✓
p, _ = fargv.parse(TrainConfig)
```

**When to use**

- Any project where IDE autocompletion, mypy/pyright, or `isinstance` checks matter
- When the config object is passed around multiple modules
- When you want mandatory fields expressed as plain missing defaults

**When to avoid**

- For throwaway scripts (a plain dict is faster to write)
- When `{key}` string interpolation between parameters is essential

| ✅ Pros | ❌ Cons |
|---|---|
| Full IDE autocompletion on result | More boilerplate than a plain dict for tiny scripts |
| `isinstance(cfg, MyConfig)` works | Type annotations required on every field |
| Reusable typed config across modules | No `{key}` interpolation across fields |
| Rich types via `Fargv*` instance defaults | `Fargv*` defaults share one object per class |
| Mandatory fields: just omit the default | |
| Works with `dataclasses.asdict`, and with `json.dumps` on the resulting dict | |

**How do I…**

*Make a field mandatory?* Omit the default:

```python
@dataclass
class Config:
    checkpoint: str  # no default → required on CLI
    threshold: float = 0.5
```

*Add a description to a field?* Put a bare string literal immediately after it:

```python
@dataclass
class Config:
    lr: float = 0.1
    "Initial learning rate."  # shown in --help
```

*Add a rich type (stream, count-switch, positional)?*

```python
import sys
from dataclasses import dataclass
import fargv

@dataclass
class Config:
    verbose: int = fargv.FargvInt(0, short_name="v", is_count_switch=True)
    log: object = fargv.FargvStream(sys.stderr)
    files: list = fargv.FargvPositional(default=[])
```

*Use subcommands in a dataclass?*

```python
from dataclasses import dataclass, field
import fargv

@dataclass
class Config:
    cmd: dict = field(default_factory=lambda: {
        "train": {"lr": 0.01},
        "eval": {"dataset": "val"},
    })

cfg, _ = fargv.parse(Config, subcommand_return_type="nested")
```

---

## Choosing the right approach

| | Dict (literals) | Dict (`Fargv*`) | Function | Dataclass |
|---|:---:|:---:|:---:|:---:|
| Boilerplate | minimal | low | none† | low |
| IDE autocomplete on result | ❌ | ❌ | ❌ | ✅ |
| Per-param descriptions | ❌ | ✅ | via docstring | field docstrings or `Fargv*` defaults |
| Rich types (stream, path, tuple, count-switch) | ❌ | ✅ | ❌ | ✅ (4b) |
| Mandatory params | ❌ | ✅ | ✅ | ✅ |
| Reusable typed config object | ❌ | ❌ | ❌ | ✅ |
| `{key}` interpolation | ✅ | ✅ | ❌ | ❌ |
| Works with existing callables | ❌ | ❌ | ✅ | ❌ |

† The function itself is the definition; no wrapper is needed.

In practice many scripts start with a plain dict and graduate to a dataclass as the project grows; the migration is a mechanical rename-and-annotate step.
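To make the rename-and-annotate migration concrete, here is a minimal sketch that renders a plain-literal definition dict as `@dataclass` source text you could paste into a module. `dataclass_source` is a hypothetical helper written for this illustration and is not part of fargv. It only handles `int`, `float`, `bool`, and `str` defaults and leaves everything else (tuples, lists, `{key}` interpolation) as a TODO for manual migration.

```python
from dataclasses import fields


def dataclass_source(class_name: str, defaults: dict) -> str:
    """Render a plain-literal parameter dict as @dataclass source code."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {class_name}:",
    ]
    emitted = 0
    for name, value in defaults.items():
        if type(value) in (bool, int, float, str):
            # The annotation is simply the runtime type of the old default.
            lines.append(f"    {name}: {type(value).__name__} = {value!r}")
            emitted += 1
        else:
            # Choices, positional lists, etc. need a manual decision.
            lines.append(f"    # TODO: migrate {name!r} by hand (was {value!r})")
    if emitted == 0:
        lines.append("    pass")  # keep the class body syntactically valid
    return "\n".join(lines) + "\n"


old_definition = {
    "data_dir": "/datasets/imagenet",
    "epochs": 90,
    "lr": 0.1,
    "amp": False,
    "files": [],
}
source = dataclass_source("TrainConfig", old_definition)
print(source)
```

The generated text is ordinary source code, so the resulting class keeps the static typing and autocompletion benefits that a dynamically built class would lose.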