File Formats

JSON, YAML, TOML, INI/.env, CSV/TSV, JSON Lines, XML, and when to use them

JSON (JavaScript Object Notation)

Ubiquitous data-interchange format. Great default for APIs, configs, and storing structured data.

  • Pros: Simple, widely supported, strict, human-readable.
  • Cons: No comments by spec, limited types (no dates), strict quoting rules.
  • When handy: API payloads, cross-language data exchange, small to medium configs.
# Read JSON
import json
with open("config.json", "r", encoding="utf-8") as f:
    cfg = json.load(f)

# Write JSON (pretty)
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
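
The "no dates" limitation usually surfaces as a TypeError when dumping a datetime. The following is a minimal sketch of one common workaround, writing dates as ISO 8601 strings through the `default=` hook of `json.dumps`. The `encode_extra` helper and the `created` key are illustrative names, not something the format prescribes.

```python
import json
from datetime import datetime, timezone

def encode_extra(obj):
    """Fallback for types json cannot serialize (hypothetical helper)."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")

record = {"id": 1, "created": datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)}
text = json.dumps(record, default=encode_extra)

# On the way back the date is a plain string; the reader converts it explicitly.
loaded = json.loads(text)
created = datetime.fromisoformat(loaded["created"])
```

Both sides have to agree on the string convention, since nothing in the JSON itself marks the value as a date.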

YAML (YAML Ain't Markup Language)

Indentation-based superset of JSON, widely used for configs. More expressive than JSON (comments, anchors).

  • Pros: Comments, anchors/aliases, concise for nested configs.
  • Cons: Whitespace-sensitive (tabs are not allowed for indentation), parsers differ between YAML 1.1 and 1.2, implicit typing can surprise.
  • When handy: App/service configuration (CI/CD, Kubernetes, tools).
# pip install pyyaml
import yaml

# Read
with open("config.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# Write
with open("config.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
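
Anchors and implicit typing are easier to see in a small document. This sketch assumes PyYAML (which follows YAML 1.1 rules); the keys `defaults`, `staging`, and `flags` are made up for illustration.

```python
import yaml  # pip install pyyaml

doc = """
defaults: &defaults
  retries: 3
  timeout: 30

staging:
  <<: *defaults      # merge the anchored mapping
  timeout: 10        # then override one key

flags:
  enabled: no        # unquoted: PyYAML reads this as boolean False
  country: "no"      # quoted: stays a string
"""

cfg = yaml.safe_load(doc)
staging = cfg["staging"]   # retries inherited, timeout overridden
flags = cfg["flags"]
```

Quoting any value that must stay a string is a cheap way to sidestep the implicit-typing surprises.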

TOML (Tom's Obvious, Minimal Language)

Minimal config format used by Python tooling (pyproject.toml). Good balance of strictness and readability.

  • Pros: Deterministic, comments allowed, well-specified types.
  • Cons: Less common outside tooling; multi-line/complex structures can be verbose.
  • When handy: Project configs, dependency metadata, settings you want under version control.
# Python 3.11+: tomllib is built-in (read-only)
import sys
if sys.version_info >= (3, 11):
    import tomllib
    with open("pyproject.toml", "rb") as f:
        cfg = tomllib.load(f)
else:
    # pip install tomli
    import tomli
    with open("pyproject.toml", "rb") as f:
        cfg = tomli.load(f)

# Writing TOML (third-party)
# pip install tomlkit
from tomlkit import dumps as toml_dumps
text = toml_dumps({"tool": {"example": {"enabled": True}}})
with open("example.toml", "w", encoding="utf-8") as f:
    f.write(text)
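
"Well-specified types" means a TOML parser hands back native values rather than strings. A small sketch, parsing from a string instead of a file; the `[tool.example]` table and its keys are hypothetical.

```python
from datetime import date

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # pip install tomli

doc = """
[tool.example]
enabled = true
retries = 3
released = 2024-01-15
paths = ["src", "tests"]
"""

cfg = tomllib.loads(doc)
settings = cfg["tool"]["example"]

# Booleans, integers, dates, and arrays arrive as Python bool, int, date, list.
is_date = isinstance(settings["released"], date)
```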

INI / .env (Key-Value)

Simple key-value config files. INI supports sections; .env is one-key-per-line for environment variables.

  • Pros: Very simple, easy to edit, great for small settings.
  • Cons: Limited types/structures, quoting/escaping can be inconsistent.
  • When handy: Local overrides, credentials via env vars, legacy configs.
# INI
from configparser import ConfigParser
cfg = ConfigParser()
cfg.read("settings.ini", encoding="utf-8")  # missing files are silently skipped
value = cfg["section"]["key"]  # always a str; raises KeyError if absent

# .env
# pip install python-dotenv
import os
from dotenv import load_dotenv
load_dotenv(".env")  # loads into process env; existing variables are kept by default
token = os.getenv("API_TOKEN")  # None if the variable is unset
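
Because INI has no types, every value is read as a string and converted on access. A brief sketch using the typed getters of `configparser`; the `[server]` section and its keys are invented for the example.

```python
from configparser import ConfigParser

ini_text = """
[server]
host = localhost
port = 8080
debug = yes
"""

cfg = ConfigParser()
cfg.read_string(ini_text)

raw_port = cfg["server"]["port"]                       # plain lookup returns a str
port = cfg.getint("server", "port")                    # converted to int
debug = cfg.getboolean("server", "debug")              # accepts yes/no, on/off, true/false, 1/0
workers = cfg.getint("server", "workers", fallback=4)  # key is absent, so the fallback is used
```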

CSV / TSV (Tabular Data)

Row/column text formats. Universal for spreadsheets and simple datasets.

  • Pros: Human-readable, works with Excel/Sheets, tiny overhead.
  • Cons: No schema, no nested data, quoting/commas can be tricky.
  • When handy: Imports/exports, quick data exchange, small analytics.
import csv

# Read
with open("data.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Write
with open("out.csv", "w", newline="", encoding="utf-8") as f:
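    # extrasaction="ignore" drops columns not listed in fieldnames instead of raising ValueError
    w = csv.DictWriter(f, fieldnames=["id", "name"], extrasaction="ignore")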
    w.writeheader()
    w.writerows(rows)
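
The "quoting/commas" pitfall mostly disappears when the `csv` module does the escaping, and TSV is the same API with a different delimiter. A sketch using in-memory buffers; the sample rows are made up.

```python
import csv
import io

rows = [
    {"id": "1", "name": "Doe, Jane"},         # embedded comma
    {"id": "2", "name": 'Bob "Bobby" Ray'},   # embedded quotes
]

def dump(rows, delimiter):
    """Serialize rows to delimited text (hypothetical helper)."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=["id", "name"], delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = dump(rows, ",")    # the writer quotes fields that need it
tsv_text = dump(rows, "\t")   # TSV: same code, tab delimiter

# Reading with the matching delimiter restores the original values.
csv_round_trip = list(csv.DictReader(io.StringIO(csv_text, newline="")))
tsv_round_trip = list(csv.DictReader(io.StringIO(tsv_text, newline=""), delimiter="\t"))
```

Splitting lines by hand on "," would break on the first row; the reader handles the quoting rules.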

JSON Lines (NDJSON/JSONL)

One JSON object per line. Great for logs, streaming, and incremental processing.

  • Pros: Append-friendly, streamable, resilient to partial files.
  • Cons: No global structure; must process line-by-line.
  • When handy: Logging, big data pipelines, long-running jobs.
import json

# Read NDJSON
with open("events.jsonl", "r", encoding="utf-8") as f:
    events = [json.loads(line) for line in f if line.strip()]

# Write NDJSON (mode "w" overwrites; use "a" to append to an existing file)
with open("events.jsonl", "w", encoding="utf-8") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")
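
The append-friendly and partial-file properties can be sketched as follows. The helpers `append_event` and `read_events` are hypothetical names, and the temp-file path only keeps the example self-contained.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "events.jsonl")

def append_event(path, event):
    """Add one record without rewriting the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")

def read_events(path):
    """Yield events one at a time, skipping blank or truncated lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partial final line after a crash

append_event(path, {"type": "start"})
append_event(path, {"type": "stop"})
with open(path, "a", encoding="utf-8") as f:
    f.write('{"type": "trunc')  # simulate a write cut off mid-line

events = list(read_events(path))  # the two complete records survive
```

Using a generator keeps memory flat for large files; wrap it in `list()` only when the data is known to be small.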

XML (Extensible Markup Language)

Verbose but structured markup. Common in older systems and some tooling.

  • Pros: Schemas (XSD), namespaces, streaming parsers.
  • Cons: Verbose, harder to read/write by hand vs JSON/YAML.
  • When handy: Legacy integrations, standards-based documents.
import xml.etree.ElementTree as ET
tree = ET.parse("data.xml")
root = tree.getroot()
# Collect the attribute dict of every <item> element anywhere in the tree
items = [el.attrib for el in root.findall(".//item")]
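
For documents too large to load whole, the streaming parsers mentioned above can be sketched with `ET.iterparse`, which handles elements as they are completed. The `<catalog>` sample and its attributes are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b"""<catalog>
  <item id="1" name="alpha"/>
  <item id="2" name="beta"/>
</catalog>"""

items = []
# "end" events fire once an element has been fully parsed.
for event, el in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if el.tag == "item":
        items.append(dict(el.attrib))  # copy before clearing
        el.clear()                     # release the element's contents once processed
```

With a real file, pass the path or an open binary file object in place of the `BytesIO` buffer.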

Choosing the right format

  • JSON: default for APIs and general interchange.
  • YAML: human-edited app/service config with comments.
  • TOML: project/tooling configuration (pyproject.toml).
  • INI/.env: tiny configs, local dev secrets via env vars.
  • CSV/TSV: flat tabular data, spreadsheets, quick exports.
  • JSONL: logs/streams and append-only pipelines.
  • XML: standards/legacy systems that require it.