File Formats
JSON, YAML, TOML, INI/.env, CSV/TSV, JSON Lines, XML, and when to use them
JSON (JavaScript Object Notation)
Ubiquitous data-interchange format. Great default for APIs, configs, and storing structured data.
- Pros: Simple, widely supported, strict, human-readable.
- Cons: No comments by spec, limited types (no date/time type), strict syntax (double-quoted keys and strings only, no trailing commas).
- When handy: API payloads, cross-language data exchange, small to medium configs.
# Read JSON
import json
with open("config.json", "r", encoding="utf-8") as f:
    cfg = json.load(f)
# Write JSON (pretty)
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
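Because JSON has no date type, dates usually travel as ISO 8601 strings. Below is a minimal sketch using the standard library's default= hook on json.dumps. The helper name to_jsonable and the sample record are hypothetical, and turning the string back into a date on read is left to the caller. The last lines show the strict syntax rejecting a trailing comma.

```python
import json
from datetime import date, datetime

def to_jsonable(obj):
    # Hypothetical helper: json.dumps calls it only for objects it cannot
    # serialize natively; here dates/datetimes become ISO 8601 strings.
    if isinstance(obj, (date, datetime)):
        return obj.isoformat()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

record = {"id": 7, "created": date(2024, 5, 1)}
text = json.dumps(record, default=to_jsonable)
# text == '{"id": 7, "created": "2024-05-01"}'

# Parsing returns a plain string; converting it back is the reader's job.
loaded = json.loads(text)
created = date.fromisoformat(loaded["created"])

# Strict syntax on input: a trailing comma is a decode error.
try:
    json.loads('{"a": 1,}')
    rejected = False
except json.JSONDecodeError:
    rejected = True
```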
YAML (YAML Ain't Markup Language)
Indentation-based config format that is largely a superset of JSON (as of YAML 1.2). More expressive than JSON (comments, anchors).
- Pros: Comments, anchors/aliases, concise for nested configs.
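- Cons: Whitespace-sensitive (tabs are not allowed for indentation), several spec versions (1.1 vs 1.2), implicit typing surprises (an unquoted no is read as false under YAML 1.1).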
- When handy: App/service configuration (CI/CD, Kubernetes, tools).
# pip install pyyaml
import yaml
# Read
with open("config.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)
# Write
with open("config.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
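A small sketch of anchors/aliases and the implicit-typing pitfall, parsed with PyYAML's safe_load (which follows YAML 1.1 rules). The keys and values are made up for illustration; the merge key (<<) copies an anchored mapping into another one, and local keys win.

```python
import yaml  # pip install pyyaml

doc = """
defaults: &defaults   # anchor: give this mapping a name
  retries: 3
  timeout: 10
prod:
  <<: *defaults       # merge key: pull in the anchored mapping
  timeout: 30         # local keys override merged ones
country: no           # YAML 1.1 implicit typing: parsed as a boolean
country_quoted: "no"  # quoting keeps it a string
"""

cfg = yaml.safe_load(doc)
prod = cfg["prod"]            # {'retries': 3, 'timeout': 30}
country = cfg["country"]      # False, not the string "no"
```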
TOML (Tom's Obvious, Minimal Language)
Minimal config format used by Python tooling (pyproject.toml). Good balance of strictness and readability.
- Pros: Unambiguous spec, comments allowed, well-specified native types (including dates and times).
- Cons: Less common outside tooling; multi-line/complex structures can be verbose.
- When handy: Project configs, dependency metadata, settings you want under version control.
# Python 3.11+: tomllib is built-in (read-only)
try:
    import tomllib
except ModuleNotFoundError:
    # Older Pythons: pip install tomli (same API)
    import tomli as tomllib
with open("pyproject.toml", "rb") as f:  # tomllib needs a binary file
    cfg = tomllib.load(f)
# Writing TOML (third-party)
# pip install tomlkit
from tomlkit import dumps as toml_dumps
text = toml_dumps({"tool": {"example": {"enabled": True}}})
with open("example.toml", "w", encoding="utf-8") as f:
    f.write(text)
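To show what "well-specified types" means in practice, this sketch parses an inline document with tomllib.loads (falling back to tomli on Python versions before 3.11). All keys are hypothetical; note that the date arrives as a datetime.date, not a string.

```python
import datetime

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # pip install tomli

doc = """
# Comments are allowed in TOML
title = "example"
released = 2024-05-01          # local date -> datetime.date

[server]
port = 8080                    # integer
debug = false                  # boolean

[[users]]                      # array of tables
name = "ana"

[[users]]
name = "ben"
"""

cfg = tomllib.loads(doc)
is_date = isinstance(cfg["released"], datetime.date)  # True
user_names = [u["name"] for u in cfg["users"]]        # ['ana', 'ben']
```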
INI / .env (Key-Value)
Simple key-value config files. INI supports sections; .env is one-key-per-line for environment variables.
- Pros: Very simple, easy to edit, great for small settings.
- Cons: Limited types/structures, quoting/escaping can be inconsistent.
- When handy: Local overrides, credentials via env vars, legacy configs.
# INI
from configparser import ConfigParser
cfg = ConfigParser()
cfg.read("settings.ini", encoding="utf-8")  # silently skips missing files; returns the list it actually read
value = cfg["section"]["key"]  # values are always strings
# .env
# pip install python-dotenv
import os
from dotenv import load_dotenv
load_dotenv(".env")  # loads KEY=value pairs into os.environ; existing variables are not overridden by default
token = os.getenv("API_TOKEN")  # None if the variable is unset
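Since INI values are plain strings, ConfigParser offers typed getters that convert on access. A sketch with a made-up [server] section, showing [DEFAULT] inheritance and the fallback= keyword for missing keys:

```python
from configparser import ConfigParser

cfg = ConfigParser()
cfg.read_string("""
[DEFAULT]
timeout = 10

[server]
port = 8080
debug = yes
""")

port = cfg.getint("server", "port")                     # "8080" -> 8080
debug = cfg.getboolean("server", "debug")               # "yes" -> True
timeout = cfg.getint("server", "timeout")               # inherited from [DEFAULT]
retries = cfg.getint("server", "retries", fallback=3)   # missing key -> fallback
raw = cfg["server"]["port"]                             # plain access: still a string
```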
CSV / TSV (Tabular Data)
Row/column text formats. Universal for spreadsheets and simple datasets.
- Pros: Human-readable, works with Excel/Sheets, tiny overhead.
- Cons: No schema, no nested data, quoting/commas can be tricky.
- When handy: Imports/exports, quick data exchange, small analytics.
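import csv
# Read: each row becomes a dict keyed by the header line (all values are strings)
with open("data.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
# Write: extrasaction="ignore" drops columns not listed in fieldnames
# (the default, "raise", raises ValueError for them)
with open("out.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["id", "name"], extrasaction="ignore")
    w.writeheader()
    w.writerows(rows)
# TSV: pass delimiter="\t" to the reader/writer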
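The csv module takes care of the quoting pitfalls for you. This sketch round-trips invented rows containing a comma and embedded quotes through an in-memory buffer, then writes similar data as TSV with delimiter="\t".

```python
import csv
import io

rows = [
    {"id": "1", "name": "Smith, Jane"},  # comma inside a field
    {"id": "2", "name": 'say "hi"'},     # embedded double quotes
]

# Write CSV: fields containing the delimiter or quote char are quoted,
# and inner quotes are doubled.
buf = io.StringIO()
w = csv.DictWriter(buf, fieldnames=["id", "name"])
w.writeheader()
w.writerows(rows)
text = buf.getvalue()
# text == 'id,name\r\n1,"Smith, Jane"\r\n2,"say ""hi"""\r\n'

# Reading it back restores the original values.
back = list(csv.DictReader(io.StringIO(text)))

# TSV: same module, different delimiter; the comma needs no quoting now.
tsv = io.StringIO()
csv.writer(tsv, delimiter="\t").writerows([["id", "name"], ["1", "Smith, Jane"]])
```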
JSON Lines (NDJSON/JSONL)
One JSON object per line. Great for logs, streaming, and incremental processing.
- Pros: Append-friendly, streamable, resilient to partial files (a truncated write damages only the last line).
- Cons: No global structure; must process line-by-line.
- When handy: Logging, big data pipelines, long-running jobs.
import json
# Read NDJSON
with open("events.jsonl", "r", encoding="utf-8") as f:
    events = [json.loads(line) for line in f if line.strip()]
# Write NDJSON
with open("events.jsonl", "w", encoding="utf-8") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")
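A sketch of the append-and-stream pattern, assuming a partially written final line should be skipped rather than abort the whole read. append_event and iter_events are hypothetical helper names, and the file lives in a temporary directory for the demo.

```python
import json
import os
import tempfile

def append_event(path, event):
    # Hypothetical helper: append mode adds one line and never rewrites old ones.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def iter_events(path):
    # Hypothetical helper: stream one record at a time instead of loading all.
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # e.g. a final line cut off by a crash: skip it, keep going
                print(f"skipping malformed line {lineno}")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.jsonl")
    append_event(path, {"id": 1})
    append_event(path, {"id": 2})
    with open(path, "a", encoding="utf-8") as f:
        f.write('{"id": 3, "trunc')  # simulate an interrupted write
    events = list(iter_events(path))  # [{'id': 1}, {'id': 2}]
```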
XML (Extensible Markup Language)
Verbose but structured markup. Common in older systems and some tooling.
- Pros: Schemas (XSD), namespaces, streaming parsers.
- Cons: Verbose, harder to read/write by hand vs JSON/YAML.
- When handy: Legacy integrations, standards-based documents.
import xml.etree.ElementTree as ET
tree = ET.parse("data.xml")
root = tree.getroot()
items = [el.attrib for el in root.findall(".//item")]
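Streaming parsers let you process large XML files without building the whole tree first. A minimal sketch with ElementTree.iterparse on an invented in-memory document; in real use you would pass a file path or binary file object instead.

```python
import io
import xml.etree.ElementTree as ET

data = b"""<items>
  <item id="1">apple</item>
  <item id="2">pear</item>
</items>"""

ids = []
# iterparse yields (event, element) pairs while reading; an "end" event fires
# once an element is complete, so its attributes and text are available.
for event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
    if elem.tag == "item":
        ids.append((elem.get("id"), elem.text))
        elem.clear()  # release the item's text, attributes, and children
```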
Choosing the right format
- JSON: default for APIs and general interchange.
- YAML: human-edited app/service config with comments.
- TOML: project/tooling configuration (pyproject.toml).
- INI/.env: tiny configs, local dev secrets via env vars.
- CSV/TSV: flat tabular data, spreadsheets, quick exports.
- JSONL: logs/streams and append-only pipelines.
- XML: standards/legacy systems that require it.