Maintenance
We have created a Python library that provides easy access to our large data catalog. It also supports our ETL work, as it contains various methods and objects essential to the data wrangling processes.
Currently, this library lives in the etl repository, under lib/catalog.
Installation
Simply install it from PyPI:
Update release
Once your changes to the library are ready, publishing to PyPI is automated. To release a new version:
- Bump the version in lib/catalog/pyproject.toml
- Update the changelog in lib/catalog/README.md
- Commit and push to master; the package will then be published to PyPI automatically via GitHub Actions
The workflow triggers automatically when lib/catalog/pyproject.toml changes on the master branch. It includes a safety check to ensure the version was actually bumped before publishing.
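To illustrate the kind of safety check described above, here is a minimal sketch in Python. It is not the actual workflow code: the function names (`read_version`, `version_key`, `was_bumped`) are hypothetical, and it assumes the version is a plain dotted number such as `0.4.2` stored on a `version = "..."` line in pyproject.toml.

```python
import re


def read_version(pyproject_text: str) -> str:
    """Return the first `version = "..."` value found in pyproject.toml contents."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version field found")
    return match.group(1)


def version_key(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' version into a tuple of ints so it compares numerically."""
    return tuple(int(part) for part in version.split("."))


def was_bumped(old_text: str, new_text: str) -> bool:
    """True only if the new version is strictly greater than the old one."""
    return version_key(read_version(new_text)) > version_key(read_version(old_text))


# Hypothetical file contents before and after a release commit.
old = '[project]\nname = "example-package"\nversion = "0.4.2"\n'
new = '[project]\nname = "example-package"\nversion = "0.4.3"\n'

print(was_bumped(old, new))  # version increased, so publishing could proceed
print(was_bumped(old, old))  # version unchanged, so publishing should be skipped
```

Comparing tuples of integers rather than raw strings matters here: as strings, `"0.10.0"` would sort before `"0.9.9"`, which would wrongly block a legitimate release.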
Manual trigger: You can still manually trigger the workflow by clicking Run Workflow in GitHub Actions if needed.
Generate llms.txt
The library ships an llms.txt file (at docs/libraries/catalog/llms.txt) that is auto-generated from module docstrings and documentation markdown files. To regenerate it after changing docstrings or docs:
This runs docs/ignore/others/bake_llms_txt.py, which inspects the public API surface and doc files so the output stays in sync with the codebase.
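As a rough illustration of how docstring-based generation can work, the sketch below collects the first docstring line of each public function or class in a module and writes them as a short list. It is not the contents of bake_llms_txt.py: the helper names (`public_api_lines`, `bake`) and the stand-in `demo` module are hypothetical, and the real script also reads documentation markdown files, which this sketch omits.

```python
import inspect
import types


def public_api_lines(module):
    """Yield one '- name: summary' line per public, documented function or class."""
    names = getattr(module, "__all__", None) or [
        name for name in dir(module) if not name.startswith("_")
    ]
    for name in sorted(names):
        obj = getattr(module, name, None)
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        doc = inspect.getdoc(obj)
        if doc:
            # Keep only the first docstring line as a short summary.
            yield f"- {name}: {doc.splitlines()[0]}"


def bake(module, title):
    """Assemble a small llms.txt-style text from a module's docstrings."""
    lines = [f"# {title}", ""]
    summary = inspect.getdoc(module)
    if summary:
        lines += [summary, ""]
    lines += list(public_api_lines(module))
    return "\n".join(lines) + "\n"


# Stand-in module built in memory so the example is self-contained.
demo = types.ModuleType("demo", "Demo module docstring.")


def load(path):
    """Load a table from a path."""


def _private():
    """Hidden helper that should not appear in the output."""


demo.load = load
demo._private = _private

text = bake(demo, "demo")
print(text)
```

Because the text is derived from the live objects rather than maintained by hand, regenerating it after a docstring change is enough to keep it current.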