DwC-A
pyinaturalist_convert.dwca
Download and convert the iNaturalist GBIF and taxonomy datasets from DwC-A to SQLite.
Extra dependencies: sqlalchemy
Example: Download everything and load into a SQLite database:
>>> from pyinaturalist_convert import load_dwca_tables
>>> load_dwca_tables()
Note
By default, data is saved in the recommended platform-specific data directory, for example ~\AppData\Local\ on Windows, or ~/.local/share/ on Linux. Use the db_path argument to choose a different location.
Main functions:
load_dwca_tables() | Download observation and taxonomy archives and load into a SQLite database.
load_dwca_observations() | Create or update an observations SQLite table from the GBIF DwC-A archive.
load_dwca_taxa() | Create or update a taxonomy SQLite table from the GBIF DwC-A archive.
- pyinaturalist_convert.dwca.download_dwca_observations(dest_dir=PosixPath('/home/docs/.local/share/pyinaturalist'))
Download and extract the DwC-A research-grade observations dataset. Reuses local data if it already exists and is up to date.
Example to load into a SQLite database (using the sqlite3 shell, from bash):
export DATA_DIR="$HOME/.local/share/pyinaturalist"
sqlite3 -csv $DATA_DIR/observations.db ".import $DATA_DIR/gbif-observations-dwca/observations.csv observations"
- pyinaturalist_convert.dwca.download_dwca_taxa(dest_dir=PosixPath('/home/docs/.local/share/pyinaturalist'))
Download and extract the DwC-A taxonomy dataset. Reuses local data if it already exists and is up to date.
- pyinaturalist_convert.dwca.load_dwca_observations(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/gbif-observations-dwca/observations.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), progress=None)
Create or update an observations SQLite table from the GBIF DwC-A archive. This keeps only the most relevant subset of columns available in the archive, in a format consistent with API results and other sources.
To load everything as-is, see load_full_dwca_observations().
- pyinaturalist_convert.dwca.load_dwca_tables(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))
Download observation and taxonomy archives and load into a SQLite database.
As of 2022-05, this will require about 42GB of free disk space while loading, and the final database will be around 8GB.
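Given the size of the download, it can be worth checking free disk space before starting. The following is a minimal sketch using only the standard library; the helper names free_space_gb and has_room_for_dwca are hypothetical and not part of pyinaturalist_convert, and the 42 GB threshold is the 2022-05 estimate quoted above, so it may be out of date.

```python
import shutil
from pathlib import Path

# Estimate quoted in the docs as of 2022-05; treat it as a rough figure
REQUIRED_GB = 42


def free_space_gb(path: Path) -> float:
    """Return free space, in GB (10^9 bytes), on the filesystem containing `path`."""
    path = Path(path).expanduser().resolve()
    # The data directory may not exist yet, so walk up to the nearest existing parent
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free / 1e9


def has_room_for_dwca(path: Path, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb
```

A caller could run has_room_for_dwca() on the intended data directory and only call load_dwca_tables() when it returns True.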
- pyinaturalist_convert.dwca.load_dwca_taxa(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/inaturalist-taxonomy.dwca/taxa.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), column_map={'id': 'id', 'parentNameUsageID': 'parent_id', 'references': 'reference_url', 'scientificName': 'name', 'taxonRank': 'rank'}, progress=None)
Create or update a taxonomy SQLite table from the GBIF DwC-A archive.
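To illustrate what the default column_map expresses, the sketch below renames Darwin Core column names to the shorter names in the mapping. It is not the library's implementation: rename_columns is a hypothetical helper, the sample row is made up, and dropping unmapped columns is an assumption about the intended behavior.

```python
import csv
import io

# Default mapping, copied from the load_dwca_taxa() signature
COLUMN_MAP = {
    'id': 'id',
    'parentNameUsageID': 'parent_id',
    'references': 'reference_url',
    'scientificName': 'name',
    'taxonRank': 'rank',
}


def rename_columns(rows, column_map=COLUMN_MAP):
    """Yield rows with mapped columns renamed; unmapped columns are dropped (assumption)."""
    for row in rows:
        yield {new: row.get(old) for old, new in column_map.items()}


# Made-up sample data, for illustration only
sample_csv = io.StringIO(
    "id,parentNameUsageID,references,scientificName,taxonRank,kingdom\n"
    "2,1,https://example.org/taxa/2,Exampleus,genus,Animalia\n"
)
renamed = list(rename_columns(csv.DictReader(sample_csv)))
```

Here renamed holds one dict with the keys id, parent_id, reference_url, name, and rank; the extra kingdom column is not carried over.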
- pyinaturalist_convert.dwca.load_full_dwca_observations(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/gbif-observations-dwca/observations.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))
Create an observations SQLite table from the GBIF DwC-A archive, using all columns exactly as they appear in the archive.
This requires the sqlite3 executable to be installed on the system, since its .import command is by far the fastest way to load this.
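Where the sqlite3 executable is not available, a similar all-columns import can be approximated with Python's standard library alone. This is a sketch of an alternative, not what load_full_dwca_observations() does: import_csv is a hypothetical helper, it is likely to be much slower than .import on the full archive, and it assumes every row has the same number of fields as the header.

```python
import csv
import sqlite3


def import_csv(csv_path, db_path, table="observations"):
    """Load every column of a CSV file into a SQLite table; return the table's row count."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Quote identifiers, since archive column names may not be plain SQL names
        cols = ", ".join('"{}"'.format(c.replace('"', '""')) for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
                # Stream rows straight from the reader to avoid holding the file in memory
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
            return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        finally:
            conn.close()
```

Columns are created without declared types, so values are stored as the text read from the CSV, which roughly matches the "exactly as they appear in the archive" behavior described above.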