Inserting data#
Creating tables#
The dclient create-table command creates a new empty table with an explicit schema. Define columns with --column name type and optionally set primary keys with --pk:
dclient create-table mydb dogs \
--column id integer \
--column name text \
--column age integer \
--pk id \
-i myapp
This hits the Datasette create API with a columns array. The response includes the generated schema:
{
"ok": true,
"database": "mydb",
"table": "dogs",
"schema": "CREATE TABLE [dogs] (\n [id] INTEGER PRIMARY KEY,\n [name] TEXT,\n [age] INTEGER\n)"
}
Compound primary keys are supported by passing --pk multiple times:
dclient create-table mydb events \
--column user_id integer \
--column event_id integer \
--column data text \
--pk user_id --pk event_id
If you want to create a table and populate it with data in one step, use dclient insert --create instead.
Inserting rows#
The dclient insert command can be used to insert data from a local file directly into a Datasette instance, via the Write API introduced in the Datasette 1.0 alphas.
First you’ll need to authenticate with the instance.
To insert data from a data.csv file into a table called my_table in the data database:
dclient insert data my_table data.csv --create -i myapp
You can also pipe data into standard input:
curl -s 'https://api.github.com/repos/simonw/dclient/issues' | \
dclient insert data issues - --create -i myapp
Upserting data#
The dclient upsert command works exactly like insert but uses the upsert endpoint, which will update existing rows with matching primary keys rather than raising an error.
dclient upsert data my_table data.csv --csv -i myapp
Streaming data#
dclient insert works for streaming data as well.
If you have a log file containing newline-delimited JSON you can tail it and send it to a Datasette instance like this:
tail -f log.jsonl | \
dclient insert data logs - --nl -i myapp
When reading from standard input (filename -) you are required to specify the format. In this example that’s --nl for newline-delimited JSON. --csv and --tsv are supported for streaming as well, but --json is not.
In streaming mode records default to being sent to the server every 100 records or every 10 seconds, whichever comes first. You can adjust these values using the --batch-size and --interval settings. For example, here’s how to send every 10 records or if 5 seconds has passed since the last time data was sent to the server:
tail -f log.jsonl | dclient insert \
data logs - --nl --create -i myapp \
--batch-size 10 \
--interval 5
Supported formats#
Data can be inserted from CSV, TSV, JSON or newline-delimited JSON files.
The format of the file will be automatically detected. You can override this by using one of the following options:
--csv--tsv--json--nlfor newline-delimited JSON
Use --encoding <encoding> to specify the encoding of the file. The default is utf-8.
JSON#
JSON files should be formatted like this:
[
{
"id": 1
"column1": "value1",
"column2": "value2"
},
{
"id": 2
"column1": "value1",
"column2": "value2"
}
]
Newline-delimited files like this:
{"id": 1, "column1": "value1", "column2": "value2"}
{"id": 2, "column1": "value1", "column2": "value2"}
CSV and TSV#
CSV and TSV files should have a header row containing the names of the columns.
By default, dclient will attempt to detect the types of the different columns in the CSV and TSV files - so if a column only ever contains numeric integers it will be stored as integers in the SQLite database.
You can disable this and have every value treated as a string using --no-detect-types.
Other options#
--create- create the table if it doesn’t already exist--replace- replace any rows with a matching primary key--ignore- ignore any rows with a matching existing primary key--alter- alter table to add any columns that are missing--pk id- set a primary key (for if the table is being created)
If you use --create a table will be created with rows to match the columns in your uploaded data - using the correctly detected types, unless you use --no-detect-types in which case every column will be of type text.
dclient insert –help#
Usage: dclient insert [OPTIONS] DATABASE TABLE FILEPATH
Insert data into a remote Datasette instance
Example usage:
dclient insert main mytable data.csv --csv -i myapp
dclient insert main mytable data.csv --csv --create --pk id
Options:
-i, --instance TEXT Datasette instance URL or alias
--csv Input is CSV
--tsv Input is TSV
--json Input is JSON
--nl Input is newline-delimited JSON
--encoding TEXT Character encoding for CSV/TSV
--no-detect-types Don't detect column types for CSV/TSV
--alter Alter table to add any missing columns
--pk TEXT Columns to use as the primary key when creating the
table
--batch-size INTEGER Send rows in batches of this size
--interval FLOAT Send batch at least every X seconds
--token TEXT API token
--silent Don't output progress
-v, --verbose Verbose output: show HTTP request and response
--replace Replace rows with a matching primary key
--ignore Ignore rows with a matching primary key
--create Create table if it does not exist
--help Show this message and exit.
dclient upsert –help#
Usage: dclient upsert [OPTIONS] DATABASE TABLE FILEPATH
Upsert data into a remote Datasette instance
Example usage:
dclient upsert main mytable data.csv --csv -i myapp
Options:
-i, --instance TEXT Datasette instance URL or alias
--csv Input is CSV
--tsv Input is TSV
--json Input is JSON
--nl Input is newline-delimited JSON
--encoding TEXT Character encoding for CSV/TSV
--no-detect-types Don't detect column types for CSV/TSV
--alter Alter table to add any missing columns
--pk TEXT Columns to use as the primary key when creating the
table
--batch-size INTEGER Send rows in batches of this size
--interval FLOAT Send batch at least every X seconds
--token TEXT API token
--silent Don't output progress
-v, --verbose Verbose output: show HTTP request and response
--help Show this message and exit.
dclient create-table –help#
Usage: dclient create-table [OPTIONS] DATABASE TABLE_NAME
Create a new empty table with an explicit schema
Example usage:
dclient create-table mydb dogs \
--column id integer --column name text --pk id
Options:
-c, --column TEXT... Column definition: name type (e.g. --column id integer
--column name text)
--pk TEXT Column(s) to use as primary key
-i, --instance TEXT Datasette instance URL or alias
--token TEXT API token
-v, --verbose Verbose output: show HTTP request and response
--help Show this message and exit.