dlt.destinations.impl.clickhouse_cluster.clickhouse_cluster_adapter
clickhouse_cluster_adapter
def clickhouse_cluster_adapter(
    data: Any,
    table_engine_type: Optional[TTableEngineType] = None,
    sort: Optional[TSQLExprOrColumnSeq] = None,
    partition: Optional[TSQLExprOrColumnSeq] = None,
    settings: Optional[TMergeTreeSettings] = None,
    codecs: Optional[TColumnCodecs] = None,
    create_distributed_tables: Optional[bool] = None,
    distributed_table_suffix: Optional[str] = None,
    sharding_key: Optional[str] = None) -> DltResource
Adapts the given data by applying ClickHouse Cluster-specific hints.
Arguments:
data (Any) - The data to be transformed. It can be raw data or an instance of DltResource. If raw data is passed, the function wraps it in a DltResource object.
table_engine_type (TTableEngineType, optional) - The table engine type used when creating the ClickHouse table.
sort (TSQLExprOrColumnSeq, optional) - Sorting key, given as a SQL expression or a sequence of column names. Used to generate the ORDER BY clause of the table creation statement. If passing a SQL expression, use normalized column names when referring to columns.
partition (TSQLExprOrColumnSeq, optional) - Partition key, given as a SQL expression or a sequence of column names. Used to generate the PARTITION BY clause of the table creation statement. If passing a SQL expression, use normalized column names when referring to columns.
settings (TMergeTreeSettings, optional) - Dictionary of MergeTree settings to apply to the table. Added to the SETTINGS clause of the table creation statement.
codecs (TColumnCodecs, optional) - Dictionary of codecs to apply to the table's columns. Added as CODEC clauses in the column definitions of the table creation statement.
create_distributed_tables (bool, optional) - Whether to create distributed tables in addition to standard tables.
distributed_table_suffix (str, optional) - Suffix to append to table names when creating distributed tables. For example, if set to _dist, a table named events will have a distributed table named events_dist.
sharding_key (str, optional) - Sharding key expression to use for distributed tables.
Returns:
DltResource - A resource with applied ClickHouse Cluster-specific hints.
Raises:
ValueError - If the value of table_engine_type is invalid.
TypeError - If the input types for sort, partition, settings, or codecs are invalid.
Examples:
Set table engine type:
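from dlt.destinations.impl.clickhouse_cluster.clickhouse_cluster_adapter import clickhouse_cluster_adapter

data = [{"name": "Alice", "description": "Software Developer"}]
clickhouse_cluster_adapter(data, table_engine_type="merge_tree")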
Set sort and partition keys:
data = [{"date": "2024-01-01", "town": "Springfield", "street": "Evergreen Terrace"}]
clickhouse_cluster_adapter(
    data,
    sort=["town", "street"],  # can also be a SQL expression
    partition="toYYYYMM(date)",  # can also be a sequence of column names
)
Set MergeTree settings:
clickhouse_cluster_adapter(
    data,
    settings={"allow_nullable_key": True, "max_suspicious_broken_parts": 500},
)
Set column codecs:
clickhouse_cluster_adapter(
    data,
    codecs={"town": "LZ4HC", "street": "Delta, ZSTD(2)"},
)
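The sort, partition, settings, and codecs hints all end up in clauses of the table creation statement. The following is a minimal sketch of how such hints could map onto ClickHouse DDL. It does not use dlt and is not dlt's actual implementation: sketch_create_table, render_key, and render_setting are hypothetical helpers, and the table name and column types are assumptions chosen for illustration.

```python
from typing import Dict, Optional, Sequence, Union

# Mirrors the idea of "SQL expression or sequence of column names".
SQLExprOrColumns = Union[str, Sequence[str]]


def render_key(key: SQLExprOrColumns) -> str:
    """Return a SQL expression unchanged, or render column names as a tuple."""
    if isinstance(key, str):
        return key
    return "(" + ", ".join(key) + ")"


def render_setting(value: object) -> str:
    """Render a setting value; booleans become 1 or 0."""
    if isinstance(value, bool):
        return str(int(value))
    return str(value)


def sketch_create_table(
    table: str,
    columns: Dict[str, str],
    sort: Optional[SQLExprOrColumns] = None,
    partition: Optional[SQLExprOrColumns] = None,
    settings: Optional[Dict[str, object]] = None,
    codecs: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical helper: build a MergeTree CREATE TABLE statement from hints."""
    codecs = codecs or {}
    column_defs = []
    for name, column_type in columns.items():
        column_def = f"{name} {column_type}"
        if name in codecs:
            # Codecs are attached to individual column definitions.
            column_def += f" CODEC({codecs[name]})"
        column_defs.append(column_def)

    # MergeTree needs an ORDER BY clause; tuple() means "no sorting key".
    order_by = render_key(sort) if sort is not None else "tuple()"
    lines = [
        f"CREATE TABLE {table} (",
        "    " + ",\n    ".join(column_defs),
        ")",
        "ENGINE = MergeTree",
        f"ORDER BY {order_by}",
    ]
    if partition is not None:
        lines.append(f"PARTITION BY {render_key(partition)}")
    if settings:
        rendered = ", ".join(f"{k} = {render_setting(v)}" for k, v in settings.items())
        lines.append(f"SETTINGS {rendered}")
    return "\n".join(lines)


print(
    sketch_create_table(
        "addresses",  # assumed table name
        {"date": "Date", "town": "String", "street": "String"},  # assumed types
        sort=["town", "street"],
        partition="toYYYYMM(date)",
        settings={"allow_nullable_key": True, "max_suspicious_broken_parts": 500},
        codecs={"town": "LZ4HC", "date": "Delta, ZSTD(2)"},
    )
)
```

The sketch places Delta on the date column rather than on a string column, because Delta is designed for fixed-size types.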
Create distributed tables with specific suffix and sharding key:
clickhouse_cluster_adapter(
    data,
    create_distributed_tables=True,
    distributed_table_suffix="_distributed",
    sharding_key="city_id % 4",
)
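To make the distributed-table hints more concrete, here is a rough sketch of the naming rule and of the kind of statement ClickHouse's Distributed engine expects, Distributed(cluster, database, table[, sharding_key]). It is not dlt's implementation: the helper functions are hypothetical, and the cluster name my_cluster, database analytics, and table events are assumptions. The last helper illustrates how a sharding expression such as city_id % 4 would spread rows, assuming four shards of equal weight.

```python
from typing import Optional


def distributed_table_name(table: str, suffix: str) -> str:
    """Distributed table name = base table name + suffix."""
    return f"{table}{suffix}"


def sketch_distributed_ddl(
    cluster: str,
    database: str,
    table: str,
    suffix: str,
    sharding_key: Optional[str] = None,
) -> str:
    """Hypothetical helper: DDL for a Distributed table over a local table."""
    engine_args = [cluster, database, table]
    if sharding_key is not None:
        # The sharding key is an optional fourth engine argument.
        engine_args.append(sharding_key)
    name = distributed_table_name(table, suffix)
    return (
        f"CREATE TABLE {database}.{name} ON CLUSTER {cluster} "
        f"AS {database}.{table} "
        f"ENGINE = Distributed({', '.join(engine_args)})"
    )


def shard_index(city_id: int, shard_count: int = 4) -> int:
    """Evaluate `city_id % 4`, then take the remainder by the total shard
    weight (assumed here: shard_count shards with weight 1 each)."""
    return (city_id % 4) % shard_count


print(distributed_table_name("events", "_dist"))
print(
    sketch_distributed_ddl(
        "my_cluster", "analytics", "events", "_distributed", "city_id % 4"
    )
)
for city_id in (8, 9, 10, 11):
    print(city_id, "->", shard_index(city_id))
```

Note that the sharding expression can only refer to columns that exist in the table, so a key such as city_id % 4 presumes the loaded data contains a city_id column.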