API Reference

intake_avro.source.AvroTableSource(urlpath) Source to load tabular Avro datasets.
intake_avro.source.AvroSequenceSource(urlpath) Source to load Avro datasets as a sequence of Python dicts.
class intake_avro.source.AvroTableSource(urlpath, metadata=None, storage_options=None)[source]

Source to load tabular Avro datasets.

Parameters:
urlpath: str

Location of the data files; can include protocol and glob characters.

Attributes:
cache_dirs
datashape
description
hvplot

Returns an hvPlot object to provide a high-level plotting API.

plot

Returns an hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots.

Methods

close() Close open resources corresponding to this data source.
discover() Open the resource and populate the source attributes.
read() Load the entire dataset into a container and return it.
read_chunked() Return an iterator over container fragments of the data source.
read_partition(i) Return an (offset_tuple, container) pair corresponding to the i-th partition.
to_dask() Create a lazy dask DataFrame object.
to_spark() Pass the URL to Spark to load as a DataFrame.
yaml([with_plugin]) Return a YAML representation of this data source.
set_cache_dir  
read()[source]

Load the entire dataset into a container and return it.

to_dask()[source]

Create a lazy dask DataFrame object.

to_spark()[source]

Pass the URL to Spark to load as a DataFrame.

Note that this requires org.apache.spark.sql.avro.AvroFileFormat to be available on your Spark classpath.

This feature is experimental.

class intake_avro.source.AvroSequenceSource(urlpath, metadata=None, storage_options=None)[source]

Source to load Avro datasets as a sequence of Python dicts.

Parameters:
urlpath: str

Location of the data files; can include protocol and glob characters.

Attributes:
cache_dirs
datashape
description
hvplot

Returns an hvPlot object to provide a high-level plotting API.

plot

Returns an hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots.

Methods

close() Close open resources corresponding to this data source.
discover() Open the resource and populate the source attributes.
read() Load the entire dataset into a container and return it.
read_chunked() Return an iterator over container fragments of the data source.
read_partition(i) Return an (offset_tuple, container) pair corresponding to the i-th partition.
to_dask() Create a lazy dask bag object.
to_spark() Provide an equivalent data object in Apache Spark.
yaml([with_plugin]) Return a YAML representation of this data source.
set_cache_dir  
read()[source]

Load the entire dataset into a container and return it.

to_dask()[source]

Create a lazy dask bag object.