API Reference

intake_avro.source.AvroTableSource(urlpath) Source to load tabular Avro datasets.
intake_avro.source.AvroSequenceSource(urlpath) Source to load Avro datasets as sequence of Python dicts.
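
If intake-avro is installed, these sources are typically also reachable through Intake's top-level convenience functions. A minimal sketch, assuming the drivers are registered under the names implied by the classes; the glob path is hypothetical:

    import intake

    # Assumes the "avro_table" driver registered by intake-avro;
    # the glob path is hypothetical.
    source = intake.open_avro_table("data/records_*.avro")
    df = source.read()  # pandas DataFrame with all records
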
class intake_avro.source.AvroTableSource(urlpath, blocksize=100000000, metadata=None, storage_options=None)

Source to load tabular Avro datasets.

Parameters:
urlpath: str

Location of the data files; can include protocol and glob characters.

blocksize: int or None

Partition the input files by roughly this number of bytes. Actual partition sizes will depend on the inherent structure of the data files. If None, each input file will be one partition and no file scanning will be needed ahead of time.

storage_options: dict or None

Parameters to pass on to the file-system backend.
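
A minimal usage sketch, assuming intake-avro is installed; the glob path and blocksize shown here are illustrative:

    from intake_avro.source import AvroTableSource

    # The local glob path is hypothetical; remote protocols such as
    # "s3://bucket/data/*.avro" also work, with credentials passed
    # via storage_options.
    source = AvroTableSource("data/records_*.avro", blocksize=50_000_000)
    source.discover()   # populate schema and partition information
    df = source.read()  # load everything into a pandas DataFrame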

Attributes:
cache_dirs
datashape
description
hvplot

Returns an hvPlot object to provide a high-level plotting API.

plot

Returns an hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots.

Methods

close() Close open resources corresponding to this data source.
discover() Open the resource and populate the source attributes.
read() Load the entire dataset into a container and return it.
read_chunked() Return an iterator over container fragments of the data source.
read_partition(i) Return the part of the data corresponding to the i-th partition.
to_dask() Create a lazy Dask dataframe object.
to_spark() Pass the URL to Spark to load as a DataFrame.
yaml([with_plugin]) Return a YAML representation of this data source.
set_cache_dir
read()

Load the entire dataset into a container and return it.
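
For example, under the same assumptions as above (hypothetical path, valid Avro files), read() concatenates every partition into one pandas DataFrame:

    from intake_avro.source import AvroTableSource

    source = AvroTableSource("data/records_*.avro")  # hypothetical path
    df = source.read()  # one pandas DataFrame covering all files
    print(df.head())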

to_dask()

Create a lazy Dask dataframe object.
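
A sketch of deferred processing with the resulting dataframe; the path and column names are hypothetical:

    from intake_avro.source import AvroTableSource

    source = AvroTableSource("data/records_*.avro")  # hypothetical path
    ddf = source.to_dask()  # lazy dask.dataframe; nothing is read yet
    # "key" and "value" are hypothetical column names.
    result = ddf.groupby("key")["value"].mean().compute()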

to_spark()

Pass the URL to Spark to load as a DataFrame.

Note that this requires org.apache.spark.sql.avro.AvroFileFormat to be available on your Spark classpath.

This feature is experimental.
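
A sketch only, given the experimental status: it assumes a working PySpark installation and the Avro file format on the Spark classpath; the path is hypothetical:

    from intake_avro.source import AvroTableSource

    source = AvroTableSource("data/records_*.avro")  # hypothetical path
    sdf = source.to_spark()  # pyspark.sql.DataFrame via Spark's Avro reader
    sdf.printSchema()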

class intake_avro.source.AvroSequenceSource(urlpath, blocksize=100000000, metadata=None, storage_options=None)

Source to load Avro datasets as sequence of Python dicts.

Parameters:
urlpath: str

Location of the data files; can include protocol and glob characters.

blocksize: int or None

Partition the input files by roughly this number of bytes. Actual partition sizes will depend on the inherent structure of the data files. If None, each input file will be one partition and no file scanning will be needed ahead of time.

storage_options: dict or None

Parameters to pass on to the file-system backend.
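
A minimal usage sketch, assuming intake-avro is installed; the glob path is hypothetical:

    from intake_avro.source import AvroSequenceSource

    # blocksize=None makes each input file one partition, so no file
    # scanning is needed ahead of time.
    source = AvroSequenceSource("data/events_*.avro", blocksize=None)
    records = source.read()  # list of Python dicts, one per Avro record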

Attributes:
cache_dirs
datashape
description
hvplot

Returns an hvPlot object to provide a high-level plotting API.

plot

Returns an hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots.

Methods

close() Close open resources corresponding to this data source.
discover() Open the resource and populate the source attributes.
read() Load the entire dataset into a container and return it.
read_chunked() Return an iterator over container fragments of the data source.
read_partition(i) Return the part of the data corresponding to the i-th partition.
to_dask() Create a lazy Dask bag object.
to_spark() Provide an equivalent data object in Apache Spark.
yaml([with_plugin]) Return a YAML representation of this data source.
set_cache_dir
read()

Load the entire dataset into a container and return it.
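
For example, with a hypothetical path and a hypothetical "name" field in the records:

    from intake_avro.source import AvroSequenceSource

    source = AvroSequenceSource("data/events_*.avro")  # hypothetical path
    records = source.read()               # all files, as a list of dicts
    names = [r["name"] for r in records]  # "name" is a hypothetical field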

to_dask()

Create a lazy Dask bag object.
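
A sketch of lazy processing with the resulting bag; the path and the "ok" field are hypothetical:

    from intake_avro.source import AvroSequenceSource

    source = AvroSequenceSource("data/events_*.avro")  # hypothetical path
    bag = source.to_dask()  # dask.bag.Bag of dicts; nothing is read yet
    count = bag.filter(lambda r: r.get("ok")).count().compute()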