API Reference

intake_avro.source.AvroTableSource(urlpath) Source to load tabular Avro datasets.
intake_avro.source.AvroSequenceSource(urlpath) Source to load Avro datasets as a sequence of Python dicts.
class intake_avro.source.AvroTableSource(urlpath, metadata=None, storage_options=None)[source]

Source to load tabular Avro datasets.

Parameters:
urlpath: str

Location of the data files; can include protocol and glob characters.

Attributes:
cache_dirs
datashape
description
hvplot

Returns an hvPlot object to provide a high-level plotting API.

plot

Returns an hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots.

Methods

close() Close open resources corresponding to this data source.
discover() Open the resource and populate the source attributes.
read() Load the entire dataset into a container and return it.
read_chunked() Return an iterator over container fragments of the data source.
read_partition(i) Return an (offset_tuple, container) pair corresponding to the i-th partition.
to_dask() Create a lazy dask DataFrame object.
to_spark() Pass the URL to Spark to load as a DataFrame.
yaml([with_plugin]) Return a YAML representation of this data source.
set_cache_dir  
read()[source]

Load the entire dataset into a container and return it.

to_dask()[source]

Create a lazy dask DataFrame object.

to_spark()[source]

Pass the URL to Spark to load as a DataFrame.

Note that this requires org.apache.spark.sql.avro.AvroFileFormat to be available on your Spark classpath.

This feature is experimental.

class intake_avro.source.AvroSequenceSource(urlpath, metadata=None, storage_options=None)[source]

Source to load Avro datasets as a sequence of Python dicts.

Parameters:
urlpath: str

Location of the data files; can include protocol and glob characters.

Attributes:
cache_dirs
datashape
description
hvplot

Returns an hvPlot object to provide a high-level plotting API.

plot

Returns an hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots.

Methods

close() Close open resources corresponding to this data source.
discover() Open the resource and populate the source attributes.
read() Load the entire dataset into a container and return it.
read_chunked() Return an iterator over container fragments of the data source.
read_partition(i) Return an (offset_tuple, container) pair corresponding to the i-th partition.
to_dask() Create a lazy dask bag object.
to_spark() Provide an equivalent data object in Apache Spark.
yaml([with_plugin]) Return a YAML representation of this data source.
set_cache_dir  
read()[source]

Load the entire dataset into a container and return it.

to_dask()[source]

Create a lazy dask bag object.