API Reference
- intake_avro.source.AvroTableSource(urlpath)
Source to load tabular Avro datasets.
- intake_avro.source.AvroSequenceSource(urlpath)
Source to load Avro datasets as a sequence of Python dicts.
class intake_avro.source.AvroTableSource(urlpath, blocksize=100000000, metadata=None, storage_options=None)
Source to load tabular Avro datasets.
Parameters:
- urlpath: str
Location of the data files; can include protocol and glob characters.
- blocksize: int or None
Partition the input files by roughly this number of bytes. Actual partition sizes depend on the inherent structure of the data files. If None, each input file is one partition and no file scanning is needed ahead of time.
- storage_options: dict or None
Parameters to pass on to the file-system backend.
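To make the `blocksize` semantics concrete, the sketch below computes nominal byte ranges for a single file. It is an illustration only and not intake_avro's implementation: the helper `nominal_partitions` is hypothetical, and the real partition boundaries are adjusted to the structure of each data file, as the `blocksize` description notes.

```python
def nominal_partitions(file_size, blocksize=100_000_000):
    """Return (start, stop) byte ranges of roughly `blocksize` bytes.

    Hypothetical helper for illustration; a real reader would move these
    boundaries to fit the internal structure of the data file.
    """
    if blocksize is None:
        # One partition per file, so no scanning ahead of time is needed.
        return [(0, file_size)]
    if blocksize <= 0:
        raise ValueError("blocksize must be a positive integer or None")
    return [
        (start, min(start + blocksize, file_size))
        for start in range(0, file_size, blocksize)
    ]


# A 250,000,000-byte file with the default blocksize gives three nominal ranges.
ranges = nominal_partitions(250_000_000)
```

Passing `blocksize=None` to the same helper returns a single range covering the whole file, which mirrors the one-partition-per-file behaviour described for the parameter.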
Attributes:
- cache_dirs
- datashape
- description
- hvplot
Returns a hvPlot object to provide a high-level plotting API.
- plot
Returns a hvPlot object to provide a high-level plotting API.
- plots
List custom associated quick-plots.
Methods
- close()
Close open resources corresponding to this data source.
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it.
- read_chunked()
Return iterator over container fragments of data source.
- read_partition(i)
Return a part of the data corresponding to the i-th partition.
- to_dask()
Create lazy dask dataframe object.
- to_spark()
Pass URL to spark to load as a DataFrame.
- yaml([with_plugin])
Return YAML representation of this data-source.
- set_cache_dir
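A minimal usage sketch of the methods listed above, assuming intake_avro and its dependencies are installed. The function name `preview_table` and any path passed to it are hypothetical; the import is kept inside the function so that defining it does not require the package.

```python
def preview_table(urlpath, blocksize=None):
    """Return the first partition of a tabular Avro dataset.

    Hypothetical convenience wrapper, not part of intake_avro.
    """
    # Imported lazily; calling this function requires intake_avro.
    from intake_avro.source import AvroTableSource

    source = AvroTableSource(urlpath, blocksize=blocksize)
    try:
        source.discover()                # open resource, populate source attributes
        return source.read_partition(0)  # data for the first partition only
    finally:
        source.close()                   # close open resources for this source
```

With `blocksize=None` each input file is one partition, so a call such as `preview_table("data/*.avro")` (a hypothetical path) would return the contents of the first matching file.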
class intake_avro.source.AvroSequenceSource(urlpath, blocksize=100000000, metadata=None, storage_options=None)
Source to load Avro datasets as a sequence of Python dicts.
Parameters:
- urlpath: str
Location of the data files; can include protocol and glob characters.
- blocksize: int or None
Partition the input files by roughly this number of bytes. Actual partition sizes depend on the inherent structure of the data files. If None, each input file is one partition and no file scanning is needed ahead of time.
- storage_options: dict or None
Parameters to pass on to the file-system backend.
Attributes:
- cache_dirs
- datashape
- description
- hvplot
Returns a hvPlot object to provide a high-level plotting API.
- plot
Returns a hvPlot object to provide a high-level plotting API.
- plots
List custom associated quick-plots.
Methods
- close()
Close open resources corresponding to this data source.
- discover()
Open resource and populate the source attributes.
- read()
Load entire dataset into a container and return it.
- read_chunked()
Return iterator over container fragments of data source.
- read_partition(i)
Return a part of the data corresponding to the i-th partition.
- to_dask()
Create lazy dask bag object.
- to_spark()
Provide an equivalent data object in Apache Spark.
- yaml([with_plugin])
Return YAML representation of this data-source.
- set_cache_dir
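The sequence source can be used in much the same way; the sketch below assumes intake_avro is installed and that a partition is returned as an iterable of Python dicts, in line with the class description. The helper `first_records` and its arguments are hypothetical.

```python
def first_records(urlpath, n=5):
    """Return up to `n` records from the first partition as a list of dicts.

    Hypothetical convenience wrapper, not part of intake_avro.
    """
    # Imported lazily; calling this function requires intake_avro.
    from intake_avro.source import AvroSequenceSource

    source = AvroSequenceSource(urlpath)
    try:
        records = source.read_partition(0)  # assumed: an iterable of dicts
        return list(records)[:n]
    finally:
        source.close()                      # close open resources for this source
```

For data too large to hold in memory, `to_dask()` on the same source gives a lazy dask bag instead of loading records eagerly.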