Rapid

class rapid.rapid.Rapid(auth: Optional[RapidAuth] = None)

Bases: object

Rapid is the main class of the rAPId SDK. It wraps the rAPId API endpoints behind a programmatic Python interface.

Parameters:

auth (rapid.auth.RapidAuth, optional) – An instance of the rAPId auth class, which is used for authentication and authorization with the API. Defaults to None.

convert_dataframe_for_file_upload(df: DataFrame)

Converts a pandas DataFrame to a format that can be used for file uploads to the API.

Parameters:

df (DataFrame) – The pandas DataFrame to convert.

Returns:

A dictionary containing the converted DataFrame in a format suitable for file uploads to the API.

create_schema(schema: Schema)

Creates a new schema on the API.

Parameters:

schema (rapid.items.schema.Schema) – The schema model to create.

Raises:
  • rapid.exceptions.SchemaAlreadyExistsException – If you try to create a schema that already exists in rAPId.

  • rapid.exceptions.SchemaCreateFailedException – If an error occurs while trying to create the schema.
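One way to react to SchemaAlreadyExistsException is to fall back to update_schema. The following is a minimal sketch, not part of the SDK: create_or_update_schema is a hypothetical helper, client is assumed to be a Rapid instance, and the local exception class is only a placeholder so the snippet runs without the SDK installed. Whether an update is the right response depends on your workflow.

```python
# Use the real exception when the SDK is installed; otherwise fall back to a
# local placeholder so this sketch still runs on its own.
try:
    from rapid.exceptions import SchemaAlreadyExistsException
except ImportError:
    class SchemaAlreadyExistsException(Exception):
        """Placeholder for rapid.exceptions.SchemaAlreadyExistsException."""


def create_or_update_schema(client, schema):
    """Create the schema, or update it if it already exists.

    `client` is assumed to expose create_schema and update_schema as
    documented for Rapid. Returns "created" or "updated" (hypothetical
    labels used only by this helper).
    """
    try:
        client.create_schema(schema)
        return "created"
    except SchemaAlreadyExistsException:
        client.update_schema(schema)
        return "updated"
```

SchemaCreateFailedException is deliberately not caught here, so genuine creation failures still propagate to the caller.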

download_dataframe(domain: str, dataset: str, version: Optional[int] = None, query: Query = Query(select_columns=None, filter=None, group_by_columns=None, aggregation_conditions=None, order_by_columns=None, limit=None)) DataFrame

Downloads data to a pandas DataFrame based on the domain, dataset and version passed.

Parameters:
  • domain (str) – The domain of the dataset to download the DataFrame from.

  • dataset (str) – The dataset from the domain to download the DataFrame from.

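  • version (int, optional) – The version of the dataset to download. Defaults to None.

  • query (rapid.items.query.Query, optional) – An optional query to apply when downloading the data. Defaults to an empty query (no column selection, filter, grouping, aggregation, ordering or limit).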

Raises:

rapid.exceptions.DatasetNotFoundException – If the specified domain, dataset and version do not exist in the rAPId instance.

Returns:

A pandas DataFrame containing the downloaded data.

Return type:

DataFrame
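Callers that treat a missing dataset as a normal case can catch DatasetNotFoundException. This is a minimal sketch under stated assumptions: download_or_none is a hypothetical helper, client is assumed to be a Rapid instance, and the local exception class is only a placeholder used when the SDK is not installed.

```python
# Use the real exception when the SDK is installed; otherwise fall back to a
# local placeholder so this sketch still runs on its own.
try:
    from rapid.exceptions import DatasetNotFoundException
except ImportError:
    class DatasetNotFoundException(Exception):
        """Placeholder for rapid.exceptions.DatasetNotFoundException."""


def download_or_none(client, domain, dataset, version=None):
    """Return the dataset as a DataFrame, or None if it does not exist.

    Arguments are passed to download_dataframe in the documented order:
    domain, dataset, version. The query argument keeps its default.
    """
    try:
        return client.download_dataframe(domain, dataset, version)
    except DatasetNotFoundException:
        return None
```

Returning None keeps the calling code free of exception handling, at the cost of hiding why the data is missing; re-raising may suit stricter pipelines better.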

fetch_job_progress(_id: str)

Makes a GET request to the API to fetch the progress of a specific job.

Parameters:

_id (str) – The ID of the job to fetch the progress for.

Returns:

The API’s JSON response.

For more details on the response structure, see the API documentation: https://getrapid.link/api/docs#/Jobs/get_job_jobs__job_id__get

generate_headers() Dict

generate_info(df: DataFrame, domain: str, dataset: str)

Generates metadata information for a pandas DataFrame and a specified dataset in the API.

Parameters:
  • df (DataFrame) – The pandas DataFrame to generate metadata for.

  • domain (str) – The domain of the dataset to generate metadata for.

  • dataset (str) – The name of the dataset to generate metadata for.

Raises:

rapid.exceptions.DatasetInfoFailedException – If an error occurs while generating the metadata information.

Returns:

A dictionary containing the metadata information for the DataFrame and dataset.

generate_schema(df: DataFrame, domain: str, dataset: str, sensitivity: str) Schema

Generates a schema for a pandas DataFrame and a specified dataset in the API.

Parameters:
  • df (DataFrame) – The pandas DataFrame to generate a schema for.

  • domain (str) – The domain of the dataset to generate a schema for.

  • dataset (str) – The name of the dataset to generate a schema for.

  • sensitivity (str) – The sensitivity level of the schema to generate.

Raises:

rapid.exceptions.SchemaGenerationFailedException – If an error occurs while generating the schema.

Returns:

A Schema object generated from the DataFrame and dataset.

Return type:

rapid.items.schema.Schema
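The generated Schema can be passed straight to create_schema, after which the same DataFrame can be uploaded. The following is a rough sketch of that sequence, not a prescribed workflow: publish_dataframe is a hypothetical helper, client is assumed to be a Rapid instance, and the sensitivity string is passed through unchanged because the accepted values are not listed in this section.

```python
def publish_dataframe(client, df, domain, dataset, sensitivity):
    """Generate a schema from df, register it, then upload the data.

    Uses only the documented Rapid methods generate_schema, create_schema
    and upload_dataframe. Exceptions raised by any step are left to
    propagate to the caller.
    """
    # Step 1: infer a schema for this DataFrame and target dataset.
    schema = client.generate_schema(df, domain, dataset, sensitivity)
    # Step 2: register the schema so the dataset exists on the API.
    client.create_schema(schema)
    # Step 3: upload the data; with the default wait_to_complete=True the
    # call blocks until the upload job has finished.
    return client.upload_dataframe(domain, dataset, df)
```

In practice it is usually worth inspecting or adjusting the generated schema between steps 1 and 2 rather than registering it blindly.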

list_datasets()

Makes a POST request to the API to list the current datasets.

Returns:

The API’s JSON response.

For more details on the response structure, see the API documentation: https://getrapid.link/api/docs#/Datasets/list_all_datasets_datasets_post

update_schema(schema: Schema)

Uploads an updated schema to the API.

Parameters:

schema (rapid.items.schema.Schema) – The new schema model that will be used for the update.

Raises:

rapid.exceptions.SchemaUpdateFailedException – If an error occurs while trying to update the schema.

upload_dataframe(domain: str, dataset: str, df: DataFrame, wait_to_complete: bool = True)

Uploads a pandas DataFrame to a specified dataset in the API.

Parameters:
  • domain (str) – The domain of the dataset to upload the DataFrame to.

  • dataset (str) – The name of the dataset to upload the DataFrame to.

  • df (DataFrame) – The pandas DataFrame to upload.

  • wait_to_complete (bool, optional) – Whether to wait for the upload job to complete before returning. Defaults to True.

Raises:
  • rapid.exceptions.DataFrameUploadValidationException – If the DataFrame’s schema is incorrect.

  • rapid.exceptions.DataFrameUploadFailedException – If an unexpected error occurs while uploading the DataFrame.

Returns:

If wait_to_complete is True, returns “Success” if the upload is successful. If wait_to_complete is False, returns the ID of the upload job if the upload is accepted.
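With wait_to_complete=False the call returns a job ID, which can later be handed to wait_for_job_outcome. A minimal sketch of that pattern follows; upload_then_wait and its between callback are hypothetical names introduced for illustration, and client is assumed to be a Rapid instance.

```python
def upload_then_wait(client, domain, dataset, df, between=None, interval=1):
    """Start an upload without blocking, optionally do other work, then wait.

    `between` is an optional callable that receives the job ID while the
    upload job is still running on the API side. Returns the job ID once
    wait_for_job_outcome has returned without raising.
    """
    # Returns immediately with the ID of the accepted upload job.
    job_id = client.upload_dataframe(domain, dataset, df, wait_to_complete=False)

    if between is not None:
        between(job_id)  # e.g. log the ID or prepare the next upload

    # Blocks until the job finishes; raises if the job fails.
    client.wait_for_job_outcome(job_id, interval=interval)
    return job_id
```

Separating the upload from the wait is mainly useful when several uploads are started back to back and only their final outcomes matter.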

wait_for_job_outcome(_id: str, interval: int = 1)

Makes periodic requests to the API to wait for the outcome of a specific job.

Parameters:
  • _id (str) – The ID of the job to wait for the outcome of.

  • interval (int, optional) – The number of seconds to sleep between requests to the API. Defaults to 1.

Returns:

None if the job is successful.

Raises:

rapid.exceptions.JobFailedException – If the job fails.
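The polling behaviour described above can be pictured as a simple loop. This is an illustrative sketch only, not the SDK's implementation: fetch_status is a hypothetical callable standing in for a call such as fetch_job_progress, the status labels "SUCCESS" and "FAILED" are assumptions, and JobFailedError is a local stand-in for rapid.exceptions.JobFailedException.

```python
import time


class JobFailedError(Exception):
    """Illustrative stand-in for rapid.exceptions.JobFailedException."""


def wait_for_outcome(fetch_status, interval=1, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal state.

    fetch_status: hypothetical zero-argument callable returning a status
        string. "SUCCESS" and "FAILED" are assumed labels; any other value
        is treated as "still running".
    interval: seconds to sleep between polls.
    sleep: injectable sleep function, which keeps the loop easy to test.
    """
    while True:
        status = fetch_status()
        if status == "SUCCESS":
            return None  # mirrors the documented "None if successful"
        if status == "FAILED":
            raise JobFailedError("job failed")
        sleep(interval)
```

A production loop would normally also enforce an overall timeout, which this sketch omits for brevity.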