rAPId-sdk
The rAPId-sdk is a lightweight Python wrapper for 10 Downing Street’s rAPId API project. rAPId aims to create consistent, secure, interoperable data storage and sharing interfaces (APIs).
The sdk is a standalone Python library that provides easy programmatic access to the core rAPId functionality. It can create and update schemas programmatically using modern Python classes and data structures.
Installation
Install the sdk easily with pip:
$ pip install rapid-sdk
How to Use
Once the sdk is installed into your project, the first thing you will want to do is create an instance of the rAPId class.
In order for your code to connect to rAPId you will need your rAPId client_id, client_secret and url values. By default the authentication module will try to read these from the environment variables RAPID_CLIENT_ID, RAPID_CLIENT_SECRET and RAPID_URL respectively. Alternatively, you can create your own instance of the rAPId authentication class:
from rapid import Rapid
from rapid import RapidAuth
rapid_authentication = RapidAuth()
rapid = Rapid(auth=rapid_authentication)
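Because the default path depends on three environment variables, it can help to check that they are set before constructing the client. The following is a minimal sketch using only the standard library; the helper name `missing_rapid_vars` is hypothetical and is not part of the sdk, and only the variable names come from the text above.

```python
import os

# Variable names as listed in the text above. The helper below is a
# hypothetical illustration and is not part of rapid-sdk.
REQUIRED_VARS = ("RAPID_CLIENT_ID", "RAPID_CLIENT_SECRET", "RAPID_URL")


def missing_rapid_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_rapid_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All rAPId environment variables are set.")
```

Running a check like this at start-up gives a clear message about which value is absent, rather than a failure later when the first request is made.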
If you do not want to use environment variables, you can pass the values directly to the class, although this is discouraged because secrets should always be kept safe:
rapid_authentication = RapidAuth(
    client_id="RAPID_CLIENT_ID",
    client_secret="RAPID_CLIENT_SECRET",
    url="RAPID_URL"
)
Useful Patterns
The sdk ships with helper functions that handle common programmatic tasks for rAPId.
Below is a simple example of uploading a Pandas DataFrame to the API:
import pandas as pd
from rapid import Rapid
from rapid.patterns import data
from rapid.items.schema import SchemaMetadata, SensitivityLevel, Owner
from rapid.exceptions import DataFrameUploadValidationException
rapid = Rapid()
raw_data = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 10, "b": 20, "c": 30}
]
df = pd.DataFrame(raw_data)
metadata = SchemaMetadata(
    domain='mydomain',
    dataset='mydataset',
    owners=[Owner(name="myname", email="myemail@email.com")],
    _sensitivity=SensitivityLevel.PUBLIC.value
)
try:
    data.upload_and_create_dataframe(
        rapid=rapid,
        df=df,
        metadata=metadata,
        upgrade_schema_on_fail=False
    )
except DataFrameUploadValidationException:
    print('Incorrect DataFrame schema')
Going forward, suppose we now expect some values in column c to be floating point numbers, so we want to update the schema:
import pandas as pd
from rapid import Rapid
from rapid.patterns import data
from rapid.items.schema import SchemaMetadata, SensitivityLevel, Owner, Column
from rapid.exceptions import ColumnNotDifferentException
rapid = Rapid()
raw_data = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 10, "b": 20, "c": 30}
]
df = pd.DataFrame(raw_data)
metadata = SchemaMetadata(
    domain='mydomain',
    dataset='mydataset',
    owners=[Owner(name="myname", email="myemail@email.com")],
    _sensitivity=SensitivityLevel.PUBLIC.value
)
try:
    data.update_schema_dataframe(
        rapid=rapid,
        df=df,
        metadata=metadata,
        new_columns=[
            Column(
                name="c",
                data_type="Float64"
            )
        ]
    )
except ColumnNotDifferentException:
    print('Columns not different.')
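To illustrate why the schema needs updating in this scenario, here is a simplified, standard-library-only sketch of how a single floating point value changes the type a column has to hold. The function `infer_column_kinds` and the labels "integer" and "float" are hypothetical stand-ins for illustration; they are not how rAPId or Pandas infer types, and they are not rAPId data type names.

```python
def infer_column_kinds(records):
    """Map each column name to "integer" or "float" based on its values."""
    kinds = {}
    for row in records:
        for name, value in row.items():
            kind = "float" if isinstance(value, float) else "integer"
            # Once any value in a column is a float, the column stays float.
            if kinds.get(name) != "float":
                kinds[name] = kind
    return kinds


# The rows from the example above: every value is an integer.
before = [{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20, "c": 30}]
# Hypothetical later rows in which column c carries a float.
after = [{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20, "c": 30.5}]

print(infer_column_kinds(before))
print(infer_column_kinds(after))
```

With the first set of rows every column is reported as integer; with the second, column c is reported as float, which is the situation where declaring c as Float64 in the schema becomes necessary.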