roksana.datasets package
Submodules
roksana.datasets.datasets module
- class roksana.datasets.datasets.UserDataset(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)
Bases: InMemoryDataset
A dataset class for user-provided datasets adhering to PyG’s InMemoryDataset structure.
Users should provide their data in a specific format, typically as a list of torch_geometric.data.Data objects.
- __init__(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)
Initialize the UserDataset.
- Parameters:
root (str) – Root directory where the dataset should be saved.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. If provided, it will be used to initialize the dataset.
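The three hooks run at different points in a dataset's life. The sketch below uses a toy class with plain dictionaries in place of torch_geometric.data.Data objects to show the timing; TinyDataset and Record are hypothetical names, not part of ROKSANA. It assumes filtering happens before pre-transforming, which is the order PyG's own process() template uses; whether UserDataset follows the same order is not stated here.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]  # stand-in for torch_geometric.data.Data


class TinyDataset:
    """Toy stand-in showing when each hook runs; not the ROKSANA class."""

    def __init__(
        self,
        data_list: List[Record],
        transform: Optional[Callable[[Record], Record]] = None,
        pre_transform: Optional[Callable[[Record], Record]] = None,
        pre_filter: Optional[Callable[[Record], bool]] = None,
    ) -> None:
        # One-time processing (assumed order): filter first, then pre-transform.
        if pre_filter is not None:
            data_list = [d for d in data_list if pre_filter(d)]
        if pre_transform is not None:
            data_list = [pre_transform(d) for d in data_list]
        self._stored = data_list  # what a real dataset would save to disk
        self._transform = transform

    def __len__(self) -> int:
        return len(self._stored)

    def __getitem__(self, idx: int) -> Record:
        item = dict(self._stored[idx])  # copy so the stored record is untouched
        # Per-access processing: applied every time an item is read.
        return self._transform(item) if self._transform else item


ds = TinyDataset(
    data_list=[{"x": 1.0}, {"x": -2.0}, {"x": 3.0}],
    pre_filter=lambda d: d["x"] > 0,             # drops the negative record
    pre_transform=lambda d: {"x": d["x"] * 10},  # applied once, before storing
    transform=lambda d: {"x": d["x"] + 1},       # applied on each read
)
print(len(ds), ds[0], ds[1])
```

Because transform runs on every access, it suits cheap or randomized operations, while pre_transform suits expensive work that only needs to happen once.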
- roksana.datasets.datasets.get_dataset_info(dataset: InMemoryDataset) → Dict[str, Any]
Retrieve basic information about a dataset.
- Parameters:
dataset (InMemoryDataset) – The dataset instance.
- Returns:
A dictionary containing dataset information.
- Return type:
Dict[str, Any]
- roksana.datasets.datasets.list_available_standard_datasets() → List[str]
List all available standard datasets supported by ROKSANA.
- Returns:
A list of supported dataset names.
- Return type:
List[str]
- roksana.datasets.datasets.load_dataset(dataset_name: str | None = None, root: str = 'data', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None) → InMemoryDataset
Load a dataset, either a standard dataset or a user-provided dataset.
- Parameters:
dataset_name (str, optional) – Name of the standard dataset to load (e.g., ‘cora’, ‘citeseer’). If None, data_list must be provided and is loaded as a UserDataset.
root (str, optional) – Root directory where the dataset should be saved or loaded from. Defaults to ‘data’.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. Required if dataset_name is None.
- Returns:
An instance of the loaded dataset.
- Return type:
InMemoryDataset
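The choice between the two loading paths depends only on dataset_name and data_list. A minimal sketch of that decision is below; choose_loader is a hypothetical helper, and two details are assumptions rather than documented behavior: that dataset_name takes precedence when both arguments are given, and that a missing data_list is reported with a ValueError.

```python
from typing import Any, List, Optional


def choose_loader(dataset_name: Optional[str], data_list: Optional[List[Any]]) -> str:
    """Return which path a load_dataset-style call would take (hypothetical)."""
    if dataset_name is not None:
        # Assumption: a named standard dataset wins over any data_list.
        return "standard"
    if data_list is None:
        # Assumption: the documented "required" rule is enforced with ValueError.
        raise ValueError("data_list is required when dataset_name is None")
    return "user"


print(choose_loader("cora", None))
print(choose_loader(None, [{"x": 1.0}]))
```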
- roksana.datasets.datasets.load_standard_dataset(name: str, root: str = 'data') → Planetoid
Load a standard dataset from PyG’s built-in datasets.
Supported datasets include ‘cora’, ‘citeseer’, and ‘pubmed’. Call list_available_standard_datasets() for the names ROKSANA supports, and refer to PyG’s Planetoid datasets for details.
- roksana.datasets.datasets.load_user_dataset_from_files(data_dir: str, file_format: str = 'json', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None) → UserDataset
Load a user dataset from files in a specified directory.
Supported file formats: ‘json’, ‘csv’, ‘pickle’.
- Parameters:
data_dir (str) – Directory containing the dataset files.
file_format (str, optional) – Format of the dataset files. Defaults to ‘json’.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
- Returns:
An instance of the UserDataset loaded from the files.
- Return type:
UserDataset
- roksana.datasets.datasets.prepare_search_set(data: Data, percentage: float = 0.1, seed: int = 42) → Tuple[List[int], List[List[int]]]
Prepare a search set for search evaluation by selecting a percentage of nodes as queries and creating corresponding gold sets based on feature similarity.
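- Parameters:
data (Data) – The graph data object. It must provide node features in data.x.
percentage (float, optional) – Fraction of nodes to select as queries. Must be between 0 and 1. Defaults to 0.1.
seed (int, optional) – Random seed used when selecting query nodes. Defaults to 42.
- Returns:
A tuple (queries, gold_sets):
queries (List[int]) – Node indices selected as queries.
gold_sets (List[List[int]]) – One gold set per query, each a list of node indices with the same features as the corresponding query.
- Return type:
Tuple[List[int], List[List[int]]]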
- Raises:
ValueError – If percentage is not between 0 and 1.
AttributeError – If dataset does not contain node features (data.x).
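The selection and matching steps can be sketched in plain Python, with a list of feature rows standing in for data.x. This is an illustration of the described behavior, not the ROKSANA implementation: sketch_search_set is a hypothetical name, and the sketch assumes exact feature equality, excludes each query from its own gold set, always selects at least one query, and treats a percentage of exactly 0 as invalid.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sketch_search_set(
    features: Sequence[Sequence[float]],
    percentage: float = 0.1,
    seed: int = 42,
) -> Tuple[List[int], List[List[int]]]:
    """Pick query nodes at random and pair each with nodes sharing its features."""
    if not 0 < percentage <= 1:
        raise ValueError("percentage must be between 0 and 1")

    num_nodes = len(features)
    # Assumption: round down, but keep at least one query when nodes exist.
    num_queries = min(num_nodes, max(1, int(num_nodes * percentage)))

    rng = random.Random(seed)  # local generator, so the seed gives repeatable picks
    queries = rng.sample(range(num_nodes), num_queries)

    # Group node indices by their exact feature row.
    groups: Dict[Tuple[float, ...], List[int]] = {}
    for idx, row in enumerate(features):
        groups.setdefault(tuple(row), []).append(idx)

    # Assumption: a query is not a member of its own gold set.
    gold_sets = [
        [n for n in groups[tuple(features[q])] if n != q] for q in queries
    ]
    return queries, gold_sets


toy_features = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0]]
queries, gold_sets = sketch_search_set(toy_features, percentage=0.4, seed=0)
for q, gold in zip(queries, gold_sets):
    print(q, gold)
```

Fixing the seed makes the query selection reproducible, so evaluation runs on the same data can be compared directly.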
Module contents
- class roksana.datasets.UserDataset(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)
Bases: InMemoryDataset
A dataset class for user-provided datasets adhering to PyG’s InMemoryDataset structure.
Users should provide their data in a specific format, typically as a list of torch_geometric.data.Data objects.
- __init__(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)
Initialize the UserDataset.
- Parameters:
root (str) – Root directory where the dataset should be saved.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. If provided, it will be used to initialize the dataset.
- roksana.datasets.get_dataset_info(dataset: InMemoryDataset) → Dict[str, Any]
Retrieve basic information about a dataset.
- Parameters:
dataset (InMemoryDataset) – The dataset instance.
- Returns:
A dictionary containing dataset information.
- Return type:
Dict[str, Any]
- roksana.datasets.list_available_standard_datasets() → List[str]
List all available standard datasets supported by ROKSANA.
- Returns:
A list of supported dataset names.
- Return type:
List[str]
- roksana.datasets.load_dataset(dataset_name: str | None = None, root: str = 'data', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None) → InMemoryDataset
Load a dataset, either a standard dataset or a user-provided dataset.
- Parameters:
dataset_name (str, optional) – Name of the standard dataset to load (e.g., ‘cora’, ‘citeseer’). If None, data_list must be provided and is loaded as a UserDataset.
root (str, optional) – Root directory where the dataset should be saved or loaded from. Defaults to ‘data’.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. Required if dataset_name is None.
- Returns:
An instance of the loaded dataset.
- Return type:
InMemoryDataset
- roksana.datasets.load_standard_dataset(name: str, root: str = 'data') → Planetoid
Load a standard dataset from PyG’s built-in datasets.
Supported datasets include ‘cora’, ‘citeseer’, and ‘pubmed’. Call list_available_standard_datasets() for the names ROKSANA supports, and refer to PyG’s Planetoid datasets for details.
- roksana.datasets.load_user_dataset_from_files(data_dir: str, file_format: str = 'json', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None) → UserDataset
Load a user dataset from files in a specified directory.
Supported file formats: ‘json’, ‘csv’, ‘pickle’.
- Parameters:
data_dir (str) – Directory containing the dataset files.
file_format (str, optional) – Format of the dataset files. Defaults to ‘json’.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
- Returns:
An instance of the UserDataset loaded from the files.
- Return type:
UserDataset
- roksana.datasets.prepare_search_set(data: Data, percentage: float = 0.1, seed: int = 42) → Tuple[List[int], List[List[int]]]
Prepare a search set for search evaluation by selecting a percentage of nodes as queries and creating corresponding gold sets based on feature similarity.
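- Parameters:
data (Data) – The graph data object. It must provide node features in data.x.
percentage (float, optional) – Fraction of nodes to select as queries. Must be between 0 and 1. Defaults to 0.1.
seed (int, optional) – Random seed used when selecting query nodes. Defaults to 42.
- Returns:
A tuple (queries, gold_sets):
queries (List[int]) – Node indices selected as queries.
gold_sets (List[List[int]]) – One gold set per query, each a list of node indices with the same features as the corresponding query.
- Return type:
Tuple[List[int], List[List[int]]]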
- Raises:
ValueError – If percentage is not between 0 and 1.
AttributeError – If dataset does not contain node features (data.x).