roksana package
Subpackages
- roksana.attack_methods package
- Submodules
- roksana.attack_methods.base_attack module
- roksana.attack_methods.custom_attack module
- roksana.attack_methods.degree module
- roksana.attack_methods.pagerank module
- roksana.attack_methods.random module
- roksana.attack_methods.registry module
- roksana.attack_methods.viking module
- Module contents
- roksana.datasets package
- roksana.evaluation package
- roksana.search_methods package
- Submodules
- roksana.search_methods.base_search module
- roksana.search_methods.gat_search module
- roksana.search_methods.gcn_search module
- roksana.search_methods.registry module
- roksana.search_methods.sage_search module
- roksana.search_methods.search_methods module
- Module contents
Submodules
roksana.leaderboard module
roksana.utils module
- roksana.utils.remove_edges(data, edges_to_remove, inplace=False)
Remove specified edges from the given undirected graph data object.
This function modifies the data.edge_index attribute by removing the specified edges. It can handle either a single list of edges or a list of lists of edges. Since the graph is undirected, (u, v) and (v, u) are considered the same edge.
- Parameters:
data – A PyG Data object with an edge_index attribute.
edges_to_remove (List[Tuple[int, int]] or List[List[Tuple[int, int]]]) – A collection of edges to remove, given either as a flat list, e.g. [(u1, v1), (u2, v2), …], or as a list of lists, e.g. [[(u1, v1), (u2, v2)], [(u3, v3), …]].
inplace (bool, optional) – If True, modifies the input data object in-place. Default is False.
- Returns:
The modified data object with the specified edges removed.
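The undirected matching described above can be illustrated with a minimal sketch. It is not the library's implementation: a plain Python list of (u, v) pairs stands in for data.edge_index, and the helper name remove_undirected_edges is hypothetical.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def remove_undirected_edges(edges: List[Edge], to_remove: Iterable) -> List[Edge]:
    """Return a copy of `edges` without any edge listed in `to_remove`.

    `to_remove` may be a flat list of (u, v) tuples or a list of such lists.
    (u, v) and (v, u) are treated as the same undirected edge.
    """
    banned = set()
    for item in to_remove:
        # A nested list of edges is flattened; a bare tuple is used as-is.
        group = item if isinstance(item, list) else [item]
        for u, v in group:
            banned.add(frozenset((u, v)))
    # Build a new list, so the input is left untouched (like inplace=False).
    return [(u, v) for u, v in edges if frozenset((u, v)) not in banned]


# Assumed toy graph: each undirected edge is stored in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
flat = remove_undirected_edges(edges, [(1, 0)])
nested = remove_undirected_edges(edges, [[(0, 1)], [(3, 2)]])
```

Storing each pair as a frozenset makes the direction irrelevant, so removing (1, 0) also drops (0, 1).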
- roksana.utils.removed_edges_list_stat(data, removed_edges_list, verbose=True)
Calculate and report statistics about a list of removed edges, including checking if the reverse direction of these edges exists in the main graph.
This function takes a list of edge lists that represent edges removed during multiple perturbation operations and aggregates them to determine:
The total number of removed edges across all operations.
The number of duplicate edges (edges that have appeared more than once across all operations).
The number of unique edges that have been removed overall.
The number of removed edges including their reversed counterpart present in the main graph (data.edge_index).
If verbose is True, it prints these statistics and does not return anything. If verbose is False, it returns the statistics as a tuple.
- Parameters:
removed_edges_list (List[List[Tuple[int, int]]]) – A list of lists, where each inner list contains tuples representing edges that were removed in a particular operation.
data – A PyG Data object that contains the main graph edges in data.edge_index.
verbose (bool, optional) – If True, prints out the statistics. Defaults to True.
- Returns:
- A tuple containing:
int: The total number of removed edges.
int: The number of duplicate edges across all operations.
int: The number of unique edges removed overall.
int: The number of removed edges including their reversed counterpart in the main graph.
- Return type:
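One way to read the four statistics is sketched below. The counting rules are assumptions, not confirmed by the source: "duplicates" counts distinct edges seen more than once, and the last value counts distinct removed edges whose reverse (v, u) is in the graph. The name removed_edge_stats and the list-based graph are hypothetical stand-ins.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]


def removed_edge_stats(
    graph_edges: List[Edge], removed_edges_list: List[List[Edge]]
) -> Tuple[int, int, int, int]:
    """Aggregate removed edges from several operations into four counts."""
    # Flatten the per-operation lists into one sequence of edges.
    flat = [edge for operation in removed_edges_list for edge in operation]
    counts = Counter(flat)

    total = len(flat)                     # every removal, repeats included
    unique = len(counts)                  # distinct removed edges
    # Assumption: a "duplicate" is a distinct edge that was removed more than once.
    duplicates = sum(1 for c in counts.values() if c > 1)
    # Assumption: count distinct removed edges whose reverse is in the graph.
    graph = set(graph_edges)
    with_reverse = sum(1 for (u, v) in counts if (v, u) in graph)
    return total, duplicates, unique, with_reverse


graph_edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 2)]
removed = [[(0, 1), (1, 2)], [(0, 1), (3, 2)]]
stats = removed_edge_stats(graph_edges, removed)
```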
Module contents
- class roksana.Evaluator(search_method_before, search_method_after, k_values: List[int] = [5, 10, 20])
Bases: object
Evaluator class to assess the impact of attack methods on search strategies.
- __init__(search_method_before, search_method_after, k_values: List[int] = [5, 10, 20])
Initialize the Evaluator.
- Parameters:
search_method_before – Instance of SearchMethod before attack.
search_method_after – Instance of SearchMethod after attack.
k_values (List[int], optional) – List of k values for Hit@k and Recall@k. Defaults to [5, 10, 20].
- evaluate(queries: List[int], gold_sets: List[List[int]], results_dir: str = 'results', filename: str = 'evaluation_results.csv') -> None
Perform evaluation on the given queries and save the results.
- Parameters:
queries (List[int]) – List of query node indices.
gold_sets (List[List[int]]) – List of gold sets corresponding to each query.
results_dir (str, optional) – Directory to save the results file. Defaults to ‘results’.
filename (str, optional) – Name of the results file. Defaults to ‘evaluation_results.csv’.
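As an illustration of how a list of k values can drive a metric such as Recall@k, here is a minimal sketch. It uses the common definition of Recall@k (the fraction of gold items found in the top k), which the source does not spell out. The helper names recall_at_k and recall_table are hypothetical and are not part of the Evaluator API.

```python
from typing import Dict, List, Sequence


def recall_at_k(retrieved: List[int], gold_set: List[int], k: int) -> float:
    """Fraction of gold items that appear among the first k retrieved items."""
    gold = set(gold_set)
    if not gold:
        return 0.0  # assumption: an empty gold set scores zero
    return len(set(retrieved[:k]) & gold) / len(gold)


def recall_table(
    retrieved: List[int], gold_set: List[int], k_values: Sequence[int] = (5, 10, 20)
) -> Dict[int, float]:
    """Compute Recall@k once per requested k."""
    return {k: recall_at_k(retrieved, gold_set, k) for k in k_values}


ranking = [7, 3, 9, 1, 4, 8, 2, 6, 0, 5]   # assumed ranked node indices
gold = [3, 8, 5, 11]                        # node 11 is never retrieved
table = recall_table(ranking, gold)
```

Running the same table on the rankings produced before and after an attack gives one way to compare the two search methods.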
- class roksana.GATSearch(data: Any, device: str = None, hidden_channels: int = 64, heads: int = 8, epochs: int = 200, lr: float = 0.005)
Bases: SearchMethod
Search method using Graph Attention Networks (GAT).
- __init__(data: Any, device: str = None, hidden_channels: int = 64, heads: int = 8, epochs: int = 200, lr: float = 0.005)
Initialize and train the GAT model.
- Parameters:
data (Any) – The graph dataset.
device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).
hidden_channels (int, optional) – Number of hidden channels in GAT layers.
heads (int, optional) – Number of attention heads in GAT layers.
epochs (int, optional) – Number of training epochs.
lr (float, optional) – Learning rate for the optimizer.
- evaluate() -> float
Evaluate the model’s accuracy on the training set.
- Returns:
Training accuracy.
- Return type:
float
- get_node_embeddings() -> Tensor
Generate node embeddings by passing the data through the model.
- Returns:
Node embeddings.
- Return type:
torch.Tensor
- class roksana.GCNSearch(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)
Bases: SearchMethod
Search method using Graph Convolutional Networks (GCN).
- __init__(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)
Initialize and train the GCN model.
- Parameters:
- evaluate() -> float
Evaluate the model’s accuracy on the training set.
- Returns:
Training accuracy.
- Return type:
float
- get_node_embeddings() -> Tensor
Generate node embeddings by passing the data through the model.
- Returns:
Node embeddings.
- Return type:
torch.Tensor
- search(query_features: Tensor, top_k: int = 10) -> List[List[int]]
Perform a search with the given query features using GCN embeddings.
- Parameters:
query_features (torch.Tensor) – Feature tensor of the query nodes, shape [num_queries, feature_dim] or [feature_dim].
top_k (int, optional) – Number of top similar nodes to retrieve.
- Returns:
List of lists containing node indices sorted by similarity to each query.
- Return type:
List[List[int]]
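The top-k retrieval step can be sketched in plain Python. This is an illustration under stated assumptions, not the GCNSearch implementation: cosine similarity is assumed as the measure (the source only says "similarity"), queries are compared directly against precomputed embeddings, and the names cosine and top_k_search are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_search(
    embeddings: List[Sequence[float]], queries: List[Sequence[float]], top_k: int = 10
) -> List[List[int]]:
    """For each query, return node indices sorted by decreasing similarity."""
    results = []
    for query in queries:
        scores = [cosine(query, emb) for emb in embeddings]
        # Sort node indices by score, best first, and keep the first top_k.
        order = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)
        results.append(order[:top_k])
    return results


# Assumed toy embeddings for four nodes in a 2-D embedding space.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
hits = top_k_search(embeddings, [[1.0, 0.0], [0.0, 2.0]], top_k=2)
```

The result has one inner list per query, which matches the documented List[List[int]] return type.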
- class roksana.SAGESearch(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)
Bases: SearchMethod
Search method using GraphSAGE.
- __init__(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)
Initialize and train the GraphSAGE model.
- Parameters:
- evaluate() -> float
Evaluate the model’s accuracy on the training set.
- Returns:
Training accuracy.
- Return type:
float
- get_node_embeddings() -> Tensor
Generate node embeddings by passing the data through the model.
- Returns:
Node embeddings.
- Return type:
torch.Tensor
- class roksana.SearchMethod(data: Any, device: str = None, **kwargs)
Bases: ABC
Abstract base class for search methods.
- abstract __init__(data: Any, device: str = None, **kwargs)
Initialize the search method with the given dataset.
- Parameters:
data (Any) – The graph dataset.
device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).
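The abstract-base-class pattern can be sketched as follows. Only the abstract __init__ is documented here; the abstract search method, the default of 'cpu' when device is None, and the class names SearchMethodSketch and NearestValueSearch are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class SearchMethodSketch(ABC):
    """Hypothetical stand-in for an abstract search-method base class."""

    @abstractmethod
    def __init__(self, data: Any, device: str = None, **kwargs):
        self.data = data
        self.device = device or "cpu"  # assumption: fall back to CPU

    @abstractmethod
    def search(self, query: Any, top_k: int = 10) -> List[int]:
        """Return the indices of the top_k items closest to the query."""


class NearestValueSearch(SearchMethodSketch):
    """Toy subclass: ranks scalar items by absolute distance to the query."""

    def __init__(self, data: List[float], device: str = None, **kwargs):
        super().__init__(data, device, **kwargs)

    def search(self, query: float, top_k: int = 10) -> List[int]:
        order = sorted(range(len(self.data)), key=lambda i: abs(self.data[i] - query))
        return order[:top_k]


searcher = NearestValueSearch([10, 3, 7, 4])
nearest = searcher.search(5, top_k=2)
```

Because __init__ is abstract, the base class cannot be instantiated directly; every concrete search method has to define its own initializer.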
- class roksana.UserDataset(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)
Bases: InMemoryDataset
A dataset class for user-provided datasets adhering to PyG’s InMemoryDataset structure.
Users should provide their data in a specific format, typically as a list of torch_geometric.data.Data objects.
- __init__(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)
Initialize the UserDataset.
- Parameters:
root (str) – Root directory where the dataset should be saved.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. If provided, it will be used to initialize the dataset.
- roksana.demotion_value(before_attack_rank: int, after_attack_rank: int) -> int
Calculate the Demotion Value metric.
- roksana.get_attack_method(name: str, data: Any, **kwargs) -> BaseAttack
Retrieve an instance of the specified attack method.
- Parameters:
name (str) – Name of the attack method (e.g., ‘degree’, ‘pagerank’, ‘random’, ‘viking’).
data (Any) – The graph dataset.
**kwargs – Additional keyword arguments for initializing the attack method.
- Returns:
An instance of the requested attack method.
- Return type:
BaseAttack
- Raises:
ValueError – If the specified attack method is not registered.
Example
>>> from roksana.attack_methods.registry import get_attack_method
>>> attack = get_attack_method('degree', data=my_graph, param1=value1)
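A name-based registry with a ValueError for unknown names could look roughly like the sketch below. It is a generic illustration of the pattern and not the contents of roksana.attack_methods.registry; the names _ATTACKS, register_attack, get_attack, and RandomAttackSketch are all hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping attack names to their classes.
_ATTACKS: Dict[str, Callable[..., Any]] = {}


def register_attack(name: str):
    """Class decorator that records an attack class under `name`."""
    def decorator(cls):
        _ATTACKS[name] = cls
        return cls
    return decorator


def get_attack(name: str, data: Any, **kwargs) -> Any:
    """Instantiate the attack registered under `name`, forwarding kwargs."""
    if name not in _ATTACKS:
        raise ValueError(
            f"Attack method '{name}' is not registered. Available: {sorted(_ATTACKS)}"
        )
    return _ATTACKS[name](data, **kwargs)


@register_attack("random")
class RandomAttackSketch:
    """Toy attack that only stores its arguments."""

    def __init__(self, data: Any, budget: int = 1):
        self.data = data
        self.budget = budget


attack = get_attack("random", data=[(0, 1), (1, 2)], budget=3)
```

Keeping the lookup in one function lets callers pick an attack by its string name and pass method-specific options through **kwargs.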
- roksana.get_dataset_info(dataset: InMemoryDataset) -> Dict[str, Any]
Retrieve basic information about a dataset.
- Parameters:
dataset (InMemoryDataset) – The dataset instance.
- Returns:
A dictionary containing dataset information.
- Return type:
Dict[str, Any]
- roksana.get_search_method(name: str, data: Any, device: str = None, **kwargs) -> SearchMethod
Retrieve an instance of the specified search method.
- Parameters:
- Returns:
An instance of the requested search method.
- Return type:
SearchMethod
- Raises:
ValueError – If the specified search method is not registered.
- roksana.hit_at_k(retrieved: List[int], gold_set: List[int], k: int) -> float
Calculate Hit@k metric.
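For orientation, a minimal sketch of Hit@k under its common definition: 1.0 if at least one gold item appears among the first k retrieved items, and 0.0 otherwise. The source does not state the exact definition used, so this is an assumption, and hit_at_k_sketch is a hypothetical name.

```python
from typing import List


def hit_at_k_sketch(retrieved: List[int], gold_set: List[int], k: int) -> float:
    """Return 1.0 if any gold item is in the top-k retrieved items, else 0.0."""
    return 1.0 if set(retrieved[:k]) & set(gold_set) else 0.0


ranking = [7, 3, 9, 1, 4]          # assumed ranked node indices
early_hit = hit_at_k_sketch(ranking, [3, 8], k=2)   # gold node 3 is at rank 2
miss = hit_at_k_sketch(ranking, [4], k=3)           # gold node 4 is at rank 5
```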
- roksana.list_available_standard_datasets() -> List[str]
List all available standard datasets supported by ROKSANA.
- Returns:
A list of supported dataset names.
- Return type:
List[str]
- roksana.load_dataset(dataset_name: str | None = None, root: str = 'data', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None) -> InMemoryDataset
Load a dataset, either a standard dataset or a user-provided dataset.
- Parameters:
dataset_name (str, optional) – Name of the standard dataset to load (e.g., ‘cora’, ‘citeseer’). If None, a UserDataset should be provided via data_list.
root (str, optional) – Root directory where the dataset should be saved or loaded from. Defaults to ‘data’.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. Required if dataset_name is None.
- Returns:
An instance of the loaded dataset.
- Return type:
InMemoryDataset
- roksana.load_standard_dataset(name: str, root: str = 'data') -> Planetoid
Load a standard dataset from PyG’s built-in datasets.
Supported datasets: ‘cora’, ‘citeseer’, ‘pubmed’, etc. Refer to PyG’s Planetoid datasets for more.
- roksana.load_user_dataset_from_files(data_dir: str, file_format: str = 'json', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None) -> UserDataset
Load a user dataset from files in a specified directory.
Supported file formats: ‘json’, ‘csv’, ‘pickle’.
- Parameters:
data_dir (str) – Directory containing the dataset files.
file_format (str, optional) – Format of the dataset files. Defaults to ‘json’.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
- Returns:
An instance of the UserDataset loaded from the files.
- Return type:
UserDataset
- roksana.prepare_search_set(data: Data, percentage: float = 0.1, seed: int = 42) -> Tuple[List[int], List[List[int]]]
Prepare a search set for search evaluation by selecting a percentage of nodes as queries and creating corresponding gold sets based on feature similarity.
- Parameters:
- Returns:
- A tuple containing:
queries (List[int]): List of node indices selected as queries.
gold_sets (List[List[int]]): List of gold sets, where each gold set is a list of node indices with the same features as the corresponding query.
- Return type:
Tuple[List[int], List[List[int]]]
- Raises:
ValueError – If percentage is not between 0 and 1.
AttributeError – If data does not contain node features (data.x).
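The query sampling and gold-set construction can be sketched in plain Python. This is an illustration under assumptions, not the library code: a list of feature rows stands in for data.x, the query node is excluded from its own gold set, at least one query is always drawn, and percentage must lie in (0, 1]. The name prepare_search_set_sketch is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def prepare_search_set_sketch(
    features: List[Sequence[float]], percentage: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[List[int]]]:
    """Sample query nodes and pair each with the nodes sharing its features."""
    if not 0 < percentage <= 1:
        raise ValueError("percentage must be between 0 and 1")

    num_nodes = len(features)
    num_queries = max(1, int(num_nodes * percentage))  # assumption: at least one
    rng = random.Random(seed)  # local generator keeps the sampling reproducible
    queries = rng.sample(range(num_nodes), num_queries)

    gold_sets = []
    for q in queries:
        target = tuple(features[q])
        # Assumption: the query itself is left out of its gold set.
        gold = [i for i in range(num_nodes) if i != q and tuple(features[i]) == target]
        gold_sets.append(gold)
    return queries, gold_sets


# Assumed toy feature matrix: ten nodes with 2-D binary features.
features = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1],
            [1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
queries, gold_sets = prepare_search_set_sketch(features, percentage=0.5, seed=42)
```

Using a seeded random.Random instance means repeated calls with the same seed select the same queries, which keeps evaluations before and after an attack comparable.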