roksana package

Subpackages

Submodules

roksana.leaderboard module

class roksana.leaderboard.LeaderboardClient(api_url: str, api_key: str)[source]

Bases: object

get_leaderboard()[source]

submit_result(user_id: str, result: Any)[source]

roksana.utils module

roksana.utils.compare_original_vs_updated(original_data, updated_data)[source]

roksana.utils.remove_edges(data, edges_to_remove, inplace=False)[source]

Remove specified edges from the given undirected graph data object.

This function modifies the data.edge_index attribute by removing the specified edges. It can handle either a single list of edges or a list of lists of edges. Since the graph is undirected, (u, v) and (v, u) are considered the same edge.

Parameters:

data – A PyG Data object with an edge_index attribute.
edges_to_remove (List[Tuple[int, int]] or List[List[Tuple[int, int]]]) – A collection of edges to remove. For example: - [(u1, v1), (u2, v2), …] - [[(u1, v1), (u2, v2)], [(u3, v3), …]]
inplace (bool, optional) – If True, modifies the input data object in-place. Default is False.

Returns:

The modified data object with the specified edges removed.
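The undirected matching described above can be illustrated without PyG. The following is a minimal sketch, not the library's implementation: it assumes the edge index is a plain 2 x E list of lists rather than a tensor, and the helper names flatten_edges and remove_undirected_edges are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Edge = Tuple[int, int]


def flatten_edges(edges_to_remove: Sequence[Union[Edge, Sequence[Edge]]]) -> List[Edge]:
    """Accept either [(u, v), ...] or [[(u, v), ...], ...] and return a flat list."""
    flat: List[Edge] = []
    for item in edges_to_remove:
        if len(item) == 2 and all(isinstance(x, int) for x in item):
            flat.append((item[0], item[1]))        # a single (u, v) pair
        else:
            flat.extend((u, v) for u, v in item)   # a nested list of pairs
    return flat


def remove_undirected_edges(edge_index: List[List[int]], edges_to_remove) -> List[List[int]]:
    """Return a new 2 x E edge list without the given edges, in either direction."""
    # frozenset makes (u, v) and (v, u) compare equal.
    banned = {frozenset(e) for e in flatten_edges(edges_to_remove)}
    kept = [(u, v) for u, v in zip(edge_index[0], edge_index[1])
            if frozenset((u, v)) not in banned]
    return [[u for u, _ in kept], [v for _, v in kept]]
```

The sketch always returns a new list, which corresponds to inplace=False; an in-place variant would assign the result back to the input object instead.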

roksana.utils.removed_edges_list_stat(data, removed_edges_list, verbose=True)[source]

Calculate and report statistics about a list of removed edges, including checking if the reverse direction of these edges exists in the main graph.

This function takes a list of edge lists that represent edges removed during multiple perturbation operations and aggregates them to determine:

The total number of removed edges across all operations.
The number of duplicate edges (edges that have appeared more than once across all operations).
The number of unique edges that have been removed overall.
The number of removed edges including their reversed counterpart present in the main graph (data.edge_index).

If verbose is True, it prints these statistics and returns nothing. If verbose is False, it returns the statistics as a tuple.

Parameters:

removed_edges_list (List[List[Tuple[int, int]]]) – A list of lists, where each inner list contains tuples representing edges that were removed in a particular operation.
data – A PyG Data object that contains the main graph edges in data.edge_index.
verbose (bool, optional) – If True, prints out the statistics. Defaults to True.

Returns:

A tuple containing:

int: The total number of removed edges.
int: The number of duplicate edges across all operations.
int: The number of unique edges removed overall.
int: The number of removed edges including their reversed counterpart in the main graph.

Return type:

Tuple[int, int, int, int]
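The four statistics can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the edge index is a 2 x E list of lists, "duplicates" is taken to mean occurrences beyond the first, edges are compared as directed tuples, and the fourth value counts unique removed edges whose reverse (v, u) is present in the graph. The function name removed_edge_stats is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]


def removed_edge_stats(edge_index: List[List[int]],
                       removed_edges_list: List[List[Edge]]) -> Tuple[int, int, int, int]:
    # Aggregate the edges removed across all perturbation operations.
    all_removed = [tuple(e) for op in removed_edges_list for e in op]
    counts = Counter(all_removed)

    total = len(all_removed)
    unique = len(counts)
    duplicates = total - unique  # assumption: repeats beyond the first occurrence

    # Assumption: an edge counts here if its reverse direction is in the graph.
    present = set(zip(edge_index[0], edge_index[1]))
    with_reverse = sum(1 for (u, v) in counts if (v, u) in present)

    return total, duplicates, unique, with_reverse
```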

Module contents

class roksana.Evaluator(search_method_before, search_method_after, k_values: List[int] = [5, 10, 20])[source]

Bases: object

Evaluator class to assess the impact of attack methods on search strategies.

__init__(search_method_before, search_method_after, k_values: List[int] = [5, 10, 20])[source]

Initialize the Evaluator.

Parameters:

search_method_before – Instance of SearchMethod before attack.
search_method_after – Instance of SearchMethod after attack.
k_values (List[int], optional) – List of k values for Hit@k and Recall@k. Defaults to [5, 10, 20].

evaluate(queries: List[int], gold_sets: List[List[int]], results_dir: str = 'results', filename: str = 'evaluation_results.csv') → None[source]

Perform evaluation on the given queries and save the results.

Parameters:

queries (List[int]) – List of query node indices.
gold_sets (List[List[int]]) – List of gold sets corresponding to each query.
results_dir (str, optional) – Directory to save the results file. Defaults to ‘results’.
filename (str, optional) – Name of the results file. Defaults to ‘evaluation_results.csv’.

get_all_results() → List[Dict[str, Any]][source]

Retrieve all evaluation results.

Returns:

List of evaluation result dictionaries.

Return type:

List[Dict[str, Any]]

class roksana.GATSearch(data: Any, device: str = None, hidden_channels: int = 64, heads: int = 8, epochs: int = 200, lr: float = 0.005)[source]

Bases: SearchMethod

Search method using Graph Attention Networks (GAT).

__init__(data: Any, device: str = None, hidden_channels: int = 64, heads: int = 8, epochs: int = 200, lr: float = 0.005)[source]

Initialize and train the GAT model.

Parameters:

data (Any) – The graph dataset.
device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).
hidden_channels (int, optional) – Number of hidden channels in GAT layers.
heads (int, optional) – Number of attention heads in GAT layers.
epochs (int, optional) – Number of training epochs.
lr (float, optional) – Learning rate for the optimizer.

evaluate() → float[source]

Evaluate the model’s accuracy on the training set.

Returns:

Training accuracy.

Return type:

float

get_node_embeddings() → Tensor[source]

Generate node embeddings by passing the data through the model.

Returns:

Node embeddings.

Return type:

torch.Tensor

search(query_features: Tensor, top_k: int = 10) → List[int][source]

Perform a search with the given query features using GAT embeddings.

Parameters:

query_features (torch.Tensor) – Feature vector of the query node.
top_k (int, optional) – Number of top similar nodes to retrieve.

Returns:

List of node indices sorted by similarity to the query.

Return type:

List[int]

train_model()[source]: Train the GAT model on the dataset. Assumes that the dataset has a ‘y’ attribute for node labels.

class roksana.GCNSearch(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)[source]

Bases: SearchMethod

Search method using Graph Convolutional Networks (GCN).

__init__(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)[source]

Initialize and train the GCN model.

Parameters:

data (Any) – The graph dataset.
device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).
hidden_channels (int, optional) – Number of hidden channels in GCN layers.
epochs (int, optional) – Number of training epochs.
lr (float, optional) – Learning rate for the optimizer.

evaluate() → float[source]

Evaluate the model’s accuracy on the training set.

Returns:

Training accuracy.

Return type:

float

get_node_embeddings() → Tensor[source]

Generate node embeddings by passing the data through the model.

Returns:

Node embeddings.

Return type:

torch.Tensor

search(query_features: Tensor, top_k: int = 10) → List[List[int]][source]

Perform a search with the given query features using GCN embeddings.

Parameters:

query_features (torch.Tensor) – Feature tensor of the query nodes, shape [num_queries, feature_dim] or [feature_dim].
top_k (int, optional) – Number of top similar nodes to retrieve.

Returns:

List of lists containing node indices sorted by similarity to each query.

Return type:

List[List[int]]

train_model()[source]: Train the GCN model on the dataset. Assumes that the dataset has a ‘y’ attribute for node labels.

class roksana.SAGESearch(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)[source]

Bases: SearchMethod

Search method using GraphSAGE.

__init__(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)[source]

Initialize and train the GraphSAGE model.

Parameters:

data (Any) – The graph dataset.
device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).
hidden_channels (int, optional) – Number of hidden channels in SAGE layers.
epochs (int, optional) – Number of training epochs.
lr (float, optional) – Learning rate for the optimizer.

evaluate() → float[source]

Evaluate the model’s accuracy on the training set.

Returns:

Training accuracy.

Return type:

float

get_node_embeddings() → Tensor[source]

Generate node embeddings by passing the data through the model.

Returns:

Node embeddings.

Return type:

torch.Tensor

search(query_features: Tensor, top_k: int = 10) → List[int][source]

Perform a search with the given query features using GraphSAGE embeddings.

Parameters:

query_features (torch.Tensor) – Feature vector of the query node.
top_k (int, optional) – Number of top similar nodes to retrieve.

Returns:

List of node indices sorted by similarity to the query.

Return type:

List[int]

train_model()[source]: Train the GraphSAGE model on the dataset. Assumes that the dataset has a ‘y’ attribute for node labels.

class roksana.SearchMethod(data: Any, device: str = None, **kwargs)[source]

Bases: ABC

Abstract base class for search methods.

abstract __init__(data: Any, device: str = None, **kwargs)[source]

Initialize the search method with the given dataset.

Parameters:

data (Any) – The graph dataset.
device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).

abstract search(query_features: Any, top_k: int = 10) → List[int][source]

Perform a search with the given query features.

Parameters:

query_features (Any) – Feature vector of the query node.
top_k (int, optional) – Number of top similar nodes to retrieve.

Returns:

List of node indices sorted by similarity to the query.

Return type:

List[int]
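To show what a concrete subclass of this interface might look like, here is a minimal sketch. It does not reproduce the library's classes: SearchMethodSketch is a hypothetical stand-in for the abstract base, and DotProductSearch is a toy subclass that ranks nodes by dot product over raw feature rows instead of learned GNN embeddings.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence


class SearchMethodSketch(ABC):
    """Hypothetical stand-in for the abstract interface described above."""

    @abstractmethod
    def __init__(self, data: Any, device: str = None, **kwargs) -> None:
        ...

    @abstractmethod
    def search(self, query_features: Any, top_k: int = 10) -> List[int]:
        ...


class DotProductSearch(SearchMethodSketch):
    """Toy subclass: similarity is the dot product between feature rows."""

    def __init__(self, data: Sequence[Sequence[float]], device: str = None, **kwargs) -> None:
        self.features = [list(row) for row in data]
        self.device = device or "cpu"  # unused here; kept to mirror the signature

    def search(self, query_features: Sequence[float], top_k: int = 10) -> List[int]:
        scores = [sum(q * x for q, x in zip(query_features, row))
                  for row in self.features]
        # Highest score first, then truncate to the top_k node indices.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return ranked[:top_k]
```

Because both methods are abstract, instantiating the base class directly raises TypeError; only subclasses that implement __init__ and search can be created.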

class roksana.UserDataset(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)[source]

Bases: InMemoryDataset

A dataset class for user-provided datasets adhering to PyG’s InMemoryDataset structure.

Users should provide their data in a specific format, typically as a list of torch_geometric.data.Data objects.

__init__(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)[source]

Initialize the UserDataset.

Parameters:

root (str) – Root directory where the dataset should be saved.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. If provided, it will be used to initialize the dataset.

download()[source]: Users are expected to provide their own data, so no download is necessary.

process()[source]

Process the user-provided data and save it in the processed file.

Users can modify this method if they have specific processing requirements.

property processed_file_names: List[str]

The name of the processed file.

property raw_file_names: List[str]

Since users provide their own data, this can be left empty or used to list expected raw files.

roksana.demotion_value(before_attack_rank: int, after_attack_rank: int) → int[source]

Calculate the Demotion Value metric.

Parameters:

before_attack_rank (int) – The rank of the target node before the attack.
after_attack_rank (int) – The rank of the target node after the attack.

Returns:

Difference in rank (after_attack_rank - before_attack_rank). A positive value indicates demotion.

Return type:

int
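The metric is the rank difference stated above. A minimal sketch follows; the function name demotion_value_sketch is hypothetical, and it assumes a larger rank number means a worse position.

```python
def demotion_value_sketch(before_attack_rank: int, after_attack_rank: int) -> int:
    """Rank difference: positive means the target node moved down the ranking."""
    return after_attack_rank - before_attack_rank


# A node that drops from rank 3 to rank 10 is demoted by 7 positions.
example = demotion_value_sketch(before_attack_rank=3, after_attack_rank=10)
```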

roksana.get_attack_method(name: str, data: Any, **kwargs) → BaseAttack[source]

Retrieve an instance of the specified attack method.

Parameters:

name (str) – Name of the attack method (e.g., ‘degree’, ‘pagerank’, ‘random’, ‘viking’).
data (Any) – The graph dataset.
**kwargs – Additional keyword arguments for initializing the attack method.

Returns:

An instance of the requested attack method.

Return type:

BaseAttack

Raises:

ValueError – If the specified attack method is not registered.

Example

>>> from roksana.attack_methods.registry import get_attack_method
>>> attack = get_attack_method('degree', data=my_graph, param1=value1)

roksana.get_dataset_info(dataset: InMemoryDataset) → Dict[str, Any][source]

Retrieve basic information about a dataset.

Parameters:

dataset (InMemoryDataset) – The dataset instance.

Returns:

A dictionary containing dataset information.

Return type:

Dict[str, Any]

roksana.get_search_method(name: str, data: Any, device: str = None, **kwargs) → SearchMethod[source]

Retrieve an instance of the specified search method.

Parameters:

name (str) – Name of the search method (e.g., ‘gcn’, ‘gat’, ‘sage’).
data (Any) – The graph dataset.
device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).
**kwargs – Additional keyword arguments for the search method.

Returns:

An instance of the requested search method.

Return type:

SearchMethod

Raises:

ValueError – If the specified search method is not registered.
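A name-based lookup that raises ValueError for unregistered names is commonly built on a registry dictionary. The sketch below illustrates that pattern only; it is not the library's registry. The names _SEARCH_REGISTRY, register_search_method, get_search_method_sketch, and ToySearch are hypothetical, and the case-insensitive lookup is an assumption.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a method name to a factory (class or function).
_SEARCH_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_search_method(name: str):
    """Decorator that records a factory under a lower-cased name."""
    def decorator(factory):
        _SEARCH_REGISTRY[name.lower()] = factory
        return factory
    return decorator


def get_search_method_sketch(name: str, data: Any, device: str = None, **kwargs) -> Any:
    try:
        factory = _SEARCH_REGISTRY[name.lower()]  # assumption: case-insensitive
    except KeyError:
        raise ValueError(
            f"Search method '{name}' is not registered. "
            f"Available: {sorted(_SEARCH_REGISTRY)}"
        ) from None
    # Extra keyword arguments are forwarded to the method's constructor.
    return factory(data, device=device, **kwargs)


@register_search_method("toy")
class ToySearch:
    def __init__(self, data: Any, device: str = None, **kwargs) -> None:
        self.data, self.device, self.options = data, device, kwargs
```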

roksana.hit_at_k(retrieved: List[int], gold_set: List[int], k: int) → float[source]

Calculate Hit@k metric.

Parameters:

retrieved (List[int]) – List of retrieved node indices.
gold_set (List[int]) – List of gold node indices.
k (int) – The k in Hit@k.

Returns:

Hit@k value (1 if at least one gold node is in the top-k, else 0).

Return type:

float
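The rule described above can be written in a few lines. This is a minimal sketch with the hypothetical name hit_at_k_sketch; it assumes retrieved is already ordered from most to least similar.

```python
from typing import List


def hit_at_k_sketch(retrieved: List[int], gold_set: List[int], k: int) -> float:
    """Return 1.0 if any gold node appears among the first k retrieved nodes."""
    gold = set(gold_set)
    return 1.0 if any(node in gold for node in retrieved[:k]) else 0.0
```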

roksana.list_available_standard_datasets() → List[str][source]

List all available standard datasets supported by ROKSANA.

Returns:: A list of supported dataset names.
Return type:: List[str]

roksana.load_dataset(dataset_name: str | None = None, root: str = 'data', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None) → InMemoryDataset[source]

Load a dataset, either a standard dataset or a user-provided dataset.

Parameters:

dataset_name (str, optional) – Name of the standard dataset to load (e.g., ‘cora’, ‘citeseer’). If None, a UserDataset should be provided via data_list.
root (str, optional) – Root directory where the dataset should be saved or loaded from. Defaults to ‘data’.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.
data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. Required if dataset_name is None.

Returns:

An instance of the loaded dataset.

Return type:

InMemoryDataset

roksana.load_standard_dataset(name: str, root: str = 'data') → Planetoid[source]

Load a standard dataset from PyG’s built-in datasets.

Supported datasets: ‘cora’, ‘citeseer’, ‘pubmed’, etc. Refer to PyG’s Planetoid datasets for more.

Parameters:

name (str) – Name of the dataset to load (e.g., ‘Cora’, ‘Citeseer’).
root (str, optional) – Root directory where the dataset should be saved. Defaults to ‘data’.

Returns:

An instance of the Planetoid dataset.

Return type:

Planetoid

roksana.load_user_dataset_from_files(data_dir: str, file_format: str = 'json', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None) → UserDataset[source]

Load a user dataset from files in a specified directory.

Supported file formats: ‘json’, ‘csv’, ‘pickle’.

Parameters:

data_dir (str) – Directory containing the dataset files.
file_format (str, optional) – Format of the dataset files. Defaults to ‘json’.
transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.
pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.
pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.

Returns:

An instance of the UserDataset loaded from the files.

Return type:

UserDataset

roksana.prepare_search_set(data: Data, percentage: float = 0.1, seed: int = 42) → Tuple[List[int], List[List[int]]][source]

Prepare a search set for search evaluation by selecting a percentage of nodes as queries and creating corresponding gold sets based on feature similarity.

Parameters:

data (Data) – The graph dataset.
percentage (float, optional) – Percentage of nodes to select as queries. Must be between 0 and 1. Defaults to 0.1 (10%).
seed (int, optional) – Seed for random number generator to ensure reproducibility. Defaults to 42.

Returns:

A tuple containing:

queries (List[int]): List of node indices selected as queries.
gold_sets (List[List[int]]): List of gold sets, where each gold set is a list of node indices with the same features as the corresponding query.

Return type:

Tuple[List[int], List[List[int]]]

Raises:

ValueError – If percentage is not between 0 and 1.
AttributeError – If dataset does not contain node features (data.x).
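The selection procedure can be sketched in plain Python. This is an illustration, not the library's implementation: node features are a list of rows rather than data.x, the name prepare_search_set_sketch is hypothetical, and several details are assumptions (percentage is accepted in (0, 1], at least one query is drawn, and a query is excluded from its own gold set).

```python
import random
from typing import Dict, List, Sequence, Tuple


def prepare_search_set_sketch(
    features: Sequence[Sequence[float]],
    percentage: float = 0.1,
    seed: int = 42,
) -> Tuple[List[int], List[List[int]]]:
    if not 0 < percentage <= 1:  # assumption: upper bound is inclusive
        raise ValueError("percentage must be between 0 and 1")

    num_nodes = len(features)
    num_queries = min(num_nodes, max(1, int(num_nodes * percentage)))

    # A seeded generator makes the query selection reproducible.
    rng = random.Random(seed)
    queries = sorted(rng.sample(range(num_nodes), num_queries))

    # Group node indices by identical feature rows.
    groups: Dict[tuple, List[int]] = {}
    for idx, row in enumerate(features):
        groups.setdefault(tuple(row), []).append(idx)

    # Assumption: the query node is left out of its own gold set.
    gold_sets = [[n for n in groups[tuple(features[q])] if n != q] for q in queries]
    return queries, gold_sets
```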

roksana.recall_at_k(retrieved: List[int], gold_set: List[int], k: int) → float[source]

Calculate Recall@k metric.

Parameters:

retrieved (List[int]) – List of retrieved node indices.
gold_set (List[int]) – List of gold node indices.
k (int) – The k in Recall@k.

Returns:

Recall@k value.

Return type:

float
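The reference above does not spell out the formula. The sketch below uses the conventional definition of Recall@k, the fraction of gold nodes that appear among the first k retrieved nodes, which is an assumption about the library's behaviour. The name recall_at_k_sketch and the choice to return 0.0 for an empty gold set are also assumptions.

```python
from typing import List


def recall_at_k_sketch(retrieved: List[int], gold_set: List[int], k: int) -> float:
    """Fraction of gold nodes found in the top-k retrieved nodes."""
    gold = set(gold_set)
    if not gold:
        return 0.0  # assumption: avoid division by zero for empty gold sets
    found = gold.intersection(retrieved[:k])
    return len(found) / len(gold)
```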