roksana package

Subpackages

Submodules

roksana.leaderboard module

class roksana.leaderboard.LeaderboardClient(api_url: str, api_key: str)[source]

Bases: object

get_leaderboard()[source]
submit_result(user_id: str, result: Any)[source]

roksana.utils module

roksana.utils.compare_original_vs_updated(original_data, updated_data)[source]
roksana.utils.remove_edges(data, edges_to_remove, inplace=False)[source]

Remove specified edges from the given undirected graph data object.

This function modifies the data.edge_index attribute by removing the specified edges. It can handle either a single list of edges or a list of lists of edges. Since the graph is undirected, (u, v) and (v, u) are considered the same edge.

Parameters:
  • data – A PyG Data object with an edge_index attribute.

  • edges_to_remove (List[Tuple[int, int]] or List[List[Tuple[int, int]]]) – A collection of edges to remove. For example: - [(u1, v1), (u2, v2), …] - [[(u1, v1), (u2, v2)], [(u3, v3), …]]

  • inplace (bool, optional) – If True, modifies the input data object in-place. Default is False.

Returns:

The modified data object with the specified edges removed.
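The undirected matching rule described above can be illustrated without PyG. The following is a minimal sketch, not the library's implementation: it works on a plain list of (u, v) pairs instead of data.edge_index, and the helper names flatten_edges and remove_undirected are hypothetical.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def flatten_edges(edges_to_remove: Sequence) -> List[Edge]:
    """Accept a flat list of edges or a list of edge lists (hypothetical helper)."""
    flat: List[Edge] = []
    for item in edges_to_remove:
        if len(item) == 2 and all(isinstance(x, int) for x in item):
            flat.append((item[0], item[1]))        # a single (u, v) edge
        else:
            flat.extend((u, v) for u, v in item)   # an inner list of edges
    return flat


def remove_undirected(edge_pairs: List[Edge], edges_to_remove: Sequence) -> List[Edge]:
    """Return a new list without the given edges, in either direction."""
    # An undirected edge is the unordered pair {u, v}, so (u, v) and (v, u) match.
    drop = {frozenset(edge) for edge in flatten_edges(edges_to_remove)}
    return [(u, v) for u, v in edge_pairs if frozenset((u, v)) not in drop]


edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
print(remove_undirected(edges, [(1, 0)]))              # flat list of edges
print(remove_undirected(edges, [[(0, 1)], [(3, 2)]]))  # list of edge lists
```

The sketch always builds a new list, which corresponds to the inplace=False behaviour; the input list is left untouched.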

roksana.utils.removed_edges_list_stat(data, removed_edges_list, verbose=True)[source]

Calculate and report statistics about a list of removed edges, including checking if the reverse direction of these edges exists in the main graph.

This function takes a list of edge lists that represent edges removed during multiple perturbation operations and aggregates them to determine:

  • The total number of removed edges across all operations.

  • The number of duplicate edges (edges that have appeared more than once across all operations).

  • The number of unique edges that have been removed overall.

  • The number of removed edges including their reversed counterpart present in the main graph (data.edge_index).

If verbose is True, it prints these statistics and does not return anything. If verbose is False, it returns the statistics as a tuple.

Parameters:
  • removed_edges_list (List[List[Tuple[int, int]]]) – A list of lists, where each inner list contains tuples representing edges that were removed in a particular operation.

  • data – A PyG Data object that contains the main graph edges in data.edge_index.

  • verbose (bool, optional) – If True, prints out the statistics. Defaults to True.

Returns:

A tuple containing:
  • int: The total number of removed edges.

  • int: The number of duplicate edges across all operations.

  • int: The number of unique edges removed overall.

  • int: The number of removed edges including their reversed counterpart in the main graph.

Return type:

Tuple[int, int, int, int]
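As a rough illustration of the four statistics, the sketch below aggregates removed edges from several operations using plain Python sets and lists. It is not the library's code, and two readings are assumptions: duplicates are counted as total minus unique, and the fourth value counts unique removed edges whose reverse (v, u) is present in the main graph. The function name removed_edges_stats is hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def removed_edges_stats(
    graph_edges: Set[Edge], removed_edges_list: List[List[Edge]]
) -> Tuple[int, int, int, int]:
    """Aggregate removed edges across operations (hypothetical sketch)."""
    flat = [tuple(edge) for operation in removed_edges_list for edge in operation]
    counts = Counter(flat)
    total = len(flat)
    unique = len(counts)
    # Assumption: every repeat beyond the first occurrence counts as one duplicate.
    duplicates = total - unique
    # Assumption: count unique removed edges whose reverse exists in the graph.
    with_reverse = sum(1 for (u, v) in counts if (v, u) in graph_edges)
    return total, duplicates, unique, with_reverse


graph_edges = {(0, 1), (1, 0), (1, 2), (2, 3), (3, 2)}
removed = [[(0, 1), (1, 2)], [(0, 1), (2, 3)]]
print(removed_edges_stats(graph_edges, removed))
```

With these inputs the sketch reports four removed edges in total, one duplicate, three unique edges, and two edges whose reverse is present.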

Module contents

class roksana.Evaluator(search_method_before, search_method_after, k_values: List[int] = [5, 10, 20])[source]

Bases: object

Evaluator class to assess the impact of attack methods on search strategies.

__init__(search_method_before, search_method_after, k_values: List[int] = [5, 10, 20])[source]

Initialize the Evaluator.

Parameters:
  • search_method_before – Instance of SearchMethod before attack.

  • search_method_after – Instance of SearchMethod after attack.

  • k_values (List[int], optional) – List of k values for Hit@k and Recall@k. Defaults to [5, 10, 20].

evaluate(queries: List[int], gold_sets: List[List[int]], results_dir: str = 'results', filename: str = 'evaluation_results.csv') None[source]

Perform evaluation on the given queries and save the results.

Parameters:
  • queries (List[int]) – List of query node indices.

  • gold_sets (List[List[int]]) – List of gold sets corresponding to each query.

  • results_dir (str, optional) – Directory to save the results file. Defaults to ‘results’.

  • filename (str, optional) – Name of the results file. Defaults to ‘evaluation_results.csv’.

get_all_results() List[Dict[str, Any]][source]

Retrieve all evaluation results.

Returns:

List of evaluation result dictionaries.

Return type:

List[Dict[str, Any]]

class roksana.GATSearch(data: Any, device: str = None, hidden_channels: int = 64, heads: int = 8, epochs: int = 200, lr: float = 0.005)[source]

Bases: SearchMethod

Search method using Graph Attention Networks (GAT).

__init__(data: Any, device: str = None, hidden_channels: int = 64, heads: int = 8, epochs: int = 200, lr: float = 0.005)[source]

Initialize and train the GAT model.

Parameters:
  • data (Any) – The graph dataset.

  • device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).

  • hidden_channels (int, optional) – Number of hidden channels in GAT layers.

  • heads (int, optional) – Number of attention heads in GAT layers.

  • epochs (int, optional) – Number of training epochs.

  • lr (float, optional) – Learning rate for the optimizer.

evaluate() float[source]

Evaluate the model’s accuracy on the training set.

Returns:

Training accuracy.

Return type:

float

get_node_embeddings() Tensor[source]

Generate node embeddings by passing the data through the model.

Returns:

Node embeddings.

Return type:

torch.Tensor

search(query_features: Tensor, top_k: int = 10) List[int][source]

Perform a search with the given query features using GAT embeddings.

Parameters:
  • query_features (torch.Tensor) – Feature vector of the query node.

  • top_k (int, optional) – Number of top similar nodes to retrieve.

Returns:

List of node indices sorted by similarity to the query.

Return type:

List[int]

train_model()[source]

Train the GAT model on the dataset. Assumes that the dataset has a ‘y’ attribute for node labels.

class roksana.GCNSearch(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)[source]

Bases: SearchMethod

Search method using Graph Convolutional Networks (GCN).

__init__(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)[source]

Initialize and train the GCN model.

Parameters:
  • data (Any) – The graph dataset.

  • device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).

  • hidden_channels (int, optional) – Number of hidden channels in GCN layers.

  • epochs (int, optional) – Number of training epochs.

  • lr (float, optional) – Learning rate for the optimizer.

evaluate() float[source]

Evaluate the model’s accuracy on the training set.

Returns:

Training accuracy.

Return type:

float

get_node_embeddings() Tensor[source]

Generate node embeddings by passing the data through the model.

Returns:

Node embeddings.

Return type:

torch.Tensor

search(query_features: Tensor, top_k: int = 10) List[List[int]][source]

Perform a search with the given query features using GCN embeddings.

Parameters:
  • query_features (torch.Tensor) – Feature tensor of the query nodes, shape [num_queries, feature_dim] or [feature_dim].

  • top_k (int, optional) – Number of top similar nodes to retrieve.

Returns:

List of lists containing node indices sorted by similarity to each query.

Return type:

List[List[int]]

train_model()[source]

Train the GCN model on the dataset. Assumes that the dataset has a ‘y’ attribute for node labels.

class roksana.SAGESearch(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)[source]

Bases: SearchMethod

Search method using GraphSAGE.

__init__(data: Any, device: str = None, hidden_channels: int = 64, epochs: int = 200, lr: float = 0.01)[source]

Initialize and train the GraphSAGE model.

Parameters:
  • data (Any) – The graph dataset.

  • device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).

  • hidden_channels (int, optional) – Number of hidden channels in SAGE layers.

  • epochs (int, optional) – Number of training epochs.

  • lr (float, optional) – Learning rate for the optimizer.

evaluate() float[source]

Evaluate the model’s accuracy on the training set.

Returns:

Training accuracy.

Return type:

float

get_node_embeddings() Tensor[source]

Generate node embeddings by passing the data through the model.

Returns:

Node embeddings.

Return type:

torch.Tensor

search(query_features: Tensor, top_k: int = 10) List[int][source]

Perform a search with the given query features using GraphSAGE embeddings.

Parameters:
  • query_features (torch.Tensor) – Feature vector of the query node.

  • top_k (int, optional) – Number of top similar nodes to retrieve.

Returns:

List of node indices sorted by similarity to the query.

Return type:

List[int]

train_model()[source]

Train the GraphSAGE model on the dataset. Assumes that the dataset has a ‘y’ attribute for node labels.

class roksana.SearchMethod(data: Any, device: str = None, **kwargs)[source]

Bases: ABC

Abstract base class for search methods.

abstract __init__(data: Any, device: str = None, **kwargs)[source]

Initialize the search method with the given dataset.

Parameters:
  • data (Any) – The graph dataset.

  • device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).

abstract search(query_features: Any, top_k: int = 10) List[int][source]

Perform a search with the given query features.

Parameters:
  • query_features (Any) – Feature vector of the query node.

  • top_k (int, optional) – Number of top similar nodes to retrieve.

Returns:

List of node indices sorted by similarity to the query.

Return type:

List[int]
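The reference does not state which similarity measure the concrete search methods use. As a minimal sketch of the idea of returning node indices sorted by similarity, the following ranks rows of an embedding table by cosine similarity to a query vector. Cosine similarity, the plain-list representation, and the names cosine and top_k_similar are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(
    embeddings: Sequence[Sequence[float]], query: Sequence[float], top_k: int = 10
) -> List[int]:
    """Return indices of the top_k rows most similar to the query, best first."""
    scores = [cosine(row, query) for row in embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k_similar(embeddings, [1.0, 0.0], top_k=2))
```

Node 0 matches the query exactly and node 2 points partly in the same direction, so they occupy the first two positions.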

class roksana.UserDataset(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)[source]

Bases: InMemoryDataset

A dataset class for user-provided datasets adhering to PyG’s InMemoryDataset structure.

Users should provide their data in a specific format, typically as a list of torch_geometric.data.Data objects.

__init__(root: str, transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None)[source]

Initialize the UserDataset.

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.

  • pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.

  • pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.

  • data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. If provided, it will be used to initialize the dataset.

download()[source]

Users are expected to provide their own data, so no download is necessary.

process()[source]

Process the user-provided data and save it in the processed file.

Users can modify this method if they have specific processing requirements.

property processed_file_names: List[str]

The name of the processed file.

property raw_file_names: List[str]

Since users provide their own data, this can be left empty or used to list expected raw files.

roksana.demotion_value(before_attack_rank: int, after_attack_rank: int) int[source]

Calculate the Demotion Value metric.

Parameters:
  • before_attack_rank (int) – The rank of the target node before the attack.

  • after_attack_rank (int) – The rank of the target node after the attack.

Returns:

Difference in rank (after_attack_rank - before_attack_rank).

A positive value indicates demotion.

Return type:

int
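The sign convention is easy to check with a small stand-alone sketch that applies the documented formula, after_attack_rank - before_attack_rank. The function name demotion is hypothetical and the ranks are made-up illustration values.

```python
def demotion(before_attack_rank: int, after_attack_rank: int) -> int:
    """Rank difference as documented: after minus before."""
    return after_attack_rank - before_attack_rank


print(demotion(3, 10))   # target moved from rank 3 to rank 10: demoted
print(demotion(10, 3))   # target moved from rank 10 to rank 3: promoted
print(demotion(5, 5))    # rank unchanged
```

A positive result means the target node ranks lower after the attack, a negative result means it ranks higher, and zero means the attack did not move it.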

roksana.get_attack_method(name: str, data: Any, **kwargs) BaseAttack[source]

Retrieve an instance of the specified attack method.

Parameters:
  • name (str) – Name of the attack method (e.g., ‘degree’, ‘pagerank’, ‘random’, ‘viking’).

  • data (Any) – The graph dataset.

  • **kwargs – Additional keyword arguments for initializing the attack method.

Returns:

An instance of the requested attack method.

Return type:

BaseAttack

Raises:

ValueError – If the specified attack method is not registered.

Example

>>> from roksana.attack_methods.registry import get_attack_method
>>> attack = get_attack_method('degree', data=my_graph, param1=value1)

roksana.get_dataset_info(dataset: InMemoryDataset) Dict[str, Any][source]

Retrieve basic information about a dataset.

Parameters:

dataset (InMemoryDataset) – The dataset instance.

Returns:

A dictionary containing dataset information.

Return type:

Dict[str, Any]

roksana.get_search_method(name: str, data: Any, device: str = None, **kwargs) SearchMethod[source]

Retrieve an instance of the specified search method.

Parameters:
  • name (str) – Name of the search method (e.g., ‘gcn’, ‘gat’, ‘sage’).

  • data (Any) – The graph dataset.

  • device (str, optional) – Device to run the computations on (‘cpu’ or ‘cuda’).

  • **kwargs – Additional keyword arguments for the search method.

Returns:

An instance of the requested search method.

Return type:

SearchMethod

Raises:

ValueError – If the specified search method is not registered.
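Name-based lookup with a ValueError for unknown names is commonly built on a registry dictionary. The sketch below shows that general pattern only; it is an assumption about the design rather than ROKSANA's actual code, and _REGISTRY, register, get_method, and DummySearch are hypothetical names.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a method name to the class that implements it.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable:
    """Class decorator that stores the class under the given name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_method(name: str, data: Any, device: str = None, **kwargs) -> Any:
    """Instantiate the registered class, or raise ValueError if it is unknown."""
    if name not in _REGISTRY:
        raise ValueError(
            f"Search method '{name}' is not registered. Available: {sorted(_REGISTRY)}"
        )
    return _REGISTRY[name](data, device=device, **kwargs)


@register("dummy")
class DummySearch:
    """Stand-in class used only to exercise the registry."""
    def __init__(self, data: Any, device: str = None, **kwargs):
        self.data, self.device, self.options = data, device, kwargs


method = get_method("dummy", data=[1, 2, 3], device="cpu", hidden_channels=32)
print(type(method).__name__, method.device, method.options)
```

Extra keyword arguments are forwarded unchanged to the constructor, mirroring the **kwargs parameter in the signature above.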

roksana.hit_at_k(retrieved: List[int], gold_set: List[int], k: int) float[source]

Calculate Hit@k metric.

Parameters:
  • retrieved (List[int]) – List of retrieved node indices.

  • gold_set (List[int]) – List of gold node indices.

  • k (int) – The k in Hit@k.

Returns:

Hit@k value (1 if at least one gold node is in the top-k, else 0).

Return type:

float
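The documented behaviour (1 if at least one gold node is in the top-k, else 0) can be sketched in a few lines. This is an illustrative re-implementation under that description, not the library source, and hit_at_k_sketch is a hypothetical name.

```python
from typing import List


def hit_at_k_sketch(retrieved: List[int], gold_set: List[int], k: int) -> float:
    """Return 1.0 if any gold node appears among the first k retrieved nodes."""
    gold = set(gold_set)
    return 1.0 if any(node in gold for node in retrieved[:k]) else 0.0


retrieved = [4, 7, 1, 9, 3]
print(hit_at_k_sketch(retrieved, [9, 3], k=2))  # no gold node in [4, 7]
print(hit_at_k_sketch(retrieved, [9, 3], k=4))  # node 9 is in the top 4
```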

roksana.list_available_standard_datasets() List[str][source]

List all available standard datasets supported by ROKSANA.

Returns:

A list of supported dataset names.

Return type:

List[str]

roksana.load_dataset(dataset_name: str | None = None, root: str = 'data', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, data_list: List[Data] | None = None) InMemoryDataset[source]

Load a dataset, either a standard dataset or a user-provided dataset.

Parameters:
  • dataset_name (str, optional) – Name of the standard dataset to load (e.g., ‘cora’, ‘citeseer’). If None, a UserDataset should be provided via data_list.

  • root (str, optional) – Root directory where the dataset should be saved or loaded from. Defaults to ‘data’.

  • transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.

  • pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.

  • pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.

  • data_list (List[Data], optional) – A list of torch_geometric.data.Data objects. Required if dataset_name is None.

Returns:

An instance of the loaded dataset.

Return type:

InMemoryDataset

roksana.load_standard_dataset(name: str, root: str = 'data') Planetoid[source]

Load a standard dataset from PyG’s built-in datasets.

Supported datasets: ‘cora’, ‘citeseer’, ‘pubmed’, etc. Refer to PyG’s Planetoid datasets for more.

Parameters:
  • name (str) – Name of the dataset to load (e.g., ‘Cora’, ‘Citeseer’).

  • root (str, optional) – Root directory where the dataset should be saved. Defaults to ‘data’.

Returns:

An instance of the Planetoid dataset.

Return type:

Planetoid

roksana.load_user_dataset_from_files(data_dir: str, file_format: str = 'json', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None) UserDataset[source]

Load a user dataset from files in a specified directory.

Supported file formats: ‘json’, ‘csv’, ‘pickle’.

Parameters:
  • data_dir (str) – Directory containing the dataset files.

  • file_format (str, optional) – Format of the dataset files. Defaults to ‘json’.

  • transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access.

  • pre_transform (Callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk.

  • pre_filter (Callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset.

Returns:

An instance of the UserDataset loaded from the files.

Return type:

UserDataset

roksana.prepare_search_set(data: Data, percentage: float = 0.1, seed: int = 42) Tuple[List[int], List[List[int]]][source]

Prepare a search set for search evaluation by selecting a percentage of nodes as queries and creating corresponding gold sets based on feature similarity.

Parameters:
  • data (Data) – The graph dataset.

  • percentage (float, optional) – Percentage of nodes to select as queries. Must be between 0 and 1. Defaults to 0.1 (10%).

  • seed (int, optional) – Seed for random number generator to ensure reproducibility. Defaults to 42.

Returns:

A tuple containing:
  • queries (List[int]): List of node indices selected as queries.

  • gold_sets (List[List[int]]): List of gold sets, where each gold set is a list of node indices with the same features as the corresponding query.

Return type:

Tuple[List[int], List[List[int]]]

Raises:
  • ValueError – If percentage is not between 0 and 1.

  • AttributeError – If dataset does not contain node features (data.x).
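To make the notion of a gold set concrete, here is a minimal sketch that samples query nodes with a seeded generator and groups nodes by identical feature rows. It uses plain lists instead of a PyG Data object and is not the library's implementation. Several details are assumptions: the query itself is left out of its gold set, at least one query is always selected, and percentage is accepted in the range (0, 1]. The name prepare_search_set_sketch is hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def prepare_search_set_sketch(
    features: Sequence[Sequence[float]], percentage: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[List[int]]]:
    """Pick query nodes and collect the nodes that share each query's features."""
    if not 0 < percentage <= 1:  # assumption: exact bounds are not documented
        raise ValueError("percentage must be between 0 and 1")
    num_nodes = len(features)
    num_queries = min(num_nodes, max(1, int(num_nodes * percentage)))
    rng = random.Random(seed)  # seeded generator for reproducible sampling
    queries = sorted(rng.sample(range(num_nodes), num_queries))

    # Group node indices by their exact feature row.
    groups: Dict[tuple, List[int]] = {}
    for index, row in enumerate(features):
        groups.setdefault(tuple(row), []).append(index)

    # Assumption: a query is not a member of its own gold set.
    gold_sets = [
        [node for node in groups[tuple(features[query])] if node != query]
        for query in queries
    ]
    return queries, gold_sets


features = [[1, 0], [0, 1], [1, 0], [1, 1]]
print(prepare_search_set_sketch(features, percentage=1.0))
```

With percentage=1.0 every node becomes a query, so the result does not depend on the seed: nodes 0 and 2 share features and appear in each other's gold sets, while nodes 1 and 3 get empty gold sets.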

roksana.recall_at_k(retrieved: List[int], gold_set: List[int], k: int) float[source]

Calculate Recall@k metric.

Parameters:
  • retrieved (List[int]) – List of retrieved node indices.

  • gold_set (List[int]) – List of gold node indices.

  • k (int) – The k in Recall@k.

Returns:

Recall@k value.

Return type:

float
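The reference gives no formula for Recall@k. Assuming the standard definition (the fraction of gold nodes that appear among the first k retrieved nodes), a minimal sketch looks like this. The name recall_at_k_sketch and the choice to return 0.0 for an empty gold set are assumptions.

```python
from typing import List


def recall_at_k_sketch(retrieved: List[int], gold_set: List[int], k: int) -> float:
    """Fraction of gold nodes found in the first k retrieved nodes."""
    gold = set(gold_set)
    if not gold:
        return 0.0  # assumption: avoid division by zero for an empty gold set
    return len(gold & set(retrieved[:k])) / len(gold)


retrieved = [4, 7, 1, 9, 3]
gold = [7, 3, 8]
print(recall_at_k_sketch(retrieved, gold, k=2))  # only node 7 found: 1 of 3
print(recall_at_k_sketch(retrieved, gold, k=5))  # nodes 7 and 3 found: 2 of 3
```

Unlike Hit@k, which saturates at 1 as soon as one gold node is retrieved, this value keeps growing as more gold nodes enter the top-k.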