
Detect main temporal trajectories in group-year DAG
Source: R/trajectories_core.R
detect_main_trajectories.Rd
Identifies the most significant temporal trajectories within a group's evolution over time by building a directed acyclic graph (DAG) from similarity data and extracting highest-scoring disjoint paths using dynamic programming.
Usage
detect_main_trajectories(
  groups_cumulative_trajectories,
  group,
  jaccard_min = 0.05,
  intra_min = 0.1,
  k_out = 2,
  alpha = 1,
  beta = 0.1,
  top_M = 5,
  min_len = 3,
  use_docs_per_group = TRUE
)
Arguments
- groups_cumulative_trajectories
List containing three components:
  - groups_similarity: Nested list with similarity data for each group, containing edges with from, to, weight (Jaccard), and documents
  - groups_attributes: Nested list with node attributes for each group, containing quantity_papers, prop_tracked_intra_group, tracked_documents, PY.sd, and network_until
  - docs_per_group: Data frame mapping group IDs to document IDs for accurate unique document counting
- group
Character ID of the group to analyze (e.g., "component1_g01")
- jaccard_min
Minimum Jaccard similarity for edges (default: 0.05). Higher values create sparser graphs with stronger connections.
- intra_min
Minimum proportion of tracked documents within group for nodes (default: 0.10). Higher values filter out weaker nodes.
- k_out
Maximum number of outgoing edges to keep per node (default: 2). Controls graph sparsity: lower values create simpler backbone structures.
- alpha
Weight for edge strength in path scoring (default: 1). Higher values emphasize transition strength over node quality.
- beta
Per-step persistence bonus in path scoring (default: 0.1). Higher values encourage longer trajectories.
- top_M
Maximum number of disjoint trajectories to extract (default: 5)
- min_len
Minimum number of distinct years for valid trajectory (default: 3)
- use_docs_per_group
Whether to use document IDs for accurate unique document counting (default: TRUE). If FALSE, uses approximation.
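The two counting modes selected by use_docs_per_group can differ. The sketch below is illustrative only and does not reproduce the package's internal code: it assumes that a node's document count is the number of document IDs attached to it, and that an edge's documents value is the number of IDs shared by its two endpoints. All IDs and node names are made up. It compares an exact set union with the approximation "sum of node documents minus sum of edge documents".

```r
# Hypothetical document IDs for three consecutive nodes on one path.
node_docs <- list(
  y2009g01 = c("d1", "d2", "d3"),
  y2010g01 = c("d2", "d3", "d4"),
  y2011g01 = c("d1", "d4", "d5")
)

# Exact count: size of the set union of all document IDs on the path.
exact <- length(unique(unlist(node_docs)))

# Approximation: add up node sizes, then subtract the documents shared
# along each edge (assumed here to be the overlap of consecutive nodes).
node_sum <- sum(lengths(node_docs))
edge_shared <- mapply(
  function(a, b) length(intersect(a, b)),
  node_docs[-length(node_docs)],  # edge sources: first and second node
  node_docs[-1]                   # edge targets: second and third node
)
approx <- node_sum - sum(edge_shared)

print(c(exact = exact, approx = approx))
```

In this toy path, "d1" appears in the first and third node but not in the second, so the approximation counts it twice. Under these assumptions, the approximation only matches the exact count when no document skips an intermediate year.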
Value
A list with two components:
  - graph: An igraph object representing the temporal DAG with scoring attributes and optional document IDs
  - trajectories: A tibble of detected trajectories sorted by score, with columns:
    - traj_id: Trajectory identifier ("tr1", "tr2", ...)
    - start, end: First and last year of the trajectory
    - length: Number of distinct years in the trajectory
    - nodes: List of node names along the path (e.g., "y2009g03")
    - score: Total path score from dynamic programming
    - mean_w: Mean edge score along the path
    - sum_docs: Count of unique documents covered by the path
    - mean_size: Mean node size (quantity_papers × proportion tracked)
    - mean_PYsd: Mean publication year standard deviation
Details
This function implements a comprehensive pipeline for detecting significant temporal trajectories in research group evolution:
Algorithm Overview
Build Temporal DAG: Constructs a directed acyclic graph where:
  - Nodes represent group-year combinations filtered by the intra_min quality threshold
  - Edges represent transitions between consecutive years filtered by jaccard_min
  - Graph is sparsified to top k_out edges per node
Score Components: Computes node and edge scores:
Node score: \(s_v = \log(1 + \text{quantity\_papers}_v \times \text{prop\_tracked\_intra\_group}_v)\)
Edge score: \(s_e = \text{weight}_e \times \log(1 + \text{documents}_e)\)
Extract Trajectories: Uses dynamic programming to find heaviest paths:
Path score: \(\text{best}(v) = \max\left( s_v, \max_{u \to v} \left( \text{best}(u) + s_v + \alpha \cdot s_{(u,v)} + \beta \right) \right)\)
  - Iteratively extracts up to top_M disjoint trajectories
  - Trajectories must span at least min_len distinct years
Count Documents: Calculates unique document coverage:
  - If use_docs_per_group = TRUE: Exact count via set union of document IDs
  - Otherwise: Approximation: \(\sum \text{node documents} - \sum \text{edge documents}\)
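The scoring and path-extraction steps can be illustrated on a toy graph. The following is a minimal base-R sketch, not the package's implementation: the node names, counts, and weights are invented, and only a single best path is extracted (the iterative extraction of several disjoint trajectories is not shown). It applies the node score, edge score, and best(v) recurrence given in the formulas above, then backtracks through the stored predecessors.

```r
# Toy DAG; every value here is made up for illustration.
nodes <- data.frame(
  name = c("y2009g01", "y2010g01", "y2010g02", "y2011g01"),
  year = c(2009, 2010, 2010, 2011),
  quantity_papers = c(40, 55, 10, 70),
  prop_tracked_intra_group = c(0.5, 0.6, 0.2, 0.7),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("y2009g01", "y2009g01", "y2010g01", "y2010g02"),
  to = c("y2010g01", "y2010g02", "y2011g01", "y2011g01"),
  weight = c(0.30, 0.08, 0.35, 0.06),
  documents = c(18, 3, 25, 2),
  stringsAsFactors = FALSE
)
alpha <- 1
beta <- 0.1

# Node and edge scores: s_v = log(1 + papers * prop), s_e = weight * log(1 + docs).
nodes$s_v <- log1p(nodes$quantity_papers * nodes$prop_tracked_intra_group)
edges$s_e <- edges$weight * log1p(edges$documents)

# Year order is a valid topological order because edges only link
# one year to the next.
nodes <- nodes[order(nodes$year), ]
best <- setNames(nodes$s_v, nodes$name)  # best(v) starts at s_v
pred <- setNames(rep(NA_character_, nrow(nodes)), nodes$name)

for (v in nodes$name) {
  s_v <- nodes$s_v[nodes$name == v]
  incoming <- edges[edges$to == v, ]
  for (i in seq_len(nrow(incoming))) {
    u <- incoming$from[i]
    candidate <- best[[u]] + s_v + alpha * incoming$s_e[i] + beta
    if (candidate > best[[v]]) {
      best[[v]] <- candidate
      pred[[v]] <- u
    }
  }
}

# Backtrack from the highest-scoring end node to recover the path.
end_node <- names(best)[which.max(best)]
path <- end_node
while (!is.na(pred[[path[1]]])) {
  path <- c(pred[[path[1]]], path)
}
print(path)
```

On this toy input the recovered path runs through "y2010g01" rather than "y2010g02", since both the node score and the incoming edge score are larger along that branch.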
Parameter Tuning Guidance
  - For smoother, longer trajectories: Increase beta (persistence bonus)
  - For transition-focused scoring: Increase alpha (edge weight)
  - For denser connectivity: Lower jaccard_min or increase k_out
  - For higher quality nodes: Increase intra_min
  - For exact document counts: Ensure use_docs_per_group = TRUE and provide docs_per_group data
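To make the connectivity guidance concrete, the sketch below shows how jaccard_min and k_out might interact on a toy edge table. It is an assumption-laden illustration, not the package's code: the helper sparsify() is hypothetical, edges are kept when weight >= jaccard_min, and the "top" k_out edges per source node are ranked by weight.

```r
# Toy edge table with made-up Jaccard weights.
edges <- data.frame(
  from = c("y2009g01", "y2009g01", "y2009g01", "y2009g02"),
  to = c("y2010g01", "y2010g02", "y2010g03", "y2010g01"),
  weight = c(0.30, 0.12, 0.04, 0.07),
  stringsAsFactors = FALSE
)

# Hypothetical helper: threshold on weight, then keep the k_out
# heaviest outgoing edges of each source node.
sparsify <- function(edges, jaccard_min, k_out) {
  kept <- edges[edges$weight >= jaccard_min, , drop = FALSE]
  kept <- kept[order(kept$from, -kept$weight), , drop = FALSE]
  # Rank edges within each source node (1 = heaviest).
  rank_within_source <- ave(seq_len(nrow(kept)), kept$from, FUN = seq_along)
  kept[rank_within_source <= k_out, , drop = FALSE]
}

strict <- sparsify(edges, jaccard_min = 0.05, k_out = 1)
loose <- sparsify(edges, jaccard_min = 0.03, k_out = 3)

print(nrow(strict))
print(nrow(loose))
```

With the stricter settings only the heaviest outgoing edge of each source node survives; lowering jaccard_min and raising k_out retains all four edges, which gives the path search more transitions to choose from.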
See also
filter_trajectories() for post-processing detected trajectories,
plot_group_trajectories_lines_2d() and plot_group_trajectories_lines_3d()
for visualization
Examples
if (FALSE) { # \dontrun{
  # Basic usage with default parameters
  trajectories <- detect_main_trajectories(
    groups_cumulative_trajectories = my_data,
    group = "component1_g01"
  )

  # Tuned for longer, transition-focused trajectories
  trajectories <- detect_main_trajectories(
    groups_cumulative_trajectories = my_data,
    group = "component1_g01",
    jaccard_min = 0.03, # More permissive connectivity
    k_out = 3,          # Denser backbone
    alpha = 1.5,        # Emphasize edge strength
    beta = 0.2,         # Encourage longer paths
    top_M = 8,          # Extract more trajectories
    min_len = 4         # Require longer trajectories
  )

  # Access results
  graph <- trajectories$graph
  trajectory_data <- trajectories$trajectories

  # Select the top trajectory (first row; the tibble is sorted by score)
  top_trajectory <- trajectory_data[1, ]
} # }