Tutorial on How to Use FIM-PP

Tutorial on How to Use FIM-PP#

This notebook shows the recommended Hugging Face workflow for the point-process model from [BSC+26]: load the pretrained model with AutoModel.from_pretrained(...), download a small example dataset from the Hub, prepare the context/inference tensors, visualize the inferred intensities, and finish with a fine-tuning command template.

%matplotlib inline

from pathlib import Path
import warnings

import torch
from huggingface_hub import snapshot_download
from transformers import AutoModel

from point_process_tutorial_helper import load_hawkes_tensors, move_to_device, plot_intensity_comparison, prepare_hawkes_batch

warnings.filterwarnings("ignore")
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    if torch.backends.mps.is_available():
        print("MPS is not yet supported for this FIM-PP tutorial path; using CPU instead.")
    device = torch.device("cpu")
device

device(type='cuda')

Load the Pretrained Model#

The standardized user-facing path is now the Transformers AutoModel interface.

model_root = Path(snapshot_download(repo_id="FIM4Science/FIM-PP", repo_type="model"))
model = AutoModel.from_pretrained(model_root, trust_remote_code=True)
model = model.to(device)
model.eval()
model_root

PosixPath('/home/manuel/.cache/huggingface/hub/models--FIM4Science--FIM-PP/snapshots/9b43be14fc922bb389c873bf242fa9dd642662a5')

Download Example Data#

The tutorial dataset is stored as raw tensors on Hugging Face. We download the snapshot and load the .pt files directly.

dataset_root = Path(snapshot_download(repo_id="FIM4Science/10D-Hawkes", repo_type="dataset"))
dataset_root

PosixPath('/home/manuel/.cache/huggingface/hub/datasets--FIM4Science--10D-Hawkes/snapshots/63a12564d24e7ab41e0381b62528781a95036884')

tensors = load_hawkes_tensors(dataset_root)
sorted(tensors)

['event_times', 'event_types']

Build a Context / Inference Batch#

We hold out a single path for inference and use the remaining paths as context. The helper also builds a dense evaluation grid for plotting the intensity curves between events.

batch = prepare_hawkes_batch(tensors, sample_idx=0, inference_path_idx=0, num_points_between_events=10)
batch = move_to_device(batch, device)

for key, value in batch.items():
    if torch.is_tensor(value):
        print(f"{key}: {tuple(value.shape)}")
    else:
        print(f"{key}: {value}")

context_event_times: (1, 1999, 100, 1)
context_event_types: (1, 1999, 100, 1)
context_seq_lengths: (1, 1999)
inference_event_times: (1, 1, 100, 1)
inference_event_types: (1, 1, 100, 1)
inference_seq_lengths: (1, 1)
intensity_evaluation_times: (1, 1, 1100)
num_marks: 10

Run Zero-Shot Inference#

with torch.no_grad():
    output = model(batch)

sorted(output.keys())

['intensity_function', 'losses', 'predicted_intensity_values']

fig = plot_intensity_comparison(output, batch, path_idx=0)
fig

_images/69188d3598eb6c7d977ce8de0827130e86f87c16a41c5099d8189860aebb0503.png

Fine-Tuning Starting from FIM-PP#

A short fine-tuning run can be started with the existing Hawkes entrypoint. The point-process checkpoint is the initialization source, while the dataset can be either a local tensor folder or an EasyTPP dataset id.

For this tutorial, the 10D-Hawkes snapshot is used for inference and visualization. The fine-tuning CLI was smoke-tested with easytpp/retweet, which matches the expected training layout directly.

Use the downloaded model directory from the earlier snapshot_download(...) call as --resume_model. The script accepts either that directory or a specific file inside it such as model-checkpoint.pth.

python scripts/hawkes/fim_finetune.py \
  --config configs/train/hawkes/david.yaml \
  --dataset easytpp/retweet \
  --resume_model /absolute/path/to/downloaded/FIM-PP \
  --save_dir results/finetuned_cdiff \
  --epochs 200 \
  --val-every 10

If you use the notebook variable directly, --resume_model should point to model_root.

The fine-tuned model is written under save_dir/<dataset_name>/<timestamp>/. With the command above, a run on easytpp/retweet will be stored in a directory like results/finetuned_cdiff/retweet/260401-1430/, and the exported checkpoint will appear in best-model/ inside that folder.

If --save_dir is omitted, the script defaults to results/finetuned_cdiff/<dataset_name>/<timestamp>/.

For local debugging, the lower-level fallback fim.models.hawkes.FIMHawkes.load_model(...) is still available, but the primary public workflow should use AutoModel.from_pretrained(...).