Amin Shabani

ML Researcher at RBC Borealis

I received my Ph.D. in Computer Science from Simon Fraser University (SFU) under the supervision of Prof. Yasutaka Furukawa, and my M.Sc. from Seoul National University (SNU) under Prof. Kyoung Mu Lee.

Research Interests

Generative Models, Large Language Models, Computer Vision, Time Series Forecasting

News

Jan 2026 Paper "Embedding-based Context-aware Reranker" accepted at ICLR 2026.

Oct 2025 1st Place Winner, Financial Agentic Retrieval Grand Challenge (FinAgentBench), AI for Finance Symposium '25.

Sep 2025 Paper "FACTS: Table Summarization via Offline Template Generation with Agentic Workflows" accepted at NeurIPS 2025 Workshop on Generative AI in Finance.

Apr 2024 Joined RBC Borealis as a Machine Learning Researcher.

Mar 2024 Paper "Visual Layout Composer" accepted at CVPR 2024.

Jan 2024 Gave a talk on diffusion models for layout generation at Samsung AI Center, Toronto.

Sep 2023 Paper "PuzzleFusion" accepted as Spotlight at NeurIPS 2023.

Mar 2023 Paper "HouseDiffusion" accepted at CVPR 2023.

Jan 2023 Paper "Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting" accepted at ICLR 2023.

Work Experience

Machine Learning Researcher

Apr 2024 – Present

Contributing to research and development of machine learning models for real-world applications in finance.

Research Scientist/Engineer Intern

Jun 2023 – Nov 2023

Worked on a new method for automatically generating layout designs. Published at CVPR 2024.

Machine Learning Research Intern

Jan 2022 – May 2022

Worked on Iterative Multi-scale Refining Transformers for Time Series Forecasting. Published at ICLR 2023.

Research and Development Engineer

Feb 2017 – Aug 2017

Worked on finding online methods for object detection, considering the trade-off between speed and accuracy.

Education

Simon Fraser University

2019 – 2025

Ph.D. in Computer Science

Seoul National University

2017 – 2019

M.Sc. in Computer Engineering

Sharif University of Technology

2012 – 2017

B.Sc. in Computer Science

Publications

Embedding-based Context-aware Reranker

ICLR 2026

Ye Yuan, Amin Shabani, Siqi Liu

TL;DR: Proposes an embedding-based context-aware reranking method for retrieval-augmented generation (RAG) that improves passage ranking by incorporating cross-passage context, leading to more relevant and accurate retrieval results.

Retrieval-Augmented Generation (RAG) systems rely on retrieving relevant evidence from a corpus to support downstream generation. The common practice of splitting a long document into multiple shorter passages enables finer-grained and targeted information retrieval. However, it also introduces challenges when a correct retrieval would require inference across passages, such as resolving coreference, disambiguating entities, and aggregating evidence scattered across multiple sources. Many state-of-the-art (SOTA) reranking methods, despite utilizing powerful large pretrained language models with potentially high inference costs, still neglect these challenges. Therefore, we propose Embedding-Based Context-Aware Reranker (EBCAR), a lightweight reranking framework that operates directly on the embeddings of retrieved passages. It enhances cross-passage understanding through the structural information of the passages and a hybrid attention mechanism, which captures both high-level interactions across documents and low-level relationships within each document. We evaluate EBCAR against SOTA rerankers on the ConTEB benchmark, demonstrating its effectiveness for information retrieval requiring cross-passage inference and its advantages in both accuracy and efficiency.

FACTS: Table Summarization via Offline Template Generation with Agentic Workflows

NeurIPS Workshop, 2025

Ye Yuan, Amin Shabani, Siqi Liu

TL;DR: Introduces a framework for financial table summarization that combines offline template generation with agentic workflows, enabling structured and accurate natural language summaries of complex tabular financial data.

Query-focused table summarization requires generating natural language summaries of tabular data conditioned on a user query, enabling users to access insights beyond fact retrieval. Existing approaches face key limitations: table-to-text models require costly fine-tuning and struggle with complex reasoning, prompt-based LLM methods suffer from token-limit and efficiency issues while exposing sensitive data, and prior agentic pipelines often rely on decomposition, planning, or manual templates that lack robustness and scalability. To mitigate these issues, we introduce an agentic workflow, FACTS, a Fast, Accurate, and Privacy-Compliant Table Summarization approach via Offline Template Generation. FACTS produces offline templates, consisting of SQL queries and Jinja2 templates, which can be rendered into natural language summaries and are reusable across multiple tables sharing the same schema. It enables fast summarization through reusable offline templates, accurate outputs with executable SQL queries, and privacy compliance by sending only table schemas to LLMs. Evaluations on widely-used benchmarks show that FACTS consistently outperforms baseline methods, establishing it as a practical solution for real-world query-focused table summarization.
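As a rough illustration of the offline-template idea described above, the sketch below pairs one SQL query with one text template and reuses the pair on two tables that share a schema. It is not the FACTS implementation: the table name `revenue`, its columns, the helper names, and the use of Python's built-in `str.format` as a dependency-free stand-in for Jinja2 are all assumptions made for this example.

```python
import sqlite3

# Hypothetical offline template: one SQL query plus one text template.
# Only the schema (table and column names) is needed to write it, not the rows.
TEMPLATE = {
    "sql": (
        "SELECT region, SUM(amount) AS total "
        "FROM revenue GROUP BY region ORDER BY total DESC LIMIT 1"
    ),
    # str.format stands in for a Jinja2 template here (an assumption).
    "text": "The top region is {region} with total revenue of {total:.1f}.",
}


def make_table(rows):
    """Build an in-memory table with the assumed schema (region, amount)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revenue (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO revenue VALUES (?, ?)", rows)
    return conn


def render_summary(conn, template):
    """Run the template's SQL on a table and fill the text template."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(template["sql"]).fetchone()
    return template["text"].format(**dict(row))


# The same template is reused on two tables that share a schema.
table_a = make_table([("East", 10.0), ("West", 25.5), ("East", 5.0)])
table_b = make_table([("North", 3.0), ("South", 1.0)])
print(render_summary(table_a, TEMPLATE))  # The top region is West with total revenue of 25.5.
print(render_summary(table_b, TEMPLATE))  # The top region is North with total revenue of 3.0.
```

Because the numbers come from executing the query rather than from generated text, the summary stays tied to the table contents, which is the property the abstract attributes to executable SQL.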

Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation

CVPR 2024

Amin Shabani, Zhaowen Wang, Difan Liu, Nanxuan Zhao, Jimei Yang, Yasutaka Furukawa

TL;DR: Presents a dual diffusion model that jointly generates raster image content and corresponding vector layout for graphic design templates, enabling coherent, high-quality, and editable design generation from user-provided conditions.

This paper proposes an image-vector dual diffusion model for generative layout design. Distinct from prior efforts that mostly ignore element-level visual information, our approach integrates the power of a pre-trained large image diffusion model to guide layout composition in a vector diffusion model by providing enhanced salient region understanding and high-level inter-element relationship reasoning. Our proposed model simultaneously operates in two domains: it generates the overall design appearance in the image domain while optimizing the size and position of each design element in the vector domain. The proposed method achieves state-of-the-art results on several datasets and enables new layout design applications.

PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving

NeurIPS 2023 (Spotlight)

Sepidehsadat Hosseini, Amin Shabani, Saghar Irandoust, and Yasutaka Furukawa

TL;DR: Reformulates spatial jigsaw puzzle solving as a denoising diffusion process, jointly predicting piece positions and orientations without hand-crafted features or piece ordering assumptions, achieving state-of-the-art results on challenging benchmarks.

This paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving, particularly jigsaw puzzle and room arrangement tasks. In the latter task, for instance, the proposed system "PuzzleFusion" takes a set of room layouts as polygonal curves in the top-down view and aligns the room layout pieces by estimating their 2D translations and rotations, akin to solving the jigsaw puzzle of room layouts. A surprising discovery of the paper is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process. To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements: 1) the 2D Voronoi jigsaw dataset, a synthetic one where pieces are generated by the Voronoi diagram of a 2D point set; and 2) the MagicPlan dataset, a real one offered by MagicPlan from its production pipeline, where pieces are room layouts constructed by real-estate consumers with an augmented reality app. The qualitative and quantitative evaluations demonstrate that our approach outperforms the competing methods by significant margins in all the tasks.
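To make the output parameterization concrete, here is a minimal sketch of how one estimated 2D translation and rotation would place a polygonal piece. It only illustrates the rigid transform applied to a piece, not the diffusion model that predicts it; the function name `transform_piece` and the example values are assumptions.

```python
import math


def transform_piece(polygon, tx, ty, theta):
    """Rotate a polygon by `theta` radians about the origin, then translate by (tx, ty).

    `polygon` is a list of (x, y) corner coordinates in the top-down view.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in polygon]


# A unit square rotated by 90 degrees and shifted 2 units along x.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
placed = transform_piece(square, tx=2.0, ty=0.0, theta=math.pi / 2)
```

In this framing, solving a puzzle amounts to finding one `(tx, ty, theta)` triple per piece so that the placed pieces fit together.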

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

CVPR 2023

Amin Shabani, Sepidehsadat Hosseini, and Yasutaka Furukawa

TL;DR: Generates structured vector floorplans using a diffusion model that simultaneously denoises discrete room types and continuous polygon coordinates, producing diverse and realistic architectural layouts from user-specified room programs.

The paper presents a novel approach for vector-floorplan generation via a diffusion model, which denoises 2D coordinates of room/door corners with two inference objectives: 1) a single-step noise as the continuous quantity to precisely invert the continuous forward process; and 2) the final 2D coordinate as the discrete quantity to establish geometric incident relationships such as parallelism, orthogonality, and corner-sharing. Our task is graph-conditioned floorplan generation, a common workflow in floorplan design. We represent a floorplan as 1D polygonal loops, each of which corresponds to a room or a door. Our diffusion model employs a Transformer architecture at the core, which controls the attention masks based on the input graph-constraint and directly generates vector-graphics floorplans via a discrete and continuous denoising process. We have evaluated our approach on the RPLAN dataset. The proposed approach improves on the state of the art by significant margins in all metrics, while being capable of generating non-Manhattan structures and controlling the exact number of corners per room.

Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting

ICLR 2023

Amin Shabani, Amir Abdi, Lili Meng, and Tristan Sylvain

TL;DR: Proposes an iterative multi-scale refining framework that applies Transformer-based forecasters at multiple temporal resolutions, progressively sharpening predictions from coarse to fine scales for improved time series forecasting accuracy.

The performance of time series forecasting has recently been greatly improved by the introduction of transformers. In this paper, we propose a general multi-scale framework that can be applied to the state-of-the-art transformer-based time series forecasting models (FEDformer, Autoformer, etc.). By iteratively refining a forecasted time series at multiple scales with shared weights, introducing architecture adaptations, and a specially-designed normalization scheme, we are able to achieve significant performance improvements, from 5.5% to 38.5% across datasets and transformer architectures, with minimal additional computational overhead. Via detailed ablation studies, we demonstrate the effectiveness of each of our contributions across the architecture and methodology. Furthermore, our experiments on various public datasets demonstrate that the proposed improvements outperform their corresponding baseline counterparts.
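The coarse-to-fine refinement loop described above can be sketched in a few lines. This is a toy under stated assumptions, not the Scaleformer code: `toy_forecaster` is a hypothetical stand-in for the shared-weight transformer, average pooling and repeat-upsampling are assumed choices for moving between scales, the normalization scheme is omitted, and the history length and horizon are assumed to be divisible by every scale.

```python
def avg_pool(series, factor):
    """Downsample by averaging non-overlapping windows of length `factor`."""
    return [sum(series[i:i + factor]) / factor
            for i in range(0, len(series) - factor + 1, factor)]


def upsample(series, factor):
    """Upsample by repeating each value `factor` times."""
    return [v for v in series for _ in range(factor)]


def toy_forecaster(history, init_forecast):
    """Hypothetical stand-in for the shared model: blend the initial
    forecast with the last observed value at the current scale."""
    last = history[-1]
    return [0.5 * last + 0.5 * f for f in init_forecast]


def multiscale_forecast(history, horizon, scales=(4, 2, 1)):
    """Forecast at the coarsest scale first, then refine at finer scales."""
    forecast = None
    for i, s in enumerate(scales):
        hist_s = avg_pool(history, s)
        if forecast is None:
            # Coarsest scale: start from the last coarse observation.
            init = [hist_s[-1]] * (horizon // s)
        else:
            # Finer scale: start from the upsampled coarser forecast.
            init = upsample(forecast, scales[i - 1] // s)
        forecast = toy_forecaster(hist_s, init)
    return forecast


history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(multiscale_forecast(history, horizon=4))  # [7.5, 7.5, 7.5, 7.5]
```

The same forecaster is called at every scale, which mirrors the shared-weights idea: each pass only has to correct the upsampled output of the previous, coarser pass.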

Vectorizing Building Blueprints

ACCV 2022

Weilian Song, Mahsa Maleki Abyaneh, Amin Shabani, and Yasutaka Furukawa

TL;DR: Converts raster building blueprint images into structured vector representations by detecting and reconstructing walls, doors, windows, and symbols as scalable primitives, enabling downstream architectural analysis.

This paper proposes a novel vectorization algorithm for high-definition floorplans with construction-level intricate architectural details, namely blueprints. A state-of-the-art floorplan vectorization algorithm starts by detecting corners, a process that does not scale to high-definition floorplans with thin interior walls, small door frames, and long exterior walls. Our approach 1) obtains a rough semantic segmentation by running off-the-shelf segmentation algorithms; 2) learns to infer missing smaller architectural components; 3) adds the missing components with a refinement generative adversarial network; and 4) simplifies the segmentation boundaries with heuristics. We have created a vectorized blueprint database consisting of 200 production scanned blueprint images. Qualitative and quantitative evaluations demonstrate the effectiveness of the approach, with a significant boost in standard vectorization metrics over the current state-of-the-art and baseline methods.

Extreme Structure from Motion for Indoor Panoramas without Visual Overlaps

ICCV 2021

Amin Shabani, Weilian Song, Makoto Odamaki, Hirochika Fujiki, and Yasutaka Furukawa

TL;DR: Estimates 3D camera configurations for indoor panoramic images that have no visual feature overlap, leveraging structural layout constraints and Manhattan-world assumptions to recover global scene geometry.

This paper proposes an extreme Structure from Motion (SfM) algorithm for residential indoor panoramas that have little to no visual overlap. In many cases, only a single panorama is present in a room, making the task infeasible for existing SfM algorithms. Our idea is to learn to evaluate the realism of room/door/window arrangements in the top-down semantic space. After using heuristics to enumerate possible arrangements based on door detections, we evaluate their realism scores, pick the most realistic arrangement, and return the corresponding camera poses. We evaluate the proposed approach on a dataset of 1029 panorama images from 286 houses. Our qualitative and quantitative evaluations show that an existing SfM approach completely fails for most of the houses. The proposed approach achieves a mean positional error of less than 1.0 meter for 47% of the houses, and for 78% when considering the top five reconstructions.

Local Visual Microphones: Improved Sound Extraction from Silent Video

BMVC 2017

MA Shabani, L Samadfam, MA Sadeghi

TL;DR: Recovers audio signals from silent video by analyzing subtle, high-frequency vibrations in local image regions, improving over global visual microphone methods by capturing spatially-varying sound sources more accurately.

Sound waves cause small vibrations in nearby objects. A few techniques exist in the literature that can extract sound from video. In this paper we study local vibration patterns at different image locations. We show that different locations in the image vibrate differently. We carefully aggregate local vibrations and produce sound whose quality improves on the state of the art. We show that local vibrations can have a time delay because sound waves take time to travel through the air. We use this phenomenon to estimate sound direction. We also present a novel algorithm that speeds up sound extraction by two to three orders of magnitude and reaches real-time performance on a 20 kHz video.
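One way to picture the time-delay observation is with a plain cross-correlation between the vibration signals recovered at two image locations. The sketch below is an assumption made for illustration, not the paper's algorithm: it uses brute-force cross-correlation to find the lag, the function name `estimate_delay` is hypothetical, and the two signals are synthetic.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag (in frames) at which `sig` best matches `ref`.

    A positive lag means `sig` is a delayed copy of `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(sig):
                score += ref[i] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic example: the second location sees the same pulse two frames later.
location_a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
location_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
lag_frames = estimate_delay(location_a, location_b, max_lag=3)

# Convert frames to seconds for a video sampled at 20 kHz.
FRAME_RATE_HZ = 20_000
delay_seconds = lag_frames / FRAME_RATE_HZ
```

The sign of the delay between two locations indicates which one the sound wave reached first, which is the cue the abstract says is used for estimating direction.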

Layer-wise Progressive Knowledge Distillation

2019

MA Shabani

TL;DR: Introduces a layer-wise progressive knowledge distillation strategy that transfers knowledge from a teacher to a student network one layer at a time, improving the efficiency and effectiveness of model compression.

On decomposing complete tripartite graphs into 5-cycles

arXiv preprint

M Abdolmaleki, SG Ilchi, ES Mahmoodian, MA Shabani

TL;DR: Proves necessary and sufficient conditions for decomposing complete tripartite graphs into 5-cycles, extending known results on cycle decompositions in combinatorial graph theory.

The problem of finding necessary and sufficient conditions to decompose a complete tripartite graph K_{r,s,t} into 5-cycles was first considered by E.S. Mahmoodian and Maryam Mirzakhani (1995). They stated some necessary conditions and conjectured that those conditions are also sufficient. Since then, many cases of the problem have been solved by various authors; however, the case when the partite sets r ≤ s ≤ t have odd and distinct sizes remained open. We show the conjecture is true when r, s and t are all multiples of 5, t + 90 ≤ 4rs/(r+s), and t ≠ s + 10.

On Uniquely k-List Colorable Planar Graphs, Graphs on Surfaces, and Regular Graphs

Graphs and Combinatorics, 2018

M Abdolmaleki, J. P. Hutchinson, S. Gh. Ilchi, E. S. Mahmoodian, N Matsumoto, M. A. Shabani

TL;DR: Characterizes uniquely k-list colorable planar graphs and graphs on surfaces, establishing new structural properties and bounds for list coloring in graph theory.

A graph G is called uniquely k-list colorable (UkLC) if there exists a list of colors on its vertices, say L = {S_v | v ∈ V(G)}, each of size k, such that there is a unique proper list coloring of G from this list of colors. A graph G is said to have property M(k) if it is not uniquely k-list colorable. Mahmoodian and Mahdian characterized all graphs with property M(2). For k ≥ 3, property M(k) has been studied only for multipartite graphs. Here we find bounds on M(k) for graphs embedded on surfaces, and obtain new results on planar graphs. We begin a general study of bounds on M(k) for regular graphs, as well as for graphs with varying list sizes.
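The definition above can be checked by brute force on small graphs. The sketch below counts the proper colorings that a given list assignment admits; a graph is uniquely k-list colorable if some assignment with lists of size k yields a count of exactly one. The function name, the example graph (K4 minus one edge), and the particular lists are chosen for illustration and are not taken from the paper.

```python
from itertools import product


def count_list_colorings(edges, lists):
    """Count proper colorings where each vertex picks a color from its own list.

    `edges` is an iterable of (u, v) pairs; `lists` maps vertex -> allowed colors.
    """
    vertices = sorted(lists)
    count = 0
    for choice in product(*(lists[v] for v in vertices)):
        color = dict(zip(vertices, choice))
        if all(color[u] != color[v] for u, v in edges):
            count += 1
    return count


# K4 minus the edge c-d, with lists of size 2 that admit exactly one coloring.
diamond_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
diamond_lists = {"a": [1, 2], "b": [2, 3], "c": [1, 3], "d": [2, 3]}
print(count_list_colorings(diamond_edges, diamond_lists))  # 1

# A single edge with identical lists admits two colorings, so these lists
# do not witness unique colorability.
print(count_list_colorings([("a", "b")], {"a": [1, 2], "b": [1, 2]}))  # 2
```

The search is exponential in the number of vertices, so it is only useful for sanity-checking small cases.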

Get In Touch

Contact

Location — Vancouver, Canada
