ML Researcher at RBC Borealis
I received my Ph.D. in Computer Science from Simon Fraser University (SFU) under the supervision of Prof. Yasutaka Furukawa, and my M.Sc. from Seoul National University (SNU) under Prof. Kyoung Mu Lee.
Research Interests
Jan 2026 Paper "Embedding-based Context-aware Reranker" accepted at ICLR 2026.
Oct 2025 1st Place Winner, Financial Agentic Retrieval Grand Challenge (FinAgentBench), AI for Finance Symposium '25.
Sep 2025 Paper "FACTS: Table Summarization via Offline Template Generation with Agentic Workflows" accepted at NeurIPS 2025 Workshop on Generative AI in Finance.
Apr 2024 Joined RBC Borealis as a Machine Learning Researcher.
Mar 2024 Paper "Visual Layout Composer" accepted at CVPR 2024.
Jan 2024 Gave a talk on diffusion models for layout generation at Samsung AI Center, Toronto.
Sep 2023 Paper "PuzzleFusion" accepted as Spotlight at NeurIPS 2023.
Mar 2023 Paper "HouseDiffusion" accepted at CVPR 2023.
Jan 2023 Paper "Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting" accepted at ICLR 2023.
Contributing to research and development of machine learning models for real-world applications in finance.
Worked on a new method for automatically generating layout designs. Published at CVPR 2024.
Worked on Iterative Multi-scale Refining Transformers for Time Series Forecasting. Published at ICLR 2023.
Worked on online object detection methods, studying the trade-off between speed and accuracy.
B.Sc. in Computer Science
Ye Yuan, Amin Shabani, Siqi Liu
TL;DR: Proposes a lightweight embedding-based context-aware reranker for retrieval-augmented generation (RAG) that improves passage ranking by modeling context across retrieved passages, improving both accuracy and efficiency.
Retrieval-Augmented Generation (RAG) systems rely on retrieving relevant evidence from a corpus to support downstream generation. The common practice of splitting a long document into multiple shorter passages enables finer-grained and targeted information retrieval. However, it also introduces challenges when a correct retrieval would require inference across passages, such as resolving coreference, disambiguating entities, and aggregating evidence scattered across multiple sources. Many state-of-the-art (SOTA) reranking methods, despite utilizing powerful large pretrained language models with potentially high inference costs, still neglect the aforementioned challenges. Therefore, we propose Embedding-Based Context-Aware Reranker (EBCAR), a lightweight reranking framework that operates directly on embeddings of retrieved passages and enhances cross-passage understanding through the structural information of the passages and a hybrid attention mechanism, which captures both high-level interactions across documents and low-level relationships within each document. We evaluate EBCAR against SOTA rerankers on the ConTEB benchmark, demonstrating its effectiveness for information retrieval requiring cross-passage inference and its advantages in both accuracy and efficiency.
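The distinction between cross-document and within-document interactions can be illustrated with two attention masks over retrieved passage embeddings. This is a hypothetical sketch, not EBCAR's actual architecture: it assumes each passage carries the ID of its source document, uses a "global" head in which every passage attends to all passages and a "local" head restricted to passages from the same document, averages the two, and scores passages by dot product with the query. The structural information the abstract mentions is omitted.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def masked_attention(embs: List[Vector], mask: List[List[bool]]) -> List[Vector]:
    """Scaled dot-product self-attention; mask[i][j] says whether passage i may
    attend to passage j. Assumes mask[i][i] is True, so every row is non-empty."""
    d = len(embs[0])
    out = []
    for i, q in enumerate(embs):
        scores = [dot(q, k) / math.sqrt(d) if mask[i][j] else -math.inf
                  for j, k in enumerate(embs)]
        top = max(scores)  # finite because passage i can attend to itself
        weights = [math.exp(s - top) for s in scores]  # masked entries become 0.0
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, embs)) / total
                    for c in range(d)])
    return out


def hybrid_attention(embs: List[Vector], doc_ids: List[str]) -> List[Vector]:
    """Average a global head (all passages) and a local head (same document only)."""
    n = len(embs)
    global_mask = [[True] * n for _ in range(n)]
    local_mask = [[doc_ids[i] == doc_ids[j] for j in range(n)] for i in range(n)]
    g = masked_attention(embs, global_mask)
    loc = masked_attention(embs, local_mask)
    return [[(a + b) / 2 for a, b in zip(gi, li)] for gi, li in zip(g, loc)]


def rerank(query: Vector, embs: List[Vector], doc_ids: List[str]) -> List[int]:
    """Return passage indices sorted by relevance of the context-refined embeddings."""
    refined = hybrid_attention(embs, doc_ids)
    scores = [dot(query, r) for r in refined]
    return sorted(range(len(embs)), key=lambda i: scores[i], reverse=True)
```

For example, `rerank([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], ["doc_a", "doc_a", "doc_b"])` reorders three passages, two of which come from the same document. A passage that is alone in its document gets back its own embedding from the local head, so only the global head mixes in outside context for it.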
Ye Yuan, Amin Shabani, Siqi Liu
TL;DR: Introduces FACTS, an agentic workflow for query-focused table summarization that generates reusable offline templates (SQL queries plus Jinja2 templates), producing fast and accurate natural language summaries while sending only table schemas to the LLM.
Query-focused table summarization requires generating natural language summaries of tabular data conditioned on a user query, enabling users to access insights beyond fact retrieval. Existing approaches face key limitations: table-to-text models require costly fine-tuning and struggle with complex reasoning, prompt-based LLM methods suffer from token-limit and efficiency issues while exposing sensitive data, and prior agentic pipelines often rely on decomposition, planning, or manual templates that lack robustness and scalability. To mitigate these issues, we introduce an agentic workflow, FACTS, a Fast, Accurate, and Privacy-Compliant Table Summarization approach via Offline Template Generation. FACTS produces offline templates, consisting of SQL queries and Jinja2 templates, which can be rendered into natural language summaries and are reusable across multiple tables sharing the same schema. It enables fast summarization through reusable offline templates, accurate outputs with executable SQL queries, and privacy compliance by sending only table schemas to LLMs. Evaluations on widely used benchmarks show that FACTS consistently outperforms baseline methods, establishing it as a practical solution for real-world query-focused table summarization.
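To make the reuse of an offline template concrete, here is a minimal sketch with Python's standard-library `sqlite3` and `string.Template`. FACTS uses Jinja2 templates; `string.Template` is swapped in only to keep the example dependency-free. The table name, columns, SQL query, and template text are hypothetical, and in FACTS the SQL and template would be produced by an LLM from the schema alone, whereas here they are hard-coded.

```python
import sqlite3
from string import Template

# Hypothetical offline template for the query "Which region had the highest revenue?"
# Only SCHEMA would be shown to the LLM; table rows never leave the local database.
SCHEMA = "CREATE TABLE sales (region TEXT, revenue REAL)"
SQL = "SELECT region, revenue FROM sales ORDER BY revenue DESC LIMIT 1"
TEXT = Template("The top region was $region with revenue of $revenue.")


def summarize(rows):
    """Render the same reusable SQL + text template against one concrete table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        region, revenue = conn.execute(SQL).fetchone()
    finally:
        conn.close()
    return TEXT.substitute(region=region, revenue=f"{revenue:,.0f}")


# The same template serves two different tables that share the schema.
print(summarize([("East", 1200.0), ("West", 3400.0)]))
print(summarize([("North", 50.0), ("South", 20.0)]))
```

Because the computation lives in SQL rather than in the LLM's output, the numbers in the summary come from executing the query, and the template can be rendered again for any new table with the same schema without another LLM call.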
Amin Shabani, Zhaowen Wang, Difan Liu, Nanxuan Zhao, Jimei Yang, Yasutaka Furukawa
TL;DR: Presents a dual diffusion model that jointly generates raster image content and corresponding vector layout for graphic design templates, enabling coherent, high-quality, and editable design generation from user-provided conditions.
This paper proposes an image-vector dual diffusion model for generative layout design. Distinct from prior efforts that mostly ignore element-level visual information, our approach integrates the power of a pre-trained large image diffusion model to guide layout composition in a vector diffusion model by providing enhanced salient region understanding and high-level inter-element relationship reasoning. Our proposed model simultaneously operates in two domains: it generates the overall design appearance in the image domain while optimizing the size and position of each design element in the vector domain. The proposed method achieves state-of-the-art results on several datasets and enables new layout design applications.
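In the vector domain, each design element's size and position form a continuous vector that a diffusion model can noise and denoise. The sketch below shows only the standard DDPM forward (noising) step, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, applied to hypothetical (x, y, w, h) boxes in normalized coordinates. The linear beta schedule, the box encoding, and the seed are assumptions; the image-domain branch and the guidance between the two domains are not shown.

```python
import math
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), normalized to [0, 1]


def alpha_bars(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s) for a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_layout(boxes: List[Box], t: int, abar: List[float], rng: random.Random) -> List[Box]:
    """Sample x_t ~ q(x_t | x_0) independently for every coordinate of every box."""
    a = abar[t]
    return [tuple(math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in box)
            for box in boxes]


abar = alpha_bars()
layout = [(0.1, 0.2, 0.3, 0.4), (0.5, 0.5, 0.2, 0.1)]  # hypothetical title and image boxes
slightly_noisy = noise_layout(layout, 10, abar, random.Random(0))
almost_pure_noise = noise_layout(layout, 999, abar, random.Random(0))
```

A learned denoiser would run this process in reverse, starting from near-pure noise and gradually recovering element sizes and positions.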
Sepidehsadat Hosseini, Amin Shabani, Saghar Irandoust, and Yasutaka Furukawa
TL;DR: Reformulates spatial jigsaw puzzle solving as a denoising diffusion process, jointly predicting piece positions and orientations without hand-crafted features or piece ordering assumptions, achieving state-of-the-art results on challenging benchmarks.
This paper presents an end-to-end neural architecture based on diffusion models for spatial puzzle solving, particularly jigsaw puzzle and room arrangement tasks. In the latter task, for instance, the proposed system "PuzzleFusion" takes a set of room layouts as polygonal curves in the top-down view and aligns the room layout pieces by estimating their 2D translations and rotations, akin to solving a jigsaw puzzle of room layouts. A surprising discovery of the paper is that simply using a diffusion model effectively solves these challenging spatial puzzle tasks as a conditional generation process. To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements: 1) 2D Voronoi jigsaw dataset, a synthetic dataset where pieces are generated by the Voronoi diagram of a 2D point set; and 2) MagicPlan dataset, a real dataset offered by MagicPlan from its production pipeline, where pieces are room layouts constructed by real-estate consumers with an augmented reality app. The qualitative and quantitative evaluations demonstrate that our approach outperforms the competing methods by significant margins on all tasks.
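Each piece's estimated pose is a 2D rotation plus a translation. The sketch below shows only the geometry of applying such a pose to a polygonal piece, p' = R(theta) p + t, with the rotation taken about the origin of the piece's local frame; the piece and pose are hypothetical, and the diffusion model that predicts theta and t is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def place_piece(polygon: List[Point], theta: float, tx: float, ty: float) -> List[Point]:
    """Apply p' = R(theta) p + t: rotate vertices about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in polygon]


# A hypothetical unit-square piece rotated by 90 degrees and shifted 2 units along x.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
placed = place_piece(unit_square, math.pi / 2, 2.0, 0.0)
```

Up to floating-point rounding, the square's corners land at (2, 0), (2, 1), (1, 1), and (1, 0). Solving a puzzle amounts to predicting one such (theta, tx, ty) per piece so that the placed pieces fit together.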
Amin Shabani, Sepidehsadat Hosseini, and Yasutaka Furukawa
TL;DR: Generates vector floorplans with a diffusion model that denoises room and door corner coordinates using both continuous and discrete objectives, conditioned on an input graph constraint, and can produce non-Manhattan structures with an exact number of corners per room.
The paper presents a novel approach for vector-floorplan generation via a diffusion model, which denoises 2D coordinates of room/door corners with two inference objectives: 1) a single-step noise as the continuous quantity to precisely invert the continuous forward process; and 2) the final 2D coordinate as the discrete quantity to establish geometric incident relationships such as parallelism, orthogonality, and corner-sharing. Our task is graph-conditioned floorplan generation, a common workflow in floorplan design. We represent a floorplan as 1D polygonal loops, each of which corresponds to a room or a door. Our diffusion model employs a Transformer architecture at its core, which controls the attention masks based on the input graph constraint and directly generates vector-graphics floorplans via a discrete and continuous denoising process. We have evaluated our approach on the RPLAN dataset. The proposed approach improves on the state of the art by significant margins in all metrics, while being capable of generating non-Manhattan structures and controlling the exact number of corners per room.
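Why a discrete quantity helps can be seen with plain rounding: continuous predictions that are almost, but not exactly, equal become identical once snapped to an integer grid, which is what makes a wall exactly axis-aligned or a corner exactly shared between two rooms. This is a simplified stand-in assuming a hypothetical integer grid from 0 to 255; it is not HouseDiffusion's actual discrete representation, and the corner coordinates are made up.

```python
from typing import List, Tuple

Corner = Tuple[float, float]


def snap(corners: List[Corner], grid_max: int = 255) -> List[Tuple[int, int]]:
    """Round continuous (x, y) corner predictions to clamped integer grid coordinates."""
    def q(v: float) -> int:
        return min(max(round(v), 0), grid_max)
    return [(q(x), q(y)) for x, y in corners]


def is_axis_aligned(p, q) -> bool:
    """A wall segment is horizontal or vertical iff its endpoints share x or y exactly."""
    return p[0] == q[0] or p[1] == q[1]


def shares_corner(room_a, room_b) -> bool:
    """Two rooms share a corner iff some vertex coordinates coincide exactly."""
    return bool(set(room_a) & set(room_b))


# Hypothetical continuous predictions for two adjacent rectangular rooms.
room_a = [(10.2, 20.1), (39.8, 19.7), (40.1, 50.3), (9.9, 49.6)]
room_b = [(39.7, 20.3), (70.2, 19.8), (69.9, 50.1), (40.2, 49.8)]
snapped_a, snapped_b = snap(room_a), snap(room_b)
```

Before snapping, no wall of `room_a` is exactly axis-aligned and the rooms share no vertex; after snapping, every wall is axis-aligned and the rooms share the corners (40, 20) and (40, 50).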
Amin Shabani, Amir Abdi, Lili Meng, and Tristan Sylvain
TL;DR: Proposes an iterative multi-scale refining framework that applies Transformer-based forecasters at multiple temporal resolutions, progressively sharpening predictions from coarse to fine scales for improved time series forecasting accuracy.
The performance of time series forecasting has recently been greatly improved by the introduction of transformers. In this paper, we propose a general multi-scale framework that can be applied to state-of-the-art transformer-based time series forecasting models (FEDformer, Autoformer, etc.). By iteratively refining a forecasted time series at multiple scales with shared weights, introducing architecture adaptations, and using a specially designed normalization scheme, we achieve significant performance improvements, from 5.5% to 38.5% across datasets and transformer architectures, with minimal additional computational overhead. Via detailed ablation studies, we demonstrate the effectiveness of each of our contributions across the architecture and methodology. Furthermore, our experiments on various public datasets show that models with the proposed improvements outperform their corresponding baselines.
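The coarse-to-fine loop can be sketched independently of any particular Transformer. In the sketch below, a hypothetical `toy_forecaster` stands in for the shared-weight Transformer; the look-back window is average-pooled by a factor of 2 per scale, the previous forecast is upsampled by linear interpolation, the coarsest forecast is initialized by repeating the last observed value, and the cross-scale normalization is approximated by centering on the mean of the look-back window and the previous forecast. All of these choices are assumptions for illustration, not Scaleformer's exact design.

```python
from typing import Callable, List

Series = List[float]


def avg_pool(x: Series, k: int) -> Series:
    """Downsample by averaging non-overlapping windows of size k."""
    return [sum(x[i:i + k]) / len(x[i:i + k]) for i in range(0, len(x), k)]


def upsample(x: Series, n: int) -> Series:
    """Linearly interpolate x to length n."""
    if len(x) == 1:
        return [x[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(x) - 1) / (n - 1) if n > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[hi] * frac)
    return out


def toy_forecaster(look_back: Series, current: Series) -> Series:
    """Hypothetical stand-in for the shared Transformer: nudges the current forecast
    halfway toward the last (normalized) observed value."""
    last = look_back[-1]
    return [0.5 * c + 0.5 * last for c in current]


def multiscale_forecast(history: Series, horizon: int,
                        forecaster: Callable[[Series, Series], Series],
                        num_scales: int = 3, factor: int = 2) -> Series:
    prev = None
    for s in reversed(range(num_scales)):      # coarsest scale first
        k = factor ** s
        x = avg_pool(history, k)
        h = max(1, horizon // k)
        init = [x[-1]] * h if prev is None else upsample(prev, h)
        mu = (sum(x) + sum(init)) / (len(x) + len(init))   # cross-scale centering
        y = forecaster([v - mu for v in x], [v - mu for v in init])  # same weights every scale
        prev = [v + mu for v in y]
    return prev
```

Calling the same `forecaster` at every scale mirrors the weight sharing described above; swapping in a real Transformer-based model would leave the loop unchanged.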
Weilian Song, Mahsa Maleki Abyaneh, Amin Shabani, and Yasutaka Furukawa
TL;DR: Converts raster building blueprint images into structured vector representations by detecting and reconstructing walls, doors, windows, and symbols as scalable primitives, enabling downstream architectural analysis.
This paper proposes a novel vectorization algorithm for high-definition floorplans with intricate, construction-level architectural details, namely blueprints. The state-of-the-art floorplan vectorization algorithm starts by detecting corners, a process that does not scale to high-definition floorplans with thin interior walls, small door frames, and long exterior walls. Our approach 1) obtains a rough semantic segmentation by running off-the-shelf segmentation algorithms; 2) learns to infer missing smaller architectural components; 3) adds the missing components with a refinement generative adversarial network; and 4) simplifies the segmentation boundaries with heuristics. We have created a vectorized blueprint database consisting of 200 scanned production blueprint images. Qualitative and quantitative evaluations demonstrate the effectiveness of the approach, with a significant boost in standard vectorization metrics over the current state-of-the-art and baseline methods.
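Step 4 is only described as simplifying boundaries "with heuristics". As a generic illustration of that kind of step, the sketch below applies the classic Ramer-Douglas-Peucker algorithm to a hypothetical jagged wall boundary; RDP is a stand-in chosen for illustration, not necessarily the heuristic used in the paper, and the tolerance is an assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def rdp(points: List[Point], eps: float) -> List[Point]:
    """Ramer-Douglas-Peucker: keep the farthest vertex if it deviates by more than eps
    from the chord between the endpoints, and recurse on both halves."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = rdp(points[:idx + 1], eps)
    right = rdp(points[idx:], eps)
    return left[:-1] + right  # drop the duplicated split vertex


# A hypothetical noisy L-shaped wall boundary traced from a segmentation mask.
boundary = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3, 1.1), (2.9, 2), (3, 3)]
walls = rdp(boundary, eps=0.3)
```

With a 0.3-unit tolerance, the seven noisy vertices collapse to the three corners (0, 0), (3, 0), and (3, 3), i.e. two straight wall segments.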
Amin Shabani, Weilian Song, Makoto Odamaki, Hirochika Fujiki, and Yasutaka Furukawa
TL;DR: Estimates camera poses for indoor panoramas with little to no visual overlap by enumerating room/door/window arrangements and selecting the one that a learned model scores as most realistic in a top-down semantic space.
This paper proposes an extreme Structure from Motion (SfM) algorithm for residential indoor panoramas that have little to no visual overlap. In many cases, a room contains only a single panorama, making the task infeasible for existing SfM algorithms. Our idea is to learn to evaluate the realism of room/door/window arrangements in the top-down semantic space. After using heuristics to enumerate possible arrangements based on door detections, we evaluate their realism scores, pick the most realistic arrangement, and return the corresponding camera poses. We evaluate the proposed approach on a dataset of 1029 panorama images from 286 houses. Our qualitative and quantitative evaluations show that an existing SfM approach completely fails for most of the houses. The proposed approach achieves a mean positional error of less than 1.0 meter for 47% of the houses, rising to 78% when considering the top five reconstructions.
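The "top five" figure counts a house as successful if any of its five highest-scoring reconstructions is accurate enough. A minimal sketch of that bookkeeping, assuming each house comes with a list of (realism score, mean positional error in meters) pairs for its enumerated arrangements; the house names and numbers below are made up for illustration and are not the paper's data.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[float, float]  # (realism score, mean positional error in meters)


def success_rate(houses: Dict[str, List[Candidate]], k: int = 1,
                 threshold: float = 1.0) -> float:
    """Fraction of houses where at least one of the k most realistic arrangements
    has a mean positional error below `threshold` meters."""
    hits = 0
    for candidates in houses.values():
        top_k = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if any(err < threshold for _, err in top_k):
            hits += 1
    return hits / len(houses)


# Hypothetical scored arrangements for three houses.
houses = {
    "house_a": [(0.9, 0.4), (0.5, 2.0)],  # best-scored arrangement is accurate
    "house_b": [(0.8, 3.1), (0.7, 0.6)],  # only the runner-up is accurate
    "house_c": [(0.6, 2.5), (0.3, 4.0)],  # no accurate arrangement
}
top1 = success_rate(houses, k=1)  # 1 of 3 houses
top5 = success_rate(houses, k=5)  # 2 of 3 houses
```

The gap between `top1` and `top5` reflects houses where an accurate arrangement was enumerated but not ranked first by the realism score.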
MA Shabani, L Samadfam, MA Sadeghi
TL;DR: Recovers sound from silent video by analyzing subtle vibrations at different image locations, aggregating them for better sound quality, and using time delays between locations to estimate sound direction.
Sound waves cause small vibrations in nearby objects, and a few techniques in the literature can extract sound from video. In this paper, we study local vibration patterns at different image locations and show that different locations vibrate differently. By carefully aggregating local vibrations, we produce sound quality that improves on the state of the art. We also show that local vibrations can be delayed relative to one another because sound waves take time to travel through the air, and we use this phenomenon to estimate sound direction. Finally, we present a novel algorithm that speeds up sound extraction by two to three orders of magnitude and reaches real-time performance on 20 kHz video.
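The direction estimate follows from simple geometry: if two scene points a distance d apart pick up the same sound with a delay dt, then for a distant (far-field) source sin(theta) = c * dt / d, where c is about 343 m/s in air. The sketch below finds the delay by brute-force cross-correlation of two hypothetical local vibration signals sampled at the video frame rate; the signals, spacing, and far-field assumption are illustrative, not the paper's exact procedure.

```python
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C


def best_lag(a: List[float], b: List[float], max_lag: int) -> int:
    """Lag (in samples) maximizing sum_i a[i] * b[i + lag]; positive means b lags a."""
    def corr(lag: int) -> float:
        return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=corr)


def direction_deg(lag: int, fps: float, spacing_m: float) -> float:
    """Far-field arrival angle (degrees) from the inter-location delay."""
    dt = lag / fps
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * dt / spacing_m))  # clamp for asin
    return math.degrees(math.asin(s))


# Hypothetical vibration traces: the same pulse reaches location b 10 frames later.
a = [0.0] * 100
b = [0.0] * 100
a[20], a[21] = 1.0, 0.5
b[30], b[31] = 1.0, 0.5
lag = best_lag(a, b, max_lag=20)
angle = direction_deg(lag, fps=20000.0, spacing_m=0.343)
```

Here the 10-frame lag at 20 kHz is 0.5 ms, which over a 0.343 m baseline gives an arrival angle of about 30 degrees. The high frame rate is what makes such sub-millisecond delays measurable.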
MA Shabani
TL;DR: Introduces a layer-wise progressive knowledge distillation strategy that transfers knowledge from a teacher to a student network one layer at a time, improving the efficiency and effectiveness of model compression.
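Since only the summary is given here, the following is a generic toy sketch of layer-by-layer distillation, not the method of the paper: teacher and student are hypothetical stacks of scalar linear layers, and at stage k the already-distilled student layers are frozen while layer k alone is trained by gradient descent to match the teacher's k-th layer output on the same inputs. The learning rate, step count, and input distribution are assumptions.

```python
import random
from typing import List


def progressive_distill(teacher: List[float], lr: float = 0.2, steps: int = 1000,
                        seed: int = 0) -> List[float]:
    """Distill a stack of scalar linear layers one layer at a time."""
    rng = random.Random(seed)
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(32)]
    student = [1.0] * len(teacher)
    for k in range(len(teacher)):            # stage k: only student[k] is updated
        for _ in range(steps):
            grad = 0.0
            for x in inputs:
                h_t, h_s = x, x
                for j in range(k):           # frozen, already-distilled layers
                    h_t *= teacher[j]
                    h_s *= student[j]
                target = h_t * teacher[k]    # teacher's layer-k output
                pred = h_s * student[k]      # student's layer-k output
                grad += 2.0 * (pred - target) * h_s  # d/dw_k of squared error
            student[k] -= lr * grad / len(inputs)
    return student


teacher = [0.5, 2.0, -1.5]
student = progressive_distill(teacher)
```

Freezing earlier layers means each stage solves a small, well-conditioned problem whose input already matches the teacher's intermediate representation.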
M Abdolmaleki, SG Ilchi, ES Mahmoodian, MA Shabani
TL;DR: Proves necessary and sufficient conditions for decomposing complete tripartite graphs into 5-cycles, extending known results on cycle decompositions in combinatorial graph theory.
The problem of finding necessary and sufficient conditions to decompose a complete tripartite graph K_{r,s,t} into 5-cycles was first considered by E.S. Mahmoodian and Maryam Mirzakhani (1995). They stated some necessary conditions and conjectured that those conditions are also sufficient. Since then, many cases of the problem have been solved by various authors; however, the case when the partite sets r ≤ s ≤ t have odd and distinct sizes remained open. We show the conjecture is true when r, s and t are all multiples of 5, t + 90 ≤ 4rs/(r + s), and t ≠ s + 10.
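The hypotheses of the result are easy to check mechanically. The sketch below tests them exactly with integer arithmetic, rewriting t + 90 ≤ 4rs/(r + s) as (t + 90)(r + s) ≤ 4rs, and assumes, as the abstract suggests, that the result concerns the open case of odd, distinct part sizes. It also checks two elementary necessary conditions for any 5-cycle decomposition: every vertex has even degree, and 5 divides the number of edges rs + rt + st. It only checks conditions; it does not construct a decomposition.

```python
def basic_necessary(r: int, s: int, t: int) -> bool:
    """Even degrees (s+t, r+t, r+s all even) and an edge count divisible by 5."""
    even = (s + t) % 2 == 0 and (r + t) % 2 == 0 and (r + s) % 2 == 0
    return even and (r * s + r * t + s * t) % 5 == 0


def theorem_applies(r: int, s: int, t: int) -> bool:
    """Hypotheses of the stated result for odd, distinct part sizes r < s < t."""
    odd_distinct = r < s < t and all(v % 2 == 1 for v in (r, s, t))
    multiples_of_5 = r % 5 == 0 and s % 5 == 0 and t % 5 == 0
    bound = (t + 90) * (r + s) <= 4 * r * s  # t + 90 <= 4rs/(r+s), without division
    return odd_distinct and multiples_of_5 and bound and t != s + 10


# (205, 215, 235) satisfies every hypothesis; (205, 215, 225) fails only t != s + 10.
print(theorem_applies(205, 215, 235), basic_necessary(205, 215, 235))
print(theorem_applies(205, 215, 225))
```

Small graphs such as K_{5,15,25} fall outside the result because the bound (t + 90)(r + s) ≤ 4rs requires r and s to be fairly large relative to t.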
M. Abdolmaleki, J. P. Hutchinson, S. Gh. Ilchi, E. S. Mahmoodian, N. Matsumoto, M. A. Shabani
TL;DR: Studies uniquely k-list colorable graphs, obtaining bounds on property M(k) for graphs embedded on surfaces, new results for planar graphs, and initial bounds for regular graphs and graphs with varying list sizes.
A graph G is called uniquely k-list colorable (UkLC) if there exists a list of colors on its vertices, say L = {S_v | v ∈ V(G)}, each of size k, such that there is a unique proper list coloring of G from this list of colors. A graph G is said to have property M(k) if it is not uniquely k-list colorable. Mahmoodian and Mahdian characterized all graphs with property M(2). For k ≥ 3, property M(k) has been studied only for multipartite graphs. Here we find bounds on M(k) for graphs embedded on surfaces, and obtain new results on planar graphs. We begin a general study of bounds on M(k) for regular graphs, as well as for graphs with varying list sizes.
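For very small graphs, the definition can be made concrete by brute force. The sketch below counts the proper colorings of a graph from a given list assignment; a list assignment with exactly one such coloring is what witnesses unique list colorability. The example graphs and lists are hypothetical, and the search is exponential, so it only serves to illustrate the definition.

```python
from itertools import product
from typing import Dict, Hashable, List, Set, Tuple


def count_list_colorings(vertices: List[Hashable], edges: List[Tuple[Hashable, Hashable]],
                         lists: Dict[Hashable, Set[int]]) -> int:
    """Count colorings c with c(v) in lists[v] and c(u) != c(v) for every edge uv."""
    count = 0
    for choice in product(*(sorted(lists[v]) for v in vertices)):
        color = dict(zip(vertices, choice))
        if all(color[u] != color[v] for u, v in edges):
            count += 1
    return count


def is_unique_for(vertices, edges, lists) -> bool:
    """True iff this particular list assignment admits exactly one proper coloring."""
    return count_list_colorings(vertices, edges, lists) == 1


# A triangle with 2-element lists: colorings (a, b, c) = (1, 3, 2) and (2, 1, 3).
triangle_v = ["a", "b", "c"]
triangle_e = [("a", "b"), ("b", "c"), ("a", "c")]
triangle_lists = {"a": {1, 2}, "b": {1, 3}, "c": {2, 3}}
print(count_list_colorings(triangle_v, triangle_e, triangle_lists))
```

This assignment gives two colorings, so it does not witness unique 2-list colorability for the triangle, consistent with cycles having property M(2).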