publications | Zhanxing Zhu

2026

ICML

Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently

Bochen Lyu, Yiyang Jia, Xiaohao Cai, and Zhanxing Zhu

In International Conference on Machine Learning (ICML), 2026

Transformers can acquire Chain-of-Thought (CoT) capabilities to solve complex reasoning tasks through fine-tuning. Reinforcement learning (RL) and supervised fine-tuning (SFT) are two primary approaches to this end. In this work, we specifically examine RL with process rewards and SFT for learning k-sparse Boolean functions with a one-layer transformer through intermediate reasoning steps akin to CoT. In particular, we consider k-sparse Boolean functions that can be recursively decomposed into fixed 2-sparse Boolean functions. We first analyze the learning dynamics of fine-tuning the transformer via either RL with process reward or SFT in a unified way. This allows us to identify sufficient conditions for the transformer to provably learn the general sparse Boolean functions. We then verify that these conditions hold for three basic examples, including k-PARITY, k-AND, and k-OR, thus demonstrating their learnability via both RL with process reward and SFT. Notably, we reveal that RL and SFT exhibit distinct learning behaviors: RL learns the whole CoT chain simultaneously, whereas SFT naturally learns the CoT chain step by step. Overall, our findings provide theoretical insights into the mechanisms underlying RL with process reward and SFT and how they differ in triggering the CoT capabilities of transformers.
ICML

Modelling Attention with Aitchison Geometry: Token Distinguishability and Temperature Scaling

Sam Hilton-Jones, Tim Norman, and Zhanxing Zhu

In International Conference on Machine Learning (ICML), 2026

The attention mechanism with softmax normalisation is a foundational component of Transformer-based large language models. However, with very long contexts, attention scores are known to diminish, raising fundamental questions about token distinguishability and how it can be preserved. In this work, we provide a formal characterisation of token distinguishability in attention as a function of context length and embedding dimension. We introduce Aitchison distance to quantify relative differences among attention probabilities, and show that, with Gaussian queries and keys, even in the long-context regime, token distinguishability converges to a finite, non-zero limit rather than vanishing. Leveraging the linear relationship between temperature scaling and Aitchison distance, we derive a theoretical lower bound of Ω(√(log L)) on the logit scaling required to produce a sharp attention distribution. Finally, we demonstrate that Aitchison distance provides a principled and practical alternative to entropy for monitoring training and inference, as it captures the full compositional structure, including the smaller components of the attention probabilities.
ICML

SlaClip: Gradient Norm Slacks can be Indicator for Adaptive Clipping in DP-SGD

Shuyan Zou, Shaowei Wang, Zhanxing Zhu, Jin Li, Changyu Dong, Vladimiro Sassone, and 1 more author

In International Conference on Machine Learning (ICML) Spotlight, 2026

Differentially private stochastic gradient descent (DP-SGD) achieves privacy by clipping per-sample gradients and injecting Gaussian noise, but its utility is highly sensitive to the choice of the clipping threshold C. A fixed C often degrades performance and necessitates repeated empirical calibration. Existing adaptive clipping methods either modify the gradient update in vanilla DP-SGD, causing additional tuning or optimization overhead, or introduce separate query mechanisms to monitor gradient statistics. In contrast, we leverage the slack information induced by the standard clipping operation, an overlooked signal in prior work, and show that it provides an effective indication for adapting C. In light of this, we propose SlaClip, a privacy-preserving adaptive clipping strategy using a post-hoc Slack Indicator. Under the same training configuration, both SlaClip-DP-SGD and vanilla DP-SGD instantiate the identical Gaussian mechanism, and therefore incur equivalent privacy cost. Moreover, it requires minimal task-specific hyperparameter tuning and exhibits robust performance improvement across diverse datasets and model architectures.
ICLR

MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning

Xiaojing Zhang, Haifeng Wu, Gang He, Jiyang Shen, Bochen Lyu, and Zhanxing Zhu

In International Conference on Learning Representations (ICLR), 2026

Large Language Models have demonstrated superior reasoning capabilities by generating step-by-step reasoning in natural language before deriving the final answer. Recently, Geiping et al. introduced 3.5B-Huginn as an alternative to this paradigm, a depth-recurrent Transformer that increases computational depth per token by reusing a recurrent block in latent space. Despite its performance gains with increasing recurrences, this approach is inadequate for tasks demanding exploration and adaptivity, a limitation arising from its single, chain-like propagation mechanism. To address this, we propose a novel dynamic multi-branch routing approach for Huginn, termed the Mixture-of-Depth-Recurrent (MoDr) Transformer, which enables effective exploration of the solution space by shifting chain-like latent reasoning into a LoRA-based multi-branch dynamic relay mode with learnable hard-gate routing. Meanwhile, we introduce an auxiliary-loss-free load balancing strategy to mitigate the potential routing collapse. Our empirical results reveal that MoDr achieves average accuracy improvements of +7.2% and +2.48% over the original Huginn model and its fine-tuned variant, respectively, across various mathematical reasoning benchmarks and improvements of +21.21% and +1.52% on commonsense reasoning benchmarks.
ICLR

Learning Dynamics of Logits Debiasing for Long-Tailed Semi-Supervised Learning

Yue Cheng, Jiajun Zhang, Xiaohui Gao, Weiwei Xing, and Zhanxing Zhu

In International Conference on Learning Representations (ICLR), 2026

Long-tailed distributions are prevalent in real-world semi-supervised learning (SSL), where pseudo-labels tend to favor majority classes, leading to degraded generalization. Although numerous long-tailed SSL (LTSSL) methods have been proposed, the underlying mechanisms of class bias remain underexplored. In this work, we investigate LTSSL through the lens of learning dynamics and introduce the notion of baseline images to characterize accumulated bias during training. We provide a step-wise decomposition showing that baseline predictions are determined solely by shallow bias terms, making them reliable indicators of class priors. Building on this insight, we propose a novel framework, DyTrim, which leverages baseline images to guide data pruning. Specifically, we perform class-aware pruning on labeled data to balance class distribution and label-agnostic soft pruning with confidence filtering on unlabeled data to mitigate error accumulation. Theoretically, we show that our method implicitly realizes risk reweighting, effectively suppressing class bias. Extensive experiments on public benchmarks show that DyTrim consistently enhances the performance of existing LTSSL methods by improving representation quality and prediction accuracy.
ICLR

Neural Latent Arbitrary Lagrangian-Eulerian Grids for Fluid-Solid Interaction

Shilong Tao, Zhe Feng, Shaohan Chen, Weichen Zhang, Zhanxing Zhu, and Yunhuai Liu

In International Conference on Learning Representations (ICLR), 2026

Fluid-solid interaction (FSI) problems are fundamental in many scientific and engineering applications, yet effectively capturing the highly nonlinear two-way interactions remains a significant challenge. Most existing deep learning methods are limited to simplified one-way FSI scenarios, often assuming a rigid and static solid to reduce complexity. Even in two-way setups, prevailing approaches struggle to capture dynamic, heterogeneous interactions due to the lack of cross-domain awareness. In this paper, we introduce Fisale, a data-driven framework for handling complex two-way FSI problems. It is inspired by classical numerical methods, namely the Arbitrary Lagrangian–Eulerian (ALE) method and the partitioned coupling algorithm. Fisale explicitly models the coupling interface as a distinct component and leverages multiscale latent ALE grids to provide unified, geometry-aware embeddings across domains. A partitioned coupling module (PCM) further decomposes the problem into structured substeps, enabling progressive modeling of nonlinear interdependencies. Compared to existing models, Fisale introduces a more flexible framework that iteratively handles complex dynamics of solid, fluid and their coupling interface on a unified representation, and enables scalable learning of complex two-way FSI behaviors. Experimentally, Fisale excels in three challenging, reality-related FSI scenarios, covering 2D, 3D and various tasks. The code is included in the supplementary material for reproducibility.
ICLR

MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation

Zhe Feng, Shilong Tao, Haonan Sun, Shaohan Chen, Zhanxing Zhu, and Yunhuai Liu

In International Conference on Learning Representations (ICLR), 2026

Deep learning-based approaches, particularly graph neural networks (GNNs), have gained prominence in simulating flexible deformations and contacts of solids, due to their ability to handle unstructured physical fields and nonlinear regression on graph structures. However, existing GNNs commonly represent meshes with graphs built solely from vertices and edges. These approaches tend to overlook higher-dimensional spatial features, e.g. 2D facets and 3D cells, from the original geometry. As a result, it is challenging to accurately capture boundary representations and volumetric characteristics, though this information is critically important for modeling contact interactions and internal physical quantity propagation, particularly under sparse mesh discretization. In this paper, we introduce MAVEN, a mesh-aware volumetric encoding network for simulating 3D flexible deformation, which explicitly models geometric mesh elements of higher dimension to achieve a more accurate and natural physical simulation. MAVEN establishes learnable mappings among 3D cells, 2D facets, and vertices, enabling flexible mutual transformations. Explicit geometric features are incorporated into the model to alleviate the burden of implicitly learning geometric patterns. Experimental results show that MAVEN consistently achieves state-of-the-art performance across established datasets and a novel metal stretch-bending task featuring large deformations and prolonged contacts.
KDD

FilDeep: Learning Large Deformations of Elastic-Plastic Solids with Multi-Fidelity Data

Jianheng Tang, Shilong Tao, Zhe Feng, Haonan Sun, Menglu Wang, Zhanxing Zhu, and 1 more author

In 31st SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) - Applied Data Science Track, 2026

The scientific computation of large deformations in elastic-plastic solids is crucial in various manufacturing applications. Traditional numerical methods exhibit several inherent limitations, prompting interest in Deep Learning (DL) as a promising alternative. The effectiveness of current DL techniques typically depends on the availability of high-quantity and high-accuracy datasets, which are nevertheless difficult to obtain in large deformation problems. During the dataset construction process, a dilemma stands between data quantity and data accuracy, leading to suboptimal performance in the DL models. To address this challenge, we focus on a representative application of large deformations, the stretch bending problem, and propose FilDeep, a Fidelity-based Deep Learning framework for large Deformation of elastic-plastic solids. FilDeep aims to resolve the quantity-accuracy dilemma by simultaneously training with both low-fidelity and high-fidelity data, where the former provides greater quantity but lower accuracy, while the latter offers higher accuracy but in smaller quantity. In FilDeep, we provide meticulous designs for the practical large deformation problem. In particular, an attention-based cross-fidelity module is proposed to effectively capture long-range physical interactions across multi-fidelity (MF) data. To our knowledge, FilDeep is the first DL framework for large deformation problems using MF data. Extensive experiments demonstrate that FilDeep consistently achieves state-of-the-art performance and can be efficiently deployed in real-world manufacturing applications.
AAAI

ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction

Ruochen Li, Zhanxing Zhu, Tanqiu Qiao, and Hubert P. H. Shum

In The 40th Annual AAAI Conference on Artificial Intelligence (AAAI), 2026

Pedestrian trajectory prediction is critical for ensuring safety in autonomous driving, surveillance systems, and urban planning applications. While early approaches primarily focus on one-hop pairwise relationships, recent studies attempt to capture high-order interactions by stacking multiple Graph Neural Network (GNN) layers. However, these approaches face a fundamental trade-off: insufficient layers may lead to under-reaching problems that limit the model’s receptive field, while excessive depth can result in prohibitive computational costs. We argue that an effective model should be capable of adaptively modeling both explicit one-hop interactions and implicit high-order dependencies, rather than relying solely on architectural depth. To this end, we propose ViTE (Virtual graph Trajectory Expert router), a novel framework for pedestrian trajectory prediction. ViTE consists of two key modules: a Virtual Graph that introduces dynamic virtual nodes to model long-range and high-order interactions without deep GNN stacks, and an Expert Router that adaptively selects interaction experts based on social context using a Mixture-of-Experts design. This combination enables flexible and scalable reasoning across varying interaction patterns. Experiments on three benchmarks (ETH/UCY, NBA, and SDD) demonstrate that our method consistently achieves state-of-the-art performance, validating both its effectiveness and practical efficiency.
WWW

Diffusion-based Kriging Model with Graph-enhanced Attention

Mingtao Zhang, Guoli Yang, Zhanxing Zhu, Guangyin Jin, Mengzhu Wang, and Xiaoying Bai

In The Web Conference (WWW), 2026

In web-based systems, elements are commonly organized within a graph structure, with each node collecting essential spatio-temporal data. Examples include websites on the World Wide Web, traffic monitors in transportation networks, or sensors in the Internet of Things (IoT). However, sensors are typically deployed sparsely and unevenly, leaving the remaining nodes unobserved. The spatio-temporal kriging task, which infers values at unobserved nodes from observed ones, has thus attracted significant research interest. Due to limitations such as reliance on static graph structures and iterative Graph Convolution Network (GCN) frameworks, accurate kriging remains challenging. To address these issues, we propose a Diffusion-based Kriging Model with Graph-enhanced Attention (DKM-GA). Our approach first introduces a graph-enhanced attention mechanism that dynamically learns more accurate graph structures by combining predefined graph knowledge with global node value similarities. It is then integrated into a diffusion-based framework, which is tailored for the reliance of attention on known values. Therefore, the framework progressively refines the target values using correlated nodes, and the graph-enhanced attention selects more relevant neighbors based on the refined values. Furthermore, a node-based rescaling strategy is introduced to align the inference phase graphs to the training ones. Experiments on eight real-world datasets demonstrate that DKM-GA achieves superior performance, reducing estimation errors by up to 12.66%. Moreover, our analysis identifies three practical scenarios where the model delivers greater performance gains, even achieving 19.51% improvements on datasets that show minor gains under standard settings. These results highlight the effectiveness and potential of our model, while the scenarios provide settings for more comprehensive evaluations in terms of performance and robustness.

2025

NeurIPS

Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis

Bochen Lyu, Xiaojing Zhang, Fangyi Zheng, He Wang, Zheng Wang, and Zhanxing Zhu

In Thirty-ninth Conference on Neural Information Processing Systems (NeurIPS), 2025

This paper establishes a continuous time approximation, a piece-wise continuous differential equation, for the discrete Heavy-Ball (HB) momentum method with explicit discretization error. Investigating continuous differential equations has been a promising approach for studying the discrete optimization methods. Despite the crucial role of momentum in gradient-based optimization methods, the gap between the original dynamics and the continuous time approximations due to the discretization error has not been comprehensively bridged yet. In this work, we study the HB momentum method in continuous time while putting more focus on the discretization error to provide additional theoretical tools to this area. In particular, we design a first-order piece-wise continuous differential equation, where we add a number of counter terms to account for the discretization error explicitly. As a result, we provide a continuous time model for the HB momentum method that allows the control of discretization error to arbitrary order of the learning rate. As an application, we leverage it to find a new implicit regularization of the directional smoothness and investigate the implicit bias of HB for diagonal linear networks, indicating how our results can be used in deep learning. Our theoretical findings are further supported by numerical experiments.
TPAMI

Analyzing the Implicit Bias of Adversarial Training from a Generalized Margin Perspective

Bochen Lyu and Zhanxing Zhu

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

Adversarial training has been empirically demonstrated as an effective strategy to improve the robustness of deep neural networks (DNNs) against adversarial examples. However, the underlying reason for its effectiveness remains unclear. In this paper, we conduct extensive theoretical and empirical analyses of the implicit bias induced by adversarial training from a generalized margin perspective. Our results focus on adversarial training for homogeneous DNNs. In particular: (i) For deep linear networks with ℓp-norm perturbation, we show that weight matrices of adjacent layers get aligned and the converged parameters maximize the margin of adversarial examples, which can be further viewed as a generalized margin of the original dataset that can be achieved by an interpolation solution between ℓ2-SVM and ℓq-SVM where 1/p+1/q=1. (ii) For general homogeneous DNNs, including both linear and nonlinear ones, we investigate adversarial training with a variety of adversarial perturbations in a unified manner. Specifically, we show that the direction of the limit point of parameters converges to a KKT point of a constrained optimization problem that aims to maximize the margin for adversarial examples. Additionally, as an application of this general result for two special linear homogeneous DNNs, diagonal linear networks and linear convolutional networks, we show that adversarial training with ℓp-norm perturbation equivalently minimizes an interpolation norm that depends on the depth, the architecture, and the value of p in the predictor space. Extensive experiments are conducted to verify theoretical claims. Our results theoretically provide the basis for the longstanding folklore [1] that adversarial training modifies the decision boundary by utilizing adversarial examples to improve robustness, and potentially provide insights for designing new robust training strategies.
ICML

Unisoma: A Unified Transformer-based Solver for Multi-Solid Systems

Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, and Yunhuai Liu

In International Conference on Machine Learning (ICML), 2025

Multi-solid systems are foundational to a wide range of real-world applications, yet modeling their complex interactions remains challenging. Existing deep learning methods predominantly rely on implicit modeling, where the factors influencing solid deformation are not explicitly represented but are instead indirectly learned. However, as the number of solids increases, these methods struggle to accurately capture intricate physical interactions. In this paper, we introduce a novel explicit modeling paradigm that incorporates factors influencing solid deformation through structured modules. Specifically, we present Unisoma, a unified and flexible Transformer-based model capable of handling variable numbers of solids. Unisoma directly captures physical interactions using contact modules and an adaptive interaction allocation mechanism, and learns the deformation through a triplet relationship. Compared to implicit modeling techniques, explicit modeling is better suited for multi-solid systems with diverse coupling patterns, as it enables detailed treatment of each solid while preventing information blending and confusion. Experimentally, Unisoma achieves consistent state-of-the-art performance across seven well-established datasets and two complex multi-solid tasks.
KDD

LaDEEP: A Deep Learning-based Surrogate Model for Large Deformation of Elastic-Plastic Solids

Shilong Tao, Zhe Feng, Haonan Sun, Zhanxing Zhu, and Yunhuai Liu

In 31st SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) - Applied Data Science Track, 2025

Scientific computing for large deformation of elastic-plastic solids is critical for numerous real-world applications. Classical numerical solvers rely primarily on local discrete linear approximation and are constrained by an inherent trade-off between accuracy and efficiency. Recently, deep learning models have achieved impressive progress in solving continuum mechanics problems. While previous models have explored various architectures and constructed coefficient-solution mappings, they are designed for general instances without considering specific problem properties, and they struggle to accurately handle complex elastic-plastic solids involving contact, loading and unloading. In this work, we take stretch bending, a popular metal fabrication technique, as our case study and introduce LaDEEP, a deep learning-based surrogate model for Large Deformation of Elastic-Plastic Solids. We encode the partitioned regions of the involved slender solids into a token sequence to maintain their essential order property. To characterize the physical process of the solid deformation, a two-stage Transformer-based module is designed to predict the deformation with the sequence of tokens as input. Empirically, LaDEEP is five orders of magnitude faster than finite element methods with comparable accuracy, and achieves a 20.47% relative improvement on average compared to other deep learning baselines. We have also deployed our model into a real-world industrial production system, and it has shown remarkable performance in both accuracy and efficiency.
ICLR

A Solvable Attention for Neural Scaling Laws

Bochen Lyu, Di Wang, and Zhanxing Zhu

In International Conference on Learning Representations (ICLR), 2025

Transformers and many other deep learning models are empirically shown to predictably enhance their performance as a power law in training time, model size, or the number of training data points, which is termed the neural scaling law. This paper studies this intriguing phenomenon particularly for the transformer architecture in theoretical setups. Specifically, we propose a framework for self-attention, the underpinning block of the transformer, to learn in an in-context manner, where the corresponding learning dynamics is modeled as a non-linear ordinary differential equation (ODE) system. Furthermore, we establish a procedure to derive a tractable solution for this ODE system by reformulating it as a Riccati equation, which allows us to precisely characterize neural scaling laws for self-attention with training time, model size, data size, and the optimal compute. In addition, we reveal that self-attention shares similar neural scaling laws with several other architectures when the context sequence length of the in-context learning is fixed; otherwise, it exhibits a different scaling law of training time.
ICLR

DyCAST: Learning Dynamic Causal Structure from Time Series

Yue Cheng, Bochen Lyu, Weiwei Xing, and Zhanxing Zhu

In International Conference on Learning Representations (ICLR), 2025

Understanding the dynamics of causal structures is crucial for uncovering the underlying processes in time series data. Previous approaches rely on static assumptions, where contemporaneous and time-lagged dependencies are assumed to have invariant topological structures. However, these models fail to capture the evolving causal relationship between variables when the underlying process exhibits such dynamics. To address this limitation, we propose DyCAST, a novel framework designed to learn dynamic causal structures in time series using Neural Ordinary Differential Equations (Neural ODEs). The key innovation lies in modeling the temporal dynamics of the contemporaneous structure, drawing inspiration from recent advances in Neural ODEs on constrained manifolds. We reformulate the task of learning causal structures at each time step as solving the solution trajectory of a Neural ODE on the directed acyclic graph (DAG) manifold. To accommodate high-dimensional causal structures, we extend DyCAST by learning the temporal dynamics of the hidden state for contemporaneous causal structure. Experiments on both synthetic and real-world datasets demonstrate that DyCAST achieves superior or comparable performance compared to existing causal discovery models.
AAAI

Effects of Momentum in Implicit Bias of Gradient Flow for Diagonal Linear Networks

Bochen Lyu, He Wang, Zheng Wang, and Zhanxing Zhu

In The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025

This paper targets the regularization effect of momentum-based methods in regression settings and analyzes the popular diagonal linear networks to precisely characterize the implicit bias of continuous versions of heavy-ball (HB) and Nesterov’s method of accelerated gradients (NAG). We show that HB and NAG exhibit a different implicit bias compared to GD for diagonal linear networks, which differs from the case of the classic linear regression problem, where momentum-based methods share the same implicit bias as GD. Specifically, the role of momentum in the implicit bias of GD is twofold: (a) HB and NAG induce extra initialization mitigation effects similar to SGD that are beneficial for generalization of sparse regression; (b) the implicit regularization effects of HB and NAG also depend on the initialization of gradients explicitly, which may not be benign for generalization. As a result, whether HB and NAG have better generalization properties than GD jointly depends on the aforementioned twofold effects determined by various parameters such as learning rate, momentum factor, and integral of gradients. Our findings highlight the potential beneficial role of momentum and can help understand its advantages in practice, such as when it leads to better generalization performance.
AILS

SynthFormer: Equivariant pharmacophore-based generation of synthesizable molecules for ligand-based drug design

Zygimantas Jocys, Zhanxing Zhu, Henriette M.G. Willems, and Katayoun Farrahi

Artificial Intelligence in the Life Sciences, 2025

Drug discovery is a complex, resource-intensive process requiring significant time and cost to bring new medicines to patients. Many generative models aim to accelerate drug discovery, but few produce synthetically accessible molecules. Conversely, synthesis-focused models do not leverage the 3D information crucial for effective drug design. We introduce SynthFormer, a novel machine learning model that generates fully synthesizable molecules, structured as synthetic trees, by introducing both 3D information and pharmacophores as input. SynthFormer features a 3D equivariant graph neural network to encode pharmacophores, followed by a Transformer-based synthesis-aware decoding mechanism for constructing synthetic trees as a sequence of tokens. This provides capabilities for designing active molecules based on pharmacophores, exploring the local synthesizable chemical space around hit molecules and optimizing their properties. We demonstrate its effectiveness through various challenging tasks, including designing active compounds for a range of proteins, performing hit expansion and optimizing molecular properties.
TCSVT

Unified Spatial-Temporal Edge-Enhanced Graph Networks for Pedestrian Trajectory Prediction

Ruochen Li, Tanqiu Qiao, Stamos Katsigiannis, Zhanxing Zhu, and Hubert P. H. Shum

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025

2024

NeurIPS

Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

Qianli Shen, Yezhen Wang, Zhouhao Yang, Xiang Li, Haonan Wang, Yang Zhang, and 3 more authors

In Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS), 2024

AI Journal

Functional Relation Field: A Model-Agnostic Framework for Multivariate Time Series Forecasting

Ting Li, Bing Yu, Jianguo Li, and Zhanxing Zhu

Artificial Intelligence, 2024

In multivariate time series forecasting, the most popular strategy for modeling the relationship between multiple time series is the construction of a graph, where each time series is represented as a node and related nodes are connected by edges. However, the relationship between multiple time series is typically complicated, e.g. the sum of outflows from upstream nodes may be equal to the inflows of downstream nodes. Such relations widely exist in many real-world scenarios for multivariate time series forecasting, yet are far from well studied. In these cases, a graph might be insufficient for modeling the complex dependency between nodes. To this end, we explore a new framework to model the inter-node relationship in a more precise way based on our proposed inductive bias, Functional Relation Field, where a group of functions parameterized by neural networks are learned to characterize the dependency between multiple time series. Essentially, these learned functions then form a “field”, i.e. a particular set of constraints, to regularize the training loss of the backbone prediction network and enforce the inference process to satisfy these constraints. Since our framework introduces the relationship bias in a data-driven manner, it is flexible and model-agnostic such that it can be applied to any existing multivariate time series prediction networks for boosting performance. The experiment is conducted on one toy dataset to show that our approach can recover the true constraint relationship between nodes well. Various real-world datasets are also considered with different backbone prediction networks. Results show that the prediction error can be reduced remarkably with the aid of the proposed framework.
PNAS

Genome-wide single-cell and single-molecule footprinting of transcription factors with deaminase

Runsheng He, Wenyang Dong, Zhi Wang, Chen Xie, Long Gao, Wenping Ma, and 15 more authors

Proceedings of the National Academy of Sciences, 2024

Abs PDF

An individual’s somatic cells have essentially the same genome, but each cell type is determined by combinations of transcription factors (TFs) bound to each gene’s regulatory regions, controlling the transcription of DNA into RNA. Investigations of TFs have come from either “bottom–up” or “top–down” approaches. Bottom–up approaches start at the molecular level, including atomic resolution structures and single-molecule imaging of protein–DNA complexes. “Top–down” approaches start at the whole-organism or whole-cell level, including classic genetic studies and molecular biology. Understanding functional genomics requires a holistic approach to combine molecular-, cellular-, and tissue-level studies of TFs. Here, we report a technique that allows genome-wide studies of TF binding on a single-molecule and single-cell basis. Decades of research have established that mammalian transcription factors (TFs) bind to each gene’s regulatory regions and cooperatively control tissue specificity, timing, and intensity of gene transcription. Mapping the combination of TF binding sites genome wide is critically important for understanding functional genomics. Here, we report a technique to measure TFs’ binding sites on the human genome with a near single-base resolution by footprinting with deaminase (FOODIE) on a single-molecule and single-cell basis. Single-molecule sequencing reads after enzymatic deamination allow detection of the TF binding fraction on a particular footprint and the binding cooperativity of any two adjacent TFs, which can be either positive or negative. As a newcomer of single-cell genomics, single-cell FOODIE enables the detection of cell-type-specific TF footprints in a pure cell population in a heterogeneous tissue, such as the brain.
We found that genes carrying out a certain biological function together in a housekeeping correlated gene module (CGM) or a tissue-specific CGM are coordinated by shared TFs in the gene’s promoters and enhancers, respectively. Scalable and cost-effective, FOODIE allows us to create an open FOODIE database for cell lines, with applicability to human tissues and clinical samples.

2023

ICML

MonoFlow: Rethinking Divergence GANs via the Perspective of Differential Equations

Mingxuan Yi, Zhanxing Zhu, and Song Liu

In International Conference on Machine Learning (ICML), 2023

PDF
NeurIPS

Neural Lad: A Neural Latent Dynamics Framework for Times Series Modeling

Ting Li, Jianguo Li, and Zhanxing Zhu

In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023

PDF
NeurIPS

Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network

Bochen Lyu and Zhanxing Zhu

In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023

Abs PDF

Studying the implicit bias of gradient descent (GD) and stochastic gradient descent (SGD) is critical to unveil the underlying mechanism of deep learning. Unfortunately, even for standard linear networks in the regression setting, a comprehensive characterization of the implicit bias is still an open problem. This paper proposes to investigate a new proxy model of the standard linear network, the rank-1 linear network, where each weight matrix is parameterized in a rank-1 form. For over-parameterized regression problems, we precisely analyze the implicit bias of GD and SGD by identifying a “potential” function such that GD converges to its minimizer constrained by zero training error (i.e., interpolation solution), and further characterizing the role of the noise introduced by SGD in perturbing the form of this potential. Our results explicitly connect the depth of the network and the initialization with the implicit bias of GD and SGD. Furthermore, we emphasize a new implicit bias of SGD jointly induced by stochasticity and over-parameterization, which can reduce the dependence of SGD’s solution on the initialization. Our findings regarding the implicit bias are different from those of a recently popular model, the diagonal linear network. We highlight that the induced bias of our rank-1 model is more consistent with the standard linear network while that of the diagonal one is not. This suggests that the proposed rank-1 linear network might be a plausible proxy for the standard linear network.
MLST

Stochastic Gradient Descent with Random Label Noises: Doubly Stochastic Models and Inference Stabilizer

Haoyi Xiong, Xuhong Li, Boyang Yu, Dongrui Wu, Zhanxing Zhu, and Dejing Dou

Machine Learning: Science and Technology, 2023

PDF
ACML

Patch-level neighborhood interpolation: A general and effective graph-based regularization strategy

Ke Sun, Bing Yu, Zhouchen Lin, and Zhanxing Zhu

In Asian Conference on Machine Learning (ACML), 2023

PDF

2022

ICLR

Fine-grained differentiable physics: a yarn-level model for fabrics

Deshan Gong, Zhanxing Zhu, Andrew J Bulpitt, and He Wang

In International Conference on Learning Representations (ICLR), 2022

PDF
ICLR

Implicit Bias of Adversarial Training for Deep Neural Networks

Bochen Lv and Zhanxing Zhu

In International Conference on Learning Representations (ICLR), 2022

Abs PDF

We provide theoretical understandings of the implicit bias imposed by adversarial training for homogeneous deep neural networks without any explicit regularization. In particular, for deep linear networks adversarially trained by gradient descent on a linearly separable dataset, we prove that the direction of the product of weight matrices converges to the direction of the max-margin solution of the original dataset. Furthermore, we generalize this result to the case of adversarial training for non-linear homogeneous deep neural networks without the linear separability of the dataset. We show that, when the neural network is adversarially trained with l2 or l-infinity FGSM, FGM and PGD perturbations, the direction of the limit point of normalized parameters of the network along the trajectory of the gradient flow converges to a KKT point of a constrained optimization problem that aims to maximize the margin for adversarial examples. Our results theoretically justify the longstanding conjecture that adversarial training modifies the decision boundary by utilizing adversarial examples to improve robustness, and potentially provide insights for designing new robust training strategies.
TKDD

Grod: Deep learning with gradients orthogonal decomposition for knowledge transfer, distillation, and adversarial training

Haoyi Xiong, Ruosi Wan, Jian Zhao, Zeyu Chen, Xingjian Li, Zhanxing Zhu, and 1 more author

ACM Transactions on Knowledge Discovery from Data, 2022

PDF

2021

TVCG

Spatio-Temporal Manifold Learning for Human Motions via Long-Horizon Modeling

He Wang, Edmond S. L. Ho, Hubert P. H. Shum, and Zhanxing Zhu

IEEE Transactions on Visualization and Computer Graphics, 2021

DOI PDF
TKDD

Sampling sparse representations with randomized measurement Langevin dynamics

Kafeng Wang, Haoyi Xiong, Jiang Bian, Zhanxing Zhu, Qian Gao, Zhishan Guo, and 3 more authors

ACM Transactions on Knowledge Discovery from Data (TKDD), 2021

PDF
ICML

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, and Masashi Sugiyama

In International Conference on Machine Learning (ICML), 2021

PDF
ICLR

AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models

Ke Sun, Zhanxing Zhu, and Zhouchen Lin

In International Conference on Learning Representations (ICLR), 2021

PDF
CVPR

Adversarial Invariant Learning

Nanyang Ye, Jingxuan Tang, Huayu Deng, Xiao-Yun Zhou, Qianxiao Li, Zhenguo Li, and 2 more authors

In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

PDF
TPAMI

Adaptive Progressive Continual Learning

Ju Xu, Jin Ma, Xuesong Gao, and Zhanxing Zhu

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

PDF
TNNLS

An annealing mechanism for adversarial training acceleration

Nanyang Ye, Qianxiao Li, Xiao-Yun Zhou, and Zhanxing Zhu

IEEE Transactions on Neural Networks and Learning Systems, 2021

PDF
NeurIPS

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay

Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, and Jian Sun

Advances in Neural Information Processing Systems (NeurIPS), 2021

Awarded Abs PDF

Spotlight

In this paper, we comprehensively reveal the learning dynamics of normalized neural networks using Stochastic Gradient Descent (with momentum) and Weight Decay (WD), named Spherical Motion Dynamics (SMD). Most related works focus on studying the behavior of the effective learning rate in the "equilibrium" state, i.e. assuming the weight norm remains unchanged. However, their discussion on why this equilibrium can be reached is either absent or less convincing. Our work directly explores the cause of equilibrium, as a special state of SMD. Specifically, 1) we introduce the assumptions that can lead to the equilibrium state in SMD, and prove equilibrium can be reached in a linear rate regime under given assumptions; 2) we propose "angular update" as a substitute for effective learning rate to depict the state of SMD, and derive the theoretical value of angular update in the equilibrium state; 3) we verify our assumptions and theoretical results on various large-scale computer vision tasks including ImageNet and MSCOCO with standard settings. Experimental results show our theoretical findings agree well with empirical observations. We also show that the behavior of angular update in SMD can produce interesting effects on the optimization of neural networks in practice.

2020

IJCNN

Learning to search efficient DenseNet with layer-wise pruning

Xuanyang Zhang, Hao Liu, Zhanxing Zhu, and Zenglin Xu

In 2020 International Joint Conference on Neural Networks (IJCNN), 2020

PDF
ECML

Neural control variates for Monte Carlo variance reduction

Ruosi Wan, Mingjun Zhong, Haoyi Xiong, and Zhanxing Zhu

In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part II, 2020

PDF
AAAI

Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labeled Nodes

Ke Sun, Zhouchen Lin, and Zhanxing Zhu

In AAAI, 2020

PDF
AAAI

Efficient Neural Architecture Search via Proximal Iterations

Quanming Yao, Ju Xu, Wei-Wei Tu, and Zhanxing Zhu

In AAAI, 2020

PDF
TOPS

Using generative adversarial networks to break and protect text captchas

Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu, and 3 more authors

ACM Transactions on Privacy and Security (TOPS), 2020

PDF
ICML

On the Noisy Gradient Descent that Generalizes as SGD

Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, and Zhanxing Zhu

In International Conference on Machine Learning (ICML), 2020

PDF
AAAI

Amata: An Annealing Mechanism for Adversarial Training Acceleration

Nanyang Ye, Qianxiao Li, Xiao-Yun Zhou, and Zhanxing Zhu

In AAAI, 2020

PDF
ECAI

Simplifying Graph Attention Networks with Source-Target Separation

Hantao Guo, Rui Yan, Yansong Feng, Xuesong Gao, and Zhanxing Zhu

In ECAI, 2020

PDF
NeurIPS

Black-box certification with randomized smoothing: A functional optimization based framework

Dinghuai Zhang, Mao Ye, Chengyue Gong, Zhanxing Zhu, and Qiang Liu

In NeurIPS, 2020

PDF
ICML

Informative Dropout for Robust Representation Learning: A Shape-bias Perspective

Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, and Jingdong Wang

In International Conference on Machine Learning (ICML), 2020

PDF
MICCAI

Automatic data augmentation for 3D medical image segmentation

Ju Xu, Mengzhang Li, and Zhanxing Zhu

In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I 23, 2020

PDF
ACML

Towards understanding and improving the transferability of adversarial examples in deep neural networks

Lei Wu and Zhanxing Zhu

In Asian Conference on Machine Learning (ACML), 2020

PDF
ICLR

Neural Approximate Sufficient Statistics for Implicit Models

Yanzhi Chen, Dinghuai Zhang, Michael Gutmann, Aaron Courville, and Zhanxing Zhu

In International Conference on Learning Representations (ICLR), 2020

Awarded PDF

Spotlight
NeurIPS

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Guangda Ji and Zhanxing Zhu

In Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), 2020

PDF
ICML

On breaking deep generative model-based defenses and beyond

Yanzhi Chen, Renjie Xie, and Zhanxing Zhu

In International Conference on Machine Learning (ICML), 2020

PDF
AAAI

Spatial-temporal fusion graph neural networks for traffic flow forecasting

Mengzhang Li and Zhanxing Zhu

In AAAI Conference on Artificial Intelligence, 2020

PDF

2019

CVPR

Tangent-Normal Adversarial Regularization for Semi-supervised Learning

Bing Yu, Jingfeng Wu, and Zhanxing Zhu

In CVPR, 2019

PDF
AAAI

SpHMC: Spectral Hamiltonian Monte Carlo

Haoyi Xiong, Kafeng Wang, Jiang Bian, Zhanxing Zhu, Cheng-Zhong Xu, Zhishan Guo, and 1 more author

In AAAI, 2019

PDF
Lancet

Novel subgroups of patients with adult-onset diabetes in Chinese and US populations

Xiantong Zou, Xianghai Zhou, Zhanxing Zhu, and Linong Ji

The Lancet Diabetes & Endocrinology, 2019

PDF
PRCV

Virtual adversarial training on graph convolutional networks in node classification

Ke Sun, Zhouchen Lin, Hantao Guo, and Zhanxing Zhu

In Pattern Recognition and Computer Vision: Second Chinese Conference, PRCV 2019, Xi’an, China, November 8–11, 2019, Proceedings, Part I 2, 2019

PDF
3D graph convolutional networks with temporal graphs: A spatial information free framework for traffic forecasting

Bing Yu, Mengzhang Li, Jiyong Zhang, and Zhanxing Zhu

arXiv preprint arXiv:1903.00919, 2019

PDF
ST-UNet: A spatio-temporal U-network for graph-structured time series modeling

Bing Yu, Haoteng Yin, and Zhanxing Zhu

arXiv preprint arXiv:1903.05631, 2019

PDF
NeurIPS

You only propagate once: Accelerating adversarial training via maximal principle

Dinghuai Zhang, Tianyuan Zhang, Lu, Zhanxing Zhu, and Bin Dong

In Advances in Neural Information Processing Systems (NeurIPS), 2019

Abs PDF

Deep learning achieves state-of-the-art results in many tasks in computer vision and natural language processing. However, recent works have shown that deep networks can be vulnerable to adversarial perturbations, which raises a serious robustness issue for deep networks. Adversarial training, typically formulated as a robust optimization problem, is an effective way of improving the robustness of deep networks. A major drawback of existing adversarial training algorithms is the computational overhead of generating adversarial examples, which is typically far greater than that of the network training. This leads to an unbearable overall computational cost for adversarial training. In this paper, we show that adversarial training can be cast as a discrete time differential game. Through analyzing the Pontryagin’s Maximum Principle (PMP) of the problem, we observe that the adversary update is only coupled with the parameters of the first layer of the network. This inspires us to restrict most of the forward and back propagation within the first layer of the network during adversary updates. This effectively reduces the total number of full forward and backward propagations to only one for each group of adversary updates. Therefore, we refer to this algorithm as YOPO (You Only Propagate Once). Numerical experiments demonstrate that YOPO can achieve comparable defense accuracy with approximately 1/5 to 1/4 of the GPU time of the projected gradient descent (PGD) algorithm.
ICML

Interpreting Adversarially Trained Convolutional Neural Networks

Tianyuan Zhang and Zhanxing Zhu

In International Conference on Machine Learning (ICML), 2019

PDF
ICML

The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects

Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, and Jinwen Ma

In International Conference on Machine Learning (ICML), 2019

Abs PDF

Understanding the behavior of stochastic gradient descent (SGD) in the context of deep neural networks has attracted considerable attention recently. Along this line, we study a general form of gradient-based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. Through investigating these general optimization dynamics, we analyze the behavior of SGD in escaping from minima and its regularization effects. A novel indicator is derived to characterize the efficiency of escaping from minima through measuring the alignment of noise covariance and the curvature of the loss function. Based on this indicator, two conditions are established to show which type of noise structure is superior to isotropic noise in terms of escaping efficiency. We further show that the anisotropic noise in SGD satisfies the two conditions, and thus helps to escape from sharp and poor minima effectively, towards more stable and flat minima that typically generalize well. We systematically design various experiments to verify the benefits of the anisotropic noise, compared with full gradient descent plus isotropic diffusion (i.e. Langevin dynamics).
NLPCC

How question generation can help question answering over knowledge base

Sen Hu, Lei Zou, and Zhanxing Zhu

In Natural Language Processing and Chinese Computing: 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9–14, 2019, Proceedings, Part I 8, 2019

PDF
ICDM

Towards making deep transfer learning never hurt

Ruosi Wan, Haoyi Xiong, Xingjian Li, Zhanxing Zhu, and Jun Huan

In 2019 IEEE International Conference on Data Mining (ICDM), 2019

PDF

2018

IJCAI

Spatio-temporal graph convolutional neural network: A deep learning framework for traffic forecasting

Bing Yu, Haoteng Yin, and Zhanxing Zhu

In International Joint Conference on Artificial Intelligence (IJCAI), 2018

Awarded Abs DOI PDF

2019-2024 IJCAI Most Cited Paper

Timely and accurate traffic forecasting is crucial for urban traffic control and guidance. Due to the high nonlinearity and complexity of traffic flow, traditional methods cannot satisfy the requirements of mid-and-long term prediction tasks and often neglect spatial and temporal dependencies. In this paper, we propose a novel deep learning framework, Spatio-Temporal Graph Convolutional Networks (STGCN), to tackle the time series prediction problem in the traffic domain. Instead of applying regular convolutional and recurrent units, we formulate the problem on graphs and build the model with complete convolutional structures, which enable much faster training speed with fewer parameters. Experiments show that our model STGCN effectively captures comprehensive spatio-temporal correlations through modeling multi-scale traffic networks and consistently outperforms state-of-the-art baselines on various real-world traffic datasets.
NIPS

Reinforced continual learning

Ju Xu and Zhanxing Zhu

In Advances in Neural Information Processing Systems (NeurIPS), 2018

PDF
CCS

Yet another text captcha solver: A generative adversarial network based approach

Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu, and 2 more authors

In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (ACM CCS), 2018

Awarded PDF

Best Paper Finalist
ISBI

SIPID: A deep learning framework for sinogram interpolation and image denoising in low-dose CT reconstruction

Huizhuo Yuan, Jinzhu Jia, and Zhanxing Zhu

In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI), 2018

PDF
IJCAI

Stochastic Fractional Hamiltonian Monte Carlo

Nanyang Ye and Zhanxing Zhu

In IJCAI, 2018

PDF
NIPS

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

Rui Luo, Jianhong Wang, Yaodong Yang, Jun Wang, and Zhanxing Zhu

Advances in Neural Information Processing Systems (NIPS), 2018

PDF
NIPS

Bayesian adversarial learning

Nanyang Ye and Zhanxing Zhu

Advances in Neural Information Processing Systems (NIPS), 2018

PDF

2017

ICML

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

Lei Wu and Zhanxing Zhu

In The 34th International Conference on Machine Learning (ICML 2017): Theoretical Machine Learning Workshop, 2017

PDF
NIPS

Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks

Nanyang Ye, Zhanxing Zhu, and Rafal K Mantiuk

In 31st Conference on Neural Information Processing Systems (NIPS), 2017

PDF
ACL

Learning with noise: Enhance distantly supervised relation extraction with dynamic transition matrix

Bingfeng Luo, Yansong Feng, Zheng Wang, Zhanxing Zhu, Songfang Huang, Rui Yan, and 1 more author

In ACL, 2017

PDF

2016

AAAI

Stochastic parallel block coordinate descent for large-scale saddle point problems

Zhanxing Zhu and Amos Storkey

In Proceedings of the AAAI Conference on Artificial Intelligence, 2016

PDF

2015

ECML

Adaptive stochastic primal-dual coordinate descent for separable saddle point problems

Zhanxing Zhu and Amos J Storkey

In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings, Part I 15, 2015

PDF
ECML

Aggregation under bias: Rényi divergence aggregation and its implementation via machine learning markets

Amos J Storkey, Zhanxing Zhu, and Jinli Hu

In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings, Part I 15, 2015

PDF
NIPS

Covariance-controlled adaptive Langevin thermostat for large-scale Bayesian sampling

Xiaocheng Shang, Zhanxing Zhu, Benedict Leimkuhler, and Amos J Storkey

In NIPS, 2015

PDF

2014

ICML

A continuum from mixtures to products: Aggregation under bias

A Storkey, Zhanxing Zhu, and Jinli Hu

In ICML Workshop on Divergence Methods for Probabilistic Inference, 2014

2013

NPL

Supervised distance preserving projections

Zhanxing Zhu, Timo Similä, and Francesco Corona

Neural Processing Letters, 2013

PDF
Multiplicative updates for learning with stochastic matrices

Zhanxing Zhu, Zhirong Yang, and Erkki Oja

In Image Analysis: 18th Scandinavian Conference, SCIA 2013, Espoo, Finland, June 17-20, 2013. Proceedings 18, 2013

PDF

2011

IFAC

Local linear regression for soft-sensor design with application to an industrial deethanizer

Zhanxing Zhu, Francesco Corona, Amaury Lendasse, Roberto Baratti, and Jose A Romagnoli

IFAC Proceedings Volumes, 2011

PDF

2010

Automatic rank determination in projective nonnegative matrix factorization

Zhirong Yang, Zhanxing Zhu, and Erkki Oja

In International Conference on Latent Variable Analysis and Signal Separation, 2010

PDF