AI-Guided Intuition (DeepMind)

Notebook contents

The Nature cover article on AI-guided intuition aims to use machine learning to help discover conjectures and theorems in pure mathematics.

Here is Dave's plain-language summary:

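For two mathematical objects X(z) and Y(z), if machine learning can learn a function f such that f(X(z)) ≈ Y(z), this suggests that X and Y are related in some way.
Attribution techniques are then used to help find which features matter most. (Attribution techniques: compute the gradient of the output with respect to each input; a large gradient means that input feature is important, and a small gradient means it is not.)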
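The gradient idea can be sketched in a few lines. This is only an illustration: the function `f` is a hypothetical stand-in for a trained model, and central finite differences are used in place of automatic differentiation (such as `jax.grad`) so that the example needs nothing beyond NumPy.

```python
import numpy as np


def f(x):
    # Hypothetical stand-in for a trained model:
    # depends strongly on x[0], weakly on x[1], and not at all on x[2].
    return 3.0 * x[0] + 0.1 * np.sin(x[1])


def saliency(fn, inputs, eps=1e-5):
    """Mean absolute gradient of fn with respect to each input feature.

    Gradients are approximated with central finite differences.
    """
    inputs = np.asarray(inputs, dtype=float)
    grads = np.zeros_like(inputs)
    for i, x in enumerate(inputs):
        for j in range(x.size):
            step = np.zeros_like(x)
            step[j] = eps
            grads[i, j] = (fn(x + step) - fn(x - step)) / (2 * eps)
    return np.abs(grads).mean(axis=0)


rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 3))
scores = saliency(f, samples)
print(scores)  # one score per feature; larger means more influential
```

Averaging the absolute gradient over many samples gives one score per feature, which is what lets a mathematician focus on a handful of inputs.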

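Workflow #

Propose a hypothesis (mathematician)
Generate data by sampling (AI)
Train a supervised learning model (AI)
Find patterns, using attribution techniques to narrow the search space (AI)
Formulate a candidate conjecture (mathematician)
Prove the theorem (mathematician)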
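The AI steps of this workflow can be sketched on synthetic data. Everything below is an assumption made for illustration: the data is random, the "hidden relationship" is invented, and a least-squares fit stands in for the supervised model. The point is the comparison against a shuffled baseline, which shows whether the model has found a real X-to-Y relationship.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate data: synthetic stand-ins for X(z) and Y(z).
X = rng.normal(size=(2000, 3))
Y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=2000)  # invented relationship

train, test = slice(0, 1500), slice(1500, None)


def fit_and_score(X_tr, Y_tr, X_te, Y_te):
    """Fit a least-squares model and return (held-out R^2, weights)."""
    A_tr = np.c_[X_tr, np.ones(len(X_tr))]
    A_te = np.c_[X_te, np.ones(len(X_te))]
    w, *_ = np.linalg.lstsq(A_tr, Y_tr, rcond=None)
    resid = Y_te - A_te @ w
    return 1.0 - np.mean(resid ** 2) / np.var(Y_te), w


# Train a supervised model on the real pairs.
r2, w = fit_and_score(X[train], Y[train], X[test], Y[test])

# Baseline: shuffling Y destroys any genuine relationship with X.
r2_shuffled, _ = fit_and_score(
    X[train], rng.permutation(Y[train]), X[test], Y[test])

# Attribution: for a linear model, the gradient with respect to
# each input is simply its weight.
importance = np.abs(w[:-1])

print("R^2 on real pairs:    ", round(r2, 3))
print("R^2 on shuffled pairs:", round(r2_shuffled, 3))
print("feature importance:   ", importance.round(3))
```

A score far above the shuffled baseline is the signal that a relationship is worth a mathematician's attention; the importance scores then suggest which inputs to look at first.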

Experiment from the paper #

Predict a knot's signature (https://knotinfo.math.indiana.edu/descriptions/signature.html) from its geometric invariants.

This relationship had not been found in earlier research.

Code #

# Install the required packages
from IPython.display import clear_output

!pip install dm-haiku
!pip install optax
clear_output()

# Import packages
import tempfile

import haiku as hk
import jax
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpy as np
import optax
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split

# Download the dataset
!featurize -t [token] dataset download ea22c102-a9c5-4ce5-aa31-576bd52ff7c1

100%|█████████████████████████████████████| 18.3M/18.3M [00:00<00:00, 38.0MiB/s]
🍬  Download complete, extracting...
🏁  Dataset added successfully

# Load and preprocess the data
# For a knot k, X(k) is a vector made up of these quantities; here they are the knot's geometric invariants

full_df = pd.read_csv('/home/featurize/data/knot_theory_invariants.csv')
display_name_from_short_name = {
    'chern_simons': 'Chern-Simons',
    'cusp_volume': 'Cusp volume',
    'hyperbolic_adjoint_torsion_degree': 'Adjoint Torsion Degree',
    'hyperbolic_torsion_degree': 'Torsion Degree',
    'injectivity_radius': 'Injectivity radius',
    'longitudinal_translation': 'Longitudinal translation',
    'meridinal_translation_imag': 'Im(Meridional translation)',
    'meridinal_translation_real': 'Re(Meridional translation)',
    'short_geodesic_imag_part': 'Im(Short geodesic)',
    'short_geodesic_real_part': 'Re(Short geodesic)',
    'Symmetry_0': 'Symmetry: $0$',
    'Symmetry_D3': 'Symmetry: $D_3$',
    'Symmetry_D4': 'Symmetry: $D_4$',
    'Symmetry_D6': 'Symmetry: $D_6$',
    'Symmetry_D8': 'Symmetry: $D_8$',
    'Symmetry_Z/2 + Z/2': 'Symmetry: $\\frac{Z}{2} + \\frac{Z}{2}$',
    'volume': 'Volume',
}
column_names = list(display_name_from_short_name)
target = 'signature'

# Split the data into training, validation, and test sets
random_seed = 2 # @param {type: "integer"}
random_state = np.random.RandomState(random_seed)
train_df, validation_and_test_df = train_test_split(
    full_df, random_state=random_state)
validation_df, test_df = train_test_split(
    validation_and_test_df, test_size=.5, random_state=random_state)

train_df.head(2)

	Unnamed: 0	hyperbolic_adjoint_torsion_degree	hyperbolic_torsion_degree	short_geodesic_real_part	short_geodesic_imag_part	injectivity_radius	chern_simons	cusp_volume	longitudinal_translation	meridinal_translation_imag	meridinal_translation_real	volume	Symmetry_0	Symmetry_D3	Symmetry_D4	Symmetry_D6	Symmetry_D8	Symmetry_Z/2 + Z/2	signature
70746	73193	0	10	1.015512	-2.760601	0.507756	0.090530	12.226322	10.685555	1.144192	-0.519157	11.393225	0.0	0.0	0.0	0.0	0.0	1.0	-2
240827	249190	0	14	0.827289	-3.013258	0.413645	0.232453	13.800773	10.453156	1.320249	-0.158522	12.742782	0.0	0.0	0.0	0.0	0.0	1.0	0

print('Training set size:', len(train_df))
print('Test set size:', len(test_df))

Training set size: 182809
Test set size: 30469
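These counts follow from the two `train_test_split` calls: the first uses the default 25% hold-out, and the second splits that hold-out in half. The sketch below reproduces the arithmetic, assuming scikit-learn rounds the held-out count up (which matches the numbers printed here). The helper `split_sizes` is hypothetical, and the total is the sum of the three set sizes reported in this post.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test), rounding the test count up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 182809 + 30468 + 30469  # train + validation + test rows

n_train, n_rest = split_sizes(n_total)              # 75% / 25%
n_val, n_test = split_sizes(n_rest, test_size=0.5)  # 12.5% / 12.5%

print(n_train, n_val, n_test)
```

So roughly 75% of the knots are used for training and about 12.5% each for validation and testing.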

Quick validation with the AutoML tool AutoGluon #

See Mu Li's video for reference.

!pip install autogluon
clear_output()

from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label=target).fit(
    train_df[column_names + [target]],
    tuning_data=validation_df[column_names + [target]],
    time_limit=60
)

No path specified. Models will be saved in: "AutogluonModels/ag-20220209_052125/"
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "AutogluonModels/ag-20220209_052125/"
AutoGluon Version:  0.3.1
Train Data Rows:    182809
Train Data Columns: 17
Tuning Data Rows:    30468
Tuning Data Columns: 17
Preprocessing data ...
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
	First 10 (of 14) unique label values:  [-2, 0, 2, -8, 4, -4, -6, 8, 6, 10]
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type argument in fit() (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 12 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.9999452980980149
Train Data Class Count: 12
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    25132.68 MB
	Train Data (Original)  Memory Usage: 29.0 MB (0.1% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 6 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 15 | ['chern_simons', 'cusp_volume', 'injectivity_radius', 'longitudinal_translation', 'meridinal_translation_imag', ...]
		('int', [])   :  2 | ['hyperbolic_adjoint_torsion_degree', 'hyperbolic_torsion_degree']
	Types of features in processed data (raw dtype, special dtypes):
		('float', [])     : 9 | ['chern_simons', 'cusp_volume', 'injectivity_radius', 'longitudinal_translation', 'meridinal_translation_imag', ...]
		('int', [])       : 2 | ['hyperbolic_adjoint_torsion_degree', 'hyperbolic_torsion_degree']
		('int', ['bool']) : 6 | ['Symmetry_0', 'Symmetry_D3', 'Symmetry_D4', 'Symmetry_D6', 'Symmetry_D8', ...]
	0.6s = Fit runtime
	17 features in original data used to generate 17 features in processed data.
	Train Data (Processed) Memory Usage: 20.05 MB (0.1% of available memory)
Data preprocessing and feature engineering runtime = 0.79s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric argument of fit()
Fitting 13 L1 models ...
Fitting model: KNeighborsUnif ... Training model for up to 59.21s of the 59.2s of remaining time.
	0.9354	 = Validation score   (accuracy)
	16.84s	 = Training   runtime
	1.03s	 = Validation runtime
Fitting model: KNeighborsDist ... Training model for up to 41.24s of the 41.23s of remaining time.
	0.9483	 = Validation score   (accuracy)
	16.64s	 = Training   runtime
	0.83s	 = Validation runtime
Fitting model: NeuralNetFastAI ... Training model for up to 23.7s of the 23.7s of remaining time.
	Ran out of time, stopping training early. (Stopping on epoch 0)
	0.9095	 = Validation score   (accuracy)
	31.75s	 = Training   runtime
	0.5s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 59.21s of the -11.39s of remaining time.
	0.9624	 = Validation score   (accuracy)
	4.5s	 = Training   runtime
	0.01s	 = Validation runtime
AutoGluon training complete, total runtime = 76.0s ...
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20220209_052125/")

leaderboard = predictor.leaderboard(test_df, silent=True)
feature_importance = predictor.feature_importance(test_df, silent=True)

!sudo apt-get install -y graphviz graphviz-dev
!pip install pygraphviz
clear_output()

from PIL import Image
Image.open(predictor.plot_ensemble_model())

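The selected ensemble models #

KNeighborsDist (Accuracy: 0.9483)
NeuralNetFastAI (Accuracy: 0.9485)
LightGBMXT (Accuracy: 0.8522, probably undertrained because of the time limit)

Final accuracy: 0.9689

These figures differ from the training log above and the leaderboard below, neither of which includes a LightGBMXT model, so they appear to come from a separate run.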

# Details of the trained models
leaderboard

	model	score_test	score_val	pred_time_test	pred_time_val	fit_time	pred_time_test_marginal	pred_time_val_marginal	fit_time_marginal	stack_level	can_infer	fit_order
0	WeightedEnsemble_L2	0.963996	0.962418	1.488950	1.331209	52.897020	0.007039	0.005616	4.500332	2	True	4
1	KNeighborsDist	0.947488	0.948272	1.049718	0.826673	16.642866	1.049718	0.826673	16.642866	1	True	2
2	KNeighborsUnif	0.934327	0.935373	1.055187	1.031045	16.843561	1.055187	1.031045	16.843561	1	True	1
3	NeuralNetFastAI	0.911287	0.909509	0.432193	0.498919	31.753822	0.432193	0.498919	31.753822	1	True	3

# Feature importance
plt.figure(figsize=(16,8))
sns.barplot(x=feature_importance.importance.values, y=feature_importance.importance.index);
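AutoGluon's `feature_importance` is based on permutation importance rather than gradients: shuffle one column, then measure how much the score drops. A minimal sketch of that idea follows, with a hypothetical fixed classifier and synthetic data standing in for the trained predictor and the knot dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the label depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)


def predict(X):
    # Hypothetical stand-in for a trained classifier.
    return (X[:, 0] + 0.3 * X[:, 1] > 0).astype(int)


def permutation_importance(predict_fn, X, y, rng, n_repeats=5):
    """Average drop in accuracy when each column is shuffled."""
    baseline = np.mean(predict_fn(X) == y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - np.mean(predict_fn(X_perm) == y)
    return drops / n_repeats


importance = permutation_importance(predict, X, y, rng)
print(importance)  # larger drop means a more important feature
```

Unlike gradient-based attribution, this works for any model, including the k-nearest-neighbour models in the ensemble, because it only needs predictions.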

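Discovering features #

The plot above shows which three features are the most important.

Using this AI-assisted approach, the authors found that these geometric invariants are related to the signature and proposed a conjecture:

They then went on to prove it (Dave is not a mathematician, so I won't go into detail; readers interested in the proof can follow the link below):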

THE SIGNATURE AND CUSP GEOMETRY OF HYPERBOLIC KNOTS
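For a concrete feel without the proof: as described in the published DeepMind paper (not in this notebook), the key quantity is the natural slope, slope(K) = Re(λ/μ), where λ is the longitudinal translation and μ is the meridional translation, and twice the signature stays close to this slope. The sketch below computes the slope for the two rows printed by `train_df.head(2)` earlier. Mapping the CSV columns to λ and μ in this way is my assumption.

```python
# Values copied from the two rows shown by train_df.head(2).
rows = [
    {"longitudinal_translation": 10.685555,
     "meridinal_translation_real": -0.519157,
     "meridinal_translation_imag": 1.144192,
     "signature": -2},
    {"longitudinal_translation": 10.453156,
     "meridinal_translation_real": -0.158522,
     "meridinal_translation_imag": 1.320249,
     "signature": 0},
]


def natural_slope(row):
    """Re(lambda / mu) with mu built from its real and imaginary parts."""
    mu = complex(row["meridinal_translation_real"],
                 row["meridinal_translation_imag"])
    return (row["longitudinal_translation"] / mu).real


slopes = [natural_slope(r) for r in rows]
for r, s in zip(rows, slopes):
    print(f"signature={r['signature']:+d}  "
          f"2*signature={2 * r['signature']:+d}  slope={s:.3f}")
```

Two rows prove nothing, but they show how the three top-ranked features combine into a single quantity that tracks the signature.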

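I am personally very interested in machine learning methods applied to different fields, so I have brought DeepMind's research over and translated it, in the hope that it helps others who share that interest.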
