Data Preprocessing | Day 1
Steps

Datasets
| Country | Age | Salary | Purchased |
|---|---|---|---|
| France | 44 | 72000 | No |
| Spain | 27 | 48000 | Yes |
| Germany | 30 | 54000 | No |
| Spain | 38 | 61000 | No |
| Germany | 40 | | Yes |
| France | 35 | 58000 | Yes |
| Spain | | 52000 | No |
| France | 48 | 79000 | Yes |
| Germany | 50 | 83000 | No |
| France | 37 | 67000 | Yes |
Code
Step 1: Import the libraries
import numpy as np  # numerical arrays
import pandas as pd  # used below as pd to read the CSV file
Step 2: Import the dataset
dataset = pd.read_csv('Data.csv')  # read the CSV file into a DataFrame
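As a minimal sketch of this step, the snippet below loads the same ten rows from an inline string instead of `Data.csv`, so it runs without the file. The two blank cells are read as a missing Salary (Germany, 40) and a missing Age (Spain, 52000); that reading is an assumption. Splitting into `X` (features) and `Y` (target) with `iloc` is also an assumption about how the later steps consume the data.

```python
import io

import pandas as pd

# Inline copy of the dataset table, so the sketch does not need Data.csv.
CSV_TEXT = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

dataset = pd.read_csv(io.StringIO(CSV_TEXT))

# Assumed convention: every column but the last is a feature, the last is the target.
X = dataset.iloc[:, :-1].values  # Country, Age, Salary
Y = dataset.iloc[:, -1].values   # Purchased

print(X.shape, Y.shape)
print(dataset.isna().sum())  # number of missing values per column
```

Blank CSV fields are parsed as `NaN`, which is what the missing-data step has to deal with.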
Step 3: Handle missing data
from sklearn.impute import SimpleImputer  # replaces Imputer, which was removed from sklearn.preprocessing in scikit-learn 0.22
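One way to fill the missing Age and Salary values is mean imputation. The sketch below uses `SimpleImputer` from `sklearn.impute`, the class that current scikit-learn releases provide in place of the older `Imputer`. The choice of the `"mean"` strategy and the array name `X_num` are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns from the dataset table; np.nan marks the two missing cells.
X_num = np.array([
    [44, 72000],
    [27, 48000],
    [30, 54000],
    [38, 61000],
    [40, np.nan],
    [35, 58000],
    [np.nan, 52000],
    [48, 79000],
    [50, 83000],
    [37, 67000],
])

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X_num)

print(X_filled[4])  # Germany row, Salary filled in
print(X_filled[6])  # Spain row, Age filled in
```

The means are computed over the nine observed values in each column, so the filled cells do not shift the column averages.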
Step 4: Encode categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder  # encoders for categorical columns
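To illustrate label encoding, here is a small sketch that maps the Purchased column to integers with `LabelEncoder`. The variable names `purchased` and `labelencoder_Y` are assumptions; the values are copied from the dataset table.

```python
from sklearn.preprocessing import LabelEncoder

# Purchased column from the dataset table, in row order.
purchased = ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]

# LabelEncoder sorts the distinct labels and assigns them 0, 1, ...
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(purchased)

print(list(labelencoder_Y.classes_))
print(Y.tolist())
```

Integer codes suit a binary target like this one. For a feature such as Country they would imply an ordering that does not exist, which is why dummy variables are used for it.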
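Create dummy variables
from sklearn.compose import ColumnTransformer  # needed because categorical_features was removed from OneHotEncoder in scikit-learn 0.22
onehotencoder = ColumnTransformer([('country', OneHotEncoder(), [0])], remainder='passthrough')  # one-hot encode column 0, keep the rest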
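The dummy-variable step can be sketched with `ColumnTransformer`, which current scikit-learn releases use to apply `OneHotEncoder` to selected columns (the `categorical_features` argument is no longer available). The three rows are taken from the dataset table; the transformer name `"country"` and the use of `sparse_threshold=0` to force a dense result are assumptions for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# First three rows of the dataset table: Country, Age, Salary.
X = np.array(
    [
        ["France", 44.0, 72000.0],
        ["Spain", 27.0, 48000.0],
        ["Germany", 30.0, 54000.0],
    ],
    dtype=object,
)

# One-hot encode column 0 and pass the other columns through unchanged.
# sparse_threshold=0 asks for a dense array regardless of how many zeros there are.
ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)

# Dummy columns come first (categories in sorted order), then Age and Salary.
print(X_encoded)
```

Each country becomes its own 0/1 column, so no artificial ordering between France, Germany, and Spain is introduced.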
Step 5: Split the dataset into a training set and a test set
from sklearn.model_selection import train_test_split  # randomly splits arrays into train and test subsets
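A minimal sketch of the split follows. The feature matrix here is a stand-in made of row indices (not the real Age and Salary values), and both `test_size=0.2` and `random_state=0` are assumptions; the source does not show which values it uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features: 10 rows, 2 columns. Stand-in values, not the real dataset.
X = np.arange(20).reshape(10, 2)
# Encoded Purchased column (No=0, Yes=1), in row order.
Y = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

With ten rows and `test_size=0.2`, eight rows go to the training set and two to the test set.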
Step 6: Feature scaling
from sklearn.preprocessing import StandardScaler  # standardizes each feature to zero mean and unit variance
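Feature scaling can be sketched as follows: fit `StandardScaler` on the training rows only, then reuse the fitted mean and standard deviation on the test rows. The particular assignment of rows to `X_train` and `X_test` is an arbitrary assumption for illustration; the values come from complete rows of the dataset table.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and Salary for a few complete rows; the train/test assignment is illustrative.
X_train = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, 54000.0],
    [38.0, 61000.0],
])
X_test = np.array([
    [35.0, 58000.0],
    [48.0, 79000.0],
])

sc_X = StandardScaler()
# Learn the per-column mean and standard deviation from the training set.
X_train_scaled = sc_X.fit_transform(X_train)
# Apply the same statistics to the test set, without refitting.
X_test_scaled = sc_X.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
```

Fitting on the training set alone keeps information about the test rows out of the scaling statistics. It also puts Age and Salary on comparable scales, so Salary does not dominate distance-based models.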