Basic Text Feature Creation in Python
rpi.analyticsdojo.com
!wget https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/input/train.csv
!wget https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/input/test.csv
--2019-03-11 14:58:22-- https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/input/train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61194 (60K) [text/plain]
Saving to: ‘train.csv.1’
train.csv.1 100%[===================>] 59.76K --.-KB/s in 0.03s
2019-03-11 14:58:23 (2.32 MB/s) - ‘train.csv.1’ saved [61194/61194]
--2019-03-11 14:58:23-- https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/input/test.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28629 (28K) [text/plain]
Saving to: ‘test.csv.1’
test.csv.1 100%[===================>] 27.96K --.-KB/s in 0.01s
2019-03-11 14:58:24 (2.27 MB/s) - ‘test.csv.1’ saved [28629/28629]
import numpy as np
import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
# Preview the first five rows of the training data.
train.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
# Summary statistics for the numeric columns.
train.describe()
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
---|---|---|---|---|---|---|---|
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
train.dtypes
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object
# Let's look at the Age field. "NaN" indicates a missing value.
train["Age"]
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
5 NaN
6 54.0
7 2.0
8 27.0
9 14.0
10 4.0
11 58.0
12 20.0
13 39.0
14 14.0
15 55.0
16 2.0
17 NaN
18 31.0
19 NaN
20 35.0
21 34.0
22 15.0
23 28.0
24 8.0
25 38.0
26 NaN
27 19.0
28 NaN
29 NaN
...
861 21.0
862 48.0
863 NaN
864 24.0
865 42.0
866 27.0
867 31.0
868 NaN
869 4.0
870 26.0
871 47.0
872 33.0
873 47.0
874 28.0
875 15.0
876 20.0
877 19.0
878 NaN
879 56.0
880 25.0
881 33.0
882 22.0
883 28.0
884 25.0
885 39.0
886 27.0
887 19.0
888 NaN
889 26.0
890 32.0
Name: Age, Length: 891, dtype: float64
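Before filling anything in, it can help to count how many values are actually missing. The sketch below uses a small hand-made frame (the name `demo` and its values are hypothetical, not taken from the Titanic file) so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few rows with gaps.
demo = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan],
    "Cabin": [None, "C85", None, "C123", None],
})

# isna() marks missing cells; sum() then counts them per column.
missing_counts = demo.isna().sum()
print(missing_counts)

# The mean of the boolean mask is the share of rows with a missing Age.
missing_age_share = demo["Age"].isna().mean()
print(missing_age_share)
```

On the training data, `describe()` reports 714 non-missing ages out of 891 rows, so `train["Age"].isna().sum()` run at this point would count 177 missing ages.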
# Now let's recode: fill the missing ages with the median age.
medianAge = train["Age"].median()
print("The Median age is:", medianAge, " years old.")
train["Age"] = train["Age"].fillna(medianAge)
# Option 2: all in one shot. (It changes nothing here, because the gaps were already filled above.)
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Age"]
The Median age is: 28.0 years old.
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
5 28.0
6 54.0
7 2.0
8 27.0
9 14.0
10 4.0
11 58.0
12 20.0
13 39.0
14 14.0
15 55.0
16 2.0
17 28.0
18 31.0
19 28.0
20 35.0
21 34.0
22 15.0
23 28.0
24 8.0
25 38.0
26 28.0
27 19.0
28 28.0
29 28.0
...
861 21.0
862 48.0
863 28.0
864 24.0
865 42.0
866 27.0
867 31.0
868 28.0
869 4.0
870 26.0
871 47.0
872 33.0
873 47.0
874 28.0
875 15.0
876 20.0
877 19.0
878 28.0
879 56.0
880 25.0
881 33.0
882 22.0
883 28.0
884 25.0
885 39.0
886 27.0
887 19.0
888 28.0
889 26.0
890 32.0
Name: Age, Length: 891, dtype: float64
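The test set has missing ages too, so the same fill usually has to be repeated there. One common convention, which is an assumption here and not something this notebook does, is to compute the median on the training data only and reuse it for the test data. The frames `train_demo` and `test_demo` and the `AgeMissing` flag column are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames standing in for train and test.
train_demo = pd.DataFrame({"Age": [22.0, 38.0, np.nan, 26.0, 35.0]})
test_demo = pd.DataFrame({"Age": [34.5, np.nan, 62.0]})

# Compute the median once, on the training data only (NaN is skipped).
median_age = train_demo["Age"].median()

# Optionally record which ages were imputed before overwriting them.
train_demo["AgeMissing"] = train_demo["Age"].isna().astype(int)
test_demo["AgeMissing"] = test_demo["Age"].isna().astype(int)

# Apply the same training median to both frames.
train_demo["Age"] = train_demo["Age"].fillna(median_age)
test_demo["Age"] = test_demo["Age"].fillna(median_age)
print(test_demo)
```

Keeping the flag column is optional; it simply preserves the information that a value was filled in.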
# To recode data, we can use what we know about selecting rows and columns.
# Fill the missing ports with "S", then give each port a numeric code.
train["Embarked"] = train["Embarked"].fillna("S")
train.loc[train["Embarked"] == "S", "EmbarkedRecode"] = 0
train.loc[train["Embarked"] == "C", "EmbarkedRecode"] = 1
train.loc[train["Embarked"] == "Q", "EmbarkedRecode"] = 2
# We can also use a lambda function.
# You can read more about lambda functions here:
# http://www.python-course.eu/lambda.php
gender_fn = lambda x: 0 if x == 'male' else 1
train['Gender'] = train['Sex'].map(gender_fn)
# Or we can define and apply the lambda in one shot.
train['NameLength'] = train['Name'].map(lambda x: len(x))
train['Age2'] = train['Age'].map(lambda x: x*x)
train
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | EmbarkedRecode | Gender | NameLength | Age2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | 0.0 | 0 | 23 | 484.0 |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 1.0 | 1 | 51 | 1444.0 |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | 0.0 | 1 | 22 | 676.0 |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | 0.0 | 1 | 44 | 1225.0 |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | 0.0 | 0 | 24 | 1225.0 |
5 | 6 | 0 | 3 | Moran, Mr. James | male | 28.0 | 0 | 0 | 330877 | 8.4583 | NaN | Q | 2.0 | 0 | 16 | 784.0 |
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S | 0.0 | 0 | 23 | 2916.0 |
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S | 0.0 | 0 | 30 | 4.0 |
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S | 0.0 | 1 | 49 | 729.0 |
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C | 1.0 | 1 | 35 | 196.0 |
10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S | 0.0 | 1 | 31 | 16.0 |
11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.5500 | C103 | S | 0.0 | 1 | 24 | 3364.0 |
12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S | 0.0 | 0 | 30 | 400.0 |
13 | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39.0 | 1 | 5 | 347082 | 31.2750 | NaN | S | 0.0 | 0 | 27 | 1521.0 |
14 | 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 | 0 | 0 | 350406 | 7.8542 | NaN | S | 0.0 | 1 | 36 | 196.0 |
15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S | 0.0 | 1 | 32 | 3025.0 |
16 | 17 | 0 | 3 | Rice, Master. Eugene | male | 2.0 | 4 | 1 | 382652 | 29.1250 | NaN | Q | 2.0 | 0 | 20 | 4.0 |
17 | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | 28.0 | 0 | 0 | 244373 | 13.0000 | NaN | S | 0.0 | 0 | 28 | 784.0 |
18 | 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31.0 | 1 | 0 | 345763 | 18.0000 | NaN | S | 0.0 | 1 | 55 | 961.0 |
19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | 28.0 | 0 | 0 | 2649 | 7.2250 | NaN | C | 1.0 | 1 | 23 | 784.0 |
20 | 21 | 0 | 2 | Fynney, Mr. Joseph J | male | 35.0 | 0 | 0 | 239865 | 26.0000 | NaN | S | 0.0 | 0 | 20 | 1225.0 |
21 | 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34.0 | 0 | 0 | 248698 | 13.0000 | D56 | S | 0.0 | 0 | 21 | 1156.0 |
22 | 23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15.0 | 0 | 0 | 330923 | 8.0292 | NaN | Q | 2.0 | 1 | 27 | 225.0 |
23 | 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28.0 | 0 | 0 | 113788 | 35.5000 | A6 | S | 0.0 | 0 | 28 | 784.0 |
24 | 25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8.0 | 3 | 1 | 349909 | 21.0750 | NaN | S | 0.0 | 1 | 29 | 64.0 |
25 | 26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... | female | 38.0 | 1 | 5 | 347077 | 31.3875 | NaN | S | 0.0 | 1 | 57 | 1444.0 |
26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | 28.0 | 0 | 0 | 2631 | 7.2250 | NaN | C | 1.0 | 0 | 23 | 784.0 |
27 | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | 0.0 | 0 | 30 | 361.0 |
28 | 29 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | 28.0 | 0 | 0 | 330959 | 7.8792 | NaN | Q | 2.0 | 1 | 29 | 784.0 |
29 | 30 | 0 | 3 | Todoroff, Mr. Lalio | male | 28.0 | 0 | 0 | 349216 | 7.8958 | NaN | S | 0.0 | 0 | 19 | 784.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
861 | 862 | 0 | 2 | Giles, Mr. Frederick Edward | male | 21.0 | 1 | 0 | 28134 | 11.5000 | NaN | S | 0.0 | 0 | 27 | 441.0 |
862 | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48.0 | 0 | 0 | 17466 | 25.9292 | D17 | S | 0.0 | 1 | 51 | 2304.0 |
863 | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | 28.0 | 8 | 2 | CA. 2343 | 69.5500 | NaN | S | 0.0 | 1 | 33 | 784.0 |
864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S | 0.0 | 0 | 22 | 576.0 |
865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NaN | S | 0.0 | 1 | 24 | 1764.0 |
866 | 867 | 1 | 2 | Duran y More, Miss. Asuncion | female | 27.0 | 1 | 0 | SC/PARIS 2149 | 13.8583 | NaN | C | 1.0 | 1 | 28 | 729.0 |
867 | 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31.0 | 0 | 0 | PC 17590 | 50.4958 | A24 | S | 0.0 | 0 | 36 | 961.0 |
868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | 28.0 | 0 | 0 | 345777 | 9.5000 | NaN | S | 0.0 | 0 | 27 | 784.0 |
869 | 870 | 1 | 3 | Johnson, Master. Harold Theodor | male | 4.0 | 1 | 1 | 347742 | 11.1333 | NaN | S | 0.0 | 0 | 31 | 16.0 |
870 | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26.0 | 0 | 0 | 349248 | 7.8958 | NaN | S | 0.0 | 0 | 17 | 676.0 |
871 | 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47.0 | 1 | 1 | 11751 | 52.5542 | D35 | S | 0.0 | 1 | 48 | 2209.0 |
872 | 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33.0 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S | 0.0 | 0 | 24 | 1089.0 |
873 | 874 | 0 | 3 | Vander Cruyssen, Mr. Victor | male | 47.0 | 0 | 0 | 345765 | 9.0000 | NaN | S | 0.0 | 0 | 27 | 2209.0 |
874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C | 1.0 | 1 | 37 | 784.0 |
875 | 876 | 1 | 3 | Najib, Miss. Adele Kiamie "Jane" | female | 15.0 | 0 | 0 | 2667 | 7.2250 | NaN | C | 1.0 | 1 | 32 | 225.0 |
876 | 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20.0 | 0 | 0 | 7534 | 9.8458 | NaN | S | 0.0 | 0 | 29 | 400.0 |
877 | 878 | 0 | 3 | Petroff, Mr. Nedelio | male | 19.0 | 0 | 0 | 349212 | 7.8958 | NaN | S | 0.0 | 0 | 20 | 361.0 |
878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | 28.0 | 0 | 0 | 349217 | 7.8958 | NaN | S | 0.0 | 0 | 18 | 784.0 |
879 | 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56.0 | 0 | 1 | 11767 | 83.1583 | C50 | C | 1.0 | 1 | 45 | 3136.0 |
880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | NaN | S | 0.0 | 1 | 44 | 625.0 |
881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NaN | S | 0.0 | 0 | 18 | 1089.0 |
882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S | 0.0 | 1 | 28 | 484.0 |
883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S | 0.0 | 0 | 29 | 784.0 |
884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S | 0.0 | 0 | 22 | 625.0 |
885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q | 2.0 | 1 | 36 | 1521.0 |
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S | 0.0 | 0 | 21 | 729.0 |
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S | 0.0 | 1 | 28 | 361.0 |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | 28.0 | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S | 0.0 | 1 | 40 | 784.0 |
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C | 1.0 | 0 | 21 | 676.0 |
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q | 2.0 | 0 | 19 | 1024.0 |
891 rows × 16 columns
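The `.loc` assignments and lambdas used for recoding can also be written as dictionary lookups and vectorized operations. This is a sketch on hypothetical toy rows (the frame `demo` and its values are made up); the codes mirror the ones used above (S=0, C=1, Q=2; male=0, female=1).

```python
import pandas as pd

# Hypothetical toy rows, not taken from the Titanic file.
demo = pd.DataFrame({
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane", "Poe, Miss. Ann", "Loe, Mr. Sam"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 2.0],
    "Embarked": ["S", "C", None, "Q"],
})

embarked_codes = {"S": 0, "C": 1, "Q": 2}
gender_codes = {"male": 0, "female": 1}

# Fill the gap first, then look each value up in the dictionary.
demo["Embarked"] = demo["Embarked"].fillna("S")
demo["EmbarkedRecode"] = demo["Embarked"].map(embarked_codes)
demo["Gender"] = demo["Sex"].map(gender_codes)

# Vectorized equivalents of the two lambda-based columns.
demo["NameLength"] = demo["Name"].str.len()
demo["Age2"] = demo["Age"] ** 2
print(demo)
```

One difference to keep in mind: with a dictionary, any value that is not a key becomes NaN, whereas `gender_fn` maps everything other than `'male'` to 1.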
# We can start to create small functions that search for a string.
def has_title(name):
    for s in ['Mr.', 'Mrs.', 'Miss.', 'Dr.', 'Sir.']:
        if name.find(s) >= 0:
            return True
    return False
# Now we use that function inside another function.
title_fn = lambda x: 1 if has_title(x) else 0
# Finally, we apply the function to the Name column.
train['Title'] = train['Name'].map(title_fn)
test['Title'] = test['Name'].map(title_fn)
test
| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Title |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q | 1 |
1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0000 | NaN | S | 1 |
2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q | 1 |
3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S | 1 |
4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S | 1 |
5 | 897 | 3 | Svensson, Mr. Johan Cervin | male | 14.0 | 0 | 0 | 7538 | 9.2250 | NaN | S | 1 |
6 | 898 | 3 | Connolly, Miss. Kate | female | 30.0 | 0 | 0 | 330972 | 7.6292 | NaN | Q | 1 |
7 | 899 | 2 | Caldwell, Mr. Albert Francis | male | 26.0 | 1 | 1 | 248738 | 29.0000 | NaN | S | 1 |
8 | 900 | 3 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | 0 | 2657 | 7.2292 | NaN | C | 1 |
9 | 901 | 3 | Davies, Mr. John Samuel | male | 21.0 | 2 | 0 | A/4 48871 | 24.1500 | NaN | S | 1 |
10 | 902 | 3 | Ilieff, Mr. Ylio | male | NaN | 0 | 0 | 349220 | 7.8958 | NaN | S | 1 |
11 | 903 | 1 | Jones, Mr. Charles Cresson | male | 46.0 | 0 | 0 | 694 | 26.0000 | NaN | S | 1 |
12 | 904 | 1 | Snyder, Mrs. John Pillsbury (Nelle Stevenson) | female | 23.0 | 1 | 0 | 21228 | 82.2667 | B45 | S | 1 |
13 | 905 | 2 | Howard, Mr. Benjamin | male | 63.0 | 1 | 0 | 24065 | 26.0000 | NaN | S | 1 |
14 | 906 | 1 | Chaffee, Mrs. Herbert Fuller (Carrie Constance... | female | 47.0 | 1 | 0 | W.E.P. 5734 | 61.1750 | E31 | S | 1 |
15 | 907 | 2 | del Carlo, Mrs. Sebastiano (Argenia Genovesi) | female | 24.0 | 1 | 0 | SC/PARIS 2167 | 27.7208 | NaN | C | 1 |
16 | 908 | 2 | Keane, Mr. Daniel | male | 35.0 | 0 | 0 | 233734 | 12.3500 | NaN | Q | 1 |
17 | 909 | 3 | Assaf, Mr. Gerios | male | 21.0 | 0 | 0 | 2692 | 7.2250 | NaN | C | 1 |
18 | 910 | 3 | Ilmakangas, Miss. Ida Livija | female | 27.0 | 1 | 0 | STON/O2. 3101270 | 7.9250 | NaN | S | 1 |
19 | 911 | 3 | Assaf Khalil, Mrs. Mariana (Miriam")" | female | 45.0 | 0 | 0 | 2696 | 7.2250 | NaN | C | 1 |
20 | 912 | 1 | Rothschild, Mr. Martin | male | 55.0 | 1 | 0 | PC 17603 | 59.4000 | NaN | C | 1 |
21 | 913 | 3 | Olsen, Master. Artur Karl | male | 9.0 | 0 | 1 | C 17368 | 3.1708 | NaN | S | 0 |
22 | 914 | 1 | Flegenheim, Mrs. Alfred (Antoinette) | female | NaN | 0 | 0 | PC 17598 | 31.6833 | NaN | S | 1 |
23 | 915 | 1 | Williams, Mr. Richard Norris II | male | 21.0 | 0 | 1 | PC 17597 | 61.3792 | NaN | C | 1 |
24 | 916 | 1 | Ryerson, Mrs. Arthur Larned (Emily Maria Borie) | female | 48.0 | 1 | 3 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C | 1 |
25 | 917 | 3 | Robins, Mr. Alexander A | male | 50.0 | 1 | 0 | A/5. 3337 | 14.5000 | NaN | S | 1 |
26 | 918 | 1 | Ostby, Miss. Helene Ragnhild | female | 22.0 | 0 | 1 | 113509 | 61.9792 | B36 | C | 1 |
27 | 919 | 3 | Daher, Mr. Shedid | male | 22.5 | 0 | 0 | 2698 | 7.2250 | NaN | C | 1 |
28 | 920 | 1 | Brady, Mr. John Bertram | male | 41.0 | 0 | 0 | 113054 | 30.5000 | A21 | S | 1 |
29 | 921 | 3 | Samaan, Mr. Elias | male | NaN | 2 | 0 | 2662 | 21.6792 | NaN | C | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
388 | 1280 | 3 | Canavan, Mr. Patrick | male | 21.0 | 0 | 0 | 364858 | 7.7500 | NaN | Q | 1 |
389 | 1281 | 3 | Palsson, Master. Paul Folke | male | 6.0 | 3 | 1 | 349909 | 21.0750 | NaN | S | 0 |
390 | 1282 | 1 | Payne, Mr. Vivian Ponsonby | male | 23.0 | 0 | 0 | 12749 | 93.5000 | B24 | S | 1 |
391 | 1283 | 1 | Lines, Mrs. Ernest H (Elizabeth Lindsey James) | female | 51.0 | 0 | 1 | PC 17592 | 39.4000 | D28 | S | 1 |
392 | 1284 | 3 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 20.2500 | NaN | S | 0 |
393 | 1285 | 2 | Gilbert, Mr. William | male | 47.0 | 0 | 0 | C.A. 30769 | 10.5000 | NaN | S | 1 |
394 | 1286 | 3 | Kink-Heilmann, Mr. Anton | male | 29.0 | 3 | 1 | 315153 | 22.0250 | NaN | S | 1 |
395 | 1287 | 1 | Smith, Mrs. Lucien Philip (Mary Eloise Hughes) | female | 18.0 | 1 | 0 | 13695 | 60.0000 | C31 | S | 1 |
396 | 1288 | 3 | Colbert, Mr. Patrick | male | 24.0 | 0 | 0 | 371109 | 7.2500 | NaN | Q | 1 |
397 | 1289 | 1 | Frolicher-Stehli, Mrs. Maxmillian (Margaretha ... | female | 48.0 | 1 | 1 | 13567 | 79.2000 | B41 | C | 1 |
398 | 1290 | 3 | Larsson-Rondberg, Mr. Edvard A | male | 22.0 | 0 | 0 | 347065 | 7.7750 | NaN | S | 1 |
399 | 1291 | 3 | Conlon, Mr. Thomas Henry | male | 31.0 | 0 | 0 | 21332 | 7.7333 | NaN | Q | 1 |
400 | 1292 | 1 | Bonnell, Miss. Caroline | female | 30.0 | 0 | 0 | 36928 | 164.8667 | C7 | S | 1 |
401 | 1293 | 2 | Gale, Mr. Harry | male | 38.0 | 1 | 0 | 28664 | 21.0000 | NaN | S | 1 |
402 | 1294 | 1 | Gibson, Miss. Dorothy Winifred | female | 22.0 | 0 | 1 | 112378 | 59.4000 | NaN | C | 1 |
403 | 1295 | 1 | Carrau, Mr. Jose Pedro | male | 17.0 | 0 | 0 | 113059 | 47.1000 | NaN | S | 1 |
404 | 1296 | 1 | Frauenthal, Mr. Isaac Gerald | male | 43.0 | 1 | 0 | 17765 | 27.7208 | D40 | C | 1 |
405 | 1297 | 2 | Nourney, Mr. Alfred (Baron von Drachstedt")" | male | 20.0 | 0 | 0 | SC/PARIS 2166 | 13.8625 | D38 | C | 1 |
406 | 1298 | 2 | Ware, Mr. William Jeffery | male | 23.0 | 1 | 0 | 28666 | 10.5000 | NaN | S | 1 |
407 | 1299 | 1 | Widener, Mr. George Dunton | male | 50.0 | 1 | 1 | 113503 | 211.5000 | C80 | C | 1 |
408 | 1300 | 3 | Riordan, Miss. Johanna Hannah"" | female | NaN | 0 | 0 | 334915 | 7.7208 | NaN | Q | 1 |
409 | 1301 | 3 | Peacock, Miss. Treasteall | female | 3.0 | 1 | 1 | SOTON/O.Q. 3101315 | 13.7750 | NaN | S | 1 |
410 | 1302 | 3 | Naughton, Miss. Hannah | female | NaN | 0 | 0 | 365237 | 7.7500 | NaN | Q | 1 |
411 | 1303 | 1 | Minahan, Mrs. William Edward (Lillian E Thorpe) | female | 37.0 | 1 | 0 | 19928 | 90.0000 | C78 | Q | 1 |
412 | 1304 | 3 | Henriksson, Miss. Jenny Lovisa | female | 28.0 | 0 | 0 | 347086 | 7.7750 | NaN | S | 1 |
413 | 1305 | 3 | Spector, Mr. Woolf | male | NaN | 0 | 0 | A.5. 3236 | 8.0500 | NaN | S | 1 |
414 | 1306 | 1 | Oliva y Ocana, Dona. Fermina | female | 39.0 | 0 | 0 | PC 17758 | 108.9000 | C105 | C | 0 |
415 | 1307 | 3 | Saether, Mr. Simon Sivertsen | male | 38.5 | 0 | 0 | SOTON/O.Q. 3101262 | 7.2500 | NaN | S | 1 |
416 | 1308 | 3 | Ware, Mr. Frederick | male | NaN | 0 | 0 | 359309 | 8.0500 | NaN | S | 1 |
417 | 1309 | 3 | Peter, Master. Michael J | male | NaN | 1 | 1 | 2668 | 22.3583 | NaN | C | 0 |
418 rows × 12 columns
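The `Title` flag only records whether one of the five listed titles appears in the name; titles such as `Master.` or `Dona.` in the rows above are not on that list. A possible extension, which is not part of the original notebook, is to pull out the title text itself. The sketch assumes names follow the pattern "Surname, Title. Given names"; `TITLE_PATTERN` and `extract_title` are hypothetical names.

```python
import re

# Assumes names look like "Surname, Title. Given names".
# Captures the text between the first comma and the next period.
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")

def extract_title(name):
    """Return the title found after the comma, or None if the pattern is absent."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None

names = [
    "Kelly, Mr. James",
    "Olsen, Master. Artur Karl",
    "Oliva y Ocana, Dona. Fermina",
    "Hewlett, Mrs. (Mary D Kingcome)",
]
titles = [extract_title(n) for n in names]
print(titles)
```

Because `extract_title` takes a single string, it can be applied to a column in the same way as `title_fn`, for example with `train['Name'].map(extract_title)`.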
# Writing to file
# test has no 'Survived' column yet, so reindex adds it as an empty (NaN) placeholder.
submission = test.reindex(columns=['PassengerId', 'Survived'])
# Save the result as a CSV file, without the row index.
submission.to_csv('submission.csv', index=False)
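The test frame shown earlier has no `Survived` column, because this notebook stops before any model is fit. Once predictions exist, a submission frame could be assembled along the following lines. This is a minimal sketch: `test_demo` is a hypothetical stand-in for `test`, and `predictions` is a made-up placeholder, not the output of a real model.

```python
import pandas as pd

# Hypothetical stand-in for the test frame (first three PassengerIds).
test_demo = pd.DataFrame({"PassengerId": [892, 893, 894]})

# Made-up placeholder predictions; a real model's output would go here.
predictions = [0, 1, 0]

# Pair each PassengerId with its predicted label.
submission = pd.DataFrame({
    "PassengerId": test_demo["PassengerId"],
    "Survived": predictions,
})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a filename such as `'submission.csv'` to `to_csv` writes the same text to disk.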