Overview

Dataset statistics

Number of variables: 7
Number of observations: 12000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 65
Duplicate rows (%): 0.5%
Total size in memory: 656.4 KiB
Average record size in memory: 56.0 B
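The percentage and per-record figures follow from the raw counts by simple arithmetic. The short check below is only a sketch: it copies the numbers from the dataset statistics above and assumes that 1 KiB is 1024 bytes; the variable names are illustrative.

```python
# Figures copied from the dataset statistics above.
n_rows = 12_000
n_duplicates = 65
total_kib = 656.4

# Share of rows that are duplicates, as a percentage.
duplicate_pct = 100 * n_duplicates / n_rows

# Average memory per record in bytes (assuming 1 KiB = 1024 bytes).
avg_record_bytes = total_kib * 1024 / n_rows

print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both values round to the reported 0.5% and 56.0 B.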

Variable types

Numeric: 6
Categorical: 1

Alerts

Dataset has 65 (0.5%) duplicate rows [Duplicates]
fl is highly correlated with e [High correlation]
ia is highly correlated with s and 1 other field [High correlation]
s is highly correlated with ia and 1 other field [High correlation]
t is highly correlated with ia and 1 other field [High correlation]
e is highly correlated with fl [High correlation]
fl is highly correlated with e [High correlation]
ia is highly correlated with s [High correlation]
s is highly correlated with ia and 1 other field [High correlation]
t is highly correlated with s [High correlation]
e is highly correlated with fl [High correlation]
fl is highly correlated with e [High correlation]
e is highly correlated with fl [High correlation]
fl is highly correlated with e [High correlation]
ia is highly correlated with s and 2 other fields [High correlation]
s is highly correlated with ia and 1 other field [High correlation]
t is highly correlated with ia and 2 other fields [High correlation]
e is highly correlated with fl [High correlation]
et is highly correlated with ia and 1 other field [High correlation]

Reproduction

Analysis started: 2022-06-14 18:59:43.124639
Analysis finished: 2022-06-14 18:59:49.642948
Duration: 6.52 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

fl
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 11684
Distinct (%): 97.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4168125574
Minimum: 0.029390577
Maximum: 0.9492131
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 93.9 KiB

Quantile statistics

Minimum: 0.029390577
5-th percentile: 0.1360574175
Q1: 0.2850291325
median: 0.40379098
Q3: 0.5011375575
95-th percentile: 0.786797873
Maximum: 0.9492131
Range: 0.919822523
Interquartile range (IQR): 0.216108425

Descriptive statistics

Standard deviation: 0.1852367088
Coefficient of variation (CV): 0.4444124954
Kurtosis: -0.09893912846
Mean: 0.4168125574
Median Absolute Deviation (MAD): 0.10433418
Skewness: 0.4737160247
Sum: 5001.750689
Variance: 0.03431263827
Monotonicity: Not monotonic
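Several of these statistics are derived from others, which allows a quick consistency check. The sketch below copies the fl figures from the report and uses the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum); the variable names are illustrative.

```python
# Figures for the variable fl, copied from the statistics above.
mean = 0.4168125574
std = 0.1852367088
q1, q3 = 0.2850291325, 0.5011375575
minimum, maximum = 0.029390577, 0.9492131

cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum

print(f"CV       = {cv:.10f}")
print(f"Variance = {variance:.11f}")
print(f"IQR      = {iqr:.9f}")
print(f"Range    = {value_range:.9f}")
```

The results agree with the reported CV, variance, IQR and range up to rounding of the inputs. The same relations hold for the other numeric variables.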
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0.8437023             88     0.7%
0.90289146            40     0.3%
0.94800824            13     0.1%
0.4149162             11     0.1%
0.4143504             11     0.1%
0.4311919             10     0.1%
0.44937053            10     0.1%
0.47760788            10     0.1%
0.40379098            8      0.1%
0.41839996            8      0.1%
Other values (11674)  11791  98.3%

Minimum 10 values
Value         Count  Frequency (%)
0.029390577   1      < 0.1%
0.029955925   1      < 0.1%
0.033604544   1      < 0.1%
0.033606403   1      < 0.1%
0.034301028   1      < 0.1%
0.036454223   1      < 0.1%
0.038488213   1      < 0.1%
0.03961533    1      < 0.1%
0.040670637   1      < 0.1%
0.041285664   1      < 0.1%

Maximum 10 values
Value         Count  Frequency (%)
0.9492131     2      < 0.1%
0.94800824    13     0.1%
0.9440342     1      < 0.1%
0.94353414    1      < 0.1%
0.9394918     1      < 0.1%
0.92673665    1      < 0.1%
0.9119069     1      < 0.1%
0.90620583    1      < 0.1%
0.9056027     1      < 0.1%
0.90545267    1      < 0.1%

ia
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10409
Distinct (%): 86.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4382163503
Minimum: 0.037709847
Maximum: 0.99387753
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 93.9 KiB

Quantile statistics

Minimum: 0.037709847
5-th percentile: 0.11280035
Q1: 0.23262316
median: 0.35341949
Q3: 0.6069273
95-th percentile: 0.9490587
Maximum: 0.99387753
Range: 0.956167683
Interquartile range (IQR): 0.37430414

Descriptive statistics

Standard deviation: 0.2662644614
Coefficient of variation (CV): 0.6076096003
Kurtosis: -0.7448420262
Mean: 0.4382163503
Median Absolute Deviation (MAD): 0.17451961
Skewness: 0.6597911337
Sum: 5258.596203
Variance: 0.07089676341
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0.6069273             165    1.4%
0.9490587             158    1.3%
0.9149589             136    1.1%
0.6342086             124    1.0%
0.9340332             85     0.7%
0.9586736             80     0.7%
0.81136084            77     0.6%
0.6894879             73     0.6%
0.90638536            73     0.6%
0.8772666             60     0.5%
Other values (10399)  10969  91.4%

Minimum 10 values
Value         Count  Frequency (%)
0.037709847   1      < 0.1%
0.044970874   1      < 0.1%
0.04508064    1      < 0.1%
0.045482233   1      < 0.1%
0.049547765   1      < 0.1%
0.05189743    1      < 0.1%
0.052189957   1      < 0.1%
0.05362984    1      < 0.1%
0.054279275   1      < 0.1%
0.055933435   1      < 0.1%

Maximum 10 values
Value         Count  Frequency (%)
0.99387753    3      < 0.1%
0.9930588     1      < 0.1%
0.99256897    1      < 0.1%
0.9921237     1      < 0.1%
0.99065804    1      < 0.1%
0.99036723    1      < 0.1%
0.99019986    1      < 0.1%
0.9901501     1      < 0.1%
0.9900025     1      < 0.1%
0.98979497    1      < 0.1%

s
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7420
Distinct (%): 61.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.801580311
Minimum: 0.024430014
Maximum: 0.9943364
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 93.9 KiB

Quantile statistics

Minimum: 0.024430014
5-th percentile: 0.4998674795
Q1: 0.699709325
median: 0.8435214
Q3: 0.93682694
95-th percentile: 0.97533107
Maximum: 0.9943364
Range: 0.969906386
Interquartile range (IQR): 0.237117615

Descriptive statistics

Standard deviation: 0.1620618911
Coefficient of variation (CV): 0.2021779838
Kurtosis: 0.3933632595
Mean: 0.801580311
Median Absolute Deviation (MAD): 0.1099997
Skewness: -0.978641045
Sum: 9618.963732
Variance: 0.02626405655
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0.9605577             349    2.9%
0.9535211             341    2.8%
0.9658004             317    2.6%
0.9076751             311    2.6%
0.93682694            283    2.4%
0.9098734             278    2.3%
0.97533107            278    2.3%
0.9811452             263    2.2%
0.72723687            191    1.6%
0.59276253            187    1.6%
Other values (7410)   9202   76.7%

Minimum 10 values
Value         Count  Frequency (%)
0.024430014   1      < 0.1%
0.06102065    1      < 0.1%
0.071603596   1      < 0.1%
0.07817141    1      < 0.1%
0.0788923     1      < 0.1%
0.08381921    1      < 0.1%
0.1061011     1      < 0.1%
0.11373025    1      < 0.1%
0.11440568    1      < 0.1%
0.11515171    1      < 0.1%

Maximum 10 values
Value         Count  Frequency (%)
0.9943364     1      < 0.1%
0.99291486    1      < 0.1%
0.9856039     1      < 0.1%
0.9854358     69     0.6%
0.9854041     1      < 0.1%
0.98533255    1      < 0.1%
0.98518544    1      < 0.1%
0.98511934    1      < 0.1%
0.98511195    1      < 0.1%
0.9850903     1      < 0.1%

t
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7967
Distinct (%): 66.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8218832335
Minimum: 0.024729094
Maximum: 0.984462
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 93.9 KiB

Quantile statistics

Minimum: 0.024729094
5-th percentile: 0.70857113
Q1: 0.7473183
median: 0.8214078
Q3: 0.89414346
95-th percentile: 0.95164716
Maximum: 0.984462
Range: 0.959732906
Interquartile range (IQR): 0.14682516

Descriptive statistics

Standard deviation: 0.09560208289
Coefficient of variation (CV): 0.1163207607
Kurtosis: 7.324972053
Mean: 0.8218832335
Median Absolute Deviation (MAD): 0.0740895
Skewness: -1.288609868
Sum: 9862.598802
Variance: 0.009139758252
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0.7473183             782    6.5%
0.8214078             521    4.3%
0.95164716            455    3.8%
0.9450092             453    3.8%
0.8546382             410    3.4%
0.86612374            292    2.4%
0.9456558             280    2.3%
0.88678163            256    2.1%
0.91536796            232    1.9%
0.89414346            180    1.5%
Other values (7957)   8139   67.8%

Minimum 10 values
Value         Count  Frequency (%)
0.024729094   1      < 0.1%
0.037766483   1      < 0.1%
0.052537628   1      < 0.1%
0.06294791    1      < 0.1%
0.079592496   1      < 0.1%
0.08782799    1      < 0.1%
0.106605686   1      < 0.1%
0.12488287    1      < 0.1%
0.12789373    1      < 0.1%
0.13427275    1      < 0.1%

Maximum 10 values
Value         Count  Frequency (%)
0.984462      4      < 0.1%
0.9810698     1      < 0.1%
0.9798555     1      < 0.1%
0.97813374    1      < 0.1%
0.9753948     1      < 0.1%
0.9749705     1      < 0.1%
0.9745511     1      < 0.1%
0.9736628     1      < 0.1%
0.97301006    1      < 0.1%
0.97179496    1      < 0.1%

e
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8215
Distinct (%): 68.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5565736624
Minimum: 0.017585102
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 93.9 KiB

Quantile statistics

Minimum: 0.017585102
5-th percentile: 0.1259514565
Q1: 0.3056789975
median: 0.5481356
Q3: 0.820784365
95-th percentile: 0.97691213
Maximum: 1
Range: 0.982414898
Interquartile range (IQR): 0.5151053675

Descriptive statistics

Standard deviation: 0.2865405723
Coefficient of variation (CV): 0.5148295574
Kurtosis: -1.294983056
Mean: 0.5565736624
Median Absolute Deviation (MAD): 0.2510637
Skewness: -0.03276281222
Sum: 6678.883949
Variance: 0.08210549957
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0.6522458             815    6.8%
0.51694775            336    2.8%
0.9770172             295    2.5%
0.7991993             247    2.1%
0.7722222             219    1.8%
0.87872106            200    1.7%
0.95597285            180    1.5%
0.8688342             165    1.4%
0.90230405            152    1.3%
0.69036824            133    1.1%
Other values (8205)   9258   77.1%

Minimum 10 values
Value         Count  Frequency (%)
0.017585102   1      < 0.1%
0.018334996   1      < 0.1%
0.018437412   1      < 0.1%
0.018739773   1      < 0.1%
0.018762376   1      < 0.1%
0.02262627    1      < 0.1%
0.025222166   1      < 0.1%
0.026389748   1      < 0.1%
0.027400458   1      < 0.1%
0.029437775   1      < 0.1%

Maximum 10 values
Value         Count  Frequency (%)
1             2      < 0.1%
0.9996084     1      < 0.1%
0.99830574    1      < 0.1%
0.99696827    1      < 0.1%
0.9960866     1      < 0.1%
0.9951251     18     0.1%
0.9950257     1      < 0.1%
0.99496084    1      < 0.1%
0.99490833    1      < 0.1%
0.99471945    1      < 0.1%

ea
Real number (ℝ≥0)

Distinct: 10554
Distinct (%): 87.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4052619229
Minimum: 0.026123213
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 93.9 KiB

Quantile statistics

Minimum: 0.026123213
5-th percentile: 0.141569518
Q1: 0.2243731675
median: 0.30714799
Q3: 0.4974262425
95-th percentile: 0.975200513
Maximum: 1
Range: 0.973876787
Interquartile range (IQR): 0.273053075

Descriptive statistics

Standard deviation: 0.2566860404
Coefficient of variation (CV): 0.6333830689
Kurtosis: 0.0605287453
Mean: 0.4052619229
Median Absolute Deviation (MAD): 0.0987772
Skewness: 1.146051382
Sum: 4863.143075
Variance: 0.06588772336
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0.9083624             159    1.3%
0.9456398             146    1.2%
0.9800452             143    1.2%
0.9262451             125    1.0%
0.80709213            110    0.9%
0.8496349             104    0.9%
0.7658759             97     0.8%
0.9860803             68     0.6%
0.9621058             62     0.5%
0.9805333             60     0.5%
Other values (10544)  10926  91.0%

Minimum 10 values
Value         Count  Frequency (%)
0.026123213   1      < 0.1%
0.04204471    1      < 0.1%
0.0421607     1      < 0.1%
0.04882782    1      < 0.1%
0.04949934    1      < 0.1%
0.050134588   1      < 0.1%
0.050295223   1      < 0.1%
0.05053083    1      < 0.1%
0.050905466   1      < 0.1%
0.052496407   1      < 0.1%

Maximum 10 values
Value         Count  Frequency (%)
1             3      < 0.1%
0.99996513    1      < 0.1%
0.99962157    1      < 0.1%
0.9984293     1      < 0.1%
0.9980843     34     0.3%
0.99800116    1      < 0.1%
0.99767536    1      < 0.1%
0.9959575     1      < 0.1%
0.99550843    1      < 0.1%
0.99498713    1      < 0.1%

et
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 93.9 KiB

Offensive: 5966
Profanity: 3430
Very offensive: 1138
Neutral: 814
Extremely offensive: 426
Other values (2): 226

Length

Max length: 19
Median length: 9
Mean length: 9.6845
Min length: 7

Characters and Unicode

Total characters: 116214
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Offensive
2nd row: Offensive
3rd row: Very offensive
4th row: Neutral
5th row: Profanity

Common Values

Value                Count  Frequency (%)
Offensive            5966   49.7%
Profanity            3430   28.6%
Very offensive       1138   9.5%
Neutral              814    6.8%
Extremely offensive  426    3.5%
Unknown              140    1.2%
Hate speech          86     0.7%
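The length and character statistics reported for et can be recomputed from the category counts alone. The sketch below copies the counts from the Common Values table and uses only the standard library; the variable names are illustrative, not part of the report.

```python
from statistics import mean, median

# Category counts copied from the Common Values table for et.
counts = {
    "Offensive": 5966,
    "Profanity": 3430,
    "Very offensive": 1138,
    "Neutral": 814,
    "Extremely offensive": 426,
    "Unknown": 140,
    "Hate speech": 86,
}

# One length per observation, so every statistic is weighted by frequency.
lengths = [len(label) for label, n in counts.items() for _ in range(n)]

total_characters = sum(lengths)
distinct_characters = len(set("".join(counts)))
n_spaces = sum(label.count(" ") * n for label, n in counts.items())

print("observations:", len(lengths))
print("min / median / max length:", min(lengths), median(lengths), max(lengths))
print("mean length:", round(mean(lengths), 4))
print("total characters:", total_characters)
print("distinct characters:", distinct_characters)
print("space characters:", n_spaces)
```

The results match the report: 12000 observations, lengths between 7 and 19 with a median of 9 and a mean of 9.6845, 116214 characters in total, 28 distinct characters, and 1650 spaces.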

Length

Histogram of lengths of the category

Category Frequency Plot

Words

Value       Count  Frequency (%)
offensive   7530   55.2%
profanity   3430   25.1%
very        1138   8.3%
neutral     814    6.0%
extremely   426    3.1%
unknown     140    1.0%
hate        86     0.6%
speech      86     0.6%
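The word frequencies are consistent with splitting each category label on whitespace, lowercasing the pieces, and weighting each word by the number of rows in its category. That tokenisation is an assumption inferred from the numbers, not something the report states; the sketch below reproduces the table under it.

```python
from collections import Counter

# Category counts copied from the Common Values table for et.
counts = {
    "Offensive": 5966,
    "Profanity": 3430,
    "Very offensive": 1138,
    "Neutral": 814,
    "Extremely offensive": 426,
    "Unknown": 140,
    "Hate speech": 86,
}

# Assumed tokenisation: lowercase, split on whitespace.
word_counts = Counter()
for label, n in counts.items():
    for word in label.lower().split():
        word_counts[word] += n

total_words = sum(word_counts.values())
for word, n in word_counts.most_common():
    print(f"{word:<10} {n:>5} {n / total_words:6.1%}")
```

For example, "offensive" occurs 5966 + 1138 + 426 = 7530 times out of 13650 words, which is the reported 55.2%.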

Most occurring characters

Value              Count   Frequency (%)
f                  18490   15.9%
e                  18122   15.6%
n                  11380   9.8%
i                  10960   9.4%
s                  7616    6.6%
v                  7530    6.5%
O                  5966    5.1%
r                  5808    5.0%
o                  5134    4.4%
y                  4994    4.3%
Other values (18)  20214   17.4%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   102564  88.3%
Uppercase Letter   12000   10.3%
Space Separator    1650    1.4%

Most frequent character per category

Lowercase Letter
Value              Count   Frequency (%)
f                  18490   18.0%
e                  18122   17.7%
n                  11380   11.1%
i                  10960   10.7%
s                  7616    7.4%
v                  7530    7.3%
r                  5808    5.7%
o                  5134    5.0%
y                  4994    4.9%
t                  4756    4.6%
Other values (10)  7774    7.6%

Uppercase Letter
Value              Count   Frequency (%)
O                  5966    49.7%
P                  3430    28.6%
V                  1138    9.5%
N                  814     6.8%
E                  426     3.5%
U                  140     1.2%
H                  86      0.7%

Space Separator
Value              Count   Frequency (%)
(space)            1650    100.0%

Most occurring scripts

Value              Count   Frequency (%)
Latin              114564  98.6%
Common             1650    1.4%

Most frequent character per script

Latin
Value              Count   Frequency (%)
f                  18490   16.1%
e                  18122   15.8%
n                  11380   9.9%
i                  10960   9.6%
s                  7616    6.6%
v                  7530    6.6%
O                  5966    5.2%
r                  5808    5.1%
o                  5134    4.5%
y                  4994    4.4%
Other values (17)  18564   16.2%

Common
Value              Count   Frequency (%)
(space)            1650    100.0%

Most occurring blocks

Value              Count   Frequency (%)
ASCII              116214  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
f                  18490   15.9%
e                  18122   15.6%
n                  11380   9.8%
i                  10960   9.4%
s                  7616    6.6%
v                  7530    6.5%
O                  5966    5.1%
r                  5808    5.0%
o                  5134    4.4%
y                  4994    4.3%
Other values (18)  20214   17.4%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
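The calculation just described can be sketched in a few lines of plain Python. This is an illustration only, not the code the profiling tool runs: the function names are hypothetical, and ties are given the average of the ranks they span, which is a common convention assumed here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but nonlinear in xs

print("Spearman:", spearman_rho(xs, ys))
print("Pearson: ", pearson_r(xs, ys))
```

Because ys increases whenever xs does, ρ reaches 1 here, while Pearson's r stays below 1 because the relationship is not linear.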

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
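As a minimal sketch of that formula (the function name and sample data are illustrative, not taken from the report), the 1/n factors of the covariance and of the two standard deviations cancel, so plain sums suffice:

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(xs, ys)

# Shifting and rescaling ys (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r(xs, [3.0 * y + 5.0 for y in ys])

print(r, r_rescaled)
```

The invariance holds for positive scale factors; multiplying one variable by a negative factor flips the sign of r.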

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
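The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; the function name is illustrative, and library implementations such as scipy.stats.kendalltau default to the tau-b variant, which adjusts the denominator for ties, so results can differ on data with many repeated values.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie and counts as neither
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the last call two of the three pairs are concordant and one is discordant, giving (2 - 1) / 3.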

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

       fl        ia        s         t         e         ea        et
0      0.593828  0.563516  0.849090  0.864632  0.777347  0.602494  Offensive
1      0.213193  0.407253  0.925010  0.856451  0.456983  0.592931  Offensive
2      0.474532  0.323574  0.710831  0.747318  0.933715  0.208848  Very offensive
3      0.503426  0.407557  0.796685  0.854638  0.955973  0.343336  Neutral
4      0.394807  0.170078  0.561849  0.766563  0.459300  0.223698  Profanity
5      0.277215  0.250670  0.867602  0.883861  0.355724  0.228288  Profanity
6      0.197042  0.388147  0.853236  0.779098  0.147911  0.188907  Profanity
7      0.384475  0.988942  0.975331  0.951647  0.772222  0.366735  Extremely offensive
8      0.110473  0.086410  0.452698  0.700786  0.177249  0.108329  Profanity
9      0.388394  0.243946  0.679047  0.747318  0.591966  0.246288  Profanity

Last rows

       fl        ia        s         t         e         ea        et
11990  0.489507  0.964097  0.960558  0.945009  0.772222  0.472298  Offensive
11991  0.241716  0.521499  0.976476  0.945009  0.646652  0.380112  Offensive
11992  0.372119  0.606927  0.981145  0.942245  0.652246  0.363261  Offensive
11993  0.465192  0.634209  0.965800  0.945040  0.772222  0.648024  Offensive
11994  0.470747  0.536505  0.936827  0.945009  0.838113  0.496711  Offensive
11995  0.543366  0.684481  0.951176  0.945009  0.878651  0.765876  Offensive
11996  0.843702  0.877267  0.933437  0.945656  0.980018  0.476850  Very offensive
11997  0.372335  0.979884  0.963483  0.945009  0.685437  0.308390  Very offensive
11998  0.747068  0.906385  0.953521  0.945589  0.987091  0.926245  Very offensive
11999  0.640017  0.750959  0.934347  0.945009  0.902304  0.692056  Offensive

Duplicate rows

Most frequently occurring

    fl        ia        s         t         e         ea        et         # duplicates
51  0.477608  0.498115  0.824878  0.821408  0.868834  0.203815  Profanity  8
25  0.414350  0.492813  0.831883  0.822128  0.788562  0.199998  Profanity  6
26  0.414916  0.428946  0.824859  0.816635  0.772222  0.153293  Offensive  6
29  0.418400  0.532076  0.839415  0.828528  0.774779  0.202531  Profanity  6
33  0.431192  0.593916  0.896435  0.866037  0.799199  0.248875  Profanity  6
38  0.441925  0.606927  0.867102  0.854638  0.833085  0.214615  Profanity  6
40  0.449371  0.571334  0.853236  0.854638  0.837973  0.215436  Profanity  6
27  0.414916  0.428946  0.824859  0.816635  0.772222  0.153293  Profanity  5
22  0.403791  0.522775  0.853236  0.837239  0.767377  0.203527  Offensive  4
24  0.414350  0.492813  0.831883  0.822128  0.788562  0.199998  Offensive  4