# These pandas tips will save you hours of head-scratching

To Step Up Your Pandas Game, read:
- [5 lesser-known pandas tricks](https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431)
- [Exploratory Data Analysis with pandas](https://towardsdatascience.com/exploratory-data-analysis-with-pandas-508a5e8a5964)
- [How NOT to write pandas code](https://towardsdatascience.com/how-not-to-write-pandas-code-ef88599c6e8f)
- [These pandas tips will save you hours of headscratching](https://)

In [1]:
import os
import platform
from platform import python_version

import jupyterlab
import numpy as np
import pandas as pd

print("System")
print("os name: %s" % os.name)
print("system: %s" % platform.system())
print("release: %s" % platform.release())
print()
print("Python")
print("version: %s" % python_version())
print()
print("Python Packages")
print("jupyterlab==%s" % jupyterlab.__version__)
print("pandas==%s" % pd.__version__)
print("numpy==%s" % np.__version__)

System
os name: posix
system: Darwin
release: 18.7.0

Python
version: 3.7.3

Python Packages
jupyterlab==1.1.5
pandas==0.25.3
numpy==1.17.4


In [2]:
np.random.seed(42)

In [3]:
np.random.random_sample(10)

array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864,
       0.15599452, 0.05808361, 0.86617615, 0.60111501, 0.70807258])

In [4]:
np.random.random_sample(10)

array([0.02058449, 0.96990985, 0.83244264, 0.21233911, 0.18182497,
       0.18340451, 0.30424224, 0.52475643, 0.43194502, 0.29122914])

What happens when we set the same seed again?
Resetting the seed restarts the generator, so we get the same sequence of numbers as above.
This is what makes a pseudorandom number generator deterministic.

In [5]:
np.random.seed(42)

In [6]:
np.random.random_sample(10)

array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864,
       0.15599452, 0.05808361, 0.86617615, 0.60111501, 0.70807258])
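
The same determinism can be sketched without touching NumPy's global state. The example below assumes NumPy 1.17 or newer, where `np.random.default_rng` is available; it is an alternative to `np.random.seed`, not what the cells above use, and it draws different numbers than the legacy generator even with the same seed.

```python
import numpy as np

# Two independent generators created with the same seed (42, as above).
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random(5)
second = rng_b.random(5)

# Same seed, same sequence.
same_sequence = np.array_equal(first, second)

# Drawing again from rng_a advances its state, so the next values differ.
third = rng_a.random(5)
advanced = not np.array_equal(first, third)

print(same_sequence, advanced)
```

Keeping the generator in a local variable means other code that calls `np.random` does not change the sequence you get.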

In [7]:
df = pd.DataFrame(index=[1, 2, 3, 4, 4], data={"col1": ["a", "b", "c", "d", "d"]})
df

| | col1 |
|---|------|
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
| 4 | d |


In [8]:
assert len(df[df.index.duplicated()]) == 0, "Dataframe has duplicates"

AssertionError: Dataframe has duplicates
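
The assertion fails because the index label 4 appears twice. As a minimal sketch, the same check can be written with `Index.is_unique`, and the repeated labels can be listed or dropped with `Index.duplicated`. Keeping the first row per label is an assumption made for illustration; whether that is the right choice depends on the data.

```python
import pandas as pd

df = pd.DataFrame(index=[1, 2, 3, 4, 4], data={"col1": ["a", "b", "c", "d", "d"]})

# Equivalent check to the assertion above, without building a filtered frame.
index_is_unique = df.index.is_unique

# Which labels occur more than once?
duplicated_labels = df.index[df.index.duplicated()].unique().tolist()

# Assumption for illustration: keep the first row for each index label.
deduplicated = df[~df.index.duplicated(keep="first")]

print(index_is_unique, duplicated_labels, deduplicated.shape)
```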

In [9]:
df.shape

(5, 1)

In [10]:
df_new = df.join(df, lsuffix='_l', rsuffix='_r')
df_new.shape

(7, 2)

In [11]:
df_new

| | col1_l | col1_r |
|---|--------|--------|
| 1 | a | a |
| 2 | b | b |
| 3 | c | c |
| 4 | d | d |
| 4 | d | d |
| 4 | d | d |
| 4 | d | d |
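
The join returns 7 rows instead of 5 because label 4 occurs twice on each side, so it matches 2 × 2 = 4 times, plus one row each for labels 1, 2 and 3. One way to catch this early, sketched here with `pd.merge` and its `validate` argument rather than `DataFrame.join`, is to ask pandas to reject non-unique keys. Dropping the duplicate before joining is shown only as one possible fix.

```python
import pandas as pd

df = pd.DataFrame(index=[1, 2, 3, 4, 4], data={"col1": ["a", "b", "c", "d", "d"]})

# validate="one_to_one" makes pandas raise instead of silently multiplying rows.
try:
    pd.merge(
        df,
        df,
        left_index=True,
        right_index=True,
        suffixes=("_l", "_r"),
        validate="one_to_one",
    )
    merge_failed = False
except pd.errors.MergeError as error:
    merge_failed = True
    print("Merge rejected:", error)

# With a unique index, the join keeps one row per label.
unique_df = df[~df.index.duplicated(keep="first")]
joined = unique_df.join(unique_df, lsuffix="_l", rsuffix="_r")
print(joined.shape)
```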


In [12]:
df

| | col1 |
|---|------|
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
| 4 | d |


In [14]:
df = pd.read_clipboard(sep=r'\s\s+')
df

| | col1 |
|---|------|
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
| 4 | d |
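
`pd.read_clipboard` hands the clipboard text to `pd.read_csv`, so the effect of the `\s\s+` separator can be sketched without a clipboard by parsing a string. The sample text and its column names below are hypothetical. Splitting on two or more whitespace characters keeps values that contain a single space in one piece.

```python
import io

import pandas as pd

# Hypothetical sample text, as it might look when copied from a printed table.
text = "\n".join(
    [
        "first name  city",
        "Ada Lovelace  London",
        "Alan Turing  Wilmslow",
    ]
)

# A regex separator needs the Python parsing engine.
parsed = pd.read_csv(io.StringIO(text), sep=r"\s\s+", engine="python")
print(parsed)
```

With a single-whitespace separator, "Ada Lovelace" would be split into two fields and the rows would no longer line up with the header.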
