Essential Steps to Kickstart Your Data Science Career
Chapter 1: Getting Started
Congratulations on your new role as a Data Scientist! You’re excited to dive into fascinating Data Science projects, but first, you must tackle the necessary setup for your machine.
In this guide, we will cover three vital steps to prepare your setup:
- Installing your development environment with Xcode, Homebrew, and Python.
- Creating virtual environments.
- Configuring your Data Science framework with essential libraries.
This article is tailored for Python enthusiasts and macOS users, so let’s jump in!
Section 1.1: Installing Your Development Environment
Xcode - This is a free Integrated Development Environment (IDE) for macOS, which includes the Command Line Tools (CLT). Homebrew depends on the Command Line Tools, which installing Xcode provides.
To install Xcode, refer to this guide by Rohan Paul:
How to Download and Setup Xcode 11 for iOS Development
Homebrew - A package manager designed for macOS, Homebrew simplifies the installation of various technologies.
To set up Homebrew, open your terminal and run the install command published on the Homebrew homepage (brew.sh), which downloads and executes the installer script.
For those who prefer a cautious approach, Matthew (Brender) Broberg recommends using curl to save the installer to a local file (homebrew_installer.sh in the commands below) and reviewing it before execution:
$ more homebrew_installer.sh
$ bash homebrew_installer.sh
pyenv - Older versions of macOS ship with Python 2.7 pre-installed (Apple removed it in macOS 12.3), but many machine learning packages need a newer version (e.g., Python 3.x). Installing pyenv is a good way to manage your Python versions.
To install pyenv, run:
brew install pyenv
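Then add pyenv's initialization to your shell profile. For bash, that is ~/.bash_profile (zsh, the default shell since macOS Catalina, uses ~/.zshrc instead):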
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
You can then install your desired version of Python with:
pyenv install 3.x.x
Note: pip is bundled with Python 3.4 and later, so pip3 is available once Python 3 is installed.
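Before moving on, it can help to confirm which interpreter your shell actually picks up. The snippet below is a minimal sketch, not part of the pyenv tooling: the MIN_VERSION threshold and the interpreter_report helper are names I made up for illustration, so adjust them to whatever your packages require.

```python
import importlib.util
import sys

# Hypothetical minimum version for this sketch; change it to suit your packages.
MIN_VERSION = (3, 8)


def interpreter_report(min_version=MIN_VERSION):
    """Summarize the running interpreter and whether pip can be located."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "meets_minimum": sys.version_info[:2] >= tuple(min_version),
        "pip_available": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

If the reported executable is not the version you installed with pyenv, the shell profile change above has probably not been picked up yet; opening a new terminal window is usually the first thing to try.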
Section 1.2: Creating a Virtual Environment
First, install the virtualenv package:
pip3 install virtualenv
To create a virtual environment, use the following command:
virtualenv -p python3 <your_path_here>
Activate the environment by running:
source <your_path_here>/bin/activate
Once activated, the name of your environment will appear at the start of your terminal prompt, indicating you are working within it.
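The prompt prefix is a visual cue only, so if you want a script or notebook to check for itself, something like the following sketch could work. It relies on the fact that a virtual environment's interpreter reports a sys.prefix that differs from sys.base_prefix (older virtualenv releases set sys.real_prefix instead); the function name is my own.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and the
    # standard-library venv module) make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("environment root:", sys.prefix)
```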
To share your virtual environment with a teammate, create a requirements file listing all packages and their versions:
pip freeze > requirements.txt
Your teammate can replicate your environment with:
pip install -r requirements.txt
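To double-check that a replicated environment really matches the shared file, you could compare the pinned versions against what is installed. This is a rough sketch under a few assumptions: it needs Python 3.8 or later for importlib.metadata, it only understands plain name==version lines such as the ones pip freeze writes, and parse_pins and find_mismatches are illustrative helper names rather than anything pip provides.

```python
from importlib import metadata


def parse_pins(lines):
    """Collect name -> version from 'name==version' lines, skipping anything else."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0]   # drop trailing comments
        line = line.split(";", 1)[0]  # drop environment markers
        line = line.strip()
        if "==" not in line or line.startswith("-"):
            continue  # options and unpinned entries are ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return name -> (wanted, installed) for every pin the environment misses."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

A typical use would be to read requirements.txt, pass its lines to parse_pins, and print whatever find_mismatches returns; an empty result means every pinned package is present at the expected version.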
Chapter 2: Setting Up Your Framework
To let your virtual environment also see the packages already installed in your system-wide Python, create it with the --system-site-packages flag:
virtualenv <name_of_virtual_environment> --system-site-packages
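If you later forget whether an environment was created with that flag, its configuration file can tell you. The sketch below assumes a recent virtualenv (version 20 or later) or the standard-library venv module, both of which write a pyvenv.cfg file with an include-system-site-packages entry at the root of the environment; the helper names are my own.

```python
from pathlib import Path


def parse_pyvenv_cfg(text):
    """Turn the 'key = value' lines of a pyvenv.cfg file into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' sign
            settings[key.strip().lower()] = value.strip()
    return settings


def system_packages_enabled(settings):
    """Report whether the parsed settings enable system site-packages."""
    return settings.get("include-system-site-packages", "false").lower() == "true"


def inherits_system_packages(env_dir):
    """Read <env_dir>/pyvenv.cfg and report whether system packages are visible."""
    cfg_text = (Path(env_dir) / "pyvenv.cfg").read_text()
    return system_packages_enabled(parse_pyvenv_cfg(cfg_text))
```

Calling inherits_system_packages with the path you passed to virtualenv should return True for environments created with the flag and False otherwise.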
You can install packages in your virtual environment in bulk from your requirements.txt file, or one at a time by running:
pip install <package>
Make sure your requirements file follows the requirements file format described in the pip documentation.
Here are some of my favorite libraries for Data Science endeavors:
- TensorFlow
- NumPy
- SciPy
- Pandas
- Matplotlib
- Keras
- scikit-learn
- PyTorch
- Scrapy
- BeautifulSoup
- Pandas Profiling (since renamed ydata-profiling)
- Seaborn
- Plotly
- Bokeh
- Statsmodels
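Once you have installed the libraries you want, a quick way to see which of them your environment can actually find is sketched below. Note that the import name sometimes differs from the project name (scikit-learn is imported as sklearn, BeautifulSoup as bs4). The IMPORT_NAMES mapping is an illustrative subset of the list above, and the availability helper is a name I chose for this example.

```python
import importlib.util

# Project name -> top-level import name. An illustrative subset of the
# libraries listed above, not an exhaustive mapping.
IMPORT_NAMES = {
    "TensorFlow": "tensorflow",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "PyTorch": "torch",
    "BeautifulSoup": "bs4",
    "Seaborn": "seaborn",
    "Statsmodels": "statsmodels",
}


def availability(import_names=IMPORT_NAMES):
    """Map each library to True/False depending on whether it can be located."""
    # find_spec only locates a top-level module; it does not import it.
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in import_names.items()
    }


if __name__ == "__main__":
    for label, found in availability().items():
        print(f"{label:15s} {'installed' if found else 'missing'}")
```

Because find_spec does not import the modules, the check stays fast even for heavy libraries such as TensorFlow.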
At this point, you have the essential components to embark on your Data Science projects. I recommend concluding your setup with:
- PyCharm
- Jupyter Notebook or JupyterLab
- GitHub or GitLab
- Power BI, Metabase, or Tableau
If you prefer terminal usage, consider enhancing iTerm2 by following this comprehensive guide, or personalize your terminal theme in just seven minutes.
There are numerous alternatives to these tools, so feel free to explore what best suits your working style and budget. I welcome any suggestions you may have!
Thank you for reading! Follow me for more insights into Data Science on LinkedIn.