
Essential Steps to Kickstart Your Data Science Career


Chapter 1: Getting Started

Congratulations on your new role as a Data Scientist! You’re excited to dive into fascinating Data Science projects, but first, you must tackle the necessary setup for your machine.

Setting up your Data Science environment

In this guide, we will cover three vital steps to prepare your setup:

  1. Installing your development environment with Xcode, Homebrew, and Python.
  2. Creating virtual environments.
  3. Configuring your Data Science framework with essential libraries.

This article is tailored for Python enthusiasts and macOS users, so let’s jump in!

Section 1.1: Installing Your Development Environment

Xcode - This is a free Integrated Development Environment (IDE) for macOS, which includes the Command Line Tools (CLT). Homebrew depends on these tools, so install Xcode before Homebrew.

To install Xcode, refer to this guide by Rohan Paul:

How to Download and Setup Xcode 11 for iOS Development
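
If you are unsure whether the Command Line Tools are already present, a quick check like the one below may help. It is a minimal sketch that assumes the standard xcode-select utility shipped with macOS; the CLT_STATUS variable is a name introduced here for illustration only.

```shell
# Check whether the Xcode Command Line Tools are installed.
# xcode-select -p prints the active developer directory and fails if none is set.
if command -v xcode-select >/dev/null 2>&1 && xcode-select -p >/dev/null 2>&1; then
  CLT_STATUS="installed at $(xcode-select -p)"
else
  # On macOS, this command opens the installer dialog for the tools:
  #   xcode-select --install
  CLT_STATUS="not found"
fi
echo "Command Line Tools: $CLT_STATUS"
```

On a machine without the tools, the script only reports "not found"; it does not start the installation itself.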

Homebrew - A package manager designed for macOS, Homebrew simplifies the installation of various technologies.

To set up Homebrew, open your terminal and run the one-line installer command published on the Homebrew home page (brew.sh).

For those who prefer a cautious approach, Matthew (Brender) Broberg recommends using curl to save the installer to a file (homebrew_installer.sh in the commands that follow), reviewing it, and only then executing it:

more homebrew_installer.sh

bash homebrew_installer.sh
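
The download step that precedes the review could look roughly like this. It is a sketch, not the author's exact command: the URL is assumed to be the installer address currently published on brew.sh (verify it there before use), and the function and variable names are hypothetical.

```shell
# Assumption: this is the installer URL published on brew.sh; check it before relying on it.
HOMEBREW_INSTALLER_URL="https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh"
INSTALLER_FILE="homebrew_installer.sh"

# Save the installer to a local file instead of piping it straight into bash,
# so that it can be read before it is executed.
fetch_homebrew_installer() {
  curl -fsSL "$HOMEBREW_INSTALLER_URL" -o "$INSTALLER_FILE"
}

# Usage (nothing is downloaded until you call the function yourself):
#   fetch_homebrew_installer && more "$INSTALLER_FILE"
```

With the file in place, the more and bash commands above apply unchanged.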

pyenv - Older versions of macOS come with Python 2.7 pre-installed (Apple removed it in macOS 12.3), but many machine learning packages need a newer version (e.g., Python 3.x). Installing pyenv is therefore a good way to manage your Python versions.

To install pyenv, run:

brew install pyenv

Then add the pyenv initialization line to your ~/.bash_profile (or to ~/.zshrc if you use zsh, the default shell since macOS Catalina):

echo 'eval "$(pyenv init -)"' >> ~/.bash_profile

You can then install your desired version of Python with:

pyenv install 3.x.x
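
Because 3.x.x stands for whichever release you choose, it can help to see what pyenv offers first. The sketch below assumes pyenv is on your PATH; PYENV_STATUS is an illustrative variable, and the commented lines show where your chosen version would go.

```shell
# List the most recent CPython 3 releases that pyenv can install.
if command -v pyenv >/dev/null 2>&1; then
  pyenv install --list | grep -E '^[[:space:]]*3\.[0-9]+\.[0-9]+$' | tail -n 5
  PYENV_STATUS="pyenv found"
else
  PYENV_STATUS="pyenv not found on PATH"
fi
echo "$PYENV_STATUS"

# After installing a version, these pyenv subcommands are useful
# (replace 3.x.x with the release you chose):
#   pyenv global 3.x.x   # make it the default for your user
#   pyenv versions       # show installed versions; * marks the active one
```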

Note: pip3 is included with Python 3.4 and later.

Section 1.2: Creating a Virtual Environment

First, install the virtualenv package:

pip3 install virtualenv

To create a virtual environment, use the following command:

virtualenv -p python3 <your_path_here>

Activate the environment by running:

source <your_path_here>/bin/activate

Once activated, the name of your environment will appear at the start of your terminal prompt, indicating you are working within it.
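
Besides looking at the prompt, you can confirm the active environment from the shell itself. This is a minimal sketch relying on the VIRTUAL_ENV variable that the activate script exports; ENV_STATUS is a name introduced here for illustration.

```shell
# Report whether a virtual environment is active in the current shell.
# The activate script exports VIRTUAL_ENV, and deactivate unsets it again.
if [ -n "${VIRTUAL_ENV:-}" ]; then
  ENV_STATUS="active: $VIRTUAL_ENV"
else
  ENV_STATUS="no virtual environment active"
fi
echo "$ENV_STATUS"
```

To leave the environment, run deactivate, a shell function that the activate script defines for you.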

To share your virtual environment with a teammate, create a requirements file listing all packages and their versions:

pip freeze > requirements.txt

Your teammate can replicate your environment with:

pip install -r requirements.txt

Chapter 2: Setting Up Your Framework

To give your virtual environment access to packages that are already installed system-wide, create it with the --system-site-packages flag:

virtualenv <name_of_virtual_environment> --system-site-packages

You can install packages in your virtual environment in bulk from your requirements.txt file, or individually by running:

pip install <package>

Make sure your requirements file adheres to the standard layout and syntax as outlined in the pip documentation.
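
As a rough illustration of that syntax, the sketch below writes a small example file. The file name requirements-example.txt and the version numbers are assumptions chosen for illustration, not recommendations.

```shell
# Write a small example requirements file.
# Version numbers here are illustrative assumptions only.
cat > requirements-example.txt <<'EOF'
# One requirement per line; text after a # is treated as a comment.
numpy==1.26.4        # exact version pin
pandas>=2.0,<3.0     # version range
matplotlib           # any available version
EOF

# Show the result.
cat requirements-example.txt
```

Running pip install -r requirements-example.txt inside an activated environment would then install all three packages.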

Here are some of my favorite libraries for Data Science endeavors:

  • TensorFlow
  • NumPy
  • SciPy
  • Pandas
  • Matplotlib
  • Keras
  • scikit-learn
  • PyTorch
  • Scrapy
  • BeautifulSoup
  • Pandas Profiling
  • Seaborn
  • Plotly
  • Bokeh
  • Statsmodels
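
Once some of these libraries are installed, a short loop can confirm that they import correctly. This is a sketch under the assumption that python3 on your PATH is the interpreter of your active environment; the subset of modules checked here is arbitrary.

```shell
# Import names differ from some package names:
# scikit-learn -> sklearn, BeautifulSoup (beautifulsoup4) -> bs4, PyTorch -> torch.
found=0
missing=0
for module in numpy scipy pandas matplotlib sklearn; do
  if python3 -c "import $module" >/dev/null 2>&1; then
    echo "ok:      $module"
    found=$((found + 1))
  else
    echo "missing: $module"
    missing=$((missing + 1))
  fi
done
echo "$found available, $missing missing"
```

Anything reported as missing can be added with pip install, using the package name rather than the import name.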

At this point, you have the essential components to embark on your Data Science projects. I recommend concluding your setup with:

  • PyCharm
  • Jupyter Notebook or JupyterLab
  • GitHub or GitLab
  • Power BI, Metabase, or Tableau

If you prefer working in the terminal, consider enhancing iTerm2 by following this comprehensive guide, or personalize your terminal theme in just seven minutes.

Enhancing your terminal experience

There are numerous alternatives to these tools, so feel free to explore what best suits your working style and budget. I welcome any suggestions you may have!

Thank you for reading! Follow me on LinkedIn for more insights into Data Science.

Stay tuned for my upcoming articles and join a community of thousands of writers on Medium, where I share tips on leveraging Data Science effectively.
