Table of Contents

1 Pythonic Thinking

1.1 PEP 8 style guide

1.1.1 Naming:

  • Protected instance attributes should be in _leading_underscore format.
  • Private instance attributes should be in __double_leading_underscore

format.

  • Classes and exceptions should be in CapitalizedWord format.
  • Module-level constants should be in ALL_CAPS format.
  • Instance methods in classes should use self as the name of the first parameter

(which refers to the object).

  • Class methods should use cls as the name of the first parameter (which refers to the class).

1.1.2 Expressions and Statements:

  • Always use absolute names for modules when importing them, not names relative to the current module’s own path. For example, to import the foo module from the bar package, you should do from bar import foo, not just import foo.
  • If you must do relative imports, use the explicit syntax from . import foo.
  • Imports should be in sections in the following order: standard library modules, thirdparty modules, your own modules. Each subsection should have imports in alphabetical order.

1.2 Differences between bytes, str, and unicode

  • In Python 3, there are two types that represent sequences of characters: bytes and str. Instances of bytes contain raw 8-bit values. Instances of str contain Unicode characters.
  • In Python 2, there are two types that represent sequences of characters: str and unicode. In contrast to Python 3, instances of str contain raw 8-bit values. Instances of unicode contain Unicode characters.
  • To convert Unicode characters to binary data, you must use the encode method. To convert binary data to Unicode characters, you must use the decode method.
  • When you’re writing Python programs, it’s important to do encoding and decoding of Unicode at the furthest boundary of your interfaces. The core of your program should use Unicode character types (str in Python 3, unicode in Python 2) and should not assume anything about character encodings.

1.3 Slice Sequences

  • Slicing lets you access a subset of a sequence’s items with minimal effort. The simplest uses for slicing are the built-in types list, str, and bytes.
  • Python has special syntax for the stride of a slice in the form somelist[start:end:stride].

1.4 Enumerate

enumerate wraps any iterator with a lazy generator.

for i, flavor in enumerate(flavor_list):
print(‘%d: %s’ % (i + 1, flavor))
>>>
1: vanilla
2: chocolate
3: pecan
4: strawberry

1.5 zip

There are two problems with the zip built-in.

  • The first issue is that in Python 2 zip is not a generator; it will fully exhaust the supplied iterators and return a list of all the tuples it creates. This could potentially use a lot of memory and cause your program to crash. If you want to zip very large iterators in Python 2, you should use izip from the itertools built-in module.
  • The second issue is that zip behaves strangely if the input iterators are of different lengths.

2 Functions

3 Classes and inheritance

4 Metaclasses and atributes

5 Concurrency and parallelism

6 Built-in modules

6.1 pickle

  • The pickle built-in module is only useful for serializing and deserializing objects between trusted programs.
  • The pickle module may break down when used for more than trivial use cases.
  • Use the copyreg built-in module with pickle to add missing attribute values, allow versioning of classes, and provide stable import paths.

6.2 datetime

converting time between UTC and local clocks.

# UTC to local
from time import localtime, strftime
now = 1407694710
local_tuple = localtime(now)
time_format = ‘%Y-%m-%d %H:%M:%S’
time_str = strftime(time_format, local_tuple)
print(time_str)
# local to UTC
time_tuple = strptime(time_str, time_format)
utc_now = mktime(time_tuple)
print(utc_now)
# take present time in UTC and convert it to local time.
from datetime import datetime, timezone
now = datetime(2014, 8, 10, 18, 18, 30)
now_utc = now.replace(tzinfo=timezone.utc)
now_local = now_utc.astimezone()
print(now_local)
>>>
2014-08-10 11:18:30-07:00

6.3 Ordered dictionary

Standard dictionaries are unordered. That means a dict with the same keys and values can result in different orders of iteration. This behavior is a surprising byproduct of the way the dictionary’s fast hash table is implemented. The OrderedDict class from the collections module is a special type of dictionary that keeps track of the order in which its keys were inserted. Iterating the keys of an OrderedDict has predictable behavior. This can vastly simplify testing and debugging by making all code deterministic.

6.4 Default Dictionary

One problem with dictionaries is that you can’t assume any keys are already present. If you wish to increment a counter stored in a dictionary when that counter key is not existed in the dictionary before, you have to assign a default value to that key.

stats = {}
key = 'my_counter'
if key not in stats:
stats[key] = 0
stats[key] += 1

# Using defaultdict doesn't have such problem.
from collections import defaultdict
stats = defaultdict(int)
stats['my_counter'] += 1

6.5 Decimal for precision

rounded = number.quantize(Decimal(0.01), rounding=ROUND_UP)
print(rounded)

7 Collaboration

7.1 Documenting

Write docstrings for every function, class, and module.

7.2 Packages

--main.py
--mypackage
    |-- __init__.py
    |-- mypackage/models.py
    |-- mypackage/utils.py
# main.py
from mypackage import utils

7.2.1 stable API

The value of all is a list of every name to export from the module as part of its public API.

# utils.py
from . models import Projectile
all__ = [‘simulate_collision’]
def _dot_product(a, b):
# 
def simulate_collision(a, b):
# 
# __init__.py
all__ = []
from . models import *
all__ += models.__all__
from . utils import *
all__ += utils.__all__
# api_consumer.py
from mypackage import *
a = Projectile(1.5, 3)
b = Projectile(4, 1.7)
after_a, after_b = simulate_collision(a, b)

internal-only functions like mypackage.utils._dot_product will not be available to the API consumer on mypackage because they weren’t present in all__. Being omitted from all means they weren’t imported by the from mypackage import * statement.

7.3 Virtual environments

  • By default, pip installs new packages in a global location. That

causes all Python programs on your system to be affected by these installed modules. If one package requires newer version of another package, which could cause conflict version dependency issue. pip3 show flask pip3 show Sphinx

  • use the pip freeze command to save all

of your explicit package dependencies into a file.

pip freeze > requirements.txt
pip install -r requirements.txt

8 Production

8.1 Test Everything with unittest

# utils.py
def to_str(data):
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode(‘utf-8)
    else:
        raise TypeError('Must supply str or bytes,''found: %r' % data)
# utils_test.py
from unittest import TestCase, main
from utils import to_str
class UtilsTestCase(TestCase):
    @classmethod
    def setUpClass(cls):
        cls.dates = pd.date_range('20160101', periods = 5)
        cls.dataDf = pd.DataFrame({'date': cls.dates,
            'count': np.array([3, 7, 4, 66, 9])})

    def test_to_str_bytes(self):
        self.assertEqual('hello', to_str(b'hello'))

    def test_to_str_str(self):
        self.assertEqual('hello', to_str('hello'))

    def test_to_str_bad(self):
        self.assertRaises(TypeError, to_str, object())


if __name__ == '__main__':
    main()

The TestCase class provides helper methods for making assertions in your tests, such as assertEqual for verifying equality, assertTrue for verifying Boolean expressions, and assertRaises for verifying that exceptions are raised when appropriate (see help(TestCase) for more). You can define your own helper methods in TestCase subclasses to make your tests more readable; just ensure that your method names don’t begin with the word test.

8.1.1 unittest.mock built-in module

8.2 Profile before optimizing

  1. 如何查看python程序的运行效率:

The dynamic nature of Python causes surprising behaviors in its runtime performance. Operations you might assume are slow are actually very fast (string manipulation, generators). Language features you might assume are fast are actually very slow (attribute access, function calls). The true source of slowdowns in a Python program can be obscure. The best approach is to ignore your intuition and directly measure the performance of a program before you try to optimize it. Python provides a built-in profiler for determining which parts of a program are responsible for its execution time. This lets you focus your optimization efforts on the biggest sources of trouble and ignore parts of the program that don’t impact speed. STEPS: 1). install snakeviz using pip from cmd.

pip install snakeviz

2). profile the test python file using below command.

$ python -m cProfile -o profile.stats test.py
# test.py
from random import randint
max_size = 10**4
data = [randint(0, max_size) for _ in range(max_size)]
test = lambda: insertion_sort(data)

3). check the efficiency result from profile.stats file.

$ snakeviz profile.stats

8.2.1 enhancing performance with Cython(writing C extensions for pandas)

8.3 Use tracemalloc to Understand Memory Usage and Leaks