
20180129

Effective Python by Brett Slatkin


  • There are two major versions of Python still in active use: Python 2 and Python 3.
  • There are multiple popular runtimes for Python: CPython, Jython, IronPython, PyPy, etc.
  • Be sure that the command-line for running Python on your system is the version you expect it to be.
  • Prefer Python 3 for your next project because that is the primary focus of the Python community.
  • Always follow the PEP 8 style guide when writing Python code.
  • Sharing a common style with the larger Python community facilitates collaboration with others.
  • Using a consistent style makes it easier to modify your own code later.
  • In Python 3, bytes contains sequences of 8-bit values, and str contains sequences of Unicode characters. bytes and str instances can't be used together with operators (like > or +).
  • In Python 2, str contains sequences of 8-bit values, and unicode contains sequences of Unicode characters. str and unicode can be used together with operators if the str only contains 7-bit ASCII characters.
  • Use helper functions to ensure that the inputs you operate on are the type of character sequence you expect (8-bit values, UTF-8 encoded characters, Unicode characters, etc.). (A sketch of such helpers appears after these notes.)
  • If you want to read or write binary data to/from a file, always open the file using a binary mode (like 'rb' or 'wb').
  • Python's syntax makes it all too easy to write single-line expressions that are overly complicated and difficult to read.
  • Move complex expressions into helper functions, especially if you need to use the same logic repeatedly.
  • The if/else expression provides a more readable alternative to using Boolean operators like or and and in expressions.
  • Avoid being verbose: Don't supply 0 for the start index or the length of the sequence for the end index.
  • Slicing is forgiving of start or end indexes that are out of bounds, making it easy to express slices on the front or back boundaries of a sequence (like a[:20] or a[-20:]).
  • Assigning to a list slice will replace that range in the original sequence with what's referenced, even if their lengths are different.
  • Specifying start, end, and stride in a slice can be extremely confusing.
  • Prefer using positive stride values in slices without start or end indexes. Avoid negative stride values if possible.
  • Avoid using start, end, and stride together in a single slice. If you need all three parameters, consider doing two assignments (one to slice, another to stride) or using islice from the itertools built-in module.
  • List comprehensions are clearer than the map and filter built-in functions because they don't require extra lambda expressions.
  • List comprehensions allow you to easily skip items from the input list, a behavior map doesn't support without help from filter.
  • Dictionaries and sets also support comprehension expressions.
  • List comprehensions support multiple levels of loops and multiple conditions per loop level.
  • List comprehensions with more than two expressions are very difficult to read and should be avoided.
  • List comprehensions can cause problems for large inputs by using too much memory.
  • Generator expressions avoid memory issues by producing outputs one at a time as an iterator.
  • Generator expressions can be composed by passing the iterator from one generator expression into the for sub-expression of another.
  • Generator expressions execute very quickly when chained together. (A comprehension and generator expression sketch appears after these notes.)
  • enumerate provides concise syntax for looping over an iterator and getting the index of each item from the iterator as you go.
  • Prefer enumerate instead of looping over a range and indexing into a sequence.
  • You can supply a second parameter to enumerate to specify the number from which to begin counting (zero is the default).
  • The zip built-in function can be used to iterate over multiple iterators in parallel.
  • In Python 3, zip is a lazy generator that produces tuples. In Python 2, zip returns the full result as a list of tuples. (An enumerate/zip sketch appears after these notes.)
  • zip truncates its output silently if you supply it with iterators of different lengths.
  • The zip_longest function from the itertools built-in module lets you iterate over multiple iterators in parallel regardless of their lengths.
  • Python has special syntax that allows else blocks to immediately follow for and while loop interior blocks.
  • The else block after a loop only runs if the loop body did not encounter a break statement.
  • Avoid using else blocks after loops because their behavior isn't intuitive and can be confusing.
  • The try/finally compound statement lets you run cleanup code regardless of whether exceptions were raised in the try block.
  • The else block helps you minimize the amount of code in try blocks and visually distinguish the success case from the try/except blocks.
  • An else block can be used to perform additional actions after a successful try block but before common cleanup in a finally block. (A try/except/else/finally sketch appears after these notes.)
  • Functions that return None to indicate special meaning are error prone because None and other values (e.g. zero, the empty string) all evaluate to False in conditional expressions.
  • Raise exceptions to indicate special situations instead of returning None. Expect the calling code to handle exceptions properly when they're documented. (A sketch appears after these notes.)
  • Closure functions can refer to variables from any of the scopes in which they were defined.
  • By default, closures can't affect enclosing scopes by assigning variables.
  • In Python 3, use the nonlocal statement to indicate when a closure can modify a variable in its enclosing scopes.
  • In Python 2, use a mutable value (like a single-item list) to work around the lack of the nonlocal statement.
  • Avoid using nonlocal statements for anything beyond simple functions. (A closure sketch appears after these notes.)
  • Using generators can be clearer than the alternative for returning lists of accumulated results.
  • The iterator returned by a generator produces the set of values passed to yield expressions within the generator function's body.
  • Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn't include all inputs and outputs.
  • Beware of functions that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
  • Python's iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
  • You can easily define your own iterable container type by implementing the __iter__ method as a generator. (A sketch appears after these notes.)
  • You can detect that a value is an iterator (instead of a container) if calling iter on it twice produces the same result, which can then be progressed with the next built-in function.
  • Functions can accept a variable number of positional arguments by using *args in the def statement.
  • You can use the items from a sequence as the positional arguments for a function with the * operator.
  • Using the * operator with a generator may cause your program to run out of memory and crash.
  • Adding new positional parameters to functions that accept *args can introduce hard-to-find bugs.
  • Function arguments can be specified by position or by keyword.
  • Keywords make it clear what the purpose of each argument is when it would be confusing with only positional arguments.
  • Keyword arguments with default values make it easy to add new behaviors to a function, especially when the function has existing callers.
  • Optional keyword arguments should always be passed by keyword instead of by position.
  • Default arguments are only evaluated once: during function definition at module load time. This can cause odd behaviors for dynamic values (like {} or []).
  • Use None as the default value for keyword arguments that have a dynamic value. Document the actual default behavior in the function's doc-string.
  • Keyword arguments make the intention of a function call more clear.
  • Use keyword-only arguments to force callers to supply keyword arguments for potentially confusing functions, especially those that accept multiple Boolean flags. (A keyword-arguments sketch appears after these notes.)
  • Python 3 supports explicit syntax for keyword-only arguments in functions.
  • Python 2 can emulate keyword-only arguments for functions by using **kwargs and manually raising TypeError exceptions.
  • Avoid making dictionaries with values that are other dictionaries or long tuples.
  • Use namedtuple for lightweight, immutable data containers before you need the flexibility of a full class.
  • Move your bookkeeping code to use multiple helper classes when your internal state dictionaries get complicated.
  • Instead of defining and instantiating classes, functions are often all you need for simple interfaces between components in Python.
  • References to functions and methods in Python are first class, meaning they can be used in expressions like any other type.
  • The __call__ special method enables instances of a class to be called like plain Python functions.
  • When you need a function to maintain state, consider defining a class that provides the __call__ method instead of defining a stateful closure. (A sketch appears after these notes.)
  • Python only supports a single constructor per class, the __init__ method.
  • Use @classmethod to define alternative constructors for your classes. (A sketch appears after these notes.)
  • Use class method polymorphism to provide generic ways to build and connect concrete sub-classes.
  • Python's standard method resolution order (MRO) solves the problems of super-class initialization order and diamond inheritance.
  • Always use the super built-in function to initialize parent classes.
  • Avoid using multiple inheritance if mix-in classes can achieve the same outcome.
  • Use pluggable behaviors at the instance level to provide per-class customization when mix-in classes may require it.
  • Compose mix-ins to create complex functionality from simple behaviors.
  • Private attributes aren't rigorously enforced by the Python compiler.
  • Plan from the beginning to allow sub-classes to do more with your internal APIs and attributes instead of locking them out by default.
  • Use documentation of protected fields to guide sub-classes instead of trying to force access control with private attributes.
  • Only consider using private attributes to avoid naming conflicts with sub-classes that are out of your control.
  • Inherit directly from Python's container types (like list or dict) for simple use cases.
  • Beware of the large number of methods required to implement custom container types correctly.
  • Have your custom container types inherit from the interfaces defined in collections.abc to ensure that your classes match required interfaces and behaviors.
  • Define new class interfaces using simple public attributes, and avoid set and get methods.
  • Use @property to define special behavior when attributes are accessed on your objects, if necessary.
  • Follow the rule of least surprise and avoid weird side effects in your @property methods.
  • Ensure that @property methods are fast; do slow or complex work using normal methods.
  • Use @property to give existing instance attributes new functionality.
  • Make incremental progress toward better data models by using @property.
  • Consider refactoring a class and all call sites when you find yourself using @property too heavily.
  • Reuse the behavior and validation of @property methods by defining your own descriptor classes.
  • Use WeakKeyDictionary to ensure that your descriptor classes don't cause memory leaks.
  • Don't get bogged down trying to understand exactly how __getattribute__ uses the descriptor protocol for getting and setting attributes.
  • Use __getattr__ and __setattr__ to lazily load and save attributes for an object.
  • Understand that __getattr__ only gets called once when accessing a missing attribute, whereas __getattribute__ gets called every time an attribute is accessed.
  • Avoid infinite recursion in __getattribute__ and __setattr__ by using methods from super() (i.e. object class) to access instance attributes directly.
  • Use meta-classes to ensure that sub-classes are well formed at the time they are defined, before objects of their type are constructed.
  • Meta-classes have slightly different syntax in Python 2 vs Python 3.
  • The __new__ method of meta-classes is run after the class statement's entire body has been processed.
  • Class registration is a helpful pattern for building modular Python programs.
  • Meta-classes let you run registration code automatically each time your base class is sub-classed in a program. (A registration sketch appears after these notes.)
  • Using meta-classes for class registration avoids errors by ensuring that you never miss a registration call.
  • Meta-classes enable you to modify a class's attributes before the class is fully defined.
  • Descriptors and meta-classes make a powerful combination for declarative behavior and run-time introspection.
  • You can avoid both memory leaks and the weakref module by using meta-classes along with descriptors.
  • Use the subprocess module to run child processes and manage their input and output streams.
  • Child processes run in parallel with the Python interpreter, enabling you to maximize your CPU usage.
  • Use the timeout parameter with communicate to avoid deadlocks and hanging child processes.
  • Python threads can't run byte-code in parallel on multiple CPU cores because of the global interpreter lock (GIL).
  • Python threads are still useful despite the GIL because they provide an easy way to do multiple things at seemingly the same time.
  • Use Python threads to make multiple system calls in parallel. This allows you to do blocking I/O at the same time as computation.
  • Even though Python has a global interpreter lock, you're still responsible for protecting against data races between the threads in your programs.
  • Your programs will corrupt their data structures if you allow multiple threads to modify the same objects without locks.
  • The Lock class in the threading built-in module is Python's standard mutual exclusion lock implementation.
  • Pipelines are a great way to organize sequences of work that run concurrently using multiple Python threads.
  • Be aware of the many problems in building concurrent pipelines: busy waiting, stopping workers, and memory explosion.
  • The Queue class has all of the facilities you need to build robust pipelines: blocking operations, buffer sizes, and joining.
  • Co-routines provide an efficient way to run tens of thousands of functions seemingly at the same time.
  • Within a generator, the value of the yield expression will be whatever value was passed to the generator's send method from the exterior code.
  • Co-routines give you a powerful tool for separating the core logic of your program from its interaction with the surrounding environment.
  • Python 2 doesn't support yield from or returning values from generators.
  • Moving CPU bottlenecks to C-extension modules can be an effective way to improve performance while maximizing your investment in Python code. However, the cost of doing so is high and may introduce bugs.
  • The multiprocessing module provides powerful tools that can parallelize certain types of Python computations with minimal effort.
  • The power of multiprocessing is best accessed through the concurrent.futures built-in module and its simple ProcessPoolExecutor class.
  • The advanced parts of the multiprocessing module should be avoided because they are so complex.
  • Decorators are Python syntax for allowing one function to modify another function at run-time.
  • Using decorators can cause strange behaviors in tools that do introspection, such as debuggers.
  • Use the wraps decorator from the functools built-in module when you define your own decorators to avoid any issues. (A decorator sketch appears after these notes.)
  • The with statement allows you to reuse logic from try/finally blocks and reduce visual noise.
  • The contextlib built-in module provides a contextmanager decorator that makes it easy to use your own functions in with statements. (A context-manager sketch appears after these notes.)
  • The value yielded by context managers is supplied to the as part of the with statement. It's useful for letting your code directly access the cause of the special context.
  • The pickle built-in module is only useful for serializing and deserializing objects between trusted programs.
  • The pickle module may break down when used for more than trivial use cases.
  • Use the copyreg built-in module with pickle to add missing attribute values, allow versioning of classes, and provide stable import paths.
  • Avoid using the time module for translating between different time zones.
  • Use the datetime built-in module along with the pytz module to reliably convert between times in different time zones.
  • Always represent time in UTC and do conversions to local time as the final step before presentation.
  • Use Python's built-in modules for algorithms and data structures.
  • Don't re-implement this functionality yourself. It's hard to get right.
  • Python has built-in types and classes in modules that can represent practically every type of numerical value.
  • The Decimal class is ideal for situations that require high precision and exact rounding behavior, such as computations of monetary values.
  • The Python Package Index (PyPI) contains a wealth of common packages that are built and maintained by the Python community.
  • pip is the command-line tool to use for installing packages from PyPI.
  • pip is installed by default in Python 3.4 and above; you must install it yourself for older versions.
  • The majority of PyPI modules are free and open source software.
  • Write documentation for every module, class, and function using doc-strings. Keep them up to date as your code changes.
  • For modules: Introduce all contents of the module and any important classes or functions all users should know about.
  • For classes: Document behavior, important attributes, and subclass behavior in the doc-string following the class statement.
  • For functions and methods: Document every argument, returned value, raised exception, and other behaviors in the doc-string following the def statement.
  • Packages in Python are modules that contain other modules. Packages allow you to organize your code into separate, non-conflicting name-spaces with unique absolute module names.
  • Simple packages are defined by adding an __init__.py file to a directory that contains other source files. These files become the child modules of the directory's package. Package directories may also contain other packages.
  • You can provide an explicit API for a module by listing its publicly visible names in its __all__ special attribute.
  • You can hide a package's internal implementation by only importing public names in the package's __init__.py file or by naming internal-only members with a leading underscore.
  • When collaborating within a single team or on a single code-base, using __all__ for explicit APIs is probably unnecessary.
  • Defining root exceptions for your modules allows API consumers to insulate themselves from your API.
  • Catching root exceptions can help you find bugs in code that consumes an API.
  • Catching the Python Exception base class can help you find bugs in API implementations.
  • Intermediate root exceptions let you add more specific types of exceptions in the future without breaking your API consumers.
  • Circular dependencies happen when two modules must call into each other at import time. They can cause your program to crash at startup.
  • The best way to break a circular dependency is refactoring mutual dependencies into a separate module at the bottom of the dependency tree.
  • Dynamic imports are the simplest solution for breaking a circular dependency between modules while minimizing refactoring and complexity.
  • Virtual environments allow you to use pip to install many different versions of the same package on the same machine without conflicts.
  • Virtual environments are created with pyvenv, enabled with source bin/activate, and disabled with deactivate.
  • You can dump all of the requirements of an environment with pip freeze. You can reproduce the environment by supplying the requirements.txt file to pip install -r.
  • In versions of Python before 3.4, the pyvenv tool must be downloaded and installed separately. The command-line tool is called virtualenv instead of pyvenv.
  • Programs often need to run in multiple deployment environments that each have unique assumptions and configurations.
  • You can tailor a module's contents to different deployment environments by using normal Python statements in module scope.
  • Module contents can be the product of any external condition, including host introspection through the sys and os modules.
  • Calling print on built-in Python types will produce the human-readable string version of a value, which hides type information.
  • Calling repr on built-in Python types will produce the printable string version of a value. These repr strings could be passed to the eval built-in function to get back the original value.
  • %s in format strings will produce human-readable strings like str. %r will produce printable strings like repr.
  • You can define the __repr__ method to customize the printable representation of a class and provide more detailed debugging information.
  • You can reach into any object's __dict__ attribute to view its internals.
  • The only way to have confidence in a Python program is to write tests.
  • The unittest built-in module provides most of the facilities you'll need to write good tests.
  • You can define tests by sub-classing TestCase and defining one method per behavior you'd like to test. Test methods on TestCase classes must start with the word test.
  • It's important to write both unit tests (for isolated functionality) and integration tests (for modules that interact).
  • You can initiate the Python interactive debugger at a point of interest directly in your program with the import pdb; pdb.set_trace() statements.
  • The Python debugger prompt is a full Python shell that lets you inspect and modify the state of a running program.
  • pdb shell commands let you precisely control program execution, allowing you to alternate between inspecting program state and progressing program execution.
  • It's important to profile Python programs before optimizing because the source of slowdowns is often obscure.
  • Use the cProfile module instead of the profile module because it provides more accurate profiling information.
  • The Profile object's runcall method provides everything you need to profile a tree of function calls in isolation. (A profiling sketch appears after these notes.)
  • The Stats object lets you select and print the subset of profiling information you need to see to understand your program's performance.
  • It can be difficult to understand how Python programs use and leak memory.
  • The gc module can help you understand which objects exist, but it has no information about how they were allocated.
  • The tracemalloc built-in module provides powerful tools for understanding the source of memory usage.
  • tracemalloc is only available in Python 3.4 and above.
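
The sketches below illustrate a few of the notes above. They are my own minimal examples, not excerpts from the book, and the helper and class names in them are illustrative unless they happen to match the book's.

A minimal sketch of the character-sequence helper functions mentioned in the bytes/str notes, assuming UTF-8 encoding and the hypothetical names to_str and to_bytes:

def to_str(bytes_or_str):
    # Accept bytes or str and always return str (Unicode characters).
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str.decode('utf-8')
    return bytes_or_str

def to_bytes(bytes_or_str):
    # Accept bytes or str and always return bytes (8-bit values).
    if isinstance(bytes_or_str, str):
        return bytes_or_str.encode('utf-8')
    return bytes_or_str

print(to_str(b'hello'))   # 'hello'
print(to_bytes('hello'))  # b'hello'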
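
A sketch of the list comprehension and generator expression notes; the numbers are arbitrary:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# A list comprehension filters and transforms without lambda expressions.
even_squares = [x ** 2 for x in a if x % 2 == 0]

# The equivalent with map and filter is much noisier.
also_even_squares = list(map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, a)))
assert even_squares == also_even_squares

# Generator expressions produce one value at a time and can be chained;
# nothing is computed until the composed iterator is advanced.
it = (x ** 2 for x in a if x % 2 == 0)
roots = ((x, x ** 0.5) for x in it)
print(next(roots))  # (4, 2.0)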
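
A sketch of the enumerate, zip, and zip_longest notes, using made-up lists of unequal length:

from itertools import zip_longest

names = ['Cecilia', 'Lise', 'Marie']
counts = [7, 4]

# enumerate with a start value of 1 instead of indexing into a range.
for rank, name in enumerate(names, 1):
    print(rank, name)

# zip truncates silently at the shortest input...
print(list(zip(names, counts)))          # [('Cecilia', 7), ('Lise', 4)]

# ...while zip_longest pads the shorter input with None instead.
print(list(zip_longest(names, counts)))  # [..., ('Marie', None)]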
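
A sketch of the try/except/else/finally notes; load_json_key and its error handling are hypothetical:

import json

def load_json_key(path, key):
    handle = open(path)                # may raise OSError; kept outside the try
    try:
        data = handle.read()           # may raise UnicodeDecodeError
        record = json.loads(data)      # may raise ValueError
    except ValueError as e:
        raise KeyError(key) from e     # handle the anticipated failure
    else:
        return record[key]             # success case, kept out of the try block
    finally:
        handle.close()                 # cleanup runs in every case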
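
A sketch of raising an exception instead of returning None; careful_divide is a hypothetical name:

def careful_divide(a, b):
    # Returning None here would be error prone: a result of 0 and a
    # failure would both evaluate to False in a truthiness check.
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e

try:
    result = careful_divide(0, 5)
except ValueError:
    print('Invalid inputs')
else:
    print('Result is %.1f' % result)   # Result is 0.0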
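
A sketch of a closure that uses nonlocal to assign a variable in its enclosing scope; sort_priority is illustrative:

def sort_priority(numbers, group):
    found = False
    def helper(x):
        nonlocal found            # Python 3 only; assigns to the enclosing variable
        if x in group:
            found = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
print(sort_priority(numbers, {2, 3, 5, 7}))  # True
print(numbers)                               # [2, 3, 5, 7, 1, 4, 6, 8]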
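
A sketch of an iterable container type whose __iter__ method is a generator; RepeatableVisits and normalize are illustrative:

class RepeatableVisits:
    # Each call to __iter__ returns a fresh generator, so functions that
    # iterate over their input twice see all of the values both times.
    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        for value in self.values:
            yield value

def normalize(numbers):
    total = sum(numbers)                         # first full pass
    return [100 * v / total for v in numbers]    # second full pass

print(normalize(RepeatableVisits([15, 35, 80])))  # [11.5..., 26.9..., 61.5...]
print(normalize(iter([15, 35, 80])))              # [] -- a bare iterator is already exhausted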
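
A sketch of keyword-only arguments and of using None for dynamic default values; safe_division and log are illustrative names:

import datetime

# Everything after the bare * must be passed by keyword, which keeps
# Boolean flags readable at call sites (Python 3 syntax).
def safe_division(number, divisor, *, ignore_overflow=False, ignore_zero_division=False):
    try:
        return number / divisor
    except OverflowError:
        if ignore_overflow:
            return 0
        raise
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        raise

print(safe_division(1.0, 0, ignore_zero_division=True))   # inf

def log(message, when=None):
    """Log a message with a timestamp.

    when: datetime of when the message occurred; defaults to the present time.
    """
    # Default arguments are evaluated only once, so use None as the
    # placeholder and compute the dynamic value inside the function.
    if when is None:
        when = datetime.datetime.now()
    print('%s: %s' % (when, message))

log('Hi there!')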
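
A sketch of a stateful hook built with __call__ instead of a closure; CountMissing is an illustrative name:

from collections import defaultdict

class CountMissing:
    def __init__(self):
        self.added = 0

    def __call__(self):
        # Instances behave like plain functions, so they can be passed
        # anywhere a simple hook is expected, such as defaultdict's factory.
        self.added += 1
        return 0

counter = CountMissing()
result = defaultdict(counter, {'green': 12, 'blue': 3})
for key, amount in [('red', 5), ('blue', 17), ('orange', 9)]:
    result[key] += amount

print(counter.added)   # 2 -- 'red' and 'orange' were missing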
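
A sketch of an alternative constructor defined with @classmethod; Rectangle and square are illustrative:

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @classmethod
    def square(cls, side):
        # Receives the actual class as cls, so it builds sub-classes correctly.
        return cls(side, side)

class BorderedRectangle(Rectangle):
    def __init__(self, width, height, border=1):
        super().__init__(width, height)
        self.border = border

s = BorderedRectangle.square(5)
print(type(s).__name__, s.width, s.height)   # BorderedRectangle 5 5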
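
A sketch of class registration driven by a meta-class; the registry and class names are illustrative:

registry = {}

class RegisteredMeta(type):
    def __new__(meta, name, bases, class_dict):
        cls = super().__new__(meta, name, bases, class_dict)
        if bases:                            # skip the abstract base class itself
            registry[cls.__name__] = cls     # registration can never be forgotten
        return cls

class Serializable(metaclass=RegisteredMeta):
    pass

class Point(Serializable):
    def __init__(self, x, y):
        self.x, self.y = x, y

print(registry)   # {'Point': <class '__main__.Point'>}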
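
A sketch of a decorator that preserves the wrapped function's metadata with functools.wraps; trace is an illustrative name:

from functools import wraps

def trace(func):
    @wraps(func)                       # copy __name__, __doc__, etc. to the wrapper
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print('%s(%r, %r) -> %r' % (func.__name__, args, kwargs, result))
        return result
    return wrapper

@trace
def fibonacci(n):
    """Return the n-th Fibonacci number."""
    if n in (0, 1):
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)

fibonacci(3)
print(fibonacci.__name__)   # 'fibonacci', not 'wrapper'
print(fibonacci.__doc__)    # the doc-string survives, so introspection tools still work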
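
A sketch of a with-statement helper built with contextlib.contextmanager; debug_logging is an illustrative name:

import logging
from contextlib import contextmanager

logging.basicConfig()   # default root logger level is WARNING

@contextmanager
def debug_logging(level):
    logger = logging.getLogger()
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield logger        # the yielded value becomes the target of the as clause
    finally:
        logger.setLevel(old_level)   # cleanup runs like a finally block

with debug_logging(logging.DEBUG) as logger:
    logger.debug('Visible inside the block')
logging.debug('Not visible outside the block')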
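
A sketch of profiling an isolated call tree with cProfile and pstats; insertion_sort is a deliberately slow illustrative function:

from cProfile import Profile
from pstats import Stats
from random import randint

def insert_value(array, value):
    # Linear-scan insert: slow on purpose, so the profiler has something to find.
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)

def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result

data = [randint(0, 1000) for _ in range(1000)]

profiler = Profile()
profiler.runcall(lambda: insertion_sort(data))   # profile just this call tree

stats = Stats(profiler)
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats()   # print only the subset of profiling information you need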
