Pragmatic Designs: Argument Passing in Airflow’s Operator Inheritance

A review of *args, **kwargs, and argument passing when extending a class in Python.

Plus considerations on when, or when not, to use them in practice.

Tom Szumowski, Apr 1

Do you ever find yourself needing to pass configuration parameters around a code base with many classes or methods? I do, all the time.

One simple approach in Python is to create a config dictionary and pass it through everywhere.

Any object that needs it can read the specific keys it is interested in.

But as demonstrated in this example, that can be a rather dangerous practice.

Photo by Keith Johnston on Unsplash: Argument Passing!

Enter *args and **kwargs

This is where you may find the use of *args and **kwargs mentioned for Python.

As a DigitalOcean tutorial on *args and **kwargs puts it: “When programming, you may not be aware of all the possible use cases of your code, and may want to offer more options for future programmers working with the module, or for users interacting with the code.”

*args provides non-keyworded variable-length argument lists; **kwargs provides keyworded variable-length argument lists.
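As a minimal sketch of that distinction (the function name `describe_call` is made up for illustration), a function can collect whatever it is given and hand it back unchanged:

```python
def describe_call(*args, **kwargs):
    """Return the positional and keyword arguments exactly as received."""
    # args arrives as a tuple of positional values,
    # kwargs as a dict of keyword values.
    return args, kwargs


positional, keyword = describe_call(1, 2, mode="fast")
print(positional)  # (1, 2)
print(keyword)     # {'mode': 'fast'}

# The same * and ** syntax unpacks a sequence or mapping at the call site.
values = (3, 4)
options = {"mode": "slow"}
unpacked = describe_call(*values, **options)
```

The names `args` and `kwargs` are only a convention; the `*` and `**` prefixes are what do the work.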

DigitalOcean provides examples of how to use them in functions.

As always, the Python docs are a fantastic reference (when you can find the right section).

You can also find references on StackOverflow for how they are helpful when subclassing: here, here, and here.

To illustrate from one of those references, you can use *args and **kwargs to pass arguments through subclasses (thanks Mark van Lent; print statements updated to Python 3 calls):

    class Foo(object):
        def __init__(self, value1, value2):
            # do something with the values
            print(value1, value2)

    class MyFoo(Foo):
        def __init__(self, *args, **kwargs):
            # do something else, don't care about the args
            print('myfoo')
            super(MyFoo, self).__init__(*args, **kwargs)

But how is it used in real life? Enter Airflow

While a simple example is great to illustrate the concept, I sometimes struggled to apply this technique consistently in practice.

Early on, I sometimes ended up with errors such as TypeError: multiply() takes 2 positional arguments but 3 were given, and then gave up on the workflow entirely because I needed to get a prototype out (Danger! Tech Debt!).
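One way such an error can arise, sketched with a hypothetical wrapper `multiply_all` that forwards its arguments without checking them:

```python
def multiply(a, b):
    return a * b


def multiply_all(*args):
    # Hypothetical wrapper: forwards everything it receives, unchecked.
    return multiply(*args)


message = None
try:
    multiply_all(2, 3, 4)  # one argument too many for multiply()
except TypeError as error:
    message = str(error)
    print(message)
```

The wrapper accepts any number of arguments, so the mistake only surfaces one level down, inside the call to `multiply`.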

I then came across Airflow’s DatastoreExportOperator class when experimenting with Airflow for this article.

Let’s walk through how they use *args and **kwargs in this context, as well as consider why Airflow ultimately moved away from it in their 2.0 release.

Permalink reference.

Note: “(…)” skips some details.

Side note: @apply_defaults is pretty slick too, and is related to my previous article on configuration parsing.

Dissecting DatastoreExportOperator’s Init for Argument Passing

Let’s dissect the key components related to argument passing:

Class definition: Inherits BaseOperator

__init__(): Includes its own arguments, *args, and **kwargs

super(): Calls the parent class __init__ with *args and **kwargs

self.xyz = …: Parses explicit input arguments specific to this class

kwargs.get('xcom_push'): Manages any optional input arguments specific to this class, in this case, managing deprecated variables.
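The five pieces above can be sketched with stand-in classes. This is not Airflow's source: `SketchBaseOperator`, `SketchExportOperator`, the `bucket` and `namespace` arguments, and the warning text are all hypothetical, chosen only to mirror the shape of the pattern.

```python
import warnings


class SketchBaseOperator:
    """Hypothetical stand-in for a parent operator; not Airflow's BaseOperator."""

    def __init__(self, task_id, retries=0, *args, **kwargs):
        self.task_id = task_id
        self.retries = retries
        # Anything the parent does not recognise simply lands here.
        self.extra_args = args
        self.extra_kwargs = kwargs


class SketchExportOperator(SketchBaseOperator):    # 1. inherits the parent
    def __init__(self, bucket, namespace=None,     # 2. its own arguments,
                 *args, **kwargs):                 #    plus *args and **kwargs
        super().__init__(*args, **kwargs)          # 3. pass the rest upward
        self.bucket = bucket                       # 4. explicit arguments
        self.namespace = namespace
        # 5. optional, deprecated argument: .get() reads it without removing it
        if kwargs.get("xcom_push") is not None:
            warnings.warn(
                "'xcom_push' is deprecated (hypothetical message).",
                DeprecationWarning,
            )


op = SketchExportOperator(bucket="my-bucket", task_id="export", retries=2)
print(op.task_id, op.retries, op.bucket)
```

The subclass never names `task_id` or `retries`, yet both reach the parent through `**kwargs`.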

Example Pitfall: Accidental Pops

So by using *args and **kwargs when subclassing, you can flow arguments through up to the parent while still managing arguments specific to your subclass.

This isn’t foolproof though.

For example, in one of the above StackOverflow references, the subclass’s __init__ executes: self.myvalue = kwargs.pop('myvalue', None).

This has the benefit of keeping the final list of **kwargs slim by popping off arguments as they are used, but can also be dangerous if a subclass accidentally pops off and removes an argument the parent needs.
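A small sketch of that failure mode, using hypothetical classes in which the parent requires a `name` argument that one subclass pops and another merely reads:

```python
class Parent:
    """Hypothetical parent that needs the 'name' keyword argument."""

    def __init__(self, name, **kwargs):
        self.name = name


class PoppingChild(Parent):
    def __init__(self, **kwargs):
        # pop() removes 'name' from kwargs, so the parent never sees it.
        self.label = kwargs.pop("name", None)
        super().__init__(**kwargs)


class GettingChild(Parent):
    def __init__(self, **kwargs):
        # get() reads the value but leaves it in kwargs for the parent.
        self.label = kwargs.get("name")
        super().__init__(**kwargs)


try:
    PoppingChild(name="export")
    pop_failed = False
except TypeError:
    # The parent complains that its required 'name' argument is missing.
    pop_failed = True

ok = GettingChild(name="export")
print(pop_failed, ok.name, ok.label)
```

Whether to pop or get depends on who owns the argument: pop what belongs only to the subclass, and leave alone anything the parent also expects.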

When should one use, or not use, args/kwargs? Airflow Case Study

This is an opinionated topic.

Recall the Python docs provide examples for both and even address it in their FAQ.

My current perspective follows this StackOverflow opinion from lemonhead:

**kwargs and properties have nice specific use cases, but just stick to explicit keyword arguments whenever practical/possible.

If there are too many instance variables, consider breaking up your class into hierarchical container objects.
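One possible reading of that advice, sketched with hypothetical names (`RetryPolicy`, `ExplicitTask`): group related settings into a small container object and make the remaining arguments explicit and keyword-only.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical container grouping related settings."""
    retries: int = 0
    delay_seconds: int = 60


class ExplicitTask:
    # The bare * makes every following argument keyword-only and explicit.
    def __init__(self, *, task_id, retry_policy=None):
        self.task_id = task_id
        self.retry_policy = retry_policy or RetryPolicy()


task = ExplicitTask(task_id="export", retry_policy=RetryPolicy(retries=3))

try:
    ExplicitTask(task_id="export", retrys=3)  # misspelled on purpose
    typo_caught = False
except TypeError:
    # With no **kwargs to absorb it, the typo fails at the call site.
    typo_caught = True

print(task.retry_policy, typo_caught)
```

The trade-off is a longer signature in exchange for mistakes surfacing immediately rather than being silently swallowed.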

“Whenever practical/possible” should be considered on a project-by-project basis.

Take Airflow for example.

Up until recently, it used *args and **kwargs extensively across all Operator subclasses (as shown above).

For those following Airflow development, you may have noticed that in Airflow 2.0, the ability to pass *args and **kwargs to the BaseOperator is marked for deprecation in order to prevent invalid arguments from entering an operator (PR reference).

So there is an example where one starts by using them, and then later decides to move to explicit arguments for possible reasons such as safety/consistency in the parent class (i.e. BaseOperator).

That may not be the exact reason, but overall I currently resonate with that philosophy.

Start with something flexible (e.g. *args, **kwargs) so that your classes aren’t exploding with explicit arguments.

Then you can move to explicit once the design is firmed up and you know where additional protection is necessary.

When you do migrate, be sure to warn your users just as Airflow did:

Source: Nice example from Airflow of a deprecation warning for no longer accepting args/kwargs in future releases.
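A warning of this kind can be sketched with Python's standard `warnings` module. The class `LenientBase` and the message wording below are hypothetical and are not copied from Airflow:

```python
import warnings


class LenientBase:
    """Hypothetical parent class; not Airflow's BaseOperator."""

    def __init__(self, task_id, *args, **kwargs):
        self.task_id = task_id
        if args or kwargs:
            # Still accepted for now, but the caller is told it will stop working.
            warnings.warn(
                f"Invalid arguments were passed to {type(self).__name__}: "
                f"*args={args!r}, **kwargs={kwargs!r}. "
                "Support for passing such arguments will be dropped "
                "in a future release.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    LenientBase("export", unexpected_option=True)  # extras present: warns
    LenientBase("clean")                           # no extras: silent

print(len(caught), caught[0].category.__name__)
```

Warning first and raising an error only in a later release gives users a window to clean up their call sites.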
