Effective Python: Projects
The best way to learn Python is to use it to solve a real-life problem you face. I’m often asked to suggest example projects that are good for beginners. That’s a tricky question because we all have different issues and interests. Rather than recommend contrived problems, I offer a partial list of how I’ve used Python since I started programming in it in 2016.
Going out on a limb, if I had to pick one for an actuary, it would be number 19. If you ever have to pick up and use a new database, a generic explorer tool is very helpful.
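To make the idea of a generic explorer concrete, here is a minimal sketch for SQLite using only the standard library. It is an illustration under stated assumptions, not the tool described in the list below: the function name `explore`, the shape of the returned dictionary, and the toy `notes` table are all hypothetical. It relies on SQLite's `sqlite_master` catalog and the `table_info` and `index_list` pragmas to report each table's creation statement, columns, indexes, and first few rows.

```python
import sqlite3


def explore(conn, top=10):
    """Summarize every user table in a SQLite database (hypothetical helper)."""
    summary = {}
    # sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for name, sql in tables:
        # Quote the identifier so unusual table names do not break the queries.
        quoted = '"' + name.replace('"', '""') + '"'
        # Column 1 of each pragma result row holds the column or index name.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
        indexes = [row[1] for row in conn.execute(f"PRAGMA index_list({quoted})")]
        rows = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (top,)).fetchall()
        summary[name] = {
            "sql": sql,          # the CREATE TABLE statement
            "columns": columns,
            "indexes": indexes,
            "rows": rows,        # at most `top` rows
        }
    return summary


# Toy in-memory database, made up purely for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_title ON notes (title)")
conn.executemany("INSERT INTO notes (title) VALUES (?)", [("a",), ("b",)])

info = explore(conn)
for table, meta in info.items():
    print(table, meta["columns"], meta["indexes"], len(meta["rows"]))
```

Pointing `sqlite3.connect` at an existing database file instead of `:memory:` gives a quick inventory of an unfamiliar schema. Other database engines expose similar metadata through their own catalogs, so the queries would need to change.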
Please write me at steve@convexrisk.com and let me know how you’ve used Python in your work or leisure.
Python Projects since 2016
- Aggregate: tools for creating aggregate loss distributions. Aggregate was used to generate all the examples and figures in my forthcoming book, written with John Major. There is even (Python generated) documentation.
- Document manipulation tools to build presentations and websites from Markdown, called GREAT: Generalizable, Reusable, Extensible Actuarial Toolkit (formerly a set of VBA macros). GREAT functions were used to create all the PDF files in PIRC. There is partial documentation, again created automatically from the Python source code. (Yes, you can do that in R too.)
- Websites: my websites are all written in Python, using Flask and Jinja. For example, my blog is run from a Pandas dataframe using GREAT to create the HTML from Markdown.
- Regular expressions for text analysis. E.g., solutions to the Wordle word puzzle on my website.
- Web scraping: collecting information from websites and aggregating it into a useful database. We will partially cover this in session 4 or 5. Examples include an early COVID tracker, a database of start-ups, and lots of BEA- and FRED-related analysis.
- Implementing the Iman-Conover method for correlation using numpy, session 5.
- Simulating random correlation matrices using vines, session 5.
- Implementing the Rearrangement Algorithm for worst-case VaR performance, session 5.
- Building dashboards, for example, an automated system to analyze subreddits. We’ll cover graphics later too.
- Making word clouds, including extracting stop words and stemming. See the left-hand side of the Actuary reddit dashboard.
- Exploring the Bitcoin blockchain.
- Converting between graphics file types and manipulating PDFs.
- Making cartograms (my current project), such as this analysis of the 2020 Presidential Election. This project leverages geopandas, which provides a full mapping and visualization querying language and toolkit.
- Playing with open APIs such as flickr, news, reddit, twitter, Evernote, fitbit, Google Scholar, Wikipedia. E.g., the background image quilt in my LinkedIn profile was assembled from images scraped from Google Books.
- Organizing files, e.g., finding duplicate files, and creating links to presentations by client and by file type.
- Data analysis based on SNL/S&P Market Intelligence downloads, all done in Pandas.
- General text re-arrangement, extracting information from free-form text. E.g., accurately counting the number of words in my book, which turned out to be surprisingly difficult.
- Miscellaneous nerdy math stuff, such as testing for primes, computing the CAS Nonce using parallel processing, and evaluating \(2^{402653211}\) exactly (yes, that number comes up).
- Database manipulation and exploration: building a generic wrapper around a database to facilitate metadata querying: show me all the tables and their constructors, the queries, indexes, and fields; return the top 10 rows in each; etc. I used it to explore and reverse engineer my Evernote SQLite database.
- Tracking my investments, building near real-time price downloads and analyzing historical price, risk and performance.
- Extensions to SublimeText, which is the editor I use. ST is written in Python and you can write Python extensions. Mine are workflow and formatting related.
- Computing 1000! exactly. Yes, it can do that. The answer equals