[Solved] Hands-On ML: NameError: name 'prepare_country_stats' is not defined


Does anyone know a workaround?

On page 43/564, in Example 1-1 ("Training and running a linear model using Scikit-Learn"), how do I get past this error?

# Prepare the data
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)

Traceback (most recent call last):
File ".\", line 12, in
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
NameError: name 'prepare_country_stats' is not defined

Hands-On Machine Learning with Scikit-Learn and TensorFlow

25 Answers

✔️Accepted Answer

Hi everyone,

Apparently this missing code is causing some confusion; I'm sorry about that. It is only there to "whet your appetite", to give you a feel of what's coming next, not actually to be executed. But I understand that some readers might want to run it as is. If you really want to execute it, then here's a prepare_country_stats() function you can use:

def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Keep only the total-population rows, then pivot to one row per country
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # Rename the GDP column and index by country so the two frames can be merged
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    full_country_stats.sort_values(by="GDP per capita", inplace=True)
    # Set aside a few countries by position (used later in the chapter)
    remove_indices = [0, 1, 6, 8, 33, 34, 35]
    keep_indices = list(set(range(36)) - set(remove_indices))
    return full_country_stats[["GDP per capita", "Life satisfaction"]].iloc[keep_indices]

Just add this function at the beginning of the code, run the program in the directory that contains the data files (oecd_bli_2015.csv and gdp_per_capita.csv), and you should be fine (except that you must add `import sklearn.linear_model`, at least with recent versions of Scikit-Learn).
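If you don't have the CSV files handy and just want to sanity-check the model-fitting half of Example 1-1, here is a minimal, self-contained sketch of that step. The X/y numbers below are made up for illustration; they stand in for the "GDP per capita" and "Life satisfaction" columns of country_stats and are not the real data:

```python
import numpy as np
import sklearn.linear_model

# Made-up stand-ins for country_stats[["GDP per capita"]] and
# country_stats[["Life satisfaction"]]
X = np.array([[12000.0], [30000.0], [50000.0]])
y = np.array([[5.1], [6.5], [7.3]])

# Select and train a linear model, as in Example 1-1
model = sklearn.linear_model.LinearRegression()
model.fit(X, y)

# Make a prediction for a new GDP-per-capita value
X_new = np.array([[22587.0]])
print(model.predict(X_new))
```

Once the real data files and prepare_country_stats() are in place, the same fit/predict calls run unchanged on the actual country_stats columns.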

As you can see, it's a long and boring function that prepares the data into a nice, clean matrix. It's just Pandas stuff, nothing special about it, and nothing interesting with regard to Machine Learning, which is why I didn't want to include it in the book. In general, I avoid including every single line of code in the book, to keep it short, readable, and focused on what matters most. From Chapter 2 onwards, you should be able to follow along in the Jupyter notebook very easily.
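To see concretely what the pivot-and-merge steps in that function do, here is a tiny sketch with synthetic stand-in data (the country names and values below are invented for illustration, and the row-filtering step is omitted):

```python
import pandas as pd

# Synthetic stand-ins for the real OECD BLI and IMF GDP CSVs
oecd_bli = pd.DataFrame({
    "Country": ["Atlantis", "Atlantis", "Erewhon", "Erewhon"],
    "INEQUALITY": ["TOT", "HGH", "TOT", "HGH"],
    "Indicator": ["Life satisfaction"] * 4,
    "Value": [7.3, 7.0, 5.1, 4.8],
})
gdp_per_capita = pd.DataFrame({
    "Country": ["Atlantis", "Erewhon"],
    "2015": [50000.0, 12000.0],
})

# Same steps as prepare_country_stats(): filter, pivot, rename, index, merge
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
gdp_per_capita = gdp_per_capita.rename(columns={"2015": "GDP per capita"})
gdp_per_capita = gdp_per_capita.set_index("Country")
full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                              left_index=True, right_index=True)
full_country_stats.sort_values(by="GDP per capita", inplace=True)
print(full_country_stats)
```

The end result is one row per country with a "Life satisfaction" column and a "GDP per capita" column, sorted by GDP, which is exactly the clean matrix the rest of Example 1-1 works on.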

In the latest release, I added a footnote saying "The code assumes that prepare_country_stats() is already defined: it merges the GDP and life satisfaction data into a single Pandas dataframe."
Perhaps that's not clear enough, though: I think I will change this to explicitly tell readers that if they want to run the code, they should do so in the Jupyter notebook which contains all the boring details (this is strongly suggested in the preface, but I know not everyone reads the preface, I certainly don't).

What do you think?

Other Answers:

As @pprivulet pointed out (thanks!), the function is defined in the notebook. I left some code out of the book when there was really nothing interesting or machine learning specific to it. Things like plotting an image, etc. If you get stuck at any point, check out the corresponding notebook, and don't hesitate to ping me, I'll be happy to help.


01_the_machine_learning_landscape.ipynb: `def prepare_country_stats(oecd_bli, gdp_per_capita):`

The function is defined in 01_the_machine_learning_landscape.ipynb
Good luck

I replaced the footnote with this: "The prepare_country_stats() function's definition is not shown here (see this chapter's Jupyter notebook if you want all the gory details). It's just boring Pandas code that joins the life satisfaction data from the OECD with the GDP per capita data from the IMF."

I also updated the notebook to make Example 1-1's code stand out at the beginning, and I added the prepare_country_stats() function from my previous comment.

Thanks everyone for your very useful feedback! Hopefully, the book will get better and better. :)
