DataFrames.jl Handling of strings for column indexing

Something @StefanKarpinski pointed out on Slack, which I was not aware of, is that we could overload getproperty and setproperty! to accept a string as an argument, so things like:

df."first column"

and

for n in names(df)
    df."$n" = some_vector
end

would work.
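
To make the idea concrete, here is a minimal sketch on a toy type rather than on DataFrame itself. The names ToyTable and column are hypothetical and exist only for this illustration; this is not how DataFrames.jl implements it, just the overloading pattern described above.

```julia
# Hypothetical toy container: column names plus untyped column storage.
struct ToyTable
    colnames::Vector{Symbol}
    columns::Vector{Any}
end

# Shared lookup helper. getfield bypasses the getproperty overloads below.
function column(t::ToyTable, name::Symbol)
    i = findfirst(==(name), getfield(t, :colnames))
    i === nothing && throw(ArgumentError("column $name not found"))
    return getfield(t, :columns)[i]
end

# Accept both Symbols and strings as property names.
Base.getproperty(t::ToyTable, name::Symbol) = column(t, name)
Base.getproperty(t::ToyTable, name::AbstractString) = column(t, Symbol(name))

# Replace an existing column or append a new one.
function Base.setproperty!(t::ToyTable, name::Symbol, v::AbstractVector)
    colnames = getfield(t, :colnames)
    columns = getfield(t, :columns)
    i = findfirst(==(name), colnames)
    if i === nothing
        push!(colnames, name)
        push!(columns, v)
    else
        columns[i] = v
    end
    return v
end
Base.setproperty!(t::ToyTable, name::AbstractString, v::AbstractVector) =
    setproperty!(t, Symbol(name), v)

t = ToyTable(Symbol[], Any[])
t."first column" = [1, 2, 3]      # string property name
first_col = t."first column"
same_col = getproperty(t, Symbol("first column"))
```

In this sketch the string methods only convert to a Symbol and forward, so the Symbol path stays the single source of truth.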

@nalimilan - do you think we want to allow this?

51 Answers

✔️Accepted Answer

but we are aspiring towards that, no?

I do not want to say "no", but at least for now I do not see how to achieve this.

The current design is the following:

  1. we are type unstable
  2. we have all the benefits of type instability - we can add/remove columns, change column types, change column names, and have thousands of heterogeneous columns without a huge compilation cost (we even recently changed some bits of code to be type unstable, as otherwise CSV.jl was very slow when saving files)
  3. all exposed methods are type stable internally - i.e. they process things fast and only input and output are type unstable - roughly there is at most one dynamic dispatch per column processed (unless we explicitly want to be unstable or we have forgotten to fix something); in particular, by default we purposefully drop column names when processing data to avoid constant recompilation even in type-stable branches (as passing around column names would trigger recompilation each time names change)
  4. if you want type stability for your own methods then call Tables.columntable or Tables.namedtupleiterator to have a no-copy type-stable object (or Tables.rowtable - at the cost of performing a copy)
  5. Hopefully one day Julia will be able to cache compiled functions better than it does now between sessions, so given the points 1-4 DataFrames.jl will have a very fast loading and response time (as many things will already be compiled in cache)
  6. There are other packages in the ecosystem that focus on type stability, so if despite points 1-5 one still needs type stability there are other options
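
Points 3 and 4 can be illustrated with a small sketch that uses only Base Julia. The names columns and count_positive are made up for the example, and the NamedTuple at the end only stands in for the kind of object Tables.columntable returns; none of this is DataFrames.jl's actual internals.

```julia
# Columns stored without element-type information, as in a type-unstable table.
columns = Any[[1, 2, 3], [1.5, -2.5], ["a", "b"]]

# Type-stable kernel: specialized once per concrete column type.
function count_positive(col::AbstractVector{<:Real})
    n = 0
    for x in col
        n += x > 0
    end
    return n
end

# Fallback for columns that are not numeric.
count_positive(col::AbstractVector) = 0

# The outer loop cannot infer each column's type, so it pays one dynamic
# dispatch per column; the work inside the kernel is fully inferred.
totals = [count_positive(col) for col in columns]

# A NamedTuple of vectors carries names and element types in its own type,
# so code working on it is inferable, at the price of compiling per schema.
nt = (id = [1, 2, 3], score = [1.5, -2.5, 0.5])
score_positive = count_positive(nt.score)
```

The trade-off sketched here is the one in the list: the untyped container keeps compilation cheap when the schema changes, while the NamedTuple form gives type stability to user code that wants it.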

Other Answers:

I may be late to this, but a point of interest: I thought Julia tended to embrace the "don't offer too many ways to do the same thing" design philosophy (e.g. using " for strings, and not allowing multiple options like ' and " like Python)? Or am I making that up?

I will confess (particularly if there's a 2x performance advantage) I'm inclined to restrict this to the current Option 1. Accepting strings is more flexible, but also a potential invitation for confusion among new users?

I suspect that if we go to option 3, then over time Strings will become what everyone uses (similar to pandas / R), people will basically forget about the symbol functionality most of the time, and people will end up with column accesses that are 2x slower than they could be. But we'll still have to support two use cases instead of one.

EDIT: Typo.
EDIT 2: Add note about eventual shift in use tendencies.

I am still torn over which is better.

@kleinschmidt - what do you think about this PR in the context of StatsModels.jl and the formula interface (which now requires Symbols)?

The alternative is to stick to Option 1 + recommend Wrangling.jl (it is more powerful than what we would have anyway) in the manual if someone wants more flexibility (possibly we would change rename! and unstack API for convenience).

Now the report from the field (I have started preparing for the PR): almost all functions would need some minor update with this change (this is not super bad, but just shows how big this PR is).

Maybe let me also ask - is there someone who "strongly" wants strings accepted? (just to hear the other side, as it seems that most people do not have a strong preference but only a mild one in one way or the other).

I am sorry for possible bikeshedding here, but this is a fundamental design decision for me that will have very long lasting consequences.

I'm a fairly new Julia user, and just to share my experience, for the first week or two I was definitely a bit confused between Symbol("col_1") and :col_1. I was also a little thrown off by why Symbol("col 1") worked but not :col 1. I obviously figured these things out, but for someone just starting Julia, and if the assumption is that many new Julia users are coming from Python like myself, I think switching to string indexing instead of symbol would be great.
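
For anyone hitting the same confusion, a short sketch of how the two spellings relate. The column names are just the ones from the post above.

```julia
# The literal and the constructor produce the same Symbol.
a = :col_1
b = Symbol("col_1")
same = a === b

# A name containing a space cannot be written as `:col 1`,
# so the constructor form is needed.
spaced = Symbol("col 1")

# Converting back to a string recovers the original name.
roundtrip = string(spaced)
```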

So this was exactly my thinking, but I was afraid that just switching from symbols to strings (and dropping symbols) seems "too breaking" even for 2.0, and would require really serious consideration (to avoid problems like Python had moving from 2 to 3).

Some additional comments for a decision:

  • string lookup vs symbol lookup is ~ 2.5x slower (for typical column name length); this is not hugely problematic, but still I wanted to note this
  • we should consider other tabular data formats (and Tables.jl in general), where Tables.jl strictly assumes that column names should be Symbols
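
A rough way to compare the two lookups yourself, assuming a Dict-based index from column name to position. The names by_symbol, by_string and lookup_many are hypothetical, and timings depend on the machine and Julia version, so no numbers are given here.

```julia
colnames = [:id, :name, :score]

# Two hypothetical lookup tables mapping a column name to its position.
by_symbol = Dict(n => i for (i, n) in enumerate(colnames))
by_string = Dict(string(n) => i for (i, n) in enumerate(colnames))

# Repeat one lookup many times so that its cost becomes measurable.
function lookup_many(index, key, reps)
    total = 0
    for _ in 1:reps
        total += index[key]
    end
    return total
end

# Warm-up calls so compilation is not included in the timings below.
sym_total = lookup_many(by_symbol, :score, 10^6)
str_total = lookup_many(by_string, "score", 10^6)

t_symbol = @elapsed lookup_many(by_symbol, :score, 10^6)
t_string = @elapsed lookup_many(by_string, "score", 10^6)
```

Symbols are interned, so comparing two of them is an identity check, while string keys have to be hashed and compared by content; that is the usual explanation for a gap of this kind.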

Essentially our current design is:

  • fast to lookup, and consistent with Tables.jl
  • at the cost that a user who wants to work with strings needs to use Symbol and string to convert in both directions.

And the additional question is in what cases this is really problematic (i.e. worth considering to change) given that:

  • in rename we already handle strings
  • we allow using regex for column name matching

(as maybe it is enough to just add 1-2 convenience functions to cover 90% of use cases where strings are needed)
