OpenML Feedback
To contact
Name | Email | Paper | Done | Where | Link
---|---|---|---|---|---
Damian | | | y | |
Irene | | | y | SURF |
Aparna | | | y | Ireland |
Guillaume | | | y | Paris, probabl |
Ihsan | | | y | Edinburgh |
Sebastian | | | y | Paris, sklearn |
Binh P. Nguyen | binh.p.nguyen@vuw.ac.nz | Challenges and opportunities of generative models on tabular data | | New Zealand |
Hariharan Manikandan | hmanikan@cs.cmu.edu | Language models are weak learners | | CMU |
Jacobus G. M. van der Linden | J.G.M.vanderLinden@tudelft.nl | Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance | | TU Delft | https://arxiv.org/pdf/2409.12788
Ning Zhang, Ruidong Wang | drningzhang@global.tencent.com, ruidwang@global.tencent.com | Class-aware and Augmentation-free Contrastive Learning from Label Proportion | | Tencent Amsterdam | https://arxiv.org/pdf/2408.06743
Dan Stowell | elodie.briefer@bio.ku.dk | Sound evidence for biodiversity monitoring | | Tilburg |
Ben van Werkhoven | b.van.werkhoven@liacs.leidenuniv.nl | FAIR Sharing of Data in Autotuning Research | | Leiden |
Email Templates
Idea 1 - People on OpenML Slack
Hello, this is Subhaditya, a research engineer from OpenML. We are doing a small user study for version 2.0 of OpenML Python and would love to hear about your experience using OpenML.
- Would you be open to a chat?
- If you are in NL, we can meet over coffee.
- Hello <>, this is Subhaditya, a developer at OpenML. It's nice to e-meet you :)
- I saw that you recently used OpenML for a paper, and would love to hear about your experience using it. We are currently running a user study for version 2.0 of OpenML Python, and any views are welcome.
- Would you be open to a chat sometime next week?
Idea 2 - People Who Don't Know OpenML
Hello <name>,
I’m Subhaditya, a researcher at TU Eindhoven working on an NWO project about open science in machine learning. It’s centered around OpenML, a free and open platform for sharing datasets and benchmarks (maybe you’ve heard about it?). I’m exploring how to make it easier to use and help accelerate scientific research.
I’d love to talk to you about your experiences in storing research data and how we can help researchers across the Netherlands.
Would you or someone in your lab be open to a brief call sometime? I made a calendar where you can pick any time that suits you. If you’re in the Netherlands, I’d also be happy to meet in person over coffee.
Idea 3
- local /Eindhoven
- NWO project
Hello, this is Subhaditya, a research engineer from OpenML. We are a platform for collaborative open science with features for storing datasets and benchmarks easily and reproducibly. We are a team based at TU Eindhoven; maybe you have heard of us before?
We would very much like to hear about your experiences with storing research data and whether there is some way we can help.
Our goal is to make open science more accessible for the research community, especially in the context of broader ML research. All the data is free to store and to access, from the web or via programmatic interfaces. You can use OpenML to share your data and experiments. Many of our users also use it in their research papers.
Perhaps this might be useful to your lab?
We are currently working on a new version of our API and trying to make it more user-friendly for researchers. To that end, would you or someone in your lab be open to a brief chat sometime next week? If you’re in the Netherlands, I’d be happy to meet in person over coffee. If not, we can set up a short call instead.
Damian
- dataset size
- GUI does not really work for it
- Bitbucket
- dataset class
- evaluation datasets
- AutoML
- company - PDF export, Excel export
- monetization (probably not but yeah)
- academia - model download
- information - metrics, performance metrics, how long it takes to run
- what to do next from here?
- data distribution is narrow for example
- how to optimize different metrics
- run AutoAttack
- model: adversarial robustness guarantees - what can be derived from the data
- similar datasets
- acting like a proxy
- how can you help people with all the knowledge in OpenML
Irene
- Dataset upload: This process seems very straightforward, so overall it is easy enough to use.
- def_tar_att and ‘Ignore attribute’: Most users will likely know what to fill in there, but some people getting started with OpenML and machine learning in general may have a harder time. Perhaps it’d be good to add a link to this page or a short description?
- Collection date: I am not sure if you are looking for some uniformity there, but it would be nice to add instructions on the date/month/year format you prefer.
- Auto-ML reports: Since I won’t be an end user, it’s a bit difficult for me to provide feedback here. I am looping in my colleague @Yue Zhao from the SURF HPML Team, who joined our meeting in Eindhoven and who may provide feedback herself or forward this request to someone else within the HPML team.
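Sketch for Irene's collection date point: one way the upload form could accept a few common inputs and store them in a single ISO 8601 style. The accepted formats and the function name `normalize_collection_date` are assumptions for illustration, not anything OpenML prescribes today.

```python
from datetime import datetime

# (input format, output format) pairs. The list is an assumption for this
# sketch; day-first is assumed for slash and dash dates.
FORMATS = (
    ("%Y-%m-%d", "%Y-%m-%d"),
    ("%d-%m-%Y", "%Y-%m-%d"),
    ("%d/%m/%Y", "%Y-%m-%d"),
    ("%m/%Y", "%Y-%m"),
    ("%Y", "%Y"),
)

def normalize_collection_date(raw):
    """Return the date in ISO 8601 style, keeping only the precision given."""
    text = raw.strip()
    for parse_format, output_format in FORMATS:
        try:
            parsed = datetime.strptime(text, parse_format)
        except ValueError:
            continue  # try the next accepted format
        return parsed.strftime(output_format)
    raise ValueError(
        f"Unrecognized collection date {raw!r}; "
        "expected YYYY-MM-DD, DD-MM-YYYY, DD/MM/YYYY, MM/YYYY or YYYY"
    )

print(normalize_collection_date("03/07/2014"))
print(normalize_collection_date("07/2014"))
print(normalize_collection_date("2014"))
```

The error message doubles as the format instructions Irene asked for, so users see the expected formats exactly when they need them.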
Aparna
- part-time lecturer in business at TU Dublin + PhD
- hasn't used it much before
- KEEL
- randomized dataset
- different variants of datasets
- hard to understand which is original
- keep track of results manually
- ran small scale only and OpenML feels large scale
- found OpenML because of datasets
- unique datasets
- feels like it'll do everything automatically
- PhD
- metalearning - feature selection recommendation
- ~100 datasets
- auto extraction of target variables
- first and last col : column
- image features - issues (like MNIST)
Guillaume
- probabl
- we met during the hackathon as well
- scikit-learn
- Parquet not done
- fetch from OpenML - merged
- skrub - data preprocessing
- use files from OpenML
- logs about files
- away from simple CSVs
- more than one single file
- upload several sources of data to OpenML and connect
- difficult part - harder data
- real life datasets
- what's there now is well curated
- time series data - not so easy to find
- so try to generate synthetic data since it's hard to find
Ihsan Ullah
- Research Software Engineer at ChaLearn. "The projects I work on span machine learning, competitions, software development, academic writing, etc."
- He and his team were very happy with the OpenML team and its support
- Uses Codabench
- mainly focused on competitions
- on Codabench datasets/experiments/tasks are not visible like in OpenML but I think OpenML is the best when we want to find a specific item (runs/tasks/datasets etc)
- Problems with OpenML
- when you search Meta_Album you get nothing but with Meta Album you get the correct datasets (Personal note. This really is true :/ I just checked)
- Yes, I had a bad experience when uploading through the UI, but the Python API works really well. And the support from OpenML is also great.
- Difficult to understand what OpenML does
- add some videos, diagrams etc (Personal note - Bring back the cute robot!!)
- UI upload was super bad, it seems
- Modernize the UI (Personal note - Probably done with the new version)
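Sketch for the Meta_Album search problem above: a minimal, hypothetical normalization step that treats underscores, hyphens and spaces as the same separator before matching. How the OpenML search backend actually tokenizes queries was not checked here; `normalize`, `search` and the dataset names are illustrative only.

```python
import re

def normalize(text):
    """Lowercase the text and split it on underscores, hyphens and whitespace."""
    return [token for token in re.split(r"[\s_\-]+", text.lower()) if token]

def search(query, dataset_names):
    """Return the names that contain every token of the query."""
    wanted = set(normalize(query))
    return [name for name in dataset_names if wanted <= set(normalize(name))]

# Made-up dataset names, for illustration only.
names = ["Meta_Album_Birds", "meta-album plants", "iris"]

print(search("Meta_Album", names))
print(search("Meta Album", names))
```

With this step both spellings of the query return the same two datasets, which is the behaviour Ihsan expected.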
Sebastian Fisher
- hardest to use
- weird error messages
- mark a column with the wrong type: evaluation engine runs and the dataset stays unprocessed - no error messages
- as long as you follow the main path
- so many features but not everything up to same standard
- survival task - doesn't really work
- benchmark datasets and stuff are indistinguishable, task collections are just confusing (for NeurIPS)
- edit the description
- researcher doesn't need the AutoML report
- but still useful
- upload datasets where the point is not to run on it
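Sketch for the silent wrong-type failure Sebastian described: a hypothetical pre-upload check that fails loudly and names the offending column and row, instead of leaving the dataset unprocessed. The type names (`numeric`, `string`) and the function `check_declared_types` are assumptions for illustration, not the evaluation engine's real logic.

```python
def check_declared_types(rows, declared):
    """Raise ValueError naming the first value that does not fit its declared type."""
    casters = {"numeric": float, "string": str}
    for column, kind in declared.items():
        if kind not in casters:
            raise ValueError(f"column {column!r}: unknown type {kind!r}")
        for index, row in enumerate(rows):
            try:
                casters[kind](row[column])
            except ValueError:
                raise ValueError(
                    f"column {column!r}, row {index}: {row[column]!r} is not {kind}"
                ) from None

# Made-up rows; the second "age" value cannot be read as a number.
rows = [{"age": "31", "city": "Eindhoven"}, {"age": "n/a", "city": "Paris"}]

try:
    check_declared_types(rows, {"age": "numeric", "city": "string"})
    message = "ok"
except ValueError as error:
    message = str(error)
print(message)
```

Running a check like this at upload time would give the user the error immediately, rather than after the evaluation engine has quietly given up.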
General
- User page