Fair question. I didn't personally supervise our last intern (it was my turn the summer before), so I'm not as deeply familiar with it. Now that you bring this up, though, I think I may have misspoken. When I said multi-label, that was our original goal, but because of the Snorkel constraints you mentioned, we ended up reframing the problem as many single-class models instead. Either approach would have worked, but given how our business users operated, multi-label wasn't that important. Not every business user cares about every label, so what happened, I believe, is that one model was trained per label and the results were ensembled based on each user's interests. Our final output let users sort, filter, and search documents on any combination of these labels. Keep in mind, too, that some of these labels are fairly abstract, so even one of them was fairly powerful on its own and could in some cases power an entire team. I hope that helps; I'm sorry I can't go into much more detail.
yeah, you can do single-label w/ Snorkel, but not multi-label. Multi-label Snorkel would be the killer feature, because constructing the negatives (i.e., for a softmax) is very hard, especially when you work w/ user-interaction systems that have an unknown negative distribution.
You can always do multi-label as a multi-task learning model (or just a set of binary models), which is something we (and many others) have explored before! A lot of the adjustments for mainline Snorkel have to do with (A) the semantics of the labeling functions (you need to be able to express that something is *not* class A, and/or have a general per-class prior) and (B) all the infra to support what is, at base, just a bunch of independent per-label binary tasks.
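To make point (A) concrete, here is a minimal sketch of what per-label binary labeling-function semantics might look like. Everything here is illustrative: the function names, the example label ("billing"), and the majority-vote combiner are hypothetical stand-ins, not Snorkel's actual API (a real pipeline would use a per-label `LabelModel`). The key point is that with a binary vote space per label, an LF can express a negative ("this is NOT class A") rather than only voting for a class or abstaining.

```python
# Per-label binary vote space: an LF can say "not this class", not just abstain.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_mentions_refund(doc: str) -> int:
    # Votes POS for the hypothetical "billing" label when refund language appears.
    return POS if "refund" in doc.lower() else ABSTAIN

def lf_short_doc_not_billing(doc: str) -> int:
    # Expresses a *negative*: very short docs are unlikely to be billing issues.
    return NEG if len(doc.split()) < 4 else ABSTAIN

BILLING_LFS = [lf_mentions_refund, lf_short_doc_not_billing]

def combine_votes(doc: str, lfs) -> int:
    # Toy stand-in for a per-label label model: majority vote over non-abstains.
    votes = [v for v in (lf(doc) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    pos = sum(v == POS for v in votes)
    neg = sum(v == NEG for v in votes)
    return POS if pos >= neg else NEG

print(combine_votes("I want a refund for this charge", BILLING_LFS))  # 1 (POS)
print(combine_votes("help", BILLING_LFS))                             # 0 (NEG)
```

In mainline (multi-class) Snorkel, by contrast, an LF votes for one of k classes or abstains, so there is no direct way to encode "definitely not class A" without that vote implying some other class.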
Snorkel has a label mutual-exclusion assumption, right?
My core problem is a multi-label problem, but my Snorkel data, coming from the LabelModel, is inherently single-label (mutually exclusive). What is the prevailing recommendation for doing multi-label w/ Snorkel? Is the below what you are currently recommending?
For a given k-way multi-label problem:
1. Generate k binary datasets w/ LabelModel
2. Train k separate binary classifiers for each respective dataset
3. At inference/prediction time, pass the input through the k classifiers and collect the scores.
Is this what the current recommendation is? Create a set of binary classifiers?
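Something like the following sketch of those three steps, maybe? To keep it self-contained, the per-label probabilistic labels that k Snorkel `LabelModel`s (each with `cardinality=2`) would produce in step 1 are simulated here with synthetic binary labels, and scikit-learn's `LogisticRegression` stands in for whatever end classifiers get trained in step 2; none of this is a claimed Snorkel API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
k, n, d = 3, 200, 5                       # k labels, n docs, d features
X = rng.normal(size=(n, d))

# Step 1 (simulated): one binary label vector per label, standing in for the
# output of k LabelModels with cardinality=2.
Y = [(X @ rng.normal(size=d) > 0).astype(int) for _ in range(k)]

# Step 2: train k independent binary classifiers, one per label.
clfs = [LogisticRegression().fit(X, y) for y in Y]

# Step 3: at inference, pass the input through all k classifiers and stack
# the positive-class scores into an (n, k) multi-label score matrix.
scores = np.column_stack([c.predict_proba(X)[:, 1] for c in clfs])
print(scores.shape)  # (200, 3)
```

The final (n, k) matrix is then what you would threshold (or sort/filter on) per label downstream.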