A Dataset Exploration Case Study with Know Your Data


Data underlies much of machine learning (ML) research and development, helping to structure what a machine learning algorithm learns and how models are evaluated and benchmarked. However, data collection and labeling can be complicated by unconscious biases, data access limitations and privacy concerns, among other challenges. As a result, machine learning datasets can reflect unfair social biases along dimensions of race, gender, age, and more.

Methods of examining datasets that can surface information about how different social groups are represented within them are a key component of ensuring that development of ML models and datasets is aligned with our AI Principles. Such methods can inform the responsible use of ML datasets and point toward potential mitigations of unfair outcomes. For example, prior research has demonstrated that some object recognition datasets are biased toward images sourced from North America and Western Europe, prompting Google’s Crowdsource effort to balance out image representations in other parts of the world.

Today, we demonstrate some of the functionality of a dataset exploration tool, Know Your Data (KYD), recently introduced at Google I/O, using the COCO Captions dataset as a case study. Using this tool, we find a range of gender and age biases in COCO Captions, biases that can be traced to both dataset collection and annotation practices. KYD is a dataset analysis tool that complements the growing suite of responsible AI tools being developed across Google and the broader research community. Currently, KYD only supports analysis of a small set of image datasets, but we’re working hard to make the tool accessible beyond this set.

Introducing Know Your Data
Know Your Data helps ML research, product and compliance teams understand datasets, with the goal of improving data quality and thus helping to mitigate fairness and bias issues. KYD offers a range of features that allow users to explore and examine machine learning datasets: users can filter, group, and study correlations based on annotations already present in a given dataset. KYD also presents automatically computed labels from Google’s Cloud Vision API, providing users with a simple way to explore their data based on signals that weren’t originally present in the dataset.

A KYD Case Study
As a case study, we explore some of these features using the COCO Captions dataset, an image dataset that contains five human-generated captions for each of over 300k images. Given the rich annotations provided by free-form text, we focus our analysis on signals already present within the dataset.

Exploring Gender Bias
Previous research has demonstrated undesirable gender biases within computer vision datasets, including pornographic imagery of women and image label correlations that align with harmful gender stereotypes. We use KYD to explore gender biases within COCO Captions by examining gendered correlations within the image captions. We find a gender bias in the depiction of different activities across the images in the dataset, as well as biases relating to how people of different genders are described by annotators.

The first part of our analysis aimed to surface gender biases with respect to different activities depicted in the dataset. We examined images captioned with words describing different activities and analyzed their relation to gendered caption words, such as “man” or “woman”. The KYD Relations tab makes it easy to examine the relation between two different signals in a dataset by visualizing the extent to which two signals co-occur more (or less) than would be expected by chance. Each cell indicates either a positive (blue color) or negative (orange color) correlation between two specific signal values, along with the strength of that correlation.
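To make the idea of co-occurring "more (or less) than expected by chance" concrete, here is a minimal sketch in Python. It is not KYD's implementation, and the statistic KYD uses is not specified in this post. The sketch assumes a simple observed-over-expected ratio under independence, which is consistent with the ratios reported in the figure captions (for example, 12 observed versus ~60 expected gives 0.2x). The function name, the tokenization, and the toy captions are all hypothetical.

```python
from typing import Iterable, Set


def cooccurrence_ratio(captions: Iterable[str], terms_a: Set[str], terms_b: Set[str]) -> float:
    """Observed / expected co-occurrence of two term groups across captions.

    Assumption: "expected by chance" means the count we would see if the two
    term groups appeared in captions independently of each other.
    """
    total = count_a = count_b = count_both = 0
    for caption in captions:
        # Crude tokenization (an assumption): lowercase, drop periods, split on whitespace.
        words = set(caption.lower().replace(".", " ").split())
        has_a = bool(words & terms_a)
        has_b = bool(words & terms_b)
        total += 1
        count_a += has_a
        count_b += has_b
        count_both += has_a and has_b
    if total == 0 or count_a == 0 or count_b == 0:
        return float("nan")  # ratio is undefined when a group never appears
    expected = count_a * count_b / total
    return count_both / expected


# Toy captions, invented for illustration only (not taken from COCO Captions).
captions = [
    "A woman cooking in a kitchen.",
    "A man skateboarding down a ramp.",
    "A man surfing a large wave.",
    "A woman shopping at a market.",
]
ratio_women = cooccurrence_ratio(captions, {"woman", "women"}, {"cooking", "shopping"})
ratio_men = cooccurrence_ratio(captions, {"man", "men"}, {"cooking", "shopping"})
```

A ratio above 1 corresponds to a positive relation and a ratio below 1 to a negative one. On real data, small counts make the ratio noisy, so it is worth reading it alongside the raw co-occurrence counts.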

KYD also allows users to filter rows of a relations table based on substring matching. Using this functionality, we initially probed for caption words containing “-ing”, as a simple way to filter by verbs. We immediately saw strong gendered correlations:

Using KYD to analyze the relationship between any word and gendered words. Each cell shows if the two respective words co-occur in the same caption more (up arrow) or less often (down arrow) than pure chance.
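The substring filter described above can be approximated outside KYD in a few lines. This is a hypothetical sketch rather than KYD's own code: the helper name and the toy vocabulary are invented for illustration. It also shows why the filter is only a rough proxy for verbs, since a noun such as "ring" matches too.

```python
from typing import Iterable, List


def filter_terms(vocabulary: Iterable[str], substring: str) -> List[str]:
    """Return the vocabulary terms that contain `substring`, in sorted order."""
    return sorted(term for term in vocabulary if substring in term)


# Toy vocabulary, invented for illustration only.
vocabulary = {"cooking", "kitchen", "surfing", "ring", "man", "skateboarding"}
ing_terms = filter_terms(vocabulary, "ing")
print(ing_terms)  # ['cooking', 'ring', 'skateboarding', 'surfing']
```

Because substring matching catches false positives like "ring", the filtered rows are best treated as a starting point for manual inspection and not as a clean list of activities.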

Digging further into these correlations, we found that several activities stereotypically associated with women, such as “shopping” and “cooking”, co-occur with images captioned with “women” or “woman” at a higher rate than with images captioned with “men” or “man”. In contrast, captions describing many physically intensive activities, such as “skateboarding”, “surfing”, and “snowboarding”, co-occur with images captioned with “man” or “men” at higher rates.

While individual image captions may not use stereotypical or derogatory language, as in the example below, if certain gender groups are over- (or under-) represented within a particular activity across the whole dataset, models developed from the dataset risk learning stereotypical associations. KYD makes it easy to surface, quantify, and make plans to mitigate this risk.

An image with one of the captions: “Two women cooking in a beige and white kitchen.” Image licensed under CC-BY 2.0.

In addition to examining biases with respect to the social groups depicted with different activities, we also explored biases in how annotators described the appearance of people they perceived as male or female. Inspired by media scholars who have examined the “male gaze” embedded in other forms of visual media, we examined the frequency with which individuals perceived as women in COCO are described using adjectives that position them as an object of desire. KYD allowed us to easily examine co-occurrences between words associated with binary gender (e.g. “female/girl/woman” vs. “male/man/boy”) and words associated with evaluating physical attractiveness. Importantly, these are captions written by human annotators, who are making subjective assessments about the gender of people in the image and choosing a descriptor for attractiveness. We see that the words “attractive”, “beautiful”, “pretty”, and “sexy” are overrepresented in describing people perceived as women as compared to those perceived as men, confirming what prior work has said about how gender is viewed in visual media.

A screenshot from KYD showing the relationship between words that describe attractiveness and gendered words. For example, “attractive” and “male/man/boy” co-occur 12 times, but we expect ~60 times by chance (the ratio is 0.2x). On the other hand, “attractive” and “female/woman/girl” co-occur 2.62 times more than chance.

KYD also allows us to manually inspect images for each relation by clicking on the relation in question. For example, we can see images whose captions include female terms (e.g. “woman”) and the word “beautiful”.

Exploring Age Bias
Adults older than 65 have been shown to be underrepresented in datasets relative to their presence in the general population, so a first step toward improving age representation is to allow developers to assess it in their datasets. By looking at caption words describing different activities and analyzing their relation to caption words describing age, KYD helped us to assess the range of example captions depicting older adults. Having example captions of adults in a range of environments and activities is important for a variety of tasks, such as image captioning or pedestrian detection.

The first trend that KYD made clear is how rarely annotators described people as older adults in captions detailing different activities. The relations tab also shows a trend wherein “elderly”, “old”, and “older” tend not to occur with verbs that describe a variety of physical activities that might be important for a system to be able to detect. It is important to note that, relative to “young”, “old” is more often used to describe things other than people, such as belongings or clothing, so these relations also capture some uses that don’t describe people.

The relationship between words associated with age and movement from a screenshot of KYD.

The underrepresentation of captions containing references to older adults that we examined here could be rooted in a relative lack of images depicting older adults, as well as in a tendency for annotators to omit older age-related terms when describing people in images. The relation between “old” and “running” is negative, but manual inspection of the images at that intersection shows no older people and a number of locomotives. KYD makes it easy to quantitatively and qualitatively inspect relations to identify dataset strengths and areas for improvement.

Understanding the contents of ML datasets is a critical first step to developing suitable strategies to mitigate the downstream impact of unfair dataset bias. The above analysis points towards several potential mitigations. For example, correlations between certain activities and social groups, which can lead trained models to reproduce social stereotypes, can potentially be mitigated by “dataset balancing”, that is, increasing the representation of under-represented group/activity combinations. However, mitigations focused exclusively on dataset balancing are not sufficient, as our analysis of how different genders are described by annotators demonstrated. We found that annotators’ subjective judgements of people portrayed in images were reflected within the final dataset, suggesting that a deeper look at methods of image annotation is needed. One solution for data practitioners who are developing image captioning datasets is to consider integrating guidelines that have been developed for writing image descriptions that are sensitive to race, gender, and other identity categories.
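As a rough illustration of what planning for dataset balancing might involve, the sketch below counts group/activity combinations and reports how many additional examples each under-represented combination would need to match the best-represented group for that activity. This is one hypothetical reading of “dataset balancing”, not a procedure described in this post or offered by KYD. The function name, the equal-count target, and the toy data are assumptions. As noted above, balancing counts alone does not address bias in how annotators describe people.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (group, activity)


def balancing_gaps(pairs: Iterable[Pair]) -> Dict[Pair, int]:
    """Shortfall of each (group, activity) combination relative to the
    best-represented group for the same activity.

    Assumption: the balancing target is equal counts per group within each activity.
    """
    counts = Counter(pairs)
    groups = {group for group, _ in counts}
    activities = {activity for _, activity in counts}
    gaps: Dict[Pair, int] = {}
    for activity in activities:
        target = max(counts[(group, activity)] for group in groups)
        for group in groups:
            shortfall = target - counts[(group, activity)]
            if shortfall > 0:
                gaps[(group, activity)] = shortfall
    return gaps


# Toy (group, activity) pairs, invented for illustration only.
pairs = [("woman", "cooking")] * 3 + [("man", "cooking")] + [("man", "surfing")] * 2
gaps = balancing_gaps(pairs)
# gaps == {("man", "cooking"): 2, ("woman", "surfing"): 2}
```

Other targets, such as matching population proportions in place of equal counts, would be equally reasonable choices depending on the downstream task.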

The above case studies highlight only some of the KYD features. For example, Cloud Vision API signals are also integrated into KYD and can be used to infer signals that annotators haven’t labeled directly. We encourage the broader ML community to perform their own KYD case studies and share their findings.

KYD complements other dataset analysis tools being developed across the ML community, including Google’s growing Responsible AI toolkit. We look forward to ML practitioners using KYD to better understand their datasets and mitigate potential bias and fairness concerns. If you have feedback on KYD, please write to knowyourdata-feedback@google.com.

The analysis and write-up in this post were conducted with equal contribution by Emily Denton, Mark Díaz, and Alex Hanna. We thank Marie Pellat, Ludovic Peran, Daniel Smilkov, Nikhil Thorat and Tsung-Yi for their contributions to and reviews of this post.

