The corpus contains each sample together with its transcription and the date on which it appeared on social media or other digital platforms. Where the original spelling deviates from current standards, a normalized transcription is also provided, prepared in accordance with the current rules of Spanish orthography.
Each sample in the corpus has a unique identifier code, which allows classification according to several criteria:
- Original format of the sample (1 = image; 2 = video; 3 = GIF; 4 = text; 5 = multi-format; 0 = other);
- Source social network or platform (1 = X/Twitter; 2 = Facebook; 3 = Instagram; 4 = YouTube; 5 = TikTok; 6 = Reddit; 7 = WhatsApp; 8 = Bluesky; 9 = Tumblr; 0 = other);
- Country of origin of the account or page from which the sample was obtained (ES = Spain; MX = Mexico; AR = Argentina, etc.);
- Unique identifier within the corpus.

For example, the code 00027_ES_MEME_1_2 corresponds to sample number 27 from Humnet, classified as a meme, in image format (1), published on Facebook (2), and originating from Spain (ES).
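The identifier scheme described above can be illustrated with a minimal parsing sketch. It rests on several assumptions that the text does not confirm: the field order (number, origin, category, format, platform) is inferred from the single example code; the category field (here MEME) is treated as a free uppercase label, since its possible values are not listed; and the origin is assumed to be a two-letter code. The names `SampleCode` and `parse_sample_code` are hypothetical and not part of the corpus tooling.

```python
import re
from dataclasses import dataclass

# Lookup tables copied from the criteria listed in the text.
FORMATS = {1: "image", 2: "video", 3: "GIF", 4: "text", 5: "multi-format", 0: "other"}
PLATFORMS = {
    1: "X/Twitter", 2: "Facebook", 3: "Instagram", 4: "YouTube", 5: "TikTok",
    6: "Reddit", 7: "WhatsApp", 8: "Bluesky", 9: "Tumblr", 0: "other",
}

# Assumed layout, inferred from the example 00027_ES_MEME_1_2:
# <number>_<origin>_<category>_<format digit>_<platform digit>
CODE_RE = re.compile(r"(\d+)_([A-Z]{2})_([A-Z]+)_(\d)_(\d)")


@dataclass(frozen=True)
class SampleCode:
    number: int          # sample number within the corpus
    origin: str          # origin code of the account/page, e.g. "ES"
    category: str        # classification label, e.g. "MEME" (values assumed)
    sample_format: str   # decoded original format
    platform: str        # decoded source platform


def parse_sample_code(code: str) -> SampleCode:
    """Split an identifier code into its fields and decode the numeric ones."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"Unrecognized identifier code: {code!r}")
    number, origin, category, fmt, platform = match.groups()
    if int(fmt) not in FORMATS:
        # Only digits 0-5 are defined for the format field.
        raise ValueError(f"Unknown format digit {fmt} in {code!r}")
    return SampleCode(
        number=int(number),
        origin=origin,
        category=category,
        sample_format=FORMATS[int(fmt)],
        platform=PLATFORMS[int(platform)],
    )


if __name__ == "__main__":
    print(parse_sample_code("00027_ES_MEME_1_2"))
```

Under these assumptions, parsing the example code yields sample number 27, origin ES, category MEME, format image, and platform Facebook, which matches the reading given in the text. Codes that do not follow the assumed layout, or that use an undefined format digit, raise a `ValueError` rather than being decoded silently.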