AI platforms, particularly generative systems such as the large language models developed by companies like OpenAI, draw on three kinds of data for training: publicly available information; images, sound recordings, videos, logos, or text protected by copyright, image rights, or related intellectual property (IP) regimes; and material that was once protected as IP but, because the term of protection has lapsed, is now freely available for use.
