Sophos and ReversingLabs Introduce a Database of 20 Million Samples for Information Security Researchers

Last updated: 2020/12/15 at 6:15 PM
Jim Koohyar Biniyaz Published December 15, 2020

Information security companies Sophos and ReversingLabs have announced the release of the SoReL-20M database, which consists of 20 million Windows Portable Executable (PE) files. Of these, 10 million are malware samples.

The database, intended to benefit the information security industry, provides metadata, labels and extracted features for the files, and also allows interested parties to download the available malware samples for further research. A publicly available dataset containing carefully selected samples and relevant metadata is expected to help accelerate research into the use of machine learning for malware detection.

Although machine learning models are built on data, the information security field has no standard large-scale database that is easily accessible to everyone, from independent researchers to information security laboratories and corporations. According to Sophos experts, the lack of such a database has impeded the development of the information security sector.

“Collecting large numbers of carefully selected, labeled samples is costly and complex, and sharing datasets is often complicated by intellectual property issues and the risk of exposing malware to unknown third parties. As a result, most malware detection research uses proprietary internal datasets, so the results cannot be compared,” Sophos said.

The industrial-scale SoReL-20M database, covering 20 million samples, including 10 million disarmed malware samples, is designed to solve this problem. For each sample, the database contains features extracted based on the EMBER 2.0 dataset, labels, and detection metadata; for the malware samples, it also includes the complete binaries.

The release also includes PyTorch and LightGBM machine learning models trained on this data, as well as scripts for loading and iterating over the data and for training and testing the models.

Sophos acknowledges that experienced attackers could use the database to their advantage and create tools for carrying out cyberattacks. However, according to the company's experts, attackers already have many other sources from which to obtain information about malware.
