7z on array

Python library to unzip 7z files straight to numpy arrays

7z_on_array

This small library handles .7z files without extracting them to the hard disk. It uses the 7zip command-line program to decompress files directly into memory and then converts them to arrays.
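
The in-memory approach can be sketched as follows. This is a minimal illustration and not the library's own code: it assumes Python 3, a `7z` executable on the PATH, and that `7z e -so` writes the extracted files to standard output one after another. The function names `read_archive_bytes` and `split_stream` are hypothetical.

```python
import subprocess


def read_archive_bytes(archive_path):
    """Return the contents of every file in the archive as one bytes object.

    Assumes a `7z` executable is on the PATH. `e` extracts and `-so`
    sends the extracted data to standard output instead of to disk.
    """
    result = subprocess.run(
        ["7z", "e", "-so", archive_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,  # raise CalledProcessError if 7z fails
    )
    return result.stdout


def split_stream(raw, sizes):
    """Cut the concatenated stream back into one bytes object per file.

    `sizes` holds the uncompressed size of each file, in archive order.
    """
    if sum(sizes) != len(raw):
        raise ValueError("sizes do not add up to the stream length")
    chunks = []
    start = 0
    for size in sizes:
        chunks.append(raw[start:start + size])
        start += size
    return chunks
```

The per-file sizes would come from a listing of the archive (for example `7z l`), which is presumably the role `get_files_info` plays in this library.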

It is aimed at the image files distributed for the Kaggle CIFAR competition, but it can be adapted to handle other compressed files.
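
For PNG images such as the CIFAR ones, the step from raw bytes to an array might look like the sketch below. It assumes Pillow and NumPy are installed and is not necessarily how this library does the conversion; `png_bytes_to_array` and `stack_images` are hypothetical names.

```python
import io

import numpy as np
from PIL import Image


def png_bytes_to_array(png_bytes):
    """Decode one in-memory PNG file into a (height, width, 3) uint8 array."""
    with Image.open(io.BytesIO(png_bytes)) as img:
        # Convert to RGB so every image has the same channel layout.
        return np.asarray(img.convert("RGB"), dtype=np.uint8)


def stack_images(png_files):
    """Decode equally sized PNG files and stack them into one 4-D array."""
    return np.stack([png_bytes_to_array(b) for b in png_files])
```

Adapting the idea to other file types would mainly mean replacing the decoding function with one that suits the format.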

In the first test, decompressing each file to disk and then reading it would have taken 2 hours to complete.

Using this library, it took 1 minute.

References and places to find help

- http://stackoverflow.com/questions/646286/python-pil-how-to-write-png-image-to-string
- https://docs.python.org/2/library/stringio.html
- http://stackoverflow.com/questions/11552926/how-to-read-raw-png-from-an-array-in-python-opencv/17547525#17547525
- http://stackoverflow.com/questions/25186591/having-cv2-imread-reading-images-from-file-objects-or-memory-stream-like-data-h
- http://superuser.com/questions/548349/how-can-i-install-7zip-so-i-can-run-it-from-terminal-on-os-x

How to use

Copy the 7z_on_array directory into the same directory as your own code.

Then:

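    import importlib
    import os

    # The name starts with a digit, so a plain "import 7z_on_array" is a syntax error
    sz = importlib.import_module('7z_on_array')

    # Expand "~" explicitly; it is not expanded outside a shell
    file_name = os.path.expanduser('~/Machine Learning/Kaggle/CIFAR/train.7z')
    f_info = sz.get_files_info(file_name)      # information about the files in the archive
    raw_data = sz.uncompress_file(file_name)   # archive contents decompressed into memory
    data = sz.get_files_array(f_info, raw_data)  # converted to arrays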