Category "large-data"

Best way to compare two large files on multiple columns

I am working on a feature that will allow users to upload two CSV files, write rules to compare the rows, and output the result to a file. Both files can ha

How to chunk and generate a PDF using Dompdf in CodeIgniter for a large data set

I have to generate a PDF file for a very large data set (more than 1M). Can anyone explain how to chunk that data set into smaller units and download all data in

How to feed a list of NumPy arrays into a TensorFlow model?

I have a large list of NumPy arrays that I want to feed into a TensorFlow model. I cannot concatenate the lists into one due to memory issues. Below, I hav

How to split a vcf.gz file based on the first column, keeping the header in each subset, and save the results back to vcf.gz files

I have a large vcf.gz file (40GB) that I have to split so that I can load it into R and run a script on each of the subsets. I want to split it by the first column

Scalable/Iterative Large Data Frame Dimensionality Reduction in R

I often have truly large data frames (i.e., 10 to 40 columns, millions to hundreds of millions of rows) that I would like to perform dimensionality reduction on in

Errors due to Vowpal Wabbit's dependencies on the Boost library

I'm trying hard to install Vowpal Wabbit, and it fails when I run the makefile, throwing: cd library; make; cd .. g++ -g -o ezexample temp2.cc -

PowerShell large hex range

I am trying to generate a range of hex values from a000000000 to afffffffff, but the values are too large for Int32. I get the error: Cannot convert value "687194767360" to type