Using Windows Batch Files to split large "Wrapped" files

I need a method to split a file into multiple pieces (or even just in half) based on size in KB, not on number of lines.

I am a Senior EDI Analyst and wrapped data tends to show up as one single long line. Every "solution" I find splits based on number of lines. I need something that will split based on size.

The end-goal is to "Unwrap" this data, meaning each segment will be on its own line. To do this I need to change the delimiters (as there are "special characters" as delimiters).

I do have a solution for that (see below), but for some reason this will not work on files larger than 10 KB. If you know anything about EDI, that's not very big.

I need to find a solution to split files into smaller files of about 5KB each (then I can use the string replacement and re-combine them myself).

Does anyone have an idea of how I might accomplish this with one, huge line?

(Sorry I have to remove the code I placed here only as AN EXAMPLE because someone flagged this as a duplicate WITHOUT READING IT. Please read above and advise.)



Solution 1:[1]

You cannot process files larger than about 10 KB because batch variables (and command lines) are limited to ~8191 characters.

You are attacking the problem in an inefficient way. Rather than look for a way to split a file into chunks so that you can use your slow batch "solution", you should be looking for a tool that allows you to work with the large files directly, without resorting to splitting, processing, and re-assembly.

As others have stated, PowerShell, JavaScript, and VBS are all good scripting languages that can solve your problem, and they are native to Windows.

If your files are all less than 1 gigabyte in length, then I suggest you try JREPL.BAT - a regex text processing utility. It is pure script (hybrid batch/JScript) that runs natively on any Windows machine from XP onward - no 3rd party exe file required. Full documentation is available from the command line via jrepl /?, or jrepl /?? for paged help.

To unwrap a file, translating | into *\r\n (\r is a carriage return and \n is a line feed):

jrepl "|" "*\r\n" /l /m /x /f "wrappedFileName" /o "unwrappedFileName"

To wrap a file (reverse the process):

jrepl "*\r\n" "|" /l /m /x /f "unwrappedFileName" /o "wrappedFileName"

If you put either command within a batch script, then you must use call jrepl instead of jrepl. This is because JREPL is also a batch script, so control will not return to your script unless you use CALL.

Solution 2:[2]

Although your description is extensive, several points are unclear, and many unrelated details distract from the core of the problem. If each segment in the line is separated by a | delimiter (you did not explain this point, but I assume it from the example code) and you want to split the file based on a certain size in KB (you did not specify how many KB), then a segment may be split across two different files. Also, I don't understand how replacing the | delimiters with asterisks would help solve the problem. After reading this question several times, I assume that the problem is this:

"Split a file that contains just one very long line (with not a single CR+LF pair) into segments delimited by the | character, so each segment will be on its own line".

The Batch file below is a solution for this problem:

@echo off
setlocal EnableDelayedExpansion

call :ProcessFile  < input.txt  > output.txt
goto :EOF


:ProcessFile
set "previous="

:nextChunk
rem Read the next 1023-bytes chunk
set /P "chunk="
if errorlevel 1 goto endOfFile

rem Break segment if previous one ends at a chunk limit
if "!chunk:~0,1!" equ "|" if defined previous (
   echo !previous!
   set "previous="
)

rem Extract each segment from the chunk and place it on its own line
set "last="
for /F "delims=" %%a in (^"!chunk:^|^=^
% This line separate segments by the given delimiter %
!^") do (

   if defined last echo !last!
   set "last=!previous!%%a"
   set "previous="

)
set "previous=!last!"
goto nextChunk

:endOfFile
rem Show the last segment
if defined previous echo !previous!

exit /B
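
For readers who find the carry-over logic hard to follow in Batch, here is a minimal sketch of the same idea in plain JavaScript: read the input in chunks, emit every segment that is complete, and hold back the unfinished tail until the next chunk arrives. The names createUnwrapper, push, and flush are hypothetical and do not come from the Batch file above. Unlike the Batch file, this sketch keeps empty segments.

```javascript
// Sketch only: createUnwrapper, push, and flush are hypothetical names,
// not part of the Batch file above.
function createUnwrapper(delimiter) {
  var pending = ""; // tail of the previous chunk that has no delimiter yet

  return {
    // Feed one chunk; returns the segments completed by this chunk.
    push: function (chunk) {
      var parts = (pending + chunk).split(delimiter);
      pending = parts.pop(); // last piece may continue in the next chunk
      return parts;
    },
    // Call once at end of input; returns the final segment, if any.
    flush: function () {
      var rest = pending;
      pending = "";
      return rest === "" ? [] : [rest];
    }
  };
}

// Usage: chunk boundaries fall mid-segment and right before a delimiter.
var unwrapper = createUnwrapper("|");
var lines = []
  .concat(unwrapper.push("ISA*00*A|GS*P"))
  .concat(unwrapper.push("O*1"))
  .concat(unwrapper.push("|ST*850|"))
  .concat(unwrapper.flush());
console.log(lines.join("\r\n"));
```

The sample data is invented for illustration. The point is that "GS*P" and "O*1" arrive in different chunks but come out as one segment.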

EDIT: JScript solution added

As others have mentioned, you may also use a solution based on JScript, a standard scripting language preinstalled in all Windows versions from XP on. The solution then becomes really simple, because you just need to insert the following two lines in your Batch file:

echo WScript.Stdout.Write(WScript.Stdin.ReadAll().replace(/\^|/g,"\r\n")) > replace.js
cscript //nologo replace.js  < input.txt  > output.txt

This is a very simple, but powerful method that you may use in other similar replace operations; just read the corresponding documentation.
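
As a hedged illustration of reusing this method with other delimiters: EDI delimiters are often regex metacharacters (|, *, ^), so a general version has to escape the delimiter before building the pattern. The sketch below is plain JavaScript. The helper names escapeRegExp, unwrap, and wrap are hypothetical, and the sample data is invented.

```javascript
// Sketch only: escapeRegExp, unwrap, and wrap are hypothetical helper names.

// Escape regex metacharacters so a delimiter such as | or * is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every delimiter with CR+LF so each segment sits on its own line.
function unwrap(text, delimiter) {
  return text.replace(new RegExp(escapeRegExp(delimiter), "g"), "\r\n");
}

// Reverse the process: turn each CR+LF back into the delimiter.
function wrap(text, delimiter) {
  return text.split("\r\n").join(delimiter);
}

var wrapped = "ISA*00|GS*PO|ST*850";
var unwrapped = unwrap(wrapped, "|");
console.log(unwrapped);
console.log(wrap(unwrapped, "|") === wrapped);
```

Under cscript, the console.log calls would be replaced by WScript.Stdout.Write, with the input taken from WScript.Stdin.ReadAll() as in the two-line version above.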

Solution 3:[3]

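Split file into 5kB chunks (the commands are written for an interactive prompt with delayed expansion enabled, e.g. one started with cmd /v:on, because they use %i loop variables and !prev!):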

set file="x.edb"
set max=5000

REM Findstr line limit 8k
REM Workaround: wrap in an archive to generate CRLF pairs for chunks > 8kB

for %i in (%file%) do (
set /a num=%~zi/%max% >nul      &REM No. of chunks
set /a last=%~zi%%max% >nul     &REM size of last chunk
if !last!==0 set /a num=num-1       &REM Remove zero-byte chunk
set size=%~zi
)

ren %file% %file%.0

for /l %i in (1 1 %num%) do (
set /a s1=%i*%max% >nul
set /a s2="(%i+1)*%max%" >nul
set /a prev=%i-1 >nul

echo Writing %file%.%i
type %file%.!prev! | (
  (for /l %j in (1 1 %max%) do pause)>nul& findstr "^"> %file%.%i)

FSUTIL file seteof %file%.!prev! %max% >nul
)
if not %last%==0 FSUTIL file seteof %file%.%num% %last% >nul
echo Done.

Tested on Win 10
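
The chunk arithmetic above (number of full chunks, size of the last chunk) can also be sketched in plain JavaScript, which may be easier to verify than the Batch version. This is only an illustration under stated assumptions: splitBySize is a hypothetical helper, and it counts characters, which matches bytes only for single-byte data such as ASCII.

```javascript
// Sketch only: splitBySize is a hypothetical helper. It counts characters,
// which equals bytes only for single-byte (e.g. ASCII) data.
function splitBySize(text, maxChars) {
  var chunks = [];
  for (var start = 0; start < text.length; start += maxChars) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}

// Usage: 12,000 characters with a limit of 5000 give chunks of 5000, 5000, 2000.
var data = new Array(12001).join("x"); // builds a 12,000-character string
var pieces = splitBySize(data, 5000);
console.log(pieces.length);
console.log(pieces[pieces.length - 1].length);
console.log(pieces.join("") === data);
```

Joining the pieces in order restores the original text, so the chunks can be processed separately and recombined afterwards. Note that a fixed-size split can cut a segment in two.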

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    dbenham
Solution 2
Solution 3    Zimba