How do I extract data from a string into a 2d array?

Using axios, I am fetching data from a website. Unfortunately, the fetched data is in HTML format. The data looks like this:

1 Agartala VEAT 120830Z 23004KT 5000 HZ SCT018 SCT025 34/27 Q1004 NOSIG= 2 Ahmedabad VAAH 120830Z 23008KT 6000 NSC 44/21 Q1001 NOSIG= 3 Allahabad VEAB 120800Z 03006KT 6000 FEW025 39/26 Q0999 NOSIG= 4 Amritsar VIAR 120830Z VRB02KT 2800 DU NSC 42/13 Q1000 BECMG 3000= 5 Bangalore VOBL 120830Z 28014KT 6000 -DZ BKN008 SCT012 OVC080 24/21 Q1009 NOSIG= 6 Baroda VABO 120830Z 20008KT 6000 NSC 41/19 Q1001 NOSIG= 7 Bhaunagar VABV 120830Z 14016KT 5000 DU NSC 39/22 Q1002 NOSIG= 8 Bhopal VABP 120830Z 31010KT 6000 SCT030 43/03 Q1002 NOSIG= 9 Bhubaneswar ...

I have removed the HTML tags using some for loops. The original HTML data is:

...
<tr>
<td align="left" style="padding:3px; border-style:solid; border-width:1px; border- 
collapse:collapse; border-color:#3366aa;"><font style="font-family:verdana; font- 
size:11px; color:#000000;">1</font></td>

<td align="left" style="padding:3px; border-style:solid; border-width:1px; border- 
collapse:collapse; border-color:#3366aa;"><font style="font-family:verdana; font- 
size:11px; color:#000000;">Agartala   </font></td>

<td align="left" style="padding:3px; border-style:solid; border-width:1px; border- 
collapse:collapse; border-color:#3366aa;"><font style="font-family:verdana; font- 
size:11px; color:#000000;">VEAT 120830Z 23004KT 5000 HZ SCT018 SCT025 34/27 Q1004 NOSIG= 
</font></td>

</tr>
...

I want to extract the above data and store it in a 2d array like this: [['1', 'Agartala', 'VEAT 120830Z 23004KT 5000 HZ SCT018 SCT025 34/27 Q1004 NOSIG='], [...], ...]. I have tried extracting it with a simple for loop, but it does not work. This is the function I tried:

let extractData = () => {
    
    let str1 = cleanHTML(), count = 0;
    let tempList = [], str2 = '';
    
    for (let i = 0; i < str1.length; i++) {
        if (count == 0) {
            if (str1[i] != ' ') {
                str2 += str1[i];
                
            }
            else {
                //console.log(str2);
                tempList.push(parseInt(str2));
                
                count = 1;
                str2 = '';
            }
        }
        else if (count == 1) {
            //console.log(str1[i]);
            if (str1[i] != ' ') {
                str2 += str1[i];
            }
            else {
                tempList.push(str2);
                count = 2;
                str2 = '';
            }
        }
        else if (count == 2 && i+2<str1.length) {
            if (str1[i] != ' ' && str1[i+1] != ' ' && str1[i+2] != ' ') {
                str2 += str1[i];
            }
            else {
                tempList.push(str2);
                //console.log(tempList);
                count = 0;
                str2 = '';
                dataList.push(tempList);
                tempList = [];
            }
        }
    }

    console.log(dataList);
    
}

But the result is strange.

I have also tried checking for \t, \r and \n instead of a space, but then the result comes out different again. What am I doing wrong?



Solution 1:[1]

As Chris G suggested, you can achieve this by setting the HTML as the innerHTML of a div, then using querySelectorAll() to grab the <font> elements and read their .innerText.

Demo:

const axiosResponse = `<table><tr>
<td align="left" style="padding:3px; border-style:solid; border-width:1px; border- 
collapse:collapse; border-color:#3366aa;"><font style="font-family:verdana; font- 
size:11px; color:#000000;">1</font></td>

<td align="left" style="padding:3px; border-style:solid; border-width:1px; border- 
collapse:collapse; border-color:#3366aa;"><font style="font-family:verdana; font- 
size:11px; color:#000000;">Agartala   </font></td>

<td align="left" style="padding:3px; border-style:solid; border-width:1px; border- 
collapse:collapse; border-color:#3366aa;"><font style="font-family:verdana; font- 
size:11px; color:#000000;">VEAT 120830Z 23004KT 5000 HZ SCT018 SCT025 34/27 Q1004 NOSIG= 
</font></td>
</tr></table>`;

document.getElementById('showContent').innerHTML = axiosResponse;

const fontHTML = document.getElementById('showContent').querySelectorAll('font');

const res = [];

for (let i = 0; i < fontHTML.length; i++) {
    // trim() drops the padding spaces and line breaks around each cell's text
    res.push(fontHTML[i].innerText.trim());
}

console.log(res);
<div id="showContent">
</div>
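
The demo above produces one flat array, while the question asks for one sub-array per table row. Here is a minimal sketch of that last step, assuming every row contributes exactly three <font> cells (serial number, city, report). The helper name toRows is made up for illustration and is not part of the original answer:

```javascript
// Hypothetical helper: group a flat list of cell texts into rows of `size` cells.
// Assumes every table row contributes exactly `size` cells.
function toRows(cells, size = 3) {
    const rows = [];
    for (let i = 0; i < cells.length; i += size) {
        // Trim each cell to drop padding spaces and stray line breaks.
        rows.push(cells.slice(i, i + size).map((text) => text.trim()));
    }
    return rows;
}

// Cell texts as the demo would collect them; the second row is taken from the question's sample.
const cells = [
    '1', 'Agartala   ', 'VEAT 120830Z 23004KT 5000 HZ SCT018 SCT025 34/27 Q1004 NOSIG= \n',
    '2', 'Ahmedabad', 'VAAH 120830Z 23008KT 6000 NSC 44/21 Q1001 NOSIG=',
];

console.log(toRows(cells));
```

In the demo, this could be called as toRows(res). If some rows have a different number of cells, grouping by <tr> element would probably be safer than counting cells.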

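If only the tag-free text from the question is available (for example in Node.js, which has no document object by default), a regular expression may be easier to get right than a character-by-character loop. This is a rough sketch and not the approach from the answer above. It assumes each record is a serial number, a city name, and a report that starts with a four-letter station code followed by a six-digit timestamp ending in Z, and that the report ends at the first "=". The names RECORD and parseReports are made up for illustration:

```javascript
// Assumed record shape: serial number, city name, then a report that starts with a
// four-letter station code plus a six-digit timestamp ending in Z, and ends at the first "=".
const RECORD = /(\d+)\s+(.+?)\s+([A-Z]{4}\s+\d{6}Z[^=]*=)/gs;

// Hypothetical helper: turn the tag-free text into [serial, city, report] rows.
function parseReports(text) {
    return Array.from(text.matchAll(RECORD), ([, serial, city, report]) => [
        serial,
        city.trim(),
        report.trim(),
    ]);
}

// Three records copied from the question's sample data.
const sample =
    '1 Agartala VEAT 120830Z 23004KT 5000 HZ SCT018 SCT025 34/27 Q1004 NOSIG= ' +
    '2 Ahmedabad VAAH 120830Z 23008KT 6000 NSC 44/21 Q1001 NOSIG= ' +
    '4 Amritsar VIAR 120830Z VRB02KT 2800 DU NSC 42/13 Q1000 BECMG 3000=';

console.log(parseReports(sample));
```

Records that do not fit the assumed shape, such as a truncated final entry, are skipped rather than reported, so it is worth comparing the number of rows returned against the last serial number.
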
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rohìt Jíndal