Polars DataFrame TypeError: must be real number, not str

So basically, I switched from a pandas DataFrame to a Polars DataFrame for better speed in YOLOv5. When I run the code, it works fine up to some point (I don't know exactly when the error occurs), and then it gives me TypeError: must be real number, not str. Running it with pandas works without any errors; the error occurs only with Polars. I know it must be caused by a wrong data type, but I don't really know where to look since I've just started with Python. I would really appreciate it if someone could help me with this! Thanks for reading and have a nice day!

Traceback (most recent call last):
  File "C:\yolov5\test.py", line 61, in <module>
    boxes = results.polars().xywh[0]
  File "c:\yolov5\.\models\common.py", line 684, in polars
    setattr(new, k, [pl.DataFrame(x, columns=c) for x in a])
  File "c:\yolov5\.\models\common.py", line 684, in <listcomp>
    setattr(new, k, [pl.DataFrame(x, columns=c) for x in a])
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\frame.py", line 311, in __init__
    self._df = sequence_to_pydf(data, columns=columns, orient=orient)
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\construction.py", line 495, in sequence_to_pydf
    data_series = [
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\construction.py", line 496, in <listcomp>
    pli.Series(columns[i], data[i], dtypes.get(columns[i])).inner()
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\series.py", line 227, in __init__
    self._s = sequence_to_pyseries(name, values, dtype=dtype, strict=strict)
  File "C:\Users\jojow\AppData\Local\Programs\Python\Python39\lib\site-packages\polars\internals\construction.py", line 239, in sequence_to_pyseries
    return constructor(name, values, strict)
TypeError: must be real number, not str

Here's my code (edited):

import polars as pl 
import pandas as pd

class new:
    xyxy = 0

a = [[
    [370.01605224609375, 346.4305114746094, 398.3968811035156, 384.5684814453125, 0.9011853933334351, 0, 'corn'],
    [415.436767578125, 279.4227294921875, 433.930419921875, 305.5151672363281, 0.8829901814460754, 0, 'corn'],
    [383.8118896484375, 268.781494140625, 402.35479736328125, 292.4585266113281, 0.8579609394073486, 0, 'corn'],
    [431.42791748046875, 570.9154663085938, 476.672119140625, 600.0, 0.810459554195404, 0, 'corn'],
    [414.912841796875, 257.7676086425781, 427.7708740234375, 274.69635009765625, 0.7384995818138123, 0, 'corn'],
    [391.22821044921875, 250.48876953125, 403.9199523925781, 268.1374816894531, 0.6828912496566772, 0, 'corn'],
    [414.2362060546875, 250.18174743652344, 423.82537841796875, 264.02667236328125, 0.517136812210083, 0, 'corn'],
]]

ca = 'xmin', 'ymin', 'xmax', 'ymax', 'confidence', 'class', 'name'  # xyxy columns
cb = 'xcenter', 'ycenter', 'width', 'height', 'confidence', 'class', 'name'  # xywh columns

for k, c in zip(['xyxy', 'xyxyn', 'xywh', 'xywhn'], [ca, ca, cb, cb]):
    setattr(new, k, [pl.DataFrame(x, columns=c) for x in a])

print(new.xyxy[0])


Solution 1:[1]

Thanks for adding the data. It made solving the problem easy.

What you need to do is add orient="row" to your call to create a DataFrame:

pl.DataFrame(x, columns=c, orient="row")

Once we make the change to your code by adding the orient="row" keyword and re-run it, we get:

shape: (7, 7)
?????????????????????????????????????????????????????????????????????????????????
? xmin       ? ymin       ? xmax       ? ymax       ? confidence ? class ? name ?
? ---        ? ---        ? ---        ? ---        ? ---        ? ---   ? ---  ?
? f64        ? f64        ? f64        ? f64        ? f64        ? i64   ? str  ?
?????????????????????????????????????????????????????????????????????????????????
? 370.016052 ? 346.430511 ? 398.396881 ? 384.568481 ? 0.901185   ? 0     ? corn ?
?????????????????????????????????????????????????????????????????????????????????
? 415.436768 ? 279.422729 ? 433.9304   ? 305.515167 ? 0.8829     ? 0     ? corn ?
?????????????????????????????????????????????????????????????????????????????????
? 383.8118   ? 268.781494 ? 402.354797 ? 292.458527 ? 0.857961   ? 0     ? corn ?
?????????????????????????????????????????????????????????????????????????????????
? 431.427917 ? 570.915466 ? 476.672119 ? 600.0      ? 0.8104     ? 0     ? corn ?
?????????????????????????????????????????????????????????????????????????????????
? 414.912842 ? 257.767609 ? 427.770874 ? 274.6963   ? 0.7385     ? 0     ? corn ?
?????????????????????????????????????????????????????????????????????????????????
? 391.2282   ? 250.4887   ? 403.919952 ? 268.137482 ? 0.682891   ? 0     ? corn ?
?????????????????????????????????????????????????????????????????????????????????
? 414.236206 ? 250.181747 ? 423.825378 ? 264.026672 ? 0.517137   ? 0     ? corn ?
?????????????????????????????????????????????????????????????????????????????????

Why the orient keyword is needed in this case

Let's start with a simple example. We'll supply three lists, and two column names:

pl.DataFrame([[1.1, 'a'], [2.2, 'b'], [3.3, 'c']], columns=['col_1', 'col_2'])

In this example, Polars tries to deduce whether each list (for example, [1.1, 'a']) represents a row or a column. From the documentation for polars.DataFrame:

orient : {'col', 'row'}, default None

Whether to interpret two-dimensional data as columns or as rows. If None, the orientation is inferred by matching the columns and data dimensions. If this does not yield conclusive results, column orientation is used.

So, in the case above, Polars attempts to deduce whether each list represents a column or a row by looking at the number of column names in the columns keyword. Since there are three lists, but only two column names, Polars (correctly) deduces that each list must represent a row, not a column.

shape: (3, 2)
┌───────┬───────┐
│ col_1 ┆ col_2 │
│ ---   ┆ ---   │
│ f64   ┆ str   │
╞═══════╪═══════╡
│ 1.1   ┆ a     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2.2   ┆ b     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3.3   ┆ c     │
└───────┴───────┘

Now, let's remove one of the lists, so that there are two lists and two column names:

pl.DataFrame([[1.1, 'a'], [2.2, 'b']], columns=['col_1', 'col_2'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 311, in __init__
    self._df = sequence_to_pydf(data, columns=columns, orient=orient)
  File "/home/xxxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 495, in sequence_to_pydf
    data_series = [
  File "/home/xxxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 496, in <listcomp>
    pli.Series(columns[i], data[i], dtypes.get(columns[i])).inner()
  File "/home/xxxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/series.py", line 227, in __init__
    self._s = sequence_to_pyseries(name, values, dtype=dtype, strict=strict)
  File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 239, in sequence_to_pyseries
    return constructor(name, values, strict)
TypeError: must be real number, not str

That error looks very familiar, doesn't it?

Since there are now two lists and two column names, it's unclear whether each list represents a row or a column. So, per the documentation, Polars interprets each list as a column, and not as a row.

But that leads to a problem because each list (in this case, [1.1, 'a']) contains both numbers and strings, and a single column cannot mix the two. This is what raises the error.

So, since the number of lists equals the number of column names, we need to tell Polars that each list represents a row, not a column.

pl.DataFrame([[1.1, 'a'], [2.2, 'b']], columns=['col_1', 'col_2'], orient='row')

And now the error disappears.

shape: (2, 2)
┌───────┬───────┐
│ col_1 ┆ col_2 │
│ ---   ┆ ---   │
│ f64   ┆ str   │
╞═══════╪═══════╡
│ 1.1   ┆ a     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2.2   ┆ b     │
└───────┴───────┘
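The inference rule quoted from the documentation can be modelled in a few lines of plain Python. This is only a simplified sketch of the rule as described above, not Polars' actual implementation; the function name infer_orient is hypothetical and exists only for illustration.

```python
from typing import Sequence


def infer_orient(data: Sequence[Sequence[object]], columns: Sequence[str]) -> str:
    """Simplified model of the documented rule (hypothetical helper,
    NOT Polars' real implementation)."""
    n_lists = len(data)
    n_names = len(columns)
    if n_lists == n_names:
        # One list per column name: either column-oriented or ambiguous.
        # In both cases the documented fallback is column orientation.
        return "col"
    if all(len(item) == n_names for item in data):
        # The list count differs from the name count, but every list has
        # one value per name, so each list must be a row.
        return "row"
    # Inconclusive: the documentation says column orientation is used.
    return "col"


three_lists = [[1.1, "a"], [2.2, "b"], [3.3, "c"]]
two_lists = [[1.1, "a"], [2.2, "b"]]
names = ["col_1", "col_2"]

print(infer_orient(three_lists, names))  # row
print(infer_orient(two_lists, names))    # col
```

The second call is the ambiguous case: two lists and two names, so the model falls back to columns, which is why orient='row' has to be passed explicitly.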

With this in mind, let's look at your code. How many lists are in each element x of a? Seven. And how many column names are supplied? Both ca and cb supply seven column names. Since the number of lists and the number of column names are equal, Polars interprets each list as a column, not a row. For example, Polars interprets

[370.01605224609375, 346.4305114746094, 398.3968811035156, 
384.5684814453125, 0.9011853933334351, 0, 'corn']

as a column, not a row. As such, Polars sees the string "corn" mixed in with numbers in the same column. Hence the error.
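To see the type mixing without involving Polars at all, the short sketch below inspects two of the rows from the question in plain Python. The helper name type_names is hypothetical; the point is only that a list read as a column contains several types, while the transposed data has one type per column.

```python
# Two of the rows from the question; each row mixes floats, an int and a str.
rows = [
    [370.01605224609375, 346.4305114746094, 398.3968811035156,
     384.5684814453125, 0.9011853933334351, 0, "corn"],
    [415.436767578125, 279.4227294921875, 433.930419921875,
     305.5151672363281, 0.8829901814460754, 0, "corn"],
]


def type_names(values):
    """Return the sorted set of Python type names found in a list
    (hypothetical helper for illustration)."""
    return sorted({type(v).__name__ for v in values})


# Read as a column, the first list mixes several types.
print(type_names(rows[0]))  # ['float', 'int', 'str']

# Transposing turns the rows into true columns, each holding a single type.
columns = [list(col) for col in zip(*rows)]
print([type_names(col) for col in columns])
```

After transposing, every column holds values of exactly one type, which is the layout Polars expects when it treats each list as a column.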

Solution 2:[2]

From the information you provided, I can only provide a hint as to where to look.

Near the end of your code, you are creating a list of new DataFrames:

setattr(new, k, [polars.DataFrame(x, columns=c) for x in a])

And the error is caused by this call:

polars.DataFrame(x, columns=c)

What is occurring is that one of your lists that is being passed in (x) to one of these DataFrames has a mix of numbers and strings. More specifically, one of those lists starts with one or more numbers, but contains a string somewhere after that. And this is causing an error as Polars tries to make a column of numbers from that list.

An Example

Let's take a closer look. Here is an example of creating a DataFrame:

import polars as pl
pl.DataFrame([["one", "two", "three"], [1.0, 2.0, 3.0]],
             columns=["col1", "col2"])

Notice that ["one", "two", "three"] are all strings. And [1.0, 2.0, 3.0] are all numbers. So, in each column, we have data of only one type. And we get no errors...

shape: (3, 2)
┌───────┬──────┐
│ col1  ┆ col2 │
│ ---   ┆ ---  │
│ str   ┆ f64  │
╞═══════╪══════╡
│ one   ┆ 1.0  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ two   ┆ 2.0  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ three ┆ 3.0  │
└───────┴──────┘

Now let's see what happens when we accidentally mix a string in with the column of numbers:

pl.DataFrame([["one", "two", "three"], [1.0, 2.0, "Oops, this is a string mixed in with numbers"]],
             columns=["col1", "col2"])

We get an error...

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 311, in __init__
    self._df = sequence_to_pydf(data, columns=columns, orient=orient)
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 495, in sequence_to_pydf
    data_series = [
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 496, in <listcomp>
    pli.Series(columns[i], data[i], dtypes.get(columns[i])).inner()
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/series.py", line 227, in __init__
    self._s = sequence_to_pyseries(name, values, dtype=dtype, strict=strict)
  File "/home/xxxx/.virtualenvs/PolarsTesting3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 239, in sequence_to_pyseries
    return constructor(name, values, strict)
TypeError: must be real number, not str

Compare this error message with the one you received. They match closely (except for the directories, which are specific to each computer).

So, you need to look for a list that starts with one or more numbers, but contains a string. Polars tries to make a column of numbers with this list, and throws an error.

Perhaps one or more elements in a list that is meant to be numbers contain a string such as "Error" or "NULL" or "#N/A" or something similar.

You'll have to debug this to find out.
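One way to start that debugging is to scan the lists before handing them to Polars. The sketch below is plain Python with a hypothetical helper name, find_mixed_lists; it assumes the data is a list of lists like the examples above and reports every string found in a list that begins with a number.

```python
from numbers import Real


def find_mixed_lists(lists):
    """Yield (list_index, item_index, value) for every str found in a
    list that begins with a number (hypothetical debugging helper)."""
    for i, values in enumerate(lists):
        # Only lists that start with a number would be built as numeric columns.
        if not values or not isinstance(values[0], Real):
            continue
        for j, value in enumerate(values):
            if isinstance(value, str):
                yield i, j, value


data = [["one", "two", "three"], [1.0, 2.0, "Oops"]]
print(list(find_mixed_lists(data)))  # [(1, 2, 'Oops')]
```

Running it on each x in a before the pl.DataFrame call should point at the list and position that trigger the TypeError.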

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2