Category "data-analysis"

Package sparse_dot_topn in Pyspark AWS EMR Jupyter install error

I'm running a PySpark notebook in Jupyter on AWS EMR. When I try to install the Python package "sparse_dot_topn" version 0.2.9, I get an error I don't understand.

How can I aggregate only on the hour component in Elasticsearch?

I have a variety of behavior data in a big Elasticsearch database, and I'd like to do some analysis. In particular, I want to look at repeat behaviors by the ti

Compute similarity (percentage) between two matrices/arrays

How can I compute the similarity (as a percentage) between two matrices/arrays, or find the closest array/matrix to a given array, on the basis of how similar their data value

How to match the unique ids that I created in df1 to df2 based on two column values?

I have two dataframes, and I am struggling to match the unique ids that I created in df1 to df2 based on 'name' and 'version' values. I need to add a column to

Can we call any external REST API inside DBT (Data Build Tool)?

I am doing some analytical work in which we need to transform data from one source to another, and we are using DBT for the transformation. One of the data a

R column with a strange value [duplicate]

In one column of my dataset the assignment of a record to a phase is listed. Phase I (I), Phase II (I), Phase III (I). Each dataset has an ass

Extracting from CSV file knowing row and column number on command line

I have a CSV file and I want to extract the element in the first row and 3rd column. How might I go about doing this?
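
One possible approach, as a minimal sketch: read the file with Python's standard `csv` module and stop at the requested row. The helper name `cell_at` and the sample data are hypothetical, chosen only for illustration; rows and columns are counted from 1, to match the wording of the question.

```python
import csv
import io


def cell_at(lines, row, col):
    """Return the field at 1-based (row, col) from an iterable of CSV lines."""
    for i, fields in enumerate(csv.reader(lines), start=1):
        if i == row:
            # Convert the 1-based column number to a 0-based list index.
            return fields[col - 1]
    raise IndexError(f"CSV has fewer than {row} rows")


# Hypothetical in-memory sample standing in for the real file.
sample = io.StringIO("a,b,c\n1,2,3\n")
first_row_third_col = cell_at(sample, 1, 3)
print(first_row_third_col)
```

For a real file, an open file object can be passed instead of the `StringIO` sample, for example `open(path, newline="")`, where `path` is whatever file the reader has.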

How to replace column values with dictionary keys

I have a df, A B one six two seven three level five one and a dictionary my_dict={1:"one,two",2:"three,four"} I want to replace df.A with my_di

Python: pandas merge multiple dataframes

I have different dataframes and need to merge them together based on the date column. If I only had two dataframes, I could use df1.merge(df2, on='date'), to do

Fitting data to numerical solution of an ode in python

I have a system of two first order ODEs, which are nonlinear, and hence difficult to solve analytically in a closed form. I want to fit the numerical solution t

SQL group by in Subquery

I'm trying to get monthly production using GROUP BY after converting the Unix time column into a regular timestamp. Can you please tell me how to use GROUP BY here in the

Query (SQL like joins) remote CSV for data analysis

I would like to query (SQL with joins) CSV files sitting in a network folder for performing data analysis work. I'm not allowed to move the files out of the net

ERR_CONNECTION_REFUSED on browser when opening dtale with Eclipse Pydev

Opening a dtale sheet using Eclipse PyDev on Windows leads to ERR_CONNECTION_REFUSED in the browser. The same code works in Spyder and Jupyter, however. I know dtale

Edit text in PDF with python

I have a PDF file and I need to edit some text/values in it. For example, in the PDFs that I have, "BIRTHDAY DD/MM/YYYY" is always "N/A". I want to change i

How to name the column when using the value_counts function in pandas?

I was counting the number of occurrences of each angle and dist combination with the code below: g = new_df.value_counts(subset=['Current_Angle', 'Current_dist'], sort=False) the out
